This project explores the dynamics of online conversations on Reddit. By embedding comments and clustering the embeddings into discrete states, we model the transitions between these states using higher-order Markov processes, offering a clear picture of conversational flow and sentiment dynamics over time.

AliakbarMehdizadeh/ConversationDynamic-Reddit

Markov Model-Based Reddit Conversations Flow

This project investigates the dynamics of online conversations on Reddit using various techniques such as sentence embedding, PCA, K-Means clustering, and Markov models. By reducing the dimensionality of comment embeddings and clustering them into discrete states, we model transitions between these states using higher-order Markov processes. The project includes visualizations of these state transitions through network graphs, providing a comprehensive view of conversational flow and dynamics over time.

Features

  • Sentence Embedding: Convert Reddit comments into dense vector representations.
  • PCA: Reduce dimensionality of the embedding vectors for easier analysis.
  • K-Means Clustering: Group the PCA-reduced data into discrete states.
  • Markov Models: Model transitions between states using higher-order Markov processes.
  • Visualization: Create network graphs and time series plots to illustrate the dynamics of conversations.
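
As a rough illustration of the Markov step, the sketch below estimates the probability of the next state given the previous `order` states. It is not the repository's code: the function name `fit_markov` and the example label sequence are hypothetical, and it assumes the K-Means cluster labels for one conversation are already available as a list in comment order.

```python
from collections import Counter, defaultdict


def fit_markov(states, order=2):
    """Estimate P(next state | previous `order` states) from one label sequence."""
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        history = tuple(states[i - order:i])  # the memory: last `order` states
        counts[history][states[i]] += 1
    # Normalise each history's counts into conditional probabilities.
    return {
        history: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for history, row in counts.items()
    }


# Hypothetical cluster labels for consecutive comments in one thread.
labels = [0, 1, 0, 1, 2, 0, 1, 0, 1, 2]
model = fit_markov(labels, order=2)
print(model[(0, 1)])  # distribution over the state that follows 0 -> 1
```

Setting `order=1` reduces this to an ordinary first-order transition matrix; larger values capture longer memory at the cost of sparser counts per history.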

Usage

  1. Clone the repository.
  2. Run pip3 install -r requirements.txt.
  3. Get your credentials from Reddit and add them to credentials.py.
  4. Edit config.py.
  5. Run python main.py.

Output Samples:

Higher-order network representation of conversation flow incorporating memory:

Screenshot

Time series representation of a conversation over the embedding space:

An open question here is whether different subreddits or conversations show different distributions over the embedding space and, if so, why.

Screenshot Screenshot

Autocorrelation Function and Partial Autocorrelation Function of Sentence Embedding:

These plots indicate that conversations carry memory: the embedding time series is autocorrelated, which suggests it can be described with an appropriate ARIMA model.

Screenshot
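
For readers who want to reproduce the idea behind the autocorrelation plot, here is a minimal sketch of the sample autocorrelation function in plain Python. It assumes the conversation has already been reduced to a one-dimensional series (for example, one principal component per comment); the function name `autocorrelation` and the toy `signal` are hypothetical and are not taken from the repository.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)  # lag-0 sum of squares
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


# Toy series standing in for one embedding component over consecutive comments.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = autocorrelation(signal, max_lag=2)
print(acf)
```

Values that stay well away from zero at lags above 0 are what the "memory" observation refers to. A library such as statsmodels provides ACF and PACF estimates with confidence bands if a fuller analysis is needed.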
