This project investigates the dynamics of online conversations on Reddit using sentence embeddings, PCA, K-Means clustering, and Markov models. Comment embeddings are reduced in dimensionality and clustered into discrete states, and the transitions between these states are modeled with higher-order Markov processes. The project also visualizes these state transitions as network graphs, giving a view of how a conversation flows and changes over time.
- Sentence Embedding: Convert Reddit comments into dense vector representations.
- PCA: Reduce dimensionality of the embedding vectors for easier analysis.
- K-Means Clustering: Group the PCA-reduced data into discrete states.
- Markov Models: Model transitions between states using higher-order Markov processes.
- Visualization: Create network graphs and time series plots to illustrate the dynamics of conversations.
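As a rough illustration of the Markov modeling step, the sketch below estimates transition probabilities for a chain of a given order from a sequence of cluster labels. It is not taken from this repository: the function name `fit_markov` and the toy label sequence are assumptions made for the example, and the labels simply stand in for whatever the K-Means step produces.

```python
from collections import Counter, defaultdict


def fit_markov(states, order=2):
    """Estimate transition probabilities of an order-`order` Markov chain.

    Returns a dict that maps each history (a tuple of `order` consecutive
    states) to a dict of {next_state: probability}.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    counts = defaultdict(Counter)
    # Slide a window over the sequence: `order` states of history,
    # followed by the state that comes next.
    for i in range(len(states) - order):
        history = tuple(states[i:i + order])
        counts[history][states[i + order]] += 1
    # Normalize the counts for each history into probabilities.
    return {
        history: {s: c / sum(nxt.values()) for s, c in nxt.items()}
        for history, nxt in counts.items()
    }


# Toy sequence of cluster labels (made up for illustration).
sequence = [0, 1, 2, 0, 1, 2, 0, 1, 1, 0, 1, 2]
model = fit_markov(sequence, order=2)
print(model[(0, 1)])  # next-state probabilities after seeing 0 then 1
```

With `order=1` this reduces to an ordinary first-order transition table. Higher orders condition on longer histories, at the cost of needing more data per history.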
- Clone the repository.
- Run `pip3 install -r requirements.txt`.
- Get your credentials from Reddit and add them to `credentials.py`.
- Edit `config.py`.
- Run `python main.py`.
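The exact layout of `credentials.py` is not documented here, so the following is only a guess at what it might contain. Reddit API applications are identified by a client ID, a client secret, and a user agent string. The variable names below are hypothetical, so check how the project's code imports them before relying on this.

```python
# credentials.py (hypothetical layout; the variable names are assumptions,
# not confirmed by this repository)
CLIENT_ID = "your-client-id"          # client ID of your Reddit app
CLIENT_SECRET = "your-client-secret"  # secret of your Reddit app
USER_AGENT = "conversation-dynamics script by u/your-username"  # identifies the script to Reddit
```

Keep this file out of version control so the secret is not published.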
An interesting question here is whether different subreddits or conversations show different distributions and, if so, why.
This highlights the fact that conversations have memory and can be described using appropriate ARIMA models.
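One simple way to start on the question of whether subreddits differ is to compare how often each state occurs in each of them. The sketch below assumes that "distributions" refers to the empirical distribution over cluster states, and it uses total variation distance as the comparison. Both are choices made for this example rather than the project's method, and the two label sequences are made up.

```python
from collections import Counter


def state_distribution(states, n_states):
    """Return the empirical probability of each state 0..n_states-1."""
    if not states:
        raise ValueError("states must not be empty")
    counts = Counter(states)
    return [counts[s] / len(states) for s in range(n_states)]


def total_variation(p, q):
    """Total variation distance between two distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Made-up state sequences for two subreddits, for illustration only.
subreddit_a = [0, 0, 1, 2, 0, 1, 0, 0]
subreddit_b = [2, 2, 1, 2, 0, 2, 1, 2]

p = state_distribution(subreddit_a, n_states=3)
q = state_distribution(subreddit_b, n_states=3)
print(total_variation(p, q))
```

A distance near zero means the two subreddits spend similar fractions of time in each state. It says nothing about the order in which states occur, which is what the Markov transition estimates capture.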