Code to reproduce the social dimensions and analyses from the 2021 paper "Quantifying social organization and political polarization in online platforms" by Isaac Waller and Ashton Anderson.
- Python 3.x
- Spark and `pyspark`
- `pandas`
- Software that can run Jupyter notebooks
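Before opening the notebook, a quick check like the following can confirm that the Python packages are importable. This is a minimal sketch, not part of the repository. `check_environment` is a hypothetical helper, and it only verifies that `pyspark` and `pandas` are installed. It does not check that Spark itself (including its Java runtime) or a Jupyter server is configured.

```python
import importlib.util
import sys


def check_environment(modules=("pyspark", "pandas")):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: it only checks the Python version and whether each
    package can be found; it does not start Spark or Jupyter.
    """
    problems = []
    if sys.version_info.major != 3:
        problems.append(f"Python 3.x required, found {sys.version.split()[0]}")
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems
```

For example, `print(check_environment() or "OK")` prints either the list of problems or `OK`.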
- Load the `social-dimensions.ipynb` notebook.
- Run all cells in the notebook.
- The resulting scores for all communities will be saved in the `scores.csv` file, and are also available in the `scores` pandas DataFrame in the notebook for you to explore.
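Running all cells can also be done without opening the Jupyter interface (for example, on a remote server) by using Jupyter's `nbconvert` tool. A minimal sketch, assuming `jupyter nbconvert` is installed and the command is run from the directory containing `social-dimensions.ipynb`. The helper names `nbconvert_command` and `run_notebook` are hypothetical.

```python
import subprocess


def nbconvert_command(notebook, timeout=-1):
    """Build a `jupyter nbconvert` command that executes every cell of
    `notebook` and writes the executed notebook back to the same file.

    timeout=-1 disables nbconvert's per-cell timeout, since Spark cells
    may run for a long time.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.timeout={timeout}",
        notebook,
    ]


def run_notebook(notebook="social-dimensions.ipynb"):
    # check=True raises CalledProcessError if nbconvert reports a failing cell.
    subprocess.run(nbconvert_command(notebook), check=True)
```

`--inplace` keeps the executed outputs in the notebook file itself; drop it and pass `--output` instead if you prefer to keep the original notebook untouched.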
See `scores.csv` in the repository for the full example output, which this code should reproduce exactly.
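Because the output should match the repository's `scores.csv` exactly, a plain file comparison is enough to confirm a run. The sketch below is not part of the repository and rests on assumptions: you copied the repository's file to the hypothetical name `scores_reference.csv` before running the notebook (the run writes `scores.csv`), and the first column identifies the community. `load_rows` and `diff_scores` are hypothetical helpers.

```python
import csv


def load_rows(path):
    """Read a CSV into (header, {first-column value: full row})."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = {row[0]: row for row in reader}
    return header, rows


def diff_scores(reference_path, reproduced_path):
    """Return human-readable differences between two score files.

    Rows are matched on the first column (assumed to be the community name),
    so row order does not matter; cells are compared as exact strings,
    matching the "reproduce exactly" expectation.
    """
    ref_header, ref_rows = load_rows(reference_path)
    new_header, new_rows = load_rows(reproduced_path)
    diffs = []
    if ref_header != new_header:
        diffs.append(f"header differs: {ref_header} vs {new_header}")
    for key in sorted(ref_rows.keys() - new_rows.keys()):
        diffs.append(f"missing row: {key}")
    for key in sorted(new_rows.keys() - ref_rows.keys()):
        diffs.append(f"extra row: {key}")
    for key in sorted(ref_rows.keys() & new_rows.keys()):
        if ref_rows[key] != new_rows[key]:
            diffs.append(f"values differ for {key}")
    return diffs
```

An empty list from `diff_scores("scores_reference.csv", "scores.csv")` means the run matched. If the notebook overwrites the tracked `scores.csv` in a git checkout, `git diff scores.csv` is an even simpler check.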
- You first need to download the Pushshift data (see the script `commembed/data/download.sh`) and then import it into Parquet format (see the script `commembed/data/import_data.py`).
- The notebooks that generate all the plots are in the `notebooks` folder. They are ordered and should be run in that order, because some notebooks generate data that later notebooks depend on.
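The data preparation and the ordered notebooks can be chained into one script. This is a minimal sketch under several assumptions: both scripts run from the repository root without arguments (check each script for any configuration it expects), the notebook file names sort into the intended run order, and `jupyter nbconvert` is installed. `natural_key`, `pipeline_steps`, and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import re
import subprocess
import sys
from pathlib import Path


def natural_key(name):
    """Sort key that orders embedded numbers numerically ("2_x" before "10_x")."""
    # re.split with a capturing group alternates text and digit chunks,
    # so odd indices are always digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def pipeline_steps(notebook_dir="notebooks"):
    """Return the commands to run, in order (assumes the repository root as cwd)."""
    notebooks = sorted(
        (p.name for p in Path(notebook_dir).glob("*.ipynb")), key=natural_key
    )
    steps = [
        ["bash", "commembed/data/download.sh"],          # assumed: no arguments
        [sys.executable, "commembed/data/import_data.py"],  # assumed: no arguments
    ]
    for name in notebooks:
        steps.append(["jupyter", "nbconvert", "--to", "notebook", "--execute",
                      "--inplace", str(Path(notebook_dir) / name)])
    return steps


def run_pipeline(notebook_dir="notebooks"):
    for cmd in pipeline_steps(notebook_dir):
        # Stop at the first failure: later notebooks read data made by earlier ones.
        subprocess.run(cmd, check=True)
```

Natural sorting keeps a file such as `10_...ipynb` after `2_...ipynb`; if the notebook names use zero-padded prefixes, plain `sorted` gives the same order.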
If you use any data or code from this repository, please cite our paper:
Waller, I., Anderson, A. Quantifying social organization and political polarization in online platforms. Nature 600, todo-todo (2021). https://doi.org/10.1038/s41586-021-04167-x
If you have any questions, please contact us.