Prediction of the 2020 U.S. presidential election using semantic analysis, sentiment analysis, and statistical methods to correct for sampling biases

mwiecksosa/predicting2020

Prediction of the 2020 U.S. Presidential Election

  • Summary: We used historical exit polls and Twitter data to predict the 2020 U.S. presidential election, combining semantic analysis, sentiment analysis, and statistical methods that correct for political sampling biases (the views of Twitter users do not necessarily reflect those of the general population).
  • Main Result: We estimated political sampling biases from pre-2016 election Twitter data and 2016 exit polls in 20 states, then corrected pre-2020 election Twitter data from those same 20 states. With this correction, we correctly predicted 9 of 9 non-swing states and 6 of 11 swing states (15 of 20 states overall). See the Corrected by Political Bias tab on the website made by our teammate Jiawei Tang, and click any state to see the results.
  • Conclusion: Predicting elections is hard. Looking back, there are many things we would do differently, and many questions remain open. For example, what is the best way to account for changes in the Twitterverse from 2016 to 2020? How should we incorporate information from the 2018 midterm elections? Overall, this project was an amazing learning experience. We covered the full stack, from initial data collection to final vote predictions, in less than two months so that we could submit our predictions before election day on November 3, 2020. Given more time, we would have liked to try other approaches, such as using large language models like OpenAI's GPT-3 instead of "old school" NLP techniques from 2010-2020.
  • Ethical Reflections: Political campaigns probably need to use demographic information for strategy, research, and ad microtargeting to stay competitive. That being said, I'm not exactly thrilled by how certain companies monetize user demographic information.
  • Joint work by the following graduate students at UIUC, completed as the course project for Professor Kevin Chen-Chuan Chang's Fall 2020 course Listening to the Social Universe:
    • Naina Balepur: name data processing and demographic bias corrections.
    • Jiawei Tang: website and bot detection.
    • Ziqi Wang: tweet sentiment analysis and user labeling.
    • Michael Wieck-Sosa: data collection and vote predictions.
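
To make the idea behind the Main Result more concrete, here is a minimal sketch of one way a per-state political bias correction could work. It is not the project's actual method, which this README does not spell out. It assumes a simple additive offset: the gap between a state's 2016 Twitter-derived Democratic share and its 2016 exit-poll share is treated as that state's Twitter bias and subtracted from the 2020 Twitter-derived share. The names `StateShares`, `corrected_share`, and `predict_winner` are hypothetical, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class StateShares:
    """Two-party Democratic vote shares for one state, each in [0, 1].

    All fields are hypothetical inputs for this sketch.
    """
    twitter_2016: float    # share implied by labeled Twitter users before the 2016 election
    exit_poll_2016: float  # share reported by the 2016 exit poll
    twitter_2020: float    # share implied by labeled Twitter users before the 2020 election


def corrected_share(s: StateShares) -> float:
    """Subtract the 2016 Twitter-vs-exit-poll gap from the 2020 Twitter share.

    Assumption: the state's Twitter bias is additive and unchanged since 2016.
    """
    bias = s.twitter_2016 - s.exit_poll_2016
    corrected = s.twitter_2020 - bias
    # Clamp so the result is still a valid share.
    return min(1.0, max(0.0, corrected))


def predict_winner(s: StateShares) -> str:
    """Return "D" if the corrected Democratic share exceeds 50%, else "R"."""
    return "D" if corrected_share(s) > 0.5 else "R"


# Illustrative (made-up) numbers: Twitter leaned 12 points more Democratic
# than the exit poll in 2016, so the 2020 Twitter share is shifted down by 12 points.
example = StateShares(twitter_2016=0.60, exit_poll_2016=0.48, twitter_2020=0.58)
print(predict_winner(example))
```

In this made-up example the uncorrected 2020 Twitter share (0.58) would suggest a Democratic win, while the corrected share (about 0.46) flips the call. The stationarity assumption is exactly what the Conclusion questions: the Twitterverse may have shifted between 2016 and 2020.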

Acknowledgements

  • We would like to thank Professor Kevin Chen-Chuan Chang and the graduate teaching assistant Hongtai Cao for their guidance.
