Data science project that uses Natural Language Processing (NLP), APIs, web scraping, and supervised machine learning to predict team performance from 370,000 tweets. The project aims to understand what users tweet when the team wins versus when it loses or ties, and what insights can be derived from those differences. Most information pertaining to Twitter users has been anonymized in Excel. For questions or feedback, please feel free to reach out to me at bbellman95@gmail.com.
Key Findings:
- Classification models classified 80% of tweets correctly.
I. Data Source
- Team_Stats -- Data from this website, which had to be reformatted before use.
- User_Tweets_API -- Some data was accessed using the Twitter API; however, it was found that the Twitter.
- User_Tweets_Web_Scraping -- Data comes from Shopify store that specialized in fitness apparel. Please see link above for Shopify's instructions to export.
II. Jupyter Notebooks
- Social_Sentiment_Analysis -- Notebook containing the full project, divided into four sections:
  - Data Wrangling and Exploration
  - Exploratory Data Analysis
  - Pre-Processing and Modelling
  - Findings and Recommendations
- Twitter_Scraper -- Program created to run multiple queries and store the scraped data.
- Positive_Negative -- Notebook that assigns positive and negative scores to each tweet.
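To illustrate the kind of per-tweet scoring the Positive_Negative notebook produces, here is a minimal sketch of lexicon-based scoring. It is not the notebook's actual method: the word lists, the `score_tweet` function name, and the choice to normalize by token count are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicons for illustration only; the project's
# actual word lists or scoring library are not stated in this README.
POSITIVE_WORDS = {"win", "great", "love", "amazing", "proud"}
NEGATIVE_WORDS = {"lose", "terrible", "hate", "awful", "sad"}


def score_tweet(text):
    """Return (positive, negative) scores as fractions of the tweet's tokens."""
    # Lowercase the text and keep only alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, 0.0
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    return positive / len(tokens), negative / len(tokens)


if __name__ == "__main__":
    for tweet in ["Great win tonight", "I hate to lose"]:
        print(tweet, score_tweet(tweet))
```

Scores like these could then be used as input features for a supervised classifier, alongside the tweet text itself.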
III. Supporting Documentation
- Final Report -- Report detailing the steps undertaken and key findings.
- Presentation -- Presentation on the project's purpose, steps undertaken, and results.
- Metrics File -- Excel file with model outputs and evaluation metrics.
Special thank you to:
- Ben Bell, Springboard Mentor.
Contributions are always welcome!
See contributing.md for ways to get started.
Please adhere to this project's code of conduct.
