DS4A_Team_68_Covid_Tweet

An Analysis of Twitter Comments During Covid-19

Can we predict public sentiment as new Covid-19 variants emerge?

Two years after the first recorded outbreak of Covid-19, people around the world continue to deal with constant change and uncertainty in their everyday lives as new variants shape public health and policy. The pandemic has placed significant stress on individuals, affecting job security, economic stability, and overall health. Social media has grown in popularity over the last 15 years, and among its many purposes it serves as a large platform for people to be heard and seen.

By analyzing user posts on Twitter, we hope to uncover trends in how individuals feel about the world and their own lives. Has there been a significant shift in perspective from the first outbreak, to the Delta variant, and now to Omicron? Can we predict where public sentiment will shift as new variants continue to surface? Can efforts to raise awareness of resources be timed to provide more help during the hardest-hit periods?

Background

In this project, we explored and analyzed Covid-19 posts from the first 10-day window of each variant to find notable patterns in tweets during the pandemic. Data analysis was performed on the numerical features of Twitter users' accounts and their posts. The VaderSentiment module was used to perform Natural Language Processing (NLP) sentiment analysis. Once patterns and trends were uncovered, a web application was created to highlight the findings through the following pages:

The Dataframe
Sentiment
Heat Map
Word Cloud
Prediction

The user-friendly application was built with Streamlit, an open-source package.

Data Collection and Pre-Processing

Source - Covid-19 Twitter Chatter dataset from Zenodo and the Twitter API
Collection - Used the Twarc2 package to hydrate a list of tweet_ids into full tweets, then selected posts within a 10-day window of each variant (Beta, Delta, & Omicron)

Tabular Pre-processing - Extracted the hour of each post, renamed columns, assigned a variant name by date, & parsed data types
NLP Text Pre-processing - Lowercased the tweet text, removed punctuation, URLs, & stop words, then tokenized and lemmatized it
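The text pre-processing steps listed above can be sketched roughly as follows. This is a minimal standard-library illustration, not the project's actual code: the stop-word list is a tiny hypothetical stand-in, the function name clean_tweet is made up, and lemmatization is left as a pluggable callable (an identity function by default) where a real lemmatizer would be passed in.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text, lemmatize=lambda token: token):
    """Return the cleaned tokens of one tweet.

    `lemmatize` is an assumed hook: pass a real lemmatizer here.
    The default leaves every token unchanged.
    """
    text = text.lower()
    # Remove URLs before punctuation, otherwise the URL pattern no longer matches.
    text = URL_PATTERN.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = text.split()  # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [lemmatize(t) for t in tokens]


tokens = clean_tweet("The Delta variant is spreading! Details: https://example.com/news")
print(tokens)
```

The order of the steps matters: stripping punctuation first would break URLs into fragments that the URL pattern can no longer recognize.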

Web Application on Heroku

Data Frame

The entire data frame is displayed by default

User can adjust how many rows to show
Filter options: variant (Beta, Delta, & Omicron), user-selected columns, and a specific date range
Posts can be filtered by a user-input keyword search
Correlation map between numerical columns
Scatterplots based on user-selected columns

Correlation map of numerical columns

A positive correlation was observed between like_count and reply_count, and between like_count and retweet_count
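For readers who want to see what such a correlation measures, here is a small sketch of the Pearson coefficient between two count columns. The numbers are made up for illustration and are not taken from the dataset, and the helper name pearson is hypothetical; the README does not say which correlation method the app uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up engagement counts for five tweets (illustration only).
like_count = [0, 2, 5, 9, 20]
retweet_count = [0, 1, 2, 4, 9]

r = pearson(like_count, retweet_count)
print(round(r, 3))
```

A value close to 1 means the two counts tend to rise together, which is the kind of relationship the correlation map highlights.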

Scatterplot of like_counts vs re_tweets_counts

Heat Map

Sentiment scores of all tweet text, aggregated into one mean value per cell, by hour (x-axis) & variant (y-axis)
Selecting the radio button 'no' lets the user enter text to filter the tweets; the heat map is then drawn from the sentiment scores of that subset
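The aggregation behind the heat map can be sketched as a group-by on (variant, hour) followed by a mean, with an optional keyword filter applied first. The rows, scores, and the function name heatmap_cells below are invented for illustration; the app's real column names and filtering logic may differ.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (variant, hour of day, tweet text, sentiment score).
rows = [
    ("Beta", 9, "vaccine rollout begins", 0.4),
    ("Beta", 9, "lockdown extended again", -0.6),
    ("Delta", 14, "new vaccine site opens", 0.2),
    ("Delta", 14, "cases rising fast", -0.2),
    ("Omicron", 9, "booster vaccine available", 0.6),
]


def heatmap_cells(rows, keyword=None):
    """Mean sentiment per (variant, hour) cell, optionally keeping only
    tweets whose text contains `keyword` (case-insensitive)."""
    grouped = defaultdict(list)
    for variant, hour, text, score in rows:
        if keyword is not None and keyword.lower() not in text.lower():
            continue
        grouped[(variant, hour)].append(score)
    return {cell: mean(scores) for cell, scores in grouped.items()}


all_cells = heatmap_cells(rows)
vaccine_cells = heatmap_cells(rows, keyword="vaccine")
print(all_cells)
print(vaccine_cells)
```

Each dictionary key corresponds to one cell of the heat map, and cells with no matching tweets are simply absent.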

Sentiment

A bar plot highlighting sentiment across each variant (Beta, Delta, & Omicron)
Trend - The Beta & Delta variants show a larger share of positive sentiment
Time series aggregated by the selected time interval & variant
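The per-variant shares shown in a bar plot like this one can be computed along the following lines. The labelled tweets are made up for illustration and do not reproduce the project's results; sentiment_shares is a hypothetical helper name.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")

# Made-up (variant, sentiment label) pairs, for illustration only.
labelled = [
    ("Beta", "positive"), ("Beta", "positive"),
    ("Beta", "neutral"), ("Beta", "negative"),
    ("Omicron", "positive"), ("Omicron", "neutral"),
    ("Omicron", "negative"), ("Omicron", "negative"),
]


def sentiment_shares(labelled):
    """Fraction of tweets in each sentiment class, per variant."""
    counts = defaultdict(Counter)
    for variant, label in labelled:
        counts[variant][label] += 1
    shares = {}
    for variant, counter in counts.items():
        total = sum(counter.values())
        shares[variant] = {label: counter[label] / total for label in LABELS}
    return shares


shares = sentiment_shares(labelled)
print(shares)
```

Working with shares rather than raw counts keeps variants comparable even when they have different numbers of tweets.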

Prediction

The VaderSentiment module is used to extract sentiment scores
Scores are classified into 3 categories: Positive (green), Negative (red), and Neutral (gray)
Sentiment analysis runs on user-input text and returns the score & classification

Scores range from -1 to 1 and are classified as follows:

 - Neutral (-0.5, 0.5), Positive (0.5, 1), & Negative (-1, -0.5)
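The classification rule can be written as a small function. This is a sketch under two assumptions that the README does not confirm: the score is a single value in [-1, 1] (for example VADER's compound score), and the boundary values -0.5 and 0.5 themselves fall into the Negative and Positive classes respectively. The function name classify is hypothetical.

```python
def classify(score):
    """Map a sentiment score in [-1, 1] to (category, display color).

    Assumption: the boundaries +/-0.5 are treated as Positive / Negative;
    the README lists open intervals and leaves the exact boundaries unspecified.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if score >= 0.5:
        return "Positive", "green"
    if score <= -0.5:
        return "Negative", "red"
    return "Neutral", "gray"


for example_score in (0.8, 0.1, -0.7):
    print(example_score, classify(example_score))
```

In the app, the score for user-input text would come from the sentiment analyzer, and the returned color would drive how the result is displayed.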

Word Cloud

The more frequently a word was used, the larger it is displayed
Each word's sentiment score is highlighted by color:
- Red for negative
- Green for positive
- Gray for neutral
A slider lets the user define the number of words to display
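The data behind a word cloud like this can be sketched as a frequency count plus an average score per word. The sketch assumes that a word's sentiment is the mean score of the tweets containing it and that colors follow the -0.5 / 0.5 ranges from the Prediction page; both are assumptions, and the tweets, scores, and function names are made up.

```python
from collections import Counter, defaultdict


def color_for(score):
    """Color for an average score, assuming thresholds at -0.5 and 0.5."""
    if score >= 0.5:
        return "green"
    if score <= -0.5:
        return "red"
    return "gray"


def top_words(docs, n):
    """Return the n most frequent words as (word, count, color) tuples.

    `docs` is a list of (tokens, sentiment score) pairs, one per tweet.
    """
    counts = Counter()
    score_sums = defaultdict(float)
    for tokens, score in docs:
        for token in tokens:
            counts[token] += 1
            score_sums[token] += score
    return [
        (word, count, color_for(score_sums[word] / count))
        for word, count in counts.most_common(n)
    ]


# Made-up pre-processed tweets with made-up scores (illustration only).
docs = [
    (["vaccine", "health"], 0.9),
    (["vaccine", "life"], 0.8),
    (["death", "risk"], -0.8),
    (["death", "lockdown"], -0.6),
    (["vaccine"], 0.7),
]

words = top_words(docs, 2)
print(words)
```

The count would set the display size of each word, and the argument n plays the role of the slider value.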

Word Cloud with 75 top words

Words like vaccine, health, and life carry positive sentiment on average, whereas words like death, risk, and lockdown are negative on average, and covid case, new covid, pandemic, and covid are neutral on average.

