📍 Geocoded Tweets Insights Dashboard 📉 📈 📊

This is an interactive, configurable, and generic dashboard that visualizes key insights from the tweets of country-specific or global users.

The application is built using Plotly Dash.

Detecting influential users, extracting local and global news and events, and understanding public perception in a specific country are challenging problems and the focus of this work.

In particular, this work uses the geocoded tweets to infer:

Influential users and their country
Influential countries and analysis of tweets from those countries
User communities
Interactions network
Reactive tweets (viral tweets that are quoted with extreme sentiments)
Viral local tweets
Viral global tweets
Potentially sensitive tweets analysis
Popular mentions and hashtags
Sentiment analysis
Daily tweets counts
Basic statistics: total tweets, average number of tweets per day, number of unique users, and the date range of the collected tweets.
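As an illustration of the basic statistics listed above, here is a minimal sketch using only the standard library. The record layout (`user_id`, `created_at`) and the function name `basic_stats` are assumptions made for this example and are not taken from the repository. Averaging over the inclusive calendar span, rather than over days that actually have tweets, is also an assumption.

```python
from datetime import date


def basic_stats(tweets):
    """Summarise tweets given as dicts with 'user_id' and 'created_at' (a date)."""
    if not tweets:
        return {"total_tweets": 0, "avg_tweets_per_day": 0.0,
                "unique_users": 0, "date_range": None}
    days = [t["created_at"] for t in tweets]
    first, last = min(days), max(days)
    n_days = (last - first).days + 1  # inclusive calendar span (assumption)
    return {
        "total_tweets": len(tweets),
        "avg_tweets_per_day": len(tweets) / n_days,
        "unique_users": len({t["user_id"] for t in tweets}),
        "date_range": (first, last),
    }


# Hypothetical sample records, for illustration only.
sample = [
    {"user_id": "u1", "created_at": date(2021, 5, 1)},
    {"user_id": "u2", "created_at": date(2021, 5, 1)},
    {"user_id": "u1", "created_at": date(2021, 5, 2)},
    {"user_id": "u3", "created_at": date(2021, 5, 4)},
]
stats = basic_stats(sample)
print(stats)
```

The dashboard itself may compute these figures differently, for example with pandas.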

These insights can help regulators and decision-makers take appropriate action.

Check out the demos for:

Country-specific tweets:
- 🇸🇬 Singapore
- 🇮🇳 India
- 🇺🇸 United States
🌎 Global tweets

Content

Overview
Visualizing the key insights from the sample tweets data
- Manual Setup
- Using Docker
Visualizing the key insights from custom country-specific or global level tweets data
Additional step for Singapore-based users tweets collection

Overview

The repository contains code for:

Geocoding tweets by using location, user description, place, and coordinates data
Filtering country-specific tweets that are ingested into MongoDB
Notebooks containing exploratory data analysis on the collected tweets
Pipeline for tweets processing and cleaning.
Pipeline for generating key insights i.e., dashboard data (csv and json files)
Plotly dash application for visualizing the insights
Fetching followers of 59 Singapore-based official accounts (such as Ministry of Education, Health, and so on)

Refer to this detailed documentation. It covers the collected data, the geocoding, the approach to generating the key insights, the Python implementation, and so on.

Visualizing the key insights from the sample tweets data

Download the sample data from this link and place the folder you need (singapore, india, united states, or global) inside the data folder.

Manual Setup

Git clone the repository

git clone https://github.com/anshu0612/geocoded-tweets-analysis.git

Install dependencies

pip3 install -r requirements.txt

Run the application locally

python3 app.py

Open http://localhost:5000/ to see the application running 🚀.

Using Docker

If you run into environment or dependency issues, you can use Docker instead.

Clone the repository
Install Docker on your system
Run the command below to build the Docker image

docker image build -t geocoded-tweets-insights-dash:latest .

Run the docker image

docker container run -d -p 5000:5000 geocoded-tweets-insights-dash

Open http://localhost:5000/ to see the application running 🚀.

Visualizing the key insights from custom country-specific or global level tweets data

Step 1: Collect tweets

The steps below describe how to collect tweets matching specific keywords and hashtags using the Twitter Streaming API and ingest them into MongoDB.

Create a .env file and add your Twitter credentials:

TWITTER_APP_KEY = 'XXXX'
TWITTER_APP_SECRET = 'XXXX'
TWITTER_OAUTH_TOKEN = 'XXXX'
TWITTER_OAUTH_TOKEN_SECRET = 'XXXX'

Add custom keywords and hashtags in data/tweets_collection_filters/ (see sample hashtags.csv and keywords.csv files)
Run the python script collect_tweets.py to collect the tweets.

Sample:

python3 collect_tweets.py --db_name COVID_VACCINE
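The filter files could be turned into a single list of tracking terms roughly as follows. This is only a sketch: it assumes each CSV holds one term per row in the first column with no header row, which may not match the sample hashtags.csv and keywords.csv files, and `load_terms` is a hypothetical helper rather than a function from collect_tweets.py.

```python
import csv
import io


def load_terms(csv_text, hashtag=False):
    """Return de-duplicated, lower-cased terms from the first CSV column."""
    terms = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank rows
        term = row[0].strip().lower()
        if hashtag and not term.startswith("#"):
            term = "#" + term  # normalise hashtags to a leading '#'
        if term not in terms:
            terms.append(term)
    return terms


# Inline stand-ins for keywords.csv and hashtags.csv (assumed layout).
keywords_csv = "vaccine\nCovid Vaccine\nvaccine\n"
hashtags_csv = "#GetVaccinated\nvaccination\n\n"

track = load_terms(keywords_csv) + load_terms(hashtags_csv, hashtag=True)
print(track)
```

In practice the CSV text would be read from the files in data/tweets_collection_filters/.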

Step 2: Update the `constants/country_config.py` file

For collecting country-specific tweets

# (String) Should be Alpha2 country code
# Check `COUNTRY_TO_ALPHA2` for reference in the constants/common.py file
# Example: 'SG'
COUNTRY_CODE = None

# (List) of country alternatives
# Example 1: ['sg', 'spore', 'singapore', 'singapura']
# Example 2: ['United States', 'america', 'usa', 'us', 'united states of america', 'u.s.', 'states', 'u.s.a']
# --------- USE: ---------
# 1. Helps estimate a user's location based on the country name alternatives
# 2. Filters tweets based on the country name alternatives present in
#    the `location description` and `profile description` of a user
# 3. Skips the country name alternatives when listing the top hashtags
COUNTRY_ALTS = []

# (Dictionary) - {<twitter_user_screen_name>: <twitter_user_country_code>} - Prior knowledge of a user's country
# Example {'muttons': 'SG', 'POTUS': 'US'}
KNOWN_USERNAMES_COUNTRIES = {}

❗ Important: Do not update the file if you intend to collect global tweets.
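To make the three uses of the configuration concrete, the sketch below shows one way such values could be applied. The helpers `guess_country` and `drop_country_hashtags` are hypothetical and written for this example. The repository's geocoding also uses place and coordinates data, which this sketch does not cover.

```python
import re

# Values taken from the examples in the config comments above.
COUNTRY_CODE = "SG"
COUNTRY_ALTS = ["sg", "spore", "singapore", "singapura"]
KNOWN_USERNAMES_COUNTRIES = {"muttons": "SG", "POTUS": "US"}

# Match an alternative only when it is not embedded in a longer word,
# so that 'sg' does not fire inside a word such as 'disgrace'.
_ALTS_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(a) for a in COUNTRY_ALTS) + r")(?!\w)",
    re.IGNORECASE,
)


def guess_country(screen_name, location="", description=""):
    """Hypothetical helper: return a country code, or None if unknown."""
    if screen_name in KNOWN_USERNAMES_COUNTRIES:  # prior knowledge wins
        return KNOWN_USERNAMES_COUNTRIES[screen_name]
    for text in (location, description):
        if text and _ALTS_RE.search(text):
            return COUNTRY_CODE
    return None


def drop_country_hashtags(hashtags):
    """Hypothetical helper: remove hashtags that merely name the country."""
    alts = {a.lower() for a in COUNTRY_ALTS}
    return [h for h in hashtags if h.lstrip("#").lower() not in alts]


print(guess_country("someone", location="Singapore, SG"))
print(drop_country_hashtags(["#Singapore", "#covid19"]))
```

Matching on whole tokens is a design choice of this sketch. Short alternatives such as 'us' can still produce false positives in free-text descriptions.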

Step 3: Parse tweets

Connect to remote MongoDB by configuring the .env file

Configure MongoDB

MONGO_HOST = <mongo_host>
MONGO_USER = <mongo_username>
MONGO_PASS = <mongo_password>
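Before running the scripts, it can help to check that the .env file contains every required key. The sketch below parses the `KEY = value` format shown above using only the standard library. It is an assumption that this mirrors how the project reads the file, since the project may well rely on a library such as python-dotenv instead, and `parse_env` is a hypothetical helper.

```python
def parse_env(text):
    """Parse 'KEY = value' lines, ignoring blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


REQUIRED = ["MONGO_HOST", "MONGO_USER", "MONGO_PASS"]

# Placeholder content; a real check would read the .env file from disk.
env = parse_env("""
TWITTER_APP_KEY = 'XXXX'
MONGO_HOST = db.example.com
MONGO_USER = reader
""")
missing = [k for k in REQUIRED if k not in env]
print(missing)
```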

Sample command to run the python script to parse the tweets

python3 parse_tweets.py --db_name COVID_VACCINE --collection_no_list 88 89

Arguments

Argument	Description	Default
db_name	Database name to fetch tweets from	-
collection_no_list	List of MongoDB collections	-
running_tweets_save_count	Number of tweets to save during tweets processing	1000
max_csv_tweets_count	Maximum no. of tweets to save in a csv	10000
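The arguments in the table can be expressed with `argparse` as sketched below. The defaults come from the table. Marking `db_name` and `collection_no_list` as required and parsing collection numbers as integers are assumptions, and the actual parse_tweets.py may define them differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Parse tweets stored in MongoDB")
    parser.add_argument("--db_name", required=True,
                        help="Database name to fetch tweets from")
    parser.add_argument("--collection_no_list", nargs="+", type=int, required=True,
                        help="List of MongoDB collections")
    parser.add_argument("--running_tweets_save_count", type=int, default=1000,
                        help="Number of tweets to save during tweets processing")
    parser.add_argument("--max_csv_tweets_count", type=int, default=10000,
                        help="Maximum number of tweets to save in a csv")
    return parser


# Equivalent to the sample command shown above.
args = build_parser().parse_args(
    ["--db_name", "COVID_VACCINE", "--collection_no_list", "88", "89"]
)
print(args)
```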

The CSV files are saved in the following directories:

Country-specific tweets: data/<country>/fragmented_tweets/tweets and data/<country>/fragmented_tweets/tweets_engagements
Global tweets: data/global/fragmented_tweets/tweets and data/global/fragmented_tweets/tweets_engagements
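The effect of `max_csv_tweets_count` can be pictured as splitting the parsed tweets into several CSV fragments. The sketch below is one possible way to do that. The numeric file names, the column names, and the `write_fragments` helper are all hypothetical and are not taken from parse_tweets.py.

```python
import csv
import tempfile
from pathlib import Path


def write_fragments(rows, fieldnames, out_dir, max_rows):
    """Write rows to out_dir as 0.csv, 1.csv, ... with at most max_rows each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(rows), max_rows):
        path = out_dir / f"{start // max_rows}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows[start:start + max_rows])
        paths.append(path)
    return paths


# 25 dummy tweets with a cap of 10 rows per file.
rows = [{"tweet_id": i, "text": f"tweet {i}"} for i in range(25)]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "data" / "global" / "fragmented_tweets" / "tweets"
    written = write_fragments(rows, ["tweet_id", "text"], out, max_rows=10)
    # Count data rows per fragment (total lines minus the header line).
    sizes = [len(p.read_text(encoding="utf-8").splitlines()) - 1 for p in written]
print(sizes)
```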

Step 4: Process tweets

Run the python script to process the tweets

python3 process_tweets.py

This merges and joins all the CSV files and then performs the required processing.

The generated file will be stored in:

For country-specific tweets: data/<country>/<country>_tweets.csv
For global tweets: data/global/global_tweets.csv
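The merge part of this step can be sketched as concatenating the fragment files into one CSV. The example below is a simplified illustration. Dropping duplicates by a `tweet_id` column is an assumption, and the join with the engagement files and the rest of the processing done by process_tweets.py are not covered.

```python
import csv
import tempfile
from pathlib import Path


def merge_fragments(fragment_dir, merged_path, key="tweet_id"):
    """Concatenate every CSV in fragment_dir, keeping the first row per key."""
    paths = sorted(Path(fragment_dir).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV fragments found in {fragment_dir}")
    seen, merged, fieldnames = set(), [], None
    for path in paths:
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            fieldnames = fieldnames or reader.fieldnames
            for row in reader:
                if row[key] not in seen:  # skip rows already merged
                    seen.add(row[key])
                    merged.append(row)
    with Path(merged_path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)


with tempfile.TemporaryDirectory() as tmp:
    frag = Path(tmp) / "fragmented_tweets" / "tweets"
    frag.mkdir(parents=True)
    # Two tiny fragments that share one tweet (id 2).
    (frag / "0.csv").write_text("tweet_id,text\n1,a\n2,b\n", encoding="utf-8")
    (frag / "1.csv").write_text("tweet_id,text\n2,b\n3,c\n", encoding="utf-8")
    total = merge_fragments(frag, Path(tmp) / "global_tweets.csv")
print(total)
```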

Step 5: Generate dashboard data

Run the python script to generate data for the dashboard

python3 generate_dash_data.py

You can set up the dashboard-related constants in data/dash_constants.py.

This will create the necessary csv and json data files for the dashboard visualization.

The generated data directories containing the files will be stored in:

For country-specific tweets: data/<country>/dash_output/....
For global tweets: data/global/dash_output/....

Step 6: Run the application

Yay! If you successfully ran all the above steps, then go ahead and run the application. 🥳

python3 app.py

Additional step for Singapore-based users tweets collection

Since I first built the dashboard for Singapore, there is an additional step (call it Step 0) for collecting tweets from Singapore-based users.

The step collects Twitter users who follow at least X Singapore-linked official accounts (such as the Ministry of Education, the Ministry of Health, and so on). The heuristic identifies potential Singapore-based users, whose tweets are then collected.

The provided sample of users in data/singapore/min_following_users.txt lists the users who follow at least 2 of the 59 Singapore-linked official accounts (check the diagram below for your reference). During tweets collection, this file is used - as one of the steps - to filter Singapore-based tweets.
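The heuristic amounts to counting, for each user, how many of the official accounts they follow and keeping those at or above the threshold. A minimal sketch follows. The account names, user ids, and the `users_following_at_least` helper are made up for this example and are not the repository's implementation.

```python
from collections import Counter


def users_following_at_least(followers_by_account, min_following_required=2):
    """Return sorted user ids that follow at least N of the given accounts."""
    counts = Counter()
    for followers in followers_by_account.values():
        # set() ensures an account counts at most once per user,
        # even if a follower id appears twice in its list.
        counts.update(set(followers))
    return sorted(u for u, n in counts.items() if n >= min_following_required)


# Hypothetical follower lists for three official accounts.
followers_by_account = {
    "account_a": ["101", "102", "103"],
    "account_b": ["102", "103", "104"],
    "account_c": ["103", "103", "105"],
}
print(users_following_at_least(followers_by_account, 2))
```

Raising the threshold trades coverage for precision: fewer users qualify, but those who do are more likely to be based in Singapore.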

Collecting followers of the Singapore-linked official accounts

Running the below command fetches the followers of the 59 Singapore-based official accounts.

python3 get_sg_users.py --min_following_required 2

Arguments

Argument	Description	Default
min_following_required	Filter users following at least this number of Singapore-based official accounts	2

The file data/singapore/min_following_users.txt contains the user IDs of the collected Singapore-based official Twitter accounts.

The list of followers will be saved in data/singapore/sg_accounts_followers/ folder.

To collect the followers using the Twitter API, add the credentials in the .env file

Contact

If you have any questions, feel free to open an issue or directly contact me via: anshu@comp.nus.edu.sg or anshu.singh173@gmail.com

Acknowledgement

I am thankful to Prof. Mohan Kankanhalli, Dr. Christian von der Weth (@chrisvdweth), and Dr. Jithin Vachery (@jithinvachery) for their guidance. I am grateful to Lloyd’s Register Foundation Institute for the Public Understanding of Risk (IPUR) and NUS Centre for Research in Privacy Technologies (N-CRiPT) for their support.

I would also like to acknowledge Reyha Verma (@reyha) for her initial contributions that provided directions to this work.
