Coronavirus Twitter analysis

This project uses the MapReduce programming model to process all geotagged tweets sent in 2020. In particular, it counts, for each language and for each country, the number of tweets that contain the hashtag #coronavirus or #코로나바이러스. It was written for a Big Data (CSCI 143) homework assignment.

Results

Using this process, we create graphs showing the top 10 countries and the top 10 languages for tweets containing #coronavirus and #코로나바이러스 (#coronavirus in Korean). The graphs are saved in the graphs directory.

Running the Code

To run this code, you need access to the CMC lambda server. If you have access, run the MapReduce steps from this repository with the following commands:

Run the following command to run the map script on all tweets from 2020. The nohup prefix and the trailing & keep the script running in the background even after you log out of the lambda server. run_maps.sh is a shell script that runs map.py on the tweets for every day in 2020; map.py processes the tweets for a single day and outputs JSON-formatted information about that day's tweets.

$ nohup ./run_maps.sh &
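As a rough illustration of what the map step computes for one day, here is a minimal sketch. It is not the repository's map.py: the function name map_tweets is hypothetical, and the sketch assumes each input line is one tweet as a JSON object with text, lang, and place.country_code fields.

```python
import json
from collections import Counter, defaultdict


def map_tweets(lines, hashtags):
    """Count hashtag occurrences by language and by country.

    Hypothetical helper: assumes one JSON tweet per line with
    'text', 'lang', and 'place' -> 'country_code' fields.
    """
    by_lang = defaultdict(Counter)
    by_country = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        text = tweet.get("text", "").lower()
        lang = tweet.get("lang") or "und"
        place = tweet.get("place") or {}  # 'place' may be null
        country = place.get("country_code") or "unknown"
        for tag in hashtags:
            # simple case-insensitive substring match
            if tag.lower() in text:
                by_lang[tag][lang] += 1
                by_country[tag][country] += 1
    return by_lang, by_country
```

Keeping the result as one nested mapping per grouping (hashtag, then language or country, then count) makes each day's output easy to serialize as JSON and to merge later.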

Reduce the outputs for all of the days into a single output file per grouping with these two commands. The first command writes reduced.lang, which contains JSON-formatted information about tweets grouped by language; the second writes reduced.country, which groups them by country.

$ ./src/reduce.py --input_paths outputs/geoTwitter*.lang --output_path=reduced.lang
$ ./src/reduce.py --input_paths outputs/geoTwitter*.country --output_path=reduced.country
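The reduce step amounts to summing the per-day counts. The sketch below shows that idea under an assumption about the data layout: each per-day output is taken to be a nested mapping from hashtag to a mapping from language (or country) to count. The function name reduce_counts is hypothetical and is not taken from reduce.py.

```python
from collections import Counter, defaultdict


def reduce_counts(partials):
    """Merge per-day count mappings into a single total.

    Hypothetical helper: each item in 'partials' is assumed to look
    like {hashtag: {language_or_country: count}}.
    """
    total = defaultdict(Counter)
    for partial in partials:
        for tag, counts in partial.items():
            total[tag].update(counts)  # Counter.update adds counts
    # convert back to plain dicts so the result serializes as JSON
    return {tag: dict(counts) for tag, counts in total.items()}
```

For example, merging {"#coronavirus": {"en": 2}} with {"#coronavirus": {"en": 3}} gives {"#coronavirus": {"en": 5}}. Because addition is associative and commutative, the days can be merged in any order.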

Visualize the top 10 countries with tweets containing a given hashtag, replacing HASHTAG with the hashtag of interest:

$ ./src/visualize.py --input_path=reduced.country --key=HASHTAG

Or visualize the top 10 languages with tweets containing a given hashtag:

$ ./src/visualize.py --input_path=reduced.lang --key=HASHTAG

The two commands above write images to the graphs directory, which can then be pushed to GitHub and viewed.
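The selection behind a "top 10" graph can be sketched as follows. This is an illustration only, not the code in visualize.py: the function name top_n is hypothetical, and the reduced file is assumed to load as a mapping from hashtag to a mapping from language (or country) to count. Plotting is left out so the sketch has no third-party dependencies.

```python
def top_n(reduced, key, n=10):
    """Return the n largest (label, count) pairs for one hashtag.

    Hypothetical helper: 'reduced' is assumed to look like
    {hashtag: {language_or_country: count}}.
    """
    counts = reduced.get(key, {})
    # sort by count descending, then by label so ties are stable
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

The returned pairs can be passed directly to a plotting library as bar labels and bar heights.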

tennisoctocat/twitter_coronavirus

Folders and files

Latest commit

History

Repository files navigation

Coronavirus twitter analysis

Results

Running the Code

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages