Education trends on Twitter

Objective :

By analyzing a massive collection of education-related tweets, the project explores whether higher tweet volumes correspond to significant trends in the education sector.

Skills/Tools Used :

Python Programming Language
PySpark
Google Cloud Platform
Big Data Analysis

Project Overview :

The project performs author (twitterer) identification, location analysis, timeline analysis, and tweet uniqueness analysis.

1. Data Collection and Preprocessing:

The dataset was provided by my university (the University of Chicago). It consists of ~100 million tweets (~500 GB). The tweets were collected on the topics of education, schools, universities, learning, knowledge sharing, etc., but only a fraction of them are directly related to primary, secondary, or higher education.
Combine individual JSON files and process them for analysis.
Discard irrelevant tweets to focus on education-related content.
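
The combine-and-filter step above can be sketched in plain Python on a small sample. This is a minimal illustration, not the project's actual pipeline: it assumes each input line holds one tweet as a JSON object with a `text` field, and the keyword stems in `EDU_STEMS` are hypothetical, since the README does not state the relevance criteria. At the scale of the full dataset the same logic would more likely be written as PySpark transformations.

```python
import json
import re

# Hypothetical keyword stems; the project's real relevance criteria are not documented.
EDU_STEMS = ("school", "universit", "college", "student", "teacher", "educat")
EDU_PATTERN = re.compile(r"\b(?:" + "|".join(EDU_STEMS) + r")\w*", re.IGNORECASE)


def parse_tweets(lines):
    """Yield one dict per valid JSON object, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate corrupt records instead of failing the whole run
        if isinstance(record, dict):
            yield record


def is_education_related(tweet):
    """Return True when the tweet text contains any of the keyword stems."""
    return bool(EDU_PATTERN.search(tweet.get("text") or ""))


def filter_education(lines):
    """Parse raw JSON lines and keep only the education-related tweets."""
    return [t for t in parse_tweets(lines) if is_education_related(t)]
```

A stem-based pattern is used so that plural and derived forms ("schools", "universities", "educational") also match; a stricter filter would trade recall for precision.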

2. Exploratory Data Analysis (EDA):

Conduct a comprehensive EDA to identify key variables suitable for profiling Twitter users.
Identify fields that provide insights into message volume, retweets, and more.
Discard poorly populated variables to streamline analysis.
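
One way to decide which variables count as poorly populated is to measure each field's fill rate, as in the sketch below. The function names, the definition of "empty" (`None`, empty string, empty list, empty dict), and the 0.5 threshold are all assumptions made for illustration; the README does not say which cutoff the project used.

```python
from collections import Counter


def fill_rates(records):
    """Return, for each top-level field, the fraction of records where it is populated."""
    records = list(records)
    if not records:
        return {}
    populated = Counter()
    seen_fields = set()
    for rec in records:
        for key, value in rec.items():
            seen_fields.add(key)
            # Assumption: these values count as "empty" for profiling purposes.
            if value not in (None, "", [], {}):
                populated[key] += 1
    total = len(records)
    return {key: populated[key] / total for key in seen_fields}


def well_populated_fields(records, min_rate=0.5):
    """Return the sorted field names whose fill rate is at least min_rate."""
    rates = fill_rates(records)
    return sorted(key for key, rate in rates.items() if rate >= min_rate)
```

Fields that fall below the threshold can then be dropped before the later analyses.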

3. Analysis of the Following Topics:

Author identification
Geographical Distribution Analysis
Timeline Analysis
Message Uniqueness Analysis
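
As a rough illustration of two of the analyses listed above, the sketch below computes a message uniqueness ratio and a per-day tweet count. It assumes tweets are dicts with `text` and `created_at` fields, and that `created_at` uses the classic Twitter API timestamp layout (for example `Wed Oct 10 20:19:24 +0000 2018`). The normalization rules (stripping a leading `RT @user:` prefix and URLs) are an assumption, not the project's documented definition of uniqueness.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed timestamp layout of the created_at field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def normalize(text):
    """Strip a leading retweet prefix and URLs, lowercase, and collapse whitespace."""
    text = re.sub(r"^RT @\w+:\s*", "", text)
    text = re.sub(r"https?://\S+", "", text)
    return " ".join(text.lower().split())


def uniqueness_ratio(tweets):
    """Return distinct normalized messages divided by total messages."""
    texts = [normalize(t.get("text") or "") for t in tweets]
    return len(set(texts)) / len(texts) if texts else 0.0


def tweets_per_day(tweets):
    """Count tweets per calendar day, keyed by ISO date string."""
    counts = Counter()
    for t in tweets:
        created = datetime.strptime(t["created_at"], TWITTER_TIME_FORMAT)
        counts[created.date().isoformat()] += 1
    return dict(counts)
```

A low uniqueness ratio suggests that much of the volume comes from retweets or copied messages, which is worth checking before reading a spike in the daily counts as a genuine trend.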
