Sprint 02 Update
Pre-release
- Collected relevant tweets using Twint, a Python scraping library
- Cleaned and preprocessed the data by removing irrelevant information, standardizing text, and reducing dimensionality
- Labeled the sentiment of each tweet using TextBlob's built-in sentiment analyzer
- Evaluated and refined the dataset by identifying mislabeled tweets, imbalanced data, and patterns/trends
- Stored the dataset in a PostgreSQL database for easy access and analysis
- Integrated the pipeline with Apache Airflow to automate the entire process and schedule it to run at regular intervals or in response to specific events
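
The cleaning and labeling steps above could look roughly like the sketch below. It is not the project's actual code: the function names `clean_tweet` and `label_from_polarity`, the regular expressions, and the 0.05 neutral threshold are all assumptions made for illustration. The polarity value is assumed to come from TextBlob's `TextBlob(text).sentiment.polarity`, which returns a float between -1.0 and 1.0.

```python
import re

# Hypothetical patterns for the "removing irrelevant information" step.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s']")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_TEXT_RE.sub(" ", text)  # drop punctuation, emoji, other symbols
    return WHITESPACE_RE.sub(" ", text).strip()


def label_from_polarity(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    The threshold is an assumed value, not one taken from the project.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    raw = "Loving the new release!! @dev_team https://t.co/abc #MachineLearning"
    print(clean_tweet(raw))
    # With TextBlob installed, the score would be obtained along the lines of:
    #   polarity = TextBlob(clean_tweet(raw)).sentiment.polarity
    print(label_from_polarity(0.5))
```

Keeping a neutral band around zero is one way to avoid forcing weakly scored tweets into a positive or negative class, which may help with the mislabeling and imbalance checks mentioned above.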