Releases: tylersupersad/star-wars-youtube-comments-pipeline
Sprint 02 Update
- Collected relevant tweets using the Python library Twint
- Cleaned and preprocessed the data by removing irrelevant information, standardizing text, and reducing dimensionality
- Labeled the sentiment of each tweet using the pre-trained sentiment analysis model in TextBlob
- Evaluated and refined the dataset by identifying mislabeled tweets, class imbalance, and patterns/trends
- Stored the dataset in a PostgreSQL database for easy access and analysis
- Integrated the pipeline with Apache Airflow to automate the entire process and run it on a schedule or in response to specific events
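
The cleaning and labeling steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`clean_text`, `label_from_polarity`, `label_tweet`), the specific cleaning rules, and the `0.05` neutral threshold are all assumptions made for this example. The only TextBlob call used is `TextBlob(text).sentiment.polarity`, which returns a float between -1.0 and 1.0.

```python
import re

# Hypothetical cleaning rules; the real pipeline may remove different things.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_text(text):
    """Strip URLs, @mentions and '#' signs, collapse whitespace, lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def label_from_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to a label.

    The threshold is an assumed value for illustration, not one taken
    from the project.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def label_tweet(text):
    """Clean a tweet and label it with TextBlob's polarity score."""
    # Imported here so the helpers above work without TextBlob installed.
    from textblob import TextBlob

    polarity = TextBlob(clean_text(text)).sentiment.polarity
    return label_from_polarity(polarity)
```

With TextBlob installed (`pip install textblob`), `label_tweet("some tweet text")` returns one of `"positive"`, `"negative"`, or `"neutral"`. Keeping the threshold mapping separate from the TextBlob call makes it easy to adjust the neutral band when reviewing mislabeled tweets.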
Sprint 01 Update
Sprint 01 aimed to ensure that our team's project workflow was set up.
The following requirements were fulfilled:
- GitHub project for the coursework set up.
- Product backlog created.
- Initial tasks are defined as user stories.
- Kanban/project board being used.
- Sprint boards are being used.
- Initial Docker files for the project set up and working.
- Correct branches for the GitFlow workflow created, including master, develop, and release branches.
- The first release was created on GitHub.
- Code of Conduct defined.