This project showcases an end-to-end data pipeline that extracts, processes, and analyzes Spotify music data using Python, AWS, and Snowflake. The goal is to automate data ingestion, transformation, and storage, enabling real-time analytics.
- Fetches music data from the Spotify API.
- Formats and structures the extracted data for further processing.
- AWS Lambda: Automates the ETL process.
- Amazon S3: Serves as an intermediary storage layer for raw and processed data.
- Stores structured data for analysis and reporting.
- Snowpipe: Enables real-time data ingestion for continuous updates.
- Creates visual dashboards to analyze music trends.
- Spotipy (Python Library) pulls data from Spotify's API.
- Extracted JSON files are stored in AWS S3 (Raw Layer).
- AWS Lambda processes and transforms JSON data into structured tables (Songs, Albums, Artists).
- Processed data is uploaded to AWS S3 (Transformed Layer).
- Snowpipe automatically ingests structured data into Snowflake.
- Data in Snowflake is used for querying & insights.
- (Optional) Power BI dashboards visualize trends and user preferences.
✅ Built an end-to-end data pipeline integrating AWS and Snowflake. ✅ Gained hands-on experience with automated data ingestion using Snowpipe. ✅ Strengthened proficiency in Python, AWS, and Snowflake. ✅ Laid the foundation for real-time analytics & future enhancements.
🔹 Implement real-time streaming using Kafka or AWS Kinesis. 🔹 Extend features to analyze user listening behavior & engagement. 🔹 Integrate AI-driven recommendations for personalized insights.
📦 Spotify-Data-Pipeline
├── 📂 scripts
│ ├── extract.py # Extracts data from Spotify API
│ ├── transform.py # Cleans and structures raw data
│ ├── load.py # Loads transformed data into Snowflake
│ ├── aws_lambda.py # AWS Lambda function for ETL automation
│ └── config.py # Stores API credentials and configurations
├── 📂 data
│ ├── raw # Raw JSON data from Spotify API
│ ├── processed # Transformed and structured data
├── 📂 dashboards
│ ├── spotify_dashboard.pbix # Power BI visualization file (if applicable)
├── README.md # Project documentation
├── requirements.txt # Python dependencies
└── .gitignore # Excludes unnecessary files