Skip to content

ShwethaSunkara1/Spotify-Datapipeline-using-Snowflake

Repository files navigation

🎵 Spotify Data Pipeline: AWS & Snowflake Integration ☁️❄️

🎯 Overview

This project showcases an end-to-end data pipeline that extracts, processes, and analyzes Spotify music data using Python, AWS, and Snowflake. The goal is to automate data ingestion, transformation, and storage, enabling real-time analytics.


🚀 Technologies Used

🐍 Python (Spotipy Library)

  • Fetches music data from the Spotify API.
  • Formats and structures the extracted data for further processing.

☁️ Amazon Web Services (AWS)

  • AWS Lambda: Automates the ETL process.
  • Amazon S3: Serves as an intermediary storage layer for raw and processed data.

🌨️ Snowflake (Cloud Data Warehouse)

  • Stores structured data for analysis and reporting.
  • Snowpipe: Enables real-time data ingestion for continuous updates.

📊 Power BI (Optional)

  • Creates visual dashboards to analyze music trends.

🛠️ Pipeline Architecture

1️⃣ Data Collection

  • Spotipy (Python Library) pulls data from Spotify's API.
  • Extracted JSON files are stored in AWS S3 (Raw Layer).

2️⃣ Data Processing

  • AWS Lambda processes and transforms JSON data into structured tables (Songs, Albums, Artists).

3️⃣ Data Storage & Loading

  • Processed data is uploaded to AWS S3 (Transformed Layer).
  • Snowpipe automatically ingests structured data into Snowflake.

4️⃣ Analysis & Insights

  • Data in Snowflake is used for querying & insights.
  • (Optional) Power BI dashboards visualize trends and user preferences.

🔥 Key Learnings & Achievements

✅ Built an end-to-end data pipeline integrating AWS and Snowflake. ✅ Gained hands-on experience with automated data ingestion using Snowpipe. ✅ Strengthened proficiency in Python, AWS, and Snowflake. ✅ Laid the foundation for real-time analytics & future enhancements.


🚀 Future Improvements

🔹 Implement real-time streaming using Kafka or AWS Kinesis. 🔹 Extend features to analyze user listening behavior & engagement. 🔹 Integrate AI-driven recommendations for personalized insights.


📂 Project Structure

📦 Spotify-Data-Pipeline
├── 📂 scripts
│   ├── extract.py          # Extracts data from Spotify API
│   ├── transform.py        # Cleans and structures raw data
│   ├── load.py             # Loads transformed data into Snowflake
│   ├── aws_lambda.py       # AWS Lambda function for ETL automation
│   └── config.py           # Stores API credentials and configurations
├── 📂 data
│   ├── raw                 # Raw JSON data from Spotify API
│   ├── processed           # Transformed and structured data
├── 📂 dashboards
│   ├── spotify_dashboard.pbix # Power BI visualization file (if applicable)
├── README.md               # Project documentation
├── requirements.txt        # Python dependencies
└── .gitignore              # Excludes unnecessary files

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published