Skip to content

anoushka1405/Song-Popularity-Prediction

Repository files navigation

🎡 Song Popularity Predictor

An end-to-end machine learning project that predicts song popularity using Spotify audio features

Python Streamlit License Status


🎯 Overview

This project analyzes 114,000 songs from Spotify to predict whether a song will be popular based on its audio features. Using machine learning algorithms, the system achieves 75% ROC-AUC score and provides an interactive web application for real-time predictions.

Key Highlights

  • Comprehensive EDA with 15+ visualizations
  • 3 ML Models trained and compared
  • Interactive Web App built with Streamlit
  • Feature Engineering with 6 custom features
  • 90% Recall for popular songs
  • Deployment Ready for Streamlit Cloud

Features

Interactive Web Application

  • Data Explorer: Visualize distributions, correlations, and genre analysis
  • Model Training: Train and compare multiple ML models
  • Live Predictions: Real-time popularity predictions with confidence scores
  • Model Insights: Feature importance and correlation analysis
  • Professional UI: Clean, intuitive interface with Plotly visualizations

Machine Learning Pipeline

  • Data preprocessing and cleaning
  • Feature engineering (6 custom features)
  • Class imbalance handling
  • Model training with cross-validation
  • Threshold tuning for optimal recall
  • Comprehensive evaluation metrics

Analysis & Visualization

  • Popularity distribution analysis
  • Feature correlation heatmaps
  • Genre performance comparison
  • ROC curves and confusion matrices
  • Feature importance rankings

Demo

Home Page

Overview with dataset statistics and popularity distribution

Data Explorer

Interactive visualizations showing feature distributions and correlations

Model Training

Train multiple models and compare performance in real-time

Make Predictions

Input song features and get instant predictions with confidence scores


Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Quick Start

  1. Clone the repository
git clone https://github.com/your-username/song-popularity-predictor.git
cd song-popularity-predictor
  1. Install dependencies
pip install -r requirements.txt
  1. Update dataset path (in app.py line 234)
data = load_data("path/to/your/dataset.xlsx")
  1. Run the application
streamlit run app.py
  1. Open in browser Navigate to http://localhost:8501

πŸ€– Models & Performance

Models Implemented

Model Accuracy ROC-AUC F1-Score Training Time
XGBoost ⭐ 0.68 0.75 0.73 ~60s
Random Forest 0.68 0.74 0.69 ~45s
Gradient Boosting 0.67 0.73 0.68 ~50s
Logistic Regression 0.65 0.71 0.66 ~5s

⭐ Best Model: XGBoost with threshold tuning (0.404)

Key Metrics

  • ROC-AUC Score: 0.75 (Good discriminative ability)
  • Recall (Popular Songs): 90% (Excellent detection rate)
  • Precision: 61% (Acceptable trade-off)
  • F1-Score: 0.73 (Balanced performance)

Technologies

Core Technologies

Python Streamlit Pandas Scikit-learn Plotly

Libraries Used

  • Data Processing: Pandas, NumPy
  • Machine Learning: Scikit-learn, XGBoost
  • Visualization: Matplotlib, Seaborn, Plotly
  • Web Framework: Streamlit
  • Model Interpretation: SHAP

Results

Top 5 Most Important Features

  1. Genre - Most influential predictor
  2. Energy - Higher energy β†’ more popular
  3. Danceability - Danceable songs perform better
  4. Loudness - Louder tracks correlate with popularity
  5. Energy Γ— Danceability - Engineered feature (interaction effect)

Key Insights

  • 🎸 Genre Matters: Pop, hip-hop, and electronic dominate
  • ⚑ Energy Wins: High-energy songs are 2x more likely to be popular
  • πŸ’ƒ Make It Danceable: Danceability has strong positive correlation
  • πŸ”Š Turn It Up: Louder songs (>-5dB) perform better
  • ⏱️ Sweet Spot: 3-4 minute songs are optimal
  • 🎹 Less Acoustic: Electronic production outperforms acoustic

Business Impact

  • Helps streaming platforms improve recommendations
  • Guides artists on song characteristics for success
  • Assists record labels in identifying potential hits
  • Optimizes playlist curation for engagement

Future Improvements

  • Incorporate temporal features (release date, season)
  • Add artist popularity aggregation
  • Deploy as REST API with FastAPI
  • Add deep learning models (Neural Networks)
  • Incorporate social media engagement data
  • Build recommendation system

Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Areas for Contribution

  • Bug fixes and improvements
  • Additional visualizations
  • New machine learning models
  • Documentation enhancements
  • UI/UX improvements

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Spotify for providing the audio features API
  • Scikit-learn community for excellent ML tools
  • Streamlit team for the amazing web framework
  • All contributors and supporters

Made with Passion, by Anoushka

A portfolio project demonstrating end-to-end ML capabilities

⬆ Back to Top

About

This project analyzes 114,000 songs from Spotify to predict whether a song will be popular based on its audio features. Using machine learning algorithms, the system achieves 75% ROC-AUC score and provides an interactive web application for real-time predictions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors