An end-to-end machine learning project that predicts song popularity using Spotify audio features
This project analyzes 114,000 Spotify tracks to predict whether a song will be popular from its audio features. The best model (XGBoost) reaches a ROC-AUC of 0.75, and an interactive web application serves real-time predictions.
- Comprehensive EDA with 15+ visualizations
- 3 ML Models trained and compared
- Interactive Web App built with Streamlit
- Feature Engineering with 6 custom features
- 90% Recall for popular songs
- Deployment Ready for Streamlit Cloud
- Data Explorer: Visualize distributions, correlations, and genre analysis
- Model Training: Train and compare multiple ML models
- Live Predictions: Real-time popularity predictions with confidence scores
- Model Insights: Feature importance and correlation analysis
- Professional UI: Clean, intuitive interface with Plotly visualizations
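A minimal sketch of how a live prediction could turn the model's predicted probability into a label and a confidence score. It assumes the "confidence" shown is simply the predicted probability of the chosen class. The helper `label_with_confidence` and the label strings are hypothetical; the 0.404 default is the tuned XGBoost threshold reported in the results.

```python
from typing import Tuple

# Hypothetical default: the tuned decision threshold reported for XGBoost.
TUNED_THRESHOLD = 0.404

def label_with_confidence(p_popular: float, threshold: float = TUNED_THRESHOLD) -> Tuple[str, float]:
    """Map P(popular) to a label plus the probability of that label."""
    if not 0.0 <= p_popular <= 1.0:
        raise ValueError("p_popular must be a probability in [0, 1]")
    if p_popular >= threshold:
        return "Popular", p_popular
    # Below the threshold, report how likely the song is NOT popular.
    return "Not popular", 1.0 - p_popular

print(label_with_confidence(0.5))   # ('Popular', 0.5)
print(label_with_confidence(0.25))  # ('Not popular', 0.75)
```

Because the threshold sits below 0.5, a song can be labelled popular even when the model gives it less than even odds; that is the trade-off that buys the high recall.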
- Data preprocessing and cleaning
- Feature engineering (6 custom features)
- Class imbalance handling
- Model training with cross-validation
- Threshold tuning for optimal recall
- Comprehensive evaluation metrics
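One way "threshold tuning for optimal recall" can work: sweep candidate thresholds over validation-set probabilities and keep the highest threshold whose recall for the popular class still meets a target. This is a sketch under assumptions; the project's exact tuning procedure and validation split are not documented here, and `recall_at` / `pick_threshold` are hypothetical helper names.

```python
from typing import Sequence

def recall_at(threshold: float, y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Recall of the popular class (label 1) when predicting 1 for prob >= threshold."""
    positives = sum(1 for y in y_true if y == 1)
    if positives == 0:
        raise ValueError("need at least one positive (popular) example")
    hits = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    return hits / positives

def pick_threshold(y_true: Sequence[int], y_prob: Sequence[float], target_recall: float = 0.90) -> float:
    """Highest candidate threshold whose recall still meets target_recall."""
    # Recall can only grow as the threshold drops, so scan candidates from
    # highest to lowest and stop at the first one that meets the target.
    for t in sorted(set(y_prob), reverse=True):
        if recall_at(t, y_true, y_prob) >= target_recall:
            return t
    raise ValueError("no threshold reaches the target recall")

# Toy validation data (illustrative only): 5 popular songs, 5 non-popular.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.45, 0.5, 0.3, 0.2, 0.1, 0.35, 0.4]
print(pick_threshold(y_true, y_prob, target_recall=0.8))  # 0.45
print(pick_threshold(y_true, y_prob, target_recall=0.9))  # 0.35
```

Pushing the threshold down to hit a recall target usually costs precision, which matches the 90% recall / 61% precision trade-off reported for the best model.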
- Popularity distribution analysis
- Feature correlation heatmaps
- Genre performance comparison
- ROC curves and confusion matrices
- Feature importance rankings
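The ROC curves above are usually summarized by ROC-AUC, which equals the probability that a randomly chosen popular song gets a higher score than a randomly chosen non-popular one (ties count as half). The pure-Python sketch below computes it exactly that way by comparing every positive/negative pair. It is O(n·m) and only meant to illustrate the metric; in practice `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.

```python
from typing import Sequence

def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5  # a tie counts as half a correct ranking
    return score / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

Read this way, the project's 0.75 means the model ranks a popular song above a non-popular one about three times out of four.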
Overview with dataset statistics and popularity distribution
Interactive visualizations showing feature distributions and correlations
Train multiple models and compare performance in real-time
Input song features and get instant predictions with confidence scores
- Python 3.8 or higher
- pip package manager
- Clone the repository: `git clone https://github.com/your-username/song-popularity-predictor.git`, then `cd song-popularity-predictor`
- Install dependencies: `pip install -r requirements.txt`
- Update the dataset path in `app.py` (line 234): `data = load_data("path/to/your/dataset.xlsx")`
- Run the application: `streamlit run app.py`
- Open http://localhost:8501 in your browser
| Model | Accuracy | ROC-AUC | F1-Score | Training Time |
|---|---|---|---|---|
| XGBoost ⭐ | 0.68 | 0.75 | 0.73 | ~60s |
| Random Forest | 0.68 | 0.74 | 0.69 | ~45s |
| Gradient Boosting | 0.67 | 0.73 | 0.68 | ~50s |
| Logistic Regression | 0.65 | 0.71 | 0.66 | ~5s |
⭐ Best model: XGBoost with a tuned decision threshold of 0.404
- ROC-AUC Score: 0.75 (Good discriminative ability)
- Recall (Popular Songs): 90% (Excellent detection rate)
- Precision: 61% (Acceptable trade-off)
- F1-Score: 0.73 (Balanced performance)
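The F1 figure follows from the other two: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the reported values is a quick consistency check; `f1_from_pr` is an illustrative helper, not part of the project code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported XGBoost figures at the tuned threshold: P = 0.61, R = 0.90.
print(round(f1_from_pr(0.61, 0.90), 2))  # 0.73
```

The result (about 0.727) agrees with the 0.73 F1 in the comparison table.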
- Data Processing: Pandas, NumPy
- Machine Learning: Scikit-learn, XGBoost
- Visualization: Matplotlib, Seaborn, Plotly
- Web Framework: Streamlit
- Model Interpretation: SHAP
- Genre - Most influential predictor
- Energy - Higher energy → more popular
- Danceability - Danceable songs perform better
- Loudness - Louder tracks correlate with popularity
- Energy × Danceability - Engineered feature (interaction effect)
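The Energy × Danceability interaction can be computed as the element-wise product of the two audio features. The sketch below assumes tracks are dictionaries with `energy` and `danceability` keys on Spotify's 0 to 1 scale; the output key `energy_x_danceability` and the function name are hypothetical, and the project's other five engineered features are not specified here.

```python
from typing import Dict, List

def add_energy_danceability(tracks: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the tracks with an energy * danceability interaction term."""
    out = []
    for track in tracks:
        row = dict(track)  # copy so the input records are not mutated
        row["energy_x_danceability"] = row["energy"] * row["danceability"]
        out.append(row)
    return out

rows = add_energy_danceability([{"energy": 0.5, "danceability": 0.5}])
print(rows[0]["energy_x_danceability"])  # 0.25
```

With pandas, the equivalent is a single vectorized expression such as `df["energy"] * df["danceability"]`. The product is only high when a track is both energetic and danceable, which is the combined effect a linear model cannot capture from the two features separately.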
- Genre Matters: Pop, hip-hop, and electronic dominate
- Energy Wins: High-energy songs are 2x more likely to be popular
- Make It Danceable: Danceability has a strong positive correlation with popularity
- Turn It Up: Louder songs (above -5 dB) perform better
- Sweet Spot: 3-4 minute songs are optimal
- Less Acoustic: Electronic production outperforms acoustic
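Findings like the 3-4 minute "sweet spot" come from comparing the share of popular songs across feature buckets. A minimal sketch, assuming each track has a `duration_ms` field and a binary `popular` label (1 = popular); the field names, bucket edges, and helper names are illustrative assumptions, not the project's code.

```python
from collections import defaultdict
from typing import Dict, List

def duration_bucket(duration_ms: int) -> str:
    """Assign a track to an illustrative duration bucket."""
    minutes = duration_ms / 60_000
    if minutes < 3:
        return "<3 min"
    if minutes <= 4:
        return "3-4 min"
    return ">4 min"

def popularity_rate_by_bucket(tracks: List[dict]) -> Dict[str, float]:
    """Share of tracks labelled popular (1) within each duration bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [popular count, total count]
    for t in tracks:
        bucket = duration_bucket(t["duration_ms"])
        counts[bucket][0] += t["popular"]
        counts[bucket][1] += 1
    return {b: pop / total for b, (pop, total) in counts.items()}

sample = [
    {"duration_ms": 150_000, "popular": 0},  # 2.5 min
    {"duration_ms": 200_000, "popular": 1},  # ~3.3 min
    {"duration_ms": 210_000, "popular": 1},  # 3.5 min
    {"duration_ms": 300_000, "popular": 0},  # 5 min
]
print(popularity_rate_by_bucket(sample))
# {'<3 min': 0.0, '3-4 min': 1.0, '>4 min': 0.0}
```

The same pattern applies to loudness or energy bins. A comparison like this shows association in the dataset; it does not by itself show that changing a song's length would change its popularity.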
- Helps streaming platforms improve recommendations
- Guides artists on song characteristics for success
- Assists record labels in identifying potential hits
- Optimizes playlist curation for engagement
- Incorporate temporal features (release date, season)
- Add artist popularity aggregation
- Deploy as REST API with FastAPI
- Add deep learning models (Neural Networks)
- Incorporate social media engagement data
- Build recommendation system
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Bug fixes and improvements
- Additional visualizations
- New machine learning models
- Documentation enhancements
- UI/UX improvements
This project is licensed under the MIT License - see the LICENSE file for details.
- Spotify for providing the audio features API
- Scikit-learn community for excellent ML tools
- Streamlit team for the amazing web framework
- All contributors and supporters
Made with passion by Anoushka
A portfolio project demonstrating end-to-end ML capabilities