This is our Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence), which focuses on songs from the Song Popularity Dataset from Kaggle. Here is an overview of the source code:
- @s-27-b (Bhargavi) - Data Visualisation, Regression Models
- @BuggerBugs (Qi Yang) - Data Preparation, Classification Models, Results Comparison
- Musical artists sometimes have a limit on the number of songs they can release on an album. Being able to predict which songs will do better can help them decide which ones to include in their album for a HIT!!!
- Our aim is to come up with the best model to predict how popular a song could be (its popularity category) based on its audio features: song duration, acousticness, danceability, energy, instrumentalness, key, liveness, loudness, audio mode, speechiness, tempo, time signature and audio valence.
Originally, song popularity scores range from 0 to 100. We grouped them into 4 categories and want to predict each song's popularity category. Here are the popularity scores and their corresponding popularity categories:
We used 2 approaches to find the best model.
Firstly, we used regression to predict the numerical value of the popularity score, then converted the predicted score into one of the popularity categories (0-3) and checked it against the song's actual popularity category (a rough sketch of this pipeline follows the list of models below).
The models we used for regression are:
- Multi-Variate Linear Regression
- Stepwise Linear Regression
- Support Vector Regression
- K-nearest Neighbours Regression
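Below is a minimal sketch of this first approach, using plain linear regression as a stand-in for any of the models above. The file name, column names and category boundaries are assumptions for illustration (the exact cut-offs are defined in the notebooks), not the project's exact code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# Load the Kaggle dataset (assumed file/column names).
df = pd.read_csv("song_data.csv")
X = df.drop(columns=["song_name", "song_popularity"])  # the 13 audio features
y = df["song_popularity"]                               # raw score, 0-100

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Approach 1: predict the numeric popularity score...
reg = LinearRegression().fit(X_train, y_train)
pred_scores = reg.predict(X_test)

# ...then bin predicted and actual scores into categories 0-3 and compare.
# Placeholder boundaries; the real cut-offs come from the notebooks.
boundaries = [25, 50, 75]
pred_cat = np.digitize(np.clip(pred_scores, 0, 100), boundaries, right=True)
true_cat = np.digitize(y_test, boundaries, right=True)
print("Category accuracy:", accuracy_score(true_cat, pred_cat))
```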
Our second approach was to categorise all the song popularity scores first, then train our classification models on these popularity categories to predict the song popularity category directly (see the sketch after the list of models below).
The models we used for classification are:
- Decision Trees
- Random Forest
- Artificial Neural Networks
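A corresponding sketch of the second approach, shown here with Random Forest; the `popularity_category` binning mirrors the placeholder boundaries above and is an assumption, not the notebooks' exact cut-offs:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_csv("song_data.csv")

# Approach 2: bin the score into categories 0-3 first (placeholder boundaries)...
df["popularity_category"] = pd.cut(
    df["song_popularity"], bins=[-1, 25, 50, 75, 100], labels=[0, 1, 2, 3]
).astype(int)

X = df.drop(columns=["song_name", "song_popularity", "popularity_category"])
y = df["popularity_category"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# ...then train a classifier directly on the categories.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```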
- The best model for predicting song popularity on this dataset is Random Forest (50.6% accuracy)
- Regression models and non-oversampled classification models are not suitable for predicting songs from the lowest and highest popularity categories in this dataset.
- It is difficult to get a model with high accuracy to predict which songs people would like or dislike.
- SMOTE oversampling can reduce the bias of models towards song popularity category 2 (the majority class), but it reduces overall model accuracy for models trained on this dataset (see the sketch after this list).
- A larger and less imbalanced dataset may be needed to reduce bias, as well as the noise from oversampling, and ultimately to train better-performing models for artists to utilise when deciding which songs to include in their albums, or which ones to leave out!
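A rough sketch of the SMOTE step mentioned above, using the imbalanced-learn library and continuing from the train/test split in the classification sketch earlier; only the training split is oversampled so the test set stays untouched:

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

# Synthesise new samples for the minority categories (e.g. 0 and 3)
# so each class is equally represented in the training data.
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
print("Before:", Counter(y_train))
print("After: ", Counter(y_train_res))

# Retrain on the oversampled data; evaluate on the original (untouched) test set.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train_res, y_train_res)
print("Test accuracy:", clf.score(X_test, y_test))
```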
- One hot encoding and the dummy variable trap (see the sketch after this list)
- Different models for regression: Stepwise, KNN, SVR
- SMOTE oversampling technique
- Different models for classification: Random Forest, Neural Networks
- Hyperparameter optimisation with Keras Tuner
- Creating repositories and collaborating on GitHub
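For example, a minimal sketch of one hot encoding the categorical audio features while avoiding the dummy variable trap; the choice of columns here is an assumption, and `drop_first=True` is one illustrative way to drop the redundant dummy column:

```python
import pandas as pd

df = pd.read_csv("song_data.csv")

# 'key' (0-11) and 'time_signature' are categorical, not truly numeric.
# drop_first=True removes one dummy column per feature so the remaining dummies
# are not perfectly collinear (the dummy variable trap) in linear models.
df_encoded = pd.get_dummies(df, columns=["key", "time_signature"], drop_first=True)
print(df_encoded.columns.tolist())
```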
- https://towardsdatascience.com/a-beginner-friendly-explanation-of-how-neural-networks-work-55064db60df4
- https://machinelearningmastery.com/multi-class-imbalanced-classification/
- https://www.youtube.com/watch?v=6Nf1x7qThR8
- https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74
- https://stackoverflow.com/questions/45053238/how-to-get-all-confusion-matrix-terminologies-tpr-fpr-tnr-fnr-for-a-multi-c
- https://www.analyticsvidhya.com/blog/2021/10/implementing-artificial-neural-networkclassification-in-python-from-scratch/
- https://www.youtube.com/watch?v=9yl6-HEY7_s&t=376s
- https://towardsdatascience.com/multi-class-classification-extracting-performance-metrics-from-the-confusion-matrix-b379b427a872
- https://www.analyticsvidhya.com/blog/2020/10/a-comprehensive-guide-to-feature-selection-using-wrapper-methods-in-python/
- https://www.analyticsvidhya.com/blog/2020/03/support-vector-regression-tutorial-for-machine-learning/
- https://www.youtube.com/watch?v=X38yCdQ_cWw
- https://www.analyticsvidhya.com/blog/2018/08/k-nearest-neighbor-introduction-regression-python/