This project aims to develop a music genre classifier using Machine Learning algorithms with Scikit-Learn. The classifier is trained on a dataset of precomputed audio features labeled with their genre, and uses a Support Vector Machine (SVM) with a linear kernel to predict the genre of new audio files based on their features.
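As a rough illustration of the approach, the sketch below fits a linear-kernel SVM to feature vectors and predicts genre labels for unseen rows. The data is synthetic: `make_classification` stands in for the real 240-feature dataset, and feature standardization with `StandardScaler` is an assumed preprocessing step, not necessarily the one used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset: 500 songs, 240 features, 10 genre classes.
X, y = make_classification(n_samples=500, n_features=240, n_informative=30,
                           n_classes=10, random_state=0)

# Assumed preprocessing: standardize each feature, then fit a linear-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X[:400], y[:400])

# Predict the genre (as a class index) of "new" songs from their feature vectors.
predictions = clf.predict(X[400:])
print(predictions[:10])
```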
The dataset used for this project consists of audio features of songs from 10 different genres: Blues, Classical, Country, Disco, Hip-hop, Jazz, Metal, Pop, Reggae, and Rock. The features are provided in an ARFF file, where each audio file is described by 240 audio features and labeled with its corresponding genre.
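One way to read an ARFF file in Python is `scipy.io.arff.loadarff`, followed by conversion to a pandas DataFrame. The sketch below parses a tiny in-memory ARFF snippet; the attribute names, the three-feature layout, and the genre label spellings are hypothetical placeholders, since the actual file has 240 features whose names are not given here.

```python
from io import StringIO

import pandas as pd
from scipy.io import arff

# Hypothetical miniature ARFF content: 3 numeric features instead of 240.
ARFF_TEXT = """@relation genres
@attribute mfcc_mean_1 numeric
@attribute mfcc_mean_2 numeric
@attribute spectral_centroid numeric
@attribute genre {blues,classical,country,disco,hiphop,jazz,metal,pop,reggae,rock}
@data
0.12,-3.4,1520.0,blues
0.50,-1.1,2310.5,metal
0.07,-4.2,980.3,classical
"""

data, meta = arff.loadarff(StringIO(ARFF_TEXT))
df = pd.DataFrame(data)

# Nominal attributes are returned as bytes, so decode them to str labels.
df["genre"] = df["genre"].str.decode("utf-8")

# Split into a numeric feature matrix and a label vector.
X = df.drop(columns="genre").to_numpy()
y = df["genre"].to_numpy()
print(X.shape, list(y))
```

With the real file, `StringIO(ARFF_TEXT)` would be replaced by the path to the dataset.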
To run this project, you need Python 3 and several libraries, including scikit-learn, pandas, NumPy, Matplotlib, Seaborn, and SciPy. You can install them with:
pip install scikit-learn pandas numpy matplotlib seaborn scipy
To use this classifier, follow these steps:
- Download or clone the GitHub repository.
- Open the Jupyter notebook GenreClassificationML.ipynb and run all the cells.
- The notebook will load the dataset, preprocess it, train the SVM classifier, and evaluate its performance using cross-validation and a confusion matrix. It will also show the five features with the largest weights in the classifier.
- Optionally, modify the parameters of the SVM classifier, such as the kernel or the regularization parameter C, to see whether its performance improves.
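Rather than changing the kernel and regularization parameter by hand, scikit-learn's `GridSearchCV` can compare several combinations with cross-validation. The sketch below is not taken from the notebook: it uses synthetic data from `make_classification`, an assumed `StandardScaler` preprocessing step, and an arbitrary illustrative grid of kernels and `C` values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 240-feature, 10-genre dataset.
X, y = make_classification(n_samples=500, n_features=240, n_informative=30,
                           n_classes=10, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Illustrative candidate kernels and regularization strengths.
param_grid = {
    "svm__kernel": ["linear", "rbf"],
    "svm__C": [0.1, 1.0, 10.0],
}

# Each combination is scored by 5-fold cross-validated accuracy.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Wrapping the scaler and the SVM in one pipeline means the scaler is refit on each training fold, so the held-out fold does not leak into preprocessing.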
The confusion matrix for the classification is the following:
The classification is not perfect, but it achieves an average accuracy of about 82.5% using 10-fold cross-validation. Some misclassifications occur because certain genres sound similar (e.g. country/blues, classical/jazz, rock/country).
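A 10-fold cross-validated accuracy and a confusion matrix can be computed with `cross_val_score`, `cross_val_predict`, and `confusion_matrix`. The following is a minimal sketch on synthetic data, so its numbers will not match the 82.5% reported for the real dataset; the shuffled `StratifiedKFold` splitter and the `StandardScaler` step are assumptions rather than the notebook's confirmed settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 240-feature, 10-genre dataset.
X, y = make_classification(n_samples=500, n_features=240, n_informative=30,
                           n_classes=10, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# 10 folds that keep the class proportions roughly equal in each fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Accuracy on each held-out fold, then averaged.
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Out-of-fold predictions: each sample is predicted by a model that never saw it.
y_pred = cross_val_predict(clf, X, y, cv=cv)

# Rows are true classes, columns are predicted classes; off-diagonal cells are confusions.
cm = confusion_matrix(y, y_pred)
print(cm)
```

Large off-diagonal counts in a pair of cells point to genre pairs that the classifier tends to confuse, such as the ones listed above.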
The five most important features for the classification are:
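With a linear kernel, each feature has an explicit weight in `SVC.coef_`, so a ranking like this can be derived from weight magnitudes. For 10 classes, `SVC` trains one-vs-one classifiers, giving 45 rows of weights; averaging their absolute values is one possible way to aggregate them, and is an assumption here since the README does not say how the weights are combined. The data is synthetic and the feature names are hypothetical placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 240-feature, 10-genre dataset.
X, y = make_classification(n_samples=500, n_features=240, n_informative=30,
                           n_classes=10, random_state=0)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])  # hypothetical names

# Scale first so that weight magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
svm = SVC(kernel="linear").fit(X_scaled, y)

# coef_ has one row per one-vs-one classifier: 10 * 9 / 2 = 45 rows.
importance = np.abs(svm.coef_).mean(axis=0)

# Indices of the five largest aggregated weights, in descending order.
top5 = np.argsort(importance)[::-1][:5]
for idx in top5:
    print(feature_names[idx], round(float(importance[idx]), 4))
```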
This project was developed as part of the Advanced Topics in SMC course at UPF.