Music genre classification with SENets

Music genre classification project as part of the Numerical Analysis for Machine Learning course at Politecnico di Milano, A.Y. 2022-2023.

Dataset

The data used for this project is available as a Kaggle dataset. It can be downloaded into the res directory directly from the terminal using Kaggle's CLI tool:

kaggle datasets download achgls/gtzan-music-genre -p ./res --unzip
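Once downloaded, individual tracks can be loaded and converted into spectrograms with torchaudio. The snippet below is only a sketch: the file path reflects the usual GTZAN folder layout and may differ from how the Kaggle archive actually unpacks under ./res.

import torchaudio

# Illustrative path: adjust to the actual layout of the unzipped archive under ./res.
waveform, sample_rate = torchaudio.load("res/genres_original/blues/blues.00000.wav")
print(waveform.shape, sample_rate)  # GTZAN tracks are mono, roughly 30 s at 22050 Hz

# Mel spectrogram, a common input representation for CNN-based genre classifiers.
mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=128)
mel_spectrogram = mel_transform(waveform)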

The GTZAN dataset was originally introduced in the following paper by George Tzanetakis (hence the name) in 2002 as part of his Ph.D. thesis work.

@ARTICLE{1021072,
  author={Tzanetakis, G. and Cook, P.},
  journal={IEEE Transactions on Speech and Audio Processing}, 
  title={Musical genre classification of audio signals}, 
  year={2002},
  volume={10},
  number={5},
  pages={293-302},
  doi={10.1109/TSA.2002.800560}}

Reproducibility

To allow for a proper, uncontaminated assessment of each parameter's impact on training, the training script in this repo allows experiments to be fully reproduced. When modifying a certain parameter to evaluate its impact on training, you can thus be certain that all other parameters remain stable. When a seed is given as an argument to the script, the model is prevented from using any non-deterministic operations, and the seed drives the pseudo-random number generators used for model weight initialization as well as data sampling. You might get a RuntimeError from the NVIDIA cuBLAS backend when trying to run reproducible experiments; in that case, set the following environment variable:

export CUBLAS_WORKSPACE_CONFIG=:4096:8
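As a rough sketch of what seeded, deterministic training looks like in PyTorch (not the repo's actual training script, whose argument names may differ), the pieces mentioned above typically come together like this:

import os
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Seed every PRNG that can affect weight initialization or data sampling.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Forbid non-deterministic CUDA kernels; raises if no deterministic variant exists.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
    # cuBLAS needs this to behave deterministically on CUDA >= 10.2.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

A seeded torch.Generator can additionally be passed to the DataLoader so that shuffling is reproducible across runs.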

Requirements

Libraries used in this project are listed in requirements.txt and can be installed at once with:

pip install -r requirements.txt

In addition to those, you need a torchaudio-compatible audio backend installed: soundfile on Windows machines (pip install soundfile) and sox_io on Unix systems (pip install sox). More information on backends is available in the PyTorch audio backends documentation.
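To check which backend torchaudio actually picked up (the exact output depends on your torchaudio version and platform), a quick sanity check is:

import torchaudio

# An empty list means no audio backend is installed and torchaudio.load will fail.
print(torchaudio.list_audio_backends())  # e.g. ['soundfile'] or ['sox_io']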

References

Paper suggested as a guideline for the project:

[1] Xu, Yijie and Zhou, Wuneng, 2020. A deep music genres classification model based on CNN with Squeeze & Excitation Block. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 332-338.

@INPROCEEDINGS{9306374,
  author={Xu, Yijie and Zhou, Wuneng},
  booktitle={2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)}, 
  title={A deep music genres classification model based on CNN with Squeeze & Excitation Block}, 
  year={2020}
}

[2] torchaudio: an audio library for PyTorch, 2021.

@article{yang2021torchaudio,
  title={TorchAudio: Building Blocks for Audio and Speech Processing},
  author={Yao-Yuan Yang and Moto Hira and Zhaoheng Ni and Anjali Chourdia and Artyom Astafurov and Caroline Chen and Ching-Feng Yeh and Christian Puhrsch and David Pollack and Dmitriy Genzel and Donny Greenberg and Edward Z. Yang and Jason Lian and Jay Mahadeokar and Jeff Hwang and Ji Chen and Peter Goldsborough and Prabhat Roy and Sean Narenthiran and Shinji Watanabe and Soumith Chintala and Vincent Quenneville-Bélair and Yangyang Shi},
  journal={arXiv preprint arXiv:2110.15018},
  year={2021}
}
