Music genre classification project as part of the Numerical Analysis for Machine Learning course at Politecnico di Milano, A.Y. 2022-2023.
The data used for this project is available as a Kaggle dataset (achgls/gtzan-music-genre). It can be downloaded into the res directory directly from the terminal using Kaggle's CLI tool:
kaggle datasets download achgls/gtzan-music-genre -p ./res --unzip
The GTZAN dataset was originally introduced by George Tzanetakis (hence the name) in the following 2002 paper, as part of his Ph.D. thesis work.
@ARTICLE{1021072,
author={Tzanetakis, G. and Cook, P.},
journal={IEEE Transactions on Speech and Audio Processing},
title={Musical genre classification of audio signals},
year={2002},
volume={10},
number={5},
pages={293-302},
doi={10.1109/TSA.2002.800560}}
In order to allow for proper, uncontaminated assessment of each parameter's impact
on training, the training script in this repo allows for full reproducibility of experiments.
When modifying a given parameter to evaluate its impact on training, you can thus be certain
that all other parameters remain fixed.
When a seed is given as an argument to the script, the model is prevented from using any
non-deterministic operations, and the seed is used for the pseudo-random
number generators that drive model weight initialization as well as data sampling.
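The effect of seeding can be illustrated with a minimal sketch. The snippet below uses only the Python standard library; the training script applies the same idea through PyTorch calls such as torch.manual_seed and torch.use_deterministic_algorithms(True):

```python
import random

def sample_batch(seed, population, k=4):
    """Draw a 'batch' from a dedicated, seeded pseudo-random generator."""
    rng = random.Random(seed)  # independent of global random state
    return rng.sample(population, k)

data = list(range(100))

# Same seed -> identical sampling order on every run, on every machine.
assert sample_batch(42, data) == sample_batch(42, data)

# A different seed gives a different, but equally reproducible, ordering.
print(sample_batch(42, data))
print(sample_batch(7, data))
```

Because each experiment draws from its own seeded generator, changing one hyperparameter cannot silently perturb the sampling order of the others.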
You might get a RuntimeError from the NVIDIA cuBLAS backend when trying to run
reproducible experiments; in that case you will have to set an environment variable as follows:
export CUBLAS_WORKSPACE_CONFIG=:4096:8
Libraries used in this project are listed in requirements.txt and can be installed at once with:
pip install -r requirements.txt
In addition to those,
you need a torchaudio-compatible audio backend installed. This would be soundfile
on Windows machines (pip install soundfile) and sox_io
on Unix systems (pip install sox). More info on backends is available
in the PyTorch audio backends documentation.
Paper suggested as a guideline for the project:
[1] Xu, Yijie and Zhou, Wuneng, 2020. A deep music genres classification model based on CNN with Squeeze & Excitation Block. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 332-338.
@INPROCEEDINGS{9306374,
author={Xu, Yijie and Zhou, Wuneng},
booktitle={2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
title={A deep music genres classification model based on CNN with Squeeze \& Excitation Block},
year={2020},
pages={332-338}
}
[2] torchaudio: an audio library for PyTorch, 2021.
@article{yang2021torchaudio,
title={TorchAudio: Building Blocks for Audio and Speech Processing},
author={Yao-Yuan Yang and Moto Hira and Zhaoheng Ni and Anjali Chourdia and Artyom Astafurov and Caroline Chen and Ching-Feng Yeh and Christian Puhrsch and David Pollack and Dmitriy Genzel and Donny Greenberg and Edward Z. Yang and Jason Lian and Jay Mahadeokar and Jeff Hwang and Ji Chen and Peter Goldsborough and Prabhat Roy and Sean Narenthiran and Shinji Watanabe and Soumith Chintala and Vincent Quenneville-Bélair and Yangyang Shi},
journal={arXiv preprint arXiv:2110.15018},
year={2021}
}