Framework for automatic instrument classification using the NSynth dataset, augmented with audio effects. The repository contains code for creating TFRecords files with spectrograms and labels from the NSynth dataset, augmenting the dataset with VST plugins, and training and evaluating an instrument classification model on the augmented datasets.
We use the NSynth dataset, which is available from Google's Magenta project. For the data augmentation, the following plugins were used:
- TAL-Bitcrusher
- TAL-Tube
- TAL-Reverb-4
- TAL-Chorus-LX
- TAL-Dub-2
- TAL-Flanger
- Camel Audio's CamelCrusher
- Shattered Glass Audio's Ace
- OrilRiver
- Blue Cat's Chorus
- ++delay
- Blue Cat's Flanger
To apply the audio effects to WAV files, MrsWatson should be installed.
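As a rough illustration of this step, an effect can be applied to a directory of WAV files by shelling out to MrsWatson, which accepts --input, --output, and --plugin flags. The directory layout and plugin choice below are placeholders; batch-effect-processing.py should be adapted as described next.

```python
# Minimal sketch: apply one VST effect to a directory of NSynth WAV files
# by invoking MrsWatson once per file. Paths and plugin name are illustrative.
import pathlib
import subprocess

INPUT_DIR = pathlib.Path("nsynth-train/audio")    # assumed dataset layout
OUTPUT_DIR = pathlib.Path("nsynth-train/reverb")  # assumed output location
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

for wav in INPUT_DIR.glob("*.wav"):
    subprocess.run([
        "mrswatson",                    # assumes MrsWatson is on the PATH
        "--input", str(wav),
        "--plugin", "TAL-Reverb-4",     # any installed plugin name works here
        "--output", str(OUTPUT_DIR / wav.name),
    ], check=True)
```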
After the NSynth dataset has been downloaded and the effects and MrsWatson have been installed, data-processing/batch-effect-processing.py should be adapted to run on the desired directories. When this process finishes, the paths in data-processing/feature_extraction.py should be modified to point to the correct directories; running that script then creates a TFRecord dataset file for each audio effect.
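For reference, the kind of serialization feature_extraction.py performs might look like the sketch below. The feature keys, spectrogram parameters, label value, and file names are assumptions for illustration, not the script's actual values.

```python
# Hypothetical spectrogram -> TFRecord serialization for one NSynth note.
import librosa
import numpy as np
import tensorflow as tf

def wav_to_example(wav_path, label):
    # NSynth notes are 4-second, 16 kHz mono recordings.
    audio, sr = librosa.load(wav_path, sr=16000)
    # Log-scaled mel spectrogram as the model input (parameters assumed).
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128)
    log_mel = librosa.power_to_db(mel).astype(np.float32)
    return tf.train.Example(features=tf.train.Features(feature={
        "spectrogram": tf.train.Feature(
            float_list=tf.train.FloatList(value=log_mel.flatten())),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }))

with tf.io.TFRecordWriter("reverb_train.tfrecord") as writer:
    example = wav_to_example("guitar_acoustic_000-060-100.wav", label=3)
    writer.write(example.SerializeToString())
```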
To train the model, main.py should be run with the appropriate flags: the paths to the training and validation sets must be set, and the "effect" flag should be set to the desired effect (bitcrusher, chorus, delay, flanger, reverb, tube, pitch_shifting, or none if no effect is desired).
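Only the "effect" flag is named in this README, so the sketch below shows how such a flag interface could be defined; the dataset-path flag names are assumptions, and main.py's actual parsing may differ.

```python
# Illustrative flag definitions only; the list of effects follows this README.
import argparse

EFFECTS = ["bitcrusher", "chorus", "delay", "flanger",
           "reverb", "tube", "pitch_shifting", "none"]

parser = argparse.ArgumentParser()
parser.add_argument("--train_path", help="Training TFRecord file (assumed flag name).")
parser.add_argument("--valid_path", help="Validation TFRecord file (assumed flag name).")
parser.add_argument("--effect", choices=EFFECTS, default="none",
                    help="Augmented dataset to train on; none uses the clean audio.")
args = parser.parse_args()
```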
Each model can be evaluated on a dataset by running predict.py with the appropriate flags.
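When evaluating, the records written earlier have to be parsed back with a matching feature spec. A sketch, reusing the illustrative keys and shape from the feature-extraction example above (predict.py's actual input pipeline may differ):

```python
# Sketch: read a TFRecord dataset back for evaluation. The 128 x 126 shape
# assumes the earlier example's parameters (4 s at 16 kHz, hop length 512).
import tensorflow as tf

FEATURE_SPEC = {
    "spectrogram": tf.io.FixedLenFeature([128 * 126], tf.float32),
    "label": tf.io.FixedLenFeature([1], tf.int64),
}

def parse(record):
    parsed = tf.io.parse_single_example(record, FEATURE_SPEC)
    spectrogram = tf.reshape(parsed["spectrogram"], [128, 126])
    return spectrogram, parsed["label"]

dataset = tf.data.TFRecordDataset("reverb_test.tfrecord").map(parse).batch(32)
```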