Experiments on musical instrument recognition with the IRMAS dataset and Convolutional Neural Networks

OdysseasKr/irmas-cnn

irmas-cnn

Experiments on the IRMAS dataset using Convolutional Neural Networks.

This repository contains the code and Jupyter Notebooks from my experiments on the IRMAS dataset.

The IRMAS dataset [link] is used for musical instrument recognition in audio tracks. It consists of:

  • The trainset: Contains 3-second tracks of solo instruments, covering 11 instruments.
  • The testset: A collection of multi-instrumental audio tracks. Each track is labeled with at least one of the instruments, which is considered its "dominant" sound.
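Because a testset track can carry more than one instrument label, its target is naturally a multi-hot vector rather than a single class index. A minimal sketch of that encoding follows. The 11 instrument codes are assumed to match the abbreviations used in the IRMAS distribution, and `encode_labels` is a hypothetical helper, not part of this repository.

```python
# Assumed IRMAS instrument codes (11 classes); adjust if your copy differs.
INSTRUMENTS = ["cel", "cla", "flu", "gac", "gel", "org",
               "pia", "sax", "tru", "vio", "voi"]

def encode_labels(labels):
    """Return a multi-hot list with a 1 for every instrument in `labels`."""
    present = set(labels)
    unknown = present - set(INSTRUMENTS)
    if unknown:
        raise ValueError(f"unknown instrument codes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in INSTRUMENTS]

# A trainset excerpt has one label; a testset track may have several.
train_target = encode_labels(["pia"])
test_target = encode_labels(["gel", "voi"])
```

With targets in this form, a network would typically end in one sigmoid output per instrument instead of a single softmax, though the notebooks may handle this differently.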

In this project, I extract different features from the audio signal and feed them to a Convolutional Neural Network. The project includes two networks:

  • VGG-16 [paper]
  • A variation of the YOLO architecture [paper]

All of the models are implemented using the new higher-level TensorFlow API [link].

There are Jupyter Notebooks for two experiments:

  • Mel-Frequency Cepstrum as the feature, with a YOLO-like CNN
  • Several handpicked features, with a VGG-16 architecture

Using the DatasetPreprocessor

The repository includes the DatasetPreprocessor class, which extracts features from the raw files of the IRMAS dataset and stores them in easy-to-use .h5 files. All features are generated using Librosa [link].

Initialize the DatasetPreprocessor object like this:

dp = DatasetPreprocessor('mel')

or

dp = DatasetPreprocessor('handpicked')

The mel option extracts the Mel-Frequency Cepstrum as the feature. The handpicked option extracts the following features:

  • Spectral Centroid
  • Spectral Bandwidth
  • Spectral Rolloff
  • Zero-crossing rate
  • RMSE
  • MFCC
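To give a feel for what the simpler handpicked features measure, here is a dependency-free sketch of frame-wise zero-crossing rate and RMS energy. It is an illustration only: the repository computes its features with Librosa, whose framing, padding, and normalization differ in detail, and the function names below are hypothetical.

```python
import math

def frame_signal(samples, frame_length, hop_length):
    """Split a list of samples into frames (no padding, tail dropped)."""
    return [
        samples[start:start + frame_length]
        for start in range(0, len(samples) - frame_length + 1, hop_length)
    ]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Toy signal: a rapidly alternating frame followed by a constant frame.
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
frames = frame_signal(signal, frame_length=4, hop_length=4)
features = [(zero_crossing_rate(f), rms(f)) for f in frames]
```

The first frame changes sign at every step, so its zero-crossing rate is high, while the constant second frame has none. Noisy or percussive content tends to push the zero-crossing rate up, which is part of why it helps separate instruments.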

To generate the train and test sets, call:

dp.generateTrain('path/to/trainset/folder')
dp.generateTest('path/to/testset/folder')

Use

dp.normalizeGain('path/to/trainset/folder')

to normalize the gain of all tracks in a folder to a specific dB value.
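The idea behind gain normalization can be sketched as follows, assuming the level is measured as RMS in dB relative to full scale. This is an assumption for illustration: normalizeGain may measure level differently (for example by peak) or rely on an external tool, and the helper names here are hypothetical.

```python
import math

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0).

    Raises ValueError for an all-zero (silent) signal, since log10(0)
    is undefined.
    """
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms)

def normalize_gain(samples, target_db):
    """Scale samples so that their RMS level equals target_db."""
    gain = 10 ** ((target_db - rms_db(samples)) / 20)
    return [x * gain for x in samples]

track = [0.5, -0.5, 0.5, -0.5]          # RMS 0.5, roughly -6 dB
leveled = normalize_gain(track, -20.0)  # RMS 0.1 after scaling
```

Bringing every track to the same level keeps loudness differences between recordings from leaking into the extracted features.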
