ACR Training

Research folder for Automatic Chord Recognition (ACR). It generates, parses, and stores datasets, and defines and trains models. This part of the project is written in Python so that it can use Python-specific libraries.

Prerequisites

  • Python 3.8
  • Python libraries - sklearn, tensorflow, librosa, mir_eval
  • dataset - wav audio, chord annotations, key annotations

ACR - Results

The proof of concept (POC) of ACR training on the Isophonics and Billboard datasets was performed in a Jupyter Notebook. All training runs were evaluated on the validation set, NOT the test set. You can see the outcome HERE!

The actual ACR research, inspired by the POC, was evaluated only on the Beatles dataset. mir_eval scores and accuracy are used and printed HERE!

Model descriptions are listed BELOW.

Segmentation - Results

To increase the accuracy of the ACR, harmony segmentation was also investigated. The segmentation results were generated in a Jupyter Notebook. You can see the outcome HERE!

Basically, the segmentation models are an opportunity for future work. For now, librosa's beat tracker, which returns the tempo (BPM) together with a list of beats, is used.
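The beat-based segmentation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `track_beats` and `beats_to_segments` and the `audio_path` parameter are hypothetical, and only `librosa.load`, `librosa.beat.beat_track` and `librosa.frames_to_time` are real library calls.

```python
import numpy as np


def track_beats(audio_path):
    """Return (tempo, beat_times_in_seconds) for one audio file.

    Hypothetical helper; librosa is imported lazily so that the
    segment helper below can be used without it.
    """
    import librosa

    y, sr = librosa.load(audio_path)
    # beat_track returns the estimated tempo and the beat positions as frame indices
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return tempo, beat_times


def beats_to_segments(beat_times, duration):
    """Turn beat times into consecutive (start, end) segments covering the track."""
    bounds = np.concatenate(([0.0], np.asarray(beat_times, dtype=float), [duration]))
    # Clip to the track length, then sort and drop duplicate boundaries
    bounds = np.unique(np.clip(bounds, 0.0, duration))
    return np.column_stack((bounds[:-1], bounds[1:]))
```

Each resulting segment could then be assigned a single chord label, which is one plausible way to use beats as a stand-in for a learned segmentation model.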

Model descriptions are listed BELOW.

Datasets

Two well-known datasets are included.

Dataset issues

  1. No audio files are provided.
  2. Chord annotations are inconsistent - sometimes a B chord means Hes (B-flat), sometimes a B chord means H (B natural).
  3. Chords are sometimes annotated incorrectly (e.g. Something by the Beatles).
  4. When the audio files are available, the dataset is too large.
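The B/H ambiguity from issue 2 can only be resolved if the naming convention of each annotation file is known. Assuming that it is, a minimal sketch of a normalizer might look like this. The function name `normalize_chord`, the `german` flag and the `root:quality` label layout are assumptions made for illustration, not part of the project.

```python
import re

# Root is either the literal "Hes" or a letter A-H followed by optional accidentals
_ROOT = re.compile(r"^(Hes|[A-H][#b]*)(.*)$")


def normalize_chord(label, german):
    """Rewrite a chord label to English note names (B = B natural, Bb = B-flat).

    `german` says whether the source file uses the H / B (Hes) convention.
    Labels without a recognizable root (for example "N") are returned unchanged.
    """
    if not german:
        return label
    match = _ROOT.match(label)
    if match is None:
        return label
    root, rest = match.groups()
    if root in ("B", "Hes"):
        root = "Bb"              # German B / Hes is English B-flat
    elif root.startswith("H"):
        root = "B" + root[1:]    # German H is English B natural
    return root + rest


print(normalize_chord("B:min", german=True))   # Bb:min
print(normalize_chord("H:7", german=True))     # B:7
print(normalize_chord("B:min", german=False))  # B:min
```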

Preprocessing

Datasets have functions for data preprocessing. The audio waveform is parsed and a spectrogram or chromagram is generated. Each model type needs a different kind of preprocessing.

MLP Preprocessing

The function generates a moving window over the spectrogram frames and flattens each window into a single feature vector. Optionally, the data can be transposed to the key of C major (and its mode alternatives).
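As a rough illustration of the moving-window idea, the sketch below takes a spectrogram of shape (frames, bins) and returns one flattened window per frame. The function name `flatten_windows`, the `context` parameter and the zero padding at the edges are assumptions. The project's own window size and padding strategy may differ.

```python
import numpy as np


def flatten_windows(spectrogram, context=2):
    """Return an array of shape (n_frames, (2 * context + 1) * n_bins).

    Row i holds frame i together with `context` neighbouring frames on
    each side, flattened into one feature vector. Edges are zero-padded.
    """
    spec = np.asarray(spectrogram, dtype=float)
    n_frames = spec.shape[0]
    width = 2 * context + 1
    # Pad only along the time axis so every frame has a full window
    padded = np.pad(spec, ((context, context), (0, 0)), mode="constant")
    windows = np.stack([padded[i:i + width] for i in range(n_frames)])
    return windows.reshape(n_frames, -1)


features = flatten_windows(np.arange(12).reshape(4, 3), context=1)
print(features.shape)  # (4, 9)
```

Each row can then be fed to an MLP as an independent sample, with the chord label of the centre frame as its target.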

CRNN Preprocessing

The function generates sequences of spectrogram frames, each sequence forming a single feature set. Optionally, the data can be transposed to the key of C major (and its mode alternatives).
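In contrast to the flattened MLP windows, a recurrent model keeps the time axis. A minimal sketch, assuming fixed-length, non-overlapping sequences with zero padding at the end, is shown below. The name `to_sequences` and the `seq_len` parameter are hypothetical, and the real preprocessing may use a different sequence length or overlapping sequences.

```python
import numpy as np


def to_sequences(spectrogram, seq_len=100):
    """Split a (n_frames, n_bins) spectrogram into (n_seq, seq_len, n_bins).

    The last sequence is zero-padded when n_frames is not a multiple of seq_len.
    """
    spec = np.asarray(spectrogram, dtype=float)
    n_frames, n_bins = spec.shape
    n_seq = -(-n_frames // seq_len)  # ceiling division
    padded = np.zeros((n_seq * seq_len, n_bins))
    padded[:n_frames] = spec
    return padded.reshape(n_seq, seq_len, n_bins)


sequences = to_sequences(np.ones((5, 3)), seq_len=2)
print(sequences.shape)  # (3, 2, 3)
```

The chord labels would need to be split and padded the same way so that every frame in a sequence keeps its own target.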

ACR Models - structures

MLP

MLP architecture

CRNN Basic

CRNN Basic architecture

CRNN with EfficientNet

CRNN with efficientnet architecture

CRNN with CRF

CRNN with CRF architecture

CRNN Bass-Third

CRNN bass third architecture

Segmentation Models - structures

Segmentation CRNN

Segmentation CRNN architecture

Encoder-Decoder Segmentation

Encoder-Decoder architecture