
# Music_Retrieval

- Cross-modal music-to-story retrieval: given a music query, retrieve matching stories.
- Using emotion labels, we exploit shared embedding spaces to map stories and music.

## Model Architecture

### Query (Audio) Encoder

- ① GST-style reference encoder
- ② VAE-style reference encoder (both variants sketched below)

## Inference

Open In Colab
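As an assumed sketch of what inference amounts to (the function and variable names here are hypothetical, not this repo's API): embed the music query with the trained audio encoder, then rank pre-computed story embeddings by cosine similarity.

```python
import torch
import torch.nn.functional as F

def retrieve_stories(audio_encoder, mel, story_embeddings, top_k=5):
    """Return indices of the top-k stories closest to the music query
    in the shared embedding space."""
    with torch.no_grad():
        query = audio_encoder(mel)                        # (1, embed_dim)
        query = F.normalize(query, dim=-1)
        stories = F.normalize(story_embeddings, dim=-1)   # (n_stories, embed_dim)
        scores = query @ stories.T                        # cosine similarities
    return scores.topk(top_k, dim=-1).indices.squeeze(0)
```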


## Set up Environment

`pip install -r requirements.txt`

## Dataset

- Text: NIA's fairy-tale narration speech-synthesizer training data
  - Sentences from poetry, novels, dramas, scenarios, etc.
  - Seven emotion labels (happy, neutral, flustered, anxious, angry, sad, hurt)

- Audio: MTG-Jamendo dataset
  - https://mtg.github.io/mtg-jamendo-dataset
  - Download the audio files for the autotagging_moodtheme.tsv subset.
  - The subset carries many mood/theme tags (action, adventure, advertising, ambiental, background, ballad, ...), so we manually map the music tags onto the seven story emotion labels (an illustrative sketch follows this list).
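For illustration only, a fragment of what such a manual mapping might look like. These particular tag-to-emotion choices are hypothetical and may differ from the mapping this repo actually uses; tags without a counterpart are simply dropped.

```python
# Hypothetical fragment of a moodtheme-tag -> story-emotion mapping.
MOODTHEME_TO_EMOTION = {
    "happy": "happy",
    "fun": "happy",
    "calm": "neutral",
    "background": "neutral",
    "dark": "anxious",
    "sad": "sad",
    "melancholic": "sad",
}

def map_tags(tags):
    """Map a track's mood/theme tags to story emotion labels,
    dropping tags that have no counterpart."""
    return [MOODTHEME_TO_EMOTION[t] for t in tags if t in MOODTHEME_TO_EMOTION]
```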

## Training

1. Two-branch metric learning

   `bash ./run_twobranch_train.sh`

2. Three-branch metric learning (a loss sketch follows below)

   `bash ./run_train.sh`
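Both scripts train the modality branches with a metric-learning objective. Below is a minimal, assumed sketch of a cross-modal in-batch triplet loss over matched (music, story) embedding batches; the margin value and hardest-negative mining are illustrative, not necessarily the repo's exact recipe.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(music_emb, story_emb, margin=0.4):
    """Pull matched music/story pairs together and push the hardest
    in-batch negatives apart, in both retrieval directions."""
    music = F.normalize(music_emb, dim=-1)
    story = F.normalize(story_emb, dim=-1)
    sims = music @ story.T                        # (batch, batch) cosine similarities
    pos = sims.diag()                             # matched pairs on the diagonal
    mask = torch.eye(len(sims), dtype=torch.bool, device=sims.device)
    neg_m2s = sims.masked_fill(mask, -1).max(dim=1).values  # hardest story negatives
    neg_s2m = sims.masked_fill(mask, -1).max(dim=0).values  # hardest music negatives
    loss = F.relu(margin + neg_m2s - pos) + F.relu(margin + neg_s2m - pos)
    return loss.mean()
```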

## References

- Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, and Xavier Serra, "Emotion Embedding Spaces for Matching Music to Stories," ISMIR, 2021.

@inproceedings{won2021emotion,
  title={Emotion embedding spaces for matching music to stories},
  author={Won, Minz and Salamon, Justin and Bryan, Nicholas J. and Mysore, Gautham J. and Serra, Xavier},
  booktitle={ISMIR},
  year={2021}
}