
# Music_Retrieval

- Cross-modal music-to-story retrieval: given a music query, retrieve matching stories.
- Using emotion labels, we exploit shared embedding spaces to map stories and music.

## Model Architecture

### Query (Audio) Encoder

- ① GST-style reference encoder
- ② VAE-style reference encoder (both variants sketched below)

## Inference

Open In Colab
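As an assumed sketch of what inference amounts to (the function and variable names here are hypothetical, not this repo's API): embed the music query with the trained audio encoder, then rank pre-computed story embeddings by cosine similarity.

```python
import torch
import torch.nn.functional as F

def retrieve_stories(audio_encoder, mel, story_embeddings, top_k=5):
    """Return indices of the top-k stories closest to the music query
    in the shared embedding space."""
    with torch.no_grad():
        query = audio_encoder(mel)                        # (1, embed_dim)
        query = F.normalize(query, dim=-1)
        stories = F.normalize(story_embeddings, dim=-1)   # (n_stories, embed_dim)
        scores = query @ stories.T                        # cosine similarities
    return scores.topk(top_k, dim=-1).indices.squeeze(0)
```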


## Set up Environment

`pip install -r requirements.txt`

## Dataset

- Text: NIA's fairy-tale narration speech-synthesizer training data
  - Sentences from poetry, novels, dramas, scenarios, etc.
  - Seven emotion labels (happy, neutral, flustered, anxious, angry, sad, hurt)

- Audio: MTG-Jamendo dataset
  - https://mtg.github.io/mtg-jamendo-dataset
  - Download the audio files for the autotagging_moodtheme.tsv subset.
  - The subset carries many mood/theme tags (action, adventure, advertising, ambiental, background, ballad, ...), so we manually map the music tags onto the seven story emotion labels (an illustrative sketch follows this list).
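For illustration only, a fragment of what such a manual mapping might look like. These particular tag-to-emotion choices are hypothetical and may differ from the mapping this repo actually uses; tags without a counterpart are simply dropped.

```python
# Hypothetical fragment of a moodtheme-tag -> story-emotion mapping.
MOODTHEME_TO_EMOTION = {
    "happy": "happy",
    "fun": "happy",
    "calm": "neutral",
    "background": "neutral",
    "dark": "anxious",
    "sad": "sad",
    "melancholic": "sad",
}

def map_tags(tags):
    """Map a track's mood/theme tags to story emotion labels,
    dropping tags that have no counterpart."""
    return [MOODTHEME_TO_EMOTION[t] for t in tags if t in MOODTHEME_TO_EMOTION]
```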

## Training

1. Two-branch metric learning

   `bash ./run_twobranch_train.sh`

2. Three-branch metric learning (a loss sketch follows below)

   `bash ./run_train.sh`
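Both scripts train the modality branches with a metric-learning objective. Below is a minimal, assumed sketch of a cross-modal in-batch triplet loss over matched (music, story) embedding batches; the margin value and hardest-negative mining are illustrative, not necessarily the repo's exact recipe.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(music_emb, story_emb, margin=0.4):
    """Pull matched music/story pairs together and push the hardest
    in-batch negatives apart, in both retrieval directions."""
    music = F.normalize(music_emb, dim=-1)
    story = F.normalize(story_emb, dim=-1)
    sims = music @ story.T                        # (batch, batch) cosine similarities
    pos = sims.diag()                             # matched pairs on the diagonal
    mask = torch.eye(len(sims), dtype=torch.bool, device=sims.device)
    neg_m2s = sims.masked_fill(mask, -1).max(dim=1).values  # hardest story negatives
    neg_s2m = sims.masked_fill(mask, -1).max(dim=0).values  # hardest music negatives
    loss = F.relu(margin + neg_m2s - pos) + F.relu(margin + neg_s2m - pos)
    return loss.mean()
```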

## References

- Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, and Xavier Serra, "Emotion Embedding Spaces for Matching Music to Stories," ISMIR, 2021.

@inproceedings{won2021emotion,
  title={Emotion embedding spaces for matching music to stories},
  author={Won, Minz and Salamon, Justin and Bryan, Nicholas J. and Mysore, Gautham J. and Serra, Xavier},
  booktitle={ISMIR},
  year={2021}
}