BandCondiNet: Parallel Transformers-based Conditional Popular Music Generation with Multi-View Features
This is the official implementation of BandCondiNet.
This paper presents BandCondiNet, a controllable music cover generation model with two main contributions:
- Multi-view features that offer fine-grained control at the granularity of every bar and every track of complex multitrack music.
- Structure-enhanced self-attention (SEA) and Cross-Track Transformer (CTT) modules that strengthen structure modeling and inter-track dependency modeling for multitrack music, respectively.
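To make the inter-track dependency idea concrete, here is a conceptual NumPy sketch (not the repository's implementation) of cross-track attention: at each bar position, every track attends to the hidden states of all tracks at that same bar. All shapes, weight matrices, and names below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_track_attention(h, Wq, Wk, Wv):
    """h: (n_tracks, n_bars, d) per-track, per-bar hidden states (assumed layout)."""
    n_tracks, n_bars, d = h.shape
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    # Move the bar axis to the front so attention runs across tracks,
    # independently for each bar.
    q = q.transpose(1, 0, 2)                         # (n_bars, n_tracks, d)
    k = k.transpose(1, 0, 2)
    v = v.transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (n_bars, n_tracks, n_tracks)
    out = softmax(scores) @ v                        # (n_bars, n_tracks, d)
    return out.transpose(1, 0, 2)                    # back to (n_tracks, n_bars, d)

rng = np.random.default_rng(0)
d = 8
h = rng.standard_normal((4, 32, d))                  # 4 tracks, 32 bars
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = cross_track_attention(h, Wq, Wk, Wv)
print(out.shape)  # (4, 32, 8)
```

The actual CTT module in this repository is Transformer-based; this sketch only illustrates the attention pattern across the track axis.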
This repository is organized as follows:
root
├── custom_layers/                  structure-enhanced attention layer of BandCondiNet, modified from the pytorch-fast-transformers library
├── img/                            illustration of BandCondiNet's architecture
├── utils/                          utility functions for dataset, inference, and model construction
├── VQVAE/                          scripts for VQ-VAE training and inference
├── BandCondiNet.py                 BandCondiNet model
├── BPE_tokenizer_v2.py             script for BPE learning
├── constants.py                    constant values
├── Pop_dataset.py                  dataset for training and inference
├── representation_multiple_v2.py   tokenization of the REMI_Track representation
├── train_BandCondiNet.py           script for training and inference
└── vocab_v2.py                     vocabularies
We conduct experiments on a popular-music subset of the LakhMIDI (LMD) dataset, the largest publicly available symbolic music dataset containing multiple instruments. To ensure data quality, we perform several data cleaning and processing steps, including genre selection, melody extraction, instrument compression, and data filtering. Please refer to Section 4.1 of the article for details.
git clone https://github.com/Chinglohsiu/BandCondiNet
cd BandCondiNet
pip install -r requirements.txt

Run train_BandCondiNet.py. Set train_or_gen=True for model training, or set it to False for inference.
The max_seq_len and max_bars parameters determine which dataset (32-bar or 64-bar) the model uses for training or inference.
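As a minimal sketch of the switch described above (the actual flags live inside train_BandCondiNet.py; the names train_or_gen and max_bars come from this README, while the function, values, and return format are assumptions for illustration):

```python
# Hypothetical helper illustrating how train_or_gen and max_bars select the
# run mode and dataset variant; not code from train_BandCondiNet.py.
def select_run_config(train_or_gen: bool, max_bars: int) -> dict:
    assert max_bars in (32, 64), "the README describes 32- and 64-bar datasets"
    return {
        "mode": "train" if train_or_gen else "inference",
        "max_bars": max_bars,
    }

print(select_run_config(True, 32))  # {'mode': 'train', 'max_bars': 32}
```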
If you have any questions or requests, please write to chinglohsiu[AT]gmail[DOT]com
Please consider citing the following article if you find our work useful:
@article{luo2024bandcondinet,
title={BandCondiNet: Parallel Transformers-based Conditional Popular Music Generation with Multi-View Features},
author={Luo, Jing and Yang, Xinyu and Herremans, Dorien},
journal={arXiv preprint arXiv:2407.10462v2},
year={2024}
}
