This project generates polyphonic, multi-track (multi-instrument) music using Generative Adversarial Networks (GANs). The model aims to generate 4 bars of coherent multi-track music from scratch for 5 instruments. We also aim to extend the model for Human-AI collaboration, where 4 instrument tracks are generated conditioned on a single human-input track. Check out our docs here
| Category | Technologies |
|---|---|
| Programming Languages | Python |
| Frameworks | |
| Libraries | |
| Deep Learning Models | MuseGAN (GAN-based) |
| Datasets | LPD-5 Cleansed (Lakh Pianoroll Dataset) |
| Tools | Jupyter Notebook |
| Visualization & Analysis | |
```
│   README.md
│
├───Conditional_Track
│   │   Musegan_Conditional_Track.ipynb
│   │   README.md
│   │
│   └───Outputs
│           Outputs-Epoch-25.wav
│
├───Version_1
│   │   Musegan_Y.ipynb
│   │   README.md
│   │
│   └───Outputs
│           download (12).wav
│           download (14).wav
│
└───Version_2
    │   MuseGAN_Ver2.ipynb
    │   README.md
    │
    └───Outputs
            Epoch-120_Outputs.wav
```
The whole MuseGAN model is primarily split into two parts: a Multitrack Model and a Temporal Model.
The Multitrack Model is further split into three types: the Composer, Jamming, and Hybrid models.
The Composer Model is responsible for creating uniformity across the instruments of all tracks by using a single generator and a single discriminator.
The Jamming Model is responsible for giving each instrument track its characteristic style by using 5 generators and 5 discriminators, one pair per track.
The Hybrid Model merges the Composer and Jamming models into one single model by using a global latent vector z together with 5 track-dependent vectors z_i.
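The exact networks live in the notebooks; as a minimal illustrative sketch of the hybrid idea (PyTorch, layer sizes, the 96×84 bar shape, and the variable names here are assumptions, not the project's real settings), each of the five per-track generators receives the concatenation of the shared vector z and its own vector z_i:

```python
import torch
import torch.nn as nn

N_TRACKS = 5   # 5 instrument tracks
Z_DIM = 32     # illustrative latent size

class TrackGenerator(nn.Module):
    """One per-track generator whose input is the concatenation [z ; z_i]."""
    def __init__(self, z_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * z_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
            nn.Tanh(),
        )

    def forward(self, z_global, z_track):
        # The shared part (z_global) keeps tracks coherent, as in the Composer
        # Model; the per-track part (z_track) gives each instrument its own
        # style, as in the Jamming Model.
        return self.net(torch.cat([z_global, z_track], dim=-1))

# Hybrid model: one shared z, five track-dependent z_i, five generators.
gens = [TrackGenerator(Z_DIM, out_dim=96 * 84) for _ in range(N_TRACKS)]
z = torch.randn(1, Z_DIM)                                # global latent vector
z_i = [torch.randn(1, Z_DIM) for _ in range(N_TRACKS)]   # per-track latent vectors
bars = [g(z, zi) for g, zi in zip(gens, z_i)]            # one bar per instrument
```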
The Temporal Model is responsible for encoding bar-specific temporal information into the latent vectors. It also comes in two types:
A Temporal Generator (G_temp) is used when 5 coherent tracks are to be generated from scratch.
If a conditional track is provided as input, a Temporal Encoder is used instead to encode the temporal characteristics of the human-input track into the latent vectors.
The full MuseGAN incorporates both Temporal Generators and Bar Generators; its input consists of a global latent vector z, a global temporal vector z_t, track-dependent latent vectors z_i, and track-dependent temporal vectors z_i,t.
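Putting these together, the input to the bar generator for bar t of track i is the concatenation of the four vectors above. A minimal sketch of that composition (dimensions and names are illustrative, not the notebooks' exact values):

```python
import torch

N_TRACKS, N_BARS, Z_DIM = 5, 4, 32   # 5 instruments, 4 bars; Z_DIM is illustrative

# Random stand-ins; in the model, z_t and z_it would come from the temporal
# generator G_temp (generation from scratch) or the temporal encoder
# (conditional human-input track).
z    = torch.randn(Z_DIM)                     # global latent vector
z_t  = torch.randn(N_BARS, Z_DIM)             # global temporal vectors, one per bar
z_i  = torch.randn(N_TRACKS, Z_DIM)           # track-dependent latent vectors
z_it = torch.randn(N_TRACKS, N_BARS, Z_DIM)   # track-dependent temporal vectors

def bar_generator_input(track: int, bar: int) -> torch.Tensor:
    """Vector fed to the bar generator of one (track, bar) pair."""
    return torch.cat([z, z_t[bar], z_i[track], z_it[track, bar]], dim=-1)

print(bar_generator_input(track=2, bar=0).shape)   # torch.Size([128]) = 4 * Z_DIM
```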
The LPD-5 Cleansed dataset is a curated version of the original Lakh Pianoroll Dataset (LPD-5), which itself is derived from the Lakh MIDI Dataset (LMD) containing MIDI files from various sources. It consists of over 60,000 multi-track piano-rolls, each aligned to 4/4 time.
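The LPD-5 data ships as .npz pianoroll files. A rough sketch of inspecting one file with the pypianoroll library (the path is a placeholder, and the attribute names assume a recent pypianoroll release):

```python
import numpy as np
import pypianoroll  # reads Lakh Pianoroll .npz files

# Placeholder path to one LPD-5 Cleansed sample
multitrack = pypianoroll.load("lpd_5_cleansed/.../sample.npz")

print(len(multitrack.tracks))   # 5 tracks (drums, piano, guitar, bass, strings)
print(multitrack.resolution)    # time steps per quarter note

# Stack the tracks into one array of shape (n_tracks, time_steps, 128 pitches);
# tracks within one LPD-5 file are assumed to share the same length.
pianoroll = np.stack([t.pianoroll for t in multitrack.tracks])

# First 4 bars in 4/4 time: 4 bars * 4 beats * resolution steps per beat
steps = 4 * 4 * multitrack.resolution
first_four_bars = pianoroll[:, :steps, :]
print(first_four_bars.shape)
```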
- Install the dependencies: `pip install -r requirements`
- Go to the particular version folder you want to train and download the `.ipynb` file.
- Run the notebook locally or in JupyterLab.
- To access the trained checkpoint for a particular model, check the `README.md` file in that version's folder.
- To access the output audio, check the Outputs folder under that version's folder (a playback snippet is shown below).
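Inside a notebook, a generated file from an Outputs folder can be played back directly, for example:

```python
from IPython.display import Audio

# Any .wav from a version's Outputs folder, e.g. the Version_2 sample
Audio("Version_2/Outputs/Epoch-120_Outputs.wav")
```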
- Thanks to everyone at CoC and ProjectX for helping us throughout the course of this project.
- Special shoutout to our mentors Kavya Rambhia and Swayam Shah for their support and guidance throughout.

Made by Pratyush Rao and Yashasvi Choudhary