
MuseGAN

🎯 Aim

To generate polyphonic, multi-track (multi-instrument) music using Generative Adversarial Networks (GANs). The model aims to generate 4 bars of coherent multi-track music from scratch for 5 instruments. We also aim to extend the model for human-AI collaboration, where 4 instrument tracks are generated conditioned on a single human-input track. Check out our docs here

⚙️ Tech Stack

| Category | Technologies |
| --- | --- |
| Programming Languages | Python |
| Frameworks | PyTorch |
| Libraries | NumPy, Pandas, SciPy, Matplotlib, tqdm, os, torch |
| Deep Learning Models | GAN, CNN, WGAN-GP |
| Datasets | LPD-5 Cleansed |
| Tools | Git, Kaggle |
| Visualization & Analysis | Matplotlib, pretty_midi, pypianoroll |

📂 Folder Structure

│   README.md
│
├───Conditional_Track
│   │   Musegan_Conditional_Track.ipynb
│   │   README.md
│   │
│   └───Outputs
│           Outputs-Epoch-25.wav
│
├───Version_1
│   │   Musegan_Y.ipynb
│   │   README.md
│   │
│   └───Outputs
│           download (12).wav
│           download (14).wav
│
└───Version_2
    │   MuseGAN_Ver2.ipynb
    │   README.md
    │
    └───Outputs
            Epoch-120_Outputs.wav

💃 Model Structure

The MuseGAN model is primarily split into two parts: the Multi-track Model and the Temporal Model.

Multi-Track Model

This is further split into three variants: the Composer, Jamming, and Hybrid models.

(figure: the Composer, Jamming, and Hybrid multi-track model variants)
  • Composer Model

It creates uniformity across all instrument tracks by using a single shared generator and a single shared discriminator.

  • Jamming Model

It gives each instrument track its own characteristic style by using 5 generators and 5 discriminators, one pair per track.

  • Hybrid Model

The Hybrid Model merges the Composer and Jamming models into one single model by using a global latent vector z together with 5 track-dependent vectors z_i, as sketched below.
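
As a rough PyTorch illustration of the hybrid setup (the generator class, layer sizes, and pianoroll dimensions below are illustrative assumptions, not the repository's exact code):

```python
import torch
import torch.nn as nn

N_TRACKS, Z_DIM = 5, 32  # illustrative sizes, not the repository's actual config

class BarGenerator(nn.Module):
    """Toy per-track generator: maps a latent vector to a flattened pianoroll bar."""
    def __init__(self, z_dim, out_dim=96 * 84):  # assumed time steps x pitches per bar
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

    def forward(self, z):
        return self.net(z)

# Hybrid model: one shared latent z plus one track-dependent z_i per track,
# concatenated and fed to a separate generator per track.
generators = nn.ModuleList([BarGenerator(2 * Z_DIM) for _ in range(N_TRACKS)])
z = torch.randn(1, Z_DIM)                               # global latent (inter-track coherence)
z_i = [torch.randn(1, Z_DIM) for _ in range(N_TRACKS)]  # per-track latents (individual style)
tracks = [g(torch.cat([z, zi], dim=1)) for g, zi in zip(generators, z_i)]
```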

Temporal Model

This model is responsible for adding bar-specific temporal encodings to the latent vectors. The Temporal Model also comes in two types:

  • Generation From Scratch

A Temporal Generator (G_temp) is used when the 5 coherent tracks are to be generated from scratch.

  • Conditional Generation

If a conditional track input is provided, a Temporal Encoder is used to encode the temporal characteristics of the human-input track into the latent vectors. A combined sketch of both cases follows.
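
A minimal sketch of both temporal variants (layer sizes and pianoroll dimensions are assumptions for illustration):

```python
import torch
import torch.nn as nn

N_BARS, Z_DIM = 4, 32  # illustrative sizes
BAR_FEATS = 96 * 84    # assumed time steps x pitches per bar

# Generation from scratch: G_temp expands one latent into one latent per bar,
# so consecutive bars share temporal structure.
g_temp = nn.Sequential(nn.Linear(Z_DIM, 128), nn.ReLU(), nn.Linear(128, N_BARS * Z_DIM))
z_t = g_temp(torch.randn(1, Z_DIM)).view(1, N_BARS, Z_DIM)

# Conditional generation: a temporal encoder instead maps a human-input bar
# sequence into the same per-bar latent space.
encoder = nn.Sequential(nn.Linear(BAR_FEATS, 128), nn.ReLU(), nn.Linear(128, Z_DIM))
human_track = torch.randn(1, N_BARS, BAR_FEATS)  # placeholder for a real pianoroll
z_t_cond = encoder(human_track)                  # shape: (1, N_BARS, Z_DIM)
```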

Overall Structure

(figure: overall MuseGAN architecture combining Temporal Generators and Bar Generators)

The overall structure incorporates both Temporal Generators and Bar Generators. It takes four kinds of inputs: a global latent vector z, a global temporal vector z_t, track-dependent latent vectors z_i, and track-dependent temporal vectors z_i,t. A sketch of how these combine is given below.
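
As a rough illustration (the sizes and concatenation scheme are assumptions for the sketch, not the repository's exact code), each (track, bar) cell of the piece can be generated from the concatenation of all four vectors:

```python
import torch

N_TRACKS, N_BARS, Z_DIM = 5, 4, 32  # illustrative sizes

z    = torch.randn(Z_DIM)                    # global latent vector
z_t  = torch.randn(N_BARS, Z_DIM)            # global temporal vectors, one per bar
z_i  = torch.randn(N_TRACKS, Z_DIM)          # track-dependent latent vectors
z_it = torch.randn(N_TRACKS, N_BARS, Z_DIM)  # track- and bar-dependent vectors

# Each (track, bar) cell gets the concatenation of all four inputs, which a
# per-track bar generator would then decode into a pianoroll bar.
inputs = torch.stack([
    torch.stack([torch.cat([z, z_t[b], z_i[i], z_it[i, b]]) for b in range(N_BARS)])
    for i in range(N_TRACKS)
])
print(inputs.shape)  # (5 tracks, 4 bars, 4 * Z_DIM)
```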

📊 Data

The LPD-5 Cleansed dataset is a curated subset of the Lakh Pianoroll Dataset (LPD), which is itself derived from the Lakh MIDI Dataset (LMD), a collection of MIDI files from various sources. It consists of over 60,000 multi-track piano-rolls, each aligned to 4/4 time, with the tracks of each song merged into five instrument categories: drums, piano, guitar, bass, and strings.
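
For reference, a minimal sketch of loading one LPD-5 file with pypianoroll (assuming the pypianoroll 1.x API; the path is a placeholder):

```python
import pypianoroll

# Path is a placeholder; LPD-5 Cleansed ships one .npz multitrack file per song.
multitrack = pypianoroll.load("lpd_5_cleansed/some_song.npz")

multitrack.binarize()            # keep note on/off only, dropping velocities
pianorolls = multitrack.stack()  # numpy array: (n_tracks, n_time_steps, 128 pitches)
print([track.name for track in multitrack.tracks], pianorolls.shape)
```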

🚂 How To Train The Model

  • Install the dependencies

    pip install -r requirements.txt

  • Go to the folder of the version you want to train and download the .ipynb file.

  • Run the notebook locally or in JupyterLab.

  • To access the trained checkpoint for a particular model, check the README.md file in that version's folder.
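
Under the hood, the models are trained with the WGAN-GP objective listed in the tech stack. A minimal sketch of the gradient-penalty term (the discriminator and tensor shapes are placeholders, not the notebooks' exact code):

```python
import torch

def gradient_penalty(discriminator, real, fake):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on interpolates.

    The result is scaled by a penalty weight (commonly 10) and added to the
    critic loss.
    """
    # One random interpolation coefficient per sample, broadcast over all dims.
    alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = discriminator(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```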

🎼 Outputs

To access the output audio, check out the Outputs folder under the corresponding version's folder.
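
If you want to render a generated pianoroll to audio yourself, a minimal sketch using pypianoroll, pretty_midi, and SciPy (the pianoroll, track name, resolution, and constant tempo are illustrative assumptions) could look like:

```python
import numpy as np
from pypianoroll import Multitrack, StandardTrack
from scipy.io import wavfile

# Placeholder "generated" pianoroll: (time steps, 128 pitches) with velocities.
roll = (np.random.rand(4 * 96, 128) > 0.995).astype(np.uint8) * 100

mt = Multitrack(
    resolution=24,                        # time steps per quarter note (assumed)
    tempo=np.full(roll.shape[0], 120.0),  # constant 120 BPM
    tracks=[StandardTrack(name="Piano", program=0, pianoroll=roll)],
)
pm = mt.to_pretty_midi()         # convert to a pretty_midi.PrettyMIDI object
audio = pm.synthesize(fs=44100)  # render with pretty_midi's simple built-in synth
wavfile.write("sample_output.wav", 44100, audio.astype(np.float32))
```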

👏 Acknowledgement

  • Thanks to everyone at CoC and ProjectX for helping us throughout this project.
  • Special shoutout to our mentors, Kavya Rambhia and Swayam Shah, for their support and guidance throughout.

Made By Pratyush Rao and Yashasvi Choudhary
