We introduce StemGMD, a large-scale multi-kit audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances from Magenta's Groove MIDI Dataset using ten real-sounding acoustic drum kits.
Totaling 1224 hours of audio, StemGMD is the largest dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit.
We leverage StemGMD to develop and release LarsNet, a new deep drums demixing model that can separate five stems from a stereo drum mixture faster than real-time using a parallel arrangement of dedicated U-Nets.
📝 The open access paper "Toward Deep Drum Source Separation" authored by A. I. Mezza, R. Giampiccolo, A. Bernardini, and A. Sarti has been published in Pattern Recognition Letters: https://doi.org/10.1016/j.patrec.2024.04.026
@article{larsnet,
title = {Toward deep drum source separation},
journal = {Pattern Recognition Letters},
volume = {183},
pages = {86-91},
year = {2024},
issn = {0167-8655},
doi = {https://doi.org/10.1016/j.patrec.2024.04.026},
url = {https://www.sciencedirect.com/science/article/pii/S0167865524001351},
author = {Alessandro Ilic Mezza and Riccardo Giampiccolo and Alberto Bernardini and Augusto Sarti}
}
StemGMD is freely available on Zenodo under the CC-BY 4.0 license.
StemGMD was created by taking all the MIDI recordings in the Groove MIDI Dataset, applying a MIDI mapping that reduces the number of channels from 22 to 9, and then manually synthesizing the isolated tracks as 16-bit/44.1 kHz WAV files with ten different acoustic drum kits using Apple's Drum Kit Designer in Logic Pro X.
StemGMD contains isolated stems of nine canonical drum pieces:
- Kick Drum
- Snare
- High Tom
- Low-Mid Tom
- High Floor Tom
- Closed Hi-Hat
- Open Hi-Hat
- Crash Cymbal
- Ride Cymbal
These stems were obtained by applying the MIDI mapping described in Appendix B of (Gillick et al., 2019).
LarsNet can separate five stems from a stereo drum mixture:
- Kick Drum
- Snare
- Tom-Toms (High, Mid-Low, and Floor tom)
- Hi-Hat (Open and Closed Hi-Hat)
- Cymbals (Crash and Ride Cymbals)
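The relationship between the nine StemGMD pieces and the five LarsNet stems can be written down as a small lookup table. This is only an illustrative sketch: the dictionary names and the short stem keys (`kick`, `toms`, and so on) are assumptions made here and are not identifiers from the LarsNet code base. The tom grouping assumes that "High, Mid-Low, and Floor tom" refers to the High Tom, Low-Mid Tom, and High Floor Tom stems listed above.

```python
# Grouping of the nine StemGMD pieces into the five LarsNet stems,
# following the two lists in this README. Key names are hypothetical.
STEM_GROUPS = {
    "kick": ["Kick Drum"],
    "snare": ["Snare"],
    "toms": ["High Tom", "Low-Mid Tom", "High Floor Tom"],
    "hihat": ["Closed Hi-Hat", "Open Hi-Hat"],
    "cymbals": ["Crash Cymbal", "Ride Cymbal"],
}

# Inverse lookup: drum piece -> LarsNet stem it would be merged into.
PIECE_TO_STEM = {
    piece: stem
    for stem, pieces in STEM_GROUPS.items()
    for piece in pieces
}

if __name__ == "__main__":
    for piece, stem in PIECE_TO_STEM.items():
        print(f"{piece:>15} -> {stem}")
```

A table like this could be used, for example, to sum StemGMD stems into five-stem training targets.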
Pretrained LarsNet model checkpoints are available here (562 MB), licensed under CC BY-NC 4.0.
First, download the pretrained models.
Then, unzip the folder and place it in the project directory. Alternatively, modify the inference_models paths in config.py as needed.
Finally, run the following command on your terminal:
$ python separate.py -i /path/to/the/folder/containing/your/audio/files
By default, the script creates a folder named separated_stems in which the results are saved. Alternatively, you can specify the output directory with the -o option:
$ python separate.py -i /path/to/the/folder/containing/your/audio/files -o /path/to/output/folder/
Optionally, you can run a LarsNet version implementing α-Wiener filtering by specifying the -w option followed by a positive floating-point number indicating the exponent α to be applied, e.g.,
$ python separate.py -i /path/to/the/folder/containing/your/audio/files -w 1.0
This latter version is expected to reduce cross-talk artifacts between separated stems, but might introduce side-chain compression-like artifacts. Namely, choosing α∈(0, 1) results in more bleed, whereas α≥1 risks increasing the so-called ducking effect.
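To give an intuition for the role of α, here is a minimal NumPy sketch of a generic α-Wiener (generalized Wiener) mask, in which each stem's estimated magnitude is raised to the power α and normalized by the sum over all stems. This is the textbook formulation and is only an assumption about what happens behind the -w option; the function name `alpha_wiener_masks` and the toy magnitudes are hypothetical and are not taken from the LarsNet code.

```python
import numpy as np

def alpha_wiener_masks(magnitudes, alpha=1.0, eps=1e-8):
    """Return one soft mask per stem.

    magnitudes: nonnegative array of shape (n_stems, n_freq, n_frames)
                holding the estimated magnitude spectrogram of each stem.
    alpha:      positive exponent; larger values sharpen the masks.
    """
    powered = np.power(magnitudes, alpha)
    # Normalize across stems so that the masks sum to (almost) one per bin.
    return powered / (powered.sum(axis=0, keepdims=True) + eps)

if __name__ == "__main__":
    # Toy example: a single time-frequency bin shared by two stems,
    # where the first stem dominates.
    mags = np.array([[[0.8]], [[0.2]]])
    for alpha in (0.5, 1.0, 2.0):
        masks = alpha_wiener_masks(mags, alpha)
        print(f"alpha={alpha}: masks={masks[:, 0, 0]}")
```

In this toy case, the dominant stem's share of the bin grows with α, so the weaker stem is attenuated more strongly. This is consistent with the trade-off described above: small α leaves more bleed, while large α suppresses overlapping content more aggressively.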
Lastly, you can specify the device using the -d option (default: cpu):
$ python separate.py -i /path/to/the/folder/containing/your/audio/files -d cuda:0
We are working toward releasing the scripts for fine-tuning and training LarsNet from scratch. The code will be available soon.
Audio examples are available on our GitHub page.
The structure of StemGMD follows that of Magenta's Groove MIDI Dataset (GMD). Therefore, GMD metadata is preserved in StemGMD, including annotations such as drummer, session, style, bpm, beat_type, time_signature, split, as well as the source MIDI data.
This extends the applications of StemGMD beyond Deep Drums Demixing.
In fact, we argue that StemGMD may rival other large-scale datasets, such as the Expanded Groove MIDI Dataset (E-GMD), for tasks such as Automatic Drum Transcription when considering the countless possibilities for data augmentation that isolated stems allow.
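As one example of such augmentation, isolated stems can be remixed with a random gain per stem to produce new mixtures from the same performance. The sketch below is a hypothetical illustration, not part of the released code: the function name `remix_stems`, the ±6 dB gain range, and the assumption that stems are already loaded as equal-length NumPy arrays are all choices made here.

```python
import numpy as np

def remix_stems(stems, gain_db_range=(-6.0, 6.0), rng=None):
    """Sum isolated stems after applying a random gain to each one.

    stems:         dict mapping a stem name to a float array; all arrays
                   must share the same shape (e.g. (n_samples, n_channels)).
    gain_db_range: (low, high) range in dB from which gains are drawn.
    Returns the new mixture and the linear gain applied to each stem.
    """
    rng = np.random.default_rng() if rng is None else rng
    gains = {}
    mixture = None
    for name, audio in stems.items():
        gain_db = rng.uniform(*gain_db_range)
        gain = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude
        gains[name] = gain
        scaled = gain * audio
        mixture = scaled if mixture is None else mixture + scaled
    return mixture, gains

if __name__ == "__main__":
    # Toy stems: one second of noise per stem at 44.1 kHz, stereo.
    rng = np.random.default_rng(0)
    toy = {name: 0.1 * rng.standard_normal((44100, 2))
           for name in ("kick", "snare", "hihat")}
    mix, gains = remix_stems(toy, rng=rng)
    print(mix.shape, gains)
```

Because the per-stem gains are returned alongside the mixture, the scaled stems can still serve as ground truth for separation or transcription.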
You may also want to check out LARS, an open-source VST3/AU plug-in that runs LarsNet under the hood and can be used inside any DAW.
LARS was presented at the ISMIR 2023 Late-Breaking Demo Session:
A. I. Mezza, R. di Palma, E. Morena, A. Orsatti, R. Giampiccolo, A. Bernardini, and A. Sarti, "LARS: An open-source VST3 plug-in for deep drums demixing with pretrained models," ISMIR 2023 LBD Session, 2023.
📝 LP-33: LARS: An open-source VST3 plug-in for deep drums demixing with pretrained models