SOME: Singing-Oriented MIDI Extractor.
WARNING
This project is under beta version now. No backward compatibility is guaranteed.
SOME is a MIDI extractor that can convert singing voice to MIDI sequence, with the following advantages:
- Speed: 9x faster than real-time on an i5 12400 CPU, and 300x on a 3080Ti GPU.
- Low resource dependency: SOME can be trained on custom dataset, and can achieve good results with only 3 hours of training data.
- Functionality: SOME can produce non-integer MIDI values, which is specially suitable for DiffSinger variance labeling.
SOME requires Python 3.8 or later. We strongly recommend you create a virtual environment via Conda or venv before installing dependencies.
-
Install PyTorch 2.1 or later following the official instructions according to your OS and hardware.
-
Install other dependencies via the following command:
pip install -r requirements.txt
-
(Optional) For better pitch extraction results, please download the RMVPE pretrained model from here and extract it into
pretrained/
directory.
Download pretrained model of SOME from releases and extract them somewhere.
To infer with CLI, run the following command:
python infer.py --model CKPT_PATH --wav WAV_PATH
This will load model at CKPT_PATH, extract MIDI from audio file at WAV_PATH and save a MIDI file. For more useful options, run
python infer.py --help
To infer with Web UI, run the following command:
python webui.py --work_dir WORK_DIR
Then you can open the gradio interface through your browser and use the models under WORK_DIR following the instructions on the web page. For more useful options, run
python webui.py --help
Download pretrained model of SOME from releases and extract them somewhere.
To use SOME for an existing DiffSinger dataset, you should have a transcriptions.csv with name
, ph_seq
, ph_dur
and ph_num
in it. Run the following command:
python batch_infer.py --model CKPT_PATH --dataset RAW_DATA_DIR --overwrite
This will use the model to get all MIDI sequences (with floating point pitch values) from the recordings in the dataset and OVERWRITE its transcriptions.csv with note_seq
and note_dur
added or replaced. Please be careful and back up your files if necessary.
For more useful options, run
python batch_infer.py --help
Training scripts are uploaded but may not be well-organized yet. For the best compatibility, we suggest training your own model after a stable release in the future.
Any organization or individual is prohibited from using any recordings obtained without consent from the provider as training data. If you do not comply with this item, you could be in violation of copyright laws or software EULAs.
SOME is licensed under the MIT License.