This release is an update to the first release of the DiffSinger Community Vocoder Project.
This release contains model weights with a pitch range expanded up to C6 (1046.5 Hz) and significantly improved audio quality. It is distributed as follows:
- A pretrained model for inference in the DiffSinger repository
- A pretrained model for fine-tuning in the SingingVocoders repository (see release)
- A packaged OpenUTAU dependency that can be installed directly into OpenUTAU (to get the ONNX model, rename the file suffix to .zip and unzip it)
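The manual rename-and-unzip step for the OpenUTAU package can also be scripted. The following is a minimal sketch, assuming the .oudep file is an ordinary zip archive (as the rename instruction implies). The function name `extract_onnx_from_oudep` and the output directory are hypothetical, not part of OpenUTAU or DiffSinger.

```python
import zipfile
from pathlib import Path


def extract_onnx_from_oudep(oudep_path, out_dir):
    """Extract every .onnx file from an .oudep package and return their paths.

    Hypothetical helper: assumes the package is a plain zip archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    # zipfile reads the archive contents, not the suffix, so no rename is needed.
    with zipfile.ZipFile(oudep_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".onnx"):
                extracted.append(Path(archive.extract(name, out_dir)))
    return extracted
```

Calling `extract_onnx_from_oudep("package.oudep", "unpacked")` would return the paths of any ONNX files found in the archive; an empty list means the package layout differs from what this sketch assumes.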
Please note: the file and package names of this model differ from those of the previous release in December 2022. You may have to edit your configuration files to switch from the old model to the new one.
Overview
Architecture: NSF-HiFiGAN
Training data: ~72 h of carefully selected singing voice
Training steps: 110k for fine-tuning
Sampling rate: 44100 Hz
Number of mel bins: 128
Hop size: 512
Window size: 2048
Mel frequency (input): 40-16000 Hz
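As a quick sanity check when preparing input features, the parameters above determine the frame timing the vocoder expects. The sketch below only does arithmetic on the listed values. The dictionary keys are illustrative names chosen here and are not necessarily the keys used in DiffSinger or SingingVocoders configuration files.

```python
# Values copied from the overview; the key names are illustrative only.
VOCODER_SPEC = {
    "sampling_rate": 44100,  # Hz
    "num_mel_bins": 128,
    "hop_size": 512,         # samples between consecutive mel frames
    "win_size": 2048,        # samples per analysis window
    "fmin": 40,              # Hz, lower edge of the mel filterbank
    "fmax": 16000,           # Hz, upper edge of the mel filterbank
}


def derived_quantities(spec):
    """Compute frame timing implied by the sampling rate, hop and window sizes."""
    sr = spec["sampling_rate"]
    return {
        "frame_ms": 1000.0 * spec["hop_size"] / sr,
        "frames_per_second": sr / spec["hop_size"],
        "window_ms": 1000.0 * spec["win_size"] / sr,
        "nyquist_hz": sr / 2,
    }


if __name__ == "__main__":
    for key, value in derived_quantities(VOCODER_SPEC).items():
        print(f"{key}: {value:.4f}")
```

Each mel frame therefore covers about 11.6 ms of audio (roughly 86 frames per second), and the 16000 Hz upper mel edge sits below the 22050 Hz Nyquist frequency.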
Notice
Pretrained models are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Please read the notice in the folder if you want to redistribute these pretrained models.
Update (2024.08.03)
We significantly optimized the efficiency of the NSF module in the ONNX model and uploaded a new attachment (nsf_hifigan_44.1k_hop512_128bin_2024.02_logE.oudep). Please also note that the new model accepts natural-log (base e) mel-spectrograms, rather than the log10 mel-spectrograms that the old models accept.
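The change of log base matters if you already have mel-spectrograms computed for the older models. Since ln(x) = log10(x) · ln(10), existing log10 features can be rescaled rather than recomputed. The following is a minimal sketch using plain Python lists; the function name `log10_mel_to_ln` is hypothetical, and it assumes the features differ only in log base.

```python
import math

# ln(10), the factor that converts log10 values to natural-log values.
LN10 = math.log(10.0)


def log10_mel_to_ln(mel):
    """Convert a log10 mel-spectrogram (list of frames) to natural log.

    Hypothetical helper: assumes `mel` is a list of frames, each a list
    of log10 magnitudes, and that only the log base needs to change.
    """
    return [[value * LN10 for value in frame] for frame in mel]
```

With array libraries the same conversion is a single scalar multiplication by `math.log(10)`; dividing by that factor goes the other way.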