Release Finetuned NSF-HiFiGAN with expanded pitch range and improved audio quality · openvpi/vocoders

This release is an update to the first release of the DiffSinger Community Vocoder Project.

This release contains model weights with a pitch range expanded up to C6 (1046.5 Hz) and significantly improved audio quality. It is distributed in the following forms:

A pretrained model for inference in the DiffSinger repository
A pretrained model for fine-tuning in the SingingVocoders repository (see release)
A packaged OpenUTAU dependency that can be installed directly into OpenUTAU (to get the ONNX model, change the file suffix to .zip and extract the archive)
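
The rename-and-unzip step for the OpenUTAU package can also be scripted. The following is a minimal sketch, assuming the .oudep file is an ordinary ZIP archive (as the rename instruction implies) and that the model inside has an .onnx suffix. The function name extract_onnx and the output directory are hypothetical and not part of any OpenUTAU or DiffSinger tooling.

```python
import zipfile
from pathlib import Path


def extract_onnx(oudep_path, out_dir):
    """Extract every .onnx member of an .oudep package into out_dir.

    Assumption: the package is a plain ZIP archive. Python's zipfile
    reads the file contents rather than the suffix, so no rename is needed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(oudep_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".onnx"):
                # extract() returns the path of the file it wrote
                extracted.append(Path(archive.extract(name, out_dir)))
    return extracted
```

For example, extract_onnx("nsf_hifigan_44.1k_hop512_128bin_2024.02_logE.oudep", "vocoder_onnx") would return the paths of any ONNX files found in that package.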

Please note: the file and package names of this released model differ from those of the previous release in December 2022. You may have to edit your configuration files to switch from the old model to the new one.

Overview

Architecture: NSF-HiFiGAN
Training data: ~72 hours of carefully selected singing voice
Training steps: 110k (fine-tuning)
Sampling rate: 44100 Hz
Number of mel bins: 128
Hop size: 512 samples
Window size: 2048 samples
Mel frequency (input): 40-16000 Hz
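
The parameters above determine the time resolution of the mel-spectrogram the vocoder expects. The sketch below works through that arithmetic and checks the stated upper pitch limit. The constant names and the helper midi_to_hz are illustrative only, and the note numbering assumes the common convention in which A4 is MIDI note 69 at 440 Hz (so C6 is note 84).

```python
# Values copied from the overview; the names are illustrative.
SAMPLE_RATE = 44100   # Hz
HOP_SIZE = 512        # samples between consecutive mel frames
WIN_SIZE = 2048       # samples per analysis window
NUM_MELS = 128
FMIN, FMAX = 40, 16000  # Hz, mel filterbank range

frames_per_second = SAMPLE_RATE / HOP_SIZE   # about 86.13 frames/s
hop_ms = 1000 * HOP_SIZE / SAMPLE_RATE       # about 11.61 ms per frame
win_ms = 1000 * WIN_SIZE / SAMPLE_RATE       # about 46.44 ms per window


def midi_to_hz(note):
    """Equal-tempered frequency, assuming A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


c6_hz = midi_to_hz(84)  # about 1046.5 Hz, the stated upper pitch limit

print(f"{frames_per_second:.2f} frames/s, hop {hop_ms:.2f} ms, window {win_ms:.2f} ms")
print(f"C6 = {c6_hz:.1f} Hz")
```

Mel-spectrograms extracted with a different hop size, window size, or number of bins will not line up with these frame timings, so a feature extractor paired with this vocoder should use the same values.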

Notice

Pretrained models are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Please read the notice in the folder if you want to redistribute these pretrained models.

Update (2024.08.03)

We significantly optimized the efficiency of the NSF module in the ONNX model and uploaded a new attachment (nsf_hifigan_44.1k_hop512_128bin_2024.02_logE.oudep). Please also note that the new model accepts natural-log (base e) mel-spectrograms, whereas the old ones accept log10 mel-spectrograms.
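
If an existing pipeline produces log10 mel-spectrograms, they can be rescaled for the new model, since ln(x) = log10(x) * ln(10). The following is a minimal sketch of that conversion; it assumes the spectrogram is a nested list of frames by mel bins, and the function name log10_to_ln is hypothetical. It does not reflect how DiffSinger or OpenUTAU handle the conversion internally.

```python
import math

LN10 = math.log(10.0)  # about 2.302585


def log10_to_ln(mel_log10):
    """Rescale a log10 mel-spectrogram (frames x bins) to natural log.

    Uses the identity ln(x) = log10(x) * ln(10), applied per value.
    """
    return [[value * LN10 for value in frame] for frame in mel_log10]


# One frame with two bins whose linear magnitudes are 0.5 and 2.0
mel_log10 = [[math.log10(0.5), math.log10(2.0)]]
mel_ln = log10_to_ln(mel_log10)
```

Feeding log10 values to the natural-log model without this rescaling would scale every input by a factor of about 0.434, which is likely to degrade the output audio.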
