How many steps would be enough if I train this model from scratch? #14
Comments
The released model was trained for 670k steps; normally 400k steps would be sufficient for a codec, following descript-audio-codec's practice.
Thanks!
I have trained the model on VoxCeleb2 for 400k steps. However, the reconstructed speech does not sound as good as the demo page, and the reconstruction of noisy speech sounds even worse. Is there a problem with the data scale, or something else?
I have checked the samples you shared.
Thanks, I'll try to process my training data with a denoising and separation model.
I found that although the training data has been denoised, tuning the pretrained FAcodec on it still results in unstable pronunciation. Moreover, the instability seems more pronounced when the original audio sounds poorer. Can I tune only the timbre module and freeze the other parts, so that it adapts to new speakers?
BTW, I noticed that only the content, prosody, and timbre latent features are used when training the FAcodec redecoder. May I ask why z_r is not employed?
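For the timbre-only tuning idea above, here is a minimal PyTorch sketch of freezing all parameters except one submodule. The `ToyCodec` class and its branch names (`timbre_encoder`, etc.) are stand-ins for illustration, not the actual FAcodec module layout:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a codec with separate factorized branches;
# the real FAcodec class structure will differ.
class ToyCodec(nn.Module):
    def __init__(self):
        super().__init__()
        self.content_encoder = nn.Linear(8, 8)
        self.prosody_encoder = nn.Linear(8, 8)
        self.timbre_encoder = nn.Linear(8, 8)
        self.decoder = nn.Linear(8, 8)

model = ToyCodec()

# Freeze everything, then re-enable only the timbre branch.
for p in model.parameters():
    p.requires_grad = False
for p in model.timbre_encoder.parameters():
    p.requires_grad = True

# Build the optimizer over trainable parameters only, so frozen
# weights receive no updates (and no optimizer state).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

Note that even with frozen branches, batch-norm running statistics can still drift unless the frozen modules are also kept in `eval()` mode during tuning.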
Hi! Nice work!
Could you share how many steps would be sufficient to train a new model? I'm trying to train a 16k FAcodec. The results reconstructed from the checkpoint at 130,000 steps still sound different from the real speech, especially the speaker timbre.