Replies: 6 comments 14 replies
-
Interesting! I see that the dataset you've linked has only ~1.5k samples. It's encouraging to see that fine-tuning on a moderately sized dataset is enough to transfer the model to an entirely new language 👏
-
I am currently training a French model on the SIWIS dataset (a single, very high-quality speaker). The results at my first checkpoint are encouraging, but still pretty rough. I ended up with a loss_gpt_total of 1.4 at the end of the 500-step run. Do you know what batch size I should use on my 3090?
-
Did you just create a dataset in the non-English language and let it train, or did you do something different/additional?
-
@athu16 Can you share some details on dataset prep for Marathi? I want to train this on Hindi.
-
Did you only fine-tune the GPT model, and does the diffusion decoder not need fine-tuning to generate other languages?
-
@athu16 Can you please advise on the tokenizer? I'm also trying to train a Marathi TTS model, but I'm stuck on the tokenizer file.
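For context, here is the rough sketch I've been experimenting with to build a Devanagari-capable tokenizer. It uses the HuggingFace tokenizers library; the vocab size, file names, and special tokens are my own guesses, not values taken from the stock Tortoise tokenizer.json:

```python
# Sketch only: train a small BPE tokenizer on Marathi transcripts.
# Assumptions: transcripts live in a plain-text file (one line per utterance),
# and the special tokens / vocab size roughly mirror what the original model
# uses -- verify against the stock tokenizer.json before fine-tuning.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    vocab_size=256,                                 # guess; match the original vocab size
    special_tokens=["[STOP]", "[UNK]", "[SPACE]"],  # guess; copy from the stock tokenizer
)
tokenizer.train(files=["marathi_transcripts.txt"], trainer=trainer)
tokenizer.save("marathi_tokenizer.json")
```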
-
EDIT: YouTube video demonstrating (not-so-good) voice cloning from English to non-English speech: https://youtu.be/kzBOrMw7oBk
I cannot thank you enough for this! I just used your Colab notebook to fine-tune the autoregressive model on this dataset, and after about two hours of training I have probably the best-sounding offline Marathi (it's a language) TTS engine. It'll only get better with a bigger dataset and longer training. It can't replicate male voices correctly, since it's a female-speaker dataset, but it's certainly a great start.
So if anyone's wondering if you can use this to create non-English TTS models (with full support for your native scripts), you certainly can!
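For those asking about dataset prep: as far as I can tell, the fine-tuning notebook expects LJSpeech-style metadata, i.e. one `audio_path|transcript` line per clip. Below is a minimal sketch of building that file; the folder names and file layout are placeholders, not my actual dataset.

```python
# Sketch: build an LJSpeech-style metadata file from paired .wav/.txt files.
# Assumptions: each clip wavs/xyz.wav has its transcript in wavs/xyz.txt,
# and the notebook consumes a train.txt with "relative/path.wav|transcript" lines.
from pathlib import Path

dataset_dir = Path("marathi_dataset")   # placeholder dataset root
wav_dir = dataset_dir / "wavs"

lines = []
for wav in sorted(wav_dir.glob("*.wav")):
    txt = wav.with_suffix(".txt")        # transcript stored next to each clip
    if not txt.exists():
        continue                         # skip clips without a transcript
    transcript = txt.read_text(encoding="utf-8").strip()
    lines.append(f"wavs/{wav.name}|{transcript}")

(dataset_dir / "train.txt").write_text("\n".join(lines), encoding="utf-8")
print(f"Wrote {len(lines)} entries to train.txt")
```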