

New model training #6

Open
lucasjinreal opened this issue Jan 15, 2025 · 6 comments

Comments

@lucasjinreal
Owner

Kokoro has not open-sourced its training code yet.
However, I am currently working on reproducing its training procedure. I will also implement multi-speaker TTS with style, and furthermore I'd like to support voice cloning as well.

If you are also interested in uncovering how Kokoro is trained, please leave a comment below and let's get in touch to see how much further we can push it.

@lucasjinreal lucasjinreal pinned this issue Jan 15, 2025
@JMLLR1

JMLLR1 commented Jan 15, 2025

I am interested in creating a training script as well. Feel free to DM me.

@lucasjinreal
Owner Author

lucasjinreal commented Jan 16, 2025

@JMLLR1 Please get on board: https://discord.gg/6Cryfdvb. We are discussing how to start training the same way Raven (the Kokoro author) does, with fixed styles. Ping me (airflow).

@lucasjinreal
Owner Author

Guys, training on Expresso is now ongoing:

Epoch [7/50], Step [10/724], Mel Loss: 0.72029, Gen Loss: 28.53927, Disc Loss: 1.39727, Mono Loss: 0.04416, S2S Loss: 1.00780, SLM Loss: 2.52246
Time elasped: 18.641412496566772
Epoch [7/50], Step [20/724], Mel Loss: 0.72382, Gen Loss: 28.51638, Disc Loss: 1.39997, Mono Loss: 0.04016, S2S Loss: 1.05560, SLM Loss: 2.53120
Time elasped: 36.09541082382202
Epoch [7/50], Step [30/724], Mel Loss: 0.70927, Gen Loss: 28.73309, Disc Loss: 1.38668, Mono Loss: 0.07664, S2S Loss: 1.07032, SLM Loss: 2.59169
Time elasped: 53.391515016555786
Epoch [7/50], Step [40/724], Mel Loss: 0.69908, Gen Loss: 28.99346, Disc Loss: 1.37871, Mono Loss: 0.06705, S2S Loss: 1.01630, SLM Loss: 2.52710
Time elasped: 71.93912291526794
Epoch [7/50], Step [50/724], Mel Loss: 0.71041, Gen Loss: 28.97557, Disc Loss: 1.38679, Mono Loss: 0.05649, S2S Loss: 0.69880, SLM Loss: 2.42558
Time elasped: 89.9345874786377
Epoch [7/50], Step [60/724], Mel Loss: 0.70334, Gen Loss: 29.16667, Disc Loss: 1.38239, Mono Loss: 0.02717, S2S Loss: 0.80091, SLM Loss: 2.44010
Time elasped: 108.03602576255798
Epoch [7/50], Step [70/724], Mel Loss: 0.70238, Gen Loss: 29.45588, Disc Loss: 1.37100, Mono Loss: 0.07006, S2S Loss: 0.91138, SLM Loss: 2.45986
Time elasped: 125.50399613380432
Epoch [7/50], Step [80/724], Mel Loss: 0.69871, Gen Loss: 29.58579, Disc Loss: 1.36915, Mono Loss: 0.06034, S2S Loss: 0.95874, SLM Loss: 2.42004
Time elasped: 144.0021252632141
ɪt sˈiːmd ɐn ˈeɪdʒ, ænd ðə deɪnˈuːmɔ̃ jˈɛt ɐnˈʌðɚɹ ˈeɪdʒ dᵻfˈɜːd.
Epoch [7/50], Step [90/724], Mel Loss: 0.69936, Gen Loss: 29.68557, Disc Loss: 1.36117, Mono Loss: 0.08106, S2S Loss: 0.87225, SLM Loss: 2.47548
Time elasped: 162.62565422058105

Stay tuned for my updates.
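The progress log above prints one loss line every 10 steps, each followed by an elapsed time in seconds that grows steadily within the epoch. As a rough aid for reading it, here is a minimal Python sketch, assuming the line format stays exactly as shown (including the log's own "elasped" spelling). `parse_log` and `seconds_per_step` are hypothetical helpers, not part of Kokoro or StyleTTS2; they extract the named losses and estimate throughput from the first and last timed steps.

```python
import re
from typing import Dict, List

# Hypothetical parser for log lines in the format shown above, e.g.
# "Epoch [7/50], Step [10/724], Mel Loss: 0.72029, Gen Loss: 28.53927, ..."
HEADER = re.compile(r"Epoch \[(\d+)/(\d+)\], Step \[(\d+)/(\d+)\]")
LOSS = re.compile(r"([A-Za-z0-9]+ Loss): ([-+0-9.eE]+)")
ELAPSED = re.compile(r"Time elasped: ([0-9.]+)")  # spelling copied from the log


def parse_log(text: str) -> List[Dict]:
    """Parse loss lines and attach the elapsed time printed after each one."""
    records: List[Dict] = []
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            epoch, n_epochs, step, n_steps = map(int, header.groups())
            losses = {name: float(value) for name, value in LOSS.findall(line)}
            records.append({"epoch": epoch, "n_epochs": n_epochs,
                            "step": step, "n_steps": n_steps, "losses": losses})
            continue
        elapsed = ELAPSED.search(line)
        if elapsed and records:
            records[-1]["elapsed"] = float(elapsed.group(1))
        # Any other line (e.g. the printed IPA phoneme string) is ignored.
    return records


def seconds_per_step(records: List[Dict]) -> float:
    """Average seconds per step between the first and last timed records."""
    timed = [r for r in records if "elapsed" in r]
    first, last = timed[0], timed[-1]
    return (last["elapsed"] - first["elapsed"]) / (last["step"] - first["step"])


# Usage on an excerpt of the log above.
log = """Epoch [7/50], Step [10/724], Mel Loss: 0.72029, Gen Loss: 28.53927, Disc Loss: 1.39727, Mono Loss: 0.04416, S2S Loss: 1.00780, SLM Loss: 2.52246
Time elasped: 18.641412496566772
ɪt sˈiːmd ɐn ˈeɪdʒ, ænd ðə deɪnˈuːmɔ̃ jˈɛt ɐnˈʌðɚɹ ˈeɪdʒ dᵻfˈɜːd.
Epoch [7/50], Step [90/724], Mel Loss: 0.69936, Gen Loss: 29.68557, Disc Loss: 1.36117, Mono Loss: 0.08106, S2S Loss: 0.87225, SLM Loss: 2.47548
Time elasped: 162.62565422058105"""

records = parse_log(log)
sps = seconds_per_step(records)
epoch_minutes = sps * records[-1]["n_steps"] / 60
print(f"{sps:.2f} s/step, ~{epoch_minutes:.1f} min/epoch")  # 1.80 s/step, ~21.7 min/epoch
```

With this excerpt the estimate is about 1.8 s per step, or roughly 22 minutes per 724-step epoch. It is only a back-of-the-envelope figure, since it ignores any time spent outside the logged steps.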

@JMLLR1

JMLLR1 commented Jan 17, 2025

Very nice, it seems I have been too slow. Unfortunately I am unable to join the Discord until Sunday, but I will join then. What dataset are you training on right now?

@lucasjinreal
Owner Author

Currently I am doing a test training run with Expresso. I have finished 24/30 of the second training run, and it is working normally.

Due to the limited scale of the Expresso dataset, the model only learns how to speak. The next step is to enlarge the dataset and train on multilingual data.

If anyone is interested, please contribute datasets or ideas on improving the model's abilities.

One advantage of StyleTTS2 compared with autoregressive (AR) TTS models is its relatively small size combined with its effectiveness.

I think we can push StyleTTS2 further if multilingual and large-scale training are supported.

@Marootc

Marootc commented Jan 23, 2025

Hello, friend, how are the results of the trained model? What do you get when training on a single speaker? Does the character's voice stay consistent? Kokoro's whisper is great; I want to know how it was made.


3 participants