Finetuning for better Results? #292

Maki9009 · 2022-10-23T16:21:03Z

Maki9009
Oct 23, 2022

So I assume the Colab isn't actually fine-tuning. That makes sense, because for some voices I can get close enough that it sounds good. But take my voice, for example: I have no idea why, but it keeps making me sound like a posh British man.

Is there any way to improve the cloning of some voices?

stargan2vc · 2022-10-23T20:00:54Z

stargan2vc
Oct 23, 2022

The scripts won't be released sadly

0 replies

indieshack · 2023-01-03T05:41:04Z

indieshack
Jan 3, 2023

+1 for this. I'm good friends with an actor, now in his 90s, who recorded narration for a documentary project of mine some years ago. The project content has changed somewhat, and I'd like to clone his voice as it was when he recorded the original narration (with his permission, of course; he's still very mentally alert and finds the Tortoise project fascinating). I get the concern about allowing high-accuracy cloning, for various reasons, but I think the need outweighs the downsides. I'm quite sure technology like this is already being used by state agencies (probably including our own).

0 replies

neonbjb · 2023-01-03T09:06:53Z

neonbjb
Jan 3, 2023
Maintainer

I partnered with a company to make this fine-tuning technology available for exactly what you are describing, @indieshack: https://play.ht/pricing/

It's not currently cheap to clone voices, but that's because it is quite costly to rent the amount of compute required to perform the initial fine-tuning process. I know play.ht is really interested in supporting use cases like yours, so you might consider contacting them directly and maybe they will set up some sort of one-time fee for you.

0 replies

deviandice · 2023-01-07T01:56:07Z

deviandice
Jan 7, 2023

I'm confused. You've stated many times you can't release the training & fine tuning code because it's unethical, but it was an ethical choice to release the training code to a specific company who makes money from it and gives you a cut?

0 replies

Randy-H0 · 2023-01-07T02:03:46Z

Randy-H0
Jan 7, 2023

That's what happens with greed. I was hoping for this to be open sourced and stuff. This just doesn't make sense. Creator is worried about unethical purposes yet they sell it all to a company!? Like come on, "supporting use cases like yours" or more interested in the money you provide? I feel like it's the latter one.

You can't just come in here, then tell us you won't be sharing the code, and then sell it to a company and advertise said company here

0 replies

Randy-H0 · 2023-01-07T02:05:38Z

Randy-H0
Jan 7, 2023

If you really felt like you wanted to support use cases like indieshack provided, then support them instead of slapping a paywall behind everything

0 replies

indieshack · 2023-01-07T05:13:24Z

indieshack
Jan 7, 2023

I partnered with a company to make this fine-tuning technology available for exactly what you are describing, @indieshack: https://play.ht/pricing/

It's not currently cheap to clone voices, but that's because it is quite costly to rent the amount of compute required to perform the initial fine-tuning process. I know play.ht is really interested in supporting use cases like yours, so you might consider contacting them directly and maybe they will set up some sort of one-time fee for you.

Thanks - took a look, all I can see are pre-cooked voices, I want to actually clone my friend's voice.
BTW, in respect to other comments my view is that it's perfectly OK to make a few $$ out of programming. I think it's a great project with lots of potential.

0 replies

altryne · 2023-01-10T01:08:59Z

altryne
Jan 10, 2023

play.ht is actually a cool service, and I assume @neonbjb chose them as they have a strong "personal" verification (they require you to record a consent using your own voice before cloning)

I'd also love for someone like @UberDuck to have access to this, @neonbjb did you consider licensing this to other places?

0 replies

Randy-H0 · 2023-01-10T01:14:01Z

Randy-H0
Jan 10, 2023

play.ht is actually a cool service, and I assume @neonbjb chose them as they have a strong "personal" verification (they require you to record a consent using your own voice before cloning)

I'd also love for someone like @UberDuck to have access to this, @neonbjb did you consider licensing this to other places?

Uberduck uses open source stuff. 99.99% of its models are made by the community, and it's all made using open source techniques. Switching to Tortoise would mean that the training code would need to be open sourced too, since the community, which is responsible for 99.99% of the models, uses Colab notebooks.

0 replies

neonbjb · 2023-01-10T18:20:49Z

neonbjb
Jan 10, 2023
Maintainer

I've withheld posting here for a bit to try and temper my anger at the community.

but it was an ethical choice to release the training code to a specific company who makes money from it and gives you a cut?

Yes. When people in my culture want to do something that is ethically risky and requires more work than any one person can invest, they generally do so behind a corporation. It's a mechanism to allow people to work together for shared profits and to shield individuals from personal risk. Ack that not all cultures and value systems work this way.

The people who run play.ht genuinely care about their users and want to make this kind of TTS available to everyone. If you truly believe that it is my moral imperative to release fine-tuning to the world, then I think this is the correct way to do it. Most people who would actually use such a system do not have access to the GPUs or know-how required to fine-tune these models. By partnering with a company that can offer this as a service, I am able to allow considerably more people access to this feature than if I simply open sourced it. The exception to the former statement would be if, after open sourcing it, someone else started a company that did the same thing play.ht is doing.

That's what happens with greed. I was hoping for this to be open sourced and stuff. This just doesn't make sense. Creator is worried about unethical purposes yet they sell it all to a company!? Like come on, "supporting use cases like yours" or more interested in the money you provide? I feel like it's the latter one.

Give me a break, man. I didn't "sell it all to a company". I have made very little money from my arrangement with play.ht; basically just contract labor rates. I did get a small amount of stock, which is how I get to share in their success. Regardless, this isn't about money for me at all:

I built Tortoise because I'm a nerd who likes to program and I have a particular interest in machine learning.
I open sourced it (despite the obvious commercial potential) because I have no interest in running a business and because money isn't really a huge motivating factor for me.
I withdrew from active development on it because I am extremely concerned with how it could be abused, and I do not want my name to be associated with the abuse that will happen if I fully release and document fine-tuning.
I partnered with play.ht because the community (here and through personal communication) convinced me that there was a need for this and partnering with a company that could control the technology seemed like a good avenue for making this happen.

Your response to this is totally out of line. I am a long time contributor to open source (and open content in general on the web), and will continue to contribute. What have you contributed? How would you feel if I commented on your projects saying "you didn't give enough away, please give more"? Or "your ethical views are wrong and you don't deserve to make decisions about your creations"?

2 replies

bbecausereasonss Mar 6, 2023

Don't take it personally. It's just an opinion. It says nothing about you.

Tanoshimi Jul 11, 2024

I understand your position.

Randy-H0 · 2023-01-10T19:07:46Z

Randy-H0
Jan 10, 2023

I've withheld posting here for a bit to try and temper my anger at the community.

but it was an ethical choice to release the training code to a specific company who makes money from it and gives you a cut?

Yes. When people in my culture want to do something that is ethically risky and requires more work than any one person can invest, they generally do so behind a corporation. It's a mechanism to allow people to work together for shared profits and to shield individuals from personal risk. Ack that not all cultures and value systems work this way.

The people who run play.ht genuinely care about their users and want to make this kind of TTS available to everyone. If you truly believe that it is my moral imperative to release fine-tuning to the world, then I think this is the correct way to do it. Most people who would actually use such a system do not have access to the GPUs or know-how required to fine-tune these models. By partnering with a company that can offer this as a service, I am able to allow considerably more people access to this feature than if I simply open sourced it. The exception to the former statement would be if, after open sourcing it, someone else started a company that did the same thing play.ht is doing.

That's what happens with greed. I was hoping for this to be open sourced and stuff. This just doesn't make sense. Creator is worried about unethical purposes yet they sell it all to a company!? Like come on, "supporting use cases like yours" or more interested in the money you provide? I feel like it's the latter one.

Give me a break, man. I didn't "sell it all to a company". I have made very little money from my arrangement with play.ht; basically just contract labor rates. I did get a small amount of stock, which is how I get to share in their success. Regardless, this isn't about money for me at all:

I built Tortoise because I'm a nerd who likes to program and I have a particular interest in machine learning.

I open sourced it (despite the obvious commercial potential) because I have no interest in running a business and because money isn't really a huge motivating factor for me.

I withdrew from active development on it because I am extremely concerned with how it could be abused, and I do not want my name to be associated with the abuse that will happen if I fully release and document fine-tuning.

I partnered with play.ht because the community (here and through personal communication) convinced me that there was a need for this and partnering with a company that could control the technology seemed like a good avenue for making this happen.

Your response to this is totally out of line. I am a long time contributor to open source (and open content in general on the web), and will continue to contribute. What have you contributed? How would you feel if I commented on your projects saying "you didn't give enough away, please give more"? Or "your ethical views are wrong and you don't deserve to make decisions about your creations"?

The point of open sourcing is that other people can use your creations without having to build them themselves. They get a gift from you and they're happy you gave it to them. The version of Tortoise that is open source is not bad, but not usable either. Look at Nvidia, they made Tacotron 2 open source; Stability made Stable Diffusion open source; OpenAI made Jukebox open source. The world is improving in front of us. But OpenAI put GPT-3 behind a paywall, and they also put DALL-E behind a paywall. Did people like that, even though there's literally a free trial? No! They gave backlash, and the community then invented other things, like Stability. This will happen to Tortoise, not to spite you. Tortoise could've been used by a lot more people, but when it's put behind a paywall of 45 USD per month, no one's really going to pay that.

I haven't contributed much to the open source community because I can't code, I don't have a good GPU to train stuff, and I also don't have the time to do so. I've tried my best to make a model for Uberduck called CRUST; it's a multispeaker model trained on 20 hours and 168 speakers. It works well enough for me, and I open sourced it for everyone to find. People who use it can train on as little as 30 seconds of data with reasonable results if the voice is "generic". Everyone can use it and that makes me happy. That also sparked another model of this type that's also open sourced.

The point is, make people happy without a paywall, way more people can enjoy it then

0 replies

iamkhalidbashir · 2023-01-10T19:14:59Z

iamkhalidbashir
Jan 10, 2023

Subscribing just for the drama… ;)

0 replies

iamkhalidbashir · 2023-01-10T19:16:43Z

iamkhalidbashir
Jan 10, 2023

@neonbjb regardless of your choice, we love you for open-sourcing the model :)
But it would be waaaaay better if you open-sourced the training code instead of the model 😁
Anyways thanks.

0 replies

iamkhalidbashir · 2023-01-10T19:18:31Z

iamkhalidbashir
Jan 10, 2023

Also, I think VALL-E from Microsoft uses the same concept that this repo uses? Let's hope they release the codebase; I heard it will be at the end of Jan.

0 replies

Randy-H0 · 2023-01-10T19:20:55Z

Randy-H0
Jan 10, 2023

Also, I think VALL-E from Microsoft uses the same concept that this repo uses? Let's hope they release the codebase; I heard it will be at the end of Jan.

Finally open source realistic voice cloning

0 replies

neonbjb · 2023-03-06T02:51:58Z

neonbjb
Mar 6, 2023
Maintainer

Feel obligated to drop this here:
https://archive.ph/2023.03.06-001755/https://www.washingtonpost.com/technology/2023/03/05/ai-voice-scam/

A few of you thought I was being ridiculous about misuse. Well - one of my greatest concerns has officially happened. It makes me ashamed to have open sourced this. My own grandmother was scammed in this way a few years ago and I have never been so furious in my life. To think that I have enabled some scumbag to do it to someone else absolutely kills me.

Next up is a police state (or just some random police) using this tech to frame an innocent person. I'm sure it won't receive as much press coverage (if any).

But don't mind me. This should be totally open to everyone. There should be a button that anyone can press to impersonate anyone else with no controls whatsoever. The world is totally ready for this. All the good things that this tech can be used for overwhelms the bad, right?

8 replies

bbecausereasonss Mar 6, 2023

As others have said, I have no idea what makes you think it's Tortoise. As it stands, even with devils (fine-tuning), Tortoise isn't all that believable. 11Labs is for sure, but even before this tech there were other repos and ways of cloning voices. Scammers will always scam; in fact, they ran similar scams years ago.

You need to find your backbone. Stand by your choices and work. As I mentioned before, fire can create food and can also be used for arson. It's not the tech, it's the people. A minority of people will use any technology for sinister purposes. It has been and always will be this way.

lopho Mar 6, 2023

The ridicule was directed at the notion that obscuring/withholding training instructions would somehow prevent misuse. As you can see, it was not misplaced.

I'd argue it even amplified the problem, as now there are a few entities capable of using this at a proficient level, while the knowledge to do so is obscure or scattered all over the place, making a straightforward analysis without stumbling stones a massive pain in the ass.

golf clap

deviandice Mar 6, 2023

I'd argue it even amplified the problem, as now there are a few entities capable of using this at a proficient level, while the knowledge to do so is obscure or scattered all over the place, making a straightforward analysis without stumbling stones a massive pain in the ass.

I agree here. Withholding it stopped it being available to everyone, which has helped to create the current status quo. The average everyday person does not believe individuals have access to this technology; if they did, they would be more cautious. And equally, corporations claiming otherwise only add more smoke and mirrors to the reality of it.

AnAIGuy Mar 6, 2023

Next up is a police state (or just some random police) using this tech to frame an innocent person. I'm sure it won't receive as much press coverage (if any).

And you think keeping your technology locked behind closed doors where your CEOs are still forced to work with the governments behind the scenes is somehow a better solution? Don't act like Tortoise changed the world of TTS and that all the top tech companies couldn't just as easily scrape together something that sounds indistinguishable from human voice. The only true solution here is to fully expose this stuff so everyone knows what it's capable of. The technology isn't going away, people need to learn they shouldn't blindly trust digital information. The scary thing is people trusted it in the first place. I wonder how many people were already falsely framed by AI in the years before us common folk got our hands on this technology and learned what it was really capable of.

drgrib Mar 6, 2023

I have to agree with the general sentiment that awareness about ways to be scammed is the key, not a futile attempt to suppress the progression of technology and information. You won't hold it back. People want to create this technology for the number of benefits it grants and they'll do that with or without your help.

As people have alluded to, old people lacking technology education were getting fleeced by the software technician bank transfer scam for a solid decade before voice tech was widely available. We just need to broadcast and educate people about the possible scams out there. It's the same principle as security technology. Keep updating the public consciousness with the way people can be scammed so their awareness and security are updated.

iamkhalidbashir · 2023-03-06T06:34:01Z

iamkhalidbashir
Mar 6, 2023

My man drew a whole ass graph to prove tortoise is irrelevant! 😂

On Mon, 6 Mar 2023 at 11:32 AM, 152334H wrote:

I don't understand why you are specifically concerned by this event. I think it is fairly clear that most cases in the last month are attributable to 11labs, not TorToiSe, and not even from fine-tuning, but merely from their zero-shot VC feature. On the deepfake fidelity vs programming competency graph, it is highly unlikely that TorToiSe was responsible (even indirectly) for any pareto improvement:

[image: https://user-images.githubusercontent.com/54623771/223035581-004853a7-a24c-4c8c-af89-81d2ced09f6e.png]

Mr. Bashir, CEO, AMOXT Pvt. Ltd

1 reply

deviandice Mar 6, 2023

My man drew a whole ass graph to prove tortoise is irrelevant! 😂

Science!

deviandice · 2023-03-06T06:38:51Z

deviandice
Mar 6, 2023

Surely you're joking. I know old people are a bit slow, but they're nowhere near as slow as using Tortoise. Unless suddenly the Indians have all got RTX 3090s and have hours to spend training models, and then somehow predict, in the future, exactly what will be said and manage to nail it ahead of time, there is still no way this could lead back to you. They were probably using CorentinJ's Real-Time-Voice-Cloning pipeline, or more than likely something like Let's be honest, they're about as cheap as they come, and they don't have the cash or resources to power your tool, or the patience. Also, you sent an article that specifically states: "And there’s little legal precedent for courts to hold the companies that make the tools accountable for their use." So if anything, it just validates the point of view that it's not going to come back to you.


0 replies

neonbjb · 2023-03-06T15:19:53Z

neonbjb
Mar 6, 2023
Maintainer

Yes it is likely 11labs. 11labs is Tortoise, trained longer on better data with some tweaks to the decoder stack.

6 replies

152334H Mar 6, 2023

also, on a completely unrelated note, do you think there's any hope for replacing full attention in the gpt model with some monotonic local attention variant?

I was looking at the attention matrices for a sample in bertviz; each mel token seems to focus mostly on the previous ~10 mel tokens + relevant text tokens + the conditioning latent "token". Intuitively, I also feel like most parts of speech shouldn't need much more context than what's being read and what audio tokens were most recently outputted...
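The attention pattern described above can be sketched as a mask. This is a minimal illustration only, not Tortoise's actual code: the sequence layout (one conditioning latent, then text tokens, then mel tokens), the function name `local_attention_mask`, and the default window of 10 are all assumptions taken from the rough description in this comment. It also covers only the local-window part; it does not implement monotonic attention.

```python
def local_attention_mask(n_text, n_mel, window=10):
    """Return mask[q][k] == True where query position q may attend to key k.

    Assumed layout (hypothetical, for illustration only):
      position 0            -> conditioning latent "token"
      positions 1..n_text   -> text tokens
      remaining positions   -> mel tokens
    """
    n_prefix = 1 + n_text
    total = n_prefix + n_mel
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: never attend to future positions
            if q < n_prefix:
                # Conditioning and text tokens keep ordinary causal attention.
                mask[q][k] = True
            elif k < n_prefix:
                # Mel tokens always see the conditioning latent and all text.
                mask[q][k] = True
            else:
                # Mel tokens see themselves plus the previous `window` mel tokens.
                mask[q][k] = (q - k) <= window
    return mask


if __name__ == "__main__":
    mask = local_attention_mask(n_text=3, n_mel=20, window=10)
    # Number of keys visible to the last mel token.
    print(sum(mask[-1]))
```

With this layout, `local_attention_mask(3, 20)` lets each mel token see at most the 4 prefix positions plus 11 mel positions (itself and the 10 before it), so the number of attended keys per mel token stops growing with utterance length.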

TortoiseFanboy Mar 6, 2023

You can't just assume 11labs is tortoise? What?

deviandice Mar 6, 2023

Yes it is likely 11labs. 11labs is Tortoise, trained longer on better data with some tweaks to the decoder stack.

Currently, https://git.ecker.tech/mrq/ai-voice-cloning is also Tortoise, with Nvidia's newest BigVGAN vocoder that I integrated to replace the aging UnivNet. I've taken your advice. I've seen only positivity: people are thrilled with the noticeable improvements, even if it doesn't affect the internal workings. Altruistic intentions lead to better outcomes, as my hard work demonstrates.

However, you might feel that your approach backfired, resulting in the company advancing faster than you anticipated. Without direction, it may not meet everyone's needs, and there may be more bad actors than good. But don't lose hope. You can encourage positive use by focusing on meaningful improvements that enable its use for creative media, comedy, learning, and entertainment. Re-evaluating your methods will ensure that this technology serves its purpose in the way you intended.

152334H Mar 6, 2023

However, you might feel that your approach backfired, resulting in the company advancing faster than you anticipated. Without direction, it may not meet everyone's needs, and there may be more bad actors than good. But don't lose hope. You can encourage positive use by focusing on meaningful improvements that enable its use for creative media, comedy, learning, and entertainment. Re-evaluating your methods will ensure that this technology serves its purpose in the way you intended.

this was written by chatgpt wasn't it

deviandice Mar 6, 2023

No, I wrote it myself. I'm a game developer so my interest in the software extends into my interest into creating. You have to understand that, whilst I might be a bit of a unicorn, the majority of people who will benefit from this technology are creators. That's always been my frame of mind.

iamkhalidbashir · 2023-03-06T16:05:49Z

iamkhalidbashir
Mar 6, 2023

I think 11Labs uses DelightfulTTS from Microsoft, because one of the main founders (dunky11) has already published a voice cloning repo based on it:
https://github.com/dunky11/voicesmith

2 replies

152334H Mar 6, 2023

dunky11 (and other 11labs employees) has cloned && starred a million tts repos, this is not demonstrative nor informative

bbecausereasonss Mar 10, 2023

They've also outright said it's not even a single line of code from tortoise.

Maki9009 · 2023-03-07T22:17:00Z

Maki9009
Mar 7, 2023
Author

okay so... after this entire thread... which new repo has the ability to train the voice model? like soo much came out so far.. I've tried 11labs but i want something i can have more control with locally. so what should I technically use? voicesmith? ai-voice-cloning? or anything else?

2 replies

GuyARoss Mar 8, 2023

imo, ai-voice-cloning is the best you are going to be able to get without doing any significant amount of work.

bbecausereasonss Mar 10, 2023

With that said, so far no one has really created good results with it either.

Tobyrrr00 · 2023-04-04T21:42:42Z

Tobyrrr00
Apr 4, 2023

Didn't know there was this much drama in the TTS world lol. Thanks for the great work @neonbjb

0 replies

Thresher12 · 2023-04-21T10:18:25Z

Thresher12
Apr 21, 2023

neonbjb has no obligation to provide anything. It's just that saying this is all due to ethics, when the horse has long since left the barn, is hard to believe. You're still guarding the door to the TTS vault at Fort Knox with a shotgun while every snot-nosed teenage TikToker and their mother is already running around using it.

Also the implication that power should be gatekept by corporations tends to rub people the wrong way on a place like this. Keeping tortoise under lock clearly isn't going to stop abuse which is still running rampant even as advanced tts remains under the control of corporations.

There is nothing wrong with someone, rich or poor, wanting to make money off their stuff, or even just keeping it secret because they feel like it. If he just came out and said that, maybe people would understand more. It's just that saying this is the morally correct position, and by extension that releasing fully unlocked software is not, rubs people the wrong way. But I suppose everyone is entitled to their opinion.

I like Tortoise and I thank neonbjb for releasing it, which he didn't have to do. It's a fun little tool, although in its current state it's more of a novelty than a serious TTS tool, useful mostly for playing around with the preexisting voices. It hardly matters, though, since superior paid TTS options already exist, as evidenced by the flood of YouTube videos, and it's only a matter of time before someone releases similarly superior open source frameworks.

4 replies

neonbjb Apr 22, 2023
Maintainer

Also the implication that power should be gatekept by corporations tends to rub people the wrong way on a place like this.

That's not my belief. I think this power needs to be gate kept by someone responsible. I am not a "people person" and have no desire to be this gatekeeper. Western governments are utterly incompetent when it comes to technology. I wish that were not the case but I have no idea how to fix it. There are only two entities left to manage this sort of thing, then: corps and nonprofits.

If someone from the FSF came to me today and offered to take over stewardship of Tortoise fine-tuning, I'd gladly hand it off to them. As long as someone is in charge and actually thinking about the safety of these things, I'm happy.

Somehow I don't think that'd make the folks in this thread happy, though. While everyone here likes to talk about "free software" and "gatekeeping", I'm quite certain that in most cases their motivations are very self-centered.

Regardless - the cat's out of the bag and someone basically wrote a tutorial on how to do this. If you're really hell-bent on fine-tuning Tortoise (spoiler alert: it's not easy!), you can.

152334H Apr 23, 2023

Western governments are utterly incompetent when it comes to technology. I wish that were not the case but I have no idea how to fix it.

🤔 interesting to consider if your employer shares this belief -- it doesn't align with what I thought they thought or were hoping to accomplish.

neonbjb Apr 23, 2023
Maintainer

I'm not my employer and they don't control my opinions, nor are my opinions those of my employer.

With that said, I think this belief is very much what OAI is concerned about - democratic governments are slow moving and this tech is improving fast and will likely have a big impact on our society. I think the organization genuinely wants to help lawmakers make decisions that will keep everyone safe while avoiding the West losing its competitive edge.

ThrowawayAccount01 Apr 23, 2023

the amount of copium my man's huffing is unreal. just say you want to monetize your work and be done with it, simple as. 5head

deviandice · 2023-04-22T21:34:17Z

deviandice
Apr 22, 2023

Western governments are incompetent

That's true, China is far better at using AI technology for evil.


0 replies

FurkanGozukara · 2023-04-30T23:13:30Z

FurkanGozukara
Apr 30, 2023

I am working on a voice training atm

If it works planning to release my next video with voice over

Will be on https://www.youtube.com/SECourses

Will update here as I progress

I am using 2 hours of quality speech

0 replies

FurkanGozukara · 2023-05-15T00:11:06Z

FurkanGozukara
May 15, 2023

Today I published my fine-tuning tutorial, which took over 8 days of work. It is easy to follow and makes fine-tuning straightforward.

Master Deep Voice Cloning in Minutes: Unleash Your Vocal Superpowers! Free and Locally on Your PC

This tutorial is based on

Ozen Toolkit for data preprocessing
DLAS for Training
Tortoise TTS Fast for speech synthesis

3 replies

bbecausereasonss May 15, 2023

Awesome, how are the results compared to 11Labs?

FurkanGozukara May 17, 2023

Awesome, how are the results compared to 11Labs?

I haven't. They are just too expensive :)

vitalifa May 17, 2023

Great video, brings together lots of info scattered across repos. TY 👍

vitalifa · 2023-05-17T08:55:00Z

vitalifa
May 17, 2023

I am unlikely to be the straw to break your position on fine-tuning, but for the OS community to have a fine-tuning guide from the man himself would be so incredible. I will echo what others have said in that this achievement from a single developer is godly. The encouragement (and tough love) you seem to be getting stems from the true potential we all see in this project.

TY 🤘

0 replies

marktellez · 2024-07-01T05:03:39Z

marktellez
Jul 1, 2024

I don't get all the bitching. You fuckers are lucky he didn't just remove the repository.

I was able to fine-tune voices using training scripts I globbed from different projects, and the quality is better than 11labs.

His not releasing it to you script kiddies is a good move. Him leaving the code up for programmers who will spend time reading and implementing instead of bitching like spoiled kids is also right.

Learn to code, you silly brats. If you want to spend money for me to show you how to train this and to help you with the code pipeline that does all the work for dataset creation and inference, feel free to contact me. But it won't be inexpensive: I can charge $120/hr for my coding, so you are looking at about $1000 USD for my help.

I have integrated this for several companies now. It works great, but don't expect "real time". Inference takes between 7 and 20 seconds (and can be a lot more depending on settings), and fine-tuning takes about 3 hours for a 2-hour dataset. It takes quite a bit of time to build the training data and get everything clean enough to be useful.

There IS NO REASON to berate NEO. You guys are making open source horrible with your entitled bullshit. Props to NEO for keeping his cool.

1 reply

drgrib Jul 1, 2024

While I agree that people made ridiculous accusations of neonbjb "selling out" and there was an absurd amount of entitlement, as someone with a day job and limited free time outside of work that has nothing to do with machine learning, it is frustrating to have a repo that is half functional. Without fine-tuning, the results of tortoise are usually jarring and schizophrenic. There is enough quality that you can tell something special is there if it were fine-tuned but not enough to actually use without pouring in hours of extra time I just don't have.

FurkanGozukara · 2024-07-01T08:31:05Z

FurkanGozukara
Jul 1, 2024

This repo is obsolete

Coqui is much better and works much better and faster

0 replies

152334H · 2024-07-01T14:17:58Z

152334H
Jul 1, 2024

wow, this place brings back memories

0 replies


Replies: 98 comments · 82 replies
