Replies: 98 comments 82 replies
-
The scripts won't be released, sadly.
-
+1 for this. I'm good friends with an actor, now in his 90s, who recorded narration for a documentary project of mine some years ago. The project content has changed somewhat and I'd like to clone his voice as it was when he recorded the original narration (with his permission of course; he's still very mentally alert and finds the tortoise project fascinating). I get the concern about allowing high-accuracy cloning, but I think the need outweighs the downsides - I'm quite sure technology like this is already being used by state agencies (probably including our own).
-
I partnered with a company to make this fine-tuning technology available for exactly what you are describing, @indieshack: https://play.ht/pricing/ It's not currently cheap to clone voices, but that's because it is quite costly to rent the amount of compute required to perform the initial fine-tuning process. I know play.ht is really interested in supporting use cases like yours, so you might consider contacting them directly and maybe they will set up some sort of one-time fee for you.
-
I'm confused. You've stated many times you can't release the training and fine-tuning code because it's unethical, but it was an ethical choice to release the training code to a specific company that makes money from it and gives you a cut?
-
That's what happens with greed. I was hoping for this to be open sourced and stuff. This just doesn't make sense. The creator is worried about unethical purposes, yet they sell it all to a company!? Like come on, "supporting use cases like yours", or more interested in the money you provide? I feel like it's the latter. You can't just come in here, tell us you won't be sharing the code, and then sell it to a company and advertise said company here.
-
If you really wanted to support use cases like the one indieshack described, then support them instead of slapping a paywall on everything.
-
Thanks - took a look, but all I can see are pre-cooked voices. I want to actually clone my friend's voice.
-
play.ht is actually a cool service, and I assume @neonbjb chose them as they have strong "personal" verification (they require you to record your consent using your own voice before cloning). I'd also love for someone like @UberDuck to have access to this. @neonbjb, did you consider licensing this to other places?
-
Uberduck uses open source stuff. 99.99% of its models are made by the community, all using open source techniques. Switching to tortoise would mean that the training code would need to be open sourced too, since the community, which is responsible for 99.99% of the models, uses Colab notebooks.
-
I've withheld posting here for a bit to try and temper my anger at the community.
Yes. When people in my culture want to do something that is ethically risky and requires more work than any one person can invest, they generally do so behind a corporation. It's a mechanism to allow people to work together for shared profits and to shield individuals from personal risk. Ack that not all cultures and value systems work this way. The people who run play.ht genuinely care about their users and want to make this kind of TTS available to everyone. If you truly believe that it is my moral imperative to release fine-tuning to the world, then I think this is the correct way to do it. Most people who would actually use such a system do not have access to the GPUs or know-how required to fine-tune these models. By partnering with a company that can offer this as a service, I am able to allow considerably more people access to this feature than if I simply open sourced it. The exception to the former statement would be if, after open sourcing it, someone else started a company that did the same thing play.ht is doing.
Give me a break, man. I didn't "sell it all to a company". I have made very little money from my arrangement with play.ht; basically just contract labor rates. I did get a small amount of stock, which is how I get to share in their success. Regardless, this isn't about money for me at all:
Your response to this is totally out of line. I am a long time contributor to open source (and open content in general on the web), and will continue to contribute. What have you contributed? How would you feel if I commented on your projects saying "you didn't give enough away, please give more"? Or "your ethical views are wrong and you don't deserve to make decisions about your creations"?
-
The point of open sourcing is that other people get to use your creations without having to build them themselves. They get a gift from you and they're happy you gave it to them. The version of Tortoise that is open source is not bad, but not usable either. Look at Nvidia: they made Tacotron 2 open source, Stability made Stable Diffusion open source, OpenAI made Jukebox open source. The world is improving in front of us. But OpenAI put GPT-3 behind a paywall, and they also put DALL-E behind a paywall. Did people like that, even though there's literally a free trial? No! There was backlash, and the community then invented other things, like Stability. This will happen to tortoise too, and not to spite you. Tortoise could've been used by a lot more people, but when it's put behind a paywall of 45 USD per month, hardly anyone is going to pay that. I haven't contributed much to the open source community because I can't code, I don't have a good GPU to train stuff, and I don't have the time. I've tried my best to make a model for Uberduck called CRUST, a multispeaker model trained on 20 hours of audio from 168 speakers. It works well enough for me, and I open sourced it for anyone who finds it. People can train it on as little as 30 seconds of data with reasonable results if the voice is "generic". Everyone can use it and that makes me happy. It also sparked another variant of this model that's open sourced as well. The point is, make people happy without a paywall; way more people can enjoy it then.
-
Subscribing just for the drama… ;)
-
@neonbjb regardless of your choice, we'd love for you to open source the model :)
-
Also, I think VALL-E from Microsoft uses the same concept that this repo uses? Let's hope they release the codebase. I heard it will be at the end of January.
-
Finally open source realistic voice cloning
-
Feel obligated to drop this here: https://archive.ph/2023.03.06-001755/https://www.washingtonpost.com/technology/2023/03/05/ai-voice-scam/ A few of you thought I was being ridiculous about misuse. Well - one of my greatest concerns has officially happened. It makes me ashamed to have open sourced this. My own grandmother was scammed in this way a few years ago and I have never been so furious in my life. To think that I have enabled some scumbag to do it to someone else absolutely kills me. Next up is a police state (or just some random police) using this tech to frame an innocent person. I'm sure it won't receive as much press coverage (if any). But don't mind me. This should be totally open to everyone. There should be a button that anyone can press to impersonate anyone else with no controls whatsoever. The world is totally ready for this. All the good things that this tech can be used for overwhelm the bad, right?
-
My man drew a whole ass graph to prove tortoise is irrelevant! 😂
On Mon, 6 Mar 2023 at 11:32 AM 152334H ***@***.***> wrote:
I don't understand why you are specifically concerned by this event. I
think it is fairly clear that most cases in the last month are attributable
to 11labs, not TorToiSe, and not even from fine-tuning, but merely from
their zero-shot VC feature.
On the deepfake fidelity vs programming competency graph, it is highly
unlikely that TorToiSe was responsible (even indirectly) for any pareto
improvement:
[image: image]
<https://user-images.githubusercontent.com/54623771/223035581-004853a7-a24c-4c8c-af89-81d2ced09f6e.png>
--
*Mr. Bashir,*
*CEO, AMOXT Pvt. Ltd*
|
-
Surely you're joking. I know old people are a bit slow but they're nowhere near as slow as using tortoise. Unless suddenly the Indians have all got RTX 3090s and have hours to spend training models, and then somehow predict, in the future, exactly what will be said and manage to nail it ahead of time, there is still no way this could lead back to you.
They were probably using CorentinJ's real-time voice cloning pipeline, or more than likely something like Let's be honest, they're about as cheap as they come, and they don't have the cash or resources to power your tool, or the patience.
Also, you sent an article that specifically states:
"And there’s little legal precedent for courts to hold the companies that make the tools accountable for their use."
So if anything, it just validates the point of view that it's not going to
come back to you.
…On Mon, 6 Mar 2023, 02:52 James Betker, ***@***.***> wrote:
Feel obligated to drop this here:
https://archive.ph/2023.03.06-001755/https://www.washingtonpost.com/technology/2023/03/05/ai-voice-scam/
-
Yes, it is likely 11labs. 11labs is Tortoise, trained longer on better data with some tweaks to the decoder stack.
-
I think 11Labs uses DelightfulTTS from Microsoft, because one of the main founders (dunky11) has already published a voice cloning repo based on it.
-
Okay so... after this entire thread... which new repo has the ability to train the voice model? So much has come out so far. I've tried 11labs, but I want something I can have more control over locally. So what should I actually use? voicesmith? ai-voice-cloning? Or anything else?
-
Didn't know there was this much drama in the TTS world lol. Thanks for the great work @neonbjb
-
neonbjb has no obligation to provide anything. It's just that saying this is all due to ethics when the horse has already long left the barn is hard to believe. You're still guarding the door to the TTS vault at Fort Knox with a shotgun while every other snot-nosed teenage TikToker and their mother is already running around using it. Also the implication that power should be gatekept by corporations tends to rub people the wrong way on a place like this. Keeping tortoise under lock clearly isn't going to stop abuse, which is still running rampant even as advanced TTS remains under the control of corporations. There is nothing wrong with someone who wants to make money off their stuff, rich or poor, or even just keep it secret because they feel like it. If he just came out and said that, maybe people would understand more. It's just that saying this is the morally correct position, and by extension that releasing fully unlocked software is not... rubs people the wrong way. But I suppose everyone is entitled to their opinion. I like tortoise and I thank neonbjb for releasing it, which he didn't have to do. It's a fun little tool, although in its current state it's more of a novelty than a serious TTS tool, useful mostly for playing around with the preexisting voices. It hardly matters though, since superior paid TTS options already exist, as evidenced by the flood of YouTube videos, and it's only a matter of time before someone releases similarly superior open source frameworks.
-
"Western governments are incompetent"
That's true, China is far better at using AI technology for evil.
…On Sat, 22 Apr 2023, 17:51 James Betker, ***@***.***> wrote:
Also the implication that power should be gatekept by corporations tends
to rub people the wrong way on a place like this.
That's not my belief. I think this power needs to be gate kept by
*someone* responsible. I am not a "people person" and have no desire to
be this gatekeeper. Western governments are utterly incompetent when it
comes to technology. I wish that were not the case but I have no idea how
to fix it. There are only two entities left to manage this sort of thing,
then: corps and nonprofits.
If someone from the FSF came to me today and offered to take over
stewardship of Tortoise fine-tuning, I'd gladly hand it off to them. As
long as someone is in charge and actually thinking about the safety of
these things, I'm happy.
Somehow I don't think that'd make the folks in this thread happy, though.
While everyone here likes to talk about "free software" and "gatekeeping",
I'm quite certain that in most cases their motivations are very
self-centered.
Regardless - the cat's out of the bag and someone basically wrote a
tutorial on how to do this. If you're really hell bent on fine-tuning
Tortoise (spoiler alert: it's not easy!), you can.
-
I am working on voice training at the moment. If it works, I'm planning to release my next video with a voice-over. It will be on https://www.youtube.com/SECourses and I will update here as I progress. I am using 2 hours of quality speech.
-
Today I published my fine-tuning tutorial, which took over 8 days of work. It makes fine-tuning easy to follow and do: "Master Deep Voice Cloning in Minutes: Unleash Your Vocal Superpowers! Free and Locally on Your PC". The tutorial is based on Ozen Toolkit for data preprocessing.
-
I am unlikely to be the straw to break your position on fine-tuning, but for the OS community to have a fine-tuning guide from the man himself would be so incredible. I will echo what others have said in that this achievement from a single developer is godly. The encouragement (and tough love) you seem to be getting stems from the true potential we all see in this project. TY 🤘
-
I don't get all the bitching. You fuckers are lucky he didn't just remove the repository. I was able to fine-tune voices using training scripts I globbed from different projects, and the quality is better than 11labs. His not releasing it to you script kiddies is a good move. Him leaving the code up for programmers who will spend time reading and implementing instead of bitching like spoiled kids is also right. Learn to code, you silly brats. If you want to pay me to show you how to train this and to help you with the code pipeline that does all the work for dataset creation and inference, feel free to contact me. But it won't be inexpensive - I can charge $120/hr for my coding - so you are looking at about $1000 USD for my help. I have integrated this for several companies now. It works great, but don't expect "real time". Inference takes between 7 and 20 seconds (and can take a lot more depending on settings), and the fine-tuning takes about 3 hours for a 2 hour dataset. It takes quite a bit of time to build the training data and get everything clean enough to be useful. There IS NO REASON to berate NEO. You guys are making open source horrible with your entitled bullshit. Props to NEO for keeping his cool.
-
This repo is obsolete. Coqui is much better and works much faster.
-
wow, this place brings back memories
-
So I assume the Colab isn't actually fine-tuning. That makes sense, because for some voices I can get close enough that it sounds good. But take my own voice, for example: I have no idea why, but it keeps making me sound like a posh British man.
Is there any way to improve the cloning of some voices?