Add Elevenlabs service #83

Merged

Conversation

mohit2152sharma
Contributor

@mohit2152sharma mohit2152sharma commented Feb 20, 2024

Added support for Elevenlabs, with broader parameter support. A voice can be selected by either voice_name or voice_id, and its settings can be changed through the voice_settings parameter.
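For readers who want to picture the voice_name / voice_id selection described above, here is a minimal sketch. It is not the code from this PR: the `Voice` record, the `resolve_voice` helper, the precedence of voice_id over voice_name, and the fallback to the first available voice are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Voice:
    # Hypothetical stand-in for a voice record returned by the API.
    voice_id: str
    name: str


def resolve_voice(
    voices: Sequence[Voice],
    voice_name: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> Voice:
    """Pick a voice by id first, then by name, else use the first voice.

    The precedence and the fallback are assumptions of this sketch.
    """
    if not voices:
        raise ValueError("no voices available")
    if voice_id is not None:
        for voice in voices:
            if voice.voice_id == voice_id:
                return voice
        raise ValueError(f"no voice with id {voice_id!r}")
    if voice_name is not None:
        for voice in voices:
            if voice.name == voice_name:
                return voice
        raise ValueError(f"no voice named {voice_name!r}")
    # Neither selector given: fall back to the first available voice.
    return voices[0]
```

With this shape, a caller can pass whichever identifier they have at hand, and an unknown name or id fails loudly instead of silently switching voices.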

Remaining tasks:

@mohit2152sharma mohit2152sharma changed the title Sar 365 elevenlabs voiceover Add elvenlabs service Feb 20, 2024
@mohit2152sharma
Contributor Author

@osolmaz, have you had a chance to review the PR? If any changes are needed, let me know.

@osolmaz osolmaz changed the title Add elvenlabs service Add Elevenlabs service Feb 25, 2024
@osolmaz
Collaborator

osolmaz commented Feb 25, 2024

Thank you!

I did some minor refactors.

I noticed some issues that might be related to the library itself: for example, cached audio files are not reused and are regenerated on every run. That is a problem because the API is not free. I will try to resolve those now.

In the meantime, can you add documentation? For example, I saw this message, but there was no such section on that page:
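To make the caching concern concrete, here is a rough sketch of the behavior one would want: audio is keyed by a hash of the text and settings, and the paid API is only called on a cache miss. This is not the library's actual cache implementation; `cache_key`, `get_or_synthesize`, and the `synthesize` callback are hypothetical names used only for this illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(text: str, settings: dict) -> str:
    # Serialize with sorted keys so equal inputs always hash the same.
    payload = json.dumps({"text": text, "settings": settings}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_or_synthesize(
    text: str,
    settings: dict,
    cache_dir: Path,
    synthesize: Callable[[str, dict], bytes],
) -> Path:
    """Return the cached audio file, calling `synthesize` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    audio_path = cache_dir / f"{cache_key(text, settings)}.mp3"
    if not audio_path.exists():
        # Cache miss: this is the only place the (paid) API would be hit.
        audio_path.write_bytes(synthesize(text, settings))
    return audio_path
```

Because the key covers both the text and the settings, changing a voice setting produces a new file, while re-rendering an unchanged scene reuses the existing one.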

Check out https://voiceover.manim.community/en/stable/services.html#elevenlabs to learn how to create an account and get your subscription key.

@osolmaz
Collaborator

osolmaz commented Feb 25, 2024

@mohit2152sharma I improved caching behavior and enabled transcription with Whisper by default. This is necessary to use bookmarks.

Can you check whether the Elevenlabs API returns word boundaries (timestamps for the beginning of each word in the audio)? I looked briefly and couldn't find them, but I feel like they might be hidden somewhere.

Collaborator

@osolmaz osolmaz left a comment

I've added the remaining tasks to your first post; also see my comments.

@mohit2152sharma
Contributor Author

mohit2152sharma commented Feb 25, 2024

@osolmaz, I added the documentation. For the bookmark part, it wasn't working for me until I changed "voice_settings": self.voice.model_dump(exclude_none=True). (I tested it with examples/bookmark-example.py using ElevenlabsService, and the animations were triggered at the specified bookmarks.)

Regarding caching, I assumed it was a bug because it wasn't respecting the --disable_caching flag. But after using Elevenlabs for a couple of days, I have realised that it was good behaviour. Thanks for reverting it.
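As a side note on the model_dump(exclude_none=True) call mentioned above: in pydantic v2, that option drops fields whose value is None from the resulting dict. The sketch below imitates that effect with the standard library only, so it runs without pydantic. The `VoiceSettings` dataclass, its field names, and `dump_without_none` are illustrative assumptions, not the classes used in this PR.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VoiceSettings:
    # Hypothetical settings; every field is optional and defaults to None.
    stability: Optional[float] = None
    similarity_boost: Optional[float] = None
    style: Optional[float] = None


def dump_without_none(settings: VoiceSettings) -> dict:
    """Stdlib stand-in for pydantic's model_dump(exclude_none=True)."""
    return {key: value for key, value in asdict(settings).items() if value is not None}
```

The practical difference is that only fields the user actually set end up in the dict; unset fields are omitted instead of being included as explicit None values.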

@osolmaz
Collaborator

osolmaz commented Feb 25, 2024

I just ran the bookmark example using the default voice, the quality out of the box is insane.

You could get to something very reasonable with a little tweaking.

BookmarkExample.mp4

@osolmaz
Collaborator

osolmaz commented Feb 25, 2024

Btw, --disable_caching is necessary because of a Manim bug with adding sound.

@osolmaz osolmaz merged commit 3e5fa32 into ManimCommunity:main Feb 25, 2024
1 check passed

2 participants