Commit 5652cc0 (parent: 51b02b8): 4 changed files with 92 additions and 40 deletions. The first changed file is the project README, shown below in its updated form (previously 72 lines, now 94).
# YouTube Auto-Dub

This repository serves as a starting point for developing a FastAPI backend for dubbing YouTube videos by capturing and inferring the voice timbre using OpenVoice.

YouTube Auto-Dub is a backend application for automated voice dubbing of YouTube videos. It uses Docker for deployment, OpenVoice for voice timbre recognition, and FastAPI for the web service, and it produces dubbed YouTube videos with a text-to-speech model that matches the original voice timbre.

![Example Image](static/screen.png)
## Core Features

- **Voice Timbre Recognition**: Utilizes OpenVoice to recognize the voice timbre of the original YouTube video.
- **Text-to-Speech Synthesis**: Downloads and processes the subtitles, translates them, and converts them into speech matching the original voice timbre as closely as possible (a high-level sketch of this pipeline appears below).
- **Flexible Deployment**: Supports deployment via GitHub Actions and Cloud Build, with compatibility for Cloud Run. Currently, inference is performed on CPU. For setting up Cloud Run with Terraform, refer to the following repository for instructions:

[FastAPI-CloudRun-Starter](https://github.com/mazzasaverio/fastapi-cloudrun-starter)
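
To make the end-to-end flow concrete, here is a high-level sketch of the dubbing pipeline described above. Every function in it is a hypothetical placeholder for one step (video/subtitle download, OpenVoice timbre extraction, translation, text-to-speech, reassembly); none of these names come from this repository.

```python
# Hypothetical outline of the dubbing pipeline; every helper is a placeholder,
# not a function from this repository.
from pathlib import Path
from typing import List, Tuple


def download_video_and_audio(url: str) -> Tuple[Path, Path]:
    """Fetch the source video and its audio track, returning (video_path, audio_path)."""
    raise NotImplementedError


def fetch_subtitles(url: str) -> List[str]:
    """Retrieve the original-language transcript segments."""
    raise NotImplementedError


def extract_voice_timbre(audio_path: Path):
    """Compute an embedding of the speaker's voice timbre (the OpenVoice step)."""
    raise NotImplementedError


def translate_segments(segments: List[str], target_language: str) -> List[str]:
    """Translate each subtitle segment into the target language."""
    raise NotImplementedError


def synthesize_speech(segments: List[str], timbre) -> Path:
    """Run text-to-speech on the translated segments and match the extracted timbre."""
    raise NotImplementedError


def assemble_video(video_path: Path, dubbed_audio: Path, output_dir: Path) -> Path:
    """Mux the dubbed audio back onto the video and write the final file."""
    raise NotImplementedError


def dub_youtube_video(url: str, target_language: str = "en") -> Path:
    """Orchestrate the steps listed under Core Features, start to finish."""
    video_path, audio_path = download_video_and_audio(url)
    timbre = extract_voice_timbre(audio_path)
    segments = fetch_subtitles(url)
    translated = translate_segments(segments, target_language)
    dubbed_audio = synthesize_speech(translated, timbre)
    return assemble_video(video_path, dubbed_audio, Path("backend/data/final_videos"))
```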
## Getting Started

To get started with YouTube Auto-Dub, follow these steps:

### 1. Environment Setup
For local development, we recommend setting up a conda environment with:

```bash
conda install mamba -n base -c conda-forge
mamba create -n youtube-auto-dub python=3.9 -y
mamba install -n youtube-auto-dub pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia -y
conda activate youtube-auto-dub
pip install -r requirements.txt
```
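As a quick sanity check (not part of the repository's instructions), you can confirm that the pinned versions resolved correctly before moving on:

```python
# Verify the freshly created environment.
import torch
import torchaudio

print("torch:", torch.__version__)            # expected: 1.13.1
print("torchaudio:", torchaudio.__version__)  # expected: 0.13.1
print("CUDA available:", torch.cuda.is_available())  # False is fine; inference currently runs on CPU
```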
### 2. Download Required Checkpoints

Install `aria2` and `unzip` if needed, then download and extract the model checkpoints necessary for voice timbre recognition and synthesis:

```bash
sudo apt -y install -qq aria2 unzip
sudo aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://myshell-public-repo-hosting.s3.amazonaws.com/checkpoints_1226.zip -d /code -o checkpoints_1226.zip
sudo unzip /code/checkpoints_1226.zip -d backend/checkpoints
```
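It can be worth confirming that the checkpoint files ended up under `backend/checkpoints` before starting the server. The snippet below is only an illustrative check; the exact directory layout inside the archive is an assumption:

```python
# List the extracted checkpoint files (layout inside the archive is an assumption).
from pathlib import Path

ckpt_dir = Path("backend/checkpoints")
files = sorted(p for p in ckpt_dir.rglob("*") if p.is_file())
print(f"{len(files)} files under {ckpt_dir}")
for path in files[:5]:
    print(" -", path.relative_to(ckpt_dir))
```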
### 3. Running the Application

With the environment set up and the checkpoints downloaded, navigate to the backend directory and start the application using:

```bash
cd backend
uvicorn app.main:app --reload
```
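For orientation, a minimal `app/main.py` exposing endpoints of this shape might look like the sketch below. This is not the project's actual code: the HTTP method, request model, in-memory job store, and `process_video` helper are all assumptions made for illustration.

```python
# Hypothetical sketch of a minimal app/main.py; the real project code may differ.
from uuid import uuid4

from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()
jobs: dict[str, str] = {}  # video_id -> status; an in-memory store for illustration only


class DownloadRequest(BaseModel):
    url: str  # assumed payload: the YouTube link to dub


def process_video(video_id: str, url: str) -> None:
    """Placeholder for the real pipeline: download, timbre extraction, translation, TTS, assembly."""
    jobs[video_id] = "processing"
    # ... run the dubbing pipeline here ...
    jobs[video_id] = "done"


@app.post("/api/v1/download/")
def submit_video(request: DownloadRequest, background_tasks: BackgroundTasks):
    video_id = uuid4().hex
    jobs[video_id] = "queued"
    background_tasks.add_task(process_video, video_id, request.url)
    return {"video_id": video_id}


@app.get("/api/v1/status/{video_id}")
def get_status(video_id: str):
    return {"video_id": video_id, "status": jobs.get(video_id, "unknown")}
```

A design like this returns a `video_id` immediately and lets the heavy processing run in the background, which matches the status-polling workflow described in the Usage section below.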
## Usage

To use YouTube Auto-Dub, begin by submitting a YouTube link via the endpoint:

```
/api/v1/download/
```

The application will process the video: it recognizes the voice timbre, translates the subtitles, synthesizes the translated speech matching the original timbre, and then assembles the final video, which is saved in `backend/data/final_videos`. With the video ID returned in the response, you can check the processing status through the endpoint:

```
/api/v1/status/{video_id}
```
Finally, you can download the final video from the endpoint below, substituting the returned video ID for `{video_id}`:

```
/api/v1/download-video/{video_id}
```
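Putting the three endpoints together, a client session against a locally running instance could look like the following sketch. The request body, the response fields (`video_id`, `status`), and the status values are assumptions made for illustration; adapt them to the actual API.

```python
# Hypothetical client walkthrough; payload and response shapes are assumptions.
import time

import requests

BASE_URL = "http://127.0.0.1:8000"

# 1. Submit a YouTube link for dubbing.
resp = requests.post(
    f"{BASE_URL}/api/v1/download/",
    json={"url": "https://www.youtube.com/watch?v=EXAMPLE"},  # placeholder URL
)
resp.raise_for_status()
video_id = resp.json()["video_id"]

# 2. Poll the processing status until the job finishes.
while True:
    status = requests.get(f"{BASE_URL}/api/v1/status/{video_id}").json()
    if status.get("status") in {"done", "failed"}:
        break
    time.sleep(10)

# 3. Download the final dubbed video.
video = requests.get(f"{BASE_URL}/api/v1/download-video/{video_id}")
video.raise_for_status()
with open("dubbed_video.mp4", "wb") as f:
    f.write(video.content)
```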
## Deployment

This project is designed with cloud deployment in mind. The provided `cloudbuild.yaml` and Terraform configurations facilitate deployment on Google Cloud Platform, specifically using Cloud Run for scalable, serverless application hosting.

## Contributing

Contributions are welcome! Whether you're fixing a bug, adding new features, or improving the documentation, your help is appreciated. Please feel free to fork the repository and submit pull requests.
## Reference and Inspiration

The development of YouTube Auto-Dub was inspired by the following repository:

- [OpenVoice](https://github.com/myshell-ai/OpenVoice): Instant voice cloning technology by MyShell, utilized for voice timbre recognition and synthesis in this project.

## Future Directions

- **Model Improvements**: Explore and integrate better models for voice recognition and synthesis.
- **Serverless GPU Support**: Investigate options for serverless GPU computing to accelerate processing.
- **Frontend Interface**: Develop a user-friendly frontend for easier interaction with the application.
- **Translation Synchronization**: Enhance the synchronization between translated text and video content for a seamless viewing experience.

## License

This project is licensed under the MIT License; see the LICENSE file for details.

The commit also adds a new 30-line file containing a timestamped Italian transcript, in which the speaker lists places to eat near the Duomo di Milano for a few euros:
```
0.01 --> 4.75
ma mangiare al Duomo di Milano con soli
2.409 --> 7.27
5 euro è possibile in un anno di
4.75 --> 9.49
accurate ricerche ho trovato ben 4 posti
7.27 --> 11.53
dove possibile farlo primo è bau buono
9.49 --> 13.870000000000001
uno street food cinese Dove potrete
11.53 --> 16.75
mangiare un Bao fatto a mano a soli
13.87 --> 18.85
€2,50 anche i ravioli sono strepitosi in
16.75 --> 20.948999999999998
alternativa una bella pizza fritta da
18.85 --> 22.75
zia Esterina classica Vi costerà 4 euro
20.949 --> 25.51
Oppure ancora Vi consiglio un bel
22.75 --> 27.67
panzerotto da Luini prezzo da €3 in su e
25.51 --> 29.830000000000002
infine immancabile la pizza di Spontini
27.67 --> 31.57
un trancione di Margherita Vi costerà 5
29.83 --> 34.769999999999996
euro a qualcuno con cui andarci la
31.57 --> 34.77
prossima volta in Duomo e
```
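The file alternates between a `start --> end` timestamp line (in seconds) and the corresponding spoken text. As an illustration only (this parser is not part of the repository), segments in this format could be read like so:

```python
# Hypothetical parser for the "start --> end" / text line pairs shown above.
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def parse_transcript(path: str) -> List[Segment]:
    lines = [ln.strip() for ln in Path(path).read_text(encoding="utf-8").splitlines() if ln.strip()]
    segments = []
    # Lines alternate: "start --> end", then the spoken text for that window.
    for timing, text in zip(lines[0::2], lines[1::2]):
        start, end = (float(part) for part in timing.split("-->"))
        segments.append(Segment(start=start, end=end, text=text))
    return segments
```

Segments like these could then be translated and re-synthesized, with the timestamps reused to keep the dubbed audio aligned with the video.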