Commit

UPDATE
mazzasaverio committed Feb 9, 2024
1 parent 51b02b8 commit 5652cc0
Showing 4 changed files with 92 additions and 40 deletions.
98 changes: 60 additions & 38 deletions README.md
@@ -1,72 +1,94 @@
# YouTube Auto-Dub

This repository aims to establish a starting point for deploying endpoints that use OpenVoice and FastAPI. The main functionality is to recognize the voice timbre from a YouTube video and recreate the same video with a text-to-speech model in the same timbre after translating the subtitles. This is just a basic setup.
This repository serves as a starting point for developing a FastAPI backend for dubbing YouTube videos by capturing and inferring the voice timbre using OpenVoice.

YouTube Auto-Dub is an innovative backend application designed for automated voice dubbing of YouTube videos. Utilizing Docker for deployment, OpenVoice for voice timbre recognition, and FastAPI for web services, this project enables the creation of dubbed YouTube videos with text-to-speech models matching the original voice timbre.
![Example Image](static/screen.png)

## Steps:
## Core Features

1. Submit a YouTube link via the endpoint `/api/v1/download/`.
2. The final processed video is saved in `backend/data/final_videos`.
- **Voice Timbre Recognition**: Utilizes OpenVoice technology to accurately recognize the voice timbre from the original YouTube video.
- **Text-to-Speech Synthesis**: Downloads and processes subtitles, translating them and converting them into speech, matching the original voice timbre as closely as possible.
- **Flexible Deployment**: Supports deployment via GitHub Actions and Cloud Build, with compatibility for Cloud Run deployment, ensuring scalability and ease of use. Currently, inference is performed using CPU. For setting up Cloud Run with Terraform, refer to the following repository for instructions:

## Features
[FastAPI-CloudRun-Starter](https://github.com/mazzasaverio/fastapi-cloudrun-starter)

- Deployment via GitHub Actions and Cloud Build on a Cloud Run.
## Getting Started

So far I have only tried the deployment on Cloud Run (thus, only CPU is used for inference).
To get started with YouTube Auto-Dub, follow these steps:

For a starting template on setting up Cloud Run with Terraform, refer to this link:
[FastAPI-CloudRun-Starter](https://github.com/mazzasaverio/fastapi-cloudrun-starter)
### 1. Environment Setup

## Next Steps
For local development, we recommend setting up a conda environment with:

- Test better models.
- Test serverless GPU.
- Add a frontend.
- Improve translation synchronization.
```bash
conda install mamba -n base -c conda-forge
mamba create -n youtube-auto-dub python=3.9 -y
mamba install -n youtube-auto-dub pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia -y
conda activate youtube-auto-dub
pip install -r requirements.txt
```

## Local Installation Instructions
### 2. Download Required Checkpoints

We recommend the following for local installation:
Download the model checkpoints necessary for voice timbre recognition and synthesis:

```bash
sudo aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://myshell-public-repo-hosting.s3.amazonaws.com/checkpoints_1226.zip -d /code -o checkpoints_1226.zip
sudo unzip /code/checkpoints_1226.zip -d backend/checkpoints
```
conda install mamba -n base -c conda-forge
mamba create -n youtube-auto-dub python=3.9 -y

mamba install -n youtube-auto-dub pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia -y
### 3. Running the Application

conda activate youtube-auto-dub
With the environment set up and checkpoints downloaded, navigate to the backend directory and start the application using:

pip install -r requirements.txt
```bash
cd backend
uvicorn app.main:app --reload
```

Download the checkpoint from [here](https://myshell-public-repo-hosting.s3.amazonaws.com/checkpoints_1226.zip) and extract it to the `checkpoints` folder. Insert the checkpoint found in `checkpoints_1226` into the `backend` folder.
## Usage

conda install mamba -n base -c conda-forge
To use YouTube Auto-Dub, begin by submitting a YouTube link via the endpoint:

mamba create -n youtube-auto-dub python=3.9 -y
```
/api/v1/download/
```

mamba install -n youtube-auto-dub pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia -y
The application will process the video, recognize the voice timbre, translate the subtitles, synthesize the translated speech matching the original timbre, and then assemble the final video. The processed video will be saved in `backend/data/final_videos`. With the video ID returned in the output, you can check the processing status through the endpoint:

conda activate youtube-auto-dub
```
/api/v1/status/{video_id}
```

pip install pytube moviepy fastapi uvicorn loguru youtube-dl youtube-transcript-api librosa
Finally, you can download the final video by using the endpoint:

pip install googletrans==4.0.0-rc1
```
/api/v1/download-video/{video_id}
```

sudo apt -y install -qq aria2 unzip
Replace `{video_id}` with the video's ID.
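The submit, poll, download flow described in this Usage section could be driven by a small client like the sketch below. It is only an illustration under stated assumptions: the base URL is uvicorn's default, the request body field `url` is inferred from `request.url` in `endpoints.py`, and the `video_id` response key and the `"completed"` status string are hypothetical. The route names are kept as module-level settings because `endpoints.py` in this same commit registers `/url-process/` and `/download-final-video/{video_id}`, while the README text still lists `/download/` and `/download-video/{video_id}`.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port
API_PREFIX = "/api/v1"              # prefix taken from the README endpoint paths

# Route names as registered by endpoints.py after this commit; adjust them
# if the running server still exposes the names listed in the README.
SUBMIT_PATH = "/url-process/"
STATUS_PATH = "/status/{video_id}"
DOWNLOAD_PATH = "/download-final-video/{video_id}"


def endpoint(path: str, **params: str) -> str:
    """Build a full URL for one of the route templates above."""
    return BASE_URL + API_PREFIX + path.format(**params)


def submit(youtube_url: str) -> dict:
    """POST the YouTube link and return the decoded JSON response."""
    body = json.dumps({"url": youtube_url}).encode("utf-8")
    req = urllib.request.Request(
        endpoint(SUBMIT_PATH),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_done(video_id: str, done_status: str = "completed",
                    interval: float = 5.0, max_polls: int = 120) -> str:
    """Poll the status endpoint until done_status (hypothetical value) or 'Not Found'."""
    status = "unknown"
    for _ in range(max_polls):
        with urllib.request.urlopen(endpoint(STATUS_PATH, video_id=video_id)) as resp:
            status = json.load(resp)["status"]
        if status in (done_status, "Not Found"):
            break
        time.sleep(interval)
    return status


def dub(youtube_url: str, out_path: str) -> str:
    """Submit a link, wait for processing, then save the final video to out_path."""
    video_id = submit(youtube_url)["video_id"]  # assumption: key name in the response
    status = wait_until_done(video_id)
    if status != "Not Found":
        urllib.request.urlretrieve(endpoint(DOWNLOAD_PATH, video_id=video_id), out_path)
    return status
```

With the server running locally, `dub("https://www.youtube.com/watch?v=pcydlhq2MWI", "dubbed.mp4")` would exercise all three endpoints in order. Polling with a fixed interval is the simplest option here; the actual status strings should be checked against the server code before relying on `done_status`.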

sudo aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://myshell-public-repo-hosting.s3.amazonaws.com/checkpoints_1226.zip -d /code -o checkpoints_1226.zip
## Deployment

sudo unzip /code/checkpoints_1226.zip
This project is designed with cloud deployment in mind. The provided `cloudbuild.yaml` and Terraform configurations facilitate deployment on Google Cloud Platform, specifically using Cloud Run for scalable, serverless application hosting.

<!-- START_SECTION:reference-inspiration -->
## Contributing

Contributions are welcome! Whether you're fixing a bug, adding new features, or improving the documentation, your help is appreciated. Please feel free to fork the repository and submit pull requests.

## Reference and Inspiration

| Repository | Stars | Forks | Last Updated | About |
| :--------------------------------------------------: | :---: | :---: | :----------: | :-------------------------------: |
| [OpenVoice](https://github.com/myshell-ai/OpenVoice) | 13973 | 1213 | 2024-02-09 | Instant voice cloning by MyShell. |
The development of YouTube Auto-Dub was inspired by the following repository:

- [OpenVoice](https://github.com/myshell-ai/OpenVoice): Instant voice cloning technology by MyShell, utilized for voice timbre recognition and synthesis in this project.

## Future Directions

- **Model Improvements**: Explore and integrate better models for voice recognition and synthesis.
- **Serverless GPU Support**: Investigate options for serverless GPU computing to accelerate processing.
- **Frontend Interface**: Develop a user-friendly frontend for easier interaction with the application.
- **Translation Synchronization**: Enhance the synchronization between translated text and video content for a seamless viewing experience.

## License

<!-- END_SECTION:reference-inspiration -->
This project is licensed under the MIT License - see the LICENSE file for details.
4 changes: 2 additions & 2 deletions backend/app/api/v1/endpoints.py
@@ -29,7 +29,7 @@ def extract_video_id(url: str) -> str:
    return match.group(1) if match else None


@router.post("/download/")
@router.post("/url-process/")
async def download_video(request: VideoDownloadRequest, background_tasks: BackgroundTasks):
    # Correctly access youtube_url from the request object
    video_id = extract_video_id(request.url)
@@ -83,7 +83,7 @@ def get_task_status(video_id: str):
    status = task_status.get(video_id, "Not Found")
    return {"video_id": video_id, "status": status}

@router.get("/download-video/{video_id}")
@router.get("/download-final-video/{video_id}")
async def download_video(video_id: str):
    video_path = os.path.join(DATA_DIR, "final_videos", f"{video_id}.mp4")
    if not os.path.exists(video_path):
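
The diff shows only the last line of `extract_video_id`, and `video_id` is later interpolated into a file path under `final_videos`. The sketch below shows one way such a helper could look. The regular expressions are assumptions, not the pattern used in `endpoints.py`; they rely on YouTube IDs currently being 11 characters drawn from letters, digits, `_` and `-`. `is_safe_video_id` is a hypothetical extra check for IDs that arrive as path parameters.

```python
import re
from typing import Optional

# Assumed pattern: an 11-character ID after a common YouTube URL marker,
# not followed by further ID characters.
_VIDEO_ID_IN_URL = re.compile(
    r"(?:v=|youtu\.be/|/embed/|/shorts/)([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])"
)
_BARE_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID found in a YouTube URL, or None if there is none."""
    match = _VIDEO_ID_IN_URL.search(url)
    return match.group(1) if match else None


def is_safe_video_id(video_id: str) -> bool:
    """Hypothetical guard: accept only strings shaped like a video ID.

    Rejecting anything else keeps values such as '../x' out of
    os.path.join(DATA_DIR, "final_videos", f"{video_id}.mp4").
    """
    return _BARE_VIDEO_ID.fullmatch(video_id) is not None
```

Because the function can return `None`, the sketch annotates it as `Optional[str]` rather than `str`.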
30 changes: 30 additions & 0 deletions backend/data/captions/pcydlhq2MWI.it.srt
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
0.01 --> 4.75
ma mangiare al Duomo di Milano con soli
2.409 --> 7.27
5 euro è possibile in un anno di
4.75 --> 9.49
accurate ricerche ho trovato ben 4 posti
7.27 --> 11.53
dove possibile farlo primo è bau buono
9.49 --> 13.870000000000001
uno street food cinese Dove potrete
11.53 --> 16.75
mangiare un Bao fatto a mano a soli
13.87 --> 18.85
€2,50 anche i ravioli sono strepitosi in
16.75 --> 20.948999999999998
alternativa una bella pizza fritta da
18.85 --> 22.75
zia Esterina classica Vi costerà 4 euro
20.949 --> 25.51
Oppure ancora Vi consiglio un bel
22.75 --> 27.67
panzerotto da Luini prezzo da €3 in su e
25.51 --> 29.830000000000002
infine immancabile la pizza di Spontini
27.67 --> 31.57
un trancione di Margherita Vi costerà 5
29.83 --> 34.769999999999996
euro a qualcuno con cui andarci la
31.57 --> 34.77
prossima volta in Duomo e
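
The caption file above is not standard SRT: it has no index lines, timestamps are plain seconds, and consecutive time windows overlap. The following sketch shows how such a file might be parsed and how the overlaps could be trimmed before speech is synthesized for each segment. The `Caption` class and both function names are illustrative and do not come from the repository, and clipping each window to the next start time is only one possible policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


def parse_captions(raw: str) -> List[Caption]:
    """Parse alternating 'start --> end' and text lines into Caption records."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    captions = []
    i = 0
    while i < len(lines):
        if "-->" not in lines[i]:
            i += 1  # skip anything that is not a timing line
            continue
        start_s, end_s = (part.strip() for part in lines[i].split("-->"))
        # Collect text lines up to the next timing line.
        j = i + 1
        text_lines = []
        while j < len(lines) and "-->" not in lines[j]:
            text_lines.append(lines[j])
            j += 1
        captions.append(Caption(float(start_s), float(end_s), " ".join(text_lines)))
        i = j
    return captions


def clip_overlaps(captions: List[Caption]) -> List[Caption]:
    """Trim each caption's end to the next caption's start so windows do not overlap."""
    clipped = []
    for cur, nxt in zip(captions, captions[1:] + [None]):
        end = cur.end if nxt is None else min(cur.end, nxt.start)
        clipped.append(Caption(cur.start, end, cur.text))
    return clipped


SAMPLE = """\
0.01 --> 4.75
ma mangiare al Duomo di Milano con soli
2.409 --> 7.27
5 euro è possibile in un anno di
4.75 --> 9.49
accurate ricerche ho trovato ben 4 posti
"""

if __name__ == "__main__":
    for c in clip_overlaps(parse_captions(SAMPLE)):
        print(f"{c.start:7.3f} -> {c.end:7.3f}  {c.text}")
```

Parsing the timestamps with `float` also absorbs values such as `13.870000000000001` that appear in the file.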
Binary file added static/screen.png