A program for automatically dubbing any video file into many languages.
This Python script extracts the audio from a video file, transcribes it, translates it into a different language, generates a new audio file with the translated text, and then merges it with the original video.
- Python 3.8 or higher
- FFmpeg
- Google Cloud Text-to-Speech API: Used to generate the audio for the translated text.
- Google Cloud Translate API: Used to translate the transcribed text into a different language.
- Whisper ASR: Used to transcribe the audio from the video file.
- spaCy: Used for natural language processing tasks, such as tokenization and syllable counting.
- PyDub: Used for manipulating audio files.
- MoviePy: Used for extracting the audio from the video file.
- Clone this repository:
git clone https://github.com/am-sokolov/videodubber.git
- Install the required Python packages:
pip install -r requirements.txt
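Once the packages are installed, it can help to confirm the two non-pip prerequisites listed above (Python 3.8 or higher, and FFmpeg). The following is a minimal sketch, not part of this repository: the function name check_prerequisites is hypothetical, and it assumes the FFmpeg executable is called ffmpeg and is expected on your PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version taken from the requirements list


def check_prerequisites(ffmpeg_name="ffmpeg"):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper: the executable name "ffmpeg" is an assumption.
    """
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(
            "Python %d.%d or higher is required, found %d.%d"
            % (MIN_PYTHON + tuple(sys.version_info[:2]))
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(ffmpeg_name) is None:
        problems.append("%s was not found on PATH" % ffmpeg_name)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

If the script prints nothing, both prerequisites were found.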
This script uses Google Cloud's Text-to-Speech and Translate APIs, which require authentication. Follow these steps to get your credentials:
- Create a new project in the Google Cloud Console.
- Enable the Text-to-Speech and Translate APIs for your project.
- Create a new service account for your project in the Service Accounts page.
- Create a new JSON key for your service account, and download it. This is your credentials file.
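Before passing the downloaded key to the script, you may want to check that the file really is a service account key. The sketch below is an illustration only and is not how this script loads credentials: validate_credentials_file is a hypothetical helper, and the list of fields it checks is a small subset of what a Google Cloud service account key file contains.

```python
import json
from pathlib import Path

# A subset of the fields found in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def validate_credentials_file(path):
    """Return a list of problems found in the key file (empty if it looks fine).

    Hypothetical helper; it only inspects the JSON and never contacts Google.
    """
    path = Path(path)
    if not path.is_file():
        return ["file not found: %s" % path]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return ["not valid JSON: %s" % exc]
    if not isinstance(data, dict):
        return ["expected a JSON object at the top level"]
    problems = ["missing field: %s" % f for f in REQUIRED_FIELDS if f not in data]
    # Keys created on the Service Accounts page have type "service_account".
    if data.get("type") not in (None, "service_account"):
        problems.append("unexpected key type: %r" % data["type"])
    return problems
```

An empty result only means the file has the expected shape; it does not prove that the key is active or that the APIs are enabled for the project.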
Run the script with the following command:
python main.py --input <path_to_video_file> --voice <target_voice> --credentials <path_to_credentials_file> --source_language <source_language>
- <path_to_video_file>: Path to the source video file.
- <target_voice>: Target dubbing voice name from Google Cloud Text-to-Speech Voices. Default is "es-US-Neural2-B". Recommended voices are:
  - English: "en-US-Neural2-J"
  - Spanish: "es-US-Neural2-B"
  - German: "de-DE-Neural2-D"
  - Italian: "it-IT-Neural2-C"
  - French: "fr-FR-Neural2-D"
  - Russian: "ru-RU-Wavenet-D"
  - Hindi: "hi-IN-Neural2-B"

  Feel free to use any other voice.
- <path_to_credentials_file>: Path to the Google Cloud credentials JSON file.
- <source_language>: Source language, e.g. "english".
The fully supported source languages are currently: English, German, French, Italian, Catalan, Chinese, Croatian, Danish, Dutch, Finnish, Greek, Japanese, Korean, Lithuanian, Macedonian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Ukrainian.
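To make the command-line options above concrete, here is a sketch of an argparse setup that accepts the same four flags. It is not copied from main.py: which flags are required is an assumption, and voice_language_code is a hypothetical helper that relies on Google voice names starting with a language code such as "de-DE".

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (a sketch, not main.py)."""
    parser = argparse.ArgumentParser(description="Dub a video into another language.")
    # Assumption: everything except --voice is required.
    parser.add_argument("--input", required=True, help="path to the source video file")
    parser.add_argument("--voice", default="es-US-Neural2-B",
                        help="Google Cloud Text-to-Speech voice name")
    parser.add_argument("--credentials", required=True,
                        help="path to the Google Cloud credentials JSON file")
    parser.add_argument("--source_language", required=True,
                        help='source language, e.g. "english"')
    return parser


def voice_language_code(voice):
    """Return the language code prefix of a voice name (hypothetical helper).

    For example, "de-DE-Neural2-D" yields "de-DE".
    """
    parts = voice.split("-")
    if len(parts) < 3:
        raise ValueError("unexpected voice name: %r" % voice)
    return "-".join(parts[:2])


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Dubbing", args.input, "into", voice_language_code(args.voice))
```

Deriving the target language from the voice name is one way to avoid a separate target-language flag; whether the script does this internally is not stated here.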
The script will create a new video file with the same name as the input video file, but with "_translated" appended to the name. The new video file will have the original video with the new translated audio track.
Additionally, the script will create a new .wav audio track with the same name as the input video file, containing only the translated audio.
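The naming rule for the two output files can be sketched with pathlib. This is an illustration under assumptions: output_paths is a hypothetical function, and it assumes both outputs are written next to the input and that the dubbed video keeps the input's file extension.

```python
from pathlib import Path


def output_paths(input_video):
    """Return (dubbed_video_path, translated_audio_path) for an input video.

    Hypothetical helper. Assumes outputs sit beside the input and the dubbed
    video keeps the original extension.
    """
    video = Path(input_video)
    # Same name with "_translated" appended before the extension.
    dubbed = video.with_name(video.stem + "_translated" + video.suffix)
    # Same name with a .wav extension.
    audio = video.with_suffix(".wav")
    return dubbed, audio
```

For an input named trump_speech.mp4 this yields trump_speech_translated.mp4 and trump_speech.wav.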
You can test this script with any video that contains narration. For example, you can use this free video of US President Donald Trump speaking at the Young Black Leadership Summit at the White House.
Here are the step-by-step instructions for testing:
- Download the video from the link above.
- Save the video file in the same directory as the script under the name trump_speech.mp4.
- Run the script with the downloaded video file as the input. For example, if you saved the video as trump_speech.mp4, you would run:
  python main.py --input trump_speech.mp4 --voice de-DE-Neural2-B --credentials path_to_credentials.json --source_language english
  Replace path_to_credentials.json with the path to your Google Cloud credentials JSON file.
- The script will create a new .wav audio file named trump_speech.wav in the same directory. This file contains the translated audio.
- Listen to the trump_speech.wav file to verify that the script worked correctly. The audio should be a translation of the original speech in the video.
Feel free to replace de-DE-Neural2-B with the desired target voice.
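Besides listening to the result, a quick automated sanity check is to confirm that the generated .wav file is readable and has a plausible duration. The sketch below uses Python's standard wave module; wav_duration_seconds is a hypothetical helper, and it assumes the script writes uncompressed PCM audio, which is the only format the wave module can read.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM .wav file in seconds (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()   # total number of audio frames
        rate = wav.getframerate()   # frames per second
    return frames / float(rate)


if __name__ == "__main__":
    # File name taken from the testing steps above.
    print("Translated audio length (s):", wav_duration_seconds("trump_speech.wav"))
```

A duration close to that of the original video suggests the translated track covers the whole speech; a much shorter one points to a failed transcription or synthesis step.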
Alexey Sokolov (c). This project is licensed under the terms of the MIT license included in this repository.