
video-speech-translator

Localize video speech from one language to another: translate the speech, clone the original speaker's voice, and lip-sync the video.

Demo

  • Chinese speech to French speech

Chinese speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/linzhiling-shorts.mp4

French speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/linzhiling-shorts-french.mp4

  • Chinese speech to English speech

Chinese speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/mayun-shorts-1.mp4

English speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/mayun-shorts-1-en.mp4

Features

  • translate speech audio
  • clone the voice of the original speaker
  • lip sync
  • subtitles
  • video watermark

TODO

  • support Japanese speech synthesis
  • improve generated video quality
  • face swap

Installation

0. Clone this repository and create the necessary directories

git clone https://github.com/crowaixyz/video-speech-localization.git

mkdir -p output/raw_audio
mkdir -p output/raw_speech
mkdir -p output/translated_speech
mkdir -p output/lip_synced_video
mkdir -p output/final_video
mkdir -p pretrained_models

1. Install ffmpeg and MeCab

conda install -c conda-forge ffmpeg

# install additional packages needed to synthesize Japanese speech
# yum install mecab  # MeCab is a popular Japanese morphological analyzer
# ln -s /etc/mecabrc /usr/local/etc/mecabrc  # create a symbolic link so MeCab can find its config
# yum install mecab-ipadic
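
Once MeCab is installed, a quick sanity check from Python looks like this. This is a minimal sketch, assuming the mecab-python3 bindings (used by TTS for Japanese) are installed; the sample text is illustrative.

# Minimal MeCab sanity check, assuming the mecab-python3 bindings are
# installed (pip install mecab-python3); uses the system dictionary
# installed above (mecab-ipadic).
import MeCab

tagger = MeCab.Tagger()
# prints one morpheme per line with part-of-speech information
print(tagger.parse("こんにちは、世界"))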

2. Create the main virtual environment and install the dependencies: faster-whisper, requests, gradio, TTS, moviepy, and pydub

conda create -n video-speech-localization python=3.10
conda activate video-speech-localization

pip install faster-whisper

pip install requests

pip install gradio

pip install TTS

pip install moviepy

pip install pydub

conda deactivate
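
To show how these packages fit together, here is a minimal sketch (not the app's actual code; the model name, prompt, and file paths are illustrative) that transcribes speech with faster-whisper and translates each segment through the OpenAI chat completions API using requests and the environment variables described under Usage below.

# Minimal sketch: transcribe with faster-whisper, then translate each
# segment via the OpenAI chat completions API with requests.
# The model name, prompt, and paths are illustrative.
import os

import requests
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cpu", compute_type="int8")
segments, info = model.transcribe("output/raw_speech/speech.wav", beam_size=5)

def translate(text, target_lang="French"):
    resp = requests.post(
        os.environ["OPENAI_CHAT_API_URL"],  # e.g. https://api.openai.com/v1/chat/completions
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "user", "content": f"Translate into {target_lang}: {text}"},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {translate(segment.text)}")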

3. Create the Spleeter virtual environment and install Spleeter

Tip: install Spleeter in an isolated virtual environment, because it pulls in dependencies that may conflict with other packages (such as TTS).

conda create -n video-speech-localization-spleeter python=3.10
conda activate video-speech-localization-spleeter

pip install spleeter

conda deactivate
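
Spleeter separates the speech (vocals) from background audio. A minimal sketch of the separation step, assuming the 2stems model from the FAQ below; the paths are illustrative.

# Minimal sketch: split an audio track into vocals + accompaniment with
# Spleeter's 2stems model (paths are illustrative). Run inside the
# video-speech-localization-spleeter environment.
from spleeter.separator import Separator

separator = Separator("spleeter:2stems")
# writes vocals.wav and accompaniment.wav under output/raw_speech/<basename>/
separator.separate_to_file("output/raw_audio/audio.wav", "output/raw_speech/")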

4. Create the video-retalking virtual environment and install video-retalking

Tip: install video-retalking in an isolated virtual environment, because it pulls in dependencies that may conflict with other packages.

# clone video-retalking github repository into current directory
git clone https://github.com/vinthony/video-retalking.git
cd video-retalking

conda create -n video-speech-localization-video-retalking python=3.8
conda activate video-speech-localization-video-retalking

# Please follow the instructions at https://pytorch.org/get-started/previous-versions/
# e.g. CUDA 11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
# e.g. CUDA 11.7
pip install torch==2.0.1 torchvision==0.15.2

pip install -r requirements.txt

conda deactivate
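
The main app can then drive video-retalking's inference script through this isolated environment. A hedged sketch using subprocess: the --face/--audio/--outfile flags come from the video-retalking README, while the environment-variable lookup and file paths here are illustrative.

# Hedged sketch: invoke video-retalking's inference.py from the main app
# through the isolated conda environment. CONDA_VIRTUAL_ENV_PATH is the
# environment variable described under Usage; paths are illustrative.
import os
import subprocess

env_python = os.path.join(
    os.environ["CONDA_VIRTUAL_ENV_PATH"],
    "video-speech-localization-video-retalking", "bin", "python",
)
subprocess.run(
    [
        env_python, "inference.py",
        "--face", "../input.mp4",                             # source video
        "--audio", "../output/translated_speech/speech.wav",  # translated speech
        "--outfile", "../output/lip_synced_video/result.mp4", # lip-synced result
    ],
    cwd="video-retalking",
    check=True,
)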

Usage

# activate main virtual environment
conda activate video-speech-localization

# export environment variables
export OPENAI_CHAT_API_URL=YOUR_OPENAI_CHAT_API_URL        # e.g. https://api.openai.com/v1/chat/completions
export OPENAI_API_KEY=YOUR_OPENAI_API_KEY                  # e.g. sk-xxxxxxxxxxxxxxxxxxxxxxxx
export VSL_SERVER_NAME=YOUR_SERVER_NAME                    # e.g. 127.0.0.1
export VSL_SERVER_PORT=YOUR_SERVER_PORT                    # e.g. 7860
export CONDA_VIRTUAL_ENV_PATH=YOUR_CONDA_VIRTUAL_ENV_PATH  # e.g. /data/anaconda3/envs/
export CUDA_HOME=YOUR_CUDA_HOME                            # e.g. /usr/local/cuda-11.8/
export CUDA_VISIBLE_DEVICES=YOUR_CUDA_VISIBLE_DEVICES      # e.g. 0
export LD_LIBRARY_PATH=YOUR_LD_LIBRARY_PATH                # e.g. /usr/local/cuda/lib64:/usr/local/lib:$LD_LIBRARY_PATH

# run the app, then open the URL (default http://localhost:7860/) in your browser
python app.py
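
For reference, the final muxing step (adding the translated audio back and overlaying the watermark listed under Features) can be done with moviepy along these lines. This is a minimal sketch, not the app's exact code; the paths and watermark text are illustrative, and TextClip requires ImageMagick.

# Minimal sketch: mux the translated audio back in and overlay a text
# watermark with moviepy (paths and text are illustrative; TextClip
# requires ImageMagick to be installed).
from moviepy.editor import (AudioFileClip, CompositeVideoClip, TextClip,
                            VideoFileClip)

video = VideoFileClip("output/lip_synced_video/result.mp4")
audio = AudioFileClip("output/translated_speech/speech.wav")
video = video.set_audio(audio)

watermark = (TextClip("my-watermark", fontsize=24, color="white")
             .set_duration(video.duration)
             .set_position(("right", "bottom")))

final = CompositeVideoClip([video, watermark])
final.write_videofile("output/final_video/final.mp4")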

FAQ & Tips

  1. Manually download models in advance in order to use Spleeter
  • download the pretrained models, unarchive them, and put them in ./pretrained_models/2stems.
  2. Manually download the faster-whisper pretrained models and configs
  • download the pretrained models and configs, put them in /data/.huggingface/cache/hub/models--guillaumekln--faster-whisper-large-v2/refs/main, then load the model like this:
from faster_whisper import WhisperModel

whisper_model = "/data/.huggingface/cache/hub/models--guillaumekln--faster-whisper-large-v2/refs/main"
model = WhisperModel(whisper_model, device="cpu", compute_type="int8")
  3. Manually download the Coqui XTTS pretrained models and configs
  • download the pretrained models and configs and put them in /root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
  • then load the model like this:
from TTS.api import TTS

tts = TTS(
    model_path="/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/",
    config_path="/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json"
).to("cpu")
  • if you encounter the error Model is not multi-lingual but "language" is provided., try modifying the code in /path/to/conda-envs/video-speech-localization/lib/python3.10/site-packages/TTS/api.py
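  • once loaded, the model can synthesize translated text in the original speaker's voice; a minimal usage sketch (the text and paths are illustrative):
# Minimal usage sketch: synthesize translated text while cloning the
# original speaker's voice (text and paths are illustrative).
tts.tts_to_file(
    text="Bonjour tout le monde.",
    speaker_wav="output/raw_speech/speaker.wav",  # reference clip of the speaker
    language="fr",
    file_path="output/translated_speech/speech_fr.wav",
)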
  4. Manually download models in advance in order to use video-retalking
