Localize video speech from one language to another: translate the speech, clone the original speaker's voice, and lip-sync the video.
- Chinese speech to French speech
Chinese speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/linzhiling-shorts.mp4
French speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/linzhiling-shorts-french.mp4
- Chinese speech to English speech
Chinese speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/mayun-shorts-1.mp4
English speech: https://futurelog-1251943639.cos.ap-shanghai.myqcloud.com/video/mayun-shorts-1-en.mp4
- translate speech audio
- clone the voice of the original speaker
- lip sync
- subtitle
- video watermark
- support Japanese speech composition
- improve generated video quality
- face swap
git clone https://github.com/crowaixyz/video-speech-localization.git
cd video-speech-localization
mkdir -p output/raw_audio
mkdir -p output/raw_speech
mkdir -p output/translated_speech
mkdir -p output/lip_synced_video
mkdir -p output/final_video
mkdir -p pretrained_models
1. Install ffmpeg and MeCab
conda install -c conda-forge ffmpeg
# install additional packages in order to compose Japanese speech
# yum install mecab # MeCab is a popular Japanese morphological analyzer
# ln -s /etc/mecabrc /usr/local/etc/mecabrc # symbolic link
# yum install mecab-ipadic
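If you installed MeCab, a quick way to confirm it works is to pipe Japanese text through it; it should print a table of morphemes:
echo "こんにちは世界" | mecab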
2. Create the main virtual environment and install its dependencies: faster-whisper, openai, requests, gradio, TTS, moviepy, and pydub
conda create -n video-speech-localization python=3.10
conda activate video-speech-localization
pip install faster-whisper
pip install openai
pip install requests
pip install gradio
pip install TTS
pip install moviepy
pip install pydub
conda deactivate
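A quick sanity check, run while the environment is still activated, verifies that all of these dependencies import cleanly:
python -c "import faster_whisper, openai, requests, gradio, TTS, moviepy, pydub; print('ok')"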
3. Create the Spleeter virtual environment and install Spleeter
Tip: install Spleeter in an isolated virtual environment, because it may pull in dependencies that conflict with other packages (such as TTS).
conda create -n video-speech-localization-spleeter python=3.10
conda activate video-speech-localization-spleeter
pip install spleeter
conda deactivate
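For reference, this is roughly how the isolated Spleeter environment is used for 2-stem separation (vocals vs. accompaniment); the input path is illustrative:
conda activate video-speech-localization-spleeter
spleeter separate -p spleeter:2stems -o output/raw_audio extracted_audio.wav
conda deactivate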
4. Create the video-retalking virtual environment and install video-retalking
Tip: install video-retalking in an isolated virtual environment, because it may pull in dependencies that conflict with other packages.
# clone video-retalking github repository into current directory
git clone https://github.com/vinthony/video-retalking.git
cd video-retalking
conda create -n video-speech-localization-video-retalking python=3.8
conda activate video-speech-localization-video-retalking
# Please follow the instructions from https://pytorch.org/get-started/previous-versions/
# eg. CUDA 11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
# eg. CUDA 11.7
pip install torch==2.0.1 torchvision==0.15.2
pip install -r requirements.txt
conda deactivate
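Once its checkpoints are in place (see the download notes at the end of this section), video-retalking is run from its own directory with its standard inference script; the paths below are illustrative:
cd video-retalking
conda activate video-speech-localization-video-retalking
python3 inference.py --face ../input_video.mp4 --audio ../output/translated_speech/speech.wav --outfile ../output/lip_synced_video/result.mp4
conda deactivate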
# activate main virtual environment
conda activate video-speech-localization
# export environment variables
export OPENAI_CHAT_API_URL=YOUR_OPENAI_CHAT_API_URL # eg. https://api.openai.com/v1/chat/completions
export OPENAI_API_KEY=YOUR_OPENAI_API_KEY # eg. sk-xxxxxxxxxxxxxxxxxxxxxxxx
export VSL_SERVER_NAME=YOUR_SERVER_NAME # eg. 127.0.0.1
export VSL_SERVER_PORT=YOUR_SERVER_PORT # eg. 7860
export CONDA_VIRTUAL_ENV_PATH=YOUR_CONDA_VIRTUAL_ENV_PATH # eg. /data/anaconda3/envs/
export CUDA_HOME=YOUR_CUDA_HOME # eg. /usr/local/cuda-11.8/
export CUDA_VISIBLE_DEVICES=YOUR_CUDA_VISIBLE_DEVICES # eg. 0
export LD_LIBRARY_PATH=YOUR_LD_LIBRARY_PATH # eg. /usr/local/cuda/lib64:/usr/local/lib:$LD_LIBRARY_PATH
# run the app, then open the URL (default http://localhost:7860/ ) in your browser
python app.py
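For context, the OPENAI_CHAT_API_URL and OPENAI_API_KEY variables above are used to translate the transcribed text. A minimal sketch of such a call with requests (the helper name, prompt wording, and model are illustrative assumptions, not the app's exact code):
import os
import requests

def translate_text(text, target_language="French"):
    # Hypothetical helper: send the transcribed text to the configured
    # OpenAI-compatible chat completions endpoint and return the translation.
    resp = requests.post(
        os.environ["OPENAI_CHAT_API_URL"],
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",  # assumed model name
            "messages": [{
                "role": "user",
                "content": f"Translate the following speech transcript into {target_language}:\n{text}",
            }],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]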
- manually download models in advance in order to use Spleeter
- download the 2stems pretrained model archive, extract it, and put the files in ./pretrained_models/2stems.
- manually download faster-whisper pretrained models and configs
- download the pretrained models and configs, and put them in
/data/.huggingface/cache/hub/models--guillaumekln--faster-whisper-large-v2/refs/main
then load the model like this:
from faster_whisper import WhisperModel

whisper_model = "/data/.huggingface/cache/hub/models--guillaumekln--faster-whisper-large-v2/refs/main"
model = WhisperModel(whisper_model, device="cpu", compute_type="int8")
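After loading, transcription follows the standard faster-whisper API, for example:
segments, info = model.transcribe("output/raw_speech/speech.wav", beam_size=5)  # illustrative path
print("detected language:", info.language)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))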
- manually download coqui XTTS pretrained models and configs
- download the pretrained models and configs and put them in
/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
- then load the model like this:
from TTS.api import TTS

tts = TTS(
    model_path="/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/",
    config_path="/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json"
).to("cpu")
- if you encounter the error:
Model is not multi-lingual but `language` is provided.
you can try modifying the code in /path/to/conda-envs/video-speech-localization/lib/python3.10/site-packages/TTS/api.py
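Instead of patching site-packages in place, one possible workaround (an assumption based on how TTS/api.py gates the language argument on its is_multi_lingual property; verify against your installed TTS version) is to override that check from your own code:
from TTS.api import TTS

# Assumption: when XTTS is loaded from a local model_path, TTS cannot infer that
# the model is multilingual, so its language check raises the error above.
# XTTS v2 is multilingual, so force the property to True.
TTS.is_multi_lingual = property(lambda self: True)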
- manually download models in advance in order to use video-retalking
- download the pre-trained models and put them in ./checkpoints under the video-retalking directory.
- download detection_Resnet50_Final.pth and parsing_parsenet.pth, and put them under
/path/to/conda-envs/video-speech-localization-video-retalking/lib/python3.8/site-packages/facexlib/weights/