Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊

🌍 한국어 ∙ English ∙ 中文简体 ∙ 中文繁體 ∙ 日本語∙ Deutsch ∙ Español ∙ Português

🎙️ Advanced AI-Powered Multimedia Processing Tool | Whisper Speech Recognition WebUI

Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.

🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped
🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
📢 Multilingual text-to-speech: Edge-TTS, kokoro
🎥 YouTube processing & audio extraction: yt-dlp
🌍 Instant translation for 100+ languages: Deep-Translator
🔇 Pro-grade vocal isolation: UVR5
🔥 AI cover creation: RVC

A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.

⚠️ Heads-Up

Now updated to v2.x (Python 3.10.15, Torch 2.5.1+cu124, Gradio 5.14.0)
🆓 Free trial processes up to 60 seconds of media
🔥 New AI Cover feature added
🎤 CosyVoice and kokoro now supported
⏳ Initial launch downloads CosyVoice2-0.5B (9GB), which may take over an hour depending on your network
🎧 Celebrity voice options for cloning expanding regularly
Guidance:
- Existing users: Run update.bat to refresh to v2.0.x
- New users: See Installation below—run configure.bat, then start.bat

🚄 Demos

`Dubbing Studio` Tab: Transcription, Translation & TTS

voice-pro-demo-v1.6.7-1080p.mp4

End-to-end demo of the Dubbing Studio tab: download a YouTube video, separate the voice with AI, generate subtitles with Whisper, translate them, and dub the result with F5-TTS.

`F5-TTS-Multi` Tab: Podcast Creation

f5-tts-demo-elon-zuckerberg-1115-3.mp4

Demo of F5-TTS voice cloning: the voices of Mark Zuckerberg and Elon Musk are cloned and used to create entirely new content.

`AI Cover` Tab

321132645-44ee3893-145d-474a-840b-1ff45802dfbf.mp4

Trump-voice covers of IU's 'Cupid', Kim Kwang-seok's 'I Miss You', and 'Private's Letter'.

`Live Translation` Tab: Real-Time Recognition & Translation

voice-pro-demo-v1.5.7-h264-1080p-live.mp4

Demo of real-time multilingual translation: Voice-Pro captures BBC news content, generates subtitles in real time, and immediately translates them into other languages.

⭐ Key Features

1. Dubbing Studio

YouTube video downloads & audio extraction
Voice separation with MDX-Net & Demucs
Supports 100+ languages for speech recognition & translation

2. Speech Technologies

Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped
Text-to-Speech:
- Edge-TTS: 100+ languages, 400+ voices
- E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
- kokoro: Ranked #2 in HuggingFace TTS Arena
🔥 AI Cover (Speech-to-Speech): Vocal removal via UVR5, modulation with RVC

3. Real-Time Translation

Instant speech recognition
Multilingual translation on the fly
Customizable audio inputs

🤖 WebUI

`Dubbing Studio` Tab

All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
Supports all ffmpeg-compatible formats
Output options: WAV, FLAC, MP3
Subtitles & recognition for 100+ languages
TTS with speed, volume, & pitch controls

`Whisper Caption` Tab

Subtitle-focused: 90+ languages
Video-integrated subtitle display
Word-level highlighting & denoise options

`Translate` Tab

Translation for 100+ languages
Supports subtitle files (ASS, SSA, SRT, etc.)
Real-time voice recognition & translation
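
As a rough illustration of subtitle-file translation, the sketch below translates only the text of each SRT cue and leaves the index and timing lines untouched. It is not Voice-Pro's own implementation: `translate_srt` and `CUE_RE` are hypothetical names, and the translator is passed in as any `str -> str` callable so the example runs without network access.

```python
import re

# One SRT cue: index line, timing line, then one or more text lines.
CUE_RE = re.compile(
    r"(\d+)\n(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n(.*)",
    re.S,
)


def translate_srt(srt_text, translate):
    """Translate cue text with `translate` (str -> str); keep index and timing."""
    srt_text = srt_text.replace("\r\n", "\n")  # normalize Windows line endings
    out = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        match = CUE_RE.match(block.strip())
        if not match:
            out.append(block)  # leave malformed blocks untouched
            continue
        index, timing, text = match.groups()
        out.append(f"{index}\n{timing}\n{translate(text)}")
    return "\n\n".join(out) + "\n"
```

With Deep-Translator installed, something like `GoogleTranslator(source="auto", target="ko").translate` could be passed as the `translate` argument. That call needs an internet connection, and the target language here is only an example.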

`Speech Generation` Tab

Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
Celeb voice podcasts & multilingual support

🔥 `AI Cover` Tab

Vocal removal: MDX-Net, Demucs
Voice modulation: RVC
Download AI voices from Discord AI Hub or request via abus.aikorea@gmail.com

💻 System Requirements

OS: Windows 10/11 (64-bit) ※ Linux/Mac unsupported
GPU: NVIDIA with CUDA 12.4 (recommended)
VRAM: 4GB+ (8GB+ preferred)
RAM: 4GB+
Storage: 20GB+ free space
Internet: Required
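
A minimal pre-install check for some of the requirements above might look like the following. This is only a sketch and not part of Voice-Pro: `check_requirements` is a hypothetical helper, it covers just the OS, 64-bit, and free-disk items, and the 20GB threshold is copied from the list above.

```python
import platform
import shutil

# Threshold copied from the requirements list above.
MIN_FREE_DISK_GB = 20


def check_requirements(os_name, is_64bit, free_disk_gb):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if os_name != "Windows":
        problems.append(f"unsupported OS: {os_name} (Windows 10/11 required)")
    if not is_64bit:
        problems.append("a 64-bit system is required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"only {free_disk_gb:.1f} GB free; {MIN_FREE_DISK_GB} GB or more needed"
        )
    return problems


if __name__ == "__main__":
    # Measure free space on the drive that holds the current folder.
    free_gb = shutil.disk_usage(".").free / 1024**3
    issues = check_requirements(
        platform.system(), platform.machine().endswith("64"), free_gb
    )
    print("OK" if not issues else "\n".join(issues))
```

Run it from the folder where you plan to install Voice-Pro so that the disk check measures the right drive.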

📀 Installation

Install Voice-Pro with ease using configure.bat and start.bat.

1. Get the Package

Clone the repository, or download the latest release (Source code (zip)):

git clone https://github.com/abus-aikorea/voice-pro.git

2. Install & Run

🚀 configure.bat
- Sets up git, ffmpeg, and CUDA (if NVIDIA GPU)
- Run once; takes an hour or more and requires an internet connection
- Don’t close the command window
🚀 start.bat
- Launches Voice-Pro WebUI
- First run installs dependencies (1+ hour)
- Retry after deleting installer_files if issues arise

3. Update

🚀 update.bat: Refreshes Python environment (faster than reinstall)

4. Uninstall

Run uninstall.bat or delete the folder (portable install)

❓Tips & Tricks

If the browser does not open automatically

Close the Windows Command window and run start.bat again.
Run the browser directly and enter the address displayed in the Windows-Command window (e.g. http://127.0.0.1:7892) in the address bar.
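
If you want to confirm that the WebUI is actually listening before opening the browser by hand, a small script along these lines can help. It is a sketch under assumptions: `is_listening` and `open_webui` are hypothetical helpers, and port 7892 is only the example address shown above, so use whatever address your Command window prints.

```python
import socket
import webbrowser


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_webui(host="127.0.0.1", port=7892):
    """Open the WebUI in the default browser if the server is up."""
    if not is_listening(host, port):
        return False  # server not reachable; start.bat may still be loading
    webbrowser.open(f"http://{host}:{port}")
    return True
```

Calling `open_webui()` returns False when nothing answers on that port, which usually means start.bat has not finished launching yet.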

If a CUDA Out-Of-Memory error occurs

Check the GPU memory status in Windows Task Manager - Performance tab.
Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
Set Compute Type to int type. The float type has better quality, but requires more GPU memory.
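
The tips above can be turned into a simple rule of thumb. The sketch below is an illustration, not Voice-Pro's logic: `suggest_settings` and `query_free_vram_gb` are hypothetical helpers, the 8GB figure for denoise level 2 comes from the tip above, and using the same 8GB cut-off for the float compute type is an assumption.

```python
def suggest_settings(free_vram_gb):
    """Map free GPU memory (GB) to a denoise level and compute type."""
    if free_vram_gb >= 8:
        # Denoise level 2 needs at least 8GB according to the tip above.
        # Choosing "float" at the same threshold is an assumption.
        return {"denoise_level": 2, "compute_type": "float"}
    # Below 8GB, fall back to the lower-memory options from the tips.
    return {"denoise_level": 1, "compute_type": "int"}


def query_free_vram_gb():
    """Return free memory on the current CUDA device in GB, or None."""
    try:
        import torch  # third-party; only needed for the live query
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3
```

For example, `suggest_settings(4)` suggests denoise level 1 with the int compute type, which matches the advice for GPUs with less memory.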

How to improve subtitle quality

Subtitle quality tends to improve with larger Whisper models (large > medium > small > base > tiny), although a larger model does not always give better results.
Among compute types, float gives better accuracy. The int type quantizes the model, which reduces GPU memory use and increases speed at the cost of some accuracy.
A higher denoise level removes more background sound, so that only the remaining voice is used for speech recognition. This does not always improve results.
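
To experiment with the model-size and compute-type trade-off outside the WebUI, a short faster-whisper script is enough. The sketch below is not how Voice-Pro calls the library: `transcribe` is a hypothetical wrapper, and the mapping from the UI's "float"/"int" wording to the `float16` and `int8` compute types is an assumption made for illustration.

```python
# Assumed mapping from the UI's wording to faster-whisper compute_type strings.
COMPUTE_TYPES = {"float": "float16", "int": "int8"}


def transcribe(audio_path, model_size="small", compute="int", device="cuda"):
    """Transcribe audio_path and return a list of (start, end, text) tuples."""
    # Imported lazily so the module loads even without the package installed.
    from faster_whisper import WhisperModel  # pip install faster-whisper

    model = WhisperModel(
        model_size, device=device, compute_type=COMPUTE_TYPES[compute]
    )
    segments, _info = model.transcribe(audio_path)
    # `segments` is a generator; building the list runs the actual decoding.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```

Running the same clip with, say, `model_size="small"` and then `model_size="medium"` gives a quick way to judge whether the larger model is worth the extra GPU memory for your material.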

📢 Caution

Windows Defender may warn that Voice-Pro is an untrusted application and block it from running. If SmartScreen is set to "Warn", click "More info" and then "Run anyway". If SmartScreen is set to "Block", there is no button to continue. In that case, open the properties of start.bat, check "Unblock", apply the change, and run start.bat again.

When Windows Defender mistakenly flags a batch file as a Trojan, this is called a false positive. To work around it, try one of the following:

Add a file exception: Windows Defender can be set to skip security scanning for specific files or processes. To do this:
- Click the "Start" button and go to "Settings".
- Click "Update & Security".
- Select "Windows Security" and go to "Virus & threat protection".
- Click "Manage Virus & Threat Protection Settings".
- Select "Add exception" in "Virus & threat protection settings".
- Select "File or Folder", find the batch file in question, and add it as an exception.
Temporarily disable Windows Defender: This is only a temporary workaround. Use it with care, because it may expose your computer to other threats.
Report the false positive to Microsoft: If you are sure that the file is not a Trojan, you can report it to Microsoft as a false positive. Microsoft will review it and take any necessary action.

☕ Notice

This repository offers a free trial of Voice-Pro.
The free trial version of Voice-Pro allows you to process up to 60 seconds of media.
The official version of Voice-Pro can be purchased through the ABUS official website (https://abuskorea.imweb.me)
Additionally, if you support us through Buy Me a Coffee ☕, we will give you a usage voucher for up to one month as a token of our gratitude. (#10 (comment))
For inquiries about purchases, business partnerships, tuning, investments, etc., please contact us by email (abus.aikorea@gmail.com).

📬 Contact

Email: abus.aikorea@gmail.com
Homepage (Korean): https://abuskorea.imweb.me
Amazon: US | Japan | Singapore | UAE
Naver: Software | Solution

👍 YouTube

Product Info
Karaoke: Pop | K-Pop | J-Pop

🙏 Credits

Demucs: https://github.com/facebookresearch/demucs
yt-dlp: https://github.com/yt-dlp/yt-dlp
gradio: https://github.com/gradio-app/gradio
edge-TTS: https://github.com/rany2/edge-tts
F5-TTS: https://github.com/SWivid/F5-TTS.git
openai-whisper: https://github.com/openai/whisper
faster-whisper: https://github.com/SYSTRAN/faster-whisper
whisper-timestamped: https://github.com/linto-ai/whisper-timestamped
RVC-Project: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
UVR5: https://github.com/Anjok07/ultimatevocalremovergui
CosyVoice: https://github.com/FunAudioLLM/CosyVoice
kokoro: https://github.com/hexgrad/kokoro
Deep-Translator: https://github.com/nidhaloff/deep-translator

©️ Copyright

by ABUS
