🌍 한국어 ∙ English ∙ 中文简体 ∙ 中文繁體 ∙ 日本語 ∙ Deutsch ∙ Español ∙ Português
Voice-Pro is a Gradio-based web app for multimedia content creation. It combines YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech in a single tool for creators, researchers, and multilingual professionals.
- 🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped
- 🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
- 📢 Multilingual text-to-speech: Edge-TTS, kokoro
- 🎥 YouTube processing & audio extraction: yt-dlp
- 🌍 Instant translation for 100+ languages: Deep-Translator
- 🔇 Pro-grade vocal isolation: UVR5
- 🔥 AI cover creation: RVC
A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.
- Now updated to v2.x (Python 3.10.15, Torch 2.5.1+cu124, Gradio 5.14.0)
- 🆓 Free trial processes up to 60 seconds of media
- 🔥 New AI Cover feature added
- 🎤 CosyVoice and kokoro now supported
- ⏳ Initial launch downloads CosyVoice2-0.5B (9GB); this may take over an hour depending on your network
- 🎧 Celebrity voice options for cloning expanding regularly
- Guidance:
  - Existing users: Run update.bat to refresh to v2.0.x
  - New users: See Installation below. Run configure.bat, then start.bat
voice-pro-demo-v1.6.7-1080p.mp4
Studio tab demo: a complete media-processing workflow, from YouTube video download through AI-based voice separation, automatic Whisper subtitles, and multilingual translation to dubbing with F5-TTS.
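The workflow described above can be approximated outside the WebUI with the same open-source libraries. The following is a minimal sketch, not Voice-Pro's actual implementation. It assumes `yt-dlp`, `faster-whisper`, `deep-translator`, and `edge-tts` are installed, skips the voice-separation step, and swaps in Edge-TTS for the F5-TTS dubbing step to keep the example short. The function names, output file names, and the default voice are illustrative choices.

```python
import asyncio


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as the body of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


def download_audio(url: str, out_dir: str = "downloads") -> str:
    """Download the best audio stream with yt-dlp and return its path."""
    import yt_dlp  # third-party, imported lazily

    opts = {"format": "bestaudio/best", "outtmpl": f"{out_dir}/%(id)s.%(ext)s"}
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)


def transcribe(path: str, model_size: str = "small"):
    """Run faster-whisper and return (start, end, text) tuples."""
    from faster_whisper import WhisperModel  # third-party

    model = WhisperModel(model_size, device="auto", compute_type="default")
    segments, _info = model.transcribe(path)
    return [(s.start, s.end, s.text) for s in segments]


def translate_segments(segments, target: str = "ko"):
    """Translate each segment's text with deep-translator."""
    from deep_translator import GoogleTranslator  # third-party

    translator = GoogleTranslator(source="auto", target=target)
    return [(start, end, translator.translate(text)) for start, end, text in segments]


def synthesize(text: str, out_path: str, voice: str = "ko-KR-SunHiNeural") -> None:
    """Generate speech with Edge-TTS (stand-in for the F5-TTS dubbing step)."""
    import edge_tts  # third-party

    asyncio.run(edge_tts.Communicate(text, voice).save(out_path))


def dub(url: str, target: str = "ko") -> None:
    """Download, transcribe, translate, write subtitles, and synthesize speech."""
    audio_path = download_audio(url)
    segments = translate_segments(transcribe(audio_path), target)
    with open("subtitles.srt", "w", encoding="utf-8") as handle:
        handle.write(to_srt(segments))
    synthesize(" ".join(text for _, _, text in segments), "dub.mp3")
```

Calling `dub("<video URL>", target="ko")` would write `subtitles.srt` and `dub.mp3` to the current directory. Only `format_timestamp` and `to_srt` are pure Python; the other functions need the third-party packages and network access.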
f5-tts-demo-elon-zuckerberg-1115-3.mp4
F5-TTS voice cloning demo: new speech content generated in cloned voices that closely mimic Mark Zuckerberg and Elon Musk.
321132645-44ee3893-145d-474a-840b-1ff45802dfbf.mp4
AI Cover demo: Trump-voice versions of IU's 'Cupid', Kim Kwang-seok's 'I Miss You', and 'Private's Letter'.
voice-pro-demo-v1.5.7-h264-1080p-live.mp4
Real-time multilingual translation demo: Voice-Pro captures live BBC news audio, generates subtitles in real time, and immediately translates them into other languages.
- YouTube video downloads & audio extraction
- Voice separation with MDX-Net & Demucs
- Supports 100+ languages for speech recognition & translation
- Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped
- Text-to-Speech:
  - Edge-TTS: 100+ languages, 400+ voices
  - E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
  - kokoro: Ranked #2 in HuggingFace TTS Arena
- 🔥 AI Cover (Speech-to-Speech): Vocal removal via UVR5, modulation with RVC
- Instant speech recognition
- Multilingual translation on the fly
- Customizable audio inputs
- All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
- Supports all ffmpeg-compatible formats
- Output options: WAV, FLAC, MP3
- Subtitles & recognition for 100+ languages
- TTS with speed, volume, & pitch controls
- Subtitle-focused: 90+ languages
- Video-integrated subtitle display
- Word-level highlighting & denoise options
- Translation for 100+ languages
- Supports subtitle files (ASS, SSA, SRT, etc.)
- Real-time voice recognition & translation
- Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
- Celeb voice podcasts & multilingual support
- Vocal removal: MDX-Net, Demucs
- Voice modulation: RVC
- Download AI voices from Discord AI Hub or request via abus.aikorea@gmail.com
- OS: Windows 10/11 (64-bit). Linux and macOS are not supported.
- GPU: NVIDIA with CUDA 12.4 (recommended)
- VRAM: 4GB+ (8GB+ preferred)
- RAM: 4GB+
- Storage: 20GB+ free space
- Internet: Required
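Some of these requirements can be checked before starting the installation. The sketch below is not part of Voice-Pro; it is a hypothetical helper that tests the operating system, a 64-bit machine type, and the 20GB free-space figure from the list above. It treats a missing `nvidia-smi` command as a sign that no NVIDIA driver is installed, which is an assumption and only a rough proxy for CUDA support.

```python
import platform
import shutil

GIB = 1024 ** 3
MIN_FREE_GB = 20  # "Storage: 20GB+ free space" from the requirements list


def check_requirements(system: str, machine: str, free_bytes: int,
                       has_nvidia_smi: bool) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if system != "Windows":
        problems.append(f"Windows is required, but this system reports {system!r}.")
    if not machine.endswith("64"):
        problems.append(f"A 64-bit machine is required, found {machine!r}.")
    if free_bytes < MIN_FREE_GB * GIB:
        free_gb = free_bytes / GIB
        problems.append(f"Need {MIN_FREE_GB}GB of free disk space, found {free_gb:.1f}GB.")
    if not has_nvidia_smi:
        # The GPU is recommended rather than mandatory, so this is only a note.
        problems.append("nvidia-smi not found: no NVIDIA GPU driver detected.")
    return problems


def check_this_machine(install_path: str = ".") -> list:
    """Gather facts about the current machine and run the checks."""
    return check_requirements(
        system=platform.system(),
        machine=platform.machine(),
        free_bytes=shutil.disk_usage(install_path).free,
        has_nvidia_smi=shutil.which("nvidia-smi") is not None,
    )


if __name__ == "__main__":
    for problem in check_this_machine():
        print(problem)
```

Keeping `check_requirements` separate from the code that inspects the machine makes the rules easy to test with made-up values.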
Install Voice-Pro with ease using configure.bat and start.bat.
git clone https://github.com/abus-aikorea/voice-pro.git
- 🚀 configure.bat
  - Sets up git, ffmpeg, and CUDA (if an NVIDIA GPU is present)
  - Run once; takes an hour or more and needs an internet connection
  - Don't close the command window while it runs
- 🚀 start.bat
  - Launches the Voice-Pro WebUI
  - First run installs dependencies (an hour or more)
  - If problems occur, delete installer_files and run start.bat again
- 🚀 update.bat: Refreshes the Python environment (faster than reinstalling)
- To uninstall, run uninstall.bat or delete the folder (the install is portable)
- Close the Windows Command window and run start.bat again.
- Open your browser manually and enter the address shown in the Windows Command window (e.g. http://127.0.0.1:7892) in the address bar.
- Check the GPU memory status in Windows Task Manager - Performance tab.
- Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
- Set Compute Type to an int type. Float types give better quality but require more GPU memory.
- Subtitle quality tends to improve with larger Whisper models, though not in every case. Model sizes from largest to smallest: large > medium > small > base > tiny.
- Among compute types, float types give the best accuracy. Int types are quantized, which reduces GPU memory use and increases speed at the cost of some accuracy.
- A higher denoise level removes more background sound, so only the remaining voice is passed to speech recognition. This does not always improve results.
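The memory and quality trade-offs above can be expressed as a small settings helper. This is a sketch under stated assumptions, not Voice-Pro's own logic. The 8GB threshold for denoise level 2 comes from the notes above; reusing the same threshold to pick the compute type is an assumption. The names `float16` and `int8` are `faster-whisper` compute types, and Voice-Pro's own option labels may differ. All function names here are hypothetical.

```python
import shutil
import subprocess


def suggest_settings(vram_gb: float) -> dict:
    """Pick a denoise ceiling and compute type for the given GPU memory."""
    if vram_gb >= 8:
        # Denoise level 2 needs at least 8GB of GPU memory (from the notes above).
        return {"max_denoise_level": 2, "compute_type": "float16"}
    # Below 8GB, stay at denoise 0 or 1 and use a quantized int type.
    return {"max_denoise_level": 1, "compute_type": "int8"}


def parse_vram_gb(nvidia_smi_output: str):
    """Parse the first GPU's total memory (reported in MiB) into GiB."""
    lines = [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]
    return int(lines[0]) / 1024 if lines else None


def query_vram_gb():
    """Ask nvidia-smi for total GPU memory; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_gb(result.stdout)


def load_whisper(model_size: str, vram_gb: float):
    """Load a faster-whisper model using the suggested compute type."""
    from faster_whisper import WhisperModel  # third-party, imported lazily

    compute_type = suggest_settings(vram_gb)["compute_type"]
    return WhisperModel(model_size, device="cuda", compute_type=compute_type)
```

For example, `suggest_settings(4)` selects `int8` and caps the denoise level at 1, matching the low-memory advice above.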
Windows Defender SmartScreen may warn that Voice-Pro is an untrusted application and block it from running. If SmartScreen is set to "Warn", click "More info" and then "Run anyway". If it is set to "Block", there is no button to continue. In that case, open the Properties of start.bat, check "Unblock", apply the change, and run start.bat again.
When Windows Defender mistakenly flags a batch file as a Trojan, this is called a false positive. The following options can resolve it:
- Add a file exclusion: Windows Defender can be told to skip scanning specific files or folders:
  - Click 'Start' and open 'Settings'.
  - Click 'Update & Security'.
  - Select 'Windows Security', then 'Virus & threat protection'.
  - Under 'Virus & threat protection settings', click 'Manage settings'.
  - Under 'Exclusions', click 'Add or remove exclusions', then 'Add an exclusion'.
  - Choose 'File' or 'Folder', locate the batch file in question, and add it.
- Temporarily disable Windows Defender: This is only a stopgap, and it can expose your computer to other threats, so use it with care.
- Report the false positive to Microsoft: If you are sure the file is not a Trojan, you can submit it to Microsoft as a false positive. Microsoft will review it and take any necessary action.
- This repository offers a free trial of Voice-Pro.
- The free trial version of Voice-Pro allows you to process up to 60 seconds of media.
- The official version of Voice-Pro can be purchased through the ABUS official website (https://abuskorea.imweb.me)
- Additionally, if you support us through Buy Me a Coffee ☕, we will give you a usage voucher for up to one month as a token of our gratitude. (#10 (comment))
- For inquiries regarding purchases, business partnerships, tuning, investments, etc., please contact us by email (abus.aikorea@gmail.com).
- Email: abus.aikorea@gmail.com
- Homepage (Korean): https://abuskorea.imweb.me
- Amazon: US | Japan | Singapore | UAE
- Naver: Software | Solution
- Demucs: https://github.com/facebookresearch/demucs
- yt-dlp: https://github.com/yt-dlp/yt-dlp
- gradio: https://github.com/gradio-app/gradio
- edge-TTS: https://github.com/rany2/edge-tts
- F5-TTS: https://github.com/SWivid/F5-TTS.git
- openai-whisper: https://github.com/openai/whisper
- faster-whisper: https://github.com/SYSTRAN/faster-whisper
- whisper-timestamped: https://github.com/linto-ai/whisper-timestamped
- RVC-Project: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- UVR5: https://github.com/Anjok07/ultimatevocalremovergui
- CosyVoice: https://github.com/FunAudioLLM/CosyVoice
- kokoro: https://github.com/hexgrad/kokoro
- Deep-Translator: https://github.com/nidhaloff/deep-translator
by ABUS