
Gradio WebUI for audio processing, powered by Whisper (OpenAI-Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer (RVC), zero-shot Voice Cloning (E2-TTS, F5-TTS, CosyVoice), YouTube downloading, vocal isolation (UVR5), Text-to-Speech (Edge-TTS, kokoro), and multi-language translation. Perfect for content creators and developers.

abus-aikorea/voice-pro

Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊

🌍 한국어 | English | 中文简体 | 中文繁體 | 日本語 | Deutsch | Español | Português

🎙️ Advanced AI-Powered Multimedia Processing Tool | Whisper Speech Recognition WebUI

Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.

  • 🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped
  • 🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
  • 📢 Multilingual text-to-speech: Edge-TTS, kokoro
  • 🎥 YouTube processing & audio extraction: yt-dlp
  • 🌍 Instant translation for 100+ languages: Deep-Translator
  • 🔇 Pro-grade vocal isolation: UVR5
  • 🔥 AI cover creation: RVC

A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.

⚠️ Heads-Up

  • Now updated to v2.x (Python 3.10.15, Torch 2.5.1+cu124, Gradio 5.14.0)
  • 🆓 Free trial processes up to 60 seconds of media
  • 🔥 New AI Cover feature added
  • 🎤 CosyVoice and kokoro now supported
  • ⏳ Initial launch downloads CosyVoice2-0.5B (9GB), which may take over an hour depending on your network
  • 🎧 Celebrity voice options for cloning expanding regularly
  • Guidance:
    • Existing users: Run update.bat to refresh to v2.0.x
    • New users: See Installation below. Run configure.bat, then start.bat

🚄 Demos

Dubbing Studio Tab: Transcription, Translation & TTS

voice-pro-demo-v1.6.7-1080p.mp4

Comprehensive media processing workflow in the Dubbing Studio tab: a one-stop pipeline from YouTube video download through AI-based voice separation, automatic Whisper subtitles, and multilingual translation to dubbing with F5-TTS.

F5-TTS-Multi Tab: Podcast Creation

f5-tts-demo-elon-zuckerberg-1115-3.mp4

F5-TTS voice cloning demo: the cloned voices of Mark Zuckerberg and Elon Musk are used to generate entirely new content.

AI Cover Tab

321132645-44ee3893-145d-474a-840b-1ff45802dfbf.mp4

Make a Trump version of IU's 'Cupid', Kim Kwang-seok's 'I Miss You', and 'Private's Letter'.

Live Translation Tab: Real-Time Recognition & Translation

voice-pro-demo-v1.5.7-h264-1080p-live.mp4

Real-time multilingual translation demo: BBC news audio is captured live, subtitled in real time, and immediately translated into other languages.

⭐ Key Features

1. Dubbing Studio

  • YouTube video downloads & audio extraction
  • Voice separation with MDX-Net & Demucs
  • Supports 100+ languages for speech recognition & translation

2. Speech Technologies

  • Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped
  • Text-to-Speech:
    • Edge-TTS: 100+ languages, 400+ voices
    • E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
    • kokoro: Ranked #2 in HuggingFace TTS Arena
  • 🔥 AI Cover (Speech-to-Speech): Vocal removal via UVR5, modulation with RVC

3. Real-Time Translation

  • Instant speech recognition
  • Multilingual translation on the fly
  • Customizable audio inputs

🤖 WebUI

Dubbing Studio Tab

  • All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
  • Supports all ffmpeg-compatible formats
  • Output options: WAV, FLAC, MP3
  • Subtitles & recognition for 100+ languages
  • TTS with speed, volume, & pitch controls
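The speed, volume, and pitch controls listed above correspond naturally to options that the Edge-TTS library exposes. The following is a minimal sketch, not Voice-Pro's own code: it assumes the `edge-tts` Python package is installed, and the voice name `en-US-AriaNeural` and the helper names `signed` and `speak` are illustrative.

```python
def signed(value: int, unit: str) -> str:
    """Format an offset as a signed string, e.g. 10 -> '+10%', -5 -> '-5Hz'."""
    return f"{value:+d}{unit}"


async def speak(text: str, out_path: str,
                speed: int = 0, volume: int = 0, pitch: int = 0) -> None:
    """Synthesize text to an audio file with relative speed/volume/pitch offsets."""
    # Imported lazily so the formatting helper works without edge-tts installed.
    import edge_tts

    tts = edge_tts.Communicate(
        text,
        "en-US-AriaNeural",        # assumed voice; any Edge-TTS voice name works
        rate=signed(speed, "%"),   # speed offset in percent
        volume=signed(volume, "%"),
        pitch=signed(pitch, "Hz"),
    )
    await tts.save(out_path)
```

A call such as `asyncio.run(speak("Hello from Voice-Pro.", "hello.mp3", speed=10, pitch=-5))` would write a slightly faster, slightly lower-pitched clip, provided a network connection is available.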

Multilingual Voice Conversion and Subtitle Generation Web UI Interface

Whisper Caption Tab

  • Subtitle-focused: 90+ languages
  • Video-integrated subtitle display
  • Word-level highlighting & denoise options

Translate Tab

  • Translation for 100+ languages
  • Supports subtitle files (ASS, SSA, SRT, etc.)
  • Real-time voice recognition & translation
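To illustrate what translating a subtitle file involves, the sketch below translates only the caption text of an SRT document and passes cue numbers and timing lines through unchanged. It is an assumption-laden example rather than Voice-Pro's implementation: `translate_srt` is a hypothetical helper, and the translator is passed in as a plain function so any backend can be used.

```python
import re
from typing import Callable

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate caption lines; keep blank lines, cue numbers, and timings as-is."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or TIMING.match(stripped):
            # Limitation: a caption consisting only of digits is treated as a cue number.
            out.append(line)
        else:
            out.append(translate(line))
    return "\n".join(out)
```

With the Deep-Translator package installed, `GoogleTranslator(source="auto", target="ko").translate` could be passed as the `translate` argument; the target language here is only an example.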

WebUI for Real-Time Speech Recognition and Translation

Speech Generation Tab

  • Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
  • Celeb voice podcasts & multilingual support

Podcast Production WebUI Using Voice-Cloning Technology

🔥 AI Cover Tab

Podcast Production WebUI Using Voice-Cloning Technology

💻 System Requirements

  • OS: Windows 10/11 (64-bit) ※ Linux/Mac unsupported
  • GPU: NVIDIA with CUDA 12.4 (recommended)
  • VRAM: 4GB+ (8GB+ preferred)
  • RAM: 4GB+
  • Storage: 20GB+ free space
  • Internet: Required
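Some of these requirements can be checked before installing. The sketch below uses only the Python standard library and covers the OS, 64-bit, and free-storage items; RAM and VRAM are not checked because the standard library has no portable way to read them. The function names are hypothetical and not part of Voice-Pro.

```python
import platform
import shutil


def check_requirements(os_name: str, is_64bit: bool, free_gb: float) -> list:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    if os_name != "Windows":
        problems.append("Windows 10/11 is required (Linux/Mac unsupported)")
    if not is_64bit:
        problems.append("a 64-bit OS is required")
    if free_gb < 20:
        problems.append("at least 20GB of free storage is required")
    return problems


def check_this_machine(install_dir: str = ".") -> list:
    """Run the checks against the current machine and the given install folder."""
    free_gb = shutil.disk_usage(install_dir).free / 1024**3
    is_64bit = platform.machine().endswith("64")  # e.g. "AMD64", "x86_64"
    return check_requirements(platform.system(), is_64bit, free_gb)
```

Calling `check_this_machine()` from the folder where Voice-Pro will live gives a quick list of anything that needs attention.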

📀 Installation

Install Voice-Pro with ease using configure.bat and start.bat.

1. Get the Package

  • Download the latest release (Source code (zip)) from GitHub Release, or clone the repository:
    git clone https://github.com/abus-aikorea/voice-pro.git

2. Install & Run

  1. 🚀 configure.bat
    • Sets up git, ffmpeg, and CUDA (if NVIDIA GPU)
    • Run once; takes 1+ hour and requires an internet connection
    • Don’t close the command window
  2. 🚀 start.bat
    • Launches Voice-Pro WebUI
    • First run installs dependencies (1+ hour)
    • Retry after deleting installer_files if issues arise
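The "delete installer_files and retry" step can also be scripted. This is a small sketch under one assumption: that the `installer_files` folder sits directly inside the Voice-Pro folder. The helper name is hypothetical.

```python
import shutil
from pathlib import Path


def reset_installer_files(repo_dir: str) -> bool:
    """Delete <repo_dir>/installer_files; return True if a folder was removed."""
    target = Path(repo_dir) / "installer_files"  # assumed location
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

After the folder is removed, running start.bat again triggers a fresh dependency install.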

3. Update

  • 🚀 update.bat: Refreshes Python environment (faster than reinstall)

4. Uninstall

  • Run uninstall.bat or delete the folder (portable install)

❓Tips & Tricks

If the browser does not open automatically

  • Close the Windows Command window and run start.bat again.
  • Or open the browser manually and enter the address shown in the Windows Command window (e.g. http://127.0.0.1:7892) in the address bar.
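Opening the address manually can be done from Python as well. The sketch below uses the standard `webbrowser` module; port 7892 is only the example shown above, so substitute whatever address the command window actually prints.

```python
import webbrowser


def webui_url(host: str = "127.0.0.1", port: int = 7892) -> str:
    """Build the local WebUI address from a host and port."""
    return f"http://{host}:{port}"


def open_webui(port: int = 7892) -> bool:
    """Open the WebUI in the default browser; returns False if no browser was found."""
    return webbrowser.open(webui_url(port=port))
```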

If a CUDA Out-Of-Memory error occurs

  • Check the GPU memory status in Windows Task Manager - Performance tab.
  • Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
  • Set Compute Type to an int type. Float types give better quality but require more GPU memory.
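As an alternative to Task Manager, GPU memory can be read from the command line with `nvidia-smi`, which ships with the NVIDIA driver. The sketch below parses its CSV output and applies the 8GB rule for denoise level 2 stated above. The helper names are hypothetical, and the 8000 MiB threshold is an approximation because cards report slightly different totals.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_line: str) -> tuple:
    """Parse 'used, total' in MiB from one line of nvidia-smi CSV output."""
    used, total = (int(part.strip()) for part in csv_line.split(","))
    return used, total


def max_denoise_level(total_mib: int) -> int:
    """Allow denoise level 2 only on roughly 8GB or more; otherwise cap at 1."""
    return 2 if total_mib >= 8000 else 1


def query_first_gpu() -> tuple:
    """Return (used_mib, total_mib) for the first GPU; requires an NVIDIA driver."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout.splitlines()[0])
```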

How to improve the quality of subtitles?

  • Larger Whisper models tend to produce better subtitles, although this is not guaranteed. Typical quality order: large > medium > small > base > tiny
  • Among compute types, float types give the best accuracy. Int types use model quantization to reduce GPU memory usage and increase speed, at the cost of some accuracy.
  • A higher denoise level removes more background sound, so only the remaining voice is used for speech recognition. This does not always improve results.
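The trade-off between model size and compute type can be tried outside the WebUI with the Faster-Whisper library. This is a minimal sketch, assuming the `faster-whisper` package and a CUDA GPU are available; `"float16"` and `"int8"` are Faster-Whisper compute type names and may not match the labels in Voice-Pro's UI, and the helper names are illustrative.

```python
from typing import Optional

MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]  # smallest to largest


def smaller_model(size: str) -> Optional[str]:
    """Return the next smaller Whisper model, or None if already at 'tiny'."""
    index = MODEL_SIZES.index(size)
    return MODEL_SIZES[index - 1] if index > 0 else None


def load_model(size: str = "medium", prefer_quality: bool = True):
    """Load a model with a float type for quality or an int type to save memory."""
    # Lazy import so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    compute_type = "float16" if prefer_quality else "int8"
    return WhisperModel(size, device="cuda", compute_type=compute_type)
```

One possible strategy when memory is tight is to switch to `prefer_quality=False` first, then step down with `smaller_model` if the out-of-memory error persists.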

📢 Caution

Windows Defender SmartScreen may warn about an untrusted application and block Voice-Pro from running. If SmartScreen is set to "Warn", click "More info" and then "Run anyway". If SmartScreen is set to "Block", there is no button to run the installation. In that case, open the Properties of start.bat, check "Unblock", apply the change, and run start.bat again.

Windows Defender may also mistakenly flag a batch file as a Trojan, which is known as a false positive. To resolve this, try the following steps:

  1. Add a file exclusion: Windows Defender can be set to skip scanning specific files or processes:
    • Click 'Start' and go to 'Settings'.
    • Click 'Update & Security'.
    • Select 'Windows Security' and go to 'Virus & threat protection'.
    • Under 'Virus & threat protection settings', click 'Manage settings'.
    • Under 'Exclusions', click 'Add or remove exclusions'.
    • Click 'Add an exclusion', choose 'File' or 'Folder', and select the batch file in question.
  2. Temporarily disable Windows Defender: This is only a stopgap. Use it with care, because it may expose your computer to other threats.
  3. Report the false positive to Microsoft: If you are sure the file is not a Trojan, you can submit it to Microsoft as a false positive. Microsoft will review it and take any necessary action.

☕ Notice

  • This repository offers a free trial of Voice-Pro.
  • The free trial version of Voice-Pro allows you to process up to 60 seconds of media.
  • The official version of Voice-Pro can be purchased through the ABUS official website (https://abuskorea.imweb.me).
  • Additionally, if you support us through Buy Me a Coffee ☕, we will give you a usage voucher for up to one month as a token of our gratitude. (#10 (comment))
  • For inquiries regarding purchases, business partnerships, tuning, investments, etc., please contact us via email (abus.aikorea@gmail.com).

📬 Contact

👍 YouTube

🙏 Credits

©️ Copyright

by ABUS
