A high-performance Python wrapper for Retrieval-Based-Voice-Conversion (RVC), optimized for speed, simplicity, and integration.
Fast. Lightweight. Production-ready.
This package provides a clean, efficient interface for running RVC voice conversion models in Python. Designed with low-latency inference, batch processing, and pipeline compatibility in mind, it's ideal for voice apps, AI avatars, content creation tools, and research.
- Preloaded Models: Load models once and reuse them, drastically reducing inference latency.
- Batch Inference: Convert multiple audio files in parallel with multi-threading support.
- Flexible I/O: Accepts file paths or raw NumPy arrays; outputs files or in-memory audio arrays.
- Smart Caching: Automatically caches Hubert, models, and pitch estimators for faster repeated conversions.
- Configurable Tags: Manage multiple voice models with named configurations (e.g., `"singer"`, `"narrator"`).
- Seamless Resampling & Format Support: Built-in handling for MP3, WAV, OGG, FLAC via `soundfile` and `librosa`.
- Python 3.10+
- FFmpeg (required for audio loading). Install via:
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt-get install ffmpeg`
  - Windows: Download from FFmpeg.org or use `conda install ffmpeg`
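Before installing, you can sanity-check that FFmpeg is discoverable on your PATH (a small standalone snippet, not part of rvcpy):

```python
import shutil

# Look up the ffmpeg executable on PATH; rvcpy's audio loading
# will fail at runtime if this returns None.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("FFmpeg not found - install it before using rvcpy")
else:
    print(f"FFmpeg found at {ffmpeg_path}")
```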
Tip for Windows users: You may need Microsoft C++ Build Tools:
- Download Build Tools for Visual Studio
- Install with:
  - C++ build tools
  - MSVC v142+ (VS 2019 or later)
  - Windows 10/11 SDK
```shell
pip install git+https://github.com/BF667/rvcpy
```

Required models (`hubert_base.pt`, `rmvpe.pt`) are downloaded automatically on first run.
```python
from rvcpy import BaseLoader

# Use GPU (if available) or fall back to CPU
converter = BaseLoader(
    only_cpu=False,    # Set True to force CPU
    hubert_path=None,  # Optional: custom Hubert model path
    rmvpe_path=None,   # Optional: custom RMVPE model path
)
```
Assign a name (tag) to a model and its settings:
```python
converter.apply_conf(
    tag="yoimiya",                     # Your voice model's nickname
    file_model="models/yoimiya.pth",   # Path to .pth model
    pitch_algo="rmvpe+",               # Pitch detection: 'pm', 'harvest', 'dio', 'rmvpe', 'rmvpe+'
    pitch_lvl=0,                       # Pitch shift in semitones
    file_index="models/yoimiya.index", # Optional index file for timbre preservation
    index_influence=0.66,              # Index influence (0.0 to 1.0)
    respiration_median_filtering=3,    # Filter breathiness
    envelope_ratio=0.25,               # Volume envelope mix (0 = original, 1 = converted)
    consonant_breath_protection=0.33,  # Protect consonants from artifacts
    resample_sr=0,                     # Resample output (0 = keep original sample rate)
)
```
You can define multiple tags (e.g., `"singer"`, `"narrator"`) and switch between them seamlessly.
```python
audio_files = ["input1.wav", "input2.mp3"]  # or just a string: "single.wav"
tags = ["yoimiya"] * len(audio_files)       # or ["singer", "narrator"] per file

results = converter(
    audio_files=audio_files,
    tag_list=tags,
    overwrite=False,     # Keep originals
    parallel_workers=4,  # Use 4 threads
    type_output="mp3",   # Optional: force output format (wav, mp3, flac, ogg)
)

print("Converted files:", results)
# Output: ['input1_edited.mp3', 'input2_edited.mp3']
```
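If you need to locate converted files programmatically, the naming pattern can be predicted from the sample output above. The helper below is hypothetical (not part of the rvcpy API) and assumes the `_edited` suffix shown in that output:

```python
from pathlib import Path

def expected_output_name(input_path: str, fmt: str) -> str:
    """Predict the converted filename, assuming the `_edited` suffix
    seen in rvcpy's sample output (hypothetical helper, not rvcpy API)."""
    p = Path(input_path)
    return str(p.with_name(f"{p.stem}_edited.{fmt}"))

print(expected_output_name("input1.wav", "mp3"))  # input1_edited.mp3
```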
Use raw audio arrays (e.g., from mic, streaming, or preprocessing):
```python
import numpy as np

# Simulate input: (audio_array, sample_rate)
audio_array = np.random.randn(16000).astype(np.float32) * 0.1
sample_rate = 16000
audio_data = (audio_array, sample_rate)

# Generate with caching (fast repeated calls)
result_array, output_sr = converter.generate_from_cache(
    audio_data=audio_data,
    tag="yoimiya",
    reload=False,  # Use cached model if config unchanged
)

# Save or play
import soundfile as sf
sf.write("output.wav", result_array, output_sr)

# Or play in Jupyter
from IPython.display import Audio
Audio(result_array, rate=output_sr)
```
`generate_from_cache()` is ideal for real-time apps, Gradio demos, or notebook prototyping.
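Real inputs (mic capture, streaming chunks) often arrive as 16-bit PCM or stereo, while the example above feeds float32 mono. A small normalization step, sketched here with NumPy only (the helper name is illustrative, not rvcpy API), produces the `(array, sample_rate)` tuple in that shape:

```python
import numpy as np

def prepare_audio(samples: np.ndarray, sample_rate: int) -> tuple[np.ndarray, int]:
    """Normalize raw PCM into a (float32 mono array, sample_rate) tuple."""
    audio = samples.astype(np.float32)
    if samples.dtype == np.int16:
        audio /= 32768.0      # scale 16-bit PCM into [-1.0, 1.0]
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # stereo -> mono by averaging channels
    return audio, sample_rate

# Example: a 1-second 440 Hz tone as int16 PCM, as a mic buffer might deliver it
pcm = (np.sin(2 * np.pi * 440 * np.arange(16000) / 16000) * 32767).astype(np.int16)
audio_data = prepare_audio(pcm, 16000)
print(audio_data[0].dtype)  # float32
```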
| Tip | Benefit |
|---|---|
| Use `generate_from_cache()` | Avoids reloading models on every call |
| Set `parallel_workers > 1` | Speeds up batch processing |
| Keep models in GPU memory | Minimizes CPU-GPU transfer overhead |
| Reuse `BaseLoader` instances | Maximizes caching benefits |
Note: Changing a tag's config forces a model reload. Use multiple `BaseLoader` instances for true multi-model concurrency.
Free GPU/CPU memory when done:

```python
converter.unload_models()
```
Run different voices simultaneously using separate loaders:

```python
singer = BaseLoader()
narrator = BaseLoader()

singer.apply_conf(tag="singer", file_model="singer.pth", pitch_algo="rmvpe+")
narrator.apply_conf(tag="narrator", file_model="narrator.pth", pitch_algo="pm")

# Run independently
singer("song.wav", "singer", type_output="array")
narrator("story.mp3", "narrator", type_output="array")
```
| Method | Speed | Quality | Notes |
|---|---|---|---|
| `pm` | Fastest | Good | Default, lightweight |
| `harvest` | Slow | High | Legacy, stable pitch |
| `rmvpe` | Balanced | Best | Recommended for singing |
| `rmvpe+` | Fast | High | Optimized version of RMVPE |
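The trade-offs in the table can be encoded as a small selection helper. Everything here is a sketch: the algorithm names match rvcpy's `pitch_algo` options, but the helper itself is hypothetical, not part of the library.

```python
def pick_pitch_algo(prefer: str) -> str:
    """Choose a pitch_algo value by priority (hypothetical helper, not rvcpy API).

    'speed'   -> 'pm'     (fastest, good quality)
    'quality' -> 'rmvpe'  (best quality, recommended for singing)
    otherwise -> 'rmvpe+' (fast with high quality, a balanced default)
    """
    if prefer == "speed":
        return "pm"
    if prefer == "quality":
        return "rmvpe"
    return "rmvpe+"

print(pick_pitch_algo("quality"))  # rmvpe
```

The chosen string can be passed directly as `pitch_algo` in `apply_conf()`.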
- RVC Project: Retrieval-Based-Voice-Conversion-WebUI
- FFmpeg: Audio decoding backend
- Faiss: Facebook AI Similarity Search (for index matching)
- Fairseq: Hubert model loading
MIT License. Free for personal, academic, and commercial use.
See LICENSE for details.
This software is provided for educational and research purposes only. The authors do not endorse:
- Voice impersonation
- Misuse in deepfakes or deceptive content
- Violation of privacy or consent
Use responsibly and ethically. You are fully responsible for how you use this tool.
Found a bug? Want a new feature?
Open an issue or submit a PR!
We welcome improvements in:
- Performance
- Documentation
- Audio quality
- New backends (ONNX, TensorRT)