A high-performance Python wrapper for Retrieval-Based-Voice-Conversion (RVC), optimized for speed, simplicity, and integration.
Fast. Lightweight. Production-ready.
This package provides a clean, efficient interface for running RVC voice conversion models in Python. Designed with low-latency inference, batch processing, and pipeline compatibility in mind, it's ideal for voice apps, AI avatars, content creation tools, and research.
- Preloaded Models: Load models once and reuse them, drastically reducing inference latency.
- Batch Inference: Convert multiple audio files in parallel with multi-threading support.
- Flexible I/O: Accepts file paths or raw NumPy arrays; outputs files or in-memory audio arrays.
- Smart Caching: Automatically caches Hubert, models, and pitch estimators for faster repeated conversions.
- Configurable Tags: Manage multiple voice models with named configurations (e.g., `"singer"`, `"narrator"`).
- Seamless Resampling & Format Support: Built-in handling for MP3, WAV, OGG, FLAC via `soundfile` and `librosa`.
- Python 3.10+
- FFmpeg (required for audio loading). Install via:
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt-get install ffmpeg`
  - Windows: Download from FFmpeg.org or use `conda install ffmpeg`
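Before installing, you can sanity-check that FFmpeg is discoverable on your PATH (a small standalone snippet, not part of rvcpy):

```python
import shutil

# Look up the ffmpeg executable on PATH; rvcpy's audio loading
# will fail at runtime if this returns None.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("FFmpeg not found - install it before using rvcpy")
else:
    print(f"FFmpeg found at {ffmpeg_path}")
```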
Tip for Windows users: You may need Microsoft C++ Build Tools:
- Download Build Tools for Visual Studio
- Install with:
  - C++ build tools
  - MSVC v142+ (VS 2019 or later)
  - Windows 10/11 SDK
```shell
pip install git+https://github.com/BF667/rvcpy
```

Required models (`hubert_base.pt`, `rmvpe.pt`) are downloaded automatically on first run.
```python
from rvcpy import BaseLoader

# Use GPU (if available) or fall back to CPU
converter = BaseLoader(
    only_cpu=False,    # Set True to force CPU
    hubert_path=None,  # Optional: custom Hubert model path
    rmvpe_path=None,   # Optional: custom RMVPE model path
)
```
Assign a name (tag) to a model and its settings:
```python
converter.apply_conf(
    tag="yoimiya",                     # Your voice model's nickname
    file_model="models/yoimiya.pth",   # Path to .pth model
    pitch_algo="rmvpe+",               # Pitch detection: 'pm', 'harvest', 'dio', 'rmvpe', 'rmvpe+'
    pitch_lvl=0,                       # Pitch shift in semitones
    file_index="models/yoimiya.index", # Optional index file for timbre preservation
    index_influence=0.66,              # Index influence (0.0 to 1.0)
    respiration_median_filtering=3,    # Filter breathiness
    envelope_ratio=0.25,               # Volume envelope mix (0 = original, 1 = converted)
    consonant_breath_protection=0.33,  # Protect consonants from artifacts
    resample_sr=0,                     # Resample output (0 = keep original sample rate)
)
```
You can define multiple tags (e.g., `"singer"`, `"narrator"`) and switch between them seamlessly.
```python
audio_files = ["input1.wav", "input2.mp3"]  # or just a string: "single.wav"
tags = ["yoimiya"] * len(audio_files)       # or ["singer", "narrator"] per file

results = converter(
    audio_files=audio_files,
    tag_list=tags,
    overwrite=False,     # Keep originals
    parallel_workers=4,  # Use 4 threads
    type_output="mp3",   # Optional: force output format (wav, mp3, flac, ogg)
)

print("Converted files:", results)
# Output: ['input1_edited.mp3', 'input2_edited.mp3']
```
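If you need to locate converted files programmatically, the naming pattern can be predicted from the sample output above. The helper below is hypothetical (not part of the rvcpy API) and assumes the `_edited` suffix shown in that output:

```python
from pathlib import Path

def expected_output_name(input_path: str, fmt: str) -> str:
    """Predict the converted filename, assuming the `_edited` suffix
    seen in rvcpy's sample output (hypothetical helper, not rvcpy API)."""
    p = Path(input_path)
    return str(p.with_name(f"{p.stem}_edited.{fmt}"))

print(expected_output_name("input1.wav", "mp3"))  # input1_edited.mp3
```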
Use raw audio arrays (e.g., from mic, streaming, or preprocessing):
```python
import numpy as np

# Simulate input: (audio_array, sample_rate)
audio_array = np.random.randn(16000).astype(np.float32) * 0.1
sample_rate = 16000
audio_data = (audio_array, sample_rate)

# Generate with caching (fast repeated calls)
result_array, output_sr = converter.generate_from_cache(
    audio_data=audio_data,
    tag="yoimiya",
    reload=False,  # Use cached model if config unchanged
)

# Save or play
import soundfile as sf
sf.write("output.wav", result_array, output_sr)

# Or play in Jupyter
from IPython.display import Audio
Audio(result_array, rate=output_sr)
```
`generate_from_cache()` is ideal for real-time apps, Gradio demos, or notebook prototyping.
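Real inputs (mic capture, streaming chunks) often arrive as 16-bit PCM or stereo, while the example above feeds float32 mono. A small normalization step, sketched here with NumPy only (the helper name is illustrative, not rvcpy API), produces the `(array, sample_rate)` tuple in that shape:

```python
import numpy as np

def prepare_audio(samples: np.ndarray, sample_rate: int) -> tuple[np.ndarray, int]:
    """Normalize raw PCM into a (float32 mono array, sample_rate) tuple."""
    audio = samples.astype(np.float32)
    if samples.dtype == np.int16:
        audio /= 32768.0      # scale 16-bit PCM into [-1.0, 1.0]
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # stereo -> mono by averaging channels
    return audio, sample_rate

# Example: a 1-second 440 Hz tone as int16 PCM, as a mic buffer might deliver it
pcm = (np.sin(2 * np.pi * 440 * np.arange(16000) / 16000) * 32767).astype(np.int16)
audio_data = prepare_audio(pcm, 16000)
print(audio_data[0].dtype)  # float32
```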
| Tip | Benefit |
|---|---|
| Use `generate_from_cache()` | Avoids reloading models on every call |
| Set `parallel_workers > 1` | Speeds up batch processing |
| Keep models in GPU memory | Minimizes CPU-GPU transfer overhead |
| Reuse `BaseLoader` instances | Maximizes caching benefits |
Note: Changing a tag's config forces a model reload. Use multiple `BaseLoader` instances for true multi-model concurrency.
Free GPU/CPU memory when done:

```python
converter.unload_models()
```
Run different voices simultaneously using separate loaders:

```python
singer = BaseLoader()
narrator = BaseLoader()

singer.apply_conf(tag="singer", file_model="singer.pth", pitch_algo="rmvpe+")
narrator.apply_conf(tag="narrator", file_model="narrator.pth", pitch_algo="pm")

# Run independently
singer("song.wav", "singer", type_output="array")
narrator("story.mp3", "narrator", type_output="array")
```
| Method | Speed | Quality | Notes |
|---|---|---|---|
| `pm` | Fastest | Good | Default, lightweight |
| `harvest` | Slow | High | Legacy, stable pitch |
| `rmvpe` | Balanced | Best | Recommended for singing |
| `rmvpe+` | Fast | High | Optimized version of RMVPE |
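The trade-offs in the table can be encoded as a small selection helper. Everything here is a sketch: the algorithm names match rvcpy's `pitch_algo` options, but the helper itself is hypothetical, not part of the library.

```python
def pick_pitch_algo(prefer: str) -> str:
    """Choose a pitch_algo value by priority (hypothetical helper, not rvcpy API).

    'speed'   -> 'pm'     (fastest, good quality)
    'quality' -> 'rmvpe'  (best quality, recommended for singing)
    otherwise -> 'rmvpe+' (fast with high quality, a balanced default)
    """
    if prefer == "speed":
        return "pm"
    if prefer == "quality":
        return "rmvpe"
    return "rmvpe+"

print(pick_pitch_algo("quality"))  # rmvpe
```

The chosen string can be passed directly as `pitch_algo` in `apply_conf()`.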
- RVC Project: Retrieval-Based-Voice-Conversion-WebUI
- FFmpeg: Audio decoding backend
- Faiss: Facebook AI Similarity Search (for index matching)
- Fairseq: Hubert model loading
MIT License. Free for personal, academic, and commercial use.
See LICENSE for details.
This software is provided for educational and research purposes only. The authors do not endorse:
- Voice impersonation
- Misuse in deepfakes or deceptive content
- Violation of privacy or consent
Use responsibly and ethically. You are fully responsible for how you use this tool.
Found a bug? Want a new feature?
Open an issue or submit a PR!
We welcome improvements in:
- Performance
- Documentation
- Audio quality
- New backends (ONNX, TensorRT)