
🎤 RVC Python FastInference

A high-performance Python wrapper for Retrieval-Based-Voice-Conversion (RVC), optimized for speed, simplicity, and integration.

Fast. Lightweight. Production-ready.

This package provides a clean, efficient interface for running RVC voice conversion models in Python. Designed with low-latency inference, batch processing, and pipeline compatibility in mind, it's ideal for voice apps, AI avatars, content creation tools, and research.


✨ Features

  • 🔧 Preloaded Models: Load models once and reuse them, drastically reducing inference latency.
  • 🚀 Batch Inference: Convert multiple audio files in parallel with multi-threading support.
  • 💾 Flexible I/O: Accepts file paths or raw NumPy arrays; outputs files or in-memory audio arrays.
  • 🧠 Smart Caching: Automatically caches Hubert, models, and pitch estimators for faster repeated conversions.
  • ⚙️ Configurable Tags: Manage multiple voice models with named configurations (e.g., "singer", "narrator").
  • 🔄 Seamless Resampling & Format Support: Built-in handling for MP3, WAV, OGG, FLAC via soundfile and librosa.

πŸ› οΈ Installation

Prerequisites

  • Python 3.10+
  • FFmpeg (required for audio loading)
    • Install via:
      • macOS: brew install ffmpeg
      • Ubuntu/Debian: sudo apt-get install ffmpeg
      • Windows: Download from FFmpeg.org or use conda install ffmpeg
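
Since FFmpeg is an external dependency, a quick pre-flight check from Python can save debugging time. This is a small sketch using only the standard library; `ffmpeg_available` is an illustrative helper, not part of rvcpy:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is discoverable on PATH."""
    return shutil.which("ffmpeg") is not None

if not ffmpeg_available():
    print("FFmpeg not found: install it before loading audio with rvcpy")
```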

💡 Windows Users: You may need Microsoft C++ Build Tools:

  1. Download Build Tools for Visual Studio
  2. Install with:
    • ✅ C++ build tools
    • ✅ MSVC v142+ (VS 2019 or later)
    • ✅ Windows 10/11 SDK

Install via git

pip install git+https://github.com/BF667/rvcpy

✅ Automatically downloads required models (hubert_base.pt, rmvpe.pt) on first run.


🚀 Quick Start

1. Initialize the Converter

from rvcpy import BaseLoader

# Use GPU (if available) or fallback to CPU
converter = BaseLoader(
    only_cpu=False,           # Set True to force CPU
    hubert_path=None,         # Optional: custom Hubert model path
    rmvpe_path=None           # Optional: custom RMVPE model path
)
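
The only_cpu flag presumably drives a device-selection step along these lines. This is an illustrative sketch of the usual fallback logic; BaseLoader's actual internals may differ:

```python
def select_device(only_cpu: bool, cuda_available: bool) -> str:
    """Pick an inference device: fall back to CPU when forced or when no GPU exists."""
    if only_cpu or not cuda_available:
        return "cpu"
    return "cuda"

# With a GPU present and only_cpu=False, inference runs on CUDA
print(select_device(only_cpu=False, cuda_available=True))  # cuda
```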

2. Configure a Voice Model (Tag-Based)

Assign a name (tag) to a model and its settings:

converter.apply_conf(
    tag="yoimiya",                    # Your voice model's nickname
    file_model="models/yoimiya.pth",  # Path to .pth model
    pitch_algo="rmvpe+",              # Pitch detection: 'pm', 'harvest', 'dio', 'rmvpe', 'rmvpe+'
    pitch_lvl=0,                      # Pitch shift in semitones
    file_index="models/yoimiya.index",# Optional index file for timbre preservation
    index_influence=0.66,             # Index influence (0.0–1.0)
    respiration_median_filtering=3,   # Filter breathiness
    envelope_ratio=0.25,              # Volume envelope mix (0 = original, 1 = converted)
    consonant_breath_protection=0.33, # Protect consonants from artifacts
    resample_sr=0,                    # Resample output (0 = keep original sample rate)
)
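
Several of these parameters have natural validity ranges, so it can pay to validate settings before handing them to the library. A hedged sketch of that kind of check; `validate_conf` is an illustrative helper, not a library function:

```python
VALID_PITCH_ALGOS = {"pm", "harvest", "dio", "rmvpe", "rmvpe+"}

def validate_conf(pitch_algo: str, index_influence: float) -> None:
    """Raise ValueError for obviously invalid settings before they reach the model."""
    if pitch_algo not in VALID_PITCH_ALGOS:
        raise ValueError(f"unknown pitch_algo: {pitch_algo!r}")
    if not 0.0 <= index_influence <= 1.0:
        raise ValueError("index_influence must be within 0.0-1.0")

validate_conf("rmvpe+", 0.66)  # passes silently
```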

πŸ” You can define multiple tags (e.g., "singer", "narrator") and switch between them seamlessly.


3. Run Inference

Convert one or more files:

audio_files = ["input1.wav", "input2.mp3"]  # or just a string: "single.wav"
tags = ["yoimiya"] * len(audio_files)       # or ["singer", "narrator"] per file

results = converter(
    audio_files=audio_files,
    tag_list=tags,
    overwrite=False,             # Keep originals
    parallel_workers=4,          # Use 4 threads
    type_output="mp3"            # Optional: force output format (wav, mp3, flac, ogg)
)

print("Converted files:", results)
# Output: ['input1_edited.mp3', 'input2_edited.mp3']
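
The `_edited` suffix in the returned paths follows a predictable pattern; if you need to anticipate output filenames (e.g., for cleanup or downstream steps), here is a sketch of the assumed naming rule, inferred from the example output above:

```python
from pathlib import Path

def edited_name(src: str, ext: str) -> str:
    """Predict the output path, assuming the '<stem>_edited.<ext>' convention."""
    p = Path(src)
    return str(p.with_name(f"{p.stem}_edited.{ext}"))

print(edited_name("input1.wav", "mp3"))  # input1_edited.mp3
```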

4. Real-Time / Array-Based Inference (Advanced)

Use raw audio arrays (e.g., from mic, streaming, or preprocessing):

import numpy as np

# Simulate input: (audio_array, sample_rate)
audio_array = np.random.randn(16000).astype(np.float32) * 0.1
sample_rate = 16000
audio_data = (audio_array, sample_rate)

# Generate with caching (fast repeated calls)
result_array, output_sr = converter.generate_from_cache(
    audio_data=audio_data,
    tag="yoimiya",
    reload=False  # Use cached model if config unchanged
)

# Save or play
import soundfile as sf
sf.write("output.wav", result_array, output_sr)

# Or play in Jupyter
from IPython.display import Audio
Audio(result_array, rate=output_sr)

✅ generate_from_cache() is ideal for real-time apps, Gradio demos, or notebook prototyping.
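
Microphone or streaming sources often deliver int16 PCM, while RVC-style pipelines generally expect float32 in [-1, 1] as in the example above. A small conversion helper (illustrative; confirm the exact dtype and range rvcpy expects):

```python
import numpy as np

def pcm16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Convert int16 PCM samples to float32 in the range [-1, 1]."""
    return pcm.astype(np.float32) / 32768.0

mic_chunk = np.array([0, 16384, -32768], dtype=np.int16)
print(pcm16_to_float32(mic_chunk))  # [ 0.   0.5 -1. ]
```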


🧠 Performance Tips

  • Use generate_from_cache(): avoids reloading models on every call
  • Set parallel_workers > 1: speeds up batch processing
  • Keep models in GPU memory: minimizes CPU-GPU transfer overhead
  • Reuse BaseLoader instances: maximizes caching benefits

⚠️ Note: Changing a tag's config forces a model reload. Use multiple BaseLoader instances for true multi-model concurrency.


🧹 Clean Up

Free GPU/CPU memory when done:

converter.unload_models()

📚 Advanced: Multiple Models

Run different voices simultaneously using separate loaders:

singer = BaseLoader()
narrator = BaseLoader()

singer.apply_conf(tag="singer", file_model="singer.pth", pitch_algo="rmvpe+")
narrator.apply_conf(tag="narrator", file_model="narrator.pth", pitch_algo="pm")

# Run independently
singer("song.wav", "singer", type_output="array")
narrator("story.mp3", "narrator", type_output="array")

🏷️ Supported Pitch Algorithms

Method  | Speed       | Quality       | Notes
pm      | ⚡ Fastest  | ✅ Good       | Default, lightweight
harvest | 🐢 Slow     | ✅✅ High     | Legacy, stable pitch
rmvpe   | ⚖️ Balanced | ✅✅✅ Best   | Recommended for singing
rmvpe+  | ⚡ Fast     | ✅✅ High     | Optimized version of RMVPE
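
The table above reduces to a simple decision rule; a hypothetical helper encoding that heuristic (illustrative only, not part of rvcpy):

```python
def pick_pitch_algo(singing: bool, prefer_speed: bool) -> str:
    """Heuristic mapping of the comparison table above."""
    if singing:
        return "rmvpe+" if prefer_speed else "rmvpe"
    return "pm" if prefer_speed else "harvest"

print(pick_pitch_algo(singing=True, prefer_speed=False))  # rmvpe
```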

📜 Credits

  • RVC Project – Retrieval-Based-Voice-Conversion-WebUI
  • FFmpeg – Audio decoding backend
  • Faiss – Facebook AI Similarity Search (for index matching)
  • Fairseq – Hubert model loading

📄 License

MIT License. Free for personal, academic, and commercial use.

See LICENSE for details.


⚠️ Disclaimer

This software is provided for educational and research purposes only. The authors do not endorse:

  • Voice impersonation
  • Misuse in deepfakes or deceptive content
  • Violation of privacy or consent

Use responsibly and ethically. You are fully responsible for how you use this tool.


🤝 Feedback & Contributions

Found a bug? Want a new feature?
👉 Open an issue or submit a PR!

We welcome improvements in:

  • Performance
  • Documentation
  • Audio quality
  • New backends (ONNX, TensorRT)
