docs(vc): add page on voice conversion
[ci skip]
eginhard committed Jan 15, 2025
1 parent 5e1085c commit 0c3d995
Showing 4 changed files with 91 additions and 2 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -235,7 +235,7 @@ tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)

#### Voice conversion (VC)

Converting the voice in `source_wav` to the voice of `target_wav`
Converting the voice in `source_wav` to the voice of `target_wav`:

```python
tts = TTS("voice_conversion_models/multilingual/vctk/freevc24").to("cuda")
@@ -247,9 +247,13 @@ tts.voice_conversion_to_file(
```

Other available voice conversion models:
- `voice_conversion_models/multilingual/multi-dataset/knnvc`
- `voice_conversion_models/multilingual/multi-dataset/openvoice_v1`
- `voice_conversion_models/multilingual/multi-dataset/openvoice_v2`

For more details, see the
[documentation](https://coqui-tts.readthedocs.io/en/latest/vc.html).

#### Voice cloning by combining single speaker TTS model with the default VC model

This way, you can clone voices by using any model in 🐸TTS. The FreeVC model is
1 change: 1 addition & 0 deletions docs/source/inference.md
@@ -16,6 +16,7 @@ Coqui TTS provides three main methods for inference:

```{toctree}
:hidden:
vc
server
marytts
```
2 changes: 1 addition & 1 deletion docs/source/models/xtts.md
@@ -182,7 +182,7 @@ To use the model API, you need to download the model files and pass config and m
If you want to be able to `load_checkpoint` with `use_deepspeed=True` and **enjoy the speedup**, you need to install deepspeed first.

```console
pip install deepspeed==0.10.3
pip install deepspeed
```

#### Inference parameters
84 changes: 84 additions & 0 deletions docs/source/vc.md
@@ -0,0 +1,84 @@
# Voice conversion

## Overview

Voice conversion (VC) converts the voice in a speech signal from one speaker to
that of another speaker while preserving the linguistic content. Coqui supports
voice conversion on its own, as well as applying it after speech synthesis to
enable multi-speaker output with single-speaker TTS models.

### Python API

Converting the voice in `source_wav` to the voice of `target_wav` (the latter
can also be a list of files):

```python
from TTS.api import TTS

tts = TTS("voice_conversion_models/multilingual/vctk/freevc24").to("cuda")
tts.voice_conversion_to_file(
    source_wav="my/source.wav",
    target_wav="my/target.wav",
    file_path="output.wav"
)
```
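
Since `target_wav` can also be a list of files, one way to use several reference clips is to gather them from a folder first. The following is a minimal sketch and not part of the commit: the helper names `collect_target_wavs` and `convert_with_references` and the folder layout are assumptions made for illustration. Only the `TTS` constructor and the `voice_conversion_to_file` call are taken from the example above.

```python
from pathlib import Path


def collect_target_wavs(directory):
    """Return the .wav files directly inside `directory`, sorted by path."""
    return sorted(str(path) for path in Path(directory).glob("*.wav"))


def convert_with_references(source_wav, target_dir, file_path="output.wav"):
    """Convert `source_wav` to the voice heard in the clips under `target_dir`."""
    # Imported lazily so collect_target_wavs() works without the TTS package.
    from TTS.api import TTS

    targets = collect_target_wavs(target_dir)
    if not targets:
        raise FileNotFoundError(f"no .wav files found in {target_dir}")

    tts = TTS("voice_conversion_models/multilingual/vctk/freevc24")
    tts.voice_conversion_to_file(
        source_wav=source_wav,
        target_wav=targets,  # a list of reference files (hypothetical paths)
        file_path=file_path,
    )
    return file_path
```

Calling `convert_with_references("my/source.wav", "my/targets")` would then write the converted speech to `output.wav`, provided the folder contains at least one `.wav` file.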

Voice cloning by combining TTS and VC: speech is first synthesized with the TTS
model, then converted to the target voice with the FreeVC model.

```python

from TTS.api import TTS

tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
tts.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav=["target1.wav", "target2.wav"],
    file_path="output.wav"
)
```

Some models, including [XTTS](models/xtts.md), support voice cloning directly,
so a separate voice conversion step is not necessary:

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")
wav = tts.tts(
    text="Hello world!",
    speaker_wav="my/cloning/audio.wav",
    language="en"
)
```

### CLI

```sh
tts --out_path output/path/speech.wav \
    --model_name "<language>/<dataset>/<model_name>" \
    --source_wav <path/to/speaker/wav> \
    --target_wav <path/to/reference/wav1> <path/to/reference/wav2>
```
```
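
When the CLI is driven from a script, it can help to assemble the arguments as a list rather than a shell string. The sketch below is illustrative only: `build_vc_command` is a hypothetical helper, and it assumes that `--model_name` accepts the same model identifier as the Python API. The flags themselves are the ones shown in the command above.

```python
import shlex


def build_vc_command(model_name, source_wav, target_wavs, out_path):
    """Build the argument list for the `tts` voice conversion command."""
    # Accept a single reference file as well as a list of them.
    if isinstance(target_wavs, str):
        target_wavs = [target_wavs]
    return [
        "tts",
        "--out_path", out_path,
        "--model_name", model_name,
        "--source_wav", source_wav,
        "--target_wav", *target_wavs,
    ]


command = build_vc_command(
    # Assumption: the CLI takes the same identifier as the Python API.
    model_name="voice_conversion_models/multilingual/vctk/freevc24",
    source_wav="my/source.wav",
    target_wavs=["my/target1.wav", "my/target2.wav"],
    out_path="output/path/speech.wav",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it without going through a shell, which avoids quoting problems with paths that contain spaces.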

## Pretrained models

Coqui includes the following pretrained voice conversion models. Training is not
supported.

### FreeVC

- `voice_conversion_models/multilingual/vctk/freevc24`

Adapted from: https://github.com/OlaWod/FreeVC

### kNN-VC

- `voice_conversion_models/multilingual/multi-dataset/knnvc`

At least 1-5 minutes of target speaker data are recommended.

Adapted from: https://github.com/bshall/knn-vc
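
To check whether a set of reference clips reaches the recommended amount of target speaker data, their durations can be summed before running kNN-VC. This is a rough sketch using only the Python standard library. The function names and the 60-second default (the lower end of the 1-5 minute recommendation) are assumptions, and the `wave` module only reads uncompressed PCM `.wav` files.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_duration_seconds(paths):
    """Sum the durations of all given .wav files."""
    return sum(wav_duration_seconds(path) for path in paths)


def has_enough_target_audio(paths, minimum_seconds=60.0):
    """Check the clips against a minimum total duration (assumed: 1 minute)."""
    return total_duration_seconds(paths) >= minimum_seconds
```

If `has_enough_target_audio(["target1.wav", "target2.wav"])` returns `False`, adding more clips of the target speaker is likely worthwhile.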

### OpenVoice

- `voice_conversion_models/multilingual/multi-dataset/openvoice_v1`
- `voice_conversion_models/multilingual/multi-dataset/openvoice_v2`

Adapted from: https://github.com/myshell-ai/OpenVoice
