0.14.0 +streaming, +pcm, +wav, +temp, top_p, etc.
matatonic committed Jun 27, 2024
1 parent 65c03e3 commit ae6a384
Showing 8 changed files with 205 additions and 72 deletions.
30 changes: 29 additions & 1 deletion README.md
@@ -10,7 +10,7 @@ An OpenAI API compatible text to speech server.
Full Compatibility:
* `tts-1`: `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer` (configurable)
* `tts-1-hd`: `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer` (configurable, uses OpenAI samples by default)
-* response_format: `mp3`, `opus`, `aac`, or `flac`
+* response_format: `mp3`, `opus`, `aac`, `flac`, `wav` and `pcm`
* speed 0.25-4.0 (and more)

Details:
@@ -20,13 +20,23 @@ Details:
* Custom cloned voices can be used for tts-1-hd, See: [Custom Voices Howto](#custom-voices-howto)
* 🌐 [Multilingual](#multilingual) support with XTTS voices
* [Custom fine-tuned XTTS model support](#custom-fine-tuned-model-support)
* Configurable [generation parameters](#generation-parameters)
* Streamed output while generating
* Occasionally, certain words or symbols may sound incorrect; you can fix them with regex via `pre_process_map.yaml`


If you find a better voice match for `tts-1` or `tts-1-hd`, please let me know so I can update the defaults.

## Recent Changes

Version 0.14.0, 2024-06-26

* Added `response_format`: `wav` and `pcm` support
* Output streaming (while generating) for `tts-1` and `tts-1-hd`
* Enhanced [generation parameters](#generation-parameters) for xtts models (temperature, top_p, etc.)
* Idle unload timer (optional) - doesn't work perfectly yet
* Improved error handling
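
The `pcm` format carries no header, so a client has to know the sample layout to play or store it. A minimal sketch of wrapping such a response in a WAV container, assuming the server follows OpenAI's documented `pcm` layout (24 kHz, 16-bit signed little-endian, mono). `pcm_to_wav` is a hypothetical helper and not part of this project:

```python
import io
import wave

# Assumed PCM layout (OpenAI's documented `pcm` format); verify against your server.
SAMPLE_RATE = 24000   # samples per second
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    # One second of silence stands in for a real `pcm` response body.
    silence = b"\x00\x00" * SAMPLE_RATE
    with open("silence.wav", "wb") as f:
        f.write(pcm_to_wav(silence))
```

If the server's actual sample rate or width differs, only the three constants need to change.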

Version 0.13.0, 2024-06-25

* Added [Custom fine-tuned XTTS model support](#custom-fine-tuned-model-support)
@@ -313,3 +323,21 @@ tts-1-hd:
model_path: voices/halo
```
3) The model will be loaded when you access the voice for the first time (`--preload` doesn't work with custom models yet)

## Generation Parameters

The generation of XTTSv2 voices can be fine-tuned with the following options (defaults included below):

```yaml
tts-1-hd:
  alloy:
    model: xtts
    speaker: voices/alloy.wav
    enable_text_splitting: True
    length_penalty: 1.0
    repetition_penalty: 10
    speed: 1.0
    temperature: 0.75
    top_k: 50
    top_p: 0.85
```
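
Options left out of a voice entry presumably fall back to the defaults listed above. A sketch of that lookup, assuming a voice entry is a plain mapping loaded from the YAML file. `generation_params` and `XTTS_DEFAULTS` are hypothetical names for illustration, not the server's actual code:

```python
# Defaults copied from the Generation Parameters listing above.
XTTS_DEFAULTS = {
    "enable_text_splitting": True,
    "length_penalty": 1.0,
    "repetition_penalty": 10,
    "speed": 1.0,
    "temperature": 0.75,
    "top_k": 50,
    "top_p": 0.85,
}


def generation_params(voice_cfg: dict) -> dict:
    """Return the generation options for one voice entry, using defaults for missing keys."""
    return {key: voice_cfg.get(key, default) for key, default in XTTS_DEFAULTS.items()}


if __name__ == "__main__":
    # Hypothetical voice entry that overrides two of the options.
    alloy = {
        "model": "xtts",
        "speaker": "voices/alloy.wav",
        "temperature": 0.6,
        "top_p": 0.9,
    }
    print(generation_params(alloy))
```

Keys such as `model` and `speaker` are not generation options, so the helper leaves them out of its result.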
5 changes: 1 addition & 4 deletions download_voices_tts-1-hd.bat
@@ -2,10 +2,7 @@
set COQUI_TOS_AGREED=1
set TTS_HOME=voices

-set MODELS=%*
-if "%MODELS%" == "" set MODELS=xtts
-
-for %%i in (%MODELS%) do (
+for %%i in (%*) do (
python -c "from TTS.utils.manage import ModelManager; ModelManager().download_model('%%i')"
)
call download_samples.bat
3 changes: 1 addition & 2 deletions download_voices_tts-1-hd.sh
@@ -2,8 +2,7 @@
export COQUI_TOS_AGREED=1
export TTS_HOME=voices

-MODELS=${*:-xtts}
-for model in $MODELS; do
+for model in $*; do
python -c "from TTS.utils.manage import ModelManager; ModelManager().download_model('$model')"
done
./download_samples.sh
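
With the `xtts` default removed, both download scripts now fetch only the models named on the command line, and nothing when called without arguments. The same loop can be sketched in Python. `download_models` is a hypothetical helper, and the `download` parameter exists only so the loop can be exercised without Coqui TTS installed; the `ModelManager().download_model` call is the one the scripts already use:

```python
import sys
from typing import Callable, Iterable, List, Optional


def download_models(
    names: Iterable[str],
    download: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Download each named model in order and return the names processed."""
    names = list(names)
    if names and download is None:
        # Imported lazily so this module loads without Coqui TTS installed.
        from TTS.utils.manage import ModelManager
        download = ModelManager().download_model
    done = []
    for name in names:
        download(name)
        done.append(name)
    return done


if __name__ == "__main__":
    # No arguments means an empty list, so no model is downloaded.
    download_models(sys.argv[1:])
```

The sketch leaves `COQUI_TOS_AGREED` and `TTS_HOME` to the calling environment, whereas the scripts set both before the loop.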
4 changes: 3 additions & 1 deletion requirements-rocm.txt
@@ -4,7 +4,9 @@ loguru
# piper-tts
piper-tts==1.2.0
# xtts
-TTS
+TTS==0.22.0
+# https://github.com/huggingface/transformers/issues/31040
+transformers<4.41.0
# XXX, 3.8+ has some issue for now
spacy==3.7.4

4 changes: 3 additions & 1 deletion requirements.txt
@@ -4,7 +4,9 @@ loguru
# piper-tts
piper-tts==1.2.0
# xtts
-TTS
+TTS==0.22.0
+# https://github.com/huggingface/transformers/issues/31040
+transformers<4.41.0
# XXX, 3.8+ has some issue for now
spacy==3.7.4
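
The new pins can be checked against an existing environment before upgrading. A rough sketch, assuming plain numeric version strings; `parse_version`, `satisfies`, and `PINS` are hypothetical names, and a library such as `packaging` handles suffixes like `+cu121` that this sketch does not:

```python
from typing import Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '4.40.2' into (4, 40, 2), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(installed: str, op: str, pinned: str) -> bool:
    """Check an installed version against one pin using '==' or '<'."""
    have, want = parse_version(installed), parse_version(pinned)
    if op == "==":
        return have == want
    if op == "<":
        return have < want
    raise ValueError(f"unsupported operator: {op}")


# Pins taken from the requirements files in this commit.
PINS = {
    "TTS": ("==", "0.22.0"),
    "transformers": ("<", "4.41.0"),
    "spacy": ("==", "3.7.4"),
}

if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    for name, (op, pinned) in PINS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if satisfies(installed, op, pinned) else "MISMATCH"
        print(f"{name} {installed} (wants {op}{pinned}): {status}")
```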

2 changes: 1 addition & 1 deletion sample.env
@@ -2,5 +2,5 @@ TTS_HOME=voices
HF_HOME=voices
#PRELOAD_MODEL=xtts
#PRELOAD_MODEL=xtts_v2.0.2
-#EXTRA_ARGS=--log-level DEBUG
+#EXTRA_ARGS=--log-level DEBUG --unload-timer 300
#USE_ROCM=1
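
The `--unload-timer 300` example enables the optional idle unload timer added in this release. The general idea can be sketched as below. This is a hypothetical illustration, not the server's implementation, and it assumes the value is a number of seconds; `IdleUnloader`, `touch`, and `check` are made-up names:

```python
import time
from typing import Callable, Optional


class IdleUnloader:
    """Unload a resource once it has gone unused for `timeout` seconds."""

    def __init__(
        self,
        timeout: float,
        unload: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout = timeout
        self.unload = unload
        self.clock = clock
        self.last_used: Optional[float] = None  # None means nothing is loaded

    def touch(self) -> None:
        """Record a use of the resource, e.g. at the start of each request."""
        self.last_used = self.clock()

    def check(self) -> bool:
        """Unload if idle for at least `timeout`; return True if an unload happened."""
        if self.last_used is None:
            return False
        if self.clock() - self.last_used < self.timeout:
            return False
        self.unload()
        self.last_used = None
        return True


if __name__ == "__main__":
    unloader = IdleUnloader(300, lambda: print("model unloaded"))
    unloader.touch()   # a request just used the model
    unloader.check()   # call periodically; unloads once 300 s have passed idle
```

Passing the clock in as a parameter keeps the idle check deterministic under test; a real server would also need to guard against unloading while a request is still generating.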