0.14.0 +streaming, +pcm, +wav, +temp, top_p, etc.

matatonic · Jun 27, 2024 · ae6a384
1 parent 65c03e3
commit ae6a384

Showing 8 changed files with 205 additions and 72 deletions.
diff --git a/README.md b/README.md
@@ -10,7 +10,7 @@ An OpenAI API compatible text to speech server.
 Full Compatibility:
 * `tts-1`: `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer` (configurable)
 * `tts-1-hd`:  `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer` (configurable, uses OpenAI samples by default)
-* response_format: `mp3`, `opus`, `aac`, or `flac`
+* response_format: `mp3`, `opus`, `aac`, `flac`, `wav` and `pcm`
 * speed 0.25-4.0 (and more)
 
 Details:
@@ -20,13 +20,23 @@ Details:
   * Custom cloned voices can be used for tts-1-hd, See: [Custom Voices Howto](#custom-voices-howto)
   * 🌐 [Multilingual](#multilingual) support with XTTS voices
   * [Custom fine-tuned XTTS model support](#custom-fine-tuned-model-support)
+  * Configurable [generation parameters](#generation-parameters)
+  * Streamed output while generating
 * Occasionally, certain words or symbols may sound incorrect, you can fix them with regex via `pre_process_map.yaml`
 
 
 If you find a better voice match for `tts-1` or `tts-1-hd`, please let me know so I can update the defaults.
 
 ## Recent Changes
 
+Version 0.14.0, 2024-06-26
+
+* Added `response_format`: `wav` and `pcm` support
+* Output streaming (while generating) for `tts-1` and `tts-1-hd`
+* Enhanced [generation parameters](#generation-parameters) for xtts models (temperature, top_p, etc.)
+* Idle unload timer (optional) - doesn't work perfectly yet
+* Improved error handling
+
 Version 0.13.0, 2024-06-25
 
 * Added [Custom fine-tuned XTTS model support](#custom-fine-tuned-model-support)
@@ -313,3 +323,21 @@ tts-1-hd:
     model_path: voices/halo
 ```
 3) The model will be loaded when you access the voice for the first time (`--preload` doesn't work with custom models yet)
+
+## Generation Parameters
+
+The generation of XTTSv2 voices can be fine tuned with the following options (defaults included below):
+
+```yaml
+tts-1-hd:
+  alloy:
+    model: xtts
+    speaker: voices/alloy.wav
+    enable_text_splitting: True
+    length_penalty: 1.0
+    repetition_penalty: 10
+    speed: 1.0
+    temperature: 0.75
+    top_k: 50
+    top_p: 0.85
+```
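
Note on the Generation Parameters block above: the following is a minimal Python sketch of how per-voice settings could be overlaid on the listed defaults. The `XTTS_DEFAULTS` values are copied from the YAML in this diff. The helper name `resolve_generation_params` and the merge behaviour are assumptions made for illustration, not the server's actual code.

```python
# Defaults copied from the README's Generation Parameters block in this diff.
XTTS_DEFAULTS = {
    "enable_text_splitting": True,
    "length_penalty": 1.0,
    "repetition_penalty": 10,
    "speed": 1.0,
    "temperature": 0.75,
    "top_k": 50,
    "top_p": 0.85,
}


def resolve_generation_params(voice_config: dict) -> dict:
    """Return the defaults overlaid with any generation keys set on a voice.

    Hypothetical helper: keys such as `model` and `speaker` are dropped
    because they select the voice rather than tune generation.
    """
    overrides = {k: v for k, v in voice_config.items() if k in XTTS_DEFAULTS}
    return {**XTTS_DEFAULTS, **overrides}


if __name__ == "__main__":
    # A voice entry that overrides only the sampling temperature.
    voice = {"model": "xtts", "speaker": "voices/alloy.wav", "temperature": 0.6}
    print(resolve_generation_params(voice))
```

Under this sketch, a voice entry only needs to list the options it changes; everything else falls back to the defaults shown in the README.
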
diff --git a/download_voices_tts-1-hd.bat b/download_voices_tts-1-hd.bat
@@ -2,10 +2,7 @@
 set COQUI_TOS_AGREED=1
 set TTS_HOME=voices
 
-set MODELS=%* 
-if "%MODELS%" == "" set MODELS=xtts
-
-for %%i in (%MODELS%) do (
+for %%i in (%*) do (
     python -c "from TTS.utils.manage import ModelManager; ModelManager().download_model('%%i')"
 )
 call download_samples.bat
diff --git a/download_voices_tts-1-hd.sh b/download_voices_tts-1-hd.sh
@@ -2,8 +2,7 @@
 export COQUI_TOS_AGREED=1
 export TTS_HOME=voices
 
-MODELS=${*:-xtts}
-for model in $MODELS; do
+for model in $*; do
 	python -c "from TTS.utils.manage import ModelManager; ModelManager().download_model('$model')"
 done
 ./download_samples.sh
diff --git a/requirements-rocm.txt b/requirements-rocm.txt
@@ -4,7 +4,9 @@ loguru
 # piper-tts
 piper-tts==1.2.0
 # xtts
-TTS
+TTS==0.22.0
+# https://github.com/huggingface/transformers/issues/31040
+transformers<4.41.0 
 # XXX, 3.8+ has some issue for now
 spacy==3.7.4
 

diff --git a/requirements.txt b/requirements.txt
@@ -4,7 +4,9 @@ loguru
 # piper-tts
 piper-tts==1.2.0
 # xtts
-TTS
+TTS==0.22.0
+# https://github.com/huggingface/transformers/issues/31040
+transformers<4.41.0 
 # XXX, 3.8+ has some issue for now
 spacy==3.7.4
 

diff --git a/sample.env b/sample.env
@@ -2,5 +2,5 @@ TTS_HOME=voices
 HF_HOME=voices
 #PRELOAD_MODEL=xtts
 #PRELOAD_MODEL=xtts_v2.0.2
-#EXTRA_ARGS=--log-level DEBUG
+#EXTRA_ARGS=--log-level DEBUG --unload-timer 300
 #USE_ROCM=1
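
Note on `--unload-timer 300`: the changelog describes this as an optional idle unload timer. The sketch below shows one common way such a timer can be built with Python's standard library. The class `IdleUnloader` and its methods are hypothetical and are not taken from this commit, and reading the flag's value as seconds is an assumption, since the diff does not state the unit.

```python
import threading
from typing import Callable, Optional


class IdleUnloader:
    """Call `unload` once no activity has been reported for `timeout` seconds.

    Hypothetical illustration of an idle timer, not the server's code.
    """

    def __init__(self, timeout: float, unload: Callable[[], None]) -> None:
        self.timeout = timeout
        self.unload = unload
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def touch(self) -> None:
        """Record activity: cancel any pending unload and restart the countdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout, self.unload)
            self._timer.daemon = True  # do not keep the process alive
            self._timer.start()

    def cancel(self) -> None:
        """Stop the countdown without unloading."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

In this design each request would call `touch()`, so the unload callback only fires after a full quiet period. A request arriving while the callback is already running is the kind of race such a timer has to handle separately.
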