diff --git a/.github/workflows/build-docker.yml b/.github/workflows/build-docker.yml
index 4c5ba51..bff70ae 100644
--- a/.github/workflows/build-docker.yml
+++ b/.github/workflows/build-docker.yml
@@ -177,6 +177,8 @@ jobs:
          tags: ${{ env.DOCKER_REGISTRY }}/${{ env.IMAGE_NAME }}:latest
          labels: version=${{ github.run_id }}
          platforms: linux/amd64,linux/arm64
+         build-args: |
+           USE_ROCM=1
 
      # For tagged releases, build and push the Docker image with the corresponding tag
      - name: Build and Push Docker Image (Tagged)
@@ -189,4 +191,6 @@
          tags: ${{ env.DOCKER_REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}
          labels: version=${{ github.run_id }}
          platforms: linux/amd64,linux/arm64
+         build-args: |
+           USE_ROCM=1
diff --git a/Dockerfile b/Dockerfile
index 9d2fcfa..57a335e 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -14,7 +14,7 @@ ARG USE_ROCM
 ENV USE_ROCM=${USE_ROCM}
 
 COPY requirements*.txt /app/
-RUN if [ ${USE_ROCM} = "1" ]; then mv /app/requirements-rocm.txt /app/requirements.txt; fi
+RUN if [ "${USE_ROCM}" = "1" ]; then mv /app/requirements-rocm.txt /app/requirements.txt; fi
 RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
 
 COPY speech.py openedai.py say.py *.sh *.default.yaml README.md LICENSE /app/
@@ -23,7 +23,6 @@ ARG PRELOAD_MODEL
 ENV PRELOAD_MODEL=${PRELOAD_MODEL}
 ENV TTS_HOME=voices
 ENV HF_HOME=voices
-ENV OPENEDAI_LOG_LEVEL=INFO
 ENV COQUI_TOS_AGREED=1
 
 CMD bash startup.sh
diff --git a/Dockerfile.min b/Dockerfile.min
index ae1a72b..cc1db1f 100644
--- a/Dockerfile.min
+++ b/Dockerfile.min
@@ -10,12 +10,11 @@ RUN apt-get clean && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
 RUN mkdir -p voices config
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install piper-tts==1.2.0 pyyaml fastapi uvicorn loguru numpy\<2
-
+COPY requirements*.txt /app/
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements-min.txt
 COPY speech.py openedai.py say.py *.sh *.default.yaml README.md LICENSE /app/
 
 ENV TTS_HOME=voices
 ENV HF_HOME=voices
-ENV OPENEDAI_LOG_LEVEL=INFO
 
 CMD bash startup.min.sh
diff --git a/README.md b/README.md
index c4dfc36..b2b93a3 100644
--- a/README.md
+++ b/README.md
@@ -27,11 +27,11 @@ If you find a better voice match for `tts-1` or `tts-1-hd`, please let me know s
 
 ## Recent Changes
 
-Version 0.13.0, 2024-06-22
+Version 0.13.0, 2024-06-25
 
 * Added [Custom fine-tuned XTTS model support](#custom-fine-tuned-model-support)
-* Initial prebuilt arm64 image support with MPS (Apple M-series, Raspberry Pi), thanks @JakeStevenson, @hchasens
-* Initial AMD GPU (rocm 5.7) support, set USE_ROCM=1 when building docker or use requirements-rocm.txt
+* Initial prebuilt arm64 image support (Apple M-series, Raspberry Pi - MPS is not supported in XTTS/torch), thanks @JakeStevenson, @hchasens
+* Initial attempt at AMD GPU (ROCm 5.7) support
 * Parler-tts support removed
 * Move the *.default.yaml to the root folder
 * Run the docker as a service by default (`restart: unless-stopped`)
@@ -86,63 +86,68 @@ Version: 0.7.3, 2024-03-20
 
 ## Installation instructions
 
-1. Copy the `sample.env` to `speech.env` (customize if needed)
+### Create a `speech.env` environment file
+
+Copy the `sample.env` to `speech.env` (customize if needed)
 ```bash
 cp sample.env speech.env
 ```
 
-#### AMD GPU (ROCm support)
-> If you have an AMD GPU and want to use ROCm, set `USE_ROCM=1` in the `speech.env` before building the docker image. You will need to `docker compose build` before running the container in the next step.
-
-2. Option: Docker (**recommended**) (prebuilt images are available)
-Run the server:
-```shell
-docker compose up
+#### Defaults
+```bash
+TTS_HOME=voices
+HF_HOME=voices
+#PRELOAD_MODEL=xtts
+#PRELOAD_MODEL=xtts_v2.0.2
+#EXTRA_ARGS=--log-level DEBUG
+#USE_ROCM=1
 ```
-> For a minimal docker image with only piper support (<1GB vs. 8GB) use `docker compose -f docker-compose.min.yml up`
-
-
-2. Option: Manual installation:
+### Option A: Manual installation
 ```shell
 # install curl and ffmpeg
 sudo apt install curl ffmpeg
 # Create & activate a new virtual environment (optional but recommended)
 python -m venv .venv
 source .venv/bin/activate
-# Install the Python requirements - use requirements-rocm.txt for AMD GPU (ROCm support)
+# Install the Python requirements
+# - use requirements-rocm.txt for AMD GPU (ROCm support)
+# - use requirements-min.txt for piper only (CPU only)
 pip install -r requirements.txt
 # run the server
 bash startup.sh
 ```
+> On first run, the voice models will be downloaded automatically. This might take a while depending on your network connection.
 
-## Usage
+### Option B: Docker Image (*recommended*)
 
-```
-usage: speech.py [-h] [--xtts_device XTTS_DEVICE] [--preload PRELOAD] [-P PORT] [-H HOST] [-L {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
+#### Nvidia GPU (cuda)
 
-OpenedAI Speech API Server
+```shell
+docker compose up
+```
 
-options:
-  -h, --help            show this help message and exit
-  --xtts_device XTTS_DEVICE
-                        Set the device for the xtts model. The special value of 'none' will use piper for all models. (default: cuda)
-  --preload PRELOAD     Preload a model (Ex. 'xtts' or 'xtts_v2.0.2'). By default it's loaded on first use. (default: None)
-  -P PORT, --port PORT  Server tcp port (default: 8000)
-  -H HOST, --host HOST  Host to listen on, Ex. 0.0.0.0 (default: 0.0.0.0)
-  -L {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
-                        Set the log level (default: INFO)
+#### AMD GPU (ROCm support)
+
+```shell
+docker compose -f docker-compose.rocm.yml up
 ```
 
-## API Documentation
+#### ARM64 (Apple M-series, Raspberry Pi)
 
-* [OpenAI Text to speech guide](https://platform.openai.com/docs/guides/text-to-speech)
-* [OpenAI API Reference](https://platform.openai.com/docs/api-reference/audio/createSpeech)
+> XTTS only has CPU support here and will be very slow; you can use the Nvidia image for XTTS with CPU (slow), or use the piper-only image (recommended)
 
+#### CPU only, No GPU (piper only)
 
-### Sample API Usage
+> For a minimal docker image with only piper support (<1GB vs. 8GB).
+
+```shell
+docker compose -f docker-compose.min.yml up
+```
+
+
+## Sample Usage
 
 You can use it like this:
@@ -193,51 +198,18 @@ python say.py -t "The quick brown fox jumped over the lazy dog." -p
 python say.py -t "The quick brown fox jumped over the lazy dog." -m tts-1-hd -v onyx -f flac -o fox.flac
 ```
 
-```
-usage: say.py [-h] [-m MODEL] [-v VOICE] [-f {mp3,aac,opus,flac}] [-s SPEED] [-t TEXT] [-i INPUT] [-o OUTPUT] [-p]
-
-Text to speech using the OpenAI API
-
-options:
-  -h, --help            show this help message and exit
-  -m MODEL, --model MODEL
-                        The model to use (default: tts-1)
-  -v VOICE, --voice VOICE
-                        The voice of the speaker (default: alloy)
-  -f {mp3,aac,opus,flac}, --format {mp3,aac,opus,flac}
-                        The output audio format (default: mp3)
-  -s SPEED, --speed SPEED
-                        playback speed, 0.25-4.0 (default: 1.0)
-  -t TEXT, --text TEXT  Provide text to read on the command line (default: None)
-  -i INPUT, --input INPUT
-                        Read text from a file (default is to read from stdin) (default: None)
-  -o OUTPUT, --output OUTPUT
-                        The filename to save the output to (default: None)
-  -p, --playsound       Play the audio (default: False)
-
-```
-
 You can also try the included `audio_reader.py` for listening to longer text and streamed input.
 
+Example usage:
+```bash
+python audio_reader.py -s 2 < LICENSE # read the software license - fast
 ```
-usage: audio_reader.py [-h] [-m MODEL] [-v VOICE] [-s SPEED]
 
-Text to speech player
+## OpenAI API Documentation and Guide
 
-options:
-  -h, --help            show this help message and exit
-  -m MODEL, --model MODEL
-                        The OpenAI model (default: tts-1)
-  -v VOICE, --voice VOICE
-                        The voice to use (default: alloy)
-  -s SPEED, --speed SPEED
-                        How fast to read the audio (default: 1.0)
+* [OpenAI Text to speech guide](https://platform.openai.com/docs/guides/text-to-speech)
+* [OpenAI API Reference](https://platform.openai.com/docs/api-reference/audio/createSpeech)
 
-```
-Example usage:
-```bash
-$ python audio_reader.py -s 2 < LICENSE
-```
 
 ## Custom Voices Howto
diff --git a/config/config_files_will_go_here.txt b/config/config_files_will_go_here.txt
new file mode 100644
index 0000000..e69de29
diff --git a/docker-compose.rocm.yml b/docker-compose.rocm.yml
index 29cc5bd..8f3bf4e 100644
--- a/docker-compose.rocm.yml
+++ b/docker-compose.rocm.yml
@@ -2,9 +2,9 @@ services:
   server:
     build:
       dockerfile: Dockerfile
+      args:
+        - USE_ROCM=1
     image: ghcr.io/matatonic/openedai-speech-rocm
-    environment:
-      - USE_ROCM=1
     env_file: speech.env
     ports:
       - "8000:8000"
diff --git a/requirements-min.txt b/requirements-min.txt
new file mode 100644
index 0000000..744da39
--- /dev/null
+++ b/requirements-min.txt
@@ -0,0 +1,6 @@
+pyyaml
+fastapi
+uvicorn
+loguru
+numpy<2
+piper-tts==1.2.0
diff --git a/requirements.txt b/requirements.txt
index 4334264..1155e14 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -12,7 +12,7 @@ spacy==3.7.4
 # Re: https://github.com/pytorch/pytorch/issues/121834
 torch==2.2.2; sys_platform != "darwin"
 torchaudio; sys_platform != "darwin"
-# for MPS accelerated torch on Mac
+# for MPS accelerated torch on Mac - doesn't work yet, incomplete support in torch and torchaudio
 torch==2.2.2; --index-url https://download.pytorch.org/whl/cpu; sys_platform == "darwin"
 torchaudio==2.2.2; --index-url https://download.pytorch.org/whl/cpu; sys_platform == "darwin"
diff --git a/speech.py b/speech.py
index 7b4c22e..a1c1266 100755
--- a/speech.py
+++ b/speech.py
@@ -204,7 +204,7 @@ async def generate_speech(request: GenerateSpeechRequest):
 
     return StreamingResponse(content=ffmpeg_proc.stdout, media_type=media_type)
 
-
+# We return 'mps' but currently XTTS will not work with mps devices as the cuda support is incomplete
 def auto_torch_device():
     try:
         import torch
@@ -213,7 +213,6 @@ def auto_torch_device():
     except:
         return 'none'
 
-
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(
         description='OpenedAI Speech API Server',
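The Dockerfile change quotes `${USE_ROCM}` in the `[ ... ]` test. A quick sketch of why that matters, runnable in any POSIX shell outside Docker: when the variable is unset, the unquoted form expands to a malformed two-argument test and errors out, while the quoted form simply compares an empty string.

```shell
unset USE_ROCM

# Unquoted: expands to `[ = "1" ]`, a malformed test -> exit status 2 (error)
[ ${USE_ROCM} = "1" ] 2>/dev/null
echo "unquoted exit status: $?"

# Quoted: expands to `[ "" = "1" ]`, a valid comparison -> exit status 1 (false)
[ "${USE_ROCM}" = "1" ]
echo "quoted exit status: $?"
```

In the original `RUN` line the build would only survive because a failed `if` condition with no `else` still exits 0; quoting makes the test well-formed for any value of the build-arg.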
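The workflow and compose changes both route `USE_ROCM=1` through Docker build-args rather than a runtime environment variable. For a one-off local build without compose, an equivalent invocation would be something like the following (a sketch; the image tag is arbitrary, not from this repo):

```shell
# Build the ROCm variant locally, passing the build-arg the Dockerfile reads via ARG USE_ROCM
docker build --build-arg USE_ROCM=1 -t openedai-speech:rocm .
```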