NeuralVox · ShabbirMarfatiya · Dec 13, 2025
diff --git a/API_DOCS.md b/API_DOCS.md
@@ -1 +1,151 @@
-Coming soon
+# StyleTTS2 HTTP Streaming API Documentation
+
+## Overview
+
+The HTTP Streaming API provides text-to-speech synthesis with real-time audio streaming. The server uses Flask and returns WAV audio data.
+
+## Base URL
+
+```
+http://localhost:5000
+```
+
+## Endpoints
+
+### GET /
+
+Returns API documentation in HTML format.
+
+---
+
+### POST /api/v1/stream
+
+Synthesizes speech from text and streams the audio back as it is generated.
+
+**Request Body (form-data):**
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `text` | string | Yes | Text to synthesize |
+| `voice` | string | Yes | Voice ID (see available voices below) |
+| `steps` | integer | No | Number of diffusion steps (default: 7). Higher values give better quality at the cost of slower synthesis. |
+
+**Response:**
+- Content-Type: `audio/x-wav`
+- Streams WAV audio data in chunks
+
+**Example with curl:**
+
+```bash
+curl -X POST http://localhost:5000/api/v1/stream \
+  -d "text=Hello, this is a test of the streaming API." \
+  -d "voice=f-us-1" \
+  -d "steps=7" \
+  --output output.wav
+```
+
+**Example with Python:**
+
+```python
+import requests
+
+response = requests.post(
+    "http://localhost:5000/api/v1/stream",
+    data={
+        "text": "Hello, this is a test.",
+        "voice": "f-us-1",
+        "steps": 7
+    },
+    stream=True
+)
+
+with open("output.wav", "wb") as f:
+    for chunk in response.iter_content(chunk_size=8192):
+        f.write(chunk)
+```
+
+---
+
+### POST /api/v1/static
+
+Synthesizes speech from text and returns the complete audio file in a single response.
+
+**Request Body (form-data):**
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `text` | string | Yes | Text to synthesize |
+| `voice` | string | Yes | Voice ID |
+
+**Response:**
+- Content-Type: `audio/wav`
+- Returns complete WAV file
+
+**Example:**
+
+```bash
+curl -X POST http://localhost:5000/api/v1/static \
+  -d "text=Hello world" \
+  -d "voice=m-us-1" \
+  --output output.wav
+```
+
+---
+
+## Available Voices
+
+| Voice ID | Description |
+|----------|-------------|
+| `f-us-1` | Female US English #1 |
+| `f-us-2` | Female US English #2 |
+| `f-us-3` | Female US English #3 |
+| `f-us-4` | Female US English #4 |
+| `m-us-1` | Male US English #1 |
+| `m-us-2` | Male US English #2 |
+| `m-us-3` | Male US English #3 |
+| `m-us-4` | Male US English #4 |
+
+---
+
+## Error Responses
+
+All errors return JSON with an `error` field:
+
+```json
+{
+  "error": "Missing required fields. Please include \"text\" and \"voice\" in your request."
+}
+```
+
+**Common errors:**
+- `400`: Missing required fields or invalid voice selection
+
+---
+
+## Testing
+
+Use the provided test client:
+
+```bash
+# List available voices
+python test_api_client.py --list-voices
+
+# Check server status
+python test_api_client.py --check-server
+
+# Synthesize speech
+python test_api_client.py -t "Hello world" -v f-us-1 -o output.wav
+
+# With custom diffusion steps
+python test_api_client.py -t "Hello world" -v m-us-2 -o output.wav -s 10
+```
+
+---
+
+## Starting the Server
+
+```bash
+python api.py
+```
+
+The server starts on `http://0.0.0.0:5000` by default.
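
Review note: the request format, voice table, and error body documented in `API_DOCS.md` above could be exercised with a standard-library client along the lines below. This is a sketch under stated assumptions, not code from this PR. `build_stream_request`, `error_message`, and `synthesize_to_file` are hypothetical helper names. The client-side voice check only mirrors the documented voice table and says nothing about how the server validates voices. Error responses are assumed to carry the documented JSON `error` field.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # base URL given in API_DOCS.md

# Voice IDs from the "Available Voices" table: f-us-1..4 and m-us-1..4.
VOICES = {f"{gender}-us-{n}" for gender in ("f", "m") for n in range(1, 5)}


def build_stream_request(text, voice, steps=7, base_url=BASE_URL):
    """Build the form-encoded POST for /api/v1/stream (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    body = urllib.parse.urlencode({"text": text, "voice": voice, "steps": steps})
    return urllib.request.Request(
        f"{base_url}/api/v1/stream", data=body.encode("utf-8"), method="POST"
    )


def error_message(raw_body):
    """Return the `error` field of a JSON error body, or None if absent."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # not JSON, or not decodable text
        return None
    return payload.get("error") if isinstance(payload, dict) else None


def synthesize_to_file(text, voice, path, steps=7, chunk_size=8192):
    """Stream the WAV response to `path`; raise RuntimeError on an HTTP error."""
    request = build_stream_request(text, voice, steps)
    try:
        # urlopen runs first, so no file is created if the request is rejected.
        with urllib.request.urlopen(request) as response, open(path, "wb") as out:
            for chunk in iter(lambda: response.read(chunk_size), b""):
                out.write(chunk)
    except urllib.error.HTTPError as exc:
        detail = error_message(exc.read()) or exc.reason
        raise RuntimeError(f"HTTP {exc.code}: {detail}") from exc
```

With the server from `python api.py` running, `synthesize_to_file("Hello world", "f-us-1", "output.wav")` would write the streamed audio to disk. A documented `400` response would surface as a `RuntimeError` carrying the server's `error` text instead of being saved as a corrupt WAV file.
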
diff --git a/README.md b/README.md
@@ -1,8 +1,5 @@
 # StyleTTS 2 API
 
-> [!CAUTION]
-> The Streaming API is not fully implemented yet.
-
 [Original Repo](https://github.com/yl4579/StyleTTS2) - [CLI Tool](https://github.com/fakerybakery/styletTS2-cli) - **Streaming API**
 
 (GPL licensed due to Phonemizer. Should I switch to OpenPhonemizer and make it MIT-licensed?)
@@ -27,9 +24,9 @@ Online demo: [Hugging Face](https://huggingface.co/spaces/styletts2/styletts2) (
 - [x] Add a finetuning script for new speakers with base pre-trained multispeaker models
 - [x] REST API
 - [x] Importable inference script (PR #78)
+- [x] Streaming API and WebSocket support
 - [ ] Fix DDP (accelerator) for `train_second.py` **(I have tried everything I could to fix this but had no success, so if you are willing to help, please see [#7](https://github.com/yl4579/StyleTTS2/issues/7))**
 - [ ] Pip package
-- [ ] Demo of audio streaming
 
 ## Pre-requisites
 1. Python >= 3.7
@@ -56,16 +53,54 @@ For LibriTTS, you will need to combine train-clean-360 with train-clean-100 and
 
 ## Streaming API
 
-You can use StyleTTS 2 in your projects by launching the HTTP API with streaming support. Synthesize text from your frontend apps, etc by making HTTP calls to the API server. The server uses Flask. It has not been extensively tested and should not be used for production purposes.
+You can use StyleTTS 2 in your projects by launching the HTTP API with streaming support. Synthesize speech from your frontend apps and other clients by making HTTP calls to the API server, which is built on Flask.
 
 API documentation may be found in the [`API_DOCS.md`](API_DOCS.md) file.
 
 Launch server:
 
-```
+```bash
 python api.py
 ```
 
+## WebSocket API
+
+For real-time TTS streaming with chunked text input and low-latency audio output, use the WebSocket API powered by FastAPI.
+
+**Features:**
+- Real-time bidirectional communication
+- Chunked text input (send text incrementally)
+- Base64-encoded MP3 audio output
+- GPU queue management for concurrent requests
+- Idle timeout and connection management
+
+**Quick Start:**
+
+```bash
+# Start WebSocket server (default port 8765)
+python ws_server.py
+
+# Or with custom port
+python ws_server.py 9000
+```
+
+**Endpoints:**
+- WebSocket: `ws://localhost:8765/ws/tts`
+- Health Check: `http://localhost:8765/health`
+- Voice List: `http://localhost:8765/voices`
+
+**Test the WebSocket API:**
+
+```bash
+# Simple test
+python test_ws_client.py --text "Hello world" --voice f-us-1 --output output.mp3
+
+# Chunked streaming test
+python test_ws_client.py --text "This is a longer text" --voice m-us-2 --chunked
+```
+
+Full WebSocket documentation: [`WEBSOCKET_DOCS.md`](WEBSOCKET_DOCS.md)
+
 ## Python API
 
 You can now use StyleTTS 2 directly in your programs! A `pip`-compatible package is coming soon.