OpenAI-compatible API servers for Qwen3-TTS and Qwen3-ASR, enabling self-hosted text-to-speech and speech-to-text via the standard OpenAI audio endpoints:
- `/v1/audio/speech` — Text-to-speech (TTS)
- `/v1/audio/transcriptions` — Speech-to-text (ASR)
| Language | Directory | Status |
|---|---|---|
| Python | python/ | Available |
| Rust | rust/ | Available |
The Python implementation is a FastAPI server built on the qwen-tts and qwen-asr Python packages. See python/README.md for setup, Docker images, API reference, and usage examples.
The Rust implementation is a high-performance axum/tokio server built on the qwen3_tts and qwen3_asr Rust crates, with libtorch (Linux) and MLX (macOS Apple Silicon) backends. See rust/README.md for setup, pre-built binaries, API reference, and usage examples.
- **Text-to-Speech (TTS)**: Generate natural speech from text using Qwen3-TTS models
- Multiple voice presets (Vivian, Ryan, Serena, etc.)
- Voice cloning from audio samples
- Multiple languages (English, Chinese, Japanese, Korean, and more)
- Multiple output formats (WAV, MP3, FLAC, Opus, AAC)
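As a sketch of what a TTS call looks like, the request below is built with the Python standard library against the OpenAI `/v1/audio/speech` schema. The host/port (`localhost:8000`), the model name `qwen3-tts`, and the default voice are assumptions here; check the per-language READMEs for the actual values.

```python
import json
import urllib.request

def build_speech_request(base_url: str, text: str,
                         voice: str = "Vivian", fmt: str = "wav") -> urllib.request.Request:
    """Build a POST to /v1/audio/speech following the OpenAI request schema."""
    payload = {
        "model": "qwen3-tts",    # model name is an assumption; see the server README
        "input": text,
        "voice": voice,          # preset voices: Vivian, Ryan, Serena, ...
        "response_format": fmt,  # wav, mp3, flac, opus, aac
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("http://localhost:8000", "Hello from a self-hosted TTS server")
# With the server running: audio = urllib.request.urlopen(req).read()
```

The same payload works unchanged through any OpenAI client library, since only the base URL differs from the hosted API.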
- **Speech-to-Text (ASR)**: Transcribe audio to text using Qwen3-ASR models
- Support for 30+ languages with automatic language detection
- Accepts various audio formats (WAV, MP3, M4A, etc.)
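Transcription uses a `multipart/form-data` upload rather than JSON. The sketch below encodes the form by hand with the standard library; the host/port and the model name `qwen3-asr` are assumptions, and real usage would read the audio bytes from a file.

```python
import io
import urllib.request
import uuid

def build_transcription_request(base_url: str, audio: bytes,
                                filename: str = "clip.wav",
                                model: str = "qwen3-asr") -> urllib.request.Request:
    """Build a multipart POST to /v1/audio/transcriptions (OpenAI's schema)."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # plain form field carrying the model name
    body.write(f'--{boundary}\r\nContent-Disposition: form-data; '
               f'name="model"\r\n\r\n{model}\r\n'.encode())
    # file part carrying the raw audio bytes
    body.write(f'--{boundary}\r\nContent-Disposition: form-data; '
               f'name="file"; filename="{filename}"\r\n'
               f'Content-Type: audio/wav\r\n\r\n'.encode())
    body.write(audio)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

req = build_transcription_request("http://localhost:8000", b"\x00" * 16)
# With the server running: result = urllib.request.urlopen(req).read()
```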
The purpose of these API servers is to provide self-hosted, free backend audio services for projects such as:
- OpenClaw — AI agent
- EchoKit — Voice AI device
- Olares — Personal AI cloud OS
- GaiaNet — Incentivized AI agent network and marketplace
Any application that speaks the OpenAI audio API can swap in this server as a drop-in replacement.
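For example, the official OpenAI SDKs honor the `OPENAI_BASE_URL` environment variable, so redirecting an existing application can be as small as the sketch below (the port is an assumption, and how the server treats API keys depends on its configuration):

```python
import os

# Point any OpenAI SDK client at the self-hosted server instead of
# api.openai.com; host and port here are assumptions.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"
os.environ["OPENAI_API_KEY"] = "unused"  # placeholder; key checking depends on server config

# Existing code such as:
#   from openai import OpenAI
#   OpenAI().audio.speech.create(...)
# now talks to the local server with no source changes.
```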
See LICENSE.