TalkToMePy

Local Qwen3 VoiceDesign TTS service built with FastAPI.

Status

Working end-to-end via curl
Working end-to-end from a separate Swift CLI client
Current output format: audio/wav (16-bit PCM, mono, 24 kHz)

Demo (2 minutes)

uv run python main.py

In another terminal:

mkdir -p outputs
curl -X POST http://127.0.0.1:8000/model/load \
  -H "Content-Type: application/json" \
  -d '{"mode":"voice_design","strict_load":false}'
curl -X POST http://127.0.0.1:8000/synthesize/voice-design \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from TalkToMePy demo.","instruct":"Warm and clear narrator voice.","language":"English","format":"wav"}' \
  --output outputs/demo.wav
afplay outputs/demo.wav

Requirements

Python >=3.13
uv
sox on PATH (macOS: brew install sox)

Setup

./scripts/setup.sh

This script:

checks for uv and sox
runs uv sync
creates outputs/
creates .env.launchd from .env.example if missing

Configuration

Single-instance defaults:

cp .env.example .env.launchd

Instance-specific launchd env files are also supported:

.env.launchd.stable
.env.launchd.dev

scripts/run_service.sh accepts an env file path argument, and scripts/launchd_instance.sh wires this automatically.

Run Service

uv run python main.py

Service URL: http://127.0.0.1:8000

API Spec (OpenAPI)

FastAPI exposes live docs/spec automatically:

OpenAPI JSON: http://127.0.0.1:8000/openapi.json
Swagger UI: http://127.0.0.1:8000/docs
ReDoc: http://127.0.0.1:8000/redoc

This repo includes separate target and generated YAML specs:

Target spec (do not overwrite): openapi/openapi.yaml
Backup copy of target spec: openapi/openapi.target.yaml
Generated export from app OpenAPI schema: openapi/openapi.generated.yaml

Regenerate the generated spec after API changes:

uv run python scripts/export_openapi.py

Check parity between target and generated specs:

diff -u openapi/openapi.yaml openapi/openapi.generated.yaml
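The same comparison can be sketched in Python with the standard library. This is only an illustration of a line-based parity check, assuming both specs are read as text; the helper name spec_diff is hypothetical, and the repo's own tests/test_openapi_parity.py may compare the specs differently.

```python
import difflib


def spec_diff(target_text: str, generated_text: str) -> list[str]:
    """Return unified-diff lines between two spec documents.

    An empty list means the two texts are identical line by line.
    """
    return list(
        difflib.unified_diff(
            target_text.splitlines(),
            generated_text.splitlines(),
            fromfile="openapi/openapi.yaml",
            tofile="openapi/openapi.generated.yaml",
            lineterm="",
        )
    )
```

To use it, read both files with pathlib.Path(...).read_text() and treat any non-empty result as a parity failure. A line-based diff is strict: key reordering or whitespace changes count as differences, just as they do with diff -u.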

Run the parity gate test:

uv run python scripts/export_openapi.py
uv run pytest -q tests/test_openapi_parity.py

CI

GitHub Actions runs two lanes:

pytest (required fast lane): lockfile sync + OpenAPI export + full pytest suite.
smoke-e2e (separate model-backed lane on main push, nightly schedule, and manual dispatch):
- direct model smoke: scripts/voice_design_smoke.py
- API e2e custom voice: scripts/custom_voice_smoke.py
- API e2e voice clone: scripts/voice_clone_smoke.py

smoke-e2e requires the repository secret HF_TOKEN and uploads smoke artifacts from outputs/.

Run as macOS Background Service (launchd)

This repo includes:

LaunchAgent template: launchd/com.talktomepy.plist
Runner script: scripts/run_service.sh
Instance manager: scripts/launchd_instance.sh

Install and start a single instance:

./scripts/launchd_instance.sh install --instance stable --port 8000

Manage it:

./scripts/launchd_instance.sh status --instance stable
./scripts/launchd_instance.sh logs --instance stable
./scripts/launchd_instance.sh restart --instance stable
./scripts/launchd_instance.sh stop --instance stable
./scripts/launchd_instance.sh remove --instance stable

Stable + Dev Side-by-Side on One Mac

Install from each clone with a different instance name and port:

Stable clone (~/Workspace/services/talkToMePy):

cd ~/Workspace/services/talkToMePy
./scripts/launchd_instance.sh install --instance stable --port 8000

Dev clone (~/Workspace/talkToMePy):

cd ~/Workspace/talkToMePy
./scripts/launchd_instance.sh install --instance dev --port 8001

Health check both:

curl http://127.0.0.1:8000/health
curl http://127.0.0.1:8001/health

Notes for modern macOS (including macOS 26):

Prefer the launchctl bootstrap/bootout/kickstart subcommands over the legacy load/unload.
launchd has a minimal environment; keep required env vars in scripts/run_service.sh and per-instance env files.
scripts/run_service.sh sets a Homebrew-friendly default PATH so sox is resolvable under launchd.

Endpoints

GET /health returns service status
GET /version returns API/service version metadata
GET /adapters lists available runtime adapters
GET /adapters/{adapter_id}/status returns adapter-specific status
GET /model/status returns mode-aware model runtime readiness/status
GET /model/inventory returns supported model inventory and local availability
POST /model/load accepts a mode-aware load request and lazily loads the selected model
POST /model/unload unloads the model from memory
GET /custom-voice/speakers returns supported custom-voice speakers for the selected model
POST /synthesize/voice-design returns generated audio bytes as audio/wav
POST /synthesize/custom-voice returns generated audio bytes as audio/wav
POST /synthesize/voice-clone returns generated audio bytes as audio/wav

Notes:

POST /model/load may return 202 Accepted while loading is in progress.
Synth routes return 503 with Retry-After if the model is still loading.
Legacy POST /synthesize and POST /synthesize/stream were removed in v0.5.0.
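A client can handle the 503 plus Retry-After behavior with a small retry loop. The sketch below is a minimal illustration, not the service's or the Swift client's actual logic: the function names, the 2-second default delay, and the attempt limit are all assumptions, and the HTTP call is passed in as a callable so the loop stays independent of any particular HTTP library.

```python
import time
from typing import Callable, Mapping

# A response is modelled as (status_code, headers, body).
Response = tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 2.0) -> float:
    """Parse a delta-seconds Retry-After header.

    Falls back to `default` (an assumed value) when the header is missing or
    is not a plain number, for example when it is an HTTP-date. The lookup is
    case-sensitive, so pass headers keyed as "Retry-After".
    """
    value = headers.get("Retry-After", "")
    try:
        return max(0.0, float(value))
    except ValueError:
        return default


def synthesize_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send` until it stops returning 503, waiting as Retry-After asks."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected status {status}")
        sleep(retry_after_seconds(headers))
    raise TimeoutError("model was still loading after all attempts")
```

Here send would wrap a POST to one of the synth routes and return the status code, response headers, and body. Injecting sleep makes the loop easy to exercise in tests without real delays.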

Quickstart

curl http://127.0.0.1:8000/health

curl http://127.0.0.1:8000/version

curl http://127.0.0.1:8000/adapters

curl http://127.0.0.1:8000/adapters/qwen3-tts/status

curl http://127.0.0.1:8000/model/status

curl http://127.0.0.1:8000/model/inventory

curl -X POST http://127.0.0.1:8000/model/load \
  -H "Content-Type: application/json" \
  -d '{"mode":"voice_design","strict_load":false}'

curl -X POST http://127.0.0.1:8000/model/unload

curl -X POST http://127.0.0.1:8000/synthesize/voice-design \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from Swift bridge!","instruct":"Warm and friendly voice with steady pace.","language":"English","format":"wav"}' \
  --output outputs/from_service.wav

curl -X POST http://127.0.0.1:8000/synthesize/custom-voice \
  -H "Content-Type: application/json" \
  -d '{"text":"Custom voice endpoint test.","speaker":"ryan","language":"English","format":"wav"}' \
  --output outputs/from_custom.wav

curl -X POST http://127.0.0.1:8000/synthesize/voice-clone \
  -H "Content-Type: application/json" \
  -d '{"text":"Voice clone endpoint test.","reference_audio_b64":"UklGRg==","language":"English","format":"wav"}' \
  --output outputs/from_clone.wav
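The reference_audio_b64 value in the voice-clone example decodes to just the four bytes RIFF, so it reads as a placeholder, not a usable recording. A request body for a real reference file could be built as sketched below; the helper name build_voice_clone_payload is hypothetical, and the fields are limited to those shown in the curl example (any other fields the endpoint may accept are not covered here).

```python
import base64
import json


def build_voice_clone_payload(reference_wav: bytes, text: str) -> str:
    """Return a JSON body using only the field names from the curl example."""
    return json.dumps(
        {
            "text": text,
            # Standard base64 of the raw file bytes, as ASCII text.
            "reference_audio_b64": base64.b64encode(reference_wav).decode("ascii"),
            "language": "English",
            "format": "wav",
        }
    )
```

Passing the bytes of a reference WAV (for example from pathlib.Path(...).read_bytes()) yields a string that can be sent as the body of POST /synthesize/voice-clone.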

Play the generated file on macOS:

afplay outputs/from_service.wav
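To confirm that a generated file matches the format listed under Status (PCM16 mono 24kHz), its header can be inspected with Python's standard wave module. This is a small sketch; the function names are illustrative and not part of the repo.

```python
import wave


def wav_format(path: str) -> tuple[int, int, int]:
    """Return (channels, sample_width_bytes, sample_rate_hz) for a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getsampwidth(),
            wav_file.getframerate(),
        )


def is_expected_format(path: str) -> bool:
    """True when the file is 16-bit (2-byte) mono audio at 24000 Hz."""
    return wav_format(path) == (1, 2, 24000)
```

For example, is_expected_format("outputs/from_service.wav") should hold for output produced by the synth routes if the stated format applies.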

VoiceDesign Smoke Script

uv run python scripts/voice_design_smoke.py \
  --text "Hello from my Swift CLI bridge." \
  --instruct "Energetic, friendly, and slightly brisk pacing with bright tone." \
  --output outputs/swift_bridge_demo.wav

Notes

qwen-tts currently requires transformers==4.57.3 (pinned in this repo).
All synth endpoints currently support format: "wav" only.
Model id can be overridden with env var QWEN_TTS_MODEL_ID.
Optional idle auto-unload can be enabled with env var QWEN_TTS_IDLE_UNLOAD_SECONDS.
Optional startup warm-load can be enabled with env var QWEN_TTS_WARM_LOAD_ON_START=true.
Optional load settings: QWEN_TTS_DEVICE_MAP, QWEN_TTS_TORCH_DTYPE.
When QWEN_TTS_DEVICE_MAP is unset or set to auto, a meta-tensor runtime failure during synthesis triggers one automatic reload and retry on CPU (device_map=cpu, torch_dtype=float32).

Roadmap and TODO tracking live in ROADMAP.md.
