TalkToMeKit provides a Swift service + server for speech synthesis with Qwen3-TTS, using an embedded arm64 CPython runtime staged inside the package.
- In-process CPython embedding is implemented and running.
TTMServiceintegrates Qwen3-TTS as aswift-service-lifecycleservice.TalkToMeServerexposes/synthesize/voice-design,/synthesize/custom-voice,/synthesize/voice-clone, and model status/load endpoints.- Bundled runtime auto-discovery is supported via
TTMPythonRuntimeBundle.
Sources/TTMService: service-level integration (TTMQwenService).Sources/TTMService: service runtime + embedded CPython bridge +qwen_tts_runner.py.Sources/TTMPythonRuntimeBundle: packaged Python runtime resources.Sources/TTMServer: HTTP server and OpenAPI handlers.scripts/stage_python_runtime.sh: stage bundled Python runtime + optional Qwen install/model download.
- macOS arm64
- Swift 6.2 toolchain
- Python 3.11 available on PATH (for staging script), e.g.
python3.11
Use the SwiftPM command plugin for all normal workflows:
# Default: incremental staging.
# Stages only missing categories (runtime, site-packages, selected models).
# Default selected models: 0.6B CustomVoice + 0.6B Base (VoiceClone).
swift package plugin --allow-network-connections all stage-python-runtime
# Force uv installer
swift package plugin --allow-network-connections all stage-python-runtime -- -uv
# Full restage (all categories)
swift package plugin --allow-network-connections all stage-python-runtime -- --restage
# Rebuild runtime only (libpython/stdlib/bin tools)
swift package plugin --allow-network-connections all stage-python-runtime -- --restage-runtime
# Reinstall site-packages only
swift package plugin --allow-network-connections all stage-python-runtime -- --restage-packages
# Redownload selected models only
swift package plugin --allow-network-connections all stage-python-runtime -- --restage-models
# Install deps only (skip model downloads)
swift package plugin --allow-network-connections all stage-python-runtime -- --noload
# Also include large 1.7B models
swift package plugin --allow-network-connections all stage-python-runtime -- --bigcv --bigvd --bigvcDirect script usage is still available for local debugging:
./scripts/stage_python_runtime.sh -uv --noloadDependency pinning:
- Runtime dependencies are specified in
scripts/python-runtime/pyproject.toml. - Resolved pins are locked in
scripts/python-runtime/uv.lock. - Keep both updated together with a validated set when changing torch/qwen/transformers stack.
Staging behavior:
- Runtime category:
libpython, stdlib, and bundledsoxtools. - Packages category:
site-packagesruntime dependencies. - Models category: selected model directories under
Runtime/current/models. - By default, staging is conditional and only refreshes missing categories.
--restageforces all categories.--restage-runtime,--restage-packages, and--restage-modelstarget categories independently.--noloaddisables model downloads for the run.
Staged location:
Sources/TTMPythonRuntimeBundle/Resources/Runtime/current
Note:
Runtime/currentis intentionally git-ignored and not committed.- Each developer/CI environment should stage it locally before running server/app code.
swift run TalkToMeServer \
--hostname 127.0.0.1 \
--port 8091 \
--python-runtime-root Sources/TTMPythonRuntimeBundle/Resources/Runtime/current \
--python-version 3.11 \
--qwen-mode voice_design \
--qwen-model-id Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesignIf no --python-runtime-root is provided, the server attempts bundled runtime
auto-discovery from TTMPythonRuntimeBundle.
TalkToMeKit includes a CLI client product for the server API.
swift run ttm-cli --helpCommon checks:
swift run ttm-cli health
swift run ttm-cli version
swift run ttm-cli status
swift run ttm-cli inventory
swift run ttm-cli adapters
swift run ttm-cli adapter-status qwen3-tts
swift run ttm-cli speakers --model-id Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoiceModel load/unload:
swift run ttm-cli load --mode custom_voice --model-id Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
swift run ttm-cli load --mode voice_design --strict-load
swift run ttm-cli unloadSynthesize to file:
swift run ttm-cli synthesize --mode voice-design --text "Hello from TalkToMeKit" --output /tmp/ttm-vd.wav
swift run ttm-cli synthesize --mode custom-voice --speaker ryan --text "Hello from TalkToMeKit" --output /tmp/ttm-cv.wav
swift run ttm-cli synthesize --mode voice-clone --reference-audio /path/to/reference.wav --text "Hello from TalkToMeKit" --output /tmp/ttm-vc.wavPlay immediately:
swift run ttm-cli play --mode voice-design --text "Quick playback check"TalkToMeService now exposes an app-facing runtime wrapper:
import Foundation
import TTMService
let runtime = TTMServiceRuntime(
configuration: .init(
assetProvider: TTMBundledRuntimeAssetProvider(pythonVersion: "3.11"),
startupTimeoutSeconds: 60
)
)
try await runtime.start()
let wav = try await runtime.synthesize(.init(text: "Hello from app"))
await runtime.stop()For app-managed assets, use TTMLocalRuntimeAssetProvider(runtimeRoot:) with a
runtime path in app support storage (for example after first-launch download).
First-launch download scaffold:
import Foundation
import TTMService
let appSupport = FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask)[0]
let runtimeRoot = appSupport.appendingPathComponent("TalkToMeKit/Runtime/current", isDirectory: true)
let runtime = TTMServiceRuntime(
configuration: .firstLaunch(
runtimeRoot: runtimeRoot,
pythonVersion: "3.11",
downloadIfNeeded: { root, version in
// App-specific download/unpack implementation here.
// Ensure root contains lib/libpython{version}.dylib and lib/python{version}/...
_ = (root, version)
},
onStateChange: { state in
print("Asset state:", state)
}
)
)Use this when embedding TalkToMeService into a macOS app target.
- Add local package dependency to your app project:
<path-to-TalkToMeKit> - Link package product
TalkToMeServiceto the app target. - Stage runtime once in the package checkout:
cd <path-to-TalkToMeKit>
swift package plugin --allow-writing-to-package-directory --allow-network-connections all stage-python-runtime -- -uv- Add a Run Script build phase to the app target (before app code signing), using:
set -euo pipefail
PKG_RUNTIME_ROOT="${SRCROOT}/../TalkToMeKit/Sources/TTMPythonRuntimeBundle/Resources/Runtime/current"
DEST_RUNTIME_ROOT="${TARGET_BUILD_DIR}/${UNLOCALIZED_RESOURCES_FOLDER_PATH}/Runtime/current"
if [ ! -d "${PKG_RUNTIME_ROOT}" ]; then
echo "error: staged runtime not found at ${PKG_RUNTIME_ROOT}" >&2
exit 1
fi
rm -rf "${DEST_RUNTIME_ROOT}"
mkdir -p "${DEST_RUNTIME_ROOT}"
/usr/bin/rsync -a --delete "${PKG_RUNTIME_ROOT}/" "${DEST_RUNTIME_ROOT}/"
SIGN_IDENTITY="${EXPANDED_CODE_SIGN_IDENTITY:-}"
if [ -z "${SIGN_IDENTITY}" ]; then
SIGN_IDENTITY="-"
fi
/usr/bin/find "${DEST_RUNTIME_ROOT}" -type f \( -name "*.dylib" -o -name "*.so" \) -print0 |
while IFS= read -r -d '' artifact; do
/usr/bin/codesign --force --sign "${SIGN_IDENTITY}" --options runtime --timestamp=none "${artifact}"
done
# Also sign Mach-O executables under Runtime/current/bin (for bundled sox).
/usr/bin/find "${DEST_RUNTIME_ROOT}/bin" -type f -print0 2>/dev/null |
while IFS= read -r -d '' artifact; do
if /usr/bin/file -b "${artifact}" | /usr/bin/grep -q "Mach-O"; then
/usr/bin/codesign --force --sign "${SIGN_IDENTITY}" --options runtime --timestamp=none "${artifact}"
fi
done- In app code, use bundle-local runtime path:
Bundle.main.resourceURL/Runtime/current
Notes:
- If script phase cannot read sibling directories, set app build setting
ENABLE_USER_SCRIPT_SANDBOXING = NO. DYLD_*environment variables are not required for MPS.ENABLE_OUTGOING_NETWORK_CONNECTIONSis only needed when downloading models/deps at runtime.- The staging script stages
soxfromstatic-soxintoRuntime/current/binand does not use hostsox. - Prepend
Runtime/current/bintoPATHbeforeruntime.start()soqwen-ttscan find bundledsox.
Use per-request mode, model, voice/speaker, and language:
import TTMService
let wav = try await runtime.synthesize(
.init(
text: "Hello from app",
voice: "ryan", // speaker for custom_voice, instruction for voice_design
instruct: "Cheerful with slightly faster pacing", // optional for custom_voice
mode: .customVoice, // or .voiceDesign / .voiceClone
modelID: .customVoice0_6B, // or .customVoice1_7B / .voiceDesign1_7B / .voiceClone0_6B / .voiceClone1_7B
language: "English"
)
)Defaults:
- VoiceDesign default model:
Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign - CustomVoice default model:
Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice - VoiceClone default model:
Qwen/Qwen3-TTS-12Hz-0.6B-Base - CustomVoice default speaker (when
voiceis omitted):ryan
curl -sS http://127.0.0.1:8091/health
curl -sS http://127.0.0.1:8091/model/status
curl -sS http://127.0.0.1:8091/model/inventory
curl -sS http://127.0.0.1:8091/custom-voice/speakers
curl -sS -o /tmp/tts-vd.wav \
-H 'content-type: application/json' \
-d '{"text":"Hello from TalkToMeKit","instruct":"Warm narrator voice","language":"English","format":"wav"}' \
http://127.0.0.1:8091/synthesize/voice-design
curl -sS -o /tmp/tts-cv.wav \
-H 'content-type: application/json' \
-d '{"text":"Hello from TalkToMeKit","speaker":"ryan","instruct":"Cheerful and energetic","language":"English","format":"wav"}' \
http://127.0.0.1:8091/synthesize/custom-voice
curl -sS -o /tmp/tts-clone.wav \
-H 'content-type: application/json' \
-d '{"text":"Hello from TalkToMeKit","reference_audio_b64":"<BASE64_WAV_BYTES>","language":"English","format":"wav"}' \
http://127.0.0.1:8091/synthesize/voice-clone
curl -sS \
-H 'content-type: application/json' \
-d '{"mode":"custom_voice","model_id":"Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"}' \
http://127.0.0.1:8091/model/load
curl -sS \
-H 'content-type: application/json' \
-d '{"mode":"voice_design","model_id":"Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign","strict_load":true}' \
http://127.0.0.1:8091/model/load
curl -sS \
-H 'content-type: application/json' \
-d '{"mode":"voice_clone","model_id":"Qwen/Qwen3-TTS-12Hz-0.6B-Base"}' \
http://127.0.0.1:8091/model/load- Use
GET /model/inventoryto verify which model directories are present under staged runtimemodels/. - Use
GET /model/statusto compare requested vs active model:requested_mode,requested_model_id, andstrict_loadshow what was asked for.fallback_appliedindicates runtime loaded a fallback model.
- Use
POST /model/loadwith"strict_load": trueto fail fast instead of falling back when a requested model is not staged. - Use
GET /custom-voice/speakersto inspect available speaker IDs for the active or requested CustomVoice model. - If inventory is missing entries, restage with:
swift package plugin --allow-network-connections all stage-python-runtime -- --restageTTM_QWEN_SYNTH_TIMEOUT_SECONDS: synth request timeout (default120).TTM_QWEN_LOCAL_MODEL_PATH: override model path.TTM_QWEN_MODE: startup mode fallback (voice_designdefault).TTM_QWEN_ALLOW_CROSS_MODE_FALLBACK: when enabled (1), loader may fall back across known models/modes.TTM_QWEN_DEVICE_MAP: torch/qwen device map (defaultauto). On Apple Silicon, this typically selectsmpswhen supported.TTM_QWEN_TORCH_DTYPE: optional torch dtype override (float16andbfloat16supported). Leave unset to let backend defaults apply.TTM_QWEN_ALLOW_FALLBACK: allow silent fallback output when model load/synth fails (1or0).TTM_PYTHON_ENABLE_FINALIZE: opt-in CPython finalize on shutdown (1); disabled by default for runtime stability with native extension threads.
Apple Silicon performance tip:
export TTM_QWEN_DEVICE_MAP=auto
unset TTM_QWEN_TORCH_DTYPEIf you force TTM_QWEN_DEVICE_MAP=mps and encounter runtime synthesis instability (for example NaN/inf sampling errors), the runner now retries once on CPU/float32 automatically. For deterministic CPU mode:
export TTM_QWEN_DEVICE_MAP=cpu
unset TTM_QWEN_TORCH_DTYPEswift build
swift testModel-backed bundled bridge integration tests are opt-in:
TTM_RUN_BUNDLED_MODEL_INTEGRATION=1 swift test --filter "TTMServiceTests.TTMServiceTests/bridgeIntegrates"Notes:
- These tests enforce strict exact-model loads and disable silent fallback output.
- They currently force
cpu/float32internally for determinism, due observed nativeSIGSEGV (11)instability in 1.7B auto/MPS load paths.
For better native-runtime isolation (each test in a fresh swiftpm-testing-helper process), use:
scripts/bridge_integrates_isolated.shOn some toolchain combinations (for example Swift 6.2.3 + Xcode 26.2), ld
may print warnings like:
warning: input verification failednote: while processing ... .swift.o
This is a debug-info verification warning and does not typically indicate a functional runtime or link failure.
If you want quieter CI/build logs, build with debug info disabled:
swift build --product TalkToMeServer -Xswiftc -gnoneYou may also see warnings like:
ld: warning: building for macOS-11.0, but linking with dylib '/usr/lib/swift/libswiftCore.dylib' which was built for newer version 13.0
In this package, these are emitted while linking the Swift OpenAPI generator
tool used by the OpenAPIGenerator plugin, not while linking TalkToMeServer.
The upstream swift-openapi-generator package currently declares
.macOS(.v10_15), which can trigger these warnings with newer Xcode/Swift
toolchains.
These warnings are generally benign. If you need quieter CI logs, either:
- keep generated OpenAPI sources checked in and avoid plugin-driven regeneration during routine builds, or
- filter this specific warning line in CI log post-processing.
When embedding TalkToMeService into a macOS app, the following items should be explicitly accounted for:
- Scripts
- Runtime staging script in this repo:
scripts/stage_python_runtime.sh - App-side copy/sign script in your host app project:
Scripts/stage_python_runtime.sh
- Runtime staging script in this repo:
- Build settings (app project)
ENABLE_USER_SCRIPT_SANDBOXING = NOwhen app build scripts need sibling checkout access (for example../TalkToMeKit/...).ENABLE_HARDENED_RUNTIME = YESfor production hardening.ENABLE_OUTGOING_NETWORK_CONNECTIONS = YESonly if runtime/model download may happen at runtime; disable for offline-only staged deployments.
- Sandbox caveats
- User script sandbox can block reading runtime assets outside project root.
- Pre-stage runtime/models in CI or local dev to avoid runtime network dependency.
- Hardened runtime caveats
- All copied runtime Mach-O artifacts (
.dylib,.so, and executable tools likesox) must be signed in the app bundle. - Prefer Xcode-provided signing identity values (
EXPANDED_CODE_SIGN_IDENTITY) over hardcoded identities.
- All copied runtime Mach-O artifacts (
Security note:
- Keep copy/sign scripts secret-free; use Xcode-provided signing identity context at build time.
- Avoid publishing full CI logs that include detailed local code-signing identity metadata unless needed.
Run stability smoke as Swift integration tests (mode-switch + cold-start):
TTM_RUN_STABILITY_SMOKE=1 swift test --filter "TTM stability smoke"Adjust iteration counts:
TTM_RUN_STABILITY_SMOKE=1 TTM_STABILITY_MIXED_ITERS=10 TTM_STABILITY_COLD_ITERS=4 swift test --filter "TTM stability smoke"Notes:
- These tests are opt-in and skipped unless
TTM_RUN_STABILITY_SMOKE=1. - They require a staged runtime at
Sources/TTMPythonRuntimeBundle/Resources/Runtime/current.
Run opt-in backend/dtype integration tests (auto backend with dtype unset, CPU baseline, and invalid backend/dtype behavior):
TTM_RUN_BACKEND_DTYPE_MATRIX=1 swift test --filter "Backend/dtype matrix"Notes:
- These tests are opt-in and skipped unless
TTM_RUN_BACKEND_DTYPE_MATRIX=1. - They require a staged runtime at
Sources/TTMPythonRuntimeBundle/Resources/Runtime/current. - They currently require
Qwen3-TTS-12Hz-0.6B-CustomVoiceto be present under stagedmodels/.
Build the server artifact first:
swift build -c release --product TTMServerSmoke (artifact startup + health + synth):
TTM_RUN_ARTIFACT_SMOKE=1 TTM_ARTIFACT_PATH=.build/release/TTMServer swift test --filter ServerArtifactSmokeTestsFunctional (staging/failure paths + backend/dtype artifact matrix):
TTM_RUN_ARTIFACT_FUNCTIONAL=1 TTM_ARTIFACT_PATH=.build/release/TTMServer swift test --filter ServerArtifactStagingTests
TTM_RUN_ARTIFACT_FUNCTIONAL=1 TTM_ARTIFACT_PATH=.build/release/TTMServer swift test --filter ServerArtifactBackendDtypeTestsAudio quality metrics (objective checks):
TTM_RUN_ARTIFACT_AUDIO=1 TTM_ARTIFACT_PATH=.build/release/TTMServer swift test --filter ServerArtifactAudioMetricsTestsNotes:
- These suites are opt-in and skipped unless their
TTM_RUN_ARTIFACT_*env var is set. - Default runtime root is
Sources/TTMPythonRuntimeBundle/Resources/Runtime/currentunless overridden byTTM_RUNTIME_DIR. - Artifact suites require staged runtime assets (including
lib/libpython3.11.dylib) and theQwen3-TTS-12Hz-0.6B-CustomVoicemodel. - Backend/dtype and audio suites can be parameterized with
TTM_TEST_BACKENDandTTM_TEST_DTYPE. - Set
TTM_ARTIFACT_REQUIRE_PREREQS=1(or run in CI withCI=1) to fail functional/audio suites when runtime/model prerequisites are missing, instead of silently no-oping those tests.