Skip to content
forked from steipete/sag

Like the macOS say command, but with a modern voice.

License

Notifications You must be signed in to change notification settings

tars90percent/sag

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

57 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sag 🗣️ — “Mac-style speech with ElevenLabs and MiniMax”

One-liner TTS that works like say: stream to speakers by default, list voices, or save audio files. Defaults to ElevenLabs, with MiniMax available via speech-* model IDs.

Install

Homebrew (macOS):

brew install steipete/tap/sag  # auto-taps steipete/tap

Go toolchain:

go install ./cmd/sag

Requires Go 1.24+.

Configuration

  • ElevenLabs: ELEVENLABS_API_KEY (or SAG_API_KEY)
  • MiniMax: MINIMAX_API_KEY (or SAG_API_KEY)
  • --api-key-file or ELEVENLABS_API_KEY_FILE/MINIMAX_API_KEY_FILE/SAG_API_KEY_FILE to load the key from a file
  • Optional defaults: ELEVENLABS_VOICE_ID, MINIMAX_VOICE_ID, or SAG_VOICE_ID
  • Optional: MINIMAX_API_HOST or MINIMAX_BASE_URL to override the MiniMax base URL

Usage

Features:

  • macOS say-style default: sag "Hello" routes to speak automatically.
  • Streaming playback to speakers with optional file output.
  • Voice discovery via sag voices (ElevenLabs) and -v ? (provider-specific).
  • Speed/rate controls, latency tiers, and format inference from output extension.
  • Model selection via --model-id (defaults to eleven_v3; use eleven_multilingual_v2 for a stable baseline, speech-* for MiniMax).

Speak (streams audio):

sag speak -v Roger "Hello world"

Call it like macOS say: omitting the subcommand pipes text to speak by default.

sag "Hello world"

macOS say compatibility shortcuts (subcommand optional):

sag -v Roger -r 200 "Faster speech"
sag -o out.mp3 "Save to file"
sag -v ?      # list voices

More examples:

echo "piped input" | sag speak -v Roger
sag speak -v Roger --stream --latency-tier 3 "Faster start"
sag speak -v Roger --speed 1.2 "Talk a bit faster"
sag speak -v Roger --model-id eleven_multilingual_v2 "Use stable v2 baseline"
sag speak -v Roger --output out.wav --format pcm_44100 "Wave output"
sag speak --model-id speech-01 -v ? "List MiniMax voices"
sag speak --model-id speech-01 --output out.flac --stream=false "MiniMax file output"

Key flags (subset):

  • -v, --voice voice name or ID (? to list)
  • --api-key-file read API key from a file
  • -r, --rate words per minute (maps to ElevenLabs speed; default 175)
  • -f, --input-file read text from file (- for stdin)
  • -o, --output write audio file; format inferred by extension (.wav -> PCM, .mp3 -> MP3)
  • --speed explicit speed multiplier (0.5–2.0)
  • --stability v3: 0|0.5|1 (Creative/Natural/Robust); v2/v2.5: 0..1 (higher = more consistent, less expressive)
  • --similarity / --similarity-boost 0..1 (higher = closer to the reference voice)
  • --style 0..1 (higher = more stylized delivery; model/voice dependent)
  • --speaker-boost / --no-speaker-boost toggle clarity boost (model dependent)
  • --seed 0..4294967295 best-effort repeatability across runs
  • --normalize auto|on|off numbers/units/URLs normalization (when set)
  • --lang en|de|fr|... 2-letter ISO 639-1 language code (when set)
  • --stream/--no-stream stream while generating (default on)
  • --latency-tier 0–4 lower latency tiers
  • --play/--no-play control speaker playback
  • --metrics print basic stats to stderr

Voices:

sag voices --search english --limit 20
sag voices --search english --limit 5 --try
sag voices --query "crazy scientist" --limit 5 --try
sag voices --label accent=british --label use_case=character --limit 10

Prompting (make it sound better)

Run:

sag prompting

Highlights:

  • v2/v2.5: SSML pauses via <break time="1.5s" /> (v3 does not support SSML breaks).
  • v3: use audio tags like [whispers] and pause tags like [short pause].
  • Use the voice knobs: --stability, --similarity, --style, --speaker-boost, plus request controls --seed, --normalize, --lang.

Models / engines

Provider selection:

  • ElevenLabs (default): any ElevenLabs model_id via --model-id (we pass it through).
  • MiniMax: use a speech-* model ID to route requests to MiniMax. Streaming/playback is MP3-only; use --stream=false for WAV/FLAC output.

Practical defaults + common ElevenLabs IDs:

Engine --model-id Prompting style Best for
v3 (alpha) eleven_v3 (default) Audio tags like [whispers], [short pause] (no SSML <break>) Most expressive / “acting”
v2 (stable) eleven_multilingual_v2 SSML <break> supported Reliable baseline, simple prompts
v2.5 Flash eleven_flash_v2_5 SSML <break> supported Ultra-low latency (~75ms) + 50% lower price per character
v2.5 Turbo eleven_turbo_v2_5 SSML <break> supported Low latency (~250–300ms) + 50% lower price per character

Notes:

  • SSML <break> works on v2/v2.5, not v3. Use pause tags on v3 instead.
  • Input limits differ by engine (v3: 5,000 chars; v2: 10,000 chars; v2.5 Turbo/Flash: 40,000 chars). If you hit limits, chunk text and stitch audio.
  • --normalize on may not be available for v2.5 Turbo/Flash (higher latency); prefer auto/off if it errors.
  • Source of truth: ElevenLabs “Models” docs.

Development

  • With pnpm:
    • pnpm format
    • pnpm lint
    • pnpm test
    • pnpm build
    • pnpm sag -- --help (passes args to the Go binary)
  • Direct Go:
    • Format: go fmt ./...
    • Lint: golangci-lint run
    • Tests: go test ./...
    • Build: go build ./cmd/sag

Limitations

  • ElevenLabs or MiniMax account and API key required (per provider).
  • Voice defaults to first available if not provided.
  • Non-mac platforms: playback still works via go-mp3 + oto, but device selection flags are no-ops.

About

Like the macOS say command, but with a modern voice.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Go 100.0%