sag 🗣️ — “Mac-style speech with ElevenLabs and MiniMax”

One-liner TTS that works like say: stream to speakers by default, list voices, or save audio files. Defaults to ElevenLabs, with MiniMax available via speech-* model IDs.

Install

Homebrew (macOS):

brew install steipete/tap/sag  # auto-taps steipete/tap

Go toolchain:

go install ./cmd/sag

Requires Go 1.24+.

Configuration

ElevenLabs: ELEVENLABS_API_KEY (or SAG_API_KEY)
MiniMax: MINIMAX_API_KEY (or SAG_API_KEY)
--api-key-file or ELEVENLABS_API_KEY_FILE/MINIMAX_API_KEY_FILE/SAG_API_KEY_FILE to load the key from a file
Optional defaults: ELEVENLABS_VOICE_ID, MINIMAX_VOICE_ID, or SAG_VOICE_ID
Optional: MINIMAX_API_HOST or MINIMAX_BASE_URL to override the MiniMax base URL

Usage

Features:

macOS say-style default: sag "Hello" routes to speak automatically.
Streaming playback to speakers with optional file output.
Voice discovery via sag voices (ElevenLabs) and -v ? (provider-specific).
Speed/rate controls, latency tiers, and format inference from output extension.
Model selection via --model-id (defaults to eleven_v3; use eleven_multilingual_v2 for a stable baseline, speech-* for MiniMax).

Speak (streams audio):

sag speak -v Roger "Hello world"

Call it like macOS say: omitting the subcommand pipes text to speak by default.

sag "Hello world"

macOS say compatibility shortcuts (subcommand optional):

sag -v Roger -r 200 "Faster speech"
sag -o out.mp3 "Save to file"
sag -v ?      # list voices

More examples:

echo "piped input" | sag speak -v Roger
sag speak -v Roger --stream --latency-tier 3 "Faster start"
sag speak -v Roger --speed 1.2 "Talk a bit faster"
sag speak -v Roger --model-id eleven_multilingual_v2 "Use stable v2 baseline"
sag speak -v Roger --output out.wav --format pcm_44100 "Wave output"
sag speak --model-id speech-01 -v ? "List MiniMax voices"
sag speak --model-id speech-01 --output out.flac --stream=false "MiniMax file output"

Key flags (subset):

-v, --voice voice name or ID (? to list)
--api-key-file read API key from a file
-r, --rate words per minute (maps to ElevenLabs speed; default 175)
-f, --input-file read text from file (- for stdin)
-o, --output write audio file; format inferred by extension (.wav -> PCM, .mp3 -> MP3)
--speed explicit speed multiplier (0.5–2.0)
--stability v3: 0|0.5|1 (Creative/Natural/Robust); v2/v2.5: 0..1 (higher = more consistent, less expressive)
--similarity / --similarity-boost 0..1 (higher = closer to the reference voice)
--style 0..1 (higher = more stylized delivery; model/voice dependent)
--speaker-boost / --no-speaker-boost toggle clarity boost (model dependent)
--seed 0..4294967295 best-effort repeatability across runs
--normalize auto|on|off numbers/units/URLs normalization (when set)
--lang en|de|fr|... 2-letter ISO 639-1 language code (when set)
--stream/--no-stream stream while generating (default on)
--latency-tier 0–4 lower latency tiers
--play/--no-play control speaker playback
--metrics print basic stats to stderr

Voices:

sag voices --search english --limit 20
sag voices --search english --limit 5 --try
sag voices --query "crazy scientist" --limit 5 --try
sag voices --label accent=british --label use_case=character --limit 10

Prompting (make it sound better)

Run:

sag prompting

Highlights:

v2/v2.5: SSML pauses via <break time="1.5s" /> (v3 does not support SSML breaks).
v3: use audio tags like [whispers] and pause tags like [short pause].
Use the voice knobs: --stability, --similarity, --style, --speaker-boost, plus request controls --seed, --normalize, --lang.

Models / engines

Provider selection:

ElevenLabs (default): any ElevenLabs model_id via --model-id (we pass it through).
MiniMax: use a speech-* model ID to route requests to MiniMax. Streaming/playback is MP3-only; use --stream=false for WAV/FLAC output.

Practical defaults + common ElevenLabs IDs:

Engine	`--model-id`	Prompting style	Best for
v3 (alpha)	`eleven_v3` (default)	Audio tags like `[whispers]`, `[short pause]` (no SSML `<break>`)	Most expressive / “acting”
v2 (stable)	`eleven_multilingual_v2`	SSML `<break>` supported	Reliable baseline, simple prompts
v2.5 Flash	`eleven_flash_v2_5`	SSML `<break>` supported	Ultra-low latency (~75ms) + 50% lower price per character
v2.5 Turbo	`eleven_turbo_v2_5`	SSML `<break>` supported	Low latency (~250–300ms) + 50% lower price per character

Notes:

SSML <break> works on v2/v2.5, not v3. Use pause tags on v3 instead.
Input limits differ by engine (v3: 5,000 chars; v2: 10,000 chars; v2.5 Turbo/Flash: 40,000 chars). If you hit limits, chunk text and stitch audio.
--normalize on may not be available for v2.5 Turbo/Flash (higher latency); prefer auto/off if it errors.
Source of truth: ElevenLabs “Models” docs.

Development

With pnpm:
- pnpm format
- pnpm lint
- pnpm test
- pnpm build
- pnpm sag -- --help (passes args to the Go binary)
Direct Go:
- Format: go fmt ./...
- Lint: golangci-lint run
- Tests: go test ./...
- Build: go build ./cmd/sag

Limitations

ElevenLabs or MiniMax account and API key required (per provider).
Voice defaults to first available if not provided.
Non-mac platforms: playback still works via go-mp3 + oto, but device selection flags are no-ops.

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
.github/workflows		.github/workflows
cmd		cmd
docs		docs
internal		internal
.gitignore		.gitignore
.golangci.yml		.golangci.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

sag 🗣️ — “Mac-style speech with ElevenLabs and MiniMax”

Install

Configuration

Usage

Prompting (make it sound better)

Models / engines

Development

Limitations

About

Uh oh!

Releases

Packages

Languages

License

tars90percent/sag

Folders and files

Latest commit

History

Repository files navigation

sag 🗣️ — “Mac-style speech with ElevenLabs and MiniMax”

Install

Configuration

Usage

Prompting (make it sound better)

Models / engines

Development

Limitations

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages