AI-native macOS menu bar dictation with a post-recording text pipeline.
Muninn records speech, transcribes it, then runs the transcript through a configurable pipeline before injecting the final text back into the active app. The core idea is not just voice capture. It is the AI-native pass after recording that can correct, reshape, or enhance the transcribed text so technical dictation survives intact.
It is designed for code-adjacent dictation: commands, flags, package names, file paths, env vars, acronyms, and other text that normal voice tools often mangle.
Muninn is:
- a local app with global hotkeys, a menu bar indicator, microphone capture, and keyboard injection
- a post-recording pipeline runner that can chain built-in AI steps with normal Unix commands
- BYOK by design: you bring the provider keys, models, and settings; Muninn orchestrates the flow and applies its own developer-focused text transformation layer on top
High-level flow:
```
hotkey -> record temp WAV (default 16 kHz mono) -> resolve transcription route
  -> transcribe with the first available provider -> run Muninn refine pass
  -> optional filters -> inject text
```
The default setup is already a two-pass AI pipeline. First, your chosen STT provider turns audio into raw text. Then Muninn runs a second pass that aligns that text to developer needs: technical terms, commands, flags, paths, env vars, acronyms, and obvious dictation errors. That second pass is conservative by default, but it is still part of the core product behavior, not an afterthought.
The current app supports:
- a live menu bar app
- macOS global hotkeys
- microphone recording to a temp WAV with configurable mono output and sample rate
- ordered transcription-provider routing plus built-in refine and external pipeline steps
- keyboard-event text injection into the current app
- stderr tracing logs plus optional replay artifacts per utterance
The pipeline is still the core idea. Muninn now resolves an ordered transcription route from [transcription].providers before the runner sees the utterance, then it hands the runner an ordinary concrete pipeline. Existing configs that spell STT steps directly in pipeline.steps still work unchanged.
Each step is declared in config and runs as a command with:

- `cmd`
- `args` (optional)
- `timeout_ms` (optional)
- `on_error` (optional)
- `io_mode`
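A single step declaration using these fields might look like the following sketch. The step id and the sed filter are illustrative, not shipped defaults, and the commented `io_mode` line only shows where the envelope opt-in would go:

```toml
# Hypothetical external filter step showing every per-step field.
[[pipeline.steps]]
id = "strip-trailing-space"   # illustrative step id
cmd = "/usr/bin/sed"
args = ["-E", "s/ +$//"]
timeout_ms = 250
on_error = "continue"         # continue | fallback | abort
# io_mode = "envelope_json"   # only when the step needs the full JSON envelope
```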
Ordered transcription providers:
- `apple_speech`
- `whisper_cpp`
- `deepgram`
- `openai`
- `google`
Built-in pipeline steps:
- `stt_apple_speech`
- `stt_whisper_cpp`
- `stt_deepgram`
- `stt_openai`
- `stt_google`
- `refine`
What makes this flexible:
- The preferred STT surface is `[transcription].providers`, so profiles can reorder or narrow fallback without copying raw step lists.
- Built-ins are still referenced directly in config, so you can use Muninn's own steps without wiring separate binaries.
- External Unix tools work too. Text filters like `sed`, `tr`, and `awk` can be dropped into the pipeline directly.
- External steps default to plain text filtering. Use `io_mode = "envelope_json"` only when a step truly needs the full JSON envelope.
- Each step has its own timeout and error policy, so you can choose when to `continue`, `fallback`, or `abort`.
- Muninn prefers `output.final_text` for injection, but can fall back to `transcript.raw_text` when a later step fails.
- The built-in `refine` step takes the raw transcript, applies a fixed Muninn contract plus your configured hints, and writes the accepted result to `output.final_text`.
That gives you a small but useful contract: keep the default developer-focused pipeline if it works, or swap in your own tools when you want more control over the transformation chain.
Example shape:

```toml
[transcription]
providers = ["apple_speech", "whisper_cpp", "deepgram", "openai", "google"]

[[pipeline.steps]]
id = "refine"
cmd = "refine"
timeout_ms = 2500
on_error = "continue"

[[pipeline.steps]]
id = "uppercase"
cmd = "/usr/bin/tr"
args = ["[:lower:]", "[:upper:]"]
timeout_ms = 250
on_error = "continue"
```

If you already have explicit `stt_*` steps in `pipeline.steps`, Muninn still accepts them and preserves that route order. The ordered-provider surface is the preferred way to express fallback now.
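For comparison, a legacy route spelled directly as steps might look like this. The sketch follows the documented step shape; the specific ids, timeouts, and error policies are illustrative:

```toml
# Legacy style: explicit stt_* steps instead of [transcription].providers.
[[pipeline.steps]]
id = "stt_whisper_cpp"
cmd = "stt_whisper_cpp"
timeout_ms = 15000       # local inference budget (illustrative)
on_error = "continue"

[[pipeline.steps]]
id = "stt_deepgram"
cmd = "stt_deepgram"
timeout_ms = 10000       # cloud leg budget (illustrative)
on_error = "continue"

[[pipeline.steps]]
id = "refine"
cmd = "refine"
timeout_ms = 2500
on_error = "continue"
```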
Muninn reads provider credentials from your environment or config and uses them directly for its built-in steps. Environment variables override config values.
Setup:
- Apple Speech: no API key is required; this local leg requires macOS 26+ and Apple-managed Speech assets for the selected locale
- Whisper.cpp: no API key is required; Muninn auto-downloads the selected or default model on first use when it knows the canonical upstream file name, or you can still point `providers.whisper_cpp.model` at a local `.bin` file
- Deepgram: set `DEEPGRAM_API_KEY`; Muninn uses the prerecorded `/v1/listen` API with `model = "nova-3"`, `language = "en"`, and smart formatting enabled by default
- OpenAI: set `OPENAI_API_KEY` for the OpenAI route leg and for the default refine pass
- Google: set `GOOGLE_API_KEY` or `GOOGLE_STT_TOKEN` for the Google route leg
- optional provider settings such as endpoints and models live in the config you control
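One way to stage these keys is a local `.env` file next to where you launch Muninn. The values below are placeholders; never commit real credentials:

```shell
# Write a local .env with placeholder credentials (replace before use).
cat > .env <<'EOF'
DEEPGRAM_API_KEY=replace-me
OPENAI_API_KEY=replace-me
GOOGLE_API_KEY=replace-me
EOF
chmod 600 .env   # keep credentials private to your user
cat .env
```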
The shipped route order is local-first. apple_speech and whisper_cpp run locally on completed recordings for post-processing transcription; deepgram is the preferred cloud leg for prerecorded uploads; openai and google remain fallback cloud legs.
Whisper model lifecycle:
- documented first-use default: `tiny.en`, resolved as `ggml-tiny.en.bin`
- default model directory: `~/.local/share/muninn/models`
- override surface: `[providers.whisper_cpp].model`, `[providers.whisper_cpp].model_dir`, and `[providers.whisper_cpp].device`
- install behavior today: Muninn auto-downloads the selected/default canonical Whisper model into `providers.whisper_cpp.model_dir` on first use; explicit custom paths still need you to place the file there yourself
- first-use tradeoff: the first utterance that needs a missing model will block on the download before transcription starts
- explicit-path failure mode: if you point `providers.whisper_cpp.model` at a custom absolute/tilde path and the file is missing, Muninn records an actionable missing-model diagnostic and continues the ordered route
- performance tradeoff: `tiny.en` is the fastest and smallest launchable default, while larger models such as `base.en` trade more disk and latency for better accuracy
- acceleration: `device = "auto"` prefers Metal on Apple Silicon builds when available and uses CPU elsewhere; `device = "gpu"` is explicit and fails diagnostically on unsupported builds
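Put together, a non-default Whisper configuration using this override surface might look like the following sketch (the model choice is illustrative):

```toml
[providers.whisper_cpp]
model = "base.en"                          # trade disk and latency for accuracy
model_dir = "~/.local/share/muninn/models" # the documented default location
device = "auto"                            # Metal on Apple Silicon when available
```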
Deepgram provider defaults:
- prerecorded endpoint: `https://api.deepgram.com/v1/listen`
- documented first-use model: `nova-3`
- default language hint: `en`
- request behavior: Muninn uploads the completed recording binary with `smart_format=true`
- override surface: `[providers.deepgram].endpoint`, `[providers.deepgram].model`, `[providers.deepgram].language`
- env overrides: `DEEPGRAM_STT_ENDPOINT`, `DEEPGRAM_STT_MODEL`, and `DEEPGRAM_STT_LANGUAGE`
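A sketch of the effective request these defaults and env overrides produce. The URL composition mirrors the documented behavior; the final upload is shown only as a comment because it needs a real key and recording:

```shell
# Compose the effective Deepgram prerecorded URL from env overrides + defaults.
ENDPOINT="${DEEPGRAM_STT_ENDPOINT:-https://api.deepgram.com/v1/listen}"
MODEL="${DEEPGRAM_STT_MODEL:-nova-3}"
DG_LANG="${DEEPGRAM_STT_LANGUAGE:-en}"
URL="$ENDPOINT?model=$MODEL&language=$DG_LANG&smart_format=true"
echo "$URL"
# Upload a finished recording (requires DEEPGRAM_API_KEY and a WAV file):
#   curl -sS -H "Authorization: Token $DEEPGRAM_API_KEY" \
#        -H "Content-Type: audio/wav" --data-binary @utterance.wav "$URL"
```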
That makes Muninn AI-native even in BYOK mode. You are not just piping audio into someone else's transcript API and injecting whatever comes back. The default flow uses your STT provider for the first pass, then uses Muninn's own built-in prompt contract for a second pass that aligns the text to developer dictation.
Think of transcript.system_prompt as a voice/style hint for refine:
```toml
[transcript]
system_prompt = "Prefer minimal corrections. Focus on technical terms, developer tools, package names, commands, flags, file names, paths, env vars, acronyms, and obvious dictation errors. If uncertain, keep the original wording."
```

It does not change the speaker's voice or the STT provider. It steers the second-pass text transformation. The shipped default hint is intentionally light-touch: preserve wording, fix technical tokens, and avoid stylistic rewrites. If refine is unsure, or if a change is too aggressive, Muninn keeps the original transcript instead of forcing a rewrite. If you want a stronger opinionated output, you can change that prompt, add extra pipeline filters, or attach a custom envelope-aware step.
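As a contrast to the light-touch default, a deliberately stronger hint could look like this. The prompt text is illustrative, not a shipped preset:

```toml
[transcript]
# Illustrative "opinionated" refine hint; tune to taste.
system_prompt = "Rewrite aggressively for clarity. Expand spoken symbols such as 'dash q' to '-q' and 'dot env' to '.env'. Keep commands, flags, paths, and code tokens verbatim otherwise."
```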
If you want bounded vocabulary biasing without introducing a dedicated provider subsystem, append a small JSON block through the same hint surface:
```toml
[transcript]
system_prompt_append = """
Vocabulary JSON:
{"terms":["Muninn","whisper.cpp","Deepgram","Cargo.toml"],"commands":["cargo test -q","rg --files"],"paths":["src/config.rs",".env"]}
"""
```

`system_prompt_append` is generic prompt composition. Muninn does not parse the JSON or translate it into provider-native adaptation APIs. It simply forwards the extra block into the built-in refine pass. That means:
- users who do nothing keep the current STT and refine behavior
- prompt-based vocabulary biasing is best-effort only
- provider-native vocabulary and adaptation features remain out of scope
Replay artifacts redact provider secrets before they are written. When replay logging retains audio, Muninn prefers a filesystem hard link and falls back to a copy.
Muninn can now resolve different refine styles from the current app context. It captures the frontmost app bundle id, app name, and a best-effort window title, then applies the first matching profile_rules entry. Order matters: put the most specific rules first. If nothing matches, Muninn falls back to app.profile.
Use voices to define refine-oriented behavior plus an optional one-letter tray glyph. Use profiles to choose a voice and optionally add per-context recording, pipeline, transcript, or refine overrides on top. Voice here means text-shaping behavior, not audio voice.
Use system_prompt when you want a full replacement. Use system_prompt_append when you want to layer another bounded hint block, such as context-specific vocabulary JSON, without copying the whole base prompt.
```toml
[app]
profile = "default"

[voices.codex]
indicator_glyph = "C"
system_prompt = "Prefer terse developer dictation. Keep commands, flags, file names, and code tokens intact."
system_prompt_append = """
Vocabulary JSON:
{"terms":["Codex","Muninn","Cargo.toml"],"commands":["cargo test -q","cargo clippy -q --all-targets -- -D warnings"]}
"""

[voices.terminal]
indicator_glyph = "T"
system_prompt = "Preserve shell commands exactly. Prefer minimal punctuation changes."

[voices.mail]
indicator_glyph = "E"
system_prompt = "Correct spelling and obvious grammar in the language already being used. Preserve the intended language, names, quoted text, URLs, and code. Do not translate."

[profiles.codex]
voice = "codex"

[profiles.terminal]
voice = "terminal"

[profiles.mail]
voice = "mail"

[profiles.mail.transcript]
system_prompt_append = """
Vocabulary JSON:
{"terms":["Siobhan","Niamh","Muninn"],"products":["Deepgram"]}
"""

[[profile_rules]]
id = "codex-app"
profile = "codex"
app_name = "Codex"

[[profile_rules]]
id = "terminal-app"
profile = "terminal"
bundle_id = "com.apple.Terminal"

[[profile_rules]]
id = "mail-app"
profile = "mail"
bundle_id_prefix = "com.apple.mail"
```

Resolution order is:
- start from the base config
- apply the matched voice for refine-oriented defaults
- apply the matched profile last, so profile overrides win when both touch the same field
Tray behavior follows the resolved voice:
- idle preview shows the glyph for the currently matched voice; when no app rule matches, the tray falls back to `M` even though `app.profile` still applies
- recording and processing freeze the resolved glyph for that utterance even if the frontmost app changes
- `?` remains reserved for missing-credentials feedback and overrides any voice glyph
This is the shortest path to a working local setup.
```sh
cargo build
```

Config file precedence:

- `MUNINN_CONFIG`
- `$XDG_CONFIG_HOME/muninn/config.toml`
- `~/.config/muninn/config.toml`
If the resolved config file is missing, Muninn creates a launchable default config automatically. If you want the sample config explicitly:
```sh
if [ -n "${MUNINN_CONFIG:-}" ]; then
  CONFIG_PATH="$MUNINN_CONFIG"
elif [ -n "${XDG_CONFIG_HOME:-}" ]; then
  CONFIG_PATH="$XDG_CONFIG_HOME/muninn/config.toml"
else
  CONFIG_PATH="$HOME/.config/muninn/config.toml"
fi
mkdir -p "$(dirname "$CONFIG_PATH")"
cp configs/config.sample.toml "$CONFIG_PATH"
echo "Using config: $CONFIG_PATH"
```

The sample enables the local-first ordered transcription route and keeps refine as the first explicit pipeline step.
In other words: resolve providers, transcribe with the first usable leg, run Muninn's developer-focused refine pass, then inject.
It also defaults recording output to `mono = true` and `sample_rate_khz = 16`.

Replay audio retention defaults to `replay_retain_audio = true`; set it to `false` if you only want replay metadata.
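Spelled out explicitly, those recording defaults might look like this. The `[recording]` table name is an assumption based on the setting names above; check your generated config for the exact section:

```toml
[recording]            # assumed section name for these settings
mono = true            # downmix capture to one channel
sample_rate_khz = 16   # matches the default temp WAV format
```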
Muninn auto-downloads the selected/default canonical Whisper model on first use. If you want to avoid first-use latency, pre-warm the cache once:
```sh
mkdir -p "$HOME/.local/share/muninn/models"
curl -L \
  -o "$HOME/.local/share/muninn/models/ggml-tiny.en.bin" \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin"
```

This matches the launchable default config:

- `providers.whisper_cpp.model = "tiny.en"`
- `providers.whisper_cpp.model_dir = "~/.local/share/muninn/models"`
- `providers.whisper_cpp.device = "auto"`
If you use an explicit custom path such as `providers.whisper_cpp.model = "~/models/custom.bin"` and that file is missing, Muninn will log `missing_whisper_cpp_model`, skip refine because `transcript.raw_text` is still empty, and a local-only Whisper route will inject nothing.
Boundary and tradeoffs:
- Whisper.cpp is post-recording only in Muninn; there is no streaming or partial-result path in this backend
- `tiny.en` is English-only and optimized for footprint and latency
- moving up to a larger model such as `base.en` usually improves accuracy at the cost of more disk, memory, and inference time
Muninn now tries to load `./.env` from the current working directory by default. Existing shell environment variables still win over `.env` and config values. Set `MUNINN_LOAD_DOTENV=0`, `false`, or `no` if you want to disable this.
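That precedence can be sketched in plain shell: a loader that only adopts `.env` values for variables the environment has not already set. This is a behavioral sketch, not Muninn's actual loader:

```shell
# Demonstrate ".env loses to real environment" precedence.
tmp=$(mktemp -d) && cd "$tmp"
printf 'DEEPGRAM_API_KEY=from-dotenv\nONLY_IN_DOTENV=from-dotenv\n' > .env
export DEEPGRAM_API_KEY=from-shell    # pre-existing shell value must win
while IFS='=' read -r key val; do
  # adopt the .env value only when the variable is unset in the environment
  if [ -z "$(printenv "$key")" ]; then
    export "$key=$val"
  fi
done < .env
echo "$DEEPGRAM_API_KEY"   # stays from-shell
echo "$ONLY_IN_DOTENV"     # filled from .env
```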
| Concern | Variables | Notes |
|---|---|---|
| Apple Speech transcription | none | Configure `[providers.apple_speech]` (locale and `install_assets`) in config; this provider is completed-recording only, requires macOS 26+, and uses Apple-managed assets |
| Whisper.cpp transcription | none | Muninn auto-downloads the selected/default canonical model into `providers.whisper_cpp.model_dir` on first use. If you point `providers.whisper_cpp.model` at a custom local path and that file is missing, Muninn logs `missing_whisper_cpp_model` and a local-only Whisper route produces no injected text. |
| Deepgram transcription | `DEEPGRAM_API_KEY`, optional `DEEPGRAM_STT_ENDPOINT`, optional `DEEPGRAM_STT_MODEL`, optional `DEEPGRAM_STT_LANGUAGE`, optional `MUNINN_DEEPGRAM_STUB_TEXT` | Deepgram is the preferred cloud route leg for prerecorded uploads; Muninn sends the completed recording with `smart_format=true`, and stub text is only an optional bypass. |
| OpenAI transcription | `OPENAI_API_KEY`, `MUNINN_OPENAI_STUB_TEXT` | OpenAI runs live when `transcript.raw_text` is missing; stub text is only an optional bypass. |
| Google transcription | `GOOGLE_API_KEY` or `GOOGLE_STT_TOKEN`, optional `GOOGLE_STT_ENDPOINT`, optional `GOOGLE_STT_MODEL`, optional `MUNINN_GOOGLE_STUB_TEXT` | Google STT runs live when `transcript.raw_text` is missing; stub text is only an optional bypass. |
| Refine step | `OPENAI_API_KEY`, `MUNINN_REFINE_STUB_TEXT` | This is the second AI pass. `transcript.system_prompt` can give it voice/style hints. Stub text bypasses the network for refine. |
```sh
MUNINN_CONFIG="$PWD/configs/config.sample.toml" cargo run
```

Current upstream distribution status:
- GitHub Releases currently publish raw macOS binaries for Apple Silicon and Intel.
- Muninn does not yet ship an official signed and notarized `.app` bundle.
- Short term, the supported upstream path is: ship the binary, document a manual macOS setup step, and keep the local app bundle flow as an opt-in convenience.
If you install a release binary directly, keep it at a stable path before granting macOS permissions. For example:
```sh
mkdir -p "$HOME/.local/bin"
mv muninn "$HOME/.local/bin/muninn"
chmod +x "$HOME/.local/bin/muninn"
"$HOME/.local/bin/muninn"
```

When you run Muninn as a raw binary:
- macOS permissions attach to that exact binary path
- moving or replacing the binary may require re-granting permissions
- Finder Login Items and app-style launch behavior do not apply
Optional macOS app bundle (recommended when you want stable permissions and Login Items instead of a raw LaunchAgent):
```sh
cargo build --release --bin muninn
bash scripts/package-macos-app.sh
open dist/Muninn.app
```

This app bundle flow is currently the recommended manual macOS setup step when you want stable permissions without waiting for an official upstream `.app` release. The packaging script signs the bundle ad hoc by default so macOS sees a stable app identity instead of only the linker-signed binary. Set `CODESIGN_IDENTITY` when you want to sign with a Developer ID certificate, or `CODESIGN_APP=0` if you explicitly want to skip signing.
Then:
- move `dist/Muninn.app` to `/Applications/Muninn.app` to keep the app identity stable
- make sure your config and provider setup live at Muninn's normal resolved paths, because Finder/Login Items will not inherit your shell exports
- launch it once and grant permissions to Muninn
- add `Muninn.app` under System Settings > General > Login Items
- keep `[app].autostart = false` when using the packaged app, because the built-in autostart still writes a raw-binary LaunchAgent
Optional macOS autostart:
- set `autostart = true` under `[app]` in your config
- Muninn uses the current executable path when writing the LaunchAgent
- Muninn writes `~/Library/LaunchAgents/com.bnomei.muninn.plist` when it starts or reloads config
- changes take effect on the next macOS login
- login autostart does not inherit shell exports; prefer config-backed credentials, or make sure the LaunchAgent working directory contains the `.env` file you want Muninn to read
- if you are using `Muninn.app`, prefer macOS Login Items over this LaunchAgent path
Muninn needs these macOS permissions:
| Permission | Why Muninn needs it | System Settings path |
|---|---|---|
| Input Monitoring | Listen for global hotkeys even when Muninn is not frontmost | Privacy & Security > Input Monitoring |
| Accessibility | Inject the final text into the current app | Privacy & Security > Accessibility |
| Microphone | Record your speech | Privacy & Security > Microphone |
Important:
- Grant these permissions to Muninn itself.
- Do not grant them to the target app you want to dictate into. Terminal, Codex, Mail, Slack, and other target apps do not need Input Monitoring or Accessibility for Muninn to work.
- If you launch Muninn from Terminal during development, do not assume Terminal's permissions are enough. The exact Muninn app or binary you launched must be allowed by macOS.
- If macOS shows a prompt, grant access and then retry the recording or injection action.
What to expect:
- A tray click can start recording and bootstrap the Microphone prompt even before Input Monitoring is granted. If Input Monitoring is still missing, Muninn also asks for it, but tray recording itself is not blocked on that permission.
- The first hotkey recording attempt may trigger the Input Monitoring prompt.
- The first text injection attempt may trigger the Accessibility prompt.
- If Input Monitoring was previously denied, macOS may not show the prompt again automatically.
If a permission prompt stops appearing, re-enable the permission manually in System Settings or reset the specific TCC service and relaunch Muninn:
```sh
tccutil reset ListenEvent
tccutil reset Accessibility
tccutil reset Microphone
```

Optional. Built-ins can be run directly with:

```sh
cargo run -q -- __internal_step <stt_apple_speech|stt_whisper_cpp|stt_deepgram|stt_openai|stt_google|refine>
```

Use the fixtures in `tests/fixtures/` when you want example input.
- `stt_apple_speech` is the native macOS 26+ on-device route leg; it reads completed recordings from `audio.wav_path`, uses Apple-managed speech assets, writes `transcript.raw_text` on success, and falls through when the platform/locale is unsupported or assets are unavailable
- `stt_whisper_cpp` reads `audio.wav_path`, runs local whisper.cpp inference on completed recordings, writes `transcript.raw_text` on success, and records missing-model or unsupported-build diagnostics before falling through
- `stt_deepgram` uploads the completed recording to Deepgram's prerecorded `/v1/listen` API, writes `transcript.raw_text` on success, and records structured missing-credential, request-failure, or empty-transcript diagnostics before falling through
- `stt_openai` fills `transcript.raw_text` when OpenAI is configured; otherwise it records structured failure details and lets later route legs run
- `stt_google` fills `transcript.raw_text` when Google is configured; otherwise it records structured failure details and lets later route legs run
- `refine` applies Muninn's built-in developer contract plus your `transcript.system_prompt` hints and writes accepted output to `output.final_text`
- recommended default: `[transcription].providers` -> `refine` -> optional external filters
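The envelope those steps read and write is referenced throughout by dotted keys. A minimal sketch of the shape those keys imply — the nesting is inferred from the key names used in this document, not dumped from a schema, and the values are invented:

```json
{
  "audio":      { "wav_path": "/tmp/utterance.wav" },
  "transcript": { "raw_text": "cargo test dash q" },
  "output":     { "final_text": "cargo test -q" }
}
```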
v0.2.0 introduces [transcription].providers as the ordered STT route that the runtime resolves before it hands a concrete pipeline to the runner. The shipped default list is local-first: apple_speech, whisper_cpp, deepgram, openai, then google. During execution Muninn records which provider was attempted, why it succeeded or failed, and whether the normalized route metadata allows the next provider to run.
Profiles can override only the provider order for their context, without re-encoding raw pipeline steps. For example, a mail profile that prefers the cloud leg can narrow the chain:
```toml
[profiles.mail.transcription]
providers = ["deepgram", "openai", "google"]
```

This profile now skips the local-first defaults while other profiles continue inheriting the system-wide chained route.
- tracing logs go to stderr and are controlled with `RUST_LOG`
- replay logging is optional and writes per-utterance artifacts to `replay_dir`
- `replay_retain_audio = true` keeps an `audio.*` artifact when possible by trying a hard link before copying
- `replay_retain_audio = false` keeps `record.json` and metadata only
- replay snapshots redact provider secrets
- Muninn currently supports macOS only.
- Deepgram is currently a prerecorded-upload backend only; streaming and provider-specific vocabulary prompting remain out of scope here.
- Replay artifacts are for inspection, not re-run.
- There is no replay UI yet.
- Provider-backed transcription needs realistic timeout budgets.
Run the tracked benchmark suite with:
```sh
cargo bench --bench runtime_bottlenecks
```

The suite focuses on the bottlenecks that directly affect per-utterance latency without relying on network calls:
- audio output transform and resampling
- envelope JSON round trips on representative payload sizes
- Google request-body construction for representative WAV sizes
- per-utterance profile and voice resolution across many rules
- replacement scoring on dense candidate sets
- in-process pipeline runner overhead on larger envelopes
- replay persistence with and without retained audio artifacts
Filter to one hotspot with a benchmark name substring, for example:
```sh
cargo bench --bench runtime_bottlenecks pipeline_runner
cargo bench --bench runtime_bottlenecks replay_persist
```

CodSpeed runs the same benchmark target in CI so regressions in these paths show up on PRs.
This repo ships a native prek.toml for fast local gates before you commit.
```sh
prek validate-config
prek run --all-files
prek install
```

The hooks stay intentionally small: `cargo fmt --all -- --check` and `cargo clippy --all-targets --all-features -- -D warnings`.
