Skip to content

bnomei/muninn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

muninn

Crates.io Version CI CodSpeed Crates.io Downloads License Discord Buymecoffee

AI-native macOS menu bar dictation with a post-recording text pipeline.

Muninn records speech, transcribes it, then runs the transcript through a configurable pipeline before injecting the final text back into the active app. The core idea is not just voice capture. It is the AI-native pass after recording that can correct, reshape, or enhance the transcribed text so technical dictation survives intact.

It is designed for code-adjacent dictation: commands, flags, package names, file paths, env vars, acronyms, and other text that normal voice tools often mangle.

Muninn is:

  • a local app with global hotkeys, a menu bar indicator, microphone capture, and keyboard injection
  • a post-recording pipeline runner that can chain built-in AI steps with normal Unix commands
  • BYOK by design: you bring the provider keys, models, and settings; Muninn orchestrates the flow and applies its own developer-focused text transformation layer on top

screenshot

What Muninn Does

High-level flow:

hotkey -> record temp WAV (default 16 kHz mono) -> resolve transcription route -> transcribe with the first available provider -> run Muninn refine pass -> optional filters -> inject text

The default setup is already a two-pass AI pipeline. First, your chosen STT provider turns audio into raw text. Then Muninn runs a second pass that aligns that text to developer needs: technical terms, commands, flags, paths, env vars, acronyms, and obvious dictation errors. That second pass is conservative by default, but it is still part of the core product behavior, not an afterthought.

The current app supports:

  • a live menu bar app
  • macOS global hotkeys
  • microphone recording to a temp WAV with configurable mono output and sample rate
  • ordered transcription-provider routing plus built-in refine and external pipeline steps
  • keyboard-event text injection into the current app
  • stderr tracing logs plus optional replay artifacts per utterance

Pipeline-First By Design

The pipeline is still the core idea. Muninn now resolves an ordered transcription route from [transcription].providers before the runner sees the utterance, then it hands the runner an ordinary concrete pipeline. Existing configs that spell STT steps directly in pipeline.steps still work unchanged.

Each step is declared in config and runs as a command with:

  • cmd
  • optional args
  • timeout_ms
  • on_error
  • optional io_mode

Ordered transcription providers:

  • apple_speech
  • whisper_cpp
  • deepgram
  • openai
  • google

Built-in pipeline steps:

  • stt_apple_speech
  • stt_whisper_cpp
  • stt_deepgram
  • stt_openai
  • stt_google
  • refine

What makes this flexible:

  • The preferred STT surface is [transcription].providers, so profiles can reorder or narrow fallback without copying raw step lists.
  • Built-ins are still referenced directly in config, so you can use Muninn's own steps without wiring separate binaries.
  • External Unix tools work too. Text filters like sed, tr, and awk can be dropped into the pipeline directly.
  • External steps default to plain text filtering. Use io_mode = "envelope_json" only when a step truly needs the full JSON envelope.
  • Each step has its own timeout and error policy, so you can choose when to continue, fallback, or abort.
  • Muninn prefers output.final_text for injection, but can fall back to transcript.raw_text when a later step fails.
  • The built-in refine step takes the raw transcript, applies a fixed Muninn contract plus your configured hints, and writes the accepted result to output.final_text.

That gives you a small but useful contract: keep the default developer-focused pipeline if it works, or swap in your own tools when you want more control over the transformation chain.

Example shape:

[transcription]
providers = ["apple_speech", "whisper_cpp", "deepgram", "openai", "google"]

[[pipeline.steps]]
id = "refine"
cmd = "refine"
timeout_ms = 2500
on_error = "continue"

[[pipeline.steps]]
id = "uppercase"
cmd = "/usr/bin/tr"
args = ["[:lower:]", "[:upper:]"]
timeout_ms = 250
on_error = "continue"

If you already have explicit stt_* steps in pipeline.steps, Muninn still accepts them and preserves that route order. The ordered-provider surface is the preferred way to express fallback now.

BYOK And AI-Native Defaults

Muninn reads provider credentials from your environment or config and uses them directly for its built-in steps. Environment variables override config values.

Setup:

  • Apple Speech: no API key is required; this local leg requires macOS 26+ and Apple-managed Speech assets for the selected locale
  • Whisper.cpp: no API key is required; Muninn auto-downloads the selected or default model on first use when it knows the canonical upstream file name, or you can still point providers.whisper_cpp.model at a local .bin file
  • Deepgram: set DEEPGRAM_API_KEY; Muninn uses the prerecorded /v1/listen API with model = "nova-3", language = "en", and smart formatting enabled by default
  • OpenAI: set OPENAI_API_KEY for the OpenAI route leg and for the default refine pass
  • Google: set GOOGLE_API_KEY or GOOGLE_STT_TOKEN for the Google route leg
  • optional provider settings such as endpoints and models live in the config you control

The shipped route order is local-first. apple_speech and whisper_cpp run locally on completed recordings for post-processing transcription; deepgram is the preferred cloud leg for prerecorded uploads; openai and google remain fallback cloud legs.

Whisper model lifecycle:

  • documented first-use default: tiny.en, resolved as ggml-tiny.en.bin
  • default model directory: ~/.local/share/muninn/models
  • override surface: [providers.whisper_cpp].model, [providers.whisper_cpp].model_dir, and [providers.whisper_cpp].device
  • install behavior today: Muninn auto-downloads the selected/default canonical Whisper model into providers.whisper_cpp.model_dir on first use; explicit custom paths still need you to place the file there yourself
  • first-use tradeoff: the first utterance that needs a missing model will block on the download before transcription starts
  • explicit-path failure mode: if you point providers.whisper_cpp.model at a custom absolute/tilde path and the file is missing, Muninn records an actionable missing-model diagnostic and continues the ordered route
  • performance tradeoff: tiny.en is the fastest and smallest launchable default, while larger models such as base.en trade more disk and latency for better accuracy
  • acceleration: device = "auto" prefers Metal on Apple Silicon builds when available and uses CPU elsewhere; device = "gpu" is explicit and fails diagnostically on unsupported builds

Deepgram provider defaults:

  • prerecorded endpoint: https://api.deepgram.com/v1/listen
  • documented first-use model: nova-3
  • default language hint: en
  • request behavior: Muninn uploads the completed recording binary with smart_format=true
  • override surface: [providers.deepgram].endpoint, [providers.deepgram].model, [providers.deepgram].language
  • env overrides: DEEPGRAM_STT_ENDPOINT, DEEPGRAM_STT_MODEL, and DEEPGRAM_STT_LANGUAGE

That makes Muninn AI-native even in BYOK mode. You are not just piping audio into someone else's transcript API and injecting whatever comes back. The default flow uses your STT provider for the first pass, then uses Muninn's own built-in prompt contract for a second pass that aligns the text to developer dictation.

Think of transcript.system_prompt as a voice/style hint for refine:

[transcript]
system_prompt = "Prefer minimal corrections. Focus on technical terms, developer tools, package names, commands, flags, file names, paths, env vars, acronyms, and obvious dictation errors. If uncertain, keep the original wording."

It does not change the speaker's voice or the STT provider. It steers the second-pass text transformation. The shipped default hint is intentionally light-touch: preserve wording, fix technical tokens, and avoid stylistic rewrites. If refine is unsure, or if a change is too aggressive, Muninn keeps the original transcript instead of forcing a rewrite. If you want a stronger opinionated output, you can change that prompt, add extra pipeline filters, or attach a custom envelope-aware step.

If you want bounded vocabulary biasing without introducing a dedicated provider subsystem, append a small JSON block through the same hint surface:

[transcript]
system_prompt_append = """
Vocabulary JSON:
{"terms":["Muninn","whisper.cpp","Deepgram","Cargo.toml"],"commands":["cargo test -q","rg --files"],"paths":["src/config.rs",".env"]}
"""

system_prompt_append is generic prompt composition. Muninn does not parse the JSON or translate it into provider-native adaptation APIs. It simply forwards the extra block into the built-in refine pass. That means:

  • users who do nothing keep the current STT and refine behavior
  • prompt-based vocabulary biasing is best-effort only
  • provider-native vocabulary and adaptation features remain out of scope

Replay artifacts redact provider secrets before they are written. When replay logging retains audio, Muninn prefers a filesystem hard link and falls back to a copy.

Contextual Profiles And Voices

Muninn can now resolve different refine styles from the current app context. It captures the frontmost app bundle id, app name, and a best-effort window title, then applies the first matching profile_rules entry. Order matters: put the most specific rules first. If nothing matches, Muninn falls back to app.profile.

Use voices to define refine-oriented behavior plus an optional one-letter tray glyph. Use profiles to choose a voice and optionally add per-context recording, pipeline, transcript, or refine overrides on top. Voice here means text-shaping behavior, not audio voice.

Use system_prompt when you want a full replacement. Use system_prompt_append when you want to layer another bounded hint block, such as context-specific vocabulary JSON, without copying the whole base prompt.

[app]
profile = "default"

[voices.codex]
indicator_glyph = "C"
system_prompt = "Prefer terse developer dictation. Keep commands, flags, file names, and code tokens intact."
system_prompt_append = """
Vocabulary JSON:
{"terms":["Codex","Muninn","Cargo.toml"],"commands":["cargo test -q","cargo clippy -q --all-targets -- -D warnings"]}
"""

[voices.terminal]
indicator_glyph = "T"
system_prompt = "Preserve shell commands exactly. Prefer minimal punctuation changes."

[voices.mail]
indicator_glyph = "E"
system_prompt = "Correct spelling and obvious grammar in the language already being used. Preserve the intended language, names, quoted text, URLs, and code. Do not translate."

[profiles.codex]
voice = "codex"

[profiles.terminal]
voice = "terminal"

[profiles.mail]
voice = "mail"
[profiles.mail.transcript]
system_prompt_append = """
Vocabulary JSON:
{"terms":["Siobhan","Niamh","Muninn"],"products":["Deepgram"]}
"""

[[profile_rules]]
id = "codex-app"
profile = "codex"
app_name = "Codex"

[[profile_rules]]
id = "terminal-app"
profile = "terminal"
bundle_id = "com.apple.Terminal"

[[profile_rules]]
id = "mail-app"
profile = "mail"
bundle_id_prefix = "com.apple.mail"

Resolution order is:

  • start from the base config
  • apply the matched voice for refine-oriented defaults
  • apply the matched profile last, so profile overrides win when both touch the same field

Tray behavior follows the resolved voice:

  • idle preview shows the glyph for the currently matched voice; when no app rule matches, the tray falls back to M even though app.profile still applies
  • recording and processing freeze the resolved glyph for that utterance even if the frontmost app changes
  • ? remains reserved for missing-credentials feedback and overrides any voice glyph

Quick Start

This is the shortest path to a working local setup.

1) Build the app

cargo build

2) Resolve the config path

Config file precedence:

  • MUNINN_CONFIG
  • $XDG_CONFIG_HOME/muninn/config.toml
  • ~/.config/muninn/config.toml

If the resolved config file is missing, Muninn creates a launchable default config automatically. If you want the sample config explicitly:

if [ -n "${MUNINN_CONFIG:-}" ]; then
  CONFIG_PATH="$MUNINN_CONFIG"
elif [ -n "${XDG_CONFIG_HOME:-}" ]; then
  CONFIG_PATH="$XDG_CONFIG_HOME/muninn/config.toml"
else
  CONFIG_PATH="$HOME/.config/muninn/config.toml"
fi

mkdir -p "$(dirname "$CONFIG_PATH")"
cp configs/config.sample.toml "$CONFIG_PATH"
echo "Using config: $CONFIG_PATH"

The sample enables the local-first ordered transcription route and keeps refine as the first explicit pipeline step. In other words: resolve providers, transcribe with the first usable leg, run Muninn's developer-focused refine pass, then inject. It also defaults recording output to mono = true and sample_rate_khz = 16. Replay audio retention defaults to replay_retain_audio = true; set it to false if you only want replay metadata.

3) Optional: preinstall a local Whisper model

Muninn auto-downloads the selected/default canonical Whisper model on first use. If you want to avoid first-use latency, pre-warm the cache once:

mkdir -p "$HOME/.local/share/muninn/models"
curl -L \
  -o "$HOME/.local/share/muninn/models/ggml-tiny.en.bin" \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin"

This matches the launchable default config:

  • providers.whisper_cpp.model = "tiny.en"
  • providers.whisper_cpp.model_dir = "~/.local/share/muninn/models"
  • providers.whisper_cpp.device = "auto"

If you use an explicit custom path such as providers.whisper_cpp.model = "~/models/custom.bin" and that file is missing, Muninn will log missing_whisper_cpp_model, skip refine because transcript.raw_text is still empty, and a local-only Whisper route will inject nothing.

Boundary and tradeoffs:

  • Whisper.cpp is post-recording only in Muninn; there is no streaming or partial-result path in this backend
  • tiny.en is English-only and optimized for footprint and latency
  • moving up to a larger model such as base.en usually improves accuracy at the cost of more disk, memory, and inference time

4) Set provider env vars

Muninn now tries to load ./.env from the current working directory by default. Existing shell environment variables still win over .env and config values. Set MUNINN_LOAD_DOTENV=0, false, or no if you want to disable this.

Concern Variables Notes
Apple Speech transcription none Configure [providers.apple_speech] (locale and install_assets) in config; this provider is completed-recording only, requires macOS 26+, and uses Apple-managed assets
Whisper.cpp transcription none Muninn auto-downloads the selected/default canonical model into providers.whisper_cpp.model_dir on first use. If you point providers.whisper_cpp.model at a custom local path and that file is missing, Muninn logs missing_whisper_cpp_model and a local-only Whisper route produces no injected text.
Deepgram transcription DEEPGRAM_API_KEY, optional DEEPGRAM_STT_ENDPOINT, optional DEEPGRAM_STT_MODEL, optional DEEPGRAM_STT_LANGUAGE, optional MUNINN_DEEPGRAM_STUB_TEXT Deepgram is the preferred cloud route leg for prerecorded uploads; Muninn sends the completed recording with smart_format=true, and stub text is only an optional bypass.
OpenAI transcription OPENAI_API_KEY, MUNINN_OPENAI_STUB_TEXT OpenAI runs live when transcript.raw_text is missing; stub text is only an optional bypass.
Google transcription GOOGLE_API_KEY or GOOGLE_STT_TOKEN, optional GOOGLE_STT_ENDPOINT, optional GOOGLE_STT_MODEL, optional MUNINN_GOOGLE_STUB_TEXT Google STT runs live when transcript.raw_text is missing; stub text is only an optional bypass.
Refine step OPENAI_API_KEY, MUNINN_REFINE_STUB_TEXT This is the second AI pass. transcript.system_prompt can give it voice/style hints. Stub text bypasses the network for refine.

5) Run the tray app

MUNINN_CONFIG="$PWD/configs/config.sample.toml" cargo run

Current upstream distribution status:

  • GitHub Releases currently publish raw macOS binaries for Apple Silicon and Intel.
  • Muninn does not yet ship an official signed and notarized .app bundle.
  • Short term, the supported upstream path is: ship the binary, document a manual macOS setup step, and keep the local app bundle flow as an opt-in convenience.

If you install a release binary directly, keep it at a stable path before granting macOS permissions. For example:

mkdir -p "$HOME/.local/bin"
mv muninn "$HOME/.local/bin/muninn"
chmod +x "$HOME/.local/bin/muninn"
"$HOME/.local/bin/muninn"

When you run Muninn as a raw binary:

  • macOS permissions attach to that exact binary path
  • moving or replacing the binary may require re-granting permissions
  • Finder Login Items and app-style launch behavior do not apply

Optional macOS app bundle (recommended when you want stable permissions and Login Items instead of a raw LaunchAgent):

cargo build --release --bin muninn
bash scripts/package-macos-app.sh
open dist/Muninn.app

This app bundle flow is currently the recommended manual macOS setup step when you want stable permissions without waiting for an official upstream .app release. The packaging script signs the bundle ad hoc by default so macOS sees a stable app identity instead of only the linker-signed binary. Set CODESIGN_IDENTITY when you want to sign with a Developer ID certificate, or CODESIGN_APP=0 if you explicitly want to skip signing.

Then:

  • move dist/Muninn.app to /Applications/Muninn.app to keep the app identity stable
  • make sure your config and provider setup live at Muninn's normal resolved paths, because Finder/Login Items will not inherit your shell exports
  • launch it once and grant permissions to Muninn
  • add Muninn.app under System Settings > General > Login Items
  • keep [app].autostart = false when using the packaged app, because the built-in autostart still writes a raw-binary LaunchAgent

Optional macOS autostart:

  • set autostart = true under [app] in your config
  • Muninn uses the current executable path when writing the LaunchAgent
  • Muninn writes ~/Library/LaunchAgents/com.bnomei.muninn.plist when it starts or reloads config
  • changes take effect on the next macOS login
  • login autostart does not inherit shell exports; prefer config-backed credentials, or make sure the LaunchAgent working directory contains the .env file you want Muninn to read
  • if you are using Muninn.app, prefer macOS Login Items over this LaunchAgent path

6) Grant macOS permissions

Muninn needs these macOS permissions:

Permission Why Muninn needs it System Settings path
Input Monitoring Listen for global hotkeys even when Muninn is not frontmost Privacy & Security > Input Monitoring
Accessibility Inject the final text into the current app Privacy & Security > Accessibility
Microphone Record your speech Privacy & Security > Microphone

Important:

  • Grant these permissions to Muninn itself.
  • Do not grant them to the target app you want to dictate into. Terminal, Codex, Mail, Slack, and other target apps do not need Input Monitoring or Accessibility for Muninn to work.
  • If you launch Muninn from Terminal during development, do not assume Terminal's permissions are enough. The exact Muninn app or binary you launched must be allowed by macOS.
  • If macOS shows a prompt, grant access and then retry the recording or injection action.

What to expect:

  • A tray click can start recording and bootstrap the Microphone prompt even before Input Monitoring is granted. If Input Monitoring is still missing, Muninn also asks for it, but tray recording itself is not blocked on that permission.
  • The first hotkey recording attempt may trigger the Input Monitoring prompt.
  • The first text injection attempt may trigger the Accessibility prompt.
  • If Input Monitoring was previously denied, macOS may not show the prompt again automatically.

If a permission prompt stops appearing, re-enable the permission manually in System Settings or reset the specific TCC service and relaunch Muninn:

tccutil reset ListenEvent
tccutil reset Accessibility
tccutil reset Microphone

Internal Step Smoke Checks

Optional. Built-ins can be run directly with:

cargo run -q -- __internal_step <stt_apple_speech|stt_whisper_cpp|stt_deepgram|stt_openai|stt_google|refine>

Use the fixtures in tests/fixtures/ when you want example input.

Built-In Step Behavior

  • stt_apple_speech is the native macOS 26+ on-device route leg; it reads completed recordings from audio.wav_path, uses Apple-managed speech assets, writes transcript.raw_text on success, and falls through when unsupported platform/locale or assets are unavailable
  • stt_whisper_cpp reads audio.wav_path, runs local whisper.cpp inference on completed recordings, writes transcript.raw_text on success, and records missing-model or unsupported-build diagnostics before falling through
  • stt_deepgram uploads the completed recording to Deepgram's prerecorded /v1/listen API, writes transcript.raw_text on success, and records structured missing-credential, request-failure, or empty-transcript diagnostics before falling through
  • stt_openai fills transcript.raw_text when OpenAI is configured, otherwise it records structured failure details and lets later route legs run
  • stt_google fills transcript.raw_text when Google is configured, otherwise it records structured failure details and lets later route legs run
  • refine applies Muninn's built-in developer contract plus your transcript.system_prompt hints and writes accepted output to output.final_text
  • recommended default: [transcription].providers -> refine -> optional external filters

Ordered transcription provider routing

v0.2.0 introduces [transcription].providers as the ordered STT route that the runtime resolves before it hands a concrete pipeline to the runner. The shipped default list is local-first: apple_speech, whisper_cpp, deepgram, openai, then google. During execution Muninn records which provider was attempted, why it succeeded or failed, and whether the normalized route metadata allows the next provider to run.

Profiles can override only the provider order for their context, without re-encoding raw pipeline steps. For example, a mail profile that prefers the cloud leg can narrow the chain:

[profiles.mail.transcription]
providers = ["deepgram", "openai", "google"]

This profile now skips the local-first defaults while other profiles continue inheriting the system-wide chained route.

Replay And Debugging

  • tracing logs go to stderr and are controlled with RUST_LOG
  • replay logging is optional and writes per-utterance artifacts to replay_dir
  • replay_retain_audio = true keeps an audio.* artifact when possible by trying a hard link before copying
  • replay_retain_audio = false keeps record.json and metadata only
  • replay snapshots redact provider secrets

Current Limits

  • Muninn currently supports macOS only.
  • Deepgram is currently a prerecorded-upload backend only; streaming and provider-specific vocabulary prompting remain out of scope here.
  • Replay artifacts are for inspection, not re-run.
  • There is no replay UI yet.
  • Provider-backed transcription needs realistic timeout budgets.

Benchmarking

Run the tracked benchmark suite with:

cargo bench --bench runtime_bottlenecks

The suite focuses on the bottlenecks that directly affect per-utterance latency without relying on network calls:

  • audio output transform and resampling
  • envelope JSON round trips on representative payload sizes
  • Google request-body construction for representative WAV sizes
  • per-utterance profile and voice resolution across many rules
  • replacement scoring on dense candidate sets
  • in-process pipeline runner overhead on larger envelopes
  • replay persistence with and without retained audio artifacts

Filter to one hotspot with a benchmark name substring, for example:

cargo bench --bench runtime_bottlenecks pipeline_runner
cargo bench --bench runtime_bottlenecks replay_persist

CodSpeed runs the same benchmark target in CI so regressions in these paths show up on PRs.

Local Pre-commit

This repo ships a native prek.toml for fast local gates before you commit.

prek validate-config
prek run --all-files
prek install

The hooks stay intentionally small: cargo fmt --all -- --check and cargo clippy --all-targets --all-features -- -D warnings.