
Implement local dictation with whisper.cpp + Hammerspoon #1

@ansonhoyt

Description

Dictation Plan

Goal

Implement fast, accurate, local dictation on macOS for VS Code, Obsidian, and terminal workflows (Ghostty + tmux). Prefer streaming; hold-to-talk acceptable for MVP.

  • Hardware: Apple Silicon (M1 Max, M4)
  • English only
  • Local-only (low-latency, no network calls, no subscription costs)

Approach

  • Engine: whisper.cpp (Metal-accelerated, open source)
  • MVP: hold-to-talk → transcribe → clipboard → smart paste
  • Hotkey: Hammerspoon (global hotkey + per-app paste logic)
  • Later: streaming via CLI if needed

Phase 0 — Setup

```shell
# 1. Install deps (adds to .Brewfile); Hammerspoon ships as a cask
brew install whisper-cpp sox
brew install --cask hammerspoon

# 2. Run setup (checks deps, downloads model)
dictate --setup

# 3. Grant Hammerspoon permissions
#    System Settings → Privacy & Security → Accessibility
#    System Settings → Privacy & Security → Microphone

# 4. Reload Hammerspoon (menu bar icon → Reload Config)
```
Phase 1 — MVP (hold-to-talk)

Deliverables

  1. bin/dictate

    • --setup: check/install deps, download model (interactive)
    • --check: quick deps check (exit 0 if ready, 1 if not)
    • Default: record 16kHz mono WAV, transcribe, output plaintext to stdout
    • Exit codes: 0 success, 1 not ready, 2 recording failed, 3 transcription failed
    • Model: ggml-large-v3-turbo-q5_0.bin (quantized, ~600MB, best accuracy/speed)
    • Temp files: ~/.cache/dictate/ (delete after transcription)
    • Model files: ~/.cache/dictate/models/
    • Minimum recording: 0.5s (skip transcription if shorter, still play sounds)
  2. .hammerspoon/init.lua

    • Hold-to-talk: hold Fn to record (matches Wispr Flow PTT style)
    • On key down: play start sound, start sox recording, show menu bar indicator
    • On key up: play stop sound, stop sox, run transcription, copy to clipboard, smart paste
    • Menu bar icon: idle (⎯) / recording (●) / processing (⋯)
    • Audio feedback via hs.sound.getByName(): start=Tink, stop=Pop, error=Basso
    • On error (exit 1): play error sound, show notification "Run: dictate --setup"
    • Fn detection via hs.eventtap + flagsChanged (raw flag 0x800000)
    • Note: If using Wispr Flow, disable it or configure different hotkey in Phase 2
    • Future: check deps on Hammerspoon load, show ⚠ if not ready
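The `bin/dictate` contract above might be sketched roughly as follows. This is a hypothetical sketch, not the final script: the `whisper-cli` binary name and its `-m`/`-f`/`--no-timestamps` flags are assumptions about the Homebrew `whisper-cpp` install, and `soxi` ships with sox.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of bin/dictate (exit codes per the plan).
set -u

CACHE="${XDG_CACHE_HOME:-$HOME/.cache}/dictate"
MODEL="$CACHE/models/ggml-large-v3-turbo-q5_0.bin"

# --check: exit 0 if ready, 1 if not
deps_ready() {
  command -v sox >/dev/null 2>&1 &&
    command -v whisper-cli >/dev/null 2>&1 &&
    [ -f "$MODEL" ]
}

# Default mode: record 16 kHz mono WAV, transcribe, print plaintext to stdout.
# Exit codes: 0 success, 1 not ready, 2 recording failed, 3 transcription failed.
dictate() {
  deps_ready || return 1
  mkdir -p "$CACHE"
  local wav="$CACHE/rec-$$.wav"
  trap 'rm -f "$wav"' RETURN                  # temp file deleted after use
  # Record from the default input until the caller stops us (key-up sends SIGINT).
  sox -d -r 16000 -c 1 -b 16 "$wav" || return 2
  # Skip transcription (but exit cleanly) for clips shorter than 0.5 s.
  local dur; dur=$(soxi -D "$wav")
  awk -v d="$dur" 'BEGIN { exit !(d >= 0.5) }' || return 0
  whisper-cli -m "$MODEL" -f "$wav" --no-timestamps 2>/dev/null || return 3
}

# Demo: report readiness without touching the microphone.
deps_ready && echo ready || echo "not ready"
```

Hammerspoon can then call `dictate` on key-down/key-up and branch on the exit code alone, which keeps the Lua side free of any transcription logic.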

Smart paste logic

Detect frontmost app and use appropriate paste method:

| App | Paste method |
| --- | --- |
| Ghostty | Cmd+Shift+V (bracketed paste, works in tmux) |
| Terminal.app | Cmd+V |
| iTerm2 | Cmd+V |
| VS Code, Obsidian, others | Cmd+V |

Note: Bracketed paste works correctly in tmux. Defer tmux set-buffer detection to Phase 2 only if users report issues.
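The real version of this mapping lives in `.hammerspoon/init.lua` (Lua, using `hs.eventtap.keyStroke`), but the per-app decision itself is a pure function, sketched here in shell for illustration. App names match the table above.

```shell
# Map the frontmost app's name to the key chord to synthesize (sketch only;
# the real dispatch happens in Hammerspoon, not a shell script).
paste_chord() {
  case "$1" in
    Ghostty) echo "cmd+shift+v" ;;  # bracketed paste, safe inside tmux
    *)       echo "cmd+v" ;;        # Terminal.app, iTerm2, VS Code, Obsidian, ...
  esac
}

paste_chord Ghostty   # → cmd+shift+v
paste_chord iTerm2    # → cmd+v
```

Defaulting to plain Cmd+V keeps the table short: only apps that need a special chord get an explicit case.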

Acceptance criteria

  • Works reliably in:
    • VS Code, Obsidian (GUI apps)
    • Ghostty + tmux (terminal)
    • Claude Code, Gemini CLI, Copilot CLI, Codex (AI CLI tools in terminal)
  • Audio feedback on start/stop/error (audio alone is sufficient for MVP acceptance; the menu bar indicator is a Phase 1 deliverable but not an acceptance gate)
  • Transcript pasted after releasing hotkey
  • Graceful failure if sox/whisper-cpp unavailable

Phase 2 — Quality pass

Enhancements

  • Post-processing substitutions:
    • "backtick" → `, "open paren" → (, "underscore" → _
    • Optional: "snake case" / "camel case" transforms
  • Technical vocabulary prompting (if whisper.cpp supports initial prompt)
  • Silence trim before transcription (reduce latency)
  • Config file: ~/.config/dictate/config.yml for substitutions, per-app rules, hotkey
  • Configurable hotkey (Fn, Hyper+D, custom)
  • tmux set-buffer paste if bracketed paste proves problematic
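The spoken-symbol substitutions can start as a simple ordered find-and-replace pass over the transcript. A minimal sketch (the "close paren" mapping is an illustrative addition, not from the plan above):

```shell
# Post-process a transcript: spoken symbol names → literal characters.
subst() {
  sed -e 's/open paren/(/g' \
      -e 's/close paren/)/g' \
      -e 's/underscore/_/g' \
      -e 's/backtick/`/g'
}

echo "print open paren user underscore id close paren" | subst
# → print ( user _ id )
```

Once the config file lands, these pairs move out of the script and into `~/.config/dictate/config.yml`.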

Acceptance criteria

  • Technical tokens improved vs baseline
  • Config file supports substitutions and hotkey customization
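A possible shape for `~/.config/dictate/config.yml`. This schema is hypothetical; key names are illustrative, not final.

```yaml
# ~/.config/dictate/config.yml — hypothetical Phase 2 schema
hotkey: fn                 # fn | hyper+d | custom
substitutions:
  "open paren": "("
  "underscore": "_"
  "backtick": "`"
apps:
  Ghostty:
    paste: cmd+shift+v     # bracketed paste, works in tmux
  default:
    paste: cmd+v
```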

Phase 3 — Streaming evaluation

Goal

Validate real-time streaming via whisper.cpp's stream example (whisper-stream).

Tasks

  • Test streaming with whisper-stream (whisper.cpp's SDL2-based stream example)
  • Evaluate partials vs final output format
  • Strategy: show partials in overlay, paste final on silence/endpoint

Acceptance criteria

  • Streaming observed live (or documented limitations)
  • Clear decision: proceed with streaming or stay with PTT

Phase 4 — Native app (only if needed)

Pursue only if Phase 3 streaming insufficient.

Implementation outline

  • Swift app with AVAudioEngine capture
  • whisper.cpp via C++ bindings or whisper-cpp-kit
  • VAD/endpointing (silence detection)
  • UI: menu bar + optional overlay for partials
  • Alternative: Hammerspoon hs.canvas overlay (lighter weight)

Acceptance criteria

  • Low-latency streaming dictation
  • Final transcript pasted once, no "partial spam"

Decisions made

  • Engine — whisper.cpp (open source, Metal-accelerated, no subscription)
  • Hold-to-talk (not toggle) — simpler, no state tracking
  • Fn-hold hotkey — matches Wispr Flow PTT, configurable in Phase 2
  • Smart paste — per-app detection, Cmd+Shift+V for Ghostty (bracketed paste works in tmux)
  • Model — ggml-large-v3-turbo-q5_0.bin for English + technical terms
  • Paths — temp in ~/.cache/dictate/, config in ~/.config/dictate/ (XDG compliant)
  • Audio — Hammerspoon hs.sound.getByName(): Tink/Pop/Basso
  • Min duration — 0.5s, skip transcription if shorter
  • Menu bar indicator — required for recording feedback
  • Phase 4 conditional — only if streaming insufficient
