Local voice dictation for macOS. Hold the fn key (configurable) to speak, release to transcribe. Works with any application.
100% on-device using WhisperKit or Parakeet - no cloud services, no data leaves your Mac.
Speak2 supports two speech recognition models:
| Model | Size | Languages | Best For |
|---|---|---|---|
| Whisper (base.en) | ~140 MB | English only | Fast, accurate English transcription |
| Parakeet v3 | ~600 MB | 25 languages | Multilingual users |
You can download both and switch between them from the menu bar. Only one model is loaded at a time to conserve memory.
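The one-model-at-a-time rule can be pictured with a small sketch. The names below (`SpeechModel`, `ModelSlot`, `switchTo`) are hypothetical and not taken from Speak2's source; the sketch only illustrates releasing the current model before loading another, and refusing to switch to a model that has not been downloaded.

```swift
// Hypothetical sketch: names are illustrative, not Speak2's actual API.
enum SpeechModel: String, CaseIterable {
    case whisperBaseEn = "Whisper (base.en)"
    case parakeetV3 = "Parakeet v3"
}

final class ModelSlot {
    private(set) var downloaded: Set<SpeechModel> = []
    private(set) var loaded: SpeechModel?

    func markDownloaded(_ model: SpeechModel) {
        downloaded.insert(model)
    }

    /// Switches to `model` if it has been downloaded.
    /// Returns false and keeps the current model otherwise.
    @discardableResult
    func switchTo(_ model: SpeechModel) -> Bool {
        guard downloaded.contains(model) else { return false }
        loaded = nil      // release the previous model first
        loaded = model    // then load the requested one
        return true
    }
}
```

In a real app the "release" and "load" steps would free and allocate the model's memory; here they are reduced to assignments so the control flow stays visible.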
- macOS 14.0 or later
- Apple Silicon Mac (M1/M2/M3)
Download the latest .dmg from the releases page and install.
git clone https://github.com/zachswift615/speak2.git
cd speak2
swift build -c release
swift run
Or run the release binary directly:
.build/release/Speak2
On first launch, a setup window will appear. You need to:
This is required for global fn key detection.
Click "Grant" next to Accessibility on the first launch window
Then click Open System Settings
Then find Speak2 in the list, toggle its permission switch on, and authenticate with your password or fingerprint. If Speak2 is not in the list, click the + button, navigate to the Applications folder where you installed it, and add Speak2 to the list of apps.
Option A: Add Speak2 directly
- Open System Settings > Privacy & Security > Accessibility
- Click the + button
- Press Cmd+Shift+G and paste:
~/.build/release/Speak2 (or wherever you built it)
- Select the Speak2 executable and enable it
Option B: Enable Terminal (easier for development)
- Open System Settings > Privacy & Security > Accessibility
- Find Terminal in the list and toggle it ON
- This allows any app run from Terminal to use accessibility features
Click "Grant" next to Microphone, then click "Allow" in the permission dialog that appears.
Choose a model and click "Download":
- Whisper (base.en) - ~140 MB, English only, faster
- Parakeet v3 - ~600 MB, 25 languages, best for multilingual users
Note: Parakeet takes longer to load initially (~20-30 seconds) as it compiles the neural engine model. Subsequent loads are faster. The menu bar icon will show a spinning indicator while loading.
Once all three items show checkmarks, the setup window will indicate completion and you can close it.
- Hold the fn key - Recording starts (menu bar icon turns red)
- Speak - Say what you want to type
- Release fn key - Transcription happens (icon shows spinner), then text is pasted
The transcribed text is automatically pasted into whatever application text field has focus.
Speak2 runs as a menu bar app (no dock icon). Look for the microphone icon:
- White/Black (depending on macOS theme) - Idle, ready to record
- Yellow spinning arrows - Loading model
- Red mic - Recording in progress
- Cyan spinner - Transcribing
The menu shows a status line at the top indicating the current state (e.g., "Ready – Whisper (base.en)").
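The icon states listed above map naturally onto a small enum. The following is a minimal sketch with hypothetical names (`AppState`, `iconDescription`, `statusLine`). Only the icon descriptions and the "Ready" status wording come from this README; the other status strings are assumptions made for illustration.

```swift
// Hypothetical sketch: the enum and function names are illustrative.
enum AppState {
    case idle, loadingModel, recording, transcribing
}

/// Icon appearance for each state, as listed in this README.
func iconDescription(for state: AppState) -> String {
    switch state {
    case .idle:         return "white/black mic"
    case .loadingModel: return "yellow spinning arrows"
    case .recording:    return "red mic"
    case .transcribing: return "cyan spinner"
    }
}

/// Status line for the top of the menu.
/// Only the "Ready" wording is documented; the rest is assumed.
func statusLine(for state: AppState, modelName: String) -> String {
    switch state {
    case .idle:         return "Ready – \(modelName)"
    case .loadingModel: return "Loading – \(modelName)"   // assumed wording
    case .recording:    return "Recording"                // assumed wording
    case .transcribing: return "Transcribing"             // assumed wording
    }
}
```

Because both functions switch exhaustively over the enum, the compiler flags any state added later that lacks an icon or status string.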
Click the menu bar icon and select Model to switch between downloaded models. Models not yet downloaded show a ↓ indicator - clicking them opens the setup window to download.
Click Manage Models... to open the setup window where you can download additional models or delete existing ones to free up disk space.
You can choose from several hotkey options. Sometimes external keyboards don't send the function key reliably. In that case, you can choose one of the other options from the menu.
You can choose to have Speak2 launch at login. When enabled, a checkmark appears beside this option. Click it again to remove Speak2 from the list of startup apps.
Click the menu bar icon and click "Quit Speak2".
- HotkeyManager - Detects hotkey press/release using CGEvent tap
- AudioRecorder - Captures microphone audio at 16kHz mono PCM
- ModelManager - Handles model downloading, loading, and switching
- WhisperTranscriber - Runs WhisperKit on-device for speech-to-text
- ParakeetTranscriber - Runs FluidAudio/Parakeet on-device for speech-to-text
- TextInjector - Copies transcription to clipboard and simulates Cmd+V to paste
The selected model stays loaded in memory (~300-600MB RAM depending on model) for instant transcription.
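The way these components hand off to each other can be sketched as a small state machine. Everything below is hypothetical: `Transcriber`, `TextSink`, and `DictationSession` are illustrative stand-ins for the components listed above, not Speak2's real interfaces, and the flow is synchronous for brevity where a real app would likely run transcription off the main thread.

```swift
// Hypothetical sketch: protocol and type names are illustrative stand-ins
// for the components listed above, not Speak2's real interfaces.
protocol Transcriber {
    func transcribe(_ samples: [Float]) -> String
}

protocol TextSink {
    func insert(_ text: String)
}

final class DictationSession {
    enum State { case idle, recording, transcribing }

    private(set) var state: State = .idle
    private var samples: [Float] = []
    private let transcriber: Transcriber
    private let sink: TextSink

    init(transcriber: Transcriber, sink: TextSink) {
        self.transcriber = transcriber
        self.sink = sink
    }

    /// Hotkey pressed: start collecting audio.
    func hotkeyDown() {
        guard state == .idle else { return }
        samples.removeAll()
        state = .recording
    }

    /// Called by the recorder with each chunk of mono audio samples.
    /// Chunks that arrive while not recording are ignored.
    func append(_ chunk: [Float]) {
        guard state == .recording else { return }
        samples.append(contentsOf: chunk)
    }

    /// Hotkey released: transcribe, then pass non-empty text to the sink.
    func hotkeyUp() {
        guard state == .recording else { return }
        state = .transcribing
        let text = transcriber.transcribe(samples)
        if !text.isEmpty { sink.insert(text) }
        state = .idle
    }
}
```

Keeping the transcriber and the text sink behind protocols is one way to let either speech model, or a test double, plug into the same press, record, release, paste flow.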
- Speak naturally and let your intonation carry the punctuation - Whisper adds periods, commas, and question marks based on your tone
- Keep recordings under 30 seconds for best performance
- First transcription may be slightly slower as the model warms up
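To put the 30-second tip in perspective, the arithmetic below estimates how much audio a recording holds at the 16 kHz mono format mentioned in this README. The function names are hypothetical, and the 32-bit float sample format is an assumption, since the README only states 16 kHz mono PCM.

```swift
// Assumption: samples are stored as 32-bit floats. The README only
// states 16 kHz mono PCM, not the sample format.

/// Number of mono samples in a recording of the given length.
func sampleCount(seconds: Int, sampleRate: Int = 16_000) -> Int {
    seconds * sampleRate
}

/// Approximate buffer size in bytes for a recording of the given length.
func bufferBytes(seconds: Int, sampleRate: Int = 16_000) -> Int {
    sampleCount(seconds: seconds, sampleRate: sampleRate) * MemoryLayout<Float>.size
}
```

Under these assumptions a 30-second recording is 480,000 samples, or about 1.9 MB, so the audio buffer is small next to the memory used by the model itself.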
- Parakeet model takes ~20-30 seconds to load on first use (compiling neural engine model)
- Uses clipboard for text injection (temporarily overwrites clipboard contents)
- fn key detection requires Accessibility permission
- Only tested on Apple Silicon Macs
- Swift + SwiftUI
- WhisperKit - Argmax's Whisper implementation, optimized for Apple Silicon
- FluidAudio - Parakeet speech recognition for Apple Silicon
- AVFoundation for audio capture
- CGEvent for global hotkey detection
MIT