Speak2

Local voice dictation for macOS. Hold the fn key (configurable) to speak, release to transcribe. Works with any application.

100% on-device using WhisperKit or Parakeet - no cloud services, no data leaves your Mac.

Speech Recognition Models

Speak2 supports two speech recognition models:

| Model | Size | Languages | Best for |
|---|---|---|---|
| Whisper (base.en) | ~140 MB | English only | Fast, accurate English transcription |
| Parakeet v3 | ~600 MB | 25 languages | Multilingual users |

You can download both and switch between them from the menu bar. Only one model is loaded at a time to conserve memory.
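As a rough illustration of the "one model loaded at a time" behavior, here is a minimal sketch. The type names (`SpeechModel`, `ModelSlot`) are hypothetical and not taken from the Speak2 source; the sizes and language counts are the approximate figures listed above.

```swift
// Hypothetical types for illustration; not the actual Speak2 implementation.
enum SpeechModel: String, CaseIterable {
    case whisperBaseEn = "Whisper (base.en)"
    case parakeetV3 = "Parakeet v3"

    // Approximate download size in MB, as listed in the README.
    var approximateSizeMB: Int {
        switch self {
        case .whisperBaseEn: return 140
        case .parakeetV3: return 600
        }
    }

    // Number of supported languages, as listed in the README.
    var languageCount: Int {
        switch self {
        case .whisperBaseEn: return 1
        case .parakeetV3: return 25
        }
    }
}

// Tracks which models are downloaded and keeps at most one of them loaded.
final class ModelSlot {
    private(set) var loaded: SpeechModel?
    private(set) var downloaded: Set<SpeechModel> = []

    func markDownloaded(_ model: SpeechModel) {
        downloaded.insert(model)
    }

    // Returns false when the model has not been downloaded yet.
    @discardableResult
    func select(_ model: SpeechModel) -> Bool {
        guard downloaded.contains(model) else { return false }
        loaded = model  // replaces any previously loaded model
        return true
    }
}

let slot = ModelSlot()
slot.markDownloaded(.whisperBaseEn)
slot.select(.whisperBaseEn)
print(slot.loaded?.rawValue ?? "none")  // prints "Whisper (base.en)"
```

Because `loaded` is a single optional value, selecting a second model necessarily releases the first, which is the memory-saving behavior described above.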

Requirements

  • macOS 14.0 or later
  • Apple Silicon Mac (M1 or later)

Installation

From DMG (recommended)

Download the latest .dmg from the releases page and install.

Build from source

git clone https://github.com/zachswift615/speak2.git
cd speak2
swift build -c release

Run

swift run

Or run the release binary directly:

.build/release/Speak2

First Launch Setup

On first launch, a setup window will appear. You need to:

1. Grant Accessibility Permission

This is required for global fn key detection.

DMG installs

  1. Click "Grant" next to Accessibility in the first-launch window.
  2. Click Open System Settings.
  3. Find Speak2 in the list, toggle the permission switch on, and authenticate with your password or fingerprint. If Speak2 is not in the list, click the + button, navigate to the Applications folder where you installed it, and add Speak2 to the list of apps.

Building from source

Option A: Add Speak2 directly

  1. Open System Settings > Privacy & Security > Accessibility
  2. Click the + button
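  3. Press Cmd+Shift+G and paste the path to the binary you built. `swift build` writes it to .build/release/Speak2 inside the cloned speak2 folder, so the path is ~/speak2/.build/release/Speak2 if you cloned into your home directory.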
  4. Select the Speak2 executable and enable it

Option B: Enable Terminal (easier for development)

  1. Open System Settings > Privacy & Security > Accessibility
  2. Find Terminal in the list and toggle it ON
  3. This allows any app run from Terminal to use accessibility features

2. Grant Microphone Permission

Click "Grant" next to Microphone, then click "Allow" in the permission dialog that appears.

3. Download Speech Model

Choose a model and click "Download":

  • Whisper (base.en) - ~140MB, English only, faster
  • Parakeet v3 - ~600MB, 25 languages, best for multilingual users

Note: Parakeet takes longer to load initially (~20-30 seconds) as it compiles the neural engine model. Subsequent loads are faster. The menu bar icon will show a spinning indicator while loading.

Once all three items show checkmarks, the setup window will indicate completion and you can close it.

Usage

  1. Hold the fn key - Recording starts (menu bar icon turns red)
  2. Speak - Say what you want to type
  3. Release fn key - Transcription happens (icon shows spinner), then text is pasted

The transcribed text is automatically pasted into whichever text field currently has focus, in any application.

Menu Bar

Speak2 runs as a menu bar app (no dock icon). Look for the microphone icon:

  • White/Black (depending on macOS theme) - Idle, ready to record
  • Yellow spinning arrows - Loading model
  • Red mic - Recording in progress
  • Cyan spinner - Transcribing

The menu shows a status line at the top indicating the current state (e.g., "Ready – Whisper (base.en)").
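The states above, together with the hold/release flow from the Usage section, can be read as a small state machine. The sketch below is illustrative only: the type and function names are hypothetical, and every status label except "Ready" is an assumption, since the README only shows the "Ready – Whisper (base.en)" example.

```swift
// Hypothetical types; the four states mirror the menu bar icon states above.
enum DictationState: Equatable {
    case loadingModel   // yellow spinning arrows
    case idle           // white/black mic
    case recording      // red mic
    case transcribing   // cyan spinner
}

enum DictationEvent {
    case modelLoaded
    case hotkeyPressed
    case hotkeyReleased
    case transcriptionFinished
}

// Returns the next state, or the current state when the event does not apply
// (for example, a hotkey press while the model is still loading).
func nextState(from state: DictationState, on event: DictationEvent) -> DictationState {
    switch (state, event) {
    case (.loadingModel, .modelLoaded): return .idle
    case (.idle, .hotkeyPressed): return .recording
    case (.recording, .hotkeyReleased): return .transcribing
    case (.transcribing, .transcriptionFinished): return .idle
    default: return state
    }
}

// Only the "Ready" label is taken from the README; the others are assumed.
func statusLine(for state: DictationState, modelName: String) -> String {
    let label: String
    switch state {
    case .loadingModel: label = "Loading"
    case .idle: label = "Ready"
    case .recording: label = "Recording"
    case .transcribing: label = "Transcribing"
    }
    return "\(label) – \(modelName)"
}

var state = DictationState.loadingModel
let events: [DictationEvent] = [.modelLoaded, .hotkeyPressed, .hotkeyReleased, .transcriptionFinished]
for event in events {
    state = nextState(from: state, on: event)
}
print(statusLine(for: state, modelName: "Whisper (base.en)"))  // prints "Ready – Whisper (base.en)"
```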

Switching Models

Click the menu bar icon and select Model to switch between downloaded models. Models not yet downloaded show a ↓ indicator - clicking them opens the setup window to download.

Manage Models

Click Manage Models... to open the setup window where you can download additional models or delete existing ones to free up disk space.

Choosing Hotkey

You can choose from several hotkey options in the menu. This is useful when an external keyboard does not send the fn key reliably: pick one of the other options instead.

Launch at Login

You can have Speak2 launch at login. When enabled, a checkmark appears beside this option; click it again to remove Speak2 from the list of startup apps. Enabling the option shows the following:

[Screenshot: what appears after enabling Launch at Login]

Quit Speak2

Click the menu bar icon and click "Quit Speak2".

How It Works

  • HotkeyManager - Detects hotkey press/release using CGEvent tap
  • AudioRecorder - Captures microphone audio at 16kHz mono PCM
  • ModelManager - Handles model downloading, loading, and switching
  • WhisperTranscriber - Runs WhisperKit on-device for speech-to-text
  • ParakeetTranscriber - Runs FluidAudio/Parakeet on-device for speech-to-text
  • TextInjector - Copies transcription to clipboard and simulates Cmd+V to paste

The selected model stays loaded in memory (~300-600MB RAM depending on model) for instant transcription.
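To make the division of labor concrete, here is a minimal sketch of how a recording might flow from a transcriber to a text sink. The README names the components but not their interfaces, so the protocols and the `DictationPipeline` type below are assumptions, and the transcriber is a stub rather than WhisperKit or FluidAudio.

```swift
// Hypothetical interfaces; the real Speak2 components may look different.
protocol Transcriber {
    func transcribe(samples: [Float]) -> String
}

protocol TextSink {
    func insert(_ text: String)
}

// Stand-in for WhisperTranscriber / ParakeetTranscriber.
struct StubTranscriber: Transcriber {
    func transcribe(samples: [Float]) -> String {
        "heard \(samples.count) samples"
    }
}

// Stand-in for TextInjector: collects text instead of pasting it.
final class BufferSink: TextSink {
    private(set) var received: [String] = []
    func insert(_ text: String) { received.append(text) }
}

struct DictationPipeline {
    let transcriber: Transcriber
    let sink: TextSink

    // Called with the captured audio once the hotkey is released.
    func handleRecording(_ samples: [Float]) {
        guard !samples.isEmpty else { return }  // nothing recorded, nothing to paste
        sink.insert(transcriber.transcribe(samples: samples))
    }
}

let sink = BufferSink()
let pipeline = DictationPipeline(transcriber: StubTranscriber(), sink: sink)
pipeline.handleRecording([Float](repeating: 0, count: 16_000))  // one second at 16 kHz
print(sink.received)  // prints ["heard 16000 samples"]
```

Hiding both engines behind one protocol is one way to let the app swap Whisper for Parakeet without touching the recording or pasting code.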

Tips

  • Speak naturally and let your inflection carry the punctuation - Whisper infers periods, commas, and question marks from your tone
  • Keep recordings under 30 seconds for best performance
  • First transcription may be slightly slower as the model warms up
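For a sense of scale, the arithmetic below works out how much audio a 30-second recording holds at the 16 kHz mono rate mentioned under How It Works. The README says "PCM" without giving a sample format, so the 32-bit float sample size is an assumption; 16-bit samples would halve the byte count.

```swift
// 16 kHz mono comes from the README; the Float sample format is an assumption.
let sampleRateHz = 16_000

func sampleCount(seconds: Int, sampleRate: Int = sampleRateHz) -> Int {
    seconds * sampleRate
}

func bufferBytes(seconds: Int, bytesPerSample: Int = MemoryLayout<Float>.size) -> Int {
    sampleCount(seconds: seconds) * bytesPerSample
}

print(sampleCount(seconds: 30))  // prints 480000
print(bufferBytes(seconds: 30))  // prints 1920000 (about 1.9 MB)
```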

Known Limitations

  • Parakeet model takes ~20-30 seconds to load on first use (compiling neural engine model)
  • Uses clipboard for text injection (temporarily overwrites clipboard contents)
  • fn key detection requires Accessibility permission
  • Only tested on Apple Silicon Macs
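The clipboard limitation follows from the paste-based approach. The sketch below shows one way such an injector could save and restore the clipboard around a paste; it is not the actual TextInjector. The `Clipboard` protocol, `injectViaClipboard`, `SystemClipboard`, and `postCommandV` are hypothetical names, and whether Speak2 restores the previous contents at all is an assumption.

```swift
#if canImport(AppKit)
import AppKit
#endif
import Foundation

// Hypothetical abstraction so the logic can be exercised without a real pasteboard.
protocol Clipboard: AnyObject {
    var text: String? { get set }
}

final class InMemoryClipboard: Clipboard {
    var text: String?
    init(text: String? = nil) { self.text = text }
}

// Puts `transcript` on the clipboard, runs `paste`, then restores what was there.
func injectViaClipboard(_ transcript: String, clipboard: Clipboard, paste: () -> Void) {
    let previous = clipboard.text
    clipboard.text = transcript
    paste()
    clipboard.text = previous
}

#if canImport(AppKit)
// macOS-backed clipboard using the general pasteboard (plain text only).
final class SystemClipboard: Clipboard {
    var text: String? {
        get { NSPasteboard.general.string(forType: .string) }
        set {
            NSPasteboard.general.clearContents()
            if let value = newValue {
                NSPasteboard.general.setString(value, forType: .string)
            }
        }
    }
}

// Simulates Cmd+V by posting key-down and key-up events for the V key.
func postCommandV() {
    let source = CGEventSource(stateID: .combinedSessionState)
    let vKey: CGKeyCode = 9  // kVK_ANSI_V
    for keyDown in [true, false] {
        let event = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: keyDown)
        event?.flags = .maskCommand
        event?.post(tap: .cghidEventTap)
    }
}
#endif

let board = InMemoryClipboard(text: "old contents")
var pasted: String?
injectViaClipboard("hello world", clipboard: board) { pasted = board.text }
print(pasted ?? "nil", "|", board.text ?? "nil")  // prints "hello world | old contents"
```

With a real pasteboard the target app reads the clipboard asynchronously after the simulated keystroke, so the restore step would need a short delay rather than running immediately as it does here. This version also preserves only plain text, not images or rich content.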

Tech Stack

  • Swift + SwiftUI
  • WhisperKit - Whisper implementation optimized for Apple Silicon (by Argmax)
  • FluidAudio - Parakeet speech recognition for Apple Silicon
  • AVFoundation for audio capture
  • CGEvent for global hotkey detection
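For readers unfamiliar with CGEvent taps, the sketch below shows roughly how a global modifier-key listener can be set up with Core Graphics. It is not the HotkeyManager source: `isFnDown` and `installFnTap` are hypothetical names, and the listen-only tap option is an assumption, since the README does not say which options the app uses.

```swift
import CoreGraphics
import Foundation

// Hypothetical helper: true while the fn modifier is part of the flags.
func isFnDown(_ flags: CGEventFlags) -> Bool {
    flags.contains(.maskSecondaryFn)
}

// Installs a listen-only tap for modifier changes on the current run loop.
// Returns nil if the tap cannot be created, which typically happens when the
// process has not been granted the required privacy permission.
func installFnTap() -> CFMachPort? {
    let mask = CGEventMask(1) << CGEventMask(CGEventType.flagsChanged.rawValue)

    // The callback is a C function pointer, so it must not capture local state.
    let callback: CGEventTapCallBack = { _, type, event, _ in
        if type == .flagsChanged {
            print(isFnDown(event.flags) ? "fn held" : "fn not held")
        }
        return Unmanaged.passUnretained(event)
    }

    guard let tap = CGEvent.tapCreate(
        tap: .cgSessionEventTap,
        place: .headInsertEventTap,
        options: .listenOnly,
        eventsOfInterest: mask,
        callback: callback,
        userInfo: nil
    ) else {
        return nil
    }

    let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetCurrent(), source, .commonModes)
    CGEvent.tapEnable(tap: tap, enable: true)
    return tap
}
```

To try it in a command-line tool, call `installFnTap()` and then `CFRunLoopRun()` to keep the run loop alive. A real hotkey manager would also remember the previous fn state, because `flagsChanged` fires for every modifier key, not only fn.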

License

MIT
