A macOS CLI for real-time speech recognition with CoreML acceleration, based on whisper.cpp's stream example.
MIT License - see LICENSE file for details.
- Real-time speech transcription from microphone with low latency
- CoreML acceleration for optimal performance on Apple Silicon Macs
- Metal GPU backend support for enhanced processing
- Voice Activity Detection (VAD) for efficient real-time processing
- Comprehensive model management with automatic downloads and storage optimization
- Multi-format export system supporting TXT, Markdown, JSON, CSV, SRT, VTT, XML
- Auto-copy functionality with automatic clipboard integration
- Multi-language speech transcription with bilingual output support (original + English translation)
- Advanced configuration system with JSON files, environment variables, and CLI options
- Professional subtitle generation in SRT and VTT formats
- Session metadata tracking with detailed performance metrics
- AI-powered meeting organization with Claude CLI integration for structured meeting summaries
- macOS 10.15 or later
- SDL2 library (`brew install sdl2`)
- CMake (`brew install cmake`)
- Models are downloaded automatically when needed
- Claude CLI (https://claude.ai/code) for AI-powered meeting transcription organization
- Meeting mode works without Claude CLI but falls back to raw transcription
```shell
make install-deps && make build
```

```shell
# Using build script
./build.sh

# Manual build
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DWHISPER_COREML=ON -DGGML_USE_METAL=ON
make -j$(sysctl -n hw.ncpu)
```

```shell
make help            # Show all available commands

# Build Commands
make build           # Full build (configure + compile)
make rebuild         # Quick rebuild (skip configure)
make clean           # Remove build artifacts
make fresh           # Clean + build

# Dependencies
make check-deps      # Check if dependencies are installed
make install-deps    # Install dependencies via Homebrew

# Run Commands
make run                       # Interactive model selection
make run-model MODEL=base.en   # Run with specific model
make run-vad                   # Run VAD mode (recommended)
make list-models               # Show available models

# Model Management
make list-downloaded   # Show downloaded models with details
make show-storage      # Show storage usage summary
make cleanup-models    # Remove orphaned model files

# Export Examples
make run-export-txt    # Transcribe with text export
make run-export-md     # Transcribe with Markdown export
make run-export-json   # Transcribe with JSON export

# Configuration
make config-list                      # Show current configuration
make config-set KEY=key VALUE=value   # Set configuration
make config-get KEY=key               # Get configuration
make config-reset                     # Reset to defaults

# Installation
make install        # Install system-wide (/usr/local/bin)
make install-user   # Install for current user (~/bin)
make uninstall      # Remove system installation
make package        # Create distribution package

# Development
make test           # Test basic functionality
make stop           # Stop all running dev apps
```
```shell
make run
# The CLI will guide you through model selection and download

make run-model MODEL=base.en
# Downloads base.en model automatically if not present
```

```shell
make list-models       # Show all available models for download
make list-downloaded   # Show downloaded models with details
make show-storage      # Show storage usage and cleanup suggestions
```

```shell
# Delete specific model
recognize --delete-model base.en

# Delete all downloaded models
recognize --delete-all-models

# Cleanup orphaned files
recognize --cleanup
```

```shell
recognize -m base.en --step 0 --length 30000 -vth 0.6
```

```shell
recognize -m base.en --step 500 --length 5000
```

```shell
recognize -m base.en --coreml      # Enable CoreML (default)
recognize -m base.en --no-coreml   # Disable CoreML
```

```shell
# Export to text file (auto-generated filename)
recognize -m base.en --export --export-format txt

# Export to Markdown with custom filename
recognize -m base.en --export --export-format md --export-file meeting.md

# Export to JSON with confidence scores
recognize -m base.en --export --export-format json --export-include-confidence

# Export to SRT subtitle file
recognize -m base.en --export --export-format srt

# Export with all metadata and timestamps
recognize -m base.en --export --export-format json

# Export without metadata (clean output)
recognize -m base.en --export --export-format txt --export-no-metadata --export-no-timestamps
```

```shell
# Basic meeting transcription with AI organization
recognize --meeting

# Custom prompt file (advanced usage)
recognize --meeting --prompt custom_prompt.txt

# Meeting with specific model and output language
recognize --meeting --output-mode english -m base.en

# Meeting with speaker segmentation
recognize --meeting --tinydiarize -m small.en-tdrz
```

Meeting Organization Features:
- Automatic AI Processing: Raw transcription is processed by Claude CLI when recording ends
- Structured Output: Generates professional meeting summaries with action items, decisions, and metadata
- Smart Fallback: If Claude CLI unavailable, saves raw transcription to same date-based file
- Date-Based Naming: Always saves to `[YYYY]-[MM]-[DD].md` with automatic numeric suffix if the file exists
- Original Content Preserved: On success, the raw transcription is wrapped in HTML comments `<!-- -->` in the output
- Default Prompt: Comprehensive meeting organization template included
- Integration: Works with all existing features (export, auto-copy, speaker segmentation)
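The date-based naming rule above can be sketched in shell. This is an illustrative sketch only: the `next_meeting_file` helper and the exact `-N` suffix format are assumptions, not the CLI's actual implementation.

```shell
# Save to YYYY-MM-DD.md, adding a numeric suffix when a file for
# today already exists. The "-1", "-2", ... suffix format is an
# assumption; the CLI documents only "automatic numeric suffix".
next_meeting_file() {
  base="$(date +%F)"      # e.g. 2024-05-01
  name="$base.md"
  n=1
  while [ -e "$name" ]; do
    name="$base-$n.md"
    n=$((n + 1))
  done
  printf '%s\n' "$name"
}
```

Calling `next_meeting_file` twice in a row (creating the returned file in between) yields today's `YYYY-MM-DD.md` first, then `YYYY-MM-DD-1.md`.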
- TXT: Plain text with optional timestamps and metadata
- Markdown: Formatted document with tables and styling
- JSON: Structured data with segments, metadata, and confidence scores
- CSV: Spreadsheet-compatible format with segment timing
- SRT: Standard subtitle format for video players
- VTT: WebVTT subtitle format for web players
- XML: Structured markup with complete session details
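To illustrate the subtitle formats, the timing math can be sketched as a small helper (hypothetical, not part of the CLI) that renders a millisecond offset in SRT's `HH:MM:SS,mmm` form; WebVTT uses the same fields with a dot before the milliseconds.

```shell
# Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
srt_time() {
  ms=$1
  printf '%02d:%02d:%02d,%03d\n' \
    $(( ms / 3600000 )) \
    $(( (ms % 3600000) / 60000 )) \
    $(( (ms % 60000) / 1000 )) \
    $(( ms % 1000 ))
}

srt_time 62500   # prints 00:01:02,500
```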
- `-h, --help` - Show help message
- `-m, --model` - Model name (e.g., base.en, tiny.en) or file path
- `-l, --language` - Source language (default: en)
- `-t, --threads` - Number of threads (default: 4)
- `--list-models` - List all available models for download
- `--list-downloaded` - Show downloaded models with sizes and paths
- `--show-storage` - Show detailed storage usage breakdown
- `--delete-model MODEL` - Delete a specific model
- `--delete-all-models` - Delete all downloaded models
- `--cleanup` - Remove orphaned model files
- `--export` - Enable transcription export when session ends
- `--export-format FORMAT` - Export format: txt, md, json, csv, srt, vtt, xml
- `--export-file FILE` - Export to specific file (default: auto-generated)
- `--export-auto-filename` - Generate automatic filename with timestamp
- `--export-no-metadata` - Exclude session metadata from export
- `--export-no-timestamps` - Exclude timestamps from export
- `--export-include-confidence` - Include confidence scores in export
- `--auto-copy` - Automatically copy transcription to clipboard when session ends
- `--auto-copy-max-duration N` - Max session duration in hours before skipping auto-copy
- `--auto-copy-max-size N` - Max transcription size in bytes before skipping auto-copy
- `--meeting` - Enable meeting transcription mode with AI organization (saves to `[YYYY]-[MM]-[DD].md`)
- `--prompt TEXT` - Custom prompt for meeting organization (uses comprehensive default if not provided)
- `--name PATH` - (Deprecated) Meeting mode always uses date-based naming `[YYYY]-[MM]-[DD].md` with numeric suffix
- `-c, --capture` - Audio capture device ID (default: -1 for default device)
- `--step` - Audio step size in ms (default: 3000, 0 for VAD mode)
- `--length` - Audio length in ms (default: 10000)
- `--keep` - Audio to keep from previous step in ms (default: 200)
- `-tr, --translate` - Translate to English
- `-vth, --vad-thold` - VAD threshold (default: 0.6)
- `-fth, --freq-thold` - High-pass frequency cutoff (default: 100.0)
- `-bs, --beam-size` - Beam search size (default: -1)
- `-mt, --max-tokens` - Max tokens per chunk (default: 32)
- `--coreml` - Enable CoreML acceleration (default: enabled)
- `--no-coreml` - Disable CoreML acceleration
- `-cm, --coreml-model` - Specific CoreML model path
- `-tdrz, --tinydiarize` - Enable speaker segmentation (requires tdrz model)
- Speaker segmentation detects when different people are speaking and marks speaker turns
- Requires models with the `tdrz` suffix (e.g., `ggml-small.en-tdrz.bin`)
- Currently supports English only with small.en models
- Output includes `[SPEAKER_TURN]` markers when speakers change
- `-f, --file` - Output transcription to file
- `-om, --output-mode` - Output mode: original, english, bilingual (default: original)
- `-sa, --save-audio` - Save recorded audio to WAV file
- `--no-timestamps` - Disable timestamp output (auto in continuous mode)
- `-ps, --print-special` - Print special tokens
The CLI supports a comprehensive configuration system with multiple layers:
- Command-line arguments (highest priority)
- Environment variables
- Project config file (`.whisper-config.json` or `config.json`)
- User config file (`~/.recognize/config.json`)
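The lookup order can be pictured with a small illustrative helper; the `resolve` function and its default argument are assumptions, and config-file parsing is elided for brevity:

```shell
# Return the first value found, walking the layers in priority order:
# CLI argument, then WHISPER_* environment variable, then a default
# standing in for the config-file layers.
resolve() {
  key=$1; cli_value=$2; default_value=$3
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
    return
  fi
  env_name="WHISPER_$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
  env_value=$(printenv "$env_name" || :)
  if [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
    return
  fi
  printf '%s\n' "$default_value"
}
```

For example, with `WHISPER_THREADS=6` exported, `resolve threads "" 4` prints `6`, while `resolve threads 8 4` prints `8` because the CLI argument wins.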
```shell
# Show current configuration (including system paths)
recognize config list

# Set configuration values
recognize config set model base.en
recognize config set threads 8
recognize config set use_coreml true
recognize config set models_dir /custom/path/to/models

# Get configuration values
recognize config get model
recognize config get threads

# Remove configuration values
recognize config unset model

# Reset all configuration to defaults
recognize config reset
```

```shell
# Configuration management via Makefile
make config-list
make config-set KEY=model VALUE=base.en
make config-get KEY=threads
make config-reset
```

All configuration options can be set via environment variables with the `WHISPER_` prefix:
```shell
export WHISPER_MODEL=base.en
export WHISPER_MODELS_DIR=/custom/path/to/models
export WHISPER_THREADS=8
export WHISPER_COREML=true
export WHISPER_VAD_THRESHOLD=0.7
export WHISPER_STEP_MS=3000
export WHISPER_LANGUAGE=en
export WHISPER_TINYDIARIZE=true
export WHISPER_AUTO_COPY=true
export WHISPER_AUTO_COPY_MAX_DURATION=2
export WHISPER_AUTO_COPY_MAX_SIZE=1048576
```

Configuration files use JSON format:
```json
{
  "default_model": "base.en",
  "models_directory": "/custom/path/to/models",
  "threads": 8,
  "use_coreml": true,
  "vad_threshold": 0.6,
  "step_ms": 3000,
  "length_ms": 10000,
  "language": "en",
  "translate": false,
  "save_audio": false,
  "tinydiarize": false,
  "auto_copy_enabled": true,
  "auto_copy_max_duration_hours": 2,
  "auto_copy_max_size_bytes": 1048576
}
```

- `model`/`default_model` - Default model to use
- `models_dir`/`models_directory` - Directory to store models
- `coreml`/`use_coreml` - Enable/disable CoreML acceleration
- `coreml_model` - Specific CoreML model path
- `capture`/`capture_device` - Audio capture device ID
- `step`/`step_ms` - Audio step size in milliseconds
- `length`/`length_ms` - Audio length in milliseconds
- `keep`/`keep_ms` - Audio to keep from previous step
- `vad`/`vad_threshold` - Voice activity detection threshold
- `freq`/`freq_threshold` - High-pass frequency cutoff
- `threads` - Number of processing threads
- `tokens`/`max_tokens` - Maximum tokens per chunk
- `beam`/`beam_size` - Beam search size
- `language`/`lang` - Source language
- `translate` - Translate to English
- `timestamps`/`no_timestamps` - Disable timestamps
- `special`/`print_special` - Print special tokens
- `colors`/`print_colors` - Print colors based on token confidence
- `save_audio` - Save recorded audio
- `tinydiarize`/`speaker_segmentation` - Enable speaker segmentation (requires tdrz model)
- `output`/`output_file` - Output file path
- `format`/`output_format` - Output format (json, plain, timestamped)
- `mode`/`output_mode` - Output mode: original, english, bilingual
- `auto_copy`/`auto_copy_enabled` - Enable/disable automatic clipboard copy when session ends
- `auto_copy_max_duration`/`auto_copy_max_duration_hours` - Maximum session duration (hours) before skipping auto-copy (default: 2)
- `auto_copy_max_size`/`auto_copy_max_size_bytes` - Maximum transcription size (bytes) before skipping auto-copy (default: 1MB)
- `export_enabled` - Enable/disable automatic export when session ends (default: false)
- `export_format` - Default export format: txt, md, json, csv, srt, vtt, xml (default: txt)
- `export_auto_filename` - Generate automatic filename with timestamp (default: true)
- `export_include_metadata` - Include session metadata in exports (default: true)
- `export_include_timestamps` - Include timestamps in exports (default: true)
- `export_include_confidence` - Include confidence scores in exports (default: false)
- `meeting_mode` - Enable/disable meeting transcription mode (default: false)
- `meeting_prompt` - Custom prompt for meeting organization (uses comprehensive default if empty)
- `meeting_name` - (Deprecated) Meeting mode always uses date-based naming `[YYYY]-[MM]-[DD].md` with numeric suffix
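The auto-copy limits above amount to a simple gate before the clipboard copy. Here is a hedged sketch, assuming inclusive limits; the `should_auto_copy` helper is hypothetical, not the CLI's code:

```shell
# Copy only when the session is within both limits: duration in hours
# and transcription size in bytes (documented defaults: 2 hours, 1 MB).
# Whether the real check is inclusive or exclusive is an assumption.
should_auto_copy() {
  duration_hours=$1; size_bytes=$2
  max_hours=${3:-2}; max_bytes=${4:-1048576}
  [ "$duration_hours" -le "$max_hours" ] && [ "$size_bytes" -le "$max_bytes" ]
}

if should_auto_copy 1 2048; then
  echo "would copy to clipboard"   # e.g. via pbcopy on macOS
fi
```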
The CLI supports multi-language speech transcription with three output modes for seamless translation workflows:
- `original` - Transcribe in the original spoken language only (default)
- `english` - Translate everything to English only
- `bilingual` - Show both original language and English translation side by side
```shell
# Bilingual Chinese-English transcription
recognize -m medium --output-mode bilingual -l zh

# Japanese to English translation only
recognize -m medium --output-mode english -l ja

# Spanish transcription in original language
recognize -m medium --output-mode original -l es

# Set bilingual as default
recognize config set output_mode bilingual
recognize config set language zh
recognize -m medium   # Uses configured defaults
```

Bilingual Mode (with timestamps):

```
[00:01.000 --> 00:02.500] zh: 你好世界
[00:01.000 --> 00:02.500] en: Hello World
[00:02.500 --> 00:04.000] zh: 这是一个测试
[00:02.500 --> 00:04.000] en: This is a test
```

Bilingual Mode (plain text):

```
zh: 你好世界
en: Hello World
zh: 这是一个测试
en: This is a test
```

English-only Mode:

```
[00:01.000 --> 00:02.500] en: Hello World
[00:02.500 --> 00:04.000] en: This is a test
```
- Multilingual models required: Use models without the `.en` suffix (e.g., `base`, `medium`, `large-v3`)
- Source language specification: Use `-l` or `--language` with the appropriate language code (e.g., `zh`, `es`, `fr`, `ja`)
- Two-pass processing: Bilingual mode performs both transcription and translation for optimal accuracy
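The bilingual output above can be pictured as interleaving the segments of the two passes. A minimal sketch, using line-aligned files as a stand-in for the real per-segment pipeline (the `bilingual_merge` helper is illustrative, not the CLI's implementation):

```shell
# Interleave the original-language pass with the English pass,
# one line per segment in each file.
bilingual_merge() {
  # $1: file of original-language lines, $2: file of English lines
  paste -d '\n' "$1" "$2"
}

dir=$(mktemp -d)
printf 'zh: 你好世界\nzh: 这是一个测试\n' > "$dir/original.txt"
printf 'en: Hello World\nen: This is a test\n' > "$dir/english.txt"
bilingual_merge "$dir/original.txt" "$dir/english.txt"
```

This prints the four lines alternating zh/en, matching the plain-text bilingual sample shown earlier.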
All Whisper-supported languages work with the multi-language features:
- Chinese (`zh`), Japanese (`ja`), Korean (`ko`)
- Spanish (`es`), French (`fr`), German (`de`), Italian (`it`)
- Russian (`ru`), Arabic (`ar`), Hindi (`hi`)
- And 90+ more languages
- Bilingual mode: Approximately 2x processing time (runs two inference passes)
- English/Original modes: Standard processing time (single inference pass)
- Model recommendations: `medium` or `large-v3` for best translation quality
- Use CoreML: Enabled by default for best performance on Apple Silicon
- VAD Mode: Use `--step 0` for efficient processing with voice detection
- Model Selection:
  - `base.en` for English only, a good balance of speed and accuracy
  - `tiny.en` for fastest processing
  - `small.en` for better accuracy than tiny
- Thread Count: Use `-t` to match your CPU cores for optimal performance
```shell
recognize
# 1. Shows available models
# 2. Prompts for model selection
# 3. Downloads automatically with progress
# 4. Shows usage examples
```

```shell
recognize -m base.en --step 0 --length 30000
```

```shell
recognize -m base.en --step 500 --length 5000
```

```shell
recognize -m base.en -f transcript.txt
```

```shell
# Chinese with English translation (side by side)
recognize -m base --output-mode bilingual -l zh

# Spanish to English translation only
recognize -m base --output-mode english -l es

# Traditional translate flag (compatibility)
recognize -m base -l es --translate
```

```shell
recognize -m tiny.en --step 500
```

```shell
# Enable auto-copy with default settings (2 hours max, 1MB max)
recognize -m base.en --auto-copy

# Enable auto-copy with custom limits
recognize -m base.en --auto-copy --auto-copy-max-duration 1 --auto-copy-max-size 500000

# Configure via environment variables
export WHISPER_AUTO_COPY=true
export WHISPER_AUTO_COPY_MAX_DURATION=3
recognize -m base.en

# Configure via config file
recognize config set auto_copy_enabled true
recognize config set auto_copy_max_duration_hours 1
recognize -m base.en
```

```shell
# List downloaded models with details
recognize --list-downloaded

# Show storage usage and get cleanup suggestions
recognize --show-storage

# Delete specific model to free space
recognize --delete-model medium.en

# Clean up orphaned files
recognize --cleanup

# Delete all models (nuclear option)
recognize --delete-all-models
```

```shell
# Enable speaker segmentation with tdrz model
recognize -m small.en-tdrz --tinydiarize

# Speaker segmentation with VAD mode for meetings
recognize -m small.en-tdrz --tinydiarize --step 0 --length 30000

# Save speaker-segmented transcription to file
recognize -m small.en-tdrz --tinydiarize -f meeting_transcript.txt

# Configure speaker segmentation as default
recognize config set tinydiarize true
recognize config set model small.en-tdrz
```

```shell
# Export meeting transcript to Markdown
recognize -m base.en --export --export-format md --export-file meeting_notes.md

# Export with confidence scores for analysis
recognize -m base.en --export --export-format json --export-include-confidence

# Generate SRT subtitles for video
recognize -m base.en --export --export-format srt --export-file video_subtitles.srt

# Quick text export with auto-naming
recognize -m base.en --export --export-format txt

# Clean CSV export for data processing
recognize -m base.en --export --export-format csv --export-no-metadata

# Configure default export settings
recognize config set export_enabled true
recognize config set export_format json
recognize config set export_include_confidence true
recognize -m base.en   # Will automatically export to JSON with confidence scores
```

```shell
# Basic meeting transcription with AI-powered organization
recognize --meeting

# Team standup with English translation and speaker segmentation
recognize --meeting --output-mode english --tinydiarize -m small.en-tdrz

# Client meeting with high-quality model and bilingual output
recognize --meeting --output-mode bilingual -m medium -l auto

# Meeting with custom prompt for specialized format
recognize --meeting --prompt ~/custom-meeting-prompt.txt

# Configure meeting mode as default
recognize config set meeting_mode true
recognize -m base.en   # Will automatically organize meetings

# Combined meeting and export
recognize --meeting --export --export-format json
```

Meeting Organization Workflow:
- Recording: Transcribe the meeting with any existing features (VAD, speaker segmentation, translation)
- Processing: When recording ends (Ctrl-C), the transcription is automatically sent to Claude CLI
- Organization: AI structures the raw transcription into a professional meeting summary
- Output: Saves to `[YYYY]-[MM]-[DD].md` with structured content and the raw transcription in HTML comments
- Fallback: If Claude CLI is unavailable, saves the raw transcription to the same date-based file
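The processing/fallback step can be sketched as follows; the `organize_meeting` wrapper and the exact `claude` invocation are illustrative assumptions, not the CLI's real integration:

```shell
# If the claude CLI is on PATH, hand it the raw transcript for
# organization; otherwise save the raw text as-is (fallback).
organize_meeting() {
  transcript=$1; outfile=$2
  if command -v claude >/dev/null 2>&1; then
    # The prompt and invocation shape here are assumptions.
    claude -p "Organize this meeting transcript" < "$transcript" > "$outfile"
  else
    cat "$transcript" > "$outfile"   # raw-transcription fallback
  fi
}
```

Either way the output lands in the same date-based file, so downstream tooling does not need to know which branch ran.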
Meeting Output Includes:
- Meeting metadata (title, date, attendees, duration)
- Executive summary with key outcomes
- Detailed discussion topics
- Action items tracker with owners and deadlines
- Key decisions log with rationale
- Open issues and follow-up requirements
- Quality improvement notes
- Original raw transcription in HTML comments `<!-- -->` (when AI processing succeeds)
The CLI automatically downloads models when needed. Available models:
English-only models:
- `tiny.en` (39 MB) - Fastest processing, lower accuracy
- `base.en` (148 MB) - Good balance of speed and accuracy
- `small.en` (488 MB) - Higher accuracy than base
- `medium.en` (1.5 GB) - Very high accuracy, slower
- `large` (3.1 GB) - Highest accuracy, slowest

Multilingual models:
- `tiny` (39 MB) - Fastest, 99 languages, lower accuracy
- `base` (148 MB) - Good balance, 99 languages
- `small` (488 MB) - Higher accuracy, 99 languages
- `medium` (1.5 GB) - Very high accuracy, 99 languages
- `large-v3` (3.1 GB) - Highest accuracy, 99 languages
View all available models:

```shell
make list-models
```

- TUTORIAL.md - Comprehensive usage guide with examples
- README.md - This file (quick reference)
- Run `make help` - Show all Makefile commands
- Ensure SDL2 is installed: `brew install sdl2`
- Verify CMake version: `cmake --version`
- Clean build: `rm -rf build && ./build.sh`
- Check microphone permissions in System Preferences > Security & Privacy
- Verify model file exists and is not corrupted
- Try different audio devices with the `-c` flag
- Adjust the VAD threshold with `-vth` if speech detection is poor
- Enable CoreML with `--coreml` (should be enabled by default)
- Use a smaller model (tiny.en vs base.en)
- Adjust thread count with `-t`
- Try VAD mode with `--step 0`