A minimal background dictation tool for Windows 11 that runs in the system tray. It uses faster-whisper for CPU-based transcription and automatically pastes transcripts into the active application.
- Python 3.11 or 3.12 (Python 3.14 is not yet fully supported by all dependencies)
- Windows 11
- PowerShell (for setup script)
# Clone the repository
git clone https://github.com/Jakedoes1111/dictate.git
cd dictate
# Run the setup script (handles everything automatically)
.\run.ps1

# Alternatively, set up manually
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python dictation_tray.py

- System Tray Only: No visible window, runs in background
- Two Modes:
  - Push-to-Talk: Hold hotkey to record, release to transcribe
  - Toggle: Press hotkey to start/stop recording
- Auto-Paste: Transcripts automatically paste into the active text field
- Configurable: Change hotkeys, model, and settings via Preferences
- Low Memory: Optimized for systems with 8 GB RAM
- Right-Alt (Alt Gr): Push-to-Talk (hold to record)
- Ctrl+Alt+D: Toggle Dictation mode
- Ctrl+Alt+Q: Quit application
- Ctrl+Alt+P: Open Preferences
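
The two modes differ only in how hotkey press and release events map to starting and stopping a recording. The following is a minimal sketch of that mapping, not code taken from `dictation_tray.py`; the `RecorderState` class and its method names are hypothetical, and the mode strings follow the `push_to_talk` / `toggle` values used for `DICTATION_MODE`.

```python
class RecorderState:
    """Hypothetical helper: decides when recording starts and stops."""

    VALID_MODES = ("push_to_talk", "toggle")

    def __init__(self, mode: str) -> None:
        if mode not in self.VALID_MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False

    def on_hotkey_press(self):
        """Return 'start', 'stop', or None for a hotkey press event."""
        if self.mode == "push_to_talk":
            if self.recording:
                # Key auto-repeat while the hotkey is held: ignore.
                return None
            self.recording = True
            return "start"
        # Toggle mode: each press flips the recording state.
        self.recording = not self.recording
        return "start" if self.recording else "stop"

    def on_hotkey_release(self):
        """Return 'stop' when push-to-talk ends, otherwise None."""
        if self.mode == "push_to_talk" and self.recording:
            self.recording = False
            return "stop"
        # Releases carry no meaning in toggle mode.
        return None
```

In a real application these two methods would be called from the global hotkey listener, with `'start'` opening the audio stream and `'stop'` handing the captured audio to the transcriber.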
Settings are stored in config.toml in the application directory. You can edit this file directly or use the Preferences dialog.
| Model | Size | RAM Usage | Speed | Accuracy |
|---|---|---|---|---|
| `tiny.en` | ~75 MB | <1 GB | ⚡ Fastest | ⭐⭐ Low |
| `base.en` | ~140 MB | <1 GB | ⚡ Fast | ⭐⭐⭐ Good |
| `small.en` | ~500 MB | <2 GB | 🐢 Medium | ⭐⭐⭐⭐ Best |
You can override settings via environment variables:
- `DICTATION_MODEL`: Model name (e.g., `base.en`)
- `DICTATION_COMPUTE_TYPE`: Compute type (e.g., `int8`)
- `DICTATION_MODE`: Mode (`push_to_talk` or `toggle`)
- `DICTATION_HOTKEY_PTT`: Push-to-Talk hotkey
- `DICTATION_HOTKEY_TOGGLE`: Toggle hotkey
- `DICTATION_HOTKEY_QUIT`: Quit hotkey
- `DICTATION_HOTKEY_PREFS`: Preferences hotkey
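
As an illustration of how such overrides can be layered on top of file-based settings, here is a minimal sketch. The environment variable names are the ones listed above; the setting keys, the default values, and the `apply_env_overrides` function are assumptions made for this example and may not match the actual `config.toml` schema or the application's code.

```python
import os

# Hypothetical defaults; the real config.toml keys and values may differ.
DEFAULTS = {
    "model": "base.en",
    "compute_type": "int8",
    "mode": "push_to_talk",
}

# Documented environment variables mapped to hypothetical setting keys.
ENV_MAP = {
    "DICTATION_MODEL": "model",
    "DICTATION_COMPUTE_TYPE": "compute_type",
    "DICTATION_MODE": "mode",
    "DICTATION_HOTKEY_PTT": "hotkey_ptt",
    "DICTATION_HOTKEY_TOGGLE": "hotkey_toggle",
    "DICTATION_HOTKEY_QUIT": "hotkey_quit",
    "DICTATION_HOTKEY_PREFS": "hotkey_prefs",
}


def apply_env_overrides(settings, environ=None):
    """Return a copy of settings with any set DICTATION_* values applied."""
    environ = os.environ if environ is None else environ
    merged = dict(settings)  # never mutate the caller's dict
    for var, key in ENV_MAP.items():
        value = environ.get(var)
        if value:  # unset or empty variables leave the setting untouched
            merged[key] = value
    if merged.get("mode") not in ("push_to_talk", "toggle"):
        raise ValueError(f"invalid mode: {merged.get('mode')!r}")
    return merged
```

Calling `apply_env_overrides(DEFAULTS)` reads from the real process environment; passing a plain dict as the second argument makes the behavior easy to check in isolation.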
- Run PowerShell as Administrator
- Check if another application is using the same hotkeys
- Restart the application after changing hotkeys
- Check internet connection
- Ensure sufficient disk space (models are 75-500 MB)
- Models are cached in `~/.cache/huggingface/` or similar
- Switch to the `tiny.en` or `base.en` model in Preferences
- Ensure no other heavy applications are running
- Close and restart the application
- Check microphone permissions in Windows Settings
- Verify microphone is connected and working
- Check audio device in Windows Sound Settings
- Transcription Speed: ~1-3 seconds for a normal sentence
- Memory Usage:
  - `tiny.en`: <1 GB peak RSS
  - `base.en`: <1 GB peak RSS
  - `small.en`: <2 GB peak RSS
- CPU: Uses CPU-only inference (no GPU required)
- Audio Format: Mono, 16 kHz, 16-bit PCM
- Transcription Engine: faster-whisper (CTranslate2)
- Beam Size: 5 (balanced accuracy/speed)
- VAD: Enabled (filters out non-speech)
- Language: English only
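
To make the audio format concrete: mono 16-bit PCM at 16 kHz means each sample is a signed 2-byte integer, and one second of audio is 32,000 bytes. The sketch below converts such a buffer to floats in the range [-1.0, 1.0) and computes its duration. It uses only the standard library, assumes little-endian samples, and is not taken from the application; the function names are hypothetical.

```python
import array
import sys

SAMPLE_RATE = 16_000   # Hz, per the audio format above
BYTES_PER_SAMPLE = 2   # 16-bit PCM, mono


def pcm16_to_floats(raw: bytes) -> list:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)      # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()      # input is assumed to be little-endian
    return [s / 32768.0 for s in samples]


def duration_seconds(raw: bytes) -> float:
    """Length of a mono 16-bit PCM buffer in seconds at 16 kHz."""
    return len(raw) / BYTES_PER_SAMPLE / SAMPLE_RATE
```

For example, the six bytes `b"\x00\x00\x00\x40\x00\x80"` decode to the samples 0, 16384, and -32768, which normalize to 0.0, 0.5, and -1.0.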
- `dictation_tray.py`: Main application with `DictationApp` class
- `requirements.txt`: Python dependencies
- `config.toml`: Configuration file (auto-generated)
- `run.ps1`: Setup and launch script
- Uses `pystray` for system tray integration
- `pynput` for global hotkey handling
- `sounddevice` for audio capture
- `faster-whisper` for transcription
- `pyperclip` for clipboard operations
- Audio shape error: Fixed by flattening 2D audio arrays to 1D
- Paste timing: Added delays and fallback methods
- Model compatibility: Uses local Python 3.12 to avoid 3.14 conflicts
See LICENSE file (if provided) or use as needed.
For issues or questions:
- Check the troubleshooting section
- Review `config.toml` settings
- Check application output in console (if running from terminal)
AI Assistant Note: This project is optimized for seamless AI assistance with comprehensive documentation, clear architecture, and well-structured code. All dependencies are specified and the setup process is fully automated.