Jakedoes1111/dictate

Dictation Tool

A minimal background dictation tool for Windows 11 that runs in the system tray. It uses faster-whisper for CPU-based transcription and automatically pastes each transcript into the active application.

🚀 Quick Start for AI Assistants

Prerequisites

  • Python 3.11 or 3.12 (Python 3.14 is not yet fully supported by all dependencies)
  • Windows 11
  • PowerShell (for setup script)
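
The version requirement can be checked before launching. The following is a minimal sketch, not part of the repository: the helper name `is_supported` is hypothetical, and treating every version other than 3.11 and 3.12 as unsupported is an assumption (the README only names 3.14 as problematic).

```python
import sys

# Assumption: only the versions this README names as supported.
SUPPORTED_VERSIONS = {(3, 11), (3, 12)}


def is_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.11 or 3.12."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    if not is_supported():
        major, minor = sys.version_info[0], sys.version_info[1]
        print(f"Warning: Python {major}.{minor} is untested; use 3.11 or 3.12.")
```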

Installation & Setup

# Clone the repository
git clone https://github.com/Jakedoes1111/dictate.git
cd dictate

# Run the setup script (handles everything automatically)
.\run.ps1

Manual Setup

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python dictation_tray.py

🎯 Features

  • System Tray Only: No visible window, runs in background
  • Two Modes:
    • Push-to-Talk: Hold hotkey to record, release to transcribe
    • Toggle: Press hotkey to start/stop recording
  • Auto-Paste: Transcripts automatically paste into the active text field
  • Configurable: Change hotkeys, model, and settings via Preferences
  • Low Memory: Optimized for systems with 8 GB RAM
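
The difference between the two modes listed above can be sketched as a small state machine. This is an illustration only: `RecorderState` and its method names are hypothetical and are not taken from dictation_tray.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecorderState:
    """Tracks whether recording is active for either mode (hypothetical helper)."""

    mode: str = "push_to_talk"  # or "toggle"
    recording: bool = False

    def on_press(self) -> Optional[str]:
        """Handle a hotkey press; return "start", "stop", or None."""
        if self.mode == "push_to_talk":
            if self.recording:
                return None  # ignore key auto-repeat while the key is held
            self.recording = True
            return "start"
        # Toggle mode: every press flips the recording state.
        self.recording = not self.recording
        return "start" if self.recording else "stop"

    def on_release(self) -> Optional[str]:
        """Handle a hotkey release; only push-to-talk reacts to it."""
        if self.mode == "push_to_talk" and self.recording:
            self.recording = False
            return "stop"
        return None
```

In push-to-talk mode the release event triggers transcription, while in toggle mode the release is ignored and the second press stops recording.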

⌨️ Default Hotkeys

  • Right-Alt (Alt Gr): Push-to-Talk (hold to record)
  • Ctrl+Alt+D: Toggle Dictation mode
  • Ctrl+Alt+Q: Quit application
  • Ctrl+Alt+P: Open Preferences
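
pynput writes hotkey combinations with modifiers and special keys in angle brackets, for example `<ctrl>+<alt>+d`. The helper below is a sketch of converting the notation used in this README into that form. The function name `to_pynput_hotkey` is hypothetical, and whether the application stores hotkeys in this notation is an assumption.

```python
def to_pynput_hotkey(combo: str) -> str:
    """Convert e.g. "Ctrl+Alt+D" into pynput's "<ctrl>+<alt>+d" notation.

    Any key name longer than one character is treated as a modifier or
    special key and wrapped in angle brackets; single characters stay bare.
    """
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    return "+".join(f"<{p}>" if len(p) > 1 else p for p in parts)
```

The resulting string can be used as a key in the mapping passed to `pynput.keyboard.GlobalHotKeys`.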

🔧 Configuration

Settings are stored in config.toml in the application directory. You can edit this file directly or use the Preferences dialog.

Available Models

Model    | Size    | RAM Usage | Speed     | Accuracy
tiny.en  | ~75 MB  | <1 GB     | ⚡ Fastest | ⭐⭐ Low
base.en  | ~140 MB | <1 GB     | ⚡ Fast    | ⭐⭐⭐ Good
small.en | ~500 MB | <2 GB     | 🐢 Medium  | ⭐⭐⭐⭐ Best
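
One way to read the table is as a lookup: pick the most accurate model whose stated RAM ceiling fits the memory you can spare. The sketch below encodes only the RAM figures from the table; the function `pick_model` is hypothetical and is not part of the application.

```python
# (model name, stated peak RAM ceiling in GB), ordered from least to most accurate.
MODELS = [("tiny.en", 1.0), ("base.en", 1.0), ("small.en", 2.0)]


def pick_model(ram_budget_gb: float) -> str:
    """Return the most accurate model whose RAM ceiling fits the budget.

    Falls back to the smallest model when nothing fits.
    """
    fitting = [name for name, peak_gb in MODELS if peak_gb <= ram_budget_gb]
    return fitting[-1] if fitting else MODELS[0][0]
```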

Environment Variables

You can override settings via environment variables:

  • DICTATION_MODEL: Model name (e.g., base.en)
  • DICTATION_COMPUTE_TYPE: Compute type (e.g., int8)
  • DICTATION_MODE: Mode (push_to_talk or toggle)
  • DICTATION_HOTKEY_PTT: Push-to-Talk hotkey
  • DICTATION_HOTKEY_TOGGLE: Toggle hotkey
  • DICTATION_HOTKEY_QUIT: Quit hotkey
  • DICTATION_HOTKEY_PREFS: Preferences hotkey
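
A minimal sketch of how these overrides could be layered on top of file-based settings follows. The environment variable names come from the list above; the config key names on the right-hand side and the function `apply_env_overrides` are assumptions for illustration.

```python
import os

# Environment variable -> assumed config key.
ENV_TO_KEY = {
    "DICTATION_MODEL": "model",
    "DICTATION_COMPUTE_TYPE": "compute_type",
    "DICTATION_MODE": "mode",
    "DICTATION_HOTKEY_PTT": "hotkey_ptt",
    "DICTATION_HOTKEY_TOGGLE": "hotkey_toggle",
    "DICTATION_HOTKEY_QUIT": "hotkey_quit",
    "DICTATION_HOTKEY_PREFS": "hotkey_prefs",
}


def apply_env_overrides(config: dict, environ=os.environ) -> dict:
    """Return a copy of config with any set, non-empty variables applied."""
    merged = dict(config)
    for var, key in ENV_TO_KEY.items():
        value = environ.get(var)
        if value:
            merged[key] = value
    return merged
```

For example, setting `$env:DICTATION_MODEL = "tiny.en"` in PowerShell before launching would, under these assumptions, take precedence over the model named in config.toml.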

🐛 Troubleshooting

Global Hotkeys Not Working

  1. Run PowerShell as Administrator
  2. Check if another application is using the same hotkeys
  3. Restart the application after changing hotkeys

Model Download Issues

  1. Check internet connection
  2. Ensure sufficient disk space (models are 75-500 MB)
  3. Downloaded models are cached by the Hugging Face client, by default under ~/.cache/huggingface/ (on Windows, %USERPROFILE%\.cache\huggingface\)
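
The disk-space check can be scripted. The sketch below assumes the default Hugging Face cache location, which the `HF_HOME` environment variable overrides. Both helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def hf_cache_dir(environ=os.environ) -> Path:
    """Return the Hugging Face cache root: HF_HOME if set, else the default."""
    home = environ.get("HF_HOME")
    return Path(home) if home else Path.home() / ".cache" / "huggingface"


def has_free_space(path, needed_mb: int) -> bool:
    """Check free space on the drive holding path (or its nearest existing parent)."""
    p = Path(path).resolve()
    while not p.exists() and p != p.parent:
        p = p.parent
    return shutil.disk_usage(p).free >= needed_mb * 1024 * 1024
```

Calling `has_free_space(hf_cache_dir(), 500)` covers the largest model listed in this README.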

Memory Issues

  1. Switch to tiny.en or base.en model in Preferences
  2. Ensure no other heavy applications are running
  3. Close and restart the application

Audio Issues

  1. Check microphone permissions in Windows Settings
  2. Verify microphone is connected and working
  3. Check audio device in Windows Sound Settings

📊 Performance

  • Transcription Speed: ~1-3 seconds for a normal sentence
  • Memory Usage:
    • tiny.en: <1 GB peak RSS
    • base.en: <1 GB peak RSS
    • small.en: <2 GB peak RSS
  • CPU: Uses CPU-only inference (no GPU required)

🛠️ Technical Details

  • Audio Format: Mono, 16 kHz, 16-bit PCM
  • Transcription Engine: faster-whisper (CTranslate2)
  • Beam Size: 5 (balanced accuracy/speed)
  • VAD: Enabled (filters out non-speech)
  • Language: English only
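
For orientation, the settings above map onto faster-whisper roughly as follows. This is a sketch under stated assumptions, not the code in dictation_tray.py: the function name `transcribe` is hypothetical, and the default model and compute type are taken from the examples in the Environment Variables section.

```python
SAMPLE_RATE = 16_000  # Hz, mono

# Options taken from the list above.
TRANSCRIBE_OPTIONS = {"beam_size": 5, "vad_filter": True, "language": "en"}


def transcribe(audio, model_name="base.en", compute_type="int8"):
    """Transcribe a file path or a 1-D float32 NumPy array sampled at 16 kHz."""
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type=compute_type)
    segments, _info = model.transcribe(audio, **TRANSCRIBE_OPTIONS)
    # segments is a generator; joining it runs the actual decoding.
    return " ".join(segment.text.strip() for segment in segments)
```

A long-running tray application would normally load the model once and reuse it, rather than constructing it on every call as this sketch does.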

📝 Development Notes for AI Assistants

Key Files

  • dictation_tray.py: Main application with DictationApp class
  • requirements.txt: Python dependencies
  • config.toml: Configuration file (auto-generated)
  • run.ps1: Setup and launch script

Architecture

  • Uses pystray for system tray integration
  • pynput for global hotkey handling
  • sounddevice for audio capture
  • faster-whisper for transcription
  • pyperclip for clipboard operations

Common Issues & Solutions

  • Audio shape error: Resolved by flattening 2D audio arrays to 1D before transcription
  • Paste timing: Handled with short delays and fallback paste methods
  • Model compatibility: Run with a local Python 3.12 install, since some dependencies conflict with 3.14
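
The paste-timing fix can be sketched as copy, wait, verify, then paste, with a bounded number of retries. The function `paste_text` and its callable parameters are hypothetical; the README does not say which delays or fallback methods the application uses.

```python
import time
from typing import Callable


def paste_text(
    text: str,
    copy: Callable[[str], None],
    read_back: Callable[[], str],
    send_paste: Callable[[], None],
    delay: float = 0.05,
    attempts: int = 3,
) -> bool:
    """Copy text, wait briefly, confirm the clipboard holds it, then paste.

    Returns False if the clipboard never contained the text, so the caller
    can switch to a fallback such as typing the text out.
    """
    for _ in range(attempts):
        copy(text)
        time.sleep(delay)  # give the clipboard time to settle
        if read_back() == text:
            send_paste()
            return True
    return False
```

With pyperclip, `copy` and `read_back` could be `pyperclip.copy` and `pyperclip.paste`; `send_paste` would be whatever sends Ctrl+V to the active window.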

📄 License

See LICENSE file (if provided) or use as needed.

🤝 Contributing

For issues or questions:

  1. Check the troubleshooting section
  2. Review config.toml settings
  3. Check application output in console (if running from terminal)

AI Assistant Note: This project is optimized for seamless AI assistance with comprehensive documentation, clear architecture, and well-structured code. All dependencies are specified and the setup process is fully automated.
