Dictation Tool

A minimal, background dictation tool for Windows 11 that runs in the system tray. Uses faster-whisper for CPU-based transcription and automatically pastes transcripts into the active application.

🚀 Quick Start for AI Assistants

Prerequisites

Python 3.11 or 3.12 (Python 3.14 is not yet fully supported by all dependencies)
Windows 11
PowerShell (for setup script)

Installation & Setup

# Clone the repository
git clone https://github.com/Jakedoes1111/dictate.git
cd dictate

# Run the setup script (handles everything automatically)
.\run.ps1

Manual Setup

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python dictation_tray.py

🎯 Features

System Tray Only: No visible window, runs in background
Two Modes:
- Push-to-Talk: Hold hotkey to record, release to transcribe
- Toggle: Press hotkey to start/stop recording
Auto-Paste: Transcripts automatically paste into the active text field
Configurable: Change hotkeys, model, and settings via Preferences
Low Memory: Optimized for systems with 8 GB RAM
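
The two modes differ only in how key events map to recording state. The sketch below illustrates that mapping with a hypothetical `Recorder` class; it is not the project's `DictationApp`, and the attribute names are assumptions. The mode strings match the `DICTATION_MODE` values listed under Environment Variables.

```python
class Recorder:
    """Hypothetical state holder; the real application may be structured differently."""

    def __init__(self, mode: str = "push_to_talk") -> None:
        if mode not in ("push_to_talk", "toggle"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self.finished_takes = 0  # recordings handed off for transcription

    def _start(self) -> None:
        self.recording = True

    def _stop(self) -> None:
        self.recording = False
        self.finished_takes += 1

    def on_press(self) -> None:
        if self.mode == "push_to_talk":
            if not self.recording:  # ignore key auto-repeat while the key is held
                self._start()
        elif self.recording:  # toggle mode: each press flips the state
            self._stop()
        else:
            self._start()

    def on_release(self) -> None:
        # Only push-to-talk reacts to the key being released.
        if self.mode == "push_to_talk" and self.recording:
            self._stop()
```

In push-to-talk mode a recording ends on key release; in toggle mode releases are ignored and the second press ends it.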

⌨️ Default Hotkeys

Right-Alt (Alt Gr): Push-to-Talk (hold to record)
Ctrl+Alt+D: Toggle Dictation mode
Ctrl+Alt+Q: Quit application
Ctrl+Alt+P: Open Preferences
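
pynput describes hotkey combinations as strings such as `<ctrl>+<alt>+d`, with special keys in angle brackets. The helper below is a sketch that converts the labels used in this README into that form. The function name and the mapping table are assumptions, and which pynput key the right Alt key produces can depend on the keyboard layout, so treat `<alt_gr>` as a guess to verify on your machine.

```python
# Assumed mapping from README-style key names to pynput-style names.
_SPECIAL_KEYS = {
    "ctrl": "<ctrl>",
    "alt": "<alt>",
    "shift": "<shift>",
    "right-alt": "<alt_gr>",
    "alt gr": "<alt_gr>",
}


def to_pynput_hotkey(label: str) -> str:
    """Convert a label like 'Ctrl+Alt+D' into '<ctrl>+<alt>+d'."""
    parts = [part.strip().lower() for part in label.split("+")]
    return "+".join(_SPECIAL_KEYS.get(part, part) for part in parts)


print(to_pynput_hotkey("Ctrl+Alt+D"))
```

The resulting string is in the format accepted by `pynput.keyboard.GlobalHotKeys` and `pynput.keyboard.HotKey.parse`.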

🔧 Configuration

Settings are stored in config.toml in the application directory. You can edit this file directly or use the Preferences dialog.

Available Models

Model	Size	RAM Usage	Speed	Accuracy
`tiny.en`	~75 MB	<1 GB	⚡ Fastest	⭐⭐ Low
`base.en`	~140 MB	<1 GB	⚡ Fast	⭐⭐⭐ Good
`small.en`	~500 MB	<2 GB	🐢 Medium	⭐⭐⭐⭐ Best

Environment Variables

You can override settings via environment variables:

DICTATION_MODEL: Model name (e.g., base.en)
DICTATION_COMPUTE_TYPE: Compute type (e.g., int8)
DICTATION_MODE: Mode (push_to_talk or toggle)
DICTATION_HOTKEY_PTT: Push-to-Talk hotkey
DICTATION_HOTKEY_TOGGLE: Toggle hotkey
DICTATION_HOTKEY_QUIT: Quit hotkey
DICTATION_HOTKEY_PREFS: Preferences hotkey
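
One way such overrides can be layered on top of file-based settings is sketched below. The environment variable names come from the list above; the setting key names on the right-hand side and the function name are assumptions made for the example.

```python
import os
from typing import Mapping

# Variable names are from this README; the setting keys are hypothetical.
ENV_TO_SETTING = {
    "DICTATION_MODEL": "model",
    "DICTATION_COMPUTE_TYPE": "compute_type",
    "DICTATION_MODE": "mode",
    "DICTATION_HOTKEY_PTT": "hotkey_ptt",
    "DICTATION_HOTKEY_TOGGLE": "hotkey_toggle",
    "DICTATION_HOTKEY_QUIT": "hotkey_quit",
    "DICTATION_HOTKEY_PREFS": "hotkey_prefs",
}


def apply_env_overrides(settings: dict, env: Mapping[str, str] = os.environ) -> dict:
    """Return a copy of settings with any set environment variables applied."""
    merged = dict(settings)
    for variable, key in ENV_TO_SETTING.items():
        value = env.get(variable)
        if value:  # skip unset or empty variables
            merged[key] = value
    return merged
```

For example, setting `$env:DICTATION_MODEL = "tiny.en"` in PowerShell before launching would take precedence over the model named in config.toml under this scheme.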

🐛 Troubleshooting

Global Hotkeys Not Working

Launch the tool from a PowerShell session started as Administrator
Check if another application is using the same hotkeys
Restart the application after changing hotkeys

Model Download Issues

Check internet connection
Ensure sufficient disk space (models are 75-500 MB)
Models are cached in the Hugging Face cache directory (by default ~/.cache/huggingface/hub, which on Windows sits under your user profile folder)

Memory Issues

Switch to tiny.en or base.en model in Preferences
Ensure no other heavy applications are running
Close and restart the application

Audio Issues

Check microphone permissions in Windows Settings
Verify microphone is connected and working
Check audio device in Windows Sound Settings

📊 Performance

Transcription Speed: ~1-3 seconds for a normal sentence
Memory Usage:
- tiny.en: <1 GB peak RSS
- base.en: <1 GB peak RSS
- small.en: <2 GB peak RSS
CPU: Uses CPU-only inference (no GPU required)

🛠️ Technical Details

Audio Format: Mono, 16 kHz, 16-bit PCM
Transcription Engine: faster-whisper (CTranslate2)
Beam Size: 5 (balanced accuracy/speed)
VAD: Enabled (filters out non-speech)
Language: English only
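
To make the audio format concrete: at mono, 16 kHz, 16-bit PCM, one second of audio is 32,000 bytes. The sketch below uses only the standard library to compute a buffer's duration and wrap raw PCM in a WAV container with those parameters. The function names are illustrative and not taken from the project.

```python
import io
import wave

SAMPLE_RATE = 16_000  # Hz
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)


def duration_seconds(pcm: bytes) -> float:
    """Length of a raw PCM buffer in seconds."""
    return len(pcm) / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
print(duration_seconds(one_second_of_silence))
```

Writing the returned bytes to a .wav file gives something you can play back when checking what the microphone actually captured.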

📝 Development Notes for AI Assistants

Key Files

dictation_tray.py: Main application with DictationApp class
requirements.txt: Python dependencies
config.toml: Configuration file (auto-generated)
run.ps1: Setup and launch script

Architecture

Uses pystray for system tray integration
pynput for global hotkey handling
sounddevice for audio capture
faster-whisper for transcription
pyperclip for clipboard operations

Common Issues & Solutions

Audio shape error: Fixed by flattening 2D audio arrays to 1D
Paste timing: Added delays and fallback methods
Python compatibility: Runs on a local Python 3.12 install to avoid dependency conflicts under 3.14
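
The audio shape fix can be illustrated as follows. sounddevice generally delivers recordings as 2D arrays shaped (frames, channels), while faster-whisper expects a 1D waveform when given an array. This is a sketch of one way to do the conversion, assuming NumPy is installed; the function name is hypothetical and the project's actual fix may differ.

```python
import numpy as np


def to_mono_1d(audio: np.ndarray) -> np.ndarray:
    """Return a 1D float32 array from (frames,) or (frames, channels) input."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # Average across channels; with one channel this just drops the axis.
        audio = audio.mean(axis=1)
    return audio.reshape(-1)


captured = np.zeros((160, 1), dtype=np.float32)  # stand-in for a recorded chunk
print(to_mono_1d(captured).shape)
```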

📄 License

See LICENSE file (if provided) or use as needed.

🤝 Contributing

For issues or questions:

Check the troubleshooting section
Review config.toml settings
Check application output in console (if running from terminal)

AI Assistant Note: This project is optimized for seamless AI assistance with comprehensive documentation, clear architecture, and well-structured code. All dependencies are specified and the setup process is fully automated.
