Linux voice dictation system using faster-whisper (CTranslate2) for local speech-to-text conversion and automatic text input.
This system provides real-time voice dictation capabilities by:
- Recording audio from your system's microphone.
- Converting speech to text using highly optimized local models (OpenAI Whisper & Distil-Whisper).
- Automatically typing the recognized text into any application using pynput (with ydotool fallback).
- Supporting multilingual dictation and translation (e.g., speak French -> type English).
- Fast: Uses faster-whisper backend with 8-bit quantization for <1s latency on modern CPUs.
- SOTA Models: Supports large-v3-turbo, distil-whisper, and standard OpenAI models.
- System Tray Control: Switch models, languages, and tasks (transcribe/translate) instantly from the tray.
- Smart Typing: Robust text injection handling Unicode characters (accents, emojis).
- Audio Management: "Discard Recording" feature to cancel bad takes.
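The quantized transcription path listed above can be sketched with the public faster-whisper API. This is a minimal illustration, not the project's actual code: the helper names `whisper_options` and `transcribe_file`, the `"auto"` convention for language detection, and the default model are assumptions made for the example.

```python
def whisper_options(language: str = "auto", task: str = "transcribe") -> dict:
    """Map tray-style settings to keyword arguments for WhisperModel.transcribe."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    # faster-whisper auto-detects the spoken language when language is None.
    return {"language": None if language == "auto" else language, "task": task}


def transcribe_file(path: str, model_name: str = "tiny",
                    language: str = "auto", task: str = "transcribe") -> str:
    """Transcribe one audio file on the CPU with 8-bit quantized weights."""
    # Imported lazily so whisper_options() works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, **whisper_options(language, task))
    # segments is a lazy generator; joining it runs the actual decoding.
    return "".join(segment.text for segment in segments).strip()
```

For example, `transcribe_file("take.wav", language="fr", task="translate")` would decode French speech into English text, where `take.wav` is a placeholder file name.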
- Linux system with systemd
- Python 3.x
- Root access for installation
- Microphone
- ydotool (installed automatically by script)
- Clone this repository:
    git clone https://github.com/nilock/dictate
    cd dictate
- Run the installation script as root:
    sudo ./installation.sh

This will:
- Install system dependencies (ydotool, portaudio, etc.)
- Create a Python virtual environment in /opt/dictation_venv
- Set up the background daemon (dictation.service) and tray app (dictation_tray.service)
- Configure permissions for ydotool (socket access)
- Trigger: Run dictation (or bind it to a hotkey like Ctrl+Alt+D).
- Speak: The tray icon turns Red.
- Stop: Run dictation again. The tray icon turns Grey (processing), and the recognized text is then typed.
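The trigger steps above behave like a small toggle. The following is a hypothetical sketch of that cycle, not the daemon's real implementation: the `State` names, the three handler functions, and the choice to ignore triggers while processing are all assumptions. Only the red and grey tray colours come from the steps above.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


# Tray colours named in the steps above; the idle colour is not documented.
TRAY_COLOUR = {State.RECORDING: "red", State.PROCESSING: "grey"}


def on_trigger(state: State) -> State:
    """Advance the state each time the `dictation` command is run."""
    if state is State.IDLE:
        return State.RECORDING      # first run: start recording
    if state is State.RECORDING:
        return State.PROCESSING     # second run: stop and transcribe
    return state                    # assumption: ignored while processing


def on_discard(state: State) -> State:
    """Drop the current take without typing anything."""
    return State.IDLE if state is State.RECORDING else state


def on_typed(state: State) -> State:
    """Return to idle once the recognized text has been typed."""
    return State.IDLE if state is State.PROCESSING else state
```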
- Tray Menu: Right-click the system tray icon to:
  - Discard Recording: Cancel the current audio without typing.
  - Model: Select between speed (tiny, distil-small.en) and accuracy (large-v3-turbo).
  - Language: Force English (en), French (fr), or Auto-Detect.
  - Task: Choose Transcribe (Input Language -> Input Language) or Translate (Input Language -> English).
- CLI: You can also control it via terminal:
    dictation config --model large-v3-turbo
    dictation config --language fr --task translate
    dictation discard
Bind the command /usr/local/bin/dictation to a custom keyboard shortcut in your desktop environment (GNOME, KDE, i3, etc.).
- Check Status:
    systemctl status dictation
    systemctl --user status dictation_tray
- View Logs:
  - Daemon (Recording/Transcribing): tail -f /tmp/dictation_daemon.log
  - Tray (Typing/UI): journalctl --user -u dictation_tray -f
- Typing Issues: If text appears as numbers or gibberish (e.g. fran242...), ensure ydotool is working or try restarting the tray service to re-attempt the pynput connection.
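To make the pynput-then-ydotool fallback concrete, here is a minimal sketch of how such a chain could be wired. It is an illustration under assumptions, not the tray app's code: the function names are hypothetical, and it assumes `ydotool type <text>` is available with its daemon socket accessible.

```python
import subprocess
from typing import Callable, Iterable


def type_with_pynput(text: str) -> None:
    """Type text through pynput's virtual keyboard."""
    # Imported lazily: the import itself can fail when no display is reachable.
    from pynput.keyboard import Controller
    Controller().type(text)


def type_with_ydotool(text: str) -> None:
    """Type text through the ydotool command-line tool."""
    subprocess.run(["ydotool", "type", text], check=True)


def type_text(text: str, backends: Iterable[Callable[[str], None]]) -> str:
    """Try each backend in order and return the name of the one that worked."""
    errors = []
    for backend in backends:
        try:
            backend(text)
            return backend.__name__
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{backend.__name__}: {exc}")
    raise RuntimeError("all typing backends failed: " + "; ".join(errors))
```

Calling `type_text("français", [type_with_pynput, type_with_ydotool])` would report which backend handled the text, which can help when diagnosing the symptom described above.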
Configuration is stored in ~/.config/dictation/config.json.
You can edit this manually or use the CLI/Tray to update it.
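As a rough sketch of a manual edit done from a script, the snippet below merges new values into the JSON file. The key names (`model`, `language`, `task`) mirror the CLI flags but are assumptions about the file's schema, and `update_config` is a hypothetical helper. Whether the daemon picks up changes without a restart is not covered here.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "dictation" / "config.json"


def update_config(path: Path = CONFIG_PATH, **changes: str) -> dict:
    """Merge the given settings into the JSON config and write it back."""
    # Start from the existing file if present, otherwise from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.update(changes)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For instance, `update_config(model="large-v3-turbo", task="translate")` would leave any other keys in the file untouched.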
- faster-whisper
- sounddevice
- numpy
- pynput
- pystray
- ydotool (System package)
GPL-3.0