Add streaming transcription functionality and improve Python version handling #31
AlexanderMakarov wants to merge 5 commits into jakovius:main
Conversation
Hello @AlexanderMakarov, I tested your PR on my Omarchy 3.2.x ThinkPad (T490) and have some feedback.
If you do an Actions build of this in your fork, I can test it.
Hi @mattsn0w, I've not tried to use […]. In general, the idea of adding streaming to voxd led me to the need to speed up whisper.cpp, and I am now migrating to https://github.com/SYSTRAN/faster-whisper, which promises 4x speed for the same Whisper models. Streaming needs transcription to be at least 2x faster, and I don't have a (proper) GPU on my laptop. Faster-whisper is a different beast, but it is tuned for real-time transcription, provides an embedded Python API, and offers word-level timestamps, which are very handy. So I will first try to implement this migration in my fork, https://github.com/AlexanderMakarov/voxd, because I don't have proper speech-to-text on my laptop yet.
@mattsn0w I've implemented the fix. BTW, this is not something introduced by my changes but general behavior of the repo: installation from the package uses different paths than […]. As for my idea to switch to faster-whisper: I found that updating the VOXD repo with it is not the best way, so I switched to the simpler "Soupawhisper" repo (no UI, only notifications) and implemented streaming in my fork of it: https://github.com/AlexanderMakarov/soupawhisper. Note that with streaming, transcription quality drops significantly (with Whisper models).
Summary
This PR introduces streaming transcription functionality to VOXD, enabling real-time incremental typing as you speak. Additionally, it includes improvements to Python version handling in installation scripts (inspired by PR #15).
🎙️ Streaming Transcription Feature
Overview
VOXD now supports streaming transcription by default, which means text appears incrementally as you speak, not after recording stops. This provides a more natural and responsive voice-typing experience.
Key Features
How It Works
Implementation Details
New Components:
- StreamingWhisperTranscriber (src/voxd/core/streaming_transcriber.py): Processes audio in chunks and emits incremental text updates
- StreamingCoreProcessThread (src/voxd/core/streaming_core.py): Orchestrates streaming recording, transcription, and typing for GUI/tray modes
Configuration Options:
streaming_enabled: true # Enable/disable streaming mode
streaming_chunk_seconds: 3.0 # Audio chunk size in seconds
streaming_overlap_seconds: 0.5 # Overlap between chunks
streaming_emit_interval_seconds: 2.0 # Minimum time between text updates
streaming_emit_word_count: 3 # Minimum words before emitting text
streaming_typing_delay: 0.01 # Delay between typed characters
streaming_min_chars_to_type: 3 # Minimum characters before typing
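To make the chunking options more concrete, here is a minimal Python sketch of how a chunk size of 3.0 s with a 0.5 s overlap could split a recording, and how the two emit thresholds could gate typing. It is an illustration only, not the code from this PR: `StreamingConfig`, `chunk_bounds`, and `should_emit` are hypothetical names, and treating the two emit thresholds as "both must be met" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class StreamingConfig:
    # Field names mirror the config keys above; the class itself is hypothetical.
    streaming_chunk_seconds: float = 3.0
    streaming_overlap_seconds: float = 0.5
    streaming_emit_interval_seconds: float = 2.0
    streaming_emit_word_count: int = 3


def chunk_bounds(total_seconds, cfg):
    """Return (start, end) times of overlapping chunks covering the recording."""
    # Each new chunk starts one "step" after the previous one, so consecutive
    # chunks share streaming_overlap_seconds of audio.
    step = cfg.streaming_chunk_seconds - cfg.streaming_overlap_seconds
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + cfg.streaming_chunk_seconds, total_seconds)
        bounds.append((start, end))
        if end >= total_seconds:
            break  # the last chunk reached the end of the recording
        start += step
    return bounds


def should_emit(pending_words, seconds_since_last_emit, cfg):
    """Decide whether buffered words should be typed now."""
    # Assumption: both minimums (word count and interval) must be satisfied.
    return (
        len(pending_words) >= cfg.streaming_emit_word_count
        and seconds_since_last_emit >= cfg.streaming_emit_interval_seconds
    )


cfg = StreamingConfig()
print(chunk_bounds(8.0, cfg))
print(should_emit(["hello", "streaming", "world"], 2.5, cfg))
```

With the defaults, each chunk starts 2.5 s after the previous one, so a larger overlap gives the transcriber more shared context at chunk boundaries at the cost of transcribing more audio twice.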
Modes Supported:
- voxd --rh
- voxd --gui
- voxd --tray
Backward Compatibility:
Streaming is enabled by default but can be disabled via config to use the traditional "record-then-transcribe" behavior.
🐍 Python Version Improvements
This PR also includes improvements from PR #15 that remove hard-coded Python version checks:
- >= 3.9 check, making it compatible with future Python versions automatically
Changes:
- packaging/voxd.wrapper to use version comparison (>= 3.9) instead of hard-coded version lists
Testing
Tested on:
Streaming transcription works as expected, providing real-time feedback during dictation. The Python version improvements ensure compatibility with future Python releases.
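The difference between a hard-coded version list and a `>= 3.9` comparison can be sketched in Python as follows. This is an illustration of the idea only, not the contents of packaging/voxd.wrapper (whose code is not shown here); `HARD_CODED_VERSIONS`, `in_hard_coded_list`, and `meets_minimum` are hypothetical names, and the list contents are made up for the example.

```python
import sys

MIN_PYTHON = (3, 9)

# Hypothetical hard-coded list: it has to be edited for every new release.
HARD_CODED_VERSIONS = {(3, 9), (3, 10), (3, 11), (3, 12)}


def in_hard_coded_list(version_info):
    """Old-style check: only versions named explicitly are accepted."""
    return tuple(version_info[:2]) in HARD_CODED_VERSIONS


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Comparison-based check: any version at or above the minimum is accepted."""
    # Tuples compare element by element, so (3, 14) >= (3, 9) is True.
    return tuple(version_info[:2]) >= minimum


# A release newer than the list is rejected by the list but passes the comparison.
print(in_hard_coded_list((3, 14, 0)), meets_minimum((3, 14, 0)))
```

The comparison form needs no maintenance when a new Python release appears, which is the compatibility benefit described above.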
Benefits
Related