An offline alternative to Wispr Flow for Android
Fully local voice dictation with advanced AI post-processing

Recent changes:

| Feature | Description |
|---|---|
| 🔢 Sequential Digit Conversion | "one two three four five" → 12345 (Wispr Flow-style) |
| 💰 Enhanced Currency Formatting | "thirty million US dollars" → $30 million USD |
| 🔧 Offline LLM Activation Fixed | Settings toggle now properly activates offline processing |
| 📥 Fixed Model Download URL | Qwen3 ~405 MB model downloads correctly from Hugging Face |
| Improved Post-Processing | Filler removal, grammar fixes, smart punctuation |
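
Sequential digit conversion can be pictured as collapsing runs of single-digit words into one number. The following is a minimal Rust sketch, assuming a whitespace-tokenized transcript and treating only "zero" through "nine" as digits; `collapse_digit_runs` and its helpers are hypothetical names, not VoiceAI's actual code.

```rust
/// Maps a single spoken digit word to its character ("seven" -> '7').
fn digit_value(word: &str) -> Option<char> {
    let digit = match word.to_ascii_lowercase().as_str() {
        "zero" => '0',
        "one" => '1',
        "two" => '2',
        "three" => '3',
        "four" => '4',
        "five" => '5',
        "six" => '6',
        "seven" => '7',
        "eight" => '8',
        "nine" => '9',
        _ => return None,
    };
    Some(digit)
}

/// Emits a pending run of digit words: two or more become one number,
/// a lone digit word is kept as a word for the regular number normalizer.
fn flush_run(out: &mut Vec<String>, run: &mut Vec<&str>) {
    if run.len() >= 2 {
        out.push(run.iter().filter_map(|w| digit_value(w)).collect());
    } else {
        out.extend(run.iter().map(|w| w.to_string()));
    }
    run.clear();
}

/// Hypothetical helper: "testing one two three" -> "testing 123".
pub fn collapse_digit_runs(text: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut run: Vec<&str> = Vec::new();
    for word in text.split_whitespace() {
        if digit_value(word).is_some() {
            run.push(word);
        } else {
            flush_run(&mut out, &mut run);
            out.push(word.to_string());
        }
    }
    flush_run(&mut out, &mut run);
    out.join(" ")
}
```

Requiring at least two digit words keeps phrases such as "one hundred dollars" intact for the amount formatting step.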

Features:

| Feature | Description |
|---|---|
| 100% Offline | No internet required; all processing happens on-device |
| ⚡ Fast Transcription | NVIDIA Parakeet TDT 0.6B model (int8 quantized) |
| 🧠 Offline LLM | Qwen3 0.6B (~405 MB) for AI post-processing without internet |
| 🔢 Smart Numbers | "one two three four" → 1234 (phone numbers, IDs) |
| 💰 Currency Formatting | "$30 million USD", "25%", "$100" |
| 🎯 Course Correction | "No wait, I mean..." → clean, corrected output |
| 🗣️ Voice Commands | "Period", "comma", "new line", "delete that" |
| Personal Dictionary | FUTO-style custom word replacements |
| ⏱️ 30-Second Timer | Optional auto-stop after 30 seconds |
| Silence Detection | Auto-stop when you stop speaking |
| Universal Injection | Works with any app via the Accessibility Service |
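
A personal dictionary boils down to replacing spoken forms with preferred spellings. Here is a minimal Rust sketch, assuming each entry maps a spoken form (possibly several words) to a written form and is matched case-insensitively on whole words, longest entry first; `PersonalDictionary` is a hypothetical type, not VoiceAI's or FUTO's implementation.

```rust
/// Hypothetical personal dictionary: spoken form -> written form.
pub struct PersonalDictionary {
    entries: Vec<(Vec<String>, String)>, // (lowercased spoken words, replacement)
}

impl PersonalDictionary {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let mut entries: Vec<(Vec<String>, String)> = pairs
            .iter()
            .map(|(spoken, written)| {
                let words: Vec<String> =
                    spoken.split_whitespace().map(|w| w.to_lowercase()).collect();
                (words, written.to_string())
            })
            .collect();
        // Try longer spoken forms first so multi-word entries win.
        entries.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
        Self { entries }
    }

    /// Replaces every whole-word match, scanning left to right.
    pub fn apply(&self, text: &str) -> String {
        let words: Vec<&str> = text.split_whitespace().collect();
        let mut out: Vec<String> = Vec::new();
        let mut i = 0;
        'scan: while i < words.len() {
            for (spoken, written) in &self.entries {
                let end = i + spoken.len();
                if !spoken.is_empty()
                    && end <= words.len()
                    && words[i..end].iter().zip(spoken).all(|(w, s)| w.to_lowercase() == *s)
                {
                    out.push(written.clone());
                    i = end;
                    continue 'scan;
                }
            }
            out.push(words[i].to_string());
            i += 1;
        }
        out.join(" ")
    }
}
```

For example, an entry `("at groq", "@Groq")` would turn "ask at groq" into "ask @Groq".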

Keyboard compatibility:

| Keyboard | Status |
|---|---|
| HeliBoard | ✅ Tested & working |
| SwiftKey | ✅ Tested & working |
| OpenBoard | Should work (untested) |
| FlorisBoard | Should work (untested) |
| AnySoftKeyboard | Should work (untested) |

Note: Only HeliBoard and SwiftKey have been tested. Other open-source keyboards with voice input support should be compatible.

Installation:

- Download VoiceAI-v1.2.1.apk from Releases
- Install it on your Android device
- Enable it in Settings → Language & Input → Keyboards
- Enable the Accessibility Service for text injection
- Grant microphone permission

Usage:

- Open any text field in any app
- Tap the microphone button on your keyboard
- Speak naturally; use voice commands if needed
- Tap the screen or wait for auto-stop

Course correction:

| You Say | VoiceAI Outputs |
|---|---|
| "Let's meet tomorrow no wait let's do Friday" | Let's do Friday. |
| "I think um actually never mind I mean yes" | Yes. |
| "Send to John no sorry to Mike" | Send to Mike. |
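
One way to approximate course correction is to find the last correction marker and splice the correction back into the sentence. The Rust sketch below assumes a fixed marker list and a simple anchor-word heuristic; `apply_course_correction` and `MARKERS` are hypothetical, and VoiceAI's CourseCorrector may work quite differently. A fixed list will also misfire on sentences such as "that is what I mean", which is why real systems need more context.

```rust
/// Hypothetical correction markers (assumed, not VoiceAI's real list).
const MARKERS: &[&str] = &["no wait", "no sorry", "i mean", "actually never mind", "never mind"];

/// Keeps the speaker's final intent after the last correction marker.
/// Assumed heuristic: if the first corrected word also appears before the
/// marker, the correction replaces the text from that word onward;
/// otherwise it replaces everything said before the marker.
pub fn apply_course_correction(text: &str) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    let lower: Vec<String> = words.iter().map(|w| w.to_lowercase()).collect();

    // Find the marker occurrence that ends last; on a tie keep the longer one.
    let mut best: Option<(usize, usize)> = None;
    for start in 0..lower.len() {
        for marker in MARKERS {
            let parts: Vec<&str> = marker.split(' ').collect();
            let end = start + parts.len();
            let hit = end <= lower.len()
                && lower[start..end].iter().zip(&parts).all(|(w, p)| w == p);
            if hit && best.map_or(true, |(_, e)| end > e) {
                best = Some((start, end));
            }
        }
    }

    let Some((m_start, m_end)) = best else {
        return text.to_string(); // no correction spoken
    };
    let correction = &words[m_end..];
    let keep_until = match lower.get(m_end) {
        // Rightmost earlier occurrence of the correction's first word, if any.
        Some(anchor) => lower[..m_start].iter().rposition(|w| w == anchor).unwrap_or(0),
        None => m_start, // marker at the very end: just drop it
    };
    let mut out: Vec<&str> = words[..keep_until].to_vec();
    out.extend_from_slice(correction);
    out.join(" ")
}
```

Capitalization and the final period from the table above would come from later stages (PunctuationRestorer, CasingApplicator), so this sketch returns, for instance, "let's do Friday" for the first example.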

Voice commands:

| You Say | VoiceAI Does |
|---|---|
| "Hello comma how are you question mark" | Hello, how are you? |
| "New paragraph" | Inserts a paragraph break |
| "Delete that" | Removes the last dictation |
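
Punctuation and layout commands can be handled by scanning for command phrases and emitting symbols that attach to the preceding word. This Rust sketch assumes a small hypothetical command table; "delete that" is left out because it needs access to the editor's state rather than the current transcript.

```rust
/// Hypothetical spoken-command table; VoiceAI's real command set may differ.
/// Multi-word phrases are listed first so they are tried before shorter ones.
const COMMANDS: &[(&str, &str)] = &[
    ("question mark", "?"),
    ("new paragraph", "\n\n"),
    ("new line", "\n"),
    ("comma", ","),
    ("period", "."),
];

/// Replaces spoken punctuation/layout commands with their symbols.
pub fn interpret_commands(text: &str) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut out = String::new();
    let mut i = 0;
    'scan: while i < words.len() {
        for &(phrase, symbol) in COMMANDS {
            let parts: Vec<&str> = phrase.split(' ').collect();
            let end = i + parts.len();
            let is_match = end <= words.len()
                && words[i..end]
                    .iter()
                    .zip(&parts)
                    .all(|(w, p)| w.eq_ignore_ascii_case(p));
            if is_match {
                // Symbols attach directly to the preceding text (no space).
                out.push_str(symbol);
                i = end;
                continue 'scan;
            }
        }
        // Ordinary word: separate with a space unless at the start of a line.
        if !out.is_empty() && !out.ends_with('\n') {
            out.push(' ');
        }
        out.push_str(words[i]);
        i += 1;
    }
    out
}
```

A plain lookup like this cannot tell a literal "period" from the command; that ambiguity is one reason a context-aware post-processor helps.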

Smart formatting:

| You Say | VoiceAI Outputs |
|---|---|
| "twenty five percent" | 25% |
| "thirty million US dollars" | $30 million USD |
| "one hundred dollars" | $100 |
| "microphone testing one two three four" | Microphone testing 1234 |
| "twenty twenty four" | 2024 |
| "four twenty pm" | 4:20 PM |
| "uh so i was thinking um" | So, I was thinking |
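
The percentage and currency rows can be approximated by parsing a compound number phrase and then looking at the words that follow it. The Rust sketch below covers only cardinals below one million plus "million"/"billion" scale words; years ("twenty twenty four") and times ("four twenty pm") are out of scope. `normalize_amounts` and its helpers are hypothetical, not VoiceAI's NumberNormalizer.

```rust
/// Value of a basic English number word ("seven" -> 7, "forty" -> 40).
fn small_value(word: &str) -> Option<u64> {
    const ONES: [&str; 20] = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
        "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
        "eighteen", "nineteen",
    ];
    const TENS: [&str; 8] = [
        "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
    ];
    if let Some(i) = ONES.iter().position(|w| *w == word) {
        return Some(i as u64);
    }
    TENS.iter().position(|w| *w == word).map(|i| (i as u64 + 2) * 10)
}

/// Parses a number phrase at words[start]; returns (value, words consumed).
fn parse_number(words: &[String], start: usize) -> Option<(u64, usize)> {
    let (mut total, mut current) = (0u64, 0u64);
    let mut i = start;
    while i < words.len() {
        let w = words[i].as_str();
        let low = current % 100; // last two digits decide what may follow
        match (w, small_value(w)) {
            (_, Some(v)) if v >= 20 && low == 0 => current += v,
            (_, Some(v)) if v < 20 && (low == 0 || (low >= 20 && low % 10 == 0 && v < 10)) => {
                current += v
            }
            ("hundred", _) if (1..10).contains(&current) => current *= 100,
            ("thousand", _) if current > 0 && total == 0 => {
                total = current * 1000;
                current = 0;
            }
            _ => break,
        }
        i += 1;
    }
    (i > start).then(|| (total + current, i - start))
}

/// Rewrites spoken amounts such as "twenty five percent" -> "25%".
pub fn normalize_amounts(text: &str) -> String {
    let original: Vec<&str> = text.split_whitespace().collect();
    let words: Vec<String> = original.iter().map(|w| w.to_lowercase()).collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < words.len() {
        let Some((value, used)) = parse_number(&words, i) else {
            out.push(original[i].to_string()); // not a number word: keep as spoken
            i += 1;
            continue;
        };
        i += used;
        let mut amount = value.to_string();
        // Scale words stay spelled out, matching "$30 million USD".
        if let Some(scale) = words.get(i).filter(|w| *w == "million" || *w == "billion") {
            amount = format!("{amount} {scale}");
            i += 1;
        }
        let next: Vec<&str> = words[i..].iter().take(2).map(String::as_str).collect();
        match next.as_slice() {
            ["percent", ..] => { out.push(format!("{amount}%")); i += 1; }
            ["us", "dollars", ..] => { out.push(format!("${amount} USD")); i += 2; }
            ["dollars", ..] => { out.push(format!("${amount}")); i += 1; }
            _ => out.push(amount),
        }
    }
    out.join(" ")
}
```

Because every number word is converted, a phrase like "one of them" would become "1 of them"; a production normalizer needs context rules that this sketch omits.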

Settings are available in the VoiceAI app → Open Settings:

| Setting | Description |
|---|---|
| ⏱️ 30-Second Limit | Auto-stop dictation after 30 seconds |
| Silence Detection | Auto-stop when you stop speaking |
| Personal Dictionary | Add custom words (e.g., @Groq, ChatGPT, Anthropic) |
| 🧠 Offline LLM | Enable on-device AI post-processing |
| Groq API Key | Optional cloud LLM for enhanced formatting |
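
The two auto-stop settings can be combined in one small state machine fed with audio frames. This Rust sketch assumes mono `f32` samples in [-1.0, 1.0], an RMS silence threshold of 0.01 and a 1.5-second silence window; those values and the `AutoStop` type are hypothetical, not VoiceAI's actual parameters. Silence is only counted after speech has been heard, so the recording does not stop before the user starts talking.

```rust
/// Why dictation should stop.
#[derive(Debug, PartialEq)]
pub enum StopReason {
    Silence,
    TimeLimit,
}

/// Hypothetical auto-stop detector.
pub struct AutoStop {
    sample_rate: u32,
    silence_threshold: f32,    // assumed RMS level below which a frame is silent
    max_silence_ms: u64,       // assumed silence duration that ends dictation
    max_total_ms: Option<u64>, // Some(30_000) when the 30-second limit is on
    elapsed_ms: u64,
    silent_ms: u64,
    heard_speech: bool, // only count silence after the user starts talking
}

impl AutoStop {
    pub fn new(sample_rate: u32, thirty_second_limit: bool) -> Self {
        AutoStop {
            sample_rate,
            silence_threshold: 0.01,
            max_silence_ms: 1_500,
            max_total_ms: if thirty_second_limit { Some(30_000) } else { None },
            elapsed_ms: 0,
            silent_ms: 0,
            heard_speech: false,
        }
    }

    /// Processes one frame; returns Some(reason) once recording should stop.
    pub fn push_frame(&mut self, frame: &[f32]) -> Option<StopReason> {
        if frame.is_empty() {
            return None;
        }
        let frame_ms = frame.len() as u64 * 1000 / u64::from(self.sample_rate);
        self.elapsed_ms += frame_ms;

        // Root-mean-square energy of the frame.
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        if rms >= self.silence_threshold {
            self.heard_speech = true;
            self.silent_ms = 0;
        } else if self.heard_speech {
            self.silent_ms += frame_ms;
        }

        if self.max_total_ms.map_or(false, |limit| self.elapsed_ms >= limit) {
            return Some(StopReason::TimeLimit);
        }
        if self.heard_speech && self.silent_ms >= self.max_silence_ms {
            return Some(StopReason::Silence);
        }
        None
    }
}
```

At 16 kHz, 160-sample frames are 10 ms each, so after speech the detector stops on the 150th consecutive quiet frame.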

VoiceAI offers two AI processing modes for intelligent text formatting.

Offline LLM (on-device): fully private, no internet required.

| Model | Size | Source |
|---|---|---|
| Qwen3 0.6B Q4 | ~405 MB | Hugging Face |
Features:
- ✅ Filler word removal ("um", "uh", "like")
- ✅ Grammar corrections (contractions, "i" → "I")
- ✅ Smart punctuation and question detection
- ✅ Sequential digit conversion ("one two three" → "123")
- ✅ Currency formatting ("$30 million USD")

Setup: Settings → Offline Processing → Download (~405 MB, one-time)
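
Two of the features above, filler removal and fixing a lowercase "i", are simple enough to show as a rule-based pass. This Rust sketch uses an assumed filler list and deliberately leaves "like" alone, since deciding whether "like" is filler needs context (the kind of judgment the LLM pass is for); `clean_fillers_and_pronoun` is a hypothetical name.

```rust
/// Assumed filler list; "like" is intentionally absent (context-dependent).
const FILLERS: &[&str] = &["um", "umm", "uh", "uhm", "er", "erm"];

/// Drops filler tokens and capitalizes the pronoun "i" (including "i'm", "i'll", ...).
pub fn clean_fillers_and_pronoun(text: &str) -> String {
    text.split_whitespace()
        .filter(|w| {
            // Ignore a trailing comma or period when checking for fillers ("um,").
            let bare = w.trim_end_matches(|c: char| c == ',' || c == '.').to_lowercase();
            !FILLERS.contains(&bare.as_str())
        })
        .map(|w| {
            if w == "i" || w.starts_with("i'") {
                format!("I{}", &w[1..])
            } else {
                w.to_string()
            }
        })
        .collect::<Vec<String>>()
        .join(" ")
}
```

Sentence casing and the comma in "So, I was thinking" are left to later punctuation and casing stages.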

Groq API (cloud): faster and more accurate, but requires internet.

| Provider | Model | Speed |
|---|---|---|
| Groq | Llama 3.1 70B | ~500ms |
Setup:
- Get a free API key at console.groq.com/keys
- Paste it into Settings → API Key

💡 Tip: Use the Offline LLM for privacy and the Groq API for the best quality.

Processing pipeline:

```
VoiceAIPipeline (Chain of Responsibility)
├── CommandInterpreter    # Voice commands first
├── CourseCorrector       # "No wait" handling
├── RepetitionCleaner     # Stutter removal
├── PersonalDictionary    # Custom words
├── FillerRemover         # "Uh", "um" removal
├── NumberNormalizer      # "25" from "twenty five"
├── PunctuationRestorer   # Add periods, commas
└── CasingApplicator      # Proper nouns
```
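
A chain of responsibility like the stage list above can be wired with a trait object per stage, where any stage may end the chain early. This Rust sketch is an illustration under assumptions: `Step`, `Stage` and `Pipeline` are hypothetical names, only three stages are included, and their bodies are toy versions (for example, the "delete that" handling and the repeated-word rule, which would also collapse a legitimate "that that").

```rust
/// Result of one stage: pass text down the chain, or stop early.
pub enum Step {
    Next(String),
    Done(String),
}

/// One link in the chain (hypothetical trait).
pub trait Stage {
    fn process(&self, text: String) -> Step;
}

struct CommandInterpreter;
impl Stage for CommandInterpreter {
    fn process(&self, text: String) -> Step {
        // Toy rule: a bare "delete that" is consumed here and yields no text.
        if text.trim().eq_ignore_ascii_case("delete that") {
            Step::Done(String::new())
        } else {
            Step::Next(text)
        }
    }
}

struct RepetitionCleaner;
impl Stage for RepetitionCleaner {
    fn process(&self, text: String) -> Step {
        // Drop immediate repeats such as "I I think" -> "I think".
        let mut kept: Vec<&str> = Vec::new();
        for w in text.split_whitespace() {
            if kept.last().map_or(true, |prev| !prev.eq_ignore_ascii_case(w)) {
                kept.push(w);
            }
        }
        Step::Next(kept.join(" "))
    }
}

struct PunctuationRestorer;
impl Stage for PunctuationRestorer {
    fn process(&self, mut text: String) -> Step {
        // Add a final period if there is no terminal punctuation.
        if !text.is_empty() && !text.ends_with(|c: char| matches!(c, '.' | '?' | '!')) {
            text.push('.');
        }
        Step::Next(text)
    }
}

pub struct Pipeline {
    stages: Vec<Box<dyn Stage>>,
}

impl Pipeline {
    pub fn new() -> Self {
        let mut stages: Vec<Box<dyn Stage>> = Vec::new();
        stages.push(Box::new(CommandInterpreter));
        stages.push(Box::new(RepetitionCleaner));
        stages.push(Box::new(PunctuationRestorer));
        Self { stages }
    }

    /// Runs each stage in order; a stage returning Done ends the chain.
    pub fn run(&self, input: &str) -> String {
        let mut text = input.to_string();
        for stage in &self.stages {
            match stage.process(text) {
                Step::Next(t) => text = t,
                Step::Done(t) => return t,
            }
        }
        text
    }
}
```

Keeping each stage behind the same trait makes the order explicit in one place, which matters here because, for example, commands must be interpreted before fillers or numbers are rewritten.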

Tech stack:

- Rust + egui: native Android UI
- Java: Activities, Services, Accessibility

Models:

| Component | Model | Size |
|---|---|---|
| Speech-to-Text | NVIDIA Parakeet TDT 0.6B (int8) | ~470 MB |
| Offline LLM | Qwen3 0.6B Q4_K_XL | ~405 MB |
| Cloud LLM | Groq Llama 3.1 70B | API |

Dependencies:

- Cargo: Rust package manager
- Android SDK/NDK: native compilation
- ONNX Runtime: neural network inference

Building from source:

```powershell
# Clone
git clone https://github.com/ai-dev-2024/VoiceAI.git
cd VoiceAI

# Download model files (required)
# From: https://huggingface.co/nvidia/parakeet-tdt-0.6b
# Place in: assets/parakeet-tdt-0.6b-v3-int8/

# Build (Windows PowerShell)
./build.ps1

# Install
adb install -r VoiceAI-v1.2.1.apk
```

Requirements:

- Android SDK (API 36)
- Android NDK 28
- Rust toolchain with the aarch64-linux-android target (`rustup target add aarch64-linux-android`)
- Parakeet TDT 0.6B model files (~600 MB)

Acknowledgments:

| Project | Contribution |
|---|---|
| transcribe-rs | Core ASR Rust library |
| FUTO Voice Input | Personal dictionary inspiration |
| Wispr Flow | Course correction concept |
| NVIDIA NeMo | Parakeet TDT speech model |
| Microsoft Phi-2 | On-device LLM (planned) |
If you find VoiceAI useful, consider supporting the development:
MIT License. See LICENSE for details.

VoiceAI: voice dictation, reimagined for Android.
Offline. Private. Fast.

⭐ Star this repo if you find it useful! ⭐


