Wispr Flow alternative for Android that runs locally with the Parakeet model

ai-dev-2024/VoiceAI

🎤 VoiceAI

VoiceAI Hero Banner

A Wispr Flow offline alternative for Android

Fully local voice dictation with advanced AI post-processing

Release Build Status Built with Antigravity License: MIT Android


🖼️ Screenshots

Main Screen
Settings

🆕 What's New in v1.2.1

| Feature | Description |
| --- | --- |
| 🔢 Sequential Digit Conversion | "one two three four five" → 12345 (Wispr Flow-style) |
| 💰 Enhanced Currency Formatting | "thirty million US dollars" → $30 million USD |
| 🧠 Offline LLM Activation Fixed | Settings toggle now properly activates offline processing |
| 📥 Fixed Model Download URL | Qwen3 ~405 MB model downloads correctly from Hugging Face |
| 🔄 Improved Post-Processing | Filler removal, grammar fixes, smart punctuation |

✨ Features

| Feature | Description |
| --- | --- |
| 🔒 100% Offline | No internet required; all processing is on-device |
| ⚡ Fast Transcription | NVIDIA Parakeet TDT 0.6B model (int8 quantized) |
| 🧠 Offline LLM | Qwen3 0.6B (~405 MB) for AI post-processing without internet |
| 🔢 Smart Numbers | "one two three four" → 1234 (phone numbers, IDs) |
| 💰 Currency Formatting | "$30 million USD", "25%", "$100" |
| 🎯 Course Correction | "No wait, I mean..." → Clean, corrected output |
| 🗣️ Voice Commands | "Period", "comma", "new line", "delete that" |
| 📖 Personal Dictionary | FUTO-style custom word replacements |
| ⏱️ 30-Second Timer | Optional auto-stop after 30 seconds |
| 🔇 Silence Detection | Auto-stop when you stop speaking |
| 🌐 Universal Injection | Works with any app via Accessibility Service |
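
The README does not say how Silence Detection decides that you have stopped speaking. One common approach, sketched here purely as an assumption, is to compute the RMS energy of each audio frame and stop once enough consecutive frames fall below a threshold. The names `SilenceDetector`, `push_frame`, and the threshold values are hypothetical and are not taken from the VoiceAI source.

```rust
/// Root-mean-square energy of one frame of 16-bit PCM samples.
fn rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / frame.len() as f64).sqrt()
}

/// Hypothetical detector: signals a stop after `max_silent_frames`
/// consecutive frames whose RMS energy is below `threshold`.
struct SilenceDetector {
    threshold: f64,
    max_silent_frames: usize,
    silent_frames: usize,
}

impl SilenceDetector {
    fn new(threshold: f64, max_silent_frames: usize) -> Self {
        Self { threshold, max_silent_frames, silent_frames: 0 }
    }

    /// Feeds one frame and returns true when recording should stop.
    fn push_frame(&mut self, frame: &[i16]) -> bool {
        if rms(frame) < self.threshold {
            self.silent_frames += 1;
        } else {
            // Any loud frame resets the silence counter.
            self.silent_frames = 0;
        }
        self.silent_frames >= self.max_silent_frames
    }
}
```

With 20 ms frames, `SilenceDetector::new(500.0, 75)` would stop after roughly 1.5 seconds of quiet. Both numbers are illustrative and would need tuning against real microphone input.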

📱 Compatible Keyboards

| Keyboard | Status |
| --- | --- |
| HeliBoard | ✅ Tested & Working |
| SwiftKey | ✅ Tested & Working |
| OpenBoard | 🔄 Should work |
| FlorisBoard | 🔄 Should work |
| AnySoftKeyboard | 🔄 Should work |

Note: Only HeliBoard and SwiftKey have been tested. Other open-source keyboards with voice input support should be compatible.


🚀 Quick Start

Download & Install

  1. Download VoiceAI-v1.2.1.apk from Releases
  2. Install on your Android device
  3. Enable in Settings → Language & Input → Keyboards
  4. Enable Accessibility Service for text injection
  5. Grant microphone permission

Usage

  1. Open any text field in any app
  2. Tap the microphone button on your keyboard
  3. Speak naturally, using voice commands if needed
  4. Tap the screen to stop, or wait for auto-stop

🎯 Post-Processing Examples

Course Correction (NEW!)

| You Say | VoiceAI Outputs |
| --- | --- |
| "Let's meet tomorrow no wait let's do Friday" | Let's do Friday. |
| "I think um actually never mind I mean yes" | Yes. |
| "Send to John no sorry to Mike" | Send to Mike. |

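To make the course-correction idea concrete, here is a minimal rule-based sketch. It assumes a fixed list of correction markers and keeps only the words after the last marker. The `MARKERS` list and the function name `apply_course_correction` are hypothetical. The README does not describe how VoiceAI implements this step, and the real behavior may depend on the LLM stage.

```rust
/// Hypothetical correction markers, stored as lowercase word sequences.
const MARKERS: &[&[&str]] = &[&["no", "wait"], &["no", "sorry"], &["i", "mean"]];

/// Keeps only the text after the last correction marker, then
/// capitalizes the first letter and adds a final period if needed.
fn apply_course_correction(input: &str) -> String {
    let words: Vec<&str> = input.split_whitespace().collect();
    let lower: Vec<String> = words.iter().map(|w| w.to_lowercase()).collect();

    // Index of the first word after the latest-ending marker (0 if none).
    let mut start = 0;
    for marker in MARKERS {
        for (i, window) in lower.windows(marker.len()).enumerate() {
            if window.iter().map(String::as_str).eq(marker.iter().copied()) {
                start = start.max(i + marker.len());
            }
        }
    }
    // A marker with nothing after it leaves the input untouched.
    if start >= words.len() {
        start = 0;
    }

    let kept = words[start..].join(" ");
    let mut chars = kept.chars();
    let mut out: String = match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => return String::new(),
    };
    if !matches!(out.chars().last(), Some('.' | '?' | '!')) {
        out.push('.');
    }
    out
}
```

This reproduces the first two rows of the table. For the third row it yields "To Mike." rather than "Send to Mike.", because merging the correction back into the original sentence needs more context than a keep-the-tail rule has.
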
Voice Commands (NEW!)

| You Say | VoiceAI Does |
| --- | --- |
| "Hello comma how are you question mark" | Hello, how are you? |
| "New paragraph" | Inserts paragraph break |
| "Delete that" | Removes last dictation |

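The punctuation commands in the table can be illustrated with a small word-level rewrite. This is a sketch under assumptions: the command set and the function name `apply_voice_commands` are hypothetical, and "delete that" is left out because it needs access to the previously inserted text.

```rust
/// Replaces spoken punctuation and line-break commands with symbols.
/// Two-word commands are checked before single-word ones.
fn apply_voice_commands(input: &str) -> String {
    let words: Vec<&str> = input.split_whitespace().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < words.len() {
        let word = words[i].to_lowercase();
        let next = words.get(i + 1).map(|n| n.to_lowercase());
        // `symbol` is the replacement text, `used` is how many words it consumed.
        let (symbol, used) = match (word.as_str(), next.as_deref()) {
            ("question", Some("mark")) => (Some("?"), 2),
            ("new", Some("line")) => (Some("\n"), 2),
            ("new", Some("paragraph")) => (Some("\n\n"), 2),
            ("comma", _) => (Some(","), 1),
            ("period", _) => (Some("."), 1),
            _ => (None, 1),
        };
        match symbol {
            // Symbols attach directly to the preceding text.
            Some(s) => out.push_str(s),
            None => {
                if !out.is_empty() && !out.ends_with('\n') {
                    out.push(' ');
                }
                out.push_str(words[i]);
            }
        }
        i += used;
    }
    out
}
```

A rule this simple also rewrites literal uses of the command words (for example "a period of time"), so a real interpreter would need some way to tell commands from content.
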
Smart Formatting

| You Say | VoiceAI Outputs |
| --- | --- |
| "twenty five percent" | 25% |
| "thirty million US dollars" | $30 million USD |
| "one hundred dollars" | $100 |
| "microphone testing one two three four" | Microphone testing 1234 |
| "twenty twenty four" | 2024 |
| "four twenty pm" | 4:20 PM |
| "uh so i was thinking um" | So, I was thinking |

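As an illustration of the sequential digit rule, the sketch below joins runs of two or more spoken digit words into one number string and leaves a lone digit word alone. The minimum run length of two and the names `digit_for`, `flush_run`, and `convert_digit_runs` are assumptions for this example, not documented VoiceAI behavior.

```rust
/// Maps a spoken digit word to its character, ignoring case.
fn digit_for(word: &str) -> Option<char> {
    match word.to_lowercase().as_str() {
        "zero" => Some('0'),
        "one" => Some('1'),
        "two" => Some('2'),
        "three" => Some('3'),
        "four" => Some('4'),
        "five" => Some('5'),
        "six" => Some('6'),
        "seven" => Some('7'),
        "eight" => Some('8'),
        "nine" => Some('9'),
        _ => None,
    }
}

/// Emits the pending run: as joined digits if it has at least two
/// words, otherwise as the original word(s).
fn flush_run(run: &mut Vec<&str>, out: &mut Vec<String>) {
    if run.len() >= 2 {
        out.push(run.iter().filter_map(|w| digit_for(w)).collect());
    } else {
        out.extend(run.iter().map(|w| w.to_string()));
    }
    run.clear();
}

/// Converts runs of consecutive digit words into digit strings.
fn convert_digit_runs(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut run: Vec<&str> = Vec::new();
    for word in input.split_whitespace() {
        if digit_for(word).is_some() {
            run.push(word);
        } else {
            flush_run(&mut run, &mut out);
            out.push(word.to_string());
        }
    }
    flush_run(&mut run, &mut out);
    out.join(" ")
}
```

Capitalization, as in "Microphone testing 1234", is left to a separate casing step. Compound numbers such as "twenty five" need a different rule and are not covered here.
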
⚙️ Settings

Access via VoiceAI app → Open Settings:

| Setting | Description |
| --- | --- |
| ⏱️ 30-Second Limit | Auto-stop dictation after 30 seconds |
| 🔇 Silence Detection | Auto-stop when you stop speaking |
| 📖 Personal Dictionary | Add custom words (e.g., @Groq, ChatGPT, Anthropic) |
| 🧠 Offline LLM | Enable on-device AI post-processing |
| 🔑 Groq API Key | Optional cloud LLM for enhanced formatting |
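
A Personal Dictionary of the kind listed above can be sketched as a case-insensitive word lookup. The struct layout and the method names `add` and `apply` are hypothetical, and the example handles single whole words only. How VoiceAI stores and applies its entries is not described in this README.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary: lowercase spoken form -> preferred written form.
struct PersonalDictionary {
    entries: HashMap<String, String>,
}

impl PersonalDictionary {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Registers a replacement; matching is case-insensitive.
    fn add(&mut self, spoken: &str, written: &str) {
        self.entries.insert(spoken.to_lowercase(), written.to_string());
    }

    /// Replaces every whole word that has an entry; other words pass through.
    fn apply(&self, text: &str) -> String {
        text.split_whitespace()
            .map(|word| {
                self.entries
                    .get(&word.to_lowercase())
                    .map(String::as_str)
                    .unwrap_or(word)
            })
            .collect::<Vec<&str>>()
            .join(" ")
    }
}
```

After `add("groq", "@Groq")` and `add("anthropic", "Anthropic")`, applying the dictionary to "ask groq and anthropic today" gives "ask @Groq and Anthropic today". Words with punctuation attached, such as "groq,", would not match in this simplified version.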

🧠 AI Processing Options

VoiceAI offers two AI processing modes for intelligent text formatting:

Option 1: Offline LLM (Recommended) 🔒

Fully private, no internet required

| Model | Size | Source |
| --- | --- | --- |
| Qwen3 0.6B Q4 | ~405 MB | Hugging Face |

Features:

  • ✅ Filler word removal ("um", "uh", "like")
  • ✅ Grammar corrections (contractions, "i" → "I")
  • ✅ Smart punctuation and question detection
  • ✅ Sequential digit conversion ("one two three" → "123")
  • ✅ Currency formatting ("$30 million USD")

Setup: Settings → Offline Processing → Download (~405 MB, one-time)

Option 2: Groq API (Cloud) ☁️

Faster, more accurate, requires internet

| Provider | Model | Speed |
| --- | --- | --- |
| Groq | Llama 3.1 70B | ~500 ms |

Setup:

  1. Get a free API key at console.groq.com/keys
  2. Paste it in Settings → API Key

💡 Tip: Use Offline LLM for privacy, Groq API for best quality


🏗️ Architecture

VoiceAIPipeline (Chain of Responsibility)
├── CommandInterpreter     # Voice commands first
├── CourseCorrector        # "No wait" handling
├── RepetitionCleaner      # Stutter removal
├── PersonalDictionary     # Custom words
├── FillerRemover          # "Uh", "um" removal
├── NumberNormalizer       # "25" from "twenty five"
├── PunctuationRestorer    # Add periods, commas
└── CasingApplicator       # Proper nouns
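
The tree above lists stages that run in order, each receiving the previous stage's output. A minimal sketch of that shape is shown here, with two toy stages standing in for the real ones. The `Stage` trait, the `run` method, and `default_pipeline` are hypothetical names, and the stage bodies are deliberately simplistic rather than copies of the project's code.

```rust
/// One text-processing step in the chain.
trait Stage {
    fn process(&self, text: String) -> String;
}

/// Toy stand-in for FillerRemover: drops standalone "um" and "uh".
struct FillerRemover;

impl Stage for FillerRemover {
    fn process(&self, text: String) -> String {
        text.split_whitespace()
            .filter(|w| !matches!(w.to_lowercase().as_str(), "um" | "uh"))
            .collect::<Vec<_>>()
            .join(" ")
    }
}

/// Toy stand-in for CasingApplicator: capitalizes the first letter only.
struct CasingApplicator;

impl Stage for CasingApplicator {
    fn process(&self, text: String) -> String {
        let mut chars = text.chars();
        match chars.next() {
            Some(first) => first.to_uppercase().chain(chars).collect(),
            None => String::new(),
        }
    }
}

/// Runs every stage in order, feeding each output into the next stage.
struct VoiceAiPipeline {
    stages: Vec<Box<dyn Stage>>,
}

impl VoiceAiPipeline {
    fn run(&self, input: &str) -> String {
        self.stages
            .iter()
            .fold(input.to_string(), |text, stage| stage.process(text))
    }
}

/// Builds a pipeline with the two toy stages, in the order the tree lists them.
fn default_pipeline() -> VoiceAiPipeline {
    VoiceAiPipeline {
        stages: vec![Box::new(FillerRemover), Box::new(CasingApplicator)],
    }
}
```

Calling `default_pipeline().run("uh so i was thinking um")` returns "So i was thinking". Reaching the README's "So, I was thinking" would need the punctuation and full casing stages as well. Order matters: running casing before filler removal would capitalize "uh" and then delete it, leaving "so" in lowercase.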

🛠️ Tech Stack

Frontend

  • Rust + egui: native Android UI
  • Java: Activities, Services, Accessibility

AI Models

| Component | Model | Size |
| --- | --- | --- |
| Speech-to-Text | NVIDIA Parakeet TDT 0.6B (int8) | ~470 MB |
| Offline LLM | Qwen3 0.6B Q4_K_XL | ~405 MB |
| Cloud LLM | Groq Llama 3.1 70B | API |

Build

  • Cargo: Rust package manager
  • Android SDK/NDK: native compilation
  • ONNX Runtime: neural network inference

🏗️ Building from Source

# Clone
git clone https://github.com/ai-dev-2024/VoiceAI.git
cd VoiceAI

# Download model files (required)
# From: https://huggingface.co/nvidia/parakeet-tdt-0.6b
# Place in: assets/parakeet-tdt-0.6b-v3-int8/

# Build (Windows PowerShell)
./build.ps1

# Install
adb install -r VoiceAI-v1.2.1.apk

Requirements:

  • Android SDK (API 36)
  • Android NDK 28
  • Rust toolchain with aarch64-linux-android target
  • Parakeet TDT 0.6B model files (~600MB)

🙏 Credits & Acknowledgments

| Project | Contribution |
| --- | --- |
| transcribe-rs | Core ASR Rust library |
| FUTO Voice Input | Personal dictionary inspiration |
| Wispr Flow | Course correction concept |
| NVIDIA NeMo | Parakeet TDT speech model |
| Microsoft Phi-2 | On-device LLM (planned) |

☕ Support

If you find VoiceAI useful, consider supporting the development:

Ko-fi


📄 License

MIT License. See LICENSE for details.


VoiceAI: voice dictation, reimagined for Android.

Offline. Private. Fast.

⭐ Star this repo if you find it useful! ⭐
