🎤 VoiceAI

An offline alternative to Wispr Flow for Android

Fully local voice dictation with advanced AI post-processing

🖼️ Screenshots

Main Screen

Settings

🆕 What's New in v1.2.1

Feature	Description
🔢 Sequential Digit Conversion	"one two three four five" → `12345` (Wispr Flow-style)
💰 Enhanced Currency Formatting	"thirty million US dollars" → `$30 million USD`
🧠 Offline LLM Activation Fixed	Settings toggle now properly activates offline processing
📥 Fixed Model Download URL	The Qwen3 model (~405 MB) now downloads correctly from Hugging Face
🔄 Improved Post-Processing	Filler removal, grammar fixes, smart punctuation
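Sequential digit conversion can be sketched as collapsing any run of two or more single-digit number words into one digit string, while a lone digit word ("one apple") is left as spoken. This is a minimal illustration, not VoiceAI's actual `NumberNormalizer`; the function names and the two-word minimum are assumptions.

```rust
/// Map a spoken single-digit word to its digit (hypothetical helper).
fn digit_of(word: &str) -> Option<char> {
    let digit = match word.to_lowercase().as_str() {
        "zero" => '0',
        "one" => '1',
        "two" => '2',
        "three" => '3',
        "four" => '4',
        "five" => '5',
        "six" => '6',
        "seven" => '7',
        "eight" => '8',
        "nine" => '9',
        _ => return None,
    };
    Some(digit)
}

/// Emit the pending run: 2+ digit words become one digit string,
/// a single digit word is kept exactly as spoken.
fn flush(out: &mut Vec<String>, run_words: &mut Vec<&str>, run_digits: &mut String) {
    if run_words.len() >= 2 {
        out.push(std::mem::take(run_digits));
    } else {
        out.extend(run_words.iter().map(|w| w.to_string()));
        run_digits.clear();
    }
    run_words.clear();
}

/// Collapse runs of spoken digits, e.g. "testing one two three" -> "testing 123".
fn collapse_digit_runs(text: &str) -> String {
    let mut out = Vec::new();
    let mut run_words = Vec::new();
    let mut run_digits = String::new();
    for word in text.split_whitespace() {
        if let Some(d) = digit_of(word) {
            run_words.push(word);
            run_digits.push(d);
        } else {
            flush(&mut out, &mut run_words, &mut run_digits);
            out.push(word.to_string());
        }
    }
    flush(&mut out, &mut run_words, &mut run_digits);
    out.join(" ")
}

fn main() {
    println!("{}", collapse_digit_runs("one two three four five"));
    println!("{}", collapse_digit_runs("microphone testing one two three four"));
}
```

Requiring at least two digit words is what keeps ordinary phrases like "one more thing" untouched.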

✨ Features

Feature	Description
🔒 100% Offline	No internet required — all processing on-device
⚡ Fast Transcription	NVIDIA Parakeet TDT 0.6B model (int8 quantized)
🧠 Offline LLM	Qwen3 0.6B (~405MB) for AI post-processing without internet
🔢 Smart Numbers	"one two three four" → `1234` (phone numbers, IDs)
💰 Currency Formatting	"$30 million USD", "25%", "$100"
🎯 Course Correction	"No wait, I mean..." → Clean, corrected output
🗣️ Voice Commands	"Period", "comma", "new line", "delete that"
📖 Personal Dictionary	FUTO-style custom word replacements
⏱️ 30-Second Timer	Optional auto-stop after 30 seconds
🔇 Silence Detection	Auto-stop when you stop speaking
🌐 Universal Injection	Works with any app via Accessibility Service

📱 Compatible Keyboards

Keyboard	Status
HeliBoard	✅ Tested & Working
SwiftKey	✅ Tested & Working
OpenBoard	🔄 Should work
FlorisBoard	🔄 Should work
AnySoftKeyboard	🔄 Should work

Note: Only HeliBoard and SwiftKey have been tested. Other keyboards that provide a voice-input (microphone) button, such as the open-source keyboards listed above, should also be compatible.

🚀 Quick Start

Download & Install

1. Download VoiceAI-v1.2.1.apk from Releases.
2. Install it on your Android device.
3. Enable VoiceAI in Settings → Language & Input → Keyboards.
4. Enable the VoiceAI Accessibility Service for text injection.
5. Grant microphone permission.

Usage

1. Open any text field in any app.
2. Tap the microphone button on your keyboard.
3. Speak naturally, using voice commands if needed.
4. Tap the screen to stop, or wait for auto-stop.
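Auto-stop can be sketched as a per-frame check that combines the optional 30-second limit with an energy-based silence detector. This is a hypothetical sketch, not VoiceAI's code: the `AutoStop` type, the RMS threshold of 0.01, the 1.5-second silence window, and the rule that silence only counts after some speech has been heard are all assumptions.

```rust
/// Why recording should end (hypothetical type).
#[derive(Debug, PartialEq)]
enum StopReason {
    TimeLimit,
    Silence,
}

/// Tracks elapsed time and trailing silence for one dictation session.
struct AutoStop {
    sample_rate: f32,
    max_secs: Option<f32>,     // Some(30.0) when the 30-second limit is on
    silence_secs: Option<f32>, // Some(..) when silence detection is on
    rms_threshold: f32,        // frames quieter than this count as silence
    elapsed_samples: u64,
    silent_samples: u64,
    heard_speech: bool,
}

impl AutoStop {
    fn new(sample_rate: u32, max_secs: Option<f32>, silence_secs: Option<f32>) -> Self {
        AutoStop {
            sample_rate: sample_rate as f32,
            max_secs,
            silence_secs,
            rms_threshold: 0.01, // assumed value; would need tuning per microphone
            elapsed_samples: 0,
            silent_samples: 0,
            heard_speech: false,
        }
    }

    /// Feed one frame of mono samples in [-1.0, 1.0]; returns Some(reason) to stop.
    fn push_frame(&mut self, frame: &[f32]) -> Option<StopReason> {
        if frame.is_empty() {
            return None;
        }
        let energy: f32 = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        let rms = energy.sqrt();
        self.elapsed_samples += frame.len() as u64;
        if rms < self.rms_threshold {
            self.silent_samples += frame.len() as u64;
        } else {
            self.silent_samples = 0;
            self.heard_speech = true;
        }
        if let Some(max) = self.max_secs {
            if self.elapsed_samples as f32 / self.sample_rate >= max {
                return Some(StopReason::TimeLimit);
            }
        }
        if let Some(limit) = self.silence_secs {
            // Only stop on silence after the user has started speaking.
            if self.heard_speech && self.silent_samples as f32 / self.sample_rate >= limit {
                return Some(StopReason::Silence);
            }
        }
        None
    }
}

fn main() {
    // 16 kHz audio in 30 ms frames (480 samples): about 1 s of speech, then silence.
    let mut detector = AutoStop::new(16_000, Some(30.0), Some(1.5));
    let speech = vec![0.5_f32; 480];
    let silence = vec![0.0_f32; 480];
    for _ in 0..33 {
        detector.push_frame(&speech);
    }
    for frame in 1.. {
        if let Some(reason) = detector.push_frame(&silence) {
            println!("stopped after {frame} silent frames: {reason:?}");
            break;
        }
    }
}
```

Passing `None` for either limit models the corresponding setting being switched off.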

🎯 Post-Processing Examples

Course Correction (NEW!)

You Say	VoiceAI Outputs
"Let's meet tomorrow no wait let's do Friday"	Let's do Friday.
"I think um actually never mind I mean yes"	Yes.
"Send to John no sorry to Mike"	Send to Mike.

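One way to approximate course correction is a heuristic: find the last correction phrase ("no wait", "I mean", ...), and if the correction restarts with a word already spoken ("to Mike" after "to John"), splice it in at that word; otherwise the correction replaces the whole utterance. This sketch is not VoiceAI's `CourseCorrector`; the marker list and function names are assumptions, chosen so that it reproduces the three examples in the table above.

```rust
/// Phrases that signal "discard what I just said" (assumed list).
const MARKERS: &[&[&str]] = &[
    &["no", "wait"],
    &["no", "sorry"],
    &["i", "mean"],
    &["actually", "never", "mind"],
];

/// Lowercase a word and strip surrounding commas/periods for matching.
fn norm(word: &str) -> String {
    word.trim_matches(|c: char| c == ',' || c == '.').to_lowercase()
}

/// Capitalize the first letter and make sure the sentence ends with punctuation.
fn finish_sentence(s: &str) -> String {
    let mut chars = s.chars();
    let mut out = match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => return String::new(),
    };
    if !out.ends_with(|c: char| matches!(c, '.' | '?' | '!')) {
        out.push('.');
    }
    out
}

fn correct_course(text: &str) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    let lower: Vec<String> = words.iter().map(|w| norm(w)).collect();

    // Locate the LAST marker as (start index, marker length).
    let mut last = None;
    for start in 0..lower.len() {
        for marker in MARKERS {
            let end = start + marker.len();
            if end <= lower.len()
                && lower[start..end].iter().zip(marker.iter()).all(|(w, m)| w.as_str() == *m)
            {
                last = Some((start, marker.len()));
            }
        }
    }
    let (start, len) = match last {
        Some(hit) => hit,
        None => return text.to_string(), // nothing to correct
    };

    let head = &words[..start];
    let tail = &words[start + len..];
    if tail.is_empty() {
        return finish_sentence(&head.join(" "));
    }
    // Splice at the last earlier occurrence of the correction's first word;
    // if that word was never spoken, the correction replaces everything.
    let restart = &lower[start + len];
    let keep = match lower[..start].iter().rposition(|w| w == restart) {
        Some(i) => &head[..i],
        None => &head[..0],
    };
    let mut result: Vec<&str> = keep.to_vec();
    result.extend_from_slice(tail);
    finish_sentence(&result.join(" "))
}

fn main() {
    for s in ["Let's meet tomorrow no wait let's do Friday", "Send to John no sorry to Mike"] {
        println!("{}", correct_course(s));
    }
}
```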
Voice Commands (NEW!)

You Say	VoiceAI Does
"Hello comma how are you question mark"	Hello, how are you?
"New paragraph"	Inserts paragraph break
"Delete that"	Removes last dictation

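Spoken punctuation commands can be sketched as a phrase lookup that attaches the symbol directly to the preceding word instead of inserting a space. The command table and `apply_commands` are hypothetical; "delete that" is left out because it needs access to the previous dictation, not just the current text.

```rust
/// Spoken command words -> text to insert (assumed command set).
const COMMANDS: &[(&[&str], &str)] = &[
    (&["question", "mark"], "?"),
    (&["exclamation", "mark"], "!"),
    (&["new", "paragraph"], "\n\n"),
    (&["new", "line"], "\n"),
    (&["comma"], ","),
    (&["period"], "."),
];

fn apply_commands(text: &str) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut out = String::new();
    let mut i = 0;
    'scan: while i < words.len() {
        for (phrase, symbol) in COMMANDS {
            let end = i + phrase.len();
            if end <= words.len()
                && words[i..end]
                    .iter()
                    .zip(phrase.iter())
                    .all(|(w, p)| w.eq_ignore_ascii_case(p))
            {
                // Punctuation and line breaks attach to the preceding text,
                // so no separating space is added before them.
                out.push_str(symbol);
                i = end;
                continue 'scan;
            }
        }
        // Ordinary word: separate it with a space unless we are at the start of a line.
        if !out.is_empty() && !out.ends_with('\n') {
            out.push(' ');
        }
        out.push_str(words[i]);
        i += 1;
    }
    out
}

fn main() {
    println!("{}", apply_commands("Hello comma how are you question mark"));
}
```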
Smart Formatting

You Say	VoiceAI Outputs
"twenty five percent"	25%
"thirty million US dollars"	$30 million USD
"one hundred dollars"	$100
"microphone testing one two three four"	Microphone testing 1234
"twenty twenty four"	2024
"four twenty pm"	4:20 PM
"uh so i was thinking um"	So, I was thinking

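The currency and percentage rows can be sketched with a small spoken-cardinal parser that only rewrites a number when it is followed by "percent", "dollars", or "US dollars", and keeps "million"/"billion" as words to match the `$30 million USD` style. This is an assumption-laden sketch, not VoiceAI's `NumberNormalizer`: the function names are hypothetical, and years ("twenty twenty four") and times ("four twenty pm") are deliberately out of scope.

```rust
/// Value of "zero".."nineteen" or a tens word like "thirty" (hypothetical helper).
fn small_number(word: &str) -> Option<u64> {
    const UNITS: [&str; 20] = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
        "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
        "eighteen", "nineteen",
    ];
    const TENS: [&str; 8] = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];
    if let Some(i) = UNITS.iter().position(|w| *w == word) {
        return Some(i as u64);
    }
    TENS.iter().position(|w| *w == word).map(|i| (i as u64 + 2) * 10)
}

/// Parse a cardinal starting at words[i]: (value, kept scale word, words used).
fn parse_number(words: &[String], i: usize) -> Option<(u64, Option<&'static str>, usize)> {
    let (mut total, mut group) = (0u64, 0u64);
    let mut last_small: Option<u64> = None;
    let mut j = i;
    while j < words.len() {
        let w = words[j].as_str();
        if let Some(v) = small_number(w) {
            // Only "tens then unit" may follow another number word ("twenty five").
            if let Some(prev) = last_small {
                if !(prev >= 20 && prev % 10 == 0 && (1..10).contains(&v)) {
                    break;
                }
            }
            group += v;
            last_small = Some(v);
        } else if w == "hundred" && (1..100).contains(&group) {
            group *= 100;
            last_small = None;
        } else if w == "thousand" && group > 0 {
            total += group * 1000;
            group = 0;
            last_small = None;
        } else if (w == "million" || w == "billion") && total + group > 0 {
            let scale = if w == "million" { "million" } else { "billion" };
            return Some((total + group, Some(scale), j + 1 - i));
        } else {
            break;
        }
        j += 1;
    }
    if j == i { None } else { Some((total + group, None, j - i)) }
}

fn format_amounts(text: &str) -> String {
    let original: Vec<&str> = text.split_whitespace().collect();
    let lower: Vec<String> = original.iter().map(|w| w.to_lowercase()).collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < lower.len() {
        if let Some((value, scale, used)) = parse_number(&lower, i) {
            let amount = match scale {
                Some(s) => format!("{value} {s}"),
                None => value.to_string(),
            };
            let next = i + used;
            let after: Vec<&str> = lower[next..].iter().take(2).map(String::as_str).collect();
            // (replacement, number of suffix words consumed)
            let formatted = match after.as_slice() {
                ["percent", ..] => Some((format!("{amount}%"), 1)),
                ["us", "dollars", ..] => Some((format!("${amount} USD"), 2)),
                ["dollars", ..] => Some((format!("${amount}"), 1)),
                _ => None,
            };
            if let Some((replacement, suffix_len)) = formatted {
                out.push(replacement);
                i = next + suffix_len;
                continue;
            }
        }
        // Not an amount: keep the original word untouched.
        out.push(original[i].to_string());
        i += 1;
    }
    out.join(" ")
}

fn main() {
    println!("{}", format_amounts("thirty million US dollars"));
}
```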
⚙️ Settings

Open the VoiceAI app and tap Open Settings:

Setting	Description
⏱️ 30-Second Limit	Auto-stop dictation after 30 seconds
🔇 Silence Detection	Auto-stop when you stop speaking
📖 Personal Dictionary	Add custom words (e.g., `Groq`, `ChatGPT`, `Anthropic`)
🧠 Offline LLM	Enable on-device AI post-processing
🔑 Groq API Key	Optional cloud LLM for enhanced formatting
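A FUTO-style personal dictionary can be sketched as case-insensitive, whole-word phrase replacement that tries longer spoken phrases first. The `PersonalDictionary` type and the sample entries (for example, mapping a misheard "grok" to "Groq") are hypothetical, and words with attached punctuation are not matched in this simplified version.

```rust
/// Spoken phrase -> written form, matched case-insensitively on whole words.
struct PersonalDictionary {
    // (lowercased spoken words, replacement), longest phrase first
    entries: Vec<(Vec<String>, String)>,
}

impl PersonalDictionary {
    fn new(pairs: &[(&str, &str)]) -> Self {
        let mut entries: Vec<(Vec<String>, String)> = pairs
            .iter()
            .map(|(spoken, written)| {
                let words: Vec<String> = spoken.split_whitespace().map(str::to_lowercase).collect();
                (words, written.to_string())
            })
            .collect();
        // Try longer phrases first so a multi-word entry wins over a shorter overlapping one.
        entries.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
        PersonalDictionary { entries }
    }

    fn apply(&self, text: &str) -> String {
        let words: Vec<&str> = text.split_whitespace().collect();
        let mut out: Vec<String> = Vec::new();
        let mut i = 0;
        'scan: while i < words.len() {
            for (spoken, written) in &self.entries {
                let end = i + spoken.len();
                if !spoken.is_empty()
                    && end <= words.len()
                    && words[i..end].iter().zip(spoken).all(|(w, s)| w.to_lowercase() == *s)
                {
                    out.push(written.clone());
                    i = end;
                    continue 'scan;
                }
            }
            out.push(words[i].to_string());
            i += 1;
        }
        out.join(" ")
    }
}

fn main() {
    let dict = PersonalDictionary::new(&[("chat gpt", "ChatGPT"), ("grok", "Groq")]);
    println!("{}", dict.apply("ask chat gpt and grok"));
}
```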

🧠 AI Processing Options

VoiceAI offers two AI processing modes for intelligent text formatting:

Option 1: Offline LLM (Recommended) 🔒

Fully private, no internet required

Model	Size	Source
Qwen3 0.6B Q4	~405 MB	Hugging Face

Features:

✅ Filler word removal ("um", "uh", "like")
✅ Grammar corrections (contractions, "i" → "I")
✅ Smart punctuation and question detection
✅ Sequential digit conversion ("one two three" → "123")
✅ Currency formatting ("$30 million USD")

Setup: Settings → Offline Processing → Download (~405 MB, one-time)

Option 2: Groq API (Cloud) ☁️

Faster and more accurate, but requires an internet connection

Provider	Model	Speed
Groq	Llama 3.1 70B	~500ms

Setup:

1. Get a free API key at console.groq.com/keys.
2. Paste it in Settings → Groq API Key.

💡 Tip: Use the offline LLM for privacy, or the Groq API for the best quality.

🏗️ Architecture

VoiceAIPipeline (Chain of Responsibility)
├── CommandInterpreter     # Voice commands first
├── CourseCorrector        # "No wait" handling
├── RepetitionCleaner      # Stutter removal
├── PersonalDictionary     # Custom words
├── FillerRemover          # "Uh", "um" removal
├── NumberNormalizer       # "25" from "twenty five"
├── PunctuationRestorer    # Add periods, commas
└── CasingApplicator       # Proper nouns
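In a Chain of Responsibility, each stage either transforms the text and passes it on, or handles the input completely and stops the chain. The sketch below assumes a hypothetical `Stage` trait and `Flow` enum rather than VoiceAI's actual types, and implements three simplified stages named after the tree above: `CommandInterpreter` consumes "delete that", `FillerRemover` drops "um"/"uh", and `CasingApplicator` fixes a standalone "i" and the first letter.

```rust
/// What a stage tells the pipeline to do next (hypothetical type).
enum Flow {
    Continue(String), // hand the (possibly rewritten) text to the next stage
    Stop(String),     // input fully handled here; skip the remaining stages
}

/// One link in the chain (hypothetical trait, not VoiceAI's actual API).
trait Stage {
    fn process(&self, text: String) -> Flow;
}

struct CommandInterpreter;
impl Stage for CommandInterpreter {
    fn process(&self, text: String) -> Flow {
        // A whole-utterance command is consumed here; later stages never see it.
        if text.trim().eq_ignore_ascii_case("delete that") {
            Flow::Stop(String::new())
        } else {
            Flow::Continue(text)
        }
    }
}

struct FillerRemover;
impl Stage for FillerRemover {
    fn process(&self, text: String) -> Flow {
        let kept: Vec<&str> = text
            .split_whitespace()
            .filter(|w| !matches!(w.to_lowercase().as_str(), "um" | "uh"))
            .collect();
        Flow::Continue(kept.join(" "))
    }
}

struct CasingApplicator;
impl Stage for CasingApplicator {
    fn process(&self, text: String) -> Flow {
        let words: Vec<&str> = text
            .split_whitespace()
            .map(|w| if w == "i" { "I" } else { w })
            .collect();
        let joined = words.join(" ");
        let mut chars = joined.chars();
        let out = match chars.next() {
            Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
            None => String::new(),
        };
        Flow::Continue(out)
    }
}

struct Pipeline {
    stages: Vec<Box<dyn Stage>>,
}

impl Pipeline {
    fn run(&self, input: &str) -> String {
        let mut text = input.to_string();
        for stage in &self.stages {
            match stage.process(text) {
                Flow::Continue(next) => text = next,
                Flow::Stop(done) => return done,
            }
        }
        text
    }
}

fn main() {
    let pipeline = Pipeline {
        stages: vec![
            Box::new(CommandInterpreter),
            Box::new(FillerRemover),
            Box::new(CasingApplicator),
        ],
    };
    println!("{}", pipeline.run("uh so i was thinking um"));
}
```

Because stages are boxed trait objects, reordering a stage or leaving one out (for example, when a setting is off) only changes the `Vec`. Putting the command interpreter first, as in the tree above, lets it short-circuit before any text cleanup runs.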

🛠️ Tech Stack

Frontend

Rust + egui — Native Android UI
Java — Activities, Services, Accessibility

AI Models

Component	Model	Size
Speech-to-Text	NVIDIA Parakeet TDT 0.6B (int8)	~470 MB
Offline LLM	Qwen3 0.6B Q4_K_XL	~405 MB
Cloud LLM	Groq Llama 3.1 70B	API

Build

Cargo — Rust package manager
Android SDK/NDK — Native compilation
ONNX Runtime — Neural network inference

🏗️ Building from Source

```powershell
# Clone
git clone https://github.com/ai-dev-2024/VoiceAI.git
cd VoiceAI

# Download model files (required)
# From: https://huggingface.co/nvidia/parakeet-tdt-0.6b
# Place in: assets/parakeet-tdt-0.6b-v3-int8/

# Build (Windows PowerShell)
./build.ps1

# Install
adb install -r VoiceAI-v1.2.1.apk
```

Requirements:

Android SDK (API 36)
Android NDK 28
Rust toolchain with the `aarch64-linux-android` target (`rustup target add aarch64-linux-android`)
Parakeet TDT 0.6B model files (~600MB)

🙏 Credits & Acknowledgments

Project	Contribution
transcribe-rs	Core ASR Rust library
FUTO Voice Input	Personal dictionary inspiration
Wispr Flow	Course correction concept
NVIDIA NeMo	Parakeet TDT speech model
Microsoft Phi-2	On-device LLM (planned)

☕ Support

If you find VoiceAI useful, consider supporting the development:

📄 License

MIT License — See LICENSE for details.

VoiceAI — Voice dictation, reimagined for Android.

Offline. Private. Fast.

⭐ Star this repo if you find it useful! ⭐
