AI-Powered Speech-to-Text with Local Processing
VoiceTransor is a desktop application that converts audio to text using OpenAI's Whisper model, with optional AI text processing powered by Ollama. Everything runs locally on your computer - your data never leaves your machine.
中文版说明 (Chinese README) | Developer Guide
- Accurate Speech Recognition - Powered by OpenAI Whisper
- GPU Acceleration - Automatic CUDA/MPS detection for faster processing
- AI Text Processing - Summarize, translate, or process transcripts with Ollama
- Multiple Export Formats - Save as TXT or PDF
- Privacy First - All processing happens locally
- Cross-Platform - Windows, macOS, and Linux support
- Multiple Languages - Supports nearly 100 languages
Minimum:
- Windows 10 / macOS 10.15 / Linux
- 8GB RAM
- 5GB free disk space
Recommended for GPU Acceleration:
- NVIDIA GPU (GTX 900 series or newer)
- Driver version >= 525.60 (for CUDA support)
Note: GPU is optional - the app automatically detects your hardware and falls back to CPU if needed.
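The auto-detection described above can be sketched with PyTorch's device queries. This is a simplified illustration of the idea, not VoiceTransor's actual code:

```python
def pick_device() -> str:
    """Pick the best available compute device, falling back to CPU.

    Mirrors the auto-detection described above: CUDA for NVIDIA GPUs,
    MPS (Metal) for Apple Silicon, CPU otherwise.
    """
    try:
        import torch  # PyTorch underlies Whisper-based transcription
    except ImportError:
        return "cpu"  # no PyTorch at all -> CPU-only processing

    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU with a compatible driver
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon (Metal Performance Shaders)
    return "cpu"

print(pick_device())
```

Because the fallback chain always ends at `"cpu"`, the same build runs everywhere, which is why no separate CPU/GPU installers are needed.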
Windows:
- Download `VoiceTransor-v0.9.0-Windows-x64-Setup.exe` from Releases
- Run the installer and follow the setup wizard
- Launch VoiceTransor from the Start Menu or Desktop shortcut
macOS / Linux:
- Coming soon - Currently only Windows installer is available
- For development setup, see Developer Guide
Important: You also need to install FFmpeg (see below).
VoiceTransor needs FFmpeg for audio processing.
Windows:
- Download: https://www.gyan.dev/ffmpeg/builds/
- Choose "ffmpeg-release-essentials.zip"
- Extract and add to PATH (How?)
macOS:
```
brew install ffmpeg
```

Linux:

```
sudo apt install ffmpeg  # Ubuntu/Debian
```

- Launch VoiceTransor
- Click "Import Audio" (supports WAV, MP3, M4A, FLAC, etc.)
- Click "Transcribe to Text"
- Choose settings:
  - Model: `base` (recommended for most users)
  - Device: `auto` (automatically uses GPU if available)
  - Language: Auto-detect or select a specific language
- Wait for transcription (the first run downloads the model, ~140 MB for `base`)
- Export as TXT or process with AI
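For the curious, the settings above correspond closely to the open-source `whisper` Python package. The helper below is hypothetical (not part of VoiceTransor) and just shows how the GUI's "Auto-detect" language choice maps onto Whisper's keyword arguments:

```python
def build_transcribe_kwargs(language: str = "Auto-detect") -> dict:
    """Translate the GUI's Language setting into Whisper keyword arguments.

    Whisper auto-detects the spoken language when no `language` kwarg is
    passed, so "Auto-detect" in the UI simply means omitting it.
    """
    kwargs = {}
    if language and language.lower() not in ("auto", "auto-detect"):
        kwargs["language"] = language
    return kwargs

# Roughly what happens under the hood (requires `pip install openai-whisper`):
#   import whisper
#   model = whisper.load_model("base")   # ~140 MB, downloaded on first use
#   result = model.transcribe("recording.mp3", **build_transcribe_kwargs("en"))
#   print(result["text"])

print(build_transcribe_kwargs())      # {}
print(build_transcribe_kwargs("en"))  # {'language': 'en'}
```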
Ollama enables AI-powered text processing (summarize, translate, etc.) - completely offline.
Installation:
- Download from: https://ollama.com/download
- Install and run:

```
ollama serve
```

- Pull a model:

```
ollama pull llama3.1:8b   # English
ollama pull qwen2.5:7b    # Chinese/English
```
Note: Ollama works on both CPU and GPU - no special setup needed.
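Ollama also exposes a local REST API (default port 11434, endpoint `/api/generate`), which is how desktop apps can drive it. Below is a minimal standard-library sketch, not VoiceTransor's actual client; it returns `None` when the server isn't running:

```python
import json
import urllib.error
import urllib.request
from typing import Optional

def ollama_generate(prompt: str, model: str = "llama3.1:8b",
                    host: str = "http://localhost:11434") -> Optional[str]:
    """Send one prompt to a local Ollama server; None if it is unreachable."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)["response"]
    except (urllib.error.URLError, OSError):
        return None  # "Ollama is not running" -> start it with `ollama serve`

summary = ollama_generate("Summarize: ...")
print("Ollama unreachable" if summary is None else summary)
```

Because everything goes through `localhost`, the prompt and transcript never leave your machine.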
Transcription Speed (1 hour audio):
| Hardware | Time |
|---|---|
| CPU (8-core) | ~30-60 min |
| NVIDIA RTX 3060 | ~2-5 min |
| Apple M1 Pro | ~3-6 min |
GPU Compatibility:
- ✅ Single universal build - works on both GPU and CPU systems
- ✅ Automatic detection - no manual configuration needed
- ✅ NVIDIA GPUs (GTX 900+, RTX series) - uses CUDA acceleration
- ✅ Apple Silicon (M1/M2/M3) - uses Metal Performance Shaders
- ✅ Graceful fallback to CPU if GPU unavailable
- ℹ️ No separate CPU/GPU versions - one installer works for everyone
- User Guide - Detailed usage instructions
- Installation Guide - Step-by-step installation
- Build Instructions - For developers
- 中文文档 - Chinese documentation
"ffprobe not found"
- Install FFmpeg and ensure it's in your PATH
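To confirm FFmpeg is actually reachable on your PATH, a quick Python check (the same kind of lookup the error message implies):

```python
import shutil

def ffmpeg_on_path() -> bool:
    """Return True if both ffmpeg and ffprobe resolve on PATH.

    ffprobe ships with FFmpeg and is used to inspect audio files,
    so both binaries must be findable.
    """
    return (shutil.which("ffmpeg") is not None
            and shutil.which("ffprobe") is not None)

print("FFmpeg found" if ffmpeg_on_path() else "Install FFmpeg and add it to PATH")
```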
Slow transcription
- Use a smaller model (`tiny` or `base`)
- Check that the Device setting is `auto` or `cuda`
- Ensure GPU drivers are up to date
"Ollama is not running"
- Open a terminal and run `ollama serve`
For more help, see USER_GUIDE.md
MIT License. See LICENSE for details.
- GitHub Issues: https://github.com/leonshen/VoiceTransor/issues
- Email: voicetransor@gmail.com
Made with ❤️ using OpenAI Whisper and Ollama