Skip to content

AI-powered speech-to-text desktop app using Whisper and Ollama. 100% local processing, privacy-first.

License

Notifications You must be signed in to change notification settings

leonshen/VoiceTransor

Repository files navigation

VoiceTransor

Release License Platform Python

AI-Powered Speech-to-Text with Local Processing

VoiceTransor is a desktop application that converts audio to text using OpenAI's Whisper model, with optional AI text processing powered by Ollama. Everything runs locally on your computer - your data never leaves your machine.

中文版说明 (Chinese README) | Developer Guide

Features

  • Accurate Speech Recognition - Powered by OpenAI Whisper
  • GPU Acceleration - Automatic CUDA/MPS detection for faster processing
  • AI Text Processing - Summarize, translate, or process transcripts with Ollama
  • Multiple Export Formats - Save as TXT or PDF
  • Privacy First - All processing happens locally
  • Cross-Platform - Windows, macOS, and Linux support
  • Multiple Languages - Supports 100 languages

Quick Start

1. System Requirements

Minimum:

  • Windows 10 / macOS 10.15 / Linux
  • 8GB RAM
  • 5GB free disk space

Recommended for GPU Acceleration:

  • NVIDIA GPU (GTX 900 series or newer)
  • Driver version >= 525.60 (for CUDA support)

Note: GPU is optional - the app automatically detects your hardware and falls back to CPU if needed.

2. Installation

Windows:

  1. Download VoiceTransor-v0.9.0-Windows-x64-Setup.exe from Releases
  2. Run the installer and follow the setup wizard
  3. Launch VoiceTransor from the Start Menu or Desktop shortcut

macOS / Linux:

  • Coming soon - Currently only Windows installer is available
  • For development setup, see Developer Guide

Important: You also need to install FFmpeg (see below).

3. Install FFmpeg (Required)

VoiceTransor needs FFmpeg for audio processing.

Windows:

macOS:

brew install ffmpeg

Linux:

sudo apt install ffmpeg  # Ubuntu/Debian

4. First Use

  1. Launch VoiceTransor
  2. Click "Import Audio" (supports WAV, MP3, M4A, FLAC, etc.)
  3. Click "Transcribe to Text"
  4. Choose settings:
    • Model: base (recommended for most users)
    • Device: auto (automatically uses GPU if available)
    • Language: Auto-detect or select specific language
  5. Wait for transcription (first time downloads the model ~140MB)
  6. Export as TXT or process with AI

Optional: Install Ollama

Ollama enables AI-powered text processing (summarize, translate, etc.) - completely offline.

Installation:

  1. Download from: https://ollama.com/download
  2. Install and run ollama serve
  3. Pull a model:
    ollama pull llama3.1:8b  # English
    ollama pull qwen2.5:7b   # Chinese/English

Note: Ollama works on both CPU and GPU - no special setup needed.

Performance

Transcription Speed (1 hour audio):

Hardware Time
CPU (8-core) ~30-60 min
NVIDIA RTX 3060 ~2-5 min
Apple M1 Pro ~3-6 min

GPU Compatibility:

  • Single universal build - works on both GPU and CPU systems
  • ✅ Automatic detection - no manual configuration needed
  • ✅ NVIDIA GPUs (GTX 900+, RTX series) - uses CUDA acceleration
  • ✅ Apple Silicon (M1/M2/M3) - uses Metal Performance Shaders
  • ✅ Graceful fallback to CPU if GPU unavailable
  • ℹ️ No separate CPU/GPU versions - one installer works for everyone

Documentation

Troubleshooting

"ffprobe not found"

  • Install FFmpeg and ensure it's in your PATH

Slow transcription

  • Use smaller model (tiny or base)
  • Check Device setting is auto or cuda
  • Ensure GPU drivers are up to date

"Ollama is not running"

  • Open terminal and run ollama serve

For more help, see USER_GUIDE.md

License

MIT License. See LICENSE for details.

Support


Made with ❤️ using OpenAI Whisper and Ollama

About

AI-powered speech-to-text desktop app using Whisper and Ollama. 100% local processing, privacy-first.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •