A streamlined Progressive Web App (PWA) for voice transcription that combines speech recognition with AI text formatting - now with enhanced mobile support and an improved user experience!
✨ Streamlined Settings Interface: New tabbed configuration panel with organized sections for Whisper, LLM, and system settings
🎯 Quick Setup Presets: One-click configuration presets for popular setups (Local, Groq Cloud, Hybrid)
📱 Mobile-Optimized Design: Responsive layout that works beautifully on phones, tablets, and desktops
🚀 Progressive Web App (PWA): Install it like a native app on any device and use it offline
⚡ Enhanced Performance: Better layout management and optimized for all screen sizes
Visit: https://humanface-tech.github.io/whisper-recorder-ui/
On Desktop (Chrome/Edge/Safari):
- Visit the web app URL
- Look for the "Install" button in the address bar
- Click "Install" to add it to your desktop
- Launch from your desktop like any native app
On Mobile (iOS/Android):
- Open the web app in your mobile browser
- iOS: Tap the "Share" button → "Add to Home Screen"
- Android: Tap the menu (⋮) → "Add to Home Screen" or "Install App"
- The app will appear on your home screen
On Desktop (Alternative):
- Click the browser menu (⋮)
- Select "Install Whisper Recorder UI..."
- Confirm installation
- Download the repository files
- Open `index.html` in any modern browser
- No server required: the app runs entirely in your browser!
- 🎤 Recording: Captures high-quality audio directly from your browser using the Web Audio API
- 🔤 Transcription: Sends the audio to a Whisper ASR service (local or cloud-based) for speech-to-text conversion
- 🤖 AI Processing: Sends the raw transcription to an LLM for intelligent formatting, grammar fixes, and text refinement
- 📋 Output: Displays both raw transcription and processed text with one-click copy to clipboard
The entire app runs client-side with no backend dependencies, making it completely portable and privacy-friendly.
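The four steps above can be sketched as a tiny pipeline. This is an illustrative Python version only (the real app does all of this in client-side JavaScript); `run_pipeline` and the injected callables are hypothetical names used so the flow can be demonstrated offline, not part of the app's code:

```python
def run_pipeline(audio_bytes, send_to_whisper, send_to_llm):
    """Mirror the app's flow: audio -> raw transcript -> LLM-formatted text.

    send_to_whisper / send_to_llm stand in for the HTTP calls the app
    makes with fetch(); injecting them keeps this sketch testable offline.
    """
    raw = send_to_whisper(audio_bytes)           # step 2: speech-to-text
    formatted = send_to_llm(raw)                 # step 3: LLM cleanup
    return {"raw": raw, "formatted": formatted}  # step 4: show both outputs

# Offline demo with stubs standing in for the Whisper and LLM services:
demo = run_pipeline(
    b"\x00\x01",  # pretend audio
    send_to_whisper=lambda audio: "hello world um i mean hi",
    send_to_llm=lambda text: text.replace("um i mean ", "").capitalize(),
)
```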
For detailed technical background, read the Voice to Code - Blog Article.
v2 introduces one-click configuration presets for the most popular setups. Just click the ⚙️ settings icon and choose:
- 🏠 Local Setup: Use locally running Whisper + Ollama services
- ☁️ Groq Cloud: Full cloud setup with Groq's APIs (fastest to get started)
- 🔄 Hybrid: Local Whisper + cloud LLM (best of both worlds)
This configuration uses locally running instances of Whisper ASR and Ollama for LLM processing:
1. Install and Run a Local Whisper ASR Service:

   ```bash
   # Using Docker
   docker pull onerahmet/openai-whisper-asr-webservice:latest
   docker run -d -p 9100:9000 onerahmet/openai-whisper-asr-webservice:latest
   ```
2. Install and Run Ollama:

   ```bash
   # Install Ollama (macOS/Linux)
   curl -fsSL https://ollama.com/install.sh | sh

   # Pull a model
   ollama pull qwen2.5-coder:14b

   # Start the Ollama service (it runs on port 11434 by default)
   ollama serve
   ```
3. Configure the App:

   - Whisper API Endpoint: `http://localhost:9100/asr`
   - Whisper API Format: `Local Whisper`
   - LLM Provider: `Ollama`
   - LLM Endpoint: `http://localhost:11434/api/generate`
   - LLM Model: `qwen2.5-coder:14b`
Alternatively, you can use this fork: HumanFace-Tech/whisper-asr-with-ui
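To sanity-check the LLM half of this local setup from a script, Ollama's `/api/generate` endpoint accepts a JSON body with `model`, `prompt`, and `stream` fields. A minimal sketch, assuming the endpoint and model from the steps above (`build_ollama_request` is an illustrative helper, not part of the app):

```python
import json
import urllib.request

def build_ollama_request(transcript,
                         model="qwen2.5-coder:14b",
                         endpoint="http://localhost:11434/api/generate"):
    """Build a POST request for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON response instead of chunks.
    """
    body = json.dumps({
        "model": model,
        "prompt": f"Clean up this transcript:\n\n{transcript}",
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=body,
        headers={"Content-Type": "application/json"},
    )

# Sending it requires a running Ollama instance:
# with urllib.request.urlopen(build_ollama_request("so um hello there")) as r:
#     print(json.loads(r.read())["response"])
```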
This configuration uses cloud services for both ASR and LLM processing:
1. Sign Up for Groq:

   - Create an account at groq.com
   - Generate an API key from the dashboard
   - Note: Groq offers both LLM and Whisper APIs through its OpenAI-compatible endpoints
2. Configure the App for Groq:

   - Whisper API Endpoint: `https://api.groq.com/openai/v1/audio/transcriptions`
   - Whisper API Format: `OpenAI/Groq Compatible`
   - Whisper API Key: `[Your Groq API Key]`
   - Whisper Model: `whisper-large-v3`
   - LLM Provider: `OpenAI-compatible`
   - LLM Endpoint: `https://api.groq.com/openai/v1/chat/completions`
   - LLM Model: `gemma2-9b-it` (in our tests, the best-performing and most cost-effective choice)
   - API Key: `[Your Groq API Key]`
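The LLM side of this configuration is a standard OpenAI-compatible chat completions call. A hedged Python sketch of the request the app sends, using the endpoint and model above (`build_groq_chat_request` and the system prompt wording are illustrative, not the app's actual code):

```python
import json
import urllib.request

def build_groq_chat_request(transcript, api_key,
                            model="gemma2-9b-it",
                            endpoint="https://api.groq.com/openai/v1/chat/completions"):
    """Build a POST request for an OpenAI-compatible chat completions API."""
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "Clean up and format the transcript."},
            {"role": "user", "content": transcript},
        ],
    }).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=body,
        headers={
            "Content-Type": "application/json",
            # The API key goes in a standard Bearer token header
            "Authorization": f"Bearer {api_key}",
        },
    )
```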
This combines local Whisper ASR with a cloud LLM service like Groq:
1. Install and Run a Local Whisper ASR Service (as in Option 1)

2. Configure the App:

   - Whisper API Endpoint: `http://localhost:9100/asr`
   - Whisper API Format: `Local Whisper`
   - LLM Provider: `OpenAI-compatible`
   - LLM Endpoint: `https://api.groq.com/openai/v1/chat/completions`
   - LLM Model: `gemma2-9b-it`
   - API Key: `[Your Groq API Key]`
- Transcription fails: Check that your Whisper service is running and accessible
- API Connection Issues: If connecting to external services, ensure your API key is valid and correctly entered
- LLM processing fails: Verify the model name exists and is spelled correctly
- No audio recording: Ensure your browser has microphone permissions
- Cross-Origin Errors: If using the app directly from a file, some services might block requests due to CORS policies
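For the CORS case in particular, serving the files over HTTP instead of opening `index.html` straight from disk usually resolves the problem. Any static file server will do; for example, the one bundled with Python (the port number here is arbitrary):

```shell
# From the directory containing index.html:
python3 -m http.server 8000
# Then open http://localhost:8000/ in your browser
```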
The default system prompt instructs the LLM how to format and clean up the transcribed text. You can customize this in the configuration panel to suit your specific needs:
- For more thorough corrections, emphasize grammar and spelling fixes
- For minimal interference, specify that the LLM should preserve the original wording
- For specialized formats (like code snippets), instruct the LLM to detect and format these elements properly
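As an example of the minimal-interference style, a custom system prompt might look like this (the wording is illustrative, not the app's default prompt):

```
You are a transcription cleanup assistant. Fix punctuation and obvious
speech-recognition errors, but preserve the speaker's original wording
and tone. Do not add, summarize, or reorder content.
```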
This tool is designed to be flexible and expandable. We welcome contributions to support additional ASR or LLM providers:
- Supporting New LLM Providers: Fork the repository and add handlers for new API formats
- Improving Transcription: Enhancements to audio processing or format conversion are welcome
- UI Improvements: Suggestions for better usability while maintaining the single-file approach
If you'd like to contribute or have suggestions, please feel free to fork the project or submit your ideas.
🎯 Vibe-coded with passion at HumanFace Tech - where we build tools that actually make sense and work the way you'd expect them to.
Like this project? Check out what else we're building at HumanFace Tech!
This project is licensed under the MIT License.
