feat: add voice dictation via whisper.cpp #28
gapmiss wants to merge 1 commit into my-claude-utils:main
Hold-to-talk mic button on the context strip records audio on the phone, sends it to the agent server, and transcribes it locally using whisper.cpp. Transcribed text is injected into the terminal as stdin. Everything stays local: no cloud APIs.

Server:
- POST /api/transcribe endpoint (JWT-authenticated, multipart audio via multer)
- Converts audio to 16kHz WAV via ffmpeg, runs whisper-cli, returns text
- Configurable via WHISPER_CPP_PATH and WHISPER_MODEL env vars

Client:
- useDictation hook (MediaRecorder with webm/opus + mp4/aac fallback)
- Mic button with hold-to-talk, recording/processing toast indicators
- Haptic feedback on recording start

Also fixes .ngrok-free.app missing from Vite allowedHosts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
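For readers who want a concrete picture of the server-side steps in the commit message, here is a minimal sketch of how the two command lines might be assembled. It is not the PR's actual code. The helper names (`resolveWhisperConfig`, `ffmpegArgs`, `whisperArgs`) are hypothetical, treating `WHISPER_CPP_PATH` as the path to the `whisper-cli` binary is an assumption, and so is disabling the feature when `WHISPER_MODEL` is unset. Only the env var names, the 16kHz WAV target, and the two tools come from the description.

```typescript
// Sketch only: helper names are hypothetical, not taken from the PR.
type Env = Record<string, string | undefined>;

interface WhisperConfig {
  cliPath: string; // assumed: path to the whisper-cli binary
  modelPath: string; // path to the whisper.cpp model file
}

// Returns null when WHISPER_MODEL is unset (assumed opt-in behaviour).
function resolveWhisperConfig(env: Env): WhisperConfig | null {
  const modelPath = env["WHISPER_MODEL"];
  if (!modelPath) return null;
  // Assumed default: fall back to whisper-cli on PATH.
  return { cliPath: env["WHISPER_CPP_PATH"] || "whisper-cli", modelPath };
}

// ffmpeg arguments: overwrite output, resample to 16 kHz, mono, 16-bit PCM WAV.
function ffmpegArgs(inputPath: string, wavPath: string): string[] {
  return ["-y", "-i", inputPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavPath];
}

// whisper-cli arguments: model file, input WAV, no timestamps in the output.
function whisperArgs(config: WhisperConfig, wavPath: string): string[] {
  return ["-m", config.modelPath, "-f", wavPath, "-nt"];
}
```

In a Node server these arrays could be passed to `execFile` from `node:child_process`, which avoids shell interpolation of the uploaded file's path.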
9c8b97c to 795bef0
Hey @gapmiss, thanks for the PR and for taking the time to build this out! Voice dictation via whisper.cpp is a genuinely cool idea, and the local-only approach fits well with clsh's philosophy.

That said, I'm going to close this one. The CLI is still very early and I want to keep the core lean and focused on the terminal experience for now. Adding external system dependencies (ffmpeg, whisper-cpp) and a new API surface is more than I'm comfortable merging without community discussion first.

I've opened a feature request issue so the community can weigh in. If there's enough interest, we can figure out the right way to integrate it (maybe as a plugin system down the road). If you want to use this yourself in the meantime, a fork would work great.

Appreciate the contribution!
Summary
- `WHISPER_MODEL` env var is set; zero impact otherwise

Changes
Server (`packages/agent`):
- `POST /api/transcribe` endpoint: JWT-authenticated, accepts multipart audio via `multer`, converts to WAV via `ffmpeg`, runs `whisper-cli`, returns transcribed text
- `config.ts`: added `WHISPER_CPP_PATH` and `WHISPER_MODEL` env vars

Client (`packages/web`):
- `useDictation` hook: `MediaRecorder` with webm/opus + mp4/aac iOS fallback
- `ContextStrip` with hold-to-talk, recording/processing toast indicators, haptic feedback
- `TerminalView` to send transcribed text as stdin

Other:
- `.ngrok-free.app` to Vite `allowedHosts` (was missing, only had `.ngrok-free.dev`)

Dependencies
- `multer` (new server dependency for multipart upload)
- `ffmpeg` and `whisper-cpp` on the host (both available via `brew`)

Test plan
🤖 Generated with Claude Code
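
The webm/opus with mp4/aac fallback mentioned for the `useDictation` hook can be illustrated with a small sketch. This is not the hook's actual code: the function name `pickRecordingMimeType` and the exact MIME strings are assumptions. The support check is passed in as a parameter so the selection logic can run outside a browser.

```typescript
// Sketch only: candidate strings and function name are assumptions, not from the PR.
// Order matters: prefer webm/opus, fall back to mp4 (the iOS case in the description).
const CANDIDATE_MIME_TYPES = ["audio/webm;codecs=opus", "audio/mp4"] as const;

// Returns the first candidate the recorder supports, or undefined to let
// the browser choose its own default container.
function pickRecordingMimeType(
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  return CANDIDATE_MIME_TYPES.find((mimeType) => isSupported(mimeType));
}

// Browser usage (not executed here):
//   const mimeType = pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

Because the server converts whatever it receives to WAV via `ffmpeg`, the client only needs a container that its own `MediaRecorder` can produce.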