AI-powered voice assistant with MCP integration - A fork of Whispo that transforms your voice into intelligent actions with advanced speech recognition, LLM processing, and Model Context Protocol (MCP) tool execution.
Cross-Platform Support: macOS (Apple Silicon & Intel), Windows (x64), Linux (x64)
Voice Recording:
- Hold the Ctrl key to start recording your voice
- Release Ctrl to stop recording and transcribe
- Text is automatically inserted into your active application
MCP Agent Mode:
- Hold Ctrl+Alt to start recording for agent mode
- Release Ctrl+Alt to process with MCP tools
- Watch real-time progress as the agent executes tools
- Results are automatically inserted or displayed
Text Input:
- Press Ctrl+T to open text input mode for direct typing
- Voice-to-Text: Hold Ctrl to record, release to transcribe
- Toggle Voice Dictation: Press the Fn key to start/stop recording (configurable)
- Multi-Language Support: 30+ languages including Spanish, French, German, Chinese, Japanese, Arabic, Hindi
- Text-to-Speech (TTS): AI-generated speech with 50+ voices across OpenAI, Groq, and Gemini
- Auto-Play TTS: Automatic speech playback for seamless conversations
- MCP Agent Mode: Hold Ctrl+Alt for intelligent tool execution with real-time progress
- MCP Integration: Connect to any MCP-compatible tools and services
- OAuth 2.1 Support: Secure authentication for MCP servers with deep link integration
- Tool Management: Per-server tool toggles and approval prompts
- Conversation Continuity: Context preservation across agent interactions
- Cross-Platform: macOS, Windows, and Linux support with native builds
- Rate Limit Handling: Exponential backoff retry for API rate limits (429 errors)
- Model Selection: Choose specific models for OpenAI, Groq, and Gemini providers
- Debug Modes: Comprehensive logging for LLM calls, tool execution, and TTS
- Universal Integration: Works with any text-input application
- Text Input: Press Ctrl+T for direct text input mode
- Dark/Light Themes: Toggle between dark and light modes
- Resizable Panels: Drag-to-resize interface components
- Kill Switch: Emergency stop for agent operations (Escape key)
- Conversation Management: Full conversation history with tool call visualization
Built with modern technologies for cross-platform performance:
- Electron: Main process for system integration, MCP orchestration, and TTS processing
- React + TypeScript: Modern UI with real-time progress tracking and conversation management
- Rust: High-performance keyboard monitoring and text injection across platforms
- MCP Client: Full Model Context Protocol implementation with OAuth 2.1 support
- Multi-Provider AI: OpenAI, Groq, and Gemini integration for speech, text, and TTS
Prerequisites: Node.js 18+, pnpm, Rust toolchain
# Setup
git clone https://github.com/aj47/SpeakMCP.git
cd SpeakMCP
pnpm install
pnpm build-rs # Build Rust binary for your platform
pnpm dev # Start development server
# Platform-specific builds
pnpm build # Production build for current platform
pnpm build:mac # macOS build (Apple Silicon + Intel)
pnpm build:win # Windows build (x64)
pnpm build:linux # Linux build (x64)
# Testing
pnpm test # Run test suite
pnpm test:tts # Test TTS functionality
AI Providers: OpenAI, Groq, Google Gemini
- Configure API keys and custom base URLs in settings
- Select specific models for each provider
- Multi-language speech recognition support
- TTS with 50+ voices across providers
MCP Servers: Configure tools in mcpServers JSON format:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-filesystem", "/path"]
    },
    "web-search": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-web-search"],
      "env": {"BRAVE_API_KEY": "your-key"}
    }
  }
}
Keyboard Shortcuts:
- Hold Ctrl: Voice recording (traditional mode)
- Fn Key: Toggle voice dictation (press once to start/stop)
- Hold Ctrl+Alt: MCP agent mode
- Ctrl+T: Text input mode
- Escape: Cancel/kill switch for operations
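The mcpServers configuration format described above can be checked before any server process is launched. The following is a minimal TypeScript sketch of such a check, not SpeakMCP's actual loader: the names `McpServerConfig`, `McpServers`, and `parseMcpServers` are hypothetical, and the only fields assumed are the ones that appear in the example configuration (`command`, `args`, `env`).

```typescript
// Hypothetical types mirroring the fields used in the example configuration.
interface McpServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

type McpServers = Record<string, McpServerConfig>;

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

function isStringRecord(value: unknown): value is Record<string, string> {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.values(value).every((v) => typeof v === "string")
  );
}

// Parses a JSON string and returns the validated server map.
// Throws an Error describing the first problem found.
function parseMcpServers(json: string): McpServers {
  const root: unknown = JSON.parse(json);
  if (typeof root !== "object" || root === null || !("mcpServers" in root)) {
    throw new Error('missing top-level "mcpServers" object');
  }
  const servers = (root as { mcpServers: unknown }).mcpServers;
  if (typeof servers !== "object" || servers === null || Array.isArray(servers)) {
    throw new Error('"mcpServers" must be an object keyed by server name');
  }
  const result: McpServers = {};
  for (const [name, raw] of Object.entries(servers)) {
    const cfg = raw as Partial<McpServerConfig> | null;
    if (!cfg || typeof cfg.command !== "string" || cfg.command.length === 0) {
      throw new Error(`server "${name}": "command" must be a non-empty string`);
    }
    if (cfg.args !== undefined && !isStringArray(cfg.args)) {
      throw new Error(`server "${name}": "args" must be an array of strings`);
    }
    if (cfg.env !== undefined && !isStringRecord(cfg.env)) {
      throw new Error(`server "${name}": "env" must map strings to strings`);
    }
    // Optional fields are normalized to empty defaults.
    result[name] = { command: cfg.command, args: cfg.args ?? [], env: cfg.env ?? {} };
  }
  return result;
}
```

Validating up front means a typo in one server entry surfaces as a readable error message rather than as a failed process spawn later on.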
MCP (Model Context Protocol) enables AI assistants to connect to external tools. SpeakMCP implements a full MCP client with advanced capabilities.
Enhanced Features:
- Intelligent Tool Selection: Automatically determines which tools to use
- Real-time Progress: Visual feedback with TTS narration during execution
- Conversation Continuity: Context preservation across multi-turn interactions
- OAuth 2.1 Integration: Secure authentication for MCP servers
- Rate Limit Handling: Automatic retry with exponential backoff
- Kill Switch: Emergency stop functionality with the Escape key
- Tool Management: Per-server tool toggles and approval prompts
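To illustrate the rate limit handling listed above, here is a minimal TypeScript sketch of retrying on HTTP 429 with exponential backoff. It is an assumption-laden illustration rather than SpeakMCP's implementation: `withRateLimitRetry`, `backoffDelay`, `RetryOptions`, and the convention that a rate-limited call throws a value carrying `status: 429` are all hypothetical. Jitter, which is often added in practice, is left out to keep the delays easy to follow.

```typescript
// Hypothetical error shape: any thrown value carrying a numeric `status`.
interface HttpLikeError {
  status?: number;
}

interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay for a 0-based retry attempt: base * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as HttpLikeError | null)?.status;
      // Only rate-limit errors are retried; anything else propagates at once.
      if (status !== 429 || attempt >= options.maxRetries) throw err;
      await sleep(backoffDelay(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
}
```

With `baseDelayMs: 500` and `maxDelayMs: 8000`, successive retries would wait 500 ms, 1000 ms, 2000 ms, 4000 ms, and then 8000 ms for every retry after that.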
Example commands:
- "Create a new project folder and add a README"
- "Search for latest AI news and summarize the top 3 articles"
- "Send a message to the team about today's progress"
- "Analyze this codebase and suggest improvements"
Recent Major Features:
- 50+ AI Voices: OpenAI (6 voices), Groq (23 voices), Gemini (30+ voices)
- Auto-Play: Seamless conversation flow with automatic speech playback
- Smart Preprocessing: Converts code blocks, URLs, and markdown to natural speech
- Multi-Language: Support for 30+ languages with native pronunciation
- Windows Build: Full Windows compatibility with native builds
- Enhanced macOS: Apple Silicon and Intel support
- Linux Ready: Complete Linux build pipeline
- Toggle Voice Dictation: Press the Fn key to start/stop recording
- Multi-Language Recognition: 30+ languages with automatic detection
- Configurable Hotkeys: Customize keyboard shortcuts for all functions
- Rate Limit Handling: Automatic retry with exponential backoff for API limits
- OAuth 2.1: Secure authentication for MCP servers with deep link integration
- Kill Switch: Emergency stop functionality for all operations
- Model Selection: Choose specific AI models for each provider
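The smart preprocessing item above (turning code blocks, URLs, and markdown into natural speech) can be pictured as a chain of text rewrites applied before the text reaches a TTS provider. The TypeScript sketch below is only one plausible approach, not the project's actual rules: `prepareForSpeech`, the spoken placeholder "code block omitted", and the "link to <host>" phrasing are assumptions made for illustration.

```typescript
// Three backticks, built at runtime so this example nests safely in markdown.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

// Hypothetical helper: rewrites markdown-flavoured text into plain text
// that reads naturally when spoken aloud.
function prepareForSpeech(text: string): string {
  return text
    // Fenced code blocks become a short spoken placeholder.
    .replace(FENCED_BLOCK, " code block omitted ")
    // Inline code keeps its content but loses the backticks.
    .replace(/`([^`]+)`/g, "$1")
    // Markdown links keep only their label.
    .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1")
    // Bare URLs are reduced to their host name.
    .replace(/https?:\/\/([^\s/]+)[^\s]*/g, "link to $1")
    // Heading markers and list bullets at the start of a line are dropped.
    .replace(/^[ \t]{0,3}(#{1,6}|[-*+])[ \t]+/gm, "")
    // Bold and italic markers are dropped, keeping the emphasized text.
    .replace(/(\*\*|__|\*)(.+?)\1/g, "$2")
    // Runs of whitespace, including newlines, collapse to single spaces.
    .replace(/\s+/g, " ")
    .trim();
}
```

The order matters in this sketch: markdown links are rewritten before bare URLs so that a link's label survives instead of being replaced by its host name.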
For development and troubleshooting, SpeakMCP includes comprehensive debug logging:
# Enable all debug modes
pnpm dev d # Shortest option
pnpm dev debug-all # Readable format
# Enable specific modes
pnpm dev debug-llm # LLM calls and responses
pnpm dev debug-tools # MCP tool execution
pnpm dev debug-tts # Text-to-speech debugging
We welcome contributions! Fork the repo, create a feature branch, and open a Pull Request.
Get help on Discord | More info at techfren.net
This project is licensed under the AGPL-3.0 License.
- Whispo - This project is a fork of Whispo, the original AI voice assistant
- OpenAI for Whisper speech recognition and GPT models
- Anthropic for Claude and MCP protocol development
- Model Context Protocol for the extensible tool integration standard
- Electron for cross-platform desktop framework
- React for the user interface
- Rust for system-level integration
- Groq for fast inference capabilities
- Google for Gemini models
Made with ❤️ by the SpeakMCP team