SpeakMCP

🎤 AI-powered voice assistant with MCP integration. SpeakMCP is a fork of Whispo that turns your voice into actions using speech recognition, LLM processing, and Model Context Protocol (MCP) tool execution.

🎬 Preview

speakmcp-vid.mp4

🚀 Quick Start

Download

Cross-Platform Support: macOS (Apple Silicon & Intel), Windows (x64), Linux (x64)

📥 Download Latest Release

Basic Usage

Voice Recording:

Hold Ctrl key to start recording your voice
Release Ctrl to stop recording and transcribe
Text is automatically inserted into your active application

MCP Agent Mode:

Hold Ctrl+Alt to start recording for agent mode
Release Ctrl+Alt to process with MCP tools
Watch real-time progress as the agent executes tools
Results are automatically inserted or displayed

Text Input:

Press Ctrl+T to open text input mode for direct typing

✨ Features

🎤 Voice & Speech

Voice-to-Text: Hold Ctrl to record, release to transcribe
Toggle Voice Dictation: Press Fn key to start/stop recording (configurable)
Multi-Language Support: 30+ languages including Spanish, French, German, Chinese, Japanese, Arabic, Hindi
Text-to-Speech (TTS): AI-generated speech with 50+ voices across OpenAI, Groq, and Gemini
Auto-Play TTS: Automatic speech playback for seamless conversations

🤖 AI Agent & MCP

MCP Agent Mode: Hold Ctrl+Alt for intelligent tool execution with real-time progress
MCP Integration: Connect to any MCP-compatible tools and services
OAuth 2.1 Support: Secure authentication for MCP servers with deep link integration
Tool Management: Per-server tool toggles and approval prompts
Conversation Continuity: Context preservation across agent interactions

🛠️ Platform & Performance

Cross-Platform: macOS, Windows, and Linux support with native builds
Rate Limit Handling: Exponential backoff retry for API rate limits (429 errors)
Model Selection: Choose specific models for OpenAI, Groq, and Gemini providers
Debug Modes: Comprehensive logging for LLM calls, tool execution, and TTS
Universal Integration: Works with any text-input application
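
The rate limit handling listed above (exponential backoff on 429 responses) could look roughly like the sketch below. This is an illustration only, not SpeakMCP's actual code: the names `backoffDelayMs` and `withRetry`, the `status` field on the error, and the delay and retry constants are all assumptions.

```typescript
// Hypothetical sketch of exponential backoff; names and constants are
// illustrative assumptions, not taken from the SpeakMCP source.

/** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

/** Error shape assumed for this sketch: an HTTP status attached to the error. */
interface HttpError extends Error {
  status?: number;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Runs `call`, retrying only on 429 responses, at most `maxRetries` times. */
async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  baseMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as HttpError).status;
      // Give up on non-rate-limit errors or once retries are exhausted.
      if (status !== 429 || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}
```

With these assumed defaults the waits are 1 s, 2 s, then 4 s. Doubling the delay gives a rate-limited provider progressively more time to recover, and the cap keeps a long outage from producing unbounded waits.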

🎨 User Experience

Text Input: Press Ctrl+T for direct text input mode
Dark/Light Themes: Toggle between dark and light modes
Resizable Panels: Drag-to-resize interface components
Kill Switch: Emergency stop for agent operations (Escape key)
Conversation Management: Full conversation history with tool call visualization

🏗️ Architecture

Built with modern technologies for cross-platform performance:

Electron: Main process for system integration, MCP orchestration, and TTS processing
React + TypeScript: Modern UI with real-time progress tracking and conversation management
Rust: High-performance keyboard monitoring and text injection across platforms
MCP Client: Full Model Context Protocol implementation with OAuth 2.1 support
Multi-Provider AI: OpenAI, Groq, and Gemini integration for speech, text, and TTS

🛠️ Development

Prerequisites: Node.js 18+, pnpm, Rust toolchain

# Setup
git clone https://github.com/aj47/SpeakMCP.git
cd SpeakMCP
pnpm install
pnpm build-rs  # Build Rust binary for your platform
pnpm dev       # Start development server

# Platform-specific builds
pnpm build        # Production build for current platform
pnpm build:mac    # macOS build (Apple Silicon + Intel)
pnpm build:win    # Windows build (x64)
pnpm build:linux  # Linux build (x64)

# Testing
pnpm test         # Run test suite
pnpm test:tts     # Test TTS functionality

⚙️ Configuration

AI Providers: OpenAI, Groq, Google Gemini

Configure API keys and custom base URLs in settings
Select specific models for each provider
Multi-language speech recognition support
TTS with 50+ voices across providers

MCP Servers: Configure tools in mcpServers JSON format:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-filesystem", "/path"]
    },
    "web-search": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-brave-search"],
      "env": {"BRAVE_API_KEY": "your-key"}
    }
  }
}
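
For readers who want to check a config before pasting it into settings, the shape of the JSON above can be described and validated in TypeScript. This is a sketch under assumptions: it models only the three fields that appear in the example (`command`, `args`, `env`), and the names `McpServerConfig`, `McpConfig`, and `parseMcpConfig` are hypothetical. SpeakMCP may accept additional fields.

```typescript
// Hypothetical types mirroring the JSON example above; only the fields shown
// there (command, args, env) are modeled.
interface McpServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerConfig>;
}

/** Parses a JSON string and checks it against the shape above. */
function parseMcpConfig(json: string): McpConfig {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null || !("mcpServers" in data)) {
    throw new Error('config must be an object with an "mcpServers" key');
  }
  const servers = (data as { mcpServers: unknown }).mcpServers;
  if (typeof servers !== "object" || servers === null || Array.isArray(servers)) {
    throw new Error('"mcpServers" must be an object keyed by server name');
  }
  for (const [name, server] of Object.entries(servers as Record<string, unknown>)) {
    const s = server as Partial<McpServerConfig> | null;
    if (!s || typeof s.command !== "string") {
      throw new Error(`server "${name}" needs a string "command"`);
    }
    const args: unknown = s.args;
    if (args !== undefined && !(Array.isArray(args) && args.every((a) => typeof a === "string"))) {
      throw new Error(`server "${name}": "args" must be an array of strings`);
    }
    const env: unknown = s.env;
    if (env !== undefined && (typeof env !== "object" || env === null)) {
      throw new Error(`server "${name}": "env" must be an object`);
    }
  }
  return data as McpConfig;
}
```

Calling `parseMcpConfig` on the example config returns it typed as `McpConfig`. A server entry without a `command` string throws with the server's name in the message.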

Keyboard Shortcuts:

Hold Ctrl: Voice recording (traditional mode)
Fn Key: Toggle voice dictation (press once to start, press again to stop)
Hold Ctrl+Alt: MCP agent mode
Ctrl+T: Text input mode
Escape: Cancel/kill switch for operations
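
The hold-to-record shortcuts above can be read as a small state machine: the held keys select a mode, and releasing them triggers the follow-up action. The sketch below illustrates that reading only. The type and function names are hypothetical, and the real key handling in SpeakMCP's Rust and Electron code may differ, especially when keys are pressed or released one at a time.

```typescript
// Hypothetical sketch of the hold-to-record shortcuts; not SpeakMCP's
// actual key handling.
type Mode = "idle" | "dictation" | "agent";
type Action = "none" | "start-recording" | "transcribe-and-insert" | "run-agent";

interface KeyState {
  ctrl: boolean;
  alt: boolean;
}

/** Which mode the currently held keys select. */
function modeFor(keys: KeyState): Mode {
  if (keys.ctrl && keys.alt) return "agent";
  if (keys.ctrl) return "dictation";
  return "idle";
}

/** What to do when the mode changes from `prev` to `next`. */
function onModeChanged(prev: Mode, next: Mode): Action {
  if (prev === "idle" && next !== "idle") return "start-recording";
  if (prev === "dictation" && next === "idle") return "transcribe-and-insert";
  if (prev === "agent" && next === "idle") return "run-agent";
  // Simplification: switching between dictation and agent mid-recording
  // (for example, adding Alt while Ctrl is held) keeps recording.
  return "none";
}
```

Keeping the mode selection (`modeFor`) separate from the transition logic (`onModeChanged`) makes each part easy to test without a real keyboard hook.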

🤖 MCP Agent Mode

MCP (Model Context Protocol) lets AI assistants connect to external tools. SpeakMCP implements an MCP client and adds the features listed below.

Enhanced Features:

Intelligent Tool Selection: Automatically determines which tools to use
Real-time Progress: Visual feedback with TTS narration during execution
Conversation Continuity: Context preservation across multi-turn interactions
OAuth 2.1 Integration: Secure authentication for MCP servers
Rate Limit Handling: Automatic retry with exponential backoff
Kill Switch: Emergency stop functionality with Escape key
Tool Management: Per-server tool toggles and approval prompts

Example commands:

"Create a new project folder and add a README"
"Search for latest AI news and summarize the top 3 articles"
"Send a message to the team about today's progress"
"Analyze this codebase and suggest improvements"

🆕 What's New

Recent Major Features:

🎵 Text-to-Speech (TTS) Integration

50+ AI Voices: OpenAI (6 voices), Groq (23 voices), Gemini (30+ voices)
Auto-Play: Seamless conversation flow with automatic speech playback
Smart Preprocessing: Converts code blocks, URLs, and markdown to natural speech
Multi-Language: Support for 30+ languages with native pronunciation
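
To make the smart preprocessing item concrete, here is a minimal sketch of turning markdown-flavored text into something a TTS voice can read. The rules and replacement phrases ("code block", "link") are assumptions chosen for illustration, and `preprocessForSpeech` is a hypothetical name. SpeakMCP's actual preprocessing is not described in this README and likely covers more cases.

```typescript
// Hypothetical sketch: the replacement phrases and the exact rules are
// assumptions, not SpeakMCP's actual preprocessing.
function preprocessForSpeech(text: string): string {
  return text
    // Fenced code blocks are unpleasant to hear; replace with a short marker.
    .replace(/`{3}[\s\S]*?`{3}/g, " code block ")
    // Inline code: keep the content, drop the backticks.
    .replace(/`([^`]*)`/g, "$1")
    // Markdown links [label](url): keep only the label.
    .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1")
    // Bare URLs: replace with a spoken placeholder.
    .replace(/https?:\/\/\S+/g, "link")
    // Heading markers at the start of a line.
    .replace(/^#{1,6}\s+/gm, "")
    // Bold/italic asterisks.
    .replace(/\*+/g, "")
    // Collapse runs of whitespace (including newlines) into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}
```

The order matters: markdown links are rewritten before bare URLs so that a link's label survives instead of being replaced by the placeholder.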

🖥️ Cross-Platform Support

Windows Build: Full Windows compatibility with native builds
Enhanced macOS: Apple Silicon and Intel support
Linux Ready: Complete Linux build pipeline

🎛️ Enhanced Voice Controls

Toggle Voice Dictation: Press Fn key to start/stop recording
Multi-Language Recognition: 30+ languages with automatic detection
Configurable Hotkeys: Customize keyboard shortcuts for all functions

🔧 Reliability & Performance

Rate Limit Handling: Automatic retry with exponential backoff for API limits
OAuth 2.1: Secure authentication for MCP servers with deep link integration
Kill Switch: Emergency stop functionality for all operations
Model Selection: Choose specific AI models for each provider

🐛 Debug Mode

For development and troubleshooting, SpeakMCP includes comprehensive debug logging:

# Enable all debug modes
pnpm dev d               # Shortest option
pnpm dev debug-all       # Readable format

# Enable specific modes
pnpm dev debug-llm       # LLM calls and responses
pnpm dev debug-tools     # MCP tool execution
pnpm dev debug-tts       # Text-to-speech debugging
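
One way to picture how the words above map to debug switches is the sketch below. It is an assumption about the mapping only: `DebugFlags` and `parseDebugFlags` are hypothetical names, and this README does not show how SpeakMCP actually reads these arguments.

```typescript
// Hypothetical sketch: maps the CLI words shown above to debug switches.
interface DebugFlags {
  llm: boolean;
  tools: boolean;
  tts: boolean;
}

function parseDebugFlags(args: string[]): DebugFlags {
  // "d" and "debug-all" both turn on every mode.
  const all = args.includes("d") || args.includes("debug-all");
  return {
    llm: all || args.includes("debug-llm"),
    tools: all || args.includes("debug-tools"),
    tts: all || args.includes("debug-tts"),
  };
}
```

In a Node.js process this could be called as `parseDebugFlags(process.argv.slice(2))`. Individual modes combine, so passing both `debug-llm` and `debug-tts` would enable those two and leave tool logging off.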

🤝 Contributing

We welcome contributions! Fork the repo, create a feature branch, and open a Pull Request.

💬 Get help on Discord | 🌐 More info at techfren.net

📄 License

This project is licensed under the AGPL-3.0 License.

🙏 Acknowledgments

Whispo: the original AI voice assistant that this project is forked from
OpenAI for Whisper speech recognition and GPT models
Anthropic for Claude and for developing MCP
Model Context Protocol for the extensible tool integration standard
Electron for cross-platform desktop framework
React for the user interface
Rust for system-level integration
Groq for fast inference capabilities
Google for Gemini models

Made with ❤️ by the SpeakMCP team
