A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
## Features

- 🏠 100% Local Processing: No cloud APIs, complete privacy
- 🚀 Apple Silicon Optimized: 15x+ real-time transcription speed
- 🎤 Speaker Diarization: Identify and separate multiple speakers
- 🎵 Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
- 📝 Multiple Output Formats: txt, json, vtt, srt, csv
- 💾 Low Memory Footprint: <2GB memory usage
- 🔧 TypeScript: Full type safety and modern development
## Prerequisites

- Node.js 18+
- whisper.cpp (`brew install whisper-cpp`)
- ffmpeg (`brew install ffmpeg`) for audio format conversion; automatically handles MP3, M4A, FLAC, OGG, etc.
- Python 3.8+ and a HuggingFace token (free) for speaker diarization
### Supported Audio Formats

- Native whisper.cpp formats: WAV, FLAC
- Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
- Automatic conversion: powered by ffmpeg, resampling to 16 kHz mono as whisper.cpp expects
- Format detection: the input format is detected automatically and converted only when needed
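The conversion path can be sketched roughly as follows (a minimal illustration, not the server's actual code; the helper names are hypothetical):

```typescript
// Hypothetical sketch of the format-detection step: whisper.cpp reads WAV and
// FLAC natively, so only other formats are routed through ffmpeg. The flags
// resample to 16 kHz mono 16-bit PCM, which is what whisper.cpp expects.
const NATIVE_FORMATS = new Set(["wav", "flac"]);

function needsConversion(filePath: string): boolean {
  const ext = filePath.split(".").pop()?.toLowerCase() ?? "";
  return !NATIVE_FORMATS.has(ext);
}

function ffmpegArgs(input: string, output: string): string[] {
  // Equivalent to: ffmpeg -i <input> -ar 16000 -ac 1 -c:a pcm_s16le <output>
  return ["-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output];
}
```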
## Installation

```bash
git clone https://github.com/your-username/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build

# Download whisper models
npm run setup:models

# For speaker diarization, set your HuggingFace token
export HF_TOKEN="your_token_here"  # Get a free token from huggingface.co
```

**Speaker Diarization Note**: Requires a HuggingFace account and accepting the pyannote/speaker-diarization-3.1 license.
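As a sketch, a server can fail fast when diarization is requested without the token set (hypothetical helper, not part of the actual codebase):

```typescript
// Hypothetical guard: speaker diarization needs a HuggingFace token, so fail
// fast with a clear message instead of a cryptic pyannote error later.
function requireHfToken(env: Record<string, string | undefined>): string {
  const token = env.HF_TOKEN;
  if (!token) {
    throw new Error(
      "HF_TOKEN is required for speaker diarization; get a free token at huggingface.co"
    );
  }
  return token;
}
```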
## Usage

Add the following to your MCP client configuration:

```json
{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"]
    }
  }
}
```

### Available Tools

| Tool | Description |
|---|---|
| `transcribe` | Basic audio transcription with automatic format conversion |
| `transcribe_long` | Long audio file processing with chunking and format conversion |
| `transcribe_with_speakers` | Speaker diarization and transcription with format support |
| `list_models` | Show available whisper models |
| `health_check` | System diagnostics |
| `version` | Server version information |
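The tools above are invoked via the standard MCP `tools/call` JSON-RPC method. For illustration, a `transcribe` request might look like this (the argument names are assumptions, not the server's documented schema):

```typescript
// Hypothetical shape of an MCP tools/call request for the `transcribe` tool.
// `tools/call` is the standard MCP method; the `arguments` keys are assumed.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "transcribe",
    arguments: {
      audio_path: "/path/to/audio.mp3", // converted automatically if not WAV/FLAC
      output_format: "json",            // one of: txt, json, vtt, srt, csv
    },
  },
};
```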
## Performance

Apple Silicon benchmarks:

- Processing speed: 15.8x real-time (vs. WhisperX at 5.5x)
- Memory usage: <2 GB (vs. ~4 GB for WhisperX)
- GPU acceleration: ✅ Apple Neural Engine
- Setup: medium complexity, but superior performance

See `/benchmarks/` for detailed performance comparisons.
## Project Structure

```
mcp-server/
├── src/          # TypeScript source code
│   ├── tools/    # MCP tool implementations
│   ├── whisper/  # whisper.cpp integration
│   ├── utils/    # Speaker diarization & utilities
│   └── types/    # Type definitions
├── dist/         # Compiled JavaScript
└── python/       # Python dependencies
```
## Development

```bash
# Build
npm run build

# Development mode (watch)
npm run dev

# Linting & formatting
npm run lint
npm run format

# Type checking
npm run type-check
```

## Contributing

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
## License

MIT License - see the LICENSE file for details.
## Acknowledgments

- whisper.cpp for optimized inference
- OpenAI Whisper for the original models
- Model Context Protocol for the framework
- pyannote.audio for speaker diarization