A modern, modular Text-to-Speech (TTS) system built with Flask, featuring local inference with Kokoro TTS models.
## Features
- Modular Architecture: Easily extensible with new TTS models
- Web Interface: Clean, responsive Flask-based UI
- Local Inference: Run completely offline with local models
- Performance Monitoring: Real-time GPU/CPU usage tracking
- Multiple Voices: Support for various voices and speakers
- Audio Generation: High-quality WAV output
- Logging: Comprehensive generation logs with performance metrics
## Supported Models
- Kokoro: Fast, lightweight TTS optimized for local macOS deployment
## Requirements
- macOS (Intel or Apple Silicon)
- Python 3.10+
- Homebrew
## Quick Setup

Run the setup script to configure your environment automatically:

```bash
./setup-neo-tts.sh
```

This will:
- Install Homebrew dependencies (ffmpeg, pkg-config, Python 3.10)
- Create a virtual environment
- Install all required Python packages
- Download TTS models
- Set up project directories
## Manual Installation

If you prefer manual installation:

1. Install system dependencies:

   ```bash
   brew install ffmpeg pkg-config python@3.10
   ```

2. Create and activate a virtual environment:

   ```bash
   python3.10 -m venv venv
   source venv/bin/activate
   ```

3. Install Python dependencies:

   ```bash
   pip install -r requirements-neo.txt
   ```

4. Download the models:

   ```bash
   python -c "
   from huggingface_hub import snapshot_download
   snapshot_download(repo_id='hexgrad/Kokoro-82M', local_dir='models/kokoro_cache')
   "
   ```
## Usage

1. Activate the virtual environment:

   ```bash
   source venv/bin/activate
   ```

2. Start the server:

   ```bash
   python app/app.py
   ```

3. Open your browser and visit http://localhost:5000

4. Generate speech:
   - Select a TTS model
   - Choose a voice
   - Enter your text
   - Click generate
## Project Structure

```
local-tts-devlopment/
├── app/                    # Flask application
│   ├── static/             # Static assets (CSS, JS, output files)
│   ├── templates/          # HTML templates
│   ├── app.py              # Main Flask application
│   ├── device_utils.py     # Device monitoring utilities
│   └── __init__.py
├── models/                 # TTS model implementations
│   └── kokoro.py           # Kokoro TTS wrapper
├── logs/                   # Generation logs and metrics
├── venv/                   # Virtual environment (created by setup)
├── setup-neo-tts.sh        # Automated setup script
├── requirements-neo.txt    # Python dependencies
└── README.md               # This file
```
## Hardware Support

The application automatically detects your hardware and optimizes accordingly:
- Apple Silicon (M1/M2/M3): Uses Metal acceleration when available
- Intel Macs: Falls back to CPU inference
- GPU Monitoring: Tracks VRAM usage and performance metrics
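The fallback described above can be sketched in a few lines. This is an illustrative helper, not the project's actual code: the name `pick_device` is made up here, and using PyTorch's MPS backend for Metal is an assumption based on PyTorch being listed as the ML framework:

```python
import platform


def pick_device() -> str:
    """Pick the best available inference device (illustrative sketch)."""
    try:
        import torch  # optional: only consulted when PyTorch is installed

        # Apple Silicon exposes Metal acceleration through PyTorch's MPS backend.
        if torch.backends.mps.is_available():
            return "mps"
    except (ImportError, AttributeError):
        pass
    # Intel Macs (or environments without PyTorch) fall back to CPU inference.
    return "cpu"


print(f"Running on {platform.machine()} -> device: {pick_device()}")
```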
## Performance Monitoring

The application provides real-time monitoring of:
- CPU/GPU usage during generation
- Generation time and audio duration
- Model performance metrics
- Device information
Access monitoring data via the `/api/device-info` endpoint.
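For example, a client could poll that endpoint while the server is running. This is only a sketch: the JSON field names (`device`, etc.) are guesses, not a documented response schema:

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def fetch_device_info(base_url: str = "http://localhost:5000") -> dict:
    """Fetch device-info JSON; return {} if the server is unreachable."""
    try:
        with urlopen(f"{base_url}/api/device-info", timeout=5) as resp:
            return json.load(resp)
    except URLError:
        return {}


info = fetch_device_info()
# 'device' is a hypothetical key used for illustration only.
print(info.get("device", "server not reachable"))
```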
## Testing

Run the included verification test:

```bash
# Test the Kokoro model
python -c "
from models.kokoro import list_voices, generate_audio
voices = list_voices()
if voices:
    generate_audio('Hello from Kokoro', voices[0])
    print('✅ Kokoro OK')
"
```

## Contributing

1. Fork the repository
2. Create a feature branch:

   ```bash
   git checkout -b feature-name
   ```

3. Make your changes
4. Test thoroughly
5. Submit a pull request
## Adding New Models

To add support for new TTS models:

1. Create a new module in `models/`
2. Implement the required interface:
   - `list_voices()`: Return available voices
   - `generate_audio(text, voice, output_path)`: Generate an audio file
3. Register the model in the `MODELS` dictionary in `app/app.py`
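A minimal backend module satisfying that interface might look like the following hypothetical `models/dummy.py`. It writes a placeholder tone instead of real speech, and the voice names are invented for illustration:

```python
# models/dummy.py -- hypothetical example backend (not a real TTS model)
import math
import struct
import wave

VOICES = ["dummy_female", "dummy_male"]  # illustrative voice ids


def list_voices():
    """Return the voices this backend supports."""
    return list(VOICES)


def generate_audio(text, voice, output_path="output.wav"):
    """Write a one-second 440 Hz tone as a stand-in for synthesized speech."""
    sample_rate, seconds = 16000, 1
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / sample_rate)))
        for i in range(sample_rate * seconds)
    )
    with wave.open(output_path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return output_path
```

Registration would then be something like `MODELS["dummy"] = dummy` in `app/app.py`, though the exact shape depends on how the existing dictionary is defined.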
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Kokoro TTS - Fast local TTS
- Flask - Web framework
- PyTorch - Machine learning framework
## Support

If you encounter any issues:

- Check the logs in `app/logs/results.csv`
- Ensure your virtual environment is activated
- Verify all dependencies are installed
- Check device compatibility
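Assuming the log is a standard CSV file, it can be inspected with a few lines of Python. The column names are not documented here, so this sketch only reports whatever headers it finds:

```python
import csv
from pathlib import Path


def load_results(path="app/logs/results.csv"):
    """Read generation-log rows as dicts; return [] if the file is missing."""
    log = Path(path)
    if not log.exists():
        return []
    with log.open(newline="") as f:
        return list(csv.DictReader(f))


rows = load_results()
if rows:
    print(f"{len(rows)} generations logged; columns: {list(rows[0])}")
```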
For bugs or feature requests, please open an issue on GitHub.