Note

This project was previously named faster-whisper-server. I've decided to change the name, as the project has evolved to support more than just ASR.

Speaches

speaches is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech by piper and Kokoro. This project aims to be the Ollama of TTS/STT models.

Try it out on the HuggingFace Space

See the documentation for installation instructions and usage: speaches.ai

Features:

  • OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with speaches (see the sketch after this list).
  • Audio generation (chat completions endpoint) | OpenAI Documentation
    • Generate a spoken audio summary of a body of text (text in, audio out)
    • Perform sentiment analysis on a recording (audio in, text out)
    • Async speech to speech interactions with a model (audio in, audio out)
  • Streaming support: transcription is sent via SSE as the audio is transcribed, so you don't have to wait for the entire file to be transcribed before receiving results.
  • Dynamic model loading / offloading. Just specify the model you want in the request; it will be loaded automatically and then unloaded after a period of inactivity.
  • Text-to-Speech via kokoro (ranked #1 in the TTS Arena) and piper models.
  • GPU and CPU support.
  • Deployable via Docker Compose / Docker
  • Highly configurable
  • Realtime API
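
Because the server implements the OpenAI REST API, the official openai Python SDK can simply be pointed at a local speaches instance. Below is a minimal sketch, assuming the server listens on http://localhost:8000 and that the model and voice names shown are available on your instance (they are illustrative, not prescriptive; substitute whatever models your server has downloaded):

```python
from openai import OpenAI

# Point the official OpenAI SDK at the local speaches server.
# The API key is just a placeholder; the SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

# Speech-to-Text: the model named in the request is loaded on demand
# (and unloaded again after a period of inactivity).
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-whisper-small",  # assumed model id
        file=audio_file,
    )
print(transcript.text)

# Text-to-Speech: generate spoken audio with a Kokoro voice.
speech = client.audio.speech.create(
    model="hexgrad/Kokoro-82M",  # assumed model id
    voice="af_sky",              # assumed voice name
    input="Hello from speaches!",
)
speech.write_to_file("hello.mp3")
```

Streaming transcription over SSE and the Realtime API go beyond the plain SDK calls above; see the documentation at speaches.ai for those endpoints.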

Please create an issue if you find a bug, have a question, or have a feature suggestion.

Demo

Streaming Transcription

TODO

Speech Generation

2025-01-12_13-20-58.webm