WhisperServer is a lightweight macOS menu bar app that runs in the background.
It exposes a local HTTP server compatible with the OpenAI Whisper API for audio transcription.
- Local HTTP server compatible with the OpenAI Whisper API
- Menu bar application (no Dock icon)
- Streaming via Server‑Sent Events (SSE) with automatic chunked fallback
- Automatic VAD-based chunking for Whisper models to prevent repeated text in long audio files — a common issue with standard whisper.cpp
- Automatically downloads models on first use
- Fast, high‑quality quantized models
- Parakeet model can transcribe ~1 hour of audio in about 1 minute (measured on a MacBook Pro 13, M2, 16 GB)
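Because the server mirrors the OpenAI Whisper API, existing OpenAI client libraries can usually be pointed at it directly. A minimal sketch using the official `openai` Python package (the model ID and file name are placeholders, and the API key is a dummy value that the local server does not check):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local WhisperServer instance.
# The api_key is a dummy value: the local server does not validate it.
client = OpenAI(base_url="http://localhost:12017/v1", api_key="not-needed")

with open("audio.mp3", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="parakeet-tdt-0.6b-v3",  # any model ID exposed by the server
        file=audio_file,
    )

print(result.text)
```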
- macOS 14.6 or newer
- Apple Silicon (ARM64) only
| Project | Platform | Key features |
|---|---|---|
| VibeScribe | macOS | Automatic call summarization and transcription for meetings, interviews, and brainstorming; AI-powered summaries and easy export of notes. |
- Go to the Releases page.
- Download the latest `.dmg` file.
- Open the `.dmg` file.
- Drag WhisperServer to your Applications folder.
This app is not signed by Apple. To open it the first time:
- Control‑click (or right‑click) WhisperServer in Applications.
- Choose Open.
- In the warning dialog, click Open.
- Or go to System Settings → Privacy & Security and allow the app.
Example
```
curl -X POST http://localhost:12017/v1/audio/transcriptions \
  -F file=@/path/to/audio.mp3
```

| Parameter | Description | Values | Required |
|---|---|---|---|
| file | Audio file | wav, mp3, m4a | yes |
| model | Model to use | model ID | no |
| prompt | Guide style/tone (Whisper) | string | no |
| response_format | Output format | json, text, srt, vtt, verbose_json | no |
| language | Input language (ISO 639‑1) | 2‑letter code | no |
| diarize | Enable Fluid speaker diarization | true, false (default false) | no |
| stream | Enable streaming (SSE or chunked) | true, false | no |
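As a rough sketch of how several of these parameters combine in a single request, here is an example using Python's `requests` library (the file path, language, and prompt values are placeholders):

```python
import requests

# Combine several optional parameters in one multipart request.
with open("/path/to/audio.mp3", "rb") as audio_file:
    response = requests.post(
        "http://localhost:12017/v1/audio/transcriptions",
        files={"file": audio_file},
        data={
            "response_format": "json",
            "language": "en",
            "prompt": "Vocabulary hints: Whisper, VAD, diarization.",
        },
    )

response.raise_for_status()
print(response.json()["text"])
```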
Available models:

| Model | Relative speed | Quality |
|---|---|---|
| `parakeet-tdt-0.6b-v3` | Fastest | Medium |
| `tiny-q5_1` | Fast | Good (English), Low (other languages) |
| `large-v3-turbo-q5_0` | Slow | Medium–Good |
| `medium-q5_0` | Slowest | Good |
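The valid values for the `model` parameter can be discovered at runtime from the models endpoint. A small sketch that queries `GET /v1/models` (the exact response shape is not documented above, so this just prints the raw JSON):

```python
import requests

# List the model IDs the server currently exposes; any of them can be
# passed as the `model` form field of a transcription request.
models = requests.get("http://localhost:12017/v1/models")
models.raise_for_status()
print(models.json())
```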
The server supports multiple response formats:
```
curl -X POST http://localhost:12017/v1/audio/transcriptions \
  -F file=@/path/to/audio.mp3 \
  -F response_format=json
```

- json (default)

```json
{
  "text": "Transcription text."
}
```

- verbose_json
```json
{
  "task": "transcribe",
  "language": "en",
  "duration": 10.5,
  "text": "Full transcription text.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 5.0,
      "text": "First segment.",
      "tokens": [50364, 13, 11, 263, 6116],
      "temperature": 0.0,
      "avg_logprob": -0.45,
      "compression_ratio": 1.275,
      "no_speech_prob": 0.1
    }
  ]
}
```

- text
```
And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
```

- srt

```
1
00:00:00,240 --> 00:00:07,839
And so, my fellow Americans, ask not what your country can do for you

2
00:00:07,839 --> 00:00:10,640
ask what you can do for your country.
```

- vtt

```
WEBVTT

00:00:00.240 --> 00:00:07.839
And so, my fellow Americans, ask not what your country can do for you

00:00:07.839 --> 00:00:10.640
ask what you can do for your country.
```
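If you need per-segment timestamps programmatically, `verbose_json` is the easiest format to post-process. A short sketch, assuming the field names shown in the `verbose_json` example above (the file path is a placeholder):

```python
import requests

# Ask for verbose_json and print each segment with its start/end times.
with open("/path/to/audio.mp3", "rb") as audio_file:
    response = requests.post(
        "http://localhost:12017/v1/audio/transcriptions",
        files={"file": audio_file},
        data={"response_format": "verbose_json"},
    )

for segment in response.json()["segments"]:
    print(f"[{segment['start']:.2f} -> {segment['end']:.2f}] {segment['text']}")
```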
WhisperServer supports real‑time streaming with automatic protocol detection. Note: timestamped streaming (srt, vtt, verbose_json) requires the Whisper provider; the Fluid provider streams text/JSON only.
If the client sends the header `Accept: text/event-stream`, the server uses SSE:
```
curl -X POST http://localhost:12017/v1/audio/transcriptions \
  -H "Accept: text/event-stream" \
  -F file=@audio.wav \
  -F stream=true \
  --no-buffer
```

Response format:
```
data: First transcribed segment
data:

data: Second transcribed segment
data:

event: end
data:
```
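A sketch of consuming this SSE stream from Python with `requests`; the parsing below only handles the `data:` and `event: end` lines shown above:

```python
import requests

# Stream a transcription over SSE and print segments as they arrive.
with open("audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:12017/v1/audio/transcriptions",
        headers={"Accept": "text/event-stream"},
        files={"file": audio_file},
        data={"stream": "true"},
        stream=True,  # read the body incrementally instead of buffering it
    )

    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue  # blank lines separate SSE events
        if line.startswith("event: end"):
            break  # server signals the end of the stream
        if line.startswith("data: "):
            print(line[len("data: "):])
```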
If SSE isn’t supported, the server falls back to HTTP chunked transfer encoding:
```
curl -X POST http://localhost:12017/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F stream=true \
  --no-buffer
```

Add speaker labels (who is talking) when you use the FluidAudio provider. Diarization is off by default to stay compatible with the OpenAI Whisper API.
How to enable:
- Select the Fluid provider in the menu bar (or pass the Fluid model ID), and
- Add `diarize=true` to your request.
Example:
```
curl -X POST http://localhost:12017/v1/audio/transcriptions \
  -F file=@meeting.wav \
  -F model=parakeet-tdt-0.6b-v3 \
  -F response_format=json \
  -F diarize=true
```

What you get:
- For `response_format=json`, the server adds a `speaker_segments` array:

  ```json
  {
    "text": "Good morning everyone...",
    "speaker_segments": [
      { "speaker": "Speaker_1", "start": 0.0, "end": 4.2, "text": "Good morning everyone" },
      { "speaker": "Speaker_2", "start": 4.2, "end": 7.8, "text": "Morning! Shall we begin?" }
    ]
  }
  ```

- For `response_format=verbose_json`, `speaker_segments` is added as well. The existing `segments` field stays unchanged.
Streaming:
- Streaming sends one JSON chunk with `speaker_segments` when diarization completes.
- Then the standard `end` event is sent.
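A sketch of requesting diarization from Python and printing speaker-labelled lines, assuming the `speaker_segments` shape shown above (the file path is a placeholder):

```python
import requests

# Request diarization with the Fluid provider and print "speaker: text" lines.
with open("meeting.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:12017/v1/audio/transcriptions",
        files={"file": audio_file},
        data={
            "model": "parakeet-tdt-0.6b-v3",
            "response_format": "json",
            "diarize": "true",
        },
    )

for seg in response.json().get("speaker_segments", []):
    print(f"{seg['speaker']}: {seg['text']}")
```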
If you want to build WhisperServer yourself:
- Clone the repository:

  ```
  git clone https://github.com/pfrankov/whisper-server.git
  cd whisper-server
  ```

- Open the project in Xcode.
- Select your development team:
  - Click the project in Xcode
  - Select the WhisperServer target
  - Go to "Signing & Capabilities"
  - Choose your team
- Build and run:
  - Press `Cmd + R` to build and run
  - Or use the menu: Product → Run
- Run the app, then run the script: `test_api.sh` (complete API test suite)
- In the menu bar, open `Select Model` → `Import Whisper Model…`
- Choose a `.bin` model file (optionally add its `.mlmodelc` bundle in the same dialog)
- The model becomes selectable in the menu and is listed in `GET /v1/models`
MIT