Author: Mr. Watson 🦄 · Date: 2026-02-07
- Goal
- Quick operations
- Nginx config
- Basic auth file
- Service app
- Environment
- systemd unit
- Setup commands (sanitized)
- Operations
- Data lifecycle
## Goal

Expose a protected `/whisper` endpoint to upload media and generate transcript artifacts (txt, srt, pdf) with optional diarization.
✨ Auto-Loading Service: the frontend is always active at https://beachlab.org/whisper/
- GPU model loads automatically when you submit a job
- Auto-unloads after 120 seconds of inactivity (frees VRAM)
- If GPU memory is full, job fails with clear error message
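The load-on-demand / idle-unload behaviour above can be sketched with a resettable timer. This is an illustrative pattern, not the service's actual code; `load_model` and `unload_model` are hypothetical callbacks.

```python
import threading

class IdleUnloader:
    """Keep a GPU model loaded while jobs arrive; unload it after
    `idle_seconds` with no activity to free VRAM."""

    def __init__(self, load_model, unload_model, idle_seconds=120):
        self._load = load_model        # hypothetical loader callback
        self._unload = unload_model    # hypothetical unloader callback
        self._idle = idle_seconds
        self._timer = None
        self._model = None
        self._lock = threading.Lock()

    def acquire(self):
        """Called at job submission: load on first use, reset the idle timer."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            if self._model is None:
                self._model = self._load()
            self._timer = threading.Timer(self._idle, self._expire)
            self._timer.daemon = True
            self._timer.start()
            return self._model

    def _expire(self):
        # Timer fired with no intervening acquire(): release the model.
        with self._lock:
            if self._model is not None:
                self._unload(self._model)
                self._model = None
```

Each new job cancels the pending timer and starts a fresh one, so the model only unloads after a full idle window with no submissions.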
## Quick operations

```bash
systemctl status whisper-web
journalctl -u whisper-web -n 80 --no-pager
gpu-service status   # check GPU memory usage
```

See GPU Service Management for troubleshooting.
Implemented:

- Nginx route protection (`/whisper` + basic auth)
- Local service on `127.0.0.1:8060`
- Upload queue (SQLite)
- Faster-Whisper transcription (`cuda`/`float16` on RTX 2070 Super)
- Pyannote diarization support
- Cleanup on job deletion and post-processing
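The SQLite upload queue could look roughly like the sketch below. The schema and function names are illustrative assumptions, not the service's actual implementation.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Open (or create) the job queue database."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'queued',  -- queued | running | done | error
        created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return db

def enqueue(db, filename):
    """Register an uploaded file as a queued job; return its id."""
    cur = db.execute("INSERT INTO jobs (filename) VALUES (?)", (filename,))
    db.commit()
    return cur.lastrowid

def claim_next(db):
    """Take the oldest queued job and mark it running; None if the queue is empty."""
    row = db.execute(
        "SELECT id, filename FROM jobs WHERE status='queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET status='running' WHERE id=?", (row[0],))
    db.commit()
    return row
```

A single worker draining `claim_next` in order gives FIFO processing, which matches the one-GPU constraint.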
## Nginx config

Added to the beachlab.org server block:

```nginx
location = /whisper {
    return 301 /whisper/;
}

location /whisper/ {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd-whisper;
    proxy_pass http://127.0.0.1:8060/;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $remote_addr;
    client_max_body_size 700M;
}
```

## Basic auth file

```bash
sudo htpasswd -bc /etc/nginx/.htpasswd-whisper <USER> '<PASSWORD>'
sudo chown root:www-data /etc/nginx/.htpasswd-whisper
sudo chmod 640 /etc/nginx/.htpasswd-whisper
```

Keep credentials out of git/docs in real deployments.
## Service app

Path: `/opt/whisper-service/app.py`

Main endpoints (served behind `/whisper` via nginx):

- `GET /` → web UI
- `POST /jobs` → upload + create job
- `GET /jobs` → list jobs
- `GET /jobs/{id}` → job details
- `DELETE /jobs/{id}` → delete job + remove source/artifacts
- `GET /jobs/{id}/download/{txt|srt|pdf}` → artifacts
## Environment

`/etc/whisper-service.env`:

```bash
# Required for speaker diarization model access
HF_TOKEN=<redacted>
# Run diarization on GPU (RTX 2070 Super, sm_75+)
WHISPER_DIAR_DEVICE=cuda
```

Pyannote diarization requires:

- HuggingFace account + token
- Acceptance of model terms for `pyannote/speaker-diarization-3.1`
- `HF_TOKEN` set in `/etc/whisper-service.env`

Without token/terms acceptance, transcription may work but diarization jobs fail.
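A startup check along these lines makes the token requirement fail fast rather than at diarization time. This is a sketch; the variable names come from the env file above, but the check itself is not part of the documented service.

```python
import os

def check_diarization_env(env=os.environ):
    """Raise at startup if diarization is configured but unusable."""
    token = env.get("HF_TOKEN")
    device = env.get("WHISPER_DIAR_DEVICE", "cuda")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; pyannote/speaker-diarization-3.1 "
            "requires a HuggingFace token and accepted model terms"
        )
    return device
```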
Compatibility fixes applied on the host app:

- pyannote now uses `token=` (with fallback to `use_auth_token`) to match the current `pyannote.audio` API.
- pyannote 4 returns a `DiarizeOutput`; the app now reads `output.speaker_diarization` before iterating tracks.
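Both fixes amount to small shims like the ones below. This is an illustrative sketch: `loader` stands in for `pyannote.audio`'s `Pipeline.from_pretrained` (not imported here), and the exact shape of the real app's code may differ.

```python
def load_pipeline(loader, model_id, token):
    """Recent pyannote.audio accepts `token=`; older releases only
    accept `use_auth_token=`. Try the new kwarg, fall back on TypeError."""
    try:
        return loader(model_id, token=token)
    except TypeError:
        return loader(model_id, use_auth_token=token)

def as_annotation(result):
    """pyannote 4 wraps the diarization in a DiarizeOutput whose
    `.speaker_diarization` holds the Annotation; earlier versions
    return the Annotation directly."""
    return getattr(result, "speaker_diarization", result)
```

With these two shims in place, the same call site works across the pyannote API change without version pinning.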
Upgraded eGPU from GTX 1060 3GB (sm_61) to RTX 2070 Super 8GB (sm_75).

Changes applied:

- CPU fallback removed from `Engine.__init__`: if CUDA fails, the service errors explicitly rather than silently falling back
- `WHISPER_DIAR_DEVICE=cuda`: diarization now runs on GPU
- Both Whisper transcription (`float16`) and pyannote diarization run fully on GPU end-to-end
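The fail-fast behaviour in `Engine.__init__` boils down to the pattern below. It is a sketch, not the actual class; the injectable `cuda_available` probe exists only so the logic can be exercised without a GPU.

```python
class Engine:
    def __init__(self, device="cuda", cuda_available=None):
        if cuda_available is None:
            # Probe at runtime if the caller did not inject a result.
            try:
                import torch
                cuda_available = torch.cuda.is_available()
            except ImportError:
                cuda_available = False
        if device == "cuda" and not cuda_available:
            # Error explicitly instead of silently falling back to CPU,
            # so a broken eGPU shows up as a failed job, not a slow one.
            raise RuntimeError("CUDA requested but not available")
        self.device = device
```

The point of removing the fallback is observability: a CPU fallback would "work" but take an order of magnitude longer, masking the real fault.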
## systemd unit

`/etc/systemd/system/whisper-web.service`:

```ini
[Unit]
Description=Whisper web (transcription + diarization)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=pink
Group=pink
WorkingDirectory=/opt/whisper-service
EnvironmentFile=/etc/whisper-service.env
ExecStart=/opt/whisper-service/.venv/bin/uvicorn app:app --host 127.0.0.1 --port 8060
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```

## Setup commands (sanitized)

```bash
sudo mkdir -p /opt/whisper-service/{uploads,outputs,temp}
sudo chown -R pink:pink /opt/whisper-service

python3 -m venv /opt/whisper-service/.venv
/opt/whisper-service/.venv/bin/pip install --upgrade pip
/opt/whisper-service/.venv/bin/pip install fastapi 'uvicorn[standard]' python-multipart jinja2 reportlab faster-whisper

# diarization deps
/opt/whisper-service/.venv/bin/pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
/opt/whisper-service/.venv/bin/pip install pyannote.audio
```

## Operations

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now whisper-web
sudo systemctl status whisper-web

sudo nginx -t
sudo systemctl reload nginx

# logs
sudo journalctl -u whisper-web -n 200 --no-pager
```

## Data lifecycle

- Uploaded source is stored under `/opt/whisper-service/uploads`
- A successful job deletes the source file
- Artifacts are kept in `/opt/whisper-service/outputs`
Recommended next step:

- Add a retention cleanup timer (e.g., delete artifacts older than 30 days)
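That retention cleanup could be a small script invoked from a systemd timer. A sketch, assuming the 30-day threshold and the `outputs` path from the notes above; it is not yet part of the deployed service.

```python
import time
from pathlib import Path

def cleanup_artifacts(outputs_dir, max_age_days=30, now=None):
    """Delete files under `outputs_dir` older than `max_age_days`.
    Returns the list of deleted paths for logging."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    deleted = []
    for path in Path(outputs_dir).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted

if __name__ == "__main__":
    for p in cleanup_artifacts("/opt/whisper-service/outputs"):
        print("deleted", p)
```

Paired with a `whisper-cleanup.timer` running it daily (e.g. `OnCalendar=daily`), this keeps `outputs/` bounded without touching recent jobs.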