feat(soniox): add Soniox real-time streaming STT provider by DamianPala · Pull Request #418 · OpenWhispr/openwhispr

DamianPala · 2026-03-12T12:05:03Z

Summary

Adds Soniox as a fifth cloud STT provider. Soniox offers strong accuracy on English as well as Slavic and Eastern European languages, competitive pricing (significantly cheaper than Deepgram/AssemblyAI for comparable quality), and sub-second cold start (~250ms, no warmup connection needed).

Key additions:

- Secondary language hints for mixed-language transcription (e.g. Polish + English in the same session), useful for multilingual users who code-switch
- Full integration matching existing provider patterns: settings UI, onboarding, API key management, BYOK detection, icon, i18n (10 locales)

Also introduces the project's first unit tests (25 tests, Node built-in runner, zero new deps).

Changes

Core streaming (src/helpers/sonioxStreaming.js): New 375-line module. WebSocket connection to Soniox RT API, cold-start PCM buffering (3s at 16kHz), keepalive with 30s idle timeout, graceful finalization with drain. Includes text-level filler word cleanup to handle Soniox BPE tokenization artifacts.
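As a rough illustration of the cold-start buffering described above, the following is a minimal sketch, not the code from this PR. The class name `ColdStartBuffer`, the 16-bit mono sample format, and the drop-oldest policy are all assumptions; the description only states "3s at 16kHz".

```javascript
// Hypothetical helper: holds PCM chunks while the WebSocket is still connecting.
// Assumes 16-bit mono PCM at 16 kHz, so 3 s is 16000 * 2 * 3 = 96000 bytes.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const MAX_BUFFER_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * 3;

class ColdStartBuffer {
  constructor(maxBytes = MAX_BUFFER_BYTES) {
    this.maxBytes = maxBytes;
    this.chunks = [];
    this.size = 0;
  }

  // Queue a chunk; drop the oldest chunks once the cap is exceeded (assumed policy).
  push(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.length;
    while (this.size > this.maxBytes && this.chunks.length > 1) {
      this.size -= this.chunks.shift().length;
    }
  }

  // Send everything buffered, in arrival order, then reset.
  flush(send) {
    for (const chunk of this.chunks) send(chunk);
    this.chunks = [];
    this.size = 0;
  }
}
```

In this sketch the caller would `push()` audio until the socket opens, then call `flush((chunk) => ws.send(chunk))` once and stream directly afterwards.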

IPC & audio (ipcHandlers.js, audioManager.js): Soniox handlers mirroring existing providers. isDestroyed() guards, cleanupAllStreaming() on app quit, defensive trim before paste.

UI (TranscriptionModelPicker.tsx, SettingsPage.tsx, OnboardingFlow.tsx): Soniox tab with API key input, model selection via registry, secondary language selector for mixed-language transcription. Unified with existing provider card pattern.

Tests (tests/helpers/sonioxStreaming.test.js): 25 tests for text processing using Node built-in test runner (zero new dependencies).

Test plan

- npm test: 25 unit tests pass
- Manual: Add Soniox API key in Settings → Soniox tab, select stt-rt-v4 model
- Manual: Record speech with fillers ("uh", "um", "hmm") → verify they are stripped from transcript
- Manual: Record speech starting with a filler → verify first letter is capitalized
- Manual: Set secondary language (e.g. English + Polish), speak mixed-language → verify transcription
- Manual: Verify no WebSocket leak after multiple start/stop cycles (check DevTools Network tab)
- CI: Linux and Windows builds pass (build run)

gabrielste1n · 2026-03-13T16:14:29Z

very cool thanks @DamianPala - will aim to review asap

alumpe · 2026-03-14T22:28:12Z

Soniox looks really great and the quality of their speech recognition is crazy good, I'd use this asap once it's merged in!

DamianPala · 2026-03-15T09:32:16Z

I am still testing this daily and ran into one issue: latency. Using the Soniox backend from Europe, it takes 400-500ms to open a WebSocket each time. That delay is noticeable compared to other providers that pre-open connections.

I added a configurable warm-connection setting, "Stay connected for", in the Soniox tab. When enabled, the WebSocket stays open between recordings, so the next one starts instantly instead of after ~500ms. Since Soniox charges for connection time (not just audio), each option shows the estimated cost increase. Default is Off.

Still testing real-world costs. Should be ready to merge in the next several days.

PS. I contacted Soniox with a feature request proposal, but I don't think they will change it quickly.

Add Soniox as a fourth cloud streaming provider alongside Deepgram, AssemblyAI, and OpenAI Realtime. Includes WebSocket streaming core with cold-start buffering, full Electron IPC pipeline, settings UI with API key management, onboarding validation, and BYOK detection.

- Remove Soniox-specific render branch in TranscriptionModelPicker, use same ModelCardList + API key maps as OpenAI/Groq/Mistral
- Replace hardcoded "stt-rt-v4" in UI with registry-based model selection
- Add Soniox "S" icon SVG (from official wordmark)
- Translate soniox_stt_rt_v4 model description in 9 locale files

When audioManager calls finalize() before disconnect(), the server has already received it. Sending it again in drainFinalTokens() caused a 3s timeout waiting for a response that would never come. Track finalize state with _finalizeSent flag and skip the redundant call.
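The guard described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the class name is hypothetical, `ws` is any object with a `send(string)` method, and the `{ type: "finalize" }` message shape is assumed rather than taken from the diff. Only the `_finalizeSent` flag and the method names `finalize()` and `drainFinalTokens()` come from the commit message.

```javascript
// Hypothetical session wrapper illustrating the _finalizeSent guard.
class SonioxSessionSketch {
  constructor(ws) {
    this.ws = ws;
    this._finalizeSent = false;
  }

  // Ask the server to finalize pending tokens, at most once per session.
  finalize() {
    if (this._finalizeSent) return false;
    // Assumed message shape; check the provider's API reference before relying on it.
    this.ws.send(JSON.stringify({ type: "finalize" }));
    this._finalizeSent = true;
    return true;
  }

  // Called during disconnect: only sends finalize if nobody did so earlier,
  // so there is no second request whose response would never arrive.
  drainFinalTokens() {
    this.finalize();
    // Waiting for the remaining final tokens is omitted in this sketch.
  }
}
```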

Soniox connects in ~250ms, no benefit from keeping an idle WebSocket between dictation sessions. Avoids unnecessary Soniox session usage and potential idle timeout issues.

- Remove closeResolve (never assigned, close handler check unreachable)
- Use getFullTranscript() instead of inline .map().join() duplicate
- Remove soniox special-case in handleCloudProviderChange (generic path handles it)

Soniox supports multi-language transcription via language_hints array. Add a secondary language selector in the Soniox provider tab so users can hint a second language (e.g. Polish + English) for code-switching.

- New sonioxSecondaryLanguage setting in store/hook
- LanguageSelector dropdown in Soniox tab (inline layout)
- Disabled when primary language is auto (no bias needed)
- Language codes normalized to base form (en-US → en)
- i18n keys added for all 10 locales
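One way the hint list could be assembled is sketched below. The function names are hypothetical, and returning an empty list for an "auto" primary language is an assumption based on the "no bias needed" note; only `language_hints` and the en-US → en normalization come from the commit message.

```javascript
// Hypothetical helpers for building the language_hints array.
function toBaseLanguage(code) {
  // "en-US" -> "en", "pl" -> "pl"
  return code.split("-")[0].toLowerCase();
}

function buildLanguageHints(primary, secondary) {
  // With auto-detection there is no primary language to bias, so send no hints (assumed).
  if (!primary || primary === "auto") return [];
  const hints = [toBaseLanguage(primary)];
  if (secondary && secondary !== "auto") {
    const base = toBaseLanguage(secondary);
    // Avoid duplicates such as en-US + en-GB collapsing to the same base code.
    if (!hints.includes(base)) hints.push(base);
  }
  return hints;
}
```

For example, `buildLanguageHints("pl", "en-US")` yields `["pl", "en"]`, which would then be passed as `language_hints` in the stream configuration.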

- Add 30s idle timeout to Soniox keepalive to prevent zombie WebSocket connections surviving renderer hot-reload or crash
- Add cleanupAllStreaming() to close all streaming backends on app quit
- Add isDestroyed() guards to Soniox and dictation IPC callbacks, matching the pattern used by Deepgram and AssemblyAI
- Prefer cleanupAll() over cleanup() for backends that support it (Deepgram, AssemblyAI) to also clean warm connections and timers
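The idle timeout idea can be illustrated with a small watchdog. This is a sketch under assumptions: the `IdleWatchdog` name and its injectable timer functions are hypothetical and exist only to make the example testable; the PR's module may structure its keepalive differently.

```javascript
// Hypothetical idle watchdog: fires onIdle if touch() is not called for idleMs.
class IdleWatchdog {
  constructor(onIdle, idleMs = 30000, schedule = setTimeout, cancel = clearTimeout) {
    this.onIdle = onIdle;
    this.idleMs = idleMs;
    this.schedule = schedule;
    this.cancel = cancel;
    this.handle = null;
  }

  // Call on every audio chunk (or other sign of life) to restart the countdown.
  touch() {
    this.stop();
    const schedule = this.schedule;
    this.handle = schedule(() => {
      this.handle = null;
      this.onIdle();
    }, this.idleMs);
  }

  // Call on normal disconnect so no timer outlives the session.
  stop() {
    if (this.handle !== null) {
      const cancel = this.cancel;
      cancel(this.handle);
      this.handle = null;
    }
  }
}
```

A caller would typically pass something like `() => ws.close()` as `onIdle`, so a socket orphaned by a renderer reload is closed after 30 seconds without audio.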

Soniox sends a U+FFFD replacement character as a final token when recording silence, which gets pasted as garbage. Filter out empty, whitespace-only, and replacement character tokens in Soniox handler. Also trim finalText before the paste guard in audioManager as a defensive check for all streaming providers.

Strip hesitation fillers (uh, um, yyy, eee, mmm, hmm) from assembled transcript text. Soniox BPE tokenization splits fillers across sub-word tokens, so removal works on joined text using word boundaries. Capitalizes first letter after filler removal at sentence boundaries (.!?) and at text start, with full Unicode support (Polish ć/ó/ś, accented Latin, Cyrillic). Preserves real exclamations (Oh, Ah) and words containing filler substrings (umbrella, human, summer). Adds first test infrastructure (node:test, zero deps) with 25 tests.

Extract _drainCallback helper to eliminate near-identical drainFinalTokens/drainSessionEnd methods. Add isValidToken predicate for clearer token filtering, extract isExplicitLang to simplify nested ternary, and log errors in cleanup catch blocks.

Pre-opens WebSocket between recordings to eliminate ~500ms cold-start delay. Configurable idle timeout (30s-5min) with cost estimates in UI. Falls back to cold-start on config mismatch or connection loss.
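The fallback decision might look roughly like the check below. It is a sketch, not the PR's logic: the `{ ws, config }` shape and the compared field names are assumptions, and `readyState === 1` follows the standard WebSocket `OPEN` state.

```javascript
// Hypothetical check deciding between reusing the warm socket and a cold start.
const WS_OPEN = 1; // WebSocket.OPEN in the standard WebSocket API

function canReuseWarmConnection(warm, wanted) {
  // Connection lost or never opened: fall back to cold start.
  if (!warm || !warm.ws || warm.ws.readyState !== WS_OPEN) return false;
  // Compare only settings that are fixed when the stream is opened (assumed set).
  return ["model", "language", "secondaryLanguage"].every(
    (key) => warm.config[key] === wanted[key]
  );
}
```

When the function returns false, the caller would close any stale socket and open a fresh one, accepting the cold-start delay for that recording.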

The filler regex consumed periods after fillers ("word, uh. Next" became "word Next"), merging sentences and losing capitalization. The replacement function now checks whether a consumed period is a sentence boundary or part of a standalone filler. Also stops treating "hmm" as a filler since it carries intentional meaning ("Hmm, interesting" vs hesitation noise like "uh" or "eee").
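To make the filler rules concrete, here is a simplified two-pass sketch. It is not the regex from this PR: the constant and function names are hypothetical, and edge cases covered by the PR's 25 tests may behave differently. It works on the joined text, treats fillers as whole words using Unicode letter classes, leaves "hmm" alone, and keeps a period that ends a real sentence.

```javascript
// Hypothetical re-implementation of the filler cleanup, for illustration only.
const FILLERS = "uh|um|yyy|eee|mmm"; // "hmm" is intentionally not listed
const NOT_WORD_BEFORE = "(?<![\\p{L}\\p{N}])";
const NOT_WORD_AFTER = "(?![\\p{L}\\p{N}])";

// Pass 1: fillers that open a sentence. Drop them with their own punctuation
// (a standalone "Uh." disappears entirely) and capitalize the next letter.
const SENTENCE_START = new RegExp(
  `(^|[.!?]\\s+)(?:(?:${FILLERS})${NOT_WORD_AFTER}[,.!?]*\\s*)+(\\p{L})?`,
  "giu"
);

// Pass 2: fillers elsewhere. Drop them with the comma or space before them,
// but keep punctuation after them, so "word, uh. Next" keeps its period.
const MID_SENTENCE = new RegExp(
  `[,\\s]*${NOT_WORD_BEFORE}(?:${FILLERS})${NOT_WORD_AFTER}`,
  "giu"
);

function removeFillers(text) {
  return text
    .replace(SENTENCE_START, (_match, pre, next = "") => pre + next.toUpperCase())
    .replace(MID_SENTENCE, "")
    .replace(/\s{2,}/g, " ")
    .trim();
}

console.log(removeFillers("word, uh. Next"));          // word. Next
console.log(removeFillers("yyy, ćma leci"));           // Ćma leci
console.log(removeFillers("umbrella, human, summer")); // umbrella, human, summer
```

The lookbehind and lookahead on `\p{L}` and `\p{N}` stand in for `\b`, which only understands ASCII word characters and would misjudge boundaries next to letters like ć or ó.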

setSonioxKeepAliveTimeout was implemented in the store but missing from the SettingsState interface. Also adds sonioxKeepAliveTimeout to NUMERIC_SETTINGS so cross-window sync preserves the number type.
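The type problem this commit addresses can be sketched as follows, assuming synced values arrive as strings (as they would through a localStorage-style channel). The `parseSyncedValue` helper is hypothetical; only the `NUMERIC_SETTINGS` and `sonioxKeepAliveTimeout` names come from the commit message, and the real list contains more keys than shown here.

```javascript
// Hypothetical sketch: coerce string payloads back to numbers for numeric keys.
const NUMERIC_SETTINGS = new Set(["sonioxKeepAliveTimeout"]);

function parseSyncedValue(key, raw) {
  if (!NUMERIC_SETTINGS.has(key)) return raw;
  const n = Number(raw);
  // Return undefined for unparsable input so the caller can keep its default.
  return Number.isFinite(n) ? n : undefined;
}
```

Without the coercion, a timeout of `120` would reach the other window as the string `"120"`, and strict comparisons or arithmetic on it would misbehave.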

DamianPala · 2026-03-18T07:59:36Z

Branch is rebased, tested, ready for review. Keep-alive works well in practice - cold start 400-800 ms (at my location), warm ~50-100 ms. I've been using it daily for a week without issues. Also fixed filler removal edge cases and added a few TS consistency fixes along the way.

Re CodeQL: false positive. It flags debugLogger.js:189 where meta (with WebSocket error details) is passed to console.log. Every other streaming backend does the same thing - they're just in the CodeQL baseline because the files are older. My file is new so it shows as a "new alert".

Soniox BPE emits spaces as standalone tokens (e.g. after punctuation). The .trim() check in isValidToken rejected them, merging words across punctuation: "No,ładne" instead of "No, ładne".
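A token filter consistent with this fix could look like the sketch below. It assumes each token is an object with a string `text` field; the predicate body is an illustration, not the PR's implementation. Only the `isValidToken` name, the U+FFFD case, and the "No, ładne" example come from the commit messages.

```javascript
// Hypothetical predicate: drop empty and replacement-character tokens,
// but keep standalone whitespace tokens that separate words.
function isValidToken(token) {
  const text = token && token.text;
  if (typeof text !== "string" || text.length === 0) return false;
  // Tokens made only of U+FFFD are what reportedly arrives for recorded silence.
  if (/^\uFFFD+$/.test(text)) return false;
  // Deliberately no .trim() check here: a lone " " token is meaningful.
  return true;
}

const tokens = [
  { text: "No," },
  { text: " " },
  { text: "ładne" },
  { text: "\uFFFD" },
  { text: "" },
];
const joined = tokens.filter(isValidToken).map((t) => t.text).join("");
console.log(joined); // No, ładne
```

Leading or trailing whitespace in the assembled transcript can still be trimmed once, after joining, which matches the defensive trim before paste mentioned earlier.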

DamianPala force-pushed the feat/soniox-streaming branch from 221b476 to 9d02380 on March 12, 2026 12:07

DamianPala marked this pull request as ready for review March 12, 2026 12:34

DamianPala force-pushed the feat/soniox-streaming branch from 9d02380 to 86990c2 on March 13, 2026 10:18

gabrielste1n self-requested a review March 13, 2026 16:14

DamianPala marked this pull request as draft March 13, 2026 21:36

DamianPala added 13 commits March 18, 2026 08:10

fix(soniox): make warmup a no-op for cold-start-only design

292d69a

refactor(soniox): remove dead code and redundant special-case

b1f2454

feat(soniox): add warm connection keep-alive for instant start

349f7db

fix(soniox): add missing type and numeric sync for keep-alive setting

f00cd6c

DamianPala force-pushed the feat/soniox-streaming branch from 86990c2 to f00cd6c on March 18, 2026 07:37

DamianPala marked this pull request as ready for review March 18, 2026 07:59

DamianPala added 2 commits March 18, 2026 12:01

fix(soniox): preserve standalone space tokens from BPE tokenizer

ac31d5d

feat(soniox): pass custom dictionary as context.terms

0a1ed72

feat(soniox): add Soniox real-time streaming STT provider #418
DamianPala wants to merge 15 commits into OpenWhispr:main from DamianPala:feat/soniox-streaming
