Share single model across channels and add 2-channel e2e tests #4142
Merged
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Summary
Moves model creation (`hypr_cactus::Model::builder().build()`) out of the per-channel loop so that a single model is loaded once and shared (via `Arc::clone()`) across all channel streams. Previously, for 2-channel interleaved audio, two full model instances (weights + VAD) were loaded into memory.

This is safe because each `cactus_stream_transcribe_process` call runs a complete encode→decode cycle and then calls `cactus_reset()`, leaving no model state between calls. The per-channel state (audio buffer, confirmation logic) lives entirely in the separate `CactusStreamTranscribeHandle` instances. Concurrent access is serialized by the existing `model_mutex` on the C++ side.

Tradeoff: channels now serialize on the model mutex instead of running in parallel. With ~50-150ms inference per 300ms chunk, this adds at most one inference duration of latency when both channels need the model simultaneously.
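The hoisting described above can be sketched in miniature. This is a std-only illustration, not the actual `hypr_cactus` API: `Model`, `build`, and `process_chunk` are hypothetical stand-ins, and the `Mutex<()>` plays the role of the C++ `model_mutex` that serializes inference.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the real model type. In the real crate the
// builder loads weights + VAD, which is the expensive step the PR hoists
// out of the per-channel loop.
struct Model {
    // Serializes inference across channels, mirroring the C++ model_mutex.
    mutex: Mutex<()>,
}

impl Model {
    fn build() -> Arc<Model> {
        Arc::new(Model { mutex: Mutex::new(()) })
    }

    fn process_chunk(&self, channel: usize, chunk: &[i16]) -> String {
        let _guard = self.mutex.lock().unwrap(); // one channel at a time
        format!("ch{}: {} samples", channel, chunk.len())
    }
}

fn main() {
    // Before: Model::build() inside the loop -> one full model per channel.
    // After: build once, Arc::clone() per channel stream.
    let model = Model::build();

    let handles: Vec<_> = (0..2usize)
        .map(|ch| {
            let model = Arc::clone(&model);
            thread::spawn(move || model.process_chunk(ch, &[0i16; 4800]))
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```

The key property is that all per-channel mutable state lives outside the shared model (in the real code, in the `CactusStreamTranscribeHandle` instances), so cloning the `Arc` is safe.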
Updates since last revision
Added two new e2e tests and a CI workflow to exercise 2-channel inference:
- `e2e_streaming_dual_channel`: creates a single shared `Model` with two `transcribe_stream` handles, feeds `english_1` to ch0 and `english_2` to ch1, and asserts that both channels produce events.
- `e2e_websocket_dual_channel`: spins up the full `TranscribeService`, connects via WebSocket with `channels=2`, sends interleaved stereo PCM (`english_1` + `english_2`), and asserts that `Results` messages arrive for both `channel_index` 0 and 1.
- `.github/workflows/local_stt_e2e.yaml`: new workflow running the above tests (plus the existing `e2e_streaming`) on `depot-ubuntu-24.04-arm-8` with the moonshine-base model. Triggers on changes to `transcribe-cactus`, `cactus`, or `cactus-sys`.

Review & Testing Checklist for Human
- The `e2e_streaming_dual_channel` test uses `tokio::select!` over two event streams. If one stream finishes before the other (e.g. shorter audio), the loop breaks and remaining events from the other stream are silently dropped. Verify this doesn't cause false passes or flaky failures.
- Verify the `local_stt_e2e` workflow passes end-to-end on the first run.
- Verify that `cactus_reset()` fully clears all model state (KV-cache, encoder output, persistent nodes) between channel inferences; any missed state would cause one channel's audio context to bleed into the other.
- Suggested manual test plan: run a real 2-channel transcription session on device and compare transcription quality and latency against the previous 2-model approach.
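The first checklist item can be made concrete with a std-only sketch. `Event` and `drain_both` are illustrative names, not project types, and blocking `mpsc` iteration drains sequentially rather than interleaved like `tokio::select!`; but it demonstrates the property the test needs: the loop must end only when *both* streams have ended, otherwise trailing events from the longer stream are lost.

```rust
use std::sync::mpsc;

// Hypothetical stand-in for a transcribe stream event.
#[derive(Debug)]
struct Event {
    channel: usize,
    text: String,
}

// Drains both receivers to completion, instead of stopping when the first
// one closes (the silent-drop failure mode the checklist warns about).
fn drain_both(rx0: mpsc::Receiver<Event>, rx1: mpsc::Receiver<Event>) -> Vec<Event> {
    let mut events = Vec::new();
    // Iterating a Receiver yields items until its sender is dropped, so
    // this only returns once both streams are fully consumed.
    for ev in rx0 {
        events.push(ev);
    }
    for ev in rx1 {
        events.push(ev);
    }
    events
}

fn main() {
    let (tx0, rx0) = mpsc::channel();
    let (tx1, rx1) = mpsc::channel();

    // ch0's audio is shorter: its stream ends after one event.
    tx0.send(Event { channel: 0, text: "hello".into() }).unwrap();
    drop(tx0);
    // ch1 still has events pending when ch0 ends.
    tx1.send(Event { channel: 1, text: "hi".into() }).unwrap();
    tx1.send(Event { channel: 1, text: "there".into() }).unwrap();
    drop(tx1);

    let events = drain_both(rx0, rx1);
    // All three events survive, including ch1's trailing one.
    assert_eq!(events.len(), 3);
    println!("{} events", events.len());
}
```

A `select!`-based loop can achieve the same guarantee by tracking a done-flag per stream and breaking only when both are set.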
Notes