fix: improve cactus API ergonomics and consistency (#4150)
Merged
- Wrap `complete_stream` return type in a `CompletionStream` struct with a `Stream` impl, a `cancel()` method, and `Drop`-based cleanup
- Wrap `transcribe_stream` return type in a `TranscriptionSession` struct with a `Stream` impl, `audio_tx()`/`cancel()` accessors, and `Drop` cleanup
- Unify token count types: `CompletionResult` u32→u64, `StreamResult` f64→u64 (now consistent with `TranscriptionResult`, which already used u64)
- Add `tracing::warn` when recovering from a poisoned inference mutex
- Update callers in `llm-cactus` and `transcribe-cactus`

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
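For reference, the wrapper shape can be sketched in plain std Rust. This is an illustrative stand-in, not the PR's code: the real `CompletionStream` wraps a futures `Stream` plus a `CancellationToken`, while this sketch uses an `mpsc` channel and `Iterator`, and `spawn_completion` is a hypothetical constructor invented for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread::JoinHandle;

/// Illustrative stand-in for the PR's `CompletionStream`: owns the
/// worker handle and a cancel flag instead of exposing a raw 3-tuple.
pub struct CompletionStream {
    rx: mpsc::Receiver<String>,
    cancelled: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl CompletionStream {
    /// Signal the worker to stop producing tokens.
    pub fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
}

// The real type implements `futures::Stream`; `Iterator` keeps the
// sketch dependency-free while showing the same consumption shape.
impl Iterator for CompletionStream {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        self.rx.recv().ok()
    }
}

impl Drop for CompletionStream {
    fn drop(&mut self) {
        // Cancel, then reap the worker so nothing leaks.
        self.cancel();
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
    }
}

/// Hypothetical constructor for the sketch: streams tokens from a
/// background worker until done or cancelled.
fn spawn_completion(tokens: Vec<&'static str>) -> CompletionStream {
    let (tx, rx) = mpsc::channel();
    let cancelled = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancelled);
    let handle = std::thread::spawn(move || {
        for t in tokens {
            if flag.load(Ordering::SeqCst) || tx.send(t.to_string()).is_err() {
                break;
            }
        }
    });
    CompletionStream { rx, cancelled, handle: Some(handle) }
}
```

Callers then hold one value that cancels and cleans up on drop, instead of juggling a `(Stream, CancellationToken, JoinHandle)` tuple.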
…rop impls Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
…ncy of transcribe-cactus) Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
The C++ side stores token counts as `double` and serialises them via `operator<<`, which may emit `42.0` instead of `42`. `serde_json` rejects `42.0` when deserialising into `u64`, so we accept both forms. Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
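The idea of accepting both forms can be shown with a plain-Rust helper. `parse_token_count` is a hypothetical name invented for this sketch, not code from the PR; the actual fix would go through a serde `deserialize_with` attribute rather than string parsing.

```rust
/// Parse a JSON-style number that may arrive as `42` or `42.0`
/// into a token count. Rejects negatives and true fractions.
fn parse_token_count(raw: &str) -> Option<u64> {
    let value: f64 = raw.trim().parse().ok()?;
    // Accept only non-negative values with no fractional part,
    // so "42.0" maps to 42 but "42.5" is refused.
    if value >= 0.0 && value.fract() == 0.0 && value <= u64::MAX as f64 {
        Some(value as u64)
    } else {
        None
    }
}
```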
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
CompletionStream and TranscriptionSession Drop impls were calling handle.join() which blocks the current thread. When dropped on a tokio worker thread (e.g. SSE client disconnect), this starves the async runtime. Now we spawn a lightweight background thread to join and log panics without blocking the caller. Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
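The fix amounts to the following pattern. `WorkerGuard` is a hypothetical stand-in for the two `Drop` impls, and `eprintln!` stands in for the tracing call; the point is that the drop itself returns immediately.

```rust
use std::thread;

/// Illustrative guard mirroring the fix: dropping it must not block
/// the dropping thread on `handle.join()`.
struct WorkerGuard {
    handle: Option<thread::JoinHandle<()>>,
}

impl Drop for WorkerGuard {
    fn drop(&mut self) {
        if let Some(handle) = self.handle.take() {
            // Join on a detached helper thread so a drop on a tokio
            // worker thread never stalls the async runtime. The helper
            // also surfaces worker panics instead of swallowing them.
            thread::spawn(move || {
                if let Err(panic) = handle.join() {
                    eprintln!("worker thread panicked: {panic:?}");
                }
            });
        }
    }
}
```

The trade-off is that the worker may outlive the guard briefly, which is acceptable here because cancellation has already been signalled before the join.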
fix: improve cactus API ergonomics and consistency
Summary
Addresses three low-priority issues from a cactus FFI wrapper review:
1. Stream return type ergonomics: `complete_stream` previously returned a 3-tuple `(Stream, CancellationToken, JoinHandle)` and `transcribe_stream` returned a 4-tuple. Both are now wrapped in proper structs (`CompletionStream` and `TranscriptionSession`) that implement `Stream`, expose `cancel()` methods, and handle cleanup via `Drop`.
2. Token count type consistency: Unified token count fields across result structs: `CompletionResult` changed from `u32` to `u64`, `StreamResult` changed from `f64` to `u64` (now consistent with `TranscriptionResult`, which already used `u64`).
3. Mutex poisoning observability: Added `tracing::warn` when recovering from a poisoned inference mutex in `Model::lock_inference`.

Callers in `llm-cactus` and `transcribe-cactus` are updated accordingly. The `drop_guard` + `unfold` pattern in `llm-cactus` streaming is replaced by `CompletionStream`'s own `Drop` impl, and the manual `worker_handles` join loop in `transcribe-cactus` is replaced by `TranscriptionSession::Drop`. Worker panic logging is preserved via `tracing::error!` in both `Drop` impls.
Review & Testing Checklist for Human
- `f64` → `u64` deserialization for `StreamResult` token fields: If the C++ `build_stream_response` emits JSON numbers as floats (e.g., `"prefill_tokens": 12.0`), `serde_json` will fail to deserialize them into `u64`. Verify the C++ side emits integer-typed JSON for these fields, or add a `deserialize_with` helper to handle both. This is a runtime-only failure that CI cannot catch.
- `Drop` on async runtime: Both `CompletionStream::drop()` and `TranscriptionSession::drop()` call `handle.join()`, which blocks the current thread. Verify this isn't called on a tokio worker thread (it should be fine since SSE streams and websocket sessions run on their own tasks, but worth confirming).
- The `drop_guard` + `unfold` pattern in `llm-cactus` was replaced by relying on `CompletionStream`'s `Drop`. Verify that client disconnect still cancels inference promptly; the new path is: SSE stream dropped → `FilterMap` dropped → `CompletionStream` dropped → `cancel()` + `join()`.
- Suggested test plan: Run an LLM streaming completion and a live transcription session end-to-end. Verify (1) streaming tokens arrive normally, (2) client disconnect cancels inference promptly, and (3) token count fields in metrics/responses are populated as integers.
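For the mutex-poisoning item from the Summary, the recovery in `Model::lock_inference` looks roughly like this sketch (`eprintln!` stands in for `tracing::warn!`, and the function is generic here for illustration):

```rust
use std::sync::{Mutex, MutexGuard};

/// Recover a usable guard even if a previous holder panicked,
/// logging the recovery instead of silently swallowing it.
fn lock_inference<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poisoned| {
        // A poisoned mutex only means a holder panicked; the data
        // may still be consistent, so we recover and warn.
        eprintln!("warn: inference mutex was poisoned; recovering");
        poisoned.into_inner()
    })
}
```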
CI status: All functional checks pass (cactus, desktop_ci on linux-x86_64/linux-aarch64/macos, local-stt-e2e). The `fmt` check failed due to a transient network timeout downloading rustfmt, unrelated to these changes.
Notes