fix: improve cactus API ergonomics and consistency#4150

Merged
yujonglee merged 6 commits into main from devin/1771738169-cactus-api-improvements
Feb 22, 2026

Conversation

@devin-ai-integration devin-ai-integration bot commented Feb 22, 2026

fix: improve cactus API ergonomics and consistency

Summary

Addresses three low-priority issues from a cactus FFI wrapper review:

  1. Stream return type ergonomics: complete_stream previously returned a 3-tuple (Stream, CancellationToken, JoinHandle) and transcribe_stream returned a 4-tuple. Both are now wrapped in proper structs (CompletionStream and TranscriptionSession) that implement Stream, expose cancel() methods, and handle cleanup via Drop.

  2. Token count type consistency: Unified token count fields across result structs — CompletionResult changed from u32 to u64, StreamResult changed from f64 to u64 (now consistent with TranscriptionResult which already used u64).

  3. Mutex poisoning observability: Added tracing::warn when recovering from a poisoned inference mutex in Model::lock_inference.
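As a std-only sketch of the recovery pattern in point 3 (using eprintln! in place of tracing::warn!, and a plain u32 standing in for the real inference state; names are illustrative, not the actual cactus API):

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical stand-in for Model::lock_inference: recover from a
// poisoned mutex instead of propagating the panic, but make the
// recovery observable (the real code logs via tracing::warn!).
fn lock_inference(m: &Mutex<u32>) -> MutexGuard<'_, u32> {
    m.lock().unwrap_or_else(|poisoned| {
        eprintln!("inference mutex poisoned; recovering guard");
        poisoned.into_inner()
    })
}
```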

Callers in llm-cactus and transcribe-cactus are updated accordingly. The drop_guard+unfold pattern in llm-cactus streaming is replaced by CompletionStream's own Drop impl, and the manual worker_handles join loop in transcribe-cactus is replaced by TranscriptionSession::Drop. Worker panic logging is preserved via tracing::error! in both Drop impls.
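The wrapper-struct shape described above can be sketched in std-only Rust. This is not the actual crate code: the real CompletionStream implements futures::Stream, uses a tokio CancellationToken, and logs worker panics via tracing::error!; here an mpsc receiver plays the stream, an AtomicBool plays the token, and eprintln! plays the logger.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread::JoinHandle;

// Sketch of the CompletionStream shape: stream + cancel() + Drop cleanup
// bundled in one struct instead of a loose 3-tuple.
pub struct CompletionStream {
    rx: mpsc::Receiver<String>,
    cancel: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl CompletionStream {
    pub fn cancel(&self) {
        self.cancel.store(true, Ordering::SeqCst);
    }

    // Stand-in for the Stream impl: pull the next token, None when done.
    pub fn next_token(&self) -> Option<String> {
        self.rx.recv().ok()
    }
}

impl Drop for CompletionStream {
    fn drop(&mut self) {
        self.cancel();
        if let Some(handle) = self.handle.take() {
            // Preserve worker-panic visibility on cleanup.
            if handle.join().is_err() {
                eprintln!("completion worker panicked");
            }
        }
    }
}

// Spawn a worker that emits tokens until cancelled or done.
pub fn complete_stream(tokens: Vec<String>) -> CompletionStream {
    let (tx, rx) = mpsc::channel();
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    let handle = std::thread::spawn(move || {
        for t in tokens {
            if flag.load(Ordering::SeqCst) || tx.send(t).is_err() {
                break;
            }
        }
    });
    CompletionStream { rx, cancel, handle: Some(handle) }
}
```

Dropping the struct cancels the worker and joins it, so callers no longer need to thread a CancellationToken and JoinHandle through their own cleanup paths.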

Review & Testing Checklist for Human

  • f64→u64 deserialization for StreamResult token fields: If the C++ build_stream_response emits JSON numbers as floats (e.g., "prefill_tokens": 12.0), serde_json will fail to deserialize them into u64. Verify the C++ side emits integer-typed JSON for these fields, or add a deserialize_with helper that accepts both forms. This is a runtime-only failure that CI cannot catch.
  • Blocking Drop on async runtime: Both CompletionStream::drop() and TranscriptionSession::drop() call handle.join(), which blocks the current thread. Verify this isn't called on a tokio worker thread (it should be fine since SSE streams and websocket sessions run on their own tasks, but worth confirming).
  • Streaming cancellation behavior change: The drop_guard+unfold pattern in llm-cactus was replaced by relying on CompletionStream's Drop. Verify that client disconnect still cancels inference promptly — the new path is: SSE stream dropped → FilterMap dropped → CompletionStream dropped → cancel() + join().
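For the first checklist item, the numeric coercion at the core of such a deserialize_with helper could look like the following (a std-only sketch with the serde wiring omitted; the function name is hypothetical):

```rust
// Accept a JSON number that arrives as 12 or 12.0 and map it to u64,
// rejecting genuine fractions, negatives, and non-finite values.
fn token_count_from_f64(v: f64) -> Option<u64> {
    if v.is_finite() && v >= 0.0 && v.fract() == 0.0 && v <= u64::MAX as f64 {
        Some(v as u64)
    } else {
        None
    }
}
```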

Suggested test plan: Run an LLM streaming completion and a live transcription session end-to-end. Verify (1) streaming tokens arrive normally, (2) client disconnect cancels inference promptly, and (3) token count fields in metrics/responses are populated as integers.

CI status: All functional checks pass (cactus, desktop_ci on linux-x86_64/linux-aarch64/macos, local-stt-e2e). The fmt check failed due to a transient network timeout downloading rustfmt, unrelated to these changes.

Notes

- Wrap complete_stream return type in CompletionStream struct with
  Stream impl, cancel() method, and Drop-based cleanup
- Wrap transcribe_stream return type in TranscriptionSession struct
  with Stream impl, audio_tx()/cancel() accessors, and Drop cleanup
- Unify token count types: CompletionResult u32->u64, StreamResult f64->u64
  (now consistent with TranscriptionResult which already used u64)
- Add tracing::warn when recovering from poisoned inference mutex
- Update callers in llm-cactus and transcribe-cactus

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
netlify bot commented Feb 22, 2026

Deploy Preview for hyprnote-storybook canceled.
Latest commit: 18c3f4b
Latest deploy log: https://app.netlify.com/projects/hyprnote-storybook/deploys/699a9f9b17c03f0008f50226

netlify bot commented Feb 22, 2026

Deploy Preview for hyprnote canceled.
Latest commit: 18c3f4b
Latest deploy log: https://app.netlify.com/projects/hyprnote/deploys/699a9f9b59f899000825669f

devin-ai-integration bot and others added 4 commits February 22, 2026 05:37
…rop impls

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
…ncy of transcribe-cactus)

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
The C++ side stores token counts as double and serialises via
operator<< which may emit 42.0 instead of 42. serde_json rejects
42.0 when deserialising into u64, so we accept both forms.

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 potential issues.


CompletionStream and TranscriptionSession Drop impls were calling
handle.join() which blocks the current thread. When dropped on a
tokio worker thread (e.g. SSE client disconnect), this starves the
async runtime. Now we spawn a lightweight background thread to join
and log panics without blocking the caller.
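A std-only sketch of that fix (names are illustrative, not the actual crate API; the real Drop impls detach the reaper thread rather than keeping its handle, which is returned here only so the behavior is testable):

```rust
use std::thread::{self, JoinHandle};

// Instead of calling handle.join() on the current (possibly
// async-runtime) thread, hand the JoinHandle to a short-lived
// background thread that joins and logs panics without blocking
// the caller.
fn reap_in_background(handle: JoinHandle<()>, worker: &'static str) -> JoinHandle<()> {
    thread::spawn(move || {
        if handle.join().is_err() {
            eprintln!("{worker} worker panicked");
        }
    })
}
```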

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@yujonglee yujonglee merged commit a3c7a04 into main Feb 22, 2026
15 of 16 checks passed
@yujonglee yujonglee deleted the devin/1771738169-cactus-api-improvements branch February 22, 2026 06:25