feat(llms/anthropic): Add support for RawMessageDeltaEvent in streaming #20206
base: main
fba6b5d to b10dfef
AstraBert
left a comment
The PR seems legit but the e2e tests you added are failing with:
AssertionError: stop_reason should be captured from RawMessageDeltaEvent
E assert None is not None
I did some digging with debugging statements, and it seems the stop_reason is not captured because the last_chunk is streamed before we receive the delta that carries the stop reason (see this log):
CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
RECEIVED_STOP_REASON: end_turn
which causes the test to fail. I would fix that before merging this PR.
@AstraBert Thanks for catching that timing issue! I've fixed it by yielding an additional chunk when we receive the
This commit implements support for capturing usage metadata and stop_reason from RawMessageDeltaEvent when using Anthropic's streaming API, addressing issue run-llama#20194.

Changes:
- Added imports for RawMessageStartEvent, RawMessageDeltaEvent, RawMessageStopEvent
- Modified stream_chat() to track usage_metadata and stop_reason from streaming events
- Modified astream_chat() to track usage_metadata and stop_reason from streaming events
- Added usage and stop_reason to ChatMessage additional_kwargs in streaming responses
- Added comprehensive tests for both sync and async streaming metadata capture

The implementation captures:
- input_tokens and output_tokens from usage metadata for cost tracking
- stop_reason (e.g., 'end_turn', 'max_tokens', 'tool_use') to understand why streaming stopped

This maintains backward compatibility as the new fields default to None when not available.

Fixes run-llama#20194

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The previous implementation had a timing issue where stop_reason arrived after the last chunk was yielded. This fix ensures that when we receive RawMessageDeltaEvent (which contains the stop_reason), we yield an additional chunk with the updated metadata.

This addresses the test failure identified by @AstraBert where:
- Content chunks were yielded before stop_reason arrived
- stop_reason was None in the last chunk
- RawMessageDeltaEvent came after ContentBlockStopEvent

Now the flow is:
1. ContentBlockDeltaEvent -> yield chunks with content
2. ContentBlockStopEvent -> yield final content chunk
3. RawMessageDeltaEvent -> yield metadata update with stop_reason
4. Tests now pass with stop_reason correctly captured
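The event ordering described in this commit message can be sketched in isolation. The following is a minimal illustration and not the PR's actual code: `TextDelta`, `BlockStop`, `MessageDelta`, `Chunk`, and `stream_chunks` are hypothetical stand-ins that only mimic the roles of ContentBlockDeltaEvent, ContentBlockStopEvent, RawMessageDeltaEvent, and the streamed response chunk. It assumes the metadata is exposed through an `additional_kwargs` dict on each chunk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional


# Hypothetical stand-ins for the SDK events; only the fields used here.
@dataclass
class TextDelta:  # plays the role of ContentBlockDeltaEvent
    text: str


@dataclass
class BlockStop:  # plays the role of ContentBlockStopEvent
    pass


@dataclass
class MessageDelta:  # plays the role of RawMessageDeltaEvent
    stop_reason: Optional[str]
    output_tokens: int


@dataclass
class Chunk:  # hypothetical stand-in for a streamed response chunk
    content: str
    delta: str
    additional_kwargs: Dict[str, Any] = field(default_factory=dict)


def _snapshot(content: str, delta: str, usage: Dict[str, int],
              stop_reason: Optional[str]) -> Chunk:
    # Copy usage so later updates do not mutate chunks already yielded.
    return Chunk(content, delta, {"usage": dict(usage), "stop_reason": stop_reason})


def stream_chunks(events: Iterable[Any], input_tokens: int = 0) -> Iterator[Chunk]:
    content = ""
    usage = {"input_tokens": input_tokens, "output_tokens": 0}
    stop_reason: Optional[str] = None
    for event in events:
        if isinstance(event, TextDelta):
            # 1. Content delta: yield a chunk with the new text.
            content += event.text
            yield _snapshot(content, event.text, usage, stop_reason)
        elif isinstance(event, BlockStop):
            # 2. Block stop: yield the final content chunk (stop_reason still None).
            yield _snapshot(content, "", usage, stop_reason)
        elif isinstance(event, MessageDelta):
            # 3. Message delta: record the metadata and yield one more chunk,
            #    so the last chunk a consumer sees carries stop_reason.
            usage["output_tokens"] = event.output_tokens
            stop_reason = event.stop_reason
            yield _snapshot(content, "", usage, stop_reason)


events = [TextDelta("Hel"), TextDelta("lo"), BlockStop(), MessageDelta("end_turn", 2)]
chunks = list(stream_chunks(events, input_tokens=14))
last = chunks[-1]
print(last.content, last.additional_kwargs)
```

Without the extra yield in the third branch, the final chunk would be the one produced at block stop, which is the situation the failing test exposed.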
b23094f to 7ac1df0
Add mock-based unit tests for RawMessageDeltaEvent usage and stop_reason capture to meet CI coverage requirements without requiring API keys.

**Changes:**
- Add test_stream_chat_usage_and_stop_reason_mock() for sync streaming
- Add test_astream_chat_usage_and_stop_reason_mock() for async streaming
- Mock RawMessageDeltaEvent with usage metadata and stop_reason
- Achieve 54% test coverage (meets 50% CI requirement)

**Testing:**
- Both mock tests pass without ANTHROPIC_API_KEY
- Existing real API tests remain for integration testing
- Coverage increased from 6% to 54%

Follows CONTRIBUTING.md guidance: "If you're integrating with a remote system, mock it to prevent test failures from external changes."

Related to run-llama#20194
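A mock-based test of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the tests added in this PR: `collect_stream_metadata` is a hypothetical helper written only so the test has something to exercise, the model name is a placeholder, and the events are plain `SimpleNamespace` objects shaped like the streaming events (`message_start`, `content_block_delta`, `message_delta`) rather than real SDK classes. No API key or network access is needed.

```python
from types import SimpleNamespace as NS
from unittest.mock import MagicMock


def collect_stream_metadata(client, **request):
    """Hypothetical helper: drain a stream, return (text, usage, stop_reason)."""
    text, usage, stop_reason = "", {}, None
    for event in client.messages.create(stream=True, **request):
        if event.type == "message_start":
            # Initial usage arrives with the message start.
            usage["input_tokens"] = event.message.usage.input_tokens
            usage["output_tokens"] = event.message.usage.output_tokens
        elif event.type == "content_block_delta":
            text += event.delta.text
        elif event.type == "message_delta":
            # Final output token count and stop_reason arrive here.
            usage["output_tokens"] = event.usage.output_tokens
            stop_reason = event.delta.stop_reason
    return text, usage, stop_reason


def test_usage_and_stop_reason_mock():
    events = [
        NS(type="message_start",
           message=NS(usage=NS(input_tokens=14, output_tokens=1))),
        NS(type="content_block_delta", delta=NS(text="Hi")),
        NS(type="content_block_stop"),
        NS(type="message_delta",
           delta=NS(stop_reason="end_turn"), usage=NS(output_tokens=2)),
        NS(type="message_stop"),
    ]
    client = MagicMock()  # stands in for the remote client
    client.messages.create.return_value = iter(events)

    text, usage, stop_reason = collect_stream_metadata(
        client, model="hypothetical-model", max_tokens=16, messages=[]
    )

    assert text == "Hi"
    assert usage == {"input_tokens": 14, "output_tokens": 2}
    assert stop_reason == "end_turn"
    client.messages.create.assert_called_once()


test_usage_and_stop_reason_mock()
```

Because the stream is a canned list of events, the test stays deterministic even if the remote service changes, which is the point of the CONTRIBUTING.md guidance quoted above.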
Description
This PR adds support for capturing usage metadata and stop_reason from Anthropic's `RawMessageDeltaEvent` during streaming responses, addressing issue #20194.

Changes:
- `RawMessageStartEvent`, `RawMessageDeltaEvent`, `RawMessageStopEvent`
- `input_tokens`, `output_tokens`) from streaming events
- `stop_reason` (end_turn, max_tokens, tool_use, etc.) from events
- `ChatResponse.additional_kwargs` for both sync and async
- `stream_chat` and `astream_chat` methods

Benefits:
- `stop_reason` for debugging and optimization
- `finish_reason`)
- `None`)

Fixes #20194
New Package?
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
Test coverage:
- `test_stream_chat_usage_and_stop_reason()` for sync streaming
- `test_astream_chat_usage_and_stop_reason()` for async streaming
- `input_tokens`, `output_tokens`, and `stop_reason` capture
- `ANTHROPIC_API_KEY` environment variable
Example Usage