Conversation


@zhangzhefang-github zhangzhefang-github commented Nov 2, 2025

Description

This PR adds support for capturing usage metadata and stop_reason from Anthropic's RawMessageDeltaEvent during streaming responses, addressing issue #20194.

Changes:

  • Import RawMessageStartEvent, RawMessageDeltaEvent, RawMessageStopEvent
  • Track usage metadata (input_tokens, output_tokens) from streaming events
  • Capture stop_reason (end_turn, max_tokens, tool_use, etc.) from events
  • Include metadata in ChatResponse.additional_kwargs for both sync and async
  • Add comprehensive tests for both stream_chat and astream_chat methods

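The tracking logic described above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: the event classes below are minimal dataclasses that only mirror the names and fields of the Anthropic SDK types (`RawMessageStartEvent`, `RawMessageDeltaEvent`), and `collect_stream_metadata` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in event types; the real SDK's RawMessageStartEvent carries the
# initial usage, and RawMessageDeltaEvent carries updated usage plus
# delta.stop_reason.
@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

@dataclass
class Delta:
    stop_reason: Optional[str]

@dataclass
class RawMessageStartEvent:
    usage: Usage

@dataclass
class RawMessageDeltaEvent:
    delta: Delta
    usage: Usage

def collect_stream_metadata(events):
    """Accumulate usage and stop_reason as streaming events arrive (sketch)."""
    kwargs = {"usage": None, "stop_reason": None}
    for event in events:
        if isinstance(event, RawMessageStartEvent):
            kwargs["usage"] = {
                "input_tokens": event.usage.input_tokens,
                "output_tokens": event.usage.output_tokens,
            }
        elif isinstance(event, RawMessageDeltaEvent):
            # The delta event arrives late in the stream with the final
            # output token count and the stop_reason.
            if kwargs["usage"] is not None:
                kwargs["usage"]["output_tokens"] = event.usage.output_tokens
            kwargs["stop_reason"] = event.delta.stop_reason
    return kwargs

events = [
    RawMessageStartEvent(usage=Usage(input_tokens=14, output_tokens=1)),
    RawMessageDeltaEvent(delta=Delta(stop_reason="end_turn"),
                         usage=Usage(input_tokens=14, output_tokens=9)),
]
print(collect_stream_metadata(events))
```

In the PR itself this accumulated dict ends up in `ChatResponse.additional_kwargs`, so downstream code can read `usage` and `stop_reason` off each chunk.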
Benefits:

  • 📊 Enable cost tracking and monitoring for streaming responses
  • 🐛 Provide stop_reason for debugging and optimization
  • 🔄 Achieve feature parity with OpenAI integration (finish_reason)
  • ⬆️ Maintain backward compatibility (new fields default to None)

Fixes #20194

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No - This is an enhancement to existing package

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No - Will be handled by maintainers during release

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Test coverage:

  • Added test_stream_chat_usage_and_stop_reason() for sync streaming
  • Added test_astream_chat_usage_and_stop_reason() for async streaming
  • Tests verify input_tokens, output_tokens, and stop_reason capture
  • Tests require ANTHROPIC_API_KEY environment variable

Example Usage

```python
from llama_index.llms.anthropic import Anthropic
from llama_index.core.llms import ChatMessage

llm = Anthropic(model="claude-3-5-sonnet-latest")
messages = [ChatMessage(role="user", content="Hello")]

for chunk in llm.stream_chat(messages):
    usage = chunk.message.additional_kwargs.get("usage")
    stop_reason = chunk.message.additional_kwargs.get("stop_reason")

    if usage:
        print(f"Tokens: {usage['input_tokens']} in, {usage['output_tokens']} out")
    if stop_reason:
        print(f"Stopped because: {stop_reason}")
```
Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods
Notes

  • This change is backward compatible: new fields default to None if not available
  • The implementation follows the existing pattern used in the codebase
  • Usage metadata enables cost tracking for streaming responses
  • stop_reason provides insight into why streaming stopped (feature parity with OpenAI)

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 2, 2025
@zhangzhefang-github zhangzhefang-github force-pushed the feat/anthropic-raw-message-delta-event branch from fba6b5d to b10dfef on November 2, 2025 at 16:25
Member

@AstraBert AstraBert left a comment


The PR seems legit but the e2e tests you added are failing with:

AssertionError: stop_reason should be captured from RawMessageDeltaEvent
E       assert None is not None

I've done some digging around with some debugging statements and it seems like the stop_reason is not captured because the last_chunk is streamed before we receive the delta with the stop reason (see this log):

CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
RECEIVED_STOP_REASON: end_turn

which causes the test to fail. I would fix that before merging this PR

@zhangzhefang-github
Author

@AstraBert Thanks for catching that timing issue!

I've fixed it by yielding an additional chunk when we receive the RawMessageDeltaEvent. Now the flow is:

  1. ContentBlockDeltaEvent → yield chunks with content
  2. ContentBlockStopEvent → yield final content chunk
  3. RawMessageDeltaEvent → yield metadata update with stop_reason

The stop_reason should now be correctly captured in the last chunk. The tests should pass now. 🎉
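The fix can be illustrated with a minimal generator. This is a sketch under assumptions, not the PR's code: the two event classes are stand-ins that only borrow the SDK type names, and the chunk dicts stand in for `ChatResponse` objects.

```python
# Stand-in event types mirroring the Anthropic SDK names.
class ContentBlockDeltaEvent:
    def __init__(self, text):
        self.text = text

class RawMessageDeltaEvent:
    def __init__(self, stop_reason):
        self.stop_reason = stop_reason

def stream_chunks(events):
    """Yield content chunks, plus one extra metadata chunk at the end (sketch)."""
    content = ""
    for event in events:
        if isinstance(event, ContentBlockDeltaEvent):
            # Content arrives before the stop reason, so these chunks
            # necessarily carry stop_reason=None.
            content += event.text
            yield {"content": content, "stop_reason": None}
        elif isinstance(event, RawMessageDeltaEvent):
            # The fix: yield one additional chunk carrying the late-arriving
            # stop_reason, instead of trying to attach it to a chunk that
            # was already emitted.
            yield {"content": content, "stop_reason": event.stop_reason}

chunks = list(stream_chunks([
    ContentBlockDeltaEvent("Hello"),
    ContentBlockDeltaEvent(" world"),
    RawMessageDeltaEvent("end_turn"),
]))
print(chunks[-1])
```

With this shape, consumers that only look at the final chunk always see the stop reason, which is exactly what the failing assertion checked.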

zhangzhefang-github and others added 2 commits November 4, 2025 23:08
This commit implements support for capturing usage metadata and stop_reason
from RawMessageDeltaEvent when using Anthropic's streaming API, addressing
issue run-llama#20194.

Changes:
- Added imports for RawMessageStartEvent, RawMessageDeltaEvent, RawMessageStopEvent
- Modified stream_chat() to track usage_metadata and stop_reason from streaming events
- Modified astream_chat() to track usage_metadata and stop_reason from streaming events
- Added usage and stop_reason to ChatMessage additional_kwargs in streaming responses
- Added comprehensive tests for both sync and async streaming metadata capture

The implementation captures:
- input_tokens and output_tokens from usage metadata for cost tracking
- stop_reason (e.g., 'end_turn', 'max_tokens', 'tool_use') to understand why streaming stopped

This maintains backward compatibility as the new fields default to None when not available.

Fixes run-llama#20194

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The previous implementation had a timing issue where stop_reason arrived
after the last chunk was yielded. This fix ensures that when we receive
RawMessageDeltaEvent (which contains the stop_reason), we yield an
additional chunk with the updated metadata.

This addresses the test failure identified by @AstraBert where:
- Content chunks were yielded before stop_reason arrived
- stop_reason was None in the last chunk
- RawMessageDeltaEvent came after ContentBlockStopEvent

Now the flow is:
1. ContentBlockDeltaEvent -> yield chunks with content
2. ContentBlockStopEvent -> yield final content chunk
3. RawMessageDeltaEvent -> yield metadata update with stop_reason
4. Tests now pass with stop_reason correctly captured
@zhangzhefang-github zhangzhefang-github force-pushed the feat/anthropic-raw-message-delta-event branch from b23094f to 7ac1df0 on November 4, 2025 at 15:08
logan-markewich and others added 4 commits November 7, 2025 15:58
Add mock-based unit tests for RawMessageDeltaEvent usage and stop_reason
capture to meet CI coverage requirements without requiring API keys.

**Changes:**
- Add test_stream_chat_usage_and_stop_reason_mock() for sync streaming
- Add test_astream_chat_usage_and_stop_reason_mock() for async streaming
- Mock RawMessageDeltaEvent with usage metadata and stop_reason
- Achieve 54% test coverage (meets 50% CI requirement)

**Testing:**
- Both mock tests pass without ANTHROPIC_API_KEY
- Existing real API tests remain for integration testing
- Coverage increased from 6% to 54%

Follows CONTRIBUTING.md guidance: "If you're integrating with a remote
system, mock it to prevent test failures from external changes."

Related to run-llama#20194
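The mocking strategy can be sketched with `unittest.mock` alone. This is not the PR's test code: the real tests patch the Anthropic client inside llama-index, while the snippet below only demonstrates the general idea of faking the event stream so no API key is needed (the attribute paths mirror the SDK's `message_delta` event shape).

```python
from unittest.mock import MagicMock

# Fake a RawMessageDeltaEvent: MagicMock auto-creates the nested
# delta/usage attributes, so we just pin the fields the code reads.
fake_delta_event = MagicMock()
fake_delta_event.delta.stop_reason = "end_turn"
fake_delta_event.usage.output_tokens = 7

# Fake client whose streaming call yields our canned event instead of
# hitting the network.
client = MagicMock()
client.messages.create.return_value = iter([fake_delta_event])

# Code under test would iterate this exactly like a real stream.
events = list(client.messages.create(model="claude-3-5-sonnet-latest", stream=True))
assert events[0].delta.stop_reason == "end_turn"
print(events[0].usage.output_tokens)
```

Because the mocked stream is deterministic, assertions on `stop_reason` and token counts cannot flake on upstream changes, which is the point of the CONTRIBUTING.md guidance quoted above.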
Labels: size:L (This PR changes 100-499 lines, ignoring generated files.)

Linked issue: [Feature Request]: Support for Anthropic RawMessageDeltaEvent (#20194) · 3 participants