Conversation


@zhangzhefang-github zhangzhefang-github commented Nov 2, 2025

Description

This PR adds support for capturing usage metadata and stop_reason from Anthropic's RawMessageDeltaEvent during streaming responses, addressing issue #20194.

Changes:

  • Import RawMessageStartEvent, RawMessageDeltaEvent, RawMessageStopEvent
  • Track usage metadata (input_tokens, output_tokens) from streaming events
  • Capture stop_reason (end_turn, max_tokens, tool_use, etc.) from events
  • Include metadata in ChatResponse.additional_kwargs for both sync and async
  • Add comprehensive tests for both stream_chat and astream_chat methods

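The tracking logic described above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: the event classes below are minimal dataclasses that only mirror the names and fields of the Anthropic SDK types (`RawMessageStartEvent`, `RawMessageDeltaEvent`), and `collect_stream_metadata` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in event types; the real SDK's RawMessageStartEvent carries the
# initial usage, and RawMessageDeltaEvent carries updated usage plus
# delta.stop_reason.
@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

@dataclass
class Delta:
    stop_reason: Optional[str]

@dataclass
class RawMessageStartEvent:
    usage: Usage

@dataclass
class RawMessageDeltaEvent:
    delta: Delta
    usage: Usage

def collect_stream_metadata(events):
    """Accumulate usage and stop_reason as streaming events arrive (sketch)."""
    kwargs = {"usage": None, "stop_reason": None}
    for event in events:
        if isinstance(event, RawMessageStartEvent):
            kwargs["usage"] = {
                "input_tokens": event.usage.input_tokens,
                "output_tokens": event.usage.output_tokens,
            }
        elif isinstance(event, RawMessageDeltaEvent):
            # The delta event arrives late in the stream with the final
            # output token count and the stop_reason.
            if kwargs["usage"] is not None:
                kwargs["usage"]["output_tokens"] = event.usage.output_tokens
            kwargs["stop_reason"] = event.delta.stop_reason
    return kwargs

events = [
    RawMessageStartEvent(usage=Usage(input_tokens=14, output_tokens=1)),
    RawMessageDeltaEvent(delta=Delta(stop_reason="end_turn"),
                         usage=Usage(input_tokens=14, output_tokens=9)),
]
print(collect_stream_metadata(events))
```

In the PR itself this accumulated dict ends up in `ChatResponse.additional_kwargs`, so downstream code can read `usage` and `stop_reason` off each chunk.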
Benefits:

  • 📊 Enable cost tracking and monitoring for streaming responses
  • 🐛 Provide stop_reason for debugging and optimization
  • 🔄 Achieve feature parity with OpenAI integration (finish_reason)
  • ⬆️ Maintain backward compatibility (new fields default to None)

Fixes #20194

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No - This is an enhancement to existing package

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No - Will be handled by maintainers during release

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Test coverage:

  • Added test_stream_chat_usage_and_stop_reason() for sync streaming
  • Added test_astream_chat_usage_and_stop_reason() for async streaming
  • Tests verify input_tokens, output_tokens, and stop_reason capture
  • Tests require ANTHROPIC_API_KEY environment variable

Example Usage

```python
from llama_index.llms.anthropic import Anthropic
from llama_index.core.llms import ChatMessage

llm = Anthropic(model="claude-3-5-sonnet-latest")
messages = [ChatMessage(role="user", content="Hello")]

for chunk in llm.stream_chat(messages):
    usage = chunk.message.additional_kwargs.get("usage")
    stop_reason = chunk.message.additional_kwargs.get("stop_reason")

    if usage:
        print(f"Tokens: {usage['input_tokens']} in, {usage['output_tokens']} out")
    if stop_reason:
        print(f"Stopped because: {stop_reason}")
```
Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods
Notes

  • This change is backward compatible: new fields default to None if not available
  • The implementation follows the existing pattern used in the codebase
  • Usage metadata enables cost tracking for streaming responses
  • stop_reason provides insight into why streaming stopped (feature parity with OpenAI)

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 2, 2025
@zhangzhefang-github zhangzhefang-github force-pushed the feat/anthropic-raw-message-delta-event branch from fba6b5d to b10dfef on November 2, 2025 at 16:25
Member

@AstraBert AstraBert left a comment


The PR seems legit but the e2e tests you added are failing with:

AssertionError: stop_reason should be captured from RawMessageDeltaEvent
E       assert None is not None

I've done some digging around with some debugging statements and it seems like the stop_reason is not captured because the last_chunk is streamed before we receive the delta with the stop reason (see this log):

CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
CHUNK: {'usage': {'input_tokens': 14, 'output_tokens': 2}, 'stop_reason': None}
RECEIVED_STOP_REASON: end_turn

which causes the test to fail. I would fix that before merging this PR

@zhangzhefang-github
Author

@AstraBert Thanks for catching that timing issue!

I've fixed it by yielding an additional chunk when we receive the RawMessageDeltaEvent. Now the flow is:

  1. ContentBlockDeltaEvent → yield chunks with content
  2. ContentBlockStopEvent → yield final content chunk
  3. RawMessageDeltaEvent → yield metadata update with stop_reason

The stop_reason should now be correctly captured in the last chunk. The tests should pass now. 🎉
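The fix can be illustrated with a minimal generator. This is a sketch under assumptions, not the PR's code: the two event classes are stand-ins that only borrow the SDK type names, and the chunk dicts stand in for `ChatResponse` objects.

```python
# Stand-in event types mirroring the Anthropic SDK names.
class ContentBlockDeltaEvent:
    def __init__(self, text):
        self.text = text

class RawMessageDeltaEvent:
    def __init__(self, stop_reason):
        self.stop_reason = stop_reason

def stream_chunks(events):
    """Yield content chunks, plus one extra metadata chunk at the end (sketch)."""
    content = ""
    for event in events:
        if isinstance(event, ContentBlockDeltaEvent):
            # Content arrives before the stop reason, so these chunks
            # necessarily carry stop_reason=None.
            content += event.text
            yield {"content": content, "stop_reason": None}
        elif isinstance(event, RawMessageDeltaEvent):
            # The fix: yield one additional chunk carrying the late-arriving
            # stop_reason, instead of trying to attach it to a chunk that
            # was already emitted.
            yield {"content": content, "stop_reason": event.stop_reason}

chunks = list(stream_chunks([
    ContentBlockDeltaEvent("Hello"),
    ContentBlockDeltaEvent(" world"),
    RawMessageDeltaEvent("end_turn"),
]))
print(chunks[-1])
```

With this shape, consumers that only look at the final chunk always see the stop reason, which is exactly what the failing assertion checked.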

zhangzhefang-github and others added 2 commits November 4, 2025 23:08
This commit implements support for capturing usage metadata and stop_reason
from RawMessageDeltaEvent when using Anthropic's streaming API, addressing
issue run-llama#20194.

Changes:
- Added imports for RawMessageStartEvent, RawMessageDeltaEvent, RawMessageStopEvent
- Modified stream_chat() to track usage_metadata and stop_reason from streaming events
- Modified astream_chat() to track usage_metadata and stop_reason from streaming events
- Added usage and stop_reason to ChatMessage additional_kwargs in streaming responses
- Added comprehensive tests for both sync and async streaming metadata capture

The implementation captures:
- input_tokens and output_tokens from usage metadata for cost tracking
- stop_reason (e.g., 'end_turn', 'max_tokens', 'tool_use') to understand why streaming stopped

This maintains backward compatibility as the new fields default to None when not available.

Fixes run-llama#20194

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The previous implementation had a timing issue where stop_reason arrived
after the last chunk was yielded. This fix ensures that when we receive
RawMessageDeltaEvent (which contains the stop_reason), we yield an
additional chunk with the updated metadata.

This addresses the test failure identified by @AstraBert where:
- Content chunks were yielded before stop_reason arrived
- stop_reason was None in the last chunk
- RawMessageDeltaEvent came after ContentBlockStopEvent

Now the flow is:
1. ContentBlockDeltaEvent -> yield chunks with content
2. ContentBlockStopEvent -> yield final content chunk
3. RawMessageDeltaEvent -> yield metadata update with stop_reason
4. Tests now pass with stop_reason correctly captured
@zhangzhefang-github zhangzhefang-github force-pushed the feat/anthropic-raw-message-delta-event branch from b23094f to 7ac1df0 on November 4, 2025 at 15:08
logan-markewich and others added 4 commits November 7, 2025 15:58
Add mock-based unit tests for RawMessageDeltaEvent usage and stop_reason
capture to meet CI coverage requirements without requiring API keys.

**Changes:**
- Add test_stream_chat_usage_and_stop_reason_mock() for sync streaming
- Add test_astream_chat_usage_and_stop_reason_mock() for async streaming
- Mock RawMessageDeltaEvent with usage metadata and stop_reason
- Achieve 54% test coverage (meets 50% CI requirement)

**Testing:**
- Both mock tests pass without ANTHROPIC_API_KEY
- Existing real API tests remain for integration testing
- Coverage increased from 6% to 54%

Follows CONTRIBUTING.md guidance: "If you're integrating with a remote
system, mock it to prevent test failures from external changes."

Related to run-llama#20194
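The mocking strategy can be sketched with `unittest.mock` alone. This is not the PR's test code: the real tests patch the Anthropic client inside llama-index, while the snippet below only demonstrates the general idea of faking the event stream so no API key is needed (the attribute paths mirror the SDK's `message_delta` event shape).

```python
from unittest.mock import MagicMock

# Fake a RawMessageDeltaEvent: MagicMock auto-creates the nested
# delta/usage attributes, so we just pin the fields the code reads.
fake_delta_event = MagicMock()
fake_delta_event.delta.stop_reason = "end_turn"
fake_delta_event.usage.output_tokens = 7

# Fake client whose streaming call yields our canned event instead of
# hitting the network.
client = MagicMock()
client.messages.create.return_value = iter([fake_delta_event])

# Code under test would iterate this exactly like a real stream.
events = list(client.messages.create(model="claude-3-5-sonnet-latest", stream=True))
assert events[0].delta.stop_reason == "end_turn"
print(events[0].usage.output_tokens)
```

Because the mocked stream is deterministic, assertions on `stop_reason` and token counts cannot flake on upstream changes, which is the point of the CONTRIBUTING.md guidance quoted above.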
Labels: size:L (This PR changes 100-499 lines, ignoring generated files.)

Linked issue: [Feature Request]: Support for Anthropic RawMessageDeltaEvent (#20194) · 3 participants