
Conversation


@thesynapses thesynapses commented Nov 24, 2025

Fix: LiteLLM Streaming Content Duplication in Tool Call Responses

Summary

Fixes content duplication in streaming responses when using LiteLLM models with ADK's planning features (e.g., PlanReActPlanner). Planning and reasoning text was appearing twice: once during streaming as individual chunks, and again in the aggregated tool-call message.

Resolves #3697

Problem

When the model generates planning/reasoning text (e.g., <PLANNING>I need to search...</PLANNING>) followed by tool calls during streaming:

  1. Text chunks are streamed to users in real-time (lines 1288-1296)
  2. The same text is included again in the aggregated tool-call message as content=text (line 1352)

This violates OpenAI/LiteLLM conventions where tool-call-only messages should have content=None.

Solution

Changed line 1352 to set content=None for tool-call messages:

ChatCompletionAssistantMessage(
    role="assistant",
    content=None,  # FIX: Avoid duplication, follow OpenAI spec
    tool_calls=tool_calls,
)

Why This Works

  • Planning text already streamed: Users see it in real-time via individual chunks
  • Preserved in thought_parts: Reasoning captured separately at line 1357 for conversation history
  • Follows API standards: OpenAI, Claude, GPT APIs expect content=None for tool-only messages
  • Correct semantics: Tool-call messages represent function invocations, not answer text

Impact

  • Eliminates content duplication
  • Aligns with OpenAI/LiteLLM conventions
  • Preserves reasoning context via thought_parts
  • Clean conversation history without redundant content
  • Proper semantic representation of tool-call turns

Testing

Tested with:

  • LiteLLM models (Claude, GPT) with planning workflows
  • Streaming enabled with tool calls
  • Multi-turn conversations requiring tool usage

Additional Context

  • Affects only streaming mode with tool calls
  • Non-streaming path already handles this correctly
  • thought_parts parameter preserves reasoning separately from message content
  • No breaking changes to existing APIs

Fixes google#3665

Streaming responses from LiteLLM models (Claude, GPT, etc.) were not setting
finish_reason on aggregated LlmResponse objects, so agent runners could not
reliably recognize completion states.

This fix mirrors the finish_reason mapping logic from the non-streaming path
(lines 776-784) and applies it to both streaming code paths:
- Tool call responses (lines 1340-1368)
- Text-only responses (lines 1369-1390)

Without this fix, agents using Claude or GPT via LiteLLM would encounter stop
conditions that couldn't be properly handled, leading to incomplete responses
or unexpected agent behavior.
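
To make the mapping concrete, here is a minimal sketch of the logic this commit applies to both streaming paths; the helper name _map_finish_reason and the mapping entries are illustrative, not the exact code in the PR:

from google.genai import types

# Illustrative subset; the real _FINISH_REASON_MAPPING may contain different
# or additional entries.
_FINISH_REASON_MAPPING = {
    "stop": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
    "tool_calls": types.FinishReason.STOP,
    "content_filter": types.FinishReason.SAFETY,
}

def _map_finish_reason(finish_reason) -> types.FinishReason:
    """Maps a LiteLLM finish_reason (enum or string) onto the FinishReason enum."""
    if isinstance(finish_reason, types.FinishReason):
        return finish_reason
    return _FINISH_REASON_MAPPING.get(
        str(finish_reason).lower(), types.FinishReason.OTHER
    )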

Tested with Claude Sonnet 4.5 and GPT-5 via Azure OpenAI in production
multi-agent system with MCP tools.
Fixes google#3676

MCP tool responses arrive as JSON strings but were being double-serialized
by _safe_json_serialize(), creating triple-nested JSON that prevented Claude
and GPT from parsing tool results.

Example of the bug:
'{"content": [{"type": "text", "text": "{\n \"type\"..."}]}'

This fix adds an isinstance(str) check before serialization. If the response
is already a string (from MCP or other sources), it's used directly. Otherwise,
it's serialized normally.
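
A minimal sketch of that check, assuming _safe_json_serialize behaves like a json.dumps wrapper with a str() fallback (the stand-in below is a simplification, and _serialize_tool_response is a hypothetical name used only for illustration):

import json

def _safe_json_serialize(obj) -> str:
    """Stand-in for the existing helper: JSON-encode, falling back to str()."""
    try:
        return json.dumps(obj)
    except (TypeError, ValueError):
        return str(obj)

def _serialize_tool_response(response) -> str:
    """Returns the tool response as a JSON string without double-encoding it."""
    # MCP tools can return an already-JSON-encoded string; serializing it again
    # would produce the nested quoting shown in the example above.
    if isinstance(response, str):
        return response
    return _safe_json_serialize(response)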

Impact: Without this fix, agents using LiteLLM with MCP tools would successfully
call tools but fail to present results to users, appearing to hang or produce
incomplete responses.

Tested with Claude Sonnet 4.5 and GPT-5 via Azure OpenAI with MCP tools
(Google Drive, HubSpot CRM) in production multi-agent system.
…plication

- Fixes content duplication where planning/reasoning text appears twice
  (once during streaming, again in aggregated tool-call message)
- Aligns with OpenAI/LiteLLM conventions for tool-call messages
- Planning text is preserved in thought_parts and already streamed to users
- Resolves semantic confusion where tool-call messages contained text content

Fixes google#3697
@gemini-code-assist

Summary of Changes

Hello @thesynapses, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves critical issues related to content handling and API compliance in LiteLLM streaming responses, especially when tool calls are involved. It refines how planning text and tool responses are processed during streaming to ensure a clean, accurate, and API-compliant conversation history, while also improving the agent runner's ability to interpret completion states from various models.

Highlights

  • Eliminate Content Duplication in Streaming Tool Call Responses: Previously, planning and reasoning text would appear twice in streaming responses when using LiteLLM models with ADK's planning features: once as individual chunks and again in the aggregated tool-call message. This fix ensures such text is streamed only once, providing a cleaner user experience.
  • Align Tool-Call Messages with OpenAI/LiteLLM API Conventions: Tool-call-only messages now correctly set content=None, adhering to API specifications. This prevents redundant content in conversation history and ensures proper semantic representation of tool-call turns.
  • Prevent Double-Serialization of Tool Responses: Addressed an issue where _safe_json_serialize would double-serialize already-JSON string tool responses, leading to malformed (triple-nested) JSON and parsing failures for models like Claude/GPT. The change now checks if the response is already a string before serialization.
  • Ensure Proper finish_reason Mapping for Streaming Responses: Implemented logic to correctly map the finish_reason to the FinishReason enum for aggregated LlmResponse objects in streaming mode. This ensures the ADK agent runner accurately recognizes completion states, mirroring behavior from the non-streaming path.

@adk-bot adk-bot added the models [Component] Issues related to model support label Nov 24, 2025
@gemini-code-assist

Summary of Changes

Hello @thesynapses, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and API compliance of LiteLLM integrations within ADK, particularly for streaming workflows involving tool calls. It resolves critical issues related to content duplication, incorrect serialization of tool responses, and inconsistent finish_reason handling, leading to cleaner conversation histories, more reliable tool execution, and accurate agent state management.

Highlights

  • Eliminate Content Duplication in Streaming Tool Calls: Resolved an issue where planning and reasoning text was duplicated in streaming responses when LiteLLM models made tool calls, appearing both as individual chunks and within the aggregated tool-call message. This is fixed by setting content=None for tool-only messages, aligning with OpenAI/LiteLLM API specifications.
  • Correct Tool Response Serialization: Addressed a bug where _safe_json_serialize was double-serializing JSON strings within function_response.response, leading to malformed, triple-nested JSON that prevented models like Claude/GPT from correctly parsing tool results. The change now checks if the response is already a string before serialization.
  • Consistent Finish Reason Mapping for Streaming: Implemented logic to correctly map the finish_reason to the FinishReason enum for aggregated LlmResponse objects in streaming scenarios (both tool-call and text-only responses). This ensures that the ADK agent runner can properly recognize completion states, mirroring behavior from non-streaming paths.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves the content duplication issue in LiteLLM streaming responses by setting content=None for tool-call messages, which is a good fix that aligns with API conventions. The additional changes to prevent double-serialization of tool response content and to correctly map the finish_reason for streaming responses are also valuable improvements. I've noticed a small area of code duplication in the new finish_reason handling logic and have left a comment with a suggestion to refactor it for better maintainability. Overall, the changes are well-implemented and improve the robustness of the LiteLLM integration.

Comment on lines 1374 to 1380
if isinstance(finish_reason, types.FinishReason):
    aggregated_llm_response_with_tool_call.finish_reason = finish_reason
else:
    finish_reason_str = str(finish_reason).lower()
    aggregated_llm_response_with_tool_call.finish_reason = _FINISH_REASON_MAPPING.get(
        finish_reason_str, types.FinishReason.OTHER
    )

medium

This block of code for setting the finish_reason is duplicated in the elif finish_reason == "stop" block below (lines 1402-1408). To improve maintainability and reduce redundancy, consider extracting this logic into a private helper function that can be called from both places.

Comment on lines 1402 to 1408
if isinstance(finish_reason, types.FinishReason):
    aggregated_llm_response.finish_reason = finish_reason
else:
    finish_reason_str = str(finish_reason).lower()
    aggregated_llm_response.finish_reason = _FINISH_REASON_MAPPING.get(
        finish_reason_str, types.FinishReason.OTHER
    )

medium

This is a duplication of the logic at lines 1374-1380. Please see my comment there for a refactoring suggestion to improve maintainability.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the content duplication issue in LiteLLM streaming responses by correctly setting content=None for tool-call messages. The changes are well-documented and align with the OpenAI/LiteLLM specification. Additionally, the fixes for preventing double JSON serialization and for consistently mapping finish_reason in streaming mode are valuable improvements that enhance correctness and reliability.

My only suggestion is to refactor the duplicated logic for mapping finish_reason to improve code maintainability.

Comment on lines 1395 to 1408
# FIX: Map finish_reason to FinishReason enum for streaming text-only responses.
# Previously, streaming responses did not set finish_reason on aggregated
# LlmResponse objects, causing the ADK agent runner to not properly recognize
# completion states. This mirrors the logic from non-streaming path (lines 776-784)
# to ensure consistent behavior across both streaming and non-streaming modes.
# Without this, Claude and other models via LiteLLM would hit stop conditions
# that the agent couldn't properly handle.
if isinstance(finish_reason, types.FinishReason):
    aggregated_llm_response.finish_reason = finish_reason
else:
    finish_reason_str = str(finish_reason).lower()
    aggregated_llm_response.finish_reason = _FINISH_REASON_MAPPING.get(
        finish_reason_str, types.FinishReason.OTHER
    )

medium

This logic for mapping finish_reason is duplicated from lines 1374-1380. To improve maintainability and reduce code repetition, consider refactoring this logic into a private helper function.

For example, you could create a function like _set_finish_reason(response: LlmResponse, finish_reason: Any) that encapsulates this mapping and assignment.

def _set_finish_reason(response: LlmResponse, finish_reason: Any):
    """Sets the finish reason on the LlmResponse, mapping from string if necessary."""
    if isinstance(finish_reason, types.FinishReason):
        response.finish_reason = finish_reason
    else:
        finish_reason_str = str(finish_reason).lower()
        response.finish_reason = _FINISH_REASON_MAPPING.get(
            finish_reason_str, types.FinishReason.OTHER
        )

This would make the code cleaner and easier to manage if the mapping logic needs to change in the future.

thesynapses and others added 2 commits November 25, 2025 07:41
…tion

Extract finish_reason mapping logic into a reusable helper function to address
code duplication feedback from Gemini Code Assist review on PR google#3698.

Changes:
- Added _set_finish_reason(response, finish_reason) helper function
- Replaced three duplicate mapping blocks with single helper call:
  * Non-streaming path (line ~880)
  * Streaming tool-call path (line ~1387)
  * Streaming text-only path (line ~1409)
- Preserved all existing comments and behavior
- Improved maintainability - single source of truth for mapping logic

Addresses: https://github.com/google/adk-python/pull/3698#discussion_r18xxxxx
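
Roughly, each streaming call site then collapses to a single call to the helper (a sketch using the names from the review excerpts above):

# Streaming tool-call path
_set_finish_reason(aggregated_llm_response_with_tool_call, finish_reason)

# Streaming text-only path
_set_finish_reason(aggregated_llm_response, finish_reason)
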
@ryanaiagent ryanaiagent self-assigned this Nov 25, 2025
@ryanaiagent ryanaiagent added the request clarification [Status] The maintainer need clarification or more information from the author label Nov 30, 2025
@ryanaiagent

Hi @thesynapses, thank you for your work on this pull request. We appreciate the effort you've invested.
Before we can proceed with the review, can you please fix the lint errors? You can use autoformat.sh.
