Fix: LiteLLM Streaming Content Duplication in Tool Call Responses #3698
Conversation
Fixes google#3665
Streaming responses from LiteLLM models (Claude, GPT, etc.) were not setting finish_reason on aggregated LlmResponse objects, causing agent runners to not properly recognize completion states. This fix mirrors the finish_reason mapping logic from the non-streaming path (lines 776-784) and applies it to both streaming code paths:
- Tool call responses (lines 1340-1368)
- Text-only responses (lines 1369-1390)
Without this fix, agents using Claude or GPT via LiteLLM would encounter stop conditions that couldn't be properly handled, leading to incomplete responses or unexpected agent behavior.
Tested with Claude Sonnet 4.5 and GPT-5 via Azure OpenAI in a production multi-agent system with MCP tools.
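For context, the mapping step this fix reuses translates LiteLLM's OpenAI-style finish_reason strings into the google.genai FinishReason enum, falling back to OTHER for unknown values. A minimal sketch follows; the entries of the real _FINISH_REASON_MAPPING in lite_llm.py are not shown in this PR and may differ.

from google.genai import types

# Illustrative only: maps LiteLLM finish_reason strings to the genai enum.
_FINISH_REASON_MAPPING = {
    "stop": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
    "content_filter": types.FinishReason.SAFETY,
    "tool_calls": types.FinishReason.STOP,
}

finish_reason = "tool_calls"  # e.g. as reported by LiteLLM for a streamed tool-call chunk
mapped = _FINISH_REASON_MAPPING.get(str(finish_reason).lower(), types.FinishReason.OTHER)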
Fixes google#3676
MCP tool responses arrive as JSON strings but were being double-serialized by _safe_json_serialize(), creating triple-nested JSON that prevented Claude and GPT from parsing tool results. Example of the bug:
'{"content": [{"type": "text", "text": "{\n \"type\"..."}]}'
This fix adds an isinstance(str) check before serialization. If the response is already a string (from MCP or other sources), it's used directly. Otherwise, it's serialized normally.
Impact: Without this fix, agents using LiteLLM with MCP tools would successfully call tools but fail to present results to users, appearing to hang or produce incomplete responses.
Tested with Claude Sonnet 4.5 and GPT-5 via Azure OpenAI with MCP tools (Google Drive, HubSpot CRM) in a production multi-agent system.
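A minimal sketch of the guard described above. serialize_tool_response is an illustrative wrapper, not the actual ADK call site, and _safe_json_serialize is simplified here; only the isinstance(str) check mirrors the fix.

import json
from typing import Any

def _safe_json_serialize(obj: Any) -> str:
  # Simplified stand-in: serialize to JSON, fall back to str() on failure.
  try:
    return json.dumps(obj)
  except (TypeError, ValueError):
    return str(obj)

def serialize_tool_response(response: Any) -> str:
  # MCP tool responses often arrive as JSON strings already; serializing them
  # again would wrap the JSON in another layer of quoting and escaping.
  if isinstance(response, str):
    return response
  return _safe_json_serialize(response)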
…plication
- Fixes content duplication where planning/reasoning text appears twice (once during streaming, again in the aggregated tool-call message)
- Aligns with OpenAI/LiteLLM conventions for tool-call messages
- Planning text is preserved in thought_parts and already streamed to users
- Resolves semantic confusion where tool-call messages contained text content
Fixes google#3697
Summary of Changes
Hello @thesynapses, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request resolves critical issues related to content handling and API compliance in LiteLLM streaming responses, especially when tool calls are involved. It refines how planning text and tool responses are processed during streaming to ensure a clean, accurate, and API-compliant conversation history, while also improving the agent runner's ability to interpret completion states from various models.
Summary of Changes
Hello @thesynapses, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the robustness and API compliance of LiteLLM integrations within ADK, particularly for streaming workflows involving tool calls. It resolves critical issues related to content duplication, incorrect serialization of tool responses, and inconsistent finish_reason handling.
Code Review
This pull request effectively resolves the content duplication issue in LiteLLM streaming responses by setting content=None for tool-call messages, which is a good fix that aligns with API conventions. The additional changes to prevent double-serialization of tool response content and to correctly map the finish_reason for streaming responses are also valuable improvements. I've noticed a small area of code duplication in the new finish_reason handling logic and have left a comment with a suggestion to refactor it for better maintainability. Overall, the changes are well-implemented and improve the robustness of the LiteLLM integration.
src/google/adk/models/lite_llm.py
if isinstance(finish_reason, types.FinishReason):
  aggregated_llm_response_with_tool_call.finish_reason = finish_reason
else:
  finish_reason_str = str(finish_reason).lower()
  aggregated_llm_response_with_tool_call.finish_reason = _FINISH_REASON_MAPPING.get(
      finish_reason_str, types.FinishReason.OTHER
  )
src/google/adk/models/lite_llm.py
if isinstance(finish_reason, types.FinishReason):
  aggregated_llm_response.finish_reason = finish_reason
else:
  finish_reason_str = str(finish_reason).lower()
  aggregated_llm_response.finish_reason = _FINISH_REASON_MAPPING.get(
      finish_reason_str, types.FinishReason.OTHER
  )
Code Review
This pull request effectively addresses the content duplication issue in LiteLLM streaming responses by correctly setting content=None for tool-call messages. The changes are well-documented and align with the OpenAI/LiteLLM specification. Additionally, the fixes for preventing double JSON serialization and for consistently mapping finish_reason in streaming mode are valuable improvements that enhance correctness and reliability.
My only suggestion is to refactor the duplicated logic for mapping finish_reason to improve code maintainability.
src/google/adk/models/lite_llm.py
# FIX: Map finish_reason to FinishReason enum for streaming text-only responses.
# Previously, streaming responses did not set finish_reason on aggregated
# LlmResponse objects, causing the ADK agent runner to not properly recognize
# completion states. This mirrors the logic from non-streaming path (lines 776-784)
# to ensure consistent behavior across both streaming and non-streaming modes.
# Without this, Claude and other models via LiteLLM would hit stop conditions
# that the agent couldn't properly handle.
if isinstance(finish_reason, types.FinishReason):
  aggregated_llm_response.finish_reason = finish_reason
else:
  finish_reason_str = str(finish_reason).lower()
  aggregated_llm_response.finish_reason = _FINISH_REASON_MAPPING.get(
      finish_reason_str, types.FinishReason.OTHER
  )
This logic for mapping finish_reason is duplicated from lines 1374-1380. To improve maintainability and reduce code repetition, consider refactoring this logic into a private helper function.
For example, you could create a function like _set_finish_reason(response: LlmResponse, finish_reason: Any) that encapsulates this mapping and assignment.
def _set_finish_reason(response: LlmResponse, finish_reason: Any):
  """Sets the finish reason on the LlmResponse, mapping from string if necessary."""
  if isinstance(finish_reason, types.FinishReason):
    response.finish_reason = finish_reason
  else:
    finish_reason_str = str(finish_reason).lower()
    response.finish_reason = _FINISH_REASON_MAPPING.get(
        finish_reason_str, types.FinishReason.OTHER
    )

This would make the code cleaner and easier to manage if the mapping logic needs to change in the future.
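With such a helper in place, each call site becomes a one-liner; the variable names below are the ones that appear in the diffs above.

_set_finish_reason(aggregated_llm_response_with_tool_call, finish_reason)  # streaming tool-call path
_set_finish_reason(aggregated_llm_response, finish_reason)  # streaming text-only path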
…tion
Extract finish_reason mapping logic into a reusable helper function to address code duplication feedback from the Gemini Code Assist review on PR google#3698.
Changes:
- Added _set_finish_reason(response, finish_reason) helper function
- Replaced three duplicate mapping blocks with a single helper call:
  * Non-streaming path (line ~880)
  * Streaming tool-call path (line ~1387)
  * Streaming text-only path (line ~1409)
- Preserved all existing comments and behavior
- Improved maintainability: single source of truth for the mapping logic
Addresses: https://github.com/google/adk-python/pull/3698#discussion_r18xxxxx
Hi @thesynapses, thank you for your work on this pull request. We appreciate the effort you've invested.
Fix: LiteLLM Streaming Content Duplication in Tool Call Responses
Summary
Fixes content duplication in streaming responses when using LiteLLM models with ADK's planning features (e.g., PlanReActPlanner). Planning and reasoning text was appearing twice: once during streaming as individual chunks, and again in the aggregated tool-call message. Resolves #3697.
Problem
When the model generates planning/reasoning text (e.g., <PLANNING>I need to search...</PLANNING>) followed by tool calls during streaming, the aggregated tool-call message was created with content=text (line 1352). This violates OpenAI/LiteLLM conventions, where tool-call-only messages should have content=None.
Solution
Changed line 1352 to set content=None for tool-call messages:
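A minimal sketch of what the change amounts to, using a plain OpenAI-style message dict rather than the exact ADK internals (build_tool_call_message is illustrative):

def build_tool_call_message(tool_calls: list[dict]) -> dict:
  # Aggregated assistant message that only carries tool calls.
  # Per the OpenAI/LiteLLM convention, a tool-call-only message sets content
  # to None; the planning text has already been streamed to the user and is
  # kept separately in thought_parts for conversation history.
  return {
      "role": "assistant",
      "content": None,  # previously content=text, duplicating the streamed planning text
      "tool_calls": tool_calls,
  }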
Why This Works
- thought_parts: reasoning is captured separately at line 1357 for conversation history
- content=None for tool-only messages matches the OpenAI/LiteLLM convention
Impact
- Planning text is streamed once to users and preserved in thought_parts
Testing
Tested with: Claude Sonnet 4.5 and GPT-5 via Azure OpenAI in a production multi-agent system with MCP tools.
Additional Context
The thought_parts parameter preserves reasoning separately from message content.