
Conversation

@devin-ai-integration
Contributor

fix: prevent LLM observation hallucination by properly attributing tool results

Summary

Fixes #4181

When an agent used a tool, the tool's observation was appended to the assistant message in the conversation history. This taught the LLM that it should generate "Observation:" content itself, leading to hallucinated tool outputs during flow streaming.

The fix separates the LLM's actual response from the tool observation in the message history (sketched after this list):

  • LLM's response (Thought/Action/Action Input) → stored as assistant message
  • Tool observation → stored as user message
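
As a rough illustration, the history goes from folding the observation into the assistant turn to keeping the two apart (role names and content below are illustrative, not the exact strings crewAI produces):

```python
# Illustrative only: approximate message shapes, not actual crewAI internals.

# Before the fix: the tool output was folded into the assistant message,
# so the model saw "Observation:" as something it had written itself.
messages_before = [
    {"role": "user", "content": "Find the current weather in Paris."},
    {
        "role": "assistant",
        "content": (
            "Thought: I should look this up.\n"
            "Action: weather_tool\n"
            'Action Input: {"city": "Paris"}\n'
            "Observation: 18°C, partly cloudy"  # tool output, wrongly attributed to the model
        ),
    },
]

# After the fix: the assistant message contains only what the LLM generated,
# and the tool observation arrives as a separate user message.
messages_after = [
    {"role": "user", "content": "Find the current weather in Paris."},
    {
        "role": "assistant",
        "content": (
            "Thought: I should look this up.\n"
            "Action: weather_tool\n"
            'Action Input: {"city": "Paris"}'
        ),
    },
    {"role": "user", "content": "Observation: 18°C, partly cloudy"},
]
```

With the second shape, the model never sees "Observation:" text attributed to itself, so it has nothing to imitate when streaming.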

Changes:

  • Added llm_response field to AgentAction to preserve the original LLM response
  • Modified handle_agent_action_core to store llm_response before appending the observation (see the sketch after this list)
  • Updated all three executors (CrewAgentExecutor, LiteAgent, CrewAgentExecutorFlow) to use proper message attribution
  • Fixed add_image_tool special case in both CrewAgentExecutor and CrewAgentExecutorFlow
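
A minimal sketch of the pattern these changes describe; the real AgentAction model and handle_agent_action_core in crewAI carry more fields and different signatures, so treat every name and shape below as an approximation:

```python
# Simplified sketch, not crewAI's actual implementation.
from dataclasses import dataclass


@dataclass
class AgentAction:
    thought: str
    tool: str
    tool_input: str
    text: str                # full text; the observation gets appended here for logging/tracing
    llm_response: str = ""   # clean LLM output, preserved before the observation is appended


def handle_agent_action(action: AgentAction, observation: str) -> AgentAction:
    """Mirrors the described handle_agent_action_core change: preserve first, then append."""
    action.llm_response = action.text                           # what the model actually said
    action.text = f"{action.text}\nObservation: {observation}"  # kept for logging/tracing only
    return action


def append_to_history(messages: list[dict], action: AgentAction, observation: str) -> None:
    """Attribute each part of the exchange to the role that actually produced it."""
    messages.append({"role": "assistant", "content": action.llm_response})
    messages.append({"role": "user", "content": f"Observation: {observation}"})
```

The important invariant is that text still carries the combined output for logs, while only llm_response is replayed to the model as the assistant turn.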

Review & Testing Checklist for Human

This is a behavioral change to how conversation history is structured. Please verify:

  • Message history structure: After a tool call, verify the message list contains an assistant message with the action (no observation), followed by a user message with the observation
  • Flow streaming: Test with flow streaming to confirm observations are no longer hallucinated by the LLM
  • add_image_tool behavior: The special case for add_image_tool was modified; verify that image tools still work correctly
  • LiteAgent and CrewAgentExecutorFlow: These executors received the same fix but have no direct tests; consider manual testing
  • Existing workflows: Verify existing agent workflows that use tools still function correctly

Recommended test plan:

  1. Create a simple crew with an agent that uses a tool
  2. Run with verbose logging and inspect the message history
  3. Confirm observations appear as user messages, not assistant messages (a helper sketch follows this list)
  4. Test with flow streaming if that's your use case
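
For step 3, a small helper like the one below can be pointed at whatever message list you capture (from verbose logs or a debugger); the dict shape mirrors the sketch above and is an assumption, not the exact crewAI representation:

```python
# Hypothetical check for step 3: no assistant message should contain a tool
# observation, and observations should show up as user messages instead.
def assert_observations_attributed_to_user(messages: list[dict]) -> None:
    for msg in messages:
        if msg["role"] == "assistant":
            assert "Observation:" not in msg["content"], (
                f"Assistant message still contains a tool observation: {msg['content']!r}"
            )

    user_observations = [
        m for m in messages
        if m["role"] == "user" and m["content"].startswith("Observation:")
    ]
    assert user_observations, "Expected at least one observation stored as a user message"
```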

Notes

  • The text field on AgentAction still contains the observation (for logging/tracing purposes), but llm_response contains the clean LLM output
  • Unit tests pass locally but CI will provide full coverage

Link to Devin run: https://app.devin.ai/sessions/344d8b0e09a0493981fc25eeb2285771
Requested by: João

fix: prevent LLM observation hallucination by properly attributing tool results

Fixes #4181

The issue was that tool observations were being appended to the assistant
message in the conversation history, which caused the LLM to learn to
hallucinate fake observations during tool calls.

Changes:
- Add llm_response field to AgentAction to store the original LLM response
  before observation is appended
- Modify handle_agent_action_core to store llm_response before appending
  observation to text (text still contains observation for logging)
- Update CrewAgentExecutor._invoke_loop and _ainvoke_loop to:
  - Append LLM response as assistant message
  - Append observation as user message (not assistant)
- Apply same fix to LiteAgent._invoke_loop
- Apply same fix to CrewAgentExecutorFlow.execute_tool_action
- Fix add_image_tool special case in both executors to use same pattern
- Add comprehensive tests for proper message attribution

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring



Development

Successfully merging this pull request may close these issues.

[BUG]assitant promt cause llm produce fake content
