fix: prevent LLM observation hallucination by properly attributing tool results #4182
Summary
Fixes #4181
When agents use tools, the tool's observation was being appended to the assistant message in conversation history. This caused the LLM to learn that it should generate "Observation:" content itself, leading to hallucinated tool outputs during flow streaming.
The fix separates the LLM's actual response from the tool observation in the message history.
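Concretely, the conversation history changes shape roughly like this (a minimal sketch; the exact ReAct text, tool name, and the role used for the observation turn are illustrative, not the PR's literal code):

```python
# Before (buggy): the tool observation is fused into the assistant turn,
# so the model learns to emit "Observation:" text itself.
history_before = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": (
            "Thought: I should look up the weather.\n"
            "Action: weather_tool\n"
            'Action Input: {"city": "Paris"}\n'
            "Observation: 18°C, partly cloudy"  # tool output, not LLM output
        ),
    },
]

# After (fixed): the assistant turn contains only what the LLM actually
# produced; the tool result is attributed to a separate message.
history_after = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": (
            "Thought: I should look up the weather.\n"
            "Action: weather_tool\n"
            'Action Input: {"city": "Paris"}'
        ),
    },
    {"role": "user", "content": "Observation: 18°C, partly cloudy"},
]
```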
Changes:
- Added an `llm_response` field to `AgentAction` to preserve the original LLM response
- Updated `handle_agent_action_core` to store `llm_response` before appending the observation (sketched below)
- Updated the executors (`CrewAgentExecutor`, `LiteAgent`, `CrewAgentExecutorFlow`) to use proper message attribution
- Updated the `add_image_tool` special case in both `CrewAgentExecutor` and `CrewAgentExecutorFlow`
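A simplified sketch of the shape of the change (the `AgentAction` fields beyond `text` and `llm_response`, and the exact body of `handle_agent_action_core`, are assumptions for illustration, not the PR's code):

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    tool: str
    tool_input: str
    text: str               # full text; still ends with the observation (see Notes)
    llm_response: str = ""  # new field: the raw LLM output, without the observation

def handle_agent_action_core(action: AgentAction, observation: str) -> AgentAction:
    # Capture the clean LLM output *before* the observation is appended,
    # so executors can attribute each part to the correct message.
    action.llm_response = action.text
    action.text = f"{action.text}\nObservation: {observation}"
    return action
```

With this in place, executors can append `action.llm_response` as the assistant message and pass the observation through as its own turn, while `action.text` keeps the combined string for logging and tracing.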
Review & Testing Checklist for Human

This is a behavioral change to how conversation history is structured. Please verify:
- `add_image_tool` was modified - verify image tools still work correctly

Recommended test plan:
Notes
- The `text` field on `AgentAction` still contains the observation (for logging/tracing purposes), but `llm_response` contains the clean LLM output

Link to Devin run: https://app.devin.ai/sessions/344d8b0e09a0493981fc25eeb2285771
Requested by: João