fix(chat): prevent freeze on long date range queries (#4927)#5543
…low on long date ranges

Issue #4927: Chat freezes when asking about 30+ day ranges. Two failure modes:

1. SafetyGuard 500K token limit hit -> "narrow down" error
2. HTTP 120s timeout hit before the first stream token -> freeze with no response

Both tools (`get_conversations_tool`, `search_conversations_tool`) now cap output at ~400K tokens (1.6M chars) and include the most recent conversations that fit, with a note about omitted older ones.
Greptile Summary

This PR adds output-length truncation to `get_conversations_tool` and `search_conversations_tool`.

Confidence Score: 3/5
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Tool called with date range / query] --> B[Fetch conversations from Firestore]
    B --> C{conversations_data empty?}
    C -- Yes --> D[Return 'no conversations' message]
    C -- No --> E[Convert to Conversation objects\nCap transcript segments at 1000]
    E --> F[Append all to conversations_collected\nfor citation tracking]
    F --> G[Call conversations_to_string\nFULL LIST — may be 5–10 MB]
    G --> H{len result > 1_600_000?}
    H -- No --> I[Return full result]
    H -- Yes --> J[Loop: conversations_to_string per item\neach call labels it 'Conversation #1']
    J --> K{total_chars + next part > limit\nAND included_count > 0?}
    K -- Yes --> L[Stop loop]
    K -- No --> M[Append part, add to total_chars]
    M --> J
    L --> N[separator.join truncated_parts\n+ omission note]
    N --> O[Return truncated result\nwith note about omitted count]
```

Last reviewed commit: fd1b355
```python
part = Conversation.conversations_to_string(
    [conversation],
    use_transcript=include_transcript,
    include_timestamps=include_timestamps,
    people=people,
)
if total_chars + len(part) + len(separator) > MAX_RESULT_CHARS and included_count > 0:
    break
truncated_parts.append(part)
total_chars += len(part) + len(separator)
included_count += 1
```
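As an aside on the guard's semantics, here is a minimal runnable sketch (values are illustrative, not from the PR): the `and included_count > 0` clause guarantees at least one conversation is returned even when the very first part alone exceeds the cap.

```python
# Illustrative values only: a tiny cap and two fake pre-formatted parts,
# where the first part alone is already over the cap.
MAX_RESULT_CHARS = 100
separator = "\n---\n"
parts = ["x" * 250, "y" * 40]

truncated_parts = []
total_chars = 0
included_count = 0
for part in parts:
    # With included_count == 0 the overflow check is skipped, so the first
    # part is always kept; every later part is subject to the cap.
    if total_chars + len(part) + len(separator) > MAX_RESULT_CHARS and included_count > 0:
        break
    truncated_parts.append(part)
    total_chars += len(part) + len(separator)
    included_count += 1
```

Without that clause, a single oversized conversation would produce an empty result instead of a truncated one.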
All truncated conversations labelled "Conversation #1"
conversations_to_string([conversation]) is called with a single-element list, so enumerate always starts at i=0 — meaning every conversation in the truncated output gets the header Conversation #1. When truncation is active, the LLM receives a context where every entry claims to be the first conversation.
The same issue exists in search_conversations_tool at lines 509-519.
A straightforward fix is to avoid re-calling conversations_to_string on individual items and instead carry the per-conversation strings through a single pre-computation pass:
```python
# Pre-compute individual parts once, preserving original indices
all_parts = []
for idx, conversation in enumerate(conversations):
    sep_str = f"Conversation #{idx + 1}\n"
    # ... format using the same logic, or factor out a single-conversation formatter
```

Alternatively, build the formatted parts first and then join/truncate, rather than formatting the full list first and re-formatting from scratch on overflow.
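To make the failure concrete, here is a minimal runnable sketch; the stand-in below is an assumption about `conversations_to_string`'s internals, and only the `enumerate` pattern matters:

```python
# Minimal stand-in for Conversation.conversations_to_string; the real
# method's internals are an assumption here. Only the enumerate-based
# numbering is relevant.
def conversations_to_string(conversations):
    return "\n".join(
        f"Conversation #{i + 1}: {text}" for i, text in enumerate(conversations)
    )

convs = ["breakfast chat", "standup", "retro"]

# Formatting the whole list numbers entries correctly...
full = conversations_to_string(convs)

# ...but formatting one single-element list per item restarts enumerate on
# every call, so each truncated part claims to be "Conversation #1".
parts = [conversations_to_string([c]) for c in convs]
```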
```python
result = Conversation.conversations_to_string(
    conversations, use_transcript=include_transcript, include_timestamps=include_timestamps, people=people
)
```
Double-formatting wastes memory and time on the hot path
The full conversations_to_string(conversations, ...) is called unconditionally before the truncation check, producing a potentially 5–10 MB string that is immediately discarded when the result exceeds MAX_RESULT_CHARS. Then the loop calls conversations_to_string([conv]) one more time per conversation. For the large-range queries this PR is meant to fix (e.g., limit=5000, include_transcript=True), this doubles the serialisation work and peak memory, and could itself approach the 120 s HTTP timeout before truncation even begins.
Consider building the per-conversation parts first (one pass), then joining and truncating in a second step — this avoids ever materialising the oversized string:
```python
separator = "\n\n---------------------\n\n"
parts = [
    Conversation.conversations_to_string(
        [conv],
        use_transcript=include_transcript,
        include_timestamps=include_timestamps,
        people=people,
    )
    for conv in conversations
]
total = 0
kept = []
for part in parts:
    if total + len(part) + len(separator) > MAX_RESULT_CHARS and kept:
        break
    kept.append(part)
    total += len(part) + len(separator)
result = separator.join(kept)
```

```python
total_chars = 0
included_count = 0
separator = "\n\n---------------------\n\n"
for i, conversation in enumerate(conversations):
```
Unused loop variable i
i is assigned by enumerate but never referenced inside the loop body.
```diff
- for i, conversation in enumerate(conversations):
+ for conversation in conversations:
```
```python
def _call_tool_with_conversations(self, conversations_data, max_result_chars=None):
    """Helper to simulate the truncation logic from get_conversations_tool."""
    MAX_RESULT_CHARS = max_result_chars or 1_600_000

    conversations = []
    for conv_data in conversations_data:
        conversations.append(Conversation(**conv_data))

    result = Conversation.conversations_to_string(conversations)

    if len(result) > MAX_RESULT_CHARS:
        truncated_parts = []
        total_chars = 0
        included_count = 0
        separator = "\n\n---------------------\n\n"
        for conversation in conversations:
            part = Conversation.conversations_to_string([conversation])
            if total_chars + len(part) + len(separator) > MAX_RESULT_CHARS and included_count > 0:
                break
            truncated_parts.append(part)
            total_chars += len(part) + len(separator)
            included_count += 1

        omitted = len(conversations) - included_count
        result = separator.join(truncated_parts)
        if omitted > 0:
            result += f"\n\n[Note: {omitted} older conversations omitted to fit context. Ask about a shorter time period for full details.]"

    return result, len(conversations)
```
Tests exercise a re-implementation, not the actual tool
_call_tool_with_conversations is a copy-paste of the truncation algorithm rather than a call to get_conversations_tool. This means the tests can pass even if the logic inside the real tool diverges (e.g., wrong parameters forwarded to conversations_to_string, or the truncation block is accidentally removed). The numbering bug noted above is a direct consequence of this gap — the helper calls conversations_to_string([conversation]) identically to the production code, so the tests pass while every conversation gets labelled "Conversation #1".
Consider patching out the Firestore/config dependencies and exercising get_conversations_tool end-to-end, or at least extracting the truncation logic into a standalone helper that both the tool and the tests can call directly.
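One way to close that gap, sketched under the assumption that the loop can be lifted verbatim into a module-level function (the name `truncate_parts` is hypothetical):

```python
# Hypothetical standalone helper: the real tool would pass in parts produced
# by Conversation.conversations_to_string; unit tests can call it directly.
def truncate_parts(parts, max_chars, separator="\n\n---------------------\n\n"):
    """Keep leading parts whose joined length stays under max_chars.

    Always keeps at least one part so the caller never gets an empty result.
    Returns (joined_result, omitted_count).
    """
    kept = []
    total = 0
    for part in parts:
        if total + len(part) + len(separator) > max_chars and kept:
            break
        kept.append(part)
        total += len(part) + len(separator)
    omitted = len(parts) - len(kept)
    result = separator.join(kept)
    if omitted > 0:
        result += (
            f"\n\n[Note: {omitted} older conversations omitted to fit context. "
            "Ask about a shorter time period for full details.]"
        )
    return result, omitted
```

Both `get_conversations_tool` and the unit tests would then share a single code path, so a behavioural change in truncation fails the tests instead of silently diverging from them.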
Summary

Fixes #4927 — Chat freezes when asking about lengthy date ranges (e.g., "analyze my last 30 days").

Root cause: Two failure modes:

1. SafetyGuard 500K token limit hit -> "narrow down" error
2. HTTP 120s timeout hit before the first stream token -> freeze with no response

Fix: Both `get_conversations_tool` and `search_conversations_tool` now cap output at ~400K tokens (1.6M chars). If the formatted result exceeds that, the tools truncate to the most recent conversations that fit, with a note about omitted older ones. This prevents both failure modes: the tool self-limits before the safety guard or timeout can trigger.
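For reference, the ~400K-token to 1.6M-char equivalence assumes the common rough heuristic of about 4 characters per token for English text (the exact ratio is tokenizer- and content-dependent):

```python
# Assumption: roughly 4 characters per token on average for English text.
APPROX_CHARS_PER_TOKEN = 4
MAX_RESULT_TOKENS = 400_000
MAX_RESULT_CHARS = MAX_RESULT_TOKENS * APPROX_CHARS_PER_TOKEN  # 1_600_000

# Capping at ~400K tokens leaves headroom below SafetyGuard's 500K limit,
# so the tool self-limits before the guard can reject the request.
SAFETY_GUARD_TOKENS = 500_000
```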
Changes

- `backend/utils/retrieval/tools/conversation_tools.py` — Add output truncation to both tools
- `backend/tests/unit/test_chat_context_truncation.py` — 9 tests covering truncation logic
- `backend/test.sh` — Register new test file

Test plan