
fix(chat): prevent freeze on long date range queries (#4927) #5543

Open
beastoin wants to merge 3 commits into main from fix/chat-freeze-long-date-range-4927

Conversation

@beastoin
Collaborator

Summary

Fixes #4927 — Chat freezes when asking about lengthy date ranges (e.g., "analyze my last 30 days").

Root cause: Two failure modes:

  1. "Narrow down" error — Conversation tool returns data exceeding 500K token safety guard limit
  2. Freeze (no response) — Firestore query + LLM processing exceeds 120s HTTP timeout before first streaming token → bare 504

Fix: Both get_conversations_tool and search_conversations_tool now cap output at ~400K tokens (1.6M chars). If the formatted result exceeds that, it truncates to the most recent conversations that fit, with a note about omitted older ones:

[Note: 187 older conversations omitted to fit context. Ask about a shorter time period for full details.]

This prevents both failure modes — the tool self-limits before the safety guard or timeout can trigger.
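The capping step can be sketched as follows. This is a minimal model, not the production code: `cap_output` is an illustrative name, and the real tools build each formatted part via `Conversation.conversations_to_string` before applying the limit.

```python
# Sketch of the self-limiting output cap described above.
MAX_RESULT_CHARS = 1_600_000  # ~400K tokens at ~4 chars/token
SEPARATOR = "\n\n---------------------\n\n"

def cap_output(parts, limit=MAX_RESULT_CHARS):
    """Keep the most recent formatted conversations that fit under the cap.

    `parts` is assumed to be ordered most-recent first.
    """
    kept, total = [], 0
    for part in parts:
        # Always include at least one conversation, even if oversized.
        if total + len(part) + len(SEPARATOR) > limit and kept:
            break
        kept.append(part)
        total += len(part) + len(SEPARATOR)
    result = SEPARATOR.join(kept)
    omitted = len(parts) - len(kept)
    if omitted > 0:
        result += (
            f"\n\n[Note: {omitted} older conversations omitted to fit context. "
            "Ask about a shorter time period for full details.]"
        )
    return result
```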

Changes

  • backend/utils/retrieval/tools/conversation_tools.py — Add output truncation to both tools
  • backend/tests/unit/test_chat_context_truncation.py — 9 tests covering truncation logic
  • backend/test.sh — Register new test file

Test plan

  • 9/9 unit tests pass
  • Manual test: ask "summarize my last month" on dev with a user who has 30+ days of conversations
  • Verify the LLM receives truncated context and produces a useful summary instead of freezing

…low on long date ranges

Issue #4927: Chat freezes when asking about 30+ day ranges. Two failure modes:
1. SafetyGuard 500K token limit hit -> 'narrow down' error
2. HTTP 120s timeout hit before first stream token -> freeze with no response

Both tools (get_conversations_tool, search_conversations_tool) now cap output
at ~400K tokens (1.6M chars) and include the most recent conversations that fit,
with a note about omitted older ones.
@greptile-apps
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adds output-length truncation to get_conversations_tool and search_conversations_tool to prevent chat freezes (#4927) caused by either a 500K-token safety-guard breach or a 120 s HTTP timeout when a user queries a large date range. Both tools now cap their formatted output at ~1.6 M characters (~400 K tokens) and append a note listing how many older conversations were omitted.

Key changes:

  • conversation_tools.py — Both tools check len(result) > MAX_RESULT_CHARS after formatting and rebuild a truncated version, keeping the most-recent conversations that fit.
  • test_chat_context_truncation.py — 9 unit tests covering truncation trigger, pass-through for small results, ordering, custom limits, and single-oversized-conversation edge case.
  • test.sh — New test file registered in the shell test runner.

Issues found:

  • Conversation numbering bug: The truncation loop calls Conversation.conversations_to_string([single_conv]) for each item, so every entry in a truncated response is labelled "Conversation Client #1" rather than its sequential position. This affects both tools.
  • Double-formatting: The full conversations_to_string(all_conversations) call executes unconditionally before the truncation check, materialising (and then discarding) the oversized string; on worst-case inputs this itself risks a timeout.
  • Test coverage gap: The test helper re-implements the truncation algorithm rather than calling the actual tool, masking the numbering bug and leaving the integration path untested.
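The numbering issue is easy to reproduce with a toy model of `conversations_to_string` (the real method lives on `Conversation`; the `enumerate`-based numbering below matches the review's description, not the actual implementation):

```python
# Toy stand-in for Conversation.conversations_to_string, which numbers
# entries by enumerating the list it is given (per the review above).
def conversations_to_string(convs):
    return "\n\n".join(f"Conversation #{i + 1}\n{text}" for i, text in enumerate(convs))

convs = ["first talk", "second talk", "third talk"]

# Formatting the full list yields sequential headers #1, #2, #3 ...
full = conversations_to_string(convs)

# ... but formatting one-element lists, as the truncation loop does,
# restarts the count, so every entry is labelled "Conversation #1".
per_item = [conversations_to_string([c]) for c in convs]
```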

Confidence Score: 3/5

  • Acceptable as a stop-gap for the freeze bug, but the conversation numbering regression should be addressed before merging.
  • The truncation logic correctly prevents oversized payloads reaching the LLM, directly fixing the reported freeze. However, the per-conversation re-formatting in the truncation loop always emits "Conversation Client #1", producing confusing context for the LLM on every large query. The double-format pattern also still risks slow tool execution on very large inputs. Tests don't cover the actual tool functions.
  • backend/utils/retrieval/tools/conversation_tools.py — both truncation loops need the numbering fix; backend/tests/unit/test_chat_context_truncation.py — tests should exercise the actual tool or at minimum assert correct sequential numbering in truncated output.

Important Files Changed

Filename Overview
backend/utils/retrieval/tools/conversation_tools.py Core fix: adds MAX_RESULT_CHARS=1_600_000 truncation to both get_conversations_tool and search_conversations_tool. Has a correctness bug where the truncation loop re-calls conversations_to_string with single-element lists, causing all truncated conversations to be labelled "Conversation #1". Also builds the full string before checking truncation (double-format inefficiency) and has an unused enumerate variable.
backend/tests/unit/test_chat_context_truncation.py 9 unit tests for truncation logic, but the helper re-implements the algorithm instead of calling the actual tool, creating a gap that allows the "Conversation #1 numbering" bug to go undetected.
backend/test.sh Trivial one-line addition registering the new test file in the test runner script.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Tool called with date range / query] --> B[Fetch conversations from Firestore]
    B --> C{conversations_data empty?}
    C -- Yes --> D[Return 'no conversations' message]
    C -- No --> E[Convert to Conversation objects\nCap transcript segments at 1000]
    E --> F[Append all to conversations_collected\nfor citation tracking]
    F --> G[Call conversations_to_string\nFULL LIST — may be 5–10 MB]
    G --> H{len result > 1_600_000?}
    H -- No --> I[Return full result]
    H -- Yes --> J[Loop: conversations_to_string per item\neach call labels it 'Conversation #1']
    J --> K{total_chars + next part > limit\nAND included_count > 0?}
    K -- Yes --> L[Stop loop]
    K -- No --> M[Append part, add to total_chars]
    M --> J
    L --> N[separator.join truncated_parts\n+ omission note]
    N --> O[Return truncated result\nwith note about omitted count]

Last reviewed commit: fd1b355

Comment on lines +265 to +275
part = Conversation.conversations_to_string(
    [conversation],
    use_transcript=include_transcript,
    include_timestamps=include_timestamps,
    people=people,
)
if total_chars + len(part) + len(separator) > MAX_RESULT_CHARS and included_count > 0:
    break
truncated_parts.append(part)
total_chars += len(part) + len(separator)
included_count += 1

All truncated conversations labelled "Conversation #1"

conversations_to_string([conversation]) is called with a single-element list, so enumerate always starts at i=0 — meaning every conversation in the truncated output gets the header Conversation #1. When truncation is active, the LLM receives a context where every entry claims to be the first conversation.

The same issue exists in search_conversations_tool at lines 509-519.

A straightforward fix is to avoid re-calling conversations_to_string on individual items and instead carry the per-conversation strings through a single pre-computation pass:

# Pre-compute individual parts once, preserving original indices
all_parts = []
for idx, conversation in enumerate(conversations):
    sep_str = f"Conversation #{idx + 1}\n"
    # ... format using the same logic, or factor out a single-conversation formatter

Alternatively, build the formatted parts first and then join/truncate, rather than formatting the full list first and re-formatting from scratch on overflow.

Comment on lines 254 to 256
result = Conversation.conversations_to_string(
    conversations, use_transcript=include_transcript, include_timestamps=include_timestamps, people=people
)

Double-formatting wastes memory and time on the hot path

The full conversations_to_string(conversations, ...) is called unconditionally before the truncation check, producing a potentially 5–10 MB string that is immediately discarded when the result exceeds MAX_RESULT_CHARS. Then the loop calls conversations_to_string([conv]) one more time per conversation. For the large-range queries this PR is meant to fix (e.g., limit=5000, include_transcript=True), this doubles the serialisation work and peak memory, and could itself approach the 120 s HTTP timeout before truncation even begins.

Consider building the per-conversation parts first (one pass), then joining and truncating in a second step — this avoids ever materialising the oversized string:

separator = "\n\n---------------------\n\n"
parts = [
    Conversation.conversations_to_string(
        [conv],
        use_transcript=include_transcript,
        include_timestamps=include_timestamps,
        people=people,
    )
    for conv in conversations
]

total = 0
kept = []
for part in parts:
    if total + len(part) + len(separator) > MAX_RESULT_CHARS and kept:
        break
    kept.append(part)
    total += len(part) + len(separator)

result = separator.join(kept)

total_chars = 0
included_count = 0
separator = "\n\n---------------------\n\n"
for i, conversation in enumerate(conversations):

Unused loop variable i

i is assigned by enumerate but never referenced inside the loop body.

Suggested change
for i, conversation in enumerate(conversations):
for conversation in conversations:

Comment on lines +101 to +129
def _call_tool_with_conversations(self, conversations_data, max_result_chars=None):
    """Helper to simulate the truncation logic from get_conversations_tool."""
    MAX_RESULT_CHARS = max_result_chars or 1_600_000

    conversations = []
    for conv_data in conversations_data:
        conversations.append(Conversation(**conv_data))

    result = Conversation.conversations_to_string(conversations)

    if len(result) > MAX_RESULT_CHARS:
        truncated_parts = []
        total_chars = 0
        included_count = 0
        separator = "\n\n---------------------\n\n"
        for conversation in conversations:
            part = Conversation.conversations_to_string([conversation])
            if total_chars + len(part) + len(separator) > MAX_RESULT_CHARS and included_count > 0:
                break
            truncated_parts.append(part)
            total_chars += len(part) + len(separator)
            included_count += 1

        omitted = len(conversations) - included_count
        result = separator.join(truncated_parts)
        if omitted > 0:
            result += f"\n\n[Note: {omitted} older conversations omitted to fit context. Ask about a shorter time period for full details.]"

    return result, len(conversations)

Tests exercise a re-implementation, not the actual tool

_call_tool_with_conversations is a copy-paste of the truncation algorithm rather than a call to get_conversations_tool. This means the tests can pass even if the logic inside the real tool diverges (e.g., wrong parameters forwarded to conversations_to_string, or the truncation block is accidentally removed). The numbering bug noted above is a direct consequence of this gap — the helper calls conversations_to_string([conversation]) identically to the production code, so the tests pass while every conversation gets labelled "Conversation #1".

Consider patching out the Firestore/config dependencies and exercising get_conversations_tool end-to-end, or at least extracting the truncation logic into a standalone helper that both the tool and the tests can call directly.
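One way to act on the first suggestion: patch the data-fetch dependency and drive the tool end-to-end. The names below (`fetch_conversations` and the simplified tool body) are stand-ins for illustration, not the actual code in conversation_tools.py.

```python
from unittest import mock

# --- simplified stand-ins for the production module ---
def fetch_conversations(uid):
    raise RuntimeError("would hit Firestore")  # real external dependency

def get_conversations_tool(uid):
    # Sequential numbering done in one pass over the full list.
    parts = [
        f"Conversation #{i + 1}\n{text}"
        for i, text in enumerate(fetch_conversations(uid))
    ]
    return "\n\n".join(parts)

# --- the test pattern: patch the fetch, exercise the real tool ---
with mock.patch(f"{__name__}.fetch_conversations", return_value=["hello", "world"]):
    out = get_conversations_tool("user-1")

# An end-to-end assertion like this would have caught the numbering bug.
assert "Conversation #2" in out
```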



Development

Successfully merging this pull request may close these issues.

Omi chat freezes with handling lengthy date ranges, such as "analyze my last 30 days."
