
fix(chat): prevent freeze on long date range queries (#4927) #5543

Open
beastoin wants to merge 3 commits into main from fix/chat-freeze-long-date-range-4927

Conversation

@beastoin
Collaborator

Summary

Fixes #4927 — Chat freezes when asking about lengthy date ranges (e.g., "analyze my last 30 days").

Root cause: Two failure modes:

  1. "Narrow down" error — Conversation tool returns data exceeding 500K token safety guard limit
  2. Freeze (no response) — Firestore query + LLM processing exceeds 120s HTTP timeout before first streaming token → bare 504

Fix: Both get_conversations_tool and search_conversations_tool now cap output at ~400K tokens (1.6M chars). If the formatted result exceeds that, it truncates to the most recent conversations that fit, with a note about omitted older ones:

[Note: 187 older conversations omitted to fit context. Ask about a shorter time period for full details.]

This prevents both failure modes — the tool self-limits before the safety guard or timeout can trigger.
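The capping step can be sketched as follows. This is a minimal model, not the production code: `cap_output` is an illustrative name, and the real tools build each formatted part via `Conversation.conversations_to_string` before applying the limit.

```python
# Sketch of the self-limiting output cap described above.
MAX_RESULT_CHARS = 1_600_000  # ~400K tokens at ~4 chars/token
SEPARATOR = "\n\n---------------------\n\n"

def cap_output(parts, limit=MAX_RESULT_CHARS):
    """Keep the most recent formatted conversations that fit under the cap.

    `parts` is assumed to be ordered most-recent first.
    """
    kept, total = [], 0
    for part in parts:
        # Always include at least one conversation, even if oversized.
        if total + len(part) + len(SEPARATOR) > limit and kept:
            break
        kept.append(part)
        total += len(part) + len(SEPARATOR)
    result = SEPARATOR.join(kept)
    omitted = len(parts) - len(kept)
    if omitted > 0:
        result += (
            f"\n\n[Note: {omitted} older conversations omitted to fit context. "
            "Ask about a shorter time period for full details.]"
        )
    return result
```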

Changes

  • backend/utils/retrieval/tools/conversation_tools.py — Add output truncation to both tools
  • backend/tests/unit/test_chat_context_truncation.py — 9 tests covering truncation logic
  • backend/test.sh — Register new test file

Test plan

  • 9/9 unit tests pass
  • Manual test: ask "summarize my last month" on dev with a user who has 30+ days of conversations
  • Verify the LLM receives truncated context and produces a useful summary instead of freezing

…low on long date ranges

Issue #4927: Chat freezes when asking about 30+ day ranges. Two failure modes:
1. SafetyGuard 500K token limit hit -> 'narrow down' error
2. HTTP 120s timeout hit before first stream token -> freeze with no response

Both tools (get_conversations_tool, search_conversations_tool) now cap output
at ~400K tokens (1.6M chars) and include the most recent conversations that fit,
with a note about omitted older ones.
@greptile-apps
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adds output-length truncation to get_conversations_tool and search_conversations_tool to prevent chat freezes (#4927) caused by either a 500K-token safety-guard breach or a 120 s HTTP timeout when a user queries a large date range. Both tools now cap their formatted output at ~1.6 M characters (~400 K tokens) and append a note listing how many older conversations were omitted.

Key changes:

  • conversation_tools.py — Both tools check len(result) > MAX_RESULT_CHARS after formatting and rebuild a truncated version, keeping the most-recent conversations that fit.
  • test_chat_context_truncation.py — 9 unit tests covering truncation trigger, pass-through for small results, ordering, custom limits, and single-oversized-conversation edge case.
  • test.sh — New test file registered in the shell test runner.

Issues found:

  • Conversation numbering bug: The truncation loop calls Conversation.conversations_to_string([single_conv]) for each item, so every entry in a truncated response is labelled "Conversation Client #1" rather than its sequential position. This affects both tools.
  • Double-formatting: The full conversations_to_string(all_conversations) call executes unconditionally before the truncation check, materialising (and then discarding) the oversized string; on worst-case inputs this itself risks a timeout.
  • Test coverage gap: The test helper re-implements the truncation algorithm rather than calling the actual tool, masking the numbering bug and leaving the integration path untested.
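The numbering issue is easy to reproduce with a toy model of `conversations_to_string` (the real method lives on `Conversation`; the `enumerate`-based numbering below matches the review's description, not the actual implementation):

```python
# Toy stand-in for Conversation.conversations_to_string, which numbers
# entries by enumerating the list it is given (per the review above).
def conversations_to_string(convs):
    return "\n\n".join(f"Conversation #{i + 1}\n{text}" for i, text in enumerate(convs))

convs = ["first talk", "second talk", "third talk"]

# Formatting the full list yields sequential headers #1, #2, #3 ...
full = conversations_to_string(convs)

# ... but formatting one-element lists, as the truncation loop does,
# restarts the count, so every entry is labelled "Conversation #1".
per_item = [conversations_to_string([c]) for c in convs]
```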

Confidence Score: 3/5

  • Acceptable as a stop-gap for the freeze bug, but the conversation numbering regression should be addressed before merging.
  • The truncation logic correctly prevents oversized payloads reaching the LLM, directly fixing the reported freeze. However, the per-conversation re-formatting in the truncation loop always emits "Conversation Client #1", producing confusing context for the LLM on every large query. The double-format pattern also still risks slow tool execution on very large inputs. Tests don't cover the actual tool functions.
  • backend/utils/retrieval/tools/conversation_tools.py — both truncation loops need the numbering fix; backend/tests/unit/test_chat_context_truncation.py — tests should exercise the actual tool or at minimum assert correct sequential numbering in truncated output.

Important Files Changed

Filename Overview
backend/utils/retrieval/tools/conversation_tools.py Core fix: adds MAX_RESULT_CHARS=1_600_000 truncation to both get_conversations_tool and search_conversations_tool. Has a correctness bug where the truncation loop re-calls conversations_to_string with single-element lists, causing all truncated conversations to be labelled "Conversation #1". Also builds the full string before checking truncation (double-format inefficiency) and has an unused enumerate variable.
backend/tests/unit/test_chat_context_truncation.py 9 unit tests for truncation logic, but the helper re-implements the algorithm instead of calling the actual tool, creating a gap that allows the "Conversation #1 numbering" bug to go undetected.
backend/test.sh Trivial one-line addition registering the new test file in the test runner script.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Tool called with date range / query] --> B[Fetch conversations from Firestore]
    B --> C{conversations_data empty?}
    C -- Yes --> D[Return 'no conversations' message]
    C -- No --> E[Convert to Conversation objects\nCap transcript segments at 1000]
    E --> F[Append all to conversations_collected\nfor citation tracking]
    F --> G[Call conversations_to_string\nFULL LIST — may be 5–10 MB]
    G --> H{len result > 1_600_000?}
    H -- No --> I[Return full result]
    H -- Yes --> J[Loop: conversations_to_string per item\neach call labels it 'Conversation #1']
    J --> K{total_chars + next part > limit\nAND included_count > 0?}
    K -- Yes --> L[Stop loop]
    K -- No --> M[Append part, add to total_chars]
    M --> J
    L --> N[separator.join truncated_parts\n+ omission note]
    N --> O[Return truncated result\nwith note about omitted count]

Last reviewed commit: fd1b355

Comment on lines +265 to +275
part = Conversation.conversations_to_string(
    [conversation],
    use_transcript=include_transcript,
    include_timestamps=include_timestamps,
    people=people,
)
if total_chars + len(part) + len(separator) > MAX_RESULT_CHARS and included_count > 0:
    break
truncated_parts.append(part)
total_chars += len(part) + len(separator)
included_count += 1

All truncated conversations labelled "Conversation #1"

conversations_to_string([conversation]) is called with a single-element list, so enumerate always starts at i=0 — meaning every conversation in the truncated output gets the header Conversation #1. When truncation is active, the LLM receives a context where every entry claims to be the first conversation.

The same issue exists in search_conversations_tool at lines 509-519.

A straightforward fix is to avoid re-calling conversations_to_string on individual items and instead carry the per-conversation strings through a single pre-computation pass:

# Pre-compute individual parts once, preserving original indices
all_parts = []
for idx, conversation in enumerate(conversations):
    sep_str = f"Conversation #{idx + 1}\n"
    # ... format using the same logic, or factor out a single-conversation formatter

Alternatively, build the formatted parts first and then join/truncate, rather than formatting the full list first and re-formatting from scratch on overflow.

Comment on lines 254 to 256
result = Conversation.conversations_to_string(
    conversations, use_transcript=include_transcript, include_timestamps=include_timestamps, people=people
)

Double-formatting wastes memory and time on the hot path

The full conversations_to_string(conversations, ...) is called unconditionally before the truncation check, producing a potentially 5–10 MB string that is immediately discarded when the result exceeds MAX_RESULT_CHARS. Then the loop calls conversations_to_string([conv]) one more time per conversation. For the large-range queries this PR is meant to fix (e.g., limit=5000, include_transcript=True), this doubles the serialisation work and peak memory, and could itself approach the 120 s HTTP timeout before truncation even begins.

Consider building the per-conversation parts first (one pass), then joining and truncating in a second step — this avoids ever materialising the oversized string:

separator = "\n\n---------------------\n\n"
parts = [
    Conversation.conversations_to_string(
        [conv],
        use_transcript=include_transcript,
        include_timestamps=include_timestamps,
        people=people,
    )
    for conv in conversations
]

total = 0
kept = []
for part in parts:
    if total + len(part) + len(separator) > MAX_RESULT_CHARS and kept:
        break
    kept.append(part)
    total += len(part) + len(separator)

result = separator.join(kept)

total_chars = 0
included_count = 0
separator = "\n\n---------------------\n\n"
for i, conversation in enumerate(conversations):

Unused loop variable i

i is assigned by enumerate but never referenced inside the loop body.

Suggested change
for i, conversation in enumerate(conversations):
for conversation in conversations:

Comment on lines +101 to +129
def _call_tool_with_conversations(self, conversations_data, max_result_chars=None):
    """Helper to simulate the truncation logic from get_conversations_tool."""
    MAX_RESULT_CHARS = max_result_chars or 1_600_000

    conversations = []
    for conv_data in conversations_data:
        conversations.append(Conversation(**conv_data))

    result = Conversation.conversations_to_string(conversations)

    if len(result) > MAX_RESULT_CHARS:
        truncated_parts = []
        total_chars = 0
        included_count = 0
        separator = "\n\n---------------------\n\n"
        for conversation in conversations:
            part = Conversation.conversations_to_string([conversation])
            if total_chars + len(part) + len(separator) > MAX_RESULT_CHARS and included_count > 0:
                break
            truncated_parts.append(part)
            total_chars += len(part) + len(separator)
            included_count += 1

        omitted = len(conversations) - included_count
        result = separator.join(truncated_parts)
        if omitted > 0:
            result += f"\n\n[Note: {omitted} older conversations omitted to fit context. Ask about a shorter time period for full details.]"

    return result, len(conversations)

Tests exercise a re-implementation, not the actual tool

_call_tool_with_conversations is a copy-paste of the truncation algorithm rather than a call to get_conversations_tool. This means the tests can pass even if the logic inside the real tool diverges (e.g., wrong parameters forwarded to conversations_to_string, or the truncation block is accidentally removed). The numbering bug noted above is a direct consequence of this gap — the helper calls conversations_to_string([conversation]) identically to the production code, so the tests pass while every conversation gets labelled "Conversation #1".

Consider patching out the Firestore/config dependencies and exercising get_conversations_tool end-to-end, or at least extracting the truncation logic into a standalone helper that both the tool and the tests can call directly.
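One way to act on the first suggestion: patch the data-fetch dependency and drive the tool end-to-end. The names below (`fetch_conversations` and the simplified tool body) are stand-ins for illustration, not the actual code in conversation_tools.py.

```python
from unittest import mock

# --- simplified stand-ins for the production module ---
def fetch_conversations(uid):
    raise RuntimeError("would hit Firestore")  # real external dependency

def get_conversations_tool(uid):
    # Sequential numbering done in one pass over the full list.
    parts = [
        f"Conversation #{i + 1}\n{text}"
        for i, text in enumerate(fetch_conversations(uid))
    ]
    return "\n\n".join(parts)

# --- the test pattern: patch the fetch, exercise the real tool ---
with mock.patch(f"{__name__}.fetch_conversations", return_value=["hello", "world"]):
    out = get_conversations_tool("user-1")

# An end-to-end assertion like this would have caught the numbering bug.
assert "Conversation #2" in out
```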



Development

Successfully merging this pull request may close these issues.

Omi chat freezes with handling lengthy date ranges, such as "analyze my last 30 days."
