
fix(llm): prevent reranker context overflow on large chunks #234

Open

kevin-courbet wants to merge 1 commit into tobi:main from kevin-courbet:fix/reranker-context-overflow

Conversation

@kevin-courbet

Problem

LlamaRankingContext.rankAll throws "input lengths exceed context size" when chunks exceed 2048 tokens. This happens with code blocks, non-ASCII text, or any content where the chunk + query + Qwen3 template overhead exceeds RERANK_CONTEXT_SIZE (2048).

The crash is unrecoverable — the entire rerank call fails, which means search results come back unranked or not at all.

Fix

Two-part fix:

  1. Increase RERANK_CONTEXT_SIZE from 2048 to 8192. The Qwen3-Reranker model supports larger contexts. At 8192 with flash attention, VRAM usage is ~4 GB per context (vs ~960 MB at 2048), a reasonable trade-off for robustness, and still one-fifth of the auto setting (40960).

  2. Add a truncation safety net. Before passing documents to rankAll(), estimate each document's token count (chars / 4) and truncate any that would exceed the context size minus the query and template overhead, so the reranker never crashes even on unexpectedly large chunks (sketched below).
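
For illustration, a minimal sketch of that safety net. The helper name `truncateForRerank`, the template overhead value, and the 4-chars-per-token heuristic are assumptions for this example, not the exact code in src/llm.ts:

```typescript
const RERANK_CONTEXT_SIZE = 8192;
const TEMPLATE_OVERHEAD_TOKENS = 64; // rough allowance for the Qwen3 rerank template (assumed value)

/** Rough token estimate: ~4 characters per token. */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

/** Truncate documents so query + template + document fits the rerank context. */
function truncateForRerank(query: string, documents: string[]): string[] {
  const budgetTokens =
    RERANK_CONTEXT_SIZE - estimateTokens(query) - TEMPLATE_OVERHEAD_TOKENS;
  const budgetChars = budgetTokens * 4;
  return documents.map((doc) =>
    doc.length > budgetChars ? doc.slice(0, budgetChars) : doc
  );
}

// Usage before reranking with the LlamaRankingContext mentioned above:
// const safeDocs = truncateForRerank(query, documents);
// const scores = await rankingContext.rankAll(query, safeDocs);
```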

Changes

  • src/llm.ts: bump RERANK_CONTEXT_SIZE to 8192, add pre-rankAll truncation logic

Backward compatible — no API or behavioral changes beyond improved resilience. Slightly higher VRAM usage per rerank context.


This fix was developed with AI assistance (Claude). The problem was discovered and validated in a production OpenClaw deployment using QMD with an RTX 5090.

Increase RERANK_CONTEXT_SIZE from 2048 to 8192 and add truncation safety
net to prevent crashes when individual chunks exceed the context window.

The Qwen3-Reranker model supports larger contexts, and 8192 tokens only
uses ~4 GB VRAM with flash attention (vs ~960 MB at 2048), which is a
reasonable trade-off for robustness.

Additionally, before passing documents to rankAll(), estimate each
document's token count (chars/4) and truncate any that would exceed the
context size minus query and template overhead. This ensures the reranker
never crashes even with unexpectedly large chunks.