Default to token-based guard thresholds; make system overhead configurable #10

Open
conal wants to merge 1 commit into Ruya-AI:main from conal:smart-token-defaults

conal commented Mar 9, 2026

Summary

  • Guard now defaults to token-based thresholds (75%/45% of context window) instead of requiring explicit --threshold-tokens. Previously, token checking was entirely skipped without this flag, so the guard only watched file size (50MB default).
  • SYSTEM_OVERHEAD_TOKENS is now configurable via --system-overhead-tokens flag or COZEMPIC_SYSTEM_OVERHEAD_TOKENS env var. The default (21K) underestimates overhead for sessions with heavy rules files, MCP servers, and tool schemas (which can reach 30K-40K+).
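
A minimal sketch of how the overhead override might be resolved. The flag name, env var name, and 21K default come from this PR; the helper name `resolve_system_overhead`, the precedence order (flag, then env var, then default), and the handling of invalid values are assumptions for illustration, not the code in this change.

```python
import argparse
import os

DEFAULT_SYSTEM_OVERHEAD_TOKENS = 21_000  # the 21K default cited in this PR
ENV_VAR = "COZEMPIC_SYSTEM_OVERHEAD_TOKENS"


def resolve_system_overhead(flag_value=None, environ=None):
    """Hypothetical helper: choose the overhead from flag, env var, or default."""
    environ = os.environ if environ is None else environ
    if flag_value is not None:
        return flag_value  # assumption: an explicit flag wins over the env var
    raw = environ.get(ENV_VAR)
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = -1  # assumption: unparsable values fall back to the default
        if value >= 0:
            return value
    return DEFAULT_SYSTEM_OVERHEAD_TOKENS


def build_parser():
    """Hypothetical parser exposing only the new global flag."""
    parser = argparse.ArgumentParser(prog="cozempic")
    parser.add_argument("--system-overhead-tokens", type=int, default=None)
    return parser


args = build_parser().parse_args(["--system-overhead-tokens", "34000"])
overhead = resolve_system_overhead(args.system_overhead_tokens)
```

Leaving the flag default as `None` keeps "not passed" distinguishable from any numeric value, so the env var can still apply when the flag is absent.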

Motivation

Sessions with high token density but small file size (meaning-rich conversations without build output) could hit context limits long before the 50MB file size threshold. In our case, Cozempic's guard reported 68% context usage while the client reported 85%, and compaction triggered before the guard intervened.

Root causes:

  1. Token checking was opt-in (--threshold-tokens), not the default
  2. The heuristic fallback path underestimated overhead (21K constant vs ~34K actual)
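
To make the second root cause concrete, here is a rough worked example of how the overhead constant shifts a usage estimate. The 21K and ~34K figures come from this PR; the 200K window, the conversation token count, and the additive formula are illustrative assumptions, not the guard's actual heuristic.

```python
def estimated_usage_percent(conversation_tokens, overhead_tokens, context_window):
    """Assumed additive model: conversation plus fixed system overhead."""
    return 100 * (conversation_tokens + overhead_tokens) / context_window


CONTEXT_WINDOW = 200_000          # assumed window size
CONVERSATION_TOKENS = 115_000     # hypothetical session, chosen for illustration

with_default = estimated_usage_percent(CONVERSATION_TOKENS, 21_000, CONTEXT_WINDOW)
with_actual = estimated_usage_percent(CONVERSATION_TOKENS, 34_000, CONTEXT_WINDOW)

# Percentage points the estimate gains when overhead is set to ~34K.
shift = with_actual - with_default
print(f"default overhead: {with_default}%  actual overhead: {with_actual}%  shift: {shift}")
```

Under these assumptions the overhead constant alone moves the estimate by 6.5 percentage points; any remaining difference from the client's figure is not modeled here.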

Changes

  • tokens.py: Added get_system_overhead_tokens() (env var override) and default_token_thresholds() (75%/45% of context window). Replaced the hardcoded SYSTEM_OVERHEAD_TOKENS in the heuristic and calibration paths.
  • guard.py: start_guard() now computes default token thresholds from the session's context window when none are passed.
  • cli.py: Added --system-overhead-tokens global flag. Updated help text for --threshold-tokens and --soft-threshold-tokens.
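
The threshold fallback described above can be sketched roughly as follows. The function name `default_token_thresholds` and the 75%/45% fractions are from this PR; its signature, the integer rounding, and the helper `resolve_thresholds` (including per-threshold fallback) are assumptions made for illustration rather than the code in guard.py.

```python
HARD_PERCENT = 75  # 75% of the context window, per this PR
SOFT_PERCENT = 45  # 45% of the context window, per this PR


def default_token_thresholds(context_window):
    """Return two thresholds, 75% and 45% of the window (signature assumed)."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return (context_window * HARD_PERCENT // 100,
            context_window * SOFT_PERCENT // 100)


def resolve_thresholds(context_window, threshold_tokens=None,
                       soft_threshold_tokens=None):
    """Hypothetical helper: keep explicit values, fill in defaults otherwise."""
    default_hard, default_soft = default_token_thresholds(context_window)
    hard = threshold_tokens if threshold_tokens is not None else default_hard
    soft = (soft_threshold_tokens if soft_threshold_tokens is not None
            else default_soft)
    return hard, soft


# No flags passed: both thresholds are derived from the window.
print(resolve_thresholds(200_000))
# Only the first threshold passed explicitly: the other still uses its default.
print(resolve_thresholds(200_000, threshold_tokens=160_000))
```

Whether an explicit value for one threshold should leave the other at its computed default is a design choice this sketch assumes; the PR only states that defaults are computed when none are passed.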

Test plan

  • All 24 existing test_tokens.py tests pass
  • Smoke-tested that default_token_thresholds() returns (150000, 90000) for a 200K context window
  • Smoke-tested that the COZEMPIC_SYSTEM_OVERHEAD_TOKENS env var override works

Fixes #8, fixes #9.

🤖 Generated with Claude Code

…rable

Guard now computes token thresholds automatically (75%/45% of context
window) when --threshold-tokens is not passed. Previously, token checking
was entirely skipped without explicit flags, so the guard only watched
file size (50MB default). Sessions with high token density but small
file size (e.g. meaning-rich conversations without build output) could
hit context limits long before file size thresholds.

Also makes SYSTEM_OVERHEAD_TOKENS configurable via --system-overhead-tokens
flag or COZEMPIC_SYSTEM_OVERHEAD_TOKENS env var. The default (21K) is
conservative; sessions with heavy rules files, MCP servers, and tool
schemas can have 30K-40K+ tokens of system overhead, causing the heuristic
to underestimate context usage.

Fixes Ruya-AI#8, fixes Ruya-AI#9.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

  • SYSTEM_OVERHEAD_TOKENS underestimates real overhead, guard fires too late
  • Guard should default to token-based thresholds, not just file size