Default to token-based guard thresholds; make system overhead configurable by conal · Pull Request #10 · Ruya-AI/cozempic

conal · 2026-03-09T02:34:49Z

Summary

- Guard now defaults to token-based thresholds (75%/45% of the context window) instead of requiring an explicit --threshold-tokens flag. Previously, token checking was skipped entirely without this flag, so the guard only watched file size (50MB default).
- SYSTEM_OVERHEAD_TOKENS is now configurable via the --system-overhead-tokens flag or the COZEMPIC_SYSTEM_OVERHEAD_TOKENS env var. The default (21K) underestimates overhead for sessions with heavy rules files, MCP servers, and tool schemas (which can reach 30K-40K+).
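
As a rough illustration of the override described above, the sketch below resolves the overhead estimate from a flag value, the env var, or the 21K default. The helper name `resolve_system_overhead`, the flag-over-env precedence, and the fallback to the default on invalid input are assumptions made for this example, not the behavior confirmed by the PR.

```python
import os

DEFAULT_SYSTEM_OVERHEAD_TOKENS = 21_000  # the 21K default cited in the summary
ENV_VAR = "COZEMPIC_SYSTEM_OVERHEAD_TOKENS"


def resolve_system_overhead(cli_value=None, environ=None):
    """Hypothetical helper: choose the system overhead estimate.

    Assumed precedence: explicit flag value, then env var, then default.
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return int(cli_value)
    raw = environ.get(ENV_VAR, "").strip()
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric env var: fall back to the default (assumption).
        return DEFAULT_SYSTEM_OVERHEAD_TOKENS
    # Negative overhead makes no sense, so treat it as invalid (assumption).
    return value if value >= 0 else DEFAULT_SYSTEM_OVERHEAD_TOKENS
```

For example, `resolve_system_overhead(environ={ENV_VAR: "34000"})` would yield 34000 under these assumptions, while an unset variable yields the 21K default.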

Motivation

Sessions with high token density but small file size (meaning-rich conversations without build output) could hit context limits long before the 50MB file size threshold. In our case, Cozempic's guard reported 68% context usage while the client reported 85%, and compaction triggered before the guard intervened.

Root causes:

- Token checking was opt-in (--threshold-tokens), not the default
- The heuristic fallback path underestimated overhead (21K constant vs ~34K actual)
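
To make the second root cause concrete, here is a small worked example of how the overhead constant shifts an estimated usage percentage. It assumes a simple additive model (conversation tokens plus overhead, divided by the context window), which may not match the actual heuristic in tokens.py; the 115K conversation size is an illustrative value, not a measurement from this PR.

```python
def estimated_usage(conversation_tokens, overhead_tokens, context_window):
    """Fraction of the context window in use under a simple additive model."""
    return (conversation_tokens + overhead_tokens) / context_window


CONTEXT_WINDOW = 200_000  # the 200K window used in the test plan
conversation = 115_000    # illustrative value only

low = estimated_usage(conversation, 21_000, CONTEXT_WINDOW)   # 21K constant
high = estimated_usage(conversation, 34_000, CONTEXT_WINDOW)  # ~34K actual
gap_points = (high - low) * 100

print(f"{low:.1%} vs {high:.1%} (gap: {gap_points:.1f} points)")
```

Under this model the 13K difference in overhead alone moves the estimate by about 6.5 percentage points on a 200K window. The sketch does not attempt to reproduce the 68% vs 85% figures reported above.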

Changes

- tokens.py: Added get_system_overhead_tokens() (env var override) and default_token_thresholds() (75%/45% of context window). Replaced the hardcoded SYSTEM_OVERHEAD_TOKENS in the heuristic and calibration paths.
- guard.py: start_guard() now computes default token thresholds from the session's context window when none are passed.
- cli.py: Added the --system-overhead-tokens global flag. Updated help text for --threshold-tokens and --soft-threshold-tokens.
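
The threshold fallback listed above can be sketched roughly as follows. Only the name default_token_thresholds() and the 75%/45% split come from this PR; the signature, the (hard, soft) ordering, and the helper `effective_thresholds` (a stand-in for the logic in start_guard()) are assumptions for illustration.

```python
HARD_PERCENT = 75  # assumed to be the hard threshold share
SOFT_PERCENT = 45  # assumed to be the soft threshold share


def default_token_thresholds(context_window):
    """Return (hard, soft) thresholds as 75% and 45% of the context window."""
    # Integer arithmetic avoids floating-point rounding surprises.
    hard = context_window * HARD_PERCENT // 100
    soft = context_window * SOFT_PERCENT // 100
    return hard, soft


def effective_thresholds(context_window, threshold_tokens=None,
                         soft_threshold_tokens=None):
    """Hypothetical fallback: use explicit values, else computed defaults."""
    hard_default, soft_default = default_token_thresholds(context_window)
    hard = hard_default if threshold_tokens is None else threshold_tokens
    soft = soft_default if soft_threshold_tokens is None else soft_threshold_tokens
    return hard, soft
```

In this sketch each missing threshold is filled in independently; whether the real guard does that, or only applies defaults when neither value is passed, is not stated in the PR.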

Test plan

- All 24 existing test_tokens.py tests pass
- Smoke-tested that default_token_thresholds() returns (150000, 90000) for a 200K context window
- Smoke-tested that the COZEMPIC_SYSTEM_OVERHEAD_TOKENS env var override works

Fixes #8, fixes #9.

🤖 Generated with Claude Code

…rable

Guard now computes token thresholds automatically (75%/45% of context window) when --threshold-tokens is not passed. Previously, token checking was entirely skipped without explicit flags, so the guard only watched file size (50MB default). Sessions with high token density but small file size (e.g. meaning-rich conversations without build output) could hit context limits long before file size thresholds.

Also makes SYSTEM_OVERHEAD_TOKENS configurable via --system-overhead-tokens flag or COZEMPIC_SYSTEM_OVERHEAD_TOKENS env var. The default (21K) is conservative; sessions with heavy rules files, MCP servers, and tool schemas can have 30K-40K+ tokens of system overhead, causing the heuristic to underestimate context usage.

Fixes Ruya-AI#8, fixes Ruya-AI#9.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This was referenced Mar 9, 2026

Global flags after subcommand silently rejected by argparse #13

Open

Question: Is the guard daemon being used successfully by other users? #18

Open

conal wants to merge 1 commit into Ruya-AI:main from conal:smart-token-defaults
