The open-source LLM token compression tool that cuts AI agent costs by up to 97%.
Claw Compactor is an open-source LLM token compression engine that reduces AI agent costs by up to 97%. It compresses workspace memory, session transcripts, and prompt context using 5 deterministic compression layers — completely free, with no LLM inference required.
If you're building AI agents with large context windows (100K+ tokens), Claw Compactor can save you thousands of dollars per month by compressing what gets loaded into the model context before it reaches the LLM.
Built for OpenClaw agents. Works with any LLM workflow — ChatGPT, Claude, GPT-4, Gemini, Llama, or any OpenAI-compatible API — that needs token saving and prompt compression.
- AI agent developers building autonomous agents (OpenClaw, AutoGPT, CrewAI, LangChain, Semantic Kernel)
- LLM application builders managing large context windows with ChatGPT, Claude, or GPT-4
- Teams running multi-agent systems where token costs multiply across sub-agents
- Anyone paying for LLM API calls who wants to reduce their bill without losing context quality
- Open-source AI projects looking for free, deterministic prompt compression
Every token you send to an LLM costs money. As context windows grow, so does spending:
| Scenario | Typical Context Size | Monthly Cost (Opus) |
|---|---|---|
| AI coding agent | 100K tokens/session | $500–$2,000 |
| Multi-agent orchestration | 200K+ tokens | $2,000–$10,000 |
| Autonomous agent (24/7) | 1M+ tokens/day | $10,000+ |
Claw Compactor performs LLM token compression at the workspace level — reducing what gets loaded into context before it reaches the model. Fewer tokens in = lower cost out.
- 🔥 Up to 97% token reduction on session transcripts
- 📉 50–70% token compression on first run across any workspace
- 🧮 5 compression layers working in sequence for maximum token saving
- 💰 Zero LLM cost — all compression is rule-based and deterministic
- 🔄 Lossless roundtrip for dictionary, RLE, and rule-based token compression
- 📊 Tiered summaries (L0/L1/L2) for progressive context loading and token reduction
- 🌏 CJK-aware — full Chinese/Japanese/Korean token compression support
- ⚡ One command (`full`) runs the entire token compression pipeline
- 🪝 Auto-compress hook (v7.0) — compress on every file change, zero config
Claw Compactor applies 5 sequential compression layers to achieve maximum token reduction:
| Layer | Technique | What It Does | Token Savings |
|---|---|---|---|
| 1 | Rule engine | Dedup lines, strip markdown filler, merge sections | 4–8% |
| 2 | Dictionary encoding | Auto-learned codebook, $XX token substitution | 4–5% |
| 3 | Observation compression | Session JSONL → structured summaries | ~97% |
| 4 | RLE patterns | Path shorthand ($WS), IP prefix, enum compaction | 1–2% |
| 5 | Compressed Context Protocol | ultra/medium/light abbreviation | 20–60% |
Layers 1, 2, and 4 are fully lossless — perfect roundtrip decompression. Layers 3 and 5 are lossy but preserve all facts, decisions, and context — only verbose formatting is removed.
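The lossless layers can be illustrated with the $WS path shorthand from the RLE layer. This is a minimal sketch of the idea — the workspace path and function names are placeholders, not the tool's actual `rle.py` code:

```python
# Illustrative sketch of lossless path-shorthand compression ($WS stands in
# for the workspace root). Assumed example path, not the tool's real code.
WS = "/home/alice/agent-workspace"

def compress_paths(text: str) -> str:
    # Replace every occurrence of the workspace root with the $WS shorthand.
    return text.replace(WS, "$WS")

def decompress_paths(text: str) -> str:
    # Expand the shorthand back to the full path.
    return text.replace("$WS", WS)

original = f"Logs moved from {WS}/logs to {WS}/archive"
packed = compress_paths(original)
assert decompress_paths(packed) == original  # perfect roundtrip: lossless
```

Because the substitution is a pure string rewrite with an exact inverse, the roundtrip reproduces the input byte for byte — which is what "fully lossless" means for layers 1, 2, and 4.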
# Clone the token compression tool
git clone https://github.com/aeromomo/claw-compactor.git
cd claw-compactor
# Benchmark: see how many tokens you'd save (non-destructive)
python3 scripts/mem_compress.py /path/to/workspace benchmark
# Compress: run full token compression pipeline
python3 scripts/mem_compress.py /path/to/workspace full
# Auto-compress: compress on every file change (v7.0+)
python3 scripts/mem_compress.py /path/to/workspace auto

Requirements: Python 3.9+. Optional: `pip install tiktoken` for exact token counts (falls back to a CJK-aware heuristic).
All commands: python3 scripts/mem_compress.py <workspace> <command> [options]
| Command | Description | Token Savings |
|---|---|---|
| `full` | Complete token compression pipeline (all layers) | 50%+ combined |
| `benchmark` | Dry-run token reduction report | — |
| `compress` | Rule-based token compression | 4–8% |
| `dict` | Dictionary encoding with auto-codebook | 4–5% |
| `observe` | Session transcript → observation compression | ~97% |
| `tiers` | Generate L0/L1/L2 token-reduced summaries | 88–95% |
| `dedup` | Cross-file duplicate token detection | varies |
| `estimate` | Token count report | — |
| `audit` | Workspace health check | — |
| `optimize` | Tokenizer-level format optimization | 1–3% |
| `auto` | Watch and auto-compress on file changes | continuous |
- `--json` — Machine-readable JSON output
- `--dry-run` — Preview token savings without writing
- `--since YYYY-MM-DD` — Filter sessions by date
- `--auto-merge` — Auto-merge duplicates (dedup)
- `--report` — Print token savings summary
- `--quiet` — Suppress output (for hook/automation use)
- `--changed-file <path>` — Compress a single changed file (hook mode)
| Use Case | Token Reduction | Notes |
|---|---|---|
| Session transcripts (observe) | ~97% | Megabytes of JSONL → concise observation MD |
| Verbose/new workspace | 50–70% | First run on unoptimized workspace |
| Regular maintenance | 10–20% | Weekly token compression on active workspace |
| Already-optimized | 3–12% | Diminishing returns — workspace is clean |
$ python3 scripts/mem_compress.py ~/workspace benchmark
Claw Compactor — Token Compression Benchmark
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Before: 167,821 tokens
After: 157,376 tokens
Saved: 10,445 tokens (6.2%)
Breakdown:
Rule compression: 2,100 tokens saved
Dictionary encoding: 1,800 tokens saved
RLE patterns: 445 tokens saved
Format optimization: 6,100 tokens saved
Combine Claw Compactor's token compression with LLM prompt caching for compound savings:
{
"agents": {
"defaults": {
"models": {
"anthropic/claude-opus-4-6": {
"params": {
"cacheRetention": "long"
}
}
}
}
}
}

- Token compression reduces the number of tokens sent
- Prompt caching reduces the cost per cached token by 90%
- Combined: 50% token reduction + 90% cache discount = 95% effective cost reduction
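The compound effect is easy to verify with normalized cost arithmetic (illustrative figures only, not a pricing calculator):

```python
# Check the compound savings claim: 50% token reduction + 90% cache discount.
# All figures are the illustrative assumptions from the text above.
base_cost = 1.0                          # normalized cost: uncompressed, uncached
after_compression = base_cost * 0.5      # 50% fewer tokens sent
after_caching = after_compression * 0.1  # cached tokens billed at 10% of list price
effective_reduction = 1.0 - after_caching
print(f"{effective_reduction:.0%}")      # → 95%
```

The two discounts multiply rather than add, which is why the combined figure (95%) is larger than either one alone.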
Claw Compactor can automatically compress files every time your AI agent writes or edits them:
# Enable auto-compression (zero config)
python3 scripts/mem_compress.py /path/to/workspace auto --changed-file memory/2026-03-09.md

The PostToolUse hook integrates with OpenClaw to trigger automatic token compression after every file change. Smart skip logic ignores .git/, node_modules/, binary files, and files under 200 tokens.
Performance: <2 seconds per file, typically 10–20% token reduction per compression pass.
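The skip logic can be sketched roughly as follows. This is a simplified illustration under stated assumptions — `estimate_tokens` is a crude stand-in and the exact rules are not the hook's actual code:

```python
from pathlib import Path

# Assumed skip rules, mirroring the documented behavior (illustrative only).
SKIP_DIRS = {".git", "node_modules"}
MIN_TOKENS = 200  # files below this size are not worth compressing

def estimate_tokens(text: str) -> int:
    # Crude stand-in for the real estimator: ~4 characters per token.
    return len(text) // 4

def should_compress(path: Path) -> bool:
    # Skip anything inside an ignored directory.
    if SKIP_DIRS.intersection(path.parts):
        return False
    try:
        text = path.read_text(encoding="utf-8")
    except (UnicodeDecodeError, OSError):
        return False  # binary or unreadable file
    return estimate_tokens(text) >= MIN_TOKENS
```

Checking cheap conditions (directory names) before reading file contents keeps the hook well under the per-file latency budget.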
┌─────────────────────────────────────────────────────────────┐
│ mem_compress.py │
│ (unified token compression entry point) │
└──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬────┘
│ │ │ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼
estimate compress dict dedup observe tiers audit optimize
└──────┴──────┴──┬───┴──────┴──────┴──────┴──────┘
▼
┌────────────────┐
│ lib/ │
│ tokens.py │ ← tiktoken or CJK-aware heuristic
│ markdown.py │ ← section parsing
│ dedup.py │ ← shingle hashing
│ dictionary.py │ ← codebook compression
│ rle.py │ ← path/IP/enum encoding
│ optimizer.py │ ← format optimization
│ config.py │ ← JSON config
│ exceptions.py │ ← error types
└────────────────┘
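The shingle hashing used by `dedup.py` can be sketched in a few lines. This is a minimal illustration of the technique (word n-gram Jaccard similarity), not the module's actual implementation:

```python
def shingles(text: str, size: int = 3) -> set:
    # Break text into overlapping word n-grams ("shingles") and hash each one.
    words = text.split()
    return {hash(tuple(words[i:i + size])) for i in range(max(0, len(words) - size + 1))}

def similarity(a: str, b: str, size: int = 3) -> float:
    # Jaccard similarity over shingle sets: |A ∩ B| / |A ∪ B|.
    sa, sb = shingles(a, size), shingles(b, size)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Two near-duplicate lines score well above the default 0.6 threshold.
a = "run the full token compression pipeline on the workspace every week"
b = "run the full token compression pipeline on the workspace every month"
print(similarity(a, b))  # → 0.8
```

With the default shingle size of 3 and threshold of 0.6, pairs like the one above are flagged as duplicates while genuinely different lines are not.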
How does Claw Compactor compare to other LLM token compression and prompt compression tools?
| Feature | Claw Compactor | LLMLingua-2 | SelectiveContext | Prompt Compression (generic) |
|---|---|---|---|---|
| LLM required | ❌ No (zero cost) | ✅ Yes (inference cost) | ✅ Yes | ✅ Usually |
| Max token reduction | Up to 97% | 20–50% | 15–40% | 10–30% |
| Latency | <50ms | 200ms+ | 100ms+ | varies |
| Lossless layers | 3 of 5 ✅ | 0 | 0 | 0 |
| Workspace-level compression | ✅ Full workspace | ❌ Per-prompt only | ❌ Per-prompt only | ❌ Per-prompt only |
| CJK support | ✅ Full | ❌ No | — | — |
| ROUGE-L @ rate=0.3 | 0.653 | 0.346 | — | — |
| ROUGE-L @ rate=0.5 | 0.723 | 0.570 | — | — |
| Cost to run | $0 | LLM inference cost | LLM inference cost | varies |
| Integration | OpenClaw native, standalone | Python library | Python library | varies |
| License | MIT | MIT | MIT | varies |
Benchmark result: Claw Compactor outperforms LLMLingua-2 by +88.2% on ROUGE-L at 0.3 compression rate and +26.8% at 0.5 compression rate. Full benchmark results →
Reduce the context window footprint of autonomous AI agents. Fewer tokens = faster responses + lower cost.
Compress long conversation histories before loading into LLM context. Preserve decisions and facts, remove filler.
Shrink MEMORY.md, daily notes, and workspace files. Keep them under token budget without losing information.
Convert raw JSONL session logs (often 100K+ tokens) into structured 3K-token observations. 97% token saving.
When sub-agents inherit parent context, every token counts. Compress before passing to reduce cost across the agent tree.
Optional claw-compactor-config.json in workspace root:
{
"chars_per_token": 4,
"level0_max_tokens": 200,
"level1_max_tokens": 500,
"dedup_similarity_threshold": 0.6,
"dedup_shingle_size": 3
}

All fields optional — sensible defaults are used when absent.
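A defaults-plus-overrides merge of this kind might look like the sketch below. The defaults mirror the documented fields; the load function itself is an illustration, not the tool's actual `config.py`:

```python
import json
from pathlib import Path

# Defaults mirror the documented config fields.
DEFAULTS = {
    "chars_per_token": 4,
    "level0_max_tokens": 200,
    "level1_max_tokens": 500,
    "dedup_similarity_threshold": 0.6,
    "dedup_shingle_size": 3,
}

def load_config(workspace: str) -> dict:
    # Hypothetical loader: read the optional JSON file and let user values win.
    path = Path(workspace) / "claw-compactor-config.json"
    overrides = json.loads(path.read_text()) if path.exists() else {}
    return {**DEFAULTS, **overrides}  # missing fields fall back to defaults
```

Merging over a complete defaults dict is what makes every field optional: a config file with a single key still yields a fully populated configuration.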
| File | Purpose |
|---|---|
| `memory/.codebook.json` | Dictionary codebook (must travel with memory files) |
| `memory/.observed-sessions.json` | Tracks processed transcripts |
| `memory/observations/` | Compressed session summaries |
| `memory/MEMORY-L0.md` | Level 0 summary (~200 tokens) |
| `memory/.compactor-state.json` | Auto-compress tracking state |
Run token compression weekly or on heartbeat:
## Memory Maintenance (weekly)
- python3 skills/claw-compactor/scripts/mem_compress.py <workspace> benchmark
- If savings > 5%: run full token compression pipeline
- If pending transcripts: run `observe`

Cron example:

0 3 * * 0 cd /path/to/skills/claw-compactor && python3 scripts/mem_compress.py /path/to/workspace full

Q: Will token compression lose my data?
A: Rule engine, dictionary, RLE, and tokenizer optimization are fully lossless. Observation compression and CCP are lossy but preserve all facts and decisions.
Q: How does dictionary decompression work?
A: decompress_text(text, codebook) expands all $XX codes back. The codebook JSON must be present.
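In spirit, the expansion is a multi-pattern substitution. A minimal sketch (the sample codebook is invented for illustration, and this is not the tool's actual `decompress_text`):

```python
import re

def decompress_text(text: str, codebook: dict) -> str:
    # Replace each $XX code with its phrase. Longer codes are matched first so
    # that e.g. $A1 is never matched as a prefix of $A10 (illustrative sketch).
    pattern = re.compile("|".join(re.escape(c) for c in sorted(codebook, key=len, reverse=True)))
    return pattern.sub(lambda m: codebook[m.group(0)], text)

# Hypothetical codebook entries, not from a real .codebook.json.
codebook = {"$W1": "token compression pipeline", "$W2": "workspace"}
print(decompress_text("Run the full $W1 on the $W2.", codebook))
# → Run the full token compression pipeline on the workspace.
```

Doing all substitutions in a single regex pass avoids the bug where one expansion's output accidentally contains another code and gets expanded twice.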
Q: Can I run individual token compression steps?
A: Yes. Every command is independent: compress, dict, observe, tiers, dedup, optimize.
Q: What if tiktoken isn't installed?
A: Falls back to a CJK-aware heuristic (chars÷4). Token counts are ~90% accurate.
Q: Does token compression handle Chinese/Japanese/Unicode?
A: Yes. Full CJK support including character-aware token estimation and Chinese punctuation normalization.
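A CJK-aware fallback estimator might look like this. The exact weighting is an assumption for illustration — CJK characters tend to cost roughly one token each, other text averages about four characters per token — and this is not the tool's real formula:

```python
def estimate_tokens(text: str) -> int:
    # Heuristic fallback when tiktoken is absent. Weighting is illustrative:
    # CJK characters counted as ~1 token each, other text at ~4 chars/token.
    cjk = sum(1 for ch in text
              if "\u4e00" <= ch <= "\u9fff"     # CJK unified ideographs
              or "\u3040" <= ch <= "\u30ff"     # Japanese kana
              or "\uac00" <= ch <= "\ud7af")    # Korean Hangul syllables
    other = len(text) - cjk
    return cjk + other // 4
```

Counting CJK characters separately matters because a plain chars÷4 rule would underestimate Chinese or Japanese text by roughly 4×.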
Q: How does this compare to LLMLingua?
A: Claw Compactor achieves higher ROUGE-L scores at the same compression rate, requires zero LLM inference cost, and works at the workspace level rather than on individual prompts.
Q: Can I use this without OpenClaw?
A: Yes. Claw Compactor is a standalone Python tool. OpenClaw integration (hooks, auto-compress) is optional.
- `FileNotFoundError` on workspace: Ensure the path points to the workspace root (contains `memory/` or `MEMORY.md`)
- Dictionary decompression fails: Check that `memory/.codebook.json` exists and is valid JSON
- Zero savings on `benchmark`: Workspace is already optimized — token compression has diminishing returns
- `observe` finds no transcripts: Check the sessions directory for `.jsonl` files
- Token count seems wrong: Install tiktoken: `pip3 install tiktoken`
# Clone from GitHub
git clone https://github.com/aeromomo/claw-compactor.git
cd claw-compactor
# Optional: install for exact token counting
pip install tiktoken
# Or install as a Python package (with all extras)
pip install -e ".[accurate]"
# Development install (includes pytest)
pip install -e ".[dev,accurate]"Requirements: Python 3.9+. No other dependencies required — tiktoken is optional for exact token counts (falls back to CJK-aware heuristic).
- OpenClaw — The AI agent platform that Claw Compactor was built for
- ClawhubAI — Discover more AI agent skills and tools
- OpenClaw Discord — Community support and discussion
- OpenClaw Docs — Full documentation
token-compression llm-tools prompt-compression context-compression ai-agent cost-reduction context-window-optimization workspace-compression memory-compression openclaw python developer-tools ai-infrastructure llm-cost-reduction token-optimization
- Inspired by claude-mem by thedotmack
- Built by Bot777 for OpenClaw
MIT — Free for commercial and personal use.
