# Squeez

*Squeeze out the juice, leave the pulp behind.*

- Tool output pruner for LLM coding agents
- Pipe any tool output (pytest, grep, git log, npm build, kubectl, ...) through `squeez` with a task description and get back only the relevant lines
- Fine-tuned Qwen 3.5 2B: 0.79 F1, ~91% compression
- Use it as a CLI pipe, a Python library, or against a vLLM server
Existing context pruning tools (SWE-Pruner, Zilliz Semantic Highlight, Provence) are built for source code or document paragraphs. They don't handle the mixed, unstructured format of tool output (stack traces interleaved with passing tests, grep matches with context lines, build logs with timestamps). Squeez is trained specifically on 14 types of tool output from real SWE-bench workflows.
## Quick start

```shell
pip install squeez
```

```shell
python -m pytest tests/ -v 2>&1 | squeez "find the test failure related to authentication"
```

Task: "Find the test failure related to authentication"

| Before | After |
|---|---|
| 45 lines, ~1,500 tokens | 6 lines, ~200 tokens |

87% compression. Only the failing test and its traceback survive.
## More examples
Filtering `git log`:

```shell
$ git log --oneline -25 | squeez "find the commit that changed the authentication timeout"
u6v7w8x Change auth timeout from 30m to 1h
```

Filtering build output:

```shell
$ npm run build 2>&1 | squeez "find the TypeScript error"
src/components/Auth.tsx(34,5): error TS2345: Argument of type 'string' is
  not assignable to parameter of type 'AuthToken'.
```

Filtering `kubectl` output:

```shell
$ kubectl describe pod api-server-7d4b | squeez "why is the pod failing"
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
  Warning  BackOff  3m (x5)  kubelet  Back-off restarting failed container
```
## Evaluation

Evaluated on 617 held-out test samples from SWE-bench, across 14 tool types:
| Model | Precision | Recall | F1 | Compression |
|---|---|---|---|---|
| Squeez-2B | 0.8043 | 0.8624 | 0.7895 | 0.9150 |
| Qwen 3.5 35B A3B (zero-shot) | 0.7402 | 0.7498 | 0.7000 | 0.9177 |
| Kimi K2 (zero-shot) | 0.6128 | 0.5286 | 0.5344 | 0.9425 |
| Qwen 3.5 2B (untrained) | 0.4154 | 0.5299 | 0.4075 | 0.8197 |
| BM25 (10%) | 0.1277 | 0.2172 | 0.1314 | 0.9036 |
| Random (10%) | 0.0738 | 0.1009 | 0.0697 | 0.9067 |
Squeez-2B (2B parameters) outperforms the 35B MoE model run zero-shot, and scores 6x higher than BM25 on span F1.
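For reference, a minimal sketch of how line-level metrics like these can be computed. The set-based representation of gold and predicted lines is an assumption; the repo's evaluation harness may differ in detail.

```python
def line_metrics(gold: set, pred: set, total_lines: int):
    """Line-level precision/recall/F1 plus compression ratio.

    gold: line indices in the reference extraction (assumed line-level gold)
    pred: line indices the model kept
    total_lines: number of lines in the raw tool output
    """
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    compression = 1 - len(pred) / total_lines  # fraction of lines removed
    return precision, recall, f1, compression
```

For example, keeping 6 of 45 lines, 5 of them matching a 6-line gold extraction, gives precision = recall = F1 ≈ 0.83 and ~87% compression, in line with the quick-start example.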
## Serving with vLLM

```shell
pip install vllm
vllm serve KRLabsOrg/squeez-2b --dtype bfloat16 --max-model-len 16384
```

Then point the squeez CLI at the server:

```shell
pip install squeez
export SQUEEZ_SERVER_URL=http://localhost:8000/v1
cat output.txt | squeez "find the bug"
```

vLLM keeps the model warm in memory, with batched inference and high throughput.
## Local CLI

```shell
pip install squeez
cat output.txt | squeez "Find the failing traceback block"
squeez "Fix the CSRF bug" --input-file output.txt
```

Note: local mode loads the model on every call. Fine for one-off use, but for repeated calls (e.g. an agent piping every tool through squeez), use vLLM.
## Hosted providers

Works with Groq, Together, or any OpenAI-compatible server. Set the URL, model name, and API key:

```shell
export SQUEEZ_SERVER_URL=https://api.groq.com/openai/v1
export SQUEEZ_SERVER_MODEL=squeez
export SQUEEZ_API_KEY=gsk_...
```

## Python library

```python
from squeez.inference.extractor import ToolOutputExtractor

# Default: loads KRLabsOrg/squeez-2b locally
extractor = ToolOutputExtractor()

# Or connect to a server
extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1")

filtered = extractor.extract(
    task="Find the referer validation block",
    tool_output=raw_output,
)
```

## Use with coding agents

Add to your CLAUDE.md:
```markdown
Whenever you invoke a shell command, pipe it through `squeez` and state exactly what you want to know.

Examples:
- `bun test 2>&1 | squeez "did the tests pass?"`
- `git log --oneline -50 | squeez "find the commit that broke CSRF"`
- `cat src/auth/middleware.py | squeez "find the referer validation logic"`

Do NOT use squeez when:
- You need exact, uncompressed output (e.g. writing a patch)
- The command is interactive
```
Works with other coding agents (Codex CLI, OpenCode, etc.) via their equivalent instruction files.
## Configuration

Settings are resolved in order: CLI flags > environment variables > config file.

The config file is loaded from the first path found: `./squeez.yaml`, `./configs/default.yaml`, `~/.config/squeez/config.yaml`.

```yaml
# squeez.yaml
server_url: "http://localhost:8000/v1"
# local_model_path: "./output/squeez_qwen"  # for local inference instead
# backend: null  # auto-detect; or "transformers", "vllm", "encoder"
```

Environment variables:
| Variable | Description |
|---|---|
| `SQUEEZ_SERVER_URL` | Server URL (vLLM, Ollama, etc.) |
| `SQUEEZ_LOCAL_MODEL` | Path to local model directory |
| `SQUEEZ_SERVER_MODEL` | Model name on the server |
| `SQUEEZ_API_KEY` | API key (if needed) |
| `SQUEEZ_BACKEND` | Force backend: `transformers`, `vllm`, `encoder` |
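The CLI > environment > config-file precedence can be sketched as follows. This is illustrative, not the actual loader: `resolve_setting` and `file_config` are hypothetical names, and YAML parsing is omitted.

```python
import os

# Search order for the config file (first hit wins)
CONFIG_PATHS = ["./squeez.yaml", "./configs/default.yaml",
                "~/.config/squeez/config.yaml"]

def resolve_setting(name, cli_value=None, file_config=None):
    """Resolve one setting with CLI > environment > config-file precedence.

    file_config stands in for the parsed YAML of the first existing
    CONFIG_PATHS entry (parsing omitted to keep the sketch short).
    """
    if cli_value is not None:                        # 1. explicit CLI flag wins
        return cli_value
    env = os.environ.get(f"SQUEEZ_{name.upper()}")   # 2. then SQUEEZ_* env var
    if env is not None:
        return env
    if file_config:                                  # 3. then the config file
        return file_config.get(name)
    return None
```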
## Encoder models
Squeez also supports encoder-based extraction (ModernBERT, etc.) as an alternative to the generative model. These are faster but less accurate.
Two encoder approaches:
- Token encoder: per-token binary classification, aggregated per line via max-pool
- Pooled encoder: single-pass encoder with line-level mean-pool classification
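The per-line max-pool in the token-encoder variant can be sketched like this, as a plain-Python stand-in for the real tensor code; the function and argument names are illustrative:

```python
import math

def lines_to_keep(token_logits, token_line_ids, threshold=0.5):
    """Aggregate per-token keep logits into line-level decisions via max-pool.

    token_logits: one binary-classification logit per token
    token_line_ids: the line index each token belongs to
    """
    scores = {}
    for logit, line in zip(token_logits, token_line_ids):
        p = 1 / (1 + math.exp(-logit))  # sigmoid -> keep probability
        scores[line] = max(scores.get(line, 0.0), p)
    # a line survives if its most confident token clears the threshold
    return sorted(line for line, score in scores.items() if score > threshold)
```

Max-pooling means a single confident token (say, the word `Error`) is enough to keep its whole line, which suits sparse signals in noisy tool output.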
```python
from squeez.inference.extractor import ToolOutputExtractor

extractor = ToolOutputExtractor(model_path="./output/squeez_encoder")
filtered = extractor.extract(task="Find the bug", tool_output=raw_output)
```

Standalone loading without squeez installed:
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("output/squeez_pooled", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("output/squeez_pooled")

result = model.process(
    task="Find the traceback",
    tool_output=open("output.log").read(),
    tokenizer=tokenizer,
)
print(result["highlighted_lines"])
```

## Training
See TRAINING.md for full training and evaluation commands.

```shell
# Download dataset
python scripts/download_data.py

# Train generative model (Qwen 3.5 2B + LoRA)
squeez train --train-file data/train.jsonl --eval-file data/dev.jsonl

# Train token encoder
python -m squeez.encoder.train \
    --classifier-type token \
    --train-file data/encoder_train.jsonl \
    --eval-file data/encoder_dev.jsonl \
    --base-model answerdotai/ModernBERT-base \
    --output-dir output/squeez_encoder

# Evaluate
squeez eval --extractor-model output/squeez_qwen --eval-file data/test.jsonl
```

## Dataset
Training data: KRLabsOrg/tool-output-extraction-swebench

Built from SWE-bench repositories. Each sample has:

- `query`: a focused extraction request or agent subgoal
- `tool_output`: raw tool output as seen by the agent
- `gold_spans`: contiguous spans over the raw output

From this canonical format, Squeez derives generative SFT files and encoder training files.
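As a sketch of the encoder-side conversion, gold spans can be projected onto per-line labels like this. It assumes `gold_spans` are character offsets into `tool_output`; the actual conversion script may represent spans differently.

```python
def line_labels(tool_output, gold_spans):
    """Label each line 1 if any (start, end) character span overlaps it."""
    labels, pos = [], 0
    for line in tool_output.splitlines(keepends=True):
        start, end = pos, pos + len(line)
        overlap = any(s < end and e > start for s, e in gold_spans)
        labels.append(1 if overlap else 0)
        pos = end
    return labels
```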
To regenerate from scratch:

```shell
python scripts/build_full_dataset.py \
    --output-dir data/v3 \
    --teacher-model openai/gpt-oss-120b \
    --teacher-base-url http://localhost:8000/v1
```

## Citation

```bibtex
@software{kovacs2026squeez,
  title={Squeez: Compressing Tool Output for LLM Coding Agents},
  author={Adam Kovacs},
  year={2026},
  url={https://github.com/KRLabsOrg/squeez}
}
```

## License

Apache 2.0