Compression Lab - Benchmark

A benchmarking suite that measures coding agents' token consumption under different scenarios, then analyses the results to evaluate the economic impact of Edgee's AI token compressor.

See the reports/ folder for detailed, real-world reports from our latest benchmark runs, including token usage, costs, and scenario breakdowns.

Overview

The benchmark works in two phases:

Run: launch isolated coding-agent sessions that complete a fixed set of coding instructions. Each session runs in one of three scenarios (compression strategies).
Analyse: read the generated session artefacts and produce cost reports.

Prerequisites

Node.js ≥ 18 with npm
edgee CLI installed
claude CLI (for the claude agent) and/or codex CLI (for the codex agent), installed and accessible in your PATH
RTK (Rust Token Killer): required for the rtk scenario; see https://github.com/rtk-ai/rtk
An .env file at the project root (see below)

Install dependencies:

npm install

Environment setup

First, create an empty .edgee/credentials.toml file at the root of this project so that you can use multiple Edgee profiles. Then create two Edgee accounts: one for the normal scenario (without compression) and one for the edgee scenario (with compression). If you also want to test the rtk scenario, create a third account dedicated to it (optional).

Then log in to each profile:

edgee auth login -p normal 
edgee auth login -p edgee
edgee auth login -p rtk #optional

Reports

To generate reports, set the following variable (for example in your .env file):

EDGEE_API_TOKEN_REPORT=<your-token-to-generate-reports>
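If you script report generation yourself, it can help to fail fast when this variable is missing. The TypeScript sketch below assumes the variable has already been loaded into the process environment (for example from the .env file); `requireReportToken` is a hypothetical helper, not a function from this repository.

```typescript
// Hypothetical helper: returns the report token or throws a descriptive error.
// Assumes EDGEE_API_TOKEN_REPORT has already been loaded into the environment.
function requireReportToken(env: Record<string, string | undefined>): string {
  const token = env["EDGEE_API_TOKEN_REPORT"]?.trim();
  if (!token) {
    throw new Error(
      "EDGEE_API_TOKEN_REPORT is not set; add it to the .env file at the project root.",
    );
  }
  return token;
}

// Usage (Node.js): const token = requireReportToken(process.env);
```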

Running a benchmark session

./run.sh <agent> <scenario>

Agents:

Agent	Description
`claude`	Runs the session with the Claude CLI (claude)
`codex`	Runs the session with the Codex CLI (codex)

Scenarios:

Scenario	Description
`normal`	Baseline: Claude requests go through the Edgee AI Gateway with no compression
`edgee`	Edgee token compressor is enabled; input tokens are reduced before forwarding to Anthropic
`rtk`	RTK (Rust Token Killer) is enabled as a local bash proxy; Claude's bash tool calls go through RTK before hitting the gateway
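The accepted agent and scenario values above can be captured as string-literal types. This is a hypothetical TypeScript sketch of argument checking that mirrors `./run.sh <agent> <scenario>`; run.sh itself is a shell script and may validate its arguments differently, and `parseRunArgs` is an illustrative name.

```typescript
// Values listed in the Agents and Scenarios tables of this README.
const AGENTS = ["claude", "codex"] as const;
const SCENARIOS = ["normal", "edgee", "rtk"] as const;

type Agent = (typeof AGENTS)[number];
type Scenario = (typeof SCENARIOS)[number];

// Type guards narrowing an arbitrary string to the allowed values.
function isAgent(value: string): value is Agent {
  return (AGENTS as readonly string[]).includes(value);
}

function isScenario(value: string): value is Scenario {
  return (SCENARIOS as readonly string[]).includes(value);
}

// Hypothetical check mirroring `./run.sh <agent> <scenario>`.
function parseRunArgs(args: string[]): { agent: Agent; scenario: Scenario } {
  const agent = args[0] ?? "";
  const scenario = args[1] ?? "";
  if (!isAgent(agent)) {
    throw new Error(`Unknown agent "${agent}"; expected one of: ${AGENTS.join(", ")}`);
  }
  if (!isScenario(scenario)) {
    throw new Error(`Unknown scenario "${scenario}"; expected one of: ${SCENARIOS.join(", ")}`);
  }
  return { agent, scenario };
}
```

Using `as const` keeps the tables and the types in one place, so adding a scenario only means extending the array.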

Each run:

Copies the cli/ source directory into a fresh _<agent>-<scenario>-<random>/ folder
Creates an isolated Claude/Codex config directory inside it
Launches Claude/Codex with --dangerously-skip-permissions (or equivalent)
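The directory naming described above can be sketched as follows. This assumes, based only on the example _claude-edgee-4a2f8c1d, that the random suffix is 8 lowercase hex characters; run.sh may generate it differently, and `sessionDirName` is a hypothetical helper.

```typescript
import { randomBytes } from "node:crypto";

// Builds a session directory name of the form _<agent>-<scenario>-<random>.
// Assumption: the suffix is 4 random bytes rendered as 8 lowercase hex
// characters, matching the shape of the README example _claude-edgee-4a2f8c1d.
function sessionDirName(agent: string, scenario: string): string {
  const suffix = randomBytes(4).toString("hex");
  return `_${agent}-${scenario}-${suffix}`;
}

// Prints a name such as _claude-edgee-<8 hex chars>; each call is random.
console.log(sessionDirName("claude", "edgee"));
```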

Example:

./run.sh claude edgee

This creates a directory such as _claude-edgee-4a2f8c1d/ (the suffix is random) and starts a Claude session inside it.

What to do inside the Coding Agent session

Once the agent starts, switch it to plan mode, then paste the coding instructions from instructions.md one at a time. For each instruction:

Paste the instruction
Let the agent produce a plan
Approve the plan and let it execute
Move on to the next instruction

Analysing results

Standard benchmark (`npm run analyze`)

Reads every _<agent>-<scenario>-* session directory that contains session-stats.json (excluding the -full ones), aggregates token and cost metrics by agent and scenario, then calls the Edgee LLM API to produce an AI-written analysis.
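The directory selection can be illustrated with a small TypeScript sketch that parses session directory names and groups the standard runs by agent and scenario. It assumes agent and scenario names contain no hyphens (true for claude/codex and normal/edgee/rtk) and takes the session-stats.json check as a callback so it stays testable; `parseSessionDir` and `groupStandardSessions` are hypothetical names, not the analyzer's actual code.

```typescript
// Matches _<agent>-<scenario>-<random>, with an optional -full suffix for
// endurance runs. Assumes agent and scenario names contain no hyphens.
const SESSION_DIR = /^_([a-z0-9]+)-([a-z0-9]+)-([a-z0-9]+)(-full)?$/;

interface SessionDir {
  name: string;
  agent: string;
  scenario: string;
  full: boolean;
}

// Returns null for anything that is not a session directory (src, reports, ...).
function parseSessionDir(name: string): SessionDir | null {
  const match = SESSION_DIR.exec(name);
  if (match === null) return null;
  const [, agent = "", scenario = "", , fullSuffix] = match;
  return { name, agent, scenario, full: fullSuffix !== undefined };
}

// Groups standard (non -full) sessions that have session-stats.json by
// "<agent>/<scenario>". hasStats is injected so the logic can be tested
// without touching the filesystem.
function groupStandardSessions(
  names: string[],
  hasStats: (name: string) => boolean,
): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const session = parseSessionDir(name);
    if (session === null || session.full || !hasStats(name)) continue;
    const key = `${session.agent}/${session.scenario}`;
    groups.set(key, [...(groups.get(key) ?? []), name]);
  }
  return groups;
}

// Usage from the project root (Node.js):
//   import { existsSync, readdirSync } from "node:fs";
//   import { join } from "node:path";
//   const root = process.cwd();
//   const groups = groupStandardSessions(readdirSync(root), (n) =>
//     existsSync(join(root, n, "session-stats.json")));
```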

npm run analyze

Outputs two files in the project root:

report-<ISO-date>.json — raw aggregated metrics
report-<ISO-date>.md — human-readable markdown report with tables and LLM analysis
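To make the economic comparison concrete, here is a sketch of how per-scenario averages and a relative cost saving against the normal baseline could be computed. The `SessionStats` fields are hypothetical and the real session-stats.json schema may differ; the sketch also assumes the <ISO-date> in report names is the YYYY-MM-DD part of `Date.prototype.toISOString()`, which the analyzer may not do.

```typescript
// Hypothetical shape; the real session-stats.json schema may differ.
interface SessionStats {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface GroupSummary {
  sessions: number;
  avgInputTokens: number;
  avgOutputTokens: number;
  avgCostUsd: number;
}

// Averages the metrics of all sessions in one agent + scenario group.
function summarize(stats: SessionStats[]): GroupSummary {
  const n = stats.length;
  if (n === 0) return { sessions: 0, avgInputTokens: 0, avgOutputTokens: 0, avgCostUsd: 0 };
  const sum = (pick: (s: SessionStats) => number): number =>
    stats.reduce((acc, s) => acc + pick(s), 0);
  return {
    sessions: n,
    avgInputTokens: sum((s) => s.inputTokens) / n,
    avgOutputTokens: sum((s) => s.outputTokens) / n,
    avgCostUsd: sum((s) => s.costUsd) / n,
  };
}

// Relative cost saving versus the normal baseline of the same agent,
// as a fraction (0.25 means 25% cheaper).
function costSaving(baseline: GroupSummary, candidate: GroupSummary): number {
  if (baseline.avgCostUsd === 0) return 0;
  return 1 - candidate.avgCostUsd / baseline.avgCostUsd;
}

// Assumes <ISO-date> is the YYYY-MM-DD prefix of toISOString() (UTC).
function reportFileNames(now: Date): { json: string; md: string } {
  const date = now.toISOString().slice(0, 10);
  return { json: `report-${date}.json`, md: `report-${date}.md` };
}
```

With hypothetical numbers, a baseline averaging $4.00 per session and a compressed run averaging $3.00 gives `costSaving` = 0.25, i.e. 25% cheaper.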

Endurance benchmark (`npm run analyze-full`)

Reads all _<agent>-<scenario>-*-full session directories, uses session-stats.json for token/cost totals, and uses claude-pro-usage.json for the recorded per-instruction endurance progression.
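The endurance progression is tracked per instruction. Assuming, hypothetically, that claude-pro-usage.json records a cumulative usage percentage after each instruction (the actual file format is not documented here), the usage consumed by each instruction is the difference between consecutive readings. `UsagePoint` and `perInstructionUsage` are illustrative names only.

```typescript
// Hypothetical record: one entry per completed instruction, holding the
// cumulative Claude Pro usage (percentage of the plan limit) observed after it.
interface UsagePoint {
  instruction: number;
  cumulativeUsagePct: number;
}

// Converts cumulative readings into the usage consumed by each instruction.
// Readings are sorted by instruction number first, so input order does not matter.
function perInstructionUsage(
  points: UsagePoint[],
): { instruction: number; usagePct: number }[] {
  const sorted = [...points].sort((a, b) => a.instruction - b.instruction);
  let previous = 0;
  return sorted.map((p) => {
    const usagePct = p.cumulativeUsagePct - previous;
    previous = p.cumulativeUsagePct;
    return { instruction: p.instruction, usagePct };
  });
}
```

Comparing these per-instruction figures across scenarios shows where compression stretches the usage budget furthest.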

npm run analyze-full

Outputs:

report-full-<ISO-date>.json
report-full-<ISO-date>.md
