A benchmarking suite that measures a coding agent's token consumption under different scenarios, then analyses the results to evaluate the economic impact of Edgee's AI token compressor.
See the `reports/` folder for detailed, real-world reports from our latest benchmark runs, including token usage, costs, and scenario breakdowns.
The benchmark works in two phases:
- Run — Launch isolated coding-agent sessions that complete a fixed set of coding instructions. Each session runs in one of three scenarios (compression strategies).
- Analyse — Read the generated session artefacts and produce cost reports.
- Node.js ≥ 18 with `npm`
- `edgee` CLI installed
- `claude` CLI installed and accessible in your `PATH`
- RTK (Rust Token Killer): required for the `rtk` scenario; see https://github.com/rtk-ai/rtk
- An `.env` file at the project root (see below)
Install dependencies:

    npm install

First, create an empty `.edgee/credentials.toml` file at the root of this project, so you can use multiple Edgee profiles.
Then, create two Edgee accounts, one for the normal scenario (without compression) and one for the edgee scenario (with compression).
If you want to test rtk as well, you'll have to create another account dedicated to it (optional).
Then log in to each profile:

    edgee auth login -p normal
    edgee auth login -p edgee
    edgee auth login -p rtk  # optional

If you want to generate reports, you'll need the following variable in your `.env` file:

    EDGEE_API_TOKEN_REPORT=<your-token-to-generate-reports>

To start a session, run:

    ./run.sh <agent> <scenario>

Agents:
| Agent | Description |
|---|---|
| `claude` | Launches a Claude session |
| `codex` | Launches a Codex session |
Scenarios:
| Scenario | Description |
|---|---|
| `normal` | Baseline: Claude requests go through Edgee AI Gateway with no compression |
| `edgee` | Edgee token compressor is enabled; input tokens are reduced before forwarding to Anthropic |
| `rtk` | RTK (Rust Token Killer) is enabled as a local bash proxy; Claude's bash tool calls go through RTK before hitting the gateway |
Each run:
- Copies the `cli/` source directory into a fresh `_<agent>-<scenario>-<random>/` folder
- Creates an isolated Claude/Codex config directory inside it
- Launches Claude/Codex with `--dangerously-skip-permissions` (or equivalent)
Example:

    ./run.sh claude edgee

This creates `_claude-edgee-4a2f8c1d/` and starts a Claude session inside it.
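
To make the folder layout concrete, here is a rough Node.js sketch of the first two steps of a run (copying the source tree and creating an isolated config directory). It is not the actual `run.sh` logic: the 8-hex-character suffix is inferred from the example above, and `createSession` and the `.agent-config` directory name are hypothetical.

```javascript
// Sketch of the per-run setup; NOT the real run.sh implementation.
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Build a folder name of the form _<agent>-<scenario>-<random>.
// The 8-hex-character suffix mirrors the example name; the real script may differ.
function sessionDirName(agent, scenario) {
  const suffix = crypto.randomBytes(4).toString("hex");
  return `_${agent}-${scenario}-${suffix}`;
}

// Copy the source tree into a fresh session folder and add an isolated config directory.
// `configDirName` is a hypothetical name chosen for this sketch.
function createSession(sourceDir, agent, scenario, root = ".", configDirName = ".agent-config") {
  const sessionDir = path.join(root, sessionDirName(agent, scenario));
  fs.cpSync(sourceDir, sessionDir, { recursive: true });
  fs.mkdirSync(path.join(sessionDir, configDirName), { recursive: true });
  return sessionDir;
}
```

Calling `createSession("cli", "claude", "edgee")` would produce a folder shaped like the one in the example. A random suffix keeps repeated runs of the same agent and scenario from overwriting each other.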
Once the agent starts, put it in plan mode, then paste the coding instructions one at a time from instructions.md. For each instruction:
- Paste the instruction
- Let the agent produce a plan
- Approve the plan and let it execute
- Move on to the next instruction
The `analyze` script reads all `_<agent>-<scenario>-*` session directories that contain `session-stats.json` (excluding `-full` ones), aggregates token and cost metrics by agent and scenario, then calls the Edgee LLM API to produce an AI-written analysis.

    npm run analyze

It outputs two files in the project root:

- `report-<ISO-date>.json`: raw aggregated metrics
- `report-<ISO-date>.md`: human-readable markdown report with tables and LLM analysis
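
The grouping step described above can be sketched as follows. This is an illustration under stated assumptions, not the suite's actual code: the field names `inputTokens`, `outputTokens`, and `costUsd` are hypothetical (the real `session-stats.json` schema is not documented here), and the helper names are made up for the example.

```javascript
// Sketch of the aggregation step; field names in `stats` are hypothetical.

// Parse names such as "_claude-edgee-4a2f8c1d" or "_claude-edgee-4a2f8c1d-full".
function parseSessionDir(name) {
  const m = /^_([a-z0-9]+)-([a-z0-9]+)-([a-z0-9]+)(-full)?$/i.exec(name);
  if (!m) return null;
  return { agent: m[1], scenario: m[2], id: m[3], full: Boolean(m[4]) };
}

// Sum metrics per "agent/scenario" key, skipping -full sessions and unrecognised names.
// Each entry is { dir: <directory name>, stats: <parsed session-stats.json> }.
function aggregate(sessions) {
  const totals = {};
  for (const { dir, stats } of sessions) {
    const info = parseSessionDir(dir);
    if (!info || info.full) continue;
    const key = `${info.agent}/${info.scenario}`;
    const t = (totals[key] ??= { sessions: 0, inputTokens: 0, outputTokens: 0, costUsd: 0 });
    t.sessions += 1;
    t.inputTokens += stats.inputTokens ?? 0;
    t.outputTokens += stats.outputTokens ?? 0;
    t.costUsd += stats.costUsd ?? 0;
  }
  return totals;
}

// Relative saving in average cost per session against the baseline scenario.
// Returns null when either group is missing or the baseline cost is zero.
function savingsVsBaseline(totals, agent, scenario, baseline = "normal") {
  const base = totals[`${agent}/${baseline}`];
  const other = totals[`${agent}/${scenario}`];
  if (!base || !other || base.costUsd === 0) return null;
  const baseAvg = base.costUsd / base.sessions;
  const otherAvg = other.costUsd / other.sessions;
  return (baseAvg - otherAvg) / baseAvg;
}
```

Comparing per-session averages rather than raw totals keeps the result meaningful when the scenarios have different numbers of recorded sessions.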
The `analyze-full` script reads all `_<agent>-<scenario>-*-full` session directories, uses `session-stats.json` for token/cost totals, and uses `claude-pro-usage.json` for the recorded per-instruction endurance progression.

    npm run analyze-full

Outputs:

- `report-full-<ISO-date>.json`
- `report-full-<ISO-date>.md`