refactor: promote trace metrics to top-level EvaluationResult fields by christso · Pull Request #398 · EntityProcess/agentv

christso · 2026-02-28T13:12:21Z

Summary

Promotes execution metrics (tokenUsage, costUsd, durationMs, startTime, endTime) from nested TraceSummary to top-level fields on EvaluationResult, EvaluationContext, and JSONL output
Introduces TraceComputeResult type to separate trace-specific data (tool calls, errors) from execution metrics
Updates all consumers: evaluators, code judge payload/schemas, CLI commands, OTel exporter, baseline stripping, docs, examples, and tests
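The shape change described above can be sketched roughly as follows. Only the metric names (tokenUsage, costUsd, durationMs, startTime, endTime) come from this PR; the interfaces, the `toolCalls` and `errorCount` fields, and the `promoteMetrics` helper are hypothetical stand-ins, not the actual agentv types.

```typescript
// Hypothetical token usage shape (assumed, not taken from the PR).
interface TokenUsage {
  input: number;
  output: number;
}

// The five promoted execution metrics named in the PR summary.
interface ExecutionMetrics {
  tokenUsage?: TokenUsage;
  costUsd?: number;
  durationMs?: number;
  startTime?: string;
  endTime?: string;
}

// Trace-specific data only; these field names are assumptions.
interface TraceSummarySketch {
  toolCalls: string[];
  errorCount: number;
}

// Before: metrics nested inside the trace summary.
interface LegacyResult {
  score: number;
  trace?: TraceSummarySketch & ExecutionMetrics;
}

// After: metrics sit next to the score, trace keeps only trace data.
interface FlatResult extends ExecutionMetrics {
  score: number;
  trace?: TraceSummarySketch;
}

// Hypothetical helper that lifts nested metrics to the top level.
function promoteMetrics(legacy: LegacyResult): FlatResult {
  if (!legacy.trace) return { score: legacy.score };
  const { tokenUsage, costUsd, durationMs, startTime, endTime, ...trace } = legacy.trace;
  return { score: legacy.score, tokenUsage, costUsd, durationMs, startTime, endTime, trace };
}
```

The point of the split is that a consumer interested only in cost or duration no longer has to know whether a trace was captured at all.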

Breaking change: JSONL output now has cost_usd, duration_ms, token_usage, start_time, and end_time at the result root instead of nested under trace. External jq scripts or parsers that read trace.cost_usd (or any of the other moved keys) must be updated to read them from the root.
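For external parsers that have to read result files written both before and after this change, one option is to prefer the root-level key and fall back to the old nested one. This is a minimal sketch under that assumption; `readMetrics` and the `Metrics` interface are hypothetical names, and only the snake_case keys come from the PR description.

```typescript
interface Metrics {
  costUsd?: number;
  durationMs?: number;
}

// Parse one JSONL line, preferring root-level keys (new layout) and
// falling back to keys nested under `trace` (old layout).
function readMetrics(line: string): Metrics {
  const record = JSON.parse(line) as Record<string, unknown>;
  const rawTrace = record["trace"];
  const legacy = (typeof rawTrace === "object" && rawTrace !== null
    ? rawTrace
    : {}) as Record<string, unknown>;

  // Return the value only if it is actually a number.
  const pick = (key: string): number | undefined => {
    const value = record[key] ?? legacy[key];
    return typeof value === "number" ? value : undefined;
  };

  return { costUsd: pick("cost_usd"), durationMs: pick("duration_ms") };
}
```

Once old result files are no longer in circulation, the fallback branch can simply be dropped.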

Test plan

All 992 unit tests pass
Build, typecheck, lint all pass (pre-push hook)
Run a real eval with --trace and verify promoted fields appear at JSONL root
Verify agentv trace show and agentv trace stats display metrics correctly
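The manual JSONL check in the test plan could also be scripted. The sketch below is a hypothetical helper, not part of agentv; it assumes every result line is expected to carry all five promoted keys at the root, which may not hold when a provider does not report cost or token usage.

```typescript
const PROMOTED_KEYS = ["token_usage", "cost_usd", "duration_ms", "start_time", "end_time"] as const;

// Return a human-readable problem list for a JSONL string; an empty list
// means every line has the promoted keys at the root and none under `trace`.
function checkPromoted(jsonl: string): string[] {
  const problems: string[] = [];
  const lines = jsonl.split("\n").filter((l) => l.trim() !== "");
  lines.forEach((line, index) => {
    const record = JSON.parse(line) as Record<string, unknown>;
    const trace = record["trace"];
    for (const key of PROMOTED_KEYS) {
      if (!(key in record)) {
        problems.push(`line ${index + 1}: missing ${key} at root`);
      }
      if (typeof trace === "object" && trace !== null && key in trace) {
        problems.push(`line ${index + 1}: ${key} still nested under trace`);
      }
    }
  });
  return problems;
}
```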

🤖 Generated with Claude Code

Update all core test files to use the new flat metric fields (tokenUsage, costUsd, durationMs) at context/result level instead of nested inside TraceSummary. Fix computeTraceSummary usage to destructure TraceComputeResult.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


cloudflare-workers-and-pages · 2026-02-28T13:13:03Z

Deploying agentv with Cloudflare Pages

Latest commit:	`f5f9663`
Status:	✅ Deploy successful!
Preview URL:	https://1eec0017.agentv.pages.dev
Branch Preview URL:	https://refactor-promote-trace-metri.agentv.pages.dev


christso and others added 15 commits February 27, 2026 20:29

refactor: remove promoted fields from TraceSummary, add TraceComputeR…

af8c2ad

…esult Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: add promoted metrics fields to EvaluationResult and Evaluat…

6065cf1

…ionContext Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: update orchestrator to produce/consume flat metrics

8563862

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: update evaluators to read promoted metrics from context

16a4a25

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: update code judge payload and Zod schemas for promoted metrics

440d2a0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: add promoted metrics to baseline stripped fields

07868ab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: update OTel exporter to read flat metrics + reduced trace

87fa0a6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: update CLI commands for promoted metrics

0934468

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: update CLI tests for promoted trace metrics

bac5819

Move token_usage, cost_usd, duration_ms from trace object to top-level in test fixtures and update assertions accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: update example scripts for promoted metrics

dbd3e02

Update code judge scripts to read tokenUsage, costUsd, durationMs from top-level input fields instead of trace object. Update jq example in trace-analysis README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: update JSONL format examples for promoted metrics

d671ff6

Move token_usage, cost_usd, duration_ms, start_time, end_time from trace structure to top-level input fields in code judge docs. Update jq examples in trace CLI docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: correct syntax error in execution-metrics evaluator (trace? -> t…

107d0f6

…race!) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: resolve lint/typecheck errors and formatting issues

bd7459d

- Replace non-null assertions with proper narrowing in execution-metrics evaluator
- Fix biome formatting across multiple files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: update stale trace references in skill file and JSDoc comments

f5f9663

- Fix jq paths in trace-analyst skill (trace.cost_usd -> cost_usd, etc.)
- Update evaluator JSDoc to reflect promoted metrics access pattern
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
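Commit bd7459d mentions replacing non-null assertions with proper narrowing in the execution-metrics evaluator. The actual evaluator code is not shown on this page, so the following is only an illustration of the general pattern; `MetricsContext` and `withinBudget` are hypothetical names.

```typescript
// Hypothetical context shape: the metric may be absent when a provider
// does not report it.
interface MetricsContext {
  durationMs?: number;
}

// A non-null assertion (`context.durationMs! <= max`) would compile but
// silently compare `undefined` at runtime. Narrowing makes the missing
// case explicit instead.
function withinBudget(context: MetricsContext, maxDurationMs: number): boolean | undefined {
  const { durationMs } = context;
  if (durationMs === undefined) {
    return undefined; // no metric available, so no verdict
  }
  // Here the compiler knows durationMs is a number.
  return durationMs <= maxDurationMs;
}
```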

christso wants to merge 15 commits into main from refactor/promote-trace-metrics
