
refactor: promote trace metrics to top-level EvaluationResult fields #398

Open
christso wants to merge 15 commits into main from refactor/promote-trace-metrics



@christso
Collaborator

Summary

  • Promotes execution metrics (tokenUsage, costUsd, durationMs, startTime, endTime) from nested TraceSummary to top-level fields on EvaluationResult, EvaluationContext, and JSONL output
  • Introduces TraceComputeResult type to separate trace-specific data (tool calls, errors) from execution metrics
  • Updates all consumers: evaluators, code judge payload/schemas, CLI commands, OTel exporter, baseline stripping, docs, examples, and tests
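The summary describes a change in data shape. Below is a minimal sketch of that shape under simplified assumptions. The promoted field names come from this PR, but the TokenUsage shape, the TraceSummary fields, the side-by-side layout of TraceComputeResult, and the buildResult helper are hypothetical and not taken from agentv.

```typescript
// Hypothetical shapes illustrating the refactor; not agentv's actual definitions.

// Assumed token usage shape.
interface TokenUsage {
  input: number;
  output: number;
}

// Trace-specific data only (assumed field names for tool calls and errors).
interface TraceSummary {
  toolCallCount: number;
  errorCount: number;
}

// Execution metrics that this PR moves out of TraceSummary.
interface ExecutionMetrics {
  tokenUsage?: TokenUsage;
  costUsd?: number;
  durationMs?: number;
  startTime?: string; // assumed ISO 8601 timestamp
  endTime?: string; // assumed ISO 8601 timestamp
}

// Assumed layout: trace data and metrics returned side by side, not nested.
interface TraceComputeResult {
  trace: TraceSummary;
  metrics: ExecutionMetrics;
}

// Metrics live at the top level of the result; trace keeps only trace data.
interface EvaluationResult extends ExecutionMetrics {
  score: number;
  trace?: TraceSummary;
}

// Hypothetical helper: destructure the compute result and spread the metrics
// onto the result root.
function buildResult(score: number, computed: TraceComputeResult): EvaluationResult {
  const { trace, metrics } = computed;
  return { score, trace, ...metrics };
}
```

With this layout, consumers read result.costUsd instead of result.trace.costUsd, and TraceSummary no longer carries any execution metrics.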

Breaking change: JSONL output now has cost_usd, duration_ms, token_usage, start_time, and end_time at the result root instead of nested under trace. External jq scripts or parsers that read trace.cost_usd or the other moved fields must be updated to read them from the root (for example, .cost_usd instead of .trace.cost_usd).
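Parsers that also need to handle JSONL written before this change could normalize old records first. This is a sketch, assuming each line is a plain JSON object with the metrics nested under trace. The names promoteTraceMetrics and migrateJsonl are hypothetical and are not part of agentv.

```typescript
// Hypothetical migration helper: moves the five promoted keys from `trace`
// to the record root, leaving any other trace fields where they are.

type JsonRecord = Record<string, unknown>;

// Keys this PR moves to the JSONL result root.
const PROMOTED_KEYS = ["token_usage", "cost_usd", "duration_ms", "start_time", "end_time"];

function promoteTraceMetrics(record: JsonRecord): JsonRecord {
  const trace = record["trace"];
  if (trace === null || typeof trace !== "object" || Array.isArray(trace)) {
    return { ...record }; // no nested trace object to migrate
  }
  const remainingTrace: JsonRecord = { ...(trace as JsonRecord) };
  const promoted: JsonRecord = {};
  for (const key of PROMOTED_KEYS) {
    if (key in remainingTrace) {
      promoted[key] = remainingTrace[key];
      delete remainingTrace[key];
    }
  }
  // Existing root-level values win over nested ones; trace keeps only the rest.
  return { ...promoted, ...record, trace: remainingTrace };
}

// Convert a whole JSONL document, one record per non-empty line.
function migrateJsonl(text: string): string {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.stringify(promoteTraceMetrics(JSON.parse(line) as JsonRecord)))
    .join("\n");
}
```

Only the five metric keys are moved, because the PR keeps trace-specific data such as tool calls and errors under trace.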

Test plan

  • All 992 unit tests pass
  • Build, typecheck, lint all pass (pre-push hook)
  • Run a real eval with --trace and verify promoted fields appear at JSONL root
  • Verify agentv trace show and agentv trace stats display metrics correctly
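The JSONL check in the test plan could be partly automated. This sketch assumes the eval writes one JSON object per line and that a run with --trace emits at least one promoted key at the root. findLayoutProblems is a hypothetical name, and reading the output file is left to the caller.

```typescript
// Hypothetical checker for the JSONL layout described in this PR.
const PROMOTED_KEYS = ["token_usage", "cost_usd", "duration_ms", "start_time", "end_time"];

function findLayoutProblems(jsonl: string): string[] {
  const problems: string[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    const record = JSON.parse(line) as Record<string, unknown>;
    const trace = record["trace"];
    // Any promoted key still under trace means the old layout leaked through.
    if (trace !== null && typeof trace === "object") {
      for (const key of PROMOTED_KEYS) {
        if (key in trace) {
          problems.push(`line ${index + 1}: ${key} is still nested under trace`);
        }
      }
    }
    // Assumption: a --trace run emits at least one promoted key at the root.
    if (!PROMOTED_KEYS.some((key) => key in record)) {
      problems.push(`line ${index + 1}: no promoted metric at the root`);
    }
  });
  return problems;
}
```

An empty array means every line uses the new layout. The root-presence rule is an assumption and may be too strict for providers that report no cost or token data.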

🤖 Generated with Claude Code

christso and others added 15 commits February 27, 2026 20:29
…esult

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ionContext

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update all core test files to use the new flat metric fields
(tokenUsage, costUsd, durationMs) at context/result level instead
of nested inside TraceSummary. Fix computeTraceSummary usage to
destructure TraceComputeResult.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move token_usage, cost_usd, duration_ms from trace object to
top-level in test fixtures and update assertions accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update code judge scripts to read tokenUsage, costUsd, durationMs
from top-level input fields instead of trace object. Update jq
example in trace-analysis README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
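Below is a sketch of code judge scoring logic after this change. It assumes the judge input has already been parsed into an object with the camelCase field names used in the commit message above. The JudgeInput type, the tokenUsage shape, judgeBudget, and its budget parameters are all hypothetical.

```typescript
// Hypothetical code-judge logic: metrics come from the top level of the
// judge input, not from the trace object.

interface JudgeInput {
  costUsd?: number;
  durationMs?: number;
  tokenUsage?: { input: number; output: number }; // assumed shape
  trace?: unknown; // still present for trace data, not read here
}

interface JudgeVerdict {
  score: number;
  reasons: string[];
}

function judgeBudget(input: JudgeInput, maxCostUsd: number, maxDurationMs: number): JudgeVerdict {
  const reasons: string[] = [];
  if (input.costUsd === undefined) {
    reasons.push("costUsd not reported");
  } else if (input.costUsd > maxCostUsd) {
    reasons.push(`cost ${input.costUsd} exceeds ${maxCostUsd}`);
  }
  if (input.durationMs === undefined) {
    reasons.push("durationMs not reported");
  } else if (input.durationMs > maxDurationMs) {
    reasons.push(`duration ${input.durationMs}ms exceeds ${maxDurationMs}ms`);
  }
  // Pass only when both metrics are present and within budget.
  return { score: reasons.length === 0 ? 1 : 0, reasons };
}
```

Treating a missing metric as a failure is a design choice in this sketch. A real judge might prefer to skip the check instead.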
Move token_usage, cost_usd, duration_ms, start_time, end_time from
trace structure to top-level input fields in code judge docs.
Update jq examples in trace CLI docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…race!)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace non-null assertions with proper narrowing in execution-metrics evaluator
- Fix biome formatting across multiple files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
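The narrowing change can be illustrated with a small metric computation. The context type and the outputTokensPerSecond function are hypothetical and not from agentv. The example only demonstrates the pattern: check that the promoted metrics are defined before using them, rather than asserting with !.

```typescript
// Hypothetical context carrying promoted, optional metrics.
interface ExecutionMetricsContext {
  tokenUsage?: { input: number; output: number }; // assumed shape
  durationMs?: number;
}

// With a non-null assertion, context.tokenUsage!.output would throw a
// TypeError at runtime whenever tokenUsage is absent.
function outputTokensPerSecond(context: ExecutionMetricsContext): number | undefined {
  const { tokenUsage, durationMs } = context;
  if (tokenUsage === undefined || durationMs === undefined || durationMs <= 0) {
    return undefined; // metric unavailable; the caller decides how to score
  }
  // Both values are narrowed to defined here, so no `!` is needed.
  return tokenUsage.output / (durationMs / 1000);
}
```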
- Fix jq paths in trace-analyst skill (trace.cost_usd -> cost_usd, etc.)
- Update evaluator JSDoc to reflect promoted metrics access pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: f5f9663
Status: ✅  Deploy successful!
Preview URL: https://1eec0017.agentv.pages.dev
Branch Preview URL: https://refactor-promote-trace-metri.agentv.pages.dev
