Skip to content

[Feature]: Expose session timing metrics in Stage 4 evaluation reports #357

@sjnims

Description

@sjnims

Parent Issue

Sub-issue of #344 (Leverage Additional SDK v0.2.25 Capabilities)
Follow-up to #349 (SessionStart/SessionEnd hooks)

Feature Type

Output / reporting formats

Problem or Need

#349 adds session_timing data to ExecutionResult via SessionStart/SessionEnd hooks, capturing session lifecycle events (startup source, end reason, timestamps). However, Stage 4 evaluation doesn't currently use this data — it's captured but not surfaced in reports or available for evaluation logic.

Exposing session timing would enable:

  • Reporting session startup latency (time from SessionStart to first tool use)
  • Identifying slow plugin/MCP initialization in batch sessions
  • Distinguishing infrastructure overhead from actual LLM execution time
  • Tracking /clear-triggered session restarts within batches

Proposed Solution

  1. Add session timing to evaluation report output — Include session_timing summary in the per-scenario evaluation results (e.g., startup duration, total session duration, number of restarts)

  2. Surface in CLI output — Show timing breakdown in the results table or summary

  3. Optionally use in evaluation logic — Stage 4 could flag scenarios with unusually long startup times or unexpected session restarts as diagnostic notes

Pipeline Stage Affected

Stage 4 - Evaluation

Component Type

Not component-specific

Alternatives Considered

  • Leave session_timing as raw data only, accessible via JSON output but not surfaced in reports. Simpler but less useful for diagnostics.

How important is this feature to you?

Low - Just a suggestion

Additional Context

The SessionTimingCapture type from #349 provides:

  • starts: SessionStartCapture[] — source ('startup'|'resume'|'clear'|'compact'), timestamp, agent_type?, model?
  • end?: SessionEndCapture — reason (ExitReason), timestamp

Risk Assessment

Low risk — Purely additive reporting. No changes to execution or evaluation behavior.

Files Affected

  • src/stages/4-evaluation/ — Report generation
  • CLI output formatters

🤖 Created with Claude Code

Metadata

Metadata

Assignees

Labels

component:cliCLI entry point and pipeline orchestration (src/index.ts)effort:medium1-4 hoursenhancementNew feature or requestperformancePerformance and efficiency improvementspriority:lowNice to havescope:hooksHook evaluation (Phase 2)stage:evaluationStage 4: Programmatic detection and LLM judgestatus:analyzedIssue has been analyzed by Claude

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions