Description
Parent Issue
Sub-issue of #344 (Leverage Additional SDK v0.2.25 Capabilities)
Follow-up to #349 (SessionStart/SessionEnd hooks)
Feature Type
Output / reporting formats
Problem or Need
#349 adds session_timing data to ExecutionResult via SessionStart/SessionEnd hooks, capturing session lifecycle events (startup source, end reason, timestamps). However, Stage 4 evaluation doesn't currently use this data — it's captured but not surfaced in reports or available for evaluation logic.
Exposing session timing would enable:
- Reporting session startup latency (time from SessionStart to first tool use)
- Identifying slow plugin/MCP initialization in batch sessions
- Distinguishing infrastructure overhead from actual LLM execution time
- Tracking /clear-triggered session restarts within batches
Proposed Solution
- Add session timing to evaluation report output: Include a session_timing summary in the per-scenario evaluation results (e.g., startup duration, total session duration, number of restarts)
- Surface in CLI output: Show the timing breakdown in the results table or summary
- Optionally use in evaluation logic: Stage 4 could flag scenarios with unusually long startup times or unexpected session restarts as diagnostic notes
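The optional diagnostic step could look roughly like the sketch below. Everything named here is hypothetical: the `SessionTimingSummary` shape, the `SLOW_STARTUP_MS` threshold, and the `diagnosticNotes` helper are illustrations, not part of #349 or the existing Stage 4 code.

```typescript
// Hypothetical per-scenario summary; field names are assumptions for illustration.
interface SessionTimingSummary {
  startupMs?: number; // SessionStart to first tool use, when both are known
  totalMs?: number;   // first SessionStart to SessionEnd, when the session ended
  restarts: number;   // session starts observed after the first one
}

// Hypothetical threshold; a real value would need tuning against actual runs.
const SLOW_STARTUP_MS = 5_000;

// Returns human-readable notes only; it never changes a pass/fail outcome.
function diagnosticNotes(
  summary: SessionTimingSummary,
  expectedRestarts = 0,
): string[] {
  const notes: string[] = [];
  if (summary.startupMs !== undefined && summary.startupMs > SLOW_STARTUP_MS) {
    notes.push(
      `slow startup: ${summary.startupMs} ms (threshold ${SLOW_STARTUP_MS} ms)`,
    );
  }
  if (summary.restarts > expectedRestarts) {
    notes.push(
      `unexpected session restarts: ${summary.restarts} (expected ${expectedRestarts})`,
    );
  }
  return notes;
}
```

Keeping the output as notes rather than failures matches the additive, reporting-only intent of this proposal.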
Pipeline Stage Affected
Stage 4 - Evaluation
Component Type
Not component-specific
Alternatives Considered
- Leave session_timing as raw data only, accessible via JSON output but not surfaced in reports. Simpler but less useful for diagnostics.
How important is this feature to you?
Low - Just a suggestion
Additional Context
The SessionTimingCapture type from #349 provides:
- starts: SessionStartCapture[], each with source ('startup'|'resume'|'clear'|'compact'), timestamp, agent_type?, model?
- end?: SessionEndCapture, with reason (ExitReason), timestamp
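A minimal sketch of how a summary might be derived from this type follows. The field types are assumptions: timestamps are treated as epoch milliseconds, `agent_type` and `model` as strings, and `ExitReason` is stubbed as a string alias since its real definition lives in the SDK. The `summarizeSessionTiming` helper and its `firstToolUseAt` parameter are hypothetical.

```typescript
// Assumption: stand-in for the SDK's ExitReason type.
type ExitReason = string;
type SessionStartSource = "startup" | "resume" | "clear" | "compact";

interface SessionStartCapture {
  source: SessionStartSource;
  timestamp: number; // assumption: epoch milliseconds
  agent_type?: string;
  model?: string;
}

interface SessionEndCapture {
  reason: ExitReason;
  timestamp: number; // assumption: epoch milliseconds
}

interface SessionTimingCapture {
  starts: SessionStartCapture[];
  end?: SessionEndCapture;
}

// Hypothetical helper: firstToolUseAt would have to come from tool-use capture
// elsewhere in ExecutionResult; it is not part of SessionTimingCapture.
function summarizeSessionTiming(
  timing: SessionTimingCapture,
  firstToolUseAt?: number,
) {
  const first = timing.starts[0];
  return {
    // Time from the first SessionStart to the first tool use.
    startupMs:
      first !== undefined && firstToolUseAt !== undefined
        ? firstToolUseAt - first.timestamp
        : undefined,
    // Time from the first SessionStart to SessionEnd, if the session ended.
    totalMs:
      first !== undefined && timing.end !== undefined
        ? timing.end.timestamp - first.timestamp
        : undefined,
    // Number of starts whose source is 'clear'.
    clearRestarts: timing.starts.filter((s) => s.source === "clear").length,
  };
}
```

Fields that cannot be computed are left `undefined` rather than defaulted to zero, so a report can distinguish "not measured" from "instant".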
Risk Assessment
Low risk — Purely additive reporting. No changes to execution or evaluation behavior.
Files Affected
- src/stages/4-evaluation/ (report generation)
- CLI output formatters
🤖 Created with Claude Code