
Add file-based usage metrics for local runs (#110) #219

Merged
neoneye merged 13 commits into main from feature/110-usage-metrics
Mar 10, 2026

Conversation


neoneye commented Mar 9, 2026

Summary

  • Add file-based usage metrics (usage_metrics.jsonl) for local runs that don't have database access
  • Records per-LLM-call metrics via llama_index instrumentation: model (with provider prefix), tokens (input/output/thinking), duration, cost, and success/failure
  • Failures are recorded separately by LLMExecutor since instrumentation end events aren't emitted on failure
  • Pipeline sets the metrics path at start and clears it after completion
  • usage_metrics.jsonl is excluded from pipeline progress calculation
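The summary above can be sketched as a minimal version of the `usage_metrics` module. The field names and their order match the PR description; the internals (module-level path state, append-mode writes, warn-and-continue error handling) are assumptions, not the actual implementation.

```python
import json
import logging
from datetime import datetime
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

# Module-level state, set by the pipeline at run start and cleared at teardown.
_metrics_path: Optional[Path] = None

def set_usage_metrics_path(path: Optional[Path]) -> None:
    """Set the JSONL output path; pass None to tear down after the run."""
    global _metrics_path
    _metrics_path = path

def record_usage_metric(success: bool, model: str, duration_seconds: float,
                        input_tokens: int, output_tokens: int,
                        cost_usd: float) -> None:
    """Append one metric row. Never raises: a failed metrics write
    must not block the pipeline, so problems are only warned about."""
    if _metrics_path is None:
        logger.warning("usage metrics path is unset; dropping metric")
        return
    row = {
        "timestamp": datetime.now().isoformat(),
        "success": success,  # placed before model for easier error skimming
        "model": model,
        "duration_seconds": duration_seconds,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
    }
    try:
        with open(_metrics_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(row) + "\n")
    except OSError as exc:
        logger.warning("failed to write usage metric: %s", exc)
```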

Example output

{"timestamp": "2026-03-10T13:36:48.250446", "success": true, "model": "Google AI Studio:google/gemini-2.0-flash-001", "duration_seconds": 4.879, "input_tokens": 5316, "output_tokens": 643, "cost_usd": 0.0007888}
{"timestamp": "2026-03-10T13:36:53.554864", "success": true, "model": "Google:google/gemini-2.0-flash-001", "duration_seconds": 5.237, "input_tokens": 8877, "output_tokens": 562, "cost_usd": 0.0011125}

Code quality improvements

  • Move usage_metrics import to top-level so bad imports fail hard on startup
  • Remove redundant try/except — record_usage_metric handles errors internally
  • Warn (not debug-log) when metrics path is unset or write fails
  • Document set_usage_metrics_path(None) teardown in module docstring
  • Place success field before model in JSONL output for easier error skimming

Bug fixes

  • Resume for legacy plans: Frontend and MCP resume checks incorrectly rejected plans created before pipeline_version was stamped into parameters (comparing None != PIPELINE_VERSION). The worker-side check against the actual snapshot metadata file is the real safety gate.
  • Heartbeat crash: A corrupted psycopg2 connection during WorkerItem.upsert_heartbeat() was propagating up and killing Luigi tasks. Wrapped in try/except with a session rollback, since the heartbeat is just a liveness signal.
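A hedged sketch of the heartbeat guard: `upsert_heartbeat()` is the method named in the PR, but the wrapper function, its signature, and the session handling here are illustrative assumptions.

```python
import logging

logger = logging.getLogger(__name__)

def safe_heartbeat(worker_item, session) -> None:
    """Best-effort liveness signal: swallow database errors so a
    corrupted connection cannot kill the surrounding Luigi task."""
    try:
        worker_item.upsert_heartbeat()
    except Exception as exc:
        logger.warning("heartbeat failed, continuing: %s", exc)
        try:
            session.rollback()  # reset the connection for later queries
        except Exception:
            pass  # even a failed rollback must not crash the pipeline
```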

Files changed

  • worker_plan/worker_plan_internal/llm_util/usage_metrics.py — core module for file-based metric recording
  • worker_plan/worker_plan_internal/llm_util/track_activity.py — records successful calls via _record_file_usage_metric()
  • worker_plan/worker_plan_internal/llm_util/llm_executor.py — records failed calls only
  • worker_plan/worker_plan_internal/plan/run_plan_pipeline.py — sets/clears metrics path around pipeline execution
  • worker_plan/worker_plan_api/filenames.py — add USAGE_METRICS_JSONL constant
  • frontend_multi_user/src/app.py — fix resume version check for legacy plans
  • mcp_cloud/db_queries.py — fix resume version check for legacy plans
  • worker_plan_database/app.py — protect pipeline from heartbeat failures
  • docs/proposals/110-usage-metrics-local-runs.md — mark as implemented
  • docs/proposals/111-promising-directions.md — mark Mcp example plans #110 as complete

Test plan

  • Run a local pipeline and verify usage_metrics.jsonl is created with per-call token counts and cost
  • Verify the model field includes the provider prefix (e.g. Google AI Studio:google/gemini-2.0-flash-001)
  • Verify metrics recording does not block pipeline on write failure
  • Stop a plan, then resume it — verify usage_metrics.jsonl is created on resume
  • Resume a legacy plan (no pipeline_version in parameters) — verify it is not rejected
  • Verify a heartbeat database error does not crash the pipeline

🤖 Generated with Claude Code

Write per-LLM-call metrics (model, tokens, duration, success/failure)
to usage_metrics.jsonl in the run output directory. Works without a
database — designed for local/offline runs where the DB-backed
token_metrics_store is unavailable.

- Add USAGE_METRICS_JSONL to ExtraFilenameEnum
- New usage_metrics.py module with set/get path and record function
- Extend LLMExecutor._record_attempt_token_metrics() to also write
  file-based metrics alongside existing DB recording
- Wire usage metrics path in ExecutePipeline.run()
- Add usage_metrics.jsonl to progress ignore list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
neoneye deleted the branch main March 10, 2026 00:48
neoneye closed this Mar 10, 2026
neoneye reopened this Mar 10, 2026
neoneye changed the base branch from feature/plan-resume-tool to main March 10, 2026 01:43
neoneye and others added 12 commits March 10, 2026 11:05
…utor

Fail hard on startup if imports are bad instead of silently swallowing
errors inside a try block at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
record_usage_metric and extract_token_count handle errors internally,
so the outer try block was unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Plans created before pipeline_version was stamped into parameters were
incorrectly rejected by the frontend and MCP resume checks. The
worker-side check against the actual snapshot metadata file is the real
safety gate, so allow None through at the API layer.
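The relaxed API-layer check described above might look like this. The `PIPELINE_VERSION` value and the shape of `parameters` are placeholders, not the project's actual definitions.

```python
PIPELINE_VERSION = "2026.03"  # placeholder value for illustration

def can_resume(parameters: dict) -> bool:
    """API-layer resume check. Legacy plans have no stamped version;
    let them through and defer to the worker-side snapshot check."""
    stamped = parameters.get("pipeline_version")
    if stamped is None:
        return True  # legacy plan created before stamping existed
    return stamped == PIPELINE_VERSION
```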

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Point to worker_plan_database/app.py and the actual snapshot metadata
file (001-3-planexe_metadata.json) so readers can find the real check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The detailed provider/model info (e.g. "Google AI Studio:google/gemini-2.0-flash-001")
was already extracted by token_counter but not recorded. Now included in
usage_metrics.jsonl when available. Also move success field before model
for easier error skimming.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Successful LLM calls are now recorded by TrackActivity which has access
to the real ChatResponse with full token counts, cost, and provider:model
info. LLMExecutor only records failures since instrumentation end events
are not emitted when the call fails. Skip events without token usage or
cost to avoid "unknown" rows. Remove redundant upstream_provider and
upstream_model fields since model already contains "provider:model".
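The "skip events without token usage or cost" rule could be a small predicate like this sketch; the parameter names are assumptions and not the llama_index instrumentation API.

```python
from typing import Optional

def should_record(input_tokens: Optional[int],
                  output_tokens: Optional[int],
                  cost_usd: Optional[float]) -> bool:
    """Drop events that would produce all-'unknown' rows: record only
    when there is at least some token usage or a nonzero cost."""
    has_tokens = bool(input_tokens) or bool(output_tokens)
    has_cost = cost_usd is not None and cost_usd > 0
    return has_tokens or has_cost
```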

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mark proposal 110 as implemented with PR #219 details: JSONL format,
instrumentation-based recording, resolved open questions. Mark 110 as
complete in the promising directions roadmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A corrupted psycopg2 connection during WorkerItem.upsert_heartbeat()
was propagating up and killing Luigi tasks. The heartbeat is just a
liveness signal — wrap it in try/except with a session rollback so
the pipeline can continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
neoneye merged commit 6970347 into main Mar 10, 2026
3 checks passed
neoneye deleted the feature/110-usage-metrics branch March 10, 2026 15:05