Motivation
Currently each audit run overwrites audit.json, losing historical data. Teams need temporal tracking to:
- See violation trends over time (improving vs. regressing)
- Identify which files/rules are getting better or worse
- Track progress toward migration goals
- Correlate audit runs with commits, threads, and milestones
- Measure migration velocity and estimate completion
Proposed Solution
Introduce immutable audit snapshots with metadata, summaries, and diffs to enable progress tracking.
Directory Structure
.amp/effect-migrate/
├── audits/
│   ├── manifest.json                 # Append-only index of all runs
│   ├── metrics.jsonl                 # One line per run for fast trending
│   └── runs/
│       └── <run_id>/
│           ├── meta.json             # Run metadata (commit, CI, actor, context)
│           ├── audit.json.gz         # Raw audit (gzipped)
│           ├── summary.json          # Aggregated metrics
│           ├── diff.json             # Delta vs parent run
│           └── fingerprints.json.gz  # Optional cache
├── audit.json                        # Latest (symlink or copy for compatibility)
├── index.json
└── threads.json
Run ID Format
<ISO8601-timestamp>--<commit-sha7>--<short-uuid>
Example: 2025-01-16T12:34:56Z--abc1234--p9k3f2
Run Metadata (meta.json)
{
  "schema_version": "1.0",
  "run_id": "2025-01-16T12:34:56Z--abc1234--p9k3f2",
  "timestamp": "2025-01-16T12:34:56Z",
  "parent_run_id": "2025-01-15T18:01:02Z--89de321--m1n2o3",
  "tool_version": "0.1.0",
  "commit": {
    "sha": "abc1234...",
    "branch": "feature/migrate",
    "repo": "org/repo"
  },
  "ci": {
    "provider": "github",
    "run_url": "https://...",
    "job_id": "123456"
  },
  "actor": {
    "user": "alice",
    "email": "alice@example.com"
  },
  "context": {
    "pr": 987,
    "milestone": "M3",
    "thread_ids": ["t-abc123..."],
    "tags": ["pilot", "phase-1"]
  }
}
Violation Fingerprinting
Compute stable fingerprints to track the "same violation" across runs:
fingerprint = sha1(
  rule_id + "|" +
  rel_path + "|" +
  start_line + ":" + start_col + "-" + end_line + ":" + end_col + "|" +
  normalize(message)
)
Normalization: Strip dynamic numbers/paths from messages for stability.
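The formula above could be sketched as follows. The proposal does not spell out the normalization rules, so the two regular expressions here (path-like tokens, then digit runs) are assumptions chosen only to illustrate the idea.

```python
import hashlib
import re


def normalize(message: str) -> str:
    """Replace path-like tokens and digit runs with placeholders (assumed rules)."""
    message = re.sub(r"\S*/\S+", "<path>", message)  # anything containing a slash
    message = re.sub(r"\d+", "<n>", message)         # any run of digits
    return " ".join(message.split())                 # collapse whitespace


def fingerprint(rule_id, rel_path, start_line, start_col, end_line, end_col, message):
    """SHA-1 over rule, relative path, source range, and normalized message."""
    key = (
        f"{rule_id}|{rel_path}|"
        f"{start_line}:{start_col}-{end_line}:{end_col}|"
        f"{normalize(message)}"
    )
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

One consequence of including the source range in the key: a violation that only moves to a different line gets a new fingerprint, so it would show up as one "fixed" and one "new" entry in a diff.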
Summary (summary.json)
Fast-access aggregated metrics without re-scanning the raw audit:
{
  "schema_version": "1.0",
  "run_id": "...",
  "totals": {
    "violations": 3210,
    "files_with_violations": 456
  },
  "by_rule": {
    "no-async-await": 230,
    "no-barrel-imports": 31
  },
  "by_severity": {
    "error": 950,
    "warning": 2260
  },
  "by_dir": {
    "src/": 2100,
    "packages/core/": 600,
    "tests/": 510
  },
  "top_files": [
    { "path": "src/api/handler.ts", "count": 120 }
  ],
  "time_to_run_ms": 52234
}
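As a rough illustration, a summary of this shape could be derived from a flat list of violations. The input record keys (rule_id, path, severity) and the function name summarize are assumptions, and the directory bucketing below uses only the first path segment, which is simpler than the "packages/core/" grouping in the example above.

```python
from collections import Counter


def summarize(run_id, violations, elapsed_ms, top_n=10):
    """violations: list of dicts with assumed keys rule_id, path, severity."""
    by_file = Counter(v["path"] for v in violations)
    by_dir = Counter()
    for v in violations:
        # Bucket by top-level directory; files at the repo root go under "./".
        head, sep, _ = v["path"].partition("/")
        by_dir[head + "/" if sep else "./"] += 1
    return {
        "schema_version": "1.0",
        "run_id": run_id,
        "totals": {
            "violations": len(violations),
            "files_with_violations": len(by_file),
        },
        "by_rule": dict(Counter(v["rule_id"] for v in violations)),
        "by_severity": dict(Counter(v["severity"] for v in violations)),
        "by_dir": dict(by_dir),
        "top_files": [{"path": p, "count": c} for p, c in by_file.most_common(top_n)],
        "time_to_run_ms": elapsed_ms,
    }
```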
Diff (diff.json)
Delta relative to parent run:
{
  "schema_version": "1.0",
  "run_id": "...",
  "parent_run_id": "...",
  "delta": {
    "new": [
      {
        "fingerprint": "abc123...",
        "rule_id": "no-async-await",
        "path": "src/new-file.ts",
        "location": { "start": { "line": 10, "column": 5 } }
      }
    ],
    "fixed": [ /* ... */ ],
    "unchanged_count": 2980
  },
  "by_rule_delta": {
    "no-async-await": { "new": 5, "fixed": 10, "net": -5 }
  },
  "by_file_delta": {
    "src/api/handler.ts": { "new": 2, "fixed": 0, "net": 2 }
  }
}
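The delta can be computed as a set difference over fingerprints. The sketch below assumes each run has already been reduced to a mapping from fingerprint to a violation record with rule_id and path keys; that input shape and the name compute_diff are illustrative, not taken from the proposal.

```python
from collections import defaultdict


def compute_diff(run_id, parent_run_id, current, parent):
    """current/parent: dicts mapping fingerprint -> violation record (assumed shape)."""
    new = [dict(v, fingerprint=fp) for fp, v in current.items() if fp not in parent]
    fixed = [dict(v, fingerprint=fp) for fp, v in parent.items() if fp not in current]

    def tally(key):
        # Count new/fixed violations grouped by the given record key.
        out = defaultdict(lambda: {"new": 0, "fixed": 0, "net": 0})
        for v in new:
            out[v[key]]["new"] += 1
        for v in fixed:
            out[v[key]]["fixed"] += 1
        for d in out.values():
            d["net"] = d["new"] - d["fixed"]
        return dict(out)

    return {
        "schema_version": "1.0",
        "run_id": run_id,
        "parent_run_id": parent_run_id,
        "delta": {
            "new": new,
            "fixed": fixed,
            "unchanged_count": len(current) - len(new),
        },
        "by_rule_delta": tally("rule_id"),
        "by_file_delta": tally("path"),
    }
```

Keying by fingerprint means two violations that hash to the same value within one run collapse into a single entry; a real implementation might need to count duplicates separately.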
Manifest & Metrics for Fast Queries
manifest.json (append-only):
[
  {
    "run_id": "2025-01-16T12:34:56Z--abc1234--p9k3f2",
    "timestamp": "2025-01-16T12:34:56Z",
    "parent_run_id": "...",
    "commit_sha": "abc1234",
    "pr": 987,
    "milestone": "M3",
    "tags": ["pilot"]
  }
]
metrics.jsonl (one line per run):
{"run_id":"...","ts":"...","total":3210,"error":950,"warn":2260,"new":42,"fixed":81,"net":-39}
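A small sketch of how one metrics line might be produced from a run's summary and diff and then appended. The helper names metrics_line and append_metrics are hypothetical; the field names follow the sample line above.

```python
import json


def metrics_line(summary, diff, ts):
    """Flatten one run into a single compact JSON line for metrics.jsonl."""
    new = len(diff["delta"]["new"])
    fixed = len(diff["delta"]["fixed"])
    row = {
        "run_id": summary["run_id"],
        "ts": ts,
        "total": summary["totals"]["violations"],
        "error": summary["by_severity"].get("error", 0),
        "warn": summary["by_severity"].get("warning", 0),
        "new": new,
        "fixed": fixed,
        "net": new - fixed,
    }
    return json.dumps(row, separators=(",", ":"))


def append_metrics(path, line):
    """Append one line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

The two index files behave differently on write: metrics.jsonl can be appended to directly, while manifest.json is a single JSON array, so adding an entry means rewriting the whole file even if earlier entries are never modified.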
Query Patterns
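- Trending overall violations:
  tail -100 metrics.jsonl | jq '{ts, total, error, warn}'
- Trend by rule (assumes each metrics line also carries a per-rule rules map, which the sample line above does not include):
  jq -s 'map({ts, count: .rules["no-async-await"]})' metrics.jsonl
- Top regressing files:
  # Sum net deltas across last 10 diff.json files
- Progress to goal (first run with fewer than 100 violations; -c keeps each match on one line so head -1 returns a whole record):
  jq -c 'select(.total < 100)' metrics.jsonl | head -1
- Correlate with threads/commits (manifest.json is already an array, so no -s; thread_ids live in each run's meta.json and totals in metrics.jsonl, joined on run_id):
  jq 'map({run_id, timestamp, commit_sha, pr, milestone})' manifest.json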
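The "top regressing files" query is only described in a comment above. One possible way to compute it is to walk the most recent run directories and sum each file's net delta. The function name hot_files and its parameters are hypothetical, and the sketch relies on run directory names sorting chronologically.

```python
import json
from collections import Counter
from pathlib import Path


def hot_files(runs_dir, window=10, top_n=10):
    """Sum by_file_delta net values over the last `window` runs."""
    # Run IDs start with a UTC timestamp, so sorting by name is chronological.
    run_dirs = sorted(p for p in Path(runs_dir).iterdir() if p.is_dir())[-window:]
    net = Counter()
    for run in run_dirs:
        diff_path = run / "diff.json"
        if not diff_path.exists():  # e.g. the first run has no parent to diff against
            continue
        diff = json.loads(diff_path.read_text(encoding="utf-8"))
        for path, delta in diff.get("by_file_delta", {}).items():
            net[path] += delta["net"]
    # Keep only files whose violation count grew over the window.
    return [(path, n) for path, n in net.most_common(top_n) if n > 0]
```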
Write Flow
- Run audit and generate audit.json
- Create run_id and runs/<run_id>/ directory
- Write meta.json with commit/CI/actor/context
- Gzip and write audit.json.gz
- Compute fingerprints and summary.json
- Load parent_run_id from manifest (latest entry)
- Compute diff.json vs parent
- Append to manifest.json and metrics.jsonl
- Update audit.json symlink/copy to point to latest
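The persistence steps of this flow could look roughly like the sketch below. It covers only the directory, meta.json, audit.json.gz, manifest, and latest-copy steps; summary, diff, and metrics generation are omitted. The function name write_run, the reduced manifest entry, and the temp-file rename are assumptions, and the parent is looked up before meta.json is written so that parent_run_id can be recorded there.

```python
import gzip
import json
from pathlib import Path


def write_run(base, run_id, meta, audit):
    """Persist one snapshot under <base>/audits/runs/<run_id>/ and refresh <base>/audit.json."""
    base = Path(base)
    audits = base / "audits"
    run_dir = audits / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=False)  # snapshots are immutable: never reuse a directory

    manifest_path = audits / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8")) if manifest_path.exists() else []
    parent = manifest[-1]["run_id"] if manifest else None  # latest entry is the parent

    meta = {**meta, "run_id": run_id, "parent_run_id": parent}
    (run_dir / "meta.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    with gzip.open(run_dir / "audit.json.gz", "wt", encoding="utf-8", compresslevel=6) as f:
        json.dump(audit, f)

    manifest.append({"run_id": run_id, "timestamp": meta.get("timestamp"), "parent_run_id": parent})
    tmp = manifest_path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    tmp.replace(manifest_path)  # rename over the old file so readers never see a partial manifest

    # Plain copy rather than a symlink, for compatibility with existing consumers.
    (base / "audit.json").write_text(json.dumps(audit), encoding="utf-8")
    return run_dir
```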
Performance & Size Constraints
- Always gzip raw audit.json (level 6-9)
- Summaries/diffs/metrics stay small
- Retention policy: Keep last N raw audits (configurable), but keep all metrics/summaries
- Optional sharding: audits/2025-01/runs/...
- Optional: Cache fingerprints per run to speed up diffing
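The retention policy above might be applied with a small pruning pass such as the following sketch. The function name prune_raw_audits and the keep_last parameter are hypothetical. Only the raw audit.json.gz is deleted; meta.json, summary.json, and diff.json stay in place.

```python
from pathlib import Path


def prune_raw_audits(runs_dir, keep_last):
    """Delete audit.json.gz from all but the newest `keep_last` runs; return pruned run names."""
    # Run IDs start with a UTC timestamp, so sorting by name is chronological.
    run_dirs = sorted(p for p in Path(runs_dir).iterdir() if p.is_dir())
    stale = run_dirs[:-keep_last] if keep_last > 0 else run_dirs
    removed = []
    for run in stale:
        raw = run / "audit.json.gz"
        if raw.exists():
            raw.unlink()
            removed.append(run.name)
    return removed
```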
CLI Helpers (Future)
effect-migrate history ls [--limit 10]
effect-migrate history trend [--rule <id>] [--window 30]
effect-migrate history diff <run_a> <run_b>
effect-migrate history hotfiles [--window 10]
Implementation Phases
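Phase 1: Basic History (Small)
- meta.json
- manifest.json
- audit.json as latest symlink/copy
Phase 2: Summaries & Metrics (Medium)
- summary.json per run
- metrics.jsonl for fast trending
Phase 3: Diffs & Analysis (Medium)
- diff.json vs parent
Phase 4: Query Tools (Large)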
Advanced Path (Future)
If file-based queries become too slow or complex:
- SQLite/DuckDB: Ingest runs into local DB
- Tables: runs, violations, summaries, deltas
- Indexes: By rule, file, fingerprint
- Complex queries: Multi-dimensional slicing, joins
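To make the database option more concrete, here is a hedged sketch using SQLite through Python's standard sqlite3 module. It covers only two of the four tables listed above (runs and violations); the column names and the trend_by_rule helper are assumptions for illustration.

```python
import sqlite3

# Assumed minimal schema: one row per run, one row per violation per run.
SCHEMA = """
CREATE TABLE runs (
    run_id     TEXT PRIMARY KEY,
    ts         TEXT NOT NULL,
    commit_sha TEXT
);
CREATE TABLE violations (
    run_id      TEXT NOT NULL REFERENCES runs(run_id),
    fingerprint TEXT NOT NULL,
    rule_id     TEXT NOT NULL,
    path        TEXT NOT NULL
);
CREATE INDEX idx_violations_rule ON violations(rule_id);
CREATE INDEX idx_violations_path ON violations(path);
CREATE INDEX idx_violations_fp   ON violations(fingerprint);
"""


def trend_by_rule(conn, rule_id):
    """Return (timestamp, count) per run for one rule, including runs with zero hits."""
    return conn.execute(
        "SELECT r.ts, COUNT(v.fingerprint) "
        "FROM runs r "
        "LEFT JOIN violations v ON v.run_id = r.run_id AND v.rule_id = ? "
        "GROUP BY r.run_id ORDER BY r.ts",
        (rule_id,),
    ).fetchall()
```

The LEFT JOIN keeps runs where the rule no longer fires, so a trend line can reach zero instead of simply stopping.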
Estimated Effort
- Phase 1: Small (≤1 hour) - Basic immutable snapshots
- Phase 2: Medium (1-3 hours) - Summaries and metrics
- Phase 3: Medium (1-3 hours) - Diffs and deltas
- Phase 4: Large (1-2 days) - Full tooling and analytics
Acceptance Criteria
Related