feat(v0.4-0.7): Multi-assistant provenance + semantic blame + evidence packs #4
Conversation
## v0.7 - Multi-Assistant Support (Codex CLI)
- Add codex_capture.py for parsing Codex JSONL session logs
- Add diachron-codex Rust wrapper binary for standalone usage
- Integrate capture into /handoffcodex and /handoffcodex-full skills
- Support both old (custom_tool_call) and new (exec_command) Codex formats
- 12 Python tests, 3 Rust tests passing

## v0.6 - Reliability & Developer UX
- Add log rotation with tracing-appender (daily rolling)
- Add `diachron maintenance` command (VACUUM, ANALYZE, prune)
- Add `diachron timeline --watch` for real-time events
- Create IPC-API.md for community integrations
- Fix OpenAI→Anthropic references in docs

## v0.5 - Intent Extraction
- Extract user intent from conversation history for blame
- Multi-factor relevance scoring (+3 file, +2 tool, +1 branch)
- 9 new intent extraction tests (51 total)

## v0.4 - Semantic Blame & Evidence Packs
- Add fingerprint-based blame (content_hash, context_hash)
- Add PR correlation (events→commits→PRs)
- Add evidence pack generation with hash chain verification
- Add GitHub Action template for automated PR comments
- Three-tier confidence matching (HIGH/MEDIUM/LOW/INFERRED)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
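The fingerprint-based blame mentioned for v0.4 pairs a `content_hash` with a `context_hash`. A minimal sketch of the idea in Python — hashing a line's normalized content separately from its surrounding lines, so blame survives when code merely moves; the normalization and window size here are illustrative assumptions, not the actual `fingerprint.rs` logic:

```python
import hashlib

def fingerprint(lines, idx, context=2):
    """Illustrative content/context fingerprint for one source line."""
    # content_hash: the line itself, whitespace-normalized
    content = lines[idx].strip()
    content_hash = hashlib.sha256(content.encode()).hexdigest()
    # context_hash: neighboring lines, so identical lines in different
    # spots still get distinct fingerprints
    lo, hi = max(0, idx - context), min(len(lines), idx + context + 1)
    ctx = "\n".join(l.strip() for l in lines[lo:hi] if l.strip())
    context_hash = hashlib.sha256(ctx.encode()).hexdigest()
    return content_hash, context_hash
```

Moving a block down a file changes neither hash of its interior lines, which is what makes blame stable across refactors.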
Summary of Changes

Hello @wolfiesch, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a suite of major features aimed at significantly enhancing the system's ability to track, attribute, and verify code changes made by AI assistants. It expands compatibility to multiple AI tools, introduces robust mechanisms for code provenance and tamper detection, and provides tools for generating detailed narratives of AI-assisted development. These changes collectively improve the transparency, auditability, and overall reliability of AI-driven coding workflows.
Pull request overview
This is a major feature release spanning versions 0.4-0.7, introducing multi-assistant provenance tracking, semantic blame, intent extraction, and reliability improvements. The PR adds comprehensive features for tracking AI-generated code changes across multiple assistants (Claude Code, OpenAI Codex CLI), with tamper-detection via hash chains and exportable evidence packs for PR narratives.
Changes:
- Multi-assistant support with Codex CLI integration via Rust wrapper and Python capture module
- Semantic blame using content fingerprinting (SHA256 hashes + semantic similarity)
- Intent extraction from conversation history to show "why" code was written
- Database maintenance commands (VACUUM, ANALYZE, pruning), log rotation, and real-time timeline watch mode
- Evidence pack generation for PR correlation with JSON/Markdown export
- Hash chain integrity verification for tamper detection
- Search result caching and parallel FTS/vector search
- GitHub Action for automated PR narrative posting
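The hash-chain tamper detection in the list above can be sketched in a few lines: each event's hash covers its payload plus the previous link, so editing any historical event invalidates every later hash. Field names and the genesis sentinel below are illustrative, not the actual `hash_chain.rs` schema:

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed sentinel for the first link

def link(prev_hash, event):
    """Hash one event together with the previous link's hash."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

def verify(events, hashes):
    """Recompute the chain and compare against the stored hashes."""
    prev = GENESIS_HASH
    for event, stored in zip(events, hashes):
        if link(prev, event) != stored:
            return False  # this event, or one before it, was altered
        prev = stored
    return True
```

Because each link depends on all prior links, a verifier only needs a trusted copy of the final hash (or a checkpoint) to detect tampering anywhere earlier in the log.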
Reviewed changes
Copilot reviewed 44 out of 44 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/tests/integration_tests.rs | Comprehensive integration tests for hash chain, PR correlation, evidence packs, and fingerprinting |
| rust/daemon/src/main.rs | Added search cache, log rotation with daily file appender, test helper methods |
| rust/daemon/src/handlers.rs | New handlers for maintenance, fingerprint blame, PR evidence correlation; parallelized hybrid search with caching |
| rust/daemon/src/db.rs | Hash chain integration in event insertion, maintenance operations, intent extraction queries, read-only connections |
| rust/daemon/src/cache.rs | LRU cache implementation for search results with database version tracking |
| rust/core/src/types.rs | New IPC message types for maintenance, blame, and evidence correlation |
| rust/core/src/schema.rs | Schema v4 migration adding hash chain and fingerprint columns |
| rust/core/src/pr_correlation.rs | PR-to-commit event correlation with confidence levels (HIGH/MEDIUM/LOW) |
| rust/core/src/hash_chain.rs | SHA256 hash chain implementation with GENESIS_HASH and checkpoint support |
| rust/core/src/fingerprint.rs | Content-based fingerprinting for stable blame across refactors |
| rust/core/src/evidence_pack.rs | Evidence pack generation and Markdown rendering for PR narratives |
| rust/codex-wrapper/src/main.rs | Standalone Rust wrapper for OpenAI Codex CLI capturing file operations |
| rust/cli/src/main.rs | New commands: verify, maintenance, blame, export-evidence, pr-comment; timeline watch mode |
| lib/codex_capture.py | Python module for parsing Codex JSONL sessions and sending events to daemon |
| lib/test_codex_capture.py | Comprehensive pytest tests for Codex capture functionality |
| github-action/ | TypeScript GitHub Action for posting evidence to PR comments |
| docs/IPC-API.md | Complete IPC API documentation for daemon integration |
lib/codex_capture.py (outdated)

```python
import argparse
import json
import os
```

Import of 'os' is not used.

Suggested change: delete the unused `import os` line.
lib/codex_capture.py (outdated)

```python
import re
import socket
import sys
from datetime import datetime
```

Import of 'datetime' is not used.

Suggested change: delete the unused `from datetime import datetime` line.
```python
            "timestamp": timestamp,
            "raw_input": cmd,
        })
    except json.JSONDecodeError:
```

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

```python
    except json.JSONDecodeError:
        # Ignore malformed arguments for this event and continue processing other log entries.
```
docs/IPC-API.md (outdated)

```yaml
run: |
  echo '{"type":"CorrelateEvidence","payload":{
    "pr_id": ${{ github.event.pull_request.number }},
    "commits": ${{ toJson(github.event.pull_request.commits) }},
    "branch": "${{ github.head_ref }}",
    "start_time": "2026-01-01T00:00:00Z",
    "end_time": "2026-01-11T23:59:59Z"
  }}' | nc -U ~/.diachron/diachron.sock > evidence.json
```

The GitHub Actions example constructs a shell command with `github.head_ref` interpolated directly inside a single-quoted echo string that is then piped to `nc`. Because branch names on GitHub can contain characters like single quotes and are attacker-controlled for forked PRs, a malicious branch name can break out of the quoted string and inject arbitrary shell commands executed in the Actions runner. You should build the JSON payload without unescaped string interpolation (e.g., using a safer JSON construction mechanism or proper shell escaping) so that `github.head_ref` and other dynamic values cannot alter the shell command structure.

Suggested change:

```yaml
env:
  PR_ID: ${{ github.event.pull_request.number }}
  COMMITS: ${{ toJson(github.event.pull_request.commits) }}
  BRANCH: ${{ github.head_ref }}
run: |
  PAYLOAD=$(python - << 'PY'
  import json, os, sys
  pr_id = int(os.environ["PR_ID"])
  commits = json.loads(os.environ["COMMITS"])
  branch = os.environ["BRANCH"]
  payload = {
      "type": "CorrelateEvidence",
      "payload": {
          "pr_id": pr_id,
          "commits": commits,
          "branch": branch,
          "start_time": "2026-01-01T00:00:00Z",
          "end_time": "2026-01-11T23:59:59Z",
      },
  }
  sys.stdout.write(json.dumps(payload))
  PY
  )
  printf '%s\n' "$PAYLOAD" | nc -U ~/.diachron/diachron.sock > evidence.json
```
Code Review
This is a massive and impressive pull request that introduces a suite of powerful features for provenance, including multi-assistant support, semantic blame, and evidence packs. The implementation is robust, well-documented, and thoroughly tested. The addition of hash-chain tamper evidence, content fingerprinting, and a detailed IPC API are particularly noteworthy. The new benchmark scripts and GitHub Action are also great additions. My review has identified a few areas for improvement, primarily concerning a potential race condition in the Codex wrapper, some minor issues in the benchmark scripts and documentation, and some code duplication in the markdown rendering logic. Overall, this is an excellent contribution that significantly enhances the capabilities of the project.
```rust
}

/// Find the most recent Codex session JSONL file
fn find_latest_session() -> Option<PathBuf> {
```
The find_latest_session function finds the most recently modified session file. If multiple codex commands are run concurrently or in quick succession, this could lead to a race condition where the wrapper captures events from the wrong session. To make this more robust, you could have codex output the session file path and pass it to the wrapper, or use a more specific identifier than just 'latest' to associate the execution with its corresponding log file.
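One way to sidestep that race, sketched in Python for illustration (the real wrapper is Rust, and the function name below is hypothetical): accept an explicitly provided session path and treat latest-mtime selection only as a fallback for the single-session case.

```python
from pathlib import Path

def resolve_session(sessions_dir, explicit_path=None):
    """Prefer an explicitly supplied session file; latest-mtime is a racy fallback."""
    if explicit_path is not None:
        # Caller (e.g. the codex invocation itself) told us exactly which
        # session log belongs to this execution -- no ambiguity.
        return Path(explicit_path)
    # Fallback: newest *.jsonl by modification time. Wrong session may win
    # if several codex runs write logs concurrently.
    candidates = sorted(
        Path(sessions_dir).glob("*.jsonl"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return candidates[0] if candidates else None
```

The same shape would apply in the Rust wrapper: thread the session path through as an optional CLI argument and only fall back to scanning the directory when it is absent.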
```bash
time_ms() {
    local start=$(python3 -c "import time; print(int(time.time() * 1000))")
    eval "$@" >/dev/null 2>&1
    local end=$(python3 -c "import time; print(int(time.time() * 1000))")
    echo $((end - start))
}
```

The time_ms function uses eval, which can be a security risk if the command arguments are not properly sanitized. While it seems safe with the current usage in this script, it's a best practice to avoid eval. Consider rewriting the function to execute the command directly and updating the call sites.

Suggested change:

```bash
time_ms() {
    local start=$(python3 -c "import time; print(int(time.time() * 1000))")
    "$@" >/dev/null 2>&1
    local end=$(python3 -c "import time; print(int(time.time() * 1000))")
    echo $((end - start))
}
```
benchmarks/compare_benchmarks.sh (outdated)

```bash
# Calculate improvements
if [[ "$DIACHRON_COLD_START" =~ ^[0-9]+$ ]] && [[ "$EPISODIC_COLD_START" == "2500-3500" ]]; then
    COLD_IMPROVEMENT=$(echo "scale=0; 3000 / $DIACHRON_COLD_START" | bc)
```

The calculation for COLD_IMPROVEMENT could result in a division-by-zero error if $DIACHRON_COLD_START is 0. You've handled this for SEARCH_IMPROVEMENT on line 260. I recommend applying a similar safeguard here to prevent the script from failing.

Suggested change:

```bash
COLD_IMPROVEMENT=$(echo "scale=0; 3000 / $DIACHRON_COLD_START" | bc 2>/dev/null || echo "N/A")
```
| Exchanges | 284288 | 284729 | ~230K |
```bash
get_percentiles() {
    local path="$1"
    python3 - "$path" <<'PY'
import json
import math
import sys
from pathlib import Path

path = Path(sys.argv[1])
if not path.exists() or path.stat().st_size == 0:
    print("N/A N/A N/A")
    raise SystemExit(0)

data = json.loads(path.read_text())
times = data.get("results", [{}])[0].get("times", [])
if not times:
    print("N/A N/A N/A")
    raise SystemExit(0)

def pct(vals, p):
    vals = sorted(vals)
    k = (len(vals) - 1) * (p / 100)
    f = math.floor(k)
    c = math.ceil(k)
    if f == c:
        return vals[int(k)]
    return vals[f] + (vals[c] - vals[f]) * (k - f)

p50 = pct(times, 50) * 1000
p95 = pct(times, 95) * 1000
p99 = pct(times, 99) * 1000
print(f"{p50:.1f} {p95:.1f} {p99:.1f}")
PY
}
```
### Python Example

```python
import socket
```
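The Python example above is truncated in this diff view. A plausible continuation, inferred from the `nc -U ~/.diachron/diachron.sock` usage elsewhere in IPC-API.md (one JSON request per connection, JSON reply, daemon closes when done) — the request/response framing is an assumption, so verify against the actual doc:

```python
import json
import socket
from pathlib import Path

def query_daemon(message, sock_path=None):
    """Send one JSON request to the Diachron daemon over its Unix socket."""
    sock_path = sock_path or str(Path.home() / ".diachron" / "diachron.sock")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(json.dumps(message).encode() + b"\n")
        chunks = []
        # Assumes the daemon closes the connection after replying
        while chunk := s.recv(4096):
            chunks.append(chunk)
    return json.loads(b"".join(chunks))
```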
docs/IPC-API.md (outdated)

```yaml
"pr_id": ${{ github.event.pull_request.number }},
"commits": ${{ toJson(github.event.pull_request.commits) }},
```
The example for generating an evidence pack in a CI/CD pipeline incorrectly uses github.event.pull_request.commits, which is a number representing the count of commits, not a list of commit SHAs. This will cause the IPC call to fail. The action should instead gather the list of commit SHAs from the PR context, for example by using the GitHub API or git log.
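The follow-up commit on this PR mentions switching to `gh pr view` to get the actual SHA list. A hedged sketch of that approach via the GitHub CLI's `--json commits` output (each commit object carries an `oid` field; double-check the field names against `gh`'s docs):

```python
import json
import subprocess

def parse_commit_shas(gh_json):
    """Extract commit SHAs from `gh pr view --json commits` output."""
    return [c["oid"] for c in json.loads(gh_json)["commits"]]

def pr_commit_shas(pr_number):
    """Ask the GitHub CLI for a PR's commits (requires an authenticated `gh`)."""
    out = subprocess.run(
        ["gh", "pr", "view", str(pr_number), "--json", "commits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_commit_shas(out)
```

The resulting list of SHAs, not the commit count, is what the `CorrelateEvidence` payload's `commits` field needs.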
rust/cli/src/main.rs

```rust
Commands::PrComment { pr, evidence } => {
    println!("Posting PR narrative comment...\n");

    // Read evidence pack
    let evidence_content = std::fs::read_to_string(&evidence)
        .context("Failed to read evidence file")?;

    let pack: serde_json::Value = serde_json::from_str(&evidence_content)
        .context("Failed to parse evidence JSON")?;

    // Build markdown narrative
    let mut md = String::new();

    // Header
    md.push_str(&format!(
        "## PR #{}: AI Provenance Evidence\n\n",
        pack["pr_id"].as_u64().unwrap_or(pr)
    ));

    // Intent section (if available)
    if let Some(intent) = pack["intent"].as_str() {
        if !intent.is_empty() {
            md.push_str("### Intent\n");
            md.push_str(&format!("> {}\n\n", intent));
        }
    }

    // Summary section
    md.push_str("### What Changed\n");
    md.push_str(&format!(
        "- **Files modified**: {}\n",
        pack["summary"]["files_changed"].as_u64().unwrap_or(0)
    ));
    md.push_str(&format!(
        "- **Lines**: +{} / -{}\n",
        pack["summary"]["lines_added"].as_u64().unwrap_or(0),
        pack["summary"]["lines_removed"].as_u64().unwrap_or(0)
    ));
    md.push_str(&format!(
        "- **Tool operations**: {}\n",
        pack["summary"]["tool_operations"].as_u64().unwrap_or(0)
    ));
    md.push_str(&format!(
        "- **Sessions**: {}\n\n",
        pack["summary"]["sessions"].as_u64().unwrap_or(0)
    ));

    // Evidence trail section
    md.push_str("### Evidence Trail\n");
    let coverage = pack["coverage_pct"].as_f64().unwrap_or(0.0);
    let unmatched = pack["unmatched_count"].as_u64().unwrap_or(0);
    md.push_str(&format!("- **Coverage**: {:.1}% of events matched to commits", coverage));
    if unmatched > 0 {
        md.push_str(&format!(" ({} unmatched)", unmatched));
    }
    md.push_str("\n");

    // List commits with their events
    if let Some(commits) = pack["commits"].as_array() {
        for commit in commits {
            let sha = commit["sha"].as_str().unwrap_or("");
            let sha_short = &sha[..7.min(sha.len())];
            let confidence = commit["confidence"].as_str().unwrap_or("LOW");

            md.push_str(&format!("\n**Commit `{}`**", sha_short));
            if let Some(msg) = commit["message"].as_str() {
                let first_line = msg.lines().next().unwrap_or(msg);
                md.push_str(&format!(": {}", first_line));
            }
            md.push_str(&format!(" ({})\n", confidence));

            if let Some(events) = commit["events"].as_array() {
                for event in events.iter().take(5) {
                    let tool = event["tool_name"].as_str().unwrap_or("-");
                    let file = event["file_path"].as_str().unwrap_or("-");
                    let op = event["operation"].as_str().unwrap_or("-");
                    md.push_str(&format!("  - `{}` {} → {}\n", tool, op, file));
                }
                if events.len() > 5 {
                    md.push_str(&format!("  - *...and {} more*\n", events.len() - 5));
                }
            }
        }
    }
    md.push_str("\n");

    // Verification section
    md.push_str("### Verification\n");
    md.push_str(&format!(
        "- [{}] Hash chain integrity\n",
        if pack["verification"]["chain_verified"].as_bool().unwrap_or(false) { "x" } else { " " }
    ));
    md.push_str(&format!(
        "- [{}] Tests executed after changes\n",
        if pack["verification"]["tests_executed"].as_bool().unwrap_or(false) { "x" } else { " " }
    ));
    md.push_str(&format!(
        "- [{}] Build succeeded\n",
        if pack["verification"]["build_succeeded"].as_bool().unwrap_or(false) { "x" } else { " " }
    ));
    md.push_str(&format!(
        "- [{}] Human review\n\n",
        if pack["verification"]["human_reviewed"].as_bool().unwrap_or(false) { "x" } else { " " }
    ));

    // Footer
    md.push_str(&format!(
        "---\n*Generated by [Diachron](https://github.com/wolfiesch/diachron) v{} at {}*\n",
        pack["diachron_version"].as_str().unwrap_or(env!("CARGO_PKG_VERSION")),
        pack["generated_at"].as_str().unwrap_or("unknown")
    ));

    // Post via gh CLI
    let status = std::process::Command::new("gh")
        .args(["pr", "comment", &pr.to_string(), "-b", &md])
        .status()
        .context("Failed to run gh CLI")?;

    if status.success() {
        println!("✅ PR comment posted successfully");
        println!("\nPosted content:\n{}", md);
    } else {
        eprintln!("Failed to post PR comment (gh exit code: {:?})", status.code());
        std::process::exit(1);
    }
}
```
The logic to render the evidence pack into a markdown comment is duplicated here, in rust/core/src/evidence_pack.rs, and in the TypeScript code for the GitHub Action (github-action/src/index.ts). This creates a maintenance burden, as any change to the markdown format needs to be updated in multiple places. Consider centralizing this logic. For example, the daemon could have an IPC endpoint that takes an evidence pack and returns the rendered markdown, which both the CLI and the GitHub Action could use.
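One shape that centralization could take, sketched in Python purely for illustration: a single renderer that every client (CLI, daemon IPC handler, GitHub Action) calls, so the markdown format lives in exactly one place. The function name and the evidence-pack field names below mirror the CLI code above but are not an existing API:

```python
def render_evidence_markdown(pack):
    """Single source of truth for the PR-comment markdown."""
    lines = [f"## PR #{pack.get('pr_id', '?')}: AI Provenance Evidence", ""]
    intent = pack.get("intent")
    if intent:
        lines += ["### Intent", f"> {intent}", ""]
    summary = pack.get("summary", {})
    lines += [
        "### What Changed",
        f"- **Files modified**: {summary.get('files_changed', 0)}",
        f"- **Lines**: +{summary.get('lines_added', 0)} / -{summary.get('lines_removed', 0)}",
    ]
    return "\n".join(lines)
```

Exposed behind an IPC message as the reviewer suggests, the CLI and the Action would each shrink to "fetch markdown, post comment", and format changes would need only one edit.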
- Fix CI: dtolnay/rust-action → dtolnay/rust-toolchain (missing action)
- Fix security: Shell injection in IPC-API.md CI example (use Python for safe JSON)
- Fix docs: Add missing `import os` to Python example
- Fix docs: Use `gh pr view` to get commit SHAs (not count)
- Fix Python: Remove unused os/datetime imports, add comment to except clause
- Fix benchmarks: Replace eval with direct execution, add div-by-zero guard
- Fix benchmarks: Repair broken table formatting in results markdown
- Add note about race condition limitation in Rust wrapper

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
⚡ Benchmark Results
Summary
Major feature release spanning v0.4 through v0.7, adding:
Key Features
- Multi-assistant provenance (Claude Code + Codex CLI)
- Semantic blame and exportable evidence packs
- `diachron maintenance` command
- `--watch` timeline mode

Files Changed
- `codex-wrapper/`, `codex_capture.py`, `test_codex_capture.py`
- `IPC-API.md`, `github-action/`
- `hash_chain.rs`, `fingerprint.rs`, `pr_correlation.rs`, `evidence_pack.rs`

Test plan
- [ ] `diachron timeline --watch` shows live events
- [ ] `diachron maintenance` runs without errors

🤖 Generated with Claude Code