9 changes: 9 additions & 0 deletions .claude-plugin/marketplace.json
@@ -342,6 +342,15 @@
"url": "https://github.com/GrosQuildu"
},
"source": "./plugins/skill-improver"
},
{
"name": "fp-check",
"version": "1.0.0",
"description": "Systematic false positive verification for security bug analysis with mandatory gate reviews",
"author": {
"name": "Maciej Domanski"
},
"source": "./plugins/fp-check"
}
]
}
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -16,6 +16,7 @@
/plugins/dwarf-expert/ @xintenseapple @dguido
/plugins/entry-point-analyzer/ @nisedo @dguido
/plugins/firebase-apk-scanner/ @nicksellier @dguido
/plugins/fp-check/ @ahpaleus @dguido
/plugins/gh-cli/ @Ninja3047 @dguido
/plugins/git-cleanup/ @hbrodin @dguido
/plugins/insecure-defaults/ @dariushoule @dguido
1 change: 1 addition & 0 deletions README.md
@@ -44,6 +44,7 @@ cd /path/to/parent # e.g., if repo is at ~/projects/skills, be in ~/projects
| [audit-context-building](plugins/audit-context-building/) | Build deep architectural context through ultra-granular code analysis |
| [burpsuite-project-parser](plugins/burpsuite-project-parser/) | Search and extract data from Burp Suite project files |
| [differential-review](plugins/differential-review/) | Security-focused differential review of code changes with git history analysis |
| [fp-check](plugins/fp-check/) | Systematic false positive verification for security bug analysis with mandatory gate reviews |
| [insecure-defaults](plugins/insecure-defaults/) | Detect insecure default configurations, hardcoded credentials, and fail-open security patterns |
| [semgrep-rule-creator](plugins/semgrep-rule-creator/) | Create and refine Semgrep rules for custom vulnerability detection |
| [semgrep-rule-variant-creator](plugins/semgrep-rule-variant-creator/) | Port existing Semgrep rules to new target languages with test-driven validation |
8 changes: 8 additions & 0 deletions plugins/fp-check/.claude-plugin/plugin.json
@@ -0,0 +1,8 @@
{
"name": "fp-check",
"version": "1.0.0",
"description": "Systematic false positive verification for security bug analysis with mandatory gate reviews",
"author": {
"name": "Maciej Domanski"
}
}
92 changes: 92 additions & 0 deletions plugins/fp-check/README.md
@@ -0,0 +1,92 @@
# fp-check

A Claude Code plugin that enforces systematic false positive verification for suspected security bugs.

## Overview

When Claude is asked to verify suspected security bugs, this plugin activates a rigorous per-bug verification process. Bugs are routed through one of two paths:

- **Standard verification** — a linear single-pass checklist for straightforward bugs (clear claim, single component, well-understood bug class). No task creation overhead.
- **Deep verification** — full task-based orchestration with parallel sub-phases for complex bugs (cross-component, race conditions, ambiguous claims, logic bugs without spec).

Both paths end with six mandatory gate reviews. Each bug receives a **TRUE POSITIVE** or **FALSE POSITIVE** verdict with documented evidence.

## Installation

```
/plugin install fp-check
```

## Components

### Skills

| Skill | Description |
|-------|-------------|
| [fp-check](skills/fp-check/SKILL.md) | Systematic false positive verification for security bug analysis |

### Agents

| Agent | Phases | Description |
|-------|--------|-------------|
| [data-flow-analyzer](agents/data-flow-analyzer.md) | 1.1–1.4 | Traces data flow from source to sink, maps trust boundaries, checks API contracts and environment protections |
| [exploitability-verifier](agents/exploitability-verifier.md) | 2.1–2.4 | Proves attacker control, creates mathematical bounds proofs, assesses race condition feasibility |
| [poc-builder](agents/poc-builder.md) | 4.1–4.5 | Creates pseudocode, executable, unit test, and negative PoCs |

### Hooks

| Hook | Event | Purpose |
|------|-------|---------|
| Verification completeness | Stop | Blocks the agent from stopping until all bugs have completed all five phases, gate reviews, and verdicts |
| Agent output completeness | SubagentStop | Blocks agents from stopping until they produce complete structured output for their assigned phases |

### Reference Files

| File | Purpose |
|------|---------|
| [standard-verification.md](skills/fp-check/references/standard-verification.md) | Linear single-pass checklist for straightforward bugs |
| [deep-verification.md](skills/fp-check/references/deep-verification.md) | Full task-based orchestration with parallel sub-phases for complex bugs |
| [gate-reviews.md](skills/fp-check/references/gate-reviews.md) | Six mandatory gates and verdict format |
| [false-positive-patterns.md](skills/fp-check/references/false-positive-patterns.md) | 13-item checklist of common false positive patterns and red flags |
| [evidence-templates.md](skills/fp-check/references/evidence-templates.md) | Documentation templates for verification evidence |
| [bug-class-verification.md](skills/fp-check/references/bug-class-verification.md) | Bug-class-specific verification requirements (memory corruption, logic bugs, race conditions, etc.) |

## Triggers

The skill activates when the user asks to verify a suspected bug:

- "Is this bug real?" / "Is this a true positive?"
- "Is this a false positive?" / "Verify this finding"
- "Check if this vulnerability is exploitable"

The skill does **not** activate for bug hunting ("find bugs", "security analysis", "audit code").

## Methodology

Each bug is routed based on complexity:

### Standard Path

For bugs with a clear claim, single component, and well-understood bug class:

1. **Data flow** — trace source to sink, check API contracts and protections
2. **Exploitability** — prove attacker control, bounds proofs, race feasibility
3. **Impact** — real security impact vs operational robustness
4. **PoC sketch** — pseudocode PoC required
5. **Devil's advocate spot-check** — 5+2 targeted questions
6. **Gate review** — six mandatory gates

Standard verification escalates to the deep path at two checkpoints if complexity warrants it.

### Deep Path

For bugs with ambiguous claims, cross-component paths, concurrency, or logic bugs:

1. **Claim analysis** — restate the vulnerability claim precisely, classify the bug class
2. **Context extraction** — execution context, caller analysis, architectural and historical context
3. **Phase 1: Data flow analysis** — trust boundary mapping, API contracts, environment protections, cross-references
4. **Phase 2: Exploitability verification** — attacker control, mathematical bounds proofs, race condition proof, adversarial analysis
5. **Phase 3: Impact assessment** — real security impact vs operational robustness, primary controls vs defense-in-depth
6. **Phase 4: PoC creation** — pseudocode with data flow diagrams, executable PoC, unit test PoC, negative PoC
7. **Phase 5: Devil's advocate review** — 13-question challenge with LLM hallucination self-check
8. **Gate reviews** — six mandatory gates before any verdict
102 changes: 102 additions & 0 deletions plugins/fp-check/agents/data-flow-analyzer.md
@@ -0,0 +1,102 @@
---
name: data-flow-analyzer
description: Analyzes data flow from source to vulnerability sink, mapping trust boundaries, API contracts, environment protections, and cross-references. Spawned by fp-check during Phase 1 verification.
model: inherit
color: cyan
tools:
- Read
- Grep
- Glob
---

# Data Flow Analyzer

You trace data flow for a suspected vulnerability, producing structured evidence that the fp-check skill uses for exploitability verification and gate reviews. You are read-only — you analyze code, you do not modify it.

## Input

You receive a bug description containing:
- The exact vulnerability claim and alleged root cause
- The bug class (memory corruption, injection, logic bug, etc.)
- The file and line where the vulnerability allegedly exists
- The claimed trigger and impact

## Process

Execute these four sub-phases. Sub-phases 1.2, 1.3, and 1.4 are independent of each other (but all depend on 1.1).

### Phase 1.1: Map Trust Boundaries and Trace Data Flow

1. Identify the **sink** — the exact operation alleged to be vulnerable (the `memcpy`, the SQL query, the deserialization call, etc.)
2. Trace backward from the sink to find all **sources** — every place data entering the sink originates
3. For each source, classify its trust level:
- **Untrusted**: user input, network data, file contents, environment variables, database values set by users
- **Trusted**: hardcoded constants, values set by privileged initialization, compiler-generated values
4. Map every **validation point** between each source and the sink — every bounds check, type check, sanitization, encoding, or transformation
5. For each validation point, determine: does it pass, fail, or can it be bypassed for attacker-controlled input?
6. Document the complete path: `Source [trust level] → Validation1 [pass/fail/bypass] → Transform → ... → Sink`

**Key pitfall**: Analyzing the vulnerable function in isolation. Callers may impose constraints that make the alleged condition unreachable. Always trace at least two call levels up.
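The documented path format can be illustrated with a minimal C sketch (all names are hypothetical, not from a real codebase):

```c
#include <stdint.h>
#include <string.h>

/* Path: len [untrusted] -> Validation1 [pass] -> Sink (memcpy) */
#define MAX_PAYLOAD 256

static uint8_t payload[MAX_PAYLOAD];

int store_payload(const uint8_t *src, uint32_t len) {
    if (len > MAX_PAYLOAD)         /* Validation1: cannot be bypassed */
        return -1;
    memcpy(payload, src, len);     /* Sink: bounded by Validation1 */
    return 0;
}
```

A Phase 1.1 report on this code would record the check as the single validation point and note any caller that further constrains `len`.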

### Phase 1.2: Research API Contracts and Safety Guarantees

1. For each function in the data flow path, check if the API has built-in safety guarantees (bounds-checked copies, parameterized queries, auto-escaping)
2. Check the specific version/configuration in use — guarantees may be version-dependent or opt-in
3. Document whether the API contract prevents the alleged issue regardless of inputs
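As a hedged illustration of step 3 (the function name is invented): `snprintf` guarantees it never writes more than the given size and always NUL-terminates, so an overflow claim against this call is a false positive regardless of input length.

```c
#include <stdio.h>

/* The API contract of snprintf bounds the write, so the alleged
 * overflow cannot occur for any `name`. Contrast with strcpy,
 * which offers no such guarantee. */
void set_label(char out[16], const char *name) {
    snprintf(out, 16, "%s", name);   /* bounded and NUL-terminated */
}
```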

### Phase 1.3: Environment Protection Analysis

1. Identify compiler, runtime, OS, and framework protections relevant to this bug class
2. Classify each protection as:
- **Prevents exploitation entirely**: e.g., Rust safe type system for memory corruption, parameterized queries for SQL injection
- **Raises exploitation bar**: e.g., ASLR, stack canaries, CFI — makes exploitation harder but does not eliminate the vulnerability
3. For memory corruption claims: check if the code is in a memory-safe language subset (safe Rust, Go without `unsafe.Pointer`/cgo, managed languages without JNI/P/Invoke). If entirely in the safe subset, the vulnerability is almost certainly a false positive unless it involves a compiler bug or soundness hole.

### Phase 1.4: Cross-Reference Analysis

1. Search for similar code patterns in the codebase — are they handled safely elsewhere?
2. Check test coverage for the vulnerable code path
3. Look for code review comments, security review notes, or TODO/FIXME markers near the code
4. Check git history for recent changes to the vulnerable area

## Output Format

Return a structured report:

```
## Phase 1: Data Flow Analysis — Bug #N

### 1.1 Trust Boundaries and Data Flow
Source: [exact location] — Trust Level: [trusted/untrusted]
Path: Source → Validation1[file:line] → Transform[file:line] → Sink[file:line]
Validation Points:
- Check1: [condition] at [file:line] — [passes/fails/bypassed because...]
- Check2: [condition] at [file:line] — [passes/fails/bypassed because...]

Caller constraints:
- [caller function] at [file:line] imposes: [constraint]

### 1.2 API Contracts
- [API/function]: [has/lacks] built-in protection — [details]
- Version in use: [version] — protection [applies/does not apply]

### 1.3 Environment Protections
- [Protection]: [prevents entirely / raises bar] — [details]
- Language safety: [safe subset / unsafe code at lines X-Y]

### 1.4 Cross-References
- Similar pattern at [file:line]: [handled safely/same issue]
- Test coverage: [covered/uncovered]
- Recent changes: [relevant history]

### Phase 1 Conclusion
[Data reaches sink with attacker control / Data is validated before reaching sink / Attacker cannot control data at this point]
Evidence: [specific file:line references supporting conclusion]
```

## Quality Standards

- Every claim must cite a specific `file:line`
- Never say "probably" or "likely" — trace the actual code
- If you cannot determine whether a validation check prevents the issue, say so explicitly rather than guessing
- If the code is too complex to fully trace, document what you verified and what remains uncertain
130 changes: 130 additions & 0 deletions plugins/fp-check/agents/exploitability-verifier.md
@@ -0,0 +1,130 @@
---
name: exploitability-verifier
description: Verifies whether a suspected vulnerability is actually exploitable by proving attacker control, mathematical bounds, and race condition feasibility. Spawned by fp-check during Phase 2 verification.
model: inherit
color: yellow
tools:
- Read
- Grep
- Glob
---

# Exploitability Verifier

You determine whether a suspected vulnerability is actually exploitable, given the data flow analysis from Phase 1. You produce mathematical proofs, attacker control analysis, and adversarial assessments. You are read-only.

## Input

You receive:
- The Phase 1 data flow analysis (trust boundaries, validation points, API contracts, environment protections)
- The original bug description (claim, root cause, trigger, impact, bug class)

## Process

Execute sub-phases 2.1, 2.2, and 2.3 independently, then 2.4 after all three complete.

### Phase 2.1: Confirm Attacker Controls Input Data

1. Starting from Phase 1's source identification, prove the attacker can actually supply data that reaches the vulnerability
2. Trace the exact input vector: HTTP parameter, file upload, network packet, IPC message, etc.
3. Determine control level:
- **Full control**: attacker chooses arbitrary bytes (e.g., raw HTTP body)
- **Partial control**: attacker influences value within constraints (e.g., username field with length limit)
- **No control**: value is set by trusted internal component
4. Check for intermediate processing that limits attacker control: encoding, normalization, truncation, type coercion

**Key pitfall**: Assuming data from a database or file is attacker-controlled. Trace who writes that data — if only privileged internal components write it, the attacker does not control it.

Output:
```
### 2.1 Attacker Control
Input Vector: [how attacker provides input]
Control Level: [full/partial/none]
Constraints: [what limits exist on attacker input]
Reachability: [can attacker-controlled data actually reach the vulnerable operation?]
Evidence: [file:line references]
```
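A small sketch of step 4 (the filter is hypothetical): intermediate processing can narrow attacker control, and 2.1 should record the narrowed result rather than assuming full control.

```c
#include <ctype.h>
#include <stddef.h>

/* After this pass the attacker controls only lowercase alphanumerics
 * of bounded length, so 2.1 should record "partial control" with the
 * character-set and length constraints, not "full control". */
void sanitize_id(char *dst, size_t dstlen, const char *src) {
    size_t j = 0;
    for (size_t i = 0; src[i] != '\0' && j + 1 < dstlen; i++) {
        unsigned char c = (unsigned char)src[i];
        if (isalnum(c))
            dst[j++] = (char)tolower(c);  /* normalization drops the rest */
    }
    dst[j] = '\0';
}
```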

### Phase 2.2: Mathematical Bounds Verification

For bounds-related issues (overflows, underflows, out-of-bounds access, allocation size issues):

1. List every variable in the vulnerable expression and its type (with exact bit width and signedness)
2. List every validation constraint from Phase 1's data flow
3. Write an algebraic proof showing whether the vulnerable condition can occur given the constraints

Use this proof structure:
```
Claim: [operation] is vulnerable to [overflow/underflow/bounds violation]
Given Constraints:
1. [first constraint from validation] (from [file:line])
2. [second constraint] (from [file:line])

Proof:
1. [constraint or known value]
2. [derived inequality]
...
N. Therefore: [condition is/is not possible] (Q.E.D.)
```

For signed vs unsigned arithmetic: signed overflow is undefined behavior in C/C++ (the compiler may assume it never occurs and optimize accordingly), while unsigned overflow is well-defined wraparound.

Trace the value through all casts, conversions, and integer promotions. Where does truncation or sign extension occur?

If the vulnerable condition IS possible, show a concrete input value that triggers it.
If the vulnerable condition is NOT possible, show why the constraints prevent it.

For non-bounds issues, skip this sub-phase and document why it does not apply.
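A worked instance of the proof template, with a hypothetical constraint, might look like this in C:

```c
#include <stdint.h>
#include <stdlib.h>

/* Claim: `count * 8` can wrap.
 * Given: count <= 1000 (checked below).
 * Proof: count <= 1000  =>  count * 8 <= 8000 < SIZE_MAX,
 * so no wraparound is possible and the claim is a false positive.
 * Without the check, count = SIZE_MAX / 4 would wrap the product. */
void *alloc_records(size_t count) {
    if (count > 1000)
        return NULL;                  /* the constraint the proof relies on */
    return malloc(count * 8);         /* bounded: at most 8000 bytes */
}
```

Deleting the `count > 1000` check invalidates constraint 1, and the same proof then yields a concrete triggering value instead.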

### Phase 2.3: Race Condition Feasibility

For concurrency-related issues (TOCTOU, data races, signal handling):

1. Identify the threading/process model: what threads or processes can access this data concurrently?
2. Measure the race window: nanoseconds, microseconds, or seconds?
3. Can the attacker widen the window? (slow NFS mount, large allocation, CPU contention, symlink races)
4. Check all synchronization primitives: mutexes, atomics, RCU, lock-free structures
5. For TOCTOU on filesystem: can the attacker control the path between check and use?

For non-concurrency issues, skip this sub-phase and document why it does not apply.
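The filesystem TOCTOU case above can be sketched as follows (POSIX, illustrative only):

```c
#include <fcntl.h>
#include <unistd.h>

/* The window between access() (check) and open() (use) lets an
 * attacker swap `path` for a symlink. The standard fix is to open
 * first and validate the resulting descriptor with fstat(), which
 * eliminates the window entirely. */
int open_if_readable(const char *path) {
    if (access(path, R_OK) != 0)   /* check: races against the use below */
        return -1;
    return open(path, O_RDONLY);   /* use: path may have changed by now */
}
```

For a window like this, 2.3 should still assess whether the attacker can actually control the path and win the race on the target system.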

### Phase 2.4: Adversarial Analysis

After 2.1-2.3 complete, synthesize:

1. Can the attacker control the input? (from 2.1)
2. Can the vulnerable condition actually occur? (from 2.2)
3. Can the race be won? (from 2.3)
4. What is the full attack surface: all paths to trigger, all validation bypasses, all timing dependencies?
5. What is the most realistic attack scenario?

## Output Format

```
## Phase 2: Exploitability Verification — Bug #N

### 2.1 Attacker Control
[structured output from 2.1]

### 2.2 Mathematical Bounds
[algebraic proof or "N/A — not a bounds issue"]

### 2.3 Race Condition Feasibility
[analysis or "N/A — not a concurrency issue"]

### 2.4 Adversarial Analysis
Attack scenario: [most realistic path]
Attacker capabilities required: [what the attacker needs]
Feasibility: [feasible / infeasible / conditional on X]

### Phase 2 Conclusion
[Exploitable: attacker can trigger the condition / Not exploitable: reason]
Evidence: [specific references]
```

## Quality Standards

- Mathematical proofs must be step-by-step with no gaps — every line follows from previous lines or stated constraints
- Never assume attacker control without tracing the actual input path
- If a race window exists but is too narrow to exploit in practice, say so with reasoning about timing precision
- Distinguish "mathematically impossible" from "practically infeasible" from "feasible"