From 4bf603730a356af036955af0b7b4be95bbdf4b55 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 17:54:10 +0000
Subject: [PATCH 01/21] Add policy system v2 design documentation

Design docs for next-generation policy system with:
- File correspondence matching (sets and pairs)
- Idempotent command execution
- Queue-based state tracking with detector/evaluator pattern
- Folder-based policy storage using frontmatter markdown files

Key changes from current system:
- Policies move from single .deepwork.policy.yml to .deepwork/policies/*.md
- YAML frontmatter for config, markdown body for instructions
- New 'set' syntax for bidirectional file relationships
- New 'pair' syntax for directional file relationships
- New 'action' field for running commands instead of prompts
- Queue system prevents duplicate policy triggers across sessions
---
 doc/policy_syntax.md        | 691 ++++++++++++++++++++++++++++++++++++
 doc/policy_system_design.md | 580 ++++++++++++++++++++++++++++++
 doc/test_scenarios.md       | 509 ++++++++++++++++++++++++++
 3 files changed, 1780 insertions(+)
 create mode 100644 doc/policy_syntax.md
 create mode 100644 doc/policy_system_design.md
 create mode 100644 doc/test_scenarios.md

diff --git a/doc/policy_syntax.md b/doc/policy_syntax.md
new file mode 100644
index 00000000..72654b08
--- /dev/null
+++ b/doc/policy_syntax.md
@@ -0,0 +1,691 @@
+# Policy Configuration Syntax
+
+This document describes the syntax for policy files in the `.deepwork/policies/` directory.
+
+## Directory Structure
+
+Policies are stored as individual markdown files with YAML frontmatter:
+
+```
+.deepwork/
+└── policies/
+    ├── readme-accuracy.md
+    ├── source-test-pairing.md
+    ├── api-documentation.md
+    └── python-formatting.md
+```
+
+Each file has:
+- **Frontmatter**: YAML configuration between `---` delimiters
+- **Body**: Instructions (for prompt policies) or description (for command policies)
+
+This structure enables code files to reference policies:
+```python
+# Read the policy `.deepwork/policies/source-test-pairing.md` before editing
+class AuthService:
+    ...
+```
+
+## Quick Reference
+
+### Instruction Policy
+
+`.deepwork/policies/readme-accuracy.md`:
+```markdown
+---
+trigger: src/**/*
+safety: README.md
+---
+Source code changed. Please verify README.md is accurate.
+
+Check that:
+- All public APIs are documented
+- Examples are up to date
+- Installation instructions are correct
+```
+
+### Correspondence Set (bidirectional)
+
+`.deepwork/policies/source-test-pairing.md`:
+```markdown
+---
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+When modifying source code, ensure corresponding tests are updated.
+When adding tests, ensure they test actual source code.
+```
+
+### Correspondence Pair (directional)
+
+`.deepwork/policies/api-documentation.md`:
+```markdown
+---
+pair:
+  trigger: api/{path}.py
+  expects: docs/api/{path}.md
+---
+API changes require documentation updates.
+
+When modifying an API endpoint, update its documentation to reflect:
+- Parameter changes
+- Response format changes
+- New error conditions
+```
+
+### Command Policy
+
+`.deepwork/policies/python-formatting.md`:
+```markdown
+---
+trigger: "**/*.py"
+action:
+  command: ruff format {file}
+---
+Automatically formats Python files using ruff.
+
+This policy runs `ruff format` on any changed Python files to ensure
+consistent code style across the codebase.
+```
+
+## Policy Types
+
+### Instruction Policies
+
+Instruction policies prompt the AI agent with guidance when certain files change.
+
+**Frontmatter fields:**
+```yaml
+---
+trigger: pattern              # Required: file pattern(s) that trigger
+safety: pattern               # Optional: file pattern(s) that suppress
+compare_to: base              # Optional: comparison baseline
+priority: normal              # Optional: output priority
+---
+```
+
+The markdown body contains the instructions shown to the agent.
+
+**Example:** `.deepwork/policies/security-review.md`
+
+```markdown
+---
+trigger:
+  - src/auth/**/*
+  - src/crypto/**/*
+safety: SECURITY.md
+compare_to: base
+priority: critical
+---
+Security-sensitive code has been modified.
+
+Please verify:
+1. No credentials are hardcoded
+2. Input validation is present
+3. Authentication checks are correct
+```
+
+### Correspondence Sets
+
+Sets define bidirectional relationships between files. When any file in a correspondence group changes, all related files should also change.
+
+**Frontmatter fields:**
+```yaml
+---
+set:                            # Required: list of corresponding patterns
+  - pattern1/{path}.ext1
+  - pattern2/{path}.ext2
+---
+```
+
+The markdown body contains instructions for when correspondence is incomplete.
+
+**How it works:**
+
+1. A file changes that matches one pattern in the set
+2. System extracts the variable portions (e.g., `{path}`)
+3. System generates expected files by substituting into other patterns
+4. If ALL expected files also changed: policy is satisfied (no trigger)
+5. If ANY expected file is missing: policy triggers with instructions
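+
+The five steps above can be sketched roughly as follows. This is a minimal illustration, not the system's implementation: `to_regex` and `missing_for_set` are hypothetical helper names, and only the plain `{path}` (multi-segment) and `{name}`-style (single-segment) variables are handled.
+
+```python
+import re
+
+VAR = re.compile(r"\{(\w+)\}")
+
+def to_regex(pattern: str) -> str:
+    """Turn 'src/{path}.py' into a regex; {path} spans '/', other names do not."""
+    out, pos = [], 0
+    for m in VAR.finditer(pattern):
+        out.append(re.escape(pattern[pos:m.start()]))
+        name = m.group(1)
+        out.append(f"(?P<{name}>.+)" if name == "path" else f"(?P<{name}>[^/]+)")
+        pos = m.end()
+    out.append(re.escape(pattern[pos:]))
+    return "".join(out)
+
+def missing_for_set(patterns: list[str], changed: set[str]) -> dict[str, list[str]]:
+    """Map each changed file to the expected files that did not change."""
+    missing = {}
+    for path in sorted(changed):
+        for pattern in patterns:
+            m = re.fullmatch(to_regex(pattern), path)
+            if m is None:
+                continue
+            # Substitute the captured variables into every other pattern
+            expected = [
+                VAR.sub(lambda v: m.group(v.group(1)), other)
+                for other in patterns if other != pattern
+            ]
+            absent = [f for f in expected if f not in changed]
+            if absent:
+                missing[path] = absent
+            break
+    return missing
+```
+
+An empty result means every correspondence is satisfied; a non-empty result lists, per changed file, the files the policy would ask about.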
+
+**Example:** `.deepwork/policies/source-test-pairing.md`
+
+```markdown
+---
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+Changed: {trigger_file}
+Expected: {expected_files}
+
+Please ensure both source and test are updated.
+```
+
+If `src/auth/login.py` changes:
+- Extracts `{path}` = `auth/login`
+- Expects `tests/auth/login_test.py` to also change
+- If test didn't change, shows instructions
+
+If `tests/auth/login_test.py` changes:
+- Extracts `{path}` = `auth/login`
+- Expects `src/auth/login.py` to also change
+- If source didn't change, shows instructions
+
+**Example:** `.deepwork/policies/model-schema-migration.md`
+
+```markdown
+---
+set:
+  - models/{name}.py
+  - schemas/{name}.py
+  - migrations/{name}.sql
+---
+Models, schemas, and migrations should stay in sync.
+
+When modifying database models, ensure:
+- Schema definitions are updated
+- Migration files are created or updated
+```
+
+### Correspondence Pairs
+
+Pairs define directional relationships. Changes to trigger files require corresponding expected files to change, but not vice versa.
+
+**Frontmatter fields:**
+```yaml
+---
+pair:
+  trigger: pattern/{path}.ext     # Required: pattern that triggers
+  expects: pattern/{path}.ext     # Required: expected to also change
+---
+```
+
+Can also specify multiple expected patterns:
+
+```yaml
+---
+pair:
+  trigger: pattern/{path}.ext
+  expects:
+    - pattern1/{path}.ext
+    - pattern2/{path}.ext
+---
+```
+
+**Example:** `.deepwork/policies/api-documentation.md`
+
+```markdown
+---
+pair:
+  trigger: api/{module}/{name}.py
+  expects: docs/api/{module}/{name}.md
+---
+API endpoint changed without documentation update.
+
+Changed: {trigger_file}
+Please update: {expected_files}
+
+Ensure the documentation covers:
+- Endpoint URL and method
+- Request parameters
+- Response format
+- Error cases
+```
+
+If `api/users/create.py` changes:
+- Expects `docs/api/users/create.md` to also change
+- If doc didn't change, shows instructions
+
+If `docs/api/users/create.md` changes alone:
+- No trigger (documentation can be updated independently)
+
+### Command Policies
+
+Command policies run idempotent commands instead of prompting the agent.
+
+**Frontmatter fields:**
+```yaml
+---
+trigger: pattern                  # Required: files that trigger
+safety: pattern                   # Optional: files that suppress
+action:
+  command: command {file}         # Required: command to run
+  run_for: each_match             # Optional: each_match (default) or all_matches
+---
+```
+
+The markdown body serves as a description of what the command does (shown in logs, not to the agent).
+
+**Template Variables in Commands:**
+
+| Variable | Description | Available When |
+|----------|-------------|----------------|
+| `{file}` | Single file path | `run_for: each_match` |
+| `{files}` | Space-separated file paths | `run_for: all_matches` |
+| `{repo_root}` | Repository root directory | Always |
+
+**Example:** `.deepwork/policies/python-formatting.md`
+
+```markdown
+---
+trigger: "**/*.py"
+safety: "**/*.pyi"
+action:
+  command: ruff format {file}
+  run_for: each_match
+---
+Automatically formats Python files using ruff.
+
+This ensures consistent code style without requiring manual formatting.
+Stub files (*.pyi) are excluded as they have different formatting rules.
+```
+
+**Example:** `.deepwork/policies/eslint-check.md`
+
+```markdown
+---
+trigger: "**/*.{js,ts,tsx}"
+action:
+  command: eslint --fix {files}
+  run_for: all_matches
+---
+Runs ESLint with auto-fix on all changed JavaScript/TypeScript files.
+```
+
+**Idempotency Requirement:**
+
+Commands MUST be idempotent. The system verifies this by:
+1. Running the command
+2. Checking for changes
+3. If changes occurred, running again
+4. If more changes occur, marking as failed
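+
+A minimal sketch of this check, assuming the caller supplies two hypothetical callbacks: `run` executes the command once, and `snapshot` returns a fingerprint of the affected files (for example, a hash of their contents). Neither name comes from the system itself.
+
+```python
+from typing import Callable
+
+def check_idempotent(run: Callable[[], None], snapshot: Callable[[], str]) -> str:
+    """Return 'passed' or 'failed' following the four steps above."""
+    before = snapshot()
+    run()                          # 1. run the command
+    after_first = snapshot()
+    if after_first == before:      # 2. no changes, nothing more to verify
+        return "passed"
+    run()                          # 3. changes occurred, so run again
+    after_second = snapshot()
+    # 4. a second round of changes means the command is not idempotent
+    return "passed" if after_second == after_first else "failed"
+```
+
+A formatter that converges after one run passes; a command that appends to a file on every run fails.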
+
+## Pattern Syntax
+
+### Basic Glob Patterns
+
+Standard glob patterns work in `trigger` and `safety` fields:
+
+| Pattern | Matches |
+|---------|---------|
+| `*.py` | Python files in current directory |
+| `**/*.py` | Python files in any directory |
+| `src/**/*` | All files under src/ |
+| `test_*.py` | Files starting with `test_` |
+| `*.{js,ts}` | JavaScript and TypeScript files |
+
+### Variable Patterns
+
+Variable patterns use `{name}` syntax to capture path segments:
+
+| Pattern | Captures | Example Match |
+|---------|----------|---------------|
+| `src/{path}.py` | `{path}` = multi-segment path | `src/foo/bar.py` → `path=foo/bar` |
+| `src/{name}.py` | `{name}` = single segment | `src/utils.py` → `name=utils` |
+| `{module}/{name}.py` | Both variables | `auth/login.py` → `module=auth, name=login` |
+
+**Variable Naming Conventions:**
+
+- `{path}` - Conventional name for multi-segment captures (`**/*`)
+- `{name}` - Conventional name for single-segment captures (`*`)
+- Custom names allowed: `{module}`, `{component}`, etc.
+
+**Multi-Segment vs Single-Segment:**
+
+By default, `{path}` matches multiple path segments and `{name}` matches one:
+
+```yaml
+# {path} matches: foo, foo/bar, foo/bar/baz
+- "src/{path}.py"  # src/foo.py, src/foo/bar.py, src/a/b/c.py
+
+# {name} matches only single segment
+- "src/{name}.py"  # src/foo.py (NOT src/foo/bar.py)
+```
+
+To explicitly control this, use `{**name}` for multi-segment or `{*name}` for single:
+
+```yaml
+- "src/{**module}/index.py"   # src/foo/bar/index.py → module=foo/bar
+- "src/{*component}.py"       # src/Button.py → component=Button
+```
+
+## Field Reference
+
+### File Naming
+
+Policy files are named using kebab-case with `.md` extension:
+- `readme-accuracy.md`
+- `source-test-pairing.md`
+- `api-documentation.md`
+
+The filename (without extension) serves as the policy's unique identifier for logging and promise tags.
+
+### trigger (instruction/command policies)
+
+File patterns that cause the policy to fire. Can be a string or an array.
+
+```yaml
+---
+# Single pattern
+trigger: src/**/*.py
+---
+
+---
+# Multiple patterns
+trigger:
+  - src/**/*.py
+  - lib/**/*.py
+---
+```
+
+### safety (optional)
+
+File patterns that suppress the policy. If ANY changed file matches a safety pattern, the policy does not fire.
+
+```yaml
+---
+# Single pattern
+safety: CHANGELOG.md
+---
+
+---
+# Multiple patterns
+safety:
+  - CHANGELOG.md
+  - docs/**/*
+---
+```
+
+### set (correspondence sets)
+
+List of patterns defining bidirectional file relationships.
+
+```yaml
+---
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+```
+
+### pair (correspondence pairs)
+
+Object with `trigger` and `expects` patterns for directional relationships.
+
+```yaml
+---
+pair:
+  trigger: api/{path}.py
+  expects: docs/api/{path}.md
+---
+
+---
+# Or with multiple expects
+pair:
+  trigger: api/{path}.py
+  expects:
+    - docs/api/{path}.md
+    - schemas/{path}.json
+---
+```
+
+### Markdown Body (instructions)
+
+The markdown content after the frontmatter serves as instructions shown to the agent when the policy fires.
+
+**Template Variables in Instructions:**
+
+| Variable | Description |
+|----------|-------------|
+| `{trigger_file}` | The file that triggered the policy |
+| `{trigger_files}` | All files that matched trigger patterns |
+| `{expected_files}` | Expected corresponding files (for sets/pairs) |
+| `{safety_files}` | Files that would suppress the policy |
+
+### action (command policies)
+
+Specifies a command to run instead of prompting.
+
+```yaml
+---
+action:
+  command: ruff format {file}
+  run_for: each_match  # or all_matches
+---
+```
+
+### compare_to (optional)
+
+Determines the baseline for detecting file changes.
+
+| Value | Description |
+|-------|-------------|
+| `base` (default) | Compare to merge-base with default branch |
+| `default_tip` | Compare to current tip of default branch |
+| `prompt` | Compare to state at last prompt submission |
+
+```yaml
+---
+compare_to: prompt
+---
+```
+
+### priority (optional)
+
+Controls output ordering and visibility.
+
+| Value | Behavior |
+|-------|----------|
+| `critical` | Always shown first, blocks progress |
+| `high` | Shown prominently |
+| `normal` (default) | Standard display |
+| `low` | Shown in summary, may be collapsed |
+
+```yaml
+---
+priority: critical
+---
+```
+
+### defer (optional)
+
+When `true`, policy output is deferred to end of session.
+
+```yaml
+---
+defer: true
+---
+```
+
+## Complete Examples
+
+### Example 1: Test Coverage Policy
+
+`.deepwork/policies/test-coverage.md`:
+```markdown
+---
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+compare_to: base
+---
+Source code was modified without corresponding test updates.
+
+Modified source: {trigger_file}
+Expected test: {expected_files}
+
+Please either:
+1. Add/update tests for the changed code
+2. Explain why tests are not needed (and mark with <promise>)
+```
+
+### Example 2: Documentation Sync
+
+`.deepwork/policies/api-documentation-sync.md`:
+```markdown
+---
+pair:
+  trigger: src/api/{module}/{endpoint}.py
+  expects:
+    - docs/api/{module}/{endpoint}.md
+    - openapi/{module}.yaml
+priority: high
+---
+API endpoint changed. Please update:
+- Documentation: {expected_files}
+- Ensure OpenAPI spec is current
+
+If this is an internal-only change, mark as addressed.
+```
+
+### Example 3: Auto-formatting Pipeline
+
+`.deepwork/policies/python-black-formatting.md`:
+```markdown
+---
+trigger: "**/*.py"
+safety:
+  - "**/*.pyi"
+  - "**/migrations/**"
+action:
+  command: black {file}
+  run_for: each_match
+---
+Formats Python files using Black.
+
+Excludes:
+- Type stub files (*.pyi)
+- Database migration files
+```
+
+`.deepwork/policies/typescript-prettier.md`:
+```markdown
+---
+trigger: "**/*.{ts,tsx}"
+action:
+  command: prettier --write {file}
+  run_for: each_match
+---
+Formats TypeScript files using Prettier.
+```
+
+### Example 4: Multi-file Correspondence
+
+`.deepwork/policies/full-stack-feature-sync.md`:
+```markdown
+---
+set:
+  - backend/api/{feature}/routes.py
+  - backend/api/{feature}/models.py
+  - frontend/src/api/{feature}.ts
+  - frontend/src/components/{feature}/**/*
+---
+Feature files should be updated together across the stack.
+
+When modifying a feature, ensure:
+- Backend routes are updated
+- Backend models are updated
+- Frontend API client is updated
+- Frontend components are updated
+
+Changed: {trigger_files}
+Expected: {expected_files}
+```
+
+### Example 5: Conditional Safety
+
+`.deepwork/policies/version-bump-required.md`:
+```markdown
+---
+trigger:
+  - src/**/*.py
+  - pyproject.toml
+safety:
+  - pyproject.toml
+  - CHANGELOG.md
+compare_to: base
+priority: low
+defer: true
+---
+Code changes detected. Before merging, ensure:
+- Version is bumped in pyproject.toml (if needed)
+- CHANGELOG.md is updated
+
+This policy is suppressed if you've already modified pyproject.toml
+or CHANGELOG.md, as that indicates you're handling versioning.
+```
+
+## Promise Tags
+
+When a policy fires but should be dismissed, use promise tags in the conversation:
+
+```
+<promise>policy-filename</promise>
+```
+
+Use the policy filename (without `.md` extension) as the identifier:
+
+```
+<promise>test-coverage</promise>
+<promise>api-documentation-sync</promise>
+```
+
+This tells the system the policy has been addressed (either by action or explicit acknowledgment).
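+
+As a rough illustration, promise tags could be collected from conversation text with a regular expression. The helper name `promised_policies`, the kebab-case identifier pattern, and the tolerance for surrounding whitespace are assumptions made for this sketch.
+
+```python
+import re
+
+# Identifier is assumed to be a kebab-case policy filename without ".md"
+PROMISE_RE = re.compile(r"<promise>\s*([a-z0-9][a-z0-9-]*)\s*</promise>")
+
+def promised_policies(conversation: str) -> set[str]:
+    """Return the set of policy identifiers acknowledged via promise tags."""
+    return set(PROMISE_RE.findall(conversation))
+```
+
+A policy whose identifier appears in the returned set can then be treated as addressed.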
+
+## Validation
+
+Policy files are validated on load. Common errors:
+
+**Invalid frontmatter:**
+```
+Error: .deepwork/policies/my-policy.md - invalid YAML frontmatter
+```
+
+**Missing required field:**
+```
+Error: .deepwork/policies/my-policy.md - must have 'trigger', 'set', or 'pair'
+```
+
+**Invalid pattern:**
+```
+Error: .deepwork/policies/test-coverage.md - invalid pattern "src/{path" - unclosed brace
+```
+
+**Conflicting fields:**
+```
+Error: .deepwork/policies/my-policy.md - has both 'trigger' and 'set' - use one or the other
+```
+
+**Empty body:**
+```
+Error: .deepwork/policies/my-policy.md - instruction policies require markdown body
+```
+
+## Referencing Policies in Code
+
+A key benefit of the `.deepwork/policies/` folder structure is that code files can reference policies directly:
+
+```python
+# Read `.deepwork/policies/source-test-pairing.md` before editing this file
+
+class UserService:
+    """Service for user management."""
+    pass
+```
+
+```typescript
+// This file is governed by `.deepwork/policies/api-documentation.md`
+// Any changes here require corresponding documentation updates
+
+export async function createUser(data: UserInput): Promise<User> {
+    // ...
+}
+```
+
+This helps AI agents and human developers understand which policies apply to specific files.
diff --git a/doc/policy_system_design.md b/doc/policy_system_design.md
new file mode 100644
index 00000000..d62d78bf
--- /dev/null
+++ b/doc/policy_system_design.md
@@ -0,0 +1,580 @@
+# Policy System Design
+
+## Overview
+
+The deepwork policy system enables automated enforcement of development standards during AI-assisted coding sessions. This document describes the architecture for the next-generation policy system with support for:
+
+1. **File correspondence matching** (sets and pairs)
+2. **Idempotent command execution**
+3. **Stateful evaluation with queue-based processing**
+4. **Efficient agent output management**
+
+## Core Concepts
+
+### Policy Types
+
+The system supports three policy types:
+
+| Type | Purpose | Trigger Direction |
+|------|---------|-------------------|
+| **Instruction policies** | Prompt agent with instructions | Any matched file |
+| **Command policies** | Run idempotent commands | Any matched file |
+| **Correspondence policies** | Enforce file relationships | When relationship is incomplete |
+
+### File Correspondence
+
+Correspondence policies define relationships between files that should change together.
+
+**Sets (Bidirectional)**
+- Define N patterns that share a common variable path
+- If ANY file matching one pattern changes, ALL corresponding files should change
+- Example: Source files and their tests
+
+**Pairs (Directional)**
+- Define a trigger pattern and one or more expected patterns
+- Changes to trigger files require corresponding expected files to also change
+- Changes to expected files alone do not trigger the policy
+- Example: API code requires documentation updates
+
+### Pattern Variables
+
+Patterns use `{name}` syntax for capturing variable path segments:
+
+```
+src/{path}.py          # {path} captures everything between src/ and .py
+tests/{path}_test.py   # {path} must match the same value
+```
+
+Special variable names:
+- `{path}` - Matches any path segments (equivalent to `**/*`)
+- `{name}` - Matches a single path segment (equivalent to `*`)
+- `{**name}` - Explicit multi-segment capture (see `doc/policy_syntax.md`)
+- `{*name}` - Explicit single-segment capture
+
+### Actions
+
+Policies can specify two types of actions:
+
+**Prompt Action (default)**
+```yaml
+action:
+  type: prompt
+  instructions: |
+    Please review the changes...
+```
+
+**Command Action**
+```yaml
+action:
+  type: command
+  command: "ruff format {file}"
+  run_for: each_match
+```
+
+Command actions execute idempotent commands. The system verifies idempotency by running the command twice and checking that no additional changes occur.
+
+## Architecture
+
+### Component Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        Policy System                             │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                  │
+│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
+│  │   Detector   │───▶│    Queue     │◀───│  Evaluator   │      │
+│  │              │    │              │    │              │      │
+│  │ - Watch files│    │ .deepwork/   │    │ - Process    │      │
+│  │ - Match pols │    │ tmp/policy/  │    │   queued     │      │
+│  │ - Create     │    │ queue/       │    │ - Run action │      │
+│  │   entries    │    │              │    │ - Update     │      │
+│  └──────────────┘    └──────────────┘    │   status     │      │
+│                                          └──────────────┘      │
+│                                                                  │
+│  ┌──────────────┐    ┌──────────────┐                          │
+│  │   Matcher    │    │   Resolver   │                          │
+│  │              │    │              │                          │
+│  │ - Pattern    │    │ - Variable   │                          │
+│  │   matching   │    │   extraction │                          │
+│  │ - Glob       │    │ - Path       │                          │
+│  │   expansion  │    │   generation │                          │
+│  └──────────────┘    └──────────────┘                          │
+│                                                                  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### Detector
+
+The detector identifies when policies should be evaluated:
+
+1. **Trigger Detection**: Monitors for file changes that match policy triggers
+2. **Deduplication**: Computes a hash to avoid re-processing identical triggers
+3. **Queue Entry Creation**: Creates entries for the evaluator to process
+
+**Trigger Hash Computation**:
+```python
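+from hashlib import sha256
+# Join the sorted paths so the input does not depend on Python's list repr
+hash_input = f"{policy_name}:{','.join(sorted(trigger_files))}:{baseline_ref}"
+trigger_hash = sha256(hash_input.encode()).hexdigest()[:12]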
+```
+
+The baseline_ref varies by `compare_to` mode:
+- `base`: merge-base commit hash
+- `default_tip`: remote tip commit hash
+- `prompt`: timestamp of last prompt submission
+
+### Queue
+
+The queue persists policy trigger state in `.deepwork/tmp/policy/queue/`:
+
+```
+.deepwork/tmp/policy/queue/
+├── {hash}.queued.json      # Detected, awaiting evaluation
+├── {hash}.passed.json      # Evaluated, policy satisfied
+├── {hash}.failed.json      # Evaluated, policy not satisfied
+└── {hash}.skipped.json     # Safety pattern matched, skipped
+```
+
+**Queue Entry Schema**:
+```json
+{
+  "policy_name": "string",
+  "trigger_hash": "string",
+  "status": "queued|passed|failed|skipped",
+  "created_at": "ISO8601 timestamp",
+  "evaluated_at": "ISO8601 timestamp or null",
+  "baseline_ref": "string",
+  "trigger_files": ["array", "of", "files"],
+  "expected_files": ["array", "of", "files"],
+  "matched_files": ["array", "of", "files"],
+  "action_result": {
+    "type": "prompt|command",
+    "output": "string or null",
+    "exit_code": "number or null"
+  }
+}
+```
+
+**Queue Cleanup**:
+- Entries older than 24 hours are automatically pruned
+- `passed` and `skipped` entries are pruned after 1 hour
+- Manual cleanup via `deepwork policy clear-queue`
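+
+The retention rules above might be expressed as a single predicate. This sketch assumes entry age is measured from `created_at` (the design does not say whether `evaluated_at` should be used for resolved entries), and `should_prune` is a hypothetical name.
+
+```python
+from datetime import datetime, timedelta
+
+MAX_AGE = timedelta(hours=24)          # any entry
+RESOLVED_MAX_AGE = timedelta(hours=1)  # passed and skipped entries
+
+def should_prune(entry: dict, now: datetime) -> bool:
+    """Decide whether a queue entry is old enough to delete."""
+    age = now - datetime.fromisoformat(entry["created_at"])
+    if entry["status"] in ("passed", "skipped"):
+        return age > RESOLVED_MAX_AGE
+    return age > MAX_AGE
+```
+
+Timestamps are expected with an explicit offset such as `+00:00` so they compare cleanly against a timezone-aware `now`.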
+
+### Evaluator
+
+The evaluator processes queued entries:
+
+1. **Load Entry**: Read queued entry from disk
+2. **Verify Still Relevant**: Re-check that trigger conditions still apply
+3. **Execute Action**:
+   - For prompts: Format message and return to hook system
+   - For commands: Execute command, verify idempotency
+4. **Update Status**: Mark as passed, failed, or skipped
+5. **Report Results**: Return appropriate response to caller
+
+### Matcher
+
+Pattern matching with variable extraction:
+
+**Algorithm**:
+```python
+import re
+
+def match_pattern(pattern: str, filepath: str) -> dict[str, str] | None:
+    """
+    Match filepath against pattern, extracting variables.
+
+    Returns dict of {variable_name: captured_value} or None if no match.
+    """
+    # Convert pattern to regex with named groups
+    # {path} -> (?P<path>.+)
+    # {name} -> (?P<name>[^/]+)
+    # Literal parts are escaped
+    regex = pattern_to_regex(pattern)
+    match = re.fullmatch(regex, filepath)
+    if match:
+        return match.groupdict()
+    return None
+```
+
+**Pattern Compilation**:
+```python
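+import re
+
+# {name}, {*name} (single segment) or {path}, {**name} (multi-segment)
+VAR = re.compile(r'\{(\*{0,2})([A-Za-z_]\w*)\}')
+
+def pattern_to_regex(pattern: str) -> str:
+    """Convert pattern with {var} placeholders to regex."""
+    result = []
+    pos = 0
+    for m in VAR.finditer(pattern):
+        # Literal parts are escaped
+        result.append(re.escape(pattern[pos:m.start()]))
+        stars, name = m.groups()
+        # '*' is not valid in a group name, so only the bare name is used
+        multi = stars == '**' or (stars == '' and name == 'path')
+        result.append(f'(?P<{name}>.+)' if multi else f'(?P<{name}>[^/]+)')
+        pos = m.end()
+    result.append(re.escape(pattern[pos:]))
+    return ''.join(result)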
+```
+
+### Resolver
+
+Generates expected filepaths from patterns and captured variables:
+
+```python
+def resolve_pattern(pattern: str, variables: dict[str, str]) -> str:
+    """
+    Substitute variables into pattern to generate filepath.
+
+    Example:
+        resolve_pattern("tests/{path}_test.py", {"path": "foo/bar"})
+        -> "tests/foo/bar_test.py"
+    """
+    result = pattern
+    for name, value in variables.items():
+        result = result.replace(f'{{{name}}}', value)
+    return result
+```
+
+## Evaluation Flow
+
+### Standard Instruction Policy
+
+```
+1. Detector: File changes detected
+2. Detector: Check each policy's trigger patterns
+3. Detector: For matching policy, compute trigger hash
+4. Detector: If hash not in queue, create .queued entry
+5. Evaluator: Process queued entry
+6. Evaluator: Check safety patterns against changed files
+7. Evaluator: If safety matches, mark .skipped
+8. Evaluator: If no safety match, return instructions to agent
+9. Agent: Addresses policy, includes <promise> tag
+10. Evaluator: On next check, mark .passed (promise found)
+```
+
+### Correspondence Policy (Set)
+
+```
+1. Detector: File src/foo/bar.py changed
+2. Matcher: Matches pattern "src/{path}.py" with {path}="foo/bar"
+3. Resolver: Generate expected files from other patterns:
+   - "tests/{path}_test.py" -> "tests/foo/bar_test.py"
+4. Detector: Check if tests/foo/bar_test.py also changed
+5. Detector: If yes, mark .skipped (correspondence satisfied)
+6. Detector: If no, create .queued entry
+7. Evaluator: Return instructions prompting for test update
+```
+
+### Correspondence Policy (Pair)
+
+```
+1. Detector: File api/users.py changed (trigger pattern)
+2. Matcher: Matches "api/{path}.py" with {path}="users"
+3. Resolver: Generate expected: "docs/api/users.md"
+4. Detector: Check if docs/api/users.md also changed
+5. Detector: If yes, mark .skipped
+6. Detector: If no, create .queued entry
+7. Evaluator: Return instructions
+
+Note: If only docs/api/users.md changed (not api/users.py),
+the pair policy does NOT trigger (directional).
+```
+
+### Command Policy
+
+```
+1. Detector: Python file changed, matches "**/*.py"
+2. Detector: Create .queued entry for format policy
+3. Evaluator: Execute "ruff format {file}"
+4. Evaluator: Run git diff to check for changes
+5. Evaluator: If changes made, re-run command (idempotency check)
+6. Evaluator: If no additional changes, mark .passed
+7. Evaluator: If changes keep occurring, mark .failed, alert user
+```
+
+## Agent Output Management
+
+### Problem
+
+When many policies trigger, the agent receives excessive output, degrading performance.
+
+### Solution
+
+**1. Output Batching**
+Group related policies into single messages:
+
+```
+The following policies require attention:
+
+## File Correspondence Issues (3)
+
+1. **Source/Test Pairing**: src/auth/login.py changed without tests/auth/login_test.py
+2. **Source/Test Pairing**: src/api/users.py changed without tests/api/users_test.py
+3. **API Documentation**: api/users.py changed without docs/api/users.md
+
+## Code Quality (1)
+
+4. **README Accuracy**: Source files changed, please verify README.md
+```
+
+**2. Priority Levels**
+Policies can specify priority (critical, high, normal, low):
+
+```yaml
+---
+# .deepwork/policies/security-review.md
+trigger: "src/auth/**/*"
+priority: critical
+---
+```
+
+Only critical and high priority policies are shown immediately; normal and low priority policies are shown in the summary.
+
+**3. Deferred Policies**
+Low-priority policies can be deferred to end of session:
+
+```yaml
+---
+# .deepwork/policies/documentation-check.md
+trigger: "src/**/*"
+priority: low
+defer: true  # Show at session end, not immediately
+---
+```
+
+**4. Collapsed Instructions**
+Long instructions are truncated with expansion available:
+
+```
+## README Accuracy
+
+Source code changed. Please verify README.md is accurate.
+
+[+] Show full instructions (15 lines)
+```
+
+## State Persistence
+
+### Directory Structure
+
+```
+.deepwork/
+├── policies/                # Policy definitions (frontmatter markdown)
+│   ├── readme-accuracy.md
+│   ├── source-test-pairing.md
+│   ├── api-documentation.md
+│   └── python-formatting.md
+├── tmp/
+│   └── policy/
+│       ├── queue/           # Queue entries
+│       │   ├── abc123.queued.json
+│       │   └── def456.passed.json
+│       ├── baselines/       # Cached baseline states
+│       │   └── prompt_1705420800.json
+│       └── cache/           # Pattern matching cache
+│           └── patterns.json
+└── policy_state.json        # Session state summary
+```
+
+### Policy File Format
+
+Each policy is a markdown file with YAML frontmatter:
+
+```markdown
+---
+trigger: src/**/*.py
+safety: README.md
+priority: normal
+---
+Instructions shown to the agent when this policy fires.
+
+These can be multi-line with full markdown formatting.
+```
+
+This format enables:
+1. Code files to reference policies in comments
+2. Human-readable policy documentation
+3. Easy editing with any markdown editor
+4. Clear separation of configuration and content
+
+### Baseline Management
+
+For `compare_to: prompt`, baselines are captured at prompt submission:
+
+```json
+{
+  "timestamp": "2024-01-16T12:00:00Z",
+  "commit": "abc123",
+  "staged_files": ["file1.py", "file2.py"],
+  "untracked_files": ["file3.py"]
+}
+```
+
+Multiple baselines can exist for different prompts in a session.
+
+### Queue Lifecycle
+
+```
+                  ┌─────────┐
+                  │ Created │
+                  │ .queued │
+                  └────┬────┘
+                       │
+         ┌─────────────┼─────────────┐
+         │             │             │
+         ▼             ▼             ▼
+    ┌─────────┐   ┌─────────┐   ┌─────────┐
+    │ .passed │   │ .failed │   │.skipped │
+    └─────────┘   └─────────┘   └─────────┘
+         │             │             │
+         └─────────────┼─────────────┘
+                       │
+                       ▼
+                  ┌─────────┐
+                  │ Pruned  │
+                  │(cleanup)│
+                  └─────────┘
+```
+
+## Error Handling
+
+### Pattern Errors
+
+Invalid patterns are caught at policy load time:
+
+```python
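+class PolicyError(Exception):
+    """Base class for policy errors (assumed; not defined in this document)."""
+
+class PatternError(PolicyError):
+    """Invalid pattern syntax."""
+
+# Validation
+def validate_pattern(pattern: str) -> None:
+    """Raise PatternError if braces in the pattern are unbalanced."""
+    depth = 0
+    for char in pattern:
+        depth += {"{": 1, "}": -1}.get(char, 0)
+        if depth < 0:
+            raise PatternError(f'invalid pattern "{pattern}" - unmatched closing brace')
+    if depth > 0:
+        raise PatternError(f'invalid pattern "{pattern}" - unclosed brace')
+    # Checks for invalid variable names and unsupported syntax are not sketched here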
+```
+
+### Command Errors
+
+Command execution errors are captured and reported:
+
+```json
+{
+  "status": "failed",
+  "action_result": {
+    "type": "command",
+    "command": "ruff format {file}",
+    "exit_code": 1,
+    "stdout": "",
+    "stderr": "error: invalid syntax in foo.py:10"
+  }
+}
+```
+
+### Queue Corruption
+
+If queue entries become corrupted:
+1. Log error with entry details
+2. Remove corrupted entry
+3. Re-detect triggers on next evaluation
+
+## Configuration
+
+### Policy Files
+
+Policies are stored in `.deepwork/policies/` as individual markdown files with YAML frontmatter. See `doc/policy_syntax.md` for complete syntax documentation.
+
+**Loading Order:**
+1. All `.md` files in `.deepwork/policies/` are loaded
+2. Files are processed in alphabetical order
+3. Filename (without extension) becomes policy identifier
+
+**Policy Discovery:**
+```python
+def load_policies(policies_dir: Path) -> list[Policy]:
+    """Load all policies from the policies directory."""
+    policies = []
+    for path in sorted(policies_dir.glob("*.md")):
+        policy = parse_policy_file(path)
+        policy.name = path.stem  # filename without .md
+        policies.append(policy)
+    return policies
+```
+
+### System Configuration
+
+In `.deepwork/config.yml`:
+
+```yaml
+policy:
+  enabled: true
+  policies_dir: .deepwork/policies  # Can be customized
+  queue_retention_hours: 24
+  max_queued_entries: 100
+  output_mode: batched  # batched, individual, summary
+  priority_threshold: normal  # Show this priority and above
+```
+
+## Performance Considerations
+
+### Caching
+
+- Pattern compilation is cached per-session
+- Baseline diffs are cached by commit hash
+- Queue lookups use hash-based O(1) access
+
+### Lazy Evaluation
+
+- Patterns only compiled when needed
+- File lists only computed for triggered policies
+- Instructions only loaded when policy fires
+
+### Parallel Processing
+
+- Multiple queue entries can be processed in parallel
+- Command actions can run concurrently (with file locking)
+- Pattern matching is parallelized across policies
+
+## Migration from Legacy System
+
+The legacy system used a single `.deepwork.policy.yml` file containing an array of policies. The new system uses individual markdown files in `.deepwork/policies/`.
+
+**Breaking Changes:**
+- Single YAML file replaced with folder of markdown files
+- Policy `name` field replaced with filename
+- `instructions` / `instructions_file` replaced with markdown body
+- New features: sets, pairs, commands, queue-based state
+
+**No backwards compatibility is provided.** Existing `.deepwork.policy.yml` files must be converted manually.
+
+**Conversion Example:**
+
+Old format (`.deepwork.policy.yml`):
+```yaml
+- name: "README Accuracy"
+  trigger: "src/**/*"
+  safety: "README.md"
+  instructions: |
+    Please verify README.md is accurate.
+```
+
+New format (`.deepwork/policies/readme-accuracy.md`):
+```markdown
+---
+trigger: src/**/*
+safety: README.md
+---
+Please verify README.md is accurate.
+```
+
+## Security Considerations
+
+### Command Execution
+
+- Commands run in sandboxed subprocess
+- No shell expansion (arguments passed as array)
+- Working directory is always repo root
+- Environment variables are filtered
+
+### Queue File Permissions
+
+- Queue directory: 700 (owner only)
+- Queue files: 600 (owner only)
+- No sensitive data in queue entries
+
+### Input Validation
+
+- All policy files validated against schema
+- Pattern variables sanitized before use
+- File paths normalized and validated
diff --git a/doc/test_scenarios.md b/doc/test_scenarios.md
new file mode 100644
index 00000000..c9460f75
--- /dev/null
+++ b/doc/test_scenarios.md
@@ -0,0 +1,509 @@
+# Policy System Test Scenarios
+
+This document describes test scenarios for validating the policy system implementation.
+
+## 1. Pattern Matching
+
+### 1.1 Basic Glob Patterns
+
+| ID | Scenario | Pattern | File | Expected |
+|----|----------|---------|------|----------|
+| PM-1.1.1 | Exact match | `README.md` | `README.md` | Match |
+| PM-1.1.2 | Exact no match | `README.md` | `readme.md` | No match |
+| PM-1.1.3 | Single wildcard | `*.py` | `main.py` | Match |
+| PM-1.1.4 | Single wildcard nested | `*.py` | `src/main.py` | No match |
+| PM-1.1.5 | Double wildcard | `**/*.py` | `src/main.py` | Match |
+| PM-1.1.6 | Double wildcard deep | `**/*.py` | `src/a/b/c/main.py` | Match |
+| PM-1.1.7 | Double wildcard root | `**/*.py` | `main.py` | Match |
+| PM-1.1.8 | Directory prefix | `src/**/*` | `src/foo.py` | Match |
+| PM-1.1.9 | Directory prefix deep | `src/**/*` | `src/a/b/c.py` | Match |
+| PM-1.1.10 | Directory no match | `src/**/*` | `lib/foo.py` | No match |
+| PM-1.1.11 | Brace expansion | `*.{js,ts}` | `app.ts` | Match |
+| PM-1.1.12 | Brace expansion second | `*.{js,ts}` | `app.js` | Match |
+| PM-1.1.13 | Brace expansion no match | `*.{js,ts}` | `app.py` | No match |
+
+### 1.2 Variable Patterns
+
+| ID | Scenario | Pattern | File | Expected Variables |
+|----|----------|---------|------|-------------------|
+| PM-1.2.1 | Single var path | `src/{path}.py` | `src/foo/bar.py` | `{path: "foo/bar"}` |
+| PM-1.2.2 | Single var name | `src/{name}.py` | `src/utils.py` | `{name: "utils"}` |
+| PM-1.2.3 | Name no nested | `src/{name}.py` | `src/foo/bar.py` | No match |
+| PM-1.2.4 | Two variables | `{dir}/{name}.py` | `src/main.py` | `{dir: "src", name: "main"}` |
+| PM-1.2.5 | Prefix and suffix | `test_{name}_test.py` | `test_foo_test.py` | `{name: "foo"}` |
+| PM-1.2.6 | Nested path | `src/{path}/index.py` | `src/a/b/index.py` | `{path: "a/b"}` |
+| PM-1.2.7 | Explicit multi | `src/{**mod}/main.py` | `src/a/b/c/main.py` | `{mod: "a/b/c"}` |
+| PM-1.2.8 | Explicit single | `src/{*name}.py` | `src/utils.py` | `{name: "utils"}` |
+| PM-1.2.9 | Mixed explicit | `{*dir}/{**path}.py` | `src/a/b/c.py` | `{dir: "src", path: "a/b/c"}` |
+
+### 1.3 Pattern Resolution
+
+| ID | Scenario | Pattern | Variables | Expected Output |
+|----|----------|---------|-----------|-----------------|
+| PM-1.3.1 | Simple substitution | `tests/{path}_test.py` | `{path: "foo"}` | `tests/foo_test.py` |
+| PM-1.3.2 | Nested path | `tests/{path}_test.py` | `{path: "a/b/c"}` | `tests/a/b/c_test.py` |
+| PM-1.3.3 | Multiple vars | `{dir}/test_{name}.py` | `{dir: "tests", name: "foo"}` | `tests/test_foo.py` |
+
+## 2. Instruction Policies
+
+### 2.1 Basic Trigger/Safety
+
+| ID | Scenario | Changed Files | Trigger | Safety | Expected |
+|----|----------|---------------|---------|--------|----------|
+| IP-2.1.1 | Trigger match, no safety | `["src/main.py"]` | `src/**/*.py` | None | Fire |
+| IP-2.1.2 | Trigger match, safety match | `["src/main.py", "README.md"]` | `src/**/*.py` | `README.md` | No fire |
+| IP-2.1.3 | Trigger no match | `["docs/readme.md"]` | `src/**/*.py` | None | No fire |
+| IP-2.1.4 | Multiple triggers, one match | `["lib/utils.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
+| IP-2.1.5 | Safety match only | `["README.md"]` | `src/**/*.py` | `README.md` | No fire |
+| IP-2.1.6 | Multiple safety, one match | `["src/main.py", "CHANGELOG.md"]` | `src/**/*.py` | `["README.md", "CHANGELOG.md"]` | No fire |
+| IP-2.1.7 | Multiple triggers, multiple files | `["src/a.py", "lib/b.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
+
+### 2.2 Compare Modes
+
+```
+Setup: Branch diverged 3 commits ago from main
+- Commit 1: Added src/feature.py
+- Commit 2: Modified src/feature.py
+- Commit 3: Added tests/feature_test.py
+- Unstaged: Modified src/utils.py
+```
+
+| ID | Scenario | compare_to | Expected Changed Files |
+|----|----------|------------|----------------------|
+| IP-2.2.1 | Base comparison | `base` | `["src/feature.py", "tests/feature_test.py", "src/utils.py"]` |
+| IP-2.2.2 | Default tip (main ahead 1) | `default_tip` | All base + main's changes |
+| IP-2.2.3 | Prompt baseline (captured after commit 2) | `prompt` | `["tests/feature_test.py", "src/utils.py"]` |
+
+### 2.3 Promise Tags
+
+Policy names are now derived from filenames (without `.md` extension).
+
+| ID | Scenario | Conversation Contains | Policy File | Expected |
+|----|----------|----------------------|-------------|----------|
+| IP-2.3.1 | Exact promise | `<promise>readme-accuracy</promise>` | `readme-accuracy.md` | Suppressed |
+| IP-2.3.2 | Promise with checkmark | `<promise>✓ readme-accuracy</promise>` | `readme-accuracy.md` | Suppressed |
+| IP-2.3.3 | Case insensitive | `<promise>README-ACCURACY</promise>` | `readme-accuracy.md` | Suppressed |
+| IP-2.3.4 | Whitespace | `<promise>  readme-accuracy  </promise>` | `readme-accuracy.md` | Suppressed |
+| IP-2.3.5 | No promise | (none) | `readme-accuracy.md` | Not suppressed |
+| IP-2.3.6 | Wrong promise | `<promise>other-policy</promise>` | `readme-accuracy.md` | Not suppressed |
+| IP-2.3.7 | Multiple promises | `<promise>a</promise><promise>b</promise>` | `a.md` | Suppressed |
+
+## 3. Correspondence Sets
+
+### 3.1 Two-Pattern Sets
+
+```yaml
+set:
+  - "src/{path}.py"
+  - "tests/{path}_test.py"
+```
+
+| ID | Scenario | Changed Files | Expected |
+|----|----------|---------------|----------|
+| CS-3.1.1 | Both changed | `["src/foo.py", "tests/foo_test.py"]` | No fire (satisfied) |
+| CS-3.1.2 | Only source | `["src/foo.py"]` | Fire (missing test) |
+| CS-3.1.3 | Only test | `["tests/foo_test.py"]` | Fire (missing source) |
+| CS-3.1.4 | Nested both | `["src/a/b.py", "tests/a/b_test.py"]` | No fire |
+| CS-3.1.5 | Nested only source | `["src/a/b.py"]` | Fire |
+| CS-3.1.6 | Unrelated file | `["docs/readme.md"]` | No fire |
+| CS-3.1.7 | Source + unrelated | `["src/foo.py", "docs/readme.md"]` | Fire |
+| CS-3.1.8 | Both + unrelated | `["src/foo.py", "tests/foo_test.py", "docs/readme.md"]` | No fire |
+
+### 3.2 Three-Pattern Sets
+
+```yaml
+set:
+  - "models/{name}.py"
+  - "schemas/{name}.py"
+  - "migrations/{name}.sql"
+```
+
+| ID | Scenario | Changed Files | Expected |
+|----|----------|---------------|----------|
+| CS-3.2.1 | All three | `["models/user.py", "schemas/user.py", "migrations/user.sql"]` | No fire |
+| CS-3.2.2 | Two of three | `["models/user.py", "schemas/user.py"]` | Fire (missing migration) |
+| CS-3.2.3 | One of three | `["models/user.py"]` | Fire (missing 2) |
+| CS-3.2.4 | Different names | `["models/user.py", "schemas/order.py"]` | Fire (both incomplete) |
+
+### 3.3 Edge Cases
+
+| ID | Scenario | Changed Files | Expected |
+|----|----------|---------------|----------|
+| CS-3.3.1 | File matches both patterns | `["src/test_foo_test.py"]` | Depends on pattern specificity |
+| CS-3.3.2 | Empty path variable | (N/A - patterns require content) | Pattern validation error |
+| CS-3.3.3 | Multiple files same pattern | `["src/a.py", "src/b.py"]` | Fire for each without corresponding test |
+
+## 4. Correspondence Pairs
+
+### 4.1 Basic Pairs
+
+```yaml
+pair:
+  trigger: "api/{path}.py"
+  expects: "docs/api/{path}.md"
+```
+
+| ID | Scenario | Changed Files | Expected |
+|----|----------|---------------|----------|
+| CP-4.1.1 | Both changed | `["api/users.py", "docs/api/users.md"]` | No fire |
+| CP-4.1.2 | Only trigger | `["api/users.py"]` | Fire |
+| CP-4.1.3 | Only expected | `["docs/api/users.md"]` | No fire (directional) |
+| CP-4.1.4 | Trigger + unrelated | `["api/users.py", "README.md"]` | Fire |
+| CP-4.1.5 | Expected + unrelated | `["docs/api/users.md", "README.md"]` | No fire |
+
+### 4.2 Multi-Expects Pairs
+
+```yaml
+pair:
+  trigger: "api/{path}.py"
+  expects:
+    - "docs/api/{path}.md"
+    - "openapi/{path}.yaml"
+```
+
+| ID | Scenario | Changed Files | Expected |
+|----|----------|---------------|----------|
+| CP-4.2.1 | All three | `["api/users.py", "docs/api/users.md", "openapi/users.yaml"]` | No fire |
+| CP-4.2.2 | Trigger + one expect | `["api/users.py", "docs/api/users.md"]` | Fire (missing openapi) |
+| CP-4.2.3 | Only trigger | `["api/users.py"]` | Fire (missing both) |
+| CP-4.2.4 | Both expects only | `["docs/api/users.md", "openapi/users.yaml"]` | No fire |
+
+## 5. Command Policies
+
+### 5.1 Basic Commands
+
+```yaml
+- name: "Format Python"
+  trigger: "**/*.py"
+  action:
+    command: "ruff format {file}"
+    run_for: each_match
+```
+
+| ID | Scenario | Changed Files | Expected Behavior |
+|----|----------|---------------|-------------------|
+| CMD-5.1.1 | Single file | `["src/main.py"]` | Run `ruff format src/main.py` |
+| CMD-5.1.2 | Multiple files | `["src/a.py", "src/b.py"]` | Run command for each file |
+| CMD-5.1.3 | Non-matching | `["README.md"]` | No command run |
+
+### 5.2 All Matches Mode
+
+```yaml
+action:
+  command: "eslint --fix {files}"
+  run_for: all_matches
+```
+
+| ID | Scenario | Changed Files | Expected Command |
+|----|----------|---------------|------------------|
+| CMD-5.2.1 | Multiple files | `["a.js", "b.js", "c.js"]` | `eslint --fix a.js b.js c.js` |
+| CMD-5.2.2 | Single file | `["a.js"]` | `eslint --fix a.js` |
+
+### 5.3 Idempotency Verification
+
+| ID | Scenario | First Run | Second Run | Expected Result |
+|----|----------|-----------|------------|-----------------|
+| CMD-5.3.1 | Truly idempotent | Changes files | No changes | Pass |
+| CMD-5.3.2 | Not idempotent | Changes files | Changes files | Fail |
+| CMD-5.3.3 | No changes needed | No changes | (not run) | Pass |
+
+### 5.4 Command Errors
+
+| ID | Scenario | Command Result | Expected |
+|----|----------|----------------|----------|
+| CMD-5.4.1 | Exit code 0 | Success | Pass |
+| CMD-5.4.2 | Exit code 1 | Failure | Fail, show stderr |
+| CMD-5.4.3 | Timeout | Command hangs | Fail, timeout error |
+| CMD-5.4.4 | Command not found | Not executable | Fail, not found error |
+
+## 6. Queue System
+
+### 6.1 Queue Entry Lifecycle
+
+| ID | Scenario | Initial State | Action | Final State |
+|----|----------|---------------|--------|-------------|
+| QS-6.1.1 | New trigger | (none) | Trigger detected | `.queued` |
+| QS-6.1.2 | Safety suppression | `.queued` | Safety pattern matches | `.skipped` |
+| QS-6.1.3 | Prompt addressed | `.queued` | Promise tag found | `.passed` |
+| QS-6.1.4 | Command success | `.queued` | Command passes | `.passed` |
+| QS-6.1.5 | Command failure | `.queued` | Command fails | `.failed` |
+| QS-6.1.6 | Re-trigger same | `.passed` | Same files changed | No new entry |
+| QS-6.1.7 | Re-trigger different | `.passed` | Different files | New `.queued` |
+
+### 6.2 Hash Calculation
+
+| ID | Scenario | Policy | Files | Baseline | Expected Hash Differs? |
+|----|----------|--------|-------|----------|------------------------|
+| QS-6.2.1 | Same everything | PolicyA | `[a.py]` | commit1 | Same hash |
+| QS-6.2.2 | Different files | PolicyA | `[a.py]` vs `[b.py]` | commit1 | Different |
+| QS-6.2.3 | Different baseline | PolicyA | `[a.py]` | commit1 vs commit2 | Different |
+| QS-6.2.4 | Different policy | PolicyA vs PolicyB | `[a.py]` | commit1 | Different |
+
+### 6.3 Queue Cleanup
+
+| ID | Scenario | Entry Age | Entry Status | Expected |
+|----|----------|-----------|--------------|----------|
+| QS-6.3.1 | Old queued | 25 hours | `.queued` | Pruned |
+| QS-6.3.2 | Recent queued | 1 hour | `.queued` | Kept |
+| QS-6.3.3 | Old passed | 2 hours | `.passed` | Pruned |
+| QS-6.3.4 | Recent passed | 30 min | `.passed` | Kept |
+| QS-6.3.5 | Old failed | 25 hours | `.failed` | Pruned |
+
+### 6.4 Concurrent Access
+
+| ID | Scenario | Process A | Process B | Expected |
+|----|----------|-----------|-----------|----------|
+| QS-6.4.1 | Simultaneous create | Creates entry | Creates entry | One wins, other no-ops |
+| QS-6.4.2 | Create during eval | Creating | Evaluating existing | A creates new, B continues |
+| QS-6.4.3 | Both evaluate same | Evaluating | Evaluating | File locking prevents race |
+
+## 7. Output Management
+
+### 7.1 Priority Ordering
+
+```
+Policies:
+- Critical: "Security Review"
+- High: "API Documentation"
+- Normal: "README Accuracy"
+- Low: "Code Style"
+```
+
+| ID | Scenario | Triggered Policies | Expected Order |
+|----|----------|-------------------|----------------|
+| OM-7.1.1 | All priorities | All 4 | Security, API, README, Style |
+| OM-7.1.2 | Mixed | High, Low | API, Style |
+| OM-7.1.3 | Same priority | 3 Normal | Alphabetical within priority |
+
+### 7.2 Output Batching
+
+| ID | Scenario | Triggered Policies | Expected Output |
+|----|----------|-------------------|-----------------|
+| OM-7.2.1 | Single policy | 1 | Full instructions |
+| OM-7.2.2 | Two policies | 2 | Both, numbered |
+| OM-7.2.3 | Many policies | 10 | Batched with summary |
+| OM-7.2.4 | Same type | 3 Source/Test pairs | Grouped under heading |
+
+### 7.3 Deferred Policies
+
+| ID | Scenario | Policy defer Setting | Agent Action | Expected |
+|----|----------|---------------------|--------------|----------|
+| OM-7.3.1 | Deferred, stop | `defer: true` | Stop | Not shown |
+| OM-7.3.2 | Deferred, session end | `defer: true` | Session ends | Shown |
+| OM-7.3.3 | Not deferred | `defer: false` | Stop | Shown |
+
+## 8. Schema Validation
+
+### 8.1 Required Fields
+
+| ID | Scenario | Missing Field | Expected Error |
+|----|----------|---------------|----------------|
+| SV-8.1.1 | Missing name | `name` | "required field 'name'" |
+| SV-8.1.2 | Missing trigger (instruction) | `trigger` | "required 'trigger', 'set', or 'pair'" |
+| SV-8.1.3 | Missing instructions | `instructions` | "required 'instructions' or 'instructions_file'" |
+| SV-8.1.4 | Missing set patterns | `set` is empty | "set requires at least 2 patterns" |
+
+### 8.2 Mutually Exclusive Fields
+
+| ID | Scenario | Fields Present | Expected Error |
+|----|----------|----------------|----------------|
+| SV-8.2.1 | Both instructions types | `instructions` + `instructions_file` | "use one or the other" |
+| SV-8.2.2 | Both trigger types | `trigger` + `set` | "use trigger, set, or pair" |
+| SV-8.2.3 | All trigger types | `trigger` + `set` + `pair` | "use one policy type" |
+
+### 8.3 Pattern Validation
+
+| ID | Scenario | Pattern | Expected Error |
+|----|----------|---------|----------------|
+| SV-8.3.1 | Unclosed brace | `src/{path.py` | "unclosed brace" |
+| SV-8.3.2 | Empty variable | `src/{}.py` | "empty variable name" |
+| SV-8.3.3 | Invalid chars in var | `src/{path/name}.py` | "invalid variable name" |
+| SV-8.3.4 | Duplicate variable | `{path}/{path}.py` | "duplicate variable 'path'" |
+
+### 8.4 Value Validation
+
+| ID | Scenario | Field | Value | Expected Error |
+|----|----------|-------|-------|----------------|
+| SV-8.4.1 | Invalid compare_to | `compare_to` | `"yesterday"` | "must be base, default_tip, or prompt" |
+| SV-8.4.2 | Invalid priority | `priority` | `"urgent"` | "must be critical, high, normal, or low" |
+| SV-8.4.3 | Invalid run_for | `run_for` | `"first_match"` | "must be each_match or all_matches" |
+
+## 9. Integration Tests
+
+### 9.1 End-to-End Instruction Policy
+
+```
+Given: Policy requiring tests for source changes
+When: User modifies src/auth/login.py without test
+Then:
+  1. Stop hook fires
+  2. Detector creates queue entry
+  3. Evaluator returns instructions
+  4. Agent sees policy message
+  5. Agent adds tests
+  6. Agent includes promise tag
+  7. Next stop: queue entry marked passed
+  8. Agent can stop successfully
+```
+
+### 9.2 End-to-End Command Policy
+
+```
+Given: Auto-format policy for Python files
+When: User creates unformatted src/new_file.py
+Then:
+  1. Stop hook fires
+  2. Detector creates queue entry
+  3. Evaluator runs formatter
+  4. Formatter modifies file
+  5. Evaluator verifies idempotency
+  6. Queue entry marked passed
+  7. Agent notified of formatting changes
+```
+
+### 9.3 End-to-End Correspondence Set
+
+```
+Given: Source/test pairing policy
+When: User modifies src/utils.py only
+Then:
+  1. Detector matches src/utils.py to pattern
+  2. Resolver calculates expected tests/utils_test.py
+  3. tests/utils_test.py not in changed files
+  4. Queue entry created for incomplete correspondence
+  5. Evaluator returns instructions
+  6. Agent sees "expected tests/utils_test.py to change"
+```
+
+### 9.4 Multiple Policies Same File
+
+```
+Given:
+  - Policy A: "Format Python" (command)
+  - Policy B: "Test Coverage" (set)
+  - Policy C: "README Accuracy" (instruction)
+When: User modifies src/main.py
+Then:
+  1. All three policies trigger
+  2. Command policy runs first
+  3. Set policy checks for test
+  4. Instruction policy prepares message
+  5. Agent sees batched output with all requirements
+```
+
+### 9.5 Safety Pattern Across Policies
+
+```
+Given:
+  - Policy A: trigger=src/**/*.py, safety=CHANGELOG.md
+  - Policy B: trigger=src/**/*.py, safety=README.md
+When: User modifies src/main.py and CHANGELOG.md
+Then:
+  1. Policy A: safety match, skipped
+  2. Policy B: no safety match, fires
+  3. Only Policy B instructions shown
+```
+
+## 10. Performance Tests
+
+### 10.1 Large File Count
+
+| ID | Scenario | File Count | Expected |
+|----|----------|------------|----------|
+| PT-10.1.1 | Many changed files | 100 | < 1s evaluation |
+| PT-10.1.2 | Very many files | 1000 | < 5s evaluation |
+| PT-10.1.3 | Pattern-heavy | 50 policies, 100 files | < 2s evaluation |
+
+### 10.2 Queue Size
+
+| ID | Scenario | Queue Entries | Expected |
+|----|----------|---------------|----------|
+| PT-10.2.1 | Moderate queue | 100 entries | < 100ms load |
+| PT-10.2.2 | Large queue | 1000 entries | < 500ms load |
+| PT-10.2.3 | Cleanup performance | 10000 old entries | < 1s cleanup |
+
+### 10.3 Pattern Matching
+
+| ID | Scenario | Patterns | Files | Expected |
+|----|----------|----------|-------|----------|
+| PT-10.3.1 | Simple patterns | 10 | 100 | < 10ms |
+| PT-10.3.2 | Complex patterns | 50 with variables | 100 | < 50ms |
+| PT-10.3.3 | Deep recursion | `**/**/**/*.py` | 1000 | < 100ms |
+
+## Test Data Fixtures
+
+### Sample Policy Files
+
+Policies are stored as individual markdown files in `.deepwork/policies/`:
+
+**`.deepwork/policies/readme-accuracy.md`**
+```markdown
+---
+trigger: src/**/*
+safety: README.md
+---
+Please review README.md for accuracy.
+```
+
+**`.deepwork/policies/source-test-pairing.md`**
+```markdown
+---
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test should change together.
+```
+
+**`.deepwork/policies/api-documentation.md`**
+```markdown
+---
+pair:
+  trigger: api/{module}.py
+  expects: docs/api/{module}.md
+---
+API changes need documentation.
+```
+
+**`.deepwork/policies/python-formatting.md`**
+```markdown
+---
+trigger: "**/*.py"
+action:
+  command: black {file}
+  run_for: each_match
+---
+Auto-formats Python files with Black.
+```
+
+### Sample Queue Entry
+
+```json
+{
+  "policy_name": "source-test-pairing",
+  "trigger_hash": "abc123def456",
+  "status": "queued",
+  "created_at": "2024-01-16T10:00:00Z",
+  "evaluated_at": null,
+  "baseline_ref": "abc123",
+  "trigger_files": ["src/auth/login.py"],
+  "expected_files": ["tests/auth/login_test.py"],
+  "matched_files": [],
+  "action_result": null
+}
+```
+
+### Directory Structure for Tests
+
+```
+.deepwork/
+├── policies/
+│   ├── readme-accuracy.md
+│   ├── source-test-pairing.md
+│   ├── api-documentation.md
+│   └── python-formatting.md
+└── tmp/
+    └── policy/
+        └── queue/
+            └── (queue entries created during tests)
+```

From cf8b7e28fd6afd6543e94f1e9c4f0df596984e4e Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Fri, 16 Jan 2026 11:26:58 -0700
Subject: [PATCH 02/21] Feedback from review

---
 doc/policy_syntax.md        | 2 +-
 doc/policy_system_design.md | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/policy_syntax.md b/doc/policy_syntax.md
index 72654b08..ec1c1a7b 100644
--- a/doc/policy_syntax.md
+++ b/doc/policy_syntax.md
@@ -36,7 +36,7 @@ class AuthService:
 trigger: src/**/*
 safety: README.md
 ---
-Source code changed. Please verify README.md is accurate.
+Source code changed. Please verify README.md is accurate. Note that this is called only once even if there are many changes, so verify all changes.
 
 Check that:
 - All public APIs are documented
diff --git a/doc/policy_system_design.md b/doc/policy_system_design.md
index d62d78bf..b7a158d6 100644
--- a/doc/policy_system_design.md
+++ b/doc/policy_system_design.md
@@ -306,6 +306,14 @@ The following policies require attention:
 1. **Source/Test Pairing**: src/auth/login.py changed without tests/auth/login_test.py
 2. **Source/Test Pairing**: src/api/users.py changed without tests/api/users_test.py
 3. **API Documentation**: api/users.py changed without docs/api/users.md
+ 
+[FEEDBACK] - these sections should all be shorter. Ex:
+## Source/Test Pairings
+src/auth/login.py changed without tests/auth/login_test.py
+src/api/users.py changed without tests/api/users_test.py
+
+## API Documentation
+api/users.py changed without docs/api/users.md
 
 ## Code Quality (1)
 

From 64d994a41773578a52f7e6248a0795eba2114dbf Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 18:32:23 +0000
Subject: [PATCH 03/21] Simplify policy system design based on feedback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Key changes:
- Restructure taxonomy: detection modes (trigger/safety, set, pair) + action types (prompt, command)
- Add required `name` field for human-friendly promise tag display (e.g., "✓ Source/Test Pairing")
- Remove priority and defer features (not needed yet)
- Clarify .deepwork/tmp is gitignored, so cleanup is not critical
- Shorten output format - group by policy name, use simple arrow notation for correspondence
- Update all examples to include name field
---
 doc/policy_syntax.md        | 318 +++++++++++-------------------------
 doc/policy_system_design.md | 126 +++++---------
 doc/test_scenarios.md       |  82 ++++------
 3 files changed, 177 insertions(+), 349 deletions(-)

diff --git a/doc/policy_syntax.md b/doc/policy_syntax.md
index ec1c1a7b..382e9669 100644
--- a/doc/policy_syntax.md
+++ b/doc/policy_syntax.md
@@ -17,7 +17,7 @@ Policies are stored as individual markdown files with YAML frontmatter:
 
 Each file has:
 - **Frontmatter**: YAML configuration between `---` delimiters
-- **Body**: Instructions (for prompt policies) or description (for command policies)
+- **Body**: Instructions (for prompt actions) or description (for command actions)
 
 This structure enables code files to reference policies:
 ```python
@@ -28,15 +28,16 @@ class AuthService:
 
 ## Quick Reference
 
-### Instruction Policy
+### Simple Trigger with Prompt
 
 `.deepwork/policies/readme-accuracy.md`:
 ```markdown
 ---
+name: README Accuracy
 trigger: src/**/*
 safety: README.md
 ---
-Source code changed. Please verify README.md is accurate. Note that this is called only once even if there are many changes, so verify all changes.
+Source code changed. Please verify README.md is accurate.
 
 Check that:
 - All public APIs are documented
@@ -49,6 +50,7 @@ Check that:
 `.deepwork/policies/source-test-pairing.md`:
 ```markdown
 ---
+name: Source/Test Pairing
 set:
   - src/{path}.py
   - tests/{path}_test.py
@@ -64,6 +66,7 @@ When adding tests, ensure they test actual source code.
 `.deepwork/policies/api-documentation.md`:
 ```markdown
 ---
+name: API Documentation
 pair:
   trigger: api/{path}.py
   expects: docs/api/{path}.md
@@ -76,11 +79,12 @@ When modifying an API endpoint, update its documentation to reflect:
 - New error conditions
 ```
 
-### Command Policy
+### Command Action
 
 `.deepwork/policies/python-formatting.md`:
 ```markdown
 ---
+name: Python Formatting
 trigger: "**/*.py"
 action:
   command: ruff format {file}
@@ -91,212 +95,145 @@ This policy runs `ruff format` on any changed Python files to ensure
 consistent code style across the codebase.
 ```
 
-## Policy Types
+## Policy Structure
 
-### Instruction Policies
+Every policy has two orthogonal aspects:
 
-Instruction policies prompt the AI agent with guidance when certain files change.
+### Detection Mode
 
-**Frontmatter fields:**
-```yaml
----
-trigger: pattern              # Required: file pattern(s) that trigger
-safety: pattern               # Optional: file pattern(s) that suppress
-compare_to: base              # Optional: comparison baseline
-priority: normal              # Optional: output priority
----
-```
+How the policy decides when to fire:
 
-The markdown body contains the instructions shown to the agent.
+| Mode | Field | Description |
+|------|-------|-------------|
+| **Trigger/Safety** | `trigger`, `safety` | Fire when trigger matches and safety doesn't |
+| **Set** | `set` | Fire when file correspondence is incomplete (bidirectional) |
+| **Pair** | `pair` | Fire when file correspondence is incomplete (directional) |
 
-**Example:** `.deepwork/policies/security-review.md`
+### Action Type
 
-```markdown
+What happens when the policy fires:
+
+| Type | Field | Description |
+|------|-------|-------------|
+| **Prompt** (default) | (markdown body) | Show instructions to the agent |
+| **Command** | `action.command` | Run an idempotent command |
+
+## Detection Modes
+
+### Trigger/Safety Mode
+
+The simplest detection mode. Fires when changed files match `trigger` patterns and no changed files match `safety` patterns.
+
+```yaml
 ---
+name: Security Review
 trigger:
   - src/auth/**/*
   - src/crypto/**/*
 safety: SECURITY.md
 compare_to: base
-priority: critical
 ---
-Security-sensitive code has been modified.
-
-Please verify:
-1. No credentials are hardcoded
-2. Input validation is present
-3. Authentication checks are correct
 ```
 
-### Correspondence Sets
+### Set Mode (Bidirectional Correspondence)
 
-Sets define bidirectional relationships between files. When any file in a correspondence group changes, all related files should also change.
+Defines files that should change together. If ANY file in a correspondence group changes, ALL related files should also change.
 
-**Frontmatter fields:**
 ```yaml
 ---
-set:                            # Required: list of corresponding patterns
-  - pattern1/{path}.ext1
-  - pattern2/{path}.ext2
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
 ---
 ```
 
-The markdown body contains instructions for when correspondence is incomplete.
-
 **How it works:**
 
 1. A file changes that matches one pattern in the set
 2. System extracts the variable portions (e.g., `{path}`)
 3. System generates expected files by substituting into other patterns
 4. If ALL expected files also changed: policy is satisfied (no trigger)
-5. If ANY expected file is missing: policy triggers with instructions
-
-**Example:** `.deepwork/policies/source-test-pairing.md`
-
-```markdown
----
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
----
-Source and test files should change together.
-
-Changed: {trigger_file}
-Expected: {expected_files}
-
-Please ensure both source and test are updated.
-```
+5. If ANY expected file is missing: policy fires
 
 If `src/auth/login.py` changes:
 - Extracts `{path}` = `auth/login`
 - Expects `tests/auth/login_test.py` to also change
-- If test didn't change, shows instructions
+- If test didn't change, fires with instructions
 
 If `tests/auth/login_test.py` changes:
 - Extracts `{path}` = `auth/login`
 - Expects `src/auth/login.py` to also change
-- If source didn't change, shows instructions
-
-**Example:** `.deepwork/policies/model-schema-migration.md`
-
-```markdown
----
-set:
-  - models/{name}.py
-  - schemas/{name}.py
-  - migrations/{name}.sql
----
-Models, schemas, and migrations should stay in sync.
-
-When modifying database models, ensure:
-- Schema definitions are updated
-- Migration files are created or updated
-```
+- If source didn't change, fires with instructions
 
-### Correspondence Pairs
+### Pair Mode (Directional Correspondence)
 
-Pairs define directional relationships. Changes to trigger files require corresponding expected files to change, but not vice versa.
+Defines directional relationships. Changes to trigger files require corresponding expected files to change, but not vice versa.
 
-**Frontmatter fields:**
 ```yaml
 ---
+name: API Documentation
 pair:
-  trigger: pattern/{path}.ext     # Required: pattern that triggers
-  expects: pattern/{path}.ext     # Required: expected to also change
+  trigger: api/{module}/{name}.py
+  expects: docs/api/{module}/{name}.md
 ---
 ```
 
-Can also specify multiple expected patterns:
+Can specify multiple expected patterns:
 
 ```yaml
 ---
 pair:
-  trigger: pattern/{path}.ext
+  trigger: api/{path}.py
   expects:
-    - pattern1/{path}.ext
-    - pattern2/{path}.ext
+    - docs/api/{path}.md
+    - schemas/{path}.json
 ---
 ```
 
-**Example:** `.deepwork/policies/api-documentation.md`
-
-```markdown
----
-pair:
-  trigger: api/{module}/{name}.py
-  expects: docs/api/{module}/{name}.md
----
-API endpoint changed without documentation update.
-
-Changed: {trigger_file}
-Please update: {expected_files}
-
-Ensure the documentation covers:
-- Endpoint URL and method
-- Request parameters
-- Response format
-- Error cases
-```
-
 If `api/users/create.py` changes:
 - Expects `docs/api/users/create.md` to also change
-- If doc didn't change, shows instructions
+- If doc didn't change, fires with instructions
 
 If `docs/api/users/create.md` changes alone:
 - No trigger (documentation can be updated independently)
 
-### Command Policies
+## Action Types
 
-Command policies run idempotent commands instead of prompting the agent.
+### Prompt Action (Default)
 
-**Frontmatter fields:**
-```yaml
----
-trigger: pattern                  # Required: files that trigger
-safety: pattern                   # Optional: files that suppress
-action:
-  command: command {file}         # Required: command to run
-  run_for: each_match             # Optional: each_match (default) or all_matches
----
-```
+The markdown body after frontmatter serves as instructions shown to the agent. This is the default when no `action` field is specified.
 
-The markdown body serves as a description of what the command does (shown in logs, not to agent).
+**Template Variables in Instructions:**
 
-**Template Variables in Commands:**
+| Variable | Description |
+|----------|-------------|
+| `{trigger_file}` | The file that triggered the policy |
+| `{trigger_files}` | All files that matched trigger patterns |
+| `{expected_files}` | Expected corresponding files (for sets/pairs) |
 
-| Variable | Description | Available When |
-|----------|-------------|----------------|
-| `{file}` | Single file path | `run_for: each_match` |
-| `{files}` | Space-separated file paths | `run_for: all_matches` |
-| `{repo_root}` | Repository root directory | Always |
+### Command Action
 
-**Example:** `.deepwork/policies/python-formatting.md`
+Runs an idempotent command instead of prompting the agent.
 
-```markdown
+```yaml
 ---
+name: Python Formatting
 trigger: "**/*.py"
 safety: "*.pyi"
 action:
   command: ruff format {file}
   run_for: each_match
 ---
-Automatically formats Python files using ruff.
-
-This ensures consistent code style without requiring manual formatting.
-Stub files (*.pyi) are excluded as they have different formatting rules.
 ```
 
-**Example:** `.deepwork/policies/eslint-check.md`
+**Template Variables in Commands:**
 
-```markdown
----
-trigger: "**/*.{js,ts,tsx}"
-action:
-  command: eslint --fix {files}
-  run_for: all_matches
----
-Runs ESLint with auto-fix on all changed JavaScript/TypeScript files.
-```
+| Variable | Description | Available When |
+|----------|-------------|----------------|
+| `{file}` | Single file path | `run_for: each_match` |
+| `{files}` | Space-separated file paths | `run_for: all_matches` |
+| `{repo_root}` | Repository root directory | Always |
 
 **Idempotency Requirement:**
 
@@ -357,6 +294,16 @@ To explicitly control this, use `{**name}` for multi-segment or `{*name}` for si
 
 ## Field Reference
 
+### name (required)
+
+Human-friendly name for the policy. Displayed in promise tags and output.
+
+```yaml
+---
+name: Source/Test Pairing
+---
+```
+
 ### File Naming
 
 Policy files are named using kebab-case with `.md` extension:
@@ -364,20 +311,18 @@ Policy files are named using kebab-case with `.md` extension:
 - `source-test-pairing.md`
 - `api-documentation.md`
 
-The filename (without extension) serves as the policy's unique identifier for logging and promise tags.
+The filename serves as the policy's identifier in the queue system.
 
-### trigger (instruction/command policies)
+### trigger
 
-File patterns that cause the policy to fire. Can be string or array.
+File patterns that cause the policy to fire (trigger/safety mode). Can be string or array.
 
 ```yaml
 ---
-# Single pattern
 trigger: src/**/*.py
 ---
 
 ---
-# Multiple patterns
 trigger:
   - src/**/*.py
   - lib/**/*.py
@@ -390,21 +335,19 @@ File patterns that suppress the policy. If ANY changed file matches a safety pat
 
 ```yaml
 ---
-# Single pattern
 safety: CHANGELOG.md
 ---
 
 ---
-# Multiple patterns
 safety:
   - CHANGELOG.md
   - docs/**/*
 ---
 ```
 
-### set (correspondence sets)
+### set
 
-List of patterns defining bidirectional file relationships.
+List of patterns defining bidirectional file relationships (set mode).
 
 ```yaml
 ---
@@ -414,9 +357,9 @@ set:
 ---
 ```
 
-### pair (correspondence pairs)
+### pair
 
-Object with `trigger` and `expects` patterns for directional relationships.
+Object with `trigger` and `expects` patterns for directional relationships (pair mode).
 
 ```yaml
 ---
@@ -426,7 +369,6 @@ pair:
 ---
 
 ---
-# Or with multiple expects
 pair:
   trigger: api/{path}.py
   expects:
@@ -435,20 +377,7 @@ pair:
 ---
 ```
 
-### Markdown Body (instructions)
-
-The markdown content after the frontmatter serves as instructions shown to the agent when the policy fires.
-
-**Template Variables in Instructions:**
-
-| Variable | Description |
-|----------|-------------|
-| `{trigger_file}` | The file that triggered the policy |
-| `{trigger_files}` | All files that matched trigger patterns |
-| `{expected_files}` | Expected corresponding files (for sets/pairs) |
-| `{safety_files}` | Files that would suppress the policy |
-
-### action (command policies)
+### action (optional)
 
 Specifies a command to run instead of prompting.
 
@@ -476,33 +405,6 @@ compare_to: prompt
 ---
 ```
 
-### priority (optional)
-
-Controls output ordering and visibility.
-
-| Value | Behavior |
-|-------|----------|
-| `critical` | Always shown first, blocks progress |
-| `high` | Shown prominently |
-| `normal` (default) | Standard display |
-| `low` | Shown in summary, may be collapsed |
-
-```yaml
----
-priority: critical
----
-```
-
-### defer (optional)
-
-When `true`, policy output is deferred to end of session.
-
-```yaml
----
-defer: true
----
-```
-
 ## Complete Examples
 
 ### Example 1: Test Coverage Policy
@@ -510,10 +412,10 @@ defer: true
 `.deepwork/policies/test-coverage.md`:
 ```markdown
 ---
+name: Test Coverage
 set:
   - src/{path}.py
   - tests/{path}_test.py
-compare_to: base
 ---
 Source code was modified without corresponding test updates.
 
@@ -522,7 +424,7 @@ Expected test: {expected_files}
 
 Please either:
 1. Add/update tests for the changed code
-2. Explain why tests are not needed (and mark with <promise>)
+2. Explain why tests are not needed
 ```
 
 ### Example 2: Documentation Sync
@@ -530,18 +432,16 @@ Please either:
 `.deepwork/policies/api-documentation-sync.md`:
 ```markdown
 ---
+name: API Documentation Sync
 pair:
   trigger: src/api/{module}/{endpoint}.py
   expects:
     - docs/api/{module}/{endpoint}.md
     - openapi/{module}.yaml
-priority: high
 ---
 API endpoint changed. Please update:
 - Documentation: {expected_files}
 - Ensure OpenAPI spec is current
-
-If this is an internal-only change, mark as addressed.
 ```
 
 ### Example 3: Auto-formatting Pipeline
@@ -549,6 +449,7 @@ If this is an internal-only change, mark as addressed.
 `.deepwork/policies/python-black-formatting.md`:
 ```markdown
 ---
+name: Python Black Formatting
 trigger: "**/*.py"
 safety:
   - "**/*.pyi"
@@ -564,22 +465,12 @@ Excludes:
 - Database migration files
 ```
 
-`.deepwork/policies/typescript-prettier.md`:
-```markdown
----
-trigger: "**/*.{ts,tsx}"
-action:
-  command: prettier --write {file}
-  run_for: each_match
----
-Formats TypeScript files using Prettier.
-```
-
 ### Example 4: Multi-file Correspondence
 
 `.deepwork/policies/full-stack-feature-sync.md`:
 ```markdown
 ---
+name: Full Stack Feature Sync
 set:
   - backend/api/{feature}/routes.py
   - backend/api/{feature}/models.py
@@ -593,9 +484,6 @@ When modifying a feature, ensure:
 - Backend models are updated
 - Frontend API client is updated
 - Frontend components are updated
-
-Changed: {trigger_files}
-Expected: {expected_files}
 ```
 
 ### Example 5: Conditional Safety
@@ -603,15 +491,13 @@ Expected: {expected_files}
 `.deepwork/policies/version-bump-required.md`:
 ```markdown
 ---
+name: Version Bump Required
 trigger:
   - src/**/*.py
   - pyproject.toml
 safety:
   - pyproject.toml
   - CHANGELOG.md
-compare_to: base
-priority: low
-defer: true
 ---
 Code changes detected. Before merging, ensure:
 - Version is bumped in pyproject.toml (if needed)
@@ -623,20 +509,14 @@ or CHANGELOG.md, as that indicates you're handling versioning.
 
 ## Promise Tags
 
-When a policy fires but should be dismissed, use promise tags in the conversation:
-
-```
-<promise>policy-filename</promise>
-```
-
-Use the policy filename (without `.md` extension) as the identifier:
+When a policy fires but should be dismissed, use promise tags in the conversation. The tag content should be human-readable, using the policy's `name` field with a checkmark:
 
 ```
-<promise>test-coverage</promise>
-<promise>api-documentation-sync</promise>
+<promise>✓ Source/Test Pairing</promise>
+<promise>✓ API Documentation Sync</promise>
 ```
 
-This tells the system the policy has been addressed (either by action or explicit acknowledgment).
+The checkmark and friendly name make promise tags easy to read in the conversation. The system matches promise tags to policies by comparing the tag text to the `name` field case-insensitively, ignoring the checkmark prefix and surrounding whitespace.
 
 ## Validation
 
diff --git a/doc/policy_system_design.md b/doc/policy_system_design.md
index b7a158d6..93f49896 100644
--- a/doc/policy_system_design.md
+++ b/doc/policy_system_design.md
@@ -11,26 +11,37 @@ The deepwork policy system enables automated enforcement of development standard
 
 ## Core Concepts
 
-### Policy Types
+### Policy Structure
 
-The system supports three policy types:
+Every policy has two orthogonal aspects:
 
-| Type | Purpose | Trigger Direction |
-|------|---------|-------------------|
-| **Instruction policies** | Prompt agent with instructions | Any matched file |
-| **Command policies** | Run idempotent commands | Any matched file |
-| **Correspondence policies** | Enforce file relationships | When relationship is incomplete |
+**Detection Mode** - How the policy decides when to fire:
 
-### File Correspondence
+| Mode | Field | Description |
+|------|-------|-------------|
+| **Trigger/Safety** | `trigger`, `safety` | Fire when trigger matches and safety doesn't |
+| **Set** | `set` | Fire when file correspondence is incomplete (bidirectional) |
+| **Pair** | `pair` | Fire when file correspondence is incomplete (directional) |
 
-Correspondence policies define relationships between files that should change together.
+**Action Type** - What happens when the policy fires:
 
-**Sets (Bidirectional)**
+| Type | Field | Description |
+|------|-------|-------------|
+| **Prompt** (default) | (markdown body) | Show instructions to the agent |
+| **Command** | `action.command` | Run an idempotent command |
+
+### Detection Modes
+
+**Trigger/Safety Mode**
+- Simplest mode: fire when files match `trigger` and none match `safety`
+- Good for general checks like "source changed, verify README"
+
+**Set Mode (Bidirectional Correspondence)**
 - Define N patterns that share a common variable path
 - If ANY file matching one pattern changes, ALL corresponding files should change
 - Example: Source files and their tests
 
-**Pairs (Directional)**
+**Pair Mode (Directional Correspondence)**
 - Define a trigger pattern and one or more expected patterns
 - Changes to trigger files require corresponding expected files to also change
 - Changes to expected files alone do not trigger the policy
@@ -51,22 +62,14 @@ Special variable names:
 - `{**}` - Explicit multi-segment wildcard
 - `{*}` - Explicit single-segment wildcard
 
-### Actions
-
-Policies can specify two types of actions:
+### Action Types
 
 **Prompt Action (default)**
-```yaml
-action:
-  type: prompt
-  instructions: |
-    Please review the changes...
-```
+The markdown body of the policy file serves as instructions shown to the agent.
 
 **Command Action**
 ```yaml
 action:
-  type: command
   command: "ruff format {file}"
   run_for: each_match
 ```
@@ -156,9 +159,7 @@ The queue persists policy trigger state in `.deepwork/tmp/policy/queue/`:
 ```
 
 **Queue Cleanup**:
-- Entries older than 24 hours are automatically pruned
-- `passed` and `skipped` entries are pruned after 1 hour
-- Manual cleanup via `deepwork policy clear-queue`
+Because `.deepwork/tmp/` is gitignored, queue entries are transient local state. No pruning is required: entries can accumulate without causing issues, and the directory can be safely deleted at any time to reset state.
 
 ### Evaluator
 
@@ -296,61 +297,27 @@ When many policies trigger, the agent receives excessive output, degrading perfo
 ### Solution
 
 **1. Output Batching**
-Group related policies into single messages:
+Group related policies into concise sections:
 
 ```
 The following policies require attention:
 
-## File Correspondence Issues (3)
-
-1. **Source/Test Pairing**: src/auth/login.py changed without tests/auth/login_test.py
-2. **Source/Test Pairing**: src/api/users.py changed without tests/api/users_test.py
-3. **API Documentation**: api/users.py changed without docs/api/users.md
- 
-[FEEDBACK] - these sections should all be shorter. Ex:
-## Source/Test Pairings
-src/auth/login.py changed without tests/auth/login_test.py
-src/api/users.py changed without tests/api/users_test.py
+## Source/Test Pairing
+src/auth/login.py → tests/auth/login_test.py
+src/api/users.py → tests/api/users_test.py
 
 ## API Documentation
-api/users.py changed without docs/api/users.md
-
-## Code Quality (1)
-
-4. **README Accuracy**: Source files changed, please verify README.md
-```
-
-**2. Priority Levels**
-Policies can specify priority (critical, high, normal, low):
-
-```yaml
-- name: "Security Review"
-  trigger: "src/auth/**/*"
-  priority: critical
-```
+api/users.py → docs/api/users.md
 
-Only critical and high priority shown immediately. Normal/low shown in summary.
-
-**3. Deferred Policies**
-Low-priority policies can be deferred to end of session:
-
-```yaml
-- name: "Documentation Check"
-  trigger: "src/**/*"
-  priority: low
-  defer: true  # Show at session end, not immediately
-```
-
-**4. Collapsed Instructions**
-Long instructions are truncated with expansion available:
-
-```
 ## README Accuracy
+Source files changed. Verify README.md is accurate.
+```
 
-Source code changed. Please verify README.md is accurate.
+**2. Grouped by Policy Name**
+Multiple violations of the same policy are grouped together under a single heading, keeping output compact.
 
-[+] Show full instructions (15 lines)
-```
+**3. Minimal Decoration**
+Avoid excessive formatting, numbering, or emphasis. Use simple arrow notation for correspondence violations.
 
 ## State Persistence
 
@@ -363,7 +330,7 @@ Source code changed. Please verify README.md is accurate.
 │   ├── source-test-pairing.md
 │   ├── api-documentation.md
 │   └── python-formatting.md
-├── tmp/
+├── tmp/                     # GITIGNORED - transient state
 │   └── policy/
 │       ├── queue/           # Queue entries
 │       │   ├── abc123.queued.json
@@ -375,15 +342,17 @@ Source code changed. Please verify README.md is accurate.
 └── policy_state.json        # Session state summary
 ```
 
+**Important:** The entire `.deepwork/tmp/` directory is gitignored. All queue entries, baselines, and caches are local transient state that is never committed. Cleanup is therefore not critical: files can accumulate, and they disappear when the directory is deleted or the repo is re-cloned.
+
 ### Policy File Format
 
 Each policy is a markdown file with YAML frontmatter:
 
 ```markdown
 ---
+name: README Accuracy
 trigger: src/**/*.py
 safety: README.md
-priority: normal
 ---
 Instructions shown to the agent when this policy fires.
 
@@ -425,16 +394,10 @@ Multiple baselines can exist for different prompts in a session.
     ┌─────────┐   ┌─────────┐   ┌─────────┐
     │ .passed │   │ .failed │   │.skipped │
     └─────────┘   └─────────┘   └─────────┘
-         │             │             │
-         └─────────────┼─────────────┘
-                       │
-                       ▼
-                  ┌─────────┐
-                  │ Pruned  │
-                  │(cleanup)│
-                  └─────────┘
 ```
 
+Terminal states persist in `.deepwork/tmp/` (gitignored) until manually cleared or the directory is deleted.
+
 ## Error Handling
 
 ### Pattern Errors
@@ -508,10 +471,7 @@ In `.deepwork/config.yml`:
 policy:
   enabled: true
   policies_dir: .deepwork/policies  # Can be customized
-  queue_retention_hours: 24
-  max_queued_entries: 100
-  output_mode: batched  # batched, individual, summary
-  priority_threshold: normal  # Show this priority and above
+  output_mode: batched  # batched or individual
 ```
 
 ## Performance Considerations
diff --git a/doc/test_scenarios.md b/doc/test_scenarios.md
index c9460f75..0651ad2e 100644
--- a/doc/test_scenarios.md
+++ b/doc/test_scenarios.md
@@ -76,17 +76,17 @@ Setup: Branch diverged 3 commits ago from main
 
 ### 2.3 Promise Tags
 
-Policy names are now derived from filenames (without `.md` extension).
-
-| ID | Scenario | Conversation Contains | Policy File | Expected |
-|----|----------|----------------------|-------------|----------|
-| IP-2.3.1 | Exact promise | `<promise>readme-accuracy</promise>` | `readme-accuracy.md` | Suppressed |
-| IP-2.3.2 | Promise with checkmark | `<promise>✓ readme-accuracy</promise>` | `readme-accuracy.md` | Suppressed |
-| IP-2.3.3 | Case insensitive | `<promise>README-ACCURACY</promise>` | `readme-accuracy.md` | Suppressed |
-| IP-2.3.4 | Whitespace | `<promise>  readme-accuracy  </promise>` | `readme-accuracy.md` | Suppressed |
-| IP-2.3.5 | No promise | (none) | `readme-accuracy.md` | Not suppressed |
-| IP-2.3.6 | Wrong promise | `<promise>other-policy</promise>` | `readme-accuracy.md` | Not suppressed |
-| IP-2.3.7 | Multiple promises | `<promise>a</promise><promise>b</promise>` | `a.md` | Suppressed |
+Promise tags use the policy's `name` field (not filename) with a checkmark prefix for human readability.
+
+| ID | Scenario | Conversation Contains | Policy `name` | Expected |
+|----|----------|----------------------|---------------|----------|
+| IP-2.3.1 | Standard promise | `<promise>✓ README Accuracy</promise>` | `README Accuracy` | Suppressed |
+| IP-2.3.2 | Without checkmark | `<promise>README Accuracy</promise>` | `README Accuracy` | Suppressed |
+| IP-2.3.3 | Case insensitive | `<promise>✓ readme accuracy</promise>` | `README Accuracy` | Suppressed |
+| IP-2.3.4 | Whitespace | `<promise>  ✓ README Accuracy  </promise>` | `README Accuracy` | Suppressed |
+| IP-2.3.5 | No promise | (none) | `README Accuracy` | Not suppressed |
+| IP-2.3.6 | Wrong promise | `<promise>✓ Other Policy</promise>` | `README Accuracy` | Not suppressed |
+| IP-2.3.7 | Multiple promises | `<promise>✓ A</promise><promise>✓ B</promise>` | `A` | Suppressed |
 
 ## 3. Correspondence Sets
 
@@ -259,38 +259,22 @@ action:
 
 ## 7. Output Management
 
-### 7.1 Priority Ordering
-
-```
-Policies:
-- Critical: "Security Review"
-- High: "API Documentation"
-- Normal: "README Accuracy"
-- Low: "Code Style"
-```
-
-| ID | Scenario | Triggered Policies | Expected Order |
-|----|----------|-------------------|----------------|
-| OM-7.1.1 | All priorities | All 4 | Security, API, README, Style |
-| OM-7.1.2 | Mixed | High, Low | API, Style |
-| OM-7.1.3 | Same priority | 3 Normal | Alphabetical within priority |
-
-### 7.2 Output Batching
+### 7.1 Output Batching
 
 | ID | Scenario | Triggered Policies | Expected Output |
 |----|----------|-------------------|-----------------|
-| OM-7.2.1 | Single policy | 1 | Full instructions |
-| OM-7.2.2 | Two policies | 2 | Both, numbered |
-| OM-7.2.3 | Many policies | 10 | Batched with summary |
-| OM-7.2.4 | Same type | 3 Source/Test pairs | Grouped under heading |
+| OM-7.1.1 | Single policy | 1 | Full instructions |
+| OM-7.1.2 | Two policies | 2 | Both, grouped |
+| OM-7.1.3 | Many policies | 10 | Batched by policy name |
+| OM-7.1.4 | Same policy multiple files | 3 Source/Test pairs | Grouped under single heading |
 
-### 7.3 Deferred Policies
+### 7.2 Output Format
 
-| ID | Scenario | Policy defer Setting | Agent Action | Expected |
-|----|----------|---------------------|--------------|----------|
-| OM-7.3.1 | Deferred, stop | `defer: true` | Stop | Not shown |
-| OM-7.3.2 | Deferred, session end | `defer: true` | Session ends | Shown |
-| OM-7.3.3 | Not deferred | `defer: false` | Stop | Shown |
+| ID | Scenario | Input | Expected Format |
+|----|----------|-------|-----------------|
+| OM-7.2.1 | Correspondence violation | `src/foo.py` missing `tests/foo_test.py` | `src/foo.py → tests/foo_test.py` |
+| OM-7.2.2 | Multiple same policy | 3 correspondence violations | Single heading, 3 lines |
+| OM-7.2.3 | Instruction policy | Source files changed | Short summary + instructions |
 
 ## 8. Schema Validation
 
@@ -299,17 +283,17 @@ Policies:
 | ID | Scenario | Missing Field | Expected Error |
 |----|----------|---------------|----------------|
 | SV-8.1.1 | Missing name | `name` | "required field 'name'" |
-| SV-8.1.2 | Missing trigger (instruction) | `trigger` | "required 'trigger', 'set', or 'pair'" |
-| SV-8.1.3 | Missing instructions | `instructions` | "required 'instructions' or 'instructions_file'" |
+| SV-8.1.2 | Missing detection mode | no `trigger`, `set`, or `pair` | "must have 'trigger', 'set', or 'pair'" |
+| SV-8.1.3 | Missing markdown body | empty body (prompt action) | "instruction policies require markdown body" |
 | SV-8.1.4 | Missing set patterns | `set` is empty | "set requires at least 2 patterns" |
 
 ### 8.2 Mutually Exclusive Fields
 
 | ID | Scenario | Fields Present | Expected Error |
 |----|----------|----------------|----------------|
-| SV-8.2.1 | Both instructions types | `instructions` + `instructions_file` | "use one or the other" |
-| SV-8.2.2 | Both trigger types | `trigger` + `set` | "use trigger, set, or pair" |
-| SV-8.2.3 | All trigger types | `trigger` + `set` + `pair` | "use one policy type" |
+| SV-8.2.1 | Both trigger and set | `trigger` + `set` | "use trigger, set, or pair" |
+| SV-8.2.2 | Both trigger and pair | `trigger` + `pair` | "use trigger, set, or pair" |
+| SV-8.2.3 | All detection modes | `trigger` + `set` + `pair` | "use only one detection mode" |
 
 ### 8.3 Pattern Validation
 
@@ -325,8 +309,7 @@ Policies:
 | ID | Scenario | Field | Value | Expected Error |
 |----|----------|-------|-------|----------------|
 | SV-8.4.1 | Invalid compare_to | `compare_to` | `"yesterday"` | "must be base, default_tip, or prompt" |
-| SV-8.4.2 | Invalid priority | `priority` | `"urgent"` | "must be critical, high, normal, or low" |
-| SV-8.4.3 | Invalid run_for | `run_for` | `"first_match"` | "must be each_match or all_matches" |
+| SV-8.4.2 | Invalid run_for | `run_for` | `"first_match"` | "must be each_match or all_matches" |
 
 ## 9. Integration Tests
 
@@ -439,6 +422,7 @@ Policies are stored as individual markdown files in `.deepwork/policies/`:
 **`.deepwork/policies/readme-accuracy.md`**
 ```markdown
 ---
+name: README Accuracy
 trigger: src/**/*
 safety: README.md
 ---
@@ -448,6 +432,7 @@ Please review README.md for accuracy.
 **`.deepwork/policies/source-test-pairing.md`**
 ```markdown
 ---
+name: Source/Test Pairing
 set:
   - src/{path}.py
   - tests/{path}_test.py
@@ -458,6 +443,7 @@ Source and test should change together.
 **`.deepwork/policies/api-documentation.md`**
 ```markdown
 ---
+name: API Documentation
 pair:
   trigger: api/{module}.py
   expects: docs/api/{module}.md
@@ -468,6 +454,7 @@ API changes need documentation.
 **`.deepwork/policies/python-formatting.md`**
 ```markdown
 ---
+name: Python Formatting
 trigger: "**/*.py"
 action:
   command: black {file}
@@ -480,7 +467,8 @@ Auto-formats Python files with Black.
 
 ```json
 {
-  "policy_name": "source-test-pairing",
+  "policy_name": "Source/Test Pairing",
+  "policy_file": "source-test-pairing.md",
   "trigger_hash": "abc123def456",
   "status": "queued",
   "created_at": "2024-01-16T10:00:00Z",
@@ -502,7 +490,7 @@ Auto-formats Python files with Black.
 │   ├── source-test-pairing.md
 │   ├── api-documentation.md
 │   └── python-formatting.md
-└── tmp/
+└── tmp/                         # GITIGNORED
     └── policy/
         └── queue/
             └── (queue entries created during tests)

From cd0597eec50cb6a385a40bbeebbfa7a9724053f1 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 18:41:30 +0000
Subject: [PATCH 04/21] Remove idempotency verification and unused output_mode
 config

- Don't enforce idempotency, just document it as expected behavior
- Give lint formatters (black, ruff, prettier) as good examples
- Remove output_mode from config (not referenced elsewhere)
- Remove idempotency verification test scenarios
---
 doc/policy_syntax.md        |  6 +-----
 doc/policy_system_design.md |  3 +--
 doc/test_scenarios.md       | 18 +++++-------------
 3 files changed, 7 insertions(+), 20 deletions(-)
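
Note on the idempotency wording: with verification removed, the property is only documented, so a policy author may want to check a command by hand. The sketch below illustrates the property itself and is not part of deepwork. `strip_trailing_whitespace` is a toy stand-in for a formatter such as `black`, and `is_idempotent` and `append_newline` are hypothetical names introduced here.

```python
from typing import Callable


def strip_trailing_whitespace(text: str) -> str:
    """Toy formatter: remove trailing spaces and tabs from every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def append_newline(text: str) -> str:
    """Counter-example: grows the input on every run, so it is not idempotent."""
    return text + "\n"


def is_idempotent(fmt: Callable[[str], str], text: str) -> bool:
    """Return True if applying fmt to its own output changes nothing."""
    once = fmt(text)
    return fmt(once) == once
```

A command such as `ruff format {file}` behaves like the first function: the second run finds nothing left to change. A command that appends to a file on every run behaves like the second and would be a poor fit for a command action.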

diff --git a/doc/policy_syntax.md b/doc/policy_syntax.md
index 382e9669..4914a8da 100644
--- a/doc/policy_syntax.md
+++ b/doc/policy_syntax.md
@@ -237,11 +237,7 @@ action:
 
 **Idempotency Requirement:**
 
-Commands MUST be idempotent. The system verifies this by:
-1. Running the command
-2. Checking for changes
-3. If changes occurred, running again
-4. If more changes occur, marking as failed
+Commands should be idempotent: running them multiple times produces the same result as running them once. Formatters like `black`, `ruff format`, and `prettier` are good examples, since re-running them on already formatted files changes nothing.
 
 ## Pattern Syntax
 
diff --git a/doc/policy_system_design.md b/doc/policy_system_design.md
index 93f49896..d15e65be 100644
--- a/doc/policy_system_design.md
+++ b/doc/policy_system_design.md
@@ -74,7 +74,7 @@ action:
   run_for: each_match
 ```
 
-Command actions execute idempotent commands. The system verifies idempotency by running the command twice and checking that no additional changes occur.
+Command actions should be idempotent: running them multiple times produces the same result as running them once. Formatters like `black`, `ruff format`, and `prettier` are good examples.
 
 ## Architecture
 
@@ -471,7 +471,6 @@ In `.deepwork/config.yml`:
 policy:
   enabled: true
   policies_dir: .deepwork/policies  # Can be customized
-  output_mode: batched  # batched or individual
 ```
 
 ## Performance Considerations
diff --git a/doc/test_scenarios.md b/doc/test_scenarios.md
index 0651ad2e..9ef03c0a 100644
--- a/doc/test_scenarios.md
+++ b/doc/test_scenarios.md
@@ -199,22 +199,14 @@ action:
 | CMD-5.2.1 | Multiple files | `["a.js", "b.js", "c.js"]` | `eslint --fix a.js b.js c.js` |
 | CMD-5.2.2 | Single file | `["a.js"]` | `eslint --fix a.js` |
 
-### 5.3 Idempotency Verification
-
-| ID | Scenario | First Run | Second Run | Expected Result |
-|----|----------|-----------|------------|-----------------|
-| CMD-5.3.1 | Truly idempotent | Changes files | No changes | Pass |
-| CMD-5.3.2 | Not idempotent | Changes files | Changes files | Fail |
-| CMD-5.3.3 | No changes needed | No changes | (not run) | Pass |
-
-### 5.4 Command Errors
+### 5.3 Command Errors
 
 | ID | Scenario | Command Result | Expected |
 |----|----------|----------------|----------|
-| CMD-5.4.1 | Exit code 0 | Success | Pass |
-| CMD-5.4.2 | Exit code 1 | Failure | Fail, show stderr |
-| CMD-5.4.3 | Timeout | Command hangs | Fail, timeout error |
-| CMD-5.4.4 | Command not found | Not executable | Fail, not found error |
+| CMD-5.3.1 | Exit code 0 | Success | Pass |
+| CMD-5.3.2 | Exit code 1 | Failure | Fail, show stderr |
+| CMD-5.3.3 | Timeout | Command hangs | Fail, timeout error |
+| CMD-5.3.4 | Command not found | Not executable | Fail, not found error |
 
 ## 6. Queue System
 

From 4d9b5e95283c0fc26e6d3050df5ee0c85ace7cb0 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 18:53:10 +0000
Subject: [PATCH 05/21] Implement policy system v2 with sets, pairs, and
 command actions

This implements the redesigned policy system with:

- Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
- Action types: prompt (show instructions), command (run idempotent command)
- Variable pattern matching: {path} for multi-segment, {name} for single-segment
- Queue system in .deepwork/tmp/policy/queue/ for state tracking
- Frontmatter markdown format for policy files in .deepwork/policies/

New core modules:
- pattern_matcher.py: Variable pattern matching with regex
- policy_queue.py: Queue system for policy state persistence
- command_executor.py: Command action execution with substitution

Updates to existing modules:
- policy_parser.py: v2 Policy class with detection modes and action types
- policy_check.py: Uses new v2 system with queue deduplication
- evaluate_policies.py: Updated for v1 backward compatibility
- policy_schema.py: New frontmatter schema for v2 format

Tests updated to work with both v1 and v2 APIs.
---
 src/deepwork/core/command_executor.py   | 169 +++++++
 src/deepwork/core/pattern_matcher.py    | 271 +++++++++++
 src/deepwork/core/policy_parser.py      | 616 ++++++++++++++++++------
 src/deepwork/core/policy_queue.py       | 321 ++++++++++++
 src/deepwork/hooks/evaluate_policies.py |  44 +-
 src/deepwork/hooks/policy_check.py      | 229 +++++++--
 src/deepwork/schemas/policy_schema.py   | 112 ++++-
 tests/unit/test_evaluate_policies.py    |  10 +-
 tests/unit/test_policy_parser.py        |  79 ++-
 9 files changed, 1638 insertions(+), 213 deletions(-)
 create mode 100644 src/deepwork/core/command_executor.py
 create mode 100644 src/deepwork/core/pattern_matcher.py
 create mode 100644 src/deepwork/core/policy_queue.py

diff --git a/src/deepwork/core/command_executor.py b/src/deepwork/core/command_executor.py
new file mode 100644
index 00000000..7db8ee2a
--- /dev/null
+++ b/src/deepwork/core/command_executor.py
@@ -0,0 +1,169 @@
+"""Execute command actions for policies."""
+
+import subprocess
+from dataclasses import dataclass
+from pathlib import Path
+
+from deepwork.core.policy_parser import CommandAction
+
+
+@dataclass
+class CommandResult:
+    """Result of executing a command."""
+
+    success: bool
+    exit_code: int
+    stdout: str
+    stderr: str
+    command: str  # The actual command that was run
+
+
+def substitute_command_variables(
+    command_template: str,
+    file: str | None = None,
+    files: list[str] | None = None,
+    repo_root: Path | None = None,
+) -> str:
+    """
+    Substitute template variables in a command string.
+
+    Variables:
+    - {file} - Single file path
+    - {files} - Space-separated file paths
+    - {repo_root} - Repository root directory
+
+    Args:
+        command_template: Command string with {var} placeholders
+        file: Single file path (for run_for: each_match)
+        files: List of file paths (for run_for: all_matches)
+        repo_root: Repository root path
+
+    Returns:
+        Command string with variables substituted
+    """
+    result = command_template
+
+    if file is not None:
+        result = result.replace("{file}", file)
+
+    if files is not None:
+        result = result.replace("{files}", " ".join(files))
+
+    if repo_root is not None:
+        result = result.replace("{repo_root}", str(repo_root))
+
+    return result
+
+
+def execute_command(
+    command: str,
+    cwd: Path | None = None,
+    timeout: int = 60,
+) -> CommandResult:
+    """
+    Execute a command and capture output.
+
+    Args:
+        command: Command string to execute
+        cwd: Working directory (defaults to current directory)
+        timeout: Timeout in seconds
+
+    Returns:
+        CommandResult with execution details
+    """
+    try:
+        # Run command as shell to support pipes, etc.
+        result = subprocess.run(
+            command,
+            shell=True,
+            cwd=cwd,
+            capture_output=True,
+            text=True,
+            timeout=timeout,
+        )
+
+        return CommandResult(
+            success=result.returncode == 0,
+            exit_code=result.returncode,
+            stdout=result.stdout,
+            stderr=result.stderr,
+            command=command,
+        )
+
+    except subprocess.TimeoutExpired:
+        return CommandResult(
+            success=False,
+            exit_code=-1,
+            stdout="",
+            stderr=f"Command timed out after {timeout} seconds",
+            command=command,
+        )
+    except Exception as e:
+        return CommandResult(
+            success=False,
+            exit_code=-1,
+            stdout="",
+            stderr=str(e),
+            command=command,
+        )
+
+
+def run_command_action(
+    action: CommandAction,
+    trigger_files: list[str],
+    repo_root: Path | None = None,
+) -> list[CommandResult]:
+    """
+    Run a command action for the given trigger files.
+
+    Args:
+        action: CommandAction configuration
+        trigger_files: Files that triggered the policy
+        repo_root: Repository root path
+
+    Returns:
+        List of CommandResult (one per command execution)
+    """
+    results: list[CommandResult] = []
+
+    if action.run_for == "each_match":
+        # Run command for each file individually
+        for file_path in trigger_files:
+            command = substitute_command_variables(
+                action.command,
+                file=file_path,
+                repo_root=repo_root,
+            )
+            result = execute_command(command, cwd=repo_root)
+            results.append(result)
+
+    elif action.run_for == "all_matches":
+        # Run command once with all files
+        command = substitute_command_variables(
+            action.command,
+            files=trigger_files,
+            repo_root=repo_root,
+        )
+        result = execute_command(command, cwd=repo_root)
+        results.append(result)
+
+    return results
+
+
+def all_commands_succeeded(results: list[CommandResult]) -> bool:
+    """Check if all command executions succeeded."""
+    return all(r.success for r in results)
+
+
+def format_command_errors(results: list[CommandResult]) -> str:
+    """Format error messages from failed commands."""
+    errors: list[str] = []
+    for result in results:
+        if not result.success:
+            msg = f"Command failed: {result.command}\n"
+            if result.stderr:
+                msg += f"Error: {result.stderr}\n"
+            if result.exit_code != 0:
+                msg += f"Exit code: {result.exit_code}\n"
+            errors.append(msg)
+    return "\n".join(errors)
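Review note on `substitute_command_variables`: paths are substituted into the template verbatim and the result is run with `shell=True`, so a path containing spaces or shell metacharacters would be split or interpreted by the shell. A minimal sketch of one possible mitigation, assuming it is acceptable to quote every path with the standard library's `shlex.quote`; the function name `substitute_quoted` is hypothetical and not part of this patch:

```python
from __future__ import annotations

import shlex


def substitute_quoted(
    template: str,
    file: str | None = None,
    files: list[str] | None = None,
) -> str:
    """Hypothetical variant: shell-quote each path before substituting it."""
    result = template
    if file is not None:
        # A single path becomes one shell word, even if it contains spaces
        result = result.replace("{file}", shlex.quote(file))
    if files is not None:
        # Quote each path separately, then join with spaces
        result = result.replace("{files}", " ".join(shlex.quote(f) for f in files))
    return result


print(substitute_quoted("wc -l {file}", file="src/my module.py"))
print(substitute_quoted("wc -l {files}", files=["a.py", "b c.py"]))
```

Quoting at substitution time keeps the template itself free to use pipes and redirects, which appears to be the reason the patch runs commands through the shell.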
diff --git a/src/deepwork/core/pattern_matcher.py b/src/deepwork/core/pattern_matcher.py
new file mode 100644
index 00000000..215b1d9a
--- /dev/null
+++ b/src/deepwork/core/pattern_matcher.py
@@ -0,0 +1,271 @@
+"""Pattern matching with variable extraction for policy file correspondence."""
+
+import re
+from dataclasses import dataclass
+from fnmatch import fnmatch
+
+
+class PatternError(Exception):
+    """Exception raised for invalid pattern syntax."""
+
+    pass
+
+
+@dataclass
+class MatchResult:
+    """Result of matching a file against a pattern."""
+
+    matched: bool
+    variables: dict[str, str]  # Captured variable values
+
+    @classmethod
+    def no_match(cls) -> "MatchResult":
+        return cls(matched=False, variables={})
+
+    @classmethod
+    def match(cls, variables: dict[str, str] | None = None) -> "MatchResult":
+        return cls(matched=True, variables=variables or {})
+
+
+def validate_pattern(pattern: str) -> None:
+    """
+    Validate pattern syntax.
+
+    Raises:
+        PatternError: If pattern has invalid syntax
+    """
+    # Check for unbalanced braces
+    brace_depth = 0
+    for i, char in enumerate(pattern):
+        if char == "{":
+            brace_depth += 1
+        elif char == "}":
+            brace_depth -= 1
+            if brace_depth < 0:
+                raise PatternError(f"Unmatched closing brace at position {i}")
+
+    if brace_depth > 0:
+        raise PatternError("Unclosed brace in pattern")
+
+    # Extract and validate variable names
+    var_pattern = r"\{([^}]*)\}"
+    seen_vars: set[str] = set()
+
+    for match in re.finditer(var_pattern, pattern):
+        var_name = match.group(1)
+
+        # Check for empty variable name
+        if not var_name:
+            raise PatternError("Empty variable name in pattern")
+
+        # Strip leading ** or * for validation
+        clean_name = var_name.lstrip("*")
+        if not clean_name:
+            # Just {*} or {**} is valid
+            continue
+
+        # Check for invalid characters in variable name
+        if "/" in clean_name or "\\" in clean_name:
+            raise PatternError(f"Invalid character in variable name: {var_name}")
+
+        # Check for duplicates (use clean name for comparison)
+        if clean_name in seen_vars:
+            raise PatternError(f"Duplicate variable: {clean_name}")
+        seen_vars.add(clean_name)
+
+
+def pattern_to_regex(pattern: str) -> tuple[str, list[str]]:
+    """
+    Convert a pattern with {var} placeholders to a regex.
+
+    Variables:
+    - {path} or {**name} - Matches one or more path segments (.+)
+    - {name} or {*name} - Matches single path segment ([^/]+)
+
+    Args:
+        pattern: Pattern string like "src/{path}.py"
+
+    Returns:
+        Tuple of (regex_pattern, list_of_variable_names)
+
+    Raises:
+        PatternError: If pattern has invalid syntax
+    """
+    validate_pattern(pattern)
+
+    # Normalize path separators
+    pattern = pattern.replace("\\", "/")
+
+    result: list[str] = []
+    var_names: list[str] = []
+    pos = 0
+
+    # Parse pattern segments
+    while pos < len(pattern):
+        # Look for next variable
+        brace_start = pattern.find("{", pos)
+
+        if brace_start == -1:
+            # No more variables, escape the rest
+            result.append(re.escape(pattern[pos:]))
+            break
+
+        # Escape literal part before variable
+        if brace_start > pos:
+            result.append(re.escape(pattern[pos:brace_start]))
+
+        # Find end of variable
+        brace_end = pattern.find("}", brace_start)
+        if brace_end == -1:
+            raise PatternError("Unclosed brace in pattern")
+
+        var_spec = pattern[brace_start + 1 : brace_end]
+
+        # Determine variable type and name
+        if var_spec.startswith("**"):
+            # Explicit multi-segment: {**name}
+            var_name = var_spec[2:] or "path"
+            regex_part = "(?P<{}>.+)".format(re.escape(var_name))
+        elif var_spec.startswith("*"):
+            # Explicit single-segment: {*name}
+            var_name = var_spec[1:] or "name"
+            regex_part = "(?P<{}>[^/]+)".format(re.escape(var_name))
+        elif var_spec == "path":
+            # Conventional multi-segment
+            var_name = "path"
+            regex_part = "(?P<path>.+)"
+        else:
+            # Default single-segment (including custom names)
+            var_name = var_spec
+            regex_part = "(?P<{}>[^/]+)".format(re.escape(var_name))
+
+        result.append(regex_part)
+        var_names.append(var_name)
+        pos = brace_end + 1
+
+    return "^" + "".join(result) + "$", var_names
+
+
+def match_pattern(pattern: str, filepath: str) -> MatchResult:
+    """
+    Match a filepath against a pattern, extracting variables.
+
+    Args:
+        pattern: Pattern with {var} placeholders
+        filepath: File path to match
+
+    Returns:
+        MatchResult with matched=True and captured variables, or matched=False
+    """
+    # Normalize path separators
+    filepath = filepath.replace("\\", "/")
+
+    try:
+        regex, _ = pattern_to_regex(pattern)
+        match = re.fullmatch(regex, filepath)
+    except (PatternError, re.error):
+        # Invalid syntax, or a variable name that is not a valid regex group name
+        return MatchResult.no_match()
+    if match:
+        return MatchResult.match(match.groupdict())
+    return MatchResult.no_match()
+
+
+def resolve_pattern(pattern: str, variables: dict[str, str]) -> str:
+    """
+    Substitute variables into a pattern to generate a filepath.
+
+    Args:
+        pattern: Pattern with {var} placeholders
+        variables: Dict of variable name -> value
+
+    Returns:
+        Resolved filepath string
+    """
+    result = pattern
+    for name, value in variables.items():
+        # Handle both {name} and {*name} / {**name} forms
+        result = result.replace(f"{{{name}}}", value)
+        result = result.replace(f"{{*{name}}}", value)
+        result = result.replace(f"{{**{name}}}", value)
+    return result
+
+
+def matches_glob(file_path: str, pattern: str) -> bool:
+    """
+    Match a file path against a glob pattern, supporting ** for recursive matching.
+
+    This is for simple glob patterns without variable capture.
+
+    Args:
+        file_path: File path to check
+        pattern: Glob pattern (supports *, **, ?)
+
+    Returns:
+        True if matches
+    """
+    # Normalize path separators
+    file_path = file_path.replace("\\", "/")
+    pattern = pattern.replace("\\", "/")
+
+    # Handle ** patterns (recursive directory matching)
+    if "**" in pattern:
+        # Split pattern by **
+        parts = pattern.split("**")
+
+        if len(parts) == 2:
+            prefix, suffix = parts[0], parts[1]
+
+            # Remove leading slashes from suffix
+            suffix = suffix.lstrip("/")
+
+            # Check if prefix matches the start of the path
+            if prefix:
+                prefix = prefix.rstrip("/")
+                if not file_path.startswith(prefix + "/") and file_path != prefix:
+                    return False
+                # Get the remaining path after prefix
+                remaining = file_path[len(prefix) :].lstrip("/")
+            else:
+                remaining = file_path
+
+            # If no suffix, any remaining path matches
+            if not suffix:
+                return True
+
+            # Check if suffix matches any trailing portion of the remaining path
+            remaining_parts = remaining.split("/")
+            for i in range(len(remaining_parts)):
+                test_path = "/".join(remaining_parts[i:])
+                if fnmatch(test_path, suffix):
+                    return True
+                # Also try just the filename
+                if fnmatch(remaining_parts[-1], suffix):
+                    return True
+
+            return False
+
+    # Simple pattern without **
+    return fnmatch(file_path, pattern)
+
+
+def matches_any_pattern(file_path: str, patterns: list[str]) -> bool:
+    """
+    Check if a file path matches any of the given glob patterns.
+
+    Args:
+        file_path: File path to check (relative path)
+        patterns: List of glob patterns to match against
+
+    Returns:
+        True if the file matches any pattern
+    """
+    for pattern in patterns:
+        if matches_glob(file_path, pattern):
+            return True
+    return False
+
+
+def has_variables(pattern: str) -> bool:
+    """Check if a pattern contains variable placeholders."""
+    return "{" in pattern and "}" in pattern
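Review note on how the pieces of `pattern_matcher.py` fit together: a changed file is matched against one pattern to capture variables, and the captured values are substituted into a sibling pattern to get the path that is expected to change as well. The sketch below is a condensed, standalone approximation of that flow under the same `{path}` / `{name}` / `{*name}` / `{**name}` rules; `to_regex` and `resolve` are hypothetical names and skip the validation done in the patch:

```python
import re


def to_regex(pattern: str) -> str:
    """Translate a {var} pattern into a regex with named groups."""
    out = []
    # The capture group in re.split keeps the {var} tokens in the result
    for part in re.split(r"(\{[^}]*\})", pattern):
        if part.startswith("{") and part.endswith("}"):
            spec = part[1:-1]
            if spec.startswith("**"):
                name, body = spec[2:] or "path", ".+"
            elif spec.startswith("*"):
                name, body = spec[1:] or "name", "[^/]+"
            elif spec == "path":
                name, body = "path", ".+"
            else:
                name, body = spec, "[^/]+"
            out.append(f"(?P<{name}>{body})")
        else:
            out.append(re.escape(part))
    return "".join(out)


def resolve(pattern: str, variables: dict[str, str]) -> str:
    """Fill captured values back into a sibling pattern."""
    for name, value in variables.items():
        pattern = pattern.replace("{" + name + "}", value)
    return pattern


m = re.fullmatch(to_regex("src/{path}.py"), "src/core/util.py")
assert m is not None
print(m.groupdict())
print(resolve("tests/{path}_test.py", m.groupdict()))
```

With a set of `src/{path}.py` and `tests/{path}_test.py`, a change to `src/core/util.py` therefore implies an expected change to `tests/core/util_test.py`, which is the comparison the set and pair evaluators perform against the changed-file list.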
diff --git a/src/deepwork/core/policy_parser.py b/src/deepwork/core/policy_parser.py
index b6ade990..f1c5a288 100644
--- a/src/deepwork/core/policy_parser.py
+++ b/src/deepwork/core/policy_parser.py
@@ -1,13 +1,19 @@
-"""Policy definition parser."""
+"""Policy definition parser (v2 - frontmatter markdown format)."""
 
 from dataclasses import dataclass, field
-from fnmatch import fnmatch
+from enum import Enum
 from pathlib import Path
 from typing import Any
 
 import yaml
 
-from deepwork.schemas.policy_schema import POLICY_SCHEMA
+from deepwork.core.pattern_matcher import (
+    has_variables,
+    match_pattern,
+    matches_any_pattern,
+    resolve_pattern,
+)
+from deepwork.schemas.policy_schema import POLICY_FRONTMATTER_SCHEMA, POLICY_SCHEMA
 from deepwork.utils.validation import ValidationError, validate_against_schema
 
 
@@ -17,175 +23,309 @@ class PolicyParseError(Exception):
     pass
 
 
+class DetectionMode(Enum):
+    """How the policy detects when to fire."""
+
+    TRIGGER_SAFETY = "trigger_safety"  # Fire when trigger matches, safety doesn't
+    SET = "set"  # Bidirectional file correspondence
+    PAIR = "pair"  # Directional file correspondence
+
+
+class ActionType(Enum):
+    """What happens when the policy fires."""
+
+    PROMPT = "prompt"  # Show instructions to agent (default)
+    COMMAND = "command"  # Run an idempotent command
+
+
 # Valid compare_to values
 COMPARE_TO_VALUES = frozenset({"base", "default_tip", "prompt"})
 DEFAULT_COMPARE_TO = "base"
 
 
+@dataclass
+class CommandAction:
+    """Configuration for command action."""
+
+    command: str  # Command template (supports {file}, {files}, {repo_root})
+    run_for: str = "each_match"  # "each_match" or "all_matches"
+
+
+@dataclass
+class PairConfig:
+    """Configuration for pair detection mode."""
+
+    trigger: str  # Pattern that triggers
+    expects: list[str]  # Patterns for expected corresponding files
+
+
 @dataclass
 class Policy:
-    """Represents a single policy definition."""
+    """Represents a single policy definition (v2 format)."""
 
-    name: str
-    triggers: list[str]  # Normalized to list
-    safety: list[str] = field(default_factory=list)  # Normalized to list, empty if not specified
-    instructions: str = ""  # Resolved content (either inline or from file)
-    compare_to: str = DEFAULT_COMPARE_TO  # What to compare against: base, default_tip, or prompt
+    # Identity
+    name: str  # Human-friendly name (displayed in promise tags)
+    filename: str  # Filename without .md extension (used for queue)
+
+    # Detection mode (exactly one must be set)
+    detection_mode: DetectionMode
+    triggers: list[str] = field(default_factory=list)  # For TRIGGER_SAFETY mode
+    safety: list[str] = field(default_factory=list)  # For TRIGGER_SAFETY mode
+    set_patterns: list[str] = field(default_factory=list)  # For SET mode
+    pair_config: PairConfig | None = None  # For PAIR mode
+
+    # Action type
+    action_type: ActionType = ActionType.PROMPT
+    instructions: str = ""  # For PROMPT action (markdown body)
+    command_action: CommandAction | None = None  # For COMMAND action
+
+    # Common options
+    compare_to: str = DEFAULT_COMPARE_TO
 
     @classmethod
-    def from_dict(cls, data: dict[str, Any], base_dir: Path | None = None) -> "Policy":
+    def from_frontmatter(
+        cls,
+        frontmatter: dict[str, Any],
+        markdown_body: str,
+        filename: str,
+    ) -> "Policy":
         """
-        Create Policy from dictionary.
+        Create Policy from parsed frontmatter and markdown body.
 
         Args:
-            data: Parsed YAML data for a single policy
-            base_dir: Base directory for resolving instructions_file paths
+            frontmatter: Parsed YAML frontmatter
+            markdown_body: Markdown content after frontmatter
+            filename: Filename without .md extension
 
         Returns:
             Policy instance
 
         Raises:
-            PolicyParseError: If instructions cannot be resolved
+            PolicyParseError: If validation fails
         """
-        # Normalize trigger to list
-        trigger = data["trigger"]
-        triggers = [trigger] if isinstance(trigger, str) else list(trigger)
-
-        # Normalize safety to list (empty if not present)
-        safety_data = data.get("safety", [])
-        safety = [safety_data] if isinstance(safety_data, str) else list(safety_data)
+        # Get name (required)
+        name = frontmatter.get("name", "")
+        if not name:
+            raise PolicyParseError(f"Policy '{filename}' missing required 'name' field")
+
+        # Determine detection mode
+        has_trigger = "trigger" in frontmatter
+        has_set = "set" in frontmatter
+        has_pair = "pair" in frontmatter
+
+        mode_count = sum([has_trigger, has_set, has_pair])
+        if mode_count == 0:
+            raise PolicyParseError(
+                f"Policy '{name}' must have 'trigger', 'set', or 'pair'"
+            )
+        if mode_count > 1:
+            raise PolicyParseError(
+                f"Policy '{name}' has multiple detection modes - use only one"
+            )
 
-        # Resolve instructions
-        if "instructions" in data:
-            instructions = data["instructions"]
-        elif "instructions_file" in data:
-            if base_dir is None:
+        # Parse based on detection mode
+        detection_mode: DetectionMode
+        triggers: list[str] = []
+        safety: list[str] = []
+        set_patterns: list[str] = []
+        pair_config: PairConfig | None = None
+
+        if has_trigger:
+            detection_mode = DetectionMode.TRIGGER_SAFETY
+            trigger = frontmatter["trigger"]
+            triggers = [trigger] if isinstance(trigger, str) else list(trigger)
+            safety_data = frontmatter.get("safety", [])
+            safety = [safety_data] if isinstance(safety_data, str) else list(safety_data)
+
+        elif has_set:
+            detection_mode = DetectionMode.SET
+            set_patterns = list(frontmatter["set"])
+            if len(set_patterns) < 2:
                 raise PolicyParseError(
-                    f"Policy '{data['name']}' uses instructions_file but no base_dir provided"
+                    f"Policy '{name}' set requires at least 2 patterns"
                 )
-            instructions_path = base_dir / data["instructions_file"]
-            if not instructions_path.exists():
+
+        elif has_pair:
+            detection_mode = DetectionMode.PAIR
+            pair_data = frontmatter["pair"]
+            expects = pair_data["expects"]
+            expects_list = [expects] if isinstance(expects, str) else list(expects)
+            pair_config = PairConfig(
+                trigger=pair_data["trigger"],
+                expects=expects_list,
+            )
+
+        # Determine action type
+        action_type: ActionType
+        command_action: CommandAction | None = None
+
+        if "action" in frontmatter:
+            action_type = ActionType.COMMAND
+            action_data = frontmatter["action"]
+            command_action = CommandAction(
+                command=action_data["command"],
+                run_for=action_data.get("run_for", "each_match"),
+            )
+        else:
+            action_type = ActionType.PROMPT
+            # Markdown body is the instructions
+            if not markdown_body.strip():
                 raise PolicyParseError(
-                    f"Policy '{data['name']}' instructions file not found: {instructions_path}"
+                    f"Policy '{name}' with prompt action requires markdown body"
                 )
-            try:
-                instructions = instructions_path.read_text()
-            except Exception as e:
-                raise PolicyParseError(
-                    f"Policy '{data['name']}' failed to read instructions file: {e}"
-                ) from e
-        else:
-            # Schema should catch this, but be defensive
-            raise PolicyParseError(
-                f"Policy '{data['name']}' must have either 'instructions' or 'instructions_file'"
-            )
 
-        # Get compare_to (defaults to DEFAULT_COMPARE_TO)
-        compare_to = data.get("compare_to", DEFAULT_COMPARE_TO)
+        # Get compare_to
+        compare_to = frontmatter.get("compare_to", DEFAULT_COMPARE_TO)
 
         return cls(
-            name=data["name"],
+            name=name,
+            filename=filename,
+            detection_mode=detection_mode,
             triggers=triggers,
             safety=safety,
-            instructions=instructions,
+            set_patterns=set_patterns,
+            pair_config=pair_config,
+            action_type=action_type,
+            instructions=markdown_body.strip(),
+            command_action=command_action,
             compare_to=compare_to,
         )
 
 
-def matches_pattern(file_path: str, patterns: list[str]) -> bool:
+def parse_frontmatter_file(filepath: Path) -> tuple[dict[str, Any], str]:
     """
-    Check if a file path matches any of the given glob patterns.
+    Parse a markdown file with YAML frontmatter.
 
     Args:
-        file_path: File path to check (relative path)
-        patterns: List of glob patterns to match against
+        filepath: Path to .md file
 
     Returns:
-        True if the file matches any pattern
+        Tuple of (frontmatter_dict, markdown_body)
+
+    Raises:
+        PolicyParseError: If parsing fails
     """
-    for pattern in patterns:
-        if _matches_glob(file_path, pattern):
-            return True
-    return False
+    try:
+        content = filepath.read_text(encoding="utf-8")
+    except OSError as e:
+        raise PolicyParseError(f"Failed to read policy file: {e}") from e
+
+    # Split frontmatter from body
+    if not content.startswith("---"):
+        raise PolicyParseError(
+            f"Policy file '{filepath.name}' must start with '---' frontmatter delimiter"
+        )
+
+    # Find end of frontmatter
+    end_marker = content.find("\n---", 3)
+    if end_marker == -1:
+        raise PolicyParseError(
+            f"Policy file '{filepath.name}' missing closing '---' frontmatter delimiter"
+        )
+
+    frontmatter_str = content[4:end_marker]  # Skip initial "---\n"
+    markdown_body = content[end_marker + 4 :]  # Skip "\n---"; leading newline is stripped later
+
+    # Parse YAML frontmatter
+    try:
+        frontmatter = yaml.safe_load(frontmatter_str)
+    except yaml.YAMLError as e:
+        raise PolicyParseError(
+            f"Invalid YAML frontmatter in '{filepath.name}': {e}"
+        ) from e
 
+    if frontmatter is None:
+        frontmatter = {}
 
-def _matches_glob(file_path: str, pattern: str) -> bool:
+    if not isinstance(frontmatter, dict):
+        raise PolicyParseError(
+            f"Frontmatter in '{filepath.name}' must be a mapping, got {type(frontmatter).__name__}"
+        )
+
+    return frontmatter, markdown_body
+
+
+def parse_policy_file_v2(filepath: Path) -> Policy:
     """
-    Match a file path against a glob pattern, supporting ** for recursive matching.
+    Parse a single policy from a frontmatter markdown file.
 
     Args:
-        file_path: File path to check
-        pattern: Glob pattern (supports *, **, ?)
+        filepath: Path to .md file in .deepwork/policies/
 
     Returns:
-        True if matches
-    """
-    # Normalize path separators
-    file_path = file_path.replace("\\", "/")
-    pattern = pattern.replace("\\", "/")
-
-    # Handle ** patterns (recursive directory matching)
-    if "**" in pattern:
-        # Split pattern by **
-        parts = pattern.split("**")
-
-        if len(parts) == 2:
-            prefix, suffix = parts[0], parts[1]
-
-            # Remove leading/trailing slashes from suffix
-            suffix = suffix.lstrip("/")
-
-            # Check if prefix matches the start of the path
-            if prefix:
-                prefix = prefix.rstrip("/")
-                if not file_path.startswith(prefix + "/") and file_path != prefix:
-                    return False
-                # Get the remaining path after prefix
-                remaining = file_path[len(prefix) :].lstrip("/")
-            else:
-                remaining = file_path
-
-            # If no suffix, any remaining path matches
-            if not suffix:
-                return True
-
-            # Check if suffix matches the end of any remaining path segment
-            # For pattern "src/**/*.py", suffix is "*.py"
-            # We need to match *.py against the filename portion
-            remaining_parts = remaining.split("/")
-            for i in range(len(remaining_parts)):
-                test_path = "/".join(remaining_parts[i:])
-                if fnmatch(test_path, suffix):
-                    return True
-                # Also try just the filename
-                if fnmatch(remaining_parts[-1], suffix):
-                    return True
-
-            return False
-
-    # Simple pattern without **
-    return fnmatch(file_path, pattern)
-
-
-def evaluate_policy(policy: Policy, changed_files: list[str]) -> bool:
+        Parsed Policy object
+
+    Raises:
+        PolicyParseError: If parsing or validation fails
     """
-    Evaluate whether a policy should fire based on changed files.
+    if not filepath.exists():
+        raise PolicyParseError(f"Policy file does not exist: {filepath}")
 
-    A policy fires if:
-    - At least one changed file matches a trigger pattern
-    - AND no changed file matches a safety pattern
+    if not filepath.is_file():
+        raise PolicyParseError(f"Policy path is not a file: {filepath}")
+
+    frontmatter, markdown_body = parse_frontmatter_file(filepath)
+
+    # Validate against schema
+    try:
+        validate_against_schema(frontmatter, POLICY_FRONTMATTER_SCHEMA)
+    except ValidationError as e:
+        raise PolicyParseError(
+            f"Policy '{filepath.name}' validation failed: {e}"
+        ) from e
+
+    # Create Policy object
+    filename = filepath.stem  # filename without .md extension
+    return Policy.from_frontmatter(frontmatter, markdown_body, filename)
+
+
+def load_policies_from_directory(policies_dir: Path) -> list[Policy]:
+    """
+    Load all policies from a directory.
 
     Args:
-        policy: Policy to evaluate
-        changed_files: List of changed file paths (relative)
+        policies_dir: Path to .deepwork/policies/ directory
 
     Returns:
-        True if the policy should fire
+        List of parsed Policy objects (sorted by filename)
+
+    Raises:
+        PolicyParseError: If any policy file fails to parse
+    """
+    if not policies_dir.exists():
+        return []
+
+    if not policies_dir.is_dir():
+        raise PolicyParseError(f"Policies path is not a directory: {policies_dir}")
+
+    policies = []
+    for filepath in sorted(policies_dir.glob("*.md")):
+        policy = parse_policy_file_v2(filepath)
+        policies.append(policy)
+
+    return policies
+
+
+# =============================================================================
+# Evaluation Logic
+# =============================================================================
+
+
+def evaluate_trigger_safety(
+    policy: Policy,
+    changed_files: list[str],
+) -> bool:
+    """
+    Evaluate a trigger/safety mode policy.
+
+    Returns True if policy should fire:
+    - At least one changed file matches a trigger pattern
+    - AND no changed file matches a safety pattern
     """
     # Check if any trigger matches
     trigger_matched = False
     for file_path in changed_files:
-        if matches_pattern(file_path, policy.triggers):
+        if matches_any_pattern(file_path, policy.triggers):
             trigger_matched = True
             break
 
@@ -195,18 +335,165 @@ def evaluate_policy(policy: Policy, changed_files: list[str]) -> bool:
     # Check if any safety pattern matches
     if policy.safety:
         for file_path in changed_files:
-            if matches_pattern(file_path, policy.safety):
-                # Safety file was also changed, don't fire
+            if matches_any_pattern(file_path, policy.safety):
                 return False
 
     return True
 
 
+def evaluate_set_correspondence(
+    policy: Policy,
+    changed_files: list[str],
+) -> tuple[bool, list[str], list[str]]:
+    """
+    Evaluate a set (bidirectional correspondence) policy.
+
+    Returns:
+        Tuple of (should_fire, trigger_files, missing_files)
+        - should_fire: True if correspondence is incomplete
+        - trigger_files: Files that triggered (matched a pattern)
+        - missing_files: Expected files that didn't change
+    """
+    trigger_files: list[str] = []
+    missing_files: list[str] = []
+    changed_set = set(changed_files)
+
+    for file_path in changed_files:
+        # Check each pattern in the set
+        for pattern in policy.set_patterns:
+            result = match_pattern(pattern, file_path)
+            if result.matched:
+                trigger_files.append(file_path)
+
+                # Check if all other corresponding files also changed
+                for other_pattern in policy.set_patterns:
+                    if other_pattern == pattern:
+                        continue
+
+                    if has_variables(other_pattern):
+                        expected = resolve_pattern(other_pattern, result.variables)
+                    else:
+                        expected = other_pattern
+
+                    if expected not in changed_set:
+                        if expected not in missing_files:
+                            missing_files.append(expected)
+
+                break  # Only match one pattern per file
+
+    # Policy fires if there are trigger files with missing correspondences
+    should_fire = len(trigger_files) > 0 and len(missing_files) > 0
+    return should_fire, trigger_files, missing_files
+
+
+def evaluate_pair_correspondence(
+    policy: Policy,
+    changed_files: list[str],
+) -> tuple[bool, list[str], list[str]]:
+    """
+    Evaluate a pair (directional correspondence) policy.
+
+    Only trigger-side changes require corresponding expected files.
+    Expected-side changes alone do not trigger.
+
+    Returns:
+        Tuple of (should_fire, trigger_files, missing_files)
+    """
+    if policy.pair_config is None:
+        return False, [], []
+
+    trigger_files: list[str] = []
+    missing_files: list[str] = []
+    changed_set = set(changed_files)
+
+    trigger_pattern = policy.pair_config.trigger
+    expects_patterns = policy.pair_config.expects
+
+    for file_path in changed_files:
+        # Only check trigger pattern (directional)
+        result = match_pattern(trigger_pattern, file_path)
+        if result.matched:
+            trigger_files.append(file_path)
+
+            # Check if all expected files also changed
+            for expects_pattern in expects_patterns:
+                if has_variables(expects_pattern):
+                    expected = resolve_pattern(expects_pattern, result.variables)
+                else:
+                    expected = expects_pattern
+
+                if expected not in changed_set:
+                    if expected not in missing_files:
+                        missing_files.append(expected)
+
+    should_fire = len(trigger_files) > 0 and len(missing_files) > 0
+    return should_fire, trigger_files, missing_files
+
+
+@dataclass
+class PolicyEvaluationResult:
+    """Result of evaluating a single policy."""
+
+    policy: Policy
+    should_fire: bool
+    trigger_files: list[str] = field(default_factory=list)
+    missing_files: list[str] = field(default_factory=list)  # For set/pair modes
+
+
+def evaluate_policy(policy: Policy, changed_files: list[str]) -> PolicyEvaluationResult:
+    """
+    Evaluate whether a policy should fire based on changed files.
+
+    Args:
+        policy: Policy to evaluate
+        changed_files: List of changed file paths (relative)
+
+    Returns:
+        PolicyEvaluationResult with evaluation details
+    """
+    if policy.detection_mode == DetectionMode.TRIGGER_SAFETY:
+        should_fire = evaluate_trigger_safety(policy, changed_files)
+        trigger_files = (
+            [f for f in changed_files if matches_any_pattern(f, policy.triggers)]
+            if should_fire
+            else []
+        )
+        return PolicyEvaluationResult(
+            policy=policy,
+            should_fire=should_fire,
+            trigger_files=trigger_files,
+        )
+
+    elif policy.detection_mode == DetectionMode.SET:
+        should_fire, trigger_files, missing_files = evaluate_set_correspondence(
+            policy, changed_files
+        )
+        return PolicyEvaluationResult(
+            policy=policy,
+            should_fire=should_fire,
+            trigger_files=trigger_files,
+            missing_files=missing_files,
+        )
+
+    elif policy.detection_mode == DetectionMode.PAIR:
+        should_fire, trigger_files, missing_files = evaluate_pair_correspondence(
+            policy, changed_files
+        )
+        return PolicyEvaluationResult(
+            policy=policy,
+            should_fire=should_fire,
+            trigger_files=trigger_files,
+            missing_files=missing_files,
+        )
+
+    return PolicyEvaluationResult(policy=policy, should_fire=False)
+
+
 def evaluate_policies(
     policies: list[Policy],
     changed_files: list[str],
     promised_policies: set[str] | None = None,
-) -> list[Policy]:
+) -> list[PolicyEvaluationResult]:
     """
     Evaluate which policies should fire.
 
@@ -214,40 +501,91 @@ def evaluate_policies(
         policies: List of policies to evaluate
         changed_files: List of changed file paths (relative)
         promised_policies: Set of policy names that have been marked as addressed
-                          via <promise> tags (these are skipped)
+                          via <promise> tags (skipped; matched case-insensitively)
 
     Returns:
-        List of policies that should fire (trigger matches, no safety match, not promised)
+        List of PolicyEvaluationResult for policies that should fire
     """
     if promised_policies is None:
         promised_policies = set()
 
-    fired_policies = []
+    # Normalize promised names for case-insensitive comparison
+    promised_lower = {name.lower() for name in promised_policies}
+
+    results = []
     for policy in policies:
-        # Skip if already promised/addressed
-        if policy.name in promised_policies:
+        # Skip if already promised/addressed (case-insensitive)
+        if policy.name.lower() in promised_lower:
             continue
 
-        if evaluate_policy(policy, changed_files):
-            fired_policies.append(policy)
+        result = evaluate_policy(policy, changed_files)
+        if result.should_fire:
+            results.append(result)
+
+    return results
 
-    return fired_policies
 
+# =============================================================================
+# Legacy v1 Support (for migration)
+# =============================================================================
+
+
+@dataclass
+class PolicyV1:
+    """Legacy v1 policy format (from .deepwork.policy.yml)."""
 
-def parse_policy_file(policy_path: Path | str, base_dir: Path | None = None) -> list[Policy]:
+    name: str
+    triggers: list[str]
+    safety: list[str] = field(default_factory=list)
+    instructions: str = ""
+    compare_to: str = DEFAULT_COMPARE_TO
+
+    @classmethod
+    def from_dict(cls, data: dict[str, Any], base_dir: Path | None = None) -> "PolicyV1":
+        """Create PolicyV1 from dictionary (legacy format)."""
+        trigger = data["trigger"]
+        triggers = [trigger] if isinstance(trigger, str) else list(trigger)
+
+        safety_data = data.get("safety", [])
+        safety = [safety_data] if isinstance(safety_data, str) else list(safety_data)
+
+        if "instructions" in data:
+            instructions = data["instructions"]
+        elif "instructions_file" in data:
+            if base_dir is None:
+                raise PolicyParseError(
+                    f"Policy '{data['name']}' uses instructions_file but no base_dir provided"
+                )
+            instructions_path = base_dir / data["instructions_file"]
+            if not instructions_path.exists():
+                raise PolicyParseError(
+                    f"Policy '{data['name']}' instructions file not found: {instructions_path}"
+                )
+            instructions = instructions_path.read_text(encoding="utf-8")
+        else:
+            raise PolicyParseError(
+                f"Policy '{data['name']}' must have 'instructions' or 'instructions_file'"
+            )
+
+        return cls(
+            name=data["name"],
+            triggers=triggers,
+            safety=safety,
+            instructions=instructions,
+            compare_to=data.get("compare_to", DEFAULT_COMPARE_TO),
+        )
+
+
+def parse_policy_file(policy_path: Path | str, base_dir: Path | None = None) -> list[PolicyV1]:
     """
-    Parse policy definitions from a YAML file.
+    Parse policy definitions from a YAML file (legacy v1 format).
 
     Args:
         policy_path: Path to .deepwork.policy.yml file
-        base_dir: Base directory for resolving instructions_file paths.
-                  Defaults to the directory containing the policy file.
+        base_dir: Base dir for instructions_file paths (default: policy file's directory)
 
     Returns:
-        List of parsed Policy objects
-
-    Raises:
-        PolicyParseError: If parsing fails or validation errors occur
+        List of parsed PolicyV1 objects
     """
     policy_path = Path(policy_path)
 
@@ -257,11 +595,9 @@ def parse_policy_file(policy_path: Path | str, base_dir: Path | None = None) ->
     if not policy_path.is_file():
         raise PolicyParseError(f"Policy path is not a file: {policy_path}")
 
-    # Default base_dir to policy file's directory
     if base_dir is None:
         base_dir = policy_path.parent
 
-    # Load YAML (policies are stored as a list, not a dict)
     try:
         with open(policy_path, encoding="utf-8") as f:
             policy_data = yaml.safe_load(f)
@@ -270,26 +606,22 @@ def parse_policy_file(policy_path: Path | str, base_dir: Path | None = None) ->
     except OSError as e:
         raise PolicyParseError(f"Failed to read policy file: {e}") from e
 
-    # Handle empty file or null content
     if policy_data is None:
         return []
 
-    # Validate it's a list (schema expects array)
     if not isinstance(policy_data, list):
         raise PolicyParseError(
             f"Policy file must contain a list of policies, got {type(policy_data).__name__}"
         )
 
-    # Validate against schema
     try:
         validate_against_schema(policy_data, POLICY_SCHEMA)
     except ValidationError as e:
         raise PolicyParseError(f"Policy definition validation failed: {e}") from e
 
-    # Parse into dataclasses
     policies = []
     for policy_item in policy_data:
-        policy = Policy.from_dict(policy_item, base_dir)
+        policy = PolicyV1.from_dict(policy_item, base_dir)
         policies.append(policy)
 
     return policies
diff --git a/src/deepwork/core/policy_queue.py b/src/deepwork/core/policy_queue.py
new file mode 100644
index 00000000..44046832
--- /dev/null
+++ b/src/deepwork/core/policy_queue.py
@@ -0,0 +1,321 @@
+"""Queue system for tracking policy state in .deepwork/tmp/policy/queue/."""
+
+import hashlib
+import json
+from dataclasses import asdict, dataclass, field
+from datetime import datetime, timezone
+from enum import Enum
+from pathlib import Path
+from typing import Any
+
+
+class QueueEntryStatus(Enum):
+    """Status of a queue entry."""
+
+    QUEUED = "queued"  # Detected, awaiting evaluation
+    PASSED = "passed"  # Evaluated, policy satisfied (promise found or action succeeded)
+    FAILED = "failed"  # Evaluated, policy not satisfied
+    SKIPPED = "skipped"  # Safety pattern matched, skipped
+
+
+@dataclass
+class ActionResult:
+    """Result of executing a policy action."""
+
+    type: str  # "prompt" or "command"
+    output: str | None = None  # Command stdout or prompt message shown
+    exit_code: int | None = None  # Command exit code (None for prompt)
+
+
+@dataclass
+class QueueEntry:
+    """A single entry in the policy queue."""
+
+    # Identity
+    policy_name: str  # Human-friendly name
+    policy_file: str  # Filename (e.g., "source-test-pairing.md")
+    trigger_hash: str  # Hash for deduplication
+
+    # State
+    status: QueueEntryStatus = QueueEntryStatus.QUEUED
+    created_at: str = ""  # ISO8601 timestamp
+    evaluated_at: str | None = None  # ISO8601 timestamp
+
+    # Context
+    baseline_ref: str = ""  # Commit hash or timestamp used as baseline
+    trigger_files: list[str] = field(default_factory=list)
+    expected_files: list[str] = field(default_factory=list)  # For set/pair modes
+    matched_files: list[str] = field(default_factory=list)  # Files that also changed
+
+    # Result
+    action_result: ActionResult | None = None
+
+    def __post_init__(self) -> None:
+        if not self.created_at:
+            self.created_at = datetime.now(timezone.utc).isoformat()
+
+    def to_dict(self) -> dict[str, Any]:
+        """Convert to dictionary for JSON serialization."""
+        data = asdict(self)
+        data["status"] = self.status.value
+        if self.action_result:
+            data["action_result"] = asdict(self.action_result)
+        return data
+
+    @classmethod
+    def from_dict(cls, data: dict[str, Any]) -> "QueueEntry":
+        """Create from dictionary."""
+        action_result = None
+        if data.get("action_result"):
+            action_result = ActionResult(**data["action_result"])
+
+        return cls(
+            policy_name=data["policy_name"],
+            policy_file=data["policy_file"],
+            trigger_hash=data["trigger_hash"],
+            status=QueueEntryStatus(data["status"]),
+            created_at=data.get("created_at", ""),
+            evaluated_at=data.get("evaluated_at"),
+            baseline_ref=data.get("baseline_ref", ""),
+            trigger_files=data.get("trigger_files", []),
+            expected_files=data.get("expected_files", []),
+            matched_files=data.get("matched_files", []),
+            action_result=action_result,
+        )
+
+
+def compute_trigger_hash(
+    policy_name: str,
+    trigger_files: list[str],
+    baseline_ref: str,
+) -> str:
+    """
+    Compute a hash for deduplication.
+
+    The hash is based on:
+    - Policy name
+    - Sorted list of trigger files
+    - Baseline reference (commit hash or timestamp)
+
+    Returns:
+        12-character hex hash
+    """
+    hash_input = f"{policy_name}:{sorted(trigger_files)}:{baseline_ref}"
+    return hashlib.sha256(hash_input.encode()).hexdigest()[:12]
+
+
+class PolicyQueue:
+    """
+    Manages the policy queue in .deepwork/tmp/policy/queue/.
+
+    Queue entries are stored as JSON files named {hash}.{status}.json.
+    """
+
+    def __init__(self, queue_dir: Path | None = None):
+        """
+        Initialize the queue.
+
+        Args:
+            queue_dir: Path to queue directory. Defaults to .deepwork/tmp/policy/queue/
+        """
+        if queue_dir is None:
+            queue_dir = Path(".deepwork/tmp/policy/queue")
+        self.queue_dir = queue_dir
+
+    def _ensure_dir(self) -> None:
+        """Ensure queue directory exists."""
+        self.queue_dir.mkdir(parents=True, exist_ok=True)
+
+    def _get_entry_path(self, trigger_hash: str, status: QueueEntryStatus) -> Path:
+        """Get path for an entry file."""
+        return self.queue_dir / f"{trigger_hash}.{status.value}.json"
+
+    def _find_entry_path(self, trigger_hash: str) -> Path | None:
+        """Find existing entry file for a hash (any status)."""
+        for status in QueueEntryStatus:
+            path = self._get_entry_path(trigger_hash, status)
+            if path.exists():
+                return path
+        return None
+
+    def has_entry(self, trigger_hash: str) -> bool:
+        """Check if an entry exists for this hash."""
+        return self._find_entry_path(trigger_hash) is not None
+
+    def get_entry(self, trigger_hash: str) -> QueueEntry | None:
+        """Get an entry by hash."""
+        path = self._find_entry_path(trigger_hash)
+        if path is None:
+            return None
+
+        try:
+            with open(path, "r", encoding="utf-8") as f:
+                data = json.load(f)
+            return QueueEntry.from_dict(data)
+        except (json.JSONDecodeError, OSError, KeyError, ValueError, TypeError):
+            return None
+
+    def create_entry(
+        self,
+        policy_name: str,
+        policy_file: str,
+        trigger_files: list[str],
+        baseline_ref: str,
+        expected_files: list[str] | None = None,
+    ) -> QueueEntry | None:
+        """
+        Create a new queue entry if one doesn't already exist.
+
+        Args:
+            policy_name: Human-friendly policy name
+            policy_file: Policy filename (e.g., "source-test-pairing.md")
+            trigger_files: Files that triggered the policy
+            baseline_ref: Baseline reference for change detection
+            expected_files: Expected corresponding files (for set/pair)
+
+        Returns:
+            Created QueueEntry, or None if entry already exists
+        """
+        trigger_hash = compute_trigger_hash(policy_name, trigger_files, baseline_ref)
+
+        # Check if already exists
+        if self.has_entry(trigger_hash):
+            return None
+
+        self._ensure_dir()
+
+        entry = QueueEntry(
+            policy_name=policy_name,
+            policy_file=policy_file,
+            trigger_hash=trigger_hash,
+            status=QueueEntryStatus.QUEUED,
+            baseline_ref=baseline_ref,
+            trigger_files=trigger_files,
+            expected_files=expected_files or [],
+        )
+
+        path = self._get_entry_path(trigger_hash, QueueEntryStatus.QUEUED)
+        with open(path, "w", encoding="utf-8") as f:
+            json.dump(entry.to_dict(), f, indent=2)
+
+        return entry
+
+    def update_status(
+        self,
+        trigger_hash: str,
+        new_status: QueueEntryStatus,
+        action_result: ActionResult | None = None,
+    ) -> bool:
+        """
+        Update the status of an entry.
+
+        This renames the file to reflect the new status.
+
+        Args:
+            trigger_hash: Hash of the entry to update
+            new_status: New status
+            action_result: Optional result of action execution
+
+        Returns:
+            True if updated, False if entry not found
+        """
+        old_path = self._find_entry_path(trigger_hash)
+        if old_path is None:
+            return False
+
+        # Load existing entry
+        try:
+            with open(old_path, "r", encoding="utf-8") as f:
+                data = json.load(f)
+        except (json.JSONDecodeError, OSError):
+            return False
+
+        # Update fields
+        data["status"] = new_status.value
+        data["evaluated_at"] = datetime.now(timezone.utc).isoformat()
+        if action_result:
+            data["action_result"] = asdict(action_result)
+
+        # Write to new path
+        new_path = self._get_entry_path(trigger_hash, new_status)
+
+        # If status didn't change, just update in place
+        if old_path == new_path:
+            with open(new_path, "w", encoding="utf-8") as f:
+                json.dump(data, f, indent=2)
+        else:
+            # Write new file then delete old
+            with open(new_path, "w", encoding="utf-8") as f:
+                json.dump(data, f, indent=2)
+            old_path.unlink()
+
+        return True
+
+    def get_queued_entries(self) -> list[QueueEntry]:
+        """Get all entries with QUEUED status."""
+        if not self.queue_dir.exists():
+            return []
+
+        entries = []
+        for path in self.queue_dir.glob("*.queued.json"):
+            try:
+                with open(path, "r", encoding="utf-8") as f:
+                    data = json.load(f)
+                entries.append(QueueEntry.from_dict(data))
+            except (json.JSONDecodeError, OSError, KeyError, ValueError, TypeError):
+                continue
+
+        return entries
+
+    def get_all_entries(self) -> list[QueueEntry]:
+        """Get all entries regardless of status."""
+        if not self.queue_dir.exists():
+            return []
+
+        entries = []
+        for path in self.queue_dir.glob("*.json"):
+            try:
+                with open(path, "r", encoding="utf-8") as f:
+                    data = json.load(f)
+                entries.append(QueueEntry.from_dict(data))
+            except (json.JSONDecodeError, OSError, KeyError, ValueError, TypeError):
+                continue
+
+        return entries
+
+    def clear(self) -> int:
+        """
+        Clear all entries from the queue.
+
+        Returns:
+            Number of entries removed
+        """
+        if not self.queue_dir.exists():
+            return 0
+
+        count = 0
+        for path in self.queue_dir.glob("*.json"):
+            try:
+                path.unlink()
+                count += 1
+            except OSError:
+                continue
+
+        return count
+
+    def remove_entry(self, trigger_hash: str) -> bool:
+        """
+        Remove an entry by hash.
+
+        Returns:
+            True if removed, False if not found
+        """
+        path = self._find_entry_path(trigger_hash)
+        if path is None:
+            return False
+
+        try:
+            path.unlink()
+            return True
+        except OSError:
+            return False
diff --git a/src/deepwork/hooks/evaluate_policies.py b/src/deepwork/hooks/evaluate_policies.py
index 07ac3845..3a2b05d8 100644
--- a/src/deepwork/hooks/evaluate_policies.py
+++ b/src/deepwork/hooks/evaluate_policies.py
@@ -28,14 +28,48 @@
 import sys
 from pathlib import Path
 
+from deepwork.core.pattern_matcher import matches_any_pattern
 from deepwork.core.policy_parser import (
-    Policy,
     PolicyParseError,
-    evaluate_policy,
+    PolicyV1,
     parse_policy_file,
 )
 
 
+def evaluate_policy_v1(policy: PolicyV1, changed_files: list[str]) -> bool:
+    """
+    Evaluate whether a v1 policy should fire based on changed files.
+
+    A policy fires when:
+    - At least one changed file matches a trigger pattern
+    - AND no changed file matches a safety pattern
+
+    Args:
+        policy: PolicyV1 to evaluate
+        changed_files: List of changed file paths
+
+    Returns:
+        True if policy should fire, False otherwise
+    """
+    # Check if any trigger matches
+    trigger_matched = False
+    for file_path in changed_files:
+        if matches_any_pattern(file_path, policy.triggers):
+            trigger_matched = True
+            break
+
+    if not trigger_matched:
+        return False
+
+    # Check if any safety pattern matches
+    if policy.safety:
+        for file_path in changed_files:
+            if matches_any_pattern(file_path, policy.safety):
+                return False
+
+    return True
+
+
 def get_default_branch() -> str:
     """
     Get the default branch name (main or master).
@@ -334,7 +368,7 @@ def main() -> None:
         return
 
     # Group policies by compare_to mode to minimize git calls
-    policies_by_mode: dict[str, list[Policy]] = {}
+    policies_by_mode: dict[str, list[PolicyV1]] = {}
     for policy in policies:
         mode = policy.compare_to
         if mode not in policies_by_mode:
@@ -342,7 +376,7 @@ def main() -> None:
         policies_by_mode[mode].append(policy)
 
     # Get changed files for each mode and evaluate policies
-    fired_policies: list[Policy] = []
+    fired_policies: list[PolicyV1] = []
     for mode, mode_policies in policies_by_mode.items():
         changed_files = get_changed_files_for_mode(mode)
         if not changed_files:
@@ -353,7 +387,7 @@ def main() -> None:
             if policy.name in promised_policies:
                 continue
             # Evaluate this policy
-            if evaluate_policy(policy, changed_files):
+            if evaluate_policy_v1(policy, changed_files):
                 fired_policies.append(policy)
 
     if not fired_policies:
diff --git a/src/deepwork/hooks/policy_check.py b/src/deepwork/hooks/policy_check.py
index 287852bd..4fb09141 100644
--- a/src/deepwork/hooks/policy_check.py
+++ b/src/deepwork/hooks/policy_check.py
@@ -1,9 +1,11 @@
 """
-Policy check hook for DeepWork.
+Policy check hook for DeepWork (v2).
 
 This hook evaluates policies when the agent finishes (after_agent event).
 It uses the wrapper system for cross-platform compatibility.
 
+Policy files are loaded from .deepwork/policies/ directory as frontmatter markdown files.
+
 Usage (via shell wrapper):
     claude_hook.sh deepwork.hooks.policy_check
     gemini_hook.sh deepwork.hooks.policy_check
@@ -21,11 +23,25 @@
 import sys
 from pathlib import Path
 
+from deepwork.core.command_executor import (
+    all_commands_succeeded,
+    format_command_errors,
+    run_command_action,
+)
 from deepwork.core.policy_parser import (
+    ActionType,
+    DetectionMode,
     Policy,
+    PolicyEvaluationResult,
     PolicyParseError,
-    evaluate_policy,
-    parse_policy_file,
+    evaluate_policies,
+    load_policies_from_directory,
+)
+from deepwork.core.policy_queue import (
+    ActionResult,
+    PolicyQueue,
+    QueueEntryStatus,
+    compute_trigger_hash,
 )
 from deepwork.hooks.wrapper import (
     HookInput,
@@ -63,6 +79,41 @@ def get_default_branch() -> str:
     return "main"
 
 
+def get_baseline_ref(mode: str) -> str:
+    """Get the baseline reference for a compare_to mode."""
+    if mode == "base":
+        try:
+            default_branch = get_default_branch()
+            result = subprocess.run(
+                ["git", "merge-base", "HEAD", f"origin/{default_branch}"],
+                capture_output=True,
+                text=True,
+                check=True,
+            )
+            return result.stdout.strip()
+        except subprocess.CalledProcessError:
+            return "base"
+    elif mode == "default_tip":
+        try:
+            default_branch = get_default_branch()
+            result = subprocess.run(
+                ["git", "rev-parse", f"origin/{default_branch}"],
+                capture_output=True,
+                text=True,
+                check=True,
+            )
+            return result.stdout.strip()
+        except subprocess.CalledProcessError:
+            return "default_tip"
+    elif mode == "prompt":
+        baseline_path = Path(".deepwork/.last_work_tree")
+        if baseline_path.exists():
+            # Use file modification time as reference
+            return str(int(baseline_path.stat().st_mtime))
+        return "prompt"
+    return mode
+
+
 def get_changed_files_base() -> list[str]:
     """Get files changed relative to branch base."""
     default_branch = get_default_branch()
@@ -188,8 +239,15 @@ def get_changed_files_for_mode(mode: str) -> list[str]:
 
 
 def extract_promise_tags(text: str) -> set[str]:
-    """Extract policy names from <promise> tags in text."""
-    pattern = r"<promise>✓\s*([^<]+)</promise>"
+    """
+    Extract policy names from <promise> tags in text.
+
+    Supports both:
+    - <promise>✓ Policy Name</promise>
+    - <promise>Policy Name</promise>
+    """
+    # Match with or without checkmark
+    pattern = r"<promise>(?:✓\s*)?([^<]+)</promise>"
     matches = re.findall(pattern, text, re.IGNORECASE | re.DOTALL)
     return {m.strip() for m in matches}
 
@@ -247,28 +305,52 @@ def extract_conversation_from_transcript(transcript_path: str, platform: Platfor
         return ""
 
 
-def format_policy_message(policies: list[Policy]) -> str:
-    """Format triggered policies into a message for the agent."""
+def format_policy_message(results: list[PolicyEvaluationResult]) -> str:
+    """
+    Format triggered policies into a concise message for the agent.
+
+    Groups policies by name and uses minimal formatting.
+    """
     lines = ["## DeepWork Policies Triggered", ""]
     lines.append(
         "Comply with the following policies. "
         "To mark a policy as addressed, include `<promise>✓ Policy Name</promise>` "
-        "in your response (replace Policy Name with the actual policy name)."
+        "in your response."
     )
     lines.append("")
 
-    for policy in policies:
-        lines.append(f"### Policy: {policy.name}")
-        lines.append("")
-        lines.append(policy.instructions.strip())
+    # Group results by policy name
+    by_name: dict[str, list[PolicyEvaluationResult]] = {}
+    for result in results:
+        name = result.policy.name
+        if name not in by_name:
+            by_name[name] = []
+        by_name[name].append(result)
+
+    for name, policy_results in by_name.items():
+        policy = policy_results[0].policy
+        lines.append(f"## {name}")
         lines.append("")
 
+        # For set/pair modes, show the correspondence violations concisely
+        if policy.detection_mode in (DetectionMode.SET, DetectionMode.PAIR):
+            for result in policy_results:
+                for trigger_file in result.trigger_files:
+                    for missing_file in result.missing_files:
+                        lines.append(f"{trigger_file} → {missing_file}")
+            lines.append("")
+
+        # Show instructions
+        if policy.instructions:
+            lines.append(policy.instructions.strip())
+            lines.append("")
+
     return "\n".join(lines)
 
 
 def policy_check_hook(hook_input: HookInput) -> HookOutput:
     """
-    Main hook logic for policy evaluation.
+    Main hook logic for policy evaluation (v2).
 
     This is called for after_agent events to check if policies need attention
     before allowing the agent to complete.
@@ -277,9 +359,9 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
     if hook_input.event != NormalizedEvent.AFTER_AGENT:
         return HookOutput()
 
-    # Check if policy file exists
-    policy_path = Path(".deepwork.policy.yml")
-    if not policy_path.exists():
+    # Check if policies directory exists
+    policies_dir = Path(".deepwork/policies")
+    if not policies_dir.exists():
         return HookOutput()
 
     # Extract conversation context from transcript
@@ -287,19 +369,22 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
         hook_input.transcript_path, hook_input.platform
     )
 
-    # Extract promise tags
+    # Extract promise tags (names are compared case-insensitively during evaluation)
     promised_policies = extract_promise_tags(conversation_context)
 
-    # Parse policies
+    # Load policies
     try:
-        policies = parse_policy_file(policy_path)
+        policies = load_policies_from_directory(policies_dir)
     except PolicyParseError as e:
-        print(f"Error parsing policy file: {e}", file=sys.stderr)
+        print(f"Error loading policies: {e}", file=sys.stderr)
         return HookOutput()
 
     if not policies:
         return HookOutput()
 
+    # Initialize queue
+    queue = PolicyQueue()
+
     # Group policies by compare_to mode
     policies_by_mode: dict[str, list[Policy]] = {}
     for policy in policies:
@@ -308,25 +393,105 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
             policies_by_mode[mode] = []
         policies_by_mode[mode].append(policy)
 
-    # Evaluate policies
-    fired_policies: list[Policy] = []
+    # Evaluate policies and collect results
+    prompt_results: list[PolicyEvaluationResult] = []
+    command_errors: list[str] = []
+
     for mode, mode_policies in policies_by_mode.items():
         changed_files = get_changed_files_for_mode(mode)
         if not changed_files:
             continue
 
-        for policy in mode_policies:
-            if policy.name in promised_policies:
-                continue
-            if evaluate_policy(policy, changed_files):
-                fired_policies.append(policy)
+        baseline_ref = get_baseline_ref(mode)
 
-    if not fired_policies:
-        return HookOutput()
+        # Evaluate which policies fire
+        results = evaluate_policies(mode_policies, changed_files, promised_policies)
+
+        for result in results:
+            policy = result.policy
+
+            # Compute trigger hash for queue deduplication
+            trigger_hash = compute_trigger_hash(
+                policy.name,
+                result.trigger_files,
+                baseline_ref,
+            )
+
+            # Check if already in queue (passed/skipped)
+            existing = queue.get_entry(trigger_hash)
+            if existing and existing.status in (
+                QueueEntryStatus.PASSED,
+                QueueEntryStatus.SKIPPED,
+            ):
+                continue
 
-    # Format message and return blocking response
-    message = format_policy_message(fired_policies)
-    return HookOutput(decision="block", reason=message)
+            # Create queue entry if new
+            if not existing:
+                queue.create_entry(
+                    policy_name=policy.name,
+                    policy_file=f"{policy.filename}.md",
+                    trigger_files=result.trigger_files,
+                    baseline_ref=baseline_ref,
+                    expected_files=result.missing_files,
+                )
+
+            # Handle based on action type
+            if policy.action_type == ActionType.COMMAND:
+                # Run command action
+                if policy.command_action:
+                    repo_root = Path.cwd()
+                    cmd_results = run_command_action(
+                        policy.command_action,
+                        result.trigger_files,
+                        repo_root,
+                    )
+
+                    if all_commands_succeeded(cmd_results):
+                        # Command succeeded, mark as passed
+                        queue.update_status(
+                            trigger_hash,
+                            QueueEntryStatus.PASSED,
+                            ActionResult(
+                                type="command",
+                                output=cmd_results[0].stdout if cmd_results else None,
+                                exit_code=0,
+                            ),
+                        )
+                    else:
+                        # Command failed
+                        error_msg = format_command_errors(cmd_results)
+                        command_errors.append(f"## {policy.name}\n{error_msg}")
+                        queue.update_status(
+                            trigger_hash,
+                            QueueEntryStatus.FAILED,
+                            ActionResult(
+                                type="command",
+                                output=error_msg,
+                                exit_code=cmd_results[0].exit_code if cmd_results else -1,
+                            ),
+                        )
+
+            elif policy.action_type == ActionType.PROMPT:
+                # Collect for prompt output
+                prompt_results.append(result)
+
+    # Build response
+    messages: list[str] = []
+
+    # Add command errors if any
+    if command_errors:
+        messages.append("## Command Policy Errors\n")
+        messages.extend(command_errors)
+        messages.append("")
+
+    # Add prompt policies if any
+    if prompt_results:
+        messages.append(format_policy_message(prompt_results))
+
+    if messages:
+        return HookOutput(decision="block", reason="\n".join(messages))
+
+    return HookOutput()
 
 
 def main() -> None:
diff --git a/src/deepwork/schemas/policy_schema.py b/src/deepwork/schemas/policy_schema.py
index 5aa6ae89..690cb643 100644
--- a/src/deepwork/schemas/policy_schema.py
+++ b/src/deepwork/schemas/policy_schema.py
@@ -1,10 +1,111 @@
-"""JSON Schema definition for policy definitions."""
+"""JSON Schema definition for policy definitions (v2 - frontmatter format)."""
 
 from typing import Any
 
-# JSON Schema for .deepwork.policy.yml files
-# Policies are defined as an array of policy objects
-POLICY_SCHEMA: dict[str, Any] = {
+# Pattern for string or array of strings
+STRING_OR_ARRAY: dict[str, Any] = {
+    "oneOf": [
+        {"type": "string", "minLength": 1},
+        {"type": "array", "items": {"type": "string", "minLength": 1}, "minItems": 1},
+    ]
+}
+
+# JSON Schema for policy frontmatter (YAML between --- delimiters)
+# Policies are stored as individual .md files in .deepwork/policies/
+POLICY_FRONTMATTER_SCHEMA: dict[str, Any] = {
+    "$schema": "http://json-schema.org/draft-07/schema#",
+    "type": "object",
+    "required": ["name"],
+    "properties": {
+        "name": {
+            "type": "string",
+            "minLength": 1,
+            "description": "Human-friendly name for the policy (displayed in promise tags)",
+        },
+        # Detection mode: trigger/safety (mutually exclusive with set/pair)
+        "trigger": {
+            **STRING_OR_ARRAY,
+            "description": "Glob pattern(s) for files that trigger this policy",
+        },
+        "safety": {
+            **STRING_OR_ARRAY,
+            "description": "Glob pattern(s) that suppress the policy if changed",
+        },
+        # Detection mode: set (bidirectional correspondence)
+        "set": {
+            "type": "array",
+            "items": {"type": "string", "minLength": 1},
+            "minItems": 2,
+            "description": "Patterns defining bidirectional file correspondence",
+        },
+        # Detection mode: pair (directional correspondence)
+        "pair": {
+            "type": "object",
+            "required": ["trigger", "expects"],
+            "properties": {
+                "trigger": {
+                    "type": "string",
+                    "minLength": 1,
+                    "description": "Pattern that triggers the policy",
+                },
+                "expects": {
+                    **STRING_OR_ARRAY,
+                    "description": "Pattern(s) for expected corresponding files",
+                },
+            },
+            "additionalProperties": False,
+            "description": "Directional file correspondence (trigger -> expects)",
+        },
+        # Action type: command (default is prompt using markdown body)
+        "action": {
+            "type": "object",
+            "required": ["command"],
+            "properties": {
+                "command": {
+                    "type": "string",
+                    "minLength": 1,
+                    "description": "Command to run (supports {file}, {files}, {repo_root})",
+                },
+                "run_for": {
+                    "type": "string",
+                    "enum": ["each_match", "all_matches"],
+                    "default": "each_match",
+                    "description": "Run command for each file or all files at once",
+                },
+            },
+            "additionalProperties": False,
+            "description": "Command action to run instead of prompting",
+        },
+        # Common options
+        "compare_to": {
+            "type": "string",
+            "enum": ["base", "default_tip", "prompt"],
+            "default": "base",
+            "description": "Baseline for detecting file changes",
+        },
+    },
+    "additionalProperties": False,
+    # Detection mode must be exactly one of: trigger, set, or pair
+    "oneOf": [
+        {
+            "required": ["trigger"],
+            "not": {"anyOf": [{"required": ["set"]}, {"required": ["pair"]}]},
+        },
+        {
+            "required": ["set"],
+            "not": {"anyOf": [{"required": ["trigger"]}, {"required": ["pair"]}]},
+        },
+        {
+            "required": ["pair"],
+            "not": {"anyOf": [{"required": ["trigger"]}, {"required": ["set"]}]},
+        },
+    ],
+}
+
+
+# Legacy schema for .deepwork.policy.yml (v1 format)
+# Not used for v2 policies; still exported below as POLICY_SCHEMA for existing callers
+POLICY_SCHEMA_V1: dict[str, Any] = {
     "$schema": "http://json-schema.org/draft-07/schema#",
     "type": "array",
     "description": "List of policies that trigger based on file changes",
@@ -76,3 +177,6 @@
         "additionalProperties": False,
     },
 }
+
+# Alias for backwards compatibility
+POLICY_SCHEMA = POLICY_SCHEMA_V1
diff --git a/tests/unit/test_evaluate_policies.py b/tests/unit/test_evaluate_policies.py
index 03f1a26a..c0abdceb 100644
--- a/tests/unit/test_evaluate_policies.py
+++ b/tests/unit/test_evaluate_policies.py
@@ -1,6 +1,6 @@
 """Tests for the hooks evaluate_policies module."""
 
-from deepwork.core.policy_parser import Policy
+from deepwork.core.policy_parser import PolicyV1
 from deepwork.hooks.evaluate_policies import extract_promise_tags, format_policy_message
 
 
@@ -48,7 +48,7 @@ class TestFormatPolicyMessage:
     def test_formats_single_policy(self) -> None:
         """Test formatting a single policy."""
         policies = [
-            Policy(
+            PolicyV1(
                 name="Test Policy",
                 triggers=["src/*"],
                 safety=[],
@@ -65,13 +65,13 @@ def test_formats_single_policy(self) -> None:
     def test_formats_multiple_policies(self) -> None:
         """Test formatting multiple policies."""
         policies = [
-            Policy(
+            PolicyV1(
                 name="Policy 1",
                 triggers=["src/*"],
                 safety=[],
                 instructions="Do thing 1.",
             ),
-            Policy(
+            PolicyV1(
                 name="Policy 2",
                 triggers=["test/*"],
                 safety=[],
@@ -88,7 +88,7 @@ def test_formats_multiple_policies(self) -> None:
     def test_strips_instruction_whitespace(self) -> None:
         """Test that instruction whitespace is stripped."""
         policies = [
-            Policy(
+            PolicyV1(
                 name="Test",
                 triggers=["*"],
                 safety=[],
diff --git a/tests/unit/test_policy_parser.py b/tests/unit/test_policy_parser.py
index 80eedbb1..24e537c4 100644
--- a/tests/unit/test_policy_parser.py
+++ b/tests/unit/test_policy_parser.py
@@ -4,19 +4,21 @@
 
 import pytest
 
+from deepwork.core.pattern_matcher import matches_any_pattern as matches_pattern
 from deepwork.core.policy_parser import (
     DEFAULT_COMPARE_TO,
+    DetectionMode,
     Policy,
     PolicyParseError,
+    PolicyV1,
     evaluate_policies,
     evaluate_policy,
-    matches_pattern,
     parse_policy_file,
 )
 
 
-class TestPolicy:
-    """Tests for Policy dataclass."""
+class TestPolicyV1:
+    """Tests for PolicyV1 dataclass (legacy format)."""
 
     def test_from_dict_with_inline_instructions(self) -> None:
         """Test creating policy from dict with inline instructions."""
@@ -26,7 +28,7 @@ def test_from_dict_with_inline_instructions(self) -> None:
             "safety": "docs/readme.md",
             "instructions": "Do something",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.name == "Test Policy"
         assert policy.triggers == ["src/**/*"]
@@ -40,7 +42,7 @@ def test_from_dict_normalizes_trigger_string_to_list(self) -> None:
             "trigger": "*.py",
             "instructions": "Check it",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.triggers == ["*.py"]
 
@@ -51,7 +53,7 @@ def test_from_dict_preserves_trigger_list(self) -> None:
             "trigger": ["*.py", "*.js"],
             "instructions": "Check it",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.triggers == ["*.py", "*.js"]
 
@@ -63,7 +65,7 @@ def test_from_dict_normalizes_safety_string_to_list(self) -> None:
             "safety": "docs/README.md",
             "instructions": "Check it",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.safety == ["docs/README.md"]
 
@@ -74,7 +76,7 @@ def test_from_dict_safety_defaults_to_empty_list(self) -> None:
             "trigger": "src/*",
             "instructions": "Check it",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.safety == []
 
@@ -89,7 +91,7 @@ def test_from_dict_with_instructions_file(self, temp_dir: Path) -> None:
             "trigger": "src/*",
             "instructions_file": "instructions.md",
         }
-        policy = Policy.from_dict(data, base_dir=temp_dir)
+        policy = PolicyV1.from_dict(data, base_dir=temp_dir)
 
         assert policy.instructions == "# Instructions\nDo this and that."
 
@@ -102,7 +104,7 @@ def test_from_dict_instructions_file_not_found(self, temp_dir: Path) -> None:
         }
 
         with pytest.raises(PolicyParseError, match="instructions file not found"):
-            Policy.from_dict(data, base_dir=temp_dir)
+            PolicyV1.from_dict(data, base_dir=temp_dir)
 
     def test_from_dict_instructions_file_without_base_dir(self) -> None:
         """Test error when instructions_file used without base_dir."""
@@ -113,7 +115,7 @@ def test_from_dict_instructions_file_without_base_dir(self) -> None:
         }
 
         with pytest.raises(PolicyParseError, match="no base_dir provided"):
-            Policy.from_dict(data, base_dir=None)
+            PolicyV1.from_dict(data, base_dir=None)
 
     def test_from_dict_compare_to_defaults_to_base(self) -> None:
         """Test that compare_to defaults to 'base'."""
@@ -122,7 +124,7 @@ def test_from_dict_compare_to_defaults_to_base(self) -> None:
             "trigger": "src/*",
             "instructions": "Check it",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.compare_to == DEFAULT_COMPARE_TO
         assert policy.compare_to == "base"
@@ -135,7 +137,7 @@ def test_from_dict_compare_to_explicit_base(self) -> None:
             "instructions": "Check it",
             "compare_to": "base",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.compare_to == "base"
 
@@ -147,7 +149,7 @@ def test_from_dict_compare_to_default_tip(self) -> None:
             "instructions": "Check it",
             "compare_to": "default_tip",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.compare_to == "default_tip"
 
@@ -159,7 +161,7 @@ def test_from_dict_compare_to_prompt(self) -> None:
             "instructions": "Check it",
             "compare_to": "prompt",
         }
-        policy = Policy.from_dict(data)
+        policy = PolicyV1.from_dict(data)
 
         assert policy.compare_to == "prompt"
 
@@ -204,65 +206,82 @@ def test_fires_when_trigger_matches(self) -> None:
         """Test policy fires when trigger matches."""
         policy = Policy(
             name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
             triggers=["src/**/*.py"],
             safety=[],
             instructions="Check it",
         )
         changed_files = ["src/main.py", "README.md"]
 
-        assert evaluate_policy(policy, changed_files) is True
+        result = evaluate_policy(policy, changed_files)
+        assert result.should_fire is True
 
     def test_does_not_fire_when_no_trigger_match(self) -> None:
         """Test policy doesn't fire when no trigger matches."""
         policy = Policy(
             name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
             triggers=["src/**/*.py"],
             safety=[],
             instructions="Check it",
         )
         changed_files = ["test/main.py", "README.md"]
 
-        assert evaluate_policy(policy, changed_files) is False
+        result = evaluate_policy(policy, changed_files)
+        assert result.should_fire is False
 
     def test_does_not_fire_when_safety_matches(self) -> None:
         """Test policy doesn't fire when safety file is also changed."""
         policy = Policy(
             name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
             triggers=["app/config/**/*"],
             safety=["docs/install_guide.md"],
             instructions="Update docs",
         )
         changed_files = ["app/config/settings.py", "docs/install_guide.md"]
 
-        assert evaluate_policy(policy, changed_files) is False
+        result = evaluate_policy(policy, changed_files)
+        assert result.should_fire is False
 
     def test_fires_when_trigger_matches_but_safety_doesnt(self) -> None:
         """Test policy fires when trigger matches but safety doesn't."""
         policy = Policy(
             name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
             triggers=["app/config/**/*"],
             safety=["docs/install_guide.md"],
             instructions="Update docs",
         )
         changed_files = ["app/config/settings.py", "app/main.py"]
 
-        assert evaluate_policy(policy, changed_files) is True
+        result = evaluate_policy(policy, changed_files)
+        assert result.should_fire is True
 
     def test_multiple_safety_patterns(self) -> None:
         """Test policy with multiple safety patterns."""
         policy = Policy(
             name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
             triggers=["src/auth/**/*"],
             safety=["SECURITY.md", "docs/security_review.md"],
             instructions="Security review",
         )
 
         # Should not fire if any safety file is changed
-        assert evaluate_policy(policy, ["src/auth/login.py", "SECURITY.md"]) is False
-        assert evaluate_policy(policy, ["src/auth/login.py", "docs/security_review.md"]) is False
+        result1 = evaluate_policy(policy, ["src/auth/login.py", "SECURITY.md"])
+        assert result1.should_fire is False
+        result2 = evaluate_policy(policy, ["src/auth/login.py", "docs/security_review.md"])
+        assert result2.should_fire is False
 
         # Should fire if no safety files changed
-        assert evaluate_policy(policy, ["src/auth/login.py"]) is True
+        result3 = evaluate_policy(policy, ["src/auth/login.py"])
+        assert result3.should_fire is True
 
 
 class TestEvaluatePolicies:
@@ -273,12 +292,16 @@ def test_returns_fired_policies(self) -> None:
         policies = [
             Policy(
                 name="Policy 1",
+                filename="policy1",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
                 triggers=["src/**/*"],
                 safety=[],
                 instructions="Do 1",
             ),
             Policy(
                 name="Policy 2",
+                filename="policy2",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
                 triggers=["test/**/*"],
                 safety=[],
                 instructions="Do 2",
@@ -289,20 +312,24 @@ def test_returns_fired_policies(self) -> None:
         fired = evaluate_policies(policies, changed_files)
 
         assert len(fired) == 2
-        assert fired[0].name == "Policy 1"
-        assert fired[1].name == "Policy 2"
+        assert fired[0].policy.name == "Policy 1"
+        assert fired[1].policy.name == "Policy 2"
 
     def test_skips_promised_policies(self) -> None:
         """Test that promised policies are skipped."""
         policies = [
             Policy(
                 name="Policy 1",
+                filename="policy1",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
                 triggers=["src/**/*"],
                 safety=[],
                 instructions="Do 1",
             ),
             Policy(
                 name="Policy 2",
+                filename="policy2",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
                 triggers=["src/**/*"],
                 safety=[],
                 instructions="Do 2",
@@ -314,13 +341,15 @@ def test_skips_promised_policies(self) -> None:
         fired = evaluate_policies(policies, changed_files, promised)
 
         assert len(fired) == 1
-        assert fired[0].name == "Policy 2"
+        assert fired[0].policy.name == "Policy 2"
 
     def test_returns_empty_when_no_policies_fire(self) -> None:
         """Test returns empty list when no policies fire."""
         policies = [
             Policy(
                 name="Policy 1",
+                filename="policy1",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
                 triggers=["src/**/*"],
                 safety=[],
                 instructions="Do 1",

From d4bb783655dc1ca44045b730fa41e63dce205dfb Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 18:56:33 +0000
Subject: [PATCH 06/21] Update documentation and version for policy system v2

- Update README.md with v2 policy examples and directory structure
- Update doc/architecture.md with v2 detection modes, action types, and queue system
- Bump version to 0.4.0 in pyproject.toml
- Add changelog entry for v2 policy system features
---
 CHANGELOG.md        |  19 +++++
 README.md           |  45 +++++++++---
 doc/architecture.md | 170 +++++++++++++++++++++++++++++++++-----------
 pyproject.toml      |   2 +-
 4 files changed, 183 insertions(+), 53 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2fb45116..bf907eb1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,24 @@ All notable changes to DeepWork will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.4.0] - 2026-01-16
+
+### Added
+- Policy system v2 with frontmatter markdown format in `.deepwork/policies/`
+  - Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
+  - Action types: prompt (show instructions), command (run idempotent commands)
+  - Variable pattern matching with `{path}` (multi-segment) and `{name}` (single-segment)
+  - Queue system in `.deepwork/tmp/policy/queue/` for state tracking and deduplication
+- New core modules:
+  - `pattern_matcher.py`: Variable pattern matching with regex-based capture
+  - `policy_queue.py`: Queue system for policy state persistence
+  - `command_executor.py`: Command action execution with variable substitution
+
+### Changed
+- `policy_check.py` hook now uses the v2 system with queue-based deduplication
+- Policy parser now supports both v1 (`.deepwork.policy.yml`) and v2 (`.deepwork/policies/*.md`) formats
+- Documentation updated with v2 policy examples and configuration
+
 ## [0.3.0] - 2026-01-16
 
 ### Added
@@ -64,6 +82,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 Initial version.
 
+[0.4.0]: https://github.com/anthropics/deepwork/releases/tag/0.4.0
 [0.3.0]: https://github.com/anthropics/deepwork/releases/tag/0.3.0
 [0.1.1]: https://github.com/anthropics/deepwork/releases/tag/0.1.1
 [0.1.0]: https://github.com/anthropics/deepwork/releases/tag/0.1.0
diff --git a/README.md b/README.md
index 33319968..218a30f5 100644
--- a/README.md
+++ b/README.md
@@ -178,6 +178,10 @@ DeepWork follows a **Git-native, installation-only** design:
 your-project/
 ├── .deepwork/
 │   ├── config.yml          # Platform configuration
+│   ├── policies/           # Policy definitions (v2 format)
+│   │   └── policy-name.md  # Individual policy files
+│   ├── tmp/                # Temporary state (gitignored)
+│   │   └── policy/queue/   # Policy evaluation queue
 │   └── jobs/               # Job definitions
 │       └── job_name/
 │           ├── job.yml     # Job metadata
@@ -208,11 +212,16 @@ deepwork/
 │   ├── core/             # Core functionality
 │   │   ├── parser.py     # Job definition parsing
 │   │   ├── detector.py   # Platform detection
-│   │   └── generator.py  # Skill file generation
+│   │   ├── generator.py  # Skill file generation
+│   │   ├── policy_parser.py    # Policy parsing (v1 and v2)
+│   │   ├── pattern_matcher.py  # Variable pattern matching
+│   │   ├── policy_queue.py     # Policy state queue
+│   │   └── command_executor.py # Command action execution
 │   ├── hooks/            # Cross-platform hook wrappers
 │   │   ├── wrapper.py    # Input/output normalization
-│   │   ├── claude_hook.sh  # Claude Code adapter
-│   │   └── gemini_hook.sh  # Gemini CLI adapter
+│   │   ├── policy_check.py   # Policy evaluation hook (v2)
+│   │   ├── claude_hook.sh    # Claude Code adapter
+│   │   └── gemini_hook.sh    # Gemini CLI adapter
 │   ├── templates/        # Jinja2 templates
 │   │   ├── claude/       # Claude Code templates
 │   │   └── gemini/       # Gemini CLI templates
@@ -243,15 +252,31 @@ Maintain a clean repository with automatic branch management and isolation.
 ### 🛡️ Automated Policies
 Enforce project standards and best practices without manual oversight. Policies monitor file changes and automatically prompt your AI assistant to follow specific guidelines when relevant code is modified.
 - **Automatic Triggers**: Detect when specific files or directories are changed to fire relevant policies.
+- **File Correspondence**: Define bidirectional (set) or directional (pair) relationships between files.
+- **Command Actions**: Run idempotent commands (formatters, linters) automatically when files change.
 - **Contextual Guidance**: Instructions are injected directly into the AI's workflow at the right moment.
-- **Common Use Cases**: Keep documentation in sync, enforce security reviews, or automate changelog updates.
 
-**Example Policy**:
-```yaml
-# Enforce documentation updates when config changes
-- name: "Update docs on config changes"
-  trigger: "app/config/**/*"
-  instructions: "Configuration files changed. Please update docs/install_guide.md."
+**Example Policy** (`.deepwork/policies/source-test-pairing.md`):
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+When source files change, corresponding test files should also change.
+Please create or update tests for the modified source files.
+```
+
+**Example Command Policy** (`.deepwork/policies/format-python.md`):
+```markdown
+---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
 ```
 
 ### 🚀 Multi-Platform Support
diff --git a/doc/architecture.md b/doc/architecture.md
index 6ddf971c..16bed5b7 100644
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -46,8 +46,11 @@ deepwork/                       # DeepWork tool repository
 │       │   ├── detector.py     # AI platform detection
 │       │   ├── generator.py    # Command file generation
 │       │   ├── parser.py       # Job definition parsing
-│       │   ├── policy_parser.py # Policy definition parsing
-│       │   └── hooks_syncer.py # Hook syncing to platforms
+│       │   ├── policy_parser.py    # Policy definition parsing (v1 and v2)
+│       │   ├── pattern_matcher.py  # Variable pattern matching for policies
+│       │   ├── policy_queue.py     # Policy state queue system
+│       │   ├── command_executor.py # Command action execution
+│       │   └── hooks_syncer.py     # Hook syncing to platforms
 │       ├── hooks/              # Hook system and cross-platform wrappers
 │       │   ├── __init__.py
 │       │   ├── wrapper.py           # Cross-platform input/output normalization
@@ -286,7 +289,13 @@ my-project/                     # User's project (target)
 │       └── ...
 ├── .deepwork/                  # DeepWork configuration
 │   ├── config.yml              # Platform config
-│   ├── .gitignore              # Ignores .last_work_tree
+│   ├── .gitignore              # Ignores tmp/ directory
+│   ├── policies/               # Policy definitions (v2 format)
+│   │   ├── source-test-pairing.md
+│   │   ├── format-python.md
+│   │   └── api-docs.md
+│   ├── tmp/                    # Temporary state (gitignored)
+│   │   └── policy/queue/       # Policy evaluation queue
 │   └── jobs/                   # Job definitions
 │       ├── deepwork_jobs/      # Core job for managing jobs
 │       │   ├── job.yml
@@ -305,7 +314,7 @@ my-project/                     # User's project (target)
 │       │   └── steps/
 │       └── ad_campaign/
 │           └── ...
-├── .deepwork.policy.yml        # Policy definitions (project root)
+├── .deepwork.policy.yml        # Legacy policy definitions (v1 format)
 ├── (rest of user's project files)
 └── README.md
 ```
@@ -1000,57 +1009,125 @@ Policies are automated enforcement rules that trigger based on file changes duri
 - Documentation stays in sync with code changes
 - Security reviews happen when sensitive code is modified
 - Team guidelines are followed automatically
+- File correspondences are maintained (e.g., source/test pairing)
 
-### Policy Configuration File
+### Policy System v2 (Frontmatter Markdown)
 
-Policies are defined in `.deepwork.policy.yml` at the project root:
+Policies are defined as individual markdown files in `.deepwork/policies/`:
 
+```
+.deepwork/policies/
+├── source-test-pairing.md
+├── format-python.md
+└── api-docs.md
+```
+
+Each policy file uses YAML frontmatter with a markdown body for instructions:
+
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+When source files change, corresponding test files should also change.
+Please create or update tests for the modified source files.
+```
+
+### Detection Modes
+
+Policies support three detection modes:
+
+**1. Trigger/Safety (default)** - Fires when a changed file matches `trigger` and no changed file matches `safety`:
 ```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change.
-
-- name: "Security review for auth changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Check for hardcoded credentials
-    2. Verify input validation
-    3. Review access control logic
+---
+name: Update install guide
+trigger: "app/config/**/*"
+safety: "docs/install_guide.md"
+---
+```
+
+**2. Set (bidirectional)** - Enforce file correspondence in both directions:
+```yaml
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+```
+Patterns use variables such as `{path}` (matches multiple path segments) and `{name}` (matches a single segment) to find corresponding files.
+
+**3. Pair (directional)** - A changed `trigger` file requires its `expects` file(s) to change too, but not vice versa:
+```yaml
+---
+name: API Documentation
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+---
+```
+
+### Action Types
+
+**1. Prompt (default)** - Show instructions to the agent:
+```yaml
+---
+name: Security Review
+trigger: "src/auth/**/*"
+---
+Please check for hardcoded credentials and validate input.
+```
+
+**2. Command** - Run an idempotent command:
+```yaml
+---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match  # or "all_matches"
+---
 ```
 
 ### Policy Evaluation Flow
 
 1. **Session Start**: When a Claude Code session begins, the baseline git state is captured
 2. **Agent Works**: The AI agent performs tasks, potentially modifying files
-3. **Session Stop**: When the agent finishes:
-   - Changed files are detected by comparing against the baseline
-   - Each policy is evaluated:
-     - If any changed file matches a `trigger` pattern AND
-     - No changed file matches a `safety` pattern AND
-     - The agent hasn't marked it with a `<promise>` tag
-     - → The policy fires
-   - If policies fire, Claude is prompted to address them
+3. **Session Stop**: When the agent finishes (after_agent event):
+   - Changed files are detected according to each policy's `compare_to` setting (`base`, `default_tip`, or `prompt`)
+   - Each policy is evaluated according to its detection mode
+   - Queue entries are created in `.deepwork/tmp/policy/queue/` for deduplication
+   - For command actions: the command is executed and its result is recorded in the queue
+   - For prompt actions: if the policy fires and has not already been promised, the agent is prompted
 4. **Promise Tags**: Agents can mark policies as addressed by including `<promise>✓ Policy Name</promise>` in their response
 
+### Queue System
+
+Policy state is tracked in `.deepwork/tmp/policy/queue/` as files named `{hash}.{status}.json`, where `{status}` is one of:
+- `queued` - Detected, awaiting evaluation
+- `passed` - Policy satisfied (promise found or command succeeded)
+- `failed` - Policy not satisfied
+- `skipped` - Safety pattern matched
+
+This prevents re-prompting for the same policy violation within a session.
+
 ### Hook Integration
 
-Policies are implemented using Claude Code's hooks system. The `deepwork_policy` standard job includes:
+The v2 policy system uses the cross-platform hook wrapper:
 
 ```
-.deepwork/jobs/deepwork_policy/hooks/
-├── global_hooks.yml              # Maps lifecycle events to scripts
-├── user_prompt_submit.sh         # Captures baseline at each prompt
-├── capture_prompt_work_tree.sh   # Creates git state snapshot for compare_to: prompt
-└── policy_stop_hook.sh           # Evaluates policies on stop (calls Python evaluator)
+src/deepwork/hooks/
+├── wrapper.py           # Cross-platform input/output normalization
+├── policy_check.py      # Policy evaluation hook (v2)
+├── claude_hook.sh       # Claude Code shell wrapper
+└── gemini_hook.sh       # Gemini CLI shell wrapper
+```
+
+Hooks are called via the shell wrappers:
+```bash
+claude_hook.sh deepwork.hooks.policy_check
 ```
 
 The hooks are installed to `.claude/settings.json` during `deepwork sync`:
@@ -1058,9 +1135,6 @@ The hooks are installed to `.claude/settings.json` during `deepwork sync`:
 ```json
 {
   "hooks": {
-    "UserPromptSubmit": [
-      {"matcher": "", "hooks": [{"type": "command", "command": ".deepwork/jobs/deepwork_policy/hooks/user_prompt_submit.sh"}]}
-    ],
     "Stop": [
       {"matcher": "", "hooks": [{"type": "command", "command": ".deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh"}]}
     ]
@@ -1068,6 +1142,18 @@ The hooks are installed to `.claude/settings.json` during `deepwork sync`:
 }
 ```
 
+### Legacy v1 Format
+
+The v1 format (`.deepwork.policy.yml`) is still supported for backward compatibility:
+
+```yaml
+- name: "Update install guide"
+  trigger: "app/config/**/*"
+  safety: "docs/install_guide.md"
+  instructions: |
+    Configuration files have been modified. Please review docs/install_guide.md.
+```
+
 ### Cross-Platform Hook Wrapper System
 
 The `hooks/` module provides a wrapper system that allows writing hooks once in Python and running them on multiple platforms. This normalizes the differences between Claude Code and Gemini CLI hook systems.
diff --git a/pyproject.toml b/pyproject.toml
index f3d38afd..d84e3edb 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "deepwork"
-version = "0.3.0"
+version = "0.4.0"
 description = "Framework for enabling AI agents to perform complex, multi-step work tasks"
 readme = "README.md"
 requires-python = ">=3.11"

From 1f95c13112e76aeea74a2c2bf2d1b4fd81ccd1d6 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 19:12:52 +0000
Subject: [PATCH 07/21] Update policy_stop_hook.sh to support both v1 and v2
 policy formats

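The hook now:
- Checks .deepwork/policies/ for *.md files first and uses v2 if any exist
- Falls back to v1 policies in .deepwork.policy.yml if no v2 files are found
- Exits 0 early when neither v2 policies nor the v1 policy file exist
- For v2, pipes the hook's JSON input to `python -m deepwork.hooks.policy_check`
  with DEEPWORK_HOOK_PLATFORM=claude and DEEPWORK_HOOK_EVENT=Stop set
- Keeps the existing v1 behavior: extracts assistant text from the
  transcript and passes it to evaluate_policies.py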
---
 .../deepwork_policy/hooks/policy_stop_hook.sh | 77 +++++++++++--------
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh b/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
index b12d456c..6a84bddc 100755
--- a/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
+++ b/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
@@ -2,16 +2,27 @@
 # policy_stop_hook.sh - Evaluates policies when the agent stops
 #
 # This script is called as a Claude Code Stop hook. It:
-# 1. Evaluates policies from .deepwork.policy.yml
+# 1. Evaluates policies from .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)
 # 2. Computes changed files based on each policy's compare_to setting
 # 3. Checks for <promise> tags in the conversation transcript
 # 4. Returns JSON to block stop if policies need attention
 
 set -e
 
-# Check if policy file exists
-if [ ! -f .deepwork.policy.yml ]; then
-    # No policies defined, nothing to do
+# Determine which policy system to use
+USE_V2=false
+V1_POLICY_FILE=".deepwork.policy.yml"
+V2_POLICY_DIR=".deepwork/policies"
+
+if [ -d "${V2_POLICY_DIR}" ]; then
+    # Check if there are any .md files in the v2 directory
+    if ls "${V2_POLICY_DIR}"/*.md 1>/dev/null 2>&1; then
+        USE_V2=true
+    fi
+fi
+
+# If no v2 policies and no v1 policy file, nothing to do
+if [ "${USE_V2}" = false ] && [ ! -f "${V1_POLICY_FILE}" ]; then
     exit 0
 fi
 
@@ -21,36 +32,38 @@ if [ ! -t 0 ]; then
     HOOK_INPUT=$(cat)
 fi
 
-# Extract transcript_path from the hook input JSON using jq
-# Claude Code passes: {"session_id": "...", "transcript_path": "...", ...}
-TRANSCRIPT_PATH=""
-if [ -n "${HOOK_INPUT}" ]; then
-    TRANSCRIPT_PATH=$(echo "${HOOK_INPUT}" | jq -r '.transcript_path // empty' 2>/dev/null || echo "")
-fi
+if [ "${USE_V2}" = true ]; then
+    # Use v2 policy system via cross-platform wrapper
+    # The wrapper reads JSON input and handles transcript extraction
+    result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.policy_check 2>/dev/null || echo '{}')
+else
+    # Use v1 policy system - extract conversation context for evaluate_policies
 
-# Extract conversation text from the JSONL transcript
-# The transcript is JSONL format - each line is a JSON object
-# We need to extract the text content from assistant messages
-conversation_context=""
-if [ -n "${TRANSCRIPT_PATH}" ] && [ -f "${TRANSCRIPT_PATH}" ]; then
-    # Extract text content from all assistant messages in the transcript
-    # Each line is a JSON object; we extract .message.content[].text for assistant messages
-    conversation_context=$(cat "${TRANSCRIPT_PATH}" | \
-        grep -E '"role"\s*:\s*"assistant"' | \
-        jq -r '.message.content // [] | map(select(.type == "text")) | map(.text) | join("\n")' 2>/dev/null | \
-        tr -d '\0' || echo "")
-fi
+    # Extract transcript_path from the hook input JSON using jq
+    # Claude Code passes: {"session_id": "...", "transcript_path": "...", ...}
+    TRANSCRIPT_PATH=""
+    if [ -n "${HOOK_INPUT}" ]; then
+        TRANSCRIPT_PATH=$(echo "${HOOK_INPUT}" | jq -r '.transcript_path // empty' 2>/dev/null || echo "")
+    fi
+
+    # Extract conversation text from the JSONL transcript
+    # The transcript is JSONL format - each line is a JSON object
+    # We need to extract the text content from assistant messages
+    conversation_context=""
+    if [ -n "${TRANSCRIPT_PATH}" ] && [ -f "${TRANSCRIPT_PATH}" ]; then
+        # Extract text content from all assistant messages in the transcript
+        # Each line is a JSON object; we extract .message.content[].text for assistant messages
+        conversation_context=$(cat "${TRANSCRIPT_PATH}" | \
+            grep -E '"role"\s*:\s*"assistant"' | \
+            jq -r '.message.content // [] | map(select(.type == "text")) | map(.text) | join("\n")' 2>/dev/null | \
+            tr -d '\0' || echo "")
+    fi
 
-# Call the Python evaluator
-# The Python module handles:
-# - Parsing the policy file
-# - Computing changed files based on each policy's compare_to setting
-# - Matching changed files against triggers/safety patterns
-# - Checking for promise tags in the conversation context
-# - Generating appropriate JSON output
-result=$(echo "${conversation_context}" | python -m deepwork.hooks.evaluate_policies \
-    --policy-file .deepwork.policy.yml \
-    2>/dev/null || echo '{}')
+    # Call the Python v1 evaluator
+    result=$(echo "${conversation_context}" | python -m deepwork.hooks.evaluate_policies \
+        --policy-file "${V1_POLICY_FILE}" \
+        2>/dev/null || echo '{}')
+fi
 
 # Output the result (JSON for Claude Code hooks)
 echo "${result}"

From 3c47aa73b3c5fea9b4b006fb989ae0d05cb57cdd Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 19:29:59 +0000
Subject: [PATCH 08/21] Remove v1 policy format support

Remove all legacy v1 policy format (.deepwork.policy.yml) support:

- Remove evaluate_policies.py hook module
- Remove PolicyV1 class and parse_policy_file from policy_parser.py
- Remove v1 schema (POLICY_SCHEMA_V1) from policy_schema.py
- Remove v1 test fixtures and test_evaluate_policies.py
- Update test fixtures to use v2 frontmatter markdown format
- Update documentation to remove v1 references
- Fix policy_stop_hook.sh to handle exit code 2 (block) correctly

Only v2 frontmatter markdown format (.deepwork/policies/*.md) is now supported.
---
 .claude/commands/deepwork_policy.define.md    | 382 ++++++++++------
 .deepwork/jobs/deepwork_policy/job.yml        |  21 +-
 .../jobs/deepwork_policy/steps/define.md      | 338 +++++++++------
 .gemini/commands/deepwork_policy/define.toml  | 383 ++++++++++------
 CHANGELOG.md                                  |   4 +-
 README.md                                     |   5 +-
 doc/architecture.md                           |  18 +-
 src/deepwork/core/policy_parser.py            | 104 +----
 src/deepwork/hooks/README.md                  |   2 -
 src/deepwork/hooks/evaluate_policies.py       | 410 ------------------
 src/deepwork/schemas/policy_schema.py         |  79 ----
 .../deepwork_policy/hooks/policy_stop_hook.sh |  58 +--
 tests/fixtures/policies/empty_policy.yml      |   1 -
 .../policies/instructions/security_review.md  |   8 -
 .../policies/invalid_missing_instructions.yml |   2 -
 .../policies/invalid_missing_trigger.yml      |   3 -
 tests/fixtures/policies/multiple_policies.yml |  21 -
 .../policy_with_instructions_file.yml         |   3 -
 tests/fixtures/policies/valid_policy.yml      |   6 -
 tests/shell_script_tests/conftest.py          |  20 +-
 .../test_policy_stop_hook.py                  |  52 ++-
 tests/unit/test_evaluate_policies.py          | 101 -----
 tests/unit/test_policy_parser.py              | 343 ++++++---------
 23 files changed, 896 insertions(+), 1468 deletions(-)
 delete mode 100644 src/deepwork/hooks/evaluate_policies.py
 delete mode 100644 tests/fixtures/policies/empty_policy.yml
 delete mode 100644 tests/fixtures/policies/instructions/security_review.md
 delete mode 100644 tests/fixtures/policies/invalid_missing_instructions.yml
 delete mode 100644 tests/fixtures/policies/invalid_missing_trigger.yml
 delete mode 100644 tests/fixtures/policies/multiple_policies.yml
 delete mode 100644 tests/fixtures/policies/policy_with_instructions_file.yml
 delete mode 100644 tests/fixtures/policies/valid_policy.yml
 delete mode 100644 tests/unit/test_evaluate_policies.py

diff --git a/.claude/commands/deepwork_policy.define.md b/.claude/commands/deepwork_policy.define.md
index 9e7d1c20..9a2a551a 100644
--- a/.claude/commands/deepwork_policy.define.md
+++ b/.claude/commands/deepwork_policy.define.md
@@ -1,5 +1,5 @@
 ---
-description: Create or update policy entries in .deepwork.policy.yml
+description: Create or update policies in .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)
 ---
 
 # deepwork_policy.define
@@ -14,17 +14,22 @@ Manages policies that automatically trigger when certain files change during an
 Policies help ensure that code changes follow team guidelines, documentation is updated,
 and architectural decisions are respected.
 
-Policies are defined in a `.deepwork.policy.yml` file at the root of your project. Each policy
-specifies:
-- Trigger patterns: Glob patterns for files that, when changed, should trigger the policy
-- Safety patterns: Glob patterns for files that, if also changed, mean the policy doesn't need to fire
-- Instructions: What the agent should do when the policy triggers
+**Policy System v2 (Recommended)**
+Policies are defined as individual markdown files in `.deepwork/policies/` with YAML frontmatter.
+This format supports:
+- Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
+- Action types: prompt (show instructions), command (run idempotent commands)
+- Variable pattern matching for file correspondence (e.g., `src/{path}.py` ↔ `tests/{path}_test.py`)
+
+**Legacy v1 Format**
+Still supported: `.deepwork.policy.yml` at project root with trigger/safety/instructions fields.
 
 Example use cases:
+- Enforce source/test pairing with set patterns
+- Run formatters automatically when files change
 - Update installation docs when configuration files change
 - Require security review when authentication code is modified
 - Ensure API documentation stays in sync with API code
-- Remind developers to update changelogs
 
 
 
@@ -34,200 +39,295 @@ Example use cases:
 
 ## Objective
 
-Create or update policy entries in the `.deepwork.policy.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create or update policies to enforce team guidelines, documentation requirements, file correspondences, or automated commands when specific files change.
 
 ## Task
 
 Guide the user through defining a new policy by asking structured questions. **Do not create the policy without first understanding what they want to enforce.**
 
-**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user. This provides a better user experience with clear options and guided choices.
+**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user.
 
-### Step 1: Understand the Policy Purpose
+## Policy System Overview
 
-Start by asking structured questions to understand what the user wants to enforce:
+DeepWork supports two policy formats:
 
-1. **What guideline or constraint should this policy enforce?**
-   - What situation triggers the need for action?
-   - What files or directories, when changed, should trigger this policy?
-   - Examples: "When config files change", "When API code changes", "When database schema changes"
+**v2 (Recommended)**: Individual markdown files in `.deepwork/policies/` with YAML frontmatter
+**v1 (Legacy)**: Single `.deepwork.policy.yml` file at project root
 
-2. **What action should be taken?**
-   - What should the agent do when the policy triggers?
-   - Update documentation? Perform a security review? Update tests?
-   - Is there a specific file or process that needs attention?
+**Always prefer v2 format** for new policies. It supports more detection modes and action types.
 
-3. **Are there any "safety" conditions?**
-   - Are there files that, if also changed, mean the policy doesn't need to fire?
-   - For example: If config changes AND install_guide.md changes, assume docs are already updated
-   - This prevents redundant prompts when the user has already done the right thing
+---
 
-### Step 2: Define the Trigger Patterns
+## Step 1: Understand the Policy Purpose
 
-Help the user define glob patterns for files that should trigger the policy:
+Ask structured questions to understand what the user wants to enforce:
 
-**Common patterns:**
-- `src/**/*.py` - All Python files in src directory (recursive)
-- `app/config/**/*` - All files in app/config directory
-- `*.md` - All markdown files in root
-- `src/api/**/*` - All files in the API directory
-- `migrations/**/*.sql` - All SQL migrations
+1. **What should this policy enforce?**
+   - Documentation sync? Security review? File correspondence? Code formatting?
 
-**Pattern syntax:**
-- `*` - Matches any characters within a single path segment
-- `**` - Matches any characters across multiple path segments (recursive)
-- `?` - Matches a single character
+2. **What files trigger this policy?**
+   - Which files/directories, when changed, should trigger action?
 
-### Step 3: Define Safety Patterns (Optional)
+3. **What should happen when the policy fires?**
+   - Show instructions to the agent? Run a command automatically?
 
-If there are files that, when also changed, mean the policy shouldn't fire:
+---
 
-**Examples:**
-- Policy: "Update install guide when config changes"
-  - Trigger: `app/config/**/*`
-  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
+## Step 2: Choose Detection Mode
 
-- Policy: "Security review for auth changes"
-  - Trigger: `src/auth/**/*`
-  - Safety: `SECURITY.md`, `docs/security_review.md`
+Policies support three detection modes:
 
-### Step 3b: Choose the Comparison Mode (Optional)
+### Trigger/Safety (Default)
+Fire when trigger patterns match AND safety patterns don't.
 
-The `compare_to` field controls what baseline is used when detecting "changed files":
+**Use for**: General checks like "source changed, verify README"
 
-**Options:**
-- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
-- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for policies that should only fire based on changes made during a single agent response.
+```yaml
+trigger: "app/config/**/*"
+safety: "docs/install_guide.md"
+```
 
-**When to use each:**
-- **base**: Best for most policies. "Did this branch change config files?" → trigger docs review
-- **default_tip**: For policies about what's different from production/main
-- **prompt**: For policies that should only consider very recent changes within the current session
+### Set (Bidirectional Correspondence)
+Fire when files matching one pattern change but corresponding files don't.
 
-Most policies should use the default (`base`) and don't need to specify `compare_to`.
+**Use for**: Source/test pairing, i18n files, paired documentation
 
-### Step 4: Write the Instructions
+```yaml
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+```
 
-Create clear, actionable instructions for what the agent should do when the policy fires.
+If `src/utils/helper.py` changes, expects `tests/utils/helper_test.py` to also change.
 
-**Good instructions include:**
-- What to check or review
-- What files might need updating
-- Specific actions to take
-- Quality criteria for completion
+### Pair (Directional Correspondence)
+Fire when trigger files change but expected files don't. Changes to expected files alone don't trigger.
 
-**Example:**
+**Use for**: API code requires docs (but docs changes don't require API changes)
+
+```yaml
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+```
+
+### Variable Pattern Syntax
+
+- `{path}` - Matches multiple path segments (e.g., `foo/bar/baz`)
+- `{name}` - Matches a single segment (e.g., `helper`)
+
+---
+
+## Step 3: Choose Action Type
+
+### Prompt (Default)
+Show instructions to the agent. The markdown body becomes the instructions.
+
+```markdown
+---
+name: Security Review
+trigger: "src/auth/**/*"
+---
+Please review for hardcoded credentials and validate input handling.
 ```
-Configuration files have changed. Please:
-1. Review docs/install_guide.md for accuracy
-2. Update any installation steps that reference changed config
-3. Verify environment variable documentation is current
-4. Test that installation instructions still work
+
+### Command
+Run an idempotent command automatically. No markdown body needed.
+
+```markdown
+---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
 ```
 
-### Step 5: Create the Policy Entry
+**Command variables**:
+- `{file}` - Current file being processed
+- `{files}` - Space-separated list of all matching files
+- `{repo_root}` - Repository root path
 
-Create or update `.deepwork.policy.yml` in the project root.
+**run_for options**:
+- `each_match` - Run command once per matching file
+- `all_matches` - Run command once with all files
 
-**File Location**: `.deepwork.policy.yml` (root of project)
+---
 
-**Format**:
-```yaml
-- name: "[Friendly name for the policy]"
-  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
-  safety: "[glob pattern]"   # optional, or array
-  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
-  instructions: |
-    [Multi-line instructions for the agent...]
+## Step 4: Define Optional Settings
+
+### compare_to (Optional)
+Controls what baseline is used for detecting changed files:
+
+- `base` (default) - Changes since branch diverged from main/master
+- `default_tip` - Changes compared to current main/master tip
+- `prompt` - Changes since the last prompt submission
+
+Most policies should use the default (`base`).
+
+---
+
+## Step 5: Create the Policy File (v2 Format)
+
+### File Location
+Create: `.deepwork/policies/[policy-name].md`
+
+Use kebab-case for filename (e.g., `source-test-pairing.md`, `format-python.md`)
+
+### v2 Format Examples
+
+**Trigger/Safety with Prompt:**
+```markdown
+---
+name: Update Install Guide
+trigger: "app/config/**/*"
+safety: "docs/install_guide.md"
+---
+Configuration files have changed. Please review docs/install_guide.md
+and update installation instructions if needed.
 ```
 
-**Alternative with instructions_file**:
-```yaml
-- name: "[Friendly name for the policy]"
-  trigger: "[glob pattern]"
-  safety: "[glob pattern]"
-  compare_to: "base"         # optional
-  instructions_file: "path/to/instructions.md"
+**Set (Bidirectional) with Prompt:**
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+When source files change, corresponding test files should also change.
+Please create or update tests for the modified source files.
 ```
 
-### Step 6: Verify the Policy
+**Pair (Directional) with Prompt:**
+```markdown
+---
+name: API Documentation
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+---
+API code has changed. Please update the corresponding documentation.
+```
 
-After creating the policy:
+**Command Action:**
+```markdown
+---
+name: Format Python Files
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
+```
 
-1. **Check the YAML syntax** - Ensure valid YAML formatting
-2. **Test trigger patterns** - Verify patterns match intended files
-3. **Review instructions** - Ensure they're clear and actionable
-4. **Check for conflicts** - Ensure the policy doesn't conflict with existing ones
+**Multiple Trigger Patterns:**
+```markdown
+---
+name: Security Review
+trigger:
+  - "src/auth/**/*"
+  - "src/security/**/*"
+safety:
+  - "SECURITY.md"
+  - "docs/security_audit.md"
+---
+Authentication or security code has been changed. Please review for:
+1. Hardcoded credentials or secrets
+2. Input validation issues
+3. Access control logic
+```
+
+---
+
+## Step 6: Legacy v1 Format (If Needed)
+
+Only use v1 format when adding to an existing `.deepwork.policy.yml` file.
 
-## Example Policies
+**File Location**: `.deepwork.policy.yml` (project root)
 
-### Update Documentation on Config Changes
 ```yaml
 - name: "Update install guide on config changes"
   trigger: "app/config/**/*"
   safety: "docs/install_guide.md"
+  compare_to: "base"
   instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change based on the
-    new configuration.
+    Configuration files have changed. Please review docs/install_guide.md.
 ```
 
-### Security Review for Auth Code
+**Alternative with instructions_file:**
 ```yaml
-- name: "Security review for authentication changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Review for hardcoded credentials or secrets
-    2. Check input validation on user inputs
-    3. Verify access control logic is correct
-    4. Update security documentation if needed
+- name: "Security review"
+  trigger: "src/auth/**/*"
+  instructions_file: "path/to/instructions.md"
 ```
 
-### API Documentation Sync
-```yaml
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  safety: "docs/api/**/*.md"
-  instructions: |
-    API code has changed. Please verify that API documentation in docs/api/
-    is up to date with the code changes. Pay special attention to:
-    - New or changed endpoints
-    - Modified request/response schemas
-    - Updated authentication requirements
-```
+---
+
+## Step 7: Verify the Policy
+
+After creating the policy:
+
+1. **Check YAML frontmatter syntax** - Ensure valid YAML
+2. **Verify detection mode is appropriate** - trigger/safety vs set vs pair
+3. **Test patterns match intended files** - Check glob/variable patterns
+4. **Review instructions/command** - Ensure they're actionable
+5. **Check for conflicts** - Ensure no overlap with existing policies
+
+---
+
+## Pattern Reference
+
+### Glob Patterns
+- `*` - Matches any characters within a single path segment
+- `**` - Matches across multiple path segments (recursive)
+- `?` - Matches a single character
+
+### Variable Patterns (v2 only)
+- `{path}` - Captures multiple segments: `src/{path}.py` matches `src/a/b/c.py` → path=`a/b/c`
+- `{name}` - Captures single segment: `src/{name}.py` matches `src/utils.py` → name=`utils`
+
+### Common Examples
+- `src/**/*.py` - All Python files in src (recursive)
+- `app/config/**/*` - All files in app/config
+- `*.md` - Markdown files in root only
+- `**/*.test.ts` - All test files anywhere
+- `src/{path}.ts` ↔ `tests/{path}.test.ts` - Source/test pairs
+
+---
 
 ## Output Format
 
-### .deepwork.policy.yml
-Create or update this file at the project root with the new policy entry.
+Create one of:
+- `.deepwork/policies/[policy-name].md` (v2 format, recommended)
+- Entry in `.deepwork.policy.yml` (v1 format, legacy)
+
+---
 
 ## Quality Criteria
 
-- Asked structured questions to understand user requirements
+- Asked structured questions to understand requirements
+- Chose appropriate detection mode (trigger/safety, set, or pair)
+- Chose appropriate action type (prompt or command)
 - Policy name is clear and descriptive
-- Trigger patterns accurately match the intended files
-- Safety patterns prevent unnecessary triggering
-- Instructions are actionable and specific
-- YAML is valid and properly formatted
+- Patterns accurately match intended files
+- Instructions or command are actionable
+- YAML frontmatter is valid
+
+---
 
 ## Context
 
-Policies are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each policy's `compare_to` setting:
-   - `base` (default): Files changed since the branch diverged from main/master
-   - `default_tip`: Files different from the current main/master branch
-   - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match policy trigger patterns
-3. Skips policies where safety patterns also matched
-4. Prompts you with instructions for any triggered policies
+Policies are evaluated automatically when you finish working. The system:
+
+1. Loads policies from `.deepwork/policies/` (v2) and `.deepwork.policy.yml` (v1)
+2. Detects changed files based on `compare_to` setting
+3. Evaluates each policy based on its detection mode
+4. For **command** actions: Runs the command automatically
+5. For **prompt** actions: Shows instructions if policy fires
 
-You can mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response (replace Policy Name with the actual policy name). This tells the system you've already handled that policy's requirements.
+Mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response.
 
 
 ## Inputs
@@ -255,7 +355,7 @@ All work for this job should be done on a dedicated work branch:
 ## Output Requirements
 
 Create the following output(s):
-- `.deepwork.policy.yml`
+- `.deepwork/policies/*.md`- `.deepwork.policy.yml`
 Ensure all outputs are:
 - Well-formatted and complete
 - Ready for review or use by subsequent steps
@@ -268,7 +368,7 @@ After completing this step:
 
 2. **Inform the user**:
    - The define command is complete
-   - Outputs created: .deepwork.policy.yml
+   - Outputs created: .deepwork/policies/*.md, .deepwork.policy.yml
    - This command can be run again anytime to make further changes
 
 ## Command Complete
diff --git a/.deepwork/jobs/deepwork_policy/job.yml b/.deepwork/jobs/deepwork_policy/job.yml
index 777894ed..946f2386 100644
--- a/.deepwork/jobs/deepwork_policy/job.yml
+++ b/.deepwork/jobs/deepwork_policy/job.yml
@@ -1,37 +1,40 @@
 name: deepwork_policy
-version: "0.2.0"
+version: "0.3.0"
 summary: "Policy enforcement for AI agent sessions"
 description: |
   Manages policies that automatically trigger when certain files change during an AI agent session.
   Policies help ensure that code changes follow team guidelines, documentation is updated,
   and architectural decisions are respected.
 
-  Policies are defined in a `.deepwork.policy.yml` file at the root of your project. Each policy
-  specifies:
-  - Trigger patterns: Glob patterns for files that, when changed, should trigger the policy
-  - Safety patterns: Glob patterns for files that, if also changed, mean the policy doesn't need to fire
-  - Instructions: What the agent should do when the policy triggers
+  Policies are defined as individual markdown files in `.deepwork/policies/` with YAML frontmatter.
+  This format supports:
+  - Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
+  - Action types: prompt (show instructions), command (run idempotent commands)
+  - Variable pattern matching for file correspondence (e.g., `src/{path}.py` ↔ `tests/{path}_test.py`)
 
   Example use cases:
+  - Enforce source/test pairing with set patterns
+  - Run formatters automatically when files change
   - Update installation docs when configuration files change
   - Require security review when authentication code is modified
   - Ensure API documentation stays in sync with API code
-  - Remind developers to update changelogs
 
 changelog:
   - version: "0.1.0"
     changes: "Initial version"
   - version: "0.2.0"
     changes: "Standardized on 'ask structured questions' phrasing for user input"
+  - version: "0.3.0"
+    changes: "Updated for policy system v2 with detection modes, action types, and variable patterns"
 
 steps:
   - id: define
     name: "Define Policy"
-    description: "Create or update policy entries in .deepwork.policy.yml"
+    description: "Create or update policies in .deepwork/policies/"
     instructions_file: steps/define.md
     inputs:
       - name: policy_purpose
         description: "What guideline or constraint should this policy enforce?"
     outputs:
-      - .deepwork.policy.yml
+      - .deepwork/policies/*.md
     dependencies: []
diff --git a/.deepwork/jobs/deepwork_policy/steps/define.md b/.deepwork/jobs/deepwork_policy/steps/define.md
index 302eda7f..452194aa 100644
--- a/.deepwork/jobs/deepwork_policy/steps/define.md
+++ b/.deepwork/jobs/deepwork_policy/steps/define.md
@@ -2,197 +2,257 @@
 
 ## Objective
 
-Create or update policy entries in the `.deepwork.policy.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create or update policies to enforce team guidelines, documentation requirements, file correspondences, or automated commands when specific files change.
 
 ## Task
 
 Guide the user through defining a new policy by asking structured questions. **Do not create the policy without first understanding what they want to enforce.**
 
-**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user. This provides a better user experience with clear options and guided choices.
+**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user.
 
-### Step 1: Understand the Policy Purpose
+---
 
-Start by asking structured questions to understand what the user wants to enforce:
+## Step 1: Understand the Policy Purpose
 
-1. **What guideline or constraint should this policy enforce?**
-   - What situation triggers the need for action?
-   - What files or directories, when changed, should trigger this policy?
-   - Examples: "When config files change", "When API code changes", "When database schema changes"
+Ask structured questions to understand what the user wants to enforce:
 
-2. **What action should be taken?**
-   - What should the agent do when the policy triggers?
-   - Update documentation? Perform a security review? Update tests?
-   - Is there a specific file or process that needs attention?
+1. **What should this policy enforce?**
+   - Documentation sync? Security review? File correspondence? Code formatting?
 
-3. **Are there any "safety" conditions?**
-   - Are there files that, if also changed, mean the policy doesn't need to fire?
-   - For example: If config changes AND install_guide.md changes, assume docs are already updated
-   - This prevents redundant prompts when the user has already done the right thing
+2. **What files trigger this policy?**
+   - Which files/directories, when changed, should trigger action?
 
-### Step 2: Define the Trigger Patterns
+3. **What should happen when the policy fires?**
+   - Show instructions to the agent? Run a command automatically?
 
-Help the user define glob patterns for files that should trigger the policy:
+---
 
-**Common patterns:**
-- `src/**/*.py` - All Python files in src directory (recursive)
-- `app/config/**/*` - All files in app/config directory
-- `*.md` - All markdown files in root
-- `src/api/**/*` - All files in the API directory
-- `migrations/**/*.sql` - All SQL migrations
+## Step 2: Choose Detection Mode
 
-**Pattern syntax:**
-- `*` - Matches any characters within a single path segment
-- `**` - Matches any characters across multiple path segments (recursive)
-- `?` - Matches a single character
+Policies support three detection modes:
 
-### Step 3: Define Safety Patterns (Optional)
+### Trigger/Safety (Default)
+Fire when trigger patterns match AND safety patterns don't.
 
-If there are files that, when also changed, mean the policy shouldn't fire:
+**Use for**: General checks like "source changed, verify README"
 
-**Examples:**
-- Policy: "Update install guide when config changes"
-  - Trigger: `app/config/**/*`
-  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
+```yaml
+trigger: "app/config/**/*"
+safety: "docs/install_guide.md"
+```
 
-- Policy: "Security review for auth changes"
-  - Trigger: `src/auth/**/*`
-  - Safety: `SECURITY.md`, `docs/security_review.md`
+### Set (Bidirectional Correspondence)
+Fire when a file matching one pattern changes but the corresponding files matching the other patterns don't.
 
-### Step 3b: Choose the Comparison Mode (Optional)
+**Use for**: Source/test pairing, i18n files, paired documentation
 
-The `compare_to` field controls what baseline is used when detecting "changed files":
+```yaml
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+```
 
-**Options:**
-- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
-- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for policies that should only fire based on changes made during a single agent response.
+If `src/utils/helper.py` changes, the policy expects `tests/utils/helper_test.py` to change as well.
+
+### Pair (Directional Correspondence)
+Fire when trigger files change but expected files don't. Changes to expected files alone don't trigger.
+
+**Use for**: API code requires docs (but docs changes don't require API changes)
+
+```yaml
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+```
 
-**When to use each:**
-- **base**: Best for most policies. "Did this branch change config files?" → trigger docs review
-- **default_tip**: For policies about what's different from production/main
-- **prompt**: For policies that should only consider very recent changes within the current session
+### Variable Pattern Syntax
 
-Most policies should use the default (`base`) and don't need to specify `compare_to`.
+- `{path}` - Matches multiple path segments (e.g., `foo/bar/baz`)
+- `{name}` - Matches a single segment (e.g., `helper`)
 
-### Step 4: Write the Instructions
+---
 
-Create clear, actionable instructions for what the agent should do when the policy fires.
+## Step 3: Choose Action Type
 
-**Good instructions include:**
-- What to check or review
-- What files might need updating
-- Specific actions to take
-- Quality criteria for completion
+### Prompt (Default)
+Show instructions to the agent. The markdown body becomes the instructions.
 
-**Example:**
+```markdown
+---
+name: Security Review
+trigger: "src/auth/**/*"
+---
+Please review for hardcoded credentials and validate input handling.
 ```
-Configuration files have changed. Please:
-1. Review docs/install_guide.md for accuracy
-2. Update any installation steps that reference changed config
-3. Verify environment variable documentation is current
-4. Test that installation instructions still work
+
+### Command
+Run an idempotent command automatically. No markdown body needed.
+
+```markdown
+---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
 ```
 
-### Step 5: Create the Policy Entry
+**Command variables**:
+- `{file}` - Current file being processed
+- `{files}` - Space-separated list of all matching files
+- `{repo_root}` - Repository root path
 
-Create or update `.deepwork.policy.yml` in the project root.
+**run_for options**:
+- `each_match` - Run command once per matching file
+- `all_matches` - Run command once with all files
 
-**File Location**: `.deepwork.policy.yml` (root of project)
+---
 
-**Format**:
-```yaml
-- name: "[Friendly name for the policy]"
-  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
-  safety: "[glob pattern]"   # optional, or array
-  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
-  instructions: |
-    [Multi-line instructions for the agent...]
-```
+## Step 4: Define Optional Settings
 
-**Alternative with instructions_file**:
-```yaml
-- name: "[Friendly name for the policy]"
-  trigger: "[glob pattern]"
-  safety: "[glob pattern]"
-  compare_to: "base"         # optional
-  instructions_file: "path/to/instructions.md"
-```
+### compare_to (Optional)
+Controls what baseline is used for detecting changed files:
 
-### Step 6: Verify the Policy
+- `base` (default) - Changes since branch diverged from main/master
+- `default_tip` - Changes compared to current main/master tip
+- `prompt` - Changes since the last prompt submission
 
-After creating the policy:
+Most policies should use the default (`base`).
 
-1. **Check the YAML syntax** - Ensure valid YAML formatting
-2. **Test trigger patterns** - Verify patterns match intended files
-3. **Review instructions** - Ensure they're clear and actionable
-4. **Check for conflicts** - Ensure the policy doesn't conflict with existing ones
+---
 
-## Example Policies
+## Step 5: Create the Policy File
 
-### Update Documentation on Config Changes
-```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change based on the
-    new configuration.
+### File Location
+Create: `.deepwork/policies/[policy-name].md`
+
+Use kebab-case for the filename (e.g., `source-test-pairing.md`, `format-python.md`).
+
+### Examples
+
+**Trigger/Safety with Prompt:**
+```markdown
+---
+name: Update Install Guide
+trigger: "app/config/**/*"
+safety: "docs/install_guide.md"
+---
+Configuration files have changed. Please review docs/install_guide.md
+and update installation instructions if needed.
 ```
 
-### Security Review for Auth Code
-```yaml
-- name: "Security review for authentication changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Review for hardcoded credentials or secrets
-    2. Check input validation on user inputs
-    3. Verify access control logic is correct
-    4. Update security documentation if needed
+**Set (Bidirectional) with Prompt:**
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+When source files change, corresponding test files should also change.
+Please create or update tests for the modified source files.
 ```
 
-### API Documentation Sync
-```yaml
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  safety: "docs/api/**/*.md"
-  instructions: |
-    API code has changed. Please verify that API documentation in docs/api/
-    is up to date with the code changes. Pay special attention to:
-    - New or changed endpoints
-    - Modified request/response schemas
-    - Updated authentication requirements
+**Pair (Directional) with Prompt:**
+```markdown
+---
+name: API Documentation
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+---
+API code has changed. Please update the corresponding documentation.
+```
+
+**Command Action:**
+```markdown
+---
+name: Format Python Files
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
 ```
 
+**Multiple Trigger Patterns:**
+```markdown
+---
+name: Security Review
+trigger:
+  - "src/auth/**/*"
+  - "src/security/**/*"
+safety:
+  - "SECURITY.md"
+  - "docs/security_audit.md"
+---
+Authentication or security code has been changed. Please review for:
+1. Hardcoded credentials or secrets
+2. Input validation issues
+3. Access control logic
+```
+
+---
+
+## Step 6: Verify the Policy
+
+After creating the policy:
+
+1. **Check YAML frontmatter syntax** - Ensure valid YAML
+2. **Verify detection mode is appropriate** - trigger/safety vs set vs pair
+3. **Test patterns match intended files** - Check glob/variable patterns
+4. **Review instructions/command** - Ensure they're actionable
+5. **Check for conflicts** - Ensure no overlap with existing policies
+
+---
+
+## Pattern Reference
+
+### Glob Patterns
+- `*` - Matches any characters within a single path segment
+- `**` - Matches across multiple path segments (recursive)
+- `?` - Matches a single character
+
+### Variable Patterns
+- `{path}` - Captures multiple segments: `src/{path}.py` matches `src/a/b/c.py` → path=`a/b/c`
+- `{name}` - Captures single segment: `src/{name}.py` matches `src/utils.py` → name=`utils`
+
+### Common Examples
+- `src/**/*.py` - All Python files in src (recursive)
+- `app/config/**/*` - All files in app/config
+- `*.md` - Markdown files in root only
+- `**/*.test.ts` - All test files anywhere
+- `src/{path}.ts` ↔ `tests/{path}.test.ts` - Source/test pairs
+
+---
+
 ## Output Format
 
-### .deepwork.policy.yml
-Create or update this file at the project root with the new policy entry.
+Create: `.deepwork/policies/[policy-name].md`
+
+---
 
 ## Quality Criteria
 
-- Asked structured questions to understand user requirements
+- Asked structured questions to understand requirements
+- Chose appropriate detection mode (trigger/safety, set, or pair)
+- Chose appropriate action type (prompt or command)
 - Policy name is clear and descriptive
-- Trigger patterns accurately match the intended files
-- Safety patterns prevent unnecessary triggering
-- Instructions are actionable and specific
-- YAML is valid and properly formatted
+- Patterns accurately match intended files
+- Instructions or command are actionable
+- YAML frontmatter is valid
+
+---
 
 ## Context
 
-Policies are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each policy's `compare_to` setting:
-   - `base` (default): Files changed since the branch diverged from main/master
-   - `default_tip`: Files different from the current main/master branch
-   - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match policy trigger patterns
-3. Skips policies where safety patterns also matched
-4. Prompts you with instructions for any triggered policies
+Policies are evaluated automatically when you finish working. The system:
+
+1. Loads policies from `.deepwork/policies/`
+2. Detects changed files based on each policy's `compare_to` setting
+3. Evaluates each policy based on its detection mode
+4. For **command** actions: Runs the command automatically
+5. For **prompt** actions: Shows the instructions if the policy fires
 
-You can mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response (replace Policy Name with the actual policy name). This tells the system you've already handled that policy's requirements.
+Mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response (replace Policy Name with the actual policy name).
diff --git a/.gemini/commands/deepwork_policy/define.toml b/.gemini/commands/deepwork_policy/define.toml
index ca45a47f..0195ff11 100644
--- a/.gemini/commands/deepwork_policy/define.toml
+++ b/.gemini/commands/deepwork_policy/define.toml
@@ -1,10 +1,10 @@
 # deepwork_policy:define
 #
-# Create or update policy entries in .deepwork.policy.yml
+# Create or update policies in .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)
 #
 # Generated by DeepWork - do not edit manually
 
-description = "Create or update policy entries in .deepwork.policy.yml"
+description = "Create or update policies in .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)"
 
 prompt = """
 # deepwork_policy:define
@@ -19,17 +19,22 @@ Manages policies that automatically trigger when certain files change during an
 Policies help ensure that code changes follow team guidelines, documentation is updated,
 and architectural decisions are respected.
 
-Policies are defined in a `.deepwork.policy.yml` file at the root of your project. Each policy
-specifies:
-- Trigger patterns: Glob patterns for files that, when changed, should trigger the policy
-- Safety patterns: Glob patterns for files that, if also changed, mean the policy doesn't need to fire
-- Instructions: What the agent should do when the policy triggers
+**Policy System v2 (Recommended)**
+Policies are defined as individual markdown files in `.deepwork/policies/` with YAML frontmatter.
+This format supports:
+- Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
+- Action types: prompt (show instructions), command (run idempotent commands)
+- Variable pattern matching for file correspondence (e.g., `src/{path}.py` ↔ `tests/{path}_test.py`)
+
+**Legacy v1 Format**
+Still supported: `.deepwork.policy.yml` at project root with trigger/safety/instructions fields.
 
 Example use cases:
+- Enforce source/test pairing with set patterns
+- Run formatters automatically when files change
 - Update installation docs when configuration files change
 - Require security review when authentication code is modified
 - Ensure API documentation stays in sync with API code
-- Remind developers to update changelogs
 
 
 
@@ -39,200 +44,295 @@ Example use cases:
 
 ## Objective
 
-Create or update policy entries in the `.deepwork.policy.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create or update policies to enforce team guidelines, documentation requirements, file correspondences, or automated commands when specific files change.
 
 ## Task
 
 Guide the user through defining a new policy by asking structured questions. **Do not create the policy without first understanding what they want to enforce.**
 
-**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user. This provides a better user experience with clear options and guided choices.
+**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user.
 
-### Step 1: Understand the Policy Purpose
+## Policy System Overview
 
-Start by asking structured questions to understand what the user wants to enforce:
+DeepWork supports two policy formats:
 
-1. **What guideline or constraint should this policy enforce?**
-   - What situation triggers the need for action?
-   - What files or directories, when changed, should trigger this policy?
-   - Examples: "When config files change", "When API code changes", "When database schema changes"
+**v2 (Recommended)**: Individual markdown files in `.deepwork/policies/` with YAML frontmatter
+**v1 (Legacy)**: Single `.deepwork.policy.yml` file at project root
 
-2. **What action should be taken?**
-   - What should the agent do when the policy triggers?
-   - Update documentation? Perform a security review? Update tests?
-   - Is there a specific file or process that needs attention?
+**Always prefer v2 format** for new policies. It supports more detection modes and action types.
 
-3. **Are there any "safety" conditions?**
-   - Are there files that, if also changed, mean the policy doesn't need to fire?
-   - For example: If config changes AND install_guide.md changes, assume docs are already updated
-   - This prevents redundant prompts when the user has already done the right thing
+---
 
-### Step 2: Define the Trigger Patterns
+## Step 1: Understand the Policy Purpose
 
-Help the user define glob patterns for files that should trigger the policy:
+Ask structured questions to understand what the user wants to enforce:
 
-**Common patterns:**
-- `src/**/*.py` - All Python files in src directory (recursive)
-- `app/config/**/*` - All files in app/config directory
-- `*.md` - All markdown files in root
-- `src/api/**/*` - All files in the API directory
-- `migrations/**/*.sql` - All SQL migrations
+1. **What should this policy enforce?**
+   - Documentation sync? Security review? File correspondence? Code formatting?
 
-**Pattern syntax:**
-- `*` - Matches any characters within a single path segment
-- `**` - Matches any characters across multiple path segments (recursive)
-- `?` - Matches a single character
+2. **What files trigger this policy?**
+   - Which files/directories, when changed, should trigger action?
 
-### Step 3: Define Safety Patterns (Optional)
+3. **What should happen when the policy fires?**
+   - Show instructions to the agent? Run a command automatically?
 
-If there are files that, when also changed, mean the policy shouldn't fire:
+---
 
-**Examples:**
-- Policy: "Update install guide when config changes"
-  - Trigger: `app/config/**/*`
-  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
+## Step 2: Choose Detection Mode
 
-- Policy: "Security review for auth changes"
-  - Trigger: `src/auth/**/*`
-  - Safety: `SECURITY.md`, `docs/security_review.md`
+Policies support three detection modes:
 
-### Step 3b: Choose the Comparison Mode (Optional)
+### Trigger/Safety (Default)
+Fire when trigger patterns match AND safety patterns don't.
 
-The `compare_to` field controls what baseline is used when detecting "changed files":
+**Use for**: General checks like "source changed, verify README"
 
-**Options:**
-- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
-- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for policies that should only fire based on changes made during a single agent response.
+```yaml
+trigger: "app/config/**/*"
+safety: "docs/install_guide.md"
+```
 
-**When to use each:**
-- **base**: Best for most policies. "Did this branch change config files?" → trigger docs review
-- **default_tip**: For policies about what's different from production/main
-- **prompt**: For policies that should only consider very recent changes within the current session
+### Set (Bidirectional Correspondence)
+Fire when a file matching one pattern changes but the corresponding files matching the other patterns don't.
 
-Most policies should use the default (`base`) and don't need to specify `compare_to`.
+**Use for**: Source/test pairing, i18n files, paired documentation
 
-### Step 4: Write the Instructions
+```yaml
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+```
 
-Create clear, actionable instructions for what the agent should do when the policy fires.
+If `src/utils/helper.py` changes, the policy expects `tests/utils/helper_test.py` to change as well.
 
-**Good instructions include:**
-- What to check or review
-- What files might need updating
-- Specific actions to take
-- Quality criteria for completion
+### Pair (Directional Correspondence)
+Fire when trigger files change but expected files don't. Changes to expected files alone don't trigger.
 
-**Example:**
+**Use for**: API code requires docs (but docs changes don't require API changes)
+
+```yaml
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+```
+
+### Variable Pattern Syntax
+
+- `{path}` - Matches multiple path segments (e.g., `foo/bar/baz`)
+- `{name}` - Matches a single segment (e.g., `helper`)
+
+---
+
+## Step 3: Choose Action Type
+
+### Prompt (Default)
+Show instructions to the agent. The markdown body becomes the instructions.
+
+```markdown
+---
+name: Security Review
+trigger: "src/auth/**/*"
+---
+Please review for hardcoded credentials and validate input handling.
 ```
-Configuration files have changed. Please:
-1. Review docs/install_guide.md for accuracy
-2. Update any installation steps that reference changed config
-3. Verify environment variable documentation is current
-4. Test that installation instructions still work
+
+### Command
+Run an idempotent command automatically. No markdown body needed.
+
+```markdown
+---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
 ```
 
-### Step 5: Create the Policy Entry
+**Command variables**:
+- `{file}` - Current file being processed
+- `{files}` - Space-separated list of all matching files
+- `{repo_root}` - Repository root path
 
-Create or update `.deepwork.policy.yml` in the project root.
+**run_for options**:
+- `each_match` - Run command once per matching file
+- `all_matches` - Run command once with all files
 
-**File Location**: `.deepwork.policy.yml` (root of project)
+---
 
-**Format**:
-```yaml
-- name: "[Friendly name for the policy]"
-  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
-  safety: "[glob pattern]"   # optional, or array
-  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
-  instructions: |
-    [Multi-line instructions for the agent...]
+## Step 4: Define Optional Settings
+
+### compare_to (Optional)
+Controls what baseline is used for detecting changed files:
+
+- `base` (default) - Changes since branch diverged from main/master
+- `default_tip` - Changes compared to current main/master tip
+- `prompt` - Changes since the last prompt submission
+
+Most policies should use the default (`base`).
+
+---
+
+## Step 5: Create the Policy File (v2 Format)
+
+### File Location
+Create: `.deepwork/policies/[policy-name].md`
+
+Use kebab-case for the filename (e.g., `source-test-pairing.md`, `format-python.md`).
+
+### v2 Format Examples
+
+**Trigger/Safety with Prompt:**
+```markdown
+---
+name: Update Install Guide
+trigger: "app/config/**/*"
+safety: "docs/install_guide.md"
+---
+Configuration files have changed. Please review docs/install_guide.md
+and update installation instructions if needed.
 ```
 
-**Alternative with instructions_file**:
-```yaml
-- name: "[Friendly name for the policy]"
-  trigger: "[glob pattern]"
-  safety: "[glob pattern]"
-  compare_to: "base"         # optional
-  instructions_file: "path/to/instructions.md"
+**Set (Bidirectional) with Prompt:**
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+When source files change, corresponding test files should also change.
+Please create or update tests for the modified source files.
 ```
 
-### Step 6: Verify the Policy
+**Pair (Directional) with Prompt:**
+```markdown
+---
+name: API Documentation
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+---
+API code has changed. Please update the corresponding documentation.
+```
 
-After creating the policy:
+**Command Action:**
+```markdown
+---
+name: Format Python Files
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
+```
 
-1. **Check the YAML syntax** - Ensure valid YAML formatting
-2. **Test trigger patterns** - Verify patterns match intended files
-3. **Review instructions** - Ensure they're clear and actionable
-4. **Check for conflicts** - Ensure the policy doesn't conflict with existing ones
+**Multiple Trigger Patterns:**
+```markdown
+---
+name: Security Review
+trigger:
+  - "src/auth/**/*"
+  - "src/security/**/*"
+safety:
+  - "SECURITY.md"
+  - "docs/security_audit.md"
+---
+Authentication or security code has been changed. Please review for:
+1. Hardcoded credentials or secrets
+2. Input validation issues
+3. Access control logic
+```
+
+---
+
+## Step 6: Legacy v1 Format (If Needed)
+
+Only use v1 format when adding to an existing `.deepwork.policy.yml` file.
 
-## Example Policies
+**File Location**: `.deepwork.policy.yml` (project root)
 
-### Update Documentation on Config Changes
 ```yaml
 - name: "Update install guide on config changes"
   trigger: "app/config/**/*"
   safety: "docs/install_guide.md"
+  compare_to: "base"
   instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change based on the
-    new configuration.
+    Configuration files have changed. Please review docs/install_guide.md.
 ```
 
-### Security Review for Auth Code
+**Alternative with instructions_file:**
 ```yaml
-- name: "Security review for authentication changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Review for hardcoded credentials or secrets
-    2. Check input validation on user inputs
-    3. Verify access control logic is correct
-    4. Update security documentation if needed
+- name: "Security review"
+  trigger: "src/auth/**/*"
+  instructions_file: "path/to/instructions.md"
 ```
 
-### API Documentation Sync
-```yaml
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  safety: "docs/api/**/*.md"
-  instructions: |
-    API code has changed. Please verify that API documentation in docs/api/
-    is up to date with the code changes. Pay special attention to:
-    - New or changed endpoints
-    - Modified request/response schemas
-    - Updated authentication requirements
-```
+---
+
+## Step 7: Verify the Policy
+
+After creating the policy:
+
+1. **Check YAML frontmatter syntax** - Ensure valid YAML
+2. **Verify detection mode is appropriate** - trigger/safety vs set vs pair
+3. **Test patterns match intended files** - Check glob/variable patterns
+4. **Review instructions/command** - Ensure they're actionable
+5. **Check for conflicts** - Ensure no overlap with existing policies
+
+---
+
+## Pattern Reference
+
+### Glob Patterns
+- `*` - Matches any characters within a single path segment
+- `**` - Matches across multiple path segments (recursive)
+- `?` - Matches a single character
+
+### Variable Patterns (v2 only)
+- `{path}` - Captures multiple segments: `src/{path}.py` matches `src/a/b/c.py` → path=`a/b/c`
+- `{name}` - Captures single segment: `src/{name}.py` matches `src/utils.py` → name=`utils`
+
+### Common Examples
+- `src/**/*.py` - All Python files in src (recursive)
+- `app/config/**/*` - All files in app/config
+- `*.md` - Markdown files in root only
+- `**/*.test.ts` - All test files anywhere
+- `src/{path}.ts` ↔ `tests/{path}.test.ts` - Source/test pairs
+
+---
 
 ## Output Format
 
-### .deepwork.policy.yml
-Create or update this file at the project root with the new policy entry.
+Create one of:
+- `.deepwork/policies/[policy-name].md` (v2 format, recommended)
+- Entry in `.deepwork.policy.yml` (v1 format, legacy)
+
+---
 
 ## Quality Criteria
 
-- Asked structured questions to understand user requirements
+- Asked structured questions to understand requirements
+- Chose appropriate detection mode (trigger/safety, set, or pair)
+- Chose appropriate action type (prompt or command)
 - Policy name is clear and descriptive
-- Trigger patterns accurately match the intended files
-- Safety patterns prevent unnecessary triggering
-- Instructions are actionable and specific
-- YAML is valid and properly formatted
+- Patterns accurately match intended files
+- Instructions or command are actionable
+- YAML frontmatter is valid
+
+---
 
 ## Context
 
-Policies are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each policy's `compare_to` setting:
-   - `base` (default): Files changed since the branch diverged from main/master
-   - `default_tip`: Files different from the current main/master branch
-   - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match policy trigger patterns
-3. Skips policies where safety patterns also matched
-4. Prompts you with instructions for any triggered policies
+Policies are evaluated automatically when you finish working. The system:
+
+1. Loads policies from `.deepwork/policies/` (v2) and `.deepwork.policy.yml` (v1)
+2. Detects changed files based on each policy's `compare_to` setting
+3. Evaluates each policy based on its detection mode
+4. For **command** actions: Runs the command automatically
+5. For **prompt** actions: Shows the instructions if the policy fires
 
-You can mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response (replace Policy Name with the actual policy name). This tells the system you've already handled that policy's requirements.
+Mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response (replace Policy Name with the actual policy name).
 
 
 ## Inputs
@@ -260,6 +360,7 @@ All work for this job should be done on a dedicated work branch:
 ## Output Requirements
 
 Create the following output(s):
+- `.deepwork/policies/*.md`
 - `.deepwork.policy.yml`
 
 Ensure all outputs are:
@@ -274,7 +375,7 @@ After completing this step:
 
 2. **Inform the user**:
    - The define command is complete
-   - Outputs created: .deepwork.policy.yml
+   - Outputs created: .deepwork/policies/*.md, .deepwork.policy.yml
    - This command can be run again anytime to make further changes
 
 ## Command Complete
diff --git a/CHANGELOG.md b/CHANGELOG.md
index bf907eb1..afbd1221 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,9 +20,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Updated `policy_check.py` hook to use v2 system with queue-based deduplication
 
 ### Changed
-- Policy parser now supports both v1 (`.deepwork.policy.yml`) and v2 (`.deepwork/policies/*.md`) formats
 - Documentation updated with v2 policy examples and configuration
 
+### Removed
+- v1 policy format (`.deepwork.policy.yml`); only the v2 frontmatter markdown format is now supported
+
 ## [0.3.0] - 2026-01-16
 
 ### Added
diff --git a/README.md b/README.md
index 218a30f5..6005b143 100644
--- a/README.md
+++ b/README.md
@@ -62,7 +62,6 @@ This will:
 - Generate core DeepWork jobs
 - Install DeepWork jobs for your AI assistant
 - Configure hooks for your AI assistant to enable policies
-- Create a `.deepwork.policy.yml` template file with example policies
 
 ## Quick Start
 
@@ -213,13 +212,13 @@ deepwork/
 │   │   ├── parser.py     # Job definition parsing
 │   │   ├── detector.py   # Platform detection
 │   │   ├── generator.py  # Skill file generation
-│   │   ├── policy_parser.py    # Policy parsing (v1 and v2)
+│   │   ├── policy_parser.py    # Policy parsing
 │   │   ├── pattern_matcher.py  # Variable pattern matching
 │   │   ├── policy_queue.py     # Policy state queue
 │   │   └── command_executor.py # Command action execution
 │   ├── hooks/            # Cross-platform hook wrappers
 │   │   ├── wrapper.py    # Input/output normalization
-│   │   ├── policy_check.py   # Policy evaluation hook (v2)
+│   │   ├── policy_check.py   # Policy evaluation hook
 │   │   ├── claude_hook.sh    # Claude Code adapter
 │   │   └── gemini_hook.sh    # Gemini CLI adapter
 │   ├── templates/        # Jinja2 templates
diff --git a/doc/architecture.md b/doc/architecture.md
index 16bed5b7..282f1f87 100644
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -46,7 +46,7 @@ deepwork/                       # DeepWork tool repository
 │       │   ├── detector.py     # AI platform detection
 │       │   ├── generator.py    # Command file generation
 │       │   ├── parser.py       # Job definition parsing
-│       │   ├── policy_parser.py    # Policy definition parsing (v1 and v2)
+│       │   ├── policy_parser.py    # Policy definition parsing
 │       │   ├── pattern_matcher.py  # Variable pattern matching for policies
 │       │   ├── policy_queue.py     # Policy state queue system
 │       │   ├── command_executor.py # Command action execution
@@ -56,8 +56,7 @@ deepwork/                       # DeepWork tool repository
 │       │   ├── wrapper.py           # Cross-platform input/output normalization
 │       │   ├── claude_hook.sh       # Shell wrapper for Claude Code
 │       │   ├── gemini_hook.sh       # Shell wrapper for Gemini CLI
-│       │   ├── policy_check.py      # Cross-platform policy evaluation hook
-│       │   └── evaluate_policies.py # Legacy policy evaluation CLI
+│       │   └── policy_check.py      # Cross-platform policy evaluation hook
 │       ├── templates/          # Command templates for each platform
 │       │   ├── claude/
 │       │   │   └── command-job-step.md.jinja
@@ -314,7 +313,6 @@ my-project/                     # User's project (target)
 │       │   └── steps/
 │       └── ad_campaign/
 │           └── ...
-├── .deepwork.policy.yml        # Legacy policy definitions (v1 format)
 ├── (rest of user's project files)
 └── README.md
 ```
@@ -1142,18 +1140,6 @@ The hooks are installed to `.claude/settings.json` during `deepwork sync`:
 }
 ```
 
-### Legacy v1 Format
-
-The v1 format (`.deepwork.policy.yml`) is still supported for backward compatibility:
-
-```yaml
-- name: "Update install guide"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md.
-```
-
 ### Cross-Platform Hook Wrapper System
 
 The `hooks/` module provides a wrapper system that allows writing hooks once in Python and running them on multiple platforms. This normalizes the differences between Claude Code and Gemini CLI hook systems.
diff --git a/src/deepwork/core/policy_parser.py b/src/deepwork/core/policy_parser.py
index f1c5a288..06372098 100644
--- a/src/deepwork/core/policy_parser.py
+++ b/src/deepwork/core/policy_parser.py
@@ -13,7 +13,7 @@
     matches_any_pattern,
     resolve_pattern,
 )
-from deepwork.schemas.policy_schema import POLICY_FRONTMATTER_SCHEMA, POLICY_SCHEMA
+from deepwork.schemas.policy_schema import POLICY_FRONTMATTER_SCHEMA
 from deepwork.utils.validation import ValidationError, validate_against_schema
 
 
@@ -523,105 +523,3 @@ def evaluate_policies(
             results.append(result)
 
     return results
-
-
-# =============================================================================
-# Legacy v1 Support (for migration)
-# =============================================================================
-
-
-@dataclass
-class PolicyV1:
-    """Legacy v1 policy format (from .deepwork.policy.yml)."""
-
-    name: str
-    triggers: list[str]
-    safety: list[str] = field(default_factory=list)
-    instructions: str = ""
-    compare_to: str = DEFAULT_COMPARE_TO
-
-    @classmethod
-    def from_dict(cls, data: dict[str, Any], base_dir: Path | None = None) -> "PolicyV1":
-        """Create PolicyV1 from dictionary (legacy format)."""
-        trigger = data["trigger"]
-        triggers = [trigger] if isinstance(trigger, str) else list(trigger)
-
-        safety_data = data.get("safety", [])
-        safety = [safety_data] if isinstance(safety_data, str) else list(safety_data)
-
-        if "instructions" in data:
-            instructions = data["instructions"]
-        elif "instructions_file" in data:
-            if base_dir is None:
-                raise PolicyParseError(
-                    f"Policy '{data['name']}' uses instructions_file but no base_dir provided"
-                )
-            instructions_path = base_dir / data["instructions_file"]
-            if not instructions_path.exists():
-                raise PolicyParseError(
-                    f"Policy '{data['name']}' instructions file not found: {instructions_path}"
-                )
-            instructions = instructions_path.read_text()
-        else:
-            raise PolicyParseError(
-                f"Policy '{data['name']}' must have 'instructions' or 'instructions_file'"
-            )
-
-        return cls(
-            name=data["name"],
-            triggers=triggers,
-            safety=safety,
-            instructions=instructions,
-            compare_to=data.get("compare_to", DEFAULT_COMPARE_TO),
-        )
-
-
-def parse_policy_file(policy_path: Path | str, base_dir: Path | None = None) -> list[PolicyV1]:
-    """
-    Parse policy definitions from a YAML file (legacy v1 format).
-
-    Args:
-        policy_path: Path to .deepwork.policy.yml file
-        base_dir: Base directory for resolving instructions_file paths
-
-    Returns:
-        List of parsed PolicyV1 objects
-    """
-    policy_path = Path(policy_path)
-
-    if not policy_path.exists():
-        raise PolicyParseError(f"Policy file does not exist: {policy_path}")
-
-    if not policy_path.is_file():
-        raise PolicyParseError(f"Policy path is not a file: {policy_path}")
-
-    if base_dir is None:
-        base_dir = policy_path.parent
-
-    try:
-        with open(policy_path, encoding="utf-8") as f:
-            policy_data = yaml.safe_load(f)
-    except yaml.YAMLError as e:
-        raise PolicyParseError(f"Failed to parse policy YAML: {e}") from e
-    except OSError as e:
-        raise PolicyParseError(f"Failed to read policy file: {e}") from e
-
-    if policy_data is None:
-        return []
-
-    if not isinstance(policy_data, list):
-        raise PolicyParseError(
-            f"Policy file must contain a list of policies, got {type(policy_data).__name__}"
-        )
-
-    try:
-        validate_against_schema(policy_data, POLICY_SCHEMA)
-    except ValidationError as e:
-        raise PolicyParseError(f"Policy definition validation failed: {e}") from e
-
-    policies = []
-    for policy_item in policy_data:
-        policy = PolicyV1.from_dict(policy_item, base_dir)
-        policies.append(policy)
-
-    return policies
diff --git a/src/deepwork/hooks/README.md b/src/deepwork/hooks/README.md
index 7cf51559..84914a10 100644
--- a/src/deepwork/hooks/README.md
+++ b/src/deepwork/hooks/README.md
@@ -17,7 +17,6 @@ The hook system provides:
 
 3. **Hook implementations**:
    - `policy_check.py` - Evaluates DeepWork policies on `after_agent` events
-   - `evaluate_policies.py` - Legacy Claude-specific policy evaluation
 
 ## Usage
 
@@ -180,4 +179,3 @@ pytest tests/shell_script_tests/test_hook_wrappers.py -v
 | `claude_hook.sh` | Shell wrapper for Claude Code |
 | `gemini_hook.sh` | Shell wrapper for Gemini CLI |
 | `policy_check.py` | Cross-platform policy evaluation hook |
-| `evaluate_policies.py` | Legacy Claude-specific policy evaluation |
diff --git a/src/deepwork/hooks/evaluate_policies.py b/src/deepwork/hooks/evaluate_policies.py
deleted file mode 100644
index 3a2b05d8..00000000
--- a/src/deepwork/hooks/evaluate_policies.py
+++ /dev/null
@@ -1,410 +0,0 @@
-"""
-Policy evaluation module for DeepWork hooks.
-
-This module is called by the policy_stop_hook.sh script to evaluate which policies
-should fire based on changed files and conversation context.
-
-Usage:
-    python -m deepwork.hooks.evaluate_policies \
-        --policy-file .deepwork.policy.yml
-
-The conversation context is read from stdin and checked for <promise> tags
-that indicate policies have already been addressed.
-
-Changed files are computed based on each policy's compare_to setting:
-- base: Compare to merge-base with default branch (default)
-- default_tip: Two-dot diff against default branch tip
-- prompt: Compare to state captured at prompt submission
-
-Output is JSON suitable for Claude Code Stop hooks:
-    {"decision": "block", "reason": "..."}  # Block stop, policies need attention
-    {}  # No policies fired, allow stop
-"""
-
-import argparse
-import json
-import re
-import subprocess
-import sys
-from pathlib import Path
-
-from deepwork.core.pattern_matcher import matches_any_pattern
-from deepwork.core.policy_parser import (
-    PolicyParseError,
-    PolicyV1,
-    parse_policy_file,
-)
-
-
-def evaluate_policy_v1(policy: PolicyV1, changed_files: list[str]) -> bool:
-    """
-    Evaluate whether a v1 policy should fire based on changed files.
-
-    A policy fires when:
-    - At least one changed file matches a trigger pattern
-    - AND no changed file matches a safety pattern
-
-    Args:
-        policy: PolicyV1 to evaluate
-        changed_files: List of changed file paths
-
-    Returns:
-        True if policy should fire, False otherwise
-    """
-    # Check if any trigger matches
-    trigger_matched = False
-    for file_path in changed_files:
-        if matches_any_pattern(file_path, policy.triggers):
-            trigger_matched = True
-            break
-
-    if not trigger_matched:
-        return False
-
-    # Check if any safety pattern matches
-    if policy.safety:
-        for file_path in changed_files:
-            if matches_any_pattern(file_path, policy.safety):
-                return False
-
-    return True
-
-
-def get_default_branch() -> str:
-    """
-    Get the default branch name (main or master).
-
-    Returns:
-        Default branch name, or "main" if cannot be determined.
-    """
-    # Try to get the default branch from remote HEAD
-    try:
-        result = subprocess.run(
-            ["git", "symbolic-ref", "refs/remotes/origin/HEAD"],
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        # Output is like "refs/remotes/origin/main"
-        return result.stdout.strip().split("/")[-1]
-    except subprocess.CalledProcessError:
-        pass
-
-    # Try common default branch names
-    for branch in ["main", "master"]:
-        try:
-            subprocess.run(
-                ["git", "rev-parse", "--verify", f"origin/{branch}"],
-                capture_output=True,
-                check=True,
-            )
-            return branch
-        except subprocess.CalledProcessError:
-            continue
-
-    # Fall back to main
-    return "main"
-
-
-def get_changed_files_base() -> list[str]:
-    """
-    Get files changed relative to the base of the current branch.
-
-    This finds the merge-base between the current branch and the default branch,
-    then returns all files changed since that point.
-
-    Returns:
-        List of changed file paths.
-    """
-    default_branch = get_default_branch()
-
-    try:
-        # Get the merge-base (where current branch diverged from default)
-        result = subprocess.run(
-            ["git", "merge-base", "HEAD", f"origin/{default_branch}"],
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        merge_base = result.stdout.strip()
-
-        # Stage all changes so they appear in diff
-        subprocess.run(["git", "add", "-A"], capture_output=True, check=False)
-
-        # Get files changed since merge-base (including staged)
-        result = subprocess.run(
-            ["git", "diff", "--name-only", merge_base, "HEAD"],
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        committed_files = set(result.stdout.strip().split("\n")) if result.stdout.strip() else set()
-
-        # Also get staged changes not yet committed
-        result = subprocess.run(
-            ["git", "diff", "--name-only", "--cached"],
-            capture_output=True,
-            text=True,
-            check=False,
-        )
-        staged_files = set(result.stdout.strip().split("\n")) if result.stdout.strip() else set()
-
-        # Get untracked files
-        result = subprocess.run(
-            ["git", "ls-files", "--others", "--exclude-standard"],
-            capture_output=True,
-            text=True,
-            check=False,
-        )
-        untracked_files = set(result.stdout.strip().split("\n")) if result.stdout.strip() else set()
-
-        all_files = committed_files | staged_files | untracked_files
-        return sorted([f for f in all_files if f])
-
-    except subprocess.CalledProcessError:
-        return []
-
-
-def get_changed_files_default_tip() -> list[str]:
-    """
-    Get files changed compared to the tip of the default branch.
-
-    This does a two-dot diff: what's different between HEAD and origin/default.
-
-    Returns:
-        List of changed file paths.
-    """
-    default_branch = get_default_branch()
-
-    try:
-        # Stage all changes so they appear in diff
-        subprocess.run(["git", "add", "-A"], capture_output=True, check=False)
-
-        # Two-dot diff against default branch tip
-        result = subprocess.run(
-            ["git", "diff", "--name-only", f"origin/{default_branch}..HEAD"],
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        committed_files = set(result.stdout.strip().split("\n")) if result.stdout.strip() else set()
-
-        # Also get staged changes not yet committed
-        result = subprocess.run(
-            ["git", "diff", "--name-only", "--cached"],
-            capture_output=True,
-            text=True,
-            check=False,
-        )
-        staged_files = set(result.stdout.strip().split("\n")) if result.stdout.strip() else set()
-
-        # Get untracked files
-        result = subprocess.run(
-            ["git", "ls-files", "--others", "--exclude-standard"],
-            capture_output=True,
-            text=True,
-            check=False,
-        )
-        untracked_files = set(result.stdout.strip().split("\n")) if result.stdout.strip() else set()
-
-        all_files = committed_files | staged_files | untracked_files
-        return sorted([f for f in all_files if f])
-
-    except subprocess.CalledProcessError:
-        return []
-
-
-def get_changed_files_prompt() -> list[str]:
-    """
-    Get files changed since the prompt was submitted.
-
-    This compares against the baseline captured by capture_prompt_work_tree.sh.
-
-    Returns:
-        List of changed file paths.
-    """
-    baseline_path = Path(".deepwork/.last_work_tree")
-
-    try:
-        # Stage all changes so we can see them with --cached
-        subprocess.run(["git", "add", "-A"], capture_output=True, check=False)
-
-        # Get all staged files (includes what was just staged)
-        result = subprocess.run(
-            ["git", "diff", "--name-only", "--cached"],
-            capture_output=True,
-            text=True,
-            check=False,
-        )
-        current_files = set(result.stdout.strip().split("\n")) if result.stdout.strip() else set()
-        current_files = {f for f in current_files if f}
-
-        if baseline_path.exists():
-            # Read baseline and find new files
-            baseline_files = set(baseline_path.read_text().strip().split("\n"))
-            baseline_files = {f for f in baseline_files if f}
-            # Return files that are in current but not in baseline
-            new_files = current_files - baseline_files
-            return sorted(new_files)
-        else:
-            # No baseline, return all current changes
-            return sorted(current_files)
-
-    except (subprocess.CalledProcessError, OSError):
-        return []
-
-
-def get_changed_files_for_mode(mode: str) -> list[str]:
-    """
-    Get changed files for a specific compare_to mode.
-
-    Args:
-        mode: One of 'base', 'default_tip', or 'prompt'
-
-    Returns:
-        List of changed file paths.
-    """
-    if mode == "base":
-        return get_changed_files_base()
-    elif mode == "default_tip":
-        return get_changed_files_default_tip()
-    elif mode == "prompt":
-        return get_changed_files_prompt()
-    else:
-        # Unknown mode, fall back to base
-        return get_changed_files_base()
-
-
-def extract_promise_tags(text: str) -> set[str]:
-    """
-    Extract policy names from <promise> tags in text.
-
-    Supported format:
-    - <promise>✓ Policy Name</promise>
-
-    Args:
-        text: Text to search for promise tags
-
-    Returns:
-        Set of policy names that have been promised/addressed
-    """
-    # Match <promise>✓ Policy Name</promise> and extract the policy name
-    pattern = r"<promise>✓\s*([^<]+)</promise>"
-    matches = re.findall(pattern, text, re.IGNORECASE | re.DOTALL)
-    return {m.strip() for m in matches}
-
-
-def format_policy_message(policies: list) -> str:
-    """
-    Format triggered policies into a message for the agent.
-
-    Args:
-        policies: List of Policy objects that fired
-
-    Returns:
-        Formatted message with all policy instructions
-    """
-    lines = ["## DeepWork Policies Triggered", ""]
-    lines.append(
-        "Comply with the following policies. "
-        "To mark a policy as addressed, include `<promise>✓ Policy Name</promise>` "
-        "in your response (replace Policy Name with the actual policy name)."
-    )
-    lines.append("")
-
-    for policy in policies:
-        lines.append(f"### Policy: {policy.name}")
-        lines.append("")
-        lines.append(policy.instructions.strip())
-        lines.append("")
-
-    return "\n".join(lines)
-
-
-def main() -> None:
-    """Main entry point for policy evaluation CLI."""
-    parser = argparse.ArgumentParser(
-        description="Evaluate DeepWork policies based on changed files"
-    )
-    parser.add_argument(
-        "--policy-file",
-        type=str,
-        required=True,
-        help="Path to .deepwork.policy.yml file",
-    )
-
-    args = parser.parse_args()
-
-    # Check if policy file exists
-    policy_path = Path(args.policy_file)
-    if not policy_path.exists():
-        # No policy file, nothing to evaluate
-        print("{}")
-        return
-
-    # Read conversation context from stdin (if available)
-    conversation_context = ""
-    if not sys.stdin.isatty():
-        try:
-            conversation_context = sys.stdin.read()
-        except Exception:
-            pass
-
-    # Extract promise tags from conversation
-    promised_policies = extract_promise_tags(conversation_context)
-
-    # Parse policies
-    try:
-        policies = parse_policy_file(policy_path)
-    except PolicyParseError as e:
-        # Log error to stderr, return empty result
-        print(f"Error parsing policy file: {e}", file=sys.stderr)
-        print("{}")
-        return
-
-    if not policies:
-        # No policies defined
-        print("{}")
-        return
-
-    # Group policies by compare_to mode to minimize git calls
-    policies_by_mode: dict[str, list[PolicyV1]] = {}
-    for policy in policies:
-        mode = policy.compare_to
-        if mode not in policies_by_mode:
-            policies_by_mode[mode] = []
-        policies_by_mode[mode].append(policy)
-
-    # Get changed files for each mode and evaluate policies
-    fired_policies: list[PolicyV1] = []
-    for mode, mode_policies in policies_by_mode.items():
-        changed_files = get_changed_files_for_mode(mode)
-        if not changed_files:
-            continue
-
-        for policy in mode_policies:
-            # Skip if already promised
-            if policy.name in promised_policies:
-                continue
-            # Evaluate this policy
-            if evaluate_policy_v1(policy, changed_files):
-                fired_policies.append(policy)
-
-    if not fired_policies:
-        # No policies fired
-        print("{}")
-        return
-
-    # Format output for Claude Code Stop hooks
-    # Use "decision": "block" to prevent Claude from stopping
-    message = format_policy_message(fired_policies)
-    result = {
-        "decision": "block",
-        "reason": message,
-    }
-
-    print(json.dumps(result))
-
-
-if __name__ == "__main__":
-    main()
diff --git a/src/deepwork/schemas/policy_schema.py b/src/deepwork/schemas/policy_schema.py
index 690cb643..51e35812 100644
--- a/src/deepwork/schemas/policy_schema.py
+++ b/src/deepwork/schemas/policy_schema.py
@@ -101,82 +101,3 @@
         },
     ],
 }
-
-
-# Legacy schema for .deepwork.policy.yml (v1 format)
-# Kept for reference but not used in v2
-POLICY_SCHEMA_V1: dict[str, Any] = {
-    "$schema": "http://json-schema.org/draft-07/schema#",
-    "type": "array",
-    "description": "List of policies that trigger based on file changes",
-    "items": {
-        "type": "object",
-        "required": ["name", "trigger"],
-        "properties": {
-            "name": {
-                "type": "string",
-                "minLength": 1,
-                "description": "Friendly name for the policy",
-            },
-            "trigger": {
-                "oneOf": [
-                    {
-                        "type": "string",
-                        "minLength": 1,
-                        "description": "Glob pattern for files that trigger this policy",
-                    },
-                    {
-                        "type": "array",
-                        "items": {"type": "string", "minLength": 1},
-                        "minItems": 1,
-                        "description": "List of glob patterns for files that trigger this policy",
-                    },
-                ],
-                "description": "Glob pattern(s) for files that, if changed, should trigger this policy",
-            },
-            "safety": {
-                "oneOf": [
-                    {
-                        "type": "string",
-                        "minLength": 1,
-                        "description": "Glob pattern for safety files",
-                    },
-                    {
-                        "type": "array",
-                        "items": {"type": "string", "minLength": 1},
-                        "description": "List of glob patterns for safety files",
-                    },
-                ],
-                "description": "Glob pattern(s) for files that, if also changed, mean the policy doesn't need to trigger",
-            },
-            "instructions": {
-                "type": "string",
-                "minLength": 1,
-                "description": "Instructions to give the agent when this policy triggers",
-            },
-            "instructions_file": {
-                "type": "string",
-                "minLength": 1,
-                "description": "Path to a file containing instructions (alternative to inline instructions)",
-            },
-            "compare_to": {
-                "type": "string",
-                "enum": ["base", "default_tip", "prompt"],
-                "description": (
-                    "What to compare against when detecting changed files. "
-                    "'base' (default) compares to the base of the current branch. "
-                    "'default_tip' compares to the tip of the default branch. "
-                    "'prompt' compares to the state at the start of the prompt."
-                ),
-            },
-        },
-        "oneOf": [
-            {"required": ["instructions"]},
-            {"required": ["instructions_file"]},
-        ],
-        "additionalProperties": False,
-    },
-}
-
-# Alias for backwards compatibility
-POLICY_SCHEMA = POLICY_SCHEMA_V1
diff --git a/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh b/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
index 6a84bddc..4ad1b539 100755
--- a/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
+++ b/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
@@ -2,27 +2,24 @@
 # policy_stop_hook.sh - Evaluates policies when the agent stops
 #
 # This script is called as a Claude Code Stop hook. It:
-# 1. Evaluates policies from .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)
+# 1. Evaluates policies from .deepwork/policies/
 # 2. Computes changed files based on each policy's compare_to setting
 # 3. Checks for <promise> tags in the conversation transcript
 # 4. Returns JSON to block stop if policies need attention
 
 set -e
 
-# Determine which policy system to use
-USE_V2=false
-V1_POLICY_FILE=".deepwork.policy.yml"
-V2_POLICY_DIR=".deepwork/policies"
+# Check if policies directory exists with .md files
+POLICY_DIR=".deepwork/policies"
 
-if [ -d "${V2_POLICY_DIR}" ]; then
-    # Check if there are any .md files in the v2 directory
-    if ls "${V2_POLICY_DIR}"/*.md 1>/dev/null 2>&1; then
-        USE_V2=true
-    fi
+if [ ! -d "${POLICY_DIR}" ]; then
+    # No policies directory, nothing to do
+    exit 0
 fi
 
-# If no v2 policies and no v1 policy file, nothing to do
-if [ "${USE_V2}" = false ] && [ ! -f "${V1_POLICY_FILE}" ]; then
+# Check if there are any .md files
+if ! ls "${POLICY_DIR}"/*.md 1>/dev/null 2>&1; then
+    # No policy files, nothing to do
     exit 0
 fi
 
@@ -32,37 +29,14 @@ if [ ! -t 0 ]; then
     HOOK_INPUT=$(cat)
 fi
 
-if [ "${USE_V2}" = true ]; then
-    # Use v2 policy system via cross-platform wrapper
-    # The wrapper reads JSON input and handles transcript extraction
-    result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.policy_check 2>/dev/null || echo '{}')
-else
-    # Use v1 policy system - extract conversation context for evaluate_policies
-
-    # Extract transcript_path from the hook input JSON using jq
-    # Claude Code passes: {"session_id": "...", "transcript_path": "...", ...}
-    TRANSCRIPT_PATH=""
-    if [ -n "${HOOK_INPUT}" ]; then
-        TRANSCRIPT_PATH=$(echo "${HOOK_INPUT}" | jq -r '.transcript_path // empty' 2>/dev/null || echo "")
-    fi
-
-    # Extract conversation text from the JSONL transcript
-    # The transcript is JSONL format - each line is a JSON object
-    # We need to extract the text content from assistant messages
-    conversation_context=""
-    if [ -n "${TRANSCRIPT_PATH}" ] && [ -f "${TRANSCRIPT_PATH}" ]; then
-        # Extract text content from all assistant messages in the transcript
-        # Each line is a JSON object; we extract .message.content[].text for assistant messages
-        conversation_context=$(cat "${TRANSCRIPT_PATH}" | \
-            grep -E '"role"\s*:\s*"assistant"' | \
-            jq -r '.message.content // [] | map(select(.type == "text")) | map(.text) | join("\n")' 2>/dev/null | \
-            tr -d '\0' || echo "")
-    fi
+# Call the Python policy evaluator via the cross-platform wrapper
+# The wrapper reads JSON input and handles transcript extraction
+# Note: exit code 2 means "block", which is valid (not an error); "|| true" keeps "set -e" from aborting
+result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.policy_check 2>/dev/null) || true
 
-    # Call the Python v1 evaluator
-    result=$(echo "${conversation_context}" | python -m deepwork.hooks.evaluate_policies \
-        --policy-file "${V1_POLICY_FILE}" \
-        2>/dev/null || echo '{}')
+# If no output (error case), provide empty JSON as fallback
+if [ -z "${result}" ]; then
+    result='{}'
 fi
 
 # Output the result (JSON for Claude Code hooks)
diff --git a/tests/fixtures/policies/empty_policy.yml b/tests/fixtures/policies/empty_policy.yml
deleted file mode 100644
index c8faa07a..00000000
--- a/tests/fixtures/policies/empty_policy.yml
+++ /dev/null
@@ -1 +0,0 @@
-# Empty policy file
diff --git a/tests/fixtures/policies/instructions/security_review.md b/tests/fixtures/policies/instructions/security_review.md
deleted file mode 100644
index b64978bc..00000000
--- a/tests/fixtures/policies/instructions/security_review.md
+++ /dev/null
@@ -1,8 +0,0 @@
-## Security Review Required
-
-Authentication code has been modified. Please:
-
-1. Check for hardcoded credentials
-2. Verify input validation
-3. Review access control logic
-4. Update security documentation
diff --git a/tests/fixtures/policies/invalid_missing_instructions.yml b/tests/fixtures/policies/invalid_missing_instructions.yml
deleted file mode 100644
index 6c47934a..00000000
--- a/tests/fixtures/policies/invalid_missing_instructions.yml
+++ /dev/null
@@ -1,2 +0,0 @@
-- name: "Invalid policy"
-  trigger: "src/**/*"
diff --git a/tests/fixtures/policies/invalid_missing_trigger.yml b/tests/fixtures/policies/invalid_missing_trigger.yml
deleted file mode 100644
index a5c89493..00000000
--- a/tests/fixtures/policies/invalid_missing_trigger.yml
+++ /dev/null
@@ -1,3 +0,0 @@
-- name: "Invalid policy"
-  safety: "some/file.md"
-  instructions: "This policy is missing a trigger"
diff --git a/tests/fixtures/policies/multiple_policies.yml b/tests/fixtures/policies/multiple_policies.yml
deleted file mode 100644
index da292317..00000000
--- a/tests/fixtures/policies/multiple_policies.yml
+++ /dev/null
@@ -1,21 +0,0 @@
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: "Update docs/install_guide.md if needed."
-
-- name: "Security review for auth changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_review.md"
-  instructions: |
-    Authentication or security code has changed.
-    Please ensure:
-    1. No secrets are exposed
-    2. Security review documentation is updated
-
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  instructions: "API code changed. Update API documentation."
diff --git a/tests/fixtures/policies/policy_with_instructions_file.yml b/tests/fixtures/policies/policy_with_instructions_file.yml
deleted file mode 100644
index 267bfc66..00000000
--- a/tests/fixtures/policies/policy_with_instructions_file.yml
+++ /dev/null
@@ -1,3 +0,0 @@
-- name: "Security review"
-  trigger: "src/auth/**/*"
-  instructions_file: "instructions/security_review.md"
diff --git a/tests/fixtures/policies/valid_policy.yml b/tests/fixtures/policies/valid_policy.yml
deleted file mode 100644
index a2b0b6be..00000000
--- a/tests/fixtures/policies/valid_policy.yml
+++ /dev/null
@@ -1,6 +0,0 @@
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have changed. Please review docs/install_guide.md
-    and update it if the installation instructions need to change.
diff --git a/tests/shell_script_tests/conftest.py b/tests/shell_script_tests/conftest.py
index 085cf2ff..e9b97682 100644
--- a/tests/shell_script_tests/conftest.py
+++ b/tests/shell_script_tests/conftest.py
@@ -32,20 +32,24 @@ def git_repo_with_policy(tmp_path: Path) -> Path:
     repo.index.add(["README.md"])
     repo.index.commit("Initial commit")
 
-    # Policy that triggers on any Python file
-    policy_file = tmp_path / ".deepwork.policy.yml"
+    # Create v2 policy directory and file
+    policies_dir = tmp_path / ".deepwork" / "policies"
+    policies_dir.mkdir(parents=True, exist_ok=True)
+
+    # Policy that triggers on any Python file (v2 format)
+    policy_file = policies_dir / "python-file-policy.md"
     policy_file.write_text(
-        """- name: "Python File Policy"
-  trigger: "**/*.py"
-  compare_to: prompt
-  instructions: |
-    Review Python files for quality.
+        """---
+name: Python File Policy
+trigger: "**/*.py"
+compare_to: prompt
+---
+Review Python files for quality.
 """
     )
 
     # Empty baseline so new files trigger
     deepwork_dir = tmp_path / ".deepwork"
-    deepwork_dir.mkdir(exist_ok=True)
     (deepwork_dir / ".last_work_tree").write_text("")
 
     return tmp_path
diff --git a/tests/shell_script_tests/test_policy_stop_hook.py b/tests/shell_script_tests/test_policy_stop_hook.py
index 07a2d221..bfe9c04c 100644
--- a/tests/shell_script_tests/test_policy_stop_hook.py
+++ b/tests/shell_script_tests/test_policy_stop_hook.py
@@ -17,7 +17,7 @@
 
 @pytest.fixture
 def git_repo_with_src_policy(tmp_path: Path) -> Path:
-    """Create a git repo with a policy file that triggers on src/** changes."""
+    """Create a git repo with a v2 policy file that triggers on src/** changes."""
     repo = Repo.init(tmp_path)
 
     readme = tmp_path / "README.md"
@@ -25,21 +25,25 @@ def git_repo_with_src_policy(tmp_path: Path) -> Path:
     repo.index.add(["README.md"])
     repo.index.commit("Initial commit")
 
+    # Create v2 policy directory and file
+    policies_dir = tmp_path / ".deepwork" / "policies"
+    policies_dir.mkdir(parents=True, exist_ok=True)
+
     # Use compare_to: prompt since test repos don't have origin remote
-    policy_file = tmp_path / ".deepwork.policy.yml"
+    policy_file = policies_dir / "test-policy.md"
     policy_file.write_text(
-        """- name: "Test Policy"
-  trigger: "src/**/*"
-  compare_to: prompt
-  instructions: |
-    This is a test policy that fires when src/ files change.
-    Please address this policy.
+        """---
+name: Test Policy
+trigger: "src/**/*"
+compare_to: prompt
+---
+This is a test policy that fires when src/ files change.
+Please address this policy.
 """
     )
 
     # Empty baseline means all current files are "new"
     deepwork_dir = tmp_path / ".deepwork"
-    deepwork_dir.mkdir(exist_ok=True)
     (deepwork_dir / ".last_work_tree").write_text("")
 
     return tmp_path
@@ -112,14 +116,14 @@ def test_outputs_empty_json_when_no_policy_fires(
         # Should be empty JSON (no blocking)
         assert result == {}, f"Expected empty JSON when no policies fire, got: {result}"
 
-    def test_exits_early_when_no_policy_file(self, policy_hooks_dir: Path, git_repo: Path) -> None:
-        """Test that the hook exits cleanly when no policy file exists."""
+    def test_exits_early_when_no_policy_dir(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+        """Test that the hook exits cleanly when no policy directory exists."""
         script_path = policy_hooks_dir / "policy_stop_hook.sh"
         stdout, stderr, code = run_stop_hook(script_path, git_repo)
 
         # Should exit with code 0 and produce no output (or empty)
         assert code == 0, f"Expected exit code 0, got {code}. stderr: {stderr}"
-        # No output is fine when there's no policy file
+        # No output is fine when there's no policy directory
         output = stdout.strip()
         if output:
             # If there is output, it should be valid JSON
@@ -167,7 +171,7 @@ def test_respects_promise_tags(
         try:
             # Run the stop hook with transcript path
             script_path = policy_hooks_dir / "policy_stop_hook.sh"
-            hook_input = {"transcript_path": transcript_path}
+            hook_input = {"transcript_path": transcript_path, "hook_event_name": "Stop"}
             stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_policy, hook_input)
 
             # Parse the output
@@ -191,22 +195,24 @@ def test_safety_pattern_prevents_firing(self, policy_hooks_dir: Path, tmp_path:
         repo.index.add(["README.md"])
         repo.index.commit("Initial commit")
 
-        # Create a policy with a safety pattern
-        # Use compare_to: prompt since test repos don't have origin remote
-        policy_file = tmp_path / ".deepwork.policy.yml"
+        # v2 policy with a safety pattern (compare_to: prompt; test repos have no origin remote)
+        policies_dir = tmp_path / ".deepwork" / "policies"
+        policies_dir.mkdir(parents=True, exist_ok=True)
+
+        policy_file = policies_dir / "documentation-policy.md"
         policy_file.write_text(
-            """- name: "Documentation Policy"
-  trigger: "src/**/*"
-  safety: "docs/**/*"
-  compare_to: prompt
-  instructions: |
-    Update documentation when changing source files.
+            """---
+name: Documentation Policy
+trigger: "src/**/*"
+safety: "docs/**/*"
+compare_to: prompt
+---
+Update documentation when changing source files.
 """
         )
 
         # Create .deepwork directory with empty baseline
         deepwork_dir = tmp_path / ".deepwork"
-        deepwork_dir.mkdir(exist_ok=True)
         (deepwork_dir / ".last_work_tree").write_text("")
 
         # Create both trigger and safety files
diff --git a/tests/unit/test_evaluate_policies.py b/tests/unit/test_evaluate_policies.py
deleted file mode 100644
index c0abdceb..00000000
--- a/tests/unit/test_evaluate_policies.py
+++ /dev/null
@@ -1,101 +0,0 @@
-"""Tests for the hooks evaluate_policies module."""
-
-from deepwork.core.policy_parser import PolicyV1
-from deepwork.hooks.evaluate_policies import extract_promise_tags, format_policy_message
-
-
-class TestExtractPromiseTags:
-    """Tests for extract_promise_tags function."""
-
-    def test_extracts_policy_name_from_promise(self) -> None:
-        """Test extracting policy name from promise tag body."""
-        text = "<promise>✓ Update Docs</promise>"
-        result = extract_promise_tags(text)
-        assert result == {"Update Docs"}
-
-    def test_extracts_multiple_promises(self) -> None:
-        """Test extracting multiple promise tags."""
-        text = """
-        I've addressed the policies.
-        <promise>✓ Update Docs</promise>
-        <promise>✓ Security Review</promise>
-        """
-        result = extract_promise_tags(text)
-        assert result == {"Update Docs", "Security Review"}
-
-    def test_case_insensitive(self) -> None:
-        """Test that promise tag matching is case insensitive."""
-        text = "<PROMISE>✓ Test Policy</PROMISE>"
-        result = extract_promise_tags(text)
-        assert result == {"Test Policy"}
-
-    def test_returns_empty_set_for_no_promises(self) -> None:
-        """Test that empty set is returned when no promises found."""
-        text = "This is just some regular text without any promise tags."
-        result = extract_promise_tags(text)
-        assert result == set()
-
-    def test_strips_whitespace_from_policy_name(self) -> None:
-        """Test that whitespace is stripped from extracted policy names."""
-        text = "<promise>✓   Policy With Spaces   </promise>"
-        result = extract_promise_tags(text)
-        assert result == {"Policy With Spaces"}
-
-
-class TestFormatPolicyMessage:
-    """Tests for format_policy_message function."""
-
-    def test_formats_single_policy(self) -> None:
-        """Test formatting a single policy."""
-        policies = [
-            PolicyV1(
-                name="Test Policy",
-                triggers=["src/*"],
-                safety=[],
-                instructions="Please update the documentation.",
-            )
-        ]
-        result = format_policy_message(policies)
-
-        assert "## DeepWork Policies Triggered" in result
-        assert "### Policy: Test Policy" in result
-        assert "Please update the documentation." in result
-        assert "<promise>✓ Policy Name</promise>" in result
-
-    def test_formats_multiple_policies(self) -> None:
-        """Test formatting multiple policies."""
-        policies = [
-            PolicyV1(
-                name="Policy 1",
-                triggers=["src/*"],
-                safety=[],
-                instructions="Do thing 1.",
-            ),
-            PolicyV1(
-                name="Policy 2",
-                triggers=["test/*"],
-                safety=[],
-                instructions="Do thing 2.",
-            ),
-        ]
-        result = format_policy_message(policies)
-
-        assert "### Policy: Policy 1" in result
-        assert "### Policy: Policy 2" in result
-        assert "Do thing 1." in result
-        assert "Do thing 2." in result
-
-    def test_strips_instruction_whitespace(self) -> None:
-        """Test that instruction whitespace is stripped."""
-        policies = [
-            PolicyV1(
-                name="Test",
-                triggers=["*"],
-                safety=[],
-                instructions="  \n  Instructions here  \n  ",
-            )
-        ]
-        result = format_policy_message(policies)
-
-        # Should be stripped but present
-        assert "Instructions here" in result
diff --git a/tests/unit/test_policy_parser.py b/tests/unit/test_policy_parser.py
index 24e537c4..62c73cb8 100644
--- a/tests/unit/test_policy_parser.py
+++ b/tests/unit/test_policy_parser.py
@@ -10,162 +10,12 @@
     DetectionMode,
     Policy,
     PolicyParseError,
-    PolicyV1,
     evaluate_policies,
     evaluate_policy,
-    parse_policy_file,
+    load_policies_from_directory,
 )
 
 
-class TestPolicyV1:
-    """Tests for PolicyV1 dataclass (legacy format)."""
-
-    def test_from_dict_with_inline_instructions(self) -> None:
-        """Test creating policy from dict with inline instructions."""
-        data = {
-            "name": "Test Policy",
-            "trigger": "src/**/*",
-            "safety": "docs/readme.md",
-            "instructions": "Do something",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.name == "Test Policy"
-        assert policy.triggers == ["src/**/*"]
-        assert policy.safety == ["docs/readme.md"]
-        assert policy.instructions == "Do something"
-
-    def test_from_dict_normalizes_trigger_string_to_list(self) -> None:
-        """Test that trigger string is normalized to list."""
-        data = {
-            "name": "Test",
-            "trigger": "*.py",
-            "instructions": "Check it",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.triggers == ["*.py"]
-
-    def test_from_dict_preserves_trigger_list(self) -> None:
-        """Test that trigger list is preserved."""
-        data = {
-            "name": "Test",
-            "trigger": ["*.py", "*.js"],
-            "instructions": "Check it",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.triggers == ["*.py", "*.js"]
-
-    def test_from_dict_normalizes_safety_string_to_list(self) -> None:
-        """Test that safety string is normalized to list."""
-        data = {
-            "name": "Test",
-            "trigger": "src/*",
-            "safety": "docs/README.md",
-            "instructions": "Check it",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.safety == ["docs/README.md"]
-
-    def test_from_dict_safety_defaults_to_empty_list(self) -> None:
-        """Test that missing safety defaults to empty list."""
-        data = {
-            "name": "Test",
-            "trigger": "src/*",
-            "instructions": "Check it",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.safety == []
-
-    def test_from_dict_with_instructions_file(self, temp_dir: Path) -> None:
-        """Test creating policy from dict with instructions_file."""
-        # Create instructions file
-        instructions_file = temp_dir / "instructions.md"
-        instructions_file.write_text("# Instructions\nDo this and that.")
-
-        data = {
-            "name": "Test Policy",
-            "trigger": "src/*",
-            "instructions_file": "instructions.md",
-        }
-        policy = PolicyV1.from_dict(data, base_dir=temp_dir)
-
-        assert policy.instructions == "# Instructions\nDo this and that."
-
-    def test_from_dict_instructions_file_not_found(self, temp_dir: Path) -> None:
-        """Test error when instructions_file doesn't exist."""
-        data = {
-            "name": "Test Policy",
-            "trigger": "src/*",
-            "instructions_file": "nonexistent.md",
-        }
-
-        with pytest.raises(PolicyParseError, match="instructions file not found"):
-            PolicyV1.from_dict(data, base_dir=temp_dir)
-
-    def test_from_dict_instructions_file_without_base_dir(self) -> None:
-        """Test error when instructions_file used without base_dir."""
-        data = {
-            "name": "Test Policy",
-            "trigger": "src/*",
-            "instructions_file": "instructions.md",
-        }
-
-        with pytest.raises(PolicyParseError, match="no base_dir provided"):
-            PolicyV1.from_dict(data, base_dir=None)
-
-    def test_from_dict_compare_to_defaults_to_base(self) -> None:
-        """Test that compare_to defaults to 'base'."""
-        data = {
-            "name": "Test",
-            "trigger": "src/*",
-            "instructions": "Check it",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.compare_to == DEFAULT_COMPARE_TO
-        assert policy.compare_to == "base"
-
-    def test_from_dict_compare_to_explicit_base(self) -> None:
-        """Test explicit compare_to: base."""
-        data = {
-            "name": "Test",
-            "trigger": "src/*",
-            "instructions": "Check it",
-            "compare_to": "base",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.compare_to == "base"
-
-    def test_from_dict_compare_to_default_tip(self) -> None:
-        """Test compare_to: default_tip."""
-        data = {
-            "name": "Test",
-            "trigger": "src/*",
-            "instructions": "Check it",
-            "compare_to": "default_tip",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.compare_to == "default_tip"
-
-    def test_from_dict_compare_to_prompt(self) -> None:
-        """Test compare_to: prompt."""
-        data = {
-            "name": "Test",
-            "trigger": "src/*",
-            "instructions": "Check it",
-            "compare_to": "prompt",
-        }
-        policy = PolicyV1.from_dict(data)
-
-        assert policy.compare_to == "prompt"
-
-
 class TestMatchesPattern:
     """Tests for matches_pattern function."""
 
@@ -362,72 +212,153 @@ def test_returns_empty_when_no_policies_fire(self) -> None:
         assert len(fired) == 0
 
 
-class TestParsePolicyFile:
-    """Tests for parse_policy_file function."""
+class TestLoadPoliciesFromDirectory:
+    """Tests for load_policies_from_directory function."""
 
-    def test_parses_valid_policy_file(self, fixtures_dir: Path) -> None:
-        """Test parsing a valid policy file."""
-        policy_file = fixtures_dir / "policies" / "valid_policy.yml"
-        policies = parse_policy_file(policy_file)
+    def test_loads_policies_from_directory(self, temp_dir: Path) -> None:
+        """Test loading policies from a directory."""
+        policies_dir = temp_dir / "policies"
+        policies_dir.mkdir()
 
-        assert len(policies) == 1
-        assert policies[0].name == "Update install guide on config changes"
-        assert policies[0].triggers == ["app/config/**/*"]
-        assert policies[0].safety == ["docs/install_guide.md"]
-        assert "Configuration files have changed" in policies[0].instructions
-
-    def test_parses_multiple_policies(self, fixtures_dir: Path) -> None:
-        """Test parsing a file with multiple policies."""
-        policy_file = fixtures_dir / "policies" / "multiple_policies.yml"
-        policies = parse_policy_file(policy_file)
-
-        assert len(policies) == 3
-        assert policies[0].name == "Update install guide on config changes"
-        assert policies[1].name == "Security review for auth changes"
-        assert policies[2].name == "API documentation update"
-
-        # Check that arrays are parsed correctly
-        assert policies[1].triggers == ["src/auth/**/*", "src/security/**/*"]
-        assert policies[1].safety == ["SECURITY.md", "docs/security_review.md"]
-
-    def test_parses_policy_with_instructions_file(self, fixtures_dir: Path) -> None:
-        """Test parsing a policy with instructions_file."""
-        policy_file = fixtures_dir / "policies" / "policy_with_instructions_file.yml"
-        policies = parse_policy_file(policy_file)
+        # Create a policy file
+        policy_file = policies_dir / "test-policy.md"
+        policy_file.write_text(
+            """---
+name: Test Policy
+trigger: "src/**/*"
+---
+Please check the source files.
+"""
+        )
+
+        policies = load_policies_from_directory(policies_dir)
 
         assert len(policies) == 1
-        assert "Security Review Required" in policies[0].instructions
-        assert "hardcoded credentials" in policies[0].instructions
+        assert policies[0].name == "Test Policy"
+        assert policies[0].triggers == ["src/**/*"]
+        assert policies[0].detection_mode == DetectionMode.TRIGGER_SAFETY
+        assert "check the source files" in policies[0].instructions
+
+    def test_loads_multiple_policies(self, temp_dir: Path) -> None:
+        """Test loading multiple policies."""
+        policies_dir = temp_dir / "policies"
+        policies_dir.mkdir()
+
+        # Create policy files
+        (policies_dir / "policy1.md").write_text(
+            """---
+name: Policy 1
+trigger: "src/**/*"
+---
+Instructions for policy 1.
+"""
+        )
+        (policies_dir / "policy2.md").write_text(
+            """---
+name: Policy 2
+trigger: "test/**/*"
+---
+Instructions for policy 2.
+"""
+        )
+
+        policies = load_policies_from_directory(policies_dir)
+
+        assert len(policies) == 2
+        names = {p.name for p in policies}
+        assert names == {"Policy 1", "Policy 2"}
 
-    def test_empty_policy_file_returns_empty_list(self, fixtures_dir: Path) -> None:
-        """Test that empty policy file returns empty list."""
-        policy_file = fixtures_dir / "policies" / "empty_policy.yml"
-        policies = parse_policy_file(policy_file)
+    def test_returns_empty_for_empty_directory(self, temp_dir: Path) -> None:
+        """Test that empty directory returns empty list."""
+        policies_dir = temp_dir / "policies"
+        policies_dir.mkdir()
+
+        policies = load_policies_from_directory(policies_dir)
 
         assert policies == []
 
-    def test_raises_for_missing_trigger(self, fixtures_dir: Path) -> None:
-        """Test error when policy is missing trigger."""
-        policy_file = fixtures_dir / "policies" / "invalid_missing_trigger.yml"
+    def test_returns_empty_for_nonexistent_directory(self, temp_dir: Path) -> None:
+        """Test that nonexistent directory returns empty list."""
+        policies_dir = temp_dir / "nonexistent"
 
-        with pytest.raises(PolicyParseError, match="validation failed"):
-            parse_policy_file(policy_file)
+        policies = load_policies_from_directory(policies_dir)
 
-    def test_raises_for_missing_instructions(self, fixtures_dir: Path) -> None:
-        """Test error when policy is missing both instructions and instructions_file."""
-        policy_file = fixtures_dir / "policies" / "invalid_missing_instructions.yml"
+        assert policies == []
 
-        with pytest.raises(PolicyParseError, match="validation failed"):
-            parse_policy_file(policy_file)
+    def test_loads_policy_with_set_detection_mode(self, temp_dir: Path) -> None:
+        """Test loading a policy with set detection mode."""
+        policies_dir = temp_dir / "policies"
+        policies_dir.mkdir()
+
+        policy_file = policies_dir / "source-test-pairing.md"
+        policy_file.write_text(
+            """---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+"""
+        )
+
+        policies = load_policies_from_directory(policies_dir)
+
+        assert len(policies) == 1
+        assert policies[0].name == "Source/Test Pairing"
+        assert policies[0].detection_mode == DetectionMode.SET
+        assert policies[0].set_patterns == ["src/{path}.py", "tests/{path}_test.py"]
+
+    def test_loads_policy_with_pair_detection_mode(self, temp_dir: Path) -> None:
+        """Test loading a policy with pair detection mode."""
+        policies_dir = temp_dir / "policies"
+        policies_dir.mkdir()
+
+        policy_file = policies_dir / "api-docs.md"
+        policy_file.write_text(
+            """---
+name: API Documentation
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+---
+API code requires documentation.
+"""
+        )
 
-    def test_raises_for_nonexistent_file(self, temp_dir: Path) -> None:
-        """Test error when policy file doesn't exist."""
-        policy_file = temp_dir / "nonexistent.yml"
+        policies = load_policies_from_directory(policies_dir)
 
-        with pytest.raises(PolicyParseError, match="does not exist"):
-            parse_policy_file(policy_file)
+        assert len(policies) == 1
+        assert policies[0].name == "API Documentation"
+        assert policies[0].detection_mode == DetectionMode.PAIR
+        assert policies[0].pair_config is not None
+        assert policies[0].pair_config.trigger == "src/api/{name}.py"
+        assert policies[0].pair_config.expects == ["docs/api/{name}.md"]
+
+    def test_loads_policy_with_command_action(self, temp_dir: Path) -> None:
+        """Test loading a policy with command action."""
+        policies_dir = temp_dir / "policies"
+        policies_dir.mkdir()
+
+        policy_file = policies_dir / "format-python.md"
+        policy_file.write_text(
+            """---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
+"""
+        )
+
+        policies = load_policies_from_directory(policies_dir)
+
+        assert len(policies) == 1
+        assert policies[0].name == "Format Python"
+        from deepwork.core.policy_parser import ActionType
 
-    def test_raises_for_directory_path(self, temp_dir: Path) -> None:
-        """Test error when path is a directory."""
-        with pytest.raises(PolicyParseError, match="is not a file"):
-            parse_policy_file(temp_dir)
+        assert policies[0].action_type == ActionType.COMMAND
+        assert policies[0].command_action is not None
+        assert policies[0].command_action.command == "ruff format {file}"
+        assert policies[0].command_action.run_for == "each_match"

From 84eb7416a64385d3fd9995071d9af21f58c06c12 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 21:42:33 +0000
Subject: [PATCH 09/21] Format policy_parser.py with ruff

---
 src/deepwork/core/policy_parser.py | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/src/deepwork/core/policy_parser.py b/src/deepwork/core/policy_parser.py
index 06372098..21726079 100644
--- a/src/deepwork/core/policy_parser.py
+++ b/src/deepwork/core/policy_parser.py
@@ -115,13 +115,9 @@ def from_frontmatter(
 
         mode_count = sum([has_trigger, has_set, has_pair])
         if mode_count == 0:
-            raise PolicyParseError(
-                f"Policy '{name}' must have 'trigger', 'set', or 'pair'"
-            )
+            raise PolicyParseError(f"Policy '{name}' must have 'trigger', 'set', or 'pair'")
         if mode_count > 1:
-            raise PolicyParseError(
-                f"Policy '{name}' has multiple detection modes - use only one"
-            )
+            raise PolicyParseError(f"Policy '{name}' has multiple detection modes - use only one")
 
         # Parse based on detection mode
         detection_mode: DetectionMode
@@ -141,9 +137,7 @@ def from_frontmatter(
             detection_mode = DetectionMode.SET
             set_patterns = list(frontmatter["set"])
             if len(set_patterns) < 2:
-                raise PolicyParseError(
-                    f"Policy '{name}' set requires at least 2 patterns"
-                )
+                raise PolicyParseError(f"Policy '{name}' set requires at least 2 patterns")
 
         elif has_pair:
             detection_mode = DetectionMode.PAIR
@@ -170,9 +164,7 @@ def from_frontmatter(
             action_type = ActionType.PROMPT
             # Markdown body is the instructions
             if not markdown_body.strip():
-                raise PolicyParseError(
-                    f"Policy '{name}' with prompt action requires markdown body"
-                )
+                raise PolicyParseError(f"Policy '{name}' with prompt action requires markdown body")
 
         # Get compare_to
         compare_to = frontmatter.get("compare_to", DEFAULT_COMPARE_TO)
@@ -230,9 +222,7 @@ def parse_frontmatter_file(filepath: Path) -> tuple[dict[str, Any], str]:
     try:
         frontmatter = yaml.safe_load(frontmatter_str)
     except yaml.YAMLError as e:
-        raise PolicyParseError(
-            f"Invalid YAML frontmatter in '{filepath.name}': {e}"
-        ) from e
+        raise PolicyParseError(f"Invalid YAML frontmatter in '{filepath.name}': {e}") from e
 
     if frontmatter is None:
         frontmatter = {}
@@ -270,9 +260,7 @@ def parse_policy_file_v2(filepath: Path) -> Policy:
     try:
         validate_against_schema(frontmatter, POLICY_FRONTMATTER_SCHEMA)
     except ValidationError as e:
-        raise PolicyParseError(
-            f"Policy '{filepath.name}' validation failed: {e}"
-        ) from e
+        raise PolicyParseError(f"Policy '{filepath.name}' validation failed: {e}") from e
 
     # Create Policy object
     filename = filepath.stem  # filename without .md extension

From 76f138c1cc74872b90cd07a2af764ea2e15bbd19 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 16 Jan 2026 21:43:09 +0000
Subject: [PATCH 10/21] Update uv.lock

---
 uv.lock | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/uv.lock b/uv.lock
index c4091ca4..cd4110a3 100644
--- a/uv.lock
+++ b/uv.lock
@@ -126,7 +126,7 @@ toml = [
 
 [[package]]
 name = "deepwork"
-version = "0.3.0"
+version = "0.4.0"
 source = { editable = "." }
 dependencies = [
     { name = "click" },

From e209e6f9db4dcc3be66764fd53365575dfb8fc69 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 14:13:27 -0700
Subject: [PATCH 11/21] Rename policy system to rules system

Rename all policy-related terminology to rules throughout the codebase:
- Rename deepwork_policy job to deepwork_rules
- Rename .deepwork.policy.yml to .deepwork.rules.yml
- Rename policy_parser.py, policy_queue.py, policy_check.py to rules_*
- Rename policy_schema.py to rules_schema.py
- Rename policy_stop_hook.sh to rules_stop_hook.sh
- Update all documentation, tests, and references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 .claude/commands/add_platform.verify.md       |   8 +-
 .claude/commands/deepwork_jobs.implement.md   |  50 +--
 .claude/commands/deepwork_policy.define.md    | 388 -----------------
 .claude/commands/deepwork_rules.define.md     | 288 +++++++++++++
 .claude/commands/update.job.md                |   4 +-
 .claude/settings.json                         |  18 +
 .deepwork.policy.yml => .deepwork.rules.yml   |  10 +-
 .deepwork/jobs/add_platform/job.yml           |   2 +-
 .deepwork/jobs/add_platform/steps/verify.md   |   4 +-
 .deepwork/jobs/deepwork_jobs/job.yml          |   6 +-
 .../jobs/deepwork_jobs/steps/implement.md     |  38 +-
 .../deepwork_policy/hooks/policy_stop_hook.sh |  56 ---
 .deepwork/jobs/deepwork_policy/job.yml        |  40 --
 .../jobs/deepwork_policy/steps/define.md      | 258 ------------
 .../hooks/capture_prompt_work_tree.sh         |   0
 .../deepwork_rules}/hooks/global_hooks.yml    |   4 +-
 .../deepwork_rules/hooks/rules_stop_hook.sh   |  24 +-
 .../hooks/user_prompt_submit.sh               |   0
 .../jobs/deepwork_rules}/job.yml              |  26 +-
 .../jobs/deepwork_rules}/steps/define.md      |  72 ++--
 .deepwork/jobs/update/job.yml                 |   2 +-
 .deepwork/jobs/update/steps/job.md            |   2 +-
 .gemini/commands/add_platform/verify.toml     |   4 +-
 .gemini/commands/deepwork_jobs/implement.toml |  38 +-
 .gemini/commands/deepwork_policy/define.toml  | 396 ------------------
 .gemini/commands/deepwork_rules/define.toml   | 295 +++++++++++++
 .gemini/commands/update/job.toml              |   4 +-
 CHANGELOG.md                                  |  28 +-
 README.md                                     |  32 +-
 claude.md                                     |   4 +-
 doc/architecture.md                           |  88 ++--
 doc/platforms/gemini/hooks.md                 |  10 +-
 doc/{policy_syntax.md => rules_syntax.md}     | 102 ++---
 ...ystem_design.md => rules_system_design.md} | 132 +++---
 doc/test_scenarios.md                         | 136 +++---
 src/deepwork/cli/install.py                   |  40 +-
 src/deepwork/core/command_executor.py         |   6 +-
 src/deepwork/core/pattern_matcher.py          |   2 +-
 .../{policy_parser.py => rules_parser.py}     | 200 ++++-----
 .../core/{policy_queue.py => rules_queue.py}  |  48 +--
 src/deepwork/hooks/README.md                  |   8 +-
 src/deepwork/hooks/__init__.py                |   9 +-
 src/deepwork/hooks/claude_hook.sh             |   4 +-
 src/deepwork/hooks/gemini_hook.sh             |   4 +-
 .../hooks/{policy_check.py => rules_check.py} | 150 +++----
 .../{policy_schema.py => rules_schema.py}     |  16 +-
 .../standard_jobs/deepwork_jobs/job.yml       |   6 +-
 .../deepwork_jobs/steps/implement.md          |  38 +-
 .../hooks/capture_prompt_work_tree.sh         |   0
 .../deepwork_rules}/hooks/global_hooks.yml    |   4 +-
 .../deepwork_rules/hooks/rules_stop_hook.sh   |  43 ++
 .../hooks/user_prompt_submit.sh               |   0
 .../standard_jobs/deepwork_rules/job.yml      |  37 ++
 .../deepwork_rules/steps/define.md            | 198 +++++++++
 .../{default_policy.yml => default_rules.yml} |  12 +-
 tests/integration/test_install_flow.py        |  38 +-
 tests/shell_script_tests/README.md            |  12 +-
 tests/shell_script_tests/conftest.py          |  24 +-
 .../test_capture_prompt_work_tree.py          |  62 +--
 .../shell_script_tests/test_hook_wrappers.py  |  18 +-
 .../test_hooks_json_format.py                 | 114 ++---
 ...y_stop_hook.py => test_rules_stop_hook.py} | 148 +++----
 .../test_user_prompt_submit.py                |  36 +-
 tests/unit/test_hook_wrapper.py               |   8 +-
 tests/unit/test_hooks_syncer.py               |   4 +-
 tests/unit/test_policy_parser.py              | 364 ----------------
 tests/unit/test_rules_parser.py               | 364 ++++++++++++++++
 67 files changed, 2163 insertions(+), 2423 deletions(-)
 delete mode 100644 .claude/commands/deepwork_policy.define.md
 create mode 100644 .claude/commands/deepwork_rules.define.md
 rename .deepwork.policy.yml => .deepwork.rules.yml (89%)
 delete mode 100755 .deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh
 delete mode 100644 .deepwork/jobs/deepwork_policy/job.yml
 delete mode 100644 .deepwork/jobs/deepwork_policy/steps/define.md
 rename .deepwork/jobs/{deepwork_policy => deepwork_rules}/hooks/capture_prompt_work_tree.sh (100%)
 rename {src/deepwork/standard_jobs/deepwork_policy => .deepwork/jobs/deepwork_rules}/hooks/global_hooks.yml (62%)
 rename src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh => .deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh (51%)
 rename .deepwork/jobs/{deepwork_policy => deepwork_rules}/hooks/user_prompt_submit.sh (100%)
 rename {src/deepwork/standard_jobs/deepwork_policy => .deepwork/jobs/deepwork_rules}/job.yml (50%)
 rename {src/deepwork/standard_jobs/deepwork_policy => .deepwork/jobs/deepwork_rules}/steps/define.md (69%)
 delete mode 100644 .gemini/commands/deepwork_policy/define.toml
 create mode 100644 .gemini/commands/deepwork_rules/define.toml
 rename doc/{policy_syntax.md => rules_syntax.md} (73%)
 rename doc/{policy_system_design.md => rules_system_design.md} (80%)
 rename src/deepwork/core/{policy_parser.py => rules_parser.py} (68%)
 rename src/deepwork/core/{policy_queue.py => rules_queue.py} (88%)
 rename src/deepwork/hooks/{policy_check.py => rules_check.py} (79%)
 rename src/deepwork/schemas/{policy_schema.py => rules_schema.py} (86%)
 rename src/deepwork/standard_jobs/{deepwork_policy => deepwork_rules}/hooks/capture_prompt_work_tree.sh (100%)
 rename {.deepwork/jobs/deepwork_policy => src/deepwork/standard_jobs/deepwork_rules}/hooks/global_hooks.yml (62%)
 create mode 100755 src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh
 rename src/deepwork/standard_jobs/{deepwork_policy => deepwork_rules}/hooks/user_prompt_submit.sh (100%)
 create mode 100644 src/deepwork/standard_jobs/deepwork_rules/job.yml
 create mode 100644 src/deepwork/standard_jobs/deepwork_rules/steps/define.md
 rename src/deepwork/templates/{default_policy.yml => default_rules.yml} (85%)
 rename tests/shell_script_tests/{test_policy_stop_hook.py => test_rules_stop_hook.py} (61%)
 delete mode 100644 tests/unit/test_policy_parser.py
 create mode 100644 tests/unit/test_rules_parser.py

diff --git a/.claude/commands/add_platform.verify.md b/.claude/commands/add_platform.verify.md
index d92da75a..d937b537 100644
--- a/.claude/commands/add_platform.verify.md
+++ b/.claude/commands/add_platform.verify.md
@@ -14,7 +14,7 @@ hooks:
             2. Running `deepwork install --platform <platform>` completes without errors
             3. Expected command files are created in the platform's command directory
             4. Command file content matches the templates and job definitions
-            5. Established DeepWork jobs (deepwork_jobs, deepwork_policy) are installed correctly
+            5. Established DeepWork jobs (deepwork_jobs, deepwork_rules) are installed correctly
             6. The platform can be used alongside existing platforms without conflicts
 
             If ALL criteria are met, include `<promise>✓ Quality Criteria Met</promise>`.
@@ -121,7 +121,7 @@ Ensure the implementation step is complete:
    - `deepwork_jobs.define.md` exists (or equivalent for the platform)
    - `deepwork_jobs.implement.md` exists
    - `deepwork_jobs.refine.md` exists
-   - `deepwork_policy.define.md` exists
+   - `deepwork_rules.define.md` exists
    - All expected step commands exist
 
 4. **Validate command file content**
@@ -151,7 +151,7 @@ Ensure the implementation step is complete:
 - `deepwork install --platform <platform_name>` completes without errors
 - All expected command files are created:
   - deepwork_jobs.define, implement, refine
-  - deepwork_policy.define
+  - deepwork_rules.define
   - Any other standard job commands
 - Command file content is correct:
   - Matches platform's expected format
@@ -218,7 +218,7 @@ Verify the installation meets ALL criteria:
 2. Running `deepwork install --platform <platform>` completes without errors
 3. Expected command files are created in the platform's command directory
 4. Command file content matches the templates and job definitions
-5. Established DeepWork jobs (deepwork_jobs, deepwork_policy) are installed correctly
+5. Established DeepWork jobs (deepwork_jobs, deepwork_rules) are installed correctly
 6. The platform can be used alongside existing platforms without conflicts
 
 If ALL criteria are met, include `<promise>✓ Quality Criteria Met</promise>`.
diff --git a/.claude/commands/deepwork_jobs.implement.md b/.claude/commands/deepwork_jobs.implement.md
index 132330f1..76089b2d 100644
--- a/.claude/commands/deepwork_jobs.implement.md
+++ b/.claude/commands/deepwork_jobs.implement.md
@@ -19,9 +19,9 @@ hooks:
             6. **Ask Structured Questions**: Do step instructions that gather user input explicitly use the phrase "ask structured questions"?
             7. **Sync Complete**: Has `deepwork sync` been run successfully?
             8. **Commands Available**: Are the slash-commands generated in `.claude/commands/`?
-            9. **Policies Considered**: Have you thought about whether policies would benefit this job?
-               - If relevant policies were identified, did you explain them and offer to run `/deepwork_policy.define`?
-               - Not every job needs policies - only suggest when genuinely helpful.
+            9. **Rules Considered**: Have you thought about whether rules would benefit this job?
+               - If relevant rules were identified, did you explain them and offer to run `/deepwork_rules.define`?
+               - Not every job needs rules - only suggest when genuinely helpful.
 
             If ANY criterion is not met, continue working to address it.
             If ALL criteria are satisfied, include `<promise>✓ Quality Criteria Met</promise>` in your response.
@@ -200,19 +200,19 @@ This will:
 
 After running `deepwork sync`, look at the "To use the new commands" section in the output. **Relay these exact reload instructions to the user** so they know how to pick up the new commands. Don't just reference the sync output - tell them directly what they need to do (e.g., "Type 'exit' then run 'claude --resume'" for Claude Code, or "Run '/memory refresh'" for Gemini CLI).
 
-### Step 7: Consider Policies for the New Job
+### Step 7: Consider Rules for the New Job
 
-After implementing the job, consider whether there are **policies** that would help enforce quality or consistency when working with this job's domain.
+After implementing the job, consider whether there are **rules** that would help enforce quality or consistency when working with this job's domain.
 
-**What are policies?**
+**What are rules?**
 
-Policies are automated guardrails defined in `.deepwork.policy.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
 - Quality standards are maintained
 
-**When to suggest policies:**
+**When to suggest rules:**
 
 Think about the job you just implemented and ask:
 - Does this job produce outputs that other files depend on?
@@ -220,28 +220,28 @@ Think about the job you just implemented and ask:
 - Are there quality checks or reviews that should happen when certain files in this domain change?
 - Could changes to the job's output files impact other parts of the project?
 
-**Examples of policies that might make sense:**
+**Examples of rules that might make sense:**
 
-| Job Type | Potential Policy |
-|----------|------------------|
+| Job Type | Potential Rule |
+|----------|----------------|
 | API Design | "Update API docs when endpoint definitions change" |
 | Database Schema | "Review migrations when schema files change" |
 | Competitive Research | "Update strategy docs when competitor analysis changes" |
 | Feature Development | "Update changelog when feature files change" |
 | Configuration Management | "Update install guide when config files change" |
 
-**How to offer policy creation:**
+**How to offer rule creation:**
 
-If you identify one or more policies that would benefit the user, explain:
-1. **What the policy would do** - What triggers it and what action it prompts
+If you identify one or more rules that would benefit the user, explain:
+1. **What the rule would do** - What triggers it and what action it prompts
 2. **Why it would help** - How it prevents common mistakes or keeps things in sync
 3. **What files it would watch** - The trigger patterns
 
 Then ask the user:
 
-> "Would you like me to create this policy for you? I can run `/deepwork_policy.define` to set it up."
+> "Would you like me to create this rule for you? I can run `/deepwork_rules.define` to set it up."
 
-If the user agrees, invoke the `/deepwork_policy.define` command to guide them through creating the policy.
+If the user agrees, invoke the `/deepwork_rules.define` command to guide them through creating the rule.
 
 **Example dialogue:**
 
@@ -250,15 +250,15 @@ Based on the competitive_research job you just created, I noticed that when
 competitor analysis files change, it would be helpful to remind you to update
 your strategy documentation.
 
-I'd suggest a policy like:
+I'd suggest a rule like:
 - **Name**: "Update strategy when competitor analysis changes"
 - **Trigger**: `**/positioning_report.md`
 - **Action**: Prompt to review and update `docs/strategy.md`
 
-Would you like me to create this policy? I can run `/deepwork_policy.define` to set it up.
+Would you like me to create this rule? I can run `/deepwork_rules.define` to set it up.
 ```
 
-**Note:** Not every job needs policies. Only suggest them when they would genuinely help maintain consistency or quality. Don't force policies where they don't make sense.
+**Note:** Not every job needs rules. Only suggest them when they would genuinely help maintain consistency or quality. Don't force rules where they don't make sense.
 
 ## Example Implementation
 
@@ -292,8 +292,8 @@ Before marking this step complete, ensure:
 - [ ] `deepwork sync` executed successfully
 - [ ] Commands generated in platform directory
 - [ ] User informed to follow reload instructions from `deepwork sync`
-- [ ] Considered whether policies would benefit this job (Step 7)
-- [ ] If policies suggested, offered to run `/deepwork_policy.define`
+- [ ] Considered whether rules would benefit this job (Step 7)
+- [ ] If rules suggested, offered to run `/deepwork_rules.define`
 
 ## Quality Criteria
 
@@ -305,7 +305,7 @@ Before marking this step complete, ensure:
 - Steps with user inputs explicitly use "ask structured questions" phrasing
 - Sync completed successfully
 - Commands available for use
-- Thoughtfully considered relevant policies for the job domain
+- Thoughtfully considered relevant rules for the job domain
 
 
 ## Inputs
@@ -355,9 +355,9 @@ Verify the implementation meets ALL quality criteria before completing:
 6. **Ask Structured Questions**: Do step instructions that gather user input explicitly use the phrase "ask structured questions"?
 7. **Sync Complete**: Has `deepwork sync` been run successfully?
 8. **Commands Available**: Are the slash-commands generated in `.claude/commands/`?
-9. **Policies Considered**: Have you thought about whether policies would benefit this job?
-   - If relevant policies were identified, did you explain them and offer to run `/deepwork_policy.define`?
-   - Not every job needs policies - only suggest when genuinely helpful.
+9. **Rules Considered**: Have you thought about whether rules would benefit this job?
+   - If relevant rules were identified, did you explain them and offer to run `/deepwork_rules.define`?
+   - Not every job needs rules - only suggest when genuinely helpful.
 
 If ANY criterion is not met, continue working to address it.
 If ALL criteria are satisfied, include `<promise>✓ Quality Criteria Met</promise>` in your response.
diff --git a/.claude/commands/deepwork_policy.define.md b/.claude/commands/deepwork_policy.define.md
deleted file mode 100644
index 9a2a551a..00000000
--- a/.claude/commands/deepwork_policy.define.md
+++ /dev/null
@@ -1,388 +0,0 @@
----
-description: Create or update policies in .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)
----
-
-# deepwork_policy.define
-
-**Standalone command** in the **deepwork_policy** job - can be run anytime
-
-**Summary**: Policy enforcement for AI agent sessions
-
-## Job Overview
-
-Manages policies that automatically trigger when certain files change during an AI agent session.
-Policies help ensure that code changes follow team guidelines, documentation is updated,
-and architectural decisions are respected.
-
-**Policy System v2 (Recommended)**
-Policies are defined as individual markdown files in `.deepwork/policies/` with YAML frontmatter.
-This format supports:
-- Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
-- Action types: prompt (show instructions), command (run idempotent commands)
-- Variable pattern matching for file correspondence (e.g., `src/{path}.py` ↔ `tests/{path}_test.py`)
-
-**Legacy v1 Format**
-Still supported: `.deepwork.policy.yml` at project root with trigger/safety/instructions fields.
-
-Example use cases:
-- Enforce source/test pairing with set patterns
-- Run formatters automatically when files change
-- Update installation docs when configuration files change
-- Require security review when authentication code is modified
-- Ensure API documentation stays in sync with API code
-
-
-
-## Instructions
-
-# Define Policy
-
-## Objective
-
-Create or update policies to enforce team guidelines, documentation requirements, file correspondences, or automated commands when specific files change.
-
-## Task
-
-Guide the user through defining a new policy by asking structured questions. **Do not create the policy without first understanding what they want to enforce.**
-
-**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user.
-
-## Policy System Overview
-
-DeepWork supports two policy formats:
-
-**v2 (Recommended)**: Individual markdown files in `.deepwork/policies/` with YAML frontmatter
-**v1 (Legacy)**: Single `.deepwork.policy.yml` file at project root
-
-**Always prefer v2 format** for new policies. It supports more detection modes and action types.
-
----
-
-## Step 1: Understand the Policy Purpose
-
-Ask structured questions to understand what the user wants to enforce:
-
-1. **What should this policy enforce?**
-   - Documentation sync? Security review? File correspondence? Code formatting?
-
-2. **What files trigger this policy?**
-   - Which files/directories, when changed, should trigger action?
-
-3. **What should happen when the policy fires?**
-   - Show instructions to the agent? Run a command automatically?
-
----
-
-## Step 2: Choose Detection Mode
-
-Policies support three detection modes:
-
-### Trigger/Safety (Default)
-Fire when trigger patterns match AND safety patterns don't.
-
-**Use for**: General checks like "source changed, verify README"
-
-```yaml
-trigger: "app/config/**/*"
-safety: "docs/install_guide.md"
-```
-
-### Set (Bidirectional Correspondence)
-Fire when files matching one pattern change but corresponding files don't.
-
-**Use for**: Source/test pairing, i18n files, paired documentation
-
-```yaml
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
-```
-
-If `src/utils/helper.py` changes, expects `tests/utils/helper_test.py` to also change.
-
-### Pair (Directional Correspondence)
-Fire when trigger files change but expected files don't. Changes to expected files alone don't trigger.
-
-**Use for**: API code requires docs (but docs changes don't require API changes)
-
-```yaml
-pair:
-  trigger: src/api/{name}.py
-  expects: docs/api/{name}.md
-```
-
-### Variable Pattern Syntax
-
-- `{path}` - Matches multiple path segments (e.g., `foo/bar/baz`)
-- `{name}` - Matches a single segment (e.g., `helper`)
-
----
-
-## Step 3: Choose Action Type
-
-### Prompt (Default)
-Show instructions to the agent. The markdown body becomes the instructions.
-
-```markdown
----
-name: Security Review
-trigger: "src/auth/**/*"
----
-Please review for hardcoded credentials and validate input handling.
-```
-
-### Command
-Run an idempotent command automatically. No markdown body needed.
-
-```markdown
----
-name: Format Python
-trigger: "**/*.py"
-action:
-  command: "ruff format {file}"
-  run_for: each_match
----
-```
-
-**Command variables**:
-- `{file}` - Current file being processed
-- `{files}` - Space-separated list of all matching files
-- `{repo_root}` - Repository root path
-
-**run_for options**:
-- `each_match` - Run command once per matching file
-- `all_matches` - Run command once with all files
-
----
-
-## Step 4: Define Optional Settings
-
-### compare_to (Optional)
-Controls what baseline is used for detecting changed files:
-
-- `base` (default) - Changes since branch diverged from main/master
-- `default_tip` - Changes compared to current main/master tip
-- `prompt` - Changes since the last prompt submission
-
-Most policies should use the default (`base`).
-
----
-
-## Step 5: Create the Policy File (v2 Format)
-
-### File Location
-Create: `.deepwork/policies/[policy-name].md`
-
-Use kebab-case for filename (e.g., `source-test-pairing.md`, `format-python.md`)
-
-### v2 Format Examples
-
-**Trigger/Safety with Prompt:**
-```markdown
----
-name: Update Install Guide
-trigger: "app/config/**/*"
-safety: "docs/install_guide.md"
----
-Configuration files have changed. Please review docs/install_guide.md
-and update installation instructions if needed.
-```
-
-**Set (Bidirectional) with Prompt:**
-```markdown
----
-name: Source/Test Pairing
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
----
-When source files change, corresponding test files should also change.
-Please create or update tests for the modified source files.
-```
-
-**Pair (Directional) with Prompt:**
-```markdown
----
-name: API Documentation
-pair:
-  trigger: src/api/{name}.py
-  expects: docs/api/{name}.md
----
-API code has changed. Please update the corresponding documentation.
-```
-
-**Command Action:**
-```markdown
----
-name: Format Python Files
-trigger: "**/*.py"
-action:
-  command: "ruff format {file}"
-  run_for: each_match
----
-```
-
-**Multiple Trigger Patterns:**
-```markdown
----
-name: Security Review
-trigger:
-  - "src/auth/**/*"
-  - "src/security/**/*"
-safety:
-  - "SECURITY.md"
-  - "docs/security_audit.md"
----
-Authentication or security code has been changed. Please review for:
-1. Hardcoded credentials or secrets
-2. Input validation issues
-3. Access control logic
-```
-
----
-
-## Step 6: Legacy v1 Format (If Needed)
-
-Only use v1 format when adding to an existing `.deepwork.policy.yml` file.
-
-**File Location**: `.deepwork.policy.yml` (project root)
-
-```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  compare_to: "base"
-  instructions: |
-    Configuration files have changed. Please review docs/install_guide.md.
-```
-
-**Alternative with instructions_file:**
-```yaml
-- name: "Security review"
-  trigger: "src/auth/**/*"
-  instructions_file: "path/to/instructions.md"
-```
-
----
-
-## Step 7: Verify the Policy
-
-After creating the policy:
-
-1. **Check YAML frontmatter syntax** - Ensure valid YAML
-2. **Verify detection mode is appropriate** - trigger/safety vs set vs pair
-3. **Test patterns match intended files** - Check glob/variable patterns
-4. **Review instructions/command** - Ensure they're actionable
-5. **Check for conflicts** - Ensure no overlap with existing policies
-
----
-
-## Pattern Reference
-
-### Glob Patterns
-- `*` - Matches any characters within a single path segment
-- `**` - Matches across multiple path segments (recursive)
-- `?` - Matches a single character
-
-### Variable Patterns (v2 only)
-- `{path}` - Captures multiple segments: `src/{path}.py` matches `src/a/b/c.py` → path=`a/b/c`
-- `{name}` - Captures single segment: `src/{name}.py` matches `src/utils.py` → name=`utils`
-
-### Common Examples
-- `src/**/*.py` - All Python files in src (recursive)
-- `app/config/**/*` - All files in app/config
-- `*.md` - Markdown files in root only
-- `**/*.test.ts` - All test files anywhere
-- `src/{path}.ts` ↔ `tests/{path}.test.ts` - Source/test pairs
-
----
-
-## Output Format
-
-Create one of:
-- `.deepwork/policies/[policy-name].md` (v2 format, recommended)
-- Entry in `.deepwork.policy.yml` (v1 format, legacy)
-
----
-
-## Quality Criteria
-
-- Asked structured questions to understand requirements
-- Chose appropriate detection mode (trigger/safety, set, or pair)
-- Chose appropriate action type (prompt or command)
-- Policy name is clear and descriptive
-- Patterns accurately match intended files
-- Instructions or command are actionable
-- YAML frontmatter is valid
-
----
-
-## Context
-
-Policies are evaluated automatically when you finish working. The system:
-
-1. Loads policies from `.deepwork/policies/` (v2) and `.deepwork.policy.yml` (v1)
-2. Detects changed files based on `compare_to` setting
-3. Evaluates each policy based on its detection mode
-4. For **command** actions: Runs the command automatically
-5. For **prompt** actions: Shows instructions if policy fires
-
-Mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response.
-
-
-## Inputs
-
-### User Parameters
-
-Please gather the following information from the user:
-- **policy_purpose**: What guideline or constraint should this policy enforce?
-
-
-## Work Branch Management
-
-All work for this job should be done on a dedicated work branch:
-
-1. **Check current branch**:
-   - If already on a work branch for this job (format: `deepwork/deepwork_policy-[instance]-[date]`), continue using it
-   - If on main/master, create a new work branch
-
-2. **Create work branch** (if needed):
-   ```bash
-   git checkout -b deepwork/deepwork_policy-[instance]-$(date +%Y%m%d)
-   ```
-   Replace `[instance]` with a descriptive identifier (e.g., `acme`, `q1-launch`, etc.)
-
-## Output Requirements
-
-Create the following output(s):
-- `.deepwork/policies/*.md`- `.deepwork.policy.yml`
-Ensure all outputs are:
-- Well-formatted and complete
-- Ready for review or use by subsequent steps
-
-## Completion
-
-After completing this step:
-
-1. **Verify outputs**: Confirm all required files have been created
-
-2. **Inform the user**:
-   - The define command is complete
-   - Outputs created: .deepwork/policies/*.md, .deepwork.policy.yml
-   - This command can be run again anytime to make further changes
-
-## Command Complete
-
-This is a standalone command that can be run anytime. The outputs are ready for use.
-
-Consider:
-- Reviewing the outputs
-- Running `deepwork sync` if job definitions were changed
-- Re-running this command later if further changes are needed
-
----
-
-## Context Files
-
-- Job definition: `.deepwork/jobs/deepwork_policy/job.yml`
-- Step instructions: `.deepwork/jobs/deepwork_policy/steps/define.md`
\ No newline at end of file
diff --git a/.claude/commands/deepwork_rules.define.md b/.claude/commands/deepwork_rules.define.md
new file mode 100644
index 00000000..286cd54a
--- /dev/null
+++ b/.claude/commands/deepwork_rules.define.md
@@ -0,0 +1,288 @@
+---
+description: Create or update rule entries in .deepwork.rules.yml
+---
+
+# deepwork_rules.define
+
+**Standalone command** in the **deepwork_rules** job - can be run anytime
+
+**Summary**: Rules enforcement for AI agent sessions
+
+## Job Overview
+
+Manages rules that automatically trigger when certain files change during an AI agent session.
+Rules help ensure that code changes follow team guidelines, documentation is updated,
+and architectural decisions are respected.
+
+Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
+specifies:
+- Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
+- Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
+- Instructions: What the agent should do when the rule triggers
+
+Example use cases:
+- Update installation docs when configuration files change
+- Require security review when authentication code is modified
+- Ensure API documentation stays in sync with API code
+- Remind developers to update changelogs
+
+
+
+## Instructions
+
+# Define Rule
+
+## Objective
+
+Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+
+## Task
+
+Guide the user through defining a new rule by asking structured questions. **Do not create the rule without first understanding what they want to enforce.**
+
+**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user. This provides a better user experience with clear options and guided choices.
+
+### Step 1: Understand the Rule Purpose
+
+Start by asking structured questions to understand what the user wants to enforce:
+
+1. **What guideline or constraint should this rule enforce?**
+   - What situation triggers the need for action?
+   - What files or directories, when changed, should trigger this rule?
+   - Examples: "When config files change", "When API code changes", "When database schema changes"
+
+2. **What action should be taken?**
+   - What should the agent do when the rule triggers?
+   - Update documentation? Perform a security review? Update tests?
+   - Is there a specific file or process that needs attention?
+
+3. **Are there any "safety" conditions?**
+   - Are there files that, if also changed, mean the rule doesn't need to fire?
+   - For example: If config changes AND install_guide.md changes, assume docs are already updated
+   - This prevents redundant prompts when the user has already done the right thing
+
+### Step 2: Define the Trigger Patterns
+
+Help the user define glob patterns for files that should trigger the rule:
+
+**Common patterns:**
+- `src/**/*.py` - All Python files in src directory (recursive)
+- `app/config/**/*` - All files in app/config directory
+- `*.md` - Markdown files in the project root only (not recursive)
+- `src/api/**/*` - All files in the API directory
+- `migrations/**/*.sql` - All SQL migrations
+
+**Pattern syntax:**
+- `*` - Matches any characters within a single path segment
+- `**` - Matches any characters across multiple path segments (recursive)
+- `?` - Matches a single character
+
+### Step 3: Define Safety Patterns (Optional)
+
+If there are files that, when also changed, mean the rule shouldn't fire:
+
+**Examples:**
+- Rule: "Update install guide when config changes"
+  - Trigger: `app/config/**/*`
+  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
+
+- Rule: "Security review for auth changes"
+  - Trigger: `src/auth/**/*`
+  - Safety: `SECURITY.md`, `docs/security_review.md`
+
+### Step 3b: Choose the Comparison Mode (Optional)
+
+The `compare_to` field controls what baseline is used when detecting "changed files":
+
+**Options:**
+- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
+- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
+- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
+
+**When to use each:**
+- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
+- **default_tip**: For rules about what's different from production/main
+- **prompt**: For rules that should only consider changes made since the last prompt submission
+
+Most rules should use the default (`base`) and don't need to specify `compare_to`.
+
+### Step 4: Write the Instructions
+
+Create clear, actionable instructions for what the agent should do when the rule fires.
+
+**Good instructions include:**
+- What to check or review
+- What files might need updating
+- Specific actions to take
+- Quality criteria for completion
+
+**Example:**
+```
+Configuration files have changed. Please:
+1. Review docs/install_guide.md for accuracy
+2. Update any installation steps that reference changed config
+3. Verify environment variable documentation is current
+4. Test that installation instructions still work
+```
+
+### Step 5: Create the Rule Entry
+
+Create or update `.deepwork.rules.yml` in the project root.
+
+**File Location**: `.deepwork.rules.yml` (root of project)
+
+**Format**:
+```yaml
+- name: "[Friendly name for the rule]"
+  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
+  safety: "[glob pattern]"   # optional, or array
+  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
+  instructions: |
+    [Multi-line instructions for the agent...]
+```
+
+**Alternative with instructions_file**:
+```yaml
+- name: "[Friendly name for the rule]"
+  trigger: "[glob pattern]"
+  safety: "[glob pattern]"
+  compare_to: "base"         # optional
+  instructions_file: "path/to/instructions.md"
+```
+
+### Step 6: Verify the Rule
+
+After creating the rule:
+
+1. **Check the YAML syntax** - Ensure valid YAML formatting
+2. **Test trigger patterns** - Verify patterns match intended files
+3. **Review instructions** - Ensure they're clear and actionable
+4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
+
+## Example Rules
+
+### Update Documentation on Config Changes
+```yaml
+- name: "Update install guide on config changes"
+  trigger: "app/config/**/*"
+  safety: "docs/install_guide.md"
+  instructions: |
+    Configuration files have been modified. Please review docs/install_guide.md
+    and update it if any installation instructions need to change based on the
+    new configuration.
+```
+
+### Security Review for Auth Code
+```yaml
+- name: "Security review for authentication changes"
+  trigger:
+    - "src/auth/**/*"
+    - "src/security/**/*"
+  safety:
+    - "SECURITY.md"
+    - "docs/security_audit.md"
+  instructions: |
+    Authentication or security code has been changed. Please:
+    1. Review for hardcoded credentials or secrets
+    2. Check input validation on user inputs
+    3. Verify access control logic is correct
+    4. Update security documentation if needed
+```
+
+### API Documentation Sync
+```yaml
+- name: "API documentation update"
+  trigger: "src/api/**/*.py"
+  safety: "docs/api/**/*.md"
+  instructions: |
+    API code has changed. Please verify that API documentation in docs/api/
+    is up to date with the code changes. Pay special attention to:
+    - New or changed endpoints
+    - Modified request/response schemas
+    - Updated authentication requirements
+```
+
+## Output Format
+
+### .deepwork.rules.yml
+Create or update this file at the project root with the new rule entry.
+
+## Quality Criteria
+
+- Asked structured questions to understand user requirements
+- Rule name is clear and descriptive
+- Trigger patterns accurately match the intended files
+- Safety patterns prevent unnecessary triggering
+- Instructions are actionable and specific
+- YAML is valid and properly formatted
+
+## Context
+
+Rules are evaluated automatically when you finish working on a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting:
+   - `base` (default): Files changed since the branch diverged from main/master
+   - `default_tip`: Files that differ from the current tip of main/master
+   - `prompt`: Files changed since the last prompt submission
+2. Checks if any changes match rule trigger patterns
+3. Skips rules where safety patterns also matched
+4. Prompts you with instructions for any triggered rules
+
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
+
+
+## Inputs
+
+### User Parameters
+
+Please gather the following information from the user:
+- **rule_purpose**: What guideline or constraint should this rule enforce?
+
+
+## Work Branch Management
+
+All work for this job should be done on a dedicated work branch:
+
+1. **Check current branch**:
+   - If already on a work branch for this job (format: `deepwork/deepwork_rules-[instance]-[date]`), continue using it
+   - If on main/master, create a new work branch
+
+2. **Create work branch** (if needed):
+   ```bash
+   git checkout -b deepwork/deepwork_rules-[instance]-$(date +%Y%m%d)
+   ```
+   Replace `[instance]` with a descriptive identifier (e.g., `acme`, `q1-launch`, etc.)
+
+## Output Requirements
+
+Create the following output(s):
+- `.deepwork.rules.yml`
+Ensure all outputs are:
+- Well-formatted and complete
+- Ready for review or use by subsequent steps
+
+## Completion
+
+After completing this step:
+
+1. **Verify outputs**: Confirm all required files have been created
+
+2. **Inform the user**:
+   - The define command is complete
+   - Outputs created: .deepwork.rules.yml
+   - This command can be run again anytime to make further changes
+
+## Command Complete
+
+This is a standalone command that can be run anytime. The outputs are ready for use.
+
+Consider:
+- Reviewing the outputs
+- Running `deepwork sync` if job definitions were changed
+- Re-running this command later if further changes are needed
+
+---
+
+## Context Files
+
+- Job definition: `.deepwork/jobs/deepwork_rules/job.yml`
+- Step instructions: `.deepwork/jobs/deepwork_rules/steps/define.md`
\ No newline at end of file
diff --git a/.claude/commands/update.job.md b/.claude/commands/update.job.md
index 1d2af384..9698eecf 100644
--- a/.claude/commands/update.job.md
+++ b/.claude/commands/update.job.md
@@ -38,7 +38,7 @@ hooks:
 ## Job Overview
 
 A workflow for maintaining standard jobs bundled with DeepWork. Standard jobs
-(like `deepwork_jobs` and `deepwork_policy`) are source-controlled in
+(like `deepwork_jobs` and `deepwork_rules`) are source-controlled in
 `src/deepwork/standard_jobs/` and must be edited there—never in `.deepwork/jobs/`
 or `.claude/commands/` directly.
 
@@ -82,7 +82,7 @@ Standard jobs exist in THREE locations, but only ONE is the source of truth:
 #### 1. Identify the Standard Job to Update
 
 From conversation context, determine:
-- Which standard job needs updating (e.g., `deepwork_jobs`, `deepwork_policy`)
+- Which standard job needs updating (e.g., `deepwork_jobs`, `deepwork_rules`)
 - What changes are needed (job.yml, step instructions, hooks, etc.)
 
 Current standard jobs:
diff --git a/.claude/settings.json b/.claude/settings.json
index 4b7a20e6..aa1b950c 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -100,6 +100,15 @@
             "command": ".deepwork/jobs/deepwork_policy/hooks/user_prompt_submit.sh"
           }
         ]
+      },
+      {
+        "matcher": "",
+        "hooks": [
+          {
+            "type": "command",
+            "command": ".deepwork/jobs/deepwork_rules/hooks/user_prompt_submit.sh"
+          }
+        ]
       }
     ],
     "Stop": [
@@ -111,6 +120,15 @@
             "command": ".deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh"
           }
         ]
+      },
+      {
+        "matcher": "",
+        "hooks": [
+          {
+            "type": "command",
+            "command": ".deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh"
+          }
+        ]
       }
     ]
   }
diff --git a/.deepwork.policy.yml b/.deepwork.rules.yml
similarity index 89%
rename from .deepwork.policy.yml
rename to .deepwork.rules.yml
index f2721da3..d3444dc3 100644
--- a/.deepwork.policy.yml
+++ b/.deepwork.rules.yml
@@ -21,16 +21,16 @@
 - name: "Standard Jobs Source of Truth"
   trigger:
     - ".deepwork/jobs/deepwork_jobs/**/*"
-    - ".deepwork/jobs/deepwork_policy/**/*"
+    - ".deepwork/jobs/deepwork_rules/**/*"
   safety:
     - "src/deepwork/standard_jobs/deepwork_jobs/**/*"
-    - "src/deepwork/standard_jobs/deepwork_policy/**/*"
+    - "src/deepwork/standard_jobs/deepwork_rules/**/*"
   instructions: |
-    You modified files in `.deepwork/jobs/deepwork_jobs/` or `.deepwork/jobs/deepwork_policy/`.
+    You modified files in `.deepwork/jobs/deepwork_jobs/` or `.deepwork/jobs/deepwork_rules/`.
 
     **These are installed copies, NOT the source of truth!**
 
-    Standard jobs (deepwork_jobs, deepwork_policy) must be edited in their source location:
+    Standard jobs (deepwork_jobs, deepwork_rules) must be edited in their source location:
     - Source: `src/deepwork/standard_jobs/[job_name]/`
     - Installed copy: `.deepwork/jobs/[job_name]/` (DO NOT edit directly)
 
@@ -68,4 +68,4 @@
     **If NO version update is needed** (e.g., tests only, comments, internal refactoring with no behavior change):
     - Explicitly state why no version bump is required
 
-    **This policy requires explicit action** - either update both files or justify why no update is needed.
\ No newline at end of file
+    **This rule requires explicit action** - either update both files or justify why no update is needed.
diff --git a/.deepwork/jobs/add_platform/job.yml b/.deepwork/jobs/add_platform/job.yml
index 07544743..cca6d637 100644
--- a/.deepwork/jobs/add_platform/job.yml
+++ b/.deepwork/jobs/add_platform/job.yml
@@ -130,7 +130,7 @@ steps:
             2. Running `deepwork install --platform <platform>` completes without errors
             3. Expected command files are created in the platform's command directory
             4. Command file content matches the templates and job definitions
-            5. Established DeepWork jobs (deepwork_jobs, deepwork_policy) are installed correctly
+            5. Established DeepWork jobs (deepwork_jobs, deepwork_rules) are installed correctly
             6. The platform can be used alongside existing platforms without conflicts
 
             If ALL criteria are met, include `<promise>✓ Quality Criteria Met</promise>`.
diff --git a/.deepwork/jobs/add_platform/steps/verify.md b/.deepwork/jobs/add_platform/steps/verify.md
index c4d35ffc..f3afe15a 100644
--- a/.deepwork/jobs/add_platform/steps/verify.md
+++ b/.deepwork/jobs/add_platform/steps/verify.md
@@ -52,7 +52,7 @@ Ensure the implementation step is complete:
    - `deepwork_jobs.define.md` exists (or equivalent for the platform)
    - `deepwork_jobs.implement.md` exists
    - `deepwork_jobs.refine.md` exists
-   - `deepwork_policy.define.md` exists
+   - `deepwork_rules.define.md` exists
    - All expected step commands exist
 
 4. **Validate command file content**
@@ -82,7 +82,7 @@ Ensure the implementation step is complete:
 - `deepwork install --platform <platform_name>` completes without errors
 - All expected command files are created:
   - deepwork_jobs.define, implement, refine
-  - deepwork_policy.define
+  - deepwork_rules.define
   - Any other standard job commands
 - Command file content is correct:
   - Matches platform's expected format
diff --git a/.deepwork/jobs/deepwork_jobs/job.yml b/.deepwork/jobs/deepwork_jobs/job.yml
index e1afa5ee..e95aa2c0 100644
--- a/.deepwork/jobs/deepwork_jobs/job.yml
+++ b/.deepwork/jobs/deepwork_jobs/job.yml
@@ -77,9 +77,9 @@ steps:
             6. **Ask Structured Questions**: Do step instructions that gather user input explicitly use the phrase "ask structured questions"?
             7. **Sync Complete**: Has `deepwork sync` been run successfully?
             8. **Commands Available**: Are the slash-commands generated in `.claude/commands/`?
-            9. **Policies Considered**: Have you thought about whether policies would benefit this job?
-               - If relevant policies were identified, did you explain them and offer to run `/deepwork_policy.define`?
-               - Not every job needs policies - only suggest when genuinely helpful.
+            9. **Rules Considered**: Have you thought about whether rules would benefit this job?
+               - If relevant rules were identified, did you explain them and offer to run `/deepwork_rules.define`?
+               - Not every job needs rules - only suggest when genuinely helpful.
 
             If ANY criterion is not met, continue working to address it.
             If ALL criteria are satisfied, include `<promise>✓ Quality Criteria Met</promise>` in your response.
diff --git a/.deepwork/jobs/deepwork_jobs/steps/implement.md b/.deepwork/jobs/deepwork_jobs/steps/implement.md
index a3a790f6..600e1578 100644
--- a/.deepwork/jobs/deepwork_jobs/steps/implement.md
+++ b/.deepwork/jobs/deepwork_jobs/steps/implement.md
@@ -130,19 +130,19 @@ This will:
 
 After running `deepwork sync`, look at the "To use the new commands" section in the output. **Relay these exact reload instructions to the user** so they know how to pick up the new commands. Don't just reference the sync output - tell them directly what they need to do (e.g., "Type 'exit' then run 'claude --resume'" for Claude Code, or "Run '/memory refresh'" for Gemini CLI).
 
-### Step 7: Consider Policies for the New Job
+### Step 7: Consider Rules for the New Job
 
-After implementing the job, consider whether there are **policies** that would help enforce quality or consistency when working with this job's domain.
+After implementing the job, consider whether there are **rules** that would help enforce quality or consistency when working with this job's domain.
 
-**What are policies?**
+**What are rules?**
 
-Policies are automated guardrails defined in `.deepwork.policy.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
 - Quality standards are maintained
 
-**When to suggest policies:**
+**When to suggest rules:**
 
 Think about the job you just implemented and ask:
 - Does this job produce outputs that other files depend on?
@@ -150,28 +150,28 @@ Think about the job you just implemented and ask:
 - Are there quality checks or reviews that should happen when certain files in this domain change?
 - Could changes to the job's output files impact other parts of the project?
 
-**Examples of policies that might make sense:**
+**Examples of rules that might make sense:**
 
-| Job Type | Potential Policy |
-|----------|------------------|
+| Job Type | Potential Rule |
+|----------|----------------|
 | API Design | "Update API docs when endpoint definitions change" |
 | Database Schema | "Review migrations when schema files change" |
 | Competitive Research | "Update strategy docs when competitor analysis changes" |
 | Feature Development | "Update changelog when feature files change" |
 | Configuration Management | "Update install guide when config files change" |
 
-**How to offer policy creation:**
+**How to offer rule creation:**
 
-If you identify one or more policies that would benefit the user, explain:
-1. **What the policy would do** - What triggers it and what action it prompts
+If you identify one or more rules that would benefit the user, explain:
+1. **What the rule would do** - What triggers it and what action it prompts
 2. **Why it would help** - How it prevents common mistakes or keeps things in sync
 3. **What files it would watch** - The trigger patterns
 
 Then ask the user:
 
-> "Would you like me to create this policy for you? I can run `/deepwork_policy.define` to set it up."
+> "Would you like me to create this rule for you? I can run `/deepwork_rules.define` to set it up."
 
-If the user agrees, invoke the `/deepwork_policy.define` command to guide them through creating the policy.
+If the user agrees, invoke the `/deepwork_rules.define` command to guide them through creating the rule.
 
 **Example dialogue:**
 
@@ -180,15 +180,15 @@ Based on the competitive_research job you just created, I noticed that when
 competitor analysis files change, it would be helpful to remind you to update
 your strategy documentation.
 
-I'd suggest a policy like:
+I'd suggest a rule like:
 - **Name**: "Update strategy when competitor analysis changes"
 - **Trigger**: `**/positioning_report.md`
 - **Action**: Prompt to review and update `docs/strategy.md`
 
-Would you like me to create this policy? I can run `/deepwork_policy.define` to set it up.
+Would you like me to create this rule? I can run `/deepwork_rules.define` to set it up.
 ```
 
-**Note:** Not every job needs policies. Only suggest them when they would genuinely help maintain consistency or quality. Don't force policies where they don't make sense.
+**Note:** Not every job needs rules. Only suggest them when they would genuinely help maintain consistency or quality. Don't force rules where they don't make sense.
 
 ## Example Implementation
 
@@ -222,8 +222,8 @@ Before marking this step complete, ensure:
 - [ ] `deepwork sync` executed successfully
 - [ ] Commands generated in platform directory
 - [ ] User informed to follow reload instructions from `deepwork sync`
-- [ ] Considered whether policies would benefit this job (Step 7)
-- [ ] If policies suggested, offered to run `/deepwork_policy.define`
+- [ ] Considered whether rules would benefit this job (Step 7)
+- [ ] If rules suggested, offered to run `/deepwork_rules.define`
 
 ## Quality Criteria
 
@@ -235,4 +235,4 @@ Before marking this step complete, ensure:
 - Steps with user inputs explicitly use "ask structured questions" phrasing
 - Sync completed successfully
 - Commands available for use
-- Thoughtfully considered relevant policies for the job domain
+- Thoughtfully considered relevant rules for the job domain
diff --git a/.deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh b/.deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh
deleted file mode 100755
index b12d456c..00000000
--- a/.deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh
+++ /dev/null
@@ -1,56 +0,0 @@
-#!/bin/bash
-# policy_stop_hook.sh - Evaluates policies when the agent stops
-#
-# This script is called as a Claude Code Stop hook. It:
-# 1. Evaluates policies from .deepwork.policy.yml
-# 2. Computes changed files based on each policy's compare_to setting
-# 3. Checks for <promise> tags in the conversation transcript
-# 4. Returns JSON to block stop if policies need attention
-
-set -e
-
-# Check if policy file exists
-if [ ! -f .deepwork.policy.yml ]; then
-    # No policies defined, nothing to do
-    exit 0
-fi
-
-# Read the hook input JSON from stdin
-HOOK_INPUT=""
-if [ ! -t 0 ]; then
-    HOOK_INPUT=$(cat)
-fi
-
-# Extract transcript_path from the hook input JSON using jq
-# Claude Code passes: {"session_id": "...", "transcript_path": "...", ...}
-TRANSCRIPT_PATH=""
-if [ -n "${HOOK_INPUT}" ]; then
-    TRANSCRIPT_PATH=$(echo "${HOOK_INPUT}" | jq -r '.transcript_path // empty' 2>/dev/null || echo "")
-fi
-
-# Extract conversation text from the JSONL transcript
-# The transcript is JSONL format - each line is a JSON object
-# We need to extract the text content from assistant messages
-conversation_context=""
-if [ -n "${TRANSCRIPT_PATH}" ] && [ -f "${TRANSCRIPT_PATH}" ]; then
-    # Extract text content from all assistant messages in the transcript
-    # Each line is a JSON object; we extract .message.content[].text for assistant messages
-    conversation_context=$(cat "${TRANSCRIPT_PATH}" | \
-        grep -E '"role"\s*:\s*"assistant"' | \
-        jq -r '.message.content // [] | map(select(.type == "text")) | map(.text) | join("\n")' 2>/dev/null | \
-        tr -d '\0' || echo "")
-fi
-
-# Call the Python evaluator
-# The Python module handles:
-# - Parsing the policy file
-# - Computing changed files based on each policy's compare_to setting
-# - Matching changed files against triggers/safety patterns
-# - Checking for promise tags in the conversation context
-# - Generating appropriate JSON output
-result=$(echo "${conversation_context}" | python -m deepwork.hooks.evaluate_policies \
-    --policy-file .deepwork.policy.yml \
-    2>/dev/null || echo '{}')
-
-# Output the result (JSON for Claude Code hooks)
-echo "${result}"
diff --git a/.deepwork/jobs/deepwork_policy/job.yml b/.deepwork/jobs/deepwork_policy/job.yml
deleted file mode 100644
index 946f2386..00000000
--- a/.deepwork/jobs/deepwork_policy/job.yml
+++ /dev/null
@@ -1,40 +0,0 @@
-name: deepwork_policy
-version: "0.3.0"
-summary: "Policy enforcement for AI agent sessions"
-description: |
-  Manages policies that automatically trigger when certain files change during an AI agent session.
-  Policies help ensure that code changes follow team guidelines, documentation is updated,
-  and architectural decisions are respected.
-
-  Policies are defined as individual markdown files in `.deepwork/policies/` with YAML frontmatter.
-  This format supports:
-  - Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
-  - Action types: prompt (show instructions), command (run idempotent commands)
-  - Variable pattern matching for file correspondence (e.g., `src/{path}.py` ↔ `tests/{path}_test.py`)
-
-  Example use cases:
-  - Enforce source/test pairing with set patterns
-  - Run formatters automatically when files change
-  - Update installation docs when configuration files change
-  - Require security review when authentication code is modified
-  - Ensure API documentation stays in sync with API code
-
-changelog:
-  - version: "0.1.0"
-    changes: "Initial version"
-  - version: "0.2.0"
-    changes: "Standardized on 'ask structured questions' phrasing for user input"
-  - version: "0.3.0"
-    changes: "Updated for policy system v2 with detection modes, action types, and variable patterns"
-
-steps:
-  - id: define
-    name: "Define Policy"
-    description: "Create or update policies in .deepwork/policies/"
-    instructions_file: steps/define.md
-    inputs:
-      - name: policy_purpose
-        description: "What guideline or constraint should this policy enforce?"
-    outputs:
-      - .deepwork/policies/*.md
-    dependencies: []
diff --git a/.deepwork/jobs/deepwork_policy/steps/define.md b/.deepwork/jobs/deepwork_policy/steps/define.md
deleted file mode 100644
index 452194aa..00000000
--- a/.deepwork/jobs/deepwork_policy/steps/define.md
+++ /dev/null
@@ -1,258 +0,0 @@
-# Define Policy
-
-## Objective
-
-Create or update policies to enforce team guidelines, documentation requirements, file correspondences, or automated commands when specific files change.
-
-## Task
-
-Guide the user through defining a new policy by asking structured questions. **Do not create the policy without first understanding what they want to enforce.**
-
-**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user.
-
----
-
-## Step 1: Understand the Policy Purpose
-
-Ask structured questions to understand what the user wants to enforce:
-
-1. **What should this policy enforce?**
-   - Documentation sync? Security review? File correspondence? Code formatting?
-
-2. **What files trigger this policy?**
-   - Which files/directories, when changed, should trigger action?
-
-3. **What should happen when the policy fires?**
-   - Show instructions to the agent? Run a command automatically?
-
----
-
-## Step 2: Choose Detection Mode
-
-Policies support three detection modes:
-
-### Trigger/Safety (Default)
-Fire when trigger patterns match AND safety patterns don't.
-
-**Use for**: General checks like "source changed, verify README"
-
-```yaml
-trigger: "app/config/**/*"
-safety: "docs/install_guide.md"
-```
-
-### Set (Bidirectional Correspondence)
-Fire when files matching one pattern change but corresponding files don't.
-
-**Use for**: Source/test pairing, i18n files, paired documentation
-
-```yaml
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
-```
-
-If `src/utils/helper.py` changes, expects `tests/utils/helper_test.py` to also change.
-
-### Pair (Directional Correspondence)
-Fire when trigger files change but expected files don't. Changes to expected files alone don't trigger.
-
-**Use for**: API code requires docs (but docs changes don't require API changes)
-
-```yaml
-pair:
-  trigger: src/api/{name}.py
-  expects: docs/api/{name}.md
-```
-
-### Variable Pattern Syntax
-
-- `{path}` - Matches multiple path segments (e.g., `foo/bar/baz`)
-- `{name}` - Matches a single segment (e.g., `helper`)
-
----
-
-## Step 3: Choose Action Type
-
-### Prompt (Default)
-Show instructions to the agent. The markdown body becomes the instructions.
-
-```markdown
----
-name: Security Review
-trigger: "src/auth/**/*"
----
-Please review for hardcoded credentials and validate input handling.
-```
-
-### Command
-Run an idempotent command automatically. No markdown body needed.
-
-```markdown
----
-name: Format Python
-trigger: "**/*.py"
-action:
-  command: "ruff format {file}"
-  run_for: each_match
----
-```
-
-**Command variables**:
-- `{file}` - Current file being processed
-- `{files}` - Space-separated list of all matching files
-- `{repo_root}` - Repository root path
-
-**run_for options**:
-- `each_match` - Run command once per matching file
-- `all_matches` - Run command once with all files
-
----
-
-## Step 4: Define Optional Settings
-
-### compare_to (Optional)
-Controls what baseline is used for detecting changed files:
-
-- `base` (default) - Changes since branch diverged from main/master
-- `default_tip` - Changes compared to current main/master tip
-- `prompt` - Changes since the last prompt submission
-
-Most policies should use the default (`base`).
-
----
-
-## Step 5: Create the Policy File
-
-### File Location
-Create: `.deepwork/policies/[policy-name].md`
-
-Use kebab-case for filename (e.g., `source-test-pairing.md`, `format-python.md`)
-
-### Examples
-
-**Trigger/Safety with Prompt:**
-```markdown
----
-name: Update Install Guide
-trigger: "app/config/**/*"
-safety: "docs/install_guide.md"
----
-Configuration files have changed. Please review docs/install_guide.md
-and update installation instructions if needed.
-```
-
-**Set (Bidirectional) with Prompt:**
-```markdown
----
-name: Source/Test Pairing
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
----
-When source files change, corresponding test files should also change.
-Please create or update tests for the modified source files.
-```
-
-**Pair (Directional) with Prompt:**
-```markdown
----
-name: API Documentation
-pair:
-  trigger: src/api/{name}.py
-  expects: docs/api/{name}.md
----
-API code has changed. Please update the corresponding documentation.
-```
-
-**Command Action:**
-```markdown
----
-name: Format Python Files
-trigger: "**/*.py"
-action:
-  command: "ruff format {file}"
-  run_for: each_match
----
-```
-
-**Multiple Trigger Patterns:**
-```markdown
----
-name: Security Review
-trigger:
-  - "src/auth/**/*"
-  - "src/security/**/*"
-safety:
-  - "SECURITY.md"
-  - "docs/security_audit.md"
----
-Authentication or security code has been changed. Please review for:
-1. Hardcoded credentials or secrets
-2. Input validation issues
-3. Access control logic
-```
-
----
-
-## Step 6: Verify the Policy
-
-After creating the policy:
-
-1. **Check YAML frontmatter syntax** - Ensure valid YAML
-2. **Verify detection mode is appropriate** - trigger/safety vs set vs pair
-3. **Test patterns match intended files** - Check glob/variable patterns
-4. **Review instructions/command** - Ensure they're actionable
-5. **Check for conflicts** - Ensure no overlap with existing policies
-
----
-
-## Pattern Reference
-
-### Glob Patterns
-- `*` - Matches any characters within a single path segment
-- `**` - Matches across multiple path segments (recursive)
-- `?` - Matches a single character
-
-### Variable Patterns
-- `{path}` - Captures multiple segments: `src/{path}.py` matches `src/a/b/c.py` → path=`a/b/c`
-- `{name}` - Captures single segment: `src/{name}.py` matches `src/utils.py` → name=`utils`
-
-### Common Examples
-- `src/**/*.py` - All Python files in src (recursive)
-- `app/config/**/*` - All files in app/config
-- `*.md` - Markdown files in root only
-- `**/*.test.ts` - All test files anywhere
-- `src/{path}.ts` ↔ `tests/{path}.test.ts` - Source/test pairs
-
----
-
-## Output Format
-
-Create: `.deepwork/policies/[policy-name].md`
-
----
-
-## Quality Criteria
-
-- Asked structured questions to understand requirements
-- Chose appropriate detection mode (trigger/safety, set, or pair)
-- Chose appropriate action type (prompt or command)
-- Policy name is clear and descriptive
-- Patterns accurately match intended files
-- Instructions or command are actionable
-- YAML frontmatter is valid
-
----
-
-## Context
-
-Policies are evaluated automatically when you finish working. The system:
-
-1. Loads policies from `.deepwork/policies/`
-2. Detects changed files based on `compare_to` setting
-3. Evaluates each policy based on its detection mode
-4. For **command** actions: Runs the command automatically
-5. For **prompt** actions: Shows instructions if policy fires
-
-Mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response.
diff --git a/.deepwork/jobs/deepwork_policy/hooks/capture_prompt_work_tree.sh b/.deepwork/jobs/deepwork_rules/hooks/capture_prompt_work_tree.sh
similarity index 100%
rename from .deepwork/jobs/deepwork_policy/hooks/capture_prompt_work_tree.sh
rename to .deepwork/jobs/deepwork_rules/hooks/capture_prompt_work_tree.sh
diff --git a/src/deepwork/standard_jobs/deepwork_policy/hooks/global_hooks.yml b/.deepwork/jobs/deepwork_rules/hooks/global_hooks.yml
similarity index 62%
rename from src/deepwork/standard_jobs/deepwork_policy/hooks/global_hooks.yml
rename to .deepwork/jobs/deepwork_rules/hooks/global_hooks.yml
index 0e024fc7..f76202ab 100644
--- a/src/deepwork/standard_jobs/deepwork_policy/hooks/global_hooks.yml
+++ b/.deepwork/jobs/deepwork_rules/hooks/global_hooks.yml
@@ -1,8 +1,8 @@
-# DeepWork Policy Hooks Configuration
+# DeepWork Rules Hooks Configuration
 # Maps Claude Code lifecycle events to hook scripts
 
 UserPromptSubmit:
   - user_prompt_submit.sh
 
 Stop:
-  - policy_stop_hook.sh
+  - rules_stop_hook.sh
diff --git a/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh b/.deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh
similarity index 51%
rename from src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
rename to .deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh
index 4ad1b539..20fa8a3f 100755
--- a/src/deepwork/standard_jobs/deepwork_policy/hooks/policy_stop_hook.sh
+++ b/.deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh
@@ -1,25 +1,25 @@
 #!/bin/bash
-# policy_stop_hook.sh - Evaluates policies when the agent stops
+# rules_stop_hook.sh - Evaluates rules when the agent stops
 #
 # This script is called as a Claude Code Stop hook. It:
-# 1. Evaluates policies from .deepwork/policies/
-# 2. Computes changed files based on each policy's compare_to setting
+# 1. Evaluates rules from .deepwork/rules/
+# 2. Computes changed files based on each rule's compare_to setting
 # 3. Checks for <promise> tags in the conversation transcript
-# 4. Returns JSON to block stop if policies need attention
+# 4. Returns JSON to block stop if rules need attention
 
 set -e
 
-# Check if policies directory exists with .md files
-POLICY_DIR=".deepwork/policies"
+# Check if rules directory exists with .md files
+RULES_DIR=".deepwork/rules"
 
-if [ ! -d "${POLICY_DIR}" ]; then
-    # No policies directory, nothing to do
+if [ ! -d "${RULES_DIR}" ]; then
+    # No rules directory, nothing to do
     exit 0
 fi
 
 # Check if there are any .md files
-if ! ls "${POLICY_DIR}"/*.md 1>/dev/null 2>&1; then
-    # No policy files, nothing to do
+if ! ls "${RULES_DIR}"/*.md 1>/dev/null 2>&1; then
+    # No rule files, nothing to do
     exit 0
 fi
 
@@ -29,10 +29,10 @@ if [ ! -t 0 ]; then
     HOOK_INPUT=$(cat)
 fi
 
-# Call the Python policy evaluator via the cross-platform wrapper
+# Call the Python rules evaluator via the cross-platform wrapper
 # The wrapper reads JSON input and handles transcript extraction
 # Note: exit code 2 means "block" which is valid (not an error), so capture it
-result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.policy_check 2>/dev/null) || true
+result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.rules_check 2>/dev/null) || true
 
 # If no output (error case), provide empty JSON as fallback
 if [ -z "${result}" ]; then
diff --git a/.deepwork/jobs/deepwork_policy/hooks/user_prompt_submit.sh b/.deepwork/jobs/deepwork_rules/hooks/user_prompt_submit.sh
similarity index 100%
rename from .deepwork/jobs/deepwork_policy/hooks/user_prompt_submit.sh
rename to .deepwork/jobs/deepwork_rules/hooks/user_prompt_submit.sh
diff --git a/src/deepwork/standard_jobs/deepwork_policy/job.yml b/.deepwork/jobs/deepwork_rules/job.yml
similarity index 50%
rename from src/deepwork/standard_jobs/deepwork_policy/job.yml
rename to .deepwork/jobs/deepwork_rules/job.yml
index 777894ed..9e9ece74 100644
--- a/src/deepwork/standard_jobs/deepwork_policy/job.yml
+++ b/.deepwork/jobs/deepwork_rules/job.yml
@@ -1,16 +1,16 @@
-name: deepwork_policy
+name: deepwork_rules
 version: "0.2.0"
-summary: "Policy enforcement for AI agent sessions"
+summary: "Rules enforcement for AI agent sessions"
 description: |
-  Manages policies that automatically trigger when certain files change during an AI agent session.
-  Policies help ensure that code changes follow team guidelines, documentation is updated,
+  Manages rules that automatically trigger when certain files change during an AI agent session.
+  Rules help ensure that code changes follow team guidelines, documentation is updated,
   and architectural decisions are respected.
 
-  Policies are defined in a `.deepwork.policy.yml` file at the root of your project. Each policy
+  Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
   specifies:
-  - Trigger patterns: Glob patterns for files that, when changed, should trigger the policy
-  - Safety patterns: Glob patterns for files that, if also changed, mean the policy doesn't need to fire
-  - Instructions: What the agent should do when the policy triggers
+  - Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
+  - Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
+  - Instructions: What the agent should do when the rule triggers
 
   Example use cases:
   - Update installation docs when configuration files change
@@ -26,12 +26,12 @@ changelog:
 
 steps:
   - id: define
-    name: "Define Policy"
-    description: "Create or update policy entries in .deepwork.policy.yml"
+    name: "Define Rule"
+    description: "Create or update rule entries in .deepwork.rules.yml"
     instructions_file: steps/define.md
     inputs:
-      - name: policy_purpose
-        description: "What guideline or constraint should this policy enforce?"
+      - name: rule_purpose
+        description: "What guideline or constraint should this rule enforce?"
     outputs:
-      - .deepwork.policy.yml
+      - .deepwork.rules.yml
     dependencies: []
diff --git a/src/deepwork/standard_jobs/deepwork_policy/steps/define.md b/.deepwork/jobs/deepwork_rules/steps/define.md
similarity index 69%
rename from src/deepwork/standard_jobs/deepwork_policy/steps/define.md
rename to .deepwork/jobs/deepwork_rules/steps/define.md
index 302eda7f..3e8be899 100644
--- a/src/deepwork/standard_jobs/deepwork_policy/steps/define.md
+++ b/.deepwork/jobs/deepwork_rules/steps/define.md
@@ -1,37 +1,37 @@
-# Define Policy
+# Define Rule
 
 ## Objective
 
-Create or update policy entries in the `.deepwork.policy.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
 
 ## Task
 
-Guide the user through defining a new policy by asking structured questions. **Do not create the policy without first understanding what they want to enforce.**
+Guide the user through defining a new rule by asking structured questions. **Do not create the rule without first understanding what they want to enforce.**
 
 **Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user. This provides a better user experience with clear options and guided choices.
 
-### Step 1: Understand the Policy Purpose
+### Step 1: Understand the Rule Purpose
 
 Start by asking structured questions to understand what the user wants to enforce:
 
-1. **What guideline or constraint should this policy enforce?**
+1. **What guideline or constraint should this rule enforce?**
    - What situation triggers the need for action?
-   - What files or directories, when changed, should trigger this policy?
+   - What files or directories, when changed, should trigger this rule?
    - Examples: "When config files change", "When API code changes", "When database schema changes"
 
 2. **What action should be taken?**
-   - What should the agent do when the policy triggers?
+   - What should the agent do when the rule triggers?
    - Update documentation? Perform a security review? Update tests?
    - Is there a specific file or process that needs attention?
 
 3. **Are there any "safety" conditions?**
-   - Are there files that, if also changed, mean the policy doesn't need to fire?
+   - Are there files that, if also changed, mean the rule doesn't need to fire?
    - For example: If config changes AND install_guide.md changes, assume docs are already updated
    - This prevents redundant prompts when the user has already done the right thing
 
 ### Step 2: Define the Trigger Patterns
 
-Help the user define glob patterns for files that should trigger the policy:
+Help the user define glob patterns for files that should trigger the rule:
 
 **Common patterns:**
 - `src/**/*.py` - All Python files in src directory (recursive)
@@ -47,14 +47,14 @@ Help the user define glob patterns for files that should trigger the policy:
 
 ### Step 3: Define Safety Patterns (Optional)
 
-If there are files that, when also changed, mean the policy shouldn't fire:
+If there are files that, when also changed, mean the rule shouldn't fire:
 
 **Examples:**
-- Policy: "Update install guide when config changes"
+- Rule: "Update install guide when config changes"
   - Trigger: `app/config/**/*`
   - Safety: `docs/install_guide.md` (if already updated, don't prompt)
 
-- Policy: "Security review for auth changes"
+- Rule: "Security review for auth changes"
   - Trigger: `src/auth/**/*`
   - Safety: `SECURITY.md`, `docs/security_review.md`
 
@@ -65,18 +65,18 @@ The `compare_to` field controls what baseline is used when detecting "changed fi
 **Options:**
 - `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
 - `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for policies that should only fire based on changes made during a single agent response.
+- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
 
 **When to use each:**
-- **base**: Best for most policies. "Did this branch change config files?" → trigger docs review
-- **default_tip**: For policies about what's different from production/main
-- **prompt**: For policies that should only consider very recent changes within the current session
+- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
+- **default_tip**: For rules about what's different from production/main
+- **prompt**: For rules that should only consider very recent changes within the current session
 
-Most policies should use the default (`base`) and don't need to specify `compare_to`.
+Most rules should use the default (`base`) and don't need to specify `compare_to`.
 
 ### Step 4: Write the Instructions
 
-Create clear, actionable instructions for what the agent should do when the policy fires.
+Create clear, actionable instructions for what the agent should do when the rule fires.
 
 **Good instructions include:**
 - What to check or review
@@ -93,15 +93,15 @@ Configuration files have changed. Please:
 4. Test that installation instructions still work
 ```
 
-### Step 5: Create the Policy Entry
+### Step 5: Create the Rule Entry
 
-Create or update `.deepwork.policy.yml` in the project root.
+Create or update `.deepwork.rules.yml` in the project root.
 
-**File Location**: `.deepwork.policy.yml` (root of project)
+**File Location**: `.deepwork.rules.yml` (root of project)
 
 **Format**:
 ```yaml
-- name: "[Friendly name for the policy]"
+- name: "[Friendly name for the rule]"
   trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
   safety: "[glob pattern]"   # optional, or array
   compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
@@ -111,23 +111,23 @@ Create or update `.deepwork.policy.yml` in the project root.
 
 **Alternative with instructions_file**:
 ```yaml
-- name: "[Friendly name for the policy]"
+- name: "[Friendly name for the rule]"
   trigger: "[glob pattern]"
   safety: "[glob pattern]"
   compare_to: "base"         # optional
   instructions_file: "path/to/instructions.md"
 ```
 
-### Step 6: Verify the Policy
+### Step 6: Verify the Rule
 
-After creating the policy:
+After creating the rule:
 
 1. **Check the YAML syntax** - Ensure valid YAML formatting
 2. **Test trigger patterns** - Verify patterns match intended files
 3. **Review instructions** - Ensure they're clear and actionable
-4. **Check for conflicts** - Ensure the policy doesn't conflict with existing ones
+4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
 
-## Example Policies
+## Example Rules
 
 ### Update Documentation on Config Changes
 ```yaml
@@ -172,13 +172,13 @@ After creating the policy:
 
 ## Output Format
 
-### .deepwork.policy.yml
-Create or update this file at the project root with the new policy entry.
+### .deepwork.rules.yml
+Create or update this file at the project root with the new rule entry.
 
 ## Quality Criteria
 
 - Asked structured questions to understand user requirements
-- Policy name is clear and descriptive
+- Rule name is clear and descriptive
 - Trigger patterns accurately match the intended files
 - Safety patterns prevent unnecessary triggering
 - Instructions are actionable and specific
@@ -186,13 +186,13 @@ Create or update this file at the project root with the new policy entry.
 
 ## Context
 
-Policies are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each policy's `compare_to` setting:
+Rules are evaluated automatically when you finish working on a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting:
    - `base` (default): Files changed since the branch diverged from main/master
    - `default_tip`: Files different from the current main/master branch
    - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match policy trigger patterns
-3. Skips policies where safety patterns also matched
-4. Prompts you with instructions for any triggered policies
+2. Checks if any changes match rule trigger patterns
+3. Skips rules where safety patterns also matched
+4. Prompts you with instructions for any triggered rules
 
-You can mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response (replace Policy Name with the actual policy name). This tells the system you've already handled that policy's requirements.
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
diff --git a/.deepwork/jobs/update/job.yml b/.deepwork/jobs/update/job.yml
index 0c6e2b6e..4f8ab339 100644
--- a/.deepwork/jobs/update/job.yml
+++ b/.deepwork/jobs/update/job.yml
@@ -3,7 +3,7 @@ version: "1.1.0"
 summary: "Update standard jobs in src/ and sync to installed locations"
 description: |
   A workflow for maintaining standard jobs bundled with DeepWork. Standard jobs
-  (like `deepwork_jobs` and `deepwork_policy`) are source-controlled in
+  (like `deepwork_jobs` and `deepwork_rules`) are source-controlled in
   `src/deepwork/standard_jobs/` and must be edited there—never in `.deepwork/jobs/`
   or `.claude/commands/` directly.
 
diff --git a/.deepwork/jobs/update/steps/job.md b/.deepwork/jobs/update/steps/job.md
index 0c7f70ab..b226b4f6 100644
--- a/.deepwork/jobs/update/steps/job.md
+++ b/.deepwork/jobs/update/steps/job.md
@@ -25,7 +25,7 @@ Standard jobs exist in THREE locations, but only ONE is the source of truth:
 #### 1. Identify the Standard Job to Update
 
 From conversation context, determine:
-- Which standard job needs updating (e.g., `deepwork_jobs`, `deepwork_policy`)
+- Which standard job needs updating (e.g., `deepwork_jobs`, `deepwork_rules`)
 - What changes are needed (job.yml, step instructions, hooks, etc.)
 
 Current standard jobs:
diff --git a/.gemini/commands/add_platform/verify.toml b/.gemini/commands/add_platform/verify.toml
index 1ee56ab8..acfd9671 100644
--- a/.gemini/commands/add_platform/verify.toml
+++ b/.gemini/commands/add_platform/verify.toml
@@ -96,7 +96,7 @@ Ensure the implementation step is complete:
    - `deepwork_jobs.define.md` exists (or equivalent for the platform)
    - `deepwork_jobs.implement.md` exists
    - `deepwork_jobs.refine.md` exists
-   - `deepwork_policy.define.md` exists
+   - `deepwork_rules.define.md` exists
    - All expected step commands exist
 
 4. **Validate command file content**
@@ -126,7 +126,7 @@ Ensure the implementation step is complete:
 - `deepwork install --platform <platform_name>` completes without errors
 - All expected command files are created:
   - deepwork_jobs.define, implement, refine
-  - deepwork_policy.define
+  - deepwork_rules.define
   - Any other standard job commands
 - Command file content is correct:
   - Matches platform's expected format
diff --git a/.gemini/commands/deepwork_jobs/implement.toml b/.gemini/commands/deepwork_jobs/implement.toml
index 3e922243..4c09fc47 100644
--- a/.gemini/commands/deepwork_jobs/implement.toml
+++ b/.gemini/commands/deepwork_jobs/implement.toml
@@ -168,19 +168,19 @@ This will:
 
 After running `deepwork sync`, look at the "To use the new commands" section in the output. **Relay these exact reload instructions to the user** so they know how to pick up the new commands. Don't just reference the sync output - tell them directly what they need to do (e.g., "Type 'exit' then run 'claude --resume'" for Claude Code, or "Run '/memory refresh'" for Gemini CLI).
 
-### Step 7: Consider Policies for the New Job
+### Step 7: Consider Rules for the New Job
 
-After implementing the job, consider whether there are **policies** that would help enforce quality or consistency when working with this job's domain.
+After implementing the job, consider whether there are **rules** that would help enforce quality or consistency when working with this job's domain.
 
-**What are policies?**
+**What are rules?**
 
-Policies are automated guardrails defined in `.deepwork.policy.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
 - Quality standards are maintained
 
-**When to suggest policies:**
+**When to suggest rules:**
 
 Think about the job you just implemented and ask:
 - Does this job produce outputs that other files depend on?
@@ -188,28 +188,28 @@ Think about the job you just implemented and ask:
 - Are there quality checks or reviews that should happen when certain files in this domain change?
 - Could changes to the job's output files impact other parts of the project?
 
-**Examples of policies that might make sense:**
+**Examples of rules that might make sense:**
 
-| Job Type | Potential Policy |
-|----------|------------------|
+| Job Type | Potential Rule |
+|----------|----------------|
 | API Design | "Update API docs when endpoint definitions change" |
 | Database Schema | "Review migrations when schema files change" |
 | Competitive Research | "Update strategy docs when competitor analysis changes" |
 | Feature Development | "Update changelog when feature files change" |
 | Configuration Management | "Update install guide when config files change" |
 
-**How to offer policy creation:**
+**How to offer rule creation:**
 
-If you identify one or more policies that would benefit the user, explain:
-1. **What the policy would do** - What triggers it and what action it prompts
+If you identify one or more rules that would benefit the user, explain:
+1. **What the rule would do** - What triggers it and what action it prompts
 2. **Why it would help** - How it prevents common mistakes or keeps things in sync
 3. **What files it would watch** - The trigger patterns
 
 Then ask the user:
 
-> "Would you like me to create this policy for you? I can run `/deepwork_policy.define` to set it up."
+> "Would you like me to create this rule for you? I can run `/deepwork_rules.define` to set it up."
 
-If the user agrees, invoke the `/deepwork_policy.define` command to guide them through creating the policy.
+If the user agrees, invoke the `/deepwork_rules.define` command to guide them through creating the rule.
 
 **Example dialogue:**
 
@@ -218,15 +218,15 @@ Based on the competitive_research job you just created, I noticed that when
 competitor analysis files change, it would be helpful to remind you to update
 your strategy documentation.
 
-I'd suggest a policy like:
+I'd suggest a rule like:
 - **Name**: "Update strategy when competitor analysis changes"
 - **Trigger**: `**/positioning_report.md`
 - **Action**: Prompt to review and update `docs/strategy.md`
 
-Would you like me to create this policy? I can run `/deepwork_policy.define` to set it up.
+Would you like me to create this rule? I can run `/deepwork_rules.define` to set it up.
 ```
 
-**Note:** Not every job needs policies. Only suggest them when they would genuinely help maintain consistency or quality. Don't force policies where they don't make sense.
+**Note:** Not every job needs rules. Only suggest them when they would genuinely help maintain consistency or quality. Don't force rules where they don't make sense.
 
 ## Example Implementation
 
@@ -260,8 +260,8 @@ Before marking this step complete, ensure:
 - [ ] `deepwork sync` executed successfully
 - [ ] Commands generated in platform directory
 - [ ] User informed to follow reload instructions from `deepwork sync`
-- [ ] Considered whether policies would benefit this job (Step 7)
-- [ ] If policies suggested, offered to run `/deepwork_policy.define`
+- [ ] Considered whether rules would benefit this job (Step 7)
+- [ ] If rules suggested, offered to run `/deepwork_rules.define`
 
 ## Quality Criteria
 
@@ -273,7 +273,7 @@ Before marking this step complete, ensure:
 - Steps with user inputs explicitly use "ask structured questions" phrasing
 - Sync completed successfully
 - Commands available for use
-- Thoughtfully considered relevant policies for the job domain
+- Thoughtfully considered relevant rules for the job domain
 
 
 ## Inputs
diff --git a/.gemini/commands/deepwork_policy/define.toml b/.gemini/commands/deepwork_policy/define.toml
deleted file mode 100644
index 0195ff11..00000000
--- a/.gemini/commands/deepwork_policy/define.toml
+++ /dev/null
@@ -1,396 +0,0 @@
-# deepwork_policy:define
-#
-# Create or update policies in .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)
-#
-# Generated by DeepWork - do not edit manually
-
-description = "Create or update policies in .deepwork/policies/ (v2) or .deepwork.policy.yml (v1)"
-
-prompt = """
-# deepwork_policy:define
-
-**Standalone command** in the **deepwork_policy** job - can be run anytime
-
-**Summary**: Policy enforcement for AI agent sessions
-
-## Job Overview
-
-Manages policies that automatically trigger when certain files change during an AI agent session.
-Policies help ensure that code changes follow team guidelines, documentation is updated,
-and architectural decisions are respected.
-
-**Policy System v2 (Recommended)**
-Policies are defined as individual markdown files in `.deepwork/policies/` with YAML frontmatter.
-This format supports:
-- Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
-- Action types: prompt (show instructions), command (run idempotent commands)
-- Variable pattern matching for file correspondence (e.g., `src/{path}.py` ↔ `tests/{path}_test.py`)
-
-**Legacy v1 Format**
-Still supported: `.deepwork.policy.yml` at project root with trigger/safety/instructions fields.
-
-Example use cases:
-- Enforce source/test pairing with set patterns
-- Run formatters automatically when files change
-- Update installation docs when configuration files change
-- Require security review when authentication code is modified
-- Ensure API documentation stays in sync with API code
-
-
-
-## Instructions
-
-# Define Policy
-
-## Objective
-
-Create or update policies to enforce team guidelines, documentation requirements, file correspondences, or automated commands when specific files change.
-
-## Task
-
-Guide the user through defining a new policy by asking structured questions. **Do not create the policy without first understanding what they want to enforce.**
-
-**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user.
-
-## Policy System Overview
-
-DeepWork supports two policy formats:
-
-**v2 (Recommended)**: Individual markdown files in `.deepwork/policies/` with YAML frontmatter
-**v1 (Legacy)**: Single `.deepwork.policy.yml` file at project root
-
-**Always prefer v2 format** for new policies. It supports more detection modes and action types.
-
----
-
-## Step 1: Understand the Policy Purpose
-
-Ask structured questions to understand what the user wants to enforce:
-
-1. **What should this policy enforce?**
-   - Documentation sync? Security review? File correspondence? Code formatting?
-
-2. **What files trigger this policy?**
-   - Which files/directories, when changed, should trigger action?
-
-3. **What should happen when the policy fires?**
-   - Show instructions to the agent? Run a command automatically?
-
----
-
-## Step 2: Choose Detection Mode
-
-Policies support three detection modes:
-
-### Trigger/Safety (Default)
-Fire when trigger patterns match AND safety patterns don't.
-
-**Use for**: General checks like "source changed, verify README"
-
-```yaml
-trigger: "app/config/**/*"
-safety: "docs/install_guide.md"
-```
-
-### Set (Bidirectional Correspondence)
-Fire when files matching one pattern change but corresponding files don't.
-
-**Use for**: Source/test pairing, i18n files, paired documentation
-
-```yaml
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
-```
-
-If `src/utils/helper.py` changes, expects `tests/utils/helper_test.py` to also change.
-
-### Pair (Directional Correspondence)
-Fire when trigger files change but expected files don't. Changes to expected files alone don't trigger.
-
-**Use for**: API code requires docs (but docs changes don't require API changes)
-
-```yaml
-pair:
-  trigger: src/api/{name}.py
-  expects: docs/api/{name}.md
-```
-
-### Variable Pattern Syntax
-
-- `{path}` - Matches multiple path segments (e.g., `foo/bar/baz`)
-- `{name}` - Matches a single segment (e.g., `helper`)
-
----
-
-## Step 3: Choose Action Type
-
-### Prompt (Default)
-Show instructions to the agent. The markdown body becomes the instructions.
-
-```markdown
----
-name: Security Review
-trigger: "src/auth/**/*"
----
-Please review for hardcoded credentials and validate input handling.
-```
-
-### Command
-Run an idempotent command automatically. No markdown body needed.
-
-```markdown
----
-name: Format Python
-trigger: "**/*.py"
-action:
-  command: "ruff format {file}"
-  run_for: each_match
----
-```
-
-**Command variables**:
-- `{file}` - Current file being processed
-- `{files}` - Space-separated list of all matching files
-- `{repo_root}` - Repository root path
-
-**run_for options**:
-- `each_match` - Run command once per matching file
-- `all_matches` - Run command once with all files
-
----
-
-## Step 4: Define Optional Settings
-
-### compare_to (Optional)
-Controls what baseline is used for detecting changed files:
-
-- `base` (default) - Changes since branch diverged from main/master
-- `default_tip` - Changes compared to current main/master tip
-- `prompt` - Changes since the last prompt submission
-
-Most policies should use the default (`base`).
-
----
-
-## Step 5: Create the Policy File (v2 Format)
-
-### File Location
-Create: `.deepwork/policies/[policy-name].md`
-
-Use kebab-case for filename (e.g., `source-test-pairing.md`, `format-python.md`)
-
-### v2 Format Examples
-
-**Trigger/Safety with Prompt:**
-```markdown
----
-name: Update Install Guide
-trigger: "app/config/**/*"
-safety: "docs/install_guide.md"
----
-Configuration files have changed. Please review docs/install_guide.md
-and update installation instructions if needed.
-```
-
-**Set (Bidirectional) with Prompt:**
-```markdown
----
-name: Source/Test Pairing
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
----
-When source files change, corresponding test files should also change.
-Please create or update tests for the modified source files.
-```
-
-**Pair (Directional) with Prompt:**
-```markdown
----
-name: API Documentation
-pair:
-  trigger: src/api/{name}.py
-  expects: docs/api/{name}.md
----
-API code has changed. Please update the corresponding documentation.
-```
-
-**Command Action:**
-```markdown
----
-name: Format Python Files
-trigger: "**/*.py"
-action:
-  command: "ruff format {file}"
-  run_for: each_match
----
-```
-
-**Multiple Trigger Patterns:**
-```markdown
----
-name: Security Review
-trigger:
-  - "src/auth/**/*"
-  - "src/security/**/*"
-safety:
-  - "SECURITY.md"
-  - "docs/security_audit.md"
----
-Authentication or security code has been changed. Please review for:
-1. Hardcoded credentials or secrets
-2. Input validation issues
-3. Access control logic
-```
-
----
-
-## Step 6: Legacy v1 Format (If Needed)
-
-Only use v1 format when adding to an existing `.deepwork.policy.yml` file.
-
-**File Location**: `.deepwork.policy.yml` (project root)
-
-```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  compare_to: "base"
-  instructions: |
-    Configuration files have changed. Please review docs/install_guide.md.
-```
-
-**Alternative with instructions_file:**
-```yaml
-- name: "Security review"
-  trigger: "src/auth/**/*"
-  instructions_file: "path/to/instructions.md"
-```
-
----
-
-## Step 7: Verify the Policy
-
-After creating the policy:
-
-1. **Check YAML frontmatter syntax** - Ensure valid YAML
-2. **Verify detection mode is appropriate** - trigger/safety vs set vs pair
-3. **Test patterns match intended files** - Check glob/variable patterns
-4. **Review instructions/command** - Ensure they're actionable
-5. **Check for conflicts** - Ensure no overlap with existing policies
-
----
-
-## Pattern Reference
-
-### Glob Patterns
-- `*` - Matches any characters within a single path segment
-- `**` - Matches across multiple path segments (recursive)
-- `?` - Matches a single character
-
-### Variable Patterns (v2 only)
-- `{path}` - Captures multiple segments: `src/{path}.py` matches `src/a/b/c.py` → path=`a/b/c`
-- `{name}` - Captures single segment: `src/{name}.py` matches `src/utils.py` → name=`utils`
-
-### Common Examples
-- `src/**/*.py` - All Python files in src (recursive)
-- `app/config/**/*` - All files in app/config
-- `*.md` - Markdown files in root only
-- `**/*.test.ts` - All test files anywhere
-- `src/{path}.ts` ↔ `tests/{path}.test.ts` - Source/test pairs
-
----
-
-## Output Format
-
-Create one of:
-- `.deepwork/policies/[policy-name].md` (v2 format, recommended)
-- Entry in `.deepwork.policy.yml` (v1 format, legacy)
-
----
-
-## Quality Criteria
-
-- Asked structured questions to understand requirements
-- Chose appropriate detection mode (trigger/safety, set, or pair)
-- Chose appropriate action type (prompt or command)
-- Policy name is clear and descriptive
-- Patterns accurately match intended files
-- Instructions or command are actionable
-- YAML frontmatter is valid
-
----
-
-## Context
-
-Policies are evaluated automatically when you finish working. The system:
-
-1. Loads policies from `.deepwork/policies/` (v2) and `.deepwork.policy.yml` (v1)
-2. Detects changed files based on `compare_to` setting
-3. Evaluates each policy based on its detection mode
-4. For **command** actions: Runs the command automatically
-5. For **prompt** actions: Shows instructions if policy fires
-
-Mark a policy as addressed by including `<promise>✓ Policy Name</promise>` in your response.
-
-
-## Inputs
-
-### User Parameters
-
-Please gather the following information from the user:
-- **policy_purpose**: What guideline or constraint should this policy enforce?
-
-
-## Work Branch Management
-
-All work for this job should be done on a dedicated work branch:
-
-1. **Check current branch**:
-   - If already on a work branch for this job (format: `deepwork/deepwork_policy-[instance]-[date]`), continue using it
-   - If on main/master, create a new work branch
-
-2. **Create work branch** (if needed):
-   ```bash
-   git checkout -b deepwork/deepwork_policy-[instance]-$(date +%Y%m%d)
-   ```
-   Replace `[instance]` with a descriptive identifier (e.g., `acme`, `q1-launch`, etc.)
-
-## Output Requirements
-
-Create the following output(s):
-- `.deepwork/policies/*.md`
-- `.deepwork.policy.yml`
-
-Ensure all outputs are:
-- Well-formatted and complete
-- Ready for review or use by subsequent steps
-
-## Completion
-
-After completing this step:
-
-1. **Verify outputs**: Confirm all required files have been created
-
-2. **Inform the user**:
-   - The define command is complete
-   - Outputs created: .deepwork/policies/*.md, .deepwork.policy.yml
-   - This command can be run again anytime to make further changes
-
-## Command Complete
-
-This is a standalone command that can be run anytime. The outputs are ready for use.
-
-Consider:
-- Reviewing the outputs
-- Running `deepwork sync` if job definitions were changed
-- Re-running this command later if further changes are needed
-
----
-
-## Context Files
-
-- Job definition: `.deepwork/jobs/deepwork_policy/job.yml`
-- Step instructions: `.deepwork/jobs/deepwork_policy/steps/define.md`
-"""
\ No newline at end of file
diff --git a/.gemini/commands/deepwork_rules/define.toml b/.gemini/commands/deepwork_rules/define.toml
new file mode 100644
index 00000000..3615c83e
--- /dev/null
+++ b/.gemini/commands/deepwork_rules/define.toml
@@ -0,0 +1,295 @@
+# deepwork_rules:define
+#
+# Create or update rule entries in .deepwork.rules.yml
+#
+# Generated by DeepWork - do not edit manually
+
+description = "Create or update rule entries in .deepwork.rules.yml"
+
+prompt = """
+# deepwork_rules:define
+
+**Standalone command** in the **deepwork_rules** job - can be run anytime
+
+**Summary**: Rules enforcement for AI agent sessions
+
+## Job Overview
+
+Manages rules that automatically trigger when certain files change during an AI agent session.
+Rules help ensure that code changes follow team guidelines, documentation is updated,
+and architectural decisions are respected.
+
+Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
+specifies:
+- Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
+- Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
+- Instructions: What the agent should do when the rule triggers
+
+Example use cases:
+- Update installation docs when configuration files change
+- Require security review when authentication code is modified
+- Ensure API documentation stays in sync with API code
+- Remind developers to update changelogs
+
+
+
+## Instructions
+
+# Define Rule
+
+## Objective
+
+Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+
+## Task
+
+Guide the user through defining a new rule by asking structured questions. **Do not create the rule without first understanding what they want to enforce.**
+
+**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user. This provides a better user experience with clear options and guided choices.
+
+### Step 1: Understand the Rule Purpose
+
+Start by asking structured questions to understand what the user wants to enforce:
+
+1. **What guideline or constraint should this rule enforce?**
+   - What situation triggers the need for action?
+   - What files or directories, when changed, should trigger this rule?
+   - Examples: "When config files change", "When API code changes", "When database schema changes"
+
+2. **What action should be taken?**
+   - What should the agent do when the rule triggers?
+   - Update documentation? Perform a security review? Update tests?
+   - Is there a specific file or process that needs attention?
+
+3. **Are there any "safety" conditions?**
+   - Are there files that, if also changed, mean the rule doesn't need to fire?
+   - For example: If config changes AND install_guide.md changes, assume docs are already updated
+   - This prevents redundant prompts when the user has already done the right thing
+
+### Step 2: Define the Trigger Patterns
+
+Help the user define glob patterns for files that should trigger the rule:
+
+**Common patterns:**
+- `src/**/*.py` - All Python files in src directory (recursive)
+- `app/config/**/*` - All files in app/config directory
+- `*.md` - All markdown files in root
+- `src/api/**/*` - All files in the API directory
+- `migrations/**/*.sql` - All SQL migrations
+
+**Pattern syntax:**
+- `*` - Matches any characters within a single path segment
+- `**` - Matches any characters across multiple path segments (recursive)
+- `?` - Matches a single character
+
+### Step 3: Define Safety Patterns (Optional)
+
+If there are files that, when also changed, mean the rule shouldn't fire:
+
+**Examples:**
+- Rule: "Update install guide when config changes"
+  - Trigger: `app/config/**/*`
+  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
+
+- Rule: "Security review for auth changes"
+  - Trigger: `src/auth/**/*`
+  - Safety: `SECURITY.md`, `docs/security_review.md`
+
+### Step 3b: Choose the Comparison Mode (Optional)
+
+The `compare_to` field controls what baseline is used when detecting "changed files":
+
+**Options:**
+- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
+- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
+- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
+
+**When to use each:**
+- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
+- **default_tip**: For rules about what's different from production/main
+- **prompt**: For rules that should only consider very recent changes within the current session
+
+Most rules should use the default (`base`) and don't need to specify `compare_to`.
+
+### Step 4: Write the Instructions
+
+Create clear, actionable instructions for what the agent should do when the rule fires.
+
+**Good instructions include:**
+- What to check or review
+- What files might need updating
+- Specific actions to take
+- Quality criteria for completion
+
+**Example:**
+```
+Configuration files have changed. Please:
+1. Review docs/install_guide.md for accuracy
+2. Update any installation steps that reference changed config
+3. Verify environment variable documentation is current
+4. Test that installation instructions still work
+```
+
+### Step 5: Create the Rule Entry
+
+Create or update `.deepwork.rules.yml` in the project root.
+
+**File Location**: `.deepwork.rules.yml` (root of project)
+
+**Format**:
+```yaml
+- name: "[Friendly name for the rule]"
+  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
+  safety: "[glob pattern]"   # optional, or array
+  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
+  instructions: |
+    [Multi-line instructions for the agent...]
+```
+
+**Alternative with instructions_file**:
+```yaml
+- name: "[Friendly name for the rule]"
+  trigger: "[glob pattern]"
+  safety: "[glob pattern]"
+  compare_to: "base"         # optional
+  instructions_file: "path/to/instructions.md"
+```
+
+### Step 6: Verify the Rule
+
+After creating the rule:
+
+1. **Check the YAML syntax** - Ensure valid YAML formatting
+2. **Test trigger patterns** - Verify patterns match intended files
+3. **Review instructions** - Ensure they're clear and actionable
+4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
+
+## Example Rules
+
+### Update Documentation on Config Changes
+```yaml
+- name: "Update install guide on config changes"
+  trigger: "app/config/**/*"
+  safety: "docs/install_guide.md"
+  instructions: |
+    Configuration files have been modified. Please review docs/install_guide.md
+    and update it if any installation instructions need to change based on the
+    new configuration.
+```
+
+### Security Review for Auth Code
+```yaml
+- name: "Security review for authentication changes"
+  trigger:
+    - "src/auth/**/*"
+    - "src/security/**/*"
+  safety:
+    - "SECURITY.md"
+    - "docs/security_audit.md"
+  instructions: |
+    Authentication or security code has been changed. Please:
+    1. Review for hardcoded credentials or secrets
+    2. Check input validation on user inputs
+    3. Verify access control logic is correct
+    4. Update security documentation if needed
+```
+
+### API Documentation Sync
+```yaml
+- name: "API documentation update"
+  trigger: "src/api/**/*.py"
+  safety: "docs/api/**/*.md"
+  instructions: |
+    API code has changed. Please verify that API documentation in docs/api/
+    is up to date with the code changes. Pay special attention to:
+    - New or changed endpoints
+    - Modified request/response schemas
+    - Updated authentication requirements
+```
+
+## Output Format
+
+### .deepwork.rules.yml
+Create or update this file at the project root with the new rule entry.
+
+## Quality Criteria
+
+- Asked structured questions to understand user requirements
+- Rule name is clear and descriptive
+- Trigger patterns accurately match the intended files
+- Safety patterns prevent unnecessary triggering
+- Instructions are actionable and specific
+- YAML is valid and properly formatted
+
+## Context
+
+Rules are evaluated automatically when you finish working on a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting:
+   - `base` (default): Files changed since the branch diverged from main/master
+   - `default_tip`: Files different from the current main/master branch
+   - `prompt`: Files changed since the last prompt submission
+2. Checks whether any changed file matches a rule's trigger patterns
+3. Skips any rule whose safety patterns also match a changed file
+4. Prompts you with instructions for any triggered rules
+
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
+
+
+## Inputs
+
+### User Parameters
+
+Please gather the following information from the user:
+- **rule_purpose**: What guideline or constraint should this rule enforce?
+
+
+## Work Branch Management
+
+All work for this job should be done on a dedicated work branch:
+
+1. **Check current branch**:
+   - If already on a work branch for this job (format: `deepwork/deepwork_rules-[instance]-[date]`), continue using it
+   - If on main/master, create a new work branch
+
+2. **Create work branch** (if needed):
+   ```bash
+   git checkout -b deepwork/deepwork_rules-[instance]-$(date +%Y%m%d)
+   ```
+   Replace `[instance]` with a descriptive identifier (e.g., `acme`, `q1-launch`)
+
+## Output Requirements
+
+Create the following output(s):
+- `.deepwork.rules.yml`
+
+Ensure all outputs are:
+- Well-formatted and complete
+- Ready for review or use by subsequent steps
+
+## Completion
+
+After completing this step:
+
+1. **Verify outputs**: Confirm all required files have been created
+
+2. **Inform the user**:
+   - The define command is complete
+   - Outputs created: `.deepwork.rules.yml`
+   - This command can be run again anytime to make further changes
+
+## Command Complete
+
+This is a standalone command that can be run anytime. The outputs are ready for use.
+
+Consider:
+- Reviewing the outputs
+- Running `deepwork sync` if job definitions were changed
+- Re-running this command later if further changes are needed
+
+---
+
+## Context Files
+
+- Job definition: `.deepwork/jobs/deepwork_rules/job.yml`
+- Step instructions: `.deepwork/jobs/deepwork_rules/steps/define.md`
+"""
\ No newline at end of file
diff --git a/.gemini/commands/update/job.toml b/.gemini/commands/update/job.toml
index 474171d9..c38490e5 100644
--- a/.gemini/commands/update/job.toml
+++ b/.gemini/commands/update/job.toml
@@ -16,7 +16,7 @@ prompt = """
 ## Job Overview
 
 A workflow for maintaining standard jobs bundled with DeepWork. Standard jobs
-(like `deepwork_jobs` and `deepwork_policy`) are source-controlled in
+(like `deepwork_jobs` and `deepwork_rules`) are source-controlled in
 `src/deepwork/standard_jobs/` and must be edited there—never in `.deepwork/jobs/`
 or `.claude/commands/` directly.
 
@@ -60,7 +60,7 @@ Standard jobs exist in THREE locations, but only ONE is the source of truth:
 #### 1. Identify the Standard Job to Update
 
 From conversation context, determine:
-- Which standard job needs updating (e.g., `deepwork_jobs`, `deepwork_policy`)
+- Which standard job needs updating (e.g., `deepwork_jobs`, `deepwork_rules`)
 - What changes are needed (job.yml, step instructions, hooks, etc.)
 
 Current standard jobs:
diff --git a/CHANGELOG.md b/CHANGELOG.md
index afbd1221..41243448 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,22 +8,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.4.0] - 2026-01-16
 
 ### Added
-- Policy system v2 with frontmatter markdown format in `.deepwork/policies/`
+- Rules system v2 with frontmatter markdown format in `.deepwork/rules/`
   - Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
   - Action types: prompt (show instructions), command (run idempotent commands)
   - Variable pattern matching with `{path}` (multi-segment) and `{name}` (single-segment)
-  - Queue system in `.deepwork/tmp/policy/queue/` for state tracking and deduplication
+  - Queue system in `.deepwork/tmp/rules/queue/` for state tracking and deduplication
 - New core modules:
   - `pattern_matcher.py`: Variable pattern matching with regex-based capture
-  - `policy_queue.py`: Queue system for policy state persistence
+  - `rules_queue.py`: Queue system for rule state persistence
   - `command_executor.py`: Command action execution with variable substitution
-- Updated `policy_check.py` hook to use v2 system with queue-based deduplication
+- Updated `rules_check.py` hook to use v2 system with queue-based deduplication
 
 ### Changed
-- Documentation updated with v2 policy examples and configuration
+- Documentation updated with v2 rules examples and configuration
 
 ### Removed
-- v1 policy format (`.deepwork.policy.yml`) - now only v2 frontmatter markdown format is supported
+- v1 rules format (`.deepwork.rules.yml`) - only the v2 frontmatter markdown format is now supported
 
 ## [0.3.0] - 2026-01-16
 
@@ -31,18 +31,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Cross-platform hook wrapper system for writing hooks once and running on multiple platforms
   - `wrapper.py`: Normalizes input/output between Claude Code and Gemini CLI
   - `claude_hook.sh` and `gemini_hook.sh`: Platform-specific shell wrappers
-  - `policy_check.py`: Cross-platform policy evaluation hook
+  - `rules_check.py`: Cross-platform rule evaluation hook
 - Platform documentation in `doc/platforms/` with hook references and learnings
 - Claude Code platform documentation (`doc/platforms/claude/`)
 - `update.job` for maintaining standard jobs (#41)
 - `make_new_job.sh` script and templates directory for job scaffolding (#37)
-- Default policy template file created during `deepwork install` (#42)
+- Default rules template file created during `deepwork install` (#42)
 - Full e2e test suite: define → implement → execute workflow (#45)
 - Automated tests for all shell scripts and hook wrappers (#40)
 
 ### Changed
 - Standardized on "ask structured questions" phrasing across all jobs (#48)
-- deepwork_jobs bumped to v0.5.0, deepwork_policy to v0.2.0
+- deepwork_jobs bumped to v0.5.0, deepwork_rules to v0.2.0
 
 ### Fixed
 - Stop hooks now properly return blocking JSON (#38)
@@ -51,7 +51,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.1.1] - 2026-01-15
 
 ### Added
-- `compare_to` option in policy system for flexible change detection (#34)
+- `compare_to` option in rules system for flexible change detection (#34)
   - `base` (default): Compare to merge-base with default branch
   - `default_tip`: Two-dot diff against default branch tip
   - `prompt`: Compare to state captured at prompt submission
@@ -63,22 +63,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Supplementary markdown file support for job steps (#19)
 - Browser automation capability consideration in job definition (#32)
 - Platform-specific reload instructions in adapters (#31)
-- Version and changelog update policy to enforce version tracking on src changes
+- Version and changelog update rule to enforce version tracking on src changes
 - Added claude and copilot to CLA allowlist (#26)
 
 ### Changed
-- Moved git diff logic into evaluate_policies.py for per-policy handling (#34)
+- Moved git diff logic into evaluate_rules.py for per-rule handling (#34)
 - Renamed `capture_work_tree.sh` to `capture_prompt_work_tree.sh` (#34)
 - Updated README with PyPI install instructions using pipx, uv, and pip (#22)
 - Updated deepwork_jobs job version to 0.2.0
 
 ### Fixed
-- Stop hooks now correctly return blocking JSON when policies fire
+- Stop hooks now correctly return blocking JSON when rules fire
 - Added shell script tests to verify stop hook blocking behavior
 
 ### Removed
 - `refine` step (replaced by `learn` command) (#27)
-- `get_changed_files.sh` hook (logic moved to Python policy evaluator) (#34)
+- `get_changed_files.sh` hook (logic moved to Python rule evaluator) (#34)
 
 ## [0.1.0] - Initial Release
 
diff --git a/README.md b/README.md
index 6005b143..96816677 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ DeepWork is a tool for defining and executing multi-step workflows with AI codin
 | OpenCode | Planned | Markdown | No |
 | GitHub Copilot CLI | Planned | Markdown | No (tool permissions only) |
 
-> **Tip:** New to DeepWork? Claude Code has the most complete feature support, including quality validation hooks and automated policies. For browser automation, Claude in Chrome (Anthropic's browser extension) works well with DeepWork workflows.
+> **Tip:** New to DeepWork? Claude Code has the most complete feature support, including quality validation hooks and automated rules. For browser automation, Claude in Chrome (Anthropic's browser extension) works well with DeepWork workflows.
 
 ## Easy Installation
 In your Agent CLI (ex. `claude`), ask:
@@ -61,7 +61,7 @@ This will:
 - Create `.deepwork/` directory structure
 - Generate core DeepWork jobs
 - Install DeepWork jobs for your AI assistant
-- Configure hooks for your AI assistant to enable policies
+- Configure hooks for your AI assistant to enable rules
 
 ## Quick Start
 
@@ -177,10 +177,10 @@ DeepWork follows a **Git-native, installation-only** design:
 your-project/
 ├── .deepwork/
 │   ├── config.yml          # Platform configuration
-│   ├── policies/           # Policy definitions (v2 format)
-│   │   └── policy-name.md  # Individual policy files
+│   ├── rules/              # Rule definitions (v2 format)
+│   │   └── rule-name.md    # Individual rule files
 │   ├── tmp/                # Temporary state (gitignored)
-│   │   └── policy/queue/   # Policy evaluation queue
+│   │   └── rules/queue/    # Rule evaluation queue
 │   └── jobs/               # Job definitions
 │       └── job_name/
 │           ├── job.yml     # Job metadata
@@ -212,13 +212,13 @@ deepwork/
 │   │   ├── parser.py     # Job definition parsing
 │   │   ├── detector.py   # Platform detection
 │   │   ├── generator.py  # Skill file generation
-│   │   ├── policy_parser.py    # Policy parsing
+│   │   ├── rules_parser.py     # Rule parsing
 │   │   ├── pattern_matcher.py  # Variable pattern matching
-│   │   ├── policy_queue.py     # Policy state queue
+│   │   ├── rules_queue.py      # Rule state queue
 │   │   └── command_executor.py # Command action execution
 │   ├── hooks/            # Cross-platform hook wrappers
 │   │   ├── wrapper.py    # Input/output normalization
-│   │   ├── policy_check.py   # Policy evaluation hook
+│   │   ├── rules_check.py    # Rule evaluation hook
 │   │   ├── claude_hook.sh    # Claude Code adapter
 │   │   └── gemini_hook.sh    # Gemini CLI adapter
 │   ├── templates/        # Jinja2 templates
@@ -235,27 +235,27 @@ deepwork/
 
 ## Features
 
-### 📋 Job Definition
+### Job Definition
 Define structured, multi-step workflows where each step has clear requirements and produces specific results.
 - **Dependency Management**: Explicitly link steps with automatic sequence handling and cycle detection.
 - **Artifact Passing**: Seamlessly use file outputs from one step as inputs for future steps.
 - **Dynamic Inputs**: Support for both fixed file references and interactive user parameters.
 - **Human-Readable YAML**: Simple, declarative job definitions that are easy to version and maintain.
 
-### 🌿 Git-Native Workflow
+### Git-Native Workflow
 Maintain a clean repository with automatic branch management and isolation.
 - **Automatic Branching**: Every job execution happens on a dedicated work branch (e.g., `deepwork/my-job-2024`).
 - **Namespace Isolation**: Run multiple concurrent jobs or instances without versioning conflicts.
 - **Full Traceability**: All AI-generated changes, logs, and artifacts are tracked natively in your Git history.
 
-### 🛡️ Automated Policies
-Enforce project standards and best practices without manual oversight. Policies monitor file changes and automatically prompt your AI assistant to follow specific guidelines when relevant code is modified.
-- **Automatic Triggers**: Detect when specific files or directories are changed to fire relevant policies.
+### Automated Rules
+Enforce project standards and best practices without manual oversight. Rules monitor file changes and automatically prompt your AI assistant to follow specific guidelines when relevant code is modified.
+- **Automatic Triggers**: Detect when specific files or directories are changed to fire relevant rules.
 - **File Correspondence**: Define bidirectional (set) or directional (pair) relationships between files.
 - **Command Actions**: Run idempotent commands (formatters, linters) automatically when files change.
 - **Contextual Guidance**: Instructions are injected directly into the AI's workflow at the right moment.
 
-**Example Policy** (`.deepwork/policies/source-test-pairing.md`):
+**Example Rule** (`.deepwork/rules/source-test-pairing.md`):
 ```markdown
 ---
 name: Source/Test Pairing
@@ -267,7 +267,7 @@ When source files change, corresponding test files should also change.
 Please create or update tests for the modified source files.
 ```
 
-**Example Command Policy** (`.deepwork/policies/format-python.md`):
+**Example Command Rule** (`.deepwork/rules/format-python.md`):
 ```markdown
 ---
 name: Format Python
@@ -278,7 +278,7 @@ action:
 ---
 ```
 
-### 🚀 Multi-Platform Support
+### Multi-Platform Support
 Generate native commands and skills tailored for your AI coding assistant.
 - **Native Integration**: Works directly with the skill/command formats of supported agents.
 - **Context-Aware**: Skills include all necessary context (instructions, inputs, and dependencies) for the AI.
diff --git a/claude.md b/claude.md
index 34a4c011..9141a2b9 100644
--- a/claude.md
+++ b/claude.md
@@ -184,7 +184,7 @@ my-project/
 
 ## CRITICAL: Editing Standard Jobs
 
-**Standard jobs** (like `deepwork_jobs` and `deepwork_policy`) are bundled with DeepWork and installed to user projects. They exist in THREE locations:
+**Standard jobs** (like `deepwork_jobs` and `deepwork_rules`) are bundled with DeepWork and installed to user projects. They exist in THREE locations:
 
 1. **Source of truth**: `src/deepwork/standard_jobs/[job_name]/` - The canonical source files
 2. **Installed copy**: `.deepwork/jobs/[job_name]/` - Installed by `deepwork install`
@@ -209,7 +209,7 @@ Instead, follow this workflow:
 
 Standard jobs are defined in `src/deepwork/standard_jobs/`. Currently:
 - `deepwork_jobs` - Core job management commands (define, implement, refine)
-- `deepwork_policy` - Policy enforcement system
+- `deepwork_rules` - Rules enforcement system
 
 If a job exists in `src/deepwork/standard_jobs/`, it is a standard job and MUST be edited there.
 
diff --git a/doc/architecture.md b/doc/architecture.md
index 282f1f87..8c494beb 100644
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -46,9 +46,9 @@ deepwork/                       # DeepWork tool repository
 │       │   ├── detector.py     # AI platform detection
 │       │   ├── generator.py    # Command file generation
 │       │   ├── parser.py       # Job definition parsing
-│       │   ├── policy_parser.py    # Policy definition parsing
-│       │   ├── pattern_matcher.py  # Variable pattern matching for policies
-│       │   ├── policy_queue.py     # Policy state queue system
+│       │   ├── rules_parser.py     # Rule definition parsing
+│       │   ├── pattern_matcher.py  # Variable pattern matching for rules
+│       │   ├── rules_queue.py      # Rule state queue system
 │       │   ├── command_executor.py # Command action execution
 │       │   └── hooks_syncer.py     # Hook syncing to platforms
 │       ├── hooks/              # Hook system and cross-platform wrappers
@@ -56,7 +56,7 @@ deepwork/                       # DeepWork tool repository
 │       │   ├── wrapper.py           # Cross-platform input/output normalization
 │       │   ├── claude_hook.sh       # Shell wrapper for Claude Code
 │       │   ├── gemini_hook.sh       # Shell wrapper for Gemini CLI
-│       │   └── policy_check.py      # Cross-platform policy evaluation hook
+│       │   └── rules_check.py       # Cross-platform rule evaluation hook
 │       ├── templates/          # Command templates for each platform
 │       │   ├── claude/
 │       │   │   └── command-job-step.md.jinja
@@ -66,7 +66,7 @@ deepwork/                       # DeepWork tool repository
 │       │   ├── deepwork_jobs/
 │       │   │   ├── job.yml
 │       │   │   └── steps/
-│       │   └── deepwork_policy/   # Policy management job
+│       │   └── deepwork_rules/   # Rule management job
 │       │       ├── job.yml
 │       │       ├── steps/
 │       │       │   └── define.md
@@ -74,10 +74,10 @@ deepwork/                       # DeepWork tool repository
 │       │           ├── global_hooks.yml
 │       │           ├── user_prompt_submit.sh
 │       │           ├── capture_prompt_work_tree.sh
-│       │           └── policy_stop_hook.sh
+│       │           └── rules_stop_hook.sh
 │       ├── schemas/            # Definition schemas
 │       │   ├── job_schema.py
-│       │   └── policy_schema.py
+│       │   └── rules_schema.py
 │       └── utils/
 │           ├── fs.py
 │           ├── git.py
@@ -122,9 +122,9 @@ def install(platform: str):
     # Inject core job definitions
     inject_deepwork_jobs(".deepwork/jobs/")
 
-    # Create default policy template (if not exists)
-    if not exists(".deepwork.policy.yml"):
-        copy_template("default_policy.yml", ".deepwork.policy.yml")
+    # Create default rules template (if not exists)
+    if not exists(".deepwork.rules.yml"):
+        copy_template("default_rules.yml", ".deepwork.rules.yml")
 
     # Update config (supports multiple platforms)
     config = load_yaml(".deepwork/config.yml") or {}
@@ -283,23 +283,23 @@ my-project/                     # User's project (target)
 │       ├── deepwork_jobs.define.md         # Core DeepWork commands
 │       ├── deepwork_jobs.implement.md
 │       ├── deepwork_jobs.refine.md
-│       ├── deepwork_policy.define.md       # Policy management
+│       ├── deepwork_rules.define.md        # Rule management
 │       ├── competitive_research.identify_competitors.md
 │       └── ...
 ├── .deepwork/                  # DeepWork configuration
 │   ├── config.yml              # Platform config
 │   ├── .gitignore              # Ignores tmp/ directory
-│   ├── policies/               # Policy definitions (v2 format)
+│   ├── rules/                  # Rule definitions (v2 format)
 │   │   ├── source-test-pairing.md
 │   │   ├── format-python.md
 │   │   └── api-docs.md
 │   ├── tmp/                    # Temporary state (gitignored)
-│   │   └── policy/queue/       # Policy evaluation queue
+│   │   └── rules/queue/        # Rule evaluation queue
 │   └── jobs/                   # Job definitions
 │       ├── deepwork_jobs/      # Core job for managing jobs
 │       │   ├── job.yml
 │       │   └── steps/
-│       ├── deepwork_policy/    # Policy management job
+│       ├── deepwork_rules/     # Rule management job
 │       │   ├── job.yml
 │       │   ├── steps/
 │       │   │   └── define.md
@@ -307,7 +307,7 @@ my-project/                     # User's project (target)
 │       │       ├── global_hooks.yml
 │       │       ├── user_prompt_submit.sh
 │       │       ├── capture_prompt_work_tree.sh
-│       │       └── policy_stop_hook.sh
+│       │       └── rules_stop_hook.sh
 │       ├── competitive_research/
 │       │   ├── job.yml         # Job metadata
 │       │   └── steps/
@@ -1001,26 +1001,26 @@ Github Actions are used for all CI/CD tasks.
 
 ---
 
-## Policies
+## Rules
 
-Policies are automated enforcement rules that trigger based on file changes during an AI agent session. They help ensure that:
+Rules are automated enforcement mechanisms that trigger based on file changes during an AI agent session. They help ensure that:
 - Documentation stays in sync with code changes
 - Security reviews happen when sensitive code is modified
 - Team guidelines are followed automatically
 - File correspondences are maintained (e.g., source/test pairing)
 
-### Policy System v2 (Frontmatter Markdown)
+### Rules System v2 (Frontmatter Markdown)
 
-Policies are defined as individual markdown files in `.deepwork/policies/`:
+Rules are defined as individual markdown files in `.deepwork/rules/`:
 
 ```
-.deepwork/policies/
+.deepwork/rules/
 ├── source-test-pairing.md
 ├── format-python.md
 └── api-docs.md
 ```
 
-Each policy file uses YAML frontmatter with a markdown body for instructions:
+Each rule file uses YAML frontmatter with a markdown body for instructions:
 
 ```markdown
 ---
@@ -1035,7 +1035,7 @@ Please create or update tests for the modified source files.
 
 ### Detection Modes
 
-Policies support three detection modes:
+Rules support three detection modes:
 
 **1. Trigger/Safety (default)** - Fire when trigger matches but safety doesn't:
 ```yaml
@@ -1089,43 +1089,43 @@ action:
 ---
 ```
 
-### Policy Evaluation Flow
+### Rule Evaluation Flow
 
 1. **Session Start**: When a Claude Code session begins, the baseline git state is captured
 2. **Agent Works**: The AI agent performs tasks, potentially modifying files
 3. **Session Stop**: When the agent finishes (after_agent event):
    - Changed files are detected based on `compare_to` setting (base, default_tip, or prompt)
-   - Each policy is evaluated based on its detection mode
-   - Queue entries are created in `.deepwork/tmp/policy/queue/` for deduplication
+   - Each rule is evaluated based on its detection mode
+   - Queue entries are created in `.deepwork/tmp/rules/queue/` for deduplication
    - For command actions: commands are executed, results tracked
-   - For prompt actions: if policy fires and not already promised, agent is prompted
-4. **Promise Tags**: Agents can mark policies as addressed by including `<promise>✓ Policy Name</promise>` in their response
+   - For prompt actions: if rule fires and not already promised, agent is prompted
+4. **Promise Tags**: Agents can mark rules as addressed by including `<promise>✓ Rule Name</promise>` in their response
 
 ### Queue System
 
-Policy state is tracked in `.deepwork/tmp/policy/queue/` with files named `{hash}.{status}.json`:
+Rule state is tracked in `.deepwork/tmp/rules/queue/` with files named `{hash}.{status}.json`:
 - `queued` - Detected, awaiting evaluation
-- `passed` - Policy satisfied (promise found or command succeeded)
-- `failed` - Policy not satisfied
+- `passed` - Rule satisfied (promise found or command succeeded)
+- `failed` - Rule not satisfied
 - `skipped` - Safety pattern matched
 
-This prevents re-prompting for the same policy violation within a session.
+This prevents re-prompting for the same rule violation within a session.
 
 ### Hook Integration
 
-The v2 policy system uses the cross-platform hook wrapper:
+The v2 rules system uses the cross-platform hook wrapper:
 
 ```
 src/deepwork/hooks/
 ├── wrapper.py           # Cross-platform input/output normalization
-├── policy_check.py      # Policy evaluation hook (v2)
+├── rules_check.py       # Rule evaluation hook (v2)
 ├── claude_hook.sh       # Claude Code shell wrapper
 └── gemini_hook.sh       # Gemini CLI shell wrapper
 ```
 
 Hooks are called via the shell wrappers:
 ```bash
-claude_hook.sh deepwork.hooks.policy_check
+claude_hook.sh deepwork.hooks.rules_check
 ```
 
 The hooks are installed to `.claude/settings.json` during `deepwork sync`:
@@ -1134,7 +1134,7 @@ The hooks are installed to `.claude/settings.json` during `deepwork sync`:
 {
   "hooks": {
     "Stop": [
-      {"matcher": "", "hooks": [{"type": "command", "command": ".deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh"}]}
+      {"matcher": "", "hooks": [{"type": "command", "command": ".deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh"}]}
     ]
   }
 }
@@ -1190,34 +1190,34 @@ def my_hook(input: HookInput) -> HookOutput:
 
 See `doc/platforms/` for detailed platform-specific hook documentation.
 
-### Policy Schema
+### Rule Schema
 
-Policies are validated against a JSON Schema:
+Rules are validated against a JSON Schema:
 
 ```yaml
-- name: string          # Required: Friendly name for the policy
+- name: string          # Required: Friendly name for the rule
   trigger: string|array # Required: Glob pattern(s) for triggering files
   safety: string|array  # Optional: Glob pattern(s) for safety files
   instructions: string  # Required (unless instructions_file): What to do
   instructions_file: string  # Alternative: Path to instructions file
 ```
 
-### Defining Policies
+### Defining Rules
 
-Use the `/deepwork_policy.define` command to interactively create policies:
+Use the `/deepwork_rules.define` command to interactively create rules:
 
 ```
-User: /deepwork_policy.define
+User: /deepwork_rules.define
 
-Claude: I'll help you define a new policy. What guideline or constraint
-        should this policy enforce?
+Claude: I'll help you define a new rule. What guideline or constraint
+        should this rule enforce?
 
 User: When API code changes, the API documentation should be updated
 
 Claude: Got it. Let me ask a few questions...
         [Interactive dialog to define trigger, safety, and instructions]
 
-Claude: ✓ Created policy "API documentation update" in .deepwork.policy.yml
+Claude: ✓ Created rule "API documentation update" in .deepwork.rules.yml
 ```
 
 ---
diff --git a/doc/platforms/gemini/hooks.md b/doc/platforms/gemini/hooks.md
index b9103a6f..e8cc11d9 100644
--- a/doc/platforms/gemini/hooks.md
+++ b/doc/platforms/gemini/hooks.md
@@ -34,11 +34,11 @@ Hooks are configured in `settings.json` at various levels:
         "matcher": "*",
         "hooks": [
           {
-            "name": "policy-check",
+            "name": "rules-check",
             "type": "command",
-            "command": ".gemini/hooks/policy_check.sh",
+            "command": ".gemini/hooks/rules_check.sh",
             "timeout": 60000,
-            "description": "Evaluates DeepWork policies"
+            "description": "Evaluates DeepWork rules"
           }
         ]
       }
@@ -264,7 +264,7 @@ Block the agent from completing:
 ```json
 {
   "decision": "deny",
-  "reason": "Policy X requires attention before completing"
+  "reason": "Rule X requires attention before completing"
 }
 ```
 
@@ -287,7 +287,7 @@ Block tool execution:
 ```json
 {
   "decision": "deny",
-  "reason": "Security policy violation"
+  "reason": "Security rule violation"
 }
 ```
 
diff --git a/doc/policy_syntax.md b/doc/rules_syntax.md
similarity index 73%
rename from doc/policy_syntax.md
rename to doc/rules_syntax.md
index 4914a8da..f4c3ae83 100644
--- a/doc/policy_syntax.md
+++ b/doc/rules_syntax.md
@@ -1,14 +1,14 @@
-# Policy Configuration Syntax
+# Rules Configuration Syntax
 
-This document describes the syntax for policy files in the `.deepwork/policies/` directory.
+This document describes the syntax for rule files in the `.deepwork/rules/` directory.
 
 ## Directory Structure
 
-Policies are stored as individual markdown files with YAML frontmatter:
+Rules are stored as individual markdown files with YAML frontmatter:
 
 ```
 .deepwork/
-└── policies/
+└── rules/
     ├── readme-accuracy.md
     ├── source-test-pairing.md
     ├── api-documentation.md
@@ -19,9 +19,9 @@ Each file has:
 - **Frontmatter**: YAML configuration between `---` delimiters
 - **Body**: Instructions (for prompt actions) or description (for command actions)
 
-This structure enables code files to reference policies:
+This structure enables code files to reference rules:
 ```python
-# Read the policy `.deepwork/policies/source-test-pairing.md` before editing
+# Read the rule `.deepwork/rules/source-test-pairing.md` before editing
 class AuthService:
     ...
 ```
@@ -30,7 +30,7 @@ class AuthService:
 
 ### Simple Trigger with Prompt
 
-`.deepwork/policies/readme-accuracy.md`:
+`.deepwork/rules/readme-accuracy.md`:
 ```markdown
 ---
 name: README Accuracy
@@ -47,7 +47,7 @@ Check that:
 
 ### Correspondence Set (bidirectional)
 
-`.deepwork/policies/source-test-pairing.md`:
+`.deepwork/rules/source-test-pairing.md`:
 ```markdown
 ---
 name: Source/Test Pairing
@@ -63,7 +63,7 @@ When adding tests, ensure they test actual source code.
 
 ### Correspondence Pair (directional)
 
-`.deepwork/policies/api-documentation.md`:
+`.deepwork/rules/api-documentation.md`:
 ```markdown
 ---
 name: API Documentation
@@ -81,7 +81,7 @@ When modifying an API endpoint, update its documentation to reflect:
 
 ### Command Action
 
-`.deepwork/policies/python-formatting.md`:
+`.deepwork/rules/python-formatting.md`:
 ```markdown
 ---
 name: Python Formatting
@@ -91,17 +91,17 @@ action:
 ---
 Automatically formats Python files using ruff.
 
-This policy runs `ruff format` on any changed Python files to ensure
+This rule runs `ruff format` on any changed Python files to ensure
 consistent code style across the codebase.
 ```
 
-## Policy Structure
+## Rule Structure
 
-Every policy has two orthogonal aspects:
+Every rule has two orthogonal aspects:
 
 ### Detection Mode
 
-How the policy decides when to fire:
+How the rule decides when to fire:
 
 | Mode | Field | Description |
 |------|-------|-------------|
@@ -111,7 +111,7 @@ How the policy decides when to fire:
 
 ### Action Type
 
-What happens when the policy fires:
+What happens when the rule fires:
 
 | Type | Field | Description |
 |------|-------|-------------|
@@ -153,8 +153,8 @@ set:
 1. A file changes that matches one pattern in the set
 2. System extracts the variable portions (e.g., `{path}`)
 3. System generates expected files by substituting into other patterns
-4. If ALL expected files also changed: policy is satisfied (no trigger)
-5. If ANY expected file is missing: policy fires
+4. If ALL expected files also changed: rule is satisfied (no trigger)
+5. If ANY expected file is missing: rule fires
 
 If `src/auth/login.py` changes:
 - Extracts `{path}` = `auth/login`
@@ -208,7 +208,7 @@ The markdown body after frontmatter serves as instructions shown to the agent. T
 
 | Variable | Description |
 |----------|-------------|
-| `{trigger_file}` | The file that triggered the policy |
+| `{trigger_file}` | The file that triggered the rule |
 | `{trigger_files}` | All files that matched trigger patterns |
 | `{expected_files}` | Expected corresponding files (for sets/pairs) |
 
@@ -237,7 +237,7 @@ action:
 
 **Idempotency Requirement:**
 
-Commands should be idempotent—running them multiple times produces the same result. Lint formatters like `black`, `ruff format`, and `prettier` are good examples: they produce consistent output regardless of how many times they run.
+Commands should be idempotent--running them multiple times produces the same result. Lint formatters like `black`, `ruff format`, and `prettier` are good examples: they produce consistent output regardless of how many times they run.
 
 ## Pattern Syntax
 
@@ -259,9 +259,9 @@ Variable patterns use `{name}` syntax to capture path segments:
 
 | Pattern | Captures | Example Match |
 |---------|----------|---------------|
-| `src/{path}.py` | `{path}` = multi-segment path | `src/foo/bar.py` → `path=foo/bar` |
-| `src/{name}.py` | `{name}` = single segment | `src/utils.py` → `name=utils` |
-| `{module}/{name}.py` | Both variables | `auth/login.py` → `module=auth, name=login` |
+| `src/{path}.py` | `{path}` = multi-segment path | `src/foo/bar.py` -> `path=foo/bar` |
+| `src/{name}.py` | `{name}` = single segment | `src/utils.py` -> `name=utils` |
+| `{module}/{name}.py` | Both variables | `auth/login.py` -> `module=auth, name=login` |
 
 **Variable Naming Conventions:**
 
@@ -284,15 +284,15 @@ By default, `{path}` matches multiple path segments and `{name}` matches one:
 To explicitly control this, use `{**name}` for multi-segment or `{*name}` for single:
 
 ```yaml
-- "src/{**module}/index.py"   # src/foo/bar/index.py → module=foo/bar
-- "src/{*component}.py"       # src/Button.py → component=Button
+- "src/{**module}/index.py"   # src/foo/bar/index.py -> module=foo/bar
+- "src/{*component}.py"       # src/Button.py -> component=Button
 ```
 
 ## Field Reference
 
 ### name (required)
 
-Human-friendly name for the policy. Displayed in promise tags and output.
+Human-friendly name for the rule. Displayed in promise tags and output.
 
 ```yaml
 ---
@@ -302,16 +302,16 @@ name: Source/Test Pairing
 
 ### File Naming
 
-Policy files are named using kebab-case with `.md` extension:
+Rule files are named using kebab-case with `.md` extension:
 - `readme-accuracy.md`
 - `source-test-pairing.md`
 - `api-documentation.md`
 
-The filename serves as the policy's identifier in the queue system.
+The filename serves as the rule's identifier in the queue system.
 
 ### trigger
 
-File patterns that cause the policy to fire (trigger/safety mode). Can be string or array.
+File patterns that cause the rule to fire (trigger/safety mode). Can be string or array.
 
 ```yaml
 ---
@@ -327,7 +327,7 @@ trigger:
 
 ### safety (optional)
 
-File patterns that suppress the policy. If ANY changed file matches a safety pattern, the policy does not fire.
+File patterns that suppress the rule. If ANY changed file matches a safety pattern, the rule does not fire.
 
 ```yaml
 ---
@@ -403,9 +403,9 @@ compare_to: prompt
 
 ## Complete Examples
 
-### Example 1: Test Coverage Policy
+### Example 1: Test Coverage Rule
 
-`.deepwork/policies/test-coverage.md`:
+`.deepwork/rules/test-coverage.md`:
 ```markdown
 ---
 name: Test Coverage
@@ -425,7 +425,7 @@ Please either:
 
 ### Example 2: Documentation Sync
 
-`.deepwork/policies/api-documentation-sync.md`:
+`.deepwork/rules/api-documentation-sync.md`:
 ```markdown
 ---
 name: API Documentation Sync
@@ -442,7 +442,7 @@ API endpoint changed. Please update:
 
 ### Example 3: Auto-formatting Pipeline
 
-`.deepwork/policies/python-black-formatting.md`:
+`.deepwork/rules/python-black-formatting.md`:
 ```markdown
 ---
 name: Python Black Formatting
@@ -463,7 +463,7 @@ Excludes:
 
 ### Example 4: Multi-file Correspondence
 
-`.deepwork/policies/full-stack-feature-sync.md`:
+`.deepwork/rules/full-stack-feature-sync.md`:
 ```markdown
 ---
 name: Full Stack Feature Sync
@@ -484,7 +484,7 @@ When modifying a feature, ensure:
 
 ### Example 5: Conditional Safety
 
-`.deepwork/policies/version-bump-required.md`:
+`.deepwork/rules/version-bump-required.md`:
 ```markdown
 ---
 name: Version Bump Required
@@ -499,56 +499,56 @@ Code changes detected. Before merging, ensure:
 - Version is bumped in pyproject.toml (if needed)
 - CHANGELOG.md is updated
 
-This policy is suppressed if you've already modified pyproject.toml
+This rule is suppressed if you've already modified pyproject.toml
 or CHANGELOG.md, as that indicates you're handling versioning.
 ```
 
 ## Promise Tags
 
-When a policy fires but should be dismissed, use promise tags in the conversation. The tag content should be human-readable, using the policy's `name` field with a checkmark:
+When a rule fires but should be dismissed, use promise tags in the conversation. The tag content should be human-readable, using the rule's `name` field:
 
 ```
-<promise>✓ Source/Test Pairing</promise>
-<promise>✓ API Documentation Sync</promise>
+<promise>Source/Test Pairing</promise>
+<promise>API Documentation Sync</promise>
 ```
 
-The checkmark and friendly name make promise tags easy to read when displayed in the conversation. The system matches promise tags to policies using case-insensitive comparison of the `name` field (ignoring the checkmark prefix).
+The friendly name makes promise tags easy to read when displayed in the conversation. The system matches promise tags to rules using case-insensitive comparison of the `name` field.
 
 ## Validation
 
-Policy files are validated on load. Common errors:
+Rule files are validated on load. Common errors:
 
 **Invalid frontmatter:**
 ```
-Error: .deepwork/policies/my-policy.md - invalid YAML frontmatter
+Error: .deepwork/rules/my-rule.md - invalid YAML frontmatter
 ```
 
 **Missing required field:**
 ```
-Error: .deepwork/policies/my-policy.md - must have 'trigger', 'set', or 'pair'
+Error: .deepwork/rules/my-rule.md - must have 'trigger', 'set', or 'pair'
 ```
 
 **Invalid pattern:**
 ```
-Error: .deepwork/policies/test-coverage.md - invalid pattern "src/{path" - unclosed brace
+Error: .deepwork/rules/test-coverage.md - invalid pattern "src/{path" - unclosed brace
 ```
 
 **Conflicting fields:**
 ```
-Error: .deepwork/policies/my-policy.md - has both 'trigger' and 'set' - use one or the other
+Error: .deepwork/rules/my-rule.md - has both 'trigger' and 'set' - use one or the other
 ```
 
 **Empty body:**
 ```
-Error: .deepwork/policies/my-policy.md - instruction policies require markdown body
+Error: .deepwork/rules/my-rule.md - instruction rules require markdown body
 ```
 
-## Referencing Policies in Code
+## Referencing Rules in Code
 
-A key benefit of the `.deepwork/policies/` folder structure is that code files can reference policies directly:
+A key benefit of the `.deepwork/rules/` folder structure is that code files can reference rules directly:
 
 ```python
-# Read `.deepwork/policies/source-test-pairing.md` before editing this file
+# Read `.deepwork/rules/source-test-pairing.md` before editing this file
 
 class UserService:
     """Service for user management."""
@@ -556,7 +556,7 @@ class UserService:
 ```
 
 ```typescript
-// This file is governed by `.deepwork/policies/api-documentation.md`
+// This file is governed by `.deepwork/rules/api-documentation.md`
 // Any changes here require corresponding documentation updates
 
 export async function createUser(data: UserInput): Promise<User> {
@@ -564,4 +564,4 @@ export async function createUser(data: UserInput): Promise<User> {
 }
 ```
 
-This helps AI agents and human developers understand which policies apply to specific files.
+This helps AI agents and human developers understand which rules apply to specific files.
diff --git a/doc/policy_system_design.md b/doc/rules_system_design.md
similarity index 80%
rename from doc/policy_system_design.md
rename to doc/rules_system_design.md
index d15e65be..24e296b5 100644
--- a/doc/policy_system_design.md
+++ b/doc/rules_system_design.md
@@ -1,8 +1,8 @@
-# Policy System Design
+# Rules System Design
 
 ## Overview
 
-The deepwork policy system enables automated enforcement of development standards during AI-assisted coding sessions. This document describes the architecture for the next-generation policy system with support for:
+The deepwork rules system enables automated enforcement of development standards during AI-assisted coding sessions. This document describes the architecture for the next-generation rules system with support for:
 
 1. **File correspondence matching** (sets and pairs)
 2. **Idempotent command execution**
@@ -11,11 +11,11 @@ The deepwork policy system enables automated enforcement of development standard
 
 ## Core Concepts
 
-### Policy Structure
+### Rule Structure
 
-Every policy has two orthogonal aspects:
+Every rule has two orthogonal aspects:
 
-**Detection Mode** - How the policy decides when to fire:
+**Detection Mode** - How the rule decides when to fire:
 
 | Mode | Field | Description |
 |------|-------|-------------|
@@ -23,7 +23,7 @@ Every policy has two orthogonal aspects:
 | **Set** | `set` | Fire when file correspondence is incomplete (bidirectional) |
 | **Pair** | `pair` | Fire when file correspondence is incomplete (directional) |
 
-**Action Type** - What happens when the policy fires:
+**Action Type** - What happens when the rule fires:
 
 | Type | Field | Description |
 |------|-------|-------------|
@@ -44,7 +44,7 @@ Every policy has two orthogonal aspects:
 **Pair Mode (Directional Correspondence)**
 - Define a trigger pattern and one or more expected patterns
 - Changes to trigger files require corresponding expected files to also change
-- Changes to expected files alone do not trigger the policy
+- Changes to expected files alone do not trigger the rule
 - Example: API code requires documentation updates
 
 ### Pattern Variables
@@ -65,7 +65,7 @@ Special variable names:
 ### Action Types
 
 **Prompt Action (default)**
-The markdown body of the policy file serves as instructions shown to the agent.
+The markdown body of the rule file serves as instructions shown to the agent.
 
 **Command Action**
 ```yaml
@@ -82,14 +82,14 @@ Command actions should be idempotent—running them multiple times produces the
 
 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│                        Policy System                             │
+│                        Rules System                              │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                  │
 │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
 │  │   Detector   │───▶│    Queue     │◀───│  Evaluator   │      │
 │  │              │    │              │    │              │      │
 │  │ - Watch files│    │ .deepwork/   │    │ - Process    │      │
-│  │ - Match pols │    │ tmp/policy/  │    │   queued     │      │
+│  │ - Match rules│    │ tmp/rules/   │    │   queued     │      │
 │  │ - Create     │    │ queue/       │    │ - Run action │      │
 │  │   entries    │    │              │    │ - Update     │      │
 │  └──────────────┘    └──────────────┘    │   status     │      │
@@ -109,15 +109,15 @@ Command actions should be idempotent—running them multiple times produces the
 
 ### Detector
 
-The detector identifies when policies should be evaluated:
+The detector identifies when rules should be evaluated:
 
-1. **Trigger Detection**: Monitors for file changes that match policy triggers
+1. **Trigger Detection**: Monitors for file changes that match rule triggers
 2. **Deduplication**: Computes a hash to avoid re-processing identical triggers
 3. **Queue Entry Creation**: Creates entries for the evaluator to process
 
 **Trigger Hash Computation**:
 ```python
-hash_input = f"{policy_name}:{sorted(trigger_files)}:{baseline_ref}"
+hash_input = f"{rule_name}:{sorted(trigger_files)}:{baseline_ref}"
 trigger_hash = sha256(hash_input.encode()).hexdigest()[:12]
 ```
 
@@ -128,20 +128,20 @@ The baseline_ref varies by `compare_to` mode:
 
 ### Queue
 
-The queue persists policy trigger state in `.deepwork/tmp/policy/queue/`:
+The queue persists rule trigger state in `.deepwork/tmp/rules/queue/`:
 
 ```
-.deepwork/tmp/policy/queue/
+.deepwork/tmp/rules/queue/
 ├── {hash}.queued.json      # Detected, awaiting evaluation
-├── {hash}.passed.json      # Evaluated, policy satisfied
-├── {hash}.failed.json      # Evaluated, policy not satisfied
+├── {hash}.passed.json      # Evaluated, rule satisfied
+├── {hash}.failed.json      # Evaluated, rule not satisfied
 └── {hash}.skipped.json     # Safety pattern matched, skipped
 ```
 
 **Queue Entry Schema**:
 ```json
 {
-  "policy_name": "string",
+  "rule_name": "string",
   "trigger_hash": "string",
   "status": "queued|passed|failed|skipped",
   "created_at": "ISO8601 timestamp",
@@ -233,22 +233,22 @@ def resolve_pattern(pattern: str, variables: dict[str, str]) -> str:
 
 ## Evaluation Flow
 
-### Standard Instruction Policy
+### Standard Instruction Rule
 
 ```
 1. Detector: File changes detected
-2. Detector: Check each policy's trigger patterns
-3. Detector: For matching policy, compute trigger hash
+2. Detector: Check each rule's trigger patterns
+3. Detector: For matching rule, compute trigger hash
 4. Detector: If hash not in queue, create .queued entry
 5. Evaluator: Process queued entry
 6. Evaluator: Check safety patterns against changed files
 7. Evaluator: If safety matches, mark .skipped
 8. Evaluator: If no safety match, return instructions to agent
-9. Agent: Addresses policy, includes <promise> tag
+9. Agent: Addresses rule, includes <promise> tag
 10. Evaluator: On next check, mark .passed (promise found)
 ```
 
-### Correspondence Policy (Set)
+### Correspondence Rule (Set)
 
 ```
 1. Detector: File src/foo/bar.py changed
@@ -261,7 +261,7 @@ def resolve_pattern(pattern: str, variables: dict[str, str]) -> str:
 7. Evaluator: Return instructions prompting for test update
 ```
 
-### Correspondence Policy (Pair)
+### Correspondence Rule (Pair)
 
 ```
 1. Detector: File api/users.py changed (trigger pattern)
@@ -273,14 +273,14 @@ def resolve_pattern(pattern: str, variables: dict[str, str]) -> str:
 7. Evaluator: Return instructions
 
 Note: If only docs/api/users.md changed (not api/users.py),
-the pair policy does NOT trigger (directional).
+the pair rule does NOT trigger (directional).
 ```
 
-### Command Policy
+### Command Rule
 
 ```
 1. Detector: Python file changed, matches "**/*.py"
-2. Detector: Create .queued entry for format policy
+2. Detector: Create .queued entry for format rule
 3. Evaluator: Execute "ruff format {file}"
 4. Evaluator: Run git diff to check for changes
 5. Evaluator: If changes made, re-run command (idempotency check)
@@ -292,15 +292,15 @@ the pair policy does NOT trigger (directional).
 
 ### Problem
 
-When many policies trigger, the agent receives excessive output, degrading performance.
+When many rules trigger, the agent receives excessive output, degrading performance.
 
 ### Solution
 
 **1. Output Batching**
-Group related policies into concise sections:
+Group related rules into concise sections:
 
 ```
-The following policies require attention:
+The following rules require attention:
 
 ## Source/Test Pairing
 src/auth/login.py → tests/auth/login_test.py
@@ -313,8 +313,8 @@ api/users.py → docs/api/users.md
 Source files changed. Verify README.md is accurate.
 ```
 
-**2. Grouped by Policy Name**
-Multiple violations of the same policy are grouped together under a single heading, keeping output compact.
+**2. Grouped by Rule Name**
+Multiple violations of the same rule are grouped together under a single heading, keeping output compact.
 
 **3. Minimal Decoration**
 Avoid excessive formatting, numbering, or emphasis. Use simple arrow notation for correspondence violations.
@@ -325,13 +325,13 @@ Avoid excessive formatting, numbering, or emphasis. Use simple arrow notation fo
 
 ```
 .deepwork/
-├── policies/                # Policy definitions (frontmatter markdown)
+├── rules/                   # Rule definitions (frontmatter markdown)
 │   ├── readme-accuracy.md
 │   ├── source-test-pairing.md
 │   ├── api-documentation.md
 │   └── python-formatting.md
 ├── tmp/                     # GITIGNORED - transient state
-│   └── policy/
+│   └── rules/
 │       ├── queue/           # Queue entries
 │       │   ├── abc123.queued.json
 │       │   └── def456.passed.json
@@ -339,14 +339,14 @@ Avoid excessive formatting, numbering, or emphasis. Use simple arrow notation fo
 │       │   └── prompt_1705420800.json
 │       └── cache/           # Pattern matching cache
 │           └── patterns.json
-└── policy_state.json        # Session state summary
+└── rules_state.json         # Session state summary
 ```
 
 **Important:** The entire `.deepwork/tmp/` directory is gitignored. All queue entries, baselines, and caches are local transient state that is not committed. This means cleanup is not critical—files can accumulate and will be naturally cleaned when the directory is deleted or the repo is re-cloned.
 
-### Policy File Format
+### Rule File Format
 
-Each policy is a markdown file with YAML frontmatter:
+Each rule is a markdown file with YAML frontmatter:
 
 ```markdown
 ---
@@ -354,14 +354,14 @@ name: README Accuracy
 trigger: src/**/*.py
 safety: README.md
 ---
-Instructions shown to the agent when this policy fires.
+Instructions shown to the agent when this rule fires.
 
 These can be multi-line with full markdown formatting.
 ```
 
 This format enables:
-1. Code files to reference policies in comments
-2. Human-readable policy documentation
+1. Code files to reference rules in comments
+2. Human-readable rule documentation
 3. Easy editing with any markdown editor
 4. Clear separation of configuration and content
 
@@ -402,10 +402,10 @@ Terminal states persist in `.deepwork/tmp/` (gitignored) until manually cleared
 
 ### Pattern Errors
 
-Invalid patterns are caught at policy load time:
+Invalid patterns are caught at rule load time:
 
 ```python
-class PatternError(PolicyError):
+class PatternError(RulesError):
     """Invalid pattern syntax."""
     pass
 
@@ -442,25 +442,25 @@ If queue entries become corrupted:
 
 ## Configuration
 
-### Policy Files
+### Rule Files
 
-Policies are stored in `.deepwork/policies/` as individual markdown files with YAML frontmatter. See `doc/policy_syntax.md` for complete syntax documentation.
+Rules are stored in `.deepwork/rules/` as individual markdown files with YAML frontmatter. See `doc/rules_syntax.md` for complete syntax documentation.
 
 **Loading Order:**
-1. All `.md` files in `.deepwork/policies/` are loaded
+1. All `.md` files in `.deepwork/rules/` are loaded
 2. Files are processed in alphabetical order
-3. Filename (without extension) becomes policy identifier
+3. Filename (without extension) becomes rule identifier
 
-**Policy Discovery:**
+**Rule Discovery:**
 ```python
-def load_policies(policies_dir: Path) -> list[Policy]:
-    """Load all policies from the policies directory."""
-    policies = []
-    for path in sorted(policies_dir.glob("*.md")):
-        policy = parse_policy_file(path)
-        policy.name = path.stem  # filename without .md
-        policies.append(policy)
-    return policies
+def load_rules(rules_dir: Path) -> list[Rule]:
+    """Load all rules from the rules directory."""
+    rules = []
+    for path in sorted(rules_dir.glob("*.md")):
+        rule = parse_rule_file(path)
+        rule.name = path.stem  # filename without .md
+        rules.append(rule)
+    return rules
 ```
 
 ### System Configuration
@@ -468,9 +468,9 @@ def load_policies(policies_dir: Path) -> list[Policy]:
 In `.deepwork/config.yml`:
 
 ```yaml
-policy:
+rules:
   enabled: true
-  policies_dir: .deepwork/policies  # Can be customized
+  rules_dir: .deepwork/rules  # Can be customized
 ```
 
 ## Performance Considerations
@@ -484,30 +484,30 @@ policy:
 ### Lazy Evaluation
 
 - Patterns only compiled when needed
-- File lists only computed for triggered policies
-- Instructions only loaded when policy fires
+- File lists only computed for triggered rules
+- Instructions only loaded when rule fires
 
 ### Parallel Processing
 
 - Multiple queue entries can be processed in parallel
 - Command actions can run concurrently (with file locking)
-- Pattern matching is parallelized across policies
+- Pattern matching is parallelized across rules
 
 ## Migration from Legacy System
 
-The legacy system used a single `.deepwork.policy.yml` file with array of policies. The new system uses individual markdown files in `.deepwork/policies/`.
+The legacy system used a single `.deepwork.rules.yml` file containing an array of rules. The new system uses individual markdown files in `.deepwork/rules/`.
 
 **Breaking Changes:**
 - Single YAML file replaced with folder of markdown files
-- Policy `name` field replaced with filename
+- Rule `name` field replaced with filename
 - `instructions` / `instructions_file` replaced with markdown body
 - New features: sets, pairs, commands, queue-based state
 
-**No backwards compatibility is provided.** Existing `.deepwork.policy.yml` files must be converted manually.
+**No backwards compatibility is provided.** Existing `.deepwork.rules.yml` files must be converted manually.
 
 **Conversion Example:**
 
-Old format (`.deepwork.policy.yml`):
+Old format (`.deepwork.rules.yml`):
 ```yaml
 - name: "README Accuracy"
   trigger: "src/**/*"
@@ -516,7 +516,7 @@ Old format (`.deepwork.policy.yml`):
     Please verify README.md is accurate.
 ```
 
-New format (`.deepwork/policies/readme-accuracy.md`):
+New format (`.deepwork/rules/readme-accuracy.md`):
 ```markdown
 ---
 trigger: src/**/*
@@ -542,6 +542,6 @@ Please verify README.md is accurate.
 
 ### Input Validation
 
-- All policy files validated against schema
+- All rule files validated against schema
 - Pattern variables sanitized before use
 - File paths normalized and validated
diff --git a/doc/test_scenarios.md b/doc/test_scenarios.md
index 9ef03c0a..137120c1 100644
--- a/doc/test_scenarios.md
+++ b/doc/test_scenarios.md
@@ -1,6 +1,6 @@
-# Policy System Test Scenarios
+# Rules System Test Scenarios
 
-This document describes test scenarios for validating the policy system implementation.
+This document describes test scenarios for validating the rules system implementation.
 
 ## 1. Pattern Matching
 
@@ -44,19 +44,19 @@ This document describes test scenarios for validating the policy system implemen
 | PM-1.3.2 | Nested path | `tests/{path}_test.py` | `{path: "a/b/c"}` | `tests/a/b/c_test.py` |
 | PM-1.3.3 | Multiple vars | `{dir}/test_{name}.py` | `{dir: "tests", name: "foo"}` | `tests/test_foo.py` |
 
-## 2. Instruction Policies
+## 2. Instruction Rules
 
 ### 2.1 Basic Trigger/Safety
 
 | ID | Scenario | Changed Files | Trigger | Safety | Expected |
 |----|----------|---------------|---------|--------|----------|
-| IP-2.1.1 | Trigger match, no safety | `["src/main.py"]` | `src/**/*.py` | None | Fire |
-| IP-2.1.2 | Trigger match, safety match | `["src/main.py", "README.md"]` | `src/**/*.py` | `README.md` | No fire |
-| IP-2.1.3 | Trigger no match | `["docs/readme.md"]` | `src/**/*.py` | None | No fire |
-| IP-2.1.4 | Multiple triggers, one match | `["lib/utils.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
-| IP-2.1.5 | Safety match only | `["README.md"]` | `src/**/*.py` | `README.md` | No fire |
-| IP-2.1.6 | Multiple safety, one match | `["src/main.py", "CHANGELOG.md"]` | `src/**/*.py` | `["README.md", "CHANGELOG.md"]` | No fire |
-| IP-2.1.7 | Multiple triggers, multiple files | `["src/a.py", "lib/b.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
+| IR-2.1.1 | Trigger match, no safety | `["src/main.py"]` | `src/**/*.py` | None | Fire |
+| IR-2.1.2 | Trigger match, safety match | `["src/main.py", "README.md"]` | `src/**/*.py` | `README.md` | No fire |
+| IR-2.1.3 | Trigger no match | `["docs/readme.md"]` | `src/**/*.py` | None | No fire |
+| IR-2.1.4 | Multiple triggers, one match | `["lib/utils.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
+| IR-2.1.5 | Safety match only | `["README.md"]` | `src/**/*.py` | `README.md` | No fire |
+| IR-2.1.6 | Multiple safety, one match | `["src/main.py", "CHANGELOG.md"]` | `src/**/*.py` | `["README.md", "CHANGELOG.md"]` | No fire |
+| IR-2.1.7 | Multiple triggers, multiple files | `["src/a.py", "lib/b.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
 
 ### 2.2 Compare Modes
 
@@ -70,23 +70,23 @@ Setup: Branch diverged 3 commits ago from main
 
 | ID | Scenario | compare_to | Expected Changed Files |
 |----|----------|------------|----------------------|
-| IP-2.2.1 | Base comparison | `base` | `["src/feature.py", "tests/feature_test.py", "src/utils.py"]` |
-| IP-2.2.2 | Default tip (main ahead 1) | `default_tip` | All base + main's changes |
-| IP-2.2.3 | Prompt baseline (captured after commit 2) | `prompt` | `["tests/feature_test.py", "src/utils.py"]` |
+| IR-2.2.1 | Base comparison | `base` | `["src/feature.py", "tests/feature_test.py", "src/utils.py"]` |
+| IR-2.2.2 | Default tip (main ahead 1) | `default_tip` | All base + main's changes |
+| IR-2.2.3 | Prompt baseline (captured after commit 2) | `prompt` | `["tests/feature_test.py", "src/utils.py"]` |
 
 ### 2.3 Promise Tags
 
-Promise tags use the policy's `name` field (not filename) with a checkmark prefix for human readability.
+Promise tags use the rule's `name` field (not the filename). A checkmark prefix is optional and is ignored when matching.
 
-| ID | Scenario | Conversation Contains | Policy `name` | Expected |
+| ID | Scenario | Conversation Contains | Rule `name` | Expected |
 |----|----------|----------------------|---------------|----------|
-| IP-2.3.1 | Standard promise | `<promise>✓ README Accuracy</promise>` | `README Accuracy` | Suppressed |
-| IP-2.3.2 | Without checkmark | `<promise>README Accuracy</promise>` | `README Accuracy` | Suppressed |
-| IP-2.3.3 | Case insensitive | `<promise>✓ readme accuracy</promise>` | `README Accuracy` | Suppressed |
-| IP-2.3.4 | Whitespace | `<promise>  ✓ README Accuracy  </promise>` | `README Accuracy` | Suppressed |
-| IP-2.3.5 | No promise | (none) | `README Accuracy` | Not suppressed |
-| IP-2.3.6 | Wrong promise | `<promise>✓ Other Policy</promise>` | `README Accuracy` | Not suppressed |
-| IP-2.3.7 | Multiple promises | `<promise>✓ A</promise><promise>✓ B</promise>` | `A` | Suppressed |
+| IR-2.3.1 | Standard promise | `<promise>✓ README Accuracy</promise>` | `README Accuracy` | Suppressed |
+| IR-2.3.2 | Without checkmark | `<promise>README Accuracy</promise>` | `README Accuracy` | Suppressed |
+| IR-2.3.3 | Case insensitive | `<promise>✓ readme accuracy</promise>` | `README Accuracy` | Suppressed |
+| IR-2.3.4 | Whitespace | `<promise>  ✓ README Accuracy  </promise>` | `README Accuracy` | Suppressed |
+| IR-2.3.5 | No promise | (none) | `README Accuracy` | Not suppressed |
+| IR-2.3.6 | Wrong promise | `<promise>✓ Other Rule</promise>` | `README Accuracy` | Not suppressed |
+| IR-2.3.7 | Multiple promises | `<promise>✓ A</promise><promise>✓ B</promise>` | `A` | Suppressed |
 
 ## 3. Correspondence Sets
 
@@ -168,7 +168,7 @@ pair:
 | CP-4.2.3 | Only trigger | `["api/users.py"]` | Fire (missing both) |
 | CP-4.2.4 | Both expects only | `["docs/api/users.md", "openapi/users.yaml"]` | No fire |
 
-## 5. Command Policies
+## 5. Command Rules
 
 ### 5.1 Basic Commands
 
@@ -224,12 +224,12 @@ action:
 
 ### 6.2 Hash Calculation
 
-| ID | Scenario | Policy | Files | Baseline | Expected Hash Differs? |
+| ID | Scenario | Rule | Files | Baseline | Expected Hash Differs? |
 |----|----------|--------|-------|----------|------------------------|
-| QS-6.2.1 | Same everything | PolicyA | `[a.py]` | commit1 | Same hash |
-| QS-6.2.2 | Different files | PolicyA | `[a.py]` vs `[b.py]` | commit1 | Different |
-| QS-6.2.3 | Different baseline | PolicyA | `[a.py]` | commit1 vs commit2 | Different |
-| QS-6.2.4 | Different policy | PolicyA vs PolicyB | `[a.py]` | commit1 | Different |
+| QS-6.2.1 | Same everything | RuleA | `[a.py]` | commit1 | Same hash |
+| QS-6.2.2 | Different files | RuleA | `[a.py]` vs `[b.py]` | commit1 | Different |
+| QS-6.2.3 | Different baseline | RuleA | `[a.py]` | commit1 vs commit2 | Different |
+| QS-6.2.4 | Different rule | RuleA vs RuleB | `[a.py]` | commit1 | Different |
 
 ### 6.3 Queue Cleanup
 
@@ -253,20 +253,20 @@ action:
 
 ### 7.1 Output Batching
 
-| ID | Scenario | Triggered Policies | Expected Output |
-|----|----------|-------------------|-----------------|
-| OM-7.1.1 | Single policy | 1 | Full instructions |
-| OM-7.1.2 | Two policies | 2 | Both, grouped |
-| OM-7.1.3 | Many policies | 10 | Batched by policy name |
-| OM-7.1.4 | Same policy multiple files | 3 Source/Test pairs | Grouped under single heading |
+| ID | Scenario | Triggered Rules | Expected Output |
+|----|----------|-----------------|-----------------|
+| OM-7.1.1 | Single rule | 1 | Full instructions |
+| OM-7.1.2 | Two rules | 2 | Both, grouped |
+| OM-7.1.3 | Many rules | 10 | Batched by rule name |
+| OM-7.1.4 | Same rule multiple files | 3 Source/Test pairs | Grouped under single heading |
 
 ### 7.2 Output Format
 
 | ID | Scenario | Input | Expected Format |
 |----|----------|-------|-----------------|
 | OM-7.2.1 | Correspondence violation | `src/foo.py` missing `tests/foo_test.py` | `src/foo.py → tests/foo_test.py` |
-| OM-7.2.2 | Multiple same policy | 3 correspondence violations | Single heading, 3 lines |
-| OM-7.2.3 | Instruction policy | Source files changed | Short summary + instructions |
+| OM-7.2.2 | Multiple same rule | 3 correspondence violations | Single heading, 3 lines |
+| OM-7.2.3 | Instruction rule | Source files changed | Short summary + instructions |
 
 ## 8. Schema Validation
 
@@ -276,7 +276,7 @@ action:
 |----|----------|---------------|----------------|
 | SV-8.1.1 | Missing name | `name` | "required field 'name'" |
 | SV-8.1.2 | Missing detection mode | no `trigger`, `set`, or `pair` | "must have 'trigger', 'set', or 'pair'" |
-| SV-8.1.3 | Missing markdown body | empty body (prompt action) | "instruction policies require markdown body" |
+| SV-8.1.3 | Missing markdown body | empty body (prompt action) | "instruction rules require markdown body" |
 | SV-8.1.4 | Missing set patterns | `set` is empty | "set requires at least 2 patterns" |
 
 ### 8.2 Mutually Exclusive Fields
@@ -305,26 +305,26 @@ action:
 
 ## 9. Integration Tests
 
-### 9.1 End-to-End Instruction Policy
+### 9.1 End-to-End Instruction Rule
 
 ```
-Given: Policy requiring tests for source changes
+Given: Rule requiring tests for source changes
 When: User modifies src/auth/login.py without test
 Then:
   1. Stop hook fires
   2. Detector creates queue entry
   3. Evaluator returns instructions
-  4. Agent sees policy message
+  4. Agent sees rule message
   5. Agent adds tests
   6. Agent includes promise tag
   7. Next stop: queue entry marked passed
   8. Agent can stop successfully
 ```
 
-### 9.2 End-to-End Command Policy
+### 9.2 End-to-End Command Rule
 
 ```
-Given: Auto-format policy for Python files
+Given: Auto-format rule for Python files
 When: User creates unformatted src/new_file.py
 Then:
   1. Stop hook fires
@@ -339,7 +339,7 @@ Then:
 ### 9.3 End-to-End Correspondence Set
 
 ```
-Given: Source/test pairing policy
+Given: Source/test pairing rule
 When: User modifies src/utils.py only
 Then:
   1. Detector matches src/utils.py to pattern
@@ -350,33 +350,33 @@ Then:
   6. Agent sees "expected tests/utils_test.py to change"
 ```
 
-### 9.4 Multiple Policies Same File
+### 9.4 Multiple Rules Same File
 
 ```
 Given:
-  - Policy A: "Format Python" (command)
-  - Policy B: "Test Coverage" (set)
-  - Policy C: "README Accuracy" (instruction)
+  - Rule A: "Format Python" (command)
+  - Rule B: "Test Coverage" (set)
+  - Rule C: "README Accuracy" (instruction)
 When: User modifies src/main.py
 Then:
-  1. All three policies trigger
-  2. Command policy runs first
-  3. Set policy checks for test
-  4. Instruction policy prepares message
+  1. All three rules trigger
+  2. Command rule runs first
+  3. Set rule checks for test
+  4. Instruction rule prepares message
   5. Agent sees batched output with all requirements
 ```
 
-### 9.5 Safety Pattern Across Policies
+### 9.5 Safety Pattern Across Rules
 
 ```
 Given:
-  - Policy A: trigger=src/**/*.py, safety=CHANGELOG.md
-  - Policy B: trigger=src/**/*.py, safety=README.md
+  - Rule A: trigger=src/**/*.py, safety=CHANGELOG.md
+  - Rule B: trigger=src/**/*.py, safety=README.md
 When: User modifies src/main.py and CHANGELOG.md
 Then:
-  1. Policy A: safety match, skipped
-  2. Policy B: no safety match, fires
-  3. Only Policy B instructions shown
+  1. Rule A: safety match, skipped
+  2. Rule B: no safety match, fires
+  3. Only Rule B instructions shown
 ```
 
 ## 10. Performance Tests
@@ -387,7 +387,7 @@ Then:
 |----|----------|------------|----------|
 | PT-10.1.1 | Many changed files | 100 | < 1s evaluation |
 | PT-10.1.2 | Very many files | 1000 | < 5s evaluation |
-| PT-10.1.3 | Pattern-heavy | 50 policies, 100 files | < 2s evaluation |
+| PT-10.1.3 | Pattern-heavy | 50 rules, 100 files | < 2s evaluation |
 
 ### 10.2 Queue Size
 
@@ -407,11 +407,11 @@ Then:
 
 ## Test Data Fixtures
 
-### Sample Policy Files
+### Sample Rule Files
 
-Policies are stored as individual markdown files in `.deepwork/policies/`:
+Rules are stored as individual markdown files in `.deepwork/rules/`:
 
-**`.deepwork/policies/readme-accuracy.md`**
+**`.deepwork/rules/readme-accuracy.md`**
 ```markdown
 ---
 name: README Accuracy
@@ -421,7 +421,7 @@ safety: README.md
 Please review README.md for accuracy.
 ```
 
-**`.deepwork/policies/source-test-pairing.md`**
+**`.deepwork/rules/source-test-pairing.md`**
 ```markdown
 ---
 name: Source/Test Pairing
@@ -432,7 +432,7 @@ set:
 Source and test should change together.
 ```
 
-**`.deepwork/policies/api-documentation.md`**
+**`.deepwork/rules/api-documentation.md`**
 ```markdown
 ---
 name: API Documentation
@@ -443,7 +443,7 @@ pair:
 API changes need documentation.
 ```
 
-**`.deepwork/policies/python-formatting.md`**
+**`.deepwork/rules/python-formatting.md`**
 ```markdown
 ---
 name: Python Formatting
@@ -459,8 +459,8 @@ Auto-formats Python files with Black.
 
 ```json
 {
-  "policy_name": "Source/Test Pairing",
-  "policy_file": "source-test-pairing.md",
+  "rule_name": "Source/Test Pairing",
+  "rule_file": "source-test-pairing.md",
   "trigger_hash": "abc123def456",
   "status": "queued",
   "created_at": "2024-01-16T10:00:00Z",
@@ -477,13 +477,13 @@ Auto-formats Python files with Black.
 
 ```
 .deepwork/
-├── policies/
+├── rules/
 │   ├── readme-accuracy.md
 │   ├── source-test-pairing.md
 │   ├── api-documentation.md
 │   └── python-formatting.md
 └── tmp/                         # GITIGNORED
-    └── policy/
+    └── rules/
         └── queue/
             └── (queue entries created during tests)
 ```
diff --git a/src/deepwork/cli/install.py b/src/deepwork/cli/install.py
index d65c5a4e..ce7608b1 100644
--- a/src/deepwork/cli/install.py
+++ b/src/deepwork/cli/install.py
@@ -73,9 +73,9 @@ def _inject_deepwork_jobs(jobs_dir: Path, project_path: Path) -> None:
     _inject_standard_job("deepwork_jobs", jobs_dir, project_path)
 
 
-def _inject_deepwork_policy(jobs_dir: Path, project_path: Path) -> None:
+def _inject_deepwork_rules(jobs_dir: Path, project_path: Path) -> None:
     """
-    Inject the deepwork_policy job definition into the project.
+    Inject the deepwork_rules job definition into the project.
 
     Args:
         jobs_dir: Path to .deepwork/jobs directory
@@ -84,7 +84,7 @@ def _inject_deepwork_policy(jobs_dir: Path, project_path: Path) -> None:
     Raises:
         InstallError: If injection fails
     """
-    _inject_standard_job("deepwork_policy", jobs_dir, project_path)
+    _inject_standard_job("deepwork_rules", jobs_dir, project_path)
 
 
 def _create_deepwork_gitignore(deepwork_dir: Path) -> None:
@@ -98,7 +98,7 @@ def _create_deepwork_gitignore(deepwork_dir: Path) -> None:
     """
     gitignore_path = deepwork_dir / ".gitignore"
     gitignore_content = """# DeepWork temporary files
-# These files are used for policy evaluation during sessions
+# These files are used for rules evaluation during sessions
 .last_work_tree
 """
 
@@ -113,9 +113,9 @@ def _create_deepwork_gitignore(deepwork_dir: Path) -> None:
         gitignore_path.write_text(gitignore_content)
 
 
-def _create_default_policy_file(project_path: Path) -> bool:
+def _create_default_rules_file(project_path: Path) -> bool:
     """
-    Create a default policy file template in the project root.
+    Create a default rules file template in the project root.
 
     Only creates the file if it doesn't already exist.
 
@@ -125,26 +125,26 @@ def _create_default_policy_file(project_path: Path) -> bool:
     Returns:
         True if the file was created, False if it already existed
     """
-    policy_file = project_path / ".deepwork.policy.yml"
+    rules_file = project_path / ".deepwork.rules.yml"
 
-    if policy_file.exists():
+    if rules_file.exists():
         return False
 
     # Copy the template from the templates directory
-    template_path = Path(__file__).parent.parent / "templates" / "default_policy.yml"
+    template_path = Path(__file__).parent.parent / "templates" / "default_rules.yml"
 
     if template_path.exists():
-        shutil.copy(template_path, policy_file)
+        shutil.copy(template_path, rules_file)
     else:
         # Fallback: create a minimal template inline
-        policy_file.write_text(
-            """# DeepWork Policy Configuration
+        rules_file.write_text(
+            """# DeepWork Rules Configuration
 #
-# Policies are automated guardrails that trigger when specific files change.
-# Use /deepwork_policy.define to create new policies interactively.
+# Rules are automated guardrails that trigger when specific files change.
+# Use /deepwork_rules.define to create new rules interactively.
 #
 # Format:
-#   - name: "Policy name"
+#   - name: "Rule name"
 #     trigger: "glob/pattern/**/*"
 #     safety: "optional/pattern/**/*"
 #     instructions: |
@@ -271,17 +271,17 @@ def _install_deepwork(platform_name: str | None, project_path: Path) -> None:
     # Step 3b: Inject standard jobs (core job definitions)
     console.print("[yellow]→[/yellow] Installing core job definitions...")
     _inject_deepwork_jobs(jobs_dir, project_path)
-    _inject_deepwork_policy(jobs_dir, project_path)
+    _inject_deepwork_rules(jobs_dir, project_path)
 
     # Step 3c: Create .gitignore for temporary files
     _create_deepwork_gitignore(deepwork_dir)
     console.print("  [green]✓[/green] Created .deepwork/.gitignore")
 
-    # Step 3d: Create default policy file template
-    if _create_default_policy_file(project_path):
-        console.print("  [green]✓[/green] Created .deepwork.policy.yml template")
+    # Step 3d: Create default rules file template
+    if _create_default_rules_file(project_path):
+        console.print("  [green]✓[/green] Created .deepwork.rules.yml template")
     else:
-        console.print("  [dim]•[/dim] .deepwork.policy.yml already exists")
+        console.print("  [dim]•[/dim] .deepwork.rules.yml already exists")
 
     # Step 4: Load or create config.yml
     console.print("[yellow]→[/yellow] Updating configuration...")
diff --git a/src/deepwork/core/command_executor.py b/src/deepwork/core/command_executor.py
index 7db8ee2a..9db456ca 100644
--- a/src/deepwork/core/command_executor.py
+++ b/src/deepwork/core/command_executor.py
@@ -1,10 +1,10 @@
-"""Execute command actions for policies."""
+"""Execute command actions for rules."""
 
 import subprocess
 from dataclasses import dataclass
 from pathlib import Path
 
-from deepwork.core.policy_parser import CommandAction
+from deepwork.core.rules_parser import CommandAction
 
 
 @dataclass
@@ -118,7 +118,7 @@ def run_command_action(
 
     Args:
         action: CommandAction configuration
-        trigger_files: Files that triggered the policy
+        trigger_files: Files that triggered the rule
         repo_root: Repository root path
 
     Returns:
diff --git a/src/deepwork/core/pattern_matcher.py b/src/deepwork/core/pattern_matcher.py
index 215b1d9a..9d80549b 100644
--- a/src/deepwork/core/pattern_matcher.py
+++ b/src/deepwork/core/pattern_matcher.py
@@ -1,4 +1,4 @@
-"""Pattern matching with variable extraction for policy file correspondence."""
+"""Pattern matching with variable extraction for rule file correspondence."""
 
 import re
 from dataclasses import dataclass
diff --git a/src/deepwork/core/policy_parser.py b/src/deepwork/core/rules_parser.py
similarity index 68%
rename from src/deepwork/core/policy_parser.py
rename to src/deepwork/core/rules_parser.py
index 21726079..270d1ba2 100644
--- a/src/deepwork/core/policy_parser.py
+++ b/src/deepwork/core/rules_parser.py
@@ -1,4 +1,4 @@
-"""Policy definition parser (v2 - frontmatter markdown format)."""
+"""Rule definition parser (v2 - frontmatter markdown format)."""
 
 from dataclasses import dataclass, field
 from enum import Enum
@@ -13,18 +13,18 @@
     matches_any_pattern,
     resolve_pattern,
 )
-from deepwork.schemas.policy_schema import POLICY_FRONTMATTER_SCHEMA
+from deepwork.schemas.rules_schema import RULES_FRONTMATTER_SCHEMA
 from deepwork.utils.validation import ValidationError, validate_against_schema
 
 
-class PolicyParseError(Exception):
-    """Exception raised for policy parsing errors."""
+class RulesParseError(Exception):
+    """Exception raised for rule parsing errors."""
 
     pass
 
 
 class DetectionMode(Enum):
-    """How the policy detects when to fire."""
+    """How the rule detects when to fire."""
 
     TRIGGER_SAFETY = "trigger_safety"  # Fire when trigger matches, safety doesn't
     SET = "set"  # Bidirectional file correspondence
@@ -32,7 +32,7 @@ class DetectionMode(Enum):
 
 
 class ActionType(Enum):
-    """What happens when the policy fires."""
+    """What happens when the rule fires."""
 
     PROMPT = "prompt"  # Show instructions to agent (default)
     COMMAND = "command"  # Run an idempotent command
@@ -60,8 +60,8 @@ class PairConfig:
 
 
 @dataclass
-class Policy:
-    """Represents a single policy definition (v2 format)."""
+class Rule:
+    """Represents a single rule definition (v2 format)."""
 
     # Identity
     name: str  # Human-friendly name (displayed in promise tags)
@@ -88,9 +88,9 @@ def from_frontmatter(
         frontmatter: dict[str, Any],
         markdown_body: str,
         filename: str,
-    ) -> "Policy":
+    ) -> "Rule":
         """
-        Create Policy from parsed frontmatter and markdown body.
+        Create Rule from parsed frontmatter and markdown body.
 
         Args:
             frontmatter: Parsed YAML frontmatter
@@ -98,15 +98,15 @@ def from_frontmatter(
             filename: Filename without .md extension
 
         Returns:
-            Policy instance
+            Rule instance
 
         Raises:
-            PolicyParseError: If validation fails
+            RulesParseError: If validation fails
         """
         # Get name (required)
         name = frontmatter.get("name", "")
         if not name:
-            raise PolicyParseError(f"Policy '{filename}' missing required 'name' field")
+            raise RulesParseError(f"Rule '{filename}' missing required 'name' field")
 
         # Determine detection mode
         has_trigger = "trigger" in frontmatter
@@ -115,9 +115,9 @@ def from_frontmatter(
 
         mode_count = sum([has_trigger, has_set, has_pair])
         if mode_count == 0:
-            raise PolicyParseError(f"Policy '{name}' must have 'trigger', 'set', or 'pair'")
+            raise RulesParseError(f"Rule '{name}' must have 'trigger', 'set', or 'pair'")
         if mode_count > 1:
-            raise PolicyParseError(f"Policy '{name}' has multiple detection modes - use only one")
+            raise RulesParseError(f"Rule '{name}' has multiple detection modes - use only one")
 
         # Parse based on detection mode
         detection_mode: DetectionMode
@@ -137,7 +137,7 @@ def from_frontmatter(
             detection_mode = DetectionMode.SET
             set_patterns = list(frontmatter["set"])
             if len(set_patterns) < 2:
-                raise PolicyParseError(f"Policy '{name}' set requires at least 2 patterns")
+                raise RulesParseError(f"Rule '{name}' set requires at least 2 patterns")
 
         elif has_pair:
             detection_mode = DetectionMode.PAIR
@@ -164,7 +164,7 @@ def from_frontmatter(
             action_type = ActionType.PROMPT
             # Markdown body is the instructions
             if not markdown_body.strip():
-                raise PolicyParseError(f"Policy '{name}' with prompt action requires markdown body")
+                raise RulesParseError(f"Rule '{name}' with prompt action requires markdown body")
 
         # Get compare_to
         compare_to = frontmatter.get("compare_to", DEFAULT_COMPARE_TO)
@@ -195,24 +195,24 @@ def parse_frontmatter_file(filepath: Path) -> tuple[dict[str, Any], str]:
         Tuple of (frontmatter_dict, markdown_body)
 
     Raises:
-        PolicyParseError: If parsing fails
+        RulesParseError: If parsing fails
     """
     try:
         content = filepath.read_text(encoding="utf-8")
     except OSError as e:
-        raise PolicyParseError(f"Failed to read policy file: {e}") from e
+        raise RulesParseError(f"Failed to read rule file: {e}") from e
 
     # Split frontmatter from body
     if not content.startswith("---"):
-        raise PolicyParseError(
-            f"Policy file '{filepath.name}' must start with '---' frontmatter delimiter"
+        raise RulesParseError(
+            f"Rule file '{filepath.name}' must start with '---' frontmatter delimiter"
         )
 
     # Find end of frontmatter
     end_marker = content.find("\n---", 3)
     if end_marker == -1:
-        raise PolicyParseError(
-            f"Policy file '{filepath.name}' missing closing '---' frontmatter delimiter"
+        raise RulesParseError(
+            f"Rule file '{filepath.name}' missing closing '---' frontmatter delimiter"
         )
 
     frontmatter_str = content[4:end_marker]  # Skip initial "---\n"
@@ -222,76 +222,76 @@ def parse_frontmatter_file(filepath: Path) -> tuple[dict[str, Any], str]:
     try:
         frontmatter = yaml.safe_load(frontmatter_str)
     except yaml.YAMLError as e:
-        raise PolicyParseError(f"Invalid YAML frontmatter in '{filepath.name}': {e}") from e
+        raise RulesParseError(f"Invalid YAML frontmatter in '{filepath.name}': {e}") from e
 
     if frontmatter is None:
         frontmatter = {}
 
     if not isinstance(frontmatter, dict):
-        raise PolicyParseError(
+        raise RulesParseError(
             f"Frontmatter in '{filepath.name}' must be a mapping, got {type(frontmatter).__name__}"
         )
 
     return frontmatter, markdown_body
 
 
-def parse_policy_file_v2(filepath: Path) -> Policy:
+def parse_rule_file(filepath: Path) -> Rule:
     """
-    Parse a single policy from a frontmatter markdown file.
+    Parse a single rule from a frontmatter markdown file.
 
     Args:
-        filepath: Path to .md file in .deepwork/policies/
+        filepath: Path to .md file in .deepwork/rules/
 
     Returns:
-        Parsed Policy object
+        Parsed Rule object
 
     Raises:
-        PolicyParseError: If parsing or validation fails
+        RulesParseError: If parsing or validation fails
     """
     if not filepath.exists():
-        raise PolicyParseError(f"Policy file does not exist: {filepath}")
+        raise RulesParseError(f"Rule file does not exist: {filepath}")
 
     if not filepath.is_file():
-        raise PolicyParseError(f"Policy path is not a file: {filepath}")
+        raise RulesParseError(f"Rule path is not a file: {filepath}")
 
     frontmatter, markdown_body = parse_frontmatter_file(filepath)
 
     # Validate against schema
     try:
-        validate_against_schema(frontmatter, POLICY_FRONTMATTER_SCHEMA)
+        validate_against_schema(frontmatter, RULES_FRONTMATTER_SCHEMA)
     except ValidationError as e:
-        raise PolicyParseError(f"Policy '{filepath.name}' validation failed: {e}") from e
+        raise RulesParseError(f"Rule '{filepath.name}' validation failed: {e}") from e
 
-    # Create Policy object
+    # Create Rule object
     filename = filepath.stem  # filename without .md extension
-    return Policy.from_frontmatter(frontmatter, markdown_body, filename)
+    return Rule.from_frontmatter(frontmatter, markdown_body, filename)
 
 
-def load_policies_from_directory(policies_dir: Path) -> list[Policy]:
+def load_rules_from_directory(rules_dir: Path) -> list[Rule]:
     """
-    Load all policies from a directory.
+    Load all rules from a directory.
 
     Args:
-        policies_dir: Path to .deepwork/policies/ directory
+        rules_dir: Path to .deepwork/rules/ directory
 
     Returns:
-        List of parsed Policy objects (sorted by filename)
+        List of parsed Rule objects (sorted by filename)
 
     Raises:
-        PolicyParseError: If any policy file fails to parse
+        RulesParseError: If any rule file fails to parse
     """
-    if not policies_dir.exists():
+    if not rules_dir.exists():
         return []
 
-    if not policies_dir.is_dir():
-        raise PolicyParseError(f"Policies path is not a directory: {policies_dir}")
+    if not rules_dir.is_dir():
+        raise RulesParseError(f"Rules path is not a directory: {rules_dir}")
 
-    policies = []
-    for filepath in sorted(policies_dir.glob("*.md")):
-        policy = parse_policy_file_v2(filepath)
-        policies.append(policy)
+    rules = []
+    for filepath in sorted(rules_dir.glob("*.md")):
+        rule = parse_rule_file(filepath)
+        rules.append(rule)
 
-    return policies
+    return rules
 
 
 # =============================================================================
@@ -300,20 +300,20 @@ def load_policies_from_directory(policies_dir: Path) -> list[Policy]:
 
 
 def evaluate_trigger_safety(
-    policy: Policy,
+    rule: Rule,
     changed_files: list[str],
 ) -> bool:
     """
-    Evaluate a trigger/safety mode policy.
+    Evaluate a trigger/safety mode rule.
 
-    Returns True if policy should fire:
+    Returns True if rule should fire:
     - At least one changed file matches a trigger pattern
     - AND no changed file matches a safety pattern
     """
     # Check if any trigger matches
     trigger_matched = False
     for file_path in changed_files:
-        if matches_any_pattern(file_path, policy.triggers):
+        if matches_any_pattern(file_path, rule.triggers):
             trigger_matched = True
             break
 
@@ -321,20 +321,20 @@ def evaluate_trigger_safety(
         return False
 
     # Check if any safety pattern matches
-    if policy.safety:
+    if rule.safety:
         for file_path in changed_files:
-            if matches_any_pattern(file_path, policy.safety):
+            if matches_any_pattern(file_path, rule.safety):
                 return False
 
     return True
 
 
 def evaluate_set_correspondence(
-    policy: Policy,
+    rule: Rule,
     changed_files: list[str],
 ) -> tuple[bool, list[str], list[str]]:
     """
-    Evaluate a set (bidirectional correspondence) policy.
+    Evaluate a set (bidirectional correspondence) rule.
 
     Returns:
         Tuple of (should_fire, trigger_files, missing_files)
@@ -348,13 +348,13 @@ def evaluate_set_correspondence(
 
     for file_path in changed_files:
         # Check each pattern in the set
-        for pattern in policy.set_patterns:
+        for pattern in rule.set_patterns:
             result = match_pattern(pattern, file_path)
             if result.matched:
                 trigger_files.append(file_path)
 
                 # Check if all other corresponding files also changed
-                for other_pattern in policy.set_patterns:
+                for other_pattern in rule.set_patterns:
                     if other_pattern == pattern:
                         continue
 
@@ -369,17 +369,17 @@ def evaluate_set_correspondence(
 
                 break  # Only match one pattern per file
 
-    # Policy fires if there are trigger files with missing correspondences
+    # Rule fires if there are trigger files with missing correspondences
     should_fire = len(trigger_files) > 0 and len(missing_files) > 0
     return should_fire, trigger_files, missing_files
 
 
 def evaluate_pair_correspondence(
-    policy: Policy,
+    rule: Rule,
     changed_files: list[str],
 ) -> tuple[bool, list[str], list[str]]:
     """
-    Evaluate a pair (directional correspondence) policy.
+    Evaluate a pair (directional correspondence) rule.
 
     Only trigger-side changes require corresponding expected files.
     Expected-side changes alone do not trigger.
@@ -387,15 +387,15 @@ def evaluate_pair_correspondence(
     Returns:
         Tuple of (should_fire, trigger_files, missing_files)
     """
-    if policy.pair_config is None:
+    if rule.pair_config is None:
         return False, [], []
 
     trigger_files: list[str] = []
     missing_files: list[str] = []
     changed_set = set(changed_files)
 
-    trigger_pattern = policy.pair_config.trigger
-    expects_patterns = policy.pair_config.expects
+    trigger_pattern = rule.pair_config.trigger
+    expects_patterns = rule.pair_config.expects
 
     for file_path in changed_files:
         # Only check trigger pattern (directional)
@@ -419,94 +419,94 @@ def evaluate_pair_correspondence(
 
 
 @dataclass
-class PolicyEvaluationResult:
-    """Result of evaluating a single policy."""
+class RuleEvaluationResult:
+    """Result of evaluating a single rule."""
 
-    policy: Policy
+    rule: Rule
     should_fire: bool
     trigger_files: list[str] = field(default_factory=list)
     missing_files: list[str] = field(default_factory=list)  # For set/pair modes
 
 
-def evaluate_policy(policy: Policy, changed_files: list[str]) -> PolicyEvaluationResult:
+def evaluate_rule(rule: Rule, changed_files: list[str]) -> RuleEvaluationResult:
     """
-    Evaluate whether a policy should fire based on changed files.
+    Evaluate whether a rule should fire based on changed files.
 
     Args:
-        policy: Policy to evaluate
+        rule: Rule to evaluate
         changed_files: List of changed file paths (relative)
 
     Returns:
-        PolicyEvaluationResult with evaluation details
+        RuleEvaluationResult with evaluation details
     """
-    if policy.detection_mode == DetectionMode.TRIGGER_SAFETY:
-        should_fire = evaluate_trigger_safety(policy, changed_files)
+    if rule.detection_mode == DetectionMode.TRIGGER_SAFETY:
+        should_fire = evaluate_trigger_safety(rule, changed_files)
         trigger_files = (
-            [f for f in changed_files if matches_any_pattern(f, policy.triggers)]
+            [f for f in changed_files if matches_any_pattern(f, rule.triggers)]
             if should_fire
             else []
         )
-        return PolicyEvaluationResult(
-            policy=policy,
+        return RuleEvaluationResult(
+            rule=rule,
             should_fire=should_fire,
             trigger_files=trigger_files,
         )
 
-    elif policy.detection_mode == DetectionMode.SET:
+    elif rule.detection_mode == DetectionMode.SET:
         should_fire, trigger_files, missing_files = evaluate_set_correspondence(
-            policy, changed_files
+            rule, changed_files
         )
-        return PolicyEvaluationResult(
-            policy=policy,
+        return RuleEvaluationResult(
+            rule=rule,
             should_fire=should_fire,
             trigger_files=trigger_files,
             missing_files=missing_files,
         )
 
-    elif policy.detection_mode == DetectionMode.PAIR:
+    elif rule.detection_mode == DetectionMode.PAIR:
         should_fire, trigger_files, missing_files = evaluate_pair_correspondence(
-            policy, changed_files
+            rule, changed_files
         )
-        return PolicyEvaluationResult(
-            policy=policy,
+        return RuleEvaluationResult(
+            rule=rule,
             should_fire=should_fire,
             trigger_files=trigger_files,
             missing_files=missing_files,
         )
 
-    return PolicyEvaluationResult(policy=policy, should_fire=False)
+    return RuleEvaluationResult(rule=rule, should_fire=False)
 
 
-def evaluate_policies(
-    policies: list[Policy],
+def evaluate_rules(
+    rules: list[Rule],
     changed_files: list[str],
-    promised_policies: set[str] | None = None,
-) -> list[PolicyEvaluationResult]:
+    promised_rules: set[str] | None = None,
+) -> list[RuleEvaluationResult]:
     """
-    Evaluate which policies should fire.
+    Evaluate which rules should fire.
 
     Args:
-        policies: List of policies to evaluate
+        rules: List of rules to evaluate
         changed_files: List of changed file paths (relative)
-        promised_policies: Set of policy names that have been marked as addressed
+        promised_rules: Set of rule names that have been marked as addressed
                           via <promise> tags (case-insensitive)
 
     Returns:
-        List of PolicyEvaluationResult for policies that should fire
+        List of RuleEvaluationResult for rules that should fire
     """
-    if promised_policies is None:
-        promised_policies = set()
+    if promised_rules is None:
+        promised_rules = set()
 
     # Normalize promised names for case-insensitive comparison
-    promised_lower = {name.lower() for name in promised_policies}
+    promised_lower = {name.lower() for name in promised_rules}
 
     results = []
-    for policy in policies:
+    for rule in rules:
         # Skip if already promised/addressed (case-insensitive)
-        if policy.name.lower() in promised_lower:
+        if rule.name.lower() in promised_lower:
             continue
 
-        result = evaluate_policy(policy, changed_files)
+        result = evaluate_rule(rule, changed_files)
         if result.should_fire:
             results.append(result)
 
diff --git a/src/deepwork/core/policy_queue.py b/src/deepwork/core/rules_queue.py
similarity index 88%
rename from src/deepwork/core/policy_queue.py
rename to src/deepwork/core/rules_queue.py
index 44046832..8f6ec430 100644
--- a/src/deepwork/core/policy_queue.py
+++ b/src/deepwork/core/rules_queue.py
@@ -1,4 +1,4 @@
-"""Queue system for tracking policy state in .deepwork/tmp/policy/queue/."""
+"""Queue system for tracking rule state in .deepwork/tmp/rules/queue/."""
 
 import hashlib
 import json
@@ -13,14 +13,14 @@ class QueueEntryStatus(Enum):
     """Status of a queue entry."""
 
     QUEUED = "queued"  # Detected, awaiting evaluation
-    PASSED = "passed"  # Evaluated, policy satisfied (promise found or action succeeded)
-    FAILED = "failed"  # Evaluated, policy not satisfied
+    PASSED = "passed"  # Evaluated, rule satisfied (promise found or action succeeded)
+    FAILED = "failed"  # Evaluated, rule not satisfied
     SKIPPED = "skipped"  # Safety pattern matched, skipped
 
 
 @dataclass
 class ActionResult:
-    """Result of executing a policy action."""
+    """Result of executing a rule action."""
 
     type: str  # "prompt" or "command"
     output: str | None = None  # Command stdout or prompt message shown
@@ -29,11 +29,11 @@ class ActionResult:
 
 @dataclass
 class QueueEntry:
-    """A single entry in the policy queue."""
+    """A single entry in the rules queue."""
 
     # Identity
-    policy_name: str  # Human-friendly name
-    policy_file: str  # Filename (e.g., "source-test-pairing.md")
+    rule_name: str  # Human-friendly name
+    rule_file: str  # Filename (e.g., "source-test-pairing.md")
     trigger_hash: str  # Hash for deduplication
 
     # State
@@ -70,8 +70,8 @@ def from_dict(cls, data: dict[str, Any]) -> "QueueEntry":
             action_result = ActionResult(**data["action_result"])
 
         return cls(
-            policy_name=data["policy_name"],
-            policy_file=data["policy_file"],
+            rule_name=data.get("rule_name", data.get("policy_name", "")),
+            rule_file=data.get("rule_file", data.get("policy_file", "")),
             trigger_hash=data["trigger_hash"],
             status=QueueEntryStatus(data["status"]),
             created_at=data.get("created_at", ""),
@@ -85,7 +85,7 @@ def from_dict(cls, data: dict[str, Any]) -> "QueueEntry":
 
 
 def compute_trigger_hash(
-    policy_name: str,
+    rule_name: str,
     trigger_files: list[str],
     baseline_ref: str,
 ) -> str:
@@ -93,20 +93,20 @@ def compute_trigger_hash(
     Compute a hash for deduplication.
 
     The hash is based on:
-    - Policy name
+    - Rule name
     - Sorted list of trigger files
     - Baseline reference (commit hash or timestamp)
 
     Returns:
         12-character hex hash
     """
-    hash_input = f"{policy_name}:{sorted(trigger_files)}:{baseline_ref}"
+    hash_input = f"{rule_name}:{sorted(trigger_files)}:{baseline_ref}"
     return hashlib.sha256(hash_input.encode()).hexdigest()[:12]
 
 
-class PolicyQueue:
+class RulesQueue:
     """
-    Manages the policy queue in .deepwork/tmp/policy/queue/.
+    Manages the rules queue in .deepwork/tmp/rules/queue/.
 
     Queue entries are stored as JSON files named {hash}.{status}.json
     """
@@ -116,10 +116,10 @@ def __init__(self, queue_dir: Path | None = None):
         Initialize the queue.
 
         Args:
-            queue_dir: Path to queue directory. Defaults to .deepwork/tmp/policy/queue/
+            queue_dir: Path to queue directory. Defaults to .deepwork/tmp/rules/queue/
         """
         if queue_dir is None:
-            queue_dir = Path(".deepwork/tmp/policy/queue")
+            queue_dir = Path(".deepwork/tmp/rules/queue")
         self.queue_dir = queue_dir
 
     def _ensure_dir(self) -> None:
@@ -157,8 +157,8 @@ def get_entry(self, trigger_hash: str) -> QueueEntry | None:
 
     def create_entry(
         self,
-        policy_name: str,
-        policy_file: str,
+        rule_name: str,
+        rule_file: str,
         trigger_files: list[str],
         baseline_ref: str,
         expected_files: list[str] | None = None,
@@ -167,16 +167,16 @@ def create_entry(
         Create a new queue entry if one doesn't already exist.
 
         Args:
-            policy_name: Human-friendly policy name
-            policy_file: Policy filename (e.g., "source-test-pairing.md")
-            trigger_files: Files that triggered the policy
+            rule_name: Human-friendly rule name
+            rule_file: Rule filename (e.g., "source-test-pairing.md")
+            trigger_files: Files that triggered the rule
             baseline_ref: Baseline reference for change detection
             expected_files: Expected corresponding files (for set/pair)
 
         Returns:
             Created QueueEntry, or None if entry already exists
         """
-        trigger_hash = compute_trigger_hash(policy_name, trigger_files, baseline_ref)
+        trigger_hash = compute_trigger_hash(rule_name, trigger_files, baseline_ref)
 
         # Check if already exists
         if self.has_entry(trigger_hash):
@@ -185,8 +185,8 @@ def create_entry(
         self._ensure_dir()
 
         entry = QueueEntry(
-            policy_name=policy_name,
-            policy_file=policy_file,
+            rule_name=rule_name,
+            rule_file=rule_file,
             trigger_hash=trigger_hash,
             status=QueueEntryStatus.QUEUED,
             baseline_ref=baseline_ref,
diff --git a/src/deepwork/hooks/README.md b/src/deepwork/hooks/README.md
index 84914a10..9c3dd887 100644
--- a/src/deepwork/hooks/README.md
+++ b/src/deepwork/hooks/README.md
@@ -16,7 +16,7 @@ The hook system provides:
    - Cross-platform compatibility
 
 3. **Hook implementations**:
-   - `policy_check.py` - Evaluates DeepWork policies on `after_agent` events
+   - `rules_check.py` - Evaluates DeepWork rules on `after_agent` events
 
 ## Usage
 
@@ -32,7 +32,7 @@ The hook system provides:
         "hooks": [
           {
             "type": "command",
-            "command": "path/to/claude_hook.sh deepwork.hooks.policy_check"
+            "command": "path/to/claude_hook.sh deepwork.hooks.rules_check"
           }
         ]
       }
@@ -51,7 +51,7 @@ The hook system provides:
         "hooks": [
           {
             "type": "command",
-            "command": "path/to/gemini_hook.sh deepwork.hooks.policy_check"
+            "command": "path/to/gemini_hook.sh deepwork.hooks.rules_check"
           }
         ]
       }
@@ -178,4 +178,4 @@ pytest tests/shell_script_tests/test_hook_wrappers.py -v
 | `wrapper.py` | Cross-platform input/output normalization |
 | `claude_hook.sh` | Shell wrapper for Claude Code |
 | `gemini_hook.sh` | Shell wrapper for Gemini CLI |
-| `policy_check.py` | Cross-platform policy evaluation hook |
+| `rules_check.py` | Cross-platform rule evaluation hook |
diff --git a/src/deepwork/hooks/__init__.py b/src/deepwork/hooks/__init__.py
index 277080b6..c64dcfc4 100644
--- a/src/deepwork/hooks/__init__.py
+++ b/src/deepwork/hooks/__init__.py
@@ -1,4 +1,4 @@
-"""DeepWork hooks package for policy enforcement and lifecycle events.
+"""DeepWork hooks package for rules enforcement and lifecycle events.
 
 This package provides:
 
@@ -8,8 +8,7 @@
    - gemini_hook.sh: Shell wrapper for Gemini CLI hooks
 
 2. Hook implementations:
-   - policy_check.py: Evaluates policies on after_agent events
-   - evaluate_policies.py: Legacy policy evaluation (Claude-specific)
+   - rules_check.py: Evaluates rules on after_agent events
 
 Usage with wrapper system:
     # Register hook in .claude/settings.json:
@@ -18,7 +17,7 @@
         "Stop": [{
           "hooks": [{
             "type": "command",
-            "command": ".deepwork/hooks/claude_hook.sh deepwork.hooks.policy_check"
+            "command": ".deepwork/hooks/claude_hook.sh deepwork.hooks.rules_check"
           }]
         }]
       }
@@ -30,7 +29,7 @@
         "AfterAgent": [{
           "hooks": [{
             "type": "command",
-            "command": ".gemini/hooks/gemini_hook.sh deepwork.hooks.policy_check"
+            "command": ".gemini/hooks/gemini_hook.sh deepwork.hooks.rules_check"
           }]
         }]
       }
diff --git a/src/deepwork/hooks/claude_hook.sh b/src/deepwork/hooks/claude_hook.sh
index b9c4fd39..7e13ad44 100755
--- a/src/deepwork/hooks/claude_hook.sh
+++ b/src/deepwork/hooks/claude_hook.sh
@@ -9,7 +9,7 @@
 #   claude_hook.sh <python_hook_module>
 #
 # Example:
-#   claude_hook.sh deepwork.hooks.policy_check
+#   claude_hook.sh deepwork.hooks.rules_check
 #
 # The Python module should implement a main() function that:
 # 1. Calls deepwork.hooks.wrapper.run_hook() with a hook function
@@ -31,7 +31,7 @@ PYTHON_MODULE="${1:-}"
 
 if [ -z "${PYTHON_MODULE}" ]; then
     echo "Usage: claude_hook.sh <python_hook_module>" >&2
-    echo "Example: claude_hook.sh deepwork.hooks.policy_check" >&2
+    echo "Example: claude_hook.sh deepwork.hooks.rules_check" >&2
     exit 1
 fi
 
diff --git a/src/deepwork/hooks/gemini_hook.sh b/src/deepwork/hooks/gemini_hook.sh
index add66dfc..a2bb09da 100755
--- a/src/deepwork/hooks/gemini_hook.sh
+++ b/src/deepwork/hooks/gemini_hook.sh
@@ -9,7 +9,7 @@
 #   gemini_hook.sh <python_hook_module>
 #
 # Example:
-#   gemini_hook.sh deepwork.hooks.policy_check
+#   gemini_hook.sh deepwork.hooks.rules_check
 #
 # The Python module should implement a main() function that:
 # 1. Calls deepwork.hooks.wrapper.run_hook() with a hook function
@@ -31,7 +31,7 @@ PYTHON_MODULE="${1:-}"
 
 if [ -z "${PYTHON_MODULE}" ]; then
     echo "Usage: gemini_hook.sh <python_hook_module>" >&2
-    echo "Example: gemini_hook.sh deepwork.hooks.policy_check" >&2
+    echo "Example: gemini_hook.sh deepwork.hooks.rules_check" >&2
     exit 1
 fi
 
diff --git a/src/deepwork/hooks/policy_check.py b/src/deepwork/hooks/rules_check.py
similarity index 79%
rename from src/deepwork/hooks/policy_check.py
rename to src/deepwork/hooks/rules_check.py
index 4fb09141..121b9c5f 100644
--- a/src/deepwork/hooks/policy_check.py
+++ b/src/deepwork/hooks/rules_check.py
@@ -1,17 +1,17 @@
 """
-Policy check hook for DeepWork (v2).
+Rules check hook for DeepWork (v2).
 
-This hook evaluates policies when the agent finishes (after_agent event).
+This hook evaluates rules when the agent finishes (after_agent event).
 It uses the wrapper system for cross-platform compatibility.
 
-Policy files are loaded from .deepwork/policies/ directory as frontmatter markdown files.
+Rule files are loaded from the .deepwork/rules/ directory as frontmatter markdown files.
 
 Usage (via shell wrapper):
-    claude_hook.sh deepwork.hooks.policy_check
-    gemini_hook.sh deepwork.hooks.policy_check
+    claude_hook.sh deepwork.hooks.rules_check
+    gemini_hook.sh deepwork.hooks.rules_check
 
 Or directly with platform environment variable:
-    DEEPWORK_HOOK_PLATFORM=claude python -m deepwork.hooks.policy_check
+    DEEPWORK_HOOK_PLATFORM=claude python -m deepwork.hooks.rules_check
 """
 
 from __future__ import annotations
@@ -28,18 +28,18 @@
     format_command_errors,
     run_command_action,
 )
-from deepwork.core.policy_parser import (
+from deepwork.core.rules_parser import (
     ActionType,
     DetectionMode,
-    Policy,
-    PolicyEvaluationResult,
-    PolicyParseError,
-    evaluate_policies,
-    load_policies_from_directory,
+    Rule,
+    RuleEvaluationResult,
+    RulesParseError,
+    evaluate_rules,
+    load_rules_from_directory,
 )
-from deepwork.core.policy_queue import (
+from deepwork.core.rules_queue import (
     ActionResult,
-    PolicyQueue,
+    RulesQueue,
     QueueEntryStatus,
     compute_trigger_hash,
 )
@@ -240,14 +240,14 @@ def get_changed_files_for_mode(mode: str) -> list[str]:
 
 def extract_promise_tags(text: str) -> set[str]:
     """
-    Extract policy names from <promise> tags in text.
+    Extract rule names from <promise> tags in text.
 
     Supports both:
-    - <promise>✓ Policy Name</promise>
-    - <promise>Policy Name</promise>
+    - <promise>Rule Name</promise>
+    - <promise> Rule Name </promise> (surrounding whitespace is stripped)
     """
     # Match with or without checkmark
-    pattern = r"<promise>(?:✓\s*)?([^<]+)</promise>"
+    pattern = r"<promise>\s*([^<]+)</promise>"
     matches = re.findall(pattern, text, re.IGNORECASE | re.DOTALL)
     return {m.strip() for m in matches}
 
@@ -305,63 +305,63 @@ def extract_conversation_from_transcript(transcript_path: str, platform: Platfor
         return ""
 
 
-def format_policy_message(results: list[PolicyEvaluationResult]) -> str:
+def format_rules_message(results: list[RuleEvaluationResult]) -> str:
     """
-    Format triggered policies into a concise message for the agent.
+    Format triggered rules into a concise message for the agent.
 
-    Groups policies by name and uses minimal formatting.
+    Groups rules by name and uses minimal formatting.
     """
-    lines = ["## DeepWork Policies Triggered", ""]
+    lines = ["## DeepWork Rules Triggered", ""]
     lines.append(
-        "Comply with the following policies. "
-        "To mark a policy as addressed, include `<promise>✓ Policy Name</promise>` "
+        "Comply with the following rules. "
+        "To mark a rule as addressed, include `<promise>Rule Name</promise>` "
         "in your response."
     )
     lines.append("")
 
-    # Group results by policy name
-    by_name: dict[str, list[PolicyEvaluationResult]] = {}
+    # Group results by rule name
+    by_name: dict[str, list[RuleEvaluationResult]] = {}
     for result in results:
-        name = result.policy.name
+        name = result.rule.name
         if name not in by_name:
             by_name[name] = []
         by_name[name].append(result)
 
-    for name, policy_results in by_name.items():
-        policy = policy_results[0].policy
+    for name, rule_results in by_name.items():
+        rule = rule_results[0].rule
         lines.append(f"## {name}")
         lines.append("")
 
         # For set/pair modes, show the correspondence violations concisely
-        if policy.detection_mode in (DetectionMode.SET, DetectionMode.PAIR):
-            for result in policy_results:
+        if rule.detection_mode in (DetectionMode.SET, DetectionMode.PAIR):
+            for result in rule_results:
                 for trigger_file in result.trigger_files:
                     for missing_file in result.missing_files:
-                        lines.append(f"{trigger_file} → {missing_file}")
+                        lines.append(f"{trigger_file} -> {missing_file}")
             lines.append("")
 
         # Show instructions
-        if policy.instructions:
-            lines.append(policy.instructions.strip())
+        if rule.instructions:
+            lines.append(rule.instructions.strip())
             lines.append("")
 
     return "\n".join(lines)
 
 
-def policy_check_hook(hook_input: HookInput) -> HookOutput:
+def rules_check_hook(hook_input: HookInput) -> HookOutput:
     """
-    Main hook logic for policy evaluation (v2).
+    Main hook logic for rules evaluation (v2).
 
-    This is called for after_agent events to check if policies need attention
+    This is called for after_agent events to check if rules need attention
     before allowing the agent to complete.
     """
     # Only process after_agent events
     if hook_input.event != NormalizedEvent.AFTER_AGENT:
         return HookOutput()
 
-    # Check if policies directory exists
-    policies_dir = Path(".deepwork/policies")
-    if not policies_dir.exists():
+    # Check if rules directory exists
+    rules_dir = Path(".deepwork/rules")
+    if not rules_dir.exists():
         return HookOutput()
 
     # Extract conversation context from transcript
@@ -370,49 +370,49 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
     )
 
     # Extract promise tags (case-insensitive)
-    promised_policies = extract_promise_tags(conversation_context)
+    promised_rules = extract_promise_tags(conversation_context)
 
-    # Load policies
+    # Load rules
     try:
-        policies = load_policies_from_directory(policies_dir)
-    except PolicyParseError as e:
-        print(f"Error loading policies: {e}", file=sys.stderr)
+        rules = load_rules_from_directory(rules_dir)
+    except RulesParseError as e:
+        print(f"Error loading rules: {e}", file=sys.stderr)
         return HookOutput()
 
-    if not policies:
+    if not rules:
         return HookOutput()
 
     # Initialize queue
-    queue = PolicyQueue()
-
-    # Group policies by compare_to mode
-    policies_by_mode: dict[str, list[Policy]] = {}
-    for policy in policies:
-        mode = policy.compare_to
-        if mode not in policies_by_mode:
-            policies_by_mode[mode] = []
-        policies_by_mode[mode].append(policy)
-
-    # Evaluate policies and collect results
-    prompt_results: list[PolicyEvaluationResult] = []
+    queue = RulesQueue()
+
+    # Group rules by compare_to mode
+    rules_by_mode: dict[str, list[Rule]] = {}
+    for rule in rules:
+        mode = rule.compare_to
+        if mode not in rules_by_mode:
+            rules_by_mode[mode] = []
+        rules_by_mode[mode].append(rule)
+
+    # Evaluate rules and collect results
+    prompt_results: list[RuleEvaluationResult] = []
     command_errors: list[str] = []
 
-    for mode, mode_policies in policies_by_mode.items():
+    for mode, mode_rules in rules_by_mode.items():
         changed_files = get_changed_files_for_mode(mode)
         if not changed_files:
             continue
 
         baseline_ref = get_baseline_ref(mode)
 
-        # Evaluate which policies fire
-        results = evaluate_policies(mode_policies, changed_files, promised_policies)
+        # Evaluate which rules fire
+        results = evaluate_rules(mode_rules, changed_files, promised_rules)
 
         for result in results:
-            policy = result.policy
+            rule = result.rule
 
             # Compute trigger hash for queue deduplication
             trigger_hash = compute_trigger_hash(
-                policy.name,
+                rule.name,
                 result.trigger_files,
                 baseline_ref,
             )
@@ -428,20 +428,20 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
             # Create queue entry if new
             if not existing:
                 queue.create_entry(
-                    policy_name=policy.name,
-                    policy_file=f"{policy.filename}.md",
+                    rule_name=rule.name,
+                    rule_file=f"{rule.filename}.md",
                     trigger_files=result.trigger_files,
                     baseline_ref=baseline_ref,
                     expected_files=result.missing_files,
                 )
 
             # Handle based on action type
-            if policy.action_type == ActionType.COMMAND:
+            if rule.action_type == ActionType.COMMAND:
                 # Run command action
-                if policy.command_action:
+                if rule.command_action:
                     repo_root = Path.cwd()
                     cmd_results = run_command_action(
-                        policy.command_action,
+                        rule.command_action,
                         result.trigger_files,
                         repo_root,
                     )
@@ -460,7 +460,7 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
                     else:
                         # Command failed
                         error_msg = format_command_errors(cmd_results)
-                        command_errors.append(f"## {policy.name}\n{error_msg}")
+                        command_errors.append(f"## {rule.name}\n{error_msg}")
                         queue.update_status(
                             trigger_hash,
                             QueueEntryStatus.FAILED,
@@ -471,7 +471,7 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
                             ),
                         )
 
-            elif policy.action_type == ActionType.PROMPT:
+            elif rule.action_type == ActionType.PROMPT:
                 # Collect for prompt output
                 prompt_results.append(result)
 
@@ -480,13 +480,13 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
 
     # Add command errors if any
     if command_errors:
-        messages.append("## Command Policy Errors\n")
+        messages.append("## Command Rule Errors\n")
         messages.extend(command_errors)
         messages.append("")
 
-    # Add prompt policies if any
+    # Add prompt rules if any
     if prompt_results:
-        messages.append(format_policy_message(prompt_results))
+        messages.append(format_rules_message(prompt_results))
 
     if messages:
         return HookOutput(decision="block", reason="\n".join(messages))
@@ -495,7 +495,7 @@ def policy_check_hook(hook_input: HookInput) -> HookOutput:
 
 
 def main() -> None:
-    """Entry point for the policy check hook."""
+    """Entry point for the rules check hook."""
     # Determine platform from environment
     platform_str = os.environ.get("DEEPWORK_HOOK_PLATFORM", "claude")
     try:
@@ -504,7 +504,7 @@ def main() -> None:
         platform = Platform.CLAUDE
 
     # Run the hook with the wrapper
-    exit_code = run_hook(policy_check_hook, platform)
+    exit_code = run_hook(rules_check_hook, platform)
     sys.exit(exit_code)
 
 
diff --git a/src/deepwork/schemas/policy_schema.py b/src/deepwork/schemas/rules_schema.py
similarity index 86%
rename from src/deepwork/schemas/policy_schema.py
rename to src/deepwork/schemas/rules_schema.py
index 51e35812..3112dd0f 100644
--- a/src/deepwork/schemas/policy_schema.py
+++ b/src/deepwork/schemas/rules_schema.py
@@ -1,4 +1,4 @@
-"""JSON Schema definition for policy definitions (v2 - frontmatter format)."""
+"""JSON Schema definition for rule definitions (v2 - frontmatter format)."""
 
 from typing import Any
 
@@ -10,9 +10,9 @@
     ]
 }
 
-# JSON Schema for policy frontmatter (YAML between --- delimiters)
-# Policies are stored as individual .md files in .deepwork/policies/
-POLICY_FRONTMATTER_SCHEMA: dict[str, Any] = {
+# JSON Schema for rule frontmatter (YAML between --- delimiters)
+# Rules are stored as individual .md files in .deepwork/rules/
+RULES_FRONTMATTER_SCHEMA: dict[str, Any] = {
     "$schema": "http://json-schema.org/draft-07/schema#",
     "type": "object",
     "required": ["name"],
@@ -20,16 +20,16 @@
         "name": {
             "type": "string",
             "minLength": 1,
-            "description": "Human-friendly name for the policy (displayed in promise tags)",
+            "description": "Human-friendly name for the rule (displayed in promise tags)",
         },
         # Detection mode: trigger/safety (mutually exclusive with set/pair)
         "trigger": {
             **STRING_OR_ARRAY,
-            "description": "Glob pattern(s) for files that trigger this policy",
+            "description": "Glob pattern(s) for files that trigger this rule",
         },
         "safety": {
             **STRING_OR_ARRAY,
-            "description": "Glob pattern(s) that suppress the policy if changed",
+            "description": "Glob pattern(s) that suppress the rule if changed",
         },
         # Detection mode: set (bidirectional correspondence)
         "set": {
@@ -46,7 +46,7 @@
                 "trigger": {
                     "type": "string",
                     "minLength": 1,
-                    "description": "Pattern that triggers the policy",
+                    "description": "Pattern that triggers the rule",
                 },
                 "expects": {
                     **STRING_OR_ARRAY,
diff --git a/src/deepwork/standard_jobs/deepwork_jobs/job.yml b/src/deepwork/standard_jobs/deepwork_jobs/job.yml
index e1afa5ee..e95aa2c0 100644
--- a/src/deepwork/standard_jobs/deepwork_jobs/job.yml
+++ b/src/deepwork/standard_jobs/deepwork_jobs/job.yml
@@ -77,9 +77,9 @@ steps:
             6. **Ask Structured Questions**: Do step instructions that gather user input explicitly use the phrase "ask structured questions"?
             7. **Sync Complete**: Has `deepwork sync` been run successfully?
             8. **Commands Available**: Are the slash-commands generated in `.claude/commands/`?
-            9. **Policies Considered**: Have you thought about whether policies would benefit this job?
-               - If relevant policies were identified, did you explain them and offer to run `/deepwork_policy.define`?
-               - Not every job needs policies - only suggest when genuinely helpful.
+            9. **Rules Considered**: Have you thought about whether rules would benefit this job?
+               - If relevant rules were identified, did you explain them and offer to run `/deepwork_rules.define`?
+               - Not every job needs rules - only suggest when genuinely helpful.
 
             If ANY criterion is not met, continue working to address it.
             If ALL criteria are satisfied, include `<promise>✓ Quality Criteria Met</promise>` in your response.
diff --git a/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md b/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md
index a3a790f6..600e1578 100644
--- a/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md
+++ b/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md
@@ -130,19 +130,19 @@ This will:
 
 After running `deepwork sync`, look at the "To use the new commands" section in the output. **Relay these exact reload instructions to the user** so they know how to pick up the new commands. Don't just reference the sync output - tell them directly what they need to do (e.g., "Type 'exit' then run 'claude --resume'" for Claude Code, or "Run '/memory refresh'" for Gemini CLI).
 
-### Step 7: Consider Policies for the New Job
+### Step 7: Consider Rules for the New Job
 
-After implementing the job, consider whether there are **policies** that would help enforce quality or consistency when working with this job's domain.
+After implementing the job, consider whether there are **rules** that would help enforce quality or consistency when working with this job's domain.
 
-**What are policies?**
+**What are rules?**
 
-Policies are automated guardrails defined in `.deepwork.policy.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
 - Quality standards are maintained
 
-**When to suggest policies:**
+**When to suggest rules:**
 
 Think about the job you just implemented and ask:
 - Does this job produce outputs that other files depend on?
@@ -150,28 +150,28 @@ Think about the job you just implemented and ask:
 - Are there quality checks or reviews that should happen when certain files in this domain change?
 - Could changes to the job's output files impact other parts of the project?
 
-**Examples of policies that might make sense:**
+**Examples of rules that might make sense:**
 
-| Job Type | Potential Policy |
-|----------|------------------|
+| Job Type | Potential Rule |
+|----------|----------------|
 | API Design | "Update API docs when endpoint definitions change" |
 | Database Schema | "Review migrations when schema files change" |
 | Competitive Research | "Update strategy docs when competitor analysis changes" |
 | Feature Development | "Update changelog when feature files change" |
 | Configuration Management | "Update install guide when config files change" |
 
-**How to offer policy creation:**
+**How to offer rule creation:**
 
-If you identify one or more policies that would benefit the user, explain:
-1. **What the policy would do** - What triggers it and what action it prompts
+If you identify one or more rules that would benefit the user, explain:
+1. **What the rule would do** - What triggers it and what action it prompts
 2. **Why it would help** - How it prevents common mistakes or keeps things in sync
 3. **What files it would watch** - The trigger patterns
 
 Then ask the user:
 
-> "Would you like me to create this policy for you? I can run `/deepwork_policy.define` to set it up."
+> "Would you like me to create this rule for you? I can run `/deepwork_rules.define` to set it up."
 
-If the user agrees, invoke the `/deepwork_policy.define` command to guide them through creating the policy.
+If the user agrees, invoke the `/deepwork_rules.define` command to guide them through creating the rule.
 
 **Example dialogue:**
 
@@ -180,15 +180,15 @@ Based on the competitive_research job you just created, I noticed that when
 competitor analysis files change, it would be helpful to remind you to update
 your strategy documentation.
 
-I'd suggest a policy like:
+I'd suggest a rule like:
 - **Name**: "Update strategy when competitor analysis changes"
 - **Trigger**: `**/positioning_report.md`
 - **Action**: Prompt to review and update `docs/strategy.md`
 
-Would you like me to create this policy? I can run `/deepwork_policy.define` to set it up.
+Would you like me to create this rule? I can run `/deepwork_rules.define` to set it up.
 ```
 
-**Note:** Not every job needs policies. Only suggest them when they would genuinely help maintain consistency or quality. Don't force policies where they don't make sense.
+**Note:** Not every job needs rules. Only suggest them when they would genuinely help maintain consistency or quality. Don't force rules where they don't make sense.
 
 ## Example Implementation
 
@@ -222,8 +222,8 @@ Before marking this step complete, ensure:
 - [ ] `deepwork sync` executed successfully
 - [ ] Commands generated in platform directory
 - [ ] User informed to follow reload instructions from `deepwork sync`
-- [ ] Considered whether policies would benefit this job (Step 7)
-- [ ] If policies suggested, offered to run `/deepwork_policy.define`
+- [ ] Considered whether rules would benefit this job (Step 7)
+- [ ] If rules suggested, offered to run `/deepwork_rules.define`
 
 ## Quality Criteria
 
@@ -235,4 +235,4 @@ Before marking this step complete, ensure:
 - Steps with user inputs explicitly use "ask structured questions" phrasing
 - Sync completed successfully
 - Commands available for use
-- Thoughtfully considered relevant policies for the job domain
+- Thoughtfully considered relevant rules for the job domain
diff --git a/src/deepwork/standard_jobs/deepwork_policy/hooks/capture_prompt_work_tree.sh b/src/deepwork/standard_jobs/deepwork_rules/hooks/capture_prompt_work_tree.sh
similarity index 100%
rename from src/deepwork/standard_jobs/deepwork_policy/hooks/capture_prompt_work_tree.sh
rename to src/deepwork/standard_jobs/deepwork_rules/hooks/capture_prompt_work_tree.sh
diff --git a/.deepwork/jobs/deepwork_policy/hooks/global_hooks.yml b/src/deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml
similarity index 62%
rename from .deepwork/jobs/deepwork_policy/hooks/global_hooks.yml
rename to src/deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml
index 0e024fc7..f76202ab 100644
--- a/.deepwork/jobs/deepwork_policy/hooks/global_hooks.yml
+++ b/src/deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml
@@ -1,8 +1,8 @@
-# DeepWork Policy Hooks Configuration
+# DeepWork Rules Hooks Configuration
 # Maps Claude Code lifecycle events to hook scripts
 
 UserPromptSubmit:
   - user_prompt_submit.sh
 
 Stop:
-  - policy_stop_hook.sh
+  - rules_stop_hook.sh
diff --git a/src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh b/src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh
new file mode 100755
index 00000000..20fa8a3f
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+# rules_stop_hook.sh - Evaluates rules when the agent stops
+#
+# This script is called as a Claude Code Stop hook. It:
+# 1. Evaluates rules from .deepwork/rules/
+# 2. Computes changed files based on each rule's compare_to setting
+# 3. Checks for <promise> tags in the conversation transcript
+# 4. Returns JSON to block stop if rules need attention
+
+set -e
+
+# Check if rules directory exists with .md files
+RULES_DIR=".deepwork/rules"
+
+if [ ! -d "${RULES_DIR}" ]; then
+    # No rules directory, nothing to do
+    exit 0
+fi
+
+# Check if there are any .md files
+if ! ls "${RULES_DIR}"/*.md 1>/dev/null 2>&1; then
+    # No rule files, nothing to do
+    exit 0
+fi
+
+# Read the hook input JSON from stdin
+HOOK_INPUT=""
+if [ ! -t 0 ]; then
+    HOOK_INPUT=$(cat)
+fi
+
+# Call the Python rules evaluator via the cross-platform wrapper
+# The wrapper reads JSON input and handles transcript extraction
+# Note: exit code 2 means "block", which is valid (not an error), so "|| true" keeps "set -e" from aborting
+result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.rules_check 2>/dev/null) || true
+
+# If no output (error case), provide empty JSON as fallback
+if [ -z "${result}" ]; then
+    result='{}'
+fi
+
+# Output the result (JSON for Claude Code hooks)
+echo "${result}"
diff --git a/src/deepwork/standard_jobs/deepwork_policy/hooks/user_prompt_submit.sh b/src/deepwork/standard_jobs/deepwork_rules/hooks/user_prompt_submit.sh
similarity index 100%
rename from src/deepwork/standard_jobs/deepwork_policy/hooks/user_prompt_submit.sh
rename to src/deepwork/standard_jobs/deepwork_rules/hooks/user_prompt_submit.sh
diff --git a/src/deepwork/standard_jobs/deepwork_rules/job.yml b/src/deepwork/standard_jobs/deepwork_rules/job.yml
new file mode 100644
index 00000000..9e9ece74
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/job.yml
@@ -0,0 +1,37 @@
+name: deepwork_rules
+version: "0.2.0"
+summary: "Rules enforcement for AI agent sessions"
+description: |
+  Manages rules that automatically trigger when certain files change during an AI agent session.
+  Rules help ensure that code changes follow team guidelines, documentation is updated,
+  and architectural decisions are respected.
+
+  Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
+  specifies:
+  - Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
+  - Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
+  - Instructions: What the agent should do when the rule triggers
+
+  Example use cases:
+  - Update installation docs when configuration files change
+  - Require security review when authentication code is modified
+  - Ensure API documentation stays in sync with API code
+  - Remind developers to update changelogs
+
+changelog:
+  - version: "0.1.0"
+    changes: "Initial version"
+  - version: "0.2.0"
+    changes: "Standardized on 'ask structured questions' phrasing for user input"
+
+steps:
+  - id: define
+    name: "Define Rule"
+    description: "Create or update rule entries in .deepwork.rules.yml"
+    instructions_file: steps/define.md
+    inputs:
+      - name: rule_purpose
+        description: "What guideline or constraint should this rule enforce?"
+    outputs:
+      - .deepwork.rules.yml
+    dependencies: []
diff --git a/src/deepwork/standard_jobs/deepwork_rules/steps/define.md b/src/deepwork/standard_jobs/deepwork_rules/steps/define.md
new file mode 100644
index 00000000..3e8be899
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/steps/define.md
@@ -0,0 +1,198 @@
+# Define Rule
+
+## Objective
+
+Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+
+## Task
+
+Guide the user through defining a new rule by asking structured questions. **Do not create the rule without first understanding what they want to enforce.**
+
+**Important**: Use the AskUserQuestion tool to ask structured questions when gathering information from the user. This provides a better user experience with clear options and guided choices.
+
+### Step 1: Understand the Rule Purpose
+
+Start by asking structured questions to understand what the user wants to enforce:
+
+1. **What guideline or constraint should this rule enforce?**
+   - What situation triggers the need for action?
+   - What files or directories, when changed, should trigger this rule?
+   - Examples: "When config files change", "When API code changes", "When database schema changes"
+
+2. **What action should be taken?**
+   - What should the agent do when the rule triggers?
+   - Update documentation? Perform a security review? Update tests?
+   - Is there a specific file or process that needs attention?
+
+3. **Are there any "safety" conditions?**
+   - Are there files that, if also changed, mean the rule doesn't need to fire?
+   - For example: if config changes and `docs/install_guide.md` also changes, assume the docs are already updated
+   - This prevents redundant prompts when the user has already done the right thing
+
+### Step 2: Define the Trigger Patterns
+
+Help the user define glob patterns for files that should trigger the rule:
+
+**Common patterns:**
+- `src/**/*.py` - All Python files in src directory (recursive)
+- `app/config/**/*` - All files in app/config directory
+- `*.md` - All Markdown files in the project root (not recursive)
+- `src/api/**/*` - All files in the API directory
+- `migrations/**/*.sql` - All SQL migrations
+
+**Pattern syntax:**
+- `*` - Matches any characters within a single path segment
+- `**` - Matches any characters across multiple path segments (recursive)
+- `?` - Matches a single character
+
+### Step 3: Define Safety Patterns (Optional)
+
+If there are files that, when also changed, mean the rule shouldn't fire:
+
+**Examples:**
+- Rule: "Update install guide when config changes"
+  - Trigger: `app/config/**/*`
+  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
+
+- Rule: "Security review for auth changes"
+  - Trigger: `src/auth/**/*`
+  - Safety: `SECURITY.md`, `docs/security_review.md`
+
+### Step 3b: Choose the Comparison Mode (Optional)
+
+The `compare_to` field controls what baseline is used when detecting "changed files":
+
+**Options:**
+- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
+- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
+- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
+
+**When to use each:**
+- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
+- **default_tip**: For rules about what's different from production/main
+- **prompt**: For rules that should only consider very recent changes within the current session
+
+Most rules should use the default (`base`) and don't need to specify `compare_to`.
+
+### Step 4: Write the Instructions
+
+Create clear, actionable instructions for what the agent should do when the rule fires.
+
+**Good instructions include:**
+- What to check or review
+- What files might need updating
+- Specific actions to take
+- Quality criteria for completion
+
+**Example:**
+```
+Configuration files have changed. Please:
+1. Review docs/install_guide.md for accuracy
+2. Update any installation steps that reference changed config
+3. Verify environment variable documentation is current
+4. Test that installation instructions still work
+```
+
+### Step 5: Create the Rule Entry
+
+Create or update `.deepwork.rules.yml` in the project root.
+
+**File Location**: `.deepwork.rules.yml` (root of project)
+
+**Format**:
+```yaml
+- name: "[Friendly name for the rule]"
+  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
+  safety: "[glob pattern]"   # optional, or array
+  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
+  instructions: |
+    [Multi-line instructions for the agent...]
+```
+
+**Alternative with instructions_file**:
+```yaml
+- name: "[Friendly name for the rule]"
+  trigger: "[glob pattern]"
+  safety: "[glob pattern]"
+  compare_to: "base"         # optional
+  instructions_file: "path/to/instructions.md"
+```
+
+### Step 6: Verify the Rule
+
+After creating the rule:
+
+1. **Check the YAML syntax** - Ensure valid YAML formatting
+2. **Test trigger patterns** - Verify patterns match intended files
+3. **Review instructions** - Ensure they're clear and actionable
+4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
+
+## Example Rules
+
+### Update Documentation on Config Changes
+```yaml
+- name: "Update install guide on config changes"
+  trigger: "app/config/**/*"
+  safety: "docs/install_guide.md"
+  instructions: |
+    Configuration files have been modified. Please review docs/install_guide.md
+    and update it if any installation instructions need to change based on the
+    new configuration.
+```
+
+### Security Review for Auth Code
+```yaml
+- name: "Security review for authentication changes"
+  trigger:
+    - "src/auth/**/*"
+    - "src/security/**/*"
+  safety:
+    - "SECURITY.md"
+    - "docs/security_audit.md"
+  instructions: |
+    Authentication or security code has been changed. Please:
+    1. Review for hardcoded credentials or secrets
+    2. Check input validation on user inputs
+    3. Verify access control logic is correct
+    4. Update security documentation if needed
+```
+
+### API Documentation Sync
+```yaml
+- name: "API documentation update"
+  trigger: "src/api/**/*.py"
+  safety: "docs/api/**/*.md"
+  instructions: |
+    API code has changed. Please verify that API documentation in docs/api/
+    is up to date with the code changes. Pay special attention to:
+    - New or changed endpoints
+    - Modified request/response schemas
+    - Updated authentication requirements
+```
+
+## Output Format
+
+### .deepwork.rules.yml
+Create or update this file at the project root with the new rule entry.
+
+## Quality Criteria
+
+- Asked structured questions to understand user requirements
+- Rule name is clear and descriptive
+- Trigger patterns accurately match the intended files
+- Safety patterns prevent unnecessary triggering
+- Instructions are actionable and specific
+- YAML is valid and properly formatted
+
+## Context
+
+Rules are evaluated automatically when you finish working on a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting:
+   - `base` (default): Files changed since the branch diverged from main/master
+   - `default_tip`: Files different from the current main/master branch
+   - `prompt`: Files changed since the last prompt submission
+2. Checks if any changes match rule trigger patterns
+3. Skips rules whose safety patterns also match a changed file
+4. Prompts you with instructions for any triggered rules
+
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
diff --git a/src/deepwork/templates/default_policy.yml b/src/deepwork/templates/default_rules.yml
similarity index 85%
rename from src/deepwork/templates/default_policy.yml
rename to src/deepwork/templates/default_rules.yml
index 2f895bde..ec0fbd31 100644
--- a/src/deepwork/templates/default_policy.yml
+++ b/src/deepwork/templates/default_rules.yml
@@ -1,19 +1,19 @@
-# DeepWork Policy Configuration
+# DeepWork Rules Configuration
 #
-# Policies are automated guardrails that trigger when specific files change.
+# Rules are automated guardrails that trigger when specific files change.
 # They help ensure documentation stays current, security reviews happen, etc.
 #
-# Use /deepwork_policy.define to create new policies interactively.
+# Use /deepwork_rules.define to create new rules interactively.
 #
 # Format:
-#   - name: "Friendly name for the policy"
+#   - name: "Friendly name for the rule"
 #     trigger: "glob/pattern/**/*"  # or array: ["pattern1", "pattern2"]
-#     safety: "pattern/**/*"        # optional - if these also changed, skip the policy
+#     safety: "pattern/**/*"        # optional - if these also changed, skip the rule
 #     compare_to: "base"            # optional: "base" (default), "default_tip", or "prompt"
 #     instructions: |
 #       Multi-line instructions for the AI agent...
 #
-# Example policies (uncomment and customize):
+# Example rules (uncomment and customize):
 #
 # - name: "README Documentation"
 #   trigger: "src/**/*"
diff --git a/tests/integration/test_install_flow.py b/tests/integration/test_install_flow.py
index 4f42353f..ab394961 100644
--- a/tests/integration/test_install_flow.py
+++ b/tests/integration/test_install_flow.py
@@ -152,8 +152,8 @@ def test_install_is_idempotent(self, mock_claude_project: Path) -> None:
         assert (claude_dir / "deepwork_jobs.define.md").exists()
         assert (claude_dir / "deepwork_jobs.learn.md").exists()
 
-    def test_install_creates_policy_template(self, mock_claude_project: Path) -> None:
-        """Test that install creates a policy template file."""
+    def test_install_creates_rules_template(self, mock_claude_project: Path) -> None:
+        """Test that install creates a rules template file."""
         runner = CliRunner()
 
         result = runner.invoke(
@@ -163,34 +163,34 @@ def test_install_creates_policy_template(self, mock_claude_project: Path) -> Non
         )
 
         assert result.exit_code == 0
-        assert ".deepwork.policy.yml template" in result.output
+        assert ".deepwork.rules.yml template" in result.output
 
-        # Verify policy file was created
-        policy_file = mock_claude_project / ".deepwork.policy.yml"
-        assert policy_file.exists()
+        # Verify rules file was created
+        rules_file = mock_claude_project / ".deepwork.rules.yml"
+        assert rules_file.exists()
 
-        # Verify it's the template (has comment header, no active policies)
-        content = policy_file.read_text()
-        assert "# DeepWork Policy Configuration" in content
-        assert "# Use /deepwork_policy.define" in content
+        # Verify it's the template (has comment header, no active rules)
+        content = rules_file.read_text()
+        assert "# DeepWork Rules Configuration" in content
+        assert "# Use /deepwork_rules.define" in content
 
-        # Verify it does NOT contain deepwork-specific policies
+        # Verify it does NOT contain deepwork-specific rules
         assert "Standard Jobs Source of Truth" not in content
         assert "Version and Changelog Update" not in content
         assert "pyproject.toml" not in content
 
-    def test_install_preserves_existing_policy_file(self, mock_claude_project: Path) -> None:
-        """Test that install doesn't overwrite existing policy file."""
+    def test_install_preserves_existing_rules_file(self, mock_claude_project: Path) -> None:
+        """Test that install doesn't overwrite existing rules file."""
         runner = CliRunner()
 
-        # Create a custom policy file before install
-        policy_file = mock_claude_project / ".deepwork.policy.yml"
-        custom_content = """- name: "My Custom Policy"
+        # Create a custom rules file before install
+        rules_file = mock_claude_project / ".deepwork.rules.yml"
+        custom_content = """- name: "My Custom Rule"
   trigger: "src/**/*"
   instructions: |
     Custom instructions here.
 """
-        policy_file.write_text(custom_content)
+        rules_file.write_text(custom_content)
 
         result = runner.invoke(
             cli,
@@ -199,10 +199,10 @@ def test_install_preserves_existing_policy_file(self, mock_claude_project: Path)
         )
 
         assert result.exit_code == 0
-        assert ".deepwork.policy.yml already exists" in result.output
+        assert ".deepwork.rules.yml already exists" in result.output
 
         # Verify original content is preserved
-        assert policy_file.read_text() == custom_content
+        assert rules_file.read_text() == custom_content
 
 
 class TestCLIEntryPoint:
diff --git a/tests/shell_script_tests/README.md b/tests/shell_script_tests/README.md
index 95bf0468..983ad4ec 100644
--- a/tests/shell_script_tests/README.md
+++ b/tests/shell_script_tests/README.md
@@ -6,9 +6,9 @@ Automated tests for DeepWork shell scripts, with a focus on validating Claude Co
 
 | Script | Type | Description |
 |--------|------|-------------|
-| `policy_stop_hook.sh` | Stop Hook | Evaluates policies and blocks agent stop if policies are triggered |
+| `rules_stop_hook.sh` | Stop Hook | Evaluates rules and blocks agent stop if rules are triggered |
 | `user_prompt_submit.sh` | UserPromptSubmit Hook | Captures work tree state when user submits a prompt |
-| `capture_prompt_work_tree.sh` | Helper | Records current git state for `compare_to: prompt` policies |
+| `capture_prompt_work_tree.sh` | Helper | Records current git state for `compare_to: prompt` rules |
 | `make_new_job.sh` | Utility | Creates directory structure for new DeepWork jobs |
 
 ## Claude Code Hooks JSON Format
@@ -38,7 +38,7 @@ Hook scripts must return valid JSON responses. The tests enforce these formats:
 uv run pytest tests/shell_script_tests/ -v
 
 # Run tests for a specific script
-uv run pytest tests/shell_script_tests/test_policy_stop_hook.py -v
+uv run pytest tests/shell_script_tests/test_rules_stop_hook.py -v
 
 # Run with coverage
 uv run pytest tests/shell_script_tests/ --cov=src/deepwork
@@ -49,7 +49,7 @@ uv run pytest tests/shell_script_tests/ --cov=src/deepwork
 ```
 tests/shell_script_tests/
 ├── conftest.py                      # Shared fixtures and helpers
-├── test_policy_stop_hook.py         # Stop hook blocking/allowing tests
+├── test_rules_stop_hook.py          # Stop hook blocking/allowing tests
 ├── test_user_prompt_submit.py       # Prompt submission hook tests
 ├── test_capture_prompt_work_tree.py # Work tree capture tests
 ├── test_hooks_json_format.py        # JSON format validation tests
@@ -63,8 +63,8 @@ Available in `conftest.py`:
 | Fixture | Description |
 |---------|-------------|
 | `git_repo` | Basic git repo with initial commit |
-| `git_repo_with_policy` | Git repo with a Python file policy |
-| `policy_hooks_dir` | Path to policy hooks scripts |
+| `git_repo_with_rule` | Git repo with a Python file rule |
+| `rules_hooks_dir` | Path to rules hooks scripts |
 | `jobs_scripts_dir` | Path to job management scripts |
 
 ## Adding New Tests
diff --git a/tests/shell_script_tests/conftest.py b/tests/shell_script_tests/conftest.py
index e9b97682..64e62f1c 100644
--- a/tests/shell_script_tests/conftest.py
+++ b/tests/shell_script_tests/conftest.py
@@ -23,8 +23,8 @@ def git_repo(tmp_path: Path) -> Path:
 
 
 @pytest.fixture
-def git_repo_with_policy(tmp_path: Path) -> Path:
-    """Create a git repo with policy that will fire."""
+def git_repo_with_rule(tmp_path: Path) -> Path:
+    """Create a git repo with rule that will fire."""
     repo = Repo.init(tmp_path)
 
     readme = tmp_path / "README.md"
@@ -32,15 +32,15 @@ def git_repo_with_policy(tmp_path: Path) -> Path:
     repo.index.add(["README.md"])
     repo.index.commit("Initial commit")
 
-    # Create v2 policy directory and file
-    policies_dir = tmp_path / ".deepwork" / "policies"
-    policies_dir.mkdir(parents=True, exist_ok=True)
+    # Create v2 rules directory and file
+    rules_dir = tmp_path / ".deepwork" / "rules"
+    rules_dir.mkdir(parents=True, exist_ok=True)
 
-    # Policy that triggers on any Python file (v2 format)
-    policy_file = policies_dir / "python-file-policy.md"
-    policy_file.write_text(
+    # Rule that triggers on any Python file (v2 format)
+    rule_file = rules_dir / "python-file-rule.md"
+    rule_file.write_text(
         """---
-name: Python File Policy
+name: Python File Rule
 trigger: "**/*.py"
 compare_to: prompt
 ---
@@ -56,14 +56,14 @@ def git_repo_with_policy(tmp_path: Path) -> Path:
 
 
 @pytest.fixture
-def policy_hooks_dir() -> Path:
-    """Return the path to the policy hooks scripts directory."""
+def rules_hooks_dir() -> Path:
+    """Return the path to the rules hooks scripts directory."""
     return (
         Path(__file__).parent.parent.parent
         / "src"
         / "deepwork"
         / "standard_jobs"
-        / "deepwork_policy"
+        / "deepwork_rules"
         / "hooks"
     )
 
diff --git a/tests/shell_script_tests/test_capture_prompt_work_tree.py b/tests/shell_script_tests/test_capture_prompt_work_tree.py
index 4f187b13..6f0435b1 100644
--- a/tests/shell_script_tests/test_capture_prompt_work_tree.py
+++ b/tests/shell_script_tests/test_capture_prompt_work_tree.py
@@ -1,7 +1,7 @@
 """Tests for capture_prompt_work_tree.sh helper script.
 
 This script captures the git work tree state for use with
-compare_to: prompt policies. It should:
+compare_to: prompt rules. It should:
 1. Create .deepwork directory if needed
 2. Stage all changes with git add -A
 3. Record changed files to .deepwork/.last_work_tree
@@ -35,36 +35,36 @@ def run_capture_script(script_path: Path, cwd: Path) -> tuple[str, str, int]:
 class TestCapturePromptWorkTreeBasic:
     """Basic functionality tests for capture_prompt_work_tree.sh."""
 
-    def test_exits_successfully(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_exits_successfully(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the script exits with code 0."""
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         assert code == 0, f"Expected exit code 0, got {code}. stderr: {stderr}"
 
-    def test_creates_deepwork_directory(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_creates_deepwork_directory(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the script creates .deepwork directory."""
         deepwork_dir = git_repo / ".deepwork"
         assert not deepwork_dir.exists(), "Precondition: .deepwork should not exist"
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         assert code == 0, f"Script failed with stderr: {stderr}"
         assert deepwork_dir.exists(), "Script should create .deepwork directory"
 
-    def test_creates_last_work_tree_file(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_creates_last_work_tree_file(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the script creates .last_work_tree file."""
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
         assert code == 0, f"Script failed with stderr: {stderr}"
         assert work_tree_file.exists(), "Script should create .last_work_tree file"
 
-    def test_empty_repo_produces_empty_file(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_empty_repo_produces_empty_file(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that a clean repo produces an empty work tree file."""
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         # Clean repo should have empty or minimal content
@@ -75,7 +75,7 @@ def test_empty_repo_produces_empty_file(self, policy_hooks_dir: Path, git_repo:
 class TestCapturePromptWorkTreeFileTracking:
     """Tests for file tracking behavior in capture_prompt_work_tree.sh."""
 
-    def test_captures_staged_files(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_captures_staged_files(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that staged files are captured."""
         # Create and stage a file
         new_file = git_repo / "staged.py"
@@ -83,7 +83,7 @@ def test_captures_staged_files(self, policy_hooks_dir: Path, git_repo: Path) ->
         repo = Repo(git_repo)
         repo.index.add(["staged.py"])
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
@@ -92,13 +92,13 @@ def test_captures_staged_files(self, policy_hooks_dir: Path, git_repo: Path) ->
         assert code == 0, f"Script failed with stderr: {stderr}"
         assert "staged.py" in content, "Staged file should be in work tree"
 
-    def test_captures_unstaged_changes(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_captures_unstaged_changes(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that unstaged changes are captured (after staging by script)."""
         # Create an unstaged file
         unstaged = git_repo / "unstaged.py"
         unstaged.write_text("# Unstaged file\n")
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
@@ -107,14 +107,14 @@ def test_captures_unstaged_changes(self, policy_hooks_dir: Path, git_repo: Path)
         assert code == 0, f"Script failed with stderr: {stderr}"
         assert "unstaged.py" in content, "Unstaged file should be captured"
 
-    def test_captures_files_in_subdirectories(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_captures_files_in_subdirectories(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that files in subdirectories are captured."""
         # Create files in nested directories
         src_dir = git_repo / "src" / "components"
         src_dir.mkdir(parents=True)
         (src_dir / "button.py").write_text("# Button component\n")
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
@@ -124,10 +124,10 @@ def test_captures_files_in_subdirectories(self, policy_hooks_dir: Path, git_repo
         assert "src/components/button.py" in content, "Nested file should be captured"
 
     def test_captures_multiple_files(
-        self, policy_hooks_dir: Path, git_repo_with_changes: Path
+        self, rules_hooks_dir: Path, git_repo_with_changes: Path
     ) -> None:
         """Test that multiple files are captured."""
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo_with_changes)
 
         work_tree_file = git_repo_with_changes / ".deepwork" / ".last_work_tree"
@@ -137,14 +137,14 @@ def test_captures_multiple_files(
         assert "modified.py" in content, "Modified file should be captured"
         assert "src/main.py" in content, "File in src/ should be captured"
 
-    def test_file_list_is_sorted_and_unique(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_file_list_is_sorted_and_unique(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the file list is sorted and deduplicated."""
         # Create multiple files
         (git_repo / "z_file.py").write_text("# Z file\n")
         (git_repo / "a_file.py").write_text("# A file\n")
         (git_repo / "m_file.py").write_text("# M file\n")
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
@@ -161,7 +161,7 @@ def test_file_list_is_sorted_and_unique(self, policy_hooks_dir: Path, git_repo:
 class TestCapturePromptWorkTreeGitStates:
     """Tests for handling various git states in capture_prompt_work_tree.sh."""
 
-    def test_handles_deleted_files(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_handles_deleted_files(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that deleted files are handled gracefully."""
         # Create and commit a file, then delete it
         to_delete = git_repo / "to_delete.py"
@@ -173,12 +173,12 @@ def test_handles_deleted_files(self, policy_hooks_dir: Path, git_repo: Path) ->
         # Now delete it
         to_delete.unlink()
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         assert code == 0, f"Script should handle deletions. stderr: {stderr}"
 
-    def test_handles_renamed_files(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_handles_renamed_files(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that renamed files are tracked."""
         # Create and commit a file
         old_name = git_repo / "old_name.py"
@@ -191,7 +191,7 @@ def test_handles_renamed_files(self, policy_hooks_dir: Path, git_repo: Path) ->
         new_name = git_repo / "new_name.py"
         old_name.rename(new_name)
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
@@ -201,13 +201,13 @@ def test_handles_renamed_files(self, policy_hooks_dir: Path, git_repo: Path) ->
         # Both old (deleted) and new should appear as changes
         assert "new_name.py" in content, "New filename should be captured"
 
-    def test_handles_modified_files(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_handles_modified_files(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that modified committed files are tracked."""
         # Modify an existing committed file
         readme = git_repo / "README.md"
         readme.write_text("# Modified content\n")
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
@@ -220,17 +220,17 @@ def test_handles_modified_files(self, policy_hooks_dir: Path, git_repo: Path) ->
 class TestCapturePromptWorkTreeIdempotence:
     """Tests for idempotent behavior of capture_prompt_work_tree.sh."""
 
-    def test_multiple_runs_succeed(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_multiple_runs_succeed(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the script can be run multiple times."""
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
 
         for i in range(3):
             stdout, stderr, code = run_capture_script(script_path, git_repo)
             assert code == 0, f"Run {i + 1} failed with stderr: {stderr}"
 
-    def test_updates_on_new_changes(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_updates_on_new_changes(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that subsequent runs capture new changes."""
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
 
         # First run
         run_capture_script(script_path, git_repo)
@@ -246,12 +246,12 @@ def test_updates_on_new_changes(self, policy_hooks_dir: Path, git_repo: Path) ->
 
         assert "new_file.py" in content, "New file should be captured"
 
-    def test_existing_deepwork_dir_not_error(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_existing_deepwork_dir_not_error(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that existing .deepwork directory is not an error."""
         # Pre-create the directory
         (git_repo / ".deepwork").mkdir()
 
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_capture_script(script_path, git_repo)
 
         assert code == 0, f"Should handle existing .deepwork dir. stderr: {stderr}"
diff --git a/tests/shell_script_tests/test_hook_wrappers.py b/tests/shell_script_tests/test_hook_wrappers.py
index 7b3b1436..ee2c0155 100644
--- a/tests/shell_script_tests/test_hook_wrappers.py
+++ b/tests/shell_script_tests/test_hook_wrappers.py
@@ -282,19 +282,19 @@ def test_non_blocking_event(
         assert output == {} or output.get("decision", "") not in ("block", "deny")
 
 
-class TestPolicyCheckHook:
-    """Tests for the policy_check hook module."""
+class TestRulesCheckHook:
+    """Tests for the rules_check hook module."""
 
     def test_module_imports(self) -> None:
-        """Test that the policy_check module can be imported."""
-        from deepwork.hooks import policy_check
+        """Test that the rules_check module can be imported."""
+        from deepwork.hooks import rules_check
 
-        assert hasattr(policy_check, "main")
-        assert hasattr(policy_check, "policy_check_hook")
+        assert hasattr(rules_check, "main")
+        assert hasattr(rules_check, "rules_check_hook")
 
     def test_hook_function_returns_output(self) -> None:
-        """Test that policy_check_hook returns a HookOutput."""
-        from deepwork.hooks.policy_check import policy_check_hook
+        """Test that rules_check_hook returns a HookOutput."""
+        from deepwork.hooks.rules_check import rules_check_hook
         from deepwork.hooks.wrapper import HookInput, HookOutput, NormalizedEvent, Platform
 
         # Create a minimal hook input
@@ -304,7 +304,7 @@ def test_hook_function_returns_output(self) -> None:
             session_id="test",
         )
 
-        output = policy_check_hook(hook_input)
+        output = rules_check_hook(hook_input)
 
         assert isinstance(output, HookOutput)
         # Should not block for before_prompt event
diff --git a/tests/shell_script_tests/test_hooks_json_format.py b/tests/shell_script_tests/test_hooks_json_format.py
index 14de1b21..74bea39b 100644
--- a/tests/shell_script_tests/test_hooks_json_format.py
+++ b/tests/shell_script_tests/test_hooks_json_format.py
@@ -116,12 +116,12 @@ def validate_prompt_hook_response(response: dict | None) -> None:
     assert isinstance(response, dict), f"Prompt hook output must be a JSON object: {response}"
 
 
-class TestPolicyStopHookJsonFormat:
-    """Tests specifically for policy_stop_hook.sh JSON format compliance."""
+class TestRulesStopHookJsonFormat:
+    """Tests specifically for rules_stop_hook.sh JSON format compliance."""
 
-    def test_allow_response_is_empty_json(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_allow_response_is_empty_json(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that allow response is empty JSON object."""
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
         response = validate_json_output(stdout)
@@ -131,17 +131,17 @@ def test_allow_response_is_empty_json(self, policy_hooks_dir: Path, git_repo: Pa
             assert response == {}, f"Allow response should be empty: {response}"
 
     def test_block_response_has_required_fields(
-        self, policy_hooks_dir: Path, git_repo_with_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
         """Test that block response has decision and reason."""
-        # Create a file that triggers the policy
-        py_file = git_repo_with_policy / "test.py"
+        # Create a file that triggers the rule
+        py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
-        repo = Repo(git_repo_with_policy)
+        repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
 
         response = validate_json_output(stdout)
         validate_stop_hook_response(response)
@@ -151,37 +151,37 @@ def test_block_response_has_required_fields(
         assert response.get("decision") == "block", "Expected block decision"
         assert "reason" in response, "Expected reason field"
 
-    def test_block_reason_contains_policy_info(
-        self, policy_hooks_dir: Path, git_repo_with_policy: Path
+    def test_block_reason_contains_rule_info(
+        self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
-        """Test that block reason contains policy information."""
-        py_file = git_repo_with_policy / "test.py"
+        """Test that block reason contains rule information."""
+        py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
-        repo = Repo(git_repo_with_policy)
+        repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
 
         response = validate_json_output(stdout)
 
         assert response is not None, "Expected blocking response"
         reason = response.get("reason", "")
 
-        # Should contain useful policy information
-        assert "Policy" in reason or "policy" in reason, f"Reason should mention policy: {reason}"
+        # Should contain useful rule information
+        assert "Rule" in reason or "rule" in reason, f"Reason should mention rule: {reason}"
 
     def test_no_extraneous_keys_in_response(
-        self, policy_hooks_dir: Path, git_repo_with_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
         """Test that response only contains expected keys."""
-        py_file = git_repo_with_policy / "test.py"
+        py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
-        repo = Repo(git_repo_with_policy)
+        repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
 
         response = validate_json_output(stdout)
 
@@ -194,16 +194,16 @@ def test_no_extraneous_keys_in_response(
             )
 
     def test_output_is_single_line_json(
-        self, policy_hooks_dir: Path, git_repo_with_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
         """Test that JSON output is single-line (no pretty printing)."""
-        py_file = git_repo_with_policy / "test.py"
+        py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
-        repo = Repo(git_repo_with_policy)
+        repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
 
         # Remove trailing newline and check for internal newlines
         output = stdout.strip()
@@ -220,17 +220,17 @@ def test_output_is_single_line_json(
 class TestUserPromptSubmitHookJsonFormat:
     """Tests for user_prompt_submit.sh JSON format compliance."""
 
-    def test_output_is_valid_json_or_empty(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_output_is_valid_json_or_empty(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that output is valid JSON or empty."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
         response = validate_json_output(stdout)
         validate_prompt_hook_response(response)
 
-    def test_does_not_block_prompt_submission(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_does_not_block_prompt_submission(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that hook does not block prompt submission."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
         response = validate_json_output(stdout)
@@ -246,12 +246,12 @@ class TestHooksJsonFormatWithTranscript:
     """Tests for hook JSON format when using transcript input."""
 
     def test_stop_hook_with_transcript_input(
-        self, policy_hooks_dir: Path, git_repo_with_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
         """Test stop hook JSON format when transcript is provided."""
-        py_file = git_repo_with_policy / "test.py"
+        py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
-        repo = Repo(git_repo_with_policy)
+        repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
         # Create mock transcript
@@ -268,9 +268,9 @@ def test_stop_hook_with_transcript_input(
             f.write("\n")
 
         try:
-            script_path = policy_hooks_dir / "policy_stop_hook.sh"
+            script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_hook_script(script_path, git_repo_with_policy, hook_input)
+            stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule, hook_input)
 
             response = validate_json_output(stdout)
             validate_stop_hook_response(response)
@@ -279,12 +279,12 @@ def test_stop_hook_with_transcript_input(
             os.unlink(transcript_path)
 
     def test_stop_hook_with_promise_returns_empty(
-        self, policy_hooks_dir: Path, git_repo_with_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
-        """Test that promised policies return empty JSON."""
-        py_file = git_repo_with_policy / "test.py"
+        """Test that promised rules return empty JSON."""
+        py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
-        repo = Repo(git_repo_with_policy)
+        repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
         # Create transcript with promise tag
@@ -298,7 +298,7 @@ def test_stop_hook_with_promise_returns_empty(
                             "content": [
                                 {
                                     "type": "text",
-                                    "text": "<promise>✓ Python File Policy</promise>",
+                                    "text": "<promise>Python File Rule</promise>",
                                 }
                             ]
                         },
@@ -308,14 +308,14 @@ def test_stop_hook_with_promise_returns_empty(
             f.write("\n")
 
         try:
-            script_path = policy_hooks_dir / "policy_stop_hook.sh"
+            script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_hook_script(script_path, git_repo_with_policy, hook_input)
+            stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule, hook_input)
 
             response = validate_json_output(stdout)
             validate_stop_hook_response(response)
 
-            # Should be empty (allow) because policy was promised
+            # Should be empty (allow) because rule was promised
             if response is not None:
                 assert response == {}, f"Expected empty response: {response}"
 
@@ -326,38 +326,38 @@ def test_stop_hook_with_promise_returns_empty(
 class TestHooksExitCodes:
     """Tests for hook script exit codes."""
 
-    def test_stop_hook_exits_zero_on_allow(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_stop_hook_exits_zero_on_allow(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that stop hook exits 0 when allowing."""
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
         assert code == 0, f"Allow should exit 0. stderr: {stderr}"
 
     def test_stop_hook_exits_zero_on_block(
-        self, policy_hooks_dir: Path, git_repo_with_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
         """Test that stop hook exits 0 even when blocking."""
-        py_file = git_repo_with_policy / "test.py"
+        py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
-        repo = Repo(git_repo_with_policy)
+        repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
 
         # Hooks should exit 0 and communicate via JSON
         assert code == 0, f"Block should still exit 0. stderr: {stderr}"
 
-    def test_user_prompt_hook_exits_zero(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_user_prompt_hook_exits_zero(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that user prompt hook always exits 0."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
         assert code == 0, f"User prompt hook should exit 0. stderr: {stderr}"
 
-    def test_capture_script_exits_zero(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_capture_script_exits_zero(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that capture script exits 0."""
-        script_path = policy_hooks_dir / "capture_prompt_work_tree.sh"
+        script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
         assert code == 0, f"Capture script should exit 0. stderr: {stderr}"
diff --git a/tests/shell_script_tests/test_policy_stop_hook.py b/tests/shell_script_tests/test_rules_stop_hook.py
similarity index 61%
rename from tests/shell_script_tests/test_policy_stop_hook.py
rename to tests/shell_script_tests/test_rules_stop_hook.py
index bfe9c04c..605f0fcb 100644
--- a/tests/shell_script_tests/test_policy_stop_hook.py
+++ b/tests/shell_script_tests/test_rules_stop_hook.py
@@ -1,6 +1,6 @@
-"""Tests for policy_stop_hook.sh shell script.
+"""Tests for rules_stop_hook.sh shell script.
 
-These tests verify that the policy stop hook correctly outputs JSON
+These tests verify that the rules stop hook correctly outputs JSON
 to block or allow the stop event in Claude Code.
 """
 
@@ -16,8 +16,8 @@
 
 
 @pytest.fixture
-def git_repo_with_src_policy(tmp_path: Path) -> Path:
-    """Create a git repo with a v2 policy file that triggers on src/** changes."""
+def git_repo_with_src_rule(tmp_path: Path) -> Path:
+    """Create a git repo with a v2 rule file that triggers on src/** changes."""
     repo = Repo.init(tmp_path)
 
     readme = tmp_path / "README.md"
@@ -25,20 +25,20 @@ def git_repo_with_src_policy(tmp_path: Path) -> Path:
     repo.index.add(["README.md"])
     repo.index.commit("Initial commit")
 
-    # Create v2 policy directory and file
-    policies_dir = tmp_path / ".deepwork" / "policies"
-    policies_dir.mkdir(parents=True, exist_ok=True)
+    # Create v2 rules directory and file
+    rules_dir = tmp_path / ".deepwork" / "rules"
+    rules_dir.mkdir(parents=True, exist_ok=True)
 
     # Use compare_to: prompt since test repos don't have origin remote
-    policy_file = policies_dir / "test-policy.md"
-    policy_file.write_text(
+    rule_file = rules_dir / "test-rule.md"
+    rule_file.write_text(
         """---
-name: Test Policy
+name: Test Rule
 trigger: "src/**/*"
 compare_to: prompt
 ---
-This is a test policy that fires when src/ files change.
-Please address this policy.
+This is a test rule that fires when src/ files change.
+Please address this rule.
 """
     )
 
@@ -54,29 +54,29 @@ def run_stop_hook(
     cwd: Path,
     hook_input: dict | None = None,
 ) -> tuple[str, str, int]:
-    """Run the policy_stop_hook.sh script and return its output."""
+    """Run the rules_stop_hook.sh script and return its output."""
     return run_shell_script(script_path, cwd, hook_input=hook_input)
 
 
-class TestPolicyStopHookBlocking:
-    """Tests for policy_stop_hook.sh blocking behavior."""
+class TestRulesStopHookBlocking:
+    """Tests for rules_stop_hook.sh blocking behavior."""
 
-    def test_outputs_block_json_when_policy_fires(
-        self, policy_hooks_dir: Path, git_repo_with_src_policy: Path
+    def test_outputs_block_json_when_rule_fires(
+        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
-        """Test that the hook outputs blocking JSON when a policy fires."""
-        # Create a file that triggers the policy
-        src_dir = git_repo_with_src_policy / "src"
+        """Test that the hook outputs blocking JSON when a rule fires."""
+        # Create a file that triggers the rule
+        src_dir = git_repo_with_src_rule / "src"
         src_dir.mkdir(exist_ok=True)
         (src_dir / "main.py").write_text("# New file\n")
 
         # Stage the change
-        repo = Repo(git_repo_with_src_policy)
+        repo = Repo(git_repo_with_src_rule)
         repo.index.add(["src/main.py"])
 
         # Run the stop hook
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
 
         # Parse the output as JSON
         output = stdout.strip()
@@ -91,18 +91,18 @@ def test_outputs_block_json_when_policy_fires(
         assert "decision" in result, f"Expected 'decision' key in JSON: {result}"
         assert result["decision"] == "block", f"Expected decision='block', got: {result}"
         assert "reason" in result, f"Expected 'reason' key in JSON: {result}"
-        assert "Test Policy" in result["reason"], f"Policy name not in reason: {result}"
+        assert "Test Rule" in result["reason"], f"Rule name not in reason: {result}"
 
-    def test_outputs_empty_json_when_no_policy_fires(
-        self, policy_hooks_dir: Path, git_repo_with_src_policy: Path
+    def test_outputs_empty_json_when_no_rule_fires(
+        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
-        """Test that the hook outputs empty JSON when no policy fires."""
-        # Don't create any files that would trigger the policy
-        # (policy triggers on src/** but we haven't created anything in src/)
+        """Test that the hook outputs empty JSON when no rule fires."""
+        # Don't create any files that would trigger the rule
+        # (rule triggers on src/** but we haven't created anything in src/)
 
         # Run the stop hook
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
 
         # Parse the output as JSON
         output = stdout.strip()
@@ -114,16 +114,16 @@ def test_outputs_empty_json_when_no_policy_fires(
             pytest.fail(f"Output is not valid JSON: {output!r}. Error: {e}")
 
         # Should be empty JSON (no blocking)
-        assert result == {}, f"Expected empty JSON when no policies fire, got: {result}"
+        assert result == {}, f"Expected empty JSON when no rules fire, got: {result}"
 
-    def test_exits_early_when_no_policy_dir(self, policy_hooks_dir: Path, git_repo: Path) -> None:
-        """Test that the hook exits cleanly when no policy directory exists."""
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
+    def test_exits_early_when_no_rules_dir(self, rules_hooks_dir: Path, git_repo: Path) -> None:
+        """Test that the hook exits cleanly when no rules directory exists."""
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
         stdout, stderr, code = run_stop_hook(script_path, git_repo)
 
         # Should exit with code 0 and produce no output (or empty)
         assert code == 0, f"Expected exit code 0, got {code}. stderr: {stderr}"
-        # No output is fine when there's no policy directory
+        # No output is fine when there's no rules directory
         output = stdout.strip()
         if output:
             # If there is output, it should be valid JSON
@@ -135,16 +135,16 @@ def test_exits_early_when_no_policy_dir(self, policy_hooks_dir: Path, git_repo:
                 pass
 
     def test_respects_promise_tags(
-        self, policy_hooks_dir: Path, git_repo_with_src_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
-        """Test that promised policies are not re-triggered."""
-        # Create a file that triggers the policy
-        src_dir = git_repo_with_src_policy / "src"
+        """Test that promised rules are not re-triggered."""
+        # Create a file that triggers the rule
+        src_dir = git_repo_with_src_rule / "src"
         src_dir.mkdir(exist_ok=True)
         (src_dir / "main.py").write_text("# New file\n")
 
         # Stage the change
-        repo = Repo(git_repo_with_src_policy)
+        repo = Repo(git_repo_with_src_rule)
         repo.index.add(["src/main.py"])
 
         # Create a mock transcript with the promise tag
@@ -159,7 +159,7 @@ def test_respects_promise_tags(
                             "content": [
                                 {
                                     "type": "text",
-                                    "text": "I've addressed the policy. <promise>✓ Test Policy</promise>",
+                                    "text": "I've addressed the rule. <promise>Test Rule</promise>",
                                 }
                             ]
                         },
@@ -170,9 +170,9 @@ def test_respects_promise_tags(
 
         try:
             # Run the stop hook with transcript path
-            script_path = policy_hooks_dir / "policy_stop_hook.sh"
+            script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path, "hook_event_name": "Stop"}
-            stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_policy, hook_input)
+            stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule, hook_input)
 
             # Parse the output
             output = stdout.strip()
@@ -180,13 +180,13 @@ def test_respects_promise_tags(
 
             result = json.loads(output)
 
-            # Should be empty JSON because the policy was promised
-            assert result == {}, f"Expected empty JSON when policy is promised, got: {result}"
+            # Should be empty JSON because the rule was promised
+            assert result == {}, f"Expected empty JSON when rule is promised, got: {result}"
         finally:
             os.unlink(transcript_path)
 
-    def test_safety_pattern_prevents_firing(self, policy_hooks_dir: Path, tmp_path: Path) -> None:
-        """Test that safety patterns prevent policies from firing."""
+    def test_safety_pattern_prevents_firing(self, rules_hooks_dir: Path, tmp_path: Path) -> None:
+        """Test that safety patterns prevent rules from firing."""
         # Initialize git repo
         repo = Repo.init(tmp_path)
 
@@ -195,14 +195,14 @@ def test_safety_pattern_prevents_firing(self, policy_hooks_dir: Path, tmp_path:
         repo.index.add(["README.md"])
         repo.index.commit("Initial commit")
 
-        # Create v2 policy with a safety pattern
-        policies_dir = tmp_path / ".deepwork" / "policies"
-        policies_dir.mkdir(parents=True, exist_ok=True)
+        # Create v2 rule with a safety pattern
+        rules_dir = tmp_path / ".deepwork" / "rules"
+        rules_dir.mkdir(parents=True, exist_ok=True)
 
-        policy_file = policies_dir / "documentation-policy.md"
-        policy_file.write_text(
+        rule_file = rules_dir / "documentation-rule.md"
+        rule_file.write_text(
             """---
-name: Documentation Policy
+name: Documentation Rule
 trigger: "src/**/*"
 safety: "docs/**/*"
 compare_to: prompt
@@ -228,7 +228,7 @@ def test_safety_pattern_prevents_firing(self, policy_hooks_dir: Path, tmp_path:
         repo.index.add(["src/main.py", "docs/api.md"])
 
         # Run the stop hook
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
         stdout, stderr, code = run_stop_hook(script_path, tmp_path)
 
         # Parse the output
@@ -241,23 +241,23 @@ def test_safety_pattern_prevents_firing(self, policy_hooks_dir: Path, tmp_path:
         assert result == {}, f"Expected empty JSON when safety pattern matches, got: {result}"
 
 
-class TestPolicyStopHookJsonFormat:
-    """Tests for the JSON output format of policy_stop_hook.sh."""
+class TestRulesStopHookJsonFormat:
+    """Tests for the JSON output format of rules_stop_hook.sh."""
 
     def test_json_has_correct_structure(
-        self, policy_hooks_dir: Path, git_repo_with_src_policy: Path
+        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
         """Test that blocking JSON has the correct Claude Code structure."""
-        # Create a file that triggers the policy
-        src_dir = git_repo_with_src_policy / "src"
+        # Create a file that triggers the rule
+        src_dir = git_repo_with_src_rule / "src"
         src_dir.mkdir(exist_ok=True)
         (src_dir / "main.py").write_text("# New file\n")
 
-        repo = Repo(git_repo_with_src_policy)
+        repo = Repo(git_repo_with_src_rule)
         repo.index.add(["src/main.py"])
 
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
 
         result = json.loads(stdout.strip())
 
@@ -270,24 +270,24 @@ def test_json_has_correct_structure(
         assert isinstance(result["reason"], str)
         assert len(result["reason"]) > 0
 
-    def test_reason_contains_policy_instructions(
-        self, policy_hooks_dir: Path, git_repo_with_src_policy: Path
+    def test_reason_contains_rule_instructions(
+        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
-        """Test that the reason includes the policy instructions."""
-        src_dir = git_repo_with_src_policy / "src"
+        """Test that the reason includes the rule instructions."""
+        src_dir = git_repo_with_src_rule / "src"
         src_dir.mkdir(exist_ok=True)
         (src_dir / "main.py").write_text("# New file\n")
 
-        repo = Repo(git_repo_with_src_policy)
+        repo = Repo(git_repo_with_src_rule)
         repo.index.add(["src/main.py"])
 
-        script_path = policy_hooks_dir / "policy_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_policy)
+        script_path = rules_hooks_dir / "rules_stop_hook.sh"
+        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
 
         result = json.loads(stdout.strip())
 
-        # Check that the reason contains the policy content
+        # Check that the reason contains the rule content
         reason = result["reason"]
-        assert "DeepWork Policies Triggered" in reason
-        assert "Test Policy" in reason
-        assert "test policy that fires" in reason
+        assert "DeepWork Rules Triggered" in reason
+        assert "Test Rule" in reason
+        assert "test rule that fires" in reason
diff --git a/tests/shell_script_tests/test_user_prompt_submit.py b/tests/shell_script_tests/test_user_prompt_submit.py
index b503727b..3f1b655e 100644
--- a/tests/shell_script_tests/test_user_prompt_submit.py
+++ b/tests/shell_script_tests/test_user_prompt_submit.py
@@ -28,34 +28,34 @@ def run_user_prompt_submit_hook(
 class TestUserPromptSubmitHookExecution:
     """Tests for user_prompt_submit.sh execution behavior."""
 
-    def test_exits_successfully(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_exits_successfully(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the hook exits with code 0."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
 
         assert code == 0, f"Expected exit code 0, got {code}. stderr: {stderr}"
 
-    def test_creates_deepwork_directory(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_creates_deepwork_directory(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the hook creates .deepwork directory if it doesn't exist."""
         deepwork_dir = git_repo / ".deepwork"
         assert not deepwork_dir.exists(), "Precondition: .deepwork should not exist"
 
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
 
         assert code == 0, f"Script failed with stderr: {stderr}"
         assert deepwork_dir.exists(), "Hook should create .deepwork directory"
 
-    def test_creates_last_work_tree_file(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_creates_last_work_tree_file(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the hook creates .deepwork/.last_work_tree file."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
 
         work_tree_file = git_repo / ".deepwork" / ".last_work_tree"
         assert code == 0, f"Script failed with stderr: {stderr}"
         assert work_tree_file.exists(), "Hook should create .last_work_tree file"
 
-    def test_captures_staged_changes(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_captures_staged_changes(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the hook captures staged file changes."""
         # Create and stage a new file
         new_file = git_repo / "new_file.py"
@@ -63,7 +63,7 @@ def test_captures_staged_changes(self, policy_hooks_dir: Path, git_repo: Path) -
         repo = Repo(git_repo)
         repo.index.add(["new_file.py"])
 
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
 
         assert code == 0, f"Script failed with stderr: {stderr}"
@@ -72,13 +72,13 @@ def test_captures_staged_changes(self, policy_hooks_dir: Path, git_repo: Path) -
         content = work_tree_file.read_text()
         assert "new_file.py" in content, "Staged file should be captured"
 
-    def test_captures_untracked_files(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_captures_untracked_files(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the hook captures untracked files."""
         # Create an untracked file (don't stage it)
         untracked = git_repo / "untracked.txt"
         untracked.write_text("untracked content\n")
 
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
 
         assert code == 0, f"Script failed with stderr: {stderr}"
@@ -99,9 +99,9 @@ class TestUserPromptSubmitHookJsonOutput:
     Either is acceptable; invalid JSON is NOT acceptable.
     """
 
-    def test_output_is_empty_or_valid_json(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_output_is_empty_or_valid_json(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that output is either empty or valid JSON."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
 
         output = stdout.strip()
@@ -114,9 +114,9 @@ def test_output_is_empty_or_valid_json(self, policy_hooks_dir: Path, git_repo: P
             except json.JSONDecodeError as e:
                 pytest.fail(f"Output is not valid JSON: {output!r}. Error: {e}")
 
-    def test_does_not_block_prompt(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_does_not_block_prompt(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the hook does not return a blocking response."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
 
         output = stdout.strip()
@@ -135,18 +135,18 @@ def test_does_not_block_prompt(self, policy_hooks_dir: Path, git_repo: Path) ->
 class TestUserPromptSubmitHookIdempotence:
     """Tests for idempotent behavior of user_prompt_submit.sh."""
 
-    def test_multiple_runs_succeed(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_multiple_runs_succeed(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that the hook can be run multiple times successfully."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
 
         # Run multiple times
         for i in range(3):
             stdout, stderr, code = run_user_prompt_submit_hook(script_path, git_repo)
             assert code == 0, f"Run {i + 1} failed with stderr: {stderr}"
 
-    def test_updates_work_tree_on_new_changes(self, policy_hooks_dir: Path, git_repo: Path) -> None:
+    def test_updates_work_tree_on_new_changes(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that subsequent runs update the work tree state."""
-        script_path = policy_hooks_dir / "user_prompt_submit.sh"
+        script_path = rules_hooks_dir / "user_prompt_submit.sh"
         repo = Repo(git_repo)
 
         # First run - capture initial state
diff --git a/tests/unit/test_hook_wrapper.py b/tests/unit/test_hook_wrapper.py
index 4332c914..5e8db92b 100644
--- a/tests/unit/test_hook_wrapper.py
+++ b/tests/unit/test_hook_wrapper.py
@@ -409,14 +409,14 @@ def test_claude_stop_hook_flow(self) -> None:
         assert hook_input.event == NormalizedEvent.AFTER_AGENT
 
         # Process (would call hook function here)
-        hook_output = HookOutput(decision="block", reason="Policy X requires attention")
+        hook_output = HookOutput(decision="block", reason="Rule X requires attention")
 
         # Denormalize
         output_json = denormalize_output(hook_output, Platform.CLAUDE, hook_input.event)
         result = json.loads(output_json)
 
         assert result["decision"] == "block"
-        assert "Policy X" in result["reason"]
+        assert "Rule X" in result["reason"]
 
     def test_gemini_afteragent_hook_flow(self) -> None:
         """Test complete flow for Gemini AfterAgent hook."""
@@ -436,7 +436,7 @@ def test_gemini_afteragent_hook_flow(self) -> None:
         assert hook_input.event == NormalizedEvent.AFTER_AGENT
 
         # Process (would call hook function here)
-        hook_output = HookOutput(decision="block", reason="Policy Y requires attention")
+        hook_output = HookOutput(decision="block", reason="Rule Y requires attention")
 
         # Denormalize
         output_json = denormalize_output(hook_output, Platform.GEMINI, hook_input.event)
@@ -444,7 +444,7 @@ def test_gemini_afteragent_hook_flow(self) -> None:
 
         # Gemini should get "deny" instead of "block"
         assert result["decision"] == "deny"
-        assert "Policy Y" in result["reason"]
+        assert "Rule Y" in result["reason"]
 
     def test_cross_platform_same_hook_logic(self) -> None:
         """Test that the same hook logic produces correct output for both platforms."""
diff --git a/tests/unit/test_hooks_syncer.py b/tests/unit/test_hooks_syncer.py
index 0a1b1c0c..190fee1b 100644
--- a/tests/unit/test_hooks_syncer.py
+++ b/tests/unit/test_hooks_syncer.py
@@ -47,7 +47,7 @@ def test_from_job_dir_with_hooks(self, temp_dir: Path) -> None:
 UserPromptSubmit:
   - capture.sh
 Stop:
-  - policy_check.sh
+  - rules_check.sh
   - cleanup.sh
 """
         )
@@ -57,7 +57,7 @@ def test_from_job_dir_with_hooks(self, temp_dir: Path) -> None:
         assert result is not None
         assert result.job_name == "test_job"
         assert result.hooks["UserPromptSubmit"] == ["capture.sh"]
-        assert result.hooks["Stop"] == ["policy_check.sh", "cleanup.sh"]
+        assert result.hooks["Stop"] == ["rules_check.sh", "cleanup.sh"]
 
     def test_from_job_dir_no_hooks_file(self, temp_dir: Path) -> None:
         """Test returns None when no hooks file exists."""
diff --git a/tests/unit/test_policy_parser.py b/tests/unit/test_policy_parser.py
deleted file mode 100644
index 62c73cb8..00000000
--- a/tests/unit/test_policy_parser.py
+++ /dev/null
@@ -1,364 +0,0 @@
-"""Tests for policy definition parser."""
-
-from pathlib import Path
-
-import pytest
-
-from deepwork.core.pattern_matcher import matches_any_pattern as matches_pattern
-from deepwork.core.policy_parser import (
-    DEFAULT_COMPARE_TO,
-    DetectionMode,
-    Policy,
-    PolicyParseError,
-    evaluate_policies,
-    evaluate_policy,
-    load_policies_from_directory,
-)
-
-
-class TestMatchesPattern:
-    """Tests for matches_pattern function."""
-
-    def test_simple_glob_match(self) -> None:
-        """Test simple glob pattern matching."""
-        assert matches_pattern("file.py", ["*.py"])
-        assert not matches_pattern("file.js", ["*.py"])
-
-    def test_directory_glob_match(self) -> None:
-        """Test directory pattern matching."""
-        assert matches_pattern("src/file.py", ["src/*"])
-        assert not matches_pattern("test/file.py", ["src/*"])
-
-    def test_recursive_glob_match(self) -> None:
-        """Test recursive ** pattern matching."""
-        assert matches_pattern("src/deep/nested/file.py", ["src/**/*.py"])
-        assert matches_pattern("src/file.py", ["src/**/*.py"])
-        assert not matches_pattern("test/file.py", ["src/**/*.py"])
-
-    def test_multiple_patterns(self) -> None:
-        """Test matching against multiple patterns."""
-        patterns = ["*.py", "*.js"]
-        assert matches_pattern("file.py", patterns)
-        assert matches_pattern("file.js", patterns)
-        assert not matches_pattern("file.txt", patterns)
-
-    def test_config_directory_pattern(self) -> None:
-        """Test pattern like app/config/**/*."""
-        assert matches_pattern("app/config/settings.py", ["app/config/**/*"])
-        assert matches_pattern("app/config/nested/deep.yml", ["app/config/**/*"])
-        assert not matches_pattern("app/other/file.py", ["app/config/**/*"])
-
-
-class TestEvaluatePolicy:
-    """Tests for evaluate_policy function."""
-
-    def test_fires_when_trigger_matches(self) -> None:
-        """Test policy fires when trigger matches."""
-        policy = Policy(
-            name="Test",
-            filename="test",
-            detection_mode=DetectionMode.TRIGGER_SAFETY,
-            triggers=["src/**/*.py"],
-            safety=[],
-            instructions="Check it",
-        )
-        changed_files = ["src/main.py", "README.md"]
-
-        result = evaluate_policy(policy, changed_files)
-        assert result.should_fire is True
-
-    def test_does_not_fire_when_no_trigger_match(self) -> None:
-        """Test policy doesn't fire when no trigger matches."""
-        policy = Policy(
-            name="Test",
-            filename="test",
-            detection_mode=DetectionMode.TRIGGER_SAFETY,
-            triggers=["src/**/*.py"],
-            safety=[],
-            instructions="Check it",
-        )
-        changed_files = ["test/main.py", "README.md"]
-
-        result = evaluate_policy(policy, changed_files)
-        assert result.should_fire is False
-
-    def test_does_not_fire_when_safety_matches(self) -> None:
-        """Test policy doesn't fire when safety file is also changed."""
-        policy = Policy(
-            name="Test",
-            filename="test",
-            detection_mode=DetectionMode.TRIGGER_SAFETY,
-            triggers=["app/config/**/*"],
-            safety=["docs/install_guide.md"],
-            instructions="Update docs",
-        )
-        changed_files = ["app/config/settings.py", "docs/install_guide.md"]
-
-        result = evaluate_policy(policy, changed_files)
-        assert result.should_fire is False
-
-    def test_fires_when_trigger_matches_but_safety_doesnt(self) -> None:
-        """Test policy fires when trigger matches but safety doesn't."""
-        policy = Policy(
-            name="Test",
-            filename="test",
-            detection_mode=DetectionMode.TRIGGER_SAFETY,
-            triggers=["app/config/**/*"],
-            safety=["docs/install_guide.md"],
-            instructions="Update docs",
-        )
-        changed_files = ["app/config/settings.py", "app/main.py"]
-
-        result = evaluate_policy(policy, changed_files)
-        assert result.should_fire is True
-
-    def test_multiple_safety_patterns(self) -> None:
-        """Test policy with multiple safety patterns."""
-        policy = Policy(
-            name="Test",
-            filename="test",
-            detection_mode=DetectionMode.TRIGGER_SAFETY,
-            triggers=["src/auth/**/*"],
-            safety=["SECURITY.md", "docs/security_review.md"],
-            instructions="Security review",
-        )
-
-        # Should not fire if any safety file is changed
-        result1 = evaluate_policy(policy, ["src/auth/login.py", "SECURITY.md"])
-        assert result1.should_fire is False
-        result2 = evaluate_policy(policy, ["src/auth/login.py", "docs/security_review.md"])
-        assert result2.should_fire is False
-
-        # Should fire if no safety files changed
-        result3 = evaluate_policy(policy, ["src/auth/login.py"])
-        assert result3.should_fire is True
-
-
-class TestEvaluatePolicies:
-    """Tests for evaluate_policies function."""
-
-    def test_returns_fired_policies(self) -> None:
-        """Test that evaluate_policies returns all fired policies."""
-        policies = [
-            Policy(
-                name="Policy 1",
-                filename="policy1",
-                detection_mode=DetectionMode.TRIGGER_SAFETY,
-                triggers=["src/**/*"],
-                safety=[],
-                instructions="Do 1",
-            ),
-            Policy(
-                name="Policy 2",
-                filename="policy2",
-                detection_mode=DetectionMode.TRIGGER_SAFETY,
-                triggers=["test/**/*"],
-                safety=[],
-                instructions="Do 2",
-            ),
-        ]
-        changed_files = ["src/main.py", "test/test_main.py"]
-
-        fired = evaluate_policies(policies, changed_files)
-
-        assert len(fired) == 2
-        assert fired[0].policy.name == "Policy 1"
-        assert fired[1].policy.name == "Policy 2"
-
-    def test_skips_promised_policies(self) -> None:
-        """Test that promised policies are skipped."""
-        policies = [
-            Policy(
-                name="Policy 1",
-                filename="policy1",
-                detection_mode=DetectionMode.TRIGGER_SAFETY,
-                triggers=["src/**/*"],
-                safety=[],
-                instructions="Do 1",
-            ),
-            Policy(
-                name="Policy 2",
-                filename="policy2",
-                detection_mode=DetectionMode.TRIGGER_SAFETY,
-                triggers=["src/**/*"],
-                safety=[],
-                instructions="Do 2",
-            ),
-        ]
-        changed_files = ["src/main.py"]
-        promised = {"Policy 1"}
-
-        fired = evaluate_policies(policies, changed_files, promised)
-
-        assert len(fired) == 1
-        assert fired[0].policy.name == "Policy 2"
-
-    def test_returns_empty_when_no_policies_fire(self) -> None:
-        """Test returns empty list when no policies fire."""
-        policies = [
-            Policy(
-                name="Policy 1",
-                filename="policy1",
-                detection_mode=DetectionMode.TRIGGER_SAFETY,
-                triggers=["src/**/*"],
-                safety=[],
-                instructions="Do 1",
-            ),
-        ]
-        changed_files = ["test/test_main.py"]
-
-        fired = evaluate_policies(policies, changed_files)
-
-        assert len(fired) == 0
-
-
-class TestLoadPoliciesFromDirectory:
-    """Tests for load_policies_from_directory function."""
-
-    def test_loads_policies_from_directory(self, temp_dir: Path) -> None:
-        """Test loading policies from a directory."""
-        policies_dir = temp_dir / "policies"
-        policies_dir.mkdir()
-
-        # Create a policy file
-        policy_file = policies_dir / "test-policy.md"
-        policy_file.write_text(
-            """---
-name: Test Policy
-trigger: "src/**/*"
----
-Please check the source files.
-"""
-        )
-
-        policies = load_policies_from_directory(policies_dir)
-
-        assert len(policies) == 1
-        assert policies[0].name == "Test Policy"
-        assert policies[0].triggers == ["src/**/*"]
-        assert policies[0].detection_mode == DetectionMode.TRIGGER_SAFETY
-        assert "check the source files" in policies[0].instructions
-
-    def test_loads_multiple_policies(self, temp_dir: Path) -> None:
-        """Test loading multiple policies."""
-        policies_dir = temp_dir / "policies"
-        policies_dir.mkdir()
-
-        # Create policy files
-        (policies_dir / "policy1.md").write_text(
-            """---
-name: Policy 1
-trigger: "src/**/*"
----
-Instructions for policy 1.
-"""
-        )
-        (policies_dir / "policy2.md").write_text(
-            """---
-name: Policy 2
-trigger: "test/**/*"
----
-Instructions for policy 2.
-"""
-        )
-
-        policies = load_policies_from_directory(policies_dir)
-
-        assert len(policies) == 2
-        names = {p.name for p in policies}
-        assert names == {"Policy 1", "Policy 2"}
-
-    def test_returns_empty_for_empty_directory(self, temp_dir: Path) -> None:
-        """Test that empty directory returns empty list."""
-        policies_dir = temp_dir / "policies"
-        policies_dir.mkdir()
-
-        policies = load_policies_from_directory(policies_dir)
-
-        assert policies == []
-
-    def test_returns_empty_for_nonexistent_directory(self, temp_dir: Path) -> None:
-        """Test that nonexistent directory returns empty list."""
-        policies_dir = temp_dir / "nonexistent"
-
-        policies = load_policies_from_directory(policies_dir)
-
-        assert policies == []
-
-    def test_loads_policy_with_set_detection_mode(self, temp_dir: Path) -> None:
-        """Test loading a policy with set detection mode."""
-        policies_dir = temp_dir / "policies"
-        policies_dir.mkdir()
-
-        policy_file = policies_dir / "source-test-pairing.md"
-        policy_file.write_text(
-            """---
-name: Source/Test Pairing
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
----
-Source and test files should change together.
-"""
-        )
-
-        policies = load_policies_from_directory(policies_dir)
-
-        assert len(policies) == 1
-        assert policies[0].name == "Source/Test Pairing"
-        assert policies[0].detection_mode == DetectionMode.SET
-        assert policies[0].set_patterns == ["src/{path}.py", "tests/{path}_test.py"]
-
-    def test_loads_policy_with_pair_detection_mode(self, temp_dir: Path) -> None:
-        """Test loading a policy with pair detection mode."""
-        policies_dir = temp_dir / "policies"
-        policies_dir.mkdir()
-
-        policy_file = policies_dir / "api-docs.md"
-        policy_file.write_text(
-            """---
-name: API Documentation
-pair:
-  trigger: src/api/{name}.py
-  expects: docs/api/{name}.md
----
-API code requires documentation.
-"""
-        )
-
-        policies = load_policies_from_directory(policies_dir)
-
-        assert len(policies) == 1
-        assert policies[0].name == "API Documentation"
-        assert policies[0].detection_mode == DetectionMode.PAIR
-        assert policies[0].pair_config is not None
-        assert policies[0].pair_config.trigger == "src/api/{name}.py"
-        assert policies[0].pair_config.expects == ["docs/api/{name}.md"]
-
-    def test_loads_policy_with_command_action(self, temp_dir: Path) -> None:
-        """Test loading a policy with command action."""
-        policies_dir = temp_dir / "policies"
-        policies_dir.mkdir()
-
-        policy_file = policies_dir / "format-python.md"
-        policy_file.write_text(
-            """---
-name: Format Python
-trigger: "**/*.py"
-action:
-  command: "ruff format {file}"
-  run_for: each_match
----
-"""
-        )
-
-        policies = load_policies_from_directory(policies_dir)
-
-        assert len(policies) == 1
-        assert policies[0].name == "Format Python"
-        from deepwork.core.policy_parser import ActionType
-
-        assert policies[0].action_type == ActionType.COMMAND
-        assert policies[0].command_action is not None
-        assert policies[0].command_action.command == "ruff format {file}"
-        assert policies[0].command_action.run_for == "each_match"
diff --git a/tests/unit/test_rules_parser.py b/tests/unit/test_rules_parser.py
new file mode 100644
index 00000000..2906816a
--- /dev/null
+++ b/tests/unit/test_rules_parser.py
@@ -0,0 +1,364 @@
+"""Tests for rule definition parser."""
+
+from pathlib import Path
+
+import pytest
+
+from deepwork.core.pattern_matcher import matches_any_pattern as matches_pattern
+from deepwork.core.rules_parser import (
+    DEFAULT_COMPARE_TO,
+    DetectionMode,
+    Rule,
+    RulesParseError,
+    evaluate_rules,
+    evaluate_rule,
+    load_rules_from_directory,
+)
+
+
+class TestMatchesPattern:
+    """Tests for matches_pattern function."""
+
+    def test_simple_glob_match(self) -> None:
+        """Test simple glob pattern matching."""
+        assert matches_pattern("file.py", ["*.py"])
+        assert not matches_pattern("file.js", ["*.py"])
+
+    def test_directory_glob_match(self) -> None:
+        """Test directory pattern matching."""
+        assert matches_pattern("src/file.py", ["src/*"])
+        assert not matches_pattern("test/file.py", ["src/*"])
+
+    def test_recursive_glob_match(self) -> None:
+        """Test recursive ** pattern matching."""
+        assert matches_pattern("src/deep/nested/file.py", ["src/**/*.py"])
+        assert matches_pattern("src/file.py", ["src/**/*.py"])
+        assert not matches_pattern("test/file.py", ["src/**/*.py"])
+
+    def test_multiple_patterns(self) -> None:
+        """Test matching against multiple patterns."""
+        patterns = ["*.py", "*.js"]
+        assert matches_pattern("file.py", patterns)
+        assert matches_pattern("file.js", patterns)
+        assert not matches_pattern("file.txt", patterns)
+
+    def test_config_directory_pattern(self) -> None:
+        """Test pattern like app/config/**/*."""
+        assert matches_pattern("app/config/settings.py", ["app/config/**/*"])
+        assert matches_pattern("app/config/nested/deep.yml", ["app/config/**/*"])
+        assert not matches_pattern("app/other/file.py", ["app/config/**/*"])
+
+
+class TestEvaluateRule:
+    """Tests for evaluate_rule function."""
+
+    def test_fires_when_trigger_matches(self) -> None:
+        """Test rule fires when trigger matches."""
+        rule = Rule(
+            name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
+            triggers=["src/**/*.py"],
+            safety=[],
+            instructions="Check it",
+        )
+        changed_files = ["src/main.py", "README.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+
+    def test_does_not_fire_when_no_trigger_match(self) -> None:
+        """Test rule doesn't fire when no trigger matches."""
+        rule = Rule(
+            name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
+            triggers=["src/**/*.py"],
+            safety=[],
+            instructions="Check it",
+        )
+        changed_files = ["test/main.py", "README.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_does_not_fire_when_safety_matches(self) -> None:
+        """Test rule doesn't fire when safety file is also changed."""
+        rule = Rule(
+            name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
+            triggers=["app/config/**/*"],
+            safety=["docs/install_guide.md"],
+            instructions="Update docs",
+        )
+        changed_files = ["app/config/settings.py", "docs/install_guide.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_fires_when_trigger_matches_but_safety_doesnt(self) -> None:
+        """Test rule fires when trigger matches but safety doesn't."""
+        rule = Rule(
+            name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
+            triggers=["app/config/**/*"],
+            safety=["docs/install_guide.md"],
+            instructions="Update docs",
+        )
+        changed_files = ["app/config/settings.py", "app/main.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+
+    def test_multiple_safety_patterns(self) -> None:
+        """Test rule with multiple safety patterns."""
+        rule = Rule(
+            name="Test",
+            filename="test",
+            detection_mode=DetectionMode.TRIGGER_SAFETY,
+            triggers=["src/auth/**/*"],
+            safety=["SECURITY.md", "docs/security_review.md"],
+            instructions="Security review",
+        )
+
+        # Should not fire if any safety file is changed
+        result1 = evaluate_rule(rule, ["src/auth/login.py", "SECURITY.md"])
+        assert result1.should_fire is False
+        result2 = evaluate_rule(rule, ["src/auth/login.py", "docs/security_review.md"])
+        assert result2.should_fire is False
+
+        # Should fire if no safety files changed
+        result3 = evaluate_rule(rule, ["src/auth/login.py"])
+        assert result3.should_fire is True
+
+
+class TestEvaluateRules:
+    """Tests for evaluate_rules function."""
+
+    def test_returns_fired_rules(self) -> None:
+        """Test that evaluate_rules returns all fired rules."""
+        rules = [
+            Rule(
+                name="Rule 1",
+                filename="rule1",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
+                triggers=["src/**/*"],
+                safety=[],
+                instructions="Do 1",
+            ),
+            Rule(
+                name="Rule 2",
+                filename="rule2",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
+                triggers=["test/**/*"],
+                safety=[],
+                instructions="Do 2",
+            ),
+        ]
+        changed_files = ["src/main.py", "test/test_main.py"]
+
+        fired = evaluate_rules(rules, changed_files)
+
+        assert len(fired) == 2
+        assert fired[0].rule.name == "Rule 1"
+        assert fired[1].rule.name == "Rule 2"
+
+    def test_skips_promised_rules(self) -> None:
+        """Test that promised rules are skipped."""
+        rules = [
+            Rule(
+                name="Rule 1",
+                filename="rule1",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
+                triggers=["src/**/*"],
+                safety=[],
+                instructions="Do 1",
+            ),
+            Rule(
+                name="Rule 2",
+                filename="rule2",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
+                triggers=["src/**/*"],
+                safety=[],
+                instructions="Do 2",
+            ),
+        ]
+        changed_files = ["src/main.py"]
+        promised = {"Rule 1"}
+
+        fired = evaluate_rules(rules, changed_files, promised)
+
+        assert len(fired) == 1
+        assert fired[0].rule.name == "Rule 2"
+
+    def test_returns_empty_when_no_rules_fire(self) -> None:
+        """Test returns empty list when no rules fire."""
+        rules = [
+            Rule(
+                name="Rule 1",
+                filename="rule1",
+                detection_mode=DetectionMode.TRIGGER_SAFETY,
+                triggers=["src/**/*"],
+                safety=[],
+                instructions="Do 1",
+            ),
+        ]
+        changed_files = ["test/test_main.py"]
+
+        fired = evaluate_rules(rules, changed_files)
+
+        assert len(fired) == 0
+
+
+class TestLoadRulesFromDirectory:
+    """Tests for load_rules_from_directory function."""
+
+    def test_loads_rules_from_directory(self, temp_dir: Path) -> None:
+        """Test loading rules from a directory."""
+        rules_dir = temp_dir / "rules"
+        rules_dir.mkdir()
+
+        # Create a rule file
+        rule_file = rules_dir / "test-rule.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "src/**/*"
+---
+Please check the source files.
+"""
+        )
+
+        rules = load_rules_from_directory(rules_dir)
+
+        assert len(rules) == 1
+        assert rules[0].name == "Test Rule"
+        assert rules[0].triggers == ["src/**/*"]
+        assert rules[0].detection_mode == DetectionMode.TRIGGER_SAFETY
+        assert "check the source files" in rules[0].instructions
+
+    def test_loads_multiple_rules(self, temp_dir: Path) -> None:
+        """Test loading multiple rules."""
+        rules_dir = temp_dir / "rules"
+        rules_dir.mkdir()
+
+        # Create rule files
+        (rules_dir / "rule1.md").write_text(
+            """---
+name: Rule 1
+trigger: "src/**/*"
+---
+Instructions for rule 1.
+"""
+        )
+        (rules_dir / "rule2.md").write_text(
+            """---
+name: Rule 2
+trigger: "test/**/*"
+---
+Instructions for rule 2.
+"""
+        )
+
+        rules = load_rules_from_directory(rules_dir)
+
+        assert len(rules) == 2
+        names = {r.name for r in rules}
+        assert names == {"Rule 1", "Rule 2"}
+
+    def test_returns_empty_for_empty_directory(self, temp_dir: Path) -> None:
+        """Test that empty directory returns empty list."""
+        rules_dir = temp_dir / "rules"
+        rules_dir.mkdir()
+
+        rules = load_rules_from_directory(rules_dir)
+
+        assert rules == []
+
+    def test_returns_empty_for_nonexistent_directory(self, temp_dir: Path) -> None:
+        """Test that nonexistent directory returns empty list."""
+        rules_dir = temp_dir / "nonexistent"
+
+        rules = load_rules_from_directory(rules_dir)
+
+        assert rules == []
+
+    def test_loads_rule_with_set_detection_mode(self, temp_dir: Path) -> None:
+        """Test loading a rule with set detection mode."""
+        rules_dir = temp_dir / "rules"
+        rules_dir.mkdir()
+
+        rule_file = rules_dir / "source-test-pairing.md"
+        rule_file.write_text(
+            """---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+"""
+        )
+
+        rules = load_rules_from_directory(rules_dir)
+
+        assert len(rules) == 1
+        assert rules[0].name == "Source/Test Pairing"
+        assert rules[0].detection_mode == DetectionMode.SET
+        assert rules[0].set_patterns == ["src/{path}.py", "tests/{path}_test.py"]
+
+    def test_loads_rule_with_pair_detection_mode(self, temp_dir: Path) -> None:
+        """Test loading a rule with pair detection mode."""
+        rules_dir = temp_dir / "rules"
+        rules_dir.mkdir()
+
+        rule_file = rules_dir / "api-docs.md"
+        rule_file.write_text(
+            """---
+name: API Documentation
+pair:
+  trigger: src/api/{name}.py
+  expects: docs/api/{name}.md
+---
+API code requires documentation.
+"""
+        )
+
+        rules = load_rules_from_directory(rules_dir)
+
+        assert len(rules) == 1
+        assert rules[0].name == "API Documentation"
+        assert rules[0].detection_mode == DetectionMode.PAIR
+        assert rules[0].pair_config is not None
+        assert rules[0].pair_config.trigger == "src/api/{name}.py"
+        assert rules[0].pair_config.expects == ["docs/api/{name}.md"]
+
+    def test_loads_rule_with_command_action(self, temp_dir: Path) -> None:
+        """Test loading a rule with command action."""
+        rules_dir = temp_dir / "rules"
+        rules_dir.mkdir()
+
+        rule_file = rules_dir / "format-python.md"
+        rule_file.write_text(
+            """---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
+"""
+        )
+
+        rules = load_rules_from_directory(rules_dir)
+
+        assert len(rules) == 1
+        assert rules[0].name == "Format Python"
+        from deepwork.core.rules_parser import ActionType
+
+        assert rules[0].action_type == ActionType.COMMAND
+        assert rules[0].command_action is not None
+        assert rules[0].command_action.command == "ruff format {file}"
+        assert rules[0].command_action.run_for == "each_match"

From aa77a79dd298e2e295a6509215c9d44850bc2e09 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 14:15:14 -0700
Subject: [PATCH 12/21] Remove stale deepwork_policy hook entries from
 settings.json

The previous commit renamed deepwork_policy to deepwork_rules but left
duplicate hook entries in settings.json pointing to the old paths.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 .claude/settings.json | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/.claude/settings.json b/.claude/settings.json
index aa1b950c..84a93bed 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -92,15 +92,6 @@
   },
   "hooks": {
     "UserPromptSubmit": [
-      {
-        "matcher": "",
-        "hooks": [
-          {
-            "type": "command",
-            "command": ".deepwork/jobs/deepwork_policy/hooks/user_prompt_submit.sh"
-          }
-        ]
-      },
       {
         "matcher": "",
         "hooks": [
@@ -112,15 +103,6 @@
       }
     ],
     "Stop": [
-      {
-        "matcher": "",
-        "hooks": [
-          {
-            "type": "command",
-            "command": ".deepwork/jobs/deepwork_policy/hooks/policy_stop_hook.sh"
-          }
-        ]
-      },
       {
         "matcher": "",
         "hooks": [

From 5911dcb6fedddb51341466ca02dc47ffd80e1bb3 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 14:28:48 -0700
Subject: [PATCH 13/21] Add comprehensive test coverage and fix
 security/linting issues

- Add 134 new tests covering test plan scenarios:
  - test_pattern_matcher.py: glob patterns, variable extraction, resolution
  - test_command_executor.py: variable substitution, command execution
  - test_rules_queue.py: queue entry lifecycle, hash calculation
  - test_schema_validation.py: required fields, mutual exclusivity
  - Extended test_rules_parser.py with correspondence sets/pairs tests

- Security: Add shlex.quote() to command_executor.py to prevent
  command injection via malicious file paths

- Fix ruff linting issues in pattern_matcher.py, rules_queue.py,
  and rules_check.py (f-strings, datetime.UTC, open mode)

- Update .gitignore comment from "policy" to "rules"

- Remove doc/test_scenarios.md (all scenarios now covered by tests)
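
As a rough illustration of the shlex.quote() item above: the sketch below is
not the actual command_executor.py code, and `substitute_file` is a
hypothetical helper name assumed for the example. It shows how quoting a file
path before substituting it into a `{file}` command template keeps shell
metacharacters in the path from being interpreted.

```python
import shlex


def substitute_file(command_template: str, file_path: str) -> str:
    """Hypothetical helper: fill {file} with a shell-quoted path."""
    # shlex.quote() wraps the path in single quotes when it contains
    # characters the shell would otherwise treat specially.
    return command_template.replace("{file}", shlex.quote(file_path))


# An ordinary path contains only safe characters and is left unchanged.
safe = substitute_file("ruff format {file}", "src/main.py")

# A hostile path stays a single argument instead of a second command.
hostile = substitute_file("ruff format {file}", "x.py; rm -rf ~")
```

Splitting the resulting string with shlex.split() yields the hostile path as
one argument, which is the property the fix relies on.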

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 .deepwork/.gitignore                  |   2 +-
 doc/test_scenarios.md                 | 489 --------------------------
 src/deepwork/core/command_executor.py |  10 +-
 src/deepwork/core/pattern_matcher.py  |   6 +-
 src/deepwork/core/rules_queue.py      |  14 +-
 src/deepwork/hooks/rules_check.py     |   2 +-
 tests/unit/test_command_executor.py   | 205 +++++++++++
 tests/unit/test_pattern_matcher.py    | 205 +++++++++++
 tests/unit/test_rules_parser.py       | 371 +++++++++++++++++++
 tests/unit/test_rules_queue.py        | 352 ++++++++++++++++++
 tests/unit/test_schema_validation.py  | 323 +++++++++++++++++
 11 files changed, 1475 insertions(+), 504 deletions(-)
 delete mode 100644 doc/test_scenarios.md
 create mode 100644 tests/unit/test_command_executor.py
 create mode 100644 tests/unit/test_pattern_matcher.py
 create mode 100644 tests/unit/test_rules_queue.py
 create mode 100644 tests/unit/test_schema_validation.py

diff --git a/.deepwork/.gitignore b/.deepwork/.gitignore
index eed09d08..0ef10e54 100644
--- a/.deepwork/.gitignore
+++ b/.deepwork/.gitignore
@@ -1,3 +1,3 @@
 # DeepWork temporary files
-# These files are used for policy evaluation during sessions
+# These files are used for rules evaluation during sessions
 .last_work_tree
diff --git a/doc/test_scenarios.md b/doc/test_scenarios.md
deleted file mode 100644
index 137120c1..00000000
--- a/doc/test_scenarios.md
+++ /dev/null
@@ -1,489 +0,0 @@
-# Rules System Test Scenarios
-
-This document describes test scenarios for validating the rules system implementation.
-
-## 1. Pattern Matching
-
-### 1.1 Basic Glob Patterns
-
-| ID | Scenario | Pattern | File | Expected |
-|----|----------|---------|------|----------|
-| PM-1.1.1 | Exact match | `README.md` | `README.md` | Match |
-| PM-1.1.2 | Exact no match | `README.md` | `readme.md` | No match |
-| PM-1.1.3 | Single wildcard | `*.py` | `main.py` | Match |
-| PM-1.1.4 | Single wildcard nested | `*.py` | `src/main.py` | No match |
-| PM-1.1.5 | Double wildcard | `**/*.py` | `src/main.py` | Match |
-| PM-1.1.6 | Double wildcard deep | `**/*.py` | `src/a/b/c/main.py` | Match |
-| PM-1.1.7 | Double wildcard root | `**/*.py` | `main.py` | Match |
-| PM-1.1.8 | Directory prefix | `src/**/*` | `src/foo.py` | Match |
-| PM-1.1.9 | Directory prefix deep | `src/**/*` | `src/a/b/c.py` | Match |
-| PM-1.1.10 | Directory no match | `src/**/*` | `lib/foo.py` | No match |
-| PM-1.1.11 | Brace expansion | `*.{js,ts}` | `app.ts` | Match |
-| PM-1.1.12 | Brace expansion second | `*.{js,ts}` | `app.js` | Match |
-| PM-1.1.13 | Brace expansion no match | `*.{js,ts}` | `app.py` | No match |
-
-### 1.2 Variable Patterns
-
-| ID | Scenario | Pattern | File | Expected Variables |
-|----|----------|---------|------|-------------------|
-| PM-1.2.1 | Single var path | `src/{path}.py` | `src/foo/bar.py` | `{path: "foo/bar"}` |
-| PM-1.2.2 | Single var name | `src/{name}.py` | `src/utils.py` | `{name: "utils"}` |
-| PM-1.2.3 | Name no nested | `src/{name}.py` | `src/foo/bar.py` | No match |
-| PM-1.2.4 | Two variables | `{dir}/{name}.py` | `src/main.py` | `{dir: "src", name: "main"}` |
-| PM-1.2.5 | Prefix and suffix | `test_{name}_test.py` | `test_foo_test.py` | `{name: "foo"}` |
-| PM-1.2.6 | Nested path | `src/{path}/index.py` | `src/a/b/index.py` | `{path: "a/b"}` |
-| PM-1.2.7 | Explicit multi | `src/{**mod}/main.py` | `src/a/b/c/main.py` | `{mod: "a/b/c"}` |
-| PM-1.2.8 | Explicit single | `src/{*name}.py` | `src/utils.py` | `{name: "utils"}` |
-| PM-1.2.9 | Mixed explicit | `{*dir}/{**path}.py` | `src/a/b/c.py` | `{dir: "src", path: "a/b/c"}` |
-
-### 1.3 Pattern Resolution
-
-| ID | Scenario | Pattern | Variables | Expected Output |
-|----|----------|---------|-----------|-----------------|
-| PM-1.3.1 | Simple substitution | `tests/{path}_test.py` | `{path: "foo"}` | `tests/foo_test.py` |
-| PM-1.3.2 | Nested path | `tests/{path}_test.py` | `{path: "a/b/c"}` | `tests/a/b/c_test.py` |
-| PM-1.3.3 | Multiple vars | `{dir}/test_{name}.py` | `{dir: "tests", name: "foo"}` | `tests/test_foo.py` |
-
-## 2. Instruction Rules
-
-### 2.1 Basic Trigger/Safety
-
-| ID | Scenario | Changed Files | Trigger | Safety | Expected |
-|----|----------|---------------|---------|--------|----------|
-| IR-2.1.1 | Trigger match, no safety | `["src/main.py"]` | `src/**/*.py` | None | Fire |
-| IR-2.1.2 | Trigger match, safety match | `["src/main.py", "README.md"]` | `src/**/*.py` | `README.md` | No fire |
-| IR-2.1.3 | Trigger no match | `["docs/readme.md"]` | `src/**/*.py` | None | No fire |
-| IR-2.1.4 | Multiple triggers, one match | `["lib/utils.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
-| IR-2.1.5 | Safety match only | `["README.md"]` | `src/**/*.py` | `README.md` | No fire |
-| IR-2.1.6 | Multiple safety, one match | `["src/main.py", "CHANGELOG.md"]` | `src/**/*.py` | `["README.md", "CHANGELOG.md"]` | No fire |
-| IR-2.1.7 | Multiple triggers, multiple files | `["src/a.py", "lib/b.py"]` | `["src/**/*.py", "lib/**/*.py"]` | None | Fire |
-
-### 2.2 Compare Modes
-
-```
-Setup: Branch diverged 3 commits ago from main
-- Commit 1: Added src/feature.py
-- Commit 2: Modified src/feature.py
-- Commit 3: Added tests/feature_test.py
-- Unstaged: Modified src/utils.py
-```
-
-| ID | Scenario | compare_to | Expected Changed Files |
-|----|----------|------------|----------------------|
-| IR-2.2.1 | Base comparison | `base` | `["src/feature.py", "tests/feature_test.py", "src/utils.py"]` |
-| IR-2.2.2 | Default tip (main ahead 1) | `default_tip` | All base + main's changes |
-| IR-2.2.3 | Prompt baseline (captured after commit 2) | `prompt` | `["tests/feature_test.py", "src/utils.py"]` |
-
-### 2.3 Promise Tags
-
-Promise tags use the rule's `name` field (not filename) with a checkmark prefix for human readability.
-
-| ID | Scenario | Conversation Contains | Rule `name` | Expected |
-|----|----------|----------------------|---------------|----------|
-| IR-2.3.1 | Standard promise | `<promise>✓ README Accuracy</promise>` | `README Accuracy` | Suppressed |
-| IR-2.3.2 | Without checkmark | `<promise>README Accuracy</promise>` | `README Accuracy` | Suppressed |
-| IR-2.3.3 | Case insensitive | `<promise>✓ readme accuracy</promise>` | `README Accuracy` | Suppressed |
-| IR-2.3.4 | Whitespace | `<promise>  ✓ README Accuracy  </promise>` | `README Accuracy` | Suppressed |
-| IR-2.3.5 | No promise | (none) | `README Accuracy` | Not suppressed |
-| IR-2.3.6 | Wrong promise | `<promise>✓ Other Rule</promise>` | `README Accuracy` | Not suppressed |
-| IR-2.3.7 | Multiple promises | `<promise>✓ A</promise><promise>✓ B</promise>` | `A` | Suppressed |
-
-## 3. Correspondence Sets
-
-### 3.1 Two-Pattern Sets
-
-```yaml
-set:
-  - "src/{path}.py"
-  - "tests/{path}_test.py"
-```
-
-| ID | Scenario | Changed Files | Expected |
-|----|----------|---------------|----------|
-| CS-3.1.1 | Both changed | `["src/foo.py", "tests/foo_test.py"]` | No fire (satisfied) |
-| CS-3.1.2 | Only source | `["src/foo.py"]` | Fire (missing test) |
-| CS-3.1.3 | Only test | `["tests/foo_test.py"]` | Fire (missing source) |
-| CS-3.1.4 | Nested both | `["src/a/b.py", "tests/a/b_test.py"]` | No fire |
-| CS-3.1.5 | Nested only source | `["src/a/b.py"]` | Fire |
-| CS-3.1.6 | Unrelated file | `["docs/readme.md"]` | No fire |
-| CS-3.1.7 | Source + unrelated | `["src/foo.py", "docs/readme.md"]` | Fire |
-| CS-3.1.8 | Both + unrelated | `["src/foo.py", "tests/foo_test.py", "docs/readme.md"]` | No fire |
-
-### 3.2 Three-Pattern Sets
-
-```yaml
-set:
-  - "models/{name}.py"
-  - "schemas/{name}.py"
-  - "migrations/{name}.sql"
-```
-
-| ID | Scenario | Changed Files | Expected |
-|----|----------|---------------|----------|
-| CS-3.2.1 | All three | `["models/user.py", "schemas/user.py", "migrations/user.sql"]` | No fire |
-| CS-3.2.2 | Two of three | `["models/user.py", "schemas/user.py"]` | Fire (missing migration) |
-| CS-3.2.3 | One of three | `["models/user.py"]` | Fire (missing 2) |
-| CS-3.2.4 | Different names | `["models/user.py", "schemas/order.py"]` | Fire (both incomplete) |
-
-### 3.3 Edge Cases
-
-| ID | Scenario | Changed Files | Expected |
-|----|----------|---------------|----------|
-| CS-3.3.1 | File matches both patterns | `["src/test_foo_test.py"]` | Depends on pattern specificity |
-| CS-3.3.2 | Empty path variable | (N/A - patterns require content) | Pattern validation error |
-| CS-3.3.3 | Multiple files same pattern | `["src/a.py", "src/b.py"]` | Fire for each without corresponding test |
-
-## 4. Correspondence Pairs
-
-### 4.1 Basic Pairs
-
-```yaml
-pair:
-  trigger: "api/{path}.py"
-  expects: "docs/api/{path}.md"
-```
-
-| ID | Scenario | Changed Files | Expected |
-|----|----------|---------------|----------|
-| CP-4.1.1 | Both changed | `["api/users.py", "docs/api/users.md"]` | No fire |
-| CP-4.1.2 | Only trigger | `["api/users.py"]` | Fire |
-| CP-4.1.3 | Only expected | `["docs/api/users.md"]` | No fire (directional) |
-| CP-4.1.4 | Trigger + unrelated | `["api/users.py", "README.md"]` | Fire |
-| CP-4.1.5 | Expected + unrelated | `["docs/api/users.md", "README.md"]` | No fire |
-
-### 4.2 Multi-Expects Pairs
-
-```yaml
-pair:
-  trigger: "api/{path}.py"
-  expects:
-    - "docs/api/{path}.md"
-    - "openapi/{path}.yaml"
-```
-
-| ID | Scenario | Changed Files | Expected |
-|----|----------|---------------|----------|
-| CP-4.2.1 | All three | `["api/users.py", "docs/api/users.md", "openapi/users.yaml"]` | No fire |
-| CP-4.2.2 | Trigger + one expect | `["api/users.py", "docs/api/users.md"]` | Fire (missing openapi) |
-| CP-4.2.3 | Only trigger | `["api/users.py"]` | Fire (missing both) |
-| CP-4.2.4 | Both expects only | `["docs/api/users.md", "openapi/users.yaml"]` | No fire |
-
-## 5. Command Rules
-
-### 5.1 Basic Commands
-
-```yaml
-- name: "Format Python"
-  trigger: "**/*.py"
-  action:
-    command: "ruff format {file}"
-    run_for: each_match
-```
-
-| ID | Scenario | Changed Files | Expected Behavior |
-|----|----------|---------------|-------------------|
-| CMD-5.1.1 | Single file | `["src/main.py"]` | Run `ruff format src/main.py` |
-| CMD-5.1.2 | Multiple files | `["src/a.py", "src/b.py"]` | Run command for each file |
-| CMD-5.1.3 | Non-matching | `["README.md"]` | No command run |
-
-### 5.2 All Matches Mode
-
-```yaml
-action:
-  command: "eslint --fix {files}"
-  run_for: all_matches
-```
-
-| ID | Scenario | Changed Files | Expected Command |
-|----|----------|---------------|------------------|
-| CMD-5.2.1 | Multiple files | `["a.js", "b.js", "c.js"]` | `eslint --fix a.js b.js c.js` |
-| CMD-5.2.2 | Single file | `["a.js"]` | `eslint --fix a.js` |
-
-### 5.3 Command Errors
-
-| ID | Scenario | Command Result | Expected |
-|----|----------|----------------|----------|
-| CMD-5.3.1 | Exit code 0 | Success | Pass |
-| CMD-5.3.2 | Exit code 1 | Failure | Fail, show stderr |
-| CMD-5.3.3 | Timeout | Command hangs | Fail, timeout error |
-| CMD-5.3.4 | Command not found | Not executable | Fail, not found error |
-
-## 6. Queue System
-
-### 6.1 Queue Entry Lifecycle
-
-| ID | Scenario | Initial State | Action | Final State |
-|----|----------|---------------|--------|-------------|
-| QS-6.1.1 | New trigger | (none) | Trigger detected | `.queued` |
-| QS-6.1.2 | Safety suppression | `.queued` | Safety pattern matches | `.skipped` |
-| QS-6.1.3 | Prompt addressed | `.queued` | Promise tag found | `.passed` |
-| QS-6.1.4 | Command success | `.queued` | Command passes | `.passed` |
-| QS-6.1.5 | Command failure | `.queued` | Command fails | `.failed` |
-| QS-6.1.6 | Re-trigger same | `.passed` | Same files changed | No new entry |
-| QS-6.1.7 | Re-trigger different | `.passed` | Different files | New `.queued` |
-
-### 6.2 Hash Calculation
-
-| ID | Scenario | Rule | Files | Baseline | Expected Hash Differs? |
-|----|----------|--------|-------|----------|------------------------|
-| QS-6.2.1 | Same everything | RuleA | `[a.py]` | commit1 | Same hash |
-| QS-6.2.2 | Different files | RuleA | `[a.py]` vs `[b.py]` | commit1 | Different |
-| QS-6.2.3 | Different baseline | RuleA | `[a.py]` | commit1 vs commit2 | Different |
-| QS-6.2.4 | Different rule | RuleA vs RuleB | `[a.py]` | commit1 | Different |
-
-### 6.3 Queue Cleanup
-
-| ID | Scenario | Entry Age | Entry Status | Expected |
-|----|----------|-----------|--------------|----------|
-| QS-6.3.1 | Old queued | 25 hours | `.queued` | Pruned |
-| QS-6.3.2 | Recent queued | 1 hour | `.queued` | Kept |
-| QS-6.3.3 | Old passed | 2 hours | `.passed` | Pruned |
-| QS-6.3.4 | Recent passed | 30 min | `.passed` | Kept |
-| QS-6.3.5 | Old failed | 25 hours | `.failed` | Pruned |
-
-### 6.4 Concurrent Access
-
-| ID | Scenario | Process A | Process B | Expected |
-|----|----------|-----------|-----------|----------|
-| QS-6.4.1 | Simultaneous create | Creates entry | Creates entry | One wins, other no-ops |
-| QS-6.4.2 | Create during eval | Creating | Evaluating existing | A creates new, B continues |
-| QS-6.4.3 | Both evaluate same | Evaluating | Evaluating | File locking prevents race |
-
-## 7. Output Management
-
-### 7.1 Output Batching
-
-| ID | Scenario | Triggered Rules | Expected Output |
-|----|----------|-----------------|-----------------|
-| OM-7.1.1 | Single rule | 1 | Full instructions |
-| OM-7.1.2 | Two rules | 2 | Both, grouped |
-| OM-7.1.3 | Many rules | 10 | Batched by rule name |
-| OM-7.1.4 | Same rule multiple files | 3 Source/Test pairs | Grouped under single heading |
-
-### 7.2 Output Format
-
-| ID | Scenario | Input | Expected Format |
-|----|----------|-------|-----------------|
-| OM-7.2.1 | Correspondence violation | `src/foo.py` missing `tests/foo_test.py` | `src/foo.py → tests/foo_test.py` |
-| OM-7.2.2 | Multiple same rule | 3 correspondence violations | Single heading, 3 lines |
-| OM-7.2.3 | Instruction rule | Source files changed | Short summary + instructions |
-
-## 8. Schema Validation
-
-### 8.1 Required Fields
-
-| ID | Scenario | Missing Field | Expected Error |
-|----|----------|---------------|----------------|
-| SV-8.1.1 | Missing name | `name` | "required field 'name'" |
-| SV-8.1.2 | Missing detection mode | no `trigger`, `set`, or `pair` | "must have 'trigger', 'set', or 'pair'" |
-| SV-8.1.3 | Missing markdown body | empty body (prompt action) | "instruction rules require markdown body" |
-| SV-8.1.4 | Missing set patterns | `set` is empty | "set requires at least 2 patterns" |
-
-### 8.2 Mutually Exclusive Fields
-
-| ID | Scenario | Fields Present | Expected Error |
-|----|----------|----------------|----------------|
-| SV-8.2.1 | Both trigger and set | `trigger` + `set` | "use trigger, set, or pair" |
-| SV-8.2.2 | Both trigger and pair | `trigger` + `pair` | "use trigger, set, or pair" |
-| SV-8.2.3 | All detection modes | `trigger` + `set` + `pair` | "use only one detection mode" |
-
-### 8.3 Pattern Validation
-
-| ID | Scenario | Pattern | Expected Error |
-|----|----------|---------|----------------|
-| SV-8.3.1 | Unclosed brace | `src/{path.py` | "unclosed brace" |
-| SV-8.3.2 | Empty variable | `src/{}.py` | "empty variable name" |
-| SV-8.3.3 | Invalid chars in var | `src/{path/name}.py` | "invalid variable name" |
-| SV-8.3.4 | Duplicate variable | `{path}/{path}.py` | "duplicate variable 'path'" |
-
-### 8.4 Value Validation
-
-| ID | Scenario | Field | Value | Expected Error |
-|----|----------|-------|-------|----------------|
-| SV-8.4.1 | Invalid compare_to | `compare_to` | `"yesterday"` | "must be base, default_tip, or prompt" |
-| SV-8.4.2 | Invalid run_for | `run_for` | `"first_match"` | "must be each_match or all_matches" |
-
-## 9. Integration Tests
-
-### 9.1 End-to-End Instruction Rule
-
-```
-Given: Rule requiring tests for source changes
-When: User modifies src/auth/login.py without test
-Then:
-  1. Stop hook fires
-  2. Detector creates queue entry
-  3. Evaluator returns instructions
-  4. Agent sees rule message
-  5. Agent adds tests
-  6. Agent includes promise tag
-  7. Next stop: queue entry marked passed
-  8. Agent can stop successfully
-```
-
-### 9.2 End-to-End Command Rule
-
-```
-Given: Auto-format rule for Python files
-When: User creates unformatted src/new_file.py
-Then:
-  1. Stop hook fires
-  2. Detector creates queue entry
-  3. Evaluator runs formatter
-  4. Formatter modifies file
-  5. Evaluator verifies idempotency
-  6. Queue entry marked passed
-  7. Agent notified of formatting changes
-```
-
-### 9.3 End-to-End Correspondence Set
-
-```
-Given: Source/test pairing rule
-When: User modifies src/utils.py only
-Then:
-  1. Detector matches src/utils.py to pattern
-  2. Resolver calculates expected tests/utils_test.py
-  3. tests/utils_test.py not in changed files
-  4. Queue entry created for incomplete correspondence
-  5. Evaluator returns instructions
-  6. Agent sees "expected tests/utils_test.py to change"
-```
-
-### 9.4 Multiple Rules Same File
-
-```
-Given:
-  - Rule A: "Format Python" (command)
-  - Rule B: "Test Coverage" (set)
-  - Rule C: "README Accuracy" (instruction)
-When: User modifies src/main.py
-Then:
-  1. All three rules trigger
-  2. Command rule runs first
-  3. Set rule checks for test
-  4. Instruction rule prepares message
-  5. Agent sees batched output with all requirements
-```
-
-### 9.5 Safety Pattern Across Rules
-
-```
-Given:
-  - Rule A: trigger=src/**/*.py, safety=CHANGELOG.md
-  - Rule B: trigger=src/**/*.py, safety=README.md
-When: User modifies src/main.py and CHANGELOG.md
-Then:
-  1. Rule A: safety match, skipped
-  2. Rule B: no safety match, fires
-  3. Only Rule B instructions shown
-```
-
-## 10. Performance Tests
-
-### 10.1 Large File Count
-
-| ID | Scenario | File Count | Expected |
-|----|----------|------------|----------|
-| PT-10.1.1 | Many changed files | 100 | < 1s evaluation |
-| PT-10.1.2 | Very many files | 1000 | < 5s evaluation |
-| PT-10.1.3 | Pattern-heavy | 50 rules, 100 files | < 2s evaluation |
-
-### 10.2 Queue Size
-
-| ID | Scenario | Queue Entries | Expected |
-|----|----------|---------------|----------|
-| PT-10.2.1 | Moderate queue | 100 entries | < 100ms load |
-| PT-10.2.2 | Large queue | 1000 entries | < 500ms load |
-| PT-10.2.3 | Cleanup performance | 10000 old entries | < 1s cleanup |
-
-### 10.3 Pattern Matching
-
-| ID | Scenario | Patterns | Files | Expected |
-|----|----------|----------|-------|----------|
-| PT-10.3.1 | Simple patterns | 10 | 100 | < 10ms |
-| PT-10.3.2 | Complex patterns | 50 with variables | 100 | < 50ms |
-| PT-10.3.3 | Deep recursion | `**/**/**/*.py` | 1000 | < 100ms |
-
-## Test Data Fixtures
-
-### Sample Rule Files
-
-Rules are stored as individual markdown files in `.deepwork/rules/`:
-
-**`.deepwork/rules/readme-accuracy.md`**
-```markdown
----
-name: README Accuracy
-trigger: src/**/*
-safety: README.md
----
-Please review README.md for accuracy.
-```
-
-**`.deepwork/rules/source-test-pairing.md`**
-```markdown
----
-name: Source/Test Pairing
-set:
-  - src/{path}.py
-  - tests/{path}_test.py
----
-Source and test should change together.
-```
-
-**`.deepwork/rules/api-documentation.md`**
-```markdown
----
-name: API Documentation
-pair:
-  trigger: api/{module}.py
-  expects: docs/api/{module}.md
----
-API changes need documentation.
-```
-
-**`.deepwork/rules/python-formatting.md`**
-```markdown
----
-name: Python Formatting
-trigger: "**/*.py"
-action:
-  command: black {file}
-  run_for: each_match
----
-Auto-formats Python files with Black.
-```
-
-### Sample Queue Entry
-
-```json
-{
-  "rule_name": "Source/Test Pairing",
-  "rule_file": "source-test-pairing.md",
-  "trigger_hash": "abc123def456",
-  "status": "queued",
-  "created_at": "2024-01-16T10:00:00Z",
-  "evaluated_at": null,
-  "baseline_ref": "abc123",
-  "trigger_files": ["src/auth/login.py"],
-  "expected_files": ["tests/auth/login_test.py"],
-  "matched_files": [],
-  "action_result": null
-}
-```
-
-### Directory Structure for Tests
-
-```
-.deepwork/
-├── rules/
-│   ├── readme-accuracy.md
-│   ├── source-test-pairing.md
-│   ├── api-documentation.md
-│   └── python-formatting.md
-└── tmp/                         # GITIGNORED
-    └── rules/
-        └── queue/
-            └── (queue entries created during tests)
-```
diff --git a/src/deepwork/core/command_executor.py b/src/deepwork/core/command_executor.py
index 9db456ca..629b4f50 100644
--- a/src/deepwork/core/command_executor.py
+++ b/src/deepwork/core/command_executor.py
@@ -1,5 +1,6 @@
 """Execute command actions for rules."""
 
+import shlex
 import subprocess
 from dataclasses import dataclass
 from pathlib import Path
@@ -44,13 +45,16 @@ def substitute_command_variables(
     result = command_template
 
     if file is not None:
-        result = result.replace("{file}", file)
+        # Quote file path to prevent command injection
+        result = result.replace("{file}", shlex.quote(file))
 
     if files is not None:
-        result = result.replace("{files}", " ".join(files))
+        # Quote each file path individually
+        quoted_files = " ".join(shlex.quote(f) for f in files)
+        result = result.replace("{files}", quoted_files)
 
     if repo_root is not None:
-        result = result.replace("{repo_root}", str(repo_root))
+        result = result.replace("{repo_root}", shlex.quote(str(repo_root)))
 
     return result
 
diff --git a/src/deepwork/core/pattern_matcher.py b/src/deepwork/core/pattern_matcher.py
index 9d80549b..c82ec723 100644
--- a/src/deepwork/core/pattern_matcher.py
+++ b/src/deepwork/core/pattern_matcher.py
@@ -125,11 +125,11 @@ def pattern_to_regex(pattern: str) -> tuple[str, list[str]]:
         if var_spec.startswith("**"):
             # Explicit multi-segment: {**name}
             var_name = var_spec[2:] or "path"
-            regex_part = "(?P<{}>.+)".format(re.escape(var_name))
+            regex_part = f"(?P<{re.escape(var_name)}>.+)"
         elif var_spec.startswith("*"):
             # Explicit single-segment: {*name}
             var_name = var_spec[1:] or "name"
-            regex_part = "(?P<{}>[^/]+)".format(re.escape(var_name))
+            regex_part = f"(?P<{re.escape(var_name)}>[^/]+)"
         elif var_spec == "path":
             # Conventional multi-segment
             var_name = "path"
@@ -137,7 +137,7 @@ def pattern_to_regex(pattern: str) -> tuple[str, list[str]]:
         else:
             # Default single-segment (including custom names)
             var_name = var_spec
-            regex_part = "(?P<{}>[^/]+)".format(re.escape(var_name))
+            regex_part = f"(?P<{re.escape(var_name)}>[^/]+)"
 
         result.append(regex_part)
         var_names.append(var_name)
diff --git a/src/deepwork/core/rules_queue.py b/src/deepwork/core/rules_queue.py
index 8f6ec430..4f49a4fe 100644
--- a/src/deepwork/core/rules_queue.py
+++ b/src/deepwork/core/rules_queue.py
@@ -3,7 +3,7 @@
 import hashlib
 import json
 from dataclasses import asdict, dataclass, field
-from datetime import datetime, timezone
+from datetime import UTC, datetime
 from enum import Enum
 from pathlib import Path
 from typing import Any
@@ -52,7 +52,7 @@ class QueueEntry:
 
     def __post_init__(self) -> None:
         if not self.created_at:
-            self.created_at = datetime.now(timezone.utc).isoformat()
+            self.created_at = datetime.now(UTC).isoformat()
 
     def to_dict(self) -> dict[str, Any]:
         """Convert to dictionary for JSON serialization."""
@@ -149,7 +149,7 @@ def get_entry(self, trigger_hash: str) -> QueueEntry | None:
             return None
 
         try:
-            with open(path, "r", encoding="utf-8") as f:
+            with open(path, encoding="utf-8") as f:
                 data = json.load(f)
             return QueueEntry.from_dict(data)
         except (json.JSONDecodeError, OSError, KeyError):
@@ -225,14 +225,14 @@ def update_status(
 
         # Load existing entry
         try:
-            with open(old_path, "r", encoding="utf-8") as f:
+            with open(old_path, encoding="utf-8") as f:
                 data = json.load(f)
         except (json.JSONDecodeError, OSError):
             return False
 
         # Update fields
         data["status"] = new_status.value
-        data["evaluated_at"] = datetime.now(timezone.utc).isoformat()
+        data["evaluated_at"] = datetime.now(UTC).isoformat()
         if action_result:
             data["action_result"] = asdict(action_result)
 
@@ -259,7 +259,7 @@ def get_queued_entries(self) -> list[QueueEntry]:
         entries = []
         for path in self.queue_dir.glob("*.queued.json"):
             try:
-                with open(path, "r", encoding="utf-8") as f:
+                with open(path, encoding="utf-8") as f:
                     data = json.load(f)
                 entries.append(QueueEntry.from_dict(data))
             except (json.JSONDecodeError, OSError, KeyError):
@@ -275,7 +275,7 @@ def get_all_entries(self) -> list[QueueEntry]:
         entries = []
         for path in self.queue_dir.glob("*.json"):
             try:
-                with open(path, "r", encoding="utf-8") as f:
+                with open(path, encoding="utf-8") as f:
                     data = json.load(f)
                 entries.append(QueueEntry.from_dict(data))
             except (json.JSONDecodeError, OSError, KeyError):
diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py
index 121b9c5f..1d43d12e 100644
--- a/src/deepwork/hooks/rules_check.py
+++ b/src/deepwork/hooks/rules_check.py
@@ -39,8 +39,8 @@
 )
 from deepwork.core.rules_queue import (
     ActionResult,
-    RulesQueue,
     QueueEntryStatus,
+    RulesQueue,
     compute_trigger_hash,
 )
 from deepwork.hooks.wrapper import (
diff --git a/tests/unit/test_command_executor.py b/tests/unit/test_command_executor.py
new file mode 100644
index 00000000..d6b88ff3
--- /dev/null
+++ b/tests/unit/test_command_executor.py
@@ -0,0 +1,205 @@
+"""Tests for command executor (CMD-5.x from test_scenarios.md)."""
+
+from pathlib import Path
+
+import pytest
+
+from deepwork.core.command_executor import (
+    CommandResult,
+    all_commands_succeeded,
+    execute_command,
+    format_command_errors,
+    run_command_action,
+    substitute_command_variables,
+)
+from deepwork.core.rules_parser import CommandAction
+
+
+class TestSubstituteCommandVariables:
+    """Tests for command variable substitution."""
+
+    def test_single_file_substitution(self) -> None:
+        """Substitute {file} variable."""
+        result = substitute_command_variables(
+            "ruff format {file}",
+            file="src/main.py",
+        )
+        assert result == "ruff format src/main.py"
+
+    def test_multiple_files_substitution(self) -> None:
+        """Substitute {files} variable."""
+        result = substitute_command_variables(
+            "eslint --fix {files}",
+            files=["a.js", "b.js", "c.js"],
+        )
+        assert result == "eslint --fix a.js b.js c.js"
+
+    def test_repo_root_substitution(self) -> None:
+        """Substitute {repo_root} variable."""
+        result = substitute_command_variables(
+            "cd {repo_root} && pytest",
+            repo_root=Path("/home/user/project"),
+        )
+        assert result == "cd /home/user/project && pytest"
+
+    def test_all_variables(self) -> None:
+        """Substitute all variables together."""
+        result = substitute_command_variables(
+            "{repo_root}/scripts/process.sh {file} {files}",
+            file="main.py",
+            files=["a.py", "b.py"],
+            repo_root=Path("/project"),
+        )
+        assert result == "/project/scripts/process.sh main.py a.py b.py"
+
+
+class TestExecuteCommand:
+    """Tests for command execution."""
+
+    def test_successful_command(self) -> None:
+        """CMD-5.3.1: Exit code 0 - success."""
+        result = execute_command("echo hello")
+        assert result.success is True
+        assert result.exit_code == 0
+        assert "hello" in result.stdout
+
+    def test_failed_command(self) -> None:
+        """CMD-5.3.2: Exit code 1 - failure."""
+        result = execute_command("exit 1")
+        assert result.success is False
+        assert result.exit_code == 1
+
+    def test_command_timeout(self) -> None:
+        """CMD-5.3.3: Command timeout."""
+        result = execute_command("sleep 10", timeout=1)
+        assert result.success is False
+        assert "timed out" in result.stderr.lower()
+
+    def test_command_not_found(self) -> None:
+        """CMD-5.3.4: Command not found."""
+        result = execute_command("nonexistent_command_12345")
+        assert result.success is False
+        # Different systems return different error messages
+        assert result.exit_code != 0 or "not found" in result.stderr.lower()
+
+
+class TestRunCommandActionEachMatch:
+    """Tests for run_for: each_match mode (CMD-5.1.x)."""
+
+    def test_single_file(self) -> None:
+        """CMD-5.1.1: Single file triggers single command."""
+        action = CommandAction(command="echo {file}", run_for="each_match")
+        results = run_command_action(action, ["src/main.py"])
+
+        assert len(results) == 1
+        assert results[0].command == "echo src/main.py"
+        assert results[0].success is True
+
+    def test_multiple_files(self) -> None:
+        """CMD-5.1.2: Multiple files trigger command for each."""
+        action = CommandAction(command="echo {file}", run_for="each_match")
+        results = run_command_action(action, ["src/a.py", "src/b.py"])
+
+        assert len(results) == 2
+        assert results[0].command == "echo src/a.py"
+        assert results[1].command == "echo src/b.py"
+
+    def test_no_files(self) -> None:
+        """CMD-5.1.3: No files - no command run."""
+        action = CommandAction(command="echo {file}", run_for="each_match")
+        results = run_command_action(action, [])
+
+        assert len(results) == 0
+
+
+class TestRunCommandActionAllMatches:
+    """Tests for run_for: all_matches mode (CMD-5.2.x)."""
+
+    def test_multiple_files_single_command(self) -> None:
+        """CMD-5.2.1: Multiple files in single command."""
+        action = CommandAction(command="echo {files}", run_for="all_matches")
+        results = run_command_action(action, ["a.js", "b.js", "c.js"])
+
+        assert len(results) == 1
+        assert results[0].command == "echo a.js b.js c.js"
+        assert results[0].success is True
+
+    def test_single_file_single_command(self) -> None:
+        """CMD-5.2.2: Single file in single command."""
+        action = CommandAction(command="echo {files}", run_for="all_matches")
+        results = run_command_action(action, ["a.js"])
+
+        assert len(results) == 1
+        assert results[0].command == "echo a.js"
+
+
+class TestAllCommandsSucceeded:
+    """Tests for all_commands_succeeded helper."""
+
+    def test_all_success(self) -> None:
+        """All commands succeeded."""
+        results = [
+            CommandResult(success=True, exit_code=0, stdout="ok", stderr="", command="echo 1"),
+            CommandResult(success=True, exit_code=0, stdout="ok", stderr="", command="echo 2"),
+        ]
+        assert all_commands_succeeded(results) is True
+
+    def test_one_failure(self) -> None:
+        """One command failed."""
+        results = [
+            CommandResult(success=True, exit_code=0, stdout="ok", stderr="", command="echo 1"),
+            CommandResult(success=False, exit_code=1, stdout="", stderr="error", command="exit 1"),
+        ]
+        assert all_commands_succeeded(results) is False
+
+    def test_empty_list(self) -> None:
+        """Empty list is considered success."""
+        assert all_commands_succeeded([]) is True
+
+
+class TestFormatCommandErrors:
+    """Tests for format_command_errors helper."""
+
+    def test_single_error(self) -> None:
+        """Format single error."""
+        results = [
+            CommandResult(
+                success=False,
+                exit_code=1,
+                stdout="",
+                stderr="Something went wrong",
+                command="failing_cmd",
+            ),
+        ]
+        output = format_command_errors(results)
+        assert "failing_cmd" in output
+        assert "Something went wrong" in output
+        assert "Exit code: 1" in output
+
+    def test_multiple_errors(self) -> None:
+        """Format multiple errors."""
+        results = [
+            CommandResult(
+                success=False, exit_code=1, stdout="", stderr="Error 1", command="cmd1"
+            ),
+            CommandResult(
+                success=False, exit_code=2, stdout="", stderr="Error 2", command="cmd2"
+            ),
+        ]
+        output = format_command_errors(results)
+        assert "cmd1" in output
+        assert "Error 1" in output
+        assert "cmd2" in output
+        assert "Error 2" in output
+
+    def test_ignores_success(self) -> None:
+        """Ignore successful commands."""
+        results = [
+            CommandResult(success=True, exit_code=0, stdout="ok", stderr="", command="good_cmd"),
+            CommandResult(
+                success=False, exit_code=1, stdout="", stderr="bad", command="bad_cmd"
+            ),
+        ]
+        output = format_command_errors(results)
+        assert "good_cmd" not in output
+        assert "bad_cmd" in output
diff --git a/tests/unit/test_pattern_matcher.py b/tests/unit/test_pattern_matcher.py
new file mode 100644
index 00000000..69d73e7e
--- /dev/null
+++ b/tests/unit/test_pattern_matcher.py
@@ -0,0 +1,205 @@
+"""Tests for pattern matching with variable extraction."""
+
+import pytest
+
+from deepwork.core.pattern_matcher import (
+    PatternError,
+    match_pattern,
+    matches_any_pattern,
+    matches_glob,
+    resolve_pattern,
+    validate_pattern,
+)
+
+
+class TestBasicGlobPatterns:
+    """Tests for basic glob pattern matching (PM-1.1.x from test_scenarios.md)."""
+
+    def test_exact_match(self) -> None:
+        """PM-1.1.1: Exact match."""
+        assert matches_glob("README.md", "README.md")
+
+    def test_exact_no_match(self) -> None:
+        """PM-1.1.2: Exact no match (case sensitive)."""
+        assert not matches_glob("readme.md", "README.md")
+
+    def test_single_wildcard(self) -> None:
+        """PM-1.1.3: Single wildcard."""
+        assert matches_glob("main.py", "*.py")
+
+    def test_single_wildcard_nested(self) -> None:
+        """PM-1.1.4: Single wildcard - fnmatch matches nested paths too.
+
+        Note: fnmatch's * also matches across directory separators.
+        Prefer **/*.py to make matching at any depth explicit.
+        """
+        # fnmatch's * matches any character including /
+        # This is different from shell glob behavior
+        assert matches_glob("src/main.py", "*.py")
+
+    def test_double_wildcard(self) -> None:
+        """PM-1.1.5: Double wildcard matches nested paths."""
+        assert matches_glob("src/main.py", "**/*.py")
+
+    def test_double_wildcard_deep(self) -> None:
+        """PM-1.1.6: Double wildcard matches deeply nested paths."""
+        assert matches_glob("src/a/b/c/main.py", "**/*.py")
+
+    def test_double_wildcard_root(self) -> None:
+        """PM-1.1.7: Double wildcard matches root-level files."""
+        assert matches_glob("main.py", "**/*.py")
+
+    def test_directory_prefix(self) -> None:
+        """PM-1.1.8: Directory prefix matching."""
+        assert matches_glob("src/foo.py", "src/**/*")
+
+    def test_directory_prefix_deep(self) -> None:
+        """PM-1.1.9: Directory prefix matching deeply nested."""
+        assert matches_glob("src/a/b/c.py", "src/**/*")
+
+    def test_directory_no_match(self) -> None:
+        """PM-1.1.10: Directory prefix no match."""
+        assert not matches_glob("lib/foo.py", "src/**/*")
+
+    def test_brace_expansion_ts(self) -> None:
+        """PM-1.1.11: Brace expansion - not supported by fnmatch.
+
+        Note: Python's fnmatch doesn't support brace expansion.
+        Use matches_any_pattern with multiple patterns instead.
+        """
+        # fnmatch doesn't support {a,b} syntax
+        assert not matches_glob("app.ts", "*.{js,ts}")
+        # Use matches_any_pattern for multiple extensions
+        assert matches_any_pattern("app.ts", ["*.ts", "*.js"])
+
+    def test_brace_expansion_js(self) -> None:
+        """PM-1.1.12: Brace expansion - not supported by fnmatch."""
+        assert not matches_glob("app.js", "*.{js,ts}")
+        assert matches_any_pattern("app.js", ["*.ts", "*.js"])
+
+    def test_brace_expansion_no_match(self) -> None:
+        """PM-1.1.13: Brace expansion no match."""
+        # Neither {a,b} syntax nor multiple patterns match
+        assert not matches_glob("app.py", "*.{js,ts}")
+        assert not matches_any_pattern("app.py", ["*.ts", "*.js"])
+
+
+class TestVariablePatterns:
+    """Tests for variable pattern matching and extraction (PM-1.2.x)."""
+
+    def test_single_var_path(self) -> None:
+        """PM-1.2.1: Single variable captures nested path."""
+        result = match_pattern("src/{path}.py", "src/foo/bar.py")
+        assert result.matched
+        assert result.variables == {"path": "foo/bar"}
+
+    def test_single_var_name(self) -> None:
+        """PM-1.2.2: Single variable name (non-path)."""
+        result = match_pattern("src/{name}.py", "src/utils.py")
+        assert result.matched
+        assert result.variables == {"name": "utils"}
+
+    def test_name_no_nested(self) -> None:
+        """PM-1.2.3: {name} doesn't match nested paths (single segment)."""
+        result = match_pattern("src/{name}.py", "src/foo/bar.py")
+        # {name} only captures single segment, not nested paths
+        assert not result.matched
+
+    def test_two_variables(self) -> None:
+        """PM-1.2.4: Two variables in pattern."""
+        result = match_pattern("{dir}/{name}.py", "src/main.py")
+        assert result.matched
+        assert result.variables == {"dir": "src", "name": "main"}
+
+    def test_prefix_and_suffix(self) -> None:
+        """PM-1.2.5: Prefix and suffix around variable."""
+        result = match_pattern("test_{name}_test.py", "test_foo_test.py")
+        assert result.matched
+        assert result.variables == {"name": "foo"}
+
+    def test_nested_path_variable(self) -> None:
+        """PM-1.2.6: Nested path in middle."""
+        result = match_pattern("src/{path}/index.py", "src/a/b/index.py")
+        assert result.matched
+        assert result.variables == {"path": "a/b"}
+
+    def test_explicit_multi_segment(self) -> None:
+        """PM-1.2.7: Explicit {**mod} for multi-segment."""
+        result = match_pattern("src/{**mod}/main.py", "src/a/b/c/main.py")
+        assert result.matched
+        assert result.variables == {"mod": "a/b/c"}
+
+    def test_explicit_single_segment(self) -> None:
+        """PM-1.2.8: Explicit {*name} for single segment."""
+        result = match_pattern("src/{*name}.py", "src/utils.py")
+        assert result.matched
+        assert result.variables == {"name": "utils"}
+
+    def test_mixed_explicit(self) -> None:
+        """PM-1.2.9: Mixed explicit single and multi."""
+        result = match_pattern("{*dir}/{**path}.py", "src/a/b/c.py")
+        assert result.matched
+        assert result.variables == {"dir": "src", "path": "a/b/c"}
+
+
+class TestPatternResolution:
+    """Tests for pattern resolution / substitution (PM-1.3.x)."""
+
+    def test_simple_substitution(self) -> None:
+        """PM-1.3.1: Simple variable substitution."""
+        result = resolve_pattern("tests/{path}_test.py", {"path": "foo"})
+        assert result == "tests/foo_test.py"
+
+    def test_nested_path_substitution(self) -> None:
+        """PM-1.3.2: Nested path substitution."""
+        result = resolve_pattern("tests/{path}_test.py", {"path": "a/b/c"})
+        assert result == "tests/a/b/c_test.py"
+
+    def test_multiple_vars_substitution(self) -> None:
+        """PM-1.3.3: Multiple variables substitution."""
+        result = resolve_pattern("{dir}/test_{name}.py", {"dir": "tests", "name": "foo"})
+        assert result == "tests/test_foo.py"
+
+
+class TestPatternValidation:
+    """Tests for pattern syntax validation (SV-8.3.x)."""
+
+    def test_unclosed_brace(self) -> None:
+        """SV-8.3.1: Unclosed brace."""
+        with pytest.raises(PatternError, match="[Uu]nclosed brace"):
+            validate_pattern("src/{path.py")
+
+    def test_empty_variable(self) -> None:
+        """SV-8.3.2: Empty variable name."""
+        with pytest.raises(PatternError, match="[Ee]mpty variable name"):
+            validate_pattern("src/{}.py")
+
+    def test_invalid_chars_in_var(self) -> None:
+        """SV-8.3.3: Invalid characters in variable name."""
+        with pytest.raises(PatternError, match="[Ii]nvalid"):
+            validate_pattern("src/{path/name}.py")
+
+    def test_duplicate_variable(self) -> None:
+        """SV-8.3.4: Duplicate variable name."""
+        with pytest.raises(PatternError, match="[Dd]uplicate"):
+            validate_pattern("{path}/{path}.py")
+
+
+class TestMatchesAnyPattern:
+    """Tests for matches_any_pattern function."""
+
+    def test_matches_first_pattern(self) -> None:
+        """Match against first of multiple patterns."""
+        assert matches_any_pattern("file.py", ["*.py", "*.js"])
+
+    def test_matches_second_pattern(self) -> None:
+        """Match against second of multiple patterns."""
+        assert matches_any_pattern("file.js", ["*.py", "*.js"])
+
+    def test_no_match(self) -> None:
+        """No match in any pattern."""
+        assert not matches_any_pattern("file.txt", ["*.py", "*.js"])
+
+    def test_empty_patterns(self) -> None:
+        """Empty patterns list never matches."""
+        assert not matches_any_pattern("file.py", [])
diff --git a/tests/unit/test_rules_parser.py b/tests/unit/test_rules_parser.py
index 2906816a..f764edf7 100644
--- a/tests/unit/test_rules_parser.py
+++ b/tests/unit/test_rules_parser.py
@@ -8,6 +8,7 @@
 from deepwork.core.rules_parser import (
     DEFAULT_COMPARE_TO,
     DetectionMode,
+    PairConfig,
     Rule,
     RulesParseError,
     evaluate_rules,
@@ -362,3 +363,373 @@ def test_loads_rule_with_command_action(self, temp_dir: Path) -> None:
         assert rules[0].command_action is not None
         assert rules[0].command_action.command == "ruff format {file}"
         assert rules[0].command_action.run_for == "each_match"
+
+
+class TestCorrespondenceSets:
+    """Tests for set correspondence evaluation (CS-3.x from test_scenarios.md)."""
+
+    def test_both_changed_no_fire(self) -> None:
+        """CS-3.1.1: Both source and test changed - no fire."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update tests",
+        )
+        changed_files = ["src/foo.py", "tests/foo_test.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_only_source_fires(self) -> None:
+        """CS-3.1.2: Only source changed - fires."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update tests",
+        )
+        changed_files = ["src/foo.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert "src/foo.py" in result.trigger_files
+        assert "tests/foo_test.py" in result.missing_files
+
+    def test_only_test_fires(self) -> None:
+        """CS-3.1.3: Only test changed - fires."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update source",
+        )
+        changed_files = ["tests/foo_test.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert "tests/foo_test.py" in result.trigger_files
+        assert "src/foo.py" in result.missing_files
+
+    def test_nested_both_no_fire(self) -> None:
+        """CS-3.1.4: Nested paths - both changed."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update tests",
+        )
+        changed_files = ["src/a/b.py", "tests/a/b_test.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_nested_only_source_fires(self) -> None:
+        """CS-3.1.5: Nested paths - only source."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update tests",
+        )
+        changed_files = ["src/a/b.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert "tests/a/b_test.py" in result.missing_files
+
+    def test_unrelated_file_no_fire(self) -> None:
+        """CS-3.1.6: Unrelated file - no fire."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update tests",
+        )
+        changed_files = ["docs/readme.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_source_plus_unrelated_fires(self) -> None:
+        """CS-3.1.7: Source + unrelated - fires."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update tests",
+        )
+        changed_files = ["src/foo.py", "docs/readme.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+
+    def test_both_plus_unrelated_no_fire(self) -> None:
+        """CS-3.1.8: Both + unrelated - no fire."""
+        rule = Rule(
+            name="Source/Test Pairing",
+            filename="source-test-pairing",
+            detection_mode=DetectionMode.SET,
+            set_patterns=["src/{path}.py", "tests/{path}_test.py"],
+            instructions="Update tests",
+        )
+        changed_files = ["src/foo.py", "tests/foo_test.py", "docs/readme.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+
+class TestThreePatternSets:
+    """Tests for three-pattern set correspondence (CS-3.2.x)."""
+
+    def test_all_three_no_fire(self) -> None:
+        """CS-3.2.1: All three files changed - no fire."""
+        rule = Rule(
+            name="Model/Schema/Migration",
+            filename="model-schema-migration",
+            detection_mode=DetectionMode.SET,
+            set_patterns=[
+                "models/{name}.py",
+                "schemas/{name}.py",
+                "migrations/{name}.sql",
+            ],
+            instructions="Update all related files",
+        )
+        changed_files = ["models/user.py", "schemas/user.py", "migrations/user.sql"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_two_of_three_fires(self) -> None:
+        """CS-3.2.2: Two of three - fires (missing migration)."""
+        rule = Rule(
+            name="Model/Schema/Migration",
+            filename="model-schema-migration",
+            detection_mode=DetectionMode.SET,
+            set_patterns=[
+                "models/{name}.py",
+                "schemas/{name}.py",
+                "migrations/{name}.sql",
+            ],
+            instructions="Update all related files",
+        )
+        changed_files = ["models/user.py", "schemas/user.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert "migrations/user.sql" in result.missing_files
+
+    def test_one_of_three_fires(self) -> None:
+        """CS-3.2.3: One of three - fires (missing 2)."""
+        rule = Rule(
+            name="Model/Schema/Migration",
+            filename="model-schema-migration",
+            detection_mode=DetectionMode.SET,
+            set_patterns=[
+                "models/{name}.py",
+                "schemas/{name}.py",
+                "migrations/{name}.sql",
+            ],
+            instructions="Update all related files",
+        )
+        changed_files = ["models/user.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert len(result.missing_files) == 2
+        assert "schemas/user.py" in result.missing_files
+        assert "migrations/user.sql" in result.missing_files
+
+    def test_different_names_fire_both(self) -> None:
+        """CS-3.2.4: Different names - both incomplete."""
+        rule = Rule(
+            name="Model/Schema/Migration",
+            filename="model-schema-migration",
+            detection_mode=DetectionMode.SET,
+            set_patterns=[
+                "models/{name}.py",
+                "schemas/{name}.py",
+                "migrations/{name}.sql",
+            ],
+            instructions="Update all related files",
+        )
+        changed_files = ["models/user.py", "schemas/order.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        # Each name's set is incomplete; at least one file must be reported as a trigger
+        assert "models/user.py" in result.trigger_files or "schemas/order.py" in result.trigger_files
+
+
+class TestCorrespondencePairs:
+    """Tests for pair correspondence evaluation (CP-4.x from test_scenarios.md)."""
+
+    def test_both_changed_no_fire(self) -> None:
+        """CP-4.1.1: Both trigger and expected changed - no fire."""
+        rule = Rule(
+            name="API Documentation",
+            filename="api-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md"],
+            ),
+            instructions="Update API docs",
+        )
+        changed_files = ["api/users.py", "docs/api/users.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_only_trigger_fires(self) -> None:
+        """CP-4.1.2: Only trigger changed - fires."""
+        rule = Rule(
+            name="API Documentation",
+            filename="api-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md"],
+            ),
+            instructions="Update API docs",
+        )
+        changed_files = ["api/users.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert "api/users.py" in result.trigger_files
+        assert "docs/api/users.md" in result.missing_files
+
+    def test_only_expected_no_fire(self) -> None:
+        """CP-4.1.3: Only expected changed - no fire (directional)."""
+        rule = Rule(
+            name="API Documentation",
+            filename="api-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md"],
+            ),
+            instructions="Update API docs",
+        )
+        changed_files = ["docs/api/users.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_trigger_plus_unrelated_fires(self) -> None:
+        """CP-4.1.4: Trigger + unrelated - fires."""
+        rule = Rule(
+            name="API Documentation",
+            filename="api-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md"],
+            ),
+            instructions="Update API docs",
+        )
+        changed_files = ["api/users.py", "README.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+
+    def test_expected_plus_unrelated_no_fire(self) -> None:
+        """CP-4.1.5: Expected + unrelated - no fire."""
+        rule = Rule(
+            name="API Documentation",
+            filename="api-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md"],
+            ),
+            instructions="Update API docs",
+        )
+        changed_files = ["docs/api/users.md", "README.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+
+class TestMultiExpectsPairs:
+    """Tests for multi-expects pair correspondence (CP-4.2.x)."""
+
+    def test_all_three_no_fire(self) -> None:
+        """CP-4.2.1: All three changed - no fire."""
+        rule = Rule(
+            name="API Full Documentation",
+            filename="api-full-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md", "openapi/{path}.yaml"],
+            ),
+            instructions="Update API docs and OpenAPI",
+        )
+        changed_files = ["api/users.py", "docs/api/users.md", "openapi/users.yaml"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
+
+    def test_trigger_plus_one_expect_fires(self) -> None:
+        """CP-4.2.2: Trigger + one expect - fires (missing openapi)."""
+        rule = Rule(
+            name="API Full Documentation",
+            filename="api-full-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md", "openapi/{path}.yaml"],
+            ),
+            instructions="Update API docs and OpenAPI",
+        )
+        changed_files = ["api/users.py", "docs/api/users.md"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert "openapi/users.yaml" in result.missing_files
+
+    def test_only_trigger_fires_missing_both(self) -> None:
+        """CP-4.2.3: Only trigger - fires (missing both)."""
+        rule = Rule(
+            name="API Full Documentation",
+            filename="api-full-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md", "openapi/{path}.yaml"],
+            ),
+            instructions="Update API docs and OpenAPI",
+        )
+        changed_files = ["api/users.py"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is True
+        assert len(result.missing_files) == 2
+        assert "docs/api/users.md" in result.missing_files
+        assert "openapi/users.yaml" in result.missing_files
+
+    def test_both_expects_only_no_fire(self) -> None:
+        """CP-4.2.4: Both expects only - no fire."""
+        rule = Rule(
+            name="API Full Documentation",
+            filename="api-full-documentation",
+            detection_mode=DetectionMode.PAIR,
+            pair_config=PairConfig(
+                trigger="api/{path}.py",
+                expects=["docs/api/{path}.md", "openapi/{path}.yaml"],
+            ),
+            instructions="Update API docs and OpenAPI",
+        )
+        changed_files = ["docs/api/users.md", "openapi/users.yaml"]
+
+        result = evaluate_rule(rule, changed_files)
+        assert result.should_fire is False
diff --git a/tests/unit/test_rules_queue.py b/tests/unit/test_rules_queue.py
new file mode 100644
index 00000000..4b66ea7d
--- /dev/null
+++ b/tests/unit/test_rules_queue.py
@@ -0,0 +1,352 @@
+"""Tests for rules queue system (QS-6.x from test_scenarios.md)."""
+
+import json
+from pathlib import Path
+
+import pytest
+
+from deepwork.core.rules_queue import (
+    ActionResult,
+    QueueEntry,
+    QueueEntryStatus,
+    RulesQueue,
+    compute_trigger_hash,
+)
+
+
+class TestComputeTriggerHash:
+    """Tests for hash calculation (QS-6.2.x)."""
+
+    def test_same_everything_same_hash(self) -> None:
+        """QS-6.2.1: Same rule, files, baseline - same hash."""
+        hash1 = compute_trigger_hash("RuleA", ["a.py"], "commit1")
+        hash2 = compute_trigger_hash("RuleA", ["a.py"], "commit1")
+        assert hash1 == hash2
+
+    def test_different_files_different_hash(self) -> None:
+        """QS-6.2.2: Different files - different hash."""
+        hash1 = compute_trigger_hash("RuleA", ["a.py"], "commit1")
+        hash2 = compute_trigger_hash("RuleA", ["b.py"], "commit1")
+        assert hash1 != hash2
+
+    def test_different_baseline_different_hash(self) -> None:
+        """QS-6.2.3: Different baseline - different hash."""
+        hash1 = compute_trigger_hash("RuleA", ["a.py"], "commit1")
+        hash2 = compute_trigger_hash("RuleA", ["a.py"], "commit2")
+        assert hash1 != hash2
+
+    def test_different_rule_different_hash(self) -> None:
+        """QS-6.2.4: Different rule - different hash."""
+        hash1 = compute_trigger_hash("RuleA", ["a.py"], "commit1")
+        hash2 = compute_trigger_hash("RuleB", ["a.py"], "commit1")
+        assert hash1 != hash2
+
+    def test_file_order_independent(self) -> None:
+        """File order should not affect hash (sorted internally)."""
+        hash1 = compute_trigger_hash("RuleA", ["a.py", "b.py"], "commit1")
+        hash2 = compute_trigger_hash("RuleA", ["b.py", "a.py"], "commit1")
+        assert hash1 == hash2
+
+
+class TestQueueEntry:
+    """Tests for QueueEntry dataclass."""
+
+    def test_to_dict_and_from_dict(self) -> None:
+        """Round-trip serialization."""
+        entry = QueueEntry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_hash="abc123",
+            status=QueueEntryStatus.QUEUED,
+            baseline_ref="commit1",
+            trigger_files=["src/main.py"],
+            expected_files=["tests/main_test.py"],
+        )
+
+        data = entry.to_dict()
+        restored = QueueEntry.from_dict(data)
+
+        assert restored.rule_name == entry.rule_name
+        assert restored.rule_file == entry.rule_file
+        assert restored.trigger_hash == entry.trigger_hash
+        assert restored.status == entry.status
+        assert restored.trigger_files == entry.trigger_files
+        assert restored.expected_files == entry.expected_files
+
+    def test_with_action_result(self) -> None:
+        """Serialization with action result."""
+        entry = QueueEntry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_hash="abc123",
+            action_result=ActionResult(type="command", output="ok", exit_code=0),
+        )
+
+        data = entry.to_dict()
+        restored = QueueEntry.from_dict(data)
+
+        assert restored.action_result is not None
+        assert restored.action_result.type == "command"
+        assert restored.action_result.exit_code == 0
+
+
+class TestRulesQueue:
+    """Tests for RulesQueue class (QS-6.1.x, QS-6.3.x)."""
+
+    @pytest.fixture
+    def queue(self, tmp_path: Path) -> RulesQueue:
+        """Create a queue with temp directory."""
+        return RulesQueue(tmp_path / "queue")
+
+    def test_create_entry(self, queue: RulesQueue) -> None:
+        """QS-6.1.1: Create new queue entry."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+
+        assert entry is not None
+        assert entry.status == QueueEntryStatus.QUEUED
+        assert entry.rule_name == "Test Rule"
+
+    def test_create_duplicate_returns_none(self, queue: RulesQueue) -> None:
+        """QS-6.1.6: Re-trigger same files returns None."""
+        entry1 = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        entry2 = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+
+        assert entry1 is not None
+        assert entry2 is None  # Duplicate
+
+    def test_create_different_files_new_entry(self, queue: RulesQueue) -> None:
+        """QS-6.1.7: Different files create new entry."""
+        entry1 = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/a.py"],
+            baseline_ref="commit1",
+        )
+        entry2 = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/b.py"],  # Different file
+            baseline_ref="commit1",
+        )
+
+        assert entry1 is not None
+        assert entry2 is not None
+
+    def test_has_entry(self, queue: RulesQueue) -> None:
+        """Check if entry exists."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        assert entry is not None
+
+        assert queue.has_entry(entry.trigger_hash) is True
+        assert queue.has_entry("nonexistent") is False
+
+    def test_get_entry(self, queue: RulesQueue) -> None:
+        """Retrieve entry by hash."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        assert entry is not None
+
+        retrieved = queue.get_entry(entry.trigger_hash)
+        assert retrieved is not None
+        assert retrieved.rule_name == "Test Rule"
+
+    def test_get_nonexistent_entry(self, queue: RulesQueue) -> None:
+        """Get nonexistent entry returns None."""
+        assert queue.get_entry("nonexistent") is None
+
+    def test_update_status_to_passed(self, queue: RulesQueue) -> None:
+        """QS-6.1.3: Update status to passed."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        assert entry is not None
+
+        success = queue.update_status(entry.trigger_hash, QueueEntryStatus.PASSED)
+        assert success is True
+
+        updated = queue.get_entry(entry.trigger_hash)
+        assert updated is not None
+        assert updated.status == QueueEntryStatus.PASSED
+        assert updated.evaluated_at is not None
+
+    def test_update_status_to_failed(self, queue: RulesQueue) -> None:
+        """QS-6.1.5: Update status to failed."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        assert entry is not None
+
+        action_result = ActionResult(type="command", output="error", exit_code=1)
+        success = queue.update_status(
+            entry.trigger_hash, QueueEntryStatus.FAILED, action_result
+        )
+        assert success is True
+
+        updated = queue.get_entry(entry.trigger_hash)
+        assert updated is not None
+        assert updated.status == QueueEntryStatus.FAILED
+        assert updated.action_result is not None
+        assert updated.action_result.exit_code == 1
+
+    def test_update_status_to_skipped(self, queue: RulesQueue) -> None:
+        """QS-6.1.2: Update status to skipped (safety suppression)."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        assert entry is not None
+
+        success = queue.update_status(entry.trigger_hash, QueueEntryStatus.SKIPPED)
+        assert success is True
+
+        updated = queue.get_entry(entry.trigger_hash)
+        assert updated is not None
+        assert updated.status == QueueEntryStatus.SKIPPED
+
+    def test_update_nonexistent_returns_false(self, queue: RulesQueue) -> None:
+        """Update nonexistent entry returns False."""
+        success = queue.update_status("nonexistent", QueueEntryStatus.PASSED)
+        assert success is False
+
+    def test_get_queued_entries(self, queue: RulesQueue) -> None:
+        """Get only queued entries."""
+        # Create multiple entries with different statuses
+        entry1 = queue.create_entry(
+            rule_name="Rule 1",
+            rule_file="rule1.md",
+            trigger_files=["a.py"],
+            baseline_ref="commit1",
+        )
+        entry2 = queue.create_entry(
+            rule_name="Rule 2",
+            rule_file="rule2.md",
+            trigger_files=["b.py"],
+            baseline_ref="commit1",
+        )
+        assert entry1 is not None
+        assert entry2 is not None
+
+        # Update one to passed
+        queue.update_status(entry1.trigger_hash, QueueEntryStatus.PASSED)
+
+        # Get queued only
+        queued = queue.get_queued_entries()
+        assert len(queued) == 1
+        assert queued[0].rule_name == "Rule 2"
+
+    def test_get_all_entries(self, queue: RulesQueue) -> None:
+        """Get all entries regardless of status."""
+        entry1 = queue.create_entry(
+            rule_name="Rule 1",
+            rule_file="rule1.md",
+            trigger_files=["a.py"],
+            baseline_ref="commit1",
+        )
+        entry2 = queue.create_entry(
+            rule_name="Rule 2",
+            rule_file="rule2.md",
+            trigger_files=["b.py"],
+            baseline_ref="commit1",
+        )
+        assert entry1 is not None
+        assert entry2 is not None
+
+        queue.update_status(entry1.trigger_hash, QueueEntryStatus.PASSED)
+
+        all_entries = queue.get_all_entries()
+        assert len(all_entries) == 2
+
+    def test_remove_entry(self, queue: RulesQueue) -> None:
+        """Remove entry by hash."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        assert entry is not None
+
+        removed = queue.remove_entry(entry.trigger_hash)
+        assert removed is True
+        assert queue.has_entry(entry.trigger_hash) is False
+
+    def test_remove_nonexistent_returns_false(self, queue: RulesQueue) -> None:
+        """Remove nonexistent entry returns False."""
+        removed = queue.remove_entry("nonexistent")
+        assert removed is False
+
+    def test_clear(self, queue: RulesQueue) -> None:
+        """Clear all entries."""
+        queue.create_entry(
+            rule_name="Rule 1",
+            rule_file="rule1.md",
+            trigger_files=["a.py"],
+            baseline_ref="commit1",
+        )
+        queue.create_entry(
+            rule_name="Rule 2",
+            rule_file="rule2.md",
+            trigger_files=["b.py"],
+            baseline_ref="commit1",
+        )
+
+        count = queue.clear()
+        assert count == 2
+        assert len(queue.get_all_entries()) == 0
+
+    def test_clear_empty_queue(self, queue: RulesQueue) -> None:
+        """Clear empty queue returns 0."""
+        count = queue.clear()
+        assert count == 0
+
+    def test_file_structure(self, queue: RulesQueue) -> None:
+        """Verify queue files are named correctly."""
+        entry = queue.create_entry(
+            rule_name="Test Rule",
+            rule_file="test-rule.md",
+            trigger_files=["src/main.py"],
+            baseline_ref="commit1",
+        )
+        assert entry is not None
+
+        # Check file exists with correct naming
+        expected_file = queue.queue_dir / f"{entry.trigger_hash}.queued.json"
+        assert expected_file.exists()
+
+        # Update status and check file renamed
+        queue.update_status(entry.trigger_hash, QueueEntryStatus.PASSED)
+        assert not expected_file.exists()
+        passed_file = queue.queue_dir / f"{entry.trigger_hash}.passed.json"
+        assert passed_file.exists()
diff --git a/tests/unit/test_schema_validation.py b/tests/unit/test_schema_validation.py
new file mode 100644
index 00000000..fc921ec8
--- /dev/null
+++ b/tests/unit/test_schema_validation.py
@@ -0,0 +1,323 @@
+"""Tests for schema validation (SV-8.x from test_scenarios.md)."""
+
+from pathlib import Path
+
+import pytest
+
+from deepwork.core.rules_parser import RulesParseError, parse_rule_file
+
+
+class TestRequiredFields:
+    """Tests for required field validation (SV-8.1.x)."""
+
+    def test_missing_name(self, tmp_path: Path) -> None:
+        """SV-8.1.1: Missing name field."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+trigger: "src/**/*"
+---
+Instructions here.
+"""
+        )
+
+        with pytest.raises(RulesParseError, match="name"):
+            parse_rule_file(rule_file)
+
+    def test_missing_detection_mode(self, tmp_path: Path) -> None:
+        """SV-8.1.2: Missing trigger, set, or pair."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+---
+Instructions here.
+"""
+        )
+
+        with pytest.raises(RulesParseError):
+            parse_rule_file(rule_file)
+
+    def test_missing_markdown_body(self, tmp_path: Path) -> None:
+        """SV-8.1.3: Missing markdown body (for prompt action)."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "src/**/*"
+---
+"""
+        )
+
+        with pytest.raises(RulesParseError, match="markdown body|instructions"):
+            parse_rule_file(rule_file)
+
+    def test_set_requires_two_patterns(self, tmp_path: Path) -> None:
+        """SV-8.1.4: Set requires at least 2 patterns.
+
+        Note: Schema validation catches this before the rule parser runs.
+        """
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+set:
+  - src/{path}.py
+---
+Instructions here.
+"""
+        )
+
+        # Schema validation will fail due to minItems: 2
+        with pytest.raises(RulesParseError):
+            parse_rule_file(rule_file)
+
+
+class TestMutuallyExclusiveFields:
+    """Tests for mutually exclusive field validation (SV-8.2.x)."""
+
+    def test_both_trigger_and_set(self, tmp_path: Path) -> None:
+        """SV-8.2.1: Specifying both trigger and set is invalid."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "src/**/*"
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Instructions here.
+"""
+        )
+
+        with pytest.raises(RulesParseError):
+            parse_rule_file(rule_file)
+
+    def test_both_trigger_and_pair(self, tmp_path: Path) -> None:
+        """SV-8.2.2: Specifying both trigger and pair is invalid."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "src/**/*"
+pair:
+  trigger: api/{path}.py
+  expects: docs/{path}.md
+---
+Instructions here.
+"""
+        )
+
+        with pytest.raises(RulesParseError):
+            parse_rule_file(rule_file)
+
+    def test_all_detection_modes(self, tmp_path: Path) -> None:
+        """SV-8.2.3: Specifying all three detection modes is invalid."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "src/**/*"
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+pair:
+  trigger: api/{path}.py
+  expects: docs/{path}.md
+---
+Instructions here.
+"""
+        )
+
+        with pytest.raises(RulesParseError):
+            parse_rule_file(rule_file)
+
+
+class TestValueValidation:
+    """Tests for value validation (SV-8.4.x)."""
+
+    def test_invalid_compare_to(self, tmp_path: Path) -> None:
+        """SV-8.4.1: Invalid compare_to value."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "src/**/*"
+compare_to: yesterday
+---
+Instructions here.
+"""
+        )
+
+        with pytest.raises(RulesParseError):
+            parse_rule_file(rule_file)
+
+    def test_invalid_run_for(self, tmp_path: Path) -> None:
+        """SV-8.4.2: Invalid run_for value."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: first_match
+---
+"""
+        )
+
+        with pytest.raises(RulesParseError):
+            parse_rule_file(rule_file)
+
+
+class TestValidRules:
+    """Tests for valid rule parsing."""
+
+    def test_valid_trigger_safety_rule(self, tmp_path: Path) -> None:
+        """Valid trigger/safety rule parses successfully."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: "src/**/*"
+safety: README.md
+---
+Please check the code.
+"""
+        )
+
+        rule = parse_rule_file(rule_file)
+        assert rule.name == "Test Rule"
+        assert rule.triggers == ["src/**/*"]
+        assert rule.safety == ["README.md"]
+
+    def test_valid_set_rule(self, tmp_path: Path) -> None:
+        """Valid set rule parses successfully."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test should change together.
+"""
+        )
+
+        rule = parse_rule_file(rule_file)
+        assert rule.name == "Source/Test Pairing"
+        assert len(rule.set_patterns) == 2
+
+    def test_valid_pair_rule(self, tmp_path: Path) -> None:
+        """Valid pair rule parses successfully."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: API Documentation
+pair:
+  trigger: api/{module}.py
+  expects: docs/api/{module}.md
+---
+API changes need documentation.
+"""
+        )
+
+        rule = parse_rule_file(rule_file)
+        assert rule.name == "API Documentation"
+        assert rule.pair_config is not None
+        assert rule.pair_config.trigger == "api/{module}.py"
+        assert rule.pair_config.expects == ["docs/api/{module}.md"]
+
+    def test_valid_command_rule(self, tmp_path: Path) -> None:
+        """Valid command rule parses successfully."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Format Python
+trigger: "**/*.py"
+action:
+  command: "ruff format {file}"
+  run_for: each_match
+---
+"""
+        )
+
+        rule = parse_rule_file(rule_file)
+        assert rule.name == "Format Python"
+        assert rule.command_action is not None
+        assert rule.command_action.command == "ruff format {file}"
+        assert rule.command_action.run_for == "each_match"
+
+    def test_valid_compare_to_values(self, tmp_path: Path) -> None:
+        """Valid compare_to values parse successfully."""
+        for compare_to in ["base", "default_tip", "prompt"]:
+            rule_file = tmp_path / "test.md"
+            rule_file.write_text(
+                f"""---
+name: Test Rule
+trigger: "src/**/*"
+compare_to: {compare_to}
+---
+Instructions here.
+"""
+            )
+
+            rule = parse_rule_file(rule_file)
+            assert rule.compare_to == compare_to
+
+    def test_multiple_triggers(self, tmp_path: Path) -> None:
+        """Multiple triggers given as an array parse successfully."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger:
+  - src/**/*.py
+  - lib/**/*.py
+---
+Instructions here.
+"""
+        )
+
+        rule = parse_rule_file(rule_file)
+        assert rule.triggers == ["src/**/*.py", "lib/**/*.py"]
+
+    def test_multiple_safety_patterns(self, tmp_path: Path) -> None:
+        """Multiple safety patterns given as an array parse successfully."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+trigger: src/**/*
+safety:
+  - README.md
+  - CHANGELOG.md
+---
+Instructions here.
+"""
+        )
+
+        rule = parse_rule_file(rule_file)
+        assert rule.safety == ["README.md", "CHANGELOG.md"]
+
+    def test_multiple_expects(self, tmp_path: Path) -> None:
+        """Multiple expects patterns parse successfully."""
+        rule_file = tmp_path / "test.md"
+        rule_file.write_text(
+            """---
+name: Test Rule
+pair:
+  trigger: api/{module}.py
+  expects:
+    - docs/api/{module}.md
+    - openapi/{module}.yaml
+---
+Instructions here.
+"""
+        )
+
+        rule = parse_rule_file(rule_file)
+        assert rule.pair_config is not None
+        assert rule.pair_config.expects == ["docs/api/{module}.md", "openapi/{module}.yaml"]

From 5051f796d8a491f2dcd895dc5c069c667dc8e399 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 14:43:39 -0700
Subject: [PATCH 14/21] Complete migration from v1 to v2 rules format

- Replace single .deepwork.rules.yml (v1) with individual .md files
  in .deepwork/rules/ directory (v2 frontmatter markdown format)

- Update install.py to create rules directory structure with:
  - README explaining v2 format
  - Example templates (.md.example files)

- Add v2 example templates in standard_jobs/deepwork_rules/rules/:
  - readme-documentation.md.example (trigger/safety mode)
  - api-documentation-sync.md.example (trigger/safety mode)
  - security-review.md.example (trigger-only mode)
  - source-test-pairing.md.example (set/bidirectional mode)

- Completely rewrite deepwork_rules.define step for v2 format:
  - Detection mode selection (trigger/safety, set, pair)
  - Variable pattern syntax ({path}, {name})
  - Updated examples and file location guidance

- Migrate this repo's bespoke rules to v2:
  - readme-accuracy.md
  - architecture-documentation-accuracy.md
  - standard-jobs-source-of-truth.md
  - version-and-changelog-update.md

- Remove deprecated src/deepwork/templates/default_rules.yml

- Update integration tests for v2 directory structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 .claude/commands/deepwork_jobs.implement.md   |   2 +-
 .claude/commands/deepwork_rules.define.md     | 269 ++++++++++-------
 .deepwork.rules.yml                           |  71 -----
 .../jobs/deepwork_jobs/steps/implement.md     |   2 +-
 .deepwork/jobs/deepwork_rules/job.yml         |  20 +-
 .deepwork/jobs/deepwork_rules/rules/.gitkeep  |  13 +
 .../rules/api-documentation-sync.md.example   |  10 +
 .../rules/readme-documentation.md.example     |  10 +
 .../rules/security-review.md.example          |  11 +
 .../rules/source-test-pairing.md.example      |  13 +
 .deepwork/jobs/deepwork_rules/steps/define.md | 251 +++++++++-------
 .../architecture-documentation-accuracy.md    |  10 +
 .deepwork/rules/readme-accuracy.md            |  10 +
 .../rules/standard-jobs-source-of-truth.md    |  24 ++
 .../rules/version-and-changelog-update.md     |  28 ++
 .gemini/commands/deepwork_jobs/implement.toml |   2 +-
 .gemini/commands/deepwork_rules/define.toml   | 271 +++++++++++-------
 doc/architecture.md                           |   9 +-
 src/deepwork/cli/install.py                   |  99 +++++--
 .../deepwork_jobs/steps/implement.md          |   2 +-
 .../standard_jobs/deepwork_rules/job.yml      |  20 +-
 .../deepwork_rules/rules/.gitkeep             |  13 +
 .../rules/api-documentation-sync.md.example   |  10 +
 .../rules/readme-documentation.md.example     |  10 +
 .../rules/security-review.md.example          |  11 +
 .../rules/source-test-pairing.md.example      |  13 +
 .../deepwork_rules/steps/define.md            | 251 +++++++++-------
 src/deepwork/templates/default_rules.yml      |  53 ----
 tests/integration/test_install_flow.py        |  56 ++--
 29 files changed, 941 insertions(+), 623 deletions(-)
 delete mode 100644 .deepwork.rules.yml
 create mode 100644 .deepwork/jobs/deepwork_rules/rules/.gitkeep
 create mode 100644 .deepwork/jobs/deepwork_rules/rules/api-documentation-sync.md.example
 create mode 100644 .deepwork/jobs/deepwork_rules/rules/readme-documentation.md.example
 create mode 100644 .deepwork/jobs/deepwork_rules/rules/security-review.md.example
 create mode 100644 .deepwork/jobs/deepwork_rules/rules/source-test-pairing.md.example
 create mode 100644 .deepwork/rules/architecture-documentation-accuracy.md
 create mode 100644 .deepwork/rules/readme-accuracy.md
 create mode 100644 .deepwork/rules/standard-jobs-source-of-truth.md
 create mode 100644 .deepwork/rules/version-and-changelog-update.md
 create mode 100644 src/deepwork/standard_jobs/deepwork_rules/rules/.gitkeep
 create mode 100644 src/deepwork/standard_jobs/deepwork_rules/rules/api-documentation-sync.md.example
 create mode 100644 src/deepwork/standard_jobs/deepwork_rules/rules/readme-documentation.md.example
 create mode 100644 src/deepwork/standard_jobs/deepwork_rules/rules/security-review.md.example
 create mode 100644 src/deepwork/standard_jobs/deepwork_rules/rules/source-test-pairing.md.example
 delete mode 100644 src/deepwork/templates/default_rules.yml

diff --git a/.claude/commands/deepwork_jobs.implement.md b/.claude/commands/deepwork_jobs.implement.md
index 76089b2d..7c224679 100644
--- a/.claude/commands/deepwork_jobs.implement.md
+++ b/.claude/commands/deepwork_jobs.implement.md
@@ -206,7 +206,7 @@ After implementing the job, consider whether there are **rules** that would help
 
 **What are rules?**
 
-Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails stored as markdown files in `.deepwork/rules/` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
diff --git a/.claude/commands/deepwork_rules.define.md b/.claude/commands/deepwork_rules.define.md
index 286cd54a..148247f2 100644
--- a/.claude/commands/deepwork_rules.define.md
+++ b/.claude/commands/deepwork_rules.define.md
@@ -1,5 +1,5 @@
 ---
-description: Create or update rule entries in .deepwork.rules.yml
+description: Create a new rule file in .deepwork/rules/
 ---
 
 # deepwork_rules.define
@@ -14,17 +14,17 @@ Manages rules that automatically trigger when certain files change during an AI
 Rules help ensure that code changes follow team guidelines, documentation is updated,
 and architectural decisions are respected.
 
-Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
-specifies:
-- Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
-- Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
-- Instructions: What the agent should do when the rule triggers
+Rules are stored as individual markdown files with YAML frontmatter in the `.deepwork/rules/`
+directory. Each rule file specifies:
+- Detection mode: trigger/safety, set (bidirectional), or pair (directional)
+- Patterns: Glob patterns for matching files, with optional variable capture
+- Instructions: Markdown content describing what the agent should do
 
 Example use cases:
 - Update installation docs when configuration files change
 - Require security review when authentication code is modified
 - Ensure API documentation stays in sync with API code
-- Remind developers to update changelogs
+- Enforce source/test file pairing
 
 
 
@@ -34,7 +34,7 @@ Example use cases:
 
 ## Objective
 
-Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create a new rule file in the `.deepwork/rules/` directory to enforce team guidelines, documentation requirements, or other constraints when specific files change.
 
 ## Task
 
@@ -61,9 +61,28 @@ Start by asking structured questions to understand what the user wants to enforc
    - For example: If config changes AND install_guide.md changes, assume docs are already updated
    - This prevents redundant prompts when the user has already done the right thing
 
-### Step 2: Define the Trigger Patterns
+### Step 2: Choose the Detection Mode
 
-Help the user define glob patterns for files that should trigger the rule:
+Help the user select the appropriate detection mode:
+
+**Trigger/Safety Mode** (most common):
+- Fires when trigger patterns match AND no safety patterns match
+- Use for: "When X changes, check Y" rules
+- Example: When config changes, verify install docs
+
+**Set Mode** (bidirectional correspondence):
+- Fires when files that should change together don't all change
+- Use for: Source/test pairing, model/migration sync
+- Example: `src/foo.py` and `tests/foo_test.py` should change together
+
+**Pair Mode** (directional correspondence):
+- Fires when a trigger file changes but expected files don't
+- Changes to expected files alone do NOT trigger
+- Use for: API code requires documentation updates (but docs can update independently)
+
+### Step 3: Define the Patterns
+
+Help the user define glob patterns for files.
 
 **Common patterns:**
 - `src/**/*.py` - All Python files in src directory (recursive)
@@ -72,41 +91,28 @@ Help the user define glob patterns for files that should trigger the rule:
 - `src/api/**/*` - All files in the API directory
 - `migrations/**/*.sql` - All SQL migrations
 
+**Variable patterns (for set/pair modes):**
+- `src/{path}.py` - Captures the `path` variable (e.g., `foo/bar` from `src/foo/bar.py`)
+- `tests/{path}_test.py` - Uses the same `path` variable in the corresponding file
+- `{name}` matches a single path segment; `{path}` matches multiple segments
+
 **Pattern syntax:**
 - `*` - Matches any characters within a single path segment
 - `**` - Matches any characters across multiple path segments (recursive)
 - `?` - Matches a single character
 
-### Step 3: Define Safety Patterns (Optional)
-
-If there are files that, when also changed, mean the rule shouldn't fire:
-
-**Examples:**
-- Rule: "Update install guide when config changes"
-  - Trigger: `app/config/**/*`
-  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
-
-- Rule: "Security review for auth changes"
-  - Trigger: `src/auth/**/*`
-  - Safety: `SECURITY.md`, `docs/security_review.md`
-
-### Step 3b: Choose the Comparison Mode (Optional)
+### Step 4: Choose the Comparison Mode (Optional)
 
 The `compare_to` field controls what baseline is used when detecting "changed files":
 
 **Options:**
-- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
-- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
-
-**When to use each:**
-- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
-- **default_tip**: For rules about what's different from production/main
-- **prompt**: For rules that should only consider very recent changes within the current session
+- `base` (default) - Compares to the base of the current branch (merge-base with main/master). Best for feature branches.
+- `default_tip` - Compares to the current tip of the default branch. Useful for seeing the difference from production.
+- `prompt` - Compares to the state at the start of each prompt. For rules about very recent changes.
 
 Most rules should use the default (`base`) and don't need to specify `compare_to`.
 
-### Step 4: Write the Instructions
+### Step 5: Write the Instructions
 
 Create clear, actionable instructions for what the agent should do when the rule fires.
 
@@ -116,45 +122,62 @@ Create clear, actionable instructions for what the agent should do when the rule
 - Specific actions to take
 - Quality criteria for completion
 
-**Example:**
-```
-Configuration files have changed. Please:
-1. Review docs/install_guide.md for accuracy
-2. Update any installation steps that reference changed config
-3. Verify environment variable documentation is current
-4. Test that installation instructions still work
-```
+**Template variables available in instructions:**
+- `{trigger_files}` - Files that triggered the rule
+- `{expected_files}` - Expected corresponding files (for set/pair modes)
 
-### Step 5: Create the Rule Entry
+### Step 6: Create the Rule File
 
-Create or update `.deepwork.rules.yml` in the project root.
+Create a new file in `.deepwork/rules/` with a kebab-case filename:
 
-**File Location**: `.deepwork.rules.yml` (root of project)
+**File Location**: `.deepwork/rules/{rule-name}.md`
+
+**Format for Trigger/Safety Mode:**
+```markdown
+---
+name: Friendly Name for the Rule
+trigger: "glob/pattern/**/*"  # or array: ["pattern1", "pattern2"]
+safety: "optional/pattern"    # optional, or array
+compare_to: base              # optional: "base" (default), "default_tip", or "prompt"
+---
+Instructions for the agent when this rule fires.
 
-**Format**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
-  safety: "[glob pattern]"   # optional, or array
-  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
-  instructions: |
-    [Multi-line instructions for the agent...]
+Multi-line markdown content is supported.
 ```
 
-**Alternative with instructions_file**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"
-  safety: "[glob pattern]"
-  compare_to: "base"         # optional
-  instructions_file: "path/to/instructions.md"
+**Format for Set Mode (bidirectional):**
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
-### Step 6: Verify the Rule
+**Format for Pair Mode (directional):**
+```markdown
+---
+name: API Documentation
+pair:
+  trigger: api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code requires documentation updates.
+
+Changed API: {trigger_files}
+Update docs: {expected_files}
+```
+
+### Step 7: Verify the Rule
 
 After creating the rule:
 
-1. **Check the YAML syntax** - Ensure valid YAML formatting
+1. **Check the YAML frontmatter** - Ensure valid YAML formatting
 2. **Test trigger patterns** - Verify patterns match intended files
 3. **Review instructions** - Ensure they're clear and actionable
 4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
@@ -162,72 +185,100 @@ After creating the rule:
 ## Example Rules
 
 ### Update Documentation on Config Changes
-```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change based on the
-    new configuration.
+`.deepwork/rules/config-docs.md`:
+```markdown
+---
+name: Update Install Guide on Config Changes
+trigger: app/config/**/*
+safety: docs/install_guide.md
+---
+Configuration files have been modified. Please review docs/install_guide.md
+and update it if any installation instructions need to change based on the
+new configuration.
 ```
 
 ### Security Review for Auth Code
-```yaml
-- name: "Security review for authentication changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Review for hardcoded credentials or secrets
-    2. Check input validation on user inputs
-    3. Verify access control logic is correct
-    4. Update security documentation if needed
+`.deepwork/rules/security-review.md`:
+```markdown
+---
+name: Security Review for Authentication Changes
+trigger:
+  - src/auth/**/*
+  - src/security/**/*
+safety:
+  - SECURITY.md
+  - docs/security_audit.md
+---
+Authentication or security code has been changed. Please:
+
+1. Review for hardcoded credentials or secrets
+2. Check input validation on user inputs
+3. Verify access control logic is correct
+4. Update security documentation if needed
+```
+
+### Source/Test Pairing
+`.deepwork/rules/source-test-pairing.md`:
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+When modifying source code, ensure corresponding tests are updated.
+When adding tests, ensure they test actual source code.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
 ### API Documentation Sync
-```yaml
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  safety: "docs/api/**/*.md"
-  instructions: |
-    API code has changed. Please verify that API documentation in docs/api/
-    is up to date with the code changes. Pay special attention to:
-    - New or changed endpoints
-    - Modified request/response schemas
-    - Updated authentication requirements
+`.deepwork/rules/api-docs.md`:
+```markdown
+---
+name: API Documentation Update
+pair:
+  trigger: src/api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code has changed. Please verify that API documentation in docs/api/
+is up to date with the code changes. Pay special attention to:
+
+- New or changed endpoints
+- Modified request/response schemas
+- Updated authentication requirements
+
+Changed API: {trigger_files}
+Update: {expected_files}
 ```
 
 ## Output Format
 
-### .deepwork.rules.yml
-Create or update this file at the project root with the new rule entry.
+### .deepwork/rules/{rule-name}.md
+Create a new file with the rule definition using YAML frontmatter and markdown body.
 
 ## Quality Criteria
 
 - Asked structured questions to understand user requirements
-- Rule name is clear and descriptive
-- Trigger patterns accurately match the intended files
-- Safety patterns prevent unnecessary triggering
+- Rule name is clear and descriptive (used in promise tags)
+- Correct detection mode selected for the use case
+- Patterns accurately match the intended files
+- Safety patterns prevent unnecessary triggering (if applicable)
 - Instructions are actionable and specific
-- YAML is valid and properly formatted
+- YAML frontmatter is valid
 
 ## Context
 
-Rules are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each rule's `compare_to` setting:
-   - `base` (default): Files changed since the branch diverged from main/master
-   - `default_tip`: Files different from the current main/master branch
-   - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match rule trigger patterns
-3. Skips rules where safety patterns also matched
+Rules are evaluated automatically when the agent finishes a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting
+2. Evaluates rules based on their detection mode (trigger/safety, set, or pair)
+3. Skips rules where the correspondence is satisfied (for set/pair) or a safety pattern matched
 4. Prompts you with instructions for any triggered rules
 
-You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name from the `name` field). This tells the system you've already handled that rule's requirements.
 
 
 ## Inputs
@@ -255,7 +306,7 @@ All work for this job should be done on a dedicated work branch:
 ## Output Requirements
 
 Create the following output(s):
-- `.deepwork.rules.yml`
+- `.deepwork/rules/{rule-name}.md`
 Ensure all outputs are:
 - Well-formatted and complete
 - Ready for review or use by subsequent steps
@@ -268,7 +319,7 @@ After completing this step:
 
 2. **Inform the user**:
    - The define command is complete
-   - Outputs created: .deepwork.rules.yml
+   - Outputs created: .deepwork/rules/{rule-name}.md
    - This command can be run again anytime to make further changes
 
 ## Command Complete
diff --git a/.deepwork.rules.yml b/.deepwork.rules.yml
deleted file mode 100644
index d3444dc3..00000000
--- a/.deepwork.rules.yml
+++ /dev/null
@@ -1,71 +0,0 @@
-- name: "README Accuracy"
-  trigger: "src/**/*"
-  safety: "README.md"
-  instructions: |
-    Source code in src/ has been modified. Please review README.md for accuracy:
-    1. Verify project overview still reflects current functionality
-    2. Check that usage examples are still correct
-    3. Ensure installation/setup instructions remain valid
-    4. Update any sections that reference changed code
-
-- name: "Architecture Documentation Accuracy"
-  trigger: "src/**/*"
-  safety: "doc/architecture.md"
-  instructions: |
-    Source code in src/ has been modified. Please review doc/architecture.md for accuracy:
-    1. Verify the documented architecture matches the current implementation
-    2. Check that file paths and directory structures are still correct
-    3. Ensure component descriptions reflect actual behavior
-    4. Update any diagrams or flows that may have changed
-
-- name: "Standard Jobs Source of Truth"
-  trigger:
-    - ".deepwork/jobs/deepwork_jobs/**/*"
-    - ".deepwork/jobs/deepwork_rules/**/*"
-  safety:
-    - "src/deepwork/standard_jobs/deepwork_jobs/**/*"
-    - "src/deepwork/standard_jobs/deepwork_rules/**/*"
-  instructions: |
-    You modified files in `.deepwork/jobs/deepwork_jobs/` or `.deepwork/jobs/deepwork_rules/`.
-
-    **These are installed copies, NOT the source of truth!**
-
-    Standard jobs (deepwork_jobs, deepwork_rules) must be edited in their source location:
-    - Source: `src/deepwork/standard_jobs/[job_name]/`
-    - Installed copy: `.deepwork/jobs/[job_name]/` (DO NOT edit directly)
-
-    **Required action:**
-    1. Revert your changes to `.deepwork/jobs/deepwork_*/`
-    2. Make the same changes in `src/deepwork/standard_jobs/[job_name]/`
-    3. Run `deepwork install --platform claude` to sync changes
-    4. Verify the changes propagated correctly
-
-    See CLAUDE.md section "CRITICAL: Editing Standard Jobs" for details.
-
-- name: "Version and Changelog Update"
-  trigger: "src/**/*"
-  safety:
-    - "pyproject.toml"
-    - "CHANGELOG.md"
-  instructions: |
-    Source code in src/ has been modified. **You MUST evaluate whether version and changelog updates are needed.**
-
-    **Evaluate the changes:**
-    1. Is this a bug fix, new feature, breaking change, or internal refactor?
-    2. Does this change affect the public API or user-facing behavior?
-    3. Would users need to know about this change when upgrading?
-
-    **If version update is needed:**
-    1. Update the `version` field in `pyproject.toml` following semantic versioning:
-       - PATCH (0.1.x): Bug fixes, minor internal changes
-       - MINOR (0.x.0): New features, non-breaking changes
-       - MAJOR (x.0.0): Breaking changes
-    2. Add an entry to `CHANGELOG.md` under an appropriate version header:
-       - Use categories: Added, Changed, Fixed, Removed, Deprecated, Security
-       - Include a clear, user-facing description of what changed
-       - Follow the Keep a Changelog format
-
-    **If NO version update is needed** (e.g., tests only, comments, internal refactoring with no behavior change):
-    - Explicitly state why no version bump is required
-
-    **This rule requires explicit action** - either update both files or justify why no update is needed.
diff --git a/.deepwork/jobs/deepwork_jobs/steps/implement.md b/.deepwork/jobs/deepwork_jobs/steps/implement.md
index 600e1578..7771eaee 100644
--- a/.deepwork/jobs/deepwork_jobs/steps/implement.md
+++ b/.deepwork/jobs/deepwork_jobs/steps/implement.md
@@ -136,7 +136,7 @@ After implementing the job, consider whether there are **rules** that would help
 
 **What are rules?**
 
-Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails stored as markdown files in `.deepwork/rules/` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
diff --git a/.deepwork/jobs/deepwork_rules/job.yml b/.deepwork/jobs/deepwork_rules/job.yml
index 9e9ece74..af540bc4 100644
--- a/.deepwork/jobs/deepwork_rules/job.yml
+++ b/.deepwork/jobs/deepwork_rules/job.yml
@@ -1,37 +1,39 @@
 name: deepwork_rules
-version: "0.2.0"
+version: "0.3.0"
 summary: "Rules enforcement for AI agent sessions"
 description: |
   Manages rules that automatically trigger when certain files change during an AI agent session.
   Rules help ensure that code changes follow team guidelines, documentation is updated,
   and architectural decisions are respected.
 
-  Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
-  specifies:
-  - Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
-  - Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
-  - Instructions: What the agent should do when the rule triggers
+  Rules are stored as individual markdown files with YAML frontmatter in the `.deepwork/rules/`
+  directory. Each rule file specifies:
+  - Detection mode: trigger/safety, set (bidirectional), or pair (directional)
+  - Patterns: Glob patterns for matching files, with optional variable capture
+  - Instructions: Markdown content describing what the agent should do
 
   Example use cases:
   - Update installation docs when configuration files change
   - Require security review when authentication code is modified
   - Ensure API documentation stays in sync with API code
-  - Remind developers to update changelogs
+  - Enforce source/test file pairing
 
 changelog:
   - version: "0.1.0"
     changes: "Initial version"
   - version: "0.2.0"
     changes: "Standardized on 'ask structured questions' phrasing for user input"
+  - version: "0.3.0"
+    changes: "Migrated to v2 format - individual markdown files in .deepwork/rules/"
 
 steps:
   - id: define
     name: "Define Rule"
-    description: "Create or update rule entries in .deepwork.rules.yml"
+    description: "Create a new rule file in .deepwork/rules/"
     instructions_file: steps/define.md
     inputs:
       - name: rule_purpose
         description: "What guideline or constraint should this rule enforce?"
     outputs:
-      - .deepwork.rules.yml
+      - .deepwork/rules/{rule-name}.md
     dependencies: []
diff --git a/.deepwork/jobs/deepwork_rules/rules/.gitkeep b/.deepwork/jobs/deepwork_rules/rules/.gitkeep
new file mode 100644
index 00000000..429162b4
--- /dev/null
+++ b/.deepwork/jobs/deepwork_rules/rules/.gitkeep
@@ -0,0 +1,13 @@
+# This directory contains example rule templates.
+# Copy and customize these files to create your own rules.
+#
+# Rule files use YAML frontmatter in markdown format:
+#
+# ---
+# name: Rule Name
+# trigger: "pattern/**/*"
+# safety: "optional/pattern"
+# ---
+# Instructions in markdown here.
+#
+# See doc/rules_syntax.md for full documentation.
diff --git a/.deepwork/jobs/deepwork_rules/rules/api-documentation-sync.md.example b/.deepwork/jobs/deepwork_rules/rules/api-documentation-sync.md.example
new file mode 100644
index 00000000..427da7ae
--- /dev/null
+++ b/.deepwork/jobs/deepwork_rules/rules/api-documentation-sync.md.example
@@ -0,0 +1,10 @@
+---
+name: API Documentation Sync
+trigger: src/api/**/*
+safety: docs/api/**/*.md
+---
+API code has changed. Please verify that API documentation is up to date:
+
+- New or changed endpoints
+- Modified request/response schemas
+- Updated authentication requirements
diff --git a/.deepwork/jobs/deepwork_rules/rules/readme-documentation.md.example b/.deepwork/jobs/deepwork_rules/rules/readme-documentation.md.example
new file mode 100644
index 00000000..6be90c83
--- /dev/null
+++ b/.deepwork/jobs/deepwork_rules/rules/readme-documentation.md.example
@@ -0,0 +1,10 @@
+---
+name: README Documentation
+trigger: src/**/*
+safety: README.md
+---
+Source code has been modified. Please review README.md for accuracy:
+
+1. Verify the project overview reflects current functionality
+2. Check that usage examples are still correct
+3. Ensure installation/setup instructions remain valid
diff --git a/.deepwork/jobs/deepwork_rules/rules/security-review.md.example b/.deepwork/jobs/deepwork_rules/rules/security-review.md.example
new file mode 100644
index 00000000..abce3194
--- /dev/null
+++ b/.deepwork/jobs/deepwork_rules/rules/security-review.md.example
@@ -0,0 +1,11 @@
+---
+name: Security Review for Auth Changes
+trigger:
+  - src/auth/**/*
+  - src/security/**/*
+---
+Authentication or security code has been changed. Please:
+
+1. Review for hardcoded credentials or secrets
+2. Check input validation on user inputs
+3. Verify access control logic is correct
diff --git a/.deepwork/jobs/deepwork_rules/rules/source-test-pairing.md.example b/.deepwork/jobs/deepwork_rules/rules/source-test-pairing.md.example
new file mode 100644
index 00000000..3ebd6968
--- /dev/null
+++ b/.deepwork/jobs/deepwork_rules/rules/source-test-pairing.md.example
@@ -0,0 +1,13 @@
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+When modifying source code, ensure corresponding tests are updated.
+When adding tests, ensure they test actual source code.
+
+Modified source: {trigger_files}
+Expected tests: {expected_files}
diff --git a/.deepwork/jobs/deepwork_rules/steps/define.md b/.deepwork/jobs/deepwork_rules/steps/define.md
index 3e8be899..1e38a5e6 100644
--- a/.deepwork/jobs/deepwork_rules/steps/define.md
+++ b/.deepwork/jobs/deepwork_rules/steps/define.md
@@ -2,7 +2,7 @@
 
 ## Objective
 
-Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create a new rule file in the `.deepwork/rules/` directory to enforce team guidelines, documentation requirements, or other constraints when specific files change.
 
 ## Task
 
@@ -29,9 +29,28 @@ Start by asking structured questions to understand what the user wants to enforc
    - For example: If config changes AND install_guide.md changes, assume docs are already updated
    - This prevents redundant prompts when the user has already done the right thing
 
-### Step 2: Define the Trigger Patterns
+### Step 2: Choose the Detection Mode
 
-Help the user define glob patterns for files that should trigger the rule:
+Help the user select the appropriate detection mode:
+
+**Trigger/Safety Mode** (most common):
+- Fires when trigger patterns match AND no safety patterns match
+- Use for: "When X changes, check Y" rules
+- Example: When config changes, verify install docs
+
+**Set Mode** (bidirectional correspondence):
+- Fires when files that should change together don't all change
+- Use for: Source/test pairing, model/migration sync
+- Example: `src/foo.py` and `tests/foo_test.py` should change together
+
+**Pair Mode** (directional correspondence):
+- Fires when a trigger file changes but expected files don't
+- Changes to expected files alone do NOT trigger
+- Use for: API code requires documentation updates (but docs can update independently)
+
+### Step 3: Define the Patterns
+
+Help the user define glob patterns for files.
 
 **Common patterns:**
 - `src/**/*.py` - All Python files in src directory (recursive)
@@ -40,41 +59,28 @@ Help the user define glob patterns for files that should trigger the rule:
 - `src/api/**/*` - All files in the API directory
 - `migrations/**/*.sql` - All SQL migrations
 
+**Variable patterns (for set/pair modes):**
+- `src/{path}.py` - Captures the `path` variable (e.g., `foo/bar` from `src/foo/bar.py`)
+- `tests/{path}_test.py` - Reuses the same `path` variable in the corresponding file
+- `{name}` matches a single path segment; `{path}` matches multiple segments
+
 **Pattern syntax:**
 - `*` - Matches any characters within a single path segment
 - `**` - Matches any characters across multiple path segments (recursive)
 - `?` - Matches a single character
 
-### Step 3: Define Safety Patterns (Optional)
-
-If there are files that, when also changed, mean the rule shouldn't fire:
-
-**Examples:**
-- Rule: "Update install guide when config changes"
-  - Trigger: `app/config/**/*`
-  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
-
-- Rule: "Security review for auth changes"
-  - Trigger: `src/auth/**/*`
-  - Safety: `SECURITY.md`, `docs/security_review.md`
-
-### Step 3b: Choose the Comparison Mode (Optional)
+### Step 4: Choose the Comparison Mode (Optional)
 
 The `compare_to` field controls what baseline is used when detecting "changed files":
 
 **Options:**
-- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
-- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
-
-**When to use each:**
-- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
-- **default_tip**: For rules about what's different from production/main
-- **prompt**: For rules that should only consider very recent changes within the current session
+- `base` (default) - Compares to the base of the current branch (merge-base with main/master). Best for feature branches.
+- `default_tip` - Compares to the current tip of the default branch. Useful for seeing the difference from production.
+- `prompt` - Compares to the state at the start of each prompt. For rules about very recent changes.
 
 Most rules should use the default (`base`) and don't need to specify `compare_to`.
 
-### Step 4: Write the Instructions
+### Step 5: Write the Instructions
 
 Create clear, actionable instructions for what the agent should do when the rule fires.
 
@@ -84,45 +90,62 @@ Create clear, actionable instructions for what the agent should do when the rule
 - Specific actions to take
 - Quality criteria for completion
 
-**Example:**
-```
-Configuration files have changed. Please:
-1. Review docs/install_guide.md for accuracy
-2. Update any installation steps that reference changed config
-3. Verify environment variable documentation is current
-4. Test that installation instructions still work
-```
+**Template variables available in instructions:**
+- `{trigger_files}` - Files that triggered the rule
+- `{expected_files}` - Expected corresponding files (for set/pair modes)
+
+### Step 6: Create the Rule File
 
-### Step 5: Create the Rule Entry
+Create a new file in `.deepwork/rules/` with a kebab-case filename:
 
-Create or update `.deepwork.rules.yml` in the project root.
+**File Location**: `.deepwork/rules/{rule-name}.md`
 
-**File Location**: `.deepwork.rules.yml` (root of project)
+**Format for Trigger/Safety Mode:**
+```markdown
+---
+name: Friendly Name for the Rule
+trigger: "glob/pattern/**/*"  # or array: ["pattern1", "pattern2"]
+safety: "optional/pattern"    # optional, or array
+compare_to: base              # optional: "base" (default), "default_tip", or "prompt"
+---
+Instructions for the agent when this rule fires.
 
-**Format**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
-  safety: "[glob pattern]"   # optional, or array
-  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
-  instructions: |
-    [Multi-line instructions for the agent...]
+Multi-line markdown content is supported.
 ```
 
-**Alternative with instructions_file**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"
-  safety: "[glob pattern]"
-  compare_to: "base"         # optional
-  instructions_file: "path/to/instructions.md"
+**Format for Set Mode (bidirectional):**
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
-### Step 6: Verify the Rule
+**Format for Pair Mode (directional):**
+```markdown
+---
+name: API Documentation
+pair:
+  trigger: api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code requires documentation updates.
+
+Changed API: {trigger_files}
+Update docs: {expected_files}
+```
+
+### Step 7: Verify the Rule
 
 After creating the rule:
 
-1. **Check the YAML syntax** - Ensure valid YAML formatting
+1. **Check the YAML frontmatter** - Ensure valid YAML formatting
 2. **Test trigger patterns** - Verify patterns match intended files
 3. **Review instructions** - Ensure they're clear and actionable
 4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
@@ -130,69 +153,97 @@ After creating the rule:
 ## Example Rules
 
 ### Update Documentation on Config Changes
-```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change based on the
-    new configuration.
+`.deepwork/rules/config-docs.md`:
+```markdown
+---
+name: Update Install Guide on Config Changes
+trigger: app/config/**/*
+safety: docs/install_guide.md
+---
+Configuration files have been modified. Please review docs/install_guide.md
+and update it if any installation instructions need to change based on the
+new configuration.
 ```
 
 ### Security Review for Auth Code
-```yaml
-- name: "Security review for authentication changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Review for hardcoded credentials or secrets
-    2. Check input validation on user inputs
-    3. Verify access control logic is correct
-    4. Update security documentation if needed
+`.deepwork/rules/security-review.md`:
+```markdown
+---
+name: Security Review for Authentication Changes
+trigger:
+  - src/auth/**/*
+  - src/security/**/*
+safety:
+  - SECURITY.md
+  - docs/security_audit.md
+---
+Authentication or security code has been changed. Please:
+
+1. Review for hardcoded credentials or secrets
+2. Check input validation on user inputs
+3. Verify access control logic is correct
+4. Update security documentation if needed
+```
+
+### Source/Test Pairing
+`.deepwork/rules/source-test-pairing.md`:
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+When modifying source code, ensure corresponding tests are updated.
+When adding tests, ensure they test actual source code.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
 ### API Documentation Sync
-```yaml
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  safety: "docs/api/**/*.md"
-  instructions: |
-    API code has changed. Please verify that API documentation in docs/api/
-    is up to date with the code changes. Pay special attention to:
-    - New or changed endpoints
-    - Modified request/response schemas
-    - Updated authentication requirements
+`.deepwork/rules/api-docs.md`:
+```markdown
+---
+name: API Documentation Update
+pair:
+  trigger: src/api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code has changed. Please verify that API documentation in docs/api/
+is up to date with the code changes. Pay special attention to:
+
+- New or changed endpoints
+- Modified request/response schemas
+- Updated authentication requirements
+
+Changed API: {trigger_files}
+Update: {expected_files}
 ```
 
 ## Output Format
 
-### .deepwork.rules.yml
-Create or update this file at the project root with the new rule entry.
+### .deepwork/rules/{rule-name}.md
+Create a new file with the rule definition using YAML frontmatter and markdown body.
 
 ## Quality Criteria
 
 - Asked structured questions to understand user requirements
-- Rule name is clear and descriptive
-- Trigger patterns accurately match the intended files
-- Safety patterns prevent unnecessary triggering
+- Rule name is clear and descriptive (used in promise tags)
+- Correct detection mode selected for the use case
+- Patterns accurately match the intended files
+- Safety patterns prevent unnecessary triggering (if applicable)
 - Instructions are actionable and specific
-- YAML is valid and properly formatted
+- YAML frontmatter is valid
 
 ## Context
 
-Rules are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each rule's `compare_to` setting:
-   - `base` (default): Files changed since the branch diverged from main/master
-   - `default_tip`: Files different from the current main/master branch
-   - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match rule trigger patterns
-3. Skips rules where safety patterns also matched
+Rules are evaluated automatically when the agent finishes a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting
+2. Evaluates rules based on their detection mode (trigger/safety, set, or pair)
+3. Skips rules where the correspondence is satisfied (for set/pair) or a safety pattern matched
 4. Prompts you with instructions for any triggered rules
 
-You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name from the `name` field). This tells the system you've already handled that rule's requirements.
diff --git a/.deepwork/rules/architecture-documentation-accuracy.md b/.deepwork/rules/architecture-documentation-accuracy.md
new file mode 100644
index 00000000..42f74f88
--- /dev/null
+++ b/.deepwork/rules/architecture-documentation-accuracy.md
@@ -0,0 +1,10 @@
+---
+name: Architecture Documentation Accuracy
+trigger: src/**/*
+safety: doc/architecture.md
+---
+Source code in src/ has been modified. Please review doc/architecture.md for accuracy:
+1. Verify the documented architecture matches the current implementation
+2. Check that file paths and directory structures are still correct
+3. Ensure component descriptions reflect actual behavior
+4. Update any diagrams or flows that may have changed
diff --git a/.deepwork/rules/readme-accuracy.md b/.deepwork/rules/readme-accuracy.md
new file mode 100644
index 00000000..8284142b
--- /dev/null
+++ b/.deepwork/rules/readme-accuracy.md
@@ -0,0 +1,10 @@
+---
+name: README Accuracy
+trigger: src/**/*
+safety: README.md
+---
+Source code in src/ has been modified. Please review README.md for accuracy:
+1. Verify project overview still reflects current functionality
+2. Check that usage examples are still correct
+3. Ensure installation/setup instructions remain valid
+4. Update any sections that reference changed code
diff --git a/.deepwork/rules/standard-jobs-source-of-truth.md b/.deepwork/rules/standard-jobs-source-of-truth.md
new file mode 100644
index 00000000..3698489d
--- /dev/null
+++ b/.deepwork/rules/standard-jobs-source-of-truth.md
@@ -0,0 +1,24 @@
+---
+name: Standard Jobs Source of Truth
+trigger:
+  - .deepwork/jobs/deepwork_jobs/**/*
+  - .deepwork/jobs/deepwork_rules/**/*
+safety:
+  - src/deepwork/standard_jobs/deepwork_jobs/**/*
+  - src/deepwork/standard_jobs/deepwork_rules/**/*
+---
+You modified files in `.deepwork/jobs/deepwork_jobs/` or `.deepwork/jobs/deepwork_rules/`.
+
+**These are installed copies, NOT the source of truth!**
+
+Standard jobs (deepwork_jobs, deepwork_rules) must be edited in their source location:
+- Source: `src/deepwork/standard_jobs/[job_name]/`
+- Installed copy: `.deepwork/jobs/[job_name]/` (DO NOT edit directly)
+
+**Required action:**
+1. Revert your changes to `.deepwork/jobs/deepwork_*/`
+2. Make the same changes in `src/deepwork/standard_jobs/[job_name]/`
+3. Run `deepwork install --platform claude` to sync changes
+4. Verify the changes propagated correctly
+
+See CLAUDE.md section "CRITICAL: Editing Standard Jobs" for details.
diff --git a/.deepwork/rules/version-and-changelog-update.md b/.deepwork/rules/version-and-changelog-update.md
new file mode 100644
index 00000000..58e35088
--- /dev/null
+++ b/.deepwork/rules/version-and-changelog-update.md
@@ -0,0 +1,28 @@
+---
+name: Version and Changelog Update
+trigger: src/**/*
+safety:
+  - pyproject.toml
+  - CHANGELOG.md
+---
+Source code in src/ has been modified. **You MUST evaluate whether version and changelog updates are needed.**
+
+**Evaluate the changes:**
+1. Is this a bug fix, new feature, breaking change, or internal refactor?
+2. Does this change affect the public API or user-facing behavior?
+3. Would users need to know about this change when upgrading?
+
+**If version update is needed:**
+1. Update the `version` field in `pyproject.toml` following semantic versioning:
+   - PATCH (0.1.x): Bug fixes, minor internal changes
+   - MINOR (0.x.0): New features, non-breaking changes
+   - MAJOR (x.0.0): Breaking changes
+2. Add an entry to `CHANGELOG.md` under an appropriate version header:
+   - Use categories: Added, Changed, Fixed, Removed, Deprecated, Security
+   - Include a clear, user-facing description of what changed
+   - Follow the Keep a Changelog format
+
+**If NO version update is needed** (e.g., tests only, comments, internal refactoring with no behavior change):
+- Explicitly state why no version bump is required
+
+**This rule requires explicit action** - either update both files or justify why no update is needed.
diff --git a/.gemini/commands/deepwork_jobs/implement.toml b/.gemini/commands/deepwork_jobs/implement.toml
index 4c09fc47..4cc5a989 100644
--- a/.gemini/commands/deepwork_jobs/implement.toml
+++ b/.gemini/commands/deepwork_jobs/implement.toml
@@ -174,7 +174,7 @@ After implementing the job, consider whether there are **rules** that would help
 
 **What are rules?**
 
-Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails stored as markdown files in `.deepwork/rules/` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
diff --git a/.gemini/commands/deepwork_rules/define.toml b/.gemini/commands/deepwork_rules/define.toml
index 3615c83e..28d6d5b4 100644
--- a/.gemini/commands/deepwork_rules/define.toml
+++ b/.gemini/commands/deepwork_rules/define.toml
@@ -1,10 +1,10 @@
 # deepwork_rules:define
 #
-# Create or update rule entries in .deepwork.rules.yml
+# Create a new rule file in .deepwork/rules/
 #
 # Generated by DeepWork - do not edit manually
 
-description = "Create or update rule entries in .deepwork.rules.yml"
+description = "Create a new rule file in .deepwork/rules/"
 
 prompt = """
 # deepwork_rules:define
@@ -19,17 +19,17 @@ Manages rules that automatically trigger when certain files change during an AI
 Rules help ensure that code changes follow team guidelines, documentation is updated,
 and architectural decisions are respected.
 
-Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
-specifies:
-- Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
-- Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
-- Instructions: What the agent should do when the rule triggers
+Rules are stored as individual markdown files with YAML frontmatter in the `.deepwork/rules/`
+directory. Each rule file specifies:
+- Detection mode: trigger/safety, set (bidirectional), or pair (directional)
+- Patterns: Glob patterns for matching files, with optional variable capture
+- Instructions: Markdown content describing what the agent should do
 
 Example use cases:
 - Update installation docs when configuration files change
 - Require security review when authentication code is modified
 - Ensure API documentation stays in sync with API code
-- Remind developers to update changelogs
+- Enforce source/test file pairing
 
 
 
@@ -39,7 +39,7 @@ Example use cases:
 
 ## Objective
 
-Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create a new rule file in the `.deepwork/rules/` directory to enforce team guidelines, documentation requirements, or other constraints when specific files change.
 
 ## Task
 
@@ -66,9 +66,28 @@ Start by asking structured questions to understand what the user wants to enforc
    - For example: If config changes AND install_guide.md changes, assume docs are already updated
    - This prevents redundant prompts when the user has already done the right thing
 
-### Step 2: Define the Trigger Patterns
+### Step 2: Choose the Detection Mode
 
-Help the user define glob patterns for files that should trigger the rule:
+Help the user select the appropriate detection mode:
+
+**Trigger/Safety Mode** (most common):
+- Fires when trigger patterns match AND no safety patterns match
+- Use for: "When X changes, check Y" rules
+- Example: When config changes, verify install docs
+
+**Set Mode** (bidirectional correspondence):
+- Fires when files that should change together don't all change
+- Use for: Source/test pairing, model/migration sync
+- Example: `src/foo.py` and `tests/foo_test.py` should change together
+
+**Pair Mode** (directional correspondence):
+- Fires when a trigger file changes but expected files don't
+- Changes to expected files alone do NOT trigger
+- Use for: API code requires documentation updates (but docs can update independently)
+
+### Step 3: Define the Patterns
+
+Help the user define glob patterns for files.
 
 **Common patterns:**
 - `src/**/*.py` - All Python files in src directory (recursive)
@@ -77,41 +96,28 @@ Help the user define glob patterns for files that should trigger the rule:
 - `src/api/**/*` - All files in the API directory
 - `migrations/**/*.sql` - All SQL migrations
 
+**Variable patterns (for set/pair modes):**
+- `src/{path}.py` - Captures the `{path}` variable (e.g., `foo/bar` from `src/foo/bar.py`)
+- `tests/{path}_test.py` - Reuses the captured `{path}` value to locate the corresponding file
+- `{name}` matches a single path segment; `{path}` matches one or more segments
+
 **Pattern syntax:**
 - `*` - Matches any characters within a single path segment
 - `**` - Matches any characters across multiple path segments (recursive)
 - `?` - Matches a single character
 
-### Step 3: Define Safety Patterns (Optional)
-
-If there are files that, when also changed, mean the rule shouldn't fire:
-
-**Examples:**
-- Rule: "Update install guide when config changes"
-  - Trigger: `app/config/**/*`
-  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
-
-- Rule: "Security review for auth changes"
-  - Trigger: `src/auth/**/*`
-  - Safety: `SECURITY.md`, `docs/security_review.md`
-
-### Step 3b: Choose the Comparison Mode (Optional)
+### Step 4: Choose the Comparison Mode (Optional)
 
 The `compare_to` field controls what baseline is used when detecting "changed files":
 
 **Options:**
-- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
-- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
-
-**When to use each:**
-- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
-- **default_tip**: For rules about what's different from production/main
-- **prompt**: For rules that should only consider very recent changes within the current session
+- `base` (default) - Compares to the base of the current branch (merge-base with main/master). Best for feature branches.
+- `default_tip` - Compares to the current tip of the default branch (main/master). Useful for seeing the difference from what is currently in production.
+- `prompt` - Compares to the state at the start of each prompt. Use for rules that should only fire on changes made during a single agent response.
 
 Most rules should use the default (`base`) and don't need to specify `compare_to`.
 
-### Step 4: Write the Instructions
+### Step 5: Write the Instructions
 
 Create clear, actionable instructions for what the agent should do when the rule fires.
 
@@ -121,45 +127,62 @@ Create clear, actionable instructions for what the agent should do when the rule
 - Specific actions to take
 - Quality criteria for completion
 
-**Example:**
-```
-Configuration files have changed. Please:
-1. Review docs/install_guide.md for accuracy
-2. Update any installation steps that reference changed config
-3. Verify environment variable documentation is current
-4. Test that installation instructions still work
-```
+**Template variables available in instructions:**
+- `{trigger_files}` - Files that triggered the rule
+- `{expected_files}` - Expected corresponding files (for set/pair modes)
 
-### Step 5: Create the Rule Entry
+### Step 6: Create the Rule File
 
-Create or update `.deepwork.rules.yml` in the project root.
+Create a new file in `.deepwork/rules/` with a kebab-case filename:
 
-**File Location**: `.deepwork.rules.yml` (root of project)
+**File Location**: `.deepwork/rules/{rule-name}.md`
+
+**Format for Trigger/Safety Mode:**
+```markdown
+---
+name: Friendly Name for the Rule
+trigger: "glob/pattern/**/*"  # or array: ["pattern1", "pattern2"]
+safety: "optional/pattern"    # optional, or array
+compare_to: base              # optional: "base" (default), "default_tip", or "prompt"
+---
+Instructions for the agent when this rule fires.
 
-**Format**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
-  safety: "[glob pattern]"   # optional, or array
-  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
-  instructions: |
-    [Multi-line instructions for the agent...]
+Multi-line markdown content is supported.
 ```
 
-**Alternative with instructions_file**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"
-  safety: "[glob pattern]"
-  compare_to: "base"         # optional
-  instructions_file: "path/to/instructions.md"
+**Format for Set Mode (bidirectional):**
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
-### Step 6: Verify the Rule
+**Format for Pair Mode (directional):**
+```markdown
+---
+name: API Documentation
+pair:
+  trigger: api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code requires documentation updates.
+
+Changed API: {trigger_files}
+Update docs: {expected_files}
+```
+
+### Step 7: Verify the Rule
 
 After creating the rule:
 
-1. **Check the YAML syntax** - Ensure valid YAML formatting
+1. **Check the YAML frontmatter** - Ensure valid YAML formatting
 2. **Test trigger patterns** - Verify patterns match intended files
 3. **Review instructions** - Ensure they're clear and actionable
 4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
@@ -167,72 +190,100 @@ After creating the rule:
 ## Example Rules
 
 ### Update Documentation on Config Changes
-```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change based on the
-    new configuration.
+`.deepwork/rules/config-docs.md`:
+```markdown
+---
+name: Update Install Guide on Config Changes
+trigger: app/config/**/*
+safety: docs/install_guide.md
+---
+Configuration files have been modified. Please review docs/install_guide.md
+and update it if any installation instructions need to change based on the
+new configuration.
 ```
 
 ### Security Review for Auth Code
-```yaml
-- name: "Security review for authentication changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Review for hardcoded credentials or secrets
-    2. Check input validation on user inputs
-    3. Verify access control logic is correct
-    4. Update security documentation if needed
+`.deepwork/rules/security-review.md`:
+```markdown
+---
+name: Security Review for Authentication Changes
+trigger:
+  - src/auth/**/*
+  - src/security/**/*
+safety:
+  - SECURITY.md
+  - docs/security_audit.md
+---
+Authentication or security code has been changed. Please:
+
+1. Review for hardcoded credentials or secrets
+2. Check input validation on user inputs
+3. Verify access control logic is correct
+4. Update security documentation if needed
+```
+
+### Source/Test Pairing
+`.deepwork/rules/source-test-pairing.md`:
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+When modifying source code, ensure corresponding tests are updated.
+When adding tests, ensure they test actual source code.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
 ### API Documentation Sync
-```yaml
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  safety: "docs/api/**/*.md"
-  instructions: |
-    API code has changed. Please verify that API documentation in docs/api/
-    is up to date with the code changes. Pay special attention to:
-    - New or changed endpoints
-    - Modified request/response schemas
-    - Updated authentication requirements
+`.deepwork/rules/api-docs.md`:
+```markdown
+---
+name: API Documentation Update
+pair:
+  trigger: src/api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code has changed. Please verify that API documentation in docs/api/
+is up to date with the code changes. Pay special attention to:
+
+- New or changed endpoints
+- Modified request/response schemas
+- Updated authentication requirements
+
+Changed API: {trigger_files}
+Update: {expected_files}
 ```
 
 ## Output Format
 
-### .deepwork.rules.yml
-Create or update this file at the project root with the new rule entry.
+### .deepwork/rules/{rule-name}.md
+Create a new file with the rule definition using YAML frontmatter and markdown body.
 
 ## Quality Criteria
 
 - Asked structured questions to understand user requirements
-- Rule name is clear and descriptive
-- Trigger patterns accurately match the intended files
-- Safety patterns prevent unnecessary triggering
+- Rule name is clear and descriptive (used in promise tags)
+- Correct detection mode selected for the use case
+- Patterns accurately match the intended files
+- Safety patterns prevent unnecessary triggering (if applicable)
 - Instructions are actionable and specific
-- YAML is valid and properly formatted
+- YAML frontmatter is valid
 
 ## Context
 
-Rules are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each rule's `compare_to` setting:
-   - `base` (default): Files changed since the branch diverged from main/master
-   - `default_tip`: Files different from the current main/master branch
-   - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match rule trigger patterns
-3. Skips rules where safety patterns also matched
+Rules are evaluated automatically when the agent finishes a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting
+2. Evaluates rules based on their detection mode (trigger/safety, set, or pair)
+3. Skips rules where the correspondence is satisfied (set/pair modes) or a safety pattern also matched (trigger/safety mode)
 4. Prompts you with instructions for any triggered rules
 
-You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name from the `name` field). This tells the system you've already handled that rule's requirements.
 
 
 ## Inputs
@@ -260,7 +311,7 @@ All work for this job should be done on a dedicated work branch:
 ## Output Requirements
 
 Create the following output(s):
-- `.deepwork.rules.yml`
+- `.deepwork/rules/{rule-name}.md`
 
 Ensure all outputs are:
 - Well-formatted and complete
@@ -274,7 +325,7 @@ After completing this step:
 
 2. **Inform the user**:
    - The define command is complete
-   - Outputs created: .deepwork.rules.yml
+   - Outputs created: .deepwork/rules/{rule-name}.md
    - This command can be run again anytime to make further changes
 
 ## Command Complete
diff --git a/doc/architecture.md b/doc/architecture.md
index 8c494beb..a0a0e959 100644
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -122,9 +122,10 @@ def install(platform: str):
     # Inject core job definitions
     inject_deepwork_jobs(".deepwork/jobs/")
 
-    # Create default rules template (if not exists)
-    if not exists(".deepwork.rules.yml"):
-        copy_template("default_rules.yml", ".deepwork.rules.yml")
+    # Create rules directory with example templates (if not exists)
+    if not exists(".deepwork/rules/"):
+        create_directory(".deepwork/rules/")
+        copy_example_rules(".deepwork/rules/")
 
     # Update config (supports multiple platforms)
     config = load_yaml(".deepwork/config.yml") or {}
@@ -1217,7 +1218,7 @@ User: When API code changes, the API documentation should be updated
 Claude: Got it. Let me ask a few questions...
         [Interactive dialog to define trigger, safety, and instructions]
 
-Claude: ✓ Created rule "API documentation update" in .deepwork.rules.yml
+Claude: Created rule "API documentation update" in .deepwork/rules/api-documentation.md
 ```
 
 ---
diff --git a/src/deepwork/cli/install.py b/src/deepwork/cli/install.py
index ce7608b1..a84bb8ec 100644
--- a/src/deepwork/cli/install.py
+++ b/src/deepwork/cli/install.py
@@ -113,44 +113,85 @@ def _create_deepwork_gitignore(deepwork_dir: Path) -> None:
         gitignore_path.write_text(gitignore_content)
 
 
-def _create_default_rules_file(project_path: Path) -> bool:
+def _create_rules_directory(project_path: Path) -> bool:
     """
-    Create a default rules file template in the project root.
+    Create the v2 rules directory structure with example templates.
 
-    Only creates the file if it doesn't already exist.
+    Creates .deepwork/rules/ with example rule files that users can customize.
+    Only creates the directory if it doesn't already exist.
 
     Args:
         project_path: Path to the project root
 
     Returns:
-        True if the file was created, False if it already existed
+        True if the directory was created, False if it already existed
     """
-    rules_file = project_path / ".deepwork.rules.yml"
+    rules_dir = project_path / ".deepwork" / "rules"
 
-    if rules_file.exists():
+    if rules_dir.exists():
         return False
 
-    # Copy the template from the templates directory
-    template_path = Path(__file__).parent.parent / "templates" / "default_rules.yml"
+    # Create the rules directory
+    ensure_dir(rules_dir)
 
-    if template_path.exists():
-        shutil.copy(template_path, rules_file)
-    else:
-        # Fallback: create a minimal template inline
-        rules_file.write_text(
-            """# DeepWork Rules Configuration
-#
-# Rules are automated guardrails that trigger when specific files change.
-# Use /deepwork_rules.define to create new rules interactively.
-#
-# Format:
-#   - name: "Rule name"
-#     trigger: "glob/pattern/**/*"
-#     safety: "optional/pattern/**/*"
-#     instructions: |
-#       Instructions for the AI agent...
+    # Copy example rule templates from the deepwork_rules standard job
+    example_rules_dir = (
+        Path(__file__).parent.parent / "standard_jobs" / "deepwork_rules" / "rules"
+    )
+
+    if example_rules_dir.exists():
+        # Copy every *.md.example rule template into the new rules directory
+        for example_file in example_rules_dir.glob("*.md.example"):
+            dest_file = rules_dir / example_file.name
+            shutil.copy(example_file, dest_file)
+
+    # Create a README file explaining the rules system
+    readme_content = """# DeepWork Rules
+
+Rules are automated guardrails that trigger when specific files change during
+AI agent sessions. They help ensure documentation stays current, security reviews
+happen, and team guidelines are followed.
+
+## Getting Started
+
+1. Copy an example file and rename it (remove the `.example` suffix):
+   ```
+   cp readme-documentation.md.example readme-documentation.md
+   ```
+
+2. Edit the file to match your project's patterns
+
+3. The rule will automatically trigger when matching files change
+
+## Rule Format
+
+Rules use YAML frontmatter in markdown files:
+
+```markdown
+---
+name: Rule Name
+trigger: "pattern/**/*"
+safety: "optional/pattern"
+---
+Instructions in markdown here.
+```
+
+## Detection Modes
+
+- **trigger/safety**: Fire when trigger matches, unless safety also matches
+- **set**: Bidirectional file correspondence (e.g., source + test)
+- **pair**: Directional correspondence (e.g., API code -> docs)
+
+## Documentation
+
+See `doc/rules_syntax.md` in the DeepWork repository for full syntax documentation.
+
+## Creating Rules Interactively
+
+Use `/deepwork_rules.define` to create new rules with guidance.
 """
-        )
+    readme_path = rules_dir / "README.md"
+    readme_path.write_text(readme_content)
 
     return True
 
@@ -277,11 +318,11 @@ def _install_deepwork(platform_name: str | None, project_path: Path) -> None:
     _create_deepwork_gitignore(deepwork_dir)
     console.print("  [green]✓[/green] Created .deepwork/.gitignore")
 
-    # Step 3d: Create default rules file template
-    if _create_default_rules_file(project_path):
-        console.print("  [green]✓[/green] Created .deepwork.rules.yml template")
+    # Step 3d: Create rules directory with v2 templates
+    if _create_rules_directory(project_path):
+        console.print("  [green]✓[/green] Created .deepwork/rules/ with example templates")
     else:
-        console.print("  [dim]•[/dim] .deepwork.rules.yml already exists")
+        console.print("  [dim]•[/dim] .deepwork/rules/ already exists")
 
     # Step 4: Load or create config.yml
     console.print("[yellow]→[/yellow] Updating configuration...")
diff --git a/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md b/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md
index 600e1578..7771eaee 100644
--- a/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md
+++ b/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md
@@ -136,7 +136,7 @@ After implementing the job, consider whether there are **rules** that would help
 
 **What are rules?**
 
-Rules are automated guardrails defined in `.deepwork.rules.yml` that trigger when certain files change during an AI session. They help ensure:
+Rules are automated guardrails stored as markdown files in `.deepwork/rules/` that trigger when certain files change during an AI session. They help ensure:
 - Documentation stays in sync with code
 - Team guidelines are followed
 - Architectural decisions are respected
diff --git a/src/deepwork/standard_jobs/deepwork_rules/job.yml b/src/deepwork/standard_jobs/deepwork_rules/job.yml
index 9e9ece74..af540bc4 100644
--- a/src/deepwork/standard_jobs/deepwork_rules/job.yml
+++ b/src/deepwork/standard_jobs/deepwork_rules/job.yml
@@ -1,37 +1,39 @@
 name: deepwork_rules
-version: "0.2.0"
+version: "0.3.0"
 summary: "Rules enforcement for AI agent sessions"
 description: |
   Manages rules that automatically trigger when certain files change during an AI agent session.
   Rules help ensure that code changes follow team guidelines, documentation is updated,
   and architectural decisions are respected.
 
-  Rules are defined in a `.deepwork.rules.yml` file at the root of your project. Each rule
-  specifies:
-  - Trigger patterns: Glob patterns for files that, when changed, should trigger the rule
-  - Safety patterns: Glob patterns for files that, if also changed, mean the rule doesn't need to fire
-  - Instructions: What the agent should do when the rule triggers
+  Rules are stored as individual markdown files with YAML frontmatter in the `.deepwork/rules/`
+  directory. Each rule file specifies:
+  - Detection mode: trigger/safety, set (bidirectional), or pair (directional)
+  - Patterns: Glob patterns for matching files, with optional variable capture
+  - Instructions: Markdown content describing what the agent should do
 
   Example use cases:
   - Update installation docs when configuration files change
   - Require security review when authentication code is modified
   - Ensure API documentation stays in sync with API code
-  - Remind developers to update changelogs
+  - Enforce source/test file pairing
 
 changelog:
   - version: "0.1.0"
     changes: "Initial version"
   - version: "0.2.0"
     changes: "Standardized on 'ask structured questions' phrasing for user input"
+  - version: "0.3.0"
+    changes: "Migrated to v2 format - individual markdown files in .deepwork/rules/"
 
 steps:
   - id: define
     name: "Define Rule"
-    description: "Create or update rule entries in .deepwork.rules.yml"
+    description: "Create a new rule file in .deepwork/rules/"
     instructions_file: steps/define.md
     inputs:
       - name: rule_purpose
         description: "What guideline or constraint should this rule enforce?"
     outputs:
-      - .deepwork.rules.yml
+      - .deepwork/rules/{rule-name}.md
     dependencies: []
diff --git a/src/deepwork/standard_jobs/deepwork_rules/rules/.gitkeep b/src/deepwork/standard_jobs/deepwork_rules/rules/.gitkeep
new file mode 100644
index 00000000..429162b4
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/rules/.gitkeep
@@ -0,0 +1,13 @@
+# This directory contains example rule templates.
+# Copy and customize these files to create your own rules.
+#
+# Rule files use YAML frontmatter in markdown format:
+#
+# ---
+# name: Rule Name
+# trigger: "pattern/**/*"
+# safety: "optional/pattern"
+# ---
+# Instructions in markdown here.
+#
+# See doc/rules_syntax.md for full documentation.
diff --git a/src/deepwork/standard_jobs/deepwork_rules/rules/api-documentation-sync.md.example b/src/deepwork/standard_jobs/deepwork_rules/rules/api-documentation-sync.md.example
new file mode 100644
index 00000000..427da7ae
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/rules/api-documentation-sync.md.example
@@ -0,0 +1,10 @@
+---
+name: API Documentation Sync
+trigger: src/api/**/*
+safety: docs/api/**/*.md
+---
+API code has changed. Please verify that API documentation is up to date:
+
+- New or changed endpoints
+- Modified request/response schemas
+- Updated authentication requirements
diff --git a/src/deepwork/standard_jobs/deepwork_rules/rules/readme-documentation.md.example b/src/deepwork/standard_jobs/deepwork_rules/rules/readme-documentation.md.example
new file mode 100644
index 00000000..6be90c83
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/rules/readme-documentation.md.example
@@ -0,0 +1,10 @@
+---
+name: README Documentation
+trigger: src/**/*
+safety: README.md
+---
+Source code has been modified. Please review README.md for accuracy:
+
+1. Verify the project overview reflects current functionality
+2. Check that usage examples are still correct
+3. Ensure installation/setup instructions remain valid
diff --git a/src/deepwork/standard_jobs/deepwork_rules/rules/security-review.md.example b/src/deepwork/standard_jobs/deepwork_rules/rules/security-review.md.example
new file mode 100644
index 00000000..abce3194
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/rules/security-review.md.example
@@ -0,0 +1,11 @@
+---
+name: Security Review for Auth Changes
+trigger:
+  - src/auth/**/*
+  - src/security/**/*
+---
+Authentication or security code has been changed. Please:
+
+1. Review for hardcoded credentials or secrets
+2. Check input validation on user inputs
+3. Verify access control logic is correct
diff --git a/src/deepwork/standard_jobs/deepwork_rules/rules/source-test-pairing.md.example b/src/deepwork/standard_jobs/deepwork_rules/rules/source-test-pairing.md.example
new file mode 100644
index 00000000..3ebd6968
--- /dev/null
+++ b/src/deepwork/standard_jobs/deepwork_rules/rules/source-test-pairing.md.example
@@ -0,0 +1,13 @@
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+When modifying source code, ensure corresponding tests are updated.
+When adding tests, ensure they test actual source code.
+
+Modified: {trigger_files}
+Expected: {expected_files}
diff --git a/src/deepwork/standard_jobs/deepwork_rules/steps/define.md b/src/deepwork/standard_jobs/deepwork_rules/steps/define.md
index 3e8be899..1e38a5e6 100644
--- a/src/deepwork/standard_jobs/deepwork_rules/steps/define.md
+++ b/src/deepwork/standard_jobs/deepwork_rules/steps/define.md
@@ -2,7 +2,7 @@
 
 ## Objective
 
-Create or update rule entries in the `.deepwork.rules.yml` file to enforce team guidelines, documentation requirements, or other constraints when specific files change.
+Create a new rule file in the `.deepwork/rules/` directory to enforce team guidelines, documentation requirements, or other constraints when specific files change.
 
 ## Task
 
@@ -29,9 +29,28 @@ Start by asking structured questions to understand what the user wants to enforc
    - For example: If config changes AND install_guide.md changes, assume docs are already updated
    - This prevents redundant prompts when the user has already done the right thing
 
-### Step 2: Define the Trigger Patterns
+### Step 2: Choose the Detection Mode
 
-Help the user define glob patterns for files that should trigger the rule:
+Help the user select the appropriate detection mode:
+
+**Trigger/Safety Mode** (most common):
+- Fires when trigger patterns match AND no safety patterns match
+- Use for: "When X changes, check Y" rules
+- Example: When config changes, verify install docs
+
+**Set Mode** (bidirectional correspondence):
+- Fires when files that should change together don't all change
+- Use for: Source/test pairing, model/migration sync
+- Example: `src/foo.py` and `tests/foo_test.py` should change together
+
+**Pair Mode** (directional correspondence):
+- Fires when a trigger file changes but expected files don't
+- Changes to expected files alone do NOT trigger
+- Use for: API code requires documentation updates (but docs can update independently)
+
+### Step 3: Define the Patterns
+
+Help the user define glob patterns for files.
 
 **Common patterns:**
 - `src/**/*.py` - All Python files in src directory (recursive)
@@ -40,41 +59,28 @@ Help the user define glob patterns for files that should trigger the rule:
 - `src/api/**/*` - All files in the API directory
 - `migrations/**/*.sql` - All SQL migrations
 
+**Variable patterns (for set/pair modes):**
+- `src/{path}.py` - Captures the `{path}` variable (e.g., `foo/bar` from `src/foo/bar.py`)
+- `tests/{path}_test.py` - Reuses the captured `{path}` value to locate the corresponding file
+- `{name}` matches a single path segment; `{path}` matches one or more segments
+
 **Pattern syntax:**
 - `*` - Matches any characters within a single path segment
 - `**` - Matches any characters across multiple path segments (recursive)
 - `?` - Matches a single character
 
-### Step 3: Define Safety Patterns (Optional)
-
-If there are files that, when also changed, mean the rule shouldn't fire:
-
-**Examples:**
-- Rule: "Update install guide when config changes"
-  - Trigger: `app/config/**/*`
-  - Safety: `docs/install_guide.md` (if already updated, don't prompt)
-
-- Rule: "Security review for auth changes"
-  - Trigger: `src/auth/**/*`
-  - Safety: `SECURITY.md`, `docs/security_review.md`
-
-### Step 3b: Choose the Comparison Mode (Optional)
+### Step 4: Choose the Comparison Mode (Optional)
 
 The `compare_to` field controls what baseline is used when detecting "changed files":
 
 **Options:**
-- `base` (default) - Compares to the base of the current branch (merge-base with main/master). This is the most common choice for feature branches, as it shows all changes made on the branch.
-- `default_tip` - Compares to the current tip of the default branch (main/master). Useful when you want to see the difference from what's currently in production.
-- `prompt` - Compares to the state at the start of each prompt. Useful for rules that should only fire based on changes made during a single agent response.
-
-**When to use each:**
-- **base**: Best for most rules. "Did this branch change config files?" -> trigger docs review
-- **default_tip**: For rules about what's different from production/main
-- **prompt**: For rules that should only consider very recent changes within the current session
+- `base` (default) - Compares to the base of the current branch (merge-base with main/master). Best for feature branches.
+- `default_tip` - Compares to the current tip of the default branch (main/master). Useful for seeing the difference from what is currently in production.
+- `prompt` - Compares to the state at the start of each prompt. Use for rules that should only fire on changes made during a single agent response.
 
 Most rules should use the default (`base`) and don't need to specify `compare_to`.
 
-### Step 4: Write the Instructions
+### Step 5: Write the Instructions
 
 Create clear, actionable instructions for what the agent should do when the rule fires.
 
@@ -84,45 +90,62 @@ Create clear, actionable instructions for what the agent should do when the rule
 - Specific actions to take
 - Quality criteria for completion
 
-**Example:**
-```
-Configuration files have changed. Please:
-1. Review docs/install_guide.md for accuracy
-2. Update any installation steps that reference changed config
-3. Verify environment variable documentation is current
-4. Test that installation instructions still work
-```
+**Template variables available in instructions:**
+- `{trigger_files}` - Files that triggered the rule
+- `{expected_files}` - Expected corresponding files (for set/pair modes)
+
+### Step 6: Create the Rule File
 
-### Step 5: Create the Rule Entry
+Create a new file in `.deepwork/rules/` with a kebab-case filename:
 
-Create or update `.deepwork.rules.yml` in the project root.
+**File Location**: `.deepwork/rules/{rule-name}.md`
 
-**File Location**: `.deepwork.rules.yml` (root of project)
+**Format for Trigger/Safety Mode:**
+```markdown
+---
+name: Friendly Name for the Rule
+trigger: "glob/pattern/**/*"  # or array: ["pattern1", "pattern2"]
+safety: "optional/pattern"    # optional, or array
+compare_to: base              # optional: "base" (default), "default_tip", or "prompt"
+---
+Instructions for the agent when this rule fires.
 
-**Format**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"  # or array: ["pattern1", "pattern2"]
-  safety: "[glob pattern]"   # optional, or array
-  compare_to: "base"         # optional: "base" (default), "default_tip", or "prompt"
-  instructions: |
-    [Multi-line instructions for the agent...]
+Multi-line markdown content is supported.
 ```
 
-**Alternative with instructions_file**:
-```yaml
-- name: "[Friendly name for the rule]"
-  trigger: "[glob pattern]"
-  safety: "[glob pattern]"
-  compare_to: "base"         # optional
-  instructions_file: "path/to/instructions.md"
+**Format for Set Mode (bidirectional):**
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
-### Step 6: Verify the Rule
+**Format for Pair Mode (directional):**
+```markdown
+---
+name: API Documentation
+pair:
+  trigger: api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code requires documentation updates.
+
+Changed API: {trigger_files}
+Update docs: {expected_files}
+```
+
+### Step 7: Verify the Rule
 
 After creating the rule:
 
-1. **Check the YAML syntax** - Ensure valid YAML formatting
+1. **Check the YAML frontmatter** - Ensure valid YAML formatting
 2. **Test trigger patterns** - Verify patterns match intended files
 3. **Review instructions** - Ensure they're clear and actionable
 4. **Check for conflicts** - Ensure the rule doesn't conflict with existing ones
@@ -130,69 +153,97 @@ After creating the rule:
 ## Example Rules
 
 ### Update Documentation on Config Changes
-```yaml
-- name: "Update install guide on config changes"
-  trigger: "app/config/**/*"
-  safety: "docs/install_guide.md"
-  instructions: |
-    Configuration files have been modified. Please review docs/install_guide.md
-    and update it if any installation instructions need to change based on the
-    new configuration.
+`.deepwork/rules/config-docs.md`:
+```markdown
+---
+name: Update Install Guide on Config Changes
+trigger: app/config/**/*
+safety: docs/install_guide.md
+---
+Configuration files have been modified. Please review docs/install_guide.md
+and update it if any installation instructions need to change based on the
+new configuration.
 ```
 
 ### Security Review for Auth Code
-```yaml
-- name: "Security review for authentication changes"
-  trigger:
-    - "src/auth/**/*"
-    - "src/security/**/*"
-  safety:
-    - "SECURITY.md"
-    - "docs/security_audit.md"
-  instructions: |
-    Authentication or security code has been changed. Please:
-    1. Review for hardcoded credentials or secrets
-    2. Check input validation on user inputs
-    3. Verify access control logic is correct
-    4. Update security documentation if needed
+`.deepwork/rules/security-review.md`:
+```markdown
+---
+name: Security Review for Authentication Changes
+trigger:
+  - src/auth/**/*
+  - src/security/**/*
+safety:
+  - SECURITY.md
+  - docs/security_audit.md
+---
+Authentication or security code has been changed. Please:
+
+1. Review for hardcoded credentials or secrets
+2. Check input validation on user inputs
+3. Verify access control logic is correct
+4. Update security documentation if needed
+```
+
+### Source/Test Pairing
+`.deepwork/rules/source-test-pairing.md`:
+```markdown
+---
+name: Source/Test Pairing
+set:
+  - src/{path}.py
+  - tests/{path}_test.py
+---
+Source and test files should change together.
+
+When modifying source code, ensure corresponding tests are updated.
+When adding tests, ensure they test actual source code.
+
+Modified: {trigger_files}
+Expected: {expected_files}
 ```
 
 ### API Documentation Sync
-```yaml
-- name: "API documentation update"
-  trigger: "src/api/**/*.py"
-  safety: "docs/api/**/*.md"
-  instructions: |
-    API code has changed. Please verify that API documentation in docs/api/
-    is up to date with the code changes. Pay special attention to:
-    - New or changed endpoints
-    - Modified request/response schemas
-    - Updated authentication requirements
+`.deepwork/rules/api-docs.md`:
+```markdown
+---
+name: API Documentation Update
+pair:
+  trigger: src/api/{path}.py
+  expects: docs/api/{path}.md
+---
+API code has changed. Please verify that API documentation in docs/api/
+is up to date with the code changes. Pay special attention to:
+
+- New or changed endpoints
+- Modified request/response schemas
+- Updated authentication requirements
+
+Changed API: {trigger_files}
+Update: {expected_files}
 ```
 
 ## Output Format
 
-### .deepwork.rules.yml
-Create or update this file at the project root with the new rule entry.
+### .deepwork/rules/{rule-name}.md
+Create a new file containing the rule definition: YAML frontmatter for configuration and a markdown body for instructions.
 
 ## Quality Criteria
 
 - Asked structured questions to understand user requirements
-- Rule name is clear and descriptive
-- Trigger patterns accurately match the intended files
-- Safety patterns prevent unnecessary triggering
+- Rule name is clear and descriptive (used in promise tags)
+- Correct detection mode selected for the use case
+- Patterns accurately match the intended files
+- Safety patterns prevent unnecessary triggering (if applicable)
 - Instructions are actionable and specific
-- YAML is valid and properly formatted
+- YAML frontmatter is valid
 
 ## Context
 
-Rules are evaluated automatically when you finish working on a task. The system:
-1. Determines which files have changed based on each rule's `compare_to` setting:
-   - `base` (default): Files changed since the branch diverged from main/master
-   - `default_tip`: Files different from the current main/master branch
-   - `prompt`: Files changed since the last prompt submission
-2. Checks if any changes match rule trigger patterns
-3. Skips rules where safety patterns also matched
+Rules are evaluated automatically when the agent finishes a task. The system:
+1. Determines which files have changed based on each rule's `compare_to` setting
+2. Evaluates rules based on their detection mode (trigger/safety, set, or pair)
+3. Skips rules whose correspondence is satisfied (set/pair) or whose safety pattern also matched (trigger/safety)
 4. Prompts you with instructions for any triggered rules
 
-You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name). This tells the system you've already handled that rule's requirements.
+You can mark a rule as addressed by including `<promise>Rule Name</promise>` in your response (replace Rule Name with the actual rule name from the `name` field). This tells the system you've already handled that rule's requirements.
diff --git a/src/deepwork/templates/default_rules.yml b/src/deepwork/templates/default_rules.yml
deleted file mode 100644
index ec0fbd31..00000000
--- a/src/deepwork/templates/default_rules.yml
+++ /dev/null
@@ -1,53 +0,0 @@
-# DeepWork Rules Configuration
-#
-# Rules are automated guardrails that trigger when specific files change.
-# They help ensure documentation stays current, security reviews happen, etc.
-#
-# Use /deepwork_rules.define to create new rules interactively.
-#
-# Format:
-#   - name: "Friendly name for the rule"
-#     trigger: "glob/pattern/**/*"  # or array: ["pattern1", "pattern2"]
-#     safety: "pattern/**/*"        # optional - if these also changed, skip the rule
-#     compare_to: "base"            # optional: "base" (default), "default_tip", or "prompt"
-#     instructions: |
-#       Multi-line instructions for the AI agent...
-#
-# Example rules (uncomment and customize):
-#
-# - name: "README Documentation"
-#   trigger: "src/**/*"
-#   safety: "README.md"
-#   instructions: |
-#     Source code has been modified. Please review README.md for accuracy:
-#     1. Verify the project overview reflects current functionality
-#     2. Check that usage examples are still correct
-#     3. Ensure installation/setup instructions remain valid
-#
-# - name: "API Documentation Sync"
-#   trigger: "src/api/**/*"
-#   safety: "docs/api/**/*.md"
-#   instructions: |
-#     API code has changed. Please verify that API documentation is up to date:
-#     - New or changed endpoints
-#     - Modified request/response schemas
-#     - Updated authentication requirements
-#
-# - name: "Security Review for Auth Changes"
-#   trigger:
-#     - "src/auth/**/*"
-#     - "src/security/**/*"
-#   instructions: |
-#     Authentication or security code has been changed. Please:
-#     1. Review for hardcoded credentials or secrets
-#     2. Check input validation on user inputs
-#     3. Verify access control logic is correct
-#
-# - name: "Test Coverage for New Code"
-#   trigger: "src/**/*.py"
-#   safety: "tests/**/*.py"
-#   instructions: |
-#     New source code was added. Please ensure appropriate test coverage:
-#     1. Add unit tests for new functions/methods
-#     2. Update integration tests if behavior changed
-#     3. Verify all new code paths are tested
diff --git a/tests/integration/test_install_flow.py b/tests/integration/test_install_flow.py
index ab394961..a6a40659 100644
--- a/tests/integration/test_install_flow.py
+++ b/tests/integration/test_install_flow.py
@@ -152,8 +152,8 @@ def test_install_is_idempotent(self, mock_claude_project: Path) -> None:
         assert (claude_dir / "deepwork_jobs.define.md").exists()
         assert (claude_dir / "deepwork_jobs.learn.md").exists()
 
-    def test_install_creates_rules_template(self, mock_claude_project: Path) -> None:
-        """Test that install creates a rules template file."""
+    def test_install_creates_rules_directory(self, mock_claude_project: Path) -> None:
+        """Test that install creates the v2 rules directory with example templates."""
         runner = CliRunner()
 
         result = runner.invoke(
@@ -163,34 +163,40 @@ def test_install_creates_rules_template(self, mock_claude_project: Path) -> None
         )
 
         assert result.exit_code == 0
-        assert ".deepwork.rules.yml template" in result.output
+        assert ".deepwork/rules/ with example templates" in result.output
 
-        # Verify rules file was created
-        rules_file = mock_claude_project / ".deepwork.rules.yml"
-        assert rules_file.exists()
+        # Verify rules directory was created
+        rules_dir = mock_claude_project / ".deepwork" / "rules"
+        assert rules_dir.exists()
 
-        # Verify it's the template (has comment header, no active rules)
-        content = rules_file.read_text()
-        assert "# DeepWork Rules Configuration" in content
-        assert "# Use /deepwork_rules.define" in content
+        # Verify README was created
+        readme_file = rules_dir / "README.md"
+        assert readme_file.exists()
+        content = readme_file.read_text()
+        assert "DeepWork Rules" in content
+        assert "YAML frontmatter" in content
 
-        # Verify it does NOT contain deepwork-specific rules
-        assert "Standard Jobs Source of Truth" not in content
-        assert "Version and Changelog Update" not in content
-        assert "pyproject.toml" not in content
+        # Verify example templates were copied
+        example_files = list(rules_dir.glob("*.md.example"))
+        assert len(example_files) >= 1  # At least one example template
 
-    def test_install_preserves_existing_rules_file(self, mock_claude_project: Path) -> None:
-        """Test that install doesn't overwrite existing rules file."""
+    def test_install_preserves_existing_rules_directory(
+        self, mock_claude_project: Path
+    ) -> None:
+        """Test that install doesn't overwrite existing rules directory."""
         runner = CliRunner()
 
-        # Create a custom rules file before install
-        rules_file = mock_claude_project / ".deepwork.rules.yml"
-        custom_content = """- name: "My Custom Rule"
-  trigger: "src/**/*"
-  instructions: |
-    Custom instructions here.
+        # Create a custom rules directory before install
+        rules_dir = mock_claude_project / ".deepwork" / "rules"
+        rules_dir.mkdir(parents=True)
+        custom_rule = rules_dir / "my-custom-rule.md"
+        custom_content = """---
+name: My Custom Rule
+trigger: "src/**/*"
+---
+Custom instructions here.
 """
-        rules_file.write_text(custom_content)
+        custom_rule.write_text(custom_content)
 
         result = runner.invoke(
             cli,
@@ -199,10 +205,10 @@ def test_install_preserves_existing_rules_file(self, mock_claude_project: Path)
         )
 
         assert result.exit_code == 0
-        assert ".deepwork.rules.yml already exists" in result.output
+        assert ".deepwork/rules/ already exists" in result.output
 
         # Verify original content is preserved
-        assert rules_file.read_text() == custom_content
+        assert custom_rule.read_text() == custom_content
 
 
 class TestCLIEntryPoint:

From 34484a3ac850f604b58a4b84b3cf766942a601b0 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 14:59:10 -0700
Subject: [PATCH 15/21] Fix hook exit code to always return 0 with JSON format

Hooks using JSON output format should always exit with code 0.
The blocking behavior is controlled by the "decision" field in the
JSON output, not the exit code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 src/deepwork/hooks/wrapper.py                  | 5 ++---
 tests/shell_script_tests/test_hook_wrappers.py | 6 ++++--
 2 files changed, 6 insertions(+), 5 deletions(-)
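
Reviewer note: the contract this patch enforces can be sketched as below. This is a minimal illustration, not the code in wrapper.py; `hook_result` is a hypothetical helper that does not exist in deepwork, and the only facts taken from the patch are that the exit code is always 0 and that blocking is signalled through the "decision" field.

```python
import json


def hook_result(decision: str = "", reason: str = "") -> tuple[str, int]:
    """Return (stdout_text, exit_code) for a hook that uses JSON output.

    Hypothetical helper for illustration only; not part of deepwork.
    """
    payload: dict[str, str] = {}
    if decision in ("block", "deny"):
        # Blocking is expressed in the JSON body...
        payload = {"decision": decision, "reason": reason}
    # ...while the exit code stays 0 in every case.
    return json.dumps(payload), 0
```

Under these assumptions, an allow result is the empty object with exit code 0, and a block result differs only in the JSON body, never in the exit code.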

diff --git a/src/deepwork/hooks/wrapper.py b/src/deepwork/hooks/wrapper.py
index 4733b5fb..ef20899c 100644
--- a/src/deepwork/hooks/wrapper.py
+++ b/src/deepwork/hooks/wrapper.py
@@ -358,7 +358,6 @@ def run_hook(
     output_json = denormalize_output(hook_output, platform, hook_input.event)
     write_stdout(output_json)
 
-    # Return exit code based on decision
-    if hook_output.decision in ("block", "deny"):
-        return 2
+    # Always return 0 when using JSON output format
+    # The decision field in the JSON controls blocking behavior
     return 0
diff --git a/tests/shell_script_tests/test_hook_wrappers.py b/tests/shell_script_tests/test_hook_wrappers.py
index ee2c0155..6ba604c4 100644
--- a/tests/shell_script_tests/test_hook_wrappers.py
+++ b/tests/shell_script_tests/test_hook_wrappers.py
@@ -209,7 +209,8 @@ def test_claude_wrapper_with_stop_event(
             env=env,
         )
 
-        assert result.returncode == 2, f"Expected exit code 2 for blocking. stderr: {result.stderr}"
+        # Exit code 0 even when blocking - the JSON decision field controls behavior
+        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
 
         output = json.loads(result.stdout.strip())
         assert output["decision"] == "block"
@@ -242,7 +243,8 @@ def test_gemini_wrapper_with_afteragent_event(
             env=env,
         )
 
-        assert result.returncode == 2, f"Expected exit code 2 for blocking. stderr: {result.stderr}"
+        # Exit code 0 even when blocking - the JSON decision field controls behavior
+        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
 
         output = json.loads(result.stdout.strip())
         # Gemini should get "deny" instead of "block"

From 53a3202b781dee0f835e264288f1b31c3f9f2738 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 15:08:22 -0700
Subject: [PATCH 16/21] Add critical contract warning comments to hook test
 files

Add prominent warning comments to test files that verify Claude Code hook
JSON format and exit code contracts. These comments reference the official
documentation and clearly mark tests that should not be modified without
consulting the hook specification.

Files updated:
- tests/shell_script_tests/test_hooks_json_format.py
- tests/shell_script_tests/test_hook_wrappers.py
- tests/unit/test_hook_wrapper.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 .../shell_script_tests/test_hook_wrappers.py  | 44 +++++++++
 .../test_hooks_json_format.py                 | 73 +++++++++++++-
 tests/unit/test_hook_wrapper.py               | 94 +++++++++++++++++--
 3 files changed, 198 insertions(+), 13 deletions(-)
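
Reviewer note: the "block" versus "deny" difference that the new comments describe can be sketched as follows. This is an assumption-laden illustration: `Platform` here is a stand-in enum and `to_response` is a hypothetical function; the real `HookOutput.to_dict` in deepwork may be structured differently.

```python
from enum import Enum


class Platform(Enum):
    """Stand-in for the platform enum used by the tests (assumed shape)."""

    CLAUDE = "claude"
    GEMINI = "gemini"


def to_response(decision: str, reason: str, platform: Platform) -> dict[str, str]:
    """Build the platform-specific JSON body for a normalized decision."""
    if not decision:
        # Allow: both platforms receive an empty JSON object.
        return {}
    if platform is Platform.GEMINI and decision == "block":
        # Gemini CLI uses "deny" where Claude Code uses "block".
        decision = "deny"
    return {"decision": decision, "reason": reason}
```

The same normalized "block" decision therefore yields two different payloads, which is the intentional platform difference the warning comments ask maintainers not to "fix".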

diff --git a/tests/shell_script_tests/test_hook_wrappers.py b/tests/shell_script_tests/test_hook_wrappers.py
index 6ba604c4..a9f3ae74 100644
--- a/tests/shell_script_tests/test_hook_wrappers.py
+++ b/tests/shell_script_tests/test_hook_wrappers.py
@@ -1,5 +1,27 @@
 """Tests for the platform hook wrapper shell scripts.
 
+# ******************************************************************************
+# ***                         CRITICAL CONTRACT TESTS                        ***
+# ******************************************************************************
+#
+# These tests verify the integration between shell wrapper scripts and the
+# Python hook wrapper, following Claude Code hook contracts documented in:
+# doc/platforms/claude/hooks_system.md
+#
+# Hook Contract Summary:
+#   - Exit code 0: Success, stdout parsed as JSON
+#   - Exit code 2: Blocking error, stderr shown (NOT used for JSON format)
+#   - Allow response: {} (empty JSON object)
+#   - Block response: {"decision": "block", "reason": "..."}
+#
+# CRITICAL: Hooks using JSON output format MUST return exit code 0.
+# The "decision" field in the JSON controls blocking behavior, NOT the exit code.
+#
+# DO NOT MODIFY the exit code assertions without first consulting the official
+# Claude Code documentation at: https://docs.anthropic.com/en/docs/claude-code/hooks
+#
+# ******************************************************************************
+
 These tests verify that claude_hook.sh and gemini_hook.sh correctly
 invoke Python hooks and handle input/output.
 """
@@ -130,6 +152,28 @@ def test_sets_platform_environment_variable(self, hooks_dir: Path, src_dir: Path
         assert 'DEEPWORK_HOOK_PLATFORM="gemini"' in content
 
 
+# ******************************************************************************
+# ***                    DO NOT EDIT EXIT CODE ASSERTIONS!                   ***
+# ******************************************************************************
+#
+# As documented in doc/platforms/claude/hooks_system.md:
+#
+#   | Exit Code | Meaning         | Behavior                          |
+#   |-----------|-----------------|-----------------------------------|
+#   | 0         | Success         | stdout parsed as JSON             |
+#   | 2         | Blocking error  | stderr shown, operation blocked   |
+#   | Other     | Warning         | stderr logged, continues          |
+#
+# CRITICAL: Hooks using JSON output format MUST return exit code 0.
+# The "decision" field in the JSON controls blocking behavior, NOT the exit code.
+#
+# Example valid outputs:
+#   Exit 0 + stdout: {}                                      -> Allow
+#   Exit 0 + stdout: {"decision": "block", "reason": "..."}  -> Block
+#   Exit 0 + stdout: {"decision": "deny", "reason": "..."}   -> Block (Gemini)
+#
+# See: https://docs.anthropic.com/en/docs/claude-code/hooks
+# ******************************************************************************
 class TestHookWrapperIntegration:
     """Integration tests for hook wrappers with actual Python hooks."""
 
diff --git a/tests/shell_script_tests/test_hooks_json_format.py b/tests/shell_script_tests/test_hooks_json_format.py
index 74bea39b..0d9e793b 100644
--- a/tests/shell_script_tests/test_hooks_json_format.py
+++ b/tests/shell_script_tests/test_hooks_json_format.py
@@ -1,5 +1,23 @@
 """Tests for Claude Code hooks JSON format validation.
 
+# ******************************************************************************
+# ***                         CRITICAL CONTRACT TESTS                        ***
+# ******************************************************************************
+#
+# These tests verify the EXACT format required by Claude Code hooks as
+# documented in: doc/platforms/claude/hooks_system.md
+#
+# DO NOT MODIFY these tests without first consulting the official Claude Code
+# documentation at: https://docs.anthropic.com/en/docs/claude-code/hooks
+#
+# Hook Contract Summary:
+#   - Exit code 0: Success, stdout parsed as JSON
+#   - Exit code 2: Blocking error, stderr shown
+#   - Allow response: {} (empty JSON object)
+#   - Block response: {"decision": "block", "reason": "..."}
+#
+# ******************************************************************************
+
 Claude Code hooks have specific JSON response formats that must be followed:
 
 Stop hooks (hooks.after_agent):
@@ -66,6 +84,13 @@ def validate_json_output(output: str) -> dict | None:
         pytest.fail(f"Invalid JSON output: {stripped!r}. Error: {e}")
 
 
+# ******************************************************************************
+# *** DO NOT EDIT THIS FUNCTION! ***
+# As documented in doc/platforms/claude/hooks_system.md, Stop hooks must return:
+#   - {} (empty object) to allow
+#   - {"decision": "block", "reason": "..."} to block
+# Any other format will cause undefined behavior in Claude Code.
+# ******************************************************************************
 def validate_stop_hook_response(response: dict | None) -> None:
     """
     Validate a Stop hook response follows Claude Code format.
@@ -323,11 +348,39 @@ def test_stop_hook_with_promise_returns_empty(
             os.unlink(transcript_path)
 
 
+# ******************************************************************************
+# ***                    DO NOT EDIT THESE EXIT CODE TESTS!                  ***
+# ******************************************************************************
+#
+# As documented in doc/platforms/claude/hooks_system.md:
+#
+#   | Exit Code | Meaning         | Behavior                          |
+#   |-----------|-----------------|-----------------------------------|
+#   | 0         | Success         | stdout parsed as JSON             |
+#   | 2         | Blocking error  | stderr shown, operation blocked   |
+#   | Other     | Warning         | stderr logged, continues          |
+#
+# CRITICAL: Hooks using JSON output format MUST return exit code 0.
+# The "decision" field in the JSON controls blocking behavior, NOT the exit code.
+#
+# Example valid outputs:
+#   Exit 0 + stdout: {}                                      -> Allow
+#   Exit 0 + stdout: {"decision": "block", "reason": "..."}  -> Block
+#
+# See: https://docs.anthropic.com/en/docs/claude-code/hooks
+# ******************************************************************************
 class TestHooksExitCodes:
-    """Tests for hook script exit codes."""
+    """Tests for hook script exit codes.
+
+    CRITICAL: These tests verify the documented Claude Code hook contract.
+    All hooks MUST exit 0 when using JSON output format.
+    """
 
     def test_stop_hook_exits_zero_on_allow(self, rules_hooks_dir: Path, git_repo: Path) -> None:
-        """Test that stop hook exits 0 when allowing."""
+        """Test that stop hook exits 0 when allowing.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
@@ -336,7 +389,11 @@ def test_stop_hook_exits_zero_on_allow(self, rules_hooks_dir: Path, git_repo: Pa
     def test_stop_hook_exits_zero_on_block(
         self, rules_hooks_dir: Path, git_repo_with_rule: Path
     ) -> None:
-        """Test that stop hook exits 0 even when blocking."""
+        """Test that stop hook exits 0 even when blocking.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        Blocking is communicated via JSON {"decision": "block"}, NOT via exit code.
+        """
         py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
         repo = Repo(git_repo_with_rule)
@@ -349,14 +406,20 @@ def test_stop_hook_exits_zero_on_block(
         assert code == 0, f"Block should still exit 0. stderr: {stderr}"
 
     def test_user_prompt_hook_exits_zero(self, rules_hooks_dir: Path, git_repo: Path) -> None:
-        """Test that user prompt hook always exits 0."""
+        """Test that user prompt hook always exits 0.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         script_path = rules_hooks_dir / "user_prompt_submit.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
         assert code == 0, f"User prompt hook should exit 0. stderr: {stderr}"
 
     def test_capture_script_exits_zero(self, rules_hooks_dir: Path, git_repo: Path) -> None:
-        """Test that capture script exits 0."""
+        """Test that capture script exits 0.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
         stdout, stderr, code = run_hook_script(script_path, git_repo)
 
diff --git a/tests/unit/test_hook_wrapper.py b/tests/unit/test_hook_wrapper.py
index 5e8db92b..fd1a51d9 100644
--- a/tests/unit/test_hook_wrapper.py
+++ b/tests/unit/test_hook_wrapper.py
@@ -1,5 +1,22 @@
 """Tests for the hook wrapper module.
 
+# ******************************************************************************
+# ***                         CRITICAL CONTRACT TESTS                        ***
+# ******************************************************************************
+#
+# These tests verify the EXACT format required by Claude Code hooks as
+# documented in: doc/platforms/claude/hooks_system.md
+#
+# Hook JSON Contract Summary:
+#   - Allow response: {} (empty JSON object)
+#   - Block response: {"decision": "block", "reason": "..."} (Claude Code)
+#   - Block response: {"decision": "deny", "reason": "..."}  (Gemini CLI)
+#
+# DO NOT MODIFY these tests without first consulting the official Claude Code
+# documentation at: https://docs.anthropic.com/en/docs/claude-code/hooks
+#
+# ******************************************************************************
+
 These tests verify that the hook wrapper correctly normalizes input/output
 between different AI CLI platforms (Claude Code, Gemini CLI).
 """
@@ -157,18 +174,34 @@ def test_empty_input(self) -> None:
         assert hook_input.tool_name == ""
 
 
+# ******************************************************************************
+# *** DO NOT EDIT THESE OUTPUT FORMAT TESTS! ***
+# As documented in doc/platforms/claude/hooks_system.md, hook responses must be:
+#   - {} (empty object) to allow
+#   - {"decision": "block", "reason": "..."} to block (Claude Code)
+#   - {"decision": "deny", "reason": "..."} to block (Gemini CLI)
+# Any other format may cause undefined behavior.
+# See: https://docs.anthropic.com/en/docs/claude-code/hooks
+# ******************************************************************************
 class TestHookOutput:
     """Tests for HookOutput denormalization."""
 
     def test_empty_output_produces_empty_json(self) -> None:
-        """Test that empty HookOutput produces empty dict."""
+        """Test that empty HookOutput produces empty dict.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         output = HookOutput()
         result = output.to_dict(Platform.CLAUDE, NormalizedEvent.AFTER_AGENT)
 
         assert result == {}
 
     def test_block_decision_claude(self) -> None:
-        """Test blocking output for Claude."""
+        """Test blocking output for Claude.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        Claude Code expects {"decision": "block", "reason": "..."} to block.
+        """
         output = HookOutput(decision="block", reason="Must complete X first")
         result = output.to_dict(Platform.CLAUDE, NormalizedEvent.AFTER_AGENT)
 
@@ -176,7 +209,11 @@ def test_block_decision_claude(self) -> None:
         assert result["reason"] == "Must complete X first"
 
     def test_block_decision_gemini_converts_to_deny(self) -> None:
-        """Test that 'block' is converted to 'deny' for Gemini."""
+        """Test that 'block' is converted to 'deny' for Gemini.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        Gemini CLI expects {"decision": "deny", "reason": "..."} to block.
+        """
         output = HookOutput(decision="block", reason="Must complete X first")
         result = output.to_dict(Platform.GEMINI, NormalizedEvent.AFTER_AGENT)
 
@@ -288,11 +325,22 @@ def test_invalid_json(self) -> None:
         assert hook_input.session_id == ""
 
 
+# ******************************************************************************
+# *** DO NOT EDIT THESE JSON OUTPUT TESTS! ***
+# As documented in doc/platforms/claude/hooks_system.md, hook JSON output must:
+#   - Be valid JSON
+#   - Return {} for allow
+#   - Return {"decision": "block", "reason": "..."} for block
+# See: https://docs.anthropic.com/en/docs/claude-code/hooks
+# ******************************************************************************
 class TestDenormalizeOutput:
     """Tests for the denormalize_output function."""
 
     def test_produces_valid_json(self) -> None:
-        """Test that output is valid JSON."""
+        """Test that output is valid JSON.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         output = HookOutput(decision="block", reason="test")
         json_str = denormalize_output(output, Platform.CLAUDE, NormalizedEvent.AFTER_AGENT)
 
@@ -301,7 +349,10 @@ def test_produces_valid_json(self) -> None:
         assert parsed["decision"] == "block"
 
     def test_empty_output_produces_empty_object(self) -> None:
-        """Test that empty output produces '{}'."""
+        """Test that empty output produces '{}' (allow response).
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         output = HookOutput()
         json_str = denormalize_output(output, Platform.CLAUDE, NormalizedEvent.AFTER_AGENT)
 
@@ -389,11 +440,31 @@ def test_common_tools_map_to_same_normalized_name(self) -> None:
                 assert TOOL_TO_NORMALIZED[Platform.GEMINI][gemini_tool] == tool
 
 
+# ******************************************************************************
+# ***                    DO NOT EDIT THESE INTEGRATION TESTS!                ***
+# ******************************************************************************
+#
+# These tests verify the complete input/output flow for both Claude Code and
+# Gemini CLI, following the hook contracts documented in:
+# doc/platforms/claude/hooks_system.md
+#
+# Claude Code contract:
+#   - Block: {"decision": "block", "reason": "..."}
+#
+# Gemini CLI contract:
+#   - Block: {"decision": "deny", "reason": "..."}
+#
+# The "block" vs "deny" terminology is a platform difference, not a bug.
+# See: https://docs.anthropic.com/en/docs/claude-code/hooks
+# ******************************************************************************
 class TestIntegration:
     """Integration tests for the full normalization flow."""
 
     def test_claude_stop_hook_flow(self) -> None:
-        """Test complete flow for Claude Stop hook."""
+        """Test complete flow for Claude Stop hook.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         # Input from Claude
         raw_input = json.dumps(
             {
@@ -419,7 +490,10 @@ def test_claude_stop_hook_flow(self) -> None:
         assert "Rule X" in result["reason"]
 
     def test_gemini_afteragent_hook_flow(self) -> None:
-        """Test complete flow for Gemini AfterAgent hook."""
+        """Test complete flow for Gemini AfterAgent hook.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        """
         # Input from Gemini
         raw_input = json.dumps(
             {
@@ -447,7 +521,11 @@ def test_gemini_afteragent_hook_flow(self) -> None:
         assert "Rule Y" in result["reason"]
 
     def test_cross_platform_same_hook_logic(self) -> None:
-        """Test that the same hook logic produces correct output for both platforms."""
+        """Test that the same hook logic produces correct output for both platforms.
+
+        DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
+        The "block" vs "deny" platform difference is intentional.
+        """
 
         def sample_hook(hook_input: HookInput) -> HookOutput:
             """Sample hook that blocks if event is after_agent."""

From 0ce890e8e1f0ef16f24e1c90791097a8ed682150 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 15:14:41 -0700
Subject: [PATCH 17/21] Merge hook test files into single test_hooks.py

Consolidate test_hooks_json_format.py and test_hook_wrappers.py into a
single test_hooks.py file with logical organization:

- TestClaudeHookWrapper / TestGeminiHookWrapper: Platform wrapper scripts
- TestRulesStopHook / TestUserPromptSubmitHook: Rules-specific hooks
- TestHooksWithTranscript: Transcript input handling
- TestHookExitCodes: Exit code contract tests (DO NOT EDIT)
- TestHookWrapperIntegration: Integration tests with Python hooks
- TestRulesCheckModule: Python module tests

Also move the hooks_dir and src_dir fixtures to conftest.py so they can be shared.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 tests/shell_script_tests/conftest.py          |  12 +
 .../shell_script_tests/test_hook_wrappers.py  | 357 ------------------
 ...est_hooks_json_format.py => test_hooks.py} | 355 +++++++++++++++--
 3 files changed, 345 insertions(+), 379 deletions(-)
 delete mode 100644 tests/shell_script_tests/test_hook_wrappers.py
 rename tests/shell_script_tests/{test_hooks_json_format.py => test_hooks.py} (55%)

diff --git a/tests/shell_script_tests/conftest.py b/tests/shell_script_tests/conftest.py
index 64e62f1c..3ac15822 100644
--- a/tests/shell_script_tests/conftest.py
+++ b/tests/shell_script_tests/conftest.py
@@ -68,6 +68,18 @@ def rules_hooks_dir() -> Path:
     )
 
 
+@pytest.fixture
+def hooks_dir() -> Path:
+    """Return the path to the main hooks directory (platform wrappers)."""
+    return Path(__file__).parent.parent.parent / "src" / "deepwork" / "hooks"
+
+
+@pytest.fixture
+def src_dir() -> Path:
+    """Return the path to the src directory for PYTHONPATH."""
+    return Path(__file__).parent.parent.parent / "src"
+
+
 @pytest.fixture
 def jobs_scripts_dir() -> Path:
     """Return the path to the jobs scripts directory."""
diff --git a/tests/shell_script_tests/test_hook_wrappers.py b/tests/shell_script_tests/test_hook_wrappers.py
deleted file mode 100644
index a9f3ae74..00000000
--- a/tests/shell_script_tests/test_hook_wrappers.py
+++ /dev/null
@@ -1,357 +0,0 @@
-"""Tests for the platform hook wrapper shell scripts.
-
-# ******************************************************************************
-# ***                         CRITICAL CONTRACT TESTS                        ***
-# ******************************************************************************
-#
-# These tests verify the integration between shell wrapper scripts and the
-# Python hook wrapper, following Claude Code hook contracts documented in:
-# doc/platforms/claude/hooks_system.md
-#
-# Hook Contract Summary:
-#   - Exit code 0: Success, stdout parsed as JSON
-#   - Exit code 2: Blocking error, stderr shown (NOT used for JSON format)
-#   - Allow response: {} (empty JSON object)
-#   - Block response: {"decision": "block", "reason": "..."}
-#
-# CRITICAL: Hooks using JSON output format MUST return exit code 0.
-# The "decision" field in the JSON controls blocking behavior, NOT the exit code.
-#
-# DO NOT MODIFY the exit code assertions without first consulting the official
-# Claude Code documentation at: https://docs.anthropic.com/en/docs/claude-code/hooks
-#
-# ******************************************************************************
-
-These tests verify that claude_hook.sh and gemini_hook.sh correctly
-invoke Python hooks and handle input/output.
-"""
-
-import json
-import os
-import subprocess
-from pathlib import Path
-
-import pytest
-
-
-@pytest.fixture
-def hooks_dir() -> Path:
-    """Return the path to the hooks directory."""
-    return Path(__file__).parent.parent.parent / "src" / "deepwork" / "hooks"
-
-
-@pytest.fixture
-def src_dir() -> Path:
-    """Return the path to the src directory for PYTHONPATH."""
-    return Path(__file__).parent.parent.parent / "src"
-
-
-def run_hook_script(
-    script_path: Path,
-    python_module: str,
-    hook_input: dict,
-    platform: str,
-    src_dir: Path,
-) -> tuple[str, str, int]:
-    """
-    Run a hook wrapper script with the given input.
-
-    Args:
-        script_path: Path to the wrapper script (claude_hook.sh or gemini_hook.sh)
-        python_module: Python module to invoke
-        hook_input: JSON input to pass via stdin
-        platform: Platform identifier for env var
-        src_dir: Path to src directory for PYTHONPATH
-
-    Returns:
-        Tuple of (stdout, stderr, return_code)
-    """
-    env = os.environ.copy()
-    env["PYTHONPATH"] = str(src_dir)
-    env["DEEPWORK_HOOK_PLATFORM"] = platform
-
-    result = subprocess.run(
-        ["bash", str(script_path), python_module],
-        capture_output=True,
-        text=True,
-        input=json.dumps(hook_input),
-        env=env,
-    )
-
-    return result.stdout, result.stderr, result.returncode
-
-
-class TestClaudeHookWrapper:
-    """Tests for claude_hook.sh wrapper script."""
-
-    def test_script_exists_and_is_executable(self, hooks_dir: Path) -> None:
-        """Test that the Claude hook script exists and is executable."""
-        script_path = hooks_dir / "claude_hook.sh"
-        assert script_path.exists(), "claude_hook.sh should exist"
-        assert os.access(script_path, os.X_OK), "claude_hook.sh should be executable"
-
-    def test_usage_error_without_module(self, hooks_dir: Path, src_dir: Path) -> None:
-        """Test that script shows usage error when no module provided."""
-        script_path = hooks_dir / "claude_hook.sh"
-        env = os.environ.copy()
-        env["PYTHONPATH"] = str(src_dir)
-
-        result = subprocess.run(
-            ["bash", str(script_path)],
-            capture_output=True,
-            text=True,
-            env=env,
-        )
-
-        assert result.returncode == 1
-        assert "Usage:" in result.stderr
-
-    def test_sets_platform_environment_variable(self, hooks_dir: Path, src_dir: Path) -> None:
-        """Test that the script sets DEEPWORK_HOOK_PLATFORM correctly."""
-        # Create a simple test module that outputs the platform env var
-        # We'll use a Python one-liner via -c
-        script_path = hooks_dir / "claude_hook.sh"
-        env = os.environ.copy()
-        env["PYTHONPATH"] = str(src_dir)
-
-        # We can't easily test this without a real module, so we'll verify
-        # the script exists and has the right content
-        content = script_path.read_text()
-        assert 'DEEPWORK_HOOK_PLATFORM="claude"' in content
-
-
-class TestGeminiHookWrapper:
-    """Tests for gemini_hook.sh wrapper script."""
-
-    def test_script_exists_and_is_executable(self, hooks_dir: Path) -> None:
-        """Test that the Gemini hook script exists and is executable."""
-        script_path = hooks_dir / "gemini_hook.sh"
-        assert script_path.exists(), "gemini_hook.sh should exist"
-        assert os.access(script_path, os.X_OK), "gemini_hook.sh should be executable"
-
-    def test_usage_error_without_module(self, hooks_dir: Path, src_dir: Path) -> None:
-        """Test that script shows usage error when no module provided."""
-        script_path = hooks_dir / "gemini_hook.sh"
-        env = os.environ.copy()
-        env["PYTHONPATH"] = str(src_dir)
-
-        result = subprocess.run(
-            ["bash", str(script_path)],
-            capture_output=True,
-            text=True,
-            env=env,
-        )
-
-        assert result.returncode == 1
-        assert "Usage:" in result.stderr
-
-    def test_sets_platform_environment_variable(self, hooks_dir: Path, src_dir: Path) -> None:
-        """Test that the script sets DEEPWORK_HOOK_PLATFORM correctly."""
-        script_path = hooks_dir / "gemini_hook.sh"
-        content = script_path.read_text()
-        assert 'DEEPWORK_HOOK_PLATFORM="gemini"' in content
-
-
-# ******************************************************************************
-# ***                    DO NOT EDIT EXIT CODE ASSERTIONS!                   ***
-# ******************************************************************************
-#
-# As documented in doc/platforms/claude/hooks_system.md:
-#
-#   | Exit Code | Meaning         | Behavior                          |
-#   |-----------|-----------------|-----------------------------------|
-#   | 0         | Success         | stdout parsed as JSON             |
-#   | 2         | Blocking error  | stderr shown, operation blocked   |
-#   | Other     | Warning         | stderr logged, continues          |
-#
-# CRITICAL: Hooks using JSON output format MUST return exit code 0.
-# The "decision" field in the JSON controls blocking behavior, NOT the exit code.
-#
-# Example valid outputs:
-#   Exit 0 + stdout: {}                                      -> Allow
-#   Exit 0 + stdout: {"decision": "block", "reason": "..."}  -> Block
-#   Exit 0 + stdout: {"decision": "deny", "reason": "..."}   -> Block (Gemini)
-#
-# See: https://docs.anthropic.com/en/docs/claude-code/hooks
-# ******************************************************************************
-class TestHookWrapperIntegration:
-    """Integration tests for hook wrappers with actual Python hooks."""
-
-    @pytest.fixture
-    def test_hook_module(self, tmp_path: Path) -> tuple[Path, str]:
-        """Create a temporary test hook module."""
-        module_dir = tmp_path / "test_hooks"
-        module_dir.mkdir(parents=True)
-
-        # Create __init__.py
-        (module_dir / "__init__.py").write_text("")
-
-        # Create the hook module
-        hook_code = '''
-"""Test hook module."""
-import os
-import sys
-
-from deepwork.hooks.wrapper import (
-    HookInput,
-    HookOutput,
-    NormalizedEvent,
-    Platform,
-    run_hook,
-)
-
-
-def test_hook(hook_input: HookInput) -> HookOutput:
-    """Test hook that blocks for after_agent events."""
-    if hook_input.event == NormalizedEvent.AFTER_AGENT:
-        return HookOutput(decision="block", reason="Test block reason")
-    return HookOutput()
-
-
-def main() -> None:
-    platform_str = os.environ.get("DEEPWORK_HOOK_PLATFORM", "claude")
-    try:
-        platform = Platform(platform_str)
-    except ValueError:
-        platform = Platform.CLAUDE
-
-    exit_code = run_hook(test_hook, platform)
-    sys.exit(exit_code)
-
-
-if __name__ == "__main__":
-    main()
-'''
-        (module_dir / "test_hook.py").write_text(hook_code)
-
-        return tmp_path, "test_hooks.test_hook"
-
-    def test_claude_wrapper_with_stop_event(
-        self,
-        hooks_dir: Path,
-        src_dir: Path,
-        test_hook_module: tuple[Path, str],
-    ) -> None:
-        """Test Claude wrapper processes Stop event correctly."""
-        tmp_path, module_name = test_hook_module
-        script_path = hooks_dir / "claude_hook.sh"
-
-        hook_input = {
-            "session_id": "test123",
-            "hook_event_name": "Stop",
-            "cwd": "/project",
-        }
-
-        env = os.environ.copy()
-        env["PYTHONPATH"] = f"{src_dir}:{tmp_path}"
-
-        result = subprocess.run(
-            ["bash", str(script_path), module_name],
-            capture_output=True,
-            text=True,
-            input=json.dumps(hook_input),
-            env=env,
-        )
-
-        # Exit code 0 even when blocking - the JSON decision field controls behavior
-        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
-
-        output = json.loads(result.stdout.strip())
-        assert output["decision"] == "block"
-        assert "Test block reason" in output["reason"]
-
-    def test_gemini_wrapper_with_afteragent_event(
-        self,
-        hooks_dir: Path,
-        src_dir: Path,
-        test_hook_module: tuple[Path, str],
-    ) -> None:
-        """Test Gemini wrapper processes AfterAgent event correctly."""
-        tmp_path, module_name = test_hook_module
-        script_path = hooks_dir / "gemini_hook.sh"
-
-        hook_input = {
-            "session_id": "test456",
-            "hook_event_name": "AfterAgent",
-            "cwd": "/project",
-        }
-
-        env = os.environ.copy()
-        env["PYTHONPATH"] = f"{src_dir}:{tmp_path}"
-
-        result = subprocess.run(
-            ["bash", str(script_path), module_name],
-            capture_output=True,
-            text=True,
-            input=json.dumps(hook_input),
-            env=env,
-        )
-
-        # Exit code 0 even when blocking - the JSON decision field controls behavior
-        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
-
-        output = json.loads(result.stdout.strip())
-        # Gemini should get "deny" instead of "block"
-        assert output["decision"] == "deny"
-        assert "Test block reason" in output["reason"]
-
-    def test_non_blocking_event(
-        self,
-        hooks_dir: Path,
-        src_dir: Path,
-        test_hook_module: tuple[Path, str],
-    ) -> None:
-        """Test that non-blocking events return exit code 0."""
-        tmp_path, module_name = test_hook_module
-        script_path = hooks_dir / "claude_hook.sh"
-
-        # SessionStart is not blocked by the test hook
-        hook_input = {
-            "session_id": "test789",
-            "hook_event_name": "SessionStart",
-            "cwd": "/project",
-        }
-
-        env = os.environ.copy()
-        env["PYTHONPATH"] = f"{src_dir}:{tmp_path}"
-
-        result = subprocess.run(
-            ["bash", str(script_path), module_name],
-            capture_output=True,
-            text=True,
-            input=json.dumps(hook_input),
-            env=env,
-        )
-
-        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
-        output = json.loads(result.stdout.strip())
-        assert output == {} or output.get("decision", "") not in ("block", "deny")
-
-
-class TestRulesCheckHook:
-    """Tests for the rules_check hook module."""
-
-    def test_module_imports(self) -> None:
-        """Test that the rules_check module can be imported."""
-        from deepwork.hooks import rules_check
-
-        assert hasattr(rules_check, "main")
-        assert hasattr(rules_check, "rules_check_hook")
-
-    def test_hook_function_returns_output(self) -> None:
-        """Test that rules_check_hook returns a HookOutput."""
-        from deepwork.hooks.rules_check import rules_check_hook
-        from deepwork.hooks.wrapper import HookInput, HookOutput, NormalizedEvent, Platform
-
-        # Create a minimal hook input
-        hook_input = HookInput(
-            platform=Platform.CLAUDE,
-            event=NormalizedEvent.BEFORE_PROMPT,  # Not after_agent, so no blocking
-            session_id="test",
-        )
-
-        output = rules_check_hook(hook_input)
-
-        assert isinstance(output, HookOutput)
-        # Should not block for before_prompt event
-        assert output.decision != "block"
diff --git a/tests/shell_script_tests/test_hooks_json_format.py b/tests/shell_script_tests/test_hooks.py
similarity index 55%
rename from tests/shell_script_tests/test_hooks_json_format.py
rename to tests/shell_script_tests/test_hooks.py
index 0d9e793b..74394824 100644
--- a/tests/shell_script_tests/test_hooks_json_format.py
+++ b/tests/shell_script_tests/test_hooks.py
@@ -1,4 +1,4 @@
-"""Tests for Claude Code hooks JSON format validation.
+"""Tests for hook shell scripts and JSON format compliance.
 
 # ******************************************************************************
 # ***                         CRITICAL CONTRACT TESTS                        ***
@@ -12,10 +12,13 @@
 #
 # Hook Contract Summary:
 #   - Exit code 0: Success, stdout parsed as JSON
-#   - Exit code 2: Blocking error, stderr shown
+#   - Exit code 2: Blocking error, stderr shown (NOT used for JSON format)
 #   - Allow response: {} (empty JSON object)
 #   - Block response: {"decision": "block", "reason": "..."}
 #
+# CRITICAL: Hooks using JSON output format MUST return exit code 0.
+# The "decision" field in the JSON controls blocking behavior, NOT the exit code.
+#
 # ******************************************************************************
 
 Claude Code hooks have specific JSON response formats that must be followed:
@@ -40,6 +43,7 @@
 
 import json
 import os
+import subprocess
 import tempfile
 from pathlib import Path
 
@@ -49,15 +53,52 @@
 from .conftest import run_shell_script
 
 
-def run_hook_script(
+# =============================================================================
+# Helper Functions
+# =============================================================================
+
+
+def run_rules_hook_script(
     script_path: Path,
     cwd: Path,
     hook_input: dict | None = None,
 ) -> tuple[str, str, int]:
-    """Run a hook script and return its output."""
+    """Run a rules hook script and return its output."""
     return run_shell_script(script_path, cwd, hook_input=hook_input)
 
 
+def run_platform_wrapper_script(
+    script_path: Path,
+    python_module: str,
+    hook_input: dict,
+    src_dir: Path,
+) -> tuple[str, str, int]:
+    """
+    Run a platform hook wrapper script with the given input.
+
+    Args:
+        script_path: Path to the wrapper script (claude_hook.sh or gemini_hook.sh)
+        python_module: Python module to invoke
+        hook_input: JSON input to pass via stdin
+        src_dir: Path to src directory for PYTHONPATH
+
+    Returns:
+        Tuple of (stdout, stderr, return_code)
+    """
+    env = os.environ.copy()
+    env["PYTHONPATH"] = str(src_dir)
+
+    result = subprocess.run(
+        ["bash", str(script_path), python_module],
+        capture_output=True,
+        text=True,
+        input=json.dumps(hook_input),
+        env=env,
+    )
+
+    return result.stdout, result.stderr, result.returncode
+
+
 def validate_json_output(output: str) -> dict | None:
     """
     Validate that output is valid JSON or empty.
@@ -141,13 +182,87 @@ def validate_prompt_hook_response(response: dict | None) -> None:
     assert isinstance(response, dict), f"Prompt hook output must be a JSON object: {response}"
 
 
-class TestRulesStopHookJsonFormat:
-    """Tests specifically for rules_stop_hook.sh JSON format compliance."""
+# =============================================================================
+# Platform Wrapper Script Tests
+# =============================================================================
+
+
+class TestClaudeHookWrapper:
+    """Tests for claude_hook.sh wrapper script."""
+
+    def test_script_exists_and_is_executable(self, hooks_dir: Path) -> None:
+        """Test that the Claude hook script exists and is executable."""
+        script_path = hooks_dir / "claude_hook.sh"
+        assert script_path.exists(), "claude_hook.sh should exist"
+        assert os.access(script_path, os.X_OK), "claude_hook.sh should be executable"
+
+    def test_usage_error_without_module(self, hooks_dir: Path, src_dir: Path) -> None:
+        """Test that script shows usage error when no module provided."""
+        script_path = hooks_dir / "claude_hook.sh"
+        env = os.environ.copy()
+        env["PYTHONPATH"] = str(src_dir)
+
+        result = subprocess.run(
+            ["bash", str(script_path)],
+            capture_output=True,
+            text=True,
+            env=env,
+        )
+
+        assert result.returncode == 1
+        assert "Usage:" in result.stderr
+
+    def test_sets_platform_environment_variable(self, hooks_dir: Path, src_dir: Path) -> None:
+        """Test that the script sets DEEPWORK_HOOK_PLATFORM correctly."""
+        script_path = hooks_dir / "claude_hook.sh"
+        content = script_path.read_text()
+        assert 'DEEPWORK_HOOK_PLATFORM="claude"' in content
+
+
+class TestGeminiHookWrapper:
+    """Tests for gemini_hook.sh wrapper script."""
+
+    def test_script_exists_and_is_executable(self, hooks_dir: Path) -> None:
+        """Test that the Gemini hook script exists and is executable."""
+        script_path = hooks_dir / "gemini_hook.sh"
+        assert script_path.exists(), "gemini_hook.sh should exist"
+        assert os.access(script_path, os.X_OK), "gemini_hook.sh should be executable"
+
+    def test_usage_error_without_module(self, hooks_dir: Path, src_dir: Path) -> None:
+        """Test that script shows usage error when no module provided."""
+        script_path = hooks_dir / "gemini_hook.sh"
+        env = os.environ.copy()
+        env["PYTHONPATH"] = str(src_dir)
+
+        result = subprocess.run(
+            ["bash", str(script_path)],
+            capture_output=True,
+            text=True,
+            env=env,
+        )
+
+        assert result.returncode == 1
+        assert "Usage:" in result.stderr
+
+    def test_sets_platform_environment_variable(self, hooks_dir: Path, src_dir: Path) -> None:
+        """Test that the script sets DEEPWORK_HOOK_PLATFORM correctly."""
+        script_path = hooks_dir / "gemini_hook.sh"
+        content = script_path.read_text()
+        assert 'DEEPWORK_HOOK_PLATFORM="gemini"' in content
+
+
+# =============================================================================
+# Rules Hook Script Tests
+# =============================================================================
+
+
+class TestRulesStopHook:
+    """Tests for rules_stop_hook.sh JSON format compliance."""
 
     def test_allow_response_is_empty_json(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that allow response is empty JSON object."""
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
 
         response = validate_json_output(stdout)
         validate_stop_hook_response(response)
@@ -166,7 +281,7 @@ def test_block_response_has_required_fields(
         repo.index.add(["test.py"])
 
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
 
         response = validate_json_output(stdout)
         validate_stop_hook_response(response)
@@ -186,7 +301,7 @@ def test_block_reason_contains_rule_info(
         repo.index.add(["test.py"])
 
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
 
         response = validate_json_output(stdout)
 
@@ -206,7 +321,7 @@ def test_no_extraneous_keys_in_response(
         repo.index.add(["test.py"])
 
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
 
         response = validate_json_output(stdout)
 
@@ -228,7 +343,7 @@ def test_output_is_single_line_json(
         repo.index.add(["test.py"])
 
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
 
         # Remove trailing newline and check for internal newlines
         output = stdout.strip()
@@ -242,13 +357,13 @@ def test_output_is_single_line_json(
             json.loads(json_line)
 
 
-class TestUserPromptSubmitHookJsonFormat:
+class TestUserPromptSubmitHook:
     """Tests for user_prompt_submit.sh JSON format compliance."""
 
     def test_output_is_valid_json_or_empty(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that output is valid JSON or empty."""
         script_path = rules_hooks_dir / "user_prompt_submit.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
 
         response = validate_json_output(stdout)
         validate_prompt_hook_response(response)
@@ -256,7 +371,7 @@ def test_output_is_valid_json_or_empty(self, rules_hooks_dir: Path, git_repo: Pa
     def test_does_not_block_prompt_submission(self, rules_hooks_dir: Path, git_repo: Path) -> None:
         """Test that hook does not block prompt submission."""
         script_path = rules_hooks_dir / "user_prompt_submit.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
 
         response = validate_json_output(stdout)
 
@@ -267,7 +382,7 @@ def test_does_not_block_prompt_submission(self, rules_hooks_dir: Path, git_repo:
             )
 
 
-class TestHooksJsonFormatWithTranscript:
+class TestHooksWithTranscript:
     """Tests for hook JSON format when using transcript input."""
 
     def test_stop_hook_with_transcript_input(
@@ -295,7 +410,7 @@ def test_stop_hook_with_transcript_input(
         try:
             script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule, hook_input)
+            stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule, hook_input)
 
             response = validate_json_output(stdout)
             validate_stop_hook_response(response)
@@ -335,7 +450,7 @@ def test_stop_hook_with_promise_returns_empty(
         try:
             script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule, hook_input)
+            stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule, hook_input)
 
             response = validate_json_output(stdout)
             validate_stop_hook_response(response)
@@ -366,10 +481,13 @@ def test_stop_hook_with_promise_returns_empty(
 # Example valid outputs:
 #   Exit 0 + stdout: {}                                      -> Allow
 #   Exit 0 + stdout: {"decision": "block", "reason": "..."}  -> Block
+#   Exit 0 + stdout: {"decision": "deny", "reason": "..."}   -> Block (Gemini)
 #
 # See: https://docs.anthropic.com/en/docs/claude-code/hooks
 # ******************************************************************************
-class TestHooksExitCodes:
+
+
+class TestHookExitCodes:
     """Tests for hook script exit codes.
 
     CRITICAL: These tests verify the documented Claude Code hook contract.
@@ -382,7 +500,7 @@ def test_stop_hook_exits_zero_on_allow(self, rules_hooks_dir: Path, git_repo: Pa
         DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
         """
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
 
         assert code == 0, f"Allow should exit 0. stderr: {stderr}"
 
@@ -400,7 +518,7 @@ def test_stop_hook_exits_zero_on_block(
         repo.index.add(["test.py"])
 
         script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
 
         # Hooks should exit 0 and communicate via JSON
         assert code == 0, f"Block should still exit 0. stderr: {stderr}"
@@ -411,7 +529,7 @@ def test_user_prompt_hook_exits_zero(self, rules_hooks_dir: Path, git_repo: Path
         DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
         """
         script_path = rules_hooks_dir / "user_prompt_submit.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
 
         assert code == 0, f"User prompt hook should exit 0. stderr: {stderr}"
 
@@ -421,6 +539,199 @@ def test_capture_script_exits_zero(self, rules_hooks_dir: Path, git_repo: Path)
         DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
         """
         script_path = rules_hooks_dir / "capture_prompt_work_tree.sh"
-        stdout, stderr, code = run_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
 
         assert code == 0, f"Capture script should exit 0. stderr: {stderr}"
+
+
+# =============================================================================
+# Integration Tests
+# =============================================================================
+
+
+class TestHookWrapperIntegration:
+    """Integration tests for hook wrappers with actual Python hooks."""
+
+    @pytest.fixture
+    def test_hook_module(self, tmp_path: Path) -> tuple[Path, str]:
+        """Create a temporary test hook module."""
+        module_dir = tmp_path / "test_hooks"
+        module_dir.mkdir(parents=True)
+
+        # Create __init__.py
+        (module_dir / "__init__.py").write_text("")
+
+        # Create the hook module
+        hook_code = '''
+"""Test hook module."""
+import os
+import sys
+
+from deepwork.hooks.wrapper import (
+    HookInput,
+    HookOutput,
+    NormalizedEvent,
+    Platform,
+    run_hook,
+)
+
+
+def test_hook(hook_input: HookInput) -> HookOutput:
+    """Test hook that blocks for after_agent events."""
+    if hook_input.event == NormalizedEvent.AFTER_AGENT:
+        return HookOutput(decision="block", reason="Test block reason")
+    return HookOutput()
+
+
+def main() -> None:
+    platform_str = os.environ.get("DEEPWORK_HOOK_PLATFORM", "claude")
+    try:
+        platform = Platform(platform_str)
+    except ValueError:
+        platform = Platform.CLAUDE
+
+    exit_code = run_hook(test_hook, platform)
+    sys.exit(exit_code)
+
+
+if __name__ == "__main__":
+    main()
+'''
+        (module_dir / "test_hook.py").write_text(hook_code)
+
+        return tmp_path, "test_hooks.test_hook"
+
+    def test_claude_wrapper_with_stop_event(
+        self,
+        hooks_dir: Path,
+        src_dir: Path,
+        test_hook_module: tuple[Path, str],
+    ) -> None:
+        """Test Claude wrapper processes Stop event correctly."""
+        tmp_path, module_name = test_hook_module
+        script_path = hooks_dir / "claude_hook.sh"
+
+        hook_input = {
+            "session_id": "test123",
+            "hook_event_name": "Stop",
+            "cwd": "/project",
+        }
+
+        env = os.environ.copy()
+        env["PYTHONPATH"] = f"{src_dir}:{tmp_path}"
+
+        result = subprocess.run(
+            ["bash", str(script_path), module_name],
+            capture_output=True,
+            text=True,
+            input=json.dumps(hook_input),
+            env=env,
+        )
+
+        # Exit code 0 even when blocking - the JSON decision field controls behavior
+        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
+
+        output = json.loads(result.stdout.strip())
+        assert output["decision"] == "block"
+        assert "Test block reason" in output["reason"]
+
+    def test_gemini_wrapper_with_afteragent_event(
+        self,
+        hooks_dir: Path,
+        src_dir: Path,
+        test_hook_module: tuple[Path, str],
+    ) -> None:
+        """Test Gemini wrapper processes AfterAgent event correctly."""
+        tmp_path, module_name = test_hook_module
+        script_path = hooks_dir / "gemini_hook.sh"
+
+        hook_input = {
+            "session_id": "test456",
+            "hook_event_name": "AfterAgent",
+            "cwd": "/project",
+        }
+
+        env = os.environ.copy()
+        env["PYTHONPATH"] = f"{src_dir}:{tmp_path}"
+
+        result = subprocess.run(
+            ["bash", str(script_path), module_name],
+            capture_output=True,
+            text=True,
+            input=json.dumps(hook_input),
+            env=env,
+        )
+
+        # Exit code 0 even when blocking - the JSON decision field controls behavior
+        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
+
+        output = json.loads(result.stdout.strip())
+        # Gemini should get "deny" instead of "block"
+        assert output["decision"] == "deny"
+        assert "Test block reason" in output["reason"]
+
+    def test_non_blocking_event(
+        self,
+        hooks_dir: Path,
+        src_dir: Path,
+        test_hook_module: tuple[Path, str],
+    ) -> None:
+        """Test that non-blocking events return exit code 0."""
+        tmp_path, module_name = test_hook_module
+        script_path = hooks_dir / "claude_hook.sh"
+
+        # SessionStart is not blocked by the test hook
+        hook_input = {
+            "session_id": "test789",
+            "hook_event_name": "SessionStart",
+            "cwd": "/project",
+        }
+
+        env = os.environ.copy()
+        env["PYTHONPATH"] = f"{src_dir}:{tmp_path}"
+
+        result = subprocess.run(
+            ["bash", str(script_path), module_name],
+            capture_output=True,
+            text=True,
+            input=json.dumps(hook_input),
+            env=env,
+        )
+
+        assert result.returncode == 0, f"Expected exit code 0. stderr: {result.stderr}"
+        output = json.loads(result.stdout.strip())
+        assert output == {} or output.get("decision", "") not in ("block", "deny")
+
+
+# =============================================================================
+# Python Module Tests
+# =============================================================================
+
+
+class TestRulesCheckModule:
+    """Tests for the rules_check hook module."""
+
+    def test_module_imports(self) -> None:
+        """Test that the rules_check module can be imported."""
+        from deepwork.hooks import rules_check
+
+        assert hasattr(rules_check, "main")
+        assert hasattr(rules_check, "rules_check_hook")
+
+    def test_hook_function_returns_output(self) -> None:
+        """Test that rules_check_hook returns a HookOutput."""
+        from deepwork.hooks.rules_check import rules_check_hook
+        from deepwork.hooks.wrapper import HookInput, HookOutput, NormalizedEvent, Platform
+
+        # Create a minimal hook input
+        hook_input = HookInput(
+            platform=Platform.CLAUDE,
+            event=NormalizedEvent.BEFORE_PROMPT,  # Not after_agent, so no blocking
+            session_id="test",
+        )
+
+        output = rules_check_hook(hook_input)
+
+        assert isinstance(output, HookOutput)
+        # Should not block for before_prompt event
+        assert output.decision != "block"

From ead2c2bf0e32e94abfd88b7ce9bf1210b567a9e5 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 15:19:19 -0700
Subject: [PATCH 18/21] Format code with ruff

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 src/deepwork/cli/install.py            |  4 +---
 src/deepwork/core/rules_parser.py      |  4 +---
 tests/integration/test_install_flow.py |  4 +---
 tests/shell_script_tests/test_hooks.py |  8 ++++++--
 tests/unit/test_command_executor.py    | 12 +++---------
 tests/unit/test_rules_parser.py        |  4 +++-
 tests/unit/test_rules_queue.py         |  4 +---
 7 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/src/deepwork/cli/install.py b/src/deepwork/cli/install.py
index a84bb8ec..1eaca748 100644
--- a/src/deepwork/cli/install.py
+++ b/src/deepwork/cli/install.py
@@ -135,9 +135,7 @@ def _create_rules_directory(project_path: Path) -> bool:
     ensure_dir(rules_dir)
 
     # Copy example rule templates from the deepwork_rules standard job
-    example_rules_dir = (
-        Path(__file__).parent.parent / "standard_jobs" / "deepwork_rules" / "rules"
-    )
+    example_rules_dir = Path(__file__).parent.parent / "standard_jobs" / "deepwork_rules" / "rules"
 
     if example_rules_dir.exists():
         # Copy all .example files
diff --git a/src/deepwork/core/rules_parser.py b/src/deepwork/core/rules_parser.py
index 270d1ba2..1de83a6c 100644
--- a/src/deepwork/core/rules_parser.py
+++ b/src/deepwork/core/rules_parser.py
@@ -453,9 +453,7 @@ def evaluate_rule(rule: Rule, changed_files: list[str]) -> RuleEvaluationResult:
         )
 
     elif rule.detection_mode == DetectionMode.SET:
-        should_fire, trigger_files, missing_files = evaluate_set_correspondence(
-            rule, changed_files
-        )
+        should_fire, trigger_files, missing_files = evaluate_set_correspondence(rule, changed_files)
         return RuleEvaluationResult(
             rule=rule,
             should_fire=should_fire,
diff --git a/tests/integration/test_install_flow.py b/tests/integration/test_install_flow.py
index a6a40659..23037f65 100644
--- a/tests/integration/test_install_flow.py
+++ b/tests/integration/test_install_flow.py
@@ -180,9 +180,7 @@ def test_install_creates_rules_directory(self, mock_claude_project: Path) -> Non
         example_files = list(rules_dir.glob("*.md.example"))
         assert len(example_files) >= 1  # At least one example template
 
-    def test_install_preserves_existing_rules_directory(
-        self, mock_claude_project: Path
-    ) -> None:
+    def test_install_preserves_existing_rules_directory(self, mock_claude_project: Path) -> None:
         """Test that install doesn't overwrite existing rules directory."""
         runner = CliRunner()
 
diff --git a/tests/shell_script_tests/test_hooks.py b/tests/shell_script_tests/test_hooks.py
index 74394824..832f3535 100644
--- a/tests/shell_script_tests/test_hooks.py
+++ b/tests/shell_script_tests/test_hooks.py
@@ -410,7 +410,9 @@ def test_stop_hook_with_transcript_input(
         try:
             script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule, hook_input)
+            stdout, stderr, code = run_rules_hook_script(
+                script_path, git_repo_with_rule, hook_input
+            )
 
             response = validate_json_output(stdout)
             validate_stop_hook_response(response)
@@ -450,7 +452,9 @@ def test_stop_hook_with_promise_returns_empty(
         try:
             script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule, hook_input)
+            stdout, stderr, code = run_rules_hook_script(
+                script_path, git_repo_with_rule, hook_input
+            )
 
             response = validate_json_output(stdout)
             validate_stop_hook_response(response)
diff --git a/tests/unit/test_command_executor.py b/tests/unit/test_command_executor.py
index d6b88ff3..f22ab24f 100644
--- a/tests/unit/test_command_executor.py
+++ b/tests/unit/test_command_executor.py
@@ -179,12 +179,8 @@ def test_single_error(self) -> None:
     def test_multiple_errors(self) -> None:
         """Format multiple errors."""
         results = [
-            CommandResult(
-                success=False, exit_code=1, stdout="", stderr="Error 1", command="cmd1"
-            ),
-            CommandResult(
-                success=False, exit_code=2, stdout="", stderr="Error 2", command="cmd2"
-            ),
+            CommandResult(success=False, exit_code=1, stdout="", stderr="Error 1", command="cmd1"),
+            CommandResult(success=False, exit_code=2, stdout="", stderr="Error 2", command="cmd2"),
         ]
         output = format_command_errors(results)
         assert "cmd1" in output
@@ -196,9 +192,7 @@ def test_ignores_success(self) -> None:
         """Ignore successful commands."""
         results = [
             CommandResult(success=True, exit_code=0, stdout="ok", stderr="", command="good_cmd"),
-            CommandResult(
-                success=False, exit_code=1, stdout="", stderr="bad", command="bad_cmd"
-            ),
+            CommandResult(success=False, exit_code=1, stdout="", stderr="bad", command="bad_cmd"),
         ]
         output = format_command_errors(results)
         assert "good_cmd" not in output
diff --git a/tests/unit/test_rules_parser.py b/tests/unit/test_rules_parser.py
index f764edf7..fdfa62b8 100644
--- a/tests/unit/test_rules_parser.py
+++ b/tests/unit/test_rules_parser.py
@@ -565,7 +565,9 @@ def test_different_names_fire_both(self) -> None:
         result = evaluate_rule(rule, changed_files)
         assert result.should_fire is True
         # Both trigger because each is incomplete
-        assert "models/user.py" in result.trigger_files or "schemas/order.py" in result.trigger_files
+        assert (
+            "models/user.py" in result.trigger_files or "schemas/order.py" in result.trigger_files
+        )
 
 
 class TestCorrespondencePairs:
diff --git a/tests/unit/test_rules_queue.py b/tests/unit/test_rules_queue.py
index 4b66ea7d..fdde0045 100644
--- a/tests/unit/test_rules_queue.py
+++ b/tests/unit/test_rules_queue.py
@@ -207,9 +207,7 @@ def test_update_status_to_failed(self, queue: RulesQueue) -> None:
         assert entry is not None
 
         action_result = ActionResult(type="command", output="error", exit_code=1)
-        success = queue.update_status(
-            entry.trigger_hash, QueueEntryStatus.FAILED, action_result
-        )
+        success = queue.update_status(entry.trigger_hash, QueueEntryStatus.FAILED, action_result)
         assert success is True
 
         updated = queue.get_entry(entry.trigger_hash)

From 78ed5d943ef9ff79111e9a770d26401cf4481901 Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 15:20:53 -0700
Subject: [PATCH 19/21] Fix ruff linting errors (unused imports, import
 sorting)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 tests/shell_script_tests/test_hooks.py | 1 -
 tests/unit/test_command_executor.py    | 2 --
 tests/unit/test_rules_parser.py        | 6 +-----
 tests/unit/test_rules_queue.py         | 1 -
 4 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/tests/shell_script_tests/test_hooks.py b/tests/shell_script_tests/test_hooks.py
index 832f3535..bd59684d 100644
--- a/tests/shell_script_tests/test_hooks.py
+++ b/tests/shell_script_tests/test_hooks.py
@@ -52,7 +52,6 @@
 
 from .conftest import run_shell_script
 
-
 # =============================================================================
 # Helper Functions
 # =============================================================================
diff --git a/tests/unit/test_command_executor.py b/tests/unit/test_command_executor.py
index f22ab24f..77d7b320 100644
--- a/tests/unit/test_command_executor.py
+++ b/tests/unit/test_command_executor.py
@@ -2,8 +2,6 @@
 
 from pathlib import Path
 
-import pytest
-
 from deepwork.core.command_executor import (
     CommandResult,
     all_commands_succeeded,
diff --git a/tests/unit/test_rules_parser.py b/tests/unit/test_rules_parser.py
index fdfa62b8..4aedea67 100644
--- a/tests/unit/test_rules_parser.py
+++ b/tests/unit/test_rules_parser.py
@@ -2,17 +2,13 @@
 
 from pathlib import Path
 
-import pytest
-
 from deepwork.core.pattern_matcher import matches_any_pattern as matches_pattern
 from deepwork.core.rules_parser import (
-    DEFAULT_COMPARE_TO,
     DetectionMode,
     PairConfig,
     Rule,
-    RulesParseError,
-    evaluate_rules,
     evaluate_rule,
+    evaluate_rules,
     load_rules_from_directory,
 )
 
diff --git a/tests/unit/test_rules_queue.py b/tests/unit/test_rules_queue.py
index fdde0045..8c35d06d 100644
--- a/tests/unit/test_rules_queue.py
+++ b/tests/unit/test_rules_queue.py
@@ -1,6 +1,5 @@
 """Tests for rules queue system (QS-6.x from test_scenarios.md)."""
 
-import json
 from pathlib import Path
 
 import pytest

From cf756ddf1df38c457c8a8d95e778e1ff65660d1c Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 15:55:15 -0700
Subject: [PATCH 20/21] Cleanup hooks and wrappers

---
 .claude/settings.json                         |   2 +-
 .../deepwork_rules/hooks/global_hooks.yml     |   4 +-
 .../deepwork_rules/hooks/rules_stop_hook.sh   |  43 --------
 doc/architecture.md                           |   8 +-
 src/deepwork/core/hooks_syncer.py             |  76 ++++++++-----
 .../deepwork_rules/hooks/global_hooks.yml     |   4 +-
 .../deepwork_rules/hooks/rules_stop_hook.sh   |  43 --------
 tests/shell_script_tests/README.md            |  12 +--
 tests/shell_script_tests/test_hooks.py        |  88 ++++++++-------
 .../test_rules_stop_hook.py                   | 102 +++++++++---------
 tests/unit/test_hooks_syncer.py               |  70 +++++++++---
 11 files changed, 224 insertions(+), 228 deletions(-)
 delete mode 100755 .deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh
 delete mode 100755 src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh
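
Reviewer note (illustrative, not part of the commit): the sketch below restates how a `global_hooks.yml` event list appears to turn into hook commands after this patch. Plain strings are treated as script filenames under the job's `hooks/` directory, and `module:` mappings become `python -m <module>`. The names `parse_hook_entries` and `command_for` are hypothetical helpers, `HookSpec` here is a local stand-in for the dataclass added in `hooks_syncer.py`, and `PurePosixPath` is used only to keep the output platform-independent. The patch itself uses `Path` and puts the command logic on `HookEntry.get_command`.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Any, List, Optional


@dataclass
class HookSpec:
    """Local stand-in for the HookSpec dataclass added by this patch."""

    script: Optional[str] = None  # script filename, for script-based hooks
    module: Optional[str] = None  # dotted module path, for module-based hooks


def parse_hook_entries(entries: Any) -> List[HookSpec]:
    """Hypothetical helper: normalize one event's YAML value into HookSpecs."""
    if not isinstance(entries, list):
        entries = [entries]  # a bare value is treated as a one-item list

    specs: List[HookSpec] = []
    for entry in entries:
        if isinstance(entry, str):
            specs.append(HookSpec(script=entry))
        elif isinstance(entry, dict) and "module" in entry:
            specs.append(HookSpec(module=entry["module"]))
        # Anything else is skipped, mirroring the parsing loop in the patch.
    return specs


def command_for(spec: HookSpec, job_dir: PurePosixPath, project_path: PurePosixPath) -> str:
    """Hypothetical helper: build the command string for one hook."""
    if spec.module:
        return f"python -m {spec.module}"
    if spec.script:
        script_path = job_dir / "hooks" / spec.script
        try:
            return str(script_path.relative_to(project_path))
        except ValueError:
            # Job directory lives outside the project: keep the full path.
            return str(script_path)
    raise ValueError("HookSpec must have either script or module")


# Shape of the data after loading the updated global_hooks.yml.
hooks_data = {
    "UserPromptSubmit": ["user_prompt_submit.sh"],
    "Stop": [{"module": "deepwork.hooks.rules_check"}],
}

project = PurePosixPath("/project")
job_dir = project / ".deepwork" / "jobs" / "deepwork_rules"

commands = {
    event: [command_for(spec, job_dir, project) for spec in parse_hook_entries(entries)]
    for event, entries in hooks_data.items()
}
print(commands["Stop"])              # ['python -m deepwork.hooks.rules_check']
print(commands["UserPromptSubmit"])  # ['.deepwork/jobs/deepwork_rules/hooks/user_prompt_submit.sh']
```

One consequence worth checking in review: a module-based command relies on `python` resolving to an interpreter that can import `deepwork`, whereas the removed shell wrapper also set `DEEPWORK_HOOK_PLATFORM` and `DEEPWORK_HOOK_EVENT` before invoking the module.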

diff --git a/.claude/settings.json b/.claude/settings.json
index 84a93bed..d2fd4875 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -108,7 +108,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": ".deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh"
+            "command": "python -m deepwork.hooks.rules_check"
           }
         ]
       }
diff --git a/.deepwork/jobs/deepwork_rules/hooks/global_hooks.yml b/.deepwork/jobs/deepwork_rules/hooks/global_hooks.yml
index f76202ab..a310d31a 100644
--- a/.deepwork/jobs/deepwork_rules/hooks/global_hooks.yml
+++ b/.deepwork/jobs/deepwork_rules/hooks/global_hooks.yml
@@ -1,8 +1,8 @@
 # DeepWork Rules Hooks Configuration
-# Maps Claude Code lifecycle events to hook scripts
+# Maps lifecycle events to hook scripts or Python modules
 
 UserPromptSubmit:
   - user_prompt_submit.sh
 
 Stop:
-  - rules_stop_hook.sh
+  - module: deepwork.hooks.rules_check
diff --git a/.deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh b/.deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh
deleted file mode 100755
index 20fa8a3f..00000000
--- a/.deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/bin/bash
-# rules_stop_hook.sh - Evaluates rules when the agent stops
-#
-# This script is called as a Claude Code Stop hook. It:
-# 1. Evaluates rules from .deepwork/rules/
-# 2. Computes changed files based on each rule's compare_to setting
-# 3. Checks for <promise> tags in the conversation transcript
-# 4. Returns JSON to block stop if rules need attention
-
-set -e
-
-# Check if rules directory exists with .md files
-RULES_DIR=".deepwork/rules"
-
-if [ ! -d "${RULES_DIR}" ]; then
-    # No rules directory, nothing to do
-    exit 0
-fi
-
-# Check if there are any .md files
-if ! ls "${RULES_DIR}"/*.md 1>/dev/null 2>&1; then
-    # No rule files, nothing to do
-    exit 0
-fi
-
-# Read the hook input JSON from stdin
-HOOK_INPUT=""
-if [ ! -t 0 ]; then
-    HOOK_INPUT=$(cat)
-fi
-
-# Call the Python rules evaluator via the cross-platform wrapper
-# The wrapper reads JSON input and handles transcript extraction
-# Note: exit code 2 means "block" which is valid (not an error), so capture it
-result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.rules_check 2>/dev/null) || true
-
-# If no output (error case), provide empty JSON as fallback
-if [ -z "${result}" ]; then
-    result='{}'
-fi
-
-# Output the result (JSON for Claude Code hooks)
-echo "${result}"
diff --git a/doc/architecture.md b/doc/architecture.md
index a0a0e959..29400973 100644
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -73,8 +73,7 @@ deepwork/                       # DeepWork tool repository
 │       │       └── hooks/         # Hook scripts
 │       │           ├── global_hooks.yml
 │       │           ├── user_prompt_submit.sh
-│       │           ├── capture_prompt_work_tree.sh
-│       │           └── rules_stop_hook.sh
+│       │           └── capture_prompt_work_tree.sh
 │       ├── schemas/            # Definition schemas
 │       │   ├── job_schema.py
 │       │   └── rules_schema.py
@@ -307,8 +306,7 @@ my-project/                     # User's project (target)
 │       │   └── hooks/          # Hook scripts (installed from standard_jobs)
 │       │       ├── global_hooks.yml
 │       │       ├── user_prompt_submit.sh
-│       │       ├── capture_prompt_work_tree.sh
-│       │       └── rules_stop_hook.sh
+│       │       └── capture_prompt_work_tree.sh
 │       ├── competitive_research/
 │       │   ├── job.yml         # Job metadata
 │       │   └── steps/
@@ -1135,7 +1133,7 @@ The hooks are installed to `.claude/settings.json` during `deepwork sync`:
 {
   "hooks": {
     "Stop": [
-      {"matcher": "", "hooks": [{"type": "command", "command": ".deepwork/jobs/deepwork_rules/hooks/rules_stop_hook.sh"}]}
+      {"matcher": "", "hooks": [{"type": "command", "command": "python -m deepwork.hooks.rules_check"}]}
     ]
   }
 }
diff --git a/src/deepwork/core/hooks_syncer.py b/src/deepwork/core/hooks_syncer.py
index 65257ec2..5df2e74f 100644
--- a/src/deepwork/core/hooks_syncer.py
+++ b/src/deepwork/core/hooks_syncer.py
@@ -19,27 +19,42 @@ class HooksSyncError(Exception):
 class HookEntry:
     """Represents a single hook entry for a lifecycle event."""
 
-    script: str  # Script filename
     job_name: str  # Job that provides this hook
     job_dir: Path  # Full path to job directory
+    script: str | None = None  # Script filename (if script-based hook)
+    module: str | None = None  # Python module (if module-based hook)
 
-    def get_script_path(self, project_path: Path) -> str:
+    def get_command(self, project_path: Path) -> str:
         """
-        Get the script path relative to project root.
+        Get the command to run this hook.
 
         Args:
             project_path: Path to project root
 
         Returns:
-            Relative path to script from project root
+            Command string to execute
         """
-        # Script path is: .deepwork/jobs/{job_name}/hooks/{script}
-        script_path = self.job_dir / "hooks" / self.script
-        try:
-            return str(script_path.relative_to(project_path))
-        except ValueError:
-            # If not relative, return the full path
-            return str(script_path)
+        if self.module:
+            # Python module - run directly with python -m
+            return f"python -m {self.module}"
+        elif self.script:
+            # Script path is: .deepwork/jobs/{job_name}/hooks/{script}
+            script_path = self.job_dir / "hooks" / self.script
+            try:
+                return str(script_path.relative_to(project_path))
+            except ValueError:
+                # If not relative, return the full path
+                return str(script_path)
+        else:
+            raise ValueError("HookEntry must have either script or module")
+
+
+@dataclass
+class HookSpec:
+    """Specification for a single hook (either script or module)."""
+
+    script: str | None = None
+    module: str | None = None
 
 
 @dataclass
@@ -48,7 +63,7 @@ class JobHooks:
 
     job_name: str
     job_dir: Path
-    hooks: dict[str, list[str]] = field(default_factory=dict)  # event -> [scripts]
+    hooks: dict[str, list[HookSpec]] = field(default_factory=dict)  # event -> [HookSpec]
 
     @classmethod
     def from_job_dir(cls, job_dir: Path) -> "JobHooks | None":
@@ -74,13 +89,23 @@ def from_job_dir(cls, job_dir: Path) -> "JobHooks | None":
         if not data or not isinstance(data, dict):
             return None
 
-        # Parse hooks - each key is an event, value is list of scripts
-        hooks: dict[str, list[str]] = {}
-        for event, scripts in data.items():
-            if isinstance(scripts, list):
-                hooks[event] = [str(s) for s in scripts]
-            elif isinstance(scripts, str):
-                hooks[event] = [scripts]
+        # Parse hooks - each key is an event, value is list of scripts or module specs
+        hooks: dict[str, list[HookSpec]] = {}
+        for event, entries in data.items():
+            if not isinstance(entries, list):
+                entries = [entries]
+
+            hook_specs: list[HookSpec] = []
+            for entry in entries:
+                if isinstance(entry, str):
+                    # Simple script filename
+                    hook_specs.append(HookSpec(script=entry))
+                elif isinstance(entry, dict) and "module" in entry:
+                    # Python module specification
+                    hook_specs.append(HookSpec(module=entry["module"]))
+
+            if hook_specs:
+                hooks[event] = hook_specs
 
         if not hooks:
             return None
@@ -134,17 +159,18 @@ def merge_hooks_for_platform(
     merged: dict[str, list[dict[str, Any]]] = {}
 
     for job_hooks in job_hooks_list:
-        for event, scripts in job_hooks.hooks.items():
+        for event, hook_specs in job_hooks.hooks.items():
             if event not in merged:
                 merged[event] = []
 
-            for script in scripts:
+            for spec in hook_specs:
                 entry = HookEntry(
-                    script=script,
                     job_name=job_hooks.job_name,
                     job_dir=job_hooks.job_dir,
+                    script=spec.script,
+                    module=spec.module,
                 )
-                script_path = entry.get_script_path(project_path)
+                command = entry.get_command(project_path)
 
                 # Create hook configuration for Claude Code format
                 hook_config = {
@@ -152,13 +178,13 @@ def merge_hooks_for_platform(
                     "hooks": [
                         {
                             "type": "command",
-                            "command": script_path,
+                            "command": command,
                         }
                     ],
                 }
 
                 # Check if this hook is already present (avoid duplicates)
-                if not _hook_already_present(merged[event], script_path):
+                if not _hook_already_present(merged[event], command):
                     merged[event].append(hook_config)
 
     return merged
diff --git a/src/deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml b/src/deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml
index f76202ab..a310d31a 100644
--- a/src/deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml
+++ b/src/deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml
@@ -1,8 +1,8 @@
 # DeepWork Rules Hooks Configuration
-# Maps Claude Code lifecycle events to hook scripts
+# Maps lifecycle events to hook scripts or Python modules
 
 UserPromptSubmit:
   - user_prompt_submit.sh
 
 Stop:
-  - rules_stop_hook.sh
+  - module: deepwork.hooks.rules_check
diff --git a/src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh b/src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh
deleted file mode 100755
index 20fa8a3f..00000000
--- a/src/deepwork/standard_jobs/deepwork_rules/hooks/rules_stop_hook.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/bin/bash
-# rules_stop_hook.sh - Evaluates rules when the agent stops
-#
-# This script is called as a Claude Code Stop hook. It:
-# 1. Evaluates rules from .deepwork/rules/
-# 2. Computes changed files based on each rule's compare_to setting
-# 3. Checks for <promise> tags in the conversation transcript
-# 4. Returns JSON to block stop if rules need attention
-
-set -e
-
-# Check if rules directory exists with .md files
-RULES_DIR=".deepwork/rules"
-
-if [ ! -d "${RULES_DIR}" ]; then
-    # No rules directory, nothing to do
-    exit 0
-fi
-
-# Check if there are any .md files
-if ! ls "${RULES_DIR}"/*.md 1>/dev/null 2>&1; then
-    # No rule files, nothing to do
-    exit 0
-fi
-
-# Read the hook input JSON from stdin
-HOOK_INPUT=""
-if [ ! -t 0 ]; then
-    HOOK_INPUT=$(cat)
-fi
-
-# Call the Python rules evaluator via the cross-platform wrapper
-# The wrapper reads JSON input and handles transcript extraction
-# Note: exit code 2 means "block" which is valid (not an error), so capture it
-result=$(echo "${HOOK_INPUT}" | DEEPWORK_HOOK_PLATFORM=claude DEEPWORK_HOOK_EVENT=Stop python -m deepwork.hooks.rules_check 2>/dev/null) || true
-
-# If no output (error case), provide empty JSON as fallback
-if [ -z "${result}" ]; then
-    result='{}'
-fi
-
-# Output the result (JSON for Claude Code hooks)
-echo "${result}"
diff --git a/tests/shell_script_tests/README.md b/tests/shell_script_tests/README.md
index 983ad4ec..76cd8f05 100644
--- a/tests/shell_script_tests/README.md
+++ b/tests/shell_script_tests/README.md
@@ -1,12 +1,12 @@
 # Shell Script Tests
 
-Automated tests for DeepWork shell scripts, with a focus on validating Claude Code hooks JSON response formats.
+Automated tests for DeepWork shell scripts and hooks, with a focus on validating Claude Code hooks JSON response formats.
 
-## Scripts Tested
+## Hooks and Scripts Tested
 
-| Script | Type | Description |
-|--------|------|-------------|
-| `rules_stop_hook.sh` | Stop Hook | Evaluates rules and blocks agent stop if rules are triggered |
+| Hook/Script | Type | Description |
+|-------------|------|-------------|
+| `deepwork.hooks.rules_check` | Stop Hook (Python) | Evaluates rules and blocks agent stop if rules are triggered |
 | `user_prompt_submit.sh` | UserPromptSubmit Hook | Captures work tree state when user submits a prompt |
 | `capture_prompt_work_tree.sh` | Helper | Records current git state for `compare_to: prompt` rules |
 | `make_new_job.sh` | Utility | Creates directory structure for new DeepWork jobs |
@@ -49,10 +49,10 @@ uv run pytest tests/shell_script_tests/ --cov=src/deepwork
 ```
 tests/shell_script_tests/
 ├── conftest.py                      # Shared fixtures and helpers
+├── test_hooks.py                    # Consolidated hook tests (JSON format, exit codes)
 ├── test_rules_stop_hook.py          # Stop hook blocking/allowing tests
 ├── test_user_prompt_submit.py       # Prompt submission hook tests
 ├── test_capture_prompt_work_tree.py # Work tree capture tests
-├── test_hooks_json_format.py        # JSON format validation tests
 └── test_make_new_job.py             # Job directory creation tests
 ```
 
diff --git a/tests/shell_script_tests/test_hooks.py b/tests/shell_script_tests/test_hooks.py
index bd59684d..4f6f8e32 100644
--- a/tests/shell_script_tests/test_hooks.py
+++ b/tests/shell_script_tests/test_hooks.py
@@ -66,6 +66,31 @@ def run_rules_hook_script(
     return run_shell_script(script_path, cwd, hook_input=hook_input)
 
 
+def run_rules_check_module(
+    cwd: Path,
+    hook_input: dict | None = None,
+    src_dir: Path | None = None,
+) -> tuple[str, str, int]:
+    """Run the rules_check Python module directly and return its output."""
+    env = os.environ.copy()
+    env["DEEPWORK_HOOK_PLATFORM"] = "claude"
+    if src_dir:
+        env["PYTHONPATH"] = str(src_dir)
+
+    stdin_data = json.dumps(hook_input) if hook_input else ""
+
+    result = subprocess.run(
+        ["python", "-m", "deepwork.hooks.rules_check"],
+        cwd=cwd,
+        capture_output=True,
+        text=True,
+        input=stdin_data,
+        env=env,
+    )
+
+    return result.stdout, result.stderr, result.returncode
+
+
 def run_platform_wrapper_script(
     script_path: Path,
     python_module: str,
@@ -256,12 +281,11 @@ def test_sets_platform_environment_variable(self, hooks_dir: Path, src_dir: Path
 
 
 class TestRulesStopHook:
-    """Tests for rules_stop_hook.sh JSON format compliance."""
+    """Tests for rules stop hook (deepwork.hooks.rules_check) JSON format compliance."""
 
-    def test_allow_response_is_empty_json(self, rules_hooks_dir: Path, git_repo: Path) -> None:
+    def test_allow_response_is_empty_json(self, src_dir: Path, git_repo: Path) -> None:
         """Test that allow response is empty JSON object."""
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_check_module(git_repo, src_dir=src_dir)
 
         response = validate_json_output(stdout)
         validate_stop_hook_response(response)
@@ -270,7 +294,7 @@ def test_allow_response_is_empty_json(self, rules_hooks_dir: Path, git_repo: Pat
             assert response == {}, f"Allow response should be empty: {response}"
 
     def test_block_response_has_required_fields(
-        self, rules_hooks_dir: Path, git_repo_with_rule: Path
+        self, src_dir: Path, git_repo_with_rule: Path
     ) -> None:
         """Test that block response has decision and reason."""
         # Create a file that triggers the rule
@@ -279,8 +303,7 @@ def test_block_response_has_required_fields(
         repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_check_module(git_repo_with_rule, src_dir=src_dir)
 
         response = validate_json_output(stdout)
         validate_stop_hook_response(response)
@@ -290,17 +313,14 @@ def test_block_response_has_required_fields(
         assert response.get("decision") == "block", "Expected block decision"
         assert "reason" in response, "Expected reason field"
 
-    def test_block_reason_contains_rule_info(
-        self, rules_hooks_dir: Path, git_repo_with_rule: Path
-    ) -> None:
+    def test_block_reason_contains_rule_info(self, src_dir: Path, git_repo_with_rule: Path) -> None:
         """Test that block reason contains rule information."""
         py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
         repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_check_module(git_repo_with_rule, src_dir=src_dir)
 
         response = validate_json_output(stdout)
 
@@ -310,17 +330,14 @@ def test_block_reason_contains_rule_info(
         # Should contain useful rule information
         assert "Rule" in reason or "rule" in reason, f"Reason should mention rule: {reason}"
 
-    def test_no_extraneous_keys_in_response(
-        self, rules_hooks_dir: Path, git_repo_with_rule: Path
-    ) -> None:
+    def test_no_extraneous_keys_in_response(self, src_dir: Path, git_repo_with_rule: Path) -> None:
         """Test that response only contains expected keys."""
         py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
         repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_check_module(git_repo_with_rule, src_dir=src_dir)
 
         response = validate_json_output(stdout)
 
@@ -332,17 +349,14 @@ def test_no_extraneous_keys_in_response(
                 f"Unexpected keys in response: {actual_keys - valid_keys}"
             )
 
-    def test_output_is_single_line_json(
-        self, rules_hooks_dir: Path, git_repo_with_rule: Path
-    ) -> None:
+    def test_output_is_single_line_json(self, src_dir: Path, git_repo_with_rule: Path) -> None:
         """Test that JSON output is single-line (no pretty printing)."""
         py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
         repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_check_module(git_repo_with_rule, src_dir=src_dir)
 
         # Remove trailing newline and check for internal newlines
         output = stdout.strip()
@@ -384,9 +398,7 @@ def test_does_not_block_prompt_submission(self, rules_hooks_dir: Path, git_repo:
 class TestHooksWithTranscript:
     """Tests for hook JSON format when using transcript input."""
 
-    def test_stop_hook_with_transcript_input(
-        self, rules_hooks_dir: Path, git_repo_with_rule: Path
-    ) -> None:
+    def test_stop_hook_with_transcript_input(self, src_dir: Path, git_repo_with_rule: Path) -> None:
         """Test stop hook JSON format when transcript is provided."""
         py_file = git_repo_with_rule / "test.py"
         py_file.write_text("# Python file\n")
@@ -407,10 +419,9 @@ def test_stop_hook_with_transcript_input(
             f.write("\n")
 
         try:
-            script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_rules_hook_script(
-                script_path, git_repo_with_rule, hook_input
+            stdout, stderr, code = run_rules_check_module(
+                git_repo_with_rule, hook_input, src_dir=src_dir
             )
 
             response = validate_json_output(stdout)
@@ -420,7 +431,7 @@ def test_stop_hook_with_transcript_input(
             os.unlink(transcript_path)
 
     def test_stop_hook_with_promise_returns_empty(
-        self, rules_hooks_dir: Path, git_repo_with_rule: Path
+        self, src_dir: Path, git_repo_with_rule: Path
     ) -> None:
         """Test that promised rules return empty JSON."""
         py_file = git_repo_with_rule / "test.py"
@@ -449,10 +460,9 @@ def test_stop_hook_with_promise_returns_empty(
             f.write("\n")
 
         try:
-            script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path}
-            stdout, stderr, code = run_rules_hook_script(
-                script_path, git_repo_with_rule, hook_input
+            stdout, stderr, code = run_rules_check_module(
+                git_repo_with_rule, hook_input, src_dir=src_dir
             )
 
             response = validate_json_output(stdout)
@@ -491,25 +501,22 @@ def test_stop_hook_with_promise_returns_empty(
 
 
 class TestHookExitCodes:
-    """Tests for hook script exit codes.
+    """Tests for hook exit codes.
 
     CRITICAL: These tests verify the documented Claude Code hook contract.
     All hooks MUST exit 0 when using JSON output format.
     """
 
-    def test_stop_hook_exits_zero_on_allow(self, rules_hooks_dir: Path, git_repo: Path) -> None:
+    def test_stop_hook_exits_zero_on_allow(self, src_dir: Path, git_repo: Path) -> None:
         """Test that stop hook exits 0 when allowing.
 
         DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
         """
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_rules_hook_script(script_path, git_repo)
+        stdout, stderr, code = run_rules_check_module(git_repo, src_dir=src_dir)
 
         assert code == 0, f"Allow should exit 0. stderr: {stderr}"
 
-    def test_stop_hook_exits_zero_on_block(
-        self, rules_hooks_dir: Path, git_repo_with_rule: Path
-    ) -> None:
+    def test_stop_hook_exits_zero_on_block(self, src_dir: Path, git_repo_with_rule: Path) -> None:
         """Test that stop hook exits 0 even when blocking.
 
         DO NOT CHANGE THIS TEST - it verifies the documented hook contract.
@@ -520,8 +527,7 @@ def test_stop_hook_exits_zero_on_block(
         repo = Repo(git_repo_with_rule)
         repo.index.add(["test.py"])
 
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_rules_hook_script(script_path, git_repo_with_rule)
+        stdout, stderr, code = run_rules_check_module(git_repo_with_rule, src_dir=src_dir)
 
         # Hooks should exit 0 and communicate via JSON
         assert code == 0, f"Block should still exit 0. stderr: {stderr}"
diff --git a/tests/shell_script_tests/test_rules_stop_hook.py b/tests/shell_script_tests/test_rules_stop_hook.py
index 605f0fcb..9aeb3306 100644
--- a/tests/shell_script_tests/test_rules_stop_hook.py
+++ b/tests/shell_script_tests/test_rules_stop_hook.py
@@ -1,4 +1,4 @@
-"""Tests for rules_stop_hook.sh shell script.
+"""Tests for the rules stop hook (deepwork.hooks.rules_check).
 
 These tests verify that the rules stop hook correctly outputs JSON
 to block or allow the stop event in Claude Code.
@@ -6,14 +6,13 @@
 
 import json
 import os
+import subprocess
 import tempfile
 from pathlib import Path
 
 import pytest
 from git import Repo
 
-from .conftest import run_shell_script
-
 
 @pytest.fixture
 def git_repo_with_src_rule(tmp_path: Path) -> Path:
@@ -50,33 +49,48 @@ def git_repo_with_src_rule(tmp_path: Path) -> Path:
 
 
 def run_stop_hook(
-    script_path: Path,
     cwd: Path,
     hook_input: dict | None = None,
+    src_dir: Path | None = None,
 ) -> tuple[str, str, int]:
-    """Run the rules_stop_hook.sh script and return its output."""
-    return run_shell_script(script_path, cwd, hook_input=hook_input)
+    """Run the rules_check module and return (stdout, stderr, returncode)."""
+    env = os.environ.copy()
+    env["DEEPWORK_HOOK_PLATFORM"] = "claude"
+    if src_dir:
+        env["PYTHONPATH"] = str(src_dir)
+
+    stdin_data = json.dumps(hook_input) if hook_input else ""
+
+    result = subprocess.run(
+        ["python", "-m", "deepwork.hooks.rules_check"],
+        cwd=cwd,
+        capture_output=True,
+        text=True,
+        input=stdin_data,
+        env=env,
+    )
+
+    return result.stdout, result.stderr, result.returncode
 
 
 class TestRulesStopHookBlocking:
-    """Tests for rules_stop_hook.sh blocking behavior."""
+    """Tests for rules stop hook blocking behavior."""
 
     def test_outputs_block_json_when_rule_fires(
-        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
+        self, src_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
         """Test that the hook outputs blocking JSON when a rule fires."""
         # Create a file that triggers the rule
-        src_dir = git_repo_with_src_rule / "src"
-        src_dir.mkdir(exist_ok=True)
-        (src_dir / "main.py").write_text("# New file\n")
+        test_src_dir = git_repo_with_src_rule / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# New file\n")
 
         # Stage the change
         repo = Repo(git_repo_with_src_rule)
         repo.index.add(["src/main.py"])
 
         # Run the stop hook
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
+        stdout, stderr, code = run_stop_hook(git_repo_with_src_rule, src_dir=src_dir)
 
         # Parse the output as JSON
         output = stdout.strip()
@@ -94,15 +108,14 @@ def test_outputs_block_json_when_rule_fires(
         assert "Test Rule" in result["reason"], f"Rule name not in reason: {result}"
 
     def test_outputs_empty_json_when_no_rule_fires(
-        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
+        self, src_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
         """Test that the hook outputs empty JSON when no rule fires."""
         # Don't create any files that would trigger the rule
         # (rule triggers on src/** but we haven't created anything in src/)
 
         # Run the stop hook
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
+        stdout, stderr, code = run_stop_hook(git_repo_with_src_rule, src_dir=src_dir)
 
         # Parse the output as JSON
         output = stdout.strip()
@@ -116,10 +129,9 @@ def test_outputs_empty_json_when_no_rule_fires(
         # Should be empty JSON (no blocking)
         assert result == {}, f"Expected empty JSON when no rules fire, got: {result}"
 
-    def test_exits_early_when_no_rules_dir(self, rules_hooks_dir: Path, git_repo: Path) -> None:
+    def test_exits_early_when_no_rules_dir(self, src_dir: Path, git_repo: Path) -> None:
         """Test that the hook exits cleanly when no rules directory exists."""
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo)
+        stdout, stderr, code = run_stop_hook(git_repo, src_dir=src_dir)
 
         # Should exit with code 0 and produce no output (or empty)
         assert code == 0, f"Expected exit code 0, got {code}. stderr: {stderr}"
@@ -134,14 +146,12 @@ def test_exits_early_when_no_rules_dir(self, rules_hooks_dir: Path, git_repo: Pa
                 # Empty or no output is acceptable
                 pass
 
-    def test_respects_promise_tags(
-        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
-    ) -> None:
+    def test_respects_promise_tags(self, src_dir: Path, git_repo_with_src_rule: Path) -> None:
         """Test that promised rules are not re-triggered."""
         # Create a file that triggers the rule
-        src_dir = git_repo_with_src_rule / "src"
-        src_dir.mkdir(exist_ok=True)
-        (src_dir / "main.py").write_text("# New file\n")
+        test_src_dir = git_repo_with_src_rule / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# New file\n")
 
         # Stage the change
         repo = Repo(git_repo_with_src_rule)
@@ -170,9 +180,10 @@ def test_respects_promise_tags(
 
         try:
             # Run the stop hook with transcript path
-            script_path = rules_hooks_dir / "rules_stop_hook.sh"
             hook_input = {"transcript_path": transcript_path, "hook_event_name": "Stop"}
-            stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule, hook_input)
+            stdout, stderr, code = run_stop_hook(
+                git_repo_with_src_rule, hook_input, src_dir=src_dir
+            )
 
             # Parse the output
             output = stdout.strip()
@@ -185,7 +196,7 @@ def test_respects_promise_tags(
         finally:
             os.unlink(transcript_path)
 
-    def test_safety_pattern_prevents_firing(self, rules_hooks_dir: Path, tmp_path: Path) -> None:
+    def test_safety_pattern_prevents_firing(self, src_dir: Path, tmp_path: Path) -> None:
         """Test that safety patterns prevent rules from firing."""
         # Initialize git repo
         repo = Repo.init(tmp_path)
@@ -216,9 +227,9 @@ def test_safety_pattern_prevents_firing(self, rules_hooks_dir: Path, tmp_path: P
         (deepwork_dir / ".last_work_tree").write_text("")
 
         # Create both trigger and safety files
-        src_dir = tmp_path / "src"
-        src_dir.mkdir(exist_ok=True)
-        (src_dir / "main.py").write_text("# Source file\n")
+        test_src_dir = tmp_path / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# Source file\n")
 
         docs_dir = tmp_path / "docs"
         docs_dir.mkdir(exist_ok=True)
@@ -228,8 +239,7 @@ def test_safety_pattern_prevents_firing(self, rules_hooks_dir: Path, tmp_path: P
         repo.index.add(["src/main.py", "docs/api.md"])
 
         # Run the stop hook
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, tmp_path)
+        stdout, stderr, code = run_stop_hook(tmp_path, src_dir=src_dir)
 
         # Parse the output
         output = stdout.strip()
@@ -242,22 +252,19 @@ def test_safety_pattern_prevents_firing(self, rules_hooks_dir: Path, tmp_path: P
 
 
 class TestRulesStopHookJsonFormat:
-    """Tests for the JSON output format of rules_stop_hook.sh."""
+    """Tests for the JSON output format of the rules stop hook."""
 
-    def test_json_has_correct_structure(
-        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
-    ) -> None:
+    def test_json_has_correct_structure(self, src_dir: Path, git_repo_with_src_rule: Path) -> None:
         """Test that blocking JSON has the correct Claude Code structure."""
         # Create a file that triggers the rule
-        src_dir = git_repo_with_src_rule / "src"
-        src_dir.mkdir(exist_ok=True)
-        (src_dir / "main.py").write_text("# New file\n")
+        test_src_dir = git_repo_with_src_rule / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# New file\n")
 
         repo = Repo(git_repo_with_src_rule)
         repo.index.add(["src/main.py"])
 
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
+        stdout, stderr, code = run_stop_hook(git_repo_with_src_rule, src_dir=src_dir)
 
         result = json.loads(stdout.strip())
 
@@ -271,18 +278,17 @@ def test_json_has_correct_structure(
         assert len(result["reason"]) > 0
 
     def test_reason_contains_rule_instructions(
-        self, rules_hooks_dir: Path, git_repo_with_src_rule: Path
+        self, src_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
         """Test that the reason includes the rule instructions."""
-        src_dir = git_repo_with_src_rule / "src"
-        src_dir.mkdir(exist_ok=True)
-        (src_dir / "main.py").write_text("# New file\n")
+        test_src_dir = git_repo_with_src_rule / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# New file\n")
 
         repo = Repo(git_repo_with_src_rule)
         repo.index.add(["src/main.py"])
 
-        script_path = rules_hooks_dir / "rules_stop_hook.sh"
-        stdout, stderr, code = run_stop_hook(script_path, git_repo_with_src_rule)
+        stdout, stderr, code = run_stop_hook(git_repo_with_src_rule, src_dir=src_dir)
 
         result = json.loads(stdout.strip())
 
diff --git a/tests/unit/test_hooks_syncer.py b/tests/unit/test_hooks_syncer.py
index 190fee1b..abaca222 100644
--- a/tests/unit/test_hooks_syncer.py
+++ b/tests/unit/test_hooks_syncer.py
@@ -6,6 +6,7 @@
 from deepwork.core.adapters import ClaudeAdapter
 from deepwork.core.hooks_syncer import (
     HookEntry,
+    HookSpec,
     JobHooks,
     collect_job_hooks,
     merge_hooks_for_platform,
@@ -16,19 +17,33 @@
 class TestHookEntry:
     """Tests for HookEntry dataclass."""
 
-    def test_get_script_path_relative(self, temp_dir: Path) -> None:
-        """Test getting relative script path."""
+    def test_get_command_for_script(self, temp_dir: Path) -> None:
+        """Test getting command for a script hook."""
         job_dir = temp_dir / ".deepwork" / "jobs" / "test_job"
         job_dir.mkdir(parents=True)
 
         entry = HookEntry(
+            job_name="test_job",
+            job_dir=job_dir,
             script="test_hook.sh",
+        )
+
+        cmd = entry.get_command(temp_dir)
+        assert cmd == ".deepwork/jobs/test_job/hooks/test_hook.sh"
+
+    def test_get_command_for_module(self, temp_dir: Path) -> None:
+        """Test getting command for a module hook."""
+        job_dir = temp_dir / ".deepwork" / "jobs" / "test_job"
+        job_dir.mkdir(parents=True)
+
+        entry = HookEntry(
             job_name="test_job",
             job_dir=job_dir,
+            module="deepwork.hooks.rules_check",
         )
 
-        path = entry.get_script_path(temp_dir)
-        assert path == ".deepwork/jobs/test_job/hooks/test_hook.sh"
+        cmd = entry.get_command(temp_dir)
+        assert cmd == "python -m deepwork.hooks.rules_check"
 
 
 class TestJobHooks:
@@ -56,8 +71,35 @@ def test_from_job_dir_with_hooks(self, temp_dir: Path) -> None:
 
         assert result is not None
         assert result.job_name == "test_job"
-        assert result.hooks["UserPromptSubmit"] == ["capture.sh"]
-        assert result.hooks["Stop"] == ["rules_check.sh", "cleanup.sh"]
+        assert len(result.hooks["UserPromptSubmit"]) == 1
+        assert result.hooks["UserPromptSubmit"][0].script == "capture.sh"
+        assert len(result.hooks["Stop"]) == 2
+        assert result.hooks["Stop"][0].script == "rules_check.sh"
+        assert result.hooks["Stop"][1].script == "cleanup.sh"
+
+    def test_from_job_dir_with_module_hooks(self, temp_dir: Path) -> None:
+        """Test loading module-based hooks from job directory."""
+        job_dir = temp_dir / "test_job"
+        hooks_dir = job_dir / "hooks"
+        hooks_dir.mkdir(parents=True)
+
+        # Create global_hooks.yml with module format
+        hooks_file = hooks_dir / "global_hooks.yml"
+        hooks_file.write_text(
+            """
+UserPromptSubmit:
+  - capture.sh
+Stop:
+  - module: deepwork.hooks.rules_check
+"""
+        )
+
+        result = JobHooks.from_job_dir(job_dir)
+
+        assert result is not None
+        assert result.hooks["UserPromptSubmit"][0].script == "capture.sh"
+        assert result.hooks["Stop"][0].module == "deepwork.hooks.rules_check"
+        assert result.hooks["Stop"][0].script is None
 
     def test_from_job_dir_no_hooks_file(self, temp_dir: Path) -> None:
         """Test returns None when no hooks file exists."""
@@ -91,7 +133,8 @@ def test_from_job_dir_single_script_as_string(self, temp_dir: Path) -> None:
         result = JobHooks.from_job_dir(job_dir)
 
         assert result is not None
-        assert result.hooks["Stop"] == ["cleanup.sh"]
+        assert len(result.hooks["Stop"]) == 1
+        assert result.hooks["Stop"][0].script == "cleanup.sh"
 
 
 class TestCollectJobHooks:
@@ -143,12 +186,15 @@ def test_merges_hooks_from_multiple_jobs(self, temp_dir: Path) -> None:
             JobHooks(
                 job_name="job1",
                 job_dir=job1_dir,
-                hooks={"Stop": ["hook1.sh"]},
+                hooks={"Stop": [HookSpec(script="hook1.sh")]},
             ),
             JobHooks(
                 job_name="job2",
                 job_dir=job2_dir,
-                hooks={"Stop": ["hook2.sh"], "UserPromptSubmit": ["capture.sh"]},
+                hooks={
+                    "Stop": [HookSpec(script="hook2.sh")],
+                    "UserPromptSubmit": [HookSpec(script="capture.sh")],
+                },
             ),
         ]
 
@@ -169,7 +215,7 @@ def test_avoids_duplicate_hooks(self, temp_dir: Path) -> None:
             JobHooks(
                 job_name="job1",
                 job_dir=job_dir,
-                hooks={"Stop": ["hook.sh", "hook.sh"]},
+                hooks={"Stop": [HookSpec(script="hook.sh"), HookSpec(script="hook.sh")]},
             ),
         ]
 
@@ -197,7 +243,7 @@ def test_syncs_hooks_via_adapter(self, temp_dir: Path) -> None:
             JobHooks(
                 job_name="test_job",
                 job_dir=job_dir,
-                hooks={"Stop": ["test_hook.sh"]},
+                hooks={"Stop": [HookSpec(script="test_hook.sh")]},
             ),
         ]
 
@@ -250,7 +296,7 @@ def test_merges_with_existing_settings(self, temp_dir: Path) -> None:
             JobHooks(
                 job_name="test_job",
                 job_dir=job_dir,
-                hooks={"Stop": ["new_hook.sh"]},
+                hooks={"Stop": [HookSpec(script="new_hook.sh")]},
             ),
         ]
 

From 66f2032f8236eb264c0311934eff2a65cfd11e5b Mon Sep 17 00:00:00 2001
From: Noah Horton <noah@unsupervised.com>
Date: Sat, 17 Jan 2026 16:08:24 -0700
Subject: [PATCH 21/21] Create manual test files for rule styles (#61)

* Add manual test files for testing hook/rule functionality

Creates manual_tests/claude/ directory with test files that exercise
different rule styles:
- Trigger/Safety mode (basic conditional)
- Set mode (bidirectional correspondence)
- Pair mode (directional correspondence)
- Command action (automatic command execution)
- Multi-safety (multiple safety patterns)

Each test file includes documentation explaining what it tests,
how to trigger it, and expected behavior. Corresponding rule
definitions added to .deepwork/rules/.

* Move manual test files from manual_tests/claude/ to manual_tests/

Flatten directory structure as requested. Updated all rule definitions
to reference the new paths.

* Reorganize manual tests into subfolders per test type

Group related files together:
- test_trigger_safety_mode/
- test_set_mode/
- test_pair_mode/
- test_command_action/
- test_multi_safety/

Updated rule definitions and README to match new structure.

* Add compare_to: prompt to manual test rules

This ensures rules evaluate against changes since the last prompt
rather than against the merge-base, allowing them to fire during
the current conversation when files are edited.

* Add sub-agent testing instructions to manual tests README

Explains that the best way to run these tests is as sub-agents
using a fast model (haiku), with example prompts and verification
commands.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update manual test files with both-case test instructions

- Updated README with test matrix showing expected results
- Added TEST CASE sections to each test file documenting both
  "should fire" and "should NOT fire" scenarios
- Added test results tracking table to README

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
---
 .deepwork/rules/manual-test-command-action.md | 19 ++++++
 .deepwork/rules/manual-test-multi-safety.md   | 25 +++++++
 .deepwork/rules/manual-test-pair-mode.md      | 26 ++++++++
 .deepwork/rules/manual-test-set-mode.md       | 26 ++++++++
 .deepwork/rules/manual-test-trigger-safety.md | 21 ++++++
 manual_tests/README.md                        | 66 +++++++++++++++++++
 .../test_command_action.txt                   | 25 +++++++
 .../test_command_action_log.txt               |  3 +
 .../test_multi_safety/test_multi_safety.py    | 43 ++++++++++++
 .../test_multi_safety_changelog.md            | 16 +++++
 .../test_multi_safety_version.txt             | 10 +++
 .../test_pair_mode/test_pair_mode_expected.md | 31 +++++++++
 .../test_pair_mode/test_pair_mode_trigger.py  | 47 +++++++++++++
 .../test_set_mode/test_set_mode_source.py     | 40 +++++++++++
 .../test_set_mode/test_set_mode_test.py       | 37 +++++++++++
 .../test_trigger_safety_mode.py               | 32 +++++++++
 .../test_trigger_safety_mode_doc.md           | 20 ++++++
 17 files changed, 487 insertions(+)
 create mode 100644 .deepwork/rules/manual-test-command-action.md
 create mode 100644 .deepwork/rules/manual-test-multi-safety.md
 create mode 100644 .deepwork/rules/manual-test-pair-mode.md
 create mode 100644 .deepwork/rules/manual-test-set-mode.md
 create mode 100644 .deepwork/rules/manual-test-trigger-safety.md
 create mode 100644 manual_tests/README.md
 create mode 100644 manual_tests/test_command_action/test_command_action.txt
 create mode 100644 manual_tests/test_command_action/test_command_action_log.txt
 create mode 100644 manual_tests/test_multi_safety/test_multi_safety.py
 create mode 100644 manual_tests/test_multi_safety/test_multi_safety_changelog.md
 create mode 100644 manual_tests/test_multi_safety/test_multi_safety_version.txt
 create mode 100644 manual_tests/test_pair_mode/test_pair_mode_expected.md
 create mode 100644 manual_tests/test_pair_mode/test_pair_mode_trigger.py
 create mode 100644 manual_tests/test_set_mode/test_set_mode_source.py
 create mode 100644 manual_tests/test_set_mode/test_set_mode_test.py
 create mode 100644 manual_tests/test_trigger_safety_mode/test_trigger_safety_mode.py
 create mode 100644 manual_tests/test_trigger_safety_mode/test_trigger_safety_mode_doc.md
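
Reviewer note: the rule styles exercised by these manual tests reduce to three detection predicates, as described in the rule files of this patch. The following is a minimal sketch of those predicates, assuming exact path matching instead of whatever pattern matching the real rules use. The function names are hypothetical and are not part of deepwork's API; promise tags and `compare_to` handling are deliberately left out.

```python
from __future__ import annotations


def trigger_safety_fires(changed: set[str], trigger: str, safety: list[str]) -> bool:
    """Fire when the trigger changed and none of the safety files did."""
    return trigger in changed and not any(path in changed for path in safety)


def set_fires(changed: set[str], members: list[str]) -> bool:
    """Fire when some, but not all, members of the set changed."""
    touched = [path for path in members if path in changed]
    return 0 < len(touched) < len(members)


def pair_fires(changed: set[str], trigger: str, expects: str) -> bool:
    """Fire when the trigger changed without its expected counterpart."""
    return trigger in changed and expects not in changed


# Illustrative paths only; they do not refer to files in this patch.
changed = {"src/api.py"}
print(trigger_safety_fires(changed, "src/api.py", ["CHANGELOG.md", "VERSION"]))  # True
print(set_fires(changed, ["src/api.py", "tests/test_api.py"]))  # True
print(pair_fires({"docs/api.md"}, "src/api.py", "docs/api.md"))  # False
```

The last call shows the one-way nature of pair mode: editing only the expected file does not fire the rule, whereas in set mode editing either member alone does.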

diff --git a/.deepwork/rules/manual-test-command-action.md b/.deepwork/rules/manual-test-command-action.md
new file mode 100644
index 00000000..966ab2de
--- /dev/null
+++ b/.deepwork/rules/manual-test-command-action.md
@@ -0,0 +1,19 @@
+---
+name: "Manual Test: Command Action"
+trigger: manual_tests/test_command_action/test_command_action.txt
+action:
+  command: echo "$(date '+%Y-%m-%d %H:%M:%S') - Command triggered by edit to {file}" >> manual_tests/test_command_action/test_command_action_log.txt
+  run_for: each_match
+compare_to: prompt
+---
+
+# Manual Test: Command Action
+
+This rule automatically appends a timestamped log entry when the
+test file is edited. No agent prompt is shown - the command runs
+automatically.
+
+## This tests:
+
+The command action feature, where a rule executes a shell command instead of
+prompting the agent. Note that this test command appends a line on every run, so it is not strictly idempotent.
diff --git a/.deepwork/rules/manual-test-multi-safety.md b/.deepwork/rules/manual-test-multi-safety.md
new file mode 100644
index 00000000..4ce978cb
--- /dev/null
+++ b/.deepwork/rules/manual-test-multi-safety.md
@@ -0,0 +1,25 @@
+---
+name: "Manual Test: Multi Safety"
+trigger: manual_tests/test_multi_safety/test_multi_safety.py
+safety:
+  - manual_tests/test_multi_safety/test_multi_safety_changelog.md
+  - manual_tests/test_multi_safety/test_multi_safety_version.txt
+compare_to: prompt
+---
+
+# Manual Test: Multiple Safety Patterns
+
+You changed the source file without updating version info!
+
+**Changed:** `{trigger_files}`
+
+## What to do:
+
+1. Update the changelog: `manual_tests/test_multi_safety/test_multi_safety_changelog.md`
+2. And/or update the version: `manual_tests/test_multi_safety/test_multi_safety_version.txt`
+3. Or acknowledge with `<promise>Manual Test: Multi Safety</promise>`
+
+## This tests:
+
+Trigger/safety mode with MULTIPLE safety patterns. The rule is
+suppressed if ANY of the safety files are also edited.
diff --git a/.deepwork/rules/manual-test-pair-mode.md b/.deepwork/rules/manual-test-pair-mode.md
new file mode 100644
index 00000000..9c2379bf
--- /dev/null
+++ b/.deepwork/rules/manual-test-pair-mode.md
@@ -0,0 +1,26 @@
+---
+name: "Manual Test: Pair Mode"
+pair:
+  trigger: manual_tests/test_pair_mode/test_pair_mode_trigger.py
+  expects: manual_tests/test_pair_mode/test_pair_mode_expected.md
+compare_to: prompt
+---
+
+# Manual Test: Pair Mode (Directional Correspondence)
+
+API code changed without documentation update!
+
+**Changed:** `{trigger_files}`
+**Expected:** `{expected_files}`
+
+## What to do:
+
+1. Update the API documentation in `test_pair_mode_expected.md`
+2. Or acknowledge with `<promise>Manual Test: Pair Mode</promise>`
+
+## This tests:
+
+The "pair" detection mode where there's a ONE-WAY relationship.
+When the trigger file changes, the expected file must also change.
+BUT the expected file can change independently (docs can be updated
+without requiring code changes).
diff --git a/.deepwork/rules/manual-test-set-mode.md b/.deepwork/rules/manual-test-set-mode.md
new file mode 100644
index 00000000..abe504ec
--- /dev/null
+++ b/.deepwork/rules/manual-test-set-mode.md
@@ -0,0 +1,26 @@
+---
+name: "Manual Test: Set Mode"
+set:
+  - manual_tests/test_set_mode/test_set_mode_source.py
+  - manual_tests/test_set_mode/test_set_mode_test.py
+compare_to: prompt
+---
+
+# Manual Test: Set Mode (Bidirectional Correspondence)
+
+Source and test files must change together!
+
+**Changed:** `{trigger_files}`
+**Missing:** `{expected_files}`
+
+## What to do:
+
+1. If you changed the source file, update the corresponding test file
+2. If you changed the test file, ensure the source file reflects those changes
+3. Or acknowledge with `<promise>Manual Test: Set Mode</promise>`
+
+## This tests:
+
+The "set" detection mode where files in a set must ALL change together.
+This is bidirectional - the rule fires regardless of which file in the set
+was edited first.
diff --git a/.deepwork/rules/manual-test-trigger-safety.md b/.deepwork/rules/manual-test-trigger-safety.md
new file mode 100644
index 00000000..b144a2a0
--- /dev/null
+++ b/.deepwork/rules/manual-test-trigger-safety.md
@@ -0,0 +1,21 @@
+---
+name: "Manual Test: Trigger Safety"
+trigger: manual_tests/test_trigger_safety_mode/test_trigger_safety_mode.py
+safety: manual_tests/test_trigger_safety_mode/test_trigger_safety_mode_doc.md
+compare_to: prompt
+---
+
+# Manual Test: Trigger/Safety Mode
+
+You edited `{trigger_files}` without updating the documentation.
+
+## What to do:
+
+1. Review the changes in the source file
+2. Update `manual_tests/test_trigger_safety_mode/test_trigger_safety_mode_doc.md` to reflect changes
+3. Or acknowledge this is intentional with `<promise>Manual Test: Trigger Safety</promise>`
+
+## This tests:
+
+The basic trigger/safety detection mode where editing the trigger file
+causes the rule to fire UNLESS the safety file is also edited.
diff --git a/manual_tests/README.md b/manual_tests/README.md
new file mode 100644
index 00000000..7baaddb4
--- /dev/null
+++ b/manual_tests/README.md
@@ -0,0 +1,66 @@
+# Manual Hook/Rule Tests for Claude
+
+This directory contains files designed to manually test different types of deepwork rules/hooks.
+Each test must verify BOTH that the rule fires when it should AND that it does not fire when it shouldn't.
+
+## How to Run These Tests
+
+**The best way to run these tests is as sub-agents using a fast model (e.g., haiku).**
+
+This approach works because:
+1. Sub-agents run in isolated contexts where changes can be detected
+2. The Stop hook evaluates rules when the sub-agent completes
+3. Using a fast model keeps test iterations quick and cheap
+
+After each sub-agent returns, run the hook to verify:
+```bash
+echo '{}' | python -m deepwork.hooks.rules_check
+```
+
+Then revert changes before the next test:
+```bash
+git checkout -- manual_tests/
+```
+
+## Test Matrix
+
+Each test has two cases: one where the rule SHOULD fire, and one where it should NOT.
+
+| Test | Should Fire | Should NOT Fire | Rule Name |
+|------|-------------|-----------------|-----------|
+| **Trigger/Safety** | Edit `.py` only | Edit `.py` AND `_doc.md` | Manual Test: Trigger Safety |
+| **Set Mode** | Edit `_source.py` or `_test.py` alone | Edit `_source.py` AND `_test.py` | Manual Test: Set Mode |
+| **Pair Mode** | Edit `_trigger.py` only | Edit `_trigger.py` AND `_expected.md` | Manual Test: Pair Mode |
+| **Pair Mode (reverse)** | — | Edit `_expected.md` only | Manual Test: Pair Mode |
+| **Command Action** | Edit `.txt` → log appended | — (always runs) | Manual Test: Command Action |
+| **Multi Safety** | Edit `.py` only | Edit `.py` AND any safety file | Manual Test: Multi Safety |
+
+## Test Results Tracking
+
+| Test Case | Fires When Should | Does NOT Fire When Shouldn't |
+|-----------|:-----------------:|:----------------------------:|
+| Trigger/Safety | ☐ | ☐ |
+| Set Mode | ☐ | ☐ |
+| Pair Mode (forward) | ☐ | ☐ |
+| Pair Mode (reverse - expected only) | — | ☐ |
+| Command Action | ☐ | — |
+| Multi Safety | ☐ | ☐ |
+
+## Test Folders
+
+| Folder | Rule Type | Description |
+|--------|-----------|-------------|
+| `test_trigger_safety_mode/` | Trigger/Safety | Basic conditional: fires unless safety file also edited |
+| `test_set_mode/` | Set (Bidirectional) | Files must change together (either direction) |
+| `test_pair_mode/` | Pair (Directional) | One-way: trigger requires expected, but not vice versa |
+| `test_command_action/` | Command Action | Automatically runs command on file change |
+| `test_multi_safety/` | Multiple Safety | Fires unless ANY of the safety files also edited |
+
+## Corresponding Rules
+
+Rules are defined in `.deepwork/rules/`:
+- `manual-test-trigger-safety.md`
+- `manual-test-set-mode.md`
+- `manual-test-pair-mode.md`
+- `manual-test-command-action.md`
+- `manual-test-multi-safety.md`
diff --git a/manual_tests/test_command_action/test_command_action.txt b/manual_tests/test_command_action/test_command_action.txt
new file mode 100644
index 00000000..f32315ab
--- /dev/null
+++ b/manual_tests/test_command_action/test_command_action.txt
@@ -0,0 +1,25 @@
+MANUAL TEST: Command Action Rule
+
+=== WHAT THIS TESTS ===
+Tests the "command action" feature where a rule automatically
+runs a shell command instead of prompting the agent.
+
+=== HOW TO TRIGGER ===
+Edit this file (add text, modify content, etc.)
+
+=== EXPECTED BEHAVIOR ===
+When this file is edited, the rule automatically runs a command
+that appends a timestamped line to test_command_action_log.txt
+
+Each run appends one new log entry, so the log grows by one line
+per trigger (the command is repeatable, not strictly idempotent).
+
+=== RULE LOCATION ===
+.deepwork/rules/manual-test-command-action.md
+
+=== LOG FILE ===
+Check test_command_action_log.txt for command execution results.
+
+---
+Edit below this line to trigger the command:
+---
diff --git a/manual_tests/test_command_action/test_command_action_log.txt b/manual_tests/test_command_action/test_command_action_log.txt
new file mode 100644
index 00000000..1ca155ed
--- /dev/null
+++ b/manual_tests/test_command_action/test_command_action_log.txt
@@ -0,0 +1,3 @@
+# Command Action Log
+# Lines below are added automatically when test_command_action.txt is edited
+# ---
diff --git a/manual_tests/test_multi_safety/test_multi_safety.py b/manual_tests/test_multi_safety/test_multi_safety.py
new file mode 100644
index 00000000..40cd981c
--- /dev/null
+++ b/manual_tests/test_multi_safety/test_multi_safety.py
@@ -0,0 +1,43 @@
+"""
+MANUAL TEST: Multiple Safety Patterns
+
+=== WHAT THIS TESTS ===
+Tests trigger/safety mode with MULTIPLE safety patterns:
+- Rule fires when this file is edited alone
+- Rule is suppressed if ANY of the safety files are also edited:
+  - test_multi_safety_changelog.md
+  - test_multi_safety_version.txt
+
+=== TEST CASE 1: Rule SHOULD fire ===
+1. Edit this file (add a comment below the marker)
+2. Do NOT edit any safety files
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Multi Safety" appears in output
+
+=== TEST CASE 2: Rule should NOT fire (changelog edited) ===
+1. Edit this file (add a comment below the marker)
+2. ALSO edit test_multi_safety_changelog.md
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Multi Safety" does NOT appear
+
+=== TEST CASE 3: Rule should NOT fire (version edited) ===
+1. Edit this file (add a comment below the marker)
+2. ALSO edit test_multi_safety_version.txt
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Multi Safety" does NOT appear
+
+=== RULE LOCATION ===
+.deepwork/rules/manual-test-multi-safety.md
+"""
+
+
+VERSION = "1.0.0"
+
+
+def get_version():
+    """Return the current version."""
+    return VERSION
+
+
+# Edit below this line to trigger the rule
+# -------------------------------------------
diff --git a/manual_tests/test_multi_safety/test_multi_safety_changelog.md b/manual_tests/test_multi_safety/test_multi_safety_changelog.md
new file mode 100644
index 00000000..d0a6e4f9
--- /dev/null
+++ b/manual_tests/test_multi_safety/test_multi_safety_changelog.md
@@ -0,0 +1,16 @@
+# Changelog (Multi-Safety Test)
+
+## What This File Does
+
+This is one of the "safety" files for the multi-safety test.
+Editing this file alongside test_multi_safety.py suppresses the rule.
+
+## Changelog
+
+### v1.0.0
+- Initial release
+
+---
+
+Edit below this line to suppress the multi-safety rule:
+<!-- Changes here -->
diff --git a/manual_tests/test_multi_safety/test_multi_safety_version.txt b/manual_tests/test_multi_safety/test_multi_safety_version.txt
new file mode 100644
index 00000000..b9cf607d
--- /dev/null
+++ b/manual_tests/test_multi_safety/test_multi_safety_version.txt
@@ -0,0 +1,10 @@
+Multi-Safety Version File
+
+This is one of the "safety" files for the multi-safety test.
+Editing this file alongside test_multi_safety.py suppresses the rule.
+
+Current Version: 1.0.0
+
+---
+Edit below this line to suppress the multi-safety rule:
+---
diff --git a/manual_tests/test_pair_mode/test_pair_mode_expected.md b/manual_tests/test_pair_mode/test_pair_mode_expected.md
new file mode 100644
index 00000000..b4f286bd
--- /dev/null
+++ b/manual_tests/test_pair_mode/test_pair_mode_expected.md
@@ -0,0 +1,31 @@
+# API Documentation (Pair Mode Expected File)
+
+## What This File Does
+
+This is the "expected" file in a pair mode rule.
+
+## Pair Mode Behavior
+
+- When `test_pair_mode_trigger.py` changes, this file MUST also change
+- When THIS file changes alone, NO rule fires (docs can update independently)
+
+## API Reference
+
+### `api_endpoint()`
+
+Returns a status response.
+
+**Returns:** `{"status": "ok", "message": "API response"}`
+
+---
+
+## Testing Instructions
+
+1. To TRIGGER the rule: Edit only `test_pair_mode_trigger.py`
+2. To verify ONE-WAY: Edit only this file (rule should NOT fire)
+3. To SATISFY the rule: Edit both files together
+
+---
+
+Edit below this line (editing here alone should NOT trigger the rule):
+<!-- Changes here -->
diff --git a/manual_tests/test_pair_mode/test_pair_mode_trigger.py b/manual_tests/test_pair_mode/test_pair_mode_trigger.py
new file mode 100644
index 00000000..369dd18a
--- /dev/null
+++ b/manual_tests/test_pair_mode/test_pair_mode_trigger.py
@@ -0,0 +1,47 @@
+"""
+MANUAL TEST: Pair Mode (Directional Correspondence)
+
+=== WHAT THIS TESTS ===
+Tests the "pair" detection mode where there's a ONE-WAY relationship:
+- This file is the TRIGGER
+- test_pair_mode_expected.md is the EXPECTED file
+- When THIS file changes, the expected file MUST also change
+- But the expected file CAN change independently (no rule fires)
+
+=== TEST CASE 1: Rule SHOULD fire ===
+1. Edit this file (add a comment below the marker)
+2. Do NOT edit test_pair_mode_expected.md
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Pair Mode" appears in output
+
+=== TEST CASE 2: Rule should NOT fire (both edited) ===
+1. Edit this file (add a comment below the marker)
+2. ALSO edit test_pair_mode_expected.md
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Pair Mode" does NOT appear
+
+=== TEST CASE 3: Rule should NOT fire (expected only) ===
+1. Do NOT edit this file
+2. Edit ONLY test_pair_mode_expected.md
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Pair Mode" does NOT appear
+   (This verifies the ONE-WAY nature of pair mode)
+
+=== RULE LOCATION ===
+.deepwork/rules/manual-test-pair-mode.md
+"""
+
+
+def api_endpoint():
+    """
+    An API endpoint that requires documentation.
+
+    This simulates an API file where changes require
+    documentation updates, but docs can be updated
+    independently (for typos, clarifications, etc.)
+    """
+    return {"status": "ok", "message": "API response"}
+
+
+# Edit below this line to trigger the rule
+# -------------------------------------------
diff --git a/manual_tests/test_set_mode/test_set_mode_source.py b/manual_tests/test_set_mode/test_set_mode_source.py
new file mode 100644
index 00000000..6649e424
--- /dev/null
+++ b/manual_tests/test_set_mode/test_set_mode_source.py
@@ -0,0 +1,40 @@
+"""
+MANUAL TEST: Set Mode (Bidirectional Correspondence)
+
+=== WHAT THIS TESTS ===
+Tests the "set" detection mode where files must change together:
+- This source file and test_set_mode_test.py are in a "set"
+- If EITHER file changes, the OTHER must also change
+- This is BIDIRECTIONAL (works in both directions)
+
+=== TEST CASE 1: Rule SHOULD fire ===
+1. Edit this file (add a comment below the marker)
+2. Do NOT edit test_set_mode_test.py
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Set Mode" appears in output
+
+=== TEST CASE 2: Rule should NOT fire ===
+1. Edit this file (add a comment below the marker)
+2. ALSO edit test_set_mode_test.py
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Set Mode" does NOT appear
+
+=== RULE LOCATION ===
+.deepwork/rules/manual-test-set-mode.md
+"""
+
+
+class Calculator:
+    """A simple calculator for testing set mode."""
+
+    def add(self, a: int, b: int) -> int:
+        """Add two numbers."""
+        return a + b
+
+    def subtract(self, a: int, b: int) -> int:
+        """Subtract b from a."""
+        return a - b
+
+
+# Edit below this line to trigger the rule
+# -------------------------------------------
diff --git a/manual_tests/test_set_mode/test_set_mode_test.py b/manual_tests/test_set_mode/test_set_mode_test.py
new file mode 100644
index 00000000..3ef349e4
--- /dev/null
+++ b/manual_tests/test_set_mode/test_set_mode_test.py
@@ -0,0 +1,37 @@
+"""
+MANUAL TEST: Set Mode - Test File (Bidirectional Correspondence)
+
+=== WHAT THIS TESTS ===
+This is the TEST file for the set mode test.
+It must change together with test_set_mode_source.py.
+
+=== HOW TO TRIGGER ===
+Option A: Edit this file alone (without test_set_mode_source.py)
+Option B: Edit test_set_mode_source.py alone (without this file)
+
+=== EXPECTED BEHAVIOR ===
+- Edit this file alone -> Rule fires, expects source file to also change
+- Edit source file alone -> Rule fires, expects this file to also change
+- Edit BOTH files -> Rule is satisfied (does not fire)
+
+=== RULE LOCATION ===
+.deepwork/rules/manual-test-set-mode.md
+"""
+
+from test_set_mode_source import Calculator
+
+
+def test_add():
+    """Test the add method."""
+    calc = Calculator()
+    assert calc.add(2, 3) == 5
+
+
+def test_subtract():
+    """Test the subtract method."""
+    calc = Calculator()
+    assert calc.subtract(5, 3) == 2
+
+
+# Edit below this line to trigger the rule
+# -------------------------------------------
diff --git a/manual_tests/test_trigger_safety_mode/test_trigger_safety_mode.py b/manual_tests/test_trigger_safety_mode/test_trigger_safety_mode.py
new file mode 100644
index 00000000..68bf59b0
--- /dev/null
+++ b/manual_tests/test_trigger_safety_mode/test_trigger_safety_mode.py
@@ -0,0 +1,32 @@
+"""
+MANUAL TEST: Trigger/Safety Mode Rule
+
+=== WHAT THIS TESTS ===
+Tests the basic trigger/safety detection mode where:
+- Rule FIRES when this file is edited alone
+- Rule is SUPPRESSED when test_trigger_safety_mode_doc.md is also edited
+
+=== TEST CASE 1: Rule SHOULD fire ===
+1. Edit this file (add a comment below the marker)
+2. Do NOT edit test_trigger_safety_mode_doc.md
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Trigger Safety" appears in output
+
+=== TEST CASE 2: Rule should NOT fire ===
+1. Edit this file (add a comment below the marker)
+2. ALSO edit test_trigger_safety_mode_doc.md
+3. Run: echo '{}' | python -m deepwork.hooks.rules_check
+4. Expected: "Manual Test: Trigger Safety" does NOT appear
+
+=== RULE LOCATION ===
+.deepwork/rules/manual-test-trigger-safety.md
+"""
+
+
+def example_function():
+    """An example function to demonstrate the trigger."""
+    return "Hello from trigger safety test"
+
+
+# Edit below this line to trigger the rule
+# -------------------------------------------
diff --git a/manual_tests/test_trigger_safety_mode/test_trigger_safety_mode_doc.md b/manual_tests/test_trigger_safety_mode/test_trigger_safety_mode_doc.md
new file mode 100644
index 00000000..625cf0b5
--- /dev/null
+++ b/manual_tests/test_trigger_safety_mode/test_trigger_safety_mode_doc.md
@@ -0,0 +1,20 @@
+# Documentation for Trigger Safety Test
+
+## What This File Does
+
+This is the "safety" file for the trigger/safety mode test.
+
+## How It Works
+
+When this file is edited ALONGSIDE `test_trigger_safety_mode.py`,
+the trigger/safety rule is suppressed (does not fire).
+
+## Testing
+
+1. To TRIGGER the rule: Edit only `test_trigger_safety_mode.py`
+2. To SUPPRESS the rule: Edit both files together
+
+---
+
+Edit below this line to suppress the trigger/safety rule:
+<!-- Changes here -->