|
| 1 | +--- |
| 2 | +name: manual_tests.run_fire_tests |
| 3 | +description: "Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly." |
| 4 | +user-invocable: false |
| 5 | +hooks: |
| 6 | + Stop: |
| 7 | + - hooks: |
| 8 | + - type: prompt |
| 9 | + prompt: | |
| 10 | + You must evaluate whether Claude has met all the below quality criteria for the request. |
| 11 | +
|
| 12 | + ## Quality Criteria |
| 13 | +
|
| 14 | + 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. |
| 15 | + 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? |
| 16 | + 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. |
| 17 | + 4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination? |
| 18 | + 5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? |
| 19 | + 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? |
| 20 | +
|
| 21 | + ## Instructions |
| 22 | +
|
| 23 | + Review the conversation and determine if ALL quality criteria above have been satisfied. |
| 24 | + Look for evidence that each criterion has been addressed. |
| 25 | +
|
| 26 | + If the agent has included `<promise>✓ Quality Criteria Met</promise>` in their response AND |
| 27 | + all criteria appear to be met, respond with: {"ok": true} |
| 28 | +
|
| 29 | + If criteria are NOT met OR the promise tag is missing, respond with: |
| 30 | + {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"} |
| 31 | +--- |
| 32 | + |
| 33 | +# manual_tests.run_fire_tests |
| 34 | + |
| 35 | +**Step 2/2** in **manual_tests** workflow |
| 36 | + |
| 37 | +> Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. |
| 38 | +
|
| 39 | +## Prerequisites (Verify First) |
| 40 | + |
| 41 | +Before proceeding, confirm these steps are complete: |
| 42 | +- `/manual_tests.run_not_fire_tests` |
| 43 | + |
| 44 | +## Instructions |
| 45 | + |
| 46 | +**Goal**: Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly. |
| 47 | + |
| 48 | +# Run Should-Fire Tests |
| 49 | + |
| 50 | +## Objective |
| 51 | + |
| 52 | +Run all "should fire" tests in **serial** sub-agents to verify that rules fire correctly when their trigger conditions are met and no safety condition is satisfied. |
| 53 | + |
| 54 | +## CRITICAL: Sub-Agent Requirement |
| 55 | + |
| 56 | +**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.** |
| 57 | + |
| 58 | +Why sub-agents are required: |
| 59 | +1. Sub-agents run in isolated contexts where file changes are detected |
| 60 | +2. When a sub-agent completes, the Stop hook **automatically** evaluates rules |
| 61 | +3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them |
| 62 | +4. If you edit files directly, the hooks won't fire because you're not a completing sub-agent |
| 63 | + |
| 64 | +**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return. |
| 65 | + |
| 66 | +## CRITICAL: Serial Execution |
| 67 | + |
| 68 | +**These tests MUST run ONE AT A TIME, with git reverts between each.** |
| 69 | + |
| 70 | +Why serial execution is required: |
| 71 | +- These tests edit ONLY the trigger file (not the safety) |
| 72 | +- If multiple sub-agents run in parallel, sub-agent A's hook will see changes from sub-agent B |
| 73 | +- This causes cross-contamination: A gets blocked by rules triggered by B's changes |
| 74 | +- Run one test, observe the hook, revert, then run the next |
| 75 | + |
| 76 | +## Task |
| 77 | + |
| 78 | +Run all 8 "should fire" tests in **serial** sub-agents, reverting between each, and verify that blocking hooks fire automatically. |
| 79 | + |
| 80 | +### Process |
| 81 | + |
| 82 | +For EACH test below, follow this cycle: |
| 83 | + |
| 84 | +1. **Launch a sub-agent** using the Task tool (use a fast model like haiku) |
| 85 | +2. **Wait for the sub-agent to complete** |
| 86 | +3. **Observe whether the hook fired automatically** - you should see a blocking prompt or command output |
| 87 | +4. **Record the result** - pass if hook fired, fail if it didn't |
| 88 | +5. **Revert changes**: `git checkout -- manual_tests/` |
| 89 | +6. **Proceed to the next test** |
| 90 | + |
| 91 | +**IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next. |
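The cycle above can be sketched as a plain loop. This is only an illustration of the ordering, not how the main agent is implemented: `launch` and `revert` are hypothetical stand-ins for "spawn a sub-agent with the Task tool and report whether a hook fired" and "run `git checkout -- manual_tests/`", neither of which is a Python API in the source.

```python
from typing import Callable, List, Tuple


def run_serial(
    prompts: List[Tuple[str, str]],
    launch: Callable[[str], bool],
    revert: Callable[[], None],
) -> List[Tuple[str, str]]:
    """Run each test to completion, record pass/fail, then revert before the next.

    `launch` (hypothetical) blocks until the sub-agent returns and reports
    whether a hook fired; `revert` (hypothetical) restores the test directory.
    """
    results = []
    for name, prompt in prompts:
        hook_fired = launch(prompt)  # one sub-agent at a time, never in parallel
        results.append((name, "pass" if hook_fired else "fail"))
        revert()  # always revert before the next test starts
    return results
```

The revert sits inside the loop body so that no test ever starts against a working tree that still contains the previous test's edits. One thing worth checking under this scheme: `git checkout -- <path>` restores tracked files only, so a file newly created during a test (as in Test 8) would be left behind as an untracked file unless it is removed separately.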
| 92 | + |
| 93 | +### Test Cases (run serially) |
| 94 | + |
| 95 | +**Test 1: Trigger/Safety** |
| 96 | +- Sub-agent prompt: "Edit ONLY `manual_tests/test_trigger_safety_mode/feature.py` to add a comment. Do NOT edit the `_doc.md` file." |
| 97 | +- Expected: Hook fires with prompt about updating documentation |
| 98 | + |
| 99 | +**Test 2: Set Mode** |
| 100 | +- Sub-agent prompt: "Edit ONLY `manual_tests/test_set_mode/module_source.py` to add a comment. Do NOT edit the `_test.py` file." |
| 101 | +- Expected: Hook fires with prompt about updating tests |
| 102 | + |
| 103 | +**Test 3: Pair Mode** |
| 104 | +- Sub-agent prompt: "Edit ONLY `manual_tests/test_pair_mode/handler_trigger.py` to add a comment. Do NOT edit the `_expected.md` file." |
| 105 | +- Expected: Hook fires with prompt about updating expected output |
| 106 | + |
| 107 | +**Test 4: Command Action** |
| 108 | +- Sub-agent prompt: "Edit `manual_tests/test_command_action/input.txt` to add some text." |
| 109 | +- Expected: Command runs automatically, appending to the log file (this rule always runs, no safety condition) |
| 110 | + |
| 111 | +**Test 5: Multi Safety** |
| 112 | +- Sub-agent prompt: "Edit ONLY `manual_tests/test_multi_safety/core.py` to add a comment. Do NOT edit any of the safety files (`_safety_a.md`, `_safety_b.md`, or `_safety_c.md`)." |
| 113 | +- Expected: Hook fires with prompt about updating safety documentation |
| 114 | + |
| 115 | +**Test 6: Infinite Block Prompt** |
| 116 | +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Do NOT include any promise tags." |
| 117 | +- Expected: Hook fires and BLOCKS with infinite prompt - sub-agent cannot complete until promise is provided |
| 118 | + |
| 119 | +**Test 7: Infinite Block Command** |
| 120 | +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Do NOT include any promise tags." |
| 121 | +- Expected: Hook fires and command fails - sub-agent cannot complete until promise is provided |
| 122 | + |
| 123 | +**Test 8: Created Mode** |
| 124 | +- Sub-agent prompt: "Create a NEW file `manual_tests/test_created_mode/new_config.yml` with some YAML content. This must be a NEW file, not a modification." |
| 125 | +- Expected: Hook fires with prompt about new configuration files |
| 126 | + |
| 127 | +### Results Tracking |
| 128 | + |
| 129 | +Record the result after each test: |
| 130 | + |
| 131 | +| Test Case | Should Fire | Hook Fired? | Result | |
| 132 | +|-----------|-------------|:-----------:|:------:| |
| 133 | +| Trigger/Safety | Edit .py only | | | |
| 134 | +| Set Mode | Edit _source.py only | | | |
| 135 | +| Pair Mode | Edit _trigger.py only | | | |
| 136 | +| Command Action | Edit .txt | | | |
| 137 | +| Multi Safety | Edit .py only | | | |
| 138 | +| Infinite Block Prompt | Edit .py (no promise) | | | |
| 139 | +| Infinite Block Command | Edit .py (no promise) | | | |
| 140 | +| Created Mode | Create NEW .yml | | | |
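As a rough sketch of how the filled-in table might be produced, the helper below renders recorded results in the same column layout. The function name `render_results` and the "yes/no" and "PASS/FAIL" cell values are assumptions for illustration; the source does not prescribe how the cells are filled.

```python
from typing import List, Tuple


def render_results(rows: List[Tuple[str, str, bool]]) -> str:
    """Render (test case, action, hook_fired) tuples as a markdown table."""
    lines = [
        "| Test Case | Should Fire | Hook Fired? | Result |",
        "|-----------|-------------|:-----------:|:------:|",
    ]
    for name, action, fired in rows:
        # In this step a test passes exactly when the hook fired.
        fired_cell = "yes" if fired else "no"
        result_cell = "PASS" if fired else "FAIL"
        lines.append(f"| {name} | {action} | {fired_cell} | {result_cell} |")
    return "\n".join(lines)
```

For example, `render_results([("Set Mode", "Edit _source.py only", True)])` yields the two header lines followed by one row for the Set Mode test.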
| 141 | + |
| 142 | +## Quality Criteria |
| 143 | + |
| 144 | +- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly |
| 145 | +- **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel |
| 146 | +- **Git reverted between tests**: `git checkout -- manual_tests/` was run after each test |
| 147 | +- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY |
| 148 | +- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned |
| 149 | +- **Results recorded**: Pass/fail status was recorded for each test |
| 150 | +- When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response |
| 151 | + |
| 152 | +## Reference |
| 153 | + |
| 154 | +See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions. |
| 155 | + |
| 156 | +## Context |
| 157 | + |
| 158 | +This step runs after the "should NOT fire" tests. These tests verify that rules correctly fire when trigger conditions are met without safety conditions. The serial execution with reverts is essential to prevent cross-contamination between tests. |
| 159 | + |
| 160 | + |
| 161 | +### Job Context |
| 162 | + |
| 163 | +A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. |
| 164 | + |
| 165 | +This job tests that rules fire when they should AND do not fire when they shouldn't. |
| 166 | +Each test is run in a SUB-AGENT (not the main agent) because: |
| 167 | +1. Sub-agents run in isolated contexts where file changes can be detected |
| 168 | +2. The Stop hook automatically evaluates rules when each sub-agent completes |
| 169 | +3. The main agent can observe whether hooks fired without triggering them manually |
| 170 | + |
| 171 | +CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file |
| 172 | +edits itself - it spawns sub-agents to make edits, then observes whether the hooks |
| 173 | +fired automatically when those sub-agents returned. |
| 174 | + |
| 175 | +Steps: |
| 176 | +1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents |
| 177 | +2. run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between |
| 178 | + |
| 179 | +Test types covered: |
| 180 | +- Trigger/Safety mode |
| 181 | +- Set mode (bidirectional) |
| 182 | +- Pair mode (directional) |
| 183 | +- Command action |
| 184 | +- Multi safety |
| 185 | +- Infinite block (prompt and command) |
| 186 | +- Created mode (new files only) |
| 187 | + |
| 188 | + |
| 189 | +## Required Inputs |
| 190 | + |
| 191 | + |
| 192 | +**Files from Previous Steps** - Read these first: |
| 193 | +- `not_fire_results` (from `run_not_fire_tests`) |
| 194 | + |
| 195 | +## Work Branch |
| 196 | + |
| 197 | +Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD` |
| 198 | + |
| 199 | +- If on a matching work branch: continue using it |
| 200 | +- If on main/master: create new branch with `git checkout -b deepwork/manual_tests-[instance]-$(date +%Y%m%d)` |
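The branch format can be illustrated with a small helper that mirrors the `$(date +%Y%m%d)` substitution. The function name `work_branch` and the example instance value are hypothetical; the source leaves `[instance]` unspecified.

```python
from datetime import date
from typing import Optional


def work_branch(instance: str, today: Optional[date] = None) -> str:
    """Build a branch name in the form deepwork/manual_tests-[instance]-YYYYMMDD."""
    day = today or date.today()
    return f"deepwork/manual_tests-{instance}-{day.strftime('%Y%m%d')}"
```

Passing `today` explicitly keeps the helper deterministic, which is convenient when checking whether the current branch already matches the expected name.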
| 201 | + |
| 202 | +## Outputs |
| 203 | + |
| 204 | +**Required outputs**: |
| 205 | +- `fire_results` |
| 206 | + |
| 207 | +## Guardrails |
| 208 | + |
| 209 | +- Do NOT skip prerequisite verification if this step has dependencies |
| 210 | +- Do NOT produce partial outputs; complete all required outputs before finishing |
| 211 | +- Do NOT proceed without required inputs; ask the user if any are missing |
| 212 | +- Do NOT modify files outside the scope of this step's defined outputs |
| 213 | + |
| 214 | +## Quality Validation |
| 215 | + |
| 216 | +Stop hooks will automatically validate your work. The loop continues until all criteria pass. |
| 217 | + |
| 218 | +**Criteria (all must be satisfied)**: |
| 219 | +1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. |
| 220 | +2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? |
| 221 | +3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. |
| 222 | +4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination? |
| 223 | +5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? |
| 224 | +6. **Results Recorded**: Did the main agent track pass/fail status for each test case? |
| 225 | + |
| 226 | + |
| 227 | +**To complete**: Include `<promise>✓ Quality Criteria Met</promise>` in your final response only after verifying ALL criteria are satisfied. |
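The decision rule the Stop hook is asked to apply can be sketched as follows. In the source the hook is a `type: prompt` hook, so the evaluation is done by a model reading the conversation; the function below is only an assumed, simplified restatement of the pass/fail rule, and `verdict` and its `failed` parameter are hypothetical names.

```python
import json
from typing import List

PROMISE_TAG = "<promise>✓ Quality Criteria Met</promise>"


def verdict(response: str, failed: List[str]) -> str:
    """Return the hook's JSON reply given the agent response and failed criteria."""
    has_promise = PROMISE_TAG in response
    if has_promise and not failed:
        return json.dumps({"ok": True})
    problems = list(failed)
    if not has_promise:
        problems.append("promise tag missing")
    reason = "**AGENT: TAKE ACTION** - " + "; ".join(problems)
    return json.dumps({"ok": False, "reason": reason})
```

Both conditions must hold for a passing reply: the promise tag alone is not enough if any criterion failed, and meeting every criterion is not enough if the tag is absent.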
| 228 | + |
| 229 | +## On Completion |
| 230 | + |
| 231 | +1. Verify outputs are created |
| 232 | +2. Inform user: "Step 2/2 complete, outputs: fire_results" |
| 233 | +3. **Workflow complete**: All steps finished. Consider creating a PR to merge the work branch. |
| 234 | + |
| 235 | +--- |
| 236 | + |
| 237 | +**Reference files**: `.deepwork/jobs/manual_tests/job.yml`, `.deepwork/jobs/manual_tests/steps/run_fire_tests.md` |