Commit ead6ad6

Merge branch 'main' into claude/refactor-git-operations-2WxqO
2 parents 20237ce + a8b29bf commit ead6ad6

11 files changed: +1382 additions, -80 deletions

Lines changed: 237 additions & 0 deletions

---
name: manual_tests.run_fire_tests
description: "Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly."
user-invocable: false
hooks:
  Stop:
    - hooks:
        - type: prompt
          prompt: |
            You must evaluate whether Claude has met all the below quality criteria for the request.

            ## Quality Criteria

            1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
            2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
            3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
            4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination?
            5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
            6. **Results Recorded**: Did the main agent track pass/fail status for each test case?

            ## Instructions

            Review the conversation and determine if ALL quality criteria above have been satisfied.
            Look for evidence that each criterion has been addressed.

            If the agent has included `<promise>✓ Quality Criteria Met</promise>` in their response AND
            all criteria appear to be met, respond with: {"ok": true}

            If criteria are NOT met OR the promise tag is missing, respond with:
            {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}
---

# manual_tests.run_fire_tests

**Step 2/2** in **manual_tests** workflow

> Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly.

## Prerequisites (Verify First)

Before proceeding, confirm these steps are complete:
- `/manual_tests.run_not_fire_tests`

## Instructions

**Goal**: Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly.

# Run Should-Fire Tests

## Objective

Run all "should fire" tests in **serial** sub-agents to verify that rules fire correctly when their trigger conditions are met and their safety conditions are not.

## CRITICAL: Sub-Agent Requirement

**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.**

Why sub-agents are required:
1. Sub-agents run in isolated contexts where file changes are detected
2. When a sub-agent completes, the Stop hook **automatically** evaluates rules
3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them
4. If you edit files directly, the hooks won't fire because you're not a completing sub-agent

**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return.

## CRITICAL: Serial Execution

**These tests MUST run ONE AT A TIME, with git reverts between each.**

Why serial execution is required:
- These tests edit ONLY the trigger file (not the safety file)
- If multiple sub-agents run in parallel, sub-agent A's hook will see changes from sub-agent B
- This causes cross-contamination: A gets blocked by rules triggered by B's changes
- Run one test, observe the hook, revert, then run the next
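
To see why parallel runs contaminate each other, assume (this is not taken from the `rules_check` source) that a hook evaluates rules against every file changed in the shared working tree. Two sub-agents editing in parallel then expose the union of their edits to each other's hooks. The sketch below models a trigger/safety rule as "fires when a trigger file changed and no safety file changed". `Rule`, `fires`, the rule names, and the glob patterns are all hypothetical, loosely modeled on the test directories.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Rule:
    # Hypothetical trigger/safety rule; the real DeepWork rule schema may differ.
    name: str
    trigger: str  # glob matched against changed paths
    safety: str   # glob; any matching change suppresses the rule


def fires(rule: Rule, changed: set[str]) -> bool:
    """Fire when a trigger file changed and no safety file changed."""
    triggered = any(fnmatch(p, rule.trigger) for p in changed)
    safe = any(fnmatch(p, rule.safety) for p in changed)
    return triggered and not safe


RULES = [
    Rule("doc_sync", "manual_tests/test_trigger_safety_mode/*.py",
         "manual_tests/test_trigger_safety_mode/*_doc.md"),
    Rule("multi_safety", "manual_tests/test_multi_safety/core.py",
         "manual_tests/test_multi_safety/*_safety_*.md"),
]


def fired_rules(changed: set[str]) -> list[str]:
    return [r.name for r in RULES if fires(r, changed)]


edit_a = {"manual_tests/test_trigger_safety_mode/feature.py"}
edit_b = {"manual_tests/test_multi_safety/core.py"}

# Serial: sub-agent A's hook sees only A's edit.
print(fired_rules(edit_a))
# Parallel: the shared tree holds both edits, so A is also blocked by B's rule.
print(fired_rules(edit_a | edit_b))
```

Under these assumptions, running one test at a time and reverting keeps each hook's view of the working tree down to a single test's edit.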

## Task

Run all 8 "should fire" tests in **serial** sub-agents, reverting between tests, and verify that blocking hooks fire automatically.

### Process

For EACH test below, follow this cycle:

1. **Launch a sub-agent** using the Task tool (use a fast model such as Haiku)
2. **Wait for the sub-agent to complete**
3. **Observe whether the hook fired automatically** - you should see a blocking prompt or command output
4. **Record the result** - pass if the hook fired, fail if it did not
5. **Revert changes**: `git checkout -- manual_tests/`
6. **Proceed to the next test**

**IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next.
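
The cycle above can be sketched as a driver loop. This is an illustration under stated assumptions: `launch_subagent` is a hypothetical stand-in for the Task tool, which the agent invokes itself and which is not a Python API. It is assumed to block until the sub-agent returns and to report whether a blocking hook fired. The revert runs the same `git checkout -- manual_tests/` command as step 5. Note that this command restores tracked files only and does not delete untracked files, such as the new file created in Test 8.

```python
import subprocess
from typing import Callable

# Hypothetical: launches one sub-agent, blocks until it returns, and
# reports whether a blocking hook fired at that point.
LaunchFn = Callable[[str], bool]


def revert_manual_tests() -> None:
    # Step 5: restore tracked files under manual_tests/ (untracked files are left alone).
    subprocess.run(["git", "checkout", "--", "manual_tests/"], check=True)


def run_serially(
    prompts: dict[str, str],
    launch_subagent: LaunchFn,
    revert: Callable[[], None] = revert_manual_tests,
) -> dict[str, bool]:
    """Run each test one at a time, reverting after every test."""
    results: dict[str, bool] = {}
    for name, prompt in prompts.items():
        try:
            # Steps 1-4: one sub-agent at a time; a "should fire" test
            # passes only if the hook fired when the sub-agent returned.
            results[name] = launch_subagent(prompt)
        finally:
            # Step 5: always revert, even if launching this test raised.
            revert()
    return results
```

The `revert` parameter is injectable so that the loop's ordering (launch, then revert, then the next launch) can be checked without touching a real repository.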

### Test Cases (run serially)

**Test 1: Trigger/Safety**
- Sub-agent prompt: "Edit ONLY `manual_tests/test_trigger_safety_mode/feature.py` to add a comment. Do NOT edit the `_doc.md` file."
- Expected: Hook fires with prompt about updating documentation

**Test 2: Set Mode**
- Sub-agent prompt: "Edit ONLY `manual_tests/test_set_mode/module_source.py` to add a comment. Do NOT edit the `_test.py` file."
- Expected: Hook fires with prompt about updating tests

**Test 3: Pair Mode**
- Sub-agent prompt: "Edit ONLY `manual_tests/test_pair_mode/handler_trigger.py` to add a comment. Do NOT edit the `_expected.md` file."
- Expected: Hook fires with prompt about updating expected output

**Test 4: Command Action**
- Sub-agent prompt: "Edit `manual_tests/test_command_action/input.txt` to add some text."
- Expected: Command runs automatically, appending to the log file (this rule always runs, no safety condition)

**Test 5: Multi Safety**
- Sub-agent prompt: "Edit ONLY `manual_tests/test_multi_safety/core.py` to add a comment. Do NOT edit any of the safety files (`_safety_a.md`, `_safety_b.md`, or `_safety_c.md`)."
- Expected: Hook fires with prompt about updating safety documentation

**Test 6: Infinite Block Prompt**
- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Do NOT include any promise tags."
- Expected: Hook fires and BLOCKS with infinite prompt - sub-agent cannot complete until promise is provided

**Test 7: Infinite Block Command**
- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Do NOT include any promise tags."
- Expected: Hook fires and command fails - sub-agent cannot complete until promise is provided

**Test 8: Created Mode**
- Sub-agent prompt: "Create a NEW file `manual_tests/test_created_mode/new_config.yml` with some YAML content. This must be a NEW file, not a modification."
- Expected: Hook fires with prompt about new configuration files
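
Test 8 only exercises created mode if the file is genuinely new. One way to confirm this before recording the result is to read `git status --porcelain` output. In that format, untracked paths carry the status `??` and staged additions carry `A`. The parser below is a minimal sketch that handles only simple `XY path` lines, without renames or quoted paths. `classify` and `git_status` are illustrative helpers, not part of DeepWork.

```python
import subprocess


def git_status() -> str:
    # Raw porcelain output; deliberately not stripped, because the leading
    # space in entries such as " M path" is significant.
    return subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout


def classify(porcelain: str) -> dict[str, str]:
    """Map each path in porcelain output to 'new' or 'modified'."""
    result: dict[str, str] = {}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        status, path = line[:2], line[3:]  # "XY" status, a space, then the path
        if status == "??" or "A" in status:
            result[path] = "new"        # untracked or staged as added
        elif "M" in status:
            result[path] = "modified"   # modified in the index or worktree
    return result
```

For Test 8, `classify(git_status())` would be expected to report `manual_tests/test_created_mode/new_config.yml` as `new`.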

### Results Tracking

Record the result after each test:

| Test Case | Should Fire | Hook Fired? | Result |
|-----------|-------------|:-----------:|:------:|
| Trigger/Safety | Edit .py only | | |
| Set Mode | Edit _source.py only | | |
| Pair Mode | Edit _trigger.py only | | |
| Command Action | Edit .txt | | |
| Multi Safety | Edit .py only | | |
| Infinite Block Prompt | Edit .py (no promise) | | |
| Infinite Block Command | Edit .py (no promise) | | |
| Created Mode | Create NEW .yml | | |
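
If results are tracked programmatically, a small helper can fill in the table above. This is a sketch: the dict input, the `yes`/`no` and `PASS`/`FAIL` labels, and the `render_results` name are assumptions. For these tests, a case passes exactly when the hook fired.

```python
# Rows of the results table: test case -> the edit that should make it fire.
SHOULD_FIRE = {
    "Trigger/Safety": "Edit .py only",
    "Set Mode": "Edit _source.py only",
    "Pair Mode": "Edit _trigger.py only",
    "Command Action": "Edit .txt",
    "Multi Safety": "Edit .py only",
    "Infinite Block Prompt": "Edit .py (no promise)",
    "Infinite Block Command": "Edit .py (no promise)",
    "Created Mode": "Create NEW .yml",
}


def render_results(hook_fired: dict[str, bool]) -> str:
    """Render the markdown results table; tests not yet run get empty cells."""
    lines = [
        "| Test Case | Should Fire | Hook Fired? | Result |",
        "|-----------|-------------|:-----------:|:------:|",
    ]
    for name, action in SHOULD_FIRE.items():
        if name in hook_fired:
            fired = "yes" if hook_fired[name] else "no"
            # For "should fire" tests, a pass means the hook fired.
            result = "PASS" if hook_fired[name] else "FAIL"
        else:
            fired = result = ""
        lines.append(f"| {name} | {action} | {fired} | {result} |")
    return "\n".join(lines)
```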

## Quality Criteria

- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
- **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel
- **Git reverted between tests**: `git checkout -- manual_tests/` was run after each test
- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY
- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned
- **Results recorded**: Pass/fail status was recorded for each test
- When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response

## Reference

See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions.

## Context

This step runs after the "should NOT fire" tests. These tests verify that rules correctly fire when trigger conditions are met and safety conditions are not. The serial execution with reverts is essential to prevent cross-contamination between tests.

### Job Context

A workflow for running manual tests that validate DeepWork rules/hooks fire correctly.

This job tests that rules fire when they should AND do not fire when they shouldn't.
Each test is run in a SUB-AGENT (not the main agent) because:
1. Sub-agents run in isolated contexts where file changes can be detected
2. The Stop hook automatically evaluates rules when each sub-agent completes
3. The main agent can observe whether hooks fired without triggering them manually

CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file
edits itself - it spawns sub-agents to make edits, then observes whether the hooks
fired automatically when those sub-agents returned.

Steps:
1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents
2. run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between

Test types covered:
- Trigger/Safety mode
- Set mode (bidirectional)
- Pair mode (directional)
- Command action
- Multi safety
- Infinite block (prompt and command)
- Created mode (new files only)

## Required Inputs

**Files from Previous Steps** - Read these first:
- `not_fire_results` (from `run_not_fire_tests`)

## Work Branch

Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD`

- If on a matching work branch: continue using it
- If on main/master: create new branch with `git checkout -b deepwork/manual_tests-[instance]-$(date +%Y%m%d)`
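
The branch rule above can be sketched as a small decision function. `instance` is whatever label the workflow uses; the document leaves it as `[instance]`. The regular expression is an assumption about what counts as a "matching" work branch. Branches that are neither a work branch nor main/master are not covered by the instructions, so this sketch raises an error rather than guessing. The function only decides what to do and does not run git.

```python
import re
from datetime import date
from typing import Optional

# Assumed shape of a matching work branch: deepwork/manual_tests-<instance>-YYYYMMDD
WORK_BRANCH = re.compile(r"^deepwork/manual_tests-.+-\d{8}$")


def branch_action(current: str, instance: str,
                  today: Optional[date] = None) -> tuple[str, str]:
    """Return ("continue", current) or ("create", new_branch_name)."""
    if WORK_BRANCH.match(current):
        return ("continue", current)
    if current in ("main", "master"):
        # Same YYYYMMDD stamp that $(date +%Y%m%d) produces.
        stamp = (today or date.today()).strftime("%Y%m%d")
        return ("create", f"deepwork/manual_tests-{instance}-{stamp}")
    raise ValueError(f"unexpected branch {current!r}: ask the user how to proceed")
```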

## Outputs

**Required outputs**:
- `fire_results`

## Guardrails

- Do NOT skip prerequisite verification if this step has dependencies
- Do NOT produce partial outputs; complete all required outputs before finishing
- Do NOT proceed without required inputs; ask the user if any are missing
- Do NOT modify files outside the scope of this step's defined outputs

## Quality Validation

Stop hooks will automatically validate your work. The loop continues until all criteria pass.

**Criteria (all must be satisfied)**:
1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination?
5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
6. **Results Recorded**: Did the main agent track pass/fail status for each test case?

**To complete**: Include `<promise>✓ Quality Criteria Met</promise>` in your final response only after verifying ALL criteria are satisfied.
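
The Stop hook prompt in the frontmatter asks the evaluator to reply with `{"ok": true}` or `{"ok": false, "reason": ...}`, and to accept only when the promise tag is present. A small sketch of both checks follows; it may be useful when reviewing transcripts by hand. `PROMISE`, `has_promise`, and `parse_verdict` are illustrative names, not a DeepWork API.

```python
import json

PROMISE = "<promise>✓ Quality Criteria Met</promise>"


def has_promise(final_response: str) -> bool:
    # The agent must include the exact promise tag in its final response.
    return PROMISE in final_response


def parse_verdict(raw: str) -> tuple[bool, str]:
    """Parse {"ok": true} or {"ok": false, "reason": "..."} into (ok, reason)."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("ok"), bool):
        raise ValueError(f"malformed verdict: {raw!r}")
    return data["ok"], data.get("reason", "")
```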

## On Completion

1. Verify outputs are created
2. Inform user: "Step 2/2 complete, outputs: fire_results"
3. **Workflow complete**: All steps finished. Consider creating a PR to merge the work branch.

---

**Reference files**: `.deepwork/jobs/manual_tests/job.yml`, `.deepwork/jobs/manual_tests/steps/run_fire_tests.md`
