Commit 8df8a3c

Sync generated skills for manual_tests job v1.3.0
Auto-generated by `deepwork install`:
- Added skills for new infinite_block_tests and reset steps
- Updated existing step skills with new configuration
1 parent 9eccfe9 commit 8df8a3c

11 files changed, +1070 -193 lines changed

.claude/settings.json

Lines changed: 4 additions & 1 deletion
@@ -120,7 +120,10 @@
       "Skill(manual_tests.run_fire_tests)",
       "Skill(deepwork_rules)",
       "Skill(deepwork_rules.define)",
-      "Bash(deepwork rules clear_queue)"
+      "Bash(deepwork rules clear_queue)",
+      "Bash(rm -rf .deepwork/tmp/rules/queue/*.json)",
+      "Skill(manual_tests.reset)",
+      "Skill(manual_tests.infinite_block_tests)"
     ]
   },
   "hooks": {
Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
---
name: manual_tests.infinite_block_tests
description: "Runs all 4 infinite block tests serially. Tests both 'should fire' (no promise) and 'should NOT fire' (with promise) scenarios."
user-invocable: false
hooks:
  Stop:
    - hooks:
        - type: prompt
          prompt: |
            You must evaluate whether Claude has met all the below quality criteria for the request.

            ## Quality Criteria

            1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
            2. **Sub-Agent Config**: Did all sub-agents use `model: "haiku"` and `max_turns: 5`?
            3. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel)?
            4. **Promise Tests Passed**: Did tests with promise tags complete WITHOUT blocking?
            5. **Non-Promise Tests Blocked**: Did tests without promise tags correctly trigger blocking behavior?
            6. **Reset Between Tests**: Was the reset step called internally after each test?
            7. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
            8. **Results Recorded**: Did the main agent track pass/fail status for each test case?

            ## Instructions

            Review the conversation and determine if ALL quality criteria above have been satisfied.
            Look for evidence that each criterion has been addressed.

            If the agent has included `<promise>✓ Quality Criteria Met</promise>` in their response OR
            all criteria appear to be met, respond with: {"ok": true}

            If criteria are NOT met AND the promise tag is missing, respond with:
            {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}
  SubagentStop:
    - hooks:
        - type: prompt
          prompt: |
            You must evaluate whether Claude has met all the below quality criteria for the request.

            ## Quality Criteria

            1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
            2. **Sub-Agent Config**: Did all sub-agents use `model: "haiku"` and `max_turns: 5`?
            3. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel)?
            4. **Promise Tests Passed**: Did tests with promise tags complete WITHOUT blocking?
            5. **Non-Promise Tests Blocked**: Did tests without promise tags correctly trigger blocking behavior?
            6. **Reset Between Tests**: Was the reset step called internally after each test?
            7. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
            8. **Results Recorded**: Did the main agent track pass/fail status for each test case?

            ## Instructions

            Review the conversation and determine if ALL quality criteria above have been satisfied.
            Look for evidence that each criterion has been addressed.

            If the agent has included `<promise>✓ Quality Criteria Met</promise>` in their response OR
            all criteria appear to be met, respond with: {"ok": true}

            If criteria are NOT met AND the promise tag is missing, respond with:
            {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}
---

# manual_tests.infinite_block_tests

**Step 4/4** in **manual_tests** workflow

> Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly.

## Prerequisites (Verify First)

Before proceeding, confirm these steps are complete:
- `/manual_tests.run_fire_tests`

## Instructions

**Goal**: Runs all 4 infinite block tests serially. Tests both 'should fire' (no promise) and 'should NOT fire' (with promise) scenarios.

# Run Infinite Block Tests

## Objective

Run all infinite block tests in **serial** to verify that infinite blocking rules work correctly - both firing when they should AND not firing when bypassed with a promise tag.

## CRITICAL: Sub-Agent Requirement

**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.**

Why sub-agents are required:
1. Sub-agents run in isolated contexts where file changes are detected
2. When a sub-agent completes, the Stop hook **automatically** evaluates rules
3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them
4. If you edit files directly, the hooks won't fire because you're not a completing sub-agent

**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return.

## CRITICAL: Serial Execution

**These tests MUST run ONE AT A TIME, with resets between each.**

Why serial execution is required for infinite block tests:
- Infinite block tests can block indefinitely without a promise tag
- Running them in parallel would cause unpredictable blocking behavior
- Serial execution allows controlled observation of each test

## Task

Run all 4 infinite block tests in **serial**, resetting between each, and verify correct blocking behavior.

### Process

For EACH test below, follow this cycle:

1. **Launch a sub-agent** using the Task tool with:
   - `model: "haiku"` - Use the fast model to minimize cost and latency
   - `max_turns: 5` - Prevent sub-agents from hanging indefinitely
2. **Wait for the sub-agent to complete**
3. **Observe whether the hook fired automatically** - you should see a blocking prompt or command output
4. **If no visible blocking occurred, check the queue**:
   ```bash
   ls -la .deepwork/tmp/rules/queue/
   cat .deepwork/tmp/rules/queue/*.json 2>/dev/null
   ```
   - If queue entries exist with status "queued", the hook DID fire but blocking wasn't visible
   - If queue is empty, the hook did NOT fire at all
   - Record the queue status along with the result
5. **Record the result** - see expected outcomes for each test
6. **Reset** (MANDATORY after each test) - follow the reset step instructions:
   ```bash
   git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
   deepwork rules clear_queue
   ```
7. **Check for early termination**: If **2 tests have now failed**, immediately:
   - Stop running any remaining tests
   - Report the results summary showing which tests passed/failed
   - The job halts here - do NOT proceed with remaining tests
8. **Proceed to the next test** (only if fewer than 2 failures)

**IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and reset before launching the next.
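The cycle above (one sub-agent at a time, a mandatory reset after every test, a halt at the second failure) can be sketched as shell control flow. This is a sketch under assumptions: sub-agents are launched through the Task tool rather than from a shell, so the hypothetical `run_serial` takes the name of a caller-supplied runner function that stands in for steps 1-5 and returns 0 for PASS. `reset_between_tests` and the `DRY_RUN` switch are also hypothetical; the reset body reuses the step 6 commands.

```bash
# Reset from step 6. DRY_RUN is a hypothetical switch that skips the real
# git/deepwork commands so the control flow can be exercised safely.
reset_between_tests() {
  if [ -n "${DRY_RUN:-}" ]; then
    return 0
  fi
  git reset HEAD manual_tests/ &&
    git checkout -- manual_tests/ &&
    rm -f manual_tests/test_created_mode/new_config.yml
  deepwork rules clear_queue
}

# Run tests strictly one after another. $1 names a runner function that
# performs one test and returns 0 on PASS; the remaining args are test ids.
run_serial() {
  runner=$1
  shift
  failures=0
  for test_id in "$@"; do
    if "$runner" "$test_id"; then
      echo "$test_id: PASS"
    else
      echo "$test_id: FAIL"
      failures=$((failures + 1))
    fi
    reset_between_tests   # mandatory after every test, pass or fail
    if [ "$failures" -ge 2 ]; then
      echo "HALT: 2 failures, remaining tests skipped"
      return 1
    fi
  done
  echo "DONE: $failures failure(s)"
}
```

With four tests where tests 2 and 3 fail, the loop prints three results and the halt line, and test 4 never runs.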

### Test Cases (run serially)

**Test 1: Infinite Block Prompt - Should NOT Fire (with promise)**
- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Include `<promise>I have verified this change is safe</promise>` in your response to bypass the infinite block."
- Sub-agent config: `model: "haiku"`, `max_turns: 5`
- Expected: Sub-agent completes WITHOUT blocking - the promise tag bypasses the infinite block
- Result: PASS if no blocking, FAIL if blocked

**Test 2: Infinite Block Command - Should NOT Fire (with promise)**
- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Include `<promise>I have verified this change is safe</promise>` in your response to bypass the infinite block."
- Sub-agent config: `model: "haiku"`, `max_turns: 5`
- Expected: Sub-agent completes WITHOUT blocking - the promise tag bypasses the infinite block
- Result: PASS if no blocking, FAIL if blocked

**Test 3: Infinite Block Prompt - Should Fire (no promise)**
- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Do NOT include any promise tags."
- Sub-agent config: `model: "haiku"`, `max_turns: 5`
- Expected: Hook fires and BLOCKS with infinite prompt - sub-agent cannot complete until promise is provided (or max_turns reached)
- Result: PASS if hook fired (visible block OR queue entry), FAIL if neither

**Test 4: Infinite Block Command - Should Fire (no promise)**
- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Do NOT include any promise tags."
- Sub-agent config: `model: "haiku"`, `max_turns: 5`
- Expected: Hook fires and command fails - sub-agent cannot complete until promise is provided (or max_turns reached)
- Result: PASS if hook fired (visible block OR queue entry), FAIL if neither
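The PASS/FAIL rules above differ by scenario. The promise tests (1 and 2) pass only if nothing blocks, while the no-promise tests (3 and 4) pass if the hook fired by either signal. A minimal sketch of that decision table, with a hypothetical `grade_test` function and yes/no arguments:

```bash
# grade_test SCENARIO VISIBLE_BLOCK QUEUE_ENTRY
#   SCENARIO:      "promise" (tests 1-2) or "no-promise" (tests 3-4)
#   VISIBLE_BLOCK: "yes" if a blocking prompt/command failure was seen
#   QUEUE_ENTRY:   "yes" if .deepwork/tmp/rules/queue/ held an entry
grade_test() {
  case "$1" in
    promise)
      # Should NOT fire: any visible block is a failure.
      if [ "$2" = yes ]; then echo FAIL; else echo PASS; fi
      ;;
    no-promise)
      # Should fire: a visible block OR a queue entry counts as firing.
      if [ "$2" = yes ] || [ "$3" = yes ]; then echo PASS; else echo FAIL; fi
      ;;
    *)
      echo "unknown scenario: $1" >&2
      return 2
      ;;
  esac
}
```

For promise tests the queue argument is deliberately ignored, because the stated rule is "PASS if no blocking, FAIL if blocked".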

### Results Tracking

Record the result after each test:

| Test Case | Scenario | Expected | Visible Block? | Queue Entry? | Result |
|-----------|----------|----------|:--------------:|:------------:|:------:|
| Infinite Block Prompt | With promise | No block | | | |
| Infinite Block Command | With promise | No block | | | |
| Infinite Block Prompt | No promise | Blocks | | | |
| Infinite Block Command | No promise | Blocks | | | |
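If results are collected in a file rather than edited into the table by hand, a tiny formatter keeps every row in the table's column order. `results_header` and `results_row` are hypothetical helpers, and this section does not say where the `infinite_block_results` output lives, so no path is assumed.

```bash
# Print the table header and separator once.
results_header() {
  printf '%s\n' \
    '| Test Case | Scenario | Expected | Visible Block? | Queue Entry? | Result |' \
    '|-----------|----------|----------|:--------------:|:------------:|:------:|'
}

# results_row CASE SCENARIO EXPECTED VISIBLE QUEUE RESULT
# Prints one Markdown row in the same column order as the header.
results_row() {
  printf '| %s | %s | %s | %s | %s | %s |\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```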

**Queue Entry Status Guide:**
- If queue has entry with status "queued" -> Hook fired, rule was shown to agent
- If queue has entry with status "passed" -> Hook fired, rule was satisfied
- If queue is empty -> Hook did NOT fire
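The queue check in step 4 and the status guide above can be folded into one function. This sketch assumes each queue entry is a `.json` file containing a `"status": "<value>"` pair. It matches text with `grep` instead of parsing JSON, and `queue_state` is a hypothetical name. If any entry is `queued`, it reports that first, because that is the case where the hook fired but blocking may not have been visible.

```bash
# queue_state [DIR]  -> prints one of: queued, passed, other, empty
# DIR defaults to the queue directory used by the rules hooks.
queue_state() {
  dir=${1:-.deepwork/tmp/rules/queue}
  seen_passed=no
  seen_any=no
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # unmatched glob: the queue has no entries
    seen_any=yes
    # "queued": hook fired and the rule was shown to the agent
    if grep -Eq '"status"[[:space:]]*:[[:space:]]*"queued"' "$f"; then
      echo queued
      return 0
    fi
    # "passed": hook fired and the rule was satisfied
    if grep -Eq '"status"[[:space:]]*:[[:space:]]*"passed"' "$f"; then
      seen_passed=yes
    fi
  done
  if [ "$seen_passed" = yes ]; then
    echo passed
  elif [ "$seen_any" = yes ]; then
    echo other   # entries exist, but with a status the guide does not cover
  else
    echo empty   # hook did NOT fire
  fi
}
```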

## Quality Criteria

- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
- **Correct sub-agent config**: All sub-agents used `model: "haiku"` and `max_turns: 5`
- **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel
- **Reset between tests**: Reset step was followed after each test
- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY
- **Blocking behavior verified**: Promise tests completed without blocking; non-promise tests were blocked
- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported
- **Results recorded**: Pass/fail status was recorded for each test run
- When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response
192+
193+
## Reference
194+
195+
See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions.
196+
197+
## Context
198+
199+
This step runs after both the "should NOT fire" and "should fire" test steps. It specifically tests infinite blocking behavior which requires serial execution due to the blocking nature of these rules.
200+
201+
202+
### Job Context
203+
204+
A workflow for running manual tests that validate DeepWork rules/hooks fire correctly.
205+
206+
This job tests that rules fire when they should AND do not fire when they shouldn't.
207+
Each test is run in a SUB-AGENT (not the main agent) because:
208+
1. Sub-agents run in isolated contexts where file changes can be detected
209+
2. The Stop hook automatically evaluates rules when each sub-agent completes
210+
3. The main agent can observe whether hooks fired without triggering them manually
211+
212+
CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file
213+
edits itself - it spawns sub-agents to make edits, then observes whether the hooks
214+
fired automatically when those sub-agents returned.
215+
216+
Sub-agent configuration:
217+
- All sub-agents should use `model: "haiku"` to minimize cost and latency
218+
- All sub-agents should use `max_turns: 5` to prevent hanging indefinitely
219+
220+
Steps:
221+
1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents (6 tests)
222+
2. run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with resets between (6 tests)
223+
3. infinite_block_tests - Run infinite block tests in SERIAL (4 tests - both fire and not-fire)
224+
225+
Reset procedure (see steps/reset.md):
226+
- Each step calls the reset procedure internally when needed
227+
- Reset reverts git changes, removes created files, and clears the rules queue
228+
229+
Test types covered:
230+
- Trigger/Safety mode
231+
- Set mode (bidirectional)
232+
- Pair mode (directional)
233+
- Command action
234+
- Multi safety
235+
- Infinite block (prompt and command) - in dedicated step
236+
- Created mode (new files only)
237+
238+
239+
## Required Inputs
240+
241+
242+
**Files from Previous Steps** - Read these first:
243+
- `fire_results` (from `run_fire_tests`)
244+
245+
## Work Branch
246+
247+
Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD`
248+
249+
- If on a matching work branch: continue using it
250+
- If on main/master: create new branch with `git checkout -b deepwork/manual_tests-[instance]-$(date +%Y%m%d)`
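The branch rule can be written as a small decision function. In this sketch `branch_action` is hypothetical and the instance name is supplied by the caller. The step does not say what to do on an unrelated branch, so that case is reported as `unspecified` instead of being acted on.

```bash
# branch_action CURRENT_BRANCH INSTANCE [DATE]
# Prints "continue", "create <name>", or "unspecified".
branch_action() {
  current=$1
  instance=$2
  today=${3:-$(date +%Y%m%d)}
  case "$current" in
    deepwork/manual_tests-*-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
      echo continue ;;
    main|master)
      echo "create deepwork/manual_tests-$instance-$today" ;;
    *)
      echo unspecified ;;   # not covered by the step instructions
  esac
}

# Usage (git rev-parse --abbrev-ref HEAD prints the current branch name):
# action=$(branch_action "$(git rev-parse --abbrev-ref HEAD)" my-instance)
# case "$action" in create\ *) git checkout -b "${action#create }" ;; esac
```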

## Outputs

**Required outputs**:
- `infinite_block_results`

## Guardrails

- Do NOT skip prerequisite verification if this step has dependencies
- Do NOT produce partial outputs; complete all required outputs before finishing
- Do NOT proceed without required inputs; ask the user if any are missing
- Do NOT modify files outside the scope of this step's defined outputs

## Quality Validation

Stop hooks will automatically validate your work. The loop continues until all criteria pass.

**Criteria (all must be satisfied)**:
1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
2. **Sub-Agent Config**: Did all sub-agents use `model: "haiku"` and `max_turns: 5`?
3. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel)?
4. **Promise Tests Passed**: Did tests with promise tags complete WITHOUT blocking?
5. **Non-Promise Tests Blocked**: Did tests without promise tags correctly trigger blocking behavior?
6. **Reset Between Tests**: Was the reset step called internally after each test?
7. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
8. **Results Recorded**: Did the main agent track pass/fail status for each test case?

**To complete**: Include `<promise>✓ Quality Criteria Met</promise>` in your final response only after verifying ALL criteria are satisfied.
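The decision rule in the hook prompt (this file's front matter) boils down to this: accept if the exact promise tag is present or all criteria appear met, otherwise return a reason. In DeepWork that judgment is made by a prompt-type hook, not by a script. The sketch below only mirrors the documented response shapes, and `quality_gate` with its yes/no argument is hypothetical.

```bash
# quality_gate RESPONSE_TEXT CRITERIA_MET [REASON]
# Echoes the JSON verdict described in the hook prompt.
quality_gate() {
  case "$1" in
    *'<promise>✓ Quality Criteria Met</promise>'*)
      echo '{"ok": true}'
      return 0 ;;
  esac
  if [ "$2" = yes ]; then
    echo '{"ok": true}'
  else
    # REASON is inserted verbatim; it is NOT JSON-escaped.
    printf '{"ok": false, "reason": "**AGENT: TAKE ACTION** - %s"}\n' "${3:-unspecified criteria failed}"
  fi
}
```

The tag match is exact: `<promise>Quality Criteria Met</promise>` without the check mark does not satisfy the pattern.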

## On Completion

1. Verify outputs are created
2. Inform user: "Step 4/4 complete, outputs: infinite_block_results"
3. **Workflow complete**: All steps finished. Consider creating a PR to merge the work branch.

---

**Reference files**: `.deepwork/jobs/manual_tests/job.yml`, `.deepwork/jobs/manual_tests/steps/infinite_block_tests.md`
