
Commit cbbc686

nhorton and claude authored
Add early termination on 2 test failures to manual_tests job (#116)
* Add early termination on 2 test failures and emphasize file revert

  - Process now halts as soon as 2 tests have failed and reports results
  - Each step must revert files and clear queue after completion (mandatory)
  - Updated quality criteria to reflect these requirements
  - Bumped version to 1.2.0

* Fix incomplete git revert - add git reset HEAD to unstage files

  The rules_check hook uses `git add -A` to stage files for change detection. The previous `git checkout -- manual_tests/` only reverted the working tree but left files staged in the index, causing cross-contamination between tests.

  Fix: Use `git reset HEAD manual_tests/ && git checkout -- manual_tests/` to properly unstage and revert all changes.

* Sync skills after manual_tests job updates

  Auto-generated skill files updated by deepwork install to reflect v1.2.1 changes (early termination on 2 failures, complete git revert).

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent c4bff77 commit cbbc686
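The revert fix depends on how `git checkout -- <path>` behaves: it restores files from the index, not from HEAD. After rules_check runs `git add -A`, the edit is already in the index, so the old cleanup writes it straight back into the working tree. The sketch below reproduces this in a throwaway repository. It assumes `bash` and `git` are on the PATH, and `test_file.txt` is a hypothetical stand-in for a real test file.

```shell
#!/usr/bin/env bash
# Sketch: why `git checkout -- manual_tests/` alone does not undo a staged edit.
# Throwaway repo; test_file.txt is a hypothetical stand-in for a test file.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

git init -q
mkdir -p manual_tests
echo "original" > manual_tests/test_file.txt
git add -A
git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
  commit -q -m "baseline"

# A sub-agent edits the file, then the hook stages everything (as rules_check does).
echo "edited by sub-agent" > manual_tests/test_file.txt
git add -A

# Old cleanup: checkout copies the INDEX into the working tree,
# so the staged edit is written straight back.
git checkout -- manual_tests/
after_old=$(cat manual_tests/test_file.txt)

# New cleanup: reset the index to HEAD first, then check out from that index.
git reset -q HEAD manual_tests/ && git checkout -- manual_tests/
after_new=$(cat manual_tests/test_file.txt)

echo "after old cleanup: $after_old"
echo "after new cleanup: $after_new"
```

The old sequence should leave the sub-agent's edit in place. Unstaging first with `git reset HEAD` brings back the committed content.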


7 files changed, +130 -63 lines changed


.claude/skills/manual_tests.run_fire_tests/SKILL.md

Lines changed: 23 additions & 14 deletions
@@ -14,8 +14,8 @@ hooks:
 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
 6. **Results Recorded**: Did the main agent track pass/fail status for each test case?

 ## Instructions
@@ -39,8 +39,8 @@ hooks:
 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
 6. **Results Recorded**: Did the main agent track pass/fail status for each test case?

 ## Instructions
@@ -118,13 +118,21 @@ For EACH test below, follow this cycle:
 - If queue is empty, the hook did NOT fire at all
 - Record the queue status along with the result
 5. **Record the result** - pass if hook fired (visible block OR queue entry), fail if neither
-6. **Revert changes and clear queue**:
+6. **Revert changes and clear queue** (MANDATORY after each test):
 ```bash
-git checkout -- manual_tests/
+git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
 rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
 ```
-The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared.
-7. **Proceed to the next test**
+**Why this command sequence**:
+- `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes)
+- `git checkout -- manual_tests/` - Reverts working tree to match HEAD
+- `rm -f ...` - Removes any new files created during tests
+- The queue clear removes rules that have been shown (status=QUEUED) so they can fire again
+7. **Check for early termination**: If **2 tests have now failed**, immediately:
+- Stop running any remaining tests
+- Report the results summary showing which tests passed/failed
+- The job halts here - do NOT proceed with remaining tests
+8. **Proceed to the next test** (only if fewer than 2 failures)

 **IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next.

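The per-test cycle in `run_fire_tests` is: run a test, revert and clear the queue, then check the failure count. It can be sketched as a small bash loop. This is an illustration only. `run_test` and `cleanup` are hypothetical helpers, the pass/fail outcomes are made up, and in the real job a sub-agent makes the edits and the blocking hook fires on its own. The point is the ordering: cleanup runs after every test, pass or fail, and the early-termination check comes after it.

```shell
#!/usr/bin/env bash
# Sketch of the serial "should fire" cycle with early termination.
# Hypothetical: run_test fakes the "did the hook fire?" observation and
# cleanup only counts calls, so this runs outside the real repo.
set -u

MAX_FAILURES=2
ran=0
failures=0
cleanups=0
failed=""

run_test() {
  case "$1" in
    set|command_action) return 1 ;;  # made-up outcome: hook did NOT fire
    *) return 0 ;;                   # made-up outcome: hook fired
  esac
}

cleanup() {
  # In the job, step 6 would run here:
  #   git reset HEAD manual_tests/ && git checkout -- manual_tests/ \
  #     && rm -f manual_tests/test_created_mode/new_config.yml
  #   rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
  cleanups=$((cleanups + 1))
}

for t in trigger_safety set pair command_action multi_safety \
         infinite_block_prompt infinite_block_command created; do
  ran=$((ran + 1))
  if run_test "$t"; then
    echo "PASS $t"
  else
    echo "FAIL $t"
    failures=$((failures + 1))
    failed="${failed:+$failed }$t"
  fi
  cleanup  # mandatory after every test, pass or fail
  if [ "$failures" -ge "$MAX_FAILURES" ]; then
    echo "Halting after $ran tests; failed: $failed"
    break
  fi
done
```

With these made-up outcomes, the loop runs four tests and cleans up four times. It halts once `command_action` becomes the second failure.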
@@ -184,12 +192,13 @@ Record the result after each test:

 ## Quality Criteria

-- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
+- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
 - **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel
-- **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test
+- **Git reverted and queue cleared between tests**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test
 - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY
-- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned
-- **Results recorded**: Pass/fail status was recorded for each test
+- **Blocking behavior verified**: For each test run, the appropriate blocking hook fired automatically when the sub-agent returned
+- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported
+- **Results recorded**: Pass/fail status was recorded for each test run
 - When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response

 ## Reference
@@ -262,8 +271,8 @@ Stop hooks will automatically validate your work. The loop continues until all c
 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
 6. **Results Recorded**: Did the main agent track pass/fail status for each test case?


.claude/skills/manual_tests.run_not_fire_tests/SKILL.md

Lines changed: 24 additions & 12 deletions
@@ -14,8 +14,8 @@ hooks:
 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?

 ## Instructions

@@ -38,8 +38,8 @@ hooks:
 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?

 ## Instructions

@@ -118,7 +118,7 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo

 **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually.

-3. **Record the results**
+3. **Record the results and check for early termination**

 Track which tests passed and which failed:

@@ -133,23 +133,35 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo
 | Infinite Block Command | Promise tag | |
 | Created Mode | Modify existing | |

+**EARLY TERMINATION**: If **2 tests have failed**, immediately:
+1. Stop running any remaining tests
+2. Revert all changes and clear queue (see step 4)
+3. Report the results summary showing which tests passed/failed
+4. Do NOT proceed to the next step - the job halts here
+
 4. **Revert all changes and clear queue**

-After all tests complete, run:
+**IMPORTANT**: This step is MANDATORY and must run regardless of whether tests passed or failed.
+
+Run these commands to clean up:
 ```bash
-git checkout -- manual_tests/
+git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
 rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
 ```

-This cleans up the test files AND clears the rules queue before the "should fire" tests run. The queue must be cleared because rules that have already been shown to the agent (status=QUEUED) won't fire again until the queue is cleared.
+**Why this command sequence**:
+- `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes)
+- `git checkout -- manual_tests/` - Reverts working tree to match HEAD
+- `rm -f manual_tests/test_created_mode/new_config.yml` - Removes any new files created during tests
+- The queue clear removes rules that have been shown (status=QUEUED) so they can fire again

 ## Quality Criteria

 - **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
 - **Parallel execution**: All 8 sub-agents were launched in a single message (parallel)
 - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check
-- **No unexpected blocks**: All tests passed - no blocking hooks fired
-- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed
+- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported
+- **Changes reverted and queue cleared**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed (regardless of pass/fail)
 - When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response

 ## Reference
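The `rm -f manual_tests/test_created_mode/new_config.yml` part of the cleanup handles a case that `git reset HEAD` plus `git checkout --` cannot: a file the test created. Resetting removes the new file from the index, which leaves it untracked, and `git checkout --` does not touch untracked files. The sketch below is a minimal reproduction in a throwaway repository. It assumes `bash` and `git` are available, and `existing.yml` is a hypothetical placeholder.

```shell
#!/usr/bin/env bash
# Sketch: why the cleanup also needs `rm -f` for a file that a test CREATES.
# Throwaway repo; existing.yml is a hypothetical placeholder, while
# new_config.yml matches the path removed by the job's cleanup command.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

git init -q
mkdir -p manual_tests/test_created_mode
echo "tracked" > manual_tests/test_created_mode/existing.yml
git add -A
git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
  commit -q -m "baseline"

# A created-mode test adds a brand-new file; the hook stages it with `git add -A`.
echo "new" > manual_tests/test_created_mode/new_config.yml
git add -A

# reset unstages the new file, which leaves it untracked; checkout ignores untracked files.
git reset -q HEAD manual_tests/ && git checkout -- manual_tests/
if [ -e manual_tests/test_created_mode/new_config.yml ]; then survived=yes; else survived=no; fi

# The explicit rm -f is what actually deletes it.
rm -f manual_tests/test_created_mode/new_config.yml
porcelain=$(git status --porcelain)

echo "new file survived reset+checkout: $survived"
echo "status after rm -f: '${porcelain}'"
```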
@@ -217,8 +229,8 @@ Stop hooks will automatically validate your work. The loop continues until all c
 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?


 **To complete**: Include `<promise>✓ Quality Criteria Met</promise>` in your final response only after verifying ALL criteria are satisfied.

.deepwork/jobs/manual_tests/job.yml

Lines changed: 9 additions & 5 deletions
@@ -1,5 +1,5 @@
 name: manual_tests
-version: "1.1.0"
+version: "1.2.1"
 summary: "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly."
 description: |
 A workflow for running manual tests that validate DeepWork rules/hooks fire correctly.
@@ -28,6 +28,10 @@ description: |
 - Created mode (new files only)

 changelog:
+- version: "1.2.1"
+changes: "Fixed incomplete revert - now uses git reset HEAD to unstage files (rules_check stages with git add -A)"
+- version: "1.2.0"
+changes: "Added early termination on 2 test failures; emphasized mandatory file revert and queue clear after each step"
 - version: "1.1.0"
 changes: "Added rules queue clearing between tests to prevent anti-infinite-loop mechanism from blocking tests"
 - version: "1.0.0"
@@ -46,8 +50,8 @@ steps:
 - "**Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly."
 - "**Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?"
 - "**Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command."
-- "**All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?"
-- "**Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?"
+- "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?"
+- "**Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?"

 - id: run_fire_tests
 name: "Run Should-Fire Tests"
@@ -64,6 +68,6 @@ steps:
 - "**Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly."
 - "**Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?"
 - "**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command."
-- "**Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?"
-- "**All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?"
+- "**Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?"
+- "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?"
 - "**Results Recorded**: Did the main agent track pass/fail status for each test case?"
