|
14 | 14 | 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. |
15 | 15 | 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? |
16 | 16 | 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. |
17 | | - 4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination? |
18 | | - 5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? |
| 17 | + 4. **Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination? |
| 18 | + 5. **Early Termination**: If 2 tests failed, did testing halt immediately and were the results reported? |
19 | 19 | 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? |
20 | 20 |
|
21 | 21 | ## Instructions |
|
39 | 39 | 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. |
40 | 40 | 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? |
41 | 41 | 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. |
42 | | - 4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination? |
43 | | - 5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? |
| 42 | + 4. **Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination? |
| 43 | + 5. **Early Termination**: If 2 tests failed, did testing halt immediately and were the results reported? |
44 | 44 | 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? |
45 | 45 |
|
46 | 46 | ## Instructions |
@@ -118,13 +118,21 @@ For EACH test below, follow this cycle: |
118 | 118 | - If queue is empty, the hook did NOT fire at all |
119 | 119 | - Record the queue status along with the result |
120 | 120 | 5. **Record the result** - pass if hook fired (visible block OR queue entry), fail if neither |
121 | | -6. **Revert changes and clear queue**: |
| 121 | +6. **Revert changes and clear queue** (MANDATORY after each test): |
122 | 122 | ```bash |
123 | | - git checkout -- manual_tests/ |
| 123 | + git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml |
124 | 124 | rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true |
125 | 125 | ``` |
126 | | - The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared. |
127 | | -7. **Proceed to the next test** |
| 126 | + **Why this command sequence**: |
| 127 | + - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes) |
| 128 | + - `git checkout -- manual_tests/` - Reverts working tree to match HEAD |
| 129 | +   - `rm -f ...` - Removes the file created during the created-mode test; after the reset it is untracked, so `git checkout` alone would leave it in place |
| 130 | +   - Clearing the queue removes rules that have been shown (status=QUEUED) so they can fire again |
| 131 | +7. **Check for early termination**: If **2 tests have now failed**, immediately: |
| 132 | + - Stop running any remaining tests |
| 133 | + - Report the results summary showing which tests passed/failed |
| 134 | + - The job halts here - do NOT proceed with remaining tests |
| 135 | +8. **Proceed to the next test** (only if fewer than 2 failures) |
128 | 136 |
|
129 | 137 | **IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete, then run the revert step, before launching the next. |
130 | 138 |
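The mandatory cleanup in step 6 can be wrapped in a single shell function so it is run identically after every test. This is a minimal sketch, assuming it is invoked from the repository root; the function name `revert_after_test` is hypothetical and not part of the job definition.

```bash
#!/usr/bin/env bash
# Sketch of the step-6 cleanup. Assumes the current directory is the repo root.
# `revert_after_test` is a hypothetical helper name used only for illustration.
set -euo pipefail

revert_after_test() {
  # Unstage whatever rules_check staged with `git add -A`
  git reset -q HEAD manual_tests/
  # Restore tracked files under manual_tests/ to their HEAD contents
  git checkout -- manual_tests/
  # The created-mode test adds this file; it is untracked after the reset,
  # so checkout does not delete it
  rm -f manual_tests/test_created_mode/new_config.yml
  # Drop queued rule entries (status=QUEUED) so those rules can fire again
  rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
}
```

The reset has to come first: `git checkout -- manual_tests/` restores files from the index, so if the staged (modified) versions were still in the index, checkout would not bring back the HEAD contents.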
|
@@ -184,12 +192,13 @@ Record the result after each test: |
184 | 192 |
|
185 | 193 | ## Quality Criteria |
186 | 194 |
|
187 | | -- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly |
| 195 | +- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly |
188 | 196 | - **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel |
189 | | -- **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test |
| 197 | +- **Git reverted and queue cleared between tests**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` were run after each test |
190 | 198 | - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY |
191 | | -- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned |
192 | | -- **Results recorded**: Pass/fail status was recorded for each test |
| 199 | +- **Blocking behavior verified**: For each test run, the appropriate blocking hook fired automatically when the sub-agent returned |
| 200 | +- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported |
| 201 | +- **Results recorded**: Pass/fail status was recorded for each test run |
193 | 202 | - When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response |
194 | 203 |
|
195 | 204 | ## Reference |
@@ -262,8 +271,8 @@ Stop hooks will automatically validate your work. The loop continues until all c |
262 | 271 | 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. |
263 | 272 | 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? |
264 | 273 | 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. |
265 | | -4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination? |
266 | | -5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? |
| 274 | +4. **Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination? |
| 275 | +5. **Early Termination**: If 2 tests failed, did testing halt immediately and were the results reported? |
267 | 276 | 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? |
268 | 277 |
|
269 | 278 |
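The early-termination rule (stop after the second failure, then report) is easiest to check against a small loop. This is a minimal sketch only: in the real job each test is executed by a sub-agent spawned with the Task tool, so the runner passed as the first argument is a hypothetical stand-in, as are the names `run_suite` and `MAX_FAILURES`.

```bash
#!/usr/bin/env bash
# Sketch of steps 7-8: run tests serially, halt once 2 have failed, print a summary.
# The runner ($1) is hypothetical; it must return 0 for pass and non-zero for fail.

MAX_FAILURES=2

run_suite() {
  local runner=$1
  shift
  local failures=0 name
  local -a summary=()
  for name in "$@"; do
    if "$runner" "$name"; then
      summary+=("PASS $name")
    else
      summary+=("FAIL $name")
      failures=$((failures + 1))
    fi
    # The step-6 revert and queue clear would run here, before the check.
    if (( failures >= MAX_FAILURES )); then
      summary+=("HALTED after $failures failures")
      break
    fi
  done
  printf '%s\n' "${summary[@]}"
}
```

Checking the failure count after every test, rather than once at the end, is what guarantees that no further sub-agent is launched once the second failure has been recorded.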
|
|