Unsupervisedcom · nhorton · Jan 22, 2026
diff --git a/.claude/skills/manual_tests.run_fire_tests/SKILL.md b/.claude/skills/manual_tests.run_fire_tests/SKILL.md
@@ -14,8 +14,8 @@ hooks:
             1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
             2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
             3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-            4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-            5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+            4. **Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+            5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
             6. **Results Recorded**: Did the main agent track pass/fail status for each test case?
 
             ## Instructions
@@ -39,8 +39,8 @@ hooks:
             1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
             2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
             3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-            4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-            5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+            4. **Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+            5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
             6. **Results Recorded**: Did the main agent track pass/fail status for each test case?
 
             ## Instructions
@@ -118,13 +118,21 @@ For EACH test below, follow this cycle:
    - If queue is empty, the hook did NOT fire at all
    - Record the queue status along with the result
 5. **Record the result** - pass if hook fired (visible block OR queue entry), fail if neither
-6. **Revert changes and clear queue**:
+6. **Revert changes and clear queue** (MANDATORY after each test):
    ```bash
-   git checkout -- manual_tests/
+   git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
    rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
    ```
-   The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared.
-7. **Proceed to the next test**
+   **Why this command sequence**:
+   - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check runs `git add -A`, which stages all changes)
+   - `git checkout -- manual_tests/` - Restores the working tree from the index, which now matches HEAD
+   - `rm -f ...` - Removes any new files created during tests
+   - The queue clear removes rules that have been shown (status=QUEUED) so they can fire again
+7. **Check for early termination**: If **2 tests have now failed**, immediately:
+   - Stop running any remaining tests
+   - Report the results summary showing which tests passed/failed
+   - The job halts here - do NOT proceed with remaining tests
+8. **Proceed to the next test** (only if fewer than 2 failures)
 
 **IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next.
 
@@ -184,12 +192,13 @@ Record the result after each test:
 
 ## Quality Criteria
 
-- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
+- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
 - **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel
-- **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test
+- **Git reverted and queue cleared between tests**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` were run after each test
 - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY
-- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned
-- **Results recorded**: Pass/fail status was recorded for each test
+- **Blocking behavior verified**: For each test run, the appropriate blocking hook fired automatically when the sub-agent returned
+- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported
+- **Results recorded**: Pass/fail status was recorded for each test run
 - When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response
 
 ## Reference
@@ -262,8 +271,8 @@ Stop hooks will automatically validate your work. The loop continues until all c
 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+4. **Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
 6. **Results Recorded**: Did the main agent track pass/fail status for each test case?
 
 

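The serial cycle in the should-fire skill (run one test, revert and clear the queue, stop at the second failure) can be sketched as a plain loop. This is a minimal Python sketch under stated assumptions, not DeepWork code: `run_one` and `revert` are hypothetical callables standing in for the Task-tool sub-agent and the step 6 cleanup commands, and `MAX_FAILURES = 2` mirrors the early-termination rule.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_FAILURES = 2  # the early-termination threshold from step 7


@dataclass
class RunSummary:
    results: dict[str, bool] = field(default_factory=dict)  # test name -> passed?
    halted_early: bool = False


def run_fire_tests_serially(
    tests: list[str],
    run_one: Callable[[str], bool],  # hypothetical: ONE sub-agent; True if the hook fired
    revert: Callable[[], None],  # hypothetical: git reset/checkout, rm -f, clear queue
) -> RunSummary:
    summary = RunSummary()
    failures = 0
    for name in tests:
        try:
            # Wait for this sub-agent to finish before launching the next one.
            passed = run_one(name)
        finally:
            # Step 6: revert files and clear the queue after EVERY test.
            revert()
        summary.results[name] = passed
        if not passed:
            failures += 1
        # Step 7: stop at the second failure and report what ran so far.
        if failures >= MAX_FAILURES:
            summary.halted_early = True
            break
    return summary


if __name__ == "__main__":
    outcomes = {"trigger/safety": True, "set": False, "pair": False, "command action": True}
    summary = run_fire_tests_serially(
        list(outcomes), run_one=outcomes.__getitem__, revert=lambda: None
    )
    print(summary.results, summary.halted_early)
```

With the demo outcomes the loop stops after `pair`, the second failure, so `command action` never runs. Placing `revert()` in a `finally` clause keeps the mandatory cleanup running even if a test raises, and the failure check comes after the revert so the tree is clean when the job halts.
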
diff --git a/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md b/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md
@@ -14,8 +14,8 @@ hooks:
             1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
             2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
             3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-            4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-            5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+            4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+            5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
 
             ## Instructions
 
@@ -38,8 +38,8 @@ hooks:
             1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
             2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
             3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-            4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-            5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+            4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+            5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
 
             ## Instructions
 
@@ -118,7 +118,7 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo
 
    **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually.
 
-3. **Record the results**
+3. **Record the results and check for early termination**
 
    Track which tests passed and which failed:
 
@@ -133,23 +133,35 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo
    | Infinite Block Command | Promise tag | |
    | Created Mode | Modify existing | |
 
+   **EARLY TERMINATION**: If **2 tests have failed**, immediately:
+   1. Stop running any remaining tests
+   2. Revert all changes and clear queue (see step 4)
+   3. Report the results summary showing which tests passed/failed
+   4. Do NOT proceed to the next job step (the should-fire tests) - the job halts here
+
 4. **Revert all changes and clear queue**
 
-   After all tests complete, run:
+   **IMPORTANT**: This step is MANDATORY and must run regardless of whether tests passed or failed.
+
+   Run these commands to clean up:
    ```bash
-   git checkout -- manual_tests/
+   git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
    rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
    ```
 
-   This cleans up the test files AND clears the rules queue before the "should fire" tests run. The queue must be cleared because rules that have already been shown to the agent (status=QUEUED) won't fire again until the queue is cleared.
+   **Why this command sequence**:
+   - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check runs `git add -A`, which stages all changes)
+   - `git checkout -- manual_tests/` - Restores the working tree from the index, which now matches HEAD
+   - `rm -f manual_tests/test_created_mode/new_config.yml` - Removes any new files created during tests
+   - The queue clear removes rules that have been shown (status=QUEUED) so they can fire again
 
 ## Quality Criteria
 
 - **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
 - **Parallel execution**: All 8 sub-agents were launched in a single message (parallel)
 - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check
-- **No unexpected blocks**: All tests passed - no blocking hooks fired
-- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed
+- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported
+- **Changes reverted and queue cleared**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` were run after tests completed (regardless of pass/fail)
 - When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response
 
 ## Reference
@@ -217,8 +229,8 @@ Stop hooks will automatically validate your work. The loop continues until all c
 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
 
 
 **To complete**: Include `<promise>✓ Quality Criteria Met</promise>` in your final response only after verifying ALL criteria are satisfied.

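In the should-NOT-fire skill the order differs: all sub-agents run at once, the step 4 cleanup always runs, and the two-failure check decides whether the job continues to the should-fire tests. A minimal sketch, assuming hypothetical `run_one` and `revert` callables; `ThreadPoolExecutor` only models the parallel launch and says nothing about how the Task tool itself schedules sub-agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_FAILURES = 2  # the early-termination threshold


def run_not_fire_tests_in_parallel(
    tests: list[str],
    run_one: Callable[[str], bool],  # hypothetical: one sub-agent edit; True if NO block fired
    revert: Callable[[], None],  # hypothetical: the step 4 cleanup commands
) -> tuple[dict[str, bool], bool]:
    """Return (results, proceed_to_fire_tests)."""
    try:
        # Launch every sub-agent at once, then wait for all of them to return.
        with ThreadPoolExecutor(max_workers=max(1, len(tests))) as pool:
            outcomes = list(pool.map(run_one, tests))
    finally:
        # Step 4 is mandatory whether tests passed, failed, or raised.
        revert()
    results = dict(zip(tests, outcomes))
    failures = sum(1 for passed in results.values() if not passed)
    # Two or more failures halt the job before the should-fire tests.
    return results, failures < MAX_FAILURES
```

Because every test has already run by the time results are recorded, "early termination" here means skipping the next job step rather than cancelling in-flight sub-agents.
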
diff --git a/.deepwork/jobs/manual_tests/job.yml b/.deepwork/jobs/manual_tests/job.yml
@@ -1,5 +1,5 @@
 name: manual_tests
-version: "1.1.0"
+version: "1.2.1"
 summary: "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly."
 description: |
   A workflow for running manual tests that validate DeepWork rules/hooks fire correctly.
@@ -28,6 +28,10 @@ description: |
   - Created mode (new files only)
 
 changelog:
+  - version: "1.2.1"
+    changes: "Fixed incomplete revert - now uses git reset HEAD to unstage files (rules_check stages with git add -A)"
+  - version: "1.2.0"
+    changes: "Added early termination on 2 test failures; emphasized mandatory file revert and queue clear after each step"
   - version: "1.1.0"
     changes: "Added rules queue clearing between tests to prevent anti-infinite-loop mechanism from blocking tests"
   - version: "1.0.0"
@@ -46,8 +50,8 @@ steps:
       - "**Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly."
       - "**Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?"
       - "**Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command."
-      - "**All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?"
-      - "**Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?"
+      - "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?"
+      - "**Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?"
 
   - id: run_fire_tests
     name: "Run Should-Fire Tests"
@@ -64,6 +68,6 @@ steps:
       - "**Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly."
       - "**Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?"
       - "**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command."
-      - "**Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?"
-      - "**All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?"
+      - "**Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?"
+      - "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?"
       - "**Results Recorded**: Did the main agent track pass/fail status for each test case?"
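The 1.2.1 fix hinges on where `git checkout -- <path>` copies from: the index, not HEAD. Once rules_check has run `git add -A`, the index already holds the sub-agent's edits, so checkout alone writes those edits straight back. The toy model below is a deliberate simplification (three dicts standing in for HEAD, the index, and the working tree; `existing_file.txt` is a hypothetical file name) that mirrors only the copy semantics of the commands in the revert step.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Simplified model of git's three trees; each maps path -> contents.

    Not how git stores data: it only mirrors which tree each command copies from.
    """

    head: dict[str, str]
    index: dict[str, str] = field(default_factory=dict)
    worktree: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start clean: index and working tree both match HEAD.
        self.index = dict(self.head)
        self.worktree = dict(self.head)

    def add_all(self) -> None:
        # `git add -A`: stage everything, so the index mirrors the working tree
        # (new, modified, and deleted files alike).
        self.index = dict(self.worktree)

    def reset_head(self) -> None:
        # `git reset HEAD <dir>`: index entries return to HEAD's state; staged
        # new files leave the index and become untracked.
        self.index = dict(self.head)

    def checkout_index(self) -> None:
        # `git checkout -- <dir>`: overwrite working-tree files from the INDEX.
        # Untracked files (absent from the index) are not touched.
        for path, contents in self.index.items():
            self.worktree[path] = contents

    def rm(self, path: str) -> None:
        # `rm -f <path>`: remove the file, no error if it is already gone.
        self.worktree.pop(path, None)


def simulate(with_reset: bool) -> dict[str, str]:
    repo = ToyRepo(head={"existing_file.txt": "original"})  # hypothetical file
    # A sub-agent edits a tracked file and creates a new one.
    repo.worktree["existing_file.txt"] = "edited by sub-agent"
    repo.worktree["test_created_mode/new_config.yml"] = "created by sub-agent"
    repo.add_all()  # rules_check stages the changes
    if with_reset:
        repo.reset_head()
    repo.checkout_index()
    repo.rm("test_created_mode/new_config.yml")
    return repo.worktree
```

`simulate(with_reset=False)` leaves the sub-agent's edit in place, while `simulate(with_reset=True)` returns only the original file, which is the incomplete-revert behavior the 1.2.1 changelog entry describes. The `rm -f` is still needed in both cases because neither reset nor checkout deletes untracked files.
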