From 2734beb08836fa5197bddc484bf2d3b2943a6885 Mon Sep 17 00:00:00 2001 From: Claude Date: Thu, 22 Jan 2026 19:33:54 +0000 Subject: [PATCH 1/3] Add early termination on 2 test failures and emphasize file revert - Process now halts as soon as 2 tests have failed and reports results - Each step must revert files and clear queue after completion (mandatory) - Updated quality criteria to reflect these requirements - Bumped version to 1.2.0 --- .deepwork/jobs/manual_tests/job.yml | 12 +++++++----- .../jobs/manual_tests/steps/run_fire_tests.md | 15 ++++++++++----- .../manual_tests/steps/run_not_fire_tests.md | 16 ++++++++++++---- 3 files changed, 29 insertions(+), 14 deletions(-) diff --git a/.deepwork/jobs/manual_tests/job.yml b/.deepwork/jobs/manual_tests/job.yml index 8e06f359..05a8ce2d 100644 --- a/.deepwork/jobs/manual_tests/job.yml +++ b/.deepwork/jobs/manual_tests/job.yml @@ -1,5 +1,5 @@ name: manual_tests -version: "1.1.0" +version: "1.2.0" summary: "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly." description: | A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. @@ -28,6 +28,8 @@ description: | - Created mode (new files only) changelog: + - version: "1.2.0" + changes: "Added early termination on 2 test failures; emphasized mandatory file revert and queue clear after each step" - version: "1.1.0" changes: "Added rules queue clearing between tests to prevent anti-infinite-loop mechanism from blocking tests" - version: "1.0.0" @@ -46,8 +48,8 @@ steps: - "**Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly." - "**Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?" - "**Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? 
The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command." - - "**All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?" - - "**Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?" + - "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?" + - "**Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?" - id: run_fire_tests name: "Run Should-Fire Tests" @@ -64,6 +66,6 @@ steps: - "**Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly." - "**Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?" - "**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command." - - "**Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?" - - "**All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?" + - "**Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?" + - "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?" 
- "**Results Recorded**: Did the main agent track pass/fail status for each test case?" diff --git a/.deepwork/jobs/manual_tests/steps/run_fire_tests.md b/.deepwork/jobs/manual_tests/steps/run_fire_tests.md index f3e887fb..ba13a78e 100644 --- a/.deepwork/jobs/manual_tests/steps/run_fire_tests.md +++ b/.deepwork/jobs/manual_tests/steps/run_fire_tests.md @@ -46,13 +46,17 @@ For EACH test below, follow this cycle: - If queue is empty, the hook did NOT fire at all - Record the queue status along with the result 5. **Record the result** - pass if hook fired (visible block OR queue entry), fail if neither -6. **Revert changes and clear queue**: +6. **Revert changes and clear queue** (MANDATORY after each test): ```bash git checkout -- manual_tests/ rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true ``` The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared. -7. **Proceed to the next test** +7. **Check for early termination**: If **2 tests have now failed**, immediately: + - Stop running any remaining tests + - Report the results summary showing which tests passed/failed + - The job halts here - do NOT proceed with remaining tests +8. **Proceed to the next test** (only if fewer than 2 failures) **IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next. 
@@ -112,12 +116,13 @@ Record the result after each test: ## Quality Criteria -- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly - **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel - **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY -- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned -- **Results recorded**: Pass/fail status was recorded for each test +- **Blocking behavior verified**: For each test run, the appropriate blocking hook fired automatically when the sub-agent returned +- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported +- **Results recorded**: Pass/fail status was recorded for each test run - When all criteria are met, include `✓ Quality Criteria Met` in your response ## Reference diff --git a/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md b/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md index 38cab325..c41b5fb9 100644 --- a/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md +++ b/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md @@ -52,7 +52,7 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually. -3. **Record the results** +3. 
**Record the results and check for early termination** Track which tests passed and which failed: @@ -67,9 +67,17 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo | Infinite Block Command | Promise tag | | | Created Mode | Modify existing | | + **EARLY TERMINATION**: If **2 tests have failed**, immediately: + 1. Stop running any remaining tests + 2. Revert all changes and clear queue (see step 4) + 3. Report the results summary showing which tests passed/failed + 4. Do NOT proceed to the next step - the job halts here + 4. **Revert all changes and clear queue** - After all tests complete, run: + **IMPORTANT**: This step is MANDATORY and must run regardless of whether tests passed or failed. + + Run these commands to clean up: ```bash git checkout -- manual_tests/ rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true @@ -82,8 +90,8 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo - **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly - **Parallel execution**: All 8 sub-agents were launched in a single message (parallel) - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check -- **No unexpected blocks**: All tests passed - no blocking hooks fired -- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed +- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported +- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed (regardless of pass/fail) - When all criteria are met, include `✓ Quality Criteria Met` in your response ## Reference From 8a327d339e14a2b0b9042355c7071163129f2522 Mon Sep 17 00:00:00 2001 From: Claude Date: Thu, 
22 Jan 2026 19:38:00 +0000 Subject: [PATCH 2/3] Fix incomplete git revert - add git reset HEAD to unstage files The rules_check hook uses `git add -A` to stage files for change detection. The previous `git checkout -- manual_tests/` restores files from the index rather than from HEAD, so any change already staged by rules_check survived the revert and leaked into later tests. Fix: Use `git reset HEAD manual_tests/ && git checkout -- manual_tests/` to unstage and revert all changes, and `rm -f manual_tests/test_created_mode/new_config.yml` to delete the file created by the created-mode test. --- .deepwork/jobs/manual_tests/job.yml | 8 +++++--- .deepwork/jobs/manual_tests/steps/run_fire_tests.md | 10 +++++++--- .../jobs/manual_tests/steps/run_not_fire_tests.md | 10 +++++++--- 3 files changed, 19 insertions(+), 9 deletions(-) diff --git a/.deepwork/jobs/manual_tests/job.yml b/.deepwork/jobs/manual_tests/job.yml index 05a8ce2d..b2662c2c 100644 --- a/.deepwork/jobs/manual_tests/job.yml +++ b/.deepwork/jobs/manual_tests/job.yml @@ -1,5 +1,5 @@ name: manual_tests -version: "1.2.0" +version: "1.2.1" summary: "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly." description: | A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. @@ -28,6 +28,8 @@ description: | - Created mode (new files only) changelog: + - version: "1.2.1" + changes: "Fixed incomplete revert - now uses git reset HEAD to unstage files (rules_check stages with git add -A)" - version: "1.2.0" changes: "Added early termination on 2 test failures; emphasized mandatory file revert and queue clear after each step" - version: "1.1.0" @@ -49,7 +51,7 @@ steps: - "**Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?" - "**Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command."
- "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?" - - "**Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?" + - "**Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?" - id: run_fire_tests name: "Run Should-Fire Tests" @@ -66,6 +68,6 @@ steps: - "**Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly." - "**Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?" - "**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command." - - "**Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?" + - "**Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?" - "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?" - "**Results Recorded**: Did the main agent track pass/fail status for each test case?" diff --git a/.deepwork/jobs/manual_tests/steps/run_fire_tests.md b/.deepwork/jobs/manual_tests/steps/run_fire_tests.md index ba13a78e..b326fc01 100644 --- a/.deepwork/jobs/manual_tests/steps/run_fire_tests.md +++ b/.deepwork/jobs/manual_tests/steps/run_fire_tests.md @@ -48,10 +48,14 @@ For EACH test below, follow this cycle: 5. 
**Record the result** - pass if hook fired (visible block OR queue entry), fail if neither 6. **Revert changes and clear queue** (MANDATORY after each test): ```bash - git checkout -- manual_tests/ + git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true ``` - The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared. + **Why this command sequence**: + - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes) + - `git checkout -- manual_tests/` - Reverts working tree to match HEAD + - `rm -f ...` - Removes any new files created during tests + - The queue clear removes rules that have been shown (status=QUEUED) so they can fire again 7. **Check for early termination**: If **2 tests have now failed**, immediately: - Stop running any remaining tests - Report the results summary showing which tests passed/failed @@ -118,7 +122,7 @@ Record the result after each test: - **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly - **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel -- **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test +- **Git reverted and queue cleared between tests**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY - **Blocking behavior verified**: For each test run, the appropriate blocking hook fired automatically when the sub-agent returned - **Early termination on 2 failures**: If 2 tests failed, testing halted 
immediately and results were reported diff --git a/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md b/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md index c41b5fb9..f5d58393 100644 --- a/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md +++ b/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md @@ -79,11 +79,15 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo Run these commands to clean up: ```bash - git checkout -- manual_tests/ + git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true ``` - This cleans up the test files AND clears the rules queue before the "should fire" tests run. The queue must be cleared because rules that have already been shown to the agent (status=QUEUED) won't fire again until the queue is cleared. + **Why this command sequence**: + - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes) + - `git checkout -- manual_tests/` - Reverts working tree to match HEAD + - `rm -f manual_tests/test_created_mode/new_config.yml` - Removes any new files created during tests + - The queue clear removes rules that have been shown (status=QUEUED) so they can fire again ## Quality Criteria @@ -91,7 +95,7 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo - **Parallel execution**: All 8 sub-agents were launched in a single message (parallel) - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported -- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed (regardless of pass/fail) +- **Changes reverted and queue cleared**: `git reset 
HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed (regardless of pass/fail) - When all criteria are met, include `✓ Quality Criteria Met` in your response ## Reference From 4c31733961ec744863cd0c634cedf89d53db9c15 Mon Sep 17 00:00:00 2001 From: Claude Date: Thu, 22 Jan 2026 19:40:08 +0000 Subject: [PATCH 3/3] Sync skills after manual_tests job updates Auto-generated skill files updated by deepwork install to reflect v1.2.1 changes (early termination on 2 failures, complete git revert). --- .../manual_tests.run_fire_tests/SKILL.md | 37 ++++++++++++------- .../manual_tests.run_not_fire_tests/SKILL.md | 36 ++++++++++++------ .../skills/manual_tests/run_fire_tests.toml | 29 ++++++++++----- .../manual_tests/run_not_fire_tests.toml | 28 ++++++++++---- 4 files changed, 86 insertions(+), 44 deletions(-) diff --git a/.claude/skills/manual_tests.run_fire_tests/SKILL.md b/.claude/skills/manual_tests.run_fire_tests/SKILL.md index 5b1ac3d5..86edc039 100644 --- a/.claude/skills/manual_tests.run_fire_tests/SKILL.md +++ b/.claude/skills/manual_tests.run_fire_tests/SKILL.md @@ -14,8 +14,8 @@ hooks: 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. - 4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination? - 5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? + 4. 
**Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination? + 5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? ## Instructions @@ -39,8 +39,8 @@ hooks: 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. - 4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination? - 5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? + 4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination? + 5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? ## Instructions @@ -118,13 +118,21 @@ For EACH test below, follow this cycle: - If queue is empty, the hook did NOT fire at all - Record the queue status along with the result 5. **Record the result** - pass if hook fired (visible block OR queue entry), fail if neither -6. **Revert changes and clear queue**: +6. 
**Revert changes and clear queue** (MANDATORY after each test): ```bash - git checkout -- manual_tests/ + git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true ``` - The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared. -7. **Proceed to the next test** + **Why this command sequence**: + - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes) + - `git checkout -- manual_tests/` - Reverts working tree to match HEAD + - `rm -f ...` - Removes any new files created during tests + - The queue clear removes rules that have been shown (status=QUEUED) so they can fire again +7. **Check for early termination**: If **2 tests have now failed**, immediately: + - Stop running any remaining tests + - Report the results summary showing which tests passed/failed + - The job halts here - do NOT proceed with remaining tests +8. **Proceed to the next test** (only if fewer than 2 failures) **IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next. 
@@ -184,12 +192,13 @@ Record the result after each test: ## Quality Criteria -- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly - **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel -- **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test +- **Git reverted and queue cleared between tests**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY -- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned -- **Results recorded**: Pass/fail status was recorded for each test +- **Blocking behavior verified**: For each test run, the appropriate blocking hook fired automatically when the sub-agent returned +- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported +- **Results recorded**: Pass/fail status was recorded for each test run - When all criteria are met, include `✓ Quality Criteria Met` in your response ## Reference @@ -262,8 +271,8 @@ Stop hooks will automatically validate your work. The loop continues until all c 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? 3. 
**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. -4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination? -5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? +4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination? +5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? diff --git a/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md b/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md index 02ef9d9b..2597c0f3 100644 --- a/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md +++ b/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md @@ -14,8 +14,8 @@ hooks: 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly. 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)? 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command. - 4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)? - 5. 
**Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? + 4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? + 5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? ## Instructions @@ -38,8 +38,8 @@ hooks: 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly. 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)? 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command. - 4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)? - 5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? + 4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? + 5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? ## Instructions @@ -118,7 +118,7 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually. -3. **Record the results** +3. 
**Record the results and check for early termination** Track which tests passed and which failed: @@ -133,23 +133,35 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo | Infinite Block Command | Promise tag | | | Created Mode | Modify existing | | + **EARLY TERMINATION**: If **2 tests have failed**, immediately: + 1. Stop running any remaining tests + 2. Revert all changes and clear queue (see step 4) + 3. Report the results summary showing which tests passed/failed + 4. Do NOT proceed to the next step - the job halts here + 4. **Revert all changes and clear queue** - After all tests complete, run: + **IMPORTANT**: This step is MANDATORY and must run regardless of whether tests passed or failed. + + Run these commands to clean up: ```bash - git checkout -- manual_tests/ + git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true ``` - This cleans up the test files AND clears the rules queue before the "should fire" tests run. The queue must be cleared because rules that have already been shown to the agent (status=QUEUED) won't fire again until the queue is cleared. 
+ **Why this command sequence**: + - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes) + - `git checkout -- manual_tests/` - Reverts working tree to match HEAD + - `rm -f manual_tests/test_created_mode/new_config.yml` - Removes any new files created during tests + - The queue clear removes rules that have been shown (status=QUEUED) so they can fire again ## Quality Criteria - **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly - **Parallel execution**: All 8 sub-agents were launched in a single message (parallel) - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check -- **No unexpected blocks**: All tests passed - no blocking hooks fired -- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed +- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported +- **Changes reverted and queue cleared**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed (regardless of pass/fail) - When all criteria are met, include `✓ Quality Criteria Met` in your response ## Reference @@ -217,8 +229,8 @@ Stop hooks will automatically validate your work. The loop continues until all c 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly. 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)? 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command. -4. 
**All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)? -5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? +4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? +5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? **To complete**: Include `✓ Quality Criteria Met` in your final response only after verifying ALL criteria are satisfied. diff --git a/.gemini/skills/manual_tests/run_fire_tests.toml b/.gemini/skills/manual_tests/run_fire_tests.toml index dbd71b57..ba8e07d3 100644 --- a/.gemini/skills/manual_tests/run_fire_tests.toml +++ b/.gemini/skills/manual_tests/run_fire_tests.toml @@ -70,13 +70,21 @@ For EACH test below, follow this cycle: - If queue is empty, the hook did NOT fire at all - Record the queue status along with the result 5. **Record the result** - pass if hook fired (visible block OR queue entry), fail if neither -6. **Revert changes and clear queue**: +6. **Revert changes and clear queue** (MANDATORY after each test): ```bash - git checkout -- manual_tests/ + git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true ``` - The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared. -7. 
**Proceed to the next test** + **Why this command sequence**: + - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A`, which stages changes) + - `git checkout -- manual_tests/` - Restores the working tree from the index, which now matches HEAD + - `rm -f ...` - Deletes the file created by the created-mode test (`git checkout` does not remove untracked files) + - Clearing the queue removes rules that have already been shown (status=QUEUED) so they can fire again +7. **Check for early termination**: If **2 tests have now failed**, immediately: + - Stop running any remaining tests + - Report the results summary showing which tests passed/failed + - The job halts here - do NOT proceed with remaining tests +8. **Proceed to the next test** (only if fewer than 2 failures) **IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next. @@ -136,12 +144,13 @@ Record the result after each test: ## Quality Criteria -- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly - **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel -- **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test +- **Git reverted and queue cleared between tests**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` were run after each test - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY -- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned +- **Blocking behavior verified**: For each test run, the
appropriate blocking hook fired automatically when the sub-agent returned +- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported +- **Results recorded**: Pass/fail status was recorded for each test run - When all criteria are met, include `✓ Quality Criteria Met` in your response ## Reference @@ -207,8 +216,8 @@ Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD` 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. -4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination? -5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? +4. **Git Reverted Between Tests**: Were `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination? +5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? 6. **Results Recorded**: Did the main agent track pass/fail status for each test case?
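The serial "should fire" cycle above (revert after every test, then stop once 2 tests have failed) can be sketched in shell to make the ordering concrete. This is a hypothetical illustration, not part of the job definition: in the real workflow each test is a sub-agent launched with the Task tool and pass/fail is judged by the main agent, so `run_one_test` is an assumed placeholder, and `revert_and_clear`, `run_serially`, and `MAX_FAILURES` are names invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the serial "should fire" cycle (not part of the job).
# ASSUMPTION: run_one_test is a placeholder; in the real workflow each test is
# a sub-agent launched via the Task tool, judged by the main agent.

MAX_FAILURES=2  # early-termination threshold from the step instructions

# Revert files and clear the rules queue, mirroring the documented commands.
revert_and_clear() {
  git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
  rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
}

# Run the named tests one at a time; return 1 if the run was halted early.
run_serially() {
  local failures=0 name
  for name in "$@"; do
    if run_one_test "$name"; then
      echo "PASS $name"
    else
      echo "FAIL $name"
      failures=$((failures + 1))
    fi
    revert_and_clear  # mandatory after every test, pass or fail
    if [ "$failures" -ge "$MAX_FAILURES" ]; then
      echo "HALT after $failures failures"
      return 1
    fi
  done
  return 0
}
```

With stubbed results for `ok1 bad1 ok2 bad2 ok3`, the loop cleans up four times and halts before `ok3` is attempted. Cleaning up before checking the threshold means the repository and queue are left clean even when the run stops early.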
## On Completion diff --git a/.gemini/skills/manual_tests/run_not_fire_tests.toml b/.gemini/skills/manual_tests/run_not_fire_tests.toml index d8139215..322b20b8 100644 --- a/.gemini/skills/manual_tests/run_not_fire_tests.toml +++ b/.gemini/skills/manual_tests/run_not_fire_tests.toml @@ -72,7 +72,7 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually. -3. **Record the results** +3. **Record the results and check for early termination** Track which tests passed and which failed: @@ -87,23 +87,35 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo | Infinite Block Command | Promise tag | | | Created Mode | Modify existing | | + **EARLY TERMINATION**: If **2 tests have failed**, immediately: + 1. Stop running any remaining tests + 2. Revert all changes and clear queue (see step 4) + 3. Report the results summary showing which tests passed/failed + 4. Do NOT proceed to the next step - the job halts here + 4. **Revert all changes and clear queue** - After all tests complete, run: + **IMPORTANT**: This step is MANDATORY and must run regardless of whether tests passed or failed. + + Run these commands to clean up: ```bash - git checkout -- manual_tests/ + git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true ``` - This cleans up the test files AND clears the rules queue before the "should fire" tests run. The queue must be cleared because rules that have already been shown to the agent (status=QUEUED) won't fire again until the queue is cleared. 
+ **Why this command sequence**: + - `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A`, which stages changes) + - `git checkout -- manual_tests/` - Restores the working tree from the index, which now matches HEAD + - `rm -f manual_tests/test_created_mode/new_config.yml` - Deletes the file created by the created-mode test (`git checkout` does not remove untracked files) + - Clearing the queue removes rules that have already been shown (status=QUEUED) so they can fire again ## Quality Criteria - **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly - **Parallel execution**: All 8 sub-agents were launched in a single message (parallel) - **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check -- **No unexpected blocks**: All tests passed - no blocking hooks fired -- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed +- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported +- **Changes reverted and queue cleared**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` were run after tests completed (regardless of pass/fail) - When all criteria are met, include `✓ Quality Criteria Met` in your response ## Reference @@ -164,8 +176,8 @@ Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD` 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly. 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)? 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command. -4.
**All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)? -5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? +4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported? +5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`? ## On Completion 1. Verify outputs are created