37 changes: 23 additions & 14 deletions .claude/skills/manual_tests.run_fire_tests/SKILL.md
@@ -14,8 +14,8 @@ hooks:
1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
6. **Results Recorded**: Did the main agent track pass/fail status for each test case?

## Instructions
@@ -39,8 +39,8 @@ hooks:
1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
6. **Results Recorded**: Did the main agent track pass/fail status for each test case?

## Instructions
@@ -118,13 +118,21 @@ For EACH test below, follow this cycle:
- If queue is empty, the hook did NOT fire at all
- Record the queue status along with the result
5. **Record the result** - pass if hook fired (visible block OR queue entry), fail if neither
-6. **Revert changes and clear queue**:
+6. **Revert changes and clear queue** (MANDATORY after each test):
```bash
-git checkout -- manual_tests/
+git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
```
-The queue must be cleared because rules that have been shown (status=QUEUED) won't fire again until cleared.
-7. **Proceed to the next test**
+**Why this command sequence**:
+- `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes)
+- `git checkout -- manual_tests/` - Reverts working tree to match HEAD
+- `rm -f ...` - Removes any new files created during tests
+- The queue clear removes rules that have been shown (status=QUEUED) so they can fire again
+7. **Check for early termination**: If **2 tests have now failed**, immediately:
+- Stop running any remaining tests
+- Report the results summary showing which tests passed/failed
+- The job halts here - do NOT proceed with remaining tests
+8. **Proceed to the next test** (only if fewer than 2 failures)
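
The early-termination rule in steps 7 and 8 can be sketched as a loop. This is only an illustration, not part of the skill file: `run_one_test`, `revert_and_clear_queue`, the shortened test names, and the two pretend failures are hypothetical stand-ins for spawning a sub-agent and observing whether the hook fired.

```shell
#!/usr/bin/env bash
# Sketch of a serial test loop that halts after two failures.
set -u

MAX_FAILURES=2

# Hypothetical stand-in for "spawn a sub-agent and observe the hook":
# returns 0 when the hook fired (pass), 1 when it did not (fail).
run_one_test() {
  case "$1" in
    set|pair) return 1 ;;   # pretend these two tests fail
    *) return 0 ;;
  esac
}

# Hypothetical stand-in for the revert-and-clear-queue step.
revert_and_clear_queue() { :; }

tests=(trigger_safety set pair command_action multi_safety)
failures=0
results=()

for t in "${tests[@]}"; do
  if run_one_test "$t"; then
    results+=("$t: pass")
  else
    results+=("$t: fail")
    failures=$((failures + 1))
  fi
  revert_and_clear_queue   # mandatory after every test, pass or fail
  if [ "$failures" -ge "$MAX_FAILURES" ]; then
    results+=("halted after $failures failures")
    break
  fi
done

printf '%s\n' "${results[@]}"
```

The revert runs before the failure check, so the working tree is clean even when the loop stops early.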

**IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next.
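
Why `git checkout -- manual_tests/` alone was not enough can be reproduced in a throwaway repository. The sketch below assumes, as the diff states, that changes have already been staged with `git add -A`; the temporary repository, file names, and file contents are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: show that staged changes survive `git checkout -- <path>`
# and that the reset + checkout + rm sequence restores a clean tree.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
mkdir manual_tests
echo "original" > manual_tests/example.txt
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m "baseline"

# Simulate a test run: edit a tracked file, create a new one, stage both.
echo "edited" > manual_tests/example.txt
echo "new" > manual_tests/new_config.yml
git add -A

# Old cleanup: checkout restores from the index, which holds the edit.
git checkout -- manual_tests/
content_checkout_only="$(cat manual_tests/example.txt)"

# New cleanup: unstage, restore the working tree, remove the new file.
git reset -q HEAD manual_tests/
git checkout -- manual_tests/
rm -f manual_tests/new_config.yml

content_after="$(cat manual_tests/example.txt)"
status_after="$(git status --porcelain)"

echo "after checkout only: ${content_checkout_only}"
echo "after full sequence: ${content_after}"
echo "git status after full sequence: '${status_after}'"
```

The `rm -f` is still needed after the reset because the new file becomes untracked, and `git checkout` does not delete untracked files.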

@@ -184,12 +192,13 @@ Record the result after each test:

## Quality Criteria

-- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
+- **Sub-agents spawned**: Tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
- **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel
-- **Git reverted and queue cleared between tests**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test
+- **Git reverted and queue cleared between tests**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after each test
- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY
-- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned
-- **Results recorded**: Pass/fail status was recorded for each test
+- **Blocking behavior verified**: For each test run, the appropriate blocking hook fired automatically when the sub-agent returned
+- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported
+- **Results recorded**: Pass/fail status was recorded for each test run
- When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response

## Reference
@@ -262,8 +271,8 @@ Stop hooks will automatically validate your work. The loop continues until all c
1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly.
2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?
3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command.
-4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?
-5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?
+4. **Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?
+5. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
6. **Results Recorded**: Did the main agent track pass/fail status for each test case?


36 changes: 24 additions & 12 deletions .claude/skills/manual_tests.run_not_fire_tests/SKILL.md
@@ -14,8 +14,8 @@ hooks:
1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?

## Instructions

@@ -38,8 +38,8 @@ hooks:
1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?

## Instructions

@@ -118,7 +118,7 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo

**Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually.

-3. **Record the results**
+3. **Record the results and check for early termination**

Track which tests passed and which failed:

@@ -133,23 +133,35 @@ Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blo
| Infinite Block Command | Promise tag | |
| Created Mode | Modify existing | |

+**EARLY TERMINATION**: If **2 tests have failed**, immediately:
+1. Stop running any remaining tests
+2. Revert all changes and clear queue (see step 4)
+3. Report the results summary showing which tests passed/failed
+4. Do NOT proceed to the next step - the job halts here

4. **Revert all changes and clear queue**

-After all tests complete, run:
+**IMPORTANT**: This step is MANDATORY and must run regardless of whether tests passed or failed.

+Run these commands to clean up:
```bash
-git checkout -- manual_tests/
+git reset HEAD manual_tests/ && git checkout -- manual_tests/ && rm -f manual_tests/test_created_mode/new_config.yml
rm -rf .deepwork/tmp/rules/queue/*.json 2>/dev/null || true
```

-This cleans up the test files AND clears the rules queue before the "should fire" tests run. The queue must be cleared because rules that have already been shown to the agent (status=QUEUED) won't fire again until the queue is cleared.
+**Why this command sequence**:
+- `git reset HEAD manual_tests/` - Unstages files from the index (rules_check uses `git add -A` which stages changes)
+- `git checkout -- manual_tests/` - Reverts working tree to match HEAD
+- `rm -f manual_tests/test_created_mode/new_config.yml` - Removes any new files created during tests
+- The queue clear removes rules that have been shown (status=QUEUED) so they can fire again
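
One way to make the cleanup run "regardless of whether tests passed or failed" is an `EXIT` trap. The sketch below is an assumption about how that guarantee could be scripted, not something the skill file prescribes: `run_batch`, the temporary directory layout, and the fake queue entries are hypothetical, and only the queue-clearing command is taken from the step above.

```shell
#!/usr/bin/env bash
# Sketch: clear the rules queue even when the batch halts early.
set -u

workdir="$(mktemp -d)"
queue_dir="$workdir/.deepwork/tmp/rules/queue"
mkdir -p "$queue_dir"

cleanup() {
  # Same queue-clearing command as above, rooted in the temp directory.
  rm -rf "$queue_dir"/*.json 2>/dev/null || true
}

# Hypothetical batch: two hooks fired unexpectedly and left queue entries,
# so the run reports failure (early termination).
run_batch() {
  touch "$queue_dir/rule_a.json" "$queue_dir/rule_b.json"
  return 1
}

# A subshell scopes the trap: cleanup fires when the batch ends,
# whether it succeeded or not, and the batch's exit status is preserved.
( trap cleanup EXIT; run_batch )
batch_status=$?

remaining="$(find "$queue_dir" -name '*.json' | wc -l | tr -d ' ')"
echo "batch exit status: ${batch_status}, queue entries left: ${remaining}"
```

The same trap could also hold the `git reset`, `git checkout`, and `rm -f` commands so that the file revert and the queue clear always happen together.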

## Quality Criteria

- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly
- **Parallel execution**: All 8 sub-agents were launched in a single message (parallel)
- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check
- **No unexpected blocks**: All tests passed - no blocking hooks fired
-- **Changes reverted and queue cleared**: `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed
+- **Early termination on 2 failures**: If 2 tests failed, testing halted immediately and results were reported
+- **Changes reverted and queue cleared**: `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` was run after tests completed (regardless of pass/fail)
- When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>` in your response

## Reference
@@ -217,8 +229,8 @@ Stop hooks will automatically validate your work. The loop continues until all c
1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly.
2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?
3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command.
-4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?
-5. **Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?
+4. **Early Termination**: If 2 tests failed, did testing halt immediately with results reported?
+5. **Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?


**To complete**: Include `<promise>✓ Quality Criteria Met</promise>` in your final response only after verifying ALL criteria are satisfied.
14 changes: 9 additions & 5 deletions .deepwork/jobs/manual_tests/job.yml
@@ -1,5 +1,5 @@
name: manual_tests
-version: "1.1.0"
+version: "1.2.1"
summary: "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly."
description: |
A workflow for running manual tests that validate DeepWork rules/hooks fire correctly.
@@ -28,6 +28,10 @@ description: |
- Created mode (new files only)

changelog:
+- version: "1.2.1"
+changes: "Fixed incomplete revert - now uses git reset HEAD to unstage files (rules_check stages with git add -A)"
+- version: "1.2.0"
+changes: "Added early termination on 2 test failures; emphasized mandatory file revert and queue clear after each step"
- version: "1.1.0"
changes: "Added rules queue clearing between tests to prevent anti-infinite-loop mechanism from blocking tests"
- version: "1.0.0"
@@ -46,8 +50,8 @@ steps:
- "**Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly."
- "**Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?"
- "**Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command."
-- "**All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?"
-- "**Git Reverted**: Were changes reverted and queue cleared after tests completed using `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?"
+- "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?"
+- "**Git Reverted**: Were changes reverted and queue cleared after tests completed (or after early termination) using `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json`?"

- id: run_fire_tests
name: "Run Should-Fire Tests"
@@ -64,6 +68,6 @@ steps:
- "**Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly."
- "**Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?"
- "**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command."
-- "**Git Reverted Between Tests**: Was `git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run between each test to prevent cross-contamination?"
-- "**All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?"
+- "**Git Reverted Between Tests**: Was `git reset HEAD manual_tests/ && git checkout -- manual_tests/` and `rm -rf .deepwork/tmp/rules/queue/*.json` run after each test to revert files and prevent cross-contamination?"
+- "**Early Termination**: If 2 tests failed, did testing halt immediately with results reported?"
- "**Results Recorded**: Did the main agent track pass/fail status for each test case?"