diff --git a/.claude/skills/manual_tests.run_fire_tests/SKILL.md b/.claude/skills/manual_tests.run_fire_tests/SKILL.md new file mode 100644 index 00000000..c9c4813c --- /dev/null +++ b/.claude/skills/manual_tests.run_fire_tests/SKILL.md @@ -0,0 +1,237 @@ +--- +name: manual_tests.run_fire_tests +description: "Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly." +user-invocable: false +hooks: + Stop: + - hooks: + - type: prompt + prompt: | + You must evaluate whether Claude has met all the below quality criteria for the request. + + ## Quality Criteria + + 1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. + 2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? + 3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. + 4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination? + 5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? + 6. **Results Recorded**: Did the main agent track pass/fail status for each test case? + + ## Instructions + + Review the conversation and determine if ALL quality criteria above have been satisfied. + Look for evidence that each criterion has been addressed. 
+ + If the agent has included `✓ Quality Criteria Met` in their response AND + all criteria appear to be met, respond with: {"ok": true} + + If criteria are NOT met OR the promise tag is missing, respond with: + {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"} +--- + +# manual_tests.run_fire_tests + +**Step 2/2** in **manual_tests** workflow + +> Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. + +## Prerequisites (Verify First) + +Before proceeding, confirm these steps are complete: +- `/manual_tests.run_not_fire_tests` + +## Instructions + +**Goal**: Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly. + +# Run Should-Fire Tests + +## Objective + +Run all "should fire" tests in **serial** sub-agents to verify that rules fire correctly when their trigger conditions are met without safety conditions. + +## CRITICAL: Sub-Agent Requirement + +**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.** + +Why sub-agents are required: +1. Sub-agents run in isolated contexts where file changes are detected +2. When a sub-agent completes, the Stop hook **automatically** evaluates rules +3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them +4. If you edit files directly, the hooks won't fire because you're not a completing sub-agent + +**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return. 
+ +## CRITICAL: Serial Execution + +**These tests MUST run ONE AT A TIME, with git reverts between each.** + +Why serial execution is required: +- These tests edit ONLY the trigger file (not the safety) +- If multiple sub-agents run in parallel, sub-agent A's hook will see changes from sub-agent B +- This causes cross-contamination: A gets blocked by rules triggered by B's changes +- Run one test, observe the hook, revert, then run the next + +## Task + +Run all 8 "should fire" tests in **serial** sub-agents, reverting between each, and verify that blocking hooks fire automatically. + +### Process + +For EACH test below, follow this cycle: + +1. **Launch a sub-agent** using the Task tool (use a fast model like haiku) +2. **Wait for the sub-agent to complete** +3. **Observe whether the hook fired automatically** - you should see a blocking prompt or command output +4. **Record the result** - pass if hook fired, fail if it didn't +5. **Revert changes**: `git checkout -- manual_tests/` +6. **Proceed to the next test** + +**IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next. + +### Test Cases (run serially) + +**Test 1: Trigger/Safety** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_trigger_safety_mode/feature.py` to add a comment. Do NOT edit the `_doc.md` file." +- Expected: Hook fires with prompt about updating documentation + +**Test 2: Set Mode** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_set_mode/module_source.py` to add a comment. Do NOT edit the `_test.py` file." +- Expected: Hook fires with prompt about updating tests + +**Test 3: Pair Mode** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_pair_mode/handler_trigger.py` to add a comment. Do NOT edit the `_expected.md` file." +- Expected: Hook fires with prompt about updating expected output + +**Test 4: Command Action** +- Sub-agent prompt: "Edit `manual_tests/test_command_action/input.txt` to add some text." 
+- Expected: Command runs automatically, appending to the log file (this rule always runs, no safety condition) + +**Test 5: Multi Safety** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_multi_safety/core.py` to add a comment. Do NOT edit any of the safety files (`_safety_a.md`, `_safety_b.md`, or `_safety_c.md`)." +- Expected: Hook fires with prompt about updating safety documentation + +**Test 6: Infinite Block Prompt** +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Do NOT include any promise tags." +- Expected: Hook fires and BLOCKS with infinite prompt - sub-agent cannot complete until promise is provided + +**Test 7: Infinite Block Command** +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Do NOT include any promise tags." +- Expected: Hook fires and command fails - sub-agent cannot complete until promise is provided + +**Test 8: Created Mode** +- Sub-agent prompt: "Create a NEW file `manual_tests/test_created_mode/new_config.yml` with some YAML content. This must be a NEW file, not a modification." +- Expected: Hook fires with prompt about new configuration files + +### Results Tracking + +Record the result after each test: + +| Test Case | Should Fire | Hook Fired? 
| Result | +|-----------|-------------|:-----------:|:------:| +| Trigger/Safety | Edit .py only | | | +| Set Mode | Edit _source.py only | | | +| Pair Mode | Edit _trigger.py only | | | +| Command Action | Edit .txt | | | +| Multi Safety | Edit .py only | | | +| Infinite Block Prompt | Edit .py (no promise) | | | +| Infinite Block Command | Edit .py (no promise) | | | +| Created Mode | Create NEW .yml | | | + +## Quality Criteria + +- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel +- **Git reverted between tests**: `git checkout -- manual_tests/` was run after each test +- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY +- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned +- **Results recorded**: Pass/fail status was recorded for each test +- When all criteria are met, include `✓ Quality Criteria Met` in your response + +## Reference + +See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions. + +## Context + +This step runs after the "should NOT fire" tests. These tests verify that rules correctly fire when trigger conditions are met without safety conditions. The serial execution with reverts is essential to prevent cross-contamination between tests. + + +### Job Context + +A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. + +This job tests that rules fire when they should AND do not fire when they shouldn't. +Each test is run in a SUB-AGENT (not the main agent) because: +1. Sub-agents run in isolated contexts where file changes can be detected +2. The Stop hook automatically evaluates rules when each sub-agent completes +3. 
The main agent can observe whether hooks fired without triggering them manually + +CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file +edits itself - it spawns sub-agents to make edits, then observes whether the hooks +fired automatically when those sub-agents returned. + +Steps: +1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents +2. run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between + +Test types covered: +- Trigger/Safety mode +- Set mode (bidirectional) +- Pair mode (directional) +- Command action +- Multi safety +- Infinite block (prompt and command) +- Created mode (new files only) + + +## Required Inputs + + +**Files from Previous Steps** - Read these first: +- `not_fire_results` (from `run_not_fire_tests`) + +## Work Branch + +Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD` + +- If on a matching work branch: continue using it +- If on main/master: create new branch with `git checkout -b deepwork/manual_tests-[instance]-$(date +%Y%m%d)` + +## Outputs + +**Required outputs**: +- `fire_results` + +## Guardrails + +- Do NOT skip prerequisite verification if this step has dependencies +- Do NOT produce partial outputs; complete all required outputs before finishing +- Do NOT proceed without required inputs; ask the user if any are missing +- Do NOT modify files outside the scope of this step's defined outputs + +## Quality Validation + +Stop hooks will automatically validate your work. The loop continues until all criteria pass. + +**Criteria (all must be satisfied)**: +1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. +2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? +3. 
**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. +4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination? +5. **All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? +6. **Results Recorded**: Did the main agent track pass/fail status for each test case? + + +**To complete**: Include `✓ Quality Criteria Met` in your final response only after verifying ALL criteria are satisfied. + +## On Completion + +1. Verify outputs are created +2. Inform user: "Step 2/2 complete, outputs: fire_results" +3. **Workflow complete**: All steps finished. Consider creating a PR to merge the work branch. + +--- + +**Reference files**: `.deepwork/jobs/manual_tests/job.yml`, `.deepwork/jobs/manual_tests/steps/run_fire_tests.md` \ No newline at end of file diff --git a/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md b/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md new file mode 100644 index 00000000..3bff6fff --- /dev/null +++ b/.claude/skills/manual_tests.run_not_fire_tests/SKILL.md @@ -0,0 +1,213 @@ +--- +name: manual_tests.run_not_fire_tests +description: "Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met." +user-invocable: false +hooks: + Stop: + - hooks: + - type: prompt + prompt: | + You must evaluate whether Claude has met all the below quality criteria for the request. + + ## Quality Criteria + + 1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly. + 2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)? 
+ 3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command. + 4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)? + 5. **Git Reverted**: Were changes reverted after tests completed using `git checkout -- manual_tests/`? + + ## Instructions + + Review the conversation and determine if ALL quality criteria above have been satisfied. + Look for evidence that each criterion has been addressed. + + If the agent has included `✓ Quality Criteria Met` in their response AND + all criteria appear to be met, respond with: {"ok": true} + + If criteria are NOT met OR the promise tag is missing, respond with: + {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"} +--- + +# manual_tests.run_not_fire_tests + +**Step 1/2** in **manual_tests** workflow + +> Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. + + +## Instructions + +**Goal**: Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met. + +# Run Should-NOT-Fire Tests + +## Objective + +Run all "should NOT fire" tests in parallel sub-agents to verify that rules do not fire when their safety conditions are met. + +## CRITICAL: Sub-Agent Requirement + +**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.** + +Why sub-agents are required: +1. Sub-agents run in isolated contexts where file changes are detected +2. When a sub-agent completes, the Stop hook **automatically** evaluates rules +3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them +4. 
If you edit files directly, the hooks won't fire because you're not a completing sub-agent + +**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return. + +## Task + +Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blocking hooks fired. + +### Process + +1. **Launch parallel sub-agents for all "should NOT fire" tests** + + Use the Task tool to spawn **ALL of the following sub-agents in a SINGLE message** (parallel execution). Each sub-agent should use a fast model like haiku. + + For the paired-file tests (a, b, c, and e below), the sub-agent must: + - Edit BOTH the trigger file AND the safety file + - This satisfies the rule's safety condition, so the rule should NOT fire + - Tests d, f, g, and h instead exercise other not-fire conditions, as described in their prompts + + **Sub-agent prompts (launch all 8 in parallel):** + + a. **Trigger/Safety test** - "Edit `manual_tests/test_trigger_safety_mode/feature.py` to add a comment, AND edit `manual_tests/test_trigger_safety_mode/feature_doc.md` to add a note. Both files must be edited so the rule does NOT fire." + + b. **Set Mode test** - "Edit `manual_tests/test_set_mode/module_source.py` to add a comment, AND edit `manual_tests/test_set_mode/module_test.py` to add a test comment. Both files must be edited so the rule does NOT fire." + + c. **Pair Mode (forward) test** - "Edit `manual_tests/test_pair_mode/handler_trigger.py` to add a comment, AND edit `manual_tests/test_pair_mode/handler_expected.md` to add a note. Both files must be edited so the rule does NOT fire." + + d. **Pair Mode (reverse) test** - "Edit ONLY `manual_tests/test_pair_mode/handler_expected.md` to add a note. Only the expected file should be edited - this tests that the pair rule only fires in one direction." + + e. **Multi Safety test** - "Edit `manual_tests/test_multi_safety/core.py` to add a comment, AND edit `manual_tests/test_multi_safety/core_safety_a.md` to add a note. Both files must be edited so the rule does NOT fire." + + f.
**Infinite Block Prompt test** - "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Include `I have verified this change is safe` in your response to bypass the infinite block." + + g. **Infinite Block Command test** - "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Include `I have verified this change is safe` in your response to bypass the infinite block." + + h. **Created Mode test** - "Modify the EXISTING file `manual_tests/test_created_mode/existing.yml` by adding a comment. Do NOT create a new file - only modify the existing one. The created mode rule should NOT fire for modifications." + +2. **Observe the results** + + When each sub-agent returns: + - **If no blocking hook fired**: The test PASSED - the rule correctly did NOT fire + - **If a blocking hook fired**: The test FAILED - investigate why the rule fired when it shouldn't have + + **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually. + +3. **Record the results** + + Track which tests passed and which failed: + + | Test Case | Should NOT Fire | Result | + |-----------|:---------------:|:------:| + | Trigger/Safety | Edit both files | | + | Set Mode | Edit both files | | + | Pair Mode (forward) | Edit both files | | + | Pair Mode (reverse) | Edit expected only | | + | Multi Safety | Edit both files | | + | Infinite Block Prompt | Promise tag | | + | Infinite Block Command | Promise tag | | + | Created Mode | Modify existing | | + +4. **Revert all changes** + + After all tests complete, run: + ```bash + git checkout -- manual_tests/ + ``` + + This cleans up the test files before the "should fire" tests run. 
+ +## Quality Criteria + +- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Parallel execution**: All 8 sub-agents were launched in a single message (parallel) +- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check +- **No unexpected blocks**: All tests passed - no blocking hooks fired +- **Changes reverted**: `git checkout -- manual_tests/` was run after tests completed +- When all criteria are met, include `✓ Quality Criteria Met` in your response + +## Reference + +See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions. + +## Context + +This step runs first and tests that rules correctly do NOT fire when safety conditions are met. The "should fire" tests run after these complete and the working directory is reverted. + + +### Job Context + +A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. + +This job tests that rules fire when they should AND do not fire when they shouldn't. +Each test is run in a SUB-AGENT (not the main agent) because: +1. Sub-agents run in isolated contexts where file changes can be detected +2. The Stop hook automatically evaluates rules when each sub-agent completes +3. The main agent can observe whether hooks fired without triggering them manually + +CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file +edits itself - it spawns sub-agents to make edits, then observes whether the hooks +fired automatically when those sub-agents returned. + +Steps: +1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents +2. 
run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between + +Test types covered: +- Trigger/Safety mode +- Set mode (bidirectional) +- Pair mode (directional) +- Command action +- Multi safety +- Infinite block (prompt and command) +- Created mode (new files only) + + + +## Work Branch + +Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD` + +- If on a matching work branch: continue using it +- If on main/master: create new branch with `git checkout -b deepwork/manual_tests-[instance]-$(date +%Y%m%d)` + +## Outputs + +**Required outputs**: +- `not_fire_results` + +## Guardrails + +- Do NOT skip prerequisite verification if this step has dependencies +- Do NOT produce partial outputs; complete all required outputs before finishing +- Do NOT proceed without required inputs; ask the user if any are missing +- Do NOT modify files outside the scope of this step's defined outputs + +## Quality Validation + +Stop hooks will automatically validate your work. The loop continues until all criteria pass. + +**Criteria (all must be satisfied)**: +1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly. +2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)? +3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command. +4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)? +5. **Git Reverted**: Were changes reverted after tests completed using `git checkout -- manual_tests/`? + + +**To complete**: Include `✓ Quality Criteria Met` in your final response only after verifying ALL criteria are satisfied. 
+ +## On Completion + +1. Verify outputs are created +2. Inform user: "Step 1/2 complete, outputs: not_fire_results" +3. **Continue workflow**: Use Skill tool to invoke `/manual_tests.run_fire_tests` + +--- + +**Reference files**: `.deepwork/jobs/manual_tests/job.yml`, `.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md` \ No newline at end of file diff --git a/.claude/skills/manual_tests/SKILL.md b/.claude/skills/manual_tests/SKILL.md new file mode 100644 index 00000000..bf97b88a --- /dev/null +++ b/.claude/skills/manual_tests/SKILL.md @@ -0,0 +1,80 @@ +--- +name: manual_tests +description: "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly." +--- + +# manual_tests + +**Multi-step workflow**: Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. + +> **CRITICAL**: Always invoke steps using the Skill tool. Never copy/paste step instructions directly. + +A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. + +This job tests that rules fire when they should AND do not fire when they shouldn't. +Each test is run in a SUB-AGENT (not the main agent) because: +1. Sub-agents run in isolated contexts where file changes can be detected +2. The Stop hook automatically evaluates rules when each sub-agent completes +3. The main agent can observe whether hooks fired without triggering them manually + +CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file +edits itself - it spawns sub-agents to make edits, then observes whether the hooks +fired automatically when those sub-agents returned. + +Steps: +1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents +2. 
run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between + +Test types covered: +- Trigger/Safety mode +- Set mode (bidirectional) +- Pair mode (directional) +- Command action +- Multi safety +- Infinite block (prompt and command) +- Created mode (new files only) + + +## Available Steps + +1. **run_not_fire_tests** - Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met. +2. **run_fire_tests** - Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly. (requires: run_not_fire_tests) + +## Execution Instructions + +### Step 1: Analyze Intent + +Parse any text following `/manual_tests` to determine user intent: +- "run_not_fire_tests" or related terms → start at `manual_tests.run_not_fire_tests` +- "run_fire_tests" or related terms → start at `manual_tests.run_fire_tests` + +### Step 2: Invoke Starting Step + +Use the Skill tool to invoke the identified starting step: +``` +Skill tool: manual_tests.run_not_fire_tests +``` + +### Step 3: Continue Workflow Automatically + +After each step completes: +1. Check if there's a next step in the sequence +2. Invoke the next step using the Skill tool +3. 
Repeat until workflow is complete or user intervenes + +### Handling Ambiguous Intent + +If user intent is unclear, use AskUserQuestion to clarify: +- Present available steps as numbered options +- Let user select the starting point + +## Guardrails + +- Do NOT copy/paste step instructions directly; always use the Skill tool to invoke steps +- Do NOT skip steps in the workflow unless the user explicitly requests it +- Do NOT proceed to the next step if the current step's outputs are incomplete +- Do NOT make assumptions about user intent; ask for clarification when ambiguous + +## Context Files + +- Job definition: `.deepwork/jobs/manual_tests/job.yml` \ No newline at end of file diff --git a/.deepwork/jobs/manual_tests/job.yml b/.deepwork/jobs/manual_tests/job.yml new file mode 100644 index 00000000..d35fe02d --- /dev/null +++ b/.deepwork/jobs/manual_tests/job.yml @@ -0,0 +1,67 @@ +name: manual_tests +version: "1.0.0" +summary: "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly." +description: | + A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. + + This job tests that rules fire when they should AND do not fire when they shouldn't. + Each test is run in a SUB-AGENT (not the main agent) because: + 1. Sub-agents run in isolated contexts where file changes can be detected + 2. The Stop hook automatically evaluates rules when each sub-agent completes + 3. The main agent can observe whether hooks fired without triggering them manually + + CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file + edits itself - it spawns sub-agents to make edits, then observes whether the hooks + fired automatically when those sub-agents returned. + + Steps: + 1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents + 2. 
run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between + + Test types covered: + - Trigger/Safety mode + - Set mode (bidirectional) + - Pair mode (directional) + - Command action + - Multi safety + - Infinite block (prompt and command) + - Created mode (new files only) + +changelog: + - version: "1.0.0" + changes: "Initial job creation - tests run in sub-agents to observe automatic hook firing" + +steps: + - id: run_not_fire_tests + name: "Run Should-NOT-Fire Tests" + description: "Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met." + instructions_file: steps/run_not_fire_tests.md + inputs: [] + outputs: + - not_fire_results # implicit state: all "should NOT fire" tests passed + dependencies: [] + quality_criteria: + - "**Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly." + - "**Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)?" + - "**Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command." + - "**All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)?" + - "**Git Reverted**: Were changes reverted after tests completed using `git checkout -- manual_tests/`?" + + - id: run_fire_tests + name: "Run Should-Fire Tests" + description: "Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly." 
+ instructions_file: steps/run_fire_tests.md + inputs: + - file: not_fire_results + from_step: run_not_fire_tests + outputs: + - fire_results # implicit state: all "should fire" tests passed + dependencies: + - run_not_fire_tests + quality_criteria: + - "**Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly." + - "**Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination?" + - "**Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command." + - "**Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination?" + - "**All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)?" + - "**Results Recorded**: Did the main agent track pass/fail status for each test case?" diff --git a/.deepwork/jobs/manual_tests/steps/run_fire_tests.md b/.deepwork/jobs/manual_tests/steps/run_fire_tests.md new file mode 100644 index 00000000..b2a71998 --- /dev/null +++ b/.deepwork/jobs/manual_tests/steps/run_fire_tests.md @@ -0,0 +1,111 @@ +# Run Should-Fire Tests + +## Objective + +Run all "should fire" tests in **serial** sub-agents to verify that rules fire correctly when their trigger conditions are met without safety conditions. + +## CRITICAL: Sub-Agent Requirement + +**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.** + +Why sub-agents are required: +1. Sub-agents run in isolated contexts where file changes are detected +2. When a sub-agent completes, the Stop hook **automatically** evaluates rules +3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them +4. 
If you edit files directly, the hooks won't fire because you're not a completing sub-agent + +**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return. + +## CRITICAL: Serial Execution + +**These tests MUST run ONE AT A TIME, with git reverts between each.** + +Why serial execution is required: +- These tests edit ONLY the trigger file (not the safety) +- If multiple sub-agents run in parallel, sub-agent A's hook will see changes from sub-agent B +- This causes cross-contamination: A gets blocked by rules triggered by B's changes +- Run one test, observe the hook, revert, then run the next + +## Task + +Run all 8 "should fire" tests in **serial** sub-agents, reverting between each, and verify that blocking hooks fire automatically. + +### Process + +For EACH test below, follow this cycle: + +1. **Launch a sub-agent** using the Task tool (use a fast model like haiku) +2. **Wait for the sub-agent to complete** +3. **Observe whether the hook fired automatically** - you should see a blocking prompt or command output +4. **Record the result** - pass if hook fired, fail if it didn't +5. **Revert changes**: `git checkout -- manual_tests/` +6. **Proceed to the next test** + +**IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next. + +### Test Cases (run serially) + +**Test 1: Trigger/Safety** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_trigger_safety_mode/feature.py` to add a comment. Do NOT edit the `_doc.md` file." +- Expected: Hook fires with prompt about updating documentation + +**Test 2: Set Mode** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_set_mode/module_source.py` to add a comment. Do NOT edit the `_test.py` file." 
+- Expected: Hook fires with prompt about updating tests + +**Test 3: Pair Mode** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_pair_mode/handler_trigger.py` to add a comment. Do NOT edit the `_expected.md` file." +- Expected: Hook fires with prompt about updating expected output + +**Test 4: Command Action** +- Sub-agent prompt: "Edit `manual_tests/test_command_action/input.txt` to add some text." +- Expected: Command runs automatically, appending to the log file (this rule always runs, no safety condition) + +**Test 5: Multi Safety** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_multi_safety/core.py` to add a comment. Do NOT edit any of the safety files (`_safety_a.md`, `_safety_b.md`, or `_safety_c.md`)." +- Expected: Hook fires with prompt about updating safety documentation + +**Test 6: Infinite Block Prompt** +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Do NOT include any promise tags." +- Expected: Hook fires and BLOCKS with infinite prompt - sub-agent cannot complete until promise is provided + +**Test 7: Infinite Block Command** +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Do NOT include any promise tags." +- Expected: Hook fires and command fails - sub-agent cannot complete until promise is provided + +**Test 8: Created Mode** +- Sub-agent prompt: "Create a NEW file `manual_tests/test_created_mode/new_config.yml` with some YAML content. This must be a NEW file, not a modification." +- Expected: Hook fires with prompt about new configuration files + +### Results Tracking + +Record the result after each test: + +| Test Case | Should Fire | Hook Fired? 
| Result | +|-----------|-------------|:-----------:|:------:| +| Trigger/Safety | Edit .py only | | | +| Set Mode | Edit _source.py only | | | +| Pair Mode | Edit _trigger.py only | | | +| Command Action | Edit .txt | | | +| Multi Safety | Edit .py only | | | +| Infinite Block Prompt | Edit .py (no promise) | | | +| Infinite Block Command | Edit .py (no promise) | | | +| Created Mode | Create NEW .yml | | | + +## Quality Criteria + +- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel +- **Git reverted between tests**: `git checkout -- manual_tests/` was run after each test +- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY +- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned +- **Results recorded**: Pass/fail status was recorded for each test +- When all criteria are met, include `✓ Quality Criteria Met` in your response + +## Reference + +See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions. + +## Context + +This step runs after the "should NOT fire" tests. These tests verify that rules correctly fire when trigger conditions are met without safety conditions. The serial execution with reverts is essential to prevent cross-contamination between tests. diff --git a/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md b/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md new file mode 100644 index 00000000..acca88e6 --- /dev/null +++ b/.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md @@ -0,0 +1,94 @@ +# Run Should-NOT-Fire Tests + +## Objective + +Run all "should NOT fire" tests in parallel sub-agents to verify that rules do not fire when their safety conditions are met. 
+ +## CRITICAL: Sub-Agent Requirement + +**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.** + +Why sub-agents are required: +1. Sub-agents run in isolated contexts where file changes are detected +2. When a sub-agent completes, the Stop hook **automatically** evaluates rules +3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them +4. If you edit files directly, the hooks won't fire because you're not a completing sub-agent + +**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return. + +## Task + +Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blocking hooks fired. + +### Process + +1. **Launch parallel sub-agents for all "should NOT fire" tests** + + Use the Task tool to spawn **ALL of the following sub-agents in a SINGLE message** (parallel execution). Each sub-agent should use a fast model like haiku. + + **Sub-agent prompts (launch all 8 in parallel):** + + a. **Trigger/Safety test** - "Edit `manual_tests/test_trigger_safety_mode/feature.py` to add a comment, AND edit `manual_tests/test_trigger_safety_mode/feature_doc.md` to add a note. Both files must be edited so the rule does NOT fire." + + b. **Set Mode test** - "Edit `manual_tests/test_set_mode/module_source.py` to add a comment, AND edit `manual_tests/test_set_mode/module_test.py` to add a test comment. Both files must be edited so the rule does NOT fire." + + c. **Pair Mode (forward) test** - "Edit `manual_tests/test_pair_mode/handler_trigger.py` to add a comment, AND edit `manual_tests/test_pair_mode/handler_expected.md` to add a note. Both files must be edited so the rule does NOT fire." + + d. **Pair Mode (reverse) test** - "Edit ONLY `manual_tests/test_pair_mode/handler_expected.md` to add a note. 
Only the expected file should be edited - this tests that the pair rule only fires in one direction." + + e. **Multi Safety test** - "Edit `manual_tests/test_multi_safety/core.py` to add a comment, AND edit `manual_tests/test_multi_safety/core_safety_a.md` to add a note. Both files must be edited so the rule does NOT fire." + + f. **Infinite Block Prompt test** - "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Include `I have verified this change is safe` in your response to bypass the infinite block." + + g. **Infinite Block Command test** - "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Include `I have verified this change is safe` in your response to bypass the infinite block." + + h. **Created Mode test** - "Modify the EXISTING file `manual_tests/test_created_mode/existing.yml` by adding a comment. Do NOT create a new file - only modify the existing one. The created mode rule should NOT fire for modifications." + +2. **Observe the results** + + When each sub-agent returns: + - **If no blocking hook fired**: The test PASSED - the rule correctly did NOT fire + - **If a blocking hook fired**: The test FAILED - investigate why the rule fired when it shouldn't have + + **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually. + +3. **Record the results** + + Track which tests passed and which failed: + + | Test Case | Should NOT Fire | Result | + |-----------|:---------------:|:------:| + | Trigger/Safety | Edit both files | | + | Set Mode | Edit both files | | + | Pair Mode (forward) | Edit both files | | + | Pair Mode (reverse) | Edit expected only | | + | Multi Safety | Edit both files | | + | Infinite Block Prompt | Promise tag | | + | Infinite Block Command | Promise tag | | + | Created Mode | Modify existing | | + +4. 
**Revert all changes** + + After all tests complete, run: + ```bash + git checkout -- manual_tests/ + ``` + + This cleans up the test files before the "should fire" tests run. + +## Quality Criteria + +- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Parallel execution**: All 8 sub-agents were launched in a single message (parallel) +- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check +- **No unexpected blocks**: All tests passed - no blocking hooks fired +- **Changes reverted**: `git checkout -- manual_tests/` was run after tests completed +- When all criteria are met, include `✓ Quality Criteria Met` in your response + +## Reference + +See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions. + +## Context + +This step runs first and tests that rules correctly do NOT fire when safety conditions are met. The "should fire" tests run after these complete and the working directory is reverted. diff --git a/.deepwork/jobs/manual_tests/steps/test_reference.md b/.deepwork/jobs/manual_tests/steps/test_reference.md new file mode 100644 index 00000000..8247837a --- /dev/null +++ b/.deepwork/jobs/manual_tests/steps/test_reference.md @@ -0,0 +1,92 @@ +# Manual Hook/Rule Tests Reference + +This document contains the test matrix and reference information for all manual hook/rule tests. + +## Why Sub-Agents? + +**All tests MUST be run in sub-agents, not by the main agent directly.** + +This approach works because: +1. Sub-agents run in isolated contexts where file changes can be detected +2. The Stop hook **automatically** evaluates rules when the sub-agent completes +3. The main agent can **observe** whether hooks fired - it must NOT manually run the rules_check command +4. Using a fast model (e.g., haiku) keeps test iterations quick and cheap + +## Critical Rules + +1. 
**NEVER edit test files from the main agent** - always spawn a sub-agent to make edits +2. **NEVER manually run the rules_check command** - hooks fire automatically when sub-agents return +3. **OBSERVE the hook behavior** - when a sub-agent returns, watch for blocking prompts or command outputs +4. **REVERT between tests** - use `git checkout -- manual_tests/` to reset the test files + +## Parallel vs Serial Execution + +**"Should NOT fire" tests CAN run in parallel:** +- These tests edit BOTH trigger AND safety files (completing the rule requirements) +- Even though `git status` shows changes from all sub-agents, each rule only matches its own scoped file patterns +- Since the safety file is edited, the rule won't fire regardless of other changes +- No cross-contamination possible +- **Revert all changes after these tests complete** before running "should fire" tests + +**"Should fire" tests MUST run serially with git reverts between each:** +- These tests deliberately edit ONLY the trigger file (not the safety) +- If multiple run in parallel, sub-agent A's hook will see changes from sub-agent B +- This causes cross-contamination: A gets blocked by rules triggered by B's changes +- Run one at a time, reverting between each test + +## Test Matrix + +Each test has two cases: one where the rule SHOULD fire, and one where it should NOT. 
+| Test | Should Fire | Should NOT Fire | Rule Name |
+|------|-------------|-----------------|-----------|
+| **Trigger/Safety** | Edit `.py` only | Edit `.py` AND `_doc.md` | Manual Test: Trigger Safety |
+| **Set Mode** | Edit `_source.py` only | Edit `_source.py` AND `_test.py` | Manual Test: Set Mode |
+| **Pair Mode** | Edit `_trigger.py` only | Edit `_trigger.py` AND `_expected.md` | Manual Test: Pair Mode |
+| **Pair Mode (reverse)** | -- | Edit `_expected.md` only (should NOT fire) | Manual Test: Pair Mode |
+| **Command Action** | Edit `.txt` -> log appended | -- (always runs) | Manual Test: Command Action |
+| **Multi Safety** | Edit `.py` only | Edit `.py` AND any safety file | Manual Test: Multi Safety |
+| **Infinite Block Prompt** | Edit `.py` (always blocks) | Provide promise tag | Manual Test: Infinite Block Prompt |
+| **Infinite Block Command** | Edit `.py` (command fails) | Provide promise tag | Manual Test: Infinite Block Command |
+| **Created Mode** | Create NEW `.yml` file | Modify EXISTING `.yml` file | Manual Test: Created Mode |
+
+## Test Folders
+
+| Folder | Rule Type | Description |
+|--------|-----------|-------------|
+| `test_trigger_safety_mode/` | Trigger/Safety | Basic conditional: fires unless safety file also edited |
+| `test_set_mode/` | Set (Bidirectional) | Files must change together (either direction) |
+| `test_pair_mode/` | Pair (Directional) | One-way: trigger requires expected, but not vice versa |
+| `test_command_action/` | Command Action | Automatically runs command on file change |
+| `test_multi_safety/` | Multiple Safety | Fires unless ANY of the safety files also edited |
+| `test_infinite_block_prompt/` | Infinite Block (Prompt) | Always blocks with prompt; only a promise can bypass |
+| `test_infinite_block_command/` | Infinite Block (Command) | Command always fails; tests whether a promise skips the command |
+| `test_created_mode/` | Created (New Files Only) | Fires ONLY when NEW files are created, not when existing files are modified |
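The serial revert-and-verify cycle used between "should fire" tests can be sketched in shell. This is a minimal sketch in a throwaway repository; the repo setup, file name, and commit identity below are illustrative assumptions, not part of the actual test suite:

```shell
# Sketch: revert the test directory between serial tests and verify it is clean.
# The temporary repo and feature.py name are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir manual_tests
echo "original" > manual_tests/feature.py
git add -A
git -c user.email=test@example.com -c user.name=test commit -qm "init"

# A sub-agent edits only the trigger file...
echo "# comment added by sub-agent" >> manual_tests/feature.py

# ...then, before launching the next test, the main agent reverts:
git checkout -- manual_tests/

# Verify the test directory is clean (no cross-contamination carried forward):
if [ -z "$(git status --porcelain -- manual_tests/)" ]; then
  echo "clean"
fi
```

The same `git status --porcelain -- manual_tests/` check can be run between real tests to confirm the working tree is clean before the next sub-agent launches.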
+ +## Corresponding Rules + +Rules are defined in `.deepwork/rules/`: +- `manual-test-trigger-safety.md` +- `manual-test-set-mode.md` +- `manual-test-pair-mode.md` +- `manual-test-command-action.md` +- `manual-test-multi-safety.md` +- `manual-test-infinite-block-prompt.md` +- `manual-test-infinite-block-command.md` +- `manual-test-created-mode.md` + +## Results Tracking Template + +Use this template to track test results: + +| Test Case | Fires When Should | Does NOT Fire When Shouldn't | +|-----------|:-----------------:|:----------------------------:| +| Trigger/Safety | [ ] | [ ] | +| Set Mode | [ ] | [ ] | +| Pair Mode (forward) | [ ] | [ ] | +| Pair Mode (reverse - expected only) | -- | [ ] | +| Command Action | [ ] | -- | +| Multi Safety | [ ] | [ ] | +| Infinite Block Prompt | [ ] | [ ] | +| Infinite Block Command | [ ] | [ ] | +| Created Mode | [ ] | [ ] | diff --git a/.gemini/skills/manual_tests/index.toml b/.gemini/skills/manual_tests/index.toml new file mode 100644 index 00000000..854ad223 --- /dev/null +++ b/.gemini/skills/manual_tests/index.toml @@ -0,0 +1,77 @@ +# manual_tests +# +# Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. +# +# Generated by DeepWork - do not edit manually + +description = "Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly." + +prompt = """ +# manual_tests + +**Multi-step workflow**: Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. + +> **NOTE**: Gemini CLI requires manual command invocation. After each step, tell the user which command to run next. + +A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. + +This job tests that rules fire when they should AND do not fire when they shouldn't. +Each test is run in a SUB-AGENT (not the main agent) because: +1. 
Sub-agents run in isolated contexts where file changes can be detected +2. The Stop hook automatically evaluates rules when each sub-agent completes +3. The main agent can observe whether hooks fired without triggering them manually + +CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file +edits itself - it spawns sub-agents to make edits, then observes whether the hooks +fired automatically when those sub-agents returned. + +Steps: +1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents +2. run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between + +Test types covered: +- Trigger/Safety mode +- Set mode (bidirectional) +- Pair mode (directional) +- Command action +- Multi safety +- Infinite block (prompt and command) +- Created mode (new files only) + + +## Available Steps + +1. **run_not_fire_tests** - Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met. + Command: `/manual_tests:run_not_fire_tests` +2. **run_fire_tests** - Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly. (requires: run_not_fire_tests) + Command: `/manual_tests:run_fire_tests` + +## Execution Instructions + +### Step 1: Analyze Intent + +Parse any text following `/manual_tests` to determine user intent: +- "run_not_fire_tests" or related terms → start at `/manual_tests:run_not_fire_tests` +- "run_fire_tests" or related terms → start at `/manual_tests:run_fire_tests` + +### Step 2: Direct User to Starting Step + +Tell the user which command to run: +``` +/manual_tests:run_not_fire_tests +``` + +### Step 3: Guide Through Workflow + +After each step completes, tell the user the next command to run until workflow is complete. 
+ +### Handling Ambiguous Intent + +If user intent is unclear: +- Present available steps as numbered options +- Ask user to select the starting point + +## Reference + +- Job definition: `.deepwork/jobs/manual_tests/job.yml` +""" \ No newline at end of file diff --git a/.gemini/skills/manual_tests/run_fire_tests.toml b/.gemini/skills/manual_tests/run_fire_tests.toml new file mode 100644 index 00000000..e60aefc3 --- /dev/null +++ b/.gemini/skills/manual_tests/run_fire_tests.toml @@ -0,0 +1,204 @@ +# manual_tests:run_fire_tests +# +# Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly. +# +# Generated by DeepWork - do not edit manually + +description = "Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly." + +prompt = """ +# manual_tests:run_fire_tests + +**Step 2/2** in **manual_tests** workflow + +> Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. + +## Prerequisites (Verify First) + +Before proceeding, confirm these steps are complete: +- `/manual_tests:run_not_fire_tests` + +## Instructions + +**Goal**: Runs all 'should fire' tests serially with git reverts between each. Use after NOT-fire tests to verify rules fire correctly. + +# Run Should-Fire Tests + +## Objective + +Run all "should fire" tests in **serial** sub-agents to verify that rules fire correctly when their trigger conditions are met without safety conditions. + +## CRITICAL: Sub-Agent Requirement + +**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.** + +Why sub-agents are required: +1. Sub-agents run in isolated contexts where file changes are detected +2. When a sub-agent completes, the Stop hook **automatically** evaluates rules +3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them +4. 
If you edit files directly, the hooks won't fire because you're not a completing sub-agent + +**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return. + +## CRITICAL: Serial Execution + +**These tests MUST run ONE AT A TIME, with git reverts between each.** + +Why serial execution is required: +- These tests edit ONLY the trigger file (not the safety) +- If multiple sub-agents run in parallel, sub-agent A's hook will see changes from sub-agent B +- This causes cross-contamination: A gets blocked by rules triggered by B's changes +- Run one test, observe the hook, revert, then run the next + +## Task + +Run all 8 "should fire" tests in **serial** sub-agents, reverting between each, and verify that blocking hooks fire automatically. + +### Process + +For EACH test below, follow this cycle: + +1. **Launch a sub-agent** using the Task tool (use a fast model like haiku) +2. **Wait for the sub-agent to complete** +3. **Observe whether the hook fired automatically** - you should see a blocking prompt or command output +4. **Record the result** - pass if hook fired, fail if it didn't +5. **Revert changes**: `git checkout -- manual_tests/` +6. **Proceed to the next test** + +**IMPORTANT**: Only launch ONE sub-agent at a time. Wait for it to complete and revert before launching the next. + +### Test Cases (run serially) + +**Test 1: Trigger/Safety** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_trigger_safety_mode/feature.py` to add a comment. Do NOT edit the `_doc.md` file." +- Expected: Hook fires with prompt about updating documentation + +**Test 2: Set Mode** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_set_mode/module_source.py` to add a comment. Do NOT edit the `_test.py` file." 
+- Expected: Hook fires with prompt about updating tests + +**Test 3: Pair Mode** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_pair_mode/handler_trigger.py` to add a comment. Do NOT edit the `_expected.md` file." +- Expected: Hook fires with prompt about updating expected output + +**Test 4: Command Action** +- Sub-agent prompt: "Edit `manual_tests/test_command_action/input.txt` to add some text." +- Expected: Command runs automatically, appending to the log file (this rule always runs, no safety condition) + +**Test 5: Multi Safety** +- Sub-agent prompt: "Edit ONLY `manual_tests/test_multi_safety/core.py` to add a comment. Do NOT edit any of the safety files (`_safety_a.md`, `_safety_b.md`, or `_safety_c.md`)." +- Expected: Hook fires with prompt about updating safety documentation + +**Test 6: Infinite Block Prompt** +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Do NOT include any promise tags." +- Expected: Hook fires and BLOCKS with infinite prompt - sub-agent cannot complete until promise is provided + +**Test 7: Infinite Block Command** +- Sub-agent prompt: "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Do NOT include any promise tags." +- Expected: Hook fires and command fails - sub-agent cannot complete until promise is provided + +**Test 8: Created Mode** +- Sub-agent prompt: "Create a NEW file `manual_tests/test_created_mode/new_config.yml` with some YAML content. This must be a NEW file, not a modification." +- Expected: Hook fires with prompt about new configuration files + +### Results Tracking + +Record the result after each test: + +| Test Case | Should Fire | Hook Fired? 
| Result | +|-----------|-------------|:-----------:|:------:| +| Trigger/Safety | Edit .py only | | | +| Set Mode | Edit _source.py only | | | +| Pair Mode | Edit _trigger.py only | | | +| Command Action | Edit .txt | | | +| Multi Safety | Edit .py only | | | +| Infinite Block Prompt | Edit .py (no promise) | | | +| Infinite Block Command | Edit .py (no promise) | | | +| Created Mode | Create NEW .yml | | | + +## Quality Criteria + +- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Serial execution**: Sub-agents were launched ONE AT A TIME, not in parallel +- **Git reverted between tests**: `git checkout -- manual_tests/` was run after each test +- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check - hooks fired AUTOMATICALLY +- **Blocking behavior verified**: For each test, the appropriate blocking hook fired automatically when the sub-agent returned +- **Results recorded**: Pass/fail status was recorded for each test +- When all criteria are met, include `✓ Quality Criteria Met` in your response + +## Reference + +See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions. + +## Context + +This step runs after the "should NOT fire" tests. These tests verify that rules correctly fire when trigger conditions are met without safety conditions. The serial execution with reverts is essential to prevent cross-contamination between tests. + + +### Job Context + +A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. + +This job tests that rules fire when they should AND do not fire when they shouldn't. +Each test is run in a SUB-AGENT (not the main agent) because: +1. Sub-agents run in isolated contexts where file changes can be detected +2. The Stop hook automatically evaluates rules when each sub-agent completes +3. 
The main agent can observe whether hooks fired without triggering them manually + +CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file +edits itself - it spawns sub-agents to make edits, then observes whether the hooks +fired automatically when those sub-agents returned. + +Steps: +1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents +2. run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between + +Test types covered: +- Trigger/Safety mode +- Set mode (bidirectional) +- Pair mode (directional) +- Command action +- Multi safety +- Infinite block (prompt and command) +- Created mode (new files only) + + +## Required Inputs + + +**Files from Previous Steps** - Read these first: +- `not_fire_results` (from `run_not_fire_tests`) + +## Work Branch + +Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD` + +- If on a matching work branch: continue using it +- If on main/master: create new branch with `git checkout -b deepwork/manual_tests-[instance]-$(date +%Y%m%d)` + +## Outputs + +**Required outputs**: +- `fire_results` + +## Quality Validation (Manual) + +**NOTE**: Gemini CLI does not support automated validation. Manually verify criteria before completing. + +**Criteria (all must be satisfied)**: +1. **Sub-Agents Used**: Did the main agent spawn a sub-agent (using the Task tool) for EACH test? The main agent must NOT edit the test files directly. +2. **Serial Execution**: Were sub-agents launched ONE AT A TIME (not in parallel) to prevent cross-contamination? +3. **Hooks Fired Automatically**: Did the main agent observe the blocking hooks firing automatically when each sub-agent returned? The agent must NOT manually run the rules_check command. +4. **Git Reverted Between Tests**: Was `git checkout -- manual_tests/` run between each test to prevent cross-contamination? +5. 
**All Tests Run**: Were all 8 'should fire' tests executed (trigger/safety, set, pair, command action, multi safety, infinite block prompt, infinite block command, created)? +6. **Results Recorded**: Did the main agent track pass/fail status for each test case? +## On Completion + +1. Verify outputs are created +2. Inform user: "Step 2/2 complete, outputs: fire_results" +3. **Workflow complete**: All steps finished. Consider creating a PR to merge the work branch. + +--- + +**Reference files**: `.deepwork/jobs/manual_tests/job.yml`, `.deepwork/jobs/manual_tests/steps/run_fire_tests.md` +""" \ No newline at end of file diff --git a/.gemini/skills/manual_tests/run_not_fire_tests.toml b/.gemini/skills/manual_tests/run_not_fire_tests.toml new file mode 100644 index 00000000..4f206539 --- /dev/null +++ b/.gemini/skills/manual_tests/run_not_fire_tests.toml @@ -0,0 +1,181 @@ +# manual_tests:run_not_fire_tests +# +# Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met. +# +# Generated by DeepWork - do not edit manually + +description = "Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met." + +prompt = """ +# manual_tests:run_not_fire_tests + +**Step 1/2** in **manual_tests** workflow + +> Runs all manual hook/rule tests using sub-agents. Use when validating that DeepWork rules fire correctly. + + +## Instructions + +**Goal**: Runs all 'should NOT fire' tests in parallel sub-agents. Use to verify rules don't fire when safety conditions are met. + +# Run Should-NOT-Fire Tests + +## Objective + +Run all "should NOT fire" tests in parallel sub-agents to verify that rules do not fire when their safety conditions are met. + +## CRITICAL: Sub-Agent Requirement + +**You MUST spawn sub-agents to make all file edits. DO NOT edit the test files yourself.** + +Why sub-agents are required: +1. 
Sub-agents run in isolated contexts where file changes are detected +2. When a sub-agent completes, the Stop hook **automatically** evaluates rules +3. You (the main agent) observe whether hooks fired - you do NOT manually trigger them +4. If you edit files directly, the hooks won't fire because you're not a completing sub-agent + +**NEVER manually run `echo '{}' | python -m deepwork.hooks.rules_check`** - this defeats the purpose of the test. Hooks must fire AUTOMATICALLY when sub-agents return. + +## Task + +Run all 8 "should NOT fire" tests in **parallel** sub-agents, then verify no blocking hooks fired. + +### Process + +1. **Launch parallel sub-agents for all "should NOT fire" tests** + + Use the Task tool to spawn **ALL of the following sub-agents in a SINGLE message** (parallel execution). Each sub-agent should use a fast model like haiku. + + For each test, the sub-agent must: + - Edit BOTH the trigger file AND the safety file + - This satisfies the rule's safety condition, so the rule should NOT fire + + **Sub-agent prompts (launch all 8 in parallel):** + + a. **Trigger/Safety test** - "Edit `manual_tests/test_trigger_safety_mode/feature.py` to add a comment, AND edit `manual_tests/test_trigger_safety_mode/feature_doc.md` to add a note. Both files must be edited so the rule does NOT fire." + + b. **Set Mode test** - "Edit `manual_tests/test_set_mode/module_source.py` to add a comment, AND edit `manual_tests/test_set_mode/module_test.py` to add a test comment. Both files must be edited so the rule does NOT fire." + + c. **Pair Mode (forward) test** - "Edit `manual_tests/test_pair_mode/handler_trigger.py` to add a comment, AND edit `manual_tests/test_pair_mode/handler_expected.md` to add a note. Both files must be edited so the rule does NOT fire." + + d. **Pair Mode (reverse) test** - "Edit ONLY `manual_tests/test_pair_mode/handler_expected.md` to add a note. 
Only the expected file should be edited - this tests that the pair rule only fires in one direction." + + e. **Multi Safety test** - "Edit `manual_tests/test_multi_safety/core.py` to add a comment, AND edit `manual_tests/test_multi_safety/core_safety_a.md` to add a note. Both files must be edited so the rule does NOT fire." + + f. **Infinite Block Prompt test** - "Edit `manual_tests/test_infinite_block_prompt/dangerous.py` to add a comment. Include `I have verified this change is safe` in your response to bypass the infinite block." + + g. **Infinite Block Command test** - "Edit `manual_tests/test_infinite_block_command/risky.py` to add a comment. Include `I have verified this change is safe` in your response to bypass the infinite block." + + h. **Created Mode test** - "Modify the EXISTING file `manual_tests/test_created_mode/existing.yml` by adding a comment. Do NOT create a new file - only modify the existing one. The created mode rule should NOT fire for modifications." + +2. **Observe the results** + + When each sub-agent returns: + - **If no blocking hook fired**: The test PASSED - the rule correctly did NOT fire + - **If a blocking hook fired**: The test FAILED - investigate why the rule fired when it shouldn't have + + **Remember**: You are OBSERVING whether hooks fired automatically. Do NOT run any verification commands manually. + +3. **Record the results** + + Track which tests passed and which failed: + + | Test Case | Should NOT Fire | Result | + |-----------|:---------------:|:------:| + | Trigger/Safety | Edit both files | | + | Set Mode | Edit both files | | + | Pair Mode (forward) | Edit both files | | + | Pair Mode (reverse) | Edit expected only | | + | Multi Safety | Edit both files | | + | Infinite Block Prompt | Promise tag | | + | Infinite Block Command | Promise tag | | + | Created Mode | Modify existing | | + +4. 
**Revert all changes** + + After all tests complete, run: + ```bash + git checkout -- manual_tests/ + ``` + + This cleans up the test files before the "should fire" tests run. + +## Quality Criteria + +- **Sub-agents spawned**: All 8 tests were run using the Task tool to spawn sub-agents - the main agent did NOT edit files directly +- **Parallel execution**: All 8 sub-agents were launched in a single message (parallel) +- **Hooks observed (not triggered)**: The main agent observed hook behavior without manually running rules_check +- **No unexpected blocks**: All tests passed - no blocking hooks fired +- **Changes reverted**: `git checkout -- manual_tests/` was run after tests completed +- When all criteria are met, include `✓ Quality Criteria Met` in your response + +## Reference + +See [test_reference.md](test_reference.md) for the complete test matrix and rule descriptions. + +## Context + +This step runs first and tests that rules correctly do NOT fire when safety conditions are met. The "should fire" tests run after these complete and the working directory is reverted. + + +### Job Context + +A workflow for running manual tests that validate DeepWork rules/hooks fire correctly. + +This job tests that rules fire when they should AND do not fire when they shouldn't. +Each test is run in a SUB-AGENT (not the main agent) because: +1. Sub-agents run in isolated contexts where file changes can be detected +2. The Stop hook automatically evaluates rules when each sub-agent completes +3. The main agent can observe whether hooks fired without triggering them manually + +CRITICAL: All tests MUST run in sub-agents. The main agent MUST NOT make the file +edits itself - it spawns sub-agents to make edits, then observes whether the hooks +fired automatically when those sub-agents returned. + +Steps: +1. run_not_fire_tests - Run all "should NOT fire" tests in PARALLEL sub-agents +2. 
run_fire_tests - Run all "should fire" tests in SERIAL sub-agents with reverts between + +Test types covered: +- Trigger/Safety mode +- Set mode (bidirectional) +- Pair mode (directional) +- Command action +- Multi safety +- Infinite block (prompt and command) +- Created mode (new files only) + + + +## Work Branch + +Use branch format: `deepwork/manual_tests-[instance]-YYYYMMDD` + +- If on a matching work branch: continue using it +- If on main/master: create new branch with `git checkout -b deepwork/manual_tests-[instance]-$(date +%Y%m%d)` + +## Outputs + +**Required outputs**: +- `not_fire_results` + +## Quality Validation (Manual) + +**NOTE**: Gemini CLI does not support automated validation. Manually verify criteria before completing. + +**Criteria (all must be satisfied)**: +1. **Sub-Agents Used**: Did the main agent spawn sub-agents (using the Task tool) to make the file edits? The main agent must NOT edit the test files directly. +2. **Parallel Execution**: Were multiple sub-agents launched in parallel (in a single message with multiple Task tool calls)? +3. **Hooks Observed**: Did the main agent observe that no blocking hooks fired when the sub-agents returned? The hooks fire AUTOMATICALLY - the agent must NOT manually run the rules_check command. +4. **All Tests Run**: Were all 8 'should NOT fire' tests executed (trigger/safety, set, pair forward, pair reverse, multi safety, infinite block prompt, infinite block command, created)? +5. **Git Reverted**: Were changes reverted after tests completed using `git checkout -- manual_tests/`? +## On Completion + +1. Verify outputs are created +2. Inform user: "Step 1/2 complete, outputs: not_fire_results" +3. 
**Tell user next command**: `/manual_tests:run_fire_tests` + +--- + +**Reference files**: `.deepwork/jobs/manual_tests/job.yml`, `.deepwork/jobs/manual_tests/steps/run_not_fire_tests.md` +""" \ No newline at end of file diff --git a/manual_tests/README.md b/manual_tests/README.md index f3ab985b..30e67849 100644 --- a/manual_tests/README.md +++ b/manual_tests/README.md @@ -1,97 +1,43 @@ -# Manual Hook/Rule Tests for Claude +# Manual Hook/Rule Tests -This directory contains files designed to manually test different types of deepwork rules/hooks. -Each test must verify BOTH that the rule fires when it should AND does not fire when it shouldn't. +This directory contains files designed to test different types of DeepWork rules/hooks. ## How to Run These Tests -**The best way to run these tests is as sub-agents using a fast model (e.g., haiku).** +**Use the `/manual_tests` job to run these tests.** -This approach works because: -1. Sub-agents run in isolated contexts where changes can be detected -2. The Stop hook evaluates rules when the sub-agent completes -3. Using a fast model keeps test iterations quick and cheap - -### Parallel vs Serial Execution - -**Important:** All sub-agents share the same git working directory. This affects which tests can run in parallel. 
- -**"Should NOT fire" tests CAN run in parallel:** -- These tests edit both trigger AND safety files (completing the rule requirements) -- Even though `git status` shows changes from all sub-agents, each rule only matches its own scoped file patterns -- Since the safety file is edited, the rule won't fire regardless of other changes -- No cross-contamination possible -- **Revert all changes after these tests complete** before running "should fire" tests - -**"Should fire" tests MUST run serially with git reverts between each:** -- These tests deliberately edit only the trigger file (not the safety) -- If multiple run in parallel, sub-agent A's hook will see changes from sub-agent B -- This causes cross-contamination: A gets blocked by rules triggered by B's changes -- Run one at a time, reverting between each test - -### Verification Commands - -After each sub-agent returns, run the hook to verify: -```bash -echo '{}' | python -m deepwork.hooks.rules_check ``` - -Then revert changes before the next test: -```bash -git checkout -- manual_tests/ +/manual_tests ``` -## Test Matrix - -Each test has two cases: one where the rule SHOULD fire, and one where it should NOT. +This job automates the test execution process, ensuring: +1. All tests run in **sub-agents** (required for hooks to fire automatically) +2. "Should NOT fire" tests run in **parallel** for efficiency +3. "Should fire" tests run **serially** with git reverts between each to prevent cross-contamination +4. 
Hooks fire **automatically** when sub-agents complete (never manually triggered) -| Test | Should Fire | Should NOT Fire | Rule Name | -|------|-------------|-----------------|-----------| -| **Trigger/Safety** | Edit `.py` only | Edit `.py` AND `_doc.md` | Manual Test: Trigger Safety | -| **Set Mode** | Edit `_source.py` only | Edit `_source.py` AND `_test.py` | Manual Test: Set Mode | -| **Pair Mode** | Edit `_trigger.py` only | Edit `_trigger.py` AND `_expected.md` | Manual Test: Pair Mode | -| **Pair Mode (reverse)** | — | Edit `_expected.md` only (should NOT fire) | Manual Test: Pair Mode | -| **Command Action** | Edit `.txt` → log appended | — (always runs) | Manual Test: Command Action | -| **Multi Safety** | Edit `.py` only | Edit `.py` AND any safety file | Manual Test: Multi Safety | -| **Infinite Block Prompt** | Edit `.py` (always blocks) | Provide `` tag | Manual Test: Infinite Block Prompt | -| **Infinite Block Command** | Edit `.py` (command fails) | Provide `` tag | Manual Test: Infinite Block Command | -| **Created Mode** | Create NEW `.yml` file | Modify EXISTING `.yml` file | Manual Test: Created Mode | +## Why Use the Job? 
-## Test Results Tracking +Running these tests correctly requires specific patterns: +- **Sub-agents are mandatory** - the main agent cannot trigger hooks by editing files directly +- **Hooks must fire automatically** - manually running `rules_check` defeats the purpose +- **Serial execution with reverts** - "should fire" tests must not run in parallel -| Test Case | Fires When Should | Does NOT Fire When Shouldn't | -|-----------|:-----------------:|:----------------------------:| -| Trigger/Safety | ☐ | ☐ | -| Set Mode | ☐ | ☐ | -| Pair Mode (forward) | ☐ | ☐ | -| Pair Mode (reverse - expected only) | — | ☐ | -| Command Action | ☐ | — | -| Multi Safety | ☐ | ☐ | -| Infinite Block Prompt | ☐ | ☐ | -| Infinite Block Command | ☐ | ☐ | -| Created Mode | ☐ | ☐ | +The `/manual_tests` job enforces all these requirements and guides you through the process. ## Test Folders -| Folder | Rule Type | Description | -|--------|-----------|-------------| -| `test_trigger_safety_mode/` | Trigger/Safety | Basic conditional: fires unless safety file also edited | -| `test_set_mode/` | Set (Bidirectional) | Files must change together (either direction) | -| `test_pair_mode/` | Pair (Directional) | One-way: trigger requires expected, but not vice versa | -| `test_command_action/` | Command Action | Automatically runs command on file change | -| `test_multi_safety/` | Multiple Safety | Fires unless ANY of the safety files also edited | -| `test_infinite_block_prompt/` | Infinite Block (Prompt) | Always blocks with prompt; only promise can bypass | -| `test_infinite_block_command/` | Infinite Block (Command) | Command always fails; tests if promise skips command | -| `test_created_mode/` | Created (New Files Only) | Fires ONLY when NEW files are created, not when existing modified | +| Folder | Rule Type | +|--------|-----------| +| `test_trigger_safety_mode/` | Basic trigger/safety conditional | +| `test_set_mode/` | Bidirectional file pairing | +| `test_pair_mode/` | One-way 
directional pairing | +| `test_command_action/` | Automatic command execution | +| `test_multi_safety/` | Multiple safety files | +| `test_infinite_block_prompt/` | Infinite blocking with prompt | +| `test_infinite_block_command/` | Infinite blocking with command | +| `test_created_mode/` | New file creation detection | ## Corresponding Rules -Rules are defined in `.deepwork/rules/`: -- `manual-test-trigger-safety.md` -- `manual-test-set-mode.md` -- `manual-test-pair-mode.md` -- `manual-test-command-action.md` -- `manual-test-multi-safety.md` -- `manual-test-infinite-block-prompt.md` -- `manual-test-infinite-block-command.md` -- `manual-test-created-mode.md` +Rules are defined in `.deepwork/rules/manual-test-*.md`
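
The serial discipline this diff describes for the "should fire" step (one test at a time, reverting `manual_tests/` between runs) can be sketched as a shell loop. This is an illustrative sketch only: in the real workflow the edits are made by sub-agents via the Task tool and the hooks fire automatically when each sub-agent returns — nothing here is run as a script. The folder names are the ones from the table above.

```shell
# Illustrative sketch of the serial "should fire" discipline:
# run one test at a time, reverting manual_tests/ between runs
# so one test's edits cannot trigger another test's rule.
run_fire_tests() {
  local tests=(
    test_trigger_safety_mode
    test_set_mode
    test_pair_mode
    test_command_action
    test_multi_safety
    test_infinite_block_prompt
    test_infinite_block_command
    test_created_mode
  )
  local t
  for t in "${tests[@]}"; do
    echo "running: $t"
    # (in the real workflow, a sub-agent edits the trigger file in
    # manual_tests/$t here, and the Stop hook fires when it returns)
    git checkout -- manual_tests/ 2>/dev/null || true  # revert before the next test
  done
}

run_fire_tests
```

The `|| true` guard is only so the sketch runs outside a git checkout; in the actual workflow the revert must succeed before the next sub-agent is spawned.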