Add magic string detection for reliable sub-agent hook reporting #118

Closed
nhorton wants to merge 4 commits into main from claude/review-manual-tests-AR4k2

nhorton commented Jan 22, 2026

Summary

This PR introduces a critical fix to the manual testing workflow by implementing magic string detection for reliable sub-agent hook reporting. Instead of relying on visual observation of blocking prompts or manual queue inspection, sub-agents now return standardized strings (HOOK_FIRED: or HOOK_NOT_FIRED:) that the main agent checks to determine test pass/fail status.

Key Changes

  • Magic String Protocol: Sub-agents are now instructed to return:

    • HOOK_FIRED: <description> if a DeepWork hook blocked them
    • HOOK_NOT_FIRED: Task completed successfully if no hook fired
  • Updated Test Instructions: All 8 test cases in both run_fire_tests.md and run_not_fire_tests.md now include the magic string instruction at the end of each sub-agent prompt

  • Fixed File Names: Corrected all test file references in sub-agent prompts to match actual test files:

    • feature.py → test_trigger_safety_mode.py
    • module_source.py → test_set_mode_source.py
    • handler_trigger.py → test_pair_mode_trigger.py
    • dangerous.py → test_infinite_block_prompt.py
    • risky.py → test_infinite_block_command.py
    • And similar corrections for all other test files
  • Updated Promise Tags: Fixed infinite block test promise tags to match rule names:

    • <promise>I have verified this change is safe</promise> → <promise>Manual Test: Infinite Block Prompt</promise>
    • <promise>I have verified this change is safe</promise> → <promise>Manual Test: Infinite Block Command</promise>
  • Quality Criteria Updates: Modified acceptance criteria to check for magic string detection instead of manual observation:

    • Changed from "Hooks Observed" to "Magic String Detection"
    • Clarified that agents must NOT manually run rules_check
  • Documentation Improvements: Enhanced test_reference.md with clear magic string detection explanation and updated critical rules

  • Version Bump: Updated job version from 1.2.1 to 1.3.0 with changelog entry

Implementation Details

The magic string approach provides:

  • Reliability: Explicit, unambiguous signal from sub-agents about hook behavior
  • Simplicity: Main agent checks response text instead of inferring from blocking behavior
  • Consistency: All tests use the same detection mechanism
  • Fallback: Queue inspection still available if magic string is missing (inconclusive case)

This change ensures the manual testing workflow is more robust and less dependent on visual observation or manual verification commands.

Reconciled with main branch (which added deepwork rules clear_queue CLI).

Key improvements:
1. TASK_START/HOOK_FIRED magic string detection
   - Sub-agents ALWAYS output TASK_START at response start
   - If hook fires and blocks, they also output HOOK_FIRED
   - Detection: TASK_START present + no HOOK_FIRED = hook did NOT fire
   - Eliminates impossible HOOK_NOT_FIRED requirement

2. Fixed all file names in sub-agent prompts to match actual test files
   - feature.py → test_trigger_safety_mode.py
   - module_source.py → test_set_mode_source.py
   - etc.

3. Added max_turns: 5 timeout to prevent infinite hangs
   - For "should fire" tests: timeout = PASSED (confirms blocking)
   - For "should NOT fire" tests: timeout = FAILED

4. Fixed infinite block promise tags to match rule names

5. Uses new deepwork rules clear_queue CLI command
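
The revised TASK_START/HOOK_FIRED rules in items 1 and 3 could be sketched roughly as follows. The function name `evaluate` and its parameters are hypothetical; how the PR's instructions actually surface a max_turns timeout is not shown here, so `timed_out` is simply passed in as a flag. Treating a response with neither string as inconclusive, and letting HOOK_FIRED count even without TASK_START, are assumptions.

```python
from typing import Optional

# Magic strings named in the commit message.
TASK_START = "TASK_START"
HOOK_FIRED = "HOOK_FIRED"


def evaluate(response: Optional[str], timed_out: bool, expect_fire: bool) -> str:
    """Return "PASSED", "FAILED", or "INCONCLUSIVE" for one sub-agent test."""
    if timed_out:
        # Per the commit message: a timeout confirms blocking, so it passes
        # "should fire" tests and fails "should NOT fire" tests.
        return "PASSED" if expect_fire else "FAILED"

    text = response or ""
    if HOOK_FIRED in text:
        fired = True
    elif TASK_START in text:
        # TASK_START present and no HOOK_FIRED: the hook did not fire.
        fired = False
    else:
        # Assumption: without either string the result cannot be determined.
        return "INCONCLUSIVE"

    return "PASSED" if fired == expect_fire else "FAILED"
```

Because the sub-agent only has to emit HOOK_FIRED when it is actually blocked, the "not fired" case is inferred from the absence of that string rather than from a separate HOOK_NOT_FIRED string.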
nhorton force-pushed the claude/review-manual-tests-AR4k2 branch from 4b2b707 to 6f71bb0 January 22, 2026 20:19
claude and others added 3 commits January 22, 2026 20:22
- Document ~16k baseline input token cost per sub-agent (system prompt + tools)
- Add "Keep your response brief" instruction to sub-agent prompts
- Helps minimize additional token usage on top of unavoidable baseline
nhorton closed this Jan 22, 2026
