From eaaf5063d66ed067709366230330b9f93a96ad34 Mon Sep 17 00:00:00 2001 From: Claude Date: Thu, 22 Jan 2026 21:23:06 +0000 Subject: [PATCH 1/3] Fix infinite loop in command rules when command always fails Command rules with FAILED status were re-running on every hook invocation, causing infinite loops when the command always fails (e.g., `false`). Added a check to skip COMMAND rules that are already FAILED, similar to the existing check for PROMPT rules that are already QUEUED. The agent must provide a promise tag to proceed. Also added debugging history documentation and updated AGENTS.md to reference it for future debugging sessions. --- AGENTS.md | 13 +++ doc/debugging_history/hooks.md | 129 ++++++++++++++++++++++++++++++ src/deepwork/hooks/rules_check.py | 10 +++ 3 files changed, 152 insertions(+) create mode 100644 doc/debugging_history/hooks.md diff --git a/AGENTS.md b/AGENTS.md index 3e469ef8..2c34acf4 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -84,6 +84,19 @@ deepwork/ ├── deepwork_rules/ # ← Installed copy, NOT source of truth └── [bespoke_job]/ # ← Source of truth for bespoke only +## Debugging Issues + +When debugging issues in this codebase, **always consult `doc/debugging_history/`** first. This directory contains documentation of past debugging sessions, including: + +- Root causes of tricky bugs +- Key learnings and patterns to avoid +- Related files and test cases + +**After resolving an issue**, append your findings to the appropriate file in `doc/debugging_history/` (or create a new file if none exists for that subsystem). This helps future agents avoid the same pitfalls. + +Current debugging history files: +- `doc/debugging_history/hooks.md` - Hooks system debugging (rules_check, blocking, queue management) + ## Development Environment This project uses **Nix Flakes** to provide a reproducible development environment. 
diff --git a/doc/debugging_history/hooks.md b/doc/debugging_history/hooks.md new file mode 100644 index 00000000..9948f5e1 --- /dev/null +++ b/doc/debugging_history/hooks.md @@ -0,0 +1,129 @@ +# Hooks System Debugging History + +This document records debugging sessions and findings for the DeepWork hooks system. + +--- + +## 2026-01-22: Infinite Loop Bug in Command Rules + +### Symptoms + +The manual tests "Infinite Block Command - Should Fire (no promise)" were hanging infinitely. The sub-agents spawned to test these rules never returned, even with `max_turns: 5` configured. + +### Investigation + +The `rules_check.py` hook handles two types of rule actions: +1. **PROMPT rules**: Show instructions to the agent +2. **COMMAND rules**: Run a shell command (e.g., linting, type checking) + +For **PROMPT rules**, there was existing logic to prevent infinite loops (lines 617-624 in `rules_check.py`): + +```python +# For PROMPT rules, also skip if already QUEUED (already shown to agent). +# This prevents infinite loops when transcript is unavailable or promise +# tags haven't been written yet. The agent has already seen this rule. +if ( + existing + and existing.status == QueueEntryStatus.QUEUED + and rule.action_type == ActionType.PROMPT +): + continue +``` + +However, for **COMMAND rules**, there was no equivalent protection. The flow was: + +1. Agent edits file +2. Hook runs, command fails, status set to FAILED, blocks with error +3. Agent sees error, responds (without promise) +4. Hook runs again +5. Rule triggers (same files still modified) +6. Existing entry has status FAILED, but FAILED is not in skip conditions +7. Command runs again, fails again, blocks again +8. 
Go to step 3 → **infinite loop** + +### Root Cause + +The queue status checks only skipped rules with status PASSED or SKIPPED: + +```python +if existing and existing.status in ( + QueueEntryStatus.PASSED, + QueueEntryStatus.SKIPPED, +): + continue +``` + +Command rules with FAILED status were not skipped, causing them to re-run on every hook invocation until a promise was provided. But without any way for the agent to know it was in a loop, the command would run infinitely. + +### The Fix + +Added a check for COMMAND rules with FAILED status, similar to the existing PROMPT/QUEUED check: + +```python +# For COMMAND rules, also skip if already FAILED (already shown error). +# This prevents infinite loops when a command always fails. +# The agent needs to either fix the underlying issue or provide a promise. +if ( + existing + and existing.status == QueueEntryStatus.FAILED + and rule.action_type == ActionType.COMMAND +): + continue +``` + +This ensures that: +- A failing command only runs once per trigger +- The agent sees the error message once +- The agent must provide a `Rule Name` tag to proceed +- No infinite loop occurs + +### Test Cases Affected + +- `manual_tests/test_infinite_block_command/` - Tests a rule with `command: "false"` (always fails) +- The test verifies that the hook fires AND the sub-agent returns in reasonable time (doesn't hang) + +### Key Learnings + +1. **Any hook action that can block must have loop prevention**: Both PROMPT and COMMAND rules need mechanisms to prevent re-triggering infinitely. + +2. **Queue status is the key to loop prevention**: The rules queue tracks what the agent has already seen. Rules should not re-trigger if they've already been shown to the agent (QUEUED for prompts, FAILED for commands). + +3. **Symmetry in action handling**: When adding loop prevention for one action type, check if other action types need similar protection. 
+ +### Related Files + +- `src/deepwork/hooks/rules_check.py` - Main hook implementation +- `src/deepwork/core/rules_queue.py` - Queue entry status definitions +- `.deepwork/rules/manual-test-infinite-block-command.md` - Test rule +- `manual_tests/test_infinite_block_command/` - Test files + +--- + +## Template for Future Entries + +When debugging hooks issues, document findings using this structure: + +```markdown +## YYYY-MM-DD: Brief Issue Title + +### Symptoms +What was observed? What tests were failing? + +### Investigation +What was examined? What code paths were traced? + +### Root Cause +What was the actual bug? + +### The Fix +What changes were made? + +### Test Cases Affected +Which tests verify this fix? + +### Key Learnings +What general lessons apply to future development? + +### Related Files +Which files are involved? +``` diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py index 0a39fa3d..9d1f5d04 100644 --- a/src/deepwork/hooks/rules_check.py +++ b/src/deepwork/hooks/rules_check.py @@ -624,6 +624,16 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput: ): continue + # For COMMAND rules, also skip if already FAILED (already shown error). + # This prevents infinite loops when a command always fails. + # The agent needs to either fix the underlying issue or provide a promise. 
+ if ( + existing + and existing.status == QueueEntryStatus.FAILED + and rule.action_type == ActionType.COMMAND + ): + continue + # Create queue entry if new if not existing: queue.create_entry( From 87648f216982e9fd5b767e082604aef71af223fc Mon Sep 17 00:00:00 2001 From: Noah Horton Date: Thu, 22 Jan 2026 14:43:33 -0700 Subject: [PATCH 2/3] fixes --- doc/debugging_history/AGENTS.md | 54 +++++++++++++++++++++++++++ doc/debugging_history/hooks.md | 62 +++++++++++++++---------------- src/deepwork/hooks/rules_check.py | 26 +++++++++++-- 3 files changed, 106 insertions(+), 36 deletions(-) create mode 100644 doc/debugging_history/AGENTS.md diff --git a/doc/debugging_history/AGENTS.md b/doc/debugging_history/AGENTS.md new file mode 100644 index 00000000..ef68c739 --- /dev/null +++ b/doc/debugging_history/AGENTS.md @@ -0,0 +1,54 @@ +# Debugging History Documentation Guide + +This directory contains documentation of debugging sessions for DeepWork. Each file focuses on a specific subsystem (e.g., `hooks.md` for the hooks system). + +## Purpose + +Recording debugging sessions helps: +1. Preserve institutional knowledge about subtle bugs +2. Prevent regressions by documenting root causes +3. Provide context for future developers encountering similar issues +4. Build a pattern library of common issues and solutions + +## Template for Debugging Entries + +When documenting a debugging session, use this structure: + +```markdown +## YYYY-MM-DD: Brief Issue Title + +### Symptoms +What was observed? What tests were failing? + +### Investigation +What was examined? What code paths were traced? + +### Root Cause +What was the actual bug? + +### The Fix +What changes were made? + +### Test Cases Affected +Which tests verify this fix? + +### Key Learnings +What general lessons apply to future development? + +### Related Files +Which files are involved? +``` + +## Guidelines + +1. **Be specific**: Include exact file paths, line numbers, and code snippets +2. 
**Document the journey**: Explain what you tried, not just what worked +3. **Highlight patterns**: Note if the issue represents a common class of bugs +4. **Link to commits/PRs**: Reference the fix for easy lookup +5. **Keep it concise**: Focus on what's useful for future debugging + +## File Organization + +- One file per subsystem (e.g., `hooks.md`, `queue.md`, `parser.md`) +- Entries within each file are in reverse chronological order (newest first) +- Use consistent heading levels for easy navigation diff --git a/doc/debugging_history/hooks.md b/doc/debugging_history/hooks.md index 9948f5e1..521944c9 100644 --- a/doc/debugging_history/hooks.md +++ b/doc/debugging_history/hooks.md @@ -57,12 +57,13 @@ Command rules with FAILED status were not skipped, causing them to re-run on eve ### The Fix -Added a check for COMMAND rules with FAILED status, similar to the existing PROMPT/QUEUED check: +Two-part fix in `rules_check.py`: + +1. **Prevent re-running**: Skip COMMAND rules with FAILED status to prevent infinite loops: ```python -# For COMMAND rules, also skip if already FAILED (already shown error). -# This prevents infinite loops when a command always fails. -# The agent needs to either fix the underlying issue or provide a promise. +# For COMMAND rules with FAILED status, don't re-run the command. +# The agent has already seen the error. if ( existing and existing.status == QueueEntryStatus.FAILED @@ -71,10 +72,32 @@ if ( continue ``` +2. 
**Honor promises**: After processing results, check all FAILED queue entries and update to SKIPPED if the agent provided a promise: + +```python +# Handle FAILED queue entries that have been promised +if promised_rules: + promised_lower = {name.lower() for name in promised_rules} + for entry in queue.get_all_entries(): + if ( + entry.status == QueueEntryStatus.FAILED + and entry.rule_name.lower() in promised_lower + ): + queue.update_status( + entry.trigger_hash, + QueueEntryStatus.SKIPPED, + ActionResult( + type="command", + output="Acknowledged via promise tag", + exit_code=None, + ), + ) +``` + This ensures that: - A failing command only runs once per trigger - The agent sees the error message once -- The agent must provide a `Rule Name` tag to proceed +- When the agent provides a `Rule Name` tag, the queue entry is properly updated to SKIPPED - No infinite loop occurs ### Test Cases Affected @@ -99,31 +122,4 @@ This ensures that: --- -## Template for Future Entries - -When debugging hooks issues, document findings using this structure: - -```markdown -## YYYY-MM-DD: Brief Issue Title - -### Symptoms -What was observed? What tests were failing? - -### Investigation -What was examined? What code paths were traced? - -### Root Cause -What was the actual bug? - -### The Fix -What changes were made? - -### Test Cases Affected -Which tests verify this fix? - -### Key Learnings -What general lessons apply to future development? - -### Related Files -Which files are involved? -``` +*For the template and guidelines on documenting debugging sessions, see [AGENTS.md](./AGENTS.md).* diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py index 9d1f5d04..6ac2d652 100644 --- a/src/deepwork/hooks/rules_check.py +++ b/src/deepwork/hooks/rules_check.py @@ -624,9 +624,9 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput: ): continue - # For COMMAND rules, also skip if already FAILED (already shown error). 
- # This prevents infinite loops when a command always fails. - # The agent needs to either fix the underlying issue or provide a promise. + # For COMMAND rules with FAILED status, don't re-run the command. + # The agent has already seen the error. If they provide a promise, + # the after-loop logic will update the status to SKIPPED. if ( existing and existing.status == QueueEntryStatus.FAILED @@ -685,6 +685,26 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput: # Collect for prompt output prompt_results.append(result) + # Handle FAILED queue entries that have been promised + # (These rules weren't in results because evaluate_rules skips promised rules, + # but we need to update their queue status to SKIPPED) + if promised_rules: + promised_lower = {name.lower() for name in promised_rules} + for entry in queue.get_all_entries(): + if ( + entry.status == QueueEntryStatus.FAILED + and entry.rule_name.lower() in promised_lower + ): + queue.update_status( + entry.trigger_hash, + QueueEntryStatus.SKIPPED, + ActionResult( + type="command", + output="Acknowledged via promise tag", + exit_code=None, + ), + ) + # Build response messages: list[str] = [] From cd9f272d2b0f14d2781ee2117b934c258085fc3c Mon Sep 17 00:00:00 2001 From: Noah Horton Date: Thu, 22 Jan 2026 14:47:27 -0700 Subject: [PATCH 3/3] Bump version to 0.5.2 for COMMAND rules promise fix Adds changelog entry documenting the bug fix where FAILED command rule queue entries are now properly updated to SKIPPED when agents provide promise tags. Co-Authored-By: Claude Opus 4.5 --- CHANGELOG.md | 8 ++++++++ pyproject.toml | 2 +- uv.lock | 2 +- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index e6155e66..1a6a3ce0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,14 @@ All notable changes to DeepWork will be documented in this file. 
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.5.2] - 2026-01-22 + +### Fixed +- Fixed COMMAND rules promise handling to properly update queue status + - When an agent provides a promise tag for a FAILED command rule, the queue entry is now correctly updated to SKIPPED status + - Previously, FAILED queue entries remained in FAILED state even after being acknowledged via promise + - This ensures the rules queue accurately reflects rule state throughout the workflow + ## [0.5.1] - 2026-01-22 ### Fixed diff --git a/pyproject.toml b/pyproject.toml index c2bc3e4a..1c5f8e02 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "deepwork" -version = "0.5.1" +version = "0.5.2" description = "Framework for enabling AI agents to perform complex, multi-step work tasks" readme = "README.md" requires-python = ">=3.11" diff --git a/uv.lock b/uv.lock index 5c61745e..474e30f9 100644 --- a/uv.lock +++ b/uv.lock @@ -126,7 +126,7 @@ toml = [ [[package]] name = "deepwork" -version = "0.5.1" +version = "0.5.2" source = { editable = "." } dependencies = [ { name = "click" },
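
Note on the series as a whole: the combined behavior of patches 1/3 and 2/3 can be sketched as a small toy model. This is not the code in `rules_check.py`; `Status`, `Action`, `Entry`, `should_skip`, `honor_promises`, and `simulate` are hypothetical names introduced only for illustration, and the queue here is a plain dict rather than the real rules queue. It assumes, as the patch comments state, that promised rules are not re-evaluated and that promise matching is case-insensitive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    QUEUED = "queued"
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


class Action(Enum):
    PROMPT = "prompt"
    COMMAND = "command"


@dataclass
class Entry:
    # Hypothetical stand-in for a queue entry; the real one has more fields.
    rule_name: str
    action: Action
    status: Status


def should_skip(entry: Optional[Entry], action: Action) -> bool:
    """Return True when the rule must not be re-run for this trigger."""
    if entry is None:
        return False
    if entry.status in (Status.PASSED, Status.SKIPPED):
        return True
    if entry.status is Status.QUEUED and action is Action.PROMPT:
        return True
    # Patch 1/3: a COMMAND rule that already FAILED is not re-run.
    return entry.status is Status.FAILED and action is Action.COMMAND


def honor_promises(entries, promised_rules) -> None:
    """Patch 2/3: mark promised FAILED entries as SKIPPED (case-insensitive)."""
    promised_lower = {name.lower() for name in promised_rules}
    for entry in entries:
        if entry.status is Status.FAILED and entry.rule_name.lower() in promised_lower:
            entry.status = Status.SKIPPED


def simulate(invocations: int, promise_at: Optional[int] = None):
    """Count how often an always-failing command runs across hook invocations."""
    queue = {}
    runs = 0
    for turn in range(invocations):
        # Assumption: the agent's promise names the rule in a different case.
        promised = ["lint"] if turn == promise_at else []
        existing = queue.get("Lint")
        # Promised rules are not evaluated at all in this model.
        if not promised and not should_skip(existing, Action.COMMAND):
            runs += 1  # the command runs and fails
            queue["Lint"] = Entry("Lint", Action.COMMAND, Status.FAILED)
        honor_promises(queue.values(), promised)
    return runs, queue["Lint"].status
```

In this model, `simulate(5)` runs the failing command once and leaves the entry FAILED, while `simulate(5, promise_at=2)` also runs it once but ends with the entry SKIPPED, which is the state change the 0.5.2 changelog entry describes.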