diff --git a/AGENTS.md b/AGENTS.md
index 3e469ef8..2c34acf4 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -84,6 +84,19 @@ deepwork/
 ├── deepwork_rules/ # ← Installed copy, NOT source of truth
 └── [bespoke_job]/ # ← Source of truth for bespoke only
 
+## Debugging Issues
+
+When debugging issues in this codebase, **always consult `doc/debugging_history/`** first. This directory contains documentation of past debugging sessions, including:
+
+- Root causes of tricky bugs
+- Key learnings and patterns to avoid
+- Related files and test cases
+
+**After resolving an issue**, append your findings to the appropriate file in `doc/debugging_history/` (or create a new file if none exists for that subsystem). This helps future agents avoid the same pitfalls.
+
+Current debugging history files:
+- `doc/debugging_history/hooks.md` - Hooks system debugging (rules_check, blocking, queue management)
+
 ## Development Environment
 
 This project uses **Nix Flakes** to provide a reproducible development environment.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e6155e66..1a6a3ce0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to DeepWork will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.2] - 2026-01-22
+
+### Fixed
+- Fixed COMMAND rules promise handling to properly update queue status
+  - When an agent provides a promise tag for a FAILED command rule, the queue entry is now correctly updated to SKIPPED status
+  - Previously, FAILED queue entries remained in FAILED state even after being acknowledged via promise
+  - This ensures the rules queue accurately reflects rule state throughout the workflow
+
 ## [0.5.1] - 2026-01-22
 
 ### Fixed
diff --git a/doc/debugging_history/AGENTS.md b/doc/debugging_history/AGENTS.md
new file mode 100644
index 00000000..ef68c739
--- /dev/null
+++ b/doc/debugging_history/AGENTS.md
@@ -0,0 +1,54 @@
+# Debugging History Documentation Guide
+
+This directory contains documentation of debugging sessions for DeepWork. Each file focuses on a specific subsystem (e.g., `hooks.md` for the hooks system).
+
+## Purpose
+
+Recording debugging sessions helps:
+1. Preserve institutional knowledge about subtle bugs
+2. Prevent regressions by documenting root causes
+3. Provide context for future developers encountering similar issues
+4. Build a pattern library of common issues and solutions
+
+## Template for Debugging Entries
+
+When documenting a debugging session, use this structure:
+
+```markdown
+## YYYY-MM-DD: Brief Issue Title
+
+### Symptoms
+What was observed? What tests were failing?
+
+### Investigation
+What was examined? What code paths were traced?
+
+### Root Cause
+What was the actual bug?
+
+### The Fix
+What changes were made?
+
+### Test Cases Affected
+Which tests verify this fix?
+
+### Key Learnings
+What general lessons apply to future development?
+
+### Related Files
+Which files are involved?
+```
+
+## Guidelines
+
+1. **Be specific**: Include exact file paths, line numbers, and code snippets
+2. **Document the journey**: Explain what you tried, not just what worked
+3. **Highlight patterns**: Note if the issue represents a common class of bugs
+4. **Link to commits/PRs**: Reference the fix for easy lookup
+5. **Keep it concise**: Focus on what's useful for future debugging
+
+## File Organization
+
+- One file per subsystem (e.g., `hooks.md`, `queue.md`, `parser.md`)
+- Entries within each file are in reverse chronological order (newest first)
+- Use consistent heading levels for easy navigation
diff --git a/doc/debugging_history/hooks.md b/doc/debugging_history/hooks.md
new file mode 100644
index 00000000..521944c9
--- /dev/null
+++ b/doc/debugging_history/hooks.md
@@ -0,0 +1,125 @@
+# Hooks System Debugging History
+
+This document records debugging sessions and findings for the DeepWork hooks system.
+
+---
+
+## 2026-01-22: Infinite Loop Bug in Command Rules
+
+### Symptoms
+
+The manual test "Infinite Block Command - Should Fire (no promise)" hung indefinitely. The sub-agents spawned to test this rule never returned, even with `max_turns: 5` configured.
+
+### Investigation
+
+The `rules_check.py` hook handles two types of rule actions:
+1. **PROMPT rules**: Show instructions to the agent
+2. **COMMAND rules**: Run a shell command (e.g., linting, type checking)
+
+For **PROMPT rules**, there was existing logic to prevent infinite loops (lines 617-624 in `rules_check.py`):
+
+```python
+# For PROMPT rules, also skip if already QUEUED (already shown to agent).
+# This prevents infinite loops when transcript is unavailable or promise
+# tags haven't been written yet. The agent has already seen this rule.
+if (
+    existing
+    and existing.status == QueueEntryStatus.QUEUED
+    and rule.action_type == ActionType.PROMPT
+):
+    continue
+```
+
+However, for **COMMAND rules**, there was no equivalent protection. The flow was:
+
+1. Agent edits file
+2. Hook runs, command fails, status set to FAILED, blocks with error
+3. Agent sees error, responds (without promise)
+4. Hook runs again
+5. Rule triggers (same files still modified)
+6. Existing entry has status FAILED, but FAILED is not in skip conditions
+7. Command runs again, fails again, blocks again
+8. Go to step 3 → **infinite loop**
+
+### Root Cause
+
+The queue status checks only skipped rules with status PASSED or SKIPPED:
+
+```python
+if existing and existing.status in (
+    QueueEntryStatus.PASSED,
+    QueueEntryStatus.SKIPPED,
+):
+    continue
+```
+
+Command rules with FAILED status were not skipped, so they re-ran on every hook invocation until the agent provided a promise. Nothing told the agent it was in a loop, and this test never provides a promise, so the command re-ran indefinitely.
+
+### The Fix
+
+Two-part fix in `rules_check.py`:
+
+1. **Prevent re-running**: Skip COMMAND rules with FAILED status to prevent infinite loops:
+
+```python
+# For COMMAND rules with FAILED status, don't re-run the command.
+# The agent has already seen the error.
+if (
+    existing
+    and existing.status == QueueEntryStatus.FAILED
+    and rule.action_type == ActionType.COMMAND
+):
+    continue
+```
+
+2. **Honor promises**: After processing results, check all FAILED queue entries and update to SKIPPED if the agent provided a promise:
+
+```python
+# Handle FAILED queue entries that have been promised
+if promised_rules:
+    promised_lower = {name.lower() for name in promised_rules}
+    for entry in queue.get_all_entries():
+        if (
+            entry.status == QueueEntryStatus.FAILED
+            and entry.rule_name.lower() in promised_lower
+        ):
+            queue.update_status(
+                entry.trigger_hash,
+                QueueEntryStatus.SKIPPED,
+                ActionResult(
+                    type="command",
+                    output="Acknowledged via promise tag",
+                    exit_code=None,
+                ),
+            )
+```
+
+This ensures that:
+- A failing command only runs once per trigger
+- The agent sees the error message once
+- When the agent provides a promise tag naming the rule, the queue entry is properly updated to SKIPPED
+- No infinite loop occurs
+
+### Test Cases Affected
+
+- `manual_tests/test_infinite_block_command/` - Tests a rule with `command: "false"` (always fails)
+- The test verifies that the hook fires AND that the sub-agent returns in a reasonable time (does not hang)
+
+### Key Learnings
+
+1. **Any hook action that can block must have loop prevention**: Both PROMPT and COMMAND rules need mechanisms to prevent re-triggering infinitely.
+
+2. **Queue status is the key to loop prevention**: The rules queue tracks what the agent has already seen. Rules should not re-trigger if they've already been shown to the agent (QUEUED for prompts, FAILED for commands).
+
+3. **Symmetry in action handling**: When adding loop prevention for one action type, check if other action types need similar protection.
+
+### Related Files
+
+- `src/deepwork/hooks/rules_check.py` - Main hook implementation
+- `src/deepwork/core/rules_queue.py` - Queue entry status definitions
+- `.deepwork/rules/manual-test-infinite-block-command.md` - Test rule
+- `manual_tests/test_infinite_block_command/` - Test files
+
+---
+
+*For the template and guidelines on documenting debugging sessions, see [AGENTS.md](./AGENTS.md).*
diff --git a/pyproject.toml b/pyproject.toml
index c2bc3e4a..1c5f8e02 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "deepwork"
-version = "0.5.1"
+version = "0.5.2"
 description = "Framework for enabling AI agents to perform complex, multi-step work tasks"
 readme = "README.md"
 requires-python = ">=3.11"
diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py
index 0a39fa3d..6ac2d652 100644
--- a/src/deepwork/hooks/rules_check.py
+++ b/src/deepwork/hooks/rules_check.py
@@ -624,6 +624,16 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
         ):
             continue
 
+        # For COMMAND rules with FAILED status, don't re-run the command.
+        # The agent has already seen the error. If they provide a promise,
+        # the after-loop logic will update the status to SKIPPED.
+        if (
+            existing
+            and existing.status == QueueEntryStatus.FAILED
+            and rule.action_type == ActionType.COMMAND
+        ):
+            continue
+
         # Create queue entry if new
         if not existing:
             queue.create_entry(
@@ -675,6 +685,26 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
             # Collect for prompt output
             prompt_results.append(result)
 
+    # Handle FAILED queue entries that have been promised
+    # (These rules weren't in results because evaluate_rules skips promised rules,
+    # but we need to update their queue status to SKIPPED)
+    if promised_rules:
+        promised_lower = {name.lower() for name in promised_rules}
+        for entry in queue.get_all_entries():
+            if (
+                entry.status == QueueEntryStatus.FAILED
+                and entry.rule_name.lower() in promised_lower
+            ):
+                queue.update_status(
+                    entry.trigger_hash,
+                    QueueEntryStatus.SKIPPED,
+                    ActionResult(
+                        type="command",
+                        output="Acknowledged via promise tag",
+                        exit_code=None,
+                    ),
+                )
+
     # Build response
     messages: list[str] = []
 
diff --git a/uv.lock b/uv.lock
index 5c61745e..474e30f9 100644
--- a/uv.lock
+++ b/uv.lock
@@ -126,7 +126,7 @@ toml = [
 
 [[package]]
 name = "deepwork"
-version = "0.5.1"
+version = "0.5.2"
 source = { editable = "." }
 dependencies = [
     { name = "click" },