Unsupervisedcom · nhorton · Jan 22, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -84,6 +84,19 @@ deepwork/
     ├── deepwork_rules/             # ← Installed copy, NOT source of truth
     └── [bespoke_job]/              # ← Source of truth for bespoke only
 
+## Debugging Issues
+
+When debugging issues in this codebase, **always consult `doc/debugging_history/`** first. This directory contains documentation of past debugging sessions, including:
+
+- Root causes of tricky bugs
+- Key learnings and patterns to avoid
+- Related files and test cases
+
+**After resolving an issue**, append your findings to the appropriate file in `doc/debugging_history/` (or create a new file if none exists for that subsystem). This helps future agents avoid the same pitfalls.
+
+Current debugging history files:
+- `doc/debugging_history/hooks.md` - Hooks system debugging (rules_check, blocking, queue management)
+
 ## Development Environment
 
 This project uses **Nix Flakes** to provide a reproducible development environment.

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to DeepWork will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.2] - 2026-01-22
+
+### Fixed
+- Fixed COMMAND rules promise handling to properly update queue status
+  - When an agent provides a promise tag for a FAILED command rule, the queue entry is now correctly updated to SKIPPED status
+  - Previously, FAILED queue entries remained in FAILED state even after being acknowledged via promise
+  - This ensures the rules queue accurately reflects rule state throughout the workflow
+
 ## [0.5.1] - 2026-01-22
 
 ### Fixed

diff --git a/doc/debugging_history/AGENTS.md b/doc/debugging_history/AGENTS.md
@@ -0,0 +1,54 @@
+# Debugging History Documentation Guide
+
+This directory contains documentation of debugging sessions for DeepWork. Each file focuses on a specific subsystem (e.g., `hooks.md` for the hooks system).
+
+## Purpose
+
+Recording debugging sessions helps:
+1. Preserve institutional knowledge about subtle bugs
+2. Prevent regressions by documenting root causes
+3. Provide context for future developers encountering similar issues
+4. Build a pattern library of common issues and solutions
+
+## Template for Debugging Entries
+
+When documenting a debugging session, use this structure:
+
+```markdown
+## YYYY-MM-DD: Brief Issue Title
+
+### Symptoms
+What was observed? What tests were failing?
+
+### Investigation
+What was examined? What code paths were traced?
+
+### Root Cause
+What was the actual bug?
+
+### The Fix
+What changes were made?
+
+### Test Cases Affected
+Which tests verify this fix?
+
+### Key Learnings
+What general lessons apply to future development?
+
+### Related Files
+Which files are involved?
+```
+
+## Guidelines
+
+1. **Be specific**: Include exact file paths, line numbers, and code snippets
+2. **Document the journey**: Explain what you tried, not just what worked
+3. **Highlight patterns**: Note if the issue represents a common class of bugs
+4. **Link to commits/PRs**: Reference the fix for easy lookup
+5. **Keep it concise**: Focus on what's useful for future debugging
+
+## File Organization
+
+- One file per subsystem (e.g., `hooks.md`, `queue.md`, `parser.md`)
+- Entries within each file are in reverse chronological order (newest first)
+- Use consistent heading levels for easy navigation
diff --git a/doc/debugging_history/hooks.md b/doc/debugging_history/hooks.md
@@ -0,0 +1,125 @@
+# Hooks System Debugging History
+
+This document records debugging sessions and findings for the DeepWork hooks system.
+
+---
+
+## 2026-01-22: Infinite Loop Bug in Command Rules
+
+### Symptoms
+
+The manual tests "Infinite Block Command - Should Fire (no promise)" were hanging indefinitely. The sub-agents spawned to test these rules never returned, even with `max_turns: 5` configured.
+
+### Investigation
+
+The `rules_check.py` hook handles two types of rule actions:
+1. **PROMPT rules**: Show instructions to the agent
+2. **COMMAND rules**: Run a shell command (e.g., linting, type checking)
+
+For **PROMPT rules**, there was existing logic to prevent infinite loops (lines 617-624 in `rules_check.py`):
+
+```python
+# For PROMPT rules, also skip if already QUEUED (already shown to agent).
+# This prevents infinite loops when transcript is unavailable or promise
+# tags haven't been written yet. The agent has already seen this rule.
+if (
+    existing
+    and existing.status == QueueEntryStatus.QUEUED
+    and rule.action_type == ActionType.PROMPT
+):
+    continue
+```
+
+However, for **COMMAND rules**, there was no equivalent protection. The flow was:
+
+1. Agent edits file
+2. Hook runs, command fails, status set to FAILED, blocks with error
+3. Agent sees error, responds (without promise)
+4. Hook runs again
+5. Rule triggers (same files still modified)
+6. Existing entry has status FAILED, but FAILED is not in skip conditions
+7. Command runs again, fails again, blocks again
+8. Go to step 3 → **infinite loop**
+
+### Root Cause
+
+The queue status checks only skipped rules with status PASSED or SKIPPED:
+
+```python
+if existing and existing.status in (
+    QueueEntryStatus.PASSED,
+    QueueEntryStatus.SKIPPED,
+):
+    continue
+```
+
+Command rules with FAILED status were not skipped, so they re-ran on every hook invocation until a promise was provided. Because the agent had no way to know it was in a loop, it kept responding without a promise, and the command re-ran indefinitely.
+
+### The Fix
+
+Two-part fix in `rules_check.py`:
+
+1. **Prevent re-running**: Skip COMMAND rules with FAILED status to prevent infinite loops:
+
+```python
+# For COMMAND rules with FAILED status, don't re-run the command.
+# The agent has already seen the error.
+if (
+    existing
+    and existing.status == QueueEntryStatus.FAILED
+    and rule.action_type == ActionType.COMMAND
+):
+    continue
+```
+
+2. **Honor promises**: After processing results, check all FAILED queue entries and update to SKIPPED if the agent provided a promise:
+
+```python
+# Handle FAILED queue entries that have been promised
+if promised_rules:
+    promised_lower = {name.lower() for name in promised_rules}
+    for entry in queue.get_all_entries():
+        if (
+            entry.status == QueueEntryStatus.FAILED
+            and entry.rule_name.lower() in promised_lower
+        ):
+            queue.update_status(
+                entry.trigger_hash,
+                QueueEntryStatus.SKIPPED,
+                ActionResult(
+                    type="command",
+                    output="Acknowledged via promise tag",
+                    exit_code=None,
+                ),
+            )
+```
+
+This ensures that:
+- A failing command runs only once per trigger
+- The agent sees the error message once
+- When the agent provides a `<promise>Rule Name</promise>` tag, the queue entry is properly updated to SKIPPED
+- No infinite loop occurs
+
+### Test Cases Affected
+
+- `manual_tests/test_infinite_block_command/` - Tests a rule with `command: "false"` (always fails)
+- The test verifies that the hook fires AND the sub-agent returns in reasonable time (doesn't hang)
+
+### Key Learnings
+
+1. **Any hook action that can block must have loop prevention**: Both PROMPT and COMMAND rules need mechanisms to prevent re-triggering infinitely.
+
+2. **Queue status is the key to loop prevention**: The rules queue tracks what the agent has already seen. Rules should not re-trigger if they've already been shown to the agent (QUEUED for prompts, FAILED for commands).
+
+3. **Symmetry in action handling**: When adding loop prevention for one action type, check if other action types need similar protection.
+
+### Related Files
+
+- `src/deepwork/hooks/rules_check.py` - Main hook implementation
+- `src/deepwork/core/rules_queue.py` - Queue entry status definitions
+- `.deepwork/rules/manual-test-infinite-block-command.md` - Test rule
+- `manual_tests/test_infinite_block_command/` - Test files
+
+---
+
+*For the template and guidelines on documenting debugging sessions, see [AGENTS.md](./AGENTS.md).*
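Note on the loop described in `hooks.md`: the skip conditions can be exercised in isolation with a small simulation. The sketch below is not the code in `rules_check.py`. `should_skip` and `count_command_runs` are hypothetical helpers, and the enum members are stand-ins whose values are assumed. It only models an always-failing COMMAND rule across repeated hook invocations in which the agent never provides a promise.

```python
from enum import Enum


class QueueEntryStatus(Enum):
    # Stand-in for the real enum; member values are assumptions.
    QUEUED = "queued"
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


class ActionType(Enum):
    # Stand-in for the real enum; member values are assumptions.
    PROMPT = "prompt"
    COMMAND = "command"


def should_skip(status, action_type):
    """Hypothetical helper: True if the rule must not fire again for this trigger."""
    if status is None:
        return False  # no queue entry yet, so the rule has never fired
    if status in (QueueEntryStatus.PASSED, QueueEntryStatus.SKIPPED):
        return True
    if status == QueueEntryStatus.QUEUED and action_type == ActionType.PROMPT:
        return True  # prompt already shown to the agent
    if status == QueueEntryStatus.FAILED and action_type == ActionType.COMMAND:
        return True  # error already shown to the agent (the new condition)
    return False


def count_command_runs(hook_invocations, with_fix):
    """Count how often an always-failing command runs when no promise arrives."""
    status = None
    runs = 0
    for _ in range(hook_invocations):
        if with_fix:
            skip = should_skip(status, ActionType.COMMAND)
        else:
            # Behaviour before the fix: only PASSED and SKIPPED were skipped.
            skip = status in (QueueEntryStatus.PASSED, QueueEntryStatus.SKIPPED)
        if skip:
            continue
        runs += 1
        status = QueueEntryStatus.FAILED  # the command always fails
    return runs


runs_before = count_command_runs(5, with_fix=False)
runs_after = count_command_runs(5, with_fix=True)
```

Under these assumptions the command runs on every invocation without the FAILED check and exactly once with it, which matches the "once per trigger" behaviour the document describes.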
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "deepwork"
-version = "0.5.1"
+version = "0.5.2"
 description = "Framework for enabling AI agents to perform complex, multi-step work tasks"
 readme = "README.md"
 requires-python = ">=3.11"

diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py
@@ -624,6 +624,16 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
             ):
                 continue
 
+            # For COMMAND rules with FAILED status, don't re-run the command.
+            # The agent has already seen the error. If they provide a promise,
+            # the after-loop logic will update the status to SKIPPED.
+            if (
+                existing
+                and existing.status == QueueEntryStatus.FAILED
+                and rule.action_type == ActionType.COMMAND
+            ):
+                continue
+
             # Create queue entry if new
             if not existing:
                 queue.create_entry(
@@ -675,6 +685,26 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
                 # Collect for prompt output
                 prompt_results.append(result)
 
+    # Handle FAILED queue entries that have been promised
+    # (These rules weren't in results because evaluate_rules skips promised rules,
+    # but we need to update their queue status to SKIPPED)
+    if promised_rules:
+        promised_lower = {name.lower() for name in promised_rules}
+        for entry in queue.get_all_entries():
+            if (
+                entry.status == QueueEntryStatus.FAILED
+                and entry.rule_name.lower() in promised_lower
+            ):
+                queue.update_status(
+                    entry.trigger_hash,
+                    QueueEntryStatus.SKIPPED,
+                    ActionResult(
+                        type="command",
+                        output="Acknowledged via promise tag",
+                        exit_code=None,
+                    ),
+                )
+
     # Build response
     messages: list[str] = []
 

diff --git a/uv.lock b/uv.lock