13 changes: 13 additions & 0 deletions AGENTS.md
@@ -84,6 +84,19 @@ deepwork/
├── deepwork_rules/ # ← Installed copy, NOT source of truth
└── [bespoke_job]/ # ← Source of truth for bespoke only

## Debugging Issues

When debugging issues in this codebase, **always consult `doc/debugging_history/`** first. This directory contains documentation of past debugging sessions, including:

- Root causes of tricky bugs
- Key learnings and patterns to avoid
- Related files and test cases

**After resolving an issue**, append your findings to the appropriate file in `doc/debugging_history/` (or create a new file if none exists for that subsystem). This helps future agents avoid the same pitfalls.

Current debugging history files:
- `doc/debugging_history/hooks.md` - Hooks system debugging (rules_check, blocking, queue management)

## Development Environment

This project uses **Nix Flakes** to provide a reproducible development environment.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to DeepWork will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.5.2] - 2026-01-22

### Fixed
- Fixed promise handling for COMMAND rules so that the queue status is updated correctly
- When an agent provides a promise tag for a FAILED command rule, the queue entry is now correctly updated to SKIPPED status
- Previously, FAILED queue entries remained in FAILED state even after being acknowledged via promise
- This ensures the rules queue accurately reflects rule state throughout the workflow

## [0.5.1] - 2026-01-22

### Fixed
54 changes: 54 additions & 0 deletions doc/debugging_history/AGENTS.md
@@ -0,0 +1,54 @@
# Debugging History Documentation Guide

This directory contains documentation of debugging sessions for DeepWork. Each file focuses on a specific subsystem (e.g., `hooks.md` for the hooks system).

## Purpose

Recording debugging sessions helps:
1. Preserve institutional knowledge about subtle bugs
2. Prevent regressions by documenting root causes
3. Provide context for future developers encountering similar issues
4. Build a pattern library of common issues and solutions

## Template for Debugging Entries

When documenting a debugging session, use this structure:

```markdown
## YYYY-MM-DD: Brief Issue Title

### Symptoms
What was observed? What tests were failing?

### Investigation
What was examined? What code paths were traced?

### Root Cause
What was the actual bug?

### The Fix
What changes were made?

### Test Cases Affected
Which tests verify this fix?

### Key Learnings
What general lessons apply to future development?

### Related Files
Which files are involved?
```

## Guidelines

1. **Be specific**: Include exact file paths, line numbers, and code snippets
2. **Document the journey**: Explain what you tried, not just what worked
3. **Highlight patterns**: Note if the issue represents a common class of bugs
4. **Link to commits/PRs**: Reference the fix for easy lookup
5. **Keep it concise**: Focus on what's useful for future debugging

## File Organization

- One file per subsystem (e.g., `hooks.md`, `queue.md`, `parser.md`)
- Entries within each file are in reverse chronological order (newest first)
- Use consistent heading levels for easy navigation
125 changes: 125 additions & 0 deletions doc/debugging_history/hooks.md
@@ -0,0 +1,125 @@
# Hooks System Debugging History

This document records debugging sessions and findings for the DeepWork hooks system.

---

## 2026-01-22: Infinite Loop Bug in Command Rules

### Symptoms

The manual test "Infinite Block Command - Should Fire (no promise)" hung indefinitely. The sub-agents spawned to exercise this rule never returned, even with `max_turns: 5` configured.

### Investigation

The `rules_check.py` hook handles two types of rule actions:
1. **PROMPT rules**: Show instructions to the agent
2. **COMMAND rules**: Run a shell command (e.g., linting, type checking)

For **PROMPT rules**, there was existing logic to prevent infinite loops (lines 617-624 in `rules_check.py`):

```python
# For PROMPT rules, also skip if already QUEUED (already shown to agent).
# This prevents infinite loops when transcript is unavailable or promise
# tags haven't been written yet. The agent has already seen this rule.
if (
    existing
    and existing.status == QueueEntryStatus.QUEUED
    and rule.action_type == ActionType.PROMPT
):
    continue
```

However, for **COMMAND rules**, there was no equivalent protection. The flow was:

1. Agent edits file
2. Hook runs, command fails, status set to FAILED, blocks with error
3. Agent sees error, responds (without promise)
4. Hook runs again
5. Rule triggers (same files still modified)
6. Existing entry has status FAILED, but FAILED is not among the skip conditions
7. Command runs again, fails again, blocks again
8. Go to step 3 → **infinite loop**
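To make the loop concrete, here is a minimal Python sketch of the pre-fix control flow. It does not use DeepWork's real classes: `Status`, `ToyQueue`, and `run_command_rule_pre_fix` are hypothetical stand-ins, the trigger hash is a made-up string, and the command result is passed in as a boolean instead of running a shell command. The only behavior modeled is the pre-fix skip condition (PASSED or SKIPPED only).

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Toy stand-in for QueueEntryStatus (only the members this sketch needs).
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class ToyQueue:
    statuses: dict[str, Status] = field(default_factory=dict)  # trigger hash -> status
    command_runs: int = 0


def run_command_rule_pre_fix(queue: ToyQueue, trigger_hash: str, command_ok: bool) -> bool:
    """Simulate one pre-fix hook invocation; return True if the agent is blocked."""
    existing = queue.statuses.get(trigger_hash)
    # Pre-fix skip condition: only PASSED and SKIPPED entries are skipped.
    if existing in (Status.PASSED, Status.SKIPPED):
        return False
    # FAILED falls through, so the command runs (and fails) again.
    queue.command_runs += 1
    queue.statuses[trigger_hash] = Status.PASSED if command_ok else Status.FAILED
    return not command_ok


queue = ToyQueue()
# Steps 3-8: the agent keeps responding without a promise; the trigger is unchanged.
blocked = [run_command_rule_pre_fix(queue, "abc123", command_ok=False) for _ in range(5)]
print(blocked, queue.command_runs)  # [True, True, True, True, True] 5
```

Every invocation with the same trigger re-runs the command and blocks again; nothing inside this state machine ends the cycle without outside input such as a promise.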

### Root Cause

The queue status check skipped only rules whose status was PASSED or SKIPPED:

```python
if existing and existing.status in (
    QueueEntryStatus.PASSED,
    QueueEntryStatus.SKIPPED,
):
    continue
```

Command rules with FAILED status were therefore not skipped, so they re-ran on every hook invocation until the agent provided a promise. Because nothing signaled to the agent that it was in a loop, no promise was forthcoming and the command kept running indefinitely.

### The Fix

Two-part fix in `rules_check.py`:

1. **Prevent re-running**: Skip COMMAND rules with FAILED status to prevent infinite loops:

```python
# For COMMAND rules with FAILED status, don't re-run the command.
# The agent has already seen the error.
if (
    existing
    and existing.status == QueueEntryStatus.FAILED
    and rule.action_type == ActionType.COMMAND
):
    continue
```

2. **Honor promises**: After processing results, check all FAILED queue entries and update to SKIPPED if the agent provided a promise:

```python
# Handle FAILED queue entries that have been promised
if promised_rules:
    promised_lower = {name.lower() for name in promised_rules}
    for entry in queue.get_all_entries():
        if (
            entry.status == QueueEntryStatus.FAILED
            and entry.rule_name.lower() in promised_lower
        ):
            queue.update_status(
                entry.trigger_hash,
                QueueEntryStatus.SKIPPED,
                ActionResult(
                    type="command",
                    output="Acknowledged via promise tag",
                    exit_code=None,
                ),
            )
```

This ensures that:
- A failing command runs only once per trigger
- The agent sees the error message once
- When the agent provides a `<promise>Rule Name</promise>` tag, the queue entry is properly updated to SKIPPED
- No infinite loop occurs
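The two-part fix can be replayed against a toy model of the queue. This is a sketch under stated assumptions, not the code in `rules_check.py`: `Status`, `ToyQueue`, `run_command_rule`, and `extract_promised_rules` are hypothetical; the regex assumes the `<promise>Rule Name</promise>` format shown above; and promised rules are simply not evaluated, mirroring the comment in the `rules_check.py` diff that `evaluate_rules` skips them.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Toy stand-in for QueueEntryStatus (only the members this sketch needs).
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


# Assumed tag format; the real promise parsing is not part of this PR.
PROMISE_TAG = re.compile(r"<promise>\s*(.+?)\s*</promise>", re.IGNORECASE | re.DOTALL)


def extract_promised_rules(agent_text: str) -> set[str]:
    """Return the rule names the agent acknowledged via promise tags."""
    return set(PROMISE_TAG.findall(agent_text))


@dataclass
class ToyQueue:
    statuses: dict[str, Status] = field(default_factory=dict)  # trigger hash -> status
    rule_names: dict[str, str] = field(default_factory=dict)  # trigger hash -> rule name
    command_runs: int = 0


def run_command_rule(queue: ToyQueue, rule_name: str, trigger_hash: str,
                     command_ok: bool, agent_text: str = "") -> bool:
    """Simulate one post-fix hook invocation; return True if the agent is blocked."""
    promised_lower = {name.lower() for name in extract_promised_rules(agent_text)}
    existing = queue.statuses.get(trigger_hash)
    blocked = False

    # Promised rules are not evaluated at all.
    if rule_name.lower() not in promised_lower:
        already_resolved = existing in (Status.PASSED, Status.SKIPPED)
        # Part 1 of the fix: a FAILED command rule is not re-run.
        already_shown = existing is Status.FAILED
        if not (already_resolved or already_shown):
            queue.command_runs += 1
            queue.rule_names[trigger_hash] = rule_name
            queue.statuses[trigger_hash] = Status.PASSED if command_ok else Status.FAILED
            blocked = not command_ok

    # Part 2 of the fix: promised FAILED entries become SKIPPED (case-insensitive).
    for h, status in list(queue.statuses.items()):
        if status is Status.FAILED and queue.rule_names[h].lower() in promised_lower:
            queue.statuses[h] = Status.SKIPPED
    return blocked


queue = ToyQueue()
rule = "Infinite Block Command"
first = run_command_rule(queue, rule, "abc123", command_ok=False)
second = run_command_rule(queue, rule, "abc123", command_ok=False)
third = run_command_rule(queue, rule, "abc123", command_ok=False,
                         agent_text="<promise>Infinite Block Command</promise>")
print(first, second, third, queue.command_runs, queue.statuses["abc123"])
# True False False 1 Status.SKIPPED
```

The command runs once, the second invocation no longer blocks, and the promise on the third invocation moves the entry from FAILED to SKIPPED.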

### Test Cases Affected

- `manual_tests/test_infinite_block_command/` - Tests a rule with `command: "false"` (always fails)
- The test verifies that the hook fires AND the sub-agent returns in reasonable time (doesn't hang)

### Key Learnings

1. **Any hook action that can block must have loop prevention**: Both PROMPT and COMMAND rules need mechanisms to prevent re-triggering infinitely.

2. **Queue status is the key to loop prevention**: The rules queue tracks what the agent has already seen. Rules should not re-trigger if they've already been shown to the agent (QUEUED for prompts, FAILED for commands).

3. **Symmetry in action handling**: When adding loop prevention for one action type, check if other action types need similar protection.
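Learnings 2 and 3 suggest keeping the "already shown" status for each action type in one table instead of in separate `if` blocks. The sketch below is a hypothetical refactoring, not what `rules_check.py` currently does: the enums are local stand-ins with only the members needed here, and `should_skip`, `RESOLVED`, and `ALREADY_SHOWN` are illustrative names.

```python
from enum import Enum
from typing import Optional


class ActionType(Enum):
    # Local stand-in; only the members this sketch needs.
    PROMPT = "prompt"
    COMMAND = "command"


class QueueEntryStatus(Enum):
    # Local stand-in; only the members this sketch needs.
    QUEUED = "queued"
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


# Resolved for every action type: never re-trigger.
RESOLVED = {QueueEntryStatus.PASSED, QueueEntryStatus.SKIPPED}

# Status meaning "the agent has already seen this rule", per action type.
ALREADY_SHOWN = {
    ActionType.PROMPT: QueueEntryStatus.QUEUED,
    ActionType.COMMAND: QueueEntryStatus.FAILED,
}


def should_skip(status: Optional[QueueEntryStatus], action_type: ActionType) -> bool:
    """Return True if a rule with this queue status should not fire again."""
    if status is None:
        return False  # No queue entry yet: evaluate the rule.
    if status in RESOLVED:
        return True
    return status is ALREADY_SHOWN[action_type]


for action in ActionType:
    skipped = [s.name for s in QueueEntryStatus if should_skip(s, action)]
    print(action.name, skipped)
# PROMPT ['QUEUED', 'PASSED', 'SKIPPED']
# COMMAND ['PASSED', 'FAILED', 'SKIPPED']
```

With the mapping in one place, an action type that has no entry in `ALREADY_SHOWN` raises `KeyError` as soon as one of its unresolved entries is checked, rather than silently re-triggering.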

### Related Files

- `src/deepwork/hooks/rules_check.py` - Main hook implementation
- `src/deepwork/core/rules_queue.py` - Queue entry status definitions
- `.deepwork/rules/manual-test-infinite-block-command.md` - Test rule
- `manual_tests/test_infinite_block_command/` - Test files

---

*For the template and guidelines on documenting debugging sessions, see [AGENTS.md](./AGENTS.md).*
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "deepwork"
-version = "0.5.1"
+version = "0.5.2"
description = "Framework for enabling AI agents to perform complex, multi-step work tasks"
readme = "README.md"
requires-python = ">=3.11"
30 changes: 30 additions & 0 deletions src/deepwork/hooks/rules_check.py
@@ -624,6 +624,16 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
):
    continue

# For COMMAND rules with FAILED status, don't re-run the command.
# The agent has already seen the error. If they provide a promise,
# the after-loop logic will update the status to SKIPPED.
if (
    existing
    and existing.status == QueueEntryStatus.FAILED
    and rule.action_type == ActionType.COMMAND
):
    continue

# Create queue entry if new
if not existing:
    queue.create_entry(
@@ -675,6 +685,26 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
# Collect for prompt output
prompt_results.append(result)

# Handle FAILED queue entries that have been promised
# (These rules weren't in results because evaluate_rules skips promised rules,
# but we need to update their queue status to SKIPPED)
if promised_rules:
    promised_lower = {name.lower() for name in promised_rules}
    for entry in queue.get_all_entries():
        if (
            entry.status == QueueEntryStatus.FAILED
            and entry.rule_name.lower() in promised_lower
        ):
            queue.update_status(
                entry.trigger_hash,
                QueueEntryStatus.SKIPPED,
                ActionResult(
                    type="command",
                    output="Acknowledged via promise tag",
                    exit_code=None,
                ),
            )

# Build response
messages: list[str] = []

2 changes: 1 addition & 1 deletion uv.lock
