diff --git a/.claude/agents/consistency-reviewer.md b/.claude/agents/consistency-reviewer.md new file mode 100644 index 00000000..03af8558 --- /dev/null +++ b/.claude/agents/consistency-reviewer.md @@ -0,0 +1,6 @@ +--- +name: "Consistency Reviewer" +description: "Reviews PRs and code changes for consistency with DeepWork's architectural patterns, naming conventions, and process standards. Understands both the framework codebase and how changes impact downstream user installations. Invoke for PR reviews or when checking that changes align with established conventions." +--- + +!`learning_agents/scripts/generate_agent_instructions.sh consistency-reviewer` diff --git a/.claude/session_log_folder_info.md b/.claude/session_log_folder_info.md new file mode 100644 index 00000000..59bcc997 --- /dev/null +++ b/.claude/session_log_folder_info.md @@ -0,0 +1,8 @@ +LearningAgents plugin setup completed. + +Permissions added to .claude/settings.json: +- Bash(learning_agents/scripts/*) — plugin scripts +- Bash(bash learning_agents/scripts/*) — plugin scripts via bash +- Read(./.deepwork/tmp/**) — read session data (already covered by existing .deepwork/** rule) +- Write(./.deepwork/tmp/**) — write session data (already covered by existing .deepwork/** rule) +- Edit(./.deepwork/tmp/**) — edit session data (already covered by existing .deepwork/** rule) diff --git a/.claude/settings.json b/.claude/settings.json index e6f9f968..fe5ff43a 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -99,6 +99,8 @@ "Edit(./.deepwork/**)", "Write(./.deepwork/**)", "Bash(deepwork:*)", + "Bash(learning_agents/scripts/*)", + "Bash(bash learning_agents/scripts/*)", "WebSearch", "Skill(deepwork)", "mcp__deepwork__get_workflows", @@ -119,8 +121,5 @@ ] } ] - }, - "enabledPlugins": { - "learning-agents@deepwork-plugins": true } } diff --git a/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/README.md 
b/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/README.md new file mode 100644 index 00000000..3618007b --- /dev/null +++ b/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/README.md @@ -0,0 +1,11 @@ +# Additional Learning Guidelines + +These files let you customize how the learning cycle works for this agent. Each file is automatically included in the corresponding learning skill. Leave empty to use default behavior, or add markdown instructions to guide the process. + +## Files + +- **issue_identification.md** — Included during the `identify` step. Use this to tell the reviewer what kinds of issues matter most for this agent, what to ignore, or domain-specific signals of mistakes. + +- **issue_investigation.md** — Included during the `investigate-issues` step. Use this to guide root cause analysis — e.g., common root causes in this domain, which parts of the agent's knowledge to check first, or investigation heuristics. + +- **learning_from_issues.md** — Included during the `incorporate-learnings` step. Use this to guide how learnings are integrated — e.g., preferences for topics vs learnings, naming conventions, or areas of core-knowledge that should stay concise. 
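As a sketch, a populated `issue_identification.md` might look like this (illustrative content only — the plugin ships these guideline files empty, and none of these bullets are defaults):

```markdown
# Issue Identification Guidance (example)

- Treat any review that approved a source-of-truth violation
  (an edit to `.deepwork/jobs/` that belongs in `src/deepwork/standard_jobs/`) as high priority.
- Ignore stylistic nitpicks the reviewer raised but the user explicitly dismissed.
- A session where the reviewer silently substituted a similar-sounding job name
  for the one requested is always an issue worth capturing.
```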
diff --git a/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/issue_identification.md b/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/issue_identification.md new file mode 100644 index 00000000..e69de29b diff --git a/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/issue_investigation.md b/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/issue_investigation.md new file mode 100644 index 00000000..e69de29b diff --git a/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/learning_from_issues.md b/.deepwork/learning-agents/consistency-reviewer/additional_learning_guidelines/learning_from_issues.md new file mode 100644 index 00000000..e69de29b diff --git a/.deepwork/learning-agents/consistency-reviewer/core-knowledge.md b/.deepwork/learning-agents/consistency-reviewer/core-knowledge.md new file mode 100644 index 00000000..1e7ab17a --- /dev/null +++ b/.deepwork/learning-agents/consistency-reviewer/core-knowledge.md @@ -0,0 +1,132 @@ +You are a consistency reviewer for the DeepWork project. Your job is to review pull requests and code changes to ensure they are consistent with the project's established patterns, conventions, and architectural decisions. + +## What DeepWork Is + +DeepWork is a framework that enables AI agents to perform complex, multi-step work tasks. It installs job-based workflows into user projects, then gets out of the way — all execution happens through the user's AI agent CLI (Claude Code, Gemini, etc.) via MCP tools. + +You must always reason about changes from two perspectives: +1. **The DeepWork codebase itself** — the framework repository where development happens +2. **Target projects** — what users see after running `deepwork install` in their own repos + +A change that looks fine in isolation may break conventions that matter downstream. 
+ +## Architecture You Must Know + +### Job Type Classification (Critical) + +There are exactly three types of jobs. Confusing them is one of the most common errors: + +| Type | Location | Purpose | +|------|----------|---------| +| Standard Jobs | `src/deepwork/standard_jobs/` | Framework core, auto-installed to user projects | +| Library Jobs | `library/jobs/` | Reusable examples users can adopt (not auto-installed) | +| Bespoke Jobs | `.deepwork/jobs/` (no match in standard_jobs) | Internal to this repo only | + +**Key rule**: Standard jobs have their source of truth in `src/deepwork/standard_jobs/`. The copies in `.deepwork/jobs/` are installed copies — never edit them directly. + +### Delivery Model + +- DeepWork is delivered as a Python package with an MCP server +- CLI has `serve` and `hook` commands (install/sync were removed in favor of plugin-based delivery) +- Runtime deps: pyyaml, click, jsonschema, fastmcp, pydantic, mcp, aiofiles +- The MCP server auto-discovers jobs from `.deepwork/jobs/` at runtime + +### Key File Patterns + +- `job.yml` — Job definitions with steps, workflows, outputs, reviews, quality criteria +- `steps/*.md` — Step instruction files (markdown with structured guidance) +- `hooks/` — Lifecycle hooks (after_agent, before_tool, etc.) +- `.claude/agents/*.md` — Agent definitions with YAML frontmatter (name, description) +- `AGENTS.md` — Bespoke learnings and context for a working directory + +### MCP Workflow Execution + +Users interact via MCP tools: `get_workflows`, `start_workflow`, `finished_step`, `abort_workflow`. The server manages workflow state, quality gates, and step transitions. Quality gates use a reviewer model to evaluate outputs against criteria defined in `job.yml`. + +## What to Review For + +### 1. 
Source-of-Truth Violations +- Standard job edits must go to `src/deepwork/standard_jobs/`, never `.deepwork/jobs/` +- Documentation must stay in sync with code (CLAUDE.md, architecture.md, README.md) +- Schema changes must be reflected in both the schema files and the architecture docs + +### 2. Downstream Impact +- Will this change break existing user installations? +- Does a new field in `job.yml` have a sensible default so existing jobs still work? +- Are new CLI flags or MCP tool parameters backward-compatible? +- If step instructions change, do existing workflows still make sense? + +### 3. Naming and Terminology Consistency +- Jobs use snake_case (`competitive_research`, not `competitiveResearch`) +- Steps use snake_case IDs +- Workflows use snake_case names +- Claude Code hooks use PascalCase event names (`Stop`, `PreToolUse`, `UserPromptSubmit`) +- Agent files use kebab-case (`consistency-reviewer.md`) +- Instruction files are written in second person imperative ("You should...", "Create a...") + +### 4. job.yml Structure Consistency +- Every step needs: `id`, `name`, `description`, `instructions_file` +- Outputs should specify `type` (file/files) and `required` (true/false) +- Dependencies should form a valid DAG +- Reviews should have `quality_criteria` with criteria that are evaluable by a reviewer without transcript access +- `common_job_info_provided_to_all_steps_at_runtime` should contain shared context, not be duplicated in each step + +### 5. Step Instruction Quality +- Instructions should be specific and actionable, not generic +- Should include output examples or anti-examples +- Should define quality criteria for their outputs +- Should use "ask structured questions" phrasing when gathering user input +- Should follow Anthropic prompt engineering best practices +- Should not duplicate content from `common_job_info` + +### 6. 
Python Code Standards +- Python 3.11+ with type hints (`disallow_untyped_defs` is enforced) +- Ruff for linting (line-length 100, pycodestyle + pyflakes + isort + bugbear + comprehensions + pyupgrade) +- mypy strict mode +- pytest for tests with strict markers and config +- Avoid over-engineering — only add what's needed for the current task + +### 7. Git and Branch Conventions +- Work branches: `deepwork/[job_name]-[instance]-[date]` +- Don't auto-commit — let users review and commit +- Don't force-push without explicit request + +### 8. Process Consistency +- New features should not bypass the MCP workflow model +- Quality gates should be pragmatic — criteria that can't apply should auto-pass +- Hook scripts should work cross-platform (watch for macOS-only date flags, etc.) +- Changes to the hook system must work with all supported agent adapters (Claude, Gemini) + +## Tool Call Efficiency + +When gathering information, issue all independent tool calls in a single parallel block rather than sequentially. This applies whenever the inputs of one call do not depend on the outputs of another — for example, searching for multiple unrelated patterns, reading multiple unrelated files, or running independent lookups. + +Sequential calls are only justified when a later call genuinely needs the result of an earlier one. + +## Response Accuracy + +When writing summaries or descriptions of changes you made: + +- **Never state a metric you have not just verified.** If you want to report something concrete (e.g., line count before/after), re-read the file immediately before stating the figure. +- **If you catch an error mid-sentence, stop and verify — do not substitute a guess.** The correct pattern is: detect error → use a tool to get the real value → state the corrected value. Replacing a wrong number with a vague approximation ("about 9 lines") without a tool call is still a fabrication. 
+- **When in doubt, omit the metric.** A qualitative description ("the redundant content was removed") is always preferable to an unverified number. + +## Review Approach + +When reviewing a PR: +1. Read the full diff to understand the scope of changes +2. Identify which files are affected and what type they are (standard job, library job, bespoke, Python source, docs, etc.) +3. Check each change against the consistency rules above +4. Flag issues with specific file paths and line references +5. Distinguish between blocking issues (must fix) and suggestions (nice to have) +6. Consider the downstream user experience — would this change confuse someone using DeepWork in their project? + +### When the Review Target Cannot Be Found + +If you search for the requested job, workflow, or file and it does not exist by the given name, **stop immediately and report the missing resource to the user before doing anything else**. Do not silently substitute a similar-sounding alternative and proceed with a review. Instead: + +1. State clearly that the named resource does not exist (include what you searched for). +2. List any close matches you found (e.g., "No `add_job` workflow found; the closest match is `new_job` in `deepwork_jobs`"). +3. Ask the user to confirm which resource they intended before continuing. + +Proceeding silently with a substituted target wastes the user's time and delivers a review they did not ask for. 
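The lookup-before-review rule above can be sketched as a small shell check. This is a hypothetical illustration, not repo code — the temp directory, the job names, and the `grep` heuristic for close matches are all assumptions:

```shell
#!/usr/bin/env bash
# Sketch: confirm a review target exists before reviewing; surface close
# matches instead of silently substituting one.
set -euo pipefail

# Stand-in job directory so the sketch is self-contained.
jobs_dir=$(mktemp -d)
mkdir -p "$jobs_dir/new_job" "$jobs_dir/competitive_research"

target="add_job"
if [ -d "$jobs_dir/$target" ]; then
  echo "found: $target"
else
  # Report the miss and list candidates; a crude substring heuristic here.
  echo "No job named '$target' found. Close matches:"
  ls "$jobs_dir" | grep -i "job" || echo "(none)"
fi
```

The point is the control flow, not the matching heuristic: the script stops at the miss and reports, rather than picking `new_job` and proceeding.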
diff --git a/.deepwork/learning-agents/consistency-reviewer/learnings/.gitkeep b/.deepwork/learning-agents/consistency-reviewer/learnings/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/.deepwork/learning-agents/consistency-reviewer/topics/.gitkeep b/.deepwork/learning-agents/consistency-reviewer/topics/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/.deepwork/learning-agents/consistency-reviewer/topics/mcp-workflow-patterns.md b/.deepwork/learning-agents/consistency-reviewer/topics/mcp-workflow-patterns.md new file mode 100644 index 00000000..b1268f86 --- /dev/null +++ b/.deepwork/learning-agents/consistency-reviewer/topics/mcp-workflow-patterns.md @@ -0,0 +1,86 @@ +--- +name: "MCP Workflow Patterns" +keywords: + - mcp + - workflow + - finished_step + - start_workflow + - get_workflows + - abort_workflow + - state + - session + - nested + - concurrent + - outputs +last_updated: "2026-02-18" +--- + +## MCP Server Overview + +DeepWork's MCP server (`src/deepwork/mcp/`) provides four tools that agents use to execute workflows: + +1. **`get_workflows`** — Lists all available jobs and their workflows. Auto-discovers from `.deepwork/jobs/` at runtime. +2. **`start_workflow`** — Begins a workflow session. Creates state, generates a branch name, returns first step instructions. +3. **`finished_step`** — Reports step completion with outputs. Runs quality gates, then returns next step or workflow completion. +4. **`abort_workflow`** — Cancels the current workflow with an explanation. + +## Session and State Model + +- State is persisted to `.deepwork/tmp/session_<session_id>.json` (JSON files for transparency) +- Sessions track: job name, workflow name, goal, current step, step progress, outputs +- Branch naming: `deepwork/<job_name>-<instance>-<date>` +- State manager uses an async lock for concurrent access safety + +## Output Validation (Critical Consistency Point) + +When `finished_step` is called, outputs are validated strictly: + +1.
**Every submitted key must match a declared output name** — unknown keys are rejected +2. **Every required output must be provided** — missing required outputs are rejected +3. **Type enforcement**: `type: file` requires a single string path; `type: files` requires a list of strings +4. **File existence**: Every referenced file must exist on disk at the project-relative path + +**Common mistake to watch for**: A PR that adds a new output to a step's `job.yml` declaration but doesn't ensure the agent actually creates that file before calling `finished_step`. This will cause a runtime error. + +**Another gotcha**: The `files` type cannot be an empty list if the output is `required: true`. If a step declares `scripts` as `type: files, required: false`, the agent can omit it entirely, but if it's `required: true`, it must provide at least one file path. + +## Nested Workflows + +Workflows can nest — calling `start_workflow` during an active workflow pushes onto a stack: + +- All tool responses include a `stack` field showing the current depth +- `complete_workflow` and `abort_workflow` pop from the stack +- The `session_id` parameter on `finished_step` and `abort_workflow` allows targeting a specific session in the stack + +**Consistency check**: Any change to state management must preserve stack integrity. The stack uses list filtering (not index-based pop) for mid-stack removal safety. + +## Concurrent Steps + +Workflow steps can be concurrent (defined as arrays in `job.yml`): + +```yaml +steps: + - step_a + - [step_b, step_c] # These run in parallel + - step_d +``` + +When the server encounters a concurrent entry, it: +1. Uses the first step ID as the "current" step +2. Appends a `**CONCURRENT STEPS**` message to the instructions +3. Expects the agent to use the Task tool to execute them in parallel + +**Consistency check**: The `current_entry_index` tracks position in the `step_entries` list (which may contain concurrent groups), not the flat step list. 
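A minimal sketch of the entry-index bookkeeping described above. This is illustrative Python, not the real implementation (which lives in `src/deepwork/mcp/`); the function names are assumptions:

```python
# Entries mirror job.yml: a plain string is a single step, a list is a
# concurrent group.
step_entries = ["step_a", ["step_b", "step_c"], "step_d"]

def current_step_id(entries, current_entry_index):
    """The 'current' step for an entry; a concurrent group uses its first ID."""
    entry = entries[current_entry_index]
    return entry[0] if isinstance(entry, list) else entry

def flat_steps(entries):
    """Flatten entries into the plain step list — indices differ from entry indices."""
    flat = []
    for entry in entries:
        flat.extend(entry if isinstance(entry, list) else [entry])
    return flat

print(current_step_id(step_entries, 1))  # step_b
print(flat_steps(step_entries))          # ['step_a', 'step_b', 'step_c', 'step_d']
```

Note how entry index 1 addresses the whole concurrent group, while the same steps occupy flat indices 1 and 2 — exactly the mismatch the consistency check warns about.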
+ +## Auto-Selection Behavior + +If a job has exactly one workflow, `_get_workflow` auto-selects it regardless of the workflow name provided. This is a convenience for single-workflow jobs but can mask bugs where the wrong workflow name is passed. + +## Key Files + +- `src/deepwork/mcp/tools.py` — Tool implementations (WorkflowTools class) +- `src/deepwork/mcp/state.py` — Session state management (StateManager class) +- `src/deepwork/mcp/schemas.py` — Pydantic models for all request/response types +- `src/deepwork/mcp/server.py` — FastMCP server definition and tool registration +- `src/deepwork/mcp/quality_gate.py` — Quality gate evaluation +- `src/deepwork/mcp/claude_cli.py` — Claude CLI subprocess wrapper for external reviews diff --git a/.deepwork/learning-agents/consistency-reviewer/topics/quality-review-system.md b/.deepwork/learning-agents/consistency-reviewer/topics/quality-review-system.md new file mode 100644 index 00000000..f6a74b01 --- /dev/null +++ b/.deepwork/learning-agents/consistency-reviewer/topics/quality-review-system.md @@ -0,0 +1,116 @@ +--- +name: "Quality Review System" +keywords: + - quality + - review + - gate + - criteria + - evaluate + - self-review + - external + - timeout + - run_each + - for_each +last_updated: "2026-02-18" +--- + +## Overview + +The quality review system evaluates step outputs against criteria defined in `job.yml`. It runs after `finished_step` is called and before the workflow advances to the next step. If reviews fail, the agent gets `needs_work` status with feedback and must fix issues before retrying. + +## Two Review Modes + +### 1. 
External Runner Mode (`external_runner="claude"`) + +- Uses Claude CLI as a subprocess to evaluate outputs +- The reviewer is a separate model invocation with structured JSON output +- Each review is an independent subprocess call +- `max_inline_files` defaults to 5 — files beyond this threshold are listed as paths only +- Dynamic timeout: `240 + 30 * max(0, file_count - 5)` seconds + +### 2. Self-Review Mode (`external_runner=None`) + +- Writes review instructions to `.deepwork/tmp/quality_review__.md` +- Returns `needs_work` with instructions for the agent to spawn a subagent +- The subagent reads the file, evaluates criteria, and reports findings +- The agent fixes issues, then calls `finished_step` again with `quality_review_override_reason` +- `max_inline_files` defaults to 0 (always lists paths, never embeds content) + +## Review Configuration in job.yml + +Reviews are defined per-step in the `reviews` array: + +```yaml +reviews: + - run_each: step # Review all outputs together + quality_criteria: + "Criterion Name": "Question to evaluate" + + - run_each: step_instruction_files # Review each file in this output separately + additional_review_guidance: "Context for the reviewer" + quality_criteria: + "Complete": "The file is complete with no stubs" +``` + +### `run_each` Values + +- `"step"` — Run one review covering all outputs together +- `""` — If the output has `type: files`, run a separate review per file. If `type: file`, run one review for that single file. + +### `additional_review_guidance` + +Free-text context passed to the reviewer. This is critical because reviewers don't have transcript access — they only see the output files and the criteria. Use this to tell reviewers what else to read for context. + +## Consistency Points for PR Reviews + +### 1. Criteria Must Be Evaluable Without Transcript + +Reviewers see only: output files, quality criteria, additional_review_guidance, and author notes. They cannot see the conversation. 
If a criterion requires conversation context (e.g., "User confirmed satisfaction"), the step must write a summary file as an output that the reviewer can read. + +### 2. Criteria Should Be Pragmatic + +The reviewer instructions say: "If a criterion is not applicable to this step's purpose, pass it." But in practice, vague criteria cause confusion. Each criterion should clearly describe what "pass" looks like. + +**Bad**: "Good formatting" (subjective, unclear) +**Good**: "The output uses markdown headers to organize sections" (specific, verifiable) + +### 3. The Reviewer Can Contradict Itself + +Known issue: The reviewer model can mark all individual criteria as passed but still set `overall: false`. The `additional_review_guidance` field helps mitigate this by giving the reviewer better context. When reviewing PR changes to criteria, check if guidance is also updated. + +### 4. Timeout Awareness + +- Each `run_each` review per file is a separate MCP call with its own timeout +- Many small files do NOT accumulate timeout risk (they run in parallel via `asyncio.gather`) +- A single large/complex file can cause its individual review to timeout +- The `quality_review_override_reason` parameter exists as an escape hatch when reviews timeout (120s MCP limit) + +### 5. Max Quality Attempts + +Default is 3 attempts (`max_quality_attempts`). After 3 failed attempts, the quality gate raises a `ToolError` and the workflow is blocked. This prevents infinite retry loops. + +### 6. Output Specs Must Match Review Scope + +If a review has `run_each: "step_instruction_files"`, that output name must exist in the step's `outputs` section with `type: files`. A mismatch means the review silently does nothing (the output name won't be found in the outputs dict, so no per-file reviews are generated). + +### 7. 
JSON Schema Enforcement + +External reviews must return structured JSON matching `QUALITY_GATE_RESPONSE_SCHEMA`: +```json +{ + "passed": boolean, + "feedback": string, + "criteria_results": [ + {"criterion": string, "passed": boolean, "feedback": string|null} + ] +} +``` + +Changes to the schema must be reflected in both `quality_gate.py` (the schema definition) and the reviewer prompt instructions. + +## Key Files + +- `src/deepwork/mcp/quality_gate.py` — QualityGate class, review evaluation, payload building +- `src/deepwork/mcp/claude_cli.py` — Claude CLI wrapper for external reviews +- `src/deepwork/mcp/tools.py:364-481` — Quality gate integration in `finished_step` +- `src/deepwork/mcp/schemas.py` — QualityGateResult, ReviewResult, QualityCriteriaResult models diff --git a/claude b/claude new file mode 100755 index 00000000..08bccbfd --- /dev/null +++ b/claude @@ -0,0 +1,2 @@ +#!/usr/bin/env bash +exec claude --plugin-dir "$(dirname "$0")/learning_agents" "$@" diff --git a/learning_agents/doc/learning_log_folder_structure.md b/learning_agents/doc/learning_log_folder_structure.md index 1fac7fd1..1b7e1e39 100644 --- a/learning_agents/doc/learning_log_folder_structure.md +++ b/learning_agents/doc/learning_log_folder_structure.md @@ -8,7 +8,8 @@ Session-level agent interaction logs are stored in `.deepwork/tmp/agent_sessions .deepwork/tmp/agent_sessions/ └── / └── / - ├── needs_learning_as_of_timestamp # Flag: learning needed (auto-created by hook) + ├── conversation_transcript.jsonl # Symlink to agent's transcript (auto-created by hook) + ├── needs_learning_as_of_timestamp # Flag: learning needed (auto-created by hook) ├── learning_last_performed_timestamp # When learning was last run on this conversation ├── agent_used # Name of the LearningAgent (auto-created by hook) └── .issue.yml # Issue files (created during learning cycle) @@ -16,6 +17,12 @@ Session-level agent interaction logs are stored in `.deepwork/tmp/agent_sessions ## Files +### 
conversation_transcript.jsonl + +A symlink to the agent's Claude Code transcript, created automatically by the post-Task hook. Points to the subagent transcript at `~/.claude/projects///subagents/agent-.jsonl`. This allows learning cycle skills to read the transcript directly from the session log folder without needing to search for it via Glob patterns. + +The symlink is only created if the transcript file exists at hook execution time (which it should, since the PostToolUse hook fires after the Task completes). + ### needs_learning_as_of_timestamp Created automatically by the post-Task hook whenever a LearningAgent is used. The file body contains a single ISO 8601 timestamp indicating when the agent was last invoked. This file serves as a flag: its presence means the session transcript has not yet been processed for learnings. @@ -34,9 +41,13 @@ Created automatically by the post-Task hook. Contains the name of the LearningAg Issue files created during the `identify` and `report_issue` skills. See `issue_yml_format.md` for the full schema. These files progress through statuses: `identified` → `investigated` → `learned`. +### conversation_transcript.jsonl + +Symlink to the agent's Claude Code transcript, created automatically by the post-Task hook. **THIS IS THE FILE TO READ TO SEE THE CONVERSATION ALL THE OTHER FILES REFER TO.** + ## Lifecycle -1. **Agent used**: Post-Task hook creates `needs_learning_as_of_timestamp` and `agent_used` +1. **Agent used**: Post-Task hook creates `needs_learning_as_of_timestamp`, `agent_used`, and `conversation_transcript.jsonl` symlink 2. **Session ends**: Stop hook detects `needs_learning_as_of_timestamp` files and suggests running a learning cycle 3. **Learning cycle** (`/learning-agents learn`): a. `identify` reads transcripts and creates `*.issue.yml` files with status `identified` @@ -50,4 +61,3 @@ Issue files created during the `identify` and `report_issue` skills. 
See `issue_ - The `session_id` comes from Claude Code's session identifier - The `agent_id` is the unique agent ID assigned by Claude Code when spawning a Task - The `.deepwork/tmp/` directory is intended for transient working files and can be gitignored -- Transcript files referenced by issues are Claude Code's own session transcripts (typically at `~/.claude/projects/.../sessions//transcript.jsonl`) diff --git a/learning_agents/hooks/post_task.sh b/learning_agents/hooks/post_task.sh index 8a50eebc..a55dae77 100755 --- a/learning_agents/hooks/post_task.sh +++ b/learning_agents/hooks/post_task.sh @@ -5,7 +5,7 @@ # files so the learning cycle can process the transcript later. # # Input (stdin): JSON with tool_input, tool_response, session_id -# Output (stdout): JSON with optional systemMessage +# Output (stdout): JSON with optional hookSpecificOutput.additionalContext # Exit: Always 0 (non-blocking) set -euo pipefail @@ -36,11 +36,14 @@ if [ -z "$SESSION_ID" ]; then fi # Extract agent name from tool_input.name (the name parameter passed to Task) -AGENT_NAME=$(echo "$HOOK_INPUT" | jq -r '.tool_input.name // empty' 2>/dev/null) -if [ -z "$AGENT_NAME" ]; then +# Normalize: lowercase and replace spaces with hyphens to match directory naming +# (e.g., "Consistency Reviewer" -> "consistency-reviewer") +AGENT_NAME_RAW=$(echo "$HOOK_INPUT" | jq -r '.tool_input.name // empty' 2>/dev/null) +if [ -z "$AGENT_NAME_RAW" ]; then echo '{}' exit 0 fi +AGENT_NAME=$(echo "$AGENT_NAME_RAW" | tr '[:upper:]' '[:lower:]' | tr ' ' '-') # Extract agent_id from tool_response AGENT_ID=$(echo "$HOOK_INPUT" | jq -r '.tool_response.agentId // .tool_response.agent_id // empty' 2>/dev/null) @@ -63,14 +66,43 @@ fi # CREATE SESSION TRACKING FILES # ============================================================================ -SESSION_DIR=".deepwork/tmp/agent_sessions/${SESSION_ID}/${AGENT_ID}" -mkdir -p "$SESSION_DIR" +SESSION_LOG_DIR=".deepwork/tmp/agent_sessions/${SESSION_ID}/${AGENT_ID}" +mkdir -p 
"$SESSION_LOG_DIR" # Write timestamp flag -date -u +"%Y-%m-%dT%H:%M:%SZ" > "${SESSION_DIR}/needs_learning_as_of_timestamp" +date -u +"%Y-%m-%dT%H:%M:%SZ" > "${SESSION_LOG_DIR}/needs_learning_as_of_timestamp" # Write agent name for later lookup -echo "$AGENT_NAME" > "${SESSION_DIR}/agent_used" +echo "$AGENT_NAME" > "${SESSION_LOG_DIR}/agent_used" + +# ============================================================================ +# SYMLINK AGENT TRANSCRIPT INTO SESSION LOG FOLDER +# ============================================================================ +# The hook input includes transcript_path — the *parent* session's transcript +# (e.g., ~/.claude/projects//.jsonl). The spawned agent's +# transcript lives at: +# //subagents/agent-.jsonl +# +# We strip the .jsonl extension from transcript_path to get the session's +# subagent directory, then append subagents/agent-.jsonl. +# The resulting symlink lets the learning cycle find the transcript directly +# from the session log folder without needing to search. + +TRANSCRIPT_PATH=$(echo "$HOOK_INPUT" | jq -r '.transcript_path // empty' 2>/dev/null) + +if [ -n "$TRANSCRIPT_PATH" ]; then + # Strip .jsonl extension to get the session directory base path + # e.g., ~/.claude/projects//ad6c338b-...jsonl → ~/.claude/projects//ad6c338b-... 
+ SESSION_TRANSCRIPT_BASE="${TRANSCRIPT_PATH%.jsonl}" + + # Build the subagent transcript path + AGENT_TRANSCRIPT="${SESSION_TRANSCRIPT_BASE}/subagents/agent-${AGENT_ID}.jsonl" + + # Create symlink only if the transcript file actually exists + if [ -f "$AGENT_TRANSCRIPT" ]; then + ln -sf "$AGENT_TRANSCRIPT" "${SESSION_LOG_DIR}/conversation_transcript.jsonl" + fi +fi # ============================================================================ # OUTPUT POST-TASK REMINDER @@ -84,7 +116,7 @@ fi if [ -n "$REMINDER" ]; then cat << EOF -{"systemMessage":"${REMINDER}"} +{"hookSpecificOutput":{"hookEventName":"PostToolUse","additionalContext":"${REMINDER}"}} EOF else echo '{}' diff --git a/learning_agents/scripts/cat_agent_guideline.sh b/learning_agents/scripts/cat_agent_guideline.sh new file mode 100755 index 00000000..f805ff99 --- /dev/null +++ b/learning_agents/scripts/cat_agent_guideline.sh @@ -0,0 +1,30 @@ +#!/bin/bash +# cat_agent_guideline.sh - Print an agent's additional learning guideline file +# +# Usage: cat_agent_guideline.sh +# +# Reads agent_used from the session folder, then cats the corresponding +# guideline file from .deepwork/learning-agents//additional_learning_guidelines/.md +# +# Example: +# cat_agent_guideline.sh .deepwork/tmp/agent_sessions/sess-1/agent-1/ issue_identification + +set -euo pipefail + +SESSION_FOLDER="${1:-}" +GUIDELINE="${2:-}" + +if [ -z "$SESSION_FOLDER" ] || [ -z "$GUIDELINE" ]; then + echo "Usage: cat_agent_guideline.sh " >&2 + exit 1 +fi + +AGENT=$(cat "$SESSION_FOLDER/agent_used" 2>/dev/null || echo "") +if [ -z "$AGENT" ]; then + exit 0 +fi + +FILE=".deepwork/learning-agents/${AGENT}/additional_learning_guidelines/${GUIDELINE}.md" +if [ -f "$FILE" ]; then + cat "$FILE" +fi diff --git a/learning_agents/scripts/create_agent.sh b/learning_agents/scripts/create_agent.sh index aefc4712..b6d8e88c 100755 --- a/learning_agents/scripts/create_agent.sh +++ b/learning_agents/scripts/create_agent.sh @@ -79,32 +79,17 @@ if [ -f 
"$CLAUDE_AGENT_FILE" ]; then else mkdir -p "$(dirname "$CLAUDE_AGENT_FILE")" - # Use quoted heredoc to keep backticks/dollars literal, then sed in agent name - cat > "$CLAUDE_AGENT_FILE" << 'AGENT_MD' + # The agent file invokes the generator script with the agent name. + # Path is relative to project root (where Claude Code runs commands). + cat > "$CLAUDE_AGENT_FILE" << AGENT_MD --- name: TODO description: "TODO" --- -# Core Knowledge - -!`cat .deepwork/learning-agents/__AGENT__/core-knowledge.md` - -# Topics - -Located in `.deepwork/learning-agents/__AGENT__/topics/` - -!`for f in .deepwork/learning-agents/__AGENT__/topics/*.md; do [ -f "$f" ] || continue; desc=$(awk '/^---/{c++; next} c==1 && /^name:/{sub(/^name: *"?/,""); sub(/"$/,""); print; exit}' "$f"); echo "- $(basename "$f"): $desc"; done` - -# Learnings - -Learnings are incident post-mortems from past agent sessions capturing mistakes, root causes, and generalizable insights. Review them before starting work to avoid repeating past mistakes. Located in `.deepwork/learning-agents/__AGENT__/learnings/`. 
+!\`learning_agents/scripts/generate_agent_instructions.sh ${AGENT_NAME}\` AGENT_MD - # Replace placeholder with actual agent name (use .bak for GNU/BSD sed portability) - sed -i.bak "s/__AGENT__/${AGENT_NAME}/g" "$CLAUDE_AGENT_FILE" - rm -f "${CLAUDE_AGENT_FILE}.bak" - echo "Created Claude agent file: ${CLAUDE_AGENT_FILE}" fi diff --git a/learning_agents/scripts/generate_agent_instructions.sh b/learning_agents/scripts/generate_agent_instructions.sh new file mode 100755 index 00000000..12125e37 --- /dev/null +++ b/learning_agents/scripts/generate_agent_instructions.sh @@ -0,0 +1,56 @@ +#!/bin/bash +# generate_agent_instructions.sh - Generate dynamic agent instructions for a LearningAgent +# +# Usage: generate_agent_instructions.sh +# +# Accepts either directory-style name (e.g., "consistency-reviewer") +# or title-case name (e.g., "Consistency Reviewer") and outputs the +# full markdown body for the agent's Claude Code agent file. +# +# Looks for the agent directory in .deepwork/learning-agents// + +set -euo pipefail + +AGENT_INPUT="${1:-}" + +if [ -z "$AGENT_INPUT" ]; then + echo "Usage: generate_agent_instructions.sh " >&2 + exit 1 +fi + +# Normalize: convert title case to directory-style (lowercase, spaces to hyphens) +AGENT_NAME=$(echo "$AGENT_INPUT" | tr '[:upper:]' '[:lower:]' | tr ' ' '-') + +AGENT_DIR=".deepwork/learning-agents/${AGENT_NAME}" + +if [ ! 
-d "$AGENT_DIR" ]; then + echo "Error: Agent directory not found: ${AGENT_DIR}" >&2 + exit 1 +fi + +# --- Core Knowledge --- +echo "# Core Knowledge" +echo "" +if [ -f "${AGENT_DIR}/core-knowledge.md" ]; then + cat "${AGENT_DIR}/core-knowledge.md" +else + echo "_No core knowledge file found._" +fi +echo "" + +# --- Topics --- +echo "# Topics" +echo "" +echo "Located in \`${AGENT_DIR}/topics/\`" +echo "" +for f in "${AGENT_DIR}/topics/"*.md; do + [ -f "$f" ] || continue + desc=$(awk '/^---/{c++; next} c==1 && /^name:/{sub(/^name: *"?/,""); sub(/"$/,""); print; exit}' "$f") + echo "- $(basename "$f"): $desc" +done +echo "" + +# --- Learnings --- +echo "# Learnings" +echo "" +echo "Learnings are incident post-mortems from past agent sessions capturing mistakes, root causes, and generalizable insights. Review them before starting work to avoid repeating past mistakes. Located in \`${AGENT_DIR}/learnings/\`." diff --git a/learning_agents/scripts/generate_agent_instructions_for_session.sh b/learning_agents/scripts/generate_agent_instructions_for_session.sh new file mode 100755 index 00000000..0bf8ae6f --- /dev/null +++ b/learning_agents/scripts/generate_agent_instructions_for_session.sh @@ -0,0 +1,24 @@ +#!/bin/bash +# generate_agent_instructions_for_session.sh - Generate agent instructions from a session folder +# +# Usage: generate_agent_instructions_for_session.sh <session_folder> +# +# Reads agent_used from the session folder and passes it to generate_agent_instructions.sh + +set -euo pipefail + +SESSION_FOLDER="${1:-}" + +if [ -z "$SESSION_FOLDER" ]; then + echo "Usage: generate_agent_instructions_for_session.sh <session_folder>" >&2 + exit 1 +fi + +AGENT=$(cat "$SESSION_FOLDER/agent_used" 2>/dev/null || echo "") +if [ -z "$AGENT" ]; then + echo "Error: Could not read agent_used from $SESSION_FOLDER" >&2 + exit 1 +fi + +SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd) +exec "$SCRIPT_DIR/generate_agent_instructions.sh" "$AGENT" diff --git a/learning_agents/scripts/list_pending_sessions.sh
b/learning_agents/scripts/list_pending_sessions.sh new file mode 100755 index 00000000..78eb3534 --- /dev/null +++ b/learning_agents/scripts/list_pending_sessions.sh @@ -0,0 +1,64 @@ +#!/bin/bash +# list_pending_sessions.sh - List session log folders that need learning, grouped by agent +# +# Usage: list_pending_sessions.sh +# +# Finds all session folders containing a needs_learning_as_of_timestamp file, +# groups them by agent name, and outputs a structured list. +# +# Output format: +# ### "<agent-name>" agent sessions +# - <session-folder-path> +# - <session-folder-path> +# (blank line between agents) + +set -euo pipefail + +BASE=".deepwork/tmp/agent_sessions" + +if [ ! -d "$BASE" ]; then + exit 0 +fi + +# Collect agent|path pairs, to be sorted by agent name +pairs=() +while read -r f; do + dir=$(dirname "$f") + agent=$(cat "$dir/agent_used" 2>/dev/null || echo "unknown") + pairs+=("$agent|$dir") +done < <(find "$BASE" -name needs_learning_as_of_timestamp 2>/dev/null) + +if [ ${#pairs[@]} -eq 0 ]; then + exit 0 +fi + +# Sort by agent name +IFS=$'\n' sorted=($(printf '%s\n' "${pairs[@]}" | sort)); unset IFS + +# Group and print +current_agent="" +count=0 +paths=() + +flush_group() { + if [ -n "$current_agent" ] && [ ${#paths[@]} -gt 0 ]; then + echo "### \"$current_agent\" agent sessions" + for p in "${paths[@]}"; do + echo "- $p" + done + echo "" + fi +} + +for pair in "${sorted[@]}"; do + agent="${pair%%|*}" + path="${pair#*|}" + + if [ "$agent" != "$current_agent" ]; then + flush_group + current_agent="$agent" + paths=() + fi + paths+=("$path") +done +flush_group diff --git a/learning_agents/skills/create-agent/SKILL.md b/learning_agents/skills/create-agent/SKILL.md index aa5189c1..2072aa1a 100644 --- a/learning_agents/skills/create-agent/SKILL.md +++ b/learning_agents/skills/create-agent/SKILL.md @@ -1,8 +1,6 @@ --- name: create-agent description: Creates a new LearningAgent with directory structure, core-knowledge.md, and Claude Code agent file. Guides the user through initial configuration.
-disable-model-invocation: true -allowed-tools: Read, Edit, Write, Bash, Glob --- # Create LearningAgent @@ -96,8 +94,33 @@ Output in this format: - `.deepwork/learning-agents/<agent_name>/topics/` — topic documentation - `.deepwork/learning-agents/<agent_name>/learnings/` — experience-based insights - `.claude/agents/<agent_name>.md` — Claude Code agent file +``` + +Then output the following restart warning **exactly as written**, including the emphasis markers and blank lines. This is critical — the user WILL NOT be able to use the new agent without doing this: + +``` +--- + +## !! IMPORTANT — YOU MUST RESTART CLAUDE CODE !! + +Claude Code does NOT hot-reload agent files. The agent you just created +is **invisible** to Claude until you restart. + +**Do this now:** -**Usage:** +1. Exit this Claude Code session (type `/exit` or Ctrl+C) +2. Restart with: `claude -c` (this continues your conversation) +3. The new agent will then be available via the Task tool + +If you skip this step, Claude will not find the agent when you try to use it. + +--- +``` + +After the restart warning, output: + +``` +**Usage (after restart):** Use the Task tool with `name: "<Agent Name>"` to invoke this agent. **Learning cycle:** diff --git a/learning_agents/skills/identify/SKILL.md b/learning_agents/skills/identify/SKILL.md index 881ee616..16493ce1 100644 --- a/learning_agents/skills/identify/SKILL.md +++ b/learning_agents/skills/identify/SKILL.md @@ -2,8 +2,6 @@ name: identify description: Reads a session transcript and identifies issues where a LearningAgent made mistakes, had knowledge gaps, or underperformed. Creates issue files for each problem found. user-invocable: false -disable-model-invocation: true -allowed-tools: Read, Grep, Glob, Skill --- # Identify Issues in Session Transcript @@ -12,7 +10,7 @@ You are an expert AI quality reviewer analyzing session transcripts to surface a ## Arguments -`$ARGUMENTS` is the path to the session/agent_id folder (e.g., `.deepwork/tmp/agent_sessions/<session_id>/<agent_id>/`).
+`$ARGUMENTS` is the path to the session log folder (e.g., `.deepwork/tmp/agent_sessions/<session_id>/<agent_id>/`). ## Context @@ -20,58 +18,52 @@ You are an expert AI quality reviewer analyzing session transcripts to surface a **Last learning timestamp** (empty if never learned): !`cat $ARGUMENTS/learning_last_performed_timestamp 2>/dev/null` -**Additional identification guidelines**: -!`cat .deepwork/learning-agents/$(cat $ARGUMENTS/agent_used 2>/dev/null)/additional_learning_guidelines/issue_identification.md 2>/dev/null` - -## Procedure - -### Step 1: Locate the Transcript - -Extract the session_id from `$ARGUMENTS` by taking the second-to-last path component. For example, from `.deepwork/tmp/agent_sessions/abc123/agent456/`, the session_id is `abc123`. - -Use Glob to find the transcript file by substituting the actual session_id: ``` -~/.claude/projects/**/sessions/abc123/*.jsonl ``` +**Existing issue files** (avoid duplicates): +!`ls $ARGUMENTS/*.issue.yml 2>/dev/null || echo "(none)"` -If no transcript is found, report the error (include the session_id and Glob pattern used) and stop. +**Additional identification guidelines**: +!`learning_agents/scripts/cat_agent_guideline.sh $ARGUMENTS issue_identification` -### Step 2: Read the Transcript +**Session log folder structure**: +!`cat learning_agents/doc/learning_log_folder_structure.md 2>/dev/null` -Read the transcript file. The transcript is a JSONL file (one JSON object per line). Each line has a `type` field — agent turns appear as `type: "assistant"` messages and tool results appear as `type: "tool_result"`. Focus on assistant message content and tool call outcomes to evaluate agent behavior. +## Procedure -If `learning_last_performed_timestamp` exists (shown in Context above), skip lines that occurred before that timestamp — only analyze new interactions since the last learning cycle. ### Step 1: Read the Transcript -Focus on interactions involving the agent identified in `agent_used`.
+Read `$ARGUMENTS/conversation_transcript.jsonl`. It's JSONL — focus on `type: "assistant"` messages and `type: "tool_result"` entries. If `learning_last_performed_timestamp` exists (shown above), skip lines before that timestamp. -### Step 3: Identify Issues +### Step 2: Identify Issues Look for these categories of problems: 1. **Incorrect outputs**: Wrong answers, broken code, invalid configurations 2. **Knowledge gaps**: The agent didn't know something it should have 3. **Missed context**: Information was available but the agent failed to use it -4. **Poor judgment**: The agent made a questionable decision or took a suboptimal approach +4. **Poor judgment**: Questionable decisions or suboptimal approaches 5. **Pattern failures**: Repeated errors suggesting a systemic issue -Skip trivial issues like: -- Minor formatting differences -- Environmental issues (network timeouts, tool failures) -- Issues already covered by existing learnings +Skip trivial issues (minor formatting, environmental failures, issues already covered by existing learnings or issue files listed above). -### Step 4: Report Each Issue +### Step 3: Report Each Issue -For each issue identified, invoke the `report-issue` skill once per issue: +For each issue, invoke the report-issue skill: ``` -Skill learning-agents:report-issue $ARGUMENTS "" " +Skill learning-agents:report-issue $ARGUMENTS "" "" ``` -Example: `Skill learning-agents:report-issue .deepwork/tmp/agent_sessions/abc123/agent456/ "Knowledge gap: Agent did not know that date -v-30d is macOS-only syntax"` +### Step 4: Clean Up if No Issues -### Step 5: Summary +If **zero** issues were found, delete the `needs_learning_as_of_timestamp` file from the session folder: + +``` +rm $ARGUMENTS/needs_learning_as_of_timestamp +``` -Output in this format: +This marks the session as fully processed so that the investigate and incorporate steps can be skipped. 
+ +### Step 5: Summary ``` ## Session Issue Summary @@ -84,7 +76,7 @@ Output in this format: |---|----------|-------------------| | 1 | | | -(or: "No actionable issues found. Agent performed well in this session.") +(or: "No actionable issues found. No follow-up needed — session marked as processed.") ``` ## Guardrails diff --git a/learning_agents/skills/incorporate-learnings/SKILL.md b/learning_agents/skills/incorporate-learnings/SKILL.md index e99a51ec..44d4e85e 100644 --- a/learning_agents/skills/incorporate-learnings/SKILL.md +++ b/learning_agents/skills/incorporate-learnings/SKILL.md @@ -2,13 +2,11 @@ name: incorporate-learnings description: Takes investigated issues and incorporates the learnings into the LearningAgent's knowledge base by updating core-knowledge.md, topics, or learnings files. user-invocable: false -disable-model-invocation: true -allowed-tools: Read, Grep, Glob, Edit, Write --- # Incorporate Learnings -Take investigated issues and integrate the lessons learned into the LearningAgent's knowledge base. +Take investigated issues and integrate the lessons learned into the LearningAgent's knowledge. ## Arguments @@ -18,133 +16,94 @@ Take investigated issues and integrate the lessons learned into the LearningAgen **Agent used**: !`cat $ARGUMENTS/agent_used 2>/dev/null || echo "unknown"` -If `agent_used` is "unknown", stop and report an error — the session folder is missing required metadata. +If `agent_used` is "unknown", stop and report an error. 
+ +**Investigated issues to process**: +!`grep -l 'status: investigated' $ARGUMENTS/*.issue.yml 2>/dev/null || echo "(none)"` **Additional incorporation guidelines**: -!`cat .deepwork/learning-agents/$(cat $ARGUMENTS/agent_used 2>/dev/null)/additional_learning_guidelines/learning_from_issues.md 2>/dev/null` +!`learning_agents/scripts/cat_agent_guideline.sh $ARGUMENTS learning_from_issues` ## Procedure -### Step 1: Find Investigated Issues - -List all issue files with status `investigated`: - -```bash -grep -l 'status: investigated' $ARGUMENTS/*.issue.yml -``` - -If no investigated issues are found, report that and skip to Step 5 (still update tracking files). - -### Step 2: Read Agent Knowledge Base - -Read the current state of the agent's knowledge: - -- `.deepwork/learning-agents/<agent_name>/core-knowledge.md` -- `.deepwork/learning-agents/<agent_name>/topics/*.md` -- `.deepwork/learning-agents/<agent_name>/learnings/*.md` - -Where `<agent_name>` is from `$ARGUMENTS/agent_used`. - -### Step 3: Incorporate Each Issue +If no investigated issues are listed above, skip to Step 3 (still update tracking files). -For each investigated issue, read the issue file (both `issue_description` and `investigation_report`) and determine the best way to incorporate the learning. Apply options in this priority order: ### Step 1: Incorporate Each Issue -#### Option D (first priority): Amend existing content +For each investigated issue, read its `issue_description` and `investigation_report`, then choose the best incorporation strategy in this priority order: -Check first. If a closely related file already exists in the agent's knowledge base that covers the same area, edit that file rather than creating a new one. #### A. Amend existing content (prefer this) -Example: Issue "Agent used wrong retry count" when `topics/retry-handling.md` already exists → update the existing topic with the correct information.
+If a closely related file already exists in the agent's knowledge base, edit it rather than creating a new one. -#### Option A: Update `core-knowledge.md` +#### B. Update `core-knowledge.md` -Use when the issue is a **universal one-liner** — something fundamental the agent should always know that can be expressed in 1-2 sentences. +For **universal one-liners** — something fundamental expressible in 1-2 sentences. -Example: Issue "Agent called a python program directly that only works with `uv run`" → add a bullet to `core-knowledge.md`: "Always use `uv run` when invoking `util.py`." +#### C. Add a new topic in `topics/` -#### Option B: Add a new topic in `topics/` - -Use when the issue reveals a new or existing **conceptual area** needing 1+ paragraphs of reference material that is not always needed, but often enough to track. +For a **conceptual area** needing 1+ paragraphs of reference material. Use this frontmatter: ```markdown --- name: keywords: - -last_updated: +last_updated: --- - - ``` -Example: Issue "Agent didn't understand retry backoff patterns" → create `topics/retry-backoff.md` with documentation on exponential backoff, jitter, and dead letter queues. - -#### Option C: Add a new learning in `learnings/` +#### D. Add a new learning in `learnings/` -Use when the **narrative context of how the issue unfolded** is needed to understand the resolution — multi-step debugging sessions, surprising interactions, or subtle misunderstandings. +For cases where the **narrative context** of how the issue unfolded is needed. 
Use this frontmatter: ```markdown --- -name: -last_updated: +name: +last_updated: summarized_result: | <1-3 sentence summary of the key takeaway> --- ## Context - - ## Investigation - - ## Resolution - - ## Key Takeaway - ``` -Example: Issue "Agent spent 20 minutes debugging a permissions error that was actually caused by a stale Docker volume" → create a learning capturing the full debugging narrative and the insight about checking Docker volumes early. +If you add a learning, also consider adding a brief note to a related Topic referencing it. -**IMPORTANT**: If you add a `learnings` entry, you may want to also add a brief note to a Topic with reference to the learning too. +#### E. Do nothing -#### Option D: Do nothing -If you decide that the issue would have been hard to prevent, or if it seems extremely unlikely that it will be encountered again, forgo any changes and just move on to step 4. +If the issue is unlikely to recur or would be hard to prevent, skip it. -### Step 4: Update Issue Status +### Step 2: Update Issue Status -For each incorporated issue, use Edit to change `status: investigated` to `status: learned` in the issue file. +For each incorporated issue, change `status: investigated` to `status: learned`. -### Step 5: Update Session Tracking +### Step 3: Update Session Tracking Always run this step, even if no issues were incorporated. -1. Delete `needs_learning_as_of_timestamp` if it exists: - ```bash - [ -f $ARGUMENTS/needs_learning_as_of_timestamp ] && rm $ARGUMENTS/needs_learning_as_of_timestamp - ``` - -2. 
Write the current timestamp to `learning_last_performed_timestamp`: - ```bash - date -u +"%Y-%m-%dT%H:%M:%SZ" > $ARGUMENTS/learning_last_performed_timestamp - ``` - -### Step 6: Summary +```bash +[ -f $ARGUMENTS/needs_learning_as_of_timestamp ] && rm $ARGUMENTS/needs_learning_as_of_timestamp +date -u +"%Y-%m-%dT%H:%M:%SZ" > $ARGUMENTS/learning_last_performed_timestamp +``` -Output in this format: +### Step 4: Summary ``` ## Incorporation Summary - **Issues processed**: -- | created learning | amended > -- → could not incorporate: +- | created learning | amended | skipped> ``` ## Guardrails - Do NOT create overly broad or vague learnings — be specific and actionable -- Do NOT duplicate existing knowledge — check before adding -- Do NOT remove existing content unless it is directly contradicted by new evidence +- Do NOT duplicate existing knowledge — check the auto-included lists above +- Do NOT remove existing content unless directly contradicted by new evidence - Keep `core-knowledge.md` concise — move detailed content to topics or learnings - Use today's date for `last_updated` fields -- Always run Step 5 (update tracking files) even if no issues were incorporated +- Always run Step 3 even if no issues were incorporated diff --git a/learning_agents/skills/investigate-issues/SKILL.md b/learning_agents/skills/investigate-issues/SKILL.md index e33bdc92..0d0e626c 100644 --- a/learning_agents/skills/investigate-issues/SKILL.md +++ b/learning_agents/skills/investigate-issues/SKILL.md @@ -2,8 +2,6 @@ name: investigate-issues description: Investigates identified issues in a LearningAgent session by reading the transcript, determining root causes, and updating issue files with investigation reports. 
user-invocable: false -disable-model-invocation: true -allowed-tools: Read, Grep, Glob, Edit --- # Investigate Issues @@ -16,87 +14,61 @@ Research identified issues from a LearningAgent session to determine their root ## Context +**Session log folder structure**: +!`cat learning_agents/doc/learning_log_folder_structure.md 2>/dev/null` + **Agent used**: !`cat $ARGUMENTS/agent_used 2>/dev/null || echo "unknown"` -**Agent core knowledge**: -!`cat .deepwork/learning-agents/$(cat $ARGUMENTS/agent_used 2>/dev/null)/core-knowledge.md 2>/dev/null` +**Identified issues to investigate**: +!`grep -l 'status: identified' $ARGUMENTS/*.issue.yml 2>/dev/null || echo "(none)"` **Additional investigation guidelines**: -!`cat .deepwork/learning-agents/$(cat $ARGUMENTS/agent_used 2>/dev/null)/additional_learning_guidelines/issue_investigation.md 2>/dev/null` - -## Procedure - -### Step 1: Find Identified Issues - -List all issue files with status `identified`: - -```bash -grep -l 'status: identified' $ARGUMENTS/*.issue.yml -``` +!`learning_agents/scripts/cat_agent_guideline.sh $ARGUMENTS issue_investigation` -If no identified issues are found, report that and stop. +## Current Agent State +-------- CURRENT KNOWLEDGE OF AGENT -------- +!`learning_agents/scripts/generate_agent_instructions_for_session.sh $ARGUMENTS` +------ END CURRENT KNOWLEDGE OF AGENT ------- -### Step 2: Locate the Transcript - -Extract the session_id from `$ARGUMENTS` by taking the second-to-last path component (e.g., from `.deepwork/tmp/agent_sessions/abc123/agent456/`, the session_id is `abc123`). +## Procedure -Use Glob to find the transcript file by substituting the actual extracted session_id: ``` -~/.claude/projects/**/sessions/<session_id>/*.jsonl ``` +If no identified issues are listed above, report that and stop. -If no transcript file is found, report the missing path and stop. Do not proceed to investigate without transcript evidence.
+Refer back to the `conversation_transcript.jsonl` file as needed in this process. -### Step 3: Investigate Each Issue +### Step 1: Investigate Each Issue For each issue file with status `identified`: 1. **Read the issue file** to understand what went wrong -2. **Search the transcript** for relevant sections — grep for keywords from `issue_description` or locate lines near timestamps in `seen_at_timestamps` +2. **Search the transcript** for relevant sections — grep for keywords from `issue_description` or locate lines near `seen_at_timestamps` 3. **Determine root cause** using this taxonomy: - **Knowledge gap**: Missing or incomplete content in `core-knowledge.md` - **Missing documentation**: A topic file does not exist or lacks needed detail - **Incorrect instruction**: An existing instruction leads the agent to wrong behavior - **Missing runtime context**: Information that should have been injected at runtime was absent -4. **Write the investigation report** explaining: - - Specific evidence from the transcript (reference line numbers) - - The root cause analysis - - What knowledge gap or instruction deficiency led to the issue -### Step 4: Update Issue Files +### Step 2: Update Issue Files For each investigated issue, use Edit to update the issue file: 1. Change `status: identified` to `status: investigated` -2. Add the `investigation_report` field with your findings: +2. Add the `investigation_report` field: ```yaml status: investigated -seen_at_timestamps: - - "2025-01-15T14:32:00Z" -issue_description: | - investigation_report: | - + ``` -### Step 5: Summary +### Step 3: Summary -Output in this format for each issue: - -``` -**Issue**: -**Root cause**: -**Recommended update type**: -``` +Simply say "Session log folder done." 
## Guardrails - Do NOT modify the agent's knowledge base — that is the incorporate step's job - Do NOT change the `issue_description` — only add the `investigation_report` -- Do NOT skip issues — investigate every `identified` issue in the folder +- Do NOT skip issues — investigate every `identified` issue - Be specific about evidence — reference transcript line numbers -- Focus on actionable root causes, not blame diff --git a/learning_agents/skills/learn/SKILL.md b/learning_agents/skills/learn/SKILL.md index 87663e18..a9bc94e5 100644 --- a/learning_agents/skills/learn/SKILL.md +++ b/learning_agents/skills/learn/SKILL.md @@ -1,8 +1,6 @@ --- name: learn description: Runs the learning cycle on all LearningAgent sessions with pending transcripts. Identifies issues, investigates root causes, and incorporates learnings into agent definitions. -disable-model-invocation: true -allowed-tools: Read, Glob, Grep, Bash, Task, Skill --- # Learning Cycle @@ -13,54 +11,55 @@ Process unreviewed LearningAgent session transcripts to identify issues, investi This skill takes no arguments. It automatically discovers all pending sessions. -## Pending Sessions +## Session Log Folders needing Processing -!`find .deepwork/tmp/agent_sessions -name needs_learning_as_of_timestamp 2>/dev/null` +!`learning_agents/scripts/list_pending_sessions.sh` ## Procedure -### Step 1: Find Pending Sessions +If the list above is empty (or the `.deepwork/tmp/agent_sessions` directory does not exist), inform the user that there are no pending sessions to learn from and stop. -Check for pending learning sessions. The dynamic include above lists all `needs_learning_as_of_timestamp` files. If the list is empty (or the `.deepwork/tmp/agent_sessions` directory does not exist), inform the user that there are no pending sessions to learn from and stop. 
+### Step 1: Process Each Session -For each pending file, extract: -- The session folder path (parent directory of `needs_learning_as_of_timestamp`, e.g., `.deepwork/tmp/agent_sessions/sess-abc/agent-123/`) -- The agent name (read the `agent_used` file in that folder) +For each session log folder, run the learning cycle in sequence. -### Step 2: Process Each Session +#### 1a: Identify Issues -For each pending session folder, run the learning cycle in sequence. The Task pseudo-code below shows the parameters to pass to the Task tool: - -#### 2a: Identify Issues +Spawn a Task to run the identify skill: ``` Task tool call: name: "identify-issues" - subagent_type: general-purpose + subagent_type: learning-agents:learning-agent-expert model: sonnet - prompt: "Run the identify skill on the session folder: .deepwork/tmp/agent_sessions/<session_id>/<agent_id>/ - Use: Skill learning-agents:identify .deepwork/tmp/agent_sessions/<session_id>/<agent_id>/" + prompt: "Run: Skill learning-agents:identify <session_log_folder>" ``` -#### 2b: Investigate and Incorporate +**Run those in parallel** + +#### 1b: Investigate and Incorporate -After identification completes, spawn another Task to run investigation and incorporation in sequence: +After identification completes, **skip** any session where the identify step reported zero issues. Only proceed with sessions that had issues identified. + +For remaining sessions, start a new Task to run investigation and incorporation in sequence for each session_log_folder: ``` Task tool call: name: "investigate-and-incorporate" - subagent_type: general-purpose + subagent_type: learning-agents:learning-agent-expert model: sonnet - prompt: "Run these two skills in sequence on the session folder: .deepwork/tmp/agent_sessions/<session_id>/<agent_id>/ - 1. First: Skill learning-agents:investigate-issues .deepwork/tmp/agent_sessions/<session_id>/<agent_id>/ - 2. Then: Skill learning-agents:incorporate-learnings .deepwork/tmp/agent_sessions/<session_id>/<agent_id>/" + prompt: "Run these two skills in sequence: + 1. Skill learning-agents:investigate-issues <session_log_folder> + 2.
Skill learning-agents:incorporate-learnings <session_log_folder>" ``` +**Run session log folders from the same agent serially, but different agents in parallel.** For example, if Agent A has 7 sessions and Agent B has 3, run 3 "batches" of Tasks that each pair one Agent A session with one Agent B session, then run the remaining 4 Agent A sessions as serial Tasks. + #### Handling failures If a sub-skill Task fails for a session, log the failure, skip that session, and continue processing remaining sessions. Do not mark `needs_learning_as_of_timestamp` as resolved for failed sessions. -### Step 3: Summary +### Step 2: Summary Output in this format: @@ -77,7 +76,6 @@ Output in this format: ## Guardrails -- Process sessions one at a time to avoid conflicts when multiple sessions involve the same agent -- If a session's transcript cannot be found, skip it and report the issue -- Do NOT modify agent files directly — always delegate to the learning cycle skills +- Do NOT modify agent files directly — always delegate to the learning cycle skills in Tasks - Use Sonnet model for Task spawns to balance cost and quality +- Use the `learning-agents:learning-agent-expert` agent for Task spawns diff --git a/learning_agents/skills/learning-agents/SKILL.md b/learning_agents/skills/learning-agents/SKILL.md index 95bd3c9e..ad12413b 100644 --- a/learning_agents/skills/learning-agents/SKILL.md +++ b/learning_agents/skills/learning-agents/SKILL.md @@ -11,6 +11,12 @@ Manage auto-improving AI sub-agents that learn from their mistakes across sessio `$ARGUMENTS` is the text after `/learning-agents` (e.g., for `/learning-agents create foo`, `$ARGUMENTS` is `create foo`). +## Setup Check + +Before routing, check if `.claude/session_log_folder_info.md` exists. If it does **not** exist, run `Skill learning-agents:setup` first, then continue with routing below.
+ +Only perform this check once per session — after the setup skill completes (or if the file already exists), proceed directly to routing for all subsequent invocations. + ## Routing Split `$ARGUMENTS` on the first whitespace. The first token is the sub-command (case-insensitive); the remainder is passed to the sub-skill. Accept both underscores and dashes in sub-command names (e.g., `report_issue` and `report-issue` are equivalent). @@ -33,9 +39,9 @@ Invoke: `Skill learning-agents:learn` Report an issue with a LearningAgent from the current session. -Invoke: `Skill learning-agents:report-issue
<session_folder> <issue description>` +Invoke: `Skill learning-agents:report-issue
<session_log_folder> <issue description>` -To construct the session folder path: search `.deepwork/tmp/agent_sessions/` for a subdirectory whose name contains the provided `agentId`. The path structure is `.deepwork/tmp/agent_sessions/<session_id>/<agent_id>/`. If no match is found, inform the user. If multiple matches exist, use the most recently modified one. +To construct the session log folder path: search `.deepwork/tmp/agent_sessions/` for a subdirectory whose name contains the provided `agentId`. The path structure is `.deepwork/tmp/agent_sessions/<session_id>/<agent_id>/`. If no match is found, inform the user. If multiple matches exist, use the most recently modified one. Example: `$ARGUMENTS = "report_issue abc123 Used wrong retry strategy"` → find folder matching `abc123` under `.deepwork/tmp/agent_sessions/`, then `Skill learning-agents:report-issue .deepwork/tmp/agent_sessions/sess-xyz/abc123/ Used wrong retry strategy` diff --git a/learning_agents/skills/report-issue/SKILL.md b/learning_agents/skills/report-issue/SKILL.md index 6ca643c0..aea79853 100644 --- a/learning_agents/skills/report-issue/SKILL.md +++ b/learning_agents/skills/report-issue/SKILL.md @@ -1,8 +1,6 @@ --- name: report-issue description: Creates an issue file tracking a problem observed in a LearningAgent session. Used by the identify skill and can be invoked directly to report issues in real-time.
-user-invocable: false -disable-model-invocation: true --- # Report Issue @@ -71,5 +69,5 @@ Recorded: - Do NOT add an `investigation_report` field — that is added during the investigate step - Do NOT set status to anything other than `identified` -- Do NOT modify any other files in the session folder +- Do NOT modify any other files in the session log folder - Keep the `issue_description` factual and observable — describe symptoms, not root causes diff --git a/learning_agents/skills/setup/SKILL.md b/learning_agents/skills/setup/SKILL.md new file mode 100644 index 00000000..20a3d29f --- /dev/null +++ b/learning_agents/skills/setup/SKILL.md @@ -0,0 +1,61 @@ +--- +name: setup +description: Configures project permissions for the LearningAgents plugin. Adds required Bash and file access rules to .claude/settings.json. +--- + +# LearningAgents Setup + +Configure the current project's `.claude/settings.json` so the LearningAgents plugin can run without permission prompts. + +## Procedure + +### 1. Read or initialize `.claude/settings.json` + +Read `.claude/settings.json`. If it does not exist, start with: + +```json +{ + "permissions": { + "allow": [] + } +} +``` + +### 2. Add required permission rules + +Add the following entries to `permissions.allow` if they are not already present: + +| Rule | Purpose | +|------|---------| +| `Bash(learning_agents/scripts/*)` | Allow plugin scripts to run | +| `Bash(bash learning_agents/scripts/*)` | Allow plugin scripts invoked via `bash` | +| `Read(./.deepwork/tmp/**)` | Read session transcripts and temp files | +| `Write(./.deepwork/tmp/**)` | Write session logs and issue files | +| `Edit(./.deepwork/tmp/**)` | Edit issue files during investigation | + +Do not duplicate rules that already exist. Preserve all existing rules and formatting. + +### 3. Write `.claude/session_log_folder_info.md` + +Create `.claude/session_log_folder_info.md` with content describing what was configured: + +``` +LearningAgents plugin setup completed. 
+ +Permissions added to .claude/settings.json: +- Bash(learning_agents/scripts/*) — plugin scripts +- Bash(bash learning_agents/scripts/*) — plugin scripts via bash +- Read(./.deepwork/tmp/**) — read session data +- Write(./.deepwork/tmp/**) — write session data +- Edit(./.deepwork/tmp/**) — edit session data +``` + +### 4. Confirm to the user + +Tell the user setup is complete and they can now use `/learning-agents` commands. + +## Guardrails + +- Never remove existing permission rules +- Only add rules that are not already present +- Always create the setup info file so setup is not re-triggered diff --git a/src/deepwork/standard_jobs/deepwork_jobs/job.yml b/src/deepwork/standard_jobs/deepwork_jobs/job.yml index f77e380f..530a015f 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/job.yml +++ b/src/deepwork/standard_jobs/deepwork_jobs/job.yml @@ -104,13 +104,11 @@ steps: - define - implement reviews: - - run_each: step + - run_each: .deepwork/tmp/test_feedback.md quality_criteria: - "Workflow Invoked": "The new workflow was actually run on the user's test case via MCP." - "Output Critiqued": "The agent identified up to 3 top issues with the output." - "User Feedback Gathered": "The agent asked the user about each issue and gathered additional feedback." - "Corrections Made": "All requested corrections were applied to the output." - "User Satisfied": "The user confirmed the output meets their needs." + "Test Case Documented": "The feedback file describes what test case was used and what the workflow produced." + "Issues Identified": "The feedback file lists specific issues found during output critique." + "Feedback Captured": "User feedback and requested corrections are documented with enough detail for the iterate step to act on." 
- id: iterate name: "Iterate on Workflow Design" diff --git a/src/deepwork/standard_jobs/deepwork_jobs/make_new_job.sh b/src/deepwork/standard_jobs/deepwork_jobs/make_new_job.sh index c87f40e8..66f619ca 100755 --- a/src/deepwork/standard_jobs/deepwork_jobs/make_new_job.sh +++ b/src/deepwork/standard_jobs/deepwork_jobs/make_new_job.sh @@ -117,8 +117,6 @@ This folder and its subfolders are managed using `deepwork_jobs` workflows. 1. **Use workflows** for structural changes (adding steps, modifying job.yml) 2. **Direct edits** are fine for minor instruction tweaks -3. **Run `deepwork_jobs/learn`** after executing job steps to capture improvements -4. **Run `deepwork install`** after any changes to regenerate commands EOF info "Created directory structure:" diff --git a/src/deepwork/standard_jobs/deepwork_jobs/steps/iterate.md b/src/deepwork/standard_jobs/deepwork_jobs/steps/iterate.md index 0abdef3e..6df0c1a1 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/steps/iterate.md +++ b/src/deepwork/standard_jobs/deepwork_jobs/steps/iterate.md @@ -112,7 +112,23 @@ Examples: - If data processing was slow, suggest a different method or tool - If file generation had issues, recommend a different library or format -### Step 6: Update Job Version +### Step 6: Create or Fix Scripts + +Review the test run for opportunities to add or improve scripts in the job's `scripts/` directory: + +1. **Fix existing scripts** - If any scripts were used during the test and had problems (wrong output, errors, edge cases), fix them now. + +2. **Create new scripts** - If any process during the test was manual, repetitive, or error-prone, and would be faster or more reliable as a script, create one. Good candidates: + - Data fetching or transformation that had to be done by hand + - File generation with specific formatting requirements + - Validation or checking steps that could be automated + - Setup or teardown tasks that will repeat on every run + +3. 
**Test the scripts** - Run any new or modified scripts to verify they work correctly. + +4. **Reference from instructions** - Update the relevant step instruction files to reference the new scripts so future runs use them. + +### Step 7: Update Job Version After making improvements: @@ -120,7 +136,7 @@ After making improvements: - Patch version (x.x.1) for minor instruction tweaks - Minor version (x.1.0) for quality criteria changes or significant improvements -### Step 7: Provide Recap +### Step 8: Provide Recap Summarize the improvements made: diff --git a/src/deepwork/standard_jobs/deepwork_jobs/steps/learn.md b/src/deepwork/standard_jobs/deepwork_jobs/steps/learn.md index 2136ed5a..e0c8e060 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/steps/learn.md +++ b/src/deepwork/standard_jobs/deepwork_jobs/steps/learn.md @@ -148,7 +148,23 @@ The AGENTS.md file captures project-specific knowledge that helps future agent r - Use line numbers when referencing specific code: `file.ext:42` - Group related learnings together -### Step 6: Update Job Version +### Step 6: Create or Fix Scripts + +Review the conversation for opportunities to add or improve scripts in the job's `scripts/` directory: + +1. **Fix existing scripts** - If any scripts were used during execution and had problems (wrong output, errors, edge cases), fix them now. + +2. **Create new scripts** - If any process during execution was manual, repetitive, or error-prone, and would be faster or more reliable as a script, create one. Good candidates: + - Data fetching or transformation that had to be done by hand + - File generation with specific formatting requirements + - Validation or checking steps that could be automated + - Setup or teardown tasks that will repeat on every run + +3. **Test the scripts** - Run any new or modified scripts to verify they work correctly. + +4. **Reference from instructions** - Update the relevant step instruction files to reference the new scripts so future runs use them. 
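As a sketch of what "test the scripts" can mean in practice, the following smoke test creates a small script and exercises both its passing and failing paths before it gets referenced from step instructions. The script name `check_output.sh` and the `/tmp` paths are illustrative placeholders, not part of any existing job:

```shell
# Hypothetical smoke test for a newly created job script.
# "check_output.sh" is an illustrative name, not an existing script.
set -eu

mkdir -p /tmp/job_scripts
cat > /tmp/job_scripts/check_output.sh <<'EOF'
#!/bin/sh
# Exit non-zero if the given file is missing or empty.
[ -s "$1" ]
EOF
chmod +x /tmp/job_scripts/check_output.sh

# Exercise the passing path...
printf 'data\n' > /tmp/sample.txt
/tmp/job_scripts/check_output.sh /tmp/sample.txt
echo "pass case: ok"

# ...and the failing path (the script should reject a missing file).
if /tmp/job_scripts/check_output.sh /tmp/does_not_exist 2>/dev/null; then
  echo "fail case: script did not catch the bad input" >&2
  exit 1
fi
echo "fail case: ok"
```

Checking the failure path matters as much as the happy path: a validation script that silently passes bad input will mislead every future run that relies on it.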
+ +### Step 7: Update Job Version If instruction files were modified: diff --git a/src/deepwork/standard_jobs/deepwork_jobs/steps/test.md b/src/deepwork/standard_jobs/deepwork_jobs/steps/test.md index 36d27128..ce57c2ff 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/steps/test.md +++ b/src/deepwork/standard_jobs/deepwork_jobs/steps/test.md @@ -77,9 +77,21 @@ After addressing the identified issues: 3. **Confirm completion** - When the user says the output is good, confirm that testing is complete +### Step 5: Write Test Feedback + +Once the user is satisfied, write a summary of the test run to `.deepwork/tmp/test_feedback.md`. This file is consumed by the iterate step. Include: + +1. **Test case description** - What was tested +2. **Issues found during critique** - The problems identified in Step 3 +3. **User feedback** - What the user requested changed and why +4. **Corrections applied** - What was fixed +5. **Final outcome** - Whether the user was satisfied and any remaining concerns + +This file is the primary record of what happened during testing and what needs to improve in the workflow. + ### Loop Behavior -The feedback loop should continue until the user explicitly indicates satisfaction. Look for signals like: +The feedback loop (Steps 3-4) should continue until the user explicitly indicates satisfaction. Look for signals like: - "Looks good" - "That's fine" - "I'm happy with it"
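As a sketch, a `.deepwork/tmp/test_feedback.md` file covering the five points listed in Step 5 might look like the following; the headings mirror those points, and the bracketed content is an illustrative placeholder, not a required format:

```
# Test Feedback

## Test Case
[What was tested and the inputs used]

## Issues Found During Critique
- [Problem identified in Step 3, with the affected output]

## User Feedback
- [What the user asked to change, and why]

## Corrections Applied
- [What was fixed in response]

## Final Outcome
[Whether the user was satisfied; any remaining concerns]
```

Keeping one section per point makes the file easy for the iterate step to consume, since each section maps directly onto one of the review's quality criteria.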