114 changes: 114 additions & 0 deletions JOURNAL.md
@@ -1226,3 +1226,117 @@ The Ralph realignment is working! The CLI now:
- `npm run typecheck`
- `npm test`
- `npm run build`

## 2026-01-28 - CRITICAL FIX: Multi-task plan processing loop

### Problem
The `ghcralph run --file PLAN.md` command only processed ONE task per invocation, then exited. After successfully completing the first task in a plan file, the CLI would terminate instead of continuing to the remaining tasks. This was a critical bug that broke the core functionality of the CLI.

**Root Cause**: The `run` command in `src/commands/run.ts` had no outer loop, so nothing picked up the remaining pending tasks after the first task completed.

### Fix
Implemented Option A from the remediation plan (`plans/LOOP_MAJOR_BUG_REMEDIATION_PLAN.md`):

1. **Core Multi-Task Loop** (`src/commands/run.ts`):
- Added outer `while (currentTask)` loop that processes ALL pending tasks
- Creates **fresh AI agent instance** for each task (Ralph pattern core principle)
- Added task-level retry loop with configurable `maxRetriesPerTask` (default: 2)
- Prints final summary with total tasks processed/completed/failed

2. **New CLI Flag**:
- Added `--pause-between-tasks` flag for strict Ralph mode (human review after each task)

3. **New Configuration Options** (`src/core/config-schema.ts`):
- `maxRetriesPerTask: number` (default: 2) - retries per task before marking failed
- `autoPush: boolean` (default: false) - auto-push after each task completion

4. **New CheckpointManager Methods** (`src/core/checkpoint-manager.ts`):
- `createTaskCheckpoint()` - commits after successful task completion
- `createFailureCheckpoint()` - commits after failed task attempt (preserves state for post-mortem)

5. **New GitBranchManager Methods** (`src/core/git-branch-manager.ts`):
- `pushToRemote()` - pushes current branch to remote
- `hasRemote()` - checks if a remote exists

6. **New ProgressTracker Methods** (`src/core/progress-tracker.ts`):
- `loadPreviousTaskResults()` - loads previous task results for context injection
- `appendTaskResult()` - appends task result to progress file for tracking

7. **New PlanManager Interface Method** (`src/core/plan-manager.ts`):
- `reload?()` - optional method to reload plan from source (already implemented in LocalMarkdownPlan)

8. **Prompt Engineering for Honesty** (`src/core/context-builder.ts`):
- Added `HONESTY_GUIDANCE` section to prompt template
- Encourages agents to be honest about failures
- Asks agents to document blockers instead of claiming false completion

9. **New STUCK Action** (`src/core/response-parser.ts`, `src/core/action-executor.ts`):
- Added `[ACTION:STUCK]` action type for graceful failure signaling
- Agents can report: attempted actions, blockers, and suggestions
- STUCK triggers retry with fresh agent (benefits from progress documentation)

10. **Utility Function** (`src/utils/shell.ts`):
- Added `waitForKeypress()` for `--pause-between-tasks` mode
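
The control flow described in items 1 and 9 can be sketched roughly as follows. This is an illustration only, not the code in `src/commands/run.ts`: `Task`, `Agent`, `Summary`, and `runPlan` are hypothetical names, and the sketch assumes `maxRetriesPerTask` counts retries after a first attempt (the real off-by-one may differ).

```typescript
// Hypothetical types for illustration; not the real ghcralph interfaces.
type Outcome = "complete" | "stuck";

interface Task {
  id: string;
  title: string;
}

interface Agent {
  run(task: Task, previousNotes: string[]): Promise<Outcome>;
}

interface Summary {
  processed: number;
  completed: number;
  failed: number;
}

async function runPlan(
  tasks: Task[],
  createAgent: () => Agent,
  maxRetriesPerTask = 2,
): Promise<Summary> {
  const summary: Summary = { processed: 0, completed: 0, failed: 0 };
  const notes: string[] = []; // stands in for the progress file
  const pending = [...tasks];

  let currentTask = pending.shift();
  // Outer loop: keep going until no pending task remains.
  while (currentTask) {
    summary.processed += 1;
    let done = false;

    // Task-level retry loop: first attempt plus up to maxRetriesPerTask retries (assumed).
    for (let attempt = 0; attempt <= maxRetriesPerTask && !done; attempt++) {
      const agent = createAgent(); // fresh agent, no carried-over conversation
      const outcome = await agent.run(currentTask, notes);
      if (outcome === "complete") {
        done = true;
      } else {
        notes.push(`${currentTask.id}: attempt ${attempt + 1} reported STUCK`);
      }
    }

    if (done) summary.completed += 1;
    else summary.failed += 1;
    currentTask = pending.shift();
  }
  return summary;
}
```

Passing the agent factory in as a parameter keeps the "fresh agent per task" rule visible in one place and makes the loop easy to exercise with a stub agent.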

### Files Modified
- `src/commands/run.ts` - Core fix with multi-task loop
- `src/core/config-schema.ts` - New config options
- `src/core/checkpoint-manager.ts` - Task-level checkpoints
- `src/core/git-branch-manager.ts` - Push to remote
- `src/core/progress-tracker.ts` - Multi-task progress tracking
- `src/core/plan-manager.ts` - Optional reload method
- `src/core/context-builder.ts` - Honesty guidance in prompt
- `src/core/response-parser.ts` - STUCK action type
- `src/core/action-executor.ts` - STUCK action handling
- `src/utils/shell.ts` - waitForKeypress utility
- `src/core/config-schema.test.ts` - Updated test for new config keys

### Validation
- `npm run typecheck` ✅
- `npm test` ✅ (285 tests passing)
- `npm run build` ✅

## 2026-01-28 - Model Compatibility Improvements

### Context
Following the multi-task loop fix, a review of `MODEL_COMPAT_TEST_PLAN.md` surfaced the following model compatibility findings:
1. The `ghcralph init` command had a hardcoded list of 5 models
2. GitHub Copilot CLI actually offers 14+ models
3. The SDK provides `client.listModels()` API for dynamic model discovery
4. No tests existed to validate parsing across different model output styles

### Changes

1. **Dynamic Model Listing** (`src/integrations/copilot-agent.ts`):
- Added `listAvailableModels()` instance method - fetches models from existing client
- Added static `fetchAvailableModels()` method - creates temporary client to fetch models
- Re-exported `ModelInfo` type from SDK for consumers

2. **Dynamic Model Selection in Init** (`src/commands/init.ts`):
- Added `fetchModelOptions()` helper that calls `CopilotAgent.fetchAvailableModels()`
- Updated model selection prompt to use dynamically fetched models
- Falls back to hardcoded list if SDK fetch fails
- Maintains "Custom (enter manually)" option

3. **Model Compatibility Tests** (`src/core/model-compatibility.test.ts`):
- Created parameterized test suite for response parsing across model variations
- Tests CREATE, EDIT, EXECUTE, COMPLETE, and STUCK action parsing
- Documents current parser behavior with different formatting styles
- Tests edge cases: Windows line endings, mixed case action types, malformed blocks

4. **Updated CopilotAgent Tests** (`src/integrations/copilot-agent.test.ts`):
- Added `mockListModels` for SDK mock
- Added tests for `listAvailableModels()` and `fetchAvailableModels()`
- Tests error handling when SDK fetch fails
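
The fallback behaviour in item 2 could look something like the sketch below. The names `ModelOption`, `FALLBACK_MODELS`, and `resolveModelOptions` are hypothetical, and the fetcher is passed in as a function so that no SDK signature is assumed. The fallback list here is a placeholder containing only `gpt-4.1` (the README default), not the actual hardcoded list.

```typescript
// Hypothetical shape; the real ModelInfo type comes from the SDK and may differ.
interface ModelOption {
  id: string;
  label: string;
}

// Placeholder fallback list; only gpt-4.1 is taken from the README defaults.
const FALLBACK_MODELS: ModelOption[] = [{ id: "gpt-4.1", label: "gpt-4.1" }];
const CUSTOM_OPTION: ModelOption = { id: "custom", label: "Custom (enter manually)" };

async function resolveModelOptions(
  fetchModels: () => Promise<ModelOption[]>,
): Promise<ModelOption[]> {
  let models: ModelOption[];
  try {
    models = await fetchModels();
    // Treat an empty result like a failure so the prompt is never empty (assumption).
    if (models.length === 0) models = FALLBACK_MODELS;
  } catch {
    // Any fetch error falls back to the static list.
    models = FALLBACK_MODELS;
  }
  // The manual-entry option is always offered last.
  return [...models, CUSTOM_OPTION];
}
```

In this sketch the "Custom (enter manually)" entry is appended after both paths, so the prompt stays usable even when neither list contains the model the user wants.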

### Files Modified
- `src/integrations/copilot-agent.ts` - listAvailableModels methods
- `src/integrations/index.ts` - Export ModelInfo type
- `src/commands/init.ts` - Dynamic model fetching
- `src/core/model-compatibility.test.ts` - New parameterized tests
- `src/integrations/copilot-agent.test.ts` - listModels tests

### Validation
- `npm run typecheck` ✅
- `npm test` ✅ (305 tests passing)
- `npm run build` ✅
43 changes: 27 additions & 16 deletions README.md
@@ -11,6 +11,7 @@ Run **autonomous, checkpointed coding loops** with GitHub Copilot—designed to

- 🌿 **Branch isolation**: works on a `ghcralph/*` branch (never modifies `main`/`master` directly)
- πŸ’Ύ **Automatic checkpoints**: commits after each iteration for easy rollback
- πŸ”„ **Multi-task processing**: processes ALL tasks in plan files automatically
- πŸ›‘οΈ **Guardrails**: iteration limits, token budgets, timeouts, circuit breaker on repeated failures
- πŸ“‹ **Flexible plan sources**: GitHub Issues or local Markdown task lists
- πŸ’» **Cross-platform**: Windows, macOS, Linux
Expand Down Expand Up @@ -76,14 +77,15 @@ This approach prioritizes **safety** (automatic checkpoints, git isolation) and

## Key Features

- 🔄 **Autonomous Loop**: Repeatedly invokes AI agent until task completion
- 🔄 **Multi-Task Loop**: Processes ALL tasks in a plan file automatically with fresh AI agent per task
- 📋 **Flexible Plan Sources**: GitHub Issues or local Markdown task lists
- 🛡️ **Safety First**: Git branch isolation, file deletion safeguards
- 💾 **Automatic Checkpoints**: Git commits after each iteration for easy rollback
- 💾 **Automatic Checkpoints**: Git commits after each task completion for easy rollback
- 📊 **Progress Tracking**: Real-time status, token usage, and session logs
- ⚡ **Guardrails**: Iteration limits, token budgets, timeout controls
- ⚡ **Guardrails**: Iteration limits, token budgets, timeout controls, task-level retries
- 🔧 **Highly Configurable**: Customize behavior via CLI, env vars, or config files
- 💻 **Cross-Platform**: Works on Windows, macOS, and Linux
- 🤖 **Dynamic Model Discovery**: Fetches available models from Copilot SDK

## Commands

@@ -157,6 +159,9 @@ ghcralph run --github
# Control iterations, tokens, and model via configuration
# (set maxIterations / maxTokens / defaultModel in .ghcralph/config.json)

# Pause between tasks for human review (strict Ralph mode)
ghcralph run --file PLAN.md --pause-between-tasks

# Specify context files
ghcralph run --task "Fix tests" --context "src/**/*.test.ts"

@@ -184,19 +189,21 @@ GitHub Copilot Ralph uses a hierarchical configuration system:

### Configuration Options

| Option | Default | Description |
| --------------- | ----------- | ----------------------------------------------------- |
| `planSource` | `local` | Plan source: `github` or `local` |
| `maxIterations` | `10` | Maximum loop iterations |
| `maxTokens` | `100000` | Token budget |
| `defaultModel` | `gpt-4.1` | Copilot model to use |
| `autoCommit` | `true` | Auto-commit after iterations |
| `branchPrefix` | `ghcralph/` | Prefix for GitHub Copilot Ralph branches |
| `githubRepo` | - | GitHub repository (owner/repo) for GitHub plan source |
| `githubLabel` | - | Default GitHub issue label filter for GitHub plan |
| `githubMilestone` | - | Default GitHub issue milestone filter for GitHub plan |
| `githubAssignee` | - | Default GitHub issue assignee filter for GitHub plan |
| `localPlanFile` | - | Path to local plan file |
| Option | Default | Description |
| ------------------ | ----------- | ----------------------------------------------------- |
| `planSource` | `local` | Plan source: `github` or `local` |
| `maxIterations` | `10` | Maximum loop iterations per task |
| `maxTokens` | `100000` | Token budget per task |
| `defaultModel` | `gpt-4.1` | Copilot model to use (choices are fetched from the SDK during `init`) |
| `autoCommit` | `true` | Auto-commit after iterations |
| `branchPrefix` | `ghcralph/` | Prefix for GitHub Copilot Ralph branches |
| `maxRetriesPerTask` | `2` | Retries per task before marking as failed |
| `autoPush` | `false` | Auto-push to remote after each task completion |
| `githubRepo` | - | GitHub repository (owner/repo) for GitHub plan source |
| `githubLabel` | - | Default GitHub issue label filter for GitHub plan |
| `githubMilestone` | - | Default GitHub issue milestone filter for GitHub plan |
| `githubAssignee` | - | Default GitHub issue assignee filter for GitHub plan |
| `localPlanFile` | - | Path to local plan file |
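
As a rough illustration of how the two new options might be typed and overridden from the environment, consider the sketch below. The variable names `GHCRALPH_MAX_RETRIES_PER_TASK` and `GHCRALPH_AUTO_PUSH` come from the README; everything else (`RetryConfig`, `applyEnvOverrides`, the validation rules, and the assumption that environment variables take precedence over file values) is hypothetical and may not match `src/core/config-schema.ts`.

```typescript
// Hypothetical subset of the config; the real schema has more keys.
interface RetryConfig {
  maxRetriesPerTask: number;
  autoPush: boolean;
}

// Defaults as documented in the table above.
const DEFAULTS: RetryConfig = { maxRetriesPerTask: 2, autoPush: false };

// env is passed in explicitly (for example process.env) to keep the sketch testable.
function applyEnvOverrides(
  base: RetryConfig,
  env: Record<string, string | undefined>,
): RetryConfig {
  const result: RetryConfig = { ...base };

  const retries = env["GHCRALPH_MAX_RETRIES_PER_TASK"];
  // Accept only plain non-negative integers (assumed validation rule).
  if (retries !== undefined && /^\d+$/.test(retries)) {
    result.maxRetriesPerTask = Number(retries);
  }

  const push = env["GHCRALPH_AUTO_PUSH"];
  // Only the literal strings "true" and "false" are accepted here (assumption).
  if (push === "true") result.autoPush = true;
  else if (push === "false") result.autoPush = false;

  return result;
}
```

Invalid values are ignored rather than rejected in this sketch, so a typo in an environment variable silently leaves the base value in place; a real implementation might prefer to fail loudly.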

### Environment Variables

@@ -208,6 +215,8 @@ export GHCRALPH_MAX_TOKENS=50000
export GHCRALPH_DEFAULT_MODEL=gpt-4.1
export GHCRALPH_AUTO_COMMIT=true
export GHCRALPH_BRANCH_PREFIX=ghcralph/
export GHCRALPH_MAX_RETRIES_PER_TASK=3
export GHCRALPH_AUTO_PUSH=true
export GHCRALPH_PLAN_SOURCE=local
export GHCRALPH_GITHUB_REPO=owner/repo
export GHCRALPH_GITHUB_LABEL=ralph-ready
@@ -225,6 +234,8 @@ export GHCRALPH_GITHUB_ASSIGNEE=octocat
"defaultModel": "gpt-4.1",
"autoCommit": true,
"branchPrefix": "ghcralph/",
"maxRetriesPerTask": 2,
"autoPush": false,
"githubRepo": "owner/repo",
"githubLabel": "ralph-ready",
"githubMilestone": "v1.0",
57 changes: 40 additions & 17 deletions docs/architecture.md
@@ -457,6 +457,8 @@ graph LR
| **Context accumulation** | Model drifts with long context | Conversation history accumulates | ✅ FIXED |
| **Complex prompt template** | Meta-info confuses weaker models | Iteration/token counts in prompt | ✅ FIXED |
| **Model sensitivity** | Weaker models perform poorly | Prompt relies on implicit understanding | ✅ FIXED |
| **Single task per run** | Only first task processed, then exits | No outer loop for multi-task iteration | ✅ FIXED v0.1.2 |
| **Hardcoded model list** | Init shows outdated model options | Model list not fetched from SDK | ✅ FIXED v0.1.2 |

### Current vs Expected Flow

@@ -538,20 +540,38 @@ graph LR
The action executor component has been implemented in `src/core/action-executor.ts`:

**Supported Actions:**
| Action | Description | Example |
| ---------- | ------------------ | ----------------------------------------------- |
| `CREATE` | Create a new file | `[ACTION:CREATE] path: file.txt` |
| `EDIT` | Edit existing file | `[ACTION:EDIT] path: file.txt [OLD]...[NEW]...` |
| `DELETE` | Delete a file | `[ACTION:DELETE] path: file.txt` |
| `EXECUTE` | Run shell command | `[ACTION:EXECUTE] command: npm test` |
| `COMPLETE` | Mark task done | `[ACTION:COMPLETE] reason: Tests pass` |
| Action | Description | Example |
| ---------- | -------------------------- | ----------------------------------------------- |
| `CREATE` | Create a new file | `[ACTION:CREATE] path: file.txt` |
| `EDIT` | Edit existing file | `[ACTION:EDIT] path: file.txt [OLD]...[NEW]...` |
| `DELETE` | Delete a file | `[ACTION:DELETE] path: file.txt` |
| `EXECUTE` | Run shell command | `[ACTION:EXECUTE] command: npm test` |
| `COMPLETE` | Mark task done | `[ACTION:COMPLETE] reason: Tests pass` |
| `STUCK` | Signal a blocked task | `[ACTION:STUCK] attempted:... blocker:...` |

**Safety Features:**
- Path validation (prevents escaping working directory)
- File safeguard integration (protects baseline files from deletion)
- Command timeout (30 seconds default)
- Dry run mode for testing

### 2.1.1 STUCK Action ✅ NEW in v0.1.2

The STUCK action allows the AI agent to signal when it cannot complete a task:

```
[ACTION:STUCK]
attempted: What the agent tried to do
blocker: What is preventing completion
suggestion: Optional suggestion for next steps
```

**Behavior:**
- STUCK triggers a task retry with a fresh AI agent
- The progress file documents the failed attempt for context
- After `maxRetriesPerTask` (default: 2) STUCKs, the task is marked failed
- Prevents false completion claims - encourages honest failure reporting
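
A parser for the multi-line STUCK block shown above might look roughly like this. It is a sketch under stated assumptions, not the code in `src/core/response-parser.ts`: `StuckReport` and `parseStuck` are hypothetical names, matching is assumed to be case-insensitive, `attempted` and `blocker` are assumed to be required, and the single-line form from the action table is not handled.

```typescript
// Hypothetical result shape for illustration.
interface StuckReport {
  attempted: string;
  blocker: string;
  suggestion?: string;
}

// Returns null when the response has no STUCK block or lacks a required field.
function parseStuck(response: string): StuckReport | null {
  const lines = response.split(/\r?\n/); // tolerate Windows line endings
  const start = lines.findIndex(
    (line) => line.trim().toUpperCase() === "[ACTION:STUCK]",
  );
  if (start === -1) return null;

  const fields: Record<string, string> = {};
  for (const line of lines.slice(start + 1)) {
    if (line.trim().startsWith("[ACTION:")) break; // next action block begins
    const match = /^\s*(attempted|blocker|suggestion)\s*:\s*(.*)$/i.exec(line);
    if (match) {
      const key = (match[1] ?? "").toLowerCase();
      fields[key] = (match[2] ?? "").trim();
    }
  }

  const attempted = fields["attempted"];
  const blocker = fields["blocker"];
  if (!attempted || !blocker) return null;

  const report: StuckReport = { attempted, blocker };
  const suggestion = fields["suggestion"];
  if (suggestion) report.suggestion = suggestion;
  return report;
}
```

Returning `null` for an incomplete block lets the caller treat a malformed STUCK the same as no STUCK at all; whether the real parser is that strict is not stated in this document.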

### 2.2 Verification Hooks ✅ IMPLEMENTED

The verification hooks component has been implemented in `src/core/verification-hooks.ts`:
@@ -898,16 +918,19 @@ graph TB
The current architecture successfully:
- ✅ Authenticates with GitHub Copilot
- ✅ Manages iteration loops with limits and guards
- ✅ **Processes ALL tasks in plan files** (multi-task loop)
- ✅ Creates **fresh AI agent per task** (Ralph pattern core)
- ✅ Builds context-rich prompts
- ✅ Sends/receives from Copilot SDK
- ✅ Tracks progress and tokens

The current architecture lacks:
- ❌ Structured output format specification
- ❌ Response parsing for file operations
- ❌ Action execution (file create/edit/delete)
- ❌ Command execution for verification
- ❌ Feedback loop to inform AI of results
- ❌ Clear task completion detection

To work reliably with models like gpt-4.1, the CLI needs to move from a "chat wrapper" to a true "agent executor" that defines explicit action formats, parses responses, executes actions, and provides feedback.
- ✅ Parses structured ACTION responses
- ✅ Executes file and shell actions
- ✅ Supports graceful failure with STUCK action
- ✅ Dynamic model discovery from SDK

The CLI has evolved from a "chat wrapper" to a true "agent executor" that:
1. Defines explicit action formats (CREATE, EDIT, DELETE, EXECUTE, COMPLETE, STUCK)
2. Parses AI responses for structured actions
3. Executes actions on the filesystem
4. Provides feedback to inform subsequent iterations
5. Processes multiple tasks with task-level retries and checkpoints
48 changes: 48 additions & 0 deletions docs/cookbook.md
@@ -92,6 +92,32 @@ ghcralph run --task "Implement user authentication with JWT" \
- [ ] Add integration tests
```

### Multi-Task Processing

When you run `ghcralph run --file PLAN.md`, Ralph will:

1. **Process ALL tasks** in the plan file automatically
2. **Create a fresh AI agent** for each task (prevents context pollution)
3. **Retry failed tasks** up to `maxRetriesPerTask` times (default: 2)
4. **Commit after each task** with `createTaskCheckpoint()`
5. **Print a final summary** showing tasks processed/completed/failed

```bash
# Process all tasks in a plan file
ghcralph run --file TODO.md

# Pause between tasks for human review (strict Ralph mode)
ghcralph run --file TODO.md --pause-between-tasks
```

**Configuration:**
```json
{
"maxRetriesPerTask": 2,
"autoPush": false
}
```

---

## Pattern: Refactoring Session
@@ -246,6 +272,28 @@ ghcralph rollback --list
ghcralph rollback --iterations 1
```

### Task marked as STUCK

If a task is marked as STUCK (agent signaled it cannot complete):

```bash
# Check the progress file for details on what was attempted
cat .ghcralph/progress.md

# The agent will retry with fresh context up to maxRetriesPerTask times
# If all retries fail, review the blocker and consider:
# 1. Breaking the task into smaller pieces
# 2. Providing more context with --context
# 3. Resolving the blocker manually and re-running
```

**Configure retry behavior:**
```json
{
"maxRetriesPerTask": 3
}
```

### Token budget exhausted

```bash
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "ghcralph",
"version": "0.1.1",
"version": "0.1.2",
"description": "GitHub Copilot Ralph - A cross-platform CLI for running autonomous agentic coding loops using the Ralph Wiggum pattern with GitHub Copilot",
"main": "dist/index.js",
"types": "dist/index.d.ts",