From 7ffad0ff719dbb1898035044bfb6d877f08f2f94 Mon Sep 17 00:00:00 2001 From: Raphael Pothin Date: Wed, 28 Jan 2026 19:43:18 +0000 Subject: [PATCH 1/5] feat: add v0.1.4 remediation plan addressing commit quality, verbosity, and progress persistence issues --- plans/v0-1-4_REMEDIATION_PLAN.md | 1018 ++++++++++++++++++++++++++++++ 1 file changed, 1018 insertions(+) create mode 100644 plans/v0-1-4_REMEDIATION_PLAN.md diff --git a/plans/v0-1-4_REMEDIATION_PLAN.md b/plans/v0-1-4_REMEDIATION_PLAN.md new file mode 100644 index 0000000..b298443 --- /dev/null +++ b/plans/v0-1-4_REMEDIATION_PLAN.md @@ -0,0 +1,1018 @@ +# v0.1.4 Remediation Plan: Progress & Commit Issues + +**Created**: 2026-01-28 +**Version Target**: v0.1.4 +**Context**: Deep source code analysis of "ugly" and "bad" issues from v0.1.3 test run + +--- + +## Table of Contents + +1. [Executive Summary](#executive-summary) +2. [Issue 1: Commit Message Quality (Preamble Leak + Truncation)](#issue-1-commit-message-quality-preamble-leak--truncation) 🔴 UGLY +3. [Issue 2: Progress Verbosity Config Not Applied](#issue-2-progress-verbosity-config-not-applied) 🔴 UGLY +4. [Issue 3: Progress File Not Persisting Task History](#issue-3-progress-file-not-persisting-task-history) 🟡 BAD +5. [Issue 4: Auto-Push Not Executed Despite Config](#issue-4-auto-push-not-executed-despite-config) 🟡 BAD +6. [Implementation Priority](#implementation-priority) +7. [Success Criteria](#success-criteria) + +--- + +## Executive Summary + +Analysis of the v0.1.3 test run identified **4 issues** requiring source code remediation: + +### The Ugly (Critical - Visible Quality Issues) + +| # | Issue | Severity | Root Cause Location | Effort | +| --- | ----------------------------------------------- | -------- | -------------------------------------- | ------ | +| 1 | Commit messages: preamble leak + mid-word cuts | 🔴 UGLY | Heuristic extraction + char truncation | 1.5h | +| 2 | progressVerbosity config ignored | 🔴 UGLY | `rawResponse` never populated | 1.5h | + +### The Bad (Important - Functional Issues) + +| # | Issue | Severity | Root Cause Location | Effort | +| --- | ---------------------------------- | -------- | --------------------------------- | ------ | +| 3 | Progress file only shows last task | 🟡 BAD | `run.ts` uses wrong methods | 2h | +| 4 | Auto-push not executed | 🟡 BAD | Config default + missing CLI flag | 1h | + +**Total Estimated Effort**: ~6 hours + +### Key Design Decision: `[COMMIT_MESSAGE]` Block + +Issues 1 (preamble leak) and the former Issue 5 (truncation) are **consolidated** into a single solution using an explicit `[COMMIT_MESSAGE]` block in the agent prompt. This follows the existing `[ACTION:*]` pattern and provides: + +- **100% reliable extraction** - No heuristic pattern matching +- **Agent-crafted quality** - Agent knows commit message guidelines upfront +- **No truncation issues** - Agent respects 50-char limit from the start +- **Consistent architecture** - Same pattern as other ACTION blocks + +--- + +## Issue 1: Commit Message Quality (Preamble Leak + Truncation) + +### Problem Statement + +Two related commit message quality issues were observed: + +**Problem A - Agent Preamble Leak:** +``` +ghcralph: task 10/11 iter 1 - I'll check the current state of the c... +``` + +**Problem B - Mid-Word Truncation:** +``` +ghcralph: task 11/11 iter 1 - Task complete: Division already retur... +``` + +**Expected:** +``` +ghcralph: task 10/11 iter 1 - Verify division by zero handling +``` + +### Root Cause Analysis + +**Current Approach** (Heuristic-based): + +1. `extractSummary()` in `loop-engine.ts` takes the first line of AI response +2. AI often starts with conversational text ("I'll check...", "Let me...") +3. `checkpoint-manager.ts` truncates at 40 chars with `substring(0, 37) + '...'` +4. Truncation cuts mid-word without word-boundary awareness + +**Why Heuristics Fail**: +- Impossible to reliably filter all conversational patterns +- Must maintain regex patterns for each new preamble style +- Truncation happens after extraction, agent unaware of limits +- No feedback loop for agent to craft quality messages + +### Solution: Explicit `[COMMIT_MESSAGE]` Block + +Instead of heuristically extracting commit messages from agent responses, we introduce an explicit `[COMMIT_MESSAGE]` block that the agent must provide. + +**Comparison**: + +| Aspect | Heuristic Extraction | `[COMMIT_MESSAGE]` Block | +| --------------------- | ------------------------------ | ----------------------------- | +| **Reliability** | Regex patterns may miss cases | 100% deterministic extraction | +| **Quality** | Extracts whatever agent said | Agent crafts intentional msg | +| **Truncation** | We truncate after extraction | Agent respects limits upfront | +| **Consistency** | Varies by response format | Same pattern as `[ACTION:*]` | +| **Maintenance** | Must tune preamble patterns | Single regex, no tuning | +| **Solves Both Issues**| Needs separate truncation fix | ✅ One solution for both | + +### Implementation Details + +#### Step 1: Add `[COMMIT_MESSAGE]` to Prompt Examples + +**File**: [src/core/prompt-examples.ts](src/core/prompt-examples.ts) + +Add new example constant: + +```typescript +/** + * Example of COMMIT_MESSAGE block for git commit summaries + */ +export const COMMIT_MESSAGE_EXAMPLE = `[COMMIT_MESSAGE] +Add division operation with error handling +[/COMMIT_MESSAGE]`; +``` + +Update `FORMAT_INSTRUCTIONS`: + +```typescript +**[COMMIT_MESSAGE]** - Provide commit message for this iteration +\`\`\` +[COMMIT_MESSAGE] + +[/COMMIT_MESSAGE] +\`\`\` +``` + +Add to `ALL_EXAMPLES`: + +```typescript +### Example: Commit message for the iteration +${COMMIT_MESSAGE_EXAMPLE} +``` + +#### Step 2: Add Commit Message Guidelines to Context + +**File**: [src/core/context-builder.ts](src/core/context-builder.ts) + +Add guidelines section: + +```typescript +const COMMIT_MESSAGE_GUIDELINES = ` +## Commit Message Guidelines + +Each response SHOULD include a commit message block: + +[COMMIT_MESSAGE] + +[/COMMIT_MESSAGE] + +**Rules:** +- Maximum 50 characters +- Use imperative mood ("Add", "Fix", "Update" not "Added", "Fixed") +- Be specific about what changed +- No periods at the end +- If task is complete, describe the accomplishment +- If in progress, describe the action taken + +**Good examples:** +- "Add division operation with zero check" +- "Fix off-by-one error in loop counter" +- "Update calculator to handle decimals" + +**Bad examples:** +- "I'll check the calculator" (conversational) +- "Made some changes" (vague) +- "Fixed the thing that was broken." (period, vague) +`; +``` + +#### Step 3: Update extractSummary() with Block Extraction + +**File**: [src/core/loop-engine.ts](src/core/loop-engine.ts) +**Location**: Replace `extractSummary` method (lines 554-558) + +```typescript +/** + * Extract commit message summary from the response content. + * Priority: [COMMIT_MESSAGE] block > [ACTION:COMPLETE] reason > First action type > Fallback + */ +private extractSummary(content: string): string { + // Priority 1: Explicit [COMMIT_MESSAGE] block (preferred) + const commitMatch = content.match( + /\[COMMIT_MESSAGE\]\s*([\s\S]*?)\[\/COMMIT_MESSAGE\]/i + ); + if (commitMatch?.[1]) { + const message = commitMatch[1].trim().split('\n')[0]?.trim(); + if (message && message.length > 0) { + // Already within limits, agent followed guidelines + return message.length > 50 ? message.substring(0, 47) + '...' : message; + } + } + + // Priority 2: Extract from COMPLETE action reason (fallback for completion iterations) + const completeMatch = content.match( + /\[ACTION:COMPLETE\]\s*(?:reason:\s*)?([\s\S]*?)(?:\[\/ACTION|\[ACTION:|$)/i + ); + if (completeMatch?.[1]) { + const reason = completeMatch[1].trim().split('\n')[0]?.trim(); + if (reason && reason.length > 0) { + return this.truncateAtWord(`Complete: ${reason}`, 50); + } + } + + // Priority 3: Use first action type as summary + const actionMatch = content.match(/\[ACTION:(\w+)\]/i); + if (actionMatch?.[1]) { + const actionType = actionMatch[1].toUpperCase(); + // Try to get more context from the action + if (actionType === 'CREATE') { + const pathMatch = content.match(/\[ACTION:CREATE\]\s*(?:path:\s*)?([^\n]+)/i); + if (pathMatch?.[1]) { + return this.truncateAtWord(`Create ${pathMatch[1].trim()}`, 50); + } + } else if (actionType === 'EXECUTE') { + const cmdMatch = content.match(/\[ACTION:EXECUTE\]\s*(?:command:\s*)?([^\n]+)/i); + if (cmdMatch?.[1]) { + return this.truncateAtWord(`Run: ${cmdMatch[1].trim()}`, 50); + } + } + return `[${actionType}]`; + } + + // Priority 4: Fallback - truncate first non-preamble line + const preamblePatterns = /^(I'll|I will|Let me|I need to|I'm going|First,?|Now,?|OK|Okay|Sure|Here)/i; + for (const line of content.split('\n')) { + const trimmed = line.trim(); + if (trimmed.length > 0 && !preamblePatterns.test(trimmed)) { + return this.truncateAtWord(trimmed, 50); + } + } + + return 'Iteration update'; +} + +/** + * Truncate at word boundary to avoid mid-word cuts + */ +private truncateAtWord(str: string, maxLen: number): string { + if (str.length <= maxLen) return str; + + const ellipsis = '...'; + const targetLen = maxLen - ellipsis.length; + const truncated = str.substring(0, targetLen); + const lastSpace = truncated.lastIndexOf(' '); + + // Use word boundary if in the latter half + if (lastSpace > targetLen * 0.5) { + return truncated.substring(0, lastSpace) + ellipsis; + } + return truncated + ellipsis; +} +``` + +#### Step 4: Remove Truncation in Checkpoint Manager + +**File**: [src/core/checkpoint-manager.ts](src/core/checkpoint-manager.ts) + +Since `extractSummary()` now returns properly-sized messages, we can simplify: + +```typescript +// Line 205-208: Before +const truncatedSummary = summary.length > 40 + ? summary.substring(0, 37) + '...' + : summary; + +// Line 205-208: After +// Summary already truncated by extractSummary(), just use directly +const commitSummary = summary; +``` + +Or keep a safety limit but use word-boundary truncation: + +```typescript +const commitSummary = summary.length > 50 + ? this.truncateAtWord(summary, 50) + : summary; +``` + +### Fallback Behavior + +The solution maintains **backwards compatibility** with a graceful fallback chain: + +1. ✅ `[COMMIT_MESSAGE]` block present → use it (best) +2. ⚠️ No block, but `[ACTION:COMPLETE]` → extract reason +3. ⚠️ No COMPLETE, but other actions → use action type + context +4. ❌ No actions found → filter preamble + truncate at word + +### Testing Strategy + +```bash +# Test 1: Agent includes [COMMIT_MESSAGE] +# Expected: Exact message extracted, no truncation needed + +# Test 2: Agent omits block but has [ACTION:COMPLETE] +# Expected: "Complete: " with word-boundary truncation + +# Test 3: Agent omits block, has [ACTION:EXECUTE] only +# Expected: "Run: " summary + +# Test 4: Agent has no blocks (legacy response) +# Expected: Fallback to filtered first line, word-boundary truncated +``` + +### Migration Notes + +- **No breaking changes** - fallback chain ensures old responses still work +- **System prompt update needed** - add `[COMMIT_MESSAGE]` to prompt templates +- **Model compatibility** - `prompt-examples.ts` already supports model strength tiers + +--- + +## Issue 2: Progress Verbosity Config Not Applied + +### Problem Statement + +Config has `"progressVerbosity": "standard"` but the progress file only shows minimal info: + +```markdown +#### Iteration 1 (4:43:24 PM) ✓ +- **Tokens**: 1,534 +- **Summary**: Task complete: Division already returns integer results... +- **Duration**: 23s +``` + +Missing for "standard" verbosity: +- Actions executed list +- Command outputs + +Missing for "full" verbosity: +- Agent reasoning/response + +### Source Code Analysis + +**File**: [src/core/progress-tracker.ts](src/core/progress-tracker.ts) + +The `formatIteration` method checks for `rawResponse` and `actions`: + +```typescript +// Full verbosity: include raw response if available +if (this.verbosity === 'full' && iter.rawResponse) { // Line 398 + md += `\n**Agent Response**:\n`; + // ... +} + +// Full verbosity: include actions executed if available +if (this.verbosity === 'full' && iter.actions && iter.actions.length > 0) { // Line 404 + md += `\n**Actions Executed**:\n`; + // ... +} +``` + +**File**: [src/core/loop-engine.ts](src/core/loop-engine.ts) + +The problem: `completeIteration()` is called but **never populates** `rawResponse` or `actions`: + +```typescript +// Line 518 +const completedRecord = completeIteration( + record, + true, + result.tokenUsage.totalTokens, + summary // Only summary is passed +); +``` + +**Root Cause**: The `rawResponse` and `actions` fields are defined in the `IterationRecord` interface but never populated anywhere in the codebase. + +### Options + +#### Option A: Populate Fields in Loop Engine (Recommended) + +Modify `LoopEngine.runIteration()` to capture and store the response and actions. + +| Pros | Cons | +| ---------------------------- | ----------------------------- | +| Enables full verbosity | Stores more data in memory | +| Clean implementation | May be verbose for some users | +| Uses existing infrastructure | - | + +#### Option B: Add Verbosity Awareness to Loop Engine + +Pass verbosity config to loop engine, only capture when needed. + +| Pros | Cons | +| ----------------------- | ------------------------------------- | +| Only stores when needed | Adds config dependency | +| Memory efficient | More complex wiring | +| - | Verbosity becomes loop-engine concern | + +#### Option C: Fix "Standard" Verbosity Definition + +Currently both "minimal" and "standard" show the same output. Define "standard" to include actions. + +| Pros | Cons | +| ------------------------- | ----------------------------- | +| Clearer verbosity levels | Still needs data population | +| Matches user expectations | Must implement Option A first | + +### Recommendation: Option A + Option C Combined + +1. **Populate the data** in loop engine (Option A) +2. **Differentiate verbosity levels** in formatter (Option C) + +### Implementation Details + +#### Step 1: Capture Raw Response and Actions in Loop Engine + +**File**: [src/core/loop-engine.ts](src/core/loop-engine.ts) +**Location**: After action execution in `runIteration()` (around line 520) + +```typescript +// In runIteration(), after executing actions +// Capture action results for verbosity logging +const actionSummaries: Array<{type: string; success: boolean; summary?: string}> = []; + +for (const execResult of executionResults) { + actionSummaries.push({ + type: execResult.action.type, + success: execResult.success, + summary: execResult.message, + }); +} + +// Complete the iteration record +const completedRecord = completeIteration( + record, + true, + result.tokenUsage.totalTokens, + summary +); + +// Attach verbosity data +completedRecord.rawResponse = responseContent; +completedRecord.actions = actionSummaries; +``` + +#### Step 2: Update formatIteration for Standard Verbosity + +**File**: [src/core/progress-tracker.ts](src/core/progress-tracker.ts) +**Location**: `formatIteration` method (lines 371-414) + +```typescript +private formatIteration(iter: IterationRecord): string { + const time = iter.startedAt.toLocaleTimeString(); + const status = iter.success ? '✓' : '✗'; + + // Minimal verbosity: just iteration header + if (this.verbosity === 'minimal') { + return `#### Iteration ${iter.iteration} (${time}) ${status}\n\n`; + } + + let md = `#### Iteration ${iter.iteration} (${time}) ${status}\n\n`; + md += `- **Tokens**: ${iter.tokensUsed.toLocaleString()}\n`; + + if (iter.summary) { + md += `- **Summary**: ${iter.summary}\n`; + } + + if (iter.error) { + md += `- **Error**: ${iter.error}\n`; + } + + if (iter.endedAt) { + const duration = iter.endedAt.getTime() - iter.startedAt.getTime(); + md += `- **Duration**: ${Math.floor(duration / 1000)}s\n`; + } + + // NEW: Standard+ verbosity: include actions executed + if ((this.verbosity === 'standard' || this.verbosity === 'full') + && iter.actions && iter.actions.length > 0) { + md += `\n**Actions**:\n`; + for (const action of iter.actions) { + const actionStatus = action.success ? '✓' : '✗'; + md += `- ${actionStatus} \`[${action.type}]\``; + if (action.summary) { + // Truncate long summaries + const truncated = action.summary.length > 60 + ? action.summary.substring(0, 57) + '...' + : action.summary; + md += ` ${truncated}`; + } + md += '\n'; + } + } + + // Full verbosity only: include raw response + if (this.verbosity === 'full' && iter.rawResponse) { + md += `\n**Agent Response**:\n`; + md += `\`\`\`\n${iter.rawResponse}\n\`\`\`\n`; + } + + md += `\n`; + return md; +} +``` + +--- + +## Issue 3: Progress File Not Persisting Task History + +### Problem Statement + +After completing 11 tasks, the progress file only contains 36 lines showing the **last task only**. All previous 10 tasks' iteration logs are lost. + +### Source Code Analysis + +#### Current Architecture + +The `ProgressTracker` class has **two separate mechanisms** for progress: + +**Mechanism A: Session-Based (Full History)** - NOT USED +``` +src/core/progress-tracker.ts +├── startSession(branch, totalTasks) # Lines 123-131 +├── setCurrentTask(taskNumber, state) # Lines 152-166 +├── recordTaskCompletion(state, status, ...) # Lines 171-205 +├── saveFullSession() # Lines 210-218 +└── generateFullSessionMarkdown() # Lines 223-304 +``` + +**Mechanism B: Single-Task (Overwrites)** - USED BY RUN.TS +``` +src/core/progress-tracker.ts +├── save(state) # Lines 418-428 +├── generateMarkdown(state) # Lines 316-347 +└── appendTaskResult(task, status, ...) # Lines 552-602 +``` + +#### The Problem + +In [run.ts](src/commands/run.ts): + +```typescript +// Line 521: Called after each iteration - OVERWRITES the file each time +progressTracker.save(state).catch(() => {}); + +// Lines 563, 602, 617, 638: Called after each task - APPENDS to file +await progressTracker.appendTaskResult(activeTask, 'completed', taskAttempt, ...); +``` + +**What happens**: +1. Task 1, Iteration 1: `save()` → writes "Task 1, iter 1" to file ✓ +2. Task 1, Iteration 2: `save()` → **overwrites** with "Task 1, iter 2" ✓ +3. Task 1 complete: `appendTaskResult()` → appends "Task 1 complete" ✓ +4. Task 2, Iteration 1: `save()` → **overwrites entire file** with "Task 2, iter 1" ❌ +5. ... +6. Task 11, Iteration 1: `save()` → file only has Task 11 ❌ + +**The `save()` method uses `generateMarkdown()` which creates a fresh file every time**, wiping out all previous content including the appended task results. + +#### Why Session-Based Methods Aren't Used + +The `run.ts` command never calls: +- `startSession()` - to initialize session tracking +- `setCurrentTask()` - to register current task +- `recordTaskCompletion()` - to add task to session history + +These methods exist in `ProgressTracker` but are **never invoked**. + +### Options + +#### Option A: Use Session-Based Architecture (Recommended) + +Wire up the existing session-based methods in `run.ts`. + +| Pros | Cons | +| ---------------------------------- | --------------------------- | +| Architecture already exists | Need to update run.ts logic | +| Full iteration history preserved | Minor refactoring required | +| Cleaner code (single mechanism) | Testing effort | +| In-memory accumulation is reliable | - | + +#### Option B: Fix save() to Append Instead of Overwrite + +Modify `save()` to read existing content and append. + +| Pros | Cons | +| --------------- | ------------------------------------ | +| Minimal changes | File grows with repeated iterations | +| Quick fix | Duplicate sections per task | +| - | Need to parse to update current task | +| - | Messy file structure | + +#### Option C: Write Progress Once at Task End Only + +Remove `save()` call from iterationEnd handler, only write at task completion. + +| Pros | Cons | +| -------------------------- | ------------------------------------ | +| Simpler logic | No progress visible during iteration | +| File written once per task | Loss of iteration detail if crash | +| Cleaner file structure | - | + +### Recommendation: Option A (Use Session Architecture) + +**Rationale**: +1. Architecture already fully implemented in `ProgressTracker` +2. Session-based approach was clearly the intended design +3. `generateFullSessionMarkdown()` produces clean, structured output +4. In-memory accumulation is more reliable than file-based append +5. Minimal net code change (mostly wiring, not new logic) + +### Implementation Details + +#### Step 1: Update run.ts to Initialize Session + +**File**: [src/commands/run.ts](src/commands/run.ts) +**Location**: After `progressTracker` creation (around line 406) + +```typescript +// Create progress tracker with verbosity setting +const progressVerbosity = config.progressVerbosity ?? 'standard'; +const progressTracker = new ProgressTracker(undefined, maxIterations, progressVerbosity); + +// NEW: Initialize session for multi-task tracking +progressTracker.startSession(branchInfo?.branchName, totalTasksInPlan); +``` + +#### Step 2: Replace iterationEnd save() with setCurrentTask() + +**File**: [src/commands/run.ts](src/commands/run.ts) +**Location**: iterationEnd event handler (around line 521) + +```typescript +// BEFORE: +progressTracker.save(state).catch(() => { + // Ignore save errors +}); + +// AFTER: +progressTracker.setCurrentTask(totalTasksProcessed, state); +// Note: This updates in-memory state, file written at task completion +``` + +#### Step 3: Replace appendTaskResult() with recordTaskCompletion() + +**File**: [src/commands/run.ts](src/commands/run.ts) +**Location**: Task completion handling (lines 563, 602, 617, 638) + +```typescript +// BEFORE: +await progressTracker.appendTaskResult( + activeTask, + 'completed', + taskAttempt, + `Completed in ${finalState.iteration} iterations` +); + +// AFTER: +await progressTracker.recordTaskCompletion( + finalState, + 'completed', + taskAttempt, + `Completed in ${finalState.iteration} iterations` +); +``` + +#### Step 4: Deprecate or Remove Old Methods + +Mark `save()` and `appendTaskResult()` as deprecated: + +```typescript +/** + * @deprecated Use recordTaskCompletion() instead for multi-task runs + */ +async save(state: FullLoopState): Promise { ... } +``` + +--- + +## Issue 4: Auto-Push Not Executed Despite Config + +### Problem Statement + +After completing all 11 tasks with 15 commits, the `ghcralph/task-15-20260128` branch only exists locally. It was never pushed to the remote repository. + +### Evidence + +```bash +$ git ls-remote --heads origin +faac297... refs/heads/main # Only main exists on remote +``` + +Config shows: +```json +{ + "autoPush": false, + "pushStrategy": "per-task" +} +``` + +### Source Code Analysis + +**File**: [src/commands/run.ts](src/commands/run.ts) + +The auto-push functionality **is implemented** and works correctly: + +```typescript +// Line 425: Config read +const autoPush = config.autoPush ?? false; +const pushStrategy = config.pushStrategy ?? 'per-task'; + +// Line 584-593: Per-task push +if (autoPush && pushStrategy === 'per-task' && isGitRepo) { + info('Pushing changes to remote...'); + const pushed = await gitManager.pushToRemote(); + // ... +} + +// Line 701-709: Per-run push +if (autoPush && pushStrategy === 'per-run' && isGitRepo && totalTasksCompleted > 0) { + info('Pushing all changes to remote...'); + const pushed = await gitManager.pushToRemote(); + // ... +} +``` + +**File**: [src/core/git-branch-manager.ts](src/core/git-branch-manager.ts) + +The push method exists and works: + +```typescript +// Lines 386-397 +async pushToRemote(remote: string = 'origin', force: boolean = false): Promise { + // ... + await execAsync(`git push ${forceFlag} ${remote} ${currentBranch.name}`.trim(), { cwd: this.config.cwd }); + // ... +} +``` + +**Root Cause**: The default value `autoPush: false` means push is disabled unless user explicitly enables it. This is **correct behavior** but creates a UX gap: + +1. User might not know about `autoPush` config option +2. No CLI flag to enable push for a single run +3. Default behavior doesn't match user expectations + +### Options + +#### Option A: Add --push CLI Flag (Recommended) + +Add a `--push` flag to `ghcralph run` to enable pushing for a single run. + +```bash +ghcralph run --file ./PLAN.md --push +``` + +| Pros | Cons | +| -------------------------------- | --------------------- | +| Explicit user intent | Another flag to learn | +| Doesn't change default | - | +| One-time override without config | - | + +#### Option B: Change Default to autoPush: true + +Make auto-push enabled by default. + +| Pros | Cons | +| ------------------------- | -------------------------------- | +| Matches expectations | Breaking change for some users | +| No explicit action needed | May push to unwanted remotes | +| - | Auth issues could cause failures | + +#### Option C: Prompt User at End of Run + +Ask user if they want to push when autoPush is false. + +| Pros | Cons | +| --------------------- | -------------------------- | +| Interactive UX | Not automation-friendly | +| User stays in control | Adds delay to completion | +| - | May be annoying in scripts | + +#### Option D: Better Documentation + Init Default + +Keep default as false, but: +- `ghcralph init` prompts for autoPush preference +- README prominently documents the option +- Run output shows "Push disabled, use --push or set autoPush: true" + +| Pros | Cons | +| ------------------- | --------------------------- | +| No breaking changes | Relies on user reading docs | +| Clear messaging | Extra init complexity | +| - | - | + +### Recommendation: Option A + Option D Combined + +1. **Add `--push` CLI flag** for immediate usability +2. **Add informational message** at run end when push is disabled +3. **Document prominently** in README + +### Implementation Details + +#### Step 1: Add --push Flag to Run Command + +**File**: [src/commands/run.ts](src/commands/run.ts) +**Location**: Command options definition + +```typescript +.option('--push', 'Push changes to remote after completion') +``` + +#### Step 2: Override autoPush When Flag Present + +**File**: [src/commands/run.ts](src/commands/run.ts) +**Location**: After config loading + +```typescript +const autoPush = options.push || (config.autoPush ?? false); +``` + +#### Step 3: Add Informational Message + +**File**: [src/commands/run.ts](src/commands/run.ts) +**Location**: After final summary, when push didn't happen + +```typescript +if (!autoPush && isGitRepo && totalTasksCompleted > 0) { + console.log(''); + info(`💡 Changes not pushed. Use --push flag or set "autoPush": true in config.`); +} +``` + +--- + +## Implementation Priority + +### Phase 1: The Ugly - Critical Quality Fixes (3 hours) + +| Task | Issue | File | Effort | +| ---- | ----------------------------------- | --------------------------------------------------- | ------ | +| 1.1 | Add COMMIT_MESSAGE to prompt-examples | [prompt-examples.ts](src/core/prompt-examples.ts) | 0.5h | +| 1.2 | Add commit guidelines to context | [context-builder.ts](src/core/context-builder.ts) | 0.25h | +| 1.3 | Implement new extractSummary() with block extraction | [loop-engine.ts](src/core/loop-engine.ts) | 0.5h | +| 1.4 | Add truncateAtWord() helper | [loop-engine.ts](src/core/loop-engine.ts) | 0.25h | +| 1.5 | Capture rawResponse and actions | [loop-engine.ts](src/core/loop-engine.ts) | 0.5h | +| 1.6 | Update formatIteration for standard | [progress-tracker.ts](src/core/progress-tracker.ts) | 0.5h | +| 1.7 | Test commit messages and verbosity | - | 0.5h | + +### Phase 2: The Bad - Functional Fixes (3 hours) + +| Task | Issue | File | Effort | +| ---- | --------------------------------------- | ----------------------------------------- | ------ | +| 2.1 | Wire up session-based progress | [run.ts](src/commands/run.ts) | 1h | +| 2.2 | Replace save()/appendTaskResult() calls | [run.ts](src/commands/run.ts) | 0.5h | +| 2.3 | Add --push CLI flag | [run.ts](src/commands/run.ts) | 0.5h | +| 2.4 | Add push info message | [run.ts](src/commands/run.ts) | 0.25h | +| 2.5 | Test all Phase 2 changes | - | 0.75h | + +--- + +## Success Criteria + +### Issue 1: Commit Message Quality + +- [ ] Agent responses include `[COMMIT_MESSAGE]` blocks +- [ ] Commit messages are max 50 chars, imperative mood +- [ ] No commit messages start with "I'll", "Let me", "I need to", etc. +- [ ] No mid-word truncation in any commit messages +- [ ] Fallback chain works when block is absent + +### Issue 2: Verbosity + +- [ ] `minimal` shows only iteration header and status +- [ ] `standard` shows header + tokens + summary + duration + **actions list** +- [ ] `full` shows everything in standard + **raw agent response** +- [ ] Changing config value produces different output + +### Issue 3: Progress Persistence + +- [ ] Running 11-task plan produces progress.md with **all 11 tasks** documented +- [ ] Each task section includes iteration count, tokens, duration, summary +- [ ] File header shows run session info (branch, start time, total tasks) +- [ ] File is updated incrementally after each task completion + +### Issue 4: Auto-Push + +- [ ] `--push` flag pushes branch to remote +- [ ] Info message shown when push not enabled +- [ ] Per-task and per-run strategies both work with --push + +--- + +## Test Plan + +### Manual Testing + +```bash +# Setup +cd /workspaces/ghc-ralph-cli-demo +git checkout main +git branch -D ghcralph/task-15-20260128 # Delete old test branch +rm -rf .ghcralph/progress.md + +# Test all issues +ghcralph run --file ./PLAN.md --verbose --push + +# Verify Issue 1: Commit messages are clean +git log --oneline | head -20 +# Should see: +# - No "I'll", "Let me", etc. +# - Clean messages like "Add division operation" or "Complete: tests pass" +# - No mid-word cuts + +# Verify Issue 2: Verbosity (change config to "full" and re-run subset) + +# Verify Issue 3: Progress has all tasks +wc -l .ghcralph/progress.md # Should be > 200 lines +grep -c "## Task" .ghcralph/progress.md # Should be 11 + +# Verify Issue 4: Push worked +git ls-remote --heads origin | grep ghcralph +``` + +### Unit Tests + +Add/update tests: +- `src/core/loop-engine.test.ts` - extractSummary with [COMMIT_MESSAGE] blocks +- `src/core/prompt-examples.test.ts` - COMMIT_MESSAGE_EXAMPLE constant +- `src/core/progress-tracker.test.ts` - session-based methods + +--- + +## Appendix: Files Modified + +| File | Changes | +| ---------------------------------------------------------------- | ----------------------------------------------- | +| [src/core/prompt-examples.ts](src/core/prompt-examples.ts) | Add COMMIT_MESSAGE_EXAMPLE, update instructions | +| [src/core/context-builder.ts](src/core/context-builder.ts) | Add commit message guidelines | +| [src/core/loop-engine.ts](src/core/loop-engine.ts) | extractSummary rewrite, truncateAtWord, capture | +| [src/core/progress-tracker.ts](src/core/progress-tracker.ts) | formatIteration standard verbosity | +| [src/commands/run.ts](src/commands/run.ts) | Session methods, --push flag, info message | + +--- + +## Appendix: Expected Commit Messages + +After fixes, commit messages should look like: + +``` +# With [COMMIT_MESSAGE] block (best): +abc1234 ghcralph: task 1/11 iter 1 - Add calculator.sh with basic ops +def5678 ghcralph: task 1/11 iter 2 - Fix arithmetic syntax error +ghi9012 ghcralph: task 2/11 iter 1 - Add subtraction operation + +# Fallback to COMPLETE reason: +jkl3456 ghcralph: task 3/11 iter 2 - Complete: All operations working + +# Fallback to action type: +mno7890 ghcralph: task 4/11 iter 1 - Create src/helper.sh +pqr1234 ghcralph: task 5/11 iter 1 - Run: npm test +``` + +--- + +## Appendix: Expected Progress File Structure + +After fixes, progress.md should look like: + +```markdown +# Ralph Progress Log + +## Run Session + +- **Started**: 2026-01-28T16:39:48.839Z +- **Branch**: ghcralph/task-15-20260128 +- **Total Tasks**: 11 +- **Completed**: 11 +- **Elapsed**: 2m 55s + +--- + +## Task 1: Create calculator.sh with basic structure + +- **ID**: task-15 +- **Status**: ✅ completed +- **Attempt**: 1 +- **Iterations**: 2 +- **Tokens Used**: 2,864 +- **Started**: 2026-01-28T16:40:00.000Z +- **Completed**: 2026-01-28T16:40:45.000Z +- **Duration**: 45s +- **Summary**: Completed in 2 iterations + +### Iteration Log + +#### Iteration 1 (4:40:05 PM) ✓ + +- **Tokens**: 1,325 +- **Summary**: [ACTION:CREATE] +- **Duration**: 20s + +**Actions**: +- ✓ `[CREATE]` calculator.sh +- ✓ `[EXECUTE]` chmod +x calculator.sh && ./calculator.sh 5 + 3 + +#### Iteration 2 (4:40:25 PM) ✓ + +- **Tokens**: 1,539 +- **Summary**: Task complete: calculator.sh created with basic structure +- **Duration**: 20s + +**Actions**: +- ✓ `[EXECUTE]` ls -la *.test.* +- ✓ `[COMPLETE]` calculator.sh created with basic structure + +--- + +## Task 2: Implement addition operation (+) + +- **ID**: task-16 +- **Status**: ✅ completed +... +``` From c917658a463ae8137915b4083d191ab7c37d382b Mon Sep 17 00:00:00 2001 From: Raphael Pothin Date: Wed, 28 Jan 2026 19:56:37 +0000 Subject: [PATCH 2/5] feat: implement v0.1.4 remediation plan addressing commit quality, verbosity, progress persistence, and push flag Issue 1 - Commit Message Quality: - Add [COMMIT_MESSAGE] block to prompt examples and format instructions - Add commit message guidelines to context builder - Rewrite extractSummary() with priority-based extraction - Add truncateAtWord() helper for word-boundary truncation Issue 2 - Progress Verbosity: - Capture rawResponse and actions in loop-engine iteration records - Update formatIteration() to show actions for standard/full verbosity Issue 3 - Progress File Persistence: - Wire up session-based progress tracking (startSession, setCurrentTask) - Replace save()/appendTaskResult() with recordTaskCompletion() Issue 4 - Auto-Push Flag: - Add --push CLI flag to run command - Show info message when push is disabled but changes were made Version bumped to 0.1.4 --- JOURNAL.md | 92 ++++++++++++++++++++++++++++++++++ package-lock.json | 4 +- package.json | 2 +- src/commands/run.ts | 51 ++++++++++++------- src/core/checkpoint-manager.ts | 27 +++++++--- src/core/context-builder.ts | 39 +++++++++++++- src/core/loop-engine.ts | 86 +++++++++++++++++++++++++++++-- src/core/progress-tracker.ts | 26 ++++++---- src/core/prompt-examples.ts | 18 +++++++ 9 files changed, 302 insertions(+), 43 deletions(-) diff --git a/JOURNAL.md b/JOURNAL.md index 3a15dd3..dcde519 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -1540,3 +1540,95 @@ Reviewed and updated all documentation to align with v0.1.3 changes. ### Validation - `npm run typecheck` ✅ - `npm test` ✅ (311 tests passing) + +## 2026-01-28 - v0.1.4 Remediation Implementation + +### Context +After testing v0.1.3 in the ghc-ralph-cli-demo repository, we identified 4 issues requiring remediation: +1. **Commit Message Quality**: Preamble leaking ("I'll check...") and mid-word truncation +2. **Progress Verbosity**: Config not applied; standard verbosity showed minimal info +3. **Progress File Persistence**: Only last task visible; previous tasks lost on each save +4. **Auto-Push**: No CLI flag to enable push for a single run + +### Issue 1: Commit Message Quality (RESOLVED) + +**Problem**: Commit messages contained conversational preambles and were truncated mid-word. + +**Solution**: Introduced explicit `[COMMIT_MESSAGE]` block in agent prompts. + +**Changes Made**: +- `src/core/prompt-examples.ts`: + - Added `COMMIT_MESSAGE_EXAMPLE` constant + - Updated `FORMAT_INSTRUCTIONS` with `[COMMIT_MESSAGE]` block documentation + - Added commit message example to `ALL_EXAMPLES` +- `src/core/context-builder.ts`: + - Added `COMMIT_MESSAGE_GUIDELINES` constant with formatting rules + - Updated prompt template to include `{commit_message_guidelines}` placeholder + - Added replacement logic in `buildContext()` method +- `src/core/loop-engine.ts`: + - Completely rewrote `extractSummary()` with priority-based extraction: + 1. `[COMMIT_MESSAGE]` block (preferred) + 2. `[ACTION:COMPLETE]` reason (fallback) + 3. First action type with context (fallback) + 4. First non-preamble line (last resort) + - Added `truncateAtWord()` helper for word-boundary truncation +- `src/core/checkpoint-manager.ts`: + - Added `truncateAtWord()` helper function + - Replaced character-based truncation with word-boundary truncation + +### Issue 2: Progress Verbosity (RESOLVED) + +**Problem**: `progressVerbosity` config was ignored; `rawResponse` and `actions` never populated. + +**Solution**: Capture data in loop engine and differentiate verbosity levels in formatter. + +**Changes Made**: +- `src/core/loop-engine.ts`: + - After `completeIteration()`, attach `rawResponse` and `actions` to the record + - Actions derived from `executionResult.results` with type, success, summary +- `src/core/progress-tracker.ts`: + - Updated `formatIteration()` to show actions for `standard` and `full` verbosity + - Moved raw response output to `full` verbosity only + - Changed actions format to use bullet points with backtick-wrapped types + +### Issue 3: Progress File Persistence (RESOLVED) + +**Problem**: `save()` overwrote entire file each iteration, losing previous task history. + +**Solution**: Wire up the existing session-based architecture that was implemented but unused. + +**Changes Made**: +- `src/commands/run.ts`: + - Added import for `createInitialState` and `FullLoopState` types + - Called `progressTracker.startSession()` after tracker creation + - Replaced `progressTracker.save()` with `progressTracker.setCurrentTask()` in iterationEnd + - Replaced all `appendTaskResult()` calls with `recordTaskCompletion()` + - Created minimal error state for exception catch block + +### Issue 4: Auto-Push CLI Flag (RESOLVED) + +**Problem**: No way to enable push for a single run without config change. + +**Solution**: Added `--push` flag that overrides config setting. + +**Changes Made**: +- `src/commands/run.ts`: + - Added `push?: boolean` to `RunOptions` interface + - Added `.option('--push', 'Push changes to remote after completion')` + - Updated autoPush logic: `options.push === true || (config.autoPush ?? false)` + - Added informational message when push is disabled but changes were made + - Added example in help text: `$ ghcralph run --file PLAN.md --push` + +### Files Modified +- `src/core/prompt-examples.ts` - COMMIT_MESSAGE block and examples +- `src/core/context-builder.ts` - Commit message guidelines +- `src/core/loop-engine.ts` - extractSummary rewrite, rawResponse/actions capture +- `src/core/checkpoint-manager.ts` - Word-boundary truncation +- `src/core/progress-tracker.ts` - Standard verbosity actions display +- `src/commands/run.ts` - Session-based tracking, --push flag + +### Validation +- `npm run build` ✅ +- `npm run lint` ✅ +- `npm test` ✅ (311 tests passing) +- Version bumped to 0.1.4 diff --git a/package-lock.json b/package-lock.json index 72466b9..dc28d35 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "ghcralph", - "version": "0.1.3", + "version": "0.1.4", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "ghcralph", - "version": "0.1.3", + "version": "0.1.4", "license": "MIT", "dependencies": { "@github/copilot-sdk": "^0.1.17", diff --git a/package.json b/package.json index efd97af..562f852 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "ghcralph", - "version": "0.1.3", + "version": "0.1.4", "description": "GitHub Copilot Ralph - A cross-platform CLI for running autonomous agentic coding loops using the Ralph Wiggum pattern with GitHub Copilot", "main": "dist/index.js", "types": "dist/index.d.ts", diff --git a/src/commands/run.ts b/src/commands/run.ts index 67450f7..0003852 100644 --- a/src/commands/run.ts +++ b/src/commands/run.ts @@ -20,8 +20,10 @@ import { CheckpointManager, FileSafeguardManager, ConfigManager, + createInitialState, type PlanManager, type TaskContext, + type FullLoopState, } from '../core/index.js'; import type { Task } from '../types/index.js'; @@ -37,6 +39,7 @@ export interface RunOptions { allowDelete?: boolean; dryRun?: boolean; pauseBetweenTasks?: boolean; + push?: boolean; } /** @@ -147,6 +150,7 @@ export function registerRunCommand(program: Command): void { .option('--allow-delete', 'Allow deletion of pre-existing files') .option('--dry-run', 'Show what would happen without executing') .option('--pause-between-tasks', 'Pause for human review after each task (strict Ralph mode)') + .option('--push', 'Push changes to remote after completion') .addHelpText('after', ` Config-backed settings (set via .ghcralph/config.json or GHCRALPH_* env vars): - maxIterations, maxTokens, defaultModel, autoCommit, branchPrefix @@ -159,6 +163,7 @@ Examples: $ ghcralph run --github $ ghcralph run --task "Fix bug" --context "src/**/*.ts" --branch fix/login-bug $ ghcralph run --task "Large refactor" --unlimited --timeout 60 + $ ghcralph run --file PLAN.md --push See also: ghcralph status View current session progress @@ -422,7 +427,7 @@ See also: let totalTasksCompleted = 0; let totalTasksFailed = 0; const maxRetriesPerTask = config.maxRetriesPerTask ?? 2; - const autoPush = config.autoPush ?? false; + const autoPush = options.push === true || (config.autoPush ?? false); const pushStrategy = config.pushStrategy ?? 'per-task'; // Compute total tasks count for commit message context (if available) @@ -437,6 +442,9 @@ See also: } } + // Initialize progress session for multi-task tracking + progressTracker.startSession(branchInfo?.branchName, totalTasksInPlan || undefined); + // ========== MULTI-TASK ITERATION LOOP ========== // This is the core fix: process ALL tasks in the plan, not just the first one @@ -517,10 +525,8 @@ See also: }); } - // Save progress after each iteration - progressTracker.save(state).catch(() => { - // Ignore save errors - }); + // Update in-memory progress state (file written at task completion) + progressTracker.setCurrentTask(totalTasksProcessed, state); }); events.on('error', (err) => { @@ -559,9 +565,9 @@ See also: info(`Task marked as complete in plan file`); } - // Document result in progress file - await progressTracker.appendTaskResult( - activeTask, + // Document result in progress file using session-based tracking + await progressTracker.recordTaskCompletion( + finalState, 'completed', taskAttempt, `Completed in ${finalState.iteration} iterations` @@ -598,9 +604,9 @@ See also: // User requested stop - exit the entire run warn('Loop was stopped by user'); - // Document the stop - await progressTracker.appendTaskResult( - activeTask, + // Document the stop using session-based tracking + await progressTracker.recordTaskCompletion( + finalState, 'stuck', taskAttempt, 'Stopped by user' @@ -613,9 +619,9 @@ See also: } else if (finalState.status === 'failed') { warn(`✗ Task attempt ${taskAttempt} failed: ${activeTask.title}`); - // Document failure for learning - await progressTracker.appendTaskResult( - activeTask, + // Document failure for learning using session-based tracking + await progressTracker.recordTaskCompletion( + finalState, 'failed', taskAttempt, 'Loop execution failed' @@ -634,9 +640,14 @@ See also: const errMsg = err instanceof Error ? err.message : String(err); error(errMsg); - // Document failure - await progressTracker.appendTaskResult( - activeTask, + // Document failure using session-based tracking with minimal error state + const errorState: FullLoopState = { + ...createInitialState(activeTask), + status: 'failed', + endedAt: new Date(), + }; + await progressTracker.recordTaskCompletion( + errorState, 'failed', taskAttempt, undefined, @@ -708,6 +719,12 @@ See also: } } + // Show informational message when push is disabled but changes were made + if (!autoPush && isGitRepo && totalTasksCompleted > 0) { + console.log(''); + info('💡 Changes not pushed. Use --push flag or set "autoPush": true in config.'); + } + if (totalTasksFailed === 0 && totalTasksCompleted > 0) { success(`🎉 All ${totalTasksCompleted} tasks completed successfully!`); } else if (totalTasksCompleted > 0) { diff --git a/src/core/checkpoint-manager.ts b/src/core/checkpoint-manager.ts index e160a55..36cff3c 100644 --- a/src/core/checkpoint-manager.ts +++ b/src/core/checkpoint-manager.ts @@ -44,6 +44,24 @@ const DEFAULT_CONFIG: CheckpointConfig = { cwd: process.cwd(), }; +/** + * Truncate at word boundary to avoid mid-word cuts + */ +function truncateAtWord(str: string, maxLen: number): string { + if (str.length <= maxLen) return str; + + const ellipsis = '...'; + const targetLen = maxLen - ellipsis.length; + const truncated = str.substring(0, targetLen); + const lastSpace = truncated.lastIndexOf(' '); + + // Use word boundary if in the latter half + if (lastSpace > targetLen * 0.5) { + return truncated.substring(0, lastSpace) + ellipsis; + } + return truncated + ellipsis; +} + /** * Checkpoint record */ @@ -202,9 +220,8 @@ export class CheckpointManager { } // Build commit message with optional task context - const truncatedSummary = summary.length > 40 - ? summary.substring(0, 37) + '...' - : summary; + // Summary is already truncated by extractSummary(), use word-boundary truncation as safety + const truncatedSummary = truncateAtWord(summary, 50); let message: string; if (taskContext) { @@ -406,9 +423,7 @@ export class CheckpointManager { } // Build commit message for task completion with optional task context - const truncatedTitle = taskTitle.length > 40 - ? taskTitle.substring(0, 37) + '...' - : taskTitle; + const truncatedTitle = truncateAtWord(taskTitle, 50); let message: string; if (taskContext) { diff --git a/src/core/context-builder.ts b/src/core/context-builder.ts index edb19fc..594d572 100644 --- a/src/core/context-builder.ts +++ b/src/core/context-builder.ts @@ -38,7 +38,9 @@ const DEFAULT_PROMPT_TEMPLATE = `You are an expert software engineer. Your task ## Instructions - Make small, focused changes - Test your changes with [ACTION:EXECUTE] -- Use [ACTION:COMPLETE] when tests pass and task is done`; +- Use [ACTION:COMPLETE] when tests pass and task is done + +{commit_message_guidelines}`; /** * Honesty guidance section to encourage accurate reporting and graceful failure handling. @@ -76,6 +78,38 @@ suggestion: IMPORTANT: Honest [ACTION:STUCK] reporting enables retry with fresh context. False [ACTION:COMPLETE] claims hide problems and compound technical debt.`; +/** + * Commit message guidelines for agent responses. + * Encourages the agent to provide quality commit messages in a structured block. + */ +const COMMIT_MESSAGE_GUIDELINES = `## Commit Message Guidelines + +Each response SHOULD include a commit message block: + +\`\`\` +[COMMIT_MESSAGE] + +[/COMMIT_MESSAGE] +\`\`\` + +**Rules:** +- Maximum 50 characters +- Use imperative mood ("Add", "Fix", "Update" not "Added", "Fixed") +- Be specific about what changed +- No periods at the end +- If task is complete, describe the accomplishment +- If in progress, describe the action taken + +**Good examples:** +- "Add division operation with zero check" +- "Fix off-by-one error in loop counter" +- "Update calculator to handle decimals" + +**Bad examples:** +- "I'll check the calculator" (conversational) +- "Made some changes" (vague) +- "Fixed the thing that was broken." (period, vague)`; + /** * Legacy prompt template with meta info (for backwards compatibility) */ @@ -281,7 +315,8 @@ export class ContextBuilder { .replace('{previous_progress}', previousProgress) .replace('{feedback_section}', feedbackSection ?? '') .replace('{output_format}', outputFormat) - .replace('{honesty_guidance}', HONESTY_GUIDANCE); + .replace('{honesty_guidance}', HONESTY_GUIDANCE) + .replace('{commit_message_guidelines}', COMMIT_MESSAGE_GUIDELINES); // Clean up multiple consecutive newlines prompt = prompt.replace(/\n{3,}/g, '\n\n').trim(); diff --git a/src/core/loop-engine.ts b/src/core/loop-engine.ts index b59e5d2..84ab377 100644 --- a/src/core/loop-engine.ts +++ b/src/core/loop-engine.ts @@ -518,6 +518,16 @@ Do not write prose explanations. Use only ACTION blocks.`; summary ); + // Attach verbosity data for progress tracking + completedRecord.rawResponse = responseContent; + if (executionResult) { + completedRecord.actions = executionResult.results.map((r) => ({ + type: r.action?.type ?? 'UNKNOWN', + success: r.success, + summary: r.message, + })); + } + this.state.iterations.push(completedRecord); this.events.emit('iterationEnd', completedRecord, this.state); } else { @@ -549,12 +559,80 @@ Do not write prose explanations. Use only ACTION blocks.`; } /** - * Extract a summary from the response + * Extract commit message summary from the response content. + * Priority: [COMMIT_MESSAGE] block > [ACTION:COMPLETE] reason > First action type > Fallback */ private extractSummary(content: string): string { - // Simple extraction - first 100 chars - const firstLine = content.split('\n')[0] ?? ''; - return firstLine.length > 100 ? firstLine.substring(0, 100) + '...' : firstLine; + // Priority 1: Explicit [COMMIT_MESSAGE] block (preferred) + const commitMatch = content.match( + /\[COMMIT_MESSAGE\]\s*([\s\S]*?)\[\/COMMIT_MESSAGE\]/i + ); + if (commitMatch?.[1]) { + const message = commitMatch[1].trim().split('\n')[0]?.trim(); + if (message && message.length > 0) { + // Already within limits, agent followed guidelines + return message.length > 50 ? this.truncateAtWord(message, 50) : message; + } + } + + // Priority 2: Extract from COMPLETE action reason (fallback for completion iterations) + const completeMatch = content.match( + /\[ACTION:COMPLETE\]\s*(?:reason:\s*)?([\s\S]*?)(?:\[\/ACTION|\[ACTION:|$)/i + ); + if (completeMatch?.[1]) { + const reason = completeMatch[1].trim().split('\n')[0]?.trim(); + if (reason && reason.length > 0) { + return this.truncateAtWord(`Complete: ${reason}`, 50); + } + } + + // Priority 3: Use first action type as summary + const actionMatch = content.match(/\[ACTION:(\w+)\]/i); + if (actionMatch?.[1]) { + const actionType = actionMatch[1].toUpperCase(); + // Try to get more context from the action + if (actionType === 'CREATE') { + const pathMatch = content.match(/\[ACTION:CREATE\]\s*(?:path:\s*)?([^\n]+)/i); + if (pathMatch?.[1]) { + return this.truncateAtWord(`Create ${pathMatch[1].trim()}`, 50); + } + } else if (actionType === 'EXECUTE') { + const cmdMatch = content.match(/\[ACTION:EXECUTE\]\s*(?:command:\s*)?([^\n]+)/i); + if (cmdMatch?.[1]) { + return this.truncateAtWord(`Run: ${cmdMatch[1].trim()}`, 50); + } + } + return `[${actionType}]`; + } + + // Priority 4: Fallback - truncate first non-preamble line + const preamblePatterns = /^(I'll|I will|Let me|I need to|I'm going|First,?|Now,?|OK|Okay|Sure|Here)/i; + for (const line of content.split('\n')) { + const trimmed = line.trim(); + if (trimmed.length > 0 && !preamblePatterns.test(trimmed)) { + return this.truncateAtWord(trimmed, 50); + } + } + + return 'Iteration update'; + } + + /** + * Truncate at word boundary to avoid mid-word cuts + */ + private truncateAtWord(str: string, maxLen: number): string { + if (str.length <= maxLen) return str; + + const ellipsis = '...'; + const targetLen = maxLen - ellipsis.length; + const truncated = str.substring(0, targetLen); + const lastSpace = truncated.lastIndexOf(' '); + + // Use word boundary if in the latter half + if (lastSpace > targetLen * 0.5) { + return truncated.substring(0, lastSpace) + ellipsis; + } + return truncated + ellipsis; } /** diff --git a/src/core/progress-tracker.ts b/src/core/progress-tracker.ts index 3577779..9991b0c 100644 --- a/src/core/progress-tracker.ts +++ b/src/core/progress-tracker.ts @@ -393,25 +393,29 @@ export class ProgressTracker { md += `- **Duration**: ${Math.floor(duration / 1000)}s\n`; } - // Full verbosity: include raw response if available - if (this.verbosity === 'full' && iter.rawResponse) { - md += `\n**Agent Response**:\n`; - md += `> ${iter.rawResponse.replace(/\n/g, '\n> ')}\n`; - } - - // Full verbosity: include actions executed if available - if (this.verbosity === 'full' && iter.actions && iter.actions.length > 0) { - md += `\n**Actions Executed**:\n`; + // Standard+ verbosity: include actions executed if available + if ((this.verbosity === 'standard' || this.verbosity === 'full') && iter.actions && iter.actions.length > 0) { + md += `\n**Actions**:\n`; for (const action of iter.actions) { const actionStatus = action.success ? '✓' : '✗'; - md += `${actionStatus} [${action.type}]`; + md += `- ${actionStatus} \`[${action.type}]\``; if (action.summary) { - md += ` ${action.summary}`; + // Truncate long summaries + const truncated = action.summary.length > 60 + ? action.summary.substring(0, 57) + '...' + : action.summary; + md += ` ${truncated}`; } md += '\n'; } } + // Full verbosity only: include raw response + if (this.verbosity === 'full' && iter.rawResponse) { + md += `\n**Agent Response**:\n`; + md += `\`\`\`\n${iter.rawResponse}\n\`\`\`\n`; + } + md += `\n`; return md; } diff --git a/src/core/prompt-examples.ts b/src/core/prompt-examples.ts index 63c4952..dc5bbda 100644 --- a/src/core/prompt-examples.ts +++ b/src/core/prompt-examples.ts @@ -68,6 +68,13 @@ attempted: Tried 3 different approaches to fix the syntax error in case statemen blocker: Bash case syntax requires specific quoting for * character that conflicts with shell expansion suggestion: Consider using a different operator syntax or escaping approach - may need human review`; +/** + * Example of COMMIT_MESSAGE block for git commit summaries + */ +export const COMMIT_MESSAGE_EXAMPLE = `[COMMIT_MESSAGE] +Add division operation with error handling +[/COMMIT_MESSAGE]`; + /** * All examples combined for inclusion in prompts */ @@ -86,6 +93,9 @@ ${COMPLETE_EXAMPLE} ### Example: Report when stuck (be honest if you can't complete!) ${STUCK_EXAMPLE} + +### Example: Commit message for the iteration +${COMMIT_MESSAGE_EXAMPLE} `; /** @@ -171,12 +181,20 @@ blocker: suggestion: \`\`\` +**[COMMIT_MESSAGE]** - Provide commit message for this iteration +\`\`\` +[COMMIT_MESSAGE] + +[/COMMIT_MESSAGE] +\`\`\` + ## Important Rules 1. Use EXACT text for [OLD] blocks - must match file contents precisely 2. Use [ACTION:EXECUTE] to run tests before marking complete 3. Only use [ACTION:COMPLETE] when ALL tests pass with exit code 0 4. Use [ACTION:STUCK] if you cannot complete - honesty is critical! 5. One action per block - you can include multiple blocks +6. Include a [COMMIT_MESSAGE] block to summarize what you did `; /** From f1030ce0ce03091e849113039e485457da700b46 Mon Sep 17 00:00:00 2001 From: Raphael Pothin Date: Wed, 28 Jan 2026 20:32:36 +0000 Subject: [PATCH 3/5] docs: update documentation for v0.1.4 features - README.md: Add --push flag, update progressVerbosity description - cookbook.md: Add --push examples, update verbosity levels - architecture.md: Add v0.1.4 Enhancements section --- JOURNAL.md | 32 +++++++++++++++ README.md | 5 ++- docs/architecture.md | 98 ++++++++++++++++++++++++++++++++++++++++++++ docs/cookbook.md | 16 +++++--- 4 files changed, 145 insertions(+), 6 deletions(-) diff --git a/JOURNAL.md b/JOURNAL.md index dcde519..01992a5 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -1632,3 +1632,35 @@ After testing v0.1.3 in the ghc-ralph-cli-demo repository, we identified 4 issue - `npm run lint` ✅ - `npm test` ✅ (311 tests passing) - Version bumped to 0.1.4 + +## 2026-01-28 - v0.1.4 Documentation Updates + +### Context +Updated documentation to reflect v0.1.4 changes. + +### Changes Made + +**README.md:** +- Added `--push` flag to "Advanced Run Options" section +- Updated `progressVerbosity` description to note actions are shown at standard level + +**docs/cookbook.md:** +- Added `--push` flag example in Multi-Task Processing section +- Updated Progress Verbosity descriptions (standard now includes actions) +- Enhanced Push troubleshooting section with `--push` flag usage + +**docs/architecture.md:** +- Added new "v0.1.4 Enhancements" section documenting: + - Commit Message Quality (`[COMMIT_MESSAGE]` block) + - Session-Based Progress Tracking + - Push CLI Flag + - Standard Verbosity Actions + +### Files Modified +- `README.md` +- `docs/cookbook.md` +- `docs/architecture.md` + +### Validation +- `npm run build` ✅ +- `npm run lint` ✅ diff --git a/README.md b/README.md index e5f9670..07919e3 100644 --- a/README.md +++ b/README.md @@ -162,6 +162,9 @@ ghcralph run --github # Pause between tasks for human review (strict Ralph mode) ghcralph run --file PLAN.md --pause-between-tasks +# Push changes to remote after completion +ghcralph run --file PLAN.md --push + # Specify context files ghcralph run --task "Fix tests" --context "src/**/*.test.ts" @@ -200,7 +203,7 @@ GitHub Copilot Ralph uses a hierarchical configuration system: | `maxRetriesPerTask` | `2` | Retries per task before marking as failed | | `autoPush` | `false` | Auto-push to remote after task completion | | `pushStrategy` | `per-task` | When to push: `per-task`, `per-run`, or `manual` | -| `progressVerbosity` | `standard` | Progress file detail level: `minimal`, `standard`, or `full` | +| `progressVerbosity` | `standard` | Progress file detail level: `minimal`, `standard` (+ actions), or `full` | | `githubRepo` | - | GitHub repository (owner/repo) for GitHub plan source | | `githubLabel` | - | Default GitHub issue label filter for GitHub plan | | `githubMilestone` | - | Default GitHub issue milestone filter for GitHub plan | diff --git a/docs/architecture.md b/docs/architecture.md index 7f88441..b457a3a 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1087,3 +1087,101 @@ The ActionExecutor now warns when COMPLETE is used despite failed commands: if (action.type === 'COMPLETE' && this.hasFailedCommands()) { warn(`⚠️ Task marked complete despite ${failures.length} command failures`); } + +``` + +--- + +## v0.1.4 Enhancements + +### Commit Message Quality + +v0.1.4 introduces structured `[COMMIT_MESSAGE]` blocks for better commit message quality: + +**Prompt Addition:** +```typescript +// src/core/prompt-examples.ts +export const COMMIT_MESSAGE_EXAMPLE = `[COMMIT_MESSAGE] +Add division operation with error handling +[/COMMIT_MESSAGE]`; +``` + +**Extraction Priority Chain:** +```typescript +// src/core/loop-engine.ts - extractSummary() +// 1. Explicit [COMMIT_MESSAGE] block (preferred) +// 2. [ACTION:COMPLETE] reason (for completions) +// 3. First action type with context (e.g., "Create src/utils.ts") +// 4. First non-preamble line (fallback) +``` + +**Word-Boundary Truncation:** +```typescript +// Avoids mid-word cuts like "implementati..." +function truncateAtWord(str: string, maxLen: number): string { + if (str.length <= maxLen) return str; + const targetLen = maxLen - 3; // Room for "..." + const truncated = str.substring(0, targetLen); + const lastSpace = truncated.lastIndexOf(' '); + if (lastSpace > targetLen * 0.5) { + return truncated.substring(0, lastSpace) + '...'; + } + return truncated + '...'; +} +``` + +### Session-Based Progress Tracking + +v0.1.4 fixes progress file persistence to retain all task history: + +**Architecture:** +```mermaid +graph LR + subgraph "Old (v0.1.3)" + Save["save() per iteration"] + Overwrite["Overwrites file"] + end + + subgraph "New (v0.1.4)" + Session["startSession()"] + SetTask["setCurrentTask()"] + Record["recordTaskCompletion()"] + SaveFull["saveFullSession()"] + end + + Save --> Overwrite + Session --> SetTask --> Record --> SaveFull +``` + +**Methods Used:** +- `startSession(branch, totalTasks)` - Initialize session tracking +- `setCurrentTask(taskNumber, state)` - Update in-memory state per iteration +- `recordTaskCompletion(state, status, ...)` - Add task to history and persist + +### Push CLI Flag + +v0.1.4 adds `--push` flag for one-time push override: + +```bash +# Push without changing config +ghcralph run --file PLAN.md --push + +# Informational message when push disabled +# 💡 Changes not pushed. Use --push flag or set "autoPush": true in config. +``` + +### Standard Verbosity Actions + +v0.1.4 includes executed actions in `standard` verbosity (previously `full` only): + +```markdown +#### Iteration 1 (4:40:05 PM) ✓ + +- **Tokens**: 1,325 +- **Summary**: Create calculator.sh +- **Duration**: 20s + +**Actions**: +- ✓ `[CREATE]` calculator.sh +- ✓ `[EXECUTE]` chmod +x calculator.sh +``` diff --git a/docs/cookbook.md b/docs/cookbook.md index 072670a..3042303 100644 --- a/docs/cookbook.md +++ b/docs/cookbook.md @@ -108,6 +108,9 @@ ghcralph run --file TODO.md # Pause between tasks for human review (strict Ralph mode) ghcralph run --file TODO.md --pause-between-tasks + +# Push changes to remote after completion +ghcralph run --file TODO.md --push ``` **Configuration:** @@ -127,8 +130,8 @@ ghcralph run --file TODO.md --pause-between-tasks **Progress Verbosity:** - `minimal`: Just iteration headers (for CI) -- `standard` (default): Tokens, summary, duration -- `full`: Standard + raw AI response + actions (for debugging) +- `standard` (default): Tokens, summary, duration, executed actions +- `full`: Standard + raw AI response (for debugging) --- @@ -392,7 +395,10 @@ Check your push configuration: ghcralph config get autoPush ghcralph config get pushStrategy -# Enable auto-push +# Enable auto-push for a single run (no config change) +ghcralph run --file PLAN.md --push + +# Enable auto-push permanently ghcralph config set autoPush true # Set push strategy (per-task, per-run, or manual) @@ -415,8 +421,8 @@ ghcralph config set progressVerbosity full **Verbosity Levels:** - `minimal`: Just iteration header (for CI) -- `standard`: Tokens, summary, duration (default) -- `full`: Standard + raw AI response + executed actions +- `standard`: Tokens, summary, duration, executed actions (default) +- `full`: Standard + raw AI response --- From b4215f60777a0faebaf3fc9bd4eb158f151dcca9 Mon Sep 17 00:00:00 2001 From: Raphael Pothin Date: Wed, 28 Jan 2026 20:52:51 +0000 Subject: [PATCH 4/5] refactor: remove --push flag, improve push reminder message Remove CLI flag to avoid confusion with config-based push behavior. Configuration provides clear options: per-task, per-run, or manual. Updated message when autoPush is false: - Shows how to push manually: git push - Shows how to enable auto-push in config --- JOURNAL.md | 27 +++++++++++++++++++++++++++ README.md | 3 --- docs/architecture.md | 13 +++++-------- docs/cookbook.md | 12 +++--------- src/commands/run.ts | 8 +++----- 5 files changed, 38 insertions(+), 25 deletions(-) diff --git a/JOURNAL.md b/JOURNAL.md index 01992a5..bb9f08b 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -1664,3 +1664,30 @@ Updated documentation to reflect v0.1.4 changes. ### Validation - `npm run build` ✅ - `npm run lint` ✅ + +## 2026-01-28 - v0.1.4 Remove --push Flag + +### Context +After discussion, decided to remove the `--push` CLI flag to avoid confusion with the configuration-based push behavior. The configuration already provides 3 clear options: +- `autoPush: true` + `pushStrategy: "per-task"` - Push after each task +- `autoPush: true` + `pushStrategy: "per-run"` - Push at end of run +- `autoPush: false` - No auto-push (manual review and push) + +### Changes Made +- Removed `push?: boolean` from `RunOptions` interface +- Removed `.option('--push', ...)` from command definition +- Removed example from help text +- Reverted `autoPush` logic to use config only +- Updated informational message to be more helpful: + - "💡 Changes committed locally. Review and push manually with: git push" + - " To enable auto-push, set \"autoPush\": true in .ghcralph/config.json" + +### Documentation Updated +- `README.md` - Removed --push from Advanced Run Options +- `docs/cookbook.md` - Removed --push examples, updated push troubleshooting +- `docs/architecture.md` - Replaced "Push CLI Flag" section with "Push Reminder Message" + +### Validation +- `npm run build` ✅ +- `npm run lint` ✅ +- `npm test` ✅ (311 tests passing) diff --git a/README.md b/README.md index 07919e3..8da0707 100644 --- a/README.md +++ b/README.md @@ -162,9 +162,6 @@ ghcralph run --github # Pause between tasks for human review (strict Ralph mode) ghcralph run --file PLAN.md --pause-between-tasks -# Push changes to remote after completion -ghcralph run --file PLAN.md --push - # Specify context files ghcralph run --task "Fix tests" --context "src/**/*.test.ts" diff --git a/docs/architecture.md b/docs/architecture.md index b457a3a..d11dac1 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1158,16 +1158,13 @@ graph LR - `setCurrentTask(taskNumber, state)` - Update in-memory state per iteration - `recordTaskCompletion(state, status, ...)` - Add task to history and persist -### Push CLI Flag +### Push Reminder Message -v0.1.4 adds `--push` flag for one-time push override: +v0.1.4 shows a helpful message when auto-push is disabled: -```bash -# Push without changing config -ghcralph run --file PLAN.md --push - -# Informational message when push disabled -# 💡 Changes not pushed. Use --push flag or set "autoPush": true in config. +``` +💡 Changes committed locally. Review and push manually with: git push + To enable auto-push, set "autoPush": true in .ghcralph/config.json ``` ### Standard Verbosity Actions diff --git a/docs/cookbook.md b/docs/cookbook.md index 3042303..be9af19 100644 --- a/docs/cookbook.md +++ b/docs/cookbook.md @@ -108,9 +108,6 @@ ghcralph run --file TODO.md # Pause between tasks for human review (strict Ralph mode) ghcralph run --file TODO.md --pause-between-tasks - -# Push changes to remote after completion -ghcralph run --file TODO.md --push ``` **Configuration:** @@ -395,10 +392,7 @@ Check your push configuration: ghcralph config get autoPush ghcralph config get pushStrategy -# Enable auto-push for a single run (no config change) -ghcralph run --file PLAN.md --push - -# Enable auto-push permanently +# Enable auto-push ghcralph config set autoPush true # Set push strategy (per-task, per-run, or manual) @@ -406,9 +400,9 @@ ghcralph config set pushStrategy per-task ``` **Push Strategy Options:** -- `per-task`: Push after each task completes (default) +- `per-task`: Push after each task completes (default when autoPush is true) - `per-run`: Push once at end of run -- `manual`: No auto-push (you push manually) +- `manual`: No auto-push (you push manually after review) ### Progress file missing details diff --git a/src/commands/run.ts b/src/commands/run.ts index 0003852..ee7aa24 100644 --- a/src/commands/run.ts +++ b/src/commands/run.ts @@ -39,7 +39,6 @@ export interface RunOptions { allowDelete?: boolean; dryRun?: boolean; pauseBetweenTasks?: boolean; - push?: boolean; } /** @@ -150,7 +149,6 @@ export function registerRunCommand(program: Command): void { .option('--allow-delete', 'Allow deletion of pre-existing files') .option('--dry-run', 'Show what would happen without executing') .option('--pause-between-tasks', 'Pause for human review after each task (strict Ralph mode)') - .option('--push', 'Push changes to remote after completion') .addHelpText('after', ` Config-backed settings (set via .ghcralph/config.json or GHCRALPH_* env vars): - maxIterations, maxTokens, defaultModel, autoCommit, branchPrefix @@ -163,7 +161,6 @@ Examples: $ ghcralph run --github $ ghcralph run --task "Fix bug" --context "src/**/*.ts" --branch fix/login-bug $ ghcralph run --task "Large refactor" --unlimited --timeout 60 - $ ghcralph run --file PLAN.md --push See also: ghcralph status View current session progress @@ -427,7 +424,7 @@ See also: let totalTasksCompleted = 0; let totalTasksFailed = 0; const maxRetriesPerTask = config.maxRetriesPerTask ?? 2; - const autoPush = options.push === true || (config.autoPush ?? false); + const autoPush = config.autoPush ?? false; const pushStrategy = config.pushStrategy ?? 'per-task'; // Compute total tasks count for commit message context (if available) @@ -722,7 +719,8 @@ See also: // Show informational message when push is disabled but changes were made if (!autoPush && isGitRepo && totalTasksCompleted > 0) { console.log(''); - info('💡 Changes not pushed. Use --push flag or set "autoPush": true in config.'); + info('💡 Changes committed locally. Review and push manually with: git push'); + info(' To enable auto-push, set "autoPush": true in .ghcralph/config.json'); } if (totalTasksFailed === 0 && totalTasksCompleted > 0) { From 66e8a01308e05bc1eea2abc59cb2605df89b8ea3 Mon Sep 17 00:00:00 2001 From: Raphael Pothin Date: Wed, 28 Jan 2026 21:11:19 +0000 Subject: [PATCH 5/5] fix: increase test timeout and improve cleanup for Windows CI - Increase stash test timeout from 5s to 10s for slower Windows git - Add retry logic and delay to temp directory cleanup - Fixes EBUSY file lock errors on Windows --- JOURNAL.md | 19 +++++++++++++++++++ src/core/git-branch-manager.test.ts | 10 ++++++++-- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/JOURNAL.md b/JOURNAL.md index bb9f08b..0e5ad78 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -1691,3 +1691,22 @@ After discussion, decided to remove the `--push` CLI flag to avoid confusion wit - `npm run build` ✅ - `npm run lint` ✅ - `npm test` ✅ (311 tests passing) + +## 2026-01-28 - Fix Windows CI Test Timeout + +### Context +The `git-branch-manager.test.ts` test was failing on Windows CI due to: +1. Test timeout (5000ms default) - git operations are slower on Windows +2. EBUSY file lock error during temp directory cleanup + +### Changes Made +- Increased timeout for "should stash modified files" test to 10000ms +- Added retry logic and delay to afterEach cleanup for Windows compatibility + +### Files Modified +- `src/core/git-branch-manager.test.ts` + +### Validation +- `npm run build` ✅ +- `npm run lint` ✅ +- `npm test` ✅ (311 tests passing) diff --git a/src/core/git-branch-manager.test.ts b/src/core/git-branch-manager.test.ts index 597ae26..2a8494e 100644 --- a/src/core/git-branch-manager.test.ts +++ b/src/core/git-branch-manager.test.ts @@ -33,7 +33,13 @@ describe('GitBranchManager', () => { }); afterEach(async () => { - await fs.rm(tempDir, { recursive: true, force: true }); + // Small delay to allow git processes to release file handles on Windows + await new Promise((resolve) => setTimeout(resolve, 100)); + try { + await fs.rm(tempDir, { recursive: true, force: true, maxRetries: 3, retryDelay: 100 }); + } catch { + // Ignore cleanup errors on Windows + } }); describe('isGitRepository', () => { @@ -179,7 +185,7 @@ describe('GitBranchManager', () => { // Working directory should be clean now const status = await manager.getWorkingDirStatus(); expect(status.isClean).toBe(true); - }); + }, 10000); // Increased timeout for Windows }); describe('popStash', () => {