refactor(loop): consolidated round-2 improvements by rubenmarcus · Pull Request #188 · multivmlabs/ralph-starter

rubenmarcus · 2026-02-13T18:01:17Z

Summary

Consolidates all round-2 loop improvements into a single PR. Supersedes PRs #177, #179, #180, #181, #182, #183, #184, #185, #187, and #172.

Bug Fixes

- wasTrimmed flag: was always true for iterations > 1, regardless of whether trimming actually happened
- previousBranch cascade on failure: the next batch task no longer branches from a broken state
- result.cost dead field: TaskResult.cost is now populated from loop cost stats
- Validation feedback mutation: uses lastValidationFeedback instead of mutating taskWithSkills
- Progress status bug: non-done iterations are now recorded as 'partial' instead of 'completed'; completion detection was also deduplicated and error hashing improved

Behavioral Improvements

- Git-independent change detection: filesystem snapshots as primary, git as secondary. The loop works without a git repo.
- Task-aware stall detection: resets the idle counter on task completion progress, not just file changes. Threshold relaxed to 3 idle iterations and i > 3.
- Build validation unconditional: runs after every iteration from iteration 2 onward, regardless of change detection
- Post-iteration cost ceiling: prevents starting another expensive iteration when already over budget
- Directory anchoring: preamble tells the agent "current dir IS the project root, don't create subdirectories"
- Tailwind v4/JSX rules: detailed Tailwind v4 PostCSS setup instructions in the preamble, to compensate for the LLM's knowledge cutoff
- Greenfield skills auto-install: automatically installs relevant skills.sh skills for projects without package.json
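
A minimal sketch of the task-aware stall rule described in the list above. It assumes the loop can report, per iteration, whether files changed and how many plan tasks are complete; `StallDetector`, `IterationSignal`, and `record()` are hypothetical names, and only the "3 idle iterations and i > 3" threshold comes from this PR.

```typescript
// Hypothetical sketch of task-aware stall detection; names are illustrative.
interface IterationSignal {
  iteration: number; // 1-based iteration index
  filesChanged: boolean; // result of change detection for this iteration
  tasksCompleted: number; // completed plan tasks after this iteration
}

class StallDetector {
  private idle = 0;
  private lastTasksCompleted = 0;
  private readonly maxIdle: number;
  private readonly minIteration: number;

  constructor(maxIdle = 3, minIteration = 3) {
    this.maxIdle = maxIdle; // consecutive idle iterations needed to stop
    this.minIteration = minIteration; // never stop while i <= minIteration
  }

  /** Records one iteration and returns true if the loop should stop as stalled. */
  record(signal: IterationSignal): boolean {
    const taskProgress = signal.tasksCompleted > this.lastTasksCompleted;
    this.lastTasksCompleted = Math.max(this.lastTasksCompleted, signal.tasksCompleted);

    // File changes OR task progress count as work and reset the idle counter.
    this.idle = signal.filesChanged || taskProgress ? 0 : this.idle + 1;

    return this.idle >= this.maxIdle && signal.iteration > this.minIteration;
  }
}
```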

Performance

- Completion detection reordering: cheap string checks (EXIT_SIGNAL, markers) run before expensive semantic analysis
- Memoized plan parsing: mtime-based cache with mutation protection via deep-clone
- Semantic prompt trimming: cuts at paragraph/line boundaries instead of mid-instruction
- Section-aware feedback compression: keeps the first complete failing section, summarizes the rest
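
The reordering above amounts to running substring checks before anything expensive, in a single pass. This is a hedged sketch: `detectCompletionWithReason`, `requireExitSignal`, and the `DESIGN_VERIFIED` token are named in this PR, but the options shape and the `semanticCheck` callback (standing in for the real semantic analysis) are assumptions.

```typescript
// Hypothetical sketch; the real signature in src/loop/executor.ts may differ.
type CompletionReason = "exit_signal" | "marker" | "semantic" | null;

interface CompletionOptions {
  exitSignal: string; // e.g. "DESIGN_VERIFIED" in design mode
  markers: string[]; // legacy markers such as "All tasks completed"
  requireExitSignal: boolean; // when true, only the exit signal counts
  semanticCheck: (output: string) => boolean; // expensive analysis, run last
}

function detectCompletionWithReason(output: string, opts: CompletionOptions): CompletionReason {
  // 1. Cheapest check: an explicit exit token.
  if (output.includes(opts.exitSignal)) return "exit_signal";
  // 2. Modes that require the token ignore markers and heuristics entirely.
  if (opts.requireExitSignal) return null;
  // 3. Still cheap: plain substring markers.
  if (opts.markers.some((m) => output.includes(m))) return "marker";
  // 4. Only now pay for semantic analysis, at most once per call.
  return opts.semanticCheck(output) ? "semantic" : null;
}
```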

UX

- Smart directory selection for integration sources
- Iteration progress display with task names
- Rate limit display improvements
- Calm warning messages at 80%/90% iteration usage

Test Plan

- pnpm build: compiles cleanly
- pnpm test: all 171 tests pass
- Manual test: ralph-starter run --from github --issue 86 without --commit
- Verify the loop runs full iterations (no premature exit)
- Verify build validation runs from iteration 2+
- Verify the "Run the project?" prompt appears

🤖 Generated with Claude Code

- Reset circuit breaker when tasks advance (prevents false positives during multi-task greenfield builds where early tasks can't pass tests)
- Add package manager detection: auto-detect pnpm/yarn/bun from lockfiles and the packageManager field instead of hardcoding npm
- Add validation warm-up: skip validation until enough tasks are done for greenfield builds (auto-detected, configurable via --validation-warmup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
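
A sketch of lockfile-based package manager detection, assuming the caller passes the project's top-level file names and the raw `packageManager` value from package.json. Checking the `packageManager` field before lockfiles is an assumed precedence; `detectPackageManager` and `buildCommand` are hypothetical helpers, not ralph-starter's actual functions.

```typescript
// Hypothetical sketch; precedence and helper names are assumptions.
type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

const LOCKFILES: Array<[string, PackageManager]> = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["bun.lockb", "bun"],
  ["bun.lock", "bun"],
  ["package-lock.json", "npm"],
];

function detectPackageManager(files: string[], packageManagerField?: string): PackageManager {
  // The packageManager field (e.g. "pnpm@9.1.0") is an explicit declaration, so check it first.
  const name = packageManagerField?.split("@")[0];
  if (name === "pnpm" || name === "yarn" || name === "bun" || name === "npm") return name;
  for (const [lockfile, pm] of LOCKFILES) {
    if (files.includes(lockfile)) return pm;
  }
  return "npm"; // the previously hard-coded behaviour becomes the fallback
}

function buildCommand(pm: PackageManager): string {
  // All four managers accept `<pm> run <script>`.
  return `${pm} run build`;
}
```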

- Track file changes across all iterations (not just iteration 1)
- Stop loop after 2 consecutive idle iterations (no file changes)
- Check IMPLEMENTATION_PLAN.md for pending tasks in all modes, not just when the task string mentions the plan file
- Lower default max-iterations from 10 to 7 when no plan file exists

Fixes loops running all iterations for simple tasks where the agent finishes early but the loop doesn't detect completion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When no IMPLEMENTATION_PLAN.md exists, estimate task count from the spec content by analyzing structural elements (headings, bullet points, numbered lists, checkboxes). This replaces the static default of 7 with a data-driven estimate. For the pet shop issue (#86): 4 headings + 12 bullets → ~5 iterations instead of the old static 10.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
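
One way such a structural estimate could work, as a sketch only: the PR does not document its weights, so the "one iteration per heading or per three list items, whichever is larger, plus one" rule below is an assumption. Only the `MAX_ESTIMATED_ITERATIONS` cap of 25 comes from the task-counter.ts change in this PR.

```typescript
// Hypothetical sketch; the weighting is assumed, not taken from ralph-starter.
const MAX_ESTIMATED_ITERATIONS = 25;

interface SpecStructure { headings: number; bullets: number; numbered: number; checkboxes: number; }

function countStructure(spec: string): SpecStructure {
  let headings = 0, bullets = 0, numbered = 0, checkboxes = 0;
  for (const raw of spec.split("\n")) {
    const line = raw.trim();
    if (/^#{1,6}\s/.test(line)) headings++;
    else if (/^[-*]\s+\[[ xX]\]/.test(line)) checkboxes++; // check before plain bullets
    else if (/^[-*]\s/.test(line)) bullets++;
    else if (/^\d+[.)]\s/.test(line)) numbered++;
  }
  return { headings, bullets, numbered, checkboxes };
}

function estimateIterations(s: SpecStructure): number {
  const items = s.bullets + s.numbered + s.checkboxes;
  const estimate = Math.max(s.headings, Math.ceil(items / 3)) + 1; // +1 for setup (assumed)
  return Math.min(Math.max(estimate, 1), MAX_ESTIMATED_ITERATIONS);
}
```

With these assumed weights, the pet shop example (4 headings, 12 bullets) also comes out at 5, but that is a property of the chosen weights, not evidence of the real formula.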

- Add loop-aware preamble to every iteration with key Ralph Playbook language patterns: "study" not "read", "don't assume not implemented", "no placeholders or stubs", and AGENTS.md self-improvement
- For unstructured specs (no task headers), instruct the agent to create IMPLEMENTATION_PLAN.md as its first action instead of the generic "implement all features" prompt
- Add spec file references in iterations 2+ so the agent can re-read requirements from the specs/ directory
- Add plan-creation reminder for later iterations without structured tasks
- Use playbook language in the structured spec prompt too

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rnings

- Show Ralph ASCII art in run command via showWelcomeCompact() instead of plain text header
- Smart project location: detect existing project markers (package.json, .git, Cargo.toml, etc.) and default to "Current directory" when found
- Fix type:'list' → type:'select' for inquirer v13 compatibility in the project location prompt (same bug fixed across 8 files previously)
- Replace scary [WARNING] silence message with calm chalk.dim status: "Agent is thinking..." at 30s, "Still working..." at 60s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Speed:
- Remove unnecessary 1-second sleep between loop iterations: saves ~1s per iteration (~25s on a 25-iteration loop)

Bug fix:
- Fix validation feedback mutation that defeated context trimming. The executor was appending compressed errors to `taskWithSkills` (line 868), accumulating old validation errors across iterations. It now stores feedback in a separate variable and passes it through the context builder's existing `validationFeedback` parameter, which was previously passed as `undefined` (dead code). The context builder already handles per-iteration compression (2000 chars for iterations 2-3, 500 for 4+).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
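
A hedged sketch of per-iteration, section-aware feedback compression. The 2000/500 character limits come from this commit message; the assumption that each validation command's output starts with a `## <name> failed` header, and the function names, are illustrative only.

```typescript
// Hypothetical sketch; section format and names are assumptions.
function feedbackLimit(iteration: number): number {
  return iteration <= 3 ? 2000 : 500; // 2000 chars for iterations 2-3, 500 for 4+
}

function compressValidationFeedback(feedback: string, iteration: number): string {
  const limit = feedbackLimit(iteration);
  if (feedback.length <= limit) return feedback;

  // Split before each "## " header line so every section keeps its header.
  const sections = feedback.split(/\n(?=## )/);
  const first = sections[0] ?? "";
  const rest = sections.slice(1);
  const summary = rest.length === 0 ? "" :
    "\n\nAlso failing (details omitted): " +
    rest.map((s) => (s.split("\n")[0] ?? "").replace(/^## /, "")).join(", ");

  // Keep the first failing section whole if it fits next to the summary.
  const budget = limit - summary.length;
  if (first.length <= budget) return first + summary;

  // Otherwise keep whole lines of the first section until the budget runs out.
  const marker = "\n[truncated]";
  const kept: string[] = [];
  let used = marker.length;
  for (const line of first.split("\n")) {
    if (used + line.length + 1 > budget) break;
    kept.push(line);
    used += line.length + 1;
  }
  // Final hard cap in case the summary alone is very long.
  return (kept.join("\n") + marker + summary).slice(0, limit);
}
```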

… improve error hashing

- Fix progress entry always recording 'completed' even for non-done iterations (the ternary had identical branches). It now records 'partial' for iterations that didn't complete.
- Merge detectCompletion() and getCompletionReason() into a single-pass detectCompletionWithReason() to eliminate duplicate analyzeResponse() calls per iteration.
- Remove unused _validationPassed variable.
- Improve circuit breaker error hashing: only normalize file:line:col locations, timestamps, hex addresses, and stack traces, preserving semantically meaningful content so different errors (e.g. "port 8000 in use" vs "file not found") hash differently.
- Add 'partial' status to ProgressEntry type with status badge.
- Update circuit breaker tests for new normalization behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…t detection

- Add mtime-based caching to parsePlanTasks(): the same IMPLEMENTATION_PLAN.md file was being read and regex-parsed 4 times per iteration (init, progress check, completion check, display). The cache returns the stored result if the file's mtimeMs hasn't changed, eliminating ~75 redundant file reads across a 25-iteration loop (3 redundant reads × 25 iterations).
- Parallelize agent detection in detectAvailableAgents(): each agent check spawns an independent subprocess (e.g. `claude --version`). Running them with Promise.all() instead of a sequential for/of loop cuts startup time from ~2-3s to <1s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
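
The Promise.all() change can be sketched as follows. The `AgentCheck` callbacks stand in for the real subprocess probes (such as `claude --version`); this signature of `detectAvailableAgents` is an assumption, not the function's actual one.

```typescript
// Hypothetical sketch: each check would wrap a subprocess probe in the real code.
type AgentCheck = () => Promise<boolean>;

async function detectAvailableAgents(checks: Record<string, AgentCheck>): Promise<string[]> {
  // Start every probe at once; total time is roughly the slowest probe, not the sum.
  const results = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        return (await check()) ? name : null;
      } catch {
        return null; // a failing probe just means "not available"
      }
    }),
  );
  // Promise.all preserves input order, so the output order is deterministic.
  return results.filter((name): name is string => name !== null);
}
```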

…configurable timeout

Safety:
- Add maxCost option to CostTracker and LoopOptions: the loop checks isOverBudget() before each iteration and exits with the 'cost_ceiling' reason if exceeded. Prevents unexpected charges on long-running loops.
- Add output size limit (default 50MB) in agent runner: if exceeded, the buffer is truncated to its last 80%, preventing OOM from verbose agent output.

UX:
- Run all validation commands instead of stopping at the first failure: the agent now sees lint AND test AND build failures in a single pass, enabling multi-fix iterations instead of fix-one-rerun-fix-another chains.

Configuration:
- Add agentTimeout option to LoopOptions (default: 5 min), propagated to the agent runner's timeoutMs. Complex tasks can set longer timeouts.
- Add 'cost_ceiling' to LoopResult exit reasons.
- Add 'partial' status to ProgressEntry for non-done iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
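
A minimal sketch of the pre-iteration budget check. `CostTracker`, `maxCost`, `isOverBudget()`, and the `'cost_ceiling'` exit reason are named in this commit; the fields, the `>=` comparison, and `runLoopSketch` are assumptions for illustration.

```typescript
// Hypothetical sketch; only the names called out above come from the PR.
class CostTracker {
  private totalUsd = 0;
  private readonly maxCost?: number;

  constructor(maxCost?: number) {
    this.maxCost = maxCost;
  }

  add(iterationCostUsd: number): void {
    this.totalUsd += iterationCostUsd;
  }

  // Assumption: reaching the ceiling counts as over budget.
  isOverBudget(): boolean {
    return this.maxCost !== undefined && this.totalUsd >= this.maxCost;
  }
}

type ExitReason = "max_iterations" | "cost_ceiling";

function runLoopSketch(costs: number[], maxIterations: number, tracker: CostTracker): { reason: ExitReason; iterations: number } {
  for (let i = 0; i < maxIterations; i++) {
    // Check before starting: never begin an iteration once the ceiling is reached.
    if (tracker.isOverBudget()) return { reason: "cost_ceiling", iterations: i };
    tracker.add(costs[i] ?? 0); // stand-in for running the agent and recording its cost
  }
  return { reason: "max_iterations", iterations: maxIterations };
}
```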

Move timestamp regex before the :\d+:\d+ replacement. Previously, a timestamp like "14:07:39" would match :\d+:\d+ first, mangling it to "14:N:N" so the timestamp regex could never match. This caused identical errors with different timestamps to hash differently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
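
The ordering fix can be illustrated with a normalizer that replaces timestamps before `file:line:col` locations. The regexes are assumptions (the PR does not show the real ones), and FNV-1a is used only as a small dependency-free hash for the example.

```typescript
// Hypothetical sketch of the normalization order; the real regexes may differ.
function normalizeError(message: string): string {
  return message
    // 1. Timestamps first, so "14:07:39" is not mangled by the location rule below.
    .replace(/\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?Z?/g, "<TS>")
    .replace(/\b\d{1,2}:\d{2}:\d{2}\b/g, "<TIME>")
    // 2. file:line:col locations.
    .replace(/:\d+:\d+/g, ":N:N")
    // 3. Hex addresses such as 0x7ffee4b8c9a0.
    .replace(/0x[0-9a-fA-F]+/g, "<HEX>")
    // 4. Indented stack frames ("    at foo (a.js:1:2)") are dropped.
    .replace(/^[ \t]+at .*$/gm, "")
    .trim();
}

// 32-bit FNV-1a over UTF-16 code units, for illustration only.
function hashError(message: string): string {
  const s = normalizeError(message);
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}
```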

…cache

The file could change between stat (cache check) and readFileSync. It now stats before and after reading and only caches if both mtimes match, preventing stale content from being cached with a new mtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
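
A sketch of the mtime-keyed cache with the stat-before-and-after guard and defensive copies. To keep it self-contained, the hypothetical `FileAccess` interface stands in for `fs.statSync(path).mtimeMs` and `fs.readFileSync`; the task-line regex and this two-argument `parsePlanTasks` signature are assumptions.

```typescript
// Hypothetical sketch; the real parsePlanTasks() reads the file directly.
interface FileAccess {
  mtimeMs(path: string): number;
  read(path: string): string;
}

interface PlanTask { title: string; done: boolean; }

const cache = new Map<string, { mtimeMs: number; tasks: PlanTask[] }>();

function parseTasks(content: string): PlanTask[] {
  const tasks: PlanTask[] = [];
  for (const line of content.split("\n")) {
    const m = /^- \[([ xX])\] (.+)$/.exec(line);
    if (m) tasks.push({ done: m[1] !== " ", title: m[2] ?? "" });
  }
  return tasks;
}

// Callers get their own copy, so mutating a result cannot corrupt the cache.
function clone(tasks: PlanTask[]): PlanTask[] {
  return tasks.map((t) => ({ ...t }));
}

function parsePlanTasks(path: string, fs: FileAccess): PlanTask[] {
  const before = fs.mtimeMs(path);
  const hit = cache.get(path);
  if (hit && hit.mtimeMs === before) return clone(hit.tasks);

  const tasks = parseTasks(fs.read(path));
  // Only cache if the file did not change while it was being read.
  if (fs.mtimeMs(path) === before) cache.set(path, { mtimeMs: before, tasks });
  return clone(tasks);
}
```

Because tasks here are flat objects, copying each one is a full deep clone; nested task data would need a real deep copy.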

…ounting

Addresses PR #185 review feedback:
- Remove outputTruncated flag so truncation can fire more than once
- Reset outputBytes after truncation to prevent counter drift
- Include stderr data in byte accounting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
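
A sketch of the revised byte accounting, assuming the runner receives stdout and stderr as `Uint8Array` chunks (Node's `Buffer` is a `Uint8Array` subclass). `BoundedOutput` is a hypothetical name; the 50MB default and the keep-the-last-80% rule come from this PR.

```typescript
// Hypothetical sketch of bounded output buffering; names are illustrative.
class BoundedOutput {
  private chunks: Uint8Array[] = [];
  private outputBytes = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number = 50 * 1024 * 1024) {
    this.maxBytes = maxBytes;
  }

  // Call for both stdout and stderr chunks so both count toward the limit.
  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.outputBytes += chunk.length;
    // No one-shot "truncated" flag: this can fire as often as needed.
    if (this.outputBytes > this.maxBytes) {
      const all = this.bytes();
      const kept = all.slice(all.length - Math.floor(all.length * 0.8)); // copy the newest ~80%
      this.chunks = [kept];
      this.outputBytes = kept.length; // resync the counter to avoid drift
    }
  }

  bytes(): Uint8Array {
    const out = new Uint8Array(this.outputBytes);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.length;
    }
    return out;
  }
}
```

Keeping 80% of an oversized buffer can still leave it above the limit after a very large chunk; the next push then truncates again, which the removed one-shot flag would have prevented.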

The compact RALPH_WELCOME_SMALL looked out of place compared to the full RALPH_FULL art used in the wizard. Use showWelcome() consistently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Build validation (build + typecheck) now runs after every iteration regardless of the --validate flag. This catches broken builds early:
- Missing file imports (components that don't exist yet)
- PostCSS/Tailwind misconfiguration
- TypeScript compilation errors

Key changes:
- Add detectBuildCommands() with AGENTS.md > package.json > tsc fallback
- Add runBuildValidation() with a 2-min timeout (vs 5-min for full validation)
- Re-detect build commands per iteration for greenfield projects
- Skip when --validate already covers build/typecheck (no double-run)
- Add preamble rules: "create files before importing" + "verify compilation"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
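
A sketch of the AGENTS.md > package.json > tsc fallback order, combined with the package-manager fix later in this PR. The `ProjectInfo` shape, the `npx tsc --noEmit` fallback command, and `buildCommandsToRun` are assumptions for illustration.

```typescript
// Hypothetical sketch; input shape and fallback command are assumptions.
interface ProjectInfo {
  agentsMdBuildCommands: string[]; // commands parsed from AGENTS.md, if any
  scripts: Record<string, string>; // package.json "scripts"
  hasTsconfig: boolean;
  packageManager: "npm" | "pnpm" | "yarn" | "bun";
}

function detectBuildCommands(p: ProjectInfo): string[] {
  // 1. Explicit instructions in AGENTS.md win.
  if (p.agentsMdBuildCommands.length > 0) return p.agentsMdBuildCommands;
  // 2. package.json scripts, run through the project's own package manager.
  const fromScripts = ["build", "typecheck"]
    .filter((name) => name in p.scripts)
    .map((name) => `${p.packageManager} run ${name}`);
  if (fromScripts.length > 0) return fromScripts;
  // 3. Fall back to a bare type check for TypeScript projects.
  return p.hasTsconfig ? ["npx tsc --noEmit"] : [];
}

// Avoid double-running anything that --validate already runs.
function buildCommandsToRun(detected: string[], validateCommands: string[]): string[] {
  return detected.filter((c) => !validateCommands.includes(c));
}
```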

…eld skills

- Add filesystem-based change detection as the primary method (git-independent)
- Add getHeadCommitHash() and hasIterationChanges() for git-based secondary detection
- Remove hasChanges gate from build/full validation (unconditional after iteration 1)
- Relax stall detection threshold (3 idle + i > 3)
- Add directory anchoring rule to preamble (prevent nested project dirs)
- Strengthen Tailwind v4 rules with exact setup instructions
- Enable skills auto-install by default for greenfield projects (no package.json)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
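
Filesystem-first change detection can be sketched as a diff of per-file snapshots, with a git HEAD comparison only as a fallback signal. The snapshot format (`mtimeMs:size`), the ignored directories, and `iterationChanged` are assumptions; the real code uses `getHeadCommitHash()` and `hasIterationChanges()`, whose signatures are not shown here.

```typescript
// Hypothetical sketch: a snapshot maps relative paths to "mtimeMs:size".
type Snapshot = Map<string, string>;

function takeSnapshot(files: Array<{ path: string; mtimeMs: number; size: number }>): Snapshot {
  const snap: Snapshot = new Map();
  for (const f of files) {
    // Ignore noisy directories (assumed list).
    if (f.path.startsWith("node_modules/") || f.path.startsWith(".git/")) continue;
    snap.set(f.path, `${f.mtimeMs}:${f.size}`);
  }
  return snap;
}

function changedFiles(before: Snapshot, after: Snapshot): string[] {
  const changed: string[] = [];
  after.forEach((sig, path) => {
    if (before.get(path) !== sig) changed.push(path); // added or modified
  });
  before.forEach((_sig, path) => {
    if (!after.has(path)) changed.push(path); // deleted
  });
  return changed.sort();
}

// Filesystem diff first; a moved HEAD commit only as a secondary signal.
function iterationChanged(before: Snapshot, after: Snapshot, headBefore?: string, headAfter?: string): boolean {
  if (changedFiles(before, after).length > 0) return true;
  return headBefore !== undefined && headAfter !== undefined && headBefore !== headAfter;
}
```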

context-builder.ts:
- Fix wasTrimmed bug (was always true for iterations > 1)
- Replace unsafe prompt.slice() with semantic trimming at paragraph boundaries
- Section-aware feedback compression (keep first complete section, summarize rest)

task-counter.ts:
- Protect cache from consumer mutation via deep-clone
- Extract MAX_ESTIMATED_ITERATIONS constant (was magic number 25)

task-executor.ts:
- Don't cascade previousBranch on failure (prevents branching from broken state)
- Populate result.cost from loop cost stats (was dead field)

executor.ts:
- Task-aware stall detection (reset idle counter on task progress, not just file changes)
- Post-iteration cost ceiling check (prevents starting an expensive iteration over budget)
- Reorder completion detection: cheap checks first, expensive semantic analysis last

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
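
A sketch of boundary-aware trimming that also derives `wasTrimmed` from whether anything was actually cut, which is the bug fixed above. The function name and return shape are hypothetical.

```typescript
// Hypothetical sketch; the real context builder may choose boundaries differently.
function trimAtBoundary(prompt: string, maxChars: number): { text: string; wasTrimmed: boolean } {
  // wasTrimmed reflects what happened, not the iteration number.
  if (prompt.length <= maxChars) return { text: prompt, wasTrimmed: false };

  const head = prompt.slice(0, maxChars);
  // Prefer the last paragraph break, then the last line break.
  let cut = head.lastIndexOf("\n\n");
  if (cut <= 0) cut = head.lastIndexOf("\n");
  if (cut <= 0) cut = maxChars; // no boundary at all: fall back to a hard cut
  return { text: prompt.slice(0, cut).replace(/\s+$/, ""), wasTrimmed: true };
}
```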

greptile-apps

36 files reviewed, 2 comments


src/loop/executor.ts

…ions 2+

The context builder was dropping all spec content after iteration 1, causing "spec amnesia" where the agent lost sight of design requirements. Also, the preamble only had negative design guidance ("NEVER use...") with no positive instruction to follow the spec faithfully.

- Add buildSpecSummary() to read the specs/ directory for later iterations
- Rewrite design section: the spec is now the "FIRST PRIORITY" source of truth
- Include spec summary in iterations 2-3 and a truncated hint in 4+
- Add dev server exception clause for visual verification flows

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wire up buildSpecSummary() in the executor so the context builder can include abbreviated spec content in later iterations, preventing the agent from losing sight of design requirements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Allow callers to cap the number of skills included in the prompt. Used by the --design flag to limit to 3-4 focused design skills instead of the default 5.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The fix command was losing all original design context: the agent only saw the custom task string and build errors, never the spec or plan. This made design fixes guesswork rather than spec-adherent.

Changes:
- Include specs/ and IMPLEMENTATION_PLAN.md content in fixTask so the agent knows what "correct" looks like
- Add --design flag: structured screenshot → analyze → plan → fix flow with 3 viewport breakpoints (desktop/tablet/mobile)
- Bump default iterations: 7 for --design, 5 for design keywords, 3 otherwise
- Clarify dev server override for visual verification
- Register --design option in CLI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

detectBuildCommands was hardcoding `npm run build` instead of using the project's actual package manager (pnpm/yarn/bun). This caused build validation to fail in projects that enforce a specific package manager.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-02-13T23:14:22Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

The fix command was exiting with "nothing to fix!" when build checks passed and no custom task was given. But --design targets visual issues that build checks can't detect, so it should always proceed to the screenshot/analysis flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-02-13T23:16:32Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Three issues fixed:
- Skills were showing "25 detected" because maxSkills wasn't threaded through LoopOptions to formatSkillsForPrompt. Now --design caps it to 4.
- Startup display now shows "4 active (25 installed)" instead of the raw count
- Design prompt now forcefully instructs the agent to start with the dev server + screenshots as the VERY FIRST action, ignoring IMPLEMENTATION_PLAN.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-02-13T23:20:40Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

The design fix prompt was too vague: "layout/spacing problems" led the agent to suggest padding tweaks instead of catching obvious structural issues like content not being centered or huge empty gaps.

Rewritten Phase 2 (Issue Identification) to:
- Prioritize page structure (centering, containers, max-width) over cosmetic details
- Check for content pinned to edges, broken grid layouts, unbalanced columns
- Require CONCRETE issues visible in screenshots, not generic improvements

Rewritten Phase 3 (Fix Plan) to:
- Require exact file + CSS property for each fix
- Focus on minimal fixes, not redesigning entire components

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-02-13T23:22:36Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Pass the user's custom task text (not the full generated prompt) to autoInstallSkillsFromTask. The --design prompt contains dozens of CSS/design keywords that triggered excessive skill search queries, causing skills to accumulate globally (25+ after a few runs). Also lower MAX_SKILLS_TO_INSTALL from 5 to 3 to cap accumulation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The preamble said "Study IMPLEMENTATION_PLAN.md and work on ONE task", which directly conflicted with the --design prompt's "Ignore IMPLEMENTATION_PLAN.md — this is a visual fix pass." The preamble appeared first and won, confusing the agent. Add a skipPlanInstructions option that replaces plan-related rules with "This is a fix/review pass" when active. It is set by fix --design.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix --design: 7 → 5 (the 5-phase structure should complete in 3-4 iterations)
isDesignTask: 5 → 4 (visual tasks with keyword detection)

Reduces worst-case wall time from 35 min to 25 min (at the default 5-minute agent timeout per iteration).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Each iteration now appends a summary to .ralph/iteration-log.md with status (validation passed/failed), whether files changed, and agent summary text. On iterations 2+, the last 3 entries are included in the prompt as "## Previous Iterations" so the agent knows what was already tried and can avoid repeating failed approaches. This is a lightweight alternative to full session continuity (--resume), which is deferred to 0.3.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
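
A sketch of how the last three log entries could be turned into the prompt section. The entry fields mirror this commit message (validation status, files changed, summary), but the Markdown layout and function names are assumptions; appending `formatEntry()` output to `.ralph/iteration-log.md` is left out.

```typescript
// Hypothetical sketch; only "last 3 entries" and the heading come from the PR.
interface IterationLogEntry {
  iteration: number;
  validationPassed: boolean;
  filesChanged: boolean;
  summary: string;
}

function formatEntry(e: IterationLogEntry): string {
  return [
    `### Iteration ${e.iteration}`,
    `- Validation: ${e.validationPassed ? "passed" : "failed"}`,
    `- Files changed: ${e.filesChanged ? "yes" : "no"}`,
    `- Summary: ${e.summary}`,
  ].join("\n");
}

function previousIterationsSection(log: IterationLogEntry[], currentIteration: number): string {
  // Iteration 1 has no history to show.
  if (currentIteration < 2 || log.length === 0) return "";
  const recent = log.slice(-3); // only the last 3 entries
  return ["## Previous Iterations", ...recent.map(formatEntry)].join("\n\n");
}
```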

- Loop header shows "Design Fix", "Fix", or the agent name based on fixMode, instead of always showing "Running Claude Code"
- Subtask tree renders below the header when the current task has subtasks:
  [x] Create hero component
  [ ] Add responsive styles
- Add fixMode option to LoopOptions ('design' | 'scan' | 'custom')

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

After each iteration in design mode, check ports 3000/5173/4321/8080 for orphaned dev server processes and SIGTERM them. This prevents resource leaks when the agent crashes or times out without cleaning up the dev server it started for visual verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add lastValidationFeedback field to SessionState so that when a session is paused and later resumed, the agent gets the last validation errors as context. The resume command now passes this as initialValidationFeedback to runLoop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add uiLibrary as an optional field in TechStack to support UI component library selection (shadcn/ui, shadcn-vue, shadcn-svelte, MUI, Chakra). Updated normalizeTechStack, hasTechStack, and the wizard summary display.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…b projects

When no UI library/styling is specified, web projects now default to:
- Tailwind CSS for styling
- shadcn/ui (React/Next.js), shadcn-vue (Vue), or shadcn-svelte (Svelte)

Updated REFINEMENT_PROMPT to include the uiLibrary field and guidance for the LLM to suggest this default stack. The template fallback also sets these defaults when the LLM is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add uiLibrary to spec display (A4)
- Add Tailwind v4 setup instructions to AGENTS.md, including a cascade layers warning and explicit "no manual CSS resets" guidance (B1)
- Add shadcn/ui + motion-primitives setup instructions to AGENTS.md (B1)
- Add Setup Notes section to spec with Tailwind v4 + UI library details to prevent CSS cascade conflicts (B2)
- Add formatTech entries for shadcn, MUI, Chakra, motion-primitives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… check

- C1: Require DESIGN_VERIFIED completion token for design mode; disable legacy "All tasks completed" markers via requireExitSignal
- C2: Update Phases 4-5 to instruct the agent to emit DESIGN_VERIFIED only after taking verification screenshots
- C3: Increase default design iterations from 5 to 7 for fix+verify cycles
- C4: Add CSS cascade conflict check as priority 0 in Phase 2, detecting the "spacing broken + colors working" pattern caused by unlayered CSS overriding Tailwind v4 @layer utilities

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ruction

- D1: Credit screenshot/viewport activity as productive progress in design mode, preventing the stall detector from killing analysis iterations
- D2: Suppress the "All tasks completed" instruction for design mode (skipPlanInstructions=true), replacing it with "Follow the completion instructions in the task below" to avoid conflicting with DESIGN_VERIFIED

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-02-14T00:59:31Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

rubenmarcus · 2026-02-14T01:02:09Z

https://github.com/greptileai review

rubenmarcus and others added 24 commits February 12, 2026 14:10

fix: use full Ralph ASCII art in run command instead of compact version

ca1852c

The compact RALPH_WELCOME_SMALL looked out of place compared to the full RALPH_FULL art used in the wizard. Use showWelcome() consistently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'fix/loop-validation-improvements' into test/round2-improvements

de405de

Conflicts: src/loop/executor.ts

Merge branch 'fix/loop-early-termination' into test/round2-improvements

9484927

Merge branch 'feat/prompt-engineering-playbook' into test/round2-improvements

4cec9ba

Conflicts: src/loop/context-builder.ts

Merge branch 'fix/loop-ux-improvements' into test/round2-improvements

54922fa

Conflicts: src/commands/run.ts

Merge branch 'fix/loop-speed-validation-feedback' into test/round2-improvements

0a4ada6

Merge branch 'fix/loop-bug-fixes' into test/round2-improvements

e2a8e7e

Conflicts: src/loop/executor.ts

Merge branch 'perf/loop-performance' into test/round2-improvements

3db52aa

Merge branch 'feat/loop-safety-ux' into test/round2-improvements

f9b3a07

Conflicts: src/loop/executor.ts

github-actions bot assigned rubenmarcus Feb 13, 2026

github-actions bot added the candidate-release (PR is ready for release), core, documentation, refactor, and tests labels Feb 13, 2026

greptile-apps bot reviewed Feb 13, 2026


src/loop/executor.ts (resolved)

src/loop/executor.ts (resolved)

rubenmarcus and others added 5 commits February 13, 2026 23:02

feat(skills): add maxSkills parameter to formatSkillsForPrompt

3c941a2

Allow callers to cap the number of skills included in the prompt. Used by the --design flag to limit to 3-4 focused design skills instead of the default 5. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

rubenmarcus and others added 13 commits February 13, 2026 23:54

docs: add --design flag, UI defaults, and changelog for beta.17

5d65a00

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

rubenmarcus removed the candidate-release label (PR is ready for release) Feb 14, 2026

rubenmarcus merged commit 7ade690 into main Feb 14, 2026
13 checks passed

rubenmarcus merged 54 commits into main from test/round2-improvements

rubenmarcus commented Feb 13, 2026

greptile-apps bot left a comment

chatgpt-codex-connector bot commented Feb 13, 2026

chatgpt-codex-connector bot commented Feb 13, 2026

chatgpt-codex-connector bot commented Feb 13, 2026

chatgpt-codex-connector bot commented Feb 13, 2026

chatgpt-codex-connector bot commented Feb 14, 2026

rubenmarcus commented Feb 14, 2026

1 participant
