diff --git a/CLAUDE.md b/CLAUDE.md
index f95bb927..9c848190 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -77,6 +77,120 @@ Only update CLAUDE.md if you have **genuinely reusable knowledge** that would he
- Keep changes focused and minimal
- Follow existing code patterns
+## Mandatory Quality Gates (Backpressure)
+
+Quality gates are **mandatory blockers**, not suggestions. You MUST NOT mark a story as complete until ALL gates pass.
+
+### Required Gates
+
+Before marking ANY story as `passes: true`, you MUST verify:
+
+1. **Typecheck MUST pass** - Run `npm run build` (or project equivalent) with zero errors
+2. **Lint MUST pass** - Run `npm run lint` (or project equivalent) with zero errors
+3. **Tests MUST pass** - Run `npm test` (or project equivalent) with zero failures
+
+If ANY gate fails, the story is NOT complete. Period.
+
+### Forbidden Shortcuts
+
+Never use these to bypass quality gates:
+
+| Forbidden | Why |
+|-----------|-----|
+| `@ts-ignore` | Hides type errors instead of fixing them |
+| `@ts-expect-error` | Same as above - masks real problems |
+| `eslint-disable` | Suppresses lint rules without fixing violations |
+| `eslint-disable-next-line` | Same as above - circumvents quality checks |
+| `// @ts-nocheck` | Disables type checking for the entire file |
+| `any` type | Defeats the purpose of TypeScript |
+
+If you find yourself reaching for these, STOP. Fix the actual issue.
+
+### 3-Attempt Limit
+
+If you cannot make a story pass quality gates after 3 attempts:
+
+1. **STOP** - Do not continue iterating on the same approach
+2. **Document** - Add detailed notes about what's failing and why
+3. **Skip** - Move to the next story and let a human investigate
+4. **Never** - Do not use forbidden shortcuts to force a pass
+
+This prevents infinite loops on fundamentally blocked stories. ralph.sh enforces a separate, higher cap (`MAX_ATTEMPTS_PER_STORY`, default 5) as a backstop.
+
+### Backpressure Mindset
+
+Think of quality gates as physical barriers, not speed bumps:
+- A speed bump slows you down but lets you pass
+- A barrier stops you completely until you have the right key
+
+You cannot "push through" a failing gate. You must fix it or stop.
+
+## Verification Before Completion
+
+Before claiming ANY story is complete, you MUST verify your work systematically. Do not trust your memory or assumptions—run the checks.
+
+### Verification Checklist
+
+Before marking a story as `passes: true`, complete this checklist:
+
+```
+## Verification Checklist for [Story ID]
+
+### 1. Acceptance Criteria Check
+- [ ] Criterion 1: [How verified - command/file check/grep]
+- [ ] Criterion 2: [How verified]
+- [ ] Criterion 3: [How verified]
+... (one checkbox per criterion)
+
+### 2. Quality Gates
+- [ ] Typecheck passes: `npm run build` (or equivalent)
+- [ ] Lint passes: `npm run lint` (or equivalent)
+- [ ] Tests pass: `npm test` (or equivalent)
+
+### 3. Regression Check
+- [ ] Full test suite passes (not just new tests)
+- [ ] No unrelated failures introduced
+
+### 4. Final Verification
+- [ ] Re-read each acceptance criterion one more time
+- [ ] Confirmed each criterion is met with evidence
+```
+
+### How to Verify Each Criterion
+
+For each acceptance criterion, you must have **evidence**, not just belief:
+
+| Criterion Type | Verification Method |
+|----------------|---------------------|
+| "File X exists" | `ls -la path/to/X` or Read tool |
+| "Contains section Y" | `grep -n "Y" file` or Read tool |
+| "Command succeeds" | Run the command, check exit code |
+| "Output contains Z" | Run command, pipe to grep |
+| "Valid JSON" | `jq . file.json` succeeds |
+
+### Before Outputting COMPLETE
+
+When you believe ALL stories are done and you're about to output `COMPLETE`:
+
+1. **Re-verify the current story** - Run all quality gates one more time
+2. **Check prd.json** - Confirm all stories show `passes: true`
+3. **Run full verification** - `jq '.userStories[] | select(.passes != true) | .id' prd.json` should return nothing (`!= true` also catches stories whose `passes` field is missing or null)
+4. **Only then** output the COMPLETE signal
+
+If ANY verification fails at this stage, do NOT output COMPLETE. Fix the issue first.
+
+### Evidence Over Assertion
+
+Never claim something works without proving it:
+
+| Bad (Assertion) | Good (Evidence) |
+|-----------------|-----------------|
+| "I added the section" | "Verified with `grep -n 'Section Name' file` - found at line 42" |
+| "Tests pass" | "Ran `npm test` - 47 tests passed, 0 failed" |
+| "File is valid JSON" | "Ran `jq . file.json` - parsed successfully" |
+
+Run the command. See the output. Report the evidence.
+
## Browser Testing (If Available)
For any story that changes UI, verify it works in the browser if you have browser testing tools configured (e.g., via MCP):
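A minimal sketch of how the three required gates from this CLAUDE.md change could be chained so that the first failure blocks completion. The function name `run_gates` and the `name=command` argument format are illustrative assumptions, not part of Ralph; the npm commands in the example call are the ones named in the Required Gates list.

```shell
# Hypothetical helper: run quality gates in order and stop at the first failure.
# Each argument has the form "name=command"; the command runs through `bash -c`.
run_gates() {
  local gate name cmd
  for gate in "$@"; do
    name="${gate%%=*}"   # text before the first "="
    cmd="${gate#*=}"     # text after the first "="
    echo "gate: $name"
    if ! bash -c "$cmd"; then
      echo "FAILED: $name (the story stays passes: false)" >&2
      return 1
    fi
  done
  echo "all gates passed"
}

# Example call, commented out so that sourcing this sketch runs nothing:
# run_gates "typecheck=npm run build" "lint=npm run lint" "test=npm test"
```

Because the loop returns on the first non-zero exit code, later gates never run after a failure, which matches the barrier framing: there is no path to "all gates passed" that skips a failing gate.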
diff --git a/README.md b/README.md
index 4db35bc2..73178527 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,16 @@ Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/).
[Read my in-depth article on how I use Ralph](https://x.com/ryancarson/status/2008548371712135632)
+## Security Warning
+
+**Ralph runs AI agents autonomously with full access to your codebase.** Before running:
+
+- **Never expose production credentials** - Ralph could accidentally commit, log, or transmit sensitive values like `AWS_ACCESS_KEY_ID`, `DATABASE_URL`, or API keys
+- **Use sandboxing** - Run Ralph in a Docker container, VM, or isolated sandbox environment to limit potential damage
+- **Review commits before pushing** - Always review what Ralph committed before pushing to remote
+
+See [docs/SECURITY.md](docs/SECURITY.md) for complete security guidance, including pre-flight checklists and emergency stop procedures.
+
## Prerequisites
- One of the following AI coding tools installed and authenticated:
diff --git a/docs/COST_TRACKING.md b/docs/COST_TRACKING.md
new file mode 100644
index 00000000..65904ecb
--- /dev/null
+++ b/docs/COST_TRACKING.md
@@ -0,0 +1,151 @@
+# Cost Tracking Guide
+
+Running autonomous agents like Ralph can incur significant API costs. This guide helps you set budgets, track usage, and prevent runaway costs.
+
+## Budget Recommendations
+
+Set these budget limits before starting any autonomous session:
+
+| Feature Size | Estimated Stories | Recommended Budget | Max Iterations |
+|--------------|-------------------|-------------------|----------------|
+| Small | 1-3 stories | $5-10 | 10 |
+| Medium | 4-8 stories | $15-30 | 25 |
+| Large | 9-15 stories | $40-75 | 50 |
+| XL | 16+ stories | $100+ | 100 |
+
+**Note:** These are estimates. Actual costs depend on story complexity, codebase size, and retry frequency.
+
+## Feature Size to Budget Mapping
+
+Use this guide to estimate budget before starting:
+
+### Small Features ($5-10)
+- Bug fixes with clear reproduction steps
+- Adding a single new field or column
+- Documentation updates
+- Simple configuration changes
+- 1-3 acceptance criteria per story
+
+### Medium Features ($15-30)
+- New API endpoint with tests
+- Adding a new UI component
+- Integration with existing service
+- Refactoring a single module
+- 3-5 acceptance criteria per story
+
+### Large Features ($40-75)
+- New feature spanning multiple files
+- Database migration with data transformation
+- Multi-step workflow implementation
+- Cross-cutting concerns (auth, logging)
+- 5+ acceptance criteria per story
+
+## Claude Code Usage Tracking
+
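+Claude Code can report usage and cost. The commands below are illustrative: subcommand and flag names may differ between versions, so confirm each one with `claude --help` before relying on it.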
+
+### Check Current Usage
+```bash
+# View usage summary for current session
+claude usage
+
+# View detailed usage breakdown
+claude usage --detailed
+```
+
+### Set Budget Limits
+```bash
+# Set a budget limit before starting (prevents overspend)
+claude config set budget_limit 25.00
+
+# Check remaining budget
+claude usage --remaining
+```
+
+### Monitor During Session
+```bash
+# Watch usage in real-time (run in separate terminal)
+watch -n 30 'claude usage'
+```
+
+### Post-Session Analysis
+```bash
+# Export usage report
+claude usage --export > usage-report-$(date +%Y%m%d).json
+
+# Parse costs from report
+cat usage-report-*.json | jq '.total_cost'
+```
+
+## Amp Usage Tracking
+
+Amp (Sourcegraph's AI assistant) tracks usage through Sourcegraph's dashboard:
+
+### Web Dashboard
+1. Navigate to your Sourcegraph instance
+2. Go to **Settings** → **Usage & Billing**
+3. View Amp usage by time period
+
+### CLI Tracking
+```bash
+# Check Amp usage via API (requires auth token; the endpoint path is illustrative, confirm it for your instance)
+curl -H "Authorization: token $SRC_ACCESS_TOKEN" \
+ https://sourcegraph.com/.api/user/usage | jq '.'
+
+# Filter for Amp-specific usage
+curl -H "Authorization: token $SRC_ACCESS_TOKEN" \
+ https://sourcegraph.com/.api/user/usage | jq '.amp'
+```
+
+### Amp Budget Controls
+- Set organization-wide limits in Sourcegraph admin
+- Per-user limits available in enterprise plans
+- Monitor usage alerts via email or Slack integration
+
+## Cost Prevention Strategies
+
+### Before Starting
+1. **Estimate scope** - Map features to budget sizes above
+2. **Set hard limits** - Configure budget caps that stop execution
+3. **Use circuit breakers** - Limit retries per story (see ralph.sh)
+4. **Start small** - Run a pilot with 1-2 stories before full batch
+
+### During Execution
+1. **Monitor actively** - Watch `claude usage` during runs
+2. **Check progress.txt** - Stories with many retries indicate problems
+3. **Stop early** - Kill the session if costs are tracking above budget
+4. **Review prd.json** - Check for stories repeatedly failing
+
+### After Completion
+1. **Export usage** - Save detailed reports for analysis
+2. **Calculate cost per story** - Total cost / stories completed
+3. **Adjust estimates** - Update budget recommendations based on actuals
+4. **Identify expensive patterns** - Stories with many retries cost more
+
+## Red Flags: Cost Warning Signs
+
+Watch for these patterns that indicate escalating costs:
+
+| Red Flag | Likely Cause | Action |
+|----------|--------------|--------|
+| Same story retrying 3+ times | Unclear acceptance criteria | Stop and clarify requirements |
+| Many small commits | Agent thrashing on solution | Review approach |
+| No progress for 5+ iterations | Blocking issue | Stop and investigate |
+| Budget 50% spent, < 25% done | Scope underestimated | Re-evaluate or pause |
+
+## Cost Tracking Checklist
+
+Before each Ralph session:
+
+- [ ] Estimated feature size and set budget
+- [ ] Configured hard budget limit in Claude/Amp
+- [ ] Set max iterations in ralph.sh
+- [ ] Have monitoring terminal ready
+- [ ] Know how to emergency stop
+
+After each Ralph session:
+
+- [ ] Exported usage report
+- [ ] Calculated actual vs estimated cost
+- [ ] Updated budget estimates if needed
+- [ ] Documented expensive stories for future reference
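The cost-per-story formula (total cost / stories completed) and the "budget 50% spent, < 25% done" red flag from this guide can be sketched as two small helpers. The function names and argument order are assumptions made for illustration; `awk` does the arithmetic because Bash itself only handles integers.

```shell
# cost_per_story TOTAL_COST STORIES_COMPLETED
# Prints the average cost per completed story with two decimals, or "n/a" when
# no story has been completed yet.
cost_per_story() {
  awk -v total="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "n/a"; exit }
    printf "%.2f\n", total / n
  }'
}

# budget_red_flag SPENT BUDGET STORIES_DONE STORIES_TOTAL
# Returns 0 (flag raised) when at least 50% of the budget is spent while fewer
# than 25% of the stories are done; returns 1 otherwise.
budget_red_flag() {
  awk -v spent="$1" -v budget="$2" -v n="$3" -v total="$4" 'BEGIN {
    if (budget <= 0 || total <= 0) exit 1
    if (spent / budget >= 0.5 && n / total < 0.25) exit 0
    exit 1
  }'
}
```

For example, `budget_red_flag 13 25 1 8` succeeds (52% of a $25 budget spent, 1 of 8 stories done), which is the point at which the Red Flags table suggests re-evaluating or pausing.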
diff --git a/docs/MONITORING.md b/docs/MONITORING.md
new file mode 100644
index 00000000..8c5a8fb7
--- /dev/null
+++ b/docs/MONITORING.md
@@ -0,0 +1,169 @@
+# Monitoring Guide
+
+This guide helps operators monitor Ralph Wiggum during autonomous runs and know when to intervene.
+
+## Red Flags: When to Intervene
+
+Watch for these patterns that indicate Ralph needs human intervention:
+
+### 1. Repeated Failures on Same Story
+```bash
+# Check progress.txt for repeated story attempts
+grep -c "US-00X" progress.txt
+```
+If the same story ID appears more than 3 times, Ralph is likely stuck.
+
+### 2. Typecheck/Lint Loops
+```bash
+# Watch for repeated error patterns
+tail -f progress.txt | grep -E "(typecheck|lint|error)"
+```
+Repeated cycles of "fixing" the same error indicate a fundamental misunderstanding.
+
+### 3. File Thrashing
+```bash
+# Check git for excessive changes to same file
+git log --oneline --follow -20 -- path/to/file.ts
+```
+Multiple commits to the same file in quick succession suggests trial-and-error debugging.
+
+### 4. Scope Creep
+```bash
+# Check for unexpected file changes
+git diff --stat HEAD~5
+```
+If Ralph is modifying files unrelated to the current story, it may have lost focus.
+
+### 5. Silent Failures
+```bash
+# Check if progress is being made
+ls -la progress.txt
+cat prd.json | jq '.userStories[] | select(.passes == true) | .id'
+```
+If progress.txt hasn't been updated but Ralph is still running, something may be wrong.
+
+### 6. Credential Warnings
+```bash
+# Monitor for any credential-related output
+grep -i -E "(password|secret|key|token|credential)" progress.txt
+```
+Any mention of credentials in logs requires immediate review.
+
+### 7. Network Activity
+```bash
+# Check for unexpected network calls (if using network monitoring)
+lsof -i -P | grep ralph
+```
+Unexpected network activity could indicate Ralph is accessing external services.
+
+## Monitoring Commands
+
+Use these commands to monitor Ralph in real-time:
+
+### Real-Time Progress
+```bash
+# Follow progress updates
+tail -f progress.txt
+
+# Watch for story completions
+watch -n 5 'cat prd.json | jq ".userStories[] | select(.passes == true) | .id"'
+```
+
+### Story Status Dashboard
+```bash
+# Show all story statuses
+cat prd.json | jq -r '.userStories[] | "\(.id): \(.title) - passes: \(.passes)"'
+
+# Count completed vs total
+echo "Completed: $(cat prd.json | jq '[.userStories[] | select(.passes == true)] | length')/$(cat prd.json | jq '.userStories | length')"
+```
+
+### Git Activity
+```bash
+# Watch for new commits
+watch -n 10 'git log --oneline -10'
+
+# Check uncommitted changes
+git status --short
+
+# View recent diffs
+git diff HEAD~1 --stat
+```
+
+### Resource Usage
+```bash
+# Monitor CPU/memory usage (`top -l 1` is macOS syntax; on Linux use `top -b -n 1`)
+top -l 1 | grep -E "(ralph|claude|amp)"
+
+# Check disk usage in project
+du -sh .
+```
+
+## When to Stop and Regenerate Plan
+
+Stop Ralph and regenerate the plan when:
+
+1. **Same error appears 3+ times** - The current approach isn't working
+2. **Story takes more than 5 iterations** - Requirements may be unclear or impossible
+3. **Multiple stories fail in sequence** - There may be a fundamental issue with the plan
+4. **Unexpected side effects** - Ralph is breaking previously working features
+5. **Tests start failing** - Regression indicates architectural problems
+6. **Budget threshold reached** - Cost is exceeding the value of the feature
+
+### How to Stop and Reassess
+
+```bash
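+# 1. Stop Ralph gracefully
+# (the stop file only works if ralph.sh is configured to check for it)
+touch .ralph-stop
+# OR press Ctrl+C in the terminal running ralph.sh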
+
+# 2. Review current state
+git log --oneline -10
+git diff
+tail -50 progress.txt
+
+# 3. Check which stories are problematic
+cat prd.json | jq '.userStories[] | select(.passes == false) | {id, title, notes}'
+
+# 4. Consider if PRD needs revision
+# - Are acceptance criteria clear and achievable?
+# - Are there missing dependencies between stories?
+# - Is the scope realistic?
+```
+
+## Intervention Checklist
+
+Before intervening, run through this checklist:
+
+- [ ] **Is Ralph actually stuck?** - Wait at least 2 minutes for complex operations
+- [ ] **Check the logs** - Review progress.txt for context on what Ralph is attempting
+- [ ] **Review recent commits** - Understand what changes have been made
+- [ ] **Check story notes** - Ralph may have added notes explaining difficulties
+- [ ] **Verify acceptance criteria** - Ensure they are actually achievable
+- [ ] **Check for external dependencies** - Does the story require services Ralph can't access?
+- [ ] **Review error messages** - Are there clear errors indicating the problem?
+- [ ] **Consider partial progress** - Can you help Ralph past a specific blocker?
+
+### Post-Intervention Actions
+
+After intervening:
+
+1. **Document the intervention** - Add a note to progress.txt explaining what you did
+2. **Update story notes** - Add context to prd.json if helpful
+3. **Consider PRD changes** - Split complex stories or clarify criteria if needed
+4. **Restart cleanly** - Ensure Ralph has a clear starting point
+5. **Monitor closely** - Watch the first few iterations after intervention
+
+## Alert Thresholds
+
+Configure these alerts for autonomous monitoring:
+
+| Metric | Warning | Critical |
+|--------|---------|----------|
+| Same story iterations | 3 | 5 |
+| Time on single story | 15 min | 30 min |
+| Consecutive failures | 2 | 3 |
+| Files changed per commit | 10 | 20 |
+| API cost per story | $1 | $5 |
+| Total run cost | $10 | $25 |
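The Alert Thresholds table can be turned into a simple classifier. This is a sketch under assumptions: the helper names are hypothetical, and reaching a threshold exactly is treated as crossing it, which the table does not state explicitly.

```shell
# alert_level VALUE WARNING CRITICAL
# Prints "critical", "warning", or "ok" for a metric measured against its thresholds.
alert_level() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v >= c) print "critical"
    else if (v >= w) print "warning"
    else print "ok"
  }'
}

# story_mentions STORY_ID FILE
# Prints how many lines of FILE mention STORY_ID (fixed-string match).
story_mentions() {
  grep -c -F -- "$1" "$2" || true   # grep exits 1 on zero matches; keep the "0"
}
```

A monitoring loop could combine the two, for instance `alert_level "$(story_mentions US-001 progress.txt)" 3 5`, to apply the "Same story iterations" row to a story ID taken from progress.txt.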
diff --git a/docs/SECURITY.md b/docs/SECURITY.md
new file mode 100644
index 00000000..2694d23f
--- /dev/null
+++ b/docs/SECURITY.md
@@ -0,0 +1,117 @@
+# Security Guide
+
+Ralph Wiggum runs as an autonomous agent with significant system access. This document outlines security best practices to prevent credential exposure and unauthorized actions.
+
+## Mandatory Safeguards
+
+Before running Ralph in any environment, ensure these safeguards are in place:
+
+1. **Never expose production credentials** - Ralph should not have access to production databases, cloud accounts, or API keys
+2. **Use isolated environments** - Run Ralph in sandboxed containers, VMs, or development environments only
+3. **Limit file system access** - Restrict Ralph to the project directory when possible
+4. **Review generated code** - Always review commits before merging to protected branches
+5. **Monitor token usage** - Set budget limits to prevent runaway API costs
+
+## Pre-Flight Security Checklist
+
+Run through this checklist before starting any Ralph session:
+
+- [ ] **Environment Variables Cleared** - Ensure dangerous environment variables are not set:
+ - `AWS_ACCESS_KEY_ID` - AWS credentials could allow cloud resource access
+ - `AWS_SECRET_ACCESS_KEY` - AWS credentials could allow cloud resource access
+ - `DATABASE_URL` - Database connection strings could expose production data
+ - `OPENAI_API_KEY` - Could incur costs on your account
+ - `ANTHROPIC_API_KEY` - Could incur costs on your account
+ - `GITHUB_TOKEN` - Could push to repositories or access private repos
+ - `NPM_TOKEN` - Could publish packages
+ - `DOCKER_PASSWORD` - Could push images
+
+- [ ] **Running in Sandbox** - Confirm you're in a sandboxed environment
+- [ ] **Git Remote Verified** - Ensure pushes go to correct repository
+- [ ] **Branch Protection** - Confirm main/master has branch protection enabled
+- [ ] **Budget Set** - API cost limits configured
+
+## Emergency Stop
+
+If Ralph begins behaving unexpectedly, use these methods to stop execution:
+
+### Immediate Stop
+```bash
+# Kill the ralph.sh process
+pkill -f ralph.sh
+
+# Or find the PID and kill that process specifically
+ps aux | grep '[r]alph.sh'
+kill -9 <PID>  # replace <PID> with the process ID from the output above
+```
+
+### Graceful Stop
+```bash
+# Create a stop file (if ralph.sh is configured to check for it)
+touch .ralph-stop
+
+# Or simply Ctrl+C in the terminal running ralph.sh
+```
+
+### Post-Emergency Checklist
+1. Review git log for any unexpected commits
+2. Check git diff for uncommitted changes
+3. Review any files created or modified
+4. Check cloud console for any unexpected resources
+5. Rotate any credentials that may have been exposed
+
+## Docker Sandboxing
+
+Running Ralph in Docker provides isolation from your host system:
+
+```dockerfile
+# Dockerfile.ralph
+FROM node:20-slim
+
+# Install required tools
+RUN apt-get update && apt-get install -y \
+ git \
+ jq \
+ curl \
+ && rm -rf /var/lib/apt/lists/*
+
+# Create non-root user
+RUN useradd -m -s /bin/bash ralph
+USER ralph
+WORKDIR /home/ralph/workspace
+
+# Copy only necessary files
+COPY --chown=ralph:ralph . .
+
+# Don't include any credentials in the image
+# Pass API keys at runtime only
+```
+
+```bash
+# Build the sandbox
+docker build -f Dockerfile.ralph -t ralph-sandbox .
+
+# Run with minimal permissions
+docker run -it --rm \
+ --network=none \
+ --read-only \
+ --tmpfs /tmp \
+ -v "$(pwd)":/home/ralph/workspace \
+ -e ANTHROPIC_API_KEY \
+ ralph-sandbox \
+ ./ralph.sh
+```
+
+### Docker Security Options
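+- `--network=none` - Prevents all network access, including calls to the model API. Use it only for offline dry runs; a real Ralph session needs outbound access to its AI provider, so replace this flag with a restricted network (for example, an egress allowlist)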
+- `--read-only` - Makes container filesystem read-only
+- `--tmpfs /tmp` - Provides writable temp directory
+- Mount only the project directory, not your entire home folder
+
+## Additional Recommendations
+
+1. **Use separate API keys** - Create dedicated API keys for Ralph with lower rate limits
+2. **Enable audit logging** - Log all commands Ralph executes for review
+3. **Set up alerts** - Configure cost alerts in your cloud provider dashboards
+4. **Regular credential rotation** - Rotate any credentials that have been in the environment
+5. **Review before merge** - Never auto-merge Ralph's PRs without human review
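The pre-flight checklist in this guide names eight environment variables, while the check added to ralph.sh covers only two of them. A sketch of a check over the full list follows; the array and function names are hypothetical, and the helper prints variable names only, never values.

```shell
# Variable names copied from the Pre-Flight Security Checklist.
SENSITIVE_VARS=(
  AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY DATABASE_URL OPENAI_API_KEY
  ANTHROPIC_API_KEY GITHUB_TOKEN NPM_TOKEN DOCKER_PASSWORD
)

# list_set_credentials
# Prints the name of each listed variable that is set and non-empty.
list_set_credentials() {
  local name
  for name in "${SENSITIVE_VARS[@]}"; do
    # ${!name} is Bash indirect expansion: the value of the variable named by $name.
    if [[ -n "${!name:-}" ]]; then
      echo "$name"
    fi
  done
}
```

A caller could abort when `list_set_credentials` prints anything. If the chosen tool genuinely needs one of these keys at runtime (for example `ANTHROPIC_API_KEY` in the Docker example), remove that name from the array rather than skipping the whole check.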
diff --git a/prd.json.example b/prd.json.example
index fbc40668..ded93d0a 100644
--- a/prd.json.example
+++ b/prd.json.example
@@ -2,6 +2,11 @@
"project": "MyApp",
"branchName": "ralph/task-priority",
"description": "Task Priority System - Add priority levels to tasks",
+ "verificationCommands": {
+ "typecheck": "npm run build",
+ "lint": "npm run lint",
+ "test": "npm test"
+ },
"userStories": [
{
"id": "US-001",
diff --git a/prompt.md b/prompt.md
index cdebe901..b7ad678b 100644
--- a/prompt.md
+++ b/prompt.md
@@ -80,6 +80,120 @@ Only update AGENTS.md if you have **genuinely reusable knowledge** that would he
- Keep changes focused and minimal
- Follow existing code patterns
+## Mandatory Quality Gates (Backpressure)
+
+Quality gates are **mandatory blockers**, not suggestions. You MUST NOT mark a story as complete until ALL gates pass.
+
+### Required Gates
+
+Before marking ANY story as `passes: true`, you MUST verify:
+
+1. **Typecheck MUST pass** - Run `npm run build` (or project equivalent) with zero errors
+2. **Lint MUST pass** - Run `npm run lint` (or project equivalent) with zero errors
+3. **Tests MUST pass** - Run `npm test` (or project equivalent) with zero failures
+
+If ANY gate fails, the story is NOT complete. Period.
+
+### Forbidden Shortcuts
+
+Never use these to bypass quality gates:
+
+| Forbidden | Why |
+|-----------|-----|
+| `@ts-ignore` | Hides type errors instead of fixing them |
+| `@ts-expect-error` | Same as above - masks real problems |
+| `eslint-disable` | Suppresses lint rules without fixing violations |
+| `eslint-disable-next-line` | Same as above - circumvents quality checks |
+| `// @ts-nocheck` | Disables type checking for the entire file |
+| `any` type | Defeats the purpose of TypeScript |
+
+If you find yourself reaching for these, STOP. Fix the actual issue.
+
+### 3-Attempt Limit
+
+If you cannot make a story pass quality gates after 3 attempts:
+
+1. **STOP** - Do not continue iterating on the same approach
+2. **Document** - Add detailed notes about what's failing and why
+3. **Skip** - Move to the next story and let a human investigate
+4. **Never** - Do not use forbidden shortcuts to force a pass
+
+This prevents infinite loops on fundamentally blocked stories. ralph.sh enforces a separate, higher cap (`MAX_ATTEMPTS_PER_STORY`, default 5) as a backstop.
+
+### Backpressure Mindset
+
+Think of quality gates as physical barriers, not speed bumps:
+- A speed bump slows you down but lets you pass
+- A barrier stops you completely until you have the right key
+
+You cannot "push through" a failing gate. You must fix it or stop.
+
+## Verification Before Completion
+
+Before claiming ANY story is complete, you MUST verify your work systematically. Do not trust your memory or assumptions—run the checks.
+
+### Verification Checklist
+
+Before marking a story as `passes: true`, complete this checklist:
+
+```
+## Verification Checklist for [Story ID]
+
+### 1. Acceptance Criteria Check
+- [ ] Criterion 1: [How verified - command/file check/grep]
+- [ ] Criterion 2: [How verified]
+- [ ] Criterion 3: [How verified]
+... (one checkbox per criterion)
+
+### 2. Quality Gates
+- [ ] Typecheck passes: `npm run build` (or equivalent)
+- [ ] Lint passes: `npm run lint` (or equivalent)
+- [ ] Tests pass: `npm test` (or equivalent)
+
+### 3. Regression Check
+- [ ] Full test suite passes (not just new tests)
+- [ ] No unrelated failures introduced
+
+### 4. Final Verification
+- [ ] Re-read each acceptance criterion one more time
+- [ ] Confirmed each criterion is met with evidence
+```
+
+### How to Verify Each Criterion
+
+For each acceptance criterion, you must have **evidence**, not just belief:
+
+| Criterion Type | Verification Method |
+|----------------|---------------------|
+| "File X exists" | `ls -la path/to/X` or Read tool |
+| "Contains section Y" | `grep -n "Y" file` or Read tool |
+| "Command succeeds" | Run the command, check exit code |
+| "Output contains Z" | Run command, pipe to grep |
+| "Valid JSON" | `jq . file.json` succeeds |
+
+### Before Outputting COMPLETE
+
+When you believe ALL stories are done and you're about to output `COMPLETE`:
+
+1. **Re-verify the current story** - Run all quality gates one more time
+2. **Check prd.json** - Confirm all stories show `passes: true`
+3. **Run full verification** - `jq '.userStories[] | select(.passes != true) | .id' prd.json` should return nothing (`!= true` also catches stories whose `passes` field is missing or null)
+4. **Only then** output the COMPLETE signal
+
+If ANY verification fails at this stage, do NOT output COMPLETE. Fix the issue first.
+
+### Evidence Over Assertion
+
+Never claim something works without proving it:
+
+| Bad (Assertion) | Good (Evidence) |
+|-----------------|-----------------|
+| "I added the section" | "Verified with `grep -n 'Section Name' file` - found at line 42" |
+| "Tests pass" | "Ran `npm test` - 47 tests passed, 0 failed" |
+| "File is valid JSON" | "Ran `jq . file.json` - parsed successfully" |
+
+Run the command. See the output. Report the evidence.
+
## Browser Testing (Required for Frontend Stories)
For any story that changes UI, you MUST verify it works in the browser:
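The "(or project equivalent)" commands in the quality gates above could be resolved from the `verificationCommands` object that this change adds to prd.json.example. The helper below is a sketch: its name is hypothetical, and nothing in this diff shows ralph.sh or the prompt reading that object yet.

```shell
# read_verification_command PRD_FILE GATE
# Prints the command configured for GATE ("typecheck", "lint", or "test"),
# or nothing when the key or the whole verificationCommands object is absent.
read_verification_command() {
  jq -r --arg gate "$2" '.verificationCommands[$gate] // empty' "$1"
}

# Example, commented out so that sourcing this sketch runs nothing:
# lint_cmd=$(read_verification_command prd.json lint)
# bash -c "${lint_cmd:-npm run lint}"
```

Falling back to the npm defaults when the key is missing keeps older prd.json files working without the new object.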
diff --git a/ralph.sh b/ralph.sh
index baff052a..6bec1924 100755
--- a/ralph.sh
+++ b/ralph.sh
@@ -7,6 +7,8 @@ set -e
# Parse arguments
TOOL="amp" # Default to amp for backwards compatibility
MAX_ITERATIONS=10
+MAX_ATTEMPTS_PER_STORY="${MAX_ATTEMPTS_PER_STORY:-5}"
+SKIP_SECURITY="${SKIP_SECURITY_CHECK:-false}"
while [[ $# -gt 0 ]]; do
case $1 in
@@ -18,6 +20,10 @@ while [[ $# -gt 0 ]]; do
TOOL="${1#*=}"
shift
;;
+ --skip-security-check)
+ SKIP_SECURITY="true"
+ shift
+ ;;
*)
# Assume it's max_iterations if it's a number
if [[ "$1" =~ ^[0-9]+$ ]]; then
@@ -33,6 +39,49 @@ if [[ "$TOOL" != "amp" && "$TOOL" != "claude" ]]; then
echo "Error: Invalid tool '$TOOL'. Must be 'amp' or 'claude'."
exit 1
fi
+
+# Security Pre-Flight Check
+if [[ "$SKIP_SECURITY" != "true" ]]; then
+ echo ""
+ echo "==============================================================="
+ echo " Security Pre-Flight Check"
+ echo "==============================================================="
+ echo ""
+
+ SECURITY_WARNINGS=()
+
+ if [[ -n "${AWS_ACCESS_KEY_ID:-}" ]]; then
+ SECURITY_WARNINGS+=("AWS_ACCESS_KEY_ID is set - production credentials may be exposed")
+ fi
+
+ if [[ -n "${DATABASE_URL:-}" ]]; then
+ SECURITY_WARNINGS+=("DATABASE_URL is set - database credentials may be exposed")
+ fi
+
+ if [[ ${#SECURITY_WARNINGS[@]} -gt 0 ]]; then
+ echo "WARNING: Potential credential exposure detected:"
+ echo ""
+ for warning in "${SECURITY_WARNINGS[@]}"; do
+ echo " - $warning"
+ done
+ echo ""
+ echo "Running an autonomous agent with these credentials set could expose"
+ echo "them in logs, commit messages, or API calls."
+ echo ""
+ echo "See docs/SECURITY.md for sandboxing guidance."
+ echo ""
+ read -p "Continue anyway? (y/N) " -n 1 -r
+ echo ""
+ if [[ ! $REPLY =~ ^[Yy]$ ]]; then
+ echo "Aborted. Unset credentials or use --skip-security-check to bypass."
+ exit 1
+ fi
+ else
+ echo "No credential exposure risks detected."
+ fi
+ echo ""
+fi
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PRD_FILE="$SCRIPT_DIR/prd.json"
PROGRESS_FILE="$SCRIPT_DIR/progress.txt"
@@ -79,7 +128,62 @@ if [ ! -f "$PROGRESS_FILE" ]; then
echo "---" >> "$PROGRESS_FILE"
fi
-echo "Starting Ralph - Tool: $TOOL - Max iterations: $MAX_ITERATIONS"
+# Circuit breaker: track attempts per story
+ATTEMPTS_FILE="$SCRIPT_DIR/.story-attempts"
+LAST_STORY_FILE="$SCRIPT_DIR/.last-story"
+
+# Initialize attempts tracking
+if [ ! -f "$ATTEMPTS_FILE" ]; then
+ echo "{}" > "$ATTEMPTS_FILE"
+fi
+
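+# Function to get current story being worked on (first incomplete story the circuit breaker has not skipped)
+get_current_story() {
+ if [ -f "$PRD_FILE" ]; then
+ jq -r '.userStories[] | select(.passes == false) | select((.notes // "") | startswith("Skipped:") | not) | .id' "$PRD_FILE" 2>/dev/null | head -1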
+ fi
+}
+
+# Function to get attempts for a story
+get_story_attempts() {
+ local story_id="$1"
+ jq -r --arg id "$story_id" '.[$id] // 0' "$ATTEMPTS_FILE" 2>/dev/null || echo "0"
+}
+
+# Function to increment attempts for a story
+increment_story_attempts() {
+ local story_id="$1"
+ local current=$(get_story_attempts "$story_id")
+ local new_count=$((current + 1))
+ jq --arg id "$story_id" --argjson count "$new_count" '.[$id] = $count' "$ATTEMPTS_FILE" > "$ATTEMPTS_FILE.tmp" && mv "$ATTEMPTS_FILE.tmp" "$ATTEMPTS_FILE"
+ echo "$new_count"
+}
+
+# Function to mark story as skipped due to max attempts
+mark_story_skipped() {
+ local story_id="$1"
+ local max_attempts="$2"
+ local note="Skipped: exceeded $max_attempts attempts without passing"
+ jq --arg id "$story_id" --arg note "$note" '
+ .userStories = [.userStories[] | if .id == $id then .notes = $note else . end]
+ ' "$PRD_FILE" > "$PRD_FILE.tmp" && mv "$PRD_FILE.tmp" "$PRD_FILE"
+ echo "Circuit breaker: Marked story $story_id as skipped after $max_attempts attempts"
+}
+
+# Function to check and apply circuit breaker
+check_circuit_breaker() {
+ local story_id="$1"
+ local attempts=$(get_story_attempts "$story_id")
+
+ if [ "$attempts" -ge "$MAX_ATTEMPTS_PER_STORY" ]; then
+ echo "Circuit breaker: Story $story_id has reached max attempts ($attempts/$MAX_ATTEMPTS_PER_STORY)"
+ mark_story_skipped "$story_id" "$MAX_ATTEMPTS_PER_STORY"
+ return 0 # true - circuit breaker tripped
+ fi
+ return 1 # false - circuit breaker not tripped
+}
+
+echo "Starting Ralph - Tool: $TOOL - Max iterations: $MAX_ITERATIONS - Max attempts per story: $MAX_ATTEMPTS_PER_STORY"
for i in $(seq 1 $MAX_ITERATIONS); do
echo ""
@@ -87,6 +191,42 @@ for i in $(seq 1 $MAX_ITERATIONS); do
echo " Ralph Iteration $i of $MAX_ITERATIONS ($TOOL)"
echo "==============================================================="
+ # Get current story and check circuit breaker
+ CURRENT_STORY=$(get_current_story)
+
+ if [ -n "$CURRENT_STORY" ]; then
+ # Check if this is the same story as last iteration (consecutive failure detection)
+ LAST_STORY=""
+ if [ -f "$LAST_STORY_FILE" ]; then
+ LAST_STORY=$(cat "$LAST_STORY_FILE" 2>/dev/null || echo "")
+ fi
+
+ if [ "$CURRENT_STORY" == "$LAST_STORY" ]; then
+ echo "Consecutive attempt on story: $CURRENT_STORY"
+ ATTEMPTS=$(increment_story_attempts "$CURRENT_STORY")
+ echo "Attempts on $CURRENT_STORY: $ATTEMPTS/$MAX_ATTEMPTS_PER_STORY"
+
+ # Check circuit breaker
+ if check_circuit_breaker "$CURRENT_STORY"; then
+ echo "Skipping to next story..."
+ echo "$CURRENT_STORY" > "$LAST_STORY_FILE"
+ sleep 1
+ continue
+ fi
+ else
+ # New story, record first attempt
+ if [ -n "$CURRENT_STORY" ]; then
+ ATTEMPTS=$(increment_story_attempts "$CURRENT_STORY")
+ echo "Starting story: $CURRENT_STORY (attempt $ATTEMPTS/$MAX_ATTEMPTS_PER_STORY)"
+ fi
+ fi
+
+ # Record current story for next iteration
+ echo "$CURRENT_STORY" > "$LAST_STORY_FILE"
+ else
+ echo "No incomplete stories found"
+ fi
+
# Run the selected tool with the ralph prompt
if [[ "$TOOL" == "amp" ]]; then
OUTPUT=$(cat "$SCRIPT_DIR/prompt.md" | amp --dangerously-allow-all 2>&1 | tee /dev/stderr) || true
@@ -98,9 +238,27 @@ for i in $(seq 1 $MAX_ITERATIONS); do
# Check for completion signal
if echo "$OUTPUT" | grep -q "COMPLETE"; then
echo ""
- echo "Ralph completed all tasks!"
- echo "Completed at iteration $i of $MAX_ITERATIONS"
- exit 0
+ echo "COMPLETE signal received. Verifying all stories pass..."
+
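+ # Verify all stories actually have passes:true (a missing or null passes field, or an unreadable prd.json, counts as incomplete)
+ INCOMPLETE_STORIES=$(jq -r '.userStories[] | select(.passes != true) | .id' "$PRD_FILE" 2>/dev/null || echo "(prd.json could not be parsed)")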
+
+ if [ -z "$INCOMPLETE_STORIES" ]; then
+ echo "Verification passed: All stories have passes:true"
+ echo ""
+ echo "Ralph completed all tasks!"
+ echo "Completed at iteration $i of $MAX_ITERATIONS"
+ exit 0
+ else
+ echo ""
+ echo "WARNING: COMPLETE claimed but verification failed!"
+ echo "The following stories still have passes:false:"
+ echo "$INCOMPLETE_STORIES" | while read -r story_id; do
+ echo " - $story_id"
+ done
+ echo ""
+ echo "Continuing iteration to fix incomplete stories..."
+ fi
fi
echo "Iteration $i complete. Continuing..."
diff --git a/skills/planning/SKILL.md b/skills/planning/SKILL.md
new file mode 100644
index 00000000..03e5bdbc
--- /dev/null
+++ b/skills/planning/SKILL.md
@@ -0,0 +1,343 @@
+---
+name: planning
+description: "Deep requirements exploration before creating a PRD. Use when starting any new feature to ensure requirements are fully understood. Triggers on: plan this feature, explore requirements, planning session, before I write a prd."
+---
+
+# Planning Skill
+
+Forces deep requirements exploration through 5 mandatory question rounds before you can create a PRD. This prevents under-specified features and wasted implementation cycles.
+
+---
+
+## The Job
+
+1. Conduct 5 rounds of questions with the user
+2. Document answers in a planning summary
+3. Save output to `tasks/planning-[feature].md`
+4. Only then can you proceed to PRD creation
+
+**Important:** You cannot skip rounds. All 5 rounds must be completed before moving to PRD.
+
+---
+
+## Completion Gate
+
+**You MUST complete all 5 rounds before this skill is considered complete.**
+
+After each round, explicitly state:
+```
+Round [N] complete. [5-N] rounds remaining.
+```
+
+Do NOT proceed to PRD creation until you have stated:
+```
+Round 5 complete. Planning session finished.
+```
+
+If the user asks to skip rounds or rush to implementation, remind them:
+> "Planning requires all 5 rounds. Skipping leads to incomplete requirements and rework. Which question should we tackle next?"
+
+---
+
+## Round 1: Problem Understanding
+
+**Goal:** Understand WHAT problem we're solving and WHY it matters.
+
+Ask questions about:
+- What problem does this solve?
+- Who experiences this problem?
+- What happens today without this feature?
+- What pain points does this address?
+- Why is this important now?
+
+### Example Questions:
+
+```
+1. What specific problem are we trying to solve?
+ A. Users cannot do X at all
+ B. Users can do X but it's slow/painful
+ C. Users frequently make mistakes doing X
+ D. Other: [please specify]
+
+2. Who experiences this problem most acutely?
+ A. New users during onboarding
+ B. Power users doing advanced tasks
+ C. All users equally
+ D. Internal team members
+
+3. What happens today when users encounter this problem?
+ A. They work around it manually
+ B. They contact support
+ C. They abandon the task
+ D. They use a competitor
+```
+
+After gathering answers, summarize:
+```
+## Round 1 Summary: Problem Understanding
+- Problem: [concise problem statement]
+- Affected users: [who]
+- Current workaround: [what they do now]
+- Impact: [why it matters]
+```
+
+**Round 1 complete. 4 rounds remaining.**
+
+---
+
+## Round 2: Scope Definition
+
+**Goal:** Define the boundaries of what we WILL and WON'T build.
+
+Ask questions about:
+- What is the minimum viable solution?
+- What would a full-featured version include?
+- What is explicitly out of scope?
+- What are the must-haves vs nice-to-haves?
+- What adjacent features should we NOT touch?
+
+### Example Questions:
+
+```
+1. What is the minimum viable version of this feature?
+ A. Just the core functionality, no polish
+ B. Core + basic UI polish
+ C. Full feature set with advanced options
+ D. Let me describe: [specify]
+
+2. Which of these are must-haves vs nice-to-haves?
+ [List potential features, ask user to categorize]
+
+3. What should this feature explicitly NOT do?
+ A. No integration with external services
+ B. No admin configuration options
+ C. No mobile-specific features
+ D. Other: [specify]
+
+4. Are there adjacent features we should leave alone?
+ A. Yes: [list them]
+ B. No, we can modify anything needed
+```
+
+After gathering answers, summarize:
+```
+## Round 2 Summary: Scope Definition
+- MVP includes: [list]
+- Nice-to-haves (not MVP): [list]
+- Explicitly out of scope: [list]
+- Do not touch: [list of adjacent features to avoid]
+```
+
+**Round 2 complete. 3 rounds remaining.**
+
+---
+
+## Round 3: Technical Constraints
+
+**Goal:** Identify technical limitations, dependencies, and architecture requirements.
+
+Ask questions about:
+- What existing systems does this touch?
+- What database changes are needed?
+- What API changes are needed?
+- Are there performance requirements?
+- Are there security considerations?
+- What dependencies exist?
+
+### Example Questions:
+
+```
+1. What existing systems will this feature interact with?
+ A. Database only
+ B. Database + existing API endpoints
+ C. Database + API + external services
+ D. Let me list: [specify]
+
+2. Are there performance requirements?
+ A. Must handle X requests per second
+ B. Must respond within X milliseconds
+ C. No specific requirements
+ D. Other: [specify]
+
+3. Are there security considerations?
+ A. Handles sensitive user data
+ B. Requires authentication checks
+ C. Needs rate limiting
+ D. No special security needs
+
+4. What existing code patterns should we follow?
+ A. Follow existing patterns in [module]
+ B. This is a new pattern for the codebase
+ C. Not sure, needs investigation
+```
+
+After gathering answers, summarize:
+```
+## Round 3 Summary: Technical Constraints
+- Systems affected: [list]
+- Database changes: [yes/no, what]
+- API changes: [yes/no, what]
+- Performance requirements: [list]
+- Security considerations: [list]
+- Patterns to follow: [reference]
+```
+
+**Round 3 complete. 2 rounds remaining.**
+
+---
+
+## Round 4: Edge Cases
+
+**Goal:** Identify what could go wrong and how to handle it.
+
+Ask questions about:
+- What happens when X fails?
+- What if the user does Y unexpectedly?
+- What about empty states?
+- What about error states?
+- What about concurrent operations?
+- What about data migration for existing users?
+
+### Example Questions:
+
+```
+1. What should happen when [primary action] fails?
+ A. Show error message and let user retry
+ B. Automatically retry X times
+ C. Fall back to [alternative behavior]
+ D. Other: [specify]
+
+2. What about empty states (no data yet)?
+ A. Show helpful empty state with CTA
+ B. Show nothing
+ C. Show sample/demo data
+ D. Other: [specify]
+
+3. What about existing users/data?
+ A. Migration needed for existing data
+ B. Feature only applies to new data
+ C. Backfill existing data automatically
+ D. Let users manually migrate
+
+4. What if user does something unexpected?
+ [List specific unexpected behaviors and ask how to handle]
+```
+
+After gathering answers, summarize:
+```
+## Round 4 Summary: Edge Cases
+- Error handling: [approach]
+- Empty states: [approach]
+- Data migration: [approach]
+- Unexpected user behavior: [list with handling]
+- Concurrent operations: [approach]
+```
+
+**Round 4 complete. 1 round remaining.**
+
+---
+
+## Round 5: Verification Strategy
+
+**Goal:** Define how we'll know the feature works correctly.
+
+Ask questions about:
+- How will we test this feature?
+- What manual testing is needed?
+- What automated tests should exist?
+- How do we verify in production?
+- What metrics indicate success?
+- What could we monitor for issues?
+
+### Example Questions:
+
+```
+1. What automated tests should cover this feature?
+ A. Unit tests for core logic
+ B. Integration tests for API endpoints
+ C. E2E tests for user flows
+ D. All of the above
+
+2. What manual testing is required?
+ A. Visual inspection of UI changes
+ B. Testing edge cases in browser
+ C. Testing with different user roles
+ D. List specific scenarios: [specify]
+
+3. How do we know this feature is successful in production?
+ A. Users complete [action] X% more often
+ B. Support tickets about [topic] decrease
+ C. Feature adoption reaches X%
+ D. Other metrics: [specify]
+
+4. What should we monitor for issues?
+ A. Error rates on new endpoints
+ B. Performance metrics
+ C. User feedback/complaints
+ D. All of the above
+```
+
+After gathering answers, summarize:
+```
+## Round 5 Summary: Verification Strategy
+- Automated tests: [list]
+- Manual testing: [list]
+- Success metrics: [list]
+- Monitoring: [list]
+```
+
+**Round 5 complete. Planning session finished.**
+
+---
+
+## Output Format
+
+After all 5 rounds, compile the summaries into `tasks/planning-[feature].md`:
+
+```markdown
+# Planning Summary: [Feature Name]
+
+Generated: [Date]
+Status: Ready for PRD
+
+---
+
+## Round 1: Problem Understanding
+[Summary from Round 1]
+
+## Round 2: Scope Definition
+[Summary from Round 2]
+
+## Round 3: Technical Constraints
+[Summary from Round 3]
+
+## Round 4: Edge Cases
+[Summary from Round 4]
+
+## Round 5: Verification Strategy
+[Summary from Round 5]
+
+---
+
+## Next Steps
+
+1. Create PRD using `/prd` skill
+2. Convert to `prd.json` using `/ralph` skill
+3. Run Ralph to implement
+```
+
+---
+
+## Checklist
+
+Before completing planning:
+
+- [ ] Round 1 complete (Problem Understanding)
+- [ ] Round 2 complete (Scope Definition)
+- [ ] Round 3 complete (Technical Constraints)
+- [ ] Round 4 complete (Edge Cases)
+- [ ] Round 5 complete (Verification Strategy)
+- [ ] All summaries documented
+- [ ] Saved to `tasks/planning-[feature].md`
+
+**Do not proceed to PRD until all boxes are checked.**
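
The checklist above could also be checked mechanically. The following is a minimal sketch, not part of the skill itself: `planning_is_complete` is a hypothetical helper, and it assumes the summary file uses the `## Round N:` headings from the Output Format section.

```shell
#!/usr/bin/env bash
# Hypothetical gate: succeed only if the planning summary exists and
# contains a "## Round N:" heading for each of the five rounds.
planning_is_complete() {
  local file="$1"
  [ -f "$file" ] || { echo "Missing: $file"; return 1; }
  local round
  for round in 1 2 3 4 5; do
    if ! grep -q "^## Round $round:" "$file"; then
      echo "Round $round not documented in $file"
      return 1
    fi
  done
  echo "Planning complete: $file"
}

# Demo against a throwaway file so the sketch runs anywhere.
demo_file=$(mktemp)
printf '## Round %s: demo\n' 1 2 3 4 5 > "$demo_file"
planning_is_complete "$demo_file"
```

In practice the function would be pointed at the real summary, for example `planning_is_complete tasks/planning-checkout.md` (the feature name here is only an illustration). It checks that the headings are present, not that the answers are any good.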
diff --git a/skills/prd/SKILL.md b/skills/prd/SKILL.md
index 0e55eb1a..4b61382b 100644
--- a/skills/prd/SKILL.md
+++ b/skills/prd/SKILL.md
@@ -9,6 +9,18 @@ Create detailed Product Requirements Documents that are clear, actionable, and s
---
+## Prerequisites
+
+Before writing a PRD, verify that planning has been completed:
+
+1. **Check for planning summary:** Look for `tasks/planning-[feature].md` from the `/planning` skill
+2. **If no planning exists:** STOP. Tell the user to run `/planning` first to explore requirements
+3. **If planning exists:** Reference it to inform the PRD structure
+
+**Why this matters:** PRDs written without planning often miss edge cases, have vague requirements, or solve the wrong problem. The planning skill forces 5 rounds of questions that surface critical details.
+
+---
+
## The Job
1. Receive a feature description from the user
diff --git a/skills/ralph/SKILL.md b/skills/ralph/SKILL.md
index c17043c6..509be804 100644
--- a/skills/ralph/SKILL.md
+++ b/skills/ralph/SKILL.md
@@ -79,9 +79,21 @@ Stories execute in priority order. Earlier stories must not depend on later ones
---
-## Acceptance Criteria: Must Be Verifiable
+## Acceptance Criteria: Must Be Machine-Verifiable
-Each criterion must be something Ralph can CHECK, not something vague.
+**Every criterion must be MACHINE-VERIFIABLE.** If Ralph cannot verify it with a command, file check, or automated test, it is not a valid criterion.
+
+### Verification Types
+
+Each criterion should be checkable by one of these methods:
+
+| Type | How to Verify | Example Criterion |
+|------|---------------|-------------------|
+| **Command exit code** | Run command, check exit 0 | "Typecheck passes" → `npm run build` |
+| **File check** | Check file exists or has content | "File `docs/API.md` exists" → `test -f docs/API.md` |
+| **Grep/content match** | Search file for pattern | "Contains 'export default'" → `grep -q 'export default' file.ts` |
+| **Database query** | Query returns expected result | "User table has email column" → `\d users` shows column |
+| **Browser automation** | Dev-browser skill verifies visually | "Button is visible" → navigate and screenshot |
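
As a rough illustration of the first three verification types, each criterion can be reduced to a command whose exit code decides pass or fail. This is a sketch under assumptions: `verify` is a hypothetical helper, and the files are created in a temporary directory purely for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one verification command and report PASS or FAIL
# based only on its exit code.
verify() {
  local description="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $description"
  else
    echo "FAIL: $description"
    return 1
  fi
}

# Demo files in a throwaway directory so the sketch runs anywhere.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/docs"
echo "# API" > "$demo_dir/docs/API.md"
echo "export default function main() {}" > "$demo_dir/file.ts"

# File check, content match, and a deliberately failing file check.
verify "docs/API.md exists and is non-empty" test -s "$demo_dir/docs/API.md"
verify "file.ts contains 'export default'" grep -q 'export default' "$demo_dir/file.ts"
verify "docs/MISSING.md exists" test -f "$demo_dir/docs/MISSING.md" || true
```

Run as written, this prints two `PASS` lines followed by one `FAIL` line. A command-exit-code criterion such as "Typecheck passes" would fit the same shape, for example `verify "Typecheck passes" npm run build`.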
### Good criteria (verifiable):
- "Add `status` column to tasks table with default 'pending'"
@@ -90,11 +102,33 @@ Each criterion must be something Ralph can CHECK, not something vague.
- "Typecheck passes"
- "Tests pass"
-### Bad criteria (vague):
-- "Works correctly"
-- "User can do X easily"
-- "Good UX"
-- "Handles edge cases"
+### FORBIDDEN Criteria
+
+**Never use these vague terms** — they cannot be machine-verified:
+
+| Forbidden Term | Why It Fails |
+|----------------|--------------|
+| "Works correctly" | What does "correctly" mean? No verification command. |
+| "Good UX" | Subjective. Cannot be automated. |
+| "Handles edge cases" | Which edge cases? Unspecified = unverifiable. |
+| "Is performant" | What threshold? No measurable target. |
+| "User-friendly" | Subjective opinion, not a testable state. |
+| "Clean code" | Style preference, not machine-checkable. |
+| "Properly implemented" | Circular definition, no verification method. |
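
One way to catch these terms before a PRD is converted is a simple pattern match over each criterion. This is a heuristic sketch, not something the skill ships: `FORBIDDEN_TERMS`, `criterion_is_specific`, and `check_criterion` are hypothetical names, and the pattern only covers the phrases listed in the table above.

```shell
#!/usr/bin/env bash
# Hypothetical lint: the phrases from the forbidden-terms table, joined into
# one case-insensitive extended regular expression.
FORBIDDEN_TERMS='works correctly|good ux|handles edge cases|is performant|user-friendly|clean code|properly implemented'

# Exit 0 when the criterion contains none of the forbidden phrases.
criterion_is_specific() {
  ! printf '%s\n' "$1" | grep -qiE "$FORBIDDEN_TERMS"
}

# Label a single criterion as OK or VAGUE.
check_criterion() {
  if criterion_is_specific "$1"; then
    echo "OK: $1"
  else
    echo "VAGUE: $1"
  fi
}

check_criterion "Returns 400 error when email is empty"
check_criterion "Handles edge cases"
```

The first call prints an `OK:` line and the second a `VAGUE:` line. A match only shows that a known vague phrase is present; a criterion that passes this check can still be unverifiable, so it complements the verification types rather than replacing them.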
+
+### Vague to Specific Conversion
+
+When you encounter vague requirements, convert them:
+
+| Vague (FORBIDDEN) | Specific (VERIFIABLE) |
+|-------------------|----------------------|
+| "Works correctly" | "Returns 200 status code for valid input" |
+| "Good UX" | "Form shows inline validation errors within 100ms" |
+| "Handles edge cases" | "Returns 400 error when email is empty" |
+| "Is performant" | "Query completes in under 100ms for 1000 rows" |
+| "User-friendly error messages" | "Error div contains text 'Invalid email format'" |
+| "Secure authentication" | "Password is hashed with bcrypt before storage" |
+| "Responsive design" | "Component renders at 320px, 768px, and 1024px widths" |
### Always include as final criterion:
```