snarktank · harrymunro · Jan 26, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -77,6 +77,120 @@ Only update CLAUDE.md if you have **genuinely reusable knowledge** that would he
 - Keep changes focused and minimal
 - Follow existing code patterns
 
+## Mandatory Quality Gates (Backpressure)
+
+Quality gates are **mandatory blockers**, not suggestions. You MUST NOT mark a story as complete until ALL gates pass.
+
+### Required Gates
+
+Before marking ANY story as `passes: true`, you MUST verify:
+
+1. **Typecheck MUST pass** - Run `npm run build` (or project equivalent) with zero errors
+2. **Lint MUST pass** - Run `npm run lint` (or project equivalent) with zero errors
+3. **Tests MUST pass** - Run `npm test` (or project equivalent) with zero failures
+
+If ANY gate fails, the story is NOT complete. Period.
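+
+As a minimal sketch, the three gates can be chained so that the first failure stops the run. The `run_gates` helper is hypothetical (not part of Ralph), and its defaults assume an npm project with `build`, `lint`, and `test` scripts; pass the project's own commands as arguments to override them.
+
+```bash
+# Hypothetical helper: run each gate in order and stop at the first failure.
+# Defaults assume an npm project; pass other commands as arguments to override.
+run_gates() {
+  local gate
+  if [ "$#" -eq 0 ]; then
+    set -- "npm run build" "npm run lint" "npm test"
+  fi
+  for gate in "$@"; do
+    echo "==> Gate: $gate"
+    if ! sh -c "$gate"; then
+      echo "FAILED: $gate (story is NOT complete)" >&2
+      return 1
+    fi
+  done
+  echo "All gates passed"
+}
+
+# Usage: run_gates
+#        run_gates "npx tsc --noEmit" "npm run lint" "npm test"
+```
+
+A non-zero return means the story stays at `passes: false`.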
+
+### Forbidden Shortcuts
+
+Never use these to bypass quality gates:
+
+| Forbidden | Why |
+|-----------|-----|
+| `@ts-ignore` | Hides type errors instead of fixing them |
+| `@ts-expect-error` | Same as above - masks real problems |
+| `eslint-disable` | Suppresses lint rules without fixing violations |
+| `eslint-disable-next-line` | Same as above - circumvents quality checks |
+| `// @ts-nocheck` | Disables type checking for the entire file |
+| `any` type | Defeats the purpose of TypeScript |
+
+If you find yourself reaching for these, STOP. Fix the actual issue.
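+
+One way to enforce this table mechanically is a grep over the sources before marking a story complete. This is a sketch: `check_forbidden` is a hypothetical helper, the default `src` directory is an assumption, and the `any` pattern is a rough heuristic (it looks for `: any` annotations and `as any` casts) that can miss some uses and occasionally flag harmless text.
+
+```bash
+# Hypothetical pre-completion check: fail if any forbidden shortcut appears in
+# TypeScript/JavaScript sources under the given directory (default "src", an assumption).
+check_forbidden() {
+  local dir="${1:-src}"
+  if [ ! -d "$dir" ]; then
+    echo "No such directory: $dir" >&2
+    return 2
+  fi
+  # Directives from the Forbidden table, plus a heuristic for explicit `any`
+  # annotations (": any") and casts ("as any").
+  local pattern='@ts-ignore|@ts-expect-error|@ts-nocheck|eslint-disable|(:[[:space:]]*|[[:space:]]as[[:space:]]+)any([^[:alnum:]_]|$)'
+  if grep -rnE --exclude-dir=node_modules \
+      --include='*.ts' --include='*.tsx' --include='*.js' --include='*.jsx' \
+      "$pattern" "$dir"; then
+    echo "Forbidden shortcut found: fix the underlying issue instead" >&2
+    return 1
+  fi
+  echo "No forbidden shortcuts found in $dir"
+}
+
+# Usage: check_forbidden src
+```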
+
+### 3-Attempt Limit
+
+If you cannot make a story pass quality gates after 3 attempts:
+
+1. **STOP** - Do not continue iterating on the same approach
+2. **Document** - Add detailed notes about what's failing and why
+3. **Skip** - Move to the next story and let a human investigate
+4. **Never** - Do not use forbidden shortcuts to force a pass
+
+This prevents infinite loops on fundamentally blocked stories.
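+
+A minimal sketch of the limit, assuming one attempt can be expressed as a single command that exits non-zero on failure. The `with_attempt_limit` helper and the `MAX_ATTEMPTS` variable are hypothetical illustrations, not part of ralph.sh.
+
+```bash
+# Hypothetical helper: run an attempt command up to MAX_ATTEMPTS times (default 3).
+# Returns 0 on the first success; after the limit it reports and returns 1 so
+# the caller can document the failure and move on to the next story.
+with_attempt_limit() {
+  local max="${MAX_ATTEMPTS:-3}"
+  local n=1
+  while [ "$n" -le "$max" ]; do
+    if sh -c "$1"; then
+      echo "Passed on attempt $n"
+      return 0
+    fi
+    echo "Attempt $n of $max failed" >&2
+    n=$((n + 1))
+  done
+  echo "STOP: still failing after $max attempts; document and skip this story" >&2
+  return 1
+}
+
+# Usage: with_attempt_limit "npm run build && npm run lint && npm test"
+```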
+
+### Backpressure Mindset
+
+Think of quality gates as physical barriers, not speed bumps:
+- A speed bump slows you down but lets you pass
+- A barrier stops you completely until you have the right key
+
+You cannot "push through" a failing gate. You must fix it or stop.
+
+## Verification Before Completion
+
+Before claiming ANY story is complete, you MUST verify your work systematically. Do not trust your memory or assumptions—run the checks.
+
+### Verification Checklist
+
+Before marking a story as `passes: true`, complete this checklist:
+
+```
+## Verification Checklist for [Story ID]
+
+### 1. Acceptance Criteria Check
+- [ ] Criterion 1: [How verified - command/file check/grep]
+- [ ] Criterion 2: [How verified]
+- [ ] Criterion 3: [How verified]
+... (one checkbox per criterion)
+
+### 2. Quality Gates
+- [ ] Typecheck passes: `npm run build` (or equivalent)
+- [ ] Lint passes: `npm run lint` (or equivalent)
+- [ ] Tests pass: `npm test` (or equivalent)
+
+### 3. Regression Check
+- [ ] Full test suite passes (not just new tests)
+- [ ] No unrelated failures introduced
+
+### 4. Final Verification
+- [ ] Re-read each acceptance criterion one more time
+- [ ] Confirmed each criterion is met with evidence
+```
+
+### How to Verify Each Criterion
+
+For each acceptance criterion, you must have **evidence**, not just belief:
+
+| Criterion Type | Verification Method |
+|----------------|---------------------|
+| "File X exists" | `ls -la path/to/X` or Read tool |
+| "Contains section Y" | `grep -n "Y" file` or Read tool |
+| "Command succeeds" | Run the command, check exit code |
+| "Output contains Z" | Run command, pipe to grep |
+| "Valid JSON" | `jq . file.json` succeeds |
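+
+Each row above maps to a command whose exit status is the evidence. As a sketch, a small hypothetical `evidence` helper can run a check, keep its output visible, and print a PASS/FAIL line that can be quoted directly; the paths, strings, and commands in the usage lines are placeholders.
+
+```bash
+# Hypothetical helper: run one verification command, show its output, and
+# print a PASS/FAIL line with the exit code so the result can be quoted as evidence.
+evidence() {
+  sh -c "$1"
+  local rc=$?
+  if [ "$rc" -eq 0 ]; then
+    echo "PASS: $1"
+  else
+    echo "FAIL: $1 (exit $rc)"
+  fi
+  return "$rc"
+}
+
+# One line per row of the table (paths, strings, and commands are placeholders):
+# evidence 'ls -la path/to/X'                    # "File X exists"
+# evidence "grep -n 'Section Name' CLAUDE.md"    # "Contains section Y"
+# evidence 'npm test'                            # "Command succeeds"
+# evidence 'npm test 2>&1 | grep -n "Z"'         # "Output contains Z"
+# evidence 'jq . file.json'                      # "Valid JSON"
+```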
+
+### Before Outputting COMPLETE
+
+When you believe ALL stories are done and you're about to output `<promise>COMPLETE</promise>`:
+
+1. **Re-verify the current story** - Run all quality gates one more time
+2. **Check prd.json** - Confirm all stories show `passes: true`
+3. **Run full verification** - `jq '.userStories[] | select(.passes == false) | .id' prd.json` should return nothing
+4. **Only then** output the COMPLETE signal
+
+If ANY verification fails at this stage, do NOT output COMPLETE. Fix the issue first.
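+
+As a sketch, the check in step 3 can be made to fail through jq's exit status rather than by reading empty output. It assumes prd.json has a top-level `userStories` array whose entries carry an `id` and a boolean `passes` field, as the query above implies; `all_stories_pass` is a hypothetical helper name. `jq -e` exits with status 1 when the last output is `false` or `null`, and testing `.passes == true` also catches stories with no `passes` field at all, which `select(.passes == false)` would not list.
+
+```bash
+# Hypothetical helper: exit 0 only if every story in prd.json has "passes": true.
+# Stories whose "passes" field is missing or null count as NOT passing.
+all_stories_pass() {
+  local prd="${1:-prd.json}"
+  if jq -e 'all(.userStories[]; .passes == true)' "$prd" >/dev/null; then
+    echo "All stories pass: safe to output COMPLETE"
+  else
+    echo "Not complete. Remaining stories:" >&2
+    jq -r '.userStories[] | select(.passes != true) | .id' "$prd" >&2
+    return 1
+  fi
+}
+
+# Usage: all_stories_pass prd.json
+```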
+
+### Evidence Over Assertion
+
+Never claim something works without proving it:
+
+| Bad (Assertion) | Good (Evidence) |
+|-----------------|-----------------|
+| "I added the section" | "Verified with `grep -n 'Section Name' file` - found at line 42" |
+| "Tests pass" | "Ran `npm test` - 47 tests passed, 0 failed" |
+| "File is valid JSON" | "Ran `jq . file.json` - parsed successfully" |
+
+Run the command. See the output. Report the evidence.
+
 ## Browser Testing (If Available)
 
 For any story that changes UI, verify it works in the browser if you have browser testing tools configured (e.g., via MCP):

diff --git a/README.md b/README.md
@@ -8,6 +8,16 @@ Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/).
 
 [Read my in-depth article on how I use Ralph](https://x.com/ryancarson/status/2008548371712135632)
 
+## Security Warning
+
+**Ralph runs AI agents autonomously with full access to your codebase.** Before running:
+
+- **Never expose production credentials** - Ralph could accidentally commit, log, or transmit sensitive values like `AWS_ACCESS_KEY_ID`, `DATABASE_URL`, or API keys
+- **Use sandboxing** - Run Ralph in a Docker container, VM, or isolated sandbox environment to limit potential damage
+- **Review commits before pushing** - Always review what Ralph committed before pushing to remote
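+
+For example, known production variables can be stripped from the environment before launching a session. This is a sketch: `run_without_secrets` is a hypothetical wrapper, `./ralph.sh` stands in for however you normally start Ralph, and the variable list should be extended to match your own secrets. It only removes values inherited from your shell; secrets stored in files such as `.env` remain readable.
+
+```bash
+# Hypothetical wrapper: run a command with common production secrets removed
+# from its environment. Extend the -u list to cover your own secrets.
+run_without_secrets() {
+  env -u AWS_ACCESS_KEY_ID -u AWS_SECRET_ACCESS_KEY -u DATABASE_URL "$@"
+}
+
+# Usage: run_without_secrets ./ralph.sh
+# Afterwards, review local commits not yet on the upstream branch before pushing:
+#   git log --stat @{u}..HEAD
+```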
+
+See [docs/SECURITY.md](docs/SECURITY.md) for complete security guidance, including pre-flight checklists and emergency stop procedures.
+
 ## Prerequisites
 
 - One of the following AI coding tools installed and authenticated:

diff --git a/docs/COST_TRACKING.md b/docs/COST_TRACKING.md
@@ -0,0 +1,151 @@
+# Cost Tracking Guide
+
+Running autonomous agents like Ralph can incur significant API costs. This guide helps you set budgets, track usage, and prevent runaway costs.
+
+## Budget Recommendations
+
+Set these budget limits before starting any autonomous session:
+
+| Feature Size | Estimated Stories | Recommended Budget | Max Iterations |
+|--------------|-------------------|-------------------|----------------|
+| Small        | 1-3 stories       | $5-10             | 10             |
+| Medium       | 4-8 stories       | $15-30            | 25             |
+| Large        | 9-15 stories      | $40-75            | 50             |
+| XL           | 16+ stories       | $100+             | 100            |
+
+**Note:** These are estimates. Actual costs depend on story complexity, codebase size, and retry frequency.
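+
+The table can also be read as a lookup from story count to tier. A sketch with a hypothetical `budget_tier` helper that uses the table's boundaries and prints the size, recommended budget, and max iterations:
+
+```bash
+# Hypothetical lookup based on the Budget Recommendations table.
+# Prints: <size> <recommended budget> <max iterations>
+budget_tier() {
+  local stories="$1"
+  case "$stories" in
+    ''|*[!0-9]*) echo "usage: budget_tier <story-count>" >&2; return 2 ;;
+  esac
+  if [ "$stories" -le 3 ]; then
+    echo "Small \$5-10 10"
+  elif [ "$stories" -le 8 ]; then
+    echo "Medium \$15-30 25"
+  elif [ "$stories" -le 15 ]; then
+    echo "Large \$40-75 50"
+  else
+    echo "XL \$100+ 100"
+  fi
+}
+
+# Usage: budget_tier 6   # prints "Medium $15-30 25"
+```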
+
+## Feature Size to Budget Mapping
+
+Use this guide to estimate budget before starting:
+
+### Small Features ($5-10)
+- Bug fixes with clear reproduction steps
+- Adding a single new field or column
+- Documentation updates
+- Simple configuration changes
+- 1-3 acceptance criteria per story
+
+### Medium Features ($15-30)
+- New API endpoint with tests
+- Adding a new UI component
+- Integration with existing service
+- Refactoring a single module
+- 3-5 acceptance criteria per story
+
+### Large Features ($40-75)
+- New feature spanning multiple files
+- Database migration with data transformation
+- Multi-step workflow implementation
+- Cross-cutting concerns (auth, logging)
+- 5+ acceptance criteria per story
+
+## Claude Code Usage Tracking
+
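+Inside an interactive session, Claude Code reports the current session's cost through the `/cost` slash command. The CLI commands below illustrate the intended workflow; subcommand and flag names vary between Claude Code versions, so confirm them with `claude --help` before relying on them: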
+
+### Check Current Usage
+```bash
+# View usage summary for current session
+claude usage
+
+# View detailed usage breakdown
+claude usage --detailed
+```
+
+### Set Budget Limits
+```bash
+# Set a budget limit before starting (prevents overspend)
+claude config set budget_limit 25.00
+
+# Check remaining budget
+claude usage --remaining
+```
+
+### Monitor During Session
+```bash
+# Watch usage in real-time (run in separate terminal)
+watch -n 30 'claude usage'
+```
+
+### Post-Session Analysis
+```bash
+# Export usage report
+claude usage --export > usage-report-$(date +%Y%m%d).json
+
+# Parse costs from report
+jq '.total_cost' usage-report-*.json
+```
+
+## Amp Usage Tracking
+
+Amp (Sourcegraph's AI assistant) tracks usage through Sourcegraph's dashboard:
+
+### Web Dashboard
+1. Navigate to your Sourcegraph instance
+2. Go to **Settings** → **Usage & Billing**
+3. View Amp usage by time period
+
+### CLI Tracking
+```bash
+# Check Amp usage via API (requires auth token)
+curl -H "Authorization: token $SRC_ACCESS_TOKEN" \
+  https://sourcegraph.com/.api/user/usage | jq '.'
+
+# Filter for Amp-specific usage
+curl -H "Authorization: token $SRC_ACCESS_TOKEN" \
+  https://sourcegraph.com/.api/user/usage | jq '.amp'
+```
+
+### Amp Budget Controls
+- Set organization-wide limits in Sourcegraph admin
+- Per-user limits available in enterprise plans
+- Monitor usage alerts via email or Slack integration
+
+## Cost Prevention Strategies
+
+### Before Starting
+1. **Estimate scope** - Map features to budget sizes above
+2. **Set hard limits** - Configure budget caps that stop execution
+3. **Use circuit breakers** - Limit retries per story (see ralph.sh)
+4. **Start small** - Run a pilot with 1-2 stories before full batch
+
+### During Execution
+1. **Monitor actively** - Watch `claude usage` during runs
+2. **Check progress.txt** - Stories with many retries indicate problems
+3. **Stop early** - Kill the session if costs are tracking above budget
+4. **Review prd.json** - Check for stories repeatedly failing
+
+### After Completion
+1. **Export usage** - Save detailed reports for analysis
+2. **Calculate cost per story** - Total cost / stories completed
+3. **Adjust estimates** - Update budget recommendations based on actuals
+4. **Identify expensive patterns** - Stories with many retries cost more
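+
+The arithmetic in steps 2 and 3 can be sketched with a hypothetical `cost_report` helper, assuming you already have the session's total cost in dollars from your usage report:
+
+```bash
+# Hypothetical helper for the "After Completion" steps.
+# Arguments: total cost in dollars, stories completed, estimated budget in dollars.
+cost_report() {
+  awk -v total="$1" -v completed="$2" -v est="$3" 'BEGIN {
+    if (completed <= 0 || est <= 0) {
+      print "Need at least one completed story and a positive estimate" > "/dev/stderr"
+      exit 1
+    }
+    printf "Cost per story: $%.2f\n", total / completed
+    printf "Actual vs estimate: %.0f%%\n", 100 * total / est
+  }'
+}
+
+# Usage: cost_report 24.50 7 30
+#   Cost per story: $3.50
+#   Actual vs estimate: 82%
+```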
+
+## Red Flags: Cost Warning Signs
+
+Watch for these patterns that indicate escalating costs:
+
+| Red Flag | Likely Cause | Action |
+|----------|--------------|--------|
+| Same story retrying 3+ times | Unclear acceptance criteria | Stop and clarify requirements |
+| Many small commits | Agent thrashing on solution | Review approach |
+| No progress for 5+ iterations | Blocking issue | Stop and investigate |
+| Budget 50% spent, < 25% done | Scope underestimated | Re-evaluate or pause |
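+
+The last row is easy to check mechanically. A sketch with a hypothetical `budget_red_flag` helper, reading "Budget 50% spent" as at least 50% of the budget:
+
+```bash
+# Hypothetical check for the "Budget 50% spent, < 25% done" red flag.
+# Arguments: dollars spent, total budget, stories done, total stories.
+# Returns 1 (and prints RED FLAG) when the threshold is crossed.
+budget_red_flag() {
+  awk -v spent="$1" -v budget="$2" -v done_n="$3" -v total="$4" 'BEGIN {
+    if (budget <= 0 || total <= 0) {
+      print "Budget and total stories must be positive" > "/dev/stderr"
+      exit 2
+    }
+    spent_pct = 100 * spent / budget
+    done_pct = 100 * done_n / total
+    if (spent_pct >= 50 && done_pct < 25) {
+      printf "RED FLAG: %.0f%% of budget spent, %.0f%% of stories done\n", spent_pct, done_pct
+      exit 1
+    }
+    printf "OK: %.0f%% of budget spent, %.0f%% of stories done\n", spent_pct, done_pct
+  }'
+}
+
+# Usage: budget_red_flag 16 30 1 8   # 53% spent, 1 of 8 stories done: RED FLAG
+```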
+
+## Cost Tracking Checklist
+
+Before each Ralph session:
+
+- [ ] Estimated feature size and set budget
+- [ ] Configured hard budget limit in Claude/Amp
+- [ ] Set max iterations in ralph.sh
+- [ ] Have monitoring terminal ready
+- [ ] Know how to emergency stop
+
+After each Ralph session:
+
+- [ ] Exported usage report
+- [ ] Calculated actual vs estimated cost
+- [ ] Updated budget estimates if needed
+- [ ] Documented expensive stories for future reference