114 changes: 114 additions & 0 deletions CLAUDE.md
@@ -77,6 +77,120 @@ Only update CLAUDE.md if you have **genuinely reusable knowledge** that would he
- Keep changes focused and minimal
- Follow existing code patterns

## Mandatory Quality Gates (Backpressure)

Quality gates are **mandatory blockers**, not suggestions. You MUST NOT mark a story as complete until ALL gates pass.

### Required Gates

Before marking ANY story as `passes: true`, you MUST verify:

1. **Typecheck MUST pass** - Run `npm run build` (or project equivalent) with zero errors
2. **Lint MUST pass** - Run `npm run lint` (or project equivalent) with zero errors
3. **Tests MUST pass** - Run `npm test` (or project equivalent) with zero failures

If ANY gate fails, the story is NOT complete. Period.
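
As a rough illustration of chaining the three gates so that the first failure blocks completion, here is a minimal shell sketch. The function name `run_gates` is hypothetical and not part of Ralph. The commands passed to it are whatever your project uses for typecheck, lint, and tests.

```bash
# Hypothetical helper: run each gate command in order and stop at the first failure.
run_gates() {
  local gate
  for gate in "$@"; do
    echo "running gate: $gate"
    if ! sh -c "$gate"; then
      echo "gate failed: $gate" >&2
      return 1
    fi
  done
  echo "all gates passed"
}

# Example call (assumes these npm scripts exist in package.json):
#   run_gates "npm run build" "npm run lint" "npm test" || echo "story is NOT complete"
```

Because the function returns at the first failing gate, later gates never run against code that is already known to be broken.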

### Forbidden Shortcuts

Never use these to bypass quality gates:

| Forbidden | Why |
|-----------|-----|
| `@ts-ignore` | Hides type errors instead of fixing them |
| `@ts-expect-error` | Also suppresses the type error on the next line instead of fixing it |
| `eslint-disable` | Suppresses lint rules without fixing violations |
| `eslint-disable-next-line` | Suppresses a lint rule for one line without fixing the violation |
| `// @ts-nocheck` | Disables type checking for the entire file |
| `any` type | Defeats the purpose of TypeScript |

If you find yourself reaching for these, STOP. Fix the actual issue.
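
One way to catch these shortcuts before a story is marked complete is a plain text search over the changed files. The sketch below is an assumption about how such a check might look. `find_forbidden` is a hypothetical name, and the pattern only covers the comment directives from the table.

```bash
# Hypothetical helper: list lines containing a suppression directive from the table above.
# Exit status follows grep: 0 if any directive was found, 1 if the files are clean.
# The `any` type is not covered; catching it reliably needs a lint rule
# (for example typescript-eslint's no-explicit-any), not a text search.
find_forbidden() {
  grep -HnE '@ts-(ignore|expect-error)|@(ts-)?nocheck|eslint-disable' "$@"
}

# Example call:
#   find_forbidden src/*.ts && echo "forbidden shortcut found, fix the actual issue"
```

The `eslint-disable` pattern also matches `eslint-disable-next-line` and `eslint-disable-line`, since both begin with the same prefix.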

### 3-Attempt Limit

If you cannot make a story pass quality gates after 3 attempts:

1. **STOP** - Do not continue iterating on the same approach
2. **Document** - Add detailed notes about what's failing and why
3. **Skip** - Move to the next story and let a human investigate
4. **Never** - Do not use forbidden shortcuts to force a pass

This prevents infinite loops on fundamentally blocked stories.
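
The counting part of this rule can be sketched in shell as follows. `try_with_limit` and `MAX_ATTEMPTS` are hypothetical names. In a real run the code would change between attempts, whereas the sketch simply re-runs the same command to show where the loop stops.

```bash
MAX_ATTEMPTS=3

# Hypothetical helper: run a command up to MAX_ATTEMPTS times.
# Returns 0 on the first success, 1 once the limit is exhausted.
try_with_limit() {
  local attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if "$@"; then
      echo "passed on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt of $MAX_ATTEMPTS failed" >&2
    attempt=$((attempt + 1))
  done
  echo "limit reached: document the failure and skip this story" >&2
  return 1
}

# Example call:
#   try_with_limit npm test || echo "leave the story for a human"
```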

### Backpressure Mindset

Think of quality gates as physical barriers, not speed bumps:
- A speed bump slows you down but lets you pass
- A barrier stops you completely until you have the right key

You cannot "push through" a failing gate. You must fix it or stop.

## Verification Before Completion

Before claiming ANY story is complete, you MUST verify your work systematically. Do not trust your memory or assumptions—run the checks.

### Verification Checklist

Before marking a story as `passes: true`, complete this checklist:

```
## Verification Checklist for [Story ID]

### 1. Acceptance Criteria Check
- [ ] Criterion 1: [How verified - command/file check/grep]
- [ ] Criterion 2: [How verified]
- [ ] Criterion 3: [How verified]
... (one checkbox per criterion)

### 2. Quality Gates
- [ ] Typecheck passes: `npm run build` (or equivalent)
- [ ] Lint passes: `npm run lint` (or equivalent)
- [ ] Tests pass: `npm test` (or equivalent)

### 3. Regression Check
- [ ] Full test suite passes (not just new tests)
- [ ] No unrelated failures introduced

### 4. Final Verification
- [ ] Re-read each acceptance criterion one more time
- [ ] Confirmed each criterion is met with evidence
```

### How to Verify Each Criterion

For each acceptance criterion, you must have **evidence**, not just belief:

| Criterion Type | Verification Method |
|----------------|---------------------|
| "File X exists" | `ls -la path/to/X` or Read tool |
| "Contains section Y" | `grep -n "Y" file` or Read tool |
| "Command succeeds" | Run the command, check exit code |
| "Output contains Z" | Run command, pipe to grep |
| "Valid JSON" | `jq . file.json` succeeds |

### Before Outputting COMPLETE

When you believe ALL stories are done and you're about to output `<promise>COMPLETE</promise>`:

1. **Re-verify the current story** - Run all quality gates one more time
2. **Check prd.json** - Confirm all stories show `passes: true`
3. **Run full verification** - `jq '.userStories[] | select(.passes != true) | .id' prd.json` should print nothing (`!= true` also catches stories whose `passes` field is missing)
4. **Only then** output the COMPLETE signal

If ANY verification fails at this stage, do NOT output COMPLETE. Fix the issue first.
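
The check in step 3 can be wrapped in a small function so that the COMPLETE signal depends on its exit status. This is a sketch under stated assumptions: `all_stories_pass` is a hypothetical name, and `prd.json` is assumed to hold a top-level `userStories` array whose entries carry `id` and `passes` fields. It uses `!= true` so that a story with a missing `passes` field also counts as not passing.

```bash
# Hypothetical helper: succeed only when every story in the given file has passes == true.
# Returns 1 if any story is not passing, 2 if jq itself fails (missing file, invalid JSON).
all_stories_pass() {
  local pending
  pending=$(jq -r '.userStories[] | select(.passes != true) | .id' "$1") || return 2
  if [ -n "$pending" ]; then
    echo "$pending" | sed 's/^/not passing: /' >&2
    return 1
  fi
  return 0
}

# Example call:
#   all_stories_pass prd.json && echo "safe to output COMPLETE"
```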

### Evidence Over Assertion

Never claim something works without proving it:

| Bad (Assertion) | Good (Evidence) |
|-----------------|-----------------|
| "I added the section" | "Verified with `grep -n 'Section Name' file` - found at line 42" |
| "Tests pass" | "Ran `npm test` - 47 tests passed, 0 failed" |
| "File is valid JSON" | "Ran `jq . file.json` - parsed successfully" |

Run the command. See the output. Report the evidence.
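
A small wrapper can make this habit mechanical by printing the command, its output, and its exit code together. The name `evidence` is hypothetical. The sketch shows only one possible shape for such a helper.

```bash
# Hypothetical helper: run a command and print the command line, its combined output,
# and its exit code, so the report carries evidence rather than a bare claim.
evidence() {
  local output status=0
  output=$("$@" 2>&1) || status=$?
  printf '$ %s\n%s\nexit code: %d\n' "$*" "$output" "$status"
  return "$status"
}

# Example call:
#   evidence grep -n 'Section Name' README.md
```

The function returns the wrapped command's exit code, so it can still be used in a condition such as `evidence npm test || exit 1`.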

## Browser Testing (If Available)

For any story that changes UI, verify it works in the browser if you have browser testing tools configured (e.g., via MCP):
10 changes: 10 additions & 0 deletions README.md
@@ -8,6 +8,16 @@ Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/).

[Read my in-depth article on how I use Ralph](https://x.com/ryancarson/status/2008548371712135632)

## Security Warning

**Ralph runs AI agents autonomously with full access to your codebase.** Before running:

- **Never expose production credentials** - Ralph could accidentally commit, log, or transmit sensitive values like `AWS_ACCESS_KEY_ID`, `DATABASE_URL`, or API keys
- **Use sandboxing** - Run Ralph in a Docker container, VM, or isolated sandbox environment to limit potential damage
- **Review commits before pushing** - Always review what Ralph committed before pushing to remote

See [docs/SECURITY.md](docs/SECURITY.md) for complete security guidance, including pre-flight checklists and emergency stop procedures.

## Prerequisites

- One of the following AI coding tools installed and authenticated:
151 changes: 151 additions & 0 deletions docs/COST_TRACKING.md
@@ -0,0 +1,151 @@
# Cost Tracking Guide

Running autonomous agents like Ralph can incur significant API costs. This guide helps you set budgets, track usage, and prevent runaway costs.

## Budget Recommendations

Set these budget limits before starting any autonomous session:

| Feature Size | Estimated Stories | Recommended Budget | Max Iterations |
|--------------|-------------------|-------------------|----------------|
| Small | 1-3 stories | $5-10 | 10 |
| Medium | 4-8 stories | $15-30 | 25 |
| Large | 9-15 stories | $40-75 | 50 |
| XL | 16+ stories | $100+ | 100 |

**Note:** These are estimates. Actual costs depend on story complexity, codebase size, and retry frequency.
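
The table above can be encoded as a lookup so a wrapper script can pick a budget and iteration cap from the story count. This is a minimal sketch. `budget_for_stories` is a hypothetical name, and the values are copied from the table, not measured.

```bash
# Hypothetical helper: map a story count to the tier, budget, and iteration cap from the table above.
budget_for_stories() {
  local n="$1"
  if [ "$n" -le 0 ]; then
    echo "story count must be at least 1" >&2
    return 1
  elif [ "$n" -le 3 ]; then
    echo 'Small $5-10 max_iterations=10'
  elif [ "$n" -le 8 ]; then
    echo 'Medium $15-30 max_iterations=25'
  elif [ "$n" -le 15 ]; then
    echo 'Large $40-75 max_iterations=50'
  else
    echo 'XL $100+ max_iterations=100'
  fi
}

# Example call:
#   budget_for_stories 6
```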

## Feature Size to Budget Mapping

Use this guide to estimate budget before starting:

### Small Features ($5-10)
- Bug fixes with clear reproduction steps
- Adding a single new field or column
- Documentation updates
- Simple configuration changes
- 1-3 acceptance criteria per story

### Medium Features ($15-30)
- New API endpoint with tests
- Adding a new UI component
- Integration with existing service
- Refactoring a single module
- 3-5 acceptance criteria per story

### Large Features ($40-75)
- New feature spanning multiple files
- Database migration with data transformation
- Multi-step workflow implementation
- Cross-cutting concerns (auth, logging)
- 5+ acceptance criteria per story

## Claude Code Usage Tracking

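Claude Code reports usage and cost information. The subcommands and flags below may not exist in your Claude Code version, so confirm each one with `claude --help` before relying on it: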

### Check Current Usage
```bash
# View usage summary for current session
claude usage

# View detailed usage breakdown
claude usage --detailed
```

### Set Budget Limits
```bash
# Set a budget limit before starting (prevents overspend)
claude config set budget_limit 25.00

# Check remaining budget
claude usage --remaining
```

### Monitor During Session
```bash
# Watch usage in real-time (run in separate terminal)
watch -n 30 'claude usage'
```

### Post-Session Analysis
```bash
# Export usage report
claude usage --export > usage-report-$(date +%Y%m%d).json

# Parse costs from report
jq '.total_cost' usage-report-*.json
```

## Amp Usage Tracking

Amp (Sourcegraph's AI assistant) tracks usage through Sourcegraph's dashboard:

### Web Dashboard
1. Navigate to your Sourcegraph instance
2. Go to **Settings** → **Usage & Billing**
3. View Amp usage by time period

### CLI Tracking
```bash
# Check Amp usage via API (requires auth token)
curl -H "Authorization: token $SRC_ACCESS_TOKEN" \
  https://sourcegraph.com/.api/user/usage | jq '.'

# Filter for Amp-specific usage
curl -H "Authorization: token $SRC_ACCESS_TOKEN" \
  https://sourcegraph.com/.api/user/usage | jq '.amp'
```

### Amp Budget Controls
- Set organization-wide limits in Sourcegraph admin
- Per-user limits available in enterprise plans
- Monitor usage alerts via email or Slack integration

## Cost Prevention Strategies

### Before Starting
1. **Estimate scope** - Map features to budget sizes above
2. **Set hard limits** - Configure budget caps that stop execution
3. **Use circuit breakers** - Limit retries per story (see ralph.sh)
4. **Start small** - Run a pilot with 1-2 stories before full batch

### During Execution
1. **Monitor actively** - Watch `claude usage` during runs
2. **Check progress.txt** - Stories with many retries indicate problems
3. **Stop early** - Kill the session if costs are tracking above budget
4. **Review prd.json** - Check for stories repeatedly failing

### After Completion
1. **Export usage** - Save detailed reports for analysis
2. **Calculate cost per story** - Total cost / stories completed
3. **Adjust estimates** - Update budget recommendations based on actuals
4. **Identify expensive patterns** - Stories with many retries cost more
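
The cost-per-story calculation in step 2 is a single division. A sketch with the hypothetical name `cost_per_story` follows. It uses `awk` because shell arithmetic is integer-only.

```bash
# Hypothetical helper: cost per story = total cost / stories completed.
# Arguments: total_cost stories_completed
cost_per_story() {
  local total="$1" completed="$2"
  if [ "$completed" -le 0 ]; then
    echo "no completed stories" >&2
    return 1
  fi
  awk -v t="$total" -v n="$completed" 'BEGIN { printf "%.2f\n", t / n }'
}

# Example call:
#   cost_per_story 24 8    # prints 3.00
```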

## Red Flags: Cost Warning Signs

Watch for these patterns that indicate escalating costs:

| Red Flag | Likely Cause | Action |
|----------|--------------|--------|
| Same story retrying 3+ times | Unclear acceptance criteria | Stop and clarify requirements |
| Many small commits | Agent thrashing on solution | Review approach |
| No progress for 5+ iterations | Blocking issue | Stop and investigate |
| Budget 50% spent, < 25% done | Scope underestimated | Re-evaluate or pause |
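
The last row of the table is easy to check numerically. The sketch below assumes you already know the amount spent and the story counts. `budget_red_flag` is a hypothetical name, and the thresholds are the ones from the table.

```bash
# Hypothetical helper: flag a run where at least half the budget is spent
# but fewer than a quarter of the stories are done.
# Arguments: spent budget stories_done stories_total (budget and stories_total must be non-zero)
# Exit status 0 means the red flag is raised.
budget_red_flag() {
  awk -v s="$1" -v b="$2" -v d="$3" -v t="$4" \
    'BEGIN { exit !((s / b >= 0.5) && (d / t < 0.25)) }'
}

# Example call:
#   budget_red_flag 15 25 1 8 && echo "re-evaluate or pause"
```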

## Cost Tracking Checklist

Before each Ralph session:

- [ ] Estimated feature size and set budget
- [ ] Configured hard budget limit in Claude/Amp
- [ ] Set max iterations in ralph.sh
- [ ] Have monitoring terminal ready
- [ ] Know how to emergency stop

After each Ralph session:

- [ ] Exported usage report
- [ ] Calculated actual vs estimated cost
- [ ] Updated budget estimates if needed
- [ ] Documented expensive stories for future reference