79 changes: 71 additions & 8 deletions docs/workflows/review.md
# PR Review Workflow

The `pr-review` skill provides AI-powered code review for pull requests, focusing on **real issues that matter** rather than style nitpicks.

## Overview

When a pull request is opened or updated:
1. GitHub Actions triggers the workflow
2. OpenCode loads the `pr-review` skill
3. AI checks for prior review comments on the PR
4. AI analyzes each changed file for substantive issues
5. Posts a sticky comment with findings using the `submit_review` tool

## Installation

| Variable | Required | Description |
|----------|----------|-------------|
| `GITHUB_TOKEN` | Yes | GitHub authentication (automatic) |
| `ANTHROPIC_API_KEY` | Yes | Anthropic API key |
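
These variables reach the workflow as repository secrets. A minimal sketch of the wiring (the trigger types, job name, and step layout here are assumptions; the actual workflow is defined in the Installation section):

```yaml
name: pr-review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run pr-review skill
        # Command elided here; the real step comes from the Installation section.
        run: "<invoke the pr-review skill>"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```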

## Review Philosophy

The skill is designed to act like a **senior engineer in a time-limited code review**:

- Focus on issues that represent real risks or actual bugs
- Skip style preferences and subjective opinions
- Prioritize signal over noise with fewer, higher-quality findings
- Check prior review comments to avoid re-flagging resolved issues

## Review Focus Areas

The skill focuses on substantive issues in these areas:

### 1. Correctness
Logic errors, broken control flow, off-by-one bugs, incorrect conditions.

- **Flags**: Missing await, wrong operators, broken loops
- **Skips**: Unused variables, type confusion without runtime impact

### 2. Security
Exploitable vulnerabilities that could be attacked.

- **Flags**: SQL injection, XSS, auth bypass, secrets in code
- **Skips**: Hypothetical scenarios, generic security advice

### 3. Stability
Issues that cause crashes, data loss, or resource problems.

- **Flags**: Unhandled rejections, race conditions, resource leaks
- **Skips**: Extra defensive checks, optional timeout additions

### 4. Maintainability
Clarity issues that cause actual confusion.

- **Flags**: Misleading names, logic contradicting docs
- **Skips**: Naming preferences, "could be cleaner" suggestions

## What the Skill Avoids

The skill explicitly skips these common nitpicks:

1. **Coding style preferences** - Spacing, quotes, line length (linters handle this)
2. **Defensive programming suggestions** - Extra null checks beyond type requirements
3. **DRY violations without real cost** - Small duplication that doesn't hurt maintenance
4. **Missing code comments** - Unless security-critical or intentionally obscure
5. **Subjective complexity concerns** - "Function too long" without concrete problems
6. **Minor refactoring suggestions** - Aesthetic improvements without objective benefit

## Context Awareness

The skill reads prior PR comments before reviewing to:

- Understand what feedback was already given
- Verify if previous issues were addressed
- Avoid re-flagging the same issues
- Acknowledge resolved feedback in the review summary

## Verdict Behavior

The skill uses verdicts conservatively:

- **`request_changes`**: Only for issues that should **block the merge** (bugs, security holes, stability risks)
- **`approve`**: Used generously for clean PRs or when issues are low-severity

Severity levels:
- `low`: Worth noting but doesn't block merge
- `medium`: Will probably cause a bug in production
- `high`: Will definitely cause problems for users
- `critical`: Security vulnerability or data loss risk
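
As a sketch, a payload consistent with these rules: a `low`-only issue list pairs with `approve`. The exact field set is defined by the `submit_review` tool; the issue fields shown here are illustrative:

```json
{
  "verdict": "approve",
  "summary": "One low-severity note; nothing blocks the merge.",
  "issues": [
    {
      "severity": "low",
      "explanation": "Dead code path in the retry helper appears intentional but is never reached.",
      "suggestion": "Remove the unreachable branch or document why it is kept."
    }
  ]
}
```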

## Customizing

Edit `.opencode/skill/pr-review/SKILL.md` to:
- Change review priorities
- Add project-specific guidelines
- Modify the verdict rules
- Adjust what counts as a nitpick
150 changes: 134 additions & 16 deletions src/skills/pr-review.ts

## What I Do

Review pull request changes systematically and post findings as a sticky PR comment using the \`submit_review\` tool. Focus on **real issues that matter** - bugs, security risks, and stability problems that would block a merge in a typical code review.

## Review Philosophy

**Ask yourself before flagging any issue**: "Would a senior engineer in a time-limited code review raise this issue?"

- Flag issues that represent **real risks** or **actual bugs**
- Skip style preferences and subjective opinions
- Prioritize signal over noise - fewer, higher-quality findings

## Workflow

1. **Check prior feedback**: Fetch existing PR comments to understand context
\`\`\`bash
gh pr view <number> --json comments --jq '.comments[].body'
\`\`\`

2. **Gather PR context**: Get PR metadata and changed files
\`\`\`bash
gh pr view <number> --json files,title,body,headRefOid
\`\`\`

3. **Create validation todo list**: Track previously-flagged issues (if any) using \`todowrite\`
- One item per prior issue: "Validate if [issue] was addressed"
- One item per changed file for new review

4. **Analyze each file**:
- Mark todo as \`in_progress\`
- Read the file diff and surrounding context
- Note ONLY issues that pass the "real issue" test (see Review Priorities)
- Mark todo as \`completed\`

5. **Synthesize review**: After ALL files are analyzed, determine verdict and summary
- Acknowledge addressed feedback from prior reviews
- Only include NEW issues not previously identified

6. **Submit**: Call \`submit_review\` exactly once with all findings

## Review Priorities

Focus on these areas. Each includes concrete examples of what IS and ISN'T worth flagging.

### 1. Correctness (Logic errors that cause wrong behavior)

| Flag This | Skip This |
|-----------|-----------|
| Off-by-one errors in loops/bounds | Type confusion that doesn't break runtime |
| Broken control flow (early returns, missing breaks) | Unused variables (linters catch this) |
| Incorrect boolean logic | Missing optional chaining on guaranteed-present fields |
| Wrong comparison operators | Stylistic null checks beyond language guarantees |
| Missing await on async calls | |
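
For instance, a minimal sketch of a flag-worthy off-by-one (hypothetical helpers, not from this repo):

\`\`\`typescript
// Flag-worthy: "i <= parts.length" reads one past the last index,
// so the joined path gains a bogus "undefined" segment.
function joinBuggy(parts: string[]): string {
  let out = "";
  for (let i = 0; i <= parts.length; i++) out += "/" + parts[i];
  return out;
}

// The fix a review comment would suggest: strict "<" bound.
function joinFixed(parts: string[]): string {
  let out = "";
  for (let i = 0; i < parts.length; i++) out += "/" + parts[i];
  return out;
}

console.log(joinBuggy(["a", "b"])); // /a/b/undefined
console.log(joinFixed(["a", "b"])); // /a/b
\`\`\`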

### 2. Security (Exploitable vulnerabilities)

| Flag This | Skip This |
|-----------|-----------|
| SQL/NoSQL injection | Hypothetical attack scenarios requiring admin access |
| XSS vulnerabilities | Missing rate limiting (unless auth-related) |
| Auth bypass possibilities | Theoretical timing attacks |
| Secrets/credentials in code | Generic "consider security implications" |
| Unsafe deserialization | |
| Path traversal | |
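
To make the injection row concrete, a sketch (hypothetical queryUnsafe helper, not this repo's DB layer):

\`\`\`typescript
// Flag-worthy: interpolating input lets it escape the string literal.
function queryUnsafe(email: string): string {
  return "SELECT * FROM users WHERE email = '" + email + "'";
}

const hostile = "x' OR '1'='1";
console.log(queryUnsafe(hostile));
// The OR clause now matches every row; a parameterized query
// (e.g. a $1 placeholder) keeps the input inert.
\`\`\`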

### 3. Stability (Issues causing crashes or data loss)

| Flag This | Skip This |
|-----------|-----------|
| Unhandled promise rejections in critical paths | Adding extra try-catch "just in case" |
| Race conditions with visible effects | Defensive null checks on typed fields |
| Resource leaks (unclosed handles, listeners) | Missing error logging (non-critical) |
| Infinite loops / recursion without base case | Optional timeout additions |
| Division by zero possibilities | |
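
A sketch of the listener-leak case, using Node's \`EventEmitter\` (the subscribe helpers are hypothetical):

\`\`\`typescript
import { EventEmitter } from "node:events";

// Flag-worthy: every call registers a listener that is never removed.
function subscribeLeaky(bus: EventEmitter): void {
  bus.on("tick", () => {});
}

// Review suggestion: hand back an unsubscribe handle so callers can clean up.
function subscribe(bus: EventEmitter): () => void {
  const handler = () => {};
  bus.on("tick", handler);
  return () => {
    bus.off("tick", handler);
  };
}

const bus = new EventEmitter();
for (let i = 0; i < 3; i++) subscribeLeaky(bus);
console.log(bus.listenerCount("tick")); // grows on every call
\`\`\`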

### 4. Maintainability (Clarity issues causing confusion)

| Flag This | Skip This |
|-----------|-----------|
| Misleading function/variable names | Naming preferences (camelCase vs snake_case) |
| Logic that contradicts its documentation | Minor comment improvements |
| Dead code that appears intentional | DRY violations without maintenance risk |
| Inconsistency with adjacent code patterns | Subjective "this could be cleaner" |
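
For example, a flag-worthy misleading name (hypothetical removeUser helper):

\`\`\`typescript
type User = { id: number; active: boolean };

// Flag-worthy: the name promises removal, but the body only deactivates,
// so callers will wrongly expect the array to shrink.
function removeUser(users: User[], id: number): User[] {
  return users.map((u) => (u.id === id ? { ...u, active: false } : u));
}

const result = removeUser([{ id: 1, active: true }], 1);
console.log(result.length); // still 1 - nothing was removed
\`\`\`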

## Common Nitpicks to Avoid

**DO NOT flag these issues** - they create noise without value:

1. **Coding style preferences** - Spaces vs tabs, quote styles, trailing commas, line length. Let linters handle this.

2. **Defensive programming suggestions** - Adding null checks, optional chaining, or try-catch blocks beyond what the type system or runtime requires.

3. **DRY violations without real cost** - Small code duplication that doesn't create maintenance burden. Not everything needs abstraction.

4. **Missing code comments** - Unless the code is security-critical or intentionally non-obvious, don't require comments.

5. **Subjective complexity concerns** - "This function is too long" or "Consider breaking this up" without identifying a concrete problem it causes.

6. **Minor refactoring suggestions** - Aesthetic improvements, variable renaming preferences, or "cleaner" alternatives that aren't objectively better.

## Context from Prior Reviews

Before starting your review:

1. **Fetch existing PR comments** to see prior feedback:
\`\`\`bash
gh pr view <number> --json comments,reviews --jq '.comments[].body, .reviews[].body'
\`\`\`

2. **If previous issues were flagged**:
- Create a todo item for each: "Validate: [previous issue title]"
- Check if the latest commits address each issue
- Mark as completed with status (resolved/still-present)

3. **In your final review**:
- Acknowledge issues that were fixed: "Previously flagged X - now resolved"
- Only flag issues that are NEW or STILL UNRESOLVED
- Don't re-flag the same issue in slightly different words

## Using submit_review


## Verdict Rules

**Only use \`request_changes\` for issues that should BLOCK the merge:**

- \`critical\`: Security vulnerabilities, data loss risks, complete feature breakage
- \`high\`: Bugs that will affect users, significant logic errors
- \`medium\`: Edge cases likely to cause issues, unclear but problematic patterns

**Use \`approve\` generously:**

- If issues are \`low\` severity only - approve with notes
- If issues are style/preference-based - approve (and don't flag them)
- If you're unsure whether something is a real issue - lean toward approve

**Severity guidance:**

- Default to \`low\` unless the issue clearly meets \`medium\` or higher criteria
- \`medium\` = "This will probably cause a bug in production"
- \`high\` = "This will definitely cause problems for users"
- \`critical\` = "This is a security hole or will cause data loss"
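
The verdict rule above can be sketched as a tiny helper (hypothetical; the skill applies the rule in prose, not code):

\`\`\`typescript
type Severity = "low" | "medium" | "high" | "critical";

// Approve only when nothing rises above low severity.
function verdictFor(severities: Severity[]): "approve" | "request_changes" {
  return severities.some((s) => s !== "low") ? "request_changes" : "approve";
}

console.log(verdictFor(["low", "low"]));    // approve
console.log(verdictFor(["low", "medium"])); // request_changes
\`\`\`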

## Common Mistakes to Avoid

- Do NOT guess repository, PR number, or commit SHA - derive from git/gh commands
- Do NOT include JSON fragments in summary or explanation fields
- Do NOT put prose like "change to:" in the suggestion field
- Do NOT re-flag issues from prior reviews that were already addressed
- Do NOT flag style issues that linters or formatters should handle

## Example Issue

\`\`\`json
{
  "file": "src/db/users.ts",
  "line": 42,
  "severity": "critical",
  "explanation": "User input is interpolated directly into the SQL string, allowing injection.",
  "suggestion": "const user = await db.query('SELECT * FROM users WHERE email = $1', [email])"
}
\`\`\`

## Example of What NOT to Flag

\`\`\`typescript
// DON'T flag: "Consider using optional chaining"
const name = user.profile.name; // If types guarantee profile exists, this is fine

// DON'T flag: "Variable could be const"
let count = 0; // Let linters handle this

// DON'T flag: "Function is too long"
function processOrder() { /* 80 lines */ } // Unless there's a concrete bug

// DON'T flag: "Consider adding error handling"
await saveUser(user); // Unless errors here would cause data loss
\`\`\`
`;