
SDLC pipeline reviewers produce shallow analysis compared to PR reviewers #897

@james-in-a-box

Description


## Problem

The SDLC pipeline reviewers (code, contract, agent-design) produce significantly shallower analysis than the automated PR reviewers, despite sharing the same criteria files (shared/prompts/code-review-criteria.md, etc.).

Evidence from PR #895 (issue #871):

The SDLC implement-phase reviews stored in .egg-state/reviews/871-*.json are all brief JSON blobs with empty feedback:

| SDLC Review | Verdict | Content |
| --- | --- | --- |
| 871-implement-code-review | approved | ~2 sentences, `"feedback": ""` |
| 871-implement-contract-review | approved | ~2 sentences, `"feedback": ""` |
| 871-plan-plan-review | approved | ~3 sentences, `"feedback": ""` |
| 871-refine-refine-review | approved | 1 sentence, `"feedback": ""` |

The automated PR reviewers running on the same code produced:

| PR Review | Content |
| --- | --- |
| contract-verification | 20-row task-by-task table with criterion, evidence, and verdict for each task. Full re-verification on the delta commit. |
| code review | Found 5 specific issues: PR description inaccuracy, repeated identifier pattern (8+ sites), truthiness vs `is not None` inconsistency, duplicated fallback logic, data file inconsistency. Two drove a follow-up commit (930c82c). Full re-review of the fix. |
| agent-mode-design | Brief but substantive in both rounds. |

The SDLC code reviewer even noticed entrypoint.py and CLAUDE.md scope creep but only stored them as summary text with no feedback — those observations didn't result in any action.

## Root Cause

The prompts share criteria files but diverge on output format and thoroughness instructions:

### 1. The verdict JSON schema constrains depth

`_build_review_prompt()` in `orchestrator/routes/pipelines.py:1554-1570` tells reviewers to write:

```json
{
  "reviewer": "code",
  "verdict": "approved" or "needs_revision",
  "summary": "Brief summary of findings",
  "feedback": "Detailed feedback if needs_revision, empty if approved",
  "timestamp": "ISO 8601 timestamp"
}
```

The instruction "empty if approved" explicitly tells reviewers not to provide feedback when approving. There is no field for advisory suggestions or non-blocking observations. This structurally discourages thorough analysis — an agent told to write {"feedback": ""} will naturally skim.
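One way to remove that structural disincentive is to make analysis mandatory and split blocking feedback from advisory notes. A sketch of such a schema (the `analysis` and `suggestions` fields are hypothetical, not part of the current template):

```json
{
  "reviewer": "code",
  "verdict": "approved" or "needs_revision",
  "summary": "Brief summary of findings",
  "analysis": "File-by-file findings, always populated regardless of verdict",
  "feedback": "Blocking issues, only when needs_revision",
  "suggestions": ["Non-blocking advisory observations, allowed even on approval"],
  "timestamp": "ISO 8601 timestamp"
}
```

With this shape, an approving reviewer still has somewhere to put the entrypoint.py-style observations instead of discarding them.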

### 2. The SDLC prompt is less detailed than the PR prompt

PR code review prompt (action/build-review-prompt.sh):

- "Perform a comprehensive, thorough code review"
- "This is critical infrastructure—your review is the last line of defense before code reaches production"
- "Find ALL issues on the first pass. Do not stop after identifying a few problems"
- 6 detailed procedural steps (get diff, review every file, read context, trace data flow, research, consider edge cases)
- Includes review conventions (be comprehensive, specific, direct, suggest fixes, provide context)

SDLC code review prompt (_build_review_prompt()) for non-draft code reviews:

- Steps 1-9 exist but are briefer
- No "last line of defense" framing
- No review conventions included
- The preamble says "Be thorough" but the output format immediately undercuts it

SDLC non-code reviewers (contract, agent-design, refine, plan) get only:

1. Read the draft at {path}
2. Evaluate it against the criteria below
3. Write your verdict to {path} as JSON
4. Commit the verdict file
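Closing the gap could be as simple as folding the PR-style conventions and thoroughness framing into every SDLC reviewer prompt. A minimal sketch — the function name, constants, and step wording below are illustrative, not the actual `_build_review_prompt()` code:

```python
# Sketch: compose an SDLC reviewer prompt with PR-style depth demands.
# All names here are illustrative, not the real orchestrator code.

THOROUGHNESS_PREAMBLE = (
    "Perform a comprehensive, thorough review. "
    "Find ALL issues on the first pass; do not stop after a few problems."
)

REVIEW_CONVENTIONS = [
    "Be comprehensive",
    "Be specific",
    "Be direct",
    "Suggest fixes",
    "Provide context",
]


def build_reviewer_prompt(draft_path: str, criteria: str) -> str:
    """Build a prompt that demands file-by-file analysis regardless of verdict."""
    conventions = "\n".join(f"- {c}" for c in REVIEW_CONVENTIONS)
    return (
        f"{THOROUGHNESS_PREAMBLE}\n\n"
        f"1. Read the draft at {draft_path}\n"
        f"2. Evaluate it against the criteria below\n"
        f"3. Write a file-by-file analysis, even if you approve\n"
        f"4. Write your verdict JSON, including advisory suggestions\n"
        f"5. Commit the verdict file\n\n"
        f"Review conventions:\n{conventions}\n\n"
        f"Criteria:\n{criteria}"
    )
```

The point is that the shared criteria files stay shared; only the wrapper text changes, so both review systems would set the same expectations of depth.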

### 3. No "approve with suggestions" concept

PR reviewers can approve with advisory suggestions via <!-- has-suggestions --> marker, which triggers the feedback-addressing workflow. SDLC reviewers have binary approved/needs_revision — no way to surface non-blocking observations that still get acted on.
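The aggregation side could surface those non-blocking observations by collecting a `suggestions` field from every verdict, approved or not. A sketch, assuming the hypothetical schema above (this is not the current `_aggregate_review_verdicts` implementation):

```python
# Sketch: aggregate verdicts so that advisory suggestions from APPROVED
# reviews still reach the feedback-addressing workflow.
# The "suggestions" field is hypothetical, not in the current schema.

def aggregate_verdicts(verdicts: list[dict]) -> dict:
    """Return overall verdict plus all advisory suggestions, tagged by reviewer."""
    blocking = [v for v in verdicts if v["verdict"] == "needs_revision"]
    suggestions = [
        (v["reviewer"], s)
        for v in verdicts
        for s in v.get("suggestions", [])
    ]
    return {
        "verdict": "needs_revision" if blocking else "approved",
        "feedback": [v["feedback"] for v in blocking],
        # Non-empty suggestions here would trigger the same follow-up
        # handling the PR side gets from the has-suggestions marker.
        "suggestions": suggestions,
    }
```

This mirrors the PR-side `<!-- has-suggestions -->` behavior: approval does not silence the reviewer.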

## Key Files

| File | Role |
| --- | --- |
| `orchestrator/routes/pipelines.py` (`_build_review_prompt`, lines 1452-1586) | SDLC review prompt builder |
| `orchestrator/routes/pipelines.py` (`_get_reviewer_scope_preamble`, lines 1004-1044) | Reviewer scope instructions |
| `orchestrator/models.py` (`ReviewVerdict`, lines 97-103) | Verdict data model |
| `orchestrator/routes/pipelines.py` (`_aggregate_review_verdicts`, lines 1712-1738) | Verdict aggregation |
| `action/build-review-prompt.sh` | PR code review prompt builder |
| `action/build-contract-verification-prompt.sh` | PR contract verification prompt builder |
| `action/build-agent-mode-design-review-prompt.sh` | PR agent-design review prompt builder |
| `action/review-conventions.md` | Review posting conventions (PR only) |
| `shared/prompts/code-review-criteria.md` | Shared criteria (both systems) |
| `shared/prompts/contract-review-criteria.md` | Shared criteria (both systems) |
| `shared/prompts/agent-design-criteria.md` | Shared criteria (both systems) |

## Expected Outcome

SDLC reviewers should produce at least the same depth of analysis as PR reviewers. They should:

  1. Always provide detailed file-by-file analysis regardless of verdict
  2. Surface advisory suggestions even when approving
  3. Follow the same review conventions (be comprehensive, specific, direct, suggest fixes)
  4. Produce output that is actionable — not just a pass/fail stamp

This likely requires changes to the verdict model (add analysis/suggestions fields that are always populated), the prompt builder (align with PR reviewer instructions), and the aggregation logic (surface suggestions from approved verdicts).
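The verdict-model part of that change could look roughly like the following. This is a sketch only; the real `ReviewVerdict` in `orchestrator/models.py` may be a different kind of class, and the new fields are proposals, not existing code:

```python
# Sketch: ReviewVerdict extended with always-populated analysis and
# advisory suggestions. Field names are proposals, not the current model.
from dataclasses import dataclass, field


@dataclass
class ReviewVerdict:
    reviewer: str
    verdict: str                    # "approved" | "needs_revision"
    summary: str
    analysis: str                   # file-by-file findings, always required
    timestamp: str
    feedback: str = ""              # blocking issues, needs_revision only
    suggestions: list[str] = field(default_factory=list)  # non-blocking

    @property
    def has_suggestions(self) -> bool:
        """True when an approved verdict still carries actionable notes."""
        return bool(self.suggestions)
```

Making `analysis` a required positional field forces every reviewer to emit substantive findings even when approving, which directly addresses the empty-`feedback` pattern seen in the PR #895 verdicts.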

Authored-by: egg

Labels: enhancement