Problem
The SDLC pipeline reviewers (code, contract, agent-design) produce significantly shallower analysis than the automated PR reviewers, despite sharing the same criteria files (shared/prompts/code-review-criteria.md, etc.).
Evidence from PR #895 (issue #871):
The SDLC implement-phase reviews stored in .egg-state/reviews/871-*.json are all brief JSON blobs with empty feedback:
| SDLC Review | Verdict | Content |
|---|---|---|
| 871-implement-code-review | approved | ~2 sentences, "feedback": "" |
| 871-implement-contract-review | approved | ~2 sentences, "feedback": "" |
| 871-plan-plan-review | approved | ~3 sentences, "feedback": "" |
| 871-refine-refine-review | approved | 1 sentence, "feedback": "" |
The automated PR reviewers running on the same code produced:
| PR Review | Content |
|---|---|
| contract-verification | 20-row task-by-task table with criterion, evidence, and verdict for each task. Full re-verification on delta commit. |
| code review | Found 5 specific issues: PR description inaccuracy, repeated identifier pattern (8+ sites), truthiness vs is not None inconsistency, duplicated fallback logic, data file inconsistency. Two drove a follow-up commit (930c82c). Full re-review of the fix. |
| agent-mode-design | Brief but substantive in both rounds. |
The SDLC code reviewer even noticed entrypoint.py and CLAUDE.md scope creep but only stored them as summary text with no feedback — those observations didn't result in any action.
Root Cause
The prompts share criteria files but diverge on output format and thoroughness instructions:
1. The verdict JSON schema constrains depth
_build_review_prompt() in orchestrator/routes/pipelines.py:1554-1570 tells reviewers to write:
```json
{
  "reviewer": "code",
  "verdict": "approved" or "needs_revision",
  "summary": "Brief summary of findings",
  "feedback": "Detailed feedback if needs_revision, empty if approved",
  "timestamp": "ISO 8601 timestamp"
}
```

The instruction "empty if approved" explicitly tells reviewers not to provide feedback when approving. There is no field for advisory suggestions or non-blocking observations. This structurally discourages thorough analysis: an agent told to write {"feedback": ""} will naturally skim.
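A minimal sketch of the structural problem, using hypothetical sample data shaped like the stored verdicts in .egg-state/reviews/ (not the actual file contents): any consumer that acts only on the feedback field is guaranteed to see nothing for approved verdicts.

```python
import json

# Hypothetical verdicts following the schema above; approved verdicts
# are instructed to leave feedback empty.
stored = [
    {"reviewer": "code", "verdict": "approved",
     "summary": "Looks fine.", "feedback": ""},
    {"reviewer": "contract", "verdict": "approved",
     "summary": "Tasks met.", "feedback": ""},
]

# Downstream logic keyed on non-empty feedback has nothing to act on --
# the schema itself discards any observations the reviewer made.
actionable = [v for v in stored if v["feedback"]]
print(actionable)  # -> []
```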
2. The SDLC prompt is less detailed than the PR prompt
PR code review prompt (action/build-review-prompt.sh):
- "Perform a comprehensive, thorough code review"
- "This is critical infrastructure—your review is the last line of defense before code reaches production"
- "Find ALL issues on the first pass. Do not stop after identifying a few problems"
- 6 detailed procedural steps (get diff, review every file, read context, trace data flow, research, consider edge cases)
- Includes review conventions (be comprehensive, specific, direct, suggest fixes, provide context)
SDLC code review prompt (_build_review_prompt()) for non-draft code reviews:
- Steps 1-9 exist but are briefer
- No "last line of defense" framing
- No review conventions included
- The preamble says "Be thorough" but the output format immediately undercuts it
SDLC non-code reviewers (contract, agent-design, refine, plan) get only:
1. Read the draft at {path}
2. Evaluate it against the criteria below
3. Write your verdict to {path} as JSON
4. Commit the verdict file
3. No "approve with suggestions" concept
PR reviewers can approve with advisory suggestions via <!-- has-suggestions --> marker, which triggers the feedback-addressing workflow. SDLC reviewers have binary approved/needs_revision — no way to surface non-blocking observations that still get acted on.
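One way to close this gap is to mirror the PR flow's has-suggestions marker in the verdict data itself. The sketch below is illustrative only; the class and function names are hypothetical, not the orchestrator's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical verdict shape where approval no longer implies silence:
# suggestions carry non-blocking observations alongside the binary verdict.
@dataclass
class Verdict:
    reviewer: str
    verdict: str  # "approved" | "needs_revision"
    summary: str
    feedback: str = ""  # blocking feedback, only for needs_revision
    suggestions: list[str] = field(default_factory=list)  # advisory, non-blocking

def has_suggestions(v: Verdict) -> bool:
    # Analogue of the PR flow's <!-- has-suggestions --> marker: an
    # approved verdict with suggestions should still trigger follow-up.
    return v.verdict == "approved" and bool(v.suggestions)

v = Verdict("code", "approved", "LGTM",
            suggestions=["entrypoint.py scope creep"])
print(has_suggestions(v))  # -> True
```

With this shape, the scope-creep observations from PR #895 would have been stored as suggestions rather than buried in summary text.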
Key Files
| File | Role |
|---|---|
| orchestrator/routes/pipelines.py (_build_review_prompt, lines 1452-1586) | SDLC review prompt builder |
| orchestrator/routes/pipelines.py (_get_reviewer_scope_preamble, lines 1004-1044) | Reviewer scope instructions |
| orchestrator/models.py (ReviewVerdict, lines 97-103) | Verdict data model |
| orchestrator/routes/pipelines.py (_aggregate_review_verdicts, lines 1712-1738) | Verdict aggregation |
| action/build-review-prompt.sh | PR code review prompt builder |
| action/build-contract-verification-prompt.sh | PR contract verification prompt builder |
| action/build-agent-mode-design-review-prompt.sh | PR agent-design review prompt builder |
| action/review-conventions.md | Review posting conventions (PR only) |
| shared/prompts/code-review-criteria.md | Shared criteria (both systems) |
| shared/prompts/contract-review-criteria.md | Shared criteria (both systems) |
| shared/prompts/agent-design-criteria.md | Shared criteria (both systems) |
Expected Outcome
SDLC reviewers should produce at minimum the same depth of analysis as PR reviewers. They should:
- Always provide detailed file-by-file analysis regardless of verdict
- Surface advisory suggestions even when approving
- Follow the same review conventions (be comprehensive, specific, direct, suggest fixes)
- Produce output that is actionable — not just a pass/fail stamp
This likely requires changes to the verdict model (add analysis/suggestions fields that are always populated), the prompt builder (align with PR reviewer instructions), and the aggregation logic (surface suggestions from approved verdicts).
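As a sketch of the aggregation change, assuming the richer verdict shape described above (suggestions as a list on each verdict), the aggregator could surface advisory items even when every reviewer approves. Names here are illustrative, not the actual _aggregate_review_verdicts signature.

```python
# Hypothetical aggregation that keeps advisory suggestions visible.
def aggregate(verdicts: list[dict]) -> dict:
    needs_revision = [v for v in verdicts if v["verdict"] == "needs_revision"]
    # Collect suggestions from ALL verdicts, including approved ones.
    suggestions = [s for v in verdicts for s in v.get("suggestions", [])]
    return {
        "verdict": "needs_revision" if needs_revision else "approved",
        "blocking_feedback": [v["feedback"] for v in needs_revision],
        "suggestions": suggestions,  # surfaced even on unanimous approval
    }

result = aggregate([
    {"reviewer": "code", "verdict": "approved", "feedback": "",
     "suggestions": ["CLAUDE.md scope creep"]},
    {"reviewer": "contract", "verdict": "approved", "feedback": ""},
])
print(result["verdict"], result["suggestions"])
# -> approved ['CLAUDE.md scope creep']
```

The key property is that an all-approved pipeline run can still hand the downstream workflow a non-empty list of suggestions to act on, matching the PR reviewers' feedback-addressing behavior.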
Authored-by: egg