
SDLC pipeline reviewers produce shallow analysis compared to PR reviewers #897

@james-in-a-box

Description


## Problem

The SDLC pipeline reviewers (code, contract, agent-design) produce significantly shallower analysis than the automated PR reviewers, despite sharing the same criteria files (shared/prompts/code-review-criteria.md, etc.).

Evidence from PR #895 (issue #871):

The SDLC implement-phase reviews stored in .egg-state/reviews/871-*.json are all brief JSON blobs with empty feedback:

| SDLC Review | Verdict | Content |
| --- | --- | --- |
| 871-implement-code-review | approved | ~2 sentences, `"feedback": ""` |
| 871-implement-contract-review | approved | ~2 sentences, `"feedback": ""` |
| 871-plan-plan-review | approved | ~3 sentences, `"feedback": ""` |
| 871-refine-refine-review | approved | 1 sentence, `"feedback": ""` |

The automated PR reviewers running on the same code produced:

| PR Review | Content |
| --- | --- |
| contract-verification | 20-row task-by-task table with criterion, evidence, and verdict for each task. Full re-verification on the delta commit. |
| code review | Found 5 specific issues: PR description inaccuracy, repeated identifier pattern (8+ sites), truthiness vs `is not None` inconsistency, duplicated fallback logic, data file inconsistency. Two drove a follow-up commit (930c82c). Full re-review of the fix. |
| agent-mode-design | Brief but substantive in both rounds. |

The SDLC code reviewer even noticed entrypoint.py and CLAUDE.md scope creep but only stored them as summary text with no feedback — those observations didn't result in any action.

## Root Cause

The prompts share criteria files but diverge on output format and thoroughness instructions:

### 1. The verdict JSON schema constrains depth

`_build_review_prompt()` in `orchestrator/routes/pipelines.py:1554-1570` tells reviewers to write:

```json
{
  "reviewer": "code",
  "verdict": "approved" or "needs_revision",
  "summary": "Brief summary of findings",
  "feedback": "Detailed feedback if needs_revision, empty if approved",
  "timestamp": "ISO 8601 timestamp"
}
```

The instruction "empty if approved" explicitly tells reviewers not to provide feedback when approving. There is no field for advisory suggestions or non-blocking observations. This structurally discourages thorough analysis — an agent told to write {"feedback": ""} will naturally skim.
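One way to remove that structural disincentive is to make analysis mandatory and split blocking feedback from advisory notes. A sketch of such a schema (the `analysis` and `suggestions` fields are hypothetical, not part of the current template):

```json
{
  "reviewer": "code",
  "verdict": "approved" or "needs_revision",
  "summary": "Brief summary of findings",
  "analysis": "File-by-file findings, always populated regardless of verdict",
  "feedback": "Blocking issues, only when needs_revision",
  "suggestions": ["Non-blocking advisory observations, allowed even on approval"],
  "timestamp": "ISO 8601 timestamp"
}
```

With this shape, an approving reviewer still has somewhere to put the entrypoint.py-style observations instead of discarding them.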

### 2. The SDLC prompt is less detailed than the PR prompt

PR code review prompt (action/build-review-prompt.sh):

- "Perform a comprehensive, thorough code review"
- "This is critical infrastructure—your review is the last line of defense before code reaches production"
- "Find ALL issues on the first pass. Do not stop after identifying a few problems"
- 6 detailed procedural steps (get diff, review every file, read context, trace data flow, research, consider edge cases)
- Includes review conventions (be comprehensive, specific, direct, suggest fixes, provide context)

SDLC code review prompt (_build_review_prompt()) for non-draft code reviews:

- Steps 1-9 exist but are briefer
- No "last line of defense" framing
- No review conventions included
- The preamble says "Be thorough" but the output format immediately undercuts it

SDLC non-code reviewers (contract, agent-design, refine, plan) get only:

1. Read the draft at {path}
2. Evaluate it against the criteria below
3. Write your verdict to {path} as JSON
4. Commit the verdict file
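Closing the gap could be as simple as folding the PR-style conventions and thoroughness framing into every SDLC reviewer prompt. A minimal sketch — the function name, constants, and step wording below are illustrative, not the actual `_build_review_prompt()` code:

```python
# Sketch: compose an SDLC reviewer prompt with PR-style depth demands.
# All names here are illustrative, not the real orchestrator code.

THOROUGHNESS_PREAMBLE = (
    "Perform a comprehensive, thorough review. "
    "Find ALL issues on the first pass; do not stop after a few problems."
)

REVIEW_CONVENTIONS = [
    "Be comprehensive",
    "Be specific",
    "Be direct",
    "Suggest fixes",
    "Provide context",
]


def build_reviewer_prompt(draft_path: str, criteria: str) -> str:
    """Build a prompt that demands file-by-file analysis regardless of verdict."""
    conventions = "\n".join(f"- {c}" for c in REVIEW_CONVENTIONS)
    return (
        f"{THOROUGHNESS_PREAMBLE}\n\n"
        f"1. Read the draft at {draft_path}\n"
        f"2. Evaluate it against the criteria below\n"
        f"3. Write a file-by-file analysis, even if you approve\n"
        f"4. Write your verdict JSON, including advisory suggestions\n"
        f"5. Commit the verdict file\n\n"
        f"Review conventions:\n{conventions}\n\n"
        f"Criteria:\n{criteria}"
    )
```

The point is that the shared criteria files stay shared; only the wrapper text changes, so both review systems would set the same expectations of depth.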

### 3. No "approve with suggestions" concept

PR reviewers can approve with advisory suggestions via <!-- has-suggestions --> marker, which triggers the feedback-addressing workflow. SDLC reviewers have binary approved/needs_revision — no way to surface non-blocking observations that still get acted on.
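The aggregation side could surface those non-blocking observations by collecting a `suggestions` field from every verdict, approved or not. A sketch, assuming the hypothetical schema above (this is not the current `_aggregate_review_verdicts` implementation):

```python
# Sketch: aggregate verdicts so that advisory suggestions from APPROVED
# reviews still reach the feedback-addressing workflow.
# The "suggestions" field is hypothetical, not in the current schema.

def aggregate_verdicts(verdicts: list[dict]) -> dict:
    """Return overall verdict plus all advisory suggestions, tagged by reviewer."""
    blocking = [v for v in verdicts if v["verdict"] == "needs_revision"]
    suggestions = [
        (v["reviewer"], s)
        for v in verdicts
        for s in v.get("suggestions", [])
    ]
    return {
        "verdict": "needs_revision" if blocking else "approved",
        "feedback": [v["feedback"] for v in blocking],
        # Non-empty suggestions here would trigger the same follow-up
        # handling the PR side gets from the has-suggestions marker.
        "suggestions": suggestions,
    }
```

This mirrors the PR-side `<!-- has-suggestions -->` behavior: approval does not silence the reviewer.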

## Key Files

| File | Role |
| --- | --- |
| `orchestrator/routes/pipelines.py` (`_build_review_prompt`, lines 1452-1586) | SDLC review prompt builder |
| `orchestrator/routes/pipelines.py` (`_get_reviewer_scope_preamble`, lines 1004-1044) | Reviewer scope instructions |
| `orchestrator/models.py` (`ReviewVerdict`, lines 97-103) | Verdict data model |
| `orchestrator/routes/pipelines.py` (`_aggregate_review_verdicts`, lines 1712-1738) | Verdict aggregation |
| `action/build-review-prompt.sh` | PR code review prompt builder |
| `action/build-contract-verification-prompt.sh` | PR contract verification prompt builder |
| `action/build-agent-mode-design-review-prompt.sh` | PR agent-design review prompt builder |
| `action/review-conventions.md` | Review posting conventions (PR only) |
| `shared/prompts/code-review-criteria.md` | Shared criteria (both systems) |
| `shared/prompts/contract-review-criteria.md` | Shared criteria (both systems) |
| `shared/prompts/agent-design-criteria.md` | Shared criteria (both systems) |

## Expected Outcome

SDLC reviewers should produce at least the same depth of analysis as PR reviewers. They should:

  1. Always provide detailed file-by-file analysis regardless of verdict
  2. Surface advisory suggestions even when approving
  3. Follow the same review conventions (be comprehensive, specific, direct, suggest fixes)
  4. Produce output that is actionable — not just a pass/fail stamp

This likely requires changes to the verdict model (add analysis/suggestions fields that are always populated), the prompt builder (align with PR reviewer instructions), and the aggregation logic (surface suggestions from approved verdicts).
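The verdict-model part of that change could look roughly like the following. This is a sketch only; the real `ReviewVerdict` in `orchestrator/models.py` may be a different kind of class, and the new fields are proposals, not existing code:

```python
# Sketch: ReviewVerdict extended with always-populated analysis and
# advisory suggestions. Field names are proposals, not the current model.
from dataclasses import dataclass, field


@dataclass
class ReviewVerdict:
    reviewer: str
    verdict: str                    # "approved" | "needs_revision"
    summary: str
    analysis: str                   # file-by-file findings, always required
    timestamp: str
    feedback: str = ""              # blocking issues, needs_revision only
    suggestions: list[str] = field(default_factory=list)  # non-blocking

    @property
    def has_suggestions(self) -> bool:
        """True when an approved verdict still carries actionable notes."""
        return bool(self.suggestions)
```

Making `analysis` a required positional field forces every reviewer to emit substantive findings even when approving, which directly addresses the empty-`feedback` pattern seen in the PR #895 verdicts.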

Authored-by: egg

Labels: enhancement