Add checkpoint discovery hints to agent prompts #894
Container ecb438f293a0195cad54e4f5fe6dc8c1371af55ce05af32dd1dfefbe82427861 exited with uncommitted changes. This commit preserves the agent's work-in-progress. Authored-by: egg
Container 302feb3d8b2a3c5204f2f8d40b381654e890ed79dd73f4a45e788f1bc5531258 exited with uncommitted changes. This commit preserves the agent's work-in-progress. Authored-by: egg
Add role-specific checkpoint discovery hints at three levels:
1. Orchestrator prompts (auto-injected into every agent session):
- _build_role_context(): checkpoint pointer in For More Context section
- _build_agent_prompt(): tester gets coder checkpoint list, documenter
gets context --files, integrator gets context and cost commands
- _build_phase_scoped_prompt(): revision checklist gets failed-session hint
2. Agent mode commands (supplementary reference):
- tester-mode.md: Review Prior Work section
- integrator-mode.md: Pipeline Overview section
- documenter-mode.md: Find Changed Files section
- coder-mode.md: Revision Cycle Context section
3. Mission and checkpoint rules (baseline awareness):
- mission.md: Checkpoints row in context sources table, checkpoint
hint in Gather context workflow step
- checkpoint.md: When to Use section with role-specific guidance
All changes are additive text. No logic changes, no schema migrations.
All 1331 orchestrator tests pass.
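The layering described in this commit message can be pictured with a minimal sketch. It is not the code in pipelines.py: `checkpoint_hints`, `ROLE_HINTS`, `GENERAL_HINT`, and `ANALYSIS_ROLES` are hypothetical names. Only the role names and command strings come from this PR, and the integrator commands are left without flags because the PR text does not spell them out.

```python
# Hypothetical sketch, not the actual pipelines.py implementation.
ANALYSIS_ROLES = {"architect", "task_planner", "risk_analyst"}

# Assumption: the general pointer takes $EGG_PIPELINE_ID, as the documenter command does.
GENERAL_HINT = "egg-checkpoint context --pipeline $EGG_PIPELINE_ID"

ROLE_HINTS = {
    "tester": [
        "egg-checkpoint list --pipeline $EGG_PIPELINE_ID "
        "--agent-type coder --phase implement"
    ],
    "documenter": ["egg-checkpoint context --pipeline $EGG_PIPELINE_ID --files"],
    # Flags omitted on purpose: the PR only names the two subcommands.
    "integrator": ["egg-checkpoint context", "egg-checkpoint cost"],
}


def checkpoint_hints(role: str) -> list[str]:
    """Return the checkpoint commands a given role would be pointed at."""
    if role in ANALYSIS_ROLES:
        return []  # analysis roles get no checkpoint hints (early return)
    return [GENERAL_HINT, *ROLE_HINTS.get(role, [])]


print(checkpoint_hints("architect"))  # []
print(len(checkpoint_hints("integrator")))  # 3
```

Keeping the commands in one table like this would also give the mode files and rules a single string to match against, though the PR itself does not claim to be structured that way.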
Import the real docker module in orchestrator/tests/conftest.py before
test collection so that modules using sys.modules.setdefault("docker",
MagicMock()) don't shadow the real package. This prevents
docker_client.NotFound et al. from being bound to MagicMock objects
which aren't BaseException subclasses and break except clauses.
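The failure mode described in this commit message can be reproduced without the Docker SDK. The sketch below uses a hypothetical module name, `fakepkg`, and a stand-in `NotFound` class. It shows that `sys.modules.setdefault` is a no-op once the real module is registered, and that an `except` clause naming a MagicMock attribute raises `TypeError` instead of catching anything.

```python
import sys
import types
from unittest.mock import MagicMock


class NotFound(Exception):
    """Stand-in for an exception class exported by the real package."""


# "fakepkg" is a hypothetical module name, so the demo needs no Docker SDK.
real_pkg = types.ModuleType("fakepkg")
real_pkg.NotFound = NotFound


def resolve(preimport: bool):
    """Return what sys.modules holds after a test module calls setdefault()."""
    sys.modules.pop("fakepkg", None)
    if preimport:
        # Effect of importing the real package before test collection.
        sys.modules["fakepkg"] = real_pkg
    sys.modules.setdefault("fakepkg", MagicMock())  # no-op if the key exists
    return sys.modules.pop("fakepkg")


def can_catch(module) -> bool:
    """True if `except module.NotFound` matches a raised NotFound."""
    try:
        try:
            raise NotFound("no such container")
        except module.NotFound:
            return True
    except TypeError:
        # A MagicMock attribute is not a BaseException subclass,
        # so Python rejects it as an except target.
        return False


print(can_catch(resolve(preimport=True)))   # True
print(can_catch(resolve(preimport=False)))  # False
```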
No agent-mode design concerns. The changes are well-aligned with the agent-mode design guidelines.
The PR adds short checkpoint discovery hints (1-3 lines each) to orchestrator prompts, agent mode commands, and rules. These are orientation, not constraint — they tell agents when to use an existing tool (egg-checkpoint) without pre-fetching data, requiring structured output, or micromanaging procedure. The hints use suggestive language ("Before writing tests, review...") rather than imperative mandates, and reference the existing checkpoint.md rule for details rather than duplicating content. Analysis roles are correctly excluded.
— Authored by egg
Code Review: Add checkpoint discovery hints to agent prompts
Verdict: Approve (with minor suggestions)
Low-risk PR that adds text-only checkpoint discovery hints to orchestrator prompts, agent mode commands, rules, and documentation. No logic changes, no schema migrations. The approach is sound — addressing discoverability at the highest-leverage injection points (orchestrator prompts) rather than over-engineering with new tooling.
Issues Found
1. Command inconsistency: tester-mode.md omits --phase implement (non-blocking)
orchestrator/routes/pipelines.py:2485 uses:
egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement
sandbox/.claude/commands/tester-mode.md:92 uses:
egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder
The --phase implement flag is omitted from tester-mode.md. The pipelines.py version is more precise — it filters to the implement phase, which is what the tester actually needs. The tester-mode.md version would return checkpoints from all phases (including plan/refine), which is rarely useful. The code reviewer for this PR noted the same inconsistency. The checkpoint.md rule (line 9) also omits --phase implement for the tester use case.
Suggestion: Add --phase implement to both tester-mode.md and the tester entry in checkpoint.md to match pipelines.py.
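One way to guard against this kind of drift is a check that the canonical command appears verbatim wherever it is documented. The sketch below works on in-memory strings; `find_drift` and the sample snippets are hypothetical, and a real check would read the files named in this review.

```python
CANONICAL = (
    "egg-checkpoint list --pipeline $EGG_PIPELINE_ID "
    "--agent-type coder --phase implement"
)


def find_drift(sources: dict[str, str], canonical: str = CANONICAL) -> list[str]:
    """Return the names of sources that do not contain the canonical command."""
    return sorted(name for name, text in sources.items() if canonical not in text)


# Hypothetical snippets mirroring the state described in this review.
sources = {
    "pipelines.py": "Run: " + CANONICAL,
    "tester-mode.md": "egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder",
    "checkpoint.md": "egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder",
}

print(find_drift(sources))  # ['checkpoint.md', 'tester-mode.md']
```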
2. conftest.py docker import fix — correct but worth a note (non-blocking)
The conftest.py change (orchestrator/tests/conftest.py:20-28) pre-imports the real docker module before test collection to prevent sys.modules.setdefault("docker", MagicMock()) calls from shadowing it. This is the correct fix for the test_docker_client.py failures caused by mock pollution from test_checkpoint_discovery.py.
The fix is sound and well-documented with a clear comment explaining the rationale. The try/except ImportError handles environments where docker isn't installed. This is a legitimate fix to a pre-existing fragility in the test suite's module mocking strategy.
3. Test coverage is solid (positive)
460 lines of tests in test_checkpoint_discovery.py covering:
- Positive cases for all execution roles (tester, documenter, integrator)
- Negative cases for analysis roles (architect, task_planner, risk_analyst)
- Edge cases (None prompt, None issue_number, empty feedback, cycle 0)
- Revision checklist gating on review_cycle > 0 and review_feedback
- End-to-end tests verifying both general and role-specific hints
- Ordering assertions (checkpoint hint appears after gap-finding section)
The test file follows the same docker mock pattern as test_pipeline_prompts.py.
No Blocking Issues
The PR is well-scoped and correctly implements the prompt-only approach (Option A). Specifically:
- Orchestrator prompt changes (pipelines.py): Checkpoint hints are injected at the right locations: _build_role_context for the general pointer, _build_agent_prompt for role-specific commands, and _build_phase_scoped_prompt for revision cycle failed-session hints. Analysis roles are correctly excluded via the early return at line 1225.
- Agent mode commands: Each role gets a contextually appropriate section. Coder-mode correctly scopes the hint to revision cycles only.
- Rules updates: checkpoint.md gets a "When to Use" section mapping roles to commands. mission.md adds checkpoints to the context sources table and workflow.
- Documentation: checkpoint-access.md adds documenter and coder-revision workflow examples. sdlc-pipeline.md lists checkpoints as a 4th context persistence mechanism.
- .egg-state files: Pipeline state artifacts are properly updated for issue #887. These are expected tracking files.
The hint language is appropriately suggestive ("Before writing tests, review..." / "Find all changed files...") rather than imperative, which should minimize unnecessary checkpoint queries on first-cycle runs.
— Authored by egg
Contract Verification: PASS
All 14 tasks across 4 phases verified against the contract for issue #887.
Phase 1: Orchestrator prompt injection (5/5 tasks verified)
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-1-1 | _build_role_context() includes egg-checkpoint context --pipeline for all execution roles | Verified: line appended to "For More Context" section, analysis roles correctly excluded |
| task-1-2 | Tester prompt includes egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement | Verified: exact command present in tester section |
| task-1-3 | Documenter prompt includes egg-checkpoint context --pipeline $EGG_PIPELINE_ID --files | Verified: exact command present in documenter section |
| task-1-4 | Integrator prompt includes egg-checkpoint context and egg-checkpoint cost commands | Verified: both commands present in integrator section |
| task-1-5 | Revision checklist includes egg-checkpoint list --issue $EGG_ISSUE_NUMBER --status failed when review_cycle > 0 | Verified: added to revision checklist, correctly absent for cycle 0 and no-feedback cases |
Phase 2: Agent mode command updates (4/4 tasks verified)
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-2-1 | tester-mode.md contains ## Review Prior Work with egg-checkpoint list command | Verified |
| task-2-2 | integrator-mode.md contains ## Pipeline Overview with context and cost commands | Verified |
| task-2-3 | documenter-mode.md contains ## Find Changed Files with context --files command | Verified |
| task-2-4 | coder-mode.md contains ## Revision Cycle Context with --status failed command | Verified |
Phase 3: Mission rule and checkpoint rule updates (3/3 tasks verified)
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-3-1 | mission.md context sources table includes Checkpoints row | Verified — row added with egg-checkpoint CLI location |
| task-3-2 | mission.md "Gather context" mentions egg-checkpoint for multi-agent pipelines | Verified — appended to Gather context step |
| task-3-3 | checkpoint.md has "When to Use" section for tester, documenter, integrator, coder (revision) | Verified — all four roles listed with commands |
Phase 4: Verification (2/2 tasks verified)
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-4-1 | All existing tests pass | Verified — 6683 passed, 2 pre-existing failures (test_docker_client.py, unrelated) |
| task-4-2 | Checkpoint hints appear in prompt outputs for each role | Verified — 29 new tests in test_checkpoint_discovery.py cover all roles |
Additional Changes (outside contract scope but consistent)
- docs/guides/checkpoint-access.md: Added documenter and coder-revision workflow examples, updated tester example to use --pipeline flag, added cost command to integrator example
- docs/guides/sdlc-pipeline.md: Added checkpoints as 4th context persistence mechanism and checkpoint discovery hints bullet
- sandbox/.claude/README.md: Added checkpoint.md to core rules listing
- orchestrator/tests/conftest.py: Fixed docker_client test failures from sys.modules mock pollution (import-ordering fix)
- orchestrator/tests/test_checkpoint_discovery.py: 460-line test file with 29 tests covering all prompt-building functions
Contract Notes
- The contract's top-level acceptance_criteria array is empty; all criteria are embedded per-task. Could not use egg-contract verify-criterion due to this structure.
- All task statuses in the contract remain "pending" (not updated by the coder/integrator), but the implementation is complete and verified.
- No contract violations found.
- No orphaned code outside contract scope.
- All changes are additive text — no logic changes, no schema migrations.
Verdict: Approve
All 14 acceptance criteria are met. Implementation is clean, well-tested, and consistent across all injection points. No gaps or violations found.
— Authored by egg
Addresses review feedback: the tester checkpoint list command in tester-mode.md and checkpoint.md was missing --phase implement, which would return checkpoints from all phases instead of just the implement phase that testers actually need. Now matches the authoritative version in orchestrator/routes/pipelines.py.
Agreed. Added --phase implement to both sandbox/.claude/commands/tester-mode.md and sandbox/.claude/rules/checkpoint.md tester entries to match the authoritative version in orchestrator/routes/pipelines.py:2485. The tester only needs implement-phase checkpoints, so filtering by phase avoids returning plan/refine noise.
Pushed in 6f5a4f2.
— Authored by egg
Contract Verification: Re-review (commit 6f5a4f2)
Verdict: PASS — All contract tasks verified.
Note: The branch was force-pushed since the previous review at b49d7978 (commit no longer in history). Full re-verification was performed.
Phase 1: Orchestrator prompt injection ✓
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-1-1 | _build_role_context() includes egg-checkpoint context --pipeline for all execution roles | Verified: Added at L1289+, analysis roles excluded by early return |
| task-1-2 | Tester prompt includes egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement | Verified: Added in tester section of _build_agent_prompt() |
| task-1-3 | Documenter prompt includes egg-checkpoint context --pipeline $EGG_PIPELINE_ID --files | Verified: Added in documenter section |
| task-1-4 | Integrator prompt includes egg-checkpoint context and egg-checkpoint cost commands | Verified: Both commands added in integrator section |
| task-1-5 | Revision checklist includes failed-session hint when review_cycle > 0 | Verified: Added inside if review_feedback and review_cycle > 0: guard |
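The guard named for task-1-5 can be illustrated with a small sketch. Only the condition `review_feedback and review_cycle > 0` and the command string come from this PR; `revision_checklist` and its placeholder checklist item are hypothetical.

```python
from typing import Optional

FAILED_SESSION_HINT = "egg-checkpoint list --issue $EGG_ISSUE_NUMBER --status failed"


def revision_checklist(review_feedback: Optional[str], review_cycle: int) -> list[str]:
    """Build a revision checklist; empty unless this is a real revision cycle."""
    items: list[str] = []
    if review_feedback and review_cycle > 0:
        items.append("Address the reviewer feedback")  # placeholder item
        items.append(FAILED_SESSION_HINT)
    return items


print(revision_checklist("fix flaky test", 0))  # []
print(revision_checklist(None, 2))              # []
```

Because the hint sits inside the guard, first-cycle runs and runs without feedback never see the failed-session command, which matches the cycle 0 and no-feedback cases noted in the first verification.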
Phase 2: Agent mode command updates ✓
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-2-1 | tester-mode.md has ## Review Prior Work with egg-checkpoint list | Verified |
| task-2-2 | integrator-mode.md has ## Pipeline Overview with context + cost commands | Verified |
| task-2-3 | documenter-mode.md has ## Find Changed Files with context --files | Verified |
| task-2-4 | coder-mode.md has ## Revision Cycle Context with --status failed | Verified |
Phase 3: Mission rule and checkpoint rule updates ✓
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-3-1 | mission.md context sources table includes Checkpoints row | Verified |
| task-3-2 | mission.md "Gather context" step mentions egg-checkpoint context | Verified |
| task-3-3 | checkpoint.md has "When to Use" with tester/documenter/integrator/coder guidance | Verified: All four roles listed with role-specific commands |
Phase 4: Verification ✓
| Task | Acceptance Criteria | Status |
|---|---|---|
| task-4-1 | All existing tests pass | Verified — 6685 passed, 85 skipped. 2 pre-existing failures in test_docker_client.py (on main) |
| task-4-2 | Checkpoint hints confirmed in prompts for all roles | Verified — 29 new tests in test_checkpoint_discovery.py covering all roles and edge cases |
Additional Changes (non-contract)
- orchestrator/tests/conftest.py: Docker import fix to prevent sys.modules mock pollution; correctly addresses a pre-existing test ordering issue
- docs/guides/checkpoint-access.md: Added documenter and coder-revision workflow examples, updated tester example to use --pipeline, added cost command to integrator example
- docs/guides/sdlc-pipeline.md: Added checkpoints as 4th context persistence mechanism
- sandbox/.claude/README.md: Added checkpoint.md to core rules listing
All additional changes are consistent with the contract's intent and do not violate any criteria.
Contract Integrity
- No regressions on previously verified criteria (force-push, so full re-verification done)
- No orphaned code outside contract scope
- No contract violations found
- All 14 task acceptance criteria met
— Authored by egg
Re-Review: Add checkpoint discovery hints to agent prompts
Delta reviewed: b49d79788d..6f5a4f2 (feedback-addressing commit)
Previous Feedback: Addressed
The --phase implement inconsistency has been fixed correctly. The tester checkpoint command now includes --phase implement in all three locations:
- sandbox/.claude/commands/tester-mode.md: fixed
- sandbox/.claude/rules/checkpoint.md: fixed
- docs/guides/checkpoint-access.md: already correct
All five injection points (pipelines.py, tester-mode.md, checkpoint.md, checkpoint-access.md, and the test assertions) now use the same command: egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement.
Full Review Summary
Reviewed all 21 changed files across the PR. No blocking issues found.
Orchestrator prompt changes (pipelines.py): Five clean insertions across _build_role_context, _build_agent_prompt (tester/documenter/integrator), and _build_phase_scoped_prompt. Analysis roles are correctly excluded via the early return at line 1225. Checkpoint hints are placed at the right location in each section — after the relevant task context, not before. The revision checklist hint is properly gated behind review_cycle > 0 and review_feedback.
Agent mode commands (.md files): Each role gets a contextually appropriate section placed before the Quality Checklist. Commands match the authoritative versions in pipelines.py.
Rules and docs: checkpoint.md "When to Use" section, mission.md context sources table, and checkpoint-access.md workflow examples are all consistent with each other and with the orchestrator prompts.
Tests (test_checkpoint_discovery.py): 29 tests covering positive cases for all execution roles, negative cases for all analysis roles, edge cases (None prompt, None issue_number, empty feedback, cycle 0), ordering assertions, and end-to-end flows. Test structure follows the existing test_pipeline_prompts.py pattern.
conftest.py fix: Pre-importing docker before test collection prevents sys.modules.setdefault("docker", MagicMock()) from shadowing the real package. Correct fix with proper try/except ImportError guard. This addresses a real test isolation issue.
No security, correctness, robustness, or design issues identified. All changes are additive prompt text — no logic modifications, no schema changes, no breaking changes.
— Authored by egg
Resolved conflicts in two globally-named .egg-state files that were written by different pipelines (issue #871 vs #887), the exact class of conflict this PR is designed to prevent:
- .egg-state/agent-outputs/risk_analyst-output.json: kept PR branch version (issue #871 risk assessment)
- .egg-state/checks/implement-results.json: kept PR branch version (issue #871 check results)
Both files are ephemeral pipeline state artifacts. Main's versions were from a different pipeline run (issue #887, PR #894).
Improve checkpoint discoverability for agents
Agents rarely use egg-checkpoint because nothing in their prompts tells them to. The CLI works, documentation exists, and a Claude Code rule is present, but agents only discover checkpoints if they happen to read the right docs. This change surfaces checkpoint commands directly in agent prompts and mode documentation so every agent session automatically knows how to query prior work.
Changes:
- Orchestrator prompts (orchestrator/routes/pipelines.py): Added checkpoint discovery hints to _build_role_context, _build_agent_prompt (tester, documenter, integrator roles), and _build_phase_scoped_prompt (revision cycles with failed session lookup).
- Agent mode commands: tester-mode.md, documenter-mode.md, integrator-mode.md, and coder-mode.md (revision cycle).
- Rules: checkpoint.md with a "When to Use" section mapping each role to its most useful checkpoint commands. Updated mission.md to list checkpoints as a context source and mention them in the gather-context workflow.
- Documentation: docs/guides/checkpoint-access.md with documenter and coder-revision workflow examples, plus cost command for integrators.
- Tests: test_checkpoint_discovery.py (460 lines) verifying all prompt hints are present, with a conftest.py fix for sys.modules mock isolation.

Closes #887
Issue: #887
Test plan:
- pytest orchestrator/tests/test_checkpoint_discovery.py: all tests pass
- egg-checkpoint commands
- .md files contain checkpoint sections

Authored-by: egg