fix: rebuild Docker image in CI and sync PR validation pipeline #25

Merged
AndrewAltimit merged 8 commits into main from fix/ci-rebuild-docker-image on Feb 20, 2026

Conversation

@AndrewAltimit
Owner

Summary

  • Docker image rebuild: Add explicit docker compose build rust-ci step to both ci.yml and pr-validation.yml. The self-hosted runner was using a stale cached image from before xvfb was added to the Dockerfile, causing xvfb-run: command not found in screenshot tests.
  • Sync PR validation with main CI: Added missing steps to pr-validation.yml so PRs run the same checks as main:
    • Screenshot tests (generate) + artifact upload
    • Benchmarks + artifact upload
    • Test metrics summary

Both pipelines now have identical CI steps before diverging (PR validation additionally runs Gemini/Codex AI reviews).
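
A minimal sketch of the added rebuild step, assuming the step shape described above (the exact text in ci.yml and pr-validation.yml may differ slightly):

```yaml
# Sketch of the rebuild step from the summary above. The step name and the
# rust-ci service name follow this PR's description; what matters is that it
# runs before any step that executes inside the container.
- name: Build CI Docker image
  run: docker compose build rust-ci
```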

Test plan

  • CI pipeline passes on this branch (Docker image rebuilds, screenshot tests find xvfb-run)
  • PR validation pipeline matches ci.yml step-for-step

Generated with Claude Code

AI Agent Bot and others added 2 commits February 20, 2026 02:09
Self-hosted runner was using a stale cached Docker image from before
xvfb was added to the Dockerfile. Add an explicit build step so
docker compose rebuilds the rust-ci image when the Dockerfile changes,
ensuring xvfb-run is available for screenshot tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Build CI Docker image" step to both ci.yml and pr-validation.yml
  so the self-hosted runner rebuilds the rust-ci image when the
  Dockerfile changes (fixes stale image missing xvfb-run).

- Add missing steps to pr-validation.yml to match ci.yml:
  screenshot tests, screenshot report upload, benchmarks,
  benchmark results upload, and test metrics summary.

Both pipelines now run identical CI checks before diverging
(main CI has no review jobs; PR validation adds Gemini/Codex review).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Gemini AI Code Review

Issues (if any)

  • [WARNING] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
    • cargo test --workspace is executed once in the "Test" step and then re-executed in "Test metrics summary" solely to capture output for the summary.
    • Fix by capturing output in the initial "Test" step: cargo test --workspace 2>&1 | tee test_output.txt (see the sketch below).
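
A minimal sketch of that fix, assuming a conventional run: block (step and file names are taken from the review text, not from the workflow itself):

```yaml
# Sketch of Gemini's suggestion: run the tests once and keep the output for the
# later metrics step. set -o pipefail is included so tee cannot mask a failing
# cargo test (a point raised separately in the Codex review below).
- name: Test
  run: |
    set -o pipefail
    cargo test --workspace 2>&1 | tee test_output.txt
```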

Previous Issues (for incremental reviews)

(none)

Suggestions (if any)

  • .github/workflows/pr-validation.yml:128 - Robustness of test result parsing
    • The current grep approach for counting PASS/FAIL counts might be fragile across different Rust versions or if output formatting changes.
    • Consider parsing the `test result: ...` summary line generated by cargo test for more reliable metrics (see the sketch below).
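
A rough illustration of this suggestion; the field positions assume cargo test's standard libtest summary lines ("test result: ok. N passed; M failed; ...") and are not taken from the actual workflow:

```yaml
# Hypothetical metrics step that sums pass/fail counts from the libtest summary
# lines captured in test_output.txt, rather than grepping individual test lines.
- name: Test metrics summary
  run: |
    passed=$(grep '^test result:' test_output.txt | awk '{s+=$4} END {print s+0}')
    failed=$(grep '^test result:' test_output.txt | awk '{s+=$6} END {print s+0}')
    echo "Tests passed: $passed, failed: $failed" >> "$GITHUB_STEP_SUMMARY"
```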

Notes

  • Explicitly building the CI Docker image (docker compose build rust-ci) is a necessary fix for self-hosted runners where cached images can become stale (e.g., missing xvfb after a Dockerfile update).
  • Synchronizing pr-validation.yml with ci.yml ensures PRs are subjected to the same rigor (benchmarks, screenshots) as the main branch, preventing regression leakage.

Generated by Gemini AI (gemini-3-flash-preview). Supplementary to human reviews.

@github-actions

Codex AI Code Review

Issues (if any)

  • [BUG] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.
    What's wrong: cargo bench --workspace 2>&1 | tee bench_results.txt runs without pipefail, so the step can pass even if cargo bench fails.
    How to fix (if obvious): add set -o pipefail before the command, or capture and check the cargo bench exit code explicitly (see the sketch below).
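
A minimal sketch of the first option, assuming the step layout described above:

```yaml
# Sketch of the fix: with pipefail, a cargo bench failure propagates through the
# pipe even though tee (the last command) always exits 0.
- name: Benchmarks
  run: |
    set -o pipefail
    cargo bench --workspace 2>&1 | tee bench_results.txt
```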

Previous Issues (for incremental reviews)

  • [STILL UNRESOLVED] .github/workflows/pr-validation.yml:124 - Tests are re-run only to generate metrics, duplicating execution and increasing timeout/flakiness risk.

Suggestions (if any)

  • .github/workflows/pr-validation.yml:125 - Parse the test result: summary line instead of per-test grep patterns for more stable pass/fail counts.

Notes

  • .github/workflows/ci.yml:44 and .github/workflows/pr-validation.yml:64 correctly add Docker image rebuilds to avoid stale runner images.
  • The same benchmark masking pattern is also present at .github/workflows/ci.yml:85; fixing both workflows keeps CI/PR behavior aligned.

Generated by Codex AI (gpt-5.3-codex). Supplementary to human reviews.

Benchmark steps piped through `tee` without `pipefail`, silently masking
cargo bench failures. Test metrics summary re-ran the entire test suite
just to capture output; now the initial Test step captures output via tee
and the metrics step reuses it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Review Response Agent (Iteration 1)

Status: No changes needed

Fixed Issues

  • [BUG] Benchmark failures masked by tee pipe (Codex) - Added set -o pipefail to both the Benchmarks step and the new Test step in ci.yml and pr-validation.yml. Without this, cargo bench or cargo test failures were silently swallowed because tee (the last pipeline command) always succeeds.
  • [WARNING] Redundant test execution (Gemini + Codex) - Eliminated the duplicate cargo test --workspace in the "Test metrics summary" step. The initial "Test" step now captures output via tee test_output.txt, and the metrics step reads from that file instead of re-running all tests. Added a guard for when the file doesn't exist (e.g., if the test step was skipped); the reworked step is sketched below.
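
Roughly, the reworked metrics step looks like this (a sketch based on the description above, not the literal workflow text):

```yaml
# Sketch of the "Test metrics summary" step after the change: it reuses
# test_output.txt captured by the Test step and exits cleanly if the file is
# missing (e.g. the Test step was skipped).
- name: Test metrics summary
  run: |
    if [ ! -f test_output.txt ]; then
      echo "No test output captured; skipping metrics summary."
      exit 0
    fi
    grep '^test result:' test_output.txt >> "$GITHUB_STEP_SUMMARY" || true  # tolerate no matches
```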

Ignored Issues

  • [Suggestion] Parse test result: summary line instead of grep (Gemini + Codex) - Nice-to-have improvement to parsing robustness, but the current grep patterns work correctly for standard cargo test output. Not worth the added complexity for a CI summary.

Deferred to Human

  • (none)

Notes

  • Both fixes were applied symmetrically to ci.yml and pr-validation.yml to keep the pipelines in sync (which was the original PR's goal).
  • The pipefail fix also protects the Test step's new tee pipeline, so test failures still correctly fail the CI job.

The agent reviewed feedback but determined no code changes were required.

AI Agent Bot and others added 3 commits February 20, 2026 02:27
xvfb-run requires xauth to manage X authentication cookies, but
--no-install-recommends on Debian trixie does not pull it in as
a dependency of the xvfb package. This was the actual cause of the
screenshot test failure: "xvfb-run: error: xauth command not found".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDL2's .accelerated().present_vsync() hangs in Docker with xvfb when
no OpenGL/GPU is available. Force SDL_RENDER_DRIVER=software to use
the software renderer, add a 5-minute timeout as a safety net, and
install libgl1-mesa-dri in the Docker image as a fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Review Response Agent (Iteration 2)

Status: No changes needed

Fixed Issues

  • (none - no actionable feedback was provided)

Ignored Issues

  • Gemini review API crash - The Gemini CLI failed with TypeError: fetch failed sending request before producing any review output. This is an infrastructure/API connectivity issue, not a code issue.

Deferred to Human

  • (none)

Notes

  • The Gemini review for this iteration failed entirely due to an API error, so no code review feedback was generated.
  • Verified that both ci.yml and pr-validation.yml correctly include the previous iteration's fixes: set -o pipefail in Test and Benchmarks steps, tee test_output.txt to capture test output, and a file-existence guard in the Test metrics summary step.
  • No Codex review feedback was included in this iteration either.

The agent reviewed feedback but determined no code changes were required.

Automated fix by Claude in response to pipeline failures.

Failures addressed:
- format
- lint
- test-suite

Actions taken:
- Ran autoformat (ruff format, cargo fmt)
- Fixed remaining lint issues

Iteration: 1/5

Co-Authored-By: AI Pipeline Agent <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Failure Handler Agent (Iteration 1)

Status: Changes committed and pushed

Commit: f3f2d13

Failures addressed:

  • format
  • lint
  • test-suite

Automated fix in response to CI pipeline failures.

@github-actions

Gemini AI Incremental Review

This is an incremental review focusing on changes since the last review.

Issues (if any)

(none)

Previous Issues (for incremental reviews)

  • [RESOLVED] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
  • [RESOLVED] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.

Suggestions (if any)

(none)

Notes

  • 1 claim(s) were automatically filtered as potential hallucinations (file:line content didn't match claims)

  • The set -o pipefail additions correctly prevent tee from swallowing command failures in the CI and PR validation pipelines.

  • Test output caching (test_output.txt [UNVERIFIED - File does not exist and is not in PR]) now properly avoids redundant executions during the metrics generation step.

  • The SDL renderer fallback logic (SDL_RENDER_DRIVER=software) elegantly avoids xvfb hardware acceleration crashes in the CI environment without affecting the default desktop behavior.

  • The Dockerfile updates correctly add the necessary graphics libraries for headless Mesa/Xvfb execution.

Generated by Gemini AI (gemini-3.1-pro-preview). Supplementary to human reviews.

xvfb-run hangs in the CI runner's Docker environment. With
SDL_VIDEODRIVER=dummy and SDL_RENDER_DRIVER=software, SDL2 renders
to an in-memory buffer without needing any X11 display at all.

Verified locally from the CI runner directory: 57 scenarios pass in
~2 seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Summary of CI screenshot test fixes

The screenshot test step was hanging indefinitely in CI. Here's what we found and why each change was made:

Root cause

xvfb-run hangs when executed inside Docker from the CI runner's checkout directory (actions-runner/_work/...). It works fine from the dev directory, which is why earlier fixes passed local testing but failed in CI. The exact cause of the hang appears to be related to the runner environment's X11/display state, but deeper investigation revealed that we don't need xvfb at all.

Changes

Drop xvfb-run, use SDL dummy video driver instead (ci.yml, pr-validation.yml)

  • Set SDL_VIDEODRIVER=dummy so SDL2 renders to an in-memory buffer with no X11 display needed
  • Set SDL_RENDER_DRIVER=software to use SDL2's software renderer (no OpenGL/GPU required)
  • Pass these as -e flags to docker compose run instead of wrapping in bash -c with xvfb-run
  • Added timeout-minutes: 5 as a safety net (the resulting step is sketched after this list)
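
Putting those pieces together, the resulting step looks roughly like this (the rust-ci service name comes from this PR; the crate name is a hypothetical stand-in for whatever command actually generates the screenshots):

```yaml
# Sketch of the screenshot-test step after this change: no xvfb-run wrapper, the
# SDL variables passed as -e flags, and a step-level timeout as the safety net.
- name: Screenshot tests
  timeout-minutes: 5
  run: |
    docker compose run --rm \
      -e SDL_VIDEODRIVER=dummy \
      -e SDL_RENDER_DRIVER=software \
      rust-ci cargo test -p oasis-screenshots  # hypothetical crate name
```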

Skip .accelerated().present_vsync() in headless mode (oasis-backend-sdl/src/lib.rs - agent commit)

  • When SDL_RENDER_DRIVER=software is set, the SDL backend now skips hardware acceleration and vsync
  • These are meaningless for an in-memory software renderer and could cause hangs

Added libgl1-mesa-dri to Dockerfile (docker/rust-ci.Dockerfile)

  • Provides Mesa's software OpenGL as a fallback in case any SDL code path still tries GL

Verification

Tested the exact CI command from the CI runner directory — 57 screenshot scenarios pass in ~2 seconds.

@github-actions

Gemini AI Incremental Review

This is an incremental review focusing on changes since the last review.

Issues (if any)

(none)

Previous Issues (for incremental reviews)

  • [RESOLVED] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
  • [RESOLVED] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.

Suggestions (if any)

(none)

Notes

  • The set -o pipefail additions successfully prevent test and benchmark failures from being silently ignored by the tee command in both CI and PR workflows.
  • Capturing the test output directly during the test step eliminates the redundant test run, and the file existence check (if [ ! -f test_output.txt ]) adds good robustness if previous steps are skipped.
  • Bypassing xvfb-run in favor of SDL_VIDEODRIVER=dummy and conditionally disabling hardware acceleration/vsync in oasis-backend-sdl is a much cleaner and more stable approach for headless CI execution.

Generated by Gemini AI (gemini-3.1-pro-preview). Supplementary to human reviews.

@github-actions

Codex AI Incremental Review

This is an incremental review focusing on changes since the last review.

Issues (if any)

(none)

Previous Issues (for incremental reviews)

  • [RESOLVED] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
  • [RESOLVED] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.

Suggestions (if any)

(none)

Notes

  • 1 claim(s) were automatically filtered as potential hallucinations (file:line content didn't match claims)

  • Verified all [NEW SINCE LAST REVIEW] changes shown in the diff; no new actionable bugs or security issues were identified.

  • set -o pipefail is now present in both test and benchmark piped commands, so failures are no longer masked.

  • Test metrics now consume existing test_output.txt [UNVERIFIED - File does not exist and is not in PR] instead of re-running cargo test, removing the duplicate execution path.

Generated by Codex AI (gpt-5.3-codex). Supplementary to human reviews.

@AndrewAltimit
Owner Author

Review Response Agent (Iteration 3)

Status: No changes needed

Fixed Issues

  • (none - no actionable feedback was provided)

Ignored Issues

  • test_output.txt [UNVERIFIED] annotations (Gemini + Codex) - These are automated review tool annotations noting that test_output.txt doesn't exist in the repo. This is expected - it's a runtime artifact created by tee during CI test execution, not a committed file.

Deferred to Human

  • (none)

Notes

  • Both Gemini and Codex reviews for this iteration reported zero issues and zero suggestions.
  • All previously reported issues (redundant test execution, benchmark failures masked by tee) are confirmed as resolved.
  • The xvfb-run to SDL_VIDEODRIVER=dummy migration and the set -o pipefail additions are confirmed correct by both reviewers.

The agent reviewed feedback but determined no code changes were required.

@AndrewAltimit merged commit b1d41da into main on Feb 20, 2026
17 of 21 checks passed
@AndrewAltimit deleted the fix/ci-rebuild-docker-image branch on February 20, 2026 at 10:18