fix: rebuild Docker image in CI and sync PR validation pipeline #25

Merged
AndrewAltimit merged 8 commits into main from fix/ci-rebuild-docker-image on Feb 20, 2026

Conversation

@AndrewAltimit
Owner

Summary

  • Docker image rebuild: Add explicit docker compose build rust-ci step to both ci.yml and pr-validation.yml. The self-hosted runner was using a stale cached image from before xvfb was added to the Dockerfile, causing xvfb-run: command not found in screenshot tests.
  • Sync PR validation with main CI: Added missing steps to pr-validation.yml so PRs run the same checks as main:
    • Screenshot tests (generate) + artifact upload
    • Benchmarks + artifact upload
    • Test metrics summary

Both pipelines now have identical CI steps before diverging (PR validation additionally runs Gemini/Codex AI reviews).
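
A minimal sketch of the added rebuild step, assuming the step shape described above (the exact text in ci.yml and pr-validation.yml may differ slightly):

```yaml
# Sketch of the rebuild step from the summary above. The step name and the
# rust-ci service name follow this PR's description; what matters is that it
# runs before any step that executes inside the container.
- name: Build CI Docker image
  run: docker compose build rust-ci
```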

Test plan

  • CI pipeline passes on this branch (Docker image rebuilds, screenshot tests find xvfb-run)
  • PR validation pipeline matches ci.yml step-for-step

Generated with Claude Code

AI Agent Bot and others added 2 commits February 20, 2026 02:09
Self-hosted runner was using a stale cached Docker image from before
xvfb was added to the Dockerfile. Add an explicit build step so
docker compose rebuilds the rust-ci image when the Dockerfile changes,
ensuring xvfb-run is available for screenshot tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Build CI Docker image" step to both ci.yml and pr-validation.yml
  so the self-hosted runner rebuilds the rust-ci image when the
  Dockerfile changes (fixes stale image missing xvfb-run).

- Add missing steps to pr-validation.yml to match ci.yml:
  screenshot tests, screenshot report upload, benchmarks,
  benchmark results upload, and test metrics summary.

Both pipelines now run identical CI checks before diverging
(main CI has no review jobs; PR validation adds Gemini/Codex review).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Gemini AI Code Review

Issues (if any)

  • [WARNING] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
    • cargo test --workspace is executed once in the "Test" step and then re-executed in "Test metrics summary" solely to capture output for the summary.
    • Fix by capturing output in the initial "Test" step: cargo test --workspace 2>&1 | tee test_output.txt (see the sketch below).
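
A minimal sketch of that fix, assuming a conventional run: block (step and file names are taken from the review text, not from the workflow itself):

```yaml
# Sketch of Gemini's suggestion: run the tests once and keep the output for the
# later metrics step. set -o pipefail is included so tee cannot mask a failing
# cargo test (a point raised separately in the Codex review below).
- name: Test
  run: |
    set -o pipefail
    cargo test --workspace 2>&1 | tee test_output.txt
```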

Previous Issues (for incremental reviews)

(none)

Suggestions (if any)

  • .github/workflows/pr-validation.yml:128 - Robustness of test result parsing
    • The current grep approach for counting PASS/FAIL counts might be fragile across different Rust versions or if output formatting changes.
    • Consider parsing the `test result: ...` summary line generated by cargo test for more reliable metrics (see the sketch below).
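
A rough illustration of this suggestion; the field positions assume cargo test's standard libtest summary lines ("test result: ok. N passed; M failed; ...") and are not taken from the actual workflow:

```yaml
# Hypothetical metrics step that sums pass/fail counts from the libtest summary
# lines captured in test_output.txt, rather than grepping individual test lines.
- name: Test metrics summary
  run: |
    passed=$(grep '^test result:' test_output.txt | awk '{s+=$4} END {print s+0}')
    failed=$(grep '^test result:' test_output.txt | awk '{s+=$6} END {print s+0}')
    echo "Tests passed: $passed, failed: $failed" >> "$GITHUB_STEP_SUMMARY"
```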

Notes

  • Explicitly building the CI Docker image (docker compose build rust-ci) is a necessary fix for self-hosted runners where cached images can become stale (e.g., missing xvfb after a Dockerfile update).
  • Synchronizing pr-validation.yml with ci.yml ensures PRs are subjected to the same rigor (benchmarks, screenshots) as the main branch, preventing regression leakage.

Generated by Gemini AI (gemini-3-flash-preview). Supplementary to human reviews.

@github-actions

Codex AI Code Review

Issues (if any)

  • [BUG] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.
    What's wrong: cargo bench --workspace 2>&1 | tee bench_results.txt runs without pipefail, so the step can pass even if cargo bench fails.
    How to fix (if obvious): add set -o pipefail before the command, or capture and check the cargo bench exit code explicitly (see the sketch below).
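
A minimal sketch of the first option, assuming the step layout described above:

```yaml
# Sketch of the fix: with pipefail, a cargo bench failure propagates through the
# pipe even though tee (the last command) always exits 0.
- name: Benchmarks
  run: |
    set -o pipefail
    cargo bench --workspace 2>&1 | tee bench_results.txt
```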

Previous Issues (for incremental reviews)

  • [STILL UNRESOLVED] .github/workflows/pr-validation.yml:124 - Tests are re-run only to generate metrics, duplicating execution and increasing timeout/flakiness risk.

Suggestions (if any)

  • .github/workflows/pr-validation.yml:125 - Parse the test result: summary line instead of per-test grep patterns for more stable pass/fail counts.

Notes

  • .github/workflows/ci.yml:44 and .github/workflows/pr-validation.yml:64 correctly add Docker image rebuilds to avoid stale runner images.
  • The same benchmark masking pattern is also present at .github/workflows/ci.yml:85; fixing both workflows keeps CI/PR behavior aligned.

Generated by Codex AI (gpt-5.3-codex). Supplementary to human reviews.

Benchmark steps piped through `tee` without `pipefail`, silently masking
cargo bench failures. Test metrics summary re-ran the entire test suite
just to capture output; now the initial Test step captures output via tee
and the metrics step reuses it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Review Response Agent (Iteration 1)

Status: No changes needed

Fixed Issues

  • [BUG] Benchmark failures masked by tee pipe (Codex) - Added set -o pipefail to both the Benchmarks step and the new Test step in ci.yml and pr-validation.yml. Without this, cargo bench or cargo test failures were silently swallowed because tee (the last pipeline command) always succeeds.
  • [WARNING] Redundant test execution (Gemini + Codex) - Eliminated the duplicate cargo test --workspace in the "Test metrics summary" step. The initial "Test" step now captures output via tee test_output.txt, and the metrics step reads from that file instead of re-running all tests. Added a guard for when the file doesn't exist (e.g., if the test step was skipped); the reworked step is sketched below.
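
Roughly, the reworked metrics step looks like this (a sketch based on the description above, not the literal workflow text):

```yaml
# Sketch of the "Test metrics summary" step after the change: it reuses
# test_output.txt captured by the Test step and exits cleanly if the file is
# missing (e.g. the Test step was skipped).
- name: Test metrics summary
  run: |
    if [ ! -f test_output.txt ]; then
      echo "No test output captured; skipping metrics summary."
      exit 0
    fi
    grep '^test result:' test_output.txt >> "$GITHUB_STEP_SUMMARY" || true  # tolerate no matches
```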

Ignored Issues

  • [Suggestion] Parse test result: summary line instead of grep (Gemini + Codex) - Nice-to-have improvement to parsing robustness, but the current grep patterns work correctly for standard cargo test output. Not worth the added complexity for a CI summary.

Deferred to Human

  • (none)

Notes

  • Both fixes were applied symmetrically to ci.yml and pr-validation.yml to keep the pipelines in sync (which was the original PR's goal).
  • The pipefail fix also protects the Test step's new tee pipeline, so test failures still correctly fail the CI job.

The agent reviewed feedback but determined no code changes were required.

AI Agent Bot and others added 3 commits February 20, 2026 02:27
xvfb-run requires xauth to manage X authentication cookies, but
--no-install-recommends on Debian trixie does not pull it in as
a dependency of the xvfb package. This was the actual cause of the
screenshot test failure: "xvfb-run: error: xauth command not found".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDL2's .accelerated().present_vsync() hangs in Docker with xvfb when
no OpenGL/GPU is available. Force SDL_RENDER_DRIVER=software to use
the software renderer, add a 5-minute timeout as a safety net, and
install libgl1-mesa-dri in the Docker image as a fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Review Response Agent (Iteration 2)

Status: No changes needed

Fixed Issues

  • (none - no actionable feedback was provided)

Ignored Issues

  • Gemini review API crash - The Gemini CLI failed with TypeError: fetch failed sending request before producing any review output. This is an infrastructure/API connectivity issue, not a code issue.

Deferred to Human

  • (none)

Notes

  • The Gemini review for this iteration failed entirely due to an API error, so no code review feedback was generated.
  • Verified that both ci.yml and pr-validation.yml correctly include the previous iteration's fixes: set -o pipefail in Test and Benchmarks steps, tee test_output.txt to capture test output, and a file-existence guard in the Test metrics summary step.
  • No Codex review feedback was included in this iteration either.

The agent reviewed feedback but determined no code changes were required.

Automated fix by Claude in response to pipeline failures.

Failures addressed:
- format
- lint
- test-suite

Actions taken:
- Ran autoformat (ruff format, cargo fmt)
- Fixed remaining lint issues

Iteration: 1/5

Co-Authored-By: AI Pipeline Agent <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Failure Handler Agent (Iteration 1)

Status: Changes committed and pushed

Commit: f3f2d13

Failures addressed:

  • format
  • lint
  • test-suite

Automated fix in response to CI pipeline failures.

@github-actions

Gemini AI Incremental Review

This is an incremental review focusing on changes since the last review.

Issues (if any)

(none)

Previous Issues (for incremental reviews)

  • [RESOLVED] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
  • [RESOLVED] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.

Suggestions (if any)

(none)

Notes

  • 1 claim(s) were automatically filtered as potential hallucinations (file:line content didn't match claims)

  • The set -o pipefail additions correctly prevent tee from swallowing command failures in the CI and PR validation pipelines.

  • Test output caching (test_output.txt [UNVERIFIED - File does not exist and is not in PR]) now properly avoids redundant executions during the metrics generation step.

  • The SDL renderer fallback logic (SDL_RENDER_DRIVER=software) elegantly avoids xvfb hardware acceleration crashes in the CI environment without affecting the default desktop behavior.

  • The Dockerfile updates correctly add the necessary graphics libraries for headless Mesa/Xvfb execution.

Generated by Gemini AI (gemini-3.1-pro-preview). Supplementary to human reviews.

xvfb-run hangs in the CI runner's Docker environment. With
SDL_VIDEODRIVER=dummy and SDL_RENDER_DRIVER=software, SDL2 renders
to an in-memory buffer without needing any X11 display at all.

Verified locally from the CI runner directory: 57 scenarios pass in
~2 seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AndrewAltimit
Owner Author

Summary of CI screenshot test fixes

The screenshot test step was hanging indefinitely in CI. Here's what we found and why each change was made:

Root cause

xvfb-run hangs when executed inside Docker from the CI runner's checkout directory (actions-runner/_work/...). It works fine from the dev directory, which is why earlier fixes passed local testing but failed in CI. The exact cause of the hang appears to be related to the runner environment's X11/display state, but deeper investigation revealed that we don't need xvfb at all.

Changes

Drop xvfb-run, use SDL dummy video driver instead (ci.yml, pr-validation.yml)

  • Set SDL_VIDEODRIVER=dummy so SDL2 renders to an in-memory buffer with no X11 display needed
  • Set SDL_RENDER_DRIVER=software to use SDL2's software renderer (no OpenGL/GPU required)
  • Pass these as -e flags to docker compose run instead of wrapping in bash -c with xvfb-run
  • Added timeout-minutes: 5 as a safety net (the resulting step is sketched after this list)
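
Putting those pieces together, the resulting step looks roughly like this (the rust-ci service name comes from this PR; the crate name is a hypothetical stand-in for whatever command actually generates the screenshots):

```yaml
# Sketch of the screenshot-test step after this change: no xvfb-run wrapper, the
# SDL variables passed as -e flags, and a step-level timeout as the safety net.
- name: Screenshot tests
  timeout-minutes: 5
  run: |
    docker compose run --rm \
      -e SDL_VIDEODRIVER=dummy \
      -e SDL_RENDER_DRIVER=software \
      rust-ci cargo test -p oasis-screenshots  # hypothetical crate name
```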

Skip .accelerated().present_vsync() in headless mode (oasis-backend-sdl/src/lib.rs - agent commit)

  • When SDL_RENDER_DRIVER=software is set, the SDL backend now skips hardware acceleration and vsync
  • These are meaningless for an in-memory software renderer and could cause hangs

Added libgl1-mesa-dri to Dockerfile (docker/rust-ci.Dockerfile)

  • Provides Mesa's software OpenGL as a fallback in case any SDL code path still tries GL

Verification

Tested the exact CI command from the CI runner directory — 57 screenshot scenarios pass in ~2 seconds.

@github-actions

Gemini AI Incremental Review

This is an incremental review focusing on changes since the last review.

Issues (if any)

(none)

Previous Issues (for incremental reviews)

  • [RESOLVED] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
  • [RESOLVED] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.

Suggestions (if any)

(none)

Notes

  • The set -o pipefail additions successfully prevent test and benchmark failures from being silently ignored by the tee command in both CI and PR workflows.
  • Capturing the test output directly during the test step eliminates the redundant test run, and the file existence check (if [ ! -f test_output.txt ]) adds good robustness if previous steps are skipped.
  • Bypassing xvfb-run in favor of SDL_VIDEODRIVER=dummy and conditionally disabling hardware acceleration/vsync in oasis-backend-sdl is a much cleaner and more stable approach for headless CI execution.

Generated by Gemini AI (gemini-3.1-pro-preview). Supplementary to human reviews.

@github-actions

Codex AI Incremental Review

This is an incremental review focusing on changes since the last review.

Issues (if any)

(none)

Previous Issues (for incremental reviews)

  • [RESOLVED] .github/workflows/ci.yml:74 and .github/workflows/pr-validation.yml:74 - Redundant test execution
  • [RESOLVED] .github/workflows/pr-validation.yml:105 - Benchmark failures are masked by piping to tee.

Suggestions (if any)

(none)

Notes

  • 1 claim(s) were automatically filtered as potential hallucinations (file:line content didn't match claims)

  • Verified all [NEW SINCE LAST REVIEW] changes shown in the diff; no new actionable bugs or security issues were identified.

  • set -o pipefail is now present in both test and benchmark piped commands, so failures are no longer masked.

  • Test metrics now consume existing test_output.txt [UNVERIFIED - File does not exist and is not in PR] instead of re-running cargo test, removing the duplicate execution path.

Generated by Codex AI (gpt-5.3-codex). Supplementary to human reviews.

@AndrewAltimit
Owner Author

Review Response Agent (Iteration 3)

Status: No changes needed

Fixed Issues

  • (none - no actionable feedback was provided)

Ignored Issues

  • test_output.txt [UNVERIFIED] annotations (Gemini + Codex) - These are automated review tool annotations noting that test_output.txt doesn't exist in the repo. This is expected - it's a runtime artifact created by tee during CI test execution, not a committed file.

Deferred to Human

  • (none)

Notes

  • Both Gemini and Codex reviews for this iteration reported zero issues and zero suggestions.
  • All previously reported issues (redundant test execution, benchmark failures masked by tee) are confirmed as resolved.
  • The xvfb-run to SDL_VIDEODRIVER=dummy migration and the set -o pipefail additions are confirmed correct by both reviewers.

The agent reviewed feedback but determined no code changes were required.

@AndrewAltimit merged commit b1d41da into main on Feb 20, 2026
17 of 21 checks passed
@AndrewAltimit deleted the fix/ci-rebuild-docker-image branch on February 20, 2026 at 10:18