ci: Auto-build and upload uv cache on miss #608
Merged
Instead of failing CI when the prebuilt uv cache is missing (requiring a manual rebuild on a separate machine), gracefully fall back to building from scratch and uploading the cache for future runs.

- Change permissions to `contents: write` for release asset uploads
- Convert hard failures in cache restore to warnings with a `cache-hit` output
- Add an upload step that archives the uv cache after `uv sync` and uploads it via the existing `build_and_push_uv_cache.sh` script (`--skip-build`)
- Re-check before upload to avoid races when concurrent CI runs both miss the cache
- Use `continue-on-error` so upload failures never break quality checks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
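The fallback described in this commit could look roughly like the fragment below. It is a sketch under assumptions, not the workflow from this PR: the step ids, the `scripts/` path, the `restore_uv_cache.sh` helper, and the archive name are all hypothetical, and how `build_and_push_uv_cache.sh --skip-build` locates the archive is not stated in the commit message.

```yaml
# Sketch only. Step ids, script paths, and the archive name are assumptions.
permissions:
  contents: write  # release asset uploads need write access

jobs:
  quality-checks:
    runs-on: art-large-runner
    steps:
      - uses: actions/checkout@v4

      - name: Restore prebuilt uv cache
        id: restore-uv-cache
        run: |
          # Hypothetical helper: exits 0 after unpacking the cache, non-zero on miss.
          if ./scripts/restore_uv_cache.sh; then
            echo "cache-hit=true" >> "$GITHUB_OUTPUT"
          else
            # A warning annotation instead of a hard failure.
            echo "::warning::Prebuilt uv cache not found; building from scratch"
            echo "cache-hit=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Install dependencies
        run: uv sync

      - name: Upload uv cache on miss
        if: steps.restore-uv-cache.outputs.cache-hit == 'false'
        continue-on-error: true  # an upload failure must not fail the job
        run: |
          # Assumes UV_CACHE_DIR is set for the job.
          tar -czf "$RUNNER_TEMP/uv-cache.tar.gz" -C "$UV_CACHE_DIR" .
          # The re-check for a concurrently uploaded cache would go here.
          ./scripts/build_and_push_uv_cache.sh --skip-build
```

Writing `cache-hit` to `$GITHUB_OUTPUT` in both branches keeps the later `if:` condition explicit, so the upload step runs only on a real miss.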
Move cache building to a separate `build-cache` job that runs on a larger runner (`art-cache-builder`) only when the cache is missing. This avoids OOM on the 16GB `art-large-runner` during cold builds.

- `cache-status`: lightweight check for existing cache (art-large-runner)
- `build-cache`: builds and uploads cache on miss (art-cache-builder, >=32GB)
- `quality-checks`: restores cache and runs checks (art-large-runner)

On cache hit, `build-cache` is skipped and `quality-checks` runs immediately. On cache miss, `quality-checks` waits for `build-cache` to finish first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
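One way to wire the three jobs together is sketched below. The job and runner names come from the commit message; the output name, the script paths, and the exact `if:` expressions are assumptions. In particular, whether `quality-checks` should still run after a failed `build-cache` is not stated, so this sketch only lets it run when `build-cache` succeeded or was skipped.

```yaml
# Sketch only. Output names, script paths, and conditions are assumptions.
jobs:
  cache-status:
    runs-on: art-large-runner
    outputs:
      cache-hit: ${{ steps.check.outputs.cache-hit }}
    steps:
      - uses: actions/checkout@v4
      - id: check
        # Hypothetical script that writes cache-hit=true|false to $GITHUB_OUTPUT.
        run: ./scripts/check_uv_cache.sh

  build-cache:
    needs: cache-status
    if: needs.cache-status.outputs.cache-hit != 'true'  # skipped on a hit
    runs-on: art-cache-builder
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/build_and_push_uv_cache.sh

  quality-checks:
    needs: [cache-status, build-cache]
    # A job whose dependency was skipped is itself skipped by default, so the
    # condition uses a status function and inspects the results explicitly.
    if: ${{ !cancelled() && needs.cache-status.result == 'success' && (needs.build-cache.result == 'success' || needs.build-cache.result == 'skipped') }}
    runs-on: art-large-runner
    steps:
      - uses: actions/checkout@v4
      - run: uv sync
```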
Restrict parallel downloads (4), installs (1), and native build jobs (2) to keep peak memory usage within the 64GB runner limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
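The commit message does not name the variables it sets. A plausible sketch uses uv's own concurrency settings for downloads and installs; `MAX_JOBS` for the native build limit is a guess (it is the variable PyTorch's extension builder reads), and the PR may use a different one.

```yaml
# Sketch only. The variable choice is an assumption; the values come from the commit message.
env:
  UV_CONCURRENT_DOWNLOADS: "4"  # parallel downloads
  UV_CONCURRENT_INSTALLS: "1"   # parallel installs
  MAX_JOBS: "2"                 # assumed knob for native (compiled) build jobs
```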
Docker Buildx manages memory via overlay layers and doesn't get OOM-killed like bare `uv sync` does. This matches the pre-#560 approach and works on the existing art-large-runner (16GB) without needing a larger runner.

- Add `docker/ci-uv-cache.Dockerfile` to build the uv cache in Docker
- `build-cache` job uses Buildx with GHA cache, then extracts the archive and uploads via the existing `build_and_push_uv_cache.sh` script
- Remove dependency on the `art-cache-builder` runner

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
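A Buildx-based `build-cache` job along these lines might look like the fragment below. Only the Dockerfile path and the script name come from the commit message; the image tag, the archive location inside the image, and the `scripts/` path are hypothetical.

```yaml
# Sketch only. Image tag, archive path, and script path are assumptions.
steps:
  - uses: actions/checkout@v4
  - uses: docker/setup-buildx-action@v3

  - name: Build uv cache image
    uses: docker/build-push-action@v6
    with:
      context: .
      file: docker/ci-uv-cache.Dockerfile
      load: true                  # make the image available to `docker create`
      tags: ci-uv-cache:local     # hypothetical local tag
      cache-from: type=gha        # reuse layers from the GitHub Actions cache
      cache-to: type=gha,mode=max

  - name: Extract archive and upload
    run: |
      # Assumes the image defines a default command and writes the archive to
      # /uv-cache.tar.gz; both are assumptions about the Dockerfile.
      cid=$(docker create ci-uv-cache:local)
      docker cp "$cid":/uv-cache.tar.gz ./uv-cache.tar.gz
      docker rm "$cid"
      ./scripts/build_and_push_uv_cache.sh --skip-build
```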
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 4-core/16GB art-large-runner thrashes on the Docker build due to the large packages (torch, vllm, cudnn, etc.). Use a dedicated larger runner only for cache builds so they finish faster and more reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
transformer-engine-torch needs `cudnn.h`, which is provided by the pip nvidia-cudnn package. Set `CUDNN_PATH` and related env vars pointing to the venv location so the native extension can find the headers during compilation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
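As an illustration of what such variables might look like, the fragment below sets them on a workflow step. Only `CUDNN_PATH` is named in the commit message; `CPATH` and `LIBRARY_PATH` stand in for the unnamed "related env vars", and the venv location and Python version are assumptions. In the PR itself these were likely set in the Dockerfile rather than in the workflow.

```yaml
# Sketch only. The venv path, Python version, and the two extra variables are assumptions.
- name: Install dependencies
  env:
    # The pip cuDNN wheel places its files under site-packages/nvidia/cudnn.
    CUDNN_PATH: ${{ github.workspace }}/.venv/lib/python3.11/site-packages/nvidia/cudnn
    # Extra header and library search paths for the compiler and linker.
    CPATH: ${{ github.workspace }}/.venv/lib/python3.11/site-packages/nvidia/cudnn/include
    LIBRARY_PATH: ${{ github.workspace }}/.venv/lib/python3.11/site-packages/nvidia/cudnn/lib
  run: uv sync
```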
With 64GB RAM on art-cache-builder, we can run `build_and_push_uv_cache.sh` directly without Docker. This is simpler, avoids Dockerfile env var complications (cuDNN paths, etc.), and reuses the existing script that already handles all the build details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `cache-status` job only needs python3 and curl (both available natively on the runner) to compute a fingerprint and check the API. Removing the pytorch container avoids a slow image pull on every CI run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
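A container-free `cache-status` job could be as small as the sketch below. The fingerprint input (`uv.lock`), the 16-character truncation, and the `uv-cache-<fingerprint>` release tag are assumptions made for illustration; the commit message does not say how the fingerprint is computed or how the cache asset is named.

```yaml
# Sketch only. Fingerprint inputs and the release tag scheme are assumptions.
cache-status:
  runs-on: art-large-runner  # no `container:` key, so no image pull
  outputs:
    cache-hit: ${{ steps.check.outputs.cache-hit }}
  steps:
    - uses: actions/checkout@v4
    - id: check
      env:
        GH_TOKEN: ${{ github.token }}
      run: |
        # Fingerprint the lockfile with the runner's own python3.
        fp=$(python3 -c "import hashlib; print(hashlib.sha256(open('uv.lock','rb').read()).hexdigest()[:16])")
        # curl -f returns non-zero on HTTP errors such as 404 (no such release).
        if curl -fsS -o /dev/null -H "Authorization: Bearer $GH_TOKEN" \
             "https://api.github.com/repos/${GITHUB_REPOSITORY}/releases/tags/uv-cache-${fp}"; then
          echo "cache-hit=true" >> "$GITHUB_OUTPUT"
        else
          echo "cache-hit=false" >> "$GITHUB_OUTPUT"
        fi
```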
Summary
- After `uv sync` completes, the new "Upload uv cache on miss" step archives `UV_CACHE_DIR` and uploads it via the existing `build_and_push_uv_cache.sh --skip-build`
- `continue-on-error: true` is set so upload failures never break quality checks

Test plan
🤖 Generated with Claude Code