@andrew-anyscale
Contributor

@andrew-anyscale andrew-anyscale commented Jan 7, 2026

This migrates ray wheel builds from a CLI-based approach to wanda-based
container builds for x86_64.

Changes:

  • Add ray-wheel.wanda.yaml and Dockerfile for wheel builds
  • Update build.rayci.yml wheel steps to use wanda
  • Add wheel upload steps that extract from wanda cache

Topic: ray-wheel

@andrew-anyscale
Contributor Author

andrew-anyscale commented Jan 7, 2026

Reviews in this chain:
#59935 Migrate wheel builds to wanda (x86_64)
 └#59969 Add wanda cpp wheel builds (x86_64)
  └#59936 Add wanda ray image builds for Docker Hub
   └#59937 Add wanda anyscale image builds for release tests

@andrew-anyscale
Contributor Author

andrew-anyscale commented Jan 7, 2026

| # | head | base | diff | date | summary |
|---|---|---|---|---|---|
| 0 | f104bf59 | be77f0aa | diff | Jan 7 7:57 AM | 8 files changed, 369 insertions(+), 7 deletions(-) |
| 1 | 288cadb6 | be77f0aa | diff | Jan 7 7:58 AM | 0 files changed |
| 2 | 164e2e4c | be77f0aa | diff | Jan 7 11:29 AM | 1 file changed, 22 insertions(+) |
| 3 | b4fb3e06 | be77f0aa | diff | Jan 7 12:21 PM | 1 file changed, 1 insertion(+), 1 deletion(-) |
| 4 | cd792539 | be77f0aa | diff | Jan 7 5:29 PM | 1 file changed, 1 insertion(+), 1 deletion(-) |
| 5 | 7d257e47 | be77f0aa | diff | Jan 7 5:31 PM | 1 file changed, 20 deletions(-) |
| 6 | e93a821d | 9700991f | diff | Jan 8 8:27 AM | 5 files changed, 53 insertions(+), 179 deletions(-) |
| 7 | 8a4bdf5b | 9700991f | diff | Jan 8 8:32 AM | 2 files changed, 6 insertions(+), 17 deletions(-) |
| 8 | a0b03e94 | 8c3a7135 | diff | Jan 8 12:01 PM | 1 file changed, 14 insertions(+), 9 deletions(-) |
| 9 | 32347fa5 | 8c3a7135 | diff | Jan 8 12:51 PM | 1 file changed, 12 insertions(+), 1 deletion(-) |
| 10 | 9074421e | 8c3a7135 | diff | Jan 8 1:33 PM | 1 file changed, 9 insertions(+), 41 deletions(-) |
| 11 | 1b2c341f | 8c3a7135 | diff | Jan 8 2:30 PM | 0 files changed |
| 12 | 05785cb4 | 8c3a7135 | diff | Jan 9 8:45 AM | 2 files changed, 15 insertions(+), 32 deletions(-) |
| 13 | 37410f9a | f580a273 | diff | Jan 9 10:01 AM | 1 file changed, 6 insertions(+), 3 deletions(-) |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request migrates the x86_64 wheel builds from a CLI-based approach to wanda-based container builds. This involves adding new wanda configuration files, Dockerfiles, and updating the Buildkite CI configuration. The changes look good and are a solid step towards modernizing the build process. I have a couple of suggestions to improve maintainability and build efficiency.

Comment on lines 104 to 66
commands:
- bazel run //ci/ray_ci:build_in_docker -- wheel --python-version {{matrix}} --architecture x86_64 --upload
# Extract wheel from wanda image (Wanda cache in ECR)
- wanda_image="$RAYCI_WORK_REPO:$RAYCI_BUILD_ID-ray-wheel-py{{matrix}}"
- container_id=$(docker create $wanda_image)
- mkdir -p /tmp/wheels
- docker cp ${container_id}:/ /tmp/wheels/
- docker rm ${container_id}
- mv /tmp/wheels/*.whl .whl/
- ./ci/build/copy_build_artifacts.sh wheel
Contributor

medium

This block of commands is nearly identical to the one in the linux_cpp_wheels_upload step (lines 130-138). To improve maintainability and reduce duplication, consider extracting this logic into a reusable script.

For example, you could create a script ci/build/extract_and_upload_wheel.sh that takes the image name suffix as an argument:

#!/bin/bash
set -euo pipefail

IMAGE_NAME_SUFFIX=$1
WANDA_IMAGE="$RAYCI_WORK_REPO:$RAYCI_BUILD_ID-$IMAGE_NAME_SUFFIX"

echo "--- Extracting wheel from $WANDA_IMAGE"

CONTAINER_ID=$(docker create "$WANDA_IMAGE")
# Use trap to ensure the container is removed even if the script fails
trap 'docker rm -f "$CONTAINER_ID"' EXIT

# Clean up previous runs and prepare directories
rm -rf /tmp/wheels
mkdir -p /tmp/wheels .whl

# Copy all files from the root of the container.
docker cp "${CONTAINER_ID}:/." /tmp/wheels/

# Move the wheel to the .whl directory for the upload script
mv /tmp/wheels/*.whl .whl/

# Upload the artifact
./ci/build/copy_build_artifacts.sh wheel

Then you could simplify the commands in both steps to a single line:

  • For linux_wheels_upload: commands: - ./ci/build/extract_and_upload_wheel.sh ray-wheel-py{{matrix}}
  • For linux_cpp_wheels_upload: commands: - ./ci/build/extract_and_upload_wheel.sh ray-cpp-wheel-py{{matrix}}

# Gating: only on releases/* OR (master AND nightly)
- label: ":s3: upload: wheel py{{matrix}} (x86_64)"
key: linux_wheels_upload
if: build.branch =~ /^releases\// || (build.branch == "master" && build.env("RAYCI_SCHEDULE") == "nightly")
Contributor Author

Following the logic in ci/build/copy_build_artifacts.sh, it seemed reasonable to skip this step entirely when we're not in a postmerge state. I can remove this if it's unnecessary.

https://github.com/ray-project/ray/blob/master/ci/build/copy_build_artifacts.sh#L49-L57

Collaborator

It already has the skip-on-premerge tag, so this if statement is not really needed.

There are cases where people want to build wheels and upload them to S3 from arbitrary branches, and we do allow that.
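
For reference, the `if:` expression discussed in this thread can be mirrored as a bash predicate. This is a minimal sketch rather than anything in ray's CI: `should_upload_wheels` is a hypothetical name, and it assumes the branch comes from Buildkite's `BUILDKITE_BRANCH` variable and the schedule from the `RAYCI_SCHEDULE` variable used above. Since the step already carries `skip-on-premerge`, it only illustrates which builds the condition would have let through.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the Buildkite gating expression:
#   build.branch =~ /^releases\// ||
#   (build.branch == "master" && build.env("RAYCI_SCHEDULE") == "nightly")
# Optional arguments: BRANCH SCHEDULE. An omitted argument falls back to
# BUILDKITE_BRANCH / RAYCI_SCHEDULE from the environment (assumption).
should_upload_wheels() {
  local branch="${1-${BUILDKITE_BRANCH:-}}"
  local schedule="${2-${RAYCI_SCHEDULE:-}}"

  # Any releases/* branch uploads unconditionally.
  if [[ "$branch" =~ ^releases/ ]]; then
    return 0
  fi

  # Otherwise, only nightly builds on master upload.
  [[ "$branch" == "master" && "$schedule" == "nightly" ]]
}
```

Called with no arguments it reads the environment, e.g. `should_upload_wheels && ./ci/build/copy_build_artifacts.sh wheel`.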

@@ -1 +1 @@
-0.21.0
+0.22.0
Contributor Author

This bump is necessary to get Wanda's symlink functionality. Covered by #59744

Collaborator

Understood. Let's bump the wanda version in a separate PR first.

@andrew-anyscale
Contributor Author

End-of-stack release test running here: https://buildkite.com/ray-project/release/builds/74229/steps/canvas

Mostly passing; the remaining failures may be flakes. Re-running those.

@andrew-anyscale andrew-anyscale marked this pull request as ready for review January 7, 2026 19:07
@andrew-anyscale andrew-anyscale requested a review from a team as a code owner January 7, 2026 19:07
@andrew-anyscale andrew-anyscale added the go label (add ONLY when ready to merge, run all tests) Jan 7, 2026
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/master/ray-wheel branch 2 times, most recently from 164e2e4 to b4fb3e0 January 7, 2026 20:21
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/master/ray-wheel branch 2 times, most recently from cd79253 to 7d257e4 January 8, 2026 01:31
@ray-gardener ray-gardener bot added the core (Issues that should be addressed in Ray Core) and devprod labels Jan 8, 2026
Collaborator

@aslonnie aslonnie left a comment

At a high level this looks really promising!

Maybe split the ray-cpp wheel parts out into a follow-up PR? Although we install the ray-cpp wheel in the release image, no release tests actually need it, so we might be able to skip it when doing release tests. We could also consider releasing a separate ray-cpp image, to reduce the size of our released ray image.

matrix:
- "3.10"
- "3.11"
- "3.12"
Collaborator

Does ray-cpp not have Python 3.13 yet?

One thing: the ray-cpp wheel's content is actually identical across Python versions, so we might make it Python-version agnostic while we are here.

Comment on lines 106 to 112
- wanda_image="$RAYCI_WORK_REPO:$RAYCI_BUILD_ID-ray-wheel-py{{matrix}}"
- container_id=$(docker create $wanda_image)
- mkdir -p /tmp/wheels
- docker cp ${container_id}:/ /tmp/wheels/
- docker rm ${container_id}
- mkdir -p .whl
- mv /tmp/wheels/*.whl .whl/
Collaborator

Could you turn these into a bash script in ./ci/build/ and call it with bash?

Same with the ray-cpp one.

We try not to write many shell command lines in Buildkite step definitions; keeping them in scripts makes them easier to run and debug locally.

- docker rm ${container_id}
- mkdir -p .whl
- mv /tmp/wheels/*.whl .whl/
- ./ci/build/copy_build_artifacts.sh wheel
Collaborator

Maybe invoke this with an explicit bash too (bash ./ci/build/copy_build_artifacts.sh wheel)?

- pyproject.toml
- README.rst
- ci/build/build-manylinux-wheel.sh
- python/
Collaborator

I don't think the ray-cpp wheel build needs most of the files under python/, because the wheel doesn't really contain Python files, mostly just C++ headers.

That means the ray-cpp wheel build might be cacheable.

@aslonnie
Collaborator

aslonnie commented Jan 8, 2026

Also, could you verify that a wheel built this way from the same source code has exactly the same content as the wheels built the current/old way?

@andrew-anyscale
Contributor Author

I've made updates to split the ray-cpp parts out into a separate change.

> also, could you verify that the wheel being built in this way, if from the same source code, has exactly the same content as the wheels that are built in the current/old way?

Let me get some validation going to confirm the full structure. Then I'll re-mark this as ready for review!

@andrew-anyscale andrew-anyscale removed the go label (add ONLY when ready to merge, run all tests) Jan 8, 2026
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/master/ray-wheel branch from 8a4bdf5 to a0b03e9 January 8, 2026 20:01
@andrew-anyscale
Contributor Author

Ran verifications against commit f6c2b5f on PR #59971.

Results

Script used: https://github.com/ray-project/ray/pull/59971/files#diff-7752e49bebcb555eb3a21b7db9b83036898d923c00383f567acb2214c32d8e81

Raw print: https://github.com/ray-project/ray/pull/59971/files#diff-9e6772480096d2377f1e89b9b25f51d8fb7d6781878896b864f57c0d6fdb5c04

Compared wanda-built wheel against S3 reference wheel:

Files Compared

| Category | Result |
|---|---|
| File list | Identical (no missing/extra files) |
| Python source files | Identical |
| Third-party .so files (aiohttp, etc.) | Identical |
| ray/_raylet.so | Binary differs (see below) |
| ray/_version.py | Expected diff (commit hash) |
| METADATA | Ordering diff only (cosmetic) |

_raylet.so Binary Analysis

| Property | Remote (S3 reference) | Local (wanda-built) | Status |
|---|---|---|---|
| Size | 42,215,040 bytes | 42,215,040 bytes | Match |
| Symbols | - | - | No differences |
| Embedded paths | none | none | Clean |

Conclusion: Binary difference is due to non-deterministic compilation (linker ordering, etc.), not code differences. Functionally equivalent.

Expected Differences

  1. _version.py: Different commit hash due to cherry-pick (f6c2b5f7... vs d70dfe31...)
  2. METADATA: Requires-Dist entries in different order (Python dict iteration order variance) - same dependencies, different serialization order
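
A comparison of this kind can be approximated in bash. This is a minimal sketch, not the script linked above from #59971: `compare_wheels` is a hypothetical helper, and it assumes `python3` (for `python3 -m zipfile -e`) and `diff` are available. It unpacks both wheels and lets `diff -rq` report missing/extra files and members whose bytes differ; expected differences such as `ray/_version.py` or a reordered METADATA still need a manual look.

```shell
#!/usr/bin/env bash
# compare_wheels WHEEL_A WHEEL_B
# Wheels are zip archives: unpack both into a temp dir and compare the trees.
# diff -rq prints "Only in ..." for missing/extra files and "Files ... differ"
# for members whose bytes differ. Returns 0 only if the trees are identical,
# non-zero otherwise (2 if either wheel cannot be unpacked).
compare_wheels() {
  local wheel_a="$1" wheel_b="$2" work status=0
  work="$(mktemp -d)"

  if ! python3 -m zipfile -e "$wheel_a" "$work/a" ||
     ! python3 -m zipfile -e "$wheel_b" "$work/b"; then
    rm -rf "$work"
    return 2
  fi

  # -r: recurse into subdirectories; -q: only report which files differ.
  diff -rq "$work/a" "$work/b" || status=$?

  rm -rf "$work"
  return "$status"
}
```

Usage would look like `compare_wheels local.whl reference.whl`, where both file names are placeholders; the reported paths point into the temporary directory.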

@andrew-anyscale andrew-anyscale marked this pull request as ready for review January 8, 2026 20:11
@andrew-anyscale andrew-anyscale marked this pull request as draft January 8, 2026 20:52
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/master/ray-wheel branch 2 times, most recently from 32347fa to 9074421 January 8, 2026 21:33
@andrew-anyscale andrew-anyscale marked this pull request as ready for review January 8, 2026 22:28
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/master/ray-wheel branch from 9074421 to 1b2c341 January 8, 2026 22:30
Comment on lines 60 to 63
# Verify required artifacts exist before unpacking
for f in /tmp/ray_pkg.zip /tmp/ray_py_proto.zip /tmp/ray_java_pkg.zip /tmp/dashboard.tar.gz; do
[[ -f "$f" ]] || { echo "ERROR: missing artifact: $f"; exit 1; }
done
Collaborator

These are copied in, so how could they possibly be missing?

Contributor Author

Good point. Removing

./ci/build/build-manylinux-wheel.sh "$PY_BIN"

# Sanity check: ensure wheels exist
shopt -s nullglob
Collaborator

what does this line do?

Contributor Author

It changes how a glob that matches nothing is expanded, which affects loops like this one.

Without nullglob:

for f in *.log; do
  echo "processing $f"
done

If there are no .log files, the loop runs once with f literally equal to *.log.

With nullglob:

If there are no matches, the loop runs zero times
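
The behavior described above can be checked directly. In this sketch (`nullglob_demo` is a name made up for illustration), the same `*.log` glob is expanded in a fresh empty directory with `nullglob` off and then on, and the caller's original setting is restored afterwards.

```shell
#!/usr/bin/env bash
# nullglob_demo: expand *.log in an empty directory with nullglob off and
# on, then restore whatever nullglob setting the caller had.
nullglob_demo() {
  local dir restore
  local -a without with
  dir="$(mktemp -d)"                       # empty, so *.log matches nothing
  restore="$(shopt -p nullglob || true)"   # "shopt -s nullglob" or "shopt -u nullglob"

  shopt -u nullglob
  without=("$dir"/*.log)   # unmatched: one element, the literal ".../*.log"
  shopt -s nullglob
  with=("$dir"/*.log)      # unmatched: zero elements

  eval "$restore"          # put the caller's setting back
  rmdir "$dir"
  echo "without nullglob: ${#without[@]}, with nullglob: ${#with[@]}"
}
```

Running `nullglob_demo` prints `without nullglob: 1, with nullglob: 0`.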

Collaborator

OK. While that is a nice trick, it changes a global shell setting, so someone could append more code to the script without knowing about this detail.

maybe just use wheels = ($(find . -maxdepth 0 -name '*.whl'))
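
A non-global alternative along the lines suggested might look like the sketch below. Two details differ from the one-liner as written: bash requires no spaces around `=` in an array assignment (`wheels=(...)`), and `find . -maxdepth 0` only tests the starting directory itself, so `-maxdepth 1` is used to look at the files directly inside it. `require_wheels` is a hypothetical helper name; the `.whl/` directory comes from the script under review.

```shell
#!/usr/bin/env bash
# require_wheels DIR: print the *.whl files directly inside DIR (sorted) and
# return 1 if there are none. Uses find + mapfile, so global shell options
# such as nullglob are left untouched.
require_wheels() {
  local dir="$1"
  local -a wheels
  # If DIR is missing, find fails inside the process substitution; that
  # status is not propagated and simply yields an empty list.
  mapfile -t wheels < <(find "$dir" -maxdepth 1 -type f -name '*.whl' 2>/dev/null | LC_ALL=C sort)

  if (( ${#wheels[@]} == 0 )); then
    echo "ERROR: No wheels produced in ${dir}/" >&2
    return 1
  fi
  printf '%s\n' "${wheels[@]}"
}
```

In the build script this would be called as `require_wheels .whl`.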

Contributor Author

done!

Comment on lines 33 to 36
ENV BUILDKITE_COMMIT=${BUILDKITE_COMMIT:-unknown} \
PYTHON_VERSION=${PYTHON_VERSION} \
SKIP_BAZEL_BUILD=1 \
RAY_DISABLE_EXTRA_CPP=1
Collaborator

Could you move SKIP_BAZEL_BUILD and RAY_DISABLE_EXTRA_CPP into the inlined bash script instead of making them image-wide env vars, so they are set closer to where the wheel-building script is called? I don't think the other ones need to be defined here either.
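
In bash, a `VAR=value command` prefix exports the variables to that one command only, which gives the narrower scope asked for here. A minimal sketch, assuming the Dockerfile's inline script calls `./ci/build/build-manylinux-wheel.sh "$PY_BIN"` as elsewhere in this PR; `run_wheel_build` is a hypothetical wrapper, and the one-line form in the comment works just as well without it.

```shell
#!/usr/bin/env bash
# run_wheel_build CMD [ARGS...]: run CMD with the wheel-build flags set for
# that single invocation; the calling shell's environment is unchanged.
run_wheel_build() {
  SKIP_BAZEL_BUILD=1 RAY_DISABLE_EXTRA_CPP=1 "$@"
}

# In the Dockerfile's inline script this could read, e.g.:
#   run_wheel_build ./ci/build/build-manylinux-wheel.sh "$PY_BIN"
# or, without the wrapper:
#   SKIP_BAZEL_BUILD=1 RAY_DISABLE_EXTRA_CPP=1 ./ci/build/build-manylinux-wheel.sh "$PY_BIN"
```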

Comment on lines 47 to 50
COPY --chown=2000:100 ci/build/build-manylinux-wheel.sh ci/build/
COPY --chown=2000:100 README.rst pyproject.toml ./
COPY --chown=2000:100 rllib/ rllib/
COPY --chown=2000:100 python/ python/
Collaborator

What does 2000:100 mean? Is it also used somewhere else?

Contributor Author

I was aiming to give ownership to the non-root forge user (UID 2000, GID 100). There's a similar pattern here, though it may not be necessary after all:

"chown -R 2000:100 /artifact-mount",

User: https://github.com/ray-project/ray/blob/master/.buildkite/release-automation/forge_x86_64.Dockerfile#L56

I'll try running a build without this to see if it succeeds!

Contributor Author

I tried running without the ownership change. Since we run as USER forge, the build fails because it cannot create the necessary directory structure:

https://buildkite.com/ray-project/premerge/builds/57282/steps/canvas?jid=019ba375-2b30-45be-a272-ba97bc9fb48e#019ba375-2b30-45be-a272-ba97bc9fb48e/L298

Contributor Author

I can, however, switch to just chowning to the user. Let me try that for clarity.

# Image has no default CMD, so provide a dummy command.
container_id="$(docker create "${WANDA_IMAGE}" /no-such-cmd)"

Collaborator

You can use crane export to get a tarball of the files directly. It works pretty reliably, especially for images stored in a container registry that have only one layer containing a few files.

That way you avoid the whole docker pull/cp/rm dance.
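
A sketch of that approach, assuming `crane` (from go-containerregistry) is installed and already authenticated against the registry holding the wanda cache. `crane export IMAGE -` streams the image's flattened filesystem as a tar to stdout, so no container has to be created or removed. `extract_wheels` is a hypothetical helper name; the image tag format is taken from the step definitions earlier in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# extract_wheels IMAGE DEST_DIR
# Streams the image filesystem with `crane export` (no docker create/cp/rm),
# then moves the *.whl files found at the image root into DEST_DIR.
extract_wheels() {
  local image="$1" dest="$2"
  local tmpdir found=0 whl
  tmpdir="$(mktemp -d)"

  # `crane export IMAGE -` writes the flattened filesystem as a tar to stdout.
  crane export "$image" - | tar -xf - -C "$tmpdir"

  mkdir -p "$dest"
  # find -print0 with read -d '' handles any file name without nullglob.
  while IFS= read -r -d '' whl; do
    mv "$whl" "$dest/"
    found=$((found + 1))
  done < <(find "$tmpdir" -maxdepth 1 -type f -name '*.whl' -print0)
  rm -rf "$tmpdir"

  if (( found == 0 )); then
    echo "ERROR: no wheels found in $image" >&2
    return 1
  fi
  echo "Extracted $found wheel(s) from $image into $dest"
}
```

An upload step could then run `extract_wheels "$RAYCI_WORK_REPO:$RAYCI_BUILD_ID-ray-wheel-py{{matrix}}" .whl` followed by `./ci/build/copy_build_artifacts.sh wheel`.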

@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/master/ray-wheel branch from 1b2c341 to 05785cb January 9, 2026 16:45

wheels=(.whl/*.whl)
if (( ${#wheels[@]} == 0 )); then
echo "ERROR: No wheels produced in .whl/"
ls -la .whl || true
Collaborator

nit: why add the || true?

Contributor Author

It's a minor point. If .whl doesn't exist, ls exits with code 2, and the script returns that code. The || true makes sure we consistently exit with code 1 on failure.
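
The exit-code point can be checked in isolation. With `set -e`, a failing `ls` ends the script with `ls`'s own status (GNU `ls` uses 2 when it cannot access a command-line argument), so the later `exit 1` is never reached; `|| true` keeps the diagnostic listing from deciding the exit code. A minimal sketch, assuming the script under review runs with `set -e`; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Both bodies run in a subshell "( ... )", so `set -e` and `exit` only affect
# the function. Call them as plain commands: bash ignores `set -e` inside a
# function invoked from an if / && / || context.

# With the guard: ls is diagnostics only, so the status is always 1.
fail_no_wheels_guarded() (
  set -e
  echo "ERROR: No wheels produced in $1/" >&2
  ls -la "$1" >&2 || true
  exit 1
)

# Without the guard: if ls fails, set -e exits with ls's own status.
fail_no_wheels_unguarded() (
  set -e
  echo "ERROR: No wheels produced in $1/" >&2
  ls -la "$1" >&2
  exit 1
)
```

With GNU coreutils, `fail_no_wheels_unguarded /no/such/dir; echo $?` should report 2, while the guarded version reports 1.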

This migrates ray wheel builds from CLI-based approach to wanda-based
container builds for x86_64.

Changes:
- Add ray-wheel.wanda.yaml and Dockerfile for wheel builds
- Update build.rayci.yml wheel steps to use wanda
- Add wheel upload steps that extract from wanda cache

Topic: ray-wheel

Signed-off-by: andrew <andrew@anyscale.com>
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/master/ray-wheel branch from 05785cb to 37410f9 January 9, 2026 18:02

Labels

core (Issues that should be addressed in Ray Core), devprod

3 participants