-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add wanda anyscale image builds for release tests #59937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: andrew/revup/master/ray-image
Are you sure you want to change the base?
Add wanda anyscale image builds for release tests #59937
Conversation
|
f104bf5 to
288cadb
Compare
f963b42 to
0710971
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the Anyscale image build process to use Wanda for building and a new Python script for pushing images to multiple registries. The changes include new Wanda configuration files, the push_anyscale_image.py script, and updates to the Buildkite pipeline. My review identifies a few critical issues that will break the build, including a missing file in a Bazel target and an incomplete matrix configuration in the pipeline. I've also provided several suggestions to improve maintainability by reducing code duplication and making parts of the new script less brittle.
| py_binary( | ||
| name = "push_ray_image", | ||
| srcs = ["push_ray_image.py"], | ||
| exec_compatible_with = ["//bazel:py3"], | ||
| deps = [ | ||
| ":crane_lib", | ||
| "//ci/ray_ci:ray_ci_lib", | ||
| ci_require("click"), | ||
| ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - bash release/gcloud_docker_login.sh release/aws2gce_iam.json | ||
| - bash release/azure_docker_login.sh | ||
| - az acr login --name rayreleasetest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These authentication commands are duplicated across the anyscalebuild, anyscalellmbuild, and anyscalemlbuild push jobs. To improve maintainability and reduce redundancy, consider using YAML anchors and aliases to define these commands once and reuse them in each job. This will make future updates to the authentication process much easier.
|
|
||
| # GPU_PLATFORM is the default GPU platform that gets aliased as "gpu" | ||
| # This must match the definition in ci/ray_ci/docker_container.py | ||
| GPU_PLATFORM = "cu12.1.1-cudnn8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU_PLATFORM is hardcoded here. The comment mentions it must match a definition in another file, which makes this brittle. If the default GPU platform changes, this script will require a manual update, which can be easily missed. Consider sourcing this value from a shared configuration file or passing it as a command-line argument to make the script more robust and maintainable.
| def _format_platform_tag(platform: str) -> str: | ||
| """Format platform as -cpu or shortened CUDA version.""" | ||
| if platform == "cpu": | ||
| return "-cpu" | ||
| # cu12.3.2-cudnn9 -> -cu123 | ||
| versions = platform.split(".") | ||
| return f"-{versions[0]}{versions[1]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for parsing the CUDA version string using platform.split('.') is fragile. It assumes a specific format with dots as separators and might fail with an IndexError if the platform string format changes (e.g., cu12-cudnn8). Using a regular expression would be more robust for extracting the version parts. For example: re.match(r"cu(\\d+)\\.(\\d+)", platform).
| def _get_image_tags(python_version: str, platform: str) -> List[str]: | ||
| """ | ||
| Generate image tags matching the original docker_container.py format. | ||
| Returns multiple tags for the image (canonical + aliases). | ||
| For GPU_PLATFORM, also generates -gpu alias tags to match release test expectations. | ||
| """ | ||
| branch = os.environ.get("BUILDKITE_BRANCH", "") | ||
| commit = os.environ.get("BUILDKITE_COMMIT", "")[:6] | ||
| rayci_build_id = os.environ.get("RAYCI_BUILD_ID", "") | ||
|
|
||
| py_tag = _format_python_version_tag(python_version) | ||
| platform_tag = _format_platform_tag(platform) | ||
|
|
||
| # For GPU_PLATFORM, also create -gpu alias (release tests use type: gpu) | ||
| platform_tags = [platform_tag] | ||
| if platform == GPU_PLATFORM: | ||
| platform_tags.append("-gpu") | ||
|
|
||
| tags = [] | ||
|
|
||
| if branch == "master": | ||
| # On master, use sha and build_id as tags | ||
| for ptag in platform_tags: | ||
| tags.append(f"{commit}{py_tag}{ptag}") | ||
| if rayci_build_id: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{rayci_build_id}{py_tag}{ptag}") | ||
| elif branch.startswith("releases/"): | ||
| # On release branches, use release name | ||
| release_name = branch[len("releases/") :] | ||
| for ptag in platform_tags: | ||
| tags.append(f"{release_name}.{commit}{py_tag}{ptag}") | ||
| if rayci_build_id: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{rayci_build_id}{py_tag}{ptag}") | ||
| else: | ||
| # For other branches (PRs, etc.) | ||
| pr = os.environ.get("BUILDKITE_PULL_REQUEST", "false") | ||
| if pr != "false": | ||
| for ptag in platform_tags: | ||
| tags.append(f"pr-{pr}.{commit}{py_tag}{ptag}") | ||
| else: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{commit}{py_tag}{ptag}") | ||
| if rayci_build_id: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{rayci_build_id}{py_tag}{ptag}") | ||
|
|
||
| return tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has a lot of repeated code, especially the block for appending rayci_build_id tags, which is identical in all if/elif/else branches. This can be refactored to be more DRY (Don't Repeat Yourself), which improves readability and maintainability. You could determine the tag prefix first, then generate the tags in a single block of code.
def _get_image_tags(python_version: str, platform: str) -> List[str]:
"""
Generate image tags matching the original docker_container.py format.
Returns multiple tags for the image (canonical + aliases).
For GPU_PLATFORM, also generates -gpu alias tags to match release test expectations.
"""
branch = os.environ.get("BUILDKITE_BRANCH", "")
commit = os.environ.get("BUILDKITE_COMMIT", "")[:6]
rayci_build_id = os.environ.get("RAYCI_BUILD_ID", "")
py_tag = _format_python_version_tag(python_version)
platform_tag = _format_platform_tag(platform)
# For GPU_PLATFORM, also create -gpu alias (release tests use type: gpu)
platform_tags = [platform_tag]
if platform == GPU_PLATFORM:
platform_tags.append("-gpu")
tags = []
if branch == "master":
prefix = commit
elif branch.startswith("releases/"):
release_name = branch[len("releases/") :]
prefix = f"{release_name}.{commit}"
else:
pr = os.environ.get("BUILDKITE_PULL_REQUEST", "false")
if pr != "false":
prefix = f"pr-{pr}.{commit}"
else:
prefix = commit
for ptag in platform_tags:
tags.append(f"{prefix}{py_tag}{ptag}")
if rayci_build_id:
for ptag in platform_tags:
tags.append(f"{rayci_build_id}{py_tag}{ptag}")
return tags0710971 to
4cdb2dd
Compare
46cc862 to
70df756
Compare
4cdb2dd to
f65669a
Compare
70df756 to
ae32d45
Compare
f65669a to
1141897
Compare
ae32d45 to
04cfd26
Compare
af3b600 to
6bbeb9e
Compare
04cfd26 to
55d1473
Compare
6bbeb9e to
1f049e8
Compare
55d1473 to
84d7479
Compare
84d7479 to
c275fe2
Compare
d43ccd7 to
8980d9c
Compare
7c78fa2 to
29cf5ab
Compare
8980d9c to
4a143b7
Compare
29cf5ab to
d566c8a
Compare
4a143b7 to
9243672
Compare
d566c8a to
9190029
Compare
8c684c8 to
826bac0
Compare
9190029 to
73822f5
Compare
826bac0 to
ebdca80
Compare
73822f5 to
94805e9
Compare
This adds wanda-based builds for anyscale images used in release tests. Changes: - Add ray-anyscale-cpu/cuda.wanda.yaml for ray anyscale images - Add ray-llm-anyscale-cuda.wanda.yaml for LLM images - Add ray-ml-anyscale-cuda.wanda.yaml for ML images - Add push_anyscale_image.py for pushing to ECR/GCP/Azure via crane - Update release/build.rayci.yml with build and push steps - Update gcloud_docker_login.sh to v550.0.0 with arch detection Topic: anyscale-image Relative: ray-image Labels: draft Signed-off-by: andrew <andrew@anyscale.com>
94805e9 to
fc79d67
Compare
35249bf to
0e1bd8b
Compare
fc79d67 to
db94f93
Compare
0e1bd8b to
5de31a9
Compare
db94f93 to
05c0670
Compare
5de31a9 to
a96c770
Compare
05c0670 to
a671850
Compare
This adds wanda-based builds for anyscale images used in release tests.
Changes:
Topic: anyscale-image
Relative: ray-wheel
Labels: draft