Create Conda CI test env in one step #144

Merged
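This PR applies one pattern across the CI scripts: instead of solving a conda environment and then layering the freshly built packages onto it with a second `rapids-mamba-retry install` solve, each script now generates a single environment file, with the local artifact channels prepended, and solves everything at once. A minimal sketch of the before/after pattern, assuming a hypothetical file key `my_test_key` (the real scripts use `docs`, `test_cpp`, `test_cugraph_dgl`, `test_cugraph_pyg`, and `test_pylibwholegraph`):

```bash
#!/bin/bash
# Sketch only: "my_test_key" is a placeholder file key, not one from this repo.

CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)

# Before: two solves -- create the env, then install the locally built
# packages into it from the artifact channel.
rapids-mamba-retry env create --yes -f env.yaml -n test
rapids-mamba-retry install --channel "${CPP_CHANNEL}" "libwholegraph=${RAPIDS_VERSION}"

# After: one solve -- the artifact channel is baked into the generated env file.
rapids-dependency-file-generator \
  --output conda \
  --file-key my_test_key \
  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch)" \
  --prepend-channel "${CPP_CHANNEL}" \
  | tee env.yaml
rapids-mamba-retry env create --yes -f env.yaml -n test
```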
ci/build_docs.sh (16 changes: 7 additions & 9 deletions)

```diff
@@ -3,16 +3,21 @@
 set -euo pipefail
 
+rapids-logger "Downloading artifacts from previous jobs"
+
+CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
+
 rapids-logger "Create test conda environment"
 . /opt/conda/etc/profile.d/conda.sh
 
 RAPIDS_VERSION="$(rapids-version)"
 export RAPIDS_VERSION_MAJOR_MINOR="$(rapids-version-major-minor)"
 
 rapids-dependency-file-generator \
   --output conda \
   --file-key docs \
-  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee env.yaml
+  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" \
+  --prepend-channel "${CPP_CHANNEL}" \
+  | tee env.yaml
 
 rapids-mamba-retry env create --yes -f env.yaml -n docs
 
@@ -23,15 +28,8 @@ set -u
 
 rapids-print-env
 
-rapids-logger "Downloading artifacts from previous jobs"
-
-CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
 export RAPIDS_DOCS_DIR="$(mktemp -d)"
 
-rapids-mamba-retry install \
-  --channel "${CPP_CHANNEL}" \
-  "libwholegraph=${RAPIDS_VERSION}"
-
 rapids-logger "Build C++ docs"
 pushd cpp
 doxygen Doxyfile
```
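Here `--prepend-channel` places the downloaded artifact channel ahead of the default channels in the generated env.yaml, so the single solve can resolve `libwholegraph` from the just-built CI artifacts. A rough sketch of the shape of the generated file; the channel names and package list below are assumptions for illustration, not the generator's actual output:

```bash
# Illustrative only: real channels and packages come from dependencies.yaml.
cat <<'EOF'
channels:
  - /tmp/cpp_channel   # the prepended CPP_CHANNEL (local artifact path)
  - rapidsai-nightly
  - conda-forge
dependencies:
  - libwholegraph==25.4.*,>=0.0.0a0
  - doxygen
EOF
```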
ci/release/update-version.sh (1 change: 1 addition & 0 deletions)

```diff
@@ -57,6 +57,7 @@ DEPENDENCIES=(
   libraft
   libraft-headers
   librmm
+  libwholegraph
   pylibcugraph
   pylibwholegraph
   rmm
```
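For context, `DEPENDENCIES` lists the packages whose version pins `update-version.sh` rewrites at release time; adding `libwholegraph` keeps the new `libwholegraph==25.4.*` pins introduced in dependencies.yaml (below) in sync. A hedged sketch of how such a loop typically works; the variable names and sed expression are illustrative, not lifted from the script:

```bash
#!/bin/bash
# Sketch only: NEXT_SHORT_TAG and the sed pattern are assumptions,
# not the actual code in update-version.sh.
NEXT_SHORT_TAG="25.06"
DEPENDENCIES=(libwholegraph pylibwholegraph)

for DEP in "${DEPENDENCIES[@]}"; do
  # Rewrite pins like "libwholegraph==25.4.*" to the next release version,
  # leaving any trailing ",>=0.0.0a0" constraint untouched.
  sed -i "s/${DEP}==[0-9.]*\.\*/${DEP}==${NEXT_SHORT_TAG}.*/g" dependencies.yaml
done
```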
ci/test_cpp.sh (11 changes: 2 additions & 9 deletions)

```diff
@@ -1,5 +1,5 @@
 #!/bin/bash
-# Copyright (c) 2022-2024, NVIDIA CORPORATION.
+# Copyright (c) 2022-2025, NVIDIA CORPORATION.
 
 set -euo pipefail
 
@@ -8,8 +8,6 @@ cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../
 
 . /opt/conda/etc/profile.d/conda.sh
 
-RAPIDS_VERSION="$(rapids-version)"
-
 CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
 
 rapids-logger "Generate C++ testing dependencies"
@@ -18,7 +16,7 @@ rapids-dependency-file-generator \
   --file-key test_cpp \
   --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch)" \
   --prepend-channel "${CPP_CHANNEL}" \
-| tee env.yaml
+  | tee env.yaml
 
 rapids-mamba-retry env create --yes -f env.yaml -n test
 
@@ -32,11 +30,6 @@ mkdir -p "${RAPIDS_TESTS_DIR}"
 
 rapids-print-env
 
-rapids-mamba-retry install \
-  --channel "${CPP_CHANNEL}" \
-  "libwholegraph=${RAPIDS_VERSION}" \
-  "libwholegraph-tests=${RAPIDS_VERSION}"
-
 rapids-logger "Check GPU usage"
 nvidia-smi
```
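With the per-package install step gone, `libwholegraph` and `libwholegraph-tests` must land in the environment during the single solve (they are pulled in via the new `depends_on_libwholegraph*` entries in dependencies.yaml below). A small optional check one could run locally, assuming the environment is named `test` as in the script:

```bash
# Optional sanity check: confirm the single solve picked up the locally
# built packages from the prepended channel.
conda list -n test | grep -E '^libwholegraph'
```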
ci/test_python.sh (87 changes: 40 additions & 47 deletions)

bdice (Contributor), Feb 11, 2025:
Are ARM tests doing anything here? Maybe they should exit earlier (or be skipped in the CI matrix) if not.
There are a few comments of the form `# Reactivate the test environment back`. I'm not sure if we have a test environment to go back to, with the new single-solve environments.

Author (Contributor):
We can do that in a follow-up PR. I'd like to limit the scope of this one to consolidating the environment creation.
```diff
@@ -1,11 +1,17 @@
 #!/bin/bash
-# Copyright (c) 2022-2024, NVIDIA CORPORATION.
+# Copyright (c) 2022-2025, NVIDIA CORPORATION.
 
 set -euo pipefail
 
 # Support invoking test_python.sh outside the script directory
 cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../
 
+if [[ "${RAPIDS_CUDA_VERSION%%.*}" == "11" ]]; then
+  DGL_CHANNEL="dglteam/label/th23_cu118"
+else
+  DGL_CHANNEL="dglteam/label/th23_cu121"
+fi
+
 . /opt/conda/etc/profile.d/conda.sh
 
 RAPIDS_VERSION="$(rapids-version)"
@@ -14,15 +20,6 @@ rapids-logger "Downloading artifacts from previous jobs"
 CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
 PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)
 
-rapids-logger "Generate Python testing dependencies"
-rapids-dependency-file-generator \
-  --output conda \
-  --file-key test_python \
-  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" \
-  --prepend-channel "${CPP_CHANNEL}" \
-  --prepend-channel "${PYTHON_CHANNEL}" \
-  | tee env.yaml
-
 RAPIDS_TESTS_DIR=${RAPIDS_TESTS_DIR:-"${PWD}/test-results"}
 RAPIDS_COVERAGE_DIR=${RAPIDS_COVERAGE_DIR:-"${PWD}/coverage-results"}
 mkdir -p "${RAPIDS_TESTS_DIR}" "${RAPIDS_COVERAGE_DIR}"
@@ -50,31 +47,26 @@ set +e
 # bulk sampler IO tests (hangs in CI)
 
 if [[ "${RUNNER_ARCH}" != "ARM64" ]]; then
+  rapids-logger "(cugraph-dgl) Generate Python testing dependencies"
+  rapids-dependency-file-generator \
+    --output conda \
+    --file-key test_cugraph_dgl \
+    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" \
+    --prepend-channel "${CPP_CHANNEL}" \
+    --prepend-channel "${PYTHON_CHANNEL}" \
+    --prepend-channel pytorch \
+    --prepend-channel conda-forge \
+    --prepend-channel "${DGL_CHANNEL}" \
+    --prepend-channel nvidia \
+    | tee env.yaml
+
+  rapids-mamba-retry env create --yes -f env.yaml -n test_cugraph_dgl
+
   # activate test_cugraph_dgl environment for dgl
   set +u
   conda activate test_cugraph_dgl
   set -u
 
-  if [[ "${RAPIDS_CUDA_VERSION%%.*}" == "11" ]]; then
-    DGL_CHANNEL="dglteam/label/th23_cu118"
-  else
-    DGL_CHANNEL="dglteam/label/th23_cu121"
-  fi
-
-
-  rapids-mamba-retry install \
-    --channel "${CPP_CHANNEL}" \
-    --channel "${PYTHON_CHANNEL}" \
-    --channel pytorch \
-    --channel conda-forge \
-    --channel "${DGL_CHANNEL}" \
-    --channel nvidia \
-    "pylibwholegraph=${RAPIDS_VERSION}" \
-    "cugraph-dgl=${RAPIDS_VERSION}" \
-    'pytorch>=2.3' \
-    "ogb"
-
   rapids-print-env
@@ -98,22 +90,23 @@ else
 fi
 
 if [[ "${RUNNER_ARCH}" != "ARM64" ]]; then
+  rapids-logger "(cugraph-pyg) Generate Python testing dependencies"
+  rapids-dependency-file-generator \
+    --output conda \
+    --file-key test_cugraph_pyg \
+    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" \
+    --prepend-channel "${CPP_CHANNEL}" \
+    --prepend-channel "${PYTHON_CHANNEL}" \
+    --prepend-channel pytorch \
+    | tee env.yaml
+
+  rapids-mamba-retry env create --yes -f env.yaml -n test_cugraph_pyg
+
   # Temporarily allow unbound variables for conda activation.
   set +u
   conda activate test_cugraph_pyg
   set -u
 
-  rapids-mamba-retry install \
-    --channel "${CPP_CHANNEL}" \
-    --channel "${PYTHON_CHANNEL}" \
-    --channel pytorch \
-    "pylibwholegraph=${RAPIDS_VERSION}" \
-    "cugraph-pyg=${RAPIDS_VERSION}" \
-    'pytorch>=2.3' \
-    'ogb'
-
   rapids-print-env
 
   rapids-logger "Check GPU usage"
@@ -136,23 +129,23 @@ else
 fi
 
 if [[ "${RUNNER_ARCH}" != "ARM64" ]]; then
+  rapids-logger "(pylibwholegraph) Generate Python testing dependencies"
+  rapids-dependency-file-generator \
+    --output conda \
+    --file-key test_pylibwholegraph \
+    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" \
+    --prepend-channel "${CPP_CHANNEL}" \
+    --prepend-channel "${PYTHON_CHANNEL}" \
+    --prepend-channel pytorch \
```
[Review thread on the added line `--prepend-channel pytorch \`]

Member: Not a blocking comment for this PR, but @tingyu66 @alexbarghi-nv, are we planning to drop use of the pytorch channel in the 25.04 release? Linking this related conversation: #99 (comment)

Member: Yes, we should do that.

Member: Thank you! I can get that started in a follow-up PR if you'd like.
```diff
+    | tee env.yaml
+
+  rapids-mamba-retry env create --yes -f env.yaml -n test_pylibwholegraph
+
   # Temporarily allow unbound variables for conda activation.
   set +u
   conda activate test_pylibwholegraph
   set -u
 
-  rapids-mamba-retry install \
-    --channel "${CPP_CHANNEL}" \
-    --channel "${PYTHON_CHANNEL}" \
-    --channel pytorch \
-    'mkl<2024.1.0' \
-    "pylibwholegraph=${RAPIDS_VERSION}" \
-    'pytorch>=2.3' \
-    'pytest-forked' \
-    'ogb'
-
   rapids-print-env
 
   rapids-logger "Check GPU usage"
```
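The script leans on two different bash parameter expansions: `${VAR%.*}` removes the shortest trailing `.suffix` (dropping only the patch component), while `${VAR%%.*}` removes the longest (leaving only the major version). A standalone demo with an example value:

```bash
#!/bin/bash
RAPIDS_CUDA_VERSION="11.8.0"  # example value; CI sets this variable for real

echo "${RAPIDS_CUDA_VERSION%.*}"   # prints 11.8 -- used in the --matrix string
echo "${RAPIDS_CUDA_VERSION%%.*}"  # prints 11   -- used to pick DGL_CHANNEL
```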
dependencies.yaml (65 changes: 65 additions & 0 deletions)

```diff
@@ -43,11 +43,14 @@ files:
       - cuda_version
       - docs
       - py_version
+      - depends_on_libwholegraph
   test_cpp:
     output: none
     includes:
       - cuda_version
       - test_cpp
+      - depends_on_libwholegraph
+      - depends_on_libwholegraph_tests
   test_notebooks:
     output: none
     includes:
```
```diff
@@ -68,6 +71,46 @@ files:
       - depends_on_ogb
       - py_version
       - test_python_common
+  test_cugraph_dgl:
+    output: none
+    includes:
+      - cuda_version
+      - depends_on_cugraph
+      - depends_on_cudf
+      - depends_on_dgl
+      - depends_on_pytorch
+      - depends_on_ogb
+      - py_version
+      - test_python_common
+      - depends_on_pylibwholegraph
+      - depends_on_cugraph_dgl
+  test_cugraph_pyg:
+    output: none
+    includes:
+      - cuda_version
+      - depends_on_cugraph
+      - depends_on_cudf
+      - depends_on_dgl
+      - depends_on_pytorch
+      - depends_on_ogb
+      - py_version
+      - test_python_common
+      - depends_on_pylibwholegraph
+      - depends_on_cugraph_pyg
+  test_pylibwholegraph:
+    output: none
+    includes:
+      - cuda_version
+      - depends_on_cugraph
+      - depends_on_cudf
+      - depends_on_dgl
+      - depends_on_pytorch
+      - depends_on_ogb
```
[Review thread on the added line `- depends_on_ogb`]

Member: I don't think this is what you want, to replace `conda install ogb`. `depends_on_ogb` doesn't seem to contain ogb:

```yaml
# Will remove this after snap-stanford/ogb#497 is resolved.
# Temporarily sets the max pytorch version to 2.5 for compatibility
# with ogb.
depends_on_ogb:
  common:
    - output_types: [conda]
      packages:
        - pytorch>=2.3,<2.6a0
  specific:
    - output_types: [requirements]
      matrices:
        - matrix: {cuda: "12.*"}
          packages:
            - --extra-index-url=https://download.pytorch.org/whl/cu121
        - matrix: {cuda: "11.*"}
          packages:
            - --extra-index-url=https://download.pytorch.org/whl/cu118
        - {matrix: null, packages: null}
    - output_types: [requirements, pyproject]
      matrices:
        - matrix: {cuda: "12.*"}
          packages:
            - torch>=2.3,<2.6a0
        - matrix: {cuda: "11.*"}
          packages:
            - torch>=2.3,<2.6a0
        - {matrix: null, packages: [*pytorch_pip]}
```

That looks like a mistake; I'm not sure where it happened. @tingyu66 could you please take a look? Maybe it was the result of a bad merge conflict resolution or something. `depends_on_ogb` seems to just contain torch (which we already have `depends_on_pytorch` for).

Member: It's not a mistake.

Member: This is a temporary constraint on PyTorch that only applies to test environments where ogb is being used.

Member: Bradley and I discussed this here: #104

Member: And on Slack.

bdice (Contributor), Feb 19, 2025: A better name for this might be `ogb_pytorch_constraint`. We typically use `depends_on_` to indicate an actual dependency on just that package.

jameslamb (Member), Feb 19, 2025: Got it, thanks! The comment directly above this, about this being there for the benefit of ogb, now makes sense to me. It still would be better to name it `ogb_pytorch_constraint`, I think, but that doesn't have to hold up this PR.

Member: OK. Also, devcontainer builds will fail until #148 gets merged. I expect that to pass CI now that the cudf fix is merged.

Member: Now that that's merged, I've merged in the latest branch-25.04 here to re-run CI.
```diff
+      - py_version
+      - test_python_common
+      - depends_on_mkl
+      - depends_on_pylibwholegraph
+      - test_python_pylibwholegraph
 
   py_build_pylibwholegraph:
     output: pyproject
```
```diff
@@ -509,6 +552,18 @@ dependencies:
         - pylibwholegraph-cu11==25.4.*,>=0.0.0a0
       - {matrix: null, packages: [*pylibwholegraph_unsuffixed]}
 
+  depends_on_libwholegraph:
+    common:
+      - output_types: conda
+        packages:
+          - libwholegraph==25.4.*,>=0.0.0a0
+
+  depends_on_libwholegraph_tests:
+    common:
+      - output_types: conda
+        packages:
+          - libwholegraph-tests==25.4.*,>=0.0.0a0
+
   depends_on_rmm:
     common:
       - output_types: conda
```
```diff
@@ -658,3 +713,13 @@
         packages: &cupy_packages_cu11
           - cupy-cuda11x>=13.2.0
       - {matrix: null, packages: *cupy_packages_cu11}
+  depends_on_cugraph_pyg:
+    common:
+      - output_types: conda
+        packages:
+          - cugraph-pyg==25.4.*,>=0.0.0a0
+  depends_on_mkl:
+    common:
+      - output_types: conda
+        packages:
+          - mkl<2024.1.0
```
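Taken together, the new file keys and `depends_on_*` entries mean any of the CI environments above can be reproduced from dependencies.yaml alone. A hedged sketch for the pylibwholegraph case; the channel paths and matrix values are placeholders for what CI computes (in CI the channels come from `rapids-download-conda-from-s3`):

```bash
#!/bin/bash
# Sketch: reproduce the test_pylibwholegraph CI environment locally.
# CPP_CHANNEL/PYTHON_CHANNEL paths and the matrix values are placeholders.
CPP_CHANNEL="./local-cpp-channel"
PYTHON_CHANNEL="./local-python-channel"

rapids-dependency-file-generator \
  --output conda \
  --file-key test_pylibwholegraph \
  --matrix "cuda=12.5;arch=$(arch);py=3.12" \
  --prepend-channel "${CPP_CHANNEL}" \
  --prepend-channel "${PYTHON_CHANNEL}" \
  --prepend-channel pytorch \
  | tee env.yaml

rapids-mamba-retry env create --yes -f env.yaml -n test_pylibwholegraph
```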