106 commits
f0b5ab3
Adding softlink with MajorVersion number for pciutils
renjithravindrankannath Sep 16, 2025
2ff1033
Revert "Adding softlink with MajorVersion number for pciutils"
renjithravindrankannath Sep 16, 2025
86acb94
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Sep 19, 2025
d723352
Enabling ci build for py-torch in rocm
renjithravindrankannath Sep 19, 2025
47f737c
[@spackbot] updating style on behalf of renjithravindrankannath
renjithravindrankannath Sep 19, 2025
e57ab94
enabling horovod and keras in ci
renjithravindrankannath Sep 22, 2025
326901a
Increasing timeout for ck and aotriton
renjithravindrankannath Sep 23, 2025
7a65c39
Increase timout for ck and aotriton
renjithravindrankannath Sep 24, 2025
6ad618d
aotriton 0.10b and related changesin py-torchwq
renjithravindrankannath Oct 23, 2025
9f956ed
stlye error fix
renjithravindrankannath Oct 24, 2025
9f60ff2
fix audit error
renjithravindrankannath Oct 24, 2025
bdf548d
aotriton require specific commit of llvm
renjithravindrankannath Oct 24, 2025
89f3a25
[@spackbot] updating style on behalf of renjithravindrankannath
renjithravindrankannath Oct 24, 2025
349cdbf
removing unused imports
renjithravindrankannath Oct 24, 2025
5f59902
audit check error fix
renjithravindrankannath Oct 24, 2025
0ddd68e
aotriton-llvm update and related changes
renjithravindrankannath Oct 24, 2025
5294a9b
Adding docstring for aotriton-llvm
renjithravindrankannath Oct 24, 2025
3ec901a
Aotriton and py-torch dependency fixes
renjithravindrankannath Oct 28, 2025
96aafb6
[@spackbot] updating style on behalf of renjithravindrankannath
renjithravindrankannath Oct 28, 2025
6b8fe85
Aotriton-llvm dependency fixes and style fix
renjithravindrankannath Oct 28, 2025
81c722b
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Oct 31, 2025
07fd5a8
aotriton 10.0 updates
renjithravindrankannath Nov 4, 2025
43a4360
[@spackbot] updating style on behalf of renjithravindrankannath
renjithravindrankannath Nov 4, 2025
e2cdefb
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Nov 6, 2025
1d28e00
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Nov 6, 2025
771df65
rocm update for py-torch 2.9
renjithravindrankannath Nov 6, 2025
445001d
py-torch requires ck
renjithravindrankannath Nov 6, 2025
f64a92f
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Nov 14, 2025
1df2322
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Nov 14, 2025
d256987
Limiting to py-torch temporarily
renjithravindrankannath Nov 15, 2025
3434b79
masking py-kornia temporarily
renjithravindrankannath Nov 15, 2025
f44564c
Updating python version and addressing other review comments
renjithravindrankannath Nov 15, 2025
ad5cf5c
Correcting aotriton and python dependencies
renjithravindrankannath Nov 17, 2025
4769152
updating ck dependency
renjithravindrankannath Nov 24, 2025
509af9f
roctracer include path for kineto
renjithravindrankannath Nov 25, 2025
1c9b69e
Updating hip dependency
renjithravindrankannath Nov 26, 2025
e979804
Removing temporary changes from ci/gitlab/configs
renjithravindrankannath Nov 26, 2025
ccdeb8b
Including roctracer include path for 2.5 as well
renjithravindrankannath Nov 26, 2025
29aa6c7
CK build requires more than 6 hrs in ci
renjithravindrankannath Dec 4, 2025
7b13b05
Correction in timeout for CK build
renjithravindrankannath Dec 5, 2025
377d5d3
Adding AMDGPU_TARGETS for older versions for CK
renjithravindrankannath Dec 8, 2025
7af2229
Passing amdgpu_target to CK
renjithravindrankannath Dec 8, 2025
9296227
CK needs more than 5hrs
renjithravindrankannath Dec 8, 2025
30980bc
Revert timeout as it doesn't help on expiring token
renjithravindrankannath Dec 9, 2025
e3be055
disabling tests for ck
renjithravindrankannath Dec 10, 2025
244316e
disabling tests for ck under external
renjithravindrankannath Dec 10, 2025
29eceff
disabling tests for ck in ml-linux-x86_64-rocm ci
renjithravindrankannath Dec 10, 2025
5b9802e
Setting timeout to 600 again to check ck failure
renjithravindrankannath Dec 11, 2025
5031bf2
Temporarily disabling everything else except py-torch
renjithravindrankannath Dec 12, 2025
ea1f835
Adjusting timeout at template-level
renjithravindrankannath Dec 15, 2025
f0a676c
Renewing temporary creds just before the long‑running CK upload
renjithravindrankannath Dec 16, 2025
236fcc4
Revert "Renewing temporary creds just before the long‑running CK upload"
renjithravindrankannath Dec 16, 2025
dbc3e9b
Updating hipblaslt dependency
renjithravindrankannath Dec 19, 2025
9f9d596
Updating ck dependency without gpu_target
renjithravindrankannath Dec 19, 2025
dd3faec
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Dec 23, 2025
fea696a
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Dec 23, 2025
ce828b3
py-llvmlite standalone tests in broken-tests-packages
renjithravindrankannath Dec 25, 2025
2a2e3f9
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Jan 10, 2026
8c32799
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Jan 10, 2026
f1b2841
py-llvmlite is incompatible with llvm-amdgpu
renjithravindrankannath Jan 12, 2026
e4ea819
Prevent building py-llvmlite
renjithravindrankannath Jan 13, 2026
d3ca5de
skip numba since it requires llvmlite
renjithravindrankannath Jan 13, 2026
0796b50
Revert Dummy path
renjithravindrankannath Jan 13, 2026
948b9e8
Rerstricting 6.3 rocm dependency
renjithravindrankannath Jan 13, 2026
cf44ac7
py-pandas 2.3.3 and above conflicts with llvm version
renjithravindrankannath Jan 14, 2026
5e525aa
Reverting py-pandas version rule
renjithravindrankannath Jan 14, 2026
51828e4
py-networkx@2.5.1 2.7 to avoid py-llvmlite dependency
renjithravindrankannath Jan 14, 2026
7352641
py-llvmlite@0.45 to avoid conflict with llvm version
renjithravindrankannath Jan 14, 2026
9ec09dd
py-llvmlite@0.44 to avoid conflict with llvm version
renjithravindrankannath Jan 14, 2026
42bc465
Trigger aotriton build
renjithravindrankannath Jan 15, 2026
9bfc735
Trigger aotriton build
renjithravindrankannath Jan 15, 2026
49f05d2
AMDGPU_TARGETS not needed for ck
renjithravindrankannath Jan 19, 2026
e5ed8be
Adding aotriton include path
renjithravindrankannath Jan 22, 2026
d85b521
enabling packages depending py-torch
renjithravindrankannath Jan 23, 2026
b5ddb66
Disable py-torchaudio and py-torchvision
renjithravindrankannath Jan 23, 2026
a64fcb9
passing gpu_target to hipblaslt and limiting py-llvmlite dependency
renjithravindrankannath Jan 27, 2026
b2a0601
[@spackbot] updating style on behalf of renjithravindrankannath
renjithravindrankannath Jan 27, 2026
e15dc80
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Jan 27, 2026
f24f794
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Jan 27, 2026
e49d342
aotriton release updates
renjithravindrankannath Jan 27, 2026
9df3fc4
[@spackbot] updating style on behalf of renjithravindrankannath
renjithravindrankannath Jan 27, 2026
af1e462
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Jan 27, 2026
57baaaa
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Jan 27, 2026
ca6c4fa
Updating aotriton dependency
renjithravindrankannath Jan 28, 2026
33a1d31
Updating 2.5 patch with aotriton path
renjithravindrankannath Jan 28, 2026
22e052f
Revert aotriton dependency change
renjithravindrankannath Jan 28, 2026
9cd7c1a
Updating aotriton path for 2.5
renjithravindrankannath Jan 29, 2026
501a60b
py-llvmlite 0.46.0 requires hwloc without rocm
renjithravindrankannath Feb 2, 2026
5b3b323
Revert "py-llvmlite 0.46.0 requires hwloc without rocm"
renjithravindrankannath Feb 3, 2026
426f2b9
py-llvmlite 0.46 which requires llvm 20 create conflict with hwloc 2.…
renjithravindrankannath Feb 3, 2026
97e3ac7
py-llvmlite 0.46 which requires llvm 20 create conflict with hwloc 2.…
renjithravindrankannath Feb 4, 2026
93f707a
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Feb 5, 2026
a78a1db
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Feb 5, 2026
f6b12b5
Update for 2.10 on rocm
renjithravindrankannath Feb 6, 2026
34c0eb4
Temporarily reverting mkldnn check to verify
renjithravindrankannath Feb 9, 2026
fcffa06
py-torchvision requires rocm math lib paths indirectly when py-torch …
renjithravindrankannath Feb 11, 2026
131a4e7
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Feb 11, 2026
d3f0e85
Merge remote-tracking branch 'upstream/develop' into develop
renjithravindrankannath Feb 11, 2026
99fc821
Merge branch 'develop' into py-torch-2.8-rocm
renjithravindrankannath Feb 11, 2026
8437bbd
Merge branch 'spack:develop' into py-torch-2.8-rocm
renjithravindrankannath Feb 11, 2026
690a8c7
Merge branch 'spack:develop' into py-torch-2.8-rocm
renjithravindrankannath Feb 12, 2026
f7b686f
Removing unwanted line in aotriton-llvm
renjithravindrankannath Feb 12, 2026
3ba2d6a
Temporarily reverting math lib include path to test
renjithravindrankannath Feb 12, 2026
1fc2b0c
Merge branch 'spack:develop' into py-torch-2.8-rocm
renjithravindrankannath Feb 12, 2026
b221734
libtorch_hip.so needs aotriton and hip libs at runtime
renjithravindrankannath Feb 20, 2026
5ded4e0
Add prefix lib dirs when they exist so the loader can find .so files
renjithravindrankannath Feb 21, 2026
2 changes: 2 additions & 0 deletions .ci/gitlab/configs/ci.yaml
@@ -4,6 +4,8 @@ ci:
broken-tests-packages:
- superlu-dist # srun -n 4 hangs
- papyrus
- composable-kernel
- py-llvmlite

pipeline-gen:
- build-job:
1 change: 1 addition & 0 deletions .ci/gitlab/configs/linux/ci.yaml
@@ -13,6 +13,7 @@ ci:
- wrf
build-job:
tags: [ "spack", "huge" ]
timeout: 1200 minutes
variables:
CI_JOB_SIZE: huge
SPACK_BUILD_JOBS: "12"
1 change: 1 addition & 0 deletions repos/spack_repo/builtin/packages/aotriton/package.py
@@ -60,6 +60,7 @@ class Aotriton(CMakePackage):
depends_on("pkgconfig", type="build")

# build llvm version with mlir with the commit that matches inside the llvm-hash.txt

depends_on("aotriton-llvm@0.10", when="@0.10b")
depends_on("aotriton-llvm@0.9", when="@0.9b")
depends_on("aotriton-llvm@0.8", when="@0.8b")
2 changes: 1 addition & 1 deletion repos/spack_repo/builtin/packages/hwloc/package.py
@@ -135,7 +135,7 @@ class Hwloc(AutotoolsPackage, CudaPackage, ROCmPackage):
depends_on("mpi", when="+netloc")

with when("+rocm"):
depends_on("rocm-smi-lib")
depends_on("rocm-smi-lib@7.0:")
depends_on("rocm-opencl", when="+opencl")
# Avoid a circular dependency since the openmp
# variant of llvm-amdgpu depends on hwloc.
@@ -25,17 +25,18 @@ index 9be7f37..39d0f24 100644
endif()

diff --git a/cmake/public/LoadHIP.cmake b/cmake/public/LoadHIP.cmake
index 1c0d3a2..e0de4b1 100644
index 1c0d3a2..83f9f9d 100644
--- a/cmake/public/LoadHIP.cmake
+++ b/cmake/public/LoadHIP.cmake
@@ -167,6 +167,10 @@ if(HIP_FOUND)
@@ -167,6 +167,11 @@ if(HIP_FOUND)
find_package_and_print_version(hipsolver REQUIRED)
find_package_and_print_version(hiprtc REQUIRED)

+ list(APPEND ROCM_INCLUDE ${rocthrust_INCLUDE_DIR})
+ list(APPEND ROCM_INCLUDE ${rocprim_INCLUDE_DIR})
+ list(APPEND ROCM_INCLUDE ${hipcub_INCLUDE_DIR})
+ list(APPEND ROCM_INCLUDE ${rocRAND_INCLUDE_DIR})
+ list(APPEND ROCM_INCLUDE $ENV{AOTRITON_INSTALLED_PREFIX}/include)

find_library(PYTORCH_HIP_LIBRARIES amdhip64 HINTS ${ROCM_PATH}/lib)
# TODO: miopen_LIBRARIES should return fullpath to the library file,
@@ -1,5 +1,5 @@
diff --git a/caffe2/CMakeLists.txt b/caffe2/CMakeLists.txt
index d2d23b7..620a89f 100644
index d2d23b7ab65..620a89f65cb 100644
--- a/caffe2/CMakeLists.txt
+++ b/caffe2/CMakeLists.txt
@@ -1379,13 +1379,6 @@ if(USE_ROCM)
@@ -26,7 +26,7 @@ index d2d23b7..620a89f 100644
endif()

diff --git a/cmake/public/LoadHIP.cmake b/cmake/public/LoadHIP.cmake
index 58c74dd..d3e1ad4 100644
index 58c74ddda35..54f96871372 100644
--- a/cmake/public/LoadHIP.cmake
+++ b/cmake/public/LoadHIP.cmake
@@ -26,12 +26,6 @@ else()
@@ -78,7 +78,15 @@ index 58c74dd..d3e1ad4 100644
find_package_and_print_version(amd_comgr REQUIRED)
find_package_and_print_version(rocrand REQUIRED)
find_package_and_print_version(hiprand REQUIRED)
@@ -171,7 +168,11 @@ if(HIP_FOUND)
@@ -157,6 +154,7 @@ if(HIP_FOUND)
find_package_and_print_version(hipcub REQUIRED)
find_package_and_print_version(rocthrust REQUIRED)
find_package_and_print_version(hipsolver REQUIRED)
+ list(APPEND ROCM_INCLUDE_DIRS $ENV{AOTRITON_INSTALLED_PREFIX}/include)
# workaround cmake 4 build issue
if(CMAKE_VERSION VERSION_GREATER_EQUAL "4.0.0")
message(WARNING "Work around hiprtc cmake failure for cmake >= 4")
@@ -171,7 +169,11 @@ if(HIP_FOUND)
if(UNIX)
find_package_and_print_version(rccl)
find_package_and_print_version(hsa-runtime64 REQUIRED)
29 changes: 24 additions & 5 deletions repos/spack_repo/builtin/packages/py_torch/package.py
@@ -123,6 +123,7 @@ class PyTorch(PythonPackage, CudaPackage, ROCmPackage):
conflicts("+gloo+rocm")
conflicts("+rocm", when="@2.3", msg="Rocm doesn't support py-torch 2.3 release")
conflicts("+rocm", when="@2.4", msg="Rocm doesn't support py-torch 2.4 release")
conflicts("+rocm", when="@2.8", msg="Rocm doesn't support py-torch 2.8 release")
conflicts("+tensorpipe", when="+rocm ^hip@:5.1", msg="TensorPipe not supported until ROCm 5.2")
conflicts("+breakpad", when="target=ppc64:")
conflicts("+breakpad", when="target=ppc64le:")
@@ -305,7 +306,8 @@ class PyTorch(PythonPackage, CudaPackage, ROCmPackage):
depends_on("valgrind", when="+valgrind")
with when("+rocm"):
depends_on("hsa-rocr-dev")
depends_on("hip")
depends_on("hip@7.0:", when="@2.9:")
depends_on("hip@:6.4", when="@:2.7")
depends_on("rccl", when="+nccl")
depends_on("rocprim")
depends_on("hipcub")
@@ -320,11 +322,20 @@ class PyTorch(PythonPackage, CudaPackage, ROCmPackage):
depends_on("rocfft")
depends_on("rocblas")
depends_on("miopen-hip")
for target in ROCmPackage.amdgpu_targets:
depends_on(f"composable-kernel amdgpu_target={target}", when=f"amdgpu_target={target}")
# This constraint applies to ANY hipblaslt in the dependency tree
# including the one used by miopen-hip
depends_on(f"hipblaslt amdgpu_target={target}", when=f"amdgpu_target={target}")
# Ensure hipblaslt version for 2.9+
depends_on(
f"hipblaslt@7.0: amdgpu_target={target}", when=f"@2.9: amdgpu_target={target}"
)
depends_on("rocminfo")
depends_on("aotriton@0.8.1b", when="@2.5:2.6")
depends_on("aotriton@0.9.1b", when="@2.7:")
depends_on("composable-kernel@:6.3.2", when="@2.5")
depends_on("composable-kernel@6.3.2:", when="@2.6:")
depends_on("hipsparselt@7.0:", when="@2.9:")
depends_on("aotriton@0.8b", when="@2.5:2.6")
depends_on("aotriton@0.9.2b", when="@2.7")
depends_on("aotriton@0.10b", when="@2.8:")
depends_on("mpi", when="+mpi")
depends_on("ucc", when="+ucc")
depends_on("ucx", when="+ucc")
@@ -568,6 +579,14 @@ def patch(self):
"torch_global_deps PROPERTIES LINKER_LANGUAGE CXX",
"caffe2/CMakeLists.txt",
)
if self.spec.satisfies("@2.5:+rocm"):
filter_file(
"find_library(ROCM_ROCTX_LIB roctx64 HINTS ${ROCM_PATH}/lib)",
"find_library(ROCM_ROCTX_LIB roctx64 HINTS ${ROCM_PATH}/lib)\n"
"set(ROCTRACER_INCLUDE_DIR $ENV{ROCTRACER_INCLUDE_DIR})",
"cmake/public/LoadHIP.cmake",
string=True,
)
if self.spec.satisfies("@2.1:2.7+rocm"):
filter_file(
"${ROCM_INCLUDE_DIRS}/rocm-core/rocm_version.h",
39 changes: 38 additions & 1 deletion repos/spack_repo/builtin/packages/py_torchvision/package.py
@@ -2,6 +2,7 @@
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)

import os

from spack_repo.builtin.build_systems.python import PythonPackage

@@ -198,7 +199,43 @@ def setup_build_environment(self, env: EnvironmentModifications) -> None:
include.extend(query.headers.directories)
library.extend(query.libs.directories)

# PyTorch headers include rocthrust, rocprim, hipsparse, hipblas, hipblas-common,
# hipblaslt and hipsolver headers. When building with ROCm these must be on the
# include path: py-torch depends on them, but they are not direct link deps of
# torchvision. Only add paths for packages that are in the spec to avoid KeyError.
if "^py-torch+rocm" in self.spec:
rocm_include_pkgs = [
"rocthrust",
"rocprim",
"hipsparse",
"hipblas",
"hipblas-common",
"hipblaslt",
"hipsolver",
]
for pkg in rocm_include_pkgs:
if pkg in self.spec:
include.extend(self.spec[pkg].headers.directories)

# At build time, torchvision's setup imports torch; libtorch_hip.so then
# needs aotriton and hip libs at runtime. Add their lib dirs so the loader
# can resolve undefined symbols (e.g. aotriton::v2::flash::attn_bwd_fused).
for pkg in ["aotriton", "hip"]:
if pkg not in self.spec:
continue
try:
for lib_dir in self.spec[pkg].libs.directories:
env.prepend_path("LD_LIBRARY_PATH", lib_dir)
except NoLibrariesError:
# Package may not declare 'libraries' (e.g. aotriton), so Spack
# cannot recursively locate libs. Add prefix lib dirs when they
# exist so the loader can find .so files (lib, lib64, or both).
for sub in ("lib", "lib64"):
lib_dir = os.path.join(self.spec[pkg].prefix, sub)
if os.path.isdir(lib_dir):
env.prepend_path("LD_LIBRARY_PATH", lib_dir)

# CONTRIBUTING.md says to use TORCHVISION_INCLUDE and TORCHVISION_LIBRARY, but
# these do not work for older releases. Build uses a mix of Spack's compiler wrapper
# and the actual compiler, so this is needed to get parts of the build working.
# See https://github.com/pytorch/vision/issues/2591
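The `NoLibrariesError` fallback in the hunk above (probe `lib` and `lib64` under the install prefix and prepend whichever exist to `LD_LIBRARY_PATH`) can be illustrated outside Spack. This is a minimal sketch under stated assumptions, not the package's code: `existing_lib_dirs` and `prepend_to_search_path` are hypothetical helpers, a temporary directory stands in for `self.spec[pkg].prefix`, and plain string handling stands in for `env.prepend_path`.

```python
import os
import tempfile


def existing_lib_dirs(prefix, subdirs=("lib", "lib64")):
    """Return the candidate library directories that actually exist under prefix.

    Hypothetical helper: mirrors the os.path.isdir check in the fallback branch.
    """
    candidates = (os.path.join(prefix, sub) for sub in subdirs)
    return [path for path in candidates if os.path.isdir(path)]


def prepend_to_search_path(current, new_dirs):
    """Prepend each directory in turn to a path-separated search string.

    Hypothetical stand-in for repeated env.prepend_path("LD_LIBRARY_PATH", ...)
    calls; the directory prepended last ends up first.
    """
    parts = [p for p in current.split(os.pathsep) if p]
    for directory in new_dirs:
        parts.insert(0, directory)
    return os.pathsep.join(parts)


# Demo with a throwaway prefix that has lib/ but no lib64/.
with tempfile.TemporaryDirectory() as prefix:
    os.mkdir(os.path.join(prefix, "lib"))
    found = existing_lib_dirs(prefix)
    search_path = prepend_to_search_path("/usr/lib", found)
```

Checking for the directory before prepending keeps nonexistent paths out of the loader search path on systems that ship only one of `lib` or `lib64`.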
2 changes: 2 additions & 0 deletions stacks/e4s-rocm-external/spack.yaml
@@ -259,6 +259,8 @@ spack:
image: ghcr.io/spack/e4s-rocm-base-x86_64:v6.4.3-1760790880
broken-tests-packages:
- paraview
- composable-kernel
- py-llvmlite

cdash:
build-group: E4S ROCm External
31 changes: 16 additions & 15 deletions stacks/ml-linux-x86_64-rocm/spack.yaml
@@ -46,23 +46,22 @@ spack:
# - py-keras backend=torch

# PyTorch
# Does not yet support Spack-installed ROCm
# - py-botorch
# - py-gpytorch
# - py-kornia
# - py-lightning
# - py-pytorch-lightning
# - py-segmentation-models-pytorch
# - py-timm
# - py-torch
# - py-torch-geometric
- py-botorch
- py-gpytorch
- py-kornia
- py-lightning
- py-pytorch-lightning
- py-segmentation-models-pytorch
- py-timm
- py-torch
- py-torch-geometric
# - py-torch-nvidia-apex
# - py-torchaudio
# - py-torchdata
# - py-torchgeo
# - py-torchmetrics
- py-torchdata
- py-torchgeo
- py-torchmetrics
# - py-torchvision
# - py-vector-quantize-pytorch
- py-vector-quantize-pytorch

# scikit-learn
- py-scikit-learn
@@ -82,11 +81,13 @@ spack:
# - py-xgboost

ci:
broken-tests-packages:
- composable-kernel
- py-llvmlite
pipeline-gen:
- build-job:
image:
name: ghcr.io/spack/ubuntu-24.04:v2025-09-15
entrypoint: ['']

cdash:
build-group: Machine Learning