Using patch #2086 to compile with CUDA Toolkit 12.6.3
cmake
cmake -B../build -S../cutlass -DCUTLASS_NVCC_ARCHS="90a" -DCUTLASS_ENABLE_GDC_FOR_SM90=1
-- CMake Version: 3.31.4
-- CUTLASS 3.8.0
-- CUDART: /home/manish_magic_dev/sdk/cuda/12.6.3/lib64/libcudart.so
-- CUDA Driver: /home/manish_magic_dev/sdk/cuda/12.6.3/lib64/stubs/libcuda.so
-- NVRTC: /home/manish_magic_dev/sdk/cuda/12.6.3/lib64/libnvrtc.so
-- Default Install Location: install
-- Make cute::tuple be the new standard-layout tuple type
-- CUDA Compilation Architectures: 90a
-- Enable caching of reference results in conv unit tests
-- Enable rigorous conv problem sizes in conv unit tests
-- Grid Dependency Control (GDC) is enabled for SM90 kernels (required for programmatic dependent launches).
-- Using the following NVCC flags:
--expt-relaxed-constexpr
-DCUTE_USE_PACKED_TUPLE=1
-DCUTLASS_TEST_LEVEL=0
-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1
-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1
-DCUTLASS_DEBUG_TRACE_LEVEL=0
-DCUTLASS_ENABLE_GDC_FOR_SM90=1
-Xcompiler=-Wconversion
-Xcompiler=-fno-strict-aliasing
-- CUTLASS Revision: 833f6990
CMake Warning (dev) at /home/manish_magic_dev/.local/lib/python3.10/site-packages/cmake/data/share/cmake-3.31/Modules/FetchContent.cmake:1953 (message):
Calling FetchContent_Populate(googletest) is deprecated, call
FetchContent_MakeAvailable(googletest) instead. Policy CMP0169 can be set
to OLD to allow FetchContent_Populate(googletest) to be called directly for
now, but the ability to call it with declared details will be removed
completely in a future version.
Call Stack (most recent call first):
cmake/googletest.cmake:47 (FetchContent_Populate)
CMakeLists.txt:759 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
-- Configuring cublas ...
-- cuBLAS Disabled.
-- Configuring cuBLAS ... done.
-- Completed generation of library instances. See /home/manish_magic_dev/repos/cutlass/cutlass_tree_2/build/tools/library/library_instance_generation.log for more information.
-- Found Python3: /usr/bin/python3.10 (found suitable version "3.10.12", minimum required is "3.5") found components: Interpreter
-- Enable device reference verification in conv unit tests
-- Configuring done (3.2s)
-- Generating done (3.5s)
-- Build files have been written to: /home/manish_magic_dev/repos/cutlass/cutlass_tree_2/build
NVIDIA SMI
nvidia-smi topo -p2p r
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0  X     OK    OK    OK    OK    OK    OK    OK
GPU1  OK    X     OK    OK    OK    OK    OK    OK
GPU2  OK    OK    X     OK    OK    OK    OK    OK
GPU3  OK    OK    OK    X     OK    OK    OK    OK
GPU4  OK    OK    OK    OK    X     OK    OK    OK
GPU5  OK    OK    OK    OK    OK    X     OK    OK
GPU6  OK    OK    OK    OK    OK    OK    X     OK
GPU7  OK    OK    OK    OK    OK    OK    OK    X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
It looks like it's failing gemm.initialize, which means one of the following failed:
Per-gpu workspace initialization: probably not the reason
Setting shmem size: shouldn't happen as long as the kernel doesn't require more shmem than available.
Constructing CUDA graphs: this is the most likely reason.
Could you confirm:
Whether the kernel in the example was modified, and if so, to what tile shape (so we can rule out shmem size).
Your CUDA driver version. If its corresponding CTK version is older than 12.5, the specific CUDA graph APIs required by the example won't work, even if the compiler is 12.5 or later and has those APIs available.
Meanwhile, could you also try building with -DCUTLASS_DEBUG_TRACE_LEVEL=5 and trying again to see if the traces tell us anything?
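For reference, a minimal sketch of that reconfigure step. It reuses the configure command from the original report and only appends the trace-level flag; the `../build` and `../cutlass` paths are copied from the report and may differ on your machine. The `RUN` switch is a hypothetical convenience, so the script only prints the command unless you opt in.

```shell
#!/usr/bin/env bash
# Same configure flags as in the original report, plus the requested trace level.
# Paths are taken from the report; adjust them to your checkout.
configure_cmd='cmake -B../build -S../cutlass -DCUTLASS_NVCC_ARCHS="90a" -DCUTLASS_ENABLE_GDC_FOR_SM90=1 -DCUTLASS_DEBUG_TRACE_LEVEL=5'

# Dry run by default: print the command. Set RUN=1 (hypothetical switch) to execute it.
if [ "${RUN:-0}" = "1" ]; then
  eval "$configure_cmd"
else
  echo "$configure_cmd"
fi
```

After reconfiguring, rebuild and rerun the example, then attach the trace output to the issue.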
I compiled using 12.6.3 and also tried 12.8, but my driver version is archaic (Driver Version: 535.183.01), so this is probably the reason. Maybe we can add a minimum driver version requirement to the example README.md, just as it already states that the minimum required toolkit version is 12.6.
I will see if we can update the driver and try example 65 again!
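A rough sketch of the driver check discussed above, in case it helps others who hit the same failure. It compares only the major driver branch against 555, which I am assuming is the first Linux driver branch matching CTK 12.5 (please verify against the CUDA release notes for your platform). The function name `driver_supports_ctk_12_5` is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: the 555.x branch is the first Linux driver branch matching CTK 12.5.
min_driver_major=555

# Hypothetical helper: succeeds if the given driver version string
# (e.g. "535.183.01") is on a new enough major branch.
driver_supports_ctk_12_5() {
  local version="$1"
  local major="${version%%.*}"   # keep everything before the first dot
  [ "$major" -ge "$min_driver_major" ]
}

# On a real machine the version string could come from:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1
# Here the value from this report is hard-coded.
if driver_supports_ctk_12_5 "535.183.01"; then
  echo "driver OK for CUDA 12.5 APIs"
else
  echo "driver too old for CUDA 12.5 APIs"
fi
```

With the driver from this report the script takes the "too old" branch, which matches the suspected cause.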
Builds example 65
Running it on 8xH100
nvidia-smi while it was running