[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm #6909

hj-wei · 2024-12-24T07:38:33Z

Hi, I found an error when using DeepSpeed with a ROCm build of PyTorch. The line

torch_cuda_version = ".".join(torch.version.cuda.split('.')[:2])

raises an AttributeError ('NoneType' object has no attribute 'split') when torch.version.cuda is None. This happens because ROCm builds of PyTorch always set the CUDA version in torch/version.py to None, so this line fails at runtime in any environment where ROCm is used.
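A minimal sketch of a None-safe version of that line, assuming the caller passes in `torch.version.cuda`. The helper name `torch_cuda_major_minor` is hypothetical and not part of DeepSpeed; it returns None on ROCm so the caller can branch explicitly instead of crashing.

```python
from typing import Optional


def torch_cuda_major_minor(cuda_version: Optional[str]) -> Optional[str]:
    """Return "major.minor" from a CUDA version string, or None on ROCm builds.

    Hypothetical helper: on ROCm builds of PyTorch, torch.version.cuda is None,
    so calling .split() on it directly raises AttributeError.
    """
    if cuda_version is None:
        return None
    return ".".join(cuda_version.split(".")[:2])


# In practice this would be called as torch_cuda_major_minor(torch.version.cuda).
print(torch_cuda_major_minor("12.1.105"))  # 12.1
print(torch_cuda_major_minor(None))        # None
```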

tjruwase · 2024-12-24T15:05:16Z

op_builder/builder.py

@@ -839,7 +839,9 @@ def cxx_args(self):

        CPU_ARCH = self.cpu_arch()
        SIMD_WIDTH = self.simd_width()
-        CUDA_ENABLE = self.is_cuda_enable()
+        CUDA_ENABLE = (
+            "-D__DISABLE_CUDA__" if self.is_rocm_pytorch() else self.is_cuda_enable()


Thanks for this PR. Can you please help with the following changes to improve the codebase beyond your fix?

Rename the method is_cuda_enable() to get_cuda_compile_flag(), which is more meaningful.

Handle the self.is_rocm_pytorch() case in get_cuda_compile_flag(), returning -D__DISABLE_CUDA__ as appropriate.

Are you able to help with these changes? Thanks!
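The two requested changes can be sketched roughly as follows. This is a self-contained, torch-free illustration, not DeepSpeed's actual code: `SketchOpBuilder`, `check_cuda`, `MissingCUDAException` and the constructor arguments are hypothetical stand-ins for the real builder and its CUDA compatibility check. Only the names `get_cuda_compile_flag`, `is_rocm_pytorch`, `cxx_args` and the `-D__DISABLE_CUDA__` flag come from the review; returning `-D__ENABLE_CUDA__` on the successful CUDA path is an assumption.

```python
from typing import List


class MissingCUDAException(Exception):
    """Hypothetical stand-in for the error raised when CUDA is missing or mismatched."""


class SketchOpBuilder:
    """Hypothetical, torch-free model of an op builder; not DeepSpeed's class."""

    def __init__(self, name: str, rocm_pytorch: bool, cuda_ok: bool) -> None:
        self.name = name
        self._rocm_pytorch = rocm_pytorch  # would be detected from torch in practice
        self._cuda_ok = cuda_ok            # simulates the outcome of the CUDA check

    def is_rocm_pytorch(self) -> bool:
        return self._rocm_pytorch

    def check_cuda(self) -> None:
        # Hypothetical CUDA/torch compatibility check; this is where
        # torch.version.cuda would be parsed, so it must never run on ROCm.
        if not self._cuda_ok:
            raise MissingCUDAException(self.name)

    def get_cuda_compile_flag(self) -> str:
        # ROCm builds report torch.version.cuda as None: disable CUDA up front
        # instead of letting the version check crash.
        if self.is_rocm_pytorch():
            return "-D__DISABLE_CUDA__"
        try:
            self.check_cuda()
            return "-D__ENABLE_CUDA__"  # assumed flag for the CUDA-enabled path
        except MissingCUDAException:
            return "-D__DISABLE_CUDA__"

    def cxx_args(self) -> List[str]:
        # The call site no longer needs its own ROCm ternary.
        CUDA_ENABLE = self.get_cuda_compile_flag()
        return [CUDA_ENABLE]  # other flags (CPU_ARCH, SIMD_WIDTH, ...) elided


print(SketchOpBuilder("example_op", rocm_pytorch=True, cuda_ok=True).cxx_args())
```

With the ROCm check moved inside get_cuda_compile_flag(), the ternary added to cxx_args() in the diff above collapses back to a single call, and every caller of the flag gets the ROCm handling for free.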

[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm

b86b380

hj-wei requested review from loadams, tjruwase and jomayeri as code owners December 24, 2024 07:38

tjruwase reviewed Dec 24, 2024


Move ROCm torch branch check into get_cuda_compile_flag

309a666


hj-wei commented Dec 24, 2024

tjruwase Dec 24, 2024

hj-wei Dec 25, 2024
