
[feature request] Support for AMD GPU #907

Open
cregouby opened this issue Oct 15, 2022 · 19 comments

Comments

@cregouby
Collaborator

Following up #455, I'd love to be able to run torch workloads on my AMD GPU.
My hardware is available for any test / debug / experiment around it.
Thanks!

@cregouby cregouby changed the title Support for AMD GPU [feature request] Support for AMD GPU Oct 15, 2022
@dfalbel
Member

dfalbel commented Oct 17, 2022

Cool @cregouby !

In order to get support for AMD GPUs, we will need to figure out:

  1. How to build lantern targeting ROCm, probably adding another set of conditions here to download the pre-built binaries for ROCm.
  2. Set up a workflow to build lantern for ROCm and upload the pre-built binaries here.
  3. Then modify install.R to allow installing from ROCm builds.
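A hedged sketch of what step 1's condition could look like (the `USE_ROCM` environment variable and the URL-selection logic here are illustrative assumptions, not lantern's actual CMakeLists.txt code; the URL patterns follow the download.pytorch.org layout used elsewhere in this thread):

```cmake
# Illustrative sketch: pick a libtorch binary by backend,
# mirroring hypothetical CPU/CUDA branches with a ROCm one.
if(DEFINED ENV{USE_ROCM})
  set(TORCH_URL "https://download.pytorch.org/libtorch/rocm5.4.2/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Brocm5.4.2.zip")
elseif(DEFINED ENV{CUDA})
  set(TORCH_URL "https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcu118.zip")
else()
  set(TORCH_URL "https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcpu.zip")
endif()
```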

@cregouby
Collaborator Author

cregouby commented Oct 19, 2022

Nice push! I'm on it in https://github.com/cregouby/torch/tree/platform/amd_gpu
Currently, step 1 seems to be off to a good start:

~/R/_packages/torch/lantern/build$ cmake ..
-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is GNU 11.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Downloading /home/___/R/_packages/torch/lantern/build/libtorch.zip: https://download.pytorch.org/libtorch/rocm5.1.1/libtorch-cxx11-abi-shared-with-deps-1.12.1%2Brocm5.1.1.zip

I still need to add a version-matching check (as I currently do not match the ROCm version available on my machine).
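Such a version-matching check could be sketched roughly as below (everything here is an assumption for illustration: `/opt/rocm/.info/version` is one place ROCm installs commonly record their version, but the location and format vary across releases):

```cmake
# Hypothetical sketch: warn when the locally installed ROCm version
# does not match the version the downloaded libtorch was built for.
set(EXPECTED_ROCM_VERSION "5.1.1")
if(EXISTS "/opt/rocm/.info/version")
  file(READ "/opt/rocm/.info/version" LOCAL_ROCM_VERSION)
  string(STRIP "${LOCAL_ROCM_VERSION}" LOCAL_ROCM_VERSION)
  if(NOT LOCAL_ROCM_VERSION MATCHES "^${EXPECTED_ROCM_VERSION}")
    message(WARNING "libtorch targets ROCm ${EXPECTED_ROCM_VERSION}, "
                    "but ROCm ${LOCAL_ROCM_VERSION} is installed.")
  endif()
endif()
```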

@dfalbel
Member

dfalbel commented Oct 19, 2022

Nice! This is looking great! Maybe ROCm can work with minor version mismatches? That's not the case for CUDA, but you could try.

@cregouby
Collaborator Author

Sure!
Currently dealing with the GitHub Actions workflow, I'm wondering which `runs-on` value should be selected to get AMD GPU hardware to run on. Any idea on this? (I have to admit that the hardware part of GitHub runners is unclear to me.)

@dfalbel
Member

dfalbel commented Oct 20, 2022

I think you can cross-compile on the default Ubuntu runner and install the ROCm compilers, i.e., I think you can compile for ROCm on a machine that doesn't include an AMD GPU.

See eg: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#installing-development-packages-for-cross-compilation

@cregouby
Collaborator Author

I've made good progress on step 3 (maybe the easiest one).
I'm still fighting hard with step 1, making step-by-step progress. I've now fixed the hipBLAS requirement, and I'm dealing with three more required packages: hipFFT, hipRAND, hipSPARSE. I'll keep you up to date...

@cregouby
Collaborator Author

cregouby commented Nov 4, 2022

Some news on the task:

  • cmake now succeeds on lantern
  • make -j8 fails with a weird error:
....
[ 39%] Building CXX object CMakeFiles/lantern.dir/src/Dimname.cpp.o
In file included from /home/____/R/_packages/torch/lantern/src/Dtype.cpp:8:
In file included from /home/____/R/_packages/torch/lantern/src/utils.hpp:2:
/home/____/R/_packages/torch/lantern/include/lantern/types.h:13:10: warning: pack fold expression is a C++17 extension [-Wc++17-extensions]
         ...);
         ^
/home/____/R/_packages/torch/lantern/include/lantern/types.h:9:3: error: no member named 'apply' in namespace 'std'; did you mean 'torch::apply'?
  std::apply(
  ^~~~~~~~~~
  torch::apply
/home/____/R/_packages/torch/lantern/build/libtorch/include/torch/csrc/utils/variadic.h:118:6: note: 'torch::apply' declared here
void apply(Function function, Ts&&... ts) {
     ^
1 warning and 1 error generated when compiling for gfx900.
...
make[2]: *** [CMakeFiles/lantern.dir/build.make:76 : CMakeFiles/lantern.dir/src/lantern.cpp.o] Erreur 1
make[1]: *** [CMakeFiles/Makefile2:85 : CMakeFiles/lantern.dir/all] Erreur 2
make: *** [Makefile:91 : all] Erreur 2

Any suggestion would be appreciated.

@dfalbel
Member

dfalbel commented Nov 4, 2022

Great!!

Perhaps something equivalent to the line below, but for ROCm, is missing?

set_property(TARGET lantern PROPERTY CUDA_STANDARD 17)

@dfalbel
Member

dfalbel commented Nov 4, 2022

It seems that setting this would help: https://cmake.org/cmake/help/latest/prop_tgt/HIP_STANDARD.html
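Applied to lantern, that would look something like the sketch below (the target name follows the `CUDA_STANDARD` example above; note that `HIP_STANDARD` requires CMake 3.21 or later and only governs sources compiled as HIP, not plain CXX sources):

```cmake
# Sketch: mirror the CUDA_STANDARD property for HIP sources.
# Requires CMake >= 3.21; plain C++ sources are governed by
# CXX_STANDARD instead.
set_property(TARGET lantern PROPERTY HIP_STANDARD 17)
set_property(TARGET lantern PROPERTY HIP_STANDARD_REQUIRED ON)
```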

@cregouby
Collaborator Author

cregouby commented Nov 4, 2022

Thanks for the hint; setting it to 14 or 17 did not remove the C++17 extension warning...

For the error `lantern/types.h:9:3: error: no member named 'apply' in namespace 'std'; did you mean 'torch::apply'?`, I made the change in types.h (I must admit I'm completely lost about what to do / not to do in .h files):
https://github.com/cregouby/torch/blob/9c67675d43862cb53c7b47df7c5451eb741798ec/lantern/include/lantern/types.h#L9
and now the lantern build target reaches 100%.

My two big uncertainties right now are:

  • what is the impact of changing types.h std::apply into torch::apply?
  • is src/Contrib/SortVertices/sort_vert_cpu.cpp sufficient to build on ROCm, i.e. without including src/AllocatorCuda.cpp and src/Contrib/SortVertices/sort_vert_kernel.cu?

@dfalbel
Member

dfalbel commented Nov 4, 2022

I don't think torch::apply is equivalent to std::apply...
I think torch::apply is equivalent to https://pytorch.org/docs/stable/generated/torch.Tensor.apply_.html, while std::apply is metaprogramming machinery from C++: https://en.cppreference.com/w/cpp/utility/apply

std::apply is a C++17 feature, so that warning is probably caused by the compiler not supporting C++17, or maybe that HIP standard flag is not being correctly propagated. AFAICT, in the CUDA world, nvcc (the compiler that supports CUDA) works like a preprocessor: it takes the CUDA parts and compiles them, and the part of the code that is not CUDA-related is forwarded to a C++ compiler, and that's where those flags matter.

Yeah, I think you don't need to provide a HIP kernel for the Contrib stuff, so just building the CPU version should be fine.

@cregouby
Collaborator Author

cregouby commented Nov 7, 2022

Thanks for those hints, I'll try to rework based on that!
FYI, the 100% build of lantern makes `install_torch_from_file()` fail with:

install_torch(version = version, type = type, install_config = install_config)
Erreur dans cpp_lantern_init(file.path(install_path(), "lib")) : 
  /home/____/R/x86_64-pc-linux-gnu-library/4.2/torch/lib/liblantern.so - /home/____/R/x86_64-pc-linux-gnu-library/4.2/torch/lib/liblantern.so: undefined symbol: _ZN2at4_ops4rand4callEN3c108ArrayRefIlEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEE

And despite my efforts, I can't get the HIP compiler to accept C++17 code... I'll ask the authors... or maybe try something else based on https://github.com/ROCm-Developer-Tools/HIP/blob/809149ecc8d751acd3c1595b590090cd86ada8df/bin/hipcc.pl#L397

    # nvcc does not handle standard compiler options properly
    # This can prevent hipcc being used as standard CXX/C Compiler
    # To fix this we need to pass -Xcompiler for options

@dfalbel
Member

dfalbel commented Nov 9, 2022

That's great progress!! 👍

Hmm, this seems to be related to the clang version, perhaps? Or something like this?

@cregouby
Collaborator Author

Ah, some news here after some deeper investigation:

Support and Compatibility

| libtorch public / nightly | ROCm | Ubuntu installer | gfx card support | R torch |
|---|---|---|---|---|
| - | 5.0 | - | 908, 90a | - |
| 1.13.0 - 1.13.1 / 1.13.0 - 2.0.0 | 5.2 | 18.04/20.04 (1) | add 1011 (2) | 0.10.0 |
| - | 5.3.0 | 22.04 | add 11xx | - |
| 2.0.0 - 2.0.1 / 2.0.0 - 2.1.0 | 5.4.2 | 22.04 | add 1100, 1102 | 0.12.0 |

Liblantern build

Strictly following the compatibility table, I've been able to build liblantern.so for

  • ROCm 5.2
  • ROCm 5.4.2

using the official buildlantern.R

{torch}

I've tweaked the torch download a bit and got to the following success:

>   # copy lantern
>   source("R/install.R")
>   source("R/lantern_sync.R")
>   lantern_sync(TRUE)
[1] TRUE
> library(torch)

Attachement du package : ‘torch’

Les objets suivants sont masqués _par_ ‘.GlobalEnv’ :

    get_install_libs_url, install_torch, install_torch_from_file, torch_install_path, torch_is_installed

> torch_version
[1] "2.0.1"
> tt <- torch_tensor(c(1,2,3,4), device = "cuda")
> tt
torch_tensor
 1
 2
 3
 4
[ CUDAFloatType{4} ]

which is amazing!

I still have a discrepancy: R currently crashes when running `tt + 1`, due to a possible version mismatch between libtorch and {torch}.

But I can feel the taste of success...

@RMHogervorst

This is very exciting! Is there a way I can help test? I have an AMD ROCm computer, and I would love it if torch worked on the GPU, just like PyTorch!

@cregouby
Collaborator Author

Hello @RMHogervorst,
I'm glad you want to help!
You should clone the repo and switch to the platform/amd_gpu branch, where building the ROCm lantern is documented, following /.github/CONTRIBUTING.md.
In order to build lantern for torch 0.12, you will need the ROCm 5.4.2 suite on your machine.
Let us know if you can build it.

@RMHogervorst

RMHogervorst commented Feb 21, 2024

@cregouby
After cloning your repository:

  1. I installed all packages (I used renv to do that)
  2. I had to create the lantern directory (otherwise the build_lantern condition is not true)
  3. I installed cmake
  4. I ran `source("tools/build_lantern.R")` and got:

CMake Error: The source directory "/home/roel/Documents/projecten/experimenten/torch/lantern" does not appear to contain CMakeLists.txt.

object path not found in lantern_sync

I think I'm missing something.

I have installed the latest ROCm (6.0.2). I can probably install the 5.4.2 version too, but I don't think this error is related to the ROCm version.

@RMHogervorst

I realized that there are CMakeLists.txt files in the src directory.
(I don't have much experience building C projects, so I'll probably learn a lot / do some stupid stuff.)

  • from the src directory
  • run `cmake .`
  • run `cmake --build . --target lantern --config Release --parallel 8`

This builds a library, but it seems to build it for CPU only.

@cregouby
Collaborator Author

cregouby commented Feb 21, 2024

Sorry @RMHogervorst, I didn't commit my experimental lantern/CMakeLists.txt.
You should now get it if you git pull again from the cregouby/torch repo on branch platform/amd_gpu.

Feel free to question or improve every line inside the CMakeLists.txt file, as makefiles are far beyond my comfort zone.

After lantern is compiled, you may want to set up some environment variables.

These are mine, stored in .Renviron (again, they may need some changes):

# --- torch  / lantern build
# change ARCH target at `make` time
HCC_AMDGPU_TARGET=gfx900
USE_ROCM=1
BUILD_LANTERN=1

# ---- torch lantern package build ----
MAKE=make -j10
LD_LIBRARY_PATH=/opt/rocm-5.4.2/lib:/opt/rocm-5.4.2/llvm/lib:~/R/_packages/torch/inst/lib:~/R/x86_64-pc-linux-gnu-library/4.3/torch/lib
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/snap/bin:/opt/rocm-5.4.2:/opt/rocm-5.4.2/bin
ROCM_PATH=/opt/rocm
# ---- local liblantern.so usage ----
# may need a ln -s of a liblantern_<version>.so in the same directory
# The library URL can be 3 different things:
# - a real URL
# - a path to a zip file containing the library
# - a path to a directory containing the files to be installed
# if set, skip the download within lantern/CMakeLists.txt
# TORCH_URL=https://download.pytorch.org/libtorch/rocm5.4.2/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Brocm5.4.2.zip
# local cache of the previous
TORCH_URL="~/R/_packages/torch_experiment/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Brocm5.4.2.zip"
TORCH_INSTALL_DEBUG=1
