Improve gpuistl using cudaGraphs #5852

multitalentloes · 2025-01-08T13:06:31Z

Using CUDA graphs reduces the overhead associated with many consecutive kernel launches in GPU ILU and DILU.
The apply itself needs very few line changes, but the stream has to be specified explicitly, so many functions need an updated signature.

Speedups on spe1, spe11 and sleipner are typically around 1.1x to 1.2x in the preconditioner apply.

There is no speedup yet in the update, which seems strange given that it has the same kernel pattern...

multitalentloes · 2025-01-17T15:44:37Z

opm/simulators/linalg/gpuistl/detail/preconditionerKernels/DILUKernels.cu

+    template void computeDiluDiagonal<T, blocksize>(                                                                   \
+        T*, int*, int*, int*, int*, const int, int, T*, int, cudaStream_t);                                            \
+    template void computeDiluDiagonalSplit<blocksize, T, double, MatrixStorageMPScheme::DOUBLE_DIAG_DOUBLE_OFFDIAG>(   \
+        const T*,                                                                                                      \


clang-format didn't make this bit any prettier...

multitalentloes · 2025-01-17T15:47:09Z

Speedup is measured to be 10% to 20% on the apply of both ILU0 and DILU. No speedup was measured for the update, so I am not introducing cudaGraphs there. I have no idea why we do not see the same runtime reduction, given that the update also consists of many short kernels...

Still not mergeable, as the results have not been verified on a consumer-grade AMD card to ensure there is no slowdown there (no speedup is expected with current ROCm versions and the current hardware generation).

multitalentloes · 2025-01-20T08:04:36Z

Now tested on AMD: 1.02x speedup on the ILU0 apply, though I am not sure this is significant or consistently reproducible.
It seems safe to use the graphs.

Graphs are also supported in HIP, though they do not seem to affect performance there in any clear way. 1.1x to 1.2x speedup on Nvidia GPUs.

multitalentloes · 2025-01-20T08:07:52Z

opm/simulators/linalg/gpuistl/GpuBuffer.cpp

@@ -174,6 +174,19 @@ GpuBuffer<T>::copyFromHost(const T* dataPointer, size_t numberOfElements)
    OPM_GPU_SAFE_CALL(cudaMemcpy(data(), dataPointer, numberOfElements * sizeof(T), cudaMemcpyHostToDevice));
 }

+template <class T>
+void
+GpuBuffer<T>::copyFromHost(const T* dataPointer, size_t numberOfElements, cudaStream_t stream)


Various functions have been changed to take a cudaStream_t. This is needed because recording a set of GPU activities into a cudaGraph requires them to be issued on an explicitly created stream rather than the default stream.
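For reviewers unfamiliar with stream capture, here is a minimal standalone sketch of the pattern, not the code in this PR. `scaleKernel`, `runCapturedKernels` and `CHECK_CUDA` are hypothetical names made up for illustration (the PR itself uses `OPM_GPU_SAFE_CALL`), and `cudaGraphInstantiateWithFlags` assumes CUDA 11.4 or newer. It records a few short kernels on a created stream, instantiates the graph once, and replays it with a single launch call.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical error-check macro for this sketch only.
#define CHECK_CUDA(call)                                                        \
    do {                                                                        \
        cudaError_t err_ = (call);                                              \
        if (err_ != cudaSuccess) {                                              \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                            \
        }                                                                       \
    } while (0)

// Stand-in for one of the many short preconditioner kernels.
__global__ void scaleKernel(float* x, int n, float factor)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= factor;
    }
}

// Records numLaunches short kernels into a graph, replays the graph once,
// and returns the first element of the result.
float runCapturedKernels(int n, int numLaunches)
{
    std::vector<float> host(n, 1.0f);
    float* device = nullptr;

    // Capture needs an explicitly created stream, hence the stream arguments.
    cudaStream_t stream;
    CHECK_CUDA(cudaStreamCreate(&stream));
    CHECK_CUDA(cudaMalloc(&device, n * sizeof(float)));
    CHECK_CUDA(cudaMemcpyAsync(device, host.data(), n * sizeof(float),
                               cudaMemcpyHostToDevice, stream));
    CHECK_CUDA(cudaStreamSynchronize(stream));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Between Begin/EndCapture the launches are only recorded, not executed.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    CHECK_CUDA(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    for (int k = 0; k < numLaunches; ++k) {
        scaleKernel<<<blocks, threads, 0, stream>>>(device, n, 2.0f);
    }
    CHECK_CUDA(cudaStreamEndCapture(stream, &graph));
    CHECK_CUDA(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));

    // One graph launch replaces numLaunches individual kernel launches.
    CHECK_CUDA(cudaGraphLaunch(graphExec, stream));
    CHECK_CUDA(cudaStreamSynchronize(stream));

    CHECK_CUDA(cudaMemcpy(host.data(), device, n * sizeof(float),
                          cudaMemcpyDeviceToHost));

    CHECK_CUDA(cudaGraphExecDestroy(graphExec));
    CHECK_CUDA(cudaGraphDestroy(graph));
    CHECK_CUDA(cudaFree(device));
    CHECK_CUDA(cudaStreamDestroy(stream));
    return host[0];
}
```

Calling `runCapturedKernels(1024, 3)` doubles every element three times in a single graph launch. In a real apply the instantiated graph would presumably be kept and relaunched on every call, which is where the saved launch overhead would come from.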

multitalentloes force-pushed the cudagraph_experiment branch from d990b2e to 47726cc Compare January 17, 2025 14:57

multitalentloes marked this pull request as ready for review January 20, 2025 08:01

use cudagraphs in gpu DILU and ILU0 apply

8ea1b29

is also supported in HIP, though it does not seem to affect performance in any clear way. 1.1 to 1.2 speedup on Nvidia GPUs.

multitalentloes force-pushed the cudagraph_experiment branch from acd2c7a to 8ea1b29 Compare January 20, 2025 08:06

multitalentloes added 2 commits January 20, 2025 09:48

resimplify update and format

4cde09a

reduce diff

c016ca1

multitalentloes requested a review from kjetilly January 20, 2025 08:53
