
GPU pingpong test #556

Open: wants to merge 3 commits into master

Conversation

therault
Contributor

This adds a simple test in which data is updated alternately on the CPU and on GPUs.

While writing the test, I found that DTD had not been ported to HIP; this PR completes that port, and also provides a first test for HIP.

@therault therault requested a review from a team as a code owner June 14, 2023 17:59
@therault therault force-pushed the gpu_pingpong_test branch 16 times, most recently from 2cdb527 to 4222191 on June 16, 2023 21:11
@therault
Contributor Author

I'm a bit lost with CI here... Another pair of eyes would help. To summarize what I observe:

  • When running in shared=OFF / profiling=ON mode, we don't detect CUDA at all (no device, no compiler).
  • When running in shared=ON / profiling=OFF mode, we always detect CUDA (the device part).
    • In the master version, we asked for the Spack package gcc@12.something, which makes check_language(CUDA) fail, because nvcc cannot work with gcc > 11.x.
    • In the version proposed in this patch, we load the Spack package gcc@11.3.0. Now something even more curious is happening:
      • We detect CUDAToolkit and enable the CUDA device without a problem.
      • We still claim that check_language(CUDA) fails.
      • To investigate why, I have added some CMake messages whose output appears in the currently failing job (https://github.com/ICLDisco/parsec/actions/runs/5294237295/jobs/9583355256?pr=556):
        • nvcc is where it should be, based on the CUDA toolkit we discovered.
        • I can successfully run nvcc -c /path/to/some/cufile.cu by hand.
        • No CMakeError.log file is generated. I display the contents of CMakeFiles/ and it doesn't seem to contain any useful information.

To conclude, I have no idea why check_language(CUDA) fails in this setup, and I'm now out of ideas to test...
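For reference, the detection sequence being described boils down to something like the following minimal sketch (this is not PaRSEC's actual CMake code, and the project name is made up; it only illustrates the two steps that disagree in the failing job):

```cmake
# Minimal repro sketch of the puzzling situation: find_package(CUDAToolkit)
# succeeds, yet check_language(CUDA) reports no usable CUDA compiler.
cmake_minimum_required(VERSION 3.18)
project(cuda_detect_repro C)

include(CheckLanguage)

# Step 1: succeeds in the failing CI job -- the toolkit and nvcc are located.
find_package(CUDAToolkit)
if(CUDAToolkit_FOUND)
  message(STATUS "CUDAToolkit found, nvcc = ${CUDAToolkit_NVCC_EXECUTABLE}")
endif()

# Step 2: unexpectedly fails -- check_language() tries to compile a tiny CUDA
# source with nvcc plus the host compiler; a gcc/nvcc mismatch would normally
# leave a trace in CMakeError.log, but none is generated here.
check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
else()
  message(STATUS "check_language(CUDA) found no usable CUDA compiler")
endif()
```

Note that step 1 and step 2 test different things: find_package(CUDAToolkit) only locates the toolkit on disk, while check_language(CUDA) actually attempts a trial compilation, so the two can legitimately disagree when the host compiler is incompatible.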

@@ -4,6 +4,7 @@
#include "parsec/data_dist/matrix/two_dim_rectangle_cyclic.h"
#include "parsec/interfaces/dtd/insert_function_internal.h"
#include "tests/tests_data.h"
#include "parsec/mca/device/cuda/device_cuda_internal.h"
Contributor
Why do we need this? This is internal and should not spill over into user code.

@bosilca
Contributor

bosilca commented Oct 19, 2023

Please rebase and reassess the changes to the CI part (it is not clear they are still needed).

Make a token pass from the CPU to each GPU and back, a few times, to check a possible bug found by @devreal.

Part of the DTD interface was not fully ported to HIP

Enable (cuda|hip)_pingpong test in CI

Add a PTG GPU pingpong test to compare with the behavior in DTD -- Work in progress

Tests need to include the appropriate GPU-specific header file, as insert_function_internal.h doesn't do it for them anymore

Enable PTG test over CUDA

Fix errors in data distribution initialization and some DAG errors in the PTG of the GPU pingpong test

Rename files and directories to match the new status of the tests: tests/runtime/cuda is renamed tests/runtime/gpu, and the pingpong tests are named after the API rather than a particular device, since they should work on both GPU types

Only define the pingpong tests if a suitable compiler is found for the kernels
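A guard of that sort might look like the following sketch (target, source, and test names here are purely illustrative, not the ones in the PR's Testings.cmake):

```cmake
# Hypothetical sketch: only define the pingpong tests when a kernel compiler
# for the corresponding GPU language was actually found.
include(CheckLanguage)

check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
  add_executable(cuda_pingpong cuda_pingpong.c pingpong_kernel.cu)
  add_test(NAME runtime/gpu/cuda_pingpong COMMAND cuda_pingpong)
endif()

check_language(HIP)
if(CMAKE_HIP_COMPILER)
  enable_language(HIP)
  add_executable(hip_pingpong hip_pingpong.c pingpong_kernel.hip)
  add_test(NAME runtime/gpu/hip_pingpong COMMAND hip_pingpong)
endif()
```

With this shape, a build host without any GPU toolchain simply skips the tests instead of failing at configure time.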

Do a ping-pong-pong test instead of ping-pong, to see how dependencies are tracked for GPU-to-GPU task dependencies

Fix the checks of the pingpong test, and add it in the Testings.cmake

PTG ping-pong test: in order to guide the selection of the best device, the advised data needs to flow from a CPU task, not directly from memory.

Trying to introduce the gpu_nvidia runner in the CI matrix

Add ROCm, create one github_runner-[device].yaml file per device; remove debugging info from CMakeLists.txt

Add some infrastructure to make sure CI does the device tests where it should, and issue an error if things cannot be tested (e.g. because the GPUs are down or the compiler/Spack is broken)

Trying to work around the xml2 issue with mesa.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

Integrate the gpu_amd/release in the test suite

Add support to rocm-smi in check_nb_devices.sh
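The device-counting helper could look roughly like the sketch below (the function name and the exact output parsing are assumptions for illustration, not the actual contents of check_nb_devices.sh):

```shell
#!/bin/sh
# Hypothetical sketch of a device-count helper in the spirit of
# check_nb_devices.sh: prefer nvidia-smi, fall back to rocm-smi, and report 0
# when neither tool is available, so CI can flag an unexpected device count.
count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # nvidia-smi prints one line per GPU with --list-gpus
        nvidia-smi --list-gpus 2>/dev/null | wc -l | tr -d ' '
    elif command -v rocm-smi >/dev/null 2>&1; then
        # rocm-smi's summary output starts one line per device with "GPU[N]"
        # (assumed format; adjust the pattern to the installed ROCm version)
        rocm-smi --showid 2>/dev/null | grep -c '^GPU\['
    else
        echo 0
    fi
}

count_gpus
```

CI can then compare the printed count against the number of devices the runner is supposed to expose, and fail the job loudly when they differ instead of silently skipping GPU tests.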

Make the CMake commands conditional on which GitHub runner is loaded, to prepare for testing
@abouteiller abouteiller added this to the v4.0 milestone Jan 12, 2024
@therault
Contributor Author

Split this PR in two: one for the tester itself and another for the CI/runners

3 participants