
Topic/cuda aware communications #671

Open

bosilca wants to merge 11 commits into master from topic/cuda_aware_communications

Conversation

@bosilca (Contributor) commented on Sep 10, 2024

Add support for sending and receiving data directly from and to devices. There are a few caveats (noted in the commit log).

  1. The first question is: how is such a device selected?

The allocation of such a copy happens well before the scheduler is
invoked for a task, in fact before the task is even ready. Thus, we
need to decide on the location of this copy based only on static
information, such as the task affinity. Therefore, this approach only
works for owner-compute types of tasks, where the task will be executed
on the device that owns the data used for the task affinity (see the
sketch after this list).

  2. Pass the correct data copy across the entire system, instead of
     falling back to the data copy on device 0 (CPU memory).
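
For illustration, a minimal sketch of what the owner-compute device
selection could look like; the names prefixed with example_ are
hypothetical, not the actual PaRSEC API:

```c
#include <stdint.h>

/* Hypothetical task descriptor: the only location hint available this
 * early is the device that owns the data used for the task affinity. */
typedef struct example_task_s {
    uint32_t affinity_device;   /* device owning the affinity data */
} example_task_t;

/* Owner-compute: place the copy on the affinity device. The scheduler
 * has not run yet, so this is a purely static decision. */
static uint32_t example_select_copy_device(const example_task_t *task,
                                           int gpu_aware_comm_enabled)
{
    if (gpu_aware_comm_enabled && 0 != task->affinity_device)
        return task->affinity_device;
    return 0;   /* fall back to device 0, the CPU memory copy */
}
```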

TODOs

  • Rebase on the C11 atomics fix
  • Add a configure option to enable GPU-aware communications.
  • Add a runtime configuration to turn the GPU-aware communications on/off? (a sketch follows this list)
  • Pass the -g 2 tests
  • Failure with ctest get_best_device: scheduling.c:157: int __parsec_execute(parsec_execution_stream_t *, parsec_task_t *): Assertion `NULL != copy->original && NULL != copy->original->device_copies[0]' failed.
  • Failure with ctest nvlink, stress (segfault)
  • Failure with ctest stage (presumably identical to the intermittent failure in gemm/potrf): device_gpu.c:2470: int parsec_device_kernel_epilog(parsec_device_gpu_module_t *, parsec_gpu_task_t *): Assertion `PARSEC_DATA_STATUS_UNDER_TRANSFER == cpu_copy->data_transfer_status' failed.
  • RO data shared between tasks may trip an assert when doing D2D transfers between devices that do not have peer access to each other
  • readers values are miscounted when 2 or more GPUs are used per rank (#671 (comment))
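
A minimal sketch of what the runtime on/off switch could look like,
assuming it is registered through PaRSEC's MCA parameter system; the
parameter name and variable below are hypothetical:

```c
#include "parsec/utils/mca_param.h"

/* Default: GPU-aware communications enabled when compiled in. */
static int parsec_comm_gpu_aware = 1;

/* Register a hypothetical "runtime_comm_gpu_aware" MCA parameter so
 * the feature can be toggled at run time without recompiling. */
void example_register_gpu_aware_param(void)
{
    parsec_mca_param_reg_int_name("runtime", "comm_gpu_aware",
            "Send/receive directly from GPU buffers (0 = disabled)",
            false, false,
            parsec_comm_gpu_aware, &parsec_comm_gpu_aware);
}
```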


@abouteiller force-pushed the topic/cuda_aware_communications branch from efa8386 to ab1a74a on October 11, 2024 18:07
@abouteiller (Contributor) commented:

Now passing 1 GPU/node, 8 ranks PTG POTRF.
Sorry I had to force-push; there were issues with rebasing on master.


bosilca and others added 10 commits October 30, 2024 09:59
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
This allows checking whether the data can be sent and received directly
to and from GPU buffers.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
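
For context, a hedged sketch of one way such a capability check can be
performed at runtime with Open MPI's extension API (other MPI
implementations need a different probe; this is not necessarily what
the commit implements):

```c
#include <mpi.h>
#if defined(OPEN_MPI)
#include <mpi-ext.h>   /* Open MPI extensions, e.g. MPIX_Query_cuda_support */
#endif

/* Returns 1 if the MPI library reports it can send/receive directly
 * from CUDA device buffers, 0 otherwise. */
static int example_mpi_is_gpu_aware(void)
{
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    return MPIX_Query_cuda_support();
#else
    return 0;   /* built without CUDA-aware support, or unknown */
#endif
}
```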
This is a multi-part patch that allows the CPU to prepare a data copy
mapped onto a device.

1. The first question is: how is such a device selected?

The allocation of such a copy happens well before the scheduler is
invoked for a task, in fact before the task is even ready. Thus, we
need to decide on the location of this copy based only on static
information, such as the task affinity. Therefore, this approach only
works for owner-compute types of tasks, where the task will be executed
on the device that owns the data used for the task affinity.

2. Pass the correct data copy across the entire system, instead of
   falling back to the data copy on device 0 (CPU memory).

Add a configure option to enable GPU-aware communications.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
Name the data_t allocated for temporaries, allowing developers to track
them through the execution. Add the keys to all outputs (tasks and
copies).

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
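
For illustration, a sketch of how such tracking names might be
generated; the helper and the format are hypothetical, not the commit's
actual code:

```c
#include <inttypes.h>
#include <stdio.h>

/* Build a human-readable name for a temporary, embedding the data key
 * so it can be matched against the keys printed for tasks and copies. */
static void example_name_temporary(char *name, size_t len,
                                   const char *task_name, uint64_t key)
{
    snprintf(name, len, "tmp(%s, key=%" PRIu64 ")", task_name, key);
}
```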
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
…copy if we are passed in a GPU copy, and we need to retain/release the
copies that we are swapping
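
To illustrate the retain/release discipline mentioned here, a minimal
sketch using PaRSEC's object refcounting macros; the helper and the
slot-based layout are hypothetical:

```c
#include "parsec/data.h"

/* Replace the data copy held in *slot with gpu_copy, keeping the
 * reference counts balanced: retain the incoming copy, release the
 * one being swapped out. */
static void example_swap_copy(parsec_data_copy_t **slot,
                              parsec_data_copy_t *gpu_copy)
{
    parsec_data_copy_t *old = *slot;
    PARSEC_OBJ_RETAIN(gpu_copy);    /* the slot now owns a reference */
    *slot = gpu_copy;
    if (NULL != old)
        PARSEC_OBJ_RELEASE(old);    /* drop the replaced reference */
}
```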
@abouteiller force-pushed the topic/cuda_aware_communications branch from eb5c782 to 3e0cb38 on October 31, 2024 14:53
…ut-only flows, for which checking if they are control flows segfaults