
Topic/cuda aware communications #671

Open

bosilca wants to merge 11 commits into master from topic/cuda_aware_communications

Conversation

@bosilca (Contributor) commented on Sep 10, 2024

Add support for sending and receiving data directly from and to devices. There are a few caveats (noted in the commit log).

  1. The first question is: how is such a device selected?

The allocation of such a copy happens well before the scheduler is
invoked for a task, in fact before the task is even ready. Thus, we
need to decide on the location of this copy based only on static
information, such as the task affinity. Therefore, this approach only
works for owner-compute types of tasks, where the task will be executed
on the device that owns the data used for the task affinity (see the
sketch after this list).

  2. Pass the correct data copy across the entire system, instead of
     falling back to the data copy on device 0 (CPU memory).
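
For illustration, a minimal sketch of what the owner-compute device
selection could look like; the names prefixed with example_ are
hypothetical, not the actual PaRSEC API:

```c
#include <stdint.h>

/* Hypothetical task descriptor: the only location hint available this
 * early is the device that owns the data used for the task affinity. */
typedef struct example_task_s {
    uint32_t affinity_device;   /* device owning the affinity data */
} example_task_t;

/* Owner-compute: place the copy on the affinity device. The scheduler
 * has not run yet, so this is a purely static decision. */
static uint32_t example_select_copy_device(const example_task_t *task,
                                           int gpu_aware_comm_enabled)
{
    if (gpu_aware_comm_enabled && 0 != task->affinity_device)
        return task->affinity_device;
    return 0;   /* fall back to device 0, the CPU memory copy */
}
```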

TODOs

  • Rebase on the C11 atomics fix
  • Add a configure option to enable GPU-aware communications.
  • Add a runtime configuration to turn the GPU-aware communications on/off? (a sketch follows this list)
  • Pass the -g 2 tests
  • Failure with ctest get_best_device: scheduling.c:157: int __parsec_execute(parsec_execution_stream_t *, parsec_task_t *): Assertion `NULL != copy->original && NULL != copy->original->device_copies[0]' failed.
  • Failure with ctest nvlink, stress (segfault)
  • Failure with ctest stage (presumably identical to the intermittent failure in gemm/potrf): device_gpu.c:2470: int parsec_device_kernel_epilog(parsec_device_gpu_module_t *, parsec_gpu_task_t *): Assertion `PARSEC_DATA_STATUS_UNDER_TRANSFER == cpu_copy->data_transfer_status' failed.
  • RO data shared between tasks may trip an assert when doing D2D transfers between devices that do not have peer access to each other
  • readers values are miscounted when 2 or more GPUs are used per rank (#671 (comment))
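
A minimal sketch of what the runtime on/off switch could look like,
assuming it is registered through PaRSEC's MCA parameter system; the
parameter name and variable below are hypothetical:

```c
#include "parsec/utils/mca_param.h"

/* Default: GPU-aware communications enabled when compiled in. */
static int parsec_comm_gpu_aware = 1;

/* Register a hypothetical "runtime_comm_gpu_aware" MCA parameter so
 * the feature can be toggled at run time without recompiling. */
void example_register_gpu_aware_param(void)
{
    parsec_mca_param_reg_int_name("runtime", "comm_gpu_aware",
            "Send/receive directly from GPU buffers (0 = disabled)",
            false, false,
            parsec_comm_gpu_aware, &parsec_comm_gpu_aware);
}
```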


@abouteiller force-pushed the topic/cuda_aware_communications branch from efa8386 to ab1a74a on October 11, 2024 18:07
@abouteiller (Contributor) commented:

Now passing 1 GPU/node, 8 ranks PTG POTRF.
Sorry I had to force-push; there were issues with rebasing on master.


bosilca and others added 10 commits October 30, 2024 09:59
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
This allows checking whether the data can be sent and received directly
to and from GPU buffers.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
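
For context, a hedged sketch of one way such a capability check can be
performed at runtime with Open MPI's extension API (other MPI
implementations need a different probe; this is not necessarily what
the commit implements):

```c
#include <mpi.h>
#if defined(OPEN_MPI)
#include <mpi-ext.h>   /* Open MPI extensions, e.g. MPIX_Query_cuda_support */
#endif

/* Returns 1 if the MPI library reports it can send/receive directly
 * from CUDA device buffers, 0 otherwise. */
static int example_mpi_is_gpu_aware(void)
{
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    return MPIX_Query_cuda_support();
#else
    return 0;   /* built without CUDA-aware support, or unknown */
#endif
}
```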
This is a multi-part patch that allows the CPU to prepare a data copy
mapped onto a device.

1. The first question is: how is such a device selected?

The allocation of such a copy happens well before the scheduler is
invoked for a task, in fact before the task is even ready. Thus, we
need to decide on the location of this copy based only on static
information, such as the task affinity. Therefore, this approach only
works for owner-compute types of tasks, where the task will be executed
on the device that owns the data used for the task affinity.

2. Pass the correct data copy across the entire system, instead of
   falling back to the data copy on device 0 (CPU memory).

Add a configure option to enable GPU-aware communications.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
Name the data_t allocated for temporaries, allowing developers to track
them through the execution. Add the keys to all outputs (tasks and
copies).

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
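
For illustration, a sketch of how such tracking names might be
generated; the helper and the format are hypothetical, not the commit's
actual code:

```c
#include <inttypes.h>
#include <stdio.h>

/* Build a human-readable name for a temporary, embedding the data key
 * so it can be matched against the keys printed for tasks and copies. */
static void example_name_temporary(char *name, size_t len,
                                   const char *task_name, uint64_t key)
{
    snprintf(name, len, "tmp(%s, key=%" PRIu64 ")", task_name, key);
}
```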
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
…copy if we are passed in a GPU copy, and we need to retain/release the
copies that we are swapping
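
To illustrate the retain/release discipline mentioned here, a minimal
sketch using PaRSEC's object refcounting macros; the helper and the
slot-based layout are hypothetical:

```c
#include "parsec/data.h"

/* Replace the data copy held in *slot with gpu_copy, keeping the
 * reference counts balanced: retain the incoming copy, release the
 * one being swapped out. */
static void example_swap_copy(parsec_data_copy_t **slot,
                              parsec_data_copy_t *gpu_copy)
{
    parsec_data_copy_t *old = *slot;
    PARSEC_OBJ_RETAIN(gpu_copy);    /* the slot now owns a reference */
    *slot = gpu_copy;
    if (NULL != old)
        PARSEC_OBJ_RELEASE(old);    /* drop the replaced reference */
}
```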
@abouteiller force-pushed the topic/cuda_aware_communications branch from eb5c782 to 3e0cb38 on October 31, 2024 14:53
…ut-only flows, for which checking if they are control flows segfaults