Delegate GPU task completion to a co-manager #509

Open
wants to merge 2 commits into base: master
Conversation

josephjohnjj
Contributor

Delegate GPU task completion to a co-manager using the MCA parameter device_cuda_delegate_task_completion.

  1. The second CPU thread that submits a task to the GPU device is transitioned into a co-manager.
  2. If no co-manager has been set yet, the manager completes the task itself.
  3. Otherwise, the manager pushes the task to be completed onto a co-manager-specific queue.
  4. The GPU task is freed by the thread (manager or co-manager) that completes it.

complete_mutex - protects the count of tasks pending completion by the co-manager
to_complete - list of tasks to be completed by the co-manager
co_manager_mutex - ensures that there is only one co-manager per device
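A minimal sketch of the delegation flow and synchronization primitives described above. The struct and helper names (`gpu_device_t`, `gpu_task_t`, `complete_and_free()`) are illustrative assumptions, not the PR's actual code:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct gpu_task_s {
    struct gpu_task_s *next;
    /* ... payload, completion callback, etc. ... */
} gpu_task_t;

typedef struct gpu_device_s {
    pthread_mutex_t co_manager_mutex; /* ensures only one co-manager per device */
    pthread_mutex_t complete_mutex;   /* protects to_complete and co_manager_set */
    gpu_task_t     *to_complete;      /* tasks queued for the co-manager */
    bool            co_manager_set;   /* has a second thread become co-manager? */
} gpu_device_t;

/* Hypothetical helper: run the task's completion actions, then release it. */
static void complete_and_free(gpu_task_t *task)
{
    /* ... trigger successors, release resources ... */
    free(task);
}

/* Manager side: complete the task itself, or delegate it to the co-manager. */
static void manager_complete_or_delegate(gpu_device_t *dev, gpu_task_t *task)
{
    pthread_mutex_lock(&dev->complete_mutex);
    if (!dev->co_manager_set) {
        /* No co-manager yet: the manager completes and frees the task. */
        pthread_mutex_unlock(&dev->complete_mutex);
        complete_and_free(task);
        return;
    }
    /* Otherwise push the task onto the co-manager-specific queue. */
    task->next = dev->to_complete;
    dev->to_complete = task;
    pthread_mutex_unlock(&dev->complete_mutex);
}

/* Second submitting thread: try to become the device's co-manager. */
static bool try_become_co_manager(gpu_device_t *dev)
{
    if (0 != pthread_mutex_trylock(&dev->co_manager_mutex))
        return false;                 /* another thread already holds the role */
    pthread_mutex_lock(&dev->complete_mutex);
    dev->co_manager_set = true;
    pthread_mutex_unlock(&dev->complete_mutex);
    return true;                      /* caller now drains dev->to_complete */
}
```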
@devreal
Contributor

devreal commented Nov 1, 2024

What is the status of this? Any reason for not taking this in?

@josephjohnjj Could you please rebase your branch?

@josephjohnjj
Contributor Author

josephjohnjj commented Nov 1, 2024

@devreal There was no performance improvement when using a co-manager to complete the task. @bosilca suggested that this might be because a single task completion does not generate enough child tasks to make a noticeable impact. Unlike #566, in #509 the co-manager only completed tasks and was not involved in task execution.

I'm doubtful that rebasing the code would be helpful at this stage. In my codebase, all task offloading to the GPU occurs in parsec_cuda_kernel_scheduler() within parsec/mca/device/cuda/device_cuda_module.c.

In the current codebase, this has been moved to parsec_device_kernel_scheduler() in parsec/parsec/mca/device/cuda/device_cuda_module.c.

I can implement the same in the current codebase if having a co-manager would be helpful. Also, the co-manager was controlled by an MCA parameter, so in the extreme case where there are just 2 cores we could choose not to use the co-manager.
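For illustration, a minimal sketch of how the co-manager could be gated on the MCA parameter plus a core-count check; the environment-variable spelling and the 2-core threshold are assumptions here, not the PR's actual logic or PaRSEC's real MCA lookup API:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Decide whether to enable the co-manager for a device. */
static bool use_co_manager(void)
{
    /* Assumed environment spelling of the MCA parameter; PaRSEC's actual
     * MCA registration/lookup mechanism may differ. */
    const char *v = getenv("PARSEC_MCA_device_cuda_delegate_task_completion");
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);

    if (NULL == v || 0 == atoi(v))
        return false;   /* delegation not requested */
    if (ncores <= 2)
        return false;   /* too few cores: keep a single manager thread */
    return true;
}
```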
