Delegate GPU task completion to a co-manager #509

Open
wants to merge 2 commits into base: master
Conversation

josephjohnjj
Contributor

Delegate GPU task completion to a co-manager using the MCA parameter device_cuda_delegate_task_completion.

  1. The second CPU thread that submits a task to the GPU device is transitioned into a co-manager.
  2. If no co-manager has been set yet, the manager completes the task itself.
  3. Otherwise, the manager pushes the task to be completed onto a co-manager-specific queue.
  4. The GPU task is freed by the thread (manager or co-manager) that completes it.

complete_mutex - protects the count of tasks pending completion by the co-manager
to_complete - list of tasks to be completed by the co-manager
co_manager_mutex - ensures that there is only one co-manager per device
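A minimal sketch of the delegation flow and synchronization primitives described above. The struct and helper names (`gpu_device_t`, `gpu_task_t`, `complete_and_free()`) are illustrative assumptions, not the PR's actual code:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct gpu_task_s {
    struct gpu_task_s *next;
    /* ... payload, completion callback, etc. ... */
} gpu_task_t;

typedef struct gpu_device_s {
    pthread_mutex_t co_manager_mutex; /* ensures only one co-manager per device */
    pthread_mutex_t complete_mutex;   /* protects to_complete and co_manager_set */
    gpu_task_t     *to_complete;      /* tasks queued for the co-manager */
    bool            co_manager_set;   /* has a second thread become co-manager? */
} gpu_device_t;

/* Hypothetical helper: run the task's completion actions, then release it. */
static void complete_and_free(gpu_task_t *task)
{
    /* ... trigger successors, release resources ... */
    free(task);
}

/* Manager side: complete the task itself, or delegate it to the co-manager. */
static void manager_complete_or_delegate(gpu_device_t *dev, gpu_task_t *task)
{
    pthread_mutex_lock(&dev->complete_mutex);
    if (!dev->co_manager_set) {
        /* No co-manager yet: the manager completes and frees the task. */
        pthread_mutex_unlock(&dev->complete_mutex);
        complete_and_free(task);
        return;
    }
    /* Otherwise push the task onto the co-manager-specific queue. */
    task->next = dev->to_complete;
    dev->to_complete = task;
    pthread_mutex_unlock(&dev->complete_mutex);
}

/* Second submitting thread: try to become the device's co-manager. */
static bool try_become_co_manager(gpu_device_t *dev)
{
    if (0 != pthread_mutex_trylock(&dev->co_manager_mutex))
        return false;                 /* another thread already holds the role */
    pthread_mutex_lock(&dev->complete_mutex);
    dev->co_manager_set = true;
    pthread_mutex_unlock(&dev->complete_mutex);
    return true;                      /* caller now drains dev->to_complete */
}
```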
@devreal
Contributor

devreal commented Nov 1, 2024

What is the status of this? Any reason for not taking this in?

@josephjohnjj Could you please rebase your branch?

@josephjohnjj
Contributor Author

josephjohnjj commented Nov 1, 2024

@devreal There was no performance improvement when using a co-manager to complete the task. @bosilca suggested that this might be because a single task completion does not generate enough child tasks to make a noticeable impact. Unlike #566, in #509 the co-manager only completed tasks and was not involved in task execution.

I'm doubtful that rebasing the code would be helpful at this stage. In my codebase, all task offloading to the GPU occurs in parsec_cuda_kernel_scheduler() within parsec/mca/device/cuda/device_cuda_module.c.

In the current codebase, this has been moved to parsec_device_kernel_scheduler() in parsec/parsec/mca/device/cuda/device_cuda_module.c.

I can implement the same in the current codebase if having a co-manager would be helpful. Also, the co-manager was controlled by an MCA parameter, so in the extreme case where there are just 2 cores we could choose not to use the co-manager.
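For illustration, a minimal sketch of how the co-manager could be gated on the MCA parameter plus a core-count check; the environment-variable spelling and the 2-core threshold are assumptions here, not the PR's actual logic or PaRSEC's real MCA lookup API:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Decide whether to enable the co-manager for a device. */
static bool use_co_manager(void)
{
    /* Assumed environment spelling of the MCA parameter; PaRSEC's actual
     * MCA registration/lookup mechanism may differ. */
    const char *v = getenv("PARSEC_MCA_device_cuda_delegate_task_completion");
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);

    if (NULL == v || 0 == atoi(v))
        return false;   /* delegation not requested */
    if (ncores <= 2)
        return false;   /* too few cores: keep a single manager thread */
    return true;
}
```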
