Skip to content

[Draft] Host task extensions #12921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from
Closed

[Draft] Host task extensions #12921

wants to merge 14 commits into from

Conversation

hdelan
Copy link
Contributor

@hdelan hdelan commented Mar 6, 2024

Draft implementation adding a few HT extensions:

interop_handle::add_native_events
interop_handle::get_native_events
sycl::ext::codeplay::experimental::property::host_task::manual_interop_sync

Add native events

add_native_events allows the application to inform the SYCL runtime of asynchronous tasks submitted within a host_task function that the SYCL runtime must manage once the host_task function finishes execution. This extension allows for host_task functions to execute without needing synchronization points within the host_task function before returning.

Manual Interop Sync

manual_interop_sync is a property that can be passed to handler::host_task. The property instructs the SYCL runtime not to synchronize with all of the native events that the host_task command depends on at host_task submission time.

Get Native Events

get_native_events is a member function of interop handle. Since SYCL events have a one-to-one or one-to-
many mapping to native events, such as CUevents, this function allows the application to access the native events associated with the events that the command depends on. In this way, the application can handle asynchronous dependencies by using native events in the backend API within a host_task function, instead of handling dependencies by synchronizing on host at submission of the host_task command. This member function is designed to be used with the manual_interop_sync property.

Documentation and more tests to follow.

@AerialMantis @t4c1 @npmiller

@hdelan hdelan requested a review from a team as a code owner March 6, 2024 12:14
@hdelan hdelan requested a review from bso-intel March 6, 2024 12:14
// Using this function removes the requirement that a host task callable must
// synchronize with any asynchronous operations from within the callable.
template <backend Backend = backend::opencl>
void add_native_events(backend_return_t<Backend, event> NativeEvents) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member function is named ext_oneapi_add_native_events in the extension.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove the default backend so the user must always specify which backend they are targeting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea RE default backend. Will change

// Using this function removes the requirement that a host task callable must
// synchronize with any asynchronous operations from within the callable.
template <backend Backend = backend::opencl>
void add_native_events(backend_return_t<Backend, event> NativeEvents) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have it in the extension to use backend_return_t here, but I wonder if this should be backend_input_t instead.

Copy link
Contributor Author

@hdelan hdelan May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thought. If we have backend_input_t then maybe this entry point should take a std::vector<backend_input_t<...>>

// Gets all the native events that the host task depends on, and that are
// still active
template <backend Backend = backend::opencl>
backend_return_t<Backend, event> get_native_events() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this as returning a std::vector<backend_return_t<Backend, event>> since there may be more than one SYCL event so more than one sets of native events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes maybe this is a good idea. The problem is that backend_return_t is already a std::vector for many backends. Allocating a vector of vectors would be unnecessarily bad in terms of perf overhead. So maybe a std::vector<backend_input_t<...>> is better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want a vector here regardless of whether backend_return_t already is a vector or not. Using backend_input_t in a return type feels wrong, although I think it would work just fine. A third option would be to use some template magic to make a vector out of backend_return_t if it is not already a vector.

Hugh Delaney added 12 commits June 25, 2024 10:13
Able to pass native events out of the host task successfully
in a way that can be retrieved with get_native(event)
Able to call event.wait() and queue.wait() but cannot use plugin
dependencies directly
- Remove setPlugin method, use setContextImpl instead in event_impl
- Change from using a std vector of PiEvents to EventImplPtrs for
  native events
If I submit a normal kernel after my host task with
native events, then the call to the plugin will contain
an array of the native events submitted from within the
add_native_events call. So far event.wait is not working
in the same way.
setPlugin was accidentally deleted. This reinstates it.
getNativeVector wasn't working. This fixes it so that
get_native(event) works. This also updates test so that
the wait tests are disabled as they are currently not
working
If we call event::wait or queue::wait we need to make sure that the
native events are waited on as well. The usual path just waits on
the lambda and then uses the native events in submitting the next
enqueued commands.
Added an entry point for get_native_events, which gets the native events
that a HT is dependent on. Tests are passing. When using a parallel_for
and then a HT which depends on the PF, the SYCL RT still waits for the
PF native event to be entirely complete before dispatching the HT. This
is a problem with the model of synchronization in general with host
tasks. A new synchronization guarantee is needed in order to get around
this. Perhaps there is an opportunity to reinstate the interop task for
this reason. A solution would be if the interop task requires CGs that
it depends on to be enqueued before the HT is dispatched (so there are
native events). It would still require HT lambdas to be completed before
a new HT can be enqueued, so that add_native_events has populated the
SYCL RT with any events it might need.

Other changes:
- Move existing add_native_events to new dir. Test failing for accessor
  interop due to accessor interop being broken for CUDA atm. Refactor
  some stuff with a helper header.
- Change the void * to pi_native_handle for the interop_handle RT
  entry points.
Make sure that the size of the vector of native events returned
by get_native_events is greater than zero.
Add a new property for manual interop sync and make a host_task, HostTask specialization
taking a property list.
A host task without native events should not have getNativeVector called
Don't call Plugin if RawEvents is empty
Only wait for the RawEvents if we submit the host task
with manual_interop_sync property
enqueueing commands from the host task thread causes a race condition for very
quickly executing lambdas. This is because commands are enqueued both from
the host thread, and then also from the HT thread if the command is blocked
by the lambda executing on HT thread.

If the command blocked by HT A starts and finishes executing and is cleaned up
by the main thread before thread A enqueues the new command then the new
enqueue will be using deallocated data. This could cause some non determinism
or segfaulting.

waitForEvents wait for native events

- Wait for native HT events if waitForEvents is called.
- Only return native events if they haven't been waited on
- Fix bug in get_native for event which isn't host task
- Make native input types single CUevent/HIPevents, not vectors.
- Update tests with using structs as well as not requiring that
  get_native_events returns an empty vec if SYCL RT synchronization is
  used.

Run general manual interop sync test for all backends

Fix warning for comparing int and size_t

Reorganize tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants