-
Notifications
You must be signed in to change notification settings - Fork 794
[Draft] Host task extensions #12921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Host task extensions #12921
Conversation
// Using this function removes the requirement that a host task callable must | ||
// synchronize with any asynchronous operations from within the callable. | ||
template <backend Backend = backend::opencl> | ||
void add_native_events(backend_return_t<Backend, event> NativeEvents) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This member function is named ext_oneapi_add_native_events
in the extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove the default backend so the user must always specify which backend they are targeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea RE default backend. Will change
// Using this function removes the requirement that a host task callable must | ||
// synchronize with any asynchronous operations from within the callable. | ||
template <backend Backend = backend::opencl> | ||
void add_native_events(backend_return_t<Backend, event> NativeEvents) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also have it in the extension to use backend_return_t
here, but I wonder if this should be backend_input_t
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting thought. If we have backend_input_t
then maybe this entry point should take a std::vector<backend_input_t<...>>
// Gets all the native events that the host task depends on, and that are | ||
// still active | ||
template <backend Backend = backend::opencl> | ||
backend_return_t<Backend, event> get_native_events() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this as returning a std::vector<backend_return_t<Backend, event>>
since there may be more than one SYCL event
so more than one sets of native events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes maybe this is a good idea. The problem is that backend_return_t
is already a std::vector
for many backends. Allocating a vector of vectors would be unnecessarily bad in terms of perf overhead. So maybe a std::vector<backend_input_t<...>>
is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want a vector here regardless of whether backend_return_t
already is a vector or not. Using backend_input_t
in a return type feels wrong, although I think it would work just fine. A third option would be to use some template magic to make a vector out of backend_return_t
if it is not already a vector.
Able to pass native events out of the host task successfully in a way that can be retrieved with get_native(event)
Able to call event.wait() and queue.wait() but cannot use plugin dependencies directly
- Remove setPlugin method, use setContextImpl instead in event_impl - Change from using a std vector of PiEvents to EventImplPtrs for native events
If I submit a normal kernel after my host task with native events, then the call to the plugin will contain an array of the native events submitted from within the add_native_events call. So far event.wait is not working in the same way.
setPlugin was accidentally deleted. This reinstates it. getNativeVector wasn't working. This fixes it so that get_native(event) works. This also updates test so that the wait tests are disabled as they are currently not working
If we call event::wait or queue::wait we need to make sure that the native events are waited on as well. The usual path just waits on the lambda and then uses the native events in submitting the next enqueued commands.
Added an entry point for get_native_events, which gets the native events that a HT is dependent on. Tests are passing. When using a parallel_for and then a HT which depends on the PF, the SYCL RT still waits for the PF native event to be entirely complete before dispatching the HT. This is a problem with the model of synchronization in general with host tasks. A new synchronization guarantee is needed in order to get around this. Perhaps there is an opportunity to reinstate the interop task for this reason. A solution would be if the interop task requires CGs that it depends on to be enqueued before the HT is dispatched (so there are native events). It would still require HT lambdas to be completed before a new HT can be enqueued, so that add_native_events has populated the SYCL RT with any events it might need. Other changes: - Move existing add_native_events to new dir. Test failing for accessor interop due to accessor interop being broken for CUDA atm. Refactor some stuff with a helper header. - Change the void * to pi_native_handle for the interop_handle RT entry points.
Make sure that the size of the vector of native events returned by get_native_events is greater than zero.
Add a new property for manual interop sync and make a host_task, HostTask specialization taking a property list.
A host task without native events should not have getNativeVector called
Don't call Plugin if RawEvents is empty
Only wait for the RawEvents if we submit the host task with manual_interop_sync property
enqueueing commands from the host task thread causes a race condition for very quickly executing lambdas. This is because commands are enqueued both from the host thread, and then also from the HT thread if the command is blocked by the lambda executing on HT thread. If the command blocked by HT A starts and finishes executing and is cleaned up by the main thread before thread A enqueues the new command then the new enqueue will be using deallocated data. This could cause some non determinism or segfaulting. waitForEvents wait for native events - Wait for native HT events if waitForEvents is called. - Only return native events if they haven't been waited on - Fix bug in get_native for event which isn't host task - Make native input types single CUevent/HIPevents, not vectors. - Update tests with using structs as well as not requiring that get_native_events returns an empty vec if SYCL RT synchronization is used. Run general manual interop sync test for all backends Fix warning for comparing int and size_t Reorganize tests
Draft implementation adding a few HT extensions:
Add native events
add_native_events
allows the application to inform the SYCL runtime of asynchronous tasks submitted within a host_task function that the SYCL runtime must manage once the host_task function finishes execution. This extension allows for host_task functions to execute without needing synchronization points within the host_task function before returning.Manual Interop Sync
manual_interop_sync
is a property that can be passed to handler::host_task. The property instructs the SYCL runtime not to synchronize with all of the native events that the host_task command depends on at host_task submission time.Get Native Events
get_native_events
is a member function of interop handle. Since SYCL events have a one-to-one or one-to-many mapping to native events, such as
CUevents
, this function allows the application to access the native events associated with the events that the command depends on. In this way, the application can handle asynchronous dependencies by using native events in the backend API within a host_task function, instead of handling dependencies by synchronizing on host at submission of the host_task command. This member function is designed to be used with themanual_interop_sync
property.Documentation and more tests to follow.
@AerialMantis @t4c1 @npmiller