[SYCL][DOC] Add draft of sycl_ext_oneapi_async_memcpy #9439

dkhaldi · 2023-05-12T17:29:35Z

No description provided.

Pennycook · 2023-05-12T18:10:58Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+`sycl::ext::oneapi::experimental::async_memcpy` is a free function
+that  asynchronously copies a number of elements specified by
+`num_elements` of data of type `T` from the source pointer `src` to


I think it would be a good idea to align with SYCL 2020's USM copy functions here. For USM, memcpy accepts void* and a numBytes, whereas copy accepts T* and a count. The function you've defined here is called memcpy but accepts T*.

My recommendation would be to define both async_memcpy and async_copy.

Pennycook · 2023-05-12T18:18:09Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+`syclex::barrier` as an argument that can be used to wait on the
+completion of the memory copy.
+
+Permitted types for `T` are all scalar and vector types.


I suspect that we may need some more restrictions here, or at least some non-normative notes to highlight potentially surprising behavior.

It's unsafe to use memcpy if T isn't trivially copyable. This should probably be listed as a precondition of memcpy_async as well.

If T has a non-trivial copy constructor, developers calling async_copy would expect the copy constructor to execute. That would (probably) prevent an implementation from using any hardware-acceleration for asynchronous copies. I don't know whether it's better to limit T to trivially copyable types, or to add a non-normative note that many implementations will probably not execute asynchronously if T is not trivially copyable.

I'm leaning towards accepting all T and adding a note, because asynchrony already isn't guaranteed.

I thought the term "scalar type" did not include classes. Therefore, aren't all scalar types trivially copyable? See cppreference definition of Scalar Type. Also see the SYCL definition of Scalar data types.

I don't think we should limit this to scalar and vector. One can create a class and allocate it in local memory/registers where the hardware can use such accelerated async_memcpy for such class.
I will add a precondition that T must be a trivially copyable type.

Pennycook · 2023-05-12T18:19:16Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+  access::decorated SrcIsDecorated, sycl::memory_scope Scope>
+  void async_memcpy(multi_ptr<T, DestSpace, DestIsDecorated> dest,
+  multi_ptr<T, SrcSpace, SrcIsDecorated> src, size_t numElements,
+  syclex::barrier<Scope> bar);


syclex::barrier has two template arguments. Was it your intention to limit this to barriers with the default CompletionFunction?

No, I will add CompletionFunction to the arguments

Pennycook · 2023-05-12T18:19:36Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+  access::decorated SrcIsDecorated, sycl::memory_scope Scope>
+  void async_memcpy(multi_ptr<T, DestSpace, DestIsDecorated> dest,
+  multi_ptr<T, SrcSpace, SrcIsDecorated> src, size_t numElements,
+  syclex::barrier<Scope> bar);


I don't think syclex is defined anywhere here. Since you're in the namespace already, you can probably just remove it.

Pennycook · 2023-05-12T18:20:16Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+  template <typename T,  access::address_space DestSpace,
+  access::decorated DestIsDecorated, access::address_space SrcSpace,
+  access::decorated SrcIsDecorated, sycl::memory_scope Scope>
+  void async_memcpy(multi_ptr<T, DestSpace, DestIsDecorated> dest,


Why limit to multi_ptr here? Should we allow generic T* as well?

I use multi_ptr to be able to carry around the address space info.
If USM is used, multi_ptr can be constructed using address_space_cast
Can the implementation of async_copy retrieve address space (local vs global) if we only pass generic T*?

It depends. T* is allowed to match both decorated pointers (which have the address space attached) and raw pointers (which don't have the address space attached).

If a function is passed a decorated pointer, then the implementation knows the address space (just as if it was passed a multi_ptr). If it's passed a raw pointer, then it can use address_space_cast to determine what the address space is using a runtime check (which may introduce some overhead).

I'm hopeful that in cases where the compiler actually does know the address space, but it's represented as a raw pointer, the runtime check can be optimized away. But I don't think that is implemented right now.

gmlueck · 2023-05-18T13:43:16Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+`syclex::barrier` as an argument that can be used to wait on the
+completion of the memory copy.
+
+Permitted types for `T` are all scalar and vector types.


I thought the term "scalar type" did not include classes. Therefore, aren't all scalar types trivially copyable? See cppreference definition of Scalar Type. Also see the SYCL definition of Scalar data types.

gmlueck · 2023-05-18T13:44:43Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+  access::decorated IsDecorated, sycl::memory_scope Scope>
+  void group_async_memcpy(Group g, multi_ptr<T, Space, IsDecorated>
+  dest, multi_ptr<T, Space, IsDecorated> src, size_t numElements,
+  syclex::barrier<Scope> bar);


The barrier class has a deleted copy constructor, so I think we need to pass it by reference.

gmlueck · 2023-05-18T13:48:49Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+  // Allocate memory for and construct the barrier
+  auto* bar = sycl::ext::oneapi::group_local_memory<wg_barrier>(it.get_group(), nthreads);
+
+  async_memcpy(pdest, psrc, N, bar);


I don't understand how the barrier works here. Does the implementation call arrive on the barrier when the copy is complete? If that is the case, it seems like the call to arrive_and_wait below should be just wait. If that is not how it works, then I'm confused about something here.

I am using the barrier defined here: https://github.com/Pennycook/llvm/blob/cc7eaf559699a759c9cde1586e3113f9c1479bda/sycl/doc/extensions/proposed/sycl_ext_oneapi_barrier.asciidoc
The WIs here call arrive, not the implementation. Then, wait to block until the predetermined number of work items have called arrive. When same number of WIs call arrive and wait at the same point of code, token is not needed and arrive_and_wait can be used.

gmlueck · 2023-05-18T13:56:25Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memcpy.asciidoc

+  // Allocate memory for and construct the barrier
+  auto* bar = sycl::ext::oneapi::group_local_memory<wg_barrier>(it.get_group(), nthreads);
+
+  group_async_memcpy(it.get_group(), pdest, psrc, N, bar);


Is the idea that all work-items in the group jointly copy from psrc to pdest? If so, there should be a description someplace saying that all work-items must pass the same value for pdest, psrc, and N. I presume all work-items should also pass the same value for bar?

In real applications you often need an interface whereby each thread is passed a different pointer:
imagine you are dealing with some complicated strided/and or with offset data that you wish to copy. All of these existing interfaces in this spec doc assume each thread is passed the same start address, and then (although as you mention doc doesn't state yet) the implication is that the impl assigns group items different addresses wrt the start address to copy, up to N items.
This is fine for certain applications, but if you want to support all real applications you at least need an additional interface (perhaps by e.g. introducing a single item/thread "group"), where each thread can be passed a specific pointer by the user.

The group variants adds convergence constraints. I will add description that same value of pdest, psrc, N, bar must be passed.
If these are different as @JackAKirk is suggesting, I think the non-group variants should be used.

…s and T

github-actions · 2024-09-15T02:02:35Z

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

github-actions · 2024-10-16T02:00:35Z

This pull request was closed because it has been stalled for 30 days with no activity.

[SYCL][DOC] Add draft of sycl_ext_oneapi_async_memcpy

ef57094

dkhaldi requested a review from a team as a code owner May 12, 2023 17:29

dkhaldi requested review from Pennycook and jbrodman May 12, 2023 17:29

Pennycook requested changes May 12, 2023

View reviewed changes

gmlueck reviewed May 18, 2023

View reviewed changes

add async_copy, use generic T, clarifications about the group variant…

4d3dd51

…s and T

github-actions bot added the Stale label Sep 15, 2024

github-actions bot closed this Oct 16, 2024

dkhaldi reopened this Dec 10, 2024

github-actions bot removed the Stale label Dec 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL][DOC] Add draft of sycl_ext_oneapi_async_memcpy #9439

[SYCL][DOC] Add draft of sycl_ext_oneapi_async_memcpy #9439

dkhaldi commented May 12, 2023

Pennycook May 12, 2023

Pennycook May 12, 2023

gmlueck May 18, 2023

dkhaldi Aug 24, 2023

Pennycook May 12, 2023

dkhaldi Aug 24, 2023

Pennycook May 12, 2023

Pennycook May 12, 2023

dkhaldi Aug 24, 2023

Pennycook Aug 24, 2023

gmlueck May 18, 2023

gmlueck May 18, 2023

gmlueck May 18, 2023

dkhaldi Aug 25, 2023

gmlueck May 18, 2023

JackAKirk Aug 16, 2023 •

edited

Loading

dkhaldi Aug 25, 2023

github-actions bot commented Sep 15, 2024

github-actions bot commented Oct 16, 2024

[SYCL][DOC] Add draft of sycl_ext_oneapi_async_memcpy #9439

Are you sure you want to change the base?

[SYCL][DOC] Add draft of sycl_ext_oneapi_async_memcpy #9439

Conversation

dkhaldi commented May 12, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JackAKirk Aug 16, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Sep 15, 2024

github-actions bot commented Oct 16, 2024

JackAKirk Aug 16, 2023 •

edited

Loading