-
Notifications
You must be signed in to change notification settings - Fork 754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Add free function kernel extension spec #11861
Conversation
Add a proposed extension spec that allows a kernel to be defined as a plain function, where the function arguments are the kernel arguments. This is useful for the sycl_ext_oneapi_kernel_compiler because it allows a kernel to be defined in isolation from its host code.
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
Add some more notes about decomposed kernel arguments and kernel arguments that are optimized away.
Add a section describing requirements when a raw kernel is defined as a function template.
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
|
||
In a standard SYCL kernel, the iteration ID is passed as a parameter to the | ||
kernel's callable object. | ||
However, this is not the case for a raw kernel because the function parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I would prefer if the item/nd_item is the first argument followed by the other function parameters.
This would also remove the need for the range/nd_range/single_task_kernel property. Because this could be inferred from the first argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we still need to decorate the function somehow, though? I don't think we want to turn every function which accepts a sycl::nd_item
into a kernel -- it might just be a device function that's called from a kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we can't use the first argument to determine whether it is a kernel. Only the type of kernel (nd/range/single_task). In the same TU we can turn every function which gets passed to get_kernel into a kernel. Cross TU we do need a decoration. But then the user only needs to provide the decoration cross TU (like SYCL_EXTERNAL) and the FE doesn't need to understand multiple decorations.
If we support kernel function pointers cross TU, we should also support callable types cross TU. And those would need the same decoration.
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc
Outdated
Show resolved
Hide resolved
Rename this feature to "free function kernels" instead of "raw kernel".
Abandon the idea of identifying free function kernels by a type-name. Instead, we use a non-type template parameter whose value is the address of the function.
Use special macros to decorate free function kernels instead of compile time properties.
Clarify that the kernel arguments are in the private address space. This is consistent with arguments to device functions called within the kernel. It is also consistent with OpenCL C, which specifies that the kernel arguments are in the private address space.
Attempting to enqueue a free function kernel using a mechanism that does not | ||
match its type results in undefined behavior. | ||
Attempting to enqueue a free function kernel with a `range` or `nd_range` whose | ||
dimensionality does not match the free function kernel definition results in | ||
undefined behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid UB by having just one macro with no parameters (e.g. SYCL_EXT_ONEAPI_KERNEL
)?
The spec doesn't say why we need 3 different macros, which must match kernel invocation function to avoid UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that by having three different macros we can provide better error messages. For example, we can emit a compile-time error if a kernel decorated with SYCL_EXT_ONEAPI_RANGE_KERNEL
calls the function sycl::ext::oneapi::this_kernel::get_nd_item
. We can also throw an exception if a kernel decorated with SYCL_EXT_ONEAPI_RANGE_KERNEL
is enqueued as an nd-range kernel.
Note that we can diagnose these errors even if the spec says the behavior is UB. We always have the option to diagnose UB with a helpful error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, we can emit a compile-time error if a kernel decorated with
SYCL_EXT_ONEAPI_RANGE_KERNEL
calls the functionsycl::ext::oneapi::this_kernel::get_nd_item
.
Is this a situation where we need to emit an error? I would expect any SYCL kernel to work for any combination of kernel invocation API and kernel identifiers free functions. nd_item is a wrapper around kernel identifiers, which are available to any kernel regardless of parallel_for overload. Current SYCL API limitations on lambda call operator parameter type are artificial. They are useful for code optimization: compiler has to allocate and initialize id/item/nd_item lambda argument. In this case, it's created by user via free function, so we don't need to optimize it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think an error would be helpful. The specification for the free function queries says their behavior is undefined if they are called from a kernel that is launched in an incompatible way:
It is the user’s responsibility to ensure that these functions are called in a manner that is compatible with the kernel’s launch parameters, as detailed in the definition of each function. Calling these functions from an incompatible kernel results in undefined behavior.
If we don't say it is UB, then we would need to clarify the spec to explain what is returned in the mismatched cases. For example, what is the work-group size for a simple "range" parallel-for? This seems worse than saying the behavior is UB and diagnosing an error when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't say it is UB, then we would need to clarify the spec to explain what is returned in the mismatched cases.
There are no mismatches cases because there is no SYCL kernel argument we have to "match" (or to be compatible with). This argument exists only for call operator of functional object, but this extension defines alternative API, which we don't need to "match".
For example, what is the work-group size for a simple "range" parallel-for?
I expect it to be defined by the implementation. AFAIK, all supported backends provide access to work-group size on device side.
On the host side, when not set explicitly OpenCL back-end chooses the work-group size automatically. Other backends has API to query recommended work-group size, which can be set explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no mismatches cases because there is no SYCL kernel argument we have to "match"
By "mismatched" I meant a mismatch between the way the kernel is launched and the queries the kernel makes. For example, launching the kernel as a simple "range" kernel and calling this_kernel::get_nd_item
.
I expect it to be defined by the implementation. AFAIK, all supported backends provide access to work-group size on device side.
I really feel this would open a can of worms that would make our life difficult in the future. For example, what if the application launches the kernel as a simple "range" and then uses this_kernel::get_nd_item
to get a group
object that it uses to call one of the group functions defined in section 4.17.2 or 4.17.3? I really don't think we want to say this is supported. If we were required to support this, I think it would limit our implementation choices in the future.
I also think this presents a confusing story to our users. @Pennycook has been clarifying the SYCL spec to say that simple "range" kernels have no concept of work-group. Therefore, it does not make sense to allow "range" kernels to call this_kernel::get_nd_item
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, I don't see much value in "range" kernels. According to my understanding, this mode limits access to low-level HW features for two reasons:
- Simpler interface.
- In some cases, it might be more efficient to create
item
object instead ofnd_item
.
Free functions remove the second item.
Regarding the first one, this extension makes it even more difficult to use. This proposal requires users to make use of 3 different entities consistent: kernel function declaration, kernel function invocation and index space identifiers. Current spec requires only two and I've seen cases of inconsistent usage.
From my POV, "range" kernel benefits do not justify adding more UB and usage (and implementation) complexity of this proposal.
All low-level APIs we support today have work-group concept, so all limitations enforced by "range" (or even "single") kernels are artificial. We can always create valid meaningful nd_item
regardless of kernel invocation API. We add compiler and runtime library support to diagnose inconsistent API uses for very low ROI.
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Outdated
Show resolved
Hide resolved
In intel#11861 we decided to remove the ability to get the `kernel` object for a free-function kernel using a type-name. Therefore, we don't need this ability in online-compiled kernels either. Remove the associated wording from the kernel compiler specs.
Address a code review comment about wording regarding the address space of kernel arguments.
In #11861 we decided to remove the ability to get the `kernel` object for a free-function kernel using a type-name. Therefore, we don't need this ability in online-compiled kernels either. Remove the associated wording from the kernel compiler specs.
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Outdated
Show resolved
Hide resolved
Regarding "decomposition". Shortly after I added support for array kernel parameters a couple of years ago the clang front-end team decided to decompose all non-POD kernel arguments. That includes all structs and arrays passed by value. SYCL accessors are just an instance of a struct type. So any decision made for accessors is really a decision about all non-POD types. |
Are you sure? I just tried an example with an argument defined like this:
And it seems like it was not decomposed. In fact, I remember @elizabethandrews telling me that she changed the front-end to stop decomposing struct arguments when a member is a pointer. At this point, I thought we only decomposed a kernel argument if it is a "special" type (e.g. accessor or |
My understanding of decomposition was outdated. Yes, it looks like only certain SYCL types are decomposed. |
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
What was the thinking behind the macros SYCL_EXT_ONEAPI_RANGE_KERNEL, SYCL_EXT_ONEAPI_ND_RANGE_KERNEL, or SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL and the kernel function traits? The macros have somehow to be implemented to declare the traits? |
We decided that it is OK for the front end to recognize some of the compile time properties. Therefore, we can identify a function as a kernel by decorating it with a property, rather than using a special macro.
Ok, using properties looks better than the macros. |
Somehow we forgot to add this `#include` to the examples.
Attention reviewers: I do not plan any more changes to this PR and will ask for it to be merged soon. If you have more comments, please make them soon. |
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
The macro SYCL_EXT_ONEAPI_FUNCTION_PROPERTY is defined in properties.hpp as adding some properties to the sycl-device-has attribute. Is that how the properties range_kernel, etc, are to be found in clang front-end processing? Through the sycl-device-has attribute?? |
No, the |
I find it odd that the macro SYCL_EXT_ONEAPI_FUNCTION_PROPERTY is defined in a header file with no indication that it is to be treated as an example. And then this PR specifically mentions use of that macro to define the new kernel properties range_kernel, nd_range_kernel and single_task_kernel, when simply using that macro does not achieve the desired effect. |
Sorry, I forgot to include a hyperlink in my previous response. I meant to say:
(The word "here" was supposed to be a link to the example I was talking about.) You can see that the macro is described to be a general way to attach properties to device functions:
I'm not sure why the header file hard-codes the macro to work only for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extension looks great, I've left some comments, mostly clarifications.
Though, when reading through this I thought about a potentially different approach to declaring free function kernels (perhaps this has already been considered), whereby a free function kernel is identified instead by the first parameter to the function. We could introduce a new class template such as sycl::this_kernel
which would be templated on the invocation type (range_kernel
, nd_range_kernel
or single_task_kernel
) and compile-time properties.
class single_task_kernel;
template <size_t Dims>
class range_kernel;
template <size_t Dims>
class nd_range_kernel;
template <typename KernelInvocation , typename... Properties>
class this_kernel;
So a free function kernel would be declared as follows:
void iota(syclex::this_kernel<syclex::range_kernel<1>, syclex::work_group_size<32>> this,
float start, float *ptr) {
// Get the ID of this kernel iteration.
size_t id = this.get_id();
ptr[id] = start + static_cast<float>(id);
}
I believe this could be implemented by a device compiler by having a pass which detects the sycl::this_kernel
parameter and transforms the function into a SYCL kernel function as is does for parallel_for
etc, and it could be implemented in a library by detecting the first parameter of the function to verify that it's a SYCL kernel function before calling it.
This approach could provide some benefits:
- Removes the need to use a macro for declaring free function kernels.
- Adds the invocation type to the type system, avoiding undefined behaviour.
- Allows a free function kernel to be overloaded by the invocation type, which would improve generic programming.
- Allows compile-time properties on free function kernels to be applied generically.
- Avoids being dependent on the free function queries extension to retrieve the kernel id object.
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
The macro `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` is defined in this extension. In addition, it's not clear how viable free function kernels are if we do not provide some way to decorate them with things like the required sub-group size. Therefore, it seems like we may as well list the kernel properties extension as a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AerialMantis, Thanks for the review! I've either addressed or responded to your line-by-line comments below.
Regarding your larger comment:
I believe this could be implemented by a device compiler by having a pass which detects the
sycl::this_kernel
parameter and transforms the function into a SYCL kernel function as is does forparallel_for
etc,
This is an interesting idea, but I think the front-end team will not like this direction, and I think it would be hard to get upstreamed. The front-end does not want to get in the business of recognizing specific type names, and I think it would have to do this in order to identify the kernel via the syclex::this_kernel
parameter. Note that the front-end does not currently recognize the parallel_for
by name. Instead, the header wraps the lambda with a function that is decorated with a special attribute, and the front-end recognizes the attribute. The mechanism is similar for SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
. That macro adds a property to the function, which the header translates into a C++ attribute, and the front-end recognizes the attribute.
I agree that your approach has some benefits, but I don't agree with all of them:
- Removes the need to use a macro for declaring free function kernels.
Yes, this is nice.
- Adds the invocation type to the type system, avoiding undefined behaviour.
I think your approach is not type-safe, so there would still be undefined behavior if it is misused. The fundamental problem is that the kernel_id
and kernel
objects do not convey the type of the kernel or the types of the kernel's arguments. Therefore, the compiler cannot catch errors when the kernel is launched (if the application launches it the wrong way or passes invalid argument types).
This is partly by design. One of our customers likes the fact that any kernel can be represented by a generic kernel
object. This allows them to use generic code to manage a library of kernels along with some metadata that provides the information they need to invoke each one.
FWIW, @Pennycook's proposed free-function sycl::parallel_for
would be a type-safe way to invoke a free-function kernel. See the first issue at the bottom of the spec. This will be type-safe even with the current proposal that uses SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
.
- Allows a free function kernel to be overloaded by the invocation type, which would improve generic programming.
- Allows compile-time properties on free function kernels to be applied generically.
I agree that your proposal is more convenient for generic programming, but I think you can also do it with the current proposal. Free function kernels can be templated, so you could use different template instantiations that correspond to different invocation types or different properties.
- Avoids being dependent on the free function queries extension to retrieve the kernel id object.
I actually don't see this as an advantage. We plan to implement that sycl_ext_oneapi_free_function_queries extension anyways, so I think it's an advantage to using it rather than inventing a new thing.
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc
Outdated
Show resolved
Hide resolved
So I don't think it would be necessary for the front-end to identify the I agree this would be a more involved implementation than the macro approach as the macro can expand the attributes directly to the function declaration, whereas with
Yes, that is true, while there would be compile-time information about the function, this wouldn't translate to the
I agree there is a strong benefit to this, so I wouldn't want to propose that we start adding type information to the
Ah okay that's interesting, I'll need to revisit that extension.
I think you're right, you could have a template function with some template parameters which describe the invocation method, and then explicitly instantiate them with a different macro. and therefore compile-time property for each. Though thinking about that, this raises another question, when we template a free function kernel is the macro applied to the template declaration (I think this is how it's described in the example), or on the specializations (explicit instantiation or explicit specialization), or both? I wonder how it would work if it was applied to the template declaration because then the template arguments wouldn't be known until provided when retrieving the kernel bundle (something like
Yeah I was hesitant to state this as a benefit, as we may want to go in the direction of the free function queries, but I'm also wary that the free function queries may be different to implement in a library only or host implementation of SYCL. |
Add an issue to capture our discussion about whether we should support a free function kernel syntax where the initial parameter is the iteration index.
Address the review comment about requiring an implementation to emit a diagnostic if the kernel declaration violates the restrictions that the spec mandates. We're not sure yet if the spec should mandate a diagnostic, but we do think that a high quality implementation should do this. Therefore, add an implementation note stating that we expect DPC++ to issue a diagnostic in these cases.
Regarding the comment thread above with @AerialMantis, we had an offline meeting which also included @Pennycook about this. We decided that it did not make sense to use We did not come to a conclusion on this, so we agreed to capture this as an open issue, which I did in ecefa00. We do not want this open issue to block initial implementation of this extension. |
@intel/llvm-gatekeepers I think this PR is ready to merge. I had a meeting with @AerialMantis, and we agreed to capture his remaining comments as an open issue, which I did. I also met with @bader. His comments are more of a fundamental question about whether SYCL should make a distinction between the three kernels types ("range", "nd-range", and "single task"), or whether we should simplify the language and eliminate both "range" and "single task" kernels. This seems outside of the scope of this PR, and I think he does not mean to hold up this PR with his comment (but let me know, @bader, if I misunderstood). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmlueck thanks for the changes, this looks good, I'm happy for this to be merged now and for the remaining questions as they are described in the open issues section, to be discussed further at a later time.
Merging this. If there are any open or follow-up issues, please open an issue. |
Add a proposed extension spec that allows a kernel to be defined as a plain function, where the function arguments are the kernel arguments. This is useful for the sycl_ext_oneapi_kernel_compiler because it allows a kernel to be defined in isolation from its host code.