From b32eac41d3d215a2d71e95860e7da4d6985d07a6 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 2 Oct 2023 16:38:08 -0400 Subject: [PATCH 01/29] [SYCL][Doc] Add raw kernel extension spec Add a proposed extension spec that allows a kernel to be defined as a plain function, where the function arguments are the kernel arguments. This is useful for the sycl_ext_oneapi_kernel_compiler because it allows a kernel to be defined in isolation from its host code. --- .../sycl_ext_oneapi_raw_kernel.asciidoc | 513 ++++++++++++++++++ 1 file changed, 513 insertions(+) create mode 100644 sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc new file mode 100644 index 0000000000000..1be867e1a4f80 --- /dev/null +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc @@ -0,0 +1,513 @@ += sycl_ext_oneapi_raw_kernel + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) 2023-2023 Intel Corporation. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. +OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 8 specification. +All references below to the "core SYCL specification" or to section numbers in +the SYCL specification refer to that revision. + +This extension also depends on the following other SYCL extensions: + +* link:../experimental/sycl_ext_oneapi_properties.asciidoc[ + sycl_ext_oneapi_properties] +* link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ + sycl_ext_oneapi_free_function_queries] + + +== Status + +This is a proposed extension specification, intended to gather community +feedback. +Interfaces defined in this specification may not be implemented yet or may be +in a preliminary state. +The specification itself may also change in incompatible ways before it is +finalized. +*Shipping software products should not rely on APIs defined in this +specification.* + + +== Overview + +This extension introduces a new way to define a kernel as a simple C++ +function, where the kernel arguments are parameters to the function. +This is different from standard SYCL kernels, where the kernel arguments are +either captures of a lambda expression or member variables of a callable +object. + +The primary motivation for this extension is the +link:../proposed/sycl_ext_oneapi_kernel_compiler.asciidoc[ +sycl_ext_oneapi_kernel_compiler], which allows online compilation of a kernel +from source code. +These kernels must have a clean separation between host and device code, and +they must have a defined order to their arguments because the application sets +the argument values by their "index" via `handler::set_arg(index, value)`. +Because variables captured by lambda expressions have no defined order, we need +some other way to define kernel arguments for use with the +sycl_ext_oneapi_kernel_compiler extension, and the sycl_ext_oneapi_raw_kernel +extension provides that mechanism. +The "raw kernel" feature is a separate extension, though, because it can also +be used independently from sycl_ext_oneapi_kernel_compiler. +For example, some users may find it more familiar to define kernels as plain +functions or they might have other reasons to prefer a clean separation between +host and device code. + + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. +An implementation supporting this extension must predefine the macro +`SYCL_EXT_ONEAPI_RAW_KERNEL` +to one of the values defined in the table below. +Applications can test for the existence of this macro to determine if the +implementation supports this feature, or applications can test the macro's +value to determine which of the extension's features the implementation +supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|The APIs of this experimental extension are not versioned, so the + feature-test macro always has this value. +|=== + +=== Defining a raw kernel + +A raw kernel is a normal C++ function definition, where the function definition +is decorated with one of the following compile-time properties: `range_kernel`, +`nd_range_kernel`, or `single_task_kernel`. +Each of these properties takes a template parameter `Name`, which serves as the +name of the kernel, similar to the `KernelName` template parameter to +`parallel_for` for a standard SYCL kernel. +When a raw kernel has a name, the type used as the name must not be used to +name any other kernel in the application (either a raw kernel or a "named" +standard kernel). +The type used to name a raw kernel has the same limitations as the type used to +name a standard SYCL kernel as specified in section 5.2 "Naming of kernels" of +the core SYCL specification. +A raw kernel may be defined as a function template, however, all instantiations +of the function must have a unique name, which implies that the type used to +name the kernel is also somehow templated. + +Usually, a raw kernel must have a type-name because the type-name is used to +obtain a `kernel` object that represents the raw kernel, and a `kernel` object +is needed in order to enqueue the kernel. +However, when raw kernels are used with the +link:../proposed/sycl_ext_oneapi_kernel_compiler.asciidoc[ +sycl_ext_oneapi_kernel_compiler] extension, it is possible to obtain the +`kernel` even without a type-name. +In this case, the raw kernel can be unnamed by letting the `Name` template +parameter have its default value of `void`. + +The following table provides additional details about these compile-time +properties. + +|==== +a| +*`range_kernel` property* + +[frame=all,grid=none] +!==== +a! +[source] +---- +struct range_kernel_key { + template + using value_t = property_value; +}; + +template +inline constexpr range_kernel_key::value_t range_kernel; + +template<> +struct is_property_key : std::true_type {}; +---- +!==== + +Indicates that the function is a raw kernel that is invoked with a simple +`range` iteration space of `Dims` dimensions. +When the `Name` parameter is not `void`, it is a type that represents the name +of the kernel. + +The `property_value` struct has the following member variables and type +aliases: + +[%header,cols="1,1"] +!==== +!Member +!Description + +a! +[source] +---- +static constexpr int dimensions = Dims +---- +! +The number of dimensions of the kernel's range. + +a! +[source] +---- +using name = Name +---- +! +The type-name of the kernel. +!==== + +a| +*`nd_range_kernel` property* + +[frame=all,grid=none] +!==== +a! +[source] +---- +struct nd_range_kernel_key { + template + using value_t = property_value; +}; + +template +inline constexpr nd_range_kernel_key::value_t nd_range_kernel; + +template<> +struct is_property_key : std::true_type {}; +---- +!==== + +Indicates that the function is a raw kernel that is invoked with an +`nd_range` iteration space of `Dims` dimensions. +When the `Name` parameter is not `void`, it is a type that represents the name +of the kernel. + +The `property_value` struct has the following member variables and type +aliases: + +[%header,cols="1,1"] +!==== +!Member +!Description + +a! +[source] +---- +static constexpr int dimensions = Dims +---- +! +The number of dimensions of the kernel's range. + +a! +[source] +---- +using name = Name +---- +! +The type-name of the kernel. +!==== + +a| +*`single_task_kernel` property* + +[frame=all,grid=none] +!==== +a! +[source] +---- +struct single_task_kernel_key { + template + using value_t = property_value; +}; + +template +inline constexpr single_task_kernel_key::value_t single_task_kernel; + +template<> +struct is_property_key : std::true_type {}; +---- +!==== + +Indicates that the function is a raw kernel that is invoked via `single_task` +(i.e. without any iteration space). +When the `Name` parameter is not `void`, it is a type that represents the name +of the kernel. + +The `property_value` struct has the following member type alias: + +[%header,cols="1,1"] +!==== +!Member +!Description + +a! +[source] +---- +using name = Name +---- +! +The type-name of the kernel. +!==== +|==== + +When a function is defined as a raw kernel, each parameter to the function is +a kernel argument, which must abide by the rules for allowable kernel parameter +types specified in section 4.12.4 "Rules for parameter passing to kernels" of +the core SYCL specification. +The function's return type must be `void`. + +The following example demonstrates how a raw kernel can be defined using the +`range_kernel` property: + +``` +namespace syclex = sycl::ext::oneapi::experimental; + +struct my_iota; + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) +void iota(float start, float *ptr) { + // ... +} +``` + +=== Interaction with the kernel bundle APIs + +Raw kernels work with the kernel bundle APIs defined in section 4.11 "Kernel +bundles" of the core SYCL specification in a natural way. +Any kernel bundle API that accepts a kernel's type-name may be used with the +type-name of a raw kernel. +In fact, an application must use the kernel bundle APIs in order to invoke a +raw kernel because the application must first obtain a `kernel` object. +Typically, the application first obtains a kernel bundle from the raw kernel's +name and then obtains its `kernel` object as shown in the following example: + +``` +namespace syclex = sycl::ext::oneapi::experimental; + +struct my_iota; + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) +void iota(float start, float *ptr) { + // ... +} + +int main() { + sycl::queue q; + sycl::context ctxt = q.get_context(); + + // Get a kernel bundle that contains the raw kernel "my_iota". + auto exe_bndl = + sycl::get_kernel_bundle(ctxt); + + // Get a kernel object for the "my_iota" function from that bundle. + sycl::kernel myiota = exe_bndl.get_kernel(); +} +``` + +=== Enqueuing a raw kernel and setting parameter values + +Once the application obtains a `kernel` object for a raw kernel, it can enqueue +the kernel to a device using any of the SYCL functions that allow a kernel to +be enqueued via a `kernel` object. +The application must enqueue the raw kernel according to its type. +For example, a raw kernel defined via `range_kernel` can be enqueued by calling +the `handler::parallel_for` overload taking a `range`. +A raw kernel defined via `nd_range_kernel` can be enqueued by calling the +`handler::parallel_for` overload taking an `nd_range`. +A raw kernel defined via `single_task_kernel` can be enqueued by calling +`handler::single_task`. + +Attempting to enqueue a raw kernel using a mechanism that does not match its +type results in undefined behavior. +Attempting to enqueue a raw kernel with a `range` or `nd_range` whose +dimensionality does not match the raw kernel definition results in undefined +behavior. + +The application is also responsible for setting the values of any kernel +arguments when the kernel is enqueued. +For example, when enqueuing a kernel with `handler::parallel_for` or +`handler::single_task`, the kernel argument values must be set via +`handler::set_arg` or `handler::set_args`. +Failing to set the value of a kernel argument results in undefined behavior. + +=== Obtaining the iteration id for a kernel + +In a standard SYCL kernel, the iteration ID is passed as a parameter to the +kernel's callable object. +However, this is not the case for a raw kernel because the function parameters +are used to pass the kernel arguments instead. +Therefore, a raw kernel must obtain the iteration ID in some other way. +Typically, a raw kernel uses the free functions specified in +link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ +sycl_ext_oneapi_free_function_queries] for this purpose. + +=== Interaction with kernel properties + +If the implementation supports +link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ +sycl_ext_oneapi_kernel_properties], a raw kernel may be decorated with these +properties by applying the properties to the function definition as illustrated +below. + +``` +struct my_iota; + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::nd_range_kernel<1, my_iota>) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::work_group_size<32>) +void iota(float start, float *ptr) { + // ... +} +``` + +As with standard SYCL kernels, these properties can be queried via +`kernel::get_info` using either the `info::kernel::attributes` information +descriptor or the `info::kernel_device_specific` information descriptors. + + +== Example + +The following example demonstrates how to define a raw kernel and then enqueue +it on a device. + +``` +namespace syclex = sycl::ext::oneapi::experimental; + +struct my_iota; +static constexpr size_t NUM = 1024; + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) +void iota(float start, float *ptr) { + // Get the ID of this kernel iteration. + size_t id = syclex::this_kernel::get_id(); + + ptr[id] = start + static_cast(id); +} + +void main() { + sycl::queue q; + sycl::context ctxt = q.get_context(); + + // Get a kernel bundle that contains the raw kernel "my_iota". + auto exe_bndl = + sycl::get_kernel_bundle(ctxt); + + // Get a kernel object for the "my_iota" function from that bundle. + sycl::kernel myiota = exe_bndl.get_kernel(); + + float *ptr = sycl::malloc_shared(NUM, q); + q.submit([&](sycl::handler &cgh) { + // Set the values of the kernel arguments. + cgh.set_args(3.14f, ptr); + + cgh.parallel_for({NUM}, myiota); + }).wait(); +} +``` + + +== Issues + +* The front-end team has expressed concern about implementing this syntax + because it requires the front-end to recognize the property names + `range_kernel`, `nd_range_kernel`, and `single_task_kernel`. + This is necessary because the front-end must know that functions decorated + with these properties are device code, and it must somehow get the kernel + name in order to generate the integration header. + Currently, the front-end does not intrinsicly know any of the property names. + Rather, the front-end simply passes the properties verbatim into the + generated LLVM IR. + If we want to avoid teaching the front-end about these property names, we + could instead change the syntax for declaring a raw kernel to be like this: ++ +``` +SYCL_EXT_ONEAPI_RANGE_KERNEL(1, my_iota) +void iota(float start, float *ptr) { + // ... +} +``` ++ +Here, the macro `SYCL_EXT_ONEAPI_RANGE_KERNEL` would expand to two things. +One part of the expansion would be a C++ attribute that tells the front-end +that this function is a raw "range" kernel with the given type-name. +Another part of the expansion would add the same IR attributes as the +`range_kernel` property (assuming there is even a need to represent this +information in the IR). +There are two drawbacks to this approach. +One is purely aesthetic: we would need to add a new macro to the language +instead of using our existing property mechanism. +The other relates to error messages. +Error messages with macros tend to be worse when the user passes incorrect +parameters. ++ +This same issue exists also for the extension +link:https://github.com/intel/llvm/pull/10540[ +sycl_ext_oneapi_virtual_functions] because that extension adds the property +`indirectly_callable` which must be used to decorate virtual functions +that can be called from device code. +Again, the front-end would either need to recognize this property name, or we +would need to use a new macro like `SYCL_EXT_ONEAPI_INDIRECTLY_CALLABLE(name)` +instead of the property. + +* We need to investigate whether there will be problems passing kernel + arguments that are "decomposed" by the front-end. + For example, if a kernel argument is a struct that contains an accessor as a + member variable, the front-end decomposes the struct, passing each member + variable as a separate kernel argument. + We could still support arguments like this if `handler::set_arg` is smart + enough to also do the decomposition, passing multiple arguments when the + argument type requires decomposition. + If this is too difficult to implement, we could restrict the arguments to + only those types that do not require decomposition. + If we add this restriction, the front-end should diagnose an error if a + raw kernel is defined to take such an argument. + +* We currently say it is UB if there is a mismatch between a raw kernel's type + or dimensionality and the call to `parallel_for` or `single_task`. + Should we go a step further and require an exception to be thrown in these + cases? + I'm pretty sure we can implement this in {dpcpp}, but I'm not sure about a + library-only implementation. + However, I'm not sure _any_ of this can be implemented without compiler + support. + +* We currently say it is UB if a raw kernel is enqueued without setting a value + for each of its arguments. + Should we go a step further and require an exception in this case? + This seems easier to implement, even for library-only. From 68e1ce453b20c216c49cf05fe065e1499d08e8c9 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 13 Nov 2023 17:04:47 -0500 Subject: [PATCH 02/29] Add some more open issues Add some more notes about decomposed kernel arguments and kernel arguments that are optimized away. --- .../sycl_ext_oneapi_raw_kernel.asciidoc | 21 ++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc index 1be867e1a4f80..13b24fd999b76 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc @@ -494,10 +494,29 @@ instead of the property. enough to also do the decomposition, passing multiple arguments when the argument type requires decomposition. If this is too difficult to implement, we could restrict the arguments to - only those types that do not require decomposition. + only those types that do not require decomposition, however this would be a + big limitation because `accessor` and `local_accessor` would both be + prohibited. If we add this restriction, the front-end should diagnose an error if a raw kernel is defined to take such an argument. +* There is a similar problem with kernel arguments that have been optimized. + Consider a kernel that uses an `accessor`. + Each member variable is passed as a separate kernel argument, but any members + that are unused in the kernel are optimized away, thus they have no + corresponding kernel argument. + Again, we can handle this by making `handler::set_arg` smart enough to know + which member variables have been optimized away. + Alternatively, we can disable these optimizations for raw kernels. + We also have to consider the behavior when a raw kernel has a formal + parameter that is unused (or optimized away) inside the kernel. + Can the compiler optimize away such an argument? + If so, `handler::set_arg` would need to be smart enough to treat an attempt + to set the value of such an argument as a no-op. + The `handler::set_arg` function would also need to account for this when + interpreting the argument index of arguments that follow an optimized-away + argument. + * We currently say it is UB if there is a mismatch between a raw kernel's type or dimensionality and the call to `parallel_for` or `single_task`. Should we go a step further and require an exception to be thrown in these From 0e6f0b226702b65405e8fb41fad588bc811484a8 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 14 Nov 2023 13:23:52 -0500 Subject: [PATCH 03/29] Behavior when raw kernel is templated Add a section describing requirements when a raw kernel is defined as a function template. --- .../sycl_ext_oneapi_raw_kernel.asciidoc | 31 +++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc index 13b24fd999b76..4b5390325791e 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc @@ -400,6 +400,37 @@ As with standard SYCL kernels, these properties can be queried via `kernel::get_info` using either the `info::kernel::attributes` information descriptor or the `info::kernel_device_specific` information descriptors. +== Raw kernels that are function templates + +A raw kernel may be defined as a function template, however, any instantiation +that is submitted at runtime must be instantiated at compile time. +Normally, the application does this with an explicit template instantiation as +shown below: + +``` +template +struct my_iota; + +template +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) +void iota(T start, T *ptr) { + // ... +} + +template void iota(int, int *); +template void iota(float, float *); +``` + +As noted earlier, each instantiation must have a unique type-name, as +illustrated in the example above. +The application can use this type-name to obtain the `kernel` object for the +raw kernel: + +``` +sycl::kernel myiota_int = exe_bndl.get_kernel>(); +sycl::kernel myiota_float = exe_bndl.get_kernel>(); +``` + == Example From 0f7f928433209be5a01399229714978ca0d957ef Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 17 Nov 2023 08:59:14 -0500 Subject: [PATCH 04/29] Rename extension Rename this feature to "free function kernels" instead of "raw kernel". --- ...ext_oneapi_free_function_kernels.asciidoc} | 178 +++++++++--------- 1 file changed, 92 insertions(+), 86 deletions(-) rename sycl/doc/extensions/proposed/{sycl_ext_oneapi_raw_kernel.asciidoc => sycl_ext_oneapi_free_function_kernels.asciidoc} (72%) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc similarity index 72% rename from sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc rename to sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 4b5390325791e..40402773061e4 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_raw_kernel.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -1,4 +1,4 @@ -= sycl_ext_oneapi_raw_kernel += sycl_ext_oneapi_free_function_kernels :source-highlighter: coderay :coderay-linenums-mode: table @@ -77,10 +77,10 @@ they must have a defined order to their arguments because the application sets the argument values by their "index" via `handler::set_arg(index, value)`. Because variables captured by lambda expressions have no defined order, we need some other way to define kernel arguments for use with the -sycl_ext_oneapi_kernel_compiler extension, and the sycl_ext_oneapi_raw_kernel -extension provides that mechanism. -The "raw kernel" feature is a separate extension, though, because it can also -be used independently from sycl_ext_oneapi_kernel_compiler. +sycl_ext_oneapi_kernel_compiler extension, and the +sycl_ext_oneapi_free_function_kernels extension provides that mechanism. +The "free function kernel" feature is a separate extension, though, because it +can also be used independently from sycl_ext_oneapi_kernel_compiler. For example, some users may find it more familiar to define kernels as plain functions or they might have other reasons to prefer a clean separation between host and device code. @@ -93,7 +93,7 @@ host and device code. This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the macro -`SYCL_EXT_ONEAPI_RAW_KERNEL` +`SYCL_EXT_ONEAPI_FREE_FUNCTION_KERNELS` to one of the values defined in the table below. Applications can test for the existence of this macro to determine if the implementation supports this feature, or applications can test the macro's @@ -110,33 +110,33 @@ supports. feature-test macro always has this value. |=== -=== Defining a raw kernel +=== Defining a free function kernel -A raw kernel is a normal C++ function definition, where the function definition -is decorated with one of the following compile-time properties: `range_kernel`, -`nd_range_kernel`, or `single_task_kernel`. +A free function kernel is a normal C++ function definition, where the function +definition is decorated with one of the following compile-time properties: +`range_kernel`, `nd_range_kernel`, or `single_task_kernel`. Each of these properties takes a template parameter `Name`, which serves as the name of the kernel, similar to the `KernelName` template parameter to `parallel_for` for a standard SYCL kernel. -When a raw kernel has a name, the type used as the name must not be used to -name any other kernel in the application (either a raw kernel or a "named" -standard kernel). -The type used to name a raw kernel has the same limitations as the type used to -name a standard SYCL kernel as specified in section 5.2 "Naming of kernels" of -the core SYCL specification. -A raw kernel may be defined as a function template, however, all instantiations -of the function must have a unique name, which implies that the type used to -name the kernel is also somehow templated. - -Usually, a raw kernel must have a type-name because the type-name is used to -obtain a `kernel` object that represents the raw kernel, and a `kernel` object -is needed in order to enqueue the kernel. -However, when raw kernels are used with the +When a free function kernel has a name, the type used as the name must not be +used to name any other kernel in the application (either a free function kernel +or a "named" standard kernel). +The type used to name a free function kernel has the same limitations as the +type used to name a standard SYCL kernel as specified in section 5.2 "Naming of +kernels" of the core SYCL specification. +A free function kernel may be defined as a function template, however, all +instantiations of the function must have a unique name, which implies that the +type used to name the kernel is also somehow templated. + +Usually, a free function kernel must have a type-name because the type-name is +used to obtain a `kernel` object that represents the free function kernel, and +a `kernel` object is needed in order to enqueue the kernel. +However, when free function kernels are used with the link:../proposed/sycl_ext_oneapi_kernel_compiler.asciidoc[ sycl_ext_oneapi_kernel_compiler] extension, it is possible to obtain the `kernel` even without a type-name. -In this case, the raw kernel can be unnamed by letting the `Name` template -parameter have its default value of `void`. +In this case, the free function kernel can be unnamed by letting the `Name` +template parameter have its default value of `void`. The following table provides additional details about these compile-time properties. @@ -163,8 +163,8 @@ struct is_property_key : std::true_type {}; ---- !==== -Indicates that the function is a raw kernel that is invoked with a simple -`range` iteration space of `Dims` dimensions. +Indicates that the function is a free function kernel that is invoked with a +simple `range` iteration space of `Dims` dimensions. When the `Name` parameter is not `void`, it is a type that represents the name of the kernel. @@ -214,7 +214,7 @@ struct is_property_key : std::true_type {}; ---- !==== -Indicates that the function is a raw kernel that is invoked with an +Indicates that the function is a free function kernel that is invoked with an `nd_range` iteration space of `Dims` dimensions. When the `Name` parameter is not `void`, it is a type that represents the name of the kernel. @@ -265,8 +265,8 @@ struct is_property_key : std::true_type {}; ---- !==== -Indicates that the function is a raw kernel that is invoked via `single_task` -(i.e. without any iteration space). +Indicates that the function is a free function kernel that is invoked via +`single_task` (i.e. without any iteration space). When the `Name` parameter is not `void`, it is a type that represents the name of the kernel. @@ -287,14 +287,14 @@ The type-name of the kernel. !==== |==== -When a function is defined as a raw kernel, each parameter to the function is -a kernel argument, which must abide by the rules for allowable kernel parameter -types specified in section 4.12.4 "Rules for parameter passing to kernels" of -the core SYCL specification. +When a function is defined as a free function kernel, each parameter to the +function is a kernel argument, which must abide by the rules for allowable +kernel parameter types specified in section 4.12.4 "Rules for parameter passing +to kernels" of the core SYCL specification. The function's return type must be `void`. -The following example demonstrates how a raw kernel can be defined using the -`range_kernel` property: +The following example demonstrates how a free function kernel can be defined +using the `range_kernel` property: ``` namespace syclex = sycl::ext::oneapi::experimental; @@ -309,14 +309,16 @@ void iota(float start, float *ptr) { === Interaction with the kernel bundle APIs -Raw kernels work with the kernel bundle APIs defined in section 4.11 "Kernel -bundles" of the core SYCL specification in a natural way. +Free function kernels work with the kernel bundle APIs defined in section 4.11 +"Kernel bundles" of the core SYCL specification in a natural way. Any kernel bundle API that accepts a kernel's type-name may be used with the -type-name of a raw kernel. +type-name of a free function kernel. In fact, an application must use the kernel bundle APIs in order to invoke a -raw kernel because the application must first obtain a `kernel` object. -Typically, the application first obtains a kernel bundle from the raw kernel's -name and then obtains its `kernel` object as shown in the following example: +free function kernel because the application must first obtain a `kernel` +object. +Typically, the application first obtains a kernel bundle from the free function +kernel's name and then obtains its `kernel` object as shown in the following +example: ``` namespace syclex = sycl::ext::oneapi::experimental; @@ -332,7 +334,7 @@ int main() { sycl::queue q; sycl::context ctxt = q.get_context(); - // Get a kernel bundle that contains the raw kernel "my_iota". + // Get a kernel bundle that contains the free function kernel "my_iota". auto exe_bndl = sycl::get_kernel_bundle(ctxt); @@ -341,24 +343,24 @@ int main() { } ``` -=== Enqueuing a raw kernel and setting parameter values - -Once the application obtains a `kernel` object for a raw kernel, it can enqueue -the kernel to a device using any of the SYCL functions that allow a kernel to -be enqueued via a `kernel` object. -The application must enqueue the raw kernel according to its type. -For example, a raw kernel defined via `range_kernel` can be enqueued by calling -the `handler::parallel_for` overload taking a `range`. -A raw kernel defined via `nd_range_kernel` can be enqueued by calling the -`handler::parallel_for` overload taking an `nd_range`. -A raw kernel defined via `single_task_kernel` can be enqueued by calling -`handler::single_task`. - -Attempting to enqueue a raw kernel using a mechanism that does not match its -type results in undefined behavior. -Attempting to enqueue a raw kernel with a `range` or `nd_range` whose -dimensionality does not match the raw kernel definition results in undefined -behavior. +=== Enqueuing a free function kernel and setting parameter values + +Once the application obtains a `kernel` object for a free function kernel, it +can enqueue the kernel to a device using any of the SYCL functions that allow +a kernel to be enqueued via a `kernel` object. +The application must enqueue the free function kernel according to its type. +For example, a free function kernel defined via `range_kernel` can be enqueued +by calling the `handler::parallel_for` overload taking a `range`. +A free function kernel defined via `nd_range_kernel` can be enqueued by calling +the `handler::parallel_for` overload taking an `nd_range`. +A free function kernel defined via `single_task_kernel` can be enqueued by +calling `handler::single_task`. + +Attempting to enqueue a free function kernel using a mechanism that does not +match its type results in undefined behavior. +Attempting to enqueue a free function kernel with a `range` or `nd_range` whose +dimensionality does not match the free function kernel definition results in +undefined behavior. The application is also responsible for setting the values of any kernel arguments when the kernel is enqueued. @@ -371,10 +373,11 @@ Failing to set the value of a kernel argument results in undefined behavior. In a standard SYCL kernel, the iteration ID is passed as a parameter to the kernel's callable object. -However, this is not the case for a raw kernel because the function parameters -are used to pass the kernel arguments instead. -Therefore, a raw kernel must obtain the iteration ID in some other way. -Typically, a raw kernel uses the free functions specified in +However, this is not the case for a free function kernel because the function +parameters are used to pass the kernel arguments instead. +Therefore, a free function kernel must obtain the iteration ID in some other +way. +Typically, a free function kernel uses the free functions specified in link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ sycl_ext_oneapi_free_function_queries] for this purpose. @@ -382,9 +385,9 @@ sycl_ext_oneapi_free_function_queries] for this purpose. If the implementation supports link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ -sycl_ext_oneapi_kernel_properties], a raw kernel may be decorated with these -properties by applying the properties to the function definition as illustrated -below. +sycl_ext_oneapi_kernel_properties], a free function kernel may be decorated +with these properties by applying the properties to the function definition as +illustrated below. ``` struct my_iota; @@ -400,10 +403,11 @@ As with standard SYCL kernels, these properties can be queried via `kernel::get_info` using either the `info::kernel::attributes` information descriptor or the `info::kernel_device_specific` information descriptors. -== Raw kernels that are function templates +== Free function kernels that are function templates -A raw kernel may be defined as a function template, however, any instantiation -that is submitted at runtime must be instantiated at compile time. +A free function kernel may be defined as a function template, however, any +instantiation that is submitted at runtime must be instantiated at compile +time. Normally, the application does this with an explicit template instantiation as shown below: @@ -424,7 +428,7 @@ template void iota(float, float *); As noted earlier, each instantiation must have a unique type-name, as illustrated in the example above. The application can use this type-name to obtain the `kernel` object for the -raw kernel: +free function kernel: ``` sycl::kernel myiota_int = exe_bndl.get_kernel>(); @@ -434,8 +438,8 @@ sycl::kernel myiota_float = exe_bndl.get_kernel>(); == Example -The following example demonstrates how to define a raw kernel and then enqueue -it on a device. +The following example demonstrates how to define a free function kernel and then +enqueue it on a device. ``` namespace syclex = sycl::ext::oneapi::experimental; @@ -455,7 +459,7 @@ void main() { sycl::queue q; sycl::context ctxt = q.get_context(); - // Get a kernel bundle that contains the raw kernel "my_iota". + // Get a kernel bundle that contains the free function kernel "my_iota". auto exe_bndl = sycl::get_kernel_bundle(ctxt); @@ -485,7 +489,8 @@ void main() { Rather, the front-end simply passes the properties verbatim into the generated LLVM IR. If we want to avoid teaching the front-end about these property names, we - could instead change the syntax for declaring a raw kernel to be like this: + could instead change the syntax for declaring a free function kernel to be + like this: + ``` SYCL_EXT_ONEAPI_RANGE_KERNEL(1, my_iota) @@ -496,7 +501,7 @@ void iota(float start, float *ptr) { + Here, the macro `SYCL_EXT_ONEAPI_RANGE_KERNEL` would expand to two things. One part of the expansion would be a C++ attribute that tells the front-end -that this function is a raw "range" kernel with the given type-name. +that this function is a free function "range" kernel with the given type-name. Another part of the expansion would add the same IR attributes as the `range_kernel` property (assuming there is even a need to represent this information in the IR). @@ -529,7 +534,7 @@ instead of the property. big limitation because `accessor` and `local_accessor` would both be prohibited. If we add this restriction, the front-end should diagnose an error if a - raw kernel is defined to take such an argument. + free function kernel is defined to take such an argument. * There is a similar problem with kernel arguments that have been optimized. Consider a kernel that uses an `accessor`. @@ -538,9 +543,9 @@ instead of the property. corresponding kernel argument. Again, we can handle this by making `handler::set_arg` smart enough to know which member variables have been optimized away. - Alternatively, we can disable these optimizations for raw kernels. - We also have to consider the behavior when a raw kernel has a formal - parameter that is unused (or optimized away) inside the kernel. + Alternatively, we can disable these optimizations for free function kernels. + We also have to consider the behavior when a free function kernel has a + formal parameter that is unused (or optimized away) inside the kernel. Can the compiler optimize away such an argument? If so, `handler::set_arg` would need to be smart enough to treat an attempt to set the value of such an argument as a no-op. @@ -548,8 +553,9 @@ instead of the property. interpreting the argument index of arguments that follow an optimized-away argument. -* We currently say it is UB if there is a mismatch between a raw kernel's type - or dimensionality and the call to `parallel_for` or `single_task`. +* We currently say it is UB if there is a mismatch between a free function + kernel's type or dimensionality and the call to `parallel_for` or + `single_task`. Should we go a step further and require an exception to be thrown in these cases? I'm pretty sure we can implement this in {dpcpp}, but I'm not sure about a @@ -557,7 +563,7 @@ instead of the property. However, I'm not sure _any_ of this can be implemented without compiler support. -* We currently say it is UB if a raw kernel is enqueued without setting a value - for each of its arguments. +* We currently say it is UB if a free function kernel is enqueued without + setting a value for each of its arguments. Should we go a step further and require an exception in this case? This seems easier to implement, even for library-only. From a7bef5ed34e9e71199da439fa5d27015782c191d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 17 Nov 2023 13:37:15 -0500 Subject: [PATCH 05/29] Identify kernels by address rather than type-name Abandon the idea of identifying free function kernels by a type-name. Instead, we use a non-type template parameter whose value is the address of the function. --- ..._ext_oneapi_free_function_kernels.asciidoc | 422 +++++++++++------- 1 file changed, 271 insertions(+), 151 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 40402773061e4..e7dc6c6d4b691 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -10,6 +10,7 @@ :encoding: utf-8 :lang: en :dpcpp: pass:[DPC++] +:endnote: —{nbsp}end{nbsp}note // Set the default source code type in this document to C++, // for syntax highlighting purposes. This is needed because @@ -113,30 +114,8 @@ supports. === Defining a free function kernel A free function kernel is a normal C++ function definition, where the function -definition is decorated with one of the following compile-time properties: +declaration is decorated with one of the following compile-time properties: `range_kernel`, `nd_range_kernel`, or `single_task_kernel`. -Each of these properties takes a template parameter `Name`, which serves as the -name of the kernel, similar to the `KernelName` template parameter to -`parallel_for` for a standard SYCL kernel. -When a free function kernel has a name, the type used as the name must not be -used to name any other kernel in the application (either a free function kernel -or a "named" standard kernel). -The type used to name a free function kernel has the same limitations as the -type used to name a standard SYCL kernel as specified in section 5.2 "Naming of -kernels" of the core SYCL specification. -A free function kernel may be defined as a function template, however, all -instantiations of the function must have a unique name, which implies that the -type used to name the kernel is also somehow templated. - -Usually, a free function kernel must have a type-name because the type-name is -used to obtain a `kernel` object that represents the free function kernel, and -a `kernel` object is needed in order to enqueue the kernel. -However, when free function kernels are used with the -link:../proposed/sycl_ext_oneapi_kernel_compiler.asciidoc[ -sycl_ext_oneapi_kernel_compiler] extension, it is possible to obtain the -`kernel` even without a type-name. -In this case, the free function kernel can be unnamed by letting the `Name` -template parameter have its default value of `void`. The following table provides additional details about these compile-time properties. @@ -151,12 +130,12 @@ a! [source] ---- struct range_kernel_key { - template - using value_t = property_value; + template + using value_t = property_value; }; -template -inline constexpr range_kernel_key::value_t range_kernel; +template +inline constexpr range_kernel_key::value_t range_kernel; template<> struct is_property_key : std::true_type {}; @@ -165,11 +144,8 @@ struct is_property_key : std::true_type {}; Indicates that the function is a free function kernel that is invoked with a simple `range` iteration space of `Dims` dimensions. -When the `Name` parameter is not `void`, it is a type that represents the name -of the kernel. -The `property_value` struct has the following member variables and type -aliases: +The `property_value` struct has the following member variables: [%header,cols="1,1"] !==== @@ -183,14 +159,6 @@ static constexpr int dimensions = Dims ---- ! The number of dimensions of the kernel's range. - -a! -[source] ----- -using name = Name ----- -! -The type-name of the kernel. !==== a| @@ -202,12 +170,12 @@ a! [source] ---- struct nd_range_kernel_key { - template - using value_t = property_value; + template + using value_t = property_value; }; -template -inline constexpr nd_range_kernel_key::value_t nd_range_kernel; +template +inline constexpr nd_range_kernel_key::value_t nd_range_kernel; template<> struct is_property_key : std::true_type {}; @@ -216,11 +184,8 @@ struct is_property_key : std::true_type {}; Indicates that the function is a free function kernel that is invoked with an `nd_range` iteration space of `Dims` dimensions. -When the `Name` parameter is not `void`, it is a type that represents the name -of the kernel. -The `property_value` struct has the following member variables and type -aliases: +The `property_value` struct has the following member variables: [%header,cols="1,1"] !==== @@ -234,14 +199,6 @@ static constexpr int dimensions = Dims ---- ! The number of dimensions of the kernel's range. - -a! -[source] ----- -using name = Name ----- -! -The type-name of the kernel. !==== a| @@ -253,12 +210,10 @@ a! [source] ---- struct single_task_kernel_key { - template - using value_t = property_value; + using value_t = property_value; }; -template -inline constexpr single_task_kernel_key::value_t single_task_kernel; +inline constexpr single_task_kernel_key::value_t single_task_kernel; template<> struct is_property_key : std::true_type {}; @@ -267,24 +222,6 @@ struct is_property_key : std::true_type {}; Indicates that the function is a free function kernel that is invoked via `single_task` (i.e. without any iteration space). -When the `Name` parameter is not `void`, it is a type that represents the name -of the kernel. - -The `property_value` struct has the following member type alias: - -[%header,cols="1,1"] -!==== -!Member -!Description - -a! -[source] ----- -using name = Name ----- -! -The type-name of the kernel. -!==== |==== When a function is defined as a free function kernel, each parameter to the @@ -299,50 +236,211 @@ using the `range_kernel` property: ``` namespace syclex = sycl::ext::oneapi::experimental; -struct my_iota; - -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) void iota(float start, float *ptr) { // ... } ``` -=== Interaction with the kernel bundle APIs +=== New kernel bundle member functions -Free function kernels work with the kernel bundle APIs defined in section 4.11 -"Kernel bundles" of the core SYCL specification in a natural way. -Any kernel bundle API that accepts a kernel's type-name may be used with the -type-name of a free function kernel. -In fact, an application must use the kernel bundle APIs in order to invoke a -free function kernel because the application must first obtain a `kernel` -object. -Typically, the application first obtains a kernel bundle from the free function -kernel's name and then obtains its `kernel` object as shown in the following -example: +This extension adds the following new functions which add kernel bundle support +for free function kernels: + +|==== +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +kernel_id get_kernel_id(); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Preconditions_: The address `Func` must be the address of some free function +kernel that is defined in the calling application. + +[_Note:_ The function `Func` need not be defined in the same translation unit +as the call to `get_kernel_id`. +_{endnote}_] + +_Returns:_ The kernel identifier that is associated with that kernel. + +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template // (1) +kernel_bundle get_kernel_bundle(const context& ctxt); + +template // (2) +kernel_bundle get_kernel_bundle(const context& ctxt, + const std::vector& devs); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Preconditions_: The address `Func` must be the address of some free function +kernel that is defined in the calling application. + +_Returns: (1)_ The same value as +`get_kernel_bundle(ctxt, ctxt.get_devices(), {get_kernel_id()})`. + +_Returns: (2)_ The same value as +`get_kernel_bundle(ctxt, devs, {get_kernel_id()})`. + +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template // (1) +bool has_kernel_bundle(const context& ctxt); + +template // (2) +bool has_kernel_bundle(const context& ctxt, const std::vector& devs); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Preconditions_: The address `Func` must be the address of some free function +kernel that is defined in the calling application. + +_Returns: (1)_ The same value as +`has_kernel_bundle(ctxt, {get_kernel_id()})`. + +_Returns: (2)_ The same value as +`has_kernel_bundle(ctxt, devs, {get_kernel_id()})`. + +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template bool is_compatible(const device& dev); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Preconditions_: The address `Func` must be the address of some free function +kernel that is defined in the calling application. + +_Returns:_ The same value as +`is_compatible({get_kernel_id()}, dev)`. + +|==== + +This extension also adds the following new member functions to the +`kernel_bundle` class: ``` -namespace syclex = sycl::ext::oneapi::experimental; +namespace sycl { -struct my_iota; +template +class kernel_bundle { + // ... -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) -void iota(float start, float *ptr) { - // ... -} + template + bool ext_oneapi_has_kernel(); -int main() { - sycl::queue q; - sycl::context ctxt = q.get_context(); + template + bool ext_oneapi_has_kernel(const device &dev); - // Get a kernel bundle that contains the free function kernel "my_iota". - auto exe_bndl = - sycl::get_kernel_bundle(ctxt); + template + kernel ext_oneapi_get_kernel(); +}; - // Get a kernel object for the "my_iota" function from that bundle. - sycl::kernel myiota = exe_bndl.get_kernel(); -} +} // namespace sycl ``` +|==== +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +template // (1) +bool ext_oneapi_has_kernel() + +template // (2) +bool ext_oneapi_has_kernel(const device &dev) +---- +!==== + +_Preconditions_: The address `Func` must be the address of some free function +kernel that is defined in the calling application. + +_Returns: (1)_: The value `true` only if the kernel bundle contains the free +function kernel whose address is `Func`. + +_Returns: (2)_: The value `true` only if the kernel bundle contains the free +function kernel whose address is `Func` and if that kernel is compatible with +the device `dev`. + +!==== +a! +[source] +---- +template +kernel ext_oneapi_get_kernel() +---- +!==== + +_Constraints:_ This function is available only when `State` is +`bundle_state::executable`. + +_Preconditions_: The address `Func` must be the address of some free function +kernel that is defined in the calling application. + +_Returns:_ If the kernel whose address is `Func` resides in this kernel bundle, +returns the `kernel` object representing that kernel. + +_Throws_: An `exception` with the error code `errc::invalid` if the kernel with +address `Func` does not reside in this kernel bundle. +|==== + +=== Behavior with kernel bundle functions in the core SYCL specification + +Free function kernels that are defined by the application have a corresponding +kernel identifier (`kernel_id`) and are contained by the device images in the +SYCL application. +This section defines the ramifications this has on the kernel bundle functions +defined by the core SYCL specification. + +* The function `get_kernel_ids()` returns the kernel identifiers for any free + function kernels defined by the application, in addition to identifiers for + any kernels defined as lambda expressions or named kernel objects. + +* The kernel bundle returned by + `get_kernel_bundle(const context&, const std::vector& devs)` contains + all of the free function kernels defined by the application that are + compatible with at least one of the devices in `devs`, in addition to all of + the kernels defined as lambda expressions or named kernel objects that are + compatible with one of these devices. + +* The function `has_kernel_bundle(const context&, const std::vector&)` + considers free function kernels defined by the application when computing its + return value. + +The information descriptor `info::kernel::num_args` may be used to query a +`kernel` object that represents a free function kernel. +The return value tells the number of formal parameters in the function's +definition. + === Enqueuing a free function kernel and setting parameter values Once the application obtains a `kernel` object for a free function kernel, it @@ -368,6 +466,10 @@ For example, when enqueuing a kernel with `handler::parallel_for` or `handler::single_task`, the kernel argument values must be set via `handler::set_arg` or `handler::set_args`. Failing to set the value of a kernel argument results in undefined behavior. +The type of the value passed to `handler::set_arg` or `handler::set_args` must +be the same as the type of the corresponding formal parameter in the free +function kernel. +Passing a value with a mismatched type results in undefined behavior. === Obtaining the iteration id for a kernel @@ -377,7 +479,7 @@ However, this is not the case for a free function kernel because the function parameters are used to pass the kernel arguments instead. Therefore, a free function kernel must obtain the iteration ID in some other way. -Typically, a free function kernel uses the free functions specified in +Typically, a free function kernel uses the functions specified in link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ sycl_ext_oneapi_free_function_queries] for this purpose. @@ -390,9 +492,7 @@ with these properties by applying the properties to the function definition as illustrated below. ``` -struct my_iota; - -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::nd_range_kernel<1, my_iota>) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::nd_range_kernel<1>) SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::work_group_size<32>) void iota(float start, float *ptr) { // ... @@ -403,51 +503,20 @@ As with standard SYCL kernels, these properties can be queried via `kernel::get_info` using either the `info::kernel::attributes` information descriptor or the `info::kernel_device_specific` information descriptors. -== Free function kernels that are function templates - -A free function kernel may be defined as a function template, however, any -instantiation that is submitted at runtime must be instantiated at compile -time. -Normally, the application does this with an explicit template instantiation as -shown below: - -``` -template -struct my_iota; - -template -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) -void iota(T start, T *ptr) { - // ... -} - -template void iota(int, int *); -template void iota(float, float *); -``` - -As noted earlier, each instantiation must have a unique type-name, as -illustrated in the example above. -The application can use this type-name to obtain the `kernel` object for the -free function kernel: - -``` -sycl::kernel myiota_int = exe_bndl.get_kernel>(); -sycl::kernel myiota_float = exe_bndl.get_kernel>(); -``` - == Example +=== Basic invocation + The following example demonstrates how to define a free function kernel and then enqueue it on a device. ``` namespace syclex = sycl::ext::oneapi::experimental; -struct my_iota; static constexpr size_t NUM = 1024; -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1, my_iota>) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) void iota(float start, float *ptr) { // Get the ID of this kernel iteration. size_t id = syclex::this_kernel::get_id(); @@ -459,26 +528,77 @@ void main() { sycl::queue q; sycl::context ctxt = q.get_context(); - // Get a kernel bundle that contains the free function kernel "my_iota". + // Get a kernel bundle that contains the free function kernel "iota". auto exe_bndl = - sycl::get_kernel_bundle(ctxt); + syclex::get_kernel_bundle(ctxt); - // Get a kernel object for the "my_iota" function from that bundle. - sycl::kernel myiota = exe_bndl.get_kernel(); + // Get a kernel object for the "iota" function from that bundle. + sycl::kernel k_iota = exe_bndl.ext_oneapi_get_kernel(); float *ptr = sycl::malloc_shared(NUM, q); q.submit([&](sycl::handler &cgh) { // Set the values of the kernel arguments. cgh.set_args(3.14f, ptr); - cgh.parallel_for({NUM}, myiota); + cgh.parallel_for({NUM}, k_iota); }).wait(); } ``` +=== Free function kernels which are templates or overloaded + +A free function kernel may be defined as a function template. +It is also legal to define several overloads for a free function kernel. +The following example demonstrates how to get a kernel identifier in such +cases. + +``` +namespace syclex = sycl::ext::oneapi::experimental; + +template +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) +void iota(T start, T *ptr) { + // ... +} + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::single_task_kernel) +void ping(float *x) { + // ... +} + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::single_task_kernel) +void ping(int *x) { + // ... +} + +int main() { + // When the free function kernel is templated, pass the address of a + // specific instantiation. + sycl::kernel_id iota_float = syclex::get_kernel_id>(); + sycl::kernel_id iota_int = syclex::get_kernel_id>(); + + // When there are multiple overloads of a free function kernel, use a cast + // to disambiguate. + sycl::kernel_id ping_float = syclex::get_kernel_id<(void(*)(float))ping>(); + sycl::kernel_id ping_int = syclex::get_kernel_id<(void(*)(int))ping>(); +} +``` + == Issues +* Can the front-end define a trait like this, which returns true only if the + address is a free function kernel: ++ +``` +template +bool is_kernel_function_v; +``` ++ +If the front-end can provide this, we can provide a nice diagnostic when the +user passes an invalid address to the kernel bundle functions like +`template kernel_id get_kernel_id()`. + * The front-end team has expressed concern about implementing this syntax because it requires the front-end to recognize the property names `range_kernel`, `nd_range_kernel`, and `single_task_kernel`. @@ -493,7 +613,7 @@ void main() { like this: + ``` -SYCL_EXT_ONEAPI_RANGE_KERNEL(1, my_iota) +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void iota(float start, float *ptr) { // ... } @@ -501,7 +621,7 @@ void iota(float start, float *ptr) { + Here, the macro `SYCL_EXT_ONEAPI_RANGE_KERNEL` would expand to two things. One part of the expansion would be a C++ attribute that tells the front-end -that this function is a free function "range" kernel with the given type-name. +that this function is a free function "range" kernel. Another part of the expansion would add the same IR attributes as the `range_kernel` property (assuming there is even a need to represent this information in the IR). From 93504b52ae3e3749a51aade2490236afd4b2be09 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 17 Nov 2023 14:39:22 -0500 Subject: [PATCH 06/29] Use macro instead of property Use special macros to decorate free function kernels instead of compile time properties. --- ..._ext_oneapi_free_function_kernels.asciidoc | 195 ++++++------------ 1 file changed, 64 insertions(+), 131 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index e7dc6c6d4b691..aeb6ea9f7c3ef 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -43,8 +43,6 @@ the SYCL specification refer to that revision. This extension also depends on the following other SYCL extensions: -* link:../experimental/sycl_ext_oneapi_properties.asciidoc[ - sycl_ext_oneapi_properties] * link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ sycl_ext_oneapi_free_function_queries] @@ -114,109 +112,79 @@ supports. === Defining a free function kernel A free function kernel is a normal C++ function definition, where the function -declaration is decorated with one of the following compile-time properties: -`range_kernel`, `nd_range_kernel`, or `single_task_kernel`. +declaration is decorated with one of the following macros: +`SYCL_EXT_ONEAPI_RANGE_KERNEL`, `SYCL_EXT_ONEAPI_ND_RANGE_KERNEL`, or +`SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL`. -The following table provides additional details about these compile-time -properties. +When a function declaration is decorated with one of these macros, the +following rules must be observed: + +* The position of the macro decoration must be the same as the position allowed + for C++ attribute decorations on the function declaration. + For example: ++ +``` +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +void iota(float start, float *ptr); +``` + +* The macro decoration must appear on the first declaration of the function + in the translation unit. + Redeclarations of the function may optionally be decorated with the same + macro if the macro arguments are the same. + The effect is the same regardless of whether redeclaration are so decorated. + +* The same function may be decorated with at most one of these macros. + However, the same function may be decorated multiple times with the same + macro (with the same arguments). + Programs that decorate the same function with more than one of these macros + or with multiple instances of the same macro with different arguments are ill + formed. + +* If a function is decorated with one of these macros in one translation unit, + that function must also be decorated with the same macro (with the same + arguments) in all other translation units. + +The following table provides additional details about these macros: |==== a| -*`range_kernel` property* - [frame=all,grid=none] !==== a! [source] ---- -struct range_kernel_key { - template - using value_t = property_value; -}; - -template -inline constexpr range_kernel_key::value_t range_kernel; - -template<> -struct is_property_key : std::true_type {}; +#define SYCL_EXT_ONEAPI_RANGE_KERNEL(Dims) /*unspecified*/ ---- !==== Indicates that the function is a free function kernel that is invoked with a simple `range` iteration space of `Dims` dimensions. - -The `property_value` struct has the following member variables: - -[%header,cols="1,1"] -!==== -!Member -!Description - -a! -[source] ----- -static constexpr int dimensions = Dims ----- -! -The number of dimensions of the kernel's range. -!==== +The `Dims` parameter must be an integral constant expression that specifies a +valid number of dimensions for the `range` class. a| -*`nd_range_kernel` property* - [frame=all,grid=none] !==== a! [source] ---- -struct nd_range_kernel_key { - template - using value_t = property_value; -}; - -template -inline constexpr nd_range_kernel_key::value_t nd_range_kernel; - -template<> -struct is_property_key : std::true_type {}; +#define SYCL_EXT_ONEAPI_ND_RANGE_KERNEL(Dims) /*unspecified*/ ---- !==== Indicates that the function is a free function kernel that is invoked with an `nd_range` iteration space of `Dims` dimensions. - -The `property_value` struct has the following member variables: - -[%header,cols="1,1"] -!==== -!Member -!Description - -a! -[source] ----- -static constexpr int dimensions = Dims ----- -! -The number of dimensions of the kernel's range. -!==== +The `Dims` parameter must be an integral constant expression that specifies a +valid number of dimensions for the `nd_range` class. a| -*`single_task_kernel` property* - [frame=all,grid=none] !==== a! [source] ---- -struct single_task_kernel_key { - using value_t = property_value; -}; - -inline constexpr single_task_kernel_key::value_t single_task_kernel; - -template<> -struct is_property_key : std::true_type {}; +#define SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL /*unspecified */ ---- !==== @@ -230,18 +198,20 @@ kernel parameter types specified in section 4.12.4 "Rules for parameter passing to kernels" of the core SYCL specification. The function's return type must be `void`. -The following example demonstrates how a free function kernel can be defined -using the `range_kernel` property: +The following example demonstrates how a free function kernel using a +3-dimensional nd-range iteration space can be defined: ``` -namespace syclex = sycl::ext::oneapi::experimental; - -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) +SYCL_EXT_ONEAPI_ND_RANGE_KERNEL(3) void iota(float start, float *ptr) { // ... } ``` +A function decorated with one of these macros can still be called from host +code. +The macro has no effect in such cases. + === New kernel bundle member functions This extension adds the following new functions which add kernel bundle support @@ -447,12 +417,13 @@ Once the application obtains a `kernel` object for a free function kernel, it can enqueue the kernel to a device using any of the SYCL functions that allow a kernel to be enqueued via a `kernel` object. The application must enqueue the free function kernel according to its type. -For example, a free function kernel defined via `range_kernel` can be enqueued -by calling the `handler::parallel_for` overload taking a `range`. -A free function kernel defined via `nd_range_kernel` can be enqueued by calling -the `handler::parallel_for` overload taking an `nd_range`. -A free function kernel defined via `single_task_kernel` can be enqueued by -calling `handler::single_task`. +For example, a free function kernel defined via `SYCL_EXT_ONEAPI_RANGE_KERNEL` +can be enqueued by calling the `handler::parallel_for` overload taking a +`range`. +A free function kernel defined via `SYCL_EXT_ONEAPI_ND_RANGE_KERNEL` can be +enqueued by calling the `handler::parallel_for` overload taking an `nd_range`. +A free function kernel defined via `SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL` can be +enqueued by calling `handler::single_task`. Attempting to enqueue a free function kernel using a mechanism that does not match its type results in undefined behavior. @@ -488,17 +459,21 @@ sycl_ext_oneapi_free_function_queries] for this purpose. If the implementation supports link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ sycl_ext_oneapi_kernel_properties], a free function kernel may be decorated -with these properties by applying the properties to the function definition as +with these properties by applying the properties to the function declaration as illustrated below. ``` -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::nd_range_kernel<1>) +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::work_group_size<32>) void iota(float start, float *ptr) { // ... } ``` +The `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` decorations may appear either before or +after the macro decorations that identify the function as a free function +kernel. + As with standard SYCL kernels, these properties can be queried via `kernel::get_info` using either the `info::kernel::attributes` information descriptor or the `info::kernel_device_specific` information descriptors. @@ -516,7 +491,7 @@ namespace syclex = sycl::ext::oneapi::experimental; static constexpr size_t NUM = 1024; -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void iota(float start, float *ptr) { // Get the ID of this kernel iteration. size_t id = syclex::this_kernel::get_id(); @@ -556,17 +531,17 @@ cases. namespace syclex = sycl::ext::oneapi::experimental; template -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void iota(T start, T *ptr) { // ... } -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::single_task_kernel) +SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL void ping(float *x) { // ... } -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::single_task_kernel) +SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL void ping(int *x) { // ... } @@ -599,48 +574,6 @@ If the front-end can provide this, we can provide a nice diagnostic when the user passes an invalid address to the kernel bundle functions like `template kernel_id get_kernel_id()`. -* The front-end team has expressed concern about implementing this syntax - because it requires the front-end to recognize the property names - `range_kernel`, `nd_range_kernel`, and `single_task_kernel`. - This is necessary because the front-end must know that functions decorated - with these properties are device code, and it must somehow get the kernel - name in order to generate the integration header. - Currently, the front-end does not intrinsicly know any of the property names. - Rather, the front-end simply passes the properties verbatim into the - generated LLVM IR. - If we want to avoid teaching the front-end about these property names, we - could instead change the syntax for declaring a free function kernel to be - like this: -+ -``` -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) -void iota(float start, float *ptr) { - // ... -} -``` -+ -Here, the macro `SYCL_EXT_ONEAPI_RANGE_KERNEL` would expand to two things. -One part of the expansion would be a C++ attribute that tells the front-end -that this function is a free function "range" kernel. -Another part of the expansion would add the same IR attributes as the -`range_kernel` property (assuming there is even a need to represent this -information in the IR). -There are two drawbacks to this approach. -One is purely aesthetic: we would need to add a new macro to the language -instead of using our existing property mechanism. -The other relates to error messages. -Error messages with macros tend to be worse when the user passes incorrect -parameters. -+ -This same issue exists also for the extension -link:https://github.com/intel/llvm/pull/10540[ -sycl_ext_oneapi_virtual_functions] because that extension adds the property -`indirectly_callable` which must be used to decorate virtual functions -that can be called from device code. -Again, the front-end would either need to recognize this property name, or we -would need to use a new macro like `SYCL_EXT_ONEAPI_INDIRECTLY_CALLABLE(name)` -instead of the property. - * We need to investigate whether there will be problems passing kernel arguments that are "decomposed" by the front-end. For example, if a kernel argument is a struct that contains an accessor as a From 817bff3216f8bc742a7acc96a4fcbe8c68910027 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 17 Nov 2023 17:09:20 -0500 Subject: [PATCH 07/29] Kernel arguments are private address space Clarify that the kernel arguments are in the private address space. This is consistent with arguments to device functions called within the kernel. It is also consistent with OpenCL C, which specifies that the kernel arguments are in the private address space. --- .../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index aeb6ea9f7c3ef..ae15ec0e56f55 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -454,6 +454,12 @@ Typically, a free function kernel uses the functions specified in link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ sycl_ext_oneapi_free_function_queries] for this purpose. +=== Address space of kernel arguments + +The arguments to a free function kernel are in the private address space. +As a result, a kernel can modify its arguments, but the modification is visible +only within the work-item. + === Interaction with kernel properties If the implementation supports From 397668c47c09a0b0a6cbe82d05d1601e6ded8187 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 17 Nov 2023 16:18:05 -0500 Subject: [PATCH 08/29] Add some remaining open issues --- ..._ext_oneapi_free_function_kernels.asciidoc | 46 +++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index ae15ec0e56f55..3932e5b206650 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -568,6 +568,39 @@ int main() { == Issues +* We're pretty sure that we want to define some syntax that allows a free + function kernel to be enqueued using the APIs defined in + link:../proposed/sycl_ext_oneapi_enqueue_functions.asciidoc[ + sycl_ext_oneapi_enqueue_functions], but we haven't settled on the exact API + yet. + One option is like this: ++ +``` +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +void iota(float start, float *ptr) { /*...*/ } + +int main() { + sycl::queue q; + float *ptr = sycl::malloc_shared(N, q); + sycl::parallel_for(q, {N}, 1.f, ptr); +} +``` ++ +Another option is like this: ++ +``` +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +void iota(float start, float *ptr) { /*...*/ } + +int main() { + sycl::queue q; + float *ptr = sycl::malloc_shared(N, q); + sycl::parallel_for(q, {N}, kfp, 1.f, ptr); +} +``` ++ +Where `kfp` would have some nicer name. + * Can the front-end define a trait like this, which returns true only if the address is a free function kernel: + @@ -612,6 +645,19 @@ user passes an invalid address to the kernel bundle functions like interpreting the argument index of arguments that follow an optimized-away argument. +* What happens if the user defines the kernel function with a default parameter + value? + Ideally, the caller could omit the call to `handler::set_arg` in this case, + but that seems hard to implement and would cost extra cycles whenever setting + a kernel argument. + Another option is to have the front-end diagnose an error in this case, but + that requires some special logic in the front-end. + FWIW, nvcc seems to allow default parameter values for kernels, and the + call site can omit the corresponding argument value when using the triple + chevron syntax. + However, the argument value cannot be omitted when invoking the kernel via + `cuLaunchKernel`. + * We currently say it is UB if there is a mismatch between a free function kernel's type or dimensionality and the call to `parallel_for` or `single_task`. From 3adda07be441a8ec1751e8905aad8be679426825 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 27 Nov 2023 17:26:32 -0500 Subject: [PATCH 09/29] Clarify address space of kernel args Address a code review comment about wording regarding the address space of kernel arguments. --- .../sycl_ext_oneapi_free_function_kernels.asciidoc | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 3932e5b206650..aa022654aa988 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -460,6 +460,13 @@ The arguments to a free function kernel are in the private address space. As a result, a kernel can modify its arguments, but the modification is visible only within the work-item. +[_Note:_ This applies only to the arguments themselves, not to memory that the +arguments point to. +For example, with a USM pointer argument, the pointer argument itself is in the +private address space, but the memory it points to is in the global address +space. +_{endnote}_] + === Interaction with kernel properties If the implementation supports From 7d9b6f60bd1cf707ace7b08510a3c90cd74e6166 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 8 Dec 2023 15:01:52 -0500 Subject: [PATCH 10/29] Add restriction for free function kernel --- .../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index aa022654aa988..4dabfad1f6651 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -119,6 +119,9 @@ declaration is decorated with one of the following macros: When a function declaration is decorated with one of these macros, the following rules must be observed: +* The function must be declared at either namespace scope are in class scope as + a static member function. + * The position of the macro decoration must be the same as the position allowed for C++ attribute decorations on the function declaration. For example: From 36680e8d9292b2b6981af05ff6d56d4411ef4710 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 09:02:23 -0500 Subject: [PATCH 11/29] Update sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc Co-authored-by: Tom Honermann --- .../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 4dabfad1f6651..2b5c955c90c40 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -119,7 +119,7 @@ declaration is decorated with one of the following macros: When a function declaration is decorated with one of these macros, the following rules must be observed: -* The function must be declared at either namespace scope are in class scope as +* The function must be declared at either namespace scope or at class scope as a static member function. * The position of the macro decoration must be the same as the position allowed From 0e1be43b440e19378bddf6c9c09cfc5c1160440f Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 09:03:12 -0500 Subject: [PATCH 12/29] Integration header restrictions List restrictions specific to implementations that use an integration header. --- ..._ext_oneapi_free_function_kernels.asciidoc | 75 +++++++++++++++++++ 1 file changed, 75 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 2b5c955c90c40..f4485a75b5f50 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -494,6 +494,81 @@ As with standard SYCL kernels, these properties can be queried via `kernel::get_info` using either the `info::kernel::attributes` information descriptor or the `info::kernel_device_specific` information descriptors. +=== Restrictions for integration header implementations + +[_Note:_ The {dpcpp} implementation of this extension currently has the +restrictions listed in this section. +In the future, restrictions tied to the integration header approach might be +formalized in the core SYCL specification and tied to a macro, similar to the +feature set macros that exist already. +_{endnote}_] + +Implementations of SYCL that use the integration header technique have +additional restrictions for functions that are declared as free function +kernels. +These implementations automatically insert forward declarations of the free +function kernels at the top of the translation unit. +This has ramifications on how the application may declare the free function +kernels, on the types that may be used in those declarations, and on the way +the application may reference these kernel identifiers. +The following example illustrates the forward declarations that the +implementation inserts: + +``` +// Forward declarations of types used by the kernel functions. +struct mystruct; +enum myenum : int; + +// Each kernel is forward declared in the same namespace in which the +// application declares it. +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel1(int *); +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel2(mystruct, myenum); + +template +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel3(T *); + +namespace ns { +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel4(int *); +} +``` + +As a result, these implementations impose additional restrictions for functions +that are declared as free function kernels: + +* The function must be declared at namespace scope. + +* Any type used in the declaration of a parameter must be one of the allowed + types listed below. + +* If the function is instantiated from a template, any type used to instantiate + the template must be one of the allowed types listed below. + +* Uses of function identifiers in the application must assume that the free + function kernels are forward declared at the top of the translation unit. + Note that this can also affect references to functions that are not declared + as free functions kernels as illustrated below. ++ +``` +void foo(int) {/*...*/} + +void caller() { + auto *pf = foo; // This is ambiguous because foo(float) is forward declared + // in the integration header +} + +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +void foo(float) {/*...*/} +``` + +The allowed types are: + +* A {cpp} fundamental type. +* A class or struct that is defined at namespace scope. +* A scoped enumeration that is defined at namespace scope. +* An unscoped enumeration that has an explicit underlying type, where the + enumeration is defined at namespace scope. +* A type alias to one of the above types. + == Example From 326b502ed5c5fffbc949a16f54fff176818bd2ec Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 13:49:39 -0500 Subject: [PATCH 13/29] Improve wording on kernel restrictions Also disallow default argument values for kernel functions. --- ..._ext_oneapi_free_function_kernels.asciidoc | 34 ++++++++----------- 1 file changed, 14 insertions(+), 20 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index f4485a75b5f50..0bd1aa766906e 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -122,6 +122,16 @@ following rules must be observed: * The function must be declared at either namespace scope or at class scope as a static member function. +* The function's return type must be `void`. + +* The function must not accept variadic arguments. + +* Each of the function's arguments must have a type that is an allowed kernel + parameter type as specified in section 4.12.4 "Rules for parameter passing to + kernels" of the core SYCL specification. + +* All declarations of the function must provide no default parameter values. + * The position of the macro decoration must be the same as the position allowed for C++ attribute decorations on the function declaration. For example: @@ -135,7 +145,7 @@ void iota(float start, float *ptr); in the translation unit. Redeclarations of the function may optionally be decorated with the same macro if the macro arguments are the same. - The effect is the same regardless of whether redeclaration are so decorated. + The effect is the same regardless of whether redeclarations are so decorated. * The same function may be decorated with at most one of these macros. However, the same function may be decorated multiple times with the same @@ -145,8 +155,8 @@ void iota(float start, float *ptr); formed. * If a function is decorated with one of these macros in one translation unit, - that function must also be decorated with the same macro (with the same - arguments) in all other translation units. + any other translation unit that declares the same function must also decorate + the function with the same macro (with the same arguments). The following table provides additional details about these macros: @@ -196,10 +206,7 @@ Indicates that the function is a free function kernel that is invoked via |==== When a function is defined as a free function kernel, each parameter to the -function is a kernel argument, which must abide by the rules for allowable -kernel parameter types specified in section 4.12.4 "Rules for parameter passing -to kernels" of the core SYCL specification. -The function's return type must be `void`. +function is a kernel argument. The following example demonstrates how a free function kernel using a 3-dimensional nd-range iteration space can be defined: @@ -730,19 +737,6 @@ user passes an invalid address to the kernel bundle functions like interpreting the argument index of arguments that follow an optimized-away argument. -* What happens if the user defines the kernel function with a default parameter - value? - Ideally, the caller could omit the call to `handler::set_arg` in this case, - but that seems hard to implement and would cost extra cycles whenever setting - a kernel argument. - Another option is to have the front-end diagnose an error in this case, but - that requires some special logic in the front-end. - FWIW, nvcc seems to allow default parameter values for kernels, and the - call site can omit the corresponding argument value when using the triple - chevron syntax. - However, the argument value cannot be omitted when invoking the kernel via - `cuLaunchKernel`. - * We currently say it is UB if there is a mismatch between a free function kernel's type or dimensionality and the call to `parallel_for` or `single_task`. From 83193df7236bf59a9d76f19cf785f723022ea196 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 15:41:24 -0500 Subject: [PATCH 14/29] Disallow special kernel arguments Do not allow a free function kernel to take arguments of type `kernel_handler` or `reducer`. --- .../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 0bd1aa766906e..d751629636f8a 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -129,6 +129,9 @@ following rules must be observed: * Each of the function's arguments must have a type that is an allowed kernel parameter type as specified in section 4.12.4 "Rules for parameter passing to kernels" of the core SYCL specification. + The function must not be declared with parameters of type `reducer` or + `kernel_handler`. + These special kernel arguments cannot be passed to a free function kernel. * All declarations of the function must provide no default parameter values. From 73ddbad24e765ad1db7d136d532818b8df55a80a Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 13:52:45 -0500 Subject: [PATCH 15/29] Kernel function can be called from device code Clarify that a free function kernel can also be called as a regular function from device code. --- .../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index d751629636f8a..ad5e1807e02f2 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -221,8 +221,8 @@ void iota(float start, float *ptr) { } ``` -A function decorated with one of these macros can still be called from host -code. +A function decorated with one of these macros can still be called as a normal +function in either host or device code. The macro has no effect in such cases. === New kernel bundle member functions From ae6f7ede568821463bed0744594db5a442289f1d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 14:36:40 -0500 Subject: [PATCH 16/29] Define traits Define traits that can be used to tell whether a function is declared as a free function kernel. Improve error reporting for functions taking a `Func` template parameter by adding a precondition for `Func`. --- ..._ext_oneapi_free_function_kernels.asciidoc | 143 ++++++++++++++---- 1 file changed, 116 insertions(+), 27 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index ad5e1807e02f2..6eb1d90311e37 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -225,6 +225,115 @@ A function decorated with one of these macros can still be called as a normal function in either host or device code. The macro has no effect in such cases. +=== New traits for kernel functions + +This extension defines the following traits that can be used to tell whether a +function is declared as a free function kernel. + +|==== +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_range_kernel; + +template +inline constexpr bool is_range_kernel_v = is_range_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with the +`SYCL_EXT_ONEAPI_RANGE_KERNEL(Dims)` macro, the `is_range_kernel` trait +provides the member constant `value` equal to `true`. +Otherwise `value` is `false`. + +The helper trait `is_range_kernel_v` provides the value of `value`. + +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_nd_range_kernel; + +template +inline constexpr bool is_nd_range_kernel_v = is_nd_range_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with the +`SYCL_EXT_ONEAPI_ND_RANGE_KERNEL(Dims)` macro, the `is_nd_range_kernel` trait +provides the member constant `value` equal to `true`. +Otherwise `value` is `false`. + +The helper trait `is_nd_range_kernel_v` provides the value of `value`. + +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_single_task_kernel; + +template +inline constexpr bool is_single_task_kernel_v = is_single_task_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with the +`SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL` macro, the `is_single_task_kernel` trait +provides the member constant `value` equal to `true`. +Otherwise `value` is `false`. + +The helper trait `is_single_task_kernel_v` provides the value of `value`. + +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_kernel; + +template +inline constexpr bool is_kernel_v = is_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with any +of the macros `SYCL_EXT_ONEAPI_RANGE_KERNEL`, +`SYCL_EXT_ONEAPI_ND_RANGE_KERNEL`, or `SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL`; the +`is_kernel` trait provides the member constant `value` equal to `true`. +Otherwise `value` is `false`. + +The helper trait `is_kernel_v` provides the value of `value`. +|==== + + === New kernel bundle member functions This extension adds the following new functions which add kernel bundle support @@ -246,8 +355,7 @@ kernel_id get_kernel_id(); ---- !==== -_Preconditions_: The address `Func` must be the address of some free function -kernel that is defined in the calling application. +_Constraints_: Available only if `is_kernel_v` is `true`. [_Note:_ The function `Func` need not be defined in the same translation unit as the call to `get_kernel_id`. @@ -272,8 +380,7 @@ kernel_bundle get_kernel_bundle(const context& ctxt, ---- !==== -_Preconditions_: The address `Func` must be the address of some free function -kernel that is defined in the calling application. +_Constraints_: Available only if `is_kernel_v` is `true`. _Returns: (1)_ The same value as `get_kernel_bundle(ctxt, ctxt.get_devices(), {get_kernel_id()})`. @@ -297,8 +404,7 @@ bool has_kernel_bundle(const context& ctxt, const std::vector& devs); ---- !==== -_Preconditions_: The address `Func` must be the address of some free function -kernel that is defined in the calling application. +_Constraints_: Available only if `is_kernel_v` is `true`. _Returns: (1)_ The same value as `has_kernel_bundle(ctxt, {get_kernel_id()})`. @@ -318,8 +424,7 @@ template bool is_compatible(const device& dev); ---- !==== -_Preconditions_: The address `Func` must be the address of some free function -kernel that is defined in the calling application. +_Constraints_: Available only if `is_kernel_v` is `true`. _Returns:_ The same value as `is_compatible({get_kernel_id()}, dev)`. @@ -364,8 +469,7 @@ bool ext_oneapi_has_kernel(const device &dev) ---- !==== -_Preconditions_: The address `Func` must be the address of some free function -kernel that is defined in the calling application. +_Constraints_: Available only if `is_kernel_v` is `true`. _Returns: (1)_: The value `true` only if the kernel bundle contains the free function kernel whose address is `Func`. @@ -383,11 +487,8 @@ kernel ext_oneapi_get_kernel() ---- !==== -_Constraints:_ This function is available only when `State` is -`bundle_state::executable`. - -_Preconditions_: The address `Func` must be the address of some free function -kernel that is defined in the calling application. +_Constraints:_ Available only if `State` is `bundle_state::executable` and if +`is_kernel_v` is `true`. _Returns:_ If the kernel whose address is `Func` resides in this kernel bundle, returns the `kernel` object representing that kernel. @@ -696,18 +797,6 @@ int main() { + Where `kfp` would have some nicer name. -* Can the front-end define a trait like this, which returns true only if the - address is a free function kernel: -+ -``` -template -bool is_kernel_function_v; -``` -+ -If the front-end can provide this, we can provide a nice diagnostic when the -user passes an invalid address to the kernel bundle functions like -`template kernel_id get_kernel_id()`. - * We need to investigate whether there will be problems passing kernel arguments that are "decomposed" by the front-end. For example, if a kernel argument is a struct that contains an accessor as a From b94c5950ad16a58217915ba238641e7be8915d18 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 14:52:59 -0500 Subject: [PATCH 17/29] Improve note about where kernels can be defined --- .../sycl_ext_oneapi_free_function_kernels.asciidoc | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 6eb1d90311e37..444e40672e879 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -337,7 +337,12 @@ The helper trait `is_kernel_v` provides the value of `value`. === New kernel bundle member functions This extension adds the following new functions which add kernel bundle support -for free function kernels: +for free function kernels. + +[_Note:_ Many of the functions in this section have a template parameter +`Func`, which identifies a free function kernel. +This kernel function may be defined in any translation unit in the application. +_{endnote}_] |==== a| @@ -357,10 +362,6 @@ kernel_id get_kernel_id(); _Constraints_: Available only if `is_kernel_v` is `true`. -[_Note:_ The function `Func` need not be defined in the same translation unit -as the call to `get_kernel_id`. -_{endnote}_] - _Returns:_ The kernel identifier that is associated with that kernel. !==== From 8162fd28d35a312134f27eae604be6ad281a2f8a Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 16:32:58 -0500 Subject: [PATCH 18/29] Update issues on error checking --- ...ycl_ext_oneapi_free_function_kernels.asciidoc | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 444e40672e879..d024510d176c8 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -835,12 +835,16 @@ Where `kfp` would have some nicer name. `single_task`. Should we go a step further and require an exception to be thrown in these cases? - I'm pretty sure we can implement this in {dpcpp}, but I'm not sure about a - library-only implementation. - However, I'm not sure _any_ of this can be implemented without compiler - support. + I'm a little hesitant to require an error check here because this is on the + critical path for enqueuing a kernel. + However, {dpcpp} is still allowed to throw an exception in this case if the + overhead is not too high (I'd suggest `errc::invalid`). + I think we should decide during implementation whether the overhead is + minimal enough that we can mandate an error in the spec. * We currently say it is UB if a free function kernel is enqueued without setting a value for each of its arguments. - Should we go a step further and require an exception in this case? - This seems easier to implement, even for library-only. + Should we go a step further and require an exception in this case (again + probably `errc::invalid`). + Again, I think we should decide during implementation whether the overhead is + minimal enough that we can mandate an error in the spec. From 10387fcc2c5c6dcf61e2e5f6133d22d4de7a524d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 11 Dec 2023 16:58:15 -0500 Subject: [PATCH 19/29] Add section on implementation notes --- ..._ext_oneapi_free_function_kernels.asciidoc | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index d024510d176c8..c3216d6ce8ade 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -763,6 +763,45 @@ int main() { ``` +== Implementation notes + +=== Integration header + +My expectation is that {dpcpp} will use the integration header to implement the +traits and the queries like `get_kernel_id()`. +The integration header will probably start with forward declarations of types +used for the parameters to the free function kernels. +Following this, the header can contain forward declarations of the free +function kernels themselves. +In order to avoid problems where functions with the same name in different +namespaces "shadow" each other, the structure can look like this: + +``` +SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +void same_name(int arg1); +static constexpr auto __sycl_shim1() {return (void(*)(int))same_name;} + +inline namespace { + SYCL_EXT_ONEAPI_RANGE_KERNEL(1) + void same_name(int arg1); + static constexpr auto __sycl_shim2() {return (void(*)(int))same_name;} +} + +namespace sycl { + template<> struct is_range_kernel<__sycl_shim1()> : std::true_type {}; + template<> struct is_range_kernel<__sycl_shim2()> : std::true_type {}; +} +``` + +The helper functions `+__sycl_shim1+`, etc. avoid the shadowing problem because +they are defined in the same namespace as the user's kernel function. +Thus, the {cpp} unqualified name lookup algorithm, finds the correct function +definition. +However, each helper function has a unique name, so it can be uniquely +identified from the `sycl` namespace, where it is called to specialize the +`is_range_kernel` trait. + + == Issues * We're pretty sure that we want to define some syntax that allows a free From c0f2029a3360ea105fbb6859ae98c025a9d120dd Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 19 Dec 2023 13:43:23 -0500 Subject: [PATCH 20/29] Resolve issues about decomposed and optimized args We decided that free function kernels can accept all legal argument types, even those types that need to be decomposed. Remove the open issue and add an implementation note. Do the same for the issue about arguments that have been optimized away. It seems like this can be implemented with similar logic that will be needed for the decomposed arguments. However, this implementation note does not require implementations to perform these optimizations. It merely points out the ramifications. --- ..._ext_oneapi_free_function_kernels.asciidoc | 77 +++++++++++-------- 1 file changed, 45 insertions(+), 32 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index c3216d6ce8ade..9dfeb918fd57a 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -801,6 +801,51 @@ However, each helper function has a unique name, so it can be uniquely identified from the `sycl` namespace, where it is called to specialize the `is_range_kernel` trait. +=== Decomposed kernel arguments + +The {dpcpp} implementation currently "decomposes" certain kernel argument +types, meaning that some argument types are actually passed as several separate +arguments when the SYCL runtime invokes the kernel using the underlying +backend. +For example, `accessor` consists of several internal member variables. +On the OpenCL backend, one of these member variables is `cl_mem`, and OpenCL +restrictions require this variable to be passed directly as an OpenCL kernel +argument. +(It cannot be passed as a member embedded within a structure.) +As a result, {dpcpp} passes each member variable as a separate OpenCL kernel +argument. + +An argument like this that is decomposed is still represent as a single +argument in SYCL source code. +When invoking a free function kernel, the application sets the value of such an +argument with a single call to `handler::set_arg`. +For example, the application sets the value of an `accessor` by calling +`set_arg(acc)`, where `acc` is a variable of type `accessor`. + +It is the responsibility of the implementation to translate these calls to +`set_arg` into multiple backend argument-setting calls when necessary. +For example, a call to `set_arg(acc)` may actually result in several OpenCL +calls to `clSetKernelArg`, one for each of the member variables in `accessor`. + +=== Kernel arguments that are optimized away + +The {dpcpp} implementation currently has the ability to optimize away unused +kernel arguments. +For example, if a kernel is declared to take an argument `foo` which is never +used by the kernel, the implementation may eliminate the argument entirely and +avoid calling the backend argument-setting API. +It is still possible to perform these sorts of optimizations for a free +function kernel, but the logic inside of `handler::set_arg` needs to know when +an argument has been optimized away. + +Of course, the application is still responsible for calling `set_arg` for all +kernel arguments, even if the implementation has optimized the argument away. +(The application has no way of knowing whether the optimization has been +performed.) +Therefore, `set_arg` must know whether the argument has been optimized away, +and it must not call the underlying backend argument-setting API for such an +argument, effectively turning the call into a no-op. + == Issues @@ -837,38 +882,6 @@ int main() { + Where `kfp` would have some nicer name. -* We need to investigate whether there will be problems passing kernel - arguments that are "decomposed" by the front-end. - For example, if a kernel argument is a struct that contains an accessor as a - member variable, the front-end decomposes the struct, passing each member - variable as a separate kernel argument. - We could still support arguments like this if `handler::set_arg` is smart - enough to also do the decomposition, passing multiple arguments when the - argument type requires decomposition. - If this is too difficult to implement, we could restrict the arguments to - only those types that do not require decomposition, however this would be a - big limitation because `accessor` and `local_accessor` would both be - prohibited. - If we add this restriction, the front-end should diagnose an error if a - free function kernel is defined to take such an argument. - -* There is a similar problem with kernel arguments that have been optimized. - Consider a kernel that uses an `accessor`. - Each member variable is passed as a separate kernel argument, but any members - that are unused in the kernel are optimized away, thus they have no - corresponding kernel argument. - Again, we can handle this by making `handler::set_arg` smart enough to know - which member variables have been optimized away. - Alternatively, we can disable these optimizations for free function kernels. - We also have to consider the behavior when a free function kernel has a - formal parameter that is unused (or optimized away) inside the kernel. - Can the compiler optimize away such an argument? - If so, `handler::set_arg` would need to be smart enough to treat an attempt - to set the value of such an argument as a no-op. - The `handler::set_arg` function would also need to account for this when - interpreting the argument index of arguments that follow an optimized-away - argument. - * We currently say it is UB if there is a mismatch between a free function kernel's type or dimensionality and the call to `parallel_for` or `single_task`. From 0fe104f983c9602f638681c2e08b377f99470676 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 20 Dec 2023 15:38:35 -0500 Subject: [PATCH 21/29] Add issue about using function properties --- ..._ext_oneapi_free_function_kernels.asciidoc | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 9dfeb918fd57a..295228488faa3 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -882,6 +882,25 @@ int main() { + Where `kfp` would have some nicer name. +* We're not sure we like the current syntax with the macros like + `SYCL_EXT_ONEAPI_RANGE_KERNEL`. + Instead, we'd prefer to use the general function property syntax, which would + lead to code like this: ++ +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) +void iota(float start, float *ptr) { /*...*/ } +``` ++ +Although this is more verbose, it uses the existing property mechanism rather +than adding a new macro. ++ +The downside of this approach is that it requires the compiler front end to +recognize the specific property named `range_kernel` and treat this as an +indicator that the function is a kernel. +Currently, the front end does not recognize any specific properties. +Instead, it just has generic code that propagates the properties into the IR. + * We currently say it is UB if there is a mismatch between a free function kernel's type or dimensionality and the call to `parallel_for` or `single_task`. From aa94401c618610c6c8e375ec94540ec4a1951aa8 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 10 Jan 2024 16:00:58 -0500 Subject: [PATCH 22/29] Use property instead of macro We decided that it is OK for the front end to recognize some of the compile time properties. Therefore, we can identify a function as a kernel by decorating it with a property, rather than using a special macro. --- ..._ext_oneapi_free_function_kernels.asciidoc | 235 +++++++++++------- 1 file changed, 143 insertions(+), 92 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 295228488faa3..9a1a7dc41f71c 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -21,7 +21,7 @@ == Notice [%hardbreaks] -Copyright (C) 2023-2023 Intel Corporation. All rights reserved. +Copyright (C) 2023-2024 Intel Corporation. All rights reserved. Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. @@ -112,11 +112,10 @@ supports. === Defining a free function kernel A free function kernel is a normal C++ function definition, where the function -declaration is decorated with one of the following macros: -`SYCL_EXT_ONEAPI_RANGE_KERNEL`, `SYCL_EXT_ONEAPI_ND_RANGE_KERNEL`, or -`SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL`. +declaration is decorated with one of the following compile-time properties: +`range_kernel`, `nd_range_kernel`, or `single_task_kernel`. -When a function declaration is decorated with one of these macros, the +When a function declaration is decorated with one of these properties, the following rules must be observed: * The function must be declared at either namespace scope or at class scope as @@ -135,72 +134,135 @@ following rules must be observed: * All declarations of the function must provide no default parameter values. -* The position of the macro decoration must be the same as the position allowed - for C++ attribute decorations on the function declaration. - For example: -+ -``` -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) -void iota(float start, float *ptr); -``` - -* The macro decoration must appear on the first declaration of the function - in the translation unit. +* The property must appear on the first declaration of the function in the + translation unit. Redeclarations of the function may optionally be decorated with the same - macro if the macro arguments are the same. + property if the property argument is the same. The effect is the same regardless of whether redeclarations are so decorated. -* The same function may be decorated with at most one of these macros. +* The same function may be decorated with at most one of these properties. However, the same function may be decorated multiple times with the same - macro (with the same arguments). - Programs that decorate the same function with more than one of these macros - or with multiple instances of the same macro with different arguments are ill - formed. + property (with the same argument). + Programs that decorate the same function with more than one of these + properties or with multiple instances of the same property with different + argument are ill formed. -* If a function is decorated with one of these macros in one translation unit, - any other translation unit that declares the same function must also decorate - the function with the same macro (with the same arguments). +* If a function is decorated with one of these properties in one translation + unit, any other translation unit that declares the same function must also + decorate the function with the same property (with the same argument). -The following table provides additional details about these macros: +The following table provides additional details about these compile-time +properties. |==== a| +*`range_kernel` property* + [frame=all,grid=none] !==== a! [source] ---- -#define SYCL_EXT_ONEAPI_RANGE_KERNEL(Dims) /*unspecified*/ +namespace sycl::ext::oneapi::experimental { + +struct range_kernel_key { + template + using value_t = property_value; +}; + +template +inline constexpr range_kernel_key::value_t range_kernel; + +template<> +struct is_property_key : std::true_type {}; + +} // namespace sycl::ext::oneapi::experimental ---- !==== Indicates that the function is a free function kernel that is invoked with a simple `range` iteration space of `Dims` dimensions. -The `Dims` parameter must be an integral constant expression that specifies a -valid number of dimensions for the `range` class. + +The `property_value` struct has the following member variables: + +[%header,cols="1,1"] +!==== +!Member +!Description + +a! +[source] +---- +static constexpr int dimensions = Dims +---- +! +The number of dimensions of the kernel's range. +!==== a| +*`nd_range_kernel` property* + [frame=all,grid=none] !==== a! [source] ---- -#define SYCL_EXT_ONEAPI_ND_RANGE_KERNEL(Dims) /*unspecified*/ +namespace sycl::ext::oneapi::experimental { + +struct nd_range_kernel_key { + template + using value_t = property_value; +}; + +template +inline constexpr nd_range_kernel_key::value_t nd_range_kernel; + +template<> +struct is_property_key : std::true_type {}; + +} // namespace sycl::ext::oneapi::experimental ---- !==== Indicates that the function is a free function kernel that is invoked with an `nd_range` iteration space of `Dims` dimensions. -The `Dims` parameter must be an integral constant expression that specifies a -valid number of dimensions for the `nd_range` class. + +The `property_value` struct has the following member variables: + +[%header,cols="1,1"] +!==== +!Member +!Description + +a! +[source] +---- +static constexpr int dimensions = Dims +---- +! +The number of dimensions of the kernel's range. +!==== a| +*`single_task_kernel` property* + [frame=all,grid=none] !==== a! [source] ---- -#define SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL /*unspecified */ +namespace sycl::ext::oneapi::experimental { + +struct single_task_kernel_key { + using value_t = property_value; +}; + +inline constexpr single_task_kernel_key::value_t single_task_kernel; + +template<> +struct is_property_key : std::true_type {}; + +} // namespace sycl::ext::oneapi::experimental ---- !==== @@ -215,15 +277,15 @@ The following example demonstrates how a free function kernel using a 3-dimensional nd-range iteration space can be defined: ``` -SYCL_EXT_ONEAPI_ND_RANGE_KERNEL(3) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::nd_range_kernel<3>)) void iota(float start, float *ptr) { // ... } ``` -A function decorated with one of these macros can still be called as a normal -function in either host or device code. -The macro has no effect in such cases. +A function decorated with one of these properties can still be called as a +normal function in either host or device code. +The property has no effect in such cases. === New traits for kernel functions @@ -250,8 +312,8 @@ inline constexpr bool is_range_kernel_v = is_range_kernel::value; !==== If `Func` is the address of a function whose declaration is decorated with the -`SYCL_EXT_ONEAPI_RANGE_KERNEL(Dims)` macro, the `is_range_kernel` trait -provides the member constant `value` equal to `true`. +`range_kernel` property, the `is_range_kernel` trait provides +the member constant `value` equal to `true`. Otherwise `value` is `false`. The helper trait `is_range_kernel_v` provides the value of `value`. @@ -275,7 +337,7 @@ inline constexpr bool is_nd_range_kernel_v = is_nd_range_kernel::val !==== If `Func` is the address of a function whose declaration is decorated with the -`SYCL_EXT_ONEAPI_ND_RANGE_KERNEL(Dims)` macro, the `is_nd_range_kernel` trait +`nd_range_kernel` property, the `is_nd_range_kernel` trait provides the member constant `value` equal to `true`. Otherwise `value` is `false`. @@ -300,8 +362,8 @@ inline constexpr bool is_single_task_kernel_v = is_single_task_kernel::val !==== If `Func` is the address of a function whose declaration is decorated with the -`SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL` macro, the `is_single_task_kernel` trait -provides the member constant `value` equal to `true`. +`single_task_kernel` property, the `is_single_task_kernel` trait provides +the member constant `value` equal to `true`. Otherwise `value` is `false`. The helper trait `is_single_task_kernel_v` provides the value of `value`. @@ -325,9 +387,9 @@ inline constexpr bool is_kernel_v = is_kernel::value; !==== If `Func` is the address of a function whose declaration is decorated with any -of the macros `SYCL_EXT_ONEAPI_RANGE_KERNEL`, -`SYCL_EXT_ONEAPI_ND_RANGE_KERNEL`, or `SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL`; the -`is_kernel` trait provides the member constant `value` equal to `true`. +of the properties `range_kernel`, `nd_range_kernel`, or `single_task_kernel`; +the `is_kernel` trait provides the member constant `value` equal to +`true`. Otherwise `value` is `false`. The helper trait `is_kernel_v` provides the value of `value`. @@ -532,13 +594,12 @@ Once the application obtains a `kernel` object for a free function kernel, it can enqueue the kernel to a device using any of the SYCL functions that allow a kernel to be enqueued via a `kernel` object. The application must enqueue the free function kernel according to its type. -For example, a free function kernel defined via `SYCL_EXT_ONEAPI_RANGE_KERNEL` -can be enqueued by calling the `handler::parallel_for` overload taking a -`range`. -A free function kernel defined via `SYCL_EXT_ONEAPI_ND_RANGE_KERNEL` can be -enqueued by calling the `handler::parallel_for` overload taking an `nd_range`. -A free function kernel defined via `SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL` can be -enqueued by calling `handler::single_task`. +For example, a free function kernel defined via `range_kernel` can be enqueued +by calling the `handler::parallel_for` overload taking a `range`. +A free function kernel defined via `nd_range_kernel` can be enqueued by calling +the `handler::parallel_for` overload taking an `nd_range`. +A free function kernel defined via `single_task_kernel` can be enqueued by +calling `handler::single_task`. Attempting to enqueue a free function kernel using a mechanism that does not match its type results in undefined behavior. @@ -591,18 +652,17 @@ with these properties by applying the properties to the function declaration as illustrated below. ``` -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::work_group_size<32>) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::work_group_size<32>)) void iota(float start, float *ptr) { // ... } ``` -The `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` decorations may appear either before or -after the macro decorations that identify the function as a free function -kernel. +The kernel properties may appear either before or after the `range_kernel`, +`nd_range_kernel`, or `single_task_kernel` property. -As with standard SYCL kernels, these properties can be queried via +As with standard SYCL kernels, these kernel properties can be queried via `kernel::get_info` using either the `info::kernel::attributes` information descriptor or the `info::kernel_device_specific` information descriptors. @@ -633,17 +693,27 @@ enum myenum : int; // Each kernel is forward declared in the same namespace in which the // application declares it. -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel1(int *); -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel2(mystruct, myenum); +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel1(int *); +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel2(mystruct, myenum); template -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel3(T *); +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel3(T *); namespace ns { -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) void kernel4(int *); +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel4(int *); } ``` +(The lines using `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` are exposition-only. +Implementations will probably emit some implementation-specific code here +instead of using the macro because the macro and the `range_kernel` property +are probably defined in the `` header, which does not get +get included until after the integration header.) + As a result, these implementations impose additional restrictions for functions that are declared as free function kernels: @@ -668,7 +738,7 @@ void caller() { // in the integration header } -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) void foo(float) {/*...*/} ``` @@ -682,7 +752,7 @@ The allowed types are: * A type alias to one of the above types. -== Example +== Examples === Basic invocation @@ -694,7 +764,7 @@ namespace syclex = sycl::ext::oneapi::experimental; static constexpr size_t NUM = 1024; -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) void iota(float start, float *ptr) { // Get the ID of this kernel iteration. size_t id = syclex::this_kernel::get_id(); @@ -734,17 +804,17 @@ cases. namespace syclex = sycl::ext::oneapi::experimental; template -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) void iota(T start, T *ptr) { // ... } -SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::single_task_kernel)) void ping(float *x) { // ... } -SYCL_EXT_ONEAPI_SINGLE_TASK_KERNEL +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::single_task_kernel)) void ping(int *x) { // ... } @@ -777,12 +847,12 @@ In order to avoid problems where functions with the same name in different namespaces "shadow" each other, the structure can look like this: ``` -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) void same_name(int arg1); static constexpr auto __sycl_shim1() {return (void(*)(int))same_name;} inline namespace { - SYCL_EXT_ONEAPI_RANGE_KERNEL(1) + SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) void same_name(int arg1); static constexpr auto __sycl_shim2() {return (void(*)(int))same_name;} } @@ -857,7 +927,7 @@ argument, effectively turning the call into a no-op. One option is like this: + ``` -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) void iota(float start, float *ptr) { /*...*/ } int main() { @@ -870,7 +940,7 @@ int main() { Another option is like this: + ``` -SYCL_EXT_ONEAPI_RANGE_KERNEL(1) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) void iota(float start, float *ptr) { /*...*/ } int main() { @@ -882,25 +952,6 @@ int main() { + Where `kfp` would have some nicer name. -* We're not sure we like the current syntax with the macros like - `SYCL_EXT_ONEAPI_RANGE_KERNEL`. - Instead, we'd prefer to use the general function property syntax, which would - lead to code like this: -+ -``` -SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclex::range_kernel<1>) -void iota(float start, float *ptr) { /*...*/ } -``` -+ -Although this is more verbose, it uses the existing property mechanism rather -than adding a new macro. -+ -The downside of this approach is that it requires the compiler front end to -recognize the specific property named `range_kernel` and treat this as an -indicator that the function is a kernel. -Currently, the front end does not recognize any specific properties. -Instead, it just has generic code that propagates the properties into the IR. - * We currently say it is UB if there is a mismatch between a free function kernel's type or dimensionality and the call to `parallel_for` or `single_task`. @@ -916,6 +967,6 @@ Instead, it just has generic code that propagates the properties into the IR. * We currently say it is UB if a free function kernel is enqueued without setting a value for each of its arguments. Should we go a step further and require an exception in this case (again - probably `errc::invalid`). + probably `errc::invalid`)? Again, I think we should decide during implementation whether the overhead is minimal enough that we can mandate an error in the spec. From 796a3022744f7c42f2a1dc4a4477959e80a75f75 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 10 Jan 2024 17:30:13 -0500 Subject: [PATCH 23/29] Add include of Somehow we forgot to add this `#include` to the examples. --- .../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 9a1a7dc41f71c..71d2b91c10732 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -760,6 +760,7 @@ The following example demonstrates how to define a free function kernel and then enqueue it on a device. ``` +#include namespace syclex = sycl::ext::oneapi::experimental; static constexpr size_t NUM = 1024; @@ -801,6 +802,7 @@ The following example demonstrates how to get a kernel identifier in such cases. ``` +#include namespace syclex = sycl::ext::oneapi::experimental; template From f9df9a2374c135b0e1b892d20ab35bdaf0fbd9a7 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 17 Jan 2024 09:29:30 -0500 Subject: [PATCH 24/29] Add dependence on properties extension --- .../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 71d2b91c10732..a834e1950137f 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -45,6 +45,8 @@ This extension also depends on the following other SYCL extensions: * link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ sycl_ext_oneapi_free_function_queries] +* link:../experimental/sycl_ext_oneapi_properties.asciidoc[ + sycl_ext_oneapi_properties] == Status From 94f102a458c2206a482ae5fba9368c8eea1e6d63 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 17 Jan 2024 09:38:26 -0500 Subject: [PATCH 25/29] Add dependence on kernel properties extension The macro `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` is defined in this extension. In addition, it's not clear how viable free function kernels are if we do not provide some way to decorate them with things like the required sub-group size. Therefore, it seems like we may as well list the kernel properties extension as a dependency. --- .../sycl_ext_oneapi_free_function_kernels.asciidoc | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index a834e1950137f..adaba710a80cf 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -47,6 +47,8 @@ This extension also depends on the following other SYCL extensions: sycl_ext_oneapi_free_function_queries] * link:../experimental/sycl_ext_oneapi_properties.asciidoc[ sycl_ext_oneapi_properties] +* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ + sycl_ext_oneapi_kernel_properties] == Status @@ -647,11 +649,10 @@ _{endnote}_] === Interaction with kernel properties -If the implementation supports -link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ -sycl_ext_oneapi_kernel_properties], a free function kernel may be decorated -with these properties by applying the properties to the function declaration as -illustrated below. +A free function kernel may also be decorated with any of the properties defined +in link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ +sycl_ext_oneapi_kernel_properties] by applying the properties to the function +declaration as illustrated below. ``` SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) From 579f4610a506ea6373f8cfd2304bb5b7080e6978 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 17 Jan 2024 10:15:12 -0500 Subject: [PATCH 26/29] Forbid setting values for non-existent kernel args --- .../sycl_ext_oneapi_free_function_kernels.asciidoc | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index adaba710a80cf..246ec93594abd 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -616,11 +616,15 @@ arguments when the kernel is enqueued. For example, when enqueuing a kernel with `handler::parallel_for` or `handler::single_task`, the kernel argument values must be set via `handler::set_arg` or `handler::set_args`. -Failing to set the value of a kernel argument results in undefined behavior. -The type of the value passed to `handler::set_arg` or `handler::set_args` must -be the same as the type of the corresponding formal parameter in the free -function kernel. -Passing a value with a mismatched type results in undefined behavior. +The application must abide by the following rules, otherwise the behavior is +undefined: + +* The application must set a value for each kernel argument. +* The application must not set a value for a kernel argument that does not + exist (e.g. specifying an argument index to `handler::set_arg` that is out of + range). +* The type of the expression used to set the argument's value must match the + type of the corresponding formal parameter in the free function kernel. === Obtaining the iteration id for a kernel From ecefa0091901f28cde063dc5647162af93157b20 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Sun, 21 Jan 2024 11:37:53 -0500 Subject: [PATCH 27/29] Add issue about initial iteration index parameter Add an issue to capture our discussion about whether we should support a free function kernel syntax where the initial parameter is the iteration index. --- ..._ext_oneapi_free_function_kernels.asciidoc | 54 +++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index 246ec93594abd..e75f3697f97b6 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -961,6 +961,60 @@ int main() { + Where `kfp` would have some nicer name. +* We are debating whether we should allow a free function kernel to be defined + with an initial "iteration index" parameter such as: ++ +-- +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(sycl::item<1> item, float start, float *ptr) { /*...*/ } +``` + +The advantage is that the user wouldn't need to use the functions in +link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ +sycl_ext_oneapi_free_function_queries] to get the iteration index. +Doing this raises some new questions, though: + +** When the application sets the value of a kernel parameter via `set_arg`, + does argument index `0` correspond to the `item` or to the first parameter + after `item`? + For example, to set the value of `start` in the example above, does the + application call `+set_arg(0, ...)+` or `+set_arg(1, ...)+`? + Both seem like reasonable choices, so many users may need to read the + documentation to determine what is right. + +** If the first parameter is an index like `sycl::item<1>`, then the property + `syclex::range_kernel<1>` is somewhat redundant. + Should the compiler raise a diagnostic if they do not match? + Or, should we invent a new property like: ++ +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::kernel_function)) +void iota(sycl::item<1> item, float start, float *ptr) { /*...*/ } +``` + +** In standard SYCL kernels, the iteration index can be anything that is + convertible from `sycl::item` or `sycl::nd_item`. + For example, it is common to use `id` for range kernels or `int` for + 1-dimensional range kernels. + However, both `id` and `int` can also be used as kernel parameters. + Therefore, something like this is ambiguous: ++ +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(sycl::id<1> i, float start, float *ptr) { /*...*/ } +``` ++ +Is `i` the kernel's iteration index, or is it simply a kernel argument whose +type is `sycl::id`? +-- ++ +We agreed that we _do_ need to support free function kernels that do not have +an initial iteration index parameter (as this spec is currently written) +because this is necessary when migrating some CUDA code. +Therefore, the question is whether we _also_ want to support a syntax where the +first parameter is an iteration index. + * We currently say it is UB if there is a mismatch between a free function kernel's type or dimensionality and the call to `parallel_for` or `single_task`. From 4d4535332b85e2e9ff88828b5f8baf9bd16a4a0d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Sun, 21 Jan 2024 11:54:44 -0500 Subject: [PATCH 28/29] Add implementation note about diagnostics Address the review comment about requiring an implementation to emit a diagnostic if the kernel declaration violates the restrictions that the spec mandates. We're not sure yet if the spec should mandate a diagnostic, but we do think that a high quality implementation should do this. Therefore, add an implementation note stating that we expect DPC++ to issue a diagnostic in these cases. --- ...ycl_ext_oneapi_free_function_kernels.asciidoc | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index e75f3697f97b6..f046d188ca810 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -844,6 +844,22 @@ int main() { == Implementation notes +=== Compiler diagnostics + +My expectation is that {dpcpp} will emit a diagnostic if a function is +decorated as a free function kernel (e.g. via `syclex::range_kernel`) and the +function violates any of the restrictions listed above under "Defining a free +function kernel". +(Except, of course, no diagnostic is required for violations of the last bullet +because that cannot be diagnosed when compiling a single translation unit.) + +It is probably not practical to diagnose violations for all the extra +restrictions listed under "Restrictions for integration header +implementations". +However, we should diagnose as many as are practical. +In particular, it seems easy to emit a diagnostic if a free function kernel +is defined as a static member function. + === Integration header My expectation is that {dpcpp} will use the integration header to implement the From 6ed0ec5e4f889c7c6b92f616313437e293a18ab9 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Sun, 21 Jan 2024 12:17:53 -0500 Subject: [PATCH 29/29] Add issue about requiring diagnostics --- .../sycl_ext_oneapi_free_function_kernels.asciidoc | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc index f046d188ca810..bb4e0f26aeaac 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -1031,6 +1031,14 @@ because this is necessary when migrating some CUDA code. Therefore, the question is whether we _also_ want to support a syntax where the first parameter is an iteration index. +* Should the spec require an implementation to emit a diagnostic if a free + function kernel violates the restrictions listed in "Defining a free function + kernel"? + For now, I've listed this under "Implementation notes" because I expect + {dpcpp} to emit a diagnostic in this case. + We should decide if it is reasonable to require a diagnostic for all + implementations of this extensions. + * We currently say it is UB if there is a mismatch between a free function kernel's type or dimensionality and the call to `parallel_for` or `single_task`.