diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc new file mode 100644 index 0000000000000..bb4e0f26aeaac --- /dev/null +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc @@ -0,0 +1,1059 @@ += sycl_ext_oneapi_free_function_kernels + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] +:endnote: —{nbsp}end{nbsp}note + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) 2023-2024 Intel Corporation. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. +OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 8 specification. +All references below to the "core SYCL specification" or to section numbers in +the SYCL specification refer to that revision. + +This extension also depends on the following other SYCL extensions: + +* link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ + sycl_ext_oneapi_free_function_queries] +* link:../experimental/sycl_ext_oneapi_properties.asciidoc[ + sycl_ext_oneapi_properties] +* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ + sycl_ext_oneapi_kernel_properties] + + +== Status + +This is a proposed extension specification, intended to gather community +feedback. +Interfaces defined in this specification may not be implemented yet or may be +in a preliminary state. +The specification itself may also change in incompatible ways before it is +finalized. +*Shipping software products should not rely on APIs defined in this +specification.* + + +== Overview + +This extension introduces a new way to define a kernel as a simple C++ +function, where the kernel arguments are parameters to the function. +This is different from standard SYCL kernels, where the kernel arguments are +either captures of a lambda expression or member variables of a callable +object. + +The primary motivation for this extension is the +link:../proposed/sycl_ext_oneapi_kernel_compiler.asciidoc[ +sycl_ext_oneapi_kernel_compiler], which allows online compilation of a kernel +from source code. +These kernels must have a clean separation between host and device code, and +they must have a defined order to their arguments because the application sets +the argument values by their "index" via `handler::set_arg(index, value)`. +Because variables captured by lambda expressions have no defined order, we need +some other way to define kernel arguments for use with the +sycl_ext_oneapi_kernel_compiler extension, and the +sycl_ext_oneapi_free_function_kernels extension provides that mechanism. +The "free function kernel" feature is a separate extension, though, because it +can also be used independently from sycl_ext_oneapi_kernel_compiler. +For example, some users may find it more familiar to define kernels as plain +functions or they might have other reasons to prefer a clean separation between +host and device code. + + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. +An implementation supporting this extension must predefine the macro +`SYCL_EXT_ONEAPI_FREE_FUNCTION_KERNELS` +to one of the values defined in the table below. +Applications can test for the existence of this macro to determine if the +implementation supports this feature, or applications can test the macro's +value to determine which of the extension's features the implementation +supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|The APIs of this experimental extension are not versioned, so the + feature-test macro always has this value. +|=== + +=== Defining a free function kernel + +A free function kernel is a normal C++ function definition, where the function +declaration is decorated with one of the following compile-time properties: +`range_kernel`, `nd_range_kernel`, or `single_task_kernel`. + +When a function declaration is decorated with one of these properties, the +following rules must be observed: + +* The function must be declared at either namespace scope or at class scope as + a static member function. + +* The function's return type must be `void`. + +* The function must not accept variadic arguments. + +* Each of the function's arguments must have a type that is an allowed kernel + parameter type as specified in section 4.12.4 "Rules for parameter passing to + kernels" of the core SYCL specification. + The function must not be declared with parameters of type `reducer` or + `kernel_handler`. + These special kernel arguments cannot be passed to a free function kernel. + +* All declarations of the function must provide no default parameter values. + +* The property must appear on the first declaration of the function in the + translation unit. + Redeclarations of the function may optionally be decorated with the same + property if the property argument is the same. + The effect is the same regardless of whether redeclarations are so decorated. + +* The same function may be decorated with at most one of these properties. + However, the same function may be decorated multiple times with the same + property (with the same argument). + Programs that decorate the same function with more than one of these + properties or with multiple instances of the same property with different + argument are ill formed. + +* If a function is decorated with one of these properties in one translation + unit, any other translation unit that declares the same function must also + decorate the function with the same property (with the same argument). + +The following table provides additional details about these compile-time +properties. + +|==== +a| +*`range_kernel` property* + +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +struct range_kernel_key { + template + using value_t = property_value; +}; + +template +inline constexpr range_kernel_key::value_t range_kernel; + +template<> +struct is_property_key : std::true_type {}; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +Indicates that the function is a free function kernel that is invoked with a +simple `range` iteration space of `Dims` dimensions. + +The `property_value` struct has the following member variables: + +[%header,cols="1,1"] +!==== +!Member +!Description + +a! +[source] +---- +static constexpr int dimensions = Dims +---- +! +The number of dimensions of the kernel's range. +!==== + +a| +*`nd_range_kernel` property* + +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +struct nd_range_kernel_key { + template + using value_t = property_value; +}; + +template +inline constexpr nd_range_kernel_key::value_t nd_range_kernel; + +template<> +struct is_property_key : std::true_type {}; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +Indicates that the function is a free function kernel that is invoked with an +`nd_range` iteration space of `Dims` dimensions. + +The `property_value` struct has the following member variables: + +[%header,cols="1,1"] +!==== +!Member +!Description + +a! +[source] +---- +static constexpr int dimensions = Dims +---- +! +The number of dimensions of the kernel's range. +!==== + +a| +*`single_task_kernel` property* + +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +struct single_task_kernel_key { + using value_t = property_value; +}; + +inline constexpr single_task_kernel_key::value_t single_task_kernel; + +template<> +struct is_property_key : std::true_type {}; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +Indicates that the function is a free function kernel that is invoked via +`single_task` (i.e. without any iteration space). +|==== + +When a function is defined as a free function kernel, each parameter to the +function is a kernel argument. + +The following example demonstrates how a free function kernel using a +3-dimensional nd-range iteration space can be defined: + +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::nd_range_kernel<3>)) +void iota(float start, float *ptr) { + // ... +} +``` + +A function decorated with one of these properties can still be called as a +normal function in either host or device code. +The property has no effect in such cases. + +=== New traits for kernel functions + +This extension defines the following traits that can be used to tell whether a +function is declared as a free function kernel. + +|==== +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_range_kernel; + +template +inline constexpr bool is_range_kernel_v = is_range_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with the +`range_kernel` property, the `is_range_kernel` trait provides +the member constant `value` equal to `true`. +Otherwise `value` is `false`. + +The helper trait `is_range_kernel_v` provides the value of `value`. + +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_nd_range_kernel; + +template +inline constexpr bool is_nd_range_kernel_v = is_nd_range_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with the +`nd_range_kernel` property, the `is_nd_range_kernel` trait +provides the member constant `value` equal to `true`. +Otherwise `value` is `false`. + +The helper trait `is_nd_range_kernel_v` provides the value of `value`. + +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_single_task_kernel; + +template +inline constexpr bool is_single_task_kernel_v = is_single_task_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with the +`single_task_kernel` property, the `is_single_task_kernel` trait provides +the member constant `value` equal to `true`. +Otherwise `value` is `false`. + +The helper trait `is_single_task_kernel_v` provides the value of `value`. + +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +struct is_kernel; + +template +inline constexpr bool is_kernel_v = is_kernel::value; + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +If `Func` is the address of a function whose declaration is decorated with any +of the properties `range_kernel`, `nd_range_kernel`, or `single_task_kernel`; +the `is_kernel` trait provides the member constant `value` equal to +`true`. +Otherwise `value` is `false`. + +The helper trait `is_kernel_v` provides the value of `value`. +|==== + + +=== New kernel bundle member functions + +This extension adds the following new functions which add kernel bundle support +for free function kernels. + +[_Note:_ Many of the functions in this section have a template parameter +`Func`, which identifies a free function kernel. +This kernel function may be defined in any translation unit in the application. +_{endnote}_] + +|==== +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template +kernel_id get_kernel_id(); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Constraints_: Available only if `is_kernel_v` is `true`. + +_Returns:_ The kernel identifier that is associated with that kernel. + +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template // (1) +kernel_bundle get_kernel_bundle(const context& ctxt); + +template // (2) +kernel_bundle get_kernel_bundle(const context& ctxt, + const std::vector& devs); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Constraints_: Available only if `is_kernel_v` is `true`. + +_Returns: (1)_ The same value as +`get_kernel_bundle(ctxt, ctxt.get_devices(), {get_kernel_id()})`. + +_Returns: (2)_ The same value as +`get_kernel_bundle(ctxt, devs, {get_kernel_id()})`. + +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template // (1) +bool has_kernel_bundle(const context& ctxt); + +template // (2) +bool has_kernel_bundle(const context& ctxt, const std::vector& devs); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Constraints_: Available only if `is_kernel_v` is `true`. + +_Returns: (1)_ The same value as +`has_kernel_bundle(ctxt, {get_kernel_id()})`. + +_Returns: (2)_ The same value as +`has_kernel_bundle(ctxt, devs, {get_kernel_id()})`. + +!==== +a! +[source] +---- +namespace sycl::ext::oneapi::experimental { + +template bool is_compatible(const device& dev); + +} // namespace sycl::ext::oneapi::experimental +---- +!==== + +_Constraints_: Available only if `is_kernel_v` is `true`. + +_Returns:_ The same value as +`is_compatible({get_kernel_id()}, dev)`. + +|==== + +This extension also adds the following new member functions to the +`kernel_bundle` class: + +``` +namespace sycl { + +template +class kernel_bundle { + // ... + + template + bool ext_oneapi_has_kernel(); + + template + bool ext_oneapi_has_kernel(const device &dev); + + template + kernel ext_oneapi_get_kernel(); +}; + +} // namespace sycl +``` + +|==== +a| +[frame=all,grid=none] +!==== +a! +[source] +---- +template // (1) +bool ext_oneapi_has_kernel() + +template // (2) +bool ext_oneapi_has_kernel(const device &dev) +---- +!==== + +_Constraints_: Available only if `is_kernel_v` is `true`. + +_Returns: (1)_: The value `true` only if the kernel bundle contains the free +function kernel whose address is `Func`. + +_Returns: (2)_: The value `true` only if the kernel bundle contains the free +function kernel whose address is `Func` and if that kernel is compatible with +the device `dev`. + +!==== +a! +[source] +---- +template +kernel ext_oneapi_get_kernel() +---- +!==== + +_Constraints:_ Available only if `State` is `bundle_state::executable` and if +`is_kernel_v` is `true`. + +_Returns:_ If the kernel whose address is `Func` resides in this kernel bundle, +returns the `kernel` object representing that kernel. + +_Throws_: An `exception` with the error code `errc::invalid` if the kernel with +address `Func` does not reside in this kernel bundle. +|==== + +=== Behavior with kernel bundle functions in the core SYCL specification + +Free function kernels that are defined by the application have a corresponding +kernel identifier (`kernel_id`) and are contained by the device images in the +SYCL application. +This section defines the ramifications this has on the kernel bundle functions +defined by the core SYCL specification. + +* The function `get_kernel_ids()` returns the kernel identifiers for any free + function kernels defined by the application, in addition to identifiers for + any kernels defined as lambda expressions or named kernel objects. + +* The kernel bundle returned by + `get_kernel_bundle(const context&, const std::vector& devs)` contains + all of the free function kernels defined by the application that are + compatible with at least one of the devices in `devs`, in addition to all of + the kernels defined as lambda expressions or named kernel objects that are + compatible with one of these devices. + +* The function `has_kernel_bundle(const context&, const std::vector&)` + considers free function kernels defined by the application when computing its + return value. + +The information descriptor `info::kernel::num_args` may be used to query a +`kernel` object that represents a free function kernel. +The return value tells the number of formal parameters in the function's +definition. + +=== Enqueuing a free function kernel and setting parameter values + +Once the application obtains a `kernel` object for a free function kernel, it +can enqueue the kernel to a device using any of the SYCL functions that allow +a kernel to be enqueued via a `kernel` object. +The application must enqueue the free function kernel according to its type. +For example, a free function kernel defined via `range_kernel` can be enqueued +by calling the `handler::parallel_for` overload taking a `range`. +A free function kernel defined via `nd_range_kernel` can be enqueued by calling +the `handler::parallel_for` overload taking an `nd_range`. +A free function kernel defined via `single_task_kernel` can be enqueued by +calling `handler::single_task`. + +Attempting to enqueue a free function kernel using a mechanism that does not +match its type results in undefined behavior. +Attempting to enqueue a free function kernel with a `range` or `nd_range` whose +dimensionality does not match the free function kernel definition results in +undefined behavior. + +The application is also responsible for setting the values of any kernel +arguments when the kernel is enqueued. +For example, when enqueuing a kernel with `handler::parallel_for` or +`handler::single_task`, the kernel argument values must be set via +`handler::set_arg` or `handler::set_args`. +The application must abide by the following rules, otherwise the behavior is +undefined: + +* The application must set a value for each kernel argument. +* The application must not set a value for a kernel argument that does not + exist (e.g. specifying an argument index to `handler::set_arg` that is out of + range). +* The type of the expression used to set the argument's value must match the + type of the corresponding formal parameter in the free function kernel. + +=== Obtaining the iteration id for a kernel + +In a standard SYCL kernel, the iteration ID is passed as a parameter to the +kernel's callable object. +However, this is not the case for a free function kernel because the function +parameters are used to pass the kernel arguments instead. +Therefore, a free function kernel must obtain the iteration ID in some other +way. +Typically, a free function kernel uses the functions specified in +link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ +sycl_ext_oneapi_free_function_queries] for this purpose. + +=== Address space of kernel arguments + +The arguments to a free function kernel are in the private address space. +As a result, a kernel can modify its arguments, but the modification is visible +only within the work-item. + +[_Note:_ This applies only to the arguments themselves, not to memory that the +arguments point to. +For example, with a USM pointer argument, the pointer argument itself is in the +private address space, but the memory it points to is in the global address +space. +_{endnote}_] + +=== Interaction with kernel properties + +A free function kernel may also be decorated with any of the properties defined +in link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ +sycl_ext_oneapi_kernel_properties] by applying the properties to the function +declaration as illustrated below. + +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::work_group_size<32>)) +void iota(float start, float *ptr) { + // ... +} +``` + +The kernel properties may appear either before or after the `range_kernel`, +`nd_range_kernel`, or `single_task_kernel` property. + +As with standard SYCL kernels, these kernel properties can be queried via +`kernel::get_info` using either the `info::kernel::attributes` information +descriptor or the `info::kernel_device_specific` information descriptors. + +=== Restrictions for integration header implementations + +[_Note:_ The {dpcpp} implementation of this extension currently has the +restrictions listed in this section. +In the future, restrictions tied to the integration header approach might be +formalized in the core SYCL specification and tied to a macro, similar to the +feature set macros that exist already. +_{endnote}_] + +Implementations of SYCL that use the integration header technique have +additional restrictions for functions that are declared as free function +kernels. +These implementations automatically insert forward declarations of the free +function kernels at the top of the translation unit. +This has ramifications on how the application may declare the free function +kernels, on the types that may be used in those declarations, and on the way +the application may reference these kernel identifiers. +The following example illustrates the forward declarations that the +implementation inserts: + +``` +// Forward declarations of types used by the kernel functions. +struct mystruct; +enum myenum : int; + +// Each kernel is forward declared in the same namespace in which the +// application declares it. +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel1(int *); +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel2(mystruct, myenum); + +template +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel3(T *); + +namespace ns { +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void kernel4(int *); +} +``` + +(The lines using `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` are exposition-only. +Implementations will probably emit some implementation-specific code here +instead of using the macro because the macro and the `range_kernel` property +are probably defined in the `` header, which does not get +get included until after the integration header.) + +As a result, these implementations impose additional restrictions for functions +that are declared as free function kernels: + +* The function must be declared at namespace scope. + +* Any type used in the declaration of a parameter must be one of the allowed + types listed below. + +* If the function is instantiated from a template, any type used to instantiate + the template must be one of the allowed types listed below. + +* Uses of function identifiers in the application must assume that the free + function kernels are forward declared at the top of the translation unit. + Note that this can also affect references to functions that are not declared + as free functions kernels as illustrated below. ++ +``` +void foo(int) {/*...*/} + +void caller() { + auto *pf = foo; // This is ambiguous because foo(float) is forward declared + // in the integration header +} + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void foo(float) {/*...*/} +``` + +The allowed types are: + +* A {cpp} fundamental type. +* A class or struct that is defined at namespace scope. +* A scoped enumeration that is defined at namespace scope. +* An unscoped enumeration that has an explicit underlying type, where the + enumeration is defined at namespace scope. +* A type alias to one of the above types. + + +== Examples + +=== Basic invocation + +The following example demonstrates how to define a free function kernel and then +enqueue it on a device. + +``` +#include +namespace syclex = sycl::ext::oneapi::experimental; + +static constexpr size_t NUM = 1024; + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(float start, float *ptr) { + // Get the ID of this kernel iteration. + size_t id = syclex::this_kernel::get_id(); + + ptr[id] = start + static_cast(id); +} + +void main() { + sycl::queue q; + sycl::context ctxt = q.get_context(); + + // Get a kernel bundle that contains the free function kernel "iota". + auto exe_bndl = + syclex::get_kernel_bundle(ctxt); + + // Get a kernel object for the "iota" function from that bundle. + sycl::kernel k_iota = exe_bndl.ext_oneapi_get_kernel(); + + float *ptr = sycl::malloc_shared(NUM, q); + q.submit([&](sycl::handler &cgh) { + // Set the values of the kernel arguments. + cgh.set_args(3.14f, ptr); + + cgh.parallel_for({NUM}, k_iota); + }).wait(); +} +``` + +=== Free function kernels which are templates or overloaded + +A free function kernel may be defined as a function template. +It is also legal to define several overloads for a free function kernel. +The following example demonstrates how to get a kernel identifier in such +cases. + +``` +#include +namespace syclex = sycl::ext::oneapi::experimental; + +template +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(T start, T *ptr) { + // ... +} + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::single_task_kernel)) +void ping(float *x) { + // ... +} + +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::single_task_kernel)) +void ping(int *x) { + // ... +} + +int main() { + // When the free function kernel is templated, pass the address of a + // specific instantiation. + sycl::kernel_id iota_float = syclex::get_kernel_id>(); + sycl::kernel_id iota_int = syclex::get_kernel_id>(); + + // When there are multiple overloads of a free function kernel, use a cast + // to disambiguate. + sycl::kernel_id ping_float = syclex::get_kernel_id<(void(*)(float))ping>(); + sycl::kernel_id ping_int = syclex::get_kernel_id<(void(*)(int))ping>(); +} +``` + + +== Implementation notes + +=== Compiler diagnostics + +My expectation is that {dpcpp} will emit a diagnostic if a function is +decorated as a free function kernel (e.g. via `syclex::range_kernel`) and the +function violates any of the restrictions listed above under "Defining a free +function kernel". +(Except, of course, no diagnostic is required for violations of the last bullet +because that cannot be diagnosed when compiling a single translation unit.) + +It is probably not practical to diagnose violations for all the extra +restrictions listed under "Restrictions for integration header +implementations". +However, we should diagnose as many as are practical. +In particular, it seems easy to emit a diagnostic if a free function kernel +is defined as a static member function. + +=== Integration header + +My expectation is that {dpcpp} will use the integration header to implement the +traits and the queries like `get_kernel_id()`. +The integration header will probably start with forward declarations of types +used for the parameters to the free function kernels. +Following this, the header can contain forward declarations of the free +function kernels themselves. +In order to avoid problems where functions with the same name in different +namespaces "shadow" each other, the structure can look like this: + +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void same_name(int arg1); +static constexpr auto __sycl_shim1() {return (void(*)(int))same_name;} + +inline namespace { + SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) + void same_name(int arg1); + static constexpr auto __sycl_shim2() {return (void(*)(int))same_name;} +} + +namespace sycl { + template<> struct is_range_kernel<__sycl_shim1()> : std::true_type {}; + template<> struct is_range_kernel<__sycl_shim2()> : std::true_type {}; +} +``` + +The helper functions `+__sycl_shim1+`, etc. avoid the shadowing problem because +they are defined in the same namespace as the user's kernel function. +Thus, the {cpp} unqualified name lookup algorithm, finds the correct function +definition. +However, each helper function has a unique name, so it can be uniquely +identified from the `sycl` namespace, where it is called to specialize the +`is_range_kernel` trait. + +=== Decomposed kernel arguments + +The {dpcpp} implementation currently "decomposes" certain kernel argument +types, meaning that some argument types are actually passed as several separate +arguments when the SYCL runtime invokes the kernel using the underlying +backend. +For example, `accessor` consists of several internal member variables. +On the OpenCL backend, one of these member variables is `cl_mem`, and OpenCL +restrictions require this variable to be passed directly as an OpenCL kernel +argument. +(It cannot be passed as a member embedded within a structure.) +As a result, {dpcpp} passes each member variable as a separate OpenCL kernel +argument. + +An argument like this that is decomposed is still represent as a single +argument in SYCL source code. +When invoking a free function kernel, the application sets the value of such an +argument with a single call to `handler::set_arg`. +For example, the application sets the value of an `accessor` by calling +`set_arg(acc)`, where `acc` is a variable of type `accessor`. + +It is the responsibility of the implementation to translate these calls to +`set_arg` into multiple backend argument-setting calls when necessary. +For example, a call to `set_arg(acc)` may actually result in several OpenCL +calls to `clSetKernelArg`, one for each of the member variables in `accessor`. + +=== Kernel arguments that are optimized away + +The {dpcpp} implementation currently has the ability to optimize away unused +kernel arguments. +For example, if a kernel is declared to take an argument `foo` which is never +used by the kernel, the implementation may eliminate the argument entirely and +avoid calling the backend argument-setting API. +It is still possible to perform these sorts of optimizations for a free +function kernel, but the logic inside of `handler::set_arg` needs to know when +an argument has been optimized away. + +Of course, the application is still responsible for calling `set_arg` for all +kernel arguments, even if the implementation has optimized the argument away. +(The application has no way of knowing whether the optimization has been +performed.) +Therefore, `set_arg` must know whether the argument has been optimized away, +and it must not call the underlying backend argument-setting API for such an +argument, effectively turning the call into a no-op. + + +== Issues + +* We're pretty sure that we want to define some syntax that allows a free + function kernel to be enqueued using the APIs defined in + link:../proposed/sycl_ext_oneapi_enqueue_functions.asciidoc[ + sycl_ext_oneapi_enqueue_functions], but we haven't settled on the exact API + yet. + One option is like this: ++ +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(float start, float *ptr) { /*...*/ } + +int main() { + sycl::queue q; + float *ptr = sycl::malloc_shared(N, q); + sycl::parallel_for(q, {N}, 1.f, ptr); +} +``` ++ +Another option is like this: ++ +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(float start, float *ptr) { /*...*/ } + +int main() { + sycl::queue q; + float *ptr = sycl::malloc_shared(N, q); + sycl::parallel_for(q, {N}, kfp, 1.f, ptr); +} +``` ++ +Where `kfp` would have some nicer name. + +* We are debating whether we should allow a free function kernel to be defined + with an initial "iteration index" parameter such as: ++ +-- +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(sycl::item<1> item, float start, float *ptr) { /*...*/ } +``` + +The advantage is that the user wouldn't need to use the functions in +link:../proposed/sycl_ext_oneapi_free_function_queries.asciidoc[ +sycl_ext_oneapi_free_function_queries] to get the iteration index. +Doing this raises some new questions, though: + +** When the application sets the value of a kernel parameter via `set_arg`, + does argument index `0` correspond to the `item` or to the first parameter + after `item`? + For example, to set the value of `start` in the example above, does the + application call `+set_arg(0, ...)+` or `+set_arg(1, ...)+`? + Both seem like reasonable choices, so many users may need to read the + documentation to determine what is right. + +** If the first parameter is an index like `sycl::item<1>`, then the property + `syclex::range_kernel<1>` is somewhat redundant. + Should the compiler raise a diagnostic if they do not match? + Or, should we invent a new property like: ++ +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::kernel_function)) +void iota(sycl::item<1> item, float start, float *ptr) { /*...*/ } +``` + +** In standard SYCL kernels, the iteration index can be anything that is + convertible from `sycl::item` or `sycl::nd_item`. + For example, it is common to use `id` for range kernels or `int` for + 1-dimensional range kernels. + However, both `id` and `int` can also be used as kernel parameters. + Therefore, something like this is ambiguous: ++ +``` +SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::range_kernel<1>)) +void iota(sycl::id<1> i, float start, float *ptr) { /*...*/ } +``` ++ +Is `i` the kernel's iteration index, or is it simply a kernel argument whose +type is `sycl::id`? +-- ++ +We agreed that we _do_ need to support free function kernels that do not have +an initial iteration index parameter (as this spec is currently written) +because this is necessary when migrating some CUDA code. +Therefore, the question is whether we _also_ want to support a syntax where the +first parameter is an iteration index. + +* Should the spec require an implementation to emit a diagnostic if a free + function kernel violates the restrictions listed in "Defining a free function + kernel"? + For now, I've listed this under "Implementation notes" because I expect + {dpcpp} to emit a diagnostic in this case. + We should decide if it is reasonable to require a diagnostic for all + implementations of this extensions. + +* We currently say it is UB if there is a mismatch between a free function + kernel's type or dimensionality and the call to `parallel_for` or + `single_task`. + Should we go a step further and require an exception to be thrown in these + cases? + I'm a little hesitant to require an error check here because this is on the + critical path for enqueuing a kernel. + However, {dpcpp} is still allowed to throw an exception in this case if the + overhead is not too high (I'd suggest `errc::invalid`). + I think we should decide during implementation whether the overhead is + minimal enough that we can mandate an error in the spec. + +* We currently say it is UB if a free function kernel is enqueued without + setting a value for each of its arguments. + Should we go a step further and require an exception in this case (again + probably `errc::invalid`)? + Again, I think we should decide during implementation whether the overhead is + minimal enough that we can mandate an error in the spec.