From d981d372b062bb9e561cb9f46fd7d0b697ba1b66 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Wed, 7 Feb 2024 10:09:53 +0000 Subject: [PATCH 1/2] [SYCL][Fusion][Doc] Document reductions support Document reduction strategies supported by kernel fusion and how users should use reductions in their code. Signed-off-by: Victor Perez --- sycl/doc/design/KernelFusionJIT.md | 34 +++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/sycl/doc/design/KernelFusionJIT.md b/sycl/doc/design/KernelFusionJIT.md index 224a1984d2902..8878e11ac6132 100644 --- a/sycl/doc/design/KernelFusionJIT.md +++ b/sycl/doc/design/KernelFusionJIT.md @@ -313,12 +313,44 @@ computing the private memory size. As range rounding only applies to basic kernels (parametrized by a `sycl::range`), local internalization is not affected by the range rounding transformation. +### Reductions + +Kernel fusion of reductions is partially supported. In order to preserve the +legality of the fused kernel, i.e., the fact that fused kernel must perform the +same work as the graph of kernels to be fused, only the fusion of following +reduction strategies at the time of writing is supported: + +- `group_reduce_and_last_wg_detection` +- `local_atomic_and_atomic_cross_wg` +- `range_basic` +- `group_reduce_and_atomic_cross_wg` +- `local_mem_tree_and_atomic_cross_wg` + +Other strategies require implicit inter-work-group synchronization, not +supported in kernel fusion. + +This way, users should not use `sycl::reduction` directly when performing kernel +fusion in their code, as an unsupported algorithm might be chosen. They should +instead use `sycl::detail::reduction_parallel_for`, forcing a supported fusion +strategy. Reductions implementation in +[`sycl/reduction.hpp`](../../include/sycl/reduction.hpp) might give users an +insight into which kind of reductions to use for their purposes: + +```c++ +q.submit([&](sycl::handler &cgh) { + sycl::accessor in(dataBuf, cgh, sycl::read_only); + sycl::reduction sum(sumBuf, cgh, sycl::plus<>{}); + // Force supported 'group_reduce_and_last_wg_detection' strategy + sycl::detail::reduction_parallel_for(...); +}); +``` + ### Unsupported SYCL constructs The following SYCL API constructs are currently not officially supported for kernel fusion and should be considered untested/unsupported: -- Reductions - `sycl::stream` - Specialization constants and `sycl::kernel_handler` - Images (`sycl::unsampled_image` and `sycl::sampled_image`) From d9cd676e917a9c1500078f4d29366245e712f476 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Wed, 7 Feb 2024 12:25:53 +0000 Subject: [PATCH 2/2] Address comments Signed-off-by: Victor Perez --- sycl/doc/design/KernelFusionJIT.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/sycl/doc/design/KernelFusionJIT.md b/sycl/doc/design/KernelFusionJIT.md index 8878e11ac6132..bbdd4b3e70930 100644 --- a/sycl/doc/design/KernelFusionJIT.md +++ b/sycl/doc/design/KernelFusionJIT.md @@ -317,8 +317,8 @@ by the range rounding transformation. Kernel fusion of reductions is partially supported. In order to preserve the legality of the fused kernel, i.e., the fact that fused kernel must perform the -same work as the graph of kernels to be fused, only the fusion of following -reduction strategies at the time of writing is supported: +same work as the graph of kernels to be fused, only the fusion of the following +reduction strategies is supported at the time of writing: - `group_reduce_and_last_wg_detection` - `local_atomic_and_atomic_cross_wg` @@ -329,9 +329,12 @@ reduction strategies at the time of writing is supported: Other strategies require implicit inter-work-group synchronization, not supported in kernel fusion. -This way, users should not use `sycl::reduction` directly when performing kernel -fusion in their code, as an unsupported algorithm might be chosen. They should -instead use `sycl::detail::reduction_parallel_for`, forcing a supported fusion +Users may encounters errors, e.g., fusion being aborted or incorrect results due +to race conditions or any other cause, when using the `sycl::reduction` +interface. The SYCL runtime will choose different algorithms depending on the +reduction operator, data type and hardware capabilities, so strategy selection +is not possible through the regular interface. In this case, users can instead +use `sycl::detail::reduction_parallel_for`, forcing a supported fusion strategy. Reductions implementation in [`sycl/reduction.hpp`](../../include/sycl/reduction.hpp) might give users an insight into which kind of reductions to use for their purposes: