From d981d372b062bb9e561cb9f46fd7d0b697ba1b66 Mon Sep 17 00:00:00 2001
From: Victor Perez <victor.perez@codeplay.com>
Date: Wed, 7 Feb 2024 10:09:53 +0000
Subject: [PATCH 1/2] [SYCL][Fusion][Doc] Document reductions support

Document reduction strategies supported by kernel fusion and how users
should use reductions in their code.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
---
 sycl/doc/design/KernelFusionJIT.md | 34 +++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/sycl/doc/design/KernelFusionJIT.md b/sycl/doc/design/KernelFusionJIT.md
index 224a1984d2902..8878e11ac6132 100644
--- a/sycl/doc/design/KernelFusionJIT.md
+++ b/sycl/doc/design/KernelFusionJIT.md
@@ -313,12 +313,44 @@ computing the private memory size. As range rounding only applies to basic
 kernels (parametrized by a `sycl::range`), local internalization is not affected
 by the range rounding transformation.
 
+### Reductions
+
+Kernel fusion of reductions is partially supported. In order to preserve the
+legality of the fused kernel, i.e., the fact that fused kernel must perform the
+same work as the graph of kernels to be fused, only the fusion of following
+reduction strategies at the time of writing is supported:
+
+- `group_reduce_and_last_wg_detection`
+- `local_atomic_and_atomic_cross_wg`
+- `range_basic`
+- `group_reduce_and_atomic_cross_wg`
+- `local_mem_tree_and_atomic_cross_wg`
+
+Other strategies require implicit inter-work-group synchronization, not
+supported in kernel fusion.
+
+This way, users should not use `sycl::reduction` directly when performing kernel
+fusion in their code, as an unsupported algorithm might be chosen. They should
+instead use `sycl::detail::reduction_parallel_for`, forcing a supported fusion
+strategy. Reductions implementation in
+[`sycl/reduction.hpp`](../../include/sycl/reduction.hpp) might give users an
+insight into which kind of reductions to use for their purposes:
+
+```c++
+q.submit([&](sycl::handler &cgh) {
+  sycl::accessor in(dataBuf, cgh, sycl::read_only);
+  sycl::reduction sum(sumBuf, cgh, sycl::plus<>{});
+  // Force supported 'group_reduce_and_last_wg_detection' strategy
+  sycl::detail::reduction_parallel_for<sycl::detail::auto_name,
+      sycl::detail::strategy::group_reduce_and_last_wg_detection>(...);
+});
+```
+
 ### Unsupported SYCL constructs
 
 The following SYCL API constructs are currently not officially supported for
 kernel fusion and should be considered untested/unsupported: 
 
-- Reductions
 - `sycl::stream`
 - Specialization constants and `sycl::kernel_handler`
 - Images (`sycl::unsampled_image` and `sycl::sampled_image`)

From d9cd676e917a9c1500078f4d29366245e712f476 Mon Sep 17 00:00:00 2001
From: Victor Perez <victor.perez@codeplay.com>
Date: Wed, 7 Feb 2024 12:25:53 +0000
Subject: [PATCH 2/2] Address comments

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
---
 sycl/doc/design/KernelFusionJIT.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/sycl/doc/design/KernelFusionJIT.md b/sycl/doc/design/KernelFusionJIT.md
index 8878e11ac6132..bbdd4b3e70930 100644
--- a/sycl/doc/design/KernelFusionJIT.md
+++ b/sycl/doc/design/KernelFusionJIT.md
@@ -317,8 +317,8 @@ by the range rounding transformation.
 
 Kernel fusion of reductions is partially supported. In order to preserve the
 legality of the fused kernel, i.e., the fact that fused kernel must perform the
-same work as the graph of kernels to be fused, only the fusion of following
-reduction strategies at the time of writing is supported:
+same work as the graph of kernels to be fused, only the fusion of the following
+reduction strategies is supported at the time of writing:
 
 - `group_reduce_and_last_wg_detection`
 - `local_atomic_and_atomic_cross_wg`
@@ -329,9 +329,12 @@ reduction strategies at the time of writing is supported:
 Other strategies require implicit inter-work-group synchronization, not
 supported in kernel fusion.
 
-This way, users should not use `sycl::reduction` directly when performing kernel
-fusion in their code, as an unsupported algorithm might be chosen. They should
-instead use `sycl::detail::reduction_parallel_for`, forcing a supported fusion
+Users may encounters errors, e.g., fusion being aborted or incorrect results due
+to race conditions or any other cause, when using the `sycl::reduction`
+interface. The SYCL runtime will choose different algorithms depending on the
+reduction operator, data type and hardware capabilities, so strategy selection
+is not possible through the regular interface. In this case, users can instead
+use `sycl::detail::reduction_parallel_for`, forcing a supported fusion
 strategy. Reductions implementation in
 [`sycl/reduction.hpp`](../../include/sycl/reduction.hpp) might give users an
 insight into which kind of reductions to use for their purposes: