-
Notifications
You must be signed in to change notification settings - Fork 0
Description
(Potential) Bug Description
The MapTiling transformation does not appear to handle maps with strided ranges (stride > 1) correctly. Specifically, when applied to such maps, the resulting SDFG differs semantically from the original and produces incorrect results.
Steps to Reproduce
I recommend to run the following code snippets in a Jupyter notebook.
-
Import the needed modules
import numpy as np import dace from dace.sdfg import nodes from dace.transformation.dataflow.tiling import MapTiling
-
The example SDFG
N = dace.symbol('N') s = 33 @dace.program def vector_copy_strides(A: dace.uint32[N], B: dace.uint32[N]): for i in dace.map[0:N:s] @ dace.dtypes.ScheduleType.CPU_Multicore: A[i] = B[i] sdfg = vector_copy_strides.to_sdfg() sdfg
This produces the following SDFG:
-
Apply the Maptiling transformation:
state = sdfg.states()[0] sdfg_nodes = state.nodes() map_entry:nodes.MapEntry = [n for n in sdfg_nodes if isinstance(n, nodes.MapEntry)][0] tile_sizes = [32] MapTiling.apply_to( sdfg=sdfg, options={ "prefix": "b", "tile_sizes": tile_sizes, "divides_evenly": False, "tile_trivial": True, "skew": True }, map_entry=map_entry ) sdfg
This results into the following SDFG:
Where the stride in the top level map is not correct.
Expected Behavior
The stride in the top-level map should be 32 * s (i.e., 32 * 33) to correctly account for both the original stride (s = 33) and the tiling configuration. However, the current MapTiling transformation only considers the tile size (32) and ignores the existing stride. This results in incorrect behavior.
For example, consider the following input:
A = [0, 0, ..., 0]
B = [0, 1, 2, ..., 63]After applying the MapTiling transformation, only every 32nd element of B is copied into A, instead of every 33rd. This occurs because the generated SDFG includes a top-level map with stride 32 and an inner map with stride 33. Due to the limited inner range (min(N-1, b_i + 31) + 1 - b_i <= 32), only the first iteration (i = 0) of the inner map executes.
Since the copy operation is of the form A[b_i + i] = B[b_i + i], and i = 0 is the only iteration, the result is simply A[b_i] = B[b_i] for b_i in [0:N:32]. This causes the final output to reflect copies from indices 0, 32, 64, ..., instead of the expected 0, 33, 66, ....

