Bug Description
DaCe’s Python frontend does not correctly handle memory copies of the form:
dst[b1:e1:s1] = src[b2:e2:s2]
While simple cases work (e.g. dst[:] = src[:]), copies where the source and destination strides (s1 and s2) differ are not captured correctly. This results in an incorrect memlet in the generated SDFG and raises an InvalidSDFGEdgeError when the SDFG is validated.
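As a rough sanity check, independent of DaCe, a strided copy of this form is well defined when both slices select the same number of elements. The sketch below assumes positive strides and Python slice semantics; `slice_len` and `is_valid_strided_copy` are hypothetical helper names used only for illustration, not part of the DaCe API.

```python
def slice_len(begin: int, end: int, step: int) -> int:
    """Number of indices selected by the Python slice begin:end:step (step > 0)."""
    return len(range(begin, end, step))


def is_valid_strided_copy(dst_slice, src_slice) -> bool:
    """True if both (begin, end, step) triples select the same number of elements."""
    return slice_len(*dst_slice) == slice_len(*src_slice)


# Different strides, same element count: a valid copy.
print(is_valid_strided_copy((0, 20, 2), (0, 40, 4)))  # True
```

Under this check, differing strides alone do not make a copy invalid; only a mismatch in the number of selected elements does.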
Steps to Reproduce
Consider the following valid 1D strided copy:
dst[0:20:2] = src[0:40:4]
To reproduce the issue, open a Jupyter Notebook and paste the following code snippets into separate cells:
- Get required imports
import dace
import cupy as cp
import numpy as np
- Define Example Inputs
src = np.ones(40, dtype=cp.uint32)
dst = cp.zeros(20, dtype=cp.uint32)
- Failing example
@dace.program
def fail(dst: dace.uint32[20] @ dace.dtypes.StorageType.GPU_Global, src: dace.uint32[40]):
    dst[0:20:2] = src[0:40:4]
fail.to_sdfg()
Now hover over the memlet edge in the generated SDFG with your mouse to see the following data movement:
[0:10:4] -> dst[0:20:2]
Volume: 10
This is incorrect: the source range is recorded as [0:10:4] instead of [0:40:4].
- Trigger the Error
To see the validation error directly, either replace the last line of the previous cell with:
fail.to_sdfg(validate=True)
Or try to run the program:
@dace.program
def fail(dst: dace.uint32[20] @ dace.dtypes.StorageType.GPU_Global,
         src: dace.uint32[40]):
    dst[0:20:2] = src[0:40:4]
sdfg = fail.to_sdfg()
sdfg(src=src, dst=dst)
- What the SDFG of the previous DaCe program is expected to look like:
sdfg = dace.SDFG("strided_1D_memory_copy")
state = sdfg.add_state("main")
src_dev = sdfg.add_array("src", (40,), dace.uint32)
dst_dev = sdfg.add_array("dst", (20,), dace.uint32, dace.dtypes.StorageType.GPU_Global)
src_acc = state.add_access("src")
dst_acc = state.add_access("dst")
state.add_edge(src_acc, None, dst_acc, None, dace.memlet.Memlet('[0:40:4] -> dst[0:20:2]'))
sdfg.fill_scope_connectors()
sdfg
This is the correct and expected result. Hover over the edge between the src and dst nodes to see the dataflow, which is now correct and shows:
[0:40:4] -> dst[0:20:2]
Volume: 10
We can also verify that the generated SDFG produces the expected result, namely that every second element is 1 and the rest are 0.
sdfg(src=src, dst=dst)
print(f"dst: {dst}")
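For comparison, the expected output can be reproduced without DaCe or a GPU. The sketch below uses plain Python lists in place of the NumPy and CuPy arrays above (an assumption made only to keep it dependency-free); `src_ref` and `dst_ref` are illustrative names. It also counts how many elements the mis-captured source range [0:10:4] selects, assuming Python slice semantics for the range shown on the memlet.

```python
# Reference inputs mirroring the shapes used in the report.
src_ref = [1] * 40
dst_ref = [0] * 20

# The same strided copy, evaluated with ordinary Python slice semantics.
dst_ref[0:20:2] = src_ref[0:40:4]
print(dst_ref)  # every second element is 1, the rest are 0

# Element counts of the source range the frontend recorded vs. the expected one.
recorded = len(range(0, 10, 4))  # 3 elements: indices 0, 4, 8
expected = len(range(0, 40, 4))  # 10 elements, matching the memlet volume
print(recorded, expected)
```

Read this way, the recorded source range covers fewer elements than the volume of 10 reported on the memlet, which is consistent with the memlet being rejected during validation.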