-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved GPU Copy #1976
base: main
Are you sure you want to change the base?
Improved GPU Copy #1976
Conversation
But I have to run the unit tests.
…D copy to a 1d copy, because it happens to be continiously allocated.
…not where I though I found it.
…date_state()` function. The reason is that in some cases this is valid, for example if an edge connects an AccessNode and a MapEntry, because, in that case the map might not be executed. Since the Memlet does not have access to its source and destination node it can not check that, so the test was moved to a location that can do this check. However, it only does the check for AN to AN connections, which is a bit restrictive, but this is something for later.
Once [PR#1976](spcl/dace#1976) is merged in DaCe the code generator is able to handle more Memlets directly as Cuda `memcpy()` calls. This PR modifies the GPU transformation of GT4Py in such a way that these Memlets are no longer transformed into Maps. However, it can only be merged if the DaCe dependency was bumped to a version that includes PR#1976!
This looks good to me, but would be great if @alexnick83 could quickly sanity check - GPU codegen is not my forté |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions/comments, namely the following:
- I share the concern that, for ndim > 2, the conditions tested are insufficient. I would suggest making an issue to revisit in the future.
- I am seeing an issue in
_emit_copy
, for ndim == 2, around lines 1014-1015. Maybe have a look and confirm that this is intended.
I see that we were already referring to these copies as "continuous," but the correct term is "contiguous." Could you do a find and replace-all?
if src_strides[-1] != 1 or dst_strides[-1] != 1: | ||
# Currently we only support ND copies when they can be represented | ||
# as a 1D copy or as a 2D strided copy | ||
# NOTE: Not sure if this test is enough, it should also be tested that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To what are you agreeing on?
- That you are not sure what this code is doing.
- That you agree that it is meaningless and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the test is not enough. it doesn't consider, e.g., Views where you can have multiple discontinuities, and I guess it is like that because the original code predates Views. I also think it is fine to leave it as-is for now, but it should be noted down as a possible source of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why people give this code the impression that it is so old. It's from March 2022, it definitely does not predate views: #961
if src_strides[-1] != 1 or dst_strides[-1] != 1: | ||
# Currently we only support ND copies when they can be represented | ||
# as a 1D copy or as a 2D strided copy | ||
# NOTE: Not sure if this test is enough, it should also be tested that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the test is not enough. it doesn't consider, e.g., Views where you can have multiple discontinuities, and I guess it is like that because the original code predates Views. I also think it is fine to leave it as-is for now, but it should be noted down as a possible source of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I wrote the original code, I want to make sure I review it properly before it’s in
dace/codegen/targets/cuda.py
Outdated
if dims == 2 and (is_fortran_order or is_c_order): | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbennun Since you want to review it and write the code you can probably tell me what this operation, turning a 2d copy into a 1d copy if the memory is contiguous.
I mean I understand what it is doing but I have no idea why it is doing it, especially since memlet_copy_to_absolute_strides()
does the same thing already.
My guess is that this code is older than memlet_copy_to_absolute_strides()
and it was forgotten and I would propose to remove it, do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is about 5 years newer than memlet_copy_to_absolute_strides (which was there in the very first GitHub commit of dace):
dace/dace/codegen/targets/cpu.py
Line 970 in 256a0a6
def memlet_copy_to_absolute_strides(self, |
Definitely not some code that was forgotten, you would see it very quickly if you git blame
d the file.
There is a reason it is there, I need to read the code again to get context. This will happen during April though. Thanks!
@tbennun As an experiment I deleted that code and the CI passes this includes the tests I explicitly added to test that a 2D copy is transformed into a contiguous 1d copy, if possible. |
Before some 2D copies (especially if they had FORTRAN order) were turned into Maps, see issue#1953.
This PR modifies the code generator in such a way that such copies are now handled.
There is some legacy stuff that should also be looked at.