-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 #12516
Conversation
see https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 for all details Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
This is ready for review. I've marked this as draft so this doesn't get merged, since the UR tag is only temporary for testing: this PR requires oneapi-src/unified-runtime#1291 Some further information: The Repaired C++11 memory consistency model (RC11 https://pure.mpg.de/rest/items/item_2543045/component/file_3332084/content ) was adopted in C++20 definition of seq_cst: see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html This PR implements this mapping. The "ptx 6.0" memory model described in that paper is described as the ptx memory consistency model in their official ptx documentation https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model CUDA memory consistency patterns falling outside the scope of "ptx 6.0", apart from its "ptx 7.5" extension (https://dl.acm.org/doi/pdf/10.1145/3470496.3533045), are not properly defined anywhere. |
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCLcompat changes look good to me. Thank you for fixing those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your PR description will become the squashed commit message. Please reword it so it better reads as one:
The title should be clearer: "Implement CRC11 seq_cst for PTX6"
The wording should use imperative mood:
s/this PR implements/Implement
s/With this PR//g
There's also a merge conflict that needs to be resolved
Bindless images failure is unrelated. This PR just needs oneapi-src/unified-runtime#1291 merged and the UR tag updated. Then it will be ready for merge. I'll unmark it as draft when that happens. |
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oneapi-src/unified-runtime#1291 has been merged, however this PR is dependent on #12907 merging first. Once that's happened pull in the latest sycl branch changes, resolve the conflict, then update the UR repo/tag as suggested. After that make this ready for review and UR reviewers approve it.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
seq_cst
implImplement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend. See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details. Requires sm_70 or above. With this PR there is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities when possible on supported arches. This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the cuda backend. Fixes intel#11208 Depends on intel#12907 --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Implement
seq_cst
RC11/ptx6.0 memory consistency for CUDA backend.See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details. Requires sm_70 or above. With this PR there is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities when possible on supported arches.
This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the cuda backend.
Fixes #11208
Depends on #12907