Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 #12516

Merged
merged 15 commits into from
Mar 18, 2024

Conversation

JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented Jan 29, 2024

Implement seq_cst RC11/ptx6.0 memory consistency for CUDA backend.

See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details. Requires sm_70 or above. With this PR there is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities when possible on supported arches.

This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the cuda backend.

Fixes #11208

Depends on #12907

see https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 for all details

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk requested review from a team as code owners January 29, 2024 14:48
@JackAKirk JackAKirk requested a review from npmiller January 29, 2024 14:48
@JackAKirk JackAKirk marked this pull request as draft January 29, 2024 14:51
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk
Copy link
Contributor Author

JackAKirk commented Jan 29, 2024

This is ready for review. I've marked this as draft so this doesn't get merged, since the UR tag is only temporary for testing: this PR requires oneapi-src/unified-runtime#1291

Some further information:

The Repaired C++11 memory consistency model (RC11 https://pure.mpg.de/rest/items/item_2543045/component/file_3332084/content ) was adopted in C++20 definition of seq_cst: see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html
Nvidia explicitly state how their ptx instructions map to RC11 here (Figure 11) : https://dl.acm.org/doi/pdf/10.1145/3297858.3304043

This PR implements this mapping.
Read "4.2 A Mapping from Scoped C++ onto PTX" for an explanation of the RMW mapping from ptx 6.0 to RC11.

The "ptx 6.0" memory model described in that paper is described as the ptx memory consistency model in their official ptx documentation https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model

CUDA memory consistency patterns falling outside the scope of "ptx 6.0", apart from its "ptx 7.5" extension (https://dl.acm.org/doi/pdf/10.1145/3470496.3533045), are not properly defined anywhere.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk closed this Jan 31, 2024
@JackAKirk JackAKirk reopened this Jan 31, 2024
@JackAKirk JackAKirk marked this pull request as ready for review January 31, 2024 20:13
@JackAKirk JackAKirk requested a review from a team as a code owner January 31, 2024 20:13
Copy link
Contributor

@Alcpz Alcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYCLcompat changes look good to me. Thank you for fixing those tests.

Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your PR description will become the squashed commit message. Please reword it so it better reads as one:

The title should be clearer: "Implement CRC11 seq_cst for PTX6"

The wording should use imperative mood:

s/this PR implements/Implement

s/With this PR//g

There's also a merge conflict that needs to be resolved

sycl/plugins/unified_runtime/CMakeLists.txt Outdated Show resolved Hide resolved
sycl/test-e2e/syclcompat/atomic/atomic_class.cpp Outdated Show resolved Hide resolved
@JackAKirk
Copy link
Contributor Author

Bindless images failure is unrelated. This PR just needs oneapi-src/unified-runtime#1291 merged and the UR tag updated. Then it will be ready for merge. I'll unmark it as draft when that happens.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Copy link
Contributor

@kbenzie kbenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oneapi-src/unified-runtime#1291 has been merged, however this PR is dependent on #12907 merging first. Once that's happened pull in the latest sycl branch changes, resolve the conflict, then update the UR repo/tag as suggested. After that make this ready for review and UR reviewers approve it.

sycl/plugins/unified_runtime/CMakeLists.txt Outdated Show resolved Hide resolved
@JackAKirk JackAKirk marked this pull request as ready for review March 18, 2024 14:23
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk changed the title [CUDA][LIBCLC] RC11/ptx6.0 memory consistency model seq_cst impl [CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 Mar 18, 2024
@ldrumm ldrumm merged commit c1e2957 into intel:sycl Mar 18, 2024
11 checks passed
kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Apr 15, 2024
Implement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend.

See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model
for full details. Requires sm_70 or above. With this PR there is now a
complete mapping between SYCL memory consistency model capabilities and
the official CUDA model, fully exploiting CUDA capabilities when
possible on supported arches.

This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the
cuda backend.

Fixes intel#11208

Depends on intel#12907

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
sommerlukas pushed a commit that referenced this pull request Apr 30, 2024
…ces (#12974)

AMD ~~and CUDA~~ devices still not supported.
~~CUDA to be supported in #12516

Edit: Since #12516 has been merged, CUDA is also `seq_cst` by default.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PI CUDA ERROR when using sycl::atomic_ref
5 participants