PI CUDA ERROR when using sycl::atomic_ref
#11208
Comments
Is this feature important to you?
I am fine without it.
CUDA supports non-relaxed atomics for sm_70 and above: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model
The only difference is seq_cst, which doesn't have the same level of native support but can apparently be implemented via fences. Here is a summary I wrote:
The sycl::memory_order parameter corresponds to the PTX .sem qualifier that can be used on all atomic operations:
"The .sem qualifier requires sm_70 or higher. It specifies a memory synchronizing effect as described in the Memory Consistency Model. If the .sem qualifier is absent, .relaxed is assumed by default."
There is a memory-order correspondence between all the possible values of the .sem qualifier and those of sycl::memory_order, except that sycl::memory_order::seq_cst is not supported by PTX.
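To make the correspondence described above concrete, here is a minimal sketch in plain C++. It is not the DPC++ implementation: `MemoryOrder` is a hypothetical stand-in for sycl::memory_order so the snippet builds without a SYCL toolchain, and `ptx_sem` and `requires_sm70` are illustrative helper names. The four qualifier strings are the `.sem` values listed in the PTX documentation; seq_cst returns no qualifier because, as noted above, it has no direct `.sem` equivalent and would have to be built from fences.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for sycl::memory_order (assumption: same five values).
enum class MemoryOrder { relaxed, acquire, release, acq_rel, seq_cst };

// Returns the PTX .sem qualifier matching the given order,
// or std::nullopt when no .sem value corresponds to it (seq_cst).
std::optional<std::string_view> ptx_sem(MemoryOrder order) {
  switch (order) {
    case MemoryOrder::relaxed: return ".relaxed";
    case MemoryOrder::acquire: return ".acquire";
    case MemoryOrder::release: return ".release";
    case MemoryOrder::acq_rel: return ".acq_rel";
    case MemoryOrder::seq_cst: return std::nullopt;  // needs fences instead
  }
  assert(false && "unknown MemoryOrder value");
  return std::nullopt;
}

// The .sem qualifier requires sm_70 or higher; when it is absent, .relaxed
// is assumed, so only non-relaxed orders depend on sm_70.
bool requires_sm70(MemoryOrder order) {
  return order != MemoryOrder::relaxed;
}
```

Under these assumptions, `ptx_sem(MemoryOrder::acq_rel)` yields `.acq_rel`, while `ptx_sem(MemoryOrder::seq_cst)` yields an empty optional, which is the case that needs separate handling.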
I've mapped SYCL seq_cst to the CUDA backend. Details here: #12516 (comment)
Implement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend. See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details. Requires sm_70 or above. With this PR there is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities when possible on supported arches. This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the cuda backend. Fixes #11208 Depends on #12907 --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Describe the bug
I get an error when running the program below.
To Reproduce
Please describe the steps to reproduce the behavior:
clang++ -O3 -DNDEBUG main.cpp -fsycl -fsycl-targets=nvptx64-nvidia-cuda
./a.out
The program should run without errors.
Environment (please complete the following information):
OS: Linux
Target device and vendor: Nvidia GPU
DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230622)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.2.0/linux/bin-llvm
Dependencies version: [e.g. low-level runtime versions (like NEO 20.04)]
Additional context
If the sycl::atomic_ref is using sycl::memory_order::acq_rel or sycl::memory_order::relaxed, then it runs without errors.