I notice that cccl implements the store that follows fence.sc as a relaxed store (st.relaxed). But the ASPLOS 2019 PTX memory model paper states in section 4.2 that a release store is necessary.

Are there any new developments on this, or is it an implementation error? Thanks a lot!

Does https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0982r1.html have anything to do with this topic?
Very good question!

First, libcu++ atomics currently rely on implementation details which, on currently supported platforms, enable libcu++ to lower to:

- `fence.sc; st.relaxed;` instead of `fence.sc; st.release;`
- `fence.sc; atom.acquire;` instead of `fence.sc; atom.acq_rel;`

libcu++ is closely tied to the implementation (CUDA Toolkit, compiler, driver, hardware), and if the above changes, we'll update it accordingly.

Second, you are totally right that the current expansion is not correct according to the model published in the ASPLOS ’19 paper, or according to the PTX Atomics ABI, which is what we require external software to follow. We’ve actually considered this a bug in the ASPLOS ’19 memory model and the ABI for a while, and although we haven’t gotten to it yet, we intend to update the model formalism to reflect the fact that the mapping with the relaxed store is sound in practice.
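
For readers who want to tie the two lowerings back to source code, here is a rough host-side sketch of the operations in question. It uses `std::atomic` only as a stand-in, since `cuda::std::atomic` mirrors its interface. The helper names `publish` and `bump` are made up for illustration, and reading the `atom` lines as the sequentially consistent read-modify-write case is an assumption. The PTX shown in the comments is taken from the reply; this host code does not produce it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a sequentially consistent store.
//   ASPLOS '19 model / PTX Atomics ABI mapping: fence.sc; st.release;
//   libcu++ lowering described in the reply:    fence.sc; st.relaxed;
inline void publish(std::atomic<int>& flag, int value) {
    flag.store(value, std::memory_order_seq_cst);
}

// Hypothetical helper: a sequentially consistent read-modify-write
// (assumed here to be the `atom` case from the reply).
//   ASPLOS '19 model / PTX Atomics ABI mapping: fence.sc; atom.acq_rel;
//   libcu++ lowering described in the reply:    fence.sc; atom.acquire;
// Returns the value held before the increment.
inline int bump(std::atomic<int>& counter) {
    return counter.fetch_add(1, std::memory_order_seq_cst);
}
```

In device code the same calls would be made on a `cuda::atomic` or `cuda::std::atomic` object. Both mappings begin with the same `fence.sc` and differ only in the ordering qualifier on the `st` or `atom` instruction that follows it.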