[SYCL] Support shuffle algorithms for non-uniform groups #12705
This commit makes the non-uniform group classes support the shift and select algorithms.
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Thank you for working on this, and for tidying up some of the SPIR-V wrappers!
The changes look good to me, modulo the test failures.
This reverts commit a6eb06d.
if constexpr (ext::oneapi::experimental::is_user_constructed_group_v<
                  GroupT>) {
  return __nvvm_shfl_sync_up_i32(detail::ExtractMask(detail::GetMask(g))[0],
                                 x, delta, 0);
@JackAKirk - Can you think of a reason why this would fail for the new test cases for ballot_group only?
I think it is failing because the delta is applied to raw lane indices, whereas what we want is the true delta that counts only the set bits in the mask. I think it could also fail for opportunistic_group, but I guess this depends on the test/execution.
I think for these cases (opportunistic_group and ballot_group, but you could also use this for all non-uniform groups) if you replace
return __nvvm_shfl_sync_up_i32(detail::ExtractMask(detail::GetMask(g))[0],
                               x, delta, 0);
with a call to non_uniform_shfl
defined here:
as e.g.
unsigned localSetBit = g.get_local_id()[0] + 1;
int unfoldedSrcSetBit = localSetBit + delta;
auto MemberMask = detail::ExtractMask(detail::GetMask(g))[0];
return non_uniform_shfl(g, MemberMask, x,
                        __nvvm_fns(MemberMask, 0, unfoldedSrcSetBit));
Then it should work.
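To illustrate why the raw delta misses, here is a minimal host-side sketch of the "n-th set bit" lookup that the suggestion performs with `__nvvm_fns(MemberMask, 0, n)`. The helpers `nth_set_bit`, `lane_of_member` and `is_member` are hypothetical names used only for this illustration. They are not part of the DPC++ sources, and the loop is a plain C++ emulation under the assumption that the lookup returns the position of the n-th (1-based) set bit, or 0xFFFFFFFF when there is none.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for the lookup done with
// __nvvm_fns(mask, 0, n) in the thread: position of the n-th (1-based)
// set bit of mask, or 0xFFFFFFFF if mask has fewer than n set bits
// (or n is not positive).
std::uint32_t nth_set_bit(std::uint32_t mask, int n) {
  if (n <= 0)
    return 0xFFFFFFFFu;
  for (std::uint32_t lane = 0; lane < 32; ++lane) {
    if (((mask >> lane) & 1u) != 0 && --n == 0)
      return lane;
  }
  return 0xFFFFFFFFu;
}

// Physical lane of the group member with the given group-local id.
// Mirrors "localSetBit = g.get_local_id()[0] + 1" from the comment above.
std::uint32_t lane_of_member(std::uint32_t mask, std::uint32_t local_id) {
  return nth_set_bit(mask, static_cast<int>(local_id) + 1);
}

// True if `lane` belongs to the group described by `mask`.
bool is_member(std::uint32_t mask, std::uint32_t lane) {
  return lane < 32 && ((mask >> lane) & 1u) != 0;
}
```

With a mask of 0xAA the members sit on lanes 1, 3, 5 and 7. The member with local id 2 is on lane 5, and subtracting a raw delta of 1 lands on lane 4, which is not in the group. Counting set bits instead gives lane 3, the member with local id 1.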
Actually, thinking about it more, in this case I think the sign of delta might have to be changed for ballot_group/opportunistic_group when using that non_uniform_shfl function.
e.g. in my above message you might have to replace
int unfoldedSrcSetBit = localSetBit + delta;
with:
int unfoldedSrcSetBit = localSetBit - delta;
In any case I think one of those should work. I can't remember offhand what the docs say about which version is correct (basically whether the semantic is "send to idx" or "receive from idx"). I think it is the - delta version!
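The two candidate signs can be compared with a small host-side emulation. This is only a sketch: `emulate_shift` and `nth_set_bit` are hypothetical helpers written for this illustration, and letting a member keep its own value when it has no in-range source is an assumption made here for the sake of a defined result. With `sign = -1` each member receives from the member `delta` local ids below it, which is the "receive from" reading that matches a shuffle-up.

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side stand-in for __nvvm_fns(mask, 0, n): position of
// the n-th (1-based) set bit of mask, or 0xFFFFFFFF if there is none.
std::uint32_t nth_set_bit(std::uint32_t mask, int n) {
  if (n <= 0)
    return 0xFFFFFFFFu;
  for (std::uint32_t lane = 0; lane < 32; ++lane) {
    if (((mask >> lane) & 1u) != 0 && --n == 0)
      return lane;
  }
  return 0xFFFFFFFFu;
}

// Emulates one shift across a 32-lane sub-group. values[lane] is the x held
// by each lane; only lanes set in mask take part.
//   sign = -1: member reads from the member with local id (id - delta)
//   sign = +1: member reads from the member with local id (id + delta)
// Assumption for this sketch: members without an in-range source keep
// their own value.
std::array<int, 32> emulate_shift(std::uint32_t mask,
                                  const std::array<int, 32> &values,
                                  int delta, int sign) {
  std::array<int, 32> out = values;
  int local_set_bit = 0; // local id + 1 of the member being processed
  for (std::uint32_t lane = 0; lane < 32; ++lane) {
    if (((mask >> lane) & 1u) == 0)
      continue;
    ++local_set_bit;
    std::uint32_t src = nth_set_bit(mask, local_set_bit + sign * delta);
    if (src != 0xFFFFFFFFu)
      out[lane] = values[src];
  }
  return out;
}
```

For mask 0xAA (lanes 1, 3, 5, 7) and delta 1, the `-` variant makes lane 5 read lane 3's value, while the `+` variant makes lane 5 read lane 7's value.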
Thanks a ton, Jack! Let's give it a try! 😄
Sadly it doesn't seem that it changed anything, but it also doesn't seem like we call non_uniform_shfl here. I wonder if it is immediately usable or if it has been specialized for the use-case with reduce and scan. Either way, I do not have an easy way of testing fixes, so would you be okay with me disabling the subset of checks for CUDA and opening an issue so you can have a look when possible? (Tag @npmiller)
Sure, no worries. I think I know how to fix it.
This reverts commit 033d589.
@intel/llvm-reviewers-runtime @sergey-semenov @uditagarwal97 - Friendly ping.
Within the bounds of my limited knowledge in this part of the code base, the changes LGTM!
@intel/llvm-reviewers-runtime @sergey-semenov - Friendly ping.
This follows on from discussion of #12705 (comment) to impl/fix non-uniform group shuffles on cuda.
- Non-uniform group algorithm impls fixes for permute/left/right
- Generalize group shuffles to support double/half/long/short correctly for both uniform and non-uniform groups
- Make fixed_size_group test fail if group member "local id" mapping not correct or removed.
- Update ballot_group_algorithms.cpp to test previously failing cases on cuda backend.
Shuffle impls in ::detail match those in syclomatic for masked shuffle builtins (which don't exist in oneapi outside syclomatic).
---------
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
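One common way to generalize a 32-bit shuffle to a 64-bit type such as double is to move the two 32-bit halves separately and reassemble them on the receiving side. The sketch below emulates that on the host. It is not taken from the follow-up commit: `shuffle32` and `shuffle_double` are hypothetical names, and `shuffle32` is only an array lookup standing in for a hardware shuffle.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "sketch assumes a 64-bit double");

// Hypothetical stand-in for a 32-bit hardware shuffle: returns the word
// held by src_lane (wrapped into the 32-lane range).
std::uint32_t shuffle32(const std::array<std::uint32_t, 32> &words,
                        std::uint32_t src_lane) {
  return words[src_lane % 32];
}

// Shuffles a double by splitting each lane's value into low and high
// 32-bit words, shuffling both words, and recombining the result.
double shuffle_double(const std::array<double, 32> &values,
                      std::uint32_t src_lane) {
  std::array<std::uint32_t, 32> lo{};
  std::array<std::uint32_t, 32> hi{};
  for (std::uint32_t lane = 0; lane < 32; ++lane) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &values[lane], sizeof bits); // bit-exact copy
    lo[lane] = static_cast<std::uint32_t>(bits);
    hi[lane] = static_cast<std::uint32_t>(bits >> 32);
  }
  std::uint64_t bits =
      (static_cast<std::uint64_t>(shuffle32(hi, src_lane)) << 32) |
      shuffle32(lo, src_lane);
  double result = 0.0;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}
```

Using `std::memcpy` rather than a pointer cast keeps the bit pattern intact without relying on type punning.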