
Conversation

@MrBurmark (Member)

Summary

Add more accurate Kahan and cascade summation tunings for the Base_Seq variants of the floating point reduction kernels. This is useful at larger problem sizes, where the default naive left-fold reduction algorithm becomes inaccurate. That inaccuracy can cause GPU variant tunings to spuriously fail their checksums; adding more accurate reduction techniques like Kahan and cascade summation fixes this issue.
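
For reference, here is a minimal sketch of the two summation techniques named above. These are the textbook algorithms, not the suite's actual kernel implementations:

```cpp
#include <cstddef>

// Kahan (compensated) summation: a compensation term captures the
// low-order bits lost on each add, so rounding error stays roughly
// constant instead of growing linearly with n as in a naive left fold.
double kahan_sum(const double* a, std::size_t n)
{
  double sum = 0.0;
  double c   = 0.0;            // running compensation
  for (std::size_t i = 0; i < n; ++i) {
    double y = a[i] - c;       // correct the incoming term
    double t = sum + y;        // low-order bits of y may be lost here
    c = (t - sum) - y;         // recover the lost bits (exactly 0 in real arithmetic)
    sum = t;
  }
  return sum;
}

// Cascade (pairwise) summation: recursively sum the two halves of the
// range, so rounding error grows as O(log n) instead of O(n).
double cascade_sum(const double* a, std::size_t n)
{
  if (n <= 8) {                // small base case: plain accumulation
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
  }
  std::size_t half = n / 2;
  return cascade_sum(a, half) + cascade_sum(a + half, n - half);
}
```

Note that Kahan summation relies on strict floating point semantics; aggressive compiler flags such as -ffast-math can optimize the compensation term away.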

Currently the first-run variant tuning is used as both the checksum reference and the performance reference. This change allows the checksum reference to differ by adding tuning attributes that indicate, among other possible things, which variant tunings have a preferred checksum. The first-run variant tuning is still picked as the initial checksum reference, but it is overwritten by the first-run variant tuning that has a preferred checksum. Note the unfortunate side effect that the -sp output can indicate that variant tunings passed their checksums even when they fail, because a later variant tuning may have a preferred checksum; the output files, however, are correct.
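A hypothetical sketch of that selection rule follows; the names TuningInfo, preferred_checksum, and pick_checksum_reference are illustrative, not the suite's actual API:

```cpp
#include <vector>

struct TuningInfo {
  double checksum;
  bool   preferred_checksum;  // the tuning attribute added by this PR
};

// Default to the first tuning that ran, but let the first tuning
// with a preferred checksum overwrite that choice.
const TuningInfo* pick_checksum_reference(const std::vector<TuningInfo>& runs)
{
  if (runs.empty()) return nullptr;
  const TuningInfo* ref = &runs.front();
  for (const TuningInfo& t : runs) {
    if (t.preferred_checksum) { ref = &t; break; }
  }
  return ref;
}
```

This also illustrates the -sp caveat above: checks reported while tunings run use the provisional reference, which a later preferred tuning may replace.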

Note that it is a bit strange that I'm calling these Base_Seq variants, because they use RAJA classes in their implementations even though they don't use RAJA loop constructs.

Add tuning attributes that can allow the variant tunings to be
designated as preferred for checksum references. The checksum
reference is still decided by which variant tuning runs first
but now a preferred checksum variant tuning can overwrite a
non-preferred variant tuning.
@MrBurmark MrBurmark requested review from artv3 and rhornung67 January 20, 2026 20:35
@MrBurmark MrBurmark enabled auto-merge January 20, 2026 23:51
@MrBurmark MrBurmark merged commit be6ba05 into develop Jan 21, 2026
17 checks passed
@MrBurmark MrBurmark deleted the feature/burmark1/reduce_checksums branch January 21, 2026 22:06