-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LST followups: better work divisions, concrete kernel dimension, some cleanup and fixes #47084
base: master
Are you sure you want to change the base?
LST followups: better work divisions, concrete kernel dimension, some cleanup and fixes #47084
Conversation
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47084/43263 |
A new Pull Request was created by @ariostas for master. It involves the following packages:
@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
Tagging @fwyzard since most (if not all) of the comments addressed were his |
test parameters:
|
@cmsbuild please test |
-1 Failed Tests: UnitTests RelVals-GPU Unit TestsI found 1 errors in the following unit tests: ---> test test-das-selected-lumis had ERRORS RelVals-GPU
Comparison SummarySummary:
|
there are a bunch of errors like
the same workflow step3 in the baseline ran OK. So, the crash seems related to this PR. |
assign heterogeneous |
1a27b2a
to
baa91b3
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47084/43348 |
@cmsbuild please test |
+1 Size: This PR adds an extra 100KB to repository Comparison SummarySummary:
GPU Comparison SummarySummary:
|
+1 |
@cms-sw/heterogeneous-l2 |
Looks ok to me. @fwyzard Are you planning to take a look, or shall we just sign? |
I can have a look in the coming days, but if this is urgent for any reason go ahead and sign it, and I will still have a look after the fact. |
hold
Actually, you know what ? |
Pull request has been put on hold by @fwyzard |
Vec3D const blocksPerGrid_crossCleanpT3{1, 4, 20}; | ||
WorkDiv3D const crossCleanpT3_workDiv = | ||
createWorkDiv(blocksPerGrid_crossCleanpT3, threadsPerBlock_crossCleanpT3, elementsPerThread); | ||
auto const crossCleanpT3_workDiv = cms::alpakatools::make_workdiv<Acc2D>({20, 4}, {64, 16}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the X and Y values in the ranges are inverted with respect to before - is it intended ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there were a few places where I flipped the order so that the loops are nested in the recommended order.
Vec3D const blocksPerGrid_crossCleanpLS{1, 4, 20}; | ||
WorkDiv3D const crossCleanpLS_workDiv = | ||
createWorkDiv(blocksPerGrid_crossCleanpLS, threadsPerBlock_crossCleanpLS, elementsPerThread); | ||
auto const crossCleanpLS_workDiv = cms::alpakatools::make_workdiv<Acc2D>({20, 4}, {32, 16}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here (OK, so it's probably intended).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
if (slope == | ||
kVerticalModuleSlope) // Designated for tilted module when the slope is infinity (module lying along y-axis) | ||
if (slope == kVerticalModuleSlope || | ||
edm::isNotFinite(slope)) // Designated for tilted module when the slope is infinity (module lying along y-axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@makortel do you know if edm::isFinite
/edm::isNotFinite
is guaranteed to in device code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically they could presently work: the functions are constexpr
and presently use union
for type punning (that is strictly speaking undefined behavior). I'd like to replace the union
with std::bit_cast
that is constexpr
, but on the other hand e.g. https://stackoverflow.com/a/78232359 kind of suggests to use cuda::std::bit_cast
on CUDA 12.8.
I see in GCC 12 the std::bit_cast
implementation is just a call to __builtin_bit_cast
, and that e.g. in https://github.com/cms-sw/cmssw/blob/master/HeterogeneousCore/AlpakaInterface/interface/atomicMaxF.h we use edm::bit_cast
(that just forwards to std::bit_cast
or __builtin_bit_cast
) only for CPU implementation (I don't remember the exact reason for that though, whether the edm::bit_cast
didn't work on device code, or the intrinsics were "easier" on CUDA+HIP).
So perhaps for long term it would be better to define Alpaka-specific functions (ideally Alpaka could provide a portable bit_cast
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It currently does work, but I agree that would be better to be more careful about it (in a separate PR?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a related note, should I also reimplement std::distance
to be safe?
unhold |
This PR addresses some of the LST followups that we have listed in #46746.
Here is the list of fixes/changes:
cms::alpakatools::makeworkdiv
(instead of our customcreateWorkDiv
) and we now usecms::alpakatools::uniform_elements
for kernel loops.kVerticalModuleSlope
(previously namedlst_INF
). We're doing this in two steps instead of one since the data files also need to be updated. We ensure a smooth transition by first supporting both options and later removing the legacy one.c.c. @slava77 @VourMa