NCU Reader Support for RAJA_CUDA and Lambda_CUDA #201
Conversation
Changing this PR to "work in progress" because there are complications due to mismatches in demangled kernel names from Caliper and NCU.
This PR supersedes the https://github.com/LLNL/thicket/tree/rajaperf-paper branch used for the rajaperf paper.
Addressed in new changes.
Please add a pytest test for each of the cases in your table so we can make sure each is supported (Base_CUDA, Lambda_CUDA, RAJA_CUDA X rajaperf, cub, multiple
Do you want to remove the
Done
Looks good, just the question about disabling tqdm; otherwise I will approve.
thicket/ncu.py (Outdated)

@@ -113,8 +237,12 @@ def _read_ncu(thicket, ncu_report_mapping):
        pbar = tqdm(range)
Can they disable tqdm?
Option added in 0ef7c0d
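For context, tqdm's documented `disable` keyword is the usual way to expose such an option. A minimal sketch of threading it through (the `read_actions` helper and `show_progress` flag are illustrative, not Thicket's actual API):

```python
from tqdm import tqdm

def read_actions(actions, show_progress=True):
    # tqdm's `disable` keyword suppresses all progress output when True;
    # iteration over the underlying iterable is unchanged.
    processed = []
    for action in tqdm(actions, disable=not show_progress):
        processed.append(action)
    return processed

# No progress bar is printed when show_progress=False.
result = read_actions([1, 2, 3], show_progress=False)
```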
…similarity matching for cub kernels
LGTM
tldr:

- Enables NCU reader support for the `Lambda_CUDA` and `RAJA_CUDA` variants by using demangled kernel names.
- Adds a `debug` flag to see detailed information about kernel matches, so we can preliminarily investigate future issues without editing the source code.

Description
Enables support for reading NCU report profiles for `RAJA_CUDA` and `Lambda_CUDA` variants and `cub` kernels by using the demangled action name.

The current Thicket NCU reader matches nodes in a Caliper `cuda_activity_profile` (CAP) and an NCU report file by checking whether an action in the report has the name `action.name(_ncu_report.IAction_NameBase_FUNCTION)`, which for `Base_CUDA` is the name of the kernel (e.g. `daxpy`, `energy1`, or `energy2`). This name can be found in the CAP node name: `kernel_name in node.frame["name"]`.

For `RAJA_CUDA` and `Lambda_CUDA`, the above assumption does not hold, as the values of `action.name(_ncu_report.IAction_NameBase_FUNCTION)` will not be the kernel names. However, the kernel names are still embedded in the demangled action name, `action.name(_ncu_report.IAction_NameBase_DEMANGLED)`. This PR parses the demangled name to match the nodes in the CAP, which also works for `Base_CUDA` profiles.

For cub kernels, there may be kernels with the same name but different function signatures: for example, matching the NCU kernel `void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, double, int>::Policy700, false, false, double, double, int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>)` to the first `DeviceRadixSortDownsweepKernel` in the calltree. We use similarity matching with the standard-library difflib `SequenceMatcher` to match the two, after first narrowing the search down to the `Algorithm_SORT` part of the calltree.

NCU kernel support by variant:
This PR (#201)
Develop
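As a rough illustration of the demangled-name parsing described above, a kernel name can be recovered from a demangled signature by stripping the parameter list, template arguments, return type, and namespace qualifiers. The demangled string and the `extract_kernel_name` helper below are hypothetical sketches, not the PR's actual implementation (real cub signatures, with nested `<` and `::`, need more care):

```python
import re

# Hypothetical demangled name, of the kind returned by
# action.name(_ncu_report.IAction_NameBase_DEMANGLED).
demangled = "void rajaperf::basic::daxpy<256ul>(double*, double*, double, long)"

def extract_kernel_name(demangled_name):
    # Drop the parameter list.
    name = demangled_name.split("(")[0]
    # Drop template arguments (naive: cuts at the first '<').
    name = re.sub(r"<.*", "", name)
    # Drop the return type, then namespace qualifiers.
    return name.split()[-1].split("::")[-1]

kernel = extract_kernel_name(demangled)  # "daxpy"
# The CAP match is then a containment check on the node name,
# e.g. kernel in node.frame["name"].
```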
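The difflib `SequenceMatcher` similarity matching for ambiguous cub kernel names can be sketched as follows; the candidate names are illustrative stand-ins for the CAP calltree nodes under `Algorithm_SORT`:

```python
from difflib import SequenceMatcher

def best_match(target, candidates):
    # Return the candidate whose string similarity to target is highest.
    return max(candidates, key=lambda c: SequenceMatcher(None, target, c).ratio())

# Illustrative kernel names; the real candidates come from the
# Algorithm_SORT part of the CAP calltree.
candidates = [
    "DeviceRadixSortDownsweepKernel",
    "DeviceRadixSortUpsweepKernel",
    "DeviceRadixSortScanBinsKernel",
]
ncu_kernel = "DeviceRadixSortDownsweepKernel<Policy700, false, false, double, double, int>"

match = best_match(ncu_kernel, candidates)  # "DeviceRadixSortDownsweepKernel"
```

Restricting the search to one subtree first keeps near-identical kernel names from other algorithms out of the comparison.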