NCU Reader Support for RAJA_CUDA and Lambda_CUDA #201
Conversation
Changing this PR to "work in progress" because there are complications due to mismatches in demangled kernel names from Caliper and NCU.
This PR supersedes the https://github.com/LLNL/thicket/tree/rajaperf-paper branch used for the rajaperf paper.
Addressed in new changes.
Please add a pytest test for each of the cases in your table so we can make sure each is supported (Base_CUDA, Lambda_CUDA, RAJA_CUDA X rajaperf, cub, multiple
Do you want to remove the
Done
Looks good, just the question about disabling tqdm; otherwise I will approve.
thicket/ncu.py (Outdated)

@@ -113,8 +237,12 @@ def _read_ncu(thicket, ncu_report_mapping):
        pbar = tqdm(range)
Can they disable tqdm?
Option added in 0ef7c0d
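For context, tqdm's documented `disable` keyword is the usual way to expose such an option. A minimal sketch of threading it through (the `read_actions` helper and `show_progress` flag are illustrative, not Thicket's actual API):

```python
from tqdm import tqdm

def read_actions(actions, show_progress=True):
    # tqdm's `disable` keyword suppresses all progress output when True;
    # iteration over the underlying iterable is unchanged.
    processed = []
    for action in tqdm(actions, disable=not show_progress):
        processed.append(action)
    return processed

# No progress bar is printed when show_progress=False.
result = read_actions([1, 2, 3], show_progress=False)
```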
…similarity matching for cub kernels
LGTM
tldr:

- Enables NCU reader support for the `Lambda_CUDA` and `RAJA_CUDA` variants by using demangled kernel names.
- Adds a `debug` flag to see detailed information about kernel matches, so we can preliminarily investigate future issues without editing the source code.

Description
Enables support for reading NCU report profiles for `RAJA_CUDA` and `Lambda_CUDA` variants and `cub` kernels by using the demangled action name.

The current Thicket NCU reader matches nodes in a Caliper `cuda_activity_profile` (CAP) and an NCU report file by checking whether an action in the report has the name `action.name(_ncu_report.IAction_NameBase_FUNCTION)`, which for `Base_CUDA` is the name of the kernel (e.g. `daxpy`, `energy1`, or `energy2`). This name can be found in the CAP node name: `kernel_name in node.frame["name"]`.

For `RAJA_CUDA` and `Lambda_CUDA`, the above assumption does not hold, as the values of `action.name(_ncu_report.IAction_NameBase_FUNCTION)` will not be the kernel names. However, the kernel names are still embedded in the demangled action name, `action.name(_ncu_report.IAction_NameBase_DEMANGLED)`. This PR parses the demangled name to match the nodes in the CAP, which also works for `Base_CUDA` profiles.

For cub kernels, there may be kernels with the same name but different function signatures: for example, matching the NCU kernel `void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, double, int>::Policy700, false, false, double, double, int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>)` to the first `DeviceRadixSortDownsweepKernel` in the calltree. We use similarity matching with the standard-library difflib `SequenceMatcher` to match the two, after first narrowing the search down to the `Algorithm_SORT` part of the calltree.

NCU kernel support by variant:
This PR (#201)
Develop
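As a rough illustration of the demangled-name parsing described above, a kernel name can be recovered from a demangled signature by stripping the parameter list, template arguments, return type, and namespace qualifiers. The demangled string and the `extract_kernel_name` helper below are hypothetical sketches, not the PR's actual implementation (real cub signatures, with nested `<` and `::`, need more care):

```python
import re

# Hypothetical demangled name, of the kind returned by
# action.name(_ncu_report.IAction_NameBase_DEMANGLED).
demangled = "void rajaperf::basic::daxpy<256ul>(double*, double*, double, long)"

def extract_kernel_name(demangled_name):
    # Drop the parameter list.
    name = demangled_name.split("(")[0]
    # Drop template arguments (naive: cuts at the first '<').
    name = re.sub(r"<.*", "", name)
    # Drop the return type, then namespace qualifiers.
    return name.split()[-1].split("::")[-1]

kernel = extract_kernel_name(demangled)  # "daxpy"
# The CAP match is then a containment check on the node name,
# e.g. kernel in node.frame["name"].
```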
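The difflib `SequenceMatcher` similarity matching for ambiguous cub kernel names can be sketched as follows; the candidate names are illustrative stand-ins for the CAP calltree nodes under `Algorithm_SORT`:

```python
from difflib import SequenceMatcher

def best_match(target, candidates):
    # Return the candidate whose string similarity to target is highest.
    return max(candidates, key=lambda c: SequenceMatcher(None, target, c).ratio())

# Illustrative kernel names; the real candidates come from the
# Algorithm_SORT part of the CAP calltree.
candidates = [
    "DeviceRadixSortDownsweepKernel",
    "DeviceRadixSortUpsweepKernel",
    "DeviceRadixSortScanBinsKernel",
]
ncu_kernel = "DeviceRadixSortDownsweepKernel<Policy700, false, false, double, double, int>"

match = best_match(ncu_kernel, candidates)  # "DeviceRadixSortDownsweepKernel"
```

Restricting the search to one subtree first keeps near-identical kernel names from other algorithms out of the comparison.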