Increase granularity of halo-exchange timing info #639
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #639      +/-   ##
==========================================
+ Coverage   42.85%   42.96%   +0.11%
==========================================
  Files          61       61
  Lines       16280    16314      +34
  Branches     1891     1882       -9
==========================================
+ Hits         6976     7010      +34
- Misses       8259     8260       +1
+ Partials     1045     1044       -1
```

☔ View full report in Codecov by Sentry.
Is it possible to adjust the NVTX range naming to make the hierarchy of the ranges more obvious? For example, …
Nice. I think a 'TSTEP-SUBSTEP' range makes sense (for RK3 you have 3 such substeps). This helps consolidate things. Related to #631
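To illustrate the naming scheme discussed here, below is a minimal C sketch (MFC itself is Fortran; rhs_stage is a hypothetical stand-in, not an MFC routine) of how nested, prefix-named NVTX ranges would expose the hierarchy in an nsys timeline, with a TSTEP-SUBSTEP range wrapping each of the three RK3 substeps:

```c
#include <nvToolsExt.h>   /* NVTX v2 header; NVTX v3 ships it as <nvtx3/nvToolsExt.h> */

/* Hypothetical stand-in for one RK3 stage of the RHS evaluation. */
void rhs_stage(void);

void time_step(void)
{
    nvtxRangePushA("TSTEP");
    for (int s = 0; s < 3; ++s) {        /* RK3: three substeps per time step */
        nvtxRangePushA("TSTEP-SUBSTEP");
        rhs_stage();                     /* inner RHS-* ranges nest in here */
        nvtxRangePop();                  /* closes TSTEP-SUBSTEP */
    }
    nvtxRangePop();                      /* closes TSTEP */
}
```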
@max-Hawkins would you mind updating/finishing this for merge?
Force-pushed from 1aac797 to 095d890
@henryleberre Ready for your evaluation.
needs edit: nvm
Thanks! A beauty. Merging.
Description
Previously, the NVTX ranges measuring the so-called 'MPI' time included the time spent packing and unpacking the contiguous buffers actually exchanged during the MPI_SENDRECV operation. While that accounting may make sense, to avoid confusion and to always be able to measure pure communication time, I renamed the 'RHS-MPI' NVTX range to 'RHS-MPI+BufPack' and added a new NVTX range, 'RHS-MPI_SENDRECV', around only the MPI_SENDRECV call.
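As a minimal sketch of the new range layout (written in C for brevity; MFC is Fortran, and pack_halo/unpack_halo are hypothetical stand-ins for the actual buffer routines), the outer range still covers packing and unpacking, while the inner range brackets only the communication call:

```c
#include <mpi.h>
#include <nvToolsExt.h>   /* NVTX v2 header; NVTX v3 ships it as <nvtx3/nvToolsExt.h> */

/* Hypothetical stand-ins for MFC's halo-buffer pack/unpack routines. */
void pack_halo(double *send_buf);
void unpack_halo(const double *recv_buf);

void exchange_halo(double *send_buf, double *recv_buf, int count,
                   int dest, int source, MPI_Comm comm)
{
    nvtxRangePushA("RHS-MPI+BufPack");   /* pack + exchange + unpack */
    pack_halo(send_buf);

    nvtxRangePushA("RHS-MPI_SENDRECV");  /* pure communication time */
    MPI_Sendrecv(send_buf, count, MPI_DOUBLE, dest,   0,
                 recv_buf, count, MPI_DOUBLE, source, 0,
                 comm, MPI_STATUS_IGNORE);
    nvtxRangePop();                      /* closes RHS-MPI_SENDRECV */

    unpack_halo(recv_buf);
    nvtxRangePop();                      /* closes RHS-MPI+BufPack */
}
```

With this split, subtracting the inner range's total from the outer range's total gives the pack/unpack overhead directly.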
Type of change
How Has This Been Tested?
I ran an example case under nsys with and without this change. The total time reported by the new RHS-MPI_SENDRECV NVTX range was within 5% of the time reported by the nsys MPI trace for this example.
See below for screenshots from the nsys reports. In this example, the MPI_SENDRECV time is ~1.4% of the total 'MPI' time.
This shows the nsys MPI trace timing info. Note the highlighted line's 'total time':
This is the NVTX range timing information. Note that the RHS-MPI_SENDRECV range's total time is close to the MPI trace total time above:
Test Configuration:
4 V100 nodes on Phoenix running the 2D shockbubble case for 700 timesteps.
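For reference, this kind of NVTX-vs-MPI-trace comparison can also be pulled from the saved report on the command line. A sketch, assuming a recent Nsight Systems release in which the summary reports are named nvtx_sum and mpi_event_sum (report names have changed across nsys versions):

```
nsys stats --report nvtx_sum      report.nsys-rep   # per-range NVTX totals
nsys stats --report mpi_event_sum report.nsys-rep   # MPI trace totals
```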
Checklist
- I ran ./mfc.sh format before committing my code

If your code changes any code source files (anything in src/simulation):

To make sure the code is performing as expected on GPU devices, I have:
- Enclosed the new feature in nvtx ranges so that they can be identified in profiles
- Run ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
- Run ./mfc.sh run XXXX --gpu -t simulation --omniperf, and have attached the output file and plain text results to this PR