-
Notifications
You must be signed in to change notification settings - Fork 49
Stochastic Physics CPU and GPU Optimizations - NGARCH #65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Stochastic Physics CPU and GPU Optimizations - NGARCH #65
Conversation
|
Hi Jason, thanks for moving this over to GitHub. I've left a few things for you to address before I can approve the SciTech review. These are all style fixes -
|
mo-alistairp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes functionally look good. I've left inline comments requesting changes - they are all style changes (mostly to do with spacing and alignment). I also will need you to fix all the pep8 failures in the Python scripts, which can be seen by running the flake8 command.
interfaces/physics_schemes_interface/source/algorithm/spt_main_alg_mod.x90
Outdated
Show resolved
Hide resolved
interfaces/physics_schemes_interface/source/algorithm/spt_main_alg_mod.x90
Outdated
Show resolved
Hide resolved
interfaces/physics_schemes_interface/source/kernel/stph/skeb_biharm_diss_kernel_mod.F90
Outdated
Show resolved
Hide resolved
...ces/physics_schemes_interface/source/kernel/stph/spt_convection_cfl_limit_cap_kernel_mod.F90
Outdated
Show resolved
Hide resolved
interfaces/physics_schemes_interface/source/kernel/stph/spt_levels_cap_kernel_mod.F90
Outdated
Show resolved
Hide resolved
interfaces/physics_schemes_interface/source/kernel/stph/spt_levels_cap_kernel_mod.F90
Outdated
Show resolved
Hide resolved
...rfaces/physics_schemes_interface/source/kernel/stph/spt_moisture_conservation_kernel_mod.F90
Outdated
Show resolved
Hide resolved
interfaces/physics_schemes_interface/source/kernel/stph/spt_orog_cap_kernel_mod.F90
Outdated
Show resolved
Hide resolved
328a2d9 to
8cefbfb
Compare
|
Hi @mo-alistairp , thanks for the comments of the styling issues. I have addresssed them and so re-request your review now. |
mo-alistairp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making those changes it is indeed passing lint checks now. I'm happy with these changes so SciTech review: PASS. I'll pass this onto code review now with @MetBenjaminWent
MetBenjaminWent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Developer tests all pass, and a local check shows that core CPU OMP functionality for EXs remains unchanged, but also has GPU functionality for these 3x scripts.
Whilst I would have liked to see these scripts tidied up and broken into functions, I recognise that it would block this ticket.
I'm approving this ticket on technical groups, but noting a technical debt included in these scripts.
| const = LFRicConstants() | ||
| cpu_parallel = OMPParallelTrans() | ||
|
|
||
| if OFFLOAD_DIRECTIVES == "omp": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good way to ensure this is set, however, could these checks be nested inside a function to reduce the length of this script?
For PSYKAL LFRic API, these would be added here:
https://github.com/MetOffice/lfric_core/blob/main/infrastructure/build/psyclone/psyclone_tools.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can add this in a new PR if needed.
|
|
||
| # Make setval_* compute redundantly to the level 1 halo if it | ||
| # is in its own loop | ||
| for loop in subroutine.loops(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, could this functionality be nested inside a function to reduce the length of this script?
For PSYKAL LFRic API, these would be added here:
https://github.com/MetOffice/lfric_core/blob/main/infrastructure/build/psyclone/psyclone_tools.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant_computation__setval ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can add this in a new PR if needed. It is indeed doing the same thing as redundant_computation__setval.
|
|
||
| # Colour loops over cells unless they are on discontinuous spaces | ||
| # (alternatively we could annotate the kernels with atomics) | ||
| for loop in subroutine.loops(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, could this functionality be nested inside a function to reduce the length of this script?
For PSYKAL LFRic API, these would be added here:
https://github.com/MetOffice/lfric_core/blob/main/infrastructure/build/psyclone/psyclone_tools.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is colour_loops ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is colour_loops essentially.
|
|
||
| # Mark kernels inside the loops over cells as GPU-enabled | ||
| # and inline them. | ||
| for loop in subroutine.loops(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, could this functionality be nested inside a function to reduce the length of this script?
For PSYKAL LFRic API, these would be added here:
https://github.com/MetOffice/lfric_core/blob/main/infrastructure/build/psyclone/psyclone_tools.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can add this in a new PR if needed.
| ) | ||
| print(f"GPU-annotated kernel '{kern.name}'") | ||
|
|
||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much I trust nested try statements, but given the dependency between annotating the kernel, then in-lining it, this doesn't seem unreasonable.
| ) | ||
| for name in kernel_names | ||
| ): | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However here, could the if's and the try wrapper around the choices be reversed?
Have the ifs for the options, and then trys for each option?
It just seems a bit safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can verify this in the new PR.
| # arrays and convert to code unsupported intrinsics. | ||
|
|
||
| # Add GPU offloading to loops unless they are over colours or are null. | ||
| for loop in subroutine.walk(Loop): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, could this functionality be nested inside a function to reduce the length of this script?
For PSYKAL LFRic API, these would be added here:
https://github.com/MetOffice/lfric_core/blob/main/infrastructure/build/psyclone/psyclone_tools.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can add this in a new PR if needed.
|
|
||
| # Apply OpenMP thread parallelism for any kernels we've not been able | ||
| # to offload to GPU. | ||
| for loop in subroutine.walk(Loop): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, could this functionality be nested inside a function to reduce the length of this script?
For PSYKAL LFRic API, these would be added here:
https://github.com/MetOffice/lfric_core/blob/main/infrastructure/build/psyclone/psyclone_tools.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can add this in a new PR if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from applications/lfric_atm/optimisation/meto-ex1a/psykal/algorithm/runtime_constants/physics_constants_mod.py are also applicable here as both scripts appear very similar, indicating duplicated code?
Where there are minor differences, these could be function options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from applications/lfric_atm/optimisation/meto-ex1a/psykal/algorithm/runtime_constants/physics_constants_mod.py are also applicable here as both scripts appear very similar, indicating duplicated code?
Where there are minor differences, these could be function options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is correct.
|
Hi @jasonjunweilyu would you be able to fully reverify all the commits for us so we can commit this? Thanks! https://metoffice.github.io/simulation-systems/WorkingPractices/gh_authorisation.html
|
…n git_migration tag
c1dd404 to
70f44a1
Compare
|
Hi @MetBenjaminWent , thanks a lot for your review. I have reverified my first two commits with |
MetBenjaminWent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re approving on request of James Bruten to test workflow retrigger.
|
Hi @MetBenjaminWent , I have just resolve conflicts between Please let me know if you need me to do anything else. Otherwise, I will wait for you to merge this branch in without re-requesting review from you. Thanks. |

PR Summary
Sci/Tech Reviewer: @mo-alistairp
Code Reviewer: @MetBenjaminWent
CC: @hiker
This completes the work of NGARCH stochastic physics optimization for CPU and GPU by migrating from fcm to git to be merged to
main. Details of the completed work are documented in ticket:543.Code Quality Checklist
(Some checks are automatically carried out via the CI pipeline)
style guidelines
readability of the code
Testing
acceptable (eg. kgo changes)
tests, unit tests, etc.)
and have tests been allocated to an appropriate testing group (i.e. the
developer tests are for jobs which use a small amount of compute resource
and complete in a matter of minutes)
Since only the rose-stem climate config is now using stochastic physics, I have run the
lfric_atm_clim_gal9_nomg-C12_gadi_intel_fast-debug-64bitavailable on NCI GADI. The build and run jobs finished successfully. For KGO verification, since KGOs on GADI have not been updated, I have compared the KGO with the one produced by running the trunk version I forked from and confirmed that they are identical.trac.log
Security Considerations
Performance Impact
performance measurements have been conducted
AI Assistance and Attribution
of Generative AI tool name (e.g., Met Office Github Copilot Enterprise,
Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the
Simulation Systems AI policy
(including attribution labels)
Documentation
confirmed that it builds correctly
PSyclone Approval
inteface, optimisation scripts, LFRic data structure code) then please
contact the
tooscollabdevteam@metoffice.gov.uk
Sci/Tech Review
Please alert the code reviewer via a tag when you have approved the SR
Code Review