Updates for cycling C1152 ATM #3173

Open
CatherineThomas-NOAA opened this issue Dec 17, 2024 · 7 comments

Comments

@CatherineThomas-NOAA
Contributor

CatherineThomas-NOAA commented Dec 17, 2024

What is wrong?

This issue is for tracking the updates needed to cycle C1152 atmosphere-only. Changes found so far in testing:

  • config.resources(.$machine): include C1152 case options for DA tasks
  • gdasfcst config.ufs: WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS=20
  • source code update for calc_analysis (PR#59): new solution proposed - untested
  • fix needed for atmanl upp: solution proposed - untested

What should have happened?

C1152 ATM-only should cycle without failure on WCOSS2.

What machines are impacted?

All or N/A, WCOSS2

What global-workflow hash are you using?

bc61862

Steps to reproduce

  • Clone and build develop
  • Set up a C1152/C384 ATM-only cycling experiment on WCOSS2
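For a sense of scale between the two resolutions in this setup, here is a minimal sketch. It assumes the usual cubed-sphere naming, where "CN" means N x N cells on each of six tiles. The helper name `cubed_sphere_cells` is hypothetical and is not part of global-workflow.

```python
def cubed_sphere_cells(res: int) -> int:
    """Horizontal cell count for a CN grid, assuming 6 tiles of res x res cells."""
    return 6 * res * res


if __name__ == "__main__":
    for res in (1152, 384):
        print(f"C{res}: {cubed_sphere_cells(res):,} horizontal cells")
    # Ratio between the two resolutions named in the reproduction steps
    ratio = cubed_sphere_cells(1152) / cubed_sphere_cells(384)
    print(f"C1152 / C384 cell ratio: {ratio:.0f}")
```

Under that assumption, the C1152 grid carries nine times as many horizontal cells as the C384 grid, which is consistent with the C1152 DA tasks needing their own resource options.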

Additional information

Logs and expdirs have been copied to Hera:
/scratch1/NCEPDEV/da/Catherine.Thomas/v17/c1152

Do you have a proposed solution?

Each problem and its potential solutions are being documented in this issue as they arise. Once C1152/C384 ATM-only can run without error on WCOSS2, I will open a PR with the needed changes.

@CatherineThomas-NOAA added the bug (Something isn't working) and triage (Issues that are triage) labels Dec 17, 2024
@CatherineThomas-NOAA
Contributor Author

@WenMeng-NOAA: Would you be able to take a look at the atmanl_upp failure? I copied the log file to Hera here:
/scratch1/NCEPDEV/da/Catherine.Thomas/v17/c1152/logs/log_c1152_atm/2021070112/gdas_atmanlupp.log

@WenMeng-NOAA
Contributor

@CatherineThomas-NOAA I will look into this issue.

@WenMeng-NOAA
Contributor

@CatherineThomas-NOAA From your runtime log:

2024-12-15 05:11:11,496 - DEBUG    - upp         : ( <exe: ['mpiexec', '-l', '-n', '120', '-ppn', '120', '--cpu-bind', 'depth', '--depth', '1', '/lfs/h2/emc/stmp/catherine.thomas/RUNDIRS/c1152atm/gdas.2021070112/upp.147572/upp.x']> )

I would suggest increasing the total tasks from 120 to 200 or more and reducing the tasks per node so that each task has more memory available for C1152. My own standalone UPP test for C1152 on WCOSS2 is configured as: mpiexec -l -ppn 40 -n 240
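The effect of that change can be sketched as follows. This is a rough illustration, not how global-workflow builds its launch command: the helper names `layout` and `mpiexec_cmd` are hypothetical, `NODE_MEM_GB` is a placeholder rather than a confirmed WCOSS2 figure, and memory is assumed to be shared evenly among the tasks on a node. The mpiexec flags are copied from the log line and the suggested command above.

```python
import math

# Placeholder only: substitute the real usable memory per node on your machine.
NODE_MEM_GB = 512


def layout(total_tasks: int, tasks_per_node: int, node_mem_gb: float):
    """Return (nodes needed, memory per task in GB), assuming an even split."""
    nodes = math.ceil(total_tasks / tasks_per_node)
    mem_per_task_gb = node_mem_gb / tasks_per_node
    return nodes, mem_per_task_gb


def mpiexec_cmd(total_tasks: int, tasks_per_node: int, exe: str):
    """Build an mpiexec argument list using the flags seen in the runtime log."""
    return ["mpiexec", "-l", "-n", str(total_tasks),
            "-ppn", str(tasks_per_node),
            "--cpu-bind", "depth", "--depth", "1", exe]


if __name__ == "__main__":
    # (total tasks, tasks per node): failing run vs. suggested standalone setup
    for tasks, ppn in ((120, 120), (240, 40)):
        nodes, mem = layout(tasks, ppn, NODE_MEM_GB)
        print(f"-n {tasks} -ppn {ppn}: {nodes} node(s), ~{mem:.1f} GB per task")
        print("  " + " ".join(mpiexec_cmd(tasks, ppn, "upp.x")))
```

Whatever the actual node memory is, going from 120 to 40 tasks per node triples the memory available to each task under the even-split assumption, at the cost of using six nodes instead of one.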

@CatherineThomas-NOAA
Contributor Author

@WenMeng-NOAA: Thanks for the configuration suggestion. I'll try it once WCOSS2 is back.

@CatherineThomas-NOAA
Contributor Author

Description updated to replace NOAA-EMC/GSI-utils#57 with NOAA-EMC/GSI-utils#59. Testing of NOAA-EMC/GSI-utils#59 is on hold until WCOSS2 returns.

CatherineThomas-NOAA added a commit to CatherineThomas-NOAA/global-workflow that referenced this issue Dec 19, 2024
A handful of updates are needed to cycle with C1152
atmosphere, which are mostly related to configs and
resources.

Refs: NOAA-EMC#3173
@CatherineThomas-NOAA
Contributor Author

Config changes have been included in my fork here. These config changes were added by hand based on previous WCOSS testing. This particular branch has only been tested on Hera to make sure it didn't fail when creating the XML. Once WCOSS returns, it can be tested properly with C1152 including the updates to GSI-utils and upp node configuration.

@CatherineThomas-NOAA
Contributor Author

I reran my C1152 ATM-only cycling experiment with my fork and changes from NOAA-EMC/GSI-utils#59.

The analcalc and atmanl upp jobs completed without error. As a sanity check, I plotted the gsistats over a few cycles and results were consistent with operations, with some notable improvements:
[Figure: gsistat_uvtq_RMSE]

I had an error in the archive step that was related to obs, but I jury-rigged the DMPDIR for this case, so it is likely not a typical error. Unfortunately, my logs have been scrubbed, so I can't follow up.

@aerorahul removed the bug (Something isn't working) and triage (Issues that are triage) labels Jan 3, 2025

3 participants