Skip to content

Make marine recentering task run in parallel to bmat task #3805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

AndrewEichmann-NOAA
Copy link
Contributor

@AndrewEichmann-NOAA AndrewEichmann-NOAA commented Jun 16, 2025

Description

Changes marine DA ensemble recentering task to run in parallel to marine bmat task, from one after the other. Also adds removes conditional dependency to recentering task to marine analysis checkpoint task, which was previously unneeded required with the serial dependency. This is to alleviate an unnecessary bottleneck between long-running tasks.

Resolves #3835 (Originally NOAA-EMC/GDASApp#1665)

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

C96C48mx500_S2SW_cyc_gfs on Hera

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

The PR updates the marine ensemble recentering workflow so that the ocnanalecen task runs in parallel with the bmat task and adds the appropriate dependency to the marine analysis checkpoint task (marineanlchkpt). Key changes include:

  • Modifying ocnanalecen dependencies to use ocean history data and the enkfgdas_fcst metatask instead of waiting on marineanlvar.
  • Adjusting marineanlchkpt to always depend on marineanlvar and conditionally on ocnanalecen.
  • Simplifying dependency creation in marineanlfinal by removing the explicit dep_condition.
Comments suppressed due to low confidence (1)

dev/workflow/rocoto/gfs_tasks.py:817

  • [nitpick] Update this comment to indicate that ocnanalecen now runs in parallel with both marinebmat and the enkfgdas_fcst metatask, so its intent matches the implemented dependencies.
        # can run in parallel with marinebmat

if self.options['do_hybvar_ocn']:
dep_dict = {'type': 'task', 'name': f'{self.run}_ocnanalecen'}
else:
dep_dict = {'type': 'task', 'name': f'{self.run}_marineanlvar'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also mean that marineanlchkpt will run in parallel with marineanlvar for non-hybrid cases?

Copy link
Contributor Author

@AndrewEichmann-NOAA AndrewEichmann-NOAA Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently in develop, marineanlvar runs in both deterministic and ensemble setups, and inserts ocnanalecen serially after marineanlvar only with do_hybvar_ocn, so marineanlchkpt depends on either one or the other according to do_hybvar_ocn. The PR will have them run in parallel, so marineanlchkpt will depend on marineanlvar in every case, and ocnanalecen in addition to marineanlvar with do_hybvar_ocn.

Copy link
Contributor

@shlyaeva shlyaeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions/suggestions:

  • does ocnanalecen copy/link ensemble forecasts? checking that it's not relying on bmat job to do that copying/linking.
  • I think marineanlchkpt doesn't need to depend on ocnanalecen since recentering only recenters the ensemble forecasts around a deterministic forecast. Perhaps the dependency could be for the ensemble forecast for this cycle on ocnanalecen.

@AndrewEichmann-NOAA
Copy link
Contributor Author

AndrewEichmann-NOAA commented Jun 18, 2025

A couple questions/suggestions:

  • does ocnanalecen copy/link ensemble forecasts? checking that it's not relying on bmat job to do that copying/linking.
  • I think marineanlchkpt doesn't need to depend on ocnanalecen since recentering only recenters the ensemble forecasts around a deterministic forecast. Perhaps the dependency could be for the ensemble forecast for this cycle on ocnanalecen.
  • as it happens, both stage the background. I forget if this was deliberate or not. @guillaumevernieres ? Maybe that should go into the init task?
  • there was some file that ocnanalecen stages that marineanlchkpt was failing on, hence the added dependency. I should have recorded it but didn't, it was in a log that I deleted after the failed experiment.

@shlyaeva
Copy link
Contributor

there was some file that ocnanalecen stages that marineanlchkpt was failing on, hence the added dependency. I should have recorded it but didn't, it was in a log that I deleted after the failed experiment.

Interesting, I wouldn't expect anything in marineanlchkpt to depend on ocnanalecen. It may be worth fixing this and having no dependency in the analysis on the recentering at all (I think recentering runs a lot slower than even bmat and var combined). I'm happy to help with this, if you don't mind rerunning and finding out what the file was. Or also happy to go with the current solution if others feel like it's good enough.

@shlyaeva
Copy link
Contributor

as it happens, both stage the background.

Perfect!

@AndrewEichmann-NOAA
Copy link
Contributor Author

Interesting, I wouldn't expect anything in marineanlchkpt to depend on ocnanalecen. It may be worth fixing this and having no dependency in the analysis on the recentering at all (I think recentering runs a lot slower than even bmat and var combined). I'm happy to help with this, if you don't mind rerunning and finding out what the file was. Or also happy to go with the current solution if others feel like it's good enough.

I'll re-run without the second dependency and have ocnanalecen fail before doing anything.

@CatherineThomas-NOAA
Copy link
Contributor

I cloned the branch for the PR and compared XML files between two v17-like experiments. The differences I see align with what I would expect from the code changes:

  • ocnanalecen now depends on the previous cycle's forecasts
  • marineanlchkpt now depends on both ocnanalecen and marineanlvar
  • Extra <and> tags were removed from the marineanlfinal dependency

@AndrewEichmann-NOAA : Do you still have changes to make with regards to the chkpt task?

@AndrewEichmann-NOAA
Copy link
Contributor Author

I cloned the branch for the PR and compared XML files between two v17-like experiments. The differences I see align with what I would expect from the code changes:

  • ocnanalecen now depends on the previous cycle's forecasts
  • marineanlchkpt now depends on both ocnanalecen and marineanlvar
  • Extra <and> tags were removed from the marineanlfinal dependency

@AndrewEichmann-NOAA : Do you still have changes to make with regards to the chkpt task?

@CatherineThomas-NOAA I am setting up a test to see if marineanlchkpt really should depend on ocnanalecen as I found earlier, or at least determine why - I had tried earlier and ran into problems with the submodules aligning. I just completed C96C48mx500_S2SW_cyc_gfs with the PR as is and it would be good to go if everybody is comfortable with that.

@AndrewEichmann-NOAA
Copy link
Contributor Author

@CatherineThomas-NOAA @shlyaeva I ran C96C48mx500_S2SW_cyc_gfs with the now current hash, with the dependency of marineanlchkpt on ocnanalecen removed (the since pushed version) and it completed without problem, that is, without reproducing whatever I thought I saw earlier. Some of the changes in the GDASApp or jcb-gdas yamls might have changed that. In any case, this should be as ready as it's going to get.

Copy link
Contributor

@shlyaeva shlyaeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray! I think to be safe we also need a dependency of ensemble forecast on the recentering, since the recentering saves mom6 increments and cice restarts that are used in the ensemble forecast.

@AndrewEichmann-NOAA
Copy link
Contributor Author

Hooray! I think to be safe we also need a dependency of ensemble forecast on the recentering, since the recentering saves mom6 increments and cice restarts that are used in the ensemble forecast.

@shlyaeva Do you mean something parallel to the dependency on marineanlfinal added here?

if self.options['do_jediocnvar']:

I'd note that the dependency of marineanlchkpt on ocnanalecen, while being technically premature in the workflow, has the advantage of having been tested.

@shlyaeva
Copy link
Contributor

@shlyaeva Do you mean something parallel to the dependency on marineanlfinal added here?

I don't know for sure, I don't really know which task is which here. Maybe someone with experience in these tasks can comment.

I'd note that the dependency of marineanlchkpt on ocnanalecen, while being technically premature in the workflow, has the advantage of having been tested.

It's fine with me to rollback to having dependency of marineanlchkpt on ocnanalecen if everyone agrees.

@AndrewEichmann-NOAA
Copy link
Contributor Author

@shlyaeva @DavidHuber-NOAA @CatherineThomas-NOAA Any opinions on this?

@guillaumevernieres
Copy link
Contributor

@shlyaeva @DavidHuber-NOAA @CatherineThomas-NOAA Any opinions on this?

Having marineanlchkpt depend on ocnanalecen sounds good to me too.

@CatherineThomas-NOAA
Copy link
Contributor

@shlyaeva

I think to be safe we also need a dependency of ensemble forecast on the recentering

Correct. This dependency already exists in the workflow, so no need to add it in this PR.

@AndrewEichmann-NOAA : I think it's safer for now to roll back the marineanlchkpt change. You saw something previously and there might be a reason that dependency was there in the first place. As you said, the dependency being in place is something that's been thoroughly tested. Moving ocnanalecen to be parallel with marineanlvar is already going to give us a big time savings. I vote that we get this change in and if we're still having timing problems later, we can revisit the chkpt job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make marine DA recentering and var tasks run in parallel
5 participants