Increase wallclock for analysis job #1997

Contributor

@WalterKolczynski-NOAA WalterKolczynski-NOAA commented Oct 30, 2023

Description

Increases the wallclock limit of the analysis job so the job completes on Orion.

Resolves #1996

Type of change

  • Bug fix (fixes something broken)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

Letting CI handle it

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion label Oct 30, 2023
@WalterKolczynski-NOAA WalterKolczynski-NOAA self-assigned this Oct 30, 2023
@emcbot emcbot added CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion labels Oct 30, 2023

emcbot commented Oct 30, 2023

Automated global-workflow Testing Results:

Machine: Orion
Start: Sun Oct 29 19:44:19 CDT 2023 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Checkout:                      *SUCCESS*
Checkout: Completed at Sun Oct 29 19:45:27 CDT 2023
Build:                         *SUCCESS*
Build: Completed at Sun Oct 29 20:12:15 CDT 2023
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 20:12:22 CDT 2023 for experiment C48_ATM_81d0b4ce
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 20:12:25 CDT 2023 for experiment C48_S2SA_gefs_81d0b4ce
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 20:12:28 CDT 2023 for experiment C48_S2SW_81d0b4ce
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 20:12:32 CDT 2023 for experiment C96_atm3DVar_81d0b4ce
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 20:12:37 CDT 2023 for experiment C96C48_hybatmDA_81d0b4ce


emcbot commented Oct 30, 2023

Experiment C48_S2SA_gefs_81d0b4ce completed: *SUCCESS*
Experiment C48_S2SA_gefs_81d0b4ce Completed at Sun Oct 29 21:30:17 CDT 2023
with 4 successfully completed jobs


emcbot commented Oct 30, 2023

Experiment C48_ATM_81d0b4ce completed: *SUCCESS*
Experiment C48_ATM_81d0b4ce Completed at Sun Oct 29 21:32:07 CDT 2023
with 29 successfully completed jobs

@emcbot emcbot added CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Oct 30, 2023

emcbot commented Oct 30, 2023

Experiment C96C48_hybatmDA_81d0b4ce Terminated: *FAILED*
Experiment C96C48_hybatmDA_81d0b4ce Terminated with 1 tasks failed at Sun Oct 29 23:06:11 CDT 2023
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/PR/1997/RUNTESTS/COMROT/C96C48_hybatmDA_81d0b4ce/logs/2021122100/gdasanal.log
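For anyone triaging these bot reports by hand, the per-experiment status lines can be pulled out mechanically. The following is a minimal sketch in Python, assuming the report keeps the line format shown in this thread; `summarize_report` and `STATUS_RE` are hypothetical names for illustration and are not part of global-workflow or its CI scripts.

```python
import re

# Sample lines copied verbatim from the bot report in this thread.
REPORT = """\
Experiment C48_S2SA_gefs_81d0b4ce completed: *SUCCESS*
Experiment C48_S2SA_gefs_81d0b4ce Completed at Sun Oct 29 21:30:17 CDT 2023
with 4 successfully completed jobs
Experiment C48_ATM_81d0b4ce completed: *SUCCESS*
Experiment C48_ATM_81d0b4ce Completed at Sun Oct 29 21:32:07 CDT 2023
with 29 successfully completed jobs
Experiment C96C48_hybatmDA_81d0b4ce Terminated: *FAILED*
Experiment C96C48_hybatmDA_81d0b4ce Terminated with 1 tasks failed at Sun Oct 29 23:06:11 CDT 2023
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/PR/1997/RUNTESTS/COMROT/C96C48_hybatmDA_81d0b4ce/logs/2021122100/gdasanal.log
"""

# Matches only the one-line status summary for each experiment (assumed format).
STATUS_RE = re.compile(
    r"^Experiment (\S+) (?:completed|Terminated): \*(SUCCESS|FAILED)\*$"
)


def summarize_report(text):
    """Return ({experiment: status}, [error log paths]) for one bot report."""
    statuses = {}
    error_logs = []
    in_logs = False
    for raw in text.splitlines():
        line = raw.strip()
        match = STATUS_RE.match(line)
        if match:
            statuses[match.group(1)] = match.group(2)
            in_logs = False
        elif line == "Error logs:":
            # Absolute paths on the following lines are treated as log files.
            in_logs = True
        elif in_logs and line.startswith("/"):
            error_logs.append(line)
        else:
            in_logs = False
    return statuses, error_logs


if __name__ == "__main__":
    statuses, error_logs = summarize_report(REPORT)
    failed = [name for name, status in statuses.items() if status == "FAILED"]
    print("failed experiments:", failed)
    print("error logs:", error_logs)
```

For the sample above, the only failed experiment is C96C48_hybatmDA_81d0b4ce, and the single error log is its gdasanal.log.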

@WalterKolczynski-NOAA WalterKolczynski-NOAA marked this pull request as draft October 30, 2023 04:15
@WalterKolczynski-NOAA WalterKolczynski-NOAA removed the CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed label Oct 30, 2023
@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion label Oct 30, 2023
@emcbot emcbot added CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion labels Oct 30, 2023
@emcbot

emcbot commented Oct 30, 2023

Automated global-workflow Testing Results:

Machine: Orion
Start: Sun Oct 29 23:20:44 CDT 2023 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Checkout:                      *SUCCESS*
Checkout: Completed at Sun Oct 29 23:21:50 CDT 2023
Build:                         *SUCCESS*
Build: Completed at Sun Oct 29 23:48:43 CDT 2023
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 23:48:50 CDT 2023 for experiment C48_ATM_eb9082fa
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 23:48:53 CDT 2023 for experiment C48_S2SA_gefs_eb9082fa
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 23:48:57 CDT 2023 for experiment C48_S2SW_eb9082fa
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 23:49:01 CDT 2023 for experiment C96_atm3DVar_eb9082fa
Created experiment:            *SUCCESS*
Case setup: Completed at Sun Oct 29 23:49:06 CDT 2023 for experiment C96C48_hybatmDA_eb9082fa


emcbot commented Oct 30, 2023

Experiment C48_ATM_eb9082fa completed: *SUCCESS*
Experiment C48_ATM_eb9082fa Completed at Mon Oct 30 01:06:06 CDT 2023
with 29 successfully completed jobs


emcbot commented Oct 30, 2023

Experiment C48_S2SA_gefs_eb9082fa completed: *SUCCESS*
Experiment C48_S2SA_gefs_eb9082fa Completed at Mon Oct 30 01:06:10 CDT 2023
with 4 successfully completed jobs

@emcbot emcbot added CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Oct 30, 2023

emcbot commented Oct 30, 2023

Experiment C96C48_hybatmDA_eb9082fa Terminated: *FAILED*
Experiment C96C48_hybatmDA_eb9082fa Terminated with 1 tasks failed at Mon Oct 30 03:20:18 CDT 2023
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/PR/1997/RUNTESTS/COMROT/C96C48_hybatmDA_eb9082fa/logs/2021122100/gdasanal.log

@WalterKolczynski-NOAA
Contributor Author

The issue seems to be something other than just the wall time.
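As a rough cross-check, the timestamps in the two bot reports give the time between case setup and termination of the failing C96C48_hybatmDA experiment in each CI run. The sketch below is Python with hypothetical helper names (`parse_stamp`, `elapsed`). It assumes both stamps in a pair share a timezone, which holds for the CDT stamps here. The result is only an upper bound on how long any single job ran; it is not the analysis job's own wall time, which would have to come from gdasanal.log or the scheduler.

```python
from datetime import datetime


def parse_stamp(stamp):
    """Parse a stamp such as 'Sun Oct 29 20:12:37 CDT 2023'.

    The timezone token is dropped because strptime's %Z handling is
    platform dependent; that is safe only when both stamps being
    compared use the same zone.
    """
    parts = stamp.split()
    parts.pop(4)  # remove the timezone abbreviation, e.g. "CDT"
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def elapsed(start, end):
    """Return the timedelta between two bot-report timestamps."""
    return parse_stamp(end) - parse_stamp(start)


# Case setup and termination times of C96C48_hybatmDA, copied from the reports.
first_run = elapsed("Sun Oct 29 20:12:37 CDT 2023", "Sun Oct 29 23:06:11 CDT 2023")
second_run = elapsed("Sun Oct 29 23:49:06 CDT 2023", "Mon Oct 30 03:20:18 CDT 2023")

if __name__ == "__main__":
    print("first run: ", first_run)   # 2:53:34
    print("second run:", second_run)  # 3:31:12
```

The failing experiment ran longer in the second CI run and still ended with a failed gdasanal task. That fits the observation above but does not by itself show the cause.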

@WalterKolczynski-NOAA
Contributor Author

Going to resurrect this and just increase wallclock to something that seems to be working until a better solution is found.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion and removed CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed labels Nov 7, 2023
@emcbot emcbot added CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion labels Nov 7, 2023

emcbot commented Nov 7, 2023

Automated global-workflow Testing Results:

Machine: Orion
Start: Mon Nov  6 21:36:38 CST 2023 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Checkout: Completed at Mon Nov  6 21:37:53 CST 2023
Build: *** FAILED ***
Build: Failed at Mon Nov  6 23:08:04 CST 2023
Build: see output at /work2/noaa/stmp/GFS_CI_ROOT/PR/1997/global-workflow/sorc/log.build
Failed on cloning and building global-workflowi PR: 1997
CI on Orion failed to build on Mon Nov  6 23:08:04 CST 2023 for repo https://github.com/NOAA-EMC/global-workflow.git

@RussTreadon-NOAA
Contributor

> Going to resurrect this and just increase wallclock to something that seems to be working until a better solution is found.

Agreed. Waiting for change to /work/noaa/nems/arichert/spack-stack-1.4.1-gw/envs/gw/install/modulefiles/Core permissions. I do not belong to the nems group. Thus, I cannot access /work/noaa/nems/arichert/spack-stack-1.4.1-gw.

@WalterKolczynski-NOAA
Contributor Author

Gonna close this again since it looks like the real fix has been found.

Labels: CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed
Linked issue: GSI-based analysis jobs dying after hitting wallclock limit on Orion
3 participants