Initial implementation of pyscream module(s) for running eamxx from python #2851

Merged
merged 47 commits into from
Jun 25, 2024

Conversation

bartgol
Contributor

@bartgol bartgol commented May 29, 2024

The goal is to run a single parametrization from a Python script. This requires a few hooks, which this PR tries to put together. Work is still ongoing, and the PR may be rebased/reset/closed/reopened.

The idea is to create a separate python module for each scream library:

  • pyscream: contains all infrastructure hooks. Currently it has wrappers for
    • ekat's Units
    • scream Field
    • scream Grid
  • pyp3: contains a wrapper of P3Microphysics.

The wrappers do NOT expose all the methods of the wrapped class, nor are they built in the same way as the wrapped class. Instead, we expose the minimum necessary to create, set up, and use scream objects.

If you want to try building this, you need to

  • pip install mpi4py pybind11
  • run test-all-scream with -c EAMXX_ENABLE_PYBIND=ON -c BUILD_SHARED_LIBS=ON

Lots of stuff to do/decide

  • Add method to extract numpy array from PyField
  • Add wrappers for timestamp (needed for init)
  • Add wrappers for reading fields from IC files
  • Add hooks to (re)set parameters inside parametrizations. What if this requires re-running initialize? Can we re-init an atm proc? I don't think so. Alternative: re-create the atm proc every time.
  • Run init/finalize of eamxx session when pyscream is imported/unloaded (the latter may be done with atexit)
  • Allow to set geometry data in the grid? E.g., lat/lon/area?
  • Do units really matter? Can we set in/out fields with bad units? Do atm procs notice it or do they just check the field name/layout?
  • Other?

@bartgol bartgol requested a review from mahf708 May 29, 2024 06:34
@bartgol bartgol self-assigned this May 29, 2024
@bartgol
Contributor Author

bartgol commented May 29, 2024

Here's an example of how it can be used. It still crashes, but gives an idea. Also, forget about units, I got tired and just wanted to get all fields in.

#!/usr/bin/env python3

# These lines won't be needed once we package things correctly
import sys
sys.path.append('/home/lbertag/workdir/scream/scream-src/master/components/eamxx/ctest-build/full_debug/src/share/python')
sys.path.append('/home/lbertag/workdir/scream/scream-src/master/components/eamxx/ctest-build/full_debug/src/physics/p3/python')

from mpi4py import MPI
import pyscream as ps
import pyp3 as P3
from pyscream.units import m,s,kg,K,N,J,W,Pa,one

#######################################
def run ():
#########################################

    # Create a comm, and print specs
    comm = MPI.COMM_WORLD
    print (f"hello from rank{comm.Get_rank()}/{comm.Get_size()}")

    # Create a physics-like grid
    ncols = 100
    nlevs = 128
    grid = ps.Grid("Physics",ncols,nlevs)

    # Check that units operators overloads work
    m2 = m*m
    Wm2 = W / m2
    print (f"Wm2: {Wm2.str()}")

    # Create a bunch of (managed) fields
    T_mid = grid.scalar3d_mid("T_mid",K)
    qv = grid.scalar3d_mid("qv",kg/kg)
    qc = grid.scalar3d_mid("qc",kg/kg)
    qi = grid.scalar3d_mid("qi",kg/kg)
    qr = grid.scalar3d_mid("qr",kg/kg)
    qm = grid.scalar3d_mid("qm",kg/kg)
    nc = grid.scalar3d_mid("nc",one/kg)
    nr = grid.scalar3d_mid("nr",one/kg)
    ni = grid.scalar3d_mid("ni",one/kg)
    bm = grid.scalar3d_mid("bm",one/kg)
    precip_l = grid.scalar2d("precip_liq_surf_mass",kg)
    precip_i = grid.scalar2d("precip_ice_surf_mass",kg)
    eff_r_qc = grid.scalar3d_mid("eff_radius_qc",m)
    eff_r_qi = grid.scalar3d_mid("eff_radius_qi",m)
    eff_r_qr = grid.scalar3d_mid("eff_radius_qr",m)
    p_mid = grid.scalar3d_mid("p_mid",m)
    p_dry_mid = grid.scalar3d_mid("p_dry_mid",m)
    rho = grid.scalar3d_mid("pseudo_density",m)
    rho_dry = grid.scalar3d_mid("pseudo_density_dry",m)
    cldfrac_tot = grid.scalar3d_mid("cldfrac_tot",m)
    qv_prev = grid.scalar3d_mid("qv_prev_micro_step",m)
    T_prev = grid.scalar3d_mid("T_prev_micro_step",m)
    rain_frac = grid.scalar3d_mid("rainfrac",m)
    nc_nuceat_tend = grid.scalar3d_mid("nc_nuceat_tend",m)
    nccn = grid.scalar3d_mid("nccn",m)
    ni_activated = grid.scalar3d_mid("ni_activated",m)
    inv_qc_relvar = grid.scalar3d_mid("inv_qc_relvar",m)
    liq_ice_exchange = grid.scalar3d_mid("micro_liq_ice_exchange",m)
    vap_liq_exchange = grid.scalar3d_mid("micro_vap_liq_exchange",m)
    vap_ice_exchange = grid.scalar3d_mid("micro_vap_ice_exchange",m)

    # Create and init p3
    p3 = P3.P3(grid)
    p3.set_fields([qv,T_mid,qc,qi,qr,qm,nc,nr,ni,bm,precip_l,precip_i,eff_r_qc,eff_r_qi,eff_r_qr,p_mid,p_dry_mid,rho,rho_dry,cldfrac_tot,qv_prev,rain_frac,nc_nuceat_tend,nccn,ni_activated,inv_qc_relvar,T_prev,liq_ice_exchange,vap_liq_exchange,vap_ice_exchange])
    p3.initialize()

#######################################
def main ():
#######################################
    # This level of indirection ensures all pybind structs are destroyed
    # before we finalize eamxx (and hence kokkos)
    ps.init()
    run ()
    ps.finalize()

####################################
if __name__ == "__main__":
    main()

@PeterCaldwell
Contributor

quick thought - we should probably call the library pyeamxx or something like that since folks will be using it for low-res applications as well. Though pyeamxx sounds lame. py3sm? eampy?

@mahf708
Contributor

mahf708 commented May 29, 2024

quick thought - we should probably call the library pyeamxx or something like that since folks will be using it for low-res applications as well. Though pyeamxx sounds lame. py3sm? eampy?

The main package (currently someone --- I don't know who!!! --- is calling it screaminpy) should definitely be changed to something more agreeable.

The names of wrappers of components should match their nearest neighbor imho (so things like pyscream, as the python of scream_share, should remain as internal components)

@bartgol
Contributor Author

bartgol commented May 29, 2024

quick thought - we should probably call the library pyeamxx or something like that since folks will be using it for low-res applications as well. Though pyeamxx sounds lame. py3sm? eampy?

I thought about that too. pyscream is a bit off, since scream is an eamxx configuration. As you said, pyeamxx just doesn't roll off the tongue. Alas, I think being less misleading is better than being easy to pronounce, so maybe we will switch to pyeamxx at some point. I would vote against py3sm and eampy, since they give false impressions (we are not interfacing to e3sm or eam).

@mahf708
Contributor

mahf708 commented May 29, 2024

quick thought - we should probably call the library pyeamxx or something like that since folks will be using it for low-res applications as well. Though pyeamxx sounds lame. py3sm? eampy?

I thought about that too. pyscream is a bit off, since scream is an eamxx configuration. As you said, pyeamxx just doesn't roll off the tongue. Alas, I think being less misleading is better than being easy to pronounce, so maybe we will switch to pyeamxx at some point. I would vote against py3sm and eampy, since they give false impressions (we are not interfacing to e3sm or eam).

I can switch the overall package (in eamxx/src/python) to pyeamxx later today. I don't think we need to change pyscream to that since if I understand your logic correctly it is just a wrapper for scream_share lib (which is an actual thing in our code right now, eamxx/src/share compiles to scream_share).

I think it is important to think about this in terms of what faces the user: ultimately, the user will only interact with the package in eamxx/src/python, which will have more python-centric tools that we will write (for example, hiding away all of the details of init, grid, etc. unless the user wants to override them). For your code snippet above, I envision the user doing:

from pyeamxx import pyscream as ps
from pyeamxx import pyp3 as P3

#######################################
def run ():
#########################################

    # Create a physics-like grid
    ncols = 100
    nlevs = 128
    with ps.Grid("Physics",ncols,nlevs) as psg:
      # this with-as logic allows us to do a lot of init/de-init stuff
      # by merely entering and exiting this block, including MPI
      # Create and init p3
      p3 = P3.P3(psg)
      # we will write an abstraction such that this
      # P3.P3 call endows psg with all the fields needed
      # init-ed to something reasonable (or via p3_input.yaml if available)
      # users can access them as p3.T_mid, etc.
      p3.initialize()
      # we can add more convenient functionalities here based on user feedback
      # exposing/hiding things as needed.
      
      # NOTE: we will enable the user to override our hiding of things!

#######################################
def main ():
#######################################
    run ()

####################################
if __name__ == "__main__":
    main()
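
A minimal sketch of how the with-block above could be backed, assuming a context manager that initializes the session, hands out a grid, and tears everything down in reverse order. `backend_init`, `backend_finalize`, `Grid.release` and `grid_session` are hypothetical names used only for illustration.

```python
from contextlib import contextmanager

events = []  # call trace, used only to make the ordering visible


def backend_init():
    # Hypothetical stand-in for the session init hook.
    events.append("init")


def backend_finalize():
    # Hypothetical stand-in for the session finalize hook.
    events.append("finalize")


class Grid:
    """Hypothetical pure-Python stand-in for a wrapped grid."""

    def __init__(self, name, ncols, nlevs):
        self.name, self.ncols, self.nlevs = name, ncols, nlevs

    def release(self):
        # A real wrapper would drop its handle to the C++ object here.
        events.append(f"release {self.name}")


@contextmanager
def grid_session(name, ncols, nlevs):
    """Init the session, hand out a grid, and tear down in reverse order."""
    backend_init()
    grid = Grid(name, ncols, nlevs)
    try:
        yield grid
    finally:
        # Release wrapped objects first, then finalize, even on exceptions.
        grid.release()
        backend_finalize()


with grid_session("Physics", 100, 128) as psg:
    events.append(f"use {psg.ncols}x{psg.nlevs}")
```

The `finally` clause plays the same role as the extra level of indirection in the earlier script: wrapped objects go away before the session is finalized.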

@bartgol
Contributor Author

bartgol commented May 29, 2024

Uhm, that may be a better approach. Do not let the user create fields, just access the ones they need. I'll work on that. It may also allow us to trim down the pygrid file, since we don't need to expose those capabilities anymore.

@mahf708
Contributor

mahf708 commented May 29, 2024

Uhm, that may be a better approach. Do not let the user create fields, just access the ones they need. I'll work on that. It may also allow us to trim down the pygrid file, since we don't need to expose those capabilities anymore.

I don't think you need to change anything on the cpp end, we can do all the hiding/exposing in python 😉
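
To illustrate doing the hiding/exposing on the Python side, here is a rough sketch of a facade that creates the fields a process needs and exposes them as attributes, while still letting the user pass in their own. `FieldStub` and `P3Facade` are hypothetical, and the field list is only a subset of the names used earlier in this thread.

```python
class FieldStub:
    """Hypothetical stand-in for a wrapped field; holds only a name and data."""

    def __init__(self, name, ncols, nlevs, fill=0.0):
        self.name = name
        self.data = [[fill] * nlevs for _ in range(ncols)]


class P3Facade:
    """Creates the fields a process needs and exposes them as attributes."""

    # Illustrative subset of the field names used earlier in this thread.
    REQUIRED = ("T_mid", "qv", "qc")

    def __init__(self, ncols, nlevs, overrides=None):
        overrides = overrides or {}
        # User-supplied fields win; everything else gets a default.
        self._fields = {
            name: overrides.get(name) or FieldStub(name, ncols, nlevs)
            for name in self.REQUIRED
        }

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so regular
        # attributes such as _fields are unaffected.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(f"{type(self).__name__} has no field {name!r}")


p3 = P3Facade(ncols=4, nlevs=3)
```

With this shape, `p3.T_mid` works without the user ever creating a field, and the `overrides` argument keeps the escape hatch open.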

Contributor

@mahf708 mahf708 left a comment

Two quick technical comments (for the record here, in case someone else wants to weigh in, but we will continue privately)

  • pybind11 vs nanobind. So, nanobind is young and much more exciting. I couldn't get it to work. Did you test it at all? Since you're much more familiar with the build system, I think you will likely get it to work. If we can get nanobind to work, we are better off just going with nanobind from the get-go. Also, for my understanding, how does cmake find the pybind11 submodule? Is there logic in a parent dir to find all the stuff in externs?
  • array api. I think you're using the pybind11/numpy headers to expose stuff. That's fine for now. But like pybind11 vs nanobind, if we are able to get the more general array api to work from the get-go, that will save us a lot of time later. I can look into this point myself to see where the state of the art is these days; I recently saw the generic array api being used with nanobind fwiw. My hope is that it will be simply transferable with very minor edits.

@bartgol
Contributor Author

bartgol commented May 29, 2024

  • pybind: I think a good old find_package(pybind11 REQUIRED) is all we need. If a CMake var called <packagename>_ROOT exists, cmake already uses it to find the package.
  • I don't know about the array specifics. What's bad with the stuff in numpy.h from pybind11?

@mahf708
Contributor

mahf708 commented May 29, 2024

What's bad with the stuff in numpy.h from pybind11?

It may limit us to numpy arrays exclusively, which may become burdensome down the line. Supporting a generic array api will enable us to exchange arbitrary arrays as long as they conform to the api. An example is: consider in the future we want to take this to device, numpy doesn't support gpus and won't, so we will need to figure out some workarounds. Another: general ML applications and frameworks with their "arrays" (tensors). Overall, I think the interoperability is robust and we can live with numpy.h if we have to, but if we don't have to, we should prefer the generic api (e.g., one offered by nanobind).

side reading on the "python array api" https://data-apis.org/array-api/latest/purpose_and_scope.html (nanobind has something else, which seems operational: https://nanobind.readthedocs.io/en/latest/ndarray.html)
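
To make the "any array that conforms to the API" point concrete without third-party packages, the sketch below uses Python's built-in buffer protocol as a stand-in. The consumer is written against `memoryview`, so it accepts any producer that exports a writable buffer. This is neither the Python array API standard nor nanobind's ndarray, only a standard-library analogue of the same idea, and `scale_in_place` is a made-up name.

```python
from array import array


def scale_in_place(buf, factor):
    """Multiply every element of a writable 1-D buffer of doubles by factor."""
    view = memoryview(buf)
    if view.readonly:
        raise ValueError("buffer is read-only")
    # Reinterpret the raw bytes as doubles, whatever the producer's own type.
    doubles = view.cast("B").cast("d")
    for i in range(len(doubles)):
        doubles[i] *= factor


# Producer 1: a typed array from the standard library.
a = array("d", [1.0, 2.0, 3.0])
scale_in_place(a, 2.0)

# Producer 2: raw bytes holding the same three doubles.
b = bytearray(array("d", [1.0, 2.0, 3.0]).tobytes())
scale_in_place(b, 2.0)
```

Neither producer knows about the other, yet the same consumer updates both in place without copying, which is the kind of interoperability a generic array interface is meant to give.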

bartgol and others added 16 commits May 30, 2024 22:19
we likely should set this up under eamxx/pippkg or something, tbd
* All other files are hpp files included by pyscream.cpp
* Each file handles a separate EAMxx concept
* Remove hard-coding of pybind11_DIR. User must provide pybind11_ROOT cmake/env var
* Add header guards to pyxyz.hpp
* Add PyGrid type
* PyGrid can create managed/unmanaged fields
* Add constructors to PyField (not all py-interoperable)
Start the pattern where folder blah has python subfolder
if pybind hooks are supported
When needed, build a GridsManager on the fly, using the ad-hoc
type SingleGridGM
* Fields are created by pyscream, no possibility to pass pre-built numpy array
* Add generic impl of atm proc wrapper in PyAtmProc
* Users can grab fields out of a py AP based on field name
* Removed pointless stuff from py grid
* Removed no longer needed wrappers for units
@bartgol bartgol force-pushed the bartgol/eamxx/pyscream branch from 0c81180 to 253774d Compare June 10, 2024 23:09
bartgol added 2 commits June 10, 2024 17:21
With PUBLIC, the compiled mod files magically appeared in the build folder
of targets that link against tms. I'm not sure if this is the right fix,
but it seems to work
@bartgol
Contributor Author

bartgol commented Jun 10, 2024

@mahf708 I was puzzled and confused by the two modules (pyeamxx and libpyeamxx_ext), so I renamed the second to pyeamxx. My guess is that the two clash when you try to package it as a conda package? I'm not sure. We can discuss when you get back.

Meanwhile, pyeamxx can now be used with any of the physics parametrizations. The only limitation is that the user must create a grid for which we have an IC file. We can work on adding some logic (in eamxx or pyeamxx) to allow reading an existing IC file (with N cols) into a grid with 1 col, perhaps recycling the IOP implementation: specify the lat/lon of the col, and read in the closest col to that lat/lon.
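
As an illustration of the closest-column idea, here is a small standalone sketch that picks the nearest column by great-circle distance (haversine formula). It does not reflect how the IOP code does it; `angular_distance` and `closest_column` are hypothetical helpers, and the coordinates are made up.

```python
import math


def angular_distance(lat1, lon1, lat2, lon2):
    """Great-circle angle in radians between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    # Haversine formula; min() guards against rounding slightly above 1.
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def closest_column(lats, lons, target_lat, target_lon):
    """Index of the column whose (lat, lon) is nearest to the target."""
    if len(lats) != len(lons) or not lats:
        raise ValueError("lats and lons must be non-empty and of equal length")
    return min(
        range(len(lats)),
        key=lambda i: angular_distance(lats[i], lons[i], target_lat, target_lon),
    )


# Made-up column coordinates, in degrees; not taken from any IC file.
lats = [0.0, 35.0, -60.0, 89.0]
lons = [0.0, 254.0, 120.0, 10.0]
idx = closest_column(lats, lons, 36.6, 262.5 - 360.0)
```

Because the formula only uses the sine of half the longitude difference, it does not matter whether longitudes are stored in [0, 360) or [-180, 180).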

When Naser returns, we can iterate on the pyeamxx conda package impl, which I'm confident I broke. Meanwhile, pyeamxx is available as follows:

$ cd components/eamxx
$ ./scripts/test-all-scream -t opt -m MACHINE --config-only
$ cd ctest-build/release/src/python
$ make -j

And then in your python script, do something like

import sys
sys.path.append('/path/to/eamxx/ctest-build/release/src/python')
sys.path.append('/path/to/eamxx/scripts')

import mpi4py
mpi4py.rc.initialize = False  # do not initialize MPI automatically
mpi4py.rc.finalize = False    # do not finalize MPI automatically

from mpi4py import MPI
import pyeamxx
from pathlib import Path
from utils import ensure_yaml
ensure_yaml()
import yaml

def main():
    # Get timestepping params
    with open('input.yaml','r') as fd:
        yaml_input = yaml.load(fd,Loader=yaml.SafeLoader)
    nsteps = yaml_input['time_stepping']['number_of_steps']
    dt     = yaml_input['time_stepping']['time_step']
    t0_str = yaml_input['time_stepping']['run_t0']

    # Create the grid
    ncols = 218
    nlevs = 72
    grid = pyeamxx.Grid("Physics",ncols,nlevs)

    ic_file = Path('/home/lbertag/workdir/e3sm/e3sm-data/inputdata/atm/scream/init/screami_unit_tests_ne2np4L72_20220822.nc')

    rrtmgp_params = pyeamxx.ParameterList(yaml_input['atmosphere_processes']['rrtmgp'],'rrtmgp')
    rrtmgp = pyeamxx.AtmProc(rrtmgp_params,grid)
    missing = rrtmgp.read_ic(str(ic_file))
    print (f"WARNING! The following input fields were not found in the IC file, and must be manually initialized: {missing}")
    rrtmgp.initialize(str(t0_str))
    rrtmgp.setup_output("my_output_rrtmgp.yaml")

    # Time loop
    for n in range(0,nsteps):
        rrtmgp.run(dt)

if __name__ == "__main__":
    # This level of indirection ensures all pybind structs are destroyed
    # before we finalize eamxx (and hence kokkos)
    MPI.Init()
    pyeamxx.init()
    main ()
    pyeamxx.finalize()
    MPI.Finalize()

The above assumes that input.yaml is structured similarly to our standalone input tests. Of course, you are welcome to change that, and then also change the python example above accordingly.
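
For reference, the script above only reads a handful of keys from input.yaml. The sketch below shows that structure as the nested dict `yaml.load` would return, plus a small checker. The key names come from the script; every value is a placeholder (in particular the `run_t0` string and the contents of the `rrtmgp` section), and `missing_keys` is a hypothetical helper.

```python
# Keys the example script reads; every value below is a placeholder.
example_input = {
    "time_stepping": {
        "number_of_steps": 2,
        "time_step": 300,
        "run_t0": "<start time string>",
    },
    "atmosphere_processes": {
        "rrtmgp": {},  # process-specific parameters would go here
    },
}

REQUIRED_PATHS = [
    ("time_stepping", "number_of_steps"),
    ("time_stepping", "time_step"),
    ("time_stepping", "run_t0"),
    ("atmosphere_processes", "rrtmgp"),
]


def missing_keys(cfg, required=REQUIRED_PATHS):
    """Return the required key paths that are absent from a nested dict."""
    missing = []
    for path in required:
        node = cfg
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append("/".join(path))
                break
            node = node[key]
    return missing
```

Calling `missing_keys` right after loading the file gives a clearer error than a `KeyError` halfway through setup.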

@PeterCaldwell @hassanbeydoun I know you guys were interested in running rrtmgp for multiple IC perturbations. If you want, you can start using the above example. You can access all input/output fields of the parametrization, and modify them at will. E.g.:

def main():
   ...
   for ens in range(0,num_ensembles):
       # create rrtmgp as above
       
       # f is the field; f.get() returns its data as a numpy array of the proper dimension
       f = rrtmgp.get_field('some_input')
       data = f.get()
       data[1,2,3] *= some_rand_number
       f.sync_to_dev()
       
       for n in range(0,nsteps):
           rrtmgp.run(dt)
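
The get / modify in place / sync pattern above can be tried without a build using a stand-in field. In this sketch `FieldStub` is hypothetical and only mimics the `get()` and `sync_to_dev()` calls; `perturb`, the amplitude, and the seeds are assumptions chosen for illustration.

```python
import random


class FieldStub:
    """Hypothetical stand-in mimicking the get()/sync_to_dev() calls above."""

    def __init__(self, values):
        self._host = list(values)
        self._dev = list(values)  # pretend device copy

    def get(self):
        # Returns the host data itself, so in-place edits are visible.
        return self._host

    def sync_to_dev(self):
        self._dev[:] = self._host


def perturb(field, member, amplitude=0.01, base_seed=1234):
    """Scale each value by (1 + eps), with eps uniform in [-amplitude, amplitude]."""
    # A seed derived from the member index makes each member reproducible.
    rng = random.Random(base_seed + member)
    data = field.get()
    for i in range(len(data)):
        data[i] *= 1.0 + rng.uniform(-amplitude, amplitude)
    field.sync_to_dev()


members = []
for ens in range(3):
    f = FieldStub([280.0, 285.0, 290.0])  # fresh, unperturbed state per member
    perturb(f, ens)
    members.append(f)
```

Seeding per member means a single ensemble member can be rerun later with exactly the same perturbation.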

@bartgol bartgol force-pushed the bartgol/eamxx/pyscream branch from 605d452 to 8d8e2af Compare June 11, 2024 16:30
@mahf708 mahf708 removed the AT: WIP label Jun 24, 2024
@E3SM-Bot
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5575
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5828
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: bartgol/eamxx/pyscream
  • SHA: 1bd69ad
  • Mode: TEST_REPO

Pull Request Author: bartgol

@E3SM-Bot
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5575
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5828
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM

@E3SM-Bot
Collaborator

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS NOT BEEN REVIEWED YET!

@E3SM-Bot
Collaborator

All Jobs Finished; status = PASSED, target_sha=d0bbf72587216ad2cec0d6e3a0ec3ff0fa902e96, However Inspection must be performed before merge can occur...

Contributor

@mahf708 mahf708 left a comment

As checks are passing, let's merge this whenever you feel ready @bartgol. We will follow up with more stuff in separate PRs.

@E3SM-Bot
Collaborator

The base branch has been updated since the last successful testing.

  • last PASS base branch sha: d0bbf72
  • current base branch sha : 3a0cee9
    The AutoTester will discard the last PASS, and re-test the PR from scratch

@E3SM-Bot
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5580
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5832
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: bartgol/eamxx/pyscream
  • SHA: 1bd69ad
  • Mode: TEST_REPO

Pull Request Author: bartgol

@E3SM-Bot
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5580
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5832
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS python
PULLREQUESTNUM 2851
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 1bd69ad
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 6d57c6b
TEST_REPO_ALIAS SCREAM
SCREAM_PullRequest_Autotester_Mappy # 5580 FAILED (click to see last 100 lines of console output)

srun: Job 2889513 step creation still disabled, retrying (Requested nodes are busy)
srun: Cancelled pending job step with signal 15
srun: error: Unable to create step for job 2889513: Job/step already completing or completed

96% tests passed, 11 tests failed out of 313

Label Time Summary:
PEM = 3.53 sec*proc (15 tests)
baseline_cmp = 50.70 sec*proc (8 tests)
baseline_gen = 580.15 sec*proc (10 tests)
bfbhash = 1.46 sec*proc (1 test)
check = 1.76 sec*proc (1 test)
cld_fraction = 1.48 sec*proc (1 test)
cxx baseline_cmp = 68.31 sec*proc (2 tests)
diagnostics = 37.24 sec*proc (21 tests)
driver = 825.79 sec*proc (21 tests)
fail = 2247.59 sec*proc (34 tests)
io = 1011.59 sec*proc (45 tests)
mam4_aci = 106.92 sec*proc (6 tests)
mam4_optics = 435.41 sec*proc (6 tests)
nudging = 71.32 sec*proc (4 tests)
p3 = 2507.72 sec*proc (43 tests)
p3_sk = 2031.80 sec*proc (32 tests)
physics = 5308.92 sec*proc (112 tests)
remap = 54.02 sec*proc (4 tests)
shoc = 96.67 sec*proc (13 tests)
spa = 114.95 sec*proc (18 tests)
surface_coupling = 23.94 sec*proc (4 tests)

Total Test time (real) = 6980.71 sec

The following tests FAILED:
106 - p3_tests_omp9 (Failed)
107 - p3_tests_omp10 (Failed)
108 - p3_tests_omp11 (Failed)
136 - p3_sk_tests_omp7 (Failed)
137 - p3_sk_tests_omp8 (Failed)
138 - p3_sk_tests_omp9 (Failed)
169 - shoc_tests_omp5 (Failed)
170 - shoc_tests_omp6 (Failed)
171 - shoc_tests_omp7 (Failed)
185 - shoc_sk_tests_omp5 (Failed)
186 - shoc_sk_tests_omp6 (Failed)'
./scream/components/eamxx/scripts/jenkins/jenkins_common.sh: line 7: 31943 Terminated $JENKINS_SCRIPT_DIR/jenkins_common_impl.sh 2>&1
31944 Done | tee JENKINS_$DATE_STAMP
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: lbertag@sandia.gov
Finished: FAILURE

SCREAM_PullRequest_Autotester_Weaver # 5832 PASSED (click to see last 100 lines of console output)

122/139 Test #122: shoc_p3_nudging_glob_novert .............................   Passed    2.80 sec
        Start 123: homme_shoc_cld_p3_rrtmgp_np1
123/139 Test #123: homme_shoc_cld_p3_rrtmgp_np1 ............................   Passed   12.36 sec
        Start 124: homme_shoc_cld_p3_rrtmgp_baseline_cmp
124/139 Test #124: homme_shoc_cld_p3_rrtmgp_baseline_cmp ...................   Passed    0.16 sec
        Start 125: homme_shoc_cld_p3_rrtmgp_pg2_np1
125/139 Test #125: homme_shoc_cld_p3_rrtmgp_pg2_np1 ........................   Passed   11.52 sec
        Start 126: homme_shoc_cld_p3_rrtmgp_pg2_baseline_cmp
126/139 Test #126: homme_shoc_cld_p3_rrtmgp_pg2_baseline_cmp ...............   Passed    0.09 sec
        Start 127: model_baseline
127/139 Test #127: model_baseline ..........................................   Passed   13.25 sec
        Start 128: model_initial
128/139 Test #128: model_initial ...........................................   Passed    5.66 sec
        Start 129: model_restart
129/139 Test #129: model_restart ...........................................   Passed    6.86 sec
        Start 130: restarted_vs_monolithic_check_np1
130/139 Test #130: restarted_vs_monolithic_check_np1 .......................   Passed    0.10 sec
        Start 131: homme_shoc_cld_spa_p3_rrtmgp_np1
131/139 Test #131: homme_shoc_cld_spa_p3_rrtmgp_np1 ........................   Passed   13.16 sec
        Start 132: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp
132/139 Test #132: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp ...............   Passed    0.13 sec
        Start 133: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1
133/139 Test #133: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1 ..............   Passed   15.65 sec
        Start 134: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1
134/139 Test #134: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1 ...   Passed    1.46 sec
        Start 135: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp
135/139 Test #135: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp .....   Passed    0.63 sec
        Start 136: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1
136/139 Test #136: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1 .................   Passed   12.91 sec
        Start 137: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp
137/139 Test #137: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp ........   Passed    0.09 sec
        Start 138: homme_shoc_cld_p3_mam_optics_rrtmgp_np1
138/139 Test #138: homme_shoc_cld_p3_mam_optics_rrtmgp_np1 .................   Passed   19.41 sec
        Start 139: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp
139/139 Test #139: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp ........   Passed    0.17 sec

100% tests passed, 0 tests failed out of 139

Label Time Summary:
baseline_cmp = 73.28 sec*proc (17 tests)
baseline_gen = 191.87 sec*proc (19 tests)
bfbhash = 0.69 sec*proc (1 test)
check = 0.72 sec*proc (1 test)
cld = 35.60 sec*proc (6 tests)
cld_fraction = 0.95 sec*proc (1 test)
cxx baseline_cmp = 6.22 sec*proc (2 tests)
diagnostics = 33.27 sec*proc (22 tests)
driver = 59.09 sec*proc (12 tests)
dynamics = 4.37 sec*proc (3 tests)
fail = 31.10 sec*proc (5 tests)
io = 48.86 sec*proc (13 tests)
mam4_aci = 30.50 sec*proc (4 tests)
mam4_optics = 8.44 sec*proc (1 test)
nudging = 8.34 sec*proc (2 tests)
p3 = 82.15 sec*proc (10 tests)
p3_sk = 40.20 sec*proc (2 tests)
physics = 150.75 sec*proc (23 tests)
remap = 5.98 sec*proc (1 test)
rrtmgp = 48.10 sec*proc (11 tests)
shoc = 43.03 sec*proc (11 tests)
spa = 7.69 sec*proc (4 tests)
surface_coupling = 5.08 sec*proc (1 test)

Total Test time (real) = 596.87 sec

Testing '''b4982a91ff5d81cecef596c8d311faaaabfdd989''' for test '''full_sp_debug'''

RUN: taskset -c 52-103 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/full_sp_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/full_sp_debug -DBUILD_NAME_MOD=full_sp_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.0/gcc/11.3.0/openmpi/4.1.4/5ka6asw -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.0/gcc/11.3.0/openmpi/4.1.4/mdd6fth -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.4/52dibdr -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DSCREAM_DOUBLE_PRECISION=False -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_sp_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/full_sp_debug

Testing '''b4982a91ff5d81cecef596c8d311faaaabfdd989''' for test '''full_debug'''

RUN: taskset -c 0-51 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/full_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/full_debug -DBUILD_NAME_MOD=full_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.0/gcc/11.3.0/openmpi/4.1.4/5ka6asw -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.0/gcc/11.3.0/openmpi/4.1.4/mdd6fth -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.4/52dibdr -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=True -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/full_debug

Testing '''b4982a91ff5d81cecef596c8d311faaaabfdd989''' for test '''release'''

RUN: taskset -c 104-155 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/release/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/release -DBUILD_NAME_MOD=release -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.0/gcc/11.3.0/openmpi/4.1.4/5ka6asw -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.0/gcc/11.3.0/openmpi/4.1.4/mdd6fth -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.4/52dibdr -DCMAKE_BUILD_TYPE=Release -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/release" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx/ctest-build/release
OVERALL STATUS: PASS
Starting analysis on weaver with cmd: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
RUN: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5832/scream/components/eamxx
Completed analysis on weaver'

+ [[ 0 != 0 ]]
+ [[ 1 == 0 ]]
+ [[ weaver == \m\a\p\p\y ]]
+ set +x
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins4892983773418261847.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Finished: SUCCESS

@E3SM-Bot
Collaborator

Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.

@E3SM-Bot

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5582
  • Status: STARTED

Jenkins Parameters

| Parameter Name | Value |
| --- | --- |
| PR_LABELS | AT: RETEST;AT: AUTOMERGE;python |
| PULLREQUESTNUM | 2851 |
| SCREAM_SOURCE_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_SOURCE_SHA | 1bd69ad |
| SCREAM_TARGET_BRANCH | master |
| SCREAM_TARGET_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_TARGET_SHA | 6d57c6b |
| TEST_REPO_ALIAS | SCREAM |

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5834
  • Status: STARTED

Jenkins Parameters

| Parameter Name | Value |
| --- | --- |
| PR_LABELS | AT: RETEST;AT: AUTOMERGE;python |
| PULLREQUESTNUM | 2851 |
| SCREAM_SOURCE_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_SOURCE_SHA | 1bd69ad |
| SCREAM_TARGET_BRANCH | master |
| SCREAM_TARGET_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_TARGET_SHA | 6d57c6b |
| TEST_REPO_ALIAS | SCREAM |

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: bartgol/eamxx/pyscream
  • SHA: 1bd69ad
  • Mode: TEST_REPO

Pull Request Author: bartgol

@E3SM-Bot

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5582
  • Status: PASSED

Jenkins Parameters

| Parameter Name | Value |
| --- | --- |
| PR_LABELS | AT: RETEST;AT: AUTOMERGE;python |
| PULLREQUESTNUM | 2851 |
| SCREAM_SOURCE_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_SOURCE_SHA | 1bd69ad |
| SCREAM_TARGET_BRANCH | master |
| SCREAM_TARGET_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_TARGET_SHA | 6d57c6b |
| TEST_REPO_ALIAS | SCREAM |

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5834
  • Status: PASSED

Jenkins Parameters

| Parameter Name | Value |
| --- | --- |
| PR_LABELS | AT: RETEST;AT: AUTOMERGE;python |
| PULLREQUESTNUM | 2851 |
| SCREAM_SOURCE_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_SOURCE_SHA | 1bd69ad |
| SCREAM_TARGET_BRANCH | master |
| SCREAM_TARGET_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_TARGET_SHA | 6d57c6b |
| TEST_REPO_ALIAS | SCREAM |

@E3SM-Bot E3SM-Bot merged commit d5057ec into master Jun 25, 2024
14 checks passed
@E3SM-Bot E3SM-Bot deleted the bartgol/eamxx/pyscream branch June 25, 2024 17:51
4 participants