
Conducting a Retrospective Run with Containers

uwagura edited this page Dec 31, 2024 · 3 revisions

Containers are isolated software environments containing all dependencies and tools required to run a particular piece of software. Fre/2025.01 offers tools for creating containers and for compiling and running models within them. In addition, users can use a separate container to post-process model output with Fre/2025.01, and MED Division provides a Dockerfile that allows users to conduct containerized runs outside of the Fre workflow altogether. Note that MED does not yet support full retrospective runs outside of the Fre workflow, but future updates to this repo will add helper scripts to support this option. (WIP - push Dockerfile and related tools to GitHub)

NWA Retrospective Run and Post Processing with Containers in Fre

The latest version of Fre contains tools for building and running models in containers. Note that as of the time of writing this wiki page, Fre/2025.01 is under active development and has not yet been officially released, so the instructions below are subject to change.

Building and Running the Model

The following sections assume that: 1.) You have access to Gaea or some other system with access to the Fre workflow. 2.) You have access to both podman and apptainer/singularity on your system. Note that on Gaea, you must open a help desk ticket and be added to the list of users who can use podman in order to get access to the podman command.
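As a quick way to confirm the second assumption, the sketch below checks whether the container tools are on your PATH. The function names have_cmd and check_container_tools are hypothetical helpers written for this page, not part of Fre, and the check only looks for the commands; it cannot tell whether your account has been granted podman access.

```shell
#!/bin/sh
# Return 0 if the named command is on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the tools this page relies on. apptainer and singularity are
# treated as alternatives, so only one of the two needs to be present.
check_container_tools() {
    rc=0
    if have_cmd podman; then
        echo "podman: found"
    else
        echo "podman: missing"
        rc=1
    fi
    if have_cmd apptainer || have_cmd singularity; then
        echo "apptainer/singularity: found"
    else
        echo "apptainer/singularity: missing"
        rc=1
    fi
    return "$rc"
}

check_container_tools || echo "One or more required tools are not available."
```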

If you are running this workflow outside of Gaea, note that the container images, .tar, and .sif files produced by podman and apptainer can be quite large - approximately 25 GB each.
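Given those sizes, it may be worth checking free disk space before building. The sketch below reports free space for the current directory. The free_gb helper is hypothetical, and the 50 GB threshold is an assumption (roughly one .tar plus one .sif); adjust it for your system, since podman may store its images on a different filesystem.

```shell
#!/bin/sh
# Print free space, in whole GB, on the filesystem holding the given directory.
free_gb() {
    # df -P gives stable POSIX columns; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

required_gb=50   # assumed: about 25 GB each for the .tar and the .sif
available_gb=$(free_gb .)
if [ "${available_gb:-0}" -lt "$required_gb" ]; then
    echo "Only ${available_gb:-0} GB free here; about ${required_gb} GB may be needed."
else
    echo "${available_gb} GB free here."
fi
```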

Setup Environment

First, clone this repository to get access to the prewritten yamls and xmls:

git clone https://github.com/NOAA-GFDL/CEFI-regional-MOM6.git

Once the repo is cloned, navigate to the yamls directory:

cd CEFI-regional-MOM6/yamls/NWA12

Load the relevant modules to set up your environment correctly:

module unuse /ncrc/home2/fms/local/modulefiles
module use /ncrc/home2/fms/local/modulefiles_test
module load fre/2025.01

You should now have access to the fre command, as well as the fre make subcommand. If each of these commands works, you can begin compiling the model.

Creating Container and Compiling Model

Begin by running the following command to create a checkout script - this is the script that will git clone the model components into the container so they can be compiled:

fre make create-checkout -y CEFI_NWA12_cobalt.yaml -p hpcme.2023 -t prod -npc

If this step completes successfully, you should see a new folder named /tmp/hpcme.2023 containing the checkout script. Next you will need to create a Makefile to compile the model inside of the container, as well as a Dockerfile to create the container itself:

fre make create-makefile -y CEFI_NWA12_cobalt.yaml -p hpcme.2023 -t prod
fre make create-dockerfile -y CEFI_NWA12_cobalt.yaml -p hpcme.2023 -t prod

If these steps are successful, you should now see a Makefile and an execrunscript.sh in the /tmp/hpcme.2023 folder, as well as a Dockerfile and a createContainer.sh script in your current directory. The Dockerfile contains instructions on how to create the container and then compile the model within the container, while the createContainer.sh script contains commands to automate the process of building the container and saving it to both a tar file and a singularity image file that can be used to run the container/model. Simply run the script to begin this process:

./createContainer.sh

Note that you can have the script run automatically by adding the --execute flag to the create-dockerfile command above.
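Before launching the build, it can help to confirm that the files produced by the fre make steps are where this page says they should be. The sketch below does that. The report_missing helper is hypothetical, and the paths are taken from the descriptions above, so adjust them if your Fre version writes elsewhere.

```shell
#!/bin/sh
# Print each path from the argument list that does not exist as a regular file.
# Returns 0 when every file is present, 1 otherwise.
report_missing() {
    rc=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            rc=1
        fi
    done
    return "$rc"
}

# Locations as described on this page.
report_missing \
    /tmp/hpcme.2023/Makefile \
    /tmp/hpcme.2023/execrunscript.sh \
    ./Dockerfile \
    ./createContainer.sh \
    || echo "Some generated files were not found; re-check the fre make steps."
```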

Running the Model

Once the container has been built - this may take some time, depending on the availability of resources - you should have .tar and .sif files named mom6_sis2_generic_4p_compile_symm_yaml-prod. The .sif file contains the execrunscript.sh created in the previous step, meaning it can now be used just like a model executable. In order to conduct the full 27-year retrospective in the NWA domain using fre, first run:

module unload fre/2025.01
module load fre/bronx-23

This replaces the latest version of fre with a version that can perform container runs. Next, navigate to the xmls directory:

cd ../../xmls/NWA12

To run the containerized model using fre, make the following changes to the xml: 1.) Change the fre_version variable from bronx-22 to bronx-23. 2.) Directly below the line defining the CEFI_NWA12_COBALT_V1 experiment, add a line pointing to the .sif file for your container:

   <experiment name="CEFI_NWA12_COBALT_V1" inherit="MOM6_SIS2_GENERIC_4P_compile_symm">
    <container file="/gpfs/f5/cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/yamls/NWA12/mom6_sis2_generic_4p_compile_symm_yaml-prod.sif"/>
    <description>
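After editing the xml, you can check that the container element is present and that the path it names exists. The sketch below is one way to do so. The container_path helper is hypothetical and relies on a simple sed pattern that assumes the element is written on one line as in the snippet above; it is not a real XML parser.

```shell
#!/bin/sh
# Print the value of the file="..." attribute from the first <container>
# element in the given XML file; prints nothing if there is none.
container_path() {
    sed -n 's/.*<container[[:space:]]\{1,\}file="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

xml=CEFI_NWA12_cobalt.xml
if [ -f "$xml" ]; then
    sif=$(container_path "$xml")
    if [ -z "$sif" ]; then
        echo "No <container> element found in $xml"
    elif [ ! -f "$sif" ]; then
        echo "$xml points at $sif, which does not exist"
    else
        echo "Container image: $sif"
    fi
fi
```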

Finally, run frerun with the --container flag to run the experiment:

frerun -x CEFI_NWA12_cobalt.xml -p ncrc6.intel23 -t prod CEFI_NWA12_COBALT_V1 --container

If you are running on c5, be sure to change the platform to ncrc5.intel23.
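If you switch between clusters often, the platform choice can be wrapped in a small helper. The sketch below is illustrative only: platform_for and the CLUSTER variable are hypothetical, "c6" is an assumed label for the default cluster, and the script prints the frerun command rather than running it.

```shell
#!/bin/sh
# Map a cluster label to the fre platform named on this page.
# Anything other than c5 falls back to ncrc6.intel23.
platform_for() {
    case "$1" in
        c5) echo "ncrc5.intel23" ;;
        *)  echo "ncrc6.intel23" ;;
    esac
}

# CLUSTER is a hypothetical variable; "c6" is an assumed default label.
CLUSTER=${CLUSTER:-c6}
echo "frerun -x CEFI_NWA12_cobalt.xml -p $(platform_for "$CLUSTER") -t prod CEFI_NWA12_COBALT_V1 --container"
```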

Postprocessing Model Output

(WIP)

NWA Retrospective Run Outside of Fre

If you don't have access to Fre on your system, MED Division also provides a Dockerfile that lets you create a container and compile the model on any system that has either docker or podman. (WIP: File not yet available).
