Conducting a Retrospective Run with Containers
Containers are isolated software environments containing all dependencies and tools required to run a particular piece of software. Fre/2025.01 offers tools for creating containers and for compiling and running models within them. In addition, users can use a separate container to post-process model output with Fre/2025.01, and MED Division provides a Dockerfile that allows users to conduct containerized runs outside of the Fre workflow altogether. Note that MED does not yet support full retrospective runs outside of the FRE workflow, but future updates to this repo will add helper scripts to support this option. (WIP - push Dockerfile and related tools to GitHub)
The latest version of Fre contains tools for building and running models in containers. Note that at the time of writing, Fre/2025.01 is under active development and has not been officially released, so the instructions below are subject to change.
The following sections assume that:
1.) You have access to Gaea or some other system with access to the Fre workflow.
2.) You have access to both podman and apptainer/singularity on your system. Note that on Gaea, you must open a help desk ticket and be added to the list of users who can use podman in order to get access to the podman command.
If you are running this workflow outside of Gaea, note that the container images, .tar, and .sif files produced by podman and apptainer can be quite large - approximately 25 GB each.
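The prerequisites above can be checked before starting. The following is a minimal sketch, not part of the Fre tooling: the helper names have_cmd and check_container_tools are hypothetical, and the script only reports what is on your PATH and how much free space the current directory has.

```shell
# Return 0 if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the container tools named on this page are missing.
check_container_tools() {
    missing=0
    if ! have_cmd podman; then
        echo "podman not found (on Gaea, access requires a help desk ticket)"
        missing=1
    fi
    # Either apptainer or singularity is acceptable.
    if ! have_cmd apptainer && ! have_cmd singularity; then
        echo "neither apptainer nor singularity found"
        missing=1
    fi
    return "$missing"
}

check_container_tools || echo "some prerequisites are missing"

# Show free space where the images will be written (roughly 25 GB each).
df -h .
```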
First, clone this repository to get access to the prewritten YAMLs and XMLs:
git clone https://github.com/NOAA-GFDL/CEFI-regional-MOM6.git
Once the repo is cloned, navigate to the yamls directory:
cd CEFI-regional-MOM6/yamls/NWA12
Load the relevant modules to set up your environment correctly:
module unuse /ncrc/home2/fms/local/modulefiles
module use /ncrc/home2/fms/local/modulefiles_test
module load fre/2025.01
You should now have access to the fre command, as well as the fre make subcommand. If each of these commands works, you can begin compiling the model.
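One way to confirm that both commands work is to ask each for its help text. This is a sketch under the assumption that fre and fre make accept --help, as command-line tools of this kind generally do; check_fre is a hypothetical helper name.

```shell
# Return 0 if fre and its make subcommand both respond to --help.
check_fre() {
    if ! command -v fre >/dev/null 2>&1; then
        echo "fre not found on PATH; check the module commands above"
        return 1
    fi
    # Assumption: both commands exit with status 0 when given --help.
    fre --help >/dev/null 2>&1 || { echo "fre --help failed"; return 1; }
    fre make --help >/dev/null 2>&1 || { echo "fre make --help failed"; return 1; }
    echo "fre and fre make are available"
}

check_fre || echo "resolve the problem above before compiling"
```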
Begin by running the following command to create a checkout script - this is the script that will git clone the model components into the container so they can be compiled:
fre make create-checkout -y CEFI_NWA12_cobalt.yaml -p hpcme.2023 -t prod -npc
If this step completes successfully, you should see a new folder named /tmp/hpcme.2023 containing the checkout script. Next, you will need to create a Makefile to compile the model inside of the container, as well as a Dockerfile to create the container itself:
fre make create-makefile -y CEFI_NWA12_cobalt.yaml -p hpcme.2023 -t prod
fre make create-dockerfile -y CEFI_NWA12_cobalt.yaml -p hpcme.2023 -t prod
If these steps are successful, you should now see a Makefile and an execrunscript.sh in the /tmp/hpcme.2023 folder, as well as a Dockerfile and a createContainer.sh script in your current directory. The Dockerfile contains instructions on how to create the container and then compile the model within it, while the createContainer.sh script contains commands to automate the process of building the container and saving it to both a tar file and a singularity image file that can be used to run the container/model. Simply run the script to begin this process:
./createContainer.sh
Note that you can have the script run automatically by adding the --execute flag to the create-dockerfile command above.
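Because the container build can take a long time, it may be worth confirming that the generated files exist before launching it. The sketch below uses the paths exactly as given in the text above. require_files is a hypothetical helper, and the relative paths assume you are still in the directory where the fre make commands were run.

```shell
# Print each missing path and return nonzero if any is absent.
require_files() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Paths as described on this page; adjust them if your setup differs.
require_files /tmp/hpcme.2023/Makefile /tmp/hpcme.2023/execrunscript.sh \
    Dockerfile createContainer.sh \
    && echo "all generated files are present"
```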
Once the container has been built - this may take some time, depending on the availability of resources - you should have .tar and .sif files named mom6_sis2_generic_4p_compile_symm_yaml-prod. The .sif file contains the execrunscript.sh created in the previous step, meaning it can now be used just like a model executable.
In order to conduct the full 27-year retrospective in the NWA domain using fre, first run:
module unload fre/2025.01
module load fre/bronx-23
This replaces the latest version of fre with a version that can perform container runs. Next, navigate to the xmls directory:
cd ../../xmls/NWA12
To run the containerized model using fre, make the following changes to the xml:
1.) Change the fre_version variable from bronx-22 to bronx-23.
2.) Right below the line defining the CEFI_NWA12_COBALT_V1 experiment, add a line pointing to the container:
<experiment name="CEFI_NWA12_COBALT_V1" inherit="MOM6_SIS2_GENERIC_4P_compile_symm">
<container file="/gpfs/f5/cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/yamls/NWA12/mom6_sis2_generic_4p_compile_symm_yaml-prod.sif"/>
<description>
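After editing the xml, a quick check can catch a mistyped container path before the job is submitted. The following sketch is hypothetical and not part of fre: container_path is an illustrative helper. It assumes the container line is written on a single line in the form shown above, and that the xml file is CEFI_NWA12_cobalt.xml in the current directory.

```shell
# Print the path from the first <container file="..."/> line of an XML file.
container_path() {
    sed -n 's/.*<container[[:space:]][[:space:]]*file="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

xml=CEFI_NWA12_cobalt.xml   # assumption: the xml edited in the steps above
if [ -f "$xml" ]; then
    # List every line that mentions either version so the change can be reviewed.
    grep -n 'bronx-2[23]' "$xml" || true

    sif=$(container_path "$xml")
    if [ -n "$sif" ] && [ -f "$sif" ]; then
        echo "container found: $sif"
        ls -lh "$sif"
    else
        echo "container entry missing or file not found: ${sif:-none}"
    fi
fi
```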
Finally, run frerun with the --container flag to run the experiment:
frerun -x CEFI_NWA12_cobalt.xml -p ncrc6.intel23 -t prod CEFI_NWA12_COBALT_V1 --container
If you are running on c5, be sure to change the platform to ncrc5.intel23.
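The platform choice can be made explicit in a small wrapper. This is only a sketch: platform_for is a hypothetical helper, the c6 mapping is inferred from the ncrc6.intel23 platform used in the command above, and the script prints the frerun command rather than running it.

```shell
# Map a Gaea cluster name to the frerun platform named on this page.
platform_for() {
    case "$1" in
        c5) echo "ncrc5.intel23" ;;
        c6) echo "ncrc6.intel23" ;;   # assumption: inferred from the command above
        *)  echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

cluster=c6   # assumption: set this by hand to the cluster you are on
platform=$(platform_for "$cluster") &&
    echo frerun -x CEFI_NWA12_cobalt.xml -p "$platform" -t prod CEFI_NWA12_COBALT_V1 --container
```

Remove the echo once the printed command looks right.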
(WIP)
If you don't have access to Fre on your system, MED division also provides a Dockerfile that lets you create a container and compile the model on any system that has either docker or podman. (WIP: File not yet available).