ClimaCoupler Lessons Learned

Running interactively on the new partition of Caltech's HPC

Working with external data files

  • We use Clima’s CaltechBox to store external files, namely in the ClimaCoupler/data folder
  • To download a file, grab a link by clicking on Share. Be sure to grab the Direct Link (the one ending with the file extension) by clicking on Link Settings; if you use the default link, the downloaded file may not be readable.
  • For example, to download the file https://caltech.box.com/s/123.hdf5 from Julia you can use two approaches:
      1. use Downloads.download("https://caltech.box.com/s/123.hdf5", "your_file_name.hdf5")
      2. use Artifacts.jl (together with ArtifactWrappers.jl, which provides some convenience functions), which we use for more formal file tracking and for more complex containers (e.g., tarballs):
# Assumes ArtifactWrappers.jl is available in the active environment and imported as AW.
import ArtifactWrappers as AW

function your_dataset_path()
    # Declare the artifact: the directory to associate it with, its name,
    # and the remote file(s) it contains.
    _dataset = AW.ArtifactWrapper(
        @__DIR__,
        "123",
        AW.ArtifactFile[AW.ArtifactFile(
            url = "https://caltech.box.com/s/123.hdf5",
            filename = "your_file_name.hdf5",
        ),],
    )
    # Download the data (if not already cached) and return the local folder.
    return AW.get_data_folder(_dataset)
end
sst_data = joinpath(your_dataset_path(), "your_file_name.hdf5")
  • Reading files (a short sketch follows below)
    • NetCDF: use NCDatasets.NCDataset("your_file_name.nc")
    • HDF5: use ClimaCore.InputOutput.HDF5Reader("your_file_name.hdf5")
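
For reference, here is a minimal sketch of both readers. The variable/field name "SST" is a placeholder, and the single-argument HDF5Reader constructor follows the usage above (newer ClimaCore versions may also accept a communications context).

using NCDatasets
import ClimaCore

# NetCDF: open the dataset, read a (placeholder) variable, then close the handle.
nc_data = NCDatasets.NCDataset("your_file_name.nc")
sst = nc_data["SST"][:, :]
close(nc_data)

# ClimaCore HDF5 output: open a reader, read a stored field by name, then close.
hdf5_reader = ClimaCore.InputOutput.HDF5Reader("your_file_name.hdf5")
sst_field = ClimaCore.InputOutput.read_field(hdf5_reader, "SST")
close(hdf5_reader)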

Running MPI on the cluster

Example commands to run

This example will run the tests in ClimaCoupler.jl/test/mpi_tests/run_mpi_tests.jl using MPI with 3 tasks (processes). To run other tests, navigate to the desired directory and change the include command accordingly. These commands should be run on the HPC cluster.


srun -n 3 -t 01:00:00 --pty bash # request 3 processors for 1 hour

cd ClimaCoupler.jl/test # navigate to correct test directory
module purge
module load julia/1.8.1 openmpi/4.1.1 hdf5/1.12.1-ompi411

export CLIMACORE_DISTRIBUTED="MPI"
export JULIA_MPI_BINARY="system"

julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.build()'
julia --project -e 'using Pkg; Pkg.build("MPI"); Pkg.build("HDF5")'
julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()'
julia --project -e 'include("mpi_tests/run_mpi_tests.jl")'

Alternatively, this set of commands can be run using sbatch instead of srun by adding all lines after the srun line to a bash script, e.g. script.sh, then running sbatch -n 3 -t 01:00:00 script.sh.

For debugging it may be useful to run the MPI.mpiexec command in the REPL. In that case, make sure you set up your environment and builds as above before entering Julia.
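
For instance, the following sketch (the process count of 2 and the choice of test file are placeholders) launches a distributed run from the REPL via the mpiexec wrapper provided by MPI.jl:

using MPI

# Launch 2 MPI ranks of the current Julia project on one of the MPI test files.
MPI.mpiexec() do exe
    run(`$exe -n 2 $(Base.julia_cmd()) --project mpi_tests/regridder_mpi_tests.jl`)
end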

Note that the cluster can be unreliable when running with MPI, so these commands may raise an error. If that occurs, exit and try logging in to a different node.

Communications contexts

  • Our communications package, ClimaComms.jl, provides convenience functions on top of the MPI.jl backend: ClimaCommsMPI.MPICommsContext wraps MPI.COMM_WORLD, the communicator object that describes the partitioning into multiple MPI processes.
  • ClimaComms.SingletonCommsContext is a placeholder object of type AbstractCommsContext used for non-distributed runs. This enables us to write more generic functions that dispatch on the type of the communications context, with the high-level code remaining unaffected.
  • Make sure that the communications context used for MPI is both instantiated (comms_ctx = ClimaCommsMPI.MPICommsContext()) and initialized (pid, nprocs = ClimaComms.init(comms_ctx), with pid denoting the current process ID and nprocs the total number of processes) before its first use.
  • If using a ClimaCore topology/space, make sure to use one that supports distributed computing, and initialize it with the corresponding MPI communications context (topology = Topologies.DistributedTopology2D(comms_ctx, mesh, Topologies.spacefillingcurve(mesh))). These pieces are put together in the sketch below.
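
A minimal sketch of that setup, assuming a previously constructed 2D spectral-element mesh named mesh:

using ClimaComms, ClimaCommsMPI
import ClimaCore: Topologies

# Instantiate and initialize the MPI communications context before its first use.
comms_ctx = ClimaCommsMPI.MPICommsContext()
pid, nprocs = ClimaComms.init(comms_ctx)
pid == 1 && @info "Running on $nprocs processes"

# Build a topology that supports distributed computing, using the same context;
# `mesh` is assumed to have been constructed beforehand.
topology = Topologies.DistributedTopology2D(
    comms_ctx,
    mesh,
    Topologies.spacefillingcurve(mesh),
)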

Other notes

  • A helpful setup is a test file containing the tests that use MPI (e.g. ClimaCoupler.jl/test/mpi_tests/regridder_mpi_tests.jl) plus a runner file (e.g. ClimaCoupler.jl/test/mpi_tests/run_mpi_tests.jl) that runs the test files with the MPI setup
  • MPI can be unreliable on Windows, so we generally do not run it on that OS. This and the fact that GH Actions only allows a limited number of processes are the reasons why we run our MPI unit tests only on Buildkite.

Resources

Getting started with MPI

Remapping Approach

Background

We’ve tried two different approaches to implement remappings:

  • Non-conservative clean_mask function applied after non-monotone remapping (see the sketch after this list)
    • Clips values that fall outside of the desired range
    • Advantages: fast, does not reduce spatial resolution
    • Disadvantages: does not conserve the global total of the quantity being remapped
  • Conservative monotone remapping function
    • A monotone remapping introduces no new minima or maxima (i.e., all weights used in the mapping lie in [0, 1])
    • Advantage: remapping is both conservative and monotone
    • Disadvantages: decreases effective spatial resolution (i.e., it is a lower-order method) and is slower
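
For reference, the clipping step is conceptually as simple as the sketch below; the name clean_mask comes from the text above, but the signature and the [0, 1] bounds used here are illustrative only:

# Clip values that the non-monotone remap pushed outside [lo, hi]. This is fast
# but does not conserve the global integral of the remapped quantity.
clean_mask(field; lo = 0.0, hi = 1.0) = clamp.(field, lo, hi)

remapped = [-0.02, 0.4, 1.03]   # e.g., a land/sea mask after non-monotone remapping
clean_mask(remapped)            # returns [0.0, 0.4, 1.0]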

TempestRemap has multiple functions that generate remappings. The currently-used function in ClimaCoupler is GenerateOfflineMap, and the alternative is GenerateTransposeMap. Note that GenerateTransposeMap reverses a previous remapping and can only map FV griddings to CGLL (not vice versa). Thus, to apply this function, a minimum of 2 and potentially 3 remappings must be applied, which is quite costly.
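
For orientation, an offline map is typically generated with the TempestRemap command-line tools along the following lines. The mesh and map file names are placeholders, and the exact flag names should be checked against the TempestRemap documentation for the installed version; --mono requests a monotone map.

# Overlap mesh between the source and target meshes (placeholder file names).
GenerateOverlapMesh --a source_mesh.g --b target_mesh.g --out overlap_mesh.g

# Offline map from the source CGLL grid (np = 4) to the target FV grid;
# drop --mono for a non-monotone map.
GenerateOfflineMap --in_mesh source_mesh.g --out_mesh target_mesh.g --ov_mesh overlap_mesh.g \
    --in_type cgll --in_np 4 --out_type fv --out_map map.nc --mono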

Tests

To compare GenerateOfflineMap and GenerateTransposeMap, as well as monotone vs non-monotone remappings, we performed a number of remappings using the seamask.nc dataset.

We find that at a spatial resolution of h_elem=6, monotone and non-monotone remappings produce qualitatively similar results when applying the map created by either GenerateOfflineMap or GenerateTransposeMap. However, at a spatial resolution of h_elem=4, monotonicity has a substantial effect. This suggests that enforcing monotonicity of the mapping may be important at lower resolutions, but is not as essential at slightly higher resolutions (though monotone remapping is always recommended).

Main Takeaways

  • Monotone remappings should be used for quantities where global conservation is important (e.g., fluxes), but are not strictly necessary for quantities where this is not required (e.g., land cover) or when the spatial resolution is sufficiently high.
  • GenerateOfflineMap is the currently used method to create remappings in ClimaCoupler, and performs about as well as GenerateTransposeMap. In addition, it is easier to use as it doesn’t require a previous mapping, so we will continue to use it moving forward.

Relevant Plots

Remapping using GenerateTransposeMap with h_elem = 4: monotone (left) vs non-monotone (right)


Remapping using GenerateTransposeMap with h_elem = 6: monotone (left) vs non-monotone (right)


Remapping using GenerateOfflineMap with h_elem = 4: monotone (left) vs non-monotone (right)


Remapping using GenerateOfflineMap with h_elem = 6: monotone (left) vs non-monotone (right)

Note that while it may appear that the monotone plots contain negative values, this is merely a consequence of the plotting method used. The values for these plots are contained in [0, 1].

Also note that these boundary conditions cause numerical instability when used in an atmospheric simulation with h_elem=4 and n_poly=3 (the polynomial degree, i.e., the number of GLL nodes minus 1), whether the remapping is monotone or not. The same resolution is stable for the aquaplanet setup.

Direct stiffness summation

  • dss needs to be applied to all variables that are computed as part of step! and passed to other models. For example, spurious tropical ice growth appeared when q_sfc was not dss'd (see the comparison figure); a minimal sketch of applying dss is given below.
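
The sketch assumes q_sfc is a ClimaCore field on a spectral-element space (the field name comes from the example above):

import ClimaCore: Spaces

# After updating an exchanged field inside step!, apply (weighted) direct
# stiffness summation so that values at shared element nodes agree across
# elements (and across processes for distributed spaces).
Spaces.weighted_dss!(q_sfc)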

SLURM ntask and mem scheduling

  • use either just slurm_ntasks, or slurm_ntasks_per_node + slurm_nodes (see the docs)
  • a global slurm_mem setting clashes with slurm_mem_per_cpu (issue highlighted here)