Distributed, multi-process execution of the Nextgen driver (ngen) is possible through the use of an MPI implementation.
- Overview
- Building the MPI-Enabled Nextgen Driver
- Running the MPI-Enabled Nextgen Driver
- Partition Config Generator
## Overview

The basic design starts with a partitioning configuration. This defines the groupings of catchments and nexuses of the entire hydrofabric into separate collections, and it includes details on remote connections wherever features in different partitions interact. It also implicitly defines how partitions are processed: each partition has an id property, and partitions and MPI ranks are expected to map one-to-one, sharing the same identifier.
Partition configurations are expected to be represented in JSON. There is a partition generator tool available separately from the main driver for creating these files.
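As a purely illustrative sketch of the kind of information such a file carries (partition ids, catchment and nexus groupings, and remote connections between partitions), a partition config might look roughly like the following. The key names and values here are assumptions for illustration only; the authoritative format is whatever the partition generator tool described below produces.

```json
{
  "partitions": [
    {
      "id": 0,
      "cat-ids": ["cat-11", "cat-12"],
      "nex-ids": ["nex-10"],
      "remote-connections": [
        { "nex-id": "nex-10", "cat-id": "cat-27", "mpi-rank": 1 }
      ]
    },
    {
      "id": 1,
      "cat-ids": ["cat-27", "cat-28"],
      "nex-ids": ["nex-26"],
      "remote-connections": [
        { "nex-id": "nex-10", "cat-id": "cat-12", "mpi-rank": 0 }
      ]
    }
  ]
}
```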
When executed, the driver is provided a valid partitioning configuration file path as a command line arg. Each rank loads all or part of the supplied hydrofabric, and then constructs the specific model formulations appropriate for the features within its partition. Data communication across partition boundaries is handled by a remote nexus type.
## Building the MPI-Enabled Nextgen Driver

To enable distributed processing capabilities in the driver, the CMake build must be generated with the `NGEN_WITH_MPI` variable set to `ON` (e.g., `NGEN_WITH_MPI:BOOL=ON`), and the necessary MPI libraries must be available to CMake. Additionally, certain features are only supported if `NGEN_WITH_PYTHON` is also `ON`.
Otherwise, the standard CMake build process applies, and the driver is built using the same commands, either as part of the entire project or via the `ngen` CMake target. CMake manages the necessary adjustments (e.g., to the compiler used) automatically.
## Running the MPI-Enabled Nextgen Driver

- The MPI-enabled driver is run by wrapping it in the `mpirun` command, also supplying the number of processes to start.
- An additional driver CLI argument is required, supplying the path to the partition config.
- Another flag may optionally be provided as a CLI argument to adjust the way the driver processes load the hydrofabric, as shown in the general form and examples below.
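Putting those pieces together, the general form of the command (with placeholders in angle brackets, and the optional flag discussed below) looks like:

```shell
mpirun -n <num_processes> <cmake-build-dir>/ngen <catchment_data_path> <catchment_ids> <nexus_data_path> <nexus_ids> <realization_config> <partition_config> [--subdivided-hydrofabric]
```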
Certain hydrofabrics may currently require too much memory to be fully loaded by each individual MPI rank/process, which is the default behavior prior to initializing individual catchment formulations. To work around this, it is possible to include the following flag as the final command line argument:

`--subdivided-hydrofabric`

This indicates to the driver processes that the provided, full hydrofabric files should not be directly loaded. Instead, rank/process/partition-specific files are used. Each of these contains a subdivided, non-overlapping portion of the entire hydrofabric and is expected to correspond directly to a partition.
When subdivided hydrofabric files are to be used, the driver processes will first check whether the necessary files already exist. If they do not, and the functionality for doing so is enabled, the driver will generate them. If a subdivided hydrofabric is required, but the files are not available and cannot be generated, the driver exits in error.
The name of an expected or generated subdivided hydrofabric file is based on two things:
- the name of the complete hydrofabric file from which its data is obtained
- the partition/rank id
Each such file has a partition-specific suffix but otherwise the same name as the corresponding full hydrofabric file. E.g., `catchment_data.geojson.0` would be the subdivided hydrofabric file for `catchment_data.geojson` specific to rank 0.
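For instance, following this naming convention with the default file names and four partitions, the working directory would be expected to contain (or come to contain) per-rank files roughly like:

```
catchment_data.geojson.0  nexus_data.geojson.0
catchment_data.geojson.1  nexus_data.geojson.1
catchment_data.geojson.2  nexus_data.geojson.2
catchment_data.geojson.3  nexus_data.geojson.3
```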
Driver processes may, under certain conditions, be able to self-subdivide a hydrofabric and generate the files when necessary. For this to be possible, the executable must have been built with Python support (via the CMake `NGEN_WITH_PYTHON` variable being set to `ON`), and the required package must be installed within the Python environment available to the driver processes.
For example, suppose that:

- the CMake build directory is named `cmake-build/`
- four MPI processes are started
- the catchment and nexus hydrofabric files, realization config, and partition config have intuitive names and are located in the current working directory
- all processes completely load the entire hydrofabric

Then the run command would be:
mpirun -n 4 cmake-build/ngen catchment_data.geojson "all" nexus_data.geojson "all" realization_config.json partition_config.json
As a second example, suppose that:

- the CMake build directory is named `cmake-build/`
- eight MPI processes are started
- the catchment and nexus hydrofabric files, realization config, and partition config have intuitive names and are located in the current working directory
- each process only loads files for the subdivided portion of the hydrofabric that corresponds to its partition

Then the run command would be:
mpirun -n 8 cmake-build/ngen catchment_data.geojson "all" nexus_data.geojson "all" realization_config.json partition_config.json --subdivided-hydrofabric
## Partition Config Generator

A separate artifact can be built for generating partition configs: the `partitionGenerator` executable, built in the CMake build directory when either the entire project or specifically the `partitionGenerator` CMake target is built. The syntax for using it is:
<cmake-build-dir>/partitionGenerator <catchment_data_file> <nexus_data_file> <output_partition_config> <num_partitions> '' ''
E.g.:
./cmake-build-debug/partitionGenerator ./data/huc01_hydrofabric/catchment_data.geojson ./data/huc01_hydrofabric/nexus_data.geojson ./partition_config.json 4 '' ''
The last two arguments are intended to allow partitioning of only a subset of the entire hydrofabric. At this time they are required, but it is recommended that they be left as empty strings; note that the single quotes must be included.
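For reference, building only this artifact in an already-generated build directory (a sketch assuming the `cmake-build/` directory from the driver examples above) might look like:

```shell
# Build just the partition generator target
cmake --build cmake-build --target partitionGenerator
```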