-
Notifications
You must be signed in to change notification settings - Fork 8
Domain decomposition and halo construction #540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- module level log - add builder for decomposition, remove from constructor - start/end index add workaround for global INTERIOR
remove dtype arg from int fields as it is always the same
# Conflicts: # model/common/pyproject.toml # model/common/src/icon4py/model/common/decomposition/mpi_decomposition.py # model/common/src/icon4py/model/common/dimension.py # model/common/src/icon4py/model/common/grid/grid_manager.py # model/common/tests/grid_tests/test_grid_manager.py # model/testing/src/icon4py/model/testing/datatest_utils.py # model/testing/src/icon4py/model/testing/serialbox.py # tools/src/icon4py/tools/common/metadata.py # tools/src/icon4py/tools/icon4pygen/bindings/locations.py
# Conflicts: # model/common/pyproject.toml
|
cscs-ci run default |
|
cscs-ci run distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces domain decomposition of ICON grid files using METIS (pymetis) and constructs ICON-like halo regions for distributed runs. It also refactors grid loading so GridManager can build distributed/local grids (including refinement-domain bounds) and expands MPI-focused test coverage.
Changes:
- Add METIS-based decomposition and ICON-like halo construction for cell/edge/vertex dimensions.
- Refactor grid file reading/index transformations and update grid/config/decomposition APIs (
horizontal_size,DecompositionInfo.set_dimension, etc.). - Add/adjust MPI tests and update existing tests/utilities for the new decomposition/halo behavior.
Reviewed changes
Copilot reviewed 47 out of 49 changed files in this pull request and generated 39 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Adds mpi4py/pymetis to lockfile entries. |
| tach.toml | Excludes pymetis from tach scanning. |
| pyproject.toml | Adds mpi4py to typing extras. |
| model/common/pyproject.toml | Adds pymetis to io extra. |
| tools/src/icon4py/tools/py2fgen/wrappers/common.py | Updates grid/decomposition wrapper construction for new APIs. |
| model/testing/src/icon4py/model/testing/serialbox.py | Updates savepoint decomposition/grid construction to new APIs. |
| model/testing/src/icon4py/model/testing/parallel_helpers.py | Simplifies logging helpers; updates decomposition info size reporting. |
| model/testing/src/icon4py/model/testing/grid_utils.py | Switches to GridManager-provided decomposition_info; updates GridManager init. |
| model/testing/src/icon4py/model/testing/fixtures/benchmark.py | Uses grid_manager.decomposition_info instead of constructing it. |
| model/testing/src/icon4py/model/testing/definitions.py | Adds a new torus grid descriptor. |
| model/standalone_driver/src/icon4py/model/standalone_driver/driver_utils.py | Updates GridManager creation signature (but needs follow-up fixes). |
| model/common/tests/common/metrics/mpi_tests/test_parallel_metrics.py | Skips MPI tests when optional MPI deps are missing. |
| model/common/tests/common/interpolation/mpi_tests/test_parallel_interpolation.py | Skips MPI tests when optional MPI deps are missing. |
| model/common/tests/common/io/unit_tests/test_writers.py | Updates horizontal_config → horizontal_size. |
| model/common/tests/common/io/unit_tests/test_io.py | Updates horizontal_config → horizontal_size. |
| model/common/tests/common/grid/unit_tests/test_icon.py | Improves skip-value assertion messages. |
| model/common/tests/common/grid/unit_tests/test_horizontal.py | Adds test for max_boundary_level. |
| model/common/tests/common/grid/unit_tests/test_gridfile.py | Updates GridFile API usage; adds selective read tests. |
| model/common/tests/common/grid/unit_tests/test_grid_refinement.py | Updates domain-bounds tests to use decomposition info; adds more LAM cases. |
| model/common/tests/common/grid/unit_tests/test_grid_manager.py | Adds tests for distributed connectivities and decomposition info; updates GridManager usage. |
| model/common/tests/common/grid/fixtures.py | Introduces shared fixtures module for grid tests. |
| model/common/tests/common/grid/mpi_tests/utils.py | Adds helpers to run grid manager in single/multi-rank modes. |
| model/common/tests/common/grid/mpi_tests/test_parallel_icon.py | Adds distributed-grid skip-value test; refactors existing MPI grid tests. |
| model/common/tests/common/grid/mpi_tests/test_parallel_grid_refinement.py | Adds MPI test for compute_domain_bounds (contains an xfail bug). |
| model/common/tests/common/grid/mpi_tests/test_parallel_grid_manager.py | Adds extensive single-vs-multi rank comparisons and validation tests (contains an assertion bug). |
| model/common/tests/common/grid/mpi_tests/test_parallel_geometry.py | Skips when MPI deps missing; minor refactor. |
| model/common/tests/common/decomposition/fixtures.py | Adds fixture for simple-grid neighbor tables. |
| model/common/tests/common/decomposition/utils.py | Adds simple-grid decomposition/halo reference data and dummy props. |
| model/common/tests/common/decomposition/unit_tests/test_halo.py | Adds unit tests for halo construction and mapping. |
| model/common/tests/common/decomposition/unit_tests/test_definitions.py | Adds tests for new DecompositionInfo behavior (halo masks, distributed detection). |
| model/common/tests/common/decomposition/mpi_tests/test_parallel_halo.py | Adds MPI test ensuring unique ownership across ranks. |
| model/common/tests/common/decomposition/mpi_tests/test_mpi_decomposition.py | Expands MPI decomposition tests; logging tweaks; adds halo-level mask tests. |
| model/common/src/icon4py/model/common/utils/data_allocation.py | Adds array_ns_from_array helper (numpy vs cupy selection). |
| model/common/src/icon4py/model/common/interpolation/stencils/compute_cell_2_vertex_interpolation.py | Updates GT4Py typing annotations for dims. |
| model/common/src/icon4py/model/common/interpolation/interpolation_factory.py | Adjusts provider registration call for nudgecoeffs. |
| model/common/src/icon4py/model/common/grid/simple.py | Updates GridConfig argument name to horizontal_size. |
| model/common/src/icon4py/model/common/grid/icon.py | Adjusts skip-value logic to treat distributed like limited-area. |
| model/common/src/icon4py/model/common/grid/horizontal.py | Adds ordered/halo-domain helpers; adds max_boundary_level; doc updates. |
| model/common/src/icon4py/model/common/grid/gridfile.py | Adds transformation protocol, dynamic/fixed dimensions, selective reads, context manager. |
| model/common/src/icon4py/model/common/grid/grid_refinement.py | Computes domain bounds using decomposition info and halo placement rules. |
| model/common/src/icon4py/model/common/grid/grid_manager.py | Major refactor: decomposed grid construction, local reading, halo integration. |
| model/common/src/icon4py/model/common/grid/base.py | Renames GridConfig field to horizontal_size; uses shared MissingConnectivityError. |
| model/common/src/icon4py/model/common/exceptions.py | Adds ValidationError and MissingConnectivityError; typing improvements. |
| model/common/src/icon4py/model/common/decomposition/mpi_decomposition.py | Updates mpi4py import handling; field slicing based on decomposition info indices. |
| model/common/src/icon4py/model/common/decomposition/definitions.py | Refactors DecompositionInfo to require halo levels and adds halo helpers/flags. |
| model/common/src/icon4py/model/common/decomposition/decomposer.py | Adds METIS and single-node decomposers. |
| model/common/src/icon4py/model/common/decomposition/halo.py | Adds ICON-like halo constructor and global→local connectivity mapping. |
| model/atmosphere/dycore/tests/dycore/integration_tests/test_benchmark_solve_nonhydro.py | Switches owner_mask source to geometry field provider. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
model/standalone_driver/src/icon4py/model/standalone_driver/driver_utils.py
Show resolved
Hide resolved
| definitions.DecompositionInfo() | ||
| # TODO (halungge): last argument is called `decomp_domain` in icon, it is not needed in the granules should we pass it nevertheless? | ||
| .set_dimension(dims.CellDim, c_glb_index, c_owner_mask, None) | ||
| .set_dimension(dims.EdgeDim, e_glb_index, e_owner_mask, None) | ||
| .set_dimension(dims.VertexDim, v_glb_index, v_owner_mask, None) |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DecompositionInfo.set_dimension(...) now requires a halo_levels array; passing None will break later calls like is_distributed(), halo_levels(), and halo_level_mask() (and any exchange logic relying on them). Please pass a properly-sized int array (e.g., OWNED for all entries if halo info is unavailable) instead of None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would mean changing the fortran interface to pass decomp_domain, so I'd leave this as None for now.
We should check that icon blueline still works correctly and doesn't try to access the Nones before merging though, so I'll leave this comment unresolved until then.
model/common/tests/common/interpolation/mpi_tests/test_parallel_interpolation.py
Show resolved
Hide resolved
|
cscs-ci run default |
| # TODO(msimberg): Single rank, not node. | ||
| def run_grid_manager_for_singlenode(file: pathlib.Path) -> gm.GridManager: | ||
| manager = _grid_manager(file, NUM_LEVELS) | ||
| manager( | ||
| keep_skip_values=True, | ||
| run_properties=decomp_defs.SingleNodeProcessProperties(), | ||
| decomposer=decomp.SingleNodeDecomposer(), | ||
| allocator=None, | ||
| ) | ||
| return manager | ||
|
|
||
|
|
||
| # TODO(msimberg): Fix typos. Consistent naming with above function. | ||
| def run_gridmananger_for_multinode( | ||
| file: pathlib.Path, | ||
| run_properties: decomp_defs.ProcessProperties, | ||
| decomposer: decomp.Decomposer, | ||
| ) -> gm.GridManager: | ||
| manager = _grid_manager(file, NUM_LEVELS) | ||
| manager( | ||
| keep_skip_values=True, allocator=None, run_properties=run_properties, decomposer=decomposer | ||
| ) | ||
| return manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed these multi_rank and single_rank, but for bikeshedding purposes we could consider just calling them distributed and local. I'm not super convinced by the latter, but distributed is very established and we already use the term in some places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have a number of other utilities that still use single/multinode that should be renamed but I'm not renaming them in this PR. This renaming is already on https://hackmd.io/WljE8xX3STmJL4VkqYpqQw?both.
|
Mandatory Tests Please make sure you run these tests via comment before you merge!
Optional Tests To run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:
For more detailed information please look at CI in the EXCLAIM universe. |
| # TODO(msimberg): What should we do about this. (The global) num_cells is | ||
| # not guaranteed to be set here when used through fortran. Should we: | ||
| # 1. Ignore distributed? | ||
| # 2. Compute num_cells with a reduction? | ||
| # 3. Use a ProcessProperties to detect it? | ||
| distributed = ( | ||
| config.num_cells < global_properties.num_cells | ||
| if global_properties.num_cells is not None | ||
| else False | ||
| ) | ||
| limited_area_or_distributed = config.limited_area or distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be good to resolve before this PR is merged.
Decompose (global) grid file:
pymetisto decompose the global grid (cells) intonpatchesOmissions:
LAM grids need to be investigated further:
start_indexandend_indexnot in the halo construction.the number of halo lines (in terms of cells) is hardcoded to 2, that could be made a parameter.
Not sure it all runs on GPU correctly... most probably there are some
numpycupyissues to fix.