
Conversation


@franzpoeschel franzpoeschel commented Oct 16, 2025

Factored out of #5405 as a standalone feature.

A different domain decomposition means a different distribution of parallel processes over the simulation domain; the number of parallel processes may vary between runs.

  • Fields: Mostly supported already; just do random IO accesses into the sub-grid of interest (see the sketch after this list). The PML fields do not support this and are skipped (for now?).
  • Particles: So far, the restart logic expected exactly one precisely matching particle patch. The new logic accepts any number of particle patches, keeps only those that overlap with the local domain, and filters individual particles from patches that overlap only partially.
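
A minimal sketch of what the per-rank field read boils down to, assuming openPMD-api and not claiming to be the PIConGPU implementation: each rank requests only the hyperslab of the global checkpoint dataset that belongs to it under the new decomposition. The function name and the localOffset/localExtent parameters are made up for illustration.

#include <openPMD/openPMD.hpp>

#include <cstddef>
#include <vector>

// Sketch only (not PIConGPU code): each rank reads just its own sub-grid of
// a global field dataset, so the on-disk domain decomposition does not need
// to match the current one. localOffset/localExtent describe this rank's
// part of the total domain under the *new* decomposition and are
// hypothetical parameters.
std::vector<float> loadLocalSubGrid(
    openPMD::Series& series,
    openPMD::Iteration& iteration,
    openPMD::Offset const& localOffset,
    openPMD::Extent const& localExtent)
{
    auto component = iteration.meshes["E"]["x"]; // example record component
    // Request exactly the hyperslab that belongs to this rank.
    auto chunk = component.loadChunk<float>(localOffset, localExtent);
    series.flush(); // perform the actual read
    std::size_t n = 1;
    for (auto const e : localExtent)
        n *= e;
    return std::vector<float>(chunk.get(), chunk.get() + n);
}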

TODO:

  • Auto-detect if PML load operations should be skipped
  • Maybe someone has a good test case for this

To reviewers, please also check the comments below


@franzpoeschel franzpoeschel left a comment

Some lines to pay special attention to for code reviewers

* with, we must take care not to index past the dataset boundaries. Just loop around to the start
* in that case. Not the finest way, but it does the job for now..
*/
start.push_back(gridPos.revert()[d] % extent[d]);

Is this really the best solution? This loads nextId and startId. If the GPU grid does not match the old one, this logic currently just uses modulus to loop around and hand out some random ID. No idea if that has downsides.
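
For illustration, a self-contained sketch of that wrap-around; the names and types are hypothetical rather than the PR's pmacc types, and the component reversal done by gridPos.revert() is omitted.

#include <cstdint>
#include <vector>

// Hypothetical sketch of the wrap-around used when the current GPU grid
// does not match the one stored in the checkpoint: indices that would fall
// past the dataset boundary wrap back via modulus, so every process still
// gets *some* stored value (e.g. a nextId/startId), just not necessarily
// "its own" from the previous run.
std::vector<std::uint64_t> wrappedStartIndex(
    std::vector<std::uint64_t> const& gridPos, // this process' grid position
    std::vector<std::uint64_t> const& extent)  // dataset extent per dimension
{
    std::vector<std::uint64_t> start;
    start.reserve(extent.size());
    for (std::size_t d = 0; d < extent.size(); ++d)
        start.push_back(gridPos[d] % extent[d]); // loop around to the start
    return start;
}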

for(size_t d = 0; d < simDim; ++d)
{
auto positionInD = positionVec[d] + positionOffsetVec[d];
if(positionInD < patchTotalOffset[d] || positionInD >= patchUpperCorner[d])

This line decides whether a particle from a partially overlapping domain should be considered. I treat the lower boundary as inclusive and the upper boundary as exclusive, otherwise single particles were loaded into both processes.
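
A standalone sketch of that half-open check, with hypothetical names and simDim fixed to 3 for the example: a particle belongs to the local patch iff its total position lies in [patchTotalOffset, patchUpperCorner) in every dimension, so a particle sitting exactly on a shared boundary is claimed by exactly one process.

#include <array>
#include <cstdint>

constexpr std::size_t simDim = 3; // example value

// Hypothetical illustration of the half-open containment test: lower bound
// inclusive, upper bound exclusive.
bool particleBelongsToLocalPatch(
    std::array<std::int64_t, simDim> const& totalPosition, // position + offset
    std::array<std::int64_t, simDim> const& patchTotalOffset,
    std::array<std::int64_t, simDim> const& patchUpperCorner)
{
    for (std::size_t d = 0; d < simDim; ++d)
    {
        if (totalPosition[d] < patchTotalOffset[d] || totalPosition[d] >= patchUpperCorner[d])
            return false;
    }
    return true;
}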

DataSpace<simDim> const patchTotalOffset
= localToTotalDomainOffset + threadParams->localWindowToDomainOffset;
DataSpace<simDim> const patchExtent = threadParams->window.localDimensions.size;
DataSpace<simDim> const patchUpperCorner = patchTotalOffset + patchExtent;

Yeah, someone please check whether all these position computations match up; they seem to in my tests.
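
As a sanity check, a hypothetical 1D example of the corner arithmetic; all numbers are invented for illustration.

#include <cassert>

int main()
{
    // Invented 1D numbers, purely to illustrate the corner arithmetic:
    // localToTotalDomainOffset = 256, localWindowToDomainOffset = 16,
    // local window extent = 128.
    int const patchTotalOffset = 256 + 16; // 272
    int const patchExtent = 128;
    int const patchUpperCorner = patchTotalOffset + patchExtent; // 400
    // With the half-open convention, a particle at total cell 272 belongs
    // to this patch (lower bound inclusive), one at cell 400 does not
    // (upper bound exclusive) and is picked up by the neighbouring patch.
    assert(272 >= patchTotalOffset && 272 < patchUpperCorner);
    assert(!(400 >= patchTotalOffset && 400 < patchUpperCorner));
    return 0;
}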

filterKeep,
filterRemove,
alpaka::getPtrNative(filter));
eventSystem::getTransactionEvent().waitForFinished();

Someone please verify the kernel and its call. I suspect that the call can be made more efficient.
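
For reference, a serial host-side sketch of what the filter step conceptually does, not the alpaka kernel itself; the names filterKeep/filterRemove are taken from the excerpt above but their values here are illustrative, and isInside stands for the half-open containment test sketched earlier.

#include <cstdint>
#include <vector>

constexpr std::uint8_t filterRemove = 0u; // illustrative values
constexpr std::uint8_t filterKeep = 1u;

// Serial illustration of the filter pass (the PR runs this as a GPU kernel):
// mark every particle of a partially overlapping patch as keep/remove
// depending on whether it falls into the local patch.
template <typename IsInside>
std::vector<std::uint8_t> buildFilterMask(std::size_t numParticles, IsInside&& isInside)
{
    std::vector<std::uint8_t> filter(numParticles, filterRemove);
    for (std::size_t i = 0; i < numParticles; ++i)
    {
        if (isInside(i))
            filter[i] = filterKeep;
    }
    return filter;
}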

// patchExtent)) <<
// '\n';
if((patchTotalOffset <= offsets[i]) == true_
&& ((offsets[i] + extents[i]) <= (patchTotalOffset + patchExtent)) == true_)

Again, someone please verify the positioning logic. Note that both ends are inclusive here, since ideally (i.e. in the normal case) we compare a particle patch from disk against itself, i.e. against the local window, which is the same.
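
A hypothetical sketch of that patch-level test (closed on both ends), to contrast with the half-open per-particle check above; names are made up and simDim is fixed to 3 for the example.

#include <array>
#include <cstdint>

constexpr std::size_t simDim = 3; // example value

// Hypothetical patch-level containment test: a particle patch from the
// checkpoint is accepted if its lower corner is >= the local window's lower
// corner and its upper corner is <= the window's upper corner, in every
// dimension (both ends inclusive). In the common case of an unchanged
// decomposition the patch equals the local window, so both comparisons hold
// with equality.
bool patchFullyContained(
    std::array<std::int64_t, simDim> const& patchOffset,   // offsets[i]
    std::array<std::int64_t, simDim> const& patchSize,     // extents[i]
    std::array<std::int64_t, simDim> const& windowOffset,  // patchTotalOffset
    std::array<std::int64_t, simDim> const& windowExtent)  // patchExtent
{
    for (std::size_t d = 0; d < simDim; ++d)
    {
        if (patchOffset[d] < windowOffset[d]
            || patchOffset[d] + patchSize[d] > windowOffset[d] + windowExtent[d])
            return false;
    }
    return true;
}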

@PrometheusPi (Member)

Regarding the open point "maybe someone has a good test case for this":
I guess the ion simulations from @pordyna or @paschk31 might qualify for this.
