Distribute external_file particle reading across all MPI ranks #6715
Open
Noerr wants to merge 1 commit into BLAST-WarpX:development from
Conversation
Force-pushed from 69f742d to 8079326
`AddPlasmaFromFile()` previously loaded the entire openPMD particle file on the IO rank, causing GPU OOM for large files (e.g. 117 GB / 2.4B particles on Frontier MI250X). All ranks now open the file collectively and read a 1/N slice via `loadChunk(offset, extent)` in sub-chunks of 2^22 particles, filtering with `insideBounds()` before each collective `AddNParticles()` call. Peak memory per rank is bounded at ~960 MB regardless of file size.

Resolves: BLAST-WarpX#3185
Precedent: PR BLAST-WarpX#6221 (distributed density reading)
Force-pushed from 8079326 to 4f74d5f
Summary
`AddPlasmaFromFile()` previously loaded the entire openPMD particle file on the IO rank, causing GPU OOM for large files. This PR distributes the file reading across all MPI ranks using `loadChunk(offset, extent)` with sub-chunking to bound peak memory.

- Collective open: `openPMD::Series(path, READ_ONLY, Communicator())`
- Per-rank slice: `rank_chunk = ceil(npart_total / nranks)`
- `max_sub_chunk = 2^22` (4M particles) bounds peak memory at ~960 MB/rank
- Per sub-chunk: `loadChunk(offset, extent)` → `insideBounds()` filter → collective `AddNParticles()`
- `series.flush()` is called outside the `sub_count > 0` guard so all ranks participate (flush may be collective with the ADIOS2/HDF5 MPI backends)
- `WARPX_ALWAYS_ASSERT_WITH_MESSAGE` catches particle overcount bugs

Follows the same distributed I/O pattern established by PR #6221 (distributed density reading).
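The per-rank slicing and sub-chunking described above can be sketched as stand-alone arithmetic. This is an illustrative sketch, not the PR's actual code: the helpers `rank_slice` and `sub_chunks` are hypothetical names, and the real implementation drives `loadChunk()` / `AddNParticles()` from these offsets and extents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One rank's slice of the global particle index space: [offset, offset + extent).
struct Slice { std::uint64_t offset; std::uint64_t extent; };

// rank_chunk = ceil(npart_total / nranks); the last rank's slice is clamped
// so the slices tile [0, npart_total) exactly, with no gaps or overlap.
Slice rank_slice (std::uint64_t npart_total, int nranks, int rank)
{
    const std::uint64_t rank_chunk = (npart_total + nranks - 1) / nranks;
    const std::uint64_t offset = std::min<std::uint64_t>(rank * rank_chunk, npart_total);
    const std::uint64_t end    = std::min<std::uint64_t>(offset + rank_chunk, npart_total);
    return {offset, end - offset};
}

// Split a rank's slice into sub-chunks of at most max_sub_chunk (2^22)
// particles; each sub-chunk is one loadChunk() read in the scheme above.
std::vector<Slice> sub_chunks (Slice s, std::uint64_t max_sub_chunk = std::uint64_t(1) << 22)
{
    std::vector<Slice> out;
    for (std::uint64_t done = 0; done < s.extent; ) {
        const std::uint64_t n = std::min(max_sub_chunk, s.extent - done);
        out.push_back({s.offset + done, n});
        done += n;
    }
    return out;
}
```

Because the per-read buffer is capped at 2^22 particles, the peak memory of the read phase is independent of `npart_total`; only the number of sub-chunk iterations grows with file size.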
Changes
- `PlasmaInjector.H`: replaces `std::any m_openpmd_input_series` with `std::string m_injection_file_path`
- `PlasmaInjector.cpp`
- `AddParticles.cpp`: `AddPlasmaFromFile()`

Motivation
- `hipMalloc returned 2: out of memory` on rank 0
- Rank 0 accumulated every particle before `Redistribute()`, exceeding GPU memory
- The comment `// TODO: Make changes for read/write in multiple MPI ranks` has been in the code since PR #956 (Load Particles: external_file MPI Support, May 2020)

Validated on Frontier (AMReX TinyProfiler)
Test case: 4 nodes × 8 GCDs = 32 MPI ranks, ~740M particles.
[Profiler table omitted: `AddParticles()` MaxMem and `Redistribute_partition` Nalloc, serial vs. distributed]

The serial run had rank 0 at 96% of GPU arena capacity. The distributed approach keeps all ranks under 3.2 GiB.
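The memory win comes from filtering each sub-chunk with `insideBounds()` before the collective `AddNParticles()` call, so only locally relevant particles are kept. A minimal stand-alone sketch of that step, with a hypothetical `filter_inside` standing in for WarpX's actual predicate and data layout:

```cpp
#include <array>
#include <cassert>
#include <vector>

using Pos = std::array<double, 3>;

// Keep only particles whose position lies in the half-open box [lo, hi);
// in the PR's scheme, only these survivors are passed to the collective
// AddNParticles() call for each sub-chunk.
std::vector<Pos> filter_inside (const std::vector<Pos>& chunk,
                                const Pos& lo, const Pos& hi)
{
    std::vector<Pos> kept;
    kept.reserve(chunk.size());
    for (const Pos& p : chunk) {
        bool inside = true;
        for (int d = 0; d < 3; ++d) {
            inside = inside && (p[d] >= lo[d]) && (p[d] < hi[d]);
        }
        if (inside) { kept.push_back(p); }
    }
    return kept;
}
```

Filtering before the add means out-of-bounds particles never occupy GPU arena memory, rather than being loaded everywhere and discarded during `Redistribute()`.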
Test plan
`test_3d_focusing_gaussian_beam_from_openpmd_picmi` should pass (exercises `FromFileDistribution` with 2 MPI ranks).

Resolves #3185