
Distribute external_file particle reading across all MPI ranks #6715

Open

Noerr wants to merge 1 commit into BLAST-WarpX:development from Noerr:feature/distributed-fromfile-reading

Conversation

Contributor

@Noerr Noerr commented Mar 26, 2026

Summary

AddPlasmaFromFile() previously loaded the entire openPMD particle file on the IO rank, causing GPU OOM for large files. This PR distributes the file reading across all MPI ranks using loadChunk(offset, extent) with sub-chunking to bound peak memory.

  • All ranks open the file collectively via openPMD::Series(path, READ_ONLY, Communicator())
  • Particle index range divided evenly: rank_chunk = ceil(npart_total / nranks)
  • Sub-chunk loop with max_sub_chunk = 2^22 (4M particles) bounds peak memory at ~960 MB/rank
  • Each iteration: loadChunk(offset, extent) → insideBounds() filter → collective AddNParticles()
  • series.flush() called outside the sub_count > 0 guard so all ranks participate (flush may be collective with ADIOS2/HDF5 MPI backends)
  • Added WARPX_ALWAYS_ASSERT_WITH_MESSAGE to catch particle overcount bugs

Follows the same distributed I/O pattern established by PR #6221 (distributed density reading).

Changes

File               | Description
PlasmaInjector.H   | Replace std::any m_openpmd_input_series with std::string m_injection_file_path
PlasmaInjector.cpp | Store the file path instead of a Series handle; the Series closes after the metadata read
AddParticles.cpp   | Distributed sub-chunked reading in AddPlasmaFromFile()

Motivation

  • Reproduction: Frontier (OLCF), 64 MI250X GCDs (62 GiB HBM each), 117 GB openPMD/HDF5 file (~2.4B particles) → hipMalloc returned 2: out of memory on rank 0
  • Root cause: IO rank must hold ~133 GB of particle data in memory before Redistribute(), exceeding GPU memory
  • Long-standing TODO: the comment //TODO: Make changes for read/write in multiple MPI ranks has been in the code since PR #956 (Load Particles: external_file MPI Support, May 2020)

Validated on Frontier (AMReX TinyProfiler)

Test case: 4 nodes × 8 GCDs = 32 MPI ranks, ~740M particles.

Metric                           | Serial (IO rank only)  | Distributed (this PR)
GPU AddParticles() MaxMem (max)  | 31 GiB                 | 701 MiB (45× reduction)
Pinned AddParticles() MaxMem (max) | 63 GiB               | 495 MiB (130× reduction)
GPU Arena used (min...max)       | 878 MiB ... 46,065 MiB | 1,101 MiB ... 3,188 MiB
Pinned Arena used (min...max)    | 381 MiB ... 97,277 MiB | 903 MiB ... 1,297 MiB
Redistribute_partition Nalloc    | 194,517                | 7,835

The serial run had rank 0 at 96% of GPU arena capacity. The distributed approach keeps all ranks under 3.2 GiB.

Test plan

  • Existing test test_3d_focusing_gaussian_beam_from_openpmd_picmi should pass (exercises FromFileDistribution with 2 MPI ranks)
  • Verified on Frontier: 32 MPI ranks, ~2.4B particle file, no OOM, simulation ran to completion where it previously failed on particle load.
  • Verified on Frontier: 32 MPI ranks, ~740M particle file. The simulation produces the same answer over ~3000 time steps with the legacy serial load and with this distributed load.
  • Confirm CI passes (e.g. builds for 1D, 2D, 3D, RZ)

Resolves #3185

@Noerr Noerr force-pushed the feature/distributed-fromfile-reading branch from 69f742d to 8079326 Compare March 26, 2026 13:29
AddPlasmaFromFile() previously loaded the entire openPMD particle file
on the IO rank, causing GPU OOM for large files (e.g. 117 GB / 2.4B
particles on Frontier MI250X). All ranks now open the file collectively
and read a 1/N slice via loadChunk(offset, extent) in sub-chunks of
2^22 particles, filtering with insideBounds() before each collective
AddNParticles() call. Peak memory per rank is bounded at ~960 MB
regardless of file size.

Resolves: BLAST-WarpX#3185
Precedent: PR BLAST-WarpX#6221 (distributed density reading)
@Noerr Noerr force-pushed the feature/distributed-fromfile-reading branch from 8079326 to 4f74d5f Compare March 26, 2026 14:00
@RemiLehe RemiLehe requested a review from ax3l March 30, 2026 21:50
@ax3l ax3l added the component: openPMD openPMD I/O label Mar 31, 2026
@ax3l ax3l requested a review from WeiqunZhang March 31, 2026 23:51
