Distribute external_file particle reading across all MPI ranks #6715
Open
Noerr wants to merge 1 commit into BLAST-WarpX:development from
Conversation
Force-pushed from 69f742d to 8079326
`AddPlasmaFromFile()` previously loaded the entire openPMD particle file on the IO rank, causing GPU OOM for large files (e.g. 117 GB / 2.4B particles on Frontier MI250X). All ranks now open the file collectively and read a 1/N slice via `loadChunk(offset, extent)` in sub-chunks of 2^22 particles, filtering with `insideBounds()` before each collective `AddNParticles()` call. Peak memory per rank is bounded at ~960 MB regardless of file size.

Resolves: BLAST-WarpX#3185
Precedent: PR BLAST-WarpX#6221 (distributed density reading)
Force-pushed from 8079326 to 4f74d5f
Summary
`AddPlasmaFromFile()` previously loaded the entire openPMD particle file on the IO rank, causing GPU OOM for large files. This PR distributes the file reading across all MPI ranks using `loadChunk(offset, extent)` with sub-chunking to bound peak memory.

- Collective open: `openPMD::Series(path, READ_ONLY, Communicator())`
- Per-rank slice: `rank_chunk = ceil(npart_total / nranks)`
- `max_sub_chunk = 2^22` (4M particles) bounds peak memory at ~960 MB/rank
- Per sub-chunk: `loadChunk(offset, extent)` → `insideBounds()` filter → collective `AddNParticles()`
- `series.flush()` is called outside the `sub_count > 0` guard so all ranks participate (flush may be collective with the ADIOS2/HDF5 MPI backends)
- `WARPX_ALWAYS_ASSERT_WITH_MESSAGE` catches particle overcount bugs

Follows the same distributed I/O pattern established by PR #6221 (distributed density reading).
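The per-rank slicing and sub-chunking described above can be sketched as stand-alone arithmetic. This is an illustrative sketch, not the PR's actual code: the helpers `rank_slice` and `sub_chunks` are hypothetical names, and the real implementation drives `loadChunk()` / `AddNParticles()` from these offsets and extents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One rank's slice of the global particle index space: [offset, offset + extent).
struct Slice { std::uint64_t offset; std::uint64_t extent; };

// rank_chunk = ceil(npart_total / nranks); the last rank's slice is clamped
// so the slices tile [0, npart_total) exactly, with no gaps or overlap.
Slice rank_slice (std::uint64_t npart_total, int nranks, int rank)
{
    const std::uint64_t rank_chunk = (npart_total + nranks - 1) / nranks;
    const std::uint64_t offset = std::min<std::uint64_t>(rank * rank_chunk, npart_total);
    const std::uint64_t end    = std::min<std::uint64_t>(offset + rank_chunk, npart_total);
    return {offset, end - offset};
}

// Split a rank's slice into sub-chunks of at most max_sub_chunk (2^22)
// particles; each sub-chunk is one loadChunk() read in the scheme above.
std::vector<Slice> sub_chunks (Slice s, std::uint64_t max_sub_chunk = std::uint64_t(1) << 22)
{
    std::vector<Slice> out;
    for (std::uint64_t done = 0; done < s.extent; ) {
        const std::uint64_t n = std::min(max_sub_chunk, s.extent - done);
        out.push_back({s.offset + done, n});
        done += n;
    }
    return out;
}
```

Because the per-read buffer is capped at 2^22 particles, the peak memory of the read phase is independent of `npart_total`; only the number of sub-chunk iterations grows with file size.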
Changes
- `PlasmaInjector.H`: replaces `std::any m_openpmd_input_series` with `std::string m_injection_file_path`
- `PlasmaInjector.cpp`
- `AddParticles.cpp`: `AddPlasmaFromFile()`

Motivation
- `hipMalloc returned 2: out of memory` on rank 0
- Rank 0 accumulated every particle before `Redistribute()`, exceeding GPU memory
- The comment `// TODO: Make changes for read/write in multiple MPI ranks` has been in the code since PR #956 (Load Particles: external_file MPI Support, May 2020)

Validated on Frontier (AMReX TinyProfiler)
Test case: 4 nodes × 8 GCDs = 32 MPI ranks, ~740M particles.
[Profiler table omitted: `AddParticles()` MaxMem and `Redistribute_partition` Nalloc, serial vs. distributed]

The serial run had rank 0 at 96% of GPU arena capacity. The distributed approach keeps all ranks under 3.2 GiB.
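The memory win comes from filtering each sub-chunk with `insideBounds()` before the collective `AddNParticles()` call, so only locally relevant particles are kept. A minimal stand-alone sketch of that step, with a hypothetical `filter_inside` standing in for WarpX's actual predicate and data layout:

```cpp
#include <array>
#include <cassert>
#include <vector>

using Pos = std::array<double, 3>;

// Keep only particles whose position lies in the half-open box [lo, hi);
// in the PR's scheme, only these survivors are passed to the collective
// AddNParticles() call for each sub-chunk.
std::vector<Pos> filter_inside (const std::vector<Pos>& chunk,
                                const Pos& lo, const Pos& hi)
{
    std::vector<Pos> kept;
    kept.reserve(chunk.size());
    for (const Pos& p : chunk) {
        bool inside = true;
        for (int d = 0; d < 3; ++d) {
            inside = inside && (p[d] >= lo[d]) && (p[d] < hi[d]);
        }
        if (inside) { kept.push_back(p); }
    }
    return kept;
}
```

Filtering before the add means out-of-bounds particles never occupy GPU arena memory, rather than being loaded everywhere and discarded during `Redistribute()`.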
Test plan
`test_3d_focusing_gaussian_beam_from_openpmd_picmi` should pass (exercises `FromFileDistribution` with 2 MPI ranks).

Resolves #3185