Implement alternative shuffling methods for collisions#6692
Open
dpgrote wants to merge 23 commits intoBLAST-WarpX:developmentfrom
Open
Implement alternative shuffling methods for collisions#6692dpgrote wants to merge 23 commits intoBLAST-WarpX:developmentfrom
dpgrote wants to merge 23 commits intoBLAST-WarpX:developmentfrom
Conversation
| ); | ||
| if (m_shuffling_method == ParticleShufflingMethod::Modulus) { | ||
| ModulusShuffle(n_cells, np1, m_modulus_rounds, cell_offsets_1, indices_1); | ||
| /* ModulusShuffle(n_cells, np2, m_modulus_rounds, cell_offsets_2, indices_2); */ |
Check notice
Code scanning / CodeQL
Commented-out code Note
| FisherYatesShuffle(n_cells, cell_offsets_2, indices_2); | ||
| } else if (m_shuffling_method == ParticleShufflingMethod::Standard) { | ||
| StandardShuffle(n_cells, cell_offsets_1, indices_1); | ||
| /* StandardShuffle(n_cells, cell_offsets_2, indices_2); */ |
Check notice
Code scanning / CodeQL
Commented-out code Note
ax3l
reviewed
Mar 27, 2026
Comment on lines
+231
to
+232
| std::random_device rd; | ||
| std::mt19937 g(rd()); |
Member
There was a problem hiding this comment.
Suggestion: you want to init the state of your mersenne twister generator only once and keep it g around.
dpgrote
commented
Mar 27, 2026
| WARPX_ABORT_WITH_MESSAGE("Standard shuffle not supported on GPU"); | ||
| #else | ||
| std::random_device rd; | ||
| std::mt19937 g(rd()); |
Member
Author
There was a problem hiding this comment.
@WeiqunZhang can you suggest a way to use a random engine from AMReX?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When doing binary-paired collisions, the particle order is shuffled to ensure randomness, that each particle can collide with every other particle in the cell with equal probability. The Fisher-Yates shuffle is the best method numerically with the best randomness characteristics. However, the shuffle can be time-intensive, in some cases taking half of the simulation time. This slowness is a particular issue on GPU since the kernels are distributed per grid cell which limits the opportunity for parallelism. (This is the same issue addressed in PR #4577.)
To address this slowness, this PR implements an alternative shuffling technique that is implemented as a loop over particles providing a substantial speedup. This method uses a linear congruential generator to do the shuffle, where the particle
iis replace by the particle(i*step + offset) % n, wherenis the number of particles in the cell,stepis chosen randomly and is co-prime withn, andoffsetis chosen randomly. Since this algorithm is known to have a low degree of randomness, the shuffle is done multiple times on subgroups of particles which greatly increases the degree of randomness. By default, five shuffles are done, the first over all particles in the cell, then the rest with a randomly chosen number of subgroups of up to four, with the start of the subgroups shifted randomly. The number of shuffles can be specified.This shuffle is substantially faster than the Fisher-Yates, on both CPU and GPU. In various test cases, on CPU (Mac M3) it is roughly four time faster (presumably because only a few random number are needed compared to one for each particle). On the GPU, it is 300 to 500 times faster, becoming an insignificant part of the simulation.
Many tests were done checking the correctness of simulations with this shuffling method. For
pairwisecoulomb, multiple simulations were made looking at equilibration rates for both intra- and interspecies collisions, with anisotropic temperature, differing species temperatures, and mixed temperatures within a species (for example tests 1 and 2 in https://doi.org/10.1016/j.jcp.2025.113927). In all cases, 1D, 2D, and 3D, the equilibration rates agreed with that found using Fisher-Yates. This includes stringent tests withdo_not_push = 1where the particles remain stationary in memory (in these cases a single modulus shuffle without the subgroups would fail). Thenuclearfusioncollision was also tested, showing the correct neutron production rates.As an extra, this also allows use of the
std::shuffleon CPU, which uses the same Fisher-Yates shuffle, but is somewhat faster than the WarpX code. Also added is the option for no shuffling for testing purposes.A side note - in many of the tests, I also ran cases without shuffling as a comparison and in most of these cases, the correct collision rates were still obtained. This is particularly true in 2D and 3D where there seemed to be adequate shuffling of the particles just by having particles enter and leave the cells which rearranges the particles in memory. It would not be good to run this way, but is an interesting effect to see.