Skip to content

Implement alternative shuffling methods for collisions#6692

Open
dpgrote wants to merge 23 commits intoBLAST-WarpX:developmentfrom
dpgrote:use_modulusshuffle_instead_of_yates
Open

Implement alternative shuffling methods for collisions#6692
dpgrote wants to merge 23 commits intoBLAST-WarpX:developmentfrom
dpgrote:use_modulusshuffle_instead_of_yates

Conversation

@dpgrote
Copy link
Copy Markdown
Member

@dpgrote dpgrote commented Mar 19, 2026

When doing binary-paired collisions, the particle order is shuffled to ensure randomness, that each particle can collide with every other particle in the cell with equal probability. The Fisher-Yates shuffle is the best method numerically with the best randomness characteristics. However, the shuffle can be time-intensive, in some cases taking half of the simulation time. This slowness is a particular issue on GPU since the kernels are distributed per grid cell which limits the opportunity for parallelism. (This is the same issue addressed in PR #4577.)

To address this slowness, this PR implements an alternative shuffling technique that is implemented as a loop over particles providing a substantial speedup. This method uses a linear congruential generator to do the shuffle, where the particle i is replace by the particle (i*step + offset) % n, where n is the number of particles in the cell, step is chosen randomly and is co-prime with n, and offset is chosen randomly. Since this algorithm is known to have a low degree of randomness, the shuffle is done multiple times on subgroups of particles which greatly increases the degree of randomness. By default, five shuffles are done, the first over all particles in the cell, then the rest with a randomly chosen number of subgroups of up to four, with the start of the subgroups shifted randomly. The number of shuffles can be specified.

This shuffle is substantially faster than the Fisher-Yates, on both CPU and GPU. In various test cases, on CPU (Mac M3) it is roughly four time faster (presumably because only a few random number are needed compared to one for each particle). On the GPU, it is 300 to 500 times faster, becoming an insignificant part of the simulation.

Many tests were done checking the correctness of simulations with this shuffling method. For pairwisecoulomb, multiple simulations were made looking at equilibration rates for both intra- and interspecies collisions, with anisotropic temperature, differing species temperatures, and mixed temperatures within a species (for example tests 1 and 2 in https://doi.org/10.1016/j.jcp.2025.113927). In all cases, 1D, 2D, and 3D, the equilibration rates agreed with that found using Fisher-Yates. This includes stringent tests with do_not_push = 1 where the particles remain stationary in memory (in these cases a single modulus shuffle without the subgroups would fail). The nuclearfusion collision was also tested, showing the correct neutron production rates.

As an extra, this also allows use of the std::shuffle on CPU, which uses the same Fisher-Yates shuffle, but is somewhat faster than the WarpX code. Also added is the option for no shuffling for testing purposes.

A side note - in many of the tests, I also ran cases without shuffling as a comparison and in most of these cases, the correct collision rates were still obtained. This is particularly true in 2D and 3D where there seemed to be adequate shuffling of the particles just by having particles enter and leave the cells which rearranges the particles in memory. It would not be good to run this way, but is an interesting effect to see.

@dpgrote dpgrote requested a review from JustinRayAngus March 19, 2026 20:33
@dpgrote dpgrote added the component: collisions Anything related to particle collisions label Mar 19, 2026
);
if (m_shuffling_method == ParticleShufflingMethod::Modulus) {
ModulusShuffle(n_cells, np1, m_modulus_rounds, cell_offsets_1, indices_1);
/* ModulusShuffle(n_cells, np2, m_modulus_rounds, cell_offsets_2, indices_2); */

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
FisherYatesShuffle(n_cells, cell_offsets_2, indices_2);
} else if (m_shuffling_method == ParticleShufflingMethod::Standard) {
StandardShuffle(n_cells, cell_offsets_1, indices_1);
/* StandardShuffle(n_cells, cell_offsets_2, indices_2); */

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +231 to +232
std::random_device rd;
std::mt19937 g(rd());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: you want to init the state of your mersenne twister generator only once and keep it g around.

WARPX_ABORT_WITH_MESSAGE("Standard shuffle not supported on GPU");
#else
std::random_device rd;
std::mt19937 g(rd());
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WeiqunZhang can you suggest a way to use a random engine from AMReX?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: collisions Anything related to particle collisions KISMET performance optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants