Implement alternative shuffling methods for collisions by dpgrote · Pull Request #6692 · BLAST-WarpX/warpx

dpgrote · 2026-03-19T20:33:49Z

When doing binary-paired collisions, the particle order is shuffled to ensure randomness, so that each particle can collide with every other particle in the cell with equal probability. The Fisher-Yates shuffle is the best method numerically, with the best randomness characteristics. However, the shuffle can be time-intensive, in some cases taking half of the simulation time. This slowness is a particular issue on GPU, since the kernels are distributed per grid cell, which limits the opportunity for parallelism. (This is the same issue addressed in PR #4577.)

To address this slowness, this PR implements an alternative shuffling technique, written as a loop over particles, that provides a substantial speedup. The method uses a linear congruential generator to do the shuffle: particle i is replaced by particle (i*step + offset) % n, where n is the number of particles in the cell, step is chosen randomly and is co-prime with n, and offset is chosen randomly. Since this algorithm is known to have a low degree of randomness, the shuffle is done multiple times on subgroups of particles, which greatly increases the degree of randomness. By default, five shuffles are done: the first over all particles in the cell, then the rest with a randomly chosen number of subgroups of up to four, with the start of the subgroups shifted randomly. The number of shuffles can be specified.
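As an illustration of the mapping described above, here is a minimal CPU sketch of a single modulus permutation over one cell. It is not the PR's `ModulusShuffle`: the function name `modulus_permute`, the use of `std::mt19937`, and the retry loop for finding a co-prime step are all assumptions made for this example, and the subgroup rounds are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper, not the WarpX ModulusShuffle: applies one modulus
// permutation to the index list of a single cell.
std::vector<int> modulus_permute(const std::vector<int>& indices, std::mt19937& rng)
{
    const std::uint64_t n = indices.size();
    if (n < 2) { return indices; }

    // Keep i*step + offset well inside 64 bits.
    assert(n < (std::uint64_t{1} << 32));

    // Draw a step in [1, n-1] and retry until it is co-prime with n, so that
    // i -> (i*step + offset) % n visits every slot exactly once.
    std::uniform_int_distribution<std::uint64_t> step_dist(1, n - 1);
    std::uint64_t step = step_dist(rng);
    while (std::gcd(step, n) != 1) { step = step_dist(rng); }

    // The offset only rotates the sequence, so any value in [0, n-1] works.
    std::uniform_int_distribution<std::uint64_t> offset_dist(0, n - 1);
    const std::uint64_t offset = offset_dist(rng);

    std::vector<int> shuffled(n);
    for (std::uint64_t i = 0; i < n; ++i) {
        shuffled[i] = indices[(i * step + offset) % n];
    }
    return shuffled;
}
```

Because step is co-prime with n, the map is a bijection on 0..n-1, so every particle index appears exactly once in the output. Only two random numbers are drawn per call (plus any retries for the step), which is consistent with the speedup argument in the description.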

This shuffle is substantially faster than Fisher-Yates on both CPU and GPU. In various test cases on CPU (Mac M3), it is roughly four times faster (presumably because only a few random numbers are needed, compared to one for each particle). On GPU, it is 300 to 500 times faster, making the shuffle an insignificant part of the simulation.

Many tests were done to check the correctness of simulations with this shuffling method. For pairwisecoulomb, multiple simulations were run looking at equilibration rates for both intra- and interspecies collisions, with anisotropic temperature, differing species temperatures, and mixed temperatures within a species (for example tests 1 and 2 in https://doi.org/10.1016/j.jcp.2025.113927). In all cases, 1D, 2D, and 3D, the equilibration rates agreed with those found using Fisher-Yates. This includes stringent tests with do_not_push = 1, where the particles remain stationary in memory (in these cases a single modulus shuffle without the subgroups would fail). The nuclearfusion collision was also tested, showing the correct neutron production rates.

As an extra, this PR also allows the use of std::shuffle on CPU, which uses the same Fisher-Yates shuffle but is somewhat faster than the WarpX code. An option for no shuffling is also added for testing purposes.
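For reference, a per-cell std::shuffle can be sketched as follows. This is not the PR's `StandardShuffle`: the function name `shuffle_cells` is hypothetical, and the layout assumed here (a `cell_offsets` array with one entry per cell plus a final end marker, indexing into a flat `indices` array) is a guess based on the argument names in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical CPU-only sketch. Assumed layout: cell_offsets has
// n_cells + 1 entries, and cell c owns the half-open range
// [cell_offsets[c], cell_offsets[c + 1]) of indices.
void shuffle_cells(const std::vector<int>& cell_offsets,
                   std::vector<int>& indices,
                   std::mt19937& rng)
{
    assert(!cell_offsets.empty());
    assert(cell_offsets.back() == static_cast<int>(indices.size()));

    for (std::size_t c = 0; c + 1 < cell_offsets.size(); ++c) {
        // Shuffle only the particles belonging to cell c, so indices
        // never cross a cell boundary.
        std::shuffle(indices.begin() + cell_offsets[c],
                     indices.begin() + cell_offsets[c + 1],
                     rng);
    }
}
```

Passing the engine by reference lets the caller decide how it is seeded and how long it lives.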

A side note - in many of the tests, I also ran cases without shuffling as a comparison, and in most of these cases the correct collision rates were still obtained. This is particularly true in 2D and 3D, where there seemed to be adequate shuffling of the particles just from particles entering and leaving the cells, which rearranges them in memory. It would not be good to run this way, but it is an interesting effect to see.

Source/Particles/Collision/BinaryCollision/BinaryCollision.H

-            );
+            if (m_shuffling_method == ParticleShufflingMethod::Modulus) {
+                ModulusShuffle(n_cells, np1, m_modulus_rounds, cell_offsets_1, indices_1);
+                /* ModulusShuffle(n_cells, np2, m_modulus_rounds, cell_offsets_2, indices_2); */


Source/Particles/Collision/BinaryCollision/BinaryCollision.H

+                FisherYatesShuffle(n_cells, cell_offsets_2, indices_2);
+            } else if (m_shuffling_method == ParticleShufflingMethod::Standard) {
+                StandardShuffle(n_cells, cell_offsets_1, indices_1);
+                /* StandardShuffle(n_cells, cell_offsets_2, indices_2); */


ax3l · 2026-03-27T20:21:18Z

Source/Particles/Collision/BinaryCollision/ParticleShufflers.H

+    std::random_device rd;
+    std::mt19937 g(rd());


Suggestion: you want to init the state of your Mersenne Twister generator only once and keep g around.
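One way to read this suggestion, sketched with the standard library only: seed the engine on first use and hand out a reference afterwards. The accessor name `shuffle_engine` and the choice of `thread_local` storage are assumptions for this example, not something taken from the PR or from AMReX.

```cpp
#include <random>

// Hypothetical accessor: seeds one std::mt19937 per thread on first use
// and returns the same engine on every later call from that thread.
std::mt19937& shuffle_engine()
{
    thread_local std::mt19937 engine{std::random_device{}()};
    return engine;
}
```

A call site could then pass `shuffle_engine()` to `std::shuffle` instead of constructing a new `std::random_device` and `std::mt19937` on every invocation, which avoids repeated seeding cost and keeps the random stream continuous across calls.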

dpgrote · 2026-03-27T21:28:21Z

Source/Particles/Collision/BinaryCollision/ParticleShufflers.H

+    WARPX_ABORT_WITH_MESSAGE("Standard shuffle not supported on GPU");
+#else
+    std::random_device rd;
+    std::mt19937 g(rd());


@WeiqunZhang can you suggest a way to use a random engine from AMReX?

dpgrote added 19 commits February 24, 2026 16:56

In binary collisions, use modulus shuffle instead of Yates

3316dce

Bug fix

96739e9

Separated out the array copy

15f8017

Implement version with loops separated

85ab58f

Finished modulus shuffle and added shuffling_method input parameter

292be54

Add documentation

d2930fb

Fix missing DEVICE statment

8bba18c

Add Standard shuffle option

cba5db7

Fix Standard shuffle (forgot imports)

b8513ee

Add subgroup shuffling to modulus

12801ce

Fix abort with Standard shuffle and GPU

77d16e7

Update the documentation

bf91768

Set default shuffling method to FisherYates

963b614

Fine tune the modulus shuffle

3cac2d9

Clean up the documentation

9fb15a8

Add CI test

0703d00

Update the documentation

dcbfa02

Merge branch 'development' into use_modulusshuffle_instead_of_yates

137e61c

Further update the documentation

956e8bf

dpgrote requested a review from JustinRayAngus March 19, 2026 20:33

dpgrote added the component: collisions label Mar 19, 2026

github-advanced-security bot found potential problems Mar 19, 2026


dpgrote added 4 commits March 19, 2026 14:18

Revert to use combined Fisher-Yates shuffle for two species

6394fbc

Further improvements to the shuffle

1e470a9

Update documentation

8f27973

Fix documentation

6de4d74

JustinRayAngus added the KISMET label Mar 21, 2026

ax3l added the performance optimization label Mar 27, 2026

ax3l reviewed Mar 27, 2026


dpgrote commented Mar 27, 2026


dpgrote wants to merge 23 commits into BLAST-WarpX:development from dpgrote:use_modulusshuffle_instead_of_yates

4 participants