Time dilation of collisionless {H,f} to control CFL #894
Draft: Maxwell-Rosen wants to merge 96 commits into main from slowdown-to-omega-cfl
Conversation
- Introduced skip_cell_threshold to gkyl_gk_collisionless_flux_new and related functions to improve handling of small values in flux calculations.
- Updated relevant structures and CUDA implementations to accommodate the new parameter.
…_cell_threshold. This should work
…he skip_cell_threshold to the positivity updater. Change the minimum value of the threshold to -DBL_MAX so that it really never gets tripped
…t pass skip_cell_threshold correctly into the positivity_shift GPU kernel.
…ted to actually skip the shift, not just set the shifted_node value
…lean type to save memory
…involved touching a lot of files and was a lot of refactoring. Now there is a gkyl_skip_cell updater in gyrokinetic/zero, where the threshold is set and determined. All the apps acquire this object, free it (valgrind clean), and read the array, always making sure to do const bool *foo = gkyl_array_cfetch(). There was some confusion with pointers and booleans, but now the code works and reproduces the same results as the PR that is open right now. I will play with masking diagnostics tomorrow. The code compiles without errors or warnings. No testing has been done on GPU yet, but the GPU code has been modified
…lation and not only is it very messy, but it doesn't have a dramatic impact. I tried just hardcoding the mask in to save on the code integration work and the small test showed lackluster results
…pe of cell skipping operation we are doing. Remove the moment_mask option because it is not implemented yet.
…s terms. I had a vivid dream that I explained this feature to Ammar and he rejected it at first, but liked the way I explained it and wanted to see some tests, so I awoke to make a first pass at this and get the idea out. This is similar to force softening, except it's advective slowdown for everything. The orbit contours should be the same, just slower, so I think this will mitigate the issues we saw before of expander temperature increase due to the force softening. I think this is a really good way to determine the region we want to ignore: by setting the operation scale_fac = min(1.0, dt_omegaH/dt_CFL), we are always limited by omegaH. This should screen out the inaccessible regions of phase space automatically for the mirror problem. It can be controlled by setting omegaH_fac in the input file. I need to figure out how to do this while loop with array methods because it doesn't work for GPUs, but we cannot array-divide things, so it's a little complicated. We would also need some sort of array_min(a, array) operation, which I need to collaborate with others on.
…ed tests. I should clarify that my dream was explaining that instead of force softening, using the time dilation scheme (dilating the advective term) has a similar property of constraining the time step due to advective restrictions without allowing extra leakage. Instead of force softening, which allows extra stuff to get through the throat, we dilate the time in this region to just make evolution there slower. It preserves the energy contours of the problem while allowing a larger time step. Yes, I did give this detailed description to Ammar in my dream, including figures and diagrams.
- Added gkyl_array_min and gkyl_array_min_range functions for element-wise minimum operations.
- Implemented CUDA kernels for minimum operations on arrays.
- Updated tests to include new minimum functionality.
- Removed unused previous-step CFL rate logic from species dynamics.
…erations and dg_bin_ops. I made a special basis for this operation. Added the ability for users to specify the minimum dt, or for the code to minimize dt with the time dilation method based on omega_H. The user can always adjust the omega_H frequency at the input-file level by specifying something like cfl_frac for the omegaH mode.
…ate into binary operations; update collisionless app for CFL scaling
… think this means we need to combine this into one big method because we have to allocate everything at initialization. Or at least the initialization needs to initialize everything. Tested the poa regression test and it still works. On to simulations!
…te related calls in tests and applications. Add and verify GPU unit tests for this function
…tion and add GPU support tests
…ells that are not updated. This means the skip_cell mask should be a greater than mask, which means we should flip all the conditionals inside the equation updaters. To improve readability, I changed skip_cell to update_cell, so that the if statements read something like "if update cell is false, then skip". I think it's more readable now and follows the conventions in the code better
…do not make mistakes by accessing its private elements directly. We should use functions to get const pointers to these elements if necessary. I updated ctest_dg_array_mask to reflect this. The unit test was also doing some incorrect accessing of gkyl_array->data and size, which I had to fix. I condensed the unit test by using a use_gpu flag in the function call, so the file size went down by quite a bit. Furthermore, the calls to the mask were utilizing the internal range object, which was made private. Instead, I made an internal gkyl_dg_array_mask_eval_idx which takes an int idx, so that we have one method taking the linear long idx and another that takes the int idx
…headers also need to "acquire" the device pointer, which is now moved into the private header, so we need a public method to access this pointer. I also accidentally deleted a few indices inside the priv headers, which I brought back from main
…get any GPUs on stellar because I have too many things queued up. I'm struggling with the GPU implementation after moving the struct definition into the private header. I'm not sure where to put the GKYL_CU_DH static inline because I can't put it in the public header since it needs access to private elements
…ask' into skip-cell-general-mask
…evice or host is to have the GKYL_CU_DH flag inside the function in the private header, which we call a kernel. We have a host function in the .c file which calls this kernel. The GPU parts of the code can include this private header since those are private parts of the code and call the kernel directly
…the simulation, then uses a fraction of the maximum to mask cells. Another applies this maximum threshold in a spatially dependent manner, so inside each configuration space cell, a reduction is performed across the velocity range for the maximum C0, which is used for that configuration space cell. CPU unit tests work on my laptop, so I'm switching to another machine for GPU testing
…ask' into skip-cell-general-mask
…ask updater instead of the skip cell updater. It's cleaner now, plus I added an essential feature to the array mask to mask things that are some fraction of the maximum f value. I need to test this implementation more thoroughly, but it should be working now. This is just a scripting change in the app, so it should be fine. I updated the reset method to have a more robust decision process of releasing and re-initializing the array_mask so that it can change between the different thresholds and update the threshold
… fdot_scaling_enabled method for clarity. Testing, but test produces infinite timesteps
…timesteps because the default for update_cell was false, meaning no updates happened, and every cell skipped its updates. In GK, the default for a NONE mask should be true, but we should be able to set the default depending on the context. Non mask-setting regression tests behave normally.
…scaling_enabled function
… the input file boolean to time_dilation_has_spatial_dependence, which has a clearer name
Feature
Summary
Late on a Friday night, I had a research meeting with Ammar, Mana and Jimmy in my sleep. It was mostly with Ammar, but Jimmy and Mana were there. I explained to them the issues we had in G2 when we implemented force softening. In the 2023 paper, we published the results below.


The issue is that the force softening introduced higher temperatures in the expander. Expander heating is something we are incredibly concerned about right now because of the positivity hack, so this scheme is not ideal. I explained in my dream that the issue was that we are modifying the Hamiltonian, therefore modifying the orbit contours, and the orbit contours are essential for particle confinement. An appropriate scheme really should not modify the Hamiltonian.
Instead, I was reminded of the work in this article ("Time-Dilation Methods for Extreme Multiscale Timestepping Problems" https://arxiv.org/html/2510.09756). They essentially discuss slowing down time near the extreme regions of the problem, effectively what we have done for the orbit-average phase of the pseudo orbit averaging integrator. This method preserves the orbit contours, masks the CFL, and will speed up our simulations. Ammar was doubtful this would work and dismissed my idea; however, I was very confident that it would work, so he wanted to see some evidence. I awoke to program in the middle of the night, teeming with excitement and motivation.
I realize another issue is that we used an overly harsh mask in the 2023 paper. It is not necessary for the mask to be fully logical, going all the way from 1 to 0. Instead, the mask should optimally choose its strength based on the CFL frequency and rate. It should modify the problem the bare minimum needed to reach the optimal CFL rate. That led me to this modification of the GK Vlasov equation.
Here is a derivation of this feature. It follows a similar derivation to Werner (2018), "Speeding Up Simulations By Slowing Down Particles: Speed-Limited Particle-In-Cell Simulation". Let's start with a phase-space continuity equation of the form

$$\frac{\partial f}{\partial t} + \nabla_{\vec z} \cdot \left( \dot{\vec z}\, f \right) = C[f] + S.$$

To dilate time, consider a slowed-down distribution function

$$f(\vec x, \vec v, t) = \frac{g(\vec x, \vec v, t)}{\beta(\vec x, \vec v, t)},$$

where $\beta(\vec x, \vec v, t)$ is the amount by which time is dilated, $\beta \in (0,1]$. Simulations thus far in Gkeyll have been evolving $f(\vec x, \vec v, t)$, but we aim to evolve $g(\vec x, \vec v, t)$. Substituting $g/\beta$ into the time derivative of the phase-space continuity equation yields

$$\frac{\partial}{\partial t}\left( \frac{g}{\beta} \right) + \nabla_{\vec z} \cdot \left( \dot{\vec z}\, \frac{g}{\beta} \right) = C\!\left[\frac{g}{\beta}\right] + S.$$

Applying the product rule, we get

$$\frac{1}{\beta}\frac{\partial g}{\partial t} - \frac{g}{\beta^2}\frac{\partial \beta}{\partial t} + \nabla_{\vec z} \cdot \left( \dot{\vec z}\, \frac{g}{\beta} \right) = C\!\left[\frac{g}{\beta}\right] + S.$$

Multiplying by $\beta$, this can be expressed in terms of the time derivative of $g$:

$$\frac{\partial g}{\partial t} = \frac{g}{\beta}\frac{\partial \beta}{\partial t} - \beta\, \nabla_{\vec z} \cdot \left( \dot{\vec z}\, \frac{g}{\beta} \right) + \beta\, C\!\left[\frac{g}{\beta}\right] + \beta S.$$
This equation correctly slows down the advection, collisions, and source terms, which relaxes the time step. However, the resulting equation is not closed unless $\beta$ is specified independently, to which two options are presented.

One option is to approximate the time variation of $\beta$ as slow, setting $\partial_t\beta \approx 0$, or to treat $\beta$ as time-independent.

A second approach is to set $\beta$ as a state function, either of $g$ or of the CFL limits imposed by equation \eqref{eq: Full vlasov equation toy models}. This means equation \eqref{eq: slowed down continuity equation with dbeta dt} does not determine the evolution of $\beta$ in time, but instead uses $\partial_t \beta$ as a computed quantity, so an Euler step (or a higher-order RK method) backwards in time can be taken to compute $\partial_t \beta = \left( \beta_n - \beta_{n-1} \right) / \Delta t$. The time-derivative term involving $\partial_t \beta$ can be integrated exactly, avoiding additional timestep restrictions. $\beta$ is purely local in phase space and does not introduce a CFL constraint; the timestep remains controlled by the rescaled characteristic velocities.

The option which is implemented is to set

$$\beta = \min\!\left( 1, \frac{\Delta t_{\Omega_H}}{\Delta t_{\rm CFL}} \right).$$
This slows down the advective Poisson bracket by the appropriate ratio when $\Delta t_{\rm CFL} < \Delta t_{\Omega_H}$. This numerical approach to the mask is also more flexible, as neither the code nor the user needs to know about the mirror loss cone; it is therefore a more general method of time dilation in inaccessible regions of phase space.
Issue link: I will open a DR as well about this idea
Implementation Details
Key changes: To calculate $ \rm{min}( 1, \Delta t_{ \Omega_H } / \Delta t_{ \rm CFL}) $, some modifications are needed inside the array_ops and dg_bin_ops methods.
We need an inversion operation to invert a P0 array. This should use the dg_inv method, which is already written. However, there is no kernel for the P0 array. Division is tricky with DG arrays, so it should not be an array_op, even though this is a P0 array. I handwrote a P0 kernel for this method because the Maxima generators do not work for P0. It just does 1/f[0], which is a straightforward operation, and I really don't want to make the operation 1/f[0] more complex than necessary.
This is followed by a $\min()$ operation, so that also needed to be implemented. I decided on the location for this function inside array_ops for a few reasons. One, this is an operation on a single array. Second, it is a simple operation, not requiring a basis. Third, I really don't want to think about how this generalizes to a P1 array and implement a full kernel operation. I see the issue that this method is only intended for P0 arrays; however, I think that is the purpose of these array_ops methods. They are for single-array manipulation, independent of basis. I really don't want to make this method implementation more complex than necessary and do P1, P2 kernels and write Maxima scripts when the operation is just `fmin(coeff, array)` element-wise.
Dependencies: None
Automated testing: New unit tests are added to ctest_array_ops.c. A regression test is used to test the mask of the CFL.
Example Use
Community Standards
Code in a `layer/zero` should have a unit test, e.g., `core/zero`.
Testing: `make check` and unit tests all pass.
Additional Notes
I've been testing some simulations with this new feature. I will get some better results that make a more direct comparison. It speeds the FDP up by about 20x, which is huge. Preliminary results show no difference in the moments and a much better result than the skip_cell feature.
This figure compares two simulations, one with the time dilation feature and one without it. This is a non-uniform z simulation with 216 cells using the positivity hack. Each simulation executed 10 POA cycles, with a 20 μs FDP and 2 s OAP, ending around 20 s. The simulations compare nearly identically, except for minor deviations in Tpar in the mirror throat. The time dilation saves 20x during the FDP, and the simulation goes from 22 hours to around 5-6 hours. It's a big cost saving.

I also wanted to compare a high-resolution simulation with Nz=800 without the positivity hack. The simulations executed 10 POA cycles with a 20 μs FDP and 2 s OAP. The results agree very well, although they needed a higher $\mu$ resolution of 32 cells instead of 16.

Here is the percent error for each quantity above, showing less than 1% deviation for all quantities with a 2x speedup.
