Time dilation of collisionless {H,f} to control CFL #894
Draft: Maxwell-Rosen wants to merge 96 commits into main from slowdown-to-omega-cfl
Conversation
- Introduced skip_cell_threshold to gkyl_gk_collisionless_flux_new and related functions to improve handling of small values in flux calculations.
- Updated relevant structures and CUDA implementations to accommodate the new parameter.
…_cell_threshold. This should work
…he skip_cell_threshold to the positivity updater. Change the minimum value of the threshold to -DBL_MAX so that it really never gets tripped
…t pass skip_cell_threshold correctly into the positivity_shift GPU kernel.
…ted to actually skip the shift, not just set the shifted_node value
…lean type to save memory
…involved touching a lot of files and was a lot of refactoring. Now there is a gkyl_skip_cell updater in gyrokinetic/zero, where the threshold is set and determined. All the apps acquire this object, free it (valgrind clean), and read the array, always making sure to do const bool *foo = gkyl_array_cfetch(). There was some confusion with pointers and booleans, but now the code works and reproduces the same results as the PR that is open right now. I will play with masking diagnostics tomorrow. The code compiles without errors or warnings. No testing has been done on GPU yet, but the GPU code has been modified
…lation and not only is it very messy, but it doesn't have a dramatic impact. I tried just hardcoding the mask in to save on the code integration work and the small test showed lackluster results
…pe of cell skipping operation we are doing. Remove the moment_mask option because it is not implemented yet.
…s terms. I had a vivid dream that I explained this feature to Ammar and he rejected it at first, but liked the way I explained it and wanted to see some tests, so I awoke to make a first pass at this and get the idea out. This is similar to force softening, except it's advective slowdown for everything. The orbit contours should be the same, just slower, so I think this will mitigate the issues we saw before of expander temperature increase due to the force softening. I think this is a really good way to determine the region we want to ignore: by setting the operation scale_fac = min(1.0, dt_omegaH/dt_CFL), we are always limited by omegaH. This should screen out the inaccessible regions of phase space automatically for the mirror problem. It can be controlled by setting omegaH_fac in the input file. I need to figure out how to do this while loop with array methods because it doesn't work for GPUs, but we cannot array-divide things, so it's a little complicated. We would also need some sort of array_min(a, array) operation, which I need to collaborate with others on.
…ed tests. I should clarify that my dream was explaining that instead of force softening, using the time dilation scheme (dilating the advective term) has a similar property of constraining the time step due to advective restrictions without allowing extra leakage. Instead of force softening, which allows extra stuff to get through the throat, we dilate the time in this region to just make evolution there slower. It preserves the energy contours of the problem while allowing a larger time step. Yes, I did give this detailed description to Ammar in my dream, including figures and diagrams.
- Added gkyl_array_min and gkyl_array_min_range functions for element-wise minimum operations.
- Implemented CUDA kernels for minimum operations on arrays.
- Updated tests to include new minimum functionality.
- Removed unused previous-step CFL rate logic from species dynamics.
…erations and dg_bin_ops. I made a special basis for this operation. Added the ability for users to specify the minimum dt, or for the code to minimize dt with the time dilation method based on omega_H. The user can always adjust the omega_H frequency at the input-file level by specifying something like cfl_frac for the omegaH mode.
…ate into binary operations; update collisionless app for CFL scaling
… think this means we need to combine this into one big method because we have to allocate everything at initialization. Or at least the initialization needs to initialize everything. Tested the poa regression test and it still works. On to simulations!
…te related calls in tests and applications. Add and verify GPU unit tests for this function
…tion and add GPU support tests
…ells that are not updated. This means the skip_cell mask should be a greater than mask, which means we should flip all the conditionals inside the equation updaters. To improve readability, I changed skip_cell to update_cell, so that the if statements read something like "if update cell is false, then skip". I think it's more readable now and follows the conventions in the code better
…do not make mistakes by accessing its private elements directly. We should use functions to get const pointers to these elements if necessary. I updated ctest_dg_array_mask to reflect this. The unit test was also doing some incorrect accessing of gkyl_array->data and size, which I had to fix. I condensed the unit test by using a use_gpu flag in the function call, so the file size went down by quite a bit. Furthermore, the calls to the mask were utilizing the internal range object, which was made private. Instead, I made an internal gkyl_dg_array_mask_eval_idx which takes an int idx, so that we have one method taking the linear long idx and another that takes the int idx
…headers also need to "acquire" the device pointer, which is now moved into the private header, so we need a public method to access this pointer. I also accidentally deleted a few indices inside the priv headers, which I brought back from main
…get any GPUs on stellar because I have too many things queued up. I'm struggling with the GPU implementation after moving the struct definition into the private header. I'm not sure where to put the GKYL_CU_DH static inline because I can't put it in the public header since it needs access to private elements
…ask' into skip-cell-general-mask
…evice or host is to have the GKYL_CU_DH flag inside the function in the private header, which we call a kernel. We have a host function in the .c file which calls this kernel. The GPU parts of the code can include this private header since those are private parts of the code and call the kernel directly
…the simulation, then uses a fraction of the maximum to mask cells. Another applies this maximum threshold in a spatially dependent manner, so inside each configuration space cell, a reduction is performed across the velocity range for the maximum C0, which is used for that configuration space cell. CPU unit tests work on my laptop, so I'm switching to another machine for GPU testing
…ask' into skip-cell-general-mask
…ask updater instead of the skip cell updater. It's cleaner now, plus I added an essential feature to the array mask to mask things that are some fraction of the maximum f value. I need to test this implementation more thoroughly, but it should be working now. This is just a scripting change in the app, so it should be fine. I updated the reset method to have a more robust decision process of releasing and re-initializing the array_mask so that it can change between the different thresholds and update the threshold
… fdot_scaling_enabled method for clarity. Testing, but test produces infinite timesteps
…timesteps because the default for update_cell was false, meaning no updates happened, and every cell skipped its updates. In GK, the default for a NONE mask should be true, but we should be able to set the default depending on the context. Non mask-setting regression tests behave normally.
…scaling_enabled function
… the input file boolean to time_dilation_has_spatial_dependence, which has a clearer name
Feature
Summary
Late on a Friday night, I had a research meeting with Ammar, Mana and Jimmy in my sleep. It was mostly with Ammar, but Jimmy and Mana were there. I explained to them the issues we had in G2 when we implemented force softening. In the 2023 paper, we published the results below.


The issue is that the force softening introduced higher temperatures in the expander. Expander heating is something we are incredibly concerned about right now because of the positivity hack, so this scheme is not ideal. I explained in my dream that the issue was that we are modifying the Hamiltonian, therefore modifying the orbit contours, and the orbit contours are essential for particle confinement. An appropriate scheme really should not modify the Hamiltonian.
Instead, I was reminded of the work in this article ("Time-Dilation Methods for Extreme Multiscale Timestepping Problems" https://arxiv.org/html/2510.09756). They essentially discuss slowing down time near the extreme regions of the problem, effectively what we have done for the orbit-average phase of the pseudo orbit averaging integrator. This method preserves the orbit contours, masks the CFL, and will speed up our simulations. Ammar was doubtful this would work and dismissed my idea; however, I was very confident that it would work, so he wanted to see some evidence. I awoke to program in the middle of the night, teeming with excitement and motivation.
I realize another issue is that we used an overly harsh mask in the 2023 paper. It is not necessary for the mask to be fully logical, going all the way from 1 to 0. Instead, the mask should optimally choose its strength based on the CFL frequency and rate. It should modify the problem the bare minimum needed to reach the optimal CFL rate. That led me to this modification of the GK Vlasov equation.
Here is a derivation of this feature. It follows a similar derivation to Werner (2018), "Speeding Up Simulations By Slowing Down Particles: Speed-Limited Particle-In-Cell Simulation". Let's start with a phase-space continuity equation of the form

$$\frac{\partial f}{\partial t} + \nabla_{\vec z} \cdot \left( \dot{\vec z}\, f \right) = C[f] + S.$$

To dilate time, consider a slowed-down distribution function

$$f(\vec x, \vec v, t) = \frac{g(\vec x, \vec v, t)}{\beta(\vec x, \vec v, t)},$$

where $\beta(\vec x, \vec v, t)$ is the amount by which time is dilated, $\beta \in (0,1]$. Simulations thus far in Gkeyll have been evolving $f(\vec x, \vec v, t)$, but we aim to evolve $g(\vec x, \vec v, t)$. Substituting $g/\beta$ into the time derivative of the phase-space continuity equation yields

$$\frac{\partial}{\partial t}\left( \frac{g}{\beta} \right) + \nabla_{\vec z} \cdot \left( \dot{\vec z}\, \frac{g}{\beta} \right) = C\!\left[\frac{g}{\beta}\right] + S.$$

Applying the product rule, we get

$$\frac{1}{\beta}\frac{\partial g}{\partial t} - \frac{g}{\beta^2}\frac{\partial \beta}{\partial t} + \nabla_{\vec z} \cdot \left( \dot{\vec z}\, \frac{g}{\beta} \right) = C\!\left[\frac{g}{\beta}\right] + S.$$

Multiplying by $\beta$, this can be expressed in terms of the time derivative of $g$:

$$\frac{\partial g}{\partial t} = \frac{g}{\beta}\frac{\partial \beta}{\partial t} - \beta\, \nabla_{\vec z} \cdot \left( \dot{\vec z}\, \frac{g}{\beta} \right) + \beta\, C\!\left[\frac{g}{\beta}\right] + \beta S.$$
This equation correctly slows down the advection, collisions, and source terms, which relaxes the time step. However, the resulting equation is not closed unless $\beta$ is specified independently, to which two options are presented.

One option is to approximate the time variation of $\beta$ as slow, setting $\partial_t\beta \approx 0$, or to treat $\beta$ as time-independent.

A second approach is to set $\beta$ as a state function, either of $g$ or of the CFL limits imposed by equation \eqref{eq: Full vlasov equation toy models}. This means equation \eqref{eq: slowed down continuity equation with dbeta dt} does not determine the evolution of $\beta$ in time, but instead uses $\partial_t \beta$ as a computed quantity, so an Euler step (or a higher-order RK method) backwards in time can be taken to compute $\partial_t \beta = \left( \beta_n - \beta_{n-1} \right) / \Delta t$. The time-derivative term involving $\partial_t \beta$ can be integrated exactly, avoiding additional timestep restrictions. $\beta$ is purely local in phase space and does not introduce a CFL constraint; the timestep remains controlled by the rescaled characteristic velocities.

The option which is implemented is to set

$$\beta = \min\!\left( 1, \frac{\Delta t_{\Omega_H}}{\Delta t_{\rm CFL}} \right).$$
This slows down the advective Poisson bracket by the appropriate ratio when $\Delta t_{\rm CFL} < \Delta t_{\Omega_H}$. This numerical approach to the mask is also more flexible, as neither the code nor the user needs to know about the mirror loss cone; it is therefore a more general method of time dilation in inaccessible regions of phase space.
Issue link: I will open a DR as well about this idea
Implementation Details
Key changes: To calculate $ \rm{min}( 1, \Delta t_{ \Omega_H } / \Delta t_{ \rm CFL}) $, some modifications are needed inside the array_ops and dg_bin_ops methods.
We need an inversion operation to invert a P0 array. This should use the dg_inv method, which is already written. However, there is no kernel for the P0 array. Division is tricky with DG arrays, so it should not be an array_op, even though this is a P0 array. I handwrote a P0 kernel for this method because the Maxima generators do not work for P0. It just does 1/f[0], which is a straightforward operation, and I really don't want to make the operation 1/f[0] more complex than necessary.
This is followed by a $\min()$ operation, so that also needed to be implemented. I decided on the location for this function inside array_ops for a few reasons. One, this is an operation on a single array. Second, it is a simple operation, not requiring a basis. Third, I really don't want to think about how this generalizes to a P1 array and implement a full kernel operation. I see the issue that this method is only intended for P0 arrays; however, I think that is the purpose of these array_ops methods. They are for single-array manipulation, independent of basis. I really don't want to make this method implementation more complex than necessary and do P1, P2 kernels and write Maxima scripts when the operation is just `fmin(coeff, array)` element-wise.
Dependencies: None
Automated testing: New unit tests are added to ctest_array_ops.c. A regression test is used to test the mask of the CFL.
Example Use
Community Standards
Code in a `layer/zero` should have a unit test, e.g., `core/zero`.
Testing: `make check` and unit tests all pass.
Additional Notes
I've been testing some simulations with this new feature. I will get some better results that make a more direct comparison. It speeds the FDP up by about 20x, which is huge. Preliminary results show no difference in the moments and a much better result than the skip_cell feature.
This figure compares two simulations, one with the time dilation feature and one without it. This is a non-uniform z simulation with 216 cells using the positivity hack. Each simulation executed 10 POA cycles, with a 20 μs FDP and 2 s OAP, ending around 20 s. The simulations compare nearly identically, except for minor deviations in Tpar in the mirror throat. The time dilation saves 20x during the FDP, and the simulation goes from 22 hours to around 5-6 hours. It's a big cost saving.

I also wanted to compare a high-resolution simulation with Nz=800 without the positivity hack. The simulations executed 10 POA cycles with a 20 μs FDP and 2 s OAP. The results agree very well, although they needed a higher $\mu$ resolution of 32 cells instead of 16.

Here is the percent error for each quantity above, showing less than 1% deviation for all quantities with a 2x speedup.
