@jmrodman
Collaborator

@jmrodman jmrodman commented Feb 8, 2025

Adds Fokker-Planck operator (FPO) collisions to the Vlasov solver. More details about the implementation are in the design review: here.

Summary of Changes

  1. Routines to compute the Rosenbluth potentials for a Maxwellian distribution via Gaussian quadrature
  2. Computation of the drag coefficient and diffusion tensor as derivatives of those potentials using recovery
  3. Calculation of the moments needed to build and solve the linear system that corrects the drag and diffusion coefficients, so that the FPO conserves momentum and energy
  4. Drag term update using hyper_dg_advance
  5. Diffusion term update using the new/updated hyper_dg_gen_stencil_advance

These changes interface with the Vlasov solver through apps/vm_species_fpo.c and struct vm_fpo_collisions in apps/gkyl_vlasov_priv.h. One major point of note: zero/hyper_dg.c and zero/hyper_dg_cu.cu were also modified. Those modifications touch only the function gkyl_hyper_dg_gen_stencil_advance and its CUDA counterpart, NOT gkyl_hyper_dg_advance.

Notes

These pieces are implemented for both CPUs and GPUs for:

  • 1x3v p=1 Hybrid
  • 1x3v p=2 Serendipity
  • 2x3v p=1 Hybrid

Higher dimensionality should also work; the kernels just get very large.

Kernels generated for:

  • Drag coefficient and sign information
  • Diffusion tensor and its surface expansion
  • Correction moment calculation
  • Conservation correction linear system matrix setting and accumulation
  • Drag term volume, surface, boundary surface
  • Diffusion term volume, surface, boundary surface (for all 9 velocity combinations in all 9 regions of the domain)
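
To make the "9 velocity combinations in 9 regions" count concrete, here is a minimal sketch of one way to index them. It assumes each of the two velocity directions in a derivative pair is classified as lower edge, interior, or upper edge (3 x 3 = 9 regions), and that the pairs (v_i, v_j) with i, j in {x, y, z} are flattened row-major (3 x 3 = 9 combinations). The function names and the index ordering are hypothetical and may not match the ordering the kernels in this PR actually use.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the Gkeyll API. */
enum edge_loc { EDGE_LOWER = 0, EDGE_INTERIOR = 1, EDGE_UPPER = 2 };

/* Classify a cell index along one velocity direction. */
int edge_location(int idx, int lower, int upper)
{
  assert(lower < upper && idx >= lower && idx <= upper);
  if (idx == lower) return EDGE_LOWER;
  if (idx == upper) return EDGE_UPPER;
  return EDGE_INTERIOR;
}

/* Region index in [0, 9) for the plane spanned by two velocity directions:
   0 is the lower-lower corner, 4 the interior, 8 the upper-upper corner. */
int stencil_region(int idx1, int idx2,
                   int lower1, int upper1, int lower2, int upper2)
{
  return 3*edge_location(idx1, lower1, upper1)
    + edge_location(idx2, lower2, upper2);
}

/* Flattened index in [0, 9) for the second-derivative pair (dir1, dir2),
   with directions numbered 0 = vx, 1 = vy, 2 = vz. */
int deriv_pair_index(int dir1, int dir2)
{
  assert(dir1 >= 0 && dir1 < 3 && dir2 >= 0 && dir2 < 3);
  return 3*dir1 + dir2;
}
```

A kernel table built this way would be addressed as kernels[deriv_pair_index(d1, d2)][stencil_region(...)], giving 81 entries per basis choice, which is consistent with the kernel count described above.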

New unit tests:

  • unit/ctest_fpo_vlasov_coeff.c
  • unit/ctest_fpo_vlasov_coeff_correct.c

New regression tests:

  • regression/rt_fpo_vlasov_relax_1x3v_p1.c
  • regression/rt_fpo_vlasov_relax_1x3v_p2.c

Checklist

  • CPU/GPU compile without issue
  • New unit tests pass
  • New regression tests run and reproduce expected result

jmrodman and others added 30 commits May 4, 2023 14:56
…on coefficient for FPO update from arbitrary potentials in 1x3v to 3x3v.
…rojects H and G onto phase basis and H, G, dH/dv, dG/dv, and d2G/dv2 onto phase surface basis at each velocity space boundary. Currently no cross derivatives in second derivative of G.
…s work as expected, but we'd like to switch to a nodal expansion for continuity across cell boundaries in surface projections
…ity at cell boundaries for surface expansions should make our lives easier when calculating diffusion tensor cross terms at domain boundaries.
…for accuracy for H, dH/dv, G, d2G/dv2, and eval_on_nodes routine for cell boundary continuity for dG/dv. Makes the recovery process a lot simpler for the diffusion coefficient.
… as expected. Updated to use Mana's domain stencil decomposition so domain surface and volume routines are all handled the same way. Need to add a unit test for these calculations
…pdated gen_stencil_advance in hyper_dg.c to match domain stencil method used for diffusion coefficient, etc.
…ing as expected. Using hyper_dg_advance for drag term and hyper_dg_gen_stencil for diffusion term. Removed the gamma factor from the potential projection so we aren't accounting for it twice. Calculating primitive moments as part of the FPO update to feed into potential calculation; that infrastructure is pulled straight from the LBO.
…er than the LBO infrastructure that includes the conservation corrections for LBO. Had to generate p2 kernels.
…ter rather than the LBO infrastructure that includes the conservation corrections for LBO. Had to generate p2 kernels."

This reverts commit d6250a1.
…rvative scheme. Might have broken the FPO regression tests in the merge. Need to check, but committing these regardless as a starting point
…we might be comparing junk offsets and we get valgrind errors.
… part of various names since we will always use the "primitive" moments for the potential calculation. Using new LTE moments object to compute n, u_drift, and T/m for use in the maxwellian potentials object. Standardizing the regression tests to just have a bunch of ~unity factors for now. Will hopefully change back as we figure things out.
…g. At first it seemed like a resolution thing with the robustness of the FPO but now I think it's that there's some factors missing in the potential calculation or something because I don't think the FPO works when vt =/= 1.0
… need to make the maxwellian potential projection and drag and diffusion coefficient calculation GPU-ified, but want to start this now to start debugging build issues which may occur from finally trying to port hyper_dg_gen_stencil to GPUs (and also the way we set the kernels for the FPO diffusion)
JunoRavin and others added 8 commits February 12, 2025 09:51
…he FPO branch... the conflicts were related to a slight refactoring of hyper_dg that moved around where the cu updaters are defined and how they're utilized (directly calling the cu methods if use_gpu is true, instead of having to include if statements in updater methods). Copied this syntax for the gen_stencil method. Also updated the vlasov.c write method for the FPO potentials to utilize metadata.
…and deleting old FPO regression tests. Also fixing unit test compilation.
…here the Nvidia compiler is just straight up giving up on the size of the Gkeyll system at this point. Basically we get errors that nvcc can no longer handle the optimizations it does to create temporary variables and put things in registers.

This development is unfortunate, as it has nothing to do with FPO or main per se. Both independently compile with GPUs, but their combination does not. Further, @jmrodman and I aggressively refactored parts of the FPO over the last 72 hours to make the compiler have an easier time and now the FPO is not even the worst culprit in the Gkeyll infrastructure for long lines and non-trivial compiler optimizations. Unfortunately the error persists.

As a truly desperate measure to try to solve this issue, I have gone through the code base looking for places where we have historically done things the GPU compiler struggles to optimize, such as other kernels with long lines that the GPU compiler was previously fine with. The hope is that the problem is not the FPO but, again, the overall size of Gkeyll at this point, so the aim is to make the whole Gkeyll system easier to compile.

1. Getting rid of the remaining 3x2v_p2 gyrokinetic kernels that are not being utilized.
2. Getting rid of the vlasov_gen_geo kernels, which are being superseded by canonical PB.
3. Redoing the Vlasov and SR Vlasov volume kernels to eliminate some temporary variables and perform the volume updates *sequentially*, so that in 2x3v and 3x3v we do not try to do one final super update, but a sequence of updates (which breaks up lines that were >20k long into three 7k-long lines).
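
A minimal sketch of the idea in item 3, assuming the per-direction volume increments are already available as flat arrays (in the generated kernels they are long inline expressions instead). The function names and array layout are hypothetical. The fused form sums every direction in a single expression per basis coefficient; the sequential form accumulates one direction at a time, so each statement stays short.

```c
#include <assert.h>

/* Hypothetical layout: incr[d*nbasis + k] is the increment from direction d
   to basis coefficient k. Not the Gkeyll kernel signature. */

/* One "super update": all three directions summed in a single expression. */
void vol_update_fused(const double *incr, int nbasis, double *out)
{
  for (int k = 0; k < nbasis; ++k)
    out[k] += incr[k] + incr[nbasis + k] + incr[2*nbasis + k];
}

/* Sequence of updates: one short accumulation per direction. */
void vol_update_sequential(const double *incr, int ndir, int nbasis, double *out)
{
  assert(ndir >= 1 && nbasis >= 1);
  for (int d = 0; d < ndir; ++d)
    for (int k = 0; k < nbasis; ++k)
      out[k] += incr[d*nbasis + k];
}
```

Both forms compute the same sum up to floating-point rounding order; the sequential one simply hands the compiler several smaller expressions instead of one very large one.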

May God have mercy on me and grant me salvation from my sins.
@JunoRavin
Collaborator

Posting a picture of the GPU bug here for our records
[Image: Screenshot 2025-02-14 at 6 03 58 PM]

JunoRavin and others added 6 commits April 10, 2025 09:57
…t thought things were moved around and a couple of bug fixes that were attempted in here as part of "solving the kernel size" issue. But it compiles and runs. So let's try GPUs
…onfiguration space for the FPO (so that we can switch the FPO to be a per-quadrature-point in configuration space update to make it a generic 3V update with the potential for Rosenbluth solves), we delete the p=2 FPO kernels, retaining the Serendipity p=1 kernels in 1x3v and 2x3v (that are still p=2 in velocity space with the hybrid basis).

4 participants