@maximilian-gelbrecht

@vchuravy suggested I add an example using SpeedyWeather.jl to the integration tests.

The test is pretty much the same thing we do in the paper. It's a sensitivity analysis of a single grid point: it checks that the differentiation runs without errors and that the gradient makes physical sense, i.e. that it is localised around the selected grid point and small far away from it.
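
A toy sketch of this kind of localisation check, not the SpeedyWeather test itself: the diffusion step `diffuse!`, the grid size and the thresholds below are all hypothetical stand-ins chosen for illustration. It seeds the shadow of the state with a one-hot vector at one grid point and checks that the resulting gradient peaks there and vanishes far away.

```julia
using Enzyme

# Hypothetical stand-in for a model time step: explicit diffusion on a
# periodic 1D grid, updating u in place (tmp is a work array).
function diffuse!(u, tmp, nsteps)
    n = length(u)
    for _ in 1:nsteps
        for i in 1:n
            left = u[mod1(i - 1, n)]
            right = u[mod1(i + 1, n)]
            tmp[i] = u[i] + 0.2 * (left - 2 * u[i] + right)
        end
        for i in 1:n
            u[i] = tmp[i]
        end
    end
    return nothing
end

n, nsteps, i0 = 64, 5, 32
u = [sin(2pi * i / n) for i in 1:n]   # arbitrary initial state
tmp = zeros(n)

# Seed the shadow of u with a one-hot vector at the selected grid point.
# After the reverse pass, du holds d u_final[i0] / d u_initial.
du = zeros(n); du[i0] = 1.0
dtmp = zeros(n)

autodiff(Reverse, diffuse!, Const, Duplicated(u, du), Duplicated(tmp, dtmp), Const(nsteps))

far = mod1(i0 + n ÷ 2, n)             # grid point farthest from i0
@assert argmax(abs.(du)) == i0        # sensitivity peaks at the selected point
@assert abs(du[far]) < 1e-12          # and vanishes far away from it
```

With a three-point stencil and five steps, information can travel at most five grid points, so the gradient at the far point is exactly zero here; a real model would need a tolerance instead.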

@github-actions
Contributor

github-actions bot commented Nov 14, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Click here to view the suggested changes.
diff --git a/test/integration/SpeedyWeather/runtests.jl b/test/integration/SpeedyWeather/runtests.jl
index ea3e354c..c606e9fd 100644
--- a/test/integration/SpeedyWeather/runtests.jl
+++ b/test/integration/SpeedyWeather/runtests.jl
@@ -6,7 +6,7 @@
 using SpeedyWeather, Enzyme, Test
 
 spectral_grid = SpectralGrid(trunc = 32, nlayers = 8)             # define resolution
-model = PrimitiveWetModel(; spectral_grid, physics=false)         # construct model
+model = PrimitiveWetModel(; spectral_grid, physics = false)         # construct model
 # physics = false to accelate the test
 simulation = initialize!(model)
 initialize!(simulation)

@vchuravy
Member

Thanks Max!

@vchuravy
Member

@giordano can we make the filtering more aggressive and only run CI on the integration test being changed?

@giordano
Member

We could generate the matrix dynamically, but only if nothing outside test/integration/ has been modified; otherwise I think we want to run all the tests, right?
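
A minimal sketch of that selection logic, assuming the changed file paths are already available as a list of strings (for example from a git diff step). The helper name `select_packages` and the package subset are hypothetical, and this is not how the workflow is actually implemented.

```julia
# Hypothetical helper: choose which integration test packages to run,
# given the paths of the files changed in a PR.
function select_packages(changed_files, all_packages)
    selected = String[]
    for file in changed_files
        parts = split(file, '/')
        # expected layout: test/integration/<Package>/<file...>
        in_integration = length(parts) >= 4 &&
            parts[1] == "test" && parts[2] == "integration"
        # a change anywhere else (or to an unknown package) means: run everything
        (in_integration && parts[3] in all_packages) || return collect(all_packages)
        pkg = String(parts[3])
        pkg in selected || push!(selected, pkg)
    end
    # no changed files at all: fall back to running everything
    return isempty(selected) ? collect(all_packages) : selected
end

all_packages = ["Lux", "MPI", "SpeedyWeather"]   # illustrative subset only

only_speedy = select_packages(["test/integration/SpeedyWeather/runtests.jl"], all_packages)
everything = select_packages(["src/Enzyme.jl", "test/integration/Lux/runtests.jl"], all_packages)
```

Here `only_speedy` contains just SpeedyWeather, while `everything` falls back to the full list because a file outside test/integration/ was touched.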

@giordano
Member

BTW, this PR doesn't yet add SpeedyWeather to the package list:

package:
- Bijectors
- DifferentiationInterface
- Distributions
- DynamicExpressions
- Lux
- SciML
- KernelAbstractions
- Molly
- MPI

@maximilian-gelbrecht
Author

Oh, yes. I just added it.

@giordano
Member

Besides the fact that Test isn't loaded, the jobs are timing out. Is there any way to test something lighter? All other jobs finish in less than 20 minutes (most of them are a lot faster).

@maximilian-gelbrecht
Author

maximilian-gelbrecht commented Nov 14, 2025

Yeah, in our own CI a very similar test takes about 35 minutes. I had hoped it would be a bit faster now after the recent Enzyme patches, but that doesn't seem to be the case.

If 20 min is the limit (I didn't know that), then I can turn off some of the parametrizations and see if that's sufficient. I am out of time to really experiment with that today, though.

@giordano
Member

The timeout is 45 minutes:

timeout-minutes: 45

but all other jobs take less than half of that.

@maximilian-gelbrecht
Author

I mean the gradient compile time is what it is currently 🤷
On my laptop and my HPC it's a bit faster, but in the GitHub Actions CI it's really quite slow.

@maximilian-gelbrecht
Author

I updated the example to mirror more closely the one we already use in our CI in Speedy. It should be faster now, and there it always finishes in under 40 min.

However, it also seems that Enzyme v0.13.105 broke something for us, as we have been seeing failures in our CI tests on this since Friday. So this example will currently fail as well.

@wsmoses
Member

wsmoses commented Nov 25, 2025

Well, that's certainly a reason for getting the integration tests in!

@wsmoses
Member

wsmoses commented Nov 25, 2025

That said, the erroring part is a bit too complex/baked in. Would you be able to construct an MWE of it?

@wsmoses force-pushed the mg/speedy-ci-example branch from b0f98ad to 8eec083 on November 25, 2025 06:14
@codecov

codecov bot commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.78%. Comparing base (30e0519) to head (0b438af).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
src/rules/customrules.jl 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2784      +/-   ##
==========================================
- Coverage   67.79%   67.78%   -0.01%     
==========================================
  Files          58       58              
  Lines       20723    20726       +3     
==========================================
  Hits        14050    14050              
- Misses       6673     6676       +3     

@wsmoses
Member

wsmoses commented Nov 25, 2025

Okay, I think I fixed the custom rule issue directly here (and there is a potential PR with more information on 1.11: EnzymeAD/Enzyme#2582).

However, on a debug Julia I pretty much always hit GC errors. @vchuravy I think we really ought to debug and fix that MWE SpeedyWeather GC error before we can land this.

@maximilian-gelbrecht
Author

Thanks for your effort on this already. I had a debugging session on this today, but it's quite nasty to boil it down to a proper MWE. I know which intermediate function in our code causes the error, but each individual function it calls is fine. This is the function for our simplest model, and here's an example call with Enzyme. In the next few days I'll try to condense it into a proper MWE; it's just not easy to do quickly.

@maximilian-gelbrecht
Author

maximilian-gelbrecht commented Nov 26, 2025

I continue to struggle a bit to come up with an MWE that doesn't use our data structures.

Below is the simplest function in our code that produces the error. I can differentiate each of the individual functions just fine; it's the interplay between them that seemingly causes the error here.

using Enzyme, SpeedyWeather

function transform_debug!(
    diagn::DiagnosticVariables,   
    progn::PrognosticVariables,
    lf::Integer,
    model::Barotropic;
    kwargs...
)    
    # retrieve data
    (; vor_grid, u_grid, v_grid ) = diagn.grid
    (; scratch_memory) = diagn.dynamics  

    vor = get_step(progn.vor, lf)   # relative vorticity at leapfrog step lf
    U = diagn.dynamics.a            # reuse work arrays for velocities in spectral
    V = diagn.dynamics.b            # reuse work arrays for velocities in spectral
                                    # U = u*coslat, V=v*coslat
    S = model.spectral_transform

    # compute functions: when at least 3 of these calls are active -> GC error
    # that's a spherical harmonics transform (KA kernel + FFT)
    transform!(vor_grid, vor, scratch_memory, S)    # get vorticity on grid from spectral vor
    
    # that's a spatial derivative computed in spectral space (KA kernel) 
    SpeedyWeather.UV_from_vor!(U, V, vor, S)
    
    # that's a spherical harmonics transform and some rescaling (KA kernel + FFT + KA kernel)
    transform!(u_grid, U, scratch_memory, S, unscale_coslat=true)
    #transform!(v_grid, V, scratch_memory, S, unscale_coslat=true)

    return nothing
end

spectral_grid = SpectralGrid(trunc=9, nlayers=1)
model = BarotropicModel(; spectral_grid)
simulation = initialize!(model)
progn, diagn, model = SpeedyWeather.unpack(simulation)

lf2 = 2 

dprogn = make_zero(progn)
ddiagn = make_zero(diagn)

autodiff(Reverse, transform_debug!, Const, Duplicated(diagn, ddiagn), Duplicated(progn, dprogn), Const(lf2), Const(model))

which causes

[29800] signal (11.2): Segmentation fault: 11
in expression starting at REPL[12]:1
_ZN4llvm5Value11setNameImplERKNS_5TwineE at /Users/max/.julia/juliaup/julia-1.10.10+0.aarch64.apple.darwin14/lib/julia/libLLVM.dylib (unknown line)
Allocations: 66874844 (Pool: 66783213; Big: 91631); GC: 92
Segmentation fault: 11

This was fine before, with Enzyme < 0.13.105.

@maximilian-gelbrecht
Author

Is #2818 related to this? Just let me know whether I should invest a bit more time in finding a really proper MWE, or whether this one is already enough.

@wsmoses
Member

wsmoses commented Nov 29, 2025

No, that's unrelated. It's an issue from an in-progress PR and doesn't occur on main at the moment.

@maximilian-gelbrecht
Author

maximilian-gelbrecht commented Dec 5, 2025

The whole issue just confuses me.

Now I've found that just leaving out the kwargs... in the function signature makes it work. But of course I've used Enzyme with functions that take kwargs many times, and the individual functions are all fine to differentiate as well.

So in the following script, which uses some of Speedy's numerics, the only difference between the two functions is the kwargs... in the function signature, which isn't actually used at all. The first autodiff works; the second one gives a segfault (but only since Enzyme 0.13.105). Note that we use custom EnzymeRules rules that we defined there for one half of transform!

Julia 1.10, Enzyme v0.13.108, SpeedyWeather main

using Enzyme, SpeedyWeather
 
# this works
function transform_debug!(
    diagn::DiagnosticVariables,   
    progn::PrognosticVariables,
    lf::Integer,
    model::Barotropic
)    
    # retrieve data
    (; vor_grid, u_grid, v_grid ) = diagn.grid
    (; scratch_memory) = diagn.dynamics  

    vor = get_step(progn.vor, lf)   # relative vorticity at leapfrog step lf
    U = diagn.dynamics.a            # reuse work arrays for velocities in spectral
    S = model.spectral_transform

    # that's a spherical harmonics transform (KA kernel + FFT)
    transform!(vor_grid, vor, scratch_memory, S)    # get vorticity on grid from spectral vor
    
    # that's a spherical harmonics transform and some rescaling (KA kernel + FFT + KA kernel)
    transform!(u_grid, U, scratch_memory, S)

    return nothing
    
end

function transform_debug_with_kwargs!(
    diagn::DiagnosticVariables,   
    progn::PrognosticVariables,
    lf::Integer,
    model::Barotropic;
    kwargs...
)    
    # retrieve data
    (; vor_grid, u_grid, v_grid ) = diagn.grid
    (; scratch_memory) = diagn.dynamics  

    vor = get_step(progn.vor, lf)   # relative vorticity at leapfrog step lf
    U = diagn.dynamics.a            # reuse work arrays for velocities in spectral
    S = model.spectral_transform

    # that's a spherical harmonics transform (KA kernel + FFT)
    transform!(vor_grid, vor, scratch_memory, S)    # get vorticity on grid from spectral vor
    
    # that's a spherical harmonics transform and some rescaling (KA kernel + FFT + KA kernel)
    transform!(u_grid, U, scratch_memory, S)

    return nothing
    
end

spectral_grid = SpectralGrid(trunc=9, nlayers=1)
model = BarotropicModel(; spectral_grid)
simulation = initialize!(model)
progn, diagn, model = SpeedyWeather.unpack(simulation)

lf2 = 2 

dprogn = make_zero(progn)
ddiagn = make_zero(diagn)

# this works 
autodiff(Reverse, transform_debug!, Const, Duplicated(diagn, ddiagn), Duplicated(progn, dprogn), Const(lf2), Const(model))

# this gives a SegFault 
autodiff(Reverse, transform_debug_with_kwargs!, Const, Duplicated(diagn, ddiagn), Duplicated(progn, dprogn), Const(lf2), Const(model))
