-
Notifications
You must be signed in to change notification settings - Fork 82
CI Integration Example SpeedyWeather.jl #2784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
CI Integration Example SpeedyWeather.jl #2784
Conversation
|
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.diff --git a/test/integration/SpeedyWeather/runtests.jl b/test/integration/SpeedyWeather/runtests.jl
index ea3e354c..c606e9fd 100644
--- a/test/integration/SpeedyWeather/runtests.jl
+++ b/test/integration/SpeedyWeather/runtests.jl
@@ -6,7 +6,7 @@
using SpeedyWeather, Enzyme, Test
spectral_grid = SpectralGrid(trunc = 32, nlayers = 8) # define resolution
-model = PrimitiveWetModel(; spectral_grid, physics=false) # construct model
+model = PrimitiveWetModel(; spectral_grid, physics = false) # construct model
# physics = false to accelate the test
simulation = initialize!(model)
initialize!(simulation) |
|
Thanks Max! |
|
@giordano can we make the filtering more aggressive and only run CI on the integration test being changed? |
|
We could generate the matrix dynamically, but only if |
|
BTW, this is missing adding SpeedyWeather to Enzyme.jl/.github/workflows/Integration.yml Lines 47 to 56 in ab96c82
|
|
Oh, yes. I just added it. |
|
Besides the fact |
|
Yeah, in our own CI a very similar test takes about 35 minutes. I hoped it's a bit faster now after recent Enzyme patches, but it doesn't seem to be the case. If 20 min is the limit (didn't know that), then I can turn off some of the parametrizations and see if that's sufficient. I am out of time to really experiment with that for today though. |
|
Timeout is 45 minutes
|
|
I mean the gradient compile time is what it is currently 🤷 |
|
I updated the example to mirror more closely the one we already use in your CI in Speedy. It should be faster now, and there it always finishes < 40 min. However, it also seems that Enzymev0.13.105 broke something for us, as we are getting fails from our CI tests on this since Friday. So this example here will also currently fail. |
|
well thats certainly a reason for getting the integration tests in! |
|
that said the erring part is a bit too complex/baked in, would you be able to construct a MWE of? |
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
b0f98ad to
8eec083
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2784 +/- ##
==========================================
- Coverage 67.79% 67.78% -0.01%
==========================================
Files 58 58
Lines 20723 20726 +3
==========================================
Hits 14050 14050
- Misses 6673 6676 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
okay I think I fixed the custom rule issue immediately here (and a potential PR for more information on the 1.11: EnzymeAD/Enzyme#2582). However, on a debug julia I pretty much always hit GC errors. @vchuravy I think we really ought debug and fix that MWE speedweather GC error issue before we can land this |
|
Thanks for your effort on this already. I already had a debugging session today with this, but it's quite nasty to boil this down to a proper MWE. I know which intermediate function from our code causes the error, but each individual function it calls is fine. This is the function for our simplest model, and here's an example call with Enzyme, but in the next days I'll try to condense it to a proper MWE, it's just not that easy to boil this down so quickly. |
|
I continue to struggle a bit to come up with a MWE that doesn't use our data structures. Below is the simplest function in our code that produces the error. Each of the individual functions I can differentiate just fine, it's the interplay between them that seemingly causes an error here. using Enzyme, SpeedyWeather
function transform_debug!(
diagn::DiagnosticVariables,
progn::PrognosticVariables,
lf::Integer,
model::Barotropic;
kwargs...
)
# retrieve data
(; vor_grid, u_grid, v_grid ) = diagn.grid
(; scratch_memory) = diagn.dynamics
vor = get_step(progn.vor, lf) # relative vorticity at leapfrog step lf
U = diagn.dynamics.a # reuse work arrays for velocities in spectral
V = diagn.dynamics.b # reuse work arrays for velocities in spectral
# U = u*coslat, V=v*coslat
S = model.spectral_transform
# COMPUTE FUNCTIONS, WHEN At least 3 are active -> GC error
# that's a spherical harmonics transforms (KA kernel + FFT)
transform!(vor_grid, vor, scratch_memory, S) # get vorticity on grid from spectral vor
# that's a spatial derivative computed in spectral space (KA kernel)
SpeedyWeather.UV_from_vor!(U, V, vor, S)
# that's a spherical harmonics transforms and some rescaling (KA kernel + FFT + KA kernel)
transform!(u_grid, U, scratch_memory, S, unscale_coslat=true)
#transform!(v_grid, V, scratch_memory, S, unscale_coslat=true)
return nothing
end
spectral_grid = SpectralGrid(trunc=9, nlayers=1)
model = BarotropicModel(; spectral_grid)
simulation = initialize!(model)
progn, diagn, model = SpeedyWeather.unpack(simulation)
lf2 = 2
dprogn = make_zero(progn)
ddiagn = make_zero(diagn)
autodiff(Reverse, transform_debug!, Const, Duplicated(diagn, ddiagn), Duplicated(progn, dprogn), Const(lf2), Const(model))which causes and was fine before with Enzyme <0.13.105 |
|
Is #2818 related to this here? Just let me know if I should invest a bit more time for finding a really proper MWE, or if that's already it. |
|
No that's unrelated and an issue from an in progress PR which doesn't occur on main atm |
|
The whole issue just confuses me. Now, I found that just leaving away the So in the following script, using some of Speedy's numerics, the only difference between the two functions is the inclusion of a Julia 1.10, Enzyme v0.13.108, SpeedyWeather using Enzyme, SpeedyWeather
# this works
function transform_debug!(
diagn::DiagnosticVariables,
progn::PrognosticVariables,
lf::Integer,
model::Barotropic
)
# retrieve data
(; vor_grid, u_grid, v_grid ) = diagn.grid
(; scratch_memory) = diagn.dynamics
vor = get_step(progn.vor, lf) # relative vorticity at leapfrog step lf
U = diagn.dynamics.a # reuse work arrays for velocities in spectral
S = model.spectral_transform
# that's a spherical harmonics transforms (KA kernel + FFT)
transform!(vor_grid, vor, scratch_memory, S) # get vorticity on grid from spectral vor
# that's a spherical harmonics transforms and some rescaling (KA kernel + FFT + KA kernel)
transform!(u_grid, U, scratch_memory, S)
return nothing
end
function transform_debug_with_kwargs!(
diagn::DiagnosticVariables,
progn::PrognosticVariables,
lf::Integer,
model::Barotropic;
kwargs...
)
# retrieve data
(; vor_grid, u_grid, v_grid ) = diagn.grid
(; scratch_memory) = diagn.dynamics
vor = get_step(progn.vor, lf) # relative vorticity at leapfrog step lf
U = diagn.dynamics.a # reuse work arrays for velocities in spectral
S = model.spectral_transform
# that's a spherical harmonics transforms (KA kernel + FFT)
transform!(vor_grid, vor, scratch_memory, S) # get vorticity on grid from spectral vor
# that's a spherical harmonics transforms and some rescaling (KA kernel + FFT + KA kernel)
transform!(u_grid, U, scratch_memory, S)
return nothing
end
spectral_grid = SpectralGrid(trunc=9, nlayers=1)
model = BarotropicModel(; spectral_grid)
simulation = initialize!(model)
progn, diagn, model = SpeedyWeather.unpack(simulation)
lf2 = 2
dprogn = make_zero(progn)
ddiagn = make_zero(diagn)
# this works
autodiff(Reverse, transform_debug!, Const, Duplicated(diagn, ddiagn), Duplicated(progn, dprogn), Const(lf2), Const(model))
# this gives a SegFault
autodiff(Reverse, transform_debug_with_kwargs!, Const, Duplicated(diagn, ddiagn), Duplicated(progn, dprogn), Const(lf2), Const(model))
|
@vchuravy suggested I add an example using SpeedyWeather.jl to the integration tests.
The test is pretty much the same thing we do in the paper. It's a sensitivity analysis of a single grid point and it checks that this runs without errors and the gradient makes physical sense, so the gradient is localised around the selected grid point and small far away from it.