
Conversation


@hersle hersle commented Nov 29, 2025

I noticed two things when profiling with sparse+analytical Jacobians generated by MTK in my package:

using SymBoltz
M = ΛCDM(K = nothing) # creates a System with 219 equations
pars = parameters_Planck18(M)
ks = 1.0:1.0:100.0

@profview prob = CosmologyProblem(M, pars; jac = true, sparse = true) # creates ODEProblem with sparse 54x54 Jacobian
@profview sol = solve(prob, ks; thread = false) # solves 100x ODEProblems with sparse Jacobians
  1. Almost all time in ODEProblem(...; jac = true, sparse = true) is spent inside this iszero check:
     [screenshot: flamegraph dominated by the iszero check]
  2. The solve is slowed by this sparsity-pattern check every time the analytically generated Jacobian function is called:
     [screenshot: flamegraph dominated by the sparsity-pattern check]

I suggest two changes:

  1. Circumvent the iszero check altogether. It is used when the nonzeros of W are stored explicitly in J, but the same effect can be achieved by recreating the sparse Jacobian in (I, J, V) form with explicit zeros added at all of W's nonzero entries.
  2. Make it possible to disable the sparsity-pattern check at runtime. My first suggestion is to perform this check only if checkbounds = true. I am sure there are also other ways to solve this.
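Suggestion 1 can be sketched in a few lines (an illustrative, self-contained example with made-up 3×3 patterns, not the actual MTK code):

```julia
using SparseArrays

# J: analytical Jacobian pattern; W: pattern whose nonzeros must be storable
# in J (e.g. the W-matrix of an implicit solver). W has an extra entry at (1, 3).
J = sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0], 3, 3)
W = sparse([1, 2, 3, 1], [1, 2, 3, 3], ones(4), 3, 3)

Iw, Jw, _ = findnz(W)
Ij, Jj, Vj = findnz(J)
# sparse() combines duplicate (i, j) pairs by summing and retains explicitly
# passed zeros, so this preserves J's values and inserts stored zeros at every
# position where only W has an entry.
J2 = sparse(vcat(Ij, Iw), vcat(Jj, Jw), vcat(Vj, zeros(length(Iw))), 3, 3)

nnz(J2)   # 4: the original 3 diagonal entries plus a stored zero at (1, 3)
J2[1, 3]  # 0.0, but now structurally present, so W fits in J2 without iszero checks
```

Because every nonzero position of W is now a stored entry of the rebuilt Jacobian, the per-entry iszero test becomes unnecessary.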

With these changes, the respective sections of the flamegraph disappear and the timings improve:

@time prob = CosmologyProblem(M, pars; jac = true, sparse = true)
@time sol = solve(prob, ks; thread = false)
 31.711084 seconds (262.74 M allocations: 8.367 GiB, 1.73% gc time, 0.78% compilation time: <1% of which was recompilation)
  0.594824 seconds (1.32 M allocations: 346.698 MiB, 34.96% gc time)

to

  4.061829 seconds (40.72 M allocations: 1.243 GiB, 2.49% gc time, 0.10% compilation time: <1% of which was recompilation)
  0.396266 seconds (1.10 M allocations: 154.179 MiB, 7.32% gc time)

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
  • Any new documentation only uses public API


hersle commented Nov 29, 2025

Alternatively, if you think the bounds check should be preserved for safety, I suggest changing it to check only the number of nonzero entries, and not the correctness of every index. Something like length(out.nzval) == length(jac_precomputed.nzval).
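As a sketch (variable names assumed from the comment above), the cheaper check would compare only the count of stored entries on each Jacobian call:

```julia
using SparseArrays

jac_precomputed = sparse([1, 2], [1, 2], [1.0, 1.0])
out = similar(jac_precomputed)  # similar() preserves the sparsity structure
out_bad = sparse([1, 2, 2], [1, 1, 2], zeros(3))  # 3 stored entries, not 2

# O(1) check: only the number of stored entries, not every row/column index.
cheap_check(out, ref) = length(out.nzval) == length(ref.nzval)

cheap_check(out, jac_precomputed)      # true: same number of stored entries
cheap_check(out_bad, jac_precomputed)  # false: entry counts differ
```

The trade-off is that two patterns with the same number of entries at different positions would pass this check, which the full index comparison would catch.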

@hersle hersle force-pushed the sparse_performance branch from fe2296b to e14cc64 on November 29, 2025 19:18

@AayushSabharwal AayushSabharwal left a comment


Thanks for the contribution! I think this definitely makes sense. I have just a couple of minor suggestions below. Could you also add this case to the benchmark script? We can then notice future regressions.


hersle commented Dec 1, 2025

Good idea, I will try to make an example that doesn't need any external dependencies and add it to the benchmarks.


hersle commented Dec 1, 2025

I set checkbounds = false by default now. I also made sure that it is still forwarded to build_function_wrapper, like it was before this PR. The default in ODEProblem, which is propagated to generate_*, is also false.

I also added an example to the benchmarks with a "semi-complicated" sparsity pattern. It indeed hits problems 1 and 2 described in the OP (before this PR):

julia> prob.f.jac_prototype
25×25 SparseArrays.SparseMatrixCSC{Float64, Int64} with 225 stored entries:
⎡⠕⣥⢊⠔⡡⢊⠔⡡⢊⠔⡡⢊⠄⎤
⎢⢊⠔⡱⢎⠔⡡⢊⠔⡡⢊⠔⡡⠂⎥
⎢⡡⢊⠔⡡⢛⢔⡡⢊⠔⡡⢊⠔⡁⎥
⎢⠔⡡⢊⠔⡡⢊⠕⣥⢊⠔⡡⢊⠄⎥
⎢⢊⠔⡡⢊⠔⡡⢊⠔⡱⢎⠔⡡⠂⎥
⎢⡡⢊⠔⡡⢊⠔⡡⢊⠔⡡⢛⢔⡁⎥
⎣⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠁⎦

julia> @profview prob = ODEProblem(model, u0, tspan; jac, sparse)

julia> @profview for i in 1:100000 (prob.f.jac.f_iip)(out, prob.u0, prob.p, first(tspan)) end
[screenshots: flamegraphs of the ODEProblem construction and the in-place Jacobian calls]

So the timings should improve with this PR.


hersle commented Dec 1, 2025

I think these are the benchmark results, although they failed to be posted here automatically. Copy-paste:

println(replace(copypasta, "\\n" => "\n"))

Benchmark Results (Julia vlts)

Time benchmarks

| benchmark | master | 9ea5906... | master / 9ea5906... |
|---|---|---|---|
| ODEProblem | 0.0901 ± 0.007 s | 0.0921 ± 0.0076 s | 0.978 ± 0.11 |
| init | 0.0581 ± 0.0018 ms | 0.0582 ± 0.0017 ms | 0.999 ± 0.042 |
| large_parameter_init/ODEProblem | 1.05 ± 0.018 s | 1.02 ± 0.011 s | 1.03 ± 0.021 |
| large_parameter_init/init | 0.108 ± 0.0025 ms | 0.109 ± 0.0025 ms | 0.992 ± 0.032 |
| mtkcompile | 0.0326 ± 0.0013 s | 0.0335 ± 0.0014 s | 0.975 ± 0.056 |
| sparse_analytical_jacobian/ODEProblem | 0.348 ± 0.0097 s | 0.221 ± 0.01 s | 1.57 ± 0.084 |
| sparse_analytical_jacobian/f_iip | 2.24 ± 2.4 μs | 0.07 ± 0.01 μs | 32.1 ± 35 |
| sparse_analytical_jacobian/f_oop | 0.38 ± 0.0092 ms | 0.384 ± 0.0092 ms | 0.991 ± 0.034 |
| time_to_load | 5.13 ± 0.038 s | 5.05 ± 0.035 s | 1.02 ± 0.01 |

Memory benchmarks

| benchmark | master | 9ea5906... | master / 9ea5906... |
|---|---|---|---|
| ODEProblem | 0.659 M allocs: 22.1 MB | 0.659 M allocs: 22.1 MB | 1 |
| init | 0.892 k allocs: 0.046 MB | 0.892 k allocs: 0.046 MB | 1 |
| large_parameter_init/ODEProblem | 4.21 M allocs: 0.163 GB | 4.21 M allocs: 0.163 GB | 0.999 |
| large_parameter_init/init | 1.74 k allocs: 0.0856 MB | 1.74 k allocs: 0.0856 MB | 1 |
| mtkcompile | 0.229 M allocs: 8.46 MB | 0.229 M allocs: 8.46 MB | 1 |
| sparse_analytical_jacobian/ODEProblem | 1.21 M allocs: 0.042 GB | 0.82 M allocs: 29.3 MB | 1.47 |
| sparse_analytical_jacobian/f_iip | 3 allocs: 5.95 kB | 0 allocs: 0 B | |
| sparse_analytical_jacobian/f_oop | 0.634 k allocs: 19.6 kB | 0.634 k allocs: 19.6 kB | 1 |
| time_to_load | 0.153 k allocs: 14.5 kB | 0.153 k allocs: 14.5 kB | 1 |

@AayushSabharwal

Yeah, the workflow isn't able to comment if the PR comes from a fork. I don't yet know how to fix that.


AayushSabharwal commented Dec 1, 2025

You might have to update some of the tests for this - they expect an AssertionError but it now throws a BoundsError (InterfaceI)


hersle commented Dec 1, 2025

Got it. Running those failing tests locally segfaults instead of giving me a BoundsError, though. That is probably because the tests take the default checkbounds = false and then crash when writing to non-existent elements in the sparse matrix. I think the CI tests don't segfault because they are run with --check-bounds=yes.
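The mechanism can be shown in miniature (a hedged sketch, not the actual generated Jacobian code): with checkbounds = false the generated function writes through @inbounds, so an index past the stored entries is undefined behavior and may segfault, whereas julia --check-bounds=yes forces all bounds checks and turns it into a BoundsError.

```julia
using SparseArrays

A = sparse([1], [1], [1.0], 2, 2)  # only (1, 1) is stored, so nnz(A) == 1

# Write directly into the nonzero buffer, skipping the bounds check.
write_nzval!(A, k, v) = @inbounds nonzeros(A)[k] = v

write_nzval!(A, 1, 2.0)  # fine: index 1 is within the stored entries
# write_nzval!(A, 2, 0.0) would be undefined behavior (possible segfault),
# unless Julia runs with --check-bounds=yes, which raises a BoundsError instead.
A[1, 1]  # 2.0
```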


hersle commented Dec 1, 2025

I set checkbounds = true instead (to never run into the segfault) and kept the AssertionError. I can easily change it to BoundsError (which relies on --check-bounds=yes to not segfault) if that makes more sense.

@AayushSabharwal

That makes sense for the test.


hersle commented Dec 1, 2025

I believe the tests are ok now.
