
Performance improvements #97

Merged
merged 9 commits into from
Sep 5, 2023

Conversation

CasBex
Contributor

@CasBex CasBex commented Aug 17, 2023

Hi, I was profiling my code and noticed that most of the time was spent inside FMIImport.jl, so I decided to make some performance improvements. Functionality-wise there is no change. The example I was testing this on went from about 60s with 22 GB allocations to 18s with 2.8 GB (which is still not amazing in all honesty).

This is only for FMI2, as I don't use FMI3 myself. Some of these changes require updates to FMICore.jl, which are incoming. Before merging, FMICore.jl should therefore be updated, and the Project.toml in this repo should be updated for compatibility.

Summary

  • The Jacobian-invalidation check in fmi2SetReal in c.jl has been improved (biggest performance gain by far). By building a Set of the ∂f_refs, membership lookups become O(1) instead of O(n), which lowers the runtime complexity of the for loop (see the sketch after this list).
  • Various allocations and copies have been removed where possible.
  • Type stability has been improved where possible.
  • Non-keyword methods have been added for the performance-critical functions taking (fmu::FMU2) and (c::FMU2Component); these seem to be faster than the keyword-only variants.
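
A minimal, self-contained sketch of that Set-based lookup (the names below are illustrative, not the actual FMIImport internals):

```julia
# Checking membership against a Set of dependency value references is O(1) per
# query, versus O(n) for `in` on a Vector, so a loop over m queried references
# drops from O(n*m) to O(m).
refs_vec = UInt32.(1:10_000)          # dependency references stored in a Vector
refs_set = Set(refs_vec)              # built once, reused on every fmi2SetReal call

queried   = UInt32.(5_000:5_100)
hits_slow = count(vr -> vr in refs_vec, queried)   # O(n) per query
hits_fast = count(vr -> vr in refs_set, queried)   # O(1) per query
@assert hits_slow == hits_fast
```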

Performance tips from a user perspective

The FMI C-functions (e.g. cFmi2SetReal...) operate on Vector{T} and return this type, regardless of which kind of array was provided (where T depends on the C-function). If other types ::AbstractVector{<:Real} are passed, they are converted on every C-call, which costs a lot of time and memory allocations. @ThummeTo Maybe it would be a good idea to dispatch these functions on Vector{T} instead, and have a single catch-all method dispatching on AbstractVector{<:Real} that converts once and emits a performance warning? A rough sketch of that pattern follows below.
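
A hedged sketch of what that dispatch pattern could look like (setReal! and its signature are hypothetical, not the actual FMIImport API; fmi2Real = Float64 and UInt32 value references are assumptions in line with the FMI 2.0 spec):

```julia
const fmi2Real = Float64   # assumption: fmi2Real is an alias for Float64 (Cdouble)

# Fast path: concrete Vector arguments can be handed to the C function directly.
function setReal!(c, vrs::Vector{UInt32}, values::Vector{fmi2Real})
    # ... ccall into the FMU here, no conversion and no allocation needed ...
    return nothing
end

# Catch-all: convert once, warn once, then forward to the fast path.
function setReal!(c, vrs::AbstractVector{<:Integer}, values::AbstractVector{<:Real})
    @warn "setReal!: converting arguments to Vector; pass Vector{fmi2Real} to avoid per-call allocations" maxlog=1
    return setReal!(c, convert(Vector{UInt32}, vrs), convert(Vector{fmi2Real}, values))
end
```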

For optimal performance, it is best to preallocate a v::Vector{T} of the correct size and type and reuse it as an intermediate buffer between whatever function you have running and the FMU (a short sketch follows below). I was personally passing views of arrays to avoid allocation, but this turned out to be counterproductive because of the above.
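
A minimal, self-contained sketch of that buffer-reuse pattern (setReal!, component and value_refs are hypothetical names carried over from the sketch above, not the real API):

```julia
value_refs = UInt32.(1:3)
states     = rand(3, 100)                    # e.g. one column of values per time step
buffer     = Vector{Float64}(undef, 3)       # preallocated once, reused for every call

for step in 1:size(states, 2)
    copyto!(buffer, view(states, :, step))   # fill the buffer without allocating a new Vector
    # setReal!(component, value_refs, buffer)  # fast path: concrete Vector{Float64}
end
```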

Further improvements

The remaining allocations and slowness are mostly caused by type instability/ambiguity. This happens for example in FMU2Component, which has fields A,B,C,D,E,F::Union{Nothing,FMUJacobian}. This means that in every function Julia has to check at runtime what the concrete type is, even though we know that functions like fmi2SetReal are only called once the Jacobians are already initialised (a small illustration follows below). This causes a significant slowdown. Fixing it would however require more substantial changes to the library than I am personally willing to make (and it would be too big for one PR).
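
A small, self-contained illustration of that kind of instability (Holder and its jac field are hypothetical stand-ins, not the actual FMU2Component definition):

```julia
mutable struct Holder
    jac::Union{Nothing, Matrix{Float64}}   # analogous to the A,B,C,D,E,F fields
end

# Type-unstable: inside this function `h.jac` is only known as a Union,
# so every access carries a runtime branch.
sum_unstable(h::Holder) = sum(h.jac)

# Function barrier: dispatch once on the field, then work on a concrete type.
sum_stable(h::Holder) = _sum(h.jac)
_sum(::Nothing) = 0.0
_sum(jac::Matrix{Float64}) = sum(jac)
```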

@CasBex
Contributor Author

CasBex commented Aug 17, 2023

Specifically, the changes in c5d5387 and a895080 require this PR in FMICore.jl.

@CasBex
Contributor Author

CasBex commented Aug 22, 2023

71fd194 avoids c.x being mutated whenever x is mutated, i.e. it stops c.x from pointing at the caller's array. This prevents some bugs for users who pass frequently changing cache vectors here (see the small sketch below).
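
A minimal sketch of the aliasing issue (Comp and its x field are illustrative stand-ins, not FMIImport types):

```julia
mutable struct Comp
    x::Vector{Float64}
end

cache   = [1.0, 2.0]
c_alias = Comp(cache)          # c_alias.x === cache, so both track the same memory
c_copy  = Comp(copy(cache))    # independent storage (one way to avoid the aliasing)

cache[1] = 99.0
@assert c_alias.x[1] == 99.0   # mutated together with the caller's cache vector
@assert c_copy.x[1]  == 1.0    # unaffected
```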

@ThummeTo
Owner

Much appreciated!
Especially typing the structs is something that has been on the to-do list (for too long, I think 😉), as have many of the other performance optimizations.
Thank you very much for the PR. If that is ok, I will review it, add some additional modifications after my holiday at the very beginning of September, and then merge it.
Best regards!

@ThummeTo ThummeTo changed the base branch from main to v0.16.0 September 5, 2023 07:10
@ThummeTo ThummeTo merged commit 6772986 into ThummeTo:v0.16.0 Sep 5, 2023
6 of 10 checks passed
@CasBex
Contributor Author

CasBex commented Sep 5, 2023

FYI, I would set the FMICore compat to 0.18.0 before pushing v0.16.0 to the General registry.

@ThummeTo
Owner

ThummeTo commented Sep 5, 2023

Sure, this will be a breaking release (I will do some more modifications) 👍

@ThummeTo
Owner

ThummeTo commented Sep 11, 2023

Hey @CasBex,
just out of curiosity, what benchmark did you use that shows these improvements?
Even if you cannot share the simulation model, it would be very interesting to have some information about it, like the number of states and event indicators, the number of triggered events, implicit/explicit solver, and the number of solver steps. The model I use for benchmarking the new changes is faster too, but not by as much as in your case.
Regards!

@CasBex
Contributor Author

CasBex commented Sep 11, 2023

It's basically the thermal model from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/iet-stg.2019.0196, together with a discrete PID controller and weather data applied as periodic callbacks (no events in the FMU). The solver is the default from DifferentialEquations.jl.

One thing that also affects this is that I usually run multiple FMUs at the same time, using my own interface on top of FMIImport.jl. I made some improvements there as well, the most significant being the one described under "Performance tips from a user perspective", so that is likely part of the speedup too.

@ThummeTo
Owner

ThummeTo commented Sep 11, 2023

Regarding performance:
Yep, I just checked this out (it's also part of the ccall documentation): basically, for every argument in a ccall, an unsafe_convert(argtype, cconvert(argtype, argvalue)) is performed, which results in allocations for almost any argument that is used here... I will think about a good solution that doesn't restrict the user interface too much. A small illustration is below.
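
A generic illustration of what this means in practice (plain Julia, not FMI code):

```julia
# Per the Julia manual, every ccall argument is passed through
# unsafe_convert(argtype, cconvert(argtype, argvalue)).
x = rand(Float64, 8)

# A concrete Vector{Float64} against a Ptr{Float64} argument type needs no work:
@assert Base.cconvert(Ptr{Float64}, x) === x   # no copy; the pointer is taken later

# An argument that doesn't already match the expected element type has to be
# materialized as a new Vector before a pointer can be taken, and that
# conversion allocates on every call:
ints      = collect(1:8)
converted = convert(Vector{Float64}, ints)     # allocates a fresh Vector{Float64}
@assert converted !== ints
```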

@ThummeTo
Owner

ThummeTo commented Sep 15, 2023

I didn't know that the negative impact of using abstract types is that huge! I am currently restructuring the most important functions; there are speed-ups of around 20x in practice for some FMI function evaluations, just by replacing AbstractArray{T} with Array{T}. I will also add dispatches for abstract types with performance warnings; that's a good and user-friendly suggestion.

ThummeTo added a commit that referenced this pull request Oct 9, 2023
* Performance improvements (#97)

* PrepareValueReference multiple dispatch

* Reduce allocations

* Unsense-related allocations

* Minor change

* Optimise Jacobian invalidation (requires update in FMICore)

* Type stability

* Keyword and non-keyword definitions

* Avoid allocations

* Avoid c.x pointing to external array

* further performance optimizations

* performance improvements

* ready for fmissensitivity.jl

---------

Co-authored-by: CasBex <123587431+CasBex@users.noreply.github.com>