@Ivorforce
Member

@Ivorforce Ivorforce commented Nov 28, 2025

This PR allows users to profile their games' GDScript functions using the tracy profiler.
It is a collaborative work between @enetheru and me.

[Screenshot: SCR-20251128-nknw]

Users can follow the tracy instructions in the docs (currently in PR form) to start profiling their games. Note that profiling with tracy requires recompiling the engine.

I've profiled tps-demo for about 10 minutes straight, and both Godot and tracy stay responsive, without any apparent slowdowns or leaks.

Approach

Tracy profile zones generally make use of constexpr tracy::SourceLocationData *, which are passed to the profiling macros. With this trick, tracy can record millions of events cheaply, because each event only needs to store a start timestamp, an end timestamp, and a pointer to the SourceLocationData.
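As a rough illustration of why static source locations keep events small, here is a simplified model. `SourceLocation`, `ZoneEvent`, and `ZoneRecorder` are hypothetical stand-ins, not Tracy's actual types, and the start/end markers are modeled as nanosecond timestamps, which is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for tracy::SourceLocationData (not Tracy's real definition).
struct SourceLocation {
    const char *name;
    const char *function;
    const char *file;
    uint32_t line;
};

// One recorded zone: two markers plus a pointer to shared, static metadata.
struct ZoneEvent {
    int64_t start_ns;
    int64_t end_ns;
    const SourceLocation *location;
};

// Hypothetical recorder that stands in for the profiler's event queue.
class ZoneRecorder {
public:
    void record(int64_t start_ns, int64_t end_ns, const SourceLocation *location) {
        assert(location != nullptr);
        events.push_back({ start_ns, end_ns, location });
    }

    const std::vector<ZoneEvent> &get_events() const { return events; }

private:
    std::vector<ZoneEvent> events;
};

// Built once at compile time; every event for this zone points at the same object.
static constexpr SourceLocation kUpdateLocation{ "update", "Player::update", "player.cpp", 42 };
```

Recording the same zone a thousand times stores a thousand small `ZoneEvent` entries, but only one `SourceLocation`.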

While tracy supports dynamic source locations / strings, it effectively just copies the contents and leaks them. This works, but is wasteful, because a lot of data is leaked. A previous implementation (by the developers of Halls of Torment) had trouble with this approach, because gigabytes of data piled up after a few seconds of profiling (I don't know for sure that they used tracy's leaky approach, but it would match the observations).

Instead of using tracy's dynamic source location API, this PR interns source locations for GDScriptFunction objects, and leaks those instead. This means that every function is only leaked once, instead of once per call, which allows us to profile for much, much longer. This takes inspiration from StringName's interning approach.
The implementation is fully contained in the tracy glue internals, and doesn't affect games that don't compile with tracy. It is further contained to .cpp files including profiling.h, so recompiling with a profiler attached stays trivial.
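A minimal sketch of the interning idea, assuming a lookup keyed by function name, file, and line. `SourceLocation` and `intern_source_location` are hypothetical names for illustration and do not reflect the PR's actual code. The sketch is also not thread-safe; a real implementation would presumably need a lock.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Simplified stand-in for tracy::SourceLocationData.
struct SourceLocation {
    const char *name;
    const char *file;
    uint32_t line;
};

// Returns a stable pointer for the given function; repeated calls with the
// same key return the same pointer instead of allocating again.
const SourceLocation *intern_source_location(const std::string &function, const std::string &file, uint32_t line) {
    using Key = std::tuple<std::string, std::string, uint32_t>;
    // Heap-allocated and intentionally never freed, so the pointers stay
    // valid for the lifetime of the process.
    static auto *table = new std::map<Key, const SourceLocation *>();

    Key key{ function, file, line };
    auto it = table->find(key);
    if (it != table->end()) {
        return it->second;
    }

    // std::map nodes never move, so c_str() pointers into the key stay valid.
    it = table->emplace(std::move(key), nullptr).first;
    it->second = new SourceLocation{ std::get<0>(it->first).c_str(), std::get<1>(it->first).c_str(), line };
    return it->second;
}
```

With this shape, the amount of leaked memory grows with the number of distinct functions rather than with the number of calls.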

(It would be possible to attach a void * to GDScriptFunction instances instead of interning SourceLocationData. This would be faster, but it wouldn't be self-contained anymore. In the future, if people start running into performance overhead from the profiler, we can amend the implementation to do this instead.)
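For comparison, the void * alternative could look roughly like the following. `ScriptFunction` is a hypothetical stand-in for GDScriptFunction, and `profiler_data` and `get_cached_location` are invented names; this is a sketch of the trade-off, not a proposed implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for tracy::SourceLocationData.
struct SourceLocation {
    const char *name;
    const char *file;
    uint32_t line;
};

// Hypothetical stand-in for GDScriptFunction with an opaque profiler slot.
struct ScriptFunction {
    std::string name;
    std::string source;
    uint32_t line = 0;
    void *profiler_data = nullptr; // Owned by the profiler glue, opaque to everything else.
};

// Leaks a copy of the string so the pointer outlives the function object.
inline const char *leak_copy(const std::string &text) {
    return (new std::string(text))->c_str();
}

// The first call allocates (and leaks) the location; later calls are a pointer read.
const SourceLocation *get_cached_location(ScriptFunction &fn) {
    if (fn.profiler_data == nullptr) {
        fn.profiler_data = new SourceLocation{ leak_copy(fn.name), leak_copy(fn.source), fn.line };
    }
    return static_cast<const SourceLocation *>(fn.profiler_data);
}
```

The lookup disappears entirely, but the function type now carries a profiler-related field even in builds that do not use a profiler.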

@Ivorforce
Member Author

Ivorforce commented Nov 28, 2025

Note that Godot currently crashes on exit (when tracy is used). This doesn't affect the ability of the PR to trace events.
I'll see if this can be fixed, but I think it would be acceptable to merge the PR even in this state, because it's more of a cosmetic issue than a real problem.
Edit: Fixed

@Ivorforce Ivorforce force-pushed the tracy-gdscript-codeloc branch from 60cfe84 to 65af0fc on November 28, 2025 14:48
This adds macro `GodotProfileZoneGroupedFirstScript`, and uses interning for speedy lookups.

Co-authored-by: Samuel Nicholas <nicholas.samuel@gmail.com>
@vnen
Member

vnen commented Nov 28, 2025

Tested this for a bit. I think it's pretty good but it fails to capture engine calls from GDScript. So if a script function calls something else and that something is slow, it's not really possible to tell.

I wonder if you can hook some of the API into the CALL* opcodes in the VM so the profiler is aware that the control is leaving the function (with some information of what's being called). This would give a better view of the execution time vs. the self time.

@Ivorforce
Member Author

Ivorforce commented Nov 28, 2025

> Tested this for a bit. I think it's pretty good but it fails to capture engine calls from GDScript. So if a script function calls something else and that something is slow, it's not really possible to tell.

Right, since tracy is a tracing profiler, by nature we only trace what we explicitly annotate. (Most) engine functions aren't annotated, so they aren't traced.
However, I would argue that this PR solves the more important problem already (figuring out where your frametime goes). Benchmarking the function parts from there is a lot easier than locating the source of your problems in the first place.

> I wonder if you can hook some of the API into the CALL* opcodes in the VM so the profiler is aware that the control is leaving the function (with some information of what's being called). This would give a better view of the execution time vs. the self time.

This should be possible — and indeed, #112707 had some support for it.
However, adding support for this is not required for GDScript tracing to work, so I removed it from this PR. I think it is better added and reviewed in isolation, after this PR is merged.
(An alternative to instrumenting engine calls is to use tracy's sampling mode, which should fill in the gaps more completely.)
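Purely as an illustration of the CALL* hook idea discussed above, a scoped zone around the call site might be shaped like this. `TraceLog`, `ScopedCallZone`, and `execute_call` are hypothetical; the real VM and profiler glue look different, and this is not the code from #112707.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trace log standing in for a profiler backend.
struct TraceLog {
    std::vector<std::string> entries;
};

// RAII zone: logs on entry, and logs again when the scope is left.
class ScopedCallZone {
public:
    ScopedCallZone(TraceLog &p_log, std::string p_callee) :
            log(p_log), callee(std::move(p_callee)) {
        log.entries.push_back("begin " + callee);
    }

    ~ScopedCallZone() { log.entries.push_back("end " + callee); }

    ScopedCallZone(const ScopedCallZone &) = delete;
    ScopedCallZone &operator=(const ScopedCallZone &) = delete;

private:
    TraceLog &log;
    std::string callee;
};

// Stand-in for the VM's handling of a CALL-style opcode: time spent in
// `target` is attributed to the callee's zone instead of the caller's self time.
int execute_call(TraceLog &log, const std::string &callee, const std::function<int()> &target) {
    ScopedCallZone zone(log, callee);
    return target();
}
```

Because the zone is closed in the destructor, the call is bracketed correctly on every return path.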

@enetheru
Contributor

> and indeed, #112707 had some support for it.

I put a lot of effort into capturing all calls from GDScript, specifically tailoring it for capturing engine calls as well. I was originally capturing all opcodes separately, and figured out which ones could be captured at the start, and which ones I needed to keep.

I haven't had time to look over and test the rest yet, but this makes me sad.

@Ivorforce
Member Author

Ivorforce commented Nov 28, 2025

> > and indeed, #112707 had some support for it.
>
> I put a lot of effort into capturing all calls from GDScript, specifically tailoring it for capturing engine calls as well. I was originally capturing all opcodes separately, and figured out which ones could be captured at the start, and which ones I needed to keep.
>
> I haven't had time to look over and test the rest yet, but this makes me sad.

It's not a problem, we can add those back in a follow-up PR. But I much prefer separation of concerns in PRs, to give each added feature the attention that's needed to review it.

@Ivorforce
Member Author

Ivorforce commented Nov 28, 2025

To go a little bit more in-depth with my response:

First off, thank you for putting in the time to add support for tracing system calls. Judging by @vnen's and @AdriaandeJongh's reaction, this is a feature that is anticipated, and will be useful for profiling GDScript.

Still, I'll try to explain why I decided to remove it from this PR. The reason for that is simply that I want to keep the complexity of the PR as low as possible. I strongly believe in the power of code reviews, and code reviews are most effective when the code is easy to understand, and focuses on one change at a time. In short, I'm not removing system call tracing because I believe it should not be added — I'm just removing it temporarily to make this PR easier to review and merge. I'm looking forward to your follow-up PR to re-add those calls!

You've also asked why I added interning for SourceLocationData at the same time as I removed the system call tracing. The reason is that interning is needed for the feature to run smoothly. Let's review the alternatives to my approach:

Use Tracy's dynamic source location API: This is what you did in your PR #112707. However, using this API makes Tracy copy and leak the strings. In bigger projects, this can lead to large amounts of data being leaked, effectively making us unable to trace for longer than a few seconds (as described in the Halls of Torment retrospective).

Use and leak SourceLocationData: I (inadvertently) tested this. Tracy supports a maximum of 32k SourceLocationData objects. With a few calls each frame, this is maxed out after a few seconds, preventing longer tracing, the same as the previous approach.

Considering this, we need a way to keep track of SourceLocationData objects and reuse them. To keep non-profiler builds free of profiler logic, I decided to intern them. This approach works well: As described in the OP, we should be able to profile (almost) indefinitely with this design. Since this design is vital for sustainable profiling, I believe it should inform the API design, and is therefore needed in the initial PR.
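To put rough numbers on the second alternative, here is a small back-of-the-envelope sketch. The cap of 32 * 1024 is an assumption derived from the "32k" figure above, and the workload numbers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Assumed cap, taken from the "32k" figure; the exact value may differ.
constexpr std::size_t kMaxSourceLocations = 32 * 1024;

// If every call allocates a fresh location, this many full frames fit under the cap.
std::size_t frames_until_exhausted(std::size_t calls_per_frame) {
    assert(calls_per_frame > 0);
    return kMaxSourceLocations / calls_per_frame;
}

// With reuse, the number of locations equals the number of distinct functions called.
std::size_t locations_with_reuse(const std::vector<std::string> &called_functions) {
    return std::set<std::string>(called_functions.begin(), called_functions.end()).size();
}
```

With a hypothetical 100 traced calls per frame, `frames_until_exhausted(100)` is 327 frames, which is a little over 5 seconds at 60 FPS. With reuse, the count stops growing once every function has been seen once.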
