Loading gltf on background thread very slow when using "Separate" rendering thread model due to lock contention #112452

@brycehutchings

Tested versions

v4.5.1

System information

Godot v4.5.1.mono (eb5a059c3) - Windows 11 (build 26200) - Multi-window, 2 monitors - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 4090 (NVIDIA; 32.0.15.7680) - 13th Gen Intel(R) Core(TM) i9-13900K (32 threads) - 127.7 GiB memory

Issue description

My team is working on loading user-supplied glTF models at runtime for an XR application. We are experimenting with loading the glTF model on a background thread to reduce frame stalling.

We are also experimenting with the "Separate" rendering thread model (despite the warnings) to understand what problems we might run into.

With the "Separate" thread model, we see load times that are roughly 30x slower than with a background thread under the "Safe" thread model. I instrumented the Godot engine to isolate the problem and found that it is caused by lock contention in CommandQueueMT. Specifically, this is what I see:

  • GLTFDocument's generate_scene will import each node.
    • As part of import, convert_importer_mesh_instance_3d will create MeshInstance3D objects and set their mesh.
    • Creating the MeshInstance3D object often takes 10+ milliseconds, and setting the mesh often takes another 10+ milliseconds. With the "Safe" thread model these operations are effectively instant. More on what causes this below.
    • With tens of thousands of nodes (we're working with complex industrial models that can be hundreds of megabytes), this adds up: one of our models takes 17 minutes to load. For what it's worth, Godot renders the model with good performance once it is loaded, and loads it quickly under the "Safe" thread model.
  • When the "Separate" thread model is used, the RenderingServer's _draw() function will be queued on the CommandQueueMT and execute on the RenderingServer's render pump thread.
    • _draw() is very slow because it includes the swapchain synchronization. If I turn off vsync (which we do for XR), then it is "only" 2-3x slower than when using "Safe" thread model.
    • While _draw() is running (or any other queued work, for that matter), a mutex in CommandQueueMT is held, which prevents new work from being queued.
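
To make the contention pattern above concrete, here is a minimal C++ sketch. It is a simplified model and not Godot's actual CommandQueueMT: the class `SingleLockQueue` and the helper `loader_can_push_during_flush` are hypothetical names. It assumes only what the list above describes, namely that one mutex guards both queuing and flushing and stays locked while queued commands execute.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Simplified model, NOT Godot's CommandQueueMT: one mutex guards both
// enqueueing and flushing, and it stays locked while commands execute.
class SingleLockQueue {
public:
    // Blocking push: waits until the lock is free.
    void push(std::function<void()> cmd) {
        std::lock_guard<std::mutex> lock(mutex_);
        commands_.push_back(std::move(cmd));
    }

    // Non-blocking push: returns false if the lock could not be taken.
    bool try_push(std::function<void()> cmd) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock()) {
            return false;
        }
        commands_.push_back(std::move(cmd));
        return true;
    }

    // Runs every queued command while still holding the lock.
    void flush() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto &cmd : commands_) {
            cmd();
        }
        commands_.clear();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> commands_;
};

// Returns true if a loader thread managed to enqueue work while a
// long-running command (standing in for _draw()) was executing.
bool loader_can_push_during_flush() {
    SingleLockQueue queue;
    bool pushed = false;
    queue.push([&] {
        // Hypothetical loader thread trying to enqueue while "_draw" runs.
        std::thread loader([&] { pushed = queue.try_push([] {}); });
        loader.join();
    });
    queue.flush();
    return pushed;
}
```

In this model `loader_can_push_during_flush()` returns `false`: the loader cannot enqueue anything until the flush releases the lock. With a blocking push, the loader would instead wait for the full duration of the running command, which matches the stalls described above.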

I instrumented Godot and have this timeline which shows what is happening:
[Image: timeline captured from the instrumented Godot build]

I drew a box around a single node/mesh being imported. There are four places where it pushes into the CommandQueue, and each of them blocks on acquiring a mutex because the _draw() method (shown on a separate lower track in that screenshot) is being invoked from the CommandQueue, which holds that lock while it runs.

I am still new to Godot and its architecture but one possible solution comes to mind: CommandQueueMT gets smarter around locking. Perhaps it could have separate pending and flushing queues with separate locks? Flush would swap the queues atomically to minimize contention with queuing work.

The "Safe" thread model doesn't have this problem because the _draw() command does not run via CommandQueueMT.
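
The idea of separate pending and flushing queues could look roughly like the C++ sketch below. This is an illustration under stated assumptions, not a patch against Godot: `DoubleBufferedQueue` and `commands_queued_during_flush` are hypothetical names, and the sketch ignores things the real CommandQueueMT has to handle, such as synchronous pushes that wait for a result and commands that trigger a nested flush.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical double-buffered queue: producers only ever touch pending_,
// the flusher swaps it out under a short lock and runs commands unlocked.
class DoubleBufferedQueue {
public:
    void push(std::function<void()> cmd) {
        std::lock_guard<std::mutex> lock(pending_mutex_);
        pending_.push_back(std::move(cmd));
    }

    // Executes everything queued so far; returns how many commands ran.
    std::size_t flush() {
        // Only one flusher at a time (a nested flush would deadlock here).
        std::lock_guard<std::mutex> flush_lock(flush_mutex_);
        {
            // The pending lock is held only for the swap itself.
            std::lock_guard<std::mutex> lock(pending_mutex_);
            flushing_.swap(pending_);
        }
        for (auto &cmd : flushing_) {
            cmd();  // producers can keep pushing while this runs
        }
        const std::size_t executed = flushing_.size();
        flushing_.clear();
        return executed;
    }

private:
    std::mutex pending_mutex_;
    std::mutex flush_mutex_;
    std::vector<std::function<void()>> pending_;
    std::vector<std::function<void()>> flushing_;
};

// Pushes from a second thread while a long-running command (standing in
// for _draw()) executes, then reports how many commands the next flush ran.
std::size_t commands_queued_during_flush() {
    DoubleBufferedQueue queue;
    queue.push([&] {
        // Hypothetical loader thread; with a single shared lock this
        // blocking push could not complete until the flush finished.
        std::thread loader([&] { queue.push([] {}); });
        loader.join();
    });
    queue.flush();         // runs the "_draw" stand-in
    return queue.flush();  // runs whatever was queued meanwhile
}
```

Here the loader's push completes while the first flush is still executing, and the second flush picks up that one command. Commands queued during a flush are deferred to the next flush, so ordering between flushes is preserved, but whether that is acceptable for every RenderingServer call is an open question.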

Steps to reproduce

Here is the loading code I use, which I invoke from a button press:

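public async void _on_button_pressed()
{
    Node3D? sceneNode = null;
    await Task.Run(() =>
    {
        var state = new GltfState();
        var doc = new GltfDocument();
        Error readErr = doc.AppendFromFile("path to very large GLB file", state);
        if (readErr != Error.Ok)
        {
            GD.PrintErr($"Failed to read glTF file: {readErr}");
            return;
        }

        // This is the call that becomes very slow under the "Separate" thread model.
        sceneNode = doc.GenerateScene(state) as Node3D;
    });

    await ToSignal(GetTree(), SceneTree.SignalName.ProcessFrame); // hop back to main thread

    if (sceneNode != null)
    {
        AddChild(sceneNode);
    }
}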

Minimal reproduction project (MRP)

separate-thread-model-slow.zip

Note that this doesn't include the GLB file I am testing with. The "PATH_TO_GLB_GOES_HERE.glb" line of code in node_3d.gd will need to be changed to point to a GLB file of your own.
