Loading glTF on background thread very slow when using "Separate" rendering thread model due to lock contention #112452

### Tested versions

v4.5.1

### System information

Godot v4.5.1.mono (eb5a059c3) - Windows 11 (build 26200) - Multi-window, 2 monitors - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 4090 (NVIDIA; 32.0.15.7680) - 13th Gen Intel(R) Core(TM) i9-13900K (32 threads) - 127.7 GiB memory

### Issue description

My team is working on loading user-supplied glTF models at runtime for an XR application. We are experimenting with loading the glTF model on a background thread to reduce frame stalling.

We are also experimenting with the "Separate" rendering thread model (despite the warnings) to understand what problems we might run into.

With the "Separate" thread model, we see load times that are roughly 30x slower than when we use a background thread with the "Safe" thread model. I have instrumented the Godot engine to isolate the problem and found that it is due to lock contention in `CommandQueueMT`. Specifically, this is what I see:
* GLTFDocument's `generate_scene` will import each node.
  * As part of import, `convert_importer_mesh_instance_3d` will create `MeshInstance3D` objects and set their mesh.
  * Creating the `MeshInstance3D` object often takes 10+ milliseconds. Setting the mesh also often takes 10+ milliseconds. With the "Safe" thread model these operations are effectively instant. More on what causes this later...
  * So if there are tens of thousands of nodes (we're working with complex industrial models that can be hundreds of megabytes), loading can take as long as 17 minutes for one of our models. BTW, Godot renders with good performance once the model is loaded, and loads it quickly if we use the "Safe" thread model.
* When the "Separate" thread model is used, the `RenderingServer`'s `_draw()` function will be queued on the `CommandQueueMT` and execute on the RenderingServer's render pump thread.
  * `_draw()` is very slow because it includes the swapchain synchronization. If I turn off vsync (which we do for XR), then loading is "only" 2-3x slower than when using the "Safe" thread model.
  * While `_draw()` is running (or any queued work for that matter), a mutex in the `CommandQueueMT` is acquired which prevents new work from being queued.


I instrumented Godot and have this timeline which shows what is happening:
<img width="1787" height="306" alt="Image" src="https://github.com/user-attachments/assets/4b615dba-9a13-4754-9d50-70b99f4e7c18" />

I drew a box around a single node/mesh being imported; there are four places where it pushes into the `CommandQueueMT`. Each of these four pushes blocks on acquiring a mutex, because the `_draw()` method (shown on a separate lower track in that screenshot) is being invoked out of the command queue, which holds that lock while it runs.

I am still new to Godot and its architecture but one possible solution comes to mind: `CommandQueueMT` gets smarter around locking. Perhaps it could have separate pending and flushing queues with separate locks? Flush would swap the queues atomically to minimize contention with queuing work.
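
To make the idea concrete, here is a minimal sketch of the two-queue approach in standalone C++. It is not Godot's `CommandQueueMT`: the class name `SwapCommandQueue` and all of its members are hypothetical, and it only models queuing and flushing of fire-and-forget commands, not anything else the real class may need to support (such as calls that wait for a result).

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the engine's CommandQueueMT.
class SwapCommandQueue {
public:
    using Command = std::function<void()>;

    // Called from any producer thread. The pending lock is held only
    // for the duration of the push, never while commands execute.
    void push(Command cmd) {
        std::lock_guard<std::mutex> lock(pending_mutex_);
        pending_.push_back(std::move(cmd));
    }

    // Called from the consumer (e.g. render) thread.
    // Returns the number of commands that were executed.
    std::size_t flush() {
        // Serializes concurrent flushers; producers never take this lock.
        std::lock_guard<std::mutex> flush_lock(flush_mutex_);
        {
            // Short critical section: just swap the two vectors.
            std::lock_guard<std::mutex> lock(pending_mutex_);
            flushing_.swap(pending_);
        }
        // pending_mutex_ is released here, so producers can keep pushing
        // while a slow command (such as a draw call) is running.
        for (Command &cmd : flushing_) {
            cmd();
        }
        const std::size_t executed = flushing_.size();
        flushing_.clear();
        return executed;
    }

private:
    std::mutex pending_mutex_;      // guards pending_
    std::mutex flush_mutex_;        // guards flushing_
    std::vector<Command> pending_;  // filled by producers
    std::vector<Command> flushing_; // drained by the consumer
};
```

With this layout, a producer calling `push()` while a long command is executing in `flush()` waits at most for the vector swap, not for the command itself. Commands pushed during a flush are picked up by the next flush, which may or may not be acceptable for the ordering guarantees the engine relies on.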

The "Safe" thread model doesn't have this problem because the `_draw()` command does not run via `CommandQueueMT`.

### Steps to reproduce

Here is the loading code I use, which I invoke from a button press:

```csharp
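public async void _on_button_pressed()
{
    Node3D? sceneNode = null;
    await Task.Run(() =>
    {
        // Parse the file and build the scene on a worker thread.
        var state = new GltfState();
        var doc = new GltfDocument();
        Error readErr = doc.AppendFromFile("path to very large GLB file", state);
        if (readErr != Error.Ok)
        {
            GD.PushError($"Failed to read glTF file: {readErr}");
            return;
        }
        sceneNode = (Node3D)doc.GenerateScene(state);
    });

    await ToSignal(GetTree(), SceneTree.SignalName.ProcessFrame); // hop back to main thread

    // Only attach the scene if loading succeeded.
    if (sceneNode != null)
    {
        AddChild(sceneNode);
    }
}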
```

### Minimal reproduction project (MRP)

[separate-thread-model-slow.zip](https://github.com/user-attachments/files/23375698/separate-thread-model-slow.zip)

Note that this doesn't include the GLB file I am testing with. The "PATH_TO_GLB_GOES_HERE.glb" line of code in node_3d.gd will need to be changed to point to a large GLB file.
