Introduce sparse buffer uploads for mesh input uniforms. #23242

Open
pcwalton wants to merge 1 commit into bevyengine:main from pcwalton:sparse-buffer-vec

@pcwalton commented Mar 6, 2026

Currently, all mesh input uniforms (primarily the transform data) for all mesh instances are uploaded to the GPU every frame, even for those meshes that haven't changed their transform. This has become a bottleneck when scaling to millions of mesh instances.

This commit solves this issue by introducing sparse buffers. A sparse buffer tracks which of its elements have changed. If the number of changes is small enough, it uploads only the changed elements to the GPU and uses a compute shader to scatter them into their proper positions in the buffer. This suits the access patterns of large worlds, in which there may be millions of static mesh instances but, on any given frame, only a few thousand change their transform.
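As a rough CPU-only illustration of that idea (not the code in this patch), the sketch below tracks dirty elements and picks between a full re-upload and a list of `(index, value)` pairs. All names here (`SparseVecSketch`, `Upload`, `max_sparse_fraction`, `apply_upload`) are hypothetical, the threshold rule is an assumption, and `apply_upload` merely stands in for the scatter compute shader.

```rust
/// What the CPU side decides to send to the GPU for one frame.
#[derive(Debug, PartialEq)]
pub enum Upload<T> {
    /// Nothing changed since the last upload.
    None,
    /// Too many changes: re-upload the whole buffer.
    Full(Vec<T>),
    /// Few changes: (destination index, new value) pairs for a scatter pass.
    Sparse(Vec<(u32, T)>),
}

/// Hypothetical single-threaded sparse buffer; not Bevy's actual type.
pub struct SparseVecSketch<T> {
    values: Vec<T>,
    dirty: Vec<bool>,
    dirty_count: usize,
    /// Assumed policy: above this fraction of dirty elements, upload everything.
    max_sparse_fraction: f32,
}

impl<T: Copy> SparseVecSketch<T> {
    pub fn new(values: Vec<T>, max_sparse_fraction: f32) -> Self {
        let dirty = vec![false; values.len()];
        Self { values, dirty, dirty_count: 0, max_sparse_fraction }
    }

    /// Writes an element and remembers that it changed.
    pub fn set(&mut self, index: usize, value: T) {
        self.values[index] = value;
        if !self.dirty[index] {
            self.dirty[index] = true;
            self.dirty_count += 1;
        }
    }

    /// Chooses the upload strategy for this frame and clears the dirty state.
    pub fn take_upload(&mut self) -> Upload<T> {
        if self.dirty_count == 0 {
            return Upload::None;
        }
        let limit = (self.values.len() as f32 * self.max_sparse_fraction) as usize;
        let upload = if self.dirty_count > limit {
            Upload::Full(self.values.clone())
        } else {
            Upload::Sparse(
                self.dirty
                    .iter()
                    .enumerate()
                    .filter(|(_, is_dirty)| **is_dirty)
                    .map(|(index, _)| (index as u32, self.values[index]))
                    .collect(),
            )
        };
        self.dirty.iter_mut().for_each(|is_dirty| *is_dirty = false);
        self.dirty_count = 0;
        upload
    }
}

/// CPU stand-in for the GPU side: a full copy, or a scatter of changed elements.
pub fn apply_upload<T: Copy>(gpu: &mut Vec<T>, upload: Upload<T>) {
    match upload {
        Upload::None => {}
        Upload::Full(all) => *gpu = all,
        Upload::Sparse(changes) => {
            for (index, value) in changes {
                gpu[index as usize] = value;
            }
        }
    }
}
```

With millions of elements and a few thousand calls to `set` per frame, `take_upload` in this sketch would return the `Sparse` variant, so the data sent scales with the number of changes rather than with the buffer size.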

Although this PR only makes the mesh input uniform buffer sparsely updated, I want to use this infrastructure to store bins in the future, in order to optimize the other operation that must currently send per-mesh-instance data to the GPU every frame. That data doesn't need to be written from multiple threads, and so at that time we'll want to introduce a `SparseBufferVec` to go alongside the `AtomicSparseBufferVec` that this patch introduces. In preparation for that, I've factored most of the functionality that will be common to `SparseBufferVec` and `AtomicSparseBufferVec` into separate functions in the `sparse_buffer_vec` module. That will allow the future bin slab work to introduce `SparseBufferVec` without having to refactor the work in this patch.
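To illustrate why a multi-threaded variant needs different bookkeeping from a single-threaded one, here is a minimal sketch of change tracking that can be written through a shared reference. It assumes a bitset of `AtomicU64` words; the `AtomicDirtyBits` type and its methods are hypothetical and are not taken from the `sparse_buffer_vec` module, whose actual design may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical dirty-bit set that many threads can mark concurrently.
pub struct AtomicDirtyBits {
    words: Vec<AtomicU64>,
}

impl AtomicDirtyBits {
    /// Creates a set with room for `len` elements, all initially clean.
    pub fn new(len: usize) -> Self {
        let word_count = (len + 63) / 64;
        Self {
            words: (0..word_count).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Marks `index` dirty through `&self`; returns true if it was clean before.
    pub fn mark(&self, index: usize) -> bool {
        let mask = 1u64 << (index % 64);
        let previous = self.words[index / 64].fetch_or(mask, Ordering::Relaxed);
        (previous & mask) == 0
    }

    /// Collects the dirty indices in ascending order and clears them.
    /// Taking `&mut self` guarantees no thread is marking at the same time.
    pub fn drain(&mut self) -> Vec<u32> {
        let mut indices = Vec::new();
        for (word_index, word) in self.words.iter_mut().enumerate() {
            // Read the word and reset it to zero without atomic operations.
            let mut bits = std::mem::take(word.get_mut());
            while bits != 0 {
                indices.push(word_index as u32 * 64 + bits.trailing_zeros());
                bits &= bits - 1; // clear the lowest set bit
            }
        }
        indices
    }
}
```

A single-threaded counterpart could keep the same `drain` logic over plain `u64` words and skip the atomic operations in `mark`, which is the kind of sharing between the two variants that the paragraph above describes.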

On `many_cubes --instance-count 1000000 --no-cpu-culling`, this reduces the median time spent in `write_batched_instance_buffers` from 12.4 ms to 0.876 ms, a 14x speedup.

[Image: Screenshot 2026-03-05 185623]

@pcwalton requested review from atlv24 and tychedelia March 6, 2026 05:02
@pcwalton added the A-Rendering (Drawing game state to the screen) label Mar 6, 2026
@pcwalton added the C-Performance (A change motivated by improving speed, memory usage or compile times) label Mar 6, 2026
@github-project-automation bot moved this to Needs SME Triage in Rendering Mar 6, 2026
@pcwalton added the S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) label Mar 6, 2026
@ecoskey self-requested a review March 6, 2026 05:30
@aevyrie self-requested a review March 6, 2026 09:59
Labels

A-Rendering: Drawing game state to the screen
C-Performance: A change motivated by improving speed, memory usage or compile times
S-Needs-Review: Needs reviewer attention (from anyone!) to move forward

Projects

Status: Needs SME Triage
