Introduce sparse buffer uploads for mesh input uniforms. #23242
Open
pcwalton wants to merge 1 commit into bevyengine:main
Conversation
Currently, all mesh input uniforms (primarily the transform data) for all mesh instances are uploaded to the GPU every frame, even for those meshes that haven't changed their transform. This has become a bottleneck when scaling to millions of mesh instances.

This commit solves this issue by introducing *sparse buffers*. A sparse buffer is a buffer that tracks which elements are changed and, if the number of changes is small enough, pushes only the elements that have changed to the GPU and uses a compute shader to scatter those elements into their proper positions in the buffer. This supports the access patterns of large worlds, in which there may be millions of static mesh instances but, on any given frame, only a few thousand change their transform.

Although this PR only makes the mesh input uniform buffer sparsely updated, I want to use this infrastructure to store bins in the future, in order to optimize the other operation that must currently send per-mesh-instance data to the GPU every frame. That data doesn't need to be written from multiple threads, and so at that time we'll want to introduce a `SparseBufferVec` to go alongside the `AtomicSparseBufferVec` that this patch introduces. In preparation for that, I've factored most of the functionality that will be common to `SparseBufferVec` and `AtomicSparseBufferVec` into separate functions in the `sparse_buffer_vec` module. That will allow the future bin slab work to introduce `SparseBufferVec` without having to refactor the work in this patch.

On `many_cubes --instance-count 1000000 --no-cpu-culling`, this reduces the median time spent in `write_batched_instance_buffers` from 12.4 ms to 0.876 ms, a 14x speedup.
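The core idea — track which elements changed, and either send a compact list of (index, value) pairs for a GPU-side scatter or fall back to a full re-upload — can be sketched in a few lines of Rust. This is a minimal CPU-side illustration, not Bevy's actual `AtomicSparseBufferVec` API; the names `SparseVec`, `Upload`, and `SPARSE_THRESHOLD` are hypothetical, and the `scatter` function stands in for what the PR does in a compute shader.

```rust
/// Illustrative threshold: above this many changes, a full upload is
/// assumed cheaper than a scatter (the real value would be tuned).
const SPARSE_THRESHOLD: usize = 4;

/// Hypothetical sparse buffer: a CPU copy of the data plus a dirty list.
struct SparseVec<T> {
    data: Vec<T>,        // CPU-side copy of the full buffer contents
    changed: Vec<usize>, // indices modified since the last upload
}

/// What gets sent to the GPU this frame.
enum Upload<T> {
    /// Few changes: send only (index, value) pairs; a compute shader
    /// would then scatter them into place in the GPU buffer.
    Sparse(Vec<(u32, T)>),
    /// Too many changes: re-upload the whole buffer.
    Full(Vec<T>),
}

impl<T: Copy> SparseVec<T> {
    fn new(data: Vec<T>) -> Self {
        Self { data, changed: Vec::new() }
    }

    /// Write an element and remember that it is dirty.
    fn set(&mut self, index: usize, value: T) {
        self.data[index] = value;
        self.changed.push(index);
    }

    /// Decide, once per frame, between a sparse and a full upload,
    /// clearing the dirty list either way.
    fn take_upload(&mut self) -> Upload<T> {
        let changed = std::mem::take(&mut self.changed);
        if changed.len() <= SPARSE_THRESHOLD {
            Upload::Sparse(
                changed.iter().map(|&i| (i as u32, self.data[i])).collect(),
            )
        } else {
            Upload::Full(self.data.clone())
        }
    }
}

/// CPU stand-in for the scatter compute shader: each "invocation"
/// writes one changed element to its destination slot.
fn scatter<T: Copy>(dst: &mut [T], sparse: &[(u32, T)]) {
    for &(i, v) in sparse {
        dst[i as usize] = v;
    }
}

fn main() {
    let mut cpu = SparseVec::new(vec![0u32; 8]);
    let mut gpu = vec![0u32; 8]; // pretend GPU-side buffer
    cpu.set(2, 42);
    cpu.set(5, 7);
    match cpu.take_upload() {
        Upload::Sparse(pairs) => scatter(&mut gpu, &pairs),
        Upload::Full(all) => gpu.copy_from_slice(&all),
    }
    println!("{:?}", gpu); // only slots 2 and 5 were transferred
}
```

In the static-world case the PR targets, almost every frame takes the `Sparse` branch, so the per-frame transfer is proportional to the number of moved meshes rather than the total instance count.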