Atom Direction 2022-2023 #48
-
"View, scene, pass, and object data is replicated into draw packet state at a per-draw frequency" and "draw packets that reference such state must be re-recorded to uniform buffers every time the state changes". |
-
I tend to agree that the material type system needs to be more modular. I'm curious how other engines support disparate pipelines: whether they rely on separating content (different materials for different platforms), or whether there is some system to swap out the shaders that back their equivalent of material types. To a certain extent, we could use our existing system to swap out shaders for different pipelines while using the same materials. We already do this for the low-end pipeline vs the main pipeline, and we could expand it a bit further to support a deferred pipeline, for example. But I agree with you that it won't scale well. If you have deferred vs forward, low end vs high end, and VR, plus game-specific pipelines, etc., it's going to be difficult to maintain a common material library. There's also the question of timing for this project. We could delay the work in favor of other projects for now, use a more bespoke solution for the few use cases we are facing at this time, and ramp up modularity as needed. But depending on how drastically we would want to modify the system, it could be better to tackle this project sooner, before the user base grows. (I have yet to form an opinion about which way we should go.)
-
Just throwing this out there: I am working on a tech demo and noticed that each object needs a distinct constant buffer to pass its specific transformation matrix (and other material setup flags, for instance). Unfortunately, from the CPU point of view, a constant buffer cannot be reused until the command list has executed (and been waited on), because writes through Map() are not recorded on the command list. And because of stringent alignment constraints (256-byte CBV alignment, and 64 KiB placement alignment for the buffer resource in D3D12, IIRC), we lose a lot of space to padding and fragmentation; it is also inconvenient to manage on the CPU side, since we need a whole collection of constant buffers. So I said screw it, kept a single buffer object, and turned it into a structured buffer so that it can be an array. Instead, I pass a 32-bit root constant on the command list that indexes into that structured buffer. The fetch of draw-call-frequency data is indirected through the root constant into the structured buffer, and that lifts a headache.
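A minimal sketch of that approach on the D3D12 side, assuming a root signature whose parameter 0 is a single 32-bit root constant and parameter 1 is a root SRV over the per-object structured buffer (struct layout and names here are illustrative, not taken from the actual demo):

```cpp
#include <d3d12.h>
#include <DirectXMath.h>
#include <cstdint>
#include <vector>

// Per-object data lives in one large structured buffer instead of many small
// constant buffers; a single 32-bit root constant selects the slot.
// HLSL side would declare: StructuredBuffer<ObjectData> g_objects;
struct ObjectData
{
    DirectX::XMFLOAT4X4 m_worldMatrix;
    uint32_t            m_materialFlags;
    uint32_t            m_pad[3]; // keep 16-byte aligned stride
};

struct Draw
{
    uint32_t m_objectIndex; // slot in the structured buffer
    uint32_t m_indexCount;
    uint32_t m_firstIndex;
    int32_t  m_baseVertex;
};

void RecordDraws(ID3D12GraphicsCommandList* cmdList,
                 ID3D12Resource* objectDataBuffer, // the structured buffer
                 const std::vector<Draw>& draws)
{
    // Parameter 1: the whole per-object array, bound once for all draws.
    cmdList->SetGraphicsRootShaderResourceView(1, objectDataBuffer->GetGPUVirtualAddress());

    for (const Draw& draw : draws)
    {
        // Parameter 0: the index of this draw's slot in the structured buffer.
        cmdList->SetGraphicsRoot32BitConstant(0, draw.m_objectIndex, 0);
        cmdList->DrawIndexedInstanced(draw.m_indexCount, 1, draw.m_firstIndex, draw.m_baseVertex, 0);
    }
}
```

The per-draw CPU cost then shrinks to one root-constant update, and the structured buffer can be updated independently of command-list lifetime.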
-
This post is meant to help steer discussion, planning, and architecture for the Atom renderer in the coming months. By "help steer," I mean that this is a set of recommendations that we can iterate on, and it would be wonderful to have partner and community feedback. For each item herein, I will explain the current state of affairs, and offer some rationale for why I think a change will be beneficial. This set of topics is by no means exhaustive, and skips over a lot of smaller scale spot optimizations we should consider.
General themes:
Global GPU-resident Scene Representation
Currently, state is "smeared" out in a number of places. View, scene, pass, and object data is replicated into draw packet state at a per-draw frequency. We have GPU-resident state for object transforms and skinned geometry, but in general, the state currently resides in command buffers.
This poses a few issues for scalability. First, draw packets that reference such state must have the state re-recorded to uniform buffers every time it changes, resulting in a memory write-amplification cost that scales with scene size rather than with the frequency of the state changes themselves. Second, systems that rely on indirection to gather state and render (e.g. ray tracing, terrain, deferred materials, deferred lighting) must each duplicate state that a unified scene representation would let them share.
Having a global GPU-resident scene representation unlocks several new opportunities, including but not restricted to the indirection-based systems mentioned above and the GPU-driven submission discussed in the next section.
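To make the idea concrete, here is a hedged sketch (all names hypothetical, not existing Atom types) of per-object state living in one persistent GPU-resident buffer, where only the slots that actually changed are re-uploaded each frame:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-object record stored in a GPU-resident structured buffer.
struct GpuObjectRecord
{
    float    m_worldMatrix[16];
    uint32_t m_materialIndex;
    uint32_t m_meshIndex;
    uint32_t m_flags;
    uint32_t m_pad;
};

// CPU-side mirror of the scene buffer. Only slots touched since the last
// flush are uploaded, so write traffic scales with change frequency rather
// than with total scene size.
class GpuSceneBuffer
{
public:
    uint32_t Allocate()
    {
        m_records.emplace_back();
        return static_cast<uint32_t>(m_records.size() - 1);
    }

    void Update(uint32_t slot, const GpuObjectRecord& record)
    {
        m_records[slot] = record;
        m_dirtySlots.push_back(slot);
    }

    // 'uploadFn(offsetBytes, data, sizeBytes)' stands in for whatever
    // buffer-upload path the RHI provides.
    template<typename UploadFn>
    void Flush(UploadFn&& uploadFn)
    {
        for (uint32_t slot : m_dirtySlots)
        {
            uploadFn(slot * sizeof(GpuObjectRecord), &m_records[slot], sizeof(GpuObjectRecord));
        }
        m_dirtySlots.clear();
    }

private:
    std::vector<GpuObjectRecord> m_records;
    std::vector<uint32_t> m_dirtySlots;
};
```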
Fewer draws, not faster draws
Atom's draw submission architecture is predicated on leveraging CPU-side parallelism for wide submission in the style of D3D12 bundles or Vulkan secondary command buffers. While this architecture leverages available concurrency well (CPU occupancy is high), much of the work is either unnecessary (excessive SRG/descriptor versioning) or could be done more efficiently on the GPU.
An alternative frame of reference for us as Atom contributors should be to minimize draws altogether. There are many downstream ramifications of this:
- Reduce VkDescriptorSetLayout and RootSignature counts to more effectively coalesce draws

This mindset will get us closer to a more modern approach of GPU-driven scene submission, unlock GPU culling, and, more importantly, free up CPU time to do useful work in other systems (physics, animation, etc.).
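One common shape this takes (shown here as a D3D12 sketch, not a description of Atom's current submission path) is a GPU culling pass that compacts surviving draw arguments into a buffer, followed by a single indirect submission call on the CPU:

```cpp
#include <d3d12.h>

// One entry per candidate draw; a culling compute pass compacts surviving
// entries into 'argsBuffer' and writes the survivor count into 'countBuffer'.
// The layout must match the D3D12_COMMAND_SIGNATURE_DESC used for 'signature'.
struct IndirectDrawArgs
{
    D3D12_DRAW_INDEXED_ARGUMENTS m_draw; // generated by the GPU culling pass
};

void SubmitScene(ID3D12GraphicsCommandList* cmdList,
                 ID3D12CommandSignature* signature,
                 ID3D12Resource* argsBuffer,
                 ID3D12Resource* countBuffer,
                 UINT maxDraws)
{
    // A single CPU-side call replaces per-object draw recording; the GPU
    // decides how many of the 'maxDraws' candidates are actually drawn.
    cmdList->ExecuteIndirect(signature, maxDraws, argsBuffer, 0, countBuffer, 0);
}
```

The equivalent path on Vulkan would be vkCmdDrawIndexedIndirectCount; either way, the CPU cost per frame becomes nearly independent of scene size.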
Decouple materials, lighting, and geometry
The "material type" is the location where everything comes together, describing how draws should be constructed for every render pass in the main render pipeline. This includes the depth prepass, shadow passes, motion vector pass, and main pass. Critically, this means that changes to the main render pipeline affect all material types, immediately restricting the ease with which alternative pipelines may be implemented.
There is a natural division that may be possible by decoupling the material type: separating primitive state from surface state promotes experimentation with geometry-only passes as well as surface and lighting passes. An ideal system in my mind has the following properties:
The first point is the most critical one. As Atom's usage expands to different domains (mobile, console, VR, virtual production, simulation, etc.), it will be necessary to author new pipelines specialized to the platform's needs. Currently, this is manifested as a "low end" pipeline and a main pipeline, but the existing data abstraction will not scale if every material type needs to integrate with every future pipeline and their respective render passes.
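Purely as an illustration of the division being proposed (these interfaces are hypothetical, not existing Atom code), the material type could be reduced to describing the surface, while each pipeline owns its passes and decides how that surface code is combined with its geometry and lighting stages:

```cpp
#include <string>
#include <vector>

// Illustrative sketch only: a material type contributes surface evaluation
// code, while each render pipeline owns its passes and stitches that code
// into its own geometry and lighting stages.
struct SurfaceShaderFragment
{
    std::string m_shaderSource;                     // evaluates albedo, normal, roughness, ...
    std::vector<std::string> m_requiredVertexStreams;
};

class IMaterialType
{
public:
    virtual ~IMaterialType() = default;
    // The material type no longer enumerates passes for a specific pipeline;
    // it only describes the surface.
    virtual SurfaceShaderFragment GetSurfaceFragment() const = 0;
};

class IRenderPipeline
{
public:
    virtual ~IRenderPipeline() = default;
    // Each pipeline (forward, deferred, shadow-only, VR, ...) builds the
    // shaders for the passes it owns from the material's surface fragment.
    virtual void BuildShadersFor(const IMaterialType& materialType) = 0;
};
```

Under a split like this, adding a new pipeline means implementing new passes against the surface contract, not re-authoring every material type.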
This task naturally has ramifications for the "Material Canvas" work, which will enable content creators to create procedural effects in shaders. We must work to ensure that materials authored via Material Canvas are content-stable across future renderer upgrades and architectural changes.
Device memory management
Currently, we don't have suitable mechanisms for backpressure when VRAM is oversubscribed. Entire mesh LOD chains and texture MIP chains are resident on the GPU, and if existing budgets are exhausted, draws are simply dropped from submission. We also don't respond to actual physical hardware conditions - our memory pools don't grow and shrink as available device-local memory changes.
A system is needed to resolve how to efficiently distribute memory across objects in the scene, balancing primitive and image resolution against draw distance, estimated texel density, and physical hardware constraints. Some engines do this computation on a world-cell basis, precomputing the MIPs and LODs needed based on camera world-space position and object world-cell assignment. Other engines use a GPU-feedback mechanism. Still other engines use a heuristic based on projected bounding-box areas and information computed offline. The approach we take is not yet determined, and an RFC will likely be needed here to guide further discussion (my preference is the third approach, sketched below).
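A hedged sketch of that third approach, assuming the texture's full UV range maps roughly onto the object's projected bounds (a per-material texel-density factor computed offline could refine the estimate):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Rough heuristic: pick the highest-resolution mip a texture actually needs
// from the object's projected screen coverage. Assumes the texture's full UV
// range maps approximately onto the projected bounds.
uint32_t EstimateRequiredMip(float projectedBoundsAreaPixels,
                             uint32_t textureWidth,
                             uint32_t textureHeight,
                             uint32_t mipCount)
{
    const float texelCount = float(textureWidth) * float(textureHeight);
    const float areaPixels = std::max(projectedBoundsAreaPixels, 1.0f);

    // The texel-to-pixel ratio per axis is sqrt(texels / pixels); each mip
    // halves that ratio, so the required mip is half the log2 of the ratio.
    const float mip = 0.5f * std::log2(std::max(texelCount / areaPixels, 1.0f));
    return std::min(static_cast<uint32_t>(mip), mipCount - 1);
}
```

Results from such a heuristic could then drive both streaming decisions and budget redistribution when the pool shrinks.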
Deferred lighting, visibility buffer pipelines
A successful Material Canvas unlocks tremendous creative expression capabilities... and a lot of shaders. Segmenting opaque and transparent geometry is a near-critical optimization for many engines to keep shader counts low, and a Forward+-only renderer realistically scales only for projects that can constrain artists to a fixed palette of curated uber-shaders. While it is debatable what the correct approach is for a given project, I believe Atom should not dictate this for the user. The more important opportunity is that implementing, say, a deferred lighting pipeline is a forcing function to ensure that our abstractions provide suitable degrees of flexibility and modularity. If changing an existing pipeline or adding a new one requires content changes, we know we have more work to do.
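For reference, a visibility-buffer pipeline keeps the geometry pass output down to a single packed ID per pixel and defers material and lighting evaluation to later passes; a minimal sketch of the packing (the bit split here is arbitrary and purely illustrative):

```cpp
#include <cstdint>

// The geometry pass writes only a packed ID per pixel; material and lighting
// evaluation happen in a later deferred pass that looks up scene data by ID.
// The 25/7 split assumes clusters of at most 128 triangles; real splits vary.
inline uint32_t PackVisibility(uint32_t clusterId, uint32_t triangleId)
{
    return (clusterId << 7) | (triangleId & 0x7Fu);
}

inline void UnpackVisibility(uint32_t packed, uint32_t& clusterId, uint32_t& triangleId)
{
    clusterId  = packed >> 7;
    triangleId = packed & 0x7Fu;
}
```

The key property is that per-pixel shading work is decoupled from how many shaders and draws produced the geometry, which is exactly the kind of flexibility the abstractions should permit.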
General Tenets
A general guiding recommendation I would offer is for contributors to consider, "For each feature, how would such a feature be customized or overridden, and how can I ensure that content is not affected?"
Atom has the tall task of being a renderer used across a broad gamut of domains and use cases, and compared to other renderers, I believe Atom should lean far more heavily into modularity and extensibility. The primary focus at the moment should be to welcome new contributors and tackle new workloads. The "break-even point" for such investments will naturally be further out in the future, but we should consider this an investment not just in the current working group, but in the future working group (which will hopefully be considerably larger).