The `FrameInterpolationSwapChain` implements the `IDXGISwapChain4` interface to provide an easy way to handle dispatching the workloads required for frame interpolation and pacing presentation.
Though this implementation may not work for all engines or applications, it was designed to make integrating FSR3 frame generation easy and (almost) transparent to the underlying application.
`FrameInterpolationSwapChain` can be used as a replacement for the DXGI swapchain, and from the application's point of view its behavior should be similar.
When frame generation is disabled, the main difference will be that present is slightly more expensive (one extra surface copy) compared to using the DXGI swapchain directly.
In this case, the frame interpolation swapchain still supports handling the UI composition, so applications don't have to handle their UI differently when disabling frame interpolation.
Internally, the `FrameInterpolationSwapChain` will create 2 additional CPU threads:
- The first thread waits for interpolation to finish on the GPU so that the application itself does not stall. After that, the thread gets the current CPU time and computes pacing information.
- The second thread dispatches the GPU workloads for UI composition (calling the callback function if needed) and paces the presentation of the non-interpolated frames.
`FrameInterpolationSwapChain` implements the `IDXGISwapChain4` interface, so once created it behaves just like a normal swapchain.
Creation can be done either by calling `ffxReplaceSwapchainForFrameinterpolationDX12`, which will replace an existing swapchain, or by calling `ffxCreateFrameinterpolationSwapchainDX12` or `ffxCreateFrameinterpolationSwapchainForHwndDX12`, similar to calling `CreateSwapChain` or `CreateSwapChainForHwnd`.
The `FrameInterpolationSwapChain` has been designed to be independent of the `FfxOpticalFlow` and `FfxFrameInterpolation` interfaces. To achieve this, it does not interact directly with those interfaces. The frame interpolation workload can be provided to the `FrameInterpolationSwapChain` in 2 ways:
- Provide a callback function (`frameGenerationCallback`) in the `FfxFrameGenerationConfig`. If frame interpolation is enabled, this function will get called from the `FrameInterpolationSwapChain` during the call to `::Present` on the game thread to record the command list containing the frame interpolation workload.
- Call `ffxGetFrameinterpolationCommandlistDX12(FfxSwapchain, FfxCommandList&)` to obtain a command list from the `FrameInterpolationSwapChain` and record the frame interpolation workload into it. In this case the command list will be executed when present is called.
The command list can either be executed on the same command queue present is being called on, or on an asynchronous compute queue:
- Synchronous execution is more resilient to issues if an application calls upscale but then decides not to call present on a frame.
- Asynchronous execution may result in higher performance depending on the hardware and what workloads are running alongside the frame interpolation workload.
Either way, UI composition and present will be executed on a second graphics queue in order to not restrict UI composition to compute and to allow the driver to schedule the present calls during preparation of the next frame.
Note: to ensure presents can execute at the time intended by FSR3's frame pacing logic, to avoid micro-stuttering, and to get good VRR response from the display, it is recommended that each frame consist of multiple command lists.
When using frame interpolation, it is highly advisable to treat the UI with special care, since distortion due to game motion vectors that would hardly be noticeable in 3D scenes will significantly impact readability of any UI text and result in very noticeable artifacts, especially on any straight, hard edges of the UI.
To combat any artifacts and keep the UI nice and readable, FSR3 offers 3 ways to handle UI composition in the `FrameInterpolationSwapChain`:
- Register a callback function, which will render the UI on top of the back buffer. This function will get called for every back buffer presented (interpolated and real), so it allows the application to render UI animations at display rate or apply effects like film grain differently for each frame sent to the monitor. However, this approach has some impact on performance as the UI will have to be rendered twice, so care should be taken to only record small workloads in the UI callback.
- Render the UI to a separate surface, so it can be alpha-blended to the final back buffer. This way the UI can be applied to the interpolated and real back buffers without any distortion.
- Provide a surface containing the HUD-less scene to the `FrameInterpolationSwapChain` in addition to the final back buffer. In this case the frame interpolation shader will detect UI areas in the frame and suppress distortion in those areas.
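
To make the second option (a separate UI surface blended onto the back buffer) more concrete, here is a minimal CPU-side sketch of the per-pixel blend. It is only an illustration: the `Rgba` type and `composeUiOverBackBuffer` function are hypothetical names, the real composition runs on the GPU, and straight (non-premultiplied) alpha is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>

// Hypothetical pixel type for illustration; real composition runs in a shader.
struct Rgba {
    float r, g, b, a;
};

// Blend one UI pixel over one back-buffer pixel.
// Assumption: the UI surface stores straight (non-premultiplied) alpha.
inline Rgba composeUiOverBackBuffer(const Rgba& ui, const Rgba& backBuffer) {
    assert(ui.a >= 0.0f && ui.a <= 1.0f);
    const float inv = 1.0f - ui.a;
    return Rgba{
        ui.r * ui.a + backBuffer.r * inv,
        ui.g * ui.a + backBuffer.g * inv,
        ui.b * ui.a + backBuffer.b * inv,
        1.0f,  // the presented back buffer is treated as opaque
    };
}
```

Because the same UI pixel is blended onto both the interpolated and the real back buffer, the UI itself never passes through motion-vector-based warping.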
The `FrameInterpolationSwapChain` handles frame pacing automatically. Since Windows is not a real-time operating system and variable refresh rate displays are sensitive to timing imprecisions, FSR3 has been designed to use a busy wait loop in order to achieve the best possible timing behavior.
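
The idea of a busy wait for precise timing can be sketched with standard C++ as follows. This is not the FSR3 implementation: `waitUntil` and `spinMargin` are hypothetical names, and the coarse sleep phase before the spin is an assumption added here to keep CPU usage down.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Wait until `target`, sleeping while the target is far away and spinning for
// the final stretch. `spinMargin` is a hypothetical tuning value.
inline Clock::time_point waitUntil(
    Clock::time_point target,
    std::chrono::microseconds spinMargin = std::chrono::microseconds(2000)) {
    assert(spinMargin.count() >= 0);

    // Coarse phase: give up the CPU while more than `spinMargin` remains.
    while (target - Clock::now() > spinMargin) {
        std::this_thread::sleep_for(std::chrono::microseconds(500));
    }

    // Fine phase: busy wait, because sleep wake-up times are not precise
    // enough on a non-real-time operating system.
    Clock::time_point now = Clock::now();
    while (now < target) {
        now = Clock::now();
    }
    return now;  // never earlier than `target`
}
```

The spin phase trades CPU time on one thread for presentation timing that does not depend on scheduler wake-up granularity.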
With frame generation enabled, frames can take wildly different amounts of time to render. The workload for interpolated frames can be much smaller than for application rendered frames ("real" frames). It is therefore important to properly pace presentation of frames to ensure a smooth experience. The goal is to display each frame for an equal amount of time.
Presentation and pacing are done using two additional CPU threads separate from the main render loop. A high-priority pacing thread keeps track of average frame time, including UI composition time, and calculates the target presentation time. It also waits for GPU work to finish to avoid long GPU-side waits after the CPU-side presentation call.
To prevent any frame time spikes from impacting pacing too much, the moving average of several frames is used to estimate the frame time.
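
A moving average of this kind can be sketched as a small fixed-size ring buffer. The class name `MovingAverage` and the window size are hypothetical; the number of frames FSR3 actually averages over is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity moving average over the last N frame times (milliseconds).
// Uses a plain array, so no memory is allocated after construction.
template <std::size_t N>
class MovingAverage {
public:
    // Record a new frame time, overwriting the oldest sample once full.
    void push(double frameTimeMs) {
        samples_[next_] = frameTimeMs;
        next_ = (next_ + 1) % N;
        if (count_ < N) {
            ++count_;
        }
    }

    // Average of the samples recorded so far (at most the last N).
    double average() const {
        assert(count_ > 0);
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) {
            sum += samples_[i];
        }
        return sum / static_cast<double>(count_);
    }

private:
    double samples_[N] = {};
    std::size_t count_ = 0;
    std::size_t next_ = 0;
};
```

With a window of 4 frames at 16 ms each, a single 48 ms spike moves the estimate to 24 ms rather than 48 ms, which is the damping effect described above.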
A present thread dispatches frame composition work for the interpolated frame and first presents the interpolated frame. After this it will dispatch the frame composition work for the real frame, followed by a wait until the target presentation time and finally presents the real frame.
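
As a rough sketch of how a target presentation time could follow from the goal of showing each frame for an equal amount of time: if one real frame and one interpolated frame are presented per average frame time, the real frame would be presented half an average frame time after the interpolated one. The halving rule and the names `PresentTargets` and `computePresentTargets` are assumptions of this sketch, not the documented FSR3 calculation.

```cpp
#include <cassert>
#include <chrono>

using PacingClock = std::chrono::steady_clock;

// Hypothetical pair of presentation times for one interpolated/real frame pair.
struct PresentTargets {
    PacingClock::time_point interpolated;
    PacingClock::time_point real;
};

// Assumption: the interpolated frame is presented as soon as it is ready, and
// the real frame follows half an average frame time later so that both frames
// stay on screen for the same duration.
inline PresentTargets computePresentTargets(PacingClock::time_point interpolatedReady,
                                            PacingClock::duration averageFrameTime) {
    assert(averageFrameTime.count() > 0);
    return PresentTargets{interpolatedReady, interpolatedReady + averageFrameTime / 2};
}
```

For an average frame time of 20 ms, the real frame would be targeted 10 ms after the interpolated frame.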
The application should ensure that the rendered frame rate is slightly below half the desired output frame rate. When VSync is enabled, the render performance will be implicitly limited to half the monitor's maximum refresh rate.
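
A frame rate limiter following this recommendation might compute its cap as shown below. The function name and the `marginHz` parameter are hypothetical; how far "slightly below" should be is not specified here, so the default of 1 Hz is only a placeholder.

```cpp
#include <cassert>

// Render frame rate cap for a desired post-interpolation output rate.
// `marginHz` is a hypothetical safety margin to stay slightly below half.
inline double renderFrameRateCap(double outputHz, double marginHz = 1.0) {
    assert(outputHz > 0.0);
    assert(marginHz >= 0.0 && marginHz < outputHz / 2.0);
    return outputHz / 2.0 - marginHz;
}
```

For example, targeting 120 Hz output with the placeholder margin gives a render cap of 59 frames per second.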
It is recommended to use normal priority for any GPU queues created by the application to allow interpolation work to be scheduled with higher priority. In addition, developers should take care that command lists running concurrently with interpolation and composition are short (in terms of execution time) to allow presentation to be scheduled at a precise time.
To further illustrate the pacing method and rationale behind it, the following sections will lay out expected behavior in different scenarios. We differentiate based on the post-interpolation frame rate as well as whether the display uses a fixed or variable refresh rate.
When the post-interpolation frame rate is below the refresh rate of a fixed refresh rate display, tearing is disabled and every frame is displayed for at least one sync interval. Presentation is synchronized to the display's vertical blanking period ("vsync"). This may result in uneven display timings and may increase input latency (by up to one refresh period).
In the diagram, the first real frame is presented slightly after the vertical blanking interval, leading to the prior interpolated frame being shown for two refresh intervals and increased latency compared to immediate display.
When the post-interpolation frame rate is above the refresh rate of a fixed refresh rate display, tearing is likely to occur. Presentation is not synchronized with the display. The benefit of this is reduced input latency compared to lower frame rates.
This section applies to display and GPU combinations with support for variable refresh rate (VRR) technologies, such as AMD FreeSync, NVIDIA G-SYNC® and VESA AdaptiveSync.
The timing between display refreshes is dictated by the variable refresh rate window. The delta time between two refreshes can be any time inside the window. As an example, if the VRR window is 64-120Hz, then the delta time must be between 8.33 and 15.625 milliseconds. If the delta is outside this window, tearing will likely occur.
If no new present happens inside the window, the prior frame is displayed again.
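
The conversion from a VRR window in Hz to the allowed time between refreshes is a simple reciprocal, sketched below. The struct and function names are hypothetical helpers for illustration only.

```cpp
#include <cassert>

// Allowed time between two display refreshes, in milliseconds.
struct RefreshDeltaWindowMs {
    double minMs;  // shortest delta, set by the highest refresh rate
    double maxMs;  // longest delta, set by the lowest refresh rate
};

// Convert a VRR window given in Hz to a delta-time window in milliseconds.
inline RefreshDeltaWindowMs vrrDeltaWindow(double minHz, double maxHz) {
    assert(minHz > 0.0 && maxHz >= minHz);
    return RefreshDeltaWindowMs{1000.0 / maxHz, 1000.0 / minHz};
}

// True if a present arriving `deltaMs` after the previous refresh falls
// inside the window.
inline bool insideVrrWindow(double deltaMs, const RefreshDeltaWindowMs& window) {
    return deltaMs >= window.minMs && deltaMs <= window.maxMs;
}
```

For the 64-120Hz example, `vrrDeltaWindow(64.0, 120.0)` yields roughly 8.33 ms to 15.625 ms, so a 10 ms delta is inside the window and a 20 ms delta is not.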
The variable refresh window usually does not extend above the reported native refresh rate of the display, so tearing will be disabled in this case.
If the frame rate is below the lower bound of the VRR window, the expected behavior is the same as if the frame rate is below the refresh rate of a fixed refresh rate display (see above).
If the frame rate is above the upper bound of the VRR window, the expected behavior is the same as if the frame rate is above the refresh rate of a fixed refresh rate display (see above).
List of resources created by the `FrameInterpolationSwapChain`:
- Two CPU worker threads. One of those will be partially spinning between present of the interpolated frame and the real frame to precisely time the presents
- One asynchronous compute queue (only used if `FFX_FSR3_ENABLE_ASYNC_WORKLOAD_SUPPORT` is set on FSR3 context creation and `allowAsyncWorkloads` is true in the `FfxFrameGenerationConfig`)
- One asynchronous present queue. This queue will be used to execute UI composition workloads and present
- A set of command lists, allocators and fences for the interpolation and UI composition workloads
- The GPU resources required to blit the back buffer to the swapchain and compose the UI (if no callback is used)
- The swapchain attached to the actual game window
The `FrameInterpolationSwapChain` has been designed to minimize dynamic allocations during runtime:
- System memory usage of the class is constant during the lifetime of the swapchain; no STL is used
- DirectX resources are created on first use and kept alive for reuse
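
The "create on first use, keep alive for reuse" pattern with constant system memory can be sketched as a fixed-capacity pool. Everything here is hypothetical: `Resource` stands in for a DirectX object, and `LazyResourcePool` is an illustrative name, not a class from the SDK.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a GPU resource; real code would hold a DirectX object.
struct Resource {
    int id = -1;
    bool created = false;
};

// Fixed-capacity pool: storage is a plain array member, so system memory use
// is constant, and each slot is created lazily on its first request.
template <std::size_t Capacity>
class LazyResourcePool {
public:
    // Return the resource in `slot`, creating it only on the first request.
    Resource& acquire(std::size_t slot) {
        assert(slot < Capacity);
        Resource& resource = slots_[slot];
        if (!resource.created) {
            resource.id = static_cast<int>(slot);  // placeholder for real creation
            resource.created = true;
            ++creations_;
        }
        return resource;
    }

    // Number of slots that have actually been created so far.
    std::size_t creations() const { return creations_; }

private:
    Resource slots_[Capacity];
    std::size_t creations_ = 0;
};
```

Requesting the same slot repeatedly triggers only one creation, so steady-state frames perform no allocations.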