Integrating AMD FSR3 Frame Generation

Introduction

AMD FSR3 is a combination of super resolution and frame generation technologies. Super resolution upscales a low-resolution image to a high-resolution one, and frame generation interpolates between two real images to generate an intermediate image. FSR3 is part of the AMD FidelityFX SDK.

There are two types of frame generation (FG for short), interpolation and extrapolation, but to my knowledge every FG technique (DLSS, FSR, and XeSS) uses interpolation, so I will treat frame generation and frame interpolation as interchangeable.

FG is performed on the final color images of two consecutive frames, so its cost is mostly independent of scene complexity. If FG is cheaper than actually rendering a complex scene, then FG effectively doubles the FPS by doing the following:

  1. Render current real frame.
  2. Generate an interpolated frame between the current and previous real frames. (= frame generation)
  3. Display the interpolated frame. (= swapchain interaction)
  4. Display the real frame.
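
The steps above determine the order in which frames reach the display. As a rough illustration only (this is not code from Cyseal or FidelityFX, and `buildPresentOrder` and its string labels are hypothetical names), the ordering can be sketched like this:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the order in which frames reach the display when every pair of
// consecutive real frames gets one interpolated frame in between.
// The first real frame has no predecessor, so it is shown as is.
std::vector<std::string> buildPresentOrder(int realFrameCount)
{
	assert(realFrameCount >= 0);
	std::vector<std::string> order;
	for (int i = 0; i < realFrameCount; ++i)
	{
		if (i > 0)
		{
			// Interpolated frame between real frames (i - 1) and i.
			order.push_back("I" + std::to_string(i - 1) + "-" + std::to_string(i));
		}
		order.push_back("R" + std::to_string(i));
	}
	return order;
}
```

For three real frames this gives R0, I0-1, R1, I1-2, R2. Note that real frame 1 is already rendered when I0-1 is generated, yet it is displayed only after I0-1.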

Intuitively, frames should be displayed at even intervals. Say real frames were displayed at 60 FPS, i.e. 16.67 ms each. If each interpolated frame is displayed for only, say, 8.33 ms while each real frame still stays up for 16.67 ms, then two frames are shown every 25 ms and the average rises to 80 FPS, but the uneven cadence makes camera control feel pretty bad. This issue is called frame pacing and it will be explained later.

There are a few concerns for FG to be actually useful:

The AMD FidelityFX SDK contains two modules for FSR3 frame generation. One is the frame generation module and the other is an auxiliary module for swapchain interaction.

FSR3 integration strategy

Usually when you integrate a third-party SDK, you just include the header files, link the DLLs, and call the SDK's API. This time I wanted to actually understand the FG algorithm, so I chose another way that involves a lot of manual work.

I have a toy DX12 project called Cyseal where I practice graphics programming. I have integrated the open-source implementation of FSR3 frame generation from FidelityFX SDK v1.1.4 into Cyseal. I made some choices when doing it:

Generating interpolated frames

To run the frame generation shaders you need typical input parameters like scene color, scene depth, motion vectors, and so on, plus a not-so-obvious input called optical flow. The FidelityFX SDK provides an optical flow module to generate it. So generating an interpolated frame goes like this:

  1. Render the scene to generate scene color, scene depth, motion vector, and so on.
  2. Execute optical flow pass.
  3. Execute frame generation pass.

Both optical flow and frame generation use Single Pass Downsampler, another FidelityFX module. It only serves to generate a texture mip pyramid efficiently, so I'll skip it.

Optical flow pass

Overall, what I do is fairly standard compute shader work:

  1. Create compute pipelines for optical flow shaders.
  2. Fill the constant buffers for optical flow.
  3. Dispatch optical flow shaders.
  4. Output two textures; optical flow vector and scene change detection.

The algorithm details are well explained on AMD GPUOpen.
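
To give a rough idea of what an optical flow search does, here is a generic brute-force block-matching sketch on the CPU. It is not the FidelityFX shader code, and as far as I understand the real implementation works on a luminance pyramid with a more elaborate search. `LumaImage`, `MotionVector`, `findBlockMotion`, and the block size and search radius parameters are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

struct LumaImage
{
	int width;
	int height;
	std::vector<uint8_t> pixels; // row-major, one luminance value per pixel

	// Clamp-to-edge read so the search can look slightly outside the image.
	int at(int x, int y) const
	{
		x = x < 0 ? 0 : (x >= width ? width - 1 : x);
		y = y < 0 ? 0 : (y >= height ? height - 1 : y);
		return pixels[static_cast<std::size_t>(y) * width + x];
	}
};

struct MotionVector { int dx; int dy; };

// For the block whose top-left corner is (blockX, blockY) in 'current', find the
// offset into 'previous' with the smallest sum of absolute differences (SAD).
MotionVector findBlockMotion(const LumaImage& previous, const LumaImage& current,
	int blockX, int blockY, int blockSize, int searchRadius)
{
	assert(previous.width == current.width && previous.height == current.height);
	MotionVector best{ 0, 0 };
	long bestSad = std::numeric_limits<long>::max();
	for (int dy = -searchRadius; dy <= searchRadius; ++dy)
	{
		for (int dx = -searchRadius; dx <= searchRadius; ++dx)
		{
			long sad = 0;
			for (int y = 0; y < blockSize; ++y)
			{
				for (int x = 0; x < blockSize; ++x)
				{
					sad += std::abs(current.at(blockX + x, blockY + y)
						- previous.at(blockX + x + dx, blockY + y + dy));
				}
			}
			if (sad < bestSad)
			{
				bestSad = sad;
				best = MotionVector{ dx, dy };
			}
		}
	}
	return best;
}
```

The returned vector points from the block in the current frame back to where the same content was in the previous frame. Unlike game motion vectors, this estimate is derived purely from the color images, so it can also pick up motion that the renderer does not write motion vectors for.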

The following is the input of my optical flow class. The only GPU resource is the scene color texture.

// See ffx_opticalflow_prepare_luma.h
enum class OpticalFlowBackbufferTransferFunction : uint32
{
	LinearLdrToLuminance                  = 0,
	PQCorrectedHdrToPerceivedLuminance    = 1,
	SCRGBCorrectedHdrToPerceivedLuminance = 2,

	Count,
};

struct OpticalFlowPassInput
{
	class ClearResourcePass*              clearResourcePass; // My util class to clear textures to zero.
	OpticalFlowBackbufferTransferFunction transferFunction;
	bool                                  bResetAccumulation;
	uint32                                containerSizeX;
	uint32                                containerSizeY;
	int32                                 lumaResolutionX;
	int32                                 lumaResolutionY;
	float                                 minLuminance;
	float                                 maxLuminance;
	Texture*                              sceneColorTexture;
	ShaderResourceView*                   sceneColorSRV;
};

... with a lot of backing internal resources. I manage them manually, but if you integrate the FidelityFX SDK in the standard way, you don't need to care about them.

class OpticalFlowPass final : public SceneRenderPass
{
public:
	void initialize(RenderDevice* inRenderDevice);

	OpticalFlowPassOutput runOpticalFlow(RenderCommandList* commandList, const FrameInfo& frameInfo, const OpticalFlowPassInput& passInput);

private:
	void initializePipelines();
	void recreateResources(RenderCommandList* commandList, const FrameInfo& frameInfo, const OpticalFlowPassInput& passInput);

private:
	RenderDevice* device = nullptr;
	uint32 resourceFrameIndex = 0; // for CPU
	uint32 gpuFrameIndex = 0;
	bool bFirstExecution = true;

	// <FidelityFX_SDK>/sdk/src/components/opticalflow/ffx_opticalflow_private.h
	UniquePtr<ComputePipelineState>        pipelinePrepareLuma;
	UniquePtr<ComputePipelineState>        pipelineGenerateOpticalFlowInputPyramid;
	UniquePtr<ComputePipelineState>        pipelineGenerateSCDHistogram;
	UniquePtr<ComputePipelineState>        pipelineComputeSCDDivergence;
	UniquePtr<ComputePipelineState>        pipelineComputeOpticalFlowAdvancedV5;
	UniquePtr<ComputePipelineState>        pipelineFilterOpticalFlowV5;
	UniquePtr<ComputePipelineState>        pipelineScaleOpticalFlowAdvancedV5;

	VolatileDescriptorHelper               prepareLumaDescriptor;
	VolatileDescriptorHelper               genInputPyramidDescriptor;
	VolatileDescriptorHelper               genSCDHistogramDescriptor;
	VolatileDescriptorHelper               computeSCDDivergenceDescriptor;
	VolatileDescriptorHelper               computeOpticalFlowAdvancedV5Descriptor;
	VolatileDescriptorHelper               filterOpticalFlowV5Descriptor;
	VolatileDescriptorHelper               scaleOpticalFlowAdvancedV5Descriptor;

	std::vector<int32>                     containerResolutionXs;
	std::vector<int32>                     containerResolutionYs;
	std::vector<int32>                     lumaResolutionXs;
	std::vector<int32>                     lumaResolutionYs;

	UniquePtr<Texture>                     opticalFlowInputTextures[2][7];
	UniquePtr<UnorderedAccessView>         opticalFlowInputUAVs[2][7];
	UniquePtr<ShaderResourceView>          opticalFlowInputSRVs[2][7];

	UniquePtr<Texture>                     opticalFlowTextures[2][7];
	UniquePtr<UnorderedAccessView>         opticalFlowUAVs[2][7];
	UniquePtr<ShaderResourceView>          opticalFlowSRVs[2][7];

	BufferedUniquePtr<Texture>             scdHistogramTextures;
	BufferedUniquePtr<UnorderedAccessView> scdHistogramUAVs;
	UniquePtr<Texture>                     scdTempTexture;
	UniquePtr<UnorderedAccessView>         scdTempUAV;
	UniquePtr<Texture>                     scdOutputTexture; // Final output
	UniquePtr<UnorderedAccessView>         scdOutputUAV;
	UniquePtr<ShaderResourceView>          scdOutputSRV;

	uint32                                 opticalFlowVectorSizeX = 0;
	uint32                                 opticalFlowVectorSizeY = 0;
	UniquePtr<Texture>                     opticalFlowVectorTexture; // Final output
	UniquePtr<UnorderedAccessView>         opticalFlowVectorUAV;
	UniquePtr<ShaderResourceView>          opticalFlowVectorSRV;
};

Among them, opticalFlowVectorTexture and scdOutputTexture are the final outputs, which will be fed to the frame generation pass.

struct OpticalFlowPassOutput
{
	uint32              opticalFlowVectorSizeX          = 0;
	uint32              opticalFlowVectorSizeY          = 0;
	Texture*            opticalFlowVectorTexture        = nullptr;
	ShaderResourceView* opticalFlowVectorSRV            = nullptr;
	Texture*            sceneChangeDetectionTexture     = nullptr;
	ShaderResourceView* sceneChangeDetectionSRV         = nullptr;
};

I wrap output resources to make it easy to pass them around.
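
Going back to the internal resources of `OpticalFlowPass`: the second dimension of the `[2][7]` arrays is the pyramid level, and the `...ResolutionXs` / `...ResolutionYs` vectors hold one size per level. A minimal sketch of filling such per-level sizes is below. The halve-and-round-up rule and the names `LevelSize` and `computePyramidSizes` are assumptions for illustration, so check the SDK source for the exact sizes it expects.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct LevelSize { uint32_t width; uint32_t height; };

// Builds the sizes of a pyramid where each level is half the previous one,
// rounded up, so a dimension never drops below 1.
std::vector<LevelSize> computePyramidSizes(uint32_t baseWidth, uint32_t baseHeight, uint32_t levelCount)
{
	assert(baseWidth > 0 && baseHeight > 0);
	std::vector<LevelSize> sizes;
	sizes.reserve(levelCount);
	uint32_t w = baseWidth;
	uint32_t h = baseHeight;
	for (uint32_t level = 0; level < levelCount; ++level)
	{
		sizes.push_back(LevelSize{ w, h });
		w = (w + 1) / 2;
		h = (h + 1) / 2;
	}
	return sizes;
}
```

With a 1920x1080 base and 7 levels this gives 1920x1080, 960x540, 480x270, 240x135, 120x68, 60x34, and 30x17.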

Frame generation pass

Again, this is fairly standard compute shader work:

  1. Create compute pipelines for frame generation shaders.
  2. Fill the required constant buffers.
  3. Dispatch frame generation shaders.
  4. Output the interpolated color texture.

The algorithm details are well explained on AMD GPUOpen. It mentions the backbuffer multiple times, but in the end frame generation is an algorithm that generates an interpolated texture between two textures, and the backbuffer interaction is just an implementation detail that comes from FidelityFX being a game-related SDK. You can pass arbitrary textures to the algorithm and get the interpolated texture back. (The backbuffer does matter if you integrate FidelityFX as is, because it takes control of the swapchain.)
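
To make the "interpolated texture between two textures" idea concrete, here is a heavily simplified CPU sketch. It assumes the whole image moves by one known motion vector and ignores everything that makes the real algorithm hard (per-pixel motion, disocclusion, inpainting, optical flow). `GrayImage` and `interpolateHalfway` are hypothetical names and this is not how the FidelityFX shaders are structured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GrayImage
{
	int width;
	int height;
	std::vector<float> pixels; // row-major

	// Clamp-to-edge read.
	float at(int x, int y) const
	{
		x = x < 0 ? 0 : (x >= width ? width - 1 : x);
		y = y < 0 ? 0 : (y >= height ? height - 1 : y);
		return pixels[static_cast<std::size_t>(y) * width + x];
	}
};

// Builds the frame halfway between 'previous' and 'current', assuming the whole
// image moved by (motionX, motionY) pixels from previous to current.
// Both components must be even so that the half step lands on whole pixels.
GrayImage interpolateHalfway(const GrayImage& previous, const GrayImage& current, int motionX, int motionY)
{
	assert(previous.width == current.width && previous.height == current.height);
	assert(motionX % 2 == 0 && motionY % 2 == 0);
	const int halfX = motionX / 2;
	const int halfY = motionY / 2;
	GrayImage result{ current.width, current.height, std::vector<float>(current.pixels.size()) };
	for (int y = 0; y < result.height; ++y)
	{
		for (int x = 0; x < result.width; ++x)
		{
			// Content at (x, y) in the middle frame was half a step back in the
			// previous frame and is half a step ahead in the current frame.
			const float fromPrevious = previous.at(x - halfX, y - halfY);
			const float fromCurrent = current.at(x + halfX, y + halfY);
			result.pixels[static_cast<std::size_t>(y) * result.width + x] = 0.5f * (fromPrevious + fromCurrent);
		}
	}
	return result;
}
```

If a bright pixel sits at x = 2 in the previous image and at x = 4 in the current one, the sketch places it at x = 3 in the result, whereas a plain blend of the two images would show two half-bright ghosts instead.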

The following is the input of my frame generation class.

enum class EFrameGenDispatchFlags : uint32
{
	NONE                        = 0,
	DRAW_DEBUG_TEAR_LINES       = (1 << 0),
	DRAW_DEBUG_RESET_INDICATORS = (1 << 1),
	DRAW_DEBUG_VIEW             = (1 << 2),
};
ENUM_CLASS_FLAGS(EFrameGenDispatchFlags);

struct FrameGenPassInput
{
	class ClearResourcePass*              clearResourcePass; // My util class to clear textures to zero.
	const OpticalFlowPassOutput*          opticalFlowPassOutput;
	const Camera*                         camera;
	int32                                 renderSizeX;
	int32                                 renderSizeY;
	int32                                 displaySizeX;
	int32                                 displaySizeY;
	uint32                                frameID;
	EFrameGenDispatchFlags                dispatchFlags;
	OpticalFlowBackbufferTransferFunction backBufferTransferFunction;
	bool                                  bReset;
	float                                 minLuminance;
	float                                 maxLuminance;
	Texture*                              sceneColorTexture;
	ShaderResourceView*                   sceneColorSRV;
	Texture*                              sceneDepthTexture;
	ShaderResourceView*                   sceneDepthSRV;
	Texture*                              motionVectorTexture;
	ShaderResourceView*                   motionVectorSRV;
};

Again with a lot of backing fields:

class FrameGenPass final : public SceneRenderPass
{
public:
	void initialize(RenderDevice* inRenderDevice, EPixelFormat inSourceColorFormat);

	FrameGenPassOutput runFrameGeneration(RenderCommandList* commandList, const FrameInfo& frameInfo, const FrameGenPassInput& passInput);

private:
	void initializePipelines();

	void recreateResources(RenderCommandList* commandList, const FrameGenPassInput& passInput);

	void updateUniforms(RenderCommandList* commandList, const FrameGenPassInput& passInput);
	void preparePhase(RenderCommandList* commandList, const FrameInfo& frameInfo, const FrameGenPassInput& passInput);
	void dispatchPhase(RenderCommandList* commandList, const FrameInfo& frameInfo, const FrameGenPassInput& passInput);

	ConstantBufferView* getCurrentFrameInterpUniformCBV();
	ConstantBufferView* getCurrentInpaintingPyramidUniformCBV();

private:
	RenderDevice* device = nullptr;
	EPixelFormat sourceColorFormat;

	uint32 cpuFrameIndex = 0;
	uint32 prevFrameID = 0; // from FrameGenPassInput
	uint32 interpolationDispatchCount = 0;
	bool bResetCurrentFrame = false;

	// See FfxFrameInterpolationPass enum in <FidelityFX_SDK>\sdk\src\components\frameinterpolation\ffx_frameinterpolation.cpp
	UniquePtr<ComputePipelineState>        reconstructAndDilatePipeline;
	UniquePtr<ComputePipelineState>        setupPipeline;
	UniquePtr<ComputePipelineState>        reconstructPrevDepthPipeline;
	UniquePtr<ComputePipelineState>        gameMotionVectorFieldPipeline;
	UniquePtr<ComputePipelineState>        opticalFlowVectorFieldPipeline;
	UniquePtr<ComputePipelineState>        disocclusionMaskPipeline;
	UniquePtr<ComputePipelineState>        interpolationPipeline;
	UniquePtr<ComputePipelineState>        inpaintingPyramidPipeline;
	UniquePtr<ComputePipelineState>        inpaintingPipeline;
	UniquePtr<ComputePipelineState>        gameVectorFieldInpaintingPyramidPipeline;
	UniquePtr<ComputePipelineState>        debugViewPipeline;

	VolatileDescriptorHelper               prepareDescriptor;
	VolatileDescriptorHelper               frameInterpDescriptor;
	VolatileDescriptorHelper               inpaintingPyramidDescriptor;

	VolatileDescriptorHelper               reconstructPrevDepthDescriptor;
	VolatileDescriptorHelper               gameMotionVectorFieldDescriptor;
	VolatileDescriptorHelper               gameMotionVectorFieldInpaintingPyramidDescriptor;
	VolatileDescriptorHelper               opticalFlowVectorFieldDescriptor;
	VolatileDescriptorHelper               disocclusionMaskDescriptor;
	VolatileDescriptorHelper               interpolationDescriptor;
	VolatileDescriptorHelper               inpaintingDescriptor;
	VolatileDescriptorHelper               debugViewDescriptor;

	UniquePtr<Texture>                     reconstructedPrevDepthTexture;
	UniquePtr<ShaderResourceView>          reconstructedPrevDepthSRV;
	UniquePtr<UnorderedAccessView>         reconstructedPrevDepthUAV;
	UniquePtr<Texture>                     reconstructedDepthInterpolatedFrameTexture;
	UniquePtr<ShaderResourceView>          reconstructedDepthInterpolatedFrameSRV;
	UniquePtr<UnorderedAccessView>         reconstructedDepthInterpolatedFrameUAV;
	UniquePtr<Texture>                     dilatedMotionVectorTexture;
	UniquePtr<ShaderResourceView>          dilatedMotionVectorSRV;
	UniquePtr<UnorderedAccessView>         dilatedMotionVectorUAV;
	UniquePtr<Texture>                     dilatedDepthTexture;
	UniquePtr<ShaderResourceView>          dilatedDepthSRV;
	UniquePtr<UnorderedAccessView>         dilatedDepthUAV;
	UniquePtr<Texture>                     gameMotionVectorFieldTextures[2]; // x, y
	UniquePtr<ShaderResourceView>          gameMotionVectorFieldSRVs[2]; // x, y
	UniquePtr<UnorderedAccessView>         gameMotionVectorFieldUAVs[2]; // x, y
	UniquePtr<Texture>                     opticalFlowMotionVectorFieldTextures[2]; // x, y
	UniquePtr<ShaderResourceView>          opticalFlowMotionVectorFieldSRVs[2]; // x, y
	UniquePtr<UnorderedAccessView>         opticalFlowMotionVectorFieldUAVs[2]; // x, y
	UniquePtr<Texture>                     disocclusionMaskTexture;
	UniquePtr<ShaderResourceView>          disocclusionMaskSRV;
	UniquePtr<UnorderedAccessView>         disocclusionMaskUAV;
	UniquePtr<Buffer>                      counterBuffer;
	UniquePtr<ShaderResourceView>          counterSRV;
	UniquePtr<UnorderedAccessView>         counterUAV;
	UniquePtr<Texture>                     defaultDistortionFieldTexture;
	UniquePtr<ShaderResourceView>          defaultDistortionFieldSRV;
	UniquePtr<Texture>                     prevInterpolationSourceTexture;
	UniquePtr<ShaderResourceView>          prevInterpolationSourceSRV;
	UniquePtr<UnorderedAccessView>         prevInterpolationSourceUAV;
	UniquePtr<Texture>                     inpaintingPyramidTexture;
	UniquePtr<ShaderResourceView>          inpaintingPyramidSRV;
	UniquePtr<UnorderedAccessView>         inpaintingPyramidUAVs[13];
	UniquePtr<Texture>                     opticalFlowConfidenceTexture;
	UniquePtr<ShaderResourceView>          opticalFlowConfidenceSRV;
	UniquePtr<Texture>                     interpolationOutputTexture;
	UniquePtr<ShaderResourceView>          interpolationOutputSRV;
	UniquePtr<UnorderedAccessView>         interpolationOutputUAV;
};

While dispatching the shaders on my own, I found that some input parameters and internal resources are not used at all.

	FrameInterpUniform fiUniformData{
		.renderSize                 = { passInput.renderSizeX, passInput.renderSizeY },
		.displaySize                = { passInput.displaySizeX, passInput.displaySizeY },
		.displaySizeRcp             = { 1.0f / (float)(passInput.displaySizeX), 1.0f / (float)(passInput.displaySizeY) },
		.cameraNear                 = passInput.camera->getZNear(),
		.cameraFar                  = passInput.camera->getZFar(),
		.upscalerTargetSize         = { passInput.renderSizeX, passInput.renderSizeY },
		.Mode                       = 0, // #todo-fsr3-unused: FidelityFX defines but does not use it.
		.reset                      = bResetCurrentFrame || bDisjointFrameID,
		.fDeviceToViewDepth         = { 0, 0, 0, 0 }, // Set below
		.deltaTime                  = 0, // #todo-fsr3-unused: FidelityFX defines but does not use it.
		.HUDLessAttachedFactor      = 0, // #todo-fsr3: HUDLessAttachedFactor
		.distortionFieldSize        = { 1, 1 },
		.opticalFlowScale           = { 1.0f / (float)(passInput.displaySizeX), 1.0f / (float)(passInput.displaySizeY) },
		.opticalFlowBlockSize       = kOpticalFlowBlockSize,
		.dispatchFlags              = (uint32)passInput.dispatchFlags,
		.maxRenderSize              = { passInput.displaySizeX, passInput.displaySizeY },
		.opticalFlowHalfResMode     = 0, // #todo-fsr3-unused: FidelityFX defines but does not use it.
		.NumInstances               = 0, // #todo-fsr3-unused: FidelityFX defines but does not use it.
		.interpolationRectBase      = { 0, 0 },
		.interpolationRectSize      = { passInput.renderSizeX, passInput.renderSizeY },
		.debugBarColor              = { 1.0f, 0.0f, 0.0f },
		.backBufferTransferFunction = (uint32)passInput.backBufferTransferFunction,
		.minMaxLuminance            = { passInput.minLuminance, passInput.maxLuminance },
		.fTanHalfFOV                = std::tan(0.5f * (2.0f * std::atan(std::tan(passInput.camera->getFovYInRadians() * 0.5f) * passInput.camera->getAspectRatio()))), // tan(horizontal FOV / 2); the 0.5 goes inside tan()
		._pad1                      = 0,
		.fJitter                    = { 0, 0 }, // #todo-fsr3: Probably needed when doing super resolution AND interpolation.
		.fMotionVectorScale         = { -1, -1 },
	};
	setupDeviceDepthToViewSpaceDepthParams(passInput.camera, 1.0f, fiUniformData.fDeviceToViewDepth);

	frameInterpUniformCBV->writeToGPU(commandList, &fiUniformData, sizeof(fiUniformData));

This is my code to set up the constant buffer. Variables marked with #todo-fsr3-unused are not actually used. opticalFlowConfidenceTexture is not used either. They might be used in later versions of FidelityFX, but I haven't checked yet, so I simply don't know.
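
The `reset` field combines an explicit reset request with `bDisjointFrameID`, whose computation is not shown in the snippet. Presumably the point is to drop history when the frame IDs are not consecutive, since the stored previous frame would then not be the direct neighbour of the current one. One plausible way to compute it is sketched below. The helper name `isDisjointFrameID` and the wrap-around behaviour are assumptions, not code taken from Cyseal or FidelityFX.

```cpp
#include <cstdint>

// True when 'frameID' does not directly follow 'prevFrameID'.
// Unsigned arithmetic wraps around, so 0 counts as the successor of UINT32_MAX.
constexpr bool isDisjointFrameID(uint32_t prevFrameID, uint32_t frameID)
{
	return frameID != static_cast<uint32_t>(prevFrameID + 1u);
}
```

With this rule, the very first dispatch (both IDs still 0) also counts as disjoint, which conveniently forces a reset before any history exists.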

The final output is interpolationOutputTexture. Again I wrap output resources with a struct:

struct FrameGenPassOutput
{
	Texture*               interpolatedFrameTexture                = nullptr;
	ShaderResourceView*    interpolatedFrameSRV                    = nullptr;
	Texture*               opticalFlowMotionVectorFieldTextures[2] = { nullptr, nullptr };
	ShaderResourceView*    opticalFlowMotionVectorFieldSRVs[2]     = { nullptr, nullptr };
};

To present the interpolated frame you only need interpolatedFrameTexture. opticalFlowMotionVectorFieldTextures is just for visualizing optical flow in debug view mode.

This is the FG debug view of my application, which will be familiar if you ran the FidelityFX sample project.

Presenting interpolated frames

To present an interpolated frame you need to consider two things:

FidelityFX provides a swapchain module for that. This module has no shaders.

After manually integrating the frame generation shaders, I looked into this module and realized I can't use it because of how I integrated the shaders. GPUOpen explains well how the module works, including multithreading and UI handling, so I'll just describe how I implemented the equivalent in my single-threaded application.

My application is not multithreaded yet, so my rendering loop is fairly simple:

// pseudo-code of my app's loop
onTick() {
	world->update()
	
	sceneProxy = createSceneProxy(world)
	
	runSceneRenderer(sceneProxy)
	
	renderUI()
	
	present()
}

With frame generation, it changes like this:

// pseudo-code of my app's loop
onTick() {
	world->update()
	
	sceneProxy = createSceneProxy(world)
	
	// Now includes frame generation pass
	runSceneRenderer(sceneProxy)
	
	renderUI()
	presentInterpolatedFrame()
	busyWait()
	
	renderUI()
	presentRealFrame()
}

For UI composition, FidelityFX provides several options. I decided to just render the same UI twice for both interpolated and real frames. To expand the pseudo present logic a bit more:

	// Fill scenePresentInfoArray. if FG is on:
	//     scenePresentInfoArray[0] = interpolated frame
	//     scenePresentInfoArray[1] = real frame
	//     scenePresentCount = 2
	struct ScenePresentInfo
	{
		bool                bRealFrame;
		Texture*            colorTexture;
		ShaderResourceView* colorSRV;
	} scenePresentInfoArray[2];
	uint32 scenePresentCount = 0;
	
	HighFrequencyCounter interpFrameCounter;
	float interpTimeMS = 0.0f;
	
	for (uint32 presentIx = 0; presentIx < scenePresentCount; ++presentIx)
	{
		// 1. Prepare swapchain image.
		
		// 2. Blit the color texture to swapchain.

		// 3. Render UI.
		
		// 4. Present.
		swapChain->present(renderOptions.bForceVSync);
		
		if (scenePresentInfoArray[presentIx].bRealFrame == false)
		{
			interpFrameCounter.start();

			const float frameMS = avgFrameTime.getAverage();

			// Accuracy of std::this_thread::sleep_for() is too bad. Do spin wait.
			HighFrequencyCounter spinWait;
			spinWait.start();
			while (spinWait.stopWithMilliseconds() < frameMS);

			interpTimeMS += interpFrameCounter.stopWithMilliseconds();
		}
		
		if (presentIx != scenePresentCount - 1)
		{
			resetCommandList(commandAllocator, commandList);

			if (bRenderToBackbuffer)
			{
				swapChain->prepareBackbuffer();

				acquireSwapchainResources(swapchainBuffer, swapchainBufferRTV);

				finalBlitTarget    = swapchainBuffer;
				finalBlitRTV       = swapchainBufferRTV;
			}
		}
	}
	
	frameID += 1;
	avgFrameTime.push(renderOptions.prevFrameTime - prevInterpTime);
	prevInterpTime = interpTimeMS;
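
The spin wait in the loop above relies on my `HighFrequencyCounter` class, which is not shown here. As a stand-in, an equivalent busy-wait can be sketched with `std::chrono::steady_clock`. The function name `spinWaitMilliseconds` is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Busy-waits until at least 'milliseconds' have elapsed and returns the time
// actually waited. This burns a CPU core while waiting, which is the price
// for not depending on the scheduler's sleep granularity.
double spinWaitMilliseconds(double milliseconds)
{
	assert(milliseconds >= 0.0);
	using Clock = std::chrono::steady_clock;
	const auto start = Clock::now();
	const std::chrono::duration<double, std::milli> target(milliseconds);
	std::chrono::duration<double, std::milli> elapsed(0.0);
	do
	{
		elapsed = Clock::now() - start;
	} while (elapsed < target);
	return elapsed.count();
}
```

Returning the measured time rather than the requested time makes it easy to subtract the real cost of the wait from the frame time afterwards.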

The display time for interpolated frames is the average frame time of the recent N real frames, which is well explained on GPUOpen. Unlike AMD's swapchain module, my implementation is single-threaded, so my 'frame time' includes the cost of running the frame generation shaders and displaying interpolated frames. So when calculating the average frame time I manually subtract that cost.
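
The averaging itself can be sketched as a small fixed-size window. This is not my actual `avgFrameTime` class. `MovingAverage` and `realFrameTimeMS` are hypothetical names, and clamping the result at zero is an extra safeguard assumed for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average over a fixed-size window of the most recent samples.
class MovingAverage
{
public:
	explicit MovingAverage(std::size_t windowSize) : samples(windowSize, 0.0f)
	{
		assert(windowSize > 0);
	}

	void push(float value)
	{
		sum += value - samples[next]; // replace the oldest sample in the running sum
		samples[next] = value;
		next = (next + 1) % samples.size();
		if (count < samples.size()) ++count;
	}

	// Average of the samples pushed so far (0 if none).
	float getAverage() const
	{
		return count == 0 ? 0.0f : sum / static_cast<float>(count);
	}

private:
	std::vector<float> samples;
	std::size_t next = 0;
	std::size_t count = 0;
	float sum = 0.0f;
};

// Frame time of the real frame only: the measured frame time minus the time
// spent displaying the interpolated frame, clamped so it never goes negative.
float realFrameTimeMS(float measuredFrameMS, float interpolatedDisplayMS)
{
	const float t = measuredFrameMS - interpolatedDisplayMS;
	return t > 0.0f ? t : 0.0f;
}
```

A single very long real frame still drags the average up for the next N frames, so a more robust version might reject outliers or use a median instead.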

My frame pacing logic is not perfect, but it works reasonably well when the application's frame rate is stable, and it effectively doubles the frame rate. I plan to improve it by carefully considering v-sync, VRR, an overly long real frame that ruins the average frame time, and so on.

By its nature frame generation inevitably introduces input lag, and as a first-person shooter player I can feel it. But the lag decreases as the base frame rate increases, and I stop noticing it much once the base FPS goes over 60. Also, I'm not that sensitive to input lag when playing slow-paced games, so I think this technology is good to use overall.

Speculations

So far I have only integrated FSR3, and I came up with several speculations about it and other frame generation SDKs. I haven't looked into any of them in detail yet, so take them with a grain of salt.