BRE Architecture Series Part 1 – Overview

Many years ago, I chose to study Computer Science because I always knew I wanted to work in the video games industry, or in something related to it. First, I learned the C programming language at university, and then I learned C++ from books, because GameDev forums and other websites mentioned that it was the most used programming language in the AAA video games industry. I also read suggestions in Game Developer Magazine that specialization is very important in that industry. That is why I began with artificial intelligence, reading Mat Buckland’s books, but then I decided that computer graphics was the area I enjoyed most. At that point, I bought some books and read tutorials about DirectX 9.0c.

In addition to reading the theory about computer graphics and the DirectX API, and making some basic demos, I always wanted to write my own rendering engine. I did not want to make a complete game engine for making games, but a rendering engine for interactive 3D applications where I could develop state-of-the-art computer graphics techniques. My first try was with DirectX 11, but at that time I had already heard about DirectX 12 and the paradigm change it was going to be, so I decided to begin from scratch with DirectX 12. I am not going to list all the advantages and complexities of DirectX 12, because they are well known, but the learning curve is much smoother in DirectX 11 than in DirectX 12. When at last I had a general understanding of the new API, I began to develop BRE (which stands for Bertoa Rendering Engine).

BRE is a rendering framework, or engine, whose purpose is to provide a codebase on which to develop computer graphics techniques and to apply what I learn about DirectX 12. BRE's features include:

  • Task-based architecture for parallel draw submission
  • Asynchronous command execution and command recording
  • An easy-to-read, easy-to-write scene format
  • A configurable number of queued frames to keep the GPU busy
  • Deferred shading

The rendering techniques implemented at the moment are:

  • Color Mapping
  • Texture Mapping
  • Normal Mapping
  • Height Mapping
  • Color Normal Mapping
  • Color Height Mapping
  • Skybox Mapping
  • Diffuse Irradiance Environment Mapping
  • Specular Pre-Convolved Environment Mapping
  • Tone Mapping
  • Screen Space Ambient Occlusion
  • Gamma Correction

What is the big picture of BRE's architecture?

The CommandListExecutor is a class that runs in its own thread; its only responsibility is to receive recorded command lists from the different passes and to execute them in batches. We also have a class called RenderManager, the “Master Render Thread”, which is in charge of pass execution. BRE has a task-based architecture for parallel draw submission: we chose Intel TBB to create tbb::task's that record command lists in parallel. BRE has classes called CommandListRecorders, which record command lists. Their respective passes are in charge of starting the recording and sending the already recorded command lists to the CommandListExecutor for execution.
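The flow above is essentially a producer/consumer scheme. The following is not BRE's actual code, but a minimal sketch of the idea: it uses std::thread and a mutex-protected queue as stand-ins for Intel TBB tasks and real ID3D12CommandList objects, and RecordedCommandList/RecordAndSubmit are hypothetical names.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a recorded ID3D12CommandList (hypothetical).
using RecordedCommandList = std::string;

// Very simplified CommandListExecutor: the recorders push command lists
// into it, and it "executes" (here, just collects) them in order.
class CommandListExecutor {
public:
    void Push(RecordedCommandList cmdList) {
        std::lock_guard<std::mutex> lock(mMutex);
        mQueue.push(std::move(cmdList));
        mCondition.notify_one();
    }

    // Blocks until `expectedCount` command lists have been executed.
    std::vector<RecordedCommandList> ExecuteUntil(std::size_t expectedCount) {
        std::vector<RecordedCommandList> executed;
        std::unique_lock<std::mutex> lock(mMutex);
        while (executed.size() < expectedCount) {
            mCondition.wait(lock, [this] { return !mQueue.empty(); });
            executed.push_back(std::move(mQueue.front()));
            mQueue.pop();
        }
        return executed;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    std::queue<RecordedCommandList> mQueue;
};

// A pass spawns one recording task per recorder; each task records its
// command list in parallel and sends it to the executor.
void RecordAndSubmit(CommandListExecutor& executor,
                     const std::vector<std::string>& recorderNames) {
    std::vector<std::thread> tasks;
    for (const std::string& name : recorderNames) {
        tasks.emplace_back([&executor, name] {
            executor.Push(name + " command list");
        });
    }
    for (std::thread& task : tasks) {
        task.join();
    }
}
```

In BRE the executor loop runs on its own dedicated thread; in this sketch the caller drives it via ExecuteUntil to keep the example self-contained.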

BRE's RenderManager has a list of passes. When the passes are initialized by the RenderManager, they create the pipeline state object (PSO) for each type of command list recorder they use. This way, we pay the cost of generating PSOs at initialization time only, not while the application is running. Each pass is responsible for executing its list of CommandListRecorders, setting resource barriers, clearing render targets, etc. In the following image, you can see the passes that BRE has at the time of writing this article.

BRE_Passes

We mentioned the CommandListRecorders. There are several types, like ToneMappingCommandListRecorder, TextureMappingCommandListRecorder, EnvironmentLightCommandListRecorder, etc. Each type of CommandListRecorder has its own PSO that, as mentioned, is created when its pass is initialized. The PSO also never changes during command list recording, so we do not pay the penalty of PSO switches. You can see in the following image the relation between the RenderManager, the passes, the CommandListRecorders, and the CommandListExecutor.

command list flow
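The pass/recorder relation can be sketched in code. This is not BRE's real interface: the PSO is represented by an opaque integer handle and all D3D12 calls are omitted, but it shows the key point that each recorder type builds its PSO once, when its owning pass is initialized, and only reuses it while recording.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Opaque stand-in for ID3D12PipelineState (hypothetical).
using PipelineStateHandle = std::uint64_t;

class CommandListRecorder {
public:
    virtual ~CommandListRecorder() = default;

    // Called once, when the owning pass is initialized: the PSO cost
    // is paid here, never during rendering.
    void Init() { mPso = BuildPipelineState(); }

    // Called every frame: records a command list with the cached PSO
    // bound. Returns the PSO handle it bound (for illustration only).
    PipelineStateHandle RecordCommandList() const { return mPso; }

protected:
    // Each recorder type describes its own PSO (shaders, blend state, ...).
    virtual PipelineStateHandle BuildPipelineState() const = 0;

private:
    PipelineStateHandle mPso = 0U;
};

class ToneMappingCommandListRecorder : public CommandListRecorder {
protected:
    PipelineStateHandle BuildPipelineState() const override { return 1U; }
};

class Pass {
public:
    void AddRecorder(std::unique_ptr<CommandListRecorder> recorder) {
        mRecorders.push_back(std::move(recorder));
    }

    // Initializes every recorder, triggering PSO creation up front.
    void Init() {
        for (std::unique_ptr<CommandListRecorder>& recorder : mRecorders) {
            recorder->Init();
        }
    }

private:
    std::vector<std::unique_ptr<CommandListRecorder>> mRecorders;
};
```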

DirectX 12 root signatures can be written in C++ code or in an HLSL file. In our case, in addition to the shader files, we always include a root signature HLSL file like the following

RS.hlsl

#define RS \
"RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | " \
"DENY_HULL_SHADER_ROOT_ACCESS | " \
"DENY_DOMAIN_SHADER_ROOT_ACCESS | " \
"DENY_GEOMETRY_SHADER_ROOT_ACCESS), " \
"DescriptorTable(CBV(b0), visibility = SHADER_VISIBILITY_VERTEX), " \
"CBV(b1, visibility = SHADER_VISIBILITY_VERTEX), " \
"DescriptorTable(CBV(b0), visibility = SHADER_VISIBILITY_PIXEL), " \
"CBV(b1, visibility = SHADER_VISIBILITY_PIXEL), " \
"DescriptorTable(SRV(t0), visibility = SHADER_VISIBILITY_PIXEL), " \
"DescriptorTable(SRV(t1), visibility = SHADER_VISIBILITY_PIXEL), " \
"StaticSampler(s0, filter=FILTER_ANISOTROPIC)"

and in the shader code, it is referenced in the following way

VertexShader

#include <ShaderUtils/CBuffers.hlsli>

#include "RS.hlsl"

struct Input {
    float3 mPositionObjectSpace : POSITION;
    float3 mNormalObjectSpace : NORMAL;
    float3 mTangentObjectSpace : TANGENT;
    float2 mUV : TEXCOORD;
};

ConstantBuffer<ObjectCBuffer> gObjCBuffer : register(b0);
ConstantBuffer<FrameCBuffer> gFrameCBuffer : register(b1);

struct Output {
    float4 mPositionClipSpace : SV_POSITION;
    float3 mPositionWorldSpace : POS_WORLD;
    float3 mPositionViewSpace : POS_VIEW;
    float3 mNormalWorldSpace : NORMAL_WORLD;
    float3 mNormalViewSpace : NORMAL_VIEW;
    float3 mTangentWorldSpace : TANGENT_WORLD;
    float3 mTangentViewSpace : TANGENT_VIEW;
    float3 mBinormalWorldSpace : BINORMAL_WORLD;
    float3 mBinormalViewSpace : BINORMAL_VIEW;
    float2 mUV : TEXCOORD;
};

[RootSignature(RS)]
Output main(in const Input input)
{
    Output output;
    output.mPositionWorldSpace = mul(float4(input.mPositionObjectSpace, 1.0f),
                                     gObjCBuffer.mWorldMatrix).xyz;
    output.mPositionViewSpace = mul(float4(output.mPositionWorldSpace, 1.0f),
                                    gFrameCBuffer.mViewMatrix).xyz;
    output.mPositionClipSpace = mul(float4(output.mPositionViewSpace, 1.0f),
                                    gFrameCBuffer.mProjectionMatrix);

    output.mUV = gObjCBuffer.mTexTransform * input.mUV;

    output.mNormalWorldSpace = mul(float4(input.mNormalObjectSpace, 0.0f),
                                   gObjCBuffer.mInverseTransposeWorldMatrix).xyz;
    output.mNormalViewSpace = mul(float4(output.mNormalWorldSpace, 0.0f),
                                  gFrameCBuffer.mViewMatrix).xyz;

    output.mTangentWorldSpace = mul(float4(input.mTangentObjectSpace, 0.0f),
                                    gObjCBuffer.mWorldMatrix).xyz;
    output.mTangentViewSpace = mul(float4(output.mTangentWorldSpace, 0.0f),
                                   gFrameCBuffer.mViewMatrix).xyz;

    output.mBinormalWorldSpace = normalize(cross(output.mNormalWorldSpace,
                                                 output.mTangentWorldSpace));
    output.mBinormalViewSpace = normalize(cross(output.mNormalViewSpace,
                                                output.mTangentViewSpace));

    return output;
}

As you can see, we can use the DENY_*_ROOT_ACCESS flags to explicitly restrict which shader stages can access the root signature parameters (CBVs, SRVs, and UAVs).

To avoid inserting redundant barriers, we have a ResourceStateManager that tracks the current state of all the resources. Each pass is responsible for checking the state of a resource and deciding whether to set a resource barrier or not. Also, we batch as many barriers as possible into a single “set barriers” call, instead of issuing them one by one. In the following code, you can see how this works

    CD3DX12_RESOURCE_BARRIER barriers[4U];
    std::uint32_t barrierCount = 0UL;
    if (ResourceStateManager::GetResourceState(*GetCurrentFrameBuffer()) != D3D12_RESOURCE_STATE_RENDER_TARGET) {
        barriers[barrierCount] = ResourceStateManager::ChangeResourceStateAndGetBarrier(*GetCurrentFrameBuffer(),
                                                                                        D3D12_RESOURCE_STATE_RENDER_TARGET);
        ++barrierCount;
    }

    if (ResourceStateManager::GetResourceState(*mIntermediateColorBuffer1.Get()) != D3D12_RESOURCE_STATE_RENDER_TARGET) {
        barriers[barrierCount] = ResourceStateManager::ChangeResourceStateAndGetBarrier(*mIntermediateColorBuffer1.Get(),
                                                                                        D3D12_RESOURCE_STATE_RENDER_TARGET);
        ++barrierCount;
    }

    if (ResourceStateManager::GetResourceState(*mIntermediateColorBuffer2.Get()) != D3D12_RESOURCE_STATE_RENDER_TARGET) {
        barriers[barrierCount] = ResourceStateManager::ChangeResourceStateAndGetBarrier(*mIntermediateColorBuffer2.Get(),
                                                                                        D3D12_RESOURCE_STATE_RENDER_TARGET);
        ++barrierCount;
    }

    if (ResourceStateManager::GetResourceState(*mDepthBuffer) != D3D12_RESOURCE_STATE_DEPTH_WRITE) {
        barriers[barrierCount] = ResourceStateManager::ChangeResourceStateAndGetBarrier(*mDepthBuffer,
                                                                                        D3D12_RESOURCE_STATE_DEPTH_WRITE);
        ++barrierCount;
    }

    if (barrierCount > 0UL) {
        commandList.ResourceBarrier(barrierCount, barriers);
    }
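The ResourceStateManager itself boils down to a map from each resource to its last known state. The following is a simplified, single-threaded sketch, not BRE's real class: ResourceId and ResourceState are stand-ins for ID3D12Resource* and D3D12_RESOURCE_STATES, and Barrier stands in for D3D12_RESOURCE_BARRIER.

```cpp
#include <cassert>
#include <unordered_map>

// Stand-ins for ID3D12Resource* and D3D12_RESOURCE_STATES (hypothetical).
using ResourceId = int;
enum ResourceState {
    STATE_COMMON,
    STATE_RENDER_TARGET,
    STATE_DEPTH_WRITE,
    STATE_PIXEL_SHADER_RESOURCE,
};

// Stand-in for a D3D12 transition barrier.
struct Barrier {
    ResourceId resource;
    ResourceState before;
    ResourceState after;
};

class ResourceStateManager {
public:
    void RegisterResource(ResourceId resource, ResourceState initialState) {
        mStates[resource] = initialState;
    }

    ResourceState GetResourceState(ResourceId resource) const {
        return mStates.at(resource);
    }

    // Updates the tracked state and returns the transition barrier that
    // the caller must record in its command list.
    Barrier ChangeResourceStateAndGetBarrier(ResourceId resource,
                                             ResourceState newState) {
        ResourceState& trackedState = mStates.at(resource);
        assert(trackedState != newState &&
               "Redundant barrier: call GetResourceState first");
        const Barrier barrier{resource, trackedState, newState};
        trackedState = newState;
        return barrier;
    }

private:
    std::unordered_map<ResourceId, ResourceState> mStates;
};
```

The "check, then change" discipline shown in the pass code above maps directly onto GetResourceState and ChangeResourceStateAndGetBarrier here.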

We use a flip-model swap chain and call SetFullscreenState(TRUE) together with a borderless fullscreen window. We also use a swap chain with 3 or 4 buffers.
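The "configurable number of queued frames" from the feature list can be sketched with per-frame fence values: the CPU runs up to N frames ahead of the GPU and only waits when the ring wraps around to a frame slot the GPU has not finished yet. This is a simulation, not BRE's code; SimulatedFence is a plain integer standing in for an ID3D12Fence, and FrameQueue is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Simulated GPU progress: in real D3D12 code this would be an ID3D12Fence
// whose completed value is advanced by the command queue.
struct SimulatedFence {
    std::uint64_t completedValue = 0U;
};

class FrameQueue {
public:
    explicit FrameQueue(std::uint32_t queuedFrameCount)
        : mFenceValueByFrame(queuedFrameCount, 0U) {}

    // Returns true if the CPU would have to wait for the GPU before
    // reusing this frame slot's resources.
    bool BeginFrame(const SimulatedFence& fence) const {
        return fence.completedValue < mFenceValueByFrame[mCurrentFrame];
    }

    // Records the fence value signaled for this frame's work and
    // advances to the next slot in the ring.
    void EndFrame(std::uint64_t signaledFenceValue) {
        mFenceValueByFrame[mCurrentFrame] = signaledFenceValue;
        mCurrentFrame = (mCurrentFrame + 1U) %
                        static_cast<std::uint32_t>(mFenceValueByFrame.size());
    }

    std::uint32_t CurrentFrameIndex() const { return mCurrentFrame; }

private:
    std::vector<std::uint64_t> mFenceValueByFrame;
    std::uint32_t mCurrentFrame = 0U;
};
```

In real code BeginFrame would block via ID3D12Fence::SetEventOnCompletion; here it just reports whether a wait would have been required.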

Deferred Shading

In BRE, the renderer uses the deferred shading technique. At the time of writing this article, these are the geometry buffers we have


Normal_Smoothness (DXGI_FORMAT_R16G16B16A16_FLOAT): the R and G components store the view-space normal using octahedron normal vector encoding. The B component stores the material smoothness. The A component is unused.

BaseColor_MetalMask (DXGI_FORMAT_R8G8B8A8_UNORM): the RGB components store the base color. The A component stores the metal mask (0 = non-metal, 1 = metal).

Depth (DXGI_FORMAT_R32_FLOAT): instead of creating a new buffer for the depth, we reuse the depth buffer.
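Octahedron normal vector encoding, used for the Normal_Smoothness buffer, maps a unit vector to two values by projecting it onto an octahedron and unfolding the lower half onto the same square as the upper half. Here is a minimal CPU-side sketch of the encode/decode pair; BRE's actual shader code may differ in details, and the Vec2/Vec3 types are ad hoc for the example.

```cpp
#include <cmath>

struct Vec2 { float x; float y; };
struct Vec3 { float x; float y; float z; };

namespace {
float SignNotZero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }
}

// Encodes a unit-length normal into two values in [-1, 1].
Vec2 OctEncode(Vec3 n) {
    const float l1Norm = std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z);
    float px = n.x / l1Norm;
    float py = n.y / l1Norm;
    if (n.z < 0.0f) {
        // Fold the lower hemisphere over the octahedron's diagonals.
        const float foldedX = (1.0f - std::fabs(py)) * SignNotZero(px);
        const float foldedY = (1.0f - std::fabs(px)) * SignNotZero(py);
        px = foldedX;
        py = foldedY;
    }
    return Vec2{px, py};
}

// Reconstructs the unit normal from the two encoded values.
Vec3 OctDecode(Vec2 e) {
    float x = e.x;
    float y = e.y;
    float z = 1.0f - std::fabs(x) - std::fabs(y);
    if (z < 0.0f) {
        // Undo the fold for lower-hemisphere normals.
        const float foldedX = (1.0f - std::fabs(y)) * SignNotZero(x);
        const float foldedY = (1.0f - std::fabs(x)) * SignNotZero(y);
        x = foldedX;
        y = foldedY;
    }
    const float length = std::sqrt(x * x + y * y + z * z);
    return Vec3{x / length, y / length, z / length};
}
```

The payoff is that two 16-bit channels reconstruct the full view-space normal, freeing B for smoothness.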

Source Code and Video

The source code is hosted on GitHub, in this repository. The following video is a demo of BRE
