BRE Architecture Series Part 11 – Ambient Occlusion Pass

In this post, we are going to talk about the ambient occlusion pass. It runs after the GeometryPass and before the EnvironmentLightPass; you can check BRE Architecture Series Part 7 – Geometry Pass and BRE Architecture Series Part 8 – Environment Light Pass for details about those. Before giving more details, we are going to talk about ambient occlusion itself.

What is ambient occlusion?

Ambient occlusion is simply a simulation of the shadowing caused by objects blocking the ambient light. Because ambient light is environmental, unlike other types of lighting, ambient occlusion does not depend on light direction. Ambient lighting can be combined with ambient occlusion to represent how exposed each point of the scene is, affecting the amount of ambient light it can reflect. This produces diffuse, non-directional lighting throughout the scene, casting no clear shadows, but with enclosed and sheltered areas darkened. The result is usually visually similar to an overcast day.
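In shading terms, ambient occlusion boils down to a single multiply: a per-point accessibility factor in [0, 1] scales the ambient term. A minimal sketch of that idea (the names are illustrative, not BRE's API):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical ambient shading helper: accessibility is 1.0 for a fully
// exposed point and approaches 0.0 inside creases and corners, so the
// ambient contribution is simply attenuated by it.
float ShadeAmbient(const float ambientIntensity,
                   const float albedo,
                   const float accessibility)
{
    const float clampedAccessibility = std::clamp(accessibility, 0.0f, 1.0f);
    return ambientIntensity * albedo * clampedAccessibility;
}
```

A fully exposed point receives the full ambient term, a fully occluded one receives none, and everything in between darkens smoothly, which is what produces the soft, non-directional look described above.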

[Image: shadow2.jpg]

How do we implement ambient occlusion in BRE?

Vladimir Kajalin was working at Crytek when he developed a technique called Screen Space Ambient Occlusion (SSAO), which was used for the first time in 2007, in the video game Crysis.

In one of his posts, John Chapman explains the problem with Crysis’s SSAO implementation:

“The Crysis method produces occlusion factors with a particular ‘look’ – because the sample kernel is a sphere, flat walls end up looking grey because ~50% of the samples end up being inside the surrounding geometry. Concave corners darken as expected, but convex ones appear lighter since fewer samples fall inside geometry. Although these artifacts are visually acceptable, they produce a stylistic effect which strays somewhat from photorealism.”
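Chapman’s observation about the spherical kernel is easy to verify numerically: for a point on a flat wall, about half of the samples of a sphere centred on it fall behind the surface plane, so the occlusion factor converges to roughly 0.5 everywhere on the wall. A quick standalone check (plain C++, independent of BRE):

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Counts the fraction of uniformly distributed points inside the unit
// sphere that fall behind a flat surface whose normal is +z (z < 0).
// For a spherical kernel this is ~50%, which is why flat walls render
// grey with Crysis-style SSAO.
double FractionBehindSurface(const std::uint32_t sampleCount)
{
    std::mt19937 rng{ 42U };
    std::uniform_real_distribution<double> dist{ -1.0, 1.0 };
    std::uint32_t behind = 0U;
    std::uint32_t accepted = 0U;
    while (accepted < sampleCount) {
        const double x = dist(rng);
        const double y = dist(rng);
        const double z = dist(rng);
        if (x * x + y * y + z * z > 1.0) {
            continue; // rejection sampling: keep points inside the unit sphere
        }
        ++accepted;
        if (z < 0.0) {
            ++behind; // behind the surface plane (normal = +z)
        }
    }
    return static_cast<double>(behind) / sampleCount;
}
```

Restricting the kernel to the z >= 0 hemisphere oriented along the surface normal, as BRE does below, makes this fraction zero for flat surfaces, so open walls stay fully lit.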

So, BRE follows John Chapman’s approach. You can see a demo in the following video and screenshot.

[Video/screenshot: ssao]

In BRE, we have a type of command list recorder named AmbientOcclusionCommandListRecorder that generates a command list related to ambient occlusion and pushes it to the CommandListExecutor to be executed. This command list recorder needs an input texture that is created by the AmbientOcclusionPass for this purpose only. In addition, it needs the geometry buffer that stores the normals (we will explain why later). Its result is the ambient accessibility texture, which is consumed by the BlurCommandListRecorder. You can see a diagram in the following picture.

[Diagram: ssao_workflow]

Its implementation is the following

AmbientOcclusionCommandListRecorder.h

#pragma once

#include <DirectXMath.h>
#include <vector>

#include <CommandManager\CommandListPerFrame.h>
#include <ResourceManager\FrameUploadCBufferPerFrame.h>
#include <ResourceManager\UploadBuffer.h>

namespace BRE {
struct FrameCBuffer;

///
/// @brief Responsible for command list recording for the ambient occlusion pass.
///
class AmbientOcclusionCommandListRecorder {
public:
    AmbientOcclusionCommandListRecorder() = default;
    ~AmbientOcclusionCommandListRecorder() = default;
    AmbientOcclusionCommandListRecorder(const AmbientOcclusionCommandListRecorder&) = delete;
    const AmbientOcclusionCommandListRecorder& operator=(const AmbientOcclusionCommandListRecorder&) = delete;
    AmbientOcclusionCommandListRecorder(AmbientOcclusionCommandListRecorder&&) = default;
    AmbientOcclusionCommandListRecorder& operator=(AmbientOcclusionCommandListRecorder&&) = default;

    ///
    /// @brief Initializes the pipeline state object and root signature
    ///
    /// This method must be called once, at the beginning of the application.
    ///
    ///
    static void InitSharedPSOAndRootSignature() noexcept;

    ///
    /// @brief Initializes the command list recorder.
    ///
    /// InitSharedPSOAndRootSignature() must be called first and once
    ///
    /// @param ambientAccessibilityBufferRenderTargetView Render target view to the ambient accessibility buffer
    /// @param normalRoughnessBufferShaderResourceView Shader resource view to the normal and roughness buffer
    /// @param depthBufferShaderResourceView Depth buffer shader resource view
    ///
    void Init(const D3D12_CPU_DESCRIPTOR_HANDLE& ambientAccessibilityBufferRenderTargetView,
              const D3D12_GPU_DESCRIPTOR_HANDLE& normalRoughnessBufferShaderResourceView,
              const D3D12_GPU_DESCRIPTOR_HANDLE& depthBufferShaderResourceView) noexcept;

    ///
    /// @brief Records a command list and pushes it into the CommandListExecutor
    ///
    /// Init() must be called first
    ///
    /// @param frameCBuffer Constant buffer per frame, for current frame
    /// @return The number of pushed command lists
    ///
    std::uint32_t RecordAndPushCommandLists(const FrameCBuffer& frameCBuffer) noexcept;

    ///
    /// @brief Validates internal data. Used mostly with assertions.
    ///
    bool IsDataValid() const noexcept;

private:
    ///
    /// @brief Creates the sample kernel buffer
    /// @param sampleKernel List of 4D coordinate vectors for the sample kernel
    ///
    void CreateSampleKernelBuffer(const std::vector<DirectX::XMFLOAT4>& sampleKernel) noexcept;

    ///
    /// @brief Creates the noise texture used in ambient occlusion
    /// @param noiseVector List of 4D noise vectors
    /// @return The created noise texture
    ///
    ID3D12Resource* CreateAndGetNoiseTexture(const std::vector<DirectX::XMFLOAT4>& noiseVector) noexcept;

    ///
    /// @brief Initializes ambient occlusion shaders resource views
    /// @param noiseTexture Noise texture created with CreateAndGetNoiseTexture
    /// @param sampleKernelSize Size of the sample kernel
    /// @see CreateSampleKernelBuffer
    /// @see CreateAndGetNoiseTexture
    ///
    void InitShaderResourceViews(ID3D12Resource& noiseTexture,
                                 const std::uint32_t sampleKernelSize) noexcept;

    ///
    /// @brief Initialize ambient occlusion constant buffer
    ///
    void InitAmbientOcclusionCBuffer() noexcept;

    CommandListPerFrame mCommandListPerFrame;

    FrameUploadCBufferPerFrame mFrameUploadCBufferPerFrame;

    UploadBuffer* mSampleKernelUploadBuffer{ nullptr };

    D3D12_CPU_DESCRIPTOR_HANDLE mAmbientAccessibilityBufferRenderTargetView{ 0UL };

    D3D12_GPU_DESCRIPTOR_HANDLE mNormalRoughnessBufferShaderResourceView{ 0UL };
    D3D12_GPU_DESCRIPTOR_HANDLE mDepthBufferShaderResourceView{ 0UL };

    // First descriptor in the list. All the others are contiguous
    D3D12_GPU_DESCRIPTOR_HANDLE mPixelShaderResourceViewsBegin{ 0UL };

    UploadBuffer* mAmbientOcclusionUploadCBuffer{ nullptr };
};
}

AmbientOcclusionCommandListRecorder.cpp

#include "AmbientOcclusionCommandListRecorder.h"

#include <AmbientOcclusionPass\AmbientOcclusionSettings.h>
#include <AmbientOcclusionPass\Shaders\AmbientOcclusionCBuffer.h>
#include <ApplicationSettings\ApplicationSettings.h>
#include <CommandListExecutor\CommandListExecutor.h>
#include <DescriptorManager\CbvSrvUavDescriptorManager.h>
#include <DXUtils\D3DFactory.h>
#include <DXUtils\d3dx12.h>
#include <MathUtils/MathUtils.h>
#include <PSOManager/PSOManager.h>
#include <ResourceManager/ResourceManager.h>
#include <ResourceManager/UploadBufferManager.h>
#include <ResourceStateManager\ResourceStateManager.h>
#include <RootSignatureManager\RootSignatureManager.h>
#include <ShaderManager\ShaderManager.h>
#include <ShaderUtils\CBuffers.h>
#include <Utils/DebugUtils.h>

using namespace DirectX;

namespace BRE {
// Root Signature:
// "CBV(b0, visibility = SHADER_VISIBILITY_VERTEX), " \ 0 -> Frame CBuffer
// "CBV(b0, visibility = SHADER_VISIBILITY_PIXEL), " \ 1 -> Frame CBuffer
// "CBV(b1, visibility = SHADER_VISIBILITY_PIXEL), " \ 2 -> Ambient Occlusion CBuffer
// "DescriptorTable(SRV(t0), visibility = SHADER_VISIBILITY_PIXEL)" 3 -> normal_roughness
// "DescriptorTable(SRV(t1), SRV(t2), visibility = SHADER_VISIBILITY_PIXEL)" 4 -> sample kernel + kernel noise
// "DescriptorTable(SRV(t3), visibility = SHADER_VISIBILITY_PIXEL)" 5 -> depth buffer

namespace {
ID3D12PipelineState* sPSO{ nullptr };
ID3D12RootSignature* sRootSignature{ nullptr };

///
/// @brief Generates sample kernel
///
/// Sample kernel for ambient occlusion. The requirements are that:
/// - Sample positions fall within the unit hemisphere oriented
///   toward positive z axis.
/// - Sample positions are more densely clustered towards the origin.
///   This effectively attenuates the occlusion contribution
///   according to distance from the sample kernel centre (samples closer
///   to a point occlude it more than samples further away).
///
/// @param sampleKernelSize Size of the sample kernel to generate
/// @param sampleKernel Output sample kernel list
///
void
GenerateSampleKernel(const std::uint32_t sampleKernelSize,
                     std::vector<XMFLOAT4>& sampleKernel)
{
    BRE_ASSERT(sampleKernelSize > 0U);

    sampleKernel.reserve(sampleKernelSize);
    const float sampleKernelSizeFloat = static_cast<float>(sampleKernelSize);
    XMVECTOR vec;
    for (std::uint32_t i = 0U; i < sampleKernelSize; ++i) {
        const float x = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        const float y = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        const float z = MathUtils::RandomFloatInInterval(0.0f, 1.0f);
        sampleKernel.push_back(XMFLOAT4(x, y, z, 0.0f));
        XMFLOAT4& currentSample = sampleKernel.back();
        vec = XMLoadFloat4(&currentSample);
        vec = XMVector4Normalize(vec);
        XMStoreFloat4(&currentSample, vec);

        // Accelerating interpolation function: samples fall off
        // with distance from the origin.
        float scale = i / sampleKernelSizeFloat;
        scale = MathUtils::Lerp(0.1f, 1.0f, scale * scale);
        vec = XMVectorScale(vec, scale);
        XMStoreFloat4(&currentSample, vec);
    }
}

///
/// @brief Generates noise vector
///
/// Generate a set of random values used to rotate the sample kernel,
/// which will effectively increase the sample count and minimize
/// the 'banding' artifacts.
///
/// @param numSamples Number of samples to generate
/// @param noiseVector Output noise vector
///
void
GenerateNoise(const std::uint32_t numSamples,
              std::vector<XMFLOAT4>& noiseVector)
{
    BRE_ASSERT(numSamples > 0U);

    noiseVector.reserve(numSamples);
    XMVECTOR vec;
    for (std::uint32_t i = 0U; i < numSamples; ++i) {
        const float x = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        const float y = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        // The z component must be zero. Since our kernel is oriented along the z-axis,
        // we want the random rotation to occur around that axis.
        const float z = 0.0f;
        noiseVector.push_back(XMFLOAT4(x, y, z, 0.0f));
        XMFLOAT4& currentSample = noiseVector.back();
        vec = XMLoadFloat4(&currentSample);
        vec = XMVector4Normalize(vec);
        XMStoreFloat4(&currentSample, vec);

        // Map from [-1.0f, 1.0f] to [0.0f, 1.0f] because
        // this is going to be stored in a texture
        currentSample.x = currentSample.x * 0.5f + 0.5f;
        currentSample.y = currentSample.y * 0.5f + 0.5f;
        currentSample.z = currentSample.z * 0.5f + 0.5f;
    }
}
}

void
AmbientOcclusionCommandListRecorder::InitSharedPSOAndRootSignature() noexcept
{
    BRE_ASSERT(sPSO == nullptr);
    BRE_ASSERT(sRootSignature == nullptr);

    PSOManager::PSOCreationData psoData{};
    psoData.mBlendDescriptor = D3DFactory::GetAlwaysBlendDesc();
    psoData.mDepthStencilDescriptor = D3DFactory::GetDisabledDepthStencilDesc();

    psoData.mPixelShaderBytecode = ShaderManager::LoadShaderFileAndGetBytecode("AmbientOcclusionPass/Shaders/SSAO/PS.cso");
    psoData.mVertexShaderBytecode = ShaderManager::LoadShaderFileAndGetBytecode("AmbientOcclusionPass/Shaders/SSAO/VS.cso");

    ID3DBlob* rootSignatureBlob = &ShaderManager::LoadShaderFileAndGetBlob("AmbientOcclusionPass/Shaders/SSAO/RS.cso");
    psoData.mRootSignature = &RootSignatureManager::CreateRootSignatureFromBlob(*rootSignatureBlob);
    sRootSignature = psoData.mRootSignature;

    psoData.mNumRenderTargets = 1U;
    psoData.mRenderTargetFormats[0U] = DXGI_FORMAT_R16_UNORM;
    for (std::size_t i = psoData.mNumRenderTargets; i < _countof(psoData.mRenderTargetFormats); ++i) {
        psoData.mRenderTargetFormats[i] = DXGI_FORMAT_UNKNOWN;
    }
    psoData.mPrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    sPSO = &PSOManager::CreateGraphicsPSO(psoData);

    BRE_ASSERT(sPSO != nullptr);
    BRE_ASSERT(sRootSignature != nullptr);
}

void
AmbientOcclusionCommandListRecorder::Init(const D3D12_CPU_DESCRIPTOR_HANDLE& ambientAccessibilityBufferRenderTargetView,
                                          const D3D12_GPU_DESCRIPTOR_HANDLE& normalRoughnessBufferShaderResourceView,
                                          const D3D12_GPU_DESCRIPTOR_HANDLE& depthBufferShaderResourceView) noexcept
{
    BRE_ASSERT(IsDataValid() == false);

    mAmbientAccessibilityBufferRenderTargetView = ambientAccessibilityBufferRenderTargetView;
    mNormalRoughnessBufferShaderResourceView = normalRoughnessBufferShaderResourceView;
    mDepthBufferShaderResourceView = depthBufferShaderResourceView;

    const std::uint32_t sampleKernelSize =
        static_cast<std::uint32_t>(AmbientOcclusionSettings::sSampleKernelSize);
    const std::uint32_t noiseTextureDimension =
        static_cast<std::uint32_t>(AmbientOcclusionSettings::sNoiseTextureDimension);

    std::vector<XMFLOAT4> sampleKernel;
    GenerateSampleKernel(sampleKernelSize, sampleKernel);
    std::vector<XMFLOAT4> noises;
    GenerateNoise(noiseTextureDimension * noiseTextureDimension, noises);

    CreateSampleKernelBuffer(sampleKernel);
    ID3D12Resource* noiseTexture = CreateAndGetNoiseTexture(noises);
    BRE_ASSERT(noiseTexture != nullptr);
    InitShaderResourceViews(*noiseTexture,
                            sampleKernelSize);

    InitAmbientOcclusionCBuffer();

    BRE_ASSERT(IsDataValid());
}

std::uint32_t
AmbientOcclusionCommandListRecorder::RecordAndPushCommandLists(const FrameCBuffer& frameCBuffer) noexcept
{
    BRE_ASSERT(IsDataValid());
    BRE_ASSERT(sPSO != nullptr);
    BRE_ASSERT(sRootSignature != nullptr);

    ID3D12GraphicsCommandList& commandList = mCommandListPerFrame.ResetCommandListWithNextCommandAllocator(sPSO);

    // Update frame constants
    UploadBuffer& uploadFrameCBuffer(mFrameUploadCBufferPerFrame.GetNextFrameCBuffer());
    uploadFrameCBuffer.CopyData(0U, &frameCBuffer, sizeof(frameCBuffer));

    commandList.RSSetViewports(1U, &ApplicationSettings::sScreenViewport);
    commandList.RSSetScissorRects(1U, &ApplicationSettings::sScissorRect);
    commandList.OMSetRenderTargets(1U,
                                   &mAmbientAccessibilityBufferRenderTargetView,
                                   false,
                                   nullptr);

    ID3D12DescriptorHeap* heaps[] = { &CbvSrvUavDescriptorManager::GetDescriptorHeap() };
    commandList.SetDescriptorHeaps(_countof(heaps), heaps);

    commandList.SetGraphicsRootSignature(sRootSignature);
    const D3D12_GPU_VIRTUAL_ADDRESS frameCBufferGpuVAddress(
        uploadFrameCBuffer.GetResource().GetGPUVirtualAddress());
    const D3D12_GPU_VIRTUAL_ADDRESS ambientOcclusionCBufferGpuVAddress(
        mAmbientOcclusionUploadCBuffer->GetResource().GetGPUVirtualAddress());
    commandList.SetGraphicsRootConstantBufferView(0U, frameCBufferGpuVAddress);
    commandList.SetGraphicsRootConstantBufferView(1U, frameCBufferGpuVAddress);
    commandList.SetGraphicsRootConstantBufferView(2U, ambientOcclusionCBufferGpuVAddress);
    commandList.SetGraphicsRootDescriptorTable(3U, mNormalRoughnessBufferShaderResourceView);
    commandList.SetGraphicsRootDescriptorTable(4U, mPixelShaderResourceViewsBegin);
    commandList.SetGraphicsRootDescriptorTable(5U, mDepthBufferShaderResourceView);

    commandList.IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    commandList.DrawInstanced(6U, 1U, 0U, 0U);

    commandList.Close();
    CommandListExecutor::Get().PushCommandList(commandList);

    return 1U;
}

bool
AmbientOcclusionCommandListRecorder::IsDataValid() const noexcept
{
    const bool result =
        mSampleKernelUploadBuffer != nullptr &&
        mAmbientAccessibilityBufferRenderTargetView.ptr != 0UL &&

        mNormalRoughnessBufferShaderResourceView.ptr != 0UL &&
        mDepthBufferShaderResourceView.ptr != 0UL &&
        mPixelShaderResourceViewsBegin.ptr != 0UL &&
        mAmbientOcclusionUploadCBuffer != nullptr;

    return result;
}

void
AmbientOcclusionCommandListRecorder::CreateSampleKernelBuffer(const std::vector<XMFLOAT4>& sampleKernel) noexcept
{
    BRE_ASSERT(mSampleKernelUploadBuffer == nullptr);
    BRE_ASSERT(sampleKernel.empty() == false);

    const std::uint32_t sampleKernelSize = static_cast<std::uint32_t>(sampleKernel.size());

    const std::size_t sampleKernelBufferElemSize{ sizeof(XMFLOAT4) };
    mSampleKernelUploadBuffer = &UploadBufferManager::CreateUploadBuffer(sampleKernelBufferElemSize,
                                                                         sampleKernelSize);
    const std::uint8_t* sampleKernelPtr = reinterpret_cast<const std::uint8_t*>(sampleKernel.data());
    for (std::uint32_t i = 0UL; i < sampleKernelSize; ++i) {
        mSampleKernelUploadBuffer->CopyData(i,
                                            sampleKernelPtr + sampleKernelBufferElemSize * i,
                                            sampleKernelBufferElemSize);
    }
}

ID3D12Resource*
AmbientOcclusionCommandListRecorder::CreateAndGetNoiseTexture(const std::vector<XMFLOAT4>& noiseVector) noexcept
{
    BRE_ASSERT(noiseVector.empty() == false);

    const std::uint32_t noiseVectorCount = static_cast<std::uint32_t>(noiseVector.size());

    // Kernel noise resource and fill it
    D3D12_RESOURCE_DESC resourceDescriptor = D3DFactory::GetResourceDescriptor(noiseVectorCount,
                                                                               noiseVectorCount,
                                                                               DXGI_FORMAT_R16G16B16A16_UNORM,
                                                                               D3D12_RESOURCE_FLAG_NONE);

    // Create noise texture and fill it.
    ID3D12Resource* noiseTexture{ nullptr };
    D3D12_HEAP_PROPERTIES heapProperties = D3DFactory::GetHeapProperties();
    noiseTexture = &ResourceManager::CreateCommittedResource(heapProperties,
                                                             D3D12_HEAP_FLAG_NONE,
                                                             resourceDescriptor,
                                                             D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
                                                             nullptr,
                                                             L"Noise Buffer",
                                                             ResourceManager::ResourceStateTrackingType::FULL_TRACKING);

    // In order to copy CPU memory data into our default buffer, we need to create
    // an intermediate upload heap.
    const std::uint32_t num2DSubresources = resourceDescriptor.DepthOrArraySize * resourceDescriptor.MipLevels;
    const std::size_t uploadBufferSize = GetRequiredIntermediateSize(noiseTexture, 0, num2DSubresources);
    ID3D12Resource* noiseTextureUploadBuffer{ nullptr };

    heapProperties = D3DFactory::GetHeapProperties(D3D12_HEAP_TYPE_UPLOAD,
                                                   D3D12_CPU_PAGE_PROPERTY_UNKNOWN,
                                                   D3D12_MEMORY_POOL_UNKNOWN,
                                                   1U,
                                                   0U);

    resourceDescriptor = D3DFactory::GetResourceDescriptor(uploadBufferSize,
                                                           1,
                                                           DXGI_FORMAT_UNKNOWN,
                                                           D3D12_RESOURCE_FLAG_NONE,
                                                           D3D12_RESOURCE_DIMENSION_BUFFER,
                                                           D3D12_TEXTURE_LAYOUT_ROW_MAJOR);

    noiseTextureUploadBuffer = &ResourceManager::CreateCommittedResource(heapProperties,
                                                                         D3D12_HEAP_FLAG_NONE,
                                                                         resourceDescriptor,
                                                                         D3D12_RESOURCE_STATE_GENERIC_READ,
                                                                         nullptr,
                                                                         nullptr,
                                                                         ResourceManager::ResourceStateTrackingType::NO_TRACKING);

    return noiseTexture;
}

void
AmbientOcclusionCommandListRecorder::InitShaderResourceViews(ID3D12Resource& noiseTexture,
                                                             const std::uint32_t sampleKernelSize) noexcept
{
    BRE_ASSERT(mSampleKernelUploadBuffer != nullptr);
    BRE_ASSERT(sampleKernelSize != 0U);

    ID3D12Resource* resources[] =
    {
        &mSampleKernelUploadBuffer->GetResource(),
        &noiseTexture,
    };

    D3D12_SHADER_RESOURCE_VIEW_DESC srvDescriptors[_countof(resources)]{};

    // Fill sample kernel buffer descriptor
    srvDescriptors[0].Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srvDescriptors[0].Format = mSampleKernelUploadBuffer->GetResource().GetDesc().Format;
    srvDescriptors[0].ViewDimension = D3D12_SRV_DIMENSION_BUFFER;
    srvDescriptors[0].Buffer.FirstElement = 0UL;
    srvDescriptors[0].Buffer.NumElements = sampleKernelSize;
    srvDescriptors[0].Buffer.StructureByteStride = sizeof(XMFLOAT4);

    // Fill kernel noise texture descriptor
    srvDescriptors[1].Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srvDescriptors[1].ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
    srvDescriptors[1].Texture2D.MostDetailedMip = 0;
    srvDescriptors[1].Texture2D.ResourceMinLODClamp = 0.0f;
    srvDescriptors[1].Format = noiseTexture.GetDesc().Format;
    srvDescriptors[1].Texture2D.MipLevels = noiseTexture.GetDesc().MipLevels;

    BRE_ASSERT(_countof(resources) == _countof(srvDescriptors));

    mPixelShaderResourceViewsBegin = CbvSrvUavDescriptorManager::CreateShaderResourceViews(resources,
                                                                                           srvDescriptors,
                                                                                           _countof(srvDescriptors));
}

void
AmbientOcclusionCommandListRecorder::InitAmbientOcclusionCBuffer() noexcept
{
    const std::size_t ambientOcclusionUploadCBufferElemSize =
        UploadBuffer::GetRoundedConstantBufferSizeInBytes(sizeof(AmbientOcclusionCBuffer));

    mAmbientOcclusionUploadCBuffer = &UploadBufferManager::CreateUploadBuffer(ambientOcclusionUploadCBufferElemSize,
                                                                              1U);
    AmbientOcclusionCBuffer ambientOcclusionCBuffer(static_cast<float>(ApplicationSettings::sWindowWidth),
                                                    static_cast<float>(ApplicationSettings::sWindowHeight),
                                                    AmbientOcclusionSettings::sSampleKernelSize,
                                                    AmbientOcclusionSettings::sNoiseTextureDimension,
                                                    AmbientOcclusionSettings::sOcclusionRadius,
                                                    AmbientOcclusionSettings::sSsaoPower);

    mAmbientOcclusionUploadCBuffer->CopyData(0U, &ambientOcclusionCBuffer, sizeof(AmbientOcclusionCBuffer));
}

}
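One detail of RecordAndPushCommandLists worth noting: no vertex or index buffer is bound, yet DrawInstanced(6U, 1U, 0U, 0U) draws six vertices. The vertex shader presumably derives a fullscreen quad (two triangles) from SV_VertexID, a common pattern for post-processing passes. A sketch of the usual id-to-corner mapping (an assumption about the shader, not BRE's actual code):

```cpp
#include <array>
#include <cassert>

struct Float2 { float x, y; };

// The common SV_VertexID trick: six vertices forming two triangles
// that cover the whole screen in NDC ([-1, 1] in x and y).
Float2 FullscreenQuadVertex(const unsigned vertexId)
{
    // Triangle 0: (-1,1) (1,1) (-1,-1); Triangle 1: (-1,-1) (1,1) (1,-1)
    constexpr std::array<Float2, 6U> corners{ {
        { -1.0f,  1.0f }, { 1.0f, 1.0f }, { -1.0f, -1.0f },
        { -1.0f, -1.0f }, { 1.0f, 1.0f }, {  1.0f, -1.0f },
    } };
    return corners[vertexId];
}
```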

We will now explain in detail each step involved in the ambient occlusion pass. First, we need to generate a sample kernel. These samples are distributed over the hemisphere oriented along the surface normal. The hemisphere radius R is a parameter that should be appropriate to the scale of the scene. The kernel can be generated on the CPU side. The source code to do that is the following

///
/// @brief Generates sample kernel
///
/// Sample kernel for ambient occlusion. The requirements are that:
/// - Sample positions fall within the unit hemisphere oriented
///   toward positive z axis.
/// - Sample positions are more densely clustered towards the origin.
///   This effectively attenuates the occlusion contribution
///   according to distance from the sample kernel centre (samples closer
///   to a point occlude it more than samples further away).
///
/// @param sampleKernelSize Size of the sample kernel to generate
/// @param sampleKernel Output sample kernel list
///
void
GenerateSampleKernel(const std::uint32_t sampleKernelSize,
                     std::vector<XMFLOAT4>& sampleKernel)
{
    BRE_ASSERT(sampleKernelSize > 0U);

    sampleKernel.reserve(sampleKernelSize);
    const float sampleKernelSizeFloat = static_cast<float>(sampleKernelSize);
    XMVECTOR vec;
    for (std::uint32_t i = 0U; i < sampleKernelSize; ++i) {
        const float x = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        const float y = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        const float z = MathUtils::RandomFloatInInterval(0.0f, 1.0f);
        sampleKernel.push_back(XMFLOAT4(x, y, z, 0.0f));
        XMFLOAT4& currentSample = sampleKernel.back();
        vec = XMLoadFloat4(&currentSample);
        vec = XMVector4Normalize(vec);
        XMStoreFloat4(&currentSample, vec);

        // Accelerating interpolation function: samples fall off
        // with distance from the origin.
        float scale = i / sampleKernelSizeFloat;
        scale = MathUtils::Lerp(0.1f, 1.0f, scale * scale);
        vec = XMVectorScale(vec, scale);
        XMStoreFloat4(&currentSample, vec);
    }
}
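The key line here is scale = MathUtils::Lerp(0.1f, 1.0f, scale * scale): squaring the interpolation parameter keeps the early samples close to the origin while the later ones approach the full kernel radius, which is exactly the clustering requirement stated above. The same falloff, isolated from the engine (assuming the conventional Lerp definition):

```cpp
#include <cassert>
#include <cstdint>

// The same accelerating falloff used in GenerateSampleKernel:
// Lerp(0.1, 1.0, t * t) with t = i / kernelSize. Squaring t makes the
// scale grow slowly at first, clustering samples near the origin.
float SampleScale(const std::uint32_t index, const std::uint32_t kernelSize)
{
    const float t = static_cast<float>(index) / static_cast<float>(kernelSize);
    return 0.1f + (1.0f - 0.1f) * (t * t);
}
```

For a 16-sample kernel, the first sample is scaled down to 0.1 and the middle one only to about 0.325, so well over half of the samples end up inside half of the occlusion radius.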

void
AmbientOcclusionCommandListRecorder::CreateSampleKernelBuffer(const std::vector<XMFLOAT4>& sampleKernel) noexcept
{
    BRE_ASSERT(mSampleKernelUploadBuffer == nullptr);
    BRE_ASSERT(sampleKernel.empty() == false);

    const std::uint32_t sampleKernelSize = static_cast<std::uint32_t>(sampleKernel.size());

    const std::size_t sampleKernelBufferElemSize{ sizeof(XMFLOAT4) };
    mSampleKernelUploadBuffer = &UploadBufferManager::CreateUploadBuffer(sampleKernelBufferElemSize,
                                                                         sampleKernelSize);
    const std::uint8_t* sampleKernelPtr = reinterpret_cast<const std::uint8_t*>(sampleKernel.data());
    for (std::uint32_t i = 0UL; i < sampleKernelSize; ++i) {
        mSampleKernelUploadBuffer->CopyData(i,
                                            sampleKernelPtr + sampleKernelBufferElemSize * i,
                                            sampleKernelBufferElemSize);
    }
}

We cannot generate a lot of samples for performance reasons, but reducing the number of samples produces banding artifacts in the result. This problem can be addressed by randomly rotating the sample kernel at each pixel. To achieve this, we generate a noise texture containing random float3 values that are used to rotate the sample kernel. This effectively increases the sample count and minimizes the banding artifacts. The source code to do that is the following

///
/// @brief Generates noise vector
///
/// Generate a set of random values used to rotate the sample kernel,
/// which will effectively increase the sample count and minimize
/// the 'banding' artifacts.
///
/// @param numSamples Number of samples to generate
/// @param noiseVector Output noise vector
///
void
GenerateNoise(const std::uint32_t numSamples,
              std::vector<XMFLOAT4>& noiseVector)
{
    BRE_ASSERT(numSamples > 0U);

    noiseVector.reserve(numSamples);
    XMVECTOR vec;
    for (std::uint32_t i = 0U; i < numSamples; ++i) {
        const float x = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        const float y = MathUtils::RandomFloatInInterval(-1.0f, 1.0f);
        // The z component must be zero. Since our kernel is oriented along the z-axis,
        // we want the random rotation to occur around that axis.
        const float z = 0.0f;
        noiseVector.push_back(XMFLOAT4(x, y, z, 0.0f));
        XMFLOAT4& currentSample = noiseVector.back();
        vec = XMLoadFloat4(&currentSample);
        vec = XMVector4Normalize(vec);
        XMStoreFloat4(&currentSample, vec);

        // Map from [-1.0f, 1.0f] to [0.0f, 1.0f] because
        // this is going to be stored in a texture
        currentSample.x = currentSample.x * 0.5f + 0.5f;
        currentSample.y = currentSample.y * 0.5f + 0.5f;
        currentSample.z = currentSample.z * 0.5f + 0.5f;
    }
}
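The final * 0.5f + 0.5f exists only because the noise is stored in a UNORM texture, whose channels hold values in [0, 1]; the pixel shader that samples it has to apply the inverse mapping, value * 2.0f - 1.0f, to recover the signed vector. The round trip, isolated:

```cpp
#include <cassert>

// Pack a signed [-1, 1] component into the [0, 1] range of a UNORM
// texel, and the inverse mapping the sampling shader must apply.
float PackUnorm(const float value) { return value * 0.5f + 0.5f; }
float UnpackUnorm(const float value) { return value * 2.0f - 1.0f; }
```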

ID3D12Resource*
AmbientOcclusionCommandListRecorder::CreateAndGetNoiseTexture(const std::vector<XMFLOAT4>& noiseVector) noexcept
{
    BRE_ASSERT(noiseVector.empty() == false);

    const std::uint32_t noiseVectorCount = static_cast<std::uint32_t>(noiseVector.size());

    // Kernel noise resource and fill it
    D3D12_RESOURCE_DESC resourceDescriptor = {};
    resourceDescriptor.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
    resourceDescriptor.Alignment = 0U;
    resourceDescriptor.Width = noiseVectorCount;
    resourceDescriptor.Height = noiseVectorCount;
    resourceDescriptor.DepthOrArraySize = 1U;
    resourceDescriptor.MipLevels = 1U;
    resourceDescriptor.SampleDesc.Count = 1U;
    resourceDescriptor.SampleDesc.Quality = 0U;
    resourceDescriptor.Layout = D3D12_TEXTURE_LAYOUT_UNKNOWN;
    resourceDescriptor.Format = DXGI_FORMAT_R16G16B16A16_UNORM;
    resourceDescriptor.Flags = D3D12_RESOURCE_FLAG_NONE;

    // Create noise texture and fill it.
    ID3D12Resource* noiseTexture{ nullptr };
    CD3DX12_HEAP_PROPERTIES heapProps{ D3D12_HEAP_TYPE_DEFAULT };
    noiseTexture = &ResourceManager::CreateCommittedResource(heapProps,
                                                             D3D12_HEAP_FLAG_NONE,
                                                             resourceDescriptor,
                                                             D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
                                                             nullptr,
                                                             L"Noise Buffer");

    // In order to copy CPU memory data into our default buffer, we need to create
    // an intermediate upload heap.
    const std::uint32_t num2DSubresources = resourceDescriptor.DepthOrArraySize * resourceDescriptor.MipLevels;
    const std::size_t uploadBufferSize = GetRequiredIntermediateSize(noiseTexture, 0, num2DSubresources);
    ID3D12Resource* noiseTextureUploadBuffer{ nullptr };
    noiseTextureUploadBuffer = &ResourceManager::CreateCommittedResource(CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
                                                                         D3D12_HEAP_FLAG_NONE,
                                                                         CD3DX12_RESOURCE_DESC::Buffer(uploadBufferSize),
                                                                         D3D12_RESOURCE_STATE_GENERIC_READ,
                                                                         nullptr,
                                                                         nullptr);

    return noiseTexture;
}
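This also answers why the pass reads the normal buffer: in the pixel shader, the sampled noise vector and the surface normal are combined via Gram-Schmidt into a TBN basis that reorients the z-aligned kernel into the hemisphere around the normal, applying a per-pixel random rotation about it. BRE's shader is not listed here; a CPU-side sketch of that construction (illustrative names, not the engine's code):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 Sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3 Scale(const Vec3& v, const float s) { return { v.x * s, v.y * s, v.z * s }; }
float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 Normalize(const Vec3& v)
{
    const float invLength = 1.0f / std::sqrt(Dot(v, v));
    return Scale(v, invLength);
}

Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Gram-Schmidt: project the noise vector onto the plane perpendicular
// to the normal to get a tangent, then complete the basis with the
// bitangent. Multiplying a z-aligned kernel sample by (T, B, N)
// reorients it around the normal, rotated by the noise vector.
void BuildTBN(const Vec3& normal, const Vec3& noise,
              Vec3& tangent, Vec3& bitangent)
{
    tangent = Normalize(Sub(noise, Scale(normal, Dot(noise, normal))));
    bitangent = Cross(normal, tangent);
}
```

Because the noise texture is small (for example 4x4) and tiled across the screen, neighbouring pixels get different rotations, which is what trades the banding for high-frequency noise that the subsequent blur pass removes.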

AmbientOcclusionCBuffer.h

#pragma once

#include <cstdint>

#include <ApplicationSettings\ApplicationSettings.h>
#include <EnvironmentLightPass\EnvironmentLightSettings.h>
#include <Utils\DebugUtils.h>

namespace BRE {
///
/// @brief Ambient occlusion constant buffer
///
struct AmbientOcclusionCBuffer {
    AmbientOcclusionCBuffer() = default;

    ///
    /// @brief AmbientOcclusionCBuffer constructor
    /// @param screenWidth Screen width
    /// @param screenHeight Screen height
    /// @param sampleKernelSize Number of samples in the kernel
    /// @param noiseTextureDimension Noise vectors texture dimensions (e.g. 4 (4x4), 8 (8x8))
    /// @param occlusionRadius Radius around the fragment of the geometry to take into account
    /// for ambient occlusion
    /// @param ssaoPower Power to sharpen the contrast in ambient occlusion
    ///
    AmbientOcclusionCBuffer(const float screenWidth,
                            const float screenHeight,
                            const std::uint32_t sampleKernelSize,
                            const std::uint32_t noiseTextureDimension,
                            const float occlusionRadius,
                            const float ssaoPower)
        : mScreenWidth(screenWidth)
        , mScreenHeight(screenHeight)
        , mSampleKernelSize(sampleKernelSize)
        , mNoiseTextureDimension(noiseTextureDimension)
        , mOcclusionRadius(occlusionRadius)
        , mSsaoPower(ssaoPower)
    {
        BRE_ASSERT(screenWidth > 0.0f);
        BRE_ASSERT(screenHeight > 0.0f);
        BRE_ASSERT(sampleKernelSize > 0U);
        BRE_ASSERT(noiseTextureDimension > 0U);
        BRE_ASSERT(occlusionRadius > 0.0f);
        BRE_ASSERT(ssaoPower > 0.0f);
    }

    ~AmbientOcclusionCBuffer() = default;
    AmbientOcclusionCBuffer(const AmbientOcclusionCBuffer&) = default;
    AmbientOcclusionCBuffer(AmbientOcclusionCBuffer&&) = default;
    AmbientOcclusionCBuffer& operator=(AmbientOcclusionCBuffer&&) = default;

    float mScreenWidth{ static_cast<float>(ApplicationSettings::sWindowWidth) };
    float mScreenHeight{ static_cast<float>(ApplicationSettings::sWindowHeight) };
    std::uint32_t mSampleKernelSize{ EnvironmentLightSettings::sSampleKernelSize };
    std::uint32_t mNoiseTextureDimension{ EnvironmentLightSettings::sNoiseTextureDimension };
    float mOcclusionRadius{ EnvironmentLightSettings::sOcclusionRadius };
    float mSsaoPower{ EnvironmentLightSettings::sSsaoPower };
};

}

The Crysis implementation used a 4×4 noise texture tiled over the screen, which causes the orientation of the kernel to repeat. Because the texture is small, the repetition occurs at a high frequency. To remove that high frequency, we can add a blurring step that preserves the low-frequency detail of the image (this is done by BlurCommandListRecorder, which will be explained later). This is much cheaper than generating a noise texture of screen width × screen height.
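The tiled noise texture can be generated on the CPU as a small grid of random rotation vectors lying in the tangent plane (z = 0), stored in [0, 1] so they fit an UNORM texture; the pixel shader later remaps them back to [-1, 1]. A minimal sketch under those assumptions (`GenerateNoiseVectors` and `Float4` are hypothetical names, not BRE's exact code):

```cpp
#include <cstdint>
#include <random>
#include <vector>

struct Float4 { float x, y, z, w; };

// Hypothetical helper: builds the dimension x dimension grid of random
// rotation vectors used to tilt the sample kernel per fragment.
// Vectors live in the XY (tangent) plane and are remapped to [0, 1]
// for UNORM storage; the shader undoes the remap with * 2 - 1.
std::vector<Float4> GenerateNoiseVectors(const std::uint32_t dimension)
{
    std::mt19937 generator{ 2017U };
    std::uniform_real_distribution<float> distribution{ -1.0f, 1.0f };

    std::vector<Float4> noiseVectors(dimension * dimension);
    for (Float4& noiseVector : noiseVectors) {
        noiseVector.x = distribution(generator) * 0.5f + 0.5f;
        noiseVector.y = distribution(generator) * 0.5f + 0.5f;
        noiseVector.z = 0.5f; // becomes z = 0 after the shader's remap
        noiseVector.w = 0.0f;
    }

    return noiseVectors;
}
```

This grid would then be uploaded through the default/upload heap pair shown above.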

Regarding the shader implementation, we use a vertex shader and a pixel shader. We send a triangle list of 2 triangles, and in the vertex shader, we expand these triangles to cover a full-screen quad in NDC space. The code is the following

VertexShader

#include <ShaderUtils/CBuffers.hlsli>

#include "RS.hlsl"

struct Input {
    uint mVertexId : SV_VertexID;
};

static const float2 gQuadUVs[6] = {
    float2(0.0f, 1.0f),
    float2(0.0f, 0.0f),
    float2(1.0f, 0.0f),
    float2(0.0f, 1.0f),
    float2(1.0f, 0.0f),
    float2(1.0f, 1.0f)
};

ConstantBuffer<FrameCBuffer> gFrameCBuffer : register(b0);

struct Output {
    float4 mPositionNDC : SV_POSITION;
    float3 mRayViewSpace : VIEW_RAY;
    float2 mUV : TEXCOORD;
};

[RootSignature(RS)]
Output main(in const Input input)
{
    Output output;

    output.mUV = gQuadUVs[input.mVertexId];

    // Quad covering screen in NDC space ([-1.0, 1.0] x [-1.0, 1.0] x [0.0, 1.0] x [1.0])
    output.mPositionNDC = float4(2.0f * output.mUV.x - 1.0f,
                                 1.0f - 2.0f * output.mUV.y,
                                 0.0f,
                                 1.0f);

    // Transform current quad corner to view space.
    const float4 ph = mul(output.mPositionNDC,
                          gFrameCBuffer.mInverseProjectionMatrix);
    output.mRayViewSpace = ph.xyz / ph.w;

    return output;
}

The noise texture is sampled in the pixel shader, scaled according to its dimensions so it tiles over the screen. In our case, the screen is 1920×1080 and the noise texture is 4×4. We also need to build a matrix that reorients the sample kernel; we choose to orient it along the fragment's normal, which is why we also need the geometry buffer that stores the normal vectors. The pixel shader source code is the following
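The sample kernel itself (the `SampleKernelBuffer` read by the pixel shader) can be built following John Chapman's recipe: random unit vectors in the z ≥ 0 hemisphere, scaled so most samples cluster near the origin. A hedged CPU sketch (`GenerateSampleKernel` is a hypothetical helper; BRE's actual generation code is not shown in this post):

```cpp
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

struct Float4 { float x, y, z, w; };

// Hypothetical helper following John Chapman's approach: random unit
// vectors in the z >= 0 hemisphere, with an accelerating scale so most
// samples sit close to the fragment being shaded.
std::vector<Float4> GenerateSampleKernel(const std::uint32_t kernelSize)
{
    std::mt19937 generator{ 2017U };
    std::uniform_real_distribution<float> symmetric{ -1.0f, 1.0f };
    std::uniform_real_distribution<float> positive{ 0.0f, 1.0f };

    std::vector<Float4> kernel(kernelSize);
    for (std::uint32_t i = 0U; i < kernelSize; ++i) {
        Float4& sample = kernel[i];
        sample.x = symmetric(generator);
        sample.y = symmetric(generator);
        sample.z = positive(generator); // hemisphere: z >= 0
        sample.w = 0.0f;

        const float length = std::sqrt(sample.x * sample.x +
                                       sample.y * sample.y +
                                       sample.z * sample.z);
        // Normalize, then scale so samples cluster toward the origin:
        // scale = lerp(0.1, 1.0, (i / kernelSize)^2)
        float scale = static_cast<float>(i) / static_cast<float>(kernelSize);
        scale = 0.1f + scale * scale * 0.9f;
        sample.x = sample.x / length * scale;
        sample.y = sample.y / length * scale;
        sample.z = sample.z / length * scale;
    }

    return kernel;
}
```

The quadratic falloff of `scale` is what concentrates occlusion detail near the shaded fragment instead of spreading samples uniformly through the hemisphere.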

AmbientOcclusionCBuffer.hlsli

#ifndef AMBIENT_OCCLUSION_CBUFFER_H
#define AMBIENT_OCCLUSION_CBUFFER_H

struct AmbientOcclusionCBuffer {
    float mScreenWidth;
    float mScreenHeight;
    uint mSampleKernelSize;
    uint mNoiseTextureDimension;
    float mOcclusionRadius;
    float mSsaoPower;
};

#endif

PixelShader

#include <EnvironmentLightPass/Shaders/AmbientOcclusionCBuffer.hlsli>
#include <ShaderUtils/CBuffers.hlsli>
#include <ShaderUtils/Utils.hlsli>

#include "RS.hlsl"

//#define SKIP_AMBIENT_OCCLUSION

struct Input {
    float4 mPositionNDC : SV_POSITION;
    float3 mRayViewSpace : VIEW_RAY;
    float2 mUV : TEXCOORD;
};

ConstantBuffer<FrameCBuffer> gFrameCBuffer : register(b0);
ConstantBuffer<AmbientOcclusionCBuffer> gAmbientOcclusionCBuffer : register(b1);

SamplerState TextureSampler : register (s0);

Texture2D<float4> Normal_SmoothnessTexture : register (t0);
Texture2D<float> DepthTexture : register (t1);
StructuredBuffer<float4> SampleKernelBuffer : register(t2);
Texture2D<float4> NoiseTexture : register (t3);

struct Output {
    float mAmbientAccessibility : SV_Target0;
};

[RootSignature(RS)]
Output main(const in Input input)
{
    Output output = (Output)0;

#ifdef SKIP_AMBIENT_OCCLUSION
    output.mAmbientAccessibility = 1.0f;
#else
    const float2 noiseScale =
        float2(gAmbientOcclusionCBuffer.mScreenWidth / gAmbientOcclusionCBuffer.mNoiseTextureDimension,
               gAmbientOcclusionCBuffer.mScreenHeight / gAmbientOcclusionCBuffer.mNoiseTextureDimension);

    const int3 fragmentPositionNDC = int3(input.mPositionNDC.xy, 0);

    const float fragmentZNDC = DepthTexture.Load(fragmentPositionNDC);
    const float3 rayViewSpace = normalize(input.mRayViewSpace);
    const float4 fragmentPositionViewSpace = float4(ViewRayToViewPosition(rayViewSpace,
                                                                          fragmentZNDC,
                                                                          gFrameCBuffer.mProjectionMatrix),
                                                    1.0f);

    const float2 normal = Normal_SmoothnessTexture.Load(fragmentPositionNDC).xy;
    const float3 normalViewSpace = normalize(Decode(normal));

    // Build a matrix to reorient the sample kernel
    // along current fragment normal vector.
    const float3 noiseVec = NoiseTexture.SampleLevel(TextureSampler,
                                                     noiseScale * input.mUV,
                                                     0).xyz * 2.0f - 1.0f;
    const float3 tangentViewSpace = normalize(noiseVec - normalViewSpace * dot(noiseVec, normalViewSpace));
    const float3 bitangentViewSpace = normalize(cross(normalViewSpace, tangentViewSpace));
    const float3x3 sampleKernelRotationMatrix = float3x3(tangentViewSpace,
                                                         bitangentViewSpace,
                                                         normalViewSpace);

    float occlusionSum = 0.0f;
    for (uint i = 0U; i < gAmbientOcclusionCBuffer.mSampleKernelSize; ++i) {
        // Rotate sample and get sample position in view space
        float4 rotatedSample = float4(mul(SampleKernelBuffer[i].xyz, sampleKernelRotationMatrix), 0.0f);
        float4 samplePositionViewSpace =
            fragmentPositionViewSpace + rotatedSample * gAmbientOcclusionCBuffer.mOcclusionRadius;

        float4 samplePositionNDC = mul(samplePositionViewSpace,
                                       gFrameCBuffer.mProjectionMatrix);
        samplePositionNDC.xy /= samplePositionNDC.w;

        const int2 samplePositionScreenSpace = NdcToScreenSpace(samplePositionNDC.xy,
                                                                0.0f,
                                                                0.0f,
                                                                gAmbientOcclusionCBuffer.mScreenWidth,
                                                                gAmbientOcclusionCBuffer.mScreenHeight);

        const bool isOutsideScreenBorders =
            samplePositionScreenSpace.x < 0.0f ||
            samplePositionScreenSpace.x > gAmbientOcclusionCBuffer.mScreenWidth ||
            samplePositionScreenSpace.y < 0.0f ||
            samplePositionScreenSpace.y > gAmbientOcclusionCBuffer.mScreenHeight;

        if (isOutsideScreenBorders == false) {
            float sampleZNDC = DepthTexture.Load(int3(samplePositionScreenSpace, 0));

            const float sampleZViewSpace = NdcZToScreenSpaceZ(sampleZNDC,
                                                              gFrameCBuffer.mProjectionMatrix);

            const float rangeCheck =
                abs(fragmentPositionViewSpace.z - sampleZViewSpace) <
                gAmbientOcclusionCBuffer.mOcclusionRadius ? 1.0f : 0.0f;
            occlusionSum += (sampleZViewSpace <= samplePositionViewSpace.z ? 1.0f : 0.0f) * rangeCheck;
        }
    }

    output.mAmbientAccessibility = 1.0f - (occlusionSum / gAmbientOcclusionCBuffer.mSampleKernelSize);
#endif

    // Sharpen the contrast
    output.mAmbientAccessibility = saturate(pow(abs(output.mAmbientAccessibility),
                                                gAmbientOcclusionCBuffer.mSsaoPower));

    return output;
}
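The per-sample test in the loop above can be mirrored on the CPU for intuition: a sample contributes occlusion when the depth buffer records a surface in front of the sample position, gated by a range check so distant surfaces do not bleed occlusion onto the fragment. This sketch assumes the shader's depth convention (smaller view-space z means closer to the camera) and uses a hypothetical function name:

```cpp
#include <cmath>
#include <cstdint>

// CPU mirror of the shader's per-sample occlusion test: sample i occludes
// when the surface read from the depth buffer (sampleZViewSpace[i]) is in
// front of the sample position, and only if that surface lies within
// occlusionRadius of the shaded fragment (the range check).
float ComputeAmbientAccessibility(const float* sampleZViewSpace,
                                  const float* samplePositionZViewSpace,
                                  const std::uint32_t sampleCount,
                                  const float fragmentZViewSpace,
                                  const float occlusionRadius,
                                  const float ssaoPower)
{
    float occlusionSum = 0.0f;
    for (std::uint32_t i = 0U; i < sampleCount; ++i) {
        const float rangeCheck =
            std::fabs(fragmentZViewSpace - sampleZViewSpace[i]) < occlusionRadius
            ? 1.0f : 0.0f;
        occlusionSum +=
            (sampleZViewSpace[i] <= samplePositionZViewSpace[i] ? 1.0f : 0.0f) * rangeCheck;
    }

    const float accessibility = 1.0f - occlusionSum / static_cast<float>(sampleCount);
    return std::pow(accessibility, ssaoPower); // contrast sharpening
}
```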

Finally, as we mentioned, we need to apply a blurring step to the ambient accessibility buffer to reduce the high-frequency noise artifacts. In BRE, a command list recorder named BlurCommandListRecorder generates the command list for the blur step and pushes it to the CommandListExecutor to be executed. Its input texture is the ambient accessibility texture generated and written by the AmbientOcclusionCommandListRecorder. Its result is the blurred ambient accessibility texture that is consumed by the EnvironmentLightCommandListRecorder. Its implementation is the following

BlurCommandListRecorder.h

#pragma once

#include <CommandManager\CommandListPerFrame.h>
#include <ResourceManager\UploadBuffer.h>

namespace BRE {
///
/// @brief Responsible of recording command lists that apply blur
///
class BlurCommandListRecorder {
public:
    BlurCommandListRecorder() = default;
    ~BlurCommandListRecorder() = default;
    BlurCommandListRecorder(const BlurCommandListRecorder&) = delete;
    const BlurCommandListRecorder& operator=(const BlurCommandListRecorder&) = delete;
    BlurCommandListRecorder(BlurCommandListRecorder&&) = default;
    BlurCommandListRecorder& operator=(BlurCommandListRecorder&&) = default;

    ///
    /// @brief Initializes pipeline state object and root signature
    ///
    /// This method must be called at the beginning of the application, and once
    ///
    static void InitSharedPSOAndRootSignature() noexcept;

    ///
    /// @brief Initialize the recorder
    ///
    /// This method must be called after InitSharedPSOAndRootSignature
    ///
    /// @param ambientAccessibilityBufferShaderResourceView Shader resource view to
    /// the ambient accessibility buffer
    /// @param outputAmbientAccessibilityBufferRenderTargetView Render target view to
    /// the blurred ambient accessibility buffer
    ///
    void Init(const D3D12_GPU_DESCRIPTOR_HANDLE& ambientAccessibilityBufferShaderResourceView,
              const D3D12_CPU_DESCRIPTOR_HANDLE& outputAmbientAccessibilityBufferRenderTargetView) noexcept;

    ///
    /// @brief Records command lists and pushes them into CommandListExecutor
    ///
    /// Init() must be called first
    ///
    /// @return The number of pushed command lists
    ///
    std::uint32_t RecordAndPushCommandLists() noexcept;

    ///
    /// @brief Checks if internal data is valid. Typically, used for assertions
    /// @return True if valid. Otherwise, false
    ///
    bool IsDataValid() const noexcept;

private:
    ///
    /// @brief Initialize blur constant buffer
    ///
    void InitBlurCBuffer() noexcept;

    CommandListPerFrame mCommandListPerFrame;

    D3D12_GPU_DESCRIPTOR_HANDLE mAmbientAccessibilityBufferShaderResourceView{ 0UL };
    D3D12_CPU_DESCRIPTOR_HANDLE mOutputAmbientAccessibilityBufferRenderTargetView{ 0UL };

    UploadBuffer* mBlurUploadCBuffer{ nullptr };
};
}

BlurCommandListRecorder.cpp

#include "BlurCommandListRecorder.h"

#include <d3d12.h>
#include <DirectXMath.h>

#include <AmbientOcclusionPass\AmbientOcclusionSettings.h>
#include <AmbientOcclusionPass\Shaders\BlurCBuffer.h>
#include <ApplicationSettings\ApplicationSettings.h>
#include <CommandListExecutor\CommandListExecutor.h>
#include <DescriptorManager\CbvSrvUavDescriptorManager.h>
#include <PSOManager/PSOManager.h>
#include <ResourceManager/UploadBufferManager.h>
#include <RootSignatureManager\RootSignatureManager.h>
#include <ShaderManager\ShaderManager.h>
#include <Utils/DebugUtils.h>

namespace BRE {
// Root Signature:
// "CBV(b0, visibility = SHADER_VISIBILITY_PIXEL), " \ 0 -> Blur CBuffer
// "DescriptorTable(SRV(t0), visibility = SHADER_VISIBILITY_PIXEL)" 1 -> Color Buffer Texture

namespace {
ID3D12PipelineState* sPSO{ nullptr };
ID3D12RootSignature* sRootSignature{ nullptr };
}

void
BlurCommandListRecorder::InitSharedPSOAndRootSignature() noexcept
{
    BRE_ASSERT(sPSO == nullptr);
    BRE_ASSERT(sRootSignature == nullptr);

    PSOManager::PSOCreationData psoData{};
    psoData.mDepthStencilDescriptor = D3DFactory::GetDisabledDepthStencilDesc();

    psoData.mPixelShaderBytecode = ShaderManager::LoadShaderFileAndGetBytecode("AmbientOcclusionPass/Shaders/Blur/PS.cso");
    psoData.mVertexShaderBytecode = ShaderManager::LoadShaderFileAndGetBytecode("AmbientOcclusionPass/Shaders/Blur/VS.cso");

    ID3DBlob* rootSignatureBlob = &ShaderManager::LoadShaderFileAndGetBlob("AmbientOcclusionPass/Shaders/Blur/RS.cso");
    psoData.mRootSignature = &RootSignatureManager::CreateRootSignatureFromBlob(*rootSignatureBlob);
    sRootSignature = psoData.mRootSignature;

    psoData.mNumRenderTargets = 1U;
    psoData.mRenderTargetFormats[0U] = DXGI_FORMAT_R16_UNORM;
    for (std::size_t i = psoData.mNumRenderTargets; i < _countof(psoData.mRenderTargetFormats); ++i) {
        psoData.mRenderTargetFormats[i] = DXGI_FORMAT_UNKNOWN;
    }
    psoData.mPrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;

    sPSO = &PSOManager::CreateGraphicsPSO(psoData);

    BRE_ASSERT(sPSO != nullptr);
    BRE_ASSERT(sRootSignature != nullptr);
}

void
BlurCommandListRecorder::Init(const D3D12_GPU_DESCRIPTOR_HANDLE& ambientAccessibilityBufferShaderResourceView,
                              const D3D12_CPU_DESCRIPTOR_HANDLE& outputAmbientAccessibilityBufferRenderTargetView) noexcept
{
    BRE_ASSERT(IsDataValid() == false);

    mAmbientAccessibilityBufferShaderResourceView = ambientAccessibilityBufferShaderResourceView;
    mOutputAmbientAccessibilityBufferRenderTargetView = outputAmbientAccessibilityBufferRenderTargetView;

    InitBlurCBuffer();

    BRE_ASSERT(IsDataValid());
}

std::uint32_t
BlurCommandListRecorder::RecordAndPushCommandLists() noexcept
{
    BRE_ASSERT(IsDataValid());
    BRE_ASSERT(sPSO != nullptr);
    BRE_ASSERT(sRootSignature != nullptr);

    ID3D12GraphicsCommandList& commandList = mCommandListPerFrame.ResetCommandListWithNextCommandAllocator(sPSO);

    commandList.RSSetViewports(1U, &ApplicationSettings::sScreenViewport);
    commandList.RSSetScissorRects(1U, &ApplicationSettings::sScissorRect);
    commandList.OMSetRenderTargets(1U,
                                   &mOutputAmbientAccessibilityBufferRenderTargetView,
                                   false,
                                   nullptr);

    ID3D12DescriptorHeap* heaps[] = { &CbvSrvUavDescriptorManager::GetDescriptorHeap() };
    commandList.SetDescriptorHeaps(_countof(heaps), heaps);

    const D3D12_GPU_VIRTUAL_ADDRESS blurCBufferGpuVAddress(
        mBlurUploadCBuffer->GetResource().GetGPUVirtualAddress());

    commandList.SetGraphicsRootSignature(sRootSignature);
    commandList.SetGraphicsRootConstantBufferView(0U, blurCBufferGpuVAddress);
    commandList.SetGraphicsRootDescriptorTable(1U, mAmbientAccessibilityBufferShaderResourceView);

    commandList.IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    commandList.DrawInstanced(6U, 1U, 0U, 0U);

    commandList.Close();
    CommandListExecutor::Get().PushCommandList(commandList);

    return 1U;
}

bool
BlurCommandListRecorder::IsDataValid() const noexcept
{
    const bool result =
        mAmbientAccessibilityBufferShaderResourceView.ptr != 0UL &&
        mOutputAmbientAccessibilityBufferRenderTargetView.ptr != 0UL;

    return result;
}

void
BlurCommandListRecorder::InitBlurCBuffer() noexcept
{
    const std::size_t blurUploadCBufferElemSize =
        UploadBuffer::GetRoundedConstantBufferSizeInBytes(sizeof(BlurCBuffer));

    mBlurUploadCBuffer = &UploadBufferManager::CreateUploadBuffer(blurUploadCBufferElemSize,
                                                                  1U);
    BlurCBuffer blurCBuffer(AmbientOcclusionSettings::sNoiseTextureDimension);

    mBlurUploadCBuffer->CopyData(0U, &blurCBuffer, sizeof(BlurCBuffer));
}
}
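InitBlurCBuffer rounds `sizeof(BlurCBuffer)` up because Direct3D 12 requires constant buffer views to be sized in multiples of 256 bytes (`D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT`). A sketch of the rounding a helper like `UploadBuffer::GetRoundedConstantBufferSizeInBytes` presumably performs (the exact BRE implementation is not shown here):

```cpp
#include <cstddef>

// Round a size up to D3D12's 256-byte constant buffer alignment
// (256 is a power of two, so masking the low bits after adding 255
// rounds up without branching).
constexpr std::size_t RoundConstantBufferSize(const std::size_t sizeInBytes)
{
    return (sizeInBytes + 255U) & ~static_cast<std::size_t>(255U);
}
```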

BlurCBuffer.h

#pragma once

#include <cstdint>

#include <EnvironmentLightPass\EnvironmentLightSettings.h>

namespace BRE {
///
/// @brief Blur constant buffer
///
struct BlurCBuffer {
    BlurCBuffer() = default;

    ///
    /// @brief BlurCBuffer constructor
    /// @param noiseTextureDimension Noise vectors texture dimensions (e.g. 4 (4x4), 8 (8x8)).
    /// These vectors were used in the ambient occlusion algorithm
    ///
    BlurCBuffer(const std::uint32_t noiseTextureDimension)
        : mNoiseTextureDimension(noiseTextureDimension)
    {

    }

    ~BlurCBuffer() = default;
    BlurCBuffer(const BlurCBuffer&) = default;
    BlurCBuffer(BlurCBuffer&&) = default;
    BlurCBuffer& operator=(BlurCBuffer&&) = default;

    std::uint32_t mNoiseTextureDimension{ EnvironmentLightSettings::sNoiseTextureDimension };
};
}

Regarding the shader implementation, we again use a vertex shader and a pixel shader. We send a triangle list of 2 triangles, and in the vertex shader, we expand these triangles to cover a full-screen quad in NDC space. Its implementation is the following

VertexShader

#include <ShaderUtils/CBuffers.hlsli>

#include "RS.hlsl"

struct Input {
    uint mVertexId : SV_VertexID;
};

static const float2 gQuadUVs[6] = {
    float2(0.0f, 1.0f),
    float2(0.0f, 0.0f),
    float2(1.0f, 0.0f),
    float2(0.0f, 1.0f),
    float2(1.0f, 0.0f),
    float2(1.0f, 1.0f)
};

ConstantBuffer<FrameCBuffer> gFrameCBuffer : register(b0);

struct Output {
    float4 mPositionNDC : SV_POSITION;
    float2 mUV : TEXCOORD;
};

[RootSignature(RS)]
Output main(in const Input input)
{
    Output output;

    output.mUV = gQuadUVs[input.mVertexId];

    // Quad covering screen in NDC space ([-1.0, 1.0] x [-1.0, 1.0] x [0.0, 1.0] x [1.0])
    output.mPositionNDC = float4(2.0f * output.mUV.x - 1.0f,
                                 1.0f - 2.0f * output.mUV.y,
                                 0.0f,
                                 1.0f);

    return output;
}

In the pixel shader, we sample the ambient accessibility texture and apply the blur step. The source code is the following

BlurCBuffer.hlsli

#ifndef BLUR_CBUFFER_H
#define BLUR_CBUFFER_H

struct BlurCBuffer {
    uint mNoiseTextureDimension;
};

#endif

PixelShader

#include <EnvironmentLightPass/Shaders/BlurCBuffer.hlsli>
#include <ShaderUtils/Utils.hlsli>

#include "RS.hlsl"

#define SKIP_BLUR

struct Input {
    float4 mPositionNDC : SV_POSITION;
    float2 mUV : TEXCOORD;
};

ConstantBuffer<BlurCBuffer> gBlurCBuffer : register(b0);

SamplerState TextureSampler : register (s0);
Texture2D<float> BufferTexture : register(t0);

struct Output {
    float mColor : SV_Target0;
};

[RootSignature(RS)]
Output main(const in Input input)
{
    Output output = (Output)0;

#ifdef SKIP_BLUR
    const int3 fragmentScreenSpace = int3(input.mPositionNDC.xy, 0);
    output.mColor = BufferTexture.Load(fragmentScreenSpace);
#else
    float w;
    float h;
    BufferTexture.GetDimensions(w, h);
    const float2 texelSize = 1.0f / float2(w, h);
    float result = 0.0f;
    const float hlimComponent = -float(gBlurCBuffer.mNoiseTextureDimension) * 0.5f + 0.5f;
    const float2 hlim = float2(hlimComponent, hlimComponent);
    for (uint i = 0U; i < gBlurCBuffer.mNoiseTextureDimension; ++i) {
        for (uint j = 0U; j < gBlurCBuffer.mNoiseTextureDimension; ++j) {
            const float2 offset = (hlim + float2(float(i), float(j))) * texelSize;
            result += BufferTexture.Sample(TextureSampler,
                                           input.mUV + offset).r;
        }
    }

    output.mColor = result / float(gBlurCBuffer.mNoiseTextureDimension * gBlurCBuffer.mNoiseTextureDimension);
#endif

    return output;
}

Future Work

  • Try different ambient occlusion techniques