**lucerna-dev showcase 0**

lucerna is the Vulkan renderer that I started on the _10th of July 2024_ to learn graphics programming and Vulkan. My goal for this project is to make a cool **scene renderer**, not a full game engine. To explore cool tech while keeping things as simple and fast as possible, I placed the following constraints on myself:

- can only handle one scene at a time (from glTF); drop everything and rebuild to switch scenes
- only support one graphics API
- only support modern-ish hardware & crash on missing extensions
- don't overlap with future planned projects
- no bikeshedding allowed!

Here is an overview of the project so far.

Foundations
===

The foundations of the engine are taken from [vkguide](https://vkguide.dev), with the following big changes.

Bindless textures through _descriptor indexing_: this greatly reduced the complexity of the rendering code, because I could remove the code that grouped materials together and created descriptor sets with the correct textures bound.

A fun trick you can do with bindless textures and samplers is to store both indices in a single `uint32_t` and use masking and bit shifting to unpack them.

```glsl
// low 24 bits: texture index, high 8 bits: sampler index
// this allows for 16,777,216 textures and 256 samplers
// (`sampler` is a type keyword in Vulkan GLSL, so it can't be a variable name)
uint texture_index = mat.albedo & 0x00FFFFFFu;
uint sampler_index = mat.albedo >> 24;
```

This keeps the material system generic and extensible: a material is just a plain struct that holds indices into the descriptor arrays.

```glsl
struct StandardMaterial {
    vec3 modulate;
    vec3 emission;
    uint albedo; // packed texture (24 bits) + sampler (8 bits) index
    uint flags;
    float emission_strength;
};
```

The rendering backend is based on the [writing an efficient vulkan renderer](https://zeux.io/2020/02/27/writing-an-efficient-vulkan-renderer/) blog post. Rendering is built around a `DrawData` struct that contains the indices needed to fetch all the data for each draw from large scene-global buffers (e.g. the transform buffer, the material buffer, etc.).
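To make the indexing scheme concrete, here is a small host-side C sketch. The pack/unpack helpers mirror the 24/8-bit split from the GLSL snippet, and `DrawData` follows the idea of a per-draw struct that holds nothing but indices into scene-global buffers. The field names (`transform_index`, `material_index`, `mesh_index`), `MaterialEntry`, and `resolve_albedo` are hypothetical, chosen for illustration; they are not lucerna's actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Same split as the GLSL snippet: low 24 bits = texture index, high 8 bits = sampler index. */
#define TEXTURE_BITS 24u
#define TEXTURE_MASK 0x00FFFFFFu

static inline uint32_t pack_texture_handle(uint32_t texture, uint32_t sampler) {
    assert(texture <= TEXTURE_MASK); /* 2^24 = 16,777,216 textures */
    assert(sampler <= 0xFFu);        /* 2^8 = 256 samplers */
    return (sampler << TEXTURE_BITS) | texture;
}

static inline uint32_t unpack_texture(uint32_t handle) { return handle & TEXTURE_MASK; }
static inline uint32_t unpack_sampler(uint32_t handle) { return handle >> TEXTURE_BITS; }

/* Hypothetical per-draw record: only indices into scene-global buffers. */
typedef struct {
    uint32_t transform_index; /* into the transform buffer */
    uint32_t material_index;  /* into the material buffer */
    uint32_t mesh_index;      /* into a hypothetical mesh/geometry buffer */
} DrawData;

/* Hypothetical CPU mirror of a material entry, reduced to the one field used here. */
typedef struct {
    uint32_t albedo; /* packed texture/sampler handle */
} MaterialEntry;

/* Follow the same chain a shader would:
   DrawData -> material buffer -> packed handle -> texture and sampler indices. */
static void resolve_albedo(const DrawData *draw, const MaterialEntry *materials,
                           uint32_t *texture, uint32_t *sampler) {
    uint32_t handle = materials[draw->material_index].albedo;
    *texture = unpack_texture(handle);
    *sampler = unpack_sampler(handle);
}
```

Putting the sampler in the high byte means masking is enough to recover the texture index, and a single shift recovers the sampler without a mask.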
`DrawData` entries are stored in `DrawSet`s, each of which represents a unique pipeline state or view and can be culled and compacted independently. Currently there are only two draw sets, opaque and transparent, but at some point I might add a draw set for each cascade in CSM, a set for double-sided geometry, etc. The actual rendering is done with a single indirect draw call per set.

I spent a lot of time working on stream compaction to add GPU-driven culling to the engine. For every draw call you determine whether it is culled or not. I only have frustum culling, but you could add other forms of culling, such as HiZ culling, on top. Given an array of 0 (culled) and 1 (visible) flags, an exclusive parallel prefix sum gives the position in the compacted array where each visible draw call should be placed.

```
// e.g.
// predicate array   0 0 0 [1] [1] 0 [1] 0 [1]
// write index array 0 0 0 [0] [1] 2 [2] 3 [3]
```

I used subgroup intrinsics to try to optimise this, since they let invocations in the same subgroup share and manipulate data very efficiently. The following resources were helpful:

- [vulkan subgroup explained](https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf)
- [vulkan subgroup tutorial](https://www.khronos.org/blog/vulkan-subgroup-tutorial)
- [prefix sum on vulkan](https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html)
- [parallel reduce and scan on the gpu](https://cachemiss.xyz/blog/parallel-reduce-and-scan-on-the-GPU)
- [parallel prefix sum with cuda](https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda)
- [stream compaction using wave intrinsics](https://interplayoflight.wordpress.com/2022/12/25/stream-compaction-using-wave-intrinsics/)

To compact large arrays, I implemented the method described in *parallel prefix sum with cuda* and shown in Figure 1.
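For reference, here is a sequential C sketch of the block-wise scan and compaction, which can also be handy for validating GPU output on the CPU. `BLOCK_SIZE` stands in for one workgroup, and the three passes follow the arrays-of-arbitrary-size scheme: scan each block, scan the block totals, then add each block's offset back. All names are hypothetical, the `uint32_t` items stand in for draw commands, and on the GPU each pass would be a dispatch or workgroup-level step rather than a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one workgroup; tiny on purpose so the multi-block path runs. */
#define BLOCK_SIZE 4u

/* Exclusive scan of one block; returns the block total. */
static uint32_t scan_block(const uint32_t *in, uint32_t *out, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t value = in[i]; /* read before writing so in == out is allowed */
        out[i] = sum;
        sum += value;
    }
    return sum;
}

/* Exclusive scan of an arbitrarily sized array.
   block_sums needs room for ceil(n / BLOCK_SIZE) entries. */
static void exclusive_scan(const uint32_t *in, uint32_t *out, size_t n,
                           uint32_t *block_sums) {
    size_t num_blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

    /* Pass 1: scan each block on its own and record the block total. */
    for (size_t b = 0; b < num_blocks; ++b) {
        size_t start = b * BLOCK_SIZE;
        size_t len = (n - start < BLOCK_SIZE) ? n - start : BLOCK_SIZE;
        block_sums[b] = scan_block(in + start, out + start, len);
    }

    /* Pass 2: exclusive scan of the block totals. On the GPU, if there are more
       blocks than one workgroup can scan, this step recurses. */
    scan_block(block_sums, block_sums, num_blocks);

    /* Pass 3: add each block's offset to every element in that block. */
    for (size_t i = 0; i < n; ++i) {
        out[i] += block_sums[i / BLOCK_SIZE];
    }
}

/* Scatter each visible item to its write index. The returned count is the last
   write index plus the last predicate, i.e. the number of visible items (what
   could feed an indirect draw count, e.g. for vkCmdDrawIndexedIndirectCount). */
static size_t compact(const uint32_t *items, const uint32_t *visible,
                      const uint32_t *write_index, size_t n, uint32_t *out) {
    for (size_t i = 0; i < n; ++i) {
        if (visible[i]) {
            out[write_index[i]] = items[i];
        }
    }
    return n ? (size_t)write_index[n - 1] + visible[n - 1] : 0;
}
```

Running `exclusive_scan` on the predicate array from the example (`0 0 0 1 1 0 1 0 1`) yields write indices `0 0 0 0 1 2 2 3 3`, and `compact` reports 4 visible draws. Because each visible item's slot depends only on the items before it, the output order matches the input order, which is the stability property atomics do not give.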
![Figure [parallel prefix sum]: Arrays of Arbitrary Size](large_arrays.jpg)

An alternative to stream compaction is to set `instanceCount` to 0 and keep the draw call in the indirect command buffer. This makes the GPU start processing the draw only to discard it, which can be slow on some architectures. Because my stream compaction implementation is still really slow, this naive approach was actually faster for me, but I already implemented stream compaction so it's staying, and I will try to optimise it in the future.

Atomics could also work, but they are not ideal: the output order is not stable, and inconsistent ordering can lead to subtle issues.

Effects
===

Directional Shadows
---

![Directional Shadows](shadow.png)

Basic shadow mapping with PCF, using Interleaved Gradient Noise.

Bloom
---

![Bloom](bloom.png)

Bloom based on the Call of Duty: Advanced Warfare approach. Since it runs in a compute shader, I want to improve my implementation by using shared memory for better memory efficiency.

SSAO
---

![Screen Space Ambient Occlusion](ssao.png)

I also want to replace the SSAO blur filter with something better than a box blur.

Miscellaneous
===

Editor
---

The editor uses the imgui docking branch and can display bindless textures through a custom backend that uses the same global descriptor-indexed texture array as the rest of the engine.

SPIR-V Reflection
---

Although it's not complete, the newest feature I am working on is parsing the SPIR-V binary to obtain the layout automatically. This removes all the code that hardcodes layouts, which frequently leads to bugs when I change them. By creating the layout from the SPIR-V, I can make sure alignment and size stay the same between device and host for all data exchange. It would also let me remove duplicate declarations of input and output structures and generate them programmatically.
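Until reflection is finished, one way to catch host/device layout drift at compile time is to mirror a GLSL struct in C and assert its offsets. Below is a hand-written sketch for `StandardMaterial`, assuming it lives in a `std430` storage buffer (`std140` gives the same result for this particular struct): a `vec3` is 16-byte aligned but only 12 bytes long, so `albedo` packs into the slot right after `emission`, and the struct size rounds up to a multiple of 16. The type name `StandardMaterialGPU` and the padding fields are hypothetical, and this is a manual check, not the SPIR-V reflection approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side mirror of the GLSL StandardMaterial, laid out by hand for std430. */
typedef struct {
    float    modulate[3];       /* vec3: offset 0, 12 bytes */
    float    _pad0;             /* hypothetical padding: next vec3 must start at 16 */
    float    emission[3];       /* vec3: offset 16 */
    uint32_t albedo;            /* uint: offset 28, fills the slot after the vec3 */
    uint32_t flags;             /* offset 32 */
    float    emission_strength; /* offset 36 */
    float    _pad1[2];          /* hypothetical padding: size rounds up to 48 */
} StandardMaterialGPU;

/* These numbers are the std430 offsets worked out by hand. The build fails if
   the C struct drifts from them; they cannot see the GLSL side, which is the
   gap that SPIR-V reflection closes. */
_Static_assert(offsetof(StandardMaterialGPU, modulate) == 0, "modulate offset");
_Static_assert(offsetof(StandardMaterialGPU, emission) == 16, "emission offset");
_Static_assert(offsetof(StandardMaterialGPU, albedo) == 28, "albedo offset");
_Static_assert(offsetof(StandardMaterialGPU, flags) == 32, "flags offset");
_Static_assert(offsetof(StandardMaterialGPU, emission_strength) == 36, "emission_strength offset");
_Static_assert(sizeof(StandardMaterialGPU) == 48, "std430 array stride");
```

With the size pinned at 48 bytes, an array of `StandardMaterialGPU` has the same stride on the host as `StandardMaterial[]` on the device, so it can be copied into the material buffer with a plain `memcpy`.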
It's also possible to have a common definition for structures by abusing shader includes and the syntax that C and GLSL share for structs and macros, as described in [glsl made shrimple](https://graphics-programming.org/blog/glsl-development-made-shrimple).

Closing Remarks
---

It's been a lot of work getting to this point. I wasted a lot of time fighting the build system and adding libraries, trying out LunarVim, and refusing to use vk-bootstrap. Probably a tenth of the development effort went into just getting the first triangle up and running. Regardless, I made it this far.

Some things I'd like to attempt before I move on to something else:

- Order Independent Transparency
- More performant & improved SSAO and Bloom
- Better Shadows & more shadow types
- Lights (maybe clustered)
- Texture Compression
- MSAA
- Basic PBR
- HiZ culling
- Finish SPIR-V Reflection

Maybe Baked Lighting, Skinned Meshes, CSM.

Thanks for reading!