This sample app demonstrates how to use indirect command buffers (ICBs) to issue rendering instructions from the GPU. When you have a rendering algorithm that runs in a compute kernel, use ICBs to generate draw calls based on your algorithm’s results. This sample app uses a compute kernel to cull invisible objects from its rendering submission, thereby generating draw commands for only the objects that are currently visible in the scene.
Without ICBs, you can’t encode rendering commands on the GPU. Instead, the CPU waits for your compute kernel’s results before it generates rendering commands, and then the GPU waits for those commands to cross the CPU-to-GPU bridge. This amounts to a slow round trip, as seen in the following diagram:
The Encoding Indirect Command Buffers on the CPU sample code introduces ICBs by creating a single ICB and reusing its commands every frame. So, while that sample saves expensive command-encoding time by reusing commands, this sample uses ICBs to implement a GPU-driven rendering pipeline.
Define the Data Read by the ICB
Ideally, you store each mesh in its own buffer. On iOS, however, kernels running on the GPU can access only a small number of data buffers per execution. To reduce the number of buffers needed during the ICB’s execution, pack all meshes into a single buffer at varying offsets. Then, use a second buffer to store the offset and size of each mesh, as follows.
At initialization, create the data for each mesh:
Count the individual and accumulated mesh sizes and create the container buffer:
Finally, insert each mesh into the container buffer while noting its offset and size in the second buffer:
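The packing steps above can be sketched on the CPU in plain C++. This is a minimal illustration, not the sample's own code: the names `Vertex`, `MeshRange`, `PackedMeshes`, and `packMeshes` are hypothetical, and `std::vector` stands in for the Metal buffers the sample would allocate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; the sample's real layout may differ.
struct Vertex {
    float position[2];
    float texcoord[2];
};

// Hypothetical per-mesh record: where the mesh starts in the shared
// container and how many vertices it holds.
struct MeshRange {
    std::uint32_t startVertex;
    std::uint32_t vertexCount;
};

struct PackedMeshes {
    std::vector<Vertex> vertices;   // stands in for the single container buffer
    std::vector<MeshRange> ranges;  // stands in for the offset-and-size buffer
};

PackedMeshes packMeshes(const std::vector<std::vector<Vertex>>& meshes) {
    // Pass 1: accumulate the mesh sizes so the container is sized once.
    std::size_t total = 0;
    for (const auto& mesh : meshes) {
        total += mesh.size();
    }

    PackedMeshes packed;
    packed.vertices.reserve(total);
    packed.ranges.reserve(meshes.size());

    // Pass 2: append each mesh and note its offset and size.
    for (const auto& mesh : meshes) {
        MeshRange range;
        range.startVertex = static_cast<std::uint32_t>(packed.vertices.size());
        range.vertexCount = static_cast<std::uint32_t>(mesh.size());
        packed.ranges.push_back(range);
        packed.vertices.insert(packed.vertices.end(), mesh.begin(), mesh.end());
    }

    assert(packed.vertices.size() == total);
    return packed;
}
```

With this layout, a draw for mesh `i` needs only the shared container plus `ranges[i]`, so the number of bound buffers stays fixed no matter how many meshes there are.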
Update the Data Read by the ICB Dynamically
The important techniques shown by this sample include not only issuing draw calls from the GPU, but also responding to runtime conditions to execute a select set of draws. By culling the vertices of non-visible objects from the data fed through the rendering pipeline, you save significant rendering time and effort. To do that, use the same compute kernel that encodes the ICB’s commands to continually update the ICB’s data buffers:
The parallel nature of the GPU partitions the compute task for you, resulting in multiple offscreen meshes getting culled concurrently.
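The per-object work can be sketched on the CPU as follows. This is a simplified model under stated assumptions, not the sample's kernel: `ObjectParams`, `Viewport`, `DrawCommand`, `isVisible`, `cullObject`, and `cullAll` are hypothetical names, the visibility test is a conservative bounding-box check chosen for illustration, and the loop in `cullAll` stands in for the GPU running one thread per object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-object data: a bounding circle plus the mesh range to draw.
struct ObjectParams {
    float x, y, radius;
    std::uint32_t startVertex;
    std::uint32_t vertexCount;
};

// Hypothetical visible region, as an axis-aligned rectangle.
struct Viewport {
    float minX, minY, maxX, maxY;
};

// Stand-in for one command slot in the ICB.
struct DrawCommand {
    bool isEmpty;               // true when the slot holds no draw
    std::uint32_t startVertex;
    std::uint32_t vertexCount;
};

// Conservative test: the bounding square of the circle overlaps the viewport.
bool isVisible(const ObjectParams& o, const Viewport& v) {
    return o.x + o.radius >= v.minX && o.x - o.radius <= v.maxX &&
           o.y + o.radius >= v.minY && o.y - o.radius <= v.maxY;
}

// Work done by one "thread": write either a draw or an empty command
// into the slot that matches the object's index.
void cullObject(std::size_t index,
                const std::vector<ObjectParams>& objects,
                const Viewport& view,
                std::vector<DrawCommand>& commands) {
    assert(index < objects.size() && index < commands.size());
    const ObjectParams& o = objects[index];
    if (isVisible(o, view)) {
        commands[index] = DrawCommand{false, o.startVertex, o.vertexCount};
    } else {
        commands[index] = DrawCommand{true, 0, 0};
    }
}

// Serial stand-in for the parallel dispatch: one invocation per object.
std::vector<DrawCommand> cullAll(const std::vector<ObjectParams>& objects,
                                 const Viewport& view) {
    std::vector<DrawCommand> commands(objects.size(), DrawCommand{true, 0, 0});
    for (std::size_t i = 0; i < objects.size(); ++i) {
        cullObject(i, objects, view, commands);
    }
    return commands;
}
```

Because each invocation writes only to its own slot, the invocations don't depend on each other, which is what lets the GPU run them concurrently.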
Pass an ICB to a Compute Kernel Using an Argument Buffer
To get an ICB on the GPU and make it accessible to a compute kernel, you pass it through an argument buffer, as follows:
Define the container argument buffer. It’s a struct that contains one member, the ICB:
Encode the ICB into the argument buffer:
Pass the ICB (_indirectCommandBuffer) to the kernel by setting the argument buffer on the kernel’s compute command encoder:
Because the ICB is passed through an argument buffer, standard argument buffer rules apply. Call useResource:usage: on the compute command encoder, passing the ICB, to tell Metal to prepare it for use:
Encode and Optimize ICB Commands
Encode the ICB’s commands by dispatching the compute kernel:
Optimize your ICB to remove empty commands and redundant state by calling optimizeIndirectCommandBuffer:withRange: on a blit command encoder:
This sample optimizes the ICB for two reasons: the kernel sets a buffer for each draw, which produces redundant state, and it encodes an empty command for each invisible object. Optimizing out the empty commands removes a significant number of blank slots that Metal would otherwise spend time skipping at runtime.
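To make the effect of optimization concrete, here is a conceptual CPU-side model in C++. It is an assumption-laden sketch, not a description of how Metal implements optimizeIndirectCommandBuffer:withRange:, whose internals are not documented here. The names `EncodedCommand`, `OptimizedCommand`, and `compactCommands` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a command as the kernel encoded it: every draw sets a
// vertex buffer, and invisible objects leave an empty slot.
struct EncodedCommand {
    bool isEmpty;
    std::uint32_t vertexBuffer;  // hypothetical buffer identifier
    std::uint32_t startVertex;
    std::uint32_t vertexCount;
};

// Stand-in for a command after optimization: the buffer is rebound only
// when it differs from the one the previous draw used.
struct OptimizedCommand {
    bool rebindsBuffer;
    std::uint32_t vertexBuffer;
    std::uint32_t startVertex;
    std::uint32_t vertexCount;
};

std::vector<OptimizedCommand> compactCommands(
        const std::vector<EncodedCommand>& commands) {
    std::vector<OptimizedCommand> out;
    bool haveBound = false;
    std::uint32_t bound = 0;

    for (const auto& c : commands) {
        if (c.isEmpty) {
            continue;  // drop blank slots instead of skipping them at draw time
        }
        // A binding is redundant when the same buffer is already bound.
        const bool rebind = !haveBound || c.vertexBuffer != bound;
        out.push_back(OptimizedCommand{rebind, c.vertexBuffer,
                                       c.startVertex, c.vertexCount});
        haveBound = true;
        bound = c.vertexBuffer;
    }

    assert(out.size() <= commands.size());
    return out;
}
```

In this model, a frame in which most objects are offscreen shrinks to a short list of draws, and consecutive draws that share a buffer bind it only once.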
Execute the ICB
Draw the onscreen meshes by calling executeCommandsInBuffer on your render command encoder:
While you can encode an ICB’s commands in a compute kernel, you call executeCommandsInBuffer from your host app to encode a single command that contains all of the commands the compute kernel encoded. By doing this, you choose the queue and command buffer that the ICB’s commands go into. The point at which you call executeCommandsInBuffer determines the placement of the ICB’s commands among any other commands you encode in the same command buffer.