This sample app demonstrates how to use indirect command buffers (ICB) to issue rendering instructions from the GPU. When you have a rendering algorithm that runs in a compute kernel, use ICBs to generate draw calls based on your algorithm’s results. The sample app uses a compute kernel to remove invisible objects submitted for rendering, and generates draw commands only for the objects currently visible in the scene.
Without ICBs, you couldn’t submit rendering commands on the GPU. Instead, the CPU would wait for your compute kernel’s results before generating the render commands. Then, the GPU would wait for the rendering commands to make it across the CPU-to-GPU bridge, which amounts to a slow round trip, as seen in the following diagram:
The sample code project Encoding Indirect Command Buffers on the CPU introduces ICBs by creating a single ICB and reusing its commands every frame. While that sample saved expensive command-encoding time by reusing commands, this sample uses ICBs to effect a GPU-driven rendering pipeline.
The techniques shown by this sample include issuing draw calls from the GPU, and the process of executing a select set of draws.
This project contains targets for macOS and iOS. Run the iOS scheme on a physical device because Metal isn’t supported in the simulator.
The sample uses dispatchThreads:threadsPerThreadgroup:, which is supported by GPUs of family greater than or equal to:
Define the Data Read by the ICB
In an ideal scenario, you store each mesh in its own buffer. However, on iOS, kernels running on the GPU can only access a limited number of data buffers per execution. To reduce the number of buffers needed during the ICB’s execution, you pack all meshes into a single buffer at varying offsets. Then, you use another buffer to store the offset and size of each mesh. The process follows.
At initialization, create the data for each mesh:
Count the individual and accumulated mesh sizes and create the container buffer:
Finally, insert each mesh into the container buffer while noting its offset and size in the second buffer:
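As a rough CPU-side illustration of the packing steps above, the following C++ sketch appends every mesh to one container and records each mesh's offset and size in a second array. It is not the sample's code: `Vertex`, `MeshRange`, `PackedMeshes`, and `packMeshes` are hypothetical names, and `std::vector` stands in for the Metal buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout; the sample's real vertex type may differ.
struct Vertex {
    float x, y;
};

// Hypothetical per-mesh record kept in the second buffer.
struct MeshRange {
    std::size_t startVertex;  // index of the mesh's first vertex in the container
    std::size_t vertexCount;  // number of vertices the mesh owns
};

struct PackedMeshes {
    std::vector<Vertex> vertices;   // stands in for the single container buffer
    std::vector<MeshRange> ranges;  // stands in for the offset/size buffer
};

PackedMeshes packMeshes(const std::vector<std::vector<Vertex>>& meshes) {
    // First pass: accumulate the total size so the container is allocated once.
    std::size_t total = 0;
    for (const auto& mesh : meshes) {
        total += mesh.size();
    }

    PackedMeshes packed;
    packed.vertices.reserve(total);
    packed.ranges.reserve(meshes.size());

    // Second pass: append each mesh and note where it landed.
    for (const auto& mesh : meshes) {
        packed.ranges.push_back({packed.vertices.size(), mesh.size()});
        packed.vertices.insert(packed.vertices.end(), mesh.begin(), mesh.end());
    }

    assert(packed.vertices.size() == total);
    return packed;
}
```

With this layout, a draw for mesh `i` only needs `ranges[i]` to locate its vertices, so a single buffer binding serves every mesh.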
Update the Data Read by the ICB Dynamically
By culling non-visible vertices from the data fed to the rendering pipeline, you save significant rendering time and effort. To do that, use the same compute kernel that encodes the ICB’s commands to continually update the ICB’s data buffers:
The parallel nature of the GPU partitions the compute task for you, resulting in multiple offscreen meshes getting culled concurrently.
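The per-object decision can be modeled on the CPU as shown in this C++ sketch, where each loop iteration plays the role of one GPU thread. Everything here is an assumption made for illustration: the names (`ObjectParams`, `ViewRect`, `CommandSlot`, `encodeVisibleDraws`), the 2D view, and the conservative bounding-box test are not taken from the sample's kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-object data: a position, a bounding radius, and the
// object's vertex range inside the shared mesh container.
struct ObjectParams {
    float x, y;
    float boundingRadius;
    std::size_t startVertex;
    std::size_t vertexCount;
};

// Hypothetical axis-aligned view region.
struct ViewRect {
    float minX, minY, maxX, maxY;
};

// One slot per object; an empty slot models a command left in its reset state.
struct CommandSlot {
    bool empty;
    std::size_t startVertex;
    std::size_t vertexCount;
};

// Conservative test: the object's bounding square overlaps the view.
bool isVisible(const ObjectParams& o, const ViewRect& v) {
    return o.x + o.boundingRadius >= v.minX && o.x - o.boundingRadius <= v.maxX &&
           o.y + o.boundingRadius >= v.minY && o.y - o.boundingRadius <= v.maxY;
}

std::vector<CommandSlot> encodeVisibleDraws(const std::vector<ObjectParams>& objects,
                                            const ViewRect& view) {
    std::vector<CommandSlot> slots(objects.size(), CommandSlot{true, 0, 0});
    // On the GPU, each index would be handled by its own thread; no iteration
    // reads or writes another iteration's slot, so the work is independent.
    for (std::size_t i = 0; i < objects.size(); ++i) {
        if (isVisible(objects[i], view)) {
            slots[i] = CommandSlot{false, objects[i].startVertex, objects[i].vertexCount};
        }
    }
    return slots;
}
```

Because every object writes only to its own slot, the iterations need no synchronization, which is the property that lets the GPU cull many objects at once.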
Pass an ICB to a Compute Kernel Using an Argument Buffer
To get an ICB on the GPU and make it accessible to a compute kernel, you pass it through an argument buffer, as follows:
Define the container argument buffer as a structure that contains one member, the ICB:
Encode the ICB into the argument buffer:
Pass the ICB (_indirectCommandBuffer) to the kernel by setting the argument buffer on the kernel’s compute command encoder:
Because you pass the ICB through an argument buffer, standard argument buffer rules apply. Call useResource on the compute command encoder, passing the ICB, to tell Metal to prepare the ICB for use:
Encode and Optimize ICB Commands
Reset the ICB’s commands to their initial state before beginning encoding:
Encode the ICB’s commands by dispatching the compute kernel:
Optimize your ICB commands to remove empty commands or redundant state by calling optimizeIndirectCommandBuffer:withRange: on a blit command encoder:
This sample optimizes ICB commands because redundant state results from the kernel setting a buffer for each draw, and encoding empty commands for each invisible object. By removing the empty commands, you can free up a significant number of blank spaces in the command buffer that Metal would otherwise spend time skipping at runtime.
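To make the effect concrete, here is a conceptual C++ model of what removing empty commands and spotting redundant state could look like. Metal's actual optimization is internal and may work differently; `Command`, `OptimizedStream`, and `compactCommands` are hypothetical names, and an integer ID stands in for a bound buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical command record: either empty, or a draw that binds one buffer.
struct Command {
    bool empty;               // true for a slot the kernel left unencoded
    int boundBuffer;          // hypothetical ID of the buffer this command binds
    std::size_t vertexCount;  // vertices drawn by this command
};

struct OptimizedStream {
    std::vector<Command> commands;   // non-empty commands, original order kept
    std::size_t removedEmpty = 0;    // empty slots dropped from the stream
    std::size_t redundantBinds = 0;  // binds repeating the previous command's buffer
};

OptimizedStream compactCommands(const std::vector<Command>& input) {
    OptimizedStream out;
    bool haveBound = false;
    int lastBound = 0;
    for (const Command& c : input) {
        if (c.empty) {
            // Nothing to execute, so the slot does not survive compaction.
            ++out.removedEmpty;
            continue;
        }
        if (haveBound && c.boundBuffer == lastBound) {
            // Same buffer as the previous surviving command: the bind is redundant.
            ++out.redundantBinds;
        }
        lastBound = c.boundBuffer;
        haveBound = true;
        out.commands.push_back(c);
    }
    return out;
}
```

In this model, a stream where every draw binds the same container buffer reports all binds after the first as redundant, which mirrors the situation the paragraph above describes.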
Execute the ICB
Draw the onscreen meshes by calling executeCommandsInBuffer on your render command encoder:
While you can encode an ICB’s commands in a compute kernel, you call executeCommandsInBuffer from your host app to encode a single command that contains all of the commands encoded by the compute kernel. By doing this, you choose the queue and buffer that the ICB’s commands go into. The point at which you call executeCommandsInBuffer determines the placement of the ICB’s commands among any other commands you may also encode in the same buffer.