Demonstrates how to encode an indirect command buffer with the CPU or GPU, and then execute its commands in a separate render command encoder.
- iOS 12.0+
- macOS 10.14+
- Xcode 10.0+
An indirect command buffer represents a group of commands that can be encoded with the CPU or GPU and executed repeatedly across multiple render command encoders. You use indirect command buffers to reduce CPU overhead, simplify command organization, and implement GPU-driven pipelines.
In this sample, you’ll learn how to define, create, encode, and execute commands in an indirect command buffer. In particular, you’ll learn about the Metal API and Metal Shading Language features that you use to encode indirect command buffers. The sample renders a series of 16 shapes using two vertex buffers and a draw call encoded into an indirect command buffer.
CPU Overhead and Indirect Command Buffers
Metal commands are efficient and incur minimal CPU overhead when apps access the GPU. Each command still incurs some overhead, however. To reduce it further, use these strategies:
- Do more GPU work with fewer CPU commands.
- Avoid repeating expensive CPU commands.
Metal’s indirect command buffer feature reduces the number and performance cost of CPU commands in your app’s critical path (for example, in your app’s render loop). An indirect command buffer allows you to group and encode multiple commands within a single object, instead of encoding each command individually.
By encoding indirect command buffers with the CPU, you can shift a significant amount of CPU overhead from your app’s critical path to its initial setup. By encoding indirect command buffers with the GPU, you can avoid expensive synchronization overhead between CPU encoding and GPU execution.
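The following C++ sketch roughly illustrates where the CPU cost moves. It counts encoder calls across frames. It is a hypothetical back-of-the-envelope model, not Metal code. It treats every encoder call as one unit of cost and assumes three commands per shape, as in this sample: two vertex-buffer bindings and one draw. The names CallCounts and countEncoderCalls are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model (not Metal API): every encoder call costs one unit.
struct CallCounts {
    std::uint64_t direct;    // all commands re-encoded in every frame
    std::uint64_t indirect;  // commands encoded once, then one execute per frame
};

// shapes:           shapes drawn per frame
// commandsPerShape: e.g. 2 vertex-buffer bindings + 1 draw = 3
// frames:           frames rendered
CallCounts countEncoderCalls(std::uint64_t shapes,
                             std::uint64_t commandsPerShape,
                             std::uint64_t frames) {
    CallCounts counts{};
    // Direct encoding repeats the full command list inside the render loop.
    counts.direct = shapes * commandsPerShape * frames;
    // Indirect encoding pays the per-command cost once at setup,
    // then makes a single execute call per frame.
    counts.indirect = shapes * commandsPerShape + frames;
    return counts;
}
```

For 16 shapes and 100 frames, the model gives 4,800 calls with direct encoding. With an indirect command buffer it gives 148: 48 at setup plus one execute call per frame. The GPU still processes the same 16 draws per frame either way. Only the CPU-side encoding moves out of the render loop.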
Individual Commands Versus Indirect Command Buffers
Metal apps, particularly games, typically contain multiple render commands, each associated with a set of render states, buffers, and draw calls. To execute these commands for a render pass, apps first encode them into a render command encoder within a command buffer.
You encode individual commands into a render command encoder by calling MTLRender methods, such as those that set vertex buffers or issue draw calls.
Encoding numerous commands individually can become tedious and expensive, especially for large apps or games. Instead, you can encode related commands into an indirect command buffer and then set that entire object as a single command for a render pass. This approach can substantially reduce CPU overhead while still encoding the intended commands for GPU execution.
Indirect command buffers are represented as MTLIndirect objects. You encode them into a render command encoder by calling the execute method for each indirect command buffer.
Define Render Commands and Inherited Render State
For the indirect command buffer, _icb, the sample defines render commands that:
- Set a vertex buffer (using unique vertex data for each shape)
- Set another vertex buffer (using common transformation data for all shapes)
- Draw a triangle strip
The sample encodes these commands differently on the CPU and on the GPU, but both versions of the indirect command buffer contain these same commands.
The sample also allows _icb to inherit the render pipeline state from its parent encoder. In addition, _icb implicitly inherits any render state that can’t be encoded into it, such as the cull mode and the depth or stencil state for the render pass.
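The split between encoded commands and inherited state can be sketched with a hypothetical C++ model (not Metal code). Each IndirectRenderCommand stores only its two vertex-buffer bindings and a draw. The pipeline, cull mode, and depth comparison live in ParentEncoderState and are read when the commands execute. All type names and string values are illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model (not Metal API): one command holds its own
// vertex-buffer bindings and a draw, but no render state.
struct IndirectRenderCommand {
    int vertexBuffers[2];  // buffer IDs bound at indices 0 and 1
    int vertexCount;       // vertices in the triangle strip
};

// State the model ICB never stores; it comes from the parent encoder.
struct ParentEncoderState {
    std::string pipeline;
    std::string cullMode;
    std::string depthCompare;
};

struct ExecutedDraw {
    IndirectRenderCommand command;
    ParentEncoderState state;  // inherited at execution time
};

// Replays every command under whatever state the parent has right now.
std::vector<ExecutedDraw> executeInParent(
        const std::vector<IndirectRenderCommand>& icb,
        const ParentEncoderState& parent) {
    std::vector<ExecutedDraw> draws;
    draws.reserve(icb.size());
    for (const IndirectRenderCommand& cmd : icb) {
        draws.push_back({cmd, parent});
    }
    return draws;
}
```

Executing the same commands under two different ParentEncoderState values yields identical bindings but different inherited state. This mirrors how render state that can’t be encoded into _icb still applies to its draws.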
Create an Indirect Command Buffer
The sample creates _icb from an MTLIndirect object, which defines the features and limits of an indirect command buffer.
The sample specifies the types of commands, command, and the maximum number of commands, max, so that Metal reserves enough memory for the sample to encode _icb successfully, with either the CPU or the GPU.
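These limits are fixed when the indirect command buffer is created, so every later encode has to fit inside them. The hypothetical C++ model below spells out the checks this implies:

- The command type was declared.
- The command index is below the reserved command count.
- The command uses no more vertex-buffer bindings than reserved. Metal’s descriptor also has a per-command vertex-buffer binding limit.

Flag values and names are assumptions. In Metal itself, exceeding these limits is a usage error, not something reported through a return value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for command-type flags (a bitmask), not Metal API.
enum CommandType : unsigned {
    kCommandTypeDraw        = 1u << 0,
    kCommandTypeDrawIndexed = 1u << 1,
};

// Hypothetical model of the limits fixed when the ICB is created.
struct IcbLimits {
    unsigned commandTypes;             // command types the ICB may hold
    std::size_t maxCommandCount;       // command slots reserved up front
    std::size_t maxVertexBufferBinds;  // vertex-buffer slots per command
};

// True if a command of `type` written to `commandIndex`, using `bindCount`
// vertex-buffer bindings, stays inside the reserved limits.
bool fitsLimits(const IcbLimits& limits, CommandType type,
                std::size_t commandIndex, std::size_t bindCount) {
    if ((limits.commandTypes & type) == 0) return false;        // type not declared
    if (commandIndex >= limits.maxCommandCount) return false;   // no slot reserved
    if (bindCount > limits.maxVertexBufferBinds) return false;  // too many bindings
    return true;
}
```

For example, take IcbLimits{kCommandTypeDraw, 16, 2}. A draw at index 15 with two bindings fits. Index 16, an indexed draw, or a third binding does not.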
Encode an Indirect Command Buffer with the CPU
From the CPU, the sample encodes commands into _icb with an MTLIndirect object. For each shape to be rendered, the sample encodes two set commands and one draw command.
The sample performs this encoding only once, before encoding any subsequent render commands.
_icb contains a total of 16 draw calls, one for each shape to be rendered. Each draw call references the same transformation data, _uniform, but different vertex data, _vertex. Although the CPU encodes data only once, the sample issues 16 draw calls per frame.
Because _icb inherits its render state from its parent encoder, it implicitly inherits the MTLCompare value set as the depth comparison function for the render pass.
Therefore, all 16 draw calls are executed in a given render pass, but only one shape passes the depth test, as determined by the depth value of each shape.
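The CPU path encodes once and executes every frame. This pattern can be sketched with a hypothetical C++ model:

- encodeOnCpu runs once at setup and fills one command per shape. Each command binds that shape’s vertex data and the shared transformation data before its draw.
- drawsPerExecution counts the draws that each execution issues.

The binding indices and buffer IDs are assumptions. The per-shape vertex data is modeled as one buffer ID per shape, even though the sample may organize it differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binding indices; the sample's real index constants are not
// shown in this article.
constexpr int kVertexIndex = 0;
constexpr int kUniformIndex = 1;

// Hypothetical model of one indirect render command (not Metal API).
struct IndirectRenderCommand {
    int buffers[2] = {-1, -1};  // buffer IDs bound at each index
    bool hasDraw = false;       // set once the draw is encoded
};

// Encodes one command per shape: bind that shape's vertex buffer, bind the
// shared transformation buffer, then draw. Runs once, at setup time.
std::vector<IndirectRenderCommand> encodeOnCpu(
        const std::vector<int>& vertexBufferPerShape, int uniformBuffer) {
    std::vector<IndirectRenderCommand> icb(vertexBufferPerShape.size());
    for (std::size_t i = 0; i < icb.size(); ++i) {
        icb[i].buffers[kVertexIndex] = vertexBufferPerShape[i];
        icb[i].buffers[kUniformIndex] = uniformBuffer;
        icb[i].hasDraw = true;
    }
    return icb;
}

// Number of draws issued each time the ICB executes (e.g. once per frame).
std::size_t drawsPerExecution(const std::vector<IndirectRenderCommand>& icb) {
    std::size_t draws = 0;
    for (const auto& cmd : icb) {
        if (cmd.hasDraw) ++draws;
    }
    return draws;
}
```

Encoding 16 shapes produces 16 commands, once. Every later execution of that same vector issues 16 draws without re-encoding anything.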
Encode an Indirect Command Buffer with the GPU
An indirect command buffer can be passed to a graphics or compute function only through an argument buffer, either as a scalar or within an array. Within the argument buffer structure, an indirect command buffer occupies a specific index value, [[id(n)]], just like any other argument.
The sample encodes the argument buffer _kernel, which is defined with the arguments structure, where AAPLArgument is the index value for the indirect command buffer.
Additionally, the sample calls the use method to indicate that the GPU can access _icb.
The sample passes _kernel, which contains _icb, to the kernel kernel function.
From the GPU, the sample encodes commands into cmd with a render structure. For each shape to be rendered, the sample executes the kernel kernel function. In this function, if the depth value of the current shape matches the depth value of the current frame, the sample encodes two set commands and one draw command.
The sample performs this encoding once every 16 frames, in order to encode different data into the indirect command buffer for those frames.
_icb contains a single draw call for the shape to be rendered. This draw call references the transformation data in uniforms and the vertex data in vertex. Although the GPU encodes data multiple times, the sample issues only one draw call per frame.
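The kernel’s per-shape decision can be modeled on the CPU with a short hypothetical C++ function. Each loop iteration stands in for one GPU thread. Depth values are modeled as integer levels to avoid floating-point equality. The buffer IDs and the level assigned to each shape are assumptions. The sample’s real kernel is a Metal Shading Language function that encodes through cmd.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one command slot (not Metal API).
struct IndirectRenderCommand {
    int vertexBuffer = -1;
    int uniformBuffer = -1;
    bool hasDraw = false;
};

// Each iteration plays the role of one GPU thread for shape `tid`.
// Only the shape whose depth level matches the frame's level encodes a
// draw; in this model every other slot is left without a draw.
std::vector<IndirectRenderCommand> encodeOnGpuModel(
        const std::vector<int>& shapeDepthLevels, int frameDepthLevel,
        int uniformBuffer) {
    std::vector<IndirectRenderCommand> icb(shapeDepthLevels.size());
    for (std::size_t tid = 0; tid < shapeDepthLevels.size(); ++tid) {
        if (shapeDepthLevels[tid] == frameDepthLevel) {
            icb[tid].vertexBuffer = static_cast<int>(tid);  // shape's own data
            icb[tid].uniformBuffer = uniformBuffer;         // shared data
            icb[tid].hasDraw = true;
        }
    }
    return icb;
}
```

With 16 shapes at levels 0 through 15 and a frame level of 5, only slot 5 receives a draw. This matches the single draw call per frame that the GPU path issues.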
Optimize an Indirect Command Buffer
Each draw command sets its own buffers and can set its own render pipeline state, so some of this state can be redundant across commands. After encoding commands, the sample calls the optimize method to encode a blit operation that optimizes the contents of _icb. When Metal executes this operation, it attempts to remove redundant state within the specified range of the indirect command buffer. This reduces the cost of executing that state.
If the sample encodes _icb with the CPU, it then encodes the optimization operation in a separate command buffer.
If the sample encodes _icb with the GPU, it encodes the optimization operation in the same command buffer as that encoding work.
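The article doesn’t say exactly which state Metal removes. The following C++ code is only a hypothetical illustration of one kind of redundancy: a command that rebinds the buffer already bound at the same index by the previous command. It counts binding operations with and without skipping those repeats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not Metal API): each command binds two buffers, then
// draws. Buffer IDs are assumed to be non-negative; -1 means "unbound".
struct Cmd {
    int buffers[2];  // buffer IDs bound at indices 0 and 1
};

// Binding operations if every command rebinds both of its buffers.
std::size_t naiveBindCount(const std::vector<Cmd>& cmds) {
    return cmds.size() * 2;
}

// Binding operations in [first, first + count) if a binding that repeats
// the one already in effect from the previous command is skipped.
std::size_t dedupedBindCount(const std::vector<Cmd>& cmds,
                             std::size_t first, std::size_t count) {
    assert(first + count <= cmds.size());
    std::size_t binds = 0;
    int current[2] = {-1, -1};  // nothing bound at the start of the range
    for (std::size_t i = first; i < first + count; ++i) {
        for (int slot = 0; slot < 2; ++slot) {
            if (cmds[i].buffers[slot] != current[slot]) {
                current[slot] = cmds[i].buffers[slot];
                ++binds;
            }
        }
    }
    return binds;
}
```

Consider 16 commands that each bind a unique vertex buffer and the same shared buffer. The naive count is 32 bindings. The deduplicated count is 17: 16 vertex-buffer bindings plus one binding of the shared buffer.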
Execute an Indirect Command Buffer
The sample calls the execute method to execute the commands in _icb, regardless of whether _icb was encoded with the CPU or GPU.
Similar to the arguments in an argument buffer, the sample calls the use method to indicate that the GPU can access the resources within an indirect command buffer.
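The commands in _icb reference buffers that the parent encoder never binds directly, so the app has to know which resources to declare before execution. This hypothetical C++ helper collects the distinct buffer IDs referenced by a range of commands. A real app would then pass each corresponding resource to the render encoder’s use method. Cmd and buffersToDeclare are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model (not Metal API): buffer IDs bound by one command.
struct Cmd {
    int buffers[2];
};

// Collects each distinct buffer referenced by commands in
// [first, first + count): the set to declare before executing that range.
std::set<int> buffersToDeclare(const std::vector<Cmd>& cmds,
                               std::size_t first, std::size_t count) {
    assert(first + count <= cmds.size());
    std::set<int> used;
    for (std::size_t i = first; i < first + count; ++i) {
        used.insert(cmds[i].buffers[0]);
        used.insert(cmds[i].buffers[1]);
    }
    return used;
}
```

Take 16 commands that each bind a unique vertex buffer plus one shared buffer. The set has 17 entries, so the shared buffer is declared once rather than 16 times.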
The sample continues to execute _icb multiple times in subsequent render command encoders, as long as _icb contains valid commands to be executed.