Sample Code

Basic Indirect Command Buffers

Demonstrates how to encode an indirect command buffer with the CPU or GPU, and then execute its commands in a separate render command encoder.



An indirect command buffer represents a group of commands that can be encoded with the CPU or GPU and executed repeatedly across multiple render command encoders. You use indirect command buffers to reduce CPU overhead, simplify command organization, and implement GPU-driven pipelines.

In this sample, you’ll learn how to define, create, encode, and execute commands in an indirect command buffer. In particular, you’ll learn about the different Metal API and Metal shading language features that you use to encode indirect command buffers. The sample renders a series of 16 shapes using a vertex buffer, uniform buffer, and draw call encoded into an indirect command buffer.

CPU Overhead and Indirect Command Buffers

Metal commands are efficient, incurring minimal CPU overhead when apps access the GPU. Each command does incur some overhead, however; to further reduce the amount, use these strategies:

  • Do more GPU work with fewer CPU commands.

  • Avoid repeating expensive CPU commands.

Metal’s indirect command buffer feature reduces the number and performance cost of CPU commands in your app’s critical path (for example, in your app’s render loop). An indirect command buffer allows you to group and encode multiple commands within a single object, instead of encoding each command individually.

By encoding indirect command buffers with the CPU, you can shift a significant amount of CPU overhead from your app’s critical path to its initial setup. By encoding indirect command buffers with the GPU, you can avoid expensive synchronization overhead between CPU encoding and GPU execution.

Individual Commands Versus Indirect Command Buffers

Metal apps, particularly games, typically contain multiple render commands, each associated with a set of render states, buffers, and draw calls. To execute these commands for a render pass, apps first encode them into a render command encoder within a command buffer.

You encode individual commands into a render command encoder by calling MTLRenderCommandEncoder methods such as setVertexBuffer:offset:atIndex: or drawPrimitives:vertexStart:vertexCount:instanceCount:baseInstance:.
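
For comparison with the indirect approach, individual encoding might look like the following minimal sketch. The helper name, its parameters, and the buffer indices 0 and 1 are hypothetical and not taken from the sample; only the MTLRenderCommandEncoder methods are Metal API.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: encodes one draw per shape directly into the encoder.
// The CPU repeats all of this work every frame.
static void EncodeShapesIndividually(id<MTLRenderCommandEncoder> renderEncoder,
                                     NSArray<id<MTLBuffer>> *vertexBuffers,
                                     id<MTLBuffer> uniformBuffer,
                                     NSUInteger vertexStride)
{
    // The uniform data is shared by all shapes, so bind it once
    // (index 1 is an assumption for this sketch).
    [renderEncoder setVertexBuffer:uniformBuffer offset:0 atIndex:1];

    for (id<MTLBuffer> vertexBuffer in vertexBuffers)
    {
        NSUInteger vertexCount = vertexBuffer.length / vertexStride;

        // Per-shape vertex data (index 0 is an assumption for this sketch).
        [renderEncoder setVertexBuffer:vertexBuffer offset:0 atIndex:0];
        [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip
                          vertexStart:0
                          vertexCount:vertexCount
                        instanceCount:1
                         baseInstance:0];
    }
}
```

Every call in the loop is CPU work on the critical path, which is the cost that grouping commands into an indirect command buffer is meant to reduce.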

Layout diagram that shows render commands encoded individually.

Encoding numerous commands individually can become tedious and expensive, especially for large apps or games. Instead, you can encode related commands into an indirect command buffer and then set that entire object as a single command for a render pass. This approach greatly reduces CPU overhead and still encodes the intended commands for GPU execution.

Indirect command buffers are represented as MTLIndirectCommandBuffer objects. You encode them into a render command encoder by calling the executeCommandsInBuffer:withRange: method for each indirect command buffer.

Layout diagram that shows render commands encoded as grouped commands within an indirect command buffer, which is encoded as an individual command.

Define Render Commands and Inherited Render State

For the indirect command buffer, _icb, the sample defines render commands that:

  1. Set the vertex buffer (using unique vertex data for each shape)

  2. Set the uniform buffer (with common transformation data for all shapes)

  3. Draw a triangle strip

The sample encodes these commands differently depending on whether the CPU or the GPU performs the encoding, but both paths encode the same set of commands into the indirect command buffer.

The sample also allows _icb to inherit the render pipeline state from its parent encoder, renderEncoder. Furthermore, _icb implicitly inherits any render state that can’t be encoded into it, such as the cull mode and depth or stencil state for the render pass.
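
The inheritance described above can be sketched as follows. The helper name and parameters are hypothetical, and the cull mode value is an arbitrary choice for illustration; the sample's own values may differ.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: state set on the parent encoder before execution is
// the state that the indirect command buffer's draws run with.
static void ExecuteWithInheritedState(id<MTLRenderCommandEncoder> renderEncoder,
                                      id<MTLRenderPipelineState> pipelineState,
                                      id<MTLDepthStencilState> depthState,
                                      id<MTLIndirectCommandBuffer> icb,
                                      NSUInteger commandCount)
{
    // Inherited because the descriptor sets inheritPipelineState.
    [renderEncoder setRenderPipelineState:pipelineState];

    // Implicitly inherited: this state can't be encoded into an
    // indirect command buffer. MTLCullModeBack is an assumption.
    [renderEncoder setCullMode:MTLCullModeBack];
    [renderEncoder setDepthStencilState:depthState];

    [renderEncoder executeCommandsInBuffer:icb
                                 withRange:NSMakeRange(0, commandCount)];
}
```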

Create an Indirect Command Buffer

The sample creates _icb from a MTLIndirectCommandBufferDescriptor, which defines the features and limits of an indirect command buffer.

MTLIndirectCommandBufferDescriptor* icbDescriptor = [[MTLIndirectCommandBufferDescriptor alloc] init];

icbDescriptor.commandTypes = MTLIndirectCommandTypeDraw;
icbDescriptor.inheritPipelineState = TRUE;
icbDescriptor.inheritBuffers = FALSE;
icbDescriptor.maxVertexBufferBindCount = 2;
icbDescriptor.maxFragmentBufferBindCount = 0;

_icb = [_device newIndirectCommandBufferWithDescriptor:icbDescriptor maxCommandCount:AAPLNumShapes options:0];

The sample specifies the types of commands, commandTypes, and the maximum number of commands, maxCommandCount, so that Metal reserves enough space in memory for the sample to encode _icb successfully (with the CPU or GPU).
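
Because inheritPipelineState is set, the draws in _icb run with a pipeline that the parent encoder provides. MTLRenderPipelineDescriptor exposes a supportIndirectCommandBuffers property for pipelines used this way. The sketch below assumes the pipeline needs that flag; the helper name, its parameters, and the shader function names are hypothetical.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: builds a render pipeline flagged for use with
// indirect command buffers.
static id<MTLRenderPipelineState> MakeICBCompatiblePipeline(id<MTLDevice> device,
                                                            id<MTLLibrary> library,
                                                            MTLPixelFormat colorFormat,
                                                            MTLPixelFormat depthFormat,
                                                            NSError **error)
{
    MTLRenderPipelineDescriptor *desc = [[MTLRenderPipelineDescriptor alloc] init];

    // Shader names are placeholders for this sketch.
    desc.vertexFunction   = [library newFunctionWithName:@"vertexShader"];
    desc.fragmentFunction = [library newFunctionWithName:@"fragmentShader"];
    desc.colorAttachments[0].pixelFormat = colorFormat;
    desc.depthAttachmentPixelFormat = depthFormat;

    // Marks the pipeline as usable by commands in an indirect command buffer.
    desc.supportIndirectCommandBuffers = YES;

    return [device newRenderPipelineStateWithDescriptor:desc error:error];
}
```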

Encode an Indirect Command Buffer with the CPU

From the CPU, the sample encodes commands into _icb with a MTLIndirectRenderCommand object. For each shape to be rendered, the sample encodes two setVertexBuffer:offset:atIndex: commands and one drawPrimitives:vertexStart:vertexCount:instanceCount:baseInstance: command.

for (int indx = 0; indx < AAPLNumShapes; indx++)
{
    // Number of vertices in this shape's vertex buffer.
    NSUInteger vertexCount = _vertexBuffer[indx].length/sizeof(AAPLVertex);

    id<MTLIndirectRenderCommand> cmd = [_icb indirectRenderCommandAtIndex:indx];

    [cmd setVertexBuffer:_vertexBuffer[indx] offset:0 atIndex:AAPLBufferIndexVertices];
    [cmd setVertexBuffer:_uniformBuffers offset:0 atIndex:AAPLBufferIndexUniforms];

    [cmd drawPrimitives:MTLPrimitiveTypeTriangleStrip
            vertexStart:0
            vertexCount:vertexCount
          instanceCount:1
           baseInstance:0];
}

The sample performs this encoding only once, before encoding any subsequent render commands. _icb contains a total of 16 draw calls, one for each shape to be rendered. Each draw call references the same uniform data, _uniformBuffers, but different vertex data, _vertexBuffer[indx]. Although the CPU encodes data only once, the sample issues 16 draw calls per frame.

Layout diagram that shows the commands encoded into an indirect command buffer with the CPU.

Because _icb inherits its render state from its parent encoder, it implicitly inherits the MTLCompareFunctionEqual value set as the depth comparison function in the parent encoder's depth-stencil state.

MTLDepthStencilDescriptor *depthStateDesc = [[MTLDepthStencilDescriptor alloc] init];
depthStateDesc.depthCompareFunction = MTLCompareFunctionEqual;
depthStateDesc.depthWriteEnabled = YES;
_depthState = [_device newDepthStencilStateWithDescriptor:depthStateDesc];
[renderEncoder setDepthStencilState:_depthState];

Therefore, all 16 draw calls are executed in a given render pass, but only one draw call passes the depth test (determined by the depth value of each shape).

Encode an Indirect Command Buffer with the GPU

An indirect command buffer can be passed to a graphics or compute function only through an argument buffer, as a scalar or within an array. Within the argument buffer structure, an indirect command buffer occupies a specific index value, [[id(n)]], just like any other argument does.

The sample encodes _icb into _kernelShaderArgumentBuffer, which is defined with the arguments structure where AAPLArgumentBufferIDICB is the index value for the indirect command buffer.

typedef struct arguments {
    command_buffer cmd_buffer [[ id(AAPLArgumentBufferIDICB) ]];
    constant AAPLUniforms * uniforms [[ id(AAPLArgumentBufferIDUniformBuffer) ]];
    float depth [[ id(AAPLArgumentBufferIDDepth) ]];
    array<device float *, AAPLNumShapes> vertex_buffers [[ id(AAPLArgumentBufferIDVertexBuffer) ]];
    array<uint8_t, AAPLNumShapes> vertex_num [[ id(AAPLArgumentBufferIDVertexNumBuffer) ]];
} arguments;

[argumentEncoder setArgumentBuffer:_kernelShaderArgumentBuffer offset:0];
[argumentEncoder setIndirectCommandBuffer:_icb atIndex:AAPLArgumentBufferIDICB];
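
The excerpt above doesn't show where argumentEncoder and _kernelShaderArgumentBuffer come from. One way to create them is sketched below, assuming the encoder is derived from the kernel function's argument buffer index. The helper name, its parameters, and the shared storage mode are assumptions, not details from the sample.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: creates an argument buffer for the kernel and
// encodes the indirect command buffer into it.
static id<MTLBuffer> MakeKernelArgumentBuffer(id<MTLDevice> device,
                                              id<MTLFunction> kernelFunction,
                                              NSUInteger argumentBufferIndex,
                                              id<MTLIndirectCommandBuffer> icb,
                                              NSUInteger icbArgumentID)
{
    // The encoder knows the layout of the argument structure bound at
    // [[ buffer(argumentBufferIndex) ]] in the kernel function.
    id<MTLArgumentEncoder> argumentEncoder =
        [kernelFunction newArgumentEncoderWithBufferIndex:argumentBufferIndex];

    // Size the buffer from the encoder; shared storage is an assumption.
    id<MTLBuffer> argumentBuffer =
        [device newBufferWithLength:argumentEncoder.encodedLength
                            options:MTLResourceStorageModeShared];

    [argumentEncoder setArgumentBuffer:argumentBuffer offset:0];
    [argumentEncoder setIndirectCommandBuffer:icb atIndex:icbArgumentID];

    return argumentBuffer;
}
```

The remaining arguments (uniforms, depth, vertex buffers, and vertex counts) would be encoded into the same buffer with the corresponding MTLArgumentEncoder methods.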

Additionally, the sample calls the useResource:usage: method to indicate that the GPU can access _icb.

[computeEncoder useResource:_icb usage:MTLResourceUsageWrite];

The sample passes _kernelShaderArgumentBuffer, which contains _icb, to the kernelShader kernel function.

[computeEncoder setBuffer:_kernelShaderArgumentBuffer offset:0 atIndex:AAPLVertexBufferIndexArgument];

kernel void kernelShader(uint cmd_idx [[ thread_position_in_threadgroup ]],
                         device arguments &args [[ buffer(AAPLVertexBufferIndexArgument) ]])

From the GPU, the sample encodes commands into _icb, through cmd_buffer, with a render_command structure. For each shape to be rendered, the sample executes the kernelShader kernel function. In this function, if the depth value of the current shape matches the depth value of the current frame, the sample encodes two set_vertex_buffer commands and one draw_primitives command.

render_command cmd(args.cmd_buffer, cmd_idx);
// Encode a draw only for the shape whose depth matches the current frame's depth.
if (args.depth == (((device AAPLVertex *)args.vertex_buffers[cmd_idx])[0].position.z) / AAPLNumShapes)
{
    cmd.set_vertex_buffer(args.vertex_buffers[cmd_idx], AAPLBufferIndexVertices);
    cmd.set_vertex_buffer(args.uniforms, AAPLBufferIndexUniforms);
    cmd.draw_primitives(primitive_type::triangle_strip, 0, args.vertex_num[cmd_idx], 1, 0);
}

The sample performs this encoding once every 16 frames, in order to encode different data into the indirect command buffer for those frames. _icb contains a single draw call for the shape to be rendered. This draw call references the uniform data in uniforms and the vertex data in vertex_buffers[cmd_idx]. Although the GPU encodes data multiple times, the sample issues only one draw call per frame.
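
The CPU-side work that drives this GPU encoding can be sketched as follows. This is an illustration under stated assumptions rather than the sample's actual code: the helper name and parameters are hypothetical, and the reset and optimize passes are optional steps from MTLBlitCommandEncoder that the excerpt doesn't show. The reset pass reflects the fact that the kernel encodes only the matching shape, so any slot it skips would otherwise keep a previously encoded command.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: resets the indirect command buffer, runs the
// encoding kernel with one thread per command slot, then optimizes.
static void EncodeICBOnGPU(id<MTLCommandBuffer> commandBuffer,
                           id<MTLComputePipelineState> kernelPipeline,
                           id<MTLBuffer> argumentBuffer,
                           NSUInteger argumentBufferIndex,
                           NSArray<id<MTLBuffer>> *referencedBuffers,
                           id<MTLIndirectCommandBuffer> icb,
                           NSUInteger commandCount)
{
    NSRange range = NSMakeRange(0, commandCount);

    // Clear previously encoded commands so skipped slots draw nothing.
    id<MTLBlitCommandEncoder> resetEncoder = [commandBuffer blitCommandEncoder];
    [resetEncoder resetCommandsInBuffer:icb withRange:range];
    [resetEncoder endEncoding];

    id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
    [computeEncoder setComputePipelineState:kernelPipeline];
    [computeEncoder setBuffer:argumentBuffer offset:0 atIndex:argumentBufferIndex];

    // The kernel writes the ICB and reads buffers reached through the
    // argument buffer, so declare all of them to Metal.
    [computeEncoder useResource:icb usage:MTLResourceUsageWrite];
    for (id<MTLBuffer> buffer in referencedBuffers)
    {
        [computeEncoder useResource:buffer usage:MTLResourceUsageRead];
    }

    // A single threadgroup, because the kernel indexes commands by
    // thread_position_in_threadgroup. commandCount must not exceed
    // kernelPipeline.maxTotalThreadsPerThreadgroup.
    [computeEncoder dispatchThreadgroups:MTLSizeMake(1, 1, 1)
                   threadsPerThreadgroup:MTLSizeMake(commandCount, 1, 1)];
    [computeEncoder endEncoding];

    // Optional: let Metal compact the encoded commands before execution.
    id<MTLBlitCommandEncoder> optimizeEncoder = [commandBuffer blitCommandEncoder];
    [optimizeEncoder optimizeIndirectCommandBuffer:icb withRange:range];
    [optimizeEncoder endEncoding];
}
```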

Layout diagram that shows the commands encoded into an indirect command buffer with the GPU.

Execute an Indirect Command Buffer

The sample calls the executeCommandsInBuffer:withRange: method to execute the commands in _icb, whether _icb was encoded with the CPU or GPU.

[renderEncoder executeCommandsInBuffer:_icb withRange:NSMakeRange(0, AAPLNumShapes)];

As with the resources referenced by an argument buffer, the sample calls the useResource:usage: method to indicate that the GPU can access the resources referenced within an indirect command buffer.

[renderEncoder useResource:_uniformBuffers usage:MTLResourceUsageRead];
[renderEncoder useResource:_vertexBuffer[_currentFrameIndex] usage:MTLResourceUsageRead];

The sample continues to execute _icb multiple times in subsequent render command encoders, as long as _icb contains valid commands to be executed.

