Generating Multiple Output Vertex Streams from One Input Stream

Render efficiently to multiple layers or viewports.


Sometimes you need to generate multiple primitives from a single set of input data. For example, if you are implementing a graphics technique like cascaded shadow maps, you might render the same model objects multiple times, once for each cascade level. Although you could do so with multiple render passes, this approach requires you to encode the same drawing commands for each render pass, and may require the GPU to fetch the input data from memory multiple times.

Using vertex amplification, you create drawing commands that generate multiple vertex streams from your input data.

Illustration showing a single input stream of vertices amplified to generate multiple output streams.

When the GPU executes a command with vertex amplification, it sends multiple primitives to the rasterizer. The GPU calls your vertex function multiple times, once per vertex for each output stream. However, if a calculation for a field can be shared because the calculation is the same on all of the vertex outputs, the GPU calculates the value only once and shares it, reducing the GPU's workload. When you write your vertex function, the compiler automatically detects when custom values must be calculated separately, but you can also explicitly mark calculations as shared.

Vertex amplification is usually used in conjunction with layered rendering or rendering to multiple viewports, so that the GPU renders each output primitive to a different texture layer or viewport. For more information, see Rendering to Multiple Texture Slices in a Draw Command or Rendering to Multiple Viewports in a Draw Command.

Check for Vertex Amplification Support

Not all GPUs support vertex amplification. Check for support by calling the supportsVertexAmplificationCount(_:) method on a device object, passing in the number of output streams you want to create. If the device object can support that many streams, this method returns true.

BOOL useVertexAmplification = [_device supportsVertexAmplificationCount:2];
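
Because the support check takes the requested count as its argument, one way to pick a count is to probe downward from the number of streams you want. The following is a minimal CPU-side sketch of that idea in C++, not Metal code: largestSupportedCount and the supportsCount predicate are hypothetical names, and the predicate stands in for a call to the device's support check. The sketch assumes a count of 1 (no amplification) is always acceptable, which matches the default behavior of a single vertex function call.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: returns the largest count in [1, desiredCount] that
// the predicate accepts. The predicate stands in for the device query.
// A count of 1 means "no amplification" and is assumed always available.
int largestSupportedCount(int desiredCount,
                          const std::function<bool(int)> &supportsCount) {
    assert(desiredCount >= 1);
    for (int count = desiredCount; count > 1; --count) {
        if (supportsCount(count)) {
            return count;
        }
    }
    return 1; // Fall back to a single output stream.
}
```

If the result is smaller than the number of views you need, the remaining views still have to be covered some other way, such as with additional render passes.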

Add Vertex Amplification to Your Vertex Shader

To implement vertex amplification:

  • Add the amplification_count attribute to a shader argument to get the number of requested output streams. You'll set this count when you encode a draw command, as shown below in Set Amplification Information Before Encoding a Draw Command.

  • Add the amplification_id attribute to a parameter to get the index of the output stream. The indices have values from 0 to amplification_count-1.

By default, the GPU calls your vertex function once, with an amplification count of 1 and an amplification index of 0.

To customize the behavior for each output stream, pass in per-stream input data, and use the index of the output stream to select the data. For example, the following shader takes the count and index as inputs, as well as an array of projection matrices, one for each output stream. It uses the index to select which matrix to use for that output.

vertex MyVertexOut myVertex(MyVertexIn in [[stage_in]],
                            constant float4x4 view_proj[MAX_AMP],
                            ushort amp_id [[amplification_id]],
                            ushort amp_count [[amplification_count]])
{
    MyVertexOut vert;
    // Select the projection matrix for this output stream.
    vert.position  = view_proj[amp_id] * in.position;
    return vert;
}

Determine Which Calculations Must Be Distinct

A major benefit of vertex amplification over instancing is how it optimizes the work you send to the GPU. The GPU reads the input stream only once, and performs calculations as needed to produce the output streams. If the compiler determines that a particular calculation is shared across the vertex output streams, it calculates that output once. In some cases, you need to explicitly mark values in your shader so that the compiler knows an output value is shared.

Your output vertex data always includes a field with the position attribute, and Metal always marks this data as nonshared. If you assign another built-in attribute to any field, Metal marks that field as shared.

The compiler marks other output values as shared only if it can prove that your shader computes the output values the same way for all output streams. For example, if the compiler encounters a calculated field that's dependent on the amplification ID, it marks that field as nonshared:

MyVertexOut vert;
vert.ampData   = data[amp_id]; // not shared

On the other hand, if the value is just copied from the input stream, the compiler marks the output as shared:

MyVertexOut vert;
vert.normal    = in.normal; // deduced shared

To explicitly tell the compiler to calculate an output field once, add the shared attribute to the field.
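
To make the shared and nonshared distinction concrete, here is a small CPU-side model in C++. It is only a conceptual sketch, not Metal code and not a description of how any particular GPU schedules work: ModelVertexIn, ModelVertexOut, AmplifiedResult, and amplifyVertex are hypothetical names, and a single float scale per stream stands in for the per-stream projection matrix. The model evaluates the stream-independent field once and the stream-dependent field once per output stream, and it counts both.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side conceptual model only; none of these names are Metal API.
struct ModelVertexIn  { float x; float normal; };
struct ModelVertexOut { float position; float normal; };

struct AmplifiedResult {
    std::vector<ModelVertexOut> outputs; // One entry per output stream.
    int positionEvaluations = 0;         // Nonshared work.
    int normalEvaluations = 0;           // Shared work.
};

// Amplifies one input vertex into scales.size() outputs.
// Each scale stands in for one per-stream projection matrix.
AmplifiedResult amplifyVertex(const ModelVertexIn &in,
                              const std::vector<float> &scales) {
    assert(!scales.empty());
    AmplifiedResult result;

    // Shared field: independent of the stream index, so evaluate it once.
    const float sharedNormal = in.normal;
    result.normalEvaluations = 1;

    for (std::size_t ampId = 0; ampId < scales.size(); ++ampId) {
        // Nonshared field: depends on ampId, so evaluate it per stream.
        const float position = scales[ampId] * in.x;
        ++result.positionEvaluations;
        result.outputs.push_back({position, sharedNormal});
    }
    return result;
}
```

With two streams, the model evaluates the position twice and the normal once, which mirrors the position field being nonshared and a copied input value being shared.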

Configure the Pipeline State Object for Vertex Amplification

When you create a render pipeline state object for your shaders, set the maxVertexAmplificationCount property on the MTLRenderPipelineDescriptor to the maximum number of output streams that your pipeline can handle.

MTLRenderPipelineDescriptor *pipelineStateDescriptor = [[MTLRenderPipelineDescriptor alloc] init];
pipelineStateDescriptor.vertexFunction = vertexFunction;
pipelineStateDescriptor.fragmentFunction = fragmentFunction;
pipelineStateDescriptor.maxVertexAmplificationCount = 2;

Set Amplification Information Before Encoding a Draw Command

To use vertex amplification in a draw command, call setVertexAmplificationCount(_:viewMappings:) before encoding the command, specifying the number of output streams to generate. The count must be less than or equal to the maximum value you set when you created the render pipeline.

In addition, because vertex amplification is almost always used to render to different layers or viewports, you typically must specify the index of the target for each output vertex. The render target and viewport array indices are always calculated once in the vertex shader (because they use a built-in attribute, as described above). However, you can modify the final indices for each output primitive by creating an array of offsets and passing it as the second parameter.

The following code creates two mappings and configures the draw call to use vertex amplification:

MTLVertexAmplificationViewMapping mappings[2];
mappings[0].viewportArrayIndexOffset = 1;
mappings[0].renderTargetArrayIndexOffset = 0;
mappings[1].viewportArrayIndexOffset = 2;
mappings[1].renderTargetArrayIndexOffset = 1;

[renderEncoder setVertexAmplificationCount:2 viewMappings: mappings];

The following vertex shader sets the viewport array index to 1. After the GPU runs your shader, it adds the offsets provided above, so the primitive for the first output stream has a viewport array index of 2, and the second has a viewport array index of 3.

struct MyVertexOut
{
    float4 position [[position]];
    ushort viewport [[viewport_array_index]]; // Implicitly shared.
};

vertex MyVertexOut myVertex(MyVertexIn in [[stage_in]],
                            constant float4x4 view_proj[MAX_AMP],
                            ushort amp_id [[amplification_id]],
                            ushort amp_count [[amplification_count]])
{
    MyVertexOut vert;
    vert.position  = view_proj[amp_id] * in.position;
    vert.viewport  = 1;
    return vert;
}

fragment float4 myFragment(MyVertexOut in [[ stage_in ]],
                           ushort amp_id [[amplification_id]],
                           ushort amp_count [[amplification_count]])
{
    // The provided view offsets were: {{1,0},{2,1}}
    //   when amp_id == 0, in.viewport == 2
    //   when amp_id == 1, in.viewport == 3
    return float4(1.0); // Placeholder color; the fragment body is not shown in this article.
}
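
The offset arithmetic can be checked on the CPU. The following C++ sketch models only the addition described above and is not Metal code: ViewMappingModel, ResolvedTarget, and resolveTargets are hypothetical names that mirror the two offset fields of a view mapping. It assumes the shader leaves the render target array index at 0, because the shader above writes only the viewport array index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the two offsets in a view mapping.
struct ViewMappingModel {
    std::uint32_t viewportArrayIndexOffset;
    std::uint32_t renderTargetArrayIndexOffset;
};

struct ResolvedTarget {
    std::uint32_t viewportArrayIndex;
    std::uint32_t renderTargetArrayIndex;
};

// Adds each stream's offsets to the indices the vertex shader wrote once.
std::vector<ResolvedTarget> resolveTargets(
        std::uint32_t shaderViewportIndex,
        std::uint32_t shaderRenderTargetIndex,
        const std::vector<ViewMappingModel> &mappings) {
    assert(!mappings.empty());
    std::vector<ResolvedTarget> resolved;
    resolved.reserve(mappings.size());
    for (const ViewMappingModel &m : mappings) {
        resolved.push_back(
            {shaderViewportIndex + m.viewportArrayIndexOffset,
             shaderRenderTargetIndex + m.renderTargetArrayIndexOffset});
    }
    return resolved;
}
```

Calling resolveTargets(1, 0, mappings) with the offsets {1,0} and {2,1} yields viewport array indices 2 and 3, matching the comments in the fragment function.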

Combine Vertex Amplification with Instancing

Primitive instancing is another way to generate multiple vertex output streams from a single stream of input data. You provide shared vertex data and data that specifies how you want to render each instance of the model. For example, you might use a single set of model data, but provide different pose data to animate each version of the model separately.

When you execute a draw call with an instance count of 10, the GPU generates ten output streams. Primitive instancing, unlike vertex amplification, recalculates all of the vertex outputs for each call to the vertex function.

You can combine vertex amplification and primitive instancing safely and easily. Use this combination to separate instancing concepts (such as the number of characters in a scene) from rendering concepts (such as the distinction between shadow map targets). Metal generates a number of output streams equal to the product of the vertex amplification count and the instance count. For example, if you execute a draw call with a vertex amplification count of 2 and an instance count of 10, the GPU calls your vertex function 20 times for each input vertex, twice for each instance. It calculates the shared output values from vertex amplification once per instance.
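
As a quick check of that arithmetic, the C++ sketch below computes both numbers for a single input vertex. It is a bookkeeping model only, not Metal code, and InvocationCounts and countInvocations are hypothetical names.

```cpp
#include <cassert>

// Hypothetical bookkeeping type for this sketch; not part of Metal.
struct InvocationCounts {
    int outputStreams;     // Amplification count times instance count.
    int sharedEvaluations; // Shared outputs are calculated once per instance.
};

// Counts work for one input vertex under the rules described above.
InvocationCounts countInvocations(int amplificationCount, int instanceCount) {
    assert(amplificationCount >= 1 && instanceCount >= 1);
    return {amplificationCount * instanceCount, instanceCount};
}
```

For an amplification count of 2 and an instance count of 10, countInvocations(2, 10) gives 20 output streams and 10 evaluations of each shared value.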

See Also


Creating and Sampling Textures

Load image data into a texture and apply it to a quadrangle.

Calculating Primitive Visibility Using Depth Testing

Determine which pixels are visible in a scene by using a depth texture.

Customizing Render Pass Setup

Render into an offscreen texture by creating a custom render pass.

Render Pipelines

Specify how graphics primitives should be rendered.

class MTLRenderPassDescriptor

A group of render targets that hold the results of a render pass.

protocol MTLRenderCommandEncoder

The object to use for encoding commands for a render pass.

protocol MTLParallelRenderCommandEncoder

An object that splits up a single render pass so that it can be simultaneously encoded from multiple threads.

Model I/O

Specify precise locations within the textures associated with graphics processing.