Get started with Metal shader converter

Metal shader converter converts shader intermediate representations in LLVM IR bytecode into bytecode suitable to be loaded into Metal. It’s available as a library and a standalone executable. All the functionality exposed through the library interface is available via the standalone executable.


This document describes the IR conversion process; explains the binding model, synchronization considerations, and reflection capabilities; and then provides general guidance and examples.

This document also presents metal_irconverter_runtime.h, a lightweight, header-only library that accompanies Metal shader converter. metal_irconverter_runtime.h helps perform common tasks when working with pipeline states built from IR generated by Metal shader converter.

System requirements

Metal shader converter requires macOS 13 Ventura or later and Xcode 15.

Metal shader converter for Windows requires Microsoft Windows 10 or later and Microsoft Visual Studio 2019.

Metal libraries built using Metal shader converter tools require a device that supports Argument Buffers Tier 2, running macOS 14 Sonoma or later, or iOS 17 or later. If you build a Metal library for earlier OS versions, not all features are supported.

Converting IR

To convert shaders from DXIL to Metal IR, use Metal shader converter either as a standalone executable (metal-shaderconverter) or as a dynamic library (libmetalirconverter). Both the executable and libmetalirconverter support Windows and macOS.

Standalone executable

The Metal shader converter executable offers several options to customize code generation. In its most basic form, Metal shader converter takes a DXIL file as input and produces a metallib.

% metal-shaderconverter shader.dxil -o ./shader.metallib

By default, Metal shader converter generates metallib files that target the latest version of macOS at the time of release. You can inspect this version by running metal-shaderconverter --version. Make sure your Xcode is always up to date.

Run metal-shaderconverter --help to access all command-line options.

Dynamic library

libmetalirconverter offers a C interface for easy integration into C, C++, Objective-C, and Swift codebases.

IRCompiler* pCompiler = IRCompilerCreate();
IRCompilerSetEntryPointName(pCompiler, "MainVSEntry");

IRObject* pDXIL = IRObjectCreateFromDXIL(bytecode, size, IRBytecodeOwnershipNone);

// Compile DXIL to Metal IR:
IRError* pError = nullptr;
IRObject* pOutIR = IRCompilerAllocCompileAndLink(pCompiler, NULL, 0, pDXIL, &pError);

if (!pOutIR)
{
  // Inspect pError to determine the cause, then release it.
  IRErrorDestroy( pError );
  // ... handle the failure (for example, clean up and return early) ...
}

// Retrieve the metallib for the stage you compiled (a vertex shader here):
IRMetalLibBinary* pMetallib = IRMetalLibBinaryCreate();
IRObjectGetMetalLibBinary(pOutIR, IRShaderStageVertex, pMetallib);
size_t metallibSize = IRMetalLibGetBytecodeSize(pMetallib);
uint8_t* metallib = new uint8_t[metallibSize];
IRMetalLibGetBytecode(pMetallib, metallib);

// Store the metallib to a custom format or to disk, or use it to create a MTLLibrary.

delete [] metallib;

// Release the converter objects:
IRMetalLibBinaryDestroy(pMetallib);
IRObjectDestroy(pOutIR);
IRObjectDestroy(pDXIL);
IRCompilerDestroy(pCompiler);

Although you typically use the library to implement asset conversion and packaging programs, you may also use it at runtime during the bring-up process of your game.

Once your game runs on Metal, start converting your shaders ahead of time and directly distributing metallibs in your game.

Create a MTLLibrary instance from metallib bytecode

After you retrieve the metallib data and its corresponding size, create a MTLLibrary via a dispatch_data_t object:

// Use metallib (at runtime):

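NSError* __autoreleasing error = nil;
// A NULL destructor (DISPATCH_DATA_DESTRUCTOR_DEFAULT) makes dispatch copy the bytes,
// so you can free metallib once this call returns.
dispatch_data_t data = 
	dispatch_data_create(metallib, metallibSize, dispatch_get_main_queue(), NULL);

id<MTLLibrary> lib = [device newLibraryWithData:data error:&error];
if (!lib)
{
	// Inspect error to determine the cause.
}

// lib and data are released by ARC.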

Tip: On macOS, you can use the IRMetalLibGetBytecodeData function to obtain a direct pointer into the metallib's bytecode, avoiding a copy operation.

Multithreading considerations

Metal shader converter supports multithreading the IR translation process; however, the IRCompiler object isn’t reentrant. Each thread in your program needs to create its own IRCompiler instance to avoid race conditions. Once a compilation completes, that thread can reuse its compiler instance to convert further IR.
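The one-compiler-per-thread rule can be sketched as a small C++ helper. This is a minimal sketch, not part of libmetalirconverter: convertInParallel, makeCompiler, and convert are hypothetical names. In real code, makeCompiler would wrap IRCompilerCreate() (releasing with IRCompilerDestroy()) and convert would call IRCompilerAllocCompileAndLink(). Each worker creates exactly one compiler, never shares it, and reuses it for every job assigned to that thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Converts every input on `threadCount` worker threads (hypothetical helper).
// makeCompiler: returns an owning handle, e.g. a std::unique_ptr to a wrapper
//               around IRCompilerCreate()/IRCompilerDestroy().
// convert:      compiles one input with the calling thread's own compiler.
// Both callables are invoked concurrently, so they must be thread-safe.
template <typename MakeCompiler, typename Convert>
std::vector<std::string> convertInParallel(const std::vector<std::string>& inputs,
                                           std::size_t threadCount,
                                           MakeCompiler makeCompiler,
                                           Convert convert) {
    assert(threadCount > 0);
    std::vector<std::string> outputs(inputs.size());
    std::vector<std::thread> workers;
    workers.reserve(threadCount);
    for (std::size_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            // One compiler per thread: never shared, reused for every job below.
            auto compiler = makeCompiler();
            // Thread t handles inputs t, t + threadCount, t + 2 * threadCount, ...
            for (std::size_t i = t; i < inputs.size(); i += threadCount) {
                outputs[i] = convert(*compiler, inputs[i]);
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return outputs;
}
```

Assigning jobs by stride means each output slot is written by exactly one thread, so the results vector needs no locking.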

Binding model

The top-level Argument Buffer

In order to use the Metal IR, you bind global shader resources to your pipeline objects via a “top-level” Argument Buffer. The Metal shader converter offers two mechanisms to control the layout of the resource handles in the top-level Argument Buffer: an explicit mode via root signatures, and an automatic “linear” layout.

The top-level Argument Buffer is a resource shared between the CPU and GPU and, as such, you need to coordinate access to its memory to avoid race conditions. View the synchronization section of this document for best practices.

Explicit layout via root signatures

Use root signatures for maximum flexibility at resource binding time.

When you provide a root signature, Metal shader converter generates a layout for the top-level Argument Buffer that matches your specification. In particular, root signatures allow you to define the following resources in the top-level Argument Buffer:

  • Inline constant data (“root constants”) of arbitrary size.
  • Pointers to Metal resources (“root arguments”). Pointers are 64-bit unsigned values corresponding to the gpuAddress or resourceID of the resource to reference. Note: textures need to always be placed in descriptor tables.
  • Pointers to a resource table (“descriptor table”). In Metal you implement this table via Argument Buffers. Each entry in the table consists of three 64-bit unsigned values: a buffer GPU address, a texture handle, and flags. Use IRDescriptorTableEntry in the runtime companion header (described below) to correctly calculate offsets and cast pointer types.

After Metal shader converter generates its output IR, calculate the offsets of each resource in the top-level Argument Buffer by taking the size of the root constants in bytes (if present) and adding the resource index multiplied by sizeof(uint64_t).
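The following C++ sketch works through that arithmetic and writes a value into a CPU-side staging copy of the top-level Argument Buffer. It assumes, as the text describes, 64-bit slots packed directly after the root constants; topLevelResourceOffset and writeTopLevelResource are hypothetical helpers, not Metal shader converter APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte offset of root argument or descriptor-table pointer `resourceIndex`:
// root constants come first, then one 64-bit value per resource.
constexpr std::size_t topLevelResourceOffset(std::size_t rootConstantBytes,
                                             std::size_t resourceIndex) {
    return rootConstantBytes + resourceIndex * sizeof(std::uint64_t);
}

// Writes a 64-bit GPU address or resource ID into a CPU-side staging copy of
// the top-level Argument Buffer. std::memcpy keeps unaligned offsets well-defined.
void writeTopLevelResource(std::vector<std::uint8_t>& staging,
                           std::size_t rootConstantBytes,
                           std::size_t resourceIndex, std::uint64_t value) {
    const std::size_t offset = topLevelResourceOffset(rootConstantBytes, resourceIndex);
    assert(offset + sizeof(value) <= staging.size());
    std::memcpy(staging.data() + offset, &value, sizeof(value));
}
```

For example, with 16 bytes of root constants, the resource at index 2 starts at 16 + 2 * 8 = 32 bytes.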

When you use root signatures, you need to supply SamplerState objects via a descriptor table.

Use the explicit layout approach when porting root signatures or when your game uses bindless resources.

Automatic linear layout

When you don’t provide a root signature to Metal shader converter at conversion time, it automatically generates a linear layout of resources into the top-level Argument Buffer.

This is a simple layout where the top-level Argument Buffer directly references each resource through a small resource descriptor. Just like descriptor tables, each entry in the Argument Buffer consists of three uint64_t values.

If you use libmetalirconverter, you can reflect the Argument Buffer offsets using function:

void IRShaderReflectionGetResourceLocations(
					   IRShaderReflection* reflection,
					   IRResourceLocation* resourceLocations);

Alternatively, if you use Metal shader converter as a standalone tool, it conveniently writes resource locations into a reflection JSON file.

The runtime companion header, described later in this document, provides a helper struct and functions you can use to encode your resources into this Argument Buffer. These helpers start with the prefix: IRDescriptorTable.

Use the automatic layout mechanism when you don’t need to produce a resource hierarchy and your game doesn’t use bindless resources.

Because this mechanism avoids one level of indirection, it may provide a performance advantage compared to the explicit layout approach.

Resource encoding

When you use root descriptors for your top-level Argument Buffer, you encode resource references into it by writing 64-bit GPU addresses (or Metal resource IDs) at their corresponding offsets within the Argument Buffer.

In addition to root resources, the top-level Argument Buffer may reference “descriptor tables,” which you need to encode using a specific format.

This specific format also applies to resources the top-level Argument Buffer directly references when you generate an automatic linear layout of resources (i.e., when you don’t provide a root signature to Metal shader converter).

The Metal shader converter companion header provides helper functions for encoding resources into descriptor tables. Depending on the type of resource to encode, use IRDescriptorTableSetBuffer(), IRDescriptorTableSetTexture(), or IRDescriptorTableSetSampler() to write resource references into descriptor tables.

You may also implement this encoding yourself without using the runtime header. In this case, resource “descriptors” in descriptor tables always consist of three 64-bit unsigned ints representing the GPU address (for buffers and acceleration structures), the texture resource ID (for textures and sampler bindings), and metadata flags (64 bits) representing:

  • For buffers:
    • The buffer length stored in the low 32 bits.
    • A texture buffer view offset stored in 2 bytes starting at bit 32 (that is, left-shifted by 32 bits). Metal requires texture buffer views to be aligned to 16 bytes; this offset represents the padding necessary to achieve that alignment.
    • Whether the buffer is a typed buffer (1) or not (0) in the high bit (bit 63).
  • For textures:
    • The min LOD clamp in the low 32 bits.
    • 0 in the high 32 bits.
  • For samplers, the LOD bias as a 32-bit floating point number.
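If you implement the encoding yourself, a C++ sketch of the three-value descriptor and its metadata could look like the following. DescriptorEntry and the *Metadata helpers are hypothetical stand-ins for IRDescriptorTableEntry and the companion header’s functions. The sketch assumes the min LOD clamp and LOD bias are stored as raw IEEE-754 float bits in the low 32 bits, and that the caller already knows the texture buffer view offset; check the runtime header before relying on this layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Three 64-bit values per descriptor (hypothetical mirror of IRDescriptorTableEntry).
struct DescriptorEntry {
    std::uint64_t gpuAddress;  // buffers and acceleration structures
    std::uint64_t textureID;   // texture or sampler resource ID
    std::uint64_t metadata;    // per-type flags, encoded below
};
static_assert(sizeof(DescriptorEntry) == 24, "three 64-bit values");
static_assert(sizeof(float) == 4, "32-bit float expected");

// Returns the raw IEEE-754 bit pattern of a float.
inline std::uint32_t floatBits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Buffers: length in bits 0-31, texture buffer view offset in bits 32-47,
// typed-buffer flag in bit 63.
inline std::uint64_t bufferMetadata(std::uint32_t length,
                                    std::uint16_t textureViewOffset,
                                    bool typedBuffer) {
    return static_cast<std::uint64_t>(length)
         | (static_cast<std::uint64_t>(textureViewOffset) << 32)
         | (static_cast<std::uint64_t>(typedBuffer ? 1 : 0) << 63);
}

// Textures: min LOD clamp in the low 32 bits, zero in the high 32 bits.
inline std::uint64_t textureMetadata(float minLODClamp) {
    return floatBits(minLODClamp);
}

// Samplers: LOD bias as a 32-bit float (assumed to sit in the low 32 bits).
inline std::uint64_t samplerMetadata(float lodBias) {
    return floatBits(lodBias);
}
```

Entry i of a descriptor table then starts at byte i * sizeof(DescriptorEntry) from the start of the table.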

Top-level Argument Buffer synchronization

The top-level Argument Buffer is a shared resource that the CPU and GPU both may access simultaneously, and as such, you need to coordinate access to its memory to avoid race conditions.

In the Metal execution model, you first encode the work to perform into a command buffer, and the GPU carries out your commands later, after you commit it. If every draw call modifies the same shared top-level Argument Buffer, then by the time the GPU executes your commands, only the last modification is visible, and every draw call in the command buffer reads it.

Furthermore, when your application handles multiple frames in flight, the CPU could overwrite memory locations the GPU is reading from.

To avoid these situations, you can provide each draw call with its own top-level Argument Buffer. While you can manage these as discrete or MTLHeap-based allocations, doing so requires one Argument Buffer per draw call for each frame in flight, which can become challenging to manage and carries CPU overhead to orchestrate at runtime.

To properly synchronize access without the need to serialize CPU and GPU work, you can use a bump allocator backed by a MTLBuffer, or the Metal setBytes family of functions.

Bump allocator

For the best frame encoding performance, implement a bump allocator backed by a MTLBuffer.

To achieve this, you first allocate a buffer large enough to contain the data for your frame or render pass. Store an offset tracker alongside this buffer.

When you need to obtain memory, you get a pointer into the buffer contents at the offset and increase the offset by the size of the data to write. On Apple GPUs, you need to align this offset to 4 bytes. Repeat the process for each piece of data.

To bind the data to your pipeline, use the setBuffer:offset:atIndex: family of methods with the appropriate offset. Repeat the process for each buffer you write.

Using this technique, and keeping one buffer for each frame in flight, you can completely avoid race conditions without the cost of synchronization primitives.
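A minimal C++ sketch of such a bump allocator, with one allocator per frame in flight, follows. A std::vector stands in for the MTLBuffer contents so the example compiles on its own; BumpAllocator and FrameAllocators are hypothetical names, and in Metal you would write through [buffer contents] and pass the returned offset to setBuffer:offset:atIndex:.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bump allocator over one frame's buffer (std::vector stands in for MTLBuffer).
class BumpAllocator {
public:
    explicit BumpAllocator(std::size_t capacity) : storage_(capacity) {}

    struct Allocation {
        std::uint8_t* cpuPointer;  // where the CPU writes the data
        std::size_t offset;        // offset to pass to setBuffer:offset:atIndex:
    };

    // Reserves `size` bytes at an offset aligned to `alignment`
    // (4 bytes by default, per the Apple GPU requirement described above).
    Allocation allocate(std::size_t size, std::size_t alignment = 4) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
        assert(aligned + size <= storage_.size() && "buffer too small for this frame");
        offset_ = aligned + size;
        return {storage_.data() + aligned, aligned};
    }

    // Only call once the GPU has finished reading this buffer.
    void reset() { offset_ = 0; }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t offset_ = 0;
};

// One buffer per frame in flight: the CPU writes only into the current frame's buffer.
class FrameAllocators {
public:
    FrameAllocators(std::size_t framesInFlight, std::size_t capacityPerFrame)
        : allocators_(framesInFlight, BumpAllocator(capacityPerFrame)) {}

    // Assumes the caller already waited (for example, on a semaphore) until the
    // GPU completed the frame that last used this buffer.
    BumpAllocator& beginFrame(std::uint64_t frameIndex) {
        BumpAllocator& allocator = allocators_[frameIndex % allocators_.size()];
        allocator.reset();
        return allocator;
    }

private:
    std::vector<BumpAllocator> allocators_;
};
```

Resetting only at frame start keeps the per-draw cost to an add and a mask, which is why this approach suits CPU-heavy encoding loops.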

The Example snippets section of this document presents a simple bump allocator implementation in C++.

SetBytes functions

You may alternatively leverage the setBytes: family of functions in Metal’s command encoders (such as setVertexBytes:length:atIndex: and setFragmentBytes:length:atIndex:) to provide your Argument Buffer as inline data in the MTLCommandBuffer.

When using the setBytes: family of functions, Metal immediately performs a memcpy of your Argument Buffer in the CPU timeline, preserving its contents.

The cost of the memcpy operation is linear in the size of your top-level Argument Buffer (smaller buffers are faster to copy), and Metal limits the data you pass this way to 4 KB per draw call.

In some implementations, Metal may have to allocate a buffer to back the memcpy operation, which can add significant CPU overhead to your frame encoding time. For CPU-intensive scenarios, favor using a bump allocator instead.

Follow these steps when moving from a slot-based binding model to the top-level Argument Buffer model via setBytes:

  1. Write your resource GPU addresses and handles into a CPU struct matching the Argument Buffer’s layout.
  2. Use the setBytes: family of functions to have Metal snapshot your CPU struct into an inline “buffer” at slot kIRArgumentBufferBindPoint (16).
  3. Issue the appropriate useResource and useHeap calls to signal resource residency (and dependencies) to Metal.
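Steps 1 and 2 can be sketched in C++ as a CPU struct that mirrors a hypothetical layout: 16 bytes of root constants followed by two 64-bit root arguments. The struct, its field names, and the kTopLevelArgumentBufferSlot constant are illustrative assumptions; derive your real layout from your root signature or reflection data. Compile-time checks guard the field offsets and the 4 KB setBytes: limit.

```cpp
#include <cstddef>
#include <cstdint>

// Slot for the top-level Argument Buffer (kIRArgumentBufferBindPoint, 16),
// redefined here so the sketch compiles without the converter headers.
constexpr std::uint32_t kTopLevelArgumentBufferSlot = 16;

// Hypothetical layout: root constants, one root CBV, one descriptor table.
struct TopLevelArgumentBuffer {
    float rootConstants[4];             // 16 bytes of inline root constants
    std::uint64_t cbvAddress;           // root argument: buffer gpuAddress
    std::uint64_t textureTableAddress;  // descriptor table: gpuAddress of the table
};

// setBytes: accepts at most 4 KB, so verify the layout at compile time.
static_assert(sizeof(TopLevelArgumentBuffer) <= 4096, "too large for setBytes:");
static_assert(offsetof(TopLevelArgumentBuffer, cbvAddress) == 16, "root argument 0");
static_assert(offsetof(TopLevelArgumentBuffer, textureTableAddress) == 24, "root argument 1");

// Step 1: fill the CPU struct with GPU addresses and handles.
inline TopLevelArgumentBuffer makeTopLevelArgumentBuffer(const float (&constants)[4],
                                                         std::uint64_t cbvAddress,
                                                         std::uint64_t textureTableAddress) {
    TopLevelArgumentBuffer tlab{};
    for (int i = 0; i < 4; ++i) {
        tlab.rootConstants[i] = constants[i];
    }
    tlab.cbvAddress = cbvAddress;
    tlab.textureTableAddress = textureTableAddress;
    return tlab;
}

// Step 2, in Objective-C++ (not compiled here):
//   [encoder setVertexBytes:&tlab length:sizeof(tlab) atIndex:kTopLevelArgumentBufferSlot];
// Step 3: call useResource:usage:stages: / useHeap: for every referenced resource.
```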

Other important considerations

Indirect resources

As with any other Argument Buffers, you need to inform Metal of all resources referenced through the top-level Argument Buffer via the useResource:usage: and useResource:usage:stages: methods for compute and render pipelines, respectively. For read-only MTLHeap-backed resources, you may alternatively use useHeap:. Resource tables may in turn reference other resources; you also need to call the useResource: or useHeap: methods to make those resident.

Texture arrays

HLSL shaders may legally treat textures as texture arrays and vice versa. In order to offer these semantics on Apple GPUs, Metal shader converter pipelines require that you allocate all your texture resources as a texture array type (such as MTLTextureType2DArray) or bind your textures via an array texture view object.

The following table shows the appropriate Metal texture type you need to use for each HLSL texture type:

| HLSL | Metal | Mechanism |
| --- | --- | --- |
| 1D and 2D textures: 1D Texture, 1D Texture Array, 2D Texture, 2D Texture Array | 2D Texture Array | Allocation or texture view |
| Texture cubes: Cube, Cube Array | Cube Array | Allocation or texture view |
| Multisampled textures: 1D Multisampled Texture, 1D Multisampled Texture Array, 2D Multisampled Texture, 2D Multisampled Texture Array | 2D Multisampled Texture Array | Allocation or texture view |
| 3D Textures | No change | |
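The table can be encoded as a lookup, for example to validate texture descriptors in a debug build. This C++ sketch uses hypothetical enums (the comments name the corresponding MTLTextureType values) and covers only the 2D multisampled variants.

```cpp
// Hypothetical enums for the HLSL texture kinds in the table and the Metal
// texture types a converted pipeline expects.
enum class HLSLTextureKind {
    Texture1D, Texture1DArray, Texture2D, Texture2DArray,
    TextureCube, TextureCubeArray,
    Texture2DMS, Texture2DMSArray,
    Texture3D
};

enum class MetalTextureType { Type2DArray, TypeCubeArray, Type2DMultisampleArray, Type3D };

// Returns the texture type to allocate (or view the texture as) for each HLSL kind.
constexpr MetalTextureType requiredMetalTextureType(HLSLTextureKind kind) {
    switch (kind) {
        case HLSLTextureKind::Texture1D:
        case HLSLTextureKind::Texture1DArray:
        case HLSLTextureKind::Texture2D:
        case HLSLTextureKind::Texture2DArray:
            return MetalTextureType::Type2DArray;             // MTLTextureType2DArray
        case HLSLTextureKind::TextureCube:
        case HLSLTextureKind::TextureCubeArray:
            return MetalTextureType::TypeCubeArray;           // MTLTextureTypeCubeArray
        case HLSLTextureKind::Texture2DMS:
        case HLSLTextureKind::Texture2DMSArray:
            return MetalTextureType::Type2DMultisampleArray;  // MTLTextureType2DMultisampleArray
        case HLSLTextureKind::Texture3D:
            return MetalTextureType::Type3D;                  // no change
    }
    return MetalTextureType::Type3D;  // unreachable; silences missing-return warnings
}
```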

Sampler state objects

Metal needs to know at resource creation time if your game intends to reference a sampler through an Argument Buffer.

Set the supportArgumentBuffers property of the MTLSamplerDescriptor to YES in order to create sampler state objects that you can bind to Metal shader converter pipelines.

Vertex attribute fetch

Metal shader converter supports two mechanisms to fetch vertex attributes: Metal vertex fetch and a separate stage-in function.

By default, Metal shader converter generates IR that leverages the Metal vertex fetch mechanism. When using this mechanism, you typically provide a MTLVertexDescriptor instance to your pipeline state descriptors so Metal can automatically retrieve vertex attributes and make them accessible to your vertex shader.

This mechanism is fast and requires very little setup at both IR conversion time and at pipeline state object creation time. In some situations, however, if your IR requires very flexible datatype conversions or dynamic vertex strides, you need to use a separate stage-in function.

To synthesize a separate vertex stage-in function, pass IRStageInCodeGenerationModeUseSeparateStageInFunction to the IRCompilerSetStageInGenerationMode function in the libmetalirconverter library before compiling your vertex stage, and call IRMetalLibSynthesizeStageInFunction afterward to generate the stage-in function.

You can also use the command-line argument --vertex-stage-in to direct Metal shader converter standalone tool to produce both a vertex function and a separate stage-in function. The Metal shader converter stores each function in its own metallib.

Metal shader converter generates separate stage-in functions as Metal visible functions that you need to link to your pipeline object via its descriptor’s linked functions property (vertexLinkedFunctions for a render pipeline). At runtime, the converted shader code automatically invokes the visible stage-in function, which performs any needed type conversions and applies dynamic offsets in software.

The following example compiles a vertex shader that leverages a separate stage-in function:

IRObject* pDXIL = /* input IR */;

IRCompiler* pCompiler = IRCompilerCreate();
IRCompilerSetStageInGenerationMode(pCompiler, IRStageInCodeGenerationModeUseSeparateStageInFunction);

// Synthesize a separate stage-in function by providing a vertex input layout:
IRVersionedInputLayoutDescriptor inputDesc;
inputDesc.version = IRInputLayoutDescriptorVersion_1;
inputDesc.desc_1_0.numElements = 3;
inputDesc.desc_1_0.semanticNames[0] = "POSITION";
inputDesc.desc_1_0.semanticNames[1] = "COLOR";
inputDesc.desc_1_0.semanticNames[2] = "TEXCOORD";
inputDesc.desc_1_0.inputElementDescs[0] = {
    .semanticIndex = 0,
    .format = IRFormatR32G32B32A32Float,
    .inputSlot = 0,
    .alignedByteOffset = 0,
    .inputSlotClass = IRInputClassificationPerVertexData,
    .instanceDataStepRate = 0 /* needs to be 0 for per-vertex data */
};
inputDesc.desc_1_0.inputElementDescs[1] = {
    .semanticIndex = 0,
    .format = IRFormatR32G32B32A32Float,
    .inputSlot = 1,
    .alignedByteOffset = sizeof(float)*4,
    .inputSlotClass = IRInputClassificationPerVertexData,
    .instanceDataStepRate = 0 /* needs to be 0 for per-vertex data */
};
inputDesc.desc_1_0.inputElementDescs[2] = {
    .semanticIndex = 0,
    .format = IRFormatR32G32B32A32Float,
    .inputSlot = 2,
    .alignedByteOffset = sizeof(float)*4*2,
    .inputSlotClass = IRInputClassificationPerVertexData,
    .instanceDataStepRate = 0 /* needs to be 0 for per-vertex data */
};

IRError* pError = nullptr;
IRObject* pOutIR = IRCompilerAllocCompileAndLink(pCompiler, nullptr, 0, pDXIL, &pError);

// Validate pOutIR != nullptr and no error.

IRShaderReflection* pVertexReflection = IRShaderReflectionCreate();
IRObjectGetReflection(pOutIR, IRShaderStageVertex, pVertexReflection);

// Generate the stage-in function from the vertex reflection and the input layout:
IRMetalLibBinary* pStageInMetalLib = IRMetalLibBinaryCreate();
bool success = IRMetalLibSynthesizeStageInFunction(pCompiler,
                                                   pVertexReflection,
                                                   &inputDesc,
                                                   pStageInMetalLib);

// Verify success

IRMetalLibBinary* pVertexStageMetalLib = IRMetalLibBinaryCreate();
success = IRObjectGetMetalLibBinary(pOutIR, IRShaderStageVertex, pVertexStageMetalLib);

// Verify success

if (pError)
{
    IRErrorDestroy(pError);
}

// ... store both metallibs, then release the converter objects:
IRMetalLibBinaryDestroy(pVertexStageMetalLib);
IRMetalLibBinaryDestroy(pStageInMetalLib);
IRShaderReflectionDestroy(pVertexReflection);
IRObjectDestroy(pOutIR);
IRCompilerDestroy(pCompiler);


At runtime, after you generate a separate stage-in function, you need to create the pipeline state object by linking the functions together. The next example shows how:

id<MTLDevice> device = MTLCreateSystemDefaultDevice();
NSError* __autoreleasing error = nil;

// vertexStageData and stageInData are dispatch_data_t objects wrapping the
// vertex stage metallib and the stage-in metallib, respectively.
id<MTLLibrary> vertexLib = [device newLibraryWithData:vertexStageData error:&error];

id<MTLLibrary> stageInLib = [device newLibraryWithData:stageInData error:&error];

MTLRenderPipelineDescriptor* rpd = [[MTLRenderPipelineDescriptor alloc] init];
rpd.vertexFunction = 
		  [vertexLib newFunctionWithName:vertexLib.functionNames.firstObject];

// Link the visible stage-in function to the vertex stage:
MTLLinkedFunctions* linkedFunctions = [[MTLLinkedFunctions alloc] init];
linkedFunctions.functions = @[
		  [stageInLib newFunctionWithName:stageInLib.functionNames.firstObject]
];
rpd.vertexLinkedFunctions = linkedFunctions;

// ... continue configuring the pipeline state descriptor...

id<MTLRenderPipelineState> pso = 
					  [device newRenderPipelineStateWithDescriptor:rpd error:&error];

Feature support matrix

Metal shader converter supports a large subset of DXIL IR that enables AAA-grade content on Metal. Use the following reference table, as well as error detection features in Metal shader converter, to ensure the correct conversion of your IR.

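| Shader Model | Feature | Support | Error-detected | Notes |
| --- | --- | --- | --- | --- |
| Pre-6.0 | - | Limited | Limited | Some features not supported |
| SM6.0 | Wave intrinsics | Yes | - | |
| SM6.0 | 64-bit integers | Yes | - | |
| SM6.1 | SV_ViewID | No | No | |
| SM6.1 | SV_Barycentrics | Yes | - | GetAttributeAtVertex not supported |
| SM6.2 | 16-bit scalar types | Yes | - | |
| SM6.2 | denorm mode | No | No | |
| SM6.3 | Ray tracing | Yes* | - | No callable shaders; disabled in beta release |
| SM6.4 | Packed dot-product intrinsics | Yes | - | |
| SM6.4 | Library subobjects | No | No | |
| SM6.5 | Ray query | Yes | - | |
| SM6.5 | Sampler Feedback | No | No | |
| SM6.5 | Mesh/amplification shaders | Yes* | - | Disabled in beta release |
| SM6.6 | 64-bit atomics | Limited | No | |
| SM6.6 | Dynamic resources | Yes | No | |
| SM6.6 | IsHelperLane | Yes | - | |
| SM6.6 | Pack/unpack intrinsics | No | Yes | |
| SM6.6 | Compute derivatives | No | Yes | |
| SM6.6 | Wave size | No | Yes | Wave size needs to be 32 |
| SM6.6 | Raytracing payload qualifiers | No | No | |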

This list is non-exhaustive. Other features not listed above may also be unsupported, including:

  • SV_StencilRef
  • minLODClamp with texture read
  • Globally coherent textures

Note: some math operations like sine and cosine may offer different precision than other implementations of the input IR.

Offline reflection

Metal shader converter produces valuable information as it compiles your shaders. This information is a complement to, but not a replacement of, the reflection capabilities your source compiler may give you. Refer to the metal_irconverter.h header to determine the reflection information Metal shader converter offers.

Metal shader converter produces reflection information in both of its forms: the standalone executable and the library.

Standalone reflection

When you use the standalone executable, Metal shader converter writes offline reflection information into a companion JSON file next to the generated metallib.

Vertex stage reflection

{
	"EntryPoint": (string),
	"NeedsFunctionConstants": (bool),
	"Resources": [
		{
			"abIndex": (int),
			"slot": (int),
			"type": (string: "SRV"|"CBV"|"SMP"|"UAV")
		}
	],
	"ShaderType": (string: "Vertex"),
	"instance_id_index": (int),
	"line_passthrough_shader": {
		"max_primitives_per_mesh_threadgroup": (int)
	},
	"needs_draw_params": (bool),
	"point_passthrough_shader": {
		"max_primitives_per_mesh_threadgroup": (int)
	},
	"triangle_passthrough_shader": {
		"max_primitives_per_mesh_threadgroup": (int)
	},
	"vertex_id_index": (int),
	"vertex_inputs": [
		{
			"index": (int),
			"name": (string)
		}
	],
	"vertex_output_size_in_bytes": (int),
	"vertex_outputs": [
		{
			"columnCount": (int: 1|2|3|4),
			"elementType": (string),
			"index": (int),
			"name": (string: e.g. "sv_position0")
		}
	]
}

Example: the following vertex shader:

struct VertexData
{
	float4 position : POSITION;
	float4 color : COLOR;
	float4 uv : TEXCOORD;
};

struct v2f
{
	float4 position : SV_Position;
	float4 color : USER0;
	float4 uv : TEXCOORD0;
};

v2f MainVS( VertexData vin )
{
	v2f o = (v2f)0;
	o.position = vin.position;
	o.color = vin.color;
	o.uv = vin.uv;
	return o;
}

Produces reflection JSON:

{
	"EntryPoint": "MainVS",
	"ShaderType": "Vertex",
	"instance_id_index": -1,
	"is_tessellation_vertex_shader": -1,
	"needs_draw_params": false,
	"vertex_id_index": -1,
	"vertex_inputs": [
		{
			"index": 0,
			"name": "position0"
		},
		{
			"index": 1,
			"name": "color0"
		},
		{
			"index": 2,
			"name": "texcoord0"
		}
	],
	"vertex_output_size_in_bytes": 48
}

Fragment stage reflection

{
	"EntryPoint": (string),
	"NeedsFunctionConstants": (bool),
	"Resources": [
		{
			"abIndex": (int),
			"slot": (int),
			"type": (string: "SRV"|"CBV"|"SMP"|"UAV")
		}
	],
	"ShaderType": (string: "Fragment"),
	"discards": (bool),
	"num_render_targets": (int),
	"rt_index_int": (int)
}

Compute stage reflection

{
	"EntryPoint": (string),
	"NeedsFunctionConstants": (bool),
	"Resources": [
		{
			"abIndex": (int),
			"slot": (int),
			"type": (string: "SRV"|"CBV"|"UAV")
		}
	],
	"ShaderType": (string: "Compute"),
	"tg_size": [ (int), (int), (int) ]
}

Library reflection

When using the Metal shader converter library, you access reflection information for any shader you compiled through the IRObjectGetReflection function. The reflection object contains information for the shader stage you request.

While the IRShaderReflection object holds general reflection data, such as the entry point’s name, you access detailed information about a shader stage through reflection information structs. To ensure forward compatibility, all reflection structs are versioned.

Example: reflect the entry point’s name of a compiled vertex shader:

// Reflect the entry point's name:
IRShaderReflection* pReflection = IRShaderReflectionCreate();
IRObjectGetReflection( pOutIR, IRShaderStageVertex, pReflection );
const char* str = IRShaderReflectionGetEntryPointFunctionName( pReflection );

// ... store the entry point name, or use it to find the MTLFunction ... //

IRShaderReflectionDestroy( pReflection );

Example: reflect the thread group size of a compiled compute shader through the compute information struct:

// Get reflection data:
IRShaderReflection* pReflection = IRShaderReflectionCreate();
IRObjectGetReflection( pOutIR, IRShaderStageCompute, pReflection );

IRVersionedCSInfo csinfo;
if ( IRShaderReflectionCopyComputeInfo( pReflection, IRReflectionVersion_1_0, &csinfo ) )
{
    // Threadgroup sizes available in csinfo.info_1_0.tg_size

    // Release the copied compute info once you're done with it:
    IRShaderReflectionReleaseComputeInfo( &csinfo );
}

// Clean up
IRShaderReflectionDestroy( pReflection );

IR runtime model compatibility

Draw parameters

Metal shader converter generates code that bridges the gap between the input IR and Metal’s runtime model. For example, this allows preserving the IR semantics of SV_VertexID and SV_InstanceID when taking DXIL IR as the input format.

Continuing with this example, the source IR guarantees that SV_VertexID starts from 0 regardless of whether StartVertexLocation is non-zero. In Metal, this isn’t the case: the [[vertex_id]] attribute includes the base vertex value, if one is specified [MSL Spec § Vertex Function Input Attributes], and the [[instance_id]] attribute likewise includes the base instance value.

Metal shader converter closes this gap, but requires pipelines to bind a supplemental buffer with details about the draw call. You may provide this buffer manually, or use the helper draw calls provided by Metal shader converter companion header.

Use the reflection information of the vertex stage to determine if a pipeline state object requires additional draw call information. See the Example snippets section below for an example on how to access this reflection data.

Failing to provide draw call information to a pipeline that requires it is an error and may trigger a Metal debug layer error.

When your pipeline requires additional draw call information, Metal shader converter companion header provides convenience draw functions that automatically create and bind these additional buffers at the right bind location. See section Metal shader converter companion header for more details.

If you’re not using the companion header, you need to create this buffer manually. The exact format depends on the draw call. A simple non-indexed, non-instanced primitive draw call must provide:

  • start_vertex_location
  • base_vertex_location
  • base_instance
  • index_type – needed for correctly deriving the SV_VertexID in a tessellation vertex shader.
  • instance_count – needed for correctly deriving the primitive_id in a domain shader.
  • vertex_or_index_count_per_instance

Please refer to the implementation of the companion header to determine the layout Metal shader converter requires for other draw call types. This document also includes an example in the Example snippets section below.

Your program needs to bind this data as a buffer at index 20 (kIRArgumentBufferDrawArgumentsBindPoint).

Dual-source blending

Metal shader converter supports dual source blending. By default, Metal shader converter doesn’t inject this capability into its generated Metal IR, but exposes controls to allow you to enable it always, or to defer the decision to pipeline state creation time.

To request support for dual source blending, pass IRDualSourceBlendingConfigurationForceEnabled to function IRCompilerSetDualSourceBlendingConfiguration, or -dualSourceBlending via the command-line interface.

You can alternatively defer the decision to perform dual-source blending to runtime. To do so, use IRDualSourceBlendingConfigurationDecideAtRuntime in the library, or decideAtRuntime on the command line, when configuring dual-source blending support. Metal shader converter then injects a function constant, dualSourceEnabled, into your fragment shader, and you provide its value when retrieving the function from the produced Metal library.

Performance tips

Codegen compatibility flags

Compiler compatibility flags allow you to tailor code generation to the specific requirements of your shaders. You typically enable compatibility flags to support a broader set of features and behaviors (such as out-of-bounds reads) when your shader needs them to operate correctly. These flags, however, carry a performance cost.

Always use the minimum set of compatibility flags your shader needs to attain the highest runtime performance for IR code you compile. By default, all compatibility flags are disabled.

You control the compatibility flags by calling the IRCompilerSetCompatibilityFlags API in the Metal shader converter library. The expected parameter is a 64-bit bitmask of flags to enable.

You may also control the compatibility flags from the command line. Run metal-shaderconverter --help for a listing of all flags.

Automatic linear resource layout vs explicit root signatures

Root signatures provide maximum flexibility when laying out resources in your shader’s top-level Argument Buffer, enabling advanced features such as bindless resources. This flexibility, however, comes at the cost of increased indirection.

Favor using a linear resource binding model for shaders that don’t require the flexibility of root signatures. This binding model provides a top-level Argument Buffer layout that references resources through a single indirection, improving resource access times.

Minimum OS deployment target and minimum GPU

Metal shader converter may be able to produce better-optimized output when targeting newer GPU families and operating system versions.

Use IRCompilerSetMinimumGPUFamily() to specify the minimum GPU target, and IRCompilerSetMinimumDeploymentTarget() to specify the target OS and the minimum OS build version your IR needs to support.

The standalone executable exposes the same controls through the command-line switches --minimum-gpu-family, and --deployment-os alongside --minimum-os-build-version.

Top-level argument buffers and GPU occupancy

Shader code produced by Metal shader converter relies on Argument Buffers to bind resources to pipeline objects. Using Argument Buffers to access resources may result in higher register pressure, reducing theoretical shader occupancy when compared to directly binding resources to pipeline slots.

Top-level argument buffers and shader execution overlap

The root signature binding model allows you to create resource (descriptor) tables and reference them from multiple top-level Argument Buffers without rebinding the linked resources each time. This may lower CPU time by reducing the number of calls into the Metal command encoder.

However, be mindful of potential data dependencies introduced between passes by referencing common resources, which may reduce GPU work overlap and increase the wall clock execution time of your workload.

Consider a compute shader that writes into a texture that a fragment shader subsequently samples. You place this texture in a texture table and reference it from a top-level Argument Buffer available to both the compute dispatch and the draw call.

If the texture is a tracked resource and the vertex stage can access it through its top-level Argument Buffer, Metal needs to serialize the GPU execution of the compute dispatch and the vertex stage, even though no race condition exists and the two stages could otherwise overlap.

When you use the root signature binding model and share resources via top-level Argument Buffers, use the Metal System Trace in Instruments to evaluate the overlap. Instruments gives you insights you can use to fine-tune your workload dependencies and maximize shader execution overlap.

Input IR quality influences output IR performance

Metal shader converter transforms IR directly based on its input. Suboptimal input IR influences the output of Metal shader converter, and may reduce runtime performance of shader pipelines. Always use the best possible input IR as input to Metal shader converter.

For best results, avoid intermediate tools that transform the input IR from other formats and provide Metal shader converter with IR as close to the source language as possible.

Root signature validation flags

Root signature validation flags don’t affect Metal IR runtime performance. When you instruct Metal shader converter to generate a hierarchical resource layout via root signatures, it validates your root signature descriptor by default and produces an error message when it detects issues.

After you’ve verified your root signatures are correct, you can disable all validation flags to prevent Metal shader converter from performing these checks at compilation time.

Hybrid pipelines

Metal shader converter joins the Metal compiler as another mechanism to produce Metal libraries from your existing shader IR.

Because all shaders become Metal IR, you can combine Metal libraries produced by Metal shader converter and by the Metal compiler in a single app, and even in a single pipeline. This lets you use the Metal Shading Language to access unique features, such as programmable blending and tile shaders, that aren’t typically available in third-party IR.

To take advantage of programmable blending, use pipelines with a Metal Shading Language fragment stage to perform framebuffer fetch after your converted pipeline has output its color. A good use case for this is an on-tile post-processing stack.

To read back the on-tile color, use the following mapping: color data your converted fragment shader pipeline stores in SV_Target0 is available through the color(0) attribute.

SV_Target0 -> [[color(0)]]

You can also mix and match shader stages. For example, you can use a converted render pipeline with tessellation and geometry stages but compute the final coloring in Metal Shading Language. To accomplish this, match the shader interface by applying the user attribute when declaring your struct members.

SV_Position  ->  [[user(SV_Position)]]
SV_NORMAL    ->  [[user(SV_Normal0)]]
SV_TEXCOORD0 ->  [[user(SV_Texcoord0)]]

Metal shader converter companion header

The Metal shader converter companion header provides convenience functions to accomplish common tasks:

  1. Helps encode resources into descriptor tables (three-uint64_t encoding)
  2. Offers wrappers to drawing functions that automatically supply draw parameters to the pipeline
  3. Aids the emulation of Geometry and Tessellation pipelines via Metal mesh shaders

To use the companion header, include file metal_irconverter_runtime.h.

This header depends on Metal, and you need to include it after including Metal/Metal.h or Metal/Metal.hpp.

Because this is a header-only library, you need to generate its implementation once. To do so, define IR_PRIVATE_IMPLEMENTATION in a single .m, .mm, or .cpp file before including the header. Define this macro exactly once in your program.

The Metal shader converter companion header is compatible with metal-cpp. To configure the header for metal-cpp, define IR_RUNTIME_METALCPP before including it. Define this macro before every inclusion of the header so that types match across the entire program.

You can download metal-cpp from

Example: include the Metal shader converter companion header in a single .cpp file that uses metal-cpp for rendering, and generate its implementation.

#include <Metal/Metal.hpp>
#define IR_RUNTIME_METALCPP       // enable metal-cpp compatibility mode
#define IR_PRIVATE_IMPLEMENTATION // define only once in an implementation file     
#include <metal_irconverter_runtime/metal_irconverter_runtime.h>

The following sections provide examples of specific tasks you can accomplish with Metal shader converter companion header.

Encoding Argument Buffers and descriptor tables generated with an automatic layout

Encode a descriptor table with two entries: first a texture array, followed by a sampler state object.

const int kNumEntries = 2;
size_t size = sizeof(IRDescriptorTableEntry) * kNumEntries;
MTL::Buffer* pDescriptorTable =
    _pDevice->newBuffer(size, MTL::ResourceStorageModeShared);

auto* pResourceTable = (IRDescriptorTableEntry *)pDescriptorTable->contents();
IRDescriptorTableSetTexture( &pResourceTable[0], pTexture, 0, 0 );
IRDescriptorTableSetSampler( &pResourceTable[1], pSampler, 0 );
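The companion header hides the entry layout, but it can help to see what each entry holds. The sketch below is a hypothetical mirror, not the header's IRDescriptorTableEntry definition: it follows the three-uint64_t layout that the Global Root Signature example in this document describes (textures: 0, GPU resource ID, 0; samplers: GPU address, 0, LOD bias). DescriptorEntry, encodeTexture, and encodeSampler are illustrative names, and storing the LOD bias as the float's bit pattern in the low 32 bits is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of one descriptor table entry: three uint64_t values,
// laid out as this document's root signature example describes.
// This is NOT the companion header's IRDescriptorTableEntry definition.
struct DescriptorEntry {
    uint64_t gpuVA;         // sampler GPU address; 0 for textures
    uint64_t textureViewID; // texture GPU resource ID; 0 for samplers
    uint64_t metadata;      // sampler LOD bias; 0 for textures in this sketch
};
static_assert(sizeof(DescriptorEntry) == 3 * sizeof(uint64_t), "entry is 24 bytes");
static_assert(sizeof(float) == sizeof(uint32_t), "LOD bias packs into 32 bits");

// SRV (texture) entry: 0, texture GPU resource ID, 0.
inline void encodeTexture(DescriptorEntry& entry, uint64_t textureResourceID) {
    entry.gpuVA = 0;
    entry.textureViewID = textureResourceID;
    entry.metadata = 0;
}

// Sampler entry: sampler GPU address, 0, LOD bias.
// Assumption: the LOD bias is stored as the float's bit pattern in the low 32 bits.
inline void encodeSampler(DescriptorEntry& entry, uint64_t samplerGPUAddress, float lodBias) {
    uint32_t biasBits = 0;
    std::memcpy(&biasBits, &lodBias, sizeof(biasBits));
    entry.gpuVA = samplerGPUAddress;
    entry.textureViewID = 0;
    entry.metadata = biasBits;
}
```

For the two-entry table above, you would call encodeTexture on entry 0 and encodeSampler on entry 1. In real code, prefer IRDescriptorTableSetTexture and IRDescriptorTableSetSampler, which take additional arguments this sketch leaves out.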

Provide draw params

Pipelines that rely on VertexID need you to provide draw parameters so that VertexID follows DXIL-compatible semantics.

Use Metal shader converter companion header’s draw call functions to have the runtime automatically provide these buffers to Metal.

IRRuntimeDrawPrimitives( pEnc, MTL::PrimitiveTypeTriangle, 0, 3 );

If you choose not to use the companion header, you may encode these buffers manually. See the Example snippets section below for an example.

Emulating geometry and tessellation pipelines

Beyond helping bind data to pipelines, the runtime companion header helps you emulate render pipelines that contain traditional geometry and tessellation stages. Metal shader converter lets you bring these pipelines to Metal by mapping them to Metal mesh shaders.

To help with the process of building mesh render pipeline state objects from the geometry and tessellation shader stages, the companion header offers the following functions:

  • IRRuntimeNewGeometryEmulationPipeline
  • IRRuntimeNewGeometryTessellationEmulationPipeline

Each helper function takes as input a descriptor structure that holds the building blocks needed to compile the pipeline.

The descriptor structure members reference the Metal libraries containing the pipeline’s shader functions, reflection data, and a base mesh render pipeline descriptor that describes the render attachments.

Structure IRGeometryEmulationPipelineDescriptor contains:

  • stageInLibrary: a MTLLibrary containing the stage in function.
  • vertexLibrary: a MTLLibrary containing the vertex function.
  • vertexFunctionName: the name of the vertex function to retrieve from the vertex library.
  • geometryLibrary: a MTLLibrary containing the geometry function.
  • geometryFunctionName: the name of the geometry function to retrieve from the geometry library.
  • fragmentLibrary: a MTLLibrary containing the fragment function.
  • fragmentFunctionName: the name of the fragment function to retrieve from the fragment library.
  • basePipelineDescriptor: an MTLMeshRenderPipelineDescriptor that provides the template configuration for the pipeline, such as render attachments.
  • pipelineConfig: reflection data that you obtained during the Metal shader converter compilation process.

Structure IRGeometryTessellationEmulationPipelineDescriptor shares all members of the IRGeometryEmulationPipelineDescriptor structure, and expands it to also include the following members:

  • hullLibrary: a MTLLibrary containing the hull function and tessellator.
  • hullFunctionName: the name of the hull function to retrieve from the hull library.
  • domainLibrary: a MTLLibrary containing the domain function.
  • domainFunctionName: the name of the domain function to retrieve from the domain library.

The companion header provides the following draw helper functions to help you use the emulation render pipeline states. Issue these function calls as part of your render pass encoding process to encode a mesh dispatch workload that emulates your geometry and tessellation pipelines.

  • IRRuntimeDrawIndexedPrimitivesGeometryEmulation
  • IRRuntimeDrawIndexedPatchesTessellationEmulation

Before issuing these calls, you need to bind your vertex buffers (or patches) and vertex buffer strides. Use structures IRRuntimeVertexBuffers and IRRuntimeVertexStrides to define and bind your buffers to indices 0 and 1 of the object stage respectively. In addition, make sure to make your vertex buffers resident via useResource or useHeap.

For a complete example of how to perform geometry and tessellation pipeline emulation, please check out the Complete examples section of this document.

Metal GPU binary generation

You can use the offline compiler tool, metal-tt, to ingest the output of Metal shader converter and produce finalized GPU binaries. Use this process to fully compile the non-MSL shader source to GPU binaries that can be loaded into Metal with no shader compilation overhead on device.

The following Python script shows an example shader pipeline that fully compiles HLSL to Apple GPU binaries. Inputs are the shader source, entry points, and shader profiles, alongside an mtlp-json description of the PSO.

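import os
import subprocess

def cmd(args):
    # Echo each command, then stop the build if it fails.
    print(" ".join(args))
    subprocess.run(args, check=True)

METAL_TT = "xcrun -sdk macosx metal-tt"

# Input and output locations; adjust these to your project layout.
shader_source_dir = "./"
output_dir = "build/"

products = ["compute_pso", "render_pso"]
dependencies = {
    "render_pso" : ("render_pso.mtlp-json",
     [("shaders.hlsl", "MainVS", "vs_6_0"), ("shaders.hlsl", "MainFS", "ps_6_0")]),
    "compute_pso" : ("compute_pso.mtlp-json",
     [("shaders.hlsl", "MainCS", "cs_6_0")])
}

# check_output returns bytes unless text=True is set.
target_archs = \
    subprocess.check_output(["xcrun", "metal-arch"], text=True).strip().replace("\n", " ")

if not os.path.isdir(output_dir):
    os.makedirs(output_dir)

for product in products:
    pso_json, deps = dependencies[product]
    for dep in deps:
        source_file, entry, profile = dep
        # HLSL -> DXIL with the DirectX Shader Compiler
        cmd(["dxc", shader_source_dir + source_file,
             "-T", profile,
             "-E", entry,
             "-Fo", output_dir + entry + ".dxil"])
        # DXIL -> metallib with Metal shader converter
        cmd(["metal-shaderconverter",
            "-rename-entry-point", entry,
            output_dir + entry + ".dxil",
            "-o", "./" + output_dir + entry + ".metallib"])

    # metallibs + mtlp-json -> GPU binary
    cmd(METAL_TT.split(" ") + target_archs.split(" ") +
        ["-L", output_dir, shader_source_dir + pso_json,
         "-o", output_dir + product + ".gpubin"])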

Example of mtlp-json contents you use to produce a render pipeline state:

{
  "version": {
    "major": 0,
    "minor": 1,
    "sub_minor": 1
  },
  "generator": "MetalFramework",
  "libraries": {
    "paths": [
      { "label": "vtxMetalLib", "path": "MainVS.metallib" },
      { "label": "fragMetalLib", "path": "MainFS.metallib" }
    ]
  },
  "pipelines": {
    "render_pipelines": [
      {
        "vertex_function": "alias:vtxMetalLib#MainVS",
        "fragment_function": "alias:fragMetalLib#MainFS",
        "vertex_descriptor": {
          "attributes": [ { "format": "Float4" } ],
          "layouts": [ { "stride": 16 } ]
        },
        "color_attachments": [ { "pixel_format": "BGRA8Unorm_sRGB" } ]
      }
    ]
  }
}

Note: some limits apply to the offline compilation process. Please review the Apple documentation for the latest set of supported features.

Example snippets

Save a MetalLibBinary to disk

You can link the Metal shader converter library into your custom offline shader compilation pipeline to produce metallib files. The following snippet stores a metallib on disk for later consumption:

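bool saveMetalLibToFile(const char* filepath, const IRMetalLibBinary* pMetalLib)
{
    FILE* f = fopen(filepath, "wb"); // binary mode: a metallib is bytecode
    if (!f) return false;
    size_t siz = IRMetalLibGetBytecodeSize(pMetalLib);
    uint8_t* bytes = (uint8_t*)malloc(siz);
    if (!bytes) { fclose(f); return false; }
    IRMetalLibGetBytecode(pMetalLib, bytes);
    fwrite(bytes, siz, 1, f);
    bool ok = !ferror(f); // ...report the error as needed...
    free(bytes);
    fclose(f);
    return ok;
}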

A custom shader pipeline processor may choose to store the metallib bytecode and size in a custom asset packaging format.

Supply draw parameters to a pipeline state without the companion header

This example shows a draw call with a shader that uses VertexID. VertexID has different semantics in Metal and DXIL, so the app needs to provide the pipeline state object with extra information about the draw, which lets the compiled shader adjust the semantic’s value to match what the original IR expects.

You only need this when the shader uses VertexID or InstanceID. To determine programmatically whether the original IR requires this additional buffer, check the needs_draw_params member of the vertex stage reflection information.

struct DrawArgument
{
	uint32_t vertexCountPerInstance;
	uint32_t instanceCount;
	uint32_t startVertexLocation;
	uint32_t startInstanceLocation;
} da = { 3, 1, 0, 0 };

struct DrawParams
{
	DrawArgument draw;
} dp = { .draw = da };

struct DrawInfo
{
	uint32_t indexType;
	uint32_t primitiveTopology;
	uint32_t maxInputPrimitivesPerMeshThreadgroup;
	uint32_t objectThreadgroupVertexStride;
	uint32_t gsInstanceCount;
} di = { 0, 3, 0, 0, 0 };

// Bind the draw parameters and draw info to the designated slots 20 and 21.
pEnc->setVertexBytes( &dp, sizeof( DrawParams ), 20 );
pEnc->setVertexBytes( &di, sizeof( DrawInfo ), 21 );
pEnc->drawPrimitives( MTL::PrimitiveTypeTriangle, NS::UInteger(0), NS::UInteger(3) );

Bind points 20 and 21 are specially designated slots where the converted pipeline expects the application to bind buffers containing information about the draw call.

When you use the runtime companion header, the IRRuntimeDrawPrimitives function builds and submits the necessary draw params structure to the draw call based on its input, automatically binding these buffers in the correct slots. The companion header also defines kIRArgumentBufferDrawArgumentsBindPoint and kIRArgumentBufferUniformsBindPoint to conveniently reference these bind slots.
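Without the companion header, you fill DrawArgument yourself for every draw. Its field order mirrors the D3D12 draw-argument layout (counts first, then start locations), while Metal's drawPrimitives takes vertexStart before vertexCount, so it's easy to swap fields by accident. A minimal sketch of the translation; makeDrawArgument is a hypothetical helper, not part of Metal or the companion header.

```cpp
#include <cassert>
#include <cstdint>

// Same field order as the DrawArgument struct in the snippet above,
// which follows the D3D12 draw-argument layout.
struct DrawArgument {
    uint32_t vertexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startVertexLocation;
    uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawArgument) == 4 * sizeof(uint32_t), "tightly packed");

// Hypothetical helper: takes arguments in Metal drawPrimitives order
// (vertexStart before vertexCount) and reorders them into DrawArgument.
inline DrawArgument makeDrawArgument(uint32_t vertexStart, uint32_t vertexCount,
                                     uint32_t instanceCount = 1,
                                     uint32_t baseInstance = 0) {
    return DrawArgument{ vertexCount, instanceCount, vertexStart, baseInstance };
}
```

For example, makeDrawArgument(0, 3) produces the same { 3, 1, 0, 0 } value that the snippet above hard-codes for a single triangle.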

Reflect whether a vertex shader requires draw parameters

This example demonstrates how to determine whether a shader requires a draw parameter buffer via vertex shader reflection information. When using the standalone compiler, Metal shader converter conveniently includes this information in the generated metallib’s companion JSON file.

// Get reflection data:
IRShaderReflection* pReflection = IRShaderReflectionCreate();
IRObjectGetReflection( pOutIR, IRShaderStageVertex, pReflection );

// Determine whether draw params are needed:
IRVersionedVSInfo vsinfo;
if (IRShaderReflectionGetVertexInfo(pReflection, IRReflectionVersion_1_0, &vsinfo))
    if ( vsinfo.info_1_0.needs_draw_params )
        // PSO needs a draw params buffer bound to the vertex stage

// Clean up
IRShaderReflectionReleaseVertexInfo( &vsinfo );
IRShaderReflectionDestroy( pReflection );

Define a Global Root Signature using the Metal shader converter library

This Global Root Signature defines a sampler and a 2D texture. You need to place samplers and texture references in their own tables: only raw resources, such as constants, constant buffers, buffer SRVs, and UAVs, can be referenced directly from the top-level Argument Buffer.

Root signatures in Metal shader converter are subject to the same limitations as in Microsoft’s DirectX. If you’re not familiar with these requirements, refer to Microsoft’s documentation. Supplying an invalid root signature to Metal shader converter may trigger a validation error; if the compiler instance configuration disables validation, the behavior for an invalid root signature is undefined.

IRVersionedRootSignatureDescriptor desc;
desc.version = IRRootSignatureVersion_1_1;
desc.desc_1_1.Flags = IRRootSignatureFlagNone;

// Samplers are placed in their own table:
desc.desc_1_1.NumStaticSamplers = 1;
IRStaticSamplerDescriptor pSSDesc[] = { {
    .Filter = IRFilterMinMagMipLinear,
    .AddressU = IRTextureAddressModeWrap,
    .AddressV = IRTextureAddressModeWrap,
    .AddressW = IRTextureAddressModeWrap,
    .MipLODBias = 0,
    .MaxAnisotropy = 0,
    .ComparisonFunc = IRComparisonFunctionNever,
    .BorderColor = IRStaticBorderColorOpaqueBlack,
    .MinLOD = 0,
    .MaxLOD = std::numeric_limits<float>::max(),
    .ShaderRegister = 0,
    .RegisterSpace = 0,
    .ShaderVisibility = IRShaderVisibilityPixel
} };
desc.desc_1_1.pStaticSamplers = pSSDesc;

// Parameters (1 texture):
IRDescriptorRange1 ranges[1] = { {
    .RangeType = IRDescriptorRangeTypeSRV,
    .NumDescriptors = 1,
    .BaseShaderRegister = 0,
    .RegisterSpace = 0,
    .Flags = IRDescriptorRangeFlagDataStatic,
    .OffsetInDescriptorsFromTableStart = 0
} };
IRRootParameter1 pParams[] = { {
    .ParameterType = IRRootParameterTypeDescriptorTable,
    .DescriptorTable = { .NumDescriptorRanges = 1, .pDescriptorRanges = ranges },
    .ShaderVisibility = IRShaderVisibilityPixel
} };
desc.desc_1_1.NumParameters = 1;
desc.desc_1_1.pParameters = pParams;

IRError* pRootSigError = nullptr;
IRRootSignature* pRootSig = IRRootSignatureCreateFromDescriptor( &desc, &pRootSigError );
if ( !pRootSig )
{
    // handle and release error
}

// After compiling DXIL bytecode to Metal IR using this root signature,
// it should have 2 entries:
// offset 0: a uint64_t referencing a table that contains a
// void* resource (SRV).
// offset 8 (sizeof(uint64_t)): a uint64_t referencing a table with
// one sampler.

// The sampler table should be encoded like so:
// For each sampler:
// 64-bits: GPU VA of the sampler.
// 64-bits: 0
// 64-bits: Sampler's LOD Bias.

// The SRV table should be encoded like so:
// 64-bits: 0
// 64-bits: Texture GPU Resource ID
// 64-bits: 0

// Use the companion header for help encoding resources into descriptor tables.

IRCompiler* pCompiler = IRCompilerCreate();
IRCompilerSetGlobalRootSignature( pCompiler, pRootSig );

// Compile DXIL to Metal IR
IRError* pError = nullptr;
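// dxilFragmentBytecode and dxilFragmentSize: your DXIL blob and its size in bytes.
IRObject* pDXIL = IRObjectCreateFromDXIL( dxilFragmentBytecode, dxilFragmentSize, IRBytecodeOwnershipNone );
IRObject* pOutIR = IRCompilerAllocCompileAndLink( pCompiler, nullptr, pDXIL, &pError );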

// if pOutIR is null, inspect pError for causes. Release pError afterwards.

IRMetalLibBinary* pMetallib = IRMetalLibBinaryCreate();
IRObjectGetMetalLibBinary( pOutIR, IRShaderStageFragment, pMetallib );

size_t metallibSize = IRMetalLibGetBytecodeSize( pMetallib );
uint8_t* metallib = new uint8_t[ metallibSize ];
if ( IRMetalLibGetBytecode( pMetallib, metallib ) == metallibSize )
{
    // Store metallib for later use or directly create a MTLLibrary
}

delete [] metallib;

IRMetalLibBinaryDestroy( pMetallib );

IRObjectDestroy( pOutIR );
IRObjectDestroy( pDXIL );

IRRootSignatureDestroy( pRootSig );
IRCompilerDestroy( pCompiler );

C++ bump allocator

This snippet demonstrates a simple C++ bump allocator implemented using metal-cpp. Note: this allocator isn’t thread safe. For multithreaded encoding, create one instance per thread per frame.

#include <Metal/Metal.hpp>
#include <utility>
#include <cassert>
#include <cstdint>

namespace mem
{
constexpr uint64_t alignUp(uint64_t n, uint64_t alignment)
{
	return (n + alignment - 1) & ~(alignment - 1);
}

// This allocator isn’t thread safe. For multithreaded encoding,
// create one of these instances per thread per frame.
class BumpAllocator
{
public:
	BumpAllocator(MTL::Device* pDevice,
				  size_t capacityInBytes,
				  MTL::ResourceOptions resourceOptions)
	{
		// The CPU writes into this buffer, so it can't use private storage.
		assert(resourceOptions != MTL::ResourceStorageModePrivate);
		_offset = 0;
		_capacity = capacityInBytes;
		_pBuffer = pDevice->newBuffer(capacityInBytes, resourceOptions);
		_contents = (uint8_t*)_pBuffer->contents();
	}

	~BumpAllocator() { _pBuffer->release(); }

	// Disable copy and move constructors and assignment operators
	BumpAllocator(const BumpAllocator&) = delete;
	BumpAllocator& operator=(const BumpAllocator&) = delete;
	BumpAllocator(BumpAllocator&&) = delete;
	BumpAllocator& operator=(BumpAllocator&&) = delete;

	void reset() { _offset = 0; }

	template< typename T >
	std::pair<T*, uint64_t> addAllocation(uint64_t count=1) noexcept
	{
		uint64_t allocSize = sizeof(T) * count;
		// If you hit this assert, the allocation data doesn’t fit in
		// the amount estimated.
		assert( _offset + allocSize <= _capacity );
		T* dataPtr = reinterpret_cast<T*>(_contents + _offset);
		uint64_t dataOffset = _offset;
		// On Apple GPUs, alignment needs to be on a 4-byte boundary:
		_offset += mem::alignUp(allocSize, 4);
		return { dataPtr, dataOffset };
	}

	MTL::Buffer* baseBuffer() const noexcept
	{
		return _pBuffer;
	}

private:
	MTL::Buffer* _pBuffer;
	uint64_t _offset;
	uint64_t _capacity;
	uint8_t* _contents;
};

} // namespace mem
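The allocator's offset arithmetic doesn't depend on Metal, so you can check it in isolation. The sketch below is a hypothetical host-only variant: HostBumpAllocator and usedBytes are illustrative names, and a std::vector<uint8_t> stands in for the MTL::Buffer contents. Because offsets only advance in 4-byte steps, the sketch assumes element types whose alignment is 4 bytes or less.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

namespace mem {
// Rounds n up to the next multiple of alignment (a power of two).
constexpr uint64_t alignUp(uint64_t n, uint64_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}
}

// Hypothetical host-only variant of BumpAllocator for testing offsets.
class HostBumpAllocator {
public:
    explicit HostBumpAllocator(size_t capacityInBytes)
        : _storage(capacityInBytes), _offset(0) {}

    void reset() { _offset = 0; }

    // Returns a CPU pointer plus the byte offset you would pass to
    // setVertexBuffer/setFragmentBuffer together with the base buffer.
    template <typename T>
    std::pair<T*, uint64_t> addAllocation(uint64_t count = 1) noexcept {
        uint64_t allocSize = sizeof(T) * count;
        assert(_offset + allocSize <= _storage.size() && "allocation exceeds capacity");
        T* dataPtr = reinterpret_cast<T*>(_storage.data() + _offset);
        uint64_t dataOffset = _offset;
        _offset += mem::alignUp(allocSize, 4); // keep offsets on 4-byte boundaries
        return { dataPtr, dataOffset };
    }

    uint64_t usedBytes() const noexcept { return _offset; }

private:
    std::vector<uint8_t> _storage;
    uint64_t _offset;
};
```

Allocating three floats, then one byte, then two uint32_t values yields offsets 0, 12, and 16, because the single byte is padded up to 4 bytes.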


Complete examples

These examples are complete programs you can use as a starting point for your next project, or just to try out Metal shader converter.

Metal shader converter dynamic library example

This sample builds on the Learn Metal with C++ code sample to add a grass floor to the scene via geometry and tessellation pipeline emulation. The UI lets you select among the available pipelines.

The geometry pipeline uses an HLSL geometry shader to generate one strand of grass for each triangle comprising the floor mesh. The geometry stage consumes a buffer to perform a subtle wind animation of the grass mesh.

The tessellation pipeline expands this to subdivide the floor triangle patches, increasing the density of the grass. It also adds an extra wave effect to the wind animation that’s implemented in the domain shader.

The installer places the sample under /opt/metal-shaderconverter/samples by default. To open and build this project from Xcode 15, copy it into your home folder and assign write permissions to the sample’s folder and its contents.

This sample requires macOS 14 or later.