Metal Performance Shaders

Add low-level and high-performance kernels to your Metal app. Optimize graphics and compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family.


The Metal Performance Shaders framework contains a collection of highly optimized compute and graphics shaders that are designed to integrate easily and efficiently into your Metal app. These data-parallel primitives are specially tuned to take advantage of the unique hardware characteristics of each GPU family. Apps that adopt the framework benefit from this tuning without needing to update their own hand-written shaders for each new GPU family. Metal Performance Shaders can be used alongside your app’s existing Metal resources (such as MTLCommandBuffer, MTLTexture, and MTLBuffer objects) and shaders.

In iOS 9 and tvOS 9, the Metal Performance Shaders framework introduced a series of commonly used image processing kernels for performing image effects on Metal textures.

In iOS 10 and tvOS 10, the Metal Performance Shaders framework introduced additional support for the following kernels:

  • Convolutional neural network (CNN) kernels, to implement and run deep learning models using previously obtained training data. A CNN is a machine learning technique that attempts to model the visual cortex as a sequence of convolution, rectification, pooling, and normalization steps.

  • Image processing kernels that perform color conversion.

  • Matrix multiplication.

The MPSKernel Class

The MPSKernel class is the base class for all Metal Performance Shaders kernels. It defines the baseline behavior for all kernels: the device to run the kernel on, some debugging options, and an optional user-friendly label. Derived from this class are the MPSUnaryImageKernel and MPSBinaryImageKernel subclasses, which define shared behavior for most image processing kernels (filters), such as edge modes, clipping, and tiling support for image operations that consume one or two source textures. Neither these subclasses nor the MPSKernel class is meant to be used directly. They provide an API abstraction and, in some cases, allow a degree of polymorphic manipulation of image kernel objects.

Subclasses of the MPSUnaryImageKernel and MPSBinaryImageKernel classes provide specialized initialization and encoding methods to encode various image processing primitives into a command buffer, and may also provide additional configurable properties of their own. Many such image filters are available, including:

  • Convolution filters (Sobel, Gaussian)

  • Morphological operators (dilate, erode)

  • Histogram operators (equalization, specification)

All of these run directly on the GPU, operating on texture and buffer objects.

Because the MPSKernel, MPSUnaryImageKernel, and MPSBinaryImageKernel classes unify a variety of image operations behind a simple, consistent interface and calling sequence, subclasses implement only the details that diverge from the norm. For example, some filters take a small set of parameters (such as a convolution kernel) that govern how they function. However, the overall sequence for using kernel subclasses remains the same:

  1. Determine whether the Metal Performance Shaders framework supports your device by calling the MPSSupportsMTLDevice(_:) function.

  2. Allocate the usual Metal objects to drive a Metal compute pipeline: MTLDevice, MTLCommandQueue, and MTLCommandBuffer. If your app has already written to any command buffers, Metal Performance Shaders can encode onto them inline with your own workload.

  3. Create an appropriate kernel, for example an MPSImageGaussianBlur object if you want to perform a Gaussian blur. Kernels are generally lightweight, but reusing them saves some setup time. A kernel must not be used by multiple threads at the same time, so if your app encodes Metal work from several threads concurrently, create an extra kernel for each thread. MPSKernel objects conform to the NSCopying protocol, which makes it easy to create these extra kernels from an existing one.

  4. Call the kernel’s encoding method. The parameters of the encoding call vary by kernel type, but all encoding methods behave similarly: they create a command encoder, write the commands that run the kernel into the command buffer, and then end the command encoder. This means you must call the endEncoding() method on your current command encoder before calling a kernel’s encode method. Afterward, you can either release the kernel or keep it for later use to save some setup cost.

  5. If you wish to encode further commands of your own on the command buffer, you must create a new command encoder to do so.

  6. When you are done with the command buffer, submit it to the device by calling its commit() method. The kernel then begins running on the GPU. To be notified when the work is done, either call waitUntilCompleted() or register a handler with addCompletedHandler(_:) before you commit the command buffer.
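The six steps above can be sketched in Swift as follows. This is a minimal, illustrative sketch rather than a canonical implementation: `gaussianBlurred(_:sigma:device:queue:)` is a hypothetical helper, and it assumes the caller already owns a device and a command queue and wants a freshly allocated, same-sized destination texture. Step 5 is omitted because no further custom commands are encoded.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: blurs `source` into a new texture of the same size.
// Returns nil if MPS does not support the device or allocation fails.
func gaussianBlurred(_ source: MTLTexture, sigma: Float,
                     device: MTLDevice, queue: MTLCommandQueue) -> MTLTexture? {
    // Step 1: confirm that Metal Performance Shaders supports this device.
    guard MPSSupportsMTLDevice(device) else { return nil }

    // The destination must be writable by a compute kernel.
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: source.pixelFormat,
        width: source.width,
        height: source.height,
        mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite]

    // Step 2: allocate the destination and a command buffer to encode into.
    guard let destination = device.makeTexture(descriptor: descriptor),
          let commandBuffer = queue.makeCommandBuffer() else { return nil }

    // Step 3: create the kernel (it could also be cached and reused).
    let blur = MPSImageGaussianBlur(device: device, sigma: sigma)

    // Step 4: the kernel creates, fills, and ends its own compute encoder.
    blur.encode(commandBuffer: commandBuffer,
                sourceTexture: source,
                destinationTexture: destination)

    // Step 6: submit the work; blocking here keeps the sketch short.
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    return destination
}
```

Blocking with waitUntilCompleted() keeps this example simple. The Tuning Hints below explain why a real app would usually keep encoding instead of waiting.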

Each kernel is allocated against a particular device; a single kernel cannot be used with multiple devices. This is necessary because the init(device:) methods sometimes allocate buffers and textures to hold data passed in as parameters to the initialization method, and a device is required to allocate them. Kernels provide a copy(with:device:) method that allows them to be copied for a new device.
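Because a kernel must not be shared across threads and is tied to one device, a common pattern is to clone a configured prototype. The sketch below is an assumption-labeled illustration: `makePerThreadBlurs(from:count:)` is a hypothetical helper. It uses copy(with:device:) with the prototype's own device, and passing a different MTLDevice instead would retarget the copies to that GPU.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: returns `count` independent copies of a configured
// blur kernel, one per worker thread, so no kernel is used concurrently.
func makePerThreadBlurs(from prototype: MPSImageGaussianBlur,
                        count: Int) -> [MPSImageGaussianBlur] {
    return (0..<count).map { _ in
        // Copy onto the same device; pass another MTLDevice to retarget.
        prototype.copy(with: nil, device: prototype.device)
    }
}
```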

Tuning Hints

The Metal Performance Shaders framework has been tuned for excellent performance across a variety of devices and kernel parameters. The tuning process focuses on minimizing both CPU and GPU latency for back-to-back calls on the same command buffer. It is possible, however, to inadvertently undo this optimization effort by introducing costly operations into the pipeline around the kernel, leading to disappointing overall results.

Here are some elements of good practice to avoid common pitfalls:

  1. Don’t wait for results to complete before enqueuing more work. There can be a significant delay (up to 2.5 ms) just to get an empty command buffer through the pipeline to the point where the waitUntilCompleted() method returns. Instead, start encoding the next command buffer(s) while you wait for the first one to complete, and enqueue them too, so they can start immediately after the previous one exits the GPU. Don’t wait for the CPU kernel to notice that the first command buffer is done, tear it down, and eventually call back into your app before you begin encoding the next one. Letting the CPU and GPU work concurrently in this way can increase throughput by up to a factor of ten.

  2. There is a large cost to allocating buffers and textures. The cost can swamp the CPU, preventing you from keeping the GPU busy. Try to preallocate and reuse the MTLResource objects as much as possible.

  3. There is a cost to switching between render and compute encoders. Each time a new render encoder is used, there can be a substantial GPU mode switch cost that may undermine your throughput. To avoid the cost, try to batch compute work together. Since making a new command buffer forces you to make a new command encoder too, try to do more work with fewer command buffers.

  4. For some image operations, particularly those involving multiple passes (for example, chaining several image filters together), performance can be improved by up to a factor of two by breaking the work into tiles of approximately 512 KB each. Use the sourceRegion(destinationSize:) method to find the source region needed for each tile.
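Hints 1 and 2 can be combined in a small pipeline object. The following is a sketch under stated assumptions, not a prescribed design. `BlurPipeline` and its `enqueue` method are hypothetical. A DispatchSemaphore caps the number of queued command buffers (three here, an arbitrary choice), and the kernel is created once and reused. The class is not thread-safe, so call `enqueue` from a single thread. If the CPU reads results, the caller should rotate among several preallocated destination textures rather than reuse one.

```swift
import Dispatch
import Metal
import MetalPerformanceShaders

// Hypothetical type: keeps up to `maxInFlight` command buffers queued instead
// of waiting for each one (hint 1) and reuses a single kernel (hint 2).
final class BlurPipeline {
    private let queue: MTLCommandQueue
    private let blur: MPSImageGaussianBlur
    private let inFlight: DispatchSemaphore

    init?(device: MTLDevice, sigma: Float, maxInFlight: Int = 3) {
        guard MPSSupportsMTLDevice(device),
              let queue = device.makeCommandQueue() else { return nil }
        self.queue = queue
        self.blur = MPSImageGaussianBlur(device: device, sigma: sigma)
        self.inFlight = DispatchSemaphore(value: maxInFlight)
    }

    // Encodes one pass and returns without waiting for the GPU. `completion`
    // runs on a Metal-owned thread with true if the pass completed normally.
    func enqueue(source: MTLTexture, destination: MTLTexture,
                 completion: @escaping (Bool) -> Void) {
        // Blocks only when `maxInFlight` command buffers are already queued.
        inFlight.wait()
        guard let commandBuffer = queue.makeCommandBuffer() else {
            inFlight.signal()
            completion(false)
            return
        }
        blur.encode(commandBuffer: commandBuffer,
                    sourceTexture: source,
                    destinationTexture: destination)
        // Completion handlers must be registered before commit().
        let semaphore = inFlight
        commandBuffer.addCompletedHandler { finished in
            semaphore.signal()
            completion(finished.status == .completed)
        }
        commandBuffer.commit()
    }
}
```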


Device Support

func MPSSupportsMTLDevice(MTLDevice?) -> Bool

Determines whether the Metal Performance Shaders framework supports a Metal device.
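As a minimal sketch, an app might pick the first supported GPU and prefer the system default. The helper name `makeMPSDevice()` is hypothetical. MTLCopyAllDevices() is available only on macOS, which is why the sketch uses conditional compilation.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: returns a GPU that Metal Performance Shaders supports,
// preferring the system default device, or nil if none qualifies.
func makeMPSDevice() -> MTLDevice? {
    #if os(macOS)
    // macOS may expose several GPUs; check the default first, then the rest.
    let candidates = [MTLCreateSystemDefaultDevice()].compactMap { $0 } + MTLCopyAllDevices()
    #else
    let candidates = [MTLCreateSystemDefaultDevice()].compactMap { $0 }
    #endif
    return candidates.first { MPSSupportsMTLDevice($0) }
}
```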

Image Filters


Apply high-performance filters to images, and extract statistical and histogram data from them.

Neural Networks

Implement and run deep learning using previously obtained training data.

class MPSImage

A texture that may have more than 4 channels for use in convolutional neural networks.
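MPSImage stores feature channels four per texture slice, so an image with more than four channels is backed by a 2D array texture. The sketch below allocates a half-precision feature map of the kind a CNN layer might produce. The helper name `makeFeatureMap` and the chosen sizes are illustrative assumptions.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: allocates a feature map with `channels` half-precision
// channels. With more than 4 channels, the backing texture is a 2D array
// holding 4 channels per slice (for example, 16 channels -> 4 slices).
func makeFeatureMap(device: MTLDevice, width: Int, height: Int,
                    channels: Int) -> MPSImage {
    let descriptor = MPSImageDescriptor(channelFormat: .float16,
                                        width: width,
                                        height: height,
                                        featureChannels: channels)
    return MPSImage(device: device, imageDescriptor: descriptor)
}
```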

class MPSTemporaryImage

A texture for use in convolutional neural networks that stores transient data to be used and discarded promptly.
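An MPSTemporaryImage is created against a command buffer, and its readCount property (1 by default) tracks how many more kernel reads are expected. Each read by a CNN kernel decrements readCount, and when it reaches 0, the storage can be reused for later temporaries in the same command buffer. The sketch below chains two ReLU layers through a temporary. `encodeTwoReLUs` is a hypothetical helper, and MPSCNNNeuronReLU is used for brevity even though newer SDKs may mark it deprecated in favor of descriptor-based neurons.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: input -> ReLU -> temporary -> ReLU -> output.
// Returns the temporary image so its readCount can be inspected.
@discardableResult
func encodeTwoReLUs(commandBuffer: MTLCommandBuffer, device: MTLDevice,
                    input: MPSImage, output: MPSImage) -> MPSTemporaryImage {
    let relu = MPSCNNNeuronReLU(device: device, a: 0)  // a = 0: standard ReLU
    let descriptor = MPSImageDescriptor(channelFormat: .float16,
                                        width: input.width,
                                        height: input.height,
                                        featureChannels: input.featureChannels)
    let temp = MPSTemporaryImage(commandBuffer: commandBuffer,
                                 imageDescriptor: descriptor)
    // Exactly one kernel reads `temp` below (1 is also the default).
    temp.readCount = 1
    relu.encode(commandBuffer: commandBuffer, sourceImage: input, destinationImage: temp)
    // This read drops readCount to 0, releasing the temporary's storage.
    relu.encode(commandBuffer: commandBuffer, sourceImage: temp, destinationImage: output)
    return temp
}
```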

Objects that Simplify the Creation of Neural Networks

Simplify the creation of neural networks using networks of filter, image, and state nodes.

Convolutional Neural Network Kernels

Build neural networks with layers.

Recurrent Neural Networks

Create recurrent neural networks.

Matrices and Vectors


Solve systems of equations, factorize matrices, and multiply matrices and vectors.
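As an illustration of the matrix kernels, the sketch below multiplies two row-major Float32 matrices with MPSMatrixMultiplication, which computes C = alpha * A * B + beta * C. Here alpha = 1 and beta = 0. The helper name `multiply` and the tightly packed row layout are assumptions made for brevity. Production code would typically reuse buffers and avoid waiting synchronously.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: C (m x n) = A (m x k) * B (k x n), row-major Float32.
// Returns nil if MPS or any Metal allocation is unavailable.
func multiply(_ a: [Float], _ b: [Float], m: Int, k: Int, n: Int,
              device: MTLDevice) -> [Float]? {
    precondition(a.count == m * k && b.count == k * n)
    guard MPSSupportsMTLDevice(device),
          let queue = device.makeCommandQueue(),
          let commandBuffer = queue.makeCommandBuffer() else { return nil }
    let stride = MemoryLayout<Float>.stride

    // Wraps a shared-storage buffer in an MPSMatrix with tightly packed rows.
    func makeMatrix(_ values: [Float]?, rows: Int, columns: Int) -> MPSMatrix? {
        let length = rows * columns * stride
        let buffer = values.map {
            device.makeBuffer(bytes: $0, length: length, options: .storageModeShared)
        } ?? device.makeBuffer(length: length, options: .storageModeShared)
        guard let storage = buffer else { return nil }
        let descriptor = MPSMatrixDescriptor(rows: rows, columns: columns,
                                             rowBytes: columns * stride,
                                             dataType: .float32)
        return MPSMatrix(buffer: storage, descriptor: descriptor)
    }

    guard let left = makeMatrix(a, rows: m, columns: k),
          let right = makeMatrix(b, rows: k, columns: n),
          let result = makeMatrix(nil, rows: m, columns: n) else { return nil }

    let gemm = MPSMatrixMultiplication(device: device,
                                       transposeLeft: false, transposeRight: false,
                                       resultRows: m, resultColumns: n,
                                       interiorColumns: k,
                                       alpha: 1.0, beta: 0.0)
    gemm.encode(commandBuffer: commandBuffer, leftMatrix: left,
                rightMatrix: right, resultMatrix: result)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Shared storage lets the CPU read the result directly.
    let pointer = result.data.contents().bindMemory(to: Float.self, capacity: m * n)
    return Array(UnsafeBufferPointer(start: pointer, count: m * n))
}
```

For example, multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] should yield [19, 22, 43, 50] in row-major order.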

Ray Tracing

Metal for Accelerating Ray Tracing

Use the Metal Performance Shaders ray intersector to perform ray-traced rendering.

class MPSRayIntersector

A kernel that performs intersection tests between rays and geometry.

class MPSAccelerationStructureGroup

A group of acceleration structures.

class MPSInstanceAccelerationStructure

An acceleration structure built over instances of other acceleration structures.

class MPSTriangleAccelerationStructure

An acceleration structure built over triangles.

class MPSAccelerationStructure

The base class for data structures that are built over geometry and used to accelerate ray tracing.
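The ray-tracing classes fit together as follows: build an acceleration structure over geometry, then have an MPSRayIntersector test a buffer of rays against it. The sketch below is a minimal illustration and relies on several assumptions. `nearestHitDistances` is a hypothetical helper. Vertices are stored as four floats each (x, y, z, padding) with an explicit 16-byte stride. Rays are eight floats each in the originMinDistanceDirectionMaxDistance layout (origin.xyz, minDistance, direction.xyz, maxDistance). Intersections use the distance-only format, where a negative value indicates a miss.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: returns the nearest-hit distance for each ray, or nil
// if MPS or a Metal allocation is unavailable. Negative distance = no hit.
func nearestHitDistances(vertices: [Float], rays: [Float],
                         device: MTLDevice) -> [Float]? {
    let floatSize = MemoryLayout<Float>.stride
    let triangleCount = vertices.count / 12  // 3 vertices x 4 floats each
    let rayCount = rays.count / 8            // 8 floats per ray
    guard MPSSupportsMTLDevice(device),
          let queue = device.makeCommandQueue(),
          let commandBuffer = queue.makeCommandBuffer(),
          let vertexBuffer = device.makeBuffer(bytes: vertices, length: vertices.count * floatSize, options: .storageModeShared),
          let rayBuffer = device.makeBuffer(bytes: rays, length: rays.count * floatSize, options: .storageModeShared),
          let hitBuffer = device.makeBuffer(length: rayCount * floatSize, options: .storageModeShared)
    else { return nil }

    // Build the acceleration structure over the triangle vertices.
    let accel = MPSTriangleAccelerationStructure(device: device)
    accel.vertexBuffer = vertexBuffer
    accel.vertexStride = 4 * floatSize
    accel.triangleCount = triangleCount
    accel.rebuild()

    // Describe the ray and intersection layouts used in the buffers.
    let intersector = MPSRayIntersector(device: device)
    intersector.rayDataType = .originMinDistanceDirectionMaxDistance
    intersector.rayStride = 8 * floatSize
    intersector.intersectionDataType = .distance

    intersector.encodeIntersection(commandBuffer: commandBuffer,
                                   intersectionType: .nearest,
                                   rayBuffer: rayBuffer, rayBufferOffset: 0,
                                   intersectionBuffer: hitBuffer, intersectionBufferOffset: 0,
                                   rayCount: rayCount,
                                   accelerationStructure: accel)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    let pointer = hitBuffer.contents().bindMemory(to: Float.self, capacity: rayCount)
    return Array(UnsafeBufferPointer(start: pointer, count: rayCount))
}
```

A ray fired along +z from (0, 0, -1) toward a triangle lying in the z = 0 plane should report a distance of about 1.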


Kernel Base Classes

class MPSKernel

A standard interface for Metal Performance Shaders kernels.

Keyed Archivers

class NSKeyedArchiver

A coder that stores an object's data to an archive referenced by keys.

class MPSKeyedUnarchiver

A keyed unarchiver that supports decoding Metal Performance Shaders kernels.


class MPSStateResourceList

An interface for objects that define resources for Metal Performance Shaders state containers.


Beta Software

This documentation contains preliminary information about an API or technology in development. This information is subject to change, and software implemented according to this documentation should be tested with final operating system software.
