A texture that may have more than four channels for use in convolutional neural networks.
- iOS 10.0+
- macOS 10.13+
- tvOS 10.0+
- Metal Performance Shaders
Some image types, such as those found in convolutional neural networks (CNN), differ from a standard texture in that they may have more than 4 channels per pixel. While the channels could hold RGBA data, they will more commonly hold a number of structural permutations upon an RGBA image as the neural network progresses. It is not uncommon for each pixel to have 32 or 64 channels in it.
Since a standard MTLTexture object cannot have more than 4 channels, the additional channels are stored in slices of a 2D texture array (i.e. a texture of type MTLTextureType.type2DArray), with 4 consecutive channels stored in each slice. If the number of feature channels is N, the number of array slices needed is (N+3)/4 using integer division, i.e. N/4 rounded up. For example, a 9-channel CNN image with a width of 3 and a height of 2 is stored in three 3×2 slices: slice 0 holds channels 0 through 3, slice 1 holds channels 4 through 7, and slice 2 holds channel 8 in its first component followed by three padding components.
Thus, the width and height of the underlying 2D texture array are the same as the width and height of the MPSImage object, and the array length is equal to (featureChannels+3)/4. (Padding channels exist only to fill out the last slice and should not contain NaN or Inf values.)
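The channel-to-slice arithmetic above can be sketched in a few lines. This is a minimal sketch in pure Swift that models only the layout math; the function names are hypothetical and nothing here calls Metal.

```swift
// Hypothetical helpers modeling the MPSImage slice layout (no Metal involved).

/// Number of 2D-array slices needed for `featureChannels` channels: (N + 3) / 4.
func sliceCount(featureChannels: Int) -> Int {
    precondition(featureChannels > 0, "an image needs at least one channel")
    return (featureChannels + 3) / 4
}

/// Slice index and RGBA component (0 = r, 1 = g, 2 = b, 3 = a) that hold `channel`.
func location(ofChannel channel: Int) -> (slice: Int, component: Int) {
    precondition(channel >= 0, "channel index must be non-negative")
    return (slice: channel / 4, component: channel % 4)
}

/// Number of padding components left over in the last slice.
func paddingChannels(featureChannels: Int) -> Int {
    return sliceCount(featureChannels: featureChannels) * 4 - featureChannels
}

// The 9-channel example: 3 slices, channel 8 is component 0 of slice 2,
// and the other 3 components of slice 2 are padding.
let slices = sliceCount(featureChannels: 9)
let last = location(ofChannel: 8)
print(slices, last.slice, last.component, paddingChannels(featureChannels: 9)) // 3 2 0 3
```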
An MPSImage object can contain multiple CNN images for batch processing. To create an MPSImage object that contains N images, create an MPSImageDescriptor object with its numberOfImages property set to N and initialize the image from that descriptor. The length of the 2D texture array (i.e. the number of slices) will be equal to ((featureChannels+3)/4)*numberOfImages, where each run of (featureChannels+3)/4 consecutive slices represents one image.
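How a batch maps onto one texture array can be sketched as follows. This is a pure-Swift sketch under the layout described above; the parameter names mirror the featureChannels and numberOfImages properties, but the functions themselves are hypothetical and do not use Metal.

```swift
// Hypothetical helpers: slice bookkeeping for a batch stored in one 2D texture array.

/// Slices used by a single image: (featureChannels + 3) / 4.
func slicesPerImage(featureChannels: Int) -> Int {
    return (featureChannels + 3) / 4
}

/// Total array length: ((featureChannels + 3) / 4) * numberOfImages.
func totalSlices(featureChannels: Int, numberOfImages: Int) -> Int {
    return slicesPerImage(featureChannels: featureChannels) * numberOfImages
}

/// Half-open range of slices holding image `index` (zero-based) of the batch.
func sliceRange(ofImage index: Int, featureChannels: Int, numberOfImages: Int) -> Range<Int> {
    precondition(index >= 0 && index < numberOfImages, "image index out of range")
    let perImage = slicesPerImage(featureChannels: featureChannels)
    return (index * perImage)..<((index + 1) * perImage)
}

// A batch of 3 images with 9 channels each: 3 slices per image, 9 slices in total.
print(totalSlices(featureChannels: 9, numberOfImages: 3))            // 9
print(sliceRange(ofImage: 1, featureChannels: 9, numberOfImages: 3)) // 3..<6
```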
Although an MPSImage object can contain more than one image, the number of those images that an MPSCNNKernel object actually processes is controlled by the z dimension of its clipRect property. (A kernel processes n = clipRect.size.depth images from this collection.)
The index of the first image to process in the source MPSImage object is given by offset.z. The index in the destination MPSImage object where the first processed image is written is given by clipRect.origin.z. Thus, an MPSCNNKernel object takes the n = clipRect.size.depth images from the source at indices offset.z through offset.z+n-1, processes each independently, and stores the results in the destination at indices clipRect.origin.z through clipRect.origin.z+n-1, respectively. Consequently, offset.z+n should be <= the source's numberOfImages, clipRect.origin.z+n should be <= the destination's numberOfImages, and offset.z must be >= 0.
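The batch-window rules above can be checked with a small model. In this pure-Swift sketch, BatchWindow is a hypothetical stand-in: sourceStart plays the role of offset.z, destinationStart the role of clipRect.origin.z, and count the role of clipRect.size.depth on a real MPSCNNKernel. The requirements that destinationStart be non-negative and count be at least 1 are added assumptions.

```swift
// Hypothetical model of an MPSCNNKernel batch window (no Metal involved).
struct BatchWindow {
    var sourceStart: Int       // corresponds to offset.z
    var destinationStart: Int  // corresponds to clipRect.origin.z
    var count: Int             // corresponds to clipRect.size.depth (n)

    /// Checks the stated constraints, plus the assumed non-negativity
    /// of destinationStart and a window of at least one image.
    func isValid(sourceImages: Int, destinationImages: Int) -> Bool {
        return sourceStart >= 0
            && destinationStart >= 0
            && count >= 1
            && sourceStart + count <= sourceImages
            && destinationStart + count <= destinationImages
    }

    /// (source image, destination image) pairs; each pair is processed independently.
    func imagePairs() -> [(source: Int, destination: Int)] {
        return (0..<count).map { (source: sourceStart + $0, destination: destinationStart + $0) }
    }
}

// The numbers from the convolution example that follows: batches of 5 and 4.
let window = BatchWindow(sourceStart: 2, destinationStart: 1, count: 2)
print(window.isValid(sourceImages: 5, destinationImages: 4)) // true
```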
For example, suppose an MPSCNNConvolution object takes an input image with 16 channels and outputs an image with 32 channels. Each image then needs 4 slices in the source 2D texture array and 8 slices in the destination 2D texture array. Suppose the source batch size is 5 and the destination batch size is 4. The source texture therefore has 4*5=20 slices and the destination texture has 8*4=32 slices. To process images 2 and 3 of the source and store the results at indices 1 and 2 of the destination, set offset.z=2, clipRect.origin.z=1, and clipRect.size.depth=2. With zero-based image indices, the MPSCNNConvolution object then uses slices 8 to 11 of the source (image 2) to produce slices 8 to 15 of the destination (image 1). Similarly, slices 12 to 15 of the source (image 3) are used to produce slices 16 to 23 of the destination (image 2).
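The slice ranges in this convolution example can be recomputed directly. This pure-Swift sketch assumes zero-based image indices and the layout described above (4 source slices and 8 destination slices per image); the helper names are hypothetical.

```swift
// Hypothetical helpers that recompute the slice ranges of the convolution example.

func slicesPerImage(_ featureChannels: Int) -> Int { (featureChannels + 3) / 4 }

/// Slices occupied by images firstImage..<(firstImage + count) of a batch.
func slices(firstImage: Int, count: Int, featureChannels: Int) -> Range<Int> {
    let perImage = slicesPerImage(featureChannels)
    return (firstImage * perImage)..<((firstImage + count) * perImage)
}

let sourceChannels = 16, destinationChannels = 32
// offset.z = 2, clipRect.origin.z = 1, clipRect.size.depth = 2
let sourceSlices = slices(firstImage: 2, count: 2, featureChannels: sourceChannels)
let destinationSlices = slices(firstImage: 1, count: 2, featureChannels: destinationChannels)
print(sourceSlices, destinationSlices) // 8..<16 8..<24

// Image by image: each source image is processed independently.
for i in 0..<2 {
    let src = slices(firstImage: 2 + i, count: 1, featureChannels: sourceChannels)
    let dst = slices(firstImage: 1 + i, count: 1, featureChannels: destinationChannels)
    print("source slices \(src) -> destination slices \(dst)")
}
```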
All MPSCNNKernel objects process the images in a batch independently. That is, calling an MPSCNNKernel object on a batch is formally the same as calling it on each image in the batch sequentially. Batch processing amortizes computational and GPU work-submission overhead over more work, which is especially important for good performance on small images.
If the number of feature channels is <= 4 and numberOfImages is 1 (i.e. only one slice is needed to represent the image), the underlying Metal texture type is MTLTextureType.type2D rather than MTLTextureType.type2DArray as described above.
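The texture-type rule reduces to "one slice or more than one". In this pure-Swift sketch, TextureKind is a hypothetical enum standing in for the .type2D and .type2DArray cases of MTLTextureType so the example runs without Metal.

```swift
// Hypothetical stand-in for the two relevant MTLTextureType cases.
enum TextureKind { case type2D, type2DArray }

/// A single slice (featureChannels <= 4 and numberOfImages == 1) gets a plain 2D
/// texture; anything larger needs a 2D texture array.
func textureKind(featureChannels: Int, numberOfImages: Int) -> TextureKind {
    let slices = ((featureChannels + 3) / 4) * numberOfImages
    return slices == 1 ? .type2D : .type2DArray
}

print(textureKind(featureChannels: 3, numberOfImages: 1))  // type2D
print(textureKind(featureChannels: 32, numberOfImages: 1)) // type2DArray
```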
The framework also provides MPSTemporaryImage objects, intended for very short-lived image data that is produced and consumed immediately in the same MTLCommandBuffer object. They are a useful way to minimize CPU-side texture allocation costs and greatly reduce the amount of memory used by your image pipeline.
Creation of the underlying texture may occur lazily in some cases. In general, avoid accessing the texture property, since doing so materializes the backing memory and can keep it alive longer than necessary. When possible, use the other MPSImage properties (such as width, height, and featureChannels) to get information about the object instead.
The MPSImage Class
MTLTexture objects are commonly used in Metal apps and are used directly by the Metal Performance Shaders framework when possible. In apps that use CNN, kernels may need more than the four data channels that a
MTLTexture object can provide. In these cases, an
MPSImage object is used instead as an abstraction layer on top of a
MTLTexture object. When more than 4 channels are needed, additional slices are added to the 2D texture array to hold the extra channels in sets of four. An MPSImage object tracks this information as the number of feature channels in the image (its featureChannels property).
MPSCNNKernel objects operate on MPSImage objects. MPSImage objects are at their core
MTLTexture objects; however, whereas
MTLTexture objects commonly represent image or texel data, an
MPSImage object is a more abstract representation of image features. The channels within an
MPSImage object do not necessarily correspond to colors in a color space (although they can, if necessary). As a result, there can be many more than four of them; 32 or 64 channels per pixel is not uncommon in CNNs. This is achieved on top of the MTLTexture abstraction by adding extra RGBA storage for the feature channels beyond the first 4, stored as additional slices of a 2D texture array. Thus, each CNN pixel in a 32-channel image is represented across 8 array slices, with 4 channels stored per pixel in each slice. The width and height of the MTLTexture object are the same as the width and height of the MPSImage object. The number of slices in the MTLTexture object is the number of feature channels divided by 4, rounded up.
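One way to see the per-slice layout is to pack CPU-side feature data into it. This is a pure-Swift sketch under stated assumptions: the input is channel-major ([channel][y][x], flattened), each output slice is RGBA-interleaved ([y][x][component]), and unused padding components are set to 0 (a finite value, so no NaN or Inf). The function name and both layouts are illustrative choices, not an MPS API.

```swift
// Hypothetical packing of channel-major feature data into 4-channel slices.
func packIntoSlices(_ data: [Float], featureChannels: Int, width: Int, height: Int) -> [[Float]] {
    precondition(data.count == featureChannels * width * height, "unexpected input size")
    let sliceCount = (featureChannels + 3) / 4
    // Every slice is width * height RGBA pixels; padding components stay 0.
    var slices = Array(repeating: [Float](repeating: 0, count: width * height * 4), count: sliceCount)
    for channel in 0..<featureChannels {
        let slice = channel / 4, component = channel % 4
        for y in 0..<height {
            for x in 0..<width {
                let pixel = y * width + x
                slices[slice][pixel * 4 + component] = data[channel * width * height + pixel]
            }
        }
    }
    return slices
}

// A 32-channel, 2x1 image packs into 8 slices of 2 RGBA pixels (8 floats) each.
let channels = 32, w = 2, h = 1
let input = (0..<(channels * w * h)).map { Float($0) }
let packed = packIntoSlices(input, featureChannels: channels, width: w, height: h)
print(packed.count, packed[0].count) // 8 8
```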
MPSImage objects can be created from existing
MTLTexture objects. They may also be created anew from an
MPSImageDescriptor object and backed either with standard texture memory or, as MPSTemporaryImage objects, with memory drawn from the framework’s internal cached texture backing store.
MPSTemporaryImage objects can provide great memory usage and CPU time savings, but come with significant restrictions that should be understood before using them. For example, their contents are only valid during the GPU-side execution of a single MTLCommandBuffer object and cannot be read from or written to by the CPU. They are provided as an efficient way to hold CNN computations that are used immediately within the scope of the same MTLCommandBuffer object and then discarded. Concatenation is also supported by allowing you to define the destination feature channel at which the output of the current layer starts being written. In this way, your app can make a large MPSTemporaryImage object and fill in parts of it with multiple layers (as long as each destination feature channel offset is a multiple of 4).
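Planning such a concatenation comes down to assigning each layer a starting destination channel and checking the multiple-of-4 rule. This pure-Swift sketch uses a hypothetical helper, planConcatenation; to the best of our knowledge the offsets it returns correspond to what you would assign to MPSCNNKernel's destinationFeatureChannelOffset property, but the sketch itself does not call Metal.

```swift
// Hypothetical planner for concatenating several layer outputs along the channel axis.
// Returns each layer's starting destination channel and the total channel count,
// or nil if some layer would have to start at an offset that is not a multiple of 4.
func planConcatenation(layerOutputChannels: [Int]) -> (offsets: [Int], totalChannels: Int)? {
    var offsets: [Int] = []
    var next = 0
    for channels in layerOutputChannels {
        guard next % 4 == 0 else { return nil } // offset must be a multiple of 4
        offsets.append(next)
        next += channels
    }
    return (offsets: offsets, totalChannels: next)
}

if let plan = planConcatenation(layerOutputChannels: [16, 32, 8]) {
    print(plan.offsets, plan.totalChannels) // [0, 16, 48] 56
}
print(planConcatenation(layerOutputChannels: [6, 4]) == nil) // true
```

Returning nil keeps the sketch faithful to the stated rule; an alternative design would round each offset up to the next multiple of 4 and leave the skipped channels as padding.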