Data-Parallel Compute Processing: Compute Command Encoder
This chapter explains how to create and use a MTLComputeCommandEncoder object to encode data-parallel compute processing state and commands and submit them for execution on a device.
To perform a data-parallel computation, follow these main steps:
1. Use a MTLDevice method to create a compute pipeline state (MTLComputePipelineState) that contains compiled code from a MTLFunction object, as discussed in Creating a Compute Pipeline State. The MTLFunction object represents a compute function written with the Metal shading language, as described in Functions and Libraries.
2. Specify the MTLComputePipelineState object to be used by the compute command encoder, as discussed in Specifying a Compute State and Resources for a Compute Command Encoder.
3. Specify resources and related objects (MTLBuffer, MTLTexture, and possibly MTLSamplerState) that may contain the data to be processed and returned by the compute state, as discussed in Specifying a Compute State and Resources for a Compute Command Encoder. Also set their argument table indices, so that Metal framework code can locate a corresponding resource in the shader code. At any given moment, the MTLComputeCommandEncoder can be associated with many resource objects.
4. Dispatch the compute function a specified number of times, as explained in Executing a Compute Command.
Creating a Compute Pipeline State
A MTLFunction object represents data-parallel code that can be executed by a MTLComputePipelineState object. The MTLComputeCommandEncoder object encodes commands that set arguments and execute the compute function. Because creating a compute pipeline state can require an expensive compilation of Metal shading language code, you can use either a blocking or an asynchronous method to schedule such work in a way that best fits the design of your app.
- To synchronously create the compute pipeline state object, call either the newComputePipelineStateWithFunction:error: or newComputePipelineStateWithFunction:options:reflection:error: method of MTLDevice. These methods block the current thread while Metal compiles shader code to create the pipeline state object.
- To asynchronously create the compute pipeline state object, call either the newComputePipelineStateWithFunction:completionHandler: or newComputePipelineStateWithFunction:options:completionHandler: method of MTLDevice. These methods return immediately; Metal asynchronously compiles shader code to create the pipeline state object, then calls your completion handler to provide the new MTLComputePipelineState object.
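For example, a minimal sketch of the asynchronous path, assuming the device and func objects created as in Listing 6-1; the completion handler receives either the new pipeline state or an error:

[device newComputePipelineStateWithFunction:func
                          completionHandler:^(id <MTLComputePipelineState> pipelineState,
                                              NSError *error) {
    if (pipelineState == nil) {
        // Compilation failed; inspect error for details
        NSLog(@"Could not create compute pipeline state: %@", error);
        return;
    }
    // Keep a reference to pipelineState for use when encoding compute commands
}];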
When you create a MTLComputePipelineState object, you can also choose to create reflection data that reveals details of the compute function and its arguments. The newComputePipelineStateWithFunction:options:reflection:error: and newComputePipelineStateWithFunction:options:completionHandler: methods provide this data. Avoid obtaining reflection data if it will not be used. For more information on how to analyze reflection data, see Determining Function Details at Runtime.
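As a sketch, the synchronous reflection variant might be used like this (reusing device and func from Listing 6-1; the MTLPipelineOptionArgumentInfo option requests per-argument details):

MTLAutoreleasedComputePipelineReflection reflection = nil;
NSError *error = nil;
id <MTLComputePipelineState> state =
        [device newComputePipelineStateWithFunction:func
                                            options:MTLPipelineOptionArgumentInfo
                                         reflection:&reflection
                                              error:&error];
// Each MTLArgument describes one argument of the compute function
for (MTLArgument *arg in reflection.arguments) {
    NSLog(@"Argument '%@' is bound at index %lu", arg.name, (unsigned long)arg.index);
}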
Specifying a Compute State and Resources for a Compute Command Encoder
The setComputePipelineState: method of a MTLComputeCommandEncoder object specifies the state, including a compiled compute shader function, to use for a data-parallel compute pass. At any given moment, a compute command encoder can be associated with only one compute function.
The following MTLComputeCommandEncoder methods specify a resource (that is, a buffer, texture, sampler state, or threadgroup memory) that is used as an argument to the compute function represented by the MTLComputePipelineState object:

- setBuffer:offset:atIndex: and setBuffers:offsets:withRange:
- setTexture:atIndex: and setTextures:withRange:
- setSamplerState:atIndex:, setSamplerState:lodMinClamp:lodMaxClamp:atIndex:, and setSamplerStates:withRange:
- setThreadgroupMemoryLength:atIndex:

Each method assigns one or more resources to the corresponding argument table entries, as illustrated in Figure 6-1.
The limits for the maximum number of entries in the buffer, texture, and sampler state argument tables, as well as the limit on the maximum total threadgroup memory allocation, are listed in the Implementation Limits table.
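As a brief sketch of these calls, reusing the computeCE encoder and resources from Listing 6-1 and assuming a hypothetical id <MTLSamplerState> named sampler:

// Bind a buffer, a texture, a sampler state, and a threadgroup memory
// allocation to their respective argument tables (indices are illustrative)
[computeCE setBuffer:paramsBuffer offset:0 atIndex:0];
[computeCE setTexture:inputImage atIndex:0];
[computeCE setSamplerState:sampler atIndex:0];
[computeCE setThreadgroupMemoryLength:16*16*sizeof(float) atIndex:0];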
Executing a Compute Command
To encode a command to execute a compute function, call the dispatchThreadgroups:threadsPerThreadgroup: method of MTLComputeCommandEncoder and specify the threadgroup dimensions and the number of threadgroups. You can query the threadExecutionWidth and maxTotalThreadsPerThreadgroup properties of MTLComputePipelineState to optimize the execution of the compute function on this device.
The total number of threads in a threadgroup is the product of the components of threadsPerThreadgroup: threadsPerThreadgroup.width * threadsPerThreadgroup.height * threadsPerThreadgroup.depth. The maxTotalThreadsPerThreadgroup property specifies the maximum number of threads that can be in a single threadgroup to execute this compute function on the device.
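One common sizing approach, sketched here with the filterState pipeline from Listing 6-1, is to make the threadgroup width a multiple of threadExecutionWidth while keeping the total number of threads within maxTotalThreadsPerThreadgroup:

NSUInteger w = filterState.threadExecutionWidth;
NSUInteger h = filterState.maxTotalThreadsPerThreadgroup / w;
// w * h never exceeds maxTotalThreadsPerThreadgroup
MTLSize threadsPerThreadgroup = MTLSizeMake(w, h, 1);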
Compute commands are executed in the order in which they are encoded into the command buffer. A compute command finishes execution when all threadgroups associated with the command finish execution and all results are written to memory. Because of this sequencing, the results of a compute command are available to any commands encoded after it in the command buffer.
To end encoding commands for a compute command encoder, call the endEncoding method of MTLComputeCommandEncoder. After ending the previous command encoder, you can create a new command encoder of any type to encode additional commands into the command buffer.
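For example, continuing from Listing 6-1, a sketch of encoding a subsequent blit pass on the same command buffer after the compute pass has been ended:

[computeCE endEncoding];

// A new encoder of any type can now encode further commands into the same buffer
id <MTLBlitCommandEncoder> blitCE = [commandBuffer blitCommandEncoder];
// ... encode blit commands here, for example to copy outputImage ...
[blitCE endEncoding];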
Code Example: Executing Data-Parallel Functions
Listing 6-1 shows an example that creates and uses a MTLComputeCommandEncoder object to perform the parallel computations of an image transformation on specified data. (This example does not show how the device, library, command queue, and resource objects are created and initialized.) The example creates a command buffer and then uses it to create the MTLComputeCommandEncoder object. Next, a MTLFunction object is created that represents the entry point filter_main loaded from the MTLLibrary object, shown in Listing 6-2. Then the function object is used to create a MTLComputePipelineState object called filterState.
The compute function performs an image transformation and filtering operation on the image inputImage, with the results returned in outputImage. First, the setTexture:atIndex: and setBuffer:offset:atIndex: methods assign texture and buffer objects to indices in the specified argument tables. paramsBuffer specifies values used to perform the image transformation, and inputTableData specifies filter weights. The compute function is executed in 2D threadgroups of 16 x 16 threads each. The dispatchThreadgroups:threadsPerThreadgroup: method enqueues the command to dispatch the threads executing the compute function, and the endEncoding method terminates the MTLComputeCommandEncoder. Finally, the commit method of MTLCommandBuffer causes the commands to be executed as soon as possible.
Listing 6-1 Specifying and Running a Function in a Compute State
id <MTLDevice> device;
id <MTLLibrary> library;
id <MTLCommandQueue> commandQueue;

id <MTLTexture> inputImage;
id <MTLTexture> outputImage;
id <MTLTexture> inputTableData;
id <MTLBuffer> paramsBuffer;

// ... Create and initialize device, library, queue, resources

// Obtain a new command buffer
id <MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

// Create a compute command encoder
id <MTLComputeCommandEncoder> computeCE = [commandBuffer computeCommandEncoder];

NSError *errors;
id <MTLFunction> func = [library newFunctionWithName:@"filter_main"];
id <MTLComputePipelineState> filterState
        = [device newComputePipelineStateWithFunction:func error:&errors];
[computeCE setComputePipelineState:filterState];

[computeCE setTexture:inputImage atIndex:0];
[computeCE setTexture:outputImage atIndex:1];
[computeCE setTexture:inputTableData atIndex:2];
[computeCE setBuffer:paramsBuffer offset:0 atIndex:0];

MTLSize threadsPerGroup = {16, 16, 1};
MTLSize numThreadgroups = {inputImage.width/threadsPerGroup.width,
                           inputImage.height/threadsPerGroup.height, 1};
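// Note: this dispatch assumes the image width and height are multiples of the
// 16 x 16 threadgroup size; otherwise, edge pixels are not processed.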
[computeCE dispatchThreadgroups:numThreadgroups
          threadsPerThreadgroup:threadsPerGroup];
[computeCE endEncoding];

// Commit the command buffer
[commandBuffer commit];
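If the CPU needs to read outputImage after the GPU finishes, one option (not shown in the listing) is to block until the command buffer completes; registering a handler with addCompletedHandler: before committing is an asynchronous alternative. A minimal sketch:

// Block the current thread until the GPU has finished executing the commands
[commandBuffer waitUntilCompleted];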
Listing 6-2 shows the corresponding shader code for the preceding example. (The functions read_and_transform and filter_table are placeholders for user-defined code.)
Listing 6-2 Shading Language Compute Function Declaration
kernel void filter_main(
    texture2d<float,access::read>   inputImage  [[ texture(0) ]],
    texture2d<float,access::write>  outputImage [[ texture(1) ]],
    uint2 gid                                   [[ thread_position_in_grid ]],
    texture2d<float,access::sample> table       [[ texture(2) ]],
    constant Parameters* params                 [[ buffer(0) ]]
    )
{
    float2 p0 = static_cast<float2>(gid);
    float3x3 transform = params->transform;
    float4 dims = params->dims;

    float4 v0 = read_and_transform(inputImage, p0, transform);
    float4 v1 = filter_table(v0, table, dims);
    outputImage.write(v1, gid);
}