Decide how to tune your encoder performance by identifying your app's longest-running encoders and their primary GPU activity.
Framework
- Metal
Overview
To process your app's Metal commands, the GPU does work in a few different categories. Use Xcode's pipeline statistics view to see which category of activity consumes most of your encoders' time. This information can indicate whether and how to optimize your algorithm.
Capture a Frame
Pipeline statistics work with Xcode's Metal frame capture. Build and run your project, then click the camera button on Xcode's debugging toolbar.

For more information about frame capture, see Frame Capture Debugging Tools.
View the GPU Counters Graph
To get the most from your optimization effort, start by checking the pipeline statistics for your longest-running encoder, which you select based on elapsed time. To view your app's encoders, click GPU in the Debug navigator to display the GPU counters graph.


This view lists your app's encoders that did work in the frame. The height of each bar represents that encoder's share of the frame's GPU time. The tallest bar identifies the encoder that took the longest to complete.
Find an Encoder with High GPU Time
Minimize the duration of the longest-running encoder in the captured frame to optimize your app's performance. To find the longest-running encoder, hold the pointer over an encoder bar to view its GPU Time.

Click an encoder that has a relatively high GPU time.
View Pipeline Statistics
In Xcode's assistant editor breadcrumb, click the selection menu and choose Pipeline Statistics.

In the assistant editor, Xcode displays how long each stage in your pipeline took to complete and which GPU activities it performed during that time.

Interpret the GPU Activity Metrics
By understanding the GPU activities that resulted from a particular command, you can infer the necessary code changes to improve the command's performance.
Explanations for GPU activities.
GPU activity | Explanation and recommendations |
---|---|
ALU | Time spent in the GPU's arithmetic logic unit. Change floats to half floats where possible to reduce time spent in the ALU. Also, you can minimize complex instructions like |
Memory | Time spent waiting for access to your app's buffers or texture memory. Reduce this time by down-sampling textures. Conversely, if you're not spending much time in memory, you have room to increase your texture resolution instead. |
Control flow | Time spent in conditional, increment, or jump instructions as a result of branches or loops in your shader. Use a constant iteration count to minimize control flow time for loops, because the Metal compiler can generate optimized code in those cases. |
Synchronization | Time spent waiting for a required resource or event before execution could begin. Synchronization types are described below. |
Synchronization (wait memory) | Waiting for dependent memory accesses such as texture sampling or buffer read/write. |
Synchronization (wait pixel) | Waiting for underlying pixels to release resources. In addition to color attachments, pixels can come from depth or stencil buffers or user-defined resources. Blending is a common cause of pixel waiting. Use raster order groups to reduce wait time. |
Synchronization (barrier) | The thread reached a barrier and waits for remaining threads in the same group to arrive at the barrier before proceeding. |
Synchronization (atomics) | Time spent on atomic instructions. |
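
The control flow recommendation in the table can be illustrated with a minimal sketch. It is written as plain C++ rather than Metal Shading Language so that it compiles outside a shader; the function names, the tap count, and the weights are hypothetical and aren't taken from any Metal API. The same loop shape carries over to a shader function, where a compile-time bound gives the Metal compiler the opportunity to generate optimized code.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tap count, fixed at compile time.
constexpr std::size_t kTapCount = 5;

// Hypothetical blur weights; they sum to 1.
constexpr std::array<float, kTapCount> kWeights = {
    0.0625f, 0.25f, 0.375f, 0.25f, 0.0625f};

// Variable iteration count: the bound is only known at run time, so the
// compiler generally has to keep a compare-and-branch for each iteration.
inline float blurDynamic(const float* samples, const float* weights,
                         std::size_t tapCount) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < tapCount; ++i) {
        sum += samples[i] * weights[i];
    }
    return sum;
}

// Constant iteration count: the bound is a compile-time constant, so the
// compiler is free to unroll the loop and drop the per-iteration branch.
inline float blurConstant(const std::array<float, kTapCount>& samples) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < kTapCount; ++i) {
        sum += samples[i] * kWeights[i];
    }
    return sum;
}
```

Both functions compute the same weighted sum; only the loop bound differs. After making a change like this in a shader, capture a new frame and compare the Control flow metric to confirm whether it helped.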
View Remarks and Recommendations
For known issues, Xcode can interpret the counters for you and give specific recommendations. If Xcode finds issues with your selected encoder, it shows them in Remarks at the top of the pipeline statistics pane. Inside Remarks, Xcode provides suggestions you can follow to improve the performance of your encoder.

Inspect the GPU Time of Your Encoder's Draw Calls
To get the most from your optimization effort, adjust the code for the commands that exhibit the highest GPU time. At the bottom of the pipeline statistics pane, Xcode displays the GPU time in the Total column for each draw in the encoder so you can compare their elapsed times.

Because you encode these commands in your host app code, you should have some idea about which ones you can adjust according to the suggestions in the pipeline statistics and Remarks views.