Optimizing Performance with Pipeline Statistics

Decide how to tune your encoder performance by identifying your app's longest-running encoders and their primary GPU activity.

Overview

To process your app's Metal commands, the GPU does work in a few different categories. Use Xcode's pipeline statistics view to see which category of activity consumes most of your encoders' time. This information can indicate whether and how to optimize your algorithm.

Capture a Frame

Pipeline statistics work with Xcode's Metal frame capture. Build and run your project, then click the camera button on Xcode's debugging toolbar.

Screenshot of the camera button marked by a callout on Xcode's debug bar.

For more information about frame capture, see Frame Capture Debugging Tools.

View the GPU Counters Graph

To maximize the results of your optimization effort, start by checking the pipeline statistics for your longest-running encoder, which you select based on elapsed time. To view your app's encoders, click GPU in the Debug navigator to display the GPU counters graph.

Screenshot of the GPU Counters Graph showing the relative performance of an app's twelve encoders.

This view lists the encoders in your app that did work in the frame. The height of each bar represents that encoder's share of the frame's GPU time. The tallest bar identifies the encoder that took the longest to complete.

Find an Encoder with High GPU Time

To optimize your app's performance, minimize the duration of the longest-running encoder in the captured frame. To find that encoder, hold the pointer over each encoder bar to view its GPU Time.

Screenshot of an encoder in the counters graph with a tooltip displaying its GPU time.

Click an encoder that has a relatively high GPU time.

View Pipeline Statistics

In Xcode's assistant editor breadcrumb, click the selection menu and choose Pipeline Statistics.

Screenshot of Xcode's assistant editor breadcrumb set to Pipeline Statistics.

In the assistant editor, Xcode displays how long each stage in your pipeline took to complete, and the GPU activities that the stage performed during that time.

Screenshot of the granular pipeline activities that the vertex and fragment shader stages did within the captured frame for the selected draw call.

Interpret the GPU Activity Metrics

By understanding the GPU activities that resulted from a particular command, you can infer the necessary code changes to improve the command's performance.

Table 1. Explanations for GPU activities.

GPU activity

Explanation and recommendations

ALU

Time spent in the GPU's arithmetic logic unit. To reduce time spent in the ALU, change floats to half floats where possible, and minimize complex instructions like sqrt, sin, cos, and recip.

Memory

Time spent waiting for access to your app's buffers or texture memory. Reduce this time by down-sampling textures. Conversely, if your encoder spends little time waiting on memory, you may have headroom to increase texture resolution.

Control flow

Time spent in conditional, increment, or jump instructions as a result of branches or loops in your shader. Use a constant iteration count to minimize control flow time for loops, because the Metal compiler can generate optimized code in those cases.

Synchronization

Time spent waiting for a required resource or event before execution could begin. Synchronization types are described below.

Synchronization (wait memory)

Waiting for dependent memory accesses such as texture sampling or buffer read/write.

Synchronization (wait pixel)

Waiting for underlying pixels to release resources. In addition to color attachments, pixels can come from depth or stencil buffers or user-defined resources. Blending is a common cause of pixel waiting. Use raster order groups to reduce wait time.

Synchronization (barrier)

The thread reached a barrier and waits for remaining threads in the same group to arrive at the barrier before proceeding.

Synchronization (atomics)

Time spent on atomic instructions.
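
Two of the recommendations in Table 1 can be sketched on the CPU. The example below is plain C++ rather than Metal Shading Language (which is C++-based), so treat it as an illustration of the patterns and not as shader code. The names Vec3, withinRadius, and weightedSum are hypothetical. The first function avoids a sqrt by comparing squared distances, which addresses ALU time. The second takes its iteration count as a compile-time constant, which is the kind of loop the control flow recommendation describes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 3-component vector used only for this illustration.
struct Vec3 {
    float x, y, z;
};

// ALU: test whether two points lie within `radius` of each other without
// calling sqrt. For a non-negative radius, comparing the squared distance
// against the squared radius gives the same answer as comparing distances.
inline bool withinRadius(const Vec3& a, const Vec3& b, float radius) {
    assert(radius >= 0.0f);
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz <= radius * radius;
}

// Control flow: the tap count is a template parameter, so the loop bound is
// known at compile time and the compiler is free to unroll the loop.
template <std::size_t TapCount>
float weightedSum(const std::array<float, TapCount>& samples,
                  const std::array<float, TapCount>& weights) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < TapCount; ++i) {
        sum += samples[i] * weights[i];
    }
    return sum;
}
```

Whether either change helps a particular shader depends on what the pipeline statistics report for it, so capture another frame after each change and compare.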

View Remarks and Recommendations

For known issues, Xcode can interpret the counters for you and give specific recommendations. If Xcode finds issues with your selected encoder, it shows them in Remarks at the top of the pipeline statistics pane. Inside Remarks, Xcode provides suggestions you can follow to improve the performance of your encoder.

Screenshot of the Remarks pane that lists recommendations for the selected draw call.

Inspect the GPU Time of Your Encoder's Draw Calls

To maximize the results of your optimization effort, adjust the code for the commands that exhibit the highest GPU time. At the bottom of the pipeline statistics pane, Xcode displays the GPU time for each draw in the encoder in the Total column, so you can compare their elapsed times.

Screenshot of the other draws within the same encoder. At center, the other draws' GPU time is displayed in the Total column. At right, a bar chart visualizes the GPU time by sizing the bar relative to it.

Because you encode these commands in your host app code, you should have some idea about which ones you can adjust according to the suggestions in the pipeline statistics and Remarks views.
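
The prioritization described in this section amounts to ordering draws by GPU time and starting at the top. As a minimal sketch, assuming you have copied the Total column values out of Xcode by hand, the following C++ sorts draws from most to least expensive and computes each draw's share of the encoder's time. The DrawTiming type and the rankByGpuTime function are hypothetical helpers for illustration, not part of Xcode or Metal.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one draw call in the selected encoder.
struct DrawTiming {
    std::string label;  // name you use to identify the draw
    double gpuTime;     // value from the Total column, in the units Xcode shows
    double share;       // fraction of the encoder's total time, filled in below
};

// Returns the draws ordered from highest to lowest GPU time, with `share`
// set to each draw's fraction of the summed GPU time.
std::vector<DrawTiming> rankByGpuTime(std::vector<DrawTiming> draws) {
    double total = 0.0;
    for (const DrawTiming& d : draws) {
        assert(d.gpuTime >= 0.0);
        total += d.gpuTime;
    }
    for (DrawTiming& d : draws) {
        d.share = total > 0.0 ? d.gpuTime / total : 0.0;
    }
    // stable_sort keeps the original encoding order for draws that tie.
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawTiming& a, const DrawTiming& b) {
                         return a.gpuTime > b.gpuTime;
                     });
    return draws;
}
```

The first few entries of the returned list are the draws where a code change is likely to have the largest effect on the encoder's total time.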

See Also

Optimizing Your App

Optimizing Performance with GPU Counters

Examine granular metrics for your rendering or compute calls, and tune your app as needed.

Optimizing Performance with the Shader Profiler

Discover which lines of shader code take the longest to complete, identify their primary GPU activities, and tune your shaders accordingly.