Optimizing Performance with the Shader Profiler

Discover which lines of shader code take the longest to complete, identify their primary GPU activities, and tune your shaders accordingly.

Overview

On all platforms, the shader profiler shows your app's longest-running pipeline stages. By minimizing the GPU time of those stages, you maximize the benefit of your optimization effort.

On iOS and tvOS, Xcode provides more granular data by showing you the GPU time spent on each line in your shader.

In addition, on iOS GPU family 4 devices or later, Xcode displays a pie chart that shows the percentage of work performed per GPU activity for each line in your shader. When you make changes to your shader, click the refresh button to update the pie chart and check performance gains.

Configure Your Build to Include Source Code

The shader profiler needs your shader source code, which Xcode reads from your compiled .metallib file. To include it, open your project's build settings and change "Produce debugging information" for the Debug configuration to "Yes, include source code".

Apps shipped to customers shouldn't contain debugging information, so set the Release configuration to "No".

Capture a Frame

The shader profiler works with Xcode's Metal frame capture. Build and run your project, then click the camera button on Xcode's debugging toolbar.

Screenshot of the debug bar with the camera button marked by a callout.

For more information about frame capture, see Frame Capture Debugging Tools.

View Your Frame by Performance

When the frame capture completes, Xcode shows the results in the Debug navigator. Click the arrow button designated by the callout to view the frame in different ways.

Screenshot of the Debug navigator with the Metal debugger active. At top right, the "View frame in different ways" button is marked with a callout.

Choose View Frame By Performance.

Xcode lists your app's command passes in descending order of GPU time, starting with the longest-running pass in the frame.

Screenshot of the Debug navigator populated with a captured frame viewed by performance.

Check Your Shader's Duration

In Xcode's Debug navigator, disclose the longest-running render pass at the top of the list to display the render pass's shader durations.

Screenshot of a render pass pipeline expanded to show its pipeline stages' GPU times.

To maximize the impact of your optimization effort, minimize the time taken by your app's longest-running shaders.

Check the Duration Per Line in Your Shader

For iOS and tvOS devices, you can see how long each line in your shader took to complete.

The following image shows a fragment shader that took ~590 microseconds to complete (~0.59 milliseconds), which is within the desired GPU time. Note that the sum of your app's shaders' elapsed times should be under your frame interval. The frame interval is about 16.7 milliseconds for an app targeting 60 frames per second.
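The budget check described above is simple arithmetic. The following sketch, in plain C++, assumes you've copied each shader's GPU time in milliseconds from the Debug navigator by hand; the function names and the sample values are hypothetical and aren't part of any Xcode or Metal API.

```cpp
#include <cassert>
#include <vector>

// Frame interval in milliseconds for a target frame rate.
// For 60 frames per second this is 1000 / 60, or about 16.67 ms.
double frameIntervalMs(double targetFps) {
    assert(targetFps > 0.0);
    return 1000.0 / targetFps;
}

// Returns true when the summed shader GPU times (in milliseconds)
// stay under one frame interval at the given target frame rate.
bool fitsFrameBudget(const std::vector<double>& shaderTimesMs, double targetFps) {
    double totalMs = 0.0;
    for (double t : shaderTimesMs) {
        totalMs += t;
    }
    return totalMs < frameIntervalMs(targetFps);
}
```

For example, a 0.59-millisecond fragment shader alongside two hypothetical shaders that take 1.2 and 4.0 milliseconds totals 5.79 milliseconds, which leaves headroom at 60 frames per second.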

Screenshot of a render pass pipeline expanded with its longest running shader selected.

Click the shader with the highest GPU time to display the shader's source code in the center pane. In the right sidebar, observe the profiler statistics marked by a callout in the following image.

Screenshot of the source code view populated with a fragment shader. At right, the profiler statistics are marked by a callout.

The profiler statistics for a function entry point indicate that function's total elapsed time on the GPU.

Screenshot of the function entry point's profiler statistics.

The statistics for lines in the function body indicate each line's time as a percentage of the function's total elapsed time.

Screenshot of the function body's profiler statistics.

Focus on the lines with the highest percentages to maximize the benefit of your performance tuning. For example, if the profiler shows that a calculation accounts for a large share of the function's time and you can replace it with a precalculated constant, that optimization is likely worth your time.
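As a rough illustration of replacing a calculation with precalculated constants, the following sketch uses a hypothetical 5-tap blur. It's written in plain C++ so that it runs on the host; Metal Shading Language is C++-based, so the same pattern carries over to a shader function. The function names, the fixed sigma of 1.0, and the weight table are assumptions made for this example, not code from a profiled app.

```cpp
#include <cassert>
#include <cmath>

// "Before" (hypothetical): recomputes the Gaussian weights on every call,
// which costs one exp per tap plus a final divide.
float blurComputed(const float taps[5]) {
    assert(taps != nullptr);
    const float sigma = 1.0f;  // assumed fixed blur width
    float sum = 0.0f;
    float weightSum = 0.0f;
    for (int i = 0; i < 5; ++i) {
        const float x = static_cast<float>(i - 2);  // tap offset from center
        const float w = std::exp(-(x * x) / (2.0f * sigma * sigma));
        sum += taps[i] * w;
        weightSum += w;
    }
    return sum / weightSum;
}

// "After": the same weights for sigma = 1.0, normalized once offline
// and stored as constants, leaving only multiplies and adds.
constexpr float kWeights[5] = {
    0.05448868f, 0.24420134f, 0.40261995f, 0.24420134f, 0.05448868f
};

float blurPrecalculated(const float taps[5]) {
    assert(taps != nullptr);
    float sum = 0.0f;
    for (int i = 0; i < 5; ++i) {
        sum += taps[i] * kWeights[i];
    }
    return sum;
}
```

Both functions return the same result to within floating-point rounding. A compiler may already fold some constant expressions on its own, so compare the per-line percentages before and after the change to confirm that it actually helped.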

Interpret the GPU Activity Metrics

For iOS GPU family 4 devices or later, Xcode displays a pie chart next to the numeric statistic to help you improve the performance of a function or line of code.

For more information about family 4 devices, see Understanding GPU Family 4.

Hold the pointer over the pie chart to enlarge it and show more detail.

Screenshot showing the pie chart popup after hovering over the dot to the right of a percentage in the times column.

This pie chart identifies the work that the GPU did while executing the line of code. The work a GPU performs can be categorized as memory, ALU, synchronization, or control flow. By understanding the activities that the GPU executes for each line in your shader, you can infer any necessary code changes to improve the line's performance.

Table 1. Explanations for GPU activity

GPU activity | Explanation and recommendations
ALU | Time spent in the GPU's arithmetic logic unit. Change floats to half floats where possible to reduce time spent in the ALU. Also try to minimize your use of complex instructions like sqrt, sin, cos, and recip.
Memory | Time spent waiting for access to your app's buffers or texture memory. Reduce time by down-sampling textures, or, if you're not spending much time in memory, improve your texture resolution instead.
Control flow | Time spent in conditional, increment, or jump instructions as a result of branches or loops in your shader. Use a constant iteration count to minimize control flow time for loops, because the Metal compiler can generate optimized code in those cases.
Synchronization | Time spent waiting for a required resource or event before execution could begin. Synchronization types are described below.
Synchronization (wait memory) | Waiting for dependent memory accesses such as texture sampling or buffer read/write.
Synchronization (wait pixel) | Waiting for underlying pixels to release resources. In addition to color attachments, pixels can come from depth or stencil buffers or user-defined resources. Blending is a common cause of pixel waiting. Use raster order groups to reduce wait time.
Synchronization (barrier) | The thread reached a barrier and waits for remaining threads in the same group to arrive at the barrier before proceeding.
Synchronization (atomics) | Time spent on atomic instructions.
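The control flow recommendation in Table 1 can be sketched as follows. This is plain C++ rather than shader code, so that it runs on the host; the function names and the tap count of 8 are hypothetical. The two functions differ only in where the loop's trip count comes from.

```cpp
#include <cassert>

// Assumed fixed number of samples, known at compile time.
constexpr int kSampleCount = 8;

// Constant iteration count: the compiler knows the trip count and is
// free to unroll the loop, which can remove per-iteration compares and jumps.
float sumFixed(const float* samples) {
    assert(samples != nullptr);
    float total = 0.0f;
    for (int i = 0; i < kSampleCount; ++i) {
        total += samples[i];
    }
    return total;
}

// Run-time iteration count: the trip count isn't known until the call,
// so the loop generally keeps its compare and branch on each iteration.
float sumDynamic(const float* samples, int count) {
    assert(samples != nullptr && count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i) {
        total += samples[i];
    }
    return total;
}
```

Both functions produce the same sum when count equals kSampleCount. How much a compiler unrolls in practice varies, so treat the control flow slice of the pie chart as the measure of whether a fixed count helps in your shader.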

Update Shaders Live

After changing a shader, you can update the captured frame with the new source code by clicking the refresh icon.

Screenshot of the shader debug bar with the refresh icon marked by a callout.

In turn, Xcode:

  • Redraws the application window.

  • Updates the profiler statistics and pie charts.

  • Redraws attachments in the assistant editor.

Updating shaders preserves your place in the captured frame, so you can tune shader performance interactively.

See Also

Optimizing Your App

Optimizing Performance with GPU Counters

Examine granular metrics for your rendering or compute calls, and tune your app as needed.

Optimizing Performance with Pipeline Statistics

Decide how to tune your encoder performance by identifying your app's longest-running encoders and their primary GPU activity.