Optimizing Performance with the Shader Profiler

View the elapsed execution time of individual statements in your shader to understand where it spends the most time.

Overview

The shader profiler shows how long each statement in your shader took to complete, so you can focus your optimization efforts on the longest-running statements. On devices with a Family 4 or later GPU, a pie chart also details which GPU activity each statement spends the most time on, which provides additional hints about improving performance. With the Update Shaders feature, you can change your shader source code live and quickly see how the change affects your shader's performance.

Set Up Your Project to Enable the Shader Profiler

To use the shader profiler on your project, configure the build so that the .metallib file includes debugging information:

  1. In Xcode, navigate to your project's build settings.

  2. For the Debug build configuration, set "Produce debugging information" to "Yes, include source code."

Use the shader profiler within a captured Metal frame. Most commonly, you capture a Metal frame by clicking the camera button on Xcode's debug bar as covered in Performing a GPU Capture from the Debug Bar. For more ways to capture a Metal frame, see Metal GPU Capture.

From the captured frame, open the shader profiler using the steps in Figure 1:

  1. In the Debug navigator, choose View Frame By Performance.

  2. View your render pipelines populated in the list.

  3. Observe the amount of time each one took during the frame.

Figure 1

Viewing a frame by performance

Screenshot showing Xcode's Debug navigator. On the top right, the "View frame in different ways" selection menu is annotated. On the left, the frame's render pipelines are listed, and on the right, their elapsed time in the frame.

Click the disclosure triangle to expand a shader and see the time taken by any inline functions it called. Figure 2 shows that the inline function sample took about 134 microseconds (about 42%) of the total time taken by fragmentShader (about 318 microseconds).

Figure 2

Examining the cost of a shader's inline function calls

Screenshot of Xcode's View Frame By Performance pane with a shader expanded to reveal the relative cost of an inline function.
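
The percentage quoted for Figure 2 is the inline function's time divided by the time of the shader that called it. As a minimal sketch of that arithmetic in C++ (the name percentOfParent is hypothetical and is not part of Xcode or Metal):

```cpp
// Share of a parent's elapsed time taken by a child, as a percentage.
// Returns 0 when the parent time is not positive, to avoid dividing by zero.
double percentOfParent(double childMicroseconds, double parentMicroseconds) {
    if (parentMicroseconds <= 0.0) {
        return 0.0;
    }
    return 100.0 * childMicroseconds / parentMicroseconds;
}
```

With the Figure 2 values, percentOfParent(134.0, 318.0) evaluates to roughly 42.1, which agrees with the "about 42%" figure above.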

Profile a Shader

Profile a shader using the following steps, which are annotated in Figure 3:

  1. Expand the render pipeline.

  2. Select the shader you want to profile.

  3. View the shader source code in the center pane with the function entry point highlighted.

  4. Examine the times and percentages column.

Figure 3

Profiling a shader

On the left, a fragment shader is selected in the call list, and its entry point signature is highlighted in the source code view in the center pane. On the right, the times and percentages column indicates the performance of the fragment function and its statements.

Because profiling is for performance tuning, most often you'll inspect the render pipeline and shader that took the longest to complete.

In the times and percentages column, the time next to the function entry point is the shader's total elapsed time. Inside the shader function, the percentage next to each statement indicates the share of that total elapsed time the statement took.
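
If you want to reason about these numbers outside Xcode, you can convert a statement's percentage back into elapsed time and pick the statement with the largest share. The following C++ sketch assumes you have copied the values from the column by hand. StatementCost, statementMicroseconds, and hottestStatement are hypothetical names used for illustration, not an Xcode or Metal API.

```cpp
#include <vector>

// One row of the times and percentages column (hypothetical representation).
struct StatementCost {
    int sourceLine;   // line number of the statement in the shader source
    double percent;   // share of the shader's total elapsed time, 0 to 100
};

// Converts a statement's percentage into elapsed time, given the shader total.
double statementMicroseconds(const StatementCost& s, double shaderTotalMicroseconds) {
    return shaderTotalMicroseconds * s.percent / 100.0;
}

// Returns the index of the statement with the largest share, or -1 if empty.
int hottestStatement(const std::vector<StatementCost>& statements) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(statements.size()); ++i) {
        if (best < 0 || statements[i].percent > statements[best].percent) {
            best = i;
        }
    }
    return best;
}
```

For example, with a shader total of 318 microseconds, a statement marked 25% (a made-up value) would account for 79.5 microseconds.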

Interpret the GPU Activity Metrics

Next to the percentage of time taken, a pie chart details which activity the GPU is doing most during the statement.

Place your mouse pointer over the dot to the right of a percentage to bring up the pie chart, as shown in Figure 4.

Figure 4

GPU activity pie chart

Screenshot showing the pie chart popup after hovering over the dot to the right of a percentage in the times column.

A high percentage in one GPU activity can indicate a performance bottleneck and an opportunity for optimization. Table 1 explains each activity:

Table 1

Explanations for GPU activity

| GPU activity | Explanation |
| --- | --- |
| ALU | Time spent in the GPU's arithmetic logic unit. Changing floats to half floats where possible is one way to reduce time spent in the ALU. Another is to minimize complex instructions, like sqrt, sin, cos, recip, and so on. |
| Memory | Time spent waiting for access to your app's buffers or texture memory. You can shorten this time by down-sampling textures, or, if you're not spending much time in Memory, you could improve your texture resolution instead. |
| Control Flow | Time spent in conditional, increment, or jump instructions as a result of branches or loops in your shader. Use a constant iteration count to minimize Control Flow time for loops because the Metal compiler can generate optimized code in those cases. |
| Synchronization | Time spent waiting for a required resource or event before execution could begin. Synchronization types are described below. |
| Synchronization (wait memory) | Waiting for dependent memory accesses issued in prior instructions, such as texture sampling or buffer read/write. |
| Synchronization (wait pixel) | Waiting for underlapping pixels to release resources. In addition to color attachments, pixels could be from depth or stencil buffers or user-defined resources. Blending is a common cause of pixel waiting. Use raster order groups to reduce time spent waiting for pixels. |
| Synchronization (barrier) | The thread reached a barrier and waits for remaining threads in the same group to arrive at the barrier before proceeding. |
| Synchronization (atomics) | Time spent on atomic instructions. |
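
Two of the suggestions in Table 1 can be sketched in code. The example below is plain C++ that runs on the CPU rather than shader code, and it assumes the same patterns carry over to Metal Shading Language, which is based on C++. All names (Vec2, insideRadius, kTapCount, weightedSum) are hypothetical. The first function avoids a sqrt call by comparing squared lengths, and the second loops over a compile-time constant count so that a compiler has the opportunity to unroll the loop.

```cpp
#include <array>

struct Vec2 {
    float x;
    float y;
};

// ALU: compare squared lengths so that no sqrt is needed.
// Equivalent to "length(p) <= radius" when radius is non-negative.
bool insideRadius(Vec2 p, float radius) {
    return p.x * p.x + p.y * p.y <= radius * radius;
}

// Control Flow: the iteration count is a compile-time constant,
// not a value read at run time.
constexpr int kTapCount = 4;

// Sums kTapCount samples, each scaled by its weight.
float weightedSum(const std::array<float, kTapCount>& samples,
                  const std::array<float, kTapCount>& weights) {
    float sum = 0.0f;
    for (int i = 0; i < kTapCount; ++i) {
        sum += samples[i] * weights[i];
    }
    return sum;
}
```

Whether either change helps a particular shader is something to confirm by profiling again after the edit.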

Update Shaders Live

After making a change to a shader, you can apply the update live using the Update Shaders button highlighted in Figure 5.

Figure 5

Using the Update Shaders feature

Screenshot annotating the Update Shaders button. It's in the bottom-middle of the shader source in the center pane.

The Update Shaders button applies the source code changes you make to the same captured Metal frame. Xcode reflects the update as follows:

  • The application window is redrawn.

  • Elapsed time and percentage metrics are recalculated.

  • Attachments in the Assistant Editor are redrawn.

Because the Update Shaders feature keeps your view in the captured Metal frame, you can easily make successive changes to your shader source code for iterative optimization.

See Also

Profiling and Metrics

GPU Activity Monitors

Use Xcode or macOS tools to view a high-level summary of the GPU activity of your app or a Mac.

Viewing Performance Metrics with GPU Counters

Ensure that properties related to an encoder's rendering are within the desired range.

Viewing Pipeline Statistics of a Draw

See relative percentages of where a given draw call spent its time across the GPU architecture.