Discover which lines of shader code take the longest to complete, identify their primary GPU activities, and tune your shaders accordingly.
On all platforms, the shader profiler shows your app's longest running pipeline stages. By minimizing the GPU time on your app's longest running stages you can maximize the benefit of your optimization effort.
In iOS and tvOS, Xcode provides more granular data by showing you the duration of each line in your shader.
In addition, on iOS family 4 devices or later, Xcode displays a pie chart that shows the percentage of work performed per GPU activity for each line in your shader. When you make changes to your shader, click the refresh button to update the pie chart and check performance gains.
Configure Your Build to Include Source Code
To use the shader profiler, Xcode looks to your compiled .metallib for source code. You enable this in your project build settings by changing "Produce debugging information" for Debug to "Yes, include source code".
Apps shipped to customers shouldn't contain debugging information, so set Release to No.
Capture a Frame
The shader profiler works with Xcode's Metal frame capture. Build and run your project, then click the camera button on Xcode's debugging toolbar.
For more information about frame capture, see Frame Capture Debugging Tools.
View Your Frame by Performance
When the frame capture completes, Xcode shows the results in the Debug navigator. Click the arrow button designated by the callout to view the frame in different ways.
Choose View Frame By Performance.
Xcode lists your app's command passes in descending order, starting with the longest running within the frame.
Check Your Shader's Duration
In Xcode's Debug navigator, disclose the longest running render pass at the top of the list to display the render pass' shader durations.
To maximize the impact of your optimization effort, minimize the time taken by your app's long running shaders.
Check the Duration Per Line in Your Shader
For iOS and tvOS devices, you can see how long each line in your shader took to complete.
The following image shows a fragment shader that took ~590 nanoseconds to complete (~0.59 milliseconds), which is within the desired GPU time. Note, the sum of your app's shaders' elapsed time should be under your frame interval. The frame interval is 16 milliseconds for an app targeting 60 frames per second.
Click the shader with the highest GPU time to display the shader's source code in the center pane. In the right sidebar, observe the profiler statistics marked by a callout in the following image.
The profiler statistics for a function entry point indicates that function's total elapsed time on the GPU.
The statistics for lines in the function body indicate the time as a percent of the function's total elapsed time.
Focus on adjusting the lines that have the highest percentage in order to maximize the benefit of your performance tuning. For example, if you discover that a calculation is running long and you can substitute it with a precalculated constant instead, the shader profiler suggests that this optimization is worth your time.
Interpret the GPU Activity Metrics
For iOS family 4 devices or later, Xcode displays a pie chart next to the numeric statistic to help you improve a function or code line's performance.
For more information about family 4 devices, see Understanding GPU Family 4.
Hold the pointer over the pie chart to enlarge it and show more detail.
This pie chart identifies the work that the GPU did while executing the line of code. The work a GPU performs can be categorized as memory, ALU, synchronization, or control flow. By understanding the activities that the GPU executes for each line in your shader, you can infer any necessary code changes to improve the line's performance.
Explanation and recommendations
Time spent in the GPU's arithmetic logic unit. Change floats to half floats where possible to reduce time spent in the ALU. Also try to minimize your use of complex instructions like
Time spent waiting for access to your app's buffers or texture memory. Reduce time by down-sampling textures, or, if you're not spending much time in memory, improve your texture resolution instead.
Time spent in conditional, increment, or jump instructions as a result of branches or loops in your shader. Use a constant iteration count to minimize control flow time for loops, because the Metal compiler can generate optimized code in those cases.
Time spent waiting for a required resource or event before execution could begin. Synchronization types are described below.
Synchronization (wait memory)
Waiting for dependent memory accesses such as texture sampling or buffer read/write.
Synchronization (wait pixel)
Waiting for underlying pixels to release resources. In addition to color attachments, pixels can come from depth or stencil buffers or user-defined resources. Blending is a common cause of pixel waiting. Use raster order groups to reduce wait time.
The thread reached a barrier and waits for remaining threads in the same group to arrive at the barrier before proceeding.
Time spent on atomic instructions.
Update Shaders Live
After changing a shader, you can update the captured frame with the new source code by clicking the refresh icon.
In turn, Xcode:
Redraws the application window.
Updates the profiler statistics and pie charts.
Redraws attachments in the assistant editor.
Updating shaders maintains your place in the captured frame, thus providing an interactive environment to enhance your shader performance tuning.