Smooth out your frame rate by checking for issues in your app's CPU and GPU utilization.
A low frame rate can make an app feel sluggish or disruptive to its users, so it's important to remove temporary interruptions, or stutters, to optimize your app's user experience. To find the cause of slowness in your app's frame rate, use Xcode's Game Performance instrument, which combines threading and system call information with the Metal System Trace instrument. By presenting important app states and rendering activities, Game Performance helps you infer the changes that are necessary to achieve consistent, smooth rendering.
Open the Template
Start the performance analysis from your Xcode project by choosing Product > Profile, or by pressing Command+I (⌘I). In the template selection window, select Game Performance.
To collect the data that's necessary to analyze your app's frame rate, begin by clicking the record button called out in Figure 2.
Within your app, perform the actions that reproduce a slow frame rate, and then click the record button again to stop recording. You'll find the captured results in Xcode's center pane.
Identify Performance Anomalies
To expedite your review of the capture results, narrow your focus to the time when the frame rate was slow. Sometimes a frame rate anomaly is caused by infrequently skipped frames, and other times it's caused by a consistently poor frame rate. In either case, you identify frame rate anomalies by finding unexpected delays in your app's display times.
For example, callout 1 in Figure 3 highlights a display instance that took 250 milliseconds (ms) to complete. Callout 2 shows you how many vertical synchronization (vsync) events were skipped over that time.
Because the 250 ms display instance is significantly longer than the display instances before it, the delay in delivering its frame will be interpreted by the user as a stutter.
In contrast, Figure 4 shows an app that maintained a consistent frame rate. In the display results, hover the mouse over a frame to check its duration.
A duration of 16.67 ms corresponds to one frame at 60 fps, and because all other frames in Figure 4 consistently achieve this frame duration, there's no performance anomaly to observe.
Not all displays use a ~16 ms frame interval; for example, vertical synchronization happens every ~4 ms on ProMotion displays. A healthy frame rate therefore doesn't require that display instances align with vsync: an app that uses a 20 ms frame interval has a healthy frame rate as long as it consistently achieves 50 fps. However, the 250 ms delay shown in callout 1 of Figure 3 is much too long for smooth animations.
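The comparison described above, checking each display duration against the app's own target frame interval, can be sketched in a few lines of C. This is an illustration only: `frame_interval_ms` and `find_stutters` are hypothetical helper names that aren't part of Instruments or Metal, and the tolerance factor is an assumed threshold you would tune for your app.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the target frame interval in milliseconds for a
   given frame rate, for example 60 fps -> ~16.67 ms and 50 fps -> 20 ms. */
static double frame_interval_ms(double target_fps) {
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

/* Hypothetical helper: stores the indices of display durations that exceed
   the target interval by more than `tolerance` (an assumed factor, such as
   1.5) in `out`, up to `out_capacity` entries, and returns how many indices
   it stored. */
static size_t find_stutters(const double *durations_ms, size_t count,
                            double target_fps, double tolerance,
                            size_t *out, size_t out_capacity) {
    double limit = frame_interval_ms(target_fps) * tolerance;
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (durations_ms[i] > limit && found < out_capacity) {
            out[found++] = i;
        }
    }
    return found;
}
```

Given the durations 16.67, 16.67, 250, and 16.67 ms, a 60 fps target, and a tolerance of 1.5, only the 250 ms display instance is flagged. An app that consistently presents every 20 ms against a 50 fps target reports no stutters.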
Check Shader Core Utilization
After finding a performance anomaly, look to the GPU activities occurring around that time for the cause. The GPU hardware track shows your shader pipeline stages, which are collectively referred to as the shader core. Any long-running stages or inconsistent durations in the track timeline can indicate a utilization issue. For example, Figure 5 shows a case where display spanned two frame intervals, which means the app unintentionally skipped a frame. To begin investigating shader core utilization as a potential cause of poor frame rate:
1. Observe the performance anomaly; in this case, the display spanned two frame intervals.
2. Note that the vertex shader is healthy because it completes in a small percentage of the frame interval.
3. Hover the mouse over the fragment shader to see its duration; in this case, it ran for 36 ms, which is too long.
Because the combined duration of the vertex and fragment shader is more than the duration of a 60 fps frame interval (16.67 ms), the app skipped a frame. The vertex shader ran quickly in this case, which means the app's frame rate issues are caused solely by fragment shader over-utilization.
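As a rough sketch of that reasoning, the following C snippet compares the combined shader time against the frame interval and reports which stage takes the larger share. The names `exceeds_frame_budget` and `dominant_stage` are hypothetical, and the durations are values you would read from the GPU hardware track by hand; nothing here queries Instruments or Metal.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { STAGE_NONE, STAGE_VERTEX, STAGE_FRAGMENT } Stage;

/* Returns true when the combined vertex and fragment time is longer than
   one frame interval, which means at least one frame is skipped. */
static bool exceeds_frame_budget(double vertex_ms, double fragment_ms,
                                 double interval_ms) {
    assert(interval_ms > 0.0);
    return (vertex_ms + fragment_ms) > interval_ms;
}

/* Returns STAGE_NONE when the frame fits its budget; otherwise returns the
   stage that accounts for the larger share of the shader time. */
static Stage dominant_stage(double vertex_ms, double fragment_ms,
                            double interval_ms) {
    if (!exceeds_frame_budget(vertex_ms, fragment_ms, interval_ms)) {
        return STAGE_NONE;
    }
    return (fragment_ms >= vertex_ms) ? STAGE_FRAGMENT : STAGE_VERTEX;
}
```

With the 36 ms fragment shader from Figure 5, an assumed 1 ms vertex shader (the figure's exact vertex duration isn't stated here), and a 60 fps interval of 1000.0 / 60.0 ms, `dominant_stage` returns `STAGE_FRAGMENT`.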
The following are additional reasons the shader core may be over-utilized:
- Too many render passes
Indicated by the renderer depriving the GPU of downtime. Check the number of render passes that occur at the time of the poor frame rate by using the dependency viewer. For more information, refer to Viewing Your Frame Graph.
- High resolution
Indicated by significantly more fragment shader activity for the same number of submitted vertices, as compared to when your viewport is set to a smaller size. To check whether your app's viewport size is related to the slowdown, temporarily reduce the viewport size to see if performance improves.
- Large textures
Indicated by high synchronization time when profiling your fragment shader. The profiler shows a high percentage of time in "wait memory", as seen in Table 1 in Optimizing Performance with the Shader Profiler.
- Large meshes
Indicated by a high number of vertices submitted by your app. Check the affected frame(s) using the geometry viewer. For more information, see Viewing Your Meshes with the Geometry Viewer.
- Unoptimized shader code
Indicated by general shader sluggishness. If you're able to modify your app's shaders, profile them to identify hot spots, like those covered in Table 1 in Optimizing Performance with the Shader Profiler. For example, you can optimize your shaders by downsizing data types, or minimizing the use of control structures. Note that you may not know ahead of time whether your shaders can benefit from optimization until you try it out.
Check CPU Utilization
While checking your shader core utilization, look out for signs that indicate problems with your app's CPU utilization. Figure 6 shows a case where frame skipping appears to be caused by something other than the shader core.
1. In the GPU hardware tracks, look at the moment when the shader pipeline stages are complete. This is when the final stage in the pipeline, the fragment shader, is done.
2. Observe a ~1.5 frame-interval gap between display and when the shader core is run again.
3. Observe a ~13 frame-interval gap between display and when the shader core is run again.
When display spans multiple frame intervals and there are gaps in the shader core timeline as shown in callouts 2 and 3 in Figure 6, it indicates that your host app's code is running long. Next, inspect your app's CPU utilization to consider whether it's responsible for poor frame rate.
Check for Long-Running Host App Code
To check your app's CPU utilization, identify your rendering thread(s) in the thread state tracks. In the case of healthy CPU utilization, your app's rendering thread(s) should show a significant amount of blocked time. Figure 7 shows an app's rendering thread selected and highlights its blocked time over a ~16 ms frame interval.
Blocked time indicates that your renderer finished submitting its draw calls with some time to spare in the frame interval. Because the amount of blocked time shown in Figure 7 encompasses about two-thirds of its frame interval, the host app has left enough time for the shader core to start and finish its work within the same frame interval.
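To make the two-thirds observation concrete, the sketch below computes the share of a frame interval that a rendering thread spent blocked. The `ThreadSample` type and `blocked_fraction` function are hypothetical; they assume you have transcribed the thread state durations from the thread state track yourself, because this code doesn't read Instruments data.

```c
#include <assert.h>
#include <stddef.h>

/* Thread states as they are named in the thread state tracks. */
typedef enum {
    THREAD_RUNNING,
    THREAD_RUNNABLE,
    THREAD_PREEMPTED,
    THREAD_BLOCKED
} ThreadState;

/* One contiguous span of a single state (hypothetical record type). */
typedef struct {
    ThreadState state;
    double duration_ms;
} ThreadSample;

/* Returns blocked time divided by total sampled time, in the range 0..1.
   Returns 0 when there are no samples or the total time is zero. */
static double blocked_fraction(const ThreadSample *samples, size_t count) {
    double blocked = 0.0;
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        assert(samples[i].duration_ms >= 0.0);
        total += samples[i].duration_ms;
        if (samples[i].state == THREAD_BLOCKED) {
            blocked += samples[i].duration_ms;
        }
    }
    return (total > 0.0) ? blocked / total : 0.0;
}
```

A thread that runs for 4 ms and is then blocked for 8 ms yields a fraction of about 0.67, which matches the healthy case in Figure 7. A value close to zero suggests that host app code is using up the frame interval.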
By contrast, if your rendering thread(s) don't show much blocked time, it's likely that your app is over-utilizing the CPU. To identify whether your app is over-utilizing the CPU and to get more information about why, follow these steps, called out in Figure 8:
1. Observe a stutter, identified by a display duration longer than 16.67 ms.
2. Ensure that shader code isn't the cause of the low frame rate. See Check Shader Core Utilization for more information.
3. Check the thread results for "Running" states, which are colored blue.
4. Click the thread's track to select it.
5. Choose Profile from the view selection menu.
6. Disclose the results list items and look for the highest weight to find the method(s) that spend the most time in your host app code.
A thread's time spent running is represented by the collection of blue and orange areas in the track (see callout 3 in Figure 8). If a frame interval has little blocked time, it indicates CPU over-utilization. To resolve the issue, focus your optimization efforts on improving slow-running code, like tuning the method marked by callout 6. Because the long-running methods are in your host app, you should have an idea of whether, and how, you can optimize them to run faster.
Check CPU-GPU Pipelining
In addition to shader core and CPU utilization, more subtle causes of low frame rate involve CPU-GPU pipelining. In this context, pipelining refers to how well your app coordinates the efforts of the CPU and GPU, while maintaining a consistent frame rate. The following sections cover issues that can result from poor CPU-GPU pipelining.
Check CPU-GPU Overlap
By minimizing the amount of time the CPU and GPU wait on each other, you maximize the amount of work each chip does in parallel. That's what's meant by CPU-GPU overlap. For example, Metal provides indirect command buffers (ICBs) to increase overlap; by generating rendering commands on the GPU using ICBs, you avoid CPU waiting that otherwise results when you render with a compute kernel. For more information, see Encoding Indirect Command Buffers on the GPU.
Check Thread Prioritization
Your app can get preempted by other processes if you misconfigure thread priority. To consider these kinds of thread-related pipelining issues, check the User Interactive Load track.
The orange spikes in Figure 9 indicate that runnable threads outnumbered the CPU cores available to process them. The green areas indicate the opposite: the healthy situation where enough CPU cores were available for processing. To deal with the problematic orange situations, you can use fewer threads and increase the priority of your app's threads.
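One way to act on the "use fewer threads" advice is to cap your worker thread count at the number of available cores. The following is a minimal sketch under assumptions: `clamp_worker_threads` and `online_core_count` are hypothetical helpers, and `_SC_NPROCESSORS_ONLN` is a widely available `sysconf` extension (present on macOS and Linux) rather than a guaranteed POSIX name. It addresses only your own app's threads, not load from other processes.

```c
#include <assert.h>
#include <unistd.h>

/* Returns the number of online CPU cores, or 1 if the query fails. */
static long online_core_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;
}

/* Limits a requested worker count so that it never exceeds the available
   cores and never drops below one thread. */
static long clamp_worker_threads(long requested, long available_cores) {
    if (available_cores < 1) {
        available_cores = 1;
    }
    if (requested < 1) {
        return 1;
    }
    return (requested < available_cores) ? requested : available_cores;
}
```

For example, `clamp_worker_threads(16, online_core_count())` returns 8 on a machine with eight online cores. You might reserve a core for the rendering thread by passing a smaller core count, depending on how your app divides its work.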
Check Low Thread Priority
To confirm whether low thread priority is affecting your app's frame rate, follow these steps and see the corresponding callouts in Figure 10:
1. Observe long-running display instances.
2. Visually confirm there are a number of skipped frames; for an app that uses a 60 fps frame interval, you'll see that vsyncs don't align with display.
3. Select the User Interactive Load instrument.
4. In the center pane, click and drag to select the area containing the performance anomaly observed in callout 1.
5. Select your app in the bottom pane.
6. Observe your app's thread state.
7. Observe your app's thread priority.
The Preempted thread state indicates that other Runnable and Running threads starved your app's thread of processing time. Low thread priority is an example of how misconfigured host app code can lead to a low frame rate. A priority of 45 is recommended for game rendering threads in iOS. To set your thread's priority, call
pthread before creating your thread with