Tuning Your OpenGL Application
After you design and implement your application, it is important that you spend some time analyzing its performance. The key to performance tuning your OpenGL application is to successively refine the design and implementation of your application. You do this by alternating between measuring your application, identifying where the bottleneck is, and removing the bottleneck.
If you are unfamiliar with general performance issues on the Macintosh platform, you will want to read Getting Started with Performance and Performance Overview. Performance Overview contains general performance tips that are useful to all applications. It also describes most of the performance tools provided with OS X.
Next, take a close look at Instruments. Instruments consolidates many measurement tools into a single comprehensive performance-tuning application.
Two tools other than OpenGL Profiler are specific to OpenGL development: OpenGL Driver Monitor and OpenGL Shader Builder. OpenGL Driver Monitor collects real-time data from the hardware. OpenGL Shader Builder provides immediate feedback on the vertex and fragment programs that you write.
For more information on these tools, see:
Shark User Guide
The following books contain many techniques for getting the most performance from the GPU:
GPU Gems: Programming Techniques, Tips and Tricks for Real Time Graphics, Randima Fernando. In particular, Graphics Pipeline Performance is a critical article for understanding how to find the bottlenecks in your OpenGL application.
GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Matt Pharr and Randima Fernando.
This chapter focuses on two main topics:
“Gathering and Analyzing Baseline Performance Data” shows how to use top and OpenGL Profiler to obtain and interpret baseline performance data.
“Identifying Bottlenecks with Shark” discusses the patterns of usage that the Shark performance tool can make apparent and that indicate places in your code that you may want to improve.
Gathering and Analyzing Baseline Performance Data
Analyzing performance is a systematic process that starts with gathering baseline data. OS X provides several applications that you can use to assess baseline performance for an OpenGL application:
top is a command-line utility that you run in the Terminal window. You can use top to assess how much CPU time your application consumes.
OpenGL Profiler is an application that determines how much time an application spends in OpenGL. It also provides function traces that you can use to look for redundant calls.
OpenGL Driver Monitor lets you gather real-time data on the operation of the GPU and lets you look at information (OpenGL extensions supported, buffer modes, sample modes, and so forth) for the available renderers. For more information, see OpenGL Tools for Serious Graphics Development.
This section shows how to use top along with OpenGL Profiler to analyze where to spend your optimization efforts: in your OpenGL code, in your other application code, or in both. You'll see how to gather baseline data and how to determine the relationship of OpenGL performance to overall application performance.
Launch your OpenGL application.
Open a Terminal window and place it side-by-side with your application window.
In the Terminal window, type top and press Return. You'll see output similar to that shown in Figure 15-1.
The top program indicates the amount of CPU time that an application uses. The CPU time serves as a good baseline value for gauging how much tuning your code needs. Figure 15-1 shows the percentage of CPU time for the OpenGL application GLCarbon1C (highlighted). Note that this application uses 31.5% of CPU resources.
Open the OpenGL Profiler application, located in /Developer/Applications/Graphics Tools/. In the window that appears, select the options to collect a trace and include backtraces, as shown in Figure 15-2.
Select the option “Attach to application”, then select your application from the Application list.
You may see small pauses or stutters in the application, particularly when OpenGL Profiler is collecting a function trace. This is normal and does not significantly affect the performance statistics. The glitches are due to the large amount of data that OpenGL Profiler is writing out.
Click Suspend to stop data collection.
Open the Statistics and Trace windows by choosing them from the Views menu.
Figure 15-3 provides an example of what the Statistics window looks like. Figure 15-4 shows a Trace window.
The estimated percentage of time spent in OpenGL is shown at the bottom of Figure 15-3. Note that for this example, it is 28.91%. The higher this number, the more time the application is spending in OpenGL and the more opportunity there may be to improve application performance by optimizing OpenGL code.
You can use the amount of time spent in OpenGL along with the CPU time to calculate a ratio of the application time versus OpenGL time. This ratio indicates where to spend most of your optimization efforts.
In the Trace window, look for duplicate function calls and redundant or unnecessary state changes.
Look for back-to-back function calls with the same or similar data. These are areas that can typically be optimized. Functions that are often called more than necessary include glDisable. For most applications, such functions can be called once from a setup or state-modification routine and then called again only when necessary.
It's generally good practice to keep state changes out of rendering loops as much as possible and to use separate routines to adjust state as necessary. In the function trace, a rendering loop shows up as the same sequence of state changes and drawing calls repeated over and over.
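One possible way to avoid redundant state changes is to keep a small shadow copy of the state on the application side and forward a call to OpenGL only when the value actually changes. The following is a minimal sketch of that idea, not a technique prescribed by this chapter. The names StateCache, cache_set_cap, driver_set_cap, and the CAP_ constants are hypothetical, and driver_set_cap is a stand-in for glEnable and glDisable so that the sketch runs without an OpenGL context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability indices. A real application would map these
   to GLenum values such as GL_BLEND or GL_DEPTH_TEST. */
enum { CAP_BLEND, CAP_DEPTH_TEST, CAP_COUNT };

typedef struct {
    bool known[CAP_COUNT];    /* has this capability been set at least once? */
    bool enabled[CAP_COUNT];  /* last value forwarded to the driver */
    unsigned driver_calls;    /* number of calls that reached the "driver" */
} StateCache;

/* Stand-in for glEnable/glDisable (an assumption made so the sketch is
   runnable without a GL context). It only counts the calls it receives. */
static void driver_set_cap(StateCache *cache, int cap, bool on)
{
    (void)cap;
    (void)on;
    cache->driver_calls++;
}

/* Forward the state change only if it differs from the cached value. */
void cache_set_cap(StateCache *cache, int cap, bool on)
{
    assert(cap >= 0 && cap < CAP_COUNT);
    if (cache->known[cap] && cache->enabled[cap] == on) {
        return;  /* redundant change: skip the driver call */
    }
    driver_set_cap(cache, cap, on);
    cache->known[cap] = true;
    cache->enabled[cap] = on;
}
```

With a zero-initialized StateCache, calling cache_set_cap with the same capability and value on every frame results in a single forwarded call, which is the pattern you would hope to see in the Trace window after removing redundant state changes.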
Look at the time value to the left of each function call to determine the cost of the call.
Determine what the performance gain would be if it were possible to reduce the time to execute all OpenGL calls to zero.
For example, take the performance data from the GLCarbon1C application used in this section to determine the performance attributable to the OpenGL calls.
Total Application Time (from top) = 31.5%
Total Time in OpenGL (from OpenGL Profiler) = 28.91%
At first glance, you might think that optimizing the OpenGL code could improve application performance by almost 29%, thus reducing the total application time by 29%. This isn't the case. Calculate the theoretical performance increase by multiplying the total CPU time by the percentage of time spent in OpenGL. The theoretical performance improvement for this example is:
31.5% × 0.2891 = 9.11%
If OpenGL took no time at all to execute, the application would see a 9.11% increase in performance. So, if the application runs at 60 frames per second (FPS), it would perform as follows:
New FPS = previous FPS × (1 + performance increase) = 60 fps × 1.0911 = 65.47 fps
The application gains almost 5.5 frames per second by reducing OpenGL from 28.91% to 0%. This shows that the relationship of OpenGL performance to application performance is not linear. Simply reducing the amount of time spent in OpenGL may or may not offer any noticeable benefit in application performance.
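The calculation above can be captured in two small helper functions, which may be convenient when you repeat the measurement after each round of tuning. This is a sketch that follows the arithmetic in this section. The function names theoretical_gain_percent and projected_fps are hypothetical.

```c
#include <assert.h>

/* Theoretical performance increase, in percent, if all OpenGL time were
   eliminated: CPU percentage (from top) multiplied by the fraction of
   time spent in OpenGL (from OpenGL Profiler). */
double theoretical_gain_percent(double cpu_percent, double opengl_percent)
{
    assert(cpu_percent >= 0.0);
    assert(opengl_percent >= 0.0 && opengl_percent <= 100.0);
    return cpu_percent * (opengl_percent / 100.0);
}

/* Frame rate after applying a performance increase given in percent. */
double projected_fps(double current_fps, double gain_percent)
{
    assert(current_fps >= 0.0);
    return current_fps * (1.0 + gain_percent / 100.0);
}
```

For the GLCarbon1C figures, theoretical_gain_percent(31.5, 28.91) yields about 9.11, and passing that result to projected_fps with a current rate of 60 yields about 65.5 frames per second.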
Using OpenGL Driver Monitor to Measure Stalls
You can use OpenGL Driver Monitor to measure how long the CPU waits for the GPU, as shown in Figure 15-5. OpenGL Driver Monitor is useful for analyzing other parameters as well. You can choose which parameters to monitor simply by clicking a parameter name from the drawer shown in the figure.
Identifying Bottlenecks with Shark
Shark is an extremely useful tool for identifying places in your code that are slow and could benefit from optimization. Once you learn the basics, you can use it on your OpenGL applications to identify bottlenecks.
There are three issues to watch out for in Shark when using it to analyze OpenGL performance:
Costly data conversions. If you notice the glgProcessPixels call (in the libGLImage.dylib library) showing up in the analysis, it's an indication that the driver is not handling a texture upload optimally. The call is used when your application makes a glTexSubImage call using data that is in a nonnative format for the driver, which means the data must be converted before the driver can upload it. You can improve performance by changing your data so that it is in a native format for the driver. See “Use Optimal Data Types and Formats.”
Time in the mach_kernel library. If you see time spent waiting for a timestamp or waiting for the driver, it indicates that your application is waiting for the GPU to finish processing. You see this during a texture upload, for example.
Misleading symbols. You may see a symbol, such as glgGetString, that appears to be taking time but shouldn't be taking time in your application. That sometimes happens because the underlying optimizations performed by the system don't have any symbols attached to them on the driver side. Without a symbol to display, Shark shows the last symbol it has. Look for the call that your application made prior to that symbol and focus your attention there. You don't need to concern yourself with the calls that were made "underneath" your call.
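As an illustration of the first issue in the list above (costly data conversions), one option is to convert pixel data to the driver's native layout once, when the image is loaded, rather than leaving the driver to convert it on every upload. The sketch below assumes, purely for illustration, that the source data is tightly packed 8-bit RGBA and that the native layout is BGRA. Consult “Use Optimal Data Types and Formats” for the formats that actually apply to your hardware. The function name rgba8_to_bgra8 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the red and blue channels in place, turning tightly packed
   8-bit RGBA pixels into BGRA. The BGRA target is an assumption made
   for this example, not a statement about any particular driver. */
void rgba8_to_bgra8(uint8_t *pixels, size_t pixel_count)
{
    assert(pixels != NULL || pixel_count == 0);
    for (size_t i = 0; i < pixel_count; i++) {
        uint8_t *p = pixels + 4 * i;  /* 4 bytes per pixel */
        uint8_t red = p[0];
        p[0] = p[2];                  /* blue moves to the first byte */
        p[2] = red;                   /* red moves to the third byte */
    }
}
```

After a one-time conversion like this, the application would pass the buffer to its texture upload call with the matching format and type arguments, and you can check in Shark whether glgProcessPixels still appears.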