Tuning Your OpenGL ES App
The performance of OpenGL ES apps in iOS differs from that of OpenGL in OS X or other desktop operating systems. Although iOS-based devices are powerful computing devices, they do not have the memory or CPU power of desktop or laptop computers. Embedded GPUs are optimized for lower memory and power usage, using algorithms different from those a typical desktop or laptop GPU might use. Rendering your graphics data inefficiently can result in a poor frame rate or dramatically reduce the battery life of an iOS-based device.
Later chapters will touch on many techniques to improve your app’s performance; this chapter talks about overall strategies you may want to follow.
General Performance Recommendations
Use common sense to guide your performance tuning efforts. For example, if your app draws only a few dozen triangles per frame, changing how it submits vertex data is unlikely to improve its performance. Look for optimizations that provide the most performance improvement for your effort.
Test Your App with Xcode
Don’t optimize your app until you have tested its performance under a variety of scenarios on a variety of devices. Use these tools in Xcode and Instruments to look for errors and performance issues while your app runs:
OpenGL ES Debug Gauge. When debugging an OpenGL ES app on a device, the Xcode debug navigator includes an FPS (frames per second) indicator below the default CPU and Memory debug gauges. Click this indicator to show a real-time report of your app’s OpenGL ES performance in the editor area. This debug gauge quickly helps you to determine whether OpenGL ES is the main bottleneck in your app; you should refer to it often when testing your OpenGL ES code.
Instruments (OpenGL ES Analysis). This tool helps you study your app’s usage of OpenGL ES. The OpenGL ES Analysis tool records the OpenGL ES commands generated by your app and warns you when your app does not follow the best practices described in this programming guide; it recommends specific changes you can make to follow the best practices. You can see all the commands used to generate each frame of animation. Finally, you can selectively disable portions of the graphics pipeline to determine whether that part of the pipeline is a significant bottleneck in your app.
The OpenGL ES Analysis tool gives you a set of tools for manually analyzing your app and understanding its inner workings. It does not, however, automatically point you to the location where your app is currently bottlenecked. For example, even when it offers a suggestion on how to improve your OpenGL ES coding practices, following that suggestion won’t necessarily improve the performance of your app.
Instruments (OpenGL ES Driver). This tool tracks how resources are used by your app. For example, you can use OpenGL ES Driver to track the number of bytes used to hold texture data and how those numbers change from frame to frame.
For a more detailed perspective, use these tools in Xcode to look for errors and performance issues when rendering a specific frame:
OpenGL ES Frame Debugger. Xcode can capture the entire sequence of OpenGL ES drawing commands that produce a displayed frame. To capture a frame while debugging an OpenGL ES app on a device, click the Capture Frame button on the debug bar or choose Debug > Capture OpenGL ES Frame. You can also capture a frame as a breakpoint action. After a frame is captured, Xcode reconfigures its user interface for OpenGL ES frame debugging:
The primary editor shows framebuffer and renderbuffer contents.
The debug navigator shows the sequence of OpenGL ES commands used to render the frame. Selecting a command in the navigator changes the framebuffer view to show rendering output only up to that command. It also highlights any drawing performed by that command.
The assistant editor shows OpenGL ES objects. In this editor, you can view the contents of data buffers, vertex array objects, and textures. You can also view and edit shader source code and see changes reflected in the framebuffer. On OpenGL ES 3.0–capable devices, you can also see profiling information for shaders.
The debug area shows OpenGL ES objects, state variables, errors, performance warnings, and statistics.
Use OpenGL ES Frame Debugger frequently to discover errors and performance issues in your OpenGL ES drawing code and shaders.
OpenGL ES Performance Analyzer. This tool extends OpenGL ES Frame Debugger to analyze common performance issues. To see a list of performance issues when debugging an OpenGL ES app on a device, click the OpenGL ES debug gauge in the debug navigator, and then click the Analyze button at the top of the GPU Report that appears in the editor area. After Xcode captures a frame, the GPU Report expands to show a list of performance issues. For each issue, you can see a list of OpenGL ES calls involved, their location in your code and in the frame capture, and specific recommendations for improving performance. A key advantage of the OpenGL ES Performance Analyzer is that it can automatically direct you to the location in your app that slows OpenGL ES performance the most.
For more information, see Xcode Overview and Instruments User Guide.
Use Xcode and Instruments to Test for OpenGL ES Errors
OpenGL ES errors result from your app using the OpenGL ES API incorrectly or requesting operations that the underlying hardware is not capable of performing. Even if your content renders correctly, these errors may indicate performance problems. The traditional way to check for OpenGL ES errors is to call the glGetError function; however, repeatedly calling this function can significantly degrade performance. Instead, you can use the tools outlined above to test for errors:
When profiling your app in Instruments, see the detail pane for the OpenGL ES Analyzer instrument to view any OpenGL ES errors reported while recording.
While debugging your app in Xcode, you can capture a frame and use OpenGL ES Frame Debugger to examine the drawing commands used to produce it, as well as any errors encountered while performing those commands.
You can also configure Xcode to stop program execution when an OpenGL ES error is encountered. (See Adding an OpenGL ES Error Breakpoint.)
Annotate Your Drawing Code for Informative Debugging and Profiling
You can make debugging and profiling more efficient by organizing your stream of OpenGL ES commands into logical groups and adding meaningful labels to OpenGL ES objects. These groups and labels appear in OpenGL ES Frame Debugger in Xcode as shown in Figure 7-1, and in OpenGL ES Analyzer in Instruments. To add groups and labels, use the EXT_debug_marker and EXT_debug_label extensions.
When you have a sequence of drawing commands that represent a single meaningful operation (for example, drawing a game character), you can use a marker to group them for debugging. Call the glPushGroupMarkerEXT function and provide a meaningful name before the drawing calls to be labeled, and call the glPopGroupMarkerEXT function afterward. Listing 7-1 uses these functions to group the texture, program, vertex array, and draw calls for a single element of a scene.
Listing 7-1 Using the EXT_debug_marker extension to annotate drawing commands
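glPushGroupMarkerEXT(0, "Draw Spaceship");
// ... bind the texture, shader program, and vertex array for this element here ...
glDrawElements(GL_TRIANGLE_STRIP, 256, GL_UNSIGNED_SHORT, 0);
glPopGroupMarkerEXT(); // close the group opened above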
You can use multiple nested markers to create a hierarchy of meaningful groups in a complex scene. When you use the GLKView class to draw OpenGL ES content, it automatically creates a “Rendering” group containing all commands in your drawing method. Any markers you create are nested within this group.
Labels can be used to provide meaningful names for OpenGL ES objects, such as textures, shader programs, and vertex array objects. Call the glLabelObjectEXT function with the OpenGL ES identifier for an object to give it a name to be shown when debugging and profiling. Listing 7-2 illustrates using this function to label a vertex array object. If you use the GLKTextureLoader class to load texture data, it automatically labels the OpenGL ES texture objects it creates with their filenames.
Listing 7-2 Using the EXT_debug_label extension to annotate OpenGL ES objects
glLabelObjectEXT(GL_VERTEX_ARRAY_OBJECT_EXT, _spaceshipMesh, 0, "Spaceship");
Redraw Scenes Only When the Scene Data Changes
Your app should wait until something in the scene changes before rendering a new frame. Core Animation caches the last image presented to the user and continues to display it until a new frame is presented.
Even when your data changes, it is not necessary to render frames at the speed the hardware processes commands. A slower but fixed frame rate often appears smoother to the user than a fast but variable frame rate. A fixed frame rate of 30 frames per second is sufficient for most animation and helps reduce power consumption.
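The two rules above (render only when something changed, and no faster than a fixed rate) can be sketched as a small scheduling helper. This is a minimal illustration, not part of any Apple API: the RenderScheduler type and its functions are hypothetical names, and the caller is assumed to supply a monotonically increasing timestamp in seconds.

```c
#include <stdbool.h>

/* Hypothetical scheduling state; all names here are illustrative only. */
typedef struct {
    bool   dirty;            /* true when scene data changed since the last render */
    double last_render_time; /* timestamp of the last rendered frame, in seconds */
    double min_interval;     /* minimum time between frames, e.g. 1.0 / 30.0 */
} RenderScheduler;

void scheduler_init(RenderScheduler *s, double frames_per_second) {
    s->dirty = true;              /* the first frame must always be drawn */
    s->last_render_time = -1.0e9; /* far in the past, so the first frame is not delayed */
    s->min_interval = 1.0 / frames_per_second;
}

/* Call whenever scene data changes (animation step, user input, and so on). */
void scheduler_mark_dirty(RenderScheduler *s) {
    s->dirty = true;
}

/* Returns true when the caller should render and present a new frame at `now`. */
bool scheduler_should_render(RenderScheduler *s, double now) {
    if (!s->dirty) {
        return false; /* nothing changed: the last presented image stays on screen */
    }
    if (now - s->last_render_time < s->min_interval) {
        return false; /* changed, but too soon: hold to the fixed frame rate */
    }
    s->dirty = false;
    s->last_render_time = now;
    return true;
}
```

A render loop would call scheduler_should_render once per display refresh and skip all drawing work when it returns false.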
Disable Unused OpenGL ES Features
Whether you are using the fixed-function pipeline of OpenGL ES 1.1 or shaders in OpenGL ES 2.0 or later, the best calculation is one that your app never performs. For example, if a calculation can be pre-calculated and stored in your model data, you can avoid performing that calculation at runtime.
If your app is written for OpenGL ES 2.0 or later, do not create a single shader with lots of switches and conditionals that performs every task your app needs to render the scene. Instead, compile multiple shader programs that each perform a specific, focused task.
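One way to organize several focused shader programs is a small table indexed by the features each draw needs. The sketch below is an assumption about how an app might structure this, not a prescribed design: the feature bits, the ShaderLibrary type, and the function names are all hypothetical, and the table entries stand in for program handles the app would have compiled and linked itself.

```c
#include <stdbool.h>

/* Feature bits that select a specialized program (hypothetical names). */
enum {
    FEATURE_TEXTURED = 1 << 0,
    FEATURE_LIT      = 1 << 1,
    VARIANT_COUNT    = 1 << 2
};

/* One program handle per feature combination. In a real app each entry
   would be a linked program object built for exactly that combination. */
typedef struct {
    unsigned int programs[VARIANT_COUNT];
} ShaderLibrary;

/* Maps the features a draw needs to an index into the table. */
unsigned int shader_variant_index(bool textured, bool lit) {
    return (textured ? (unsigned int)FEATURE_TEXTURED : 0u) |
           (lit ? (unsigned int)FEATURE_LIT : 0u);
}

/* Returns the handle of the program specialized for the given features. */
unsigned int shader_library_lookup(const ShaderLibrary *lib, bool textured, bool lit) {
    return lib->programs[shader_variant_index(textured, lit)];
}
```

The branch on features happens once per draw on the CPU, so none of the specialized shaders needs a runtime conditional for it.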
If your app uses OpenGL ES 1.1, disable any fixed-function operations that are not necessary to render the scene. For example, if your app does not require lighting or blending, you should disable those functions. Similarly, if your app draws only 2D models, it should disable fog and depth testing.
Minimize the Number of Draw Calls
Every time your app submits primitives to be processed by OpenGL ES, the CPU spends time preparing the commands for the graphics hardware. To reduce this overhead, batch your drawing into fewer calls. For example, you might merge multiple triangle strips into a single strip, as described in “Use Triangle Strips to Batch Vertex Data.”
Consolidating models to use a common set of OpenGL ES state has a further advantage: it reduces the overhead of changing OpenGL ES state. See “Be Mindful of OpenGL ES State Variables.”
For best results, consolidate primitives that are drawn in close spatial proximity. Large, sprawling models are more difficult for your app to efficiently cull when they are not visible in the frame.
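Merging triangle strips is commonly done by joining their index lists with degenerate (zero-area) triangles. The following is a minimal sketch of that approach, assuming 16-bit indices into a shared vertex buffer; merge_triangle_strips is a hypothetical helper, not an OpenGL ES function.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes strip `a` followed by strip `b` into `out`, joined by degenerate
   triangles so both can be drawn as one GL_TRIANGLE_STRIP.
   `out` must have room for at least na + nb + 3 indices.
   Returns the number of indices written. */
size_t merge_triangle_strips(const uint16_t *a, size_t na,
                             const uint16_t *b, size_t nb,
                             uint16_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < na; ++i) {
        out[n++] = a[i];
    }
    if (na > 0 && nb > 0) {
        out[n++] = a[na - 1];     /* repeat the last index of the first strip */
        if (na % 2 != 0) {
            out[n++] = a[na - 1]; /* extra repeat keeps the winding order of b */
        }
        out[n++] = b[0];          /* repeat the first index of the second strip */
    }
    for (size_t i = 0; i < nb; ++i) {
        out[n++] = b[i];
    }
    return n;
}
```

The merged list can then be submitted with a single glDrawElements call using GL_TRIANGLE_STRIP and GL_UNSIGNED_SHORT, in place of one call per strip. The repeated indices produce zero-area triangles that generate no fragments.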
Memory Is a Scarce Resource on iOS Devices
Your iOS app shares main memory with the system and other iOS apps. Memory allocated for OpenGL ES reduces the amount of memory available for other uses in your app. With that in mind, allocate only the memory that you need and deallocate it as soon as your app no longer needs it. Here are a few ways you can save memory:
After loading an image into an OpenGL ES texture, free the original image.
Allocate a depth buffer only when your app requires it.
If your app does not need all of its resources at once, load only a subset of the items. For example, a game might be divided into levels; each loads a subset of the total resources that fits within a stricter resource limit.
The virtual memory system in iOS does not use a swap file. When a low-memory condition is detected, instead of writing volatile pages to disk, the virtual memory system frees up nonvolatile memory to give your running app the memory it needs. Your app should strive to use as little memory as possible and be prepared to dispose of objects that are not essential to your app. Responding to low-memory conditions is covered in detail in the iOS App Programming Guide.
Do Not Sort Rendered Objects Unless Necessary
Do not waste time sorting objects front to back. OpenGL ES on all iOS devices implements a tile-based deferred rendering model that makes this unnecessary. See “OpenGL ES Hardware Processors” for more information.
Do sort objects by their opacity:
Draw opaque objects first.
Next draw objects that require alpha testing (or in an OpenGL ES 2.0 or 3.0 based app, objects that require the use of discard in the fragment shader). Note that these operations have a performance penalty, as described in “Avoid Alpha Test and Discard.”
Finally, draw alpha-blended objects.
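The ordering above can be expressed as a sort on a per-object category. This is a minimal sketch: the DrawCategory and DrawItem types are hypothetical, and a real app would carry its own mesh and material data in place of mesh_id.

```c
#include <stdlib.h>

/* Categories in the order they should be drawn. */
typedef enum {
    DRAW_OPAQUE       = 0, /* drawn first */
    DRAW_ALPHA_TESTED = 1, /* alpha test or discard */
    DRAW_BLENDED      = 2  /* alpha-blended, drawn last */
} DrawCategory;

/* Hypothetical per-object record. */
typedef struct {
    DrawCategory category;
    int          mesh_id; /* placeholder for the app's own mesh handle */
} DrawItem;

/* qsort comparator: orders items by category only. */
static int compare_draw_items(const void *lhs, const void *rhs) {
    const DrawItem *a = (const DrawItem *)lhs;
    const DrawItem *b = (const DrawItem *)rhs;
    return (int)a->category - (int)b->category;
}

/* Sorts items so opaque objects come first and blended objects last.
   Note that qsort is not guaranteed to be stable, so the relative order
   of items within one category may change. */
void sort_draw_items(DrawItem *items, size_t count) {
    qsort(items, count, sizeof items[0], compare_draw_items);
}
```

If the categories rarely change, an app could instead keep three separate lists and skip the sort entirely.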
Simplify Your Lighting Models
This advice applies both to fixed-function lighting in OpenGL ES 1.1 and shader-based lighting calculations you use in your custom shaders in OpenGL ES 2.0 or later.
Use the fewest lights possible and the simplest lighting type for your app. Consider using directional lights instead of spot lighting, which requires more calculations. Shaders should perform lighting calculations in model space; consider using simpler lighting equations in your shaders over more complex lighting algorithms.
Pre-compute your lighting and store the color values in a texture that can be sampled by fragment processing.
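As an illustration of pre-computing lighting, the sketch below bakes a diffuse term into 8-bit values that could be uploaded as a texture. It assumes a single static directional light and a simple Lambert model, neither of which this guide prescribes; the Vec3 type and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, z; } Vec3;

/* Lambert diffuse term for a directional light, clamped to [0, 1].
   Assumes both vectors are unit length and that `to_light` points from
   the surface toward the light. */
float diffuse_term(Vec3 normal, Vec3 to_light) {
    float d = normal.x * to_light.x + normal.y * to_light.y + normal.z * to_light.z;
    if (d < 0.0f) return 0.0f; /* surface faces away from the light */
    if (d > 1.0f) return 1.0f;
    return d;
}

/* Bakes one 8-bit intensity per texel from that texel's surface normal. */
void bake_diffuse(const Vec3 *normals, size_t count, Vec3 to_light, uint8_t *out) {
    for (size_t i = 0; i < count; ++i) {
        out[i] = (uint8_t)(diffuse_term(normals[i], to_light) * 255.0f + 0.5f);
    }
}
```

Because the result depends only on static data, this work can run once at load time or in an offline tool, leaving the fragment shader with a single texture lookup.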
Avoid Alpha Test and Discard
Graphics hardware often performs depth testing early in the graphics pipeline, before calculating the fragment’s color value. If your app uses an alpha test in OpenGL ES 1.1 or the discard instruction in an OpenGL ES 2.0 or 3.0 fragment shader, some hardware depth-buffer optimizations must be disabled. In particular, this may require a fragment’s color to be completely calculated only to be discarded because the fragment is not visible.
An alternative to using alpha test or discard to kill pixels is to use alpha blending with alpha set to zero. The color framebuffer is not modified, but the graphics hardware can still use any Z-buffer optimizations it performs. This does change the value stored in the depth buffer and so may require back-to-front sorting of the transparent primitives.
If you need to use alpha testing or a discard instruction, draw these objects separately in the scene after processing any primitives that do not require it. Place the discard instruction early in the fragment shader to avoid performing calculations whose results are unused.
Another option for avoiding performance penalties due to discard operations is to use a “Z-Prepass” rendering strategy. Render your scene once using a simple fragment shader containing only your discard logic (avoiding expensive lighting calculations) to fill the depth buffer. Then, render your scene again using the GL_EQUAL depth test function and your lighting shaders. Though multipass rendering normally incurs a performance penalty, this approach can yield better performance than a single-pass render that involves a large number of discard operations.
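A first-pass fragment shader for this strategy might look like the following sketch, embedded as a C string the way shader source is usually passed to glShaderSource. The uniform and varying names and the 0.5 alpha threshold are assumptions for illustration. The color-mask and depth-mask settings in the comment are a common companion to the technique rather than something this guide specifies.

```c
/* GLSL ES 1.00 fragment shader for the depth-only first pass.
   It contains only the discard logic and no lighting.
   u_texture, v_texcoord, and the 0.5 threshold are hypothetical. */
const char *const kDepthPrepassFragmentShader =
    "precision mediump float;\n"
    "uniform sampler2D u_texture;\n"
    "varying vec2 v_texcoord;\n"
    "void main() {\n"
    "    // Discard first, before any other work is done.\n"
    "    if (texture2D(u_texture, v_texcoord).a < 0.5) discard;\n"
    "    gl_FragColor = vec4(0.0);\n"
    "}\n";

/* Possible state for the two passes (assumed, not taken from this guide):
 *
 *   Pass 1: glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
 *           glDepthMask(GL_TRUE);
 *           glDepthFunc(GL_LESS);
 *           draw the scene with the shader above
 *
 *   Pass 2: glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
 *           glDepthMask(GL_FALSE);
 *           glDepthFunc(GL_EQUAL);
 *           draw the scene again with the lighting shaders, without discard
 */
```

In the second pass, only fragments whose depth matches the value written by the first pass survive the depth test, so the lighting shaders run only for visible fragments.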
Be Aware of Core Animation Compositing Performance
Core Animation composites the contents of renderbuffers with any other layers in your view hierarchy, regardless of whether those layers were drawn with OpenGL ES, Quartz or other graphics libraries. That’s helpful, because it means that OpenGL ES is a first-class citizen to Core Animation. However, mixing OpenGL ES content with other content takes time; when used improperly, your app may perform too slowly to reach interactive frame rates.
For the absolute best performance, your app should rely solely on OpenGL ES to render your content. Size the view that holds your OpenGL ES content to match the screen, make sure its opaque property is set to YES (the default for GLKView objects) and that no other views or Core Animation layers are visible.
If you render into a Core Animation layer that is composited on top of other layers, making your CAEAGLLayer object opaque reduces, but doesn’t eliminate, the performance cost. If your CAEAGLLayer object is blended on top of layers underneath it in the layer hierarchy, the renderbuffer’s color data must be in a premultiplied alpha format to be composited correctly by Core Animation. Blending OpenGL ES content on top of other content has a severe performance penalty.
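Premultiplied alpha means each color component has already been scaled by the pixel’s alpha value. The arithmetic is sketched below for tightly packed 8-bit RGBA data on the CPU; premultiply_rgba8 is a hypothetical helper, and in practice a fragment shader would more likely output premultiplied colors directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts tightly packed RGBA8 pixels from straight (non-premultiplied)
   alpha to premultiplied alpha, in place. Each color component is scaled
   by alpha / 255 with rounding to nearest; alpha itself is unchanged. */
void premultiply_rgba8(uint8_t *pixels, size_t pixel_count) {
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = pixels + 4 * i;
        unsigned int a = p[3];
        p[0] = (uint8_t)((p[0] * a + 127u) / 255u); /* red */
        p[1] = (uint8_t)((p[1] * a + 127u) / 255u); /* green */
        p[2] = (uint8_t)((p[2] * a + 127u) / 255u); /* blue */
    }
}
```

Fully opaque pixels are left unchanged and fully transparent pixels become black, which is what a compositor expecting premultiplied data assumes.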