OpenGL ES Design Guidelines
OpenGL ES performs many complex operations on your behalf—transformations, lighting, clipping, texturing, environmental effects, and so on—on large data sets. The size of your data and the complexity of the calculations performed can impact performance, making your stellar 3D graphics shine less brightly than you'd like. Whether your app is a game using OpenGL ES to provide immersive real-time images to the user or an image processing app more concerned with image quality, use the information in this chapter to help you design your graphics engine. This chapter introduces key concepts that later chapters expand on.
How to Visualize OpenGL ES
There are a few ways you can visualize OpenGL ES, and each provides a slightly different context in which to design and observe your app. The most common way to visualize OpenGL ES is as a graphics pipeline such as the one shown in Figure 6-1. Your app configures the graphics pipeline, and then executes one or more drawing commands. The drawing commands send vertex data down the pipeline, where it is processed, assembled into primitives, and rasterized into fragments. Color and depth values are calculated for each fragment and then merged into the framebuffer. Using the pipeline as a mental model is essential for identifying exactly what work your app performs to generate a new frame. In an OpenGL ES 2.0 or 3.0 app, your design consists of writing customized shaders to handle the vertex and fragment stages of the pipeline. In an OpenGL ES 1.1 app, you modify the state machine that drives the fixed-function pipeline to perform the desired calculations.
Another benefit of the pipeline model is that individual stages can calculate their results independently and simultaneously. This is a key point. Your app might prepare new primitives while separate portions of the graphics hardware perform vertex and fragment calculations on previously submitted geometry. If any pipeline stage performs too much work or performs too slowly, other pipeline stages sit idle until the slowest stage completes its work. Your design needs to balance the work performed by each pipeline stage by matching calculations to the capabilities of the graphics hardware on the device.
Another way to visualize OpenGL ES is as a client-server architecture, as shown in Figure 6-2. OpenGL ES state changes, texture and vertex data, and rendering commands all have to travel from the app to the OpenGL ES client. The client transforms these data into a format that the graphics hardware understands, and forwards them to the GPU. Not only do these transformations add overhead, but the process of transferring the data to the graphics hardware takes time.
To achieve great performance, an app must reduce the frequency of calls it makes to OpenGL ES, minimize the transformation overhead, and carefully manage the flow of data between itself and OpenGL ES.
Designing a High-Performance OpenGL ES App
To summarize, a well-designed OpenGL ES app needs to:
Exploit parallelism in the OpenGL ES pipeline.
Manage data flow between the app and the graphics hardware.
Figure 6-3 suggests a process flow for an app that uses OpenGL ES to perform animation to the display.
When the app launches, the first thing it does is initialize resources that it does not intend to change over the lifetime of the app. Ideally, the app encapsulates those resources into OpenGL ES objects. The goal is to create any object that can remain unchanged for the runtime of the app (or even a portion of the app’s lifetime, such as the duration of a level in a game), trading increased initialization time for better rendering performance. Complex commands or state changes should be replaced with OpenGL ES objects that can be used with a single function call. For example, configuring the fixed-function pipeline can take dozens of function calls. Instead, compile a graphics shader at initialization time, and switch to it at runtime with a single function call. OpenGL ES objects that are expensive to create or modify should almost always be created as static objects.
The rendering loop processes all of the items you intend to render to the OpenGL ES context, then presents the results to the display. In an animated scene, some data is updated for every frame. In the inner rendering loop shown in Figure 6-3, the app alternates between updating rendering resources (creating or modifying OpenGL ES objects in the process) and submitting drawing commands that use those resources. The goal of this inner loop is to balance the workload so that the CPU and GPU are working in parallel, preventing the app and OpenGL ES from accessing the same resources simultaneously. On iOS, modifying an OpenGL ES object can be expensive when the modification is not performed at the start or the end of a frame.
An important goal for this inner loop is to avoid copying data back from OpenGL ES to the app. Copying results from the GPU to the CPU can be very slow. If the copied data is also used later as part of the process of rendering the current frame, as shown in the middle rendering loop, your app blocks until all previously submitted drawing commands are completed.
After the app submits all drawing commands needed in the frame, it presents the results to the screen. A non-interactive app would copy the final image to app memory for further processing.
Finally, when your app is ready to quit, or when it finishes with a major task, it frees OpenGL ES objects to make additional resources available, either for itself or for other apps.
To summarize the important characteristics of this design:
Create static resources whenever practical.
The inner rendering loop alternates between modifying dynamic resources and submitting rendering commands. Try to avoid modifying dynamic resources except at the beginning or the end of a frame.
Avoid reading intermediate rendering results back to your app.
The rest of this chapter provides useful OpenGL ES programming techniques to implement the features of this rendering loop. Later chapters demonstrate how to apply these general techniques to specific areas of OpenGL ES programming.
Avoid Synchronizing and Flushing Operations
The OpenGL ES specification doesn’t require implementations to execute commands immediately. Often, commands are queued to a command buffer and executed by the hardware at a later time. Usually, OpenGL ES waits until the app has queued many commands before sending the commands to the hardware, because batch processing is usually more efficient. However, some OpenGL ES functions must flush the command buffer immediately. Other functions not only flush the command buffer but also block until previously submitted commands have completed before returning control to the app. Use flushing and synchronizing commands only when that behavior is necessary. Excessive use of flushing or synchronizing commands may cause your app to stall while it waits for the hardware to finish rendering.
These situations require OpenGL ES to submit the command buffer to the hardware for execution:
glFlush sends the command buffer to the graphics hardware. It blocks until commands are submitted to the hardware but does not wait for the commands to finish executing.
glFinish flushes the command buffer and then waits for all previously submitted commands to finish executing on the graphics hardware.
Functions that retrieve framebuffer content (such as glReadPixels) also wait for submitted commands to complete.
The command buffer is full.
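The difference between queuing, flushing, and finishing can be illustrated with a toy model. The sketch below is not OpenGL ES code: CommandQueue, queue_command, queue_flush, queue_finish, and QUEUE_CAPACITY are hypothetical names, and the model only counts commands to show when a submission to the hardware would happen under the rules listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names belong to OpenGL ES. */
enum { QUEUE_CAPACITY = 4 };  /* assumed size of the command buffer */

typedef struct {
    size_t queued;       /* commands waiting in the command buffer */
    size_t in_flight;    /* commands submitted to hardware, not yet finished */
    size_t completed;    /* commands the hardware has finished */
    size_t submissions;  /* how many times the buffer was handed to hardware */
} CommandQueue;

/* Models a flush: hand queued commands to the hardware, do not wait. */
static void queue_flush(CommandQueue *q) {
    assert(q != NULL);
    if (q->queued == 0) {
        return;  /* nothing to submit */
    }
    q->in_flight += q->queued;
    q->queued = 0;
    q->submissions++;
}

/* Models a finish: flush, then block until the hardware is done. */
static void queue_finish(CommandQueue *q) {
    queue_flush(q);
    q->completed += q->in_flight;  /* stands in for the blocking wait */
    q->in_flight = 0;
}

/* Models issuing one command: a full buffer forces a submission first. */
static void queue_command(CommandQueue *q) {
    assert(q != NULL);
    if (q->queued == QUEUE_CAPACITY) {
        queue_flush(q);
    }
    q->queued++;
}
```

In this model, issuing commands costs nothing until the buffer fills, a flush adds a submission, and only a finish forces the caller to wait for completed work. That is the reason to reserve the synchronizing calls for cases that need them.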
Using glFlush Effectively
On some desktop OpenGL implementations, it can be useful to periodically call the glFlush function to efficiently balance CPU and GPU work, but this is not the case in iOS. The Tile-Based Deferred Rendering algorithm implemented by iOS graphics hardware depends on buffering all vertex data in a scene at once, so it can be optimally processed for hidden surface removal. Typically, there are only two situations where an OpenGL ES app should call the glFlush function:
You should flush the command buffer when your app moves to the background, because executing OpenGL ES commands on the GPU while your app is in the background causes iOS to terminate your app. (See “Implementing a Multitasking-Aware OpenGL ES App.”)
If your app shares OpenGL ES objects (such as vertex buffers or textures) between multiple contexts, you should call the glFlush function to synchronize access to these resources. For example, you should call the glFlush function after loading vertex data in one context to ensure that its contents are ready to be retrieved by another context. This advice also applies when sharing OpenGL ES objects with other iOS APIs such as Core Image.
Avoid Querying OpenGL ES State
Calls to state query functions, including glGetError(), may require OpenGL ES to execute previous commands before retrieving any state variables. This synchronization forces the graphics hardware to run in lockstep with the CPU, reducing opportunities for parallelism. To avoid this, maintain your own copy of any state you need to query, and access it directly, rather than calling OpenGL ES.
When errors occur, OpenGL ES sets an error flag. These and other errors appear in the OpenGL ES Frame Debugger in Xcode or the OpenGL ES Analyzer in Instruments. You should use those tools instead of the glGetError function, which degrades performance if called frequently. Other queries such as glValidateProgram() are also generally only useful while developing and debugging. You should omit calls to these functions in Release builds of your app.
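One way to maintain your own copy of state is to route every change through a small wrapper that records the last value sent. The following is a minimal sketch under stated assumptions: StateShadow, shadow_set_viewport, shadow_get_viewport, and the counting stub fake_set_viewport are hypothetical names, and the stub stands in for a real state-setting call such as glViewport so that the example runs without a rendering context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int x, y, width, height; } Viewport;

/* Hypothetical hook; in a real app this would forward to the GL call. */
typedef void (*SetViewportFn)(int x, int y, int width, int height);

/* Stub used for illustration: counts how often the "driver" is called. */
static int g_driver_calls;
static void fake_set_viewport(int x, int y, int width, int height) {
    (void)x; (void)y; (void)width; (void)height;
    g_driver_calls++;
}

typedef struct {
    Viewport viewport;           /* app-side copy of the last value sent */
    bool viewport_known;         /* false until the app sets it once */
    SetViewportFn set_viewport;  /* where state changes are forwarded */
} StateShadow;

/* Forward the change and remember it so it never has to be queried. */
static void shadow_set_viewport(StateShadow *s, Viewport v) {
    assert(s != NULL && s->set_viewport != NULL);
    s->set_viewport(v.x, v.y, v.width, v.height);
    s->viewport = v;
    s->viewport_known = true;
}

/* Read the app-side copy; returns false if nothing has been set yet. */
static bool shadow_get_viewport(const StateShadow *s, Viewport *out) {
    assert(s != NULL && out != NULL);
    if (!s->viewport_known) {
        return false;
    }
    *out = s->viewport;
    return true;
}
```

The read path never touches the graphics API, so it cannot force a synchronization. The copy is only trustworthy if every change to that state goes through the wrapper.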
Use OpenGL ES to Manage Your Resources
Many pieces of OpenGL data can be stored directly inside the OpenGL ES rendering context and its associated sharegroup object. The OpenGL ES implementation is free to transform the data into a format that is optimal for the graphics hardware. This can significantly improve performance, especially for data that changes infrequently. Your app can also provide hints to OpenGL ES about how it intends to use the data. An OpenGL ES implementation can use these hints to process the data more efficiently. For example, static data might be placed in memory that the graphics processor can readily fetch, or even into dedicated graphics memory.
Use Double Buffering to Avoid Resource Conflicts
Resource conflicts occur when your app and OpenGL ES access an OpenGL ES object at the same time. When one participant attempts to modify an OpenGL ES object being used by the other, it may block until the object is no longer in use. Once it begins modifying the object, the other participant may not access the object until the modifications are complete. Alternatively, OpenGL ES may implicitly duplicate the object so that both participants can continue to execute commands. Either option is safe, but each can end up as a bottleneck in your app. Figure 6-4 shows this problem. In this example, there is a single texture object, which both OpenGL ES and your app want to use. When the app attempts to change the texture, it must wait until previously submitted drawing commands complete; the CPU synchronizes to the GPU.
To solve this problem, your app could perform additional work between changing the object and drawing with it. But if your app does not have additional work it can perform, it should explicitly create two identically sized objects; while one participant reads an object, the other participant modifies the other. Figure 6-5 illustrates the double-buffered approach. While the GPU operates on one texture, the CPU modifies the other. After the initial startup, neither the CPU nor the GPU sits idle. Although shown for textures, this solution works for almost any type of OpenGL ES object.
Double buffering is sufficient for most apps, but it requires that both participants finish processing commands in roughly the same time. To avoid blocking, you can add more buffers; this implements a traditional producer-consumer model. If the producer finishes before the consumer finishes processing commands, it takes an idle buffer and continues to process commands. In this situation, the producer idles only if the consumer falls badly behind.
Double and triple buffering consume additional memory in exchange for keeping the pipeline from stalling. The additional memory use may put pressure on other parts of your app. On an iOS device, memory can be scarce; your design may need to balance using more memory against other app optimizations.
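The producer-consumer model described above can be sketched as a ring of buffer slots. This is an illustrative model, not OpenGL ES code: BufferRing, ring_produce, ring_consume, and BUFFER_COUNT are hypothetical names, and each slot index stands in for one of several identically sized OpenGL ES objects (for example, textures) that your app would create itself.

```c
#include <assert.h>
#include <stddef.h>

enum { BUFFER_COUNT = 3 };  /* 2 = double buffering, 3 = triple buffering */

typedef struct {
    size_t write_index;  /* next slot the producer (CPU) fills */
    size_t read_index;   /* oldest slot the consumer (GPU) still owns */
    size_t busy;         /* slots filled and not yet released by the consumer */
} BufferRing;

/* Producer side: claim an idle slot to fill and hand off.
   Returns the slot index, or -1 if every slot is busy (producer must wait). */
static int ring_produce(BufferRing *r) {
    assert(r != NULL);
    if (r->busy == BUFFER_COUNT) {
        return -1;
    }
    int slot = (int)r->write_index;
    r->write_index = (r->write_index + 1) % BUFFER_COUNT;
    r->busy++;
    return slot;
}

/* Consumer side: release the oldest busy slot once it has been drawn.
   Returns the slot index, or -1 if there is nothing to consume. */
static int ring_consume(BufferRing *r) {
    assert(r != NULL);
    if (r->busy == 0) {
        return -1;
    }
    int slot = (int)r->read_index;
    r->read_index = (r->read_index + 1) % BUFFER_COUNT;
    r->busy--;
    return slot;
}
```

With BUFFER_COUNT set to 3, the producer can run up to three slots ahead before it has to wait, which models the case where the producer idles only if the consumer falls badly behind. Each extra slot costs one more full-sized object in memory.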
Be Mindful of OpenGL ES State Variables
The hardware has one current state, which is compiled and cached. Switching state is expensive, so it's best to design your app to minimize state switches.
Don't set a state that's already set. Once a feature is enabled, it does not need to be enabled again. Calling an enable function more than once does nothing except waste time, because OpenGL ES does not check the state of a feature when you call glEnable or glDisable. For instance, if you call glEnable(GL_LIGHTING) more than once, OpenGL ES does not check to see if the lighting state is already enabled. It simply updates the state value even if that value is identical to the current value.
You can avoid setting a state more than necessary by using dedicated setup or shutdown routines rather than putting such calls in a drawing loop. Setup and shutdown routines are also useful for turning on and off features that achieve a specific visual effect—for example, when drawing a wire-frame outline around a textured polygon.
If you are drawing 2D images, disable all irrelevant state variables, similar to what's shown in Listing 6-1.
Listing 6-1 Disabling state variables on OpenGL ES 1.1
// Disable other state variables as appropriate.
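If dedicated setup and shutdown routines are not practical, another option is to filter redundant state changes on the app side before they reach OpenGL ES. The sketch below assumes a hypothetical CapabilityCache with hypothetical capability names (CAP_BLEND and so on), and uses a counting stub, fake_set_capability, in place of real glEnable and glDisable calls so that it runs without a rendering context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability list; a real app would map these to GL enums. */
typedef enum { CAP_BLEND, CAP_DEPTH_TEST, CAP_DITHER, CAP_COUNT } Capability;

/* Stub standing in for glEnable/glDisable: counts calls that get through. */
static int g_state_calls;
static void fake_set_capability(Capability cap, bool on) {
    (void)cap; (void)on;
    g_state_calls++;
}

typedef struct {
    bool known[CAP_COUNT];    /* has the app set this capability yet? */
    bool enabled[CAP_COUNT];  /* last value the app sent */
} CapabilityCache;

/* Forward a change only when it differs from the last value sent. */
static void cache_set(CapabilityCache *c, Capability cap, bool on) {
    assert(c != NULL);
    assert((int)cap >= 0 && (int)cap < CAP_COUNT);
    if (c->known[cap] && c->enabled[cap] == on) {
        return;  /* redundant: the state already has this value */
    }
    fake_set_capability(cap, on);
    c->known[cap] = true;
    c->enabled[cap] = on;
}
```

The first change to each capability always goes through, because the cache does not assume it knows the initial state. As with any app-side copy, the filter is only correct if nothing else changes the same state behind its back.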
Replace State Changes with OpenGL ES Objects
The “Be Mindful of OpenGL ES State Variables” section suggests that reducing the number of state changes can improve performance. Some OpenGL ES extensions can create objects that collect multiple OpenGL state changes into an object that can be bound with a single function call. Where such techniques are available, they are recommended. For example, configuring the fixed-function pipeline requires many function calls to change the state of the various operators. Not only does this incur overhead for each function called, but the code is more complex and difficult to manage. Instead, use a shader. A shader, once compiled, can have the same effect but requires only a single call to glUseProgram.
For another example, vertex array objects store the configuration of multiple vertex attributes into a single vertex array object. See “Consolidate Vertex Array State Changes Using Vertex Array Objects.”