Best Practices for Shaders

Shaders provide great flexibility to your application, but can also be a significant bottleneck if you perform too many calculations or perform them inefficiently.

Compile and Link Shaders During Initialization

Creating a shader program is an expensive operation compared to other OpenGL ES state changes. Listing 11-1 presents a typical strategy to load, compile, and verify a shader program.

Listing 11-1  Loading a Shader

/** Initialization-time for shader **/
GLuint shader, prog;
const GLchar *shaderText = "... shader text ...";
GLint logLen;
GLchar log[1024];
int i;

// uniformCt, uniformName[], uniformLoc[] and attribCt, attribName[],
// attribLoc[] are assumed to be declared elsewhere by the application.

// Create ID for shader
shader = glCreateShader(GL_VERTEX_SHADER);

// Define shader text (one null-terminated string)
glShaderSource(shader, 1, &shaderText, NULL);

// Compile shader
glCompileShader(shader);

// Create the program object and associate the shader with it
prog = glCreateProgram();
glAttachShader(prog, shader);

// Link program
glLinkProgram(prog);

// Validate program
glValidateProgram(prog);

// Check the status of the compile/link
glGetProgramiv(prog, GL_INFO_LOG_LENGTH, &logLen);
if (logLen > 0)
{
    // Show any errors as appropriate
    glGetProgramInfoLog(prog, sizeof(log), &logLen, log);
    fprintf(stderr, "Prog Info Log: %s\n", log);
}

// Retrieve all uniform locations that are determined during link phase
for (i = 0; i < uniformCt; i++)
{
    uniformLoc[i] = glGetUniformLocation(prog, uniformName[i]);
}

// Retrieve all attrib locations that are determined during link phase
for (i = 0; i < attribCt; i++)
{
    attribLoc[i] = glGetAttribLocation(prog, attribName[i]);
}

/** Render stage for shaders **/
glUseProgram(prog);

Compile, link, and validate your programs when your application is initialized. Once you’ve created all your shaders, your application can efficiently switch between them by calling glUseProgram.

Respect the Hardware Limits on Shaders

OpenGL ES places limitations on the number of each variable type you can use in a vertex or fragment shader. OpenGL ES implementations are not required to implement a software fallback when these limits are exceeded; instead, the shader simply fails to compile or link. Your application must validate all shaders to ensure that no errors occurred during compilation, as shown above in Listing 11-1.

Use Precision Hints

Precision hints were added to the GLSL ES language specification to address the need for compact shader variables that match the smaller hardware limits of embedded devices. Each shader must specify a default precision; individual shader variables may override this precision to provide hints to the compiler on how that variable is used in your application. An OpenGL ES implementation is not required to use the hint information, but may do so to generate more efficient shaders. The GLSL ES specification lists the range and precision for each hint.

For iOS applications, follow these guidelines:

Listing 11-2 defaults to high precision variables, but calculates the color output using low precision variables because higher precision is not necessary.

Listing 11-2  Low precision is acceptable for fragment color

precision highp float; // Default precision declaration is required in fragment shaders.
uniform lowp sampler2D sampler; // texture2D() result is lowp.
varying lowp vec4 color;
varying vec2 texCoord;   // Uses default highp precision.
 
void main()
{
    gl_FragColor = color * texture2D(sampler, texCoord);
}

Perform Vector Calculations Lazily

Not all graphics processors include vector processors; they may perform vector calculations on a scalar processor. When performing calculations in your shader, consider the order of operations to ensure that the calculations are performed efficiently even if they are performed on a scalar processor.

If the code in Listing 11-3 were executed on a vector processor, each multiplication would be executed in parallel across all four of the vector’s components. However, because of the location of the parentheses, the same operation on a scalar processor would take eight multiplications (four for each vector-by-scalar product), even though two of the three parameters are scalar values.

Listing 11-3  Poor use of vector operators

highp float f0, f1;
highp vec4 v0, v1;
v0 = (v1 * f0) * f1;

The same calculation can be performed more efficiently by shifting the parentheses as shown in Listing 11-4. In this example, the scalar values are multiplied together first, and the result multiplied against the vector parameter; the entire operation can be calculated with five multiplications.

Listing 11-4  Proper use of vector operations

highp float f0, f1;
highp vec4 v0, v1;
v0 = v1 * (f0 * f1);
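
The multiplication counts quoted for Listings 11-3 and 11-4 can be checked with a small model. The following is a minimal sketch in C, not shader code: it assumes a hypothetical scalar processor on which every component multiply costs one operation, and the names Vec4, mul, mul_count, scale_vector_first, and scale_scalars_first are invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical model of a scalar processor: each multiply is counted once. */
typedef struct { float c[4]; } Vec4;

static unsigned mul_count = 0;

/* One scalar multiplication; increments the global counter. */
static float mul(float a, float b)
{
    mul_count++;
    return a * b;
}

/* Vector-by-scalar product: four scalar multiplications. */
static Vec4 vec4_scale(Vec4 v, float s)
{
    Vec4 r;
    for (size_t i = 0; i < 4; i++)
        r.c[i] = mul(v.c[i], s);
    return r;
}

/* Mirrors Listing 11-3: v0 = (v1 * f0) * f1; two vector-by-scalar products. */
static Vec4 scale_vector_first(Vec4 v1, float f0, float f1)
{
    return vec4_scale(vec4_scale(v1, f0), f1);
}

/* Mirrors Listing 11-4: v0 = v1 * (f0 * f1); one scalar product, then one
   vector-by-scalar product. */
static Vec4 scale_scalars_first(Vec4 v1, float f0, float f1)
{
    return vec4_scale(v1, mul(f0, f1));
}
```

Resetting mul_count to zero before each call and reading it afterwards gives the cost of each ordering in this model. Real hardware and shader compilers may reorder or fuse operations, so treat the counts as an illustration of the reasoning rather than a measurement.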

Similarly, your application should always specify a write mask for a vector operation if it does not use all of the components of the result. On a scalar processor, calculations for components not specified in the mask can be skipped. Listing 11-5 runs twice as fast on a scalar processor because it specifies that only two components are needed.

Listing 11-5  Specifying a write mask

highp vec4 v0;
highp vec4 v1;
highp vec4 v2;
v2.xz = (v0 * v1).xz; // Only the x and z components of the product are needed.

Use Uniforms or Constants Instead of Computation Within a Shader

Whenever a value can be calculated outside the shader, pass it into the shader as a uniform or a constant. Recalculating dynamic values can potentially be very expensive in a shader.
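
As one possible illustration, a combined model-view-projection matrix can be computed once on the CPU instead of multiplying two matrices for every vertex in the shader. The sketch below is an assumption-laden example rather than a prescribed approach: mat4_multiply is a hypothetical helper, and it assumes the column-major layout that OpenGL ES expects for matrix uniforms.

```c
/* Multiplies two 4x4 column-major matrices: out = a * b.
   Element (row, col) of a matrix is stored at index col * 4 + row.
   out must not alias a or b. */
static void mat4_multiply(const float a[16], const float b[16], float out[16])
{
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) {
                /* a(row, k) * b(k, col) */
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}
```

An application could call mat4_multiply(projection, modelView, mvp) once per object and upload the result with glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, mvp), where projection, modelView, mvp, and mvpLocation are names assumed for this example. The vertex shader then performs a single matrix-by-vector multiplication per vertex.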

Avoid Branching

Branches are discouraged in shaders, as they can reduce the ability to execute operations in parallel on 3D graphics processors. If your shaders must use branches, follow these recommendations:

  • Best performance: Branch on a constant known when the shader is compiled.

  • Acceptable: Branch on a uniform variable.

  • Potentially slow: Branch on a value computed inside the shader.

Instead of creating a large shader with many knobs and levers, create smaller shaders specialized for specific rendering tasks. There is a tradeoff between reducing the number of branches in your shaders and increasing the number of shaders you create. Test different options and choose the fastest solution.
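
One way to produce specialized shaders without maintaining many source files is to prepend #define lines to a shared shader body and compile each combination as its own program during initialization. The following is a minimal sketch of that idea; build_shader_variant, kFragmentBody, USE_FOG, fogColor, and fogFactor are hypothetical names chosen for this example, and the body assumes GLSL ES 1.00 with no #version directive.

```c
#include <stdio.h>

/* Hypothetical shared fragment shader body. The preprocessor removes the
   unused path, so each compiled variant contains no run-time branch. */
static const char *kFragmentBody =
    "precision mediump float;\n"
    "varying lowp vec4 color;\n"
    "varying float fogFactor;\n"
    "uniform lowp vec4 fogColor;\n"
    "void main()\n"
    "{\n"
    "#ifdef USE_FOG\n"
    "    gl_FragColor = mix(fogColor, color, fogFactor);\n"
    "#else\n"
    "    gl_FragColor = color;\n"
    "#endif\n"
    "}\n";

/* Writes one "#define NAME" line per entry in defines, followed by body.
   Returns 0 on success, or -1 if the output buffer is too small. */
static int build_shader_variant(char *out, size_t outSize,
                                const char *const *defines, size_t defineCount,
                                const char *body)
{
    size_t used = 0;
    int n;

    for (size_t i = 0; i < defineCount; i++) {
        n = snprintf(out + used, outSize - used, "#define %s\n", defines[i]);
        if (n < 0 || (size_t)n >= outSize - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, outSize - used, "%s", body);
    if (n < 0 || (size_t)n >= outSize - used)
        return -1;
    return 0;
}
```

The resulting string can be passed to glShaderSource like any other shader text. Whether a handful of variants outperforms a single shader that branches on a uniform depends on the device, so measure both.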

Eliminate Loops

You can eliminate many loops by either unrolling the loop or using vectors to perform operations. For example, this code is very inefficient:

// Loop
    int i;
    float f;
    vec4 v;
 
    for(i = 0; i < 4; i++)
        v[i] += f;

The same operation can be done directly using a component-wise add:

    float f;
    vec4 v;
    v += f;

When you cannot eliminate a loop, give it a constant limit so that the compiler can avoid dynamic branches.

Avoid Computing Array Indices in Shaders

Using indices computed in the shader is more expensive than a constant or uniform array index. Accessing uniform arrays is usually cheaper than accessing temporary arrays.

Avoid Dynamic Texture Lookups

Dynamic texture lookups, also known as dependent texture reads, occur when a fragment shader computes texture coordinates rather than using the unmodified texture coordinates passed into the shader. Although the shader language supports this, dependent texture reads can delay loading of texel data, reducing performance. When a shader has no dependent texture reads, the graphics hardware may prefetch texel data before the shader executes, hiding some of the latency of accessing memory.

Listing 11-6 shows a fragment shader that calculates new texture coordinates. The calculation in this example can easily be performed in the vertex shader instead. By moving the calculation to the vertex shader and directly using the vertex shader’s computed texture coordinates, your application avoids the dependent texture read.

Listing 11-6  Dependent Texture Read

precision mediump float; // Fragment shaders need a default float precision.
varying vec2 vTexCoord;
uniform sampler2D textureSampler;
 
void main()
{
    vec2 modifiedTexCoord = vec2(1.0 - vTexCoord.x, 1.0 - vTexCoord.y);
    gl_FragColor = texture2D(textureSampler, modifiedTexCoord);
}
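
The restructuring described for Listing 11-6 might look like the following pair of shaders, written here as C string constants in the style of shaderText in Listing 11-1. This is a sketch under assumptions: the attribute names position and texCoord and the constant names kFlipVertexShader and kFlipFragmentShader are invented for this example and do not appear in the original listing.

```c
/* Hypothetical vertex shader: flips the texture coordinates once per vertex
   and passes the result to the fragment shader through a varying. */
static const char *kFlipVertexShader =
    "attribute vec4 position;\n"
    "attribute vec2 texCoord;\n"
    "varying vec2 vTexCoord;\n"
    "void main()\n"
    "{\n"
    "    vTexCoord = vec2(1.0 - texCoord.x, 1.0 - texCoord.y);\n"
    "    gl_Position = position;\n"
    "}\n";

/* Hypothetical fragment shader: samples with the unmodified varying, so the
   texture read no longer depends on a value computed in this shader. */
static const char *kFlipFragmentShader =
    "precision mediump float;\n"
    "varying vec2 vTexCoord;\n"
    "uniform sampler2D textureSampler;\n"
    "void main()\n"
    "{\n"
    "    gl_FragColor = texture2D(textureSampler, vTexCoord);\n"
    "}\n";
```

Because the flip is a linear function of the coordinates, interpolating the flipped values across the primitive gives the same result as flipping the interpolated values in the fragment shader. A nonlinear calculation would not carry over to the vertex shader this simply.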
