Improving Locality of Reference

An important improvement you can make to your application’s performance is to reduce the number of virtual memory pages used by the application at any given time. This set of pages is referred to as the working set and consists of application code and runtime data. Reducing the amount of in-memory data is a function of your algorithms, but reducing the amount of in-memory code can be achieved by a process called scatter loading. This technique is also referred to as improving the locality of reference of your code.

Normally, compiled code for methods and functions is organized by source file in the resulting binary. Scatter loading changes this organization and instead groups related methods and functions together, independent of the original locations of those methods and functions. This process allows the kernel to keep an active application’s most frequently referenced executable pages in the smallest memory space possible. Not only does this reduce the footprint of the application, it reduces the likelihood of those pages being paged out.

Profiling Code With gprof

Given profiling data collected at runtime, gprof produces an execution profile of a program. The effect of called routines is incorporated in the profile of each caller. The profile data is taken from the call graph profile file (gmon.out by default), which is created by a program compiled and linked with the -pg option. The symbol table in the executable is correlated with the call graph profile file. If more than one profile file is specified, the gprof output shows the sum of the profile information in the given profile files.

The gprof tool is useful for many purposes, including the following:

Generating Profiling Data

Before you can profile your application, you must set up your project to generate profiling information. To generate profiling information for your Xcode project, you must modify your target or build-style settings to include the “Generate profiling code” option. (See the Xcode Help for information on enabling target and build-style settings.)

The profiling code inside your program generates a file called gmon.out with the profiling information. (Usually, this file is placed in the current working directory.) To analyze the data in this file, copy it to the directory containing your executable prior to calling gprof, or just specify the path to gmon.out when you run gprof.

In addition to profiling your own code, you can find out how much time is spent in Carbon and Cocoa frameworks by linking against profile versions of those frameworks. To do this, add the DYLD_IMAGE_SUFFIX setting to your target or build style and set its value to _profile. The dynamic linker combines this suffix with the framework name to link against the profile version of the framework. To determine which frameworks support profiling, look at the frameworks themselves. For example, the Carbon library comes with both profile and debug versions.

Generating Order Files

An order file contains an ordered sequence of lines, each line consisting of a source file name and a symbol name, separated by a colon with no other white space. Each line represents a block to be placed in a section of the executable. If you modify the file by hand, you must follow this format exactly so the linker can process the file. If the object file name:symbol name pair is not exactly the name seen by the linker, it tries its best to match up the names with the objects being linked.

The lines in an order file for procedure reordering consist of an object filename and procedure name (function, method, or other symbol). The sequence in which procedures are listed in the order file represents the sequence in which they are linked into the __text section of the executable.

To create an order file from the profiling data generated by using a program, run gprof using the -S option (see the man page for gprof (1)). For example,

gprof -S MyApp.profile/MyApp gmon.out

The -S option produces four mutually exclusive order files:

gmon.order

Ordering based on a “closest is best” analysis of the profiling call graph. Calls that call each other frequently are placed close together.

callf.order

Routines sorted by the number of calls made to each routine, largest numbers first.

callo.order

Routines sorted by the order in which they are called.

time.order

Routines sorted by the amount of CPU time spent, largest times first.

You should try using each of these files to see which provides the largest performance improvement, if any. See Using pagestuff to Examine Pages on Disk for a discussion of how to measure the results of the ordering.

These order files contain only those procedures used during profiling. The linker keeps track of missing procedures and links them in their default order after those listed in the order file. Static names for library functions are generated in the order file only if the project directory contains a file generated by the linker’s -whatsloaded option; see Creating a Default Order File for details.

The gprof -S option doesn’t work with executables that have already been linked using an order file.

Fixing Up Your Order Files

After you generate your order files, you should look through them and make sure they are correct. There are a number of cases where you need to edit your order files manually, including the following:

  • Your executable contained assembly-language files.

  • You profiled a stripped executable.

  • Your executable contained files compiled without the -g option.

  • Your project defines internal labels (typically for goto statements).

  • You want to preserve the order of routines in a particular object file.

If the definition of a symbol is located in an assembly file, a stripped executable file, or a file compiled without the -g option, gprof omits the source file name from the symbol’s entry in the order file. If your project uses such files, you must edit the order file manually and add the appropriate source filenames. Alternatively, you can delete the symbol references altogether to force the corresponding routines to be linked in default order.

If your code contains internal labels, you must remove those labels from the order files; otherwise, the function that defines the label will be split apart during the linking phase. You can prevent the inclusion of internal labels in assembly files altogether by prefixing them with the string L_. The assembler interprets symbols with this prefix as local to a particular function and strips them to prevent access by other tools such as gprof.

To preserve the order of routines in a particular object file, use the special symbol .section_all. For example, if the object file foo.o comes from assembly source and you want to link all of the routines without reordering them, delete any existing references to foo.o and insert the following line in the order file:

foo.o:.section_all

This option is useful for object files compiled from assembly source, or for which you don’t have the source.

Linking with an Order File

Once you’ve generated an order file, you can link the program using the -sectorder and -e start options:

cc -o outputFile inputFile.o … -sectorder __TEXT __text orderFile -e start

To use an order file with a Xcode project, modify the “Other Linker Flags” option in the Deployment build style of your project. Add the text -sectorder __TEXT __text orderFile to this setting to specify your order file.

If any inputFile is a library rather than an object file, you may need to edit the order file before linking to replace all references to the object file with references to the appropriate library file. Again, the linker does its best to match names in the order file with the sources it is editing.

With these options, the linker constructs the executable file outputFile so that the contents of the __TEXT segment’s __text section are constructed from blocks of the input files’ __text sections. The linker arranges the routines in the input files in the order listed in orderFile.

As the linker processes the order file, it places the procedures whose object-file and symbol-name pairs aren’t listed in the order file into the __text section of outputFile. It links these symbols in the default order. Object-file and symbol-name pairs listed more than once always generate a warning, and the first occurrence of the pair is used.

By default, the linker prints a summary of the number of symbol names in the linked objects that are not in the order file, the number of symbol names in the order file that are not in the linked objects, and the number of symbol names it tried to match that were ambiguous. To request a detailed listing of these symbols, use the -sectorder_detail option.

The linker’s -e start option preserves the executable’s entry point. The symbol start (note the lack of a leading “_”) is defined in the C runtime shared library /usr/bin/crt1.o; it represents the first text address in your program when you link normally. When you reorder your procedures, you have to use this option to fix the entry point. Another way to do this is to make the line /usr/lib/crt1.o:start or /usr/lib/crt1.o:section_all the first line of your order file.

Limitations of gprof Order Files

The .order files generated by gprof contain only those functions that were called or sampled when the executable was run. For library functions to appear correctly in the order file, a whatsloaded file produced by the linker should exist in the working directory.

The -S option does not work with executables that have already been linked with an order file.

Production of the gmon.order file can take a long time—it can be suppressed with the -x parameter.

Filenames will be missing for the following items:

  • files compiled without the -g parameter

  • routines generated from assembly-language source

  • executables that have had their debugging symbols removed (as with the strip tool)

Profiling With the Monitor Functions

The file /usr/include/monitor.h declares a set of functions that you can use to programmatically profile specific sections of your code. You can use these functions to gather statistics only for certain sections of your code, or for all of your code. You can then use the gprof tool to build call graph and other performance analysis data from the resulting file. Listing 1 shows how to use the monitor functions.

Listing 1  Using monitor functions

#include <monitor.h>
 
    /* To start profiling: */
    moninit();
    moncontrol(1);
 
    /* To stop, and dump to a file */
    moncontrol(0);
    monoutput("/tmp/myprofiledata.out");
    monreset();

Organizing Code at Compile Time

The GCC compiler lets you specify attributes on any function or variable you declare. The section attribute lets you tell GCC which segment and section you want a particular piece of code to be placed.

The section attribute takes several parameters that control where the resulting code is placed. At a minimum, you must specify a segment and section name for the code you want to place. Other options are also available and are described in the GCC documentation.

The following listing shows how you use the section attribute for a function. In this example, the section attribute is added to a forward declaration of the function. The attribute tells the compiler to place the function in a specific __text section of the executable.

void MyFunction (int a) __attribute__((section("__TEXT,__text.10")));

The following listing shows some examples of how you can organize your global variables using the section attribute.

extern const int x __attribute__((section("__TEXT,__my_const")));
const int x=2;
 
extern char foo_string[] __attribute__((section("__DATA,__my_data")));
char foo_string[] = "My text string\n";

For detailed information on specifying section attributes, see the documentation for the GCC compiler in /Developer/Documentation/DeveloperTools/gcc3.

Reordering the __text Section

As described in Overview of the Mach-O Executable Format, the __TEXT segment holds the actual code and read-only portions of your program. The compiler tools, by convention, place procedures from your Mach-O object files (with extension .o) in the __text section of the __TEXT segment.

As your program runs, pages from the __text section are loaded into memory on demand, as routines on these pages are used. Code is linked into the __text section in the order in which it appears in a given source file, and source files are linked in the order in which they are listed on the linker command line (or in the order specifiable in Xcode). Thus, code from the first object file is linked from start to finish, followed by code from the second file and third file and so on.

Loading code in the source file declaration order is rarely optimal. For example, suppose that certain methods or functions in your code are invoked repeatedly, while others are seldom used. Reordering the procedures to place frequently used code at the beginning of the __text section minimizes the average number of pages your application uses and thereby reduces paging activity.

As another example, suppose that all the objects defined by your code are initialized at the same time. Because the initialization routine for each class is defined in a separate source file, the initialization code is ordinarily distributed across the __text section. By contiguously reordering initialization code for all classes, you reduce the number of pages that need to be read in, enhancing initialization performance. The application requires just the small number of pages containing initialization code, rather than a larger number of pages, each containing a small piece of initialization code.

Reordering Procedures

Depending on the size and complexity of your application, you should pursue a strategy for ordering code that best improves the performance of your executable. Like most performance tuning, the more time you spend measuring and retuning your procedure order, the more memory you save. You can easily obtain a good first-cut ordering by running your application and sorting the routines by call frequency. The steps for this strategy are listed below and explained in more detail in the sections that follow:

  1. Build a profile version of your application. This step generates an executable containing symbols used in the profiling and reordering procedures.

  2. Run and use the application to create a set of profile data. Perform a series of functional tests, or have someone use the program for a test period.

  3. Create order files. Order files list procedures in optimized order. The linker uses order files to reorder procedures in the executable.

  4. Run the linker using the order files. This creates an executable with procedures linked into the __text section as specified in the order file.

For information on profiling your code and generating and linking an order file, see Profiling Code With gprof.

Procedure Reordering for Large Programs

For many programs, the ordering generated by the steps just described brings a substantial improvement over unordered procedures. For a simple application with few features, such an ordering represents most of the gains to be had by procedure reordering. However, larger applications and other large programs can benefit greatly from additional analysis. While order files based on call frequency or the call graph are a good start, you can use your knowledge of the structure of your application to further reduce the virtual-memory working set.

Creating a Default Order File

If you want to reorder an application’s procedures using techniques other than those described above, you may want to skip the profiling steps and just start with a default order file that lists all the routines of your application. Once you have a list of the routines in suitable form, you can then rearrange the entries by hand or by using a sorting technique of your choice. You can then use the resulting order file with the linker’s -sectorder option as described in Linking with an Order File.

To create a default order file, first run the linker with the -whatsloaded option:

cc -o outputFile inputFile.o -whatsloaded > loadedFile

This creates a file, loadedFile, that lists the object files loaded in the executable, including any in frameworks or other libraries. The -whatsloaded option can also be used to make sure that order files generated by gprof -S include names for procedures in static libraries.

Using the file loadedFile, you can run nm with the -onjls options and the __TEXT __text argument:

nm -onjls __TEXT __text `cat loadedFile` > orderFile

The content of the file orderFile is the symbol table for the text section. Procedures are listed in the symbol table in their default link order. You can rearrange entries in this file to change the order in which you want procedures to be linked, then run the linker as described in Linking with an Order File.

Using pagestuff to Examine Pages on Disk

The pagestuff tool helps you measure the effectiveness of your procedure ordering by telling you which pages of the executable file are likely to be loaded in memory at a given time. This section briefly describes how to use this tool. See the pagestuff man page for more information.

The pagestuff tool prints out the symbols on a particular page of executable code. The following is the syntax for the command:

pagestuff filename [pageNumber | -a]

The output of pagestuff is a list of procedures contained in filename on page pageNumber. To view all the pages of the file, use the -a option in place of the page number. This output allows you to determine if each page associated with the file in memory is optimized. If it isn’t, you can rearrange entries in the order file and link the executable again to maximize performance gains. For example, move two related procedures together so they are linked on the same page. Perfecting the ordering may require several cycles of linking and tuning.

Grouping Routines According to Usage

Why generate profile data for individual operations of your application? The strategy is based on the assumption that a large application has three general groups of routines:

  • Hot routines run during the most common usage of the application. These are often primitive routines that provide a foundation for the application’s features (for example, routines for accessing a document’s data structures) or routines that implement the core features of an application, such as routines that implement typing in a word processor. These routines should be clustered together in the same set of pages.

  • Warm routines implement specific features of the application. Warm routines are usually associated with particular features that user performs only occasionally (such as launching, printing, or importing graphics). Because these routines are used reasonably often, cluster them in the same small set of pages so they will load quickly. However, because there are long periods when users aren’t accessing this functionality, these routines should not be located in the hot category.

  • Cold routines are rarely used in the application. Cold routines implement obscure features or cover boundary or error cases. Group these routines together to avoid wasting space on a hot or warm page.

At any given time, you should expect most of the hot pages to be resident, and you should expect the warm pages to be resident for the features that the user is currently using. Only very rarely should a cold page be resident.

To achieve this ideal ordering, gather a number of profile data sets. First, gather the hot routines. As described above, compile the application for profiling, launch it, and use the program. Using gprof -S, generate a frequency sorted order file called hot.order from the profile data.

After creating a hot order file, create order files for features that users occasionally use, such as routines that only run when the application is launched. Printing, opening documents, importing images and using various non-document windows and tools are other examples of features that users use occasionally but not continually, and are good candidates for having their own order files. Naming these order files after the feature being profiled (for example, feature.order) is recommended.

Finally, to generate a list of all routines, build a “default” order file default.order (as described in Reordering Procedures).

Once you have these order files, you can combine them using the code shown in Listing 2. You can use this listing to build a command-line utility that removes duplicate lines in the order files while retaining the ordering of the original data.

Listing 2  Code for Unique.c

//
//  unique
//
//  A command for combining files while removing
//  duplicate lines of text. The order of other lines of text
//  in the input files is preserved.
//
//  Build using this command line:
//
//  cc -ObjC -O -o unique -framework Foundation Unique.c
//
//  Note that “unique” differs from the BSD command “uniq” in that
//  “uniq” combines duplicate adjacent lines, while “unique” does not
//  require duplicate lines to be adjacent. “unique” is also spelled
//  correctly.
//
 
#import <stdio.h>
#import <string.h>
#import <Foundation/NSSet.h>
#import <Foundation/NSData.h>
 
#define kBufferSize 8*1024
 
void ProcessFile(FILE *fp)
{
    char buf[ kBufferSize ];
 
    static id theSet = nil;
 
    if( theSet == nil )
    {
        theSet = [[NSMutableSet alloc] init];
    }
 
    while( fgets(buf, kBufferSize, fp) )
    {
        id dataForString;
 
        dataForString = [[NSData alloc] initWithBytes:buf length:strlen(buf)];
 
        if( ! [theSet containsObject:dataForString] )
        {
            [theSet addObject:dataForString];
            fputs(buf, stdout);
        }
 
        [dataForString release];
    }
}
 
int main( int argc, char *argv[] )
{
    int     i;
    FILE *  theFile;
    int     status = 0;
 
    if( argc > 1 )
    {
        for( i = 1; i < argc; i++ )
        {
            if( theFile = fopen( argv[i], "r" ) )
            {
                ProcessFile( theFile );
                fclose( theFile );
            }
            else
            {
                fprintf( stderr, "Could not open ‘%s’\n", argv[i] );
                status = 1;
                break;
            }
        }
    }
    else
    {
        ProcessFile( stdin );
    }
 
    return status;
}

Once built, you would use the program to generate your final order file with syntax similar to the following:

unique hot.order feature1.order ... featureN.order default.order > final.order

Of course, the real test of the ordering is the amount by which paging I/O is reduced. Run your application, use different features, and examine how well your ordering file is performing under different conditions. You can use the top tool (among others) to measure paging performance.

Finding That One Last Hot Routine

After reordering you will usually have a region of pages with cold routines that you expect to be rarely used, often at the end of your text ordering. However, one or two hot routines might slip through the cracks and land in this cold section. This is a costly mistake, because using one of these hot routines now requires an entire page to be resident, a page that is otherwise filled with cold routines that are not likely to be used.

Check that the cold pages of your executable are not being paged in unexpectedly. Look for pages that are resident with high-page offsets in the cold region of your application’s text segment. If there is an unwanted page, you need to find out what routine on that page is being called. One way to do this is to profile during the particular operation that is touching that page, and use the grep tool to search the profiler output for routines that reside on that page. Alternatively, a quick way to identify the location where a page is being touched is to run the application under the gdb debugger and use the Mach call vm_protect to disallow all access to that page:

(gdb) p vm_protect(task_self(), startpage_addr, vm_page_size, FALSE, 0);

After clearing the page protections, any access to that page causes a memory fault, which breaks the program in the debugger. At this point you can simply look at the function call stack (using the bt command) to learn why the routine was being called.

Reordering Other Sections

You can use the -sectorder option of the linker to organize blocks in most sections of the executable. Sections that might occasionally benefit from reordering are literal sections, such as the __TEXT segment’s __cstring section, and the __DATA segment’s __data section.

Reordering Literal Sections

The lines in the order file for literal sections can most easily be produced with the ld and otool tools. For literal sections, otool creates a specific type of order file for each type of literal section:

  • For C string literal sections, the order-file format is one literal C string per line (with ANSI C escape sequences allowed in the C string). For example, a line might look like

    Hello world\n
  • For 4-byte literal sections, the order-file format is one 32-bit hex number with a leading 0x per line with the rest of the line treated as a comment. For example, a line might look like

    0x3f8ccccd (1.10000002384185790000e+00)
  • For 8-byte literal sections, the order file line consists of two 32-bit hexadecimal numbers per line separated by white space each with a leading 0x, with the rest of the line treated as a comment. For example, a line might look like:

    0x3ff00000 0x00000000 (1.00000000000000000000e+00)
  • For literal pointer sections, the format of the lines in the order file represents the pointers, one per line. A literal pointer is represented by the segment name, the section name of the literal pointer, and then the literal itself. These are separated by colons with no extra white space. For example, a line might look like:

    __OBJC:__selector_strs:new
  • For all the literal sections, each line in the order file is simply entered into the literal section and appears in the output file in the order of the order file. No check is made to see if the literal is in the loaded objects.

To reorder a literal section, first create a “whatsloaded” file using the ld -whatsloaded option as described in section Creating a Default Order File. Then, run otool with the appropriate options, segment and section names, and filenames. The output of otool is a default order file for the specified section. For example, the following command line produces an order file listing the default load order for the __TEXT segment’s __cstring section in the file cstring_order:

otool -X -v -s __TEXT __cstring `cat whatsloaded` > cstring_order

Once you’ve created the file cstring_order, you can edit the file and rearrange its entries to optimize locality of reference. For example, you can place literal strings used most frequently by your program (such as labels that appear in your user interface) at the beginning of the file. To produce the desired load order in the executable, use the following command:

cc -o hello hello.o -sectorder __TEXT __cstring  cstring_order

Reordering Data Sections

There are currently no tools to measure code references to data symbols. However, you might know a program’s data-referencing patterns and might be able to get some savings by separating data for seldom-used features from other data. One way to approach __data section reordering is to sort the data by size so small data items end up on as few pages as possible. For example, if a larger data item is placed across two pages with two small items sharing each of these pages, the larger item must be paged in to access the smaller items. Reordering the data by size can minimize this sort of inefficiency. Because this data would normally need to be written to the virtual-memory backing store, this could be a major savings in some programs.

To reorder the __data section, first create an order file listing source files and symbols in the order in which you want them linked (order file entries are described at the beginning of Generating Order Files). Then, link the program using the -sectorder command-line option:

cc -o outputFile inputFile.o … -sectorder __DATA __data orderFile -e start

To use an order file with a Xcode project, modify the “Other Linker Flags” option in the Deployment build style of your project. Add the text -sectorder __DATA __data orderFile to this setting to specify your order file.

Reordering Assembly Language Code

Some additional guidelines to keep in mind when reordering routines coded in assembly language: