Improving Locality of Reference
An important improvement you can make to your application’s performance is to reduce the number of virtual memory pages used by the application at any given time. This set of pages is referred to as the working set and consists of application code and runtime data. Reducing the amount of in-memory data is a function of your algorithms, but reducing the amount of in-memory code can be achieved by a process called scatter loading. This technique is also referred to as improving the locality of reference of your code.
Normally, compiled code for methods and functions is organized by source file in the resulting binary. Scatter loading changes this organization and instead groups related methods and functions together, independent of the original locations of those methods and functions. This process allows the kernel to keep an active application’s most frequently referenced executable pages in the smallest memory space possible. Not only does this reduce the footprint of the application, it reduces the likelihood of those pages being paged out.
Profiling Code With gprof
Given profiling data collected at runtime,
gprof produces an execution profile of a program. The effect of called routines is incorporated in the profile of each caller. The profile data is taken from the call graph profile file (
gmon.out by default), which is created by a program compiled and linked with the
-pg option. The symbol table in the executable is correlated with the call graph profile file. If more than one profile file is specified, the
gprof output shows the sum of the profile information in the given profile files.
gprof tool is useful for many purposes, including the following:
cases where the Sampler application doesn’t work well, such as command-line tools or applications that quit after a short period of time
cases where you want a call graph that includes all the code that might be called in a given program, rather than a periodic sampling of calls
cases where you want to change the link order of your code to optimize the code locality
Generating Profiling Data
Before you can profile your application, you must set up your project to generate profiling information. To generate profiling information for your Xcode project, you must modify your target or build-style settings to include the “Generate profiling code” option. (See the Xcode Help for information on enabling target and build-style settings.)
The profiling code inside your program generates a file called
gmon.out with the profiling information. (Usually, this file is placed in the current working directory.) To analyze the data in this file, copy it to the directory containing your executable prior to calling
gprof, or just specify the path to
gmon.out when you run
In addition to profiling your own code, you can find out how much time is spent in Carbon and Cocoa frameworks by linking against profile versions of those frameworks. To do this, add the
DYLD_IMAGE_SUFFIX setting to your target or build style and set its value to
_profile. The dynamic linker combines this suffix with the framework name to link against the profile version of the framework. To determine which frameworks support profiling, look at the frameworks themselves. For example, the Carbon library comes with both profile and debug versions.
Generating Order Files
An order file contains an ordered sequence of lines, each line consisting of a source file name and a symbol name, separated by a colon with no other white space. Each line represents a block to be placed in a section of the executable. If you modify the file by hand, you must follow this format exactly so the linker can process the file. If the object file name:symbol name pair is not exactly the name seen by the linker, it tries its best to match up the names with the objects being linked.
The lines in an order file for procedure reordering consist of an object filename and procedure name (function, method, or other symbol). The sequence in which procedures are listed in the order file represents the sequence in which they are linked into the
__text section of the executable.
To create an order file from the profiling data generated by using a program, run
gprof using the
-S option (see the man page for
gprof (1)). For example,
gprof -S MyApp.profile/MyApp gmon.out
-S option produces four mutually exclusive order files:
Ordering based on a “closest is best” analysis of the profiling call graph. Calls that call each other frequently are placed close together.
Routines sorted by the number of calls made to each routine, largest numbers first.
Routines sorted by the order in which they are called.
Routines sorted by the amount of CPU time spent, largest times first.
You should try using each of these files to see which provides the largest performance improvement, if any. See “Using pagestuff to Examine Pages on Disk” for a discussion of how to measure the results of the ordering.
These order files contain only those procedures used during profiling. The linker keeps track of missing procedures and links them in their default order after those listed in the order file. Static names for library functions are generated in the order file only if the project directory contains a file generated by the linker’s
-whatsloaded option; see “Creating a Default Order File” for details.
gprof -S option doesn’t work with executables that have already been linked using an order file.
Fixing Up Your Order Files
After you generate your order files, you should look through them and make sure they are correct. There are a number of cases where you need to edit your order files manually, including the following:
Your executable contained assembly-language files.
You profiled a stripped executable.
Your executable contained files compiled without the
Your project defines internal labels (typically for
You want to preserve the order of routines in a particular object file.
If the definition of a symbol is located in an assembly file, a stripped executable file, or a file compiled without the
gprof omits the source file name from the symbol’s entry in the order file. If your project uses such files, you must edit the order file manually and add the appropriate source filenames. Alternatively, you can delete the symbol references altogether to force the corresponding routines to be linked in default order.
If your code contains internal labels, you must remove those labels from the order files; otherwise, the function that defines the label will be split apart during the linking phase. You can prevent the inclusion of internal labels in assembly files altogether by prefixing them with the string
L_. The assembler interprets symbols with this prefix as local to a particular function and strips them to prevent access by other tools such as
To preserve the order of routines in a particular object file, use the special symbol
.section_all. For example, if the object file
foo.o comes from assembly source and you want to link all of the routines without reordering them, delete any existing references to
foo.o and insert the following line in the order file:
This option is useful for object files compiled from assembly source, or for which you don’t have the source.
Linking with an Order File
Once you’ve generated an order file, you can link the program using the
cc -o outputFile inputFile.o
-sectorder __TEXT __text orderFile
To use an order file with a Xcode project, modify the “Other Linker Flags” option in the Deployment build style of your project. Add the text
-sectorder __TEXT __text orderFile to this setting to specify your order file.
If any inputFile is a library rather than an object file, you may need to edit the order file before linking to replace all references to the object file with references to the appropriate library file. Again, the linker does its best to match names in the order file with the sources it is editing.
With these options, the linker constructs the executable file outputFile so that the contents of the
__text section are constructed from blocks of the input files’
__text sections. The linker arranges the routines in the input files in the order listed in orderFile.
As the linker processes the order file, it places the procedures whose object-file and symbol-name pairs aren’t listed in the order file into the
__text section of outputFile. It links these symbols in the default order. Object-file and symbol-name pairs listed more than once always generate a warning, and the first occurrence of the pair is used.
By default, the linker prints a summary of the number of symbol names in the linked objects that are not in the order file, the number of symbol names in the order file that are not in the linked objects, and the number of symbol names it tried to match that were ambiguous. To request a detailed listing of these symbols, use the
-e start option preserves the executable’s entry point. The symbol
start (note the lack of a leading “
_”) is defined in the C runtime shared library
/usr/bin/crt1.o; it represents the first text address in your program when you link normally. When you reorder your procedures, you have to use this option to fix the entry point. Another way to do this is to make the line
/usr/lib/crt1.o:section_all the first line of your order file.
Limitations of gprof Order Files
.order files generated by
gprof contain only those functions that were called or sampled when the executable was run. For library functions to appear correctly in the order file, a
whatsloaded file produced by the linker should exist in the working directory.
-S option does not work with executables that have already been linked with an order file.
Production of the
gmon.order file can take a long time—it can be suppressed with the
Filenames will be missing for the following items:
Profiling With the Monitor Functions
/usr/include/monitor.h declares a set of functions that you can use to programmatically profile specific sections of your code. You can use these functions to gather statistics only for certain sections of your code, or for all of your code. You can then use the
gprof tool to build call graph and other performance analysis data from the resulting file. Listing 1 shows how to use the monitor functions.
Listing 1 Using monitor functions
/* To start profiling: */
/* To stop, and dump to a file */
Organizing Code at Compile Time
The GCC compiler lets you specify attributes on any function or variable you declare. The
section attribute lets you tell GCC which segment and section you want a particular piece of code to be placed.
section attribute takes several parameters that control where the resulting code is placed. At a minimum, you must specify a segment and section name for the code you want to place. Other options are also available and are described in the GCC documentation.
The following listing shows how you use the
section attribute for a function. In this example, the
section attribute is added to a forward declaration of the function. The attribute tells the compiler to place the function in a specific
__text section of the executable.
void MyFunction (int a) __attribute__((section("__TEXT,__text.10")));
The following listing shows some examples of how you can organize your global variables using the
extern const int x __attribute__((section("__TEXT,__my_const")));
const int x=2;
extern char foo_string __attribute__((section("__DATA,__my_data")));
char foo_string = "My text string\n";
For detailed information on specifying
section attributes, see the documentation for the GCC compiler in
Reordering the __text Section
As described in “Overview of the Mach-O Executable Format,” the
__TEXT segment holds the actual code and read-only portions of your program. The compiler tools, by convention, place procedures from your Mach-O object files (with extension
.o) in the
__text section of the
As your program runs, pages from the
__text section are loaded into memory on demand, as routines on these pages are used. Code is linked into the
__text section in the order in which it appears in a given source file, and source files are linked in the order in which they are listed on the linker command line (or in the order specifiable in Xcode). Thus, code from the first object file is linked from start to finish, followed by code from the second file and third file and so on.
Loading code in the source file declaration order is rarely optimal. For example, suppose that certain methods or functions in your code are invoked repeatedly, while others are seldom used. Reordering the procedures to place frequently used code at the beginning of the
__text section minimizes the average number of pages your application uses and thereby reduces paging activity.
As another example, suppose that all the objects defined by your code are initialized at the same time. Because the initialization routine for each class is defined in a separate source file, the initialization code is ordinarily distributed across the
__text section. By contiguously reordering initialization code for all classes, you reduce the number of pages that need to be read in, enhancing initialization performance. The application requires just the small number of pages containing initialization code, rather than a larger number of pages, each containing a small piece of initialization code.
Depending on the size and complexity of your application, you should pursue a strategy for ordering code that best improves the performance of your executable. Like most performance tuning, the more time you spend measuring and retuning your procedure order, the more memory you save. You can easily obtain a good first-cut ordering by running your application and sorting the routines by call frequency. The steps for this strategy are listed below and explained in more detail in the sections that follow:
Build a profile version of your application. This step generates an executable containing symbols used in the profiling and reordering procedures.
Run and use the application to create a set of profile data. Perform a series of functional tests, or have someone use the program for a test period.
Create order files. Order files list procedures in optimized order. The linker uses order files to reorder procedures in the executable.
Run the linker using the order files. This creates an executable with procedures linked into the
__textsection as specified in the order file.
For information on profiling your code and generating and linking an order file, see “Profiling Code With gprof.”
Procedure Reordering for Large Programs
For many programs, the ordering generated by the steps just described brings a substantial improvement over unordered procedures. For a simple application with few features, such an ordering represents most of the gains to be had by procedure reordering. However, larger applications and other large programs can benefit greatly from additional analysis. While order files based on call frequency or the call graph are a good start, you can use your knowledge of the structure of your application to further reduce the virtual-memory working set.
Creating a Default Order File
If you want to reorder an application’s procedures using techniques other than those described above, you may want to skip the profiling steps and just start with a default order file that lists all the routines of your application. Once you have a list of the routines in suitable form, you can then rearrange the entries by hand or by using a sorting technique of your choice. You can then use the resulting order file with the linker’s
-sectorder option as described in “Linking with an Order File.”
To create a default order file, first run the linker with the
cc -o outputFile inputFile.o
-whatsloaded > loadedFile
This creates a file, loadedFile, that lists the object files loaded in the executable, including any in frameworks or other libraries. The
-whatsloaded option can also be used to make sure that order files generated by
-S include names for procedures in static libraries.
Using the file loadedFile, you can run
nm with the
-onjls options and the
__TEXT __text argument:
nm -onjls __TEXT __text `cat loadedFile
The content of the file orderFile is the symbol table for the text section. Procedures are listed in the symbol table in their default link order. You can rearrange entries in this file to change the order in which you want procedures to be linked, then run the linker as described in “Linking with an Order File.”
Using pagestuff to Examine Pages on Disk
pagestuff tool helps you measure the effectiveness of your procedure ordering by telling you which pages of the executable file are likely to be loaded in memory at a given time. This section briefly describes how to use this tool. See the
pagestuff man page for more information.
pagestuff tool prints out the symbols on a particular page of executable code. The following is the syntax for the command:
pagestuff filename [pageNumber |
The output of
pagestuff is a list of procedures contained in filename on page pageNumber. To view all the pages of the file, use the
-a option in place of the page number. This output allows you to determine if each page associated with the file in memory is optimized. If it isn’t, you can rearrange entries in the order file and link the executable again to maximize performance gains. For example, move two related procedures together so they are linked on the same page. Perfecting the ordering may require several cycles of linking and tuning.
Grouping Routines According to Usage
Why generate profile data for individual operations of your application? The strategy is based on the assumption that a large application has three general groups of routines:
Hot routines run during the most common usage of the application. These are often primitive routines that provide a foundation for the application’s features (for example, routines for accessing a document’s data structures) or routines that implement the core features of an application, such as routines that implement typing in a word processor. These routines should be clustered together in the same set of pages.
Warm routines implement specific features of the application. Warm routines are usually associated with particular features that user performs only occasionally (such as launching, printing, or importing graphics). Because these routines are used reasonably often, cluster them in the same small set of pages so they will load quickly. However, because there are long periods when users aren’t accessing this functionality, these routines should not be located in the hot category.
Cold routines are rarely used in the application. Cold routines implement obscure features or cover boundary or error cases. Group these routines together to avoid wasting space on a hot or warm page.
At any given time, you should expect most of the hot pages to be resident, and you should expect the warm pages to be resident for the features that the user is currently using. Only very rarely should a cold page be resident.
To achieve this ideal ordering, gather a number of profile data sets. First, gather the hot routines. As described above, compile the application for profiling, launch it, and use the program. Using
-S, generate a frequency sorted order file called
hot.order from the profile data.
After creating a hot order file, create order files for features that users occasionally use, such as routines that only run when the application is launched. Printing, opening documents, importing images and using various non-document windows and tools are other examples of features that users use occasionally but not continually, and are good candidates for having their own order files. Naming these order files after the feature being profiled (for example,
feature.order) is recommended.
Finally, to generate a list of all routines, build a “default” order file
default.order (as described in “Reordering Procedures”).
Once you have these order files, you can combine them using the code shown in Listing 2. You can use this listing to build a command-line utility that removes duplicate lines in the order files while retaining the ordering of the original data.
Listing 2 Code for Unique.c
// A command for combining files while removing
// duplicate lines of text. The order of other lines of text
// in the input files is preserved.
// Build using this command line:
// cc -ObjC -O -o unique -framework Foundation Unique.c
// Note that "unique" differs from the BSD command "uniq" in that
// "uniq" combines duplicate adjacent lines, while "unique" does not
// require duplicate lines to be adjacent. "unique" is also spelled
#define kBufferSize 8*1024
void ProcessFile(FILE *fp)
char buf[ kBufferSize ];
static id theSet = nil;
if( theSet == nil )
theSet = [[NSMutableSet alloc] init];
while( fgets(buf, kBufferSize, fp) )
dataForString = [[NSData alloc] initWithBytes:buf length:strlen(buf)];
if( ! [theSet containsObject:dataForString] )
int main( int argc, char *argv )
FILE * theFile;
int status = 0;
if( argc > 1 )
for( i = 1; i < argc; i++ )
if( theFile = fopen( argv[i], "r" ) )
ProcessFile( theFile );
fclose( theFile );
fprintf( stderr, "Could not open ‘%s’\n", argv[i] );
status = 1;
ProcessFile( stdin );
Once built, you would use the program to generate your final order file with syntax similar to the following:
unique hot.order feature1.order ... featureN.order default.order > final.order
Of course, the real test of the ordering is the amount by which paging I/O is reduced. Run your application, use different features, and examine how well your ordering file is performing under different conditions. You can use the
top tool (among others) to measure paging performance.
Finding That One Last Hot Routine
After reordering you will usually have a region of pages with cold routines that you expect to be rarely used, often at the end of your text ordering. However, one or two hot routines might slip through the cracks and land in this cold section. This is a costly mistake, because using one of these hot routines now requires an entire page to be resident, a page that is otherwise filled with cold routines that are not likely to be used.
Check that the cold pages of your executable are not being paged in unexpectedly. Look for pages that are resident with high-page offsets in the cold region of your application’s text segment. If there is an unwanted page, you need to find out what routine on that page is being called. One way to do this is to profile during the particular operation that is touching that page, and use the
grep tool to search the profiler output for routines that reside on that page. Alternatively, a quick way to identify the location where a page is being touched is to run the application under the
gdb debugger and use the Mach call
vm_protect to disallow all access to that page:
(gdb) p vm_protect(task_self(), startpage_addr, vm_page_size, FALSE, 0);
After clearing the page protections, any access to that page causes a memory fault, which breaks the program in the debugger. At this point you can simply look at the function call stack (using the
bt command) to learn why the routine was being called.
Reordering Other Sections
You can use the
-sectorder option of the linker to organize blocks in most sections of the executable. Sections that might occasionally benefit from reordering are literal sections, such as the
__cstring section, and the
Reordering Literal Sections
The lines in the order file for literal sections can most easily be produced with the
otool tools. For literal sections,
otool creates a specific type of order file for each type of literal section:
For C string literal sections, the order-file format is one literal C string per line (with ANSI C escape sequences allowed in the C string). For example, a line might look like
For 4-byte literal sections, the order-file format is one 32-bit hex number with a leading 0x per line with the rest of the line treated as a comment. For example, a line might look like
For 8-byte literal sections, the order file line consists of two 32-bit hexadecimal numbers per line separated by white space each with a leading 0x, with the rest of the line treated as a comment. For example, a line might look like:
0x3ff00000 0x00000000 (1.00000000000000000000e+00)
For literal pointer sections, the format of the lines in the order file represents the pointers, one per line. A literal pointer is represented by the segment name, the section name of the literal pointer, and then the literal itself. These are separated by colons with no extra white space. For example, a line might look like:
For all the literal sections, each line in the order file is simply entered into the literal section and appears in the output file in the order of the order file. No check is made to see if the literal is in the loaded objects.
To reorder a literal section, first create a “whatsloaded” file using the
-whatsloaded option as described in section “Creating a Default Order File.” Then, run
otool with the appropriate options, segment and section names, and filenames. The output of
otool is a default order file for the specified section. For example, the following command line produces an order file listing the default load order for the
__cstring section in the file
otool -X -v -s __TEXT __cstring `cat whatsloaded` > cstring_order
Once you’ve created the file
cstring_order, you can edit the file and rearrange its entries to optimize locality of reference. For example, you can place literal strings used most frequently by your program (such as labels that appear in your user interface) at the beginning of the file. To produce the desired load order in the executable, use the following command:
cc -o hello hello.o -sectorder __TEXT __cstring cstring_order
Reordering Data Sections
There are currently no tools to measure code references to data symbols. However, you might know a program’s data-referencing patterns and might be able to get some savings by separating data for seldom-used features from other data. One way to approach
__data section reordering is to sort the data by size so small data items end up on as few pages as possible. For example, if a larger data item is placed across two pages with two small items sharing each of these pages, the larger item must be paged in to access the smaller items. Reordering the data by size can minimize this sort of inefficiency. Because this data would normally need to be written to the virtual-memory backing store, this could be a major savings in some programs.
To reorder the
__data section, first create an order file listing source files and symbols in the order in which you want them linked (order file entries are described at the beginning of “Generating Order Files”). Then, link the program using the
-sectorder command-line option:
cc -o outputFile inputFile.o
-sectorder __DATA __data orderFile
To use an order file with a Xcode project, modify the “Other Linker Flags” option in the Deployment build style of your project. Add the text
-sectorder __DATA __data orderFile to this setting to specify your order file.
Reordering Assembly Language Code
Some additional guidelines to keep in mind when reordering routines coded in assembly language:
temporary labels in assembly code
Within hand-coded assembly code, be careful of branches to temporary labels that branch over a non temporary label. For example, if you use a label that starts with “L” or a d label (where d is a digit), as in this example
foo: b 1f
The resulting program won’t link or execute correctly, because only the symbols
barmake it into the object file’s symbol table. References to the temporary label
1are compiled as offsets; as a result, no relocation entry is generated for the instruction
b 1f. If the linker does not place the block associated with the symbol
bardirectly after that associated with
foo, the branch to
1fwill not go to the correct place. Because there is no relocation entry, the linker doesn’t know to fix up the branch. The source-code change to fix this problem is to change the label
1to a non temporary label (
bar1for example). You can avoid problems with object files containing hand-coded assembly code by linking them whole, without reordering.
If the specified section in any input file has a non-zero size and there is no symbol with the value of the beginning of its section, the linker uses the pseudo symbol
.section_startas the symbol name it associates with the first block in the section. The purpose of this symbol is to deal with literal constants whose symbols do not persist into the object file. Because literal strings and floating-point constants are in literal sections, this not a problem for Apple compilers. You might see this symbol used by assembly-language programs or non-Apple compilers. However, you should not reorder such code and you should instead link the file whole, without reordering (see“Linking with an Order File”).