Overview of the Mach-O Executable Format
Mach-O is the native executable format of binaries in OS X and is the preferred format for shipping code. An executable format determines the order in which the code and data in a binary file are read into memory. The ordering of code and data has implications for memory usage and paging activity and thus directly affects the performance of your program.
A Mach-O binary is organized into segments. Each segment can contain one or more sections, and code or data of different types goes into different sections. Segments always start on a page boundary, but sections are not necessarily page-aligned. The size of a segment is the total number of bytes in all the sections it contains, rounded up to the next virtual memory page boundary. Thus, on systems that use 4 KB pages, a segment is always a multiple of 4096 bytes, with 4096 bytes being the minimum size. (On systems with larger pages, such as Apple silicon Macs with 16 KB pages, segments are rounded up to that larger boundary instead.)
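The rounding rule above can be sketched in C. This is a minimal model that assumes a 4096-byte page and ignores the alignment padding the linker may insert between sections. `round_up_to_page` and `segment_vm_size` are hypothetical helpers, not part of any Apple API. On a real system, query the page size with `sysconf(_SC_PAGESIZE)` rather than hard-coding it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; real code should ask the system. */
#define ASSUMED_PAGE_SIZE 4096u

/* Round n up to the next multiple of page (page must be a power of two). */
static uint64_t round_up_to_page(uint64_t n, uint64_t page)
{
    return (n + page - 1) & ~(page - 1);
}

/* Hypothetical helper: the virtual size of a segment whose sections have
   the given byte sizes. Alignment padding between sections is ignored. */
static uint64_t segment_vm_size(const uint64_t *section_sizes, size_t count,
                                uint64_t page)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += section_sizes[i];

    uint64_t size = round_up_to_page(total, page);
    return size < page ? page : size; /* never smaller than one page */
}
```

For example, sections of 5000, 300, and 12 bytes total 5312 bytes, which rounds up to a two-page (8192-byte) segment with 4 KB pages, or a single 16384-byte page with 16 KB pages.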
The segments and sections of a Mach-O executable are named according to their intended use. The convention for segment names is to use all-uppercase letters preceded by double underscores (for example, __TEXT); the convention for section names is to use all-lowercase letters preceded by double underscores (for example, __text).
There are several possible segments within a Mach-O executable, but only two of them are of interest in relation to performance: the __TEXT segment and the __DATA segment.
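To see which segments and sections an executable actually contains, you can walk its load commands. The sketch below assumes a 64-bit Mach-O build on an Apple platform and uses `_dyld_get_image_header` from `<mach-o/dyld.h>` together with the structures in `<mach-o/loader.h>`; it relies on image index 0 being the main executable, which is dyld's convention. `find_segment` is a hypothetical name, and on other platforms the function simply returns -1. The `otool -l` command prints the same load commands from the command line.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#if defined(__APPLE__) && defined(__LP64__)
#include <mach-o/dyld.h>
#include <mach-o/loader.h>
#endif

/* Walks the load commands of the running executable, printing each segment
   and its sections when print is nonzero. Returns 1 if a segment named want
   exists, 0 if not, and -1 when not built as a 64-bit Mach-O program. */
static int find_segment(const char *want, int print)
{
#if defined(__APPLE__) && defined(__LP64__)
    const struct mach_header_64 *mh =
        (const struct mach_header_64 *)_dyld_get_image_header(0);
    if (mh == NULL || mh->magic != MH_MAGIC_64)
        return 0;

    int found = 0;
    /* Load commands start immediately after the Mach-O header. */
    const struct load_command *lc = (const struct load_command *)(mh + 1);
    for (uint32_t i = 0; i < mh->ncmds; i++) {
        if (lc->cmd == LC_SEGMENT_64) {
            const struct segment_command_64 *seg =
                (const struct segment_command_64 *)lc;
            /* Names are fixed 16-byte fields, not always NUL-terminated. */
            if (strncmp(seg->segname, want, sizeof seg->segname) == 0)
                found = 1;
            if (print) {
                printf("%.16s\n", seg->segname);
                /* Section headers follow the segment command directly. */
                const struct section_64 *sect =
                    (const struct section_64 *)(seg + 1);
                for (uint32_t j = 0; j < seg->nsects; j++)
                    printf("    %.16s\n", sect[j].sectname);
            }
        }
        lc = (const struct load_command *)((const char *)lc + lc->cmdsize);
    }
    return found;
#else
    (void)want;
    (void)print;
    return -1;
#endif
}
```

Calling `find_segment("__TEXT", 1)` from `main` on a Mac prints each segment name followed by its indented section names and returns 1.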
The __TEXT Segment: Read Only
The __TEXT segment is a read-only area containing executable code and constant data. By convention, the compiler tools create every executable file with at least one read-only __TEXT segment. Because the segment is read-only, the kernel can map the __TEXT segment directly from the executable into memory just once. Once the segment is mapped into memory, it can be shared among all processes interested in its contents. (This is primarily the case with frameworks and other shared libraries.) The read-only attribute also means that the pages that make up the __TEXT segment never have to be saved to backing store. If the kernel needs to free up physical memory, it can discard one or more __TEXT pages and re-read them from disk when they are needed.
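The kind of file-backed, read-only mapping described above can be approximated from user space with `mmap`. This sketch maps a file with `PROT_READ` and `MAP_PRIVATE`, similar in spirit to how the kernel maps a __TEXT segment; it is an analogy, not the loader's actual code. The function name `first_byte_of_mapping` is hypothetical, and the page-discarding behavior happens inside the kernel, so it is described in comments rather than observable from this code.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Maps the file at path read-only and returns its first byte, or -1 on
   error. The mapped pages are clean and backed by the file, so under memory
   pressure the kernel can drop them and fault them back in from the file
   later; they never need to be written to swap. */
int first_byte_of_mapping(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t len = lseek(fd, 0, SEEK_END);
    if (len <= 0) {
        close(fd);
        return -1;
    }

    /* PROT_READ only: any store through p would fault. */
    const unsigned char *p =
        mmap(NULL, (size_t)len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    int first = p[0];
    munmap((void *)p, (size_t)len);
    return first;
}
```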
Table 1 lists some of the more important sections that can appear in the __TEXT segment. For a complete list of sections, see Mach-O Runtime Architecture.

Table 1  Important sections in the __TEXT segment
Section      Description
__text       The compiled machine code for the executable
__const      The general constant data for the executable
__cstring    Literal string constants (quoted strings in source code)
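The following C fragment shows the kinds of declarations that typically end up in each section of Table 1. Treat the placement comments as assumptions: the exact section depends on the compiler version, optimization level, and whether the data contains pointers (constants that hold pointers need relocation and usually land in a __DATA-family segment instead). Running `otool -l` on the built binary shows what the toolchain actually did. All function and variable names are illustrative.

```c
#include <stddef.h>

/* Typically __TEXT,__const: constant data that contains no pointers. */
static const int kPrimes[] = {2, 3, 5, 7, 11};

/* Typically __TEXT,__text: the compiled machine code of each function. */
static int sum_table(const int *table, size_t count)
{
    int sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += table[i];
    return sum;
}

int sum_primes(void)
{
    return sum_table(kPrimes, sizeof kPrimes / sizeof kPrimes[0]);
}

/* The characters of the quoted literal typically go in __TEXT,__cstring. */
const char *greeting(void)
{
    return "hello";
}
```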
The __DATA Segment: Read/Write
The __DATA segment contains the non-constant data for an executable. This segment is both readable and writable. Because it is writable, the __DATA segment of a framework or other shared library is logically copied for each process linking with the library. To avoid copying pages that are never modified, the kernel marks readable and writable pages copy-on-write. This technique defers copying a page until one of the processes sharing that page attempts to write to it. When that happens, the kernel creates a private copy of the page for that process.
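Copy-on-write can be observed indirectly with `fork`, which, much like two processes sharing a library's __DATA pages, starts parent and child on the same physical pages. This is an analogy under stated assumptions rather than a reproduction of framework loading: the child's write lands on a private copy, so the parent's value is unchanged. `cow_demo` is a hypothetical name, and the variable is `volatile` only so that the compiler cannot optimize the child's write away.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A writable global; in a Mach-O binary it lives in __DATA,__data.
   volatile only stops the compiler from discarding the child's write. */
static volatile int counter = 41;

/* Returns the value of counter seen by the parent after a child process
   has overwritten it, or -1 on error. Expected result: 41. */
int cow_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this store makes the kernel give the child a private
           copy of the page; the parent's page is untouched. */
        counter = 1000;
        _exit(counter == 1000 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return counter; /* the parent still sees its own, unmodified copy */
}
```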
The __DATA segment has a number of sections, some of which are used only by the dynamic linker. Table 2 lists some of the more important sections that can appear in the __DATA segment. For a complete list of sections, see Mach-O Runtime Architecture.
Mach-O Performance Implications
The composition of the __TEXT and __DATA segments of a Mach-O executable file has a direct bearing on performance. The techniques and goals for optimizing these segments are different, but they share a common goal: greater efficiency in the use of memory.
Most of a typical Mach-O file consists of executable code, which occupies the __text section. As noted in “The __TEXT Segment: Read Only,” the __TEXT segment is read-only and is mapped directly to the executable file. Thus, if the kernel needs to reclaim the physical memory occupied by some __text pages, it does not have to save the pages to backing store and page them in later. It only needs to free up the memory and, when the code is later referenced, read it back in from disk. Although this is cheaper than swapping, because it involves one disk access instead of two, it can still be expensive, especially if many pages have to be recreated from disk.
One way to improve this situation is to improve your code’s locality of reference through procedure reordering, as described in “Improving Locality of Reference.” This technique groups methods and functions together based on the order in which they are executed, how often they are called, and the frequency with which they call one another. If pages in the __text section group functions logically in this way, those pages are less likely to be freed and read back in multiple times. For example, if you put all of your launch-time initialization functions on one or two pages, the pages do not have to be recreated after those initializations have occurred.
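The effect of procedure reordering on paging can be modeled with a little arithmetic. The sketch below is a simplified model, not the linker's algorithm: it assumes 4096-byte pages, lays functions out back to back in a chosen order, and counts how many distinct pages hold at least one launch-time ("hot") function. `pages_touched` and the sample sizes are hypothetical. On OS X the actual reordering is performed by the linker from an order file (the `-order_file` option of `ld`).

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

/* Lays out n functions back to back in __text in the sequence given by
   order (a permutation of 0..n-1). sizes[f] is the size in bytes of
   function f (must be nonzero) and hot[f] is nonzero if f runs at launch.
   Returns the number of distinct pages that contain launch-time code. */
size_t pages_touched(const unsigned *sizes, const int *hot,
                     const size_t *order, size_t n)
{
    size_t offset = 0;
    size_t count = 0;
    size_t last_counted = (size_t)-1; /* no page counted yet */

    for (size_t i = 0; i < n; i++) {
        size_t f = order[i];
        size_t first_page = offset / MODEL_PAGE_SIZE;
        size_t last_page = (offset + sizes[f] - 1) / MODEL_PAGE_SIZE;

        if (hot[f]) {
            /* Layout is sequential, so pages are visited in increasing
               order; a page differs from the last one counted iff it is new. */
            for (size_t p = first_page; p <= last_page; p++) {
                if (p != last_counted) {
                    count++;
                    last_counted = p;
                }
            }
        }
        offset += sizes[f];
    }
    return count;
}
```

With six 2048-byte functions where functions 0, 2, and 4 run at launch, the original order touches 3 pages, while the order 0, 2, 4, 1, 3, 5 packs the launch code into 2 pages.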
Unlike the __TEXT segment, the __DATA segment can be written to, and thus pages in the __DATA segment stop being shareable as soon as a process writes to them. The non-constant global variables in frameworks can have an impact on performance because each process that links with the framework gets its own copy of these variables. The main solution to this problem is to move as many of the non-constant global variables as possible to the __const section by declaring them const. “Reducing Shared Memory Pages” describes this and related techniques. This is not usually a problem for applications because the __DATA segment in an application is not shared with other applications.
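The difference that `const` makes can be sketched in C. This assumes a framework built as position-independent code; the placement comments reflect typical Apple toolchain behavior and are not guaranteed. Note that an array of string pointers needs `const` in two places (`const char *const`) before the array itself becomes read-only. All names are illustrative.

```c
#include <stddef.h>

/* Writable: lands in __DATA,__data, so each process that links the
   framework ends up with its own copy of this page once it writes to it. */
static int g_scale = 3;

/* Read-only and pointer-free: can go in a constant section and be shared. */
static const int kWeights[] = {1, 2, 4, 8};

/* `const char *names[]` would leave the array itself writable (__DATA).
   The second const makes the array read-only too; because it holds
   pointers that need relocation at load time, the toolchain typically
   places it in a __DATA-family const section rather than in __TEXT. */
static const char *const kLevelNames[] = {"low", "mid", "high"};

int weighted_total(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof kWeights / sizeof kWeights[0]; i++)
        total += kWeights[i] * g_scale;
    return total;
}

const char *level_name(size_t i)
{
    return i < sizeof kLevelNames / sizeof kLevelNames[0] ? kLevelNames[i]
                                                          : NULL;
}
```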
The compiler stores different types of non-constant global data in different sections of the __DATA segment. Two of these types are uninitialized static data and symbols consistent with the ANSI C notion of “tentative definition” that aren’t declared extern. Uninitialized static data is in the __bss section of the __DATA segment. Tentative-definition symbols are in the __common section of the __DATA segment.
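The declarations below illustrate the two categories. The placement comments assume a compiler invoked with `-fcommon`; GCC 10 and later and upstream clang 11 and later default to `-fno-common`, which places tentative definitions in __bss as well. All names are illustrative.

```c
/* Placement comments assume -fcommon. Under -fno-common, tentative
   definitions are emitted into __bss like other zero-filled data. */

static int uninit_static;     /* uninitialized static data: __DATA,__bss */
int tentative_total;          /* tentative definition, not extern: __DATA,__common */
extern int defined_elsewhere; /* declaration only: allocates no storage here */

int zero_demo(void)
{
    /* C zero-initializes both variables, whichever section holds them. */
    return uninit_static + tentative_total;
}
```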
The ANSI C and C++ standards specify that the system must set uninitialized static variables to zero. (Other types of uninitialized data are left uninitialized.) Because uninitialized static variables and tentative-definition symbols are stored in separate sections, the system needs to treat them differently. But when variables are in different sections, they are more likely to end up on different memory pages and thus can be swapped in and out separately, making your code run slower. The solution to these problems, as described in “Reducing Shared Memory Pages,” is to consolidate the non-constant global data in one section of the