Understanding Mach-O Symbols

This post collects together a bunch of information about the symbols found in a Mach-O file.

It assumes the terminology defined in An Apple Library Primer. If you’re unfamiliar with a term used here, look there for the definition.

If you have any questions or comments about this, start a new thread in the Developer Tools & Services > General topic area and tag it with Linker.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"


Understanding Mach-O Symbols

Every Mach-O file has a symbol table. This symbol table has many different uses:

  • During development, it’s written by the compiler.

  • And both read and written by the linker.

  • And various other tools.

  • During execution, it’s read by the dynamic linker.

  • And also by various APIs, most notably dlsym.

The symbol table is an array of entries. The format of each entry is very simple, but they have been used and combined in various creative ways to achieve a wide range of goals. For example:

  • In a Mach-O object file, there’s an entry for each symbol exported to the linker.

  • In a Mach-O image, there’s an entry for each symbol exported to the dynamic linker.

  • And an entry for each symbol imported from dynamic libraries.

  • Some entries hold information used by the debugger. See Debug Symbols, below.

Examining the Symbol Table

There are numerous tools to view and manipulate the symbol table, including nm, dyld_info, symbols, strip, and nmedit. Each of these has its own man page.

A good place to start is nm:

% nm Products/Debug/TestSymTab
                 U ___stdoutp
0000000100000000 T __mh_execute_header
                 U _fprintf
                 U _getpid
0000000100003f44 T _main
0000000100008000 d _tDefault
0000000100003ecc T _test
0000000100003f04 t _testHelper

Note In the examples in this post, TestSymTab is a Mach-O executable that’s formed by linking two Mach-O object files, main.o and TestCore.o.

There are three columns here, and the second is the most important. It’s a single letter indicating the type of the entry. For example, T is a code symbol (in Unix parlance, code is in the text segment), D is a data symbol, and so on. An uppercase letter indicates that the symbol is visible to the linker; a lowercase letter indicates that it’s internal.

An undefined (U) symbol has two potential meanings:

  • In a Mach-O image, the symbol is typically imported from a specific dynamic library. The dynamic linker connects this import to the corresponding exported symbol of the dynamic library at load time.

  • In a Mach-O object file, the symbol is undefined. In most cases the linker will try to resolve this symbol at link time.

Note The above is a bit vague because there are numerous edge cases in how the system handles undefined symbols. For more on this, see Undefined Symbols, below.

The first column in the nm output is the address associated with the entry, or blank if an address is not relevant for this type of entry. For a Mach-O image, this address is based on the load address, so the actual address at runtime is offset by the slide. See An Apple Library Primer for more about those concepts.

The third column is the name for this entry. These names have a leading underscore because that’s the standard name mangling for C. See An Apple Library Primer for more about name mangling.

The nm tool has a lot of formatting options. The ones I use the most are:

  • -m — This prints more information about each symbol table entry. For example, if a symbol is imported from a dynamic library, this prints the library name. For a concrete example, see A Deeper Examination below.

  • -a — This prints all the entries, including debug symbols. We’ll come back to that in the Debug Symbols section, below.

  • -p — By default nm sorts entries by name. This disables that sort, causing nm to print the entries in the order in which they occur in the symbol table.

  • -x — This outputs entries in a raw format, which is great when you’re trying to understand what’s really going on. See Raw Symbol Information, below, for an example of this.

A Deeper Examination

To get more information about each symbol table entry, run nm with the -m option:

% nm -m Products/Debug/TestSymTab 
                 (undefined) external ___stdoutp (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
                 (undefined) external _fprintf (from libSystem)
                 (undefined) external _getpid (from libSystem)
0000000100003f44 (__TEXT,__text) external _main
0000000100008000 (__DATA,__data) non-external _tDefault
0000000100003ecc (__TEXT,__text) external _test
0000000100003f04 (__TEXT,__text) non-external _testHelper

This contains a world of extra information about each entry. For example:

  • You no longer have to remember cryptic single letter codes. Instead of U, you get undefined.

  • If the symbol is imported from a dynamic library, it gives the name of that dynamic library. Here we see that _fprintf is imported from the libSystem library.

  • It surfaces additional, more obscure information. For example, the referenced dynamically flag is a flag used by the linker to indicate that a symbol is… well… referenced dynamically, and thus shouldn’t be dead stripped.

Undefined Symbols

Mach-O’s handling of undefined symbols is quite complex. To start, you need to draw a distinction between the linker (aka the static linker) and the dynamic linker.

Undefined Symbols at Link Time

The linker takes a set of files as its input and produces a single file as its output. The input files can be Mach-O object files or dynamic libraries [1]. The output file is typically a Mach-O image [2]. The goal of the linker is to merge the object files, resolving any undefined symbols used by those object files, and create the Mach-O image.

There are two standard ways to resolve an undefined symbol:

  • To a symbol exported by another Mach-O object file

  • To a symbol exported by a dynamic library

In the first case, the undefined symbol disappears in a puff of linker magic. In the second case, the linker records that the generated Mach-O image depends on that dynamic library [3] and adds a symbol table entry for that specific symbol. That entry is also shown as undefined, but it now indicates the library that the symbol is being imported from.

This is the core of the two-level namespace. A Mach-O image that imports a symbol records both the symbol name and the library that exports the symbol.

The above describes the standard ways used by the linker to resolve symbols. However, there are many subtleties here. The most radical is the flat namespace. That’s out of scope for this post, because it’s a really bad option for the vast majority of products. However, if you’re curious, the ld man page has some info about how symbol resolution works in that case.

A more interesting case is the -undefined dynamic_lookup option. This represents a halfway house between the two-level namespace and the flat namespace. When you link a Mach-O image with this option, the linker resolves any undefined symbols by adding a dynamic lookup undefined entry to the symbol table. At load time, the dynamic linker attempts to resolve that symbol by searching all loaded images. This is useful if your software works on other Unix-y platforms, where a flat namespace is the norm. It can simplify your build system without going all the way to the flat namespace.

Of course, if you use this facility and there are multiple libraries that export that symbol, you might be in for a surprise!

[1] These days it’s more common for the build system to pass a stub library (.tbd) to the linker. The effect is much the same as passing in a dynamic library. In this discussion I’m sticking with the old mechanism, so just assume that I mean dynamic library or stub library.

If you’re unfamiliar with the concept of a stub library, see An Apple Library Primer.

[2] The linker can also merge the object files together into a single object file, but that’s a relatively uncommon operation. For more on that, see the discussion of the -r option in the ld man page.

[3] It adds an LC_LOAD_DYLIB load command with the install name from the dynamic library. See Dynamic Library Identification for more on that.

Undefined Symbols at Load Time

When you load a Mach-O image the dynamic linker is responsible for finding all the libraries it depends on, loading them, and connecting your imports to their exports. In the typical case the undefined entry in your symbol table records the symbol name and the library that exports the symbol. This allows the dynamic linker to quickly and unambiguously find the correct symbol. However, if the entry is marked as dynamic lookup [1], the dynamic linker will search all loaded images for the symbol and connect your library to the first one it finds.

If the dynamic linker is unable to find a symbol, its default behaviour is to fail the load of the Mach-O image. This changes if the symbol is a weak reference. In that case, the dynamic linker continues to load the image but sets the address of the symbol to NULL. See Weak vs Weak vs Weak, below, for more about this.

[1] In this case nm shows the library name as dynamically looked up.

Weak vs Weak vs Weak

Mach-O supports two different types of weak symbols:

  • Weak references (aka weak imports)

  • Weak definitions

IMPORTANT If you use the term weak without qualification, the meaning depends on your audience. App developers tend to assume that you mean a weak reference whereas folks with a C++ background tend to assume that you mean a weak definition. It’s best to be specific.

Weak References

Weak references support the availability mechanism on Apple platforms. Most developers build their apps with the latest SDK and specify a deployment target, that is, the oldest OS version on which their app runs. Within the SDK, each declaration is annotated with the OS version that introduced that symbol [1]. If the app uses a symbol introduced later than its deployment target, the compiler flags that import as a weak reference. The app is then responsible for not using the symbol if it’s run on an OS release where it’s not available.

For example, consider this snippet:

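#include <stdio.h>
#include <xpc/xpc.h>

void testWeakReference(void) {
    // Prints the address of the function, or a null pointer if the weak
    // reference wasn’t resolved at load time.
    printf("%p\n", (void *)xpc_listener_set_peer_code_signing_requirement);
}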

The xpc_listener_set_peer_code_signing_requirement function is declared like so:

API_AVAILABLE(macos(14.4))
…
int
xpc_listener_set_peer_code_signing_requirement(…);

The API_AVAILABLE macro indicates that the symbol was introduced in macOS 14.4. If you build this code with the deployment target set to macOS 13, the symbol is marked as a weak reference:

% nm -m Products/Debug/TestWeakRefC
…
                 (undefined) weak external _xpc_listener_set_peer_code_signing_requirement (from libSystem)

If you run the above program on macOS 13, it’ll print NULL (actually 0x0).

Without support for weak references, the dynamic linker on macOS 13 would fail to load the program because the _xpc_listener_set_peer_code_signing_requirement symbol is unavailable.

[1] In practice most of the SDK’s declarations don’t have availability annotations because they were introduced before the minimum deployment target supported by that SDK.

Weak Definitions

Weak references are about imports. Weak definitions are about exports. A weak definition allows you to export a symbol from multiple images. The dynamic linker coalesces these symbol definitions. Specifically:

  • The first time it loads a library with a given weak definition, the dynamic linker makes it the primary.

  • It registers that definition such that all references to the symbol resolve to it. This registration occurs in a namespace dedicated to weak definitions. That namespace is flat.

  • Any subsequent definitions of that symbol are ignored.

Weak definitions are weird, but they’re necessary to support C++’s One Definition Rule in a dynamically linked environment.

IMPORTANT Weak definitions are not just weird, but also inefficient. Avoid them where you can. To flush out any unexpected weak definitions, pass the -warn_weak_exports option to the static linker.

The easiest way to create a weak definition is with the weak attribute:

__attribute__((weak))
void testWeakDefinition(void) {
}

IMPORTANT The C++ compiler can generate weak definitions without weak ever appearing in your code.

This shows up in nm like so:

% nm -m Products/Debug/TestWeakDefC
…
0000000100003f40 (__TEXT,__text) weak external _testWeakDefinition
…

The output is quite subtle. A symbol flagged as weak external is either a weak reference or a weak definition depending on whether it’s undefined or not. For clarity, use dyld_info instead:

% dyld_info -imports -exports Products/Debug/TestWeakRefC 
Products/Debug/TestWeakRefC [arm64]:
    …
    -imports:
      …
      0x0001  _xpc_listener_set_peer_code_signing_requirement [weak-import] (from libSystem)
% dyld_info -imports -exports Products/Debug/TestWeakDefC 
Products/Debug/TestWeakDefC [arm64]:
    -exports:
        offset      symbol
        …
        0x00003F40  _testWeakDefinition [weak-def]
        …
    …

Here, weak-import indicates a weak reference and weak-def a weak definition.

Weak Library

There’s one final confusing use of the term weak, that is, weak libraries. A Mach-O image includes a list of imported libraries and a list of symbols along with the libraries they’re imported from. If an image references a library that’s not present, the dynamic linker will fail to load the image even if all the symbols it references in that library are weak references.

To get around this you need to mark the library itself as weak. If you’re using Xcode it will often do this for you automatically. If it doesn’t, mark the library as optional in the Link Binary with Libraries build phase.

Use otool to see whether a library is required or optional. For example, this shows an optional library:

% otool -L Products/Debug/TestWeakRefC
Products/Debug/TestWeakRefC:
    /usr/lib/libEndpointSecurity.dylib (… 511.60.5, weak)
    …

In the non-optional case, there’s no weak indicator:

% otool -L Products/Debug/TestWeakRefC
Products/Debug/TestWeakRefC:
    /usr/lib/libEndpointSecurity.dylib (… 511.60.5)
    …

Debug Symbols

or Why the DWARF still stabs. (-:

Historically, all debug information was stored in symbol table entries, using a format known as stabs. This format is now obsolete, having been largely replaced by DWARF. However, stabs symbols are still used for some specific roles.

Note See <mach-o/stab.h> and the stab man page for more about stabs on Apple platforms. See stabs and DWARF for general information about these formats.

In DWARF, debug symbols aren’t stored in the symbol table. Rather, debug information is stored in various __DWARF sections. For example:

% otool -l Intermediates.noindex/TestSymTab.build/Debug/TestSymTab.build/Objects-normal/arm64/TestCore.o | grep __DWARF -B 1
  sectname __debug_abbrev
   segname __DWARF
…

The compiler inserts this debug information into the Mach-O object file that it creates. Eventually this Mach-O object file is linked into a Mach-O image. At that point one of two things happens, depending on the Debug Information Format build setting.

During day-to-day development, set Debug Information Format to DWARF. When the linker creates a Mach-O image from a bunch of Mach-O object files, it doesn’t do anything with the DWARF information in those objects. Rather, it records references to the source object files into the final image. This is super quick.

When you debug that Mach-O image, the debugger finds those references and uses them to locate the DWARF information in the original Mach-O object files.

Each reference is stored in a stabs OSO symbol table entry. To see them, run nm with the -a option:

% nm -a Products/Debug/TestSymTab
…
0000000000000000 - 00 0001   OSO …/Intermediates.noindex/TestSymTab.build/Debug/TestSymTab.build/Objects-normal/arm64/TestCore.o
0000000000000000 - 00 0001   OSO …/Intermediates.noindex/TestSymTab.build/Debug/TestSymTab.build/Objects-normal/arm64/main.o
…

Given the above, the debugger knows to look for DWARF information in TestCore.o and main.o. And notably, the executable does not contain any DWARF sections:

% otool -l Products/Debug/TestSymTab | grep __DWARF -B 1     
% 

When you build your app for distribution, set Debug Information Format to DWARF with dSYM File. The executable now contains no DWARF information:

% otool -l Products/Release/TestSymTab | grep __DWARF -B 1
% 

Xcode runs the dsymutil tool to collect the DWARF information, organise it, and export a .dSYM file. This is actually a document package, within which is a Mach-O dSYM companion file:

% find Products/Release/TestSymTab.dSYM 
Products/Release/TestSymTab.dSYM
Products/Release/TestSymTab.dSYM/Contents
…
Products/Release/TestSymTab.dSYM/Contents/Resources/DWARF
Products/Release/TestSymTab.dSYM/Contents/Resources/DWARF/TestSymTab
…
% file Products/Release/TestSymTab.dSYM/Contents/Resources/DWARF/TestSymTab
Products/Release/TestSymTab.dSYM/Contents/Resources/DWARF/TestSymTab: Mach-O 64-bit dSYM companion file arm64

That file contains a copy of the DWARF information from all the original Mach-O object files, optimised for use by the debugger:

% otool -l Products/Release/TestSymTab.dSYM/Contents/Resources/DWARF/TestSymTab | grep __DWARF -B 1 
…
  sectname __debug_line
   segname __DWARF
…

Raw Symbol Information

As described above, each Mach-O file has a symbol table that’s an array of symbol table entries. The structure of each entry is defined by the declarations in <mach-o/nlist.h> [1]. While there is an nlist man page, the best documentation for this format is the comments in the header itself.

Note The term nlist stands for name list and dates back to truly ancient versions of Unix.

Each entry is represented by an nlist_64 structure (nlist for 32-bit Mach-O files) with five fields:

  • n_strx ‘points’ to the string for this entry.

  • n_type encodes the entry type. This is actually split up into four subfields, as discussed below.

  • n_sect is the section number for this entry.

  • n_desc is additional information.

  • n_value is the address of the symbol.

The four fields within n_type are N_STAB (3 bits), N_PEXT (1 bit), N_TYPE (3 bits), and N_EXT (1 bit).

To see these raw values, run nm with the -x option:

% nm -a -x Products/Debug/TestSymTab                                                               
…
0000000000000000 01 00 0300 00000036 _getpid
0000000100003f44 24 01 0000 00000016 _main
0000000100003f44 0f 01 0000 00000016 _main
…

This prints a column for n_value, n_type, n_sect, n_desc, and n_strx. The last column is the string you get when you follow the ‘pointer’ in n_strx.

The mechanism used to encode all the necessary info into these fields is both complex and arcane. For the details, see the comments in <mach-o/nlist.h> and <mach-o/stab.h>. However, just to give you a taste:

  • The entry for _getpid has an n_type field with just the N_EXT flag set, indicating that this is an external symbol. The N_TYPE subfield is N_UNDF and the n_sect field is 0 (NO_SECT), indicating that the symbol is undefined and so isn’t in any section. And n_desc is 0x0300, with the top byte indicating that the symbol is imported from the third dynamic library.

  • The first entry for _main has an n_type field set to N_FUN, indicating a stabs function symbol. For this type of entry the n_desc field is defined to hold a line number; in this output it’s 0.

  • The second entry for _main has an n_type field with N_TYPE set to N_SECT and the N_EXT flag set, indicating a symbol exported from a section. In this case the section number is 1, that is, the text section.

[1] There is also an <nlist.h> header that defines an API that returns the symbol table. The difference between <nlist.h> and <mach-o/nlist.h> is that the former defines an API whereas the latter defines the Mach-O on-disk format. Don’t include both; that won’t end well!
