Reduce dyld overhead

I'm working on a command-line tool and trying to make it as fast as possible. I ran it under Instruments' Processor Trace (really cool tool, by the way, thanks for that) and found that most of its run time is actually spent in dyld, specifically in dyld4::prepare(dyld4::APIs&, mach_o::Header const*). Out of a total run time of 1.27 ms, my code takes only 34.17 μs, or about 2.7%; that's a LOT of overhead!

I re-ran my binary with the dyld Activity instrument added to the mix, and it showed that the biggest identified chunk of time that dyld spends during process startup is "Run static initializer" for libSystem, though most of dyld's time is unaccounted for and just labelled generically as "Launch Executable".
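
For context, a "static initializer" is code that the loader runs on an image's behalf before main is entered. Here is a minimal sketch in C, assuming a compiler that supports the constructor attribute (Clang and GCC both do). The names initializer_ran, example_initializer, and initializer_has_run are hypothetical, and libSystem's real initializer does far more work than this:

```c
/* Hypothetical flag, set by the initializer below. */
static int initializer_ran = 0;

/* The constructor attribute asks the toolchain to register this function
   as an initializer for the image. The loader (dyld on Apple platforms)
   calls it before main() starts. */
__attribute__((constructor))
static void example_initializer(void) {
    initializer_ran = 1;
}

/* Reports whether the initializer has already run. */
int initializer_has_run(void) {
    return initializer_ran;
}
```

Calling initializer_has_run() as the first statement of main should return 1, because the initializer has already executed by then. Any work placed in such a function is paid on every launch, before your own main gets a chance to run.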

Obviously I can't modify libSystem on my users' systems, so is there anything I can do to reduce this overhead? Maybe some way to promise that I won't use the Obj-C runtime, so that it doesn't need setting up, or something?

Answered by DTS Engineer in 873209022

that's a LOT of overhead!

In relative terms, sure. But it’s also a short amount of total time, so it doesn’t give you a lot of space to optimise.

Anyway, there are a bunch of things you can do on this front, but there are also certain fundamental limits to how fast a given process can launch.

If you use C to create a small ‘hello world’ project, how does its performance compare to that of your tool?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

I made a "true" clone, just returning 0 immediately from main. It looks pretty much the same in terms of dyld setup time: 893 μs for "Launch Executable", of which 454 μs is "Run static initializer" for libSystem.

OK. I don’t think you can optimise beyond that. Rather, I think you should use that as a baseline for performance work as you evolve your program.
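
One rough way to record such a baseline outside Instruments is to time repeated launches of the empty program and compare later builds against that figure. The sketch below is an illustration under stated assumptions, not a recommended methodology: average_launch_us is a hypothetical helper, and it measures the whole spawn-to-exit wall-clock time (including kernel process creation and teardown), so its numbers will be larger than the dyld intervals that Instruments reports.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/* Hypothetical helper: spawns argv[0] with the given arguments `runs`
   times, waits for each child to exit, and returns the average
   wall-clock time per launch in microseconds.
   Returns -1.0 if runs <= 0 or if spawning or waiting fails. */
double average_launch_us(char *const argv[], int runs) {
    if (runs <= 0) {
        return -1.0;
    }

    double total_ns = 0.0;
    for (int i = 0; i < runs; i++) {
        struct timespec start, end;
        pid_t pid;
        int status;

        clock_gettime(CLOCK_MONOTONIC, &start);
        if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0) {
            return -1.0;
        }
        if (waitpid(pid, &status, 0) < 0) {
            return -1.0;
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        total_ns += (double)(end.tv_sec - start.tv_sec) * 1e9
                  + (double)(end.tv_nsec - start.tv_nsec);
    }

    /* Convert the average from nanoseconds to microseconds. */
    return total_ns / runs / 1000.0;
}
```

For example, passing an argv of { "./true-clone", NULL } (a hypothetical path to the empty program) with a few hundred runs gives a floor for launch cost on that machine. Running the same measurement against the real tool then shows how much the tool itself adds on top of that floor.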

Speaking of that, An Apple Library Primer has links to various WWDC talks where the linker team discusses various topics. Most notably, the 2022 talk discusses launch times. It’s well worth a watch.

Finally, just for context, the libSystem initialiser does a bunch of really critical stuff. For example, it has the code that sets up the App Sandbox, if the executable enables it.

If you want to see this initialiser in action, open your true clone project, set a symbolic breakpoint on libSystem_initializer, and run it from Xcode. When you stop at the breakpoint, Xcode will show a page of disassembled code, but that’s not too hard to understand. And most of the symbols are present, so you can look up the source code in Darwin.

IMPORTANT The Darwin open source isn’t guaranteed to match the source used to build the OS, but it’s usually close enough to be quite instructive.

For example, when I run this test on macOS 26.2, I see this:

libSystem.B.dylib`libSystem_initializer:
->  0x1ada5425c <+0>:   pacibsp 
    …
    0x1ada542b0 <+84>:  mov    x0, #0x0                  ; =0 
    0x1ada542b4 <+88>:  mov    x1, x21
    0x1ada542b8 <+92>:  mov    x2, x19
    0x1ada542bc <+96>:  mov    x3, x20
    0x1ada542c0 <+100>: bl     0x1ada54804               ; symbol stub for: __libplatform_init
    …

and you can find both the call site for __libplatform_init and its implementation in the Darwin source.

WARNING Darwin is full of implementation details. It’s fine to go there to learn how the system works, but don’t build a product that relies on those implementation details.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
