I'm working on a command-line tool and trying to make it as fast as possible. I ran it under Instruments' Processor Trace (really cool tool by the way, thanks for that) and found that the majority of its run time is actually spent in dyld, specifically in
dyld4::prepare(dyld4::APIs&, mach_o::Header const*). Out of a total run time of 1.27 ms, my code takes only 34.17 μs, or about 2.7%. That's a LOT of overhead!
I re-ran my binary with the dyld Activity instrument added to the mix, and it showed that the biggest known chunk of time dyld spends during process startup is in "Run static initializer" from libSystem. However, most of the time spent by dyld is unaccounted for and is labelled only generically as "Launch Executable".
Obviously I can't modify libSystem on my users' systems, so is there anything I can do to reduce this overhead? Maybe there's some way to promise that I won't use the Obj-C runtime, so that it doesn't need to be set up, or something like that?