clang compile with LTO - Bus error, BAD_ACCESS code=2

Since we upgraded a Mac from Xcode 12.4 to the latest Xcode (13.1 or 13.2.1, with Command Line Tools 13.2), we're hitting an error when linking our executable.

clang: error: unable to execute command: Bus error: 10
clang: error: linker command failed due to signal (use -v to see invocation)
ninja: build stopped: subcommand failed.

It only happens when doing LTO builds. I disabled system hardening to be able to attach LLDB to the linker hoping to find more info on why it might crash and got a huge stack trace:

  * frame #0: 0x00000001058087d0 libLTO.dylib`computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) + 32
  frame #1: 0x00000001057f6333 libLTO.dylib`computeKnownBits(llvm::Value const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) + 1523
  frame #2: 0x00000001057f5cc9 libLTO.dylib`computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) + 169
  frame #3: 0x00000001058097cd libLTO.dylib`computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) + 4125

The total stack depth is 24824 frames. There is a clear loop pattern here, so either this is an endless recursion that exhausts the stack, or something in our code triggers a deep recursion that would eventually resolve if there were enough stack space.

It actually looks to be the latter, as we had a similar issue with our Linux builds, which we fixed with ulimit -s 16384, but that command seems to change absolutely nothing on the Mac I'm working on. After changing the limit in the bash shell (the user's default shell), the crash still happens at a stack depth of 24824 frames.

Working here on: macOS Monterey 12.1, Mac Pro (2019), 2.5 GHz 28-core, 96 GB RAM

ld -v :

@(#)PROGRAM:ld PROJECT:ld64-711
BUILD 21:57:11 Nov 17 2021
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
LTO support using: LLVM version 13.0.0, (clang-1300.0.29.30) (static support for 27, runtime is 27)
TAPI support using: Apple TAPI version 13.0.0 (tapi-1300.0.6.5)

It actually looks to be the latter, as we had a similar issue with our Linux builds, which we fixed with ulimit -s 16384, but that command seems to change absolutely nothing on the Mac I'm working on.

ulimit -s definitely works on the Mac. Consider this:

  1. I wrote a test tool that just prints its pid and then stops in pause.

  2. I ran it from a Terminal window.

  3. While it was paused, I ran vmmap on its pid:

    % vmmap -interleaved 9541
    

    In the map I saw this:

    STACK GUARD 7ff7b6f51000-7ff7ba751000 [ 56.0M     0K     0K     0K] ---/rwx SM=NUL stack guard for thread 0
    Stack       7ff7ba751000-7ff7baf51000 [ 8192K    28K    28K     0K] rw-/rwx SM=PRV thread 0
    

    That’s an 8 MiB stack.

  4. I bumped the stack limit:

    % ulimit -s 65532
    
  5. I repeated step 3. This time I saw this:

    STACK GUARD 7ff7b0ab6000-7ff7b0ab7000 [    4K     0K     0K     0K] ---/rwx SM=NUL stack guard for thread 0
    Stack       7ff7b0ab7000-7ff7b4ab6000 [ 64.0M    28K    28K     0K] rw-/rwx SM=PRV thread 0
    

    That’s a 64 MiB stack.


You can also set the stack size via the -stack_size linker option. In my case I was able to configure the stack size for my tool to 32 MiB by setting Other Linker Flags (OTHER_LDFLAGS) to -Xlinker -stack_size -Xlinker 0x2000000.

Indeed, it looks like the linker takes advantage of this:

% otool -l /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld | grep -B 1 -A 3 LC_MAIN
Load command 11
       cmd LC_MAIN
   cmdsize 24
  entryoff 211338
 stacksize 16777216

Which brings us back to your main issue. Given that the linker sets an explicit stack size, monkeying with ulimit won’t help )-:


If you still want to test your stack size theory, you could copy ld, use a hex editor to change the stacksize field of the LC_MAIN load command, re-sign it ad hoc, and then retry your link.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Sorry, that was a bit too fast. It turned out my test build wasn't a full release build, so it didn't hit the issue.

After some tweaking and checking with vmmap, it seems the larger stack size only applied to the main thread; all other threads remain at 8200K.

Checking again with lldb, I notice that the failing thread is thread #4. Is there any method we can try to change the thread stack size?

Stack                    700002dfc000-7000035fe000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 1
Stack                    7000035ff000-700003e01000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 2
Stack                    700003e02000-700004604000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 3
Stack                    700004605000-700004e07000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 4
Stack                    700004e08000-70000560a000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 5
Stack                    70000560b000-700005e0d000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 6
Stack                    700005e0e000-700006610000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 7
Stack                    700006611000-700006e13000 [ 8200K    80K    80K     0K] rw-/rwx SM=PRV          thread 8
Stack                    700006e14000-700007616000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 9
Stack                    700007617000-700007e19000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 10
Stack                    700007e1a000-70000861c000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 11
Stack                    70000861d000-700008e1f000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 12
Stack                    700008e20000-700009622000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 13
Stack                    700009623000-700009e25000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 14
Stack                    700009e26000-70000a628000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 15
Stack                    70000a629000-70000ae2b000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 16
Stack                    70000ae2c000-70000b62e000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 17
Stack                    70000b62f000-70000be31000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 18
Stack                    70000be32000-70000c634000 [ 8200K   160K   160K     0K] rw-/rwx SM=PRV          thread 19
Stack                    70000c635000-70000ce37000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 20
Stack                    70000ce38000-70000d63a000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 21
Stack                    70000d63b000-70000de3d000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 22
Stack                    70000de3e000-70000e640000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 23
Stack                    70000e641000-70000ee43000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 24
Stack                    70000ee44000-70000f646000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 25
Stack                    70000f647000-70000fe49000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 26
Stack                    70000fe4a000-70001064c000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 27
Stack                    70001064d000-700010e4f000 [ 8200K    76K    76K     0K] rw-/rwx SM=PRV          thread 28
Stack                    7ff7bef00000-7ff7bfefd000 [ 16.0M   112K   112K     0K] rw-/rwx SM=ZER          thread 0
Stack                    7ff7bfefd000-7ff7bff00000 [   12K    12K    12K     0K] rw-/rwx SM=COW  

According to some extra research, pthread_create on Linux uses the soft stack limit as the default thread stack size (which is why the ulimit -s trick worked for our Linux builds), but on Apple platforms it does not.

Right so after noticing the above I continued the work...

  • Built ld64 version 609 (the latest I could find)
  • Tweaked the stack size for threads created by ld64 to a bigger number

Unfortunately, still no luck. At first I noticed in vmmap that the threads had these bigger stacks, but then a bit later during the link the thread stacks were only 8 MB again. So I figured these threads must be created by another dylib, probably libLTO, as that was the crashing dylib.

  • Pulled the LLVM sources (https://github.com/apple/llvm-project (branch: apple/main))
  • Found that the thread stack size is hardcoded to 8 MB in llvm/lib/Support/Threading.cpp:90 and changed it to 16 MB (what my ulimit was trying to achieve, and did achieve on Linux)
  • Ran the following to start a build: cmake -G Ninja -C ../clang/cmake/caches/Apple-stage1.cmake ../llvm -D LLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;libLTO" -D CMAKE_INSTALL_PREFIX=/Users/sander/llvm-toolchain -D LLVM_PARALLEL_LINK_JOBS=4 -D LLVM_PARALLEL_COMPILE_JOBS=50 -D LLVM_CREATE_XCODE_TOOLCHAIN=TRUE && ninja stage2-install

Created a bunch of symlinks to get all tools to look at the right paths and... got into a linker error at first:

libLTO: Unknown command line argument '-disable-aligned-alloc-awareness=1'.  Try: 'libLTO --help'
libLTO: Did you mean '--disable-inlined-alloca-merging=1'?
libLTO: Unknown command line argument '-enable-dse-memoryssa=0'.  Try: 'libLTO --help'
libLTO: Did you mean '--enable-memcpyopt-memoryssa=0'?
Process 4793 exited with status = 1 (0x00000001) 

I removed those command-line arguments, then got an error about -mllvm not being supported, so I just removed those mentions too. I was unsure what it really meant; I was just trying to get some results...

Finally I got a (manually) linked binary! It ran OK in my initial tests.

But I'm not sure how to proceed now. This doesn't feel like something I want to have, rely upon, or maintain: it's hackish and relies on 'incorrect' versions, as I simply don't have access to the sources that were used to build the latest Command Line Tools. I don't feel comfortable shipping this to customers, and I'm not sure whether it can still be correctly notarized.

I also proved that it's not related to the version of clang, by reverting my code change back to 8 MB and getting the linker errors again.

Is there any way to get this in an official Apple commandline tools build?

Wow, that’s quite a debugging saga!

Is there any way to get this in an official Apple command-line tools build?

Your best option is to file a bug against the linker. If you can include the build artefacts necessary to trigger the problem, that’d be grand.

Please post your bug number, just for the record.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
