Crash starting app after upgrade to Monterey

Need some help debugging an issue that only started after upgrading to Monterey. Our application is failing to start resulting in a segmentation fault. I am new to debugging such things on MacOS. Using lldb I get the following back trace. Tells me a little but I am unsure how to approach the issue. Looks like it may be related to loading symbols from a dynamic library? Oh, and yes, everything was rebuilt and all dependent libraries installed / upgraded with brew, etc.

Thanks, Bob

(lldb) bt

* thread #1, stop reason = signal SIGSTOP

  * frame #0: 0x0000000000000000

    frame #1: 0x00007ff807e34bb2 libobjc.A.dylib`load_images + 1358

    frame #2: 0x0000000115a0a41c dyld`dyld4::RuntimeState::notifyObjCInit(dyld4::Loader const*) + 170

    frame #3: 0x0000000115a0fbfd dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 167

    frame #4: 0x0000000115a0fbeb dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 149

    frame #5: 0x0000000115a0fbeb dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 149

    frame #6: 0x0000000115a0fcac dyld`dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 108

    frame #7: 0x0000000115a2332e dyld`dyld4::APIs::runAllInitializersForMain() + 222

    frame #8: 0x0000000115a01358 dyld`dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*) + 3438

    frame #9: 0x0000000115a004b4 dyld`start + 388

Need some help debugging an issue that only started after upgrading to Monterey.

Does this crash reproduce if you launch the app from the Finder? If so, please do that and then post a crash report. That’ll give us more info to go on.

For details on how to do this, see Posting a Crash Report.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thanks Quinn, it is a command line application (a threaded service) which requires some arguments. The minimal argument (--version) also results in a crash. Attached a crash report from such a call (could not attach .ips file directly so copied console to txt file and attached).

int main(int argc, char* argv[]) {
  IGNORE_WARNINGS_ASIO;
  if (2 == argc && 0 == strcmp("--version", argv[1])) {
    printVersion();
    return 0;

  } else if (3 != argc) {
    printUsage();
    return 1;
  }

I suspect the issue is related to the framework entries in our CMakeLists.txt file. I was originally seeing some log4cxx initialization items in the back trace which also crashed due to jump to null pointer. When I moved it to the end of the list the crash only showed the boot loader items before the jump to 0. If I move the log4cxx entry just ahead of the framework entries the back trace shows the initialization calls again. Not sure if there is a recommended way of adding these frameworks.

target_link_libraries (mantis log4cxx)

if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
  target_link_libraries (mantis
      "-framework Foundation"
      "-framework CoreServices")
endif (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")

if (GPERF_TOOLS MATCHES "ON")
    target_link_libraries (mantis ${PROFILER_LIBRARY})
    target_link_libraries (mantis ${TMALLOC_LIBRARY})
endif (GPERF_TOOLS MATCHES "ON")

could not attach .ips file directly so copied console to txt file and attached

That’s fine. I’ve already filed a bug requesting that DevForums allow .ips attachments, but using .txt is a valid workaround (r. 84656849).

Consider this:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000

You’ve crashed because you’re accessing an invalid pointer, in this case nil.

Now this:

rip: 0x0000000000000000  …

So the invalid memory access is actually an instruction fetch.

Consider the crashing thread backtrace:

0 ???                         0x0 ???
1 libobjc.A.dylib  0x7ff807e34bb2 load_images + 1358
2 dyld                0x1165d741c dyld4::RuntimeState::notifyObjCInit(…

The dynamic linker is in the process of linking your code. Frame 2 indicates that it’s found Objective-C info in the Mach-O image and thus has called into the Objective-C runtime so that it can do its thing. Frame 1 is that Objective-C runtime code. This is heavily inlined, but as far as I can tell this crash is in code that’s trying to call +load methods.

Now, +load methods are a bad idea and you should avoid them if at all possible. However, depending on your specific circumstances that may not be possible.

If I move the log4cxx entry just ahead of the framework entries the back trace shows the initialization calls again.

Can you post a crash report showing that?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Sure, here's the back trace . . . 

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
 * frame #0: 0x0000000000000000
  frame #1: 0x000000010b98d12d liblog4cxx.12.dylib`log4cxx::helpers::Class::registerClass(newClass=0x000000010bb1c320) at class.cpp:158:2
  frame #2: 0x000000010b99172b liblog4cxx.12.dylib`log4cxx::helpers::ClassRegistration::ClassRegistration(this=0x000000010bb1c330, accessor=(liblog4cxx.12.dylib`log4cxx::pattern::ClassNamePatternConverter::getStaticClass() at classnamepatternconverter.cpp:32))()) at classregistration.cpp:26:2
  frame #3: 0x000000010b99175d liblog4cxx.12.dylib`log4cxx::helpers::ClassRegistration::ClassRegistration(this=0x000000010bb1c330, accessor=(liblog4cxx.12.dylib`log4cxx::pattern::ClassNamePatternConverter::getStaticClass() at classnamepatternconverter.cpp:32))()) at classregistration.cpp:25:1
  frame #4: 0x000000010b98fc1d liblog4cxx.12.dylib`log4cxx::pattern::ClassNamePatternConverter::registerClass() at classnamepatternconverter.cpp:32:1
  frame #5: 0x000000010b9916e9 liblog4cxx.12.dylib`::__cxx_global_var_init() at classnamepatternconverter.cpp:32:1
  frame #6: 0x000000010b991709 liblog4cxx.12.dylib`_GLOBAL__sub_I_classnamepatternconverter.cpp at classnamepatternconverter.cpp:0
  frame #7: 0x0000000119085b49 dyld`invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 182
  frame #8: 0x00000001190ac29b dyld`invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 242
  frame #9: 0x00000001190a3893 dyld`invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 566
  frame #10: 0x0000000119072d91 dyld`dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 129
  frame #11: 0x00000001190a361b dyld`dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 179
  frame #12: 0x00000001190abdf2 dyld`dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 466
  frame #13: 0x0000000119085a7c dyld`dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 144
  frame #14: 0x0000000119085c08 dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 178
  frame #15: 0x0000000119085beb dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 149
  frame #16: 0x0000000119085cac dyld`dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 108
  frame #17: 0x000000011909932e dyld`dyld4::APIs::runAllInitializersForMain() + 222
  frame #18: 0x0000000119077358 dyld`dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*) + 3438
  frame #19: 0x00000001190764b4 dyld`start + 388

Here's the end of our CMake file with changes (with tweaks based on other similar issues I found) . . . 

target_link_libraries (mantis log4cxx)
if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
find_library(FoundationLib Foundation)
find_library(CoreServicesLib CoreServices)
find_library(CoreFoundationLib CoreFoundation)
find_library(objcLib objc)
message("LIB: ${FoundationLib}")
message("LIB: ${CoreServicesLib}")
message("LIB: ${CoreFoundationLib}")
message("LIB: ${objcLib}")
target_link_libraries (mantis stdc++ "-framework Foundation")
target_link_libraries (mantis stdc++ "-framework CoreServices")
# target_link_libraries (mantis stdc++ "-framework CoreFoundation")
# target_link_libraries (mantis stdc++ objc)
endif (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")

if (GPERF_TOOLS MATCHES "ON")
target_link_libraries (mantis ${PROFILER_LIBRARY})
target_link_libraries (mantis ${TMALLOC_LIBRARY})
endif (GPERF_TOOLS MATCHES "ON")

Even though the log4cxx unit tests pass I decided to add some debug to the function which is now crashing.

The modified code breaks each step down.

bool Class::registerClass(const Class& newClass)
{
	log4cxx::LogString new_class_name = newClass.getName();
	log4cxx::LogString new_class_name_lower = StringHelper::toLowerCase(new_class_name);
	log4cxx::helpers::Class::ClassMap& class_map = getRegistry();
	class_map[new_class_name_lower] = &newClass;
	//getRegistry()[StringHelper::toLowerCase(newClass.getName())] = &newClass;
	return true;
}

The back trace seems to indicate we crash when exiting. All the debug looks fine, the map was populated 10 times and we crash when adding the 11th element.

(lldb) bt

* thread #1, stop reason = signal SIGSTOP

  * frame #0: 0x0000000000000000

    frame #1: 0x0000000105370901 liblog4cxx.12.dylib`log4cxx::helpers::Class::registerClass(newClass=0x00000001054ff320) at class.cpp:164:1

    frame #2: 0x000000010537478b liblog4cxx.12.dylib`log4cxx::helpers::ClassRegistration::ClassRegistration(this=0x00000001054ff330, accessor=(liblog4cxx.12.dylib`log4cxx::pattern::ClassNamePatternConverter::getStaticClass() at classnamepatternconverter.cpp:32))()) at classregistration.cpp:26:2

    frame #3: 0x00000001053747bd liblog4cxx.12.dylib`log4cxx::helpers::ClassRegistration::ClassRegistration(this=0x00000001054ff330, accessor=(liblog4cxx.12.dylib`log4cxx::pattern::ClassNamePatternConverter::getStaticClass() at classnamepatternconverter.cpp:32))()) at classregistration.cpp:25:1

    frame #4: 0x0000000105372c7d liblog4cxx.12.dylib`log4cxx::pattern::ClassNamePatternConverter::registerClass() at classnamepatternconverter.cpp:32:1

    frame #5: 0x0000000105374749 liblog4cxx.12.dylib`::__cxx_global_var_init() at classnamepatternconverter.cpp:32:1

    frame #6: 0x0000000105374769 liblog4cxx.12.dylib`_GLOBAL__sub_I_classnamepatternconverter.cpp at classnamepatternconverter.cpp:0

    frame #7: 0x000000010b823b49 dyld`invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 182

    frame #8: 0x000000010b84a29b dyld`invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 242

    frame #9: 0x000000010b841893 dyld`invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 566

    frame #10: 0x000000010b810d91 dyld`dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 129

    frame #11: 0x000000010b84161b dyld`dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 179

    frame #12: 0x000000010b849df2 dyld`dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 466

    frame #13: 0x000000010b823a7c dyld`dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 144

    frame #14: 0x000000010b823c08 dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 178

    frame #15: 0x000000010b823beb dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 149

    frame #16: 0x000000010b823cac dyld`dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 108

    frame #17: 0x000000010b83732e dyld`dyld4::APIs::runAllInitializersForMain() + 222

    frame #18: 0x000000010b815358 dyld`dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*) + 3438

    frame #19: 0x000000010b8144b4 dyld`start + 388

(lldb) f 1

frame #1: 0x0000000105370901 liblog4cxx.12.dylib`log4cxx::helpers::Class::registerClass(newClass=0x00000001054ff320) at class.cpp:164:1

   161 		class_map[new_class_name_lower] = &newClass;

   162 		//getRegistry()[StringHelper::toLowerCase(newClass.getName())] = &newClass;

   163 		return true;

-> 164 	}

   165 	

   166 	void Class::registerClasses()

   167 	{

(lldb) frame variable

(log4cxx::pattern::ClassNamePatternConverter::ClazzClassNamePatternConverter &) newClass = 0x00000001054ff320 {}

(log4cxx::LogString) new_class_name = "ClassNamePatternConverter"

(log4cxx::LogString) new_class_name_lower = "classnamepatternconverter"

(log4cxx::helpers::Class::ClassMap &) class_map = size=11: {

  [0] = (first = "action", second = 0x00000001054ff0e0)

  [1] = (first = "andfilter", second = 0x00000001054ff100)

  [2] = (first = "appenderattachableimpl", second = 0x00000001054ff120)

  [3] = (first = "appenderskeleton", second = 0x00000001054ff140)

  [4] = (first = "asyncappender", second = 0x00000001054ff1d8)

  [5] = (first = "bufferedwriter", second = 0x00000001054ff218)

  [6] = (first = "bytearrayinputstream", second = 0x00000001054ff238)

  [7] = (first = "bytearrayoutputstream", second = 0x00000001054ff258)

  [8] = (first = "charsetdecoder", second = 0x00000001054ff278)

  [9] = (first = "charsetencoder", second = 0x00000001054ff2c8)

  [10] = (first = "classnamepatternconverter", second = 0x00000001054ff320)

}

Right, so this is a C++ static initialiser rather than a +load method, but the same basic setup applies.

The back trace seems to indicate we crash when exiting.

Indeed. In vanilla C that sort of thing is usually indicative of a stack smasher but in C++ it could also be a destructor. I think you’ll have to debug this at the assembly level )-:

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Solved the issue the brute force way. Came up with a trivial application using the existing CMake files. Once it crashed I backed out libraries until it did not. Turns out it was gperftools and tcmalloc.

Hello! After many iterations, we think we arrived at the heart of the issue, but we need assistance to dig it further (@eskimo, thanks for your help btw!).

TL;DR: apple-clang replaces calls to free(x) (where x was allocated with a fixed-size at compile-time) with free_definite_size(x). The problem is that, by definition in malloc.h, the callback free_definite_size can be NULL (and it is correctly checked in malloc.c: https://opensource.apple.com/source/libmalloc/libmalloc-317.140.5/src/malloc.c.auto.html, function void free(void* ptr)).

tcmalloc doesn't implement free_definite_size: https://github.com/gperftools/gperftools/blob/master/src/libc_override_osx.h#L275 . Any program which does allocation/deallocation of fixed size, for example:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    void *ptr = malloc(128);
    printf("malloc(128) returned %p\n",ptr);
    free(ptr);  /* replaced by free_definite_size(ptr, 128); */
    return 0;
}

It will crash with tcmalloc (because of the lack of the callback for free_definite_size) but it runs fine without it. Solution: we can provide patches to tcmalloc (even if the maintainer doesn't answer), but I think either clang should be fixed, or the documentation updated with an error at compile time when registering the zone. Can someone help me to escalate this to Apple clang developers? Thanks!

apple-clang replaces calls to free(x) … with free_definite_size(x).

It does. That’s not what I’m seeing. I’m using the clang built in to Xcode 13.1:

% clang -v    
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

And I see it compile your example to call free:

% clang main.c
% lldb a.out
…
(lldb) disas -n main
a.out`main:
…
a.out[0x100003f26] <+22>: movl   $0x80, %edi
a.out[0x100003f2b] <+27>: callq  0x100003f5e       ; symbol stub for: malloc
a.out[0x100003f30] <+32>: movq   %rax, -0x18(%rbp)
…
a.out[0x100003f46] <+54>: movq   -0x18(%rbp), %rdi
a.out[0x100003f4a] <+58>: callq  0x100003f58       ; symbol stub for: free
…

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Is there something else that may interfere? Because I have the same disassembly result, however, I have the crash:

(lldb) r
Process 8502 launched: '/Users/aple/workspace/gperftools/a.out' (x86_64)
malloc(128) returned 0x100604000
Process 8502 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
Target 0: (a.out) stopped.
(lldb) disas -n main
a.out`main:
    ...
    0x100003f26 <+22>: movl   $0x80, %edi
    0x100003f2b <+27>: callq  0x100003f5e               ; symbol stub for: malloc
    0x100003f30 <+32>: movq   %rax, -0x18(%rbp)
    ...
    0x100003f46 <+54>: movq   -0x18(%rbp), %rdi
    0x100003f4a <+58>: callq  0x100003f58               ; symbol stub for: free
    0x100003f4f <+63>: xorl   %eax, %eax
    ...
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x0000000100003f4f a.out`main + 63
    frame #2: 0x00000001000154fe dyld`start + 462
(lldb)

I am linking with -ltcmalloc. If I patch tcmalloc by providing a custom free_definite_size (which simply calls tcmalloc's free), then compile and link again the simple PoC, the segfault goes away. Unfortunately, I cannot go deeper with the debugger (i.e., I cannot see what is happening inside the called free). Thanks!

If you run the tool outside of LLDB, the crash should generate a crash report. Please post that here.

For advice on how to do that, see Posting a Crash Report.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Yep, here it is:

It would be good to analyze clang-apple internals for such free...

Thanks! Nat

Yep, here it is:

Thanks. Correlating this with your LLDB transcript earlier, my best guess is that:

  • The call to free at +58 isn’t actually calling the libSystem free but rather one provided by libtcmalloc.

  • That doesn’t appear in the backtrace due to tail call optimisation.

The last one is a kicker because it obscures any state from the first one.

If you set a breakpoint at +58 and then step one instruction at a time (using LLDB’s thread step-inst, something I usually access via the GDB compatibility si command), where do you end up?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hello, I cannot debug more, because I would need to compile a debuggable version of libmalloc. My findings so far:

...
(lldb) thread step-inst
Process 95351 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x00007ff81bb921f1 libsystem_malloc.dylib`free + 337
libsystem_malloc.dylib`free:
->  0x7ff81bb921f1 <+337>: popq   %rbp
    0x7ff81bb921f2 <+338>: jmpq   *%rax
    0x7ff81bb921f4 <+340>: xorl   %edi, %edi
    0x7ff81bb921f6 <+342>: testl  $0x140, 0x419face0(%rip)  ; malloc_num_zones_allocated, imm = 0x140
Target 0: (crash) stopped.
(lldb) thread step-inst
Process 95351 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x00007ff81bb921f2 libsystem_malloc.dylib`free + 338
libsystem_malloc.dylib`free:
->  0x7ff81bb921f2 <+338>: jmpq   *%rax
    0x7ff81bb921f4 <+340>: xorl   %edi, %edi
    0x7ff81bb921f6 <+342>: testl  $0x140, 0x419face0(%rip)  ; malloc_num_zones_allocated, imm = 0x140
    0x7ff81bb92200 <+352>: sete   %dil
Target 0: (crash) stopped.
(lldb) register read rax --format dec
     rax = 140703593735262  libsystem_malloc.dylib`default_zone_free_definite_size
(lldb) thread step-inst
Process 95351 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x00007ff81bb9245e libsystem_malloc.dylib`default_zone_free_definite_size
libsystem_malloc.dylib`default_zone_free_definite_size:
->  0x7ff81bb9245e <+0>:  movq   0x41a16d63(%rip), %rdi    ; lite_zone
    0x7ff81bb92465 <+7>:  testq  %rdi, %rdi
    0x7ff81bb92468 <+10>: jne    0x7ff81bb92474            ; <+22>
    0x7ff81bb9246a <+12>: movq   0x419fa8b7(%rip), %rax    ; malloc_zones
Target 0: (crash) stopped.
...

From tcmalloc's side, everything is returned as it should (128 for the size), and it recognizes the pointer as owned. How can I go deeper? Thanks!

Crash starting app after upgrade to Monterey
 
 
Q