Random crashing in dlopen from Emacs with native compilation in macOS 13

Hi,

I've been using Emacs with native compilation enabled for a year or so now. Since upgrading to macOS 13, I have noticed that Emacs occasionally crashes during startup with a trace like this (the trace is not always the same, but it always has dlopen_from in it):

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib               0x1a6d621b0 __pthread_kill + 8
1   libsystem_pthread.dylib               0x1a6d98cec pthread_kill + 288
2   libsystem_c.dylib                     0x1a6c9aa50 raise + 32
3   Emacs                                 0x100d69aec terminate_due_to_signal + 204
4   Emacs                                 0x100d6a2f0 emacs_abort + 20
5   Emacs                                 0x100d38db0 ns_term_shutdown + 132
6   Emacs                                 0x100c24b94 shut_down_emacs + 332
7   Emacs                                 0x100d69ab4 terminate_due_to_signal + 148
8   Emacs                                 0x100c47084 handle_fatal_signal + 16
9   Emacs                                 0x100c47100 deliver_thread_signal + 124
10  Emacs                                 0x100c45700 deliver_fatal_thread_signal + 12
11  libsystem_platform.dylib             0x1a6dc72a4 _sigtramp + 56
12  dyld                                 0x1a6a9c1b4 lsl::BTree<lsl::Allocator::Buffer, lsl::PersistentAllocator::RegionSizeCompare, false>::NodeCore<15u, 10u>::splitChild(unsigned char, lsl::Allocator&) + 172
13  dyld                                 0x1a6a9c1b4 lsl::BTree<lsl::Allocator::Buffer, lsl::PersistentAllocator::RegionSizeCompare, false>::NodeCore<15u, 10u>::splitChild(unsigned char, lsl::Allocator&) + 172
14  dyld                                 0x1a6a9bf78 lsl::BTree<lsl::Allocator::Buffer, lsl::PersistentAllocator::RegionSizeCompare, false>::const_iterator::prepareForInsertion(lsl::Allocator&) + 532
15  dyld                                 0x1a6a9bbb0 lsl::BTree<lsl::Allocator::Buffer, lsl::PersistentAllocator::RegionSizeCompare, false>::insert_internal(lsl::BTree<lsl::Allocator::Buffer, lsl::PersistentAllocator::RegionSizeCompare, false>::const_iterator&&, lsl::Allocator::Buffer&&) + 452
16  dyld                                 0x1a6a99a90 lsl::PersistentAllocator::reserveRange(lsl::BTree<lsl::Allocator::Buffer, lsl::PersistentAllocator::RegionSizeCompare, false>::const_iterator&, lsl::Allocator::Buffer) + 412
17  dyld                                 0x1a6a9a0b0 lsl::PersistentAllocator::allocate_buffer(unsigned long, unsigned long, unsigned long, lsl::Allocator**) + 624
18  dyld                                 0x1a6a991e0 lsl::Allocator::aligned_alloc(unsigned long, unsigned long) + 180
19  dyld                                 0x1a6a99230 lsl::Allocator::strdup(char const*) + 48
20  dyld                                 0x1a6a968b8 dyld4::FileManager::fileRecordForPath(char const*) + 40
21  dyld                                 0x1a6ab4624 dyld4::Atlas::ProcessSnapshot::Serializer::readMappedFileInfo(std::__1::span<std::byte, 18446744073709551615ul>&, unsigned long long&, lsl::UUID&, dyld4::FileRecord&) + 132
22  dyld                                 0x1a6ab3170 dyld4::Atlas::ProcessSnapshot::Serializer::deserialize(std::__1::span<std::byte, 18446744073709551615ul>) + 928
23  dyld                                 0x1a6ab2cec dyld4::Atlas::ProcessSnapshot::ProcessSnapshot(lsl::Allocator&, dyld4::FileManager&, bool, std::__1::span<std::byte, 18446744073709551615ul>) + 304
24  dyld                                 0x1a6a7ef64 lsl::UniquePtr<dyld4::Atlas::ProcessSnapshot> lsl::Allocator::makeUnique<dyld4::Atlas::ProcessSnapshot, lsl::EphemeralAllocator&, dyld4::FileManager&, bool, std::__1::span<std::byte, 18446744073709551615ul> const&>(lsl::EphemeralAllocator&, dyld4::FileManager&, bool&&, std::__1::span<std::byte, 18446744073709551615ul> const&) + 80
25  dyld                                 0x1a6a7bafc dyld4::RuntimeState::getCurrentProcessSnapshot() + 96
26  dyld                                 0x1a6a7b8cc dyld4::RuntimeState::notifyDebuggerLoad(std::__1::span<dyld4::Loader const*, 18446744073709551615ul> const&) + 72
27  dyld                                 0x1a6aaaf3c dyld4::APIs::dlopen_from(char const*, int, void*)::$_0::operator()() const + 748
28  dyld                                 0x1a6aa4968 dyld4::APIs::dlopen_from(char const*, int, void*) + 892
29  Emacs                                 0x100cecdb0 Fnative_elisp_load + 356
30  Emacs                                 0x100cc888c Fload + 2104
31  Emacs                                 0x100cca5f0 save_match_data_load + 92
32  Emacs                                 0x100ca4928 load_with_autoload_queue + 120

I reported the issue to Emacs: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60220

We are currently stumped. The working theory is that it is related to the fact that I am loading libraries from an Emacs "idle timer": in theory, a load that is in progress could be interrupted by that timer, and the next attempt to load would then run into some bad state.

This also seems related to this crash in Xojo: https://forum.xojo.com/t/xojo-2022r3-1-crashes-galore/72887/13

Does anyone have any suggestions for how to investigate this further, or any idea what may be causing it?

Thanks,

Aaron

Replies

The same errors have been also reported in other projects (besides xojo mentioned above):

GHC (Glasgow Haskell Compiler): https://gitlab.haskell.org/ghc/ghc/-/issues/23097

Darktable: https://github.com/darktable-org/darktable/issues/13221

Please investigate this dyld problem. Thank you!

I have made a simple repro to prove this is an Apple dyld4 issue and have filed feedback with Apple. Unfortunately, crickets from Apple so far. This makes Ventura unusable for scientific Python development. See https://github.com/erykoff/ventura_dlopen2
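
For anyone who wants to poke at this on their own machine, here is a minimal sketch of a dlopen stress loop in Python. It is not the code from the repository linked above, and I am not claiming it triggers the crash. It only assumes that ctypes.CDLL goes through dlopen on macOS, and the function name stress_dlopen is made up for this example. If the dyld bug is hit, the expected symptom would be the process dying with a signal, not a Python exception.

```python
import ctypes
import sys


def stress_dlopen(paths, rounds=1):
    """Load each shared library in `paths` `rounds` times via ctypes.

    Hypothetical helper for illustration only. Returns (loaded, failures):
    the number of successful loads and a list of (path, error message)
    pairs for loads that raised OSError.
    """
    loaded = 0
    failures = []
    handles = []  # keep references so the library objects are not collected
    for _ in range(rounds):
        for path in paths:
            try:
                # On POSIX systems ctypes.CDLL loads the library with dlopen().
                handles.append(ctypes.CDLL(path))
                loaded += 1
            except OSError as exc:
                failures.append((path, str(exc)))
    return loaded, failures


if __name__ == "__main__":
    # Usage: python stress_dlopen.py /path/to/a.dylib /path/to/b.dylib ...
    loaded, failures = stress_dlopen(sys.argv[1:], rounds=10)
    print(f"{loaded} loads succeeded, {len(failures)} failed")
    for path, message in failures:
        print(f"  {path}: {message}")
```

Pointing it at a large set of distinct libraries (for example the .eln or .so files a real application loads at startup) is probably more interesting than reloading one library many times, since repeated loads of an already-loaded image may take a different path inside dyld.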