dyld library crash with dlopen RTLD_LAZY | RTLD_LOCAL

Hi,

I'm trying to load the libgcrypt library (brew install libgcrypt) with dlopen/dlsym, but I get a crash when I pass RTLD_LAZY | RTLD_LOCAL to dlopen.

Every other combination works:

  • RTLD_LAZY | RTLD_GLOBAL
  • RTLD_NOW | RTLD_GLOBAL
  • RTLD_NOW | RTLD_LOCAL

I have attached a tiny sample program to reproduce the issue:


$ gcc -o dyld_test dyld_test.c

$ ./dyld_test gcry_check_version

dyld: lazy symbol binding failed: Symbol not found: __gcry_check_version
  Referenced from: /usr/local/lib/libgcrypt.dylib
  Expected in: flat namespace

dyld: Symbol not found: __gcry_check_version
  Referenced from: /usr/local/lib/libgcrypt.dylib
  Expected in: flat namespace


I also tried to build dyld myself, without success: https://opensource.apple.com/source/dyld/dyld-852.2/

Any help would be really appreciated.

Thank you,

Aleix

Replies

Is the __gcry_check_version symbol exported by libgcrypt.dylib? You can determine this with nm. Specifically, what does this report:

% nm -m /usr/local/lib/libgcrypt.dylib | grep gcry_check_version

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  • Thank you for your answer. Yes, it is exported. This only happens when RTLD_LAZY | RTLD_LOCAL mode is used with dlopen, all other combinations work fine.

    ❯ nm -m /usr/local/lib/libgcrypt.dylib | grep gcry_check_version
    0000000000005954 (__TEXT,__text) external __gcry_check_version
    0000000000002aa7 (__TEXT,__text) external _gcry_check_version


> Yes, it is exported.

OK.

> This only happens when RTLD_LAZY | RTLD_LOCAL mode is used with dlopen, all other combinations work fine.

Yeah, I think I have an explanation for that, but I need one more bit of info. Please do this and post the results:

% xcrun dyldinfo -lazy_bind libgcrypt.dylib | grep gcry_check_version

Please post this as a reply rather than a comment so that the formatting comes across.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thank you again for your reply. Here's the info you requested:

❯ xcrun dyldinfo -lazy_bind /usr/local/lib/libgcrypt.dylib | grep gcry_check_version
__DATA  __la_symbol_ptr  0x000A40C8 0x02C7 flat-namespace   __gcry_check_version

Also, here's the earlier nm output, properly formatted:

❯ nm -m /usr/local/lib/libgcrypt.dylib | grep gcry_check_version
0000000000005954 (__TEXT,__text) external __gcry_check_version
0000000000002aa7 (__TEXT,__text) external _gcry_check_version

Thanks again,

Aleix

For anyone else reading this thread, I have found this PDF to be very useful to understand what lazy, two-level namespace or flat namespace are.

https://www.symbolcrash.com/wp-content/uploads/2019/02/MachORuntime.pdf

> Here's the info you requested

Thanks.

> I have found this PDF to be very useful to understand what lazy, two-level namespace or flat namespace are.

Yeah, and that means you’ve probably figured out where I’m heading here (-:

Apple platforms default to using a two-level namespace. That is, when a Mach-O image imports a symbol from a dynamic library, it records both the symbol name and the identity of the library. This has a world of benefits, the most important one being the ability to have two symbols of the same name in your process. If one part of your product uses library A with symbol X and another part of your product uses library B that just happens to use X for one of its symbols, things still work.

The flat namespace is effectively deprecated and we encourage folks not to use it.

It seems like libgcrypt.dylib was built with a flat namespace. One consequence of the flat namespace design is that calls from one part of the library to another also go through the flat namespace. This is necessary because folks using the flat namespace expect to be able to define, say, malloc, and have it be used by all code in the process [1].

So, here’s what’s happening here:

  1. You’ve loaded libgcrypt.dylib with RTLD_LAZY and RTLD_LOCAL. The first tells the dynamic linker not to bind the library’s symbols immediately, but only when each one is first used. The second tells it not to publish any of the library’s symbols globally.

  2. You try to reference a symbol in the library. As part of this, the dynamic linker tries to resolve __gcry_check_version. This is a flat namespace import, so it looks for it in the global flat namespace. That fails because you explicitly told the dynamic linker not to publish any of the library’s symbols globally.

Unless libgcrypt.dylib has a hard dependency on the flat namespace (which is rare, but does happen sometimes when dealing with Unix-y code), the best solution here is to rebuild it with the default two-level namespace.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] The two-level namespace has a different solution for adding debugging infrastructure to your product, namely dynamic linker interposing. It’s not something we officially support, but you’ll find plenty of info about it online.

Thank you so much for your answer, really appreciate it. I believe this makes sense.

Just one last question: what would be the difference with RTLD_NOW | RTLD_LOCAL? Shouldn't this cause the same issue? Or does RTLD_NOW also look in the global flat namespace even if RTLD_LOCAL is specified?

By the way, I found this issue while using Guile (https://www.gnu.org/software/guile/) and trying to load guile-gcrypt. Newer versions of Guile (>= 3.0.6) use RTLD_LAZY | RTLD_LOCAL, and loading guile-gcrypt triggered the crash.

https://lists.gnu.org/archive/html/guile-devel/2021-09/msg00008.html

I have also provided a patch to Guile, but it probably won't be accepted, since this doesn't seem to be a Guile issue.

https://lists.gnu.org/archive/html/guile-devel/2021-09/msg00024.html

So I'll need to look at Homebrew and libgcrypt to see how libgcrypt is built.

Thank you so much again.

Aleix

The issue was in libgcrypt's libtool.m4. It was an old version that didn't handle macOS 11.x properly and therefore defaulted to using flat_namespace.

I provided a patch to upgrade to a newer libtool.m4:

https://lists.gnupg.org/pipermail/gcrypt-devel/2021-September/005173.html

I still wonder why RTLD_NOW | RTLD_LOCAL worked...

Thanks,

Aleix

> I provided a patch to upgrade to a newer libtool.m4

Neat-o!

> what would be the difference with RTLD_NOW | RTLD_LOCAL?

I don’t know the dynamic linker implementation well enough to answer that definitively. My best guess is that dlopen establishes a context in which it can resolve symbols for the duration of that call, regardless of RTLD_LOCAL. When you supply RTLD_NOW, all symbols are resolved within that context, and so things work.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

No worries, thank you for all the answers!