dlopen() reloads original instead of new dylib after changes

We have a C++ library that we hotload on macOS. This uses dlopen() and dlclose() and worked up until recent versions of Catalina. We don't use thread_local and don't have Objective-C code in the library.

The initial dlopen() succeeds, and we use the original dylib. Then for hotloading we dlclose() the original dylib and dlopen() the new dylib. All of this succeeds, and dlerror() reports nothing. All of the dyld output indicates that the library is being unloaded and loaded back in.
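For reference, the close-then-open cycle described above can be sketched as follows. This is a minimal illustration, not our actual code: `reload_library` and `last_dl_error` are hypothetical names, and the caller chooses the dlopen() flags.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Hypothetical helper: returns the pending dlerror() text, never a null pointer.
static const char* last_dl_error() {
    const char* msg = dlerror();
    return msg != nullptr ? msg : "(no dlerror message)";
}

// Hypothetical helper: closes the previous handle (if any), then opens `path`.
// Returns the new handle, or nullptr if dlopen() failed.
void* reload_library(void* old_handle, const char* path, int flags) {
    if (old_handle != nullptr && dlclose(old_handle) != 0) {
        std::fprintf(stderr, "dlclose failed: %s\n", last_dl_error());
    }
    dlerror();  // clear any stale error before the next call
    void* handle = dlopen(path, flags);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", path, last_dl_error());
    }
    return handle;
}
```

Note that a successful dlclose() only drops a reference; it does not by itself prove the image was unmapped.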

But after changing the sources, and building a new dylib, the app returns the original dylib and not the new one. This seems to be a problem in the dyld layer itself, and not our sources. On older macOS builds, the hotloading works correctly. Given the lack of edit+continue in Xcode, this is the only way to iterate quickly on source code changes.

How do we fix this? We are not using the hardened runtime. This is failing on macOS 10.15.7 with Xcode 12.2 (and 12.3).

Replies

Here's one theoretical workaround, though it seems like a hack around what is a fundamental dyld problem. Dynamic libraries make little sense if you can't reload them dynamically. The dyld cache has been around forever, but it doesn't seem to be functioning properly here. The modstamp on the dylib is completely different in this case; dyld should detect that and load the new version instead of returning the old dylib.

Also how do you tell with otool if a dylib doesn't meet the criteria for hotloading? Are there any parameters which state that it uses ObjC, thread_local, or Swift code? There must be criteria that dyld uses to prevent hotloading.

Also is there a way to tell with all of the dyld environment flags the modstamp of the dylib that was just loaded? In this case, I'm just modifying a print statement, and then running that code, but it doesn't print the changed line until the app is reloaded.

if (modstamp differs from last load) {
 Create a temp file (writeable since it’s in temp directory), use mkstemp
 Store the modstamp
 Copy newly built file from Library/Caches over to temp
 dlclose() old file
 dlopen() temp file dylib (does that work?, @rpath issues)
 Let the system delete the temp file when user quits, next load starts from Library/Caches directory. 
}
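
The modstamp check and copy step in the pseudocode above could look roughly like this in C++17. It is only a sketch: `has_changed` and `copy_to_unique_temp` are hypothetical helpers, the `hotload_XXXXXX` file name is made up, and the question of whether dlopen() on the copy runs into @rpath issues is still open.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical helper: true if the file's modification time differs from
// `last_stamp`; updates `last_stamp` when it does.
bool has_changed(const fs::path& dylib, fs::file_time_type& last_stamp) {
    std::error_code ec;
    const auto stamp = fs::last_write_time(dylib, ec);
    if (ec || stamp == last_stamp) return false;
    last_stamp = stamp;
    return true;
}

// Hypothetical helper: copies `source` to a uniquely named file in the temp
// directory and returns that path, or an empty path on failure.
fs::path copy_to_unique_temp(const fs::path& source) {
    std::string pattern = (fs::temp_directory_path() / "hotload_XXXXXX").string();
    const int fd = mkstemp(pattern.data());  // creates the file, fills in XXXXXX
    if (fd == -1) return {};
    close(fd);
    std::error_code ec;
    fs::copy_file(source, pattern, fs::copy_options::overwrite_existing, ec);
    if (ec) {
        fs::remove(pattern, ec);  // do not leave an empty temp file behind
        return {};
    }
    return fs::path(pattern);
}
```

The returned path would then be passed to dlopen() after the old handle has been closed.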
Are you sure your library is actually unloading? There are various things that prevent this, including:
  • Use of the Objective-C or Swift runtime

  • Being the initial provider of a C++ ODR value

An easy test here is to call dlclose and then, in the debugger, do an image list. Is the library still there?
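
The same check can be approximated in code with RTLD_NOLOAD, which asks dlopen() for a handle only if the image is already loaded. This is a sketch of that idea, not a replacement for looking at the image list in the debugger; `is_image_loaded` is a hypothetical helper name.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical helper: true if the image at `path` is currently loaded.
bool is_image_loaded(const char* path) {
    // RTLD_NOLOAD never loads anything; it only returns a handle when the
    // image is already present in the process.
    void* handle = dlopen(path, RTLD_NOLOAD | RTLD_LAZY);
    if (handle == nullptr) return false;
    dlclose(handle);  // balance the reference the successful query took
    return true;
}
```

Calling this right after dlclose() with the same path you passed to dlopen() shows whether the library really went away.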

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Hi Quinn,

The "image list" is a good suggestion. I was just using that other day. I'll try that, and see what lldb reports. This is pure C++, no Swift or ObjC usage.

For some of our users, the dylib hotloads and for others it doesn't. It's a mix of people, mostly on 10.15.7 and Xcode 12.2/3. I did the implementation mentioned above of renaming the lib, but it seems like we have a partial reload. So some code hotloads fine, and other code is still pointing to the old lib. Some code at init of the dylib where we crash (with renaming) is reporting the previous numbered dylib in the debugger. This may be an artifact of how Xcode loads the sources once at launch, but when I do hotload, my breakpoints show the new sources.

By ODR, do you mean on-demand-resource? We did have those a while back, but I think they're disabled currently.


It’s definitely not there after dlclose() on my system. But my system works. Others don't. So I'll get some more info on the machines that don't hotload.

after dlopen()
(lldb) image list 
[623] 1BA9F1FE-153F-3F2B-94ED-A7F7CDD466D7 0x0000000180800000 foosigned.dylib

after dlclose() 
dyld: unloaded: <1BA9F1FE-153F-3F2B-94ED-A7F7CDD466D7> foosigned.dylib
(lldb) image list 
(lldb)

So here's a capture from dlopen()/dlclose() of the same library, renumbered at the end. The path doesn't change, and it looks like the cache restores the original 000.dylib instead of the 001.dylib that dlopen() requested on the second call. I have dyld environment settings providing a little more insight here.

Code Block
(lldb) image list -b
[621] other.dylib
[622] other2.dylib
dlopen("/foo/bar_signed_000.dylib")
dyld: Mapping /foo/bar_signed_000.dylib
dyld: loaded: <A6723547-512E-30A0-9EE5-6E17DB08F79B> /foo/bar_signed_000.dylib
(lldb) image list -b
[621] other.dylib
[622] other2.dylib
[623] bar_signed_000.dylib <- correct, it’s loaded and in the list
dlclose()
dyld: unloaded: <A6723547-512E-30A0-9EE5-6E17DB08F79B> /foo/bar_signed_000.dylib <- correct, unloaded
dlopen("/foo/bar_signed_001.dylib")
dyld: loaded: <A6723547-512E-30A0-9EE5-6E17DB08F79B> /foo/bar_signed_001.dylib <- dyld reused the UUID?
(lldb) image list -b
[627] bar_signed_000.dylib <- ugh, this is wrong path

By ODR, do you mean on-demand-resource?

No, sorry, acronym overload. In the context of C++ ODR means One Definition Rule.

ODR is a real challenge in a dynamically-linked environment. Specifically, ODR requires that the linker merge equivalent definitions. With a static linker that produces verifiable results. In a dynamically-linked environment you can’t know in advance which image will become the ‘lead’. And if that lead happens to be in your shared library, you won’t be able to unload it.
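
To make that concrete, here is the kind of C++ that typically produces ODR-merged definitions. It is an illustrative sketch only: `shared_counter` and `Registry` are made-up names, imagined in a header included by both the app and the hot-loaded library. Whether a particular symbol actually pins a library depends on which image ends up providing it at run time.

```cpp
#include <cassert>

// An inline function with a function-local static: every image that includes
// this header emits its own copy, and all of them must resolve to one object.
inline int& shared_counter() {
    static int value = 0;
    return value;
}

// A class template with a C++17 inline static member: one definition per
// instantiation (Registry<int>, Registry<long>, ...) across the whole program.
template <typename T>
struct Registry {
    static inline int instances = 0;
};
```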

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
I should add that in the lldb session above, the library hotloads correctly despite "image list" reporting the name of the older library. Also "image list bar_signed_000.dylib" always reports something even when the library isn't loaded, but "image list" does not. It would be nice to not have to list every dylib from lldb to verify where one is loaded, but that appears to not be possible with current lldb.

Also, I forgot it in the example, but "image list -b" no longer listed bar_signed_000.dylib after dlclose(). So that, in addition to the dyld printout, leads me to believe the dylib was purged.

So we get stuck in a state where hotloading stops working, and we get constant crashes in dylib init code at startup after a test hotload that shouldn't crash. Then a few hours later, this problem goes away, as if there was a timeout on the dyld dylib cache that is failing.

Also just to emphasize the broken behavior: if lldb cannot report the correct data, then I'm not sure how to triage this. "image list" and "image list <name>" report different output. The first doesn't list the original dylib that's already been unloaded (at least according to the dyld printouts), and the second does. So I think dyld gets confused, and starts trying to return the original dylib from time to time.

Code Block
(lldb) image list -b -m bar_signed_000.dylib
[  0] bar_signed_000.dylib Fri Jan 15 11:23:27 2021
dlclose(lib) <- "bar_signed_000.dylib"
dyld: unloaded: <A6723547-512E-30A0-9EE5-6E17DB08F79B> /foo/bar_signed_000.dylib <- great
dlopen("/foo/bar_signed_001.dylib")
(lldb) image list -b -m /foo/bar_signed_000.dylib
[ 0] bar_signed_000.dylib Fri Jan 15 11:23:27 2021 <- why is this still listed? it’s not in "image list"
dyld: Mapping /foo/bar_signed_001.dylib
dyld: loaded: <A6723547-512E-30A0-9EE5-6E17DB08F79B> /foo/bar_signed_001.dylib
(lldb) image list -b -m Game_signed_001.dylib
error: no modules found that match 'Game_signed_001.dylib' <- dyld said it just loaded 001, why can't "image list" find it?

This stuff should work reliably, and in my experience it does. This, specifically, is a concern:

Then a few hours later, this problem goes away, as if there was a
timeout on the dyld dylib cache that is failing.

dyld doesn’t have any sort of timeout. Something else is going on here, and I don’t have enough time here on DevForums to fully investigate what it is. Given that, I’m going to recommend that you open a DTS tech support incident for this.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
The forums allow a reply and then say the content is restricted. That loses so many replies.

So we think we found the issue with dyld returning the same dylib on hotload and even on relaunch of the application. Unlike Linux, which sets RTLD_LOCAL by default, Apple's "man dlopen" indicates that RTLD_GLOBAL is set by default on macOS. We verified with "image list -b -m foo.dylib" that, when we only set RTLD_NOW and so get the default RTLD_GLOBAL, the timestamp doesn't update while running when the dylib is changed out during a hotload. Also, relaunching the app still returns the old dylib and not the new one in the folder. So something about the internal caching and trying to accelerate re-launch isn't correct.

We now set RTLD_LOCAL | RTLD_NOW and are seeing the correct dylib behavior. The new dylib is picked up during app execution and when the app is relaunched.
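
For anyone hitting the same thing, the change amounts to something like the sketch below. `kHotloadFlags`, `open_for_hotload` and `resolve` are hypothetical names, not our real code. With RTLD_LOCAL the library's symbols are not added to the global namespace, so they have to be looked up through the handle with dlsym().

```cpp
#include <cassert>
#include <dlfcn.h>

// Explicitly request local symbol visibility instead of relying on the
// platform default (RTLD_GLOBAL on macOS according to "man dlopen").
constexpr int kHotloadFlags = RTLD_LOCAL | RTLD_NOW;

// Hypothetical helper: opens the library with the flags above.
void* open_for_hotload(const char* path) {
    return dlopen(path, kHotloadFlags);
}

// Hypothetical helper: resolves a symbol from this specific library only.
void* resolve(void* handle, const char* symbol) {
    return handle != nullptr ? dlsym(handle, symbol) : nullptr;
}
```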