Our app has an old codebase, originating in 2011, which started out as purely Objective-C (with a little Objective-C++), though a good amount of Swift has been added over time. There is a lot of Objective-C/Swift interop, but in general very few third-party libraries/frameworks. Like many other codebases of this size and age, we have accumulated a good amount of tech debt. In our case, that mostly takes the form of old/deprecated APIs (OpenGL chief among them), plus some ‘tricks’ that let us build highly customized UI popups and the like before iOS officially supported them, and which unfortunately are still in use today (e.g. adding views directly to the UIWindow so that they are ‘on top’ of everything, instead of presenting a VC). Overall, though, the app is very powerful and capable, and generally has a relatively low crash rate.
About two months ago, we started seeing new crashes that seemed totally unrelated to the code changes made at the time. Moreover, if a new branch with a feature or bug fix was merged in, the new crash would either disappear entirely or move somewhere else. These were not ‘normal’ crashes either: when hooked up to the debugger in Xcode, the crashes would often happen when calling into a system library (e.g. initializing a UIColor object).
Some of the steps taken to try and mitigate or eliminate these crashes include:
- Rolling back merges
  - This often worked, but most subsequent merges would then cause a new and different crash to appear
- Using the TSan and ASan tools to try to diagnose threading or memory issues
  - TSan reported a couple of issues near launch, which have been fixed. It reports others in some areas of the app, but those have been around a long time and don’t appear to correlate with any recent changes. Fixing the launch issues (and those found while testing to reproduce the crashes) did not eliminate the new crashes
  - ASan does not identify any issues
- Modifying a branch’s code changes before merging it in
  - In one case, the changes were limited to declaring an ‘@objc static var: Bool’ in a Swift class and assigning to it in a couple of places; simply removing the @objc from the declaration made the crash go away. Since the var had to be exposed to Objective-C, it was eventually moved to an existing pure Objective-C singleton class (not ideal, but it has been around a long time and has not yet been refactored) to preserve the functionality, and the crash was no longer reproducible
- Removing all third-party libraries and frameworks
  - This mostly worked, in that the crashes went away, but it is not a long-term solution: it also removed long-standing features our users expect
- Updating third-party libraries and frameworks where possible (some were very old)
  - This had no effect on the crashes, except that they moved around the same way they do when a branch is merged in; again, where a crash actually occurred was unrelated to the library or framework that was updated
- Changes to the app’s build settings in Xcode
  - Setting supported/valid architectures to arm64 only
  - Stripping all architectures other than arm64 from third-party binaries
  - Cleaning up old/outdated linker flags
  - Removing other custom build flags that were needed at one point but are no longer relevant
  - Generally trying to make all the build settings in our (quite old/outdated) app match those of a newly created iOS app
  - Setting Code Signing Inject Base Entitlements to YES
  - Removing the old/deprecated Bitcode flag
  - These changes seemed to help, and the codebase was more ‘stable’ (non-crashing) for a while, but as development continued, the crashes reappeared
- Pulling crash reports off test devices and analyzing them using Apple’s documentation on crash reports
  - This was helpful and pointed to new things to investigate, but ultimately did not identify the root cause of these crashes
Throughout all of the above, the crashes came and went. A given merged branch reproduced its crash very reliably, but merging a subsequent branch might make the crash disappear or simply move it elsewhere: sometimes into our code that calls other parts of our code, and other times into calls to system frameworks (like the UIColor example above). One thing is consistent, though: the crash never happens anywhere near the code that the merged branch changed or added.
Additional observations when trying to figure out the cause of these crashes:
- Sometimes the smallest code change determines whether a crash happens at all
- The crash reports generated on-device vary quite a bit in the type and reason for the crash
- All crashes have an Exception Type of EXC_BAD_ACCESS, but the signal varies between (SIGABRT), (SIGBUS), (SIGKILL), and (SIGSEGV)
- The crashing thread is often (but not always) Thread 0 (the main thread), and the first line in the backtrace is often just ‘???’, sometimes followed by a valid memory address and file, but frequently just ‘0x0 ???’
- Most crash reports have an exception subtype of KERN_PROTECTION_FAILURE
- Many also state that the Termination Reason is ‘CODESIGNING 2 Invalid Page’
  - We investigated this thoroughly, including reviewing the ‘Placing Content In A Bundle’ document, but even after further changes to ensure that everything is in the right place, the crashes were still observed
- Another oddity in most of the crash reports is a line in the Binary Images section that, once again, is mostly ???s or 0s: specifically ‘0x0 - 0xffffffffffffffff ??? unknown-arch <00000000000000000000000000000000> ???’
- The crashes occur across different physical devices and regardless of iOS version, typically with the same crash for a given branch
  - This holds for builds from different Macs as well. We did observe some differences between Xcode versions (a build from an older Xcode crashed in the same way, but one from a newer Xcode did not), but all developers have since made sure they are running Xcode 16.4. We also tried Xcode 26, and the crashes were still observed
Overall, something very strange seems to be going on in how the app binary is constructed: a small code change somehow alters the binary so that memory is not being accessed correctly, or is not where it is expected to be. What looks like a build-time issue manifesting as very strange run-time crashes is both confusing and difficult to diagnose. Despite the resources Apple provides for investigation and diagnosis, we cannot find a root cause and eliminate these crashes for good.
Quinn asked me if I could take a look at this, and I have to say this is going to be a tricky one to track down. Let me start with the basics of what's going on. Pulling from your first crash log, here are the crucial details:
Exception Type: EXC_BAD_ACCESS (SIGKILL)
Exception Subtype: KERN_PROTECTION_FAILURE at 0x0000000000000000
Exception Codes: 0x0000000000000002, 0x0000000000000000
...
Termination Reason: CODESIGNING 2 Invalid Page
...
Thread 4 name: Dispatch queue: assetsQueue
Thread 4 Crashed:
0 ??? 0x0 ???
1 Video Star 0x1012b34b0 __28-[ClipMixerView asyncRender]_block_invoke + 512
2 ...g_rt.asan_ios_dynamic.dylib 0x10559adf4 __wrap_dispatch_async_block_invoke + 196
3 libdispatch.dylib 0x19aacaaac _dispatch_call_block_and_release + 32
4 libdispatch.dylib 0x19aae4584 _dispatch_client_callout + 16
5 libdispatch.dylib 0x19aad32d0 _dispatch_lane_serial_drain + 740
6 libdispatch.dylib 0x19aad3dac _dispatch_lane_invoke + 388
7 libdispatch.dylib 0x19aade1dc _dispatch_root_queue_drain_deferred_wlh + 292
8 libdispatch.dylib 0x19aadda60 _dispatch_workloop_worker_thread + 540
9 libsystem_pthread.dylib 0x21cfe0a0c _pthread_wqthread + 292
10 libsystem_pthread.dylib 0x21cfe0aac start_wqthread + 8
As Quinn suggested, this is definitely a memory corruption bug; however, it's not the "conventional" type. The standard memory corruption issue is that your app attempts to read or write memory that's no longer valid, meaning it's interacting with memory as "data" (for example, "reading" from a NULL pointer). That's NOT what's happening here: you didn't try to read NULL, you tried to "run" NULL.
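To make the data-versus-code distinction concrete, here is a minimal C sketch (C because Objective-C is a superset of it, and indirect calls such as invoking a block ultimately jump through a stored address). `RenderContext`, `on_frame_ready`, and `draw_frame` are hypothetical names for illustration only. The NULL check lets the example show the failure without crashing; it is not a fix, since the real problem is whatever wrote 0 into the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type and owner; these names are not from the app. */
typedef int (*frame_callback)(int frame_index);

typedef struct {
    int generation;                /* ordinary data */
    frame_callback on_frame_ready; /* one pointer-sized slot holding a code address */
} RenderContext;

/* Placeholder work standing in for real rendering. */
static int draw_frame(int frame_index) {
    return frame_index * 2;
}

/*
 * Calls through the slot, but refuses to "run NULL".
 * Dereferencing a NULL ctx would be the "conventional" data fault.
 * Calling a slot that holds 0 instead branches to address 0, so the
 * report typically shows frame 0 as "0x0 ???" and the caller as frame 1.
 */
static int invoke_on_frame_ready(const RenderContext *ctx, int frame_index) {
    assert(ctx != NULL);
    if (ctx->on_frame_ready == NULL) {
        fprintf(stderr, "on_frame_ready is NULL (generation %d)\n",
                ctx->generation);
        return -1;
    }
    return ctx->on_frame_ready(frame_index);
}
```

In the crash above, frame 1 (the asyncRender block at +512) plays the role of `invoke_on_frame_ready` without the check.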
With that context:
> Using the TSan and ASan tools to try and diagnose thread or memory issues
I don't think either of those tools could really catch this issue, nor are they really designed to. At the lowest level, the "bug" here is that NULL (the address you attempted to execute) is being assigned (unexpectedly) to a single UInt64 (the function pointer you executed "from"). That's basically looking for a needle in a pile full of needles.
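One way to act on "a single slot unexpectedly becomes NULL" is to keep a redundant copy of the value somewhere else and compare it before each call. This is a suggestion of mine, not something derived from the crash data, and all names here are hypothetical. It only covers a slot you already suspect, and a mismatch tells you *that* the slot changed, not *who* changed it. Combined with narrowing the search, though, it can turn a "0x0 ???" crash into a logged failure at a known point. A minimal C sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*completion_fn)(void *arg);

/* Hypothetical holder for the one function pointer under suspicion. */
typedef struct {
    completion_fn fn;
} WatchedSlot;

/*
 * Shadow copy stored as the bitwise complement, in separate static storage,
 * so a stray write that zeroes the slot (or even both locations) is detected.
 * This sketch tracks a single slot.
 */
static uintptr_t g_shadow_complement;

static void slot_set(WatchedSlot *slot, completion_fn fn) {
    assert(slot != NULL);
    slot->fn = fn;
    g_shadow_complement = ~(uintptr_t)fn;
}

static int slot_is_intact(const WatchedSlot *slot) {
    return (uintptr_t)slot->fn == ~g_shadow_complement;
}

/* Returns 0 after calling fn, -1 if the slot was modified elsewhere or is NULL. */
static int slot_call(const WatchedSlot *slot, void *arg) {
    if (!slot_is_intact(slot)) {
        fprintf(stderr, "slot %p changed outside slot_set\n", (const void *)slot);
        return -1;
    }
    if (slot->fn == NULL) {
        return -1;
    }
    slot->fn(arg);
    return 0;
}

/* Example callback: counts how many times it ran. */
static void count_call(void *arg) {
    ++*(int *)arg;
}
```

Storing the complement rather than a plain copy matters: if a stray write zeroes both the slot and the shadow, the two still disagree.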
Even worse:
> It seems that almost any change to the code, or which files/libraries/frameworks are included in the build, will change how the crashes manifest, or in some cases, do not manifest.
No "seems" about it, that's exactly what's happening. Any change to your code is going to "rearrange" the internal details of your code, changing what happens. Tracking down a crash like this can be very tricky, but I do have a few ideas and suggestions.
First off, I would take a look at every crash log you've seen that seems "tied" to this issue, ESPECIALLY across different configurations. What you're looking for here isn't the specific cause, but any kind of "pattern" that connects them. What are you interacting with when you crash? And what kind of connection is there between those objects "across" the different crash flavors?
Related to that point, one key question you need to answer is how "dynamic" this crash actually is. The crash logs you posted are actually crashing at four distinct locations from four different builds (note the build UUIDs):
(1)
1 Video Star 0x1012b34b0 __28-[ClipMixerView asyncRender]_block_invoke + 512
...
"uuid" : "c57078b4-a9d4-33b5-b6e3-679c5a0bcecc",
"path" : "...9592CCED-E244-42DE-A786-137F4E78C072\/Video Star.app\/Video Star",
(2)
1 Video Star 0x10038f06c __28-[ClipMixerView asyncRender]_block_invoke + 180
...
"uuid" : "b5de6b1a-acce-3258-8c14-bb00054643e3",
"path" : "...46E2DB5C-D926-4497-8696-5A94003CEC44\/Video Star.app\/Video Star",
(3)
1 Video Star 0x102aa12cc -[VideoPreviewVC(Rendering) finishProcessingNewPreviewVideoFrame:currentTime:] + 460
...
0x1028b8000 - 0x1047bbfff Video Star arm64 <479d575c547e3836817e16132cbe7616>
(4)
1 Video Star 0x1008136b0 -[VideoPreviewVC(Rendering) finishProcessingNewPreviewVideoFrame:currentTime:] + 244
...
0x1006dc000 - 0x101747fff Video Star arm64 <4a1bc7bd702f3dfa83095ee61bace7b5>
Crashes like this are generally not truly "random". Typically, you're either crashing at the same "exact" spot or in a small number of distinct "spots", often with different frequencies. Those patterns often provide "hints" as to what's actually causing the failure, especially if the crash occurs in very different parts of your app.
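A quick way to see how many distinct "spots" you really have is to group crash frames by symbol, dropping the address and the "+ offset", which aren't comparable across builds with different UUIDs. Here is a minimal C sketch over the four frame-1 lines quoted above. The parsing assumes the `<frame> <image> 0x<address> <symbol> + <offset>` layout shown in these excerpts; it is not a general crash-report parser, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 16
#define MAX_SYMBOL_LEN 128

typedef struct {
    char symbol[MAX_SYMBOL_LEN];
    int count;
} SymbolCount;

/*
 * Copies "<symbol>" out of "<frame> <image> 0x<address> <symbol> + <offset>".
 * Returns 0 on success, -1 if the line doesn't have that shape.
 */
static int extract_symbol(const char *line, char *out, size_t out_len) {
    const char *addr = strstr(line, " 0x");      /* start of the address */
    if (addr == NULL) return -1;
    const char *sym = strchr(addr + 1, ' ');     /* space after the address */
    if (sym == NULL) return -1;
    sym++;
    const char *plus = strstr(sym, " + ");       /* drop the build-specific offset */
    size_t len = plus ? (size_t)(plus - sym) : strlen(sym);
    if (len == 0 || len >= out_len) return -1;
    memcpy(out, sym, len);
    out[len] = '\0';
    return 0;
}

/* Counts frames per symbol; returns the number of distinct symbols found. */
static int group_frames(const char *const *lines, size_t n,
                        SymbolCount *groups, int max_groups) {
    assert(lines != NULL || n == 0);
    int used = 0;
    for (size_t i = 0; i < n; i++) {
        char sym[MAX_SYMBOL_LEN];
        if (extract_symbol(lines[i], sym, sizeof sym) != 0) continue;
        int j = 0;
        while (j < used && strcmp(groups[j].symbol, sym) != 0) j++;
        if (j == used) {
            if (used == max_groups) continue;    /* table full: skip */
            strcpy(groups[used].symbol, sym);    /* length checked above */
            groups[used].count = 0;
            used++;
        }
        groups[j].count++;
    }
    return used;
}
```

Run over those four lines, it reports two symbols with two hits each (the `asyncRender` block and `finishProcessingNewPreviewVideoFrame:currentTime:`), which is the kind of small, repeating set described here.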
On the topic of patterns, I always like to look at the timestamps of the logs, as they sometimes contain interesting (and possibly useful) clues. In this case, the first two crashes happened after exactly 3s, and the other two at 18s and 12s. That consistency could actually be an interesting benefit of ASan (the first log shows the ASan runtime in its backtrace): even if it doesn't catch the crash itself, making the crash more consistent is very useful.
Moving to a more "active" investigation, I would start by picking a particular configuration and focusing all of the investigation on it. The key, however, is that your focus is on FINDING the crash, NOT fixing it. You want to treat the crash as a "stable" focus point that you're trying to "preserve", not something you're trying to eliminate.
How the investigation goes from there depends on the specifics of how the crash plays out and how our tools disrupt that process. If the crash is predictable and the debugger doesn't disrupt it, then you can use breakpoints/watchpoints to monitor the point you "know" will crash, and slowly narrow your search to the point where you "see" the change occur.
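The narrowing described here can also be approximated in code, which may help when a watchpoint is hard to place (for example, because the slot's address changes from run to run). Below is a minimal C sketch of "checkpoints": each one records whether a suspect slot is still non-NULL, so the first failing checkpoint brackets the write between it and the last passing one. The names are hypothetical, and a write from another thread can land anywhere between checkpoints, so treat the window as a hint to refine, not proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*callback_fn)(void);

/* Hypothetical: the slot you suspect, registered once it is validly set. */
static const callback_fn *g_watched = NULL;
static const char *g_last_good = NULL;   /* last checkpoint that saw a valid value */
static const char *g_first_bad = NULL;   /* first checkpoint that saw NULL */

static void watch_slot(const callback_fn *slot) {
    g_watched = slot;
    g_last_good = NULL;
    g_first_bad = NULL;
}

/*
 * Call checkpoint("label") at points along the suspect path.
 * The write that zeroed the slot happened between g_last_good and
 * g_first_bad; add more checkpoints inside that window and repeat.
 */
static void checkpoint(const char *label) {
    assert(label != NULL);
    if (g_watched == NULL || g_first_bad != NULL) {
        return;  /* nothing registered, or the window is already found */
    }
    if (*g_watched == NULL) {
        g_first_bad = label;
        fprintf(stderr, "slot became NULL after '%s', before '%s'\n",
                g_last_good ? g_last_good : "(start)", label);
    } else {
        g_last_good = label;
    }
}

/* Stand-in for the real callback. */
static void render_done(void) {}
```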
However, it's also possible that the debugger itself will disrupt the investigation too much to be useful. In that case, carefully added logging might get you the information you need. One technique I've had success with is using targeted code changes to gather information that I can then feed "back" to the debugger. For example, you can determine how many times a block is called "before" a given crash by adding a static int to your app and incrementing it AFTER the point where you know you're going to crash; you can then check its value in the debugger at the point you actually crash. If the timing is reliable, that can let you breakpoint your app just BEFORE you actually crash.
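The counter technique might look something like this minimal C sketch. `g_completed_calls`, `g_break_on_call`, and `render_block_body` are hypothetical names; in the real app the counter would live in the block that crashes. The backtrace's `_dispatch_lane_serial_drain` frame suggests `assetsQueue` is serial, so a plain counter is probably enough; if the block can run concurrently, an atomic counter would be needed.

```c
#include <assert.h>
#include <limits.h>
#include <signal.h>

/* Hypothetical instrumentation; these names are not from the app. */
static volatile unsigned long g_completed_calls = 0;

/*
 * Run 1: leave at ULONG_MAX, let it crash, and read g_completed_calls in
 * the debugger (call the value N).
 * Run 2: set this to N (hard-code it, or from lldb with
 * `expr g_break_on_call = N`) to stop at the top of the crashing call.
 */
static volatile unsigned long g_break_on_call = ULONG_MAX;

static int should_stop(unsigned long completed, unsigned long target) {
    return completed == target;
}

static void render_block_body(void) {
    if (should_stop(g_completed_calls, g_break_on_call)) {
        /* Stops under a debugger; without one attached, this terminates the process. */
#if defined(__clang__)
        __builtin_debugtrap();
#else
        raise(SIGTRAP);
#endif
    }
    /* ... existing code that crashes goes here ... */
    g_completed_calls++;  /* incremented AFTER the crash point */
}
```

Because the increment sits after the crash point, the value read at the crash equals the number of calls that completed, which is also the counter value at the start of the crashing call.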
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware