macOS 15.x crashes in MetalPerformanceShadersGraph

Our app uses CoreML. Ever since macOS 15.x was released, we have been getting a large number of crashes like this:

Incident Identifier: 424041c3-884b-4e50-bb5a-429a83c3e1c8
CrashReporter Key:   B914246B-1291-4D44-984D-EDF84B52310E
Hardware Model:      Mac14,12
Process:         <REMOVED> [1509]
Path:            /Applications/<REMOVED>
Identifier:      com.<REMOVED>
Version:         <REMOVED>
Code Type:       arm64
Parent Process:  launchd [1]

Date/Time:       2024-11-13T13:23:06.999Z
Launch Time:     2024-11-13T13:22:19Z
OS Version:      Mac OS X 15.1.0 (24B83)
Report Version:  104

Exception Type:  SIGABRT
Exception Codes: #0 at 0x189042600
Crashed Thread:  36

Thread 36 Crashed:
0   libsystem_kernel.dylib               0x0000000189042600 __pthread_kill + 8
1   libsystem_c.dylib                    0x0000000188f87908 abort + 124
2   libsystem_c.dylib                    0x0000000188f86c1c __assert_rtn + 280
3   Metal                                0x0000000193fdd870 MTLReportFailure.cold.1 + 44
4   Metal                                0x0000000193fb9198 MTLReportFailure + 444
5   MetalPerformanceShadersGraph         0x0000000222f78c80 -[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 296
6   Espresso                             0x00000001a290ae3c E5RT::SharedResourceFactory::GetMPSGraphExecutable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 932
.
.
.
43  CoreML                               0x0000000192d263bc -[MLModelAsset modelWithConfiguration:error:] + 120
44  CoreML                               0x0000000192da96d0 +[MLModel modelWithContentsOfURL:configuration:error:] + 176
45  <REMOVED>                            0x000000010497b758 -[<REMOVED> <REMOVED>] (<REMOVED>)

No similar crashes on macOS 12-14!

Incident Identifier: 424041c3-884b-4e50-bb5a-429a83c3e1c8
CrashReporter Key:   B914246B-1291-4D44-984D-EDF84B52310E
Hardware Model:      Mac14,12
Process:         <REMOVED> [1509]
Path:            /Applications/<REMOVED>
Identifier:      com.<REMOVED>
Version:         <REMOVED>
Code Type:       arm64
Parent Process:  launchd [1]

Date/Time:       2024-11-13T13:23:06.999Z
Launch Time:     2024-11-13T13:22:19Z
OS Version:      Mac OS X 15.1.0 (24B83)
Report Version:  104

Exception Type:  SIGABRT
Exception Codes: #0 at 0x189042600
Crashed Thread:  36

Thread 36 Crashed:
0   libsystem_kernel.dylib               0x0000000189042600 __pthread_kill + 8
1   libsystem_c.dylib                    0x0000000188f87908 abort + 124
2   libsystem_c.dylib                    0x0000000188f86c1c __assert_rtn + 280
3   Metal                                0x0000000193fdd870 MTLReportFailure.cold.1 + 44
4   Metal                                0x0000000193fb9198 MTLReportFailure + 444
5   MetalPerformanceShadersGraph         0x0000000222f78c80 -[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 296
6   Espresso                             0x00000001a290ae3c E5RT::SharedResourceFactory::GetMPSGraphExecutable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 932
7   Espresso                             0x00000001a290d13c E5RT::SharedResourceManager::GetOrCreateResource(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, E5RT::SharedResourceType, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 668
8   Espresso                             0x00000001a28acc7c E5RT::Ops::MpsGraphInferenceOperation::Impl::PrepareOpForEncode() + 820
9   Espresso                             0x00000001a28b4658 E5RT::Ops::MpsGraphInferenceOperation::PrepareOpForEncode() + 80
10  Espresso                             0x00000001a28beb78 E5RT::Ops::PreCompiledComputeOperation::Impl::PrepareOpForEncode() + 280
11  Espresso                             0x00000001a28c3c60 E5RT::Ops::PreCompiledComputeOperation::PrepareOpForEncode() + 396
12  Espresso                             0x00000001a28c5054 E5RT::Ops::PreCompiledComputeOperation::CreatePreCompiledComputeOp(E5RT::PrecompiledComputeOpCreateOptions const&, std::__1::unordered_map, std::__1::allocator >, std::__1::shared_ptr, std::__1::hash, std::__1::allocator > >, std::__1::equal_to, std::__1::allocator > >, std::__1::allocator, std::__1::allocator > const, std::__1::shared_ptr > > >&&, std::__1::unordered_map, std::__1::allocator >, std::__1::shared_ptr, std::__1::hash, std::__1::allocator > >, std::__1::equal_to, std::__1::allocator > >, std::__1::allocator, std::__1::allocator > const, std::__1::shared_ptr > > >&&, std::__1::unordered_map, std::__1::allocator >, std::__1::shared_ptr, std::__1::hash, std::__1::allocator > >, std::__1::equal_to, std::__1::allocator > >, std::__1::allocator, std::__1::allocator > const, std::__1::shared_ptr > > >&&) + 432
13  Espresso                             0x00000001a28c4aa0 E5RT::Ops::PreCompiledComputeOperation::CreatePreCompiledComputeOp(E5RT::PrecompiledComputeOpCreateOptions const&) + 1792
14  Espresso                             0x00000001a28e6b3c E5RT::ExecutionStreamOperation::CreatePreCompiledComputeOp(E5RT::PrecompiledComputeOpCreateOptions const&) + 28
15  Espresso                             0x00000001a2841e50 std::__1::__shared_ptr_pointer, std::__1::allocator >::__on_zero_shared_weak() + 240
16  Espresso                             0x00000001a283ca18 E5RT::ExceptionSafeExecute(std::__1::function) + 56
17  Espresso                             0x00000001a283fd1c e5rt_execution_stream_operation_create_precompiled_compute_operation_with_options + 76
18  CoreML                               0x0000000192e2fe2c __101-[MLE5ProgramLibrary createOperationForFunctionName:forceRespecialization:hasRangeShapeInputs:error:]_block_invoke + 1580
19  libdispatch.dylib                    0x0000000188ec8658 _dispatch_client_callout + 16
20  libdispatch.dylib                    0x0000000188ed7cd8 _dispatch_lane_barrier_sync_invoke_and_complete + 52
21  CoreML                               0x0000000192e2f748 -[MLE5ProgramLibrary createOperationForFunctionName:forceRespecialization:hasRangeShapeInputs:error:] + 276
22  CoreML                               0x0000000192e0fd8c -[MLE5ExecutionStreamOperation _createOperationWithRetryCount:error:] + 184
23  CoreML                               0x0000000192e11948 -[MLE5ExecutionStreamOperation preloadAndReturnError:] + 56
24  CoreML                               0x0000000192c7a4e4 __80-[MLE5StaticShapeExecutionStreamOperationPool prepareWithInitialPoolSize:error:]_block_invoke + 168
25  libdispatch.dylib                    0x0000000188ec8658 _dispatch_client_callout + 16
26  libdispatch.dylib                    0x0000000188ed7cd8 _dispatch_lane_barrier_sync_invoke_and_complete + 52
27  CoreML                               0x0000000192c7a398 -[MLE5StaticShapeExecutionStreamOperationPool prepareWithInitialPoolSize:error:] + 220
28  CoreML                               0x0000000192da5c50 -[MLE5Engine prepareWithConcurrencyHint:error:] + 56
29  CoreML                               0x0000000192d7de4c prepareEngine(id, long, NSError* __autoreleasing*) + 124
30  CoreML                               0x0000000192d7e578 -[MLDelegateModel initWithEngine:error:] + 324
31  CoreML                               0x0000000192d4a1ac +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 560
32  CoreML                               0x0000000192d47ac0 +[MLLoader _loadModelFromArchive:configuration:modelVersion:compilerVersion:loaderEvent:useUpdatableModelLoaders:loadingClasses:error:] + 568
33  CoreML                               0x0000000192d485fc +[MLLoader _loadWithModelLoaderFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 464
34  CoreML                               0x0000000192d48ddc +[MLLoader _loadModelFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 504
35  CoreML                               0x0000000192d4b55c +[MLLoader _loadModelFromAssetAtURL:configuration:loaderEvent:error:] + 248
36  CoreML                               0x0000000192d4b798 +[MLLoader loadModelFromAssetAtURL:configuration:error:] + 108
37  CoreML                               0x0000000192c964d0 -[MLModelAssetResourceFactoryOnDiskImpl modelWithConfiguration:error:] + 124
38  CoreML                               0x0000000192def798 __60-[MLModelAssetResourceFactory modelWithConfiguration:error:]_block_invoke + 72
39  libdispatch.dylib                    0x0000000188ec8658 _dispatch_client_callout + 16
40  libdispatch.dylib                    0x0000000188ed7cd8 _dispatch_lane_barrier_sync_invoke_and_complete + 52
41  CoreML                               0x0000000192def65c -[MLModelAssetResourceFactory modelWithConfiguration:error:] + 284
42  CoreML                               0x0000000192e04060 -[MLModelAssetModelVendor modelWithConfiguration:error:] + 168
43  CoreML                               0x0000000192d263bc -[MLModelAsset modelWithConfiguration:error:] + 120
44  CoreML                               0x0000000192da96d0 +[MLModel modelWithContentsOfURL:configuration:error:] + 176
45  <REMOVED>                            0x000000010497b758 -[<REMOVED> <REMOVED>] (<REMOVED>)

Any clue what is causing this?

Thanks! :)

I don’t see an easy way to debug this with the info you have available. Consider this tiny test project:

@import Foundation;
@import MetalPerformanceShadersGraph;

int main(int argc, char **argv) {
    [[MPSGraphExecutable alloc] initWithMPSGraphPackageAtURL:nil compilationDescriptor:nil];
    return EXIT_SUCCESS;
}

It crashes with a similar backtrace.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #0: 0x0000000186dd2600 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000186e0af70 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000186d17908 libsystem_c.dylib`abort + 128
    frame #3: 0x0000000186d16c1c libsystem_c.dylib`__assert_rtn + 284
  * frame #4: 0x0000000191d6d870 Metal`MTLReportFailure.cold.1 + 48
    frame #5: 0x0000000191d49198 Metal`MTLReportFailure + 448
    frame #6: 0x0000000220d08c80 MetalPerformanceShadersGraph`-[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 300
    frame #7: 0x0000000100003f14 Test769129`main + 60
    frame #8: 0x0000000186a88274 dyld`start + 2840

It also prints a handy-dandy error, Error: did not find file at url: (null). However, when you disassemble the code you’ll see that the error is coming from a helper method, -initWithMPSGraphPackageAtURLCommon:compilationDescriptor:error:. So, the actual error could be anything, and there’s no way to tell from this backtrace what actually got printed.

I’m not an expert in Metal or MPS, but my reading of MTLReportFailure is that it might record this failure in the system log. If so, you might be able to make progress on this from a sysdiagnose log captured by the user shortly after seeing the crash. I talk about this more in Using a Sysdiagnose Log to Debug a Hard-to-Reproduce Problem.

Also, if you have access to a JSON crash report (.ips), please post it here. I might be able to learn more from that. See Posting a Crash Report for advice on how to post a crash report.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

According to its declaration, the initWithMPSGraphPackageAtURL:compilationDescriptor: method does not accept a nil URL. That is probably why the app crashes in the provided example.

The problem in our case, however, is that we are not using MPSGraphExecutable directly; we are loading a CoreML model. MPS seems to be used under the hood, and we have no control over it.

What we already know is that CoreML creates an MPS graph package somewhere under the ~/Library/Caches directory when it loads the model for the first time. On subsequent loads it tries to read the graph back from that cache, and that is the moment when it crashes the app instead of handling the error. Naively, we would expect CoreML to return an NSError from the +[MLModel modelWithContentsOfURL:configuration:error:] method instead of crashing, but that does not happen.
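
For reference, here is a minimal sketch of the kind of load call we mean. The helper name LoadModelCPUOnly is hypothetical, and restricting the compute units to the CPU is only an experiment one could try: we are assuming, without having verified it, that a CPU-only configuration keeps CoreML away from the MPSGraph path seen in the backtrace. Note that the NSError out-parameter only covers failures CoreML chooses to report; an assertion that ends in abort() bypasses it entirely.

```objc
@import Foundation;
@import CoreML;

// Hypothetical helper: loads a compiled model (.mlmodelc) and reports
// failures through the NSError out-parameter.
// Assumption (untested): a CPU-only configuration avoids the GPU/MPSGraph
// code path that asserts in the crash report above.
static MLModel * _Nullable LoadModelCPUOnly(NSURL *compiledModelURL,
                                            NSError * _Nullable * _Nullable error) {
    MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
    config.computeUnits = MLComputeUnitsCPUOnly;
    // Returns nil and fills in *error for failures CoreML reports itself.
    // It cannot turn an abort() inside the framework into an NSError.
    return [MLModel modelWithContentsOfURL:compiledModelURL
                             configuration:config
                                     error:error];
}
```

If the crashes disappear with this configuration, that would at least narrow the problem down to the GPU path, at the cost of slower inference.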

We've also noticed that these caches take up a lot of disk space. In our case it's hundreds of megabytes on Sequoia, whereas on previous OS versions it was only a few megabytes. Could it be that there is simply not enough room for all the caches on a user's machine, so that they get corrupted?
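
To check the disk-space theory on affected machines, something like the following sketch could be logged at launch. The function names are hypothetical, and since we do not know the exact subdirectory CoreML uses, the sketch simply measures the whole user Caches directory and the free space on its volume.

```objc
@import Foundation;

// Sums the on-disk (allocated) size of every regular file below directoryURL.
static unsigned long long DirectorySizeInBytes(NSURL *directoryURL) {
    NSArray<NSURLResourceKey> *keys = @[NSURLIsRegularFileKey,
                                        NSURLTotalFileAllocatedSizeKey];
    NSDirectoryEnumerator<NSURL *> *enumerator =
        [[NSFileManager defaultManager] enumeratorAtURL:directoryURL
                             includingPropertiesForKeys:keys
                                                options:0
                                           errorHandler:nil];
    unsigned long long total = 0;
    for (NSURL *fileURL in enumerator) {
        NSDictionary<NSURLResourceKey, id> *values =
            [fileURL resourceValuesForKeys:keys error:NULL];
        if ([values[NSURLIsRegularFileKey] boolValue]) {
            total += [values[NSURLTotalFileAllocatedSizeKey] unsignedLongLongValue];
        }
    }
    return total;
}

// Hypothetical helper: logs the Caches directory size and the free space
// on the volume that holds it.
static void LogCacheUsage(void) {
    NSURL *caches = [[NSFileManager defaultManager]
        URLsForDirectory:NSCachesDirectory inDomains:NSUserDomainMask].firstObject;
    if (caches == nil) { return; }
    NSDictionary<NSURLResourceKey, id> *volume =
        [caches resourceValuesForKeys:@[NSURLVolumeAvailableCapacityKey] error:NULL];
    NSLog(@"caches: %llu bytes, free on volume: %@ bytes",
          DirectorySizeInBytes(caches), volume[NSURLVolumeAvailableCapacityKey]);
}
```

Comparing these two numbers between crashing and non-crashing users would show whether low free space correlates with the crash at all.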
