CoreML MLE5ProgramLibrary AOT recompilation hangs/crashes on iOS 26.4 — C++ exception in espresso IR compiler bypasses Swift error handling

Question

Created 1d

Area: CoreML / Machine Learning

Describe the issue:
On iOS 26.4, calling MLModel(contentsOf:configuration:) to load an .mlpackage model hangs indefinitely, and the app is eventually killed by the watchdog. The same model loads and runs inference successfully in under 1 second on iOS 26.3.1.

The hang occurs inside eort_eo_compiler_compile_from_ir_program (espresso) during on-device AOT recompilation triggered by -[MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:]. A C++ exception is thrown (__cxa_throw) inside libBNNS.dylib, and the exception unwind then hangs inside __cxxabiv1::dyn_cast_slow and __class_type_info::search_below_dst.

Swift's do/catch does not catch this: the exception originates in C++, and the process hangs rather than terminating cleanly.

Setting config.computeUnits = .cpuOnly does not resolve the issue. MLE5ProgramLibrary initialises as shared infrastructure regardless of compute units.
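
For reference, the CPU-only load attempt looks roughly like the sketch below. The function name attemptCPUOnlyLoad is illustrative only (it is not from the app or from CoreML), and the CoreML parts are guarded so the file still compiles where the framework is unavailable.

```swift
import Foundation
#if canImport(CoreML)
import CoreML

/// Attempts a CPU-only load and reports the outcome as text.
/// `attemptCPUOnlyLoad` is a hypothetical name used for illustration.
func attemptCPUOnlyLoad(at modelURL: URL) -> String {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuOnly   // as reported, this does not avoid the hang
    do {
        _ = try MLModel(contentsOf: modelURL, configuration: config)
        return "loaded"
    } catch {
        // Only failures surfaced as Swift errors arrive here. A C++ exception
        // inside the framework never reaches this branch.
        return "failed: \(error.localizedDescription)"
    }
}
#endif
```

On iOS 26.4 neither branch is reached for the affected model: the call simply never returns.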

Steps to reproduce:

1. Create an app with an .mlpackage CoreML model using the MLE5/espresso backend
2. Call MLModel(contentsOf: modelURL, configuration: config) at runtime
3. Run on a device on iOS 26.3.1: loads successfully in <1 second
4. Update device to iOS 26.4: hangs indefinitely, app killed by watchdog after 60–745 seconds

Expected behaviour:
Model loads successfully, or throws a catchable Swift error on failure.

Actual behaviour:
Process hangs in MLE5ProgramLibrary.lazyInitQueue. App killed by watchdog. No Swift error thrown.

Full stack trace at point of hang:
Thread 1 Queue: com.apple.coreml.MLE5ProgramLibrary.lazyInitQueue (serial)
frame 0: __cxxabiv1::__class_type_info::search_below_dst libc++abi.dylib
frame 1: __cxxabiv1::(anonymous namespace)::dyn_cast_slow libc++abi.dylib
frame 2: ___lldb_unnamed_symbol_23ab44dd4 libBNNS.dylib
frame 23: eort_eo_compiler_compile_from_ir_program espresso
frame 24: -[MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:] CoreML
frame 25: -[MLE5ProgramLibrary _programLibraryHandleWithForceRespecialization:error:] CoreML
frame 26: __44-[MLE5ProgramLibrary prepareAndReturnError:]_block_invoke CoreML
frame 27: _dispatch_client_callout libdispatch.dylib
frame 28: _dispatch_lane_barrier_sync_invoke_and_complete libdispatch.dylib
frame 29: -[MLE5ProgramLibrary prepareAndReturnError:] CoreML
frame 30: -[MLE5Engine initWithContainer:configuration:error:] CoreML
frame 31: +[MLE5Engine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML
frame 32: +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML
frame 45: +[MLModel modelWithContentsOfURL:configuration:error:] CoreML
frame 46: @nonobjc MLModel.__allocating_init(contentsOf:configuration:) GKPersonalV2
frame 47: MDNA_GaitEncoder_v1_3.__allocating_init(contentsOf:configuration:)
frame 48: MDNA_GaitEncoder_v1_3.__allocating_init(configuration:)
frame 50: GaitModelInference.loadModel()
frame 51: GaitModelInference.init()

iOS version: Reproduced on iOS 26.4. Works correctly on iOS 26.3.1.
Xcode version: 26.2
Device: iPhone (model used in testing)
Model format: .mlpackage

Answer 1

eddiewangyw OP

5h

I've hit a very similar issue with CoreML model loading hanging on the MLE5ProgramLibrary.lazyInitQueue after OS updates. A few things that helped me work around it:

1. Pre-compile to .mlmodelc instead of loading .mlpackage at runtime

The AOT recompilation path (which is what's hanging) gets triggered when the on-device compiled cache is invalidated by the OS update. If you ship a pre-compiled .mlmodelc built with the matching Xcode/SDK version, it often skips recompilation entirely:

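import CoreML

// Compile on device once (e.g. at first launch). The result is written to a
// temporary location, so move it somewhere permanent to reuse it across launches.
let compiledURL = try MLModel.compileModel(at: mlpackageURL)
// Then load from the compiled .mlmodelc
let model = try MLModel(contentsOf: compiledURL, configuration: config)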
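
To reuse the compiled model across launches, the .mlmodelc has to be moved out of the temporary location that compileModel(at:) returns. A minimal sketch, assuming an Application Support subfolder named CompiledModels; the helper names persistCompiledModel and compileAndPersist are mine, not CoreML API. I have not verified that this avoids the respecialization path on iOS 26.4.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Moves a freshly compiled model out of its temporary location into `directory`,
/// replacing any older copy with the same name. Returns the permanent URL.
/// `persistCompiledModel` is a hypothetical helper name, not a CoreML API.
func persistCompiledModel(at temporaryURL: URL, into directory: URL) throws -> URL {
    let fileManager = FileManager.default
    try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
    let destination = directory.appendingPathComponent(temporaryURL.lastPathComponent)
    if fileManager.fileExists(atPath: destination.path) {
        try fileManager.removeItem(at: destination)
    }
    try fileManager.moveItem(at: temporaryURL, to: destination)
    return destination
}

#if canImport(CoreML)
/// Compiles the package on device and keeps the result in Application Support.
/// The "CompiledModels" folder name is an assumption for this sketch.
func compileAndPersist(packageAt packageURL: URL) throws -> URL {
    let temporaryURL = try MLModel.compileModel(at: packageURL)
    let support = try FileManager.default.url(for: .applicationSupportDirectory,
                                              in: .userDomainMask,
                                              appropriateFor: nil,
                                              create: true)
    return try persistCompiledModel(at: temporaryURL,
                                    into: support.appendingPathComponent("CompiledModels"))
}
#endif
```

On later launches you can check whether the persisted .mlmodelc already exists and load it directly instead of compiling again.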

2. Load on a background thread with a timeout

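Since the load is synchronous and the C++ exception bypasses Swift error handling, racing the load against a timer at least lets you fail gracefully instead of getting watchdog-killed. A Task wrapping the load cannot actually be cancelled once it is stuck, and a task group waits for all of its children before returning, so the sketch below runs the load on a background queue and resumes a continuation with whichever finishes first. The stuck load keeps its thread, but the caller gets control back:

import CoreML
struct ModelLoadTimeout: Error {}

// Race the uncancellable load against a timer; the first result wins.
func loadModel(at url: URL, configuration: MLModelConfiguration,
               timeout: TimeInterval = 30) async throws -> MLModel {
    try await withCheckedThrowingContinuation { continuation in
        let lock = NSLock()
        var resumed = false
        func finish(_ result: Result<MLModel, Error>) {
            lock.lock(); defer { lock.unlock() }
            if resumed { return }   // the loser of the race is ignored
            resumed = true
            continuation.resume(with: result)
        }
        DispatchQueue.global(qos: .userInitiated).async {
            finish(Result { try MLModel(contentsOf: url, configuration: configuration) })
        }
        DispatchQueue.global().asyncAfter(deadline: .now() + timeout) {
            finish(.failure(ModelLoadTimeout()))
        }
    }
}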

3. Delete the CoreML cache

The stale AOT cache seems to be the trigger. Clearing Library/Caches/com.apple.coreml before loading sometimes forces a clean recompilation that succeeds. Obviously not ideal for production, but useful for diagnosing whether it's a cache corruption issue vs. a compiler bug.
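
A minimal sketch of that cache reset, for diagnostics only. It assumes the cache lives at Library/Caches/com.apple.coreml inside the app sandbox, as described above; clearCoreMLCache is a hypothetical helper name, and the caches directory is passed in so the function can be exercised against any folder.

```swift
import Foundation

/// Removes `com.apple.coreml` under the given caches directory, if present.
/// Returns true when something was deleted.
/// `clearCoreMLCache` is a hypothetical helper name, not a system API.
@discardableResult
func clearCoreMLCache(in cachesDirectory: URL) throws -> Bool {
    let cacheURL = cachesDirectory.appendingPathComponent("com.apple.coreml", isDirectory: true)
    guard FileManager.default.fileExists(atPath: cacheURL.path) else { return false }
    try FileManager.default.removeItem(at: cacheURL)
    return true
}

// Usage in an app: resolve the sandboxed Library/Caches directory first.
// if let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first {
//     try clearCoreMLCache(in: caches)
// }
```

Call it before the first model load of the session. If the load then succeeds, that points toward a stale cache rather than a compiler bug.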

Strongly agree this should be filed as a Feedback — the fact that a C++ exception in espresso/BNNS hangs rather than propagating as an NSError is itself a bug regardless of the AOT issue.
