Integrate machine learning models into your app using Core ML.

Core ML Documentation

Posts under Core ML subtopic

Post

Replies

Boosts

Views

Activity

Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
0
0
245
2w
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
1
0
192
1w
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
0
0
180
1w
CoreML MLE5ProgramLibrary AOT recompilation hangs/crashes on iOS 26.4 — C++ exception in espresso IR compiler bypasses Swift error handling
Area: CoreML / Machine Learning Describe the issue: On iOS 26.4, calling MLModel(contentsOf:configuration:) to load an .mlpackage model hangs indefinitely and eventually kills the app via watchdog. The same model loads and runs inference successfully in under 1 second on iOS 26.3.1. The hang occurs inside eort_eo_compiler_compile_from_ir_program (espresso) during on-device AOT recompilation triggered by MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:. A C++ exception (__cxa_throw) is thrown inside libBNNS.dylib during the exception unwind, which then hangs inside __cxxabiv1::dyn_cast_slow and __class_type_info::search_below_dst. Swift's try/catch does not catch this — the exception originates in C++ and the process hangs rather than terminating cleanly. Setting config.computeUnits = .cpuOnly does not resolve the issue. MLE5ProgramLibrary initialises as shared infrastructure regardless of compute units. Steps to reproduce: Create an app with an .mlpackage CoreML model using the MLE5/espresso backend Call MLModel(contentsOf: modelURL, configuration: config) at runtime Run on a device on iOS 26.3.1 — loads successfully in <1 second Update device to iOS 26.4 — hangs indefinitely, app killed by watchdog after 60–745 seconds Expected behaviour: Model loads successfully, or throws a catchable Swift error on failure. Actual behaviour: Process hangs in MLE5ProgramLibrary.lazyInitQueue. App killed by watchdog. No Swift error thrown. Full stack trace at point of hang: Thread 1 Queue: com.apple.coreml.MLE5ProgramLibrary.lazyInitQueue (serial) frame 0: __cxxabiv1::__class_type_info::search_below_dst libc++abi.dylib frame 1: __cxxabiv1::(anonymous namespace)::dyn_cast_slow libc++abi.dylib frame 2: ___lldb_unnamed_symbol_23ab44dd4 libBNNS.dylib frame 23: eort_eo_compiler_compile_from_ir_program espresso frame 24: -[MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:] CoreML frame 25: -[MLE5ProgramLibrary _programLibraryHandleWithForceRespecialization:error:] CoreML frame 26: __44-[MLE5ProgramLibrary prepareAndReturnError:]_block_invoke CoreML frame 27: _dispatch_client_callout libdispatch.dylib frame 28: _dispatch_lane_barrier_sync_invoke_and_complete libdispatch.dylib frame 29: -[MLE5ProgramLibrary prepareAndReturnError:] CoreML frame 30: -[MLE5Engine initWithContainer:configuration:error:] CoreML frame 31: +[MLE5Engine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 32: +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 45: +[MLModel modelWithContentsOfURL:configuration:error:] CoreML frame 46: @nonobjc MLModel.__allocating_init(contentsOf:configuration:) GKPersonalV2 frame 47: MDNA_GaitEncoder_v1_3.__allocating_init(contentsOf:configuration:) frame 48: MDNA_GaitEncoder_v1_3.__allocating_init(configuration:) frame 50: GaitModelInference.loadModel() frame 51: GaitModelInference.init() iOS version: Reproduced on iOS 26.4. Works correctly on iOS 26.3.1. Xcode version: 26.2 Device: iPhone (model used in testing) Model format: .mlpackage
2
0
309
1d
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
Replies
0
Boosts
0
Views
245
Activity
2w
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
Replies
1
Boosts
0
Views
192
Activity
1w
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
Replies
0
Boosts
0
Views
180
Activity
1w
CoreML MLE5ProgramLibrary AOT recompilation hangs/crashes on iOS 26.4 — C++ exception in espresso IR compiler bypasses Swift error handling
Area: CoreML / Machine Learning Describe the issue: On iOS 26.4, calling MLModel(contentsOf:configuration:) to load an .mlpackage model hangs indefinitely and eventually kills the app via watchdog. The same model loads and runs inference successfully in under 1 second on iOS 26.3.1. The hang occurs inside eort_eo_compiler_compile_from_ir_program (espresso) during on-device AOT recompilation triggered by MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:. A C++ exception (__cxa_throw) is thrown inside libBNNS.dylib during the exception unwind, which then hangs inside __cxxabiv1::dyn_cast_slow and __class_type_info::search_below_dst. Swift's try/catch does not catch this — the exception originates in C++ and the process hangs rather than terminating cleanly. Setting config.computeUnits = .cpuOnly does not resolve the issue. MLE5ProgramLibrary initialises as shared infrastructure regardless of compute units. Steps to reproduce: Create an app with an .mlpackage CoreML model using the MLE5/espresso backend Call MLModel(contentsOf: modelURL, configuration: config) at runtime Run on a device on iOS 26.3.1 — loads successfully in <1 second Update device to iOS 26.4 — hangs indefinitely, app killed by watchdog after 60–745 seconds Expected behaviour: Model loads successfully, or throws a catchable Swift error on failure. Actual behaviour: Process hangs in MLE5ProgramLibrary.lazyInitQueue. App killed by watchdog. No Swift error thrown. Full stack trace at point of hang: Thread 1 Queue: com.apple.coreml.MLE5ProgramLibrary.lazyInitQueue (serial) frame 0: __cxxabiv1::__class_type_info::search_below_dst libc++abi.dylib frame 1: __cxxabiv1::(anonymous namespace)::dyn_cast_slow libc++abi.dylib frame 2: ___lldb_unnamed_symbol_23ab44dd4 libBNNS.dylib frame 23: eort_eo_compiler_compile_from_ir_program espresso frame 24: -[MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:] CoreML frame 25: -[MLE5ProgramLibrary _programLibraryHandleWithForceRespecialization:error:] CoreML frame 26: __44-[MLE5ProgramLibrary prepareAndReturnError:]_block_invoke CoreML frame 27: _dispatch_client_callout libdispatch.dylib frame 28: _dispatch_lane_barrier_sync_invoke_and_complete libdispatch.dylib frame 29: -[MLE5ProgramLibrary prepareAndReturnError:] CoreML frame 30: -[MLE5Engine initWithContainer:configuration:error:] CoreML frame 31: +[MLE5Engine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 32: +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 45: +[MLModel modelWithContentsOfURL:configuration:error:] CoreML frame 46: @nonobjc MLModel.__allocating_init(contentsOf:configuration:) GKPersonalV2 frame 47: MDNA_GaitEncoder_v1_3.__allocating_init(contentsOf:configuration:) frame 48: MDNA_GaitEncoder_v1_3.__allocating_init(configuration:) frame 50: GaitModelInference.loadModel() frame 51: GaitModelInference.init() iOS version: Reproduced on iOS 26.4. Works correctly on iOS 26.3.1. Xcode version: 26.2 Device: iPhone (model used in testing) Model format: .mlpackage
Replies
2
Boosts
0
Views
309
Activity
1d