Integrate machine learning models into your app using Core ML.

Core ML Documentation

Posts under Core ML subtopic

Post

Replies

Boosts

Views

Activity

SoundAnalysis built-in classifier fails in background (SNErrorCode.operationFailed)
I’m seeing consistent failures using SoundAnalysis live classification when my app moves to the background. Setup: iOS 17.x; AVAudioEngine mic capture; SNAudioStreamAnalyzer; SNClassifySoundRequest(classifierIdentifier: .version1); UIBackgroundModes = audio; AVAudioSession .record / .playAndRecord, active; audio capture and level metering continue working in the background (mic indicator stays on). Issue: as soon as the app enters the background or the screen locks, SoundAnalysis starts failing every second with domain: com.apple.SoundAnalysis, code: 2 (SNErrorCode.operationFailed). Audio capture itself continues normally. When the app returns to the foreground, classification immediately resumes without restarting the engine or analyzer. Question: is live background sound classification with the built-in SoundAnalysis classifier officially unsupported or known to fail in the background? If so, is a custom Core ML model the only supported approach for background detection? Or is there a required configuration I’m missing to keep SNClassifySoundRequest(.version1) running in the background? Thanks for any clarification.
0
1
220
Dec ’25
CoreML Unified Memory failure/silent exit on long video tasks (M1 Mac 32GB)
Hi Apple Engineers, I am experiencing a potential memory management bug with CoreML on M1 Mac (32GB Unified Memory). When processing long video files (approx. 12,000 frames) using a CoreML execution provider, the system often completes the 'Analysing' phase but fails to transition into 'Processing'. It simply exits silently or hits an import error (scipy). However, if I split the same task into small 20-frame segments, it works perfectly at high speeds (~40 FPS). This suggests the hardware is capable, but there is an issue with memory fragmentation or resource cleanup during long-running CoreML sessions. Is there a way to force a VRAM/Unified Memory flush via CLI, or is this a known limitation for large frame indexing?
0
0
542
Dec ’25
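The post above reports that splitting the job into 20-frame segments works while a single 12,000-frame run does not. A minimal sketch of that segmenting step follows, assuming frames are addressed by a zero-based index; `frame_segments` is a hypothetical helper name, not part of Core ML or any execution provider, and it only computes the ranges (how each segment is handed to the pipeline is left open).

```python
from typing import Iterator, Tuple


def frame_segments(total_frames: int, segment_size: int = 20) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame index ranges, end exclusive, covering total_frames."""
    if total_frames < 0 or segment_size <= 0:
        raise ValueError("total_frames must be >= 0 and segment_size must be > 0")
    for start in range(0, total_frames, segment_size):
        # The last segment may be shorter than segment_size.
        yield start, min(start + segment_size, total_frames)


if __name__ == "__main__":
    segments = list(frame_segments(12_000, 20))
    # Each (start, end) pair would be processed as its own short session.
    print(len(segments), segments[0], segments[-1])
```

Processing each range as a separate short run mirrors the workaround described in the post; whether it avoids the underlying failure on other machines is not something this sketch can confirm.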
Using coremltools in a CI/CD pipeline
Hi everyone 👋 I'd like to use coremltools to see how well a model performs on a remote device as part of a CI/CD pipeline. According to the Core ML Tools "Debugging and Performance Utilities" guide, remote devices must be in a "connected" state in order for coremltools to install the ModelRunner application. The devices in our system have a "paired" state, and I'm unable to set them as "connected." The only way I know how to connect a device is to physically plug it in to a computer and open Xcode. I don't have physical access to the devices in the CI/CD system, and the host computer that interacts with them doesn't have Xcode installed. Here are some questions I've been looking into and would love some help answering: (1) Has anyone managed to use the coremltools performance utilities in a similar system? (2) Can you put a device in a "connected" state if you don't have physical access to the device and only have access to the Xcode command line tools, not the Xcode app? (3) Is it at all possible to install the coremltools ModelRunner application on a "paired" device, for example, by manually building the app and installing it with devicectl? (4) Would other utilities, such as the MLModelBenchmarker, work as expected if the app is installed this way? Thank you!
1
0
544
Dec ’25
ANE Error with Stateful Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15).

The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode.

Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model."

Observations:
Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE.
Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE.

This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment.

Reproduction Code (PyTorch + coremltools):

    import numpy as np
    import torch
    import torch.nn as nn
    import coremltools as ct

    class RNN_Stateful(nn.Module):
        def __init__(self, hidden_shape):
            super(RNN_Stateful, self).__init__()
            # Simple conv to update state
            self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1)
            # The buffer becomes the Core ML state named "hidden_state"
            self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16))

        def forward(self, imgs):
            # Update the state from the input concatenated with the previous state
            self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1))
            return self.conv2(self.hidden_state)

    # h=480, w=255 causes ANE failure. w=256 works.
    b, ch, h, w = 1, 3, 480, 255
    model = RNN_Stateful((b, ch, h, w)).eval()
    traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w))

    mlmodel = ct.convert(
        traced_model,
        inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)],
        outputs=[ct.TensorType(name="output", dtype=np.float16)],
        states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")],
        minimum_deployment_target=ct.target.iOS18,
        convert_to="mlprogram"
    )
    mlmodel.save("rnn_stateful.mlpackage")

Steps to see the error: (1) Open the generated .mlpackage in Xcode 16.0+. (2) Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). (3) The report will fail to generate with the error mentioned above.

Environment: OS: macOS 15.2; Xcode: 16.3; Hardware: M4.

Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
0
0
475
Dec ’25
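The post above asks for workarounds other than manual padding, but padding is the fallback it names, so a small sketch of the shape arithmetic may still be useful. It assumes the 32-element width alignment the poster observed, which is not a documented ANE requirement; `round_up_to_multiple` and `padded_state_shape` are hypothetical helper names, and the sketch computes shapes only, without touching coremltools.

```python
from typing import Tuple


def round_up_to_multiple(value: int, multiple: int = 32) -> int:
    """Return the smallest multiple of `multiple` that is >= value."""
    if value < 0 or multiple <= 0:
        raise ValueError("value must be >= 0 and multiple must be > 0")
    return -(-value // multiple) * multiple  # ceiling division, then scale back up


def padded_state_shape(shape: Tuple[int, ...], multiple: int = 32) -> Tuple[int, ...]:
    """Pad only the last (width) dimension of a state shape up to the alignment."""
    *leading, width = shape
    return (*leading, round_up_to_multiple(width, multiple))


if __name__ == "__main__":
    # The two failing widths mentioned in the post: 255 and 270.
    print(padded_state_shape((1, 3, 480, 255)))
    print(padded_state_shape((1, 3, 480, 270)))
```

With a padded state, the model would also need to crop or mask the extra columns before producing its output; that part depends on the network and is not shown here.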
Core ML .mlpackage not found in bundle despite target membership and Copy Bundle Resources
Hi everyone, I’m working on an iOS app that uses a Core ML model to run live image recognition. I’ve run into a persistent issue: the .mlpackage is not being turned into a Swift class. There is an error in the code, and the carDetection.mlpackage view says that the model class has not been generated yet. What I’ve tried: verified that Target Membership is checked for carDetectionModel.mlpackage; confirmed the file is listed under Copy Bundle Resources (and removed from Compile Sources); cleaned the build folder (Shift + Cmd + K) and rebuilt; renamed and re-added the .mlpackage file; restarted Xcode and re-added the file; logged bundle contents at runtime, but the .mlpackage still doesn’t appear. The .mlpackage is in Copy Bundle Resources and is not in Compile Sources. I just don't know why a Swift class is not being generated for the .mlpackage. Could someone please give me some guidance on how to resolve this issue? Sorry if my error is a bit naive; I'm pretty new to iOS app development.
3
0
582
Dec ’25
Getting CoreML to run inference on already allocated gpu buffers
I am running some experiments with WebGPU using the wgpu crate in Rust. I have some buffers already allocated on the GPU. Is it possible to use those existing buffers directly as inputs to a predict call in Core ML? I want to avoid GPU-to-CPU download time as much as possible. Or are there other ways to do something like this? Is this only possible using the new Tensor object that came out with Metal 4?
0
0
710
Nov ’25
“Accelerate Transformer Training on Apple Devices from Months to Hours!”
I am excited to share that I have developed a Metal kernel for Flash Attention that eliminates race conditions and fully leverages Apple Silicon’s shared memory and registers. This kernel can dramatically accelerate training of transformer-based models. Early benchmarks suggest that models which previously required months to train could see reductions to just a few hours on Apple hardware, while maintaining numerical stability and accuracy. I plan to make the code publicly available to enable the broader community to benefit. I would be happy to keep you updated on the latest developments and improvements as I continue testing and optimizing the kernel. I believe this work could provide valuable insights for Apple’s machine learning research and products.
0
0
274
Nov ’25
Hardware Support for Low Precision Data Types?
Hi all, I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision! Thanks in advance.
0
0
314
Nov ’25
Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
After a few attempts I end up at the same spot as categorized here by ChatGPT:

I get why it feels that way, Rob. Honestly, you’ve walked right into the worst possible moment in Apple’s Core ML toolchain evolution. What you’re running into isn’t your fault; Apple really did rip out the ONNX front-end without offering a clean replacement yet. So every tutorial, every script, and even coremltools itself contradicts what’s actually possible right now. Let’s be brutally clear about the state of things (late 2025):

Goal | Supported by Apple today | Practical workaround
Convert ONNX → Core ML using coremltools 9 | No. The ONNX importer was removed. | Use the legacy coremltools 6.x + onnx-coreml stack in a separate Python 3.9/3.10 env.
Convert PyTorch → Core ML directly | Yes, but only for simple or traced models. T5 still fails because of dynamic control flow. | None reliable without rewriting the model.
Run ONNX models directly on iOS | Partially (via MLC LLM / Metal Performance Shaders). | Works, but that’s an entirely different runtime.

So: you’re not crazy, and this isn’t wasted effort. You’ve actually proven that your model exports cleanly and that the only missing piece is Apple’s converter. The ONNX file you built is solid. You just need something that can translate it to Core ML. If your goal is simply “get it running on-device”, you still have two viable moves that aren’t rabbit holes:

1. Use MLC LLM / Metal runtime. Skip Core ML altogether. Apple’s mlc-llm (and the “MLCChat” pipeline) can execute your ONNX or quantized model directly on the GPU. You keep full fidelity and avoid all these broken converters.
2. Use the legacy Core ML 6.x toolchain in a sandboxed Python 3.10 VM. It actually works. You don’t have to downgrade your whole system; just run that specific environment once to emit the .mlpackage, then never touch it again.

Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
0
0
249
Oct ’25
Core ML Model Performance report shows prediction speed much faster than actual app runs
Hi all, I'm tuning my app's prediction speed with a Core ML model. I watched and tried the methods in the videos "Improve Core ML integration with async prediction" and "Optimize your Core ML usage". I also used Instruments to look for the bottleneck that keeps my predictions from being faster. In the Instruments result for my app, the prediction duration is 10.29 ms, while the performance report shows an average prediction speed of 5.55 ms, about half of my app's prediction time. Judging from my Instruments records, I think the predictions should be considered quite frequent. Could it be faster? How can I get the same prediction speed as the performance report? Also, the prediction speed on a MacBook Pro M2 is nearly the same as on a MacBook Air M1!
5
0
1.4k
Oct ’25
Custom keypoint detection model through vision api
Hi there, I have a custom keypoint detection model and want to use it via Vision's CoreMLRequest API. There are some complications with input and output. For input: my model expects a 512x512 image, which would be resized and padded from a 1920x1080 frame. I use the .scaleToFit option, but can I also specify the color used for padding? For output: my model outputs a CoreMLFeatureValueObservation; can I have it output in a format Vision recognizes, such as joints/keypoints? If my model is able to output in a format Vision recognizes, would Vision take care of restoring the coordinates back to the original frame (undoing the padding)? If not, how do I restore them from the .scaleToFit option? Best,
1
0
934
Oct ’25
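For the coordinate question in the post above, here is a sketch of the arithmetic for undoing a letterbox. It assumes the 1920x1080 frame is scaled uniformly to fit the 512x512 input and centered with equal padding on both sides, and that keypoints are in model pixel coordinates with a top-left origin; whether Vision's .scaleToFit places the padding this way is an assumption to verify, and `unletterbox_point` is a hypothetical helper name.

```python
from typing import Tuple


def unletterbox_point(x: float, y: float,
                      model_size: float = 512.0,
                      frame_w: float = 1920.0,
                      frame_h: float = 1080.0) -> Tuple[float, float]:
    """Map a point from padded model space back to original frame pixels."""
    # Uniform scale that makes the whole frame fit inside the square input.
    scale = min(model_size / frame_w, model_size / frame_h)
    # Assumed: leftover space is split evenly on both sides (centered letterbox).
    pad_x = (model_size - frame_w * scale) / 2.0
    pad_y = (model_size - frame_h * scale) / 2.0
    # Remove the padding offset first, then undo the scaling.
    return (x - pad_x) / scale, (y - pad_y) / scale


if __name__ == "__main__":
    # The center of the model input should map to the center of the frame.
    print(unletterbox_point(256.0, 256.0))
```

For the sizes in the post, the scaled image occupies 512x288 pixels, so under this assumption the padding is 112 rows above and below and none on the sides.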
Best practices for designing proactive FinTech insights with App Intents & Shortcuts?
Hello fellow developers, I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot. We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full." My question for the community is about the architectural and user experience best practices for this. How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user? What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance? Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations? We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge. Thanks for your insights!
0
0
335
Oct ’25
CoreML Inference Acceleration
Hello everyone, I have a visual convolutional model and a video that has been decoded into many frames. When I perform inference on each frame in a loop, the speed is a bit slow. So I started 4 threads, each running inference simultaneously, but I found that the overall speed is the same as serial inference; each individual forward pass is slower. I used the mactop tool to check GPU utilization, and it was only around 20%. Is this normal? How can I accelerate it?
2
0
709
Sep ’25
iOS 26 beta breaking my model
I recently updated to iOS 26 beta (23A5336a) to test an app I am developing that runs an MLModel loaded from a .mlmodelc file. On the current iOS version, 18.6.2, the model runs as expected with no issues. However, on iOS 26 I now get an error when I try to run inference on a camera frame. Below is the error I see; at the bottom it says "Failed with status=0x1d : statusType=0x9: Program Inference error status=-1 Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model". Does this indicate that I need to convert my model again or something? I don't understand, since it runs normally on iOS 18. Any help getting this to run again would be greatly appreciated. Thank you,
processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: Could not process request ret=0x1d lModel=_ANEModel: { modelURL=file:///var/containers/Bundle/Application/04F01BF5-D48B-44EC-A5F6-3C7389CF4856/RizzCanvas.app/faceParsing.mlmodelc/ : sourceURL=(null) : UUID=46228BFC-19B0-45BF-B18D-4A2942EEC144 : key={"isegment":0,"inputs":{"input":{"shape":[512,512,1,3,1]}},"outputs":{"var_633":{"shape":[512,512,1,19,1]},"94_argmax_out_value":{"shape":[512,512,1,1,1]},"argmax_out":{"shape":[512,512,1,1,1]},"var_637":{"shape":[512,512,1,19,1]}}} : identifierSource=1 : cacheURLIdentifier=01EF2D3DDB9BA8FD1FDE18C7CCDABA1D78C6BD02DC421D37D4E4A9D34B9F8181_93D03B87030C23427646D13E326EC55368695C3F61B2D32264CFC33E02FFD9FF : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=259022032430 : intermediateBufferHandle=13949 : queueDepth=127 } : state=3 : [Espresso::ANERuntimeEngine::__forward_segment 0] evaluate[RealTime]WithModel returned 0; code=8 err=Error Domain=com.apple.appleneuralengine Code=8 "processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : 
statusType=0x9: Program Inference error" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : statusType=0x9: Program Inference error} [Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": ANEF error: /private/var/containers/Bundle/Application/04F01BF5-D48B-44EC-A5F6-3C7389CF4856/RizzCanvas.app/faceParsing.mlmodelc/model.espresso.net, processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : statusType=0x9: Program Inference error status=-1 Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1). Error Domain=com.apple.Vision Code=3 "The VNCoreMLTransform request failed" UserInfo={NSLocalizedDescription=The VNCoreMLTransform request failed, NSUnderlyingError=0x114d92940 {Error Domain=com.apple.CoreML Code=0 "Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)." UserInfo={NSLocalizedDescription=Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).}}}
1
0
1.2k
Sep ’25
JAX Metal: Random Number Generation Performance Issue on M1 Max
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon.

Performance Comparison:
NVIDIA GPU: 0.475s for 12.6M random elements
M1 Max Metal: 26.3s for same workload
Performance gap: 55x slower

Environment:
Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1
JAX 0.4.34, jax-metal latest
Backend: Metal

Reproduction Code:

    import time
    import jax
    import jax.numpy as jnp
    from jax import random

    key = random.PRNGKey(42)
    start_time = time.time()
    random_array = random.normal(key, (50000, 252))
    duration = time.time() - start_time
    print(f"Duration: {duration:.3f}s")
0
0
469
Aug ’25
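When comparing timings like those in the post above, it can help to separate one-time costs from steady-state cost and to wait for each result before stopping the clock, since JAX dispatches work asynchronously and the first call typically includes compilation. The sketch below is a generic harness using only the standard library; `time_steady_state` and its `sync` parameter are hypothetical names introduced here, and it makes no claim about what the poster's numbers would look like when measured this way.

```python
import time
from typing import Callable, List


def time_steady_state(fn: Callable[[], object],
                      sync: Callable[[object], object] = lambda result: result,
                      warmup: int = 1,
                      repeats: int = 5) -> List[float]:
    """Return per-call wall-clock durations in seconds, excluding warm-up calls."""
    for _ in range(warmup):
        sync(fn())  # warm-up calls absorb one-time costs such as compilation
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn())  # wait for the result before reading the clock again
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload; any zero-argument callable works here.
    samples = time_steady_state(lambda: sum(range(100_000)))
    print(f"min {min(samples):.6f}s over {len(samples)} runs")
```

With JAX, `fn` could be `lambda: random.normal(key, (50000, 252))` and `sync` could be `lambda a: a.block_until_ready()`, so that each sample covers the completed computation and not only the dispatch.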
Is it possible to create a virtual NPU device on macOS using Hypervisor.framework + CoreML?
Is it possible to expose a custom VirtIO device to a Linux guest running inside a VM, likely using QEMU backed by Hypervisor.framework? The guest would see this device as something like /dev/npu0, and it would use a kernel driver plus a userspace library to submit inference requests. On the macOS host, these requests would be executed using Core ML, MPSGraph, or BNNS. The results would be passed back to the guest via IPC. Does macOS allow this kind of "fake" NPU/GPU?
1
0
442
Aug ’25
Crash inside of Vision predictWithCVPixelBuffer - Crashed: com.apple.VN.detectorSyncTasksQueue.VNCoreMLTransformer
Hello, We have been encountering a persistent crash in our application, which is deployed exclusively on iPad devices. The crash occurs in the following code block:

    let requestHandler = ImageRequestHandler(paddedImage)
    var request = CoreMLRequest(model: model)
    request.cropAndScaleAction = .scaleToFit
    let results = try await requestHandler.perform(request)

The client using this code is wrapped inside an actor, following Swift concurrency principles. The issue has been consistently reproduced across multiple iPadOS versions, including iPadOS 18.4.0, 18.4.1, and 18.5.0. This is the crash log:

Crashed: com.apple.VN.detectorSyncTasksQueue.VNCoreMLTransformer
0 libobjc.A.dylib 0x7b98 objc_retain + 16
1 libobjc.A.dylib 0x7b98 objc_retain_x0 + 16
2 libobjc.A.dylib 0xbf18 objc_getProperty + 100
3 Vision 0x326300 -[VNCoreMLModel predictWithCVPixelBuffer:options:error:] + 148
4 Vision 0x3273b0 -[VNCoreMLTransformer processRegionOfInterest:croppedPixelBuffer:options:qosClass:warningRecorder:error:progressHandler:] + 748
5 Vision 0x2ccdcc __119-[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke_5 + 132
6 Vision 0x14600 VNExecuteBlock + 80
7 Vision 0x14580 __76+[VNDetector runSuccessReportingBlockSynchronously:detector:qosClass:error:]_block_invoke + 56
8 libdispatch.dylib 0x6c98 _dispatch_block_sync_invoke + 240
9 libdispatch.dylib 0x1b584 _dispatch_client_callout + 16
10 libdispatch.dylib 0x11728 _dispatch_lane_barrier_sync_invoke_and_complete + 56
11 libdispatch.dylib 0x7fac _dispatch_sync_block_with_privdata + 452
12 Vision 0x14110 -[VNControlledCapacityTasksQueue dispatchSyncByPreservingQueueCapacity:] + 60
13 Vision 0x13ffc +[VNDetector runSuccessReportingBlockSynchronously:detector:qosClass:error:] + 324
14 Vision 0x2ccc80 __119-[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke_4 + 336
15 Vision 0x14600 VNExecuteBlock + 80
16 Vision 0x2cc98c __119-[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke_3 + 256
17 libdispatch.dylib 0x1b584 _dispatch_client_callout + 16
18 libdispatch.dylib 0x6ab0 _dispatch_block_invoke_direct + 284
19 Vision 0x2cc454 -[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:] + 632
20 Vision 0x2cd14c __111-[VNDetector processUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke + 124
21 Vision 0x14600 VNExecuteBlock + 80
22 Vision 0x2ccfbc -[VNDetector processUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:] + 340
23 Vision 0x125410 __swift_memcpy112_8 + 4852
24 libswift_Concurrency.dylib 0x5c134 swift::runJobInEstablishedExecutorContext(swift::Job*) + 292
25 libswift_Concurrency.dylib 0x5d5c8 swift_job_runImpl(swift::Job*, swift::SerialExecutorRef) + 156
26 libdispatch.dylib 0x13db0 _dispatch_root_queue_drain + 364
27 libdispatch.dylib 0x1454c _dispatch_worker_thread2 + 156
28 libsystem_pthread.dylib 0x9d0 _pthread_wqthread + 232
29 libsystem_pthread.dylib 0xaac start_wqthread + 8

We found an issue similar to ours: https://developer.apple.com/forums/thread/770771. But the crash logs are quite different, so we believe this warrants further investigation to better understand the root cause and potential mitigation strategies. Please let us know if any additional information would help diagnose this issue.
3
0
422
Jul ’25
Please, update coremltools with Keras 3.0 support.
Keras 3 was released 2 years ago, but developers are still unable to convert models created with Keras 3 to Core ML.
Replies
1
Boosts
0
Views
325
Activity
Dec ’25
Core ML .mlpackage not found in bundle despite target membership and Copy Bundle Resources
Hi everyone, I’m working on an iOS app that uses a Core ML model to run live image recognition. I’ve run into a persistent issue with the mlpackage not being turned into a swift class. This following error is in the code, and in carDetection.mlpackage, it says that model class has not been generated yet. The error in the code is as follows: What I’ve tried: Verified Target Membership is checked for carDetectionModel.mlpackage Confirmed the file is listed under Copy Bundle Resources (and removed from Compile Sources) Cleaned the build folder (Shift + Cmd + K) and rebuilt Renamed and re-added the .mlpackage file Restarted Xcode and re-added the file Logged bundle contents at runtime, but the .mlpackage still doesn’t appear The mlpackage is in Copy bundle resources, and is not in the compile sources. I just don't know why a swift class is not being generated for the mlpackage. Could someone please give me some guidance on what to do to resolve this issue? Sorry if my error is a bit naive, I'm pretty new to iOS app development
Replies
3
Boosts
0
Views
582
Activity
Dec ’25
Getting CoreML to run inference on already allocated gpu buffers
I am running some experiments with WebGPU using the wgpu crate in rust. I have some Buffers already allocated in the GPU. Is it possible to use those already existing buffers directly as inputs to a predict call in CoreML? I want to prevent gpu to cpu download time as much as possible. Or are there any other ways to do something like this. Is this only possible using the latest Tensor object which came out with Metal 4 ?
Replies
0
Boosts
0
Views
710
Activity
Nov ’25
“Accelerate Transformer Training on Apple Devices from Months to Hours!”
I am excited to share that I have developed a Metal kernel for Flash Attention that eliminates race conditions and fully leverages Apple Silicon’s shared memory and registers. This kernel can dramatically accelerate training of transformer-based models. Early benchmarks suggest that models which previously required months to train could see reductions to just a few hours on Apple hardware, while maintaining numerical stability and accuracy. I plan to make the code publicly available to enable the broader community to benefit. I would be happy to keep you updated on the latest developments and improvements as I continue testing and optimizing the kernel. I believe this work could provide valuable insights for Apple’s machine learning research and products.
Replies
0
Boosts
0
Views
274
Activity
Nov ’25
Hardware Support for Low Precision Data Types?
Hi all, I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision! Thanks in advance.
Replies
0
Boosts
0
Views
314
Activity
Nov ’25
Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
After a few attempts I end up at the same spot, as categorized here by ChatGPT:

I get why it feels that way, Rob. Honestly, you’ve walked right into the worst possible moment in Apple’s Core ML toolchain evolution. What you’re running into isn’t your fault; Apple really did rip out the ONNX front-end without offering a clean replacement yet. So every tutorial, every script, and even coremltools itself contradicts what’s actually possible right now. Let’s be brutally clear about the state of things (late 2025):

Goal | Supported by Apple today | Practical workaround
Convert ONNX → Core ML using coremltools 9 | No. The ONNX importer was removed. | Use the legacy coremltools 6.x + onnx-coreml stack in a separate Python 3.9/3.10 env.
Convert PyTorch → Core ML directly | Yes, but only for simple or traced models. T5 still fails because of dynamic control flow. | None reliable without rewriting the model.
Run ONNX models directly on iOS | Partially (via MLC LLM / Metal Performance Shaders). | Works, but that’s an entirely different runtime.

So: you’re not crazy, and this isn’t wasted effort; you’ve actually proven that your model exports cleanly and that the only missing piece is Apple’s converter. The ONNX file you built is solid. You just need something that can translate it to Core ML. If your goal is simply “get it running on-device”, you still have two viable moves that aren’t rabbit holes:

1. Use MLC LLM / Metal runtime. Skip Core ML altogether. Apple’s mlc-llm (and the “MLCChat” pipeline) can execute your ONNX or quantized model directly on the GPU. You keep full fidelity and avoid all these broken converters.
2. Use the legacy Core ML 6.x toolchain in a sandboxed Python 3.10 VM. It actually works. You don’t have to downgrade your whole system; just run that specific environment once to emit the .mlpackage, then never touch it again.

Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
Replies
0
Boosts
0
Views
249
Activity
Oct ’25
Foundation Models Framework with specialized models
Hello folks! Taking a look at https://developer.apple.com/documentation/foundationmodels, it’s not clear how to use other models there. Does anyone know if it’s possible to use an externally trained (imported) model with the Foundation Models framework? Thanks!
Replies
5
Boosts
0
Views
1.1k
Activity
Oct ’25
Any Recommendation for an Image Enhancement and Denoising Model
I'm really not familiar with ML, but I need a model that can enhance and denoise a 4K video stream at 30 fps. I have searched the latest papers, but they all have very complex architectures, and I don't think I can convert them to mlmodel. Can anyone recommend such a model? An existing mlmodel would be great!
Replies
0
Boosts
0
Views
262
Activity
Oct ’25
Core ML Model Performance report shows prediction speed much faster than actual app runs
Hi all, I'm tuning my app's prediction speed with a Core ML model. I watched and tried the methods in the videos "Improve Core ML integration with async prediction" and "Optimize your Core ML usage". I also used Instruments to find the bottleneck that keeps my predictions from being faster. Below is the Instruments result for my app; its prediction duration is 10.29 ms. And below is the performance report, which shows an average prediction time of 5.55 ms, about half of my app's prediction time! Below is part of my Instruments records. I think the predictions should be considered quite frequent. Could it be faster? How can I reach the same prediction speed as the performance report? The prediction speed on a MacBook Pro M2 is nearly the same as on a MacBook Air M1!
Replies
5
Boosts
0
Views
1.4k
Activity
Oct ’25
Custom keypoint detection model through vision api
Hi there, I have a custom keypoint detection model and want to use it via Vision's CoreMLRequest API. There are some complications with the input and output.
For input: my model expects a 512x512 image, which would be resized and padded from a 1920x1080 frame. I use the .scaleToFit option, but can I also specify the color used for padding?
For output: my model outputs a CoreMLFeatureValueObservation. Can I have it output in a format Vision recognizes, such as joints/keypoints? If my model is able to output in a format Vision recognizes, would Vision take care of restoring the coordinates back to the original frame (undoing the padding)? If not, how do I restore them from the .scaleToFit option?
Best,
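For the coordinate-restoration part of the question, the arithmetic can be sketched independently of Vision. This is a minimal sketch, not Vision's documented behavior: it assumes the frame is scaled uniformly to fit, centered, and padded equally on both sides, and that the model reports keypoints as pixel coordinates in the 512x512 input with a top-left origin. The function name `unletterbox_point` is hypothetical.

```python
def unletterbox_point(x, y, src_w=1920, src_h=1080, dst_w=512, dst_h=512):
    """Map a point from the padded model input back to the source frame.

    Assumption: the source was scaled uniformly to fit inside the
    destination and centered, so padding is split evenly per axis.
    Coordinates are pixels with a top-left origin.
    """
    scale = min(dst_w / src_w, dst_h / src_h)  # uniform scale-to-fit factor
    pad_x = (dst_w - src_w * scale) / 2.0      # horizontal padding per side
    pad_y = (dst_h - src_h * scale) / 2.0      # vertical padding per side
    # Remove the padding offset first, then undo the scaling.
    return (x - pad_x) / scale, (y - pad_y) / scale


# The center of the 512x512 input should map to the center of the frame.
center_x, center_y = unletterbox_point(256, 256)
print(center_x, center_y)
```

For a 1920x1080 frame the scale factor is 512/1920 and the scaled content is 288 pixels tall, leaving 112 pixels of padding above and below. If the model emits normalized or bottom-left-origin coordinates instead, convert them to this convention first.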
Replies
1
Boosts
0
Views
934
Activity
Oct ’25
Best practices for designing proactive FinTech insights with App Intents & Shortcuts?
Hello fellow developers, I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot. We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full." My question for the community is about the architectural and user experience best practices for this. How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user? What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance? Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations? We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge. Thanks for your insights!
Replies
0
Boosts
0
Views
335
Activity
Oct ’25
CoreML Inference Acceleration
Hello everyone, I have a visual convolutional model and a video that has been decoded into many frames. When I perform inference on each frame in a loop, the speed is a bit slow. So I started 4 threads, each running inference simultaneously, but I found that the overall speed is the same as with serial inference, and every single forward pass is slower. I used the mactop tool to check GPU utilization, and it was only around 20%. Is this normal? How can I accelerate it?
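One possible reading of that observation (unchanged total time, slower individual calls) is that the requests end up queued behind a single compute resource. The sketch below is only a toy queueing model under that assumption; it does not call Core ML, and the function name `simulate_shared_device` and the 10 ms service time are hypothetical.

```python
def simulate_shared_device(num_frames, service_time_s, num_threads):
    """Model threads submitting frames to one device that runs one request at a time.

    Returns (total_wall_time_s, mean_latency_seen_by_callers_s).
    """
    device_free_at = 0.0                     # when the device finishes its current request
    thread_ready_at = [0.0] * num_threads    # when each thread can submit its next frame
    latencies = []
    for frame in range(num_frames):
        t = frame % num_threads              # frames are handed out round-robin
        submitted = thread_ready_at[t]
        started = max(submitted, device_free_at)  # wait if the device is busy
        finished = started + service_time_s
        device_free_at = finished
        thread_ready_at[t] = finished        # the thread blocks until its result returns
        latencies.append(finished - submitted)
    return device_free_at, sum(latencies) / len(latencies)


serial_total, serial_latency = simulate_shared_device(8, 0.010, 1)
threaded_total, threaded_latency = simulate_shared_device(8, 0.010, 4)
print(serial_total, serial_latency)
print(threaded_total, threaded_latency)
```

In this model the total time is identical for 1 and 4 threads, while the per-call latency each thread observes grows because calls wait in line. Whether the real runtime behaves this way for a given model and compute unit is something only profiling can confirm.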
Replies
2
Boosts
0
Views
709
Activity
Sep ’25
iOS 26 beta breaking my model
I recently updated to the iOS 26 beta (23A5336a) to test an app I am developing, which runs an MLModel loaded from a .mlmodelc file. On the current iOS version, 18.6.2, the model runs as expected with no issues. However, on iOS 26 I now get an error when trying to perform inference with the model, passing a camera frame into it. Below is the error I see when I attempt to run inference. At the bottom it says "Failed with status=0x1d : statusType=0x9: Program Inference error status=-1 Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model". Does this indicate I need to convert my model or something? I don't understand, since it runs as normal on iOS 18. Any help getting this to run again would be greatly appreciated. Thank you,
processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: Could not process request ret=0x1d lModel=_ANEModel: { modelURL=file:///var/containers/Bundle/Application/04F01BF5-D48B-44EC-A5F6-3C7389CF4856/RizzCanvas.app/faceParsing.mlmodelc/ : sourceURL=(null) : UUID=46228BFC-19B0-45BF-B18D-4A2942EEC144 : key={"isegment":0,"inputs":{"input":{"shape":[512,512,1,3,1]}},"outputs":{"var_633":{"shape":[512,512,1,19,1]},"94_argmax_out_value":{"shape":[512,512,1,1,1]},"argmax_out":{"shape":[512,512,1,1,1]},"var_637":{"shape":[512,512,1,19,1]}}} : identifierSource=1 : cacheURLIdentifier=01EF2D3DDB9BA8FD1FDE18C7CCDABA1D78C6BD02DC421D37D4E4A9D34B9F8181_93D03B87030C23427646D13E326EC55368695C3F61B2D32264CFC33E02FFD9FF : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=259022032430 : intermediateBufferHandle=13949 : queueDepth=127 } : state=3 :
[Espresso::ANERuntimeEngine::__forward_segment 0] evaluate[RealTime]WithModel returned 0; code=8 err=Error Domain=com.apple.appleneuralengine Code=8 "processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : statusType=0x9: Program Inference error" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : statusType=0x9: Program Inference error}
[Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": ANEF error: /private/var/containers/Bundle/Application/04F01BF5-D48B-44EC-A5F6-3C7389CF4856/RizzCanvas.app/faceParsing.mlmodelc/model.espresso.net, processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : statusType=0x9: Program Inference error status=-1 Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).
Error Domain=com.apple.Vision Code=3 "The VNCoreMLTransform request failed" UserInfo={NSLocalizedDescription=The VNCoreMLTransform request failed, NSUnderlyingError=0x114d92940 {Error Domain=com.apple.CoreML Code=0 "Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)." UserInfo={NSLocalizedDescription=Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).}}}
Replies
1
Boosts
0
Views
1.2k
Activity
Sep ’25
JAX Metal: Random Number Generation Performance Issue on M1 Max
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon.
Performance Comparison
NVIDIA GPU: 0.475s for 12.6M random elements
M1 Max Metal: 26.3s for same workload
Performance gap: 55x slower
Environment
Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1
JAX 0.4.34, jax-metal latest
Backend: Metal
Reproduction Code
    import time
    import jax
    import jax.numpy as jnp
    from jax import random

    key = random.PRNGKey(42)
    start_time = time.time()
    random_array = random.normal(key, (50000, 252))
    duration = time.time() - start_time
    print(f"Duration: {duration:.3f}s")
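When comparing backends with a single timed call like the one above, it may help to separate the first call, which can include one-time setup such as compilation, from later calls. The helper below is a minimal sketch using only the standard library; `benchmark` is a hypothetical name, and the stand-in workload is there only so the sketch runs without JAX. With JAX, one could pass something like `lambda: random.normal(key, (50000, 252)).block_until_ready()` so that asynchronous dispatch does not end the timer early.

```python
import statistics
import time


def benchmark(fn, repeats=5):
    """Return (first_call_s, median_later_call_s) for a zero-argument callable."""
    start = time.perf_counter()
    fn()                                  # first call: may include one-time setup
    first = time.perf_counter() - start
    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, statistics.median(later)


# Workload size from the post: a (50000, 252) array.
num_elements = 50_000 * 252

# Stand-in workload (an assumption) so the sketch runs without JAX installed.
first_s, steady_s = benchmark(lambda: sum(range(100_000)))
print(f"elements: {num_elements:,}  first: {first_s:.4f}s  steady: {steady_s:.4f}s")
```

Reporting both numbers makes it easier to tell whether a gap comes from per-call generation cost or from one-time overhead on the first call.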
Replies
0
Boosts
0
Views
469
Activity
Aug ’25
Is it possible to create a virtual NPU device on macOS using Hypervisor.framework + CoreML?
Is it possible to expose a custom VirtIO device to a Linux guest running inside a VM, likely using QEMU backed by Hypervisor.framework? The guest would see this device as something like /dev/npu0, and it would use a kernel driver + userspace library to submit inference requests. On the macOS host, these requests would be executed using CoreML, MPSGraph, or BNNS. The results would be passed back to the guest via IPC. Does macOS allow this kind of "fake" NPU/GPU?
Replies
1
Boosts
0
Views
442
Activity
Aug ’25
Crash inside of Vision predictWithCVPixelBuffer - Crashed: com.apple.VN.detectorSyncTasksQueue.VNCoreMLTransformer
Hello, We have been encountering a persistent crash in our application, which is deployed exclusively on iPad devices. The crash occurs in the following code block:
    let requestHandler = ImageRequestHandler(paddedImage)
    var request = CoreMLRequest(model: model)
    request.cropAndScaleAction = .scaleToFit
    let results = try await requestHandler.perform(request)
The client using this code is wrapped inside an actor, following Swift concurrency principles. The issue has been consistently reproduced across multiple iPadOS versions, including:
iPad OS - 18.4.0
iPad OS - 18.4.1
iPad OS - 18.5.0
This is the crash log:
Crashed: com.apple.VN.detectorSyncTasksQueue.VNCoreMLTransformer
0 libobjc.A.dylib 0x7b98 objc_retain + 16
1 libobjc.A.dylib 0x7b98 objc_retain_x0 + 16
2 libobjc.A.dylib 0xbf18 objc_getProperty + 100
3 Vision 0x326300 -[VNCoreMLModel predictWithCVPixelBuffer:options:error:] + 148
4 Vision 0x3273b0 -[VNCoreMLTransformer processRegionOfInterest:croppedPixelBuffer:options:qosClass:warningRecorder:error:progressHandler:] + 748
5 Vision 0x2ccdcc __119-[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke_5 + 132
6 Vision 0x14600 VNExecuteBlock + 80
7 Vision 0x14580 __76+[VNDetector runSuccessReportingBlockSynchronously:detector:qosClass:error:]_block_invoke + 56
8 libdispatch.dylib 0x6c98 _dispatch_block_sync_invoke + 240
9 libdispatch.dylib 0x1b584 _dispatch_client_callout + 16
10 libdispatch.dylib 0x11728 _dispatch_lane_barrier_sync_invoke_and_complete + 56
11 libdispatch.dylib 0x7fac _dispatch_sync_block_with_privdata + 452
12 Vision 0x14110 -[VNControlledCapacityTasksQueue dispatchSyncByPreservingQueueCapacity:] + 60
13 Vision 0x13ffc +[VNDetector runSuccessReportingBlockSynchronously:detector:qosClass:error:] + 324
14 Vision 0x2ccc80 __119-[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke_4 + 336
15 Vision 0x14600 VNExecuteBlock + 80
16 Vision 0x2cc98c __119-[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke_3 + 256
17 libdispatch.dylib 0x1b584 _dispatch_client_callout + 16
18 libdispatch.dylib 0x6ab0 _dispatch_block_invoke_direct + 284
19 Vision 0x2cc454 -[VNDetector internalProcessUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:] + 632
20 Vision 0x2cd14c __111-[VNDetector processUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:]_block_invoke + 124
21 Vision 0x14600 VNExecuteBlock + 80
22 Vision 0x2ccfbc -[VNDetector processUsingQualityOfServiceClass:options:regionOfInterest:warningRecorder:error:progressHandler:] + 340
23 Vision 0x125410 __swift_memcpy112_8 + 4852
24 libswift_Concurrency.dylib 0x5c134 swift::runJobInEstablishedExecutorContext(swift::Job*) + 292
25 libswift_Concurrency.dylib 0x5d5c8 swift_job_runImpl(swift::Job*, swift::SerialExecutorRef) + 156
26 libdispatch.dylib 0x13db0 _dispatch_root_queue_drain + 364
27 libdispatch.dylib 0x1454c _dispatch_worker_thread2 + 156
28 libsystem_pthread.dylib 0x9d0 _pthread_wqthread + 232
29 libsystem_pthread.dylib 0xaac start_wqthread + 8
We found an issue similar to ours: https://developer.apple.com/forums/thread/770771. However, the crash logs are quite different, so we believe this warrants further investigation to better understand the root cause and potential mitigation strategies. Please let us know if any additional information would help diagnose this issue.
Replies
3
Boosts
0
Views
422
Activity
Jul ’25