I'm on macOS Tahoe 26.1 with an M3 MacBook Air. I'm using VNDetectFaceRectanglesRequest as straightforwardly as I can, as in the minimal command-line program attached below. For some reason, I always get:
MLE5Engine is disabled through the configuration
printed. I couldn't find anything in the developer docs saying that VNDetectFaceRectanglesRequest cannot use the Apple Neural Engine. I assume there is something wrong with my code, but I couldn't find anything in the documentation pointing to where the problem might be, and I couldn't find the error message above anywhere online either. I'd really appreciate your help; thanks in advance.
The code below accesses video from AVCaptureDevice.DeviceType.builtInWideAngleCamera. Currently it simply picks format 0, which has the largest resolution (Full HD on my M3 MacBook Air) and 4:2:0 video-range chroma subsampling ("420v").
Once video is flowing, it performs a VNDetectFaceRectanglesRequest on each frame. It prints "VNDetectFaceRectanglesRequest completion Handler called" many times, then prints the message above, then keeps printing the completion-handler line until the user quits.
To run it in Xcode: File > New > Project > macOS > Command Line Tool, paste in the code below, then click the project in the navigator > Targets > Signing & Capabilities > Hardened Runtime > Resource Access > Camera.
A possible explanation is that Apple's internal Core ML code for this request either runs on GPU/CPU only, or doesn't accept the 420v pixel format supplied by the MacBook Air camera.
import AVKit
import Vision

var videoDataOutput: AVCaptureVideoDataOutput = AVCaptureVideoDataOutput()
var detectionRequests: [VNDetectFaceRectanglesRequest]?
var videoDataOutputQueue: DispatchQueue = DispatchQueue(label: "queue")

class XYZ: /*NSViewController or NSObject*/ NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {

    func viewDidLoad() {
        //super.viewDidLoad()
        let session = AVCaptureSession()
        let inputDevice = try! self.configureFrontCamera(for: session)
        self.configureVideoDataOutput(for: inputDevice.device, resolution: inputDevice.resolution, captureSession: session)
        self.prepareVisionRequest()
        session.startRunning()
    }

    fileprivate func highestResolution420Format(for device: AVCaptureDevice) -> (format: AVCaptureDevice.Format, resolution: CGSize)? {
        let deviceFormat = device.formats[0]
        print(deviceFormat)
        let dims = CMVideoFormatDescriptionGetDimensions(deviceFormat.formatDescription)
        let resolution = CGSize(width: CGFloat(dims.width), height: CGFloat(dims.height))
        return (deviceFormat, resolution)
    }

    fileprivate func configureFrontCamera(for captureSession: AVCaptureSession) throws -> (device: AVCaptureDevice, resolution: CGSize) {
        let deviceDiscoverySession = AVCaptureDevice.DiscoverySession(deviceTypes: [AVCaptureDevice.DeviceType.builtInWideAngleCamera], mediaType: .video, position: AVCaptureDevice.Position.unspecified)
        let device = deviceDiscoverySession.devices.first!
        let deviceInput = try! AVCaptureDeviceInput(device: device)
        captureSession.addInput(deviceInput)
        let highestResolution = self.highestResolution420Format(for: device)!
        try! device.lockForConfiguration()
        device.activeFormat = highestResolution.format
        device.unlockForConfiguration()
        return (device, highestResolution.resolution)
    }

    fileprivate func configureVideoDataOutput(for inputDevice: AVCaptureDevice, resolution: CGSize, captureSession: AVCaptureSession) {
        videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
        captureSession.addOutput(videoDataOutput)
    }

    fileprivate func prepareVisionRequest() {
        let faceDetectionRequest: VNDetectFaceRectanglesRequest = VNDetectFaceRectanglesRequest(completionHandler: { (request, error) in
            print("VNDetectFaceRectanglesRequest completion Handler called")
        })
        // Start with detection
        detectionRequests = [faceDetectionRequest]
    }

    // MARK: AVCaptureVideoDataOutputSampleBufferDelegate
    // Handle delegate method callback on receiving a sample buffer.
    public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        var requestHandlerOptions: [VNImageOption: AnyObject] = [:]
        let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil)
        if cameraIntrinsicData != nil {
            requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData
        }
        let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
        // No tracking object detected, so perform initial detection
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                        orientation: CGImagePropertyOrientation.up, options: requestHandlerOptions)
        try! imageRequestHandler.perform(detectionRequests!)
    }
}

let X = XYZ()
X.viewDidLoad()
sleep(9999999)
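For what it's worth, one way to inspect which compute devices Vision is willing to use for this request is the compute-device API added in macOS 14 / iOS 17. The property and type names below are from memory, so treat this as a sketch rather than a confirmed recipe.

import Vision
import CoreML

// Sketch: list the compute devices Vision reports for each stage of face detection.
let probeRequest = VNDetectFaceRectanglesRequest()
do {
    // Property name assumed from the macOS 14 / iOS 17 compute-device additions to VNRequest.
    let stages = try probeRequest.supportedComputeStageDevices
    for (stage, devices) in stages {
        print("stage \(stage): \(devices)")   // look for an MLNeuralEngineComputeDevice entry
    }
    // If an ANE device is listed, you can try pinning the main stage to it:
    // if let ane = stages[.main]?.first(where: { $0 is MLNeuralEngineComputeDevice }) {
    //     probeRequest.setComputeDevice(ane, for: .main)
    // }
} catch {
    print("Could not query compute devices: \(error)")
}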
When the system language and the Siri language are not the same, Apple Intelligence may not be usable.
For example, if the system is set to English and Siri to Chinese, Apple Intelligence may not work.
May I ask whether there are other reasons why the feature still cannot be used in the app even after Apple Intelligence is enabled?
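For reference, the Foundation Models framework exposes an availability check that reports why Apple Intelligence isn't usable on a device. This is a minimal sketch and assumes your deployment target includes the framework; the exact reason cases may differ.

import FoundationModels

// Check whether the on-device model behind Apple Intelligence features is usable,
// and if not, why (device not eligible, Apple Intelligence not enabled, model not ready).
let model = SystemLanguageModel.default
switch model.availability {
case .available:
    print("Apple Intelligence model is available")
case .unavailable(let reason):
    print("Unavailable: \(reason)")
}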
Using highly optimized Metal Shading Language (MSL) code, I pushed the MacBook Air M2 to its performance limits with the deformable_attention_universal kernel. The results demonstrate both the efficiency of the code and the exceptional power of Apple Silicon.
The total computational workload exceeded 8.455 quadrillion FLOPs, equivalent to processing 8,455 trillion operations. On average, the code sustained a throughput of 85.37 TFLOPS, showcasing the chip’s remarkable ability to handle massive workloads. Peak instantaneous performance reached approximately 673.73 TFLOPS, reflecting near-optimal utilization of the GPU cores.
Despite this intensity, the cumulative GPU runtime remained under 100 seconds, highlighting the code’s efficiency and time optimization. The fastest iteration achieved a record processing time of only 0.051 ms, demonstrating minimal bottlenecks and excellent responsiveness.
Memory management was equally impressive: peak GPU memory usage never exceeded 2 MB, reflecting efficient use of the M2’s Unified Memory. This minimizes data transfer overhead and ensures smooth performance across repeated workloads.
Overall, these results confirm that a well-optimized Metal implementation can unlock the full potential of Apple Silicon, delivering exceptional computational density, processing speed, and memory efficiency. The MacBook Air M2, often considered an energy-efficient consumer laptop, is capable of handling highly intensive workloads at performance levels typically expected from much larger GPUs. This test validates both the robustness of the Metal code and the extraordinary capabilities of the M2 chip for high-performance computing tasks.
I'm downloading a fine-tuned model from HuggingFace which is then cached on my Mac when the app first starts. However, I wanted to test adding a progress bar to show the download progress. To test this I need to delete the cached model. From what I've seen online this is cached at
/Users/userName/.cache/huggingface/hub
However, if I delete the files from there using Terminal, the app still seems to be able to access the model.
Is the model cached somewhere else?
On my iPhone it seems deleting the app also deletes the cached model (app data) so that is useful.
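If the app is sandboxed, the downloading library may be caching inside the app's own container rather than under ~/.cache. One way to check is to print the container's standard directories and look for the model files there; which directory is actually used depends on the library, so this is only a starting point.

import Foundation

// Print the sandboxed app's standard directories, then look for large model files
// inside them (Application Support and Caches are the usual suspects).
let candidates: [FileManager.SearchPathDirectory] = [
    .cachesDirectory, .applicationSupportDirectory, .documentDirectory
]
for directory in candidates {
    for url in FileManager.default.urls(for: directory, in: .userDomainMask) {
        print(url.path)
    }
}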
We are really excited to have introduced the Foundation Models framework in WWDC25. When using the framework, you might have feedback about how it can better fit your use cases.
Starting in macOS/iOS 26 Beta 4, the best way to provide feedback is to use #Playground in Xcode. To do so:
In Xcode, create a playground using #Playground. For more information, see Running code snippets using the playground macro.
Reproduce the issue by setting up a session and generating a response with your prompt.
In the canvas on the right, click the thumbs-up icon to the right of the response.
Follow the instructions on the pop-up window and submit your feedback by clicking Share with Apple.
Another way to provide your feedback is to file a feedback report with relevant details. Specific to the Foundation Models framework, it’s super important to add the following information in your report:
Language model feedback
This feedback contains the session transcript, including the instructions, the prompts, the responses, etc. Without it, we can't reason about the model's behavior and can hardly take any action.
Use logFeedbackAttachment(sentiment:issues:desiredOutput:) to retrieve the feedback data for your current model session, as shown in the usage example, write the data to a file, and then attach the file to your feedback report.
If you believe what you’d report is related to the system configuration, please capture a sysdiagnose and attach it to your feedback report as well.
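As an illustration only (the exact parameter types of logFeedbackAttachment and whether it throws are assumptions here; see the linked usage example for the real signature), writing the attachment to a file might look roughly like this:

import Foundation
import FoundationModels

// Sketch only: reproduce the issue in a session, then write the feedback data to a
// file you can attach in Feedback Assistant. The argument values below are placeholders.
func captureFeedback(from session: LanguageModelSession) throws {
    let data = session.logFeedbackAttachment(sentiment: .negative,   // assumed case name
                                             issues: [],
                                             desiredOutput: nil)
    let url = URL.documentsDirectory.appending(path: "language-model-feedback.json")
    try data.write(to: url)
    print("Wrote feedback attachment to \(url.path)")
}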
The framework is still new. Your actionable feedback helps us evolve the framework quickly, and we appreciate that.
Thanks,
The Foundation Models framework team
Hello Apple Developer Community,
I'm investigating Core ML model loading behavior and noticed that even when the compiled model path remains unchanged after an app update, the first run still triggers an "uncached load". This adds an unnecessary delay to the user experience.
Question: Does Core ML provide any public API to check whether a compiled model (from a specific .mlmodelc path) is already cached in the system?
If such API exists, we'd like to use it for pre-loading decision logic - only perform background pre-load when the model isn't cached.
Has anyone encountered similar scenarios or found official solutions? Any insights would be greatly appreciated!
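I'm not aware of a public API that reports whether a compiled model is already in the system cache. A common workaround is an unconditional background warm-up load at launch, so the uncached load doesn't land on the first user-facing prediction. A rough sketch, where compiledModelURL is your .mlmodelc location:

import CoreML

// Warm up the model off the main thread; if the system cache is already populated
// this returns quickly, otherwise it absorbs the one-time "uncached load" cost here.
func preloadModel(at compiledModelURL: URL) {
    Task.detached(priority: .utility) {
        let configuration = MLModelConfiguration()
        do {
            _ = try await MLModel.load(contentsOf: compiledModelURL, configuration: configuration)
        } catch {
            print("Model warm-up failed: \(error)")
        }
    }
}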
I’m seeing consistent failures using SoundAnalysis live classification when my app moves to the background.
Setup (rough code sketch after this list)
iOS 17.x
AVAudioEngine mic capture
SNAudioStreamAnalyzer
SNClassifySoundRequest(classifierIdentifier: .version1)
UIBackgroundModes = audio
AVAudioSession .record / .playAndRecord, active
Audio capture + level metering continue working in background (mic indicator stays on)
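The setup above corresponds roughly to the following pipeline. This is a minimal sketch: AVAudioSession configuration, background-mode plumbing, and error handling are omitted, and the buffer size is arbitrary.

import AVFoundation
import SoundAnalysis

// Minimal live-classification pipeline matching the setup above.
final class LiveClassifier: NSObject, SNResultsObserving {
    private let engine = AVAudioEngine()
    private var analyzer: SNAudioStreamAnalyzer!
    private let analysisQueue = DispatchQueue(label: "sound-analysis")

    func start() throws {
        let format = engine.inputNode.outputFormat(forBus: 0)
        analyzer = SNAudioStreamAnalyzer(format: format)
        let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
        try analyzer.add(request, withObserver: self)

        engine.inputNode.installTap(onBus: 0, bufferSize: 8192, format: format) { buffer, when in
            self.analysisQueue.async {
                self.analyzer.analyze(buffer, atAudioFramePosition: when.sampleTime)
            }
        }
        try engine.start()
    }

    // SNResultsObserving
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        print("\(top.identifier): \(top.confidence)")
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        // This is the callback that fires every second in the background per the report.
        print("SoundAnalysis error: \(error)")
    }
}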
Issue
As soon as the app enters background / screen locks:
SoundAnalysis starts failing every second with domain: com.apple.SoundAnalysis, code: 2 (SNErrorCode.operationFailed)
Audio capture itself continues normally
When the app returns to foreground, classification immediately resumes without restarting the engine/analyzer
Question
Is live background sound classification with the built-in SoundAnalysis classifier officially unsupported or known to fail in background?
If so, is a custom Core ML model the only supported approach for background detection?
Or is there a required configuration I’m missing to keep SNClassifySoundRequest(.version1) running in background?
Thanks for any clarification.
We are trying to implement a custom encryption scheme for our Core ML models. Our goal is to bundle encrypted models, decrypt them into memory at runtime, and instantiate the MLModel without the unencrypted model file ever touching the disk.
We have looked into the native Apple encryption described here: https://developer.apple.com/documentation/coreml/encrypting-a-model-in-your-app, but it has limitations: it doesn't work on Intel Macs or with SIP disabled, and it doesn't work when loading from a dylib.
It seems like most of the Core ML APIs require a file path. There is the MLModelAsset API, but I think it just writes a .mlmodelc back to disk when compiling, and I can't find any information confirming that. (I'm also concerned that it seems to be an older API and would mean compiling at runtime.)
I am aware that the native encryption is much more secure, but I'd prefer not to have the models sitting readable on disk.
Does anyone know if this is possible, or any alternatives for obfuscating Core ML models? Thanks.
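For what it's worth, here is a minimal sketch of the in-memory path via MLModelAsset. The initializer label and the decryption helper are assumptions on my part, and whether Core ML writes a compiled artifact to a temporary location internally is exactly the part that doesn't seem to be documented.

import CoreML

// Hypothetical: decrypt the bundled, encrypted .mlmodel specification into memory.
func decryptBundledModelSpec() throws -> Data {
    fatalError("replace with your own decryption of the bundled blob")
}

func loadModelWithoutPlaintextOnDisk() async throws -> MLModel {
    let specData = try decryptBundledModelSpec()
    // MLModelAsset wraps an in-memory model specification (macOS 13+ / iOS 16+);
    // Core ML compiles it at load time. Initializer label assumed.
    let asset = try MLModelAsset(specificationData: specData)
    return try await MLModel.load(asset: asset, configuration: MLModelConfiguration())
}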
Due to our min iOS version, this is my first time using .xcstrings instead of .strings for AppShortcuts.
When using the migrate .strings to .xcstrings Xcode context menu option, an .xcstrings catalog is produced that, as expected, has each invocation phrase as a separate string key.
However, after compilation, the catalog changes to group all invocation phrases under the first phrase listed for each intent (see attached screenshot). It is possible to hover in blank space on the right and add more translations, but there is no 1:1 key matching requirement to the phrases on the left nor a requirement that there are the same number of keys in one language vs. another. (The lines just happen to align due to my window size.)
What does that mean, practically?
Do all sub-phrases in each language in AppShortcuts.xcstrings get processed during compilation, even if there isn't an equivalent phrase key declared in the AppShortcut (e.g., the ja translation has more phrases than the English)? (That makes some logical sense, as these phrases need not be 1:1 across languages.)
In the AppShortcut declaration, if I delete all but the top invocation phrase, does nothing change with Siri?
Is there something I'm doing incorrectly?
import AppIntents

struct WatchShortcuts: AppShortcutsProvider {
    static var appShortcuts: [AppShortcut] {
        AppShortcut(
            intent: QuickAddWaterIntent(),
            phrases: [
                "\(.applicationName) log water",
                "\(.applicationName) log my water",
                "Log water in \(.applicationName)",
                "Log my water in \(.applicationName)",
                "Log a bottle of water in \(.applicationName)",
            ],
            shortTitle: "Log Water",
            systemImageName: "drop.fill"
        )
    }
}
After a few attempts I end up at the same spot, summarized here by ChatGPT:
I get why it feels that way, Rob — honestly, you’ve walked right into the worst possible moment in Apple’s Core ML toolchain evolution. What you’re running into isn’t your fault; Apple really did rip out the ONNX front-end without offering a clean replacement yet. So every tutorial, every script, and even coremltools itself contradicts what’s actually possible right now.
Let’s be brutally clear about the state of things (late 2025):
Goal: Convert ONNX → Core ML using coremltools 9
Supported by Apple today: No. The ONNX importer was removed.
Practical workaround: Use the legacy coremltools 6.x + onnx-coreml stack in a separate Python 3.9/3.10 env.

Goal: Convert PyTorch → Core ML directly
Supported by Apple today: Yes, but only for simple or traced models. T5 still fails because of dynamic control flow.
Practical workaround: None reliable without rewriting the model.

Goal: Run ONNX models directly on iOS
Supported by Apple today: Partially (via MLC LLM / Metal Performance Shaders).
Practical workaround: Works, but that's an entirely different runtime.
So: you’re not crazy, and this isn’t wasted effort—you’ve actually proven that your model exports cleanly and that the only missing piece is Apple’s converter. The ONNX file you built is solid. You just need something that can translate it to Core ML.
If your goal is simply “get it running on-device”, you still have two viable moves that aren’t rabbit holes:
1. Use MLC LLM / Metal runtime
Skip Core ML altogether. Apple’s mlc-llm (and the “MLCChat” pipeline) can execute your ONNX or quantized model directly on the GPU. You keep full fidelity and avoid all these broken converters.
2. Use the legacy Core ML 6.x toolchain in a sandboxed Python 3.10 VM
It actually works. You don’t have to downgrade your whole system—just run that specific environment once to emit the .mlpackage, then never touch it again.
Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
I am running some experiments with WebGPU using the wgpu crate in Rust, and I have some buffers already allocated on the GPU.
Is it possible to use those existing buffers directly as inputs to a Core ML predict call? I want to avoid GPU-to-CPU transfer time as much as possible.
Or are there other ways to do something like this? Is this only possible with the new Tensor object that came out with Metal 4?
Hello,
Do you have any information on the handling of sparse matrices with MPS and PyTorch? A release date? ...
Hi everyone,
I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15).
The Issue:
A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode.
Error Message in Xcode UI:
"There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model."
Observations:
Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE.
Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE.
This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment.
Reproduction Code (PyTorch + coremltools):
import torch
import torch.nn as nn
import coremltools as ct
import numpy as np

class RNN_Stateful(nn.Module):
    def __init__(self, hidden_shape):
        super(RNN_Stateful, self).__init__()
        # Simple conv to update state
        self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1)
        self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16))

    def forward(self, imgs):
        self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1))
        return self.conv2(self.hidden_state)

# h=480, w=255 causes ANE failure. w=256 works.
b, ch, h, w = 1, 3, 480, 255
model = RNN_Stateful((b, ch, h, w)).eval()
traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w))

mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)],
    outputs=[ct.TensorType(name="output", dtype=np.float16)],
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")],
    minimum_deployment_target=ct.target.iOS18,
    convert_to="mlprogram"
)
mlmodel.save("rnn_stateful.mlpackage")
Steps to see the error:
Open the generated .mlpackage in Xcode 16.0+.
Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac).
The report will fail to generate with the error mentioned above.
Environment:
OS: macOS 15.2
Xcode: 16.3
Hardware: M4
Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime?
Any insights or workarounds (other than manual padding) would be appreciated.
I'm adding Visual Intelligence support to my app, and now want to add a Tip using TipKit to guide users to this feature from within my app. I want to add a Rule to my Tip which will only show this Tip on devices where Visual Intelligence is supported (ex. not iPhone 14 Pro Max).
What is the best way for me to determine availability to set this TipKit rule?
Here's the documentation I'm following for Visual Intelligence: https://developer.apple.com/documentation/visualintelligence/integrating-your-app-with-visual-intelligence
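I'm not aware of a single official availability API for this, but whatever check you settle on can feed a TipKit parameter rule. A minimal sketch follows; deviceSupportsVisualIntelligence() is a placeholder for your own check, not a real API.

import SwiftUI
import TipKit

struct VisualIntelligenceTip: Tip {
    // Set this once at launch from whatever availability check you decide on.
    @Parameter
    static var isSupported: Bool = false

    var title: Text { Text("Try Visual Intelligence") }
    var message: Text? { Text("Point the camera at an object to look it up in the app.") }

    var rules: [Rule] {
        // Only show the tip on devices where the feature is supported.
        #Rule(Self.$isSupported) { $0 == true }
    }
}

// At startup, after your own check (placeholder function, not a real API):
// VisualIntelligenceTip.isSupported = deviceSupportsVisualIntelligence()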
Hi everyone
I'm currently developing an object detection model that should identify up to seven classes in an image. While I usually do this kind of work with plain Python and the Ultralytics library, I thought I'd give Create ML a shot. The experience is actually very nice, except that training doesn't seem to be using the ANE or GPU (MPS) for acceleration.
On https://developer.apple.com/machine-learning/create-ml/ it states: "On-device training Train models blazingly fast right on your Mac while taking advantage of CPU and GPU."
Am I doing something wrong?
I'm running the training on
Apple M1 Pro 16GB
MacOS 26.1 (Tahoe)
Xcode 26.1 (Build version 17B55)
It would be super nice to get some feedback or instructions.
Thank you in advance!
In a macOS & iOS app under development, I need to identify various measurements from OCR'ed text: length, weight, counts per inch, area, percentage. The unit type (e.g. UnitLength) needs to be identified, as well as the measurement's unit (e.g. .inches), in order to convert the measurement to the app's internal standard (e.g. centimetres), the value of which is stored in the relevant Core Data entity.
The use of NLTagger and NLTokenizer is problematic because of the various representations of the measurements: e.g. "50g.", "50 g", "50 grams", "1 3/4 oz."
Currently, I use a bespoke algorithm based on String contains and step-wise evaluation of characters, which is reasonably accurate but requires frequent updating as further representations are detected.
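For illustration, here is a stripped-down sketch of that kind of dictionary-driven approach. The unit table, regex, and function name are illustrative only, and fractions like "1 3/4" would still need a separate pass.

import Foundation

// Illustrative only: normalize a recognized mass like "50g.", "50 g", "1.75 oz" to grams.
let massUnits: [String: UnitMass] = [
    "g": .grams, "g.": .grams, "gram": .grams, "grams": .grams,
    "oz": .ounces, "oz.": .ounces, "ounce": .ounces, "ounces": .ounces
]

func parseMass(in text: String) -> Measurement<UnitMass>? {
    // A number followed by optional whitespace and a unit token.
    let pattern = #"([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]+\.?)"#
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
          let valueRange = Range(match.range(at: 1), in: text),
          let unitRange = Range(match.range(at: 2), in: text),
          let value = Double(text[valueRange]),
          let unit = massUnits[text[unitRange].lowercased()]
    else { return nil }
    return Measurement(value: value, unit: unit).converted(to: .grams)
}

print(parseMass(in: "1.75 oz.")?.value ?? "no match")   // about 49.6 (grams)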
I'm aware of the Python SpaCy model being capable of NER Measurement recognition, but am reluctant to incorporate a Python-based solution into a production app. (ref [https://developer.apple.com/forums/thread/30092])
My preference is for an open-source NER Measurement model that can be used as, or converted to, some form of a Swift compatible Machine Learning model. Does anyone know of such a model?
I'm new to Swift and was hoping the Playground would support loading adapters. When I tried, I got a permissions error; I'm guessing that's because the adapter isn't in the project, and Playgrounds don't like reaching outside the project?
A tutorial and some sample code would be helpful.
Also some benchmarks on how long it's expected to take. Selfishly I'm on an M2 Mac Mini.
I've spent way too long today trying to convert an Object Detection TensorFlow 2 model to a Core ML object detection model (with bounding boxes, labels, and probability scores).
The 'SSD MobileNet v2 320x320' is here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
And I've been following all sorts of posts and ChatGPT
https://apple.github.io/coremltools/docs-guides/source/tensorflow-2.html#convert-a-tensorflow-concrete-function
https://developer.apple.com/videos/play/wwdc2020/10153/?time=402
To convert it.
I keep hitting the same errors though, mostly around:
NotImplementedError: Expected model format: [SavedModel | concrete_function | tf.keras.Model | .h5 | GraphDef], got <ConcreteFunction signature_wrapper(input_tensor) at 0x366B87790>
I've had varying success including missing output labels/predictions.
But I simply want to create the CoreML model with all the right inputs and outputs (including correct names) as detailed in the docs here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md
It goes without saying I don't have much (any) experience with this stuff including Python so the whole thing's been a bit of a headache.
If anyone is able to help that would be great.
FWIW I'm not attached to any one specific model, but what I do need at minimum is a CoreML model that can detect objects (has to at least include lights and lamps) within a live video image, detecting where in the image the object is.
The simplest script I have looks like this:
import os
import coremltools as ct
import tensorflow as tf

# Load the SavedModel and grab its default serving signature
# (expand the ~ manually; tf.saved_model.load does not).
saved_model_dir = os.path.expanduser("~/tf_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model")
model = tf.saved_model.load(saved_model_dir)
concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]

# coremltools expects a *list* of concrete functions; passing a bare ConcreteFunction
# is what tends to trigger the "Expected model format ..." NotImplementedError.
mlmodel = ct.convert(
    [concrete_func],
    source="tensorflow",
    inputs=[ct.TensorType(shape=(1, 320, 320, 3))]
)

# Save as an .mlpackage; the file extension determines the format.
mlmodel.save("YourModel.mlpackage")
I am working on a lung cancer scanning app for iOS with a Core ML model, and when I test the app on a physical device, the model returns the same prediction 100% of the time. I even changed the names around and it still produced the same case. I have listed my labels as cases, and it's just stuck on the same case (case 1).
My code is below:
https://github.com/ShivenKhurana1/Detect-to-Protect-App/blob/main/DetectToProtect/SecondView.swift
I couldn't add the code as it was too long so I hope github link is fine!
Hi everyone,
I’m exploring ideas around on-device analysis of user typing behavior on iPhone, and I’d love input from others who’ve worked in this area or thought about similar problems.
Conceptually, I’m interested in things like:
High-level sentiment or tone inferred from what a user types over time, using ML models
Identifying a user's most important or frequent topics over a recent window (e.g., "last week")
Aggregated insights rather than raw text (privacy-preserving summaries, e.g. typo rate by hour to infer your most efficient time slots, or a "take a break" warning when typing errors increase)
I understand the significant privacy restrictions around keyboard input on iOS, especially for third-party keyboards and system text fields. I’m not trying to bypass those constraints—rather, I’m curious about what’s realistically possible within Apple’s frameworks and policies. (For instance, Grammarly as a correction tool includes some information about tone)
Questions I’m thinking through:
Are there any recommended approaches for on-device text analysis that don’t rely on capturing raw keystrokes?
Has anyone used NLP / Core ML / the Natural Language framework successfully for similar summarization or sentiment tasks, scoped only to user-explicit input? (A rough sketch follows this list.)
For custom keyboards, what kinds of derived or transient signals (if any) are acceptable to process and summarize locally?
Any design patterns that balance usefulness with Apple’s privacy expectations?
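On the Natural Language question above, here is a minimal on-device sentiment sketch over user-explicit text. No keystrokes are captured, and the aggregation at the end is just an example of a privacy-preserving summary.

import NaturalLanguage

// Score one piece of user-provided text with the built-in sentiment tagger.
// Returns roughly -1.0 (negative) through 1.0 (positive).
func sentimentScore(for text: String) -> Double {
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    let (tag, _) = tagger.tag(at: text.startIndex, unit: .paragraph, scheme: .sentimentScore)
    return Double(tag?.rawValue ?? "0") ?? 0
}

// Aggregate a privacy-preserving summary instead of storing raw text.
let todaysEntries = ["Had a great day", "This bug is driving me crazy"]
let average = todaysEntries.map(sentimentScore(for:)).reduce(0, +) / Double(todaysEntries.count)
print("Average sentiment today:", average)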
If you’ve built something adjacent—journaling, writing analytics, well-being apps, etc.—I’d appreciate hearing what worked, what didn’t, and what Apple reviewers were comfortable with.
Thanks in advance for any ideas or references 🙏