Hey everyone,
I've been updating my code to take advantage of the new Vision API for text recognition in macOS 15. I'm noticing some very odd behavior though, it seems like in general the new Vision API consistently produces worse results than the old API. For reference here is how I'm setting up my request.
var request = RecognizeTextRequest()
request.recognitionLevel = getOCRMode() // generally accurate
request.usesLanguageCorrection = !disableLanguageCorrection // generally true
request.recognitionLanguages = language.split(separator: ",").map { Locale.Language(identifier: String($0)) } // generally 'en'
let observations = try? await request.perform(on: image) as [RecognizedTextObservation]
Then I will process the results and just get the top candidate, which as mentioned above, typically is of worse quality then the same request formed with the old API.
Am I doing something wrong here?
Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I'm using Numbers to build a spreadsheet that I'm exporting as a CSV. I then import this file into Create ML to train a word tagger model. Everything has been working fine for all the models I've trained so far, but now I'm coming across a use case that has been breaking the import process: commas within the training data. This is a case that none of Apple's examples show.
My project takes Navajo text that has been tokenized by syllables and labels the parts-of-speech.
Case that works...
Raw text:
Naaltsoos yídéeshtah.
Tokens column:
Naal,tsoos, ,yí,déesh,tah,.
Labels column:
NObj,NObj,Space,Verb,Verb,VStem,Punct
Case that breaks...
Raw text:
óola, béésh łigaii, tłʼoh naadą́ą́ʼ, wáin, akʼah, dóó á,shįįh
Tokens column with tokenized text (commas quoted):
óo,la,",", ,béésh, ,łi,gaii,",", ,tłʼoh, ,naa,dą́ą́ʼ,",", ,wáin,",", ,a,kʼah,",", ,dóó, ,á,shįįh
(Create ML reports mismatched columns)
Tokens column with tokenized text (commas escaped):
óo,la,\,, ,béésh, ,łi,gaii,\,, ,tłʼoh, ,naa,dą́ą́ʼ,\,, ,wáin,\,, ,a,kʼah,\,, ,dóó, ,á,shįįh
(Create ML reports mismatched columns)
Tokens column with tokenized text (commas escape-quoted):
óo,la,\",\", ,béésh, ,łi,gaii,\",\", ,tłʼoh, ,naa,dą́ą́ʼ,\",\", ,wáin,\",\", ,a,kʼah,\",\", ,dóó, ,á,shįįh
(record not detected by Create ML)
Tokens column with tokenized text (commas escape-quoted):
óo,la,"","", ,béésh, ,łi,gaii,"","", ,tłʼoh, ,naa,dą́ą́ʼ,"","", ,wáin,"","", ,a,kʼah,"","", ,dóó, ,á,shįįh
(Create ML reports mismatched columns)
Labels column:
NSub,NSub,Punct,Space,NSub,Space,NSub,NSub,Punct,Space,NSub,Space,NSub,NSub,Punct,Space,NSub,Punct,Space,NSub,NSub,Punct,Space,Conj,Space,NSub,NSub
Sample From Spreadsheet
Solution Needed
It's simple enough to escape commas within CSV files, but the format needed by Create ML essentially combines entire CSV records into single columns, so I'm ending up needing a CSV record that contains a mixture of commas to use for parsing and ones to use as character literals. That's where this gets complicated.
For this particular use case (which seems like it would frequently arise when training a word tagger model), how should I properly escape a comma literal?
Topic:
Machine Learning & AI
SubTopic:
Create ML
Tags:
Natural Language
Machine Learning
Create ML
TabularData
When I use ChatGPT in Xcode, the following error is displayed:
It was working fine before, but suddenly it became like this, without changing any configuration. Why?
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
When using CoreML for VAE model prediction, the prediction result shows a distorted display with no error messages. How can this issue be addressed?
In WWDC25 Metal 4 released quite excited new features for machine learning optimization, but as we all know the pytorch based on metal shader performance (mps) is the one of most important tools for Mac machine learning area.but on mps introduced website we cannot see any support information for metal4.
Hi everyone,
I'm working on integrating object recognition from live video feeds into my existing app by following Apple's sample code. My original project captures video and records it successfully. However, after integrating the Vision-based object detection components (VNCoreMLRequest), no detections occur, and the callback for the request is never triggered.
To debug this issue, I’ve added the following functionality:
Set up AVCaptureVideoDataOutput for processing video frames.
Created a VNCoreMLRequest using my Core ML model.
The video recording functionality works as expected, but no object detection happens. I’d like to know:
How to debug this further? Which key debug points or logs could help identify where the issue lies?
Have I missed any key configurations? Below is a diff of the modifications I’ve made to my project for the new feature.
Diff of Changes:
(Attach the diff provided above)
Specific Observations:
The captureOutput method is invoked correctly, but there is no output or error from the Vision request callback.
Print statements in my setup function setForVideoClassify() show that the setup executes without errors.
Questions:
Could this be due to issues with my Core ML model compatibility or configuration?
Is the VNCoreMLRequest setup incorrect, or do I need to ensure specific image formats for processing?
Platform:
Xcode 16.1, iOS 18.1, Swift 5, SwiftUI, iPhone 11,
Darwin MacBook-Pro.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:27 PDT 2024; root:xnu-11215.41.3~2/RELEASE_X86_64 x86_64
Any guidance or advice is appreciated! Thanks in advance.
Hey,
When generating responses with structured output and non-streaming API, it sometimes takes 3s, sometimes 10-20s. I am firing that request subsequently while testing the app.
Is this by design, or any place I can learn more about what contributes to such variation?
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Hi all.
My adapter model just won't invoke my tool.
The problem I am having is covered in an older post: https://developer.apple.com/forums/thread/794839?answerId=852262022#852262022
Sadly the thread dies there and no resolution is seen in that thread.
It's worth noting that I have developed an AI chatbot built around LanguageModelSession to which I feed the exact same system prompt that I feed to my training set (pasted further in this post). The AI chatbot works perfectly, the tool is invoked when needed. I am training the adapter model because the base model whilst capable doesn't produce the quality I'm looking for.
So here's the template of an item in my training set:
[
{
'role': 'system',
'content': systemPrompt,
'tools': [TOOL_DEFINITION]
},
{
'role': 'user',
'content': entry['prompt']
},
{
'role': 'assistant',
'content': entry['code']
}
]
where TOOL_DEFINITION =
{
'type': 'function',
'function': {
'name': 'WriteUbersichtWidgetToFileSystem',
'description': 'Writes an Übersicht Widget to the file system. Call this tool as the last step in processing a prompt that generates a widget.',
'parameters': {
'type': 'object',
'properties': {
'jsxContent': {
'type': 'string',
'description': 'Complete JSX code for an Übersicht widget. This should include all required exports: command, refreshFrequency, render, and className. The JSX should be a complete, valid Übersicht widget file.'
}
},
'required': ['jsxContent']
}
}
... and systemPrompt =
A conversation between a user and a helpful assistant. You are an Übersicht widget designer. Create Übersicht widgets when requested by the user.
IMPORTANT: You have access to a tool called WriteUbersichtWidgetToFileSystem. When asked to create a widget, you MUST call this tool.
### Tool Usage:
Call WriteUbersichtWidgetToFileSystem with complete JSX code that implements the Übersicht Widget API. Generate custom JSX based on the user's specific request - do not copy the example below.
### Übersicht Widget API (REQUIRED):
Every Übersicht widget MUST export these 4 items:
- export const command: The bash command to execute (string)
- export const refreshFrequency: Refresh rate in milliseconds (number)
- export const render: React component function that receives {output} prop (function)
- export const className: CSS positioning for absolute placement (string)
Example format (customize for each request):
WriteUbersichtWidgetToFileSystem({jsxContent: `export const command = "echo hello"; export const refreshFrequency = 1000; export const render = ({output}) => { return <div>{output}</div>; }; export const className = "top: 20px; left: 20px;"`})
### Rules:
- The terms "ubersicht widget", "widget", "a widget", "the widget" must all be interpreted as "Übersicht widget"
- Generate complete, valid JSX code that follows the Übersicht widget API
- When you generate a widget, don't just show JSON or code - you MUST call the WriteUbersichtWidgetToFileSystem tool
- Report the results to the user after calling the tool
### Examples:
- "Generate a Übersicht widget" → Use WriteUbersichtWidgetToFileSystem tool
- "Can you add a widget that shows the time" → Use WriteUbersichtWidgetToFileSystem tool
- "Create a widget with a button" → Use WriteUbersichtWidgetToFileSystem tool
When the script that I use to compose the full training set is executed, entry['prompt'] and entry['code'] contain the prompt and the resulting JSX code for one of the examples I'm feeding to the training session. This is repeated for about 60 such examples that I have in my sample data collection.
Thanks for any help.
Michael
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
I'm working on a cross-platform AI app. It is a CMake project. The inference part should be built as a library separately on Windows and MacOS. On MacOS it should be built with objective-c and CoreML.
Here's my step roughly:
Create a XCode Project for CoreML inference and build it as static library. Models are compiled to ".mlmodelc", and codes are compile to binary ".a" lib.
Create a CMake Project for the app, and use the ".a" lib built by XCode.
Run the App.
I initialize the CoreML model like this(just for demostration):
#include "det.h" // the model header generated by xcode
auto url = [[NSURL alloc] initFileURLWithPath:[NSString stringWithFormat:@"%@/%@", dir, @"det.mlmodelc"]];
auto model = [[det alloc] initWithContentsOfURL:url error:&error]; // no error
The url is valid, and the initialization doesn't report any error. However, when I tried to do inference using codes like this:
auto cvPixelBuffer = createCVPixelBuffer(960, 960); // util function
auto preds = [model predictionFromImage:cvPixelBuffer error:NULL];
The output preds will be null and I got these errors:
2024-12-10 14:52:37.678201+0800 望言OCR[50204:5615023] [e5rt] E5RT encountered unknown exception.
2024-12-10 14:52:37.678237+0800 望言OCR[50204:5615023] [coreml] E5RT: E5RT encountered an unknown exception. (11)
2024-12-10 14:52:37.870739+0800 望言OCR[50204:5615023] H11ANEDevice::H11ANEDeviceOpen kH11ANEUserClientCommand_DeviceOpen call failed result=0xe00002e2
2024-12-10 14:52:37.870758+0800 望言OCR[50204:5615023] Device Open failed - status=0xe00002e2
2024-12-10 14:52:37.870760+0800 望言OCR[50204:5615023] (Single-ANE System) Critical Error: Could not open the only H11ANE device
2024-12-10 14:52:37.870769+0800 望言OCR[50204:5615023] H11ANEDeviceOpen failed: 0x17
2024-12-10 14:52:37.870845+0800 望言OCR[50204:5615023] H11ANEDevice::H11ANEDeviceOpen kH11ANEUserClientCommand_DeviceOpen call failed result=0xe00002e2
2024-12-10 14:52:37.870848+0800 望言OCR[50204:5615023] Device Open failed - status=0xe00002e2
2024-12-10 14:52:37.870849+0800 望言OCR[50204:5615023] (Single-ANE System) Critical Error: Could not open the only H11ANE device
2024-12-10 14:52:37.870853+0800 望言OCR[50204:5615023] H11ANEDeviceOpen failed: 0x17
2024-12-10 14:52:37.870857+0800 望言OCR[50204:5615023] [common] start: ANEDeviceOpen() failed : ret=23 :
It seems that CoreML failed to find ANE device. Is there anything need to be done before we use a CoreML Model as a library in a CMake or other non-XCode project?
By the way, codes like above will work on an XCode Native App with CoreML (I tested this before) . So I guess I missed some environment initializations in my non-XCode project?
I have seen inconsistent results for my Colab machine learning notebooks running locally on a Mac M4, compared to running the same notebook code on either T4 (in Colab) or a RTX3090 locally.
To illustrate the problems I have set up a notebook that implements two simple CNN models that solves the Fashion-MNIST problem. https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing
For the good model with 2M parameters I get the following results:
T4 (Colab, JAX): Test accuracy: 0.925
3090 (Local PC via ssh tunnel, Jax): Test accuracy: 0.925
Mac M4 (Local, JAX): Test accuracy: 0.893
Mac M4 (Local, Tensorflow): Test accuracy: 0.893
That is, I see a significant drop in performance when I run on the Mac M4 compared to the NVIDIA machines, and it seems to be independent of backend. I however do not know how to pinpoint this to either Keras or Apple’s METAL implementation. I have reported this to Keras: https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing but as this can be (likely is?) an Apple Metal issue, I wanted to report this here as well.
On the mac I am running the following Python libraries:
keras 3.9.1
tensorflow 2.19.0
tensorflow-metal 1.2.0
jax 0.5.3
jax-metal 0.1.1
jaxlib 0.5.3
Topic:
Machine Learning & AI
SubTopic:
General
I used the multifunction models feature introduced in iOS 18 to merge three VAE Encoder models with different resolutions into a single model. However, loading this merged model on iOS causes a crash with the error EXC_BAD_ACCESS (code=1, address=0x0). In contrast, merging VAE Decoder models using the same method does not result in crashes. Additionally, merging only two VAE Decoder models with different resolutions also leads to a crash when loaded on iOS. As for the Stable Diffusion Unet model, merging two or even three models does not cause any crashes, and it successfully generates images as expected.
I use the following code to load the model:
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
config.functionName = "test"
try MLModel(contentsOf: url, configuration: config)
Hello,
I'm running a large language model (LLM) in Core ML that uses a key-value cache (KV-cache) to store past attention states. The model was converted from PyTorch using coremltools and deployed on-device with Swift. The KV-cache is exposed via MLState and is used across inference steps for efficient autoregressive generation.
During the prefill stage — where a prompt of multiple tokens is passed to the model in a single batch to initialize the KV-cache — I’ve noticed that some entries in the KV-cache are not updated after the inference. Specifically:
Here are a few details about the setup:
The MLState returned by the model is identical to the input state (often empty or zero-initialized) for some tokens in the batch.
The issue only happens during the prefill stage (i.e., first call over multiple tokens).
During decoding (single-token generation), the KV-cache updates normally.
The model is invoked using MLModel.prediction(from:using:options:) for each batch.
I’ve confirmed:
The prompt tokens are non-repetitive and not masked.
The model spec has MLState inputs/outputs correctly configured for KV-cache tensors.
Each token is processed in a loop with the correct positional encodings.
Questions:
Is there any known behavior in Core ML that could prevent MLState from updating during batched or prefill inference?
Could this be caused by internal optimizations such as lazy execution, static masking, or zero-value short-circuiting?
How can I confirm that each token in the batch is contributing to the KV-cache during prefill?
Any insights from the Core ML or LLM deployment community would be much appreciated.
Is the face and body detection service in the Vision framework a local model or a cloud model?
https://developer.apple.com/documentation/vision
Hi all, I am interested in unlocking unique applications with the new foundational models. I have a few questions regarding the availability of the following features:
Image Input: The update in June 2025 mentions "image" 44 times (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates) - however I can't seem to find any information about having images as the input/prompt for the foundational models. When will this be available? I understand that there are existing Vision ML APIs, but I want image input into a multimodal on-device LLM (VLM) instead for features like "Which player is holding the ball in the image", etc (image understanding)
Cloud Foundational Model - when will this be available?
Thanks!
Clement :)
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Tags:
Vision
Machine Learning
Core ML
Apple Intelligence
there was a beta version. after the update it worked just like regular Siri. this message has been there for two days now but there is no loading.
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
We’ve encountered what appears to be a CoreML regression between macOS 26.0.1 and macOS 26.1 Beta.
In macOS 26.0.1, CoreML models run and produce correct results. However, in macOS 26.1 Beta, the same models produce scrambled or corrupted outputs, suggesting that tensor memory is being read or written incorrectly. The behavior is consistent with a low-level stride or pointer arithmetic issue — for example, using 16-bit strides on 32-bit data or other mismatches in tensor layout handling.
Reproduction
Install ON1 Photo RAW 2026 or ON1 Resize 2026 on macOS 26.0.1.
Use the newest Highest Quality resize model, which is Stable Diffusion–based and runs through CoreML.
Observe correct, high-quality results.
Upgrade to macOS 26.1 Beta and run the same operation again.
The output becomes visually scrambled or corrupted.
We are also seeing similar issues with another Stable Diffusion UNet model that previously worked correctly on macOS 26.0.1. This suggests the regression may affect multiple diffusion-style architectures, likely due to a change in CoreML’s tensor stride, layout computation, or memory alignment between these versions.
Notes
The affected models are exported using standard CoreML conversion pipelines.
No custom operators or third-party CoreML runtime layers are used.
The issue reproduces consistently across multiple machines.
It would be helpful to know if there were changes to CoreML’s tensor layout, precision handling, or MLCompute backend between macOS 26.0.1 and 26.1 Beta, or if this is a known regression in the current beta.
Hello, I have to create an app in Swift that it scan NFC Identity card. It extract data and convert it to human readable data. I do it with below code
import CoreNFC
class NFCIdentityCardReader: NSObject , NFCTagReaderSessionDelegate {
func tagReaderSessionDidBecomeActive(_ session: NFCTagReaderSession) {
print("\(session.description)")
}
func tagReaderSession(_ session: NFCTagReaderSession, didInvalidateWithError error: any Error) {
print("NFC Error: \(error.localizedDescription)")
}
var session: NFCTagReaderSession?
func beginScanning() {
guard NFCTagReaderSession.readingAvailable else {
print("NFC is not supported on this device")
return
}
session = NFCTagReaderSession(pollingOption: .iso14443, delegate: self, queue: nil)
session?.alertMessage = "Hold your NFC identity card near the device."
session?.begin()
}
func tagReaderSession(_ session: NFCTagReaderSession, didDetect tags: [NFCTag]) {
guard let tag = tags.first else {
session.invalidate(errorMessage: "No tag detected")
return
}
session.connect(to: tag) { (error) in
if let error = error {
session.invalidate(errorMessage: "Connection error: \(error.localizedDescription)")
return
}
switch tag {
case .miFare(let miFareTag):
self.readMiFareTag(miFareTag, session: session)
case .iso7816(let iso7816Tag):
self.readISO7816Tag(iso7816Tag, session: session)
case .iso15693, .feliCa:
session.invalidate(errorMessage: "Unsupported tag type")
@unknown default:
session.invalidate(errorMessage: "Unknown tag type")
}
}
}
private func readMiFareTag(_ tag: NFCMiFareTag, session: NFCTagReaderSession) {
// Read from MiFare card, assuming it's formatted as an identity card
let command: [UInt8] = [0x30, 0x04] // Example: Read command for block 4
let requestData = Data(command)
tag.sendMiFareCommand(commandPacket: requestData) { (response, error) in
if let error = error {
session.invalidate(errorMessage: "Error reading MiFare: \(error.localizedDescription)")
return
}
let readableData = String(data: response, encoding: .utf8) ?? response.map { String(format: "%02X", $0) }.joined()
session.alertMessage = "ID Card Data: \(readableData)"
session.invalidate()
}
}
private func readISO7816Tag(_ tag: NFCISO7816Tag, session: NFCTagReaderSession) {
let selectAppCommand = NFCISO7816APDU(instructionClass: 0x00, instructionCode: 0xA4, p1Parameter: 0x04, p2Parameter: 0x00, data: Data([0xA0, 0x00, 0x00, 0x02, 0x47, 0x10, 0x01]), expectedResponseLength: -1)
tag.sendCommand(apdu: selectAppCommand) { (response, sw1, sw2, error) in
if let error = error {
session.invalidate(errorMessage: "Error reading ISO7816: \(error.localizedDescription)")
return
}
let readableData = response.map { String(format: "%02X", $0) }.joined()
session.alertMessage = "ID Card Data: \(readableData)"
session.invalidate()
}
}
}
But I got null. I think that these data are encrypted. How can I convert them to readable data without MRZ, is it possible ?
I need to get personal informations from Identity card via Core NFC.
Thanks in advance.
Best regards
Woke up to a notification saying playground, Genmoji…etc was ready. but every time I try to use it says early access was requested. Anyone else had this issue? if so how did you fix it?
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Incident Identifier: 4C22F586-71FB-4644-B823-A4B52D158057
CrashReporter Key: adc89b7506c09c2a6b3a9099cc85531bdaba9156
Hardware Model: Mac16,10
Process: PRISMLensCore [16561]
Path: /Applications/PRISMLens.app/Contents/Resources/app.asar.unpacked/node_modules/core-node/PRISMLensCore.app/PRISMLensCore
Identifier: com.prismlive.camstudio
Version: (null) ((null))
Code Type: ARM-64
Parent Process: ? [16560]
Date/Time: (null)
OS Version: macOS 15.4 (24E5228e)
Report Version: 104
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread: 34
Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[__NSArrayM insertObject:atIndex:]: object cannot be nil'
Thread 34 Crashed:
0 CoreFoundation 0x000000018ba4dde4 0x18b960000 + 974308 (__exceptionPreprocess + 164)
1 libobjc.A.dylib 0x000000018b512b60 0x18b4f8000 + 109408 (objc_exception_throw + 88)
2 CoreFoundation 0x000000018b97e69c 0x18b960000 + 124572 (-[__NSArrayM insertObject:atIndex:] + 1276)
3 Portrait 0x0000000257e16a94 0x257da3000 + 473748 (-[PTMSRResize addAdditionalOutput:] + 604)
4 Portrait 0x0000000257de91c0 0x257da3000 + 287168 (-[PTEffectRenderer initWithDescriptor:metalContext:useHighResNetwork:faceAttributesNetwork:humanDetections:prevTemporalState:asyncInitQueue:sharedResources:] + 6204)
5 Portrait 0x0000000257dab21c 0x257da3000 + 33308 (__33-[PTEffect updateEffectDelegate:]_block_invoke.241 + 164)
6 libdispatch.dylib 0x000000018b739b2c 0x18b738000 + 6956 (_dispatch_call_block_and_release + 32)
7 libdispatch.dylib 0x000000018b75385c 0x18b738000 + 112732 (_dispatch_client_callout + 16)
8 libdispatch.dylib 0x000000018b742350 0x18b738000 + 41808 (_dispatch_lane_serial_drain + 740)
9 libdispatch.dylib 0x000000018b742e2c 0x18b738000 + 44588 (_dispatch_lane_invoke + 388)
10 libdispatch.dylib 0x000000018b74d264 0x18b738000 + 86628 (_dispatch_root_queue_drain_deferred_wlh + 292)
11 libdispatch.dylib 0x000000018b74cae8 0x18b738000 + 84712 (_dispatch_workloop_worker_thread + 540)
12 libsystem_pthread.dylib 0x000000018b8ede64 0x18b8eb000 + 11876 (_pthread_wqthread + 292)
13 libsystem_pthread.dylib 0x000000018b8ecb74 0x18b8eb000 + 7028 (start_wqthread + 8)
Topic:
Machine Learning & AI
SubTopic:
General
I want to use Foundation Models in a project, but I know my users will want to avoid environmentally intensive AI work in data centers.
Does Foundation Models ever use Private Compute Cloud or any other kind of cloud-based AI system?
I'd like to be able to assure my users that the LLM usage is relatively environmentally friendly. It would be great to be able to cite a specific Apple page explaining that Foundation Models work is always done locally.
If there's any chance that work can be done in the cloud, is there a way to opt out of that?
Topic:
Machine Learning & AI
SubTopic:
Foundation Models