Following the recommendations in the WWDC24 video "Discover Swift enhancements in the Vision framework" (see the video at 10:41), I used the following code to perform multiple of the new iOS 18 `RecognizeTextRequest` operations in parallel.
Problem: if more than 2 requests are run in parallel, the requests hang, leaving the app in a state where no more requests can be started. -> deadlock
I tried other ways to run the requests, but no matter the method employed, or what device I use, no more than 2 requests can ever be run in parallel.
func triggerDeadlock() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        // See: WWDC 2024 "Discover Swift enhancements in the Vision framework" at 10:41
        // ############## THIS IS KEY
        let maxOCRTasks = 5 // On a real device, if more than 2 RecognizeTextRequests are launched in parallel using tasks, the requests hang
        // ############## THIS IS KEY
        for idx in 0..<maxOCRTasks {
            let url = ... // URL to some image
            group.addTask {
                // Perform OCR
                let _ = try await performOCRRequest(on: url)
            }
        }
        var nextIndex = maxOCRTasks
        for try await _ in group { // Wait for the next child task that finishes
            if nextIndex < pageCount { // pageCount: total number of images to process
                let url = ... // URL to some image
                group.addTask {
                    // Perform OCR
                    let _ = try await performOCRRequest(on: url)
                }
                nextIndex += 1
            }
        }
    }
}
// MARK: - ASYNC/AWAIT version with iOS 18
@available(iOS 18, *)
func performOCRRequest(on url: URL) async throws -> [RecognizedText] {
    // Create request
    var request = RecognizeTextRequest() // Single request: no need for ImageRequestHandler
    // Configure request
    request.recognitionLevel = .accurate
    request.automaticallyDetectsLanguage = true
    request.usesLanguageCorrection = true
    request.minimumTextHeightFraction = 0.016
    // Perform request
    let textObservations: [RecognizedTextObservation] = try await request.perform(on: url)
    // Convert [RecognizedTextObservation] to [RecognizedText]
    return textObservations.compactMap { observation in
        observation.topCandidates(1).first
    }
}
I also found this Swift forums post mentioning something very similar.
I also opened a feedback: FB17240843
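In the meantime, a minimal workaround sketch (assuming the hang only occurs with more than two concurrent requests; imageURLs and performAllOCRRequests are placeholders standing in for the elided parts of the code above) is to cap the task group at two in-flight requests:

@available(iOS 18, *)
func performAllOCRRequests(on imageURLs: [URL]) async throws {
    // Workaround sketch: never keep more than 2 RecognizeTextRequests in flight.
    let maxConcurrentRequests = 2
    try await withThrowingTaskGroup(of: Void.self) { group in
        // Start the first batch of at most 2 requests.
        for index in 0..<min(maxConcurrentRequests, imageURLs.count) {
            let url = imageURLs[index]
            group.addTask { _ = try await performOCRRequest(on: url) }
        }
        var nextIndex = min(maxConcurrentRequests, imageURLs.count)
        // Each time a request finishes, start the next one.
        for try await _ in group {
            if nextIndex < imageURLs.count {
                let url = imageURLs[nextIndex]
                group.addTask { _ = try await performOCRRequest(on: url) }
                nextIndex += 1
            }
        }
    }
}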
Whenever I try to initialize a LanguageModelSession (let session = LanguageModelSession()), my app crashes with EXC_BAD_ACCESS.
SystemLanguageModel.default.availability returns available.
I tried running the two sample projects I found that use Foundation Models, FoundationModelsTripPlanner and SwiftTranscriptionSampleApp, and they both also crash—immediately on launch.
I commented out the Foundation Models logic from the SwiftTranscriptionSampleApp and ran it again, and it no longer crashed.
I'm on macOS 26 Beta 4 on an M1 Pro device. I'm based in Austria (EU), if that matters.
Has Apple made any commitment to versioning the Foundation Models on device? What if you build a feature that works great on 26.0, but they change the model or guardrails in 26.1 and it breaks your feature? Is your only recourse filing Feedback or pulling the feature from the app? Will there be a way to specify a model version, like in all of the server-based LLM provider APIs? If not, it sounds risky to build on.
Hi all,
I’m encountering an issue when trying to run Apple Foundation Models in a blank project targeting iOS 26.
Below are the details:
Xcode: Latest version with iOS 26 SDK
macOS: macOS 26 Tahoe (installed on main disk)
Mac: 16” MacBook Pro with M2 Pro chip
Apple Intelligence: Available and functional on this machine
Problem:
I created a new blank iOS project, set the deployment target to iOS 26, and ran the following minimal code using Foundation Models. However, I get no response at all in the output - not even an error. The app runs, but the model does not produce any output.
#Playground {
    let session = LanguageModelSession()
    let response = try await session.respond(to: "Tell me a story")
}
Then, I tried to catch an error with this code:
#Playground {
    let session = LanguageModelSession()
    do {
        let response = try await session.respond(to: "Tell me a story")
        print(response)
    } catch {
        print("Failed to get response:", error)
    }
    print("This line never gets executed")
}
And got these results:
I’ve done further testing and discovered something important:
I tried running the Code Along sample project, and there the #Playground macro worked without issues. The only significant difference I noticed was the Canvas run destination:
In my original project, I was using iPhone 16 Pro (iOS 26) as the run target in Canvas. Apple Intelligence was enabled on the simulator, but no response was returned when executing the prompt.
In the sample project, the Canvas was running on My Mac.
I attempted to match that setup, but at first, my destination was My Mac (Designed for iPad), which still didn’t work. The macro finally executed properly once I switched to My Mac (AppKit).
So the question is: for now, do Foundation Models and the #Playground macro only run correctly when the Canvas run destination is set to "My Mac (AppKit)"?
I am using Xcode 26 on macOS Tahoe.
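For anyone comparing run destinations, here is a small diagnostic sketch (assuming the FoundationModels and Playgrounds imports; the availability check is the SystemLanguageModel API mentioned elsewhere in this forum) that reports whether the model is usable before prompting:

import FoundationModels
import Playgrounds

#Playground {
    // Check whether the on-device model is usable for this run destination.
    switch SystemLanguageModel.default.availability {
    case .available:
        let session = LanguageModelSession()
        let response = try await session.respond(to: "Tell me a story")
        print(response.content)
    case .unavailable(let reason):
        print("Model unavailable:", reason)
    @unknown default:
        print("Model unavailable for an unknown reason")
    }
}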
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
After updating to macOS 15.2 beta, the YOLO11 object detection model exported to Core ML outputs incorrect and abnormal bounding boxes.
It also doesn't work in iOS apps built on a 15.2 Mac.
The same model worked fine on macOS 14.1.
When training a YOLO11 custom model in Python, exporting it to Core ML, and testing it in the preview tab of the mlpackage on macOS 15.2 and Xcode 16.0, the above result is obtained.
Hello,
I'm unable to develop for Apple Intelligence on my Mac Studio, M1 Max running macOS 26 beta 1.
The models get downloaded, and I can verify that they exist in /System/Library/AssetsV2/; however, the download progress remains stuck at 100%.
Checking console logs shows the process generativeexperiencesd reporting the following:
My device region and language is set to English (India).
Things I've already tried:
Changing language and region to English (US)
Reinstalling macOS
Trying with a different ISP via hotspot.
Dear Apple Foundation Models Development Team,
I am a developer integrating Apple Foundation Models (AFM) into my app and encountered the exceededContextWindowSize error when exceeding the 4096-token limit.
Proposal:
I suggest Apple develop a tool to estimate the token count of a prompt before sending it to the model. This tool could be integrated into the FoundationModels framework for ease of use.
Benefits:
A token estimation tool would help developers manage the context window limit and optimize performance. I hope Apple considers this proposal soon.
Thank you!
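Until such an API exists, a rough client-side heuristic is one stopgap. The sketch below is an assumption (real tokenization differs, and 3.5 characters per token is only a guess), not an Apple API:

/// Very rough token estimate; the real tokenizer will differ.
func estimatedTokenCount(for text: String, charactersPerToken: Double = 3.5) -> Int {
    Int((Double(text.count) / charactersPerToken).rounded(.up))
}

let contextWindowLimit = 4096 // the limit reported by exceededContextWindowSize
let prompt = "..." // prompt text to check
if estimatedTokenCount(for: prompt) > contextWindowLimit {
    // Trim, summarize, or split the prompt before sending it to the model.
}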
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Hello everyone,
I am trying to train using Create ML Version 6.0 Beta (146.1) with the Image Feature Print v2 feature extractor.
I am using 100K images for a total of ~4GB on my M3 Max 48GB (macOS 15.0 Beta (24A5279h)).
The images seem to be correctly read and visualized in the Data Source section (no images with corrupted data seem to be present).
When I start the training, it's all fine for the first 6k to 7k pictures; then I receive the following error:
Failed to create CVPixelBufferPool. Width = 0, Height = 0, Format = 0x00000000
This is the first time I am using it, so I don't have much experience.
Could you help me understand what the problem could be?
Thanks a lot
Did something change on face detection / Vision Framework on iOS 15?
Using VNDetectFaceLandmarksRequest and reading the VNFaceLandmarkRegion2D to detect eyes is not working on iOS 15 as it did before. I am running the exact same code on an iOS 14 device and an iOS 15 device, and the coordinates are different, as seen in the screenshot.
Any ideas?
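One thing worth ruling out (a hedged suggestion, not a confirmed fix): Vision requests can resolve to a newer algorithm revision on a newer OS, so pinning the revision explicitly keeps the iOS 14 and iOS 15 runs comparable:

import Vision

// Pin the landmarks algorithm to a specific revision so both OS versions
// run the same detector instead of each OS's default revision.
let request = VNDetectFaceLandmarksRequest()
request.revision = VNDetectFaceLandmarksRequestRevision3
print("Supported revisions:", VNDetectFaceLandmarksRequest.supportedRevisions)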
I'm using Numbers to build a spreadsheet that I'm exporting as a CSV. I then import this file into Create ML to train a word tagger model. Everything has been working fine for all the models I've trained so far, but now I'm coming across a use case that has been breaking the import process: commas within the training data. This is a case that none of Apple's examples show.
My project takes Navajo text that has been tokenized by syllables and labels the parts-of-speech.
Case that works...
Raw text:
Naaltsoos yídéeshtah.
Tokens column:
Naal,tsoos, ,yí,déesh,tah,.
Labels column:
NObj,NObj,Space,Verb,Verb,VStem,Punct
Case that breaks...
Raw text:
óola, béésh łigaii, tłʼoh naadą́ą́ʼ, wáin, akʼah, dóó á,shįįh
Tokens column with tokenized text (commas quoted):
óo,la,",", ,béésh, ,łi,gaii,",", ,tłʼoh, ,naa,dą́ą́ʼ,",", ,wáin,",", ,a,kʼah,",", ,dóó, ,á,shįįh
(Create ML reports mismatched columns)
Tokens column with tokenized text (commas escaped):
óo,la,\,, ,béésh, ,łi,gaii,\,, ,tłʼoh, ,naa,dą́ą́ʼ,\,, ,wáin,\,, ,a,kʼah,\,, ,dóó, ,á,shįįh
(Create ML reports mismatched columns)
Tokens column with tokenized text (commas escape-quoted):
óo,la,\",\", ,béésh, ,łi,gaii,\",\", ,tłʼoh, ,naa,dą́ą́ʼ,\",\", ,wáin,\",\", ,a,kʼah,\",\", ,dóó, ,á,shįįh
(record not detected by Create ML)
Tokens column with tokenized text (commas quoted with doubled quotes):
óo,la,"","", ,béésh, ,łi,gaii,"","", ,tłʼoh, ,naa,dą́ą́ʼ,"","", ,wáin,"","", ,a,kʼah,"","", ,dóó, ,á,shįįh
(Create ML reports mismatched columns)
Labels column:
NSub,NSub,Punct,Space,NSub,Space,NSub,NSub,Punct,Space,NSub,Space,NSub,NSub,Punct,Space,NSub,Punct,Space,NSub,NSub,Punct,Space,Conj,Space,NSub,NSub
Sample From Spreadsheet
Solution Needed
It's simple enough to escape commas within CSV files, but the format needed by Create ML essentially combines entire CSV records into single columns, so I end up needing a CSV record that contains a mixture of commas used as delimiters and commas used as character literals. That's where this gets complicated.
For this particular use case (which seems like it would frequently arise when training a word tagger model), how should I properly escape a comma literal?
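For what it's worth, one way to sidestep CSV escaping entirely (a sketch, assuming programmatic training with the Create ML framework is acceptable; the file paths and column names are placeholders) is to store the training data as JSON, where the tokens and labels are real arrays rather than comma-joined strings:

import CreateML
import Foundation

// Hypothetical training file: an array of records whose "tokens" and "labels"
// values are JSON arrays, so a comma token never needs escaping, e.g.
// [{ "tokens": ["óo", "la", ",", " ", "béésh"], "labels": ["NSub", "NSub", "Punct", "Space", "NSub"] }, ...]
let trainingFileURL = URL(fileURLWithPath: "/path/to/training.json") // placeholder
let data = try MLDataTable(contentsOf: trainingFileURL)

let tagger = try MLWordTagger(trainingData: data,
                              tokenColumn: "tokens",
                              labelColumn: "labels")
try tagger.write(to: URL(fileURLWithPath: "/path/to/NavajoSyllableTagger.mlmodel")) // placeholder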
Topic:
Machine Learning & AI
SubTopic:
Create ML
Tags:
Natural Language
Machine Learning
Create ML
TabularData
Hey,
I receive GeneratedContent as follows:
let response = try await session.respond(to: "", schema: generationSchema)
And it wraps GeneratedJSON which seems to be private.
What is the best way to get a string / raw value out of it? I noticed it could theoretically be accessed via transcriptEntries but it's not ideal.
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Hi, unfortunately I am not able to verify this, but I remember that some time ago I was able to create Core ML models that had one (or more) inputs with an enumerated set of shapes and one (or more) inputs with a static shape.
This was some months ago. Since then I updated macOS to Sequoia 15.5, and when I try to execute MLModels with this setup I get the following error:
libc++abi: terminating due to uncaught exception of type CoreML::MLNeuralNetworkUtilities::AsymmetricalEnumeratedShapesException: A model doesn't allow input features with enumerated flexibility to have unequal number of enumerated shapes, but input feature global_write_indices has 1 enumerated shapes and input feature input_hidden_states has 3 enumerated shapes.
It may make sense (though I'm not convinced) to verify that all inputs with flexible enumerated shapes have the same number of possible shapes, but this should not rule out also having static-shape inputs, with a single shape defined, alongside the flexible-shape inputs.
Not finding a lot on the Swift Assist technology announced at WWDC 2024. Does anyone know the latest status? Also, I currently use OpenAI's macOS app and its 'Work With...' functionality to assist with Xcode development, and this is okay; it certainly saves copying code back and forth. But it seems like AI should be able to do a lot more to help with Xcode app development.
I guess I'm looking at what people are doing with AI in Visual Studio, Cline, Cursor, and other IDEs and tools like those, and I feel a bit left out working in Xcode. Please let me know if there are AI tools or techniques out there you use to help with your Xcode projects.
Thanks in advance!
Hi Apple Developer Community,
I’m exploring ways to fine-tune the SNSoundClassifier to allow users of my iOS app to personalize the model by adding custom sounds or adjusting predictions. While Apple’s WWDC session on sound classification explains how to train from scratch, I’m specifically interested in using SNSoundClassifier as the base model and building/fine-tuning on top of it.
Here are a few questions I have:
1. Fine-Tuning on SNSoundClassifier:
Is there a way to fine-tune this model programmatically through APIs? The manual approach using macOS, as shown in this documentation, is clear, but how can it be done dynamically, within the app for users or in a cloud backend (AWS/iCloud)?
Are there APIs or classes that support such on-device/cloud-based fine-tuning or incremental learning? If not directly, can the classifier's embeddings be used to train a lightweight custom layer?
Training is likely computationally intensive and would drain too much battery, so doing it in the cloud may be the right way, but I need the right APIs to get this done. Sample code would help.
2. Recommended Approach for In-App Model Customization:
If SNSoundClassifier doesn’t support fine-tuning, would transfer learning on models like MobileNetV2, YAMNet, OpenL3, or FastViT be more suitable?
Given these models (SNSoundClassifier, MobileNetV2, YAMNet, OpenL3, FastViT), which one would be best for accuracy and performance/efficiency on iOS? I aim to maintain real-time performance without sacrificing battery life. It is also important to see how well the architecture and accuracy are retained after conversion to a Core ML model.
3. Cost-Effective Backend Setup for Training:
Mac EC2 instances on AWS have a 24-hour minimum billing, which can become expensive for limited user requests. Are there better alternatives for deploying and training models on user request when s/he uploads files (training data)?
4. TensorFlow vs PyTorch:
Between TensorFlow and PyTorch, which framework would you recommend for iOS Core ML integration? TensorFlow Lite offers mobile-optimized models, but I’m also curious about PyTorch’s performance when converted to Core ML.
5. Metrics:
Metrics I have in mind while picking the model are these: publisher, accuracy, fine-tuning capability, real-time/live use, suitability for iPhone 16, architectural retention after Core ML conversion, reasons for unsuitability, and recommended use case.
Any insights or recommended approaches would be greatly appreciated.
Thanks in advance!
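On questions 1 and 2, one concrete option is worth mentioning: rather than fine-tuning SNSoundClassifier itself, the Create ML framework's MLSoundClassifier trains a classifier on top of Apple's built-in audio feature extractor. The sketch below assumes training on a Mac (or wherever the Create ML framework is available to you); the folder layout and output path are placeholders:

import CreateML
import Foundation

// Train a sound classifier from labeled folders of audio files (one folder per class label).
let trainingDataURL = URL(fileURLWithPath: "/path/to/SoundsByLabel") // placeholder
let dataSource = MLSoundClassifier.DataSource.labeledDirectories(at: trainingDataURL)

let classifier = try MLSoundClassifier(trainingData: dataSource)
print("Training accuracy:", (1.0 - classifier.trainingMetrics.classificationError) * 100, "%")

try classifier.write(to: URL(fileURLWithPath: "/path/to/CustomSounds.mlmodel")) // placeholder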
Topic:
Machine Learning & AI
SubTopic:
Create ML
Tags:
ML Compute
Machine Learning
Core ML
Create ML
Using TensorFlow for Apple silicon gives inaccurate results when compared to a Google Colab GPU (9-15% differences). Here are my install versions for 4 Anaconda environments. I understand that floating-point precision, batch size, and activation functions can be an issue, but how do you rectify an issue that has persisted for the past 3 years?
1.) Version TF: 2.12.0, Python 3.10.13, tensorflow-deps: 2.9.0, tensorflow-metal: 1.2.0, h5py: 3.6.0, keras: 2.12.0
2.) Version TF: 2.19.0, Python 3.11.0, tensorflow-metal: 1.2.0, h5py: 3.13.0, keras: 3.9.2, jax: 0.6.0, jax-metal: 0.1.1,jaxlib: 0.6.0, ml_dtypes: 0.5.1
3.) python: 3.10.13,tensorflow: 2.19.0,tensorflow-metal: 1.2.0, h5py: 3.13.0, keras: 3.9.2, ml_dtypes: 0.5.1
4.) Version TF: 2.16.2, tensorflow-deps:2.9.0,Python: 3.10.16, tensorflow-macos 2.16.2, tensorflow-metal: 1.2.0, h5py:3.13.0, keras: 3.9.2, ml_dtypes: 0.3.2
Install of Each ENV with common example:
Create ENV: conda create --name TF_Env_V2 --no-default-packages
start env: source TF_Env_Name
ENV_1.) conda install -c apple tensorflow-deps, conda install tensorflow, pip install tensorflow-metal, conda install ipykernel
ENV_2.) conda install pip python==3.11, pip install tensorflow, pip install tensorflow-metal, conda install ipykernel
ENV_3.) conda install pip python=3.10.13, pip install tensorflow, pip install tensorflow-metal, conda install ipykernel
ENV_4.) conda install -c apple tensorflow-deps, pip install tensorflow-macos, pip install tensorflow-metal, conda install ipykernel
Example used on all 4 env:
import tensorflow as tf

cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()

model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,
)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)
Hi, the most recent version of tensorflow-metal is only available for macOS 12.0 and Python up to version 3.11. Is there any chance it could be updated with wheels for macOS 15 and Python 3.12 (which is the default version supported for TensorFlow 2.17+)? I'd note that even downgrading to Python 3.11 would not be sufficient, as the wheels only work for macOS 12.
Thanks.
I'm experimenting with downloading an audio file of spoken content, using the Speech framework to transcribe it, then using FoundationModels to clean up the formatting to add paragraph breaks and such. I have this code to do that cleanup:
private func cleanupText(_ text: String) async throws -> String? {
    print("Cleaning up text of length \(text.count)...")
    let session = LanguageModelSession(instructions: "The content you read is a transcription of a speech. Separate it into paragraphs by adding newlines. Do not modify the content - only add newlines.")
    let response = try await session.respond(to: .init(text), generating: String.self)
    return response.content
}
The content length is about 29,000 characters. And I get this error:
InferenceError::inferenceFailed::Failed to run inference: Context length of 4096 was exceeded during singleExtend..
Is 4096 a reference to a max input length? Or is this a bug?
This is running on an M1 iPad Air, with iPadOS 26 Seed 1.
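The 4096 figure matches the context window limit reported elsewhere in this forum, and the prompt plus the generated output both have to fit inside it. A workaround sketch (assuming the cleanupText(_:) function above; the per-chunk character budget is a guess, since the model also has to re-emit the whole chunk) is to split the transcript and clean each chunk in its own session:

/// Splits long text into chunks of roughly maxChunkCharacters, cleans each
/// chunk separately, and stitches the results back together.
private func cleanupLongText(_ text: String, maxChunkCharacters: Int = 4_000) async throws -> String {
    var chunks: [String] = []
    var current = ""
    for sentence in text.split(separator: ".") {
        if current.count + sentence.count + 1 > maxChunkCharacters {
            chunks.append(current)
            current = ""
        }
        current += String(sentence) + "."
    }
    if !current.isEmpty { chunks.append(current) }

    var cleaned: [String] = []
    for chunk in chunks {
        if let result = try await cleanupText(chunk) { // each call uses a fresh session
            cleaned.append(result)
        }
    }
    return cleaned.joined(separator: "\n")
}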
In working with Apple's foundation models, we often want to provide as much context as possible. However, since the model has a context size limit of 4096 tokens, is there a way to estimate the number of tokens beforehand?
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Testing the Foundation Models framework with a health-focused recipe generation app. The on-device approach is appealing, but performance is rough: it takes 20+ seconds just to get a recipe name and description. The same content from the Claude API: 4 seconds.
I know it's beta and on-device has different tradeoffs, but this is approaching unusable territory for a real-time user experience. The streaming helps psychologically but doesn't mask the underlying latency. The privacy/cost benefits are compelling, but not if users abandon the feature before it completes.
Anyone else seeing similar performance? Is this expected for beta, or are there optimization techniques I'm missing?
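One mitigation worth trying, if it is available in your SDK, is prewarming: create the session ahead of time (for example, when the recipe screen appears) so model loading isn't paid at request time. A sketch, with hypothetical instructions text:

import FoundationModels

// Create the session early and prewarm it; the actual latency savings will vary.
let session = LanguageModelSession(instructions: "You generate short, health-focused recipes.")
session.prewarm()

// Later, when the user taps Generate:
// let response = try await session.respond(to: "A high-protein breakfast recipe")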
Topic:
Machine Learning & AI
SubTopic:
Foundation Models