Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

A Summary of the WWDC25 Group Lab - Machine Learning and AI Frameworks
At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Machine Learning and AI Frameworks.

What are you most excited about in the Foundation Models framework?
The Foundation Models framework provides access to an on-device Large Language Model (LLM), enabling entirely on-device processing for intelligent features. This allows you to build features such as personalized search suggestions and dynamic NPC generation in games. The combination of guided generation and streaming capabilities is particularly exciting for creating delightful animations and features with reliable output. The seamless integration with SwiftUI and the new design material Liquid Glass is also a major advantage.

When should I still bring my own LLM via CoreML?
It's generally recommended to first explore Apple's built-in system models and APIs, including the Foundation Models framework, as they are highly optimized for Apple devices and cover a wide range of use cases. However, Core ML is still valuable if you need more control or choice over the specific model being deployed, such as customizing existing system models or augmenting prompts. Core ML provides the tools to get these models on-device, but you are responsible for model distribution and updates.

Should I migrate PyTorch code to MLX?
MLX is an open-source, general-purpose machine learning framework designed for Apple Silicon from the ground up. It offers a familiar API, similar to PyTorch, and supports C, C++, Python, and Swift. MLX emphasizes unified memory, a key feature of Apple Silicon hardware, which can improve performance. It's recommended to try MLX and see if its programming model and features better suit your application's needs.
MLX shines when working with state-of-the-art, larger models.

Can I test Foundation Models in Xcode simulator or device?
Yes, you can use the Xcode simulator to test Foundation Models use cases. However, your Mac must be running macOS Tahoe. You can test on a physical iPhone running iOS 18 by connecting it to your Mac and running Playgrounds or live previews directly on the device.

Which on-device models will be supported? Any open source models?
The Foundation Models framework currently supports Apple's first-party models only. This allows for platform-wide optimizations, improving battery life and reducing latency. While Core ML can be used to integrate open-source models, it's generally recommended to first explore the built-in system models and APIs provided by Apple, including those in the Vision, Natural Language, and Speech frameworks, as they are highly optimized for Apple devices. For frontier models, MLX can run very large models.

How often will the Foundational Model be updated? How do we test for stability when the model is updated?
The Foundation Model will be updated in sync with operating system updates. You can test your app against new model versions during the beta period by downloading the beta OS and running your app. It is highly recommended to create an "eval set" of golden prompts and responses to evaluate the performance of your features as the model changes or as you tweak your prompts. Report any unsatisfactory or satisfactory cases using Feedback Assistant.

Which on-device model/API can I use to extract text data from images such as: nutrition labels, ingredient lists, cashier receipts, etc? Thank you.
The Vision framework offers RecognizeDocumentsRequest, which is specifically designed for these use cases. It not only recognizes text in images but also provides the structure of the document, such as rows in a receipt or the layout of a nutrition label. It can also identify data like phone numbers, addresses, and prices.
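The "eval set" of golden prompts recommended above could be sketched roughly as follows. This is a minimal illustration, not an Apple API: `GoldenCase`, `EvalReport`, and `runEval` are hypothetical names, the keyword check is only one simple way to score a response, and the `generate` closure is a stand-in for whatever code actually calls the model (a real model call would be asynchronous).

```swift
import Foundation

/// One "golden" case: a prompt plus keywords a good response should mention.
/// Hypothetical type for illustration; not part of any Apple framework.
struct GoldenCase {
    let prompt: String
    let requiredKeywords: [String]
}

/// Summary of one evaluation run over the whole set.
struct EvalReport {
    let passed: Int
    let total: Int
    let failedPrompts: [String]
    var passRate: Double { total == 0 ? 0 : Double(passed) / Double(total) }
}

/// Runs every golden case through `generate` and checks that each required
/// keyword appears in the response (case-insensitive).
/// `generate` is a placeholder for the real model call.
func runEval(_ cases: [GoldenCase], generate: (String) -> String) -> EvalReport {
    var failed: [String] = []
    for testCase in cases {
        let response = generate(testCase.prompt).lowercased()
        let ok = testCase.requiredKeywords.allSatisfy {
            response.contains($0.lowercased())
        }
        if !ok { failed.append(testCase.prompt) }
    }
    return EvalReport(passed: cases.count - failed.count,
                      total: cases.count,
                      failedPrompts: failed)
}
```

Re-running the same set on each beta OS, or after each prompt tweak, and comparing `passRate` and `failedPrompts` gives a quick signal of regressions worth reporting.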
What is the context window for the model? What are max tokens in and max tokens out?
The context window for the Foundation Model is 4,096 tokens. The split between input and output tokens is flexible. For example, if you input 4,000 tokens, you'll have 96 tokens remaining for the output. The API takes in text, converting it to tokens under the hood. When estimating token count, a good rule of thumb is 3-4 characters per token for languages like English, and 1 character per token for languages like Japanese or Chinese. Handle potential errors gracefully by asking for shorter prompts or starting a new session if the token limit is exceeded.

Is there a rate limit for Foundation Models API that is limited by power or temperature condition on the iPhone?
Yes, there are rate limits, particularly when your app is in the background. A budget is allocated for background app usage, but exceeding it will result in rate-limiting errors. In the foreground, there is no rate limit unless the device is under heavy load (e.g., camera open, game mode). The system dynamically balances performance, battery life, and thermal conditions, which can affect the token throughput. Use appropriate quality of service settings for your tasks (e.g., background priority for background work) to help the system manage resources effectively.

Do the foundation models support languages other than English?
Yes, the on-device Foundation Model is multilingual and supports all languages supported by Apple Intelligence. To get the model to output in a specific language, prompt it with instructions indicating the user's preferred language using the locale API (e.g., "The user's preferred language is en-US"). Putting the instructions in English, but then putting the user prompt in the desired output language is a recommended practice.

Are larger server-based models available through Foundation Models?
No, the Foundation Models API currently only provides access to the on-device Large Language Model at the core of Apple Intelligence. It does not support server-side models. On-device models are preferred for privacy and for performance reasons.

Is it possible to run Retrieval-Augmented Generation (RAG) using the Foundation Models framework?
Yes, it is possible to run RAG on-device, but the Foundation Models framework does not include a built-in embedding model. You'll need to use a separate database to store vectors and implement nearest neighbor or cosine distance searches. The Natural Language framework offers simple word and sentence embeddings that can be used. Consider using a combination of Foundation Models and Core ML, using Core ML for your embedding model.
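The nearest-neighbor step of the RAG answer above can be illustrated with a small brute-force cosine-similarity search. This is only a sketch under stated assumptions: `IndexedChunk` and `topMatches` are hypothetical names, the embeddings are assumed to come from some separate embedding model, and a real app with many vectors would likely want a proper vector store rather than a linear scan.

```swift
import Foundation

/// Cosine similarity between two equal-length vectors.
/// Returns 0 if either vector has zero length (norm).
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// A stored text chunk with its precomputed embedding.
/// Hypothetical type for illustration only.
struct IndexedChunk {
    let text: String
    let embedding: [Double]
}

/// Brute-force search: scores every chunk against the query embedding
/// and returns the `limit` most similar chunks, best first.
func topMatches(for query: [Double], in chunks: [IndexedChunk], limit: Int) -> [IndexedChunk] {
    chunks
        .map { (chunk: $0, score: cosineSimilarity(query, $0.embedding)) }
        .sorted { $0.score > $1.score }
        .prefix(limit)
        .map { $0.chunk }
}
```

The text of the returned chunks would then be placed into the prompt, keeping the total within the context window.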
1
0
798
Jun ’25
Visual Intelligence API SemanticContentDescriptor labels are empty
I'm trying to use Apple's new Visual Intelligence API for recommending content through screenshot image search. The problem I encountered is that the SemanticContentDescriptor labels are either completely empty or super misleading, making it impossible to query for similar content on my app. Even the closest matching example was inaccurate, returning a single label ["cardigan"] for a Supreme T-Shirt. I see other apps using this API like Etsy for example, and I'm wondering if they're using the input pixel buffer to query for similar content rather than using the labels? If anyone has a similar experience or something that wasn't called out in the documentation please lmk! Thanks.
0
0
89
18h
CoreML Inference Acceleration
Hello everyone, I have a convolutional vision model and a video that has been decoded into many frames. When I run inference on each frame in a loop, it is a bit slow. So I started 4 threads, each running inference simultaneously, but the overall speed is the same as serial inference, and each individual forward pass is slower. I used the mactop tool to check GPU utilization, and it was only around 20%. Is this normal? How can I accelerate it?
2
0
555
2d
Foundation Models not working: "Model is unavailable" error on iPad Pro M4
I am excited to try Foundation Models during WWDC, but it doesn't work at all for me. When running on my iPad Pro M4 with iPadOS 26 seed 1, I get the following error even when running the simplest query:

    let prompt = "How are you?"
    let stream = session.streamResponse(to: prompt)
    for try await partial in stream {
        self.answer = partial
        self.resultString = partial
    }

In the Xcode console, I see the following error:

    assetsUnavailable(FoundationModels.LanguageModelSession.GenerationError.Context(debugDescription: "Model is unavailable", underlyingErrors: []))

I have verified that Apple Intelligence is enabled on my iPad. Any tips on how I can get it working? I have also submitted this feedback: FB17896752
4
3
769
2d
videotoolbox superresolution
Hello, I'm using the VideoToolbox super-resolution API on macOS 26: https://developer.apple.com/documentation/videotoolbox/vtsuperresolutionscalerconfiguration/downloadconfigurationmodel(completionhandler:)?language=objc
It works from Swift, but from Objective-C I get an error when downloading the model with downloadConfigurationModelWithCompletionHandler:

    [Auto] MA-auto{_failedLockContent} | failure reported by server | error:[com.apple.MobileAssetError.AutoAsset:MissingReference(6111)]
    [Auto] MA-auto{_failedLockContent} | failure reported by server | error:[com.apple.MobileAssetError.AutoAsset:UnderlyingError(6107)_1_com.apple.MobileAssetError.Download:47]
    Download completion handler called with error: The operation couldn’t be completed. (VTFrameProcessorErrorDomain error -19743.)
1
0
195
2d
iOS 26 beta breaking my model
I recently updated to iOS 26 beta (23A5336a) to test an app I am developing that runs an MLModel loaded from a .mlmodelc file. On the current iOS version, 18.6.2, the model runs as expected with no issues. However, on iOS 26 I now get an error when I try to run inference on the model with a camera frame as input. The full error is below; at the bottom it says "Failed with status=0x1d : statusType=0x9: Program Inference error status=-1 Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model". Does this indicate I need to convert my model or something? I don't understand, since it runs as normal on iOS 18. Any help getting this to run again would be greatly appreciated. Thank you,

processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: Could not process request ret=0x1d lModel=_ANEModel: { modelURL=file:///var/containers/Bundle/Application/04F01BF5-D48B-44EC-A5F6-3C7389CF4856/RizzCanvas.app/faceParsing.mlmodelc/ : sourceURL=(null) : UUID=46228BFC-19B0-45BF-B18D-4A2942EEC144 : key={"isegment":0,"inputs":{"input":{"shape":[512,512,1,3,1]}},"outputs":{"var_633":{"shape":[512,512,1,19,1]},"94_argmax_out_value":{"shape":[512,512,1,1,1]},"argmax_out":{"shape":[512,512,1,1,1]},"var_637":{"shape":[512,512,1,19,1]}}} : identifierSource=1 : cacheURLIdentifier=01EF2D3DDB9BA8FD1FDE18C7CCDABA1D78C6BD02DC421D37D4E4A9D34B9F8181_93D03B87030C23427646D13E326EC55368695C3F61B2D32264CFC33E02FFD9FF : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=259022032430 : intermediateBufferHandle=13949 : queueDepth=127 } : state=3 : [Espresso::ANERuntimeEngine::__forward_segment 0] evaluate[RealTime]WithModel returned 0; code=8 err=Error Domain=com.apple.appleneuralengine Code=8 "processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : 
statusType=0x9: Program Inference error" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : statusType=0x9: Program Inference error} [Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": ANEF error: /private/var/containers/Bundle/Application/04F01BF5-D48B-44EC-A5F6-3C7389CF4856/RizzCanvas.app/faceParsing.mlmodelc/model.espresso.net, processRequest:model:qos:qIndex:modelStringID:options:returnValue:error:: ANEProgramProcessRequestDirect() Failed with status=0x1d : statusType=0x9: Program Inference error status=-1 Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1). Error Domain=com.apple.Vision Code=3 "The VNCoreMLTransform request failed" UserInfo={NSLocalizedDescription=The VNCoreMLTransform request failed, NSUnderlyingError=0x114d92940 {Error Domain=com.apple.CoreML Code=0 "Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)." UserInfo={NSLocalizedDescription=Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).}}}
1
0
925
5d
Visual Intelligence -- Make OpenIntent show a sheet rather than open my App
The developer tutorial for visual intelligence indicates that the method to detect and handle taps on a displayed entity from the Search section is via an "OpenIntent" associated with your entity. However, running this intent executes code from within my app. If I have the perform() method display UI, it always displays UI from within my app. I noticed that the Google app's integration to visual intelligence has a different behavior-- tapping on an entity does not take you to the Google app -- instead, a Webview is presented sheet-style WITHIN the Visual Intelligence environment (see below) How is that accomplished?
0
0
478
5d
Python 3.13
Hello, Are there any plans to release a Python 3.13 build of tensorflow-metal? I just got my new Mac mini, and the Python version that brew installs automatically is 3.13. If I were in a hurry I could install Python 3.12 and use the corresponding tensorflow-metal version, but I'm not in a hurry. Many thanks, Alan
2
0
744
6d
ImagePlayground: Programmatic Creation Error
Hardware: MacBook Pro M4 Nov 2024
Software: macOS Tahoe 26.0 & Xcode 26.0
Apple Intelligence is activated and the Image Playground macOS app works.
Running the following in Xcode throws ImagePlayground.ImageCreator.Error.creationFailed. Any suggestions on how to make this work?

    import Foundation
    import ImagePlayground

    Task {
        let creator = try await ImageCreator()
        guard let style = creator.availableStyles.first else {
            print("No styles available")
            exit(1)
        }
        let images = creator.images(
            for: [.text("A cat wearing mittens.")],
            style: style,
            limit: 1)
        for try await image in images {
            print("Generated image: \(image)")
        }
        exit(0)
    }

    RunLoop.main.run()
0
0
226
1w
RecognizeDocumentsRequest for receipts
Hi, I'm trying to use the new RecognizeDocumentsRequest from the Vision framework to read a receipt. It looks very promising, since it can read paragraphs and lines and detect data. Unfortunately, so far it seems to read every line on the receipt as a paragraph, and when there is more space within one line it creates two paragraphs. Is there perhaps an Apple engineer who knows if this is expected behaviour or if I should file a Feedback for this?

Code setup:

    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: image)
    guard let document = observations.first?.document else { return }
    for paragraph in document.paragraphs {
        print(paragraph.transcript)
        for data in paragraph.detectedData {
            switch data.match.details {
            case .phoneNumber(let data): print("Phone: \(data)")
            case .postalAddress(let data): print("Postal: \(data)")
            case .calendarEvent(let data): print("Calendar: \(data)")
            case .moneyAmount(let data): print("Money: \(data)")
            case .measurement(let data): print("Measurement: \(data)")
            default: continue
            }
        }
    }

See attached image as an example of a receipt I'd like to parse. The top 3 lines are the name, street, and postal code + city. These are all separate paragraphs. Checking detectedData does see the street (2nd line) as a PostalAddress, but not the complete address. Might that be a location thing, since it's a Dutch address?

Lower on the receipt, it sees the block with "Pomp 1 95 Ongelood" and the things below it also as separate paragraphs, first picking up the left side and after that the right side. So it's something like this:
* Pomp 1 Volume Prijs € TOTAAL * BTW Netto 21.00 % 95 Ongelood 41,90 l 1.949/ 1 81.66 € 14.17 67.49
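One possible workaround for the left-then-right reading order described above is to regroup the recognized fragments into rows by vertical position after the fact. The sketch below is purely illustrative and does not touch the Vision API: `TextFragment`, `groupIntoRows`, and the `tolerance` value are hypothetical, and it assumes you have already extracted each fragment's text and bounding-box position, with `midY` increasing from top to bottom (reverse the first sort if your coordinates run the other way).

```swift
import Foundation

/// Hypothetical fragment: recognized text plus the position of its bounding box.
/// Assumes normalized coordinates where a larger midY is lower on the page.
struct TextFragment {
    let text: String
    let minX: Double
    let midY: Double
}

/// Groups fragments whose vertical centers lie within `tolerance` of the first
/// fragment in a row, then orders each row left to right and joins the text.
func groupIntoRows(_ fragments: [TextFragment], tolerance: Double = 0.01) -> [String] {
    let sorted = fragments.sorted { $0.midY < $1.midY }
    var rows: [[TextFragment]] = []
    for fragment in sorted {
        if let anchor = rows.last?.first,
           abs(fragment.midY - anchor.midY) <= tolerance {
            rows[rows.count - 1].append(fragment)   // same visual line
        } else {
            rows.append([fragment])                 // start a new line
        }
    }
    return rows.map { row in
        row.sorted { $0.minX < $1.minX }
            .map { $0.text }
            .joined(separator: " ")
    }
}
```

The tolerance would need tuning per receipt layout, and skewed or curved photos would need something more robust than a fixed threshold.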
2
1
270
1w
Image Playground Error: Unable to Generate Images Using externalProvider Style
I’m working on generating images using Image Playground. The code works fine for other styles but fails when using an external provider. I don’t see any other requirements mentioned in the documentation: https://developer.apple.com/documentation/imageplayground/imageplaygroundstyle/externalprovider?changes=_2
Has anyone else encountered a similar issue? The error message is also not very helpful. It simply states that the creation failed. Note: I have enabled ChatGPT Plus, and the image generation using ChatGPT styles works fine when using the Playground app.

Here’s the relevant code snippet:

    do {
        let creator = try await ImageCreator()
        let concept = ImagePlaygroundConcept.text("Love")
        let images = creator.images(for: [concept], style: .externalProvider, limit: 1)
        for try await image in images {
            // Handle image
            break
        }
    } catch {
        // Handle error
    }

I’m using the iOS 26 RC, and when I print creator.availableStyles, it doesn’t include externalProvider:

    [ImagePlayground.ImagePlaygroundStyle(id: "animation", _representationInfo: nil),
     ImagePlayground.ImagePlaygroundStyle(id: "emoji", _representationInfo: nil),
     ImagePlayground.ImagePlaygroundStyle(id: "illustration", _representationInfo: nil),
     ImagePlayground.ImagePlaygroundStyle(id: "sketch", _representationInfo: nil),
     ImagePlayground.ImagePlaygroundStyle(id: "messages-background", _representationInfo: nil)]
1
0
756
1w
Problem running NLContextualEmbeddingModel in simulator
Environment
macOS 26
Xcode Version 26.0 beta 7 (17A5305k)
simulator: iPhone 16 Pro
iOS: iOS 26

Problem
NLContextualEmbedding.load() fails with the following error

In simulator

    Failed to load embedding from MIL representation: filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
    filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
    Failed to load embedding model 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289'
    assetRequestFailed(Optional(Error Domain=NLNaturalLanguageErrorDomain Code=7 "Embedding model requires compilation" UserInfo={NSLocalizedDescription=Embedding model requires compilation}))

in #Playground

I'm new to this embedding model. Not sure if it's caused by my code or environment.

Code snippet

    import Foundation
    import NaturalLanguage
    import Playgrounds

    #Playground {
        // Prefer initializing by script for broader coverage; returns NLContextualEmbedding?
        guard let embeddingModel = NLContextualEmbedding(script: .latin) else {
            print("Failed to create NLContextualEmbedding")
            return
        }
        print(embeddingModel.hasAvailableAssets)
        do {
            try embeddingModel.load()
            print("Model loaded")
        } catch {
            print("Failed to load model: \(error)")
        }
    }
0
0
288
1w
Foundational Model - Image as Input? Timeline
Hi all, I am interested in unlocking unique applications with the new foundational models. I have a few questions regarding the availability of the following features:

Image Input: The update in June 2025 mentions "image" 44 times (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates) - however I can't seem to find any information about having images as the input/prompt for the foundational models. When will this be available? I understand that there are existing Vision ML APIs, but I want image input into a multimodal on-device LLM (VLM) instead for features like "Which player is holding the ball in the image", etc (image understanding)

Cloud Foundational Model - when will this be available?

Thanks! Clement :)
1
0
400
2w
Model w/ Guardrails Disabled Still Frequently Refuses to Summarize Text
Foundation Models are driving me up the wall. My use case: A news app - I want to summarize news articles. Sounds like a perfect use for the added-in-beta-5 "no guardrails" mode for text-to-text transformations...

... and it's true, I don't get guardrails exceptions anymore but now, the model itself frequently refuses to summarize stuff which in a way is even worse as I have to parse the output text to figure out if it failed instead of getting an exception. I mostly worked that out with my system instructions but still, the refusing to summarize makes it really tough to use. I instructed the model to tell me why it failed if that happens.

Examples of various refusals for news articles from major sources:
"The article mentions "Visual Lookup" but does not provide details about how it integrates with iOS 26."
"The article includes unsafe content regarding a political figure's potential influence over the Federal Reserve board, which is against my guidelines."
"the article contains unsafe content."
"The article is biased and opinionated and focuses on the author's opinion." (this is despite the instructions specifically asking for a neutral summary - I am asking it to not use bias in the output but it still refuses)

I have tons of these. Note that if I don't use the "no guardrails" mode and use a Generable instead, some of these work fine so right now I have to do two passes on much of the content since I never know which one will work. Having a "summary mode" that often refuses to summarize current news articles (the world is not a great place, some of these stories are a bummer) is near worthless.
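For the "parse the output text to figure out if it failed" step described above, one way to keep the parsing simple is to have the system instructions ask the model to start any refusal with a fixed marker. The sketch below assumes such a convention; the `UNABLE:` marker, `SummaryOutcome`, and `classify` are hypothetical names chosen for illustration, and nothing guarantees the model will always follow the instruction.

```swift
import Foundation

/// Result of inspecting one model response.
enum SummaryOutcome: Equatable {
    case summary(String)
    case refused(reason: String)
}

/// Hypothetical marker that the system instructions would ask the model to
/// emit at the start of any response where it declines to summarize.
let refusalMarker = "UNABLE:"

/// Classifies raw model output as a summary or a refusal with a reason.
/// The marker check is case-insensitive and ignores surrounding whitespace.
func classify(_ output: String) -> SummaryOutcome {
    let trimmed = output.trimmingCharacters(in: .whitespacesAndNewlines)
    if trimmed.prefix(refusalMarker.count).uppercased() == refusalMarker {
        let reason = trimmed.dropFirst(refusalMarker.count)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        return .refused(reason: reason)
    }
    return .summary(trimmed)
}
```

A `.refused` result could then trigger the second pass, and the collected reasons are a convenient attachment for a feedback report.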
8
0
781
2w