Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

A Summary of the WWDC25 Group Lab - Machine Learning and AI Frameworks
At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Machine Learning and AI Frameworks. What are you most excited about in the Foundation Models framework? The Foundation Models framework provides access to an on-device Large Language Model (LLM), enabling entirely on-device processing for intelligent features. This allows you to build features such as personalized search suggestions and dynamic NPC generation in games. The combination of guided generation and streaming capabilities is particularly exciting for creating delightful animations and features with reliable output. The seamless integration with SwiftUI and the new design material Liquid Glass is also a major advantage. When should I still bring my own LLM via CoreML? It's generally recommended to first explore Apple's built-in system models and APIs, including the Foundation Models framework, as they are highly optimized for Apple devices and cover a wide range of use cases. However, Core ML is still valuable if you need more control or choice over the specific model being deployed, such as customizing existing system models or augmenting prompts. Core ML provides the tools to get these models on-device, but you are responsible for model distribution and updates. Should I migrate PyTorch code to MLX? MLX is an open-source, general-purpose machine learning framework designed for Apple Silicon from the ground up. It offers a familiar API, similar to PyTorch, and supports C, C++, Python, and Swift. MLX emphasizes unified memory, a key feature of Apple Silicon hardware, which can improve performance. It's recommended to try MLX and see if its programming model and features better suit your application's needs. MLX shines when working with state-of-the-art, larger models. Can I test Foundation Models in Xcode simulator or device? Yes, you can use the Xcode simulator to test Foundation Models use cases. However, your Mac must be running macOS Tahoe. You can test on a physical iPhone running iOS 18 by connecting it to your Mac and running Playgrounds or live previews directly on the device. Which on-device models will be supported? any open source models? The Foundation Models framework currently supports Apple's first-party models only. This allows for platform-wide optimizations, improving battery life and reducing latency. While Core ML can be used to integrate open-source models, it's generally recommended to first explore the built-in system models and APIs provided by Apple, including those in the Vision, Natural Language, and Speech frameworks, as they are highly optimized for Apple devices. For frontier models, MLX can run very large models. How often will the Foundational Model be updated? How do we test for stability when the model is updated? The Foundation Model will be updated in sync with operating system updates. You can test your app against new model versions during the beta period by downloading the beta OS and running your app. It is highly recommended to create an "eval set" of golden prompts and responses to evaluate the performance of your features as the model changes or as you tweak your prompts. Report any unsatisfactory or satisfactory cases using Feedback Assistant. Which on-device model/API can I use to extract text data from images such as: nutrition labels, ingredient lists, cashier receipts, etc? Thank you. The Vision framework offers the RecognizeDocumentRequest which is specifically designed for these use cases. It not only recognizes text in images but also provides the structure of the document, such as rows in a receipt or the layout of a nutrition label. It can also identify data like phone numbers, addresses, and prices. What is the context window for the model? What are max tokens in and max tokens out? The context window for the Foundation Model is 4,096 tokens. The split between input and output tokens is flexible. For example, if you input 4,000 tokens, you'll have 96 tokens remaining for the output. The API takes in text, converting it to tokens under the hood. When estimating token count, a good rule of thumb is 3-4 characters per token for languages like English, and 1 character per token for languages like Japanese or Chinese. Handle potential errors gracefully by asking for shorter prompts or starting a new session if the token limit is exceeded. Is there a rate limit for Foundation Models API that is limited by power or temperature condition on the iPhone? Yes, there are rate limits, particularly when your app is in the background. A budget is allocated for background app usage, but exceeding it will result in rate-limiting errors. In the foreground, there is no rate limit unless the device is under heavy load (e.g., camera open, game mode). The system dynamically balances performance, battery life, and thermal conditions, which can affect the token throughput. Use appropriate quality of service settings for your tasks (e.g., background priority for background work) to help the system manage resources effectively. Do the foundation models support languages other than English? Yes, the on-device Foundation Model is multilingual and supports all languages supported by Apple Intelligence. To get the model to output in a specific language, prompt it with instructions indicating the user's preferred language using the locale API (e.g., "The user's preferred language is en-US"). Putting the instructions in English, but then putting the user prompt in the desired output language is a recommended practice. Are larger server-based models available through Foundation Models? No, the Foundation Models API currently only provides access to the on-device Large Language Model at the core of Apple Intelligence. It does not support server-side models. On-device models are preferred for privacy and for performance reasons. Is it possible to run Retrieval-Augmented Generation (RAG) using the Foundation Models framework? Yes, it is possible to run RAG on-device, but the Foundation Models framework does not include a built-in embedding model. You'll need to use a separate database to store vectors and implement nearest neighbor or cosine distance searches. The Natural Language framework offers simple word and sentence embeddings that can be used. Consider using a combination of Foundation Models and Core ML, using Core ML for your embedding model.
1
0
754
Jun ’25
Unsupported type in JAX metal PJRT plugin with rng_bit_generator
Hi all, When executing an HLO program using the JAX metal PJRT plugin, the program fails due to an unsupported data type returned by the rng_bit_generator operation. The generated HLO includes: %output_state, %output = "mhlo.rng_bit_generator"(%1) <{rng_algorithm = #mhlo.rng_algorithm<PHILOX>}> : (tensor<3xi64>) -> (tensor<3xi64>, tensor<3xui32>) The error message indicates that: Metal only supports MPSDataTypeFloat16, MPSDataTypeBFloat16, MPSDataTypeFloat32, MPSDataTypeInt32, and MPSDataTypeInt64. The use of ui32 seems to be incompatible with Metal’s allowed types. I’m trying to understand if the ui32 output is the problem or maybe the use of rng_bit_generator is wrong. Could you clarify if there is a workaround or planned support for ui32 output in this context? Alternatively, guidance on configuring rng_bit_generator for compatibility with Metal’s supported types would be greatly appreciated.
0
0
514
Nov ’24
Keras on Mac (M4) is giving inconsistent results compared to running on NVIDIA GPUs
I have seen inconsistent results for my Colab machine learning notebooks running locally on a Mac M4, compared to running the same notebook code on either T4 (in Colab) or a RTX3090 locally. To illustrate the problems I have set up a notebook that implements two simple CNN models that solves the Fashion-MNIST problem. https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing For the good model with 2M parameters I get the following results: T4 (Colab, JAX): Test accuracy: 0.925 3090 (Local PC via ssh tunnel, Jax): Test accuracy: 0.925 Mac M4 (Local, JAX): Test accuracy: 0.893 Mac M4 (Local, Tensorflow): Test accuracy: 0.893 That is, I see a significant drop in performance when I run on the Mac M4 compared to the NVIDIA machines, and it seems to be independent of backend. I however do not know how to pinpoint this to either Keras or Apple’s METAL implementation. I have reported this to Keras: https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing but as this can be (likely is?) an Apple Metal issue, I wanted to report this here as well. On the mac I am running the following Python libraries: keras 3.9.1 tensorflow 2.19.0 tensorflow-metal 1.2.0 jax 0.5.3 jax-metal 0.1.1 jaxlib 0.5.3
0
0
79
Mar ’25
ILMessageFilterExtension memory limit
I’m considering creating an ILMessageFilterExtension using a mini LLM/SLM to detect fraud and I’ve read it has strict memory limits yet I can’t find it in the documentation. What’s the set limit or any other constraints impacting the feasibility of running 100-500mb model?
0
0
46
Apr ’25
Issue with OCR on Swift iOS App: Roboflow API Bounding Boxes Missing After Response
Hi everyone, I'm working on an iOS app built in Swift using Xcode, where I'm integrating Roboflow's object detection API to extract items from grocery receipts. My goal is to identify key information (like items, total, tax, etc.) from the images of these receipts. I'm successfully sending images to the Roboflow API and receiving predictions with bounding box data, but when I attempt to extract text from the detected regions (bounding boxes), it appears that the text extraction is failing—no text is being recognized. The issue seems to be that the bounding boxes are either not properly being handled or something is going wrong in the way I process the API response. Here's a brief breakdown of what I'm doing: The image is captured, converted to base64, and sent to the Roboflow API. The API response comes back with bounding boxes for the detected elements (items, date, subtotal, etc.). The problem occurs when I try to extract the text from the image using the bounding box data—it seems like the bounding boxes are being found, but no text is returned. I suspect the issue might be happening because the app’s segue to the results view controller is triggered before the OCR extraction completes, or there might be a problem in my code handling the bounding box response. Response Data: { "inference_id": "77134cce-91b5-4600-a59b-fab74350ca06", "time": 0.09240847699993537, "image": { "width": 370, "height": 502 }, "predictions": [ { "x": 163.5, "y": 250.5, "width": 313.0, "height": 127.0, "confidence": 0.9357666373252869, "class": "Item", "class_id": 1, "detection_id": "753341d5-07b6-42a1-8926-ecbc61128243" }, { "x": 52.5, "y": 417.5, "width": 89.0, "height": 23.0, "confidence": 0.8819760680198669, "class": "Date", "class_id": 0, "detection_id": "b4681149-d538-47b1-8700-d9528bf1daa0" }, ... ] } And the log showing bounding boxes: Prediction: ["width": 313, "y": 250.5, "x": 163.5, "detection_id": 753341d5-07b6-42a1-8926-ecbc61128243, "class": Item, "height": 127, "confidence": 0.9357666373252869, "class_id": 1] No bounding box found in prediction. I've double-checked the bounding box coordinates, and everything seems fine. Does anyone have experience with using OCR alongside object detection APIs in Swift? Any help on how to ensure the bounding boxes are properly processed and used for OCR would be greatly appreciated! Also, would it help to delay the segue to the results view controller until OCR is complete? Thank you!
0
0
582
Sep ’24
Efficient Clustering of Images Using VNFeaturePrintObservation.computeDistance
Hi everyone, I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances. Current Approach Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in an O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case. Question Are there any efficient algorithms or data structures I can use to improve performance? If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
0
0
510
Feb ’25
"failed to processImage" in videoProcessor
Hello, I’m working on a program that analyzes video files frame by frame to detect human poses in each frame. However, during the process of reading observations from the stream, the analysis frequently stops with the following error: [LOG_ERROR] /Library/Caches/com.apple.xbs/Sources/MediaAnalysis/VideoProcessing/VCPHumanPoseImageRequest.mm[85]: code -18 [LOG_ERROR] /Library/Caches/com.apple.xbs/Sources/MediaAnalysis/VideoProcessing/VCPHumanPoseImageRequest.mm[178]: code -18 The error was caught and printed using a do-catch block, and here is the output: Error Domain=NSOSStatusErrorDomain Code=-18 "Error: failed to processImage" UserInfo={NSLocalizedDescription=Error: failed to processImage} While the do-catch block helps prevent the app from crashing, the frames following the error cannot be analyzed. I’m hoping to understand the cause of this error, or find a way to skip the problematic frames and continue analyzing the subsequent ones. My development environment is Xcode Version 16.0 (16A242d) and iOS 18.0. Thank you for your help. (Attaching my code below.) let videoProcessor = VideoProcessor(videoURL) let bodyPoseRequest = DetectHumanBodyPoseRequest() let asset = AVURLAsset(url: videoURL) let videoTrack = try await asset.loadTracks(withMediaType: .video).first let bodyPoseStream = try await videoProcessor.addRequest(bodyPoseRequest) videoProcessor.startAnalysis() do { for try await observations in bodyPoseStream { guard let observation = observations.first else { continue } if let timeRange = observation.timeRange { /// do something... } } } catch { print("\(error.localizedDescription)") }
0
1
381
Oct ’24
Can I Perform Hybrid Execution on Neural Engine and CPU with 16-bit Precision?
Hello, I have a question regarding hybrid execution for deep learning models on Apple's Neural Engine and CPU. I am aware that setting the precision of some layers to 32-bit allows hybrid execution across both the Neural Engine and the CPU. However, I would like to know if it is possible to achieve the same with 16-bit precision. Is there any specific configuration or workaround to enable hybrid execution in this case? Any guidance or documentation references would be greatly appreciated. Thank you!
0
0
410
Jan ’25
Custom keypoint detection model through vision api
Hi there, I have a custom keypoint detection model and want to use it via vision's CoremlRequest API. Here's some complication for input and output: For input My model expect 512x512 a image. Which would be resized and padded from a 1920x1080 frame. I use the .scaleToFit option, but can I also specify the color used for padding? For output: My model output a CoreMLFeatureValueObservation, can I have it output in a format vision recognizes? such as joints/keypoints If my model is able to output in a format vision recognizes, would it take care to restoring the coordinates back to the original frame? (undo the padding) If not, how do I restore it from .scaletofit option? Best,
0
0
746
2w
SwiftUI App Intent throws error when using requestDisambiguation with @Parameter property wrapper
I'm implementing an App Intent for my iOS app that helps users plan trip activities. It only works when run as a shortcut but not using voice through Siri. There are 2 issues: The ShortcutsTripEntity will only accept a voice input for a specific trip but not others. I'm stuck with a throwing error when trying to use requestDisambiguation() on the activity day @Parameter property. How do I rectify these issues. This is blocking me from completing a critical feature that lets users quickly plan activities through Siri and Shortcuts. Expected behavior for trip input: The intent should make Siri accept the spoken trip input from any of the options. Actual behavior for trip input: Siri only accepts the same trip when spoken but accepts any when selected by click/touch. Expected behavior for day input: Siri should accept the spoken selected option. Actual behavior for day input: Siri only accepts an input by click/touch but yet throws an error at runtime I'm happy to provide more code. But here's the relevant code: struct PlanActivityTestIntent: AppIntent { @Parameter(title: "Activity Day") var activityDay: ShortcutsItineraryDayEntity @Parameter( title: "Trip", description: "The trip to plan an activity for", default: ShortcutsTripEntity(id: UUID().uuidString, title: "Untitled trip"), requestValueDialog: "Which trip would you like to add an activity to?" ) var tripEntity: ShortcutsTripEntity @Parameter(title: "Activity Title", description: "The title of the activity", requestValueDialog: "What do you want to do or see?") var title: String @Parameter(title: "Activity Day", description: "Activity Day", default: ShortcutsItineraryDayEntity(itineraryDay: .init(itineraryId: UUID(), date: .now), timeZoneIdentifier: "UTC")) var activityDay: ShortcutsItineraryDayEntity func perform() async throws -> some ProvidesDialog { // ...other code... let tripsStore = TripsStore() // load trips and map them to entities try? await tripsStore.getTrips() let tripsAsEntities = tripsStore.trips.map { trip in let id = trip.id ?? UUID() let title = trip.title return ShortcutsTripEntity(id: id.uuidString, title: title, trip: trip) } // Ask user to select a trip. This line would doesn't accept a voice // answer. Why? let selectedTrip = try await $tripEntity.requestDisambiguation( among: tripsAsEntities, dialog: .init( full: "Which of the \(tripsAsEntities.count) trip would you like to add an activity to?", supporting: "Select a trip", systemImageName: "safari.fill" ) ) // This line throws an error let selectedDay = try await $activityDay.requestDisambiguation( among: daysAsEntities, dialog:"Which day would you like to plan an activity for?" ) } } Here are some related images that might help:
0
0
125
Jul ’25
Tapping Take Photo reloads Image Playground sheet on iOS
I integrated the image playground sheet in my app, however when I select Take Photo on the iOS version of my app it just reloads the sheet. After several attempts I get the below error message. This issue doesn’t occur on the macOS version of my app, where it first requests camera permission before allowing me to take the photo. I’m not sure if this is happening because I don’t request the camera permission anywhere in my app. My app doesn’t use the camera at all apart from the Take Photo feature which is part of the image playground sheet. Feedback ID: FB15591786
0
0
482
Oct ’24
Is there anywhere to get precompiled WhisperKit models for Swift?
If try to dynamically load WhipserKit's models, as in below, the download never occurs. No error or anything. And at the same time I can still get to the huggingface.co hosting site without any headaches, so it's not a blocking issue. let config = WhisperKitConfig( model: "openai_whisper-large-v3", modelRepo: "argmaxinc/whisperkit-coreml" ) So I have to default to the tiny model as seen below. I have tried so many ways, using ChatGPT and others, to build the models on my Mac, but too many failures, because I have never dealt with builds like that before. Are there any hosting sites that have the models (small, medium, large) already built where I can download them and just bundle them into my project? Wasted quite a large amount of time trying to get this done. import Foundation import WhisperKit @MainActor class WhisperLoader: ObservableObject { var pipe: WhisperKit? init() { Task { await self.initializeWhisper() } } private func initializeWhisper() async { do { Logging.shared.logLevel = .debug Logging.shared.loggingCallback = { message in print("[WhisperKit] \(message)") } let pipe = try await WhisperKit() // defaults to "tiny" self.pipe = pipe print("initialized. Model state: \(pipe.modelState)") guard let audioURL = Bundle.main.url(forResource: "44pf", withExtension: "wav") else { fatalError("not in bundle") } let result = try await pipe.transcribe(audioPath: audioURL.path) print("result: \(result)") } catch { print("Error: \(error)") } } }
0
0
69
Jun ’25
CoreML Model Conversion Help
I’m trying to follow Apple’s “WWDC24: Bring your machine learning and AI models to Apple Silicon” session to convert the Mistral-7B-Instruct-v0.2 model into a Core ML package, but I’ve run into a roadblock that I can’t seem to overcome. I’ve uploaded my full conversion script here for reference: https://pastebin.com/T7Zchzfc When I run the script, it progresses through tracing and MIL conversion but then fails at the backend_mlprogram stage with this error: https://pastebin.com/fUdEzzKM The core of the error is: ValueError: Op "keyCache_tmp" (op_type: identity) Input x="keyCache" expects list, tensor, or scalar but got state[tensor[1,32,8,2048,128,fp16]] I’ve registered my KV-cache buffers in a StatefulMistralWrapper subclass of nn.Module, matching the keyCache and valueCache state names in my ct.StateType definitions, but Core ML’s backend pass reports the state tensor as an invalid input. I’m using Core ML Tools 8.3.0 on Python 3.9.6, targeting iOS18, and forcing CPU conversion (MPS wasn’t available). Any pointers on how to satisfy the handle_unused_inputs pass or properly declare/cache state for GQA models in Core ML would be greatly appreciated! Thanks in advance for your help, Usman Khan
0
0
144
May ’25
Apple Intelligence & iMac M1
Trying to get on the waitlist for the above and the computer is saying: “Apple Intelligence is not available when Mac is set to English (Singapore)”. When just a few more bullet points below my Language selection shows “English (United States)”. That’s the only thing I can see, of course you guys are the experts. I would like to be part of this AI experiment/experience. Thanks for any help you can give to this 35+ year Mac user. Lee W
0
0
366
Nov ’24
Threading issues when using debugger
Hi, I am modifying the sample camera app that is here: https://developer.apple.com/tutorials/sample-apps/capturingphotos-camerapreview ... In the processPreviewImages, I am using the Vision APIs to generate a segmentation mask for a person/object, then compositing that person onto a different background (with some other filtering). The filtering and compositing is done via CoreImage. At the end, I convert the CIImage to a CGImage then to a SwiftUI Image. When I run it on my iPhone, it works fine, and has not crashed. When I run it on the iPhone with the debugger, it crashes within a few seconds with: EXC_BAD_ACCESS in libRPAC.dylib`std::__1::__hash_table<std::__1::__hash_value_type<long, qos_info_t>, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::hash, std::__1::equal_to, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::equal_to, std::__1::hash, true>, std::__1::allocator<std::__1::__hash_value_type<long, qos_info_t>>>::__emplace_unique_key_args<long, std::__1::piecewise_construct_t const&, std::__1::tuple<long const&>, std::__1::tuple<>>: It had previously been working fine with the debugger, so I'm not sure what has changed. Is there a difference in how the Vision APIs are executed if the debugger is attached vs. not?
0
0
58
Apr ’25
Permanent location for CoreML models
The Core ML developer guide recommends saving reusable compiled Core ML models to a permanent location to avoid unnecessary rebuilds when creating a Core ML model instance. However, there is no location that remains consistent across app updates, since each update changes the UUID associated with the app’s resources path /var/mobile/Containers/Data/Application/<UUID>/Library/Application Support/ As a result, Core ML rebuilds models even if they are unchanged and located in the same relative directory within the app’s file structure.
0
0
513
Dec ’24
Seeking Feedback on an Idea: Real-Time Siri Running Coach for iOS
Hello everyone, I hope you’re all doing well. I’m not a developer, but I have an idea for an iOS app that I’d love to get your thoughts on. I wanted to share it here to gather feedback from this knowledgeable community and to learn from your expertise. Idea Overview: Real-Time AI Running Coach for iOS The concept is an iOS application that provides personalized, real-time running coaching by leveraging on-device data sources and Apple’s latest technologies. The app aims to offer an adaptive and motivating running experience while ensuring user privacy through on-device processing. Key Features: • Personalized Coaching: • Utilize real-time biometric data and personal insights to deliver AI-driven coaching tailored to the user’s mental and physical state. • Analyze health metrics, activity data, mood check-ins, and more to provide context-based motivational feedback. • Privacy First: • All data processing occurs on-device using Apple’s frameworks like Core ML, ensuring no personal data leaves the device. • Adaptive Motivation: • Implement Natural Language Processing to analyze user inputs like journal entries or mood check-ins. • Generate personalized coaching cues based on historical performance and mood trends. • Performance Enhancement: • Offer dynamic adjustments to pace, route, and strategy in real time to help improve running performance. • Seamless integration with Apple Watch for real-time data collection and haptic feedback. Technologies and Frameworks Involved: • HealthKit: Access health metrics such as heart rate, distance run, VO₂ max, sleep patterns, etc. • Core ML: On-device machine learning for real-time data analysis without latency. • Natural Language Processing: Analyze personal inputs for better coaching personalization. • Core Motion & Core Location: Track motion data and location services for runs. • AVFoundation & Speech: Provide real-time voice feedback and coaching cues. • SiriKit Integration: Allow users to initiate workouts and receive updates via Siri. Target Audience: • Runners of all levels seeking personalized coaching that adapts to their mental and physical states. • Users who prioritize privacy and want AI-driven insights without their data leaving the device. • Tech-savvy fitness enthusiasts who use iOS devices and Apple wearables. Questions for the Community: 1. Feasibility: Is this idea technically achievable using current iOS frameworks and technologies? 2. Data Access: Are there limitations in accessing and processing the necessary data on-device, especially regarding privacy and permissions? 3. Potential Challenges: What hurdles might developers face in creating such an app, and how could they be addressed? 4. Advice: As someone without a technical background, what steps would you recommend I take to move this idea forward? I truly appreciate any feedback or insights you can provide. I’m excited about the potential of this idea but also aware there may be complexities I’m not considering. Thank you for taking the time to read this! Best regards, Paul
0
0
463
Oct ’24