Explore the power of machine learning within apps. Discuss integrating machine learning features, share best practices, and explore the possibilities for your app.

Posts under General subtopic

Post

Replies

Boosts

Views

Activity

NLTagger.requestAssets hangs indefinitely
When calling NLTagger.requestAssets with some languages, it hangs indefinitely both in the simulator and a device. This happens consistently for some languages like greek. An example call is NLTagger.requestAssets(for: .greek, tagScheme: .lemma). Other languages like french return immediately. I captured some logs from Console and found what looks like the repeated attempts to download the asset. I would expect the call to eventually terminate, either loading the asset or failing with an error.
1
0
173
May ’25
Is there an API for the 3D effect from flat photos?
Introduced in the Keynote was the 3D Lock Screen images with the kangaroo: https://9to5mac.com/wp-content/uploads/sites/6/2025/06/3d-lock-screen-2.gif I can't see any mention on if this effect is available for developers with an API to convert flat 2D photos in to the same 3D feeling image. Does anyone know if there is an API?
1
1
93
Jun ’25
Vision Framework - Testing RecognizeDocumentsRequest
How do I test the new RecognizeDocumentRequest API. Reference: https://www.youtube.com/watch?v=H-GCNsXdKzM I am running Xcode Beta, however I only have one primary device that I cannot install beta software on. Please provide a strategy for testing. Will simulator work? The new capability is critical to my application, just what I need for structuring document scans and extraction. Thank you.
1
0
217
Jun ’25
AI-Powered Feed Customization via User-Defined Algorithm
Hey guys 👋 I’ve been thinking about a feature idea for iOS that could totally change the way we interact with apps like Twitter/X. Imagine if we could define our own recommendation algorithm, and have an AI on the iPhone that replaces the suggested tweets in the feed with ones that match our personal interests — based on public tweets, and without hacking anything. Kinda like a personalized "AI skin" over the app that curates content you actually care about. Feels like this would make content way more relevant and less algorithmically manipulative. Would love to know what you all think — and if Apple could pull this off 🔥
1
0
79
Jun ’25
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
1
0
210
Jul ’25
A Summary of the WWDC25 Group Lab - Machine Learning and AI Frameworks
At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Machine Learning and AI Frameworks. What are you most excited about in the Foundation Models framework? The Foundation Models framework provides access to an on-device Large Language Model (LLM), enabling entirely on-device processing for intelligent features. This allows you to build features such as personalized search suggestions and dynamic NPC generation in games. The combination of guided generation and streaming capabilities is particularly exciting for creating delightful animations and features with reliable output. The seamless integration with SwiftUI and the new design material Liquid Glass is also a major advantage. When should I still bring my own LLM via CoreML? It's generally recommended to first explore Apple's built-in system models and APIs, including the Foundation Models framework, as they are highly optimized for Apple devices and cover a wide range of use cases. However, Core ML is still valuable if you need more control or choice over the specific model being deployed, such as customizing existing system models or augmenting prompts. Core ML provides the tools to get these models on-device, but you are responsible for model distribution and updates. Should I migrate PyTorch code to MLX? MLX is an open-source, general-purpose machine learning framework designed for Apple Silicon from the ground up. It offers a familiar API, similar to PyTorch, and supports C, C++, Python, and Swift. MLX emphasizes unified memory, a key feature of Apple Silicon hardware, which can improve performance. It's recommended to try MLX and see if its programming model and features better suit your application's needs. MLX shines when working with state-of-the-art, larger models. Can I test Foundation Models in Xcode simulator or device? Yes, you can use the Xcode simulator to test Foundation Models use cases. However, your Mac must be running macOS Tahoe. You can test on a physical iPhone running iOS 18 by connecting it to your Mac and running Playgrounds or live previews directly on the device. Which on-device models will be supported? any open source models? The Foundation Models framework currently supports Apple's first-party models only. This allows for platform-wide optimizations, improving battery life and reducing latency. While Core ML can be used to integrate open-source models, it's generally recommended to first explore the built-in system models and APIs provided by Apple, including those in the Vision, Natural Language, and Speech frameworks, as they are highly optimized for Apple devices. For frontier models, MLX can run very large models. How often will the Foundational Model be updated? How do we test for stability when the model is updated? The Foundation Model will be updated in sync with operating system updates. You can test your app against new model versions during the beta period by downloading the beta OS and running your app. It is highly recommended to create an "eval set" of golden prompts and responses to evaluate the performance of your features as the model changes or as you tweak your prompts. Report any unsatisfactory or satisfactory cases using Feedback Assistant. Which on-device model/API can I use to extract text data from images such as: nutrition labels, ingredient lists, cashier receipts, etc? Thank you. The Vision framework offers the RecognizeDocumentRequest which is specifically designed for these use cases. It not only recognizes text in images but also provides the structure of the document, such as rows in a receipt or the layout of a nutrition label. It can also identify data like phone numbers, addresses, and prices. What is the context window for the model? What are max tokens in and max tokens out? The context window for the Foundation Model is 4,096 tokens. The split between input and output tokens is flexible. For example, if you input 4,000 tokens, you'll have 96 tokens remaining for the output. The API takes in text, converting it to tokens under the hood. When estimating token count, a good rule of thumb is 3-4 characters per token for languages like English, and 1 character per token for languages like Japanese or Chinese. Handle potential errors gracefully by asking for shorter prompts or starting a new session if the token limit is exceeded. Is there a rate limit for Foundation Models API that is limited by power or temperature condition on the iPhone? Yes, there are rate limits, particularly when your app is in the background. A budget is allocated for background app usage, but exceeding it will result in rate-limiting errors. In the foreground, there is no rate limit unless the device is under heavy load (e.g., camera open, game mode). The system dynamically balances performance, battery life, and thermal conditions, which can affect the token throughput. Use appropriate quality of service settings for your tasks (e.g., background priority for background work) to help the system manage resources effectively. Do the foundation models support languages other than English? Yes, the on-device Foundation Model is multilingual and supports all languages supported by Apple Intelligence. To get the model to output in a specific language, prompt it with instructions indicating the user's preferred language using the locale API (e.g., "The user's preferred language is en-US"). Putting the instructions in English, but then putting the user prompt in the desired output language is a recommended practice. Are larger server-based models available through Foundation Models? No, the Foundation Models API currently only provides access to the on-device Large Language Model at the core of Apple Intelligence. It does not support server-side models. On-device models are preferred for privacy and for performance reasons. Is it possible to run Retrieval-Augmented Generation (RAG) using the Foundation Models framework? Yes, it is possible to run RAG on-device, but the Foundation Models framework does not include a built-in embedding model. You'll need to use a separate database to store vectors and implement nearest neighbor or cosine distance searches. The Natural Language framework offers simple word and sentence embeddings that can be used. Consider using a combination of Foundation Models and Core ML, using Core ML for your embedding model.
1
0
1.3k
Jun ’25
Real Time Text detection using iOS18 RecognizeTextRequest from video buffer returns gibberish
Hey Devs, I'm trying to create my own Real Time Text detection like this Apple project. https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images I want to use the new iOS18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project. This is my delegate code with the camera setup. I removed region of interest for debugging but I'm trying to scan English words in books. The idea is to get one word in the ROI in the future. But I can't even get proper words so testing without ROI incase my math is wrong. @Observable class CameraManager: NSObject, AVCapturePhotoCaptureDelegate ... override init() { super.init() setUpVisionRequest() } private func setUpVisionRequest() { textRequest = RecognizeTextRequest(.revision3) } ... func setup() -> Bool { captureSession.beginConfiguration() guard let captureDevice = AVCaptureDevice.default( .builtInWideAngleCamera, for: .video, position: .back) else { return false } self.captureDevice = captureDevice guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else { return false } /// Check whether the session can add input. guard captureSession.canAddInput(deviceInput) else { print("Unable to add device input to the capture session.") return false } /// Add the input and output to session captureSession.addInput(deviceInput) /// Configure the video data output videoDataOutput.setSampleBufferDelegate( self, queue: videoDataOutputQueue) if captureSession.canAddOutput(videoDataOutput) { captureSession.addOutput(videoDataOutput) videoDataOutput.connection(with: .video)? .preferredVideoStabilizationMode = .off } else { return false } // Set zoom and autofocus to help focus on very small text do { try captureDevice.lockForConfiguration() captureDevice.videoZoomFactor = 2 captureDevice.autoFocusRangeRestriction = .near captureDevice.unlockForConfiguration() } catch { print("Could not set zoom level due to error: \(error)") return false } captureSession.commitConfiguration() // potential issue with background vs dispatchqueue ?? Task(priority: .background) { captureSession.startRunning() } return true } } // Issue here ??? extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate { func captureOutput( _ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection ) { guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return } Task { textRequest.recognitionLevel = .fast textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")] do { let observations = try await textRequest.perform(on: pixelBuffer) for observation in observations { let recognizedText = observation.topCandidates(1).first print("recognized text \(recognizedText)") } } catch { print("Recognition error: \(error.localizedDescription)") } } } } The results I get look like this ( full page of English from a any book) recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5)) recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3)) recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3)) recognized text Optional(RecognizedText(string: li, confidence: 0.3)) recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3)) recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3)) recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3)) recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5)) recognized text Optional(RecognizedText(string: 1141, confidence: 0.3)) recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3)) recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3)) recognized text Optional(RecognizedText(string: ktril, confidence: 0.3)) recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3)) recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3)) recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3)) Even with ROI set to a specific rectangle Normalized to Vision, I get the same results with single characters returning gibberish. Any help would be amazing thank you. Am I using the buffer right ? Am I using the new perform(on: CVPixelBuffer) right ? Maybe I didn't set up my camera properly? I can provide code
1
0
347
Jul ’25
LLM size for fine-tuning using MLX in MacBook
Hi, recently i tried to fine-tune Gemma-2-2b mlx model on my macbook (24 GB UMA). The code started running, after few seconds i saw swap size reaching 50GB and ram around 23 GB and then it stopped. I ran the Gemma-2-2b (cuda) on colab, it ran and occupied 27 GB on A100 gpu and worked fine. Here i didn't experienced swap issue. Now my question is if my UMA was more than 27 GB, i also would not have experienced swap disk issue. Thanks.
1
0
352
Oct ’25
Downloading my fine tuned model from huggingface
I have used mlx_lm.lora to fine tune a mistral-7b-v0.3-4bit model with my data. I fused the mistral model with my adapters and upload the fused model to my directory on huggingface. I was able to use mlx_lm.generate to use the fused model in Terminal. However, I don't know how to load the model in Swift. I've used Imports import SwiftUI import MLX import MLXLMCommon import MLXLLM let modelFactory = LLMModelFactory.shared let configuration = ModelConfiguration( id: "pharmpk/pk-mistral-7b-v0.3-4bit" ) // Load the model off the main actor, then assign on the main actor let loaded = try await modelFactory.loadContainer(configuration: configuration) { progress in print("Downloading progress: \(progress.fractionCompleted * 100)%") } await MainActor.run { self.model = loaded } I'm getting an error runModel error: downloadError("A server with the specified hostname could not be found.") Any suggestions? Thanks, David PS, I can load the model from the app bundle // directory: Bundle.main.resourceURL! but it's too big to upload for Testflight
1
0
532
Oct ’25
no tensorflow-metal past tf 2.18?
Hi We're on tensorflow 2.20 that has support now for python 3.13 (finally!). tensorflow-metal is still only supporting 2.18 which is over a year old. When can we expect to see support in tensorflow-metal for tf 2.20 (or later!) ? I bought a mac thinking I would be able to get great performance from the M processors but here I am using my CPU for my ML projects. If it's taking so long to release it, why not open source it so the community can keep it more up to date? cheers Matt
1
1
344
Nov ’25
Is MCP (Model Context Protocol) supported on iOS/macOS?
Hi team, I’m exploring the Model Context Protocol (MCP), which is used to connect LLMs/AI agents to external tools in a structured way. It's becoming a common standard for automation and agent workflows. Before I go deeper, I want to confirm: Does Apple currently provide any official MCP server, API surface, or SDK on iOS/macOS? From what I see, only third-party MCP servers exist for iOS simulators/devices, and Apple’s own frameworks (Foundation Models, Apple Intelligence) don’t expose MCP endpoints. Is there any chance Apple might introduce MCP support—or publish recommended patterns for safely integrating MCP inside apps or developer tools? I would like to see if I can share my app's data to the MCP server to enable other third-party apps/services to integrate easily
1
1
431
Dec ’25
Khmer Script Misidentified as Thai in Vision Framework
It is vital for Apple to refine its OCR models to correctly distinguish between Khmer and Thai scripts. Incorrectly labeling Khmer text as Thai is more than a technical bug; it is a culturally insensitive error that impacts national identity, especially given the current geopolitical climate between Cambodia and Thailand. Implementing a more robust language-detection threshold would prevent these harmful misidentifications. There is a significant logic flaw in the VNRecognizeTextRequest language detection when processing Khmer script. When the property automaticallyDetectsLanguage is set to true, the Vision framework frequently misidentifies Khmer characters as Thai. While both scripts share historical roots, they are distinct languages with different alphabets. Currently, the model’s confidence threshold for distinguishing between these two scripts is too low, leading to incorrect OCR output in both developer-facing APIs and Apple’s native ecosystem (Preview, Live Text, and Photos). import SwiftUI import Vision class TextExtractor { func extractText(from data: Data, completion: @escaping (String) -> Void) { let request = VNRecognizeTextRequest { (request, error) in guard let observations = request.results as? [VNRecognizedTextObservation] else { completion("No text found.") return } let recognizedStrings = observations.compactMap { observation in let str = observation.topCandidates(1).first?.string return "{text: \(str!), confidence: \(observation.confidence)}" } completion(recognizedStrings.joined(separator: "\n")) } request.automaticallyDetectsLanguage = true // <-- This is the issue. request.recognitionLevel = .accurate let handler = VNImageRequestHandler(data: data, options: [:]) DispatchQueue.global(qos: .background).async { do { try handler.perform([request]) } catch { completion("Failed to perform OCR: \(error.localizedDescription)") } } } } Recognizing Khmer Confidence Score is low for Khmer text. (The output is in Thai language with low confidence score) Recognizing English Confidence Score is high expected. Recognizing Thai Confidence Score is high as expected Issues on Preview, Photos Khmer text Copied text Kouk Pring Chroum Temple [19121 รอาสายสุกตีนานยารรีสใหิสรราภูชิตีนนสุฐตีย์ [รุก เผือชิษาธอยกัตธ์ตายตราพาษชาณา ถวเชยาใบสราเบรถทีมูสินตราพาษชาณา ทีมูโษา เช็ก อาษเชิษฐอารายสุกบดตพรธุรฯ ตากร"สุก"ผาตากรธกรธุกเยากสเผาพศฐตาสาย รัอรณาษ"ตีพย" สเผาพกรกฐาภูชิสาเครๆผู:สุกรตีพาสเผาพสรอสายใผิตรรารตีพสๆ เดียอลายสุกตีน ธาราชรติ ธิพรหณาะพูชุบละเาหLunet De Lajonquiere ผารูกรสาราพารผรผาสิตภพ ตารสิทูก ธิพิ คุณที่นสายเระพบพเคเผาหนารเกะทรนภาษเราภุพเสารเราษทีเลิกสญาเราหรุฬารชสเกาก เรากุม สงสอบานตรเราะากกต่ายภากายระตารุกเตียน Recommended Solutions 1. Set a Threshold Filter out the detected result where the threshold is less than or equal to 0.5, so that it would not output low quality text which can lead to the issue. For example, let recognizedStrings = observations.compactMap { observation in if observation.confidence <= 0.5 { return nil } let str = observation.topCandidates(1).first?.string return "{text: \(str!), confidence: \(observation.confidence)}" } 2. Add Khmer Language Support This issue would never happen if the model has the capability to detect and recognize image with Khmer language. Doc2Text GitHub: https://github.com/seanghay/Doc2Text-Swift
1
0
544
2w
Translation Framework: Code 16 "Offline models not available" despite status showing .installed
Hi everyone, I'm experiencing an inconsistent behavior with the Translation framework on iOS 18. The LanguageAvailability.status() API reports language models as .installed, but translation fails with Code 16. Setup: Using translationTask modifier with TranslationSession Batch translation with explicit source/target languages Languages: Portuguese→English, German→English Issue: let status = await LanguageAvailability().status(from: sourceLang, to: targetLang) // Returns: .installed // But translation fails: let responses = try await session.translations(from: requests) // Error: TranslationErrorDomain Code=16 "Offline models not available" Logs: Language model installed: pt -> en Language model installed: de -> en Starting translation: de -> en Error Domain=TranslationErrorDomain Code=16 "Translation failed"NSLocalizedFailureReason=Offline models not available for language pair What I've tried: Re-downloading languages in Settings Using source: nil for auto-detection Fresh TranslationSession.Configuration each time Questions: Is there a way to force model re-validation/re-download programmatically? Should translationTask show download popup when Code 16 occurs? Has anyone found a reliable workaround? I've seen similar reports in threads 791357 and 777113. Any guidance appreciated! Thanks!
1
0
303
2d
Efficient Clustering of Images Using VNFeaturePrintObservation.computeDistance
Hi everyone, I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances. Current Approach Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in an O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case. Question Are there any efficient algorithms or data structures I can use to improve performance? If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
0
0
536
Feb ’25
Group AppIntents’ Searchable DynamicOptionsProvider in Sections
I’m trying to group my EntityPropertyQuery selection into sections as well as making it searchable. I know that the EntityStringQuery is used to perform the text search via entities(matching string: String). That works well enough and results in this modal: Though, when I’m using a DynamicOptionsProvider to section my EntityPropertyQuery, it doesn’t allow for searching anymore and simply opens the sectioned list in a menu like so: How can I combine both? I’ve seen it in other apps, but can’t figure out why my code doesn’t allow to section the results and make it searchable? Any ideas? My code (simplified) struct MyIntent: AppIntent { @Parameter(title: "Meter"), optionsProvider: MyOptionsProvider()) var meter: MyIntentEntity? // … struct MyOptionsProvider: DynamicOptionsProvider { func results() async throws -> ItemCollection<MyIntentEntity> { // Get All Data let allData = try IntentsDataHandler.shared.getEntities() // Create Arrays for Sections let fooEntities = allData.filter { $0.type == .foo } let barEntities = allData.filter { $0.type == .bar } return ItemCollection(sections: [ ItemSection("Foo", items: fooEntities), ItemSection("Bar", items: barEntities) ]) } } struct MeterIntentQuery: EntityStringQuery { // entities(for identifiers: [UUID]) and suggestedEntities() functions func entities(matching string: String) async throws -> [MyIntentEntity] { // Fetch All Data let allData = try IntentsDataHandler.shared.getEntities() // Filter Data by String let matchingData = allData.filter { data in return data.title.localizedCaseInsensitiveContains(string)) } return matchingData } }
0
2
599
Mar ’25
[MPSGraph runWithFeeds:targetTensors:targetOperations:] randomly crash
I'm implementing an LLM with Metal Performance Shader Graph, but encountered a very strange behavior, occasionally, the model will report an error message as this: LLVM ERROR: SmallVector unable to grow. Requested capacity (9223372036854775808) is larger than maximum value for size type (4294967295) and crash, the stack backtrace screenshot is attached. Note that 5th frame is mlir::getIntValues<long long> and 6th frame is llvm::SmallVectorBase<unsigned int>::grow_pod It looks like mlir mistakenly took a 64 bit value for a 32 bit type. Unfortunately, I could not found the source code of mlir::getIntValues, maybe it's Apple's closed source fork of llvm for MPS implementation? Anyway, any opinion or suggestion on that?
0
0
204
Mar ’25
Named Entity Recognition Model for Measurements
In an under-development MacOS & iOS app, I need to identify various measurements from OCR'ed text: length, weight, counts per inch, area, percentage. The unit type (e.g. UnitLength) needs to be identified as well as the measurement's unit (e.g. .inches) in order to convert the measurement to the app's internal standard (e.g. centimetres), the value of which is stored the relevant CoreData entity. The use of NLTagger and NLTokenizer is problematic because of the various representations of the measurements: e.g. "50g.", "50 g", "50 grams", "1 3/4 oz." Currently, I use a bespoke algorithm based on String contains and step-wise evaluation of characters, which is reasonably accurate but requires frequent updating as further representations are detected. I'm aware of the Python SpaCy model being capable of NER Measurement recognition, but am reluctant to incorporate a Python-based solution into a production app. (ref [https://developer.apple.com/forums/thread/30092]) My preference is for an open-source NER Measurement model that can be used as, or converted to, some form of a Swift compatible Machine Learning model. Does anyone know of such a model?
0
0
126
Mar ’25