Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

Is it possible to create a virtual NPU device on macOS using Hypervisor.framework + CoreML?
Is it possible to expose a custom VirtIO device to a Linux guest running inside a VM — likely using QEMU backed by Hypervisor.framework. The guest would see this device as something like /dev/npu0, and it would use a kernel driver + userspace library to submit inference requests. On the macOS host, these requests would be executed using CoreML, MPSGraph, or BNNS. The results would be passed back to the guest via IPC. Does the macOS allow this kind of "fake" NPU / GPU
1
0
348
Aug ’25
Memory Attribution for Foundation Models in iOS 26
Hi, I’m developing an app targeting iOS 26, using the new FoundationModels framework to perform on-device LLM inference. I’m currently testing memory usage. Does the memory used by FoundationModels—including model weights, KV cache, and any inference-related buffers—count toward my app’s Jetsam memory limit, or is any of it managed separately by the system? I may need to run two concurrent inferences, each with a 4096-token context window. Is this explicitly supported or allowed by FoundationModels on iOS 26? Would this significantly increase the risk of memory-based termination? Thanks in advance for any clarification. Thanks.
1
0
400
Jul ’25
Localizing prompts that has string interpolated generable objects
I'm working on localizing my prompts to support multiple languages, and in some cases my prompts has String interpolated Generable objects. for example: "Given the following workout routine: \(routine), suggest one additional exercise to complement it." In the Strings dictionary, I'm only able to select String, Int or Double parameters using %@ and %lld. Has anyone found a way to accomplish this?
1
0
340
Jul ’25
Why doesn't tensorflow-metal use AMD GPU memory?
From tensorflow-metal example: Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) I know that Apple silicon uses UMA, and that memory copies are typical of CUDA, but wouldn't the GPU memory still be faster overall? I have an iMac Pro with a Radeon Pro Vega 64 16 GB GPU and an Intel iMac with a Radeon Pro 5700 8 GB GPU. But using tensorflow-metal is still WAY faster than using the CPUs. Thanks for that. I am surprised the 5700 is twice as fast as the Vega though.
1
0
190
Apr ’25
KV-Cache MLState Not Updating During Prefill Stage in Core ML LLM Inference
Hello, I'm running a large language model (LLM) in Core ML that uses a key-value cache (KV-cache) to store past attention states. The model was converted from PyTorch using coremltools and deployed on-device with Swift. The KV-cache is exposed via MLState and is used across inference steps for efficient autoregressive generation. During the prefill stage — where a prompt of multiple tokens is passed to the model in a single batch to initialize the KV-cache — I’ve noticed that some entries in the KV-cache are not updated after the inference. Specifically: Here are a few details about the setup: The MLState returned by the model is identical to the input state (often empty or zero-initialized) for some tokens in the batch. The issue only happens during the prefill stage (i.e., first call over multiple tokens). During decoding (single-token generation), the KV-cache updates normally. The model is invoked using MLModel.prediction(from:using:options:) for each batch. I’ve confirmed: The prompt tokens are non-repetitive and not masked. The model spec has MLState inputs/outputs correctly configured for KV-cache tensors. Each token is processed in a loop with the correct positional encodings. Questions: Is there any known behavior in Core ML that could prevent MLState from updating during batched or prefill inference? Could this be caused by internal optimizations such as lazy execution, static masking, or zero-value short-circuiting? How can I confirm that each token in the batch is contributing to the KV-cache during prefill? Any insights from the Core ML or LLM deployment community would be much appreciated.
1
0
164
May ’25
Real Time Text detection using iOS18 RecognizeTextRequest from video buffer returns gibberish
Hey Devs, I'm trying to create my own Real Time Text detection like this Apple project. https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images I want to use the new iOS18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project. This is my delegate code with the camera setup. I removed region of interest for debugging but I'm trying to scan English words in books. The idea is to get one word in the ROI in the future. But I can't even get proper words so testing without ROI incase my math is wrong. @Observable class CameraManager: NSObject, AVCapturePhotoCaptureDelegate ... override init() { super.init() setUpVisionRequest() } private func setUpVisionRequest() { textRequest = RecognizeTextRequest(.revision3) } ... func setup() -> Bool { captureSession.beginConfiguration() guard let captureDevice = AVCaptureDevice.default( .builtInWideAngleCamera, for: .video, position: .back) else { return false } self.captureDevice = captureDevice guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else { return false } /// Check whether the session can add input. guard captureSession.canAddInput(deviceInput) else { print("Unable to add device input to the capture session.") return false } /// Add the input and output to session captureSession.addInput(deviceInput) /// Configure the video data output videoDataOutput.setSampleBufferDelegate( self, queue: videoDataOutputQueue) if captureSession.canAddOutput(videoDataOutput) { captureSession.addOutput(videoDataOutput) videoDataOutput.connection(with: .video)? .preferredVideoStabilizationMode = .off } else { return false } // Set zoom and autofocus to help focus on very small text do { try captureDevice.lockForConfiguration() captureDevice.videoZoomFactor = 2 captureDevice.autoFocusRangeRestriction = .near captureDevice.unlockForConfiguration() } catch { print("Could not set zoom level due to error: \(error)") return false } captureSession.commitConfiguration() // potential issue with background vs dispatchqueue ?? Task(priority: .background) { captureSession.startRunning() } return true } } // Issue here ??? extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate { func captureOutput( _ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection ) { guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return } Task { textRequest.recognitionLevel = .fast textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")] do { let observations = try await textRequest.perform(on: pixelBuffer) for observation in observations { let recognizedText = observation.topCandidates(1).first print("recognized text \(recognizedText)") } } catch { print("Recognition error: \(error.localizedDescription)") } } } } The results I get look like this ( full page of English from a any book) recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5)) recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3)) recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3)) recognized text Optional(RecognizedText(string: li, confidence: 0.3)) recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3)) recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3)) recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3)) recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5)) recognized text Optional(RecognizedText(string: 1141, confidence: 0.3)) recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3)) recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3)) recognized text Optional(RecognizedText(string: ktril, confidence: 0.3)) recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3)) recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3)) recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3)) Even with ROI set to a specific rectangle Normalized to Vision, I get the same results with single characters returning gibberish. Any help would be amazing thank you. Am I using the buffer right ? Am I using the new perform(on: CVPixelBuffer) right ? Maybe I didn't set up my camera properly? I can provide code
1
0
304
Jul ’25
How to confirm whether CreatML is training
I am currently training a Tabular Classification model in CreatML. The dataset comprises 30 features, including 1,000,000 training data points and 1,000,000 verification data points. Could you please estimate the approximate training time for an M4Max MacBook Pro? During the training process, CreatML has been displaying the “Processing” status, but there is no progress bar. I would like to ascertain whether the training is still ongoing, as I have often suspected that it has ceased.
1
0
622
Jan ’25
About VisionKit DataScannerViewController
Hi I'm having a problem with DataScannerViewController, I'm using the volume barcode scanning feature in my app, prior to that I was using an AVCaptureDevice with the UltraWideAngle set. After discovering DataScannerViewController, we planned to replace the previous obsolete code with DataScannerViewController, all together it was ok, when I want to set the ultra wide angle, I don't know how to start. I tried to get the minZoomFactor and I realized that I get 0.0 I tried to set zoomFactor to 1.0 and I found that he is not valid Note: func dataScannerDidZoom(_ dataScanner: DataScannerViewController), when I try to get the minZoomFactor, set the zoomFactor in this proxy method, I find that it is valid! What should I do next, I want to use only DataScannerViewController and implement ultra wide angle Thanks a lot.
1
0
623
Jan ’25
The asset pack with the ID “testVideoAssetPack” couldn’t be looked up: Could not connect to the server.
On macOS Tahoe26.0, iOS 26.0 (23A5287g) not emulator, Xcode 26.0 beta 3 (17A5276g) Follow this tutorial Testing your asset packs locally The start the test server command I use this command line to start the test server:xcrun ba-serve --host 192.168.0.109 test.aar The terminal showThe content displayed on the terminal is: Loading asset packs… Loading the asset pack at “test.aar”… Listening on port 63125…… Choose an identity in the panel to continue. Listening on port 63125… running the project, Xcode reports an error:Download failed: Could not connect to the server. I use iPhone safari visit this website: https://192.168.0.109:63125, on the page display "Hello, world!" There are too few error messages in both of the above questions. I have no idea what the specific reasons are.I hope someone can offer some guidance. Best Regards. { "assetPackID": "testVideoAssetPack", "downloadPolicy": { "prefetch": { "installationEventTypes": ["firstInstallation", "subsequentUpdate"] } }, "fileSelectors": [ { "file": "video/test.mp4" } ], "platforms": [ "iOS" ] } this is my Manifest.json
1
0
295
Jul ’25
Core ML Model performance far lower on iOS 17 vs iOS 16 (iOS 17 not using Neural Engine)
Hello, I posted an issue on the coremltools GitHub about my Core ML models not performing as well on iOS 17 vs iOS 16 but I'm posting it here just in case. TL;DR The same model on the same device/chip performs far slower (doesn't use the Neural Engine) on iOS 17 compared to iOS 16. Longer description The following screenshots show the performance of the same model (a PyTorch computer vision model) on an iPhone SE 3rd gen and iPhone 13 Pro (both use the A15 Bionic). iOS 16 - iPhone SE 3rd Gen (A15 Bioinc) iOS 16 uses the ANE and results in fast prediction, load and compilation times. iOS 17 - iPhone 13 Pro (A15 Bionic) iOS 17 doesn't seem to use the ANE, thus the prediction, load and compilation times are all slower. Code To Reproduce The following is my code I'm using to export my PyTorch vision model (using coremltools). I've used the same code for the past few months with sensational results on iOS 16. # Convert to Core ML using the Unified Conversion API coreml_model = ct.convert( model=traced_model, inputs=[image_input], outputs=[ct.TensorType(name="output")], classifier_config=ct.ClassifierConfig(class_names), convert_to="neuralnetwork", # compute_precision=ct.precision.FLOAT16, compute_units=ct.ComputeUnit.ALL ) System environment: Xcode version: 15.0 coremltools version: 7.0.0 OS (e.g. MacOS version or Linux type): Linux Ubuntu 20.04 (for exporting), macOS 13.6 (for testing on Xcode) Any other relevant version information (e.g. PyTorch or TensorFlow version): PyTorch 2.0 Additional context This happens across "neuralnetwork" and "mlprogram" type models, neither use the ANE on iOS 17 but both use the ANE on iOS 16 If anyone has a similar experience, I'd love to hear more. Otherwise, if I'm doing something wrong for the exporting of models for iOS 17+, please let me know. Thank you!
1
1
1.8k
Mar ’25
ActivityClassifier doesn't classify movement
I'm using a custom create ML model to classify the movement of a user's hand in a game, The classifier has 3 different spell movements, but my code constantly predicts all of them at an equal 1/3 probability regardless of movement which leads me to believe my code isn't correct (as opposed to the model) which in CreateML at least gives me a heavily weighted prediction My code is below. On adding debug prints everywhere all the data looks good to me and matches similar to my test CSV data So I'm thinking my issue must be in the setup of my model code? /// Feeds samples into the model and keeps a sliding window of the last N frames. final class WandGestureStreamer { static let shared = WandGestureStreamer() private let model: SpellActivityClassifier private var samples: [Transform] = [] private let windowSize = 100 // number of frames the model expects /// RNN hidden state passed between inferences private var stateIn: MLMultiArray /// Last transform dropped from the window for continuity private var lastDropped: Transform? private init() { let config = MLModelConfiguration() self.model = try! SpellActivityClassifier(configuration: config) // Initialize stateIn to the model’s required shape let constraint = self.model.model.modelDescription .inputDescriptionsByName["stateIn"]! .multiArrayConstraint! self.stateIn = try! MLMultiArray(shape: constraint.shape, dataType: .double) } /// Call once per frame with the latest wand position (or any feature vector). func appendSample(_ sample: Transform) { samples.append(sample) // drop oldest frame if over capacity, retaining it for delta at window start if samples.count > windowSize { lastDropped = samples.removeFirst() } } func classifyIfReady(threshold: Double = 0.6) -> (label: String, confidence: Double)? { guard samples.count == windowSize else { return nil } do { let input = try makeInput(initialState: stateIn) let output = try model.prediction(input: input) // Save state for continuity stateIn = output.stateOut let best = output.label let conf = output.labelProbability[best] ?? 0 // If you’ve recognized a gesture with high confidence: if conf > threshold { return (best, conf) } else { return nil } } catch { print("Error", error.localizedDescription, error) return nil } } /// Constructs a SpellActivityClassifierInput from recorded wand transforms. func makeInput(initialState: MLMultiArray) throws -> SpellActivityClassifierInput { let count = samples.count as NSNumber let shape = [count] let timeArr = try MLMultiArray(shape: shape, dataType: .double) let dxArr = try MLMultiArray(shape: shape, dataType: .double) let dyArr = try MLMultiArray(shape: shape, dataType: .double) let dzArr = try MLMultiArray(shape: shape, dataType: .double) let rwArr = try MLMultiArray(shape: shape, dataType: .double) let rxArr = try MLMultiArray(shape: shape, dataType: .double) let ryArr = try MLMultiArray(shape: shape, dataType: .double) let rzArr = try MLMultiArray(shape: shape, dataType: .double) for (i, sample) in samples.enumerated() { let previousSample = i > 0 ? samples[i - 1] : lastDropped let model = WandMovementRecording.DataModel(transform: sample, previous: previousSample) // print("model", model) timeArr[i] = NSNumber(value: model.timestamp) dxArr[i] = NSNumber(value: model.dx) dyArr[i] = NSNumber(value: model.dy) dzArr[i] = NSNumber(value: model.dz) let rot = model.rotation rwArr[i] = NSNumber(value: rot.w) rxArr[i] = NSNumber(value: rot.x) ryArr[i] = NSNumber(value: rot.y) rzArr[i] = NSNumber(value: rot.z) } return SpellActivityClassifierInput( dx: dxArr, dy: dyArr, dz: dzArr, rotation_w: rwArr, rotation_x: rxArr, rotation_y: ryArr, rotation_z: rzArr, timestamp: timeArr, stateIn: initialState ) } }
1
0
352
Jul ’25
Foundation Models Error: Local Sanitizer Asset
Hi, I just upgraded to macOS Tahoe Beta 2 and now I'm getting this error when I try to initialize my Foundation Models' session: Error Resource (Local Sanitizer Asset) unavailable error. import FoundationModels #Playground { let session = LanguageModelSession() do { let result = try await session.respond(to: "Tell me 3 colors") print(result.content) } catch { print("Error", error) } } I couldn't find any resource guiding me on how to solve this. Any help/workaround? Thank you!
1
4
416
Jun ’25