Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

ANE Performance for on-device Foundation model
I'm running MacOs 26 Beta 5. I noticed that I can no longer achieve 100% usage on the ANE as I could before with Apple Foundations on-device model. Has Apple activated some kind of throttling or power limiting of the ANE? I cannot get above 3w or 40% usage now since upgrading. I'm on the high power energy mode. I there an API rate limit being applied? I kave a M4 Pro mini with 64 GB of memory.
0
0
322
Aug ’25
How to Ensure Controlled and Contextual Responses Using Foundation Models ?
Hi everyone, I’m currently exploring the use of Foundation models on Apple platforms to build a chatbot-style assistant within an app. While the integration part is straightforward using the new FoundationModel APIs, I’m trying to figure out how to control the assistant’s responses more tightly — particularly: Ensuring the assistant adheres to a specific tone, context, or domain (e.g. hospitality, healthcare, etc.) Preventing hallucinations or unrelated outputs Constraining responses based on app-specific rules, structured data, or recent interactions I’ve experimented with prompt, systemMessage, and few-shot examples to steer outputs, but even with carefully generated prompts, the model occasionally produces incorrect or out-of-scope responses. Additionally, when using multiple tools, I'm unsure how best to structure the setup so the model can select the correct pathway/tool and respond appropriately. Is there a recommended approach to guiding the model's decision-making when several tools or structured contexts are involved? Looking forward to hearing your thoughts or being pointed toward related WWDC sessions, Apple docs, or sample projects.
0
0
119
Jul ’25
“Unleashing the MacBook Air M2: 673 TFLOPS Achieved with Highly Optimized Metal Shading Language”
Using highly optimized Metal Shading Language (MSL) code, I pushed the MacBook Air M2 to its performance limits with the deformable_attention_universal kernel. The results demonstrate both the efficiency of the code and the exceptional power of Apple Silicon. The total computational workload exceeded 8.455 quadrillion FLOPs, equivalent to processing 8,455 trillion operations. On average, the code sustained a throughput of 85.37 TFLOPS, showcasing the chip’s remarkable ability to handle massive workloads. Peak instantaneous performance reached approximately 673.73 TFLOPS, reflecting near-optimal utilization of the GPU cores. Despite this intensity, the cumulative GPU runtime remained under 100 seconds, highlighting the code’s efficiency and time optimization. The fastest iteration achieved a record processing time of only 0.051 ms, demonstrating minimal bottlenecks and excellent responsiveness. Memory management was equally impressive: peak GPU memory usage never exceeded 2 MB, reflecting efficient use of the M2’s Unified Memory. This minimizes data transfer overhead and ensures smooth performance across repeated workloads. Overall, these results confirm that a well-optimized Metal implementation can unlock the full potential of Apple Silicon, delivering exceptional computational density, processing speed, and memory efficiency. The MacBook Air M2, often considered an energy-efficient consumer laptop, is capable of handling highly intensive workloads at performance levels typically expected from much larger GPUs. This test validates both the robustness of the Metal code and the extraordinary capabilities of the M2 chip for high-performance computing tasks.
0
0
366
Nov ’25
Efficient Clustering of Images Using VNFeaturePrintObservation.computeDistance
Hi everyone, I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances. Current Approach Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in an O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case. Question Are there any efficient algorithms or data structures I can use to improve performance? If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
0
0
524
Feb ’25
Named Entity Recognition Model for Measurements
In an under-development MacOS & iOS app, I need to identify various measurements from OCR'ed text: length, weight, counts per inch, area, percentage. The unit type (e.g. UnitLength) needs to be identified as well as the measurement's unit (e.g. .inches) in order to convert the measurement to the app's internal standard (e.g. centimetres), the value of which is stored the relevant CoreData entity. The use of NLTagger and NLTokenizer is problematic because of the various representations of the measurements: e.g. "50g.", "50 g", "50 grams", "1 3/4 oz." Currently, I use a bespoke algorithm based on String contains and step-wise evaluation of characters, which is reasonably accurate but requires frequent updating as further representations are detected. I'm aware of the Python SpaCy model being capable of NER Measurement recognition, but am reluctant to incorporate a Python-based solution into a production app. (ref [https://developer.apple.com/forums/thread/30092]) My preference is for an open-source NER Measurement model that can be used as, or converted to, some form of a Swift compatible Machine Learning model. Does anyone know of such a model?
0
0
115
Mar ’25
[MPSGraph runWithFeeds:targetTensors:targetOperations:] randomly crash
I'm implementing an LLM with Metal Performance Shader Graph, but encountered a very strange behavior, occasionally, the model will report an error message as this: LLVM ERROR: SmallVector unable to grow. Requested capacity (9223372036854775808) is larger than maximum value for size type (4294967295) and crash, the stack backtrace screenshot is attached. Note that 5th frame is mlir::getIntValues<long long> and 6th frame is llvm::SmallVectorBase<unsigned int>::grow_pod It looks like mlir mistakenly took a 64 bit value for a 32 bit type. Unfortunately, I could not found the source code of mlir::getIntValues, maybe it's Apple's closed source fork of llvm for MPS implementation? Anyway, any opinion or suggestion on that?
0
0
184
Mar ’25
jax-metal failing due to incompatibility with jax 0.5.1 or later.
Hello, I am interested in using jax-metal to train ML models using Apple Silicon. I understand this is experimental. After installing jax-metal according to https://developer.apple.com/metal/jax/, my python code fails with the following error JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22 -:0:0: note: in bytecode version 6 produced by: StableHLO_v1.12.1 My issue is identical to the one reported here https://github.com/jax-ml/jax/issues/26968#issuecomment-2733120325, and is fixed by pinning to jax-metal 0.1.1., jax 0.5.0 and jaxlib 0.5.0. Thank you!
0
0
428
2w
Various On-Device Frameworks API & ChatGPT
Posting a follow up question after the WWDC 2025 Machine Learning AI & Frameworks Group Lab on June 12. In regards to the on-device API of any of the AI frameworks (foundation model, vision framework, ect.), is there a response condition or path where the API outsources it's input to ChatGPT if the user has allowed this like Siri does? Ignore this if it's a no: is this handled behind the scenes or by the developer?
0
0
254
Jun ’25
is it possible to let siri monitor phone calls, and notify me when a certain trigger happens?
the specific context is that i would like to build an agent that monitors my phone call (with a customer support for example), and simiply identify whether or not im still put on hold, and notify me when im not. currently after reading the doc, i dont think its possible yet, but im so annoyed by the customer support calls that im willing to go the distance and see if theres any way.
0
0
128
Jun ’25
Core-ml-on-device-llama Converting fails
I followed below url for converting Llama-3.1-8B-Instruct model but always fails even i have 64GB of free space after downloading model from huggingface. https://machinelearning.apple.com/research/core-ml-on-device-llama Also tried with other models Llama-3.1-1B-Instruct & Llama-3.1-3B-Instruct models those are converted but while doing performance test in xcode fails for all compunits. Is there any source code to run llama models in ios app.
0
0
117
Apr ’25
tensorflow-metal error
I'm using python 3.9.6, tensorflow 2.20.0, tensorflow-metal 1.2.0, and when I try to run import tensorflow as tf It gives Traceback (most recent call last): File "/Users/haoduoyu/Code/demo.py", line 1, in <module> import tensorflow as tf File "/Users/haoduoyu/Code/test/lib/python3.9/site-packages/tensorflow/__init__.py", line 438, in <module> _ll.load_library(_plugin_dir) File "/Users/haoduoyu/Code/test/lib/python3.9/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library py_tf.TF_LoadLibrary(lib) tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/haoduoyu/Code/test/lib/python3.9/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Library not loaded: @rpath/_pywrap_tensorflow_internal.so Referenced from: <8B62586B-B082-3113-93AB-FD766A9960AE> /Users/haoduoyu/Code/test/lib/python3.9/site-packages/tensorflow-plugins/libmetal_plugin.dylib Reason: tried: '/Users/haoduoyu/Code/test/lib/python3.9/site-packages/tensorflow-plugins/../_solib_darwin_arm64/_U@local_Uconfig_Utf_S_S_C_Upywrap_Utensorflow_Uinternal___Uexternal_Slocal_Uconfig_Utf/_pywrap_tensorflow_internal.so' (no such file), '/Users/haoduoyu/Code/test/lib/python3.9/site-packages/tensorflow-plugins/../_solib_darwin_arm64/_U@local_Uconfig_Utf_S_S_C_Upywrap_Utensorflow_Uinternal___Uexternal_Slocal_Uconfig_Utf/_pywrap_tensorflow_internal.so' (no such file) As long as I uninstall tensorflow-metal, nothing goes wrong. How can I fix this problem?
0
0
52
1w
Hardware Support for Low Precision Data Types?
Hi all, I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision! Thanks in advance.
0
0
232
Nov ’25
Detection of balls about 6-10ft Away not detecting
I used Yolo5-11 and while performing great detecting balls lets say 5-10ft away in 1920 resolution and even in 640 it really is taking toll on my app performance. When I use Create ML it outputs all in 415x which is probably the reason why it does not detect objects from far. What can I do to preserve some energy ? My model is used with about 1K pictures 200 each test and validate, and from close up and far.
0
2
118
Apr ’25
Deterministic AI Safety Governor for iOS — Seeking Feedback on App Review Approach
I've built an iOS app with a novel approach to AI safety: a deterministic, pre-inference validation layer called Newton Engine. Instead of relying on the LLM to self-moderate, Newton validates every prompt BEFORE it reaches the model. It uses shape theory and semantic analysis to detect: • Corrosive frames (self-harm language patterns) • Logical contradictions (requests that undermine themselves) • Delegation attempts (asking AI to make human decisions) • Jailbreak patterns (prompt injection, role-play escapes) • Hallucination triggers (requests for fabricated citations) The system achieves a 96% adversarial catch rate across 847 test cases, with zero false positives on benign prompts. Key technical details: • Pure Swift/SwiftUI, no external dependencies • Runs entirely on-device (no server calls for validation) • Deterministic (same input always produces same output) • Auditable (full trace logging for every validation) I'm preparing to submit to the App Store and wanted to ask: Are there specific App Review guidelines I should reference for AI safety claims? Is there interest from Apple in deterministic governance layers for Apple Intelligence integration? Any recommendations for demonstrating safety compliance during review? The app is called Ada, and the engine is open source at: github.com/jaredlewiswechs/ada-newton Happy to share technical documentation or discuss the architecture with anyone interested. See: parcri.net
0
0
321
3d
Using coremltools in a CI/CD pipeline
Hi everyone 👋 I'd like to use coremltools to see how well a model performs on a remote device as part of a CI/CD pipeline. According to the Core ML Tools "Debugging and Performance Utilities" guide, remote devices must be in a "connected" state in order for coremltools to install the ModelRunner application. The devices in our system have a "paired" state, and I'm unable to set the them as "connected." The only way I know how to connect a device is to physically plug it in to a computer and open Xcode. I don't have physical access to the devices in the CI/CD system, and the host computer that interacts with them doesn't have Xcode installed. Here are some questions I've been looking into and would love some help answering: Has anyone managed to use the coremltools performance utilities in a similar system? Can you put a device in a "connected" state if you don't have physical access to the device and if you only have access to Xcode command line tools and not the Xcode app? Is it at all possible to install the coremltools ModelRunner application on a "paired" device, for example, by manually building the app and installing it with devicectl? Would other utilities, such as the MLModelBenchmarker work as expected if the app is installed this way? Thank you!
0
0
176
4d
Difference between compiling a Model using CoreML and Swift-Transformers
Hello, I was successfully able to compile TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML using Core ML, and it's working well. However, I’m now trying to compile the same model using Swift Transformers. With the limited documentation available on the swift-chat and Hugging Face repositories, I’m finding it difficult to understand the correct process for compiling a model via Swift Transformers. I attempted the following approach, but I’m fairly certain it’s not the recommended or correct method. Could someone guide me on the proper way to compile and use models like TinyLlama with Swift Transformers? Any official workflow, example, or best practice would be very helpful. Thanks in advance! This is the approach I have used: import Foundation import CoreML import Tokenizers @main struct HopeApp { static func main() async { print(" Running custom decoder loop...") do { let tokenizer = try await AutoTokenizer.from(pretrained: "PY007/TinyLlama-1.1B-Chat-v0.3") var inputIds = tokenizer("this is the test of the prompt") print("🧠 Prompt token IDs:", inputIds) let model = try float16_model(configuration: .init()) let maxTokens = 30 for _ in 0..<maxTokens { let input = try MLMultiArray(shape: [1, 128], dataType: .int32) let mask = try MLMultiArray(shape: [1, 128], dataType: .int32) for i in 0..<inputIds.count { input[i] = NSNumber(value: inputIds[i]) mask[i] = 1 } for i in inputIds.count..<128 { input[i] = 0 mask[i] = 0 } let output = try model.prediction(input_ids: input, attention_mask: mask) let logits = output.logits // shape: [1, seqLen, vocabSize] let lastIndex = inputIds.count - 1 let lastLogitsStart = lastIndex * 32003 // vocab size = 32003 var nextToken = 0 var maxLogit: Float32 = -Float.greatestFiniteMagnitude for i in 0..<32003 { let logit = logits[lastLogitsStart + i].floatValue if logit > maxLogit { maxLogit = logit nextToken = i } } inputIds.append(nextToken) if nextToken == 32002 { break } let partialText = try await tokenizer.decode(tokens:inputIds) print(partialText) } } catch { print("❌ Error: \(error)") } } }
1
0
163
Jun ’25
Is it allowed for an iOS app to download machine learning model files (e.g., .mlmodel, .onnx) from a separate cloud server?
Hello, I am developing an iOS app that uses machine learning models. To improve accuracy and user experience, I would like to download .mlmodel files (compiled and compressed as zip files) from our own server after the app is installed, and use them for inference within the app. No executable code, scripts, or dynamic libraries will be downloaded—only model data files are used. According to App Store Review Guideline 2.5.2, I understand that apps may not download or execute code which introduces or changes features or functionality. In this case, are compiled and zip-compressed .mlmodel files considered "data" rather than "code", and is it allowed to download and use them in the app? If there are any restrictions or best practices related to this, please let me know. Thank you.
1
0
357
Jul ’25