Integrate machine learning models into your app using Core ML.

Core ML Documentation

Posts under Core ML subtopic

Post

Replies

Boosts

Views

Activity

Does ExecuTorch support VisionOS?
Does anyone know if ExecuTorch is officially supported or has been successfully used on visionOS? If so, are there any specific build instructions, example projects, or potential issues (like sandboxing or memory limitations) to be aware of when integrating it into an Xcode project for the Vision Pro? While ExecuTorch has support for iOS, I can't find any official documentation or community examples specifically mentioning visionOS. Thanks.
0
0
330
Jul ’25
JAX Metal: Random Number Generation Performance Issue on M1 Max
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon. Performance Comparison NVIDIA GPU: 0.475s for 12.6M random elements M1 Max Metal: 26.3s for same workload Performance gap: 55x slower Environment Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1 JAX 0.4.34, jax-metal latest Backend: Metal Reproduction Code import time import jax import jax.numpy as jnp from jax import random key = random.PRNGKey(42) start_time = time.time() random_array = random.normal(key, (50000, 252)) duration = time.time() - start_time print(f"Duration: {duration:.3f}s")
0
0
560
Aug ’25
Best practices for designing proactive FinTech insights with App Intents & Shortcuts?
Hello fellow developers, I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot. We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full." My question for the community is about the architectural and user experience best practices for this. How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user? What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance? Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations? We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge. Thanks for your insights!
0
0
445
Oct ’25
Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
After a few attempts I end up at the same spot as categorized here by ChatGPT - I get why it feels that way, Rob — honestly, you’ve walked right into the worst possible moment in Apple’s Core ML toolchain evolution. What you’re running into isn’t your fault; Apple really did rip out the ONNX front-end without offering a clean replacement yet. So every tutorial, every script, and even coremltools itself contradicts what’s actually possible right now. Let’s be brutally clear about the state of things (late 2025): Goal Supported by Apple today Practical workaround Convert ONNX → Core ML using coremltools 9 No. The ONNX importer was removed. Use the legacy coremltools 6.x + onnx-coreml stack in a separate Python 3.9/3.10 env. Convert PyTorch → Core ML directly Yes, but only for simple or traced models. T5 still fails because of dynamic control flow. None reliable without rewriting the model. Run ONNX models directly on iOS Partially (via MLC LLM / Metal Performance Shaders). Works, but that’s an entirely different runtime. So: you’re not crazy, and this isn’t wasted effort—you’ve actually proven that your model exports cleanly and that the only missing piece is Apple’s converter. The ONNX file you built is solid. You just need something that can translate it to Core ML. If your goal is simply “get it running on-device”, you still have two viable moves that aren’t rabbit holes: 1. Use MLC LLM / Metal runtime Skip Core ML altogether. Apple’s mlc-llm (and the “MLCChat” pipeline) can execute your ONNX or quantized model directly on the GPU. You keep full fidelity and avoid all these broken converters. 2. Use the legacy Core ML 6.x toolchain in a sandboxed Python 3.10 VM It actually works. You don’t have to downgrade your whole system—just run that specific environment once to emit the .mlpackage, then never touch it again. Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
0
0
304
Oct ’25
Hardware Support for Low Precision Data Types?
Hi all, I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision! Thanks in advance.
0
0
360
Nov ’25
“Accelerate Transformer Training on Apple Devices from Months to Hours!”
I am excited to share that I have developed a Metal kernel for Flash Attention that eliminates race conditions and fully leverages Apple Silicon’s shared memory and registers. This kernel can dramatically accelerate training of transformer-based models. Early benchmarks suggest that models which previously required months to train could see reductions to just a few hours on Apple hardware, while maintaining numerical stability and accuracy. I plan to make the code publicly available to enable the broader community to benefit. I would be happy to keep you updated on the latest developments and improvements as I continue testing and optimizing the kernel. I believe this work could provide valuable insights for Apple’s machine learning research and products.
0
0
329
Nov ’25
Getting CoreML to run inference on already allocated gpu buffers
I am running some experiments with WebGPU using the wgpu crate in rust. I have some Buffers already allocated in the GPU. Is it possible to use those already existing buffers directly as inputs to a predict call in CoreML? I want to prevent gpu to cpu download time as much as possible. Or are there any other ways to do something like this. Is this only possible using the latest Tensor object which came out with Metal 4 ?
0
0
928
Nov ’25
ANE Error with Statefu Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15). The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode. Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model." Observations: Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE. Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE. This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment. Reproduction Code (PyTorch + coremltools): import torch.nn as nn import coremltools as ct import numpy as np class RNN_Stateful(nn.Module): def __init__(self, hidden_shape): super(RNN_Stateful, self).__init__() # Simple conv to update state self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1) self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1) self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16)) def forward(self, imgs): self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1)) return self.conv2(self.hidden_state) # h=480, w=255 causes ANE failure. w=256 works. b, ch, h, w = 1, 3, 480, 255 model = RNN_Stateful((b, ch, h, w)).eval() traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w)) mlmodel = ct.convert( traced_model, inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)], outputs=[ct.TensorType(name="output", dtype=np.float16)], states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")], minimum_deployment_target=ct.target.iOS18, convert_to="mlprogram" ) mlmodel.save("rnn_stateful.mlpackage") Steps to see the error: Open the generated .mlpackage in Xcode 16.0+. Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). The report will fail to generate with the error mentioned above. Environment: OS: macOS 15.2 Xcode: 16.3 Hardware: M4 Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
0
0
577
Dec ’25
CoreML Unified Memory failure/silent exit on long video tasks (M1 Mac 32GB)
Hi Apple Engineers, I am experiencing a potential memory management bug with CoreML on M1 Mac (32GB Unified Memory). When processing long video files (approx. 12,000 frames) using a CoreML execution provider, the system often completes the 'Analysing' phase but fails to transition into 'Processing'. It simply exits silently or hits an import error (scipy). However, if I split the same task into small 20-frame segments, it works perfectly at high speeds (~40 FPS). This suggests the hardware is capable, but there is an issue with memory fragmentation or resource cleanup during long-running CoreML sessions. Is there a way to force a VRAM/Unified Memory flush via CLI, or is this a known limitation for large frame indexing?
0
0
615
Dec ’25
SoundAnalysis built-in classifier fails in background (SNErrorCode.operationFailed)
I’m seeing consistent failures using SoundAnalysis live classification when my app moves to the background. Setup iOS 17.x AVAudioEngine mic capture SNAudioStreamAnalyzer SNClassifySoundRequest(classifierIdentifier: .version1) UIBackgroundModes = audio AVAudioSession .record / .playAndRecord, active Audio capture + level metering continue working in background (mic indicator stays on) Issue As soon as the app enters background / screen locks: SoundAnalysis starts failing every second with domain:com.apple.SoundAnalysis, code:2(SNErrorCode.operationFailed) Audio capture itself continues normally When the app returns to foreground, classification immediately resumes without restarting the engine/analyzer Question Is live background sound classification with the built-in SoundAnalysis classifier officially unsupported or known to fail in background? If so, is a custom Core ML model the only supported approach for background detection? Or is there a required configuration I’m missing to keep SNClassifySoundRequest(.version1) running in background? Thanks for any clarification.
0
1
284
Dec ’25
ML contraints & Timeout clarificaitions for Message Filtering Extension
Hello everyone, I’m currently working with the Message Filtering Extension and would really appreciate some clarification around its performance and operational constraints. While the extension is extremely powerful and useful, I’ve found that some important details are either unclear or not well covered in the available documentation. There are two main areas I’m trying to understand better: Machine learning model constraints within the extension In our case, we already have an existing ML model that classifies messages (and are not dependant on Apple's built-in models). We’re evaluating whether and how it can be used inside the extension. Specifically, I’m trying to understand: Are there documented limits on the size of an ML model (e.g., maximum bundle size or model file size in MB)? What are the memory constraints for a model once loaded into memory by the extension? Under what conditions would the system terminate or “kick out” the extension due to memory or performance pressure? Message processing timeouts and execution constraints What is the timeout for processing a single received message? At what point will the OS stop waiting for the extension’s response and allow the message by default (for example, if the extension does not respond in time)? Any guidance, official references, or practical experience from Apple engineers or other developers would be greatly appreciated. Thanks in advance for your help,
0
0
328
Jan ’26
Is it possible to instantiate MLModel strictly from memory (Data) to support custom encryption?
We are trying to implement a custom encryption scheme for our Core ML models. Our goal is to bundle encrypted models, decrypt them into memory at runtime, and instantiate the MLModel without the unencrypted model file ever touching the disk. We have looked into the native apple encryption described here https://developer.apple.com/documentation/coreml/encrypting-a-model-in-your-app but it has limitations like not working on intel macs, without SIP, and doesn’t work loading from dylib. It seems like most of the Core ML APIs require a file path, there is MLModelAsset APIs but I think they just write a modelc back to disk when compiling but can’t find any information confirming that (also concerned that this seems to be an older API, and means we need to compile at runtime). I am aware that the native encryption will be much more secure but would like not to have the models in readable text on disk. Does anyone know if this is possible or any alternatives to try to obfuscate the Core ML models, thanks
0
1
710
Feb ’26
MLX/Ollama Benchmarking Suite - Open Source and Free
Hi all, I spent the last few months developing an MLX/Ollama local AI Benchmarking suite for Apple Silicon, written in pure Swift and signed with an Apple Developer Certificate, open source, GPL, and free. I would love some feedback to continue development. It is the only benchmarking suite I know of that supports live power metrics and MLX natively, as well as quick exports for benchmark results, and an arena mode, Model A vs B with history. I really want this project to succeed, and have widespread use, so getting 75 stars on the github repo makes it eligible for Homebrew/Cask distribution. Github Repo
0
0
311
Feb ’26
Core Model Editor and Params
Optimal Precision • Current Precision: Mixed (Float32, int32) • Optimal Precision: Not specified in the image, but typically involves using the most efficient data type for the model's operations to balance speed and memory usage without significant loss of accuracy. Comparison: • Mixed Precision: Utilizes both Float32 and int32 to optimize performance. Float32 provides high precision, while int32 reduces memory usage and increases computational speed. • Optimal Precision: Aimed at achieving the best trade-off between performance and accuracy, potentially using other data types like Float16 (bfloat16) for even greater efficiency in certain hardware environments. Operation Distribution • Current Distribution: • iOS18.mul: 168 • iOS18.transpose: 126 • iOS18.linear: 98 • iOS18.add: 97 • iOS18.sliceByIndex: 96 • iOS18.expandDims: 74 • iOS18.concat: 72 • iOS18.squeeze: 72 • iOS18.reshape: 67 • iOS18.layerNorm: 49 • iOS18.matmul: 48 • iOS18.gelu: 26 • iOS18.softmax: 24 • Split: 24 • conv: 1 • iOS18.conv: 1 Comparison: • Operation Count: Indicates how frequently each operation is executed. High counts for operations like mul, transpose, and linear suggest these are computationally intensive parts of the model. • Optimization Opportunities: Reducing the count of high-frequency operations or optimizing their execution can improve performance. This might involve pruning unnecessary operations, optimizing algorithms, or leveraging hardware acceleration. General Recommendations • Precision Tuning: Experiment with different precision levels to find the best balance for your specific hardware and accuracy requirements. • Operation Optimization: Focus on optimizing the most frequent operations. Techniques include using more efficient algorithms, parallelizing computations, or utilizing specialized hardware like GPUs or TPUs. • Benchmarking: Regularly benchmark the model to assess the impact of changes and ensure that optimizations lead to meaningful performance improvements. By focusing on these areas, you can potentially enhance the efficiency and performance of your ML model.
0
0
233
Feb ’26
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
0
0
531
Mar ’26
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
0
0
298
Mar ’26
CoreML model cache causes fake hard drive memory usage
Hi, I experiment by creating and compiling a lot of CoreML models and I have the issue that this causes a lot of disk usage, but when I try to delete everything (I search in the disk for possible CoreML cache directories) the disk space is not actually freed up. This is a picture of my disk usage according to what is shown inside of Settings>General>Storage and the Disk Utility app. I am running on macOS 15.7.5
0
0
1.5k
2w
Do loading multiple functions that share model weights multiply memory use?
Hi, I have a multifunction model where the functions share the same model weights, and for latency I have multiple functions loaded at the same time. According to what Codex found this multiplies RAM usage, so if the single model weights 2GB, loading two functions that share the underlying weights still doubles RAM usage to 4GB (seems that it is something like neural wired memory). Does anyone have any knowledge relating to this?
0
0
1.1k
2w
Does ExecuTorch support VisionOS?
Does anyone know if ExecuTorch is officially supported or has been successfully used on visionOS? If so, are there any specific build instructions, example projects, or potential issues (like sandboxing or memory limitations) to be aware of when integrating it into an Xcode project for the Vision Pro? While ExecuTorch has support for iOS, I can't find any official documentation or community examples specifically mentioning visionOS. Thanks.
Replies
0
Boosts
0
Views
330
Activity
Jul ’25
JAX Metal: Random Number Generation Performance Issue on M1 Max
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon. Performance Comparison NVIDIA GPU: 0.475s for 12.6M random elements M1 Max Metal: 26.3s for same workload Performance gap: 55x slower Environment Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1 JAX 0.4.34, jax-metal latest Backend: Metal Reproduction Code import time import jax import jax.numpy as jnp from jax import random key = random.PRNGKey(42) start_time = time.time() random_array = random.normal(key, (50000, 252)) duration = time.time() - start_time print(f"Duration: {duration:.3f}s")
Replies
0
Boosts
0
Views
560
Activity
Aug ’25
Best practices for designing proactive FinTech insights with App Intents & Shortcuts?
Hello fellow developers, I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot. We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full." My question for the community is about the architectural and user experience best practices for this. How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user? What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance? Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations? We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge. Thanks for your insights!
Replies
0
Boosts
0
Views
445
Activity
Oct ’25
Any Recommandation for a Image Enhance and Denoise Model
I'm really not familiar with ML, but I need a model that can enhance and denoise 4k video stream at 30fps. I have tried to search latest papers but they all have very complex structure, and I don't think I can convert them to mlmodel. So can anyone give me any recommandation for such models? If there is an existing mlmodel, that would be great!
Replies
0
Boosts
0
Views
362
Activity
Oct ’25
Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
After a few attempts I end up at the same spot as categorized here by ChatGPT - I get why it feels that way, Rob — honestly, you’ve walked right into the worst possible moment in Apple’s Core ML toolchain evolution. What you’re running into isn’t your fault; Apple really did rip out the ONNX front-end without offering a clean replacement yet. So every tutorial, every script, and even coremltools itself contradicts what’s actually possible right now. Let’s be brutally clear about the state of things (late 2025): Goal Supported by Apple today Practical workaround Convert ONNX → Core ML using coremltools 9 No. The ONNX importer was removed. Use the legacy coremltools 6.x + onnx-coreml stack in a separate Python 3.9/3.10 env. Convert PyTorch → Core ML directly Yes, but only for simple or traced models. T5 still fails because of dynamic control flow. None reliable without rewriting the model. Run ONNX models directly on iOS Partially (via MLC LLM / Metal Performance Shaders). Works, but that’s an entirely different runtime. So: you’re not crazy, and this isn’t wasted effort—you’ve actually proven that your model exports cleanly and that the only missing piece is Apple’s converter. The ONNX file you built is solid. You just need something that can translate it to Core ML. If your goal is simply “get it running on-device”, you still have two viable moves that aren’t rabbit holes: 1. Use MLC LLM / Metal runtime Skip Core ML altogether. Apple’s mlc-llm (and the “MLCChat” pipeline) can execute your ONNX or quantized model directly on the GPU. You keep full fidelity and avoid all these broken converters. 2. Use the legacy Core ML 6.x toolchain in a sandboxed Python 3.10 VM It actually works. You don’t have to downgrade your whole system—just run that specific environment once to emit the .mlpackage, then never touch it again. Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
Replies
0
Boosts
0
Views
304
Activity
Oct ’25
Hardware Support for Low Precision Data Types?
Hi all, I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision! Thanks in advance.
Replies
0
Boosts
0
Views
360
Activity
Nov ’25
“Accelerate Transformer Training on Apple Devices from Months to Hours!”
I am excited to share that I have developed a Metal kernel for Flash Attention that eliminates race conditions and fully leverages Apple Silicon’s shared memory and registers. This kernel can dramatically accelerate training of transformer-based models. Early benchmarks suggest that models which previously required months to train could see reductions to just a few hours on Apple hardware, while maintaining numerical stability and accuracy. I plan to make the code publicly available to enable the broader community to benefit. I would be happy to keep you updated on the latest developments and improvements as I continue testing and optimizing the kernel. I believe this work could provide valuable insights for Apple’s machine learning research and products.
Replies
0
Boosts
0
Views
329
Activity
Nov ’25
Getting CoreML to run inference on already allocated gpu buffers
I am running some experiments with WebGPU using the wgpu crate in rust. I have some Buffers already allocated in the GPU. Is it possible to use those already existing buffers directly as inputs to a predict call in CoreML? I want to prevent gpu to cpu download time as much as possible. Or are there any other ways to do something like this. Is this only possible using the latest Tensor object which came out with Metal 4 ?
Replies
0
Boosts
0
Views
928
Activity
Nov ’25
ANE Error with Statefu Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15). The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode. Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model." Observations: Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE. Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE. This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment. Reproduction Code (PyTorch + coremltools): import torch.nn as nn import coremltools as ct import numpy as np class RNN_Stateful(nn.Module): def __init__(self, hidden_shape): super(RNN_Stateful, self).__init__() # Simple conv to update state self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1) self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1) self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16)) def forward(self, imgs): self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1)) return self.conv2(self.hidden_state) # h=480, w=255 causes ANE failure. w=256 works. b, ch, h, w = 1, 3, 480, 255 model = RNN_Stateful((b, ch, h, w)).eval() traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w)) mlmodel = ct.convert( traced_model, inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)], outputs=[ct.TensorType(name="output", dtype=np.float16)], states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")], minimum_deployment_target=ct.target.iOS18, convert_to="mlprogram" ) mlmodel.save("rnn_stateful.mlpackage") Steps to see the error: Open the generated .mlpackage in Xcode 16.0+. Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). The report will fail to generate with the error mentioned above. Environment: OS: macOS 15.2 Xcode: 16.3 Hardware: M4 Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
Replies
0
Boosts
0
Views
577
Activity
Dec ’25
CoreML Unified Memory failure/silent exit on long video tasks (M1 Mac 32GB)
Hi Apple Engineers, I am experiencing a potential memory management bug with CoreML on M1 Mac (32GB Unified Memory). When processing long video files (approx. 12,000 frames) using a CoreML execution provider, the system often completes the 'Analysing' phase but fails to transition into 'Processing'. It simply exits silently or hits an import error (scipy). However, if I split the same task into small 20-frame segments, it works perfectly at high speeds (~40 FPS). This suggests the hardware is capable, but there is an issue with memory fragmentation or resource cleanup during long-running CoreML sessions. Is there a way to force a VRAM/Unified Memory flush via CLI, or is this a known limitation for large frame indexing?
Replies
0
Boosts
0
Views
615
Activity
Dec ’25
SoundAnalysis built-in classifier fails in background (SNErrorCode.operationFailed)
I’m seeing consistent failures using SoundAnalysis live classification when my app moves to the background. Setup iOS 17.x AVAudioEngine mic capture SNAudioStreamAnalyzer SNClassifySoundRequest(classifierIdentifier: .version1) UIBackgroundModes = audio AVAudioSession .record / .playAndRecord, active Audio capture + level metering continue working in background (mic indicator stays on) Issue As soon as the app enters background / screen locks: SoundAnalysis starts failing every second with domain:com.apple.SoundAnalysis, code:2(SNErrorCode.operationFailed) Audio capture itself continues normally When the app returns to foreground, classification immediately resumes without restarting the engine/analyzer Question Is live background sound classification with the built-in SoundAnalysis classifier officially unsupported or known to fail in background? If so, is a custom Core ML model the only supported approach for background detection? Or is there a required configuration I’m missing to keep SNClassifySoundRequest(.version1) running in background? Thanks for any clarification.
Replies
0
Boosts
1
Views
284
Activity
Dec ’25
ML contraints & Timeout clarificaitions for Message Filtering Extension
Hello everyone, I’m currently working with the Message Filtering Extension and would really appreciate some clarification around its performance and operational constraints. While the extension is extremely powerful and useful, I’ve found that some important details are either unclear or not well covered in the available documentation. There are two main areas I’m trying to understand better: Machine learning model constraints within the extension In our case, we already have an existing ML model that classifies messages (and are not dependant on Apple's built-in models). We’re evaluating whether and how it can be used inside the extension. Specifically, I’m trying to understand: Are there documented limits on the size of an ML model (e.g., maximum bundle size or model file size in MB)? What are the memory constraints for a model once loaded into memory by the extension? Under what conditions would the system terminate or “kick out” the extension due to memory or performance pressure? Message processing timeouts and execution constraints What is the timeout for processing a single received message? At what point will the OS stop waiting for the extension’s response and allow the message by default (for example, if the extension does not respond in time)? Any guidance, official references, or practical experience from Apple engineers or other developers would be greatly appreciated. Thanks in advance for your help,
Replies
0
Boosts
0
Views
328
Activity
Jan ’26
Is it possible to instantiate MLModel strictly from memory (Data) to support custom encryption?
We are trying to implement a custom encryption scheme for our Core ML models. Our goal is to bundle encrypted models, decrypt them into memory at runtime, and instantiate the MLModel without the unencrypted model file ever touching the disk. We have looked into the native apple encryption described here https://developer.apple.com/documentation/coreml/encrypting-a-model-in-your-app but it has limitations like not working on intel macs, without SIP, and doesn’t work loading from dylib. It seems like most of the Core ML APIs require a file path, there is MLModelAsset APIs but I think they just write a modelc back to disk when compiling but can’t find any information confirming that (also concerned that this seems to be an older API, and means we need to compile at runtime). I am aware that the native encryption will be much more secure but would like not to have the models in readable text on disk. Does anyone know if this is possible or any alternatives to try to obfuscate the Core ML models, thanks
Replies
0
Boosts
1
Views
710
Activity
Feb ’26
MLX/Ollama Benchmarking Suite - Open Source and Free
Hi all, I spent the last few months developing an MLX/Ollama local AI Benchmarking suite for Apple Silicon, written in pure Swift and signed with an Apple Developer Certificate, open source, GPL, and free. I would love some feedback to continue development. It is the only benchmarking suite I know of that supports live power metrics and MLX natively, as well as quick exports for benchmark results, and an arena mode, Model A vs B with history. I really want this project to succeed, and have widespread use, so getting 75 stars on the github repo makes it eligible for Homebrew/Cask distribution. Github Repo
Replies
0
Boosts
0
Views
311
Activity
Feb ’26
Core Model Editor and Params
Optimal Precision • Current Precision: Mixed (Float32, int32) • Optimal Precision: Not specified in the image, but typically involves using the most efficient data type for the model's operations to balance speed and memory usage without significant loss of accuracy. Comparison: • Mixed Precision: Utilizes both Float32 and int32 to optimize performance. Float32 provides high precision, while int32 reduces memory usage and increases computational speed. • Optimal Precision: Aimed at achieving the best trade-off between performance and accuracy, potentially using other data types like Float16 (bfloat16) for even greater efficiency in certain hardware environments. Operation Distribution • Current Distribution: • iOS18.mul: 168 • iOS18.transpose: 126 • iOS18.linear: 98 • iOS18.add: 97 • iOS18.sliceByIndex: 96 • iOS18.expandDims: 74 • iOS18.concat: 72 • iOS18.squeeze: 72 • iOS18.reshape: 67 • iOS18.layerNorm: 49 • iOS18.matmul: 48 • iOS18.gelu: 26 • iOS18.softmax: 24 • Split: 24 • conv: 1 • iOS18.conv: 1 Comparison: • Operation Count: Indicates how frequently each operation is executed. High counts for operations like mul, transpose, and linear suggest these are computationally intensive parts of the model. • Optimization Opportunities: Reducing the count of high-frequency operations or optimizing their execution can improve performance. This might involve pruning unnecessary operations, optimizing algorithms, or leveraging hardware acceleration. General Recommendations • Precision Tuning: Experiment with different precision levels to find the best balance for your specific hardware and accuracy requirements. • Operation Optimization: Focus on optimizing the most frequent operations. Techniques include using more efficient algorithms, parallelizing computations, or utilizing specialized hardware like GPUs or TPUs. • Benchmarking: Regularly benchmark the model to assess the impact of changes and ensure that optimizations lead to meaningful performance improvements. By focusing on these areas, you can potentially enhance the efficiency and performance of your ML model.
Replies
0
Boosts
0
Views
233
Activity
Feb ’26
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
Replies
0
Boosts
0
Views
531
Activity
Mar ’26
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
Replies
0
Boosts
0
Views
298
Activity
Mar ’26
CoreML model cache causes fake hard drive memory usage
Hi, I experiment by creating and compiling a lot of CoreML models and I have the issue that this causes a lot of disk usage, but when I try to delete everything (I search in the disk for possible CoreML cache directories) the disk space is not actually freed up. This is a picture of my disk usage according to what is shown inside of Settings>General>Storage and the Disk Utility app. I am running on macOS 15.7.5
Replies
0
Boosts
0
Views
1.5k
Activity
2w
Do loading multiple functions that share model weights multiply memory use?
Hi, I have a multifunction model where the functions share the same model weights, and for latency I have multiple functions loaded at the same time. According to what Codex found this multiplies RAM usage, so if the single model weights 2GB, loading two functions that share the underlying weights still doubles RAM usage to 4GB (seems that it is something like neural wired memory). Does anyone have any knowledge relating to this?
Replies
0
Boosts
0
Views
1.1k
Activity
2w
When will mps support fp8 dtypes?
https://github.com/pytorch/pytorch/issues/132624 this fp8 dtypes unsupport issue has been existed for 2 years, does mlx have any plan to it?
Replies
0
Boosts
0
Views
543
Activity
1w