I was trying the latest coremltools-8.0b1 beta on macOS 15 Beta with the intent to try using the new stateful models api in CoreML.
But the conversion would always fail with the error:
/AppleInternal/Library/BuildRoots/<snip>/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:162: failed assertion `Error: the minimum deployment target for macOS is 14.0.0'
Here's a minimal repro, which works fine with both the stable version of coremltools (7.2) and the beta version (8.0b1) on macOS Sonoma 14.5, but fails with both versions of coremltools on macOS 15.0 Beta and Xcode 16.0 Beta. Which means that this most likely isn't an issue with coremltools, but with the native compilation toolchain.
from collections import OrderedDict
import coremltools as ct
import numpy as np
import torch
import torch.nn as nn
class ResidualAttentionBlock(nn.Module):
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None):
super().__init__()
self.attn = nn.MultiheadAttention(d_model, n_head)
self.ln_1 = nn.LayerNorm(d_model)
self.mlp = nn.Sequential(
OrderedDict(
[
("c_fc", nn.Linear(d_model, d_model * 4)),
("gelu", nn.GELU()),
("c_proj", nn.Linear(d_model * 4, d_model)),
]
)
)
self.ln_2 = nn.LayerNorm(d_model)
self.attn_mask = attn_mask
def attention(self, x: torch.Tensor):
self.attn_mask = (
self.attn_mask.to(dtype=x.dtype, device=x.device)
if self.attn_mask is not None
else None
)
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
def forward(self, x: torch.Tensor):
x = x + self.attention(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))
return x
class Transformer(nn.Module):
def __init__(
self, width: int, layers: int, heads: int, attn_mask: torch.Tensor = None
):
super().__init__()
self.width = width
self.layers = layers
self.resblocks = nn.Sequential(
*[ResidualAttentionBlock(width, heads, attn_mask) for _ in range(layers)]
)
def forward(self, x: torch.Tensor):
return self.resblocks(x)
transformer = Transformer(width=512, layers=12, heads=8)
emb_tokens = torch.rand((1, 512))
ct_model = ct.convert(
torch.jit.trace(transformer.eval(), emb_tokens),
convert_to="mlprogram",
minimum_deployment_target=ct.target.macOS14,
inputs=[ct.TensorType(name="embIn", shape=[1, 512])],
outputs=[ct.TensorType(name="embOutput", dtype=np.float32)],
)
Core ML
RSS for tagIntegrate machine learning models into your app using Core ML.
Post
Replies
Boosts
Views
Activity
I have several CoreML models that I've set up to run in sequence where one of the outputs from each model is passed as one of the inputs to the next.
For the most part, there is very little overhead in between each sub-model "chunk":
However a couple of the models (eg the first two above) spend a noticeable amount of time in "Prepare Neural Engine Request". From Instruments, it seems like this is spent doing some sort of model loading.
Given that I'm calling these models in sequence and in a fixed order, is there some way to reduce or amortize this cost? Thanks!
"On the latest iOS 18 beta 2, the OCR API,the Translate App and Live Text performs very poorly in recognizing Japanese."
Hi All,
I am trying to build a new iOS app by following https://developer.apple.com/videos/play/wwdc2024/10163/?time=67
When I trying to remove all legacy VN I am getting error, I would appreciate if someone can help me get up to speed with the new Vision API
The Keras Embedding layer cannot be calculated on Metal because of the missing Op:StatelessRandomGetKeyCounter, as shown in this error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Could not satisfy device specification '/job:localhost/replica:0/task:0/device:GPU:0'. enable_soft_placement=0. Supported device types [CPU]. All available devices [/job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:CPU:0]. [Op:StatelessRandomGetKeyCounter]
A workaround is to enable soft placement, but this obviously is slower:
tf.config.set_soft_device_placement(True)
Reporting it here as recommended by the TensorFlow Plugin Metal team.
Hi, the following model does not run on ANE. Inspecting with deCoreML I see the error ane: Failed to retrieved zero_point.
import numpy as np
import coremltools as ct
from coremltools.converters.mil import Builder as mb
import coremltools.converters.mil as mil
B, CIN, COUT = 512, 1024, 1024 * 4
@mb.program(
input_specs=[
mb.TensorSpec((B, CIN), mil.input_types.types.fp16),
],
opset_version=mil.builder.AvailableTarget.iOS18
)
def prog_manual_dequant(
x,
):
qw = np.random.randint(0, 2 ** 4, size=(COUT, CIN), dtype=np.int8).astype(mil.mil.types.np_uint4_dtype)
scale = np.random.randn(COUT, 1).astype(np.float16)
offset = np.random.randn(COUT, 1).astype(np.float16)
# offset = np.random.randint(0, 2 ** 4, size=(COUT, 1), dtype=np.uint8).astype(mil.mil.types.np_uint4_dtype)
dqw = mb.constexpr_blockwise_shift_scale(data=qw, scale=scale, offset=offset)
return mb.linear(x=x, weight=dqw)
cml_qmodel = ct.convert(
prog_manual_dequant,
compute_units=ct.ComputeUnit.CPU_AND_NE,
compute_precision=ct.precision.FLOAT16,
minimum_deployment_target=ct.target.iOS18,
)
Whereas if I use an offset with the same dtype as the weights (uint4 in this case), it does run on ANE
Tested on coremltools 8.0b1, on macOS 15.0 beta 2/Xcode 15 beta 2, and macOS 15.0 beta 3/Xcode 15 beta 3.
Hey all 👋🏼
We're currently working on a video processing project using the Vision framework (face, body and hand pose detection), and We've encountered a couple of errors that I need help with. We are on Xcode 16 Beta 3, testing on an iPhone 14 Pro running iOS 18 beta.
The error messages are as follows:
[LOG_ERROR] /Library/Caches/com.apple.xbs/Sources/MediaAnalysis/VideoProcessing/VCPHumanPoseImageRequest.mm[85]: code 18,446,744,073,709,551,598
encountered an unexpected condition: *** -[__NSArrayM insertObject:atIndex:]: object cannot be nil
What we've tried:
Debugging: I’ve tried stepping through the code, but the errors occur before I can gather any meaningful insights.
Searching Documentation: Looked through Apple’s developer documentation and forums but couldn’t find anything related to these specific error codes.
Nil Check: Added checks to ensure objects are not nil before inserting them into arrays, but the error persists.
Here are my questions:
Has anyone encountered similar errors with the Vision framework, specifically related to VCPHumanPoseImageRequest and NSArray operations?
Is there any known issue or bug in the version of the framework I might be using? Could it also be related to the beta?
Are there any additional debug steps or logging mechanisms I can implement to narrow down the cause?
Any suggestions on how to handle nil objects more effectively in this context?
I would greatly appreciate any insights or suggestions you might have. Thank you in advance for your assistance!
Thanks all!
I was watching wwdc2024 Deploy machine learning and AI models on-device with Core ML (https://developer.apple.com/videos/play/wwdc2024/10161/) and speaker was showing UI interface where he was ruining on device LLMs / Foundation models. I was wondering if this UI interface is open source and I can download and play around with similar app what was shown:
After I upgraded to MacOS 15 Beta 4(M1 16G), the sampling speed of apple ml-stable-diffusion was about 40% slower than MacOS 14.
And when I recompile and run with xcode 16, the following error will appear:
loc("EpicPhoto/Unet.mlmodelc/model.mil":2748:12): error: invalid axis: 4294967296, axis must be in range -|rank| <= axis < |rank|
Assertion failed: (0 && "failed to infer output types"), function _inferJITOutputTypes, file GPUBaseOps.mm, line 339.
I checked the macos 15 release notes and saw that the problem of slow running of Core ML models was fixed, but it didn't seem to be fixed.
Fixed: Inference time for large Core ML models is slower than expected on a subset of M-series SOCs (e.g. M1, M1 max) on macOS. (129682801)
In macOS 15 beta the gridsample function from PyTorch is not executing as expected on the Apple Neural Engine in MacBook Pro M2.
Please find below a Python code snippet that demonstrates the problem:
import coremltools as ct
import torch.nn as nn
import torch.nn.functional as F
class PytorchGridSample(torch.nn.Module):
def __init__(self, grids):
super(PytorchGridSample, self).__init__()
self.upsample1 = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)
self.upsample2 = nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1)
self.upsample3 = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)
self.upsample4 = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
self.upsample5 = nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1)
self.grids = grids
def forward(self, x):
x = self.upsample1(x)
x = F.grid_sample(x, self.grids[0], padding_mode='reflection', align_corners=False)
x = self.upsample2(x)
x = F.grid_sample(x, self.grids[1], padding_mode='reflection', align_corners=False)
x = self.upsample3(x)
x = F.grid_sample(x, self.grids[2], padding_mode='reflection', align_corners=False)
x = self.upsample4(x)
x = F.grid_sample(x, self.grids[3], padding_mode='reflection', align_corners=False)
x = self.upsample5(x)
x = F.grid_sample(x, self.grids[4], padding_mode='reflection', align_corners=False)
return x
def convert_to_coreml(model, input_):
traced_model = torch.jit.trace(model, example_inputs=input_, strict=False)
coreml_model = ct.converters.convert(
traced_model,
inputs=[ct.TensorType(shape=input_.shape)],
compute_precision=ct.precision.FLOAT16,
minimum_deployment_target=ct.target.macOS14,
compute_units=ct.ComputeUnit.ALL
)
return coreml_model
def main(pt_model, input_):
coreml_model = convert_to_coreml(pt_model, input_)
coreml_model.save("grid_sample.mlpackage")
if __name__ == "__main__":
input_tensor = torch.randn(1, 512, 4, 4)
grids = [torch.randn(1, 2*i, 2*i, 2) for i in [4, 8, 16, 32, 64, 128]]
pt_model = PytorchGridSample(grids)
main(pt_model, input_tensor)
I wanted to deploy some ViT models on an iPhone. I referred to https://machinelearning.apple.com/research/vision-transformers for deployment and wrote a simple demo based on the code from https://github.com/apple/ml-vision-transformers-ane. However, I found that the uncached load time on the phone is very long. According to the blog, the input is already aligned to 64 bytes, but the speed is still very slow. Is there any way to speed it up? This is my test case:
import torch
import coremltools as ct
import math
from torch import nn
class SelfAttn(torch.nn.Module):
def __init__(self, window_size, num_heads, dim, dim_out):
super().__init__()
self.window_size = window_size
self.num_heads = num_heads
self.dim = dim
self.dim_out = dim_out
self.q_proj = nn.Conv2d(
in_channels=dim,
out_channels=dim_out,
kernel_size=1,
)
self.k_proj = nn.Conv2d(
in_channels=dim,
out_channels=dim_out,
kernel_size=1,
)
self.v_proj = nn.Conv2d(
in_channels=dim,
out_channels=dim_out,
kernel_size=1,
)
def forward(self, x):
B, HW, C = x.shape
image_shape = (B, C, self.window_size, self.window_size)
x_2d = x.permute((0, 2, 1)).reshape(image_shape) # BCHW
x_flat = torch.unsqueeze(x.permute((0, 2, 1)), 2) # BC1L
q, k, v_2d = self.q_proj(x_flat), self.k_proj(x_flat), self.v_proj(x_2d)
mh_q = torch.split(q, self.dim_out // self.num_heads, dim=1) # BC1L
mh_v = torch.split(
v_2d.reshape(B, -1, x_flat.shape[2], x_flat.shape[3]), self.dim_out // self.num_heads, dim=1
)
mh_k = torch.split(
torch.permute(k, (0, 3, 2, 1)), self.dim_out // self.num_heads, dim=3
)
scale_factor = 1 / math.sqrt(mh_q[0].size(1))
attn_weights = [
torch.einsum("bchq, bkhc->bkhq", qi, ki) * scale_factor
for qi, ki in zip(mh_q, mh_k)
]
attn_weights = [
torch.softmax(aw, dim=1) for aw in attn_weights
] # softmax applied on channel "C"
mh_x = [torch.einsum("bkhq,bchk->bchq", wi, vi) for wi, vi in zip(attn_weights, mh_v)]
x = torch.cat(mh_x, dim=1)
return x
window_size = 8
path_batch = 1024
emb_dim = 96
emb_dim_out = 96
x = torch.rand(path_batch, window_size * window_size, emb_dim)
qkv_layer = SelfAttn(window_size, 1, emb_dim, emb_dim_out)
jit = torch.jit.trace(qkv_layer, (x))
mlmod_fixed_shape = ct.convert(
jit,
inputs=[
ct.TensorType("x", x.shape),
],
convert_to="mlprogram",
)
mlmodel_path = "test_ane.mlpackage"
mlmod_fixed_shape.save(mlmodel_path)
The uncached load took nearly 36 seconds, and it was just a single matrix multiplication.
Hello,
My App works well on iOS17 and previous iOS18 Beta version, while it crashes on latest iOS18 Beta5, when it calling model predictionFromFeatures.
Calling stack of crash is as:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: 'Unrecognized ANE execution priority MLANEExecutionPriority_Unspecified'
Last Exception Backtrace:
0 CoreFoundation 0x000000019bd6408c __exceptionPreprocess + 164
1 libobjc.A.dylib 0x000000019906b2e4 objc_exception_throw + 88
2 CoreFoundation 0x000000019be5f648 -[NSException initWithCoder:]
3 CoreML 0x00000001b7507340 -[MLE5ExecutionStream _setANEExecutionPriorityWithOptions:] + 248
4 CoreML 0x00000001b7508374 -[MLE5ExecutionStream _prepareForInputFeatures:options:error:] + 248
5 CoreML 0x00000001b7507ddc -[MLE5ExecutionStream executeForInputFeatures:options:error:] + 68
6 CoreML 0x00000001b74ce5c4 -[MLE5Engine _predictionFromFeatures:stream:options:error:] + 80
7 CoreML 0x00000001b74ce7fc -[MLE5Engine _predictionFromFeatures:options:error:] + 208
8 CoreML 0x00000001b74cf110 -[MLE5Engine _predictionFromFeatures:usingState:options:error:] + 400
9 CoreML 0x00000001b74cf270 -[MLE5Engine predictionFromFeatures:options:error:] + 96
10 CoreML 0x00000001b74ab264 -[MLDelegateModel _predictionFromFeatures:usingState:options:error:] + 684
11 CoreML 0x00000001b70991bc -[MLDelegateModel predictionFromFeatures:options:error:] + 124
And my model file type is ml package file. Source code is as below:
//model
MLModel *_model;
......
// model init
MLModelConfiguration* config = [[MLModelConfiguration alloc]init];
config.computeUnits = MLComputeUnitsCPUAndNeuralEngine;
_model = [MLModel modelWithContentsOfURL:compileUrl configuration:config error:&error];
.....
// model prediction
MLPredictionOptions *option = [[MLPredictionOptions alloc]init];
id<MLFeatureProvider> outFeatures = [_model predictionFromFeatures:_modelInput options:option error:&error];
Is there anything wrong? Any advice would be appreciated.
I'm trying to cast the error thrown by TranslationSession.translations(from:) as Translation.TranslationError. However, the app crashes at runtime whenever Translation.TranslationError is used in the project.
Environment:
iOS Version: 18.1 beta
Xcode Version: 16 beta
yld[14615]: Symbol not found: _$s11Translation0A5ErrorVMa
Referenced from: <3426152D-A738-30C1-8F06-47D2C6A1B75B> /private/var/containers/Bundle/Application/043A25BC-E53E-4B28-B71A-C21F77C0D76D/TranslationAPI.app/TranslationAPI.debug.dylib
Expected in: /System/Library/Frameworks/Translation.framework/Translation
All errors in TranslationError return the same error code, making it difficult to differentiate between them. How can this issue be resolved?
func testMLTensor() {
let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
for _ in 0...50 {
let t = Date()
let x = (t1 * t2)
print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
}
}
testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
func testMLTensor() {
let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
for _ in 0...50 {
let t = Date()
let x = (t1 * t2)
print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
}
}
testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
func testMLTensor() {
let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
for _ in 0...50 {
let t = Date()
let x = (t1 * t2)
print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
}
}
testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
com.apple.Vision Code=9 "Could not build inference plan - ANECF error: failed to load ANE model file:///System/Library/Frameworks/ Vision.framework/anodv4_drop6_fp16.H14G.espresso.hwx
Code rise this error:
func imageToHeadBox(image: CVPixelBuffer) async throws -> [CGRect] {
let request:DetectFaceRectanglesRequest = DetectFaceRectanglesRequest()
let faceResult:[FaceObservation] = try await request.perform(on: image)
let faceBoxs:[CGRect] = faceResult.map { face in
let faceBoundingBox:CGRect = face.boundingBox.cgRect
return faceBoundingBox
}
return faceBoxs
}
Recently, deep learning model have been getting larger, and sometimes loading models has become a bottleneck. I download the .mlpackage format CoreML from the internet and need to use compileModelAtURL to convert the .mlpackage into an .mlmodelc, then call modelWithContentsOfURL to convert the .mlmodelc into a handle. Generally, generating a handle with modelWithContentsOfURL is very slow. I noticed from WWDC 2023 that it is possible to cache the compiled results (see https://developer.apple.com/videos/play/wwdc2023/10049/?time=677, which states "This compilation includes further optimizations for the specific compute device and outputs an artifact that the compute device can run. Once complete, Core ML caches these artifacts to be used for subsequent model loads."). However, it seems that I couldn't find how to cache in the documentation.
Recently, deep learning projects have been getting larger, and sometimes loading models has become a bottleneck. I download the .mlpackage format CoreML from the internet and need to use compileModelAtURL to convert the .mlpackage into an .mlmodelc, then call modelWithContentsOfURL to convert the .mlmodelc into a handle. Generally, generating a handle with modelWithContentsOfURL is very slow. I noticed from WWDC 2023 that it is possible to cache the compiled results (see https://developer.apple.com/videos/play/wwdc2023/10049/?time=677, which states "This compilation includes further optimizations for the specific compute device and outputs an artifact that the compute device can run. Once complete, Core ML caches these artifacts to be used for subsequent model loads."). However, it seems that I couldn't find how to cache in the documentation.