We have an application that receives a message (through MQTT) from an external system to snap a photo, runs a CoreML vision request on the image, and then sends the results back. The customer has 100s of devices and recently on a couple of those devices (13 pros), the customer encountered an issue in which the devices were not responding in time. There was no crash, just some individual inferences were slowed down. The device performs 1000s of requests per day. Upon further evaluation of the request before and after in the device logs, I noticed that Apple loads the following
default 2024-09-04 13:18:31.310401 -0400 ProcessName Processing image for reference: ***
default 2024-09-04 13:18:31.403606 -0400 ProcessName Found matching service: H1xANELoadBalancer
default 2024-09-04 13:18:31.403646 -0400 ProcessName Found matching service: H11ANEIn
default 2024-09-04 13:18:31.403661 -0400 ProcessName Found ANE device :1
default 2024-09-04 13:18:31.403681 -0400 ProcessName Total num of devices 1
default 2024-09-04 13:18:31.403681 -0400 ProcessName (Single-ANE System) Opening H11ANE device at index 0
default 2024-09-04 13:18:31.403681 -0400 ProcessName H11ANEDevice::H11ANEDeviceOpen, usage type: 1
In a good scenario (above), these actions will performed very quickly (in a split second). The app doesn't do anything until coreml inference result is returned. In the bad scenario (below), there is a delay of about 4 seconds from app passing the control to vision request and then getting the response back (leading to timeouts with the customer)
default 2024-09-04 13:19:08.777468 -0400 ProcessName Processing image for reference: ZZZ
default 2024-09-04 13:19:12.199758 -0400 ProcessName Found matching service: H1xANELoadBalancer
default 2024-09-04 13:19:12.199800 -0400 ProcessName Found matching service: H11ANEIn
default 2024-09-04 13:19:12.199812 -0400 ProcessName Found ANE device :1
default 2024-09-04 13:19:12.199832 -0400 ProcessName Total num of devices 1
default 2024-09-04 13:19:12.199834 -0400 ProcessName (Single-ANE System) Opening H11ANE device at index 0
default 2024-09-04 13:19:12.199834 -0400 ProcessName H11ANEDevice::H11ANEDeviceOpen, usage type: 1
The logs are in order, I haven't removed anything. The code is fairly simple, it's just running a vision request without doing much. Has anyone encountered this before?
Core ML
RSS for tagIntegrate machine learning models into your app using Core ML.
Post
Replies
Boosts
Views
Activity
Xcode Version: Version 15.2 (15C500b)
com.github.apple.coremltools.source: torch==1.12.1
com.github.apple.coremltools.version: 7.2
Compute: Mixed (Float16, Int32)
Storage: Float16
The input to the mlpackage is MultiArray (Float16 1 × 1 × 544 × 960)
The flexibility is: 1 × 1 × 544 × 960 | 1 × 1 × 384 × 640 | 1 × 1 × 736 × 1280 | 1 × 1 × 1088 × 1920
I tested this on iPhone XR, iPhone 11, iPhone 12, iPhone 13, and iPhone 14. On all devices except the iPhone 11, the model runs correctly on the NPU. However, on the iPhone 11, the model runs on the CPU instead.
Here is the CoreMLTools conversion code I used:
mlmodel = ct.convert(trace,
inputs=[ct.TensorType(shape=input_shape, name="input", dtype=np.float16)],
outputs=[ct.TensorType(name="output", dtype=np.float16, shape=output_shape)],
convert_to='mlprogram',
minimum_deployment_target=ct.target.iOS16
)
When I use CoreML to infer a w8a8 model on iPhone 15 (iOS 18 beta 8), the model uses CPU inference instead of ANE, which results in slower inference speed. The model I am using is from the coremltools documentation, which indicates that on iOS 17, quantized models can run on ANE properly and achieve faster speeds. How can I make the quantized model run correctly on ANE to achieve the desired inference speed?
To reproduce this issue, you can download the Weight & Activation quantized model from the following link: https://apple.github.io/coremltools/docs-guides/source/opt-quantization-perf.html.
We have a code that crashed The crash stack is as follows
Thread 26 Crashed:
0 CoreFoundation 0x0000000198b0569c CFRelease + 44
1 CoreFoundation 0x0000000198b12334 __CFBasicHashRehash + 1172
2 CoreFoundation 0x0000000198b015dc __CFBasicHashAddValue + 100
3 CoreFoundation 0x0000000198b232e4 CFDictionarySetValue + 208
4 Foundation 0x00000001979b0378 _getStringAtMarker + 464
5 Foundation 0x00000001979b016c _NSXPCSerializationStringForObject + 56
6 Foundation 0x00000001979cec4c __44-[NSXPCDecoder _decodeArrayOfObjectsForKey:]_block_invoke + 52
7 Foundation 0x00000001979ceb90 _NSXPCSerializationIterateArrayObject + 208
8 Foundation 0x00000001979cda7c -[NSXPCDecoder _decodeArrayOfObjectsForKey:] + 240
9 Foundation 0x00000001979cd1bc -[NSDictionary(NSDictionary) initWithCoder:] + 176
10 Foundation 0x00000001979ae6e8 _decodeObject + 1264
11 Foundation 0x00000001979cec4c __44-[NSXPCDecoder _decodeArrayOfObjectsForKey:]_block_invoke + 52
12 Foundation 0x00000001979ceb90 _NSXPCSerializationIterateArrayObject + 208
13 Foundation 0x00000001979cda7c -[NSXPCDecoder _decodeArrayOfObjectsForKey:] + 240
14 Foundation 0x00000001979cd1a4 -[NSDictionary(NSDictionary) initWithCoder:] + 152
15 Foundation 0x00000001979ae6e8 _decodeObject + 1264
16 Foundation 0x00000001979ad030 -[NSXPCDecoder _decodeObjectOfClasses:atObject:] + 148
17 Foundation 0x0000000197a0a7f0 _NSXPCSerializationDecodeTypedObjCValuesFromArray + 892
18 Foundation 0x0000000197a0a1f8 _NSXPCSerializationDecodeInvocationArgumentArray + 412
19 Foundation 0x0000000197a0866c -[NSXPCDecoder __decodeXPCObject:allowingSimpleMessageSend:outInvocation:outArguments:outArgumentsMaxCount:outMethodSignature:outSelector:isReply:replySelector:] + 700
20 Foundation 0x0000000197a61078 -[NSXPCDecoder _decodeReplyFromXPCObject:forSelector:] + 76
21 Foundation 0x0000000197a5f690 -[NSXPCConnection _decodeAndInvokeReplyBlockWithEvent:sequence:replyInfo:] + 252
22 Foundation 0x0000000197a63664 __88-[NSXPCConnection _sendInvocation:orArguments:count:methodSignature:selector:withProxy:]_block_invoke_5 + 188
23 Foundation 0x0000000197a08058 -[NSXPCConnection _sendInvocation:orArguments:count:methodSignature:selector:withProxy:] + 2244
24 CoreFoundation 0x0000000198b19d88 ___forwarding___ + 1016
25 CoreFoundation 0x0000000198b198d0 _CF_forwarding_prep_0 + 96
26 AppleNeuralEngine 0x00000001e912ab1c -[_ANEDaemonConnection loadModel:sandboxExtension:options:qos:withReply:] + 332
27 AppleNeuralEngine 0x00000001e912a674 __44-[_ANEClient doLoadModel:options:qos:error:]_block_invoke + 360
28 libdispatch.dylib 0x00000001a0a21dd4 _dispatch_client_callout + 20
29 libdispatch.dylib 0x00000001a0a312c4 _dispatch_lane_barrier_sync_invoke_and_complete + 56
30 AppleNeuralEngine 0x00000001e9129ef0 -[_ANEClient doLoadModel:options:qos:error:] + 500
31 Espresso 0x00000001a7e02034 Espresso::ANERuntimeEngine::compiler::build_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 3736
32 Espresso 0x00000001a7e010cc Espresso::net_compiler_segment_based::build(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, int) + 384
33 Espresso 0x00000001a7df02a4 Espresso::ANERuntimeEngine::compiler::build(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, int) + 120
34 Espresso 0x00000001a7e1b3a4 Espresso::net::__build(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, int) + 360
35 Espresso 0x00000001a7e178e0 Espresso::abstract_context::compute_batch_sync(void (std::__1::shared_ptr<Espresso::abstract_batch> const&) block_pointer) + 112
36 Espresso 0x00000001a7e198b8 EspressoLight::espresso_plan::prepare_compiler_if_needed() + 3208
37 Espresso 0x00000001a7e183f4 EspressoLight::espresso_plan::prepare() + 1712
38 Espresso 0x00000001a7da8e78 espresso_plan_build_with_options + 300
39 Espresso 0x00000001a7da8d30 espresso_plan_build + 44
40 CoreML 0x00000001b346645c -[MLNeuralNetworkEngine rebuildPlan:error:] + 536
41 CoreML 0x00000001b3464294 -[MLNeuralNetworkEngine _setupContextAndPlanWithConfiguration:usingCPU:reshapeWithContainer:error:] + 3132
42 CoreML 0x00000001b34797a0 -[MLNeuralNetworkEngine initWithContainer:configuration:error:] + 196
43 CoreML 0x00000001b347962c +[MLNeuralNetworkEngine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 164
44 CoreML 0x00000001b34792a0 +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] + 144
45 CoreML 0x00000001b3478c64 +[MLLoader _loadModelFromArchive:configuration:modelVersion:compilerVersion:loaderEvent:useUpdatableModelLoaders:loadingClasses:error:] + 532
46 CoreML 0x00000001b34650c8 +[MLLoader _loadWithModelLoaderFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 424
47 CoreML 0x00000001b3474bc8 +[MLLoader _loadModelFromArchive:configuration:loaderEvent:useUpdatableModelLoaders:error:] + 460
48 CoreML 0x00000001b347a024 +[MLLoader _loadModelFromAssetAtURL:configuration:loaderEvent:error:] + 244
49 CoreML 0x00000001b3479cbc +[MLLoader loadModelFromAssetAtURL:configuration:error:] + 104
50 CoreML 0x00000001b347ac2c -[MLModelAsset load:] + 564
51 CoreML 0x00000001b347a9c4 -[MLModelAsset modelWithError:] + 24
52 CoreML 0x00000001b347a7b4 +[MLModel modelWithContentsOfURL:configuration:error:] + 172
53 CoreML 0x00000001b37afbc4 +[MLModel modelWithContentsOfURL:error:] + 76
Core code
MLModel* model = nil;
NSError *error = nil;
@try
{
model = [MLModel modelWithContentsOfURL:modelURL error:&error];
}
@catch (NSException *exception)
{
model = nil;
return Ret_OperationErr_InvalidInit;
}
Two question:
What does this stack mean?
I added @ try @ catch, why is it still crashing?
Recently, deep learning projects have been getting larger, and sometimes loading models has become a bottleneck. I download the .mlpackage format CoreML from the internet and need to use compileModelAtURL to convert the .mlpackage into an .mlmodelc, then call modelWithContentsOfURL to convert the .mlmodelc into a handle. Generally, generating a handle with modelWithContentsOfURL is very slow. I noticed from WWDC 2023 that it is possible to cache the compiled results (see https://developer.apple.com/videos/play/wwdc2023/10049/?time=677, which states "This compilation includes further optimizations for the specific compute device and outputs an artifact that the compute device can run. Once complete, Core ML caches these artifacts to be used for subsequent model loads."). However, it seems that I couldn't find how to cache in the documentation.
Recently, deep learning model have been getting larger, and sometimes loading models has become a bottleneck. I download the .mlpackage format CoreML from the internet and need to use compileModelAtURL to convert the .mlpackage into an .mlmodelc, then call modelWithContentsOfURL to convert the .mlmodelc into a handle. Generally, generating a handle with modelWithContentsOfURL is very slow. I noticed from WWDC 2023 that it is possible to cache the compiled results (see https://developer.apple.com/videos/play/wwdc2023/10049/?time=677, which states "This compilation includes further optimizations for the specific compute device and outputs an artifact that the compute device can run. Once complete, Core ML caches these artifacts to be used for subsequent model loads."). However, it seems that I couldn't find how to cache in the documentation.
com.apple.Vision Code=9 "Could not build inference plan - ANECF error: failed to load ANE model file:///System/Library/Frameworks/ Vision.framework/anodv4_drop6_fp16.H14G.espresso.hwx
Code rise this error:
func imageToHeadBox(image: CVPixelBuffer) async throws -> [CGRect] {
let request:DetectFaceRectanglesRequest = DetectFaceRectanglesRequest()
let faceResult:[FaceObservation] = try await request.perform(on: image)
let faceBoxs:[CGRect] = faceResult.map { face in
let faceBoundingBox:CGRect = face.boundingBox.cgRect
return faceBoundingBox
}
return faceBoxs
}
func testMLTensor() {
let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
for _ in 0...50 {
let t = Date()
let x = (t1 * t2)
print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
}
}
testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
func testMLTensor() {
let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
for _ in 0...50 {
let t = Date()
let x = (t1 * t2)
print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
}
}
testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
func testMLTensor() {
let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
for _ in 0...50 {
let t = Date()
let x = (t1 * t2)
print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
}
}
testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
All errors in TranslationError return the same error code, making it difficult to differentiate between them. How can this issue be resolved?
I'm trying to cast the error thrown by TranslationSession.translations(from:) as Translation.TranslationError. However, the app crashes at runtime whenever Translation.TranslationError is used in the project.
Environment:
iOS Version: 18.1 beta
Xcode Version: 16 beta
yld[14615]: Symbol not found: _$s11Translation0A5ErrorVMa
Referenced from: <3426152D-A738-30C1-8F06-47D2C6A1B75B> /private/var/containers/Bundle/Application/043A25BC-E53E-4B28-B71A-C21F77C0D76D/TranslationAPI.app/TranslationAPI.debug.dylib
Expected in: /System/Library/Frameworks/Translation.framework/Translation
Hello,
My App works well on iOS17 and previous iOS18 Beta version, while it crashes on latest iOS18 Beta5, when it calling model predictionFromFeatures.
Calling stack of crash is as:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: 'Unrecognized ANE execution priority MLANEExecutionPriority_Unspecified'
Last Exception Backtrace:
0 CoreFoundation 0x000000019bd6408c __exceptionPreprocess + 164
1 libobjc.A.dylib 0x000000019906b2e4 objc_exception_throw + 88
2 CoreFoundation 0x000000019be5f648 -[NSException initWithCoder:]
3 CoreML 0x00000001b7507340 -[MLE5ExecutionStream _setANEExecutionPriorityWithOptions:] + 248
4 CoreML 0x00000001b7508374 -[MLE5ExecutionStream _prepareForInputFeatures:options:error:] + 248
5 CoreML 0x00000001b7507ddc -[MLE5ExecutionStream executeForInputFeatures:options:error:] + 68
6 CoreML 0x00000001b74ce5c4 -[MLE5Engine _predictionFromFeatures:stream:options:error:] + 80
7 CoreML 0x00000001b74ce7fc -[MLE5Engine _predictionFromFeatures:options:error:] + 208
8 CoreML 0x00000001b74cf110 -[MLE5Engine _predictionFromFeatures:usingState:options:error:] + 400
9 CoreML 0x00000001b74cf270 -[MLE5Engine predictionFromFeatures:options:error:] + 96
10 CoreML 0x00000001b74ab264 -[MLDelegateModel _predictionFromFeatures:usingState:options:error:] + 684
11 CoreML 0x00000001b70991bc -[MLDelegateModel predictionFromFeatures:options:error:] + 124
And my model file type is ml package file. Source code is as below:
//model
MLModel *_model;
......
// model init
MLModelConfiguration* config = [[MLModelConfiguration alloc]init];
config.computeUnits = MLComputeUnitsCPUAndNeuralEngine;
_model = [MLModel modelWithContentsOfURL:compileUrl configuration:config error:&error];
.....
// model prediction
MLPredictionOptions *option = [[MLPredictionOptions alloc]init];
id<MLFeatureProvider> outFeatures = [_model predictionFromFeatures:_modelInput options:option error:&error];
Is there anything wrong? Any advice would be appreciated.
I wanted to deploy some ViT models on an iPhone. I referred to https://machinelearning.apple.com/research/vision-transformers for deployment and wrote a simple demo based on the code from https://github.com/apple/ml-vision-transformers-ane. However, I found that the uncached load time on the phone is very long. According to the blog, the input is already aligned to 64 bytes, but the speed is still very slow. Is there any way to speed it up? This is my test case:
import torch
import coremltools as ct
import math
from torch import nn
class SelfAttn(torch.nn.Module):
def __init__(self, window_size, num_heads, dim, dim_out):
super().__init__()
self.window_size = window_size
self.num_heads = num_heads
self.dim = dim
self.dim_out = dim_out
self.q_proj = nn.Conv2d(
in_channels=dim,
out_channels=dim_out,
kernel_size=1,
)
self.k_proj = nn.Conv2d(
in_channels=dim,
out_channels=dim_out,
kernel_size=1,
)
self.v_proj = nn.Conv2d(
in_channels=dim,
out_channels=dim_out,
kernel_size=1,
)
def forward(self, x):
B, HW, C = x.shape
image_shape = (B, C, self.window_size, self.window_size)
x_2d = x.permute((0, 2, 1)).reshape(image_shape) # BCHW
x_flat = torch.unsqueeze(x.permute((0, 2, 1)), 2) # BC1L
q, k, v_2d = self.q_proj(x_flat), self.k_proj(x_flat), self.v_proj(x_2d)
mh_q = torch.split(q, self.dim_out // self.num_heads, dim=1) # BC1L
mh_v = torch.split(
v_2d.reshape(B, -1, x_flat.shape[2], x_flat.shape[3]), self.dim_out // self.num_heads, dim=1
)
mh_k = torch.split(
torch.permute(k, (0, 3, 2, 1)), self.dim_out // self.num_heads, dim=3
)
scale_factor = 1 / math.sqrt(mh_q[0].size(1))
attn_weights = [
torch.einsum("bchq, bkhc->bkhq", qi, ki) * scale_factor
for qi, ki in zip(mh_q, mh_k)
]
attn_weights = [
torch.softmax(aw, dim=1) for aw in attn_weights
] # softmax applied on channel "C"
mh_x = [torch.einsum("bkhq,bchk->bchq", wi, vi) for wi, vi in zip(attn_weights, mh_v)]
x = torch.cat(mh_x, dim=1)
return x
window_size = 8
path_batch = 1024
emb_dim = 96
emb_dim_out = 96
x = torch.rand(path_batch, window_size * window_size, emb_dim)
qkv_layer = SelfAttn(window_size, 1, emb_dim, emb_dim_out)
jit = torch.jit.trace(qkv_layer, (x))
mlmod_fixed_shape = ct.convert(
jit,
inputs=[
ct.TensorType("x", x.shape),
],
convert_to="mlprogram",
)
mlmodel_path = "test_ane.mlpackage"
mlmod_fixed_shape.save(mlmodel_path)
The uncached load took nearly 36 seconds, and it was just a single matrix multiplication.
In macOS 15 beta the gridsample function from PyTorch is not executing as expected on the Apple Neural Engine in MacBook Pro M2.
Please find below a Python code snippet that demonstrates the problem:
import coremltools as ct
import torch.nn as nn
import torch.nn.functional as F
class PytorchGridSample(torch.nn.Module):
def __init__(self, grids):
super(PytorchGridSample, self).__init__()
self.upsample1 = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)
self.upsample2 = nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1)
self.upsample3 = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)
self.upsample4 = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
self.upsample5 = nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1)
self.grids = grids
def forward(self, x):
x = self.upsample1(x)
x = F.grid_sample(x, self.grids[0], padding_mode='reflection', align_corners=False)
x = self.upsample2(x)
x = F.grid_sample(x, self.grids[1], padding_mode='reflection', align_corners=False)
x = self.upsample3(x)
x = F.grid_sample(x, self.grids[2], padding_mode='reflection', align_corners=False)
x = self.upsample4(x)
x = F.grid_sample(x, self.grids[3], padding_mode='reflection', align_corners=False)
x = self.upsample5(x)
x = F.grid_sample(x, self.grids[4], padding_mode='reflection', align_corners=False)
return x
def convert_to_coreml(model, input_):
traced_model = torch.jit.trace(model, example_inputs=input_, strict=False)
coreml_model = ct.converters.convert(
traced_model,
inputs=[ct.TensorType(shape=input_.shape)],
compute_precision=ct.precision.FLOAT16,
minimum_deployment_target=ct.target.macOS14,
compute_units=ct.ComputeUnit.ALL
)
return coreml_model
def main(pt_model, input_):
coreml_model = convert_to_coreml(pt_model, input_)
coreml_model.save("grid_sample.mlpackage")
if __name__ == "__main__":
input_tensor = torch.randn(1, 512, 4, 4)
grids = [torch.randn(1, 2*i, 2*i, 2) for i in [4, 8, 16, 32, 64, 128]]
pt_model = PytorchGridSample(grids)
main(pt_model, input_tensor)
After I upgraded to MacOS 15 Beta 4(M1 16G), the sampling speed of apple ml-stable-diffusion was about 40% slower than MacOS 14.
And when I recompile and run with xcode 16, the following error will appear:
loc("EpicPhoto/Unet.mlmodelc/model.mil":2748:12): error: invalid axis: 4294967296, axis must be in range -|rank| <= axis < |rank|
Assertion failed: (0 && "failed to infer output types"), function _inferJITOutputTypes, file GPUBaseOps.mm, line 339.
I checked the macos 15 release notes and saw that the problem of slow running of Core ML models was fixed, but it didn't seem to be fixed.
Fixed: Inference time for large Core ML models is slower than expected on a subset of M-series SOCs (e.g. M1, M1 max) on macOS. (129682801)
I was watching wwdc2024 Deploy machine learning and AI models on-device with Core ML (https://developer.apple.com/videos/play/wwdc2024/10161/) and speaker was showing UI interface where he was ruining on device LLMs / Foundation models. I was wondering if this UI interface is open source and I can download and play around with similar app what was shown:
Hey all 👋🏼
We're currently working on a video processing project using the Vision framework (face, body and hand pose detection), and We've encountered a couple of errors that I need help with. We are on Xcode 16 Beta 3, testing on an iPhone 14 Pro running iOS 18 beta.
The error messages are as follows:
[LOG_ERROR] /Library/Caches/com.apple.xbs/Sources/MediaAnalysis/VideoProcessing/VCPHumanPoseImageRequest.mm[85]: code 18,446,744,073,709,551,598
encountered an unexpected condition: *** -[__NSArrayM insertObject:atIndex:]: object cannot be nil
What we've tried:
Debugging: I’ve tried stepping through the code, but the errors occur before I can gather any meaningful insights.
Searching Documentation: Looked through Apple’s developer documentation and forums but couldn’t find anything related to these specific error codes.
Nil Check: Added checks to ensure objects are not nil before inserting them into arrays, but the error persists.
Here are my questions:
Has anyone encountered similar errors with the Vision framework, specifically related to VCPHumanPoseImageRequest and NSArray operations?
Is there any known issue or bug in the version of the framework I might be using? Could it also be related to the beta?
Are there any additional debug steps or logging mechanisms I can implement to narrow down the cause?
Any suggestions on how to handle nil objects more effectively in this context?
I would greatly appreciate any insights or suggestions you might have. Thank you in advance for your assistance!
Thanks all!
Hi, the following model does not run on ANE. Inspecting with deCoreML I see the error ane: Failed to retrieved zero_point.
import numpy as np
import coremltools as ct
from coremltools.converters.mil import Builder as mb
import coremltools.converters.mil as mil
B, CIN, COUT = 512, 1024, 1024 * 4
@mb.program(
input_specs=[
mb.TensorSpec((B, CIN), mil.input_types.types.fp16),
],
opset_version=mil.builder.AvailableTarget.iOS18
)
def prog_manual_dequant(
x,
):
qw = np.random.randint(0, 2 ** 4, size=(COUT, CIN), dtype=np.int8).astype(mil.mil.types.np_uint4_dtype)
scale = np.random.randn(COUT, 1).astype(np.float16)
offset = np.random.randn(COUT, 1).astype(np.float16)
# offset = np.random.randint(0, 2 ** 4, size=(COUT, 1), dtype=np.uint8).astype(mil.mil.types.np_uint4_dtype)
dqw = mb.constexpr_blockwise_shift_scale(data=qw, scale=scale, offset=offset)
return mb.linear(x=x, weight=dqw)
cml_qmodel = ct.convert(
prog_manual_dequant,
compute_units=ct.ComputeUnit.CPU_AND_NE,
compute_precision=ct.precision.FLOAT16,
minimum_deployment_target=ct.target.iOS18,
)
Whereas if I use an offset with the same dtype as the weights (uint4 in this case), it does run on ANE
Tested on coremltools 8.0b1, on macOS 15.0 beta 2/Xcode 15 beta 2, and macOS 15.0 beta 3/Xcode 15 beta 3.
The Keras Embedding layer cannot be calculated on Metal because of the missing Op:StatelessRandomGetKeyCounter, as shown in this error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Could not satisfy device specification '/job:localhost/replica:0/task:0/device:GPU:0'. enable_soft_placement=0. Supported device types [CPU]. All available devices [/job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:CPU:0]. [Op:StatelessRandomGetKeyCounter]
A workaround is to enable soft placement, but this obviously is slower:
tf.config.set_soft_device_placement(True)
Reporting it here as recommended by the TensorFlow Plugin Metal team.