ML Compute

linear_quantize_activations taking 90 minutes + on MacBook Air M1 2020

In my quantization code, the line: compressed_model_a8 = cto.coreml.experimental.linear_quantize_activations( model, activation_config, [{'img':np.random.randn(1,13,1024,1024)}] ) has taken 90 minutes to run so far and is still not completed. From debugging, I can see that the line it's stuck on is line 261 in _model_debugger.py: model = ct.models.MLModel( cloned_spec, weights_dir=self.weights_dir, compute_units=compute_units, skip_model_load=False, # Don't skip model load as we need model prediction to get activations range. ) Is this expected behaviour? Would it be quicker to run on another computer with more RAM?

Machine Learning & AI Core ML Developer Tools ML Compute

1

0

12

1d

Expected a 'mps:0' generator device but found 'cpu'

使用MPS来加速机器学习功能，有时是否与torch会有适配性问题？

Machine Learning & AI General ML Compute

1

0

597

3w

Network problems

Hey everyone, i just updated my IPhone to ios 18.4 a just after i updated my mobile internet stopped working. With no internet i cant do anything, and i dont know how to fix it, i hope someone can help me

Community Apple Developers Community Management ML Compute iOS

1

0

377

3w

Using the Apple Neural Engine for MLTensor operations

Based on the documentation, it appears that MLTensor can be used to perform tensor operations using the ANE (Apple Neural Engine) by wrapping the tensor operations with withMLTensorComputePolicy with a MLComputePolicy initialized with MLComputeUnits.cpuAndNeuralEngine (it can also be initialized with MLComputeUnits.all to let the OS spread the load between the Neural Engine, GPU and CPU). However, when using the Instruments app, it appears that the tensor operations never get executed on the Neural Engine. It would be helpful if someone can guide me on the correct way to ensure that the Nerual Engine is used to perform the tensor operations (not as part of a CoreML model file). based on this example, I've created a simple code to try it: import Foundation import CoreML print("Starting...") let semaphore = DispatchSemaphore(value: 0) Task { await withMLTensorComputePolicy(.init(MLComputeUnits.cpuAndNeuralEngine)) { let v1 = MLTensor([1.0, 2.0, 3.0, 4.0]) let v2 = MLTensor([5.0, 6.0, 7.0, 8.0]) let v3 = v1.matmul(v2) await v3.shapedArray(of: Float.self) // is 70.0 let m1 = MLTensor(shape: [2, 3], scalars: [ 1, 2, 3, 4, 5, 6 ], scalarType: Float.self) let m2 = MLTensor(shape: [3, 2], scalars: [ 7, 8, 9, 10, 11, 12 ], scalarType: Float.self) let m3 = m1.matmul(m2) let result = await m3.shapedArray(of: Float.self) // is [[58, 64], [139, 154]] // Supports broadcasting let m4 = MLTensor(randomNormal: [3, 1, 1, 4], scalarType: Float.self) let m5 = MLTensor(randomNormal: [4, 2], scalarType: Float.self) let m6 = m4.matmul(m5) print("Done") return result; } semaphore.signal() } semaphore.wait() Here's what I get on the Instruments app: Notice how the Neural Engine line shows no usage. Ive run this test on an M1 Max MacBook Pro.

Machine Learning & AI Core ML ML Compute Core ML

2

3

512

4w

Segmentation Fault in np.matmul on macOS 15.2 with Accelerate BLAS

I'm encountering a segmentation fault when using np.matmul with relatively small arrays on macOS 15.2. The issue only occurs in specific scenarios and results in a crash with the following error: Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000110 Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11 Full error log: Gist link The crash consistently occurs on a specific line where np.matmul is called, despite similar np.matmul operations succeeding earlier in the same script. The issue cannot be reproduced in a separate script that contains identical operations. When I build the NumPy wheel using OpenBLAS, this issue no longer arises, which leads me to believe that it is related to a problem with Accelerate. Environment NumPy Version: 2.1.3 Python Version: 3.12.7 OS Version: macOS 15.2 BLAS Configuration: Build Dependencies: blas: detection method: system found: true include directory: unknown lib directory: unknown name: accelerate openblas configuration: unknown pc file directory: unknown version: unknown lapack: detection method: system found: true include directory: unknown lib directory: unknown name: accelerate openblas configuration: unknown pc file directory: unknown version: unknown Compilers: c: commands: cc linker: ld64 name: clang version: 15.0.0 c++: commands: c++ linker: ld64 name: clang version: 15.0.0 cython: commands: cython linker: cython name: cython version: 3.0.11 Machine Information: build: cpu: aarch64 endian: little family: aarch64 system: darwin host: cpu: aarch64 endian: little family: aarch64 system: darwin

Developer Tools & Services Xcode ML Compute macOS Accelerate Apple Silicon

1

0

355

Jan ’25

Troubleshooting Apple Vision Framework Errors

When working on the project "Analyzing a Selfie and Visualizing Its Content" from Apple's documentation, I downloaded the project and opened it in Xcode. However, I encountered the following error: VTEST: error: perform(_:): inside 'for await result in resultStream' error: internalError("Error Domain=com.apple.Vision Code=9 \"Could not create inference context\" UserInfo={NSLocalizedDescription=Could not create inference context}") VTEST: error: DetectFaceRectanglesRequest was cancelled. VTEST: error: DetectFaceRectanglesRequest was cancelled. Error Domain=com.apple.Vision Code=9 "Could not create inference context" UserInfo={NSLocalizedDescription=Could not create inference context} How can I resolve this issue? Thanks in advance!

Developer Tools & Services Xcode ML Compute

1

0

293

Feb ’25

Broken compatibility in tensorflow-metal with tensorflow 2.18

Issue type: Bug TensorFlow metal version: 1.1.1 TensorFlow version: 2.18 OS platform and distribution: MacOS 15.2 Python version: 3.11.11 GPU model and memory: Apple M2 Max GPU 38-cores Standalone code to reproduce the issue: import tensorflow as tf if __name__ == '__main__': gpus = tf.config.experimental.list_physical_devices('GPU') print(gpus) Current behavior Apple silicone GPU with tensorflow-metal==1.1.0 and python 3.11 works fine with tensorboard==2.17.0 This is normal output: /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] Process finished with exit code 0 But if I upgrade tensorflow to 2.18 I'll have error: /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py Traceback (most recent call last): File "/Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py", line 1, in <module> import tensorflow as tf File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/__init__.py", line 437, in <module> _ll.load_library(_plugin_dir) File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library py_tf.TF_LoadLibrary(lib) tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN3tsl8internal10LogMessageC1EPKcii Referenced from: <D2EF42E3-3A7F-39DD-9982-FB6BCDC2853C> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib Expected in: <2814A58E-D752-317B-8040-131217E2F9AA> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so Process finished with exit code 1

Machine Learning & AI General ML Compute Apple Silicon tensorflow-metal

3

1.4k

Feb ’25

How to confirm whether CreatML is training

I am currently training a Tabular Classification model in CreatML. The dataset comprises 30 features, including 1,000,000 training data points and 1,000,000 verification data points. Could you please estimate the approximate training time for an M4Max MacBook Pro? During the training process, CreatML has been displaying the “Processing” status, but there is no progress bar. I would like to ascertain whether the training is still ongoing, as I have often suspected that it has ceased.

Machine Learning & AI Create ML ML Compute Core ML Create ML TabularData

1

0

559

Jan ’25

WebGPU Enabled but WKWebView doesn't have GPU Access

We enabled WebGPU feature flag on Safari on iOS 18.2. This does give Safari an access to GPU but WKWebView still doesn't have GPU access. Can WKWebView not access GPU through Safari feature flag? Is there some other mechanism through which we can enable GPU access for WKWebView? We are testing gpu access by loading : https://webgpureport.org/ Regards Saalis Umer Microsoft Safari Feature Flag - webgpu = true Safari GPU Access: WKWebView GPU Access:

Safari & Web General Foundation ML Compute WebKit

0

2

594

Dec ’24

The yolo11 object detection model I exported to coreml stopped working in macOS15.2 beta.

After updating to macOS15.2beta, the Yolo11 object detection model exported to coreml outputs incorrect and abnormal bounding boxes. It also doesn't work in iOS apps built on a 15.2 mac. The same model worked fine on macOS14.1. When training a Yolo11 custom model in Python, exporting it to coreml, and testing it in the preview tab of mlpackage on macOS15.2 and Xcode16.0, the above result is obtained.

Machine Learning & AI Core ML ML Compute Beta

6

1

1.1k

Feb ’25

Tensorflow Metal M4 Slowness

I am running the same Python script using the TensorFlow Metal module on computers with M3 and M4 GPUs. While 1 epoch takes 5 minutes on the M3 device, it takes 15 minutes on the M4 device. What could be the reason for this? Could it be that TensorFlow Metal is not yet optimized for the M4 architecture?

App & System Services Hardware ML Compute Metal Performance Shaders tensorflow-metal

0

1

686

Dec ’24

Unable to Use M1 Mac Pro Max GPU for TensorFlow Model Training

Hi Everyone, I'm currently facing an issue where TensorFlow is unable to detect the GPU on my M1 Mac for model training. When I run the following code to check for available GPUs: import tensorflow as tf print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU'))) Num GPUs Available: 0 I have already applied the steps mentioned in the developer apple document. https://developer.apple.com/metal/tensorflow-plugin/ System Information: Device: M1 Mac Pro Max Python Version: 3.12.2 TensorFlow Version: 2.17.0 OS: macOS Sequoia (15.1) Questions: Is there any additional configuration required to enable GPU support on M1 Macs? Are there specific TensorFlow versions that I should be using for better compatibility? Has anyone else faced this issue, and how did you resolve it?

Machine Learning & AI General Developer Tools ML Compute Core ML tensorflow-metal

2

1

783

Nov ’24

macOS 15.x crashes in MetalPerformanceShadersGraph

In our app we use CoreML. But ever since macOS 15.x was released we started to get a great bunch of crashes like this: Incident Identifier: 424041c3-884b-4e50-bb5a-429a83c3e1c8 CrashReporter Key: B914246B-1291-4D44-984D-EDF84B52310E Hardware Model: Mac14,12 Process: <REMOVED> [1509] Path: /Applications/<REMOVED> Identifier: com.<REMOVED> Version: <REMOVED> Code Type: arm64 Parent Process: launchd [1] Date/Time: 2024-11-13T13:23:06.999Z Launch Time: 2024-11-13T13:22:19Z OS Version: Mac OS X 15.1.0 (24B83) Report Version: 104 Exception Type: SIGABRT Exception Codes: #0 at 0x189042600 Crashed Thread: 36 Thread 36 Crashed: 0 libsystem_kernel.dylib 0x0000000189042600 __pthread_kill + 8 1 libsystem_c.dylib 0x0000000188f87908 abort + 124 2 libsystem_c.dylib 0x0000000188f86c1c __assert_rtn + 280 3 Metal 0x0000000193fdd870 MTLReportFailure.cold.1 + 44 4 Metal 0x0000000193fb9198 MTLReportFailure + 444 5 MetalPerformanceShadersGraph 0x0000000222f78c80 -[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 296 6 Espresso 0x00000001a290ae3c E5RT::SharedResourceFactory::GetMPSGraphExecutable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 932 . . . 43 CoreML 0x0000000192d263bc -[MLModelAsset modelWithConfiguration:error:] + 120 44 CoreML 0x0000000192da96d0 +[MLModel modelWithContentsOfURL:configuration:error:] + 176 45 <REMOVED> 0x000000010497b758 -[<REMOVED> <REMOVED>] (<REMOVED>) No similar crashes on macOS 12-14! MetalPerformanceShadersGraph.log Any clue what is causing this? Thanks! :)

Machine Learning & AI Core ML ML Compute Metal Debugging

2

1

721

Dec ’24

How to Fine-Tune the SNSoundClassifier for Custom Sound Classification in iOS?

Hi Apple Developer Community, I’m exploring ways to fine-tune the SNSoundClassifier to allow users of my iOS app to personalize the model by adding custom sounds or adjusting predictions. While Apple’s WWDC session on sound classification explains how to train from scratch, I’m specifically interested in using SNSoundClassifier as the base model and building/fine-tuning on top of it. Here are a few questions I have: 1. Fine-Tuning on SNSoundClassifier: Is there a way to fine-tune this model programmatically through APIs? The manual approach using macOS, as shown in this documentation is clear, but how can it be done dynamically - within the app for users or in a cloud backend (AWS/iCloud)? Are there APIs or classes that support such on-device/cloud-based fine-tuning or incremental learning? If not directly, can the classifier’s embeddings be used to train a lightweight custom layer? Training is likely computationally intensive and drains too much on battery, doing it on cloud can be right way but need the right apis to get this done. A sample code will do good. 2. Recommended Approach for In-App Model Customization: If SNSoundClassifier doesn’t support fine-tuning, would transfer learning on models like MobileNetV2, YAMNet, OpenL3, or FastViT be more suitable? Given these models (SNSoundClassifier, MobileNetV2, YAMNet, OpenL3, FastViT), which one would be best for accuracy and performance/efficiency on iOS? I aim to maintain real-time performance without sacrificing battery life. Also it is important to see architecture retention and accuracy after conversion to CoreML model. 3. Cost-Effective Backend Setup for Training: Mac EC2 instances on AWS have a 24-hour minimum billing, which can become expensive for limited user requests. Are there better alternatives for deploying and training models on user request when s/he uploads files (training data)? 4. TensorFlow vs PyTorch: Between TensorFlow and PyTorch, which framework would you recommend for iOS Core ML integration? TensorFlow Lite offers mobile-optimized models, but I’m also curious about PyTorch’s performance when converted to Core ML. 5. Metrics: Metrics I have in mind while picking the model are these: Publisher, Accuracy, Fine-Tuning capability, Real-Time/Live use, Suitability of iPhone 16, Architectural retention after coreML conversion, Reasons for unsuitability, Recommended use case. Any insights or recommended approaches would be greatly appreciated. Thanks in advance!

Machine Learning & AI Create ML ML Compute Machine Learning Core ML Create ML

6

1

1.2k

Dec ’24

Does iPhone 15 Pro Use a Single Microphone or Multiple Microphones for Voice and Sound Recognition?

Hello, I have a question regarding the voice and sound recognition features on the iPhone 15 Pro. The iPhone 15 Pro is equipped with four microphones, and I understand that for features like Apple’s sound recognition and when invoking Siri, the microphone(s) must always be active. My question is whether the device uses a single microphone (mono channel) for these functions or if multiple microphones are activated simultaneously. I would appreciate clarification on how the microphones are utilized in sound and voice recognition features. Thank you for your assistance. Best regards.

Accessibility & Inclusion General App Tracking Transparency iPhone AudioToolbox ML Compute

1

0

808

Oct ’24

Optimizing YOLOv8 for Real-Time Object Detection in a Specific Screen Area

I’m working on real-time object detection using YOLOv8, but I only need to detect objects in approximately 40% of the screen area. Is it possible to limit the captureOut method to focus solely on that specific region of the screen? If this isn’t feasible, I’m considering an approach where the full-screen pixel buffer is captured and then cropped to the target area before running detection. However, I’m concerned about how this might affect real-time performance. I’d appreciate any insights on how to maintain real-time performance or suggestions for better alternatives. Thank you!

Media Technologies Photos & Camera ML Compute Machine Learning Core Image AVFoundation

2

0

747

Oct ’24

Urgent Issue with SoundAnalysis in iOS 18 - Critical Background Permissions Error

We are experiencing a major issue with the native .version1 of the SoundAnalysis framework in iOS 18, which has led to all our user not having recordings. Our core feature relies heavily on sound analysis in the background, and it previously worked flawlessly in prior iOS versions. However, in the new iOS 18, sound analysis stops working in the background, triggering a critical warning. Details of the issue: We are using SoundAnalysis to analyze background sounds and have enabled the necessary background permissions. We are using the latest XCode A warning now appears, and sound analysis fails in the background. Below is the warning message we are encountering: Warning Message: Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU work from background) [Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted); code=7 status=-1 Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1). CoreML prediction failed with Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 0 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 0 in pipeline, NSUnderlyingError=0x30330e910 {Error Domain=com.apple.CoreML Code=0 "Failed to evaluate model 1 in pipeline" UserInfo={NSLocalizedDescription=Failed to evaluate model 1 in pipeline, NSUnderlyingError=0x303307840 {Error Domain=com.apple.CoreML Code=0 "Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1)." UserInfo={NSLocalizedDescription=Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model (error code: -1).}}}}} We urgently need guidance or a fix for this, as our application’s main functionality is severely impacted by this background permission error. Please let us know the next steps or if this is a known issue with iOS 18.

Machine Learning & AI Core ML ML Compute Sound Analysis Core ML Background Tasks

12

11

2k

Dec ’24

Code with Swift Assist

Hello, I would like to inquire about the release date of Swift Assist’s beta version. Apple has stated that it will be released later this year, but they have not provided a specific date or time. Could you please provide information on the beta version’s release date? Additionally, is there a trial version available? If so, when was it released? Thank you for your assistance.

Developer Tools & Services Xcode Xcode Swift ML Compute Apple Intelligence

2

1

2.3k

Jan ’25

MLTensor computation took more time than expected.

func testMLTensor() { let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self) let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self) for _ in 0...50 { let t = Date() let x = (t1 * t2) print("MLTensor", t.timeIntervalSinceNow * 1000, "ms") } } testMLTensor() The above code took more time than expected, especially in the early stage of iteration.

Machine Learning & AI Core ML ML Compute Accelerate Performance Core ML

1

0

779

Aug ’24

MLTensor computation took more time than expected.

func testMLTensor() { let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self) let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self) for _ in 0...50 { let t = Date() let x = (t1 * t2) print("MLTensor", t.timeIntervalSinceNow * 1000, "ms") } } testMLTensor() The above code took more time than expected, especially in the early stage of iteration.

Machine Learning & AI Core ML ML Compute Accelerate Core ML

0

639

Aug ’24

Post

Replies

Boosts

Views

Activity

ML Compute

Posts under ML Compute tag

Post

Replies

Boosts

Views

Activity