ML Compute


Accelerate training and validation of neural networks using the CPU and GPUs.

ML Compute Documentation

Posts under ML Compute tag

36 Posts
Post not yet marked as solved
2 Replies
208 Views
Hi from France! I'm trying to create a model for dice detection. I've taken about 100 photos of dice, all showing the same side (1 pip). Are my bounding boxes good? Should I box the whole die? I launched the training and it seems to work well. Then in the Evaluation tab the values seem not great, but not bad: I/U 84%, Varied I/U 44%. The validation score is very low. In the Preview tab, no matter what image I give it, I get no detections. What am I missing? What should I improve?
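For context, the I/U values in the Evaluation tab are intersection over union: the overlap area of a predicted and a ground-truth box divided by the area of their union. A minimal sketch of the computation (hypothetical helper, boxes as (x_min, y_min, x_max, y_max) tuples):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
             - inter)
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```

A high training I/U combined with a much lower varied/validation score can indicate overfitting to a very uniform dataset (same die face, background, and lighting in every photo).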
Posted
by Cyril42.
Last updated
.
Post not yet marked as solved
2 Replies
243 Views
First off, I'm mainly a wet-lab biologist and Python is not my strong suit, so sorry if I seem a little clueless here. Anyway, I am having trouble converting a pretrained Keras model (.h5 format) with coremltools. My code to convert the model:

DL_model = (path to .h5 file)
model_converted = ct.convert(DL_model, source="tensorflow")

throws an error:

ValueError: Unknown metric function: binary_recall. Please ensure this object is passed to the custom_objects argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.

I assume I need to pass some sort of custom_objects = {"custom_obj": custom_obj} argument to ct.convert, but I don't know how. I tried custom_objects = {"binary_recall": binary_recall}, but that caused NameError: name 'binary_recall' is not defined. Can anyone give me some help here? It would be nice to speed up this model by converting it to Apple's format; I work with huge files and cutting down data processing time is important for me. Thanks!! Noah
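One common pattern, sketched below, is to load the Keras model yourself before calling coremltools: either register the custom metric via custom_objects, or pass compile=False so Keras skips the metric entirely (metrics are not needed for conversion). The binary_recall definition here is a stand-in; the real one lives in whatever script trained the .h5 file and must be imported or redefined before loading.

```python
import tensorflow as tf

# Stand-in definition of the custom metric the saved model references.
def binary_recall(y_true, y_pred):
    y_pred = tf.round(y_pred)
    tp = tf.reduce_sum(y_true * y_pred)
    return tp / (tf.reduce_sum(y_true) + 1e-7)

# Build and save a tiny model just so this sketch is self-contained.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[binary_recall])
model.save("binary_model.h5")

# Option 1: register the custom metric so Keras can rebuild the compiled model.
reloaded = tf.keras.models.load_model(
    "binary_model.h5", custom_objects={"binary_recall": binary_recall})

# Option 2: skip compilation entirely; metrics are irrelevant for conversion.
reloaded = tf.keras.models.load_model("binary_model.h5", compile=False)

# Either loaded model can then be handed to coremltools:
#   import coremltools as ct
#   mlmodel = ct.convert(reloaded, source="tensorflow")
```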
Posted
by nlittman1.
Last updated
.
Post not yet marked as solved
3 Replies
612 Views
I'm trying to run the sample code for MPSGraph from https://developer.apple.com/documentation/metalperformanceshadersgraph/adding_custom_functions_to_a_shader_graph and it's not working. It builds successfully, but after you press train (the play button), the program fails right after the first training iteration with errors like this:

-[MTLDebugCommandBuffer lockPurgeableObjects]:2103: failed assertion `MTLResource 0x600001693940 (label: (null)), referenced in cmd buffer 0x124015800 (label: (null)) is in volatile or empty purgeable state at commit'

It fails on commandBuffer.commit() in the runTrainingIterationBatch() method, as if the command buffer had already been committed (I checked, and yes, it is already committed). But why would that happen in example code? I tried wrapping the commit in a command-buffer status check, which avoids the crash, but then the program behaves incorrectly overall and doesn't compute the loss well. Making things worse, the documentation for MPSGraph is empty: it contains only class and method names without any descriptions.

My environment: Xcode 13.4.1 (13F100), macOS 12.4, MacBook Pro 14" 2021 (M1 Pro, 16 GB). I also tried building for iPhone 12 Pro Max (iOS 15.5) and as a Mac Catalyst application. Same error everywhere.
Posted
by abesmon.
Last updated
.
Post not yet marked as solved
9 Replies
20k Views
I just got my new MacBook Pro with the M1 Max chip and am setting up Python. I've tried several combinations of settings to test speed, and now I'm quite confused. First, my questions:

1. Why does Python running natively on the M1 Max run greatly (~100%) slower than on my old MacBook Pro 2016 with an Intel i5?
2. On the M1 Max, why is there no significant speed difference between the native run (via Miniforge) and the run via Rosetta (via Anaconda), which is supposed to be ~20% slower?
3. On the M1 Max with a native run, why is there no significant speed difference between conda-installed NumPy and the NumPy installed alongside TensorFlow, which is supposed to be faster?
4. On the M1 Max, why is running in the PyCharm IDE consistently ~20% slower than running from the terminal? This doesn't happen on my old Intel Mac.

Evidence supporting my questions follows. Here are the settings I've tried:

1. Python installation:
   Miniforge-arm64, so that Python runs natively on the M1 Max chip (in Activity Monitor, the Kind of the python process is Apple).
   Anaconda, so that Python runs via Rosetta (in Activity Monitor, the Kind of the python process is Intel).
2. NumPy installation:
   conda install numpy: NumPy from the original conda-forge channel, or pre-installed with Anaconda.
   Apple TensorFlow: with Python installed by Miniforge, I install TensorFlow directly and NumPy is installed along with it. NumPy installed this way is said to be optimized for Apple M1 and faster. The installation commands:
   conda install -c apple tensorflow-deps
   python -m pip install tensorflow-macos
   python -m pip install tensorflow-metal
3. Run from: Terminal, or PyCharm (Apple Silicon version).
Here is the test code:

import time
import numpy as np

np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10
timecosts = []
for _ in range(runtimes):
    s_time = time.time()
    for i in range(100):
        a += 1
        np.linalg.svd(a)
    timecosts.append(time.time() - s_time)
print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')

and here are the results (seconds):

+-----------------------------------+-----------------------+--------------------+
| Python installed by (run on) →    | Miniforge (native M1) | Anaconda (Rosetta) |
+----------------------+------------+------------+----------+----------+---------+
| NumPy installed by ↓ | Run from → | Terminal   | PyCharm  | Terminal | PyCharm |
+----------------------+------------+------------+----------+----------+---------+
| Apple TensorFlow                  | 4.19151    | 4.86248  | /        | /       |
+-----------------------------------+------------+----------+----------+---------+
| conda install numpy               | 4.29386    | 4.98370  | 4.10029  | 4.99271 |
+-----------------------------------+------------+----------+----------+---------+

This is quite slow. For comparison, the same code on my old MacBook Pro 2016 (i5) takes 2.39917 s. Another post reports that on an M1 chip (not Pro or Max), miniforge + conda-installed NumPy takes 2.53214 s, and miniforge + Apple-TensorFlow NumPy takes 1.00613 s; you may also try it on your own machine. Here are the CPU details:

My old i5:
$ sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Intel(R) Core(TM) i5-6360U CPU @ 2.00GHz
machdep.cpu.core_count: 2

My new M1 Max:
% sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Apple M1 Max
machdep.cpu.core_count: 10

I followed the instructions in the tutorials strictly, so why does all this happen? Is it because of flaws in my installation, or because of the M1 Max chip? Since my work relies heavily on local runs, local speed is very important to me.
Any suggestions for a possible solution, or any data points from your own device, would be greatly appreciated :)
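One thing worth checking with numbers like these is which BLAS/LAPACK library NumPy is linked against, since np.linalg.svd speed is dominated by it; a NumPy that ends up on the reference BLAS will be far slower than one linked to Accelerate, OpenBLAS, or MKL. A quick diagnostic sketch:

```python
import io
import contextlib
import numpy as np

# np.show_config() prints the build configuration; capture it and look for
# "accelerate", "openblas", or "mkl" to see which BLAS this NumPy uses.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
config_text = buf.getvalue()
print(config_text)
```

If the two installations report different BLAS backends, that alone can explain a 2-4x gap on an SVD-heavy benchmark, independent of the chip.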
Posted Last updated
.
Post not yet marked as solved
0 Replies
298 Views
I tried training my model on my M1 Pro using TensorFlow's mixed precision, hoping it would boost performance, but I got an error:

.../mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:289:0: error: 'mps.select' op failed to verify that all of {true_value, false_value, result} have same element type
.../mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:289:0: note: see current operation: %5 = "mps.select"(%4, %3, %2) : (tensor<1xi1>, tensor<1xf16>, tensor<1xf32>) -> tensor<1xf16>
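Until the Metal backend handles the float16/float32 operand mix in mps.select, one workaround (sketched below, at the cost of the hoped-for speedup) is to keep the global Keras policy at float32 so all layers compute with a single element type:

```python
import tensorflow as tf

# Reset the global policy so layers compute in float32 end to end,
# avoiding mixed f16/f32 operands inside the Metal graph.
tf.keras.mixed_precision.set_global_policy("float32")
print(tf.keras.mixed_precision.global_policy().name)  # prints "float32"
```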
Posted Last updated
.
Post not yet marked as solved
0 Replies
219 Views
The data format of the MLMultiArray content is float32. How do I convert the content to int?
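On the Python side (e.g. a coremltools prediction, where MLMultiArray contents bridge to a NumPy array), the conversion is a cast; note that astype truncates toward zero, so round first if that matters. A sketch with made-up values:

```python
import numpy as np

# Stand-in for values read out of an MLMultiArray.
scores = np.array([0.4, 1.6, 2.5], dtype=np.float32)

truncated = scores.astype(np.int32)         # truncates toward zero
rounded = np.rint(scores).astype(np.int32)  # rounds to nearest (ties to even)

print(truncated.tolist())  # [0, 1, 2]
print(rounded.tolist())    # [0, 2, 2]
```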
Posted
by LIttt.
Last updated
.
Post not yet marked as solved
0 Replies
237 Views
I have an app in the App Store that has been running fine. Starting yesterday it began throwing an exception during initialization, both in the App Store build and during development:

"[0x000...***] during one-time initialization function for my_model_name at /ViewController.swift:84"
Thread 2: EXC_BREAKPOINT (code=1, =***)
[coreml] Could not create persistent key blob for ----*** : error=Error Domain=com.apple.CoreML Code=8 "Fetching decryption key from server failed."
NSUnderlyingError=Error Domain=CKErrorDomain Code=6 "CKInternalErrorDomain: 2022", NSUnderlyingError=Error Domain=CKInternalErrorDomain Code=2022 "Request failed with http status code 503", CKRetryAfter=21, CKHTTPStatus=503

Help! Anyone know what is going on? Also, when I try to generate a new model encryption key, I can't. I'm the only admin developer on my account. I followed the directions by signing out and back in with my Apple ID in preferences and restarting Xcode, but I still can't generate a new key and still receive the same error.
Posted
by Glenn007.
Last updated
.
Post not yet marked as solved
0 Replies
286 Views
I'm using TensorFlow in Python and exploring the hyperparameters of a machine learning model for my dataset. My workflow involves two Python scripts: script 1 iterates through a suite of hyperparameters, calling script 2, which is a command-line implementation that sets up the model and fits it to the data. On my MacBook Pro (M1 chip) everything works fine. On my new Mac Studio, script 2 hangs after a number of calls because it fails to run on the GPU. I cannot avoid this behaviour: if I kill the hung job, script 1 continues to call script 2 until it hangs again after roughly another 10 calls. What's going on? Is it a problem with the Mac Studio's GPU? Any suggestions to test would be appreciated. R
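One diagnostic worth trying in script 2 (sketched below with the standard tf.config API) is hiding the GPU from TensorFlow before any op runs; if the hang disappears, the problem is in the GPU path rather than the model code:

```python
import tensorflow as tf

# Hide the GPU so all ops fall back to the CPU. This must run before the
# first operation creates the device context.
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices("GPU"))  # [] -- no GPU visible
```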
Posted
by rmatear.
Last updated
.
Post marked as solved
3 Replies
2.4k Views
SYSTEM: MacBook Pro 14" (M1 Apple Silicon), macOS 12.0.1
DONE: I followed https://developer.apple.com/metal/tensorflow-plugin/ for Apple Silicon on my 14" (I had Anaconda installed before, which may be causing the error, but I do NOT want to delete my Anaconda altogether).

CODE:
import tensorflow as tf

ERROR (abridged; the full traceback repeats the same dlopen failure):

~/miniforge3/lib/python3.9/site-packages/numpy/core/overrides.py in <module>
----> 7 from numpy.core._multiarray_umath import (
      8     add_docstring, implement_array_function, _get_implementing_args)
ImportError: dlopen(/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
  Referenced from: /Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so
  Reason: tried: '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/usr/local/lib/libcblas.3.dylib' (no such file), '/usr/lib/libcblas.3.dylib' (no such file)

During handling of the above exception, another exception occurred while importing tensorflow (tensorflow.python.eager.context does `import numpy as np`), ending in:

ImportError: IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed. We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "/Users/ps/miniforge3/bin/python"
* The NumPy version is: "1.19.5"
and make sure that they are the versions you expect.
Original error was: dlopen(...): Library not loaded: @rpath/libcblas.3.dylib (same failure as above)
Posted
by sogu.
Last updated
.
Post not yet marked as solved
0 Replies
423 Views
Hello, I'm trying to use Accelerate's sparse Cholesky solver but have hit an issue on M1. Is anyone aware of this or a similar issue? The sparse factorization class _SparseNumericFactorSymmetric in Apple Accelerate will hang randomly for some large symmetric positive definite matrices with a specific sparsity pattern when created for Cholesky factorization. The issue happens with probability about 1/1000. It can be reproduced through the following steps:

1. Read and load the attached sparse matrix (in MatrixMarket format).
2. In a for-loop, repeatedly factorize the same matrix through _SparseNumericFactorSymmetric 1000~2000 times.

Matrix: https://www.dropbox.com/s/2pyl0cpmgy1qdrh/mat.mtx.zip?dl=0

The following code calls Apple Accelerate through a wrapper from Eigen 3.4.9 (https://eigen.tuxfamily.org/dox/group__AccelerateSupport__Module.html):

#include <unsupported/Eigen/SparseExtra>
#include <Eigen/AccelerateSupport>
#include <iostream>

int main() {
  Eigen::SparseMatrix<double> A;
  Eigen::loadMarket(A, "mat.mtx");
  for (int i = 0; i < 2000; ++i) {
    Eigen::AccelerateLLT<Eigen::SparseMatrix<double>> solver;
    solver.compute(A);
    std::cout << i << std::endl;
  }
  return 0;
}

The factorizations should run smoothly for the number of iterations specified in the for-loop, but the app hangs and stops responding after performing the factorization hundreds of times (sometimes ~500+ loops, sometimes ~900+). I'm using Xcode 13.1 (13A1030d) on macOS Monterey 12.5.1 (21G83) with an Apple M1 Max. The issue seems related to multithreading in vecLib, since it only happens when the environment variable is left unset (i.e., VECLIB_MAXIMUM_THREADS > 1). Setting VECLIB_MAXIMUM_THREADS=1 eliminates the issue at the cost of performance. And the issue only happens on M1 Macs, not on Intel-based ones.
Posted
by nepluno.
Last updated
.
Post not yet marked as solved
1 Reply
570 Views
We use several Core ML models in our Swift application. The memory footprint of these models ranges from 15 kB to 3.5 MB according to the Xcode Core ML utility tool. We observe a huge difference in loading time depending on the type of compute units selected to run the model. Here is a small sample of the code used to load a model:

let configuration = MLModelConfiguration()
// Here I use the .all compute units mode:
configuration.computeUnits = .all
let myModel = try! myCoremlModel(configuration: configuration).model

Here are the profiling results of this sample code for different model sizes as a function of the targeted compute units:

Model-3.5-MB:
- computeUnits = .cpuAndGPU: 188 ms ⇒ 18 MB/s
- computeUnits = .all, or .cpuAndNeuralEngine on iOS 16: 4000 ms ⇒ 875 kB/s

Model-2.6-MB:
- computeUnits = .cpuAndGPU: 144 ms ⇒ 18 MB/s
- computeUnits = .all, or .cpuAndNeuralEngine on iOS 16: 1300 ms ⇒ 2 MB/s

Model-15-kB:
- computeUnits = .cpuAndGPU: 18 ms ⇒ 833 kB/s
- computeUnits = .all, or .cpuAndNeuralEngine on iOS 16: 700 ms ⇒ 22 kB/s

What explains the difference in loading time between computeUnits modes? Is there a way to reduce model loading time when using the .all or .cpuAndNeuralEngine compute units mode?
Posted
by dbphr.
Last updated
.
Post not yet marked as solved
1 Reply
481 Views
We use dynamic input sizes for some use cases. When the compute unit mode is .all, there is a strong difference in execution time if the dynamic input shape doesn't match the optimal shape. If we set the model's optimal input shape to 896x896 but run it with an input shape of 1024x768, execution is almost twice as slow as with an input of 896x896. For example, a model set with an 896x896 preferred input shape achieves inference in 66 ms when the input shape is 896x896, but only 117 ms when the input shape is 1024x768. In that case, to achieve the best inference performance we would need to switch between models depending on the input shape, which is not dynamic at all and is memory-greedy. Is there a way to reduce the execution time when the shape is outside the preferred shape range?
Posted
by dbphr.
Last updated
.
Post not yet marked as solved
0 Replies
383 Views
Hi, I currently do multiprocessing in Python and need to upgrade my laptop. The scientific computing I do usually takes a while, so I'd like to have as many cores as I can get. The question: how, if possible, might I use the Neural Engine instead of (or in tandem with) the CPU for multiprocessing? In Python it is fairly straightforward with the concurrent.futures package: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
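Worth noting: concurrent.futures only schedules Python work onto CPU cores; the Neural Engine is not exposed as a general-purpose compute target, and work reaches it only indirectly (e.g. when Core ML executes a model with compute units that include it). For CPU-side parallelism the stdlib pattern looks like this (ThreadPoolExecutor shown so the sketch is self-contained; ProcessPoolExecutor has the same map API but must run behind an `if __name__ == "__main__":` guard on macOS):

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # Stand-in for a real task; with ProcessPoolExecutor the same map()
    # call would fan the work out across CPU cores in worker processes.
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(8)))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```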
Posted
by ThomK.
Last updated
.
Post not yet marked as solved
4 Replies
1.1k Views
I am training a model using tensorflow-metal, and model training (and the whole application) freezes up. The behavior is nondeterministic. I believe the problem is with Metal (1) because of the contents of the backtraces below, and (2) because when I run the same code on a machine with non-Metal TensorFlow (using a GPU), everything works fine. I can't share my code publicly, but I would be willing to share it privately with an Apple engineer over email if that would help. It's hard to create a minimal reproduction since my program is somewhat complex and the bug is nondeterministic, though it does appear pretty reliably. It looks like the problem might be in some Metal Performance Shaders init code. The state of everything (backtraces, etc.) when the program freezes is attached. Backtraces
Posted
by andmis.
Last updated
.
Post not yet marked as solved
4 Replies
644 Views
This does not seem to be affecting training, but it seems somewhat important (I have no clue how to read it, however):

Error: command buffer exited with error status.
	The Metal Performance Shaders operations encoded on it may not have completed.
	Error: (null) Internal Error (0000000e:Internal Error)
	<AGXG13XFamilyCommandBuffer: 0x29b027b50>
    label = <none>
    device = <AGXG13XDevice: 0x12da25600>
        name = Apple M1 Max
    commandQueue = <AGXG13XFamilyCommandQueue: 0x106477000>
        label = <none>
        device = <AGXG13XDevice: 0x12da25600>
            name = Apple M1 Max
    retainedReferences = 1

This happens during a "heavy" model training run on a "heavy" dataset, so it may be related to a memory issue, but I have no clue how to confront it.
Posted Last updated
.
Post not yet marked as solved
1 Reply
425 Views
Can someone tell me whether using copyrighted content for neural network training is infringement or fair use? For example: someone takes 100,000 superhero pictures from Google for training, after which the neural network can create superhero pictures from a user's query. Is that infringement or fair use? Can the developer sell the created pictures to users (or a subscription to the service)? Or does everyone use only public domain and open-source content for training?
Posted
by Dimbill.
Last updated
.
Post not yet marked as solved
0 Replies
489 Views
The documentation for MPSGraph has no information about what the classes and methods do. It only enumerates everything without any explanation of what each item is or how it works. Why is that? https://developer.apple.com/documentation/metalperformanceshadersgraph There are some comments in the MPSGraph header files, though, so it seems like a bug.
Posted
by abesmon.
Last updated
.
Post not yet marked as solved
6 Replies
1.4k Views
After installing the tensorflow-metal PluggableDevice according to Getting Started with tensorflow-metal PluggableDevice, I tested this DCGAN example: https://www.tensorflow.org/tutorials/generative/dcgan. Everything was working perfectly until I decided to upgrade macOS from 12.0.1 to 12.1. Before the upgrade, the result after 50 epochs looked like picture 1 below; after the upgrade it looks like picture 2 below. I am using TensorFlow 2.7.0, tensorflow-metal 0.3.0, and Python 3.9. I hope this question also helps Apple improve the Metal PluggableDevice. I can't wait to use it in my research.
Posted Last updated
.