ML Compute


Accelerate training and validation of neural networks using the CPU and GPUs.

ML Compute Documentation

Posts under ML Compute tag

38 Posts
Post not yet marked as solved
0 Replies
328 Views
Hello! I’m having an issue with retrieving the trained weights from MLCLSTMLayer in ML Compute when training on a GPU. I maintain references to the input-weights, hidden-weights, and biases tensors and use the following code to extract the data post-training:

```swift
extension MLCTensor {
    func dataArray<Scalar>(as _: Scalar.Type) throws -> [Scalar] where Scalar: Numeric {
        let count = self.descriptor.shape.reduce(into: 1) { (result, value) in
            result *= value
        }
        var array = [Scalar](repeating: 0, count: count)
        // This *should* copy the latest data from the GPU to memory that's accessible by the CPU
        self.synchronizeData()
        _ = try array.withUnsafeMutableBytes { (pointer) in
            guard let data = self.data else {
                throw DataError.uninitialized // A custom error that I declare elsewhere
            }
            data.copyBytes(to: pointer)
        }
        return array
    }
}
```

The issue is that when I call dataArray(as:) on a weights or biases tensor for an LSTM layer that has been trained on a GPU, the values it retrieves are the same as they were before training began. For instance, if I initialize the biases all to 0 and then train the LSTM layer on a GPU, the bias values seemingly remain 0 post-training, even though the reported loss values decrease as you would expect. This issue does not occur when training an LSTM layer on a CPU, and it also does not occur when training a fully connected layer on a GPU. Since both types of layers work properly on a CPU but only MLCFullyConnectedLayer works properly on a GPU, it seems that the issue is a bug in ML Compute’s GPU implementation of MLCLSTMLayer specifically. For reference, I’m testing my code on an M1 Max. Am I doing something wrong, or is this an actual bug that I should report in Feedback Assistant?
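As a sanity check of the expectation in the post: after any real optimizer step, a zero-initialized bias should move away from zero. The numpy sketch below (hypothetical data, plain SGD on a linear layer, not ML Compute) illustrates the behavior the post expects but does not see from the GPU-trained LSTM:

```python
import numpy as np

# Hypothetical data: one SGD step on a linear layer with a zero bias.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 1))
W = rng.normal(size=(4, 1))
b = np.zeros(1)

pred = x @ W + b
grad_b = 2.0 * (pred - y).mean()  # dMSE/db
b -= 0.1 * grad_b                 # after one step, b should be nonzero

print(b)
```

If the copied-back bias is still exactly its initial value after many such steps, the data being read is stale, which is consistent with the GPU-to-CPU synchronization not happening.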
Posted. Last updated.
Post not yet marked as solved
0 Replies
458 Views
My project is based on Python 3.8 and 3.9 and contains some C and C++ source. How can I do parallel computing on the CPU and GPU of an M1 Max? I bought the M1 Max specifically for its strong GPU to do quantitative finance, where speed is extremely important. Unfortunately, CUDA is not compatible with Mac. Please show me how to do it, thanks.
1. Can Accelerate (for the CPU) and Metal (for the GPU) speed up any source by building it like this?
Step 1: download the source from GitHub.
Step 2: create a file named "site.cfg" in the source directory with the content:
[accelerate]
libraries = Metal, Accelerate, vecLib
Step 3: in Terminal: NPY_LAPACK_Order=accelerate python3 setup.py build
Step 4: pip3 install . or python3 setup.py install (I am not sure which method to apply)
2. How compatible is this method? I need to speed up numpy, pandas, and even an open-source project such as https://github.com/microsoft/qlib.
3. Please just show me the code.
4. When compiling the C and C++ source, a lot of errors were reported. Which gcc and g++ should I choose? The default gcc installed by brew is 4.2.1, which cannot work, and the gcc I downloaded from ARM's official website did not work either. Any hint would be appreciated. Thanks so much; this is urgent.
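One quick way to check whether a numpy build actually picked up Accelerate (rather than a bundled BLAS such as OpenBLAS) is to inspect its build configuration. This is only a sketch; the exact strings printed vary by numpy version:

```python
import numpy as np

# Print the BLAS/LAPACK libraries this numpy build is linked against;
# an Accelerate-backed build mentions "accelerate" / "veclib" here.
np.show_config()
```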
Posted by jefftang. Last updated.
Post not yet marked as solved
1 Reply
312 Views
So I've read the documentation, downloaded the Accelerate sample source, and created a simple example. I'm attempting to solve a system of two equations, 90x+85y=400 and y−x=0. The result should be just greater than 2.25 for both x and y. What I get is [x, y] = [2.2857144, 205.7143]. I'm new to this, so I'm sure I've misread the docs, but I can't see where. Here is the code I modified for my experiment:

```swift
do {
    let aValues: [Float] = [85, 90,
                             1, -1]
    /// The _b_ in _Ax = b_.
    let bValues: [Float] = [400, 0]

    /// Call `nonsymmetric_general` to compute the _x_ in _Ax = b_.
    let x = nonsymmetric_general(a: aValues,
                                 dimension: 2,
                                 b: bValues,
                                 rightHandSideCount: 1)

    /// Calculate _b_ using the computed _x_.
    if let x = x {
        let b = matrixVectorMultiply(matrix: aValues,
                                     dimension: (m: 2, n: 2),
                                     vector: x)
        print("\nx = ", x)
        print("\nb =", b)
    }
}
```

What did I misunderstand? Thanks
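A likely explanation, assuming the routine hands the flat array straight to LAPACK: LAPACK reads a flat array in column-major order, so [85, 90, 1, -1] is interpreted as the transpose of the matrix written out above. A numpy sketch of both readings reproduces the reported output:

```python
import numpy as np

b = np.array([400.0, 0.0])
A = np.array([[85.0, 90.0],   # the matrix as laid out in the post,
              [ 1.0, -1.0]])  # read row-major

# Column-major reading (what LAPACK sees): the transpose.
print(np.linalg.solve(A.T, b))  # ≈ [2.2857, 205.7143] -- the surprising result

# Intended system 90x + 85y = 400, y - x = 0 (note the coefficient order, too):
A2 = np.array([[90.0, 85.0],
               [-1.0,  1.0]])
print(np.linalg.solve(A2, b))   # ≈ [2.2857, 2.2857]
```

Transposing the Swift aValues array (or laying it out column by column) should give the expected answer.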
Posted by lbarney. Last updated.
Post not yet marked as solved
2 Replies
545 Views
I am running a test model on my MBP M1 Pro and the GPU clock speed never goes above ~450 MHz (GPU cores are at 100%). Using other apps that peg the GPU, I can see the clock speed is about 1.3 GHz. Is this an issue with tf-metal, or am I doing something wrong? FR
Posted by farbodr. Last updated.
Post not yet marked as solved
4 Replies
11k Views
Can I run inference on the new MacBook Pro with M1 chips (Apple Silicon) using Keras models (and sometimes PyTorch)? These would be computer vision models; some might have custom loss functions or metrics, and they would have been trained on, let's say, Google Colab. If I can perform inference, how do I do that? Also, will the Neural Engine help while performing inference, and will it boost training if I have to train on the Mac?
Posted by jmayank23. Last updated.
Post not yet marked as solved
0 Replies
311 Views
Hi all, I've spent some time experimenting with the BNNS (Accelerate) LSTM-related APIs lately, and despite a distinct lack of documentation (even though the headers do contain quite a bit), I got most things to a point where I think I know what's going on and I get the expected results. However, one thing I have not been able to do is get this working when inputSize != hiddenSize. I am currently only concerned with a simple unidirectional LSTM with a single layer, but none of my permutations of the gate "iw_desc" matrices with various 2D layouts, or of reordering input-size/hidden-size, made any difference; ultimately BNNSDirectApplyLSTMBatchTrainingCaching always returns -1 as an indication of error. Any help would be greatly appreciated. PS: The bnns.h framework header claims that "When a parameter is invalid or an internal error occurs, an error message will be logged. Some combinations of parameters may not be supported. In that case, an info message will be logged." And yet, I've not been able to find any such messages logged to NSLog() or stderr or Console. Is there a magic environment variable that I need to set to get more verbose logging?
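For reference, the shape bookkeeping when inputSize != hiddenSize: the input-weights matrix has shape (4*hidden, input), while the hidden-weights matrix has shape (4*hidden, hidden). The plain numpy single-cell sketch below illustrates this; the gate order i, f, g, o is an assumption here, and BNNS's actual memory layout may differ:

```python
import numpy as np

def lstm_step(x, h, c, W_ih, W_hh, bias):
    """One LSTM step with input_size != hidden_size.
    W_ih: (4*hidden, input), W_hh: (4*hidden, hidden)."""
    gates = W_ih @ x + W_hh @ h + bias
    i, f, g, o = np.split(gates, 4)           # assumed gate order
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    c_new = sig(f) * c + sig(i) * np.tanh(g)  # cell update
    h_new = sig(o) * np.tanh(c_new)           # hidden output
    return h_new, c_new

input_size, hidden_size = 3, 5
rng = np.random.default_rng(1)
x = rng.normal(size=input_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
W_ih = rng.normal(size=(4 * hidden_size, input_size))
W_hh = rng.normal(size=(4 * hidden_size, hidden_size))
bias = np.zeros(4 * hidden_size)

h, c = lstm_step(x, h, c, W_ih, W_hh, bias)
print(h.shape)  # (5,)
```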
Posted by andi. Last updated.
Post not yet marked as solved
0 Replies
233 Views
Hi Apple devs, can you confirm whether macOS 10.13 supports Core ML inference using the GPU? I ran a test on macOS 10.13.6 and found that my Core ML model was inferring using only the CPU (BNNS), with no GPU use at all. Can anyone confirm this and give me an answer? Thanks
Posted by ek86. Last updated.
Post not yet marked as solved
0 Replies
428 Views
I tried Create ML to train on the MNIST dataset, which consists of very small images of the digits 0-9. This is the first time I've used Create ML, but its training speed is far too slow given that, from what I've learned, MNIST is a very small dataset. I am using a 16-inch MacBook Pro 2021 with an M1 Pro, 16 GB RAM, and a 1 TB SSD. I checked Activity Monitor and saw that CPU usage reaches 100%; 14 of 16 GB of memory is in use, 2 GB for cache, with 12.5 GB of swap used. Memory used by the MLRecipeExecutionService process is 19.55 GB, and if I double-click to see the details, its virtual memory size is 410 GB. I ran sudo powermetrics and observed GPU power of ~50-60 mW, which means the GPU is not being used for training. When I checked disk usage in Activity Monitor, I saw that the MLRecipeExecutionService process had written 1.1 TB of bytes, while the entire MNIST dataset is only 17.5 MB. I don't understand why it's so slow and why so many resources are being used; based on what I've learned about machine learning, this is irregular.
Posted by Huakun. Last updated.
Post not yet marked as solved
0 Replies
244 Views
I am trying to train a pretrained network via transfer learning in Create ML with approximately 4,500 images with bounding boxes. Create ML stopped after 3,700 iterations, having allocated all available memory, and then did nothing. If I pause it, Create ML releases the RAM, and after that I can continue training. I have a Mac mini with 64 GB RAM, a Radeon RX 580 8 GB eGPU, a 6-core i5, Big Sur, and Create ML 3.0 (78.5).
Posted by rsicak. Last updated.
Post not yet marked as solved
1 Reply
743 Views
Hello everyone, we are GPU developers who use PTX / GCN / RDNA ISA to develop our software. Is there any assembly-level reference available for the Neural Engine and GPU, so we can write our own device code and get it built and running? I'm not sure whether this is a normal request. I understand there are high-level libraries available, such as FFT / BLAS / Accelerate, but we need to go lower in order to implement our own technology features, which solve some specific problems we need to resolve before rolling out our product for the macOS platform.
Posted. Last updated.
Post not yet marked as solved
0 Replies
238 Views
Not long ago I updated to Xcode Version 13.0 (13A233), and I have noticed that when I do a clean or build of my project, the processor goes to 100% on all cores. I have a 2.3 GHz 8-core Intel Core i9 processor.
Posted by vzpintor. Last updated.
Post not yet marked as solved
0 Replies
367 Views
I have a model I developed in TensorFlow 2.3 and then converted to an MLModel with coremltools. I also reduced it to fp16. The model works great on most iOS devices, but on a few, particularly the iPhone 11 Pro Max (A2218), it gives NaN errors on the Neural Engine; if the same model is run on the CPU/GPU, there are no issues. I also tried the fp32 version of the model, with the same result: NaN on the Neural Engine, but working great on CPU/GPU. Thoughts or suggestions?
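One plausible mechanism, not necessarily what is happening on this particular device: the Neural Engine computes in reduced precision internally, and float16 can only represent magnitudes up to 65504, so a large intermediate activation that is fine in float32 overflows to Inf, and operations like Inf − Inf then produce NaN. A quick numpy illustration:

```python
import numpy as np

# float16 overflows above 65504; a value fine in float32
# becomes Inf when cast down, and Inf - Inf yields NaN.
big = np.float32(1e5)             # representable in float32
half = big.astype(np.float16)     # overflows to inf in float16
print(half)          # inf
print(half - half)   # nan
```

If this is the cause, clamping or rescaling the offending layer's activations (or keeping that layer on CPU/GPU) typically avoids the NaN.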
Posted by derekg. Last updated.
Post marked as solved
4 Replies
2.8k Views
Hi, I would love to code for the Neural Engine on my MacBook Pro M1 2020. Is there any low-level API to create my very own workloads? I am working with audio and MIDI, as well as sound synthesis and mixing. Can I use the Neural Engine to offload the CPU? I am especially interested in parallelism using threads. My programming languages of choice are ANSI C and Objective-C.
Posted by joel2001k. Last updated.
Post not yet marked as solved
2 Replies
647 Views
I would like to generate and run an ML program inside an app. I'm familiar with coremltools and the MIL format; however, I can't seem to find any resources on how to generate mlmodel/mlpackage files using Swift on the device. Is there any Swift equivalent of coremltools? Or is there a way to translate the MIL description of an ML program into an instance of MLModel, or something similar?
Posted by mlajtos. Last updated.
Post not yet marked as solved
0 Replies
579 Views
Using an LSTM model for finance predictions, I found these benchmark results:

TF 2.7 GPU - 188 seconds (tensorflow-metal 0.1.2)
TF 2.5 GPU - 149 seconds (tensorflow-metal 0.1.2)

The slowness is expected due to a small batch size.

TF 2.5 CPU - 6.91 seconds
TF 2.5 CPU - 4.66 seconds (added disable_eager_execution())
TF 2.7 CPU - 4.47 seconds

So TF 2.7 (master) is about 4% faster on the CPU. The Metal plugin is much slower with TF 2.7, but at least it works to enable the GPU. Apple should make the sources for tensorflow-metal available on git and keep it updated for each main TF release, such as the current 2.6.
Posted. Last updated.
Post not yet marked as solved
1 Reply
653 Views
My M1 MBP is not using the full GPU power when running TensorFlow. It's taking about 6 seconds per epoch, when for the same task others get 1 second per epoch. When running training, the Mac doesn't get hot and the fans don't rev. Any advice? Thanks, Logan
Posted by LogqnTy. Last updated.