ML Compute


Accelerate training and validation of neural networks using the CPU and GPUs.

ML Compute Documentation

Posts under ML Compute tag

36 results found
Post not yet marked as solved
17 Views

Illegal instruction: 4

I'm trying to run a Python file in VS Code but get the error mentioned in the title. I am training a deep learning model and importing libraries such as TensorFlow, NumPy, pandas, and Matplotlib. The only output I get is "Illegal instruction: 4", nothing else. The same code works fine on Windows. Please help.
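On Apple silicon, "Illegal instruction: 4" at import time often points to an interpreter or package built for a different CPU architecture. A minimal check (a sketch, assuming a typical miniforge/conda setup; not code from the post):

import platform

# On an Apple silicon Mac running natively this should print 'arm64';
# 'x86_64' means the interpreter is an Intel build running under Rosetta,
# and wheels compiled with unsupported instructions can crash with "Illegal instruction".
print(platform.machine())
print(platform.python_version())

import tensorflow as tf  # if the crash happens on this import, the installed wheel does not match the machine
print(tf.__version__)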
Asked by himaws72. Last updated.
Post not yet marked as solved
56 Views

Tensorflow M1 Max Metal 0.4 convergence problems

We are developing a simple GAN, and when training it, the convergence behavior of the discriminator differs when we use the GPU compared with using only the CPU, or even running in Colab. We've read a lot, but this is the only post that seems to describe similar behavior. Unfortunately, after updating to version 0.4 the problem persists.
My hardware/software: MacBook Pro, model MacBookPro18,2; Chip: Apple M1 Max; Cores: 10 (8 performance and 2 efficiency); Memory: 64 GB; Firmware: 7459.101.3; OS: Monterey 12.3.1; OS version: 7459.101.3.
Python 3.8, with the most relevant libraries from pip freeze: keras==2.8.0, Keras-Preprocessing==1.1.2, ..., tensorboard==2.8.0, tensorboard-data-server==0.6.1, tensorboard-plugin-wit==1.8.1, tensorflow-datasets==4.5.2, tensorflow-docs @ git+https://github.com/tensorflow/docs@7d5ea2e986a4eae7573be3face00b3cccd4b8b8b, tensorflow-macos==2.8.0, tensorflow-metadata==1.7.0, tensorflow-metal==0.4.0.
CODE TO REPRODUCE: the code does not fit in the maximum space of this message, so I've shared a Google Colab notebook at https://colab.research.google.com/drive/1oDS8EV0eP6kToUYJuxHf5WCZlRL0Ypgn?usp=sharing. You can easily see that the loss goes to 0 after 1 or 2 epochs when the GPU is enabled, but if the GPU is disabled everything is OK.
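One way to isolate whether the Metal GPU backend is the variable here (a minimal sketch, not code from the post) is to hide the GPU from TensorFlow and rerun the exact same training loop on the CPU:

import tensorflow as tf

# Must run before any ops are created; TensorFlow then falls back to the CPU,
# so the same training loop can be compared against the GPU run.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())  # should list only the CPU device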
Asked. Last updated.
Post not yet marked as solved
88 Views

TensorFlow installed with PDM not detecting GPU with M1 Pro

I tried to install dependencies with PDM for a project and found that TensorFlow does not detect the GPU of the M1 Pro. When I create a virtual environment with Poetry, I do not have the same problem. Any hint on how to solve this?
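A quick way to confirm whether the tensorflow-metal plugin is visible to the interpreter that PDM selected (a minimal sketch, not code from the post):

import tensorflow as tf

# With tensorflow-macos and tensorflow-metal correctly installed in this environment,
# one GPU device should be listed; an empty list means the Metal plugin was not picked up.
print(tf.config.list_physical_devices('GPU'))
print(tf.__version__)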
Asked. Last updated.
Post not yet marked as solved
259 Views

tensorflow-metal freezing nondeterministically during model training

I am training a model using tensorflow-metal and model training (and the whole application) freezes up. The behavior is nondeterministic. I believe the problem is with Metal (1) because of the contents of the backtraces below, and (2) because when I run the same code on a machine with non-Metal TensorFlow (using a GPU), everything works fine. I can't share my code publicly, but I would be willing to share it with an Apple engineer privately over email if that would help. It's hard to create a minimal reproducible example since my program is somewhat complex and the bug is nondeterministic, but the bug does appear pretty reliably. It looks like the problem might be in some Metal Performance Shaders init code. The state of everything when the program freezes is attached under "Backtraces".
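The attached backtraces cover the native side; for the Python side of a nondeterministic hang, the standard-library faulthandler module can periodically dump thread stacks so the last dump shows where the interpreter was stuck (a minimal sketch, not code from the post):

import faulthandler
import sys

# Dump all Python thread stacks to stderr every 60 seconds until cancelled.
faulthandler.dump_traceback_later(60, repeat=True, file=sys.stderr)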
Asked by andmis. Last updated.
Post not yet marked as solved
1.2k Views

How to use GPU in Tensorflow?

I'm using my 2020 Mac mini with the M1 chip, and this is the first time I'm trying to use it for convolutional neural network training. The problem is that I installed Python (3.8.12) using miniforge3 and TensorFlow following this instruction, but I'm still facing a GPU problem when training a 3D U-Net. Here's part of my code; I'm hoping to receive some suggestions to fix this.

import tensorflow as tf
from tensorflow import keras
import json
import numpy as np
import pandas as pd
import nibabel as nib
import matplotlib.pyplot as plt
from tensorflow.keras import backend as K
from tensorflow.python.client import device_lib  # needed for list_local_devices() below

# check available devices
def get_available_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]

print(get_available_devices())

Output:
Metal device set to: Apple M1
['/device:CPU:0', '/device:GPU:0']
2022-02-09 11:52:55.468198: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-02-09 11:52:55.468885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

X_norm_with_batch_dimension = np.expand_dims(X_norm, axis=0)
# tf.device('/device:GPU:0')                   # Have tried this line, doesn't work
# tf.debugging.set_log_device_placement(True)  # Have tried this line, doesn't work
patch_pred = model.predict(X_norm_with_batch_dimension)

Error:
InvalidArgumentError: 2 root error(s) found.
(0) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
[[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
[[model/conv3d/Conv3D/_4]]
(1) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
[[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
0 successful operations. 0 derived errors ignored.

The code runs on Google Colab but can't run locally on the Mac mini in a Jupyter notebook. The NHWC tensor format problem might indicate that I'm using my CPU to execute the code instead of the GPU. Is there any way to get the GPU to train the network in TensorFlow?
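One detail worth noting (a sketch reusing model and X_norm_with_batch_dimension from the snippet above; not code from the post): tf.device only takes effect as a context manager, so the commented-out bare call above does nothing. Whether this avoids the Conv3D/NHWC fallback is a separate question.

import tensorflow as tf

# Ops created inside the block are pinned to the Metal GPU device.
with tf.device('/GPU:0'):
    patch_pred = model.predict(X_norm_with_batch_dimension)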
Asked by MW_Shay. Last updated.
Post not yet marked as solved
748 Views

CoreML performance issue on M1 Max with the Neural Engine

I am using a CoreML model from https://github.com/PeterL1n/RobustVideoMatting. I have an M1 MacBook 13" with 16 GB and an M1 Max MacBook 16" with 64 GB. When "computeUnits" is .all or the default, the M1 Max 16" is much slower than the M1 13": one prediction takes 0.202 s versus 0.155 s. Using .cpuOnly, the M1 Max 16" is a little faster: 0.129 s versus 0.146 s. Using .cpuAndGPU, the M1 Max 16" is much faster than the M1 13": 0.057 s versus 0.086 s. And when I use .all or the default, the M1 Max prints error messages like this:
H11ANEDevice::H11ANEDeviceOpen IOServiceOpen failed result= 0xe00002e2
H11ANEDevice::H11ANEDeviceOpen kH11ANEUserClientCommand_DeviceOpen call failed result=0xe00002bc
Error opening LB - status=0xe00002bc.. Skipping LB and retrying
But the M1 13" doesn't have any errors. So I want to know: is this a bug in CoreML or in the M1 Max? My code is like this:

let config = MLModelConfiguration()
config.computeUnits = .all
let model = try rvm_mobilenetv3_1920x1080_s0_25_int8_ANE(configuration: config)
let image1 = NSImage(named: "test1")?.cgImage(forProposedRect: nil, context: nil, hints: nil)
let input = try? rvm_mobilenetv3_1920x1080_s0_25_int8_ANEInput(srcWith: image1!, r1i: MLMultiArray(), r2i: MLMultiArray(), r3i: MLMultiArray(), r4i: MLMultiArray())
_ = try? model.prediction(input: input!)
Asked by Tinyfool. Last updated.
Post not yet marked as solved
12k Views

Why is Python running natively on M1 Max greatly slower than Python on an old Intel i5?

I just got my new MacBook Pro with the M1 Max chip and am setting up Python. I've tried several combinations of settings to test speed, and now I'm quite confused. First, my questions:
1. Why is Python running natively on the M1 Max greatly (~100%) slower than on my old MacBook Pro 2016 with an Intel i5?
2. On the M1 Max, why isn't there a significant speed difference between the native run (by Miniforge) and the run via Rosetta (by Anaconda), which is supposed to be ~20% slower?
3. On the M1 Max with a native run, why isn't there a significant speed difference between conda-installed NumPy and the NumPy installed with Apple's TensorFlow, which is supposed to be faster?
4. On the M1 Max, why is running in the PyCharm IDE consistently ~20% slower than running from the terminal, which doesn't happen on my old Intel Mac?
Evidence supporting my questions is as follows. Here are the settings I've tried:
1. Python installed by: (a) Miniforge-arm64, so that Python runs natively on the M1 Max chip (checked in Activity Monitor: the Kind of the python process is Apple); (b) Anaconda, in which case Python runs via Rosetta (checked in Activity Monitor: the Kind of the python process is Intel).
2. NumPy installed by: (a) conda install numpy, i.e. numpy from the original conda-forge channel, or pre-installed with Anaconda; (b) Apple TensorFlow: with Python installed by Miniforge, I install TensorFlow directly and NumPy is installed along with it. It's said that NumPy installed this way is optimized for Apple M1 and will be faster. Here are the installation commands:
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
3. Run from: (a) the terminal; (b) PyCharm (Apple silicon version).
Here is the test code:

import time
import numpy as np
np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10
timecosts = []
for _ in range(runtimes):
    s_time = time.time()
    for i in range(100):
        a += 1
        np.linalg.svd(a)
    timecosts.append(time.time() - s_time)
print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')

and here are the results (mean runtime in seconds):

+----------------------------------+-----------------------+---------------------+
| Python installed by (run on) →   | Miniforge (native M1) | Anaconda (Rosetta)  |
+---------------------+------------+-----------+-----------+----------+----------+
| NumPy installed by ↓ | Run from → | Terminal  | PyCharm   | Terminal | PyCharm  |
+---------------------+------------+-----------+-----------+----------+----------+
| Apple TensorFlow                 | 4.19151   | 4.86248   | /        | /        |
+----------------------------------+-----------+-----------+----------+----------+
| conda install numpy              | 4.29386   | 4.98370   | 4.10029  | 4.99271  |
+----------------------------------+-----------+-----------+----------+----------+

This is quite slow. For comparison, running the same code on my old MacBook Pro 2016 with the i5 chip costs 2.39917 s. Another post reports that with the M1 chip (not Pro or Max), miniforge + conda-installed NumPy takes 2.53214 s, and miniforge + Apple TensorFlow NumPy takes 1.00613 s. You may also try it on your own machine. Here are the CPU details:
My old i5:
$ sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Intel(R) Core(TM) i5-6360U CPU @ 2.00GHz
machdep.cpu.core_count: 2
My new M1 Max:
% sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Apple M1 Max
machdep.cpu.core_count: 10
I followed the instructions strictly from the tutorials, so why is all this happening? Is it because of flaws in my installation, or because of the M1 Max chip? Since my work relies heavily on local runs, local speed is very important to me.
Any suggestions for a possible solution, or any data points from your own device, would be greatly appreciated :)
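Since the benchmark above spends almost all of its time in np.linalg.svd, it can help to confirm which BLAS/LAPACK build each environment's NumPy actually links against; a minimal check, run in each environment (not code from the post):

import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build links against
# (for example Accelerate versus OpenBLAS), which dominates SVD speed.
np.show_config()
print(np.__version__)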
Asked. Last updated.
Post not yet marked as solved
270 Views

Using CreateML in Playgrounds

I need to build a model to add to my app and tried following the Apple docs here. No luck, because I get an error that is discussed in this thread on the forum. I'm still not clear on why the error is occurring and can't resolve it. I wonder if CreateML inside Playgrounds is still supported at all? I tried using the CreateML app that you can access through developer tools, but it just crashes my Mac (a 2017 MBP - is it just too much of a brick to use for ML at this point? I should think not, because I've recently built and trained relatively simple models using TensorFlow + Python on this machine, and the classifier I'm trying to make now is really simple and doesn't have a huge dataset).
Asked by agaS95. Last updated.
Post not yet marked as solved
305 Views

Using HelloPhotogrammetry CL for eGPU

I am using the default HelloPhotogrammetry app you made: https://developer.apple.com/documentation/realitykit/creating_a_photogrammetry_command-line_app/ My system originally did not meet the specs to run this command-line tool because of a GPU limitation. To solve this I bought the Apple-supported Blackmagic eGPU to satisfy the graphics requirement. Here is the error when I run it despite the eGPU: apply_selection_policy_once: prefer use of removable GPUs (via (null):GPUSelectionPolicy->preferRemovable) I have deduced that the application running it needs this key: https://developer.apple.com/documentation/bundleresources/information_property_list/gpuselectionpolicy I tried modifying the Terminal.plist with the updated value, but had no luck. I believe the command-line tool within Xcode needs to have the updated value; I need help with that aspect so the system can use the eGPU. I did create a property list within the macOS app and added GPUSelectionPolicy with preferRemovable, and I am still getting the same error as above. Please advise. Also, to note: I did try temporarily turning off Prefer External GPU within Terminal, and it was doing the Photogrammetry processing but was taking a while (over 30 minutes), so I ended up killing that task. I had a look at Activity Monitor and saw that my internal GPU was being used, not the eGPU, which is what I am trying to use. Previously, when I did not have the eGPU plugged in, I would get an error saying that my specs did not meet the criteria, so it was interesting to see that it now considered my Mac to meet the criteria (which it technically did); it just did the processing on the less powerful GPU.
Asked. Last updated.
Post not yet marked as solved
389 Views

Can `MTLTexture` be used to store a 5-D input tensor?

I'm trying to implement the PyTorch custom layer grid_sampler (https://pytorch.org/docs/1.9.1/generated/torch.nn.functional.grid_sample.html) on the GPU. Both of its inputs, input and grid, can be 5-D. My implementation of encodeToCommandBuffer, which is a function of the MLCustomLayer protocol, is shown below. In my attempts so far, the values of both id<MTLTexture> input and id<MTLTexture> grid don't meet expectations. So I wonder: can MTLTexture be used to store a 5-D input tensor as an input to encodeToCommandBuffer? Or can anybody show me how to use MTLTexture correctly here? Thanks a lot!

- (BOOL)encodeToCommandBuffer:(id<MTLCommandBuffer>)commandBuffer
                       inputs:(NSArray<id<MTLTexture>> *)inputs
                      outputs:(NSArray<id<MTLTexture>> *)outputs
                        error:(NSError * _Nullable *)error {
  NSLog(@"Dispatching to GPU");
  NSLog(@"inputs count %lu", (unsigned long)inputs.count);
  NSLog(@"outputs count %lu", (unsigned long)outputs.count);
  id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoderWithDispatchType:MTLDispatchTypeSerial];
  assert(encoder != nil);
  id<MTLTexture> input = inputs[0];
  id<MTLTexture> grid = inputs[1];
  id<MTLTexture> output = outputs[0];
  NSLog(@"inputs shape %lu, %lu, %lu, %lu", (unsigned long)input.width, (unsigned long)input.height, (unsigned long)input.depth, (unsigned long)input.arrayLength);
  NSLog(@"grid shape %lu, %lu, %lu, %lu", (unsigned long)grid.width, (unsigned long)grid.height, (unsigned long)grid.depth, (unsigned long)grid.arrayLength);
  if (encoder)
  {
    [encoder setTexture:input atIndex:0];
    [encoder setTexture:grid atIndex:1];
    [encoder setTexture:output atIndex:2];
    NSUInteger wd = grid_sample_Pipeline.threadExecutionWidth;
    NSUInteger ht = grid_sample_Pipeline.maxTotalThreadsPerThreadgroup / wd;
    MTLSize threadsPerThreadgroup = MTLSizeMake(wd, ht, 1);
    MTLSize threadgroupsPerGrid = MTLSizeMake((input.width + wd - 1) / wd, (input.height + ht - 1) / ht, input.arrayLength);
    [encoder setComputePipelineState:grid_sample_Pipeline];
    [encoder dispatchThreadgroups:threadgroupsPerGrid threadsPerThreadgroup:threadsPerThreadgroup];
    [encoder endEncoding];
  }
  else
    return NO;
  *error = nil;
  return YES;
}
Asked by stx-000. Last updated.
Post not yet marked as solved
507 Views

Tensorflow LSTM gives different results when changing batch size, running on an M1 Max Mac

When running the same code on my M1 Mac with tensorflow-metal versus in Google Colab, I see a problem with the results. The code: https://colab.research.google.com/drive/13GzSfToUvmmGHaROS-sGCu9mY1n_2FYf?usp=sharing

import tensorflow as tf
import numpy as np
import pandas as pd

# Setup model
input_shape = (10, 5)
model_tst = tf.keras.Sequential()
model_tst.add(tf.keras.Input(shape=input_shape))
model_tst.add(tf.keras.layers.LSTM(100, return_sequences=True))
model_tst.add(tf.keras.layers.Dense(2, activation="sigmoid"))
model_tst.summary()

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)
model_tst.compile(
    loss=loss,
    optimizer=optimizer,
    # metrics=[tf.keras.metrics.BinaryCrossentropy()]
    metrics=["mse"]
)

# Generate step data
random_input = np.ones((11, 10, 5))
random_input[:, 8:, :] = 99

# Predictions
random_output2 = model_tst.predict(random_input, batch_size=1)[0, :, :].reshape(10, 2)
random_output3 = model_tst.predict(random_input, batch_size=10)[0, :, :].reshape(10, 2)

# Compare results
diff2 = random_output3 - random_output2
pd.DataFrame(diff2).T

Output on Mac: (screenshot attached) Output on Google Colab: (screenshot attached) If I reduce the number of units in the LSTM layer, the problem disappears: the same code with tf.keras.layers.LSTM(2, return_sequences=True) in place of LSTM(100) produces identical outputs for both batch sizes. I guess this has to do with how calculations are getting passed to Apple silicon. Any debugging steps I should try to resolve this problem? Info: I set up TensorFlow using the following steps: https://developer.apple.com/metal/tensorflow-plugin/ When running, I get output showing that the GPU plugin is being used.
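A compact way to quantify the batch-size discrepancy rather than comparing screenshots (a sketch reusing model_tst and random_input from the code above; not code from the post):

import numpy as np

out_b1 = model_tst.predict(random_input, batch_size=1)
out_b10 = model_tst.predict(random_input, batch_size=10)

# Report the largest element-wise deviation; on a correctly working backend this
# should be at floating-point noise level, and the assertion should pass.
print('max abs difference:', np.max(np.abs(out_b1 - out_b10)))
np.testing.assert_allclose(out_b1, out_b10, rtol=1e-4, atol=1e-6)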
Asked. Last updated.
Post marked as solved
616 Views

How to dispatch my `MLCustomLayer` to GPU instead of CPU

My MLCustomLayer implementation always dispatches to the CPU instead of the GPU.

Background: I am trying to run my CoreML model with a custom layer on the iPhone 13 Pro. My custom layer runs successfully on the CPU, but it still dispatches to the CPU instead of the phone's GPU, despite the encodeToCommandBuffer member function being defined in the application's binding class for the custom layer. I have been following the CoreMLTools documentation's suggested Swift example to get this working, but note that my implementation is purely in Objective-C++. Despite reading the documentation in depth, I still have not come across any resolution to the problem. Any help looking into this issue (or perhaps even a bug in CoreML) would be much appreciated! Below I provide a minimal example based on the Swift example mentioned above.

Implementation: My toy Objective-C++ implementation is based on the Swift example here. It implements the Swish activation function for both the CPU and GPU.

PyTorch model to CoreML MLModel transformation: For brevity, I will not define my toy PyTorch model, nor the Python bindings that allow the custom Swish layer to be scripted/traced and then converted to a CoreML MLModel, but I can provide these if necessary. Just note that the Python layer's name and bindings should match the name in the class defined below, i.e. ToySwish. To convert the scripted/traced PyTorch model (called torchscript_model in the listing below) to a CoreML MLModel, I use CoreMLTools (from Python) and then save the model as follows:

input_shapes = [[1, 64, 256, 256]]
mlmodel = coremltools.converters.convert(
    torchscript_model,
    source='pytorch',
    inputs=[coremltools.TensorType(name=f'input_{i}', shape=input_shape)
            for i, input_shape in enumerate(input_shapes)],
    add_custom_layers=True,
    minimum_deployment_target=coremltools.target.iOS14,
    compute_units=coremltools.ComputeUnit.CPU_AND_GPU,
)
mlmodel.save('toy_swish_model.mlmodel')

Metal shader: I use the same Metal shader function swish from Swish.metal here.

MLCustomLayer binding class for the Swish MLModel layer: I define an Objective-C++ class analogous to the Swift example. This class inherits from NSObject and the MLCustomLayer protocol, and follows the guidelines in the Apple documentation for integrating a CoreML MLModel with a custom layer. It is defined as follows.

Class definition and resource setup:

#import <Foundation/Foundation.h>
#include <CoreML/CoreML.h>
#import <Metal/Metal.h>

@interface ToySwish : NSObject<MLCustomLayer>{}
@end

@implementation ToySwish{
  id<MTLComputePipelineState> swishPipeline;
}

- (instancetype) initWithParameterDictionary:(NSDictionary<NSString *,id> *)parameters error:(NSError *__autoreleasing _Nullable *)error{
  NSError* errorPSO = nil;
  id<MTLDevice> device = MTLCreateSystemDefaultDevice();
  id<MTLLibrary> defaultlibrary = [device newDefaultLibrary];
  id<MTLFunction> swishFunction = [defaultlibrary newFunctionWithName:@"swish"];
  swishPipeline = [device newComputePipelineStateWithFunction:swishFunction error:&errorPSO];
  assert(errorPSO == nil);
  return self;
}

- (BOOL) setWeightData:(NSArray<NSData *> *)weights error:(NSError *__autoreleasing _Nullable *) error{
  return YES;
}

- (NSArray<NSArray<NSNumber *> * > *) outputShapesForInputShapes:(NSArray<NSArray<NSNumber *> *> *)inputShapes error:(NSError *__autoreleasing _Nullable *) error{
  return inputShapes;
}

CPU compute method (shown only for completeness):

- (BOOL) evaluateOnCPUWithInputs:(NSArray<MLMultiArray *> *)inputs outputs:(NSArray<MLMultiArray *> *)outputs error:(NSError *__autoreleasing _Nullable *)error{
  NSLog(@"Dispatching to CPU");
  for(NSInteger i = 0; i < inputs.count; i++){
    NSInteger num_elems = inputs[i].count;
    float* input_ptr = (float *) inputs[i].dataPointer;
    float* output_ptr = (float *) outputs[i].dataPointer;
    for(int j = 0; j < num_elems; j++){
      output_ptr[j] = 1.0/(1.0 + exp(-input_ptr[j]));
    }
  }
  return YES;
}

Encode GPU commands to the command buffer (note: according to the documentation, this command buffer should not be committed, as it is executed by CoreML after this method returns):

- (BOOL) encodeToCommandBuffer:(id<MTLCommandBuffer>)commandBuffer inputs:(NSArray<id<MTLTexture>> *)inputs outputs:(NSArray<id<MTLTexture>> *)outputs error:(NSError *__autoreleasing _Nullable *)error{
  NSLog(@"Dispatching to GPU");
  id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoderWithDispatchType:MTLDispatchTypeSerial];
  assert(computeEncoder != nil);
  for(int i = 0; i < inputs.count; i++){
    [computeEncoder setComputePipelineState:swishPipeline];
    [computeEncoder setTexture:inputs[i] atIndex:0];
    [computeEncoder setTexture:outputs[i] atIndex:1];
    NSInteger w = swishPipeline.threadExecutionWidth;
    NSInteger h = swishPipeline.maxTotalThreadsPerThreadgroup / w;
    MTLSize threadGroupSize = MTLSizeMake(w, h, 1);
    NSInteger groupWidth = (inputs[0].width + threadGroupSize.width - 1) / threadGroupSize.width;
    NSInteger groupHeight = (inputs[0].height + threadGroupSize.height - 1) / threadGroupSize.height;
    NSInteger groupDepth = (inputs[0].arrayLength + threadGroupSize.depth - 1) / threadGroupSize.depth;
    MTLSize threadGroups = MTLSizeMake(groupWidth, groupHeight, groupDepth);
    [computeEncoder dispatchThreads:threadGroups threadsPerThreadgroup:threadGroupSize];
    [computeEncoder endEncoding];
  }
  return YES;
}

Run inference for a given input: The MLModel is loaded and compiled in the application. I check that the model configuration's computeUnits is set to MLComputeUnitsAll, as desired (this should allow dispatching of the MLModel layers to the CPU, GPU and ANE). I define an MLDictionaryFeatureProvider object called feature_provider from an NSMutableDictionary of input features (input tensors in this case), and then pass this to the predictionFromFeatures method of my loaded model model as follows:

@autoreleasepool {
  [model predictionFromFeatures:feature_provider error:error];
}

This computes a single forward pass of my model. When this executes, you can see that the 'Dispatching to CPU' string is printed instead of the 'Dispatching to GPU' string. This (along with the slow execution time) indicates the Swish layer is being run from the evaluateOnCPUWithInputs method, and thus on the CPU instead of the GPU as expected. I am quite new to developing for iOS and to Objective-C++, so I might have missed something quite simple, but from reading the documentation and examples it is not at all clear to me what the issue is. Any help or advice would be really appreciated :)

Environment: Xcode 13.1, iPhone 13, iOS 15.1.1, iOS deployment target 15.0.
Asked. Last updated.
Post marked as solved
1.2k Views

M1 Pro: after installing TF based on the official Apple guide, even importing TF gives an ERROR

SYSTEM: MacBook Pro 14" (M1, Apple silicon), macOS 12.0.1.
DONE: I followed https://developer.apple.com/metal/tensorflow-plugin/ for ARM M1 Apple silicon on my 14" machine. (I had Anaconda installed before, which may be causing the error, but I DO NOT want to delete my Anaconda altogether.)
CODE:
import tensorflow as tf
ERROR:
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~/miniforge3/lib/python3.9/site-packages/numpy/core/__init__.py in <module>
     21 try:
---> 22     from . import multiarray
     23 except ImportError as exc:
~/miniforge3/lib/python3.9/site-packages/numpy/core/multiarray.py in <module>
     11
---> 12 from . import overrides
     13 from . import _multiarray_umath
~/miniforge3/lib/python3.9/site-packages/numpy/core/overrides.py in <module>
      6
----> 7 from numpy.core._multiarray_umath import (
      8     add_docstring, implement_array_function, _get_implementing_args)
ImportError: dlopen(/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
  Referenced from: /Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so
  Reason: tried: '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/usr/local/lib/libcblas.3.dylib' (no such file), '/usr/lib/libcblas.3.dylib' (no such file)

During handling of the above exception, another exception occurred:
ImportError                               Traceback (most recent call last)
/var/folders/yp/mq9ddgh54gjg2rp7mw_t015c0000gn/T/ipykernel_95218/3793406994.py in <module>
----> 1 import tensorflow as tf
~/miniforge3/lib/python3.9/site-packages/tensorflow/__init__.py in <module>
     39 import sys as _sys
     40
---> 41 from tensorflow.python.tools import module_util as _module_util
     42 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
     43
~/miniforge3/lib/python3.9/site-packages/tensorflow/python/__init__.py in <module>
     39
     40 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
---> 41 from tensorflow.python.eager import context
     42
     43 # pylint: enable=wildcard-import
~/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/context.py in <module>
     28
     29 from absl import logging
---> 30 import numpy as np
     31 import six
     32
~/miniforge3/lib/python3.9/site-packages/numpy/__init__.py in <module>
    138 from . import _distributor_init
    139
--> 140 from . import core
    141 from .core import *
    142 from . import compat
~/miniforge3/lib/python3.9/site-packages/numpy/core/__init__.py in <module>
     46 """ % (sys.version_info[0], sys.version_info[1], sys.executable,
     47       __version__, exc)
---> 48 raise ImportError(msg)
     49 finally:
     50     for envkey in env_added:
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed. We have compiled some common reasons and troubleshooting tips at: https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "/Users/ps/miniforge3/bin/python"
* The NumPy version is: "1.19.5"
and make sure that they are the versions you expect. Please carefully study the documentation linked above for further help.
Original error was: dlopen(/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
  Referenced from: /Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so
  Reason: tried: '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/lib/libcblas.3.dylib' (no such file), '/Users/ps/miniforge3/bin/../lib/libcblas.3.dylib' (no such file), '/usr/local/lib/libcblas.3.dylib' (no such file), '/usr/lib/libcblas.3.dylib' (no such file)
Asked by sogu. Last updated.
Post not yet marked as solved
853 Views

ML Compute C APIs?

ML Compute APIs - https://developer.apple.com/documentation/mlcompute are in Swift. Are there C APIs for ML Compute?
Asked by danzheng. Last updated.
Post not yet marked as solved
279 Views

MLCLSTMLayer GPU Issue

Hello! I’m having an issue with retrieving the trained weights from MLCLSTMLayer in ML Compute when training on a GPU. I maintain references to the input-weights, hidden-weights, and biases tensors and use the following code to extract the data post-training:

extension MLCTensor {
    func dataArray<Scalar>(as _: Scalar.Type) throws -> [Scalar] where Scalar: Numeric {
        let count = self.descriptor.shape.reduce(into: 1) { (result, value) in
            result *= value
        }
        var array = [Scalar](repeating: 0, count: count)
        self.synchronizeData() // This *should* copy the latest data from the GPU to memory that’s accessible by the CPU
        _ = try array.withUnsafeMutableBytes { (pointer) in
            guard let data = self.data else {
                throw DataError.uninitialized // A custom error that I declare elsewhere
            }
            data.copyBytes(to: pointer)
        }
        return array
    }
}

The issue is that when I call dataArray(as:) on a weights or biases tensor for an LSTM layer that has been trained on a GPU, the values that it retrieves are the same as they were before training began. For instance, if I initialize the biases all to 0 and then train the LSTM layer on a GPU, the biases values seemingly remain 0 post-training, even though the reported loss values decrease as you would expect. This issue does not occur when training an LSTM layer on a CPU, and it also does not occur when training a fully-connected layer on a GPU. Since both types of layers work properly on a CPU but only MLCFullyConnectedLayer works properly on a GPU, it seems that the issue is a bug in ML Compute’s GPU implementation of MLCLSTMLayer specifically. For reference, I’m testing my code on M1 Max. Am I doing something wrong, or is this an actual bug that I should report in Feedback Assistant?
Asked. Last updated.