tensorflow-metal


TensorFlow accelerates machine learning model training with Metal on Mac GPUs.


Posts under tensorflow-metal tag

126 Posts
0 Replies
484 Views
Working Environment
MacBook Pro 14" with M2 Pro chip, macOS Sonoma 14.0, Python 3.11.4, tensorflow 2.14.0, tensorflow-macos 2.14.0, tensorflow-metal 1.1.0

Issue Description
Hi there! I ran into an issue while working with Keras' TextVectorization preprocessing layer:

    text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x['title']))

The inputs are strings. Here is the traceback:

--------------------------------------------------------------------------- NotFoundError Traceback (most recent call last) /Users/ken/Workspaces/MLE101/tfrs101/preprocess.ipynb Cell 13 line 3 1 # with tf.device('/CPU:0'): 2 text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf") ----> 3 text_vectorization.adapt(ds.map(lambda x: x['title'])) File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py:473, in TextVectorization.adapt(self, data, batch_size, steps) 423 def adapt(self, data, batch_size=None, steps=None): 424 """Computes a vocabulary of string terms from tokens in a dataset. 425 426 Calling `adapt()` on a `TextVectorization` layer is an alternative to (...) 471 argument is not supported with array inputs. 
472 """ --> 473 super().adapt(data, batch_size=batch_size, steps=steps) File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps) 256 with data_handler.catch_stop_iteration(): 257 for _ in data_handler.steps(): --> 258 self._adapt_function(iterator) 259 if data_handler.should_sync: 260 context.async_wait() File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs) 151 except Exception as e: 152 filtered_tb = _process_traceback_frames(e.__traceback__) --> 153 raise e.with_traceback(filtered_tb) from None 154 finally: 155 del filtered_tb File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:60, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name) 53 # Convert any objects of type core_types.Tensor to Tensor. 54 inputs = [ 55 tensor_conversion_registry.convert(t) 56 if isinstance(t, core_types.Tensor) 57 else t 58 for t in inputs 59 ] ---> 60 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, 61 inputs, attrs, num_outputs) 62 except core._NotOkStatusException as e: 63 if name is not None: NotFoundError: Graph execution error: Detected at node StringSplit/stack defined at (most recent call last): ... No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}} (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0" . 
Registered: device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN] device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT64] device='CPU'; Tdim in [DT_INT32] device='CPU'; Tdim in 
[DT_INT64] [[StringSplit/stack]] [Op:__inference_adapt_step_71204]

I have to explicitly place the layer on the CPU to make it work:

    with tf.device('/CPU:0'):
        text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
        text_vectorization.adapt(ds.map(lambda x: x['title']))

I have referred to this post: https://developer.apple.com/forums/thread/700108
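A minimal self-contained version of the CPU workaround, assuming a tiny made-up dataset in place of the poster's ds (the titles are hypothetical). The error says there is no GPU kernel for ExpandDims on DT_STRING, so this sketch keeps both adapt() and the call on the layer inside the CPU scope; the rest of a model could still be placed on the Metal GPU.

```python
import tensorflow as tf

# Hypothetical stand-in for the poster's dataset: dicts with a "title" string field.
ds = tf.data.Dataset.from_tensor_slices(
    {"title": ["the cat sat", "the dog ran", "a cat ran home"]}
)

# Keep the string ops (StringSplit, ExpandDims on strings, ...) on the CPU,
# where their kernels are registered.
with tf.device("/CPU:0"):
    text_vectorization = tf.keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x["title"]).batch(2))
    features = text_vectorization(tf.constant(["the cat ran"]))

vocab = text_vectorization.get_vocabulary()
print(vocab)
# One TF-IDF row per input string, one column per vocabulary entry.
print(features.shape)
```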
2 Replies
447 Views
Hi, I've been going over this autoencoder tutorial: https://www.tensorflow.org/tutorials/generative/autoencoder#third_example_anomaly_detection

Notebook link: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/autoencoder.ipynb

When I downloaded and ran the notebook locally on my M2 Pro Max, the results were dramatically different and the plots were way off.

This is the plot in the working notebook: [image not included]
This is the local plot: [image not included]

I checked every moving piece, and the difference seems to be in the output of the autoencoder, from these lines:

    encoded_data = autoencoder.encoder(normal_test_data).numpy()
    decoded_data = autoencoder.decoder(encoded_data).numpy()

The working notebook output is: [image not included]
The local output: [image not included]

The overall results are:
notebook: Accuracy = 0.944, Precision = 0.9941176470588236, Recall = 0.9053571428571429
local: Accuracy = 0.44, Precision = 0.0, Recall = 0.0

I'm using a Mac M2 Pro Max, Python 3.10.12, and TensorFlow 2.14.0. Can anyone help? Thanks a lot in advance.
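One way to confirm that the divergence comes from the Metal device rather than from the data or the code is to run the same trained model on the CPU and on the GPU and compare the outputs. A minimal sketch, assuming a small hypothetical dense autoencoder (run_on is a hypothetical helper and the layer sizes are arbitrary, not the tutorial's); in the notebook, the tutorial's autoencoder and normal_test_data would be passed instead.

```python
import numpy as np
import tensorflow as tf

def run_on(device_name, model, x):
    """Run one forward pass with the computation placed on the given device."""
    with tf.device(device_name):
        return model(x, training=False).numpy()

# Hypothetical small autoencoder; sizes chosen only for illustration.
tf.random.set_seed(0)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(140,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(140, activation="sigmoid"),
])
x = np.random.default_rng(0).random((4, 140)).astype("float32")

cpu_out = run_on("/CPU:0", model, x)
if tf.config.list_physical_devices("GPU"):
    gpu_out = run_on("/GPU:0", model, x)
    # Tiny float differences are normal; large ones point at a device-specific problem.
    print("max |CPU - GPU|:", float(np.max(np.abs(cpu_out - gpu_out))))
else:
    print("No GPU visible; computed the CPU output only.")
```

If the two outputs differ by far more than rounding error, that narrows the problem to the GPU path and is useful detail for a bug report.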
0 Replies
391 Views
After installing TensorFlow, I get an error when I run:

    print("Hello")
    import tensorflow as tf

Process finished with exit code 132 (interrupted by signal 4: SIGILL)

MacBook Air 2022 (M2), macOS 14.1, TensorFlow latest version, Python 3.11.5.

Can anyone help me, please? I have tried different variants of TensorFlow (for Mac, for CPU, and other versions). I have also tried Anaconda and Miniconda, but I can't get it to work: Process finished with exit code 132 (interrupted by signal 4: SIGILL)
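SIGILL (signal 4, illegal instruction) means the process tried to execute a CPU instruction the processor does not support. One commonly reported cause on Apple silicon, offered here as an assumption rather than a diagnosis, is an Intel (x86_64) Python running under Rosetta 2 that loads a TensorFlow build compiled with AVX instructions, which Rosetta 2 did not translate on the macOS versions discussed here. A minimal sketch for checking which kind of interpreter is running; running_under_rosetta and interpreter_report are hypothetical helpers.

```python
import platform
import subprocess
import sys

def running_under_rosetta():
    """Return True if this process is an x86_64 binary translated by Rosetta 2 (macOS only)."""
    if platform.system() != "Darwin":
        return False
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return False
    # The key reads "1" for translated processes and "0" for native ones;
    # it does not exist on Intel Macs, so stdout is empty there.
    return result.stdout.strip() == "1"

def interpreter_report():
    """Summarize the running Python interpreter."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),  # "arm64" for a native Apple silicon build
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "rosetta": running_under_rosetta(),
    }

print(interpreter_report())
```

On an M2 Mac, a report with machine "x86_64" or rosetta True would suggest reinstalling with a native arm64 Python before trying more TensorFlow variants.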
Posted by toniX.
0 Replies
318 Views
I have tried different variants of TensorFlow (for Mac, for CPU, and other versions). I have also used Anaconda and Miniconda, but I can't get it to work. Process finished with exit code 132 (interrupted by signal 4: SIGILL)
Posted by toniX.
0 Replies
486 Views
I have tried many different variants: every version of the tensorflow module (for Mac, for CPU...), and both Anaconda and Miniconda. In the end, I still can't get it to work. Please help me.
Posted by toniX.
1 Replies
492 Views
I've been running TensorFlow with Python 3.9 to train a CNN model, with training accelerated by the GPU. After 80 epochs, the process went to sleep (status S) and its GPU usage dropped to 0 percent. I am wondering whether this training process crashed the GPU, or whether the OS is forcing the process to sleep because it takes up too much GPU time. Thanks a lot!
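To see exactly when progress stops (and whether it resumes), a per-epoch heartbeat that prints a wall-clock timestamp can help before digging into GPU or OS behavior. A minimal sketch; EpochHeartbeat is a hypothetical callback, and the tiny synthetic model only exercises it.

```python
import time

import numpy as np
import tensorflow as tf

class EpochHeartbeat(tf.keras.callbacks.Callback):
    """Hypothetical helper: record and print elapsed time at the end of each epoch."""

    def __init__(self):
        super().__init__()
        self.epoch_times = []
        self._start = None

    def on_train_begin(self, logs=None):
        self._start = time.monotonic()

    def on_epoch_end(self, epoch, logs=None):
        elapsed = time.monotonic() - self._start
        self.epoch_times.append(elapsed)
        # flush=True so the line appears immediately even when output is redirected to a file.
        print(f"epoch {epoch + 1} done at {time.strftime('%H:%M:%S')} "
              f"({elapsed:.1f}s since start)", flush=True)

# Tiny synthetic run, only to demonstrate the callback.
x = np.random.default_rng(0).random((32, 4)).astype("float32")
y = x.sum(axis=1, keepdims=True)
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
heartbeat = EpochHeartbeat()
model.fit(x, y, epochs=2, batch_size=8, verbose=0, callbacks=[heartbeat])
```

Passing the callback to the real fit call shows the last epoch that finished and how long each one took before the stall.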
Posted by chaoyi240.
2 Replies
474 Views
I have been following the instructions here: https://developer.apple.com/metal/tensorflow-plugin/

I managed to complete step 1 (set up the environment) and step 2 (install base TensorFlow), but when I try step 3 (install the tensorflow-metal plug-in) with "python -m pip install tensorflow-metal", I get the following messages:

ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal

What am I missing here? The commands I used are as follows:

Step 1
    python3 -m venv ~/venv-metal
    source ~/venv-metal/bin/activate
    python -m pip install -U pip
Step 2
    python -m pip install tensorflow
Step 3
    python -m pip install tensorflow-metal
Posted by yauhooi.
0 Replies
468 Views
Hi, when I try to train ResNet-50 with tensorflow-metal, I found that the L2 regularizer makes each epoch take almost 4x as long (~220ms instead of 60ms). I'm on a 16" M1 Max MBP. It seems like regularization shouldn't add that much time; is there anything I can do to make it faster? Here's some sample code that reproduces the issue:

    import tensorflow as tf
    from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, ZeroPadding2D,\
        Flatten, BatchNormalization, AveragePooling2D, Dense, Activation, Add
    from tensorflow.keras.regularizers import l2
    from tensorflow.keras.models import Model
    from tensorflow.keras import activations
    import random
    import numpy as np

    random.seed(1234)
    np.random.seed(1234)
    tf.random.set_seed(1234)

    batch_size = 64
    (train_im, train_lab), (test_im, test_lab) = tf.keras.datasets.cifar10.load_data()
    train_im, test_im = train_im / 255.0, test_im / 255.0
    train_lab_categorical = tf.keras.utils.to_categorical(
        train_lab, num_classes=10, dtype='uint8')
    train_DataGen = tf.keras.preprocessing.image.ImageDataGenerator()
    train_set_data = train_DataGen.flow(train_im, train_lab, batch_size=batch_size, shuffle=False)

    # Change this to l2 for it to train much slower
    regularizer = None  # l2(0.001)

    def res_identity(x, filters):
        x_skip = x
        f1, f2 = filters
        x = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)
        x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)
        x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Add()([x, x_skip])
        x = Activation(activations.relu)(x)
        return x

    def res_conv(x, s, filters):
        x_skip = x
        f1, f2 = filters
        x = Conv2D(f1, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)
        x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)
        x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x_skip = Conv2D(f2, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x_skip)
        x_skip = BatchNormalization()(x_skip)
        x = Add()([x, x_skip])
        x = Activation(activations.relu)(x)
        return x

    input = Input(shape=(train_im.shape[1], train_im.shape[2], train_im.shape[3]), batch_size=batch_size)
    x = ZeroPadding2D(padding=(3, 3))(input)
    x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = res_conv(x, s=1, filters=(64, 256))
    x = res_identity(x, filters=(64, 256))
    x = res_identity(x, filters=(64, 256))
    x = res_conv(x, s=2, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))
    x = res_conv(x, s=2, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_conv(x, s=2, filters=(512, 2048))
    x = res_identity(x, filters=(512, 2048))
    x = res_identity(x, filters=(512, 2048))
    x = AveragePooling2D((2, 2), padding='same')(x)
    x = Flatten()(x)
    x = Dense(10, activation='softmax', kernel_initializer='he_normal')(x)
    model = Model(inputs=input, outputs=x, name='Resnet50')
    opt = tf.keras.optimizers.legacy.SGD(learning_rate=0.01)
    model.compile(loss=tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE), optimizer=opt)
    model.fit(x=train_im, y=train_lab_categorical, batch_size=batch_size, epochs=150, steps_per_epoch=train_im.shape[0]/batch_size)
6 Replies
902 Views
I only get this error when using the JAX Metal device (CPU is fine). It seems to be a problem whenever I update values of an array with .at[...].set(...):

    blocks = blocks.at[i].set( ...

note: see current operation: %2903 = "mhlo.scatter"(%arg3, %2902, %2893) ({ ^bb0(%arg4: tensor<f32>, %arg5: tensor<f32>): "mhlo.return"(%arg5) : (tensor<f32>) -> () }) {indices_are_sorted = true, scatter_dimension_numbers = #mhlo.scatter<update_window_dims = [0, 1], inserted_window_dims = [1], scatter_dims_to_operand_dims = [1]>, unique_indices = true} : (tensor<10x100x4xf32>, tensor<1xsi32>, tensor<10x4xf32>) -> tensor<10x100x4xf32>
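Since the failing op is mhlo.scatter, one possible workaround is to express the same update without a scatter. A minimal sketch with two hypothetical helpers: one uses jax.lax.dynamic_update_slice, the other an elementwise jnp.where select. Whether jax-metal supports these lowerings is an untested assumption; the code below only checks that both match .at[i].set(...) on the default backend. It updates along axis 0 to mirror the Python line; the scatter dimensions in the error suggest the real update may be along axis 1 (for example inside vmap), so the axis is also an assumption.

```python
import jax
import jax.numpy as jnp

def set_row_dus(blocks, i, value):
    """Same result as blocks.at[i].set(value) for in-range i, via dynamic_update_slice."""
    start = (i,) + (0,) * (blocks.ndim - 1)
    # Note: dynamic_update_slice clamps out-of-range starts instead of dropping the update.
    return jax.lax.dynamic_update_slice(blocks, value[None].astype(blocks.dtype), start)

def set_row_where(blocks, i, value):
    """Same result as blocks.at[i].set(value), via an elementwise select."""
    mask = (jnp.arange(blocks.shape[0]) == i).reshape((-1,) + (1,) * (blocks.ndim - 1))
    return jnp.where(mask, value[None].astype(blocks.dtype), blocks)

blocks = jnp.zeros((10, 100, 4))
value = jnp.ones((100, 4))
i = 3

reference = blocks.at[i].set(value)
a = jax.jit(set_row_dus)(blocks, i, value)
b = jax.jit(set_row_where)(blocks, i, value)
print(bool(jnp.array_equal(a, reference)), bool(jnp.array_equal(b, reference)))
```

The where-based version does more arithmetic than a scatter, but it uses only broadcasting and selection.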
Posted by Cemlyn.
9 Replies
9.9k Views
Hi everyone, I found that the GPU performance is not as good as I expected (as slow as a turtle), so I want to switch from GPU to CPU, but the mlcompute module cannot be found, which is weird. The same code takes 156 s per epoch on Colab versus 40 minutes per epoch on my computer (JupyterLab). I only used a small dataset (a few thousand data points), and each epoch only has 20 batches. I am very disappointed; it seems like the "powerful" GPU is a joke. I am using macOS 12.0.1, and my tensorflow-macos version is 2.6.0. Can anyone tell me why this happens?
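To switch to the CPU without mlcompute, the standard TensorFlow device APIs can be used. A minimal sketch, assuming it runs at the very start of the program, before any op or model is created (TensorFlow raises an error if devices are already initialized). To move only one block of code, wrapping it in with tf.device('/CPU:0'): is the per-scope alternative.

```python
import tensorflow as tf

# Hide every GPU, including the Metal PluggableDevice, from this process.
tf.config.set_visible_devices([], "GPU")

visible = tf.config.get_visible_devices()
print([d.device_type for d in visible])  # only CPU devices should remain
```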
5 Replies
2.4k Views
I am aware this question has been asked before, but none of the resolutions have worked for me. When I try to import TensorFlow in my Python 3.9 environment, I get the following error:

    uwewinter@Uwes-MBP % python3
    Python 3.9.10 (main, Jan 15 2022, 11:40:53)
    [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow
    2022-02-09 21:30:01.701794: F tensorflow/c/experimental/stream_executor/stream_executor.cc:808] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "METAL"
    zsh: abort      python3

I have the newest versions of tensorflow-macos and tensorflow-metal installed:

    uwewinter@Uwes-MBP % pip3 list | grep tensorflow
    tensorflow-estimator           2.7.0
    tensorflow-macos               2.7.0
    tensorflow-metal               0.3.0

macOS is the latest:

    uwewinter@Uwes-MBP % sw_vers
    ProductName: macOS
    ProductVersion: 12.2
    BuildVersion: 21D49

The Mac is a 2021 MBP:

    uwewinter@Uwes-MBP % sysctl hw.model
    hw.model: MacBookPro18,3
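"platform is already registered" suggests the Metal plugin is being registered twice. One possible cause, stated as an assumption, is that more than one copy of the plugin library is importable, for example from both a user site-packages and the active environment. A minimal sketch that scans the import path for tensorflow-plugins/libmetal_plugin.dylib (the location shown in another post on this page); find_plugin_copies is a hypothetical helper.

```python
import sys
from pathlib import Path

def find_plugin_copies(search_paths=None, name="libmetal_plugin.dylib"):
    """List every tensorflow-plugins/<name> reachable from the given paths (default: sys.path)."""
    paths = sys.path if search_paths is None else search_paths
    hits = set()
    for entry in paths:
        # An empty sys.path entry means the current directory.
        candidate = Path(entry or ".") / "tensorflow-plugins" / name
        if candidate.is_file():
            hits.add(str(candidate.resolve()))
    return sorted(hits)

copies = find_plugin_copies()
print(copies)
if len(copies) > 1:
    print("More than one copy of the Metal plugin is importable.")
```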
Posted by uwwint.
6 Replies
6.3k Views
Hi, I installed sklearn successfully and ran the MNIST toy example successfully. Then I started to run my project. The funny thing is that everything seems fine at the start (at least no ImportError occurs), but when I make some changes to my code and try to run all cells again (I use JupyterLab), an ImportError occurs:

    ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
      Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
      Reason: tried: '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file), '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file), '/usr/local/lib/liblapack.3.dylib' (no such file), '/usr/lib/liblapack.3.dylib' (no such file)

Then I have to uninstall scipy, sklearn, etc. and reinstall all of them, and my code runs again. Magically, I hate to say. Does anyone know how to solve this problem permanently and make sklearn more stable?
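When the error returns, it can help to check whether liblapack.3.dylib is still present in the environment's lib directory (the first location in the dyld search list above) before reinstalling anything; if it disappears between runs, something is removing it from the environment. A minimal sketch, assuming the active interpreter's sys.prefix is the miniforge3 environment shown in the error; missing_libraries is a hypothetical helper.

```python
import sys
from pathlib import Path

def missing_libraries(names, prefix=sys.prefix):
    """Return the library file names that do not exist in <prefix>/lib."""
    lib_dir = Path(prefix) / "lib"
    return [name for name in names if not (lib_dir / name).exists()]

print(missing_libraries(["liblapack.3.dylib"]))
```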
4 Replies
621 Views
Hi. I followed the instructions here to install TensorFlow with GPU support on my 16-inch 2019 Intel MacBook Pro (with AMD graphics). The installation seems to succeed (I get no errors), but when I test it by running

    import tensorflow as tf

I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/__init__.py", line 445, in <module>
        _ll.load_library(_plugin_dir)
      File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
        py_tf.TF_LoadLibrary(lib)
    tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN10tensorflow16TensorShapeProtoC1ERKS0_
      Referenced from: <C62E0AB4-567E-3E14-8F96-9F07A746C4DC> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
      Expected in: <0B1F231A-6766-3F61-81D9-6782129807A9> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

My environment's packages:

    ...
    numpy 1.26.1
    tensorboard 2.14.1
    tensorboard-data-server 0.7.1
    tensorflow 2.14.0
    tensorflow-estimator 2.14.0
    tensorflow-io-gcs-filesystem 0.34.0
    tensorflow-metal 1.0.0
    ...
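A "Symbol not found" error when loading libmetal_plugin.dylib usually means the plugin expects a different TensorFlow build than the one installed, so the first thing to compare is the exact version pair. For reference, the first post on this page runs tensorflow 2.14.0 with tensorflow-metal 1.1.0, while this environment pairs tensorflow 2.14.0 with tensorflow-metal 1.0.0. A minimal sketch that lists the installed versions of the relevant packages from the active interpreter; installed_versions is a hypothetical helper.

```python
from importlib import metadata

def installed_versions(prefixes=("tensorflow", "jax")):
    """Map installed distribution names starting with any prefix to their versions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith(prefixes):
            found[name.lower()] = dist.version
    return dict(sorted(found.items()))

print(installed_versions())
```

Running it with the same python that fails to import TensorFlow avoids surprises from a pip that belongs to a different environment.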
9 Replies
3.8k Views
I am comparing my M1 MBA with my 2019 16" Intel MBP. The M1 MBA has tensorflow-metal, while the Intel MBP has TF directly from Google. Generally, the same programs run 2-5 times FASTER on the Intel MBP, which presumably has no GPU acceleration. Is there anything I could have done wrong on the M1? Here is the start of the Metal run:

    Metal device set to: Apple M1
    systemMemory: 16.00 GB
    maxCacheSize: 5.33 GB
    2022-01-19 04:43:50.975025: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-01-19 04:43:50.975291: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
    2022-01-19 04:43:51.216306: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    Epoch 1/10
    2022-01-19 04:43:51.298428: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
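Part of such a gap may come from workload size: for small models or batches, per-op dispatch overhead can outweigh GPU throughput, so a CPU can come out ahead. A minimal sketch for timing the same operation on each visible device; time_matmul is a hypothetical helper, and the matrix size and repeat count are arbitrary choices.

```python
import time

import tensorflow as tf

def time_matmul(device_name, n=512, repeats=10):
    """Average seconds per n x n float32 matmul on one device, after a warm-up run."""
    with tf.device(device_name):
        a = tf.random.normal((n, n))
        b = tf.random.normal((n, n))
        tf.linalg.matmul(a, b).numpy()  # warm-up; .numpy() waits for the result
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.linalg.matmul(a, b)
        c.numpy()  # synchronize before stopping the clock
        return (time.perf_counter() - start) / repeats

devices = ["/CPU:0"] + (["/GPU:0"] if tf.config.list_physical_devices("GPU") else [])
timings = {d: time_matmul(d) for d in devices}
print(timings)
```

Trying a few values of n shows where, if anywhere, the GPU starts to win on a given machine.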
7 Replies
1.7k Views
Hello - I have been struggling to find a solution online, and I hope you can help me soon. I have installed the latest tensorflow and tensorflow-metal; I even installed the TensorFlow nightly build. My app generates the following output when calling fit on a CNN model with 8 layers:

    2023-09-29 22:21:06.115768: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
    2023-09-29 22:21:06.115846: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
    2023-09-29 22:21:06.116048: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
    2023-09-29 22:21:06.116264: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2023-09-29 22:21:06.116483: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

Most importantly, the learning process is very slow, and I'd like to take advantage of all the new features of the latest versions. What can I do?
Posted by erezkatz.
0 Replies
562 Views
I could train YOLOv3 on an M2 Mac running macOS Ventura with tensorflow-macos 2.9.0 and tensorflow-metal 0.5. But after I upgraded the system to Sonoma 14.0, I can no longer train the model; I get the errors below. On an M1 Mac I can still train after upgrading to Sonoma 14.0, although it reports "error: 'anec.gain_offset_control' op". The M1 does not hit the last error, "MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)".

When I change my optimizer from Adam to SGD, the "error: 'anec.gain_offset_control' op" disappears, so this error seems to be caused by something in Adam. But I cannot resolve the error "MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)".

ERROR Info
MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>' loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>' loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>' 
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>' loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>' /AppleInternal/Library/BuildRoots/90c9c1ae-37b6-11ee-a991-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSLibrary.mm:550: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)
Posted by cloris.
1 Replies
326 Views
I am trying hard to get some Whisper software running on a Mac under JAX. However, it requires jaxlib>=0.4.14, while the current jax-metal requires jaxlib==0.4.11. Does anyone know if an upgrade is planned?
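While waiting on a jax-metal release, a quick way to see whether the active environment meets a minimum jaxlib version is sketched below. version_tuple, meets_minimum, and installed_version are hypothetical helpers. The comparison only looks at the leading numeric components ("0.4.14.dev..." is treated as 0.4.14), which is a simplification; the packaging library's Version class handles the full versioning rules.

```python
from importlib import metadata

def version_tuple(version):
    """Turn '0.4.11' into (0, 4, 11); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_minimum(installed, required):
    """True if installed >= required, comparing numeric components only."""
    return version_tuple(installed) >= version_tuple(required)

def installed_version(dist_name):
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

jaxlib_version = installed_version("jaxlib")
print("jaxlib:", jaxlib_version)
if jaxlib_version is not None:
    print("satisfies jaxlib>=0.4.14:", meets_minimum(jaxlib_version, "0.4.14"))
```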
Posted by PerEK.
3 Replies
5.1k Views
I'm using my 2020 Mac mini with the M1 chip, and this is the first time I've tried to use it for convolutional neural network training. I installed Python (version 3.8.12) using miniforge3 and TensorFlow following these instructions, but I still face a GPU problem when training a 3D U-Net. Here's part of my code; I'm hoping to receive some suggestions to fix this.

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.python.client import device_lib
    import json
    import numpy as np
    import pandas as pd
    import nibabel as nib
    import matplotlib.pyplot as plt
    from tensorflow.keras import backend as K

    # check available devices
    def get_available_devices():
        local_device_protos = device_lib.list_local_devices()
        return [x.name for x in local_device_protos]

    print(get_available_devices())

Output:

    Metal device set to: Apple M1
    ['/device:CPU:0', '/device:GPU:0']
    2022-02-09 11:52:55.468198: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-02-09 11:52:55.468885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

Prediction:

    X_norm_with_batch_dimension = np.expand_dims(X_norm, axis=0)
    #tf.device('/device:GPU:0')  # Have tried this line; doesn't work
    #tf.debugging.set_log_device_placement(True)  # Have tried this line; doesn't work
    patch_pred = model.predict(X_norm_with_batch_dimension)

Error:

    InvalidArgumentError: 2 root error(s) found.
      (0) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
         [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
         [[model/conv3d/Conv3D/_4]]
      (1) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
         [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
    0 successful operations.
    0 derived errors ignored.

The code runs on Google Colab but not locally on the Mac mini in a Jupyter notebook. The NHWC tensor format problem might indicate that I'm using my CPU to execute the code instead of the GPU. Is there any way to get the GPU to train the network in TensorFlow?
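The error says the CPU Conv3D kernel only accepts the channels-last layout, which suggests (an assumption, since the model code is not shown) that the model was built with data_format="channels_first" and that at least this Conv3D ran on the CPU. One route is to rebuild the model with channels-last layers (the Keras default) and feed it channels-last input. A minimal sketch of the input side only; to_channels_last is a hypothetical helper and the volume size is made up.

```python
import numpy as np

def to_channels_last(volumes):
    """Reorder a 5-D batch from (N, C, D, H, W) to (N, D, H, W, C)."""
    if volumes.ndim != 5:
        raise ValueError(f"expected a 5-D array, got shape {volumes.shape}")
    return np.transpose(volumes, (0, 2, 3, 4, 1))

# Hypothetical input: one 4-channel 16x160x160 volume stored channels-first.
x_channels_first = np.zeros((1, 4, 16, 160, 160), dtype=np.float32)
x_channels_last = to_channels_last(x_channels_first)
print(x_channels_last.shape)  # (1, 16, 160, 160, 4)
```

The layers that depend on the channel axis (BatchNormalization axis, Concatenate axis in the U-Net skip connections) would need the matching channels-last settings as well.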
Posted by MW_Shay.
3 Replies
345 Views
Hi, are there plans to support complex numbers? Something as simple as this:

    def return_complex(x):
        return x*1+1.0j

    x = jnp.ones((10))
    print(return_complex(x))

results in an error.
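Until the Metal backend handles complex dtypes, one workaround is to run just those computations on JAX's CPU backend, which stays available alongside a Metal device. A minimal sketch using the jax.default_device context manager; it assumes the rest of the program can keep its default device.

```python
import jax
import jax.numpy as jnp

def return_complex(x):
    return x * 1 + 1.0j

cpu = jax.devices("cpu")[0]
# Arrays created and computations run inside this block default to the CPU backend.
with jax.default_device(cpu):
    x = jnp.ones((10,))
    result = return_complex(x)

print(result.dtype, result[:3])
```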
1 Replies
360 Views
Good evening! I tried to use Flax nn.ConvTranspose, which calls jax.lax.conv_transpose, but it looks like it isn't implemented correctly for the METAL backend; it works fine on the CPU.

      File "/Users/cemlyn/Documents/VCLless/mnist_vae/venv/lib/python3.11/site-packages/flax/linen/linear.py", line 768, in __call__
        y = lax.conv_transpose(
            ^^^^^^^^^^^^^^^^^^^
    jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: <unknown>:0: error: type of return operand 0 ('tensor<1x8x8x64xf32>') doesn't match function result type ('tensor<1x14x14x64xf32>') in function @main
    <unknown>:0: note: see current operation: "func.return"(%0) : (tensor<1x8x8x64xf32>) -> ()

Versions:

    pip list | grep jax
    jax 0.4.11
    jax-metal 0.0.4
    jaxlib 0.4.11
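Since the same call works on the CPU, one stopgap is to commit the inputs of the transposed convolution to the CPU device; JAX runs a computation where its committed inputs live. A minimal sketch calling lax.conv_transpose directly, with hypothetical shapes chosen so that the expected output matches the 1x14x14x64 result type in the error (a 7x7 map, stride 2, SAME padding). The 3x3 kernel and 32 input channels are assumptions. Inputs come from NumPy so that nothing is created on the Metal device first.

```python
import jax
import numpy as np
from jax import lax

cpu = jax.devices("cpu")[0]
rng = np.random.default_rng(0)

# Committing the inputs to the CPU makes the computation run there.
x = jax.device_put(rng.standard_normal((1, 7, 7, 32), dtype=np.float32), cpu)       # NHWC
kernel = jax.device_put(rng.standard_normal((3, 3, 32, 64), dtype=np.float32), cpu)  # HWIO

y = lax.conv_transpose(
    x, kernel, strides=(2, 2), padding="SAME",
    dimension_numbers=("NHWC", "HWIO", "NHWC"),
)
print(y.shape)  # (1, 14, 14, 64)
```

With SAME padding, each spatial size is multiplied by the stride, which is why 7x7 becomes 14x14.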
Posted by Cemlyn.