tensorflow-metal

Issue with Tensorflow 2.14 on MacOS: No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}}

Working Environment: MacBook Pro 14-inch with M2 Pro chip, macOS Sonoma 14.0, Python 3.11.4, tensorflow 2.14.0, tensorflow-macos 2.14.0, tensorflow-metal 1.1.0.

Issue Description

Hi there! I ran into an issue while working with Keras' TextVectorization preprocessing layer:

    text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x['title']))

The inputs are strings. Here is the traceback:

    ---------------------------------------------------------------------------
    NotFoundError                             Traceback (most recent call last)
    /Users/ken/Workspaces/MLE101/tfrs101/preprocess.ipynb Cell 13 line 3
          1 # with tf.device('/CPU:0'):
          2 text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    ----> 3 text_vectorization.adapt(ds.map(lambda x: x['title']))

    File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py:473, in TextVectorization.adapt(self, data, batch_size, steps)
        423 def adapt(self, data, batch_size=None, steps=None):
        424     """Computes a vocabulary of string terms from tokens in a dataset.
        425
        426     Calling `adapt()` on a `TextVectorization` layer is an alternative to
       (...)
        471         argument is not supported with array inputs.
        472     """
    --> 473     super().adapt(data, batch_size=batch_size, steps=steps)

    File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps)
        256 with data_handler.catch_stop_iteration():
        257     for _ in data_handler.steps():
    --> 258         self._adapt_function(iterator)
        259         if data_handler.should_sync:
        260             context.async_wait()

    File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
        151 except Exception as e:
        152     filtered_tb = _process_traceback_frames(e.__traceback__)
    --> 153     raise e.with_traceback(filtered_tb) from None
        154 finally:
        155     del filtered_tb

    File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:60, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
         53 # Convert any objects of type core_types.Tensor to Tensor.
         54 inputs = [
         55     tensor_conversion_registry.convert(t)
         56     if isinstance(t, core_types.Tensor)
         57     else t
         58     for t in inputs
         59 ]
    ---> 60 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
         61                                     inputs, attrs, num_outputs)
         62 except core._NotOkStatusException as e:
         63     if name is not None:

    NotFoundError: Graph execution error:

    Detected at node StringSplit/stack defined at (most recent call last): ...
    No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}} (OpKernel was found, but attributes didn't match)
    Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0" .
Registered: device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN] device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT64] device='CPU'; Tdim in [DT_INT32] device='CPU'; Tdim in 
[DT_INT64] [[StringSplit/stack]] [Op:__inference_adapt_step_71204]

I have to explicitly place the layer on the CPU to make it work:

    with tf.device('/CPU:0'):
        text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
        text_vectorization.adapt(ds.map(lambda x: x['title']))

I have referred to this post: https://developer.apple.com/forums/thread/700108
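The error message above already names the op, the device, and the requested dtype. As a minimal sketch (the helper name `summarize_missing_kernel` is hypothetical and not part of TensorFlow), those three facts can be pulled out of such a message with the standard library, which makes it easier to see that the failing request is an `ExpandDims` on `DT_STRING` data placed on the GPU:

```python
import re


def summarize_missing_kernel(message: str) -> dict:
    """Extract op name, device type, and requested dtype from a
    "No registered '<op>' OpKernel for '<device>' devices" message."""
    op_match = re.search(
        r"No registered '([^']+)' OpKernel for '([^']+)' devices", message
    )
    # "T=" is the element dtype; "Tdim=" (the axis dtype) is deliberately not matched.
    dtype_match = re.search(r"\bT=(DT_[A-Z0-9_]+)", message)
    return {
        "op": op_match.group(1) if op_match else None,
        "device": op_match.group(2) if op_match else None,
        "dtype": dtype_match.group(1) if dtype_match else None,
    }


# Message text copied from the traceback in the post.
message = (
    "No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node "
    "{{node StringSplit/stack}} (OpKernel was found, but attributes didn't match) "
    "Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false"
)
summary = summarize_missing_kernel(message)
print(summary)
```

This only reads the message; it does not change device placement. The `tf.device('/CPU:0')` block in the post remains the workaround the author reports as working.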

tensorflow-metal

Posted by ValerioL29.

Tensorflow Autoencoders different results between local (M2 Pro Max) and colab / kaggle

Hi,

I've been going through this autoencoder tutorial: https://www.tensorflow.org/tutorials/generative/autoencoder#third_example_anomaly_detection

Notebook link: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/autoencoder.ipynb

When I downloaded the notebook and ran it locally on my M2 Pro Max, the results were dramatically different and the plots were way off.

This is the plot in the working notebook: [image not preserved]

This is the local plot: [image not preserved]

I checked every moving piece, and the difference seems to be in the output of the autoencoder, in these lines:

    encoded_data = autoencoder.encoder(normal_test_data).numpy()
    decoded_data = autoencoder.decoder(encoded_data).numpy()

The working notebook output is: [output not preserved]

The local output: [output not preserved]

The overall results are:

    notebook: Accuracy = 0.944  Precision = 0.9941176470588236  Recall = 0.9053571428571429
    local:    Accuracy = 0.44   Precision = 0.0                 Recall = 0.0

I'm using a Mac M2 Pro Max, Python 3.10.12, TensorFlow 2.14.0.

Can anyone help? Thanks a lot in advance.

tensorflow-metal

Posted by nivmorabin.

An error during installing tensorflow

I get an error when importing TensorFlow. Running

    print("Hello")
    import tensorflow as tf

ends with "Process finished with exit code 132 (interrupted by signal 4: SIGILL)".

Mac Air 2022 M2 14.1 | TensorFlow latest version | Python version 3.11.5

I have tried different variants of TensorFlow (for Mac, for CPU, and other versions). I have also tried Anaconda and Miniconda, but nothing works. Can anyone help me, please?
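SIGILL means the process executed an instruction the CPU (or translation layer) does not support. One commonly reported cause on Apple Silicon is an x86_64 Python interpreter running under Rosetta and loading an x86_64 TensorFlow build. Whether that applies here is an assumption, but it is cheap to check. The sketch below uses only the standard library; the helper name `is_native_arm64` is hypothetical.

```python
import platform
import sys


def is_native_arm64(system: str, machine: str) -> bool:
    """True when a macOS interpreter reports the arm64 architecture."""
    return system == "Darwin" and machine == "arm64"


system = platform.system()
machine = platform.machine()
print(f"Python {sys.version.split()[0]} on {system} ({machine})")

if system == "Darwin" and not is_native_arm64(system, machine):
    # On an M-series Mac, a non-arm64 interpreter is most likely an x86_64 build
    # running through Rosetta translation.
    print("This interpreter is not arm64; consider installing an arm64 Python.")
```

If the interpreter reports `x86_64` on an M2 machine, reinstalling Python (or the conda distribution) as an arm64 build before installing TensorFlow is a reasonable next step to try.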

tensorflow-metal

Posted by toniX.

problem with import tensorflow

I have tried different variants of TensorFlow (for Mac, for CPU, and other versions). I have also used Anaconda and Miniconda, but I can't get the import to work: Process finished with exit code 132 (interrupted by signal 4: SIGILL)

tensorflow-metal

Posted by toniX.

How to install tensorflow on Mac M2

I have tried many different variants. I've tried every version of the tensorflow module (for Mac, for CPU, ...). I have tried Anaconda and Miniconda. In the end, I still can't install it. Please help me.

tensorflow-metal

Posted by toniX.

M1 GPU python process stopped?

I've been running TensorFlow with Python 3.9 to train a CNN model, with the process accelerated by the GPU. After 80 epochs the process went to sleep (status S) and its GPU usage dropped to 0 percent. I am wondering whether this training process crashed the GPU, or whether the OS is forcing the process to sleep because it takes up too much GPU time. Thanks a lot!

Posted by chaoyi240.

Issues with installing Tensorflow on M1 MacBook Pro

I have been following the instructions here: https://developer.apple.com/metal/tensorflow-plugin/

I managed to complete step 1 (set up the environment) and step 2 (install base TensorFlow), but when I try step 3 (install the tensorflow-metal plug-in) with "python -m pip install tensorflow-metal", I get the following messages:

    ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
    ERROR: No matching distribution found for tensorflow-metal

What am I missing here? The commands I used are as follows:

Step 1

    python3 -m venv ~/venv-metal
    source ~/venv-metal/bin/activate
    python -m pip install -U pip

Step 2

    python -m pip install tensorflow

Step 3

    python -m pip install tensorflow-metal
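"from versions: none" means pip found no release file compatible with the running interpreter. pip matches wheels against the interpreter's Python version and platform, so printing those values from inside the activated venv is a useful first diagnostic. This is a sketch using only the standard library; `interpreter_summary` is a hypothetical helper name, and the post does not say which values were actually at fault.

```python
import platform
import sys
import sysconfig


def interpreter_summary() -> dict:
    """Collect the interpreter properties pip uses when choosing a wheel."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "implementation": platform.python_implementation(),
        # e.g. a string combining OS, OS version, and architecture
        "platform": sysconfig.get_platform(),
        "machine": platform.machine(),
    }


for key, value in interpreter_summary().items():
    print(f"{key}: {value}")
```

Comparing this output with the file names listed for tensorflow-metal on PyPI shows whether any published wheel targets the same Python version and architecture.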

tensorflow-metal

Posted by yauhooi.

Tensorflow-metal training with l2 regularizer much slower than without regularizer

Hi,

When I train ResNet-50 with tensorflow-metal, I find that the l2 regularizer makes each epoch take almost 4x as long (~220ms instead of 60ms). I'm on an M1 Max 16" MBP. It seems like regularization shouldn't add that much time. Is there anything I can do to make it faster?

Here's some sample code that reproduces the issue:

    import random

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import activations
    from tensorflow.keras.layers import (
        Activation, Add, AveragePooling2D, BatchNormalization, Conv2D, Dense,
        Flatten, Input, MaxPooling2D, ZeroPadding2D)
    from tensorflow.keras.models import Model
    from tensorflow.keras.regularizers import l2

    random.seed(1234)
    np.random.seed(1234)
    tf.random.set_seed(1234)

    batch_size = 64

    (train_im, train_lab), (test_im, test_lab) = tf.keras.datasets.cifar10.load_data()
    train_im, test_im = train_im / 255.0, test_im / 255.0
    train_lab_categorical = tf.keras.utils.to_categorical(
        train_lab, num_classes=10, dtype='uint8')

    # Note: this generator is created but not passed to model.fit below.
    train_DataGen = tf.keras.preprocessing.image.ImageDataGenerator()
    train_set_data = train_DataGen.flow(
        train_im, train_lab, batch_size=batch_size, shuffle=False)

    # Change this to l2 for it to train much slower
    regularizer = None  # l2(0.001)


    def res_identity(x, filters):
        """Residual block whose skip connection is the unchanged input."""
        x_skip = x
        f1, f2 = filters

        x = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), padding='valid',
                   use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same',
                   use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid',
                   use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)

        x = Add()([x, x_skip])
        x = Activation(activations.relu)(x)
        return x


    def res_conv(x, s, filters):
        """Residual block whose skip connection goes through a strided 1x1 conv."""
        x_skip = x
        f1, f2 = filters

        x = Conv2D(f1, kernel_size=(1, 1), strides=(s, s), padding='valid',
                   use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same',
                   use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid',
                   use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)

        x_skip = Conv2D(f2, kernel_size=(1, 1), strides=(s, s), padding='valid',
                        use_bias=False, kernel_regularizer=regularizer)(x_skip)
        x_skip = BatchNormalization()(x_skip)

        x = Add()([x, x_skip])
        x = Activation(activations.relu)(x)
        return x


    # Named `inputs` to avoid shadowing the built-in input().
    inputs = Input(shape=(train_im.shape[1], train_im.shape[2], train_im.shape[3]),
                   batch_size=batch_size)

    x = ZeroPadding2D(padding=(3, 3))(inputs)
    x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    x = res_conv(x, s=1, filters=(64, 256))
    x = res_identity(x, filters=(64, 256))
    x = res_identity(x, filters=(64, 256))

    x = res_conv(x, s=2, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))

    x = res_conv(x, s=2, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))

    x = res_conv(x, s=2, filters=(512, 2048))
    x = res_identity(x, filters=(512, 2048))
    x = res_identity(x, filters=(512, 2048))

    x = AveragePooling2D((2, 2), padding='same')(x)
    x = Flatten()(x)
    x = Dense(10, activation='softmax', kernel_initializer='he_normal')(x)

    model = Model(inputs=inputs, outputs=x, name='Resnet50')

    opt = tf.keras.optimizers.legacy.SGD(learning_rate=0.01)
    model.compile(
        loss=tf.keras.losses.CategoricalCrossentropy(
            reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE),
        optimizer=opt)

    # Integer division keeps steps_per_epoch a whole number of full batches.
    model.fit(x=train_im, y=train_lab_categorical, batch_size=batch_size,
              epochs=150, steps_per_epoch=train_im.shape[0] // batch_size)
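For context on what the regularizer adds: Keras' `l2(c)` contributes `c * sum(w ** 2)` for each kernel it is attached to, and that term is added to the loss on every training step. The plain-Python sketch below (function and variable names are illustrative, not from the post) computes that penalty for a toy kernel and counts how many `Conv2D` kernels in the posted model carry the regularizer. It does not show why the Metal backend slows down; it only shows how much extra work is being requested.

```python
def l2_penalty(weights, coefficient):
    """coefficient * sum of squared weights, the form of Keras' l2 regularizer."""
    return coefficient * sum(w * w for w in weights)


# In the posted model, res_conv has 4 regularized Conv2D layers and
# res_identity has 3. The stem Conv2D and the Dense layer are not regularized.
res_conv_calls = 4
res_identity_calls = 2 + 3 + 5 + 2
regularized_kernels = 4 * res_conv_calls + 3 * res_identity_calls
print("regularized kernels:", regularized_kernels)

# Toy kernel values, chosen only for illustration.
toy_kernel = [0.5, -1.0, 2.0]
print("penalty for toy kernel:", l2_penalty(toy_kernel, 0.001))
```

So enabling `l2(0.001)` adds one sum-of-squares reduction (and its gradient) per regularized kernel per step, which is 52 extra reductions for this model.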

Posted by noahmartin.

JAX Metal error: failed to legalize operation 'mhlo.scatter'

I only get this error when using the JAX Metal device (CPU is fine). It seems to be a problem whenever I want to modify values of an array in place using at and set.

    note: see current operation:
    %2903 = "mhlo.scatter"(%arg3, %2902, %2893) ({
    ^bb0(%arg4: tensor<f32>, %arg5: tensor<f32>):
      "mhlo.return"(%arg5) : (tensor<f32>) -> ()
    }) {indices_are_sorted = true, scatter_dimension_numbers = #mhlo.scatter<update_window_dims = [0, 1], inserted_window_dims = [1], scatter_dims_to_operand_dims = [1]>, unique_indices = true} : (tensor<10x100x4xf32>, tensor<1xsi32>, tensor<10x4xf32>) -> tensor<10x100x4xf32>

    blocks = blocks.at[i].set( ...

Posted by Cemlyn.

M1 GPU is extremely slow, how can I enable CPU to train my NNs?

Hi everyone,

I found that GPU performance is far worse than I expected (as slow as a turtle), so I want to switch from GPU to CPU. However, the mlcompute module cannot be found, which is weird. The same code takes 156 s per epoch on Colab versus 40 minutes per epoch on my computer (JupyterLab). I only used a small dataset (a few thousand data points), and each epoch has only 20 batches. I am so disappointed; it seems like the "powerful" GPU is a joke.

I am using macOS 12.0.1, and the version of tensorflow-macos is 2.6.0.

Can anyone tell me why this happens?

Posted by dkjdjdfdskln.

Tensorflow on OSX12.2: platform is already registered with name: "METAL"

I am aware this question has been asked before, but none of the resolutions have worked for me. When I try to import TensorFlow in my Python 3.9 environment, I get the following error:

    uwewinter@Uwes-MBP % python3
    Python 3.9.10 (main, Jan 15 2022, 11:40:53)
    [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow
    2022-02-09 21:30:01.701794: F tensorflow/c/experimental/stream_executor/stream_executor.cc:808] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "METAL"
    zsh: abort python3

I have the newest versions of tensorflow-macos and tensorflow-metal installed:

    uwewinter@Uwes-MBP % pip3 list | grep tensorflow
    tensorflow-estimator 2.7.0
    tensorflow-macos 2.7.0
    tensorflow-metal 0.3.0

macOS is the latest:

    uwewinter@Uwes-MBP % sw_vers
    ProductName: macOS
    ProductVersion: 12.2
    BuildVersion: 21D49

The Mac is a 2021 MBP:

    uwewinter@Uwes-MBP % sysctl hw.model
    hw.model: MacBookPro18,3
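The message says the "METAL" platform was registered a second time, which would happen if the plug-in library were loaded more than once. Assuming that is the cause here (the post does not confirm it), one thing to check is whether more than one copy of the plug-in is reachable from `sys.path`. The relative path `tensorflow-plugins/libmetal_plugin.dylib` is taken from another traceback in this thread list; the helper name `find_metal_plugin_copies` is hypothetical.

```python
import sys
from pathlib import Path

# Location of the plug-in relative to a site-packages directory.
PLUGIN_RELATIVE_PATH = Path("tensorflow-plugins") / "libmetal_plugin.dylib"


def find_metal_plugin_copies(search_paths):
    """Return each distinct existing copy of the plug-in under the given directories."""
    found = []
    seen = set()
    for entry in search_paths:
        if not entry:  # skip the empty string that stands for the current directory
            continue
        candidate = Path(entry) / PLUGIN_RELATIVE_PATH
        if candidate.is_file():
            resolved = candidate.resolve()
            if resolved not in seen:
                seen.add(resolved)
                found.append(resolved)
    return found


copies = find_metal_plugin_copies(sys.path)
print(f"{len(copies)} copy/copies of the Metal plug-in on sys.path")
for path in copies:
    print(" ", path)
```

More than one result would point to overlapping installations (for example a user-level and an environment-level site-packages), in which case uninstalling the extra copy is worth trying.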

tensorflow-metal

Posted by uwwint.

Sklearn is unstable on Apple Silicon

Hi,

I installed sklearn successfully and ran the MNIST toy example successfully. Then I started to run my project. The funny thing is that everything seems fine at the start (at least no ImportError occurs), but when I make some changes to my code and try to run all cells again (I use JupyterLab), an ImportError occurs:

    ImportError: dlopen(/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
    Referenced from: /Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-darwin.so
    Reason: tried:
      '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/lib/python3.9/site-packages/scipy/spatial/../../../../liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/lib/liblapack.3.dylib' (no such file),
      '/Users/a/miniforge3/bin/../lib/liblapack.3.dylib' (no such file),
      '/usr/local/lib/liblapack.3.dylib' (no such file),
      '/usr/lib/liblapack.3.dylib' (no such file)

Then I have to uninstall scipy, sklearn, etc. and reinstall all of them, and my code runs again. Magically, I hate to say. Does anyone know how to solve this problem permanently and make sklearn more stable?

Posted by dkjdjdfdskln.

unsuccessful importing of tensorflow

Hi. I have followed the instructions here to install TensorFlow with GPU support on my 16-inch 2019 Intel MacBook Pro (with AMD graphics). The installation process seems to be successful (I get no error), but when I try to test it by running

    import tensorflow as tf

I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/__init__.py", line 445, in <module>
        _ll.load_library(_plugin_dir)
      File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
        py_tf.TF_LoadLibrary(lib)
    tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN10tensorflow16TensorShapeProtoC1ERKS0_
      Referenced from: <C62E0AB4-567E-3E14-8F96-9F07A746C4DC> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
      Expected in: <0B1F231A-6766-3F61-81D9-6782129807A9> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

My env's packages:

    ...
    numpy 1.26.1
    tensorboard 2.14.1
    tensorboard-data-server 0.7.1
    tensorflow 2.14.0
    tensorflow-estimator 2.14.0
    tensorflow-io-gcs-filesystem 0.34.0
    tensorflow-metal 1.0.0
    ...
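A "Symbol not found" error while loading `libmetal_plugin.dylib` indicates that the plug-in expects a symbol the installed TensorFlow binary does not export, which usually points to a version mismatch between the two packages. As a sketch under that assumption, the snippet below reports the installed versions using only `importlib.metadata`; the helper name `installed_versions` and the package tuple are illustrative choices.

```python
from importlib import metadata

# Package names as they appear in the posts above.
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal")


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

The printed pair can then be compared against the version combinations listed on Apple's tensorflow-plugin page before trying a different tensorflow or tensorflow-metal release.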

Posted by mahdiaslanimk.

Tensorflow-metal runs extremely slow

I am comparing my M1 MBA with my 2019 16" Intel MBP. The M1 MBA has tensorflow-metal, while the Intel MBP has TF directly from Google. Generally, the same programs run 2-5 times FASTER on the Intel MBP, which presumably has no GPU acceleration. Is there anything I could have done wrong on the M1?

Here is the start of the Metal run:

    Metal device set to: Apple M1
    systemMemory: 16.00 GB
    maxCacheSize: 5.33 GB
    2022-01-19 04:43:50.975025: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-01-19 04:43:50.975291: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
    2022-01-19 04:43:51.216306: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    Epoch 1/10
    2022-01-19 04:43:51.298428: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.

tensorflow-metal

Posted by ahostmadsen.

TensorFlow is slow after upgrading to Sonoma

Hello,

I have been struggling to find a solution online, and I hope you can help me soon. I have installed the latest tensorflow and tensorflow-metal; I even installed tensorflow-nightly. My app prints the following when I call fit on a CNN model with 8 layers:

    2023-09-29 22:21:06.115768: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
    2023-09-29 22:21:06.115846: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
    2023-09-29 22:21:06.116048: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
    2023-09-29 22:21:06.116264: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2023-09-29 22:21:06.116483: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

Most importantly, the learning process is very slow, and I'd like to take advantage of all the new features of the latest versions. What can I do?

Posted by erezkatz.

MacOS M2 upgrade Sonoma 14.0 can not train model with tensorflow

I could train a YOLOv3 model on an M2 Mac running macOS Ventura with tensorflow-macos=2.9.0 and tensorflow-metal=0.5. After upgrading the system to Sonoma 14.0, I can no longer train the model and get the errors below.

On an M1 Mac I can still train after upgrading to Sonoma 14.0, although it reports "error: 'anec.gain_offset_control' op". On the M1, however, the last error does not appear: `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)

When I change my optimizer from Adam to SGD, the "error: 'anec.gain_offset_control' op" message disappears, so that error seems to be caused by something in Adam. But I cannot resolve the error `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)

ERROR Info

    MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
    loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
    loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
    loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
    loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
    /AppleInternal/Library/BuildRoots/90c9c1ae-37b6-11ee-a991-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSLibrary.mm:550: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)

tensorflow-metal

Posted by cloris.

Jaxlib version

I am trying hard to get some Whisper software running on a Mac under JAX. However, this requires jaxlib>=0.4.14, while the current jax-metal requires jaxlib==0.4.11. Does anyone know whether an upgrade is planned?

tensorflow-metal

Posted by PerEK.

How to use GPU in Tensorflow?

I'm using my 2020 Mac mini with an M1 chip, and this is my first time using it for convolutional neural network training. I installed Python (version 3.8.12) using miniforge3, and TensorFlow following this instruction, but I still run into a GPU problem when training a 3D U-Net. Here is part of my code; I'm hoping for some suggestions on how to fix this.

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.python.client import device_lib  # needed by get_available_devices(); missing from the original snippet
    import json
    import numpy as np
    import pandas as pd
    import nibabel as nib
    import matplotlib.pyplot as plt
    from tensorflow.keras import backend as K

    # check available devices
    def get_available_devices():
        local_device_protos = device_lib.list_local_devices()
        return [x.name for x in local_device_protos]

    print(get_available_devices())

Output:

    Metal device set to: Apple M1
    ['/device:CPU:0', '/device:GPU:0']
    2022-02-09 11:52:55.468198: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-02-09 11:52:55.468885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

Prediction code:

    X_norm_with_batch_dimension = np.expand_dims(X_norm, axis=0)
    # tf.device('/device:GPU:0')  # Have tried this line, doesn't work
    # tf.debugging.set_log_device_placement(True)  # Have tried this line, doesn't work
    patch_pred = model.predict(X_norm_with_batch_dimension)

Error:

    InvalidArgumentError: 2 root error(s) found.
      (0) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
         [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
         [[model/conv3d/Conv3D/_4]]
      (1) INVALID_ARGUMENT: CPU implementation of Conv3D currently only supports the NHWC tensor format.
         [[node model/conv3d/Conv3D (defined at /Users/mwshay/miniforge3/envs/tensor/lib/python3.8/site-packages/keras/layers/convolutional.py:231) ]]
    0 successful operations.
    0 derived errors ignored.

The code runs on Google Colab but not locally on the Mac mini in a Jupyter notebook. The NHWC tensor format problem might indicate that I'm using my CPU to execute the code instead of the GPU. Is there any way to make TensorFlow use the GPU to train the network?
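The error text suggests the Conv3D op ran on the CPU with a channels-first layout, which the CPU kernel rejects. One possible workaround, assuming the model was built with channels-first data (the post does not say), is to use a channels-last layout instead. The sketch below is plain Python with hypothetical helper names and a made-up example shape; it only computes the axis permutation that moves the channel axis to the end, which is the tuple one would pass to an array transpose.

```python
def channels_first_to_last_axes(rank):
    """Axis order that moves axis 1 (channels) to the end, keeping batch first."""
    if rank < 3:
        raise ValueError("expected at least batch, channel, and one spatial axis")
    return (0, *range(2, rank), 1)


def permute_shape(shape, axes):
    """Shape that results from reordering the axes of an array of the given shape."""
    return tuple(shape[axis] for axis in axes)


# Rank 5 covers Conv3D inputs: (batch, channels, depth, height, width).
axes = channels_first_to_last_axes(5)
shape_channels_first = (1, 4, 16, 160, 160)  # hypothetical input shape
print("transpose axes:", axes)
print("channels-last shape:", permute_shape(shape_channels_first, axes))
```

The model's layers would also need to be created with a channels-last data format for the transposed input to match; that part depends on code not shown in the post.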

Posted by MW_Shay.

Support for complex numbers

Hi,

Are there plans to support complex numbers? Something simple like this:

    import jax.numpy as jnp

    def return_complex(x):
        return x*1+1.0j

    x = jnp.ones((10))
    print(return_complex(x))

results in an error.

tensorflow-metal

Posted by FilipeMaia.

jax.lax.conv_transpose not correctly implemented

Good evening!

I tried to use Flax nn.ConvTranspose, which calls jax.lax.conv_transpose, but it looks like it isn't implemented correctly for the METAL backend. It works fine on CPU.

      File "/Users/cemlyn/Documents/VCLless/mnist_vae/venv/lib/python3.11/site-packages/flax/linen/linear.py", line 768, in __call__
        y = lax.conv_transpose(
            ^^^^^^^^^^^^^^^^^^^
    jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: <unknown>:0: error: type of return operand 0 ('tensor<1x8x8x64xf32>') doesn't match function result type ('tensor<1x14x14x64xf32>') in function @main
    <unknown>:0: note: see current operation: "func.return"(%0) : (tensor<1x8x8x64xf32>) -> ()

Versions:

    pip list | grep jax
    jax 0.4.11
    jax-metal 0.0.4
    jaxlib 0.4.11

tensorflow-metal

Posted by Cemlyn.

Posts under tensorflow-metal tag