tensorflow-metal

TensorFlow accelerates machine learning model training with Metal on Mac GPUs.

tensorflow-metal Documentation

Posts under tensorflow-metal tag

122 Posts
Post not yet marked as solved
0 Replies
504 Views
Working Environment

MacBook Pro 14" with M2 Pro chip
macOS Sonoma 14.0
Python 3.11.4
tensorflow 2.14.0, tensorflow-macos 2.14.0, tensorflow-metal 1.1.0

Issue Description

Hi there! I ran into an issue while working with Keras' TextVectorization preprocessing layer.

text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
text_vectorization.adapt(ds.map(lambda x: x['title']))

The inputs are strings. Here is the traceback:

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
/Users/ken/Workspaces/MLE101/tfrs101/preprocess.ipynb Cell 13 line 3
      1 # with tf.device('/CPU:0'):
      2 text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
----> 3 text_vectorization.adapt(ds.map(lambda x: x['title']))

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py:473, in TextVectorization.adapt(self, data, batch_size, steps)
    423 def adapt(self, data, batch_size=None, steps=None):
    424     """Computes a vocabulary of string terms from tokens in a dataset.
    425
    426     Calling `adapt()` on a `TextVectorization` layer is an alternative to
   (...)
    471         argument is not supported with array inputs.
472 """ --> 473 super().adapt(data, batch_size=batch_size, steps=steps) File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps) 256 with data_handler.catch_stop_iteration(): 257 for _ in data_handler.steps(): --> 258 self._adapt_function(iterator) 259 if data_handler.should_sync: 260 context.async_wait() File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs) 151 except Exception as e: 152 filtered_tb = _process_traceback_frames(e.__traceback__) --> 153 raise e.with_traceback(filtered_tb) from None 154 finally: 155 del filtered_tb File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:60, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name) 53 # Convert any objects of type core_types.Tensor to Tensor. 54 inputs = [ 55 tensor_conversion_registry.convert(t) 56 if isinstance(t, core_types.Tensor) 57 else t 58 for t in inputs 59 ] ---> 60 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, 61 inputs, attrs, num_outputs) 62 except core._NotOkStatusException as e: 63 if name is not None: NotFoundError: Graph execution error: Detected at node StringSplit/stack defined at (most recent call last): ... No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}} (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0" . 
Registered: device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN] device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64] device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT32] device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT64] device='CPU'; Tdim in [DT_INT32] device='CPU'; Tdim in 
[DT_INT64]
	 [[StringSplit/stack]] [Op:__inference_adapt_step_71204]

I have to explicitly place the layer on the CPU to make it work:

with tf.device('/CPU:0'):
    text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x['title']))

I have referred to this post: https://developer.apple.com/forums/thread/700108
Post not yet marked as solved
2 Replies
462 Views
Hi, I've been going over this autoencoder tutorial:
https://www.tensorflow.org/tutorials/generative/autoencoder#third_example_anomaly_detection

Notebook link:
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/autoencoder.ipynb

When I downloaded the notebook and ran it locally on my M2 Pro Max, the results were dramatically different and the plots were way off.

This is the plot in the working notebook: [image]
This is the local plot: [image]

I checked every moving piece, and the difference seems to be in the output of the autoencoder, in these lines:

encoded_data = autoencoder.encoder(normal_test_data).numpy()
decoded_data = autoencoder.decoder(encoded_data).numpy()

The working notebook output is: [image]
The local output: [image]

And the overall result is:

notebook:
Accuracy = 0.944
Precision = 0.9941176470588236
Recall = 0.9053571428571429

local:
Accuracy = 0.44
Precision = 0.0
Recall = 0.0

I'm using:
Mac M2 Pro Max
Python 3.10.12
Tensorflow 2.14.0

Can anyone help? Thanks a lot in advance.
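One way to read the local numbers, offered as a sketch rather than a diagnosis: precision and recall can both be 0.0 only when there are no true positives, which would happen if the local model never predicts the positive class. The helper below shows the arithmetic. The name `classification_metrics` is hypothetical, and the counts are made up for illustration; they are not taken from the notebook.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Conventionally reported as 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: every sample is predicted negative.
acc, prec, rec = classification_metrics(tp=0, fp=0, fn=56, tn=44)
print(acc, prec, rec)
```

If the local run matches this pattern, comparing the raw predictions on CPU (with tf.device('/CPU:0')) against the GPU output might help narrow down where the results diverge.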
Post not yet marked as solved
0 Replies
408 Views
`print("Hello")`
`import tensorflow as tf`

I get the following error after installing tensorflow: "Process finished with exit code 132 (interrupted by signal 4: SIGILL)"

MacBook Air 2022 M2 14.1 | Tensorflow latest version | Python version 3.11.5

Can anyone help me, please? I have tried different variants of tensorflow (for Mac, for CPU, and other versions). I have also tried anaconda and miniconda, but nothing works.

Process finished with exit code 132 (interrupted by signal 4: SIGILL)
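For context, exit code 132 follows the shell convention of 128 plus the signal number, so it means the process was killed by signal 4 (SIGILL, illegal instruction). One possible cause on Apple silicon, which is an assumption here and not confirmed by the post, is a mismatch between the architecture of the Python interpreter and the TensorFlow binary (for example, an x86_64 Python running under Rosetta). A minimal sketch for checking both; `describe_exit_code` is a hypothetical helper name:

```python
import platform
import signal


def describe_exit_code(code):
    """Translate a shell-style exit code above 128 into a signal name."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return "unknown signal"
    return "normal exit"


print(describe_exit_code(132))
# Architecture the running interpreter reports, for example 'arm64'
# natively on Apple silicon or 'x86_64' under Rosetta.
print(platform.machine())
```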
Post not yet marked as solved
0 Replies
332 Views
I have tried different variants of tensorflow (for Mac, for CPU, and other versions). I have also used anaconda and miniconda, but I still can't get it to work. Process finished with exit code 132 (interrupted by signal 4: SIGILL)
Post not yet marked as solved
0 Replies
502 Views
I have tried many different variants. I've tried every version of the tensorflow module (for Mac, for CPU...) and I have tried anaconda and miniconda. In the end, I still can't get it to work. Please help me.
Post not yet marked as solved
1 Replies
513 Views
I've been running tensorflow with Python 3.9 to train a CNN model, and the process is accelerated by the GPU. After 80 epochs the process went to sleep (status S) and its GPU usage dropped to 0 percent. I am wondering whether the training process crashed the GPU, or whether the OS is forcing the process to sleep because it takes up too much GPU time. Thanks a lot!
Post not yet marked as solved
2 Replies
492 Views
I have been following the instructions here: https://developer.apple.com/metal/tensorflow-plugin/

I managed to complete step 1 (set up the environment) and step 2 (install base TensorFlow), but when I try step 3 (install the tensorflow-metal plug-in) with "python -m pip install tensorflow-metal", I get the following messages:

"ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal"

What am I missing here? The commands I used are as follows:

Step 1
python3 -m venv ~/venv-metal
source ~/venv-metal/bin/activate
python -m pip install -U pip

Step 2
python -m pip install tensorflow

Step 3
python -m pip install tensorflow-metal
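The "(from versions: none)" message generally means pip found no distribution of the package that is compatible with the current interpreter and platform. A minimal sketch for collecting the details pip bases that decision on, so they can be compared against the requirements listed on the Apple page; `environment_report` is a hypothetical helper name, not part of pip or TensorFlow:

```python
import platform
import sys
import sysconfig


def environment_report():
    """Collect interpreter details that determine which wheels pip accepts."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        "machine": platform.machine(),
        "platform_tag": sysconfig.get_platform(),
        # Empty string on non-macOS systems.
        "macos": platform.mac_ver()[0],
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this with the same `python` used in step 3 (inside the activated venv) shows whether the interpreter is the one expected.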
Post not yet marked as solved
0 Replies
483 Views
Hi,

When I try to train resnet-50 with tensorflow-metal, I found that the l2 regularizer makes each epoch take almost 4x as long (~220ms instead of 60ms). I'm on an M1 Max 16" MBP. It seems like regularization shouldn't add that much time. Is there anything I can do to make it faster?

Here's some sample code that reproduces the issue:

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, ZeroPadding2D,\
    Flatten, BatchNormalization, AveragePooling2D, Dense, Activation, Add
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Model
from tensorflow.keras import activations
import random
import numpy as np

random.seed(1234)
np.random.seed(1234)
tf.random.set_seed(1234)

batch_size = 64

(train_im, train_lab), (test_im, test_lab) = tf.keras.datasets.cifar10.load_data()
train_im, test_im = train_im/255.0 , test_im/255.0
train_lab_categorical = tf.keras.utils.to_categorical(
    train_lab, num_classes=10, dtype='uint8')

train_DataGen = tf.keras.preprocessing.image.ImageDataGenerator()
train_set_data = train_DataGen.flow(train_im, train_lab, batch_size=batch_size, shuffle=False)

# Change this to l2 for it to train much slower
regularizer = None # l2(0.001)

def res_identity(x, filters):
    x_skip = x
    f1, f2 = filters

    x = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)

    x = Add()([x, x_skip])
    x = Activation(activations.relu)(x)
    return x

def res_conv(x, s, filters):
    x_skip = x
    f1, f2 = filters

    x = Conv2D(f1, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)

    x_skip = Conv2D(f2, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x_skip)
    x_skip = BatchNormalization()(x_skip)

    x = Add()([x, x_skip])
    x = Activation(activations.relu)(x)
    return x

input = Input(shape=(train_im.shape[1], train_im.shape[2], train_im.shape[3]), batch_size=batch_size)
x = ZeroPadding2D(padding=(3, 3))(input)

x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation(activations.relu)(x)
x = MaxPooling2D((3, 3), strides=(2, 2))(x)

x = res_conv(x, s=1, filters=(64, 256))
x = res_identity(x, filters=(64, 256))
x = res_identity(x, filters=(64, 256))

x = res_conv(x, s=2, filters=(128, 512))
x = res_identity(x, filters=(128, 512))
x = res_identity(x, filters=(128, 512))
x = res_identity(x, filters=(128, 512))

x = res_conv(x, s=2, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))

x = res_conv(x, s=2, filters=(512, 2048))
x = res_identity(x, filters=(512, 2048))
x = res_identity(x, filters=(512, 2048))

x = AveragePooling2D((2, 2), padding='same')(x)
x = Flatten()(x)
x = Dense(10, activation='softmax', kernel_initializer='he_normal')(x)

model = Model(inputs=input, outputs=x, name='Resnet50')

opt = tf.keras.optimizers.legacy.SGD(learning_rate = 0.01)
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE), optimizer=opt)

# steps_per_epoch uses integer division so that it is a whole number of steps
model.fit(x=train_im, y=train_lab_categorical, batch_size=batch_size, epochs=150, steps_per_epoch=train_im.shape[0]//batch_size)
Post not yet marked as solved
4 Replies
646 Views
Hi. I have followed the instructions here to install tensorflow with GPU support on my 16-inch 2019 Intel MacBook Pro (with AMD graphics). The installation seems to be successful (I get no errors), but when I try to test it by running

import tensorflow as tf

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/__init__.py", line 445, in <module>
    _ll.load_library(_plugin_dir)
  File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN10tensorflow16TensorShapeProtoC1ERKS0_
  Referenced from: <C62E0AB4-567E-3E14-8F96-9F07A746C4DC> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
  Expected in: <0B1F231A-6766-3F61-81D9-6782129807A9> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

My env's packages:

...
numpy 1.26.1
tensorboard 2.14.1
tensorboard-data-server 0.7.1
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-metal 1.0.0
...
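A "Symbol not found" error while loading libmetal_plugin.dylib suggests that the plug-in was built against a different TensorFlow build than the one installed; that is a guess from the error text, not a confirmed diagnosis. A minimal sketch for listing the relevant installed versions side by side, using only the standard library (`installed_versions` is a hypothetical helper name):

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(
    ["tensorflow", "tensorflow-macos", "tensorflow-metal"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

For comparison, another post under this tag reports using tensorflow 2.14.0 together with tensorflow-metal 1.1.0 rather than 1.0.0.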
Post not yet marked as solved
0 Replies
573 Views
I can train a YOLOv3 model on an M2 Mac running macOS Ventura with tensorflow-macos=2.9.0 and tensorflow-metal=0.5. But after upgrading the system to Sonoma 14.0, I can no longer train the model; it fails with the errors below.

On an M1 Mac I can still train after upgrading to Sonoma 14.0, although it reports: error: 'anec.gain_offset_control' op. On the M1, however, the last error does not appear: `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)

When I change my optimizer from Adam to SGD, the error: 'anec.gain_offset_control' op disappears, so that error seems to be caused by something in Adam. But I cannot resolve the other error: `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)

ERROR Info

MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
/AppleInternal/Library/BuildRoots/90c9c1ae-37b6-11ee-a991-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSLibrary.mm:550: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)
Post not yet marked as solved
1 Replies
340 Views
I am trying hard to get some Whisper software running on Mac under jax. However, this requires jaxlib>=0.4.14, while the current jax-metal requires jaxlib==0.4.11. Does anyone know if there is a planned upgrade?
Post not yet marked as solved
0 Replies
298 Views
Will TensorFlow-Metal and JAX-Metal code be open sourced? Reasons why I ask: If it is open sourced on GitHub or something it might make it easier for people to find issues and create new ones if necessary, also the open source community might be able to help ;) I'd love to learn about how you guys implement some of these operations :P (I know you guys made an Apple tutorial on how to implement TensorFlow custom op for Metal which was fire https://developer.apple.com/documentation/metal/metal_sample_code_library/customizing_a_tensorflow_operation)
Post not yet marked as solved
1 Replies
377 Views
Good evening!

I tried to use Flax nn.ConvTranspose, which calls jax.lax.conv_transpose, but it looks like it isn't implemented correctly for the METAL backend. It works fine on CPU.

  File "/Users/cemlyn/Documents/VCLless/mnist_vae/venv/lib/python3.11/site-packages/flax/linen/linear.py", line 768, in __call__
    y = lax.conv_transpose(
        ^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: <unknown>:0: error: type of return operand 0 ('tensor<1x8x8x64xf32>') doesn't match function result type ('tensor<1x14x14x64xf32>') in function @main
<unknown>:0: note: see current operation: "func.return"(%0) : (tensor<1x8x8x64xf32>) -> ()

Versions:

pip list | grep jax
jax 0.4.11
jax-metal 0.0.4
jaxlib 0.4.11
Post not yet marked as solved
3 Replies
367 Views
Hi,

Are there plans to support complex numbers? Something simple like this:

import jax.numpy as jnp

def return_complex(x):
    return x*1+1.0j

x = jnp.ones((10))
print(return_complex(x))

results in an error.
Post not yet marked as solved
0 Replies
367 Views
Hi,

Following the instructions at https://developer.apple.com/metal/jax/, jax works fine on my M1 Pro, but only in Terminal. If you run Jupyter Notebook or PyCharm, the following always defaults to CPU:

from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)

I also notice that if you restart the Terminal, jax defaults to CPU only. You always need to activate the jax-metal virtual environment first to get the Apple silicon GPU to work:

python3 -m venv ~/jax-metal
source ~/jax-metal/bin/activate

Is there any way to make sure that Jupyter Notebook and other IDEs default to jax-metal? Currently I can only use it in Terminal after manually activating the jax-metal virtual environment each time, which is annoying.
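The behaviour described above would be consistent with Jupyter and PyCharm running a different Python interpreter than the one in ~/jax-metal; that is an assumption, not something the post confirms. A minimal sketch to run inside the notebook or IDE to see which interpreter is actually in use (`interpreter_info` is a hypothetical helper name):

```python
import sys


def interpreter_info():
    """Report which Python is running and whether it is inside a virtual environment."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix points at the venv while sys.base_prefix
        # points at the Python installation it was created from.
        "in_venv": sys.prefix != sys.base_prefix,
    }


print(interpreter_info())
```

If the reported executable is not under ~/jax-metal, one option is to register the environment as a Jupyter kernel from inside the activated venv (python -m pip install ipykernel, then python -m ipykernel install --user --name jax-metal), or to select ~/jax-metal/bin/python as the project interpreter in the IDE.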
Post not yet marked as solved
1 Replies
438 Views
Trying to set up TensorFlow on a Mac M1. Running

conda install -c apple tensorflow-deps

throws the following error:

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

following specifications were found to be incompatible with your system:

  - feature:/osx-arm64::__osx==13.6=0
  - tensorflow-deps -> grpcio[version='>=1.37.0,<2.0'] -> __osx[version='>=10.10|>=10.9']

Your installed version is: 13.6

The .condarc is as follows:

channels:
  - defaults
subdirs:
  - osx-arm64
  - osx-64
  - noarch
ssl_verify: false
subdir: osx-arm64

And conda info:

active environment : base
active env location : /Users/mdrahman/miniconda3
shell level : 1
user config file : /Users/mdrahman/.condarc
populated config files : /Users/mdrahman/.condarc
conda version : 23.5.2
conda-build version : not installed
python version : 3.11.4.final.0
virtual packages : __archspec=1=arm64
                   __osx=13.6=0
                   __unix=0=0
base environment : /Users/mdrahman/miniconda3 (writable)
conda av data dir : /Users/mdrahman/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/osx-arm64
               https://repo.anaconda.com/pkgs/main/osx-64
               https://repo.anaconda.com/pkgs/main/noarch
               https://repo.anaconda.com/pkgs/r/osx-arm64
               https://repo.anaconda.com/pkgs/r/osx-64
               https://repo.anaconda.com/pkgs/r/noarch
package cache : /Users/mdrahman/miniconda3/pkgs
                /Users/mdrahman/.conda/pkgs
envs directories : /Users/mdrahman/miniconda3/envs
                   /Users/mdrahman/.conda/envs
platform : osx-arm64
user-agent : conda/23.5.2 requests/2.29.0 CPython/3.11.4 Darwin/22.6.0 OSX/13.6
UID:GID : 501:20
netrc file : None
offline mode : False

Looking forward to your support.
Post not yet marked as solved
6 Replies
933 Views
I only get this error when using the JAX Metal device (CPU is fine). It seems to be a problem whenever I want to modify values of an array in-place using at and set.

note: see current operation:
%2903 = "mhlo.scatter"(%arg3, %2902, %2893) ({
^bb0(%arg4: tensor<f32>, %arg5: tensor<f32>):
  "mhlo.return"(%arg5) : (tensor<f32>) -> ()
}) {indices_are_sorted = true, scatter_dimension_numbers = #mhlo.scatter<update_window_dims = [0, 1], inserted_window_dims = [1], scatter_dims_to_operand_dims = [1]>, unique_indices = true} : (tensor<10x100x4xf32>, tensor<1xsi32>, tensor<10x4xf32>) -> tensor<10x100x4xf32>

blocks = blocks.at[i].set( ...
Post not yet marked as solved
7 Replies
1.8k Views
Hello - I have been struggling to find a solution online and I hope you can help me soon. I have installed the latest tensorflow and tensorflow-metal; I even installed tensorflow-nightly. My app generates the following output when I call fit on a CNN model with 8 layers:

2023-09-29 22:21:06.115768: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
2023-09-29 22:21:06.115846: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2023-09-29 22:21:06.116048: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2023-09-29 22:21:06.116264: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-29 22:21:06.116483: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

Most importantly, the learning process is very slow, and I'd like to take advantage of all the new features of the latest versions. What can I do?
Post not yet marked as solved
1 Replies
1k Views
The JAX ml_dtypes module was recently updated to 0.3.0 - as part of this change, the 'float8_e4m3b11' dtype has been deprecated, with newer versions of JAX also reflecting this change. The new ml_dtypes version now seems to be incompatible with JAX v0.4.11. As jax-metal currently requires JAX v0.4.11, perhaps the dependencies list should be updated to include ml_dtypes==0.2.0 in order to prevent the following import error: AttributeError: module 'ml_dtypes' has no attribute 'float8_e4m3b11' Which essentially makes JAX unusable on import (and appears to be fixed by pip install ml_dtypes==0.2.0)
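Until the dependency list is updated, a small guard like the following could flag the mismatch before JAX is imported. This is only a sketch: `parse_version` and `ml_dtypes_needs_pin` are hypothetical helper names, and the 0.3.0 threshold is taken from the description above rather than from any official compatibility table.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '0.3.0' into a tuple of ints."""
    parts = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def ml_dtypes_needs_pin(version_text):
    """True for 0.3.0 or newer, which is reported above as incompatible with JAX v0.4.11."""
    return parse_version(version_text) >= (0, 3, 0)


try:
    installed = metadata.version("ml_dtypes")
    print(installed, "needs pin:", ml_dtypes_needs_pin(installed))
except metadata.PackageNotFoundError:
    print("ml_dtypes is not installed")
```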
Post not yet marked as solved
2 Replies
404 Views
I'm trying to train a trivial example of a CNN using the cifar10 dataset on CPU vs GPU. Although the GPU is much faster, the accuracies and losses behave really strangely.

Accuracy and validation accuracy on CPU look as follows: [image]
Training accuracy on GPU looks like this: [image]

Package Versions:

python = "3.11.4"
tensorflow = "2.13.0"
tensorflow-macos = "2.13.0"
tensorflow-metal = "1.0.1"

The code:

import tensorflow as tf

# with tf.device("/cpu:0"):  # uncomment and indent the following to run on CPU

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

(train_images, train_labels), (
    test_images,
    test_labels,
) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))

model.summary()

model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10))

model.summary()

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = model.fit(
    train_images,
    train_labels,
    epochs=20,
    batch_size=64,
    validation_data=(test_images, test_labels),
)

plt.plot(history.history["accuracy"], label="accuracy")
plt.plot(history.history["val_accuracy"], label="val_accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.ylim([0, 1])
plt.legend(loc="lower right")

test_loss, test_acc = model.evaluate(
    test_images, test_labels, batch_size=64, verbose=2
)

Hope this helps with reproducing the issue.