tensorflow-metal


TensorFlow accelerates machine learning model training with Metal on Mac GPUs.


Posts under tensorflow-metal tag

123 Posts
Post not yet marked as solved
1 Reply
376 Views
Good evening! I tried to use Flax's nn.ConvTranspose, which calls jax.lax.conv_transpose, but it appears not to be implemented correctly for the METAL backend; it works fine on CPU.

```
File "/Users/cemlyn/Documents/VCLless/mnist_vae/venv/lib/python3.11/site-packages/flax/linen/linear.py", line 768, in __call__
    y = lax.conv_transpose(
        ^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: <unknown>:0: error: type of return operand 0 ('tensor<1x8x8x64xf32>') doesn't match function result type ('tensor<1x14x14x64xf32>') in function @main
<unknown>:0: note: see current operation: "func.return"(%0) : (tensor<1x8x8x64xf32>) -> ()
```

Versions (`pip list | grep jax`):

```
jax        0.4.11
jax-metal  0.0.4
jaxlib     0.4.11
```
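For reference, a minimal sketch of the kind of call involved — shapes and strides here are assumptions, not taken from the poster's model — which succeeds on the CPU backend:

```python
import jax
import jax.numpy as jnp

# Hypothetical repro: an NHWC transposed convolution, as Flax's nn.ConvTranspose
# would issue through jax.lax.conv_transpose.
x = jnp.ones((1, 8, 8, 64))        # NHWC input
kernel = jnp.ones((3, 3, 64, 64))  # HWIO kernel
y = jax.lax.conv_transpose(
    x, kernel, strides=(2, 2), padding='SAME',
    dimension_numbers=('NHWC', 'HWIO', 'NHWC'))
print(y.shape)  # (1, 16, 16, 64) on CPU; the METAL backend errors out per the post
```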
Posted by Cemlyn. Last updated.
Post not yet marked as solved
7 Replies
818 Views
```
XlaRuntimeError                           Traceback (most recent call last)
Cell In[49], line 4
      1 arr = jnp.array([7, 8, 9])
      3 # Find indices where the condition is True
----> 4 indices = jnp.where(arr > 1)
      6 print(indices)

XlaRuntimeError: UNKNOWN
```
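One hedged workaround sketch (an assumption on my part, not a confirmed fix): the one-argument form of jnp.where has a data-dependent output shape, which accelerator backends often cannot compile, so pinning that call to the CPU device may sidestep the error:

```python
import jax
import jax.numpy as jnp

# Sketch: run only the data-dependent-shape operation on the CPU device.
with jax.default_device(jax.devices('cpu')[0]):
    arr = jnp.array([7, 8, 9])
    indices = jnp.where(arr > 1)  # one-argument where: output shape depends on data
print(indices)
```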
Posted by yannlecun. Last updated.
Post not yet marked as solved
1 Reply
1k Views
The JAX ml_dtypes module was recently updated to 0.3.0. As part of this change, the 'float8_e4m3b11' dtype has been deprecated, and newer versions of JAX reflect the change. The new ml_dtypes version now seems to be incompatible with JAX v0.4.11. As jax-metal currently requires JAX v0.4.11, perhaps the dependency list should be updated to include ml_dtypes==0.2.0 in order to prevent the following import error:

```
AttributeError: module 'ml_dtypes' has no attribute 'float8_e4m3b11'
```

This essentially makes JAX unusable on import (and appears to be fixed by `pip install ml_dtypes==0.2.0`).
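Until the dependency list is updated, the workaround described above, as a command:

```
# Workaround from the post: downgrade ml_dtypes to match JAX v0.4.11
python -m pip install ml_dtypes==0.2.0
```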
Posted by tdgfrost. Last updated.
Post not yet marked as solved
0 Replies
298 Views
Will the TensorFlow-Metal and JAX-Metal code be open sourced? Reasons why I ask: if it were open sourced on GitHub or somewhere similar, it would be easier for people to find issues and file new ones where necessary, and the open source community might be able to help ;) I'd also love to learn how you implement some of these operations :P (I know you made an Apple tutorial on implementing a custom TensorFlow op with Metal, which was fire: https://developer.apple.com/documentation/metal/metal_sample_code_library/customizing_a_tensorflow_operation)
Posted by Cemlyn. Last updated.
Post not yet marked as solved
8 Replies
2.1k Views
We ran into an issue where a more complex model fails to converge on the M1 Max GPU while it converges on the CPU and on non-M1 machines. Performance is the same on CPU and GPU for models with a single RNN, but once we use two RNNs, the GPU fails to converge. The example below is based on nonsensical data for the model architecture used, but it shows the same behavior we observe in our production models (which, for obvious reasons, we cannot share here). Mainly:

- The loss goes down to the bottom of the e-06 precision range in all cases except when we use two RNNs on GPU.
- During training we often reach the e-07 precision level, but in the double-RNN-on-GPU condition the results do not go that low, sometimes only reaching the e-05 level.
- On our production data, the double RNN on GPU results in a loss of 1.0 that essentially stays the same from the first epoch, while the other conditions often reach the 0.2 level with a clear learning curve.
- In the production model, increasing the LSTM cell count makes the divergence more visible (with this synthetic data it does not).
- The more complex the model is (after the RNN layers), the more visible the issue.

Suspected issues:

- Different precision used in CPU and GPU training — we had to scale the data values down a lot to make the effect visible (if you work with raw data, all approaches seem to produce comparable results).
- The vanishing-gradient problem is somehow more pronounced on GPU, as indicated by worse performance as model complexity increases.

Please let me know if you need any further details.

Software stack: macOS 12.1, tf 2.7, metal 0.3 (also tested on tf 2.8).

Sample syntax:

```python
# TEST CONDITIONS (the failing combination is gpu = 1 with model_size = 2)
gpu = 1         # 0 CPU, 1 GPU
model_size = 2  # 1 single RNN, 2 double RNN

# PARAMETERS
LSTM_Cells = 64
epochs = 300
batch = 128

import numpy as np
import pandas as pd
import sys
from sklearn import preprocessing

if 'tensorflow' in sys.modules:
    print("tensorflow uploaded")
    del sys.modules["tensorflow"]
    #del tf
    import tensorflow as tf
else:
    print("tensorflow not uploaded")
    import tensorflow as tf

if gpu == 1:
    pass
else:
    tf.config.set_visible_devices([], 'GPU')

print("GPUs:", tf.config.list_logical_devices('GPU'))
print("CPUs:", tf.config.list_logical_devices('CPU'))

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Displacement', 'Horsepower', 'Weight']
dataset = pd.read_csv(url, names=column_names, na_values='?',
                      comment='\t', sep=' ', skipinitialspace=True).dropna()

scaler = preprocessing.StandardScaler().fit(dataset)
X_scaled = scaler.transform(dataset)
X_scaled = X_scaled * 0.001

# Large values
#x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1, 2, 2)
#y_train = np.array(dataset[['MPG', 'Displacement']]).reshape(-1, 2, 2)
# Small values
x_train = np.array(X_scaled[:, 2:]).reshape(-1, 2, 2)
y_train = np.array(X_scaled[:, :2]).reshape(-1, 2, 2)

print(x_train.shape)
print(y_train.shape)
# print(weight.shape)  # 'weight' was undefined in the original post

train_data = (tf.data.Dataset.from_tensor_slices((x_train[:, :, :8], y_train))
              .cache().shuffle(x_train.shape[0]).batch(batch).repeat()
              .prefetch(tf.data.experimental.AUTOTUNE))

if model_size == 2:
    # MINIMAL, NOT WORKING (double RNN)
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    encoder_l2 = tf.keras.layers.LSTM(LSTM_Cells, return_state=True)
    encoder_l2_outputs = encoder_l2(encoder_l1_outputs[0])
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l2_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(4)(flat)  # 4 = 2*2 for the Reshape below; the post had 22, which cannot reshape to [2, 2]
    reshape_output = tf.keras.layers.Reshape([2, 2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)
else:
    # WORKING (single RNN)
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l1_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(4)(flat)  # see note above
    reshape_output = tf.keras.layers.Reshape([2, 2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)

print(model.summary())

loss_tf = tf.keras.losses.MeanSquaredError()
model.compile(optimizer='adam', loss=loss_tf, run_eagerly=True)
model.fit(train_data, epochs=epochs, steps_per_epoch=3)
```
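To narrow down the precision hypothesis, here is a diagnostic sketch (my own, with assumed shapes — not part of the original report) that runs one forward/backward pass with identical weights and data on CPU and GPU and reports the largest gradient discrepancy:

```python
import tensorflow as tf

def max_grad_gap(model, x, y):
    """Compare per-variable gradients computed on CPU vs. GPU for the same weights."""
    loss_fn = tf.keras.losses.MeanSquaredError()
    grads = {}
    for device in ('/CPU:0', '/GPU:0'):
        with tf.device(device):
            with tf.GradientTape() as tape:
                loss = loss_fn(y, model(x, training=True))
            grads[device] = tape.gradient(loss, model.trainable_variables)
    return max(float(tf.reduce_max(tf.abs(c - g)))
               for c, g in zip(grads['/CPU:0'], grads['/GPU:0']))

# Usage sketch with the arrays from the post:
# print(max_grad_gap(model, x_train[:128].astype('float32'), y_train[:128].astype('float32')))
```

A gap that grows with the second LSTM in place would support the precision explanation.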
Posted by sebtac. Last updated.
Post not yet marked as solved
1 Reply
437 Views
Trying to set up TensorFlow on a Mac M1. `conda install -c apple tensorflow-deps` throws the following error:

```
UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  - feature:/osx-arm64::__osx==13.6=0
  - tensorflow-deps -> grpcio[version='>=1.37.0,<2.0'] -> __osx[version='>=10.10|>=10.9']

Your installed version is: 13.6
```

The .condarc is as follows:

```
channels:
  - defaults
subdirs:
  - osx-arm64
  - osx-64
  - noarch
ssl_verify: false
subdir: osx-arm64
```

And conda info:

```
     active environment : base
    active env location : /Users/mdrahman/miniconda3
            shell level : 1
       user config file : /Users/mdrahman/.condarc
 populated config files : /Users/mdrahman/.condarc
          conda version : 23.5.2
    conda-build version : not installed
         python version : 3.11.4.final.0
       virtual packages : __archspec=1=arm64
                          __osx=13.6=0
                          __unix=0=0
       base environment : /Users/mdrahman/miniconda3 (writable)
      conda av data dir : /Users/mdrahman/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-arm64
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-arm64
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/mdrahman/miniconda3/pkgs
                          /Users/mdrahman/.conda/pkgs
       envs directories : /Users/mdrahman/miniconda3/envs
                          /Users/mdrahman/.conda/envs
               platform : osx-arm64
             user-agent : conda/23.5.2 requests/2.29.0 CPython/3.11.4 Darwin/22.6.0 OSX/13.6
                UID:GID : 501:20
             netrc file : None
           offline mode : False
```

Looking forward to your support.
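One thing that sometimes helps with this class of solver error — an assumption on my part, not a verified fix for this exact report — is forcing conda to resolve strictly for osx-arm64 in a fresh environment:

```
# Hypothetical sketch: recreate the environment with an explicit arm64 subdir
CONDA_SUBDIR=osx-arm64 conda create -n tf-metal python=3.9
conda activate tf-metal
conda install -c apple tensorflow-deps
```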
Posted by Hafizur. Last updated.
Post not yet marked as solved
14 Replies
6.4k Views
System information (script can be found below):

- MacBook Pro M1 (macOS Big Sur 11.5.1)
- TensorFlow installed from source
- TensorFlow version 2.5 with Metal support
- Python version: 3.9
- GPU model and memory: MacBook Pro M1, 16 GB
- Steps for installing TensorFlow with Metal support: https://developer.apple.com/metal/tensorflow-plugin/

I am trying to train a model on a MacBook Pro M1, but the performance is so bad that training doesn't work properly — it takes a ridiculously long time just for a single epoch.

Code needed for reproducing this behavior:

```python
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Embedding, Dense, LSTM
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Model configuration
additional_metrics = ['accuracy']
batch_size = 128
embedding_output_dims = 15
loss_function = BinaryCrossentropy()
max_sequence_length = 300
num_distinct_words = 5000
number_of_epochs = 5
optimizer = Adam()
validation_split = 0.20
verbosity_mode = 1

# Disable eager execution
tf.compat.v1.disable_eager_execution()

# Load dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_distinct_words)
print(x_train.shape)
print(x_test.shape)

# Pad all sequences (0.0 because it corresponds with <PAD>)
padded_inputs = pad_sequences(x_train, maxlen=max_sequence_length, value=0.0)
padded_inputs_test = pad_sequences(x_test, maxlen=max_sequence_length, value=0.0)

# Define the Keras model
model = Sequential()
model.add(Embedding(num_distinct_words, embedding_output_dims, input_length=max_sequence_length))
model.add(LSTM(10))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer=optimizer, loss=loss_function, metrics=additional_metrics)

# Give a summary
model.summary()

# Train the model
history = model.fit(padded_inputs, y_train, batch_size=batch_size,
                    epochs=number_of_epochs, verbose=verbosity_mode,
                    validation_split=validation_split)

# Test the model after training
test_results = model.evaluate(padded_inputs_test, y_test, verbose=False)
print(f'Test results - Loss: {test_results[0]} - Accuracy: {100*test_results[1]}%')
```

I have noticed this same problem with LSTM layers. This issue has also been reported in Keras, and they can't debug it: https://github.com/keras-team/keras/issues/15003
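For anyone trying to quantify the slowdown, a small timing sketch (mine, not from the report — `build_model` is assumed to wrap the model definition above) that measures one epoch under an explicit device scope:

```python
import time
import tensorflow as tf

def time_one_epoch(device, build_model, x, y, batch_size=128):
    """Build and fit a fresh model for one epoch on the given device."""
    with tf.device(device):
        model = build_model()
        start = time.perf_counter()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
    return time.perf_counter() - start

# Usage sketch:
# print('CPU:', time_one_epoch('/CPU:0', build_model, padded_inputs, y_train))
# print('GPU:', time_one_epoch('/GPU:0', build_model, padded_inputs, y_train))
```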
Posted by OriAlpha. Last updated.
Post not yet marked as solved
0 Replies
366 Views
Hi, following the instructions at https://developer.apple.com/metal/jax/, jax works fine on my M1 Pro — however, only in Terminal. If you run Jupyter Notebook or PyCharm, the following always defaults to CPU:

```python
from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)
```

I also noticed that if you restart the Terminal, jax defaults to CPU only. You always need to activate the jax-metal virtual environment first to get Apple Silicon's GPU to work:

```
python3 -m venv ~/jax-metal
source ~/jax-metal/bin/activate
```

Is there any way to make sure that Jupyter Notebook and other IDEs default to jax-metal? I'm currently only able to use it in Terminal, after manually activating the jax-metal virtual environment each time, which is annoying.
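A common way to make notebooks pick up a virtual environment — a sketch, assuming ipykernel is acceptable here — is to register the venv as a named Jupyter kernel and then select it from the kernel menu:

```
source ~/jax-metal/bin/activate
python -m pip install ipykernel
python -m ipykernel install --user --name jax-metal --display-name "Python (jax-metal)"
```

PyCharm can similarly be pointed at ~/jax-metal/bin/python as the project interpreter.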
Posted by GSSpython. Last updated.
Post marked as solved
3 Replies
522 Views
System information:

- macOS version: 13.4
- TensorFlow (macos) version: tf-nightly==2.14.0.dev20230616
- TensorFlow-Metal plugin version: 1.0.1

Problem description: I'm trying to compute the CTC loss using TensorFlow's tf.nn.ctc_loss on an M1 Mac, but an error is thrown indicating that no OpKernel was registered to support the CTCLossV2 operation. However, when using the CPU, or even tf.keras.backend.ctc_batch_cost, it works fine. The error stack trace is as follows:

```
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node CTCLossV2 defined at (most recent call last):
<stack traces unavailable>
No OpKernel was registered to support Op 'CTCLossV2' used by {{node CTCLossV2}} with these attrs: [ctc_merge_repeated=true, preprocess_collapse_repeated=false, ignore_longer_outputs_than_inputs=false]
Registered devices: [CPU, GPU]
Registered kernels:
  <no registered kernels>

	 [[CTCLossV2]]
	 [[ctc_loss_func/PartitionedCall]] [Op:__inference_train_function_13095]
```
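Until CTCLossV2 gets a Metal kernel, one hedged workaround sketch (argument names and shapes are assumptions, not from the post) is to pin just the loss computation to the CPU, which the post confirms works:

```python
import tensorflow as tf

# Sketch: compute only the CTC loss on CPU; the rest of the graph stays on GPU.
def ctc_loss_on_cpu(labels, logits, label_length, logit_length):
    with tf.device('/CPU:0'):
        return tf.nn.ctc_loss(labels, logits, label_length, logit_length,
                              logits_time_major=False, blank_index=-1)
```

Alternatively, tf.keras.backend.ctc_batch_cost — which the post notes already works — avoids the CTCLossV2 op entirely.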
Posted. Last updated.
Post not yet marked as solved
41 Replies
30k Views
Device: MacBook Pro 16 M1 Max, 64 GB, running macOS 12.0.1.

I tried setting up GPU-accelerated TensorFlow on my Mac using the following steps. Setup: Xcode CLI / Homebrew / Miniforge. Conda env: Python 3.9.5.

```
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
brew install libjpeg
conda install -y matplotlib jupyterlab
```

In JupyterLab, I try to execute this code:

```python
from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()
```

The code executes, but I get this warning, indicating that no GPU acceleration can be used, as it defaults to a 0 MB GPU:

```
Metal device set to: Apple M1 Max

2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
```

Does anyone have any idea how to fix this? I came across a bunch of posts around here related to the same issue, but with no solid fix. I created a new question as I found the other questions less descriptive of the issue and wanted to depict it comprehensively. Any fix would be of much help.
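In the meantime, a quick sanity-check sketch (my own) to confirm whether the Metal device actually executes work despite the "0 MB memory" line in the log:

```python
import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # expect one METAL PluggableDevice
with tf.device('/GPU:0'):
    a = tf.random.normal([2048, 2048])
    b = tf.random.normal([2048, 2048])
    c = tf.matmul(a, b)
print(c.device)  # a device string ending in GPU:0 means the op ran on the GPU
```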
Posted. Last updated.
Post marked as solved
27 Replies
17k Views
It doesn't matter whether I install miniforge or mamba, directly or through brew: when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, even with a simple sequential model, I always get this error. Is there any workaround for this? I'll appreciate any help, thanks!

```
2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90

NotFoundError                             Traceback (most recent call last)
Cell In[25], line 2
      1 model = create_model()
----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback..error_handler(*args, **kwargs)
     67 filtered_tb = _process_traceback_frames(e.traceback)
     68 # To get the full stack trace, call:
     69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
     71 finally:
     72 del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51 ctx.ensure_initialized()
---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53 inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55 if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
      app.launch_new_instance()
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance
      app.start()
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start
      self.io_loop.start()
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
      self.asyncio_loop.run_forever()
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
      self._run_once()
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
      handle._run()
    ...
    File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module>
      history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    ......
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
could not find registered platform with id: 0x28edf1f90
	 [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]
```
Posted by ppobar. Last updated.
Post marked as solved
20 Replies
36k Views
Following the instructions at https://developer.apple.com/metal/tensorflow-plugin/, I got as far as `python -m pip install tensorflow-macos`, and it responded:

```
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
```

I'd be grateful for any suggestions.
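A "from versions: none" error usually means pip's platform tags don't match any published wheel. A quick check sketch (an assumption about the cause, not a diagnosis of this particular machine):

```python
import platform
import sys

print(platform.machine())     # tensorflow-macos wheels target 'arm64' (Apple silicon)
print(platform.mac_ver()[0])  # and a sufficiently recent macOS
print(sys.version_info[:2])   # and a Python minor version the wheel supports
```

An x86_64 Python (e.g., one running under Rosetta) or an unsupported Python version would produce exactly this error.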
Posted. Last updated.
Post not yet marked as solved
7 Replies
2.2k Views
Hello, everyone! I recently purchased a MacBook Air with the M1 chip and have been using it for neural network training. While testing a VGG network on the CIFAR-10 dataset, I found the training speed too slow, so, following a recommendation, I installed tensorflow-metal for hardware acceleration. Before installing tensorflow-metal, each epoch took approximately 12 minutes, and after 5 epochs the model's accuracy reached 0.72. After installing tensorflow-metal and training again, the runtime was significantly reduced; unfortunately, after 5 epochs the accuracy remained between 0.1 and 0.2, almost equivalent to random guessing. I am puzzled as to why this is happening. After installing tensorflow-metal, are there any specific considerations to keep in mind? What changes are required in the code compared to not installing it?
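To confirm the accuracy collapse is GPU-specific without uninstalling the plugin, one sketch (mine, not from the post) is to hide the GPU for a control run — this must execute before any TensorFlow op touches the device:

```python
import tensorflow as tf

# Sketch: hide the Metal GPU so training falls back to the CPU for comparison.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.list_logical_devices())  # should now show only CPU devices
```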
Posted by Peanu11. Last updated.
Post not yet marked as solved
2 Replies
2.3k Views
Testing out https://developer.apple.com/metal/jax/, mainly to try running whisper-jax (https://github.com/sanchit-gandhi/whisper-jax/tree/main) on my M2. The jax-metal plugin seems to install without issues, and the basic test code runs fine. However, the whisper-jax code fails when trying to encode a file, with the following error:

```
error: failed to legalize operation 'mhlo.convolution'
/Users/pere/jax-metal/lib/python3.10/site-packages/whisper_jax/layers.py:1236:0: note: see current operation: %111 = "mhlo.convolution"(%110, <<UNKNOWN SSA VALUE>>) {batch_group_count = 1 : i64, dimension_numbers = #mhlo.conv<[b, 0, f]x[0, i, o]->[b, 0, f]>, feature_group_count = 1 : i64, lhs_dilation = dense<1> : tensor<1xi64>, padding = dense<1> : tensor<1x2xi64>, precision_config = [#mhlo<precision DEFAULT>, #mhlo<precision DEFAULT>], rhs_dilation = dense<1> : tensor<1xi64>, window_reversal = dense<false> : tensor<1xi1>, window_strides = dense<1> : tensor<1xi64>} : (tensor<1x3000x80xf32>, tensor<3x80x384xf32>) -> tensor<1x3000x384xf32>
```
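Until 'mhlo.convolution' can be legalized by jax-metal, one fallback sketch (assuming losing acceleration is acceptable) is to select the CPU backend before JAX is first imported:

```python
import os

# Sketch: force the CPU backend; must be set before the first `import jax`.
os.environ['JAX_PLATFORMS'] = 'cpu'

import jax
print(jax.devices())  # expect only CPU devices
```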
Posted by PerEK. Last updated.
Post not yet marked as solved
2 Replies
404 Views
I'm trying to train a trivial CNN example on the cifar10 dataset, CPU vs. GPU. Although the GPU is much faster, the accuracies and losses behave really strangely.

Accuracy and validation accuracy on CPU look as follows: [plot in original post]
Training accuracy on GPU looks like this: [plot in original post]

Package versions:

```
python            3.11.4
tensorflow        2.13.0
tensorflow-macos  2.13.0
tensorflow-metal  1.0.1
```

The code:

```python
import tensorflow as tf
# with tf.device("/cpu:0"):  # uncomment and indent the following to run on CPU
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.summary()

model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10))
model.summary()

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = model.fit(
    train_images,
    train_labels,
    epochs=20,
    batch_size=64,
    validation_data=(test_images, test_labels),
)

plt.plot(history.history["accuracy"], label="accuracy")
plt.plot(history.history["val_accuracy"], label="val_accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.ylim([0, 1])
plt.legend(loc="lower right")

test_loss, test_acc = model.evaluate(
    test_images, test_labels, batch_size=64, verbose=2
)
```

Hope this helps with reproduction.
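One more variable worth pinning down when comparing the two curves (my suggestion, not part of the original script): fix every random seed at the top of the script so the CPU and GPU runs start from identical weights and shuffles:

```python
import tensorflow as tf

# Sketch: seeds the Python, NumPy and TensorFlow RNGs in one call (TF >= 2.7).
tf.keras.utils.set_random_seed(42)
```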
Posted by styx0r. Last updated.
Post not yet marked as solved
0 Replies
356 Views
When I use TensorFlow to write a Mask R-CNN, a bus error is reported. It does not appear to be a memory issue, but a problem with the system itself. I hope it can be solved.
Posted by xulinqing. Last updated.
Post not yet marked as solved
1 Reply
445 Views
Running into a GPU-related error while working with the latest TensorFlow (2.13). Please note that the test model training provided on the tensorflow-metal page verifies that my setup works fine. PLEASE ADVISE —

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_18_device_/job:localhost/replica:0/task:0/device:GPU:0}} indices[0] = 0 is not in [0, 0)
	 [[{{node GatherV2_7}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]] [Op:IteratorGetNext] name:
```

The above are the last lines of the error message. The full log from the model training script is at https://stackoverflow.com/questions/77076602/training-custom-data-set-model-using-mask-rcnn-inception-from-tensorflow-model-z — I went to SO since I can't share the full log here due to length restrictions. Please help.
Posted. Last updated.
Post not yet marked as solved
1 Reply
606 Views
I am just starting to learn neural networks. If I run my code and try to fit a simple trigonometric function, the model builds a good-looking function. If I pip install tensorflow-metal and run it again, I get a straight line that doesn't resemble the non-linear function at all; if I uninstall tensorflow-metal, everything works again. This suggests there is something wrong with Metal. Any help would be appreciated — I'd like to use Metal acceleration for the next steps in my project. Thank you.
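A minimal repro sketch along the lines described — the architecture and data here are assumptions, since the poster's code is not shown:

```python
import numpy as np
import tensorflow as tf

# Sketch: fit sin(x) with a small MLP; with tensorflow-metal installed, the
# poster reports this kind of fit degenerating to a straight line.
x = np.linspace(-np.pi, np.pi, 1000, dtype='float32')[:, None]
y = np.sin(x)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='tanh', input_shape=(1,)),
    tf.keras.layers.Dense(64, activation='tanh'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=200, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))  # an MSE well below 1e-2 indicates a real fit
```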
Posted by Yash11. Last updated.
Post not yet marked as solved
0 Replies
289 Views
I very recently noticed that installing the tensorflow-metal plugin leads a CNN model (loaded from a Keras Tuner checkpoint) to give completely different and unreliable predictions, shown below:

Without tensorflow-metal: [plot in original post]
With tensorflow-metal: [plot in original post]

The same code and dataset, with the only difference being tensorflow-metal. I also tried both tensorflow-macos 2.12 and 2.13 — same issue.

Other info:

- MacBook Air M2 with 8-core M2 chip, macOS Ventura 13.5.1 (22G90)
- Python 3.9.6, installed by Anaconda 3

The source code is related to an unpublished paper and should be publicly available by Oct. 2023; I will upload the code to recreate the issue as soon as possible.
Posted. Last updated.