tensorflow-metal


TensorFlow accelerates machine learning model training with Metal on Mac GPUs.

tensorflow-metal Documentation

Posts under tensorflow-metal tag

176 Posts
Post not yet marked as solved
0 Replies
2 Views
The JAX ml_dtypes module was recently updated to 0.3.0; as part of this change, the 'float8_e4m3b11' dtype was deprecated, and newer versions of JAX reflect this change. The new ml_dtypes version now appears to be incompatible with JAX v0.4.11. Since jax-metal currently requires JAX v0.4.11, perhaps the dependency list should be updated to pin ml_dtypes==0.2.0 in order to prevent the following import error:

AttributeError: module 'ml_dtypes' has no attribute 'float8_e4m3b11'

This essentially makes JAX unusable on import, and it appears to be fixed by pip install ml_dtypes==0.2.0.
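As a quick sanity check of the pin described above (a sketch; it assumes ml_dtypes exposes __version__, which recent releases do):

import ml_dtypes
import jax.numpy as jnp

# After `pip install ml_dtypes==0.2.0`, the deprecated dtype name resolves again
# and importing/using JAX no longer fails.
print(ml_dtypes.__version__)        # expected: 0.2.0
print(ml_dtypes.float8_e4m3b11)     # raises AttributeError on ml_dtypes 0.3.0
print(jnp.ones(3))                  # JAX is usable again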
Posted by tdgfrost. Last updated.
Post not yet marked as solved
0 Replies
68 Views
I'm trying to train a trivial example of a CNN on the cifar10 dataset, on CPU vs. GPU. Although the GPU is much faster, the accuracies and losses behave really strangely.

[Figure: accuracy and validation accuracy when training on CPU]
[Figure: training accuracy on GPU]

Package versions:
python = "3.11.4"
tensorflow = "2.13.0"
tensorflow-macos = "2.13.0"
tensorflow-metal = "1.0.1"

The code:

import tensorflow as tf
# with tf.device("/cpu:0"):  # uncomment and indent the following to run on CPU
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

(train_images, train_labels), (
    test_images,
    test_labels,
) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.summary()

model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10))
model.summary()

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = model.fit(
    train_images,
    train_labels,
    epochs=20,
    batch_size=64,
    validation_data=(test_images, test_labels),
)

plt.plot(history.history["accuracy"], label="accuracy")
plt.plot(history.history["val_accuracy"], label="val_accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.ylim([0, 1])
plt.legend(loc="lower right")

test_loss, test_acc = model.evaluate(
    test_images, test_labels, batch_size=64, verbose=2
)

Hope this helps for reproduction.
Posted by styx0r. Last updated.
Post not yet marked as solved
0 Replies
75 Views
When I use TensorFlow to write Mask R-CNN, a bus error is reported. It is not a matter of memory, but a problem with the system itself. I hope it can be solved.
Posted by xulinqing. Last updated.
Post not yet marked as solved
1 Replies
88 Views
Running into a GPU-related error while working with the latest TensorFlow (2.13). Please note that the test model training provided on the tensorflow-metal page, used to verify my setup, works fine. PLEASE ADVISE.

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_18_device_/job:localhost/replica:0/task:0/device:GPU:0}} indices[0] = 0 is not in [0, 0)
	 [[{{node GatherV2_7}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]] [Op:IteratorGetNext] name:

The above are the last lines of the error message; the full log from the model training script is at https://stackoverflow.com/questions/77076602/training-custom-data-set-model-using-mask-rcnn-inception-from-tensorflow-model-z I went to SO since I can't share the full log here due to length restrictions. Please help.
Posted. Last updated.
Post not yet marked as solved
1 Replies
182 Views
I am just starting to learn neural networks. If I run my code and try to fit a simple trigonometric function, the model builds a good-looking function. If I pip install tensorflow-metal and run it again, I get a straight line that does not resemble the non-linear function at all. If I uninstall tensorflow-metal, everything works again, which suggests there is something wrong with Metal. Any help would be appreciated; I would like to use Metal acceleration for the next steps in my project. Thank you.
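Since the original code isn't shown, here is a minimal sketch of the kind of experiment described; the architecture, epoch count, and data range are assumptions, not the poster's actual code:

import numpy as np
import tensorflow as tf

# Fit y = sin(x) with a small MLP. The reported symptom: with tensorflow-metal
# installed the fit degenerates to a straight line; without it, a sine-like curve.
x = np.linspace(-np.pi, np.pi, 1000, dtype="float32").reshape(-1, 1)
y = np.sin(x)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, verbose=0)

print(model.predict(x[:5], verbose=0))  # compare against np.sin(x[:5])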
Posted by Yash11. Last updated.
Post not yet marked as solved
1 Replies
171 Views
MacBook Pro M2 Max / 64 GB / macOS 13.2.1 (22D68)

import tensorflow as tf

def runMnist(device='/device:CPU:0'):
    with tf.device(device):
        # tf.config.set_default_device(device)
        mnist = tf.keras.datasets.mnist
        (x_train, y_train), (x_test, y_test) = mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(10)
        ])
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=10)

runMnist(device='/device:CPU:0')
runMnist(device='/device:GPU:0')
Posted. Last updated.
Post not yet marked as solved
0 Replies
116 Views
I very recently noticed that installing the tensorflow-metal plugin leads to a CNN model (loaded from a Keras-tuner checkpoint) giving completely different and unreliable predictions, shown below:

[Image: predictions without tensorflow-metal]
[Image: predictions with tensorflow-metal]

The same code and dataset were used, with the only difference being tensorflow-metal. I also tried both tensorflow-macos 2.12 and 2.13, with the same issue.

Other info:
MacBook Air M2 with 8-core M2 chip, macOS Ventura 13.5.1 (22G90)
Python 3.9.6 installed by Anaconda 3

The source code is related to an unpublished paper and should be publicly available by Oct. 2023; I will upload the code to recreate the issue as soon as possible.
Posted. Last updated.
Post not yet marked as solved
0 Replies
87 Views
For a tensorflow layer I need a multi-column argsort, so I implemented the following function:

import tensorflow as tf

def multi_column_argsort(tensor, columns_order):
    sorted_indices = tf.range(start=0, limit=tf.shape(tensor)[0], dtype=tf.int32)
    for col in reversed(columns_order):
        col_vals = tf.gather(tensor[:, col], sorted_indices)
        col_argsort = tf.argsort(col_vals, stable=True)
        print("Column:", col)
        print("Column Values:", col_vals.numpy())
        print("Col Argsort:", col_argsort.numpy())
        print("Sorted Indices Before:", sorted_indices.numpy())
        sorted_indices = tf.gather(sorted_indices, col_argsort)
        print("Sorted Indices After:", sorted_indices.numpy())
        print("---")
    return sorted_indices

After debugging this function for a while, I found out that it was not sorting the 3 columns as expected, because the argsorts were not stable, i.e. they did not respect the previous sorting. To test this I used the following example:

points = tf.constant([[1.1, 2.0, 0.1],
                      [1.1, 1.0, 0.2],
                      [2.2, 1.0, 0.1],
                      [1.1, 2.0, 0.2],
                      [1.1, 1.0, 0.1]])
columns_order = [0, 1, 2]
sorted_indices = multi_column_argsort(points, columns_order)
print("Final Sorted Indices:", sorted_indices.numpy())

With the following result:

Column: 2
Column Values: [0.1 0.2 0.1 0.2 0.1]
Col Argsort: [2 4 0 3 1]
Sorted Indices Before: [0 1 2 3 4]
Sorted Indices After: [2 4 0 3 1]
---
Column: 1
Column Values: [1. 1. 2. 2. 1.]
Col Argsort: [1 4 0 3 2]
Sorted Indices Before: [2 4 0 3 1]
Sorted Indices After: [4 1 2 3 0]
---
Column: 0
Column Values: [1.1 1.1 2.2 1.1 1.1]
Col Argsort: [3 1 4 0 2]
Sorted Indices Before: [4 1 2 3 0]
Sorted Indices After: [3 1 0 4 2]
---
Final Sorted Indices: [3 1 0 4 2]

which is obviously wrong at every step. I tested the same code in a Colab environment and the result was as expected:

Column: 2
Column Values: [0.1 0.2 0.1 0.2 0.1]
Col Argsort: [0 2 4 1 3]
Sorted Indices Before: [0 1 2 3 4]
Sorted Indices After: [0 2 4 1 3]
---
Column: 1
Column Values: [2. 1. 1. 1. 2.]
Col Argsort: [1 2 3 0 4]
Sorted Indices Before: [0 2 4 1 3]
Sorted Indices After: [2 4 1 0 3]
---
Column: 0
Column Values: [2.2 1.1 1.1 1.1 1.1]
Col Argsort: [1 2 3 4 0]
Sorted Indices Before: [2 4 1 0 3]
Sorted Indices After: [4 1 0 3 2]
---
Final Sorted Indices: [4 1 0 3 2]

This is correct and consistent with the documentation.

My environment specs:
Apple M2 Max, 96 GB
macOS Ventura 13.4.1 (c) (22F770820d)
tensorflow==2.13.0rc1
tensorflow-datasets==4.9.2
tensorflow-estimator==2.13.0rc0
tensorflow-macos==2.13.0rc1
tensorflow-metadata==1.14.0
tensorflow-metal==1.0.1
Posted. Last updated.
Post not yet marked as solved
4 Replies
830 Views
I initially raised this issue in the tensorflow forum, and they directed me back here since this is a tf-macos specific problem [see https://github.com/tensorflow/tensorflow/issues/60673].

When calling Model.compile() with the AdamW optimizer, a warning is thrown saying that v2.11+ optimizers have a known slowdown on M1/M2 devices, and so the backend attempts to fall back to a legacy version. However, no legacy version of the AdamW optimizer exists. In a previous tf-macos version, 2.12, this led to an error during Model.compile() [see issue https://github.com//issues/60652 and https://developer.apple.com/forums/thread/729732]. In the current nightly, this error is not thrown; however, after calling model.compile(), the attribute model.optimizer is set to the string 'adamw' instead of an optimizer object. Later, when we call model.fit(), this leads to an AttributeError, because model.optimizer.minimize() does not exist when model.optimizer is a string.

Expected behaviour: correctly compile the model with either a v2.11+ optimizer without the slowdown, or a legacy-compatible implementation of the AdamW optimizer. The model will then train correctly with a valid AdamW optimizer when calling model.fit().

Note: the warning message suggests using the optimizer located at tf.keras.optimizers.legacy.AdamW, but this does not exist.

It would be nice to be able to either use modern optimizers, or to have a legacy-compatible version of AdamW, since weight decay is an important tool in modern ML research and currently cannot be used on Mac.

Standalone code to reproduce the issue:

##===========##
##  Imports  ##
##===========##

import sys
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import AdamW

##===================##
##  Report versions  ##
##===================##
#
# Expected outputs:
# Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ]
# TF version is: 2.14.0-dev20230523
# Numpy version is: 1.23.2
#

print(f"Python version is: {sys.version}")
print(f"TF version is: {tf.__version__}")
print(f"Numpy version is: {np.__version__}")

##==============================##
##  Create a very simple model  ##
##==============================##
#
# Expected outputs:
# Model: "model_1"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  Layer_in (InputLayer)       [(None, 2)]               0
#
#  Layer_hidden (Dense)        (None, 10)                30
#
#  Layer_out (Dense)           (None, 2)                 22
#
# =================================================================
# Total params: 52 (208.00 Byte)
# Trainable params: 52 (208.00 Byte)
# Non-trainable params: 0 (0.00 Byte)
# _________________________________________________________________
#

x_in = Input(2, dtype=tf.float32, name="Layer_in")
x = x_in
x = Dense(10, dtype=tf.float32, name="Layer_hidden", activation="relu")(x)
x = Dense(2, dtype=tf.float32, name="Layer_out", activation="linear")(x)
model = Model(x_in, x)
model.summary()

##===================================================##
##  Compile model with MSE loss and AdamW optimizer  ##
##===================================================##
#
# Expected outputs:
# WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`.
# WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`.
#

model.compile(
    loss      = "mse",
    optimizer = AdamW(learning_rate=1e-3, weight_decay=1e-2)
)

##===========================##
##  Generate some fake data  ##
##===========================##
#
# Expected outputs:
# X shape is (100, 2), Y shape is (100, 2)
#

dataset_size = 100
X = np.random.normal(size=(dataset_size, 2))
X = tf.constant(X, dtype=tf.float32)
Y = np.random.normal(size=(dataset_size, 2))
Y = tf.constant(Y, dtype=tf.float32)
print(f"X shape is {X.shape}, Y shape is {Y.shape}")

##===================================##
##  Fit model to data for one epoch  ##
##===================================##
#
# Expected outputs:
# ---------------------------------------------------------------------------
# AttributeError                            Traceback (most recent call last)
# Cell In[9], line 51
#       1 ##===================================##
#       2 ##  Fit model to data for one epoch  ##
#       3 ##===================================##
#    (...)
#      48 #  • mask=None
#      49 #
# ---> 51 model.fit(X, Y, epochs=1)
#
# File ~/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
#      67 filtered_tb = _process_traceback_frames(e.__traceback__)
#      68 # To get the full stack trace, call:
#      69 # `tf.debugging.disable_traceback_filtering()`
# ---> 70 raise e.with_traceback(filtered_tb) from None
#      71 finally:
#      72     del filtered_tb
#
# File /var/folders/6_/gprzxt797d5098h8dtk22nch0000gn/T/__autograph_generated_filezzqv9k36.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
#      13 try:
#      14     do_return = True
# ---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
#      16 except:
#      17     do_return = False
#
# AttributeError: in user code:
#
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1338, in train_function *
#         return step_function(self, iterator)
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1322, in step_function **
#         outputs = model.distribute_strategy.run(run_step, args=(data,))
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1303, in run_step **
#         outputs = model.train_step(data)
#     File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1084, in train_step
#         self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
#
#     AttributeError: 'str' object has no attribute 'minimize'
#

model.fit(X, Y, epochs=1)
Posted by smenary. Last updated.
Post not yet marked as solved
6 Replies
1.4k Views
We ran into an issue where a more complex model fails to converge on the M1 Max GPU, while it converges on its CPU and on non-M1-based machines. The performance is the same for CPU and GPU for models with a single RNN, but once we use two RNNs, the GPU fails to converge.

That said, the example below is based on nonsensical data for the model architecture used, but we can observe here the same behavior as the one we observe in our production models (which, for obvious reasons, we cannot share). Mainly:

- the loss goes down to the bottom of the e-06 precision range in all cases except when we use two RNNs on GPU. During training we often see the e-07 precision level; for the double-RNN-with-GPU condition, the results do not go that low, sometimes only reaching the e-05 level.
- for our production data we see that the double RNN with GPU results in a loss of 1.0 that basically stays the same from the first epoch, while the other conditions often reach the 0.2 level with a clear learning curve.
- in the production model, increasing the LSTM_Cells number made the divergence more visible (in this synthetic data it does not happen).
- the more complex the model is (after the RNN layers), the more visible the issue.

Suspected issues:

- different precision used in CPU and GPU training; we had to decrease the data values a lot to make the effect visible (if you work with raw data, all approaches seem to produce comparable results).
- somehow the vanishing gradient problem is more pronounced on GPU, as indicated by worse performance as the complexity of the model increases.

Please let me know if you need any further details.

Software stack: macOS 12.1, tf 2.7, metal 0.3; also tested on tf 2.8.

Sample syntax:

# TEST CONDITIONS
# conditions with issue: gpu = 1, model_size = 2
gpu = 1         # 0 CPU, 1 GPU
model_size = 2  # 1 single RNN, 2 double RNN

# PARAMETERS
LSTM_Cells = 64
epochs = 300
batch = 128

import numpy as np
import pandas as pd
import sys
from sklearn import preprocessing

#"""
if 'tensorflow' in sys.modules:
    print("tensorflow uploaded")
    del sys.modules["tensorflow"]
    # del tf
    import tensorflow as tf
else:
    print("tensorflow not uploaded")
    import tensorflow as tf

if gpu == 1:
    pass
else:
    tf.config.set_visible_devices([], 'GPU')

# print("GPUs:", tf.config.list_physical_devices('GPU'))
print("GPUs:", tf.config.list_logical_devices('GPU'))
# print("CPUs:", tf.config.list_physical_devices('CPU'))
print("CPUs:", tf.config.list_logical_devices('CPU'))
#"""

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Displacement', 'Horsepower', 'Weight']
dataset = pd.read_csv(url, names=column_names, na_values='?',
                      comment='\t', sep=' ', skipinitialspace=True).dropna()

scaler = preprocessing.StandardScaler().fit(dataset)
X_scaled = scaler.transform(dataset)
X_scaled = X_scaled * 0.001

# Large values
# x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1, 2, 2)
# y_train = np.array(dataset[['MPG', 'Displacement']]).reshape(-1, 2, 2)

# Small values
x_train = np.array(X_scaled[:, 2:]).reshape(-1, 2, 2)
y_train = np.array(X_scaled[:, :2]).reshape(-1, 2, 2)

# print(dataset)
print(x_train.shape)
print(y_train.shape)
# print(weight.shape)  # 'weight' is not defined in this snippet

train_data = tf.data.Dataset.from_tensor_slices((x_train[:, :, :8], y_train)).cache().shuffle(x_train.shape[0]).batch(batch).repeat().prefetch(tf.data.experimental.AUTOTUNE)

if model_size == 2:
    #"""
    # MINIMAL, NOT WORKING
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    encoder_l2 = tf.keras.layers.LSTM(LSTM_Cells, return_state=True)
    encoder_l2_outputs = encoder_l2(encoder_l1_outputs[0])
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l2_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(22)(flat)
    reshape_output = tf.keras.layers.Reshape([2, 2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)
    #"""
else:
    #"""
    # WORKING
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l1_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(22)(flat)
    reshape_output = tf.keras.layers.Reshape([2, 2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)
    #"""

print(model.summary())

loss_tf = tf.keras.losses.MeanSquaredError()
model.compile(optimizer='adam', loss=loss_tf, run_eagerly=True)
model.fit(train_data, epochs=epochs, steps_per_epoch=3)
Posted by sebtac. Last updated.
Post marked as solved
5 Replies
1.2k Views
I'm running an example from the TF site and getting different results from CPU and GPU. The results from the GPU are obviously wrong (second image). Why? If I execute the code under with tf.device('/cpu:0'), it works as expected, but slower. It's sufficient to execute these lines on the CPU to fix the issue:

with tf.device('/cpu:0'):
    real_output = discriminator(images, training=True)
    fake_output = discriminator(generated_images, training=True)

Source code: https://www.tensorflow.org/tutorials/generative/dcgan
My complete results: https://disk.yandex.ru/d/E-hU5dpffOmkLg
Posted. Last updated.
Post not yet marked as solved
1 Replies
313 Views
Hello, I am trying to use the Apple GPU for a machine learning task, using "mps" as the device, but it is not working. I am using the stable version of PyTorch. How can I use the MacBook GPU for machine learning tasks?
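Since no code was included, here is a minimal sketch of the usual way to select the MPS device in PyTorch; the availability check and device string are standard PyTorch API, and the tensor operation is just an illustration:

import torch

# Prefer the Metal Performance Shaders backend when available, else fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print("Using device:", device)

x = torch.randn(1024, 1024, device=device)
y = x @ x  # executed on the GPU when device is "mps"
print(y.sum().item())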
Posted by mishrr. Last updated.
Post not yet marked as solved
6 Replies
482 Views
Hello, everyone! I recently purchased a MacBook Air with the M1 chip and used it for neural network training. While testing the VGG neural network on the CIFAR-10 dataset, I found the training speed to be too slow. Following a recommendation, I installed tensorflow-metal for hardware acceleration. Prior to installing tensorflow-metal, each epoch took approximately 12 minutes, and after 5 epochs the model's accuracy reached 0.72. However, after installing tensorflow-metal and running the training again, the runtime was significantly reduced. Unfortunately, after 5 epochs the accuracy remained between 0.1 and 0.2, almost equivalent to random guessing. I am puzzled as to why this is happening. After installing tensorflow-metal, are there any specific considerations to keep in mind? What changes are required in the code compared to not installing it?
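As a hedged diagnostic (not a fix), the same script can be forced onto the CPU without uninstalling the plugin, which isolates whether the accuracy drop tracks the Metal GPU path:

import tensorflow as tf

# Hide the Metal GPU so the otherwise-identical training run uses the CPU.
# Must be called before any GPU op runs.
tf.config.set_visible_devices([], 'GPU')
print("Visible devices:", tf.config.get_visible_devices())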
Posted by Peanu11. Last updated.
Post not yet marked as solved
0 Replies
173 Views
Hi everyone, I'm trying to test some functionality of jax-metal and got this error. Any help please?

import jax
import jax.numpy as jnp
import numpy as np

def f(x):
    y1 = x + x * x + 3
    y2 = x * x + x * x.T
    return y1 * y2

x = np.random.randn(3000, 3000).astype('float32')
jax_x_gpu = jax.device_put(jnp.array(x), jax.devices('METAL')[0])
jax_x_cpu = jax.device_put(jnp.array(x), jax.devices('cpu')[0])

jax_f_gpu = jax.jit(f, backend='METAL')
jax_f_gpu(jax_x_gpu)

---------------------------------------------------------------------------
XlaRuntimeError                           Traceback (most recent call last)
Cell In[1], line 17
     13 jax_x_cpu = jax.device_put(jnp.array(x), jax.devices('cpu')[0])
     15 jax_f_gpu = jax.jit(f, backend='METAL')
---> 17 jax_f_gpu(jax_x_gpu)

    [... skipping hidden 5 frame]

File ~/.virtualenvs/jax-metal/lib/python3.11/site-packages/jax/_src/pjit.py:817, in _create_sharding_with_device_backend(device, backend)
    814 elif backend is not None:
    815     assert device is None
    816     out = SingleDeviceSharding(
--> 817         xb.get_backend(backend).get_default_device_assignment(1)[0])
    818 return out

XlaRuntimeError: UNIMPLEMENTED: DefaultDeviceAssignment not supported for Metal Client.
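One possible workaround, offered as an unverified sketch: the error is raised while resolving a default device for the explicit backend='METAL' argument, so jitting without a backend and letting dispatch follow the committed device of the input may avoid that code path (an assumption about jax-metal, not a confirmed fix):

import jax
import jax.numpy as jnp
import numpy as np

def f(x):
    y1 = x + x * x + 3
    y2 = x * x + x * x.T
    return y1 * y2

x = np.random.randn(3000, 3000).astype('float32')

# Commit the array to the Metal device, then jit without backend=...;
# dispatch should follow the argument's device.
jax_x_gpu = jax.device_put(jnp.array(x), jax.devices('METAL')[0])
jax_f = jax.jit(f)
print(jax_f(jax_x_gpu).sum())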
Posted. Last updated.
Post not yet marked as solved
11 Replies
7.1k Views
I'm using a MacBook Air 13" and pursuing an Artificial Intelligence course. I'm facing a huge problem with Jupyter Notebook after installing tensorflow: the kernel keeps dying, and I have literally tried the solutions in every article/resource on Google. Nothing seems to fix the issue. It began only when I started to run code for a Convolutional Neural Network. Please help me fix this issue and understand why it's not getting fixed. At the moment, I can only think of trading my MacBook for a Windows laptop, but that would be very hard as I have never had hands-on experience with a Windows laptop. Hope to hear back soon. Thanks, Keshav Lal Seth
Posted. Last updated.
Post not yet marked as solved
0 Replies
133 Views
I have probably found a bug when indexing tensors with tensorflow-metal. It is best demonstrated by the following minimal example:

import tensorflow as tf
print(tf.constant([[1, 2], [3, 4]], dtype=tf.float32)[..., :2, 1])

The expected result is [2, 4] (i.e. the second column of the matrix), which is what I get when tensorflow-metal is not installed (and on other non-Apple machines), but with tensorflow-metal I get [2, 2] (i.e. the first element of the column is repeated; this also happens if there are more than two rows). The following conditions seem to be necessary in order to trigger this behavior:

- dtype must be float32; it works correctly with float64, int32 and int64.
- the sequence of ellipsis (for batch axes), stride (for row), index (for column) is critical; i.e. it does work correctly when the column is also a stride, and it does work if the row is a single number or the "full" slice :.
- the indexed tensor does not actually have batch axes (the ellipsis is there because it could have some).

The original context: I have a function that gets a tensor with 0 or more batch axes containing 4x4 homogeneous matrices, from which I want to extract the translation, i.e. the first three rows of the last column, which leads to [..., :3, 3].

Versions:
python 3.9.6 (system)
tensorflow-macos 2.13.0
tensorflow-metal 1.0.1
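Based on the post's own observation that the bug needs the column to be a plain index, one unconfirmed workaround sketch is to take the column as a length-1 slice (a stride) and squeeze the trailing axis:

import tensorflow as tf

m = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)

# Index the column as a slice (1:2), which the post reports as behaving
# correctly, then drop the resulting axis of size 1.
col = tf.squeeze(m[..., :2, 1:2], axis=-1)
print(col)  # expected: [2. 4.]

# The original translation extraction would analogously become:
# translation = tf.squeeze(matrices[..., :3, 3:4], axis=-1)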
Posted by arha. Last updated.
Post not yet marked as solved
4 Replies
707 Views
Hi, I've found a memory leak when using the tensorflow-metal plugin for running a deep learning model on a Mac with the M1 chip. Here are the details of my system:

System information
macOS version: 13.4
TensorFlow (macos) versions: 2.12.0, 2.13.0-rc1, tf-nightly==2.14.0.dev20230616
tensorflow-metal plugin versions: 0.8, 1.0.0, 1.0.1

Model details
I've implemented a custom model architecture using TensorFlow's Keras API. The model has a dynamic input, whose images I resize in a Resizing layer. The data is passed to the model through a data generator class, using model.fit().

Problem description
When I train this model using the GPU on the M1 Mac, I observe a continuous increase in memory usage, leading to a memory leak. This memory increase is more prominent with larger image inputs. For smaller or average-sized images (1024x128), the increase is smaller but still continuous, leading to a memory leak after several epochs. On the other hand, when I switch to using the CPU for training (tf.config.set_visible_devices([], 'GPU')), the memory leak is resolved and I observe normal memory usage.

In addition, I've tested the model with different image sizes and various layer configurations; the memory leak appears to be present only when using the GPU. I hope this information is helpful in identifying and resolving the issue. If you need any further details, please let me know. The project code is private, but I can try to provide pseudocode if necessary.
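Since the project code is private, here is a hedged sketch of how the leak could be quantified per epoch; psutil and the callback are illustrative tooling choices, not part of the original report:

import os

import psutil
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    # Logs the process's resident set size (RSS) after every epoch,
    # which makes continuous memory growth visible across a training run.
    def on_epoch_end(self, epoch, logs=None):
        rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1e6
        print(f"epoch {epoch}: RSS = {rss_mb:.1f} MB")

# Usage (hypothetical generator and model names):
# model.fit(data_generator, epochs=10, callbacks=[MemoryLogger()])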
Posted. Last updated.
Post not yet marked as solved
38 Replies
24k Views
Device: MacBook Pro 16 M1 Max, 64GB, running macOS 12.0.1. I tried setting up GPU-accelerated TensorFlow on my Mac using the following steps:

Setup: Xcode CLI / Homebrew / Miniforge
Conda env: Python 3.9.5

conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
brew install libjpeg
conda install -y matplotlib jupyterlab

In Jupyter Lab, I try to execute this code:

from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()

The code executes, but I get this warning, indicating no GPU acceleration can be used, as it defaults to a 0MB GPU.

Error:
Metal device set to: Apple M1 Max
2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Does anyone have any idea how to fix this? I came across a bunch of posts around here related to the same issue, but with no solid fix. I created a new question as I found the other questions less descriptive of the issue and wanted to depict it comprehensively. Any fix would be of much help.
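As a hedged diagnostic (not a fix): despite the "0 MB memory" line, one can check whether the Metal device is registered and whether ops are actually placed on it:

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # should list the METAL PluggableDevice

# Log device placement for a small op pinned to the GPU.
tf.debugging.set_log_device_placement(True)
with tf.device('/GPU:0'):
    print(tf.reduce_sum(tf.random.normal((1000, 1000))))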
Posted. Last updated.
Post marked as solved
24 Replies
13k Views
It doesn't matter whether I install miniforge or mamba, directly or through brew: when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, even with a simple sequential model, I always get this error. Is there any workaround for this? I'll appreciate any help, thanks!

2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90

NotFoundError                             Traceback (most recent call last)
Cell In[25], line 2
      1 model = create_model()
----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67 filtered_tb = _process_traceback_frames(e.__traceback__)
     68 # To get the full stack trace, call:
     69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51     ctx.ensure_initialized()
---> 52     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                         inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55     if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance
    app.start()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start
    self.io_loop.start()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
    self.asyncio_loop.run_forever()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
    handle._run()
  ...
  File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module>
    history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
    tmp_logs = self.train_function(iterator)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
    return step_function(self, iterator)
  ...
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
    outputs = model.train_step(data)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
    self.apply_gradients(grads_and_vars)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
    return super().apply_gradients(grads_and_vars, name=name)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
    iteration = self._internal_apply_gradients(grads_and_vars)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
    return tf.__internal__.distribute.interim.maybe_merge_call(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
    distribution.extended.update(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
    return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_4'
could not find registered platform with id: 0x28edf1f90
	 [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]
Posted by ppobar. Last updated.