TensorFlow with Metal starts giving wrong results after upgrading macOS from 12.0.1 to 12.1

After installing the tensorflow-metal PluggableDevice according to "Getting Started with tensorflow-metal PluggableDevice", I tested this DCGAN example: https://www.tensorflow.org/tutorials/generative/dcgan. Everything was working perfectly until I decided to upgrade macOS from 12.0.1 to 12.1. Before the upgrade, the final result after 50 epochs looked like picture1; after the upgrade, it looks like picture2.

I am using:

  • TensorFlow 2.7.0
  • tensorflow-metal-0.3.0
  • Python 3.9

I hope this question will also help Apple to improve Metal PluggableDevice. I can't wait to use it in my research.

  • I upgraded to 12.1 today. I just launched a DCGAN and I'll let you know. But I have another model in training (an autoencoder) and haven't noticed any difference since yesterday.


Replies

I'm still on epoch 5, on a MacBook Air M1 2020, but it looks fine to me so far. My other training runs work just fine too. Looks like you just got bad luck on this run? What about the other intermediate results? Do they all look bad?

Edit: I also get some very bad results sometimes, weird. Is there a problem with random number generation? I have a model that heavily uses random.uniform, I'll check.

Edit again: I need to double-check, but random number generation is broken in some situations.

  • I have more now. I get particularly bad results on some epochs, just like yours. Weird.

  • Thank you very much for such a quick reply. That's very strange, especially as it works fine on macOS 12.0.1.


I wrote a minimal test case. This used to generate two different series:

import tensorflow as tf

x = tf.random.uniform((10,))
y = tf.random.uniform((10,))

tf.print(x)
tf.print(y)
Output (both series are identical):
[0.178906798 0.8810848 0.384304762 ... 0.162458301 0.64780426 0.0123682022]
[0.178906798 0.8810848 0.384304762 ... 0.162458301 0.64780426 0.0123682022]

It works fine on Colab, and it also works fine if I disable the GPU with:

tf.config.set_visible_devices([], 'GPU')
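
To check for the repetition without comparing printed values by eye, something like the following sketch may help. It is only an illustration: `draws_repeat` and `check_tf_uniform` are hypothetical helper names, not part of TensorFlow, and the TensorFlow import is deferred so the comparison logic can be tried on its own.

```python
def draws_repeat(sample, n_draws=3):
    """Return True if any two consecutive draws from `sample` are identical.

    `sample` is any zero-argument callable returning a sequence of numbers.
    """
    draws = [tuple(sample()) for _ in range(n_draws)]
    return any(a == b for a, b in zip(draws, draws[1:]))


def check_tf_uniform(shape=(10,), n_draws=3):
    """Hypothetical wrapper: run the check against tf.random.uniform."""
    # Imported lazily so draws_repeat can be used without TensorFlow installed.
    import tensorflow as tf

    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
    repeated = draws_repeat(
        lambda: tf.random.uniform(shape).numpy().ravel().tolist(), n_draws
    )
    print("Consecutive draws identical:", repeated)
    return repeated
```

Calling `check_tf_uniform()` once with the GPU visible and once in a fresh interpreter after `tf.config.set_visible_devices([], 'GPU')` should show whether the repetition is tied to the Metal device on a given setup.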

WORKAROUND: draw from an explicit tf.random.Generator instead of the global tf.random functions:

g = tf.random.Generator.from_non_deterministic_state()
x = g.uniform((10,))
y = g.uniform((10,))
tf.print(x)
tf.print(y)
  • Thanks a lot for the workaround, but I hope the next versions of the Metal PluggableDevice will address this issue.


I have the same problem with tensorflow-metal-0.3.0 and Python 3.9 running DCGAN. The solutions converge to an almost identical picture that does not resemble a digit. I have tried several times with up to 100 epochs. It never worked correctly. I have a MacBook Pro M1 2020 running macOS 12.1. The problem seems to be specific to version 12.1 of the operating system.

  • yes. tf.random is broken on 12.1


This issue has been addressed and fixed in tensorflow-metal==0.5.0.

  • Weird, I was already using tensorflow-metal==0.5.0 but it still happens.

  • I agree with davystrong, this still happens to me


For me it still occurs on Monterey 12.3.1 with the newest versions:

tensorflow-metal==0.5.0
tensorflow-macos==2.9.2
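
Since several replies disagree about whether 0.5.0 fixes the problem, it may be worth confirming which versions the active Python environment actually loads. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper name, and the package list is an assumption based on the names mentioned in this thread.

```python
import platform
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorflow-macos", "tensorflow-metal")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # platform.mac_ver()[0] is an empty string on non-macOS systems.
    print("macOS:", platform.mac_ver()[0] or "not macOS")
    print("Python:", platform.python_version())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a report makes it easier to compare setups where the issue does and does not reproduce.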

For example, this code still always prints the same values:

import tensorflow as tf

class CustomLayer(tf.keras.layers.Layer):
    """Pass-through layer that prints one random scalar per call."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, x, training=None):
        # A fresh value is expected on every call; the reported bug is that it repeats.
        a = tf.random.uniform([])
        tf.print(a)
        return x

# Use only the first 10 MNIST samples to keep the reproduction fast.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test, y_train, y_test = x_train[:10], x_test[:10], y_train[:10], y_test[:10]

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    CustomLayer(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=1)

I am new to deep learning, so at first I thought it must be an error in my code. After testing with a code example that I know works, I discovered I have an issue very similar to this one. The produced images are nonsensical, often just some noise centred around the middle of the image. I am using the latest version of TensorFlow (2.9.2), the Metal plugin (0.5.0), and macOS Monterey (12.3).

The code I used is here: https://github.com/PacktPublishing/Deep-Learning-with-TensorFlow-2-and-Keras/blob/master/Chapter%206/VanillaGAN.ipynb. Below are images of the result using the Metal plugin at epochs 1 and 5 (higher epoch counts have also been tested).

Running this code in an environment using standard TensorFlow (without the macOS/Metal plugin) does not produce the same error. It also works fine on Google Colab.