Performance issue on MacBook Pro M1

System information

  • Script can be found below
  • MacBook Pro M1 (macOS Big Sur (11.5.1))
  • TensorFlow installed from (source)
  • TensorFlow version (2.5 version) with Metal Support
  • Python version: 3.9
  • GPU model and memory: MacBook Pro M1 and 16 GB

Steps for installing TensorFlow with Metal support: https://developer.apple.com/metal/tensorflow-plugin/

I am trying to train a model on a MacBook Pro M1, but performance is very bad and training doesn't work properly. A single epoch takes a ridiculously long time.

Code needed for reproducing this behavior.

import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Embedding, Dense, LSTM
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Model configuration
additional_metrics = ['accuracy']
batch_size = 128
embedding_output_dims = 15
loss_function = BinaryCrossentropy()
max_sequence_length = 300
num_distinct_words = 5000
number_of_epochs = 5
optimizer = Adam()
validation_split = 0.20
verbosity_mode = 1

# Disable eager execution
tf.compat.v1.disable_eager_execution()

# Load dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_distinct_words)
print(x_train.shape)
print(x_test.shape)

# Pad all sequences
padded_inputs = pad_sequences(x_train, maxlen=max_sequence_length, value = 0.0) # 0.0 because it corresponds with <PAD>
padded_inputs_test = pad_sequences(x_test, maxlen=max_sequence_length, value = 0.0) # 0.0 because it corresponds with <PAD>

# Define the Keras model
model = Sequential()
model.add(Embedding(num_distinct_words, embedding_output_dims, input_length=max_sequence_length))
model.add(LSTM(10))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer=optimizer, loss=loss_function, metrics=additional_metrics)

# Give a summary
model.summary()

# Train the model
history = model.fit(padded_inputs, y_train, batch_size=batch_size, epochs=number_of_epochs, verbose=verbosity_mode, validation_split=validation_split)

# Test the model after training
test_results = model.evaluate(padded_inputs_test, y_test, verbose=False)
print(f'Test results - Loss: {test_results[0]} - Accuracy: {100*test_results[1]}%')

I have noticed this same problem with LSTM layers.

This issue has also been reported to Keras, and they couldn't debug it.

Keras issue: https://github.com/keras-team/keras/issues/15003

  • I tried for a few hours. Because training was so slow, I only trained for 1 epoch. Here is the log:

    2021-07-26 23:09:28.130352: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    2021-07-26 23:09:28.185390: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    2021-07-26 23:09:28.217406: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    2021-07-26 23:09:28.229984: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    Epoch 1/1
    20000/20000 [==============================] - loss: 0.5489 - accuracy: 0.6923
    --- 6894.8485770225524902 seconds ---

    Just one epoch takes around 2 hours. That's a nightmare.

Replies

It is not fair to archive the TensorFlow repo before fixing the issues in the code.

Hi @OriAlpha, we recommend that users upgrade to macOS 12.0 for the best support and performance of the Metal plugin. I tried the attached script with macOS 12.0 on an M1 machine and tensorflow-metal==0.1.2 (I recommend updating to the latest Metal plugin version) and got the following performance. Please let us know if that helps.

2021-08-24 23:20:50.927094: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.

157/157 [==============================] - 46s 271ms/step - loss: 0.6877 - accuracy: 0.5416 - val_loss: 0.6579 - val_accuracy: 0.6034

Epoch 2/5

157/157 [==============================] - 38s 243ms/step - loss: 0.5634 - accuracy: 0.7459 - val_loss: 0.4508 - val_accuracy: 0.8192

Epoch 3/5

157/157 [==============================] - 38s 244ms/step - loss: 0.4140 - accuracy: 0.8303 - val_loss: 0.3805 - val_accuracy: 0.8410

Epoch 4/5

157/157 [==============================] - 38s 245ms/step - loss: 0.3474 - accuracy: 0.8609 - val_loss: 0.4135 - val_accuracy: 0.8380

Epoch 5/5

157/157 [==============================] - 39s 251ms/step - loss: 0.3075 - accuracy: 0.8814 - val_loss: 0.3535 - val_accuracy: 0.8554
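
For scale, the two logs in this thread can be put side by side. The sketch below is plain arithmetic on numbers quoted here: 20000 samples and about 6894.85 seconds from the slow log, batch size 128 from the script, and roughly 38 seconds per epoch from the log above. The variable names are illustrative, and the slow run's timer may have covered more than training alone, so treat the ratio as approximate.

```python
import math

# Numbers taken from the thread; names are illustrative only.
samples_per_epoch = 20000      # "20000/20000" in the slow log
batch_size = 128               # from the original script
slow_epoch_seconds = 6894.85   # one epoch reported on macOS 11.5.1
fast_epoch_seconds = 38.0      # typical epoch in the macOS 12.0 log

# The slow log counts samples, the fast log counts batches (steps).
steps_per_epoch = math.ceil(samples_per_epoch / batch_size)

# Time per step in the slow run, and the overall epoch-time ratio.
slow_seconds_per_step = slow_epoch_seconds / steps_per_epoch
speedup = slow_epoch_seconds / fast_epoch_seconds

print(f"steps per epoch: {steps_per_epoch}")
print(f"slow run: {slow_seconds_per_step:.1f} s per step")
print(f"epoch-time ratio (slow / fast): {speedup:.0f}x")
```

The step count matches the 157/157 shown in the log above, which suggests both runs processed the same amount of data per epoch.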


  • Do I have to upgrade to macOS 12.0 to fix this problem? Currently, 12.0 is still a beta version.

I saw the same issue: over 7000 seconds per epoch and a lot of warning messages. Then I tried with tf.device("/gpu:0"), and each epoch took about 38 seconds. However, when I tried with tf.device("/cpu:0"), each epoch took only about 7 seconds. So GPU performance is still awful.
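
A minimal sketch of this kind of CPU versus GPU comparison, assuming a TensorFlow 2.x install (with tensorflow-metal for the GPU case). It is not the script the commenter ran: it uses synthetic data shaped like the padded IMDB inputs, leaves eager execution enabled, and the helper name time_one_epoch is made up for illustration. The single timed epoch also includes one-time graph tracing, so the numbers are only a rough indication.

```python
import time

import numpy as np
import tensorflow as tf


def time_one_epoch(device_name, x, y, batch_size=128):
    """Build a small LSTM classifier under `device_name` and time one epoch.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    with tf.device(device_name):
        model = tf.keras.Sequential([
            tf.keras.layers.Embedding(5000, 15),
            tf.keras.layers.LSTM(10),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        start = time.perf_counter()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        return time.perf_counter() - start


# Synthetic stand-in for the padded IMDB data (assumption: 512 samples,
# sequence length 300, vocabulary of 5000 word indices).
rng = np.random.default_rng(0)
x = rng.integers(1, 5000, size=(512, 300))
y = rng.integers(0, 2, size=(512, 1)).astype("float32")

# Always time the CPU; time the GPU only if TensorFlow can see one.
devices = ["/cpu:0"]
if tf.config.list_physical_devices("GPU"):
    devices.append("/gpu:0")

timings = {name: time_one_epoch(name, x, y) for name in devices}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.1f} s for one epoch")
```

Swapping in the real padded_inputs and y_train from the original script should give a comparison closer to the one described above.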

I have not yet found a neural net architecture where the M1 GPU is faster than the CPU. For matrix multiplication, the GPU can be 9x faster, but this does not carry over to network training.
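
The matrix multiplication comparison can be tried in isolation with a rough benchmark like the one below. This is a sketch under assumptions: the matrix size, repeat count, and the helper name time_matmul are arbitrary choices, not what the commenter used, and the measured ratio will depend on the machine, the TensorFlow version, and the Metal plugin version.

```python
import time

import tensorflow as tf


def time_matmul(device_name, n=1024, repeats=5):
    """Average seconds per n x n float32 matmul on `device_name`.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    with tf.device(device_name):
        a = tf.random.normal((n, n))
        b = tf.random.normal((n, n))
        tf.matmul(a, b).numpy()  # warm-up run, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        c.numpy()  # copy the last result back so pending work has finished
        return (time.perf_counter() - start) / repeats


cpu_seconds = time_matmul("/cpu:0")
print(f"CPU: {cpu_seconds * 1e3:.2f} ms per matmul")

# Only compare against the GPU if TensorFlow can see one.
if tf.config.list_physical_devices("GPU"):
    gpu_seconds = time_matmul("/gpu:0")
    print(f"GPU: {gpu_seconds * 1e3:.2f} ms per matmul")
    print(f"CPU / GPU ratio: {cpu_seconds / gpu_seconds:.1f}x")
```

A large dense matmul is close to a best case for the GPU, whereas a small LSTM runs many small sequential steps, which may be part of why a matmul advantage does not carry over to training.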

Based on other threads and on the comment above by an Apple engineer, it looks like the Apple team doesn't even realize how bad their TensorFlow speed is.

  • MacBook Air M1 (macOS 12 beta)
  • TensorFlow version (2.5 version) with Metal Support
  • Python version: 3.8
  • GPU model and memory: MacBook Air M1 and 16 GB

I have the exact same problem! I started noticing really long training times for a simple BLSTM and decided to test the above code. I'm also using a MacBook Air M1 (macOS 12 beta), TensorFlow 2.5 with Metal support, Python 3.9, and 16 GB of memory. This completely undermines my work! Apple should do something!

Yep, for me both CPU and GPU performance are not good at all. A relatively simple CNN took about 7 minutes to train on a free Google Colab instance (with a K80), while the same model took about 30 minutes on GPU and 42 minutes on CPU with TF 2.6 on my Mac mini M1 (16 GB).

I have seen multiple posts from people experiencing the same issue, and the suggested solution is always to upgrade to 12.0 or to use the CPU (for smaller batch sizes), neither of which seems to fix the issue in most cases.

I would really expect Apple to come up with some solution to this. It has been a year since this M1 model was released, and I am paying for third-party notebooks, while I would expect a machine marketed as optimised for ML to at least run TF at a similar pace to a free Colab notebook.

Hello, I am still getting the same issue today, in 2022.

It seems the problem has never been solved... I will be starting a class on TensorFlow soon, and getting something this slow is just awful.

I have no choice but to use Google Colab.

Any new update as of 22/12/2022?