training accuracy using GPU worse than using CPU on simple CNN (M1 MAX)

I'm trying to train a trivial CNN on the CIFAR-10 dataset on CPU vs GPU. Although the GPU is much faster, the accuracies and losses behave really strangely:

Accuracy and validation accuracy on the CPU look as follows:

Training accuracy on the GPU looks like this:

Package Versions:

python = "3.11.4"
tensorflow = "2.13.0"
tensorflow-macos = "2.13.0"
tensorflow-metal = "1.0.1"
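
(If you want to verify which devices TensorFlow actually sees when reproducing this, the standard device-listing calls should be enough; nothing here is specific to my setup:)

import tensorflow as tf

# List the devices TensorFlow can see; with tensorflow-metal installed,
# the Apple GPU shows up as a PhysicalDevice of type 'GPU'.
print("TF version:", tf.__version__)
print("GPUs:", tf.config.list_physical_devices("GPU"))
print("CPUs:", tf.config.list_physical_devices("CPU"))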

The code:

import tensorflow as tf

# with tf.device("/cpu:0"): # uncomment and indent the following to run on CPU

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

(train_images, train_labels), (
    test_images,
    test_labels,
) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))

model.summary()

model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10))

model.summary()

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = model.fit(
    train_images,
    train_labels,
    epochs=20,
    batch_size=64,
    validation_data=(test_images, test_labels),
)

plt.plot(history.history["accuracy"], label="accuracy")
plt.plot(history.history["val_accuracy"], label="val_accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.ylim([0, 1])
plt.legend(loc="lower right")

test_loss, test_acc = model.evaluate(
    test_images, test_labels, batch_size=64, verbose=2
)
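
Side note: instead of uncommenting the tf.device("/cpu:0") context and re-indenting everything, hiding the GPU at startup should have the same effect; a minimal, untested sketch:

import tensorflow as tf

# Hide the GPU before any ops run, so the rest of the script falls back
# to the CPU without wrapping it in a tf.device("/cpu:0") block.
tf.config.set_visible_devices([], "GPU")

# ... rest of the training script unchanged ...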

Hopefully this is enough to reproduce the issue.

In a crazy coincidence, while trying to debug the same issue, I came across your post, and replicated it with your code.

I was running TF 2.15, upgraded to 2.16, and retested this. It seems this is mostly fixed in TensorFlow 2.16.

I'm now getting pretty much identical accuracy and loss between the CPU and GPU, but the GPU speedup is lower than before. Depending on the test, the GPU is anywhere from around 25% faster to (in your test case above) around 50% faster than the CPU. This is on an M2 Max.
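
(If you want to compare the same thing yourself, a rough timing helper like the one below is enough to see the difference; timed_fit is just an illustrative name, not anything from the post above:)

import time

def timed_fit(model, x, y, **kwargs):
    # Crude wall-clock timing of a single training run; enough to compare
    # CPU vs GPU throughput on the same model and data.
    start = time.perf_counter()
    history = model.fit(x, y, **kwargs)
    print(f"fit took {time.perf_counter() - start:.1f} s")
    return history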

Though I can't be sure, it does seem like the GPU performance is nowhere near as high as it was before. But at least it's more accurate. (Maybe they're falling back to the CPU for whatever GPU operations were causing the issue before?)
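
(One way to check that theory is TensorFlow's device-placement logging; turning it on before building the model shows which device each op actually runs on:)

import tensorflow as tf

# Log the device each op executes on; if Metal is silently falling back
# to the CPU for some ops, they will show up as /device:CPU:0 in the log.
tf.debugging.set_log_device_placement(True)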

Just an update: in other testing I'm getting almost 3x the performance on the GPU vs the CPU on TF 2.16, so it's not as bad as it looked yesterday. It might even be more than this on larger models than the simple test ones I'm learning from.
