-[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x60000f3adf80

Hi,

I am running tensorflow-macos 2.11 and tensorflow-metal 0.7.0 on my Intel MacBook Pro. I understand that we now need to use tf.keras.optimizers.legacy.Adam because of the XLA changes in the TensorFlow code.

However, I get the following error:

2022-12-15 00:03:38.306 python[9202:427797] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x60000f3adf80

When I change to tf.keras.optimizers.legacy.SGD, it works.
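For reference, the working SGD case is the same compile call with only the optimizer swapped (a minimal sketch; the learning rate here is just the value I had for Adam, not re-tuned for SGD):

# Only the optimizer is swapped; loss and metrics match the Adam compile below.
model.compile(
    optimizer=tf.keras.optimizers.legacy.SGD(learning_rate=0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)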

Previously I was training on tensorflow-macos 2.10 and tensorflow-metal 0.6.0, and tf.keras.optimizers.Adam worked fine.

It looks like the Adam optimiser is not working under tensorflow-metal 0.7.0?

Hi @ianlokh!

Could you provide a small script to reproduce this, and check which version of macOS you are running on your Intel MacBook Pro? Mostly I'm interested in seeing the line where you define the Adam optimizer, to see which parameters are passed to it. It does seem that at least some variant of the Adam optimizer has a bug that needs to be fixed on the Metal plugin side, on top of the obvious one of the missing XLA path you mentioned.

Hi,

Thanks for the reply. I am using Ventura 13.1. Please find the code snippet below.

model.compile(
    optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

Hi

I tried to upload the sample script but somehow wasn't able to, so I am copying and pasting it here:


import os
import tensorflow as tf
import tensorflow_datasets as tfds

BATCH_SIZE = 256
EPOCHS = 100
AUTOTUNE = tf.data.AUTOTUNE

def set_gpu(gpu_ids_list):
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            gpus_used = [gpus[i] for i in gpu_ids_list]
            tf.config.set_visible_devices(gpus_used, 'GPU')
            for gpu in gpus_used:
                tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
        except RuntimeError as e:
            # Visible devices must be set before GPUs have been initialized
            print(e)


set_gpu([0])


# load MNIST
print('\ndownloading mnist')
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    """Normalize images: 'unit8' -> 'float32'."""
    return tf.cast(image, tf.float32) / 255., label


ds_train = ds_train.map(normalize_img, num_parallel_calls=AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(BATCH_SIZE, num_parallel_calls=AUTOTUNE)
ds_train = ds_train.prefetch(AUTOTUNE)

ds_test = ds_test.map(normalize_img, num_parallel_calls=AUTOTUNE)
ds_test = ds_test.cache()
ds_test = ds_test.batch(BATCH_SIZE, num_parallel_calls=AUTOTUNE)

print('\ncreate and compile model')

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])

model.compile(
    optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

model.fit(ds_train, epochs=EPOCHS, validation_data=ds_test, verbose=1)

We have identified the issue and created a fix. The fix will be included in tensorflow-metal==0.7.1, which we aim to release soon. The issue should only happen with Python 3.8 on Ventura, so a workaround in the meantime is to create a Python 3.9 (or 3.10) environment.
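For anyone applying the workaround, a quick way to confirm that the new environment is actually the one being used (an illustrative check only, not part of the fix):

import sys
import tensorflow as tf

# The crash is tied to Python 3.8 on Ventura, so verify the interpreter and
# tensorflow-macos version the training script actually runs against.
print(sys.version)       # expect 3.9.x or 3.10.x in the recreated environment
print(tf.__version__)    # expect 2.11.x for tensorflow-macos 2.11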
