Inconsistent results when performing inference in CPU vs Metal

Hello, everyone,

I have been testing tensorflow-metal on my 2020 MacBook Pro (M1) running macOS 12.0.1 by running inference with a pre-trained model on a known dataset.

To my surprise, TensorFlow produces different (wrong) results when performing inference on the Metal pluggable-device GPU versus on the CPU.

I might very well be doing something wrong, but my test program is fairly simple:

#!/usr/bin/env python3

import pathlib
import numpy as np
import tensorflow as tf
from tensorflow import keras


def main(model_path, dataset_path):
    # Print some system info
    print('Tensorflow configuration:')
    print(f'\tVersion: {tf.__version__}')
    print('\tDevices usable by Tensorflow:')
    for device in tf.config.get_visible_devices():
        print(f'\t\t{device}')

    # Load the model & the input data
    model = keras.models.load_model(model_path)
    matrix_data = np.genfromtxt(dataset_path)
    matrix_data = matrix_data.reshape([1, matrix_data.shape[0], matrix_data.shape[1]])

    # Perform inference on the CPU
    with tf.device('/CPU:0'):
        prediction = model.predict(matrix_data)[1]
        print('Model Evaluation on CPU')
        print(f'\tPrediction: {prediction[0, 0]}')

    # Perform inference on the GPU
    with tf.device('/GPU:0'):
        prediction = model.predict(matrix_data)[1]
        print('Model Evaluation on GPU')
        print(f'\tPrediction: {prediction[0, 0]}')


if __name__ == "__main__":
    main('model/model.h5', 'dataset/01.csv')

The CPU path produces a result of 4.890502452850342, which is consistent with the results I'm seeing on Ubuntu Linux with both CPU- and GPU (CUDA)-based inference. The GPU code path results in a prediction of 3.1839447021484375, which is way off.
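To put the mismatch in perspective, a quick relative-error check quantifies the discrepancy (plain NumPy; the helper name is my own, not part of any library):

```python
import numpy as np

def relative_error(reference, candidate):
    """Largest element-wise relative error of candidate vs reference."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    # Guard against division by zero for zero-valued reference entries
    denom = np.maximum(np.abs(reference), np.finfo(np.float64).tiny)
    return float(np.max(np.abs(candidate - reference) / denom))

# The two scalar predictions above differ by roughly 35%:
print(relative_error(4.890502452850342, 3.1839447021484375))
```

Ordinary float32 rounding differences between backends are typically many orders of magnitude smaller than this, so a discrepancy of this size suggests a genuine kernel bug rather than accumulation error.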

I have set up a GitLab repo with all the resources required to replicate the problem here.

This is quite concerning for me: the large difference in results is something I was not expecting and, if confirmed, means I cannot trust the results produced by the Metal backend.

Am I doing something wrong? Is there any place where I can report this as a bug?

Accepted Reply

Hi @josebagar,

Thanks for reporting this issue. I am able to reproduce it locally and have triaged it to the Conv1D layer when running on the GPU with certain combinations of input parameters. We will post an update here once we have a solution for the problem.

  • Great to know that the issue is being addressed. Please keep us posted on any updates.

    Kind regards, Joseba

  • I've verified that once I update to TensorFlow 2.8.0, TensorFlow produces the correct results. I've updated the repo sample to reflect this.
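For anyone who wants to check a suspect Conv1D output independently of TensorFlow's backends, a plain NumPy reference can serve as ground truth. The sketch below assumes stride 1 and 'valid' padding (matching Keras Conv1D defaults for those parameters); the shapes in the usage example are illustrative, not taken from the model above:

```python
import numpy as np

def conv1d_valid(x, kernel, bias):
    """Reference 1-D convolution (cross-correlation), 'valid' padding,
    stride 1, matching Keras Conv1D weight layout.
    x:      (steps, in_channels)
    kernel: (width, in_channels, out_channels)
    bias:   (out_channels,)
    returns (steps - width + 1, out_channels)
    """
    width = kernel.shape[0]
    out_steps = x.shape[0] - width + 1
    out = np.empty((out_steps, kernel.shape[2]))
    for t in range(out_steps):
        # Window has shape (width, in_channels); contract both axes
        out[t] = np.tensordot(x[t:t + width], kernel,
                              axes=([0, 1], [0, 1])) + bias
    return out

x = np.ones((4, 1))     # 4 time steps, 1 input channel
k = np.ones((2, 1, 1))  # width 2, 1 in-channel, 1 out-channel
b = np.zeros(1)
print(conv1d_valid(x, k, b))  # three steps, each summing a width-2 window to 2.0
```

Feeding the same input through this reference and through the layer on each device (weights extracted via `layer.get_weights()`) would show which backend matches.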


Replies

I tested your code with the latest versions:

  • tensorflow-macos==2.7.0
  • tensorflow-metal==0.3.0

The bug is still there:

CPU Prediction: 4.890502452850342

GPU Prediction: 3.1839447021484375

Indeed, very concerning!

Hopefully someone from Apple sees this!

P.S.: other reports:

  • It’s nice to know that I’m not the only one having this issue, thanks for the links.

    Yes, let’s hope this gets addressed by Apple. I have tested some other, simpler models and they work fine, but the Metal results cannot be trusted right now.

  • That is disappointing. I thought perhaps the lack of response was because they were working on a fix for the new 0.3 release version. Seems they weren’t.

  • I've updated the example code to use TensorFlow 2.7.0, just to keep it up to date. As you say, there is no change in the results.

    As a note in case someone from Apple sees this, the model includes an Attention layer. Here's the model summary:

    input_1 (InputLayer)
    conv1d (Conv1D)
    conv1d_1 (Conv1D)
    conv1d_2 (Conv1D)
    conv1d_transpose (Conv1DTranspose)
    conv1d_3 (Conv1D)
    conv1d_transpose_1 (Conv1DTranspose)
    flatten (Flatten)
    conv1d_transpose_2 (Conv1DTranspose)
    attention_layer (Attention)
    conv1d_transpose_3 (Conv1DTranspose)
    flatten_1 (Flatten)
    activation (Activation)
    dense (Dense)

