Issues and errors when running TensorFlow on GPU, but not on CPU

I was following the text classification tutorial below, using tensorflow-macos 2.9.0 on an M1 MacBook.

https://www.tensorflow.org/text/tutorials/text_classification_rnn

However, I ran into three issues:

  1. With the GPU enabled, model fitting was extremely slow. Disabling the GPU made model fitting faster.
  2. A warning appeared when fitting the model with the GPU enabled. The model kept running after showing the following message, but very slowly:
W tensorflow/core/common_runtime/forward_type_inference.cc:332] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_FLOAT
    }
  }
}
  3. Two results that were supposed to be identical were identical only when the GPU was disabled. With the GPU enabled, they differed.
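
For anyone trying to reproduce the CPU-only behavior: the post does not say how the GPU was disabled, so the following is only a minimal sketch of one common way to do it, by hiding the GPU from TensorFlow with `tf.config.set_visible_devices`. The helper name `disable_gpu` is hypothetical.

```python
def disable_gpu():
    """Hide all GPUs from TensorFlow so that ops are placed on the CPU."""
    # Imported inside the function so this snippet can be loaded even on a
    # machine where TensorFlow is not installed.
    import tensorflow as tf

    # This has to run before TensorFlow initializes its devices, i.e. before
    # any tensors or models are created; otherwise a RuntimeError is raised.
    tf.config.set_visible_devices([], "GPU")

    # Return what is still visible (expected: CPU devices only).
    return tf.config.get_visible_devices()


# Usage: call disable_gpu() at the very top of the script,
# before building or loading the model.
```

Calling the helper first and then running the tutorial unchanged should keep both training and prediction on the CPU.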

With the GPU enabled:

To confirm that this works as expected, evaluate a sentence twice. First, alone so there's no padding to mask:

1/1 [==============================] - 1s 1s/step
[0.00808082]
Now, evaluate it again in a batch with a longer sentence. The result should be identical:

1/1 [==============================] - 16s 16s/step
[-0.01561341]

With the GPU disabled:

(First run as above)
1/1 [==============================] - 1s 1s/step
[-0.0032991]
(Second run)
1/1 [==============================] - 0s 71ms/step
[-0.0032991]
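
To make the mismatch explicit, here is a small sketch that compares the two predictions from each run using the values printed above. The helper name `predictions_match` and the tolerances are assumptions chosen for illustration; they are not part of the tutorial.

```python
import math


def predictions_match(alone, batched, rel_tol=1e-5, abs_tol=1e-6):
    """Return True if the two predictions agree within a small tolerance."""
    return math.isclose(alone, batched, rel_tol=rel_tol, abs_tol=abs_tol)


# Values copied from the outputs above.
gpu_alone, gpu_batched = 0.00808082, -0.01561341
cpu_alone, cpu_batched = -0.0032991, -0.0032991

print("GPU runs match:", predictions_match(gpu_alone, gpu_batched))  # False
print("CPU runs match:", predictions_match(cpu_alone, cpu_batched))  # True
```

With the GPU enabled, the two values even differ in sign, so the difference is well beyond ordinary floating-point noise.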

Replies

Hello,

I have the exact same issue when training any RNN model (I tried both LSTMs and GRUs) on my MBP16 with M1 Max 32C.

I get the same exception/warning, and performance with the GPU is horrible. Disabling the GPU results in no warning and better performance. I don't have the same issue with CNNs.

All of this is with the latest tensorflow-macos (2.10) and tensorflow-metal 0.6.
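
Since the behavior seems to span several releases, it may help to include the exact installed versions when reporting it. A minimal sketch using only the standard library (the helper name `installed_version` is hypothetical; the package names are the pip names mentioned in this thread):

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("tensorflow-macos", "tensorflow-metal"):
    print(name, installed_version(name) or "not installed")
```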