[Apple M1]: I got No registered 'AddN' OpKernel for 'GPU' devices compatible with node while training my model

Hi! GPU acceleration on the M1 GPU fails for me (only with this specific model). I get this message when trying to run a trained model on the GPU:

NotFoundError: Graph execution error:

No registered 'AddN' OpKernel for 'GPU' devices compatible with node {{node model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2}}
	 (OpKernel was found, but attributes didn't match) Requested Attributes: N=2, T=DT_INT64, _XlaHasReferenceVars=false, _grappler_ArithmeticOptimizer_AddOpsRewriteStage=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"
	.  Registered:  device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, 16534343205130372495, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_VARIANT]
  device='GPU'; T in [DT_FLOAT]
  device='DEFAULT'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT64]
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_UINT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_VARIANT]

	 [[model_3/keras_layer_3/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/roberta_pack_inputs/StatefulPartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2]] [Op:__inference_train_function_300451]

Replies

Hi,

Were you able to resolve the issue by any chance? I am facing the same error.

Thanks, Latha

Hi @sm_96!

GPU support for AddN (and most ops) does not extend to 64-bit formats. Here the node AddOpsRewrite_Leaf_0_add_2 requests the int64 version of the op while the graph is executing on the GPU, which causes the error you are seeing: the registered-kernel list in the message shows device='GPU' only for DT_FLOAT, while a DT_INT64 kernel is registered for the CPU. The way around this is either to change that node to use 32-bit data formats or, if 64-bit precision is essential, to execute the whole graph on the CPU.
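As a rough illustration of the CPU fallback, here is a minimal sketch, assuming it is acceptable to hide the GPU from TensorFlow for the whole process. The tensors `a` and `b` are made-up stand-ins for the int64 inputs of the failing node, not values taken from the model in this thread.

```python
import tensorflow as tf

# Hide every GPU from TensorFlow so that all ops are placed on the CPU.
# This has to run before TensorFlow creates any tensors or models.
tf.config.set_visible_devices([], "GPU")

# Hypothetical stand-ins for the int64 tensors that the failing node adds.
a = tf.constant([1, 2, 3], dtype=tf.int64)
b = tf.constant([10, 20, 30], dtype=tf.int64)

# tf.math.add_n sums a list of tensors; the error message lists an int64
# AddN kernel for the CPU, so this runs once the GPU is hidden.
total = tf.math.add_n([a, b])

print(tf.config.get_visible_devices("GPU"))  # visible GPUs after hiding them
print(total.dtype, total.device)             # dtype and device of the result
```

If the call is placed at the very top of the training script, the later model building and fitting steps should run on the CPU as well, at the cost of the slower training described in this thread.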

Hi! Thank you very much. Indeed, I ran it on the CPU; it took many hours, but eventually I got quite good results.

Hello! I'm running into a similar problem while trying to train a BERT text classifier in TensorFlow. I'm using tf and tf-text 2.9.0, and tf-metal 0.5.0.

My model looks as follows:

    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
    encoder_inputs = bert_preprocess(text_input)
    outputs = bert_encoder(encoder_inputs)
    bert_embeds = outputs["pooled_output"]
    intermediate_layer = tf.keras.layers.Dense(512, activation="relu", name="intermediate_layer")(bert_embeds)
    # neurons are dropped at rate 0.1 to prevent overfitting
    dropout_layer = tf.keras.layers.Dropout(0.1, name="dropout_layer")(intermediate_layer)
    output_layer = tf.keras.layers.Dense(1, activation="sigmoid", name="output_layer")(dropout_layer)
    model = tf.keras.Model(text_input, output_layer)

I'm getting the error when fitting the model. Am I interpreting the error message (which is the same as the one shared by the OP) correctly, that the AddN op kernel currently supports only int64? That would conflict with the bert_embeds pooled_output, which is a float tensor (I tried forcing it to float64, but that didn't solve the issue). My outcome variable is binary; I forced it to int64.

Any help would be appreciated. My model works when I run it on the CPU, but it is quite slow.

Thanks and best, Amin