TensorFlow on M1 MacBook Pro: error when model.fit executes

It doesn't matter whether I install miniforge or mamba, directly or through brew: when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, or even a simple sequential model, I always get the error below.

Is there any workaround for this? I'd appreciate any help, thanks!
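A minimal case along these lines is enough to trigger it for me (a sketch with random stand-in data, not my exact code):

import numpy as np
import tensorflow as tf

# stand-in data; my real Xf_train / yf_train are different, this is just a minimal repro
Xf_train = np.random.rand(1000, 20).astype("float32")
yf_train = np.random.randint(0, 2, size=(1000,)).astype("float32")

def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = create_model()
history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64)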

2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90

NotFoundError                             Traceback (most recent call last)
Cell In[25], line 2
      1 model = create_model()
----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67   filtered_tb = _process_traceback_frames(e.__traceback__)
     68   # To get the full stack trace, call:
     69   # tf.debugging.disable_traceback_filtering()
---> 70   raise e.with_traceback(filtered_tb) from None
     71 finally:
     72   del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  return _run_code(code, main_globals, None,
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code
  exec(code, run_globals)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
  app.launch_new_instance()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance
  app.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start
  self.io_loop.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
  self.asyncio_loop.run_forever()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
  self._run_once()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
  handle._run()
...

File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module>
  history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
  return fn(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
  tmp_logs = self.train_function(iterator)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
  return step_function(self, iterator)
......

File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
  outputs = model.train_step(data)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
  self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
  self.apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
  return super().apply_gradients(grads_and_vars, name=name)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
  iteration = self._internal_apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
  return tf.__internal__.distribute.interim.maybe_merge_call(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
  distribution.extended.update(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
  return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_4' could not find registered platform with id: 0x28edf1f90 [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]

Answered by dweilert in 739446022

I dropped back to the following versions: tensorflow-macos==2.9 and tensorflow-metal==0.5.0. I had been using tensorflow-macos==2.11 and tensorflow-metal==0.7.0 and just couldn't get things to work. After dropping back I was able to use the GPU and all my validations worked. I'll check back later to see if a more current version will work.
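In other words, inside the conda environment created per Apple's instructions, roughly:

python -m pip uninstall -y tensorflow-macos tensorflow-metal
python -m pip install tensorflow-macos==2.9
python -m pip install tensorflow-metal==0.5.0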

Hello!

I followed the advice posted here when building a BERT text classifier in TensorFlow. I'm using tf and tf-text 2.9.0, and tf-metal 0.5.0.

My model looks as follows:

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
encoder_inputs = bert_preprocess(text_input)
outputs = bert_encoder(encoder_inputs)
bert_embeds = outputs["pooled_output"]
intermediate_layer = tf.keras.layers.Dense(512, activation="relu", name="intermediate_layer")(bert_embeds)
dropout_layer = tf.keras.layers.Dropout(0.1, name="dropout_layer")(intermediate_layer)  # neurons are dropped at rate 0.1 to prevent overfitting
output_layer = tf.keras.layers.Dense(1, activation="sigmoid", name="output_layer")(dropout_layer)
model = tf.keras.Model(text_input, output_layer)
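(For context: bert_preprocess and bert_encoder are TensorFlow Hub KerasLayers, roughly along these lines; the handles below are representative examples rather than the exact models.)

import tensorflow_hub as hub

# representative BERT preprocessing and encoder layers from TF Hub
bert_preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4",
    trainable=True)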

However, I'm getting the following error when fitting the model:

2 root error(s) found.
(0) NOT_FOUND: No registered 'AddN' OpKernel for 'GPU' devices compatible with node {{node model_13/keras_layer_1/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/bert_pack_inputs/PartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2}} (OpKernel was found, but attributes didn't match)
Requested Attributes: N=2, T=DT_INT64, _XlaHasReferenceVars=false, _grappler_ArithmeticOptimizer_AddOpsRewriteStage=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"
Registered:
  device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, 16534343205130372495, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_VARIANT]
  device='GPU'; T in [DT_FLOAT]
  device='DEFAULT'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT64]
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_UINT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_VARIANT]

Am I interpreting the error correctly (it is the same error the OP shared) that only int64 is currently supported by the AddN OpKernel? That would conflict with the bert_embeds pooled_output, which is in floats (I tried forcing it to float64, but that didn't solve the issue). My outcome variable is binary, and I forced it to int64.

Any help would be appreciated. My model works when I run it on CPU, but it is just quite slow.
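For reference, by "run it on CPU" I mean hiding the Metal GPU from TensorFlow before anything executes, roughly:

import tensorflow as tf

# hide the GPU so everything falls back to CPU; must run before any ops execute
tf.config.set_visible_devices([], "GPU")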

Thanks and best,
Amin

I just upgraded to Sonoma, the latest macOS, and my fit method started giving me weird warnings and errors, and training is very slow. Unusable!

2023-09-28 10:50:29.393958: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
2023-09-28 10:50:29.394708: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2023-09-28 10:50:29.394896: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2023-09-28 10:50:29.395661: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-28 10:50:29.395694: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Training a brand new model - [13, 23, 42, 70]-Long
2023-09-28 10:50:35.795666: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-09-28 10:50:35.803057: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
Epoch 1/10
2023-09-28 10:50:37.043038: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.

After downgrading to the legacy versions, the speed came back and everything works again. I then tried to upgrade back to the latest libraries and plug-in, and I am getting the following warnings:

2023-09-28 22:02:25.250960: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
2023-09-28 22:02:25.250990: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2023-09-28 22:02:25.250999: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2023-09-28 22:02:25.251066: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-28 22:02:25.251101: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Every epoch -- in fact every fit or predict -- now displays this line:

2023-09-28 22:05:11.530865: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
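(Side note: that particular message is just an INFO-level log; raising TF_CPP_MIN_LOG_LEVEL before importing TensorFlow should hide it, though it does nothing for the slowdown itself.)

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"  # 1 filters INFO; must be set before the tensorflow import
import tensorflow as tf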

Any idea what's going on, and how to get things back to optimal performance? It is still about 5X the speed of the legacy versions.

So - after downgrading to an older version of tensorflow-metal and tensorflow-macos, it works at the desired speed again (although with a few annoying warning messages):

conda install -c apple tensorflow-deps==2.9.0
python -m pip install tensorflow-macos==2.9.0
python -m pip install tensorflow-metal==0.5.0

That did the trick.
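As a quick sanity check that the downgrade took and the Metal GPU is still visible (rough sketch):

import tensorflow as tf

print(tf.__version__)                          # expect 2.9.0
print(tf.config.list_physical_devices("GPU"))  # should list the Metal PluggableDevice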

But the annoying warning messages are as follows - if anyone has an idea how to fix them, please share. I am running on macOS Sonoma.

819/819 [==============================] - ETA: 0s - loss: 0.7380 - auc: 0.7107 - prc: 0.6348
2023-09-28 22:20:35.837423: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x64x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x64x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1999xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x40x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x40x1x1xi1>'
