Issue with TensorFlow 2.14 on macOS: No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}}

Working Environment

  • MacBook Pro 14-inch with M2 Pro chip
  • macOS Sonoma 14.0
  • Python 3.11.4
  • tensorflow 2.14.0, tensorflow-macos 2.14.0, tensorflow-metal 1.1.0

Issue Description

Hi there! I ran into an issue while working with Keras' TextVectorization preprocessing layer.

from tensorflow import keras
text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
text_vectorization.adapt(ds.map(lambda x: x['title']))
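For anyone trying to reproduce this without my data, here is a minimal self-contained sketch. The `ds` below is a hypothetical stand-in: a small in-memory `tf.data.Dataset` whose elements are dicts with a scalar string under `title`, and the titles themselves are made up. On a plain CPU install this should adapt without errors; whether this exact snippet triggers the same `NotFoundError` on the M2 setup above is an assumption that has not been verified.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical stand-in for the real dataset: each element is a dict with a
# scalar string under "title" (sample titles are made up).
ds = tf.data.Dataset.from_tensor_slices(
    {"title": ["deep learning on a mac", "metal plugin for tensorflow", "learning tensorflow"]}
)

text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
# Elements stay unbatched scalar strings, mirroring the failing call.
text_vectorization.adapt(ds.map(lambda x: x["title"]))

vocab = text_vectorization.get_vocabulary()
print(vocab)
```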

The inputs are strings. Here is the traceback:

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
/Users/ken/Workspaces/MLE101/tfrs101/preprocess.ipynb Cell 13 line 3
      1 # with tf.device('/CPU:0'):
      2 text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
----> 3 text_vectorization.adapt(ds.map(lambda x: x['title']))

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py:473, in TextVectorization.adapt(self, data, batch_size, steps)
    423 def adapt(self, data, batch_size=None, steps=None):
    424     """Computes a vocabulary of string terms from tokens in a dataset.
    425 
    426     Calling `adapt()` on a `TextVectorization` layer is an alternative to
   (...)
    471           argument is not supported with array inputs.
    472     """
--> 473     super().adapt(data, batch_size=batch_size, steps=steps)

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps)
    256 with data_handler.catch_stop_iteration():
    257     for _ in data_handler.steps():
--> 258         self._adapt_function(iterator)
    259         if data_handler.should_sync:
    260             context.async_wait()

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:60, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     53   # Convert any objects of type core_types.Tensor to Tensor.
     54   inputs = [
     55       tensor_conversion_registry.convert(t)
     56       if isinstance(t, core_types.Tensor)
     57       else t
     58       for t in inputs
     59   ]
---> 60   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     61                                       inputs, attrs, num_outputs)
     62 except core._NotOkStatusException as e:
     63   if name is not None:

NotFoundError: Graph execution error:

Detected at node StringSplit/stack defined at (most recent call last):

...

No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}}
	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"
	.  Registered:  device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN]
  device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64]
  device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT32]
  device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT64]
  device='CPU'; Tdim in [DT_INT32]
  device='CPU'; Tdim in [DT_INT64]

	 [[StringSplit/stack]] [Op:__inference_adapt_step_71204]
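A possible explanation for the failing node, offered as an unverified hypothesis: `tf.strings.split` turns a rank-0 (scalar) string into a one-element vector with a `stack` inside its `StringSplit` name scope, which would match both the `StringSplit/stack` node name and the `ExpandDims` request with `T=DT_STRING` above. Since `adapt()` here receives a dataset of unbatched scalar titles, every step takes that path. Batching the dataset before `adapt()` hands `tf.strings.split` rank-1 input and might avoid the string `ExpandDims` altogether (it also lets each `adapt()` step process several titles instead of one). Whether the remaining ops then run cleanly on the Metal GPU has not been checked; the dataset and batch size below are placeholders.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical placeholder dataset shaped like the original one.
ds = tf.data.Dataset.from_tensor_slices(
    {"title": ["deep learning on a mac", "metal plugin for tensorflow", "learning tensorflow"]}
)

# Batch so each element is a rank-1 string tensor instead of a scalar.
titles = ds.map(lambda x: x["title"]).batch(2)  # batch size chosen arbitrarily

text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
text_vectorization.adapt(titles)

vocab = text_vectorization.get_vocabulary()
```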

The only way I have found to make it work is to explicitly pin the operations to the CPU:

with tf.device('/CPU:0'):
    text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x['title']))
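If nothing in the process needs the GPU, a broader alternative to wrapping each call in `tf.device('/CPU:0')` is to hide the GPU from TensorFlow with `tf.config.set_visible_devices`. This is a sketch of a workaround, not a fix for the Metal plugin: it gives up GPU acceleration for the whole process, and it has to run before TensorFlow initializes its runtime (otherwise it raises a `RuntimeError`). The dataset is again a hypothetical stand-in.

```python
import tensorflow as tf
from tensorflow import keras

# Hide every GPU (the Metal device shows up as "GPU") from TensorFlow.
# Must run before any op initializes the runtime, or RuntimeError is raised.
tf.config.set_visible_devices([], "GPU")

# Hypothetical stand-in for the real dataset.
ds = tf.data.Dataset.from_tensor_slices(
    {"title": ["deep learning on a mac", "metal plugin for tensorflow"]}
)

text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
text_vectorization.adapt(ds.map(lambda x: x["title"]))

# Only CPU devices should be listed now.
print(tf.config.list_logical_devices())
vocab = text_vectorization.get_vocabulary()
```

The trade-off versus the `tf.device` block is scope: the device context only affects the ops created inside it, while hiding the GPU applies to every op in the process.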

I have also checked this related post: https://developer.apple.com/forums/thread/700108