Hi,
I am reliably able to get the following results after running pip install tensorflow-metal
. Note I did not cull anything (including some device registration messages that only appear the first time you use tensorflow - hopefully not too distracting, but thought it would provide helpful context about my environment in case something is fishy).
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> tf.zeros_like([1])
Metal device set to: Apple M1
systemMemory: 8.00 GB
maxCacheSize: 2.67 GB
2022-06-05 18:54:29.515755: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-06-05 18:54:29.516007: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/Caskroom/miniforge/base/envs/ml/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/opt/homebrew/Caskroom/miniforge/base/envs/ml/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7164, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Multiple Default OpKernel registrations match NodeDef '{{node ZerosLike}}': 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' and 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' [Op:ZerosLike]
Whereas after uninstalling tensorflow-metal (pip uninstall tensorflow-metal
) the same commands produce:
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
[]
>>> tf.zeros_like([1])
<tf.Tensor: shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>
It looks like a simple double registration issue, but I've only just found out about the 'PluggableDevice' API, so I don't know if it has recommendations for resolving multiple registrations. If I had to guess it is unexpected in the extreme for a pluggable device extension to contain default device op registrations, but without being able to see the code I cannot guess further about what might be wrong.