Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

Device: MacBook Pro 16" (M1 Max, 64 GB) running macOS 12.0.1.

I tried setting up GPU-accelerated TensorFlow on my Mac using the following steps:

  1. Setup: Xcode CLI tools / Homebrew / Miniforge
  2. Conda Env: Python 3.9.5
  3. conda install -c apple tensorflow-deps
  4. python -m pip install tensorflow-macos
  5. python -m pip install tensorflow-metal
  6. brew install libjpeg
  7. conda install -y matplotlib jupyterlab
  8. In Jupyter Lab, I try to execute this code:
from tensorflow.keras import layers
from tensorflow.keras import models
# Small convnet for 28x28x1 (MNIST-style) inputs
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()

The code executes, but I get the warning below, which suggests no GPU acceleration is available since the device is created with 0 MB of memory:

Metal device set to: Apple M1 Max
2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Does anyone have any idea how to fix this? I came across a bunch of posts here about the same issue, but none with a solid fix. I created a new question because I found the other questions less descriptive of the issue and wanted to depict it comprehensively. Any fix would be of much help.
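
For anyone reproducing this, a quick sanity check that the Metal plugin registered a GPU device at all (my own addition; standard TensorFlow calls):

import tensorflow as tf

print(tf.__version__)
# On a working tensorflow-macos + tensorflow-metal install this should print
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print(tf.config.list_physical_devices('GPU'))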

  • Metal device set to: Apple M1

    systemMemory: 16.00 GB
    maxCacheSize: 5.33 GB

    2021-12-13 19:59:56.135942: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2021-12-13 19:59:56.136049: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


Replies

ME TOO!!!!

me too....

# pip uninstall tensorflow-metal

  • Well, yes, that got rid of the error message on my MacBook Air M1.

  • Well, then you aren't using the GPU, only CPU TensorFlow. :P Of course it gets rid of the error.


macOS with an AMD GPU here. I have been using tensorflow-metal since it launched, with GPU acceleration. Sometimes I get the same message (Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.), but it still uses the GPU. You can check that by opening Activity Monitor and pressing Cmd + 3 and Cmd + 4, which show you GPU and CPU usage.
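
If you'd rather check programmatically than watch Activity Monitor, here is a small sketch (my own addition; standard TensorFlow API) that logs which device each op lands on:

import tensorflow as tf

# Print the placement of every op; GPU placements confirm Metal is in use.
tf.debugging.set_log_device_placement(True)

a = tf.random.uniform((1024, 1024))
b = tf.random.uniform((1024, 1024))
c = tf.matmul(a, b)
print(c.device)  # e.g. /job:localhost/replica:0/task:0/device:GPU:0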

One potential problem for you, assuming it still isn't benefiting from acceleration, could be that you are using Python 3.9. If I recall correctly, tensorflow-metal requires Python 3.8. That is what I am using, without any problems.

Hope the above can help you & others!

  • it works on 3.9 too (not sure if it worked on 3.9 when you posted this a month ago. But as of today, it works)

  • The documentation states "Python 3.8 or later".


ME TOO! And my kernel in Jupyter also dies.

Okay, I don't know if you guys faced this issue, but for me the kernel also died and the GPU wasn't being used. I found the cause and fixed it. I'm on the M1 MacBook Air.

This arises with the latest tensorflow-metal package, version 0.2.0.

Just install the previous version, 0.1.2, of tensorflow-metal and the GPU will be utilized. You will still get the warning, but training will run using the GPU.
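
Pinning the version looks like this (assuming pip can still resolve that release):

python -m pip uninstall -y tensorflow-metal
python -m pip install tensorflow-metal==0.1.2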

  • Update: with tensorflow-metal version 0.1.1 that warning also vanishes.

    Hopefully this will be fixed in coming versions of metal.

  • Another Update: tensorflow-metal v0.2.0 is built for macOS 12 Monterey, so that version should work fine after updating to Monterey, as reported by others.

  • But with the latest version, 0.6.0, the same warning appears and the GPU still doesn't work.

I have this same issue with my new MacBook Pro 14 with M1 Max fully loaded.

I've tried creating clean Python 3.8 and 3.9 installations following instructions here and elsewhere, and tried downgrading my tensorflow-metal package. Just about every possible combination in clean environments and new installations.

Bottom line is that while the GPU "works", it runs about 5x slower than pure CPU. That is, if I uninstall the tensorflow-metal package, the same training that took, say, 11 seconds with the package installed takes only about 2.5 seconds without it. You can replicate the same results by forcing TensorFlow to run on the CPU with the metal package installed.
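
For anyone who wants to reproduce the comparison, a rough timing harness along these lines (my own sketch; matrix size and iteration count are arbitrary):

import time
import tensorflow as tf

def bench(device, n=2048, iters=10):
    """Time a few matmuls pinned to the given device."""
    with tf.device(device):
        a = tf.random.uniform((n, n))
        b = tf.random.uniform((n, n))
        tf.matmul(a, b).numpy()  # warm-up; also forces lazy initialization
        start = time.perf_counter()
        for _ in range(iters):
            c = tf.matmul(a, b)
        c.numpy()  # block until the last result is ready
        return time.perf_counter() - start

print("CPU:", bench("/CPU:0"))
print("GPU:", bench("/GPU:0"))  # needs tensorflow-metal for a GPU device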

Looking at Activity Monitor during a run suggests that the M1 Max GPU is in fact loaded when the package is installed. It just performs horribly, in fact so badly as to be unusable. My working assumption is that this is not the intended performance, but a bug.

What's concerning is that no maintainer in any of the forums, whether tensorflow/keras or Apple's, has really acknowledged that this is a bug. Perhaps there's confusion between the different manifestations of the bug on Intel vs. older M1 vs. newer M1 Pro/Max, as well as the different operating systems involved.

So let me be unambiguously clear: none of the stuff listed above, or in other threads, involving reinstallation, downgrading packages, etc., makes this work properly on my M1 Max.

  • Models running faster on CPU than GPU is a very common occurrence; it's not Mac-specific. If there is a bug, identify it. Saying that a CPU can be faster than a GPU is not a bug; it's extremely common. It depends on the model, the CPU, the GPU, the input pipeline, etc. I've also had the M1 CPU run faster than a badass Nvidia Quadro or even an Nvidia P100. The M1 CPU is surprisingly good at this.

  • Completely agree with @jsvnyc. Running on a Mac mini M1 under Big Sur, my application (DeepLabCut) was running at 97% GPU utilization. Once I updated to Monterey it runs at 50% at best. I have tried various TensorFlow versions (2.7, 2.6, 2.5, etc.) to no avail; GPU usage is now at best half of what it was, and this is directly reflected in the execution times.

  • I believe that when you run on the CPU, you also run on the ANE. When you run on the GPU, you just get the GPU. MLCompute.set_mlc_device can be set to 'cpu', 'gpu', or 'any'; 'any' seems to be the fastest, using the CPU, GPU, and ANE. However, I think this only works for inference. (A sketch of that call follows below.)
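
As far as I can tell, the call mentioned in the comment above came from Apple's earlier tensorflow_macos fork (the pre-Monterey alpha), not from tensorflow-macos + tensorflow-metal; treat this as a historical sketch:

# Old tensorflow_macos (alpha) fork only; current tensorflow-macos installs
# control placement with tf.device(...) instead.
from tensorflow.python.compiler.mlcompute import mlcompute

mlcompute.set_mlc_device(device_name='any')  # 'cpu', 'gpu', or 'any'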


It is not working on my M1 Max either, although I have upgraded to the latest version of macOS 12 Monterey.

I got this same error on the M1 MacBook Air and solved it by changing the tensorflow-metal version to 0.1.1.

I also faced this issue. My setup: Monterey (12.1), M1 Max, 64 GB RAM. I could not solve the problem by reinstalling.

It's a perfectly normal and harmless message on an M1. I have it too, and my model and code work just fine.

2021-12-20 23:19:04.025952: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-20 23:19:04.026364: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: Apple M1

systemMemory: 8.00 GB
maxCacheSize: 2.67 GB

__________________________________________________________________________________________________
2021-12-20 23:19:04.413489: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/10
2021-12-20 23:19:04.723827: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
32/32 [==============================] - ETA: 0s - loss: 0.0256 - accuracy: 0.9605 - mae: 0.0933 - mse: 0.0256
2021-12-20 23:19:24.073636: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
32/32 [==============================] - 20s 608ms/step - loss: 0.0256 - accuracy: 0.9605 - mae: 0.0933 - mse: 0.0256 - val_loss: 0.0100 - val_accuracy: 0.9855 - val_mae: 0.0650 - val_mse: 0.0100
Epoch 2/10
32/32 [==============================] - 19s 585ms/step - loss: 0.0079 - accuracy: 0.9787 - mae: 0.0568 - mse: 0.0079 - val_loss: 0.0063 - val_accuracy: 0.9869 - val_mae: 0.0534 - val_mse: 0.0063
Epoch 3/10
32/32 [==============================] - 18s 575ms/step - loss: 0.0060 - accuracy: 0.9700 - mae: 0.0506 - mse: 0.0060 - val_loss: 0.0045 - val_accuracy: 0.9776 - val_mae: 0.0438 - val_mse: 0.0045
Epoch 4/10
....
  • How long did training this take (I assume this is classic MNIST) on the GPU? I'm curious because mine took approximately 1 minute to finish... Also, I don't quite understand how to use the CPU only and force the GPU off the job.

  • tf.config.list_physical_devices() lists the available devices; you'll see something like "/physical_device:CPU:0" in the list. Shorten that to "/CPU:0" and then run your code inside with tf.device("/CPU:0"):. I managed to speed up my training 20x this way, even though by default TF appeared to be using the CPU anyway (as seen in the TensorBoard profiler logs). I guess explicitly opting in to CPU-only saved TF some decision-making about allocation. This in itself sounds like a bug: explicitly stating the device works much faster than just the default training. (A minimal sketch follows these comments.)

  • It would be nice if when engineers thought up error messages they considered that people might waste hours trying to solve what looks like a problem.
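
A minimal sketch of that suggestion (my own; toy model with hypothetical shapes, and the fit call is commented out since it needs your data):

import tensorflow as tf

# Everything built and executed inside the block is pinned to the CPU,
# even with tensorflow-metal installed.
with tf.device("/CPU:0"):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=10)  # substitute your own data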


Metal device set to: AMD Radeon Pro 5600M

2022-01-13 17:02:36.447465: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-13 17:02:36.448221: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-01-13 17:02:36.448581: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Prior to running my model:

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Num GPUs Available: 1

I was excited to see tensorflow-macos version 2.7, but it STILL does not work.

This issue still persists.

Same issue here.

I have the same issue with M1. My NN code was working on my boyfriend's MacBook Pro with M1 Max (but not faster than a Huawei MateBook). I AM SO ANGRY, MY PROJECT DEVELOPMENT HAS STOPPED.