🤔 GitHub tensorflow macOS alpha had better performance on M1?

Hello,

I noticed a substantial decrease in performance compared to previous releases of tensorflow for M1 Macs.

I previously installed the alpha release of tensorflow for M1 from GitHub, found here: https://github.com/apple/tensorflow_macos and was very impressed by the performance.

I used the following script to benchmark my M1 Mac and other systems: https://gist.github.com/tampapath/662aca8cd0ef6790ade1bf3c23fe611a#file-fashin_mnist-py Running the alpha release from GitHub, my M1 Mac handsomely outperformed both Google Colab's random GPU offerings and a Windows machine with an RTX 2070.
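For context, the linked gist is roughly this shape: a small dense network on Fashion-MNIST, trained and evaluated with the total wall-clock time printed. This is only an illustrative sketch; the layer sizes and epoch count are my guesses, not the gist's exact values:

```python
import time
import tensorflow as tf

# Illustrative sketch of a Fashion-MNIST timing benchmark.
# Layer sizes and epoch count are guesses, not the linked gist's values.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

start = time.perf_counter()
model.fit(x_train, y_train, epochs=2, verbose=0)
model.evaluate(x_test, y_test, verbose=0)
duration = time.perf_counter() - start
print(f"train + test: {duration:.2f}s")
```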


Recently, I went back to the GitHub repository, looking for new updates on tensorflow support for the M1 and was redirected here to the tensorflow-metal PluggableDevices installation guide: https://developer.apple.com/metal/tensorflow-plugin/

After installing the conda environment and running the same benchmark script, I realized my M1 system was running much slower.

Additionally, the following error messages printed to the console while running the benchmark:

2021-08-12 21:48:16.306946: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

2021-08-12 21:48:16.307209: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

2021-08-12 21:48:16.437942: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)

2021-08-12 21:48:16.441196: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


Has anyone else noticed this loss in performance?


The results I got are as follows (benchmark: script duration):

  • tf GitHub alpha: 🟢 9.62s
  • new tf-metal: 🔴 76.52s
  • Google Colab: 🔴 57.53s
  • RTX 2070 PC: 🔴 23.18s

Both the tf GitHub alpha and the new tf-metal were run on the same 13" M1 MacBook Pro.


I wrote an installation guide for the GitHub alpha release if anyone wants to compare results, or run a faster version of tensorflow compatible with their M1 Mac: https://github.com/apple/tensorflow_macos/issues/215

  • I had a bit of a look into how this was performing on my system (13" M1 MacBook Air).

    Using the tensorflow-metal PluggableDevice, I had a total training and testing time of 62.52s. However, when training on the CPU only, the training and testing time was 9.41s.

    I never managed to successfully install the original Apple TF alpha, so I can't test it directly, but I'm guessing it trained and tested this model on the CPU.

    I have done a bunch of other testing (as have others) showing that for small models and small image dimensions, the CPU is faster than the GPU. Once the model, batch size and image size become a bit larger, the GPU becomes faster. For example, using EfficientNetB0 on CIFAR-100, image size 32x32 is consistently faster on the CPU, 64x64 is pretty even, and 128x128 is generally faster on the GPU.

    Compared to Google Colab, a similar pattern emerges. For small models, batch sizes and image sizes, the M1 compares well, but as the model and the data become larger, the Colab GPU powers ahead.

    This has captured my interest because of the rumours of the M1X with double the CPU high performance cores and quadruple the GPU cores. If that turns out to be true then the Apple machines could become genuinely capable AI development systems (at a very competitive price). Fingers crossed :).


Replies

I also see this issue.

I benchmarked my Mac Pro’s Radeon Pro 580X against this simple CNN model: https://github.com/macports/macports-ports/pull/12678

  • Apple GitHub tensorflow_macos alpha3: 5 s/epoch
  • PyPI tensorflow-macos v2.6 + tensorflow-metal v0.2: 15 s/epoch (3X slower)
  • Mac Pro 12-core Xeon W CPU: 10 s/epoch
  • Tesla V100: 1 s/epoch

I conclude that there are still some significant alpha-release issues with tensorflow-macos/tensorflow-metal.

  • I think you will find that the alpha3 vs. tensorflow-metal difference comes down to CPU versus GPU. For small models the CPU is far faster, and the tensorflow_macos alpha3 seemed to use the CPU for these. If you run the same model with the latest tensorflow-macos, it is still faster without the GPU. However, once the models become large (both image size and batch size matter here), the GPU can become much faster. The new M1 Max chip does really well on anything above small images and tiny batch sizes.

  • No, I explicitly observe the GPU/CPU loads with Performance Monitor, and explicitly set tf.device.

    In contrast, the Tesla V100 outperforms the CPU on the same code by 10 X on a decent Linux GPU cluster.

    This is definitely an issue with tensorflow-metal, at least on macOS 11.6.

  • Try increasing the batch_size parameter in the above code: 128 is typically too low for decent GPU performance (though it's platform dependent). On a previous-gen MacBook Pro with an AMD Radeon Pro 5500M and batch_size 4096, I get 2 s/epoch on the GPU versus 8 s/epoch on the CPU.
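To make the batch-size effect concrete, here is a rough sketch of that kind of timing comparison. The data and model are synthetic stand-ins, not the CNN from the linked pull request:

```python
import time
import tensorflow as tf

# Synthetic stand-ins; shapes and layer sizes are illustrative only.
x = tf.random.normal((4096, 32, 32, 3))
y = tf.random.uniform((4096,), maxval=10, dtype=tf.int32)

def epoch_time(batch_size):
    """Time one training epoch of a small CNN at the given batch size."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    start = time.perf_counter()
    model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
    return time.perf_counter() - start

times = {bs: epoch_time(bs) for bs in (128, 1024, 4096)}
for bs, t in times.items():
    print(f"batch_size={bs}: {t:.2f} s/epoch")
```

On a GPU the larger batch sizes should amortize the per-step dispatch overhead; on a CPU the effect is usually much smaller.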

How do I fix these warnings? I am on the macOS 12 public release, but they still persist. @essandess @brendank_ntb

2021-08-12 21:48:16.306946: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

2021-08-12 21:48:16.307209: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

2021-08-12 21:48:16.437942: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)

Hi @parthsharma, I get those warnings too (at the start of training) but I also get a message:

I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.

Training then continues using the GPU. I expect that the warning messages could be suppressed but I have not bothered to do so.
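For anyone who does want to suppress them: the C++-side log lines respect the TF_CPP_MIN_LOG_LEVEL environment variable, which must be set before tensorflow is imported. A minimal sketch:

```python
import os

# TF_CPP_MIN_LOG_LEVEL filters TensorFlow's C++-side log lines.
# It must be set before tensorflow is imported:
# 0 = everything, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = errors only
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf  # the NUMA / CPU-frequency lines should no longer print
print(tf.__version__)
```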


@brendank_ntb yes, I get the same message too, like this:

`2021-10-27 20:22:30.943266: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-27 20:22:30.946426: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2021-10-27 20:22:38.821706: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.`

Thanks for your response. I thought this wasn't supposed to happen, but it seems to happen for everyone.

Wrapping the code with tf.device('/cpu:0'), I get 9.63 seconds on the M1, versus 60s with the GPU.

  • I noticed that as well when I was coding the clothing-classification tutorial on the TensorFlow website and comparing the speed with my Linux machine. On the Linux machine it was running on the CPU, and it's 7x faster than the M1 GPU. Thanks for the tip on how to use the CPU instead of the GPU on the M1; that way it's just as fast as the Linux machine. I was actually expecting more from the M1 Pro, but maybe some models just run better on the CPU.
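For anyone who wants to try the CPU-vs-GPU comparison themselves, here is a minimal sketch of the tf.device wrapping. The data is synthetic so the sketch is self-contained, and the GPU call only works with tensorflow-metal installed:

```python
import time
import tensorflow as tf

# Synthetic Fashion-MNIST-shaped data so the sketch is self-contained.
x = tf.random.normal((2048, 28, 28))
y = tf.random.uniform((2048,), maxval=10, dtype=tf.int32)

def run_on(device):
    """Build and train a small model with ops pinned to `device`."""
    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(28, 28)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(optimizer="adam",
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
        start = time.perf_counter()
        model.fit(x, y, epochs=1, verbose=0)
        return time.perf_counter() - start

cpu_time = run_on("/cpu:0")
print(f"CPU: {cpu_time:.2f}s")
# With tensorflow-metal installed, compare against the GPU:
# print(f"GPU: {run_on('/gpu:0'):.2f}s")
```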


ME TOO!!!!!!