Hello,
I noticed a substantial decrease in performance compared to previous releases of tensorflow for M1 Macs.
I previously installed the alpha release of tensorflow for M1 from GitHub, found here: https://github.com/apple/tensorflow_macos and was very impressed by the performance.
I used the following script to benchmark my M1 Mac and other systems: https://gist.github.com/tampapath/662aca8cd0ef6790ade1bf3c23fe611a#file-fashin_mnist-py
Running the alpha release from GitHub, my M1 Mac handsomely outperformed both Google Colab's random GPU offerings and an RTX 2070 Windows computer.
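For anyone who doesn't want to open the gist, the benchmark is a Fashion-MNIST Keras run; the sketch below is my approximation of it (the exact layer sizes and epoch count in the gist may differ — treat these as assumptions):

```python
import time
import tensorflow as tf

# Load and normalize Fashion-MNIST (28x28 grayscale, 10 classes).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Small dense model, in the spirit of the gist's benchmark.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Time training plus evaluation, which is the number in the table below.
start = time.time()
model.fit(x_train, y_train, epochs=5, verbose=0)
model.evaluate(x_test, y_test, verbose=0)
elapsed = time.time() - start
print(f"total time: {elapsed:.2f}s")
```

A model this small is exactly the regime where backend differences show up most starkly, since per-step overhead dominates the actual compute.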
Recently, I went back to the GitHub repository, looking for new updates on tensorflow support for the M1 and was redirected here to the tensorflow-metal PluggableDevices installation guide: https://developer.apple.com/metal/tensorflow-plugin/
After installing the conda environment and running the same benchmark script, I realized my M1 system was running much slower.
Additionally, the following error messages printed to the console while running the benchmark:
2021-08-12 21:48:16.306946: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-08-12 21:48:16.307209: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-08-12 21:48:16.437942: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-12 21:48:16.441196: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
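For context on those messages: the "0 MB memory" and NUMA lines are how the PluggableDevice reports itself, not necessarily errors. You can confirm the Metal backend actually registered a GPU device with a standard TensorFlow call:

```python
import tensorflow as tf

# List the devices TensorFlow can see. With tensorflow-metal installed,
# the Metal backend should appear as a GPU device alongside the CPU.
print(tf.config.list_physical_devices())
```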
Has anyone else noticed this loss in performance?
The results I got are as follows:
| Setup | | Total time |
| --- | --- | --- |
| tf GitHub alpha | 🟢 | 9.62s |
| new tf-metal | 🔴 | 76.52s |
| Google Colab | 🔴 | 57.53s |
| RTX 2070 PC | 🔴 | 23.18s |
- Both tf GitHub alpha and new tf-metal were run on the same 13" M1 MacBook Pro.
I wrote an installation guide for the GitHub alpha release if anyone wants to compare results, or run a faster version of tensorflow compatible with their M1 Mac: https://github.com/apple/tensorflow_macos/issues/215
—
brendank_ntb
I had a bit of a look into how this was performing on my system (13" M1 MacBook Air).
Using the tensorflow-metal pluggable device I had a total training and testing time of 62.52s. However, when training on the CPU only I had a training and testing time of 9.41s.
I never managed to successfully install the original Apple tf alpha, so I can't directly test that, but I am guessing that it allowed this to train and test on the CPU.
I have done a bunch of other testing (as have others) that show that for small models and small image dimensions the CPU is faster than the GPU. Once the model, batch size and image size become a bit larger the GPU becomes faster. For example, using EfficientNetB0 against CIFAR100, image size 32x32 is consistently faster on CPU, image size 64x64 is pretty even and image size 128x128 is generally faster on GPU.
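If anyone wants to reproduce the CPU-only timings on a machine that already has tensorflow-metal installed, one way (not part of the benchmark script, just a standard TensorFlow call) is to hide the GPU before building any model:

```python
import tensorflow as tf

# Hide any GPU (including the Metal PluggableDevice) so small models
# train on the CPU. This must run before any tensors or ops are created,
# otherwise TensorFlow raises a RuntimeError.
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices("GPU"))  # []
```

This only affects the current process, so it is an easy way to A/B the same script with and without the Metal backend.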
Compared to Google Colab, a similar pattern emerges. For small models, batch sizes and image sizes the M1 compares well, but as the model and the data become larger the Colab GPU powers ahead.
This has captured my interest because of the rumours of the M1X with double the CPU high performance cores and quadruple the GPU cores. If that turns out to be true then the Apple machines could become genuinely capable AI development systems (at a very competitive price). Fingers crossed :).