Hi everyone,
I'm a Mac enthusiast experimenting with tensorflow-metal on my Mac Pro (2013). My question is about GPU selection in tensorflow-metal (v0.8.0), which still supports Intel-based Macs, including my machine.
I've noticed that when running TensorFlow with Metal, it automatically selects a GPU, regardless of what I specify using device indices like "gpu:0", "gpu:1", or "gpu:2". I'm wondering if there's a way to manually specify which GPU should be used via an environment variable or another method.
For reference, I’ve tried the example from TensorFlow’s guide on multi-GPU selection: https://www.tensorflow.org/guide/gpu#using_a_single_gpu_on_a_multi-gpu_system
My goal is to explore performance optimizations by using MirroredStrategy in TensorFlow to leverage multiple GPUs: https://www.tensorflow.org/guide/distributed_training#mirroredstrategy
Interestingly, I discovered that the metalcompute Python library (https://pypi.org/project/metalcompute/) lets me manually select which GPUs to use on my system, allowing for proper multi-GPU computations. This makes me wonder:
Is there a hidden environment variable or setting that allows manual GPU selection in tensorflow-metal?
Has anyone successfully used MirroredStrategy on multiple GPUs with tensorflow-metal?
Would a bridge between metalcompute and tensorflow-metal be necessary for this use case, or is there a more direct approach?
I’d love to hear if anyone else has experimented with this or has insights on getting finer control over GPU selection. Any thoughts or suggestions would be greatly appreciated!
Thanks!
tensorflow-metal
TensorFlow accelerates machine learning model training with Metal on Mac GPUs.
I've been trying to get some basic models to work on an M2 with tensorflow-metal 1.2 and keras 2.15 and 2.18, and they all fail to work as expected.
I'm running models copy/pasted from common tutorials like Jason Brownlee ML Mastery Object Classification tutorial using CIFAR-10. When run with the GPU I can't get any reasonable results. Under keras 2.15 the best validation accuracy ends up being around 10-15%. Under keras 2.18, the validation goes off the rails around epoch 5 with wildly low accuracy and loss values that are reported as "nan".
Epoch 4/25
782/782: 19s 24ms/step - accuracy: 0.3450 - loss: 2.8925 - val_accuracy: 0.2992 - val_loss: 1.9869
Epoch 5/25
782/782: 19s 24ms/step - accuracy: 0.2553 - loss: nan - val_accuracy: 0.0000e+00 - val_loss: nan
Running the same code on the CPU using keras 2.15 using tf.config.experimental.set_visible_devices([], 'GPU') yields a reasonable result with the validation accuracy around 75% as expected. Running the same code on keras 2.15 on a linux instance with just the CPU provides similar results.
The tutorial can be found here:
https://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/
The only place I've deviated from the provided tutorial is in using
sdg = tf.keras.optimizers.legacy.SGD(learning_rate=lrate, momentum=0.9, nesterov=False)
I did this at the advice of the warning:
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.
Is there something special that I need to do to make this work? I've followed the instructions here: https://developer.apple.com/metal/tensorflow-plugin/
I've purged the venv a few times and started from scratch, but all with similarly terrible results.
Here are my platform details:
Chip: Apple M2
Memory: 16 GB
macOS : Sequoia 15.2
Python venv: 3.11
Jupyter Lab Version: 4.3.3
TensorFlow versions: 2.15, 2.18
tensorflow-metal: 1.2.0
Thanks for any assistance or advice.
Has anyone been able to run Tensorflow > 2.15 with Tensorflow Metal 1.1.0 on M3? I tried several times but was not successful. Seems like development on TensorFlow Metal has paused?
I've checked on pypi.org and it appears to only have arm64 packages. Has x86 with AMD been deprecated?
Issue type: Bug
TensorFlow metal version: 1.1.1
TensorFlow version: 2.18
OS platform and distribution: MacOS 15.2
Python version: 3.11.11
GPU model and memory: Apple M2 Max GPU 38-cores
Standalone code to reproduce the issue:
import tensorflow as tf

if __name__ == '__main__':
    gpus = tf.config.experimental.list_physical_devices('GPU')
    print(gpus)
Current behavior
Apple silicon GPU with tensorflow-metal==1.1.0 and python 3.11 works fine with tensorboard==2.17.0
This is normal output:
/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Process finished with exit code 0
But if I upgrade tensorflow to 2.18, I get this error:
/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py
Traceback (most recent call last):
File "/Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py", line 1, in <module>
import tensorflow as tf
File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/__init__.py", line 437, in <module>
_ll.load_library(_plugin_dir)
File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN3tsl8internal10LogMessageC1EPKcii
Referenced from: <D2EF42E3-3A7F-39DD-9982-FB6BCDC2853C> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
Expected in: <2814A58E-D752-317B-8040-131217E2F9AA> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
Process finished with exit code 1
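One way to narrow down a load failure like the one above is to read the installed package versions without importing TensorFlow at all, since the import itself is what fails. The following is a minimal sketch using only the standard library. The package list is an assumption based on the names mentioned in these posts, and the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from these posts; adjust to your environment (assumption).
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal", "keras")


def installed_versions(names=PACKAGES):
    """Return {package: version or None} without importing the packages."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing the reported tensorflow and tensorflow-metal versions may help spot a plugin that was built against a different TensorFlow release, which is one plausible reading of the missing symbol.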
I am attempting to install Tensorflow on my M1 and I seem to be unable to find the correct matching versions of jax, jaxlib and numpy to make it all work.
I am in Bash, because the default shell gave me issues.
I downgraded to python 3.10, because with 3.13, I could not do anything right.
Current actions:
bash-3.2$ python3.10 -m venv ~/venv-metal
bash-3.2$ python --version
Python 3.10.16
python3.10 -m venv ~/venv-metal
source ~/venv-metal/bin/activate
python -m pip install -U pip
python -m pip install tensorflow-macos
And here, I keep running into errors like:
(venv-metal):~$ pip install tensorflow-macos tensorflow-metal
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
What is wrong here?
How can I fix that?
It seems like the system wants to use the x86 version of python ... which can't be right.
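A quick way to test the suspicion above is to ask the interpreter inside the venv which architecture it was built for. This is a minimal sketch using only the standard library, and the helper name `interpreter_summary` is hypothetical. It only reports facts; it does not by itself prove why pip finds no matching wheel.

```python
import platform
import sys


def interpreter_summary():
    """Collect interpreter facts that are relevant to wheel selection."""
    return {
        "python": platform.python_version(),
        # 'arm64' for a native Apple silicon interpreter, 'x86_64' for an Intel build
        "machine": platform.machine(),
        # macOS release as seen by this interpreter; empty string when not on macOS
        "macos": platform.mac_ver()[0],
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

If `machine` prints `x86_64` on an M1, the venv was probably created from an Intel build of Python, which would be consistent with pip reporting "from versions: none" for arm64-only packages.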
I am running the same Python script using the TensorFlow Metal module on computers with M3 and M4 GPUs. While 1 epoch takes 5 minutes on the M3 device, it takes 15 minutes on the M4 device. What could be the reason for this? Could it be that TensorFlow Metal is not yet optimized for the M4 architecture?
Topic: App & System Services; SubTopic: Hardware; Tags: ML Compute, Metal Performance Shaders, tensorflow-metal
I was installing tensorflow-metal in an Anaconda environment called "arm64_tf" using the command "python -m pip install tensorflow-metal" in the terminal, and it shows:
ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal
I have already tried "conda install -c anaconda libffi", but it still doesn't work. Is there a solution? Thanks.
Apologies for my bad English.
Hello! I've been trying to run tensorflow on my MBA M3. I previously had an Intel Mac and was able to run tensorflow without any problem. I've been working on a personal project in a directory I made on my previous Mac, that I was running through Jupyter notebook. Now every time I try to run the code, the kernel will die and I'm unsure what to do.
I tried following tutorials, but every tutorial I've seen has had me create a new environment to access Jupyter Notebook, without letting me access the notebooks and files that I had already created.
I tried to run the following command in the terminal and received this error:
python -m pip install tensorflow-metal
ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal
I've installed miniforge, Xcode, and anaconda onto my computer already and wanted some assistance.
Hi Everyone,
I'm currently facing an issue where TensorFlow is unable to detect the GPU on my M1 Mac for model training. When I run the following code to check for available GPUs:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available: 0
I have already applied the steps described in the Apple developer documentation:
https://developer.apple.com/metal/tensorflow-plugin/
System Information:
Device: M1 Mac Pro Max
Python Version: 3.12.2
TensorFlow Version: 2.17.0
OS: macOS Sequoia (15.1)
Questions:
Is there any additional configuration required to enable GPU support on M1 Macs?
Are there specific TensorFlow versions that I should be using for better compatibility?
Has anyone else faced this issue, and how did you resolve it?
Topic: Machine Learning & AI; SubTopic: General; Tags: Developer Tools, ML Compute, Core ML, tensorflow-metal
Hi, the most recent version of tensorflow-metal is only available for macosx 12.0 and Python up to version 3.11. Is there any chance it could be updated with wheels for macOS 15 and Python 3.12 (which is the default version supported for TensorFlow 2.17+)? I'd note that even downgrading to Python 3.11 would not be sufficient, as the wheels only work for macOS 12.
Thanks.
Hi all,
When executing an HLO program using the JAX metal PJRT plugin, the program fails due to an unsupported data type returned by the rng_bit_generator operation.
The generated HLO includes:
%output_state, %output = "mhlo.rng_bit_generator"(%1) <{rng_algorithm = #mhlo.rng_algorithm<PHILOX>}> : (tensor<3xi64>) -> (tensor<3xi64>, tensor<3xui32>)
The error message indicates that:
Metal only supports MPSDataTypeFloat16, MPSDataTypeBFloat16, MPSDataTypeFloat32, MPSDataTypeInt32, and MPSDataTypeInt64.
The use of ui32 seems to be incompatible with Metal’s allowed types.
I’m trying to understand if the ui32 output is the problem or maybe the use of rng_bit_generator is wrong.
Could you clarify if there is a workaround or planned support for ui32 output in this context? Alternatively, guidance on configuring rng_bit_generator for compatibility with Metal’s supported types would be greatly appreciated.
Hello,
I’m attempting to convert a TensorFlow model to CoreML using the coremltools package, but I’m encountering an error during the conversion process. The error traceback points to an issue within the Cast operation in the MIL (Model Intermediate Layer) when it tries to perform type inference:
AttributeError: 'float' object has no attribute 'astype'
Here is the relevant part of the error traceback:
File ~/.pyenv/versions/3.10.12/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py", line 896, in get_cast_value
return input_var.val.astype(dtype=type_map[dtype_val])
I’ve tried converting a model from the yamnet-tensorflow2 repository, and this error occurs when CoreML tries to cast a float type during the conversion of certain operations. I’m currently using Python 3.10 and coremltools version 6.0.1, with TensorFlow 2.x.
Has anyone encountered a similar issue or can offer suggestions on how to resolve this?
I’ve also considered that this might be related to mismatches in the model’s data types, but I’m not sure how to proceed.
Platform and package versions:
coremltools 6.1
tensorflow 2.10.0
tensorflow-estimator 2.10.0
tensorflow-hub 0.16.1
tensorflow-io-gcs-filesystem 0.37.1
Python 3.10.12
pip 24.3.1 from ~/.pyenv/versions/3.10.12/lib/python3.10/site-packages/pip (python 3.10)
Darwin MacBook-Pro.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:27 PDT 2024; root:xnu-11215.41.3~2/RELEASE_X86_64 x86_64
Any help or pointers would be greatly appreciated!
hi,
I am currently running an LSTM on TensorFlow. However, when I switched from keras2 to keras3, the running time increased tenfold; it seems there is no GPU acceleration.
Here is my code:
batch size = 256
optimiser = adam
activation = tanh
_________________________________________________________________
Layer (type)                      Output Shape       Param #
=================================================================
input_1 (InputLayer)              [(None, 7, 16)]    0
bidirectional (Bidirectional)     (None, 7, 320)     226560
bidirectional_1 (Bidirectional)   (None, 7, 512)     1181696
bidirectional_2 (Bidirectional)   (None, 256)        656384
dense (Dense)                     (None, 1)          257
=================================================================
Total params: 2064897 (7.88 MB)
Trainable params: 2064897 (7.88 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
This is keras 3.6.0 + tensorflow 2.17.0 + tensorflow-metal 1.1.0 training status:
Training------------
Epoch 1/200
28/681 ━━━━━━━━━━━━━━━━━━━━ 8:13 756ms/step - loss: 0.5901 - mape: 338.6876 - mse: 0.8591
This is keras 2.14.0 + tensorflow 2.14.0 + tensorflow-metal 1.1.0 training status:
Training------------
Epoch 1/200
681/681 [==============================] - 37s 49ms/step - loss: 3.6345 - mape: 499038.7500 - mse: 34.4148 - val_loss: 3.5452 - val_mape: 41.7964 - val_mse: 32.0133 - lr: 0.0010
Is that because keras3 has no GPU support on macos?
Apart from that, if I change the LSTM activation from tanh to sigmoid in keras2, it loses GPU acceleration as well.
My system is macOS 15.0.1 and the code was running on Python 3.11.
I am not sure why this happens.
Thanks
I was working on my project, and when I tried to train a model the kernel crashed. I restarted the kernel and tried again, but got the same crash. Then I read a thread about the same issue in which Apple support advised installing tensorflow-macos and tensorflow-metal and following the guide at this site:
https://developer.apple.com/metal/tensorflow-plugin/
I did so and tried everything, but when I ran the test code provided on the site, I got the same error. Here are the code and the output.
Code:
import tensorflow as tf

cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,
)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)
and here's the output:
Epoch 1/5
The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure.
Click here for more info.
View Jupyter log for further details.
And here's about half of the log file, as it would not display in full:
metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1
2024-10-06 23:30:49.894405: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 8.00 GB
2024-10-06 23:30:49.894420: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 2.67 GB
2024-10-06 23:30:49.894444: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-10-06 23:30:49.894460: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
2024-10-06 23:30:56.701461: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
[libprotobuf FATAL google/protobuf/message_lite.cc:353] CHECK failed: target + size == res:
libc++abi: terminating due to uncaught exception of type google::protobuf::FatalException: CHECK failed: target + size == res:
Please respond to this post as soon as possible, as I am working on my project now and keep getting this error.
Device: Apple MacBook Air M1.
The metal plugin for TensorFlow had its GitHub repo taken down, and on pypi, the last update was a year ago for TF 2.14. What's the status on the metal plugin? For now it seems to work fine for TF 2.15 but what's the plan for the future?
The following code taken from keras.io produces the error
InternalError: Exception encountered when calling GPT2Tokenizer.call().
...
2 root error(s) found.
(0) INTERNAL: stream cannot wait for itself
macOS on a MacBook with M2 Max. Setting the optimizer to "Adam" does not help.
import keras_nlp # version 0.15
causal_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
causal_lm.compile(sampler="greedy")
# the next call produces the error
causal_lm.generate(["Keras is a"])
Following this instruction to install jax (https://developer.apple.com/metal/jax/), I still encountered this error:
RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. This error is frequently encountered on macOS when running an x86 Python installation on ARM hardware. In this case, try installing an ARM build of Python. Otherwise, you may be able work around this issue by building jaxlib from source.
How can I fix it?
I keep getting this error again and again, even after reinstalling.
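The error text itself points at an x86 Python running on ARM hardware. On macOS, whether a process is being translated by Rosetta can be read from the `sysctl.proc_translated` key. The following is a minimal sketch of that check using only the standard library. The helper name `running_under_rosetta` is hypothetical, and it returns `None` whenever the key cannot be read.

```python
import subprocess


def running_under_rosetta():
    """Return True/False on macOS, or None if the answer cannot be determined."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None  # the sysctl command is not available on this system
    value = out.stdout.strip()
    if out.returncode != 0 or value not in ("0", "1"):
        return None  # key missing, e.g. on an Intel Mac or a non-macOS system
    return value == "1"


if __name__ == "__main__":
    print("Running under Rosetta:", running_under_rosetta())
```

A result of `True` from the same interpreter that raises the jaxlib error would support the "x86 Python on ARM hardware" explanation given in the message.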
Traceback (most recent call last):
File "", line 1, in
File "/Users/aman/LLM/env/lib/python3.8/site-packages/tensorflow/__init__.py", line 439, in <module>
_ll.load_library(_plugin_dir)
File "/Users/aman/LLM/env/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/aman/LLM/env/lib/python3.8/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: OBJC_CLASS$_MPSGraphRandomOpDescriptor
Referenced from: /Users/aman/LLM/env/lib/python3.8/site-packages/tensorflow-plugins/libmetal_plugin.dylib
Expected in: /System/Library/Frameworks/MetalPerformanceShadersGraph.framework/Versions/A/MetalPerformanceShadersGraph
I've been attempting to install tensorflow-metal on my computer so that I can use the GPU instead of the CPU. I have tensorflow-macos installed already, and pip and TensorFlow are fully up to date. I'm currently 2 months into building and training a TF CNN, and I'm at the point where training a single epoch for my network will take a week (I have a lot of data that I need to use). I desperately need GPU acceleration but am stuck with the CPU for now. I can't get access to a cluster, so the best I can do is continue to use my M2 MacBook. Is there any other way I can install tensorflow-metal? Is there a way I can use the GPU (rather than the CPU) with TF if I can't get tensorflow-metal installed?
I keep getting this error message:
"ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none) ERROR: No matching distribution found for tensorflow-metal"
I looked on apple forums, tried to download it from GitHub (the page is down), and anything else I could think of and/or find on the internet to help, but it still isn't installing.
I've used the following commands and still no luck:
python -m pip install tensorflow-metal
pip install https://github.com/apple/tensorflow_metal/releases/download/v0.5.0/tensorflow_metal-0.5.0-py3-none-any.whl
pip install tensorflow-metal
pip3 install tensorflow-metal
SYSTEM_VERSION_COMPAT=0 python -m pip install tensorflow-metal
SYSTEM_VERSION_COMPAT=0 pip install tensorflow-macos tensorflow-metal
conda install -c anaconda tensorflow-gpu
Any help would be appreciated! Thanks so much!