Massive issues with TensorFlow GPU, when will Apple do something?

Hello, we all face issues with the latest TensorFlow GPU support: incorrect results, errors, etc. We all agreed to pay extra for the M1/M2/M3 so we could work on a professional-grade computer, but in the end we must fall back to the CPU. When will Apple actually comment on this and provide updates? I totally understand these issues aren't fixed overnight and take some time, but I've never seen an Apple dev answer saying that they understand and are working on a fix. I basically bought a Mac M3 Pro to be able to run some workloads on the GPU without having to purchase a server, and it's now useless. It's really frustrating.

I think we would all appreciate some information on this. Many of the issues have been around for well over a year, with no mention of resolution.

Slow training and wrong results with Sonoma, TensorFlow 2.15, tensorflow-metal 1.1

If I'm not mistaken, I think this issue is more apparent on Sonoma than Ventura.

I have attached a Python file from the Coursera TensorFlow: Advanced Techniques Specialization, class Generative Deep Learning with TensorFlow, Course 4, Week 3, Assignment 1.

This code runs correctly in Colab or on local CPU cores, but fails drastically on a Metal GPU due to obvious untrapped numerical errors.

To reproduce, switch between environments with these two commands:

pip install tensorflow-metal
pip uninstall tensorflow-metal

Feel free to contact me if you require additional information, thanks!
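
If it helps with triage: the failures look like NaN/Inf values appearing silently on the GPU, so one way I would try to catch the first bad tensor is with TensorFlow's numeric checker. This is only a sketch, and I haven't verified that enable_check_numerics is fully supported by the tensorflow-metal plugin:

import tensorflow as tf

# Log which device each op is placed on, and raise on the first NaN/Inf tensor.
tf.debugging.set_log_device_placement(True)
tf.debugging.enable_check_numerics()

# Confirm whether the Metal GPU is visible in this environment.
print(tf.config.list_physical_devices('GPU'))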

#!/usr/bin/env python
# coding: utf-8

# # Week 3: Variational Autoencoders on Anime Faces
# 
# For this exercise, you will train a Variational Autoencoder (VAE) using the [anime faces dataset by MckInsey666](https://github.com/bchao1/Anime-Face-Dataset).
# 
# You will train the model using the techniques discussed in class. At the end, you should save your model and download it from Colab so that it can be submitted to the autograder for grading.

# ***Important:*** *This colab notebook has read-only access so you won't be able to save your changes. If you want to save your work periodically, please click `File -> Save a Copy in Drive` to create a copy in your account, then work from there.*  

# ## Imports

# In[1]:


import tensorflow as tf
import tensorflow_datasets as tfds

import matplotlib.pyplot as plt
import numpy as np

import os
import zipfile
import urllib.request
import random
from IPython import display


# In[2]:


import sys

import tensorflow.keras
import tensorflow as tf
import platform

print(f"Python Platform: {platform.platform()}")
print(f"Tensor Flow Version: {tf.__version__}")

print()
print(f"Python {sys.version}")
gpu = len(tf.config.list_physical_devices('GPU'))>0
print("GPU is", "available" if gpu else "NOT AVAILABLE")


# In[3]:


print("tf      ", tf.__version__)
print("tfds    ", tfds.__version__)
print("matplot ", tfds.__version__)


# ## Parameters

# In[4]:


# set a random seed
np.random.seed(51)

# parameters for building the model and training
BATCH_SIZE=2000
LATENT_DIM=512
IMAGE_SIZE=64
LAYER_1=64
LAYER_2=80
LAYER_3=96


# ## Download the Dataset
# 
# You will download the Anime Faces dataset and save it to a local directory.

# In[5]:


# make the data directory
try:
  os.makedirs('./tmp/anime', exist_ok=True)
except OSError:
  pass

# download the zipped dataset to the data directory
data_url = "https://storage.googleapis.com/learning-datasets/Resources/anime-faces.zip"
data_file_name = "animefaces.zip"
download_dir = './tmp/anime/'
urllib.request.urlretrieve(data_url, data_file_name)

# extract the zip file
zip_ref = zipfile.ZipFile(data_file_name, 'r')
zip_ref.extractall(download_dir)
zip_ref.close()


# ## Prepare the Dataset

# Next is preparing the data for training and validation. We've provided you some utilities below.

# In[6]:


# Data Preparation Utilities

def get_dataset_slice_paths(image_dir):
  '''returns a list of paths to the image files'''
  image_file_list = os.listdir(image_dir)
  image_paths = [os.path.join(image_dir, fname) for fname in image_file_list]

  return image_paths


def map_image(image_filename):
  '''preprocesses the images'''
  img_raw = tf.io.read_file(image_filename)
  image = tf.image.decode_jpeg(img_raw)

  image = tf.cast(image, dtype=tf.float32)
  image = tf.image.resize(image, (IMAGE_SIZE, IMAGE_SIZE))
  image = image / 255.0
  image = tf.reshape(image, shape=(IMAGE_SIZE, IMAGE_SIZE, 3,))

  return image


# You will use the functions above to generate the train and validation sets.

# In[7]:


# get the list containing the image paths
paths = get_dataset_slice_paths("./tmp/anime/images/")

# shuffle the paths
random.shuffle(paths)

# split the paths list into training (80%) and validation (20%) sets
paths_len = len(paths)
train_paths_len = int(paths_len * 0.8)

train_paths = paths[:train_paths_len]
val_paths = paths[train_paths_len:]

# load the training image paths into tensors, create batches and shuffle
training_dataset = tf.data.Dataset.from_tensor_slices((train_paths))
training_dataset = training_dataset.map(map_image)
training_dataset = training_dataset.shuffle(1000).batch(BATCH_SIZE)

# load the validation image paths into tensors and create batches
validation_dataset = tf.data.Dataset.from_tensor_slices((val_paths))
validation_dataset = validation_dataset.map(map_image)
validation_dataset = validation_dataset.batch(BATCH_SIZE)


print(f'number of batches in the training set: {len(training_dataset)}')
print(f'number of batches in the validation set: {len(validation_dataset)}')


# ## Display Utilities
# 
# We've also provided some utilities to help in visualizing the data.

# In[8]:


def display_faces(dataset, size=9):
  '''Takes a sample from a dataset batch and plots it in a grid.'''
  dataset = dataset.unbatch().take(size)
  n_cols = 3
  n_rows = size//n_cols + 1
  plt.figure(figsize=(5, 5))
  i = 0
  for image in dataset:
    i += 1
    disp_img = np.reshape(image, (64,64,3))
    plt.subplot(n_rows, n_cols, i)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(disp_img)


def display_one_row(disp_images, offset, shape=(28, 28)):
  '''Displays a row of images.'''
  for idx, image in enumerate(disp_images):
    plt.subplot(3, 10, offset + idx + 1)
    plt.xticks([])
    plt.yticks([])
    image = np.reshape(image, shape)
    plt.imshow(image)


def display_results(disp_input_images, disp_predicted):
  '''Displays input and predicted images.'''
  plt.figure(figsize=(15, 5))
  display_one_row(disp_input_images, 0, shape=(IMAGE_SIZE,IMAGE_SIZE,3))
  display_one_row(disp_predicted, 20, shape=(IMAGE_SIZE,IMAGE_SIZE,3))


# Let's see some of the anime faces from the validation dataset.

# In[9]:


display_faces(validation_dataset, size=12)


# ## Build the Model

# You will be building your VAE in the following sections. Recall that this will follow an encoder-decoder architecture and can be summarized by the figure below.
# 
# <img height="60%" width="60%" src="./C4W3_Assignment1.png">

# ### Sampling Class
# 
# You will start with the custom layer to provide the Gaussian noise input along with the mean (mu) and standard deviation (sigma) of the encoder's output. Recall the equation to combine these:
# 
# $$z = \mu + e^{0.5\sigma} * \epsilon  $$
# 
# where $\mu$ = mean, $\sigma$ = standard deviation, and $\epsilon$ = random sample

# In[10]:


class Sampling(tf.keras.layers.Layer):
  def call(self, inputs):
    """Generates a random sample and combines with the encoder output

    Args:
      inputs -- output tensor from the encoder

    Returns:
      `inputs` tensors combined with a random sample
    """
    ### START CODE HERE ###
    mu, sigma = inputs
    batch = tf.shape(mu)[0]
    dim = tf.shape(mu)[1]
    epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
    z = mu + tf.exp(0.5 * sigma) * epsilon
    ### END CODE HERE ###
    return  z


# ### Encoder Layers
# 
# Next, please use the Functional API to stack the encoder layers and output `mu`, `sigma` and the shape of the features before flattening. We expect you to use 3 convolutional layers (instead of 2 in the ungraded lab) but feel free to revise as you see fit. Another hint is to use `1024` units in the Dense layer before you get mu and sigma (we used `20` for it in the ungraded lab).
# 
# *Note: If you did Week 4 before Week 3, please do not use LeakyReLU activations yet for this particular assignment. The grader for Week 3 does not support LeakyReLU yet. This will be updated but for now, you can use `relu` and `sigmoid` just like in the ungraded lab.*

# In[11]:


def encoder_layers(inputs, latent_dim):
  """Defines the encoder's layers.
  Args:
    inputs -- batch from the dataset
    latent_dim -- dimensionality of the latent space

  Returns:
    mu -- learned mean
    sigma -- learned standard deviation
    batch_3.shape -- shape of the features before flattening
  """
  ### START CODE HERE ###
  # add the Conv2D layers followed by BatchNormalization
  x = tf.keras.layers.Conv2D(filters=LAYER_1, kernel_size=3, strides=2, padding="same", activation='relu', name="encode_conv1")(inputs)
  x = tf.keras.layers.BatchNormalization()(x)
  x = tf.keras.layers.Conv2D(filters=LAYER_2, kernel_size=3, strides=2, padding="same", activation='relu', name="encode_conv2")(x)
  x = tf.keras.layers.BatchNormalization()(x)
  x = tf.keras.layers.Conv2D(filters=LAYER_3, kernel_size=3, strides=2, padding='same', activation='relu', name="encode_conv3")(x)

  # assign to a different variable so you can extract the shape later
  batch_3 = tf.keras.layers.BatchNormalization()(x)

  # flatten the features and feed into the Dense network
  x = tf.keras.layers.Flatten(name="encode_flatten")(batch_3)

  # the instructions above suggest 1024 units here (the ungraded lab used 20) but feel free to change and see what results you get
  x = tf.keras.layers.Dense(1024, activation='relu', name="encode_dense")(x)
  x = tf.keras.layers.BatchNormalization()(x)

  # add output Dense networks for mu and sigma, units equal to the declared latent_dim.
  mu = tf.keras.layers.Dense(latent_dim, name='latent_mu')(x)
  sigma = tf.keras.layers.Dense(latent_dim, name ='latent_sigma')(x)
  ### END CODE HERE ###

  # revise `batch_3.shape` here if you opted not to use 3 Conv2D layers
  return mu, sigma, batch_3.shape


# ### Encoder Model
# 
# You will feed the output from the above function to the `Sampling layer` you defined earlier. That will have the latent representations that can be fed to the decoder network later. Please complete the function below to build the encoder network with the `Sampling` layer.

# In[12]:


def encoder_model(latent_dim, input_shape):
  """Defines the encoder model with the Sampling layer
  Args:
    latent_dim -- dimensionality of the latent space
    input_shape -- shape of the dataset batch

  Returns:
    model -- the encoder model
    conv_shape -- shape of the features before flattening
  """
  ### START CODE HERE ###
  inputs = tf.keras.layers.Input(shape=input_shape)
  mu, sigma, conv_shape = encoder_layers(inputs, latent_dim=LATENT_DIM)
  z = Sampling()((mu, sigma))
  model = tf.keras.Model(inputs, outputs=[mu, sigma, z])
  ### END CODE HERE ###
  model.summary()
  return model, conv_shape


# ### Decoder Layers
# 
# Next, you will define the decoder layers. This will expand the latent representations back to the original image dimensions. After training your VAE model, you can use this decoder model to generate new data by feeding random inputs.

# In[13]:


def decoder_layers(inputs, conv_shape):
  """Defines the decoder layers.
  Args:
    inputs -- output of the encoder
    conv_shape -- shape of the features before flattening

  Returns:
    tensor containing the decoded output
  """
  ### START CODE HERE ###
  # feed to a Dense network with units computed from the conv_shape dimensions
  units = conv_shape[1] * conv_shape[2] * conv_shape[3]
  x = tf.keras.layers.Dense(units, activation = 'relu', name="decode_dense1")(inputs)
  x = tf.keras.layers.BatchNormalization()(x)

  # reshape output using the conv_shape dimensions
  x = tf.keras.layers.Reshape((conv_shape[1], conv_shape[2], conv_shape[3]), name="decode_reshape")(x)
  # upsample the features back to the original dimensions
  x = tf.keras.layers.Conv2DTranspose(filters=LAYER_3, kernel_size=3, strides=2, padding='same', activation='relu', name="decode_conv2d_1")(x)
  x = tf.keras.layers.BatchNormalization()(x)
  x = tf.keras.layers.Conv2DTranspose(filters=LAYER_2, kernel_size=3, strides=2, padding='same', activation='relu', name="decode_conv2d_2")(x)
  x = tf.keras.layers.BatchNormalization()(x)
  x = tf.keras.layers.Conv2DTranspose(filters=LAYER_1, kernel_size=3, strides=2, padding='same', activation='relu', name="decode_conv2d_3")(x)
  x = tf.keras.layers.BatchNormalization()(x)
  x = tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=3, strides=1, padding='same', activation='sigmoid', name="decode_final")(x)
  ### END CODE HERE ###
  return x


# ### Decoder Model
# 
# Please complete the function below to output the decoder model.

# In[14]:


def decoder_model(latent_dim, conv_shape):
  """Defines the decoder model.
  Args:
    latent_dim -- dimensionality of the latent space
    conv_shape -- shape of the features before flattening

  Returns:
    model -- the decoder model
  """
  ### START CODE HERE ###
  inputs = tf.keras.layers.Input(shape=(latent_dim,))
  outputs = decoder_layers(inputs, conv_shape)
  model = tf.keras.Model(inputs, outputs)
  ### END CODE HERE ###
  model.summary()
  return model


# ### Kullback–Leibler Divergence
# 
# Next, you will define the function to compute the [Kullback–Leibler Divergence](https://arxiv.org/abs/2002.07514) loss. This will be used to improve the generative capability of the model. This code is already given.
# 

# In[15]:


def kl_reconstruction_loss(mu, sigma):
  """ Computes the Kullback-Leibler Divergence (KLD)
  Args:
    mu -- mean
    sigma -- standard deviation

  Returns:
    KLD loss
  """
  kl_loss = 1 + sigma - tf.square(mu) - tf.math.exp(sigma)
  return tf.reduce_mean(kl_loss) * -0.5


# ### Putting it all together
# 
# Please define the whole VAE model. Remember to use `model.add_loss()` to add the KL reconstruction loss. This will be accessed and added to the loss later in the training loop.

# In[16]:


def vae_model(encoder, decoder, input_shape):
  """Defines the VAE model
  Args:
    encoder -- the encoder model
    decoder -- the decoder model
    input_shape -- shape of the dataset batch

  Returns:
    the complete VAE model
  """
  ### START CODE HERE ###
  # set the inputs
  inputs = tf.keras.layers.Input(shape=input_shape)

  # get mu, sigma, and z from the encoder output
  mu, sigma, z = encoder(inputs)

  # get reconstructed output from the decoder
  reconstructed = decoder(z)

  # define the inputs and outputs of the VAE
  model = tf.keras.Model(inputs=inputs, outputs=reconstructed)

  # add the KL loss
  loss = kl_reconstruction_loss(mu, sigma)
  model.add_loss(loss)
  ### END CODE HERE ###
  return model


# Next, please define a helper function to return the encoder, decoder, and vae models you just defined.
# 

# In[17]:


def get_models(input_shape, latent_dim):
  """Returns the encoder, decoder, and vae models"""
  ### START CODE HERE ###
  encoder, conv_shape = encoder_model(latent_dim=latent_dim, input_shape=input_shape)
  decoder = decoder_model(latent_dim=latent_dim, conv_shape=conv_shape)
  vae = vae_model(encoder, decoder, input_shape=input_shape)
  ### END CODE HERE ###
  return encoder, decoder, vae


# Let's use the function above to get the models we need in the training loop.
# 

# In[18]:


encoder, decoder, vae = get_models(input_shape=(64,64,3,), latent_dim=LATENT_DIM)


# ## Train the Model
# 
# You will now configure the model for training. We defined some losses, the optimizer, and the loss metric below but you can experiment with others if you like.
# 

# In[19]:


optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=0.002)
loss_metric = tf.keras.metrics.Mean()
mse_loss = tf.keras.losses.MeanSquaredError()
bce_loss = tf.keras.losses.BinaryCrossentropy()


# You will generate 16 images in a 4x4 grid to show
# progress of image generation. We've defined a utility function for that below.

# In[20]:


def generate_and_save_images(model, epoch, step, test_input):
  """Helper function to plot our 16 images

  Args:

  model -- the decoder model
  epoch -- current epoch number during training
  step -- current step number during training
  test_input -- random tensor with shape (16, LATENT_DIM)
  """
  predictions = model.predict(test_input)

  fig = plt.figure(figsize=(4,4))

  for i in range(predictions.shape[0]):
      plt.subplot(4, 4, i+1)
      img = predictions[i, :, :, :] * 255
      img = img.astype('int32')
      plt.imshow(img)
      plt.axis('off')

  # tight_layout minimizes the overlap between 2 sub-plots
  fig.suptitle("epoch: {}, step: {}".format(epoch, step))
  plt.savefig('image_at_epoch_{:04d}_step{:04d}.png'.format(epoch, step))
  plt.show()


# You can now start the training loop. You are asked to select the number of epochs and to complete the subsection on updating the weights. The general steps are:
# 
# * feed a training batch to the VAE model
# * compute the reconstruction loss (hint: use the **mse_loss** defined above instead of `bce_loss` in the ungraded lab, then multiply by the flattened dimensions of the image, i.e. 64 x 64 x 3)
# * add the KLD regularization loss to the total loss (you can access the `losses` property of the `vae` model)
# * get the gradients
# * use the optimizer to update the weights
# 
# 
# When training your VAE, you might notice that there’s not a lot of variation in the faces. But don’t let that deter you! We’ll test based on how well it does in reconstructing the original faces, and not how well it does in creating new faces.
# 
# The training will also take a long time (more than 30 minutes) and that is to be expected. If you used the mean loss metric suggested above, train the model until that is down to around 320 before submitting.
# 

# In[ ]:


# Training loop. Display generated images each epoch

### START CODE HERE ###
epochs = 50
### END CODE HERE ###

random_vector_for_generation = tf.random.normal(shape=[16, LATENT_DIM])
generate_and_save_images(decoder, 0, 0, random_vector_for_generation)

for epoch in range(epochs):
  print('Start of epoch %d' % (epoch,))

  # Iterate over the batches of the dataset.
  for step, x_batch_train in enumerate(training_dataset):
    with tf.GradientTape() as tape:
      ### START CODE HERE ###
      reconstructed = vae(x_batch_train)

      # Compute reconstruction loss
      flattened_inputs = tf.reshape(x_batch_train, shape=[-1])
      flattened_outputs = tf.reshape(reconstructed, shape=[-1])
      loss = mse_loss(flattened_inputs, flattened_outputs) * 12288

      # numerical errors reduced by truncating the error term; not the correct algorithm
      # loss = mse_loss(flattened_inputs, flattened_outputs) * 64

      # add KLD regularization loss
      loss += sum(vae.losses)

    grads = tape.gradient(loss, vae.trainable_weights)
    optimizer.apply_gradients(zip(grads, vae.trainable_weights))
    ### END CODE HERE ###

    loss_metric(loss)

    if step % 10 == 0:
      display.clear_output(wait=False)
      generate_and_save_images(decoder, epoch, step, random_vector_for_generation)
    print('Epoch: %s step: %s mean loss = %s' % (epoch, step, loss_metric.result().numpy()))
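    # (Not part of the original assignment; a hedged debugging aid.) Checking the
    # loss for NaN/Inf every step would surface the untrapped numerical errors on
    # the Metal GPU as soon as they appear, e.g.:
    #
    #   if not bool(tf.math.is_finite(loss)):
    #     raise FloatingPointError(f'non-finite loss at epoch {epoch}, step {step}')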


# # Plot Reconstructed Images
# 

# As mentioned, your model will be graded on how well it is able to reconstruct images (not generate new ones). You can get a glimpse of how it is doing with the code block below. It feeds in a batch from the test set and plots a row of input (top) and output (bottom) images. Don't worry if the outputs are a bit blurry. It will look something like below:
# 
# <img height="60%" width="75%" src="./C4W3_Assignment2.png">

# In[ ]:


test_dataset = validation_dataset.take(1)
output_samples = []

for input_image in tfds.as_numpy(test_dataset):
  output_samples = input_image

idxs = np.random.choice(64, size=10)

vae_predicted = vae.predict(test_dataset)
display_results(output_samples[idxs], vae_predicted[idxs])


# # Plot Generated Images
# 

# Using the default parameters, it can take a long time to train your model well enough to generate good fake anime faces. In case you decide to experiment, we provided the code block below to display an 8x8 gallery of fake data generated from your model. Here is a sample gallery generated after 50 epochs.
# 
# <img height="60%" width="75%" src="./C4W3_Assignment3.png">

# In[ ]:


def plot_images(rows, cols, images, title):
    '''Displays images in a grid.'''
    grid = np.zeros(shape=(rows*64, cols*64, 3))
    for row in range(rows):
        for col in range(cols):
            grid[row*64:(row+1)*64, col*64:(col+1)*64, :] = images[row*cols + col]

    plt.figure(figsize=(12,12))
    plt.imshow(grid)
    plt.title(title)
    plt.show()

# initialize random inputs
test_vector_for_generation = tf.random.normal(shape=[64, LATENT_DIM])

# get predictions from the decoder model
predictions= decoder.predict(test_vector_for_generation)

# plot the predictions
plot_images(8,8,predictions,'Generated Images')


# ## Save the Model
# 
# Once you're satisfied with the results, you can save your model and upload it to the grader in Coursera.

# In[ ]:


# vae.save("anime.h5")


# In[ ]:


# You can use this cell as a shortcut for downloading your model
# from google.colab import files
# files.download("anime.h5")

Just wanted to add that I have tried most combinations of TensorFlow 2.16.1, 2.15, and 2.14, with Python 3.9, 3.10, and 3.11, and with conda or pip virtual environments, all with and without tensorflow-metal. The results are always the same: every moderately complex model I've tried that uses convolution and batch normalization fails completely with GPU support, and succeeds under Colab or CPU-only Mac configs.

The TensorFlow install page does suggest this, though it also links to install procedures for the defective tensorflow-metal package.

macOS 10.12.6 (Sierra) or later (no GPU support)

https://www.tensorflow.org/install

Hi, I use an Apple MacBook M3 and have the same problem. I solved it by using Anaconda to create an environment with Python 3.8.18; then I could install tensorflow-macos and tensorflow-metal. With the latest Python versions (> 3.8), TensorFlow didn't work for me...

(AppleTensorflow) antoniothomacelli@mbp-de-antonio Aula001 % python --version
Python 3.8.18
(AppleTensorflow) antoniothomacelli@mbp-de-antonio Aula001 % pip list | grep tensorflow
tensorflow              2.16.1
tensorflow-macos        2.16.1
(AppleTensorflow) antoniothomacelli@mbp-de-antonio Aula001 % 

Try a different activation function than relu (e.g. elu, tanh)
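
If it helps, here is roughly what that change looks like in the encoder above; only the activation argument changes and everything else stays as in the assignment (a sketch of the suggestion, not a verified fix):

x = tf.keras.layers.Conv2D(filters=LAYER_1, kernel_size=3, strides=2,
                           padding='same', activation='elu',  # was 'relu'
                           name='encode_conv1')(inputs)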

I am still experiencing these issues with relatively simple models catastrophically failing to converge on Apple Silicon machines when using the GPU. When running on the CPU, I get better performance and good convergence. The only change is switching from GPU to CPU.

I am using tensorflow-metal v1.1.0, tensorflow v2.16.2, tensorflow-macos v2.16.2. If anyone has found a solution, or even a temporary work-around that yields good performance on the GPU, please advise.
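
For reference, the only temporary work-around I know of (besides uninstalling tensorflow-metal) is to hide the GPU from TensorFlow at the top of the script, which forces everything onto the CPU; a sketch, with the obvious caveat that it gives CPU performance rather than fixing the GPU path:

import tensorflow as tf

# Hide the Metal GPU before any ops are created; all work then runs on the CPU.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())

Alternatively, a `with tf.device('/CPU:0'):` block can be used to pin just the problematic parts of the model to the CPU.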
