M3 Max + keras-ocr + tensorflow-metal returns incorrect results

Running the keras-ocr sample (Python) on an M3 Max returns incorrect results when tensorflow-metal is installed. Code example: https://keras-ocr.readthedocs.io/en/latest/examples/using_pretrained_models.html

Note: the image at https://upload.wikimedia.org/wikipedia/commons/e/e8/FseeG2QeLXo.jpg could not be found, so I commented that line out of the sample.

Without tensorflow-metal (Correct results):

['toodstande', 's', 'somme', 'srny', 'squadron', 'ds', 'quentn', 'snhnen', 'bnpnone', 'sasne', 'taing', 'yeoms', 'sry', 'the', 'royal', 'wessex', 'yeomanry', 'regiment', 'yeomanry', 'wests', 'south', 'the', 'now', 'recruiting', 'arm', 'blon', 'wxybsqipsacomodn', 'email', '438300', '01722']
['banana', 'union', 'no', 'no', 'software', 'patents']

With tensorflow-metal (Incorrect results):

['sddoooo', '', 'eamnooss', 'xynrr', 'daanues', 'idd', 'innee', 'iiiinus', 'tnounppanab', 'inla', 'ppnt', 'mmnooexyy', 'yyr', 'ehhtt', 'laayvyoorr', 'xeseww', 'rinamoevy', 'tnemiger', 'yrnamoey', 'sstseww', 'htuwlos', 'fefeahit', 'wwoniia', 'turceedrr', 'ymmrira', 'atate', 'prasbyxwr', 'liamme', '00338803144', '22277100']
['annnaab', 'noolinnu', 'oon', 'oon', 'wttffoos', 'sttneettaap']
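
To make the difference between the two runs easier to see, here is a minimal sketch that compares the word lists position by position. The helper name `compare_runs` is made up for this post, and it assumes both runs return the detected words in the same order (the lists above have the same length in each case). It also flags words that come back exactly reversed, because several of the Metal outputs look like the CPU outputs read backwards with some characters repeated. That may be a hint about where things go wrong, but it is not a diagnosis.

```python
def compare_runs(cpu_words, metal_words):
    """Compare two keras-ocr word lists position by position.

    Returns (mismatched, exact_reversals): the indices where the words
    differ, and the subset of those where the Metal word is exactly the
    CPU word reversed.
    """
    mismatched = []
    exact_reversals = []
    for i, (cpu, metal) in enumerate(zip(cpu_words, metal_words)):
        if cpu != metal:
            mismatched.append(i)
            if metal == cpu[::-1]:
                exact_reversals.append(i)
    return mismatched, exact_reversals


# Second image, copied from the outputs above.
cpu_image2 = ['banana', 'union', 'no', 'no', 'software', 'patents']
metal_image2 = ['annnaab', 'noolinnu', 'oon', 'oon', 'wttffoos', 'sttneettaap']

# Positions 15 to 18 (0-based) of the first image, copied from the outputs above.
cpu_image1_part = ['wessex', 'yeomanry', 'regiment', 'yeomanry']
metal_image1_part = ['xeseww', 'rinamoevy', 'tnemiger', 'yrnamoey']

print(compare_runs(cpu_image2, metal_image2))
# ([0, 1, 2, 3, 4, 5], [])
print(compare_runs(cpu_image1_part, metal_image1_part))
# ([0, 1, 2, 3], [2, 3])
```

Every word differs between the two runs, and 'tnemiger' and 'yrnamoey' are exact reversals of 'regiment' and 'yeomanry'.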

Logs: With tensorflow-metal (Incorrect results)

(.venv) <REDACTED> % pip3 install -U tensorflow-metal
Collecting tensorflow-metal
  Using cached tensorflow_metal-1.1.0-cp311-cp311-macosx_12_0_arm64.whl.metadata (1.2 kB)
Requirement already satisfied: wheel~=0.35 in ./.venv/lib/python3.11/site-packages (from tensorflow-metal) (0.42.0)
Requirement already satisfied: six>=1.15.0 in ./.venv/lib/python3.11/site-packages (from tensorflow-metal) (1.16.0)
Using cached tensorflow_metal-1.1.0-cp311-cp311-macosx_12_0_arm64.whl (1.4 MB)
Installing collected packages: tensorflow-metal
Successfully installed tensorflow-metal-1.1.0
(.venv) <REDACTED> % python3 keras-ocr-bug.py        
Looking for <REDACTED>/.keras-ocr/craft_mlt_25k.h5
2023-12-16 22:05:05.452493: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M3 Max
2023-12-16 22:05:05.452532: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 64.00 GB
2023-12-16 22:05:05.452545: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 24.00 GB
2023-12-16 22:05:05.452591: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-12-16 22:05:05.452609: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
WARNING:tensorflow:From <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow/python/util/dispatch.py:1260: resize_bilinear (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.image.resize(...method=ResizeMethod.BILINEAR...)` instead.
Looking for <REDACTED>/.keras-ocr/crnn_kurapan.h5
2023-12-16 22:05:07.526354: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
1/1 [==============================] - 1s 855ms/step
2/2 [==============================] - 1s 140ms/step
['sddoooo', '', 'eamnooss', 'xynrr', 'daanues', 'idd', 'innee', 'iiiinus', 'tnounppanab', 'inla', 'ppnt', 'mmnooexyy', 'yyr', 'ehhtt', 'laayvyoorr', 'xeseww', 'rinamoevy', 'tnemiger', 'yrnamoey', 'sstseww', 'htuwlos', 'fefeahit', 'wwoniia', 'turceedrr', 'ymmrira', 'atate', 'prasbyxwr', 'liamme', '00338803144', '22277100']
['annnaab', 'noolinnu', 'oon', 'oon', 'wttffoos', 'sttneettaap']

Logs: Without tensorflow-metal (Correct results)

(.venv) <REDACTED> % pip3 uninstall tensorflow-metal
Found existing installation: tensorflow-metal 1.1.0
Uninstalling tensorflow-metal-1.1.0:
  Would remove:
    <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow-plugins/*
    <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow_metal-1.1.0.dist-info/*
Proceed (Y/n)? Y
  Successfully uninstalled tensorflow-metal-1.1.0
(.venv) <REDACTED> % python3 keras-ocr-bug.py       
Looking for <REDACTED>/.keras-ocr/craft_mlt_25k.h5
WARNING:tensorflow:From <REDACTED>/.venv/lib/python3.11/site-packages/tensorflow/python/util/dispatch.py:1260: resize_bilinear (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.image.resize(...method=ResizeMethod.BILINEAR...)` instead.
Looking for <REDACTED>/.keras-ocr/crnn_kurapan.h5
1/1 [==============================] - 7s 7s/step
2/2 [==============================] - 1s 71ms/step
['toodstande', 's', 'somme', 'srny', 'squadron', 'ds', 'quentn', 'snhnen', 'bnpnone', 'sasne', 'taing', 'yeoms', 'sry', 'the', 'royal', 'wessex', 'yeomanry', 'regiment', 'yeomanry', 'wests', 'south', 'the', 'now', 'recruiting', 'arm', 'blon', 'wxybsqipsacomodn', 'email', '438300', '01722']
['banana', 'union', 'no', 'no', 'software', 'patents']
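
Uninstalling the plugin, as in the log above, is one way to get a CPU baseline. A lighter option that may work is to leave tensorflow-metal installed and hide the GPU from TensorFlow before any model is built. The sketch below uses `tf.config.set_visible_devices`, which is part of the public TensorFlow API. The wrapper name `run_on_cpu_only` is made up for this post, and I have not confirmed that this gives the same output as uninstalling the plugin on an M3 Max.

```python
def run_on_cpu_only():
    """Hide all GPUs (including the Metal PluggableDevice) from TensorFlow.

    Must be called before TensorFlow initializes its devices, i.e. before
    any model is built or any tensor op runs; otherwise TensorFlow raises
    a RuntimeError.
    """
    import tensorflow as tf  # imported here so the call order is explicit

    # An empty list means "no GPU is visible"; ops then fall back to the CPU.
    tf.config.set_visible_devices([], "GPU")

    # Return what TensorFlow now reports as visible GPUs for a quick check.
    return tf.config.get_visible_devices("GPU")
```

Calling `run_on_cpu_only()` as the first line of keras-ocr-bug.py, before `keras_ocr.pipeline.Pipeline()` is created, should make it possible to switch between the two configurations without reinstalling anything.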

Replies

I have similar issues on my M3 Max. Using Keras with tensorflow-metal produces incorrect results, while CPU-only TensorFlow works as expected. I ran the test by following the example described here: https://keras.io/examples/vision/mnist_convnet/

The training performance with tensorflow-metal was very poor, whereas TensorFlow on the CPU yielded excellent training performance that matched the expected results shown at that link.

I see the same results on my M3 Max. With tensorflow-metal installed, my results are wrong for some models. Training is fast, but the model is often useless. Without tensorflow-metal, training takes hours instead of minutes because there is no GPU acceleration, but the results are right. I tested with the following model:

https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/tree/main/notebooks/03_vae/03_vae_faces

Some other models work, for example this one: https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/blob/main/notebooks/03_vae/02_vae_fashion/vae_fashion.ipynb