With the standard TensorFlow installation, training (fit) works as expected. The output is:
2021-09-03 15:36:26.802170: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/20
1424/1424 [==============================] - 656s 460ms/step - loss: 0.0173 - binary_accuracy: 0.9913 - accuracy: 0.6272 - val_loss: 0.0198 - val_binary_accuracy: 0.9933 - val_accuracy: 0.6535
Epoch 2/20
1424/1424 [==============================] - 660s 464ms/step - loss: 0.0173 - binary_accuracy: 0.9913 - accuracy: 0.6250 - val_loss: 0.0218 - val_binary_accuracy: 0.9932 - val_accuracy: 0.6450
Epoch 3/20
1424/1424 [==============================] - 637s 447ms/step - loss: 0.0174 - binary_accuracy: 0.9913 - accuracy: 0.6224 - val_loss: 0.0204 - val_binary_accuracy: 0.9932 - val_accuracy: 0.6451
Epoch 4/20
1424/1424 [==============================] - 633s 444ms/step - loss: 0.0173 - binary_accuracy: 0.9913 - accuracy: 0.6244 - val_loss: 0.0237 - val_binary_accuracy: 0.9931 - val_accuracy: 0.6211
Epoch 5/20
1424/1424 [==============================] - 616s 433ms/step - loss: 0.0173 - binary_accuracy: 0.9913 - accuracy: 0.6243 - val_loss: 0.0198 - val_binary_accuracy: 0.9934 - val_accuracy: 0.6487
When I train the same model on the same training set with tensorflow-macos + tensorflow-metal installed, I get:
2021-09-03 21:59:18.973547: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/20
2021-09-03 21:59:20.298178: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
175/175 [==============================] - ETA: 0s - loss: nan - binary_accuracy: 0.9814 - accuracy: 1.7905e-04
2021-09-03 21:59:50.110699: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
175/175 [==============================] - 34s 162ms/step - loss: nan - binary_accuracy: 0.9814 - accuracy: 1.7905e-04 - val_loss: nan - val_binary_accuracy: 0.9910 - val_accuracy: 0.0000e+00
Epoch 2/20
175/175 [==============================] - 28s 158ms/step - loss: nan - binary_accuracy: 0.9867 - accuracy: 0.0000e+00 - val_loss: nan - val_binary_accuracy: 0.9910 - val_accuracy: 0.0000e+00
Epoch 3/20
175/175 [==============================] - 28s 158ms/step - loss: nan - binary_accuracy: 0.9867 - accuracy: 0.0000e+00 - val_loss: nan - val_binary_accuracy: 0.9910 - val_accuracy: 0.0000e+00
Epoch 4/20
175/175 [==============================] - 28s 158ms/step - loss: nan - binary_accuracy: 0.9867 - accuracy: 0.0000e+00 - val_loss: nan - val_binary_accuracy: 0.9910 - val_accuracy: 0.0000e+00
Epoch 5/20
175/175 [==============================] - 28s 161ms/step - loss: nan - binary_accuracy: 0.9867 - accuracy: 0.0000e+00 - val_loss: nan - val_binary_accuracy: 0.9910 - val_accuracy: 0.0000e+00
The problem is that both loss and val_loss are "nan" from the first epoch onward.
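To compare the two installations programmatically rather than by reading the console output, something like the sketch below might help. It scans a dictionary of per-epoch metrics, shaped like the `history` attribute of the object returned by Keras `fit`, and reports the first epoch at which each metric becomes NaN or infinite. The helper name `first_nonfinite_epoch` and the example values are made up for illustration; this only locates the problem and does not explain its cause.

```python
import math


def first_nonfinite_epoch(history):
    """Map each metric name to the first 1-based epoch whose value is NaN or inf.

    `history` is assumed to be a dict of metric name -> list of per-epoch floats.
    Metrics that stay finite for every epoch are omitted from the result.
    """
    result = {}
    for name, values in history.items():
        for epoch, value in enumerate(values, start=1):
            if not math.isfinite(value):
                result[name] = epoch
                break
    return result


# Hypothetical two-epoch history, shaped like the failing run above.
example = {
    "loss": [float("nan"), float("nan")],
    "binary_accuracy": [0.9814, 0.9867],
    "val_loss": [float("nan"), float("nan")],
}
print(first_nonfinite_epoch(example))  # {'loss': 1, 'val_loss': 1}
```

With a real run this would be called as `first_nonfinite_epoch(model.fit(...).history)`, where `model` is the model being trained.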