CRNN training slower on GPU than on CPU

I was training the CRNN model described here (https://keras.io/examples/vision/handwriting_recognition/) in TensorFlow 2.8, with and without tensorflow-metal 0.4. The model has 424,081 trainable parameters. Regardless of batch size, training on the GPU is always much slower than on the CPU, as shown in the graph below. Surprisingly, GPU training gets even slower as the batch size increases.

Please let me know how I can make GPU training much faster than CPU training.

System: M1 Max 64GB, macOS 12.2.1.
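
For anyone who wants to reproduce the CPU vs. GPU comparison across batch sizes, here is a minimal sketch of a timing harness. It is not the script behind the graph: the helper names `time_epochs` and `make_keras_runner` are hypothetical, and the small dense model is only a stand-in for the CRNN (an assumption to keep the example short). The device strings "/CPU:0" and "/GPU:0" assume tensorflow-metal is installed and exposes the GPU under that name.

```python
import time


def time_epochs(run_epoch, batch_sizes, repeats=3):
    """Return {batch_size: best wall-clock seconds} for run_epoch(batch_size)."""
    results = {}
    for bs in batch_sizes:
        run_epoch(bs)  # warm-up call, not timed (absorbs one-time setup cost)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_epoch(bs)
            timings.append(time.perf_counter() - start)
        results[bs] = min(timings)  # best of `repeats` reduces timing noise
    return results


def make_keras_runner(device, n_samples=2048):
    """Build run_epoch(batch_size) that trains a stand-in model on `device`.

    TensorFlow is imported lazily so time_epochs can be used without it.
    The model is a hypothetical placeholder, NOT the CRNN from the post.
    """
    import numpy as np
    import tensorflow as tf

    # Random data, only meant to exercise the training loop.
    x = np.random.rand(n_samples, 32).astype("float32")
    y = np.random.rand(n_samples, 1).astype("float32")

    # Create the model under the device scope so its variables live there.
    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    def run_epoch(batch_size):
        # Run one epoch of training pinned to the requested device.
        with tf.device(device):
            model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)

    return run_epoch
```

Calling `time_epochs(make_keras_runner("/CPU:0"), [16, 64, 256])` and the same with "/GPU:0" gives one timing per batch size for each device. To compare the real workload, the stand-in model and data would need to be swapped for the CRNN and its dataset.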

P.S. tensorflow-metal versions prior to 0.4 showed differences in the loss trajectory between CPU and Metal. I am happy to report that this has been resolved; see the graph below.
