Hi, I have a new experiment that may point to a performance issue when using a Dense layer on the M1 Max (this is a follow-up to my previous question).
import tensorflow as tf
from tensorflow.keras import Model, layers
import numpy as np
from tqdm import tqdm

class NeuralNet(Model):
    # Set layers.
    def __init__(self):
        super(NeuralNet, self).__init__()
        # First fully-connected hidden layer.
        self.fc1 = layers.Dense(8192 * 8 * 2, activation=tf.nn.relu)

    # Set forward pass.
    def call(self, x):
        return self.fc1(x)

# Build neural network model.
neural_net = NeuralNet()

batch_size = 1024
x = np.random.rand(batch_size, 256)

for _ in tqdm(range(10000000)):
    neural_net(x)
The above code runs at 17.06 it/s on the M1 Max and 168.04 it/s on a Zotac RTX 3090. GPU utilisation is 100% on both the M1 Max and the RTX 3090. Power draw is 44.5 W on the M1 Max and 340 W on the RTX 3090. The M1 Max is much slower than the RTX 3090 (about 10% of its performance), which shouldn't be the case; I would expect roughly 30% of an RTX 3090.
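For completeness, a quick sanity check along these lines (not part of the benchmark itself, and the exact device strings will differ between the tensorflow-metal and CUDA builds) can confirm that TensorFlow sees the GPU on each machine:

import tensorflow as tf

# Print the TensorFlow version and the GPUs TensorFlow can see; on the M1 Max
# this should list the Metal device from the tensorflow-metal plugin, and on
# the other machine the RTX 3090.
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))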
Here is the detailed performance comparison of the RTX 3090 and the M1 Max for different batch sizes, which shows the RTX 3090 is roughly 10 times faster than the M1 Max, and the gap widens at larger batch sizes:
Notice that the batch sizes in the experiments above are already large. Please reproduce the experiments above and look into the problem. Thanks.
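In case it is useful when investigating, here is a variant of the same benchmark that pre-casts the input to float32 and wraps the forward pass in a tf.function, which could help rule out eager-execution and dtype-conversion overhead as the cause. The float32 cast and the tf.function wrapper are my additions here and were not used for the numbers reported above:

import tensorflow as tf
from tensorflow.keras import Model, layers
import numpy as np
from tqdm import tqdm

class NeuralNet(Model):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.fc1 = layers.Dense(8192 * 8 * 2, activation=tf.nn.relu)

    def call(self, x):
        return self.fc1(x)

neural_net = NeuralNet()

batch_size = 1024
# Cast to float32 once up front so the loop does not repeatedly convert float64 numpy data.
x = tf.constant(np.random.rand(batch_size, 256), dtype=tf.float32)

# Compile the forward pass into a graph to remove per-iteration eager overhead.
@tf.function
def forward(inputs):
    return neural_net(inputs)

forward(x)  # first call traces and compiles the function

for _ in tqdm(range(10000000)):
    forward(x)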