FP16 underperforming with PyTorch MPS on M4 compared to M3

Machine Learning & AI Core ML Metal Performance Shaders ML Compute

Created Mar ’25

In a PyTorch MPS FP16 benchmark of 4096x4096 matrix multiplications, I got 3203.23 GFLOPS on the M3 MacBook Pro but only 2833.24 GFLOPS on the M4 MacBook Air. Wasn't FP16 performance supposed to be twice as high on the M4 as on the M3, even with thermal throttling on the MacBook Air? What went wrong?
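
For anyone who wants to reproduce this, here is a minimal sketch of how such a measurement might be taken. It is not the exact script behind the numbers above. The function names, the warm-up count, and the iteration count are my own assumptions, and it counts one n x n matmul as 2·n³ floating-point operations. It synchronizes the MPS queue before reading the clock, because MPS kernels are launched asynchronously and the timing would otherwise cover only the launches.

```python
import time


def matmul_flops(n: int) -> int:
    """FLOPs for one (n x n) @ (n x n) matmul: n^3 multiplies plus n^3 adds."""
    return 2 * n ** 3


def gflops(n: int, seconds_per_matmul: float) -> float:
    """Convert the time of one n x n matmul into GFLOPS."""
    return matmul_flops(n) / seconds_per_matmul / 1e9


def bench_mps_fp16(n: int = 4096, warmup: int = 10, iters: int = 50) -> float:
    """Time FP16 matmuls on the MPS device (hypothetical helper, assumed settings)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if not torch.backends.mps.is_available():
        raise RuntimeError("MPS backend is not available on this machine")

    a = torch.randn(n, n, device="mps").half()
    b = torch.randn(n, n, device="mps").half()

    # Warm-up runs so one-time kernel setup is not included in the timing.
    for _ in range(warmup):
        a @ b
    torch.mps.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.mps.synchronize()  # wait for all queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start

    return gflops(n, elapsed / iters)
```

Calling `bench_mps_fp16()` on each machine returns a single GFLOPS figure. Under the 2·n³ assumption, the figures above correspond to about 42.9 ms per matmul on the M3 and about 48.5 ms on the M4. On a fanless machine, running more iterations and logging the figure over time may show whether the result drops as the chip heats up.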