No Speedup with CoreML SDPA

I am testing the new scaled dot product attention (SDPA) Core ML op on macOS 15 beta 1. Based on the session video, I was expecting a speedup when running on the GPU, but I see roughly the same performance as the same model on macOS 14.

I ran tests with two models:

  • one that simply repeats y = sdpa(y, k, v) 50 times
  • gpt2 124M converted from nanoGPT (the only change is not returning loss from the forward method)
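For anyone following along, here is a rough pure-Python reference of what the first test model computes, assuming the standard definition softmax(q kᵀ / sqrt(d)) v with no mask. The names `sdpa` and `repeated_sdpa` are only illustrative; this is not the Core ML op or my actual conversion code, just a sketch of the math being repeated.

```python
import math


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q, k, v):
    """Reference attention on nested lists: q is (n, d), k is (m, d), v is (m, d_v)."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out


def repeated_sdpa(y, k, v, repeats=50):
    # Mirrors the test model: y = sdpa(y, k, v), applied `repeats` times.
    # Requires d_v == d so the output can be fed back in as the query.
    for _ in range(repeats):
        y = sdpa(y, k, v)
    return y
```

Each output row is a convex combination of the rows of `v`, so the values stay bounded no matter how many times the op is repeated.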

I converted both models using coremltools 8.0b1, once with a minimum deployment target of macOS 14 and once with macOS 15. In Xcode, I can see that the new op was used for the macOS 15 target. Running on macOS 15, the models converted for either target take the same time, and that time matches the runtime on macOS 14.
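In case it helps compare numbers, a timing loop along these lines is one way to measure this kind of thing. It is a minimal sketch: `benchmark` and `predict` are hypothetical names, the warmup and iteration counts are arbitrary, and `predict` stands in for any zero-argument callable that runs one prediction.

```python
import statistics
import time


def benchmark(predict, warmup=5, iterations=50):
    """Time a zero-argument callable and return latency statistics in milliseconds."""
    # Warmup runs are discarded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        predict()

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        predict()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }
```

With coremltools, `predict` could be something like `lambda: mlmodel.predict(inputs)` for a loaded model and a prepared input dictionary. Reporting the median rather than the mean keeps a single slow run from dominating the comparison.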

Should I be seeing performance improvements?

I am still new to model training, but based on WWDC 2024 I was also led to understand that converting and training models on Apple silicon would be much faster and more viable than before. Because of that, I traded in my old Intel MacBook Pro for a new M3 Pro MacBook Pro.

Yona Havocainen's session "Train your machine learning and AI models on Apple GPUs" this year shows training in Jupyter notebooks as fast and model query execution as instantaneous.

I asked Apple Support which versions of his packages, macOS, and model he was using, but the only response I got was to look through the forums and videos, which I had already done.

Hello @smpanaro, model performance can depend on many variables. Could you file a report in Feedback Assistant with the models attached?
