Is there any way to make the model run on GPU / Neural Engine?

Hi folks, I'm working on converting a GPT2 model to Core ML with KV caching enabled.

I have a GPT2 model running on the GPU with a static input shape.

It seems that once I enable flexible shapes (i.e., either range shapes or enumerated shapes), the model runs on the CPU according to the performance report. I can see new operators being added (get_shape and general_slice), and they are not supported by the GPU / ANE.
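For context, here is roughly the kind of conversion setup I mean, a minimal sketch with coremltools (traced_model and the shape values are just placeholders):

```python
import numpy as np
import coremltools as ct

# traced_model is a placeholder for a torch.jit.trace'd GPT2 model.
# Enumerated shapes limit inputs to a fixed set of sequence lengths instead
# of a continuous range, but either flexible-shape option can introduce
# extra shape-handling ops into the converted graph.
seq_shapes = ct.EnumeratedShapes(
    shapes=[[1, 64], [1, 128], [1, 256]],  # illustrative batch/sequence sizes
    default=[1, 128],
)

mlmodel = ct.convert(
    traced_model,  # placeholder: the traced PyTorch GPT2 model
    inputs=[ct.TensorType(name="input_ids", shape=seq_shapes, dtype=np.int32)],
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("gpt2_kvcache.mlpackage")
```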

I'm wondering if there's any way to get around this so the model runs on the GPU / ANE. How does the machine decide whether to run the model on the GPU / Neural Engine?

Thanks!

Replies

You can customize where your model runs with MLModelConfiguration.computeUnits. You will see that you can target the CPU and ANE together, or several other combinations of compute units.
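For example, the same options are exposed on the Python side when loading a converted model with coremltools; this sketch restricts prediction to the CPU and Neural Engine (the model path is just a placeholder):

```python
import coremltools as ct

# ct.ComputeUnit mirrors MLModelConfiguration.computeUnits in Swift:
# ALL, CPU_ONLY, CPU_AND_GPU, CPU_AND_NE.
mlmodel = ct.models.MLModel(
    "gpt2_kvcache.mlpackage",  # placeholder path to the converted model
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)
```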

Crucially, you cannot ask to run only on the ANE: in order to support all of a model's operations, some may need to run on the CPU, so that promise could not be granted. The performance report shows you what is supported on the various compute units. The system tries to be smart, so it may run on the CPU even if many of the model's instructions could run on the ANE, if it thinks that gives better power usage or performance characteristics.