Hi folks, I'm working on converting a GPT-2 model to Core ML with KV caching enabled.
I have the GPT-2 model running on GPU with a static input shape.
It seems that once I enable a flexible shape (either a range shape or enumerated shapes), the model runs on CPU according to the performance report. I can see new operators being added (`get_shape` and `general_slice`), and these are not supported on GPU / ANE.
Is there any way to work around this so the model runs on GPU / ANE? And how does Core ML decide whether to dispatch the model to the GPU / Neural Engine?
Thanks!