Differing outputs on CPU, GPU and ANE for the same MLModel

I am using a converted custom PyTorch model on device for real-time video. The model converts successfully with both coremltools 4.1 and 5.0b3 (both versions exhibit the same issues). When I run the model from a Python environment using coremltools, or from a macOS app, with the same input image and supplementary data, the output is identical, correct, and matches the output of the pure PyTorch model.

However, when running it on device, the model's output is incorrect. On an iPhone XR, using the .all or .cpuAndGPU value of computeUnits, the output is simply a white square, with no error or warning message: the output, which we normally expect to be in the range [0, 255], has a value of 255 at every location. Running with .cpuOnly on the iPhone XR, however, produces the correct output.

Furthermore, when simulating a device on a macOS machine, the output is correct regardless of the computeUnits value.

On an iPhone 12 the situation gets even more confusing. With .cpuAndGPU we get the pure-white incorrect output, and with .cpuOnly we get the correct output, but with .all we get a different incorrect output: an image with wildly wrong colors but a form vaguely similar to the one we expect. In addition, with the .all setting we get the following error:

2021-09-01 15:07:16.595048-0500 sensoriumViewer[33717:10399075] [espresso] [Espresso::ANERuntimeEngine::__forward_segment 3] evaluate[RealTime]WithModel returned 0; code=5 err=Error Domain=com.apple.appleneuralengine Code=5 "processRequest:qos:qIndex:modelStringID:options:error:: 0xd: Program Inference overflow" UserInfo={NSLocalizedDescription=processRequest:qos:qIndex:modelStringID:options:error:: 0xd: Program Inference overflow}
2021-09-01 15:07:16.595103-0500 sensoriumViewer[33717:10399075] [espresso] [Espresso::overflow_error] /private/var/containers/Bundle/Application/16433631-57DE-488C-8772-D9560C3D8B48/sensoriumViewer.app/SensoriumMLTest16V1.mlmodelc/model.espresso.net:3

This makes it pretty clear that some sort of integer or floating-point overflow is occurring. What I believe is happening: regardless of the phone model, running on the GPU causes the overflow to saturate, giving us values of 255 for all pixels; on the iPhone 12, .all passes the work to the ANE (Apple Neural Engine), which wraps on overflow, giving unpredictable colors but a roughly correct shape; .all on the iPhone XR just uses the GPU, because for some reason this model won't run on the XR's ANE; and .cpuOnly does not overflow and gives us the correct result.
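For reference, IEEE 754 half precision (the format GPU/ANE execution favors) has a largest finite value of 65504, so any intermediate value beyond that overflows. A minimal stdlib sketch, using Python's struct module and its half-precision 'e' format, illustrates the range limit:

```python
import math
import struct

def fp16(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    try:
        # 'e' packs/unpacks a 16-bit half-precision float
        return struct.unpack('e', struct.pack('e', x))[0]
    except OverflowError:
        # struct refuses values outside the half-precision range
        return math.inf if x > 0 else -math.inf

print(fp16(65504.0))  # 65504.0 -- the largest finite half-precision value
print(fp16(70000.0))  # inf     -- anything larger overflows
```

Whether an overflow then saturates, wraps, or propagates as inf/NaN is up to the backend, which would match the different wrong outputs seen on the GPU and ANE.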

Why does the XR not use its ANE for this model? Can the ANE and GPU simply not handle 32-bit floats? We are quantizing the model to 16 bits using coremltools; why are we still overflowing? The documentation for the new ML Program format seems promising; will that solve this issue? Is there any documentation on the supported operations and numeric precision for PyTorch-converted models?
Why are there no errors or warnings when passing this through the GPU?

Any help or insight would be greatly appreciated as the documentation I've seen surrounding the ANE is not very comprehensive.

Could you try creating an .mlpackage with the new Core ML Tools instead? That format leverages MPSGraph, and the GPU should definitely produce the correct result with the use of MPS.
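A minimal conversion sketch along those lines, assuming coremltools >= 5.0; traced_model, the input name, and the shape below are placeholders for your own traced PyTorch model:

```python
import coremltools as ct

# Convert to the ML Program format, which is saved as an .mlpackage.
# compute_precision=ct.precision.FLOAT32 requests float32 execution
# instead of the ML Program default of float16, which should sidestep
# a float16 range overflow (at some cost in speed/memory).
mlmodel = ct.convert(
    traced_model,                      # your torch.jit.trace'd model
    inputs=[ct.TensorType(name="image", shape=(1, 3, 256, 256))],
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT32,
)
mlmodel.save("Model.mlpackage")
```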

Hi, the floating-point precisions available on each backend are described in more detail here: https://coremltools.readme.io/docs/typed-execution

Quantizing the model to 16 bits with the tools described in https://coremltools.readme.io/docs/quantization only quantizes the weights, not the activations.
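That distinction matters here: even when every weight fits comfortably in float16, an accumulated activation can still exceed float16's largest finite value of 65504. A toy stdlib sketch (the weights and inputs below are made up for illustration):

```python
import math
import struct

def fp16(x):
    """Round-trip a float through IEEE 754 half precision."""
    try:
        return struct.unpack('e', struct.pack('e', x))[0]
    except OverflowError:
        return math.inf if x > 0 else -math.inf

# Weights quantized to fp16: small and representable, no problem here.
w = [fp16(v) for v in (0.31, 1.27, -0.56, 2.05)]
x = [100.0, 30000.0, -500.0, 20000.0]

# The dot product is fine at higher precision...
acc = sum(wi * xi for wi, xi in zip(w, x))   # roughly 7.9e4

# ...but the *activation* overflows when stored in float16.
print(fp16(acc))  # inf
```

So a 16-bit-weight model can convert cleanly and still overflow at runtime if the backend executes the activations in float16.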

The ANE's capabilities differ across hardware and software generations, which could explain the differences between the iPhone XR and iPhone 12. That said, it could also be a bug, so I'd recommend filing a bug report with the model and your observations so that it can be reproduced and investigated more closely.
