Core ML inference results on an A14 device differ a lot between GPU and ANE

I have a model containing Convolution layers, LeakyReLU activations, and one Concat layer. I converted it to the .mlmodel format for inference on an iPhone 12. When the model runs on the GPU, the result is slightly different from the CPU result, but still acceptable. When it runs on the ANE (Apple Neural Engine), the result differs a lot from both the CPU and GPU results.

I have tried changing LeakyReLU to ReLU, but the problem still exists. The only difference between my CPU, GPU and ANE runs is the compute-units setting: MLComputeUnitsCPUOnly, MLComputeUnitsCPUAndGPU and MLComputeUnitsAll, respectively.
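One thing I suspect (not confirmed for my specific model) is that the ANE executes in float16, while the CPU path uses float32, so rounding error can accumulate through the Convolution layers. The toy sketch below, using Python's half-precision `struct` format, shows how an fp16 accumulation of a convolution-style dot product drifts from the float32 result; the numbers are made up for illustration and are not from my model.

```python
import struct

def to_fp16(x: float) -> float:
    # Round a Python float to IEEE 754 half precision and back,
    # mimicking the reduced precision the ANE is believed to use.
    return struct.unpack('e', struct.pack('e', x))[0]

# Toy 1x5 "convolution" accumulation with mixed-magnitude values,
# the kind of sum where fp16 rounding error becomes visible.
inputs  = [0.1234, 5.678, -3.1415, 0.0007, 2.5]
weights = [1.001, -0.499, 0.333, 100.0, -0.25]

# Full-precision path (stand-in for CPU float32/float64 arithmetic).
acc32 = sum(i * w for i, w in zip(inputs, weights))

# Half-precision path (stand-in for ANE float16 arithmetic):
# every operand, product, and running sum is rounded to fp16.
acc16 = 0.0
for i, w in zip(inputs, weights):
    acc16 = to_fp16(acc16 + to_fp16(to_fp16(i) * to_fp16(w)))

print(acc32, acc16, abs(acc32 - acc16))
```

Even for a single 5-element accumulation the two paths disagree in the third decimal place; across many stacked layers that drift compounds, which would be consistent with the large ANE/CPU gap I am seeing.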