This post is from the WWDC26 Core AI Q&A.
When we specialize with preferredComputeUnitKind: .neuralEngine, the resolved options' allowedComputeUnitKinds return all three units, and sometimes a function we intend for ANE ends up on the GPU. We can't find any API that reports where a function actually ran, but system resource utilization shows a GPU spike.
Is there a supported way to confirm the actual compute unit at runtime? And how does your prioritization work if we prefer running on the ANE rather than the GPU? Or, can we disallow certain compute units?
How does this compute unit selection map with someone coming from CoreML where the desired compute units were honored?