Compute unit specification for function runs

When we specialize with preferredComputeUnitKind: .neuralEngine, the resolved options' allowedComputeUnitKinds still returns all three units, and sometimes a function we intend for the ANE ends up on the GPU. We can't find any API that reports where a function actually ran, but system resource utilization shows a GPU spike.

Is there a supported way to confirm the actual compute unit at runtime? And how does your prioritization work if we prefer running on the ANE rather than the GPU? Or, can we disallow certain compute units?

How does this compute unit selection map for someone coming from CoreML, where the requested compute units were honored?

Answered by Engineer in 891736022

Specifying preferredComputeUnitKind: .neuralEngine does not disallow the model from running on other compute units, which is why the allowed compute units still reflect all of them. When scheduling is done, however, it will use the neural engine for all operations that can run there, with the remaining operations falling back to the GPU or CPU.
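To make the "prefer, then fall back" behavior concrete, here is a toy model in plain Swift. It is only a sketch: ComputeUnit, ModelOp, and place(_:preferred:fallbackOrder:) are hypothetical names for illustration and are not CoreAI API, the op names are made up, and the GPU-before-CPU fallback order is an assumption, since the answer only says operations fall back to "GPU or CPU".

```swift
// Toy model only: none of these types are part of CoreAI.
enum ComputeUnit {
    case cpu, gpu, neuralEngine
}

// A hypothetical operation and the units it is able to run on.
struct ModelOp {
    let name: String
    let supportedUnits: Set<ComputeUnit>
}

/// Places each op on the preferred unit when the op supports it,
/// otherwise on the first supported unit in `fallbackOrder`.
/// The default fallback order (GPU, then CPU) is an assumption.
func place(_ ops: [ModelOp],
           preferred: ComputeUnit,
           fallbackOrder: [ComputeUnit] = [.gpu, .cpu]) -> [String: ComputeUnit] {
    var placement: [String: ComputeUnit] = [:]
    for op in ops {
        if op.supportedUnits.contains(preferred) {
            placement[op.name] = preferred
        } else if let unit = fallbackOrder.first(where: { op.supportedUnits.contains($0) }) {
            placement[op.name] = unit
        }
    }
    return placement
}

let ops = [
    ModelOp(name: "conv", supportedUnits: [.cpu, .gpu, .neuralEngine]),
    ModelOp(name: "customGather", supportedUnits: [.cpu, .gpu]),  // not runnable on the neural engine
]
let placement = place(ops, preferred: .neuralEngine)
```

In this model, a single op that cannot run on the preferred unit is enough to put work on the GPU, which would be consistent with seeing a GPU spike even though the neural engine was preferred.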

In seed 1, the available way to tell which compute unit(s) the model is using is to capture an Instruments trace with the CoreAI instrument, which shows which compute units are being used (though not op by op). There is currently no runtime API to dynamically tell which operations are running on the neural engine. Also, the only way to disallow other compute units right now is to specify .cpuOnly; there is no way to request only the neural engine. For both of these, telling where specific operations run and limiting compute units, I'd encourage you to file a feedback report that includes your use case and how they'd be helpful!

In terms of how it maps to CoreML's MLComputeUnits, the default behavior is similar in that any compute unit can be used, but there are some differences in how restrictions are specified. CoreAI and CoreML both have a way to request CPU only. Beyond that, CoreAI has a notion of a preferred compute unit: you request that a specific compute unit be used as much as possible, while the others stay available so the model can run on all devices, since some operations may not be able to run on some compute units. Again, please consider filing a feedback report with your use case if your goal is to restrict certain compute units more directly.
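For comparison, this is roughly what the CoreML side of that mapping looks like. It is a minimal sketch using the existing MLModelConfiguration and MLComputeUnits API; the function name loadWithoutGPU and the modelURL parameter are hypothetical, and no CoreAI calls are shown because their exact signatures are not given in this thread.

```swift
import CoreML

// In CoreML the choice is a restriction set on the configuration:
// .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine, or .all (the default).
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // excludes the GPU; iOS 16 / macOS 13 or later

// Hypothetical helper: `modelURL` should point at a compiled .mlmodelc bundle.
func loadWithoutGPU(at modelURL: URL) throws -> MLModel {
    return try MLModel(contentsOf: modelURL, configuration: config)
}
```

Note that even here the CPU stays in the set, so operations the neural engine cannot run still execute on the CPU. What differs from the preference described above is that the GPU is removed from consideration entirely.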

