Mixing Core AI and Core ML in one pipeline

We built a setup where a model split into an encoder and a decoder can run each part on a different backend, using our own component protocols. Is mixing Core AI and Core ML within a single inference pass something you would recommend, and what is the realistic cost at the boundary where we convert between MLMultiArray / MLTensor and NDArray? Is there a way to keep the encoder output resident on the GPU or ANE so it does not need a host round trip into the other backend?

Answered by Engineer in 891724022

Mixing compute between Core ML and Core AI is definitely possible. Some parts can be done without bridging cost; others may require synchronization:

In terms of mixing buffer types, MLMultiArray/MLShapedArray can be bridged without a copy by using the NDArray.View/RawView types to construct NDArray views over the memory of the MLMultiArray/MLShapedArray. Those views can then be used as inputs to the Core AI work (or you can construct mutable views for the outputs).
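On the Core ML side, what such a view needs is the array's base address, shape, and strides. The following is a minimal sketch of collecting those from an MLMultiArray without copying. `withBorrowedStorage` is a hypothetical helper name, and the Core AI view construction is left as a comment because its exact initializer is not shown in this thread.

```swift
import CoreML

/// Hypothetical helper: exposes an MLMultiArray's storage, shape, and strides
/// to `body` without copying. The pointer is only valid inside `body`.
@available(macOS 12.3, iOS 15.4, tvOS 15.4, watchOS 8.5, *)
func withBorrowedStorage<R>(
    of array: MLMultiArray,
    _ body: (UnsafeRawBufferPointer, _ shape: [Int], _ strides: [Int]) throws -> R
) rethrows -> R {
    // Core ML reports shape and strides as NSNumber; strides are in elements.
    let shape = array.shape.map { $0.intValue }
    let strides = array.strides.map { $0.intValue }
    return try array.withUnsafeBytes { bytes in
        // Assumption: a Core AI NDArray view would be built here from
        // bytes.baseAddress, shape, and strides, and used before returning.
        try body(bytes, shape, strides)
    }
}
```

Because the pointer must not escape the closure, any Core AI work that reads the view would have to be issued and completed inside `body`, or the MLMultiArray would have to be kept alive by some other means.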

You can similarly bridge an MLTensor to an NDArray by first converting the MLTensor to an MLShapedArray and then bridging that to an NDArray as described above.
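The first half of that path uses only Core ML. A minimal sketch, assuming Float scalars: `withMaterializedStorage` is a hypothetical helper name, and the final NDArray step is again left as a comment.

```swift
import CoreML

/// Hypothetical helper: materializes `tensor` as an MLShapedArray and exposes
/// its scalars, shape, and strides. The pointer is only valid inside `body`.
@available(macOS 15.0, iOS 18.0, tvOS 18.0, watchOS 11.0, visionOS 2.0, *)
func withMaterializedStorage<R>(
    of tensor: MLTensor,
    _ body: (UnsafeBufferPointer<Float>, _ shape: [Int], _ strides: [Int]) throws -> R
) async rethrows -> R {
    // Awaiting here waits until the tensor's pending computation has finished.
    let shaped = await tensor.shapedArray(of: Float.self)
    return try shaped.withUnsafeShapedBufferPointer { buffer, shape, strides in
        // Assumption: the Core AI NDArray view would be constructed here.
        try body(buffer, shape, strides)
    }
}
```

Note that `shapedArray(of:)` is itself a synchronization point: the host waits for the tensor's result before any Core AI work can consume it.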

For the MLTensor bridging, however, it sounds like you're trying to build an optimized asynchronous pipeline between the Core ML work and the Core AI model, so that if both run on the GPU or Neural Engine they share the same async flow without synchronizing to the CPU in between. We support this for separate models/functions that all run through Core AI. However, there isn't a way to put both the Core ML and the Core AI inference on the same async pipeline: you'd have to wait on the CPU for the first to complete, then dispatch the second one.

Accepted Answer

The way to share buffers across runtimes would be using the async API and async mutable buffers:

https://developer.apple.com/documentation/coreai/inferencefunction/encode(inputs:states:outputviews:to:)

This gives broader control over how the NDArray data is stored (including backing it with existing data structures) and how it is encoded into a ComputeStream.

However, you'd need to be aware of the memory layout (strides) and ensure that they match.
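A stride check of this kind can be done in plain Swift before sharing a buffer. The sketch below is illustrative only: `contiguousStrides` and `layoutsMatch` are hypothetical helper names, strides are assumed to be counted in elements, and row-major order is assumed as the "contiguous" layout.

```swift
/// Row-major (C-contiguous) strides, in elements, for `shape`.
func contiguousStrides(for shape: [Int]) -> [Int] {
    var result = [Int](repeating: 1, count: shape.count)
    // Walk from the second-to-last dimension back to the first.
    for i in stride(from: shape.count - 2, through: 0, by: -1) {
        result[i] = result[i + 1] * shape[i + 1]
    }
    return result
}

/// True when a buffer laid out as (shape, strides) can be handed to a consumer
/// expecting (expectedShape, expectedStrides) without repacking.
func layoutsMatch(shape: [Int], strides: [Int],
                  expectedShape: [Int], expectedStrides: [Int]) -> Bool {
    guard shape == expectedShape,
          strides.count == shape.count,
          expectedStrides.count == shape.count else { return false }
    // Strides of size-1 dimensions never affect addressing, so skip them.
    for i in shape.indices where shape[i] > 1 {
        if strides[i] != expectedStrides[i] { return false }
    }
    return true
}

// Example: an encoder output whose rows are padded to 832 elements does not
// match a consumer that expects a tightly packed [1, 128, 768] buffer.
let outputShape = [1, 128, 768]
let paddedStrides = [128 * 832, 832, 1]
let canShare = layoutsMatch(shape: outputShape, strides: paddedStrides,
                            expectedShape: outputShape,
                            expectedStrides: contiguousStrides(for: outputShape))
print(canShare)
```

When the check fails, the buffer would need to be repacked into the expected layout first, which reintroduces a copy.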
