Mixing Core AI and Core ML in one pipeline

We built a setup where a model split into an encoder and a decoder can run each part on a different backend, using our own component protocols. Is mixing Core AI and Core ML within a single inference pass something you would recommend, and what is the realistic cost at the boundary where we convert between MLMultiArray / MLTensor and NDArray? Is there a way to keep the encoder output resident on the GPU or ANE so it does not need a host round trip into the other backend?

Related: are there any specific caveats when using Core ML and Core AI at the same time for multiple models in a single process?
