We built a setup in which a model split into an encoder and a decoder can run each part on a different backend, using our own component protocols. Would you recommend mixing Core AI and Core ML within a single inference pass, and what is the realistic cost at the boundary where we convert between MLMultiArray/MLTensor and NDArray? Is there a way to keep the encoder output resident on the GPU or ANE so that it does not need a round trip through host memory before the other backend consumes it?
Mixing Core AI and Core ML in one pipeline
Related: Are there any specific caveats when using Core ML and Core AI at the same time for multiple models in a single process?