How to Train and Deploy PyTorch Models on Apple Hardware: A Unified Path for Deep ML Practice on Core ML?

Submitted as: FB16052050

I am looking to adopt machine learning in a more hands-on way, going beyond pre-built Metal, Core ML, or Create ML approaches. Specifically, I want to train models using open-source Python libraries such as PyTorch, as these offer greater flexibility than Apple's native tools. However, these PyTorch APIs are primarily optimised for NVIDIA GPUs (or TPUs), not Apple silicon such as the M3 or the Apple Neural Engine (ANE).
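For context, this is the kind of minimal training loop I have in mind. It assumes PyTorch's MPS backend is available on the machine; the network, shapes, and dummy data are just placeholders for my real pipeline:

```python
import torch
import torch.nn as nn

# Use the Metal Performance Shaders (MPS) backend on Apple silicon if available,
# otherwise fall back to the CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# Placeholder model; in practice this would be my actual network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten(),
    nn.Linear(16 * 224 * 224, 10),
).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for a real data loader.
images = torch.randn(8, 3, 224, 224, device=device)
labels = torch.randint(0, 10, (8,), device=device)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```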

My goal is to train the models locally without resorting to cloud-based solutions for training or inference, and to then convert the models into Core ML format for deployment on Apple hardware. This would allow me to leverage Apple's hardware acceleration (via ANE, Metal, and MPS) while maintaining control over the training process in PyTorch.
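Concretely, the deployment step I have in mind looks roughly like this: trace the trained PyTorch model and convert it with coremltools. The network, input shape, and file name below are placeholders, and I'm assuming the standard `ct.convert` path to an ML Program:

```python
import torch
import torch.nn as nn
import coremltools as ct

# Placeholder network standing in for the trained PyTorch model.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
model.eval()

example_input = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

# Convert the traced model to an ML Program so Core ML can schedule it
# across CPU, GPU, and the Neural Engine at inference time.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=example_input.shape)],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("MyModel.mlpackage")
```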

I want to know:

1. What are my options for training models in PyTorch on local hardware (Apple M3 or equivalent), and how can I ensure that the PyTorch model can eventually be converted to Core ML without losing flexibility in model training and customisation?
2. How can I perform training in PyTorch and avoid being restricted to the inference-only workflows that Core ML typically allows?
3. Is it possible to use PyTorch's training capabilities and still get the performance benefits of Apple's hardware for both training and inference?
4. What are the best practices or tools to ensure that my PyTorch training pipeline is compatible with Apple's hardware constraints and optimised for local execution?

In short, I'm seeking a practical, cloud-free approach that allows me to train models in PyTorch (keeping control over the training process) while ensuring they can be deployed efficiently via Core ML, entirely on Apple hardware.

I have been running experiments over the past few days, and it has not been easy to convert my PyTorch models to Core ML.

I have to modify the Python code to avoid conversion errors, and even when the conversion succeeds, the model may crash when running in Xcode. I eventually managed to convert the model and run it in Xcode, but it only computes on the CPU. Only when I use a fixed input shape does it run on the Neural Engine, yet I really need non-fixed input shapes to handle images of different sizes.
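For reference, this is roughly the pattern I have been trying for non-fixed input shapes, using coremltools' flexible-shape types. The network is a placeholder and the size bounds are just examples; the range-based variant is the one that, in my tests, falls back to CPU-only execution, while enumerated shapes are what the coremltools documentation suggests as the more accelerator-friendly option:

```python
import torch
import torch.nn as nn
import coremltools as ct

# Placeholder network; the real model needs to handle images of varying sizes.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
model.eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))

# Variant 1: a continuous range of spatial sizes (runs CPU-only in my tests).
range_shape = ct.Shape(shape=(
    1, 3,
    ct.RangeDim(lower_bound=64, upper_bound=1024, default=224),
    ct.RangeDim(lower_bound=64, upper_bound=1024, default=224),
))

# Variant 2: a fixed set of enumerated shapes.
enum_shapes = ct.EnumeratedShapes(
    shapes=[(1, 3, 224, 224), (1, 3, 384, 384), (1, 3, 512, 512)],
    default=(1, 3, 224, 224),
)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=enum_shapes)],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("MyFlexibleModel.mlpackage")
```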

So I'm looking into other solutions: MLX? LibTorch? I don't know.
