Loading time of a Core ML model on iPhone varies depending on the computeUnits

We use several Core ML models in our Swift application. The memory footprint of these models ranges from 15 kB to 3.5 MB according to the Xcode Core ML utility tool. We observe a huge difference in loading time depending on the type of compute units selected to run the model. Here is a small code sample used to load the model:

let configuration = MLModelConfiguration()

// Here I use the .all compute units mode:
configuration.computeUnits = .all

let myModel = try! myCoremlModel(configuration: configuration).model

Here are the profiling results of this sample code for different model sizes, depending on the targeted compute units:

Model-3.5-MB :

  • computeUnits is .cpuAndGPU: 188 ms ⇒ 18 MB/s
  • computeUnits is .all or .cpuAndNeuralEngine (iOS 16): 4000 ms ⇒ 875 kB/s

Model-2.6-MB:

  • computeUnits is .cpuAndGPU: 144 ms ⇒ 18 MB/s
  • computeUnits is .all or .cpuAndNeuralEngine (iOS 16): 1300 ms ⇒ 2 MB/s

Model-15-kB:

  • computeUnits is .cpuAndGPU: 18 ms ⇒ 833 kB/s
  • computeUnits is .all or .cpuAndNeuralEngine (iOS 16): 700 ms ⇒ 22 kB/s
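For reference, the load times above can be measured with a simple wall-clock timer around the model initializer. A minimal sketch, where `myCoremlModel` is a placeholder for the Xcode-generated model class:

```swift
import CoreML
import Foundation

// Measure how long a Core ML model takes to load for a given
// compute-unit preference. `myCoremlModel` stands in for the
// Xcode-generated model class of your project.
func measureLoadTime(computeUnits: MLComputeUnits) throws -> TimeInterval {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = computeUnits

    let start = CFAbsoluteTimeGetCurrent()
    _ = try myCoremlModel(configuration: configuration).model
    return CFAbsoluteTimeGetCurrent() - start
}

// Example: compare the two preferences from the measurements above.
// let gpuTime = try measureLoadTime(computeUnits: .cpuAndGPU)
// let aneTime = try measureLoadTime(computeUnits: .all)
```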

What explains the difference in loading time depending on the computeUnits mode? Is there a way to reduce the loading time of the models when using the .all or .cpuAndNeuralEngine computeUnits modes?

Depending on the compute unit preference specified at model load time (MLModelConfiguration.computeUnits), Core ML performs additional optimisations during the load, which explains the difference in load time. Note that many of these optimisations happen only on the first model load and should not impact subsequent loads. Please file a bug on feedbackassistant.apple.com if you observe otherwise.
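Since the expensive specialization happens only on the first load, one workaround is to warm the models up in the background at app startup, so that the first user-facing load hits Core ML's on-device cache. A sketch under that assumption, using the async `MLModel.load(contentsOf:configuration:)` API so the main thread is never blocked (`modelURL` is a placeholder for the URL of your compiled .mlmodelc bundle):

```swift
import CoreML

// Pre-warm a compiled Core ML model in the background so that the
// costly first-load optimisation pass runs before the model is
// actually needed. `modelURL` must point to a compiled .mlmodelc.
func prewarmModel(at modelURL: URL) {
    Task.detached(priority: .utility) {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .all
        do {
            // The loaded model can be discarded (or kept in a cache):
            // the point is to trigger the first-load work early.
            _ = try await MLModel.load(contentsOf: modelURL,
                                       configuration: configuration)
        } catch {
            print("Model pre-warm failed: \(error)")
        }
    }
}
```

The Xcode-generated model class exposes an equivalent async `load(configuration:)` method, which can be used the same way if you prefer not to deal with the compiled model URL directly.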
