Loading time of a Core ML model on iPhone varies depending on the computeUnits

We use several Core ML models in our Swift application. The memory footprint of these models ranges from 15 kB to 3.5 MB according to the Xcode Core ML utility. We observe a huge difference in loading time depending on the compute units selected to run the model. Here is a small code sample used to load a model:

let configuration = MLModelConfiguration()

// Here I use the .all compute units mode:

configuration.computeUnits = .all

let myModel = try! myCoremlModel(configuration: configuration).model
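To obtain the numbers below, the load can be wrapped with a simple wall-clock timer. A minimal sketch, assuming `MyCoremlModel` stands in for the Xcode-generated model class (this example cannot run outside an Apple platform with the model bundled):

```swift
import CoreML

let configuration = MLModelConfiguration()
configuration.computeUnits = .all

// Time the load with a wall-clock timestamp.
// `MyCoremlModel` is a placeholder for your generated model class.
let start = CFAbsoluteTimeGetCurrent()
let myModel = try! MyCoremlModel(configuration: configuration).model
let elapsedMs = (CFAbsoluteTimeGetCurrent() - start) * 1000
print("Model loaded in \(elapsedMs) ms")
```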

Here are the profiling results of this sample code for different model sizes, depending on the targeted compute units:

Model-3.5-MB:

  • computeUnits is .cpuAndGPU: 188 ms ⇒ 18 MB/s
  • computeUnits is .all or .cpuAndNeuralEngine on iOS16: 4000 ms ⇒ 875 kB/s

Model-2.6-MB:

  • computeUnits is .cpuAndGPU: 144 ms ⇒ 18 MB/s
  • computeUnits is .all or .cpuAndNeuralEngine on iOS16: 1300 ms ⇒ 2 MB/s

Model-15-kB:

  • computeUnits is .cpuAndGPU: 18 ms ⇒ 833 kB/s
  • computeUnits is .all or .cpuAndNeuralEngine on iOS16: 700 ms ⇒ 22 kB/s

What explains the difference in loading time depending on the computeUnits mode? Is there a way to reduce the loading time of the models when using the .all or .cpuAndNeuralEngine computeUnits mode?


Replies

Depending on the compute unit preference specified at model load time (MLModelConfiguration.computeUnits), Core ML performs additional optimisations during the load, which explains the difference in load time. Note that many of these optimisations happen only on the first load of a model and should not impact subsequent loads. Please file a bug on feedbackassistant.apple.com if you observe otherwise.
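Since the expensive optimisation pass runs only on the first load, one practical mitigation is to trigger that first load early on a background queue, so it has completed by the time the model is actually needed. A rough sketch, assuming `MyCoremlModel` is the Xcode-generated model class (hypothetical name; runnable only on an Apple platform with the model bundled):

```swift
import CoreML

/// Pre-warm the model off the main thread so the one-time
/// Neural Engine specialization cost is paid at app launch
/// rather than at first use.
/// `MyCoremlModel` is a placeholder for your generated model class.
func prewarmModel() {
    DispatchQueue.global(qos: .utility).async {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .all
        // Load and discard; per the reply above, the first-load
        // optimisations are cached, so subsequent loads with the
        // same configuration should be fast.
        _ = try? MyCoremlModel(configuration: configuration)
    }
}
```

The result of the background load could also be kept (e.g. in a property or cache) instead of discarded, so the pre-warmed instance itself is reused.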