Article

Reducing the Size of Your Core ML App

Reduce the storage used by the Core ML model inside your app bundle.

Overview

Bundling your machine learning model in your app is the easiest way to get started with Core ML. As models become more advanced, they can grow large and take up significant storage space. For a model based on a neural network, consider reducing its footprint by using a lower-precision representation for its weight parameters. If your model isn't a neural network that can use half precision, or if you need to reduce your app's size further, add functionality to download and compile your models on the user's device instead of bundling them with your app.

Convert to a Half-Precision Model

Core ML Tools provides a conversion function that converts a neural network model's floating-point weights from full precision to half precision (reducing the number of bits used in the representation from 32 down to 16). This type of conversion can significantly reduce a network's size, most of which typically comes from the connection weights within the network.

Listing 1

Converting a model to lower precision with Core ML Tools

import coremltools

# Load a model, lower its precision, and then save the smaller model.
model_spec = coremltools.utils.load_spec('./exampleModel.mlmodel')
model_fp16_spec = coremltools.utils.convert_neural_network_spec_weights_to_fp16(model_spec)
coremltools.utils.save_spec(model_fp16_spec, 'exampleModelFP16.mlmodel')
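
You can verify the savings by comparing the two files on disk. Because the weights typically account for most of a network's size, the converted file is often close to half the size of the original. The following is a minimal sketch; it assumes the file names used in Listing 1.

Listing 2

Comparing model file sizes before and after conversion

import os

# Compare the on-disk sizes of the original and half-precision models.
original_size = os.path.getsize('./exampleModel.mlmodel')
fp16_size = os.path.getsize('exampleModelFP16.mlmodel')
print(f'Original:       {original_size / 1024:.0f} KB')
print(f'Half precision: {fp16_size / 1024:.0f} KB')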

You can convert only neural networks, or pipeline models that embed neural networks, to half precision, and the conversion changes all of the model's full-precision weight parameters to half precision.
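
If you're not sure whether a model qualifies, you can inspect its spec before attempting the conversion. The following is a minimal sketch; it reads the protobuf oneof field that records the model type.

Listing 3

Checking a model's type before conversion

import coremltools

# Load the spec and check which model type it contains.
spec = coremltools.utils.load_spec('./exampleModel.mlmodel')

# Neural network types (and pipelines embedding them) support the
# half-precision conversion; other types, such as tree ensembles, don't.
print(spec.WhichOneof('Type'))  # e.g. 'neuralNetwork' or 'pipeline'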

Using half-precision floating-point values reduces not only the precision of the values but also the range of values that can be represented. Before deploying this option to your users, confirm that the behavior of your model isn't degraded.
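
One way to check is to run the same inputs through both models and compare the outputs. The following is a minimal sketch; the feature names 'input' and 'output' and the input shape are placeholders for whatever your model actually declares, and running predictions with Core ML Tools requires macOS.

Listing 4

Comparing predictions from the full- and half-precision models

import numpy as np
import coremltools

# Load the original and converted models saved in Listing 1.
model_full = coremltools.models.MLModel('./exampleModel.mlmodel')
model_fp16 = coremltools.models.MLModel('exampleModelFP16.mlmodel')

# Placeholder feature name and shape; substitute your model's own.
sample = {'input': np.random.rand(1, 3, 224, 224).astype(np.float32)}

out_full = model_full.predict(sample)['output']
out_fp16 = model_fp16.predict(sample)['output']

# Large deviations here indicate the model is sensitive to the
# reduced precision and range of half-precision weights.
print('max abs difference:', np.max(np.abs(out_full - out_fp16)))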

Download and Compile a Model

Another option for reducing the size of your app is to have the app download the model onto the user's device and compile it in the background. For example, if users need only a subset of the models your app supports, you don't have to bundle every possible model with your app. Instead, the app can download models later, based on user behavior. See Downloading and Compiling a Model on the User's Device.