Reducing the Size of Your Core ML App

Reduce the storage used by the Core ML model inside your app bundle.


Bundling your machine learning model in your app is the easiest way to get started with Core ML. As models get more advanced, they can become large and take up significant storage space. For a neural-network-based model, consider reducing its footprint by using a lower-precision representation for its weight parameters. If your model isn’t a neural network that can use half precision, or if you need to reduce your app’s size further, add functionality to download and compile your models on the user’s device instead of bundling them with your app.

Convert to a Half-Precision Model

Core ML Tools provides a conversion function that converts a neural network model’s floating-point weights from full precision to half precision, reducing the number of bits used for each weight from 32 to 16. Because most of a network’s size often comes from its connection weights, this conversion can significantly reduce the size of the model.

Listing 1

Converting a model to lower precision with Core ML Tools

import coremltools

# Load a model, lower its precision, and then save the smaller model.
model_spec = coremltools.utils.load_spec('./exampleModel.mlmodel')
model_fp16_spec = coremltools.utils.convert_neural_network_spec_weights_to_fp16(model_spec)
coremltools.utils.save_spec(model_fp16_spec, 'exampleModelFP16.mlmodel')
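To see how much storage the conversion saved, you can compare the two files on disk. The following is a minimal sketch that uses only the Python standard library; the helper name `size_reduction` is hypothetical, and the paths are the ones used in Listing 1.

```python
import os

def size_reduction(original_path, converted_path):
    """Return (original_bytes, converted_bytes, fraction_saved) for two files."""
    original_bytes = os.path.getsize(original_path)
    converted_bytes = os.path.getsize(converted_path)
    # Fraction of the original size removed by the conversion
    # (0.0 if the original file is empty).
    saved = 1.0 - converted_bytes / original_bytes if original_bytes else 0.0
    return original_bytes, converted_bytes, saved

if __name__ == "__main__":
    # Paths from Listing 1; the report is skipped if either file is missing.
    paths = ('./exampleModel.mlmodel', 'exampleModelFP16.mlmodel')
    if all(os.path.exists(p) for p in paths):
        before, after, saved = size_reduction(*paths)
        print(f"{before} bytes -> {after} bytes ({saved:.0%} smaller)")
```

The saving depends on how much of the file consists of weights, so the figure reported for your model may differ from what the bit widths alone suggest.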

You can convert only neural networks, or pipeline models that embed neural networks, to half precision. All full-precision weight parameters in a model must be converted to half precision.

Half-precision floating-point values have both lower precision and a smaller range of representable values than full-precision values. Before deploying a half-precision model to your users, confirm that the model’s behavior is not degraded.
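To get a feel for the reduced precision and range, you can round-trip individual values through the IEEE 754 half-precision format. The following sketch uses the standard library `struct` module and its `'e'` (16-bit float) format; the helper name `to_half` and the sample values are made up for illustration, and the sketch says nothing about how any particular model behaves.

```python
import struct

def to_half(value):
    """Round-trip a Python float through IEEE 754 half precision (16 bits)."""
    try:
        return struct.unpack('<e', struct.pack('<e', value))[0]
    except OverflowError:
        # The magnitude exceeds the largest finite half-precision value (65504),
        # so treat it as overflowing to infinity.
        return float('inf') if value > 0 else float('-inf')

if __name__ == "__main__":
    # Illustrative values only: a typical small weight, a value that loses
    # its fractional part, one that overflows, and one that underflows.
    for weight in (0.1, 1024.3, 70000.0, 1e-8):
        print(weight, "->", to_half(weight))
```

Values that survive the round trip unchanged, such as 0.5, are exactly representable in 16 bits. Weights that overflow or collapse to zero are the ones most likely to change a model’s output.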

Models that are converted to half precision require these OS versions or later: iOS 11.2, macOS 10.13.2, tvOS 11.2, or watchOS 4.2.

Convert to a Lower Precision Model

Core ML Tools 2.0 introduced new utilities to reduce the precision of a model down to 8, 4, 2, or 1 bit. The tools include functions to gauge the differences in behavior between the original model and the lower precision model. For more information about using these utilities, see the Core ML Tools Neural Network Quantization documentation.
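The idea behind lower-precision weights can be illustrated in plain Python. The following sketch applies a simple linear quantization scheme to a short list of made-up weights and reports the largest error at each bit width. It is an illustration of the concept under stated assumptions, not the Core ML Tools implementation; the function names `quantize_linear` and `max_abs_error` are hypothetical.

```python
def quantize_linear(weights, nbits):
    """Map floats to integer codes in [0, 2**nbits - 1], then map them back."""
    levels = (1 << nbits) - 1
    lo, hi = min(weights), max(weights)
    # Width of one quantization step; fall back to 1.0 if all weights are equal.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    restored = [lo + c * scale for c in codes]
    return codes, restored

def max_abs_error(original, restored):
    """Largest absolute difference between original and restored values."""
    return max(abs(a - b) for a, b in zip(original, restored))

if __name__ == "__main__":
    weights = [-0.82, -0.31, 0.0, 0.07, 0.44, 0.95]  # made-up example values
    for nbits in (8, 4, 2, 1):
        _, restored = quantize_linear(weights, nbits)
        print(nbits, "bits: max error", max_abs_error(weights, restored))
```

Fewer bits mean fewer distinct values available to represent the weights, which is why comparing the behavior of the original and lower-precision models matters before shipping.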

Lower precision models require these OS versions or later: iOS 12, macOS 10.14, tvOS 12, or watchOS 5.

Download and Compile a Model

Another option to reduce the size of your app is to have the app download the model onto the user’s device and compile it in the background. For example, if each user needs only a subset of the models your app supports, you don’t need to bundle every model with your app. Instead, the app can download models later based on user behavior. See Downloading and Compiling a Model on the User's Device.