Dynamic Core ML model inference is significantly slower than a static model

Device:

iPhone 11

Config:

configuration.computeUnits = .all
let myModel = try! myCoremlModel(configuration: configuration).model

Set the Range for Each Dimension:

input_shape = ct.Shape(shape=(1, 3,
                              ct.RangeDim(lower_bound=128, upper_bound=384, default=256),
                              ct.RangeDim(lower_bound=128, upper_bound=384, default=256)))
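
For completeness, the full conversion with this flexible shape looks roughly like the following sketch. The network here is a small stand-in and the input name "input_image" is a placeholder; my real model differs.

import torch
import torch.nn as nn
import coremltools as ct

# Stand-in network (the real model is not shown here).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

traced_model = torch.jit.trace(TinyNet().eval(), torch.randn(1, 3, 256, 256))

# Same flexible shape as above: height and width may vary from 128 to 384,
# with 256x256 as the default shape.
input_shape = ct.Shape(shape=(
    1, 3,
    ct.RangeDim(lower_bound=128, upper_bound=384, default=256),
    ct.RangeDim(lower_bound=128, upper_bound=384, default=256)))

mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input_image", shape=input_shape)],
    convert_to="neuralnetwork",  # matches the .mlmodel workflow used here
)
mlmodel.save("dynamic_model.mlmodel")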

  • Inference time (average of 100 runs)

Inference at the dynamic model's default size is as fast as the static model, but the 128x128 and 384x384 sizes are hundreds of times slower than the fixed-size models. Is this normal? Is there a good solution? (A timing sketch is included after this list.)

  • Model init time is too long

Loading the model takes about 2 minutes. Is there a way to speed it up, for example by loading from a cache? Would converting to an .mlpackage speed up loading?
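
As mentioned above, here is roughly how such per-shape timings can be reproduced with coremltools on a Mac (a sketch only: the input name "input_image" and the file name come from the conversion sketch above, not from my real model, and absolute numbers will differ from the iPhone 11):

import time
import numpy as np
import coremltools as ct

# Load the flexible-shape model converted above (file name is an assumption).
mlmodel = ct.models.MLModel("dynamic_model.mlmodel")

def time_predict(hw, runs=100):
    # Average prediction latency in milliseconds for an hw x hw input.
    x = np.random.rand(1, 3, hw, hw).astype(np.float32)
    mlmodel.predict({"input_image": x})  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        mlmodel.predict({"input_image": x})
    return (time.perf_counter() - start) / runs * 1000.0

for size in (128, 256, 384):
    print(f"{size}x{size}: {time_predict(size):.2f} ms")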
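
Regarding the .mlpackage question, what I mean is converting to the ML Program format, roughly as below (a sketch that reuses traced_model and input_shape from the first sketch; I have not verified whether this actually loads faster):

import coremltools as ct

# Convert to the ML Program format, which is stored as an .mlpackage bundle.
# traced_model and input_shape are reused from the first sketch above.
mlprogram_model = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input_image", shape=input_shape)],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.iOS15,
)
mlprogram_model.save("dynamic_model.mlpackage")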

Accepted Reply

For models with range flexibility, we currently only support running on the Neural Engine for the input's default shape. Other shapes will be run on either GPU or CPU, which is likely why you are seeing higher latency for non-default shapes.

One other option you have here is to use enumerated flexibility instead of range flexibility. If you only need a smaller set of sizes supported by the model, you can use ct.EnumeratedShapes type to specify each shape the model should support. For enumerated shape flexibility, each shape should be able to run on the Neural Engine. You can read more about the advantages of enumerated shapes here https://coremltools.readme.io/docs/flexible-inputs.
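
For illustration, such a conversion looks roughly like this (the traced model, input name, and sizes below are placeholders rather than values from your model):

import coremltools as ct

# Only these exact input shapes will be supported by the converted model.
enumerated_shape = ct.EnumeratedShapes(
    shapes=[[1, 3, 128, 128], [1, 3, 256, 256], [1, 3, 384, 384]],
    default=[1, 3, 256, 256])

mlmodel = ct.convert(
    traced_model,  # placeholder: your traced PyTorch model
    inputs=[ct.TensorType(name="input_image", shape=enumerated_shape)],
)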

  • Thank you very much, this explains my confusion.

    I have tried using predetermined shapes; the model I am using has multiple inputs.

    Except for the default shape, the other two are still too slow. Please see my other reply with sample code. :)

  • Thank you for the reply. I have tried using predefined shapes, but my model has two inputs and there are still some problems. Please check my post below for the detailed test.

Replies

Here is a simple example:

import torch
import torch.nn as nn

import coremltools as ct

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_pre1 = nn.ConvTranspose2d(128, 256, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.conv_pre2 = nn.ConvTranspose2d(256, 256, kernel_size=3, stride=2, padding=1, output_padding=1)

        self.conv1 = nn.ConvTranspose2d(256, 256, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.conv2 = nn.ConvTranspose2d(256, 256, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.conv3 = nn.ConvTranspose2d(256, 256, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.conv4 = nn.ConvTranspose2d(256, 3, kernel_size=3, stride=2, padding=1, output_padding=1)


    def forward(self, input1, input2):
        y = self.conv_pre1(input2)
        y = self.conv_pre2(y)
        
        x = input1 + y
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        nn_output = torch.clip(x, 0.0, 1.0)
        recon_img_out = torch.ceil(nn_output*255.0-0.5)
        return recon_img_out

model = Model()
model.cuda()


dummy_input_f = torch.randn(1,256, 68, 120, device='cuda')
dummy_input_z = torch.randn(1,128, 17, 30, device='cuda')

torch_model = model.eval()
trace_model = torch.jit.trace(torch_model, (dummy_input_f, dummy_input_z))


# Use EnumeratedShapes to list the exact input sizes the model should support.
input_x1_shape = ct.EnumeratedShapes(shapes=[[1, 256, 8, 8],
                                             [1, 256, 16, 16],
                                             [1, 256, 24, 24]],
                                     default=[1, 256, 16, 16])
input_x2_shape = ct.EnumeratedShapes(shapes=[[1, 128, 2, 2],
                                             [1, 128, 4, 4],
                                             [1, 128, 6, 6]],
                                     default=[1, 128, 4, 4])

input_1=ct.TensorType(name="input_x1", shape=input_x1_shape)  
input_2=ct.TensorType(name="input_x2", shape=input_x2_shape)  
outputs=ct.TensorType(name="output_img")  
# outputs=ct.ImageType(name="output_img", color_layout=ct.colorlayout.RGB)
mlmodel = ct.convert(
    trace_model,
    inputs=[input_1, input_2],
    outputs=[outputs],
)
mlmodel.save("check.mlmodel")

Except for the default shape, the other two are still too slow:

  1. input1: 8x8, input2: 2x2 → 50 ms
  2. input1: 24x24, input2: 6x6 → 50 ms
  3. input1: 16x16, input2: 4x4 (default) → 1.8 ms

Then I changed the model to a single input by removing input2; the non-default shape inference times speed up a bit, but are still unusual.

Enumerated model inference speed:

  1. input1: 8x8 → 1.9 ms
  2. input1: 24x24 → 12.14 ms
  3. input1: 16x16 (default) → 1.8 ms

With a fixed-size model, the 8x8 and 24x24 inference times are ~0.5 ms and ~4 ms.

Are these results normal? Does a single-input enumerated model also slow down by a factor of 3 to 4?