Divergent output between PyTorch and converted CoreML Huggingface Bert model | Apple Developer Forums

This issue has already been raised a few times in the coremltools repo (here, here, and here). I'm reposting here because this may be an issue in CoreML itself.

In short, converting Hugging Face's BERT implementation from PyTorch to CoreML produces model outputs that differ significantly from the PyTorch outputs. This test was originally posted in one of the linked issues:

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
import coremltools as ct

MODEL_NAME = "bert-base-uncased"

sentences = ["This is a test."]
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, torchscript=True).eval()
encoded_input = tokenizer(sentences, return_tensors='pt')

traced_model = torch.jit.trace(model, tuple(encoded_input.values()))
scripted_model = torch.jit.script(traced_model)

model = ct.convert(scripted_model, source="pytorch",
                   inputs=[ct.TensorType(name="input_ids",      shape=(ct.RangeDim(), ct.RangeDim()),      dtype=np.int32),
                           ct.TensorType(name="token_type_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
                           ct.TensorType(name="attention_mask", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32)],
                   convert_to="mlprogram", compute_units=ct.ComputeUnit.CPU_ONLY)
with torch.no_grad():
    pt_out = scripted_model(**encoded_input)
cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = model.predict(cml_inputs)
np.testing.assert_allclose(pt_out[0].detach().numpy(), pred_coreml["hidden_states"], atol=1e-5, rtol=1e-4)

Running this shows that the model outputs are highly divergent:

Max absolute difference: 7.901174
Max relative difference: 3424.6594

By contrast, running the same test with Hugging Face's DistilBERT implementation (distilbert-base-uncased) shows a much smaller difference in output:

Max absolute difference: 0.00523943
Max relative difference: 45.603153
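
For anyone reading the numbers above: as far as I understand it, np.testing.assert_allclose measures the relative difference against its second argument (here the CoreML output) and skips elements where that value is exactly zero. A large relative difference can therefore come from reference values that are close to zero, so the absolute difference is probably the more telling figure. Below is a minimal pure-Python sketch of that calculation. The helper name `mismatch_summary` is my own and is not part of NumPy or coremltools, and it is only an approximation of what NumPy reports.

```python
def mismatch_summary(actual, desired, atol=1e-5, rtol=1e-4):
    """Summarize element-wise mismatch between two flat sequences of floats.

    Hypothetical helper (not a NumPy or coremltools API). It approximates
    the figures that np.testing.assert_allclose prints on failure.
    """
    actual = [float(x) for x in actual]
    desired = [float(x) for x in desired]
    if len(actual) != len(desired):
        raise ValueError("sequences must have the same length")

    # Absolute error per element.
    abs_err = [abs(a - d) for a, d in zip(actual, desired)]

    # Relative error is taken against `desired`; exact zeros are skipped
    # to avoid dividing by zero.
    rel_err = [e / abs(d) for e, d in zip(abs_err, desired) if d != 0.0]

    # An element counts as mismatched when it violates
    # |actual - desired| <= atol + rtol * |desired|.
    mismatched = sum(
        1 for e, d in zip(abs_err, desired) if e > atol + rtol * abs(d)
    )

    return {
        "max_abs": max(abs_err, default=0.0),
        "max_rel": max(rel_err, default=0.0),
        "mismatched": mismatched,
        "total": len(abs_err),
    }


if __name__ == "__main__":
    # Tiny illustration: a small absolute error on a near-zero reference
    # value already gives a relative error of about 1.0 (100%).
    print(mismatch_summary([1.0, 0.002], [1.0, 0.001]))
```

To apply this to the test above, flatten both arrays first, for example `mismatch_summary(pt_out[0].numpy().ravel(), pred_coreml["hidden_states"].ravel())`. Read this way, the Distilbert result (max absolute difference of about 0.005 with a relative difference of about 45.6) implies the worst relative element has a reference value on the order of 1e-4, whereas the Bert result has a max absolute difference of about 7.9, which cannot be explained by near-zero reference values alone.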

Again, I'm not totally sure that this is an issue in CoreML, but it would be great to be able to run BERT-based models with CoreML!
