Object Detection using Vision performs differently than in Create ML Preview

Context

I've trained my model for object detection with over 4k images. In the Create ML preview I'm able to check the prediction for image "A": it detects two labels at 100% confidence and their bounding boxes look accurate.

The problem itself

However, inside a Swift Playground, when I perform object detection using the same model and the same image, I don't get the same results.

What I expected

That after performing the request and processing the array of VNRecognizedObjectObservation, I would see the very same results that appear in the Create ML preview.

Notes:

  • I'm importing the model into the playground by drag and drop.
  • The training images are in JPEG format.
  • The test image was rotated to appear vertical using the macOS Finder rotation tool.
  • I've tried passing a different orientation while creating the VNImageRequestHandler, with the same result (see the sketch after this list).
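
For reference, here's a minimal sketch of how an explicit orientation can be passed. The mapping helper is adapted from Apple's sample code and is an assumption on my part; it's needed because UIImage.Orientation and CGImagePropertyOrientation use different raw values.

import UIKit
import Vision
import ImageIO

// Helper (adapted from Apple sample code): map UIKit's orientation enum
// to the EXIF-style CGImagePropertyOrientation that Vision expects.
extension CGImagePropertyOrientation {
    init(_ uiOrientation: UIImage.Orientation) {
        switch uiOrientation {
        case .up: self = .up
        case .down: self = .down
        case .left: self = .left
        case .right: self = .right
        case .upMirrored: self = .upMirrored
        case .downMirrored: self = .downMirrored
        case .leftMirrored: self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

let image = UIImage(named: "TEST_IMAGE.HEIC")!

// Pass the orientation explicitly; cgImage drops the UIImage's
// orientation metadata, so Vision would otherwise assume .up.
let handler = VNImageRequestHandler(
    cgImage: image.cgImage!,
    orientation: CGImagePropertyOrientation(image.imageOrientation)
)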

Swift Playground code

This is the code I'm using.

import UIKit
import CoreML
import Vision

do {
    // Load the Core ML model generated by Create ML and wrap it for Vision.
    let model = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration())
    let coreMLModel = try VNCoreMLModel(for: model.model)

    let request = VNCoreMLRequest(model: coreMLModel) { request, error in
        guard let results = request.results as? [VNRecognizedObjectObservation] else {
            return
        }
        // Print each detection's label candidates and its normalized bounding box.
        results.forEach { result in
            print(result.labels)
            print(result.boundingBox)
        }
    }

    let image = UIImage(named: "TEST_IMAGE.HEIC")!

    // No orientation is passed here, so Vision assumes the pixels are upright (.up).
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!)

    try requestHandler.perform([request])
} catch {
    print(error)
}
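
Note that boundingBox is normalized to 0–1 with a lower-left origin, so the printed values won't directly match pixel coordinates. A minimal sketch of the conversion using Vision's helper, in case the boxes only look wrong because of the coordinate convention:

import UIKit
import Vision

let image = UIImage(named: "TEST_IMAGE.HEIC")!
let width = image.cgImage!.width
let height = image.cgImage!.height

// Convert a normalized, lower-left-origin Vision rect into pixel coordinates.
func pixelRect(for observation: VNRecognizedObjectObservation) -> CGRect {
    // VNImageRectForNormalizedRect scales the rect to the image size; the
    // result still uses a lower-left origin, so flip Y for UIKit's
    // top-left coordinate space.
    var rect = VNImageRectForNormalizedRect(observation.boundingBox, width, height)
    rect.origin.y = CGFloat(height) - rect.origin.y - rect.height
    return rect
}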

Additional Notes & Uncertainties

Not sure if this is relevant, but just in case: I trained the model on pictures I took with my iPhone in 48 MP HEIC format. All photos were taken in portrait orientation. With a Python script I overwrote the EXIF orientation to 1 (Normal), so that I could annotate the images with the CVAT tool and then convert the annotations to the Create ML format.
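
Since the EXIF tag was rewritten, it may be worth verifying what orientation the test image actually carries, because Vision relies on it unless one is passed explicitly. A quick sketch using ImageIO (the resource name matches the placeholder used above):

import Foundation
import ImageIO

// Read the orientation tag Vision would otherwise infer. A value of 1
// means .up ("Normal"); anything else means the stored pixels are
// rotated or mirrored relative to the intended display.
if let url = Bundle.main.url(forResource: "TEST_IMAGE", withExtension: "HEIC"),
   let source = CGImageSourceCreateWithURL(url as CFURL, nil),
   let props = CGImageSourceCopyPropertiesAtIndex(source, 0, nil) as? [CFString: Any] {
    print(props[kCGImagePropertyOrientation] ?? "no orientation tag")
}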

Assumption #1

I've read that object detection in Create ML is based on the YOLOv3 architecture, whose first layer resizes the input image to a fixed dimension, meaning I shouldn't have to worry about using very large images to train my model. Is this correct?
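
One way to check the fixed input size the model actually expects is to inspect its description (a sketch; the class name matches the placeholder used above):

import CoreML

// Print the model's input descriptions; an object detector exported by
// Create ML reports a fixed image input size here (e.g. 416x416 for a
// YOLO-style network).
let mlModel = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration()).model
print(mlModel.modelDescription.inputDescriptionsByName)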

Assumption #2

It also makes me assume that the same resizing happens when I make a prediction. Is that the case?
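
At prediction time, Vision does that resizing, and how it does so is controlled by imageCropAndScaleOption on the request. My understanding is that Create ML object detectors expect scaleFill, though I'd treat that as an assumption to verify; if the option doesn't match what was used in training, the boxes and confidences can differ from the preview. A sketch making the option explicit:

import CoreML
import Vision

// Make Vision's preprocessing explicit. The default is .centerCrop,
// which crops the image; .scaleFill stretches the whole image to the
// model's input size, which Create ML object detectors reportedly
// expect (assumption worth verifying against the docs).
let coreMLModel = try VNCoreMLModel(
    for: MYMODEL_FROMCREATEML(configuration: MLModelConfiguration()).model
)
let request = VNCoreMLRequest(model: coreMLModel)
request.imageCropAndScaleOption = .scaleFill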