Display bounding box in SceneKit

Hello, I am using YOLOv3 with Vision to classify objects during my AR session. I want to render the bounding boxes of the detected objects on screen. Unfortunately, the bounding boxes are placed too far down and have the wrong aspect ratio. Does anyone know what the issue might be?

This is how I am currently transforming the bounding boxes. Assumptions:

  • The app is in portrait mode
  • The Vision request is performed with centerCrop and orientation .right.

Fix the coordinate origin of Vision (bottom-left) so that it matches the top-left origin of the view:

let newY = 1 - boundingBox.origin.y
let newBox = CGRect(x: boundingBox.origin.x, y: newY, width: boundingBox.width, height: boundingBox.height)
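One thing worth checking in this step: when a rectangle's origin moves from the bottom-left to the top-left corner, the new y is normally measured from the rectangle's top edge, so the height has to be subtracted as well. A minimal sketch, assuming Vision's normalized, bottom-left-origin boxes; `flipToTopLeftOrigin` is a hypothetical helper name, not a Vision API:

```swift
import Foundation

/// Converts a normalized rect with a bottom-left origin (as Vision reports it)
/// into a normalized rect with a top-left origin.
/// `flipToTopLeftOrigin` is a hypothetical helper name, not a Vision API.
func flipToTopLeftOrigin(_ box: CGRect) -> CGRect {
    // In bottom-left coordinates the top edge sits at origin.y + height;
    // its distance from the top of the unit square becomes the new y.
    let newY = 1 - box.origin.y - box.size.height
    return CGRect(x: box.origin.x, y: newY,
                  width: box.size.width, height: box.size.height)
}

// Example box: 30 % wide, 40 % tall, lower-left corner at (0.1, 0.2).
let flipped = flipToTopLeftOrigin(CGRect(x: 0.1, y: 0.2, width: 0.3, height: 0.4))
```

Without the height term the box lands one box-height lower than intended, which may account for part of the vertical offset described above.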

Undo center cropping of Vision:

let imageResolution: CGSize = currentFrame.camera.imageResolution
// Switching height and width because the original image is rotated
let imageWidth = imageResolution.height
let imageHeight = imageResolution.width
// Square inside of normalized coordinates.
let roi = CGRect(x: 0, y: 1 - (imageWidth/imageHeight + ((imageHeight-imageWidth) / (imageHeight*2))), width: 1, height: imageWidth / imageHeight)
let newBox = VNImageRectForNormalizedRectUsingRegionOfInterest(boundingBox, Int(imageWidth), Int(imageHeight), roi)
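The same mapping can be written out by hand, which makes each term easier to check. This is only a sketch under the assumptions listed at the top (portrait image, so width is smaller than height, and Vision cropping a centered square whose side equals the image width). `uncropCenterSquare` is a hypothetical helper, and both input and output are in normalized coordinates:

```swift
import Foundation

/// Maps a rect normalized to a centered square crop back to coordinates
/// normalized to the full portrait image (imageWidth < imageHeight).
/// `uncropCenterSquare` is a hypothetical helper, not a Vision API.
func uncropCenterSquare(_ box: CGRect, imageWidth: CGFloat, imageHeight: CGFloat) -> CGRect {
    // The square spans the full width but only imageWidth / imageHeight of the height.
    let heightFraction = imageWidth / imageHeight
    // The crop is centered, so the same margin is cut off above and below it.
    let verticalMargin = (1 - heightFraction) / 2
    return CGRect(x: box.origin.x,
                  y: verticalMargin + box.origin.y * heightFraction,
                  width: box.size.width,
                  height: box.size.height * heightFraction)
}

// Example: a 1440 x 1920 portrait image; the full crop maps to the middle band.
let uncropped = uncropCenterSquare(CGRect(x: 0, y: 0, width: 1, height: 1),
                                   imageWidth: 1440, imageHeight: 1920)
```

Because the crop is centered, the mapping is the same for a top-left and a bottom-left origin, and the result stays normalized.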

Bring coordinates back to normalized form:

let imageWidth = imageResolution.height
let imageHeight = imageResolution.width
let transformNormalize = CGAffineTransform(scaleX: 1.0 / imageWidth, y: 1.0 / imageHeight)
let newBox = boundingBox.applying(transformNormalize)

Transform to scene view: (I assume the error is here. I found out while debugging that the aspect ratio of the bounding box changes here.)

let viewPort = sceneView.frame.size
let transformFormat = currentFrame.displayTransform(for: .landscapeRight, viewportSize: viewPort)
let newBox = boundingBox.applying(transformFormat)

Scale up to viewport size:

let viewPort = sceneView.frame.size
let transformScale = CGAffineTransform(scaleX: viewPort.width, y: viewPort.height)
let newBox = boundingBox.applying(transformScale)
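As a cross-check for the last two steps, the mapping from normalized image coordinates to view points can also be computed without displayTransform. The sketch below swaps in a plain aspect-fill calculation: it assumes the portrait-oriented camera image is scaled uniformly to fill the view and centered, with a top-left origin on both sides. Whether that matches what the scene view actually does is an assumption, and `aspectFillRect` is a hypothetical helper, not an ARKit API:

```swift
import Foundation

/// Maps a rect normalized to the portrait-oriented image into view points,
/// assuming the image is scaled to fill the viewport and centered (aspect fill).
/// `aspectFillRect` is a hypothetical helper, not an ARKit API.
func aspectFillRect(_ box: CGRect, imageSize: CGSize, viewportSize: CGSize) -> CGRect {
    // One uniform scale for both axes keeps the box's aspect ratio intact.
    let scale = max(viewportSize.width / imageSize.width,
                    viewportSize.height / imageSize.height)
    let scaledWidth = imageSize.width * scale
    let scaledHeight = imageSize.height * scale
    // Offsets are zero or negative: the overflow is cut off equally on both sides.
    let offsetX = (viewportSize.width - scaledWidth) / 2
    let offsetY = (viewportSize.height - scaledHeight) / 2
    return CGRect(x: offsetX + box.origin.x * scaledWidth,
                  y: offsetY + box.origin.y * scaledHeight,
                  width: box.size.width * scaledWidth,
                  height: box.size.height * scaledHeight)
}

// Example: a 1440 x 1920 image shown in a 390 x 844 point view.
let viewRect = aspectFillRect(CGRect(x: 0.5, y: 0.5, width: 0.1, height: 0.1),
                              imageSize: CGSize(width: 1440, height: 1920),
                              viewportSize: CGSize(width: 390, height: 844))
```

Comparing its output with the result of the displayTransform path for the same box may help narrow down which step changes the aspect ratio.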

Thanks in advance for any help!

Replies

I have the same question with respect to a Detectron2 model. I would like to load the masks into ARKit, but I can't figure out the correct transformations to match the AR scene.