I am trying to use VNDetectFaceRectanglesRequest to detect face bounding boxes on frames obtained from ARKit callbacks.
My app runs in portrait device orientation, and I pass the .right orientation to the perform method on VNSequenceRequestHandler, something like:
private let requestHandler = VNSequenceRequestHandler()
private var facePoseRequest: VNDetectFaceRectanglesRequest!
// ...
try? self.requestHandler.perform([self.facePoseRequest], on: currentBuffer, orientation: orientation)
I'm setting .right for the orientation above, in the hope that the Vision framework will re-orient the image before running inference.
I'm trying to draw the returned bounding box (BB) on top of the image. Here's my results-processing code:
guard let faceRes = self.facePoseRequest.results?.first as? VNFaceObservation else {
    return
}
// Dimensions of the original, un-oriented pixel buffer (used by both options)
let currBufWidth = CVPixelBufferGetWidth(currentBuffer)
let currBufHeight = CVPixelBufferGetHeight(currentBuffer)

// Option 1: assuming the reported BB is in the coordinate space of the orientation-adjusted pixel buffer
// Problems/observations:
// - The bounding box turns into a square with equal width and height
// - The BB does not cover the entire face, only from chin to eyes
// Notice that height and width are swapped below
let flippedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufHeight, currBufWidth)

// vs.

// Option 2: assuming the reported BB is in the coordinate system of the original, un-oriented pixel buffer
// Problems/observations:
// - The drawn BB looks like a rectangle and covers most of the face, but it is not always centered on the face
// - It moves around the screen when I tilt the device or my face
let reportedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufWidth, currBufHeight)
In Option 1 above: the bounding box becomes a square, with width and height equal. I noticed that the reported normalized BB has the same aspect ratio as the input pixel buffer, which is 1.33. This is why width and height become equal when I swap the width and height parameters in VNImageRectForNormalizedRect.
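To make that arithmetic concrete, here is a small sketch of what I think is happening. The 1920 x 1440 buffer size and the normalized box values are assumptions chosen for illustration (they only match the 1.33 ratio I observed), not values taken from my logs.

```swift
// Hypothetical buffer size with the 1.33 aspect ratio described above.
let bufWidth = 1920.0
let bufHeight = 1440.0

// Hypothetical normalized box whose width/height ratio is also 1.33.
let normW = 0.4
let normH = 0.3

// Option 1: scale with width and height swapped (as if the buffer were rotated).
let option1 = (width: normW * bufHeight, height: normH * bufWidth)

// Option 2: scale with the un-oriented buffer dimensions.
let option2 = (width: normW * bufWidth, height: normH * bufHeight)

print("Option 1:", option1)
print("Option 2:", option2)
```

With these assumed numbers, Option 1 comes out at about 576 x 576 (a square) and Option 2 at about 768 x 432, which matches the shapes I see on screen.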
In Option 2 above: the BB seems to be roughly the right height, but it jumps around when I tilt the device or my head.
What coordinate system are the reported bounding boxes in? Do I need to account for the Vision framework's flipped y-axis before performing the operations above? What's the best way to draw these BBs on the captured frame and/or the AR view?
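For reference, this is the kind of conversion I suspect I need, though I am not sure it is right. It is a minimal sketch that assumes the normalized BB has its origin at the lower-left corner and is relative to the image after the orientation has been applied. The helper name topLeftPixelRect and the example values are my own, purely for illustration.

```swift
import Foundation

/// Converts a normalized rect with a lower-left origin into a pixel rect
/// with a top-left origin (the convention UIKit drawing uses).
/// Assumption: imageWidth/imageHeight are the dimensions AFTER orientation
/// is applied (e.g. 1440 x 1920 for a 1920 x 1440 buffer passed as .right).
func topLeftPixelRect(fromNormalized r: CGRect,
                      imageWidth: Double,
                      imageHeight: Double) -> CGRect {
    let x = Double(r.origin.x) * imageWidth
    let w = Double(r.size.width) * imageWidth
    let h = Double(r.size.height) * imageHeight
    // Flip the y-axis: the top edge of the box in lower-left coordinates is
    // origin.y + height, so its distance from the image top is 1 - that value.
    let normalizedTop = Double(r.origin.y) + Double(r.size.height)
    let y = (1.0 - normalizedTop) * imageHeight
    return CGRect(x: x, y: y, width: w, height: h)
}

// Hypothetical normalized box, not a value from my app.
let normalized = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.375)
let rect = topLeftPixelRect(fromNormalized: normalized,
                            imageWidth: 1440,
                            imageHeight: 1920)
print(rect)
```

This only covers mapping into the oriented image. Mapping from there into the on-screen view (aspect fill, cropping) would be a separate step that I have not worked out.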
Thank you