I am trying to map the 3D skeleton joint positions of an ARBodyAnchor to the real body on the camera image.
I know I could simply use the "detectedBody" of the ARFrame, which would already deliver the normalized 2D position of each joint, but what I am mostly interested in is the z-axis (the distance of each joint to the camera).
I am starting an ARBodyTrackingConfiguration, setting the world alignment to ARWorldAlignmentCamera (in which case the camera transform is an identity matrix), and multiplying each joint transform in model space (via modelTransformForJointName:) with the transform of the ARBodyAnchor. I then tried many different ways to get the joints to line up with the image, for example by multiplying the transforms with the projectionMatrix of the ARCamera. But whatever I do, it never lines up correctly.
For example, there doesn't seem to be a scale factor in the projectionMatrix or the ARBodyAnchor transform: no matter the distance from the camera to the detected body, the scale of the body is always the same.
This means I am missing something important, and I haven't figured out what. Does anyone have an example of how I can get the body to align with the camera image (or get the distance to each joint in any other way)?
Thanks!
Hello @snarp,
You are roughly on the right track, but it's not clear to me exactly where your error is.
Here is a snippet that projects the head joint to normalized view space:
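// Joint transform relative to the body anchor (model space).
guard let headJointModelTransform = bodyAnchor.skeleton.modelTransform(for: .head) else { return }
// Anchor-to-world times joint-to-anchor gives the joint's world transform.
let headJointWorldTransform = bodyAnchor.transform * headJointModelTransform
// Homogeneous world position (w = 1), taken from the translation column.
let headJointWorldPosition = SIMD4(headJointWorldTransform.columns.3[SIMD3(0,1,2)], 1.0)
// Projection and view matrices for the current interface orientation and view size.
let projectionMatrix = frame.camera.projectionMatrix(for: interfaceOrientation, viewportSize: arView.frame.size, zNear: 0.01, zFar: 100)
let viewMatrix = frame.camera.viewMatrix(for: interfaceOrientation)
let viewProjectionMatrix = projectionMatrix * viewMatrix
// World space -> clip space.
let clipSpacePosition = viewProjectionMatrix * headJointWorldPosition
// Perspective divide: clip space -> normalized device coordinates (NDC).
let ndcPosition = (clipSpacePosition / clipSpacePosition.w)[SIMD3(0,1,2)]
// Remap from the [-1, 1] NDC range to [0, 1].
let normalizedPosition = (ndcPosition + 1) / 2
print(normalizedPosition)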
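Since the question is mostly about the distance from each joint to the camera, here is a minimal sketch of one way to get it without going through the projection at all: move the joint's world position into camera space and measure it there. The function name `jointPositionInCameraSpace` and the `translation` helper are hypothetical and only serve the illustration. In a real session the three matrices would presumably come from `modelTransform(for:)`, `bodyAnchor.transform`, and `frame.camera.transform`; here they are stand-in values so the math can be checked in isolation.

```swift
import simd

// Hypothetical helper: returns the joint position expressed in camera space.
// jointModelTransform: joint relative to the body anchor
// anchorTransform:     body anchor relative to the world
// cameraTransform:     camera relative to the world
func jointPositionInCameraSpace(jointModelTransform: simd_float4x4,
                                anchorTransform: simd_float4x4,
                                cameraTransform: simd_float4x4) -> SIMD3<Float> {
    // Joint in world space; the translation column already has w = 1.
    let jointWorldTransform = anchorTransform * jointModelTransform
    let jointWorldPosition = jointWorldTransform.columns.3
    // The inverse of the camera transform maps world space into camera space.
    let p = cameraTransform.inverse * jointWorldPosition
    return SIMD3<Float>(p.x, p.y, p.z)
}

// Hypothetical helper for building stand-in test matrices.
func translation(_ x: Float, _ y: Float, _ z: Float) -> simd_float4x4 {
    var m = matrix_identity_float4x4
    m.columns.3 = SIMD4<Float>(x, y, z, 1)
    return m
}

// Stand-in values: anchor 2 m in front of the camera, joint 0.5 m above the anchor.
let jointInCamera = jointPositionInCameraSpace(
    jointModelTransform: translation(0, 0.5, 0),
    anchorTransform: translation(0, 0, -2),
    cameraTransform: matrix_identity_float4x4)

// Straight-line distance from the camera to the joint.
let distance = simd_length(jointInCamera)
// Depth along the viewing axis; the ARKit camera looks down its negative z-axis.
let depth = -jointInCamera.z
print(distance, depth)
```

This may also explain the missing scale factor: in a perspective projection the distance-dependent scaling only appears after dividing the clip-space position by its w component, so neither the projection matrix nor the anchor transform shows it on its own.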