What is the difference between sceneView.pointOfView.transform and ARCamera.transform in ARKit?

I have a fair background with camera geometry, but I am new to Swift and ARKit development, and I am getting a bit confused between the different types of transform matrices that can be returned.

In particular, one tutorial I followed had me grabbing the ARSCNView as sceneView and then printing sceneView.pointOfView.transform. This showed me what I expect: when the session is begun, the 4x4 matrix is very close to identity.

However, when using the ARCamera associated with each ARFrame, as in this code snippet:

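```swift
import ARKit

// Requires sceneView.session.delegate = self (e.g. in viewDidLoad).
extension ViewController: ARSessionDelegate {
    // Called for every new ARFrame; frame.camera.transform is the camera-to-world pose.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // simd matrices are column-major; printing the transpose lists them row by row.
        print(frame.camera.transform.transpose)
    }
}
```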


I instead get a camera matrix that looks like this (the transpose is intentional, since I am more familiar with the row-major layout):

```
[ 0 1 0 0]
[-1 0 0 0]
[ 0 0 1 0]
[ 0 0 0 1]
```


I've tried explicitly setting the configuration's worldAlignment property to .gravity and to .gravityAndHeading, but neither produces the expected identity matrix when the session is run.

I also added the .showWorldOrigin debug option, which shows the world axes as I would expect: +X to the right of my phone, +Y up, and -Z down the view axis.

What am I missing here? Can someone please clarify the coordinate systems of the transforms that ARFrame is returning and how they differ from those of ARSCNView?

At the end of the day, I am looking to log the camera pose (rotation and translation) during the session, and I would expect the initial position to have identity rotation and the camera center to be at the origin. How do I achieve that?
Hello,

The difference is that the ARCamera transform does not change with the view orientation: it always uses landscape right to define the camera's x, y, and z axes. That is, +x points toward the edge of the device where the front-facing camera is, +y points toward the left edge of the device (assuming the device is held in portrait), and +z points outward from the screen, toward the user.

The pointOfView transform rotates the ARCamera's transform depending on the view orientation. As a test, you can restrict your app's supported device orientations to "Landscape Right" only, and you will see that the two transforms are equivalent.
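
The rotation described above can be checked against the matrix printed in the question. The following is a minimal sketch, assuming that a portrait view's axes are the ARCamera axes turned +90 degrees about the camera's z axis (an assumption inferred from that printed matrix, not taken from ARKit documentation). Right-multiplying the camera rotation by that z rotation turns the question's matrix into identity. `Mat3`, `rotationZ`, and `isApproximately` are hypothetical helpers; a real app would use `simd_float4x4` instead.

```swift
import Foundation

// Hypothetical minimal 3x3 matrix, stored row-major (m[row][col]) so it reads
// like the transposed matrices printed in the question.
struct Mat3 {
    var m: [[Double]]

    static func * (a: Mat3, b: Mat3) -> Mat3 {
        var r = [[Double]](repeating: [0, 0, 0], count: 3)
        for i in 0..<3 {
            for j in 0..<3 {
                for k in 0..<3 { r[i][j] += a.m[i][k] * b.m[k][j] }
            }
        }
        return Mat3(m: r)
    }

    // Rotation by `angle` radians about the +z axis.
    static func rotationZ(_ angle: Double) -> Mat3 {
        let c = cos(angle), s = sin(angle)
        return Mat3(m: [[c, -s, 0], [s, c, 0], [0, 0, 1]])
    }

    // Element-wise comparison with a tolerance for floating-point noise.
    func isApproximately(_ other: Mat3, tolerance: Double = 1e-9) -> Bool {
        for i in 0..<3 {
            for j in 0..<3 where abs(m[i][j] - other.m[i][j]) > tolerance {
                return false
            }
        }
        return true
    }

    static let identity = Mat3(m: [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
}

// Rotation block of the ARCamera transform from the question (portrait, session start).
let cameraRotation = Mat3(m: [[0, 1, 0], [-1, 0, 0], [0, 0, 1]])

// Assumption: re-express the pose in portrait-view axes with a +90 degree z rotation.
let portraitRotation = cameraRotation * Mat3.rotationZ(.pi / 2)

print(portraitRotation.isApproximately(Mat3.identity))  // true
```

On a device, `frame.camera.viewMatrix(for: .portrait).inverse` should give a comparable portrait-oriented camera-to-world matrix, which is a convenient way to confirm the sign of the angle.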

> At the end of the day, I am looking to log the camera pose (rotation and translation) during the session, and I would expect the initial position to have identity rotation and the camera center to be at the origin. How do I achieve that?

And to elaborate on this question:

The first transform that you receive should have its translation components (the x, y, z components of column 3, the last column) set to zero. However, its rotation would only be identity if the device were perfectly aligned with all three axes of the world coordinate system, which is very unlikely.
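
A quick way to check both parts of that statement on a logged pose is to read the translation from the last column and measure how far the rotation block is from identity using trace(R) = 1 + 2 cos(theta). This sketch assumes the pose is a row-major 4x4 array (the transposed layout from the question); in ARKit the translation is `transform.columns.3`. `translation(of:)` and `rotationAngleFromIdentity(of:)` are hypothetical helpers.

```swift
import Foundation

typealias Pose = [[Double]]  // row-major 4x4: pose[row][col]

// Camera center in world coordinates: x, y, z of the last column.
func translation(of pose: Pose) -> [Double] {
    [pose[0][3], pose[1][3], pose[2][3]]
}

// Angle (radians) between the rotation block and identity, from trace(R) = 1 + 2 cos(theta).
func rotationAngleFromIdentity(of pose: Pose) -> Double {
    let trace = pose[0][0] + pose[1][1] + pose[2][2]
    // Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    let cosine = max(-1.0, min(1.0, (trace - 1) / 2))
    return acos(cosine)
}

// First pose from the question.
let firstPose: Pose = [
    [0, 1, 0, 0],
    [-1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
let t = translation(of: firstPose)
let degrees = rotationAngleFromIdentity(of: firstPose) * 180 / .pi
print("translation:", t, "rotation from identity (deg):", degrees)
```

For the question's matrix this reports a zero translation and a 90 degree rotation, i.e. the camera sits at the origin but its axes are the landscape-based ARCamera axes rather than the world axes.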
Thanks for the response @gchiste! This clarifies things a lot. Of course I don't expect floating point perfection with orientation, but, yeah, generally, I'd expect the orientation to be near identity if aligned to the world coordinate system when the session has begun.

Interestingly though, I observe everything you said but 180 degrees out of phase. That is, when I rotate my phone 90 degrees CCW about the Z axis (from portrait, camera rotates from top to left--I believe this is "Landscape Left"), things align for me (iPhone 11, iOS 14.0.1).

Now, this said, it seems my confusion stemmed from the coordinate systems not being clear. Where is this logged in the developer docs? All coordinate system information seems to provide references to a portrait-view-based system, e.g. ARWorldAlignmentGravity.

Do I understand you correctly when I say that our camera coordinate system is fixed according to the landscape orientation (one of left or right), and that ARCamera.transform is giving us the transform [R t] s.t.

X_w = RX_c + t

where X_w is a 3D point in the world coordinate system and X_c is a 3D point in the camera coordinate system? That is, this is the camera-to-world transformation.

As I understand it, with .gravity alignment, the world coordinate system sets +Y as the anti-gravity vector. Are +X and +Z just random mutually orthonormal vectors? They seem to largely align to the camera's initial orientation (+X is right and -Z is forward) though.

Lastly, is there any way to set this camera coordinate system to align to identity in portrait view? Obviously I can simply provide a rotation matrix if that's what I'm going for, but it might be a useful feature (if it doesn't already exist) to expose camera coordinate system alignment flags similar to the world alignment flags, if only so that developers can be explicit in the code.
Accepted Answer
@meder411


> Interestingly though, I observe everything you said but 180 degrees out of phase. That is, when I rotate my phone 90 degrees CCW about the Z axis (from portrait, camera rotates from top to left--I believe this is "Landscape Left"), things align for me (iPhone 11, iOS 14.0.1).

We are referring to the same orientation. If you set the "Device Orientation" setting to "Landscape Right" in Xcode, your interface will be rotated 90 degrees CCW about the Z axis as you describe, so what you are seeing is correct. (This is landscape right in the sense that it is a landscape device orientation and if the device had a home button, it would be to the right.)

> Now, this said, it seems my confusion stemmed from the coordinate systems not being clear. Where is this logged in the developer docs? All coordinate system information seems to provide references to a portrait-view-based system, e.g. ARWorldAlignmentGravity.

This is mentioned in the ARCamera transform documentation.

> where X_w is a 3D point in the world coordinate system and X_c is a 3D point in the camera coordinate system? That is, this is the camera-to-world transformation.

The two points would have the same translation components, but they would have different rotations unless the interface was also in landscape right.
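
To make the relation X_w = R X_c + t from the question concrete, here is a minimal sketch that maps a camera-space point into world space and back with the inverse X_c = R^T (X_w - t). It assumes the pose is stored as a row-major 4x4 array rather than ARKit's column-major `simd_float4x4`, so it runs without simd; `cameraToWorld` and `worldToCamera` are hypothetical helper names, and the translation (0.1, 0.2, 0.3) is made up for illustration.

```swift
typealias Vec3 = [Double]
typealias Pose4 = [[Double]]  // row-major [R t; 0 0 0 1]

// Camera-to-world: X_w = R X_c + t.
func cameraToWorld(_ pose: Pose4, _ xc: Vec3) -> Vec3 {
    (0..<3).map { i in (0..<3).reduce(pose[i][3]) { $0 + pose[i][$1] * xc[$1] } }
}

// World-to-camera: X_c = R^T (X_w - t). R is orthonormal, so R^T is its inverse.
func worldToCamera(_ pose: Pose4, _ xw: Vec3) -> Vec3 {
    let d = (0..<3).map { xw[$0] - pose[$0][3] }
    return (0..<3).map { i in (0..<3).reduce(0.0) { $0 + pose[$1][i] * d[$1] } }
}

// Rotation from the question plus a hypothetical translation.
let pose: Pose4 = [
    [0, 1, 0, 0.1],
    [-1, 0, 0, 0.2],
    [0, 0, 1, 0.3],
    [0, 0, 0, 1],
]
// A point 1 m in front of the camera (the camera looks down its -z axis).
let xc: Vec3 = [0, 0, -1]
let xw = cameraToWorld(pose, xc)
let back = worldToCamera(pose, xw)
print(xw, back)
```

With the translation set to zero, as in the first frame, the same point lands at world (0, 0, -1), which matches the -Z viewing direction seen with `.showWorldOrigin`.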

> As I understand it, with .gravity alignment, the world coordinate system sets +Y as the anti-gravity vector. Are +X and +Z just random mutually orthonormal vectors? They seem to largely align to the camera's initial orientation (+X is right and -Z is forward) though.

The .gravity coordinate system is thoroughly defined here (is there a part of that definition that you find confusing?): https://developer.apple.com/documentation/arkit/arconfiguration/worldalignment/gravity

> Lastly, is there any way to set this camera coordinate system to align to identity in portrait view? Obviously I can simply provide a rotation matrix if that's what I'm going for, but it might be a useful feature (if it doesn't already exist) to expose camera coordinate system alignment flags similar to the world alignment flags, if only so that developers can be explicit in the code.

There is no API to set this. As you say, you can apply a rotation, but if you'd like to see an API for this specific camera coordinate system transformation, please file an enhancement request using Feedback Assistant.
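
For the original goal of logging poses that start at identity with the camera at the origin, one option (not an ARKit setting) is to express every pose relative to the first one, T_rel = T_0^{-1} T_k. This is a minimal sketch, assuming poses are row-major 4x4 arrays; with simd the same step would be `firstTransform.inverse * frame.camera.transform`. `RelativePoseLogger`, `multiply`, and `rigidInverse` are hypothetical names.

```swift
typealias Matrix4 = [[Double]]  // row-major 4x4

func multiply(_ a: Matrix4, _ b: Matrix4) -> Matrix4 {
    return (0..<4).map { i -> [Double] in
        (0..<4).map { j -> Double in
            (0..<4).reduce(0.0) { $0 + a[i][$1] * b[$1][j] }
        }
    }
}

// Inverse of a rigid transform [R t; 0 1] is [R^T  -R^T t; 0 1].
func rigidInverse(_ m: Matrix4) -> Matrix4 {
    var r: Matrix4 = Array(repeating: [0, 0, 0, 0], count: 4)
    for i in 0..<3 {
        for j in 0..<3 { r[i][j] = m[j][i] }  // R^T
        let dot = (0..<3).reduce(0.0) { $0 + m[$1][i] * m[$1][3] }
        r[i][3] = -dot  // -R^T t
    }
    r[3][3] = 1
    return r
}

// Remembers the first pose it sees and returns every pose relative to it.
struct RelativePoseLogger {
    var first: Matrix4? = nil

    mutating func relativePose(for pose: Matrix4) -> Matrix4 {
        let base = first ?? pose
        first = base
        return multiply(rigidInverse(base), pose)
    }
}

var logger = RelativePoseLogger()
// First pose from the question, then a hypothetical later pose moved 0.5 m along world -Z.
let t0: Matrix4 = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
let t1: Matrix4 = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 1, -0.5], [0, 0, 0, 1]]
let rel0 = logger.relativePose(for: t0)
let rel1 = logger.relativePose(for: t1)
print(rel0)
print(rel1)
```

Because the reference becomes the first camera frame, the logged axes follow ARCamera's landscape convention and +Y is no longer tied to gravity; a fixed rotation is still needed if portrait-style axes are wanted.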

Thanks!

@gchiste: Thank you for resolving most of my initial confusion! One last thing:

Your description of landscapeRight:

> (This is landscape right in the sense that it is a landscape device orientation and if the device had a home button, it would be to the right.)

seems to conflict with the Docs:

> The device is in landscape mode, with the device held upright and the home button on the left side.

I'm sure there's a reason for this, but it's not clear to me. Could you please help me understand the distinction?
@meder411

The landscape right as I have described it seems to align with the docs for UIInterfaceOrientation.landscapeRight. This also seems to align with the results of checking the "Landscape Right" box for the "Device Orientation" setting in Xcode.
@gchiste: Ah, okay that clears it up.

I had confused the landscapeRight flag of UIInterfaceOrientation with that of UIDeviceOrientation.

From UIDeviceOrientation.landscapeRight:

> The device is in landscape mode, with the device held upright and the home button on the left side.

From UIInterfaceOrientation.landscapeRight:

> The device is in landscape mode, with the device upright and the Home button on the right.

It's getting away from my original post, but these conventions will definitely trip up novices like myself, especially since, as you point out, the 'Landscape Right' box for the *'Device Orientation'* setting in Xcode follows the UIInterfaceOrientation definition rather than UIDeviceOrientation.
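
The swap is deliberate in UIKit: the interface orientation landscapeLeft corresponds to the device orientation landscapeRight and vice versa, because turning the device one way means rotating the content the other way. A tiny sketch of that mapping, using hypothetical mirror enums instead of UIKit's types so it runs outside an iOS target:

```swift
// Hypothetical mirrors of UIKit's UIDeviceOrientation / UIInterfaceOrientation,
// declared locally so the sketch runs without UIKit.
enum DeviceOrientation {
    case unknown, portrait, portraitUpsideDown, landscapeLeft, landscapeRight, faceUp, faceDown
}

enum InterfaceOrientation {
    case portrait, portraitUpsideDown, landscapeLeft, landscapeRight
}

// Landscape cases are swapped between the two enums; face-up, face-down,
// and unknown have no interface counterpart.
func interfaceOrientation(for device: DeviceOrientation) -> InterfaceOrientation? {
    switch device {
    case .portrait: return .portrait
    case .portraitUpsideDown: return .portraitUpsideDown
    case .landscapeLeft: return .landscapeRight
    case .landscapeRight: return .landscapeLeft
    case .unknown, .faceUp, .faceDown: return nil
    }
}

// Home button on the left (device .landscapeRight) is interface .landscapeLeft.
if let ui = interfaceOrientation(for: .landscapeRight) {
    print("device .landscapeRight -> interface .\(ui)")
}
```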