ARKit3 Hardware Requirement Clarification, and How to Get Depth Info and Occlusion / Face tracking / Motion Capture for Other Users?

Hello. I'm interested in working with ARKit 3 and a couple of iPads to create a multi-user (collaborative) experience, as support for collaborative AR seems to have improved according to WWDC '19.


However, there are a few features and requirements that are unclear to me:


Firstly, on Apple's ARKit 3 page (https://developer.apple.com/augmented-reality/arkit/) there is fine print reading, "People Occlusion and the use of motion capture, simultaneous front and back camera, and multiple face tracking are supported on devices with A12/A12X Bionic chips, ANE, and TrueDepth Camera."


I read this sentence as a computer scientist would: the device must have an A12, ANE, *and* a TrueDepth camera to support any of "people occlusion, motion capture, simultaneous front/back camera, and multiple face tracking." If that is the case, then the only iPad that can use any of these features is the latest iPad Pro, and *not* an Air, which has an A12 but not a TrueDepth camera. (Sidenote: what is ANE? I can't find documentation on it, but I think it has something to do with the machine learning system.) Is this correct--that only the iPad Pro supports any of these features?


I ask because people occlusion is incredibly important for a multi-user experience around a table.


Secondly, Apple talks a lot about face tracking and motion capture, but it sounds like these are only supported on the front-facing camera (facing the person holding the device). Is there no way to do face tracking of your friends who are sharing the experience? In the WWDC demo video, it looks like the motion capture character is being generated from a person in the user's view, and the Minecraft demo shows people in the user's view being mixed with Minecraft content in AR. This suggests that the *back camera* is handling this. Yet, I thought the point of AR was to attach virtual objects to the physical world *in front of you*. Reality Composer has an example with face tracking and a quote bubble that follows a face around, but because I do not have a device with a depth camera, I do not know whether that bubble is meant to follow you, the user, or someone else in the camera's view.


In short, I'm a little confused about what sorts of things I can do with face tracking, people occlusion, and body tracking with respect to other people in a shared AR environment. Which cameras are in use, and which features can I apply to other people as opposed to just myself (selfie style)? My assumption is that motion capture and people occlusion work for multiple people (in the scene), but that face tracking, mesh generation, and facial expression recognition are only for the user holding the iPad (and anyone else standing with the user). Is this correct? If so, is there any other way to get facial mesh generation and expression recognition? How do I get depth information from the scene if TrueDepth won't give me this information?


Lastly, assuming that I CAN do face and body tracking of other people in my view, and that I can do occlusion for other people, would someone direct me to some example code? Specifically I'd like to know how to use the networking / physics system in tandem with all of this so I can have a synchronized experience. I'd also like to use the depth information from the scene (again, if that's possible).


Thank you.

Hiya,


Perhaps the copy on the product page isn't as plain as it could be. My understanding from testing the APIs and watching the WWDC sessions is this:


People Occlusion requires an A12 or A12X Bionic chip (or later)

Motion Capture requires an A12 or A12X Bionic chip (or later)

Multiple face tracking requires an A12 or A12X Bionic chip AND a TrueDepth camera

Simultaneous front and back camera usage requires an A12 or A12X Bionic chip AND a TrueDepth camera (required for the face tracking element of the simultaneous usage)
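In practice you don't need to reason about chip names at all: ARKit exposes support flags for each of these features that you can query at runtime. A minimal sketch (these are the ARKit 3 / iOS 13 APIs; the function name is just for illustration):

```swift
import ARKit

// Sketch: let ARKit itself report which ARKit 3 features this device supports,
// rather than hard-coding a list of chips or models.
func logARKit3Capabilities() {
    // People Occlusion (A12 or later; works with the rear camera)
    let occlusion = ARWorldTrackingConfiguration
        .supportsFrameSemantics(.personSegmentationWithDepth)
    print("People Occlusion supported:", occlusion)

    // Motion Capture (A12 or later; rear camera)
    print("Motion Capture supported:", ARBodyTrackingConfiguration.isSupported)

    // Face tracking (needs the TrueDepth camera)
    print("Face tracking supported:", ARFaceTrackingConfiguration.isSupported)
    print("Max simultaneously tracked faces:",
          ARFaceTrackingConfiguration.supportedNumberOfTrackedFaces)

    // Simultaneous front and back camera usage
    print("World tracking + user face tracking:",
          ARWorldTrackingConfiguration.supportsUserFaceTracking)
}
```

Checking these flags and degrading gracefully is also safer than gating on device model, since support can widen with future hardware.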


ANE is the "Apple Neural Engine", a co-processor in the system-on-a-chip that Apple uses for on-device machine learning inference (presumably a highly optimised matrix co-processor).


Motion Capture (as a skeleton) works using the outward (rear) facing camera. Face tracking requires the TrueDepth camera, which is the front-facing 'selfie' camera.
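Concretely, the configuration you run determines which camera is active. A sketch, assuming you already have an ARSession (the delegate wiring is simplified):

```swift
import ARKit

// Sketch: ARBodyTrackingConfiguration drives the REAR camera and delivers a
// 3D skeleton for a person in front of the device (i.e. someone else, not
// the user); ARFaceTrackingConfiguration drives the TrueDepth front camera.
func startMotionCapture(on session: ARSession) {
    guard ARBodyTrackingConfiguration.isSupported else { return }
    session.run(ARBodyTrackingConfiguration())
}

// In your ARSessionDelegate, body anchors arrive for people the rear camera sees:
func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    for case let body as ARBodyAnchor in anchors {
        // body.skeleton (ARSkeleton3D) holds joint transforms for the tracked person
        print("Tracked a skeleton with",
              body.skeleton.jointModelTransforms.count, "joints")
    }
}
```

So to your question: motion capture and people occlusion apply to other people in the rear camera's view, while the face mesh and expression blend shapes only come from whoever is in front of the TrueDepth camera.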


If you look at the API docs for RealityKit, all the samples are there: https://developer.apple.com/documentation/realitykit
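On the networking side, ARKit 3 adds collaborative sessions: enable `isCollaborationEnabled` and the session emits `ARSession.CollaborationData` blobs that you ship to peers over any transport you like. A sketch, using MultipeerConnectivity as one possible transport (the class and the pre-configured `MCSession` are assumptions for illustration):

```swift
import ARKit
import MultipeerConnectivity

// Sketch: relay ARKit collaboration data between peers so anchors and
// device positions stay synchronized across a shared experience.
class CollaborationHandler: NSObject, ARSessionDelegate {
    let mcSession: MCSession  // assumed set up and connected elsewhere

    init(mcSession: MCSession) { self.mcSession = mcSession }

    func start(on session: ARSession) {
        let config = ARWorldTrackingConfiguration()
        config.isCollaborationEnabled = true
        session.delegate = self
        session.run(config)
    }

    // Outgoing: ARKit hands us collaboration data to send to peers.
    func session(_ session: ARSession,
                 didOutputCollaborationData data: ARSession.CollaborationData) {
        guard let encoded = try? NSKeyedArchiver.archivedData(
            withRootObject: data, requiringSecureCoding: true) else { return }
        try? mcSession.send(encoded, toPeers: mcSession.connectedPeers,
                            with: .reliable)
    }

    // Incoming: feed data received from a peer back into our session.
    func receive(_ data: Data, into session: ARSession) {
        if let collab = try? NSKeyedUnarchiver.unarchivedObject(
            ofClass: ARSession.CollaborationData.self, from: data) {
            session.update(with: collab)
        }
    }
}
```

RealityKit's `SynchronizationService` builds on the same idea for entity state, so it is worth looking at both layers in the docs linked above.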


Hope that helps.
