Hi @christiandevin
It sounds like you want to convert a 2D point on an image (from the left camera frame) to its corresponding location in 3D space.
Before we discuss that, I want to bring a bug to your attention: the camera intrinsic matrix is row major instead of column major. I suspect this bug is the cause of the unexpected behavior @tsia observed. To account for it, read the focal length and principal point from different positions in the intrinsic matrix (see the snippet below).
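To make the layout concrete, here's a minimal sketch of reading the pinhole parameters under that row-major layout (the helper and the tuple it returns are my own, not part of any API):

```swift
import simd

// Assumes the row-major layout described above: the focal lengths stay on the
// diagonal, but the principal point moves from the third column
// (columns.2.x / columns.2.y) into the third element of the first two columns.
func pinholeParameters(from intrinsics: simd_float3x3)
    -> (focalX: Float, focalY: Float, centerX: Float, centerY: Float) {
    (focalX: intrinsics.columns.0.x,
     focalY: intrinsics.columns.1.y,
     centerX: intrinsics.columns.0.z,
     centerY: intrinsics.columns.1.z)
}
```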
Now let's turn to your goal. I'll refer to the 2D point on the image as the "observation point". Using the camera intrinsics and extrinsics together with queryDeviceAnchor, you can convert the observation point to a 3D point (in world space) that represents the observation's location on the left camera's projection plane. That's not the same as its position in 3D space: imagine seeing the world through a piece of glass (the projection plane); the former is a point on that glass, the latter is the actual point behind it. To get the observation's position in 3D space you need a mapping from 2D points (on the projection plane) to depth, and CameraFrameProvider does not provide depth data. I encourage you to file an enhancement request via Feedback Assistant explaining your use case and how it would benefit from depth data.
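To make the distinction concrete, here's a purely illustrative sketch (the function and its parameters are mine, not any API): every candidate 3D position lies on the ray from the camera through the point on the glass, and only a depth value along that ray picks out the real one.

```swift
import simd

// Purely illustrative: the point on the projection plane and the actual object
// look identical from the camera's point of view; they differ only in how far
// along the ray (the depth `t`) you travel.
func pointAlongObservationRay(cameraOrigin: SIMD3<Float>,
                              pointOnProjectionPlane: SIMD3<Float>,
                              depth t: Float) -> SIMD3<Float> {
    let direction = simd_normalize(pointOnProjectionPlane - cameraOrigin)
    return cameraOrigin + t * direction
}
```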
In the meantime, consider one of the following alternatives to obtain the depth (z) component:
- Use SceneReconstructionProvider to create collision shapes for real-world objects, then raycast along the vector from the device to the observation point (see the sketch after this list). This works best on nearby, stationary objects.
- Use monocular depth estimation. I haven't tried this, and it doesn't appear trivial to implement.
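Here's a minimal sketch of the raycast approach, assuming collision shapes for the real-world geometry already exist in the scene (a setup sketch follows the main snippet at the end of this post). The helper name and the 5 m cast length are my own choices, not an existing API:

```swift
import RealityKit
import simd

// A sketch: cast a ray from the device through the observation point on the
// projection plane and return the first collision hit, which approximates the
// observation's position in world space.
func estimatedWorldPosition(of observationEntity: Entity,
                            devicePosition: SIMD3<Float>,
                            in scene: RealityKit.Scene) -> SIMD3<Float>? {
    let pointOnProjectionPlane = observationEntity.position(relativeTo: nil)
    let direction = simd_normalize(pointOnProjectionPlane - devicePosition)
    let hits = scene.raycast(origin: devicePosition,
                             direction: direction,
                             length: 5, // assumed maximum cast distance, in meters
                             query: .nearest,
                             mask: .all,
                             relativeTo: nil)
    return hits.first?.position
}
```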
Sometimes code is easier to understand, so here's a snippet that covers the first observation returned by DetectBarcodesRequest with a plane. Note that this positions an entity at the observation's x/y position relative to the left camera's projection plane and scales it to match the size of the barcode; it does not place the plane at the barcode's x/y/z position.

First, run barcode detection on the left camera frame:
```swift
// Detect barcodes in the most recent left camera frame.
guard let pixelBuffer = sample?.pixelBuffer else { return }

let image = CIImage(cvPixelBuffer: pixelBuffer)
let request = DetectBarcodesRequest()

do {
    observations = try await request.perform(on: image, orientation: .downMirrored)
} catch {
    observations = []
}
```
Then position a plane at the observation's x/y coordinates relative to the left camera's projection plane:
```swift
import SwiftUI
import RealityKit
import ARKit
import QuartzCore

struct ImmersiveView: View {
    @Environment(AppModel.self) var appModel
    @Environment(\.physicalMetrics) var physicalMetrics
    @State var arkitSession = ARKitSession()
    @State var worldTrackingProvider = WorldTrackingProvider()
    // Root entity that carries the projection-plane transform for the left camera.
    @State var observationRoot = Entity()
    // Plane entity that marks the observation on the projection plane.
    @State var observationEntity = Entity()

    var body: some View {
        @Bindable var appModel = appModel

        RealityView { content in
            observationEntity.components.set(ModelComponent(
                mesh: .generateBox(width: 2, height: 2, depth: 0.001),
                materials: [SimpleMaterial(color: .green, isMetallic: false)]
            ))
            observationEntity.components.set(OpacityComponent(opacity: 0.5))
            observationEntity.isEnabled = false

            observationRoot.addChild(observationEntity)
            content.add(observationRoot)
        } update: { content in
            // Hide the plane unless there's an observation, a camera sample,
            // and a device anchor to work with.
            guard
                let rect = appModel.observations.first?.boundingBox.cgRect,
                let sample = appModel.sample,
                let deviceAnchor = worldTrackingProvider.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) else {
                observationEntity.isEnabled = false
                return
            }

            observationEntity.isEnabled = true

            // Offset by the focal length (converted to meters) to reach the projection plane.
            let intrinsics = sample.parameters.intrinsics
            let focalLength = physicalMetrics.convert(intrinsics.columns.0.x, to: .meters)
            let focalLengthTransform = Transform(translation: [0, 0, focalLength]).matrix

            // World from device, then device from camera (inverse extrinsics),
            // then out to the projection plane.
            observationRoot.transform.matrix = deviceAnchor.originFromAnchorTransform
                * sample.parameters.extrinsics.inverse
                * focalLengthTransform

            // Principal point, read from the row-major positions (columns.0.z and
            // columns.1.z) rather than the expected columns.2.x and columns.2.y.
            let centerX = physicalMetrics.convert(intrinsics.columns.0.z, to: .meters)
            let centerY = physicalMetrics.convert(intrinsics.columns.1.z, to: .meters)

            // Map the normalized bounding box onto the projection plane.
            observationEntity.position.x = remap(value: Float(rect.midX), fromRange: [0, 1], toRange: [-centerX, centerX])
            observationEntity.position.y = remap(value: Float(rect.midY), fromRange: [0, 1], toRange: [-centerY, centerY])

            // Scale the 2 m box down to the barcode's apparent size.
            observationEntity.scale.x = Float(rect.width) * centerX
            observationEntity.scale.y = Float(rect.height) * centerY
        }
        .task {
            try? await arkitSession.run([worldTrackingProvider])
            await appModel.start()
        }
    }

    func remap(value: Float, fromRange: SIMD2<Float>, toRange: SIMD2<Float>) -> Float {
        toRange.x + (value - fromRange.x) * (toRange.y - toRange.x) / (fromRange.y - fromRange.x)
    }
}
```
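If you try the first alternative above (SceneReconstructionProvider plus a raycast), you'll also need collision geometry for the ray to hit. Here's a minimal sketch of that setup following the usual scene-reconstruction pattern; `collisionRoot` is an assumed container entity you'd add to your RealityView content, and the function itself is mine, not an existing API:

```swift
import ARKit
import RealityKit

// A sketch: turn MeshAnchor updates into invisible collision entities so the
// raycast alternative above has geometry to hit.
func processSceneReconstruction(_ provider: SceneReconstructionProvider,
                                collisionRoot: Entity) async {
    var entities: [UUID: Entity] = [:]
    for await update in provider.anchorUpdates {
        switch update.event {
        case .added, .updated:
            guard let shape = try? await ShapeResource.generateStaticMesh(from: update.anchor) else { continue }
            let entity: Entity
            if let existing = entities[update.anchor.id] {
                entity = existing
            } else {
                entity = Entity()
                entities[update.anchor.id] = entity
                collisionRoot.addChild(entity)
            }
            entity.components.set(CollisionComponent(shapes: [shape], isStatic: true))
            entity.setTransformMatrix(update.anchor.originFromAnchorTransform, relativeTo: nil)
        case .removed:
            entities[update.anchor.id]?.removeFromParent()
            entities[update.anchor.id] = nil
        }
    }
}
```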