Hi @christiandevin
It sounds like you want to convert a 2D point on an image (from the left camera frame) to its corresponding location in 3D space.
Before we discuss this, I want to bring your attention to a bug: the camera intrinsic matrix is row major instead of column major. I suspect this bug is the cause of the unexpected behavior @tsia observed. To account for it, read the focal length and principal point from the transposed positions in the intrinsic matrix (see the snippet below).
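Concretely, with the row-major layout the values land here (sample.parameters.intrinsics is a simd_float3x3; the positions match the snippet at the end of this post):

let intrinsics = sample.parameters.intrinsics
let fx = intrinsics.columns.0.x   // focal length x (on the diagonal, same in either layout)
let fy = intrinsics.columns.1.y   // focal length y (on the diagonal, same in either layout)
let cx = intrinsics.columns.0.z   // principal point x (would be columns.2.x if column major)
let cy = intrinsics.columns.1.z   // principal point y (would be columns.2.y if column major)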
Now let's turn to your goal. I'll refer to the 2D point on the image as the "observation point". Using the camera intrinsic and extrinsic data together with queryDeviceAnchor, you can convert the observation point to a 3D point (in world space) that represents the observation's location on the left camera's projection plane. That's not the same as its position in 3D space. Imagine seeing the world through a piece of glass (which represents the projection plane): the former is a point on that glass, the latter is the actual point out in the world. To get the observation's position in 3D space you need a way to map a 2D point on the projection plane to a depth value. Depth data is not provided by CameraFrameProvider. I encourage you to file an enhancement request via Feedback Assistant with an explanation of your use case and how it would benefit from depth data.
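For illustration, if you did have a depth value, recovering the world-space position would just be scaling along the ray from the device through the observation point on the projection plane. A minimal sketch, assuming a hypothetical depth measured along that ray in meters and world-space positions for both the device and the projection-plane point:

import simd

/// Hypothetical helper: returns the world-space position of an observation,
/// given its point on the projection plane and a depth value
/// (which CameraFrameProvider does not currently supply).
func worldPosition(deviceOrigin: SIMD3<Float>,
                   pointOnProjectionPlane: SIMD3<Float>,
                   depth: Float) -> SIMD3<Float> {
    // Direction from the device through the observation on the projection plane.
    let direction = simd_normalize(pointOnProjectionPlane - deviceOrigin)
    // Walk `depth` meters along that ray to reach the real-world point.
    return deviceOrigin + direction * depth
}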
In the meantime, consider one of the following alternatives to estimate the missing depth value:
- Use SceneReconstructionProvider to create collision shapes for real-world objects, then raycast along the vector from the device to the observation point (see the sketch after this list). This works best on nearby, stationary objects.
- Use a monocular depth estimation model. I haven't tried this, and it doesn't appear trivial to implement.
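Here's a rough sketch of the first alternative. It reuses names from the snippet at the end of this post (observationRoot, observationEntity, deviceAnchor) and introduces hypothetical ones of my own (SceneReconstructionModel, meshEntities, processMeshUpdates); treat it as a starting point rather than a drop-in solution.

import Foundation
import ARKit
import RealityKit

@MainActor
final class SceneReconstructionModel {
    let provider = SceneReconstructionProvider()
    let root = Entity()  // add this to your RealityView content
    private var meshEntities: [UUID: Entity] = [:]

    // Call this after arkitSession.run([..., provider]) succeeds.
    func processMeshUpdates() async {
        for await update in provider.anchorUpdates {
            let anchor = update.anchor
            switch update.event {
            case .added, .updated:
                // Build a static collision shape from the reconstructed mesh.
                guard let shape = try? await ShapeResource.generateStaticMesh(from: anchor) else { continue }
                let entity = meshEntities[anchor.id] ?? Entity()
                entity.transform.matrix = anchor.originFromAnchorTransform
                entity.components.set(CollisionComponent(shapes: [shape], isStatic: true))
                if meshEntities[anchor.id] == nil {
                    meshEntities[anchor.id] = entity
                    root.addChild(entity)
                }
            case .removed:
                meshEntities[anchor.id]?.removeFromParent()
                meshEntities[anchor.id] = nil
            }
        }
    }
}

Then, in the RealityView update closure at the end of this post, after positioning observationEntity, you could raycast against those collision shapes:

// Ray from the device through the observation point on the projection plane.
let deviceTransform = deviceAnchor.originFromAnchorTransform
let origin = SIMD3<Float>(deviceTransform.columns.3.x, deviceTransform.columns.3.y, deviceTransform.columns.3.z)
let target = observationEntity.position(relativeTo: nil)
let direction = simd_normalize(target - origin)
if let hit = observationRoot.scene?.raycast(origin: origin, direction: direction, length: 5, query: .nearest).first {
    // hit.position is an estimate of the barcode's world-space position.
    observationEntity.setPosition(hit.position, relativeTo: nil)
}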
Sometimes code is easier to understand. Here's a snippet that covers the first observation returned by DetectBarcodesRequest with a plane. Note this positions an entity at the xy position of the observation relative to the left camera's projection plane, then scales it to match the size of the barcode; it does not place the plane at the barcode's xyz position.
// in AppModel (requires import Vision and import CoreImage)
guard let pixelBuffer = sample?.pixelBuffer else { return }
let image = CIImage(cvPixelBuffer: pixelBuffer)
let request = DetectBarcodesRequest()
// observations is a property on appModel
do {
    observations = try await request.perform(on: image, orientation: .downMirrored)
} catch {
    observations = []
}
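For completeness, here's one way appModel.start() might feed sample and kick off that detection. This is my assumption about your setup: it presumes AppModel has its own ARKitSession property, uses detectBarcodes() as a hypothetical wrapper around the snippet above, and requires the main camera access entitlement for CameraFrameProvider.

// in AppModel (assumed shape of start(); adapt to your own session handling)
func start() async {
    let cameraFrameProvider = CameraFrameProvider()
    do {
        try await arkitSession.run([cameraFrameProvider])
    } catch {
        return
    }
    let formats = CameraVideoFormat.supportedVideoFormats(for: .main, cameraPositions: [.left])
    guard let format = formats.first,
          let frameUpdates = cameraFrameProvider.cameraFrameUpdates(for: format) else { return }
    for await frame in frameUpdates {
        // Keep the latest left camera sample around for the snippets above and below.
        sample = frame.sample(for: .left)
        await detectBarcodes()  // hypothetical wrapper around the DetectBarcodesRequest snippet above
    }
}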
Then, in the immersive view, position a plane at the observation's xy coordinates relative to the left camera's projection plane:
import SwiftUI
import RealityKit
import ARKit
import QuartzCore
import simd

struct ImmersiveView: View {
    @Environment(AppModel.self) var appModel
    @Environment(\.physicalMetrics) var physicalMetrics
    @State var arkitSession = ARKitSession()
    @State var worldTrackingProvider = WorldTrackingProvider()
    @State var observationRoot = Entity()
    // Entity to represent the barcode
    @State var observationEntity = Entity()

    var body: some View {
        @Bindable var appModel = appModel
        RealityView { content in
            observationEntity.components.set(ModelComponent(
                mesh: .generateBox(width: 2, height: 2, depth: 0.001),
                materials: [SimpleMaterial(color: .green, isMetallic: false)]
            ))
            observationEntity.components.set(OpacityComponent(opacity: 0.5))
            observationEntity.isEnabled = false
            observationRoot.addChild(observationEntity)
            content.add(observationRoot)
        } update: { content in
            guard
                // rect is the first observation of a barcode.
                let rect = appModel.observations.first?.boundingBox.cgRect,
                // sample is the sample returned from CameraFrameProvider.
                let sample = appModel.sample,
                let deviceAnchor = worldTrackingProvider.queryDeviceAnchor(atTimestamp: CACurrentMediaTime())
            else {
                observationEntity.isEnabled = false
                return
            }
            observationEntity.isEnabled = true

            // The intrinsics are stored row major, so the principal point is read from
            // columns.0.z / columns.1.z rather than columns.2.x / columns.2.y.
            let intrinsics = sample.parameters.intrinsics
            let focalLength = Float(physicalMetrics.convert(CGFloat(intrinsics.columns.0.x), to: .meters))
            let focalLengthTransform = Transform(translation: [0, 0, focalLength]).matrix

            // Position an entity to represent the projection plane.
            observationRoot.transform.matrix = deviceAnchor.originFromAnchorTransform
                * sample.parameters.extrinsics.inverse
                * focalLengthTransform

            // Position the barcode relative to the projection plane.
            // Note: you have to account for the different coordinate systems (in this case top-left to Cartesian).
            let centerX = Float(physicalMetrics.convert(CGFloat(intrinsics.columns.0.z), to: .meters))
            let centerY = Float(physicalMetrics.convert(CGFloat(intrinsics.columns.1.z), to: .meters))
            observationEntity.position.x = remap(value: Float(rect.midX), fromRange: [0, 1], toRange: [-centerX, centerX])
            observationEntity.position.y = remap(value: Float(rect.midY), fromRange: [0, 1], toRange: [-centerY, centerY])
            observationEntity.scale.x = Float(rect.width) * centerX
            observationEntity.scale.y = Float(rect.height) * centerY
        }
        .task {
            try? await arkitSession.run([worldTrackingProvider])
            await appModel.start()
        }
    }

    func remap(value: Float, fromRange: SIMD2<Float>, toRange: SIMD2<Float>) -> Float {
        toRange.x + (value - fromRange.x) * (toRange.y - toRange.x) / (fromRange.y - fromRange.x)
    }
}