I'm trying to implement a prototype that renders virtual objects in a mixed immersive space onto the camera frames captured by CameraFrameProvider.
Here is what I have done so far (a rough sketch of the overall setup follows this list):
- Get the camera intrinsics from frame.primarySample.parameters.intrinsics
- Get the camera extrinsics from frame.primarySample.parameters.extrinsics
- Get the device anchor via worldTrackingProvider.queryDeviceAnchor(atTimestamp: CACurrentMediaTime())
- Set up a RealityKit.RealityRenderer to render virtual objects onto the captured camera frames:
let realityRenderer = try RealityKit.RealityRenderer()
realityRenderer.cameraSettings.colorBackground = .outputTexture()
let cameraEntity = PerspectiveCamera()
// see https://developer.apple.com/forums/thread/770235
let cameraTransform = deviceAnchor.originFromAnchorTransform * extrinsics.inverse
cameraEntity.setTransformMatrix(cameraTransform, relativeTo: nil)
cameraEntity.camera.near = 0.01
cameraEntity.camera.far = 100
cameraEntity.camera.fieldOfViewOrientation = .horizontal
// manually calculated based on camera intrinsics
cameraEntity.camera.fieldOfViewInDegrees = 105
realityRenderer.entities.append(cameraEntity)
realityRenderer.activeCamera = cameraEntity
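Roughly, the surrounding setup looks like this (a simplified sketch, not my exact code: the names are placeholders, and main camera access requires the corresponding enterprise entitlement):

import ARKit
import QuartzCore

let arkitSession = ARKitSession()
let cameraFrameProvider = CameraFrameProvider()
let worldTrackingProvider = WorldTrackingProvider()

func startCameraStream() async throws {
    // Run the camera and world-tracking providers in one session.
    try await arkitSession.run([cameraFrameProvider, worldTrackingProvider])

    let formats = CameraVideoFormat.supportedVideoFormats(for: .main, cameraPositions: [.left])
    guard let format = formats.first,
          let updates = cameraFrameProvider.cameraFrameUpdates(for: format) else { return }

    for await frame in updates {
        // Per-frame camera parameters and the current device pose.
        let intrinsics = frame.primarySample.parameters.intrinsics
        let extrinsics = frame.primarySample.parameters.extrinsics
        guard let deviceAnchor = worldTrackingProvider.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) else { continue }
        // ... render virtual objects onto this frame here
        _ = (intrinsics, extrinsics, deviceAnchor)
    }
}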
With this setup, virtual objects that should be visible in the camera frames are clipped out by the camera transform. If I use deviceAnchor.originFromAnchorTransform alone as the camera transform, the virtual objects do get rendered onto the camera frames, but at the wrong positions (I think this is because the camera extrinsics aren't being used to move the camera to the correct pose).
My question is: how should the camera extrinsic matrix be used for this purpose?
Do the camera extrinsics describe an orientation close to the device anchor's, with only a small rotation and position offset? Here are the extrinsics from one camera frame. It looks like the directions of the Y-axis and Z-axis are flipped by the extrinsics, so the camera ends up pointing in the wrong direction.
simd_float4x4([[0.9914258, 0.012555369, -0.13006608, 0.0], // X-axis
[-0.0009778949, -0.9946325, -0.10346654, 0.0], // Y-axis
[-0.13066702, 0.10270659, -0.98609203, 0.0], // Z-axis
[0.024519, -0.019568002, -0.058280986, 1.0]]) // translation
Hi @hale_xie
I did some prototyping over the weekend and came up with something that's close, but not perfect. Specifically, the misalignment increases as the angle between an object and the camera increases. I'd appreciate it if you'd file a feedback request asking for an abstraction that simplifies offline rendering with passthrough; be sure to detail your use case.
Now on to the solution, which uses ProjectiveTransformCameraComponent instead of PerspectiveCamera.
Here's a class to render a scene with passthrough. Construct it with the root entity you want to render. When CameraFrameProvider delivers an update, call render to obtain a UIImage of the scene.
import SwiftUI
import RealityKit
import ARKit

@MainActor
final class EntityToImage {
    let renderer: RealityRenderer?
    let cameraEntity = Entity()

    init(root: Entity) {
        renderer = try? RealityRenderer()
        renderer?.entities.append(root)
        renderer?.entities.append(cameraEntity)
    }

    // Combines the camera intrinsics and extrinsics into a single 4x4 projection
    // matrix for ProjectiveTransformCameraComponent.
    private func computeProjectionMatrix(
        intrinsics: simd_float3x3,
        extrinsics: simd_float4x4
    ) -> simd_float4x4 {
        let rotation = simd_float3x3(extrinsics.columns.0.xyz,
                                     extrinsics.columns.1.xyz,
                                     extrinsics.columns.2.xyz)
        let translation = extrinsics.columns.3.xyz
        let projectionMatrix3x4 = intrinsics * rotation
        let projectionTranslation = intrinsics * translation
        return simd_float4x4(
            simd_float4(projectionMatrix3x4.columns.0, projectionTranslation.x),
            simd_float4(projectionMatrix3x4.columns.1, projectionTranslation.y),
            simd_float4(projectionMatrix3x4.columns.2, projectionTranslation.z),
            simd_float4(0, 0, 0, 1)
        )
    }

    // Builds adjusted intrinsics: zeroes the principal point and converts the
    // focal lengths with PhysicalMetricsConverter.
    private func fixIntrinsics(_ intrinsics: simd_float3x3,
                               physicalMetrics: PhysicalMetricsConverter) -> simd_float3x3 {
        let cx: Float = 0
        let cy: Float = 0
        let fx: Float = -physicalMetrics.convert(intrinsics.columns.0.x, to: .meters) * 2.0
        let fy: Float = -physicalMetrics.convert(intrinsics.columns.1.y, to: .meters) * 2.0
        return simd_float3x3([[fx, 0, cx],
                              [0, fy, cy],
                              [0, 0, 1]])
    }

    func render(sample: CameraFrame.Sample,
                deviceAnchor: DeviceAnchor,
                physicalMetrics: PhysicalMetricsConverter) async throws -> UIImage? {
        guard let renderer = renderer else { return nil }

        // Reads the rendered texture back to the CPU, then composites it over
        // the passthrough frame with Core Image.
        func textureImage(from texture: MTLTexture) -> UIImage? {
            let componentCount = 4
            let bitmapInfo = CGImageByteOrderInfo.order32Big.rawValue | CGImageAlphaInfo.premultipliedLast.rawValue
            let bitsPerComponent = 8
            let colorSpace = CGColorSpace(name: CGColorSpace.sRGB)!
            let bytesPerRow = texture.width * componentCount
            guard let pixelBuffer = malloc(texture.height * bytesPerRow) else {
                return nil
            }
            defer {
                free(pixelBuffer)
            }
            let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
            texture.getBytes(pixelBuffer, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
            let ctx = CGContext(data: pixelBuffer,
                                width: texture.width,
                                height: texture.height,
                                bitsPerComponent: bitsPerComponent,
                                bytesPerRow: bytesPerRow,
                                space: colorSpace,
                                bitmapInfo: bitmapInfo)
            guard let cgImage = ctx?.makeImage() else {
                return nil
            }
            let ciImage = CIImage(cgImage: cgImage)
            let passThroughImage = CIImage(cvPixelBuffer: sample.pixelBuffer)
            let compositedCIImage = ciImage.composited(over: passThroughImage)
            let context = CIContext(options: nil)
            let composited = context.createCGImage(compositedCIImage, from: compositedCIImage.extent)
            return UIImage(cgImage: composited!)
        }

        // Position the camera at the device anchor and drive its projection
        // with a matrix built from this frame's intrinsics and extrinsics.
        let intrinsics = fixIntrinsics(sample.parameters.intrinsics, physicalMetrics: physicalMetrics)
        let extrinsics = sample.parameters.extrinsics
        let projectionMatrix = computeProjectionMatrix(intrinsics: intrinsics, extrinsics: extrinsics)
        let projectiveTransformCameraComponent = ProjectiveTransformCameraComponent(projectionMatrix: projectionMatrix)
        cameraEntity.components.set(projectiveTransformCameraComponent)
        cameraEntity.transform.matrix = deviceAnchor.originFromAnchorTransform
        renderer.activeCamera = cameraEntity

        // Render over a transparent background so the passthrough frame shows through.
        renderer.cameraSettings.colorBackground = .color(.init(gray: 0.0, alpha: 0.0))
        renderer.cameraSettings.antialiasing = .none
        // TODO: if you need an IBL, enable it here
        // renderer.lighting.resource = try await EnvironmentResource(named: "ImageBasedLighting")

        // Size the render target based on the frame's intrinsics.
        let imageWidth: Double = Double(sample.parameters.intrinsics.columns.0.z) * 2.0
        let imageHeight: Double = Double(sample.parameters.intrinsics.columns.1.z) * 2.0
        let contentSize = CGSize(width: imageWidth, height: imageHeight)
        let descriptor = MTLTextureDescriptor()
        descriptor.width = Int(contentSize.width)
        descriptor.height = Int(contentSize.height)
        descriptor.pixelFormat = .rgba8Unorm_srgb
        descriptor.sampleCount = 1
        descriptor.usage = [.renderTarget, .shaderRead, .shaderWrite]
        guard let texture = MTLCreateSystemDefaultDevice()?.makeTexture(descriptor: descriptor) else {
            return nil
        }

        // Render a single frame into the texture, then convert and composite it.
        let image: UIImage? = await withCheckedContinuation { (continuation: CheckedContinuation<UIImage?, Never>) in
            do {
                let output = try RealityRenderer.CameraOutput(RealityRenderer.CameraOutput.Descriptor.singleProjection(colorTexture: texture))
                try renderer.updateAndRender(deltaTime: 0.1, cameraOutput: output, onComplete: { _ in
                    let uiImage = textureImage(from: texture)
                    continuation.resume(returning: uiImage)
                })
            } catch {
                continuation.resume(returning: nil)
            }
        }
        return image
    }
}

extension simd_float4 {
    // The first three components as a simd_float3.
    var xyz: simd_float3 {
        [self.x, self.y, self.z]
    }
}
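Here's a rough sketch of how you might drive this from SwiftUI. Treat it as illustrative rather than tested: it assumes you already have running CameraFrameProvider and WorldTrackingProvider instances plus a chosen CameraVideoFormat, it reads the PhysicalMetricsConverter from the environment, and PassthroughPreview is just a placeholder name.

import SwiftUI
import RealityKit
import ARKit
import QuartzCore

struct PassthroughPreview: View {
    @Environment(\.physicalMetrics) private var physicalMetrics
    @State private var composited: UIImage?

    let entityToImage: EntityToImage
    let cameraFrameProvider: CameraFrameProvider
    let worldTrackingProvider: WorldTrackingProvider
    let format: CameraVideoFormat

    var body: some View {
        Group {
            if let composited {
                Image(uiImage: composited)
                    .resizable()
                    .scaledToFit()
            } else {
                ProgressView()
            }
        }
        .task {
            guard let updates = cameraFrameProvider.cameraFrameUpdates(for: format) else { return }
            for await frame in updates {
                // Pair each camera sample with the device pose at the current time.
                guard let deviceAnchor = worldTrackingProvider.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) else { continue }
                composited = try? await entityToImage.render(sample: frame.primarySample,
                                                             deviceAnchor: deviceAnchor,
                                                             physicalMetrics: physicalMetrics)
            }
        }
    }
}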