Custom RealityKit occlusion based on Depth map


I'm trying to determine whether a point in 3D space is covered by another object, such as a human hand or a wall.

I do not want to use raycasting, so my idea is to calculate two things:

1) The distance between the iPad camera and this point.

2) The position of this 3D point projected onto the 2D arView, and then the depth value from depthMap at that position.

If the depth is smaller than the distance to the point, I can assume that the point is covered by something.

My code works well when the iPad faces the 3D point head-on, but when I rotate the iPad a little, calculation 2 (based on depth) develops an error. It looks as if calculations 1 and 2 use two different points on the iPad as their reference (camera position), but I could not find any logic in it.

This is my code:

    let viewSize = arView.bounds.size
    guard let frame = arView.session.currentFrame else { return }

    // Transform to translate between arView and depth map
    let displayTransform = frame.displayTransform(for: arView.interfaceOrientation, viewportSize: viewSize)

    guard let depthPixelBuffer = frame.sceneDepth?.depthMap else { return }
    let depthWidth = CVPixelBufferGetWidth(depthPixelBuffer)
    let depthWidthFloat = CGFloat(depthWidth)
    let depthHeight = CVPixelBufferGetHeight(depthPixelBuffer)
    let depthHeightFloat = CGFloat(depthHeight)

    // Point in 3D space (our point, red square on images)
    let object3Dposition = self.position

    // Calculate distance between camera and point in 3D space (this always works well)
    let distanceToObject = distance(object3Dposition, arView.cameraTransform.translation)

    // 2D point on arView projected from the 3D position (where this point appears on arView)
    guard let pointOnArView = arView.project(object3Dposition) else { return }

    // Normalize 2D point (0-1)
    let pointOnArViewNormalized = CGPoint(x: pointOnArView.x / viewSize.width, y: pointOnArView.y / viewSize.height)

    // Transform from arView position to depthMap position
    let pointOnDepthMapNormalized = pointOnArViewNormalized.applying(displayTransform.inverted())

    // Point on depth map (from normalized coordinates to pixel coordinates)
    let pointOnDepthMap = CGPoint(x: pointOnDepthMapNormalized.x * depthWidthFloat, y: pointOnDepthMapNormalized.y * depthHeightFloat)

    guard
        pointOnDepthMap.x >= 0 && pointOnDepthMap.y >= 0 && pointOnDepthMap.x < depthWidthFloat && pointOnDepthMap.y < depthHeightFloat
    else {
        // Point not visible, outside of screen
        isVisibleByCamera = false
        return
    }

    // Read depth from buffer
    let depth: Float32
    CVPixelBufferLockBaseAddress(depthPixelBuffer, .readOnly)
    let floatBuffer = unsafeBitCast(
        CVPixelBufferGetBaseAddress(depthPixelBuffer),
        to: UnsafeMutablePointer<Float32>.self
    )

    // Get depth at 'pointOnDepthMap' (convert X,Y coordinates to a buffer index).
    // Rows may be padded, so the row length comes from bytes-per-row, not the width.
    let floatsPerRow = CVPixelBufferGetBytesPerRow(depthPixelBuffer) / MemoryLayout<Float32>.stride
    let depthBufferIndex = floatsPerRow * Int(pointOnDepthMap.y) + Int(pointOnDepthMap.x)

    // This depth is incorrect when iPad is rotated
    depth = floatBuffer[depthBufferIndex]
    CVPixelBufferUnlockBaseAddress(depthPixelBuffer, .readOnly)

    if distanceToObject > depth + 0.05 {
        isVisibleByCamera = false
    } else {
        isVisibleByCamera = true
    }


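OK, I figured it out. The value in the depth map is not the straight-line distance from the camera to the surface; it is only the "z" component of the surface point's x,y,z coordinates in camera space. To compare it with the distance to the object, the depth pixel first has to be unprojected into a 3D point.

The rest of the code (cameraIntrinsics, cameraToDepthRatio, pointOnBuffer, viewMatrix and cameraPosition are defined elsewhere and not shown here):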

    // Scale the intrinsics to the depth-map resolution
    // (cameraToDepthRatio is presumably camera image size / depth map size).
    var intrinsics = cameraIntrinsics
    intrinsics[0][0] /= cameraToDepthRatio // fx
    intrinsics[1][1] /= cameraToDepthRatio // fy
    intrinsics[2][0] /= cameraToDepthRatio // cx
    intrinsics[2][1] /= cameraToDepthRatio // cy
    // Unproject the depth-map pixel into camera space (pinhole model).
    let depthMapPixelPoint = pointOnBuffer
    let xrw = ((Float(depthMapPixelPoint.x) - intrinsics[2][0]) * depth / intrinsics[0][0])
    let yrw = (Float(depthMapPixelPoint.y) - intrinsics[2][1]) * depth / intrinsics[1][1]
    // Y is UP in camera space, vs it being DOWN in image space.
    let localPoint = simd_float3(xrw, -yrw, -depth)
    // Camera space to world space.
    let worldPoint = viewMatrix.inverse * simd_float4(localPoint, 1)
    let result = simd_float3(worldPoint.x, worldPoint.y, worldPoint.z)
    // Straight-line distance from the camera to the measured surface point.
    let vector = result - cameraPosition
    let distance = simd_length(vector)
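
If only the distance is needed (not the world-space point), the unprojection above seems to reduce to a closed form, because the camera-to-world transform is rigid and does not change lengths. The sketch below is a minimal illustration under that assumption; `DepthIntrinsics` and `euclideanDistance` are hypothetical names, not ARKit API, and the intrinsics are assumed to be already scaled to the depth-map resolution.

```swift
/// Pinhole intrinsics expressed in depth-map pixels (hypothetical helper type).
struct DepthIntrinsics {
    var fx: Float  // focal length along x, in pixels
    var fy: Float  // focal length along y, in pixels
    var cx: Float  // principal point x, in pixels
    var cy: Float  // principal point y, in pixels
}

/// Converts a z-depth sample at depth-map pixel (u, v) into the
/// straight-line distance from the camera to the surface point.
func euclideanDistance(zDepth: Float, u: Float, v: Float, intrinsics k: DepthIntrinsics) -> Float {
    // Normalized image-plane coordinates of the pixel.
    let xn = (u - k.cx) / k.fx
    let yn = (v - k.cy) / k.fy
    // Length of the camera-space vector (xn * z, yn * z, z).
    return zDepth * (1 + xn * xn + yn * yn).squareRoot()
}

// Made-up numbers for illustration: principal point at the centre of a 256x192 map.
let k = DepthIntrinsics(fx: 200, fy: 200, cx: 128, cy: 96)
let centre = euclideanDistance(zDepth: 1.0, u: 128, v: 96, intrinsics: k)
let offAxis = euclideanDistance(zDepth: 1.0, u: 278, v: 96, intrinsics: k)
print(centre, offAxis)
```

At the principal point the two values coincide; the further the pixel is from the centre, the more the distance exceeds the stored depth, which is the error seen when the iPad is rotated.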

I experimented for a few days and checked other people's solutions, but the problem of incorrect depth values when the iPad is rotated seems to be present in every implementation.

I collected data on the depth error, as a percentage, comparing the real depth with the depth from depthMap for iPad camera angles from 0 to 30 degrees, and there is a clear correlation:

At 30 degrees the error is about 13% (with confidence == 2).

What am I missing here? Any help would be appreciated, thanks :)
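
This figure is what one would expect if the depth map stores the z component rather than the distance. Assuming the angle is measured between the camera's optical axis and the direction to the point, and the error is taken relative to the real distance, z = distance * cos(angle), so the relative error is 1 - cos(angle), roughly 13.4% at 30 degrees. A small sketch to tabulate it (`expectedRelativeError` is a hypothetical helper name):

```swift
import Foundation

/// Relative error (0...1) made by treating z-depth as straight-line distance
/// for a point that is `angleDegrees` away from the camera's optical axis.
func expectedRelativeError(angleDegrees: Double) -> Double {
    let radians = angleDegrees * Double.pi / 180
    return 1 - cos(radians)
}

// Print the expected error for 0, 10, 20 and 30 degrees.
for angle in stride(from: 0.0, through: 30.0, by: 10.0) {
    let percent = expectedRelativeError(angleDegrees: angle) * 100
    print(String(format: "%2.0f deg: %.1f%%", angle, percent))
}
```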

