Hi,
I'm trying to determine whether a point in 3D space is covered by other objects, like a human hand or a wall.
I do not want to use raycasting, so my idea is to calculate two things:
1) the distance between the iPad camera and this point;
2) the position of this 3D point projected onto the 2D arView, and then the depth read from the depthMap at that position.
If the depth is smaller than the distance to the point, I can assume the point is covered by something.
My code works well when the iPad faces the 3D point straight on, but when we rotate the iPad a little, calculation 2 (the depth-based one) picks up an error. It looks like calculations 1 and 2 use two different points on the iPad as a reference (camera position), but I could not find any logic in it.
This is my code:
let viewSize = arView.bounds.size
let frame = arView.session.currentFrame!
// Transform to translate between arView and depth map
let displayTransform = frame.displayTransform(for: arView.interfaceOrientation, viewportSize: viewSize)
guard let depthPixelBuffer = frame.sceneDepth?.depthMap else { return }
let depthWidth = CVPixelBufferGetWidth(depthPixelBuffer)
let depthWidthFloat = CGFloat(depthWidth)
let depthHeight = CVPixelBufferGetHeight(depthPixelBuffer)
let depthHeightFloat = CGFloat(depthHeight)
// Point in 3D space (our point, red square on images)
let object3Dposition = self.position
// Calculate the distance between the camera and the point in 3D space (this one is always correct)
let distanceToObject = distance(object3Dposition, arView.cameraTransform.translation)
// 2D point on ArView projected from 3D position (find where this point will be visible on arView)
guard let pointOnArView = arView.project(object3Dposition) else { return }
// Normalize 2D point (0-1)
let pointOnArViewNormalized = CGPoint(x: pointOnArView.x/viewSize.width, y: pointOnArView.y/viewSize.height)
// Transform from arView coordinates to depth map coordinates
let pointOnDepthMapNormalized = pointOnArViewNormalized.applying(displayTransform.inverted())
// Point on depth map (from normalized coordinates to true coordinates)
let pointOnDepthMap = CGPoint(x: pointOnDepthMapNormalized.x * depthWidthFloat, y: pointOnDepthMapNormalized.y * depthHeightFloat)
guard
    pointOnDepthMap.x >= 0 && pointOnDepthMap.y >= 0
        && pointOnDepthMap.x < depthWidthFloat && pointOnDepthMap.y < depthHeightFloat
else {
    // Point not visible, outside of the screen
    isVisibleByCamera = false
    return
}
// Read the depth from the buffer
let depth: Float32
CVPixelBufferLockBaseAddress(depthPixelBuffer, .readOnly)
let floatBuffer = CVPixelBufferGetBaseAddress(depthPixelBuffer)!
    .assumingMemoryBound(to: Float32.self)
// Get the depth at 'pointOnDepthMap' (convert from X,Y coordinates to a buffer index)
let depthBufferIndex = depthWidth * Int(pointOnDepthMap.y) + Int(pointOnDepthMap.x)
// This depth is incorrect when the iPad is rotated
depth = floatBuffer[depthBufferIndex]
CVPixelBufferUnlockBaseAddress(depthPixelBuffer, .readOnly)
// Treat the point as occluded when the camera-to-point distance exceeds the sampled depth (with a small tolerance)
isVisibleByCamera = distanceToObject <= depth + 0.05
Thank you :)
OK, I figured it out. The depth value is just the "z" component of the camera-space (x, y, z) coordinates, i.e. the distance along the camera's forward axis, not the straight-line distance from the camera, so the comparison above only holds near the image center.
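A minimal sketch of the relationship, assuming a standard pinhole camera model (the helper name and the fx/fy/cx/cy parameters are mine, not part of the original code):

import simd

// Hypothetical helper, not from the post above: recover the straight-line
// camera-to-surface distance from a z-depth sample, given intrinsics already
// scaled to the depth map resolution (fx, fy = focal lengths, cx, cy = principal point).
func euclideanDistance(zDepth: Float, pixelX: Float, pixelY: Float,
                       fx: Float, fy: Float, cx: Float, cy: Float) -> Float {
    let x = (pixelX - cx) * zDepth / fx           // camera-space X
    let y = (pixelY - cy) * zDepth / fy           // camera-space Y
    return simd_length(simd_float3(x, y, zDepth)) // straight-line distance to the sample
}

The code below performs the same unprojection, but also transforms the point into world space before measuring the distance.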
Rest of the code:
// Scale the camera intrinsics down to the depth map resolution
// (cameraToDepthRatio is the camera image width divided by the depth map width).
// intrinsics[0][0] = fx, intrinsics[1][1] = fy, intrinsics[2] holds the principal point (cx, cy).
var intrinsics = cameraIntrinsics
intrinsics[0][0] /= cameraToDepthRatio
intrinsics[1][1] /= cameraToDepthRatio
intrinsics[2][0] /= cameraToDepthRatio
intrinsics[2][1] /= cameraToDepthRatio
// Unproject the depth sample back into camera space
// ('pointOnBuffer' is the same depth map pixel the depth was read from).
let depthMapPixelPoint = pointOnBuffer
let xrw = (Float(depthMapPixelPoint.x) - intrinsics[2][0]) * depth / intrinsics[0][0]
let yrw = (Float(depthMapPixelPoint.y) - intrinsics[2][1]) * depth / intrinsics[1][1]
// Y is UP in camera space, vs. it being DOWN in image space; the camera looks down -Z.
let localPoint = simd_float3(xrw, -yrw, -depth)
// Camera space -> world space ('viewMatrix' is the camera's view matrix, so its inverse maps back to world space)
let worldPoint = viewMatrix.inverse * simd_float4(localPoint, 1)
let result = simd_float3(worldPoint.x, worldPoint.y, worldPoint.z)
// True (Euclidean) distance from the camera to whatever the depth map saw at this pixel
let vector = result - cameraPosition
let distance = simd_length(vector)
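For completeness, a short sketch (reusing the variable names from the snippets above, which is an assumption on my part) of how this corrected distance could replace the raw depth in the original visibility check:

// Sketch: compare the straight-line distance to the queried 3D point against the
// straight-line distance to the surface the depth map saw along the same pixel.
let distanceToObject = simd_length(object3Dposition - cameraPosition)
// 'distance' is the value computed above from the unprojected depth sample.
isVisibleByCamera = distanceToObject <= distance + 0.05 // small tolerance for depth noise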