Matching Virtual Object Depth with ARFrame Estimated Depth Data

I am trying to do a hit test of sorts between a person in my ARFrame and a RealityKit Entity. So far I have been able to take the position value of my entity and project it to a CGPoint, which I can match up with the ARFrame's segmentationBuffer to determine whether a person intersects with that entity. Now I want to find out whether that person is at the same depth as that entity. How do I relate the entity's SIMD3 position value, which I believe is in meters, to the estimatedDepthData value?
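A minimal sketch of that projection step, assuming an ARView named arView and an Entity named entity (both hypothetical names), might look like this:

Code Block
import RealityKit
import UIKit

// A minimal sketch of projecting an entity's world-space position to a view
// point. `arView` and `entity` are assumed to exist elsewhere in the app.
func screenPosition(of entity: Entity, in arView: ARView) -> CGPoint? {
    // Entity position in world space (meters).
    let worldPosition = entity.position(relativeTo: nil)
    // ARView.project(_:) returns the corresponding point in view coordinates,
    // or nil if the point cannot be projected.
    return arView.project(worldPosition)
}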
The depth data in the estimatedDepthData pixel buffer is estimated linear depth, in meters, from the camera's point of view.
So, if you have a pixel where your entity intersects with the segmentationBuffer, you can unproject that position into world space using the estimated linear depth, which you may be able to use as a sort of rough hit test.
This sample contains an unprojection method which may be useful for reference: https://developer.apple.com/documentation/arkit/visualizing_a_point_cloud_using_scene_depth
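As a rough illustration of that idea (not the sample's exact code), the sketch below unprojects a pixel with a known estimated depth back into world space. It assumes the pixel coordinate is expressed at the captured image's resolution (the space the camera intrinsics are defined in) and the default landscape sensor orientation:

Code Block
import ARKit
import simd

// A rough sketch of unprojecting a pixel at a known estimated depth (meters)
// back into ARKit world space. `pixel` is assumed to be in captured-image
// pixel coordinates.
func unproject(pixel: simd_float2, depth: Float, with camera: ARCamera) -> simd_float3 {
    // Back-project the pixel into camera space using the inverse intrinsics.
    let local = camera.intrinsics.inverse * simd_float3(pixel.x, pixel.y, 1) * depth

    // The image's y axis points down and the camera looks along -z, so flip
    // y and z to match ARKit's camera coordinate convention.
    let cameraPoint = simd_float4(local.x, -local.y, -local.z, 1)

    // Move from camera space into world space.
    let world = camera.transform * cameraPoint
    return simd_float3(world.x, world.y, world.z)
}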
Thanks for the suggestion. Since posting this I have indeed been able to get the beginnings of a hit test going with the segmentationBuffer, but then when I try to use the estimatedDepthData, I run into trouble extracting values.
Here's some of my code:
Code Block
let segmentationCols = CVPixelBufferGetWidth(segmentationBuffer)
let segmentationRows = CVPixelBufferGetHeight(segmentationBuffer)
let colPosition = screenPosition.x / UIScreen.main.bounds.width * CGFloat(segmentationCols)
let rowPosition = screenPosition.y / UIScreen.main.bounds.height * CGFloat(segmentationRows)
CVPixelBufferLockBaseAddress(segmentationBuffer, .readOnly)
guard let baseAddress = CVPixelBufferGetBaseAddress(segmentationBuffer) else { return }
let bytesPerRow = CVPixelBufferGetBytesPerRow(segmentationBuffer)
let buffer = baseAddress.assumingMemoryBound(to: UInt8.self)
let index = Int(colPosition) + Int(rowPosition) * bytesPerRow
let b = buffer[index]
if let segment = ARFrame.SegmentationClass(rawValue: b),
   segment == .person,
   let depthBuffer = frame.estimatedDepthData {
    print("Person!")
    CVPixelBufferLockBaseAddress(depthBuffer, .readOnly)
    guard let depthAddress = CVPixelBufferGetBaseAddress(depthBuffer) else { return }
    let depthBytesPerRow = CVPixelBufferGetBytesPerRow(depthBuffer)
    let depthBoundBuffer = depthAddress.assumingMemoryBound(to: Float32.self)
    let depthIndex = Int(colPosition) * Int(rowPosition)
    let depth_b = depthBoundBuffer[depthIndex]
    print(depth_b)
    CVPixelBufferUnlockBaseAddress(depthBuffer, .readOnly)
}
CVPixelBufferUnlockBaseAddress(segmentationBuffer, .readOnly)
I strongly suspect that my problems are in the last two lines inside the if block above, where I compute depthIndex and then read depth_b, but I can't figure out the right values to find the point I want in the estimatedDepthData.
It looks like the error is in how you compute depthIndex:
Code Block
let depthIndex = Int(colPosition) * Int(rowPosition)
You should try:
Code Block
let depthIndex = Int(colPosition) + Int(rowPosition) * width // Where width is CVPixelBufferGetWidth(pixelBuffer)
Hey, thanks for the suggestion. That is actually what I had initially (the same pattern I use for the segmentationBuffer index above), but it wasn't working, so I started messing around with other values to see if I could get something to work. Neither one works, though. Any other ideas? Most examples I've come across are Metal implementations and don't have code that corresponds to what I'm trying to do.
It's difficult to say where you've gone wrong, but the following method will extract the value at the provided image coordinate from the depth texture:
Code Block
extension CVPixelBuffer {
    func value(column: Int, row: Int) -> Float? {
        guard CVPixelBufferGetPixelFormatType(self) == kCVPixelFormatType_DepthFloat32 else { return nil }
        CVPixelBufferLockBaseAddress(self, .readOnly)
        if let baseAddress = CVPixelBufferGetBaseAddress(self) {
            let width = CVPixelBufferGetWidth(self)
            let index = column + row * width
            let offset = index * MemoryLayout<Float>.stride
            let value = baseAddress.load(fromByteOffset: offset, as: Float.self)
            CVPixelBufferUnlockBaseAddress(self, .readOnly)
            return value
        }
        CVPixelBufferUnlockBaseAddress(self, .readOnly)
        return nil
    }
}
I recommend that you start here, make sure you can get valid values, and then work forward from there to see where your issue is. It is likely an error in converting between coordinate spaces somewhere.
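For instance, a hypothetical usage of that extension, scaling a view point straight into the depth buffer's dimensions (screenPosition and frame are placeholder names, and this naive scaling ignores any aspect-ratio mismatch between the view and the buffer):

Code Block
// Hypothetical usage: sample the estimated depth at the buffer pixel a view
// point maps to. `screenPosition` is assumed to be in view points and `frame`
// is the current ARFrame.
if let depthBuffer = frame.estimatedDepthData {
    let width = CVPixelBufferGetWidth(depthBuffer)
    let height = CVPixelBufferGetHeight(depthBuffer)
    let column = Int(screenPosition.x / UIScreen.main.bounds.width * CGFloat(width))
    let row = Int(screenPosition.y / UIScreen.main.bounds.height * CGFloat(height))
    if let depth = depthBuffer.value(column: column, row: row) {
        print("Estimated depth at that pixel: \(depth) meters")
    }
}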
That extension was super helpful and solved my problems, so thank you so much! Comparing the extension to my code, I think the key problem was in fact what you highlighted earlier: I needed to account for the pixel buffer width. In my previous implementation I had been accounting only for the bytes per row, which is what I thought you were saying too, but in fact you need to account for both.
Thanks again!
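To spell out the distinction between bytes per row and width: when the pointer is bound to Float32, the per-row stride in elements is bytesPerRow divided by the element size, which equals the pixel width only when rows carry no padding. A minimal sketch of reading one value that way (depthBuffer, column, and row are placeholders):

Code Block
import CoreVideo

// A minimal sketch of reading a single Float32 depth value, deriving the
// per-row element stride from bytesPerRow rather than assuming it equals
// the pixel width. `depthBuffer`, `column`, and `row` are placeholders.
func depthValue(in depthBuffer: CVPixelBuffer, column: Int, row: Int) -> Float32? {
    CVPixelBufferLockBaseAddress(depthBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthBuffer, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(depthBuffer) else { return nil }

    // bytesPerRow is a byte count; dividing by the element size gives the
    // number of Float32 elements per row, including any padding.
    let rowStride = CVPixelBufferGetBytesPerRow(depthBuffer) / MemoryLayout<Float32>.stride
    let floats = base.assumingMemoryBound(to: Float32.self)
    return floats[column + row * rowStride]
}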
One tricky bit I have discovered is that when working with an iPhone, the screen aspect ratio does not match the aspect ratio of the depth buffer, so translating from the buffer width to the screen position requires disregarding some of the buffer width on each side.
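One way to handle that mapping (a sketch under the assumption that the depth buffer shares the captured image's aspect ratio; the function and parameter names are illustrative) is to invert ARFrame's displayTransform(for:viewportSize:), which describes how the camera image is cropped and rotated to fill the view:

Code Block
import ARKit
import UIKit

// A sketch of converting a view point into depth-buffer pixel coordinates by
// inverting the display transform. `frame`, `depthBuffer`, `viewSize`, and
// `orientation` are assumed inputs.
func bufferCoordinate(for viewPoint: CGPoint,
                      in depthBuffer: CVPixelBuffer,
                      frame: ARFrame,
                      viewSize: CGSize,
                      orientation: UIInterfaceOrientation) -> (column: Int, row: Int) {
    // Normalize the view point to the 0...1 range.
    let normalizedView = CGPoint(x: viewPoint.x / viewSize.width,
                                 y: viewPoint.y / viewSize.height)

    // displayTransform maps normalized image coordinates to normalized view
    // coordinates, so apply its inverse to go from view space back to image space.
    let toImage = frame.displayTransform(for: orientation, viewportSize: viewSize).inverted()
    let normalizedImage = normalizedView.applying(toImage)

    // Scale the normalized image coordinate into the depth buffer's dimensions.
    let column = Int(normalizedImage.x * CGFloat(CVPixelBufferGetWidth(depthBuffer)))
    let row = Int(normalizedImage.y * CGFloat(CVPixelBufferGetHeight(depthBuffer)))
    return (column, row)
}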
@aharriscrowne Please see my solution here:
https://developer.apple.com/forums/thread/705216?answerId=712036022#712036022