How to convert in iOS a RealityKit SpatialTapGesture value to an Entity coordinate?

I have an app with a visionOS target, and I want to add an iOS target. Both are based on RealityKit. I want to use a SpatialTapGesture to get the tap coordinate local to the entity tapped. In visionOS this is easy:

SpatialTapGesture(coordinateSpace: .local)
    .targetedToAnyEntity()
    .onEnded { tap in
        let entity = tap.entity
        let localPoint3D = tap.convert(tap.location3D, from: .local, to: entity)
        // …
    }

However, according to the docs, the convert function seems to exist only in visionOS, not in iOS.
So how can I do this conversion in iOS?

PS: This was already posted on StackOverflow without success. There, I tried to find a workaround, but I failed.

Hi @Reinhard_Maenner ,

Could you try something like this?

.gesture(SpatialTapGesture()
    .targetedToAnyEntity()
    .onEnded { value in
        #if os(visionOS)
        let position = value.convert(value.location3D, from: .local, to: value.entity.parent!)
        #else
        let position = value.hitTest(point: value.location, in: .local).first?.position ?? value.entity.position
        #endif
    })

This uses the hitTest API, which is available only on iOS and macOS, so you'll need to use it conditionally.

Thanks for the suggestion to use hitTest. However, it does not work in my case.
Here is an example view:

The scene has 2 entities, a board and a vehicle. If I use your code and tap the corners of the board (sequence front left, back left, back right, front right), I get the positions
SIMD3<Float>(1.4597505, 0.02, -4.9677944)
SIMD3<Float>(1.4252294, 0.02, -2.2314916)
SIMD3<Float>(-1.4423977, 0.02, -2.1940436)
SIMD3<Float>(-1.4726961, 0.02, -4.962293).

If I tap the vehicle, I get the position
SIMD3<Float>(-1.138767, 0.09374983, -4.1757255)

This shows that the returned positions are not local to the tapped entity (as is required in my case) but local to the current view.

@Reinhard_Maenner thanks for getting back to me, would you mind showing a snippet of your code where you add these to the scene as well as the spatial tap gesture?

Sure!
Here is the code that adds the board and a vehicle to the scene. I removed everything that, to my mind, is not essential. I hope this makes sense to you.

struct ImmersiveView: View {  

	@Environment(\.realityKitScene) var scene: RealityKit.Scene?
	@Environment(AppModel.self) private var appModel: AppModel
	@State private var realityContent = RealityContent()
	
	var body: some View {
		RealityView { content, attachments in
			appModel.scene = scene
			let initialEntities = realityContent.setUpContent(boardDimensions: boardDimensions, 
															 boardPosition_m: boardPosition_m, 
															 appModel: appModel)
			initialEntities.forEach { content.add($0) }
		} update: { content, attachments in
			guard let boardEntity = content.entities.first(where: { $0.name == boardEntityName }) as? ModelEntity else { return }
			realityContent.updateVehicles(boardEntity: boardEntity,
										  vehicles: appModel.vehicles,
										  boardTopFaceSize_m: appModel.boardTopFaceSize_m)
		}
	}
}

@MainActor
struct RealityContent {
	func setUpContent(boardDimensions: BoxSize, boardPosition_m: SIMD3<Float>, appModel: AppModel) -> Set<Entity> {
		var entities: Set<Entity> = []
		entities.insert(makeBoardEntity(boardDimensions: boardDimensions, boardPosition_m: boardPosition_m, fields: appModel.fields))
		return entities
	}

	func makeBoardEntity(boardDimensions: BoxSize, boardPosition_m: SIMD3<Float>, fields: Set<Field>) -> Entity {
		// Define geometry
		let boardWidth  = Float(boardDimensions.x_width.m)
		let boardHeight = Float(boardDimensions.y_height.m)
		let boardDepth  = Float(boardDimensions.z_depth.m)
		let mesh = MeshResource.generateBox(width:  boardWidth, 
										   height: boardHeight, 
										   depth:  boardDepth, 
										   splitFaces: true)
		
		// Define materials
		let (materials, textureImage) = makeBoardMaterials(boardDimensions: boardDimensions, fields: fields)
		
		// Make entity
		let boardEntity = ModelEntity(mesh: mesh, materials: materials)
		boardEntity.name = boardEntityName
		
		// Add a custom texture image component. This is required to get the pixel color after a ray cast hit the board.
		boardEntity.components[TextureImageComponent.self] = TextureImageComponent(textureImage: textureImage)
		
		// Position the board
		boardEntity.transform.translation = boardPosition_m
		
		// Allow to tap the board
		boardEntity.components.set(InputTargetComponent())
		boardEntity.generateCollisionShapes(recursive: false)
		
		// Add the board entity to the reality content and return it
		return boardEntity
	}  

	func updateVehicles(boardEntity: ModelEntity, vehicles: Set<Vehicle>, boardTopFaceSize_m: CGSize) {
		for vehicle in vehicles {
			let vehicleEntity = makeVehicleEntity(vehicle: vehicle, boardTopFaceSize_m: boardTopFaceSize_m)
			transformVehicle(boardEntity: boardEntity,
							 vehicleEntity: vehicleEntity,
							 boardTopFacePosition_m: vehicle.onBoardPosition_m,
							 orientation: vehicle.orientation,
							 animated: false)

			// Add the new entity to the board
			boardEntity.addChild(vehicleEntity)
		}
	}

	func transformVehicle(boardEntity: ModelEntity, 
						  vehicleEntity: ModelEntity, 
						  boardTopFacePosition_m: CGPoint, 
						  orientation: Angle,
						  animated: Bool) {
		// Position the vehicle on the board.
		// The y offset relative to the board is board thickness / 2 + half of the vehicle height.
		let halfVehicleHeight_ = vehicleEntity.model!.mesh.bounds.extents.y / 2.0
		let halfVehicleHeight_m = halfVehicleHeight_ * vehicleEntity.scale.y
		let halfBoardThickness_m = Float(boardDimensions.y_height.m) / 2.0
		let yOffset_m = halfVehicleHeight_m + halfBoardThickness_m
		
		// Move the vehicle to the right position and turn it to the right direction
		let scale = vehicleEntity.scale
		let translation = SIMD3<Float>(Float(boardTopFacePosition_m.x), yOffset_m, Float(boardTopFacePosition_m.y))
		let rotationY = simd_quatf(angle: Float(orientation.radians), axis: SIMD3(x: 0, y: 1, z: 0))
		let transform = Transform(scale: scale, rotation: rotationY, translation: translation)
		
		// Move the vehicle with the right orientation to the right position.
		if animated {
			animatedMove(boardEntity: boardEntity, 
						 vehicleEntity: vehicleEntity, 
						 transform: transform, 
						 duration: slowEvolutionTimeStep)
		} else {
			vehicleEntity.transform = transform
		}
	}
}

Hey @Reinhard_Maenner, sorry for the delay.

hitTest only returns positions in the .scene space, so convert the result like this to get the entity's local space:

.gesture(SpatialTapGesture(coordinateSpace: .local)
    .targetedToAnyEntity()
    .onEnded { value in
        #if os(visionOS)
        let position = value.convert(value.location3D, from: .local, to: value.entity)
        print(position, value.entity.name)
        #else
        guard let scenePosition = value.hitTest(point: value.location, in: .local).first?.position else { return }
        let localPosition = value.entity.convert(position: scenePosition, from: value.entity.parent)
        print(localPosition) // correctly prints now
        #endif
    })

This iOS code nearly works in my case. However, there are 2 problems:

If I tap the front left corner of the board (see screenshot above), I get
localPosition: SIMD3<Float>(1.4599041, 0.02, -1.4540071)
Assuming a coordinate system where x points right, y points up, and z points towards the observer, the y and z values are correct, but the x value has the wrong sign.

If I attach a child entity to the board (see above) and tap the child, I get (depending on the child's position relative to the parent) e.g.
localPosition: SIMD3<Float>(627.7879, 6.9999986, 325.40625)
which is not a valid coordinate relative to either the child or the parent.

Maybe you could try to verify this in your setup.

@Reinhard_Maenner this is the code I've been testing with and it prints the coordinates local to both the parent and child depending on which I've tapped. Feel free to try it for yourself:

struct ContentView: View {
    
    var body: some View {
        RealityView { content in
            // Create the parent
            let plane = ModelEntity(mesh: .generateBox(width: 1, height: 0.05, depth: 1), materials: [SimpleMaterial(color: .red, isMetallic: false)])
            plane.components.set(InputTargetComponent())
            plane.generateCollisionShapes(recursive: true)
            plane.name = "plane"
            
            // Create the child
            let cube = ModelEntity(mesh: .generateBox(size: 0.1), materials: [SimpleMaterial(color: .blue, isMetallic: true)])
            cube.components.set(InputTargetComponent())
            cube.components.set(CollisionComponent(shapes: [.generateBox(width: 0.1, height: 0.1, depth: 0.1)]))
            cube.name = "cube"
            
            // Add the child to the plane and set the position
            plane.addChild(cube)
            cube.position.y = plane.position.y + 0.05
            cube.position.x = plane.position.x - 0.3
            content.add(plane)
        }
        .gesture(SpatialTapGesture(coordinateSpace: .local)
            .targetedToAnyEntity()
            .onEnded { value in
            #if os(visionOS)
                let position = value.convert(value.location3D, from: .local, to: value.entity)
                print(position, value.entity.name)
            #else
                guard let scenePosition = value.hitTest(point: value.location, in: .local).first?.position else { return }
                let localPosition = value.entity.convert(position: scenePosition, from: value.entity.parent)
                print(localPosition, value.entity.name)
            #endif
            })
    }
}

Thank you for providing your code. I found the following:
In my demo, the board is added directly to the reality content, while the vehicle is added as a child to the board.
When I tap the board, the board entity's parent is some coreEntity, probably something like the root node of SceneKit. Getting the 3D position using

let localPoint3D_m = value.entity.convert(position: scenePosition, from: value.entity.parent)

gives the same as using

let localPoint3D_m = value.entity.convert(position: scenePosition, from: nil)

Now, if I tap the vehicle,

let localPoint3D_m = value.entity.convert(position: scenePosition, from: value.entity.parent)  

gives some strange coordinate values, e.g.

(SIMD3<Float>) (473.135803, 7.00000191, -524.761536)

whereas

let localPoint3D_m = value.entity.convert(position: scenePosition, from: nil)  

gives

(SIMD3<Float>) (5.38323975, 7.00000191, -3.98590088)  

None of these values makes any sense to me.
Currently, I only need the position tapped on the board, and your solution provides it (apart from the minus sign, which I still do not understand). When I tap the vehicle, I do not need the tapped position on the vehicle, only the tapped entity. So this is now OK for me.
Maybe it is best to wait until RealityKit for iOS has been extended with the missing functions. Thanks a lot for all your work.
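
A possible reading of the numbers above, offered as an assumption rather than a confirmed diagnosis. In RealityKit, convert(position:from:) interprets the point in the space of the reference entity, and nil stands for world space. If the hit-test position really is in scene space, then from: nil would be the matching call, and from: value.entity.parent gives the same result only when the parent has an identity world transform (which fits the board case). The vehicle also appears to be scaled (transformVehicle multiplies the mesh height by vehicleEntity.scale.y), so its local coordinates would be expressed in unscaled model units and can look large. The sketch below reproduces both effects in plain Swift. Frame is a hypothetical stand-in for an entity transform (uniform scale plus translation, no rotation), and all numbers are made up.

```swift
// Hypothetical stand-in for an entity's transform: uniform scale plus
// translation. Rotation is left out to keep the arithmetic easy to follow.
struct Frame {
    var scale: Float
    var translation: SIMD3<Float>

    /// Maps a point from this frame's local space into its parent's space.
    func toParent(_ p: SIMD3<Float>) -> SIMD3<Float> {
        p * scale + translation
    }

    /// Maps a point from the parent's space into this frame's local space.
    func fromParent(_ p: SIMD3<Float>) -> SIMD3<Float> {
        (p - translation) / scale
    }
}

// Made-up setup: a board 3.5 m in front of the origin, and a vehicle on it
// whose model is authored in large units and scaled down by 0.005.
let board = Frame(scale: 1, translation: SIMD3<Float>(0, 0, -3.5))
let vehicle = Frame(scale: 0.005, translation: SIMD3<Float>(0.5, 0.02, 0.25))

// Build a world-space hit from a known point in the vehicle's local space.
let knownLocal = SIMD3<Float>(4, 7, -2)
let worldHit = board.toParent(vehicle.toParent(knownLocal))

// World -> board -> vehicle: what converting from world space amounts to.
let fromWorld = vehicle.fromParent(board.fromParent(worldHit))

// Misreading the world-space hit as if it were already in board space:
// the board's offset is never removed and ends up divided by the scale.
let fromParentMisread = vehicle.fromParent(worldHit)

print("from world: ", fromWorld)
print("from parent:", fromParentMisread)
```

With these made-up values, the first result recovers the local point that was put in, while the second keeps the same y but ends up hundreds of units away in z, because the board's 3.5 m offset is divided by the scale of 0.005. That pattern (identical y, very different x and z) resembles the two vehicle results quoted above, but whether it is the actual cause would have to be checked against the real vehicle scale and board position.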
