In the WWDC session titled "Deep dive into volumes and immersive spaces", the presenters discussed adding a SpatialTrackingSession and an AnchorEntity to detect the floor, but then glossed over some important details. They added a spatial tap gesture to let the user place content relative to the floor anchor, but they left out a lot of information.
.gesture(
    SpatialTapGesture(
        coordinateSpace: .immersiveSpace
    )
    .targetedToAnyEntity()
    .onEnded { value in
        handleTapOnFloor(value: value)
    }
)
My understanding is that an entity has to have an InputTargetComponent and a CollisionComponent for gestures like this to work. How can we add a CollisionComponent to an AnchorEntity when we don't know its size or shape?
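For reference, this is the combination I believe the gesture needs on the target entity (a minimal sketch; the entity and the box size are placeholders, and the unknown real extent is exactly the problem for a floor anchor):
import RealityKit

// Both components appear to be required for a SpatialTapGesture to land on an entity.
let entity = Entity()  // placeholder for whatever the gesture should target
entity.components.set(InputTargetComponent())
entity.components.set(CollisionComponent(shapes: [
    .generateBox(size: [1, 0.001, 1])  // placeholder size; the anchor's real extent is unknown
]))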
I've been trying for days to understand what is happening here and I just don't get it. It is even more frustrating that the example project that Apple released does not contain any of these features.
I would like to be able to:
Detect the floor plane
Get the position/transform of the floor plane
Add a collider to the floor plane
Enable collisions and physics on the floor plane
Enable gestures on the floor plane
It seems to me that the AnchorEntity is placed at an essentially arbitrary position. It has no apparent relationship to the rectangle with the floor label that I can see in the Xcode visualization. It is just a point, not a plane or rect that I can use.
I've tried manually calculating the collision shape after the anchor is detected, but nothing I have tried works. I can't tap on the floor with gestures, I can't drop entities onto the floor, and I can't seem to do anything at all with this floor anchor other than place an entity at a seemingly arbitrary location somewhere on the floor.
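For completeness, this is roughly the direction I've been experimenting with: dropping down to ARKit's PlaneDetectionProvider to read the detected plane's transform and extent and size a collider from them. This is only a sketch written from memory, so the exact property names (especially around the extent) may be slightly off:
import ARKit
import RealityKit

// Sketch: read the detected floor plane directly and build a thin box collider from its extent.
// (runs inside an async, throwing context)
let arSession = ARKitSession()
let planeDetection = PlaneDetectionProvider(alignments: [.horizontal])
try await arSession.run([planeDetection])

for await update in planeDetection.anchorUpdates {
    let plane = update.anchor
    guard plane.classification == .floor else { continue }

    let floorEntity = Entity()
    floorEntity.transform = Transform(matrix: plane.originFromAnchorTransform)

    // The extent may be offset from the anchor origin (anchorFromExtentTransform); ignored here.
    let extent = plane.geometry.extent
    floorEntity.components.set(CollisionComponent(shapes: [
        .generateBox(width: extent.width, height: 0.001, depth: extent.height)
    ]))
    floorEntity.components.set(InputTargetComponent())
    // add floorEntity to the RealityView content
}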
Is there any way at all, with SpatialTrackingSession and AnchorEntity, to get the actual plane that was detected? Here is my current code:
import SwiftUI
import RealityKit
import RealityKitContent

struct FloorExample: View {
    @State var trackingSession: SpatialTrackingSession = SpatialTrackingSession()
    @State var subject: Entity?
    @State var floor: AnchorEntity?

    var body: some View {
        RealityView { content, attachments in
            let session = SpatialTrackingSession()
            let configuration = SpatialTrackingSession.Configuration(tracking: [.plane])
            _ = await session.run(configuration)
            self.trackingSession = session

            let floorAnchor = AnchorEntity(.plane(.horizontal, classification: .floor, minimumBounds: SIMD2(x: 0.1, y: 0.1)))
            floorAnchor.anchoring.physicsSimulation = .none
            floorAnchor.name = "FloorAnchorEntity"
            floorAnchor.components.set(InputTargetComponent())
            floorAnchor.components.set(CollisionComponent(shapes: .init()))
            content.add(floorAnchor)
            self.floor = floorAnchor

            // This is just here to let me see where visionOS decided to "place" the floor anchor.
            let floorPlaced = ModelEntity(
                mesh: .generateSphere(radius: 0.1),
                materials: [SimpleMaterial(color: .black, isMetallic: false)])
            floorAnchor.addChild(floorPlaced)

            if let scene = try? await Entity(named: "AnchorLabsFloor", in: realityKitContentBundle) {
                content.add(scene)

                if let subject = scene.findEntity(named: "StepSphereRed") {
                    self.subject = subject
                }

                // I can see when the anchor is added.
                _ = content.subscribe(to: SceneEvents.AnchoredStateChanged.self) { event in
                    event.anchor.generateCollisionShapes(recursive: true) // this doesn't seem to work
                    print("**anchor changed** \(event)")
                    print("**anchor** \(event.anchor)")
                }

                // Place the reset button near the user.
                if let panel = attachments.entity(for: "Panel") {
                    panel.position = [0, 1, -0.5]
                    content.add(panel)
                }
            }
        } update: { content, attachments in
        } attachments: {
            Attachment(id: "Panel") {
                Button(action: {
                    print("**button pressed**")
                    if let subject = self.subject {
                        subject.position = [-0.5, 1.5, -1.5]
                        // Remove the physics body and assign a new one - hack to remove momentum.
                        if let physics = subject.components[PhysicsBodyComponent.self] {
                            subject.components.remove(PhysicsBodyComponent.self)
                            subject.components.set(physics)
                        }
                    }
                }, label: {
                    Text("Reset Sphere")
                })
            }
        }
    }
}
Subject: Combining ARKit Face Tracking with High-Resolution AVCapture and Perspective Rendering on Front Camera
Message:
Hello Apple Developer Community,
We’re developing an application using the front camera that requires both real-time ARKit face tracking/guidance and the capture of high-resolution still images via AVCaptureSession. Our goal is to leverage ARKit’s depth and face data to render a captured image from another perspective post-capture, maintaining high image quality.
Our Approach:
Real-Time ARKit Guidance:
Utilize ARKit (e.g., ARFaceTrackingConfiguration) for continuous face tracking, depth, and scene understanding to guide the user in real time.
High-Resolution Capture Transition:
At the moment of capture, we plan to pause the ARKit session and switch to an AVCaptureSession to take a high-resolution image.
We assume that for a front-facing image, the subject’s face is directly front-on, and the relative pose between the face and camera remains the same during the transition. The only variation we expect is a change in distance.
Our intention is to minimize the delay between the last ARKit frame and the high-res capture to maintain temporal consistency, assuming that aside from distance, the face-camera relative pose remains unchanged (a rough code sketch of this hand-off appears after this outline).
Post-Processing Perspective Rendering:
Using the last ARKit face data (depth, pose, and landmarks) along with the high-resolution 2D image, we aim to render the scene from another perspective.
We want to correct the perspective of the 2D image using SceneKit or RealityKit, leveraging the collected ARKit scene information to achieve a natural, high-quality rendering from a different viewpoint.
The rendering should match the quality of a normally captured high-resolution image, adjusting for the difference in distance while using the stored ARKit data to correct perspective.
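To make the session hand-off in the capture transition step concrete, here is a minimal sketch of what we have in mind, assuming the TrueDepth front camera; the class and method names are ours and not an established API:
import ARKit
import AVFoundation

// Sketch of the ARKit-to-AVCapture hand-off described above. Names are illustrative.
final class FaceCaptureCoordinator: NSObject, AVCapturePhotoCaptureDelegate {
    let arSession = ARSession()
    private let captureSession = AVCaptureSession()
    private let photoOutput = AVCapturePhotoOutput()
    private(set) var lastFaceAnchor: ARFaceAnchor?  // kept for the post-capture perspective step

    func startGuidance() {
        arSession.run(ARFaceTrackingConfiguration())
    }

    func captureHighResStill() {
        // Remember the most recent face anchor before leaving ARKit.
        lastFaceAnchor = arSession.currentFrame?.anchors.compactMap { $0 as? ARFaceAnchor }.first

        // Pause ARKit so the front camera is free for AVFoundation.
        arSession.pause()

        captureSession.beginConfiguration()
        captureSession.sessionPreset = .photo
        if let device = AVCaptureDevice.default(.builtInTrueDepthCamera, for: .video, position: .front),
           let input = try? AVCaptureDeviceInput(device: device),
           captureSession.canAddInput(input), captureSession.canAddOutput(photoOutput) {
            captureSession.addInput(input)
            captureSession.addOutput(photoOutput)
        }
        captureSession.commitConfiguration()
        captureSession.startRunning()  // in practice, wait until the session is running before capturing

        photoOutput.capturePhoto(with: AVCapturePhotoSettings(), delegate: self)
    }

    func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
        // Combine photo.fileDataRepresentation() with lastFaceAnchor (pose, geometry)
        // in the perspective-rendering step.
    }
}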
Our Questions:
Session Transition Best Practices:
What are the recommended best practices to seamlessly pause ARKit and switch to a high-resolution AVCapture session on the front camera?
How can we minimize user movement or other issues during this brief transition, given our assumption that the face-camera pose remains largely consistent except for distance changes?
Data Integration for Perspective Rendering:
How can we effectively integrate stored ARKit face, depth, and pose data with the high-res image to perform accurate perspective correction or rendering from another viewpoint?
Given that we assume the relative pose is constant except for distance, are there strategies or APIs to leverage this assumption for simplifying the perspective transformation?
Perspective Correction with SceneKit/RealityKit:
What techniques or workflows using SceneKit or RealityKit are recommended for correcting the perspective of a captured 2D image based on ARKit scene data?
How can we use these frameworks to render the high-resolution image from an alternative perspective, while maintaining image quality and fidelity?
Pitfalls and Guidelines:
What common pitfalls should we be aware of when combining ARKit tracking data with high-res capture and post-processing for perspective rendering?
Are there performance considerations, recommended thresholds for acceptable temporal consistency, or validation techniques to ensure the ARKit data remains applicable at the moment of high-res capture?
We appreciate any advice, sample code references, or documentation pointers that could assist us in implementing this workflow effectively.
Thank you!
Hello,
I was looking back into downloading the Tracking geographic locations in AR sample app from https://developer.apple.com/documentation/arkit/tracking-geographic-locations-in-ar
Unfortunately, the Download link points to the .zip of the DisplayingAPointCloudUsingSceneDepth sample project instead.
The exact same issue occurs when trying to download the sample code from https://developer.apple.com/documentation/ARKit/creating-a-fog-effect-using-scene-depth
Wondering if those links are deliberately broken because of possible deprecations.
Thanks to any Apple Engineer willing to look into that.
Hi folks, I’m new to Vision Pro stack, still trying to learn all the nuances. Here is a problem I can’t seem to find an answer.
I placed entity A (a small sphere, radius 0.02) inside entity B (a box, size 0.1). Both entities have a HoverEffectComponent, and both have their InputTargetComponent set to .direct. Entity A is NOT a child of entity B. When I direct-touch entity B, I notice that entity A's hover effect fires as well. This only happens if entity A's position is inside entity B. A gesture targeted only at entity A doesn't work either. I double-checked entity A's collider, which sits inside entity B's collider; my direct touch shouldn't have triggered its hover effect. Does having one collider inside another produce unpredictable behavior? Thanks in advance 🙏🙏🙏
Context: I'm trying to create an invisible bound around entity A, so that when my hand approaches the bound to grab entity A, a nice spotlight hover effect fires on the bound before my hand reaches entity A.
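For reference, a minimal sketch of the setup described above (names, sizes, and positions are illustrative, not the actual project code):
import RealityKit

// Entity A: small sphere, spatially inside entity B but NOT a child of it.
let entityA = ModelEntity(mesh: .generateSphere(radius: 0.02))
entityA.components.set(HoverEffectComponent())
entityA.components.set(InputTargetComponent(allowedInputTypes: .direct))
entityA.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.02)]))

// Entity B: larger box acting as the "invisible bound".
let entityB = ModelEntity(mesh: .generateBox(size: 0.1))
entityB.components.set(HoverEffectComponent())
entityB.components.set(InputTargetComponent(allowedInputTypes: .direct))
entityB.components.set(CollisionComponent(shapes: [.generateBox(size: [0.1, 0.1, 0.1])]))

// Same position, both added as siblings, so A's collider sits inside B's.
entityB.position = [0, 1.2, -0.6]
entityA.position = entityB.position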
I am developing an ARKit based application that requires plane detection of the tabletop at which the user is seated. Early testing was with an iPhone 8 and iPhone 8+. With those devices, ARKit rapidly detected the plane of the tabletop when it was only 8 to 10 inches away. Using iPhone 15 with the same code, it seems to require me to move the phone more like 15 to 16 inches away before detecting the plane of the table. This is an awkward motion for a user seated at a table. To validate that it was not necessarily a feature of my code, I determined that the same behavior results with Apple's sample AR Interaction application. Has anyone else experienced this, and if so, have suggestions to improve the situation?
Hello,
I have downloaded and run the sample object tracking app for visionos.
Now I'm working on my own objects for tracking. I have made a model using Create ML using images of my object.
However, I cannot see how to convert the Create ML output file (***.mlmodel) into a reference object like the files in the sample project.
Is there a tool for converting it?
TIA
As I understand it, there are two ways I can track a hand, or a joint, in RealityKit:
either create an AnchorEntity, for example AnchorEntity(.hand(.left, location: .palm)),
or set up an ARKitSession with a HandTrackingProvider (a lot more code, which I haven't repeated here).
Assuming this is correct, when would I want to use one over the other?
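For reference, here is roughly what the two options look like side by side. This is a sketch for a visionOS immersive app; the joint used in option 2 is just an example, since the hand skeleton has no single "palm" joint:
import RealityKit
import ARKit

// Option 1: AnchorEntity. RealityKit keeps the entity attached to the joint for you.
let palmAnchor = AnchorEntity(.hand(.left, location: .palm))
// add child entities to palmAnchor, then add palmAnchor to the RealityView content

// Option 2: HandTrackingProvider. You receive HandAnchor updates and read joint transforms yourself.
// (runs inside an async, throwing context)
let session = ARKitSession()
let handTracking = HandTrackingProvider()
try await session.run([handTracking])

for await update in handTracking.anchorUpdates {
    let hand = update.anchor
    guard hand.chirality == .left,
          let knuckle = hand.handSkeleton?.joint(.middleFingerKnuckle) else { continue }
    let worldTransform = hand.originFromAnchorTransform * knuckle.anchorFromJointTransform
    // use worldTransform to position or drive your own entities
}
Broadly, the AnchorEntity route is less code when you just need content to follow a joint, while the provider route exposes the raw anchors, timestamps, and joint skeleton for custom logic.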
We applied for the visionOS enterprise permission license, which can help us improve object tracking capabilities on Vision Pro. However, we are unsure how to use it in Unity, specifically how to implement object tracking in Unity and increase the tracking speed.
In ARKit for visionOS, I can track the user's head with a HeadAnchor, but it will not give the location. However, I can get the device's transform by calling queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) on a WorldTrackingProvider.
Why the difference? - if I know the device's transform, I effectively know the head's transform.
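For reference, this is the workaround I mean (a sketch; it needs to run in an async, throwing context while the provider is running):
import ARKit
import QuartzCore

// Read the device (head) transform from a WorldTrackingProvider.
let session = ARKitSession()
let worldTracking = WorldTrackingProvider()
try await session.run([worldTracking])

if let deviceAnchor = worldTracking.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) {
    let headTransform = deviceAnchor.originFromAnchorTransform  // effectively the head's transform
    print(headTransform)
}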
I am using a RealityKit Entity to display virtual content; however, I find that sometimes a real object in front of the virtual content cannot occlude it.
For example, I place an Entity in a room, but when I walk into another room, I can still see the Entity through the wall.
I wonder how should I fix the problem. Thank you!
I am currently creating an app where two people share an instance of an immersive space so that they are able to point to certain things in the immersive space. Right now, other people are hidden behind the immersive space, and even with people awareness enabled for everything, people are still too difficult to see. I've found this documentation (https://developer.apple.com/documentation/arkit/occluding-virtual-content-with-people), which describes what I want to do, but it is only listed as working on iOS and iPadOS. Is there anything similar to this that will work on visionOS?
Is it possible to detect the distance from the Vision Pro to real live objects and people? I tried using scene.raycast to perform a raycast forward from the center of the viewport, but it doesn't seem to react to real-life objects, only entities.
I see mentioned here: https://developer.apple.com/forums/thread/776807?answerId=829576022#829576022, that a raycast with scene reconstruction should allow me to measure that distance, as long as the object is non-moving. How could I accomplish that?
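For anyone following along, my understanding of the technique referenced in that thread is: run a SceneReconstructionProvider and give the reconstructed meshes collision shapes, so a RealityKit scene raycast can hit real, static surfaces. A rough sketch with illustrative names:
import ARKit
import RealityKit

// Sketch: turn reconstructed room meshes into invisible collision entities.
// (runs inside an async, throwing context; entities get added to the RealityView content)
let session = ARKitSession()
let sceneReconstruction = SceneReconstructionProvider()
var meshEntities: [UUID: ModelEntity] = [:]

try await session.run([sceneReconstruction])
for await update in sceneReconstruction.anchorUpdates {
    let anchor = update.anchor
    switch update.event {
    case .added, .updated:
        guard let shape = try? await ShapeResource.generateStaticMesh(from: anchor) else { break }
        let entity = meshEntities[anchor.id] ?? ModelEntity()
        entity.transform = Transform(matrix: anchor.originFromAnchorTransform)
        entity.components.set(CollisionComponent(shapes: [shape], isStatic: true))
        meshEntities[anchor.id] = entity
        // content.add(entity) if it wasn't added before
    case .removed:
        meshEntities[anchor.id]?.removeFromParent()
        meshEntities[anchor.id] = nil
    @unknown default:
        break
    }
}
// A raycast from the device position along its forward vector should then hit these
// colliders, and the hit distance approximates the distance to the real surface.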
As far as I know, Apple hasn’t opened access to the Vision Pro camera for developers yet, so I’m trying to find possible workarounds within the current capabilities. I’m wondering if there’s any way to apply a mesh to a person in the scene in Vision Pro, or if there’s an alternative approach to roughly detect a human shape in front of the user?
I am looking for a material that functions in the same way that Occlusion Material does, except that it only partially occludes whatever is behind it. One way that I have thought of doing this was to change the opacity of the entity that was covered in Occlusion Material, however this did not change anything. Please let me know if this is possible.
Hi, I'm playing with hand tracking now. I want to get the position of a hand inside a System update function. I wasn't sure whether the transform I get from a hand-attached AnchorEntity (with trackingMode: .predicted) would give the same results as handAnchors(at:) from the hand tracking provider, so I started reading both and comparing. For handAnchors I tried using context.scene.timebase.sourceTimebase!.sourceClock!.time.seconds and CACurrentMediaTime() as the timestamp source. They seem to use exactly the same clock, so that doesn't matter, but:
for some reason the update handler is always called twice with the same context.deltaTime; the first time the query finds 0 entities, and the second time it finds them all. The query is the standard EntityQuery(where: .has(MyComponent.self)), used in update(context:) via context.entities(matching: Self.query, updatingSystemWhen: .rendering). Here's part of the logs:
System update called, entity count: 0, dt: 0.01000458374619484, absTime: 4654.222593541
System update called, entity count: 11, dt: 0.01000458374619484, absTime: 4654.22262525
System update called, entity count: 0, dt: 0.009999999776482582, absTime: 4654.249390875
System update called, entity count: 11, dt: 0.009999999776482582, absTime: 4654.249425
Accounting for the double update call, I started to calculate the absolute-time delta between calls, and most of the time it is much bigger or much smaller than the system's context.deltaTime; only sometimes do they roughly match. For example:
system: (dt: 0.01000458374619484)
scene : (dt: 0.021419291667371) (absTime: 4654.222628125001)
and the very next call
system: (dt: 0.010009166784584522)
scene : (dt: 0.0013097083328830195) (absTime: 4654.223937833334)
but sometimes
system: (dt: 0.009999999776482582)
scene : (dt: 0.009112249999816413) (absTime: 4654.351299166668)
Shouldn't those be more or less equal, or am I missing something?
In the end it seems that getting the hand position from the AnchorEntity and from handAnchors(at:) gives roughly the same results, but at different time points, so I'd love to understand the correct way to use them and why time flows differently :).
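For context, here is a stripped-down sketch of the kind of system I'm describing (MyComponent and the system name are placeholders; the HandTrackingProvider is assumed to be created and run on an ARKitSession elsewhere, and the tuple labels of handAnchors(at:) are from memory):
import RealityKit
import ARKit
import QuartzCore

struct MyComponent: Component {}

struct HandComparisonSystem: System {
    static let query = EntityQuery(where: .has(MyComponent.self))
    static var handTracking: HandTrackingProvider?  // shared from wherever the ARKitSession runs

    init(scene: RealityKit.Scene) {}

    func update(context: SceneUpdateContext) {
        let absTime = CACurrentMediaTime()
        guard let provider = Self.handTracking else { return }
        let anchors = provider.handAnchors(at: absTime)

        for entity in context.entities(matching: Self.query, updatingSystemWhen: .rendering) {
            let entityTransform = entity.transformMatrix(relativeTo: nil)        // AnchorEntity-driven
            let providerTransform = anchors.leftHand?.originFromAnchorTransform  // provider-driven
            print("dt: \(context.deltaTime), absTime: \(absTime)")
            _ = (entityTransform, providerTransform)  // compare / log the two transforms here
        }
    }
}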
Hi 26 beta guys,
I have apps using ARKit.
In iPadOS 26 beta, ARKit stops working after switching to other apps.
How to reproduce:
Enable WindowMode in iPadOS 26
Launch my app and start ARSession
Switch to another app (preference app, etc.)
Switch back to my app
AR stops updating the camera feed.
I debug-printed the ARSessionDelegate callbacks and found that
after sessionWasInterrupted was called, sessionInterruptionEnded was never called.
sessionInterruptionEnded is called if WindowMode is disabled.
Is this just a bug in the 26 beta?
I suspect there is a similar problem with the non-AR camera.
Any idea?
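For reference, these are the delegate callbacks I'm logging (simplified):
import ARKit

final class SessionObserver: NSObject, ARSessionDelegate {
    func sessionWasInterrupted(_ session: ARSession) {
        print("sessionWasInterrupted")    // fires when switching to another app
    }
    func sessionInterruptionEnded(_ session: ARSession) {
        print("sessionInterruptionEnded") // never fires after switching back while window mode is enabled
    }
}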
Hello, I am trying to develop an app that broadcasts what the user sees via Apple Vision Pro. I am a graduate student studying at the university.
I have two questions:
If I want to use passthrough in screen capture (in visionOS), do I have to join the Apple Developer Enterprise Program to get the Enterprise API?
And can I buy the Apple Developer Enterprise Program (Enterprise API) with my university account?
Have any of you been able to do this?
Thank you
I tested the new visionOS object tracking and it worked really well.
I have created a reference object using Create ML and it really detected the object.
My question is: does it also work with iOS, and if not right now, is it planned to work on iOS in the future?
In visionOS, I want to make a virtual watch. After actually building it, the rendering of the real hand makes it impossible to see the virtual watch.
How can I set the watch up so that it takes display priority over the hand?
How can I create a 3D model of clothing that behaves like real fabric, with realistic physics? Is it possible to create such a model with photogrammetry? I want to use this model in Apple Vision Pro and interact with it using hand gestures.