I am trying to create an object in an immersive space that is partially transparent (~50% opacity). I have tried a few approaches, including creating a model entity and setting its opacity component to 0.5, and creating a custom material with blending set to transparent at an opacity of 0.5. Both work as intended in many cases, but seemingly at random the object acts like an occlusion material: it blocks any other immersive content behind it and shows the real world instead.
Some notes: I am using RealityKit to render the semi-transparent object and an opaque object that sits behind it. I am using visionOS 2.1 and update the position of the semi-transparent object often. Both objects are ModelEntities.
I would appreciate any guidance on how to implement this. Please let me know if there are any other questions.
Discuss spatial computing on Apple platforms and how to design and build an entirely new universe of apps and games for Apple Vision Pro.
I'm starting my journey in developing an immersive app for visionOS. I've been making steady progress, but I've encountered a specific challenge that I haven't been able to resolve.
I created two ModelEntity objects — a sphere and a cube — and added a DragGesture to the cube. When I drag the cube over the sphere, the two collide correctly, and the collision is logged in the console. So far, everything works as expected.
However, when I try to anchor the cube to my hand, the collision stops working. It's as if the cube loses its ability to detect collisions once it's anchored.
Any guidance or clarification on this behavior would be greatly appreciated.
// ImmersiveView.swift
// estudos_vision
//
// Created by Lailan Rogerio Rodrigues Matos on 15/05/25.
//
import SwiftUI
import RealityKit
import RealityKitContent

struct ImmersiveView: View {
    @Environment(AppModel.self) var appModel
    @State private var session: SpatialTrackingSession?
    @State private var box = ModelEntity()
    @State private var subs: [EventSubscription] = []
    @State private var ballEntity: Entity?

    var body: some View {
        RealityView { content in
            // Load initial content from the RealityKit scene.
            if let immersiveContentEntity = try? await Entity(named: "Immersive", in: realityKitContentBundle) {
                content.add(immersiveContentEntity)
            }

            // Create and run a spatial tracking session.
            let session = SpatialTrackingSession()
            let configuration = SpatialTrackingSession.Configuration(tracking: [.hand])
            _ = await session.run(configuration)
            self.session = session

            // Create a red box.
            let boxMesh = MeshResource.generateBox(size: 0.2)
            let material = SimpleMaterial(color: .red, isMetallic: false)
            box = ModelEntity(mesh: boxMesh, materials: [material])
            box.position.y += 0.15 // Position the box slightly above the origin.

            // Configure the box for user interaction and physics.
            box.components.set(InputTargetComponent(allowedInputTypes: .indirect)) // Make it interactive.
            box.generateCollisionShapes(recursive: false) // Generate collision shapes for physics.
            box.components.set(PhysicsBodyComponent( // Add physics behavior.
                massProperties: .default,
                material: .default,
                mode: .kinematic // Use kinematic mode so it can be moved by user interaction.
            ))
            box.components.set(GroundingShadowComponent(castsShadow: true)) // Add a shadow.
            //content.add(box) //commented out to add to hand anchor

            // Create a left hand anchor and add the box as a child.
            let handAnchor = AnchorEntity(.hand(.left, location: .palm), trackingMode: .continuous)
            handAnchor.addChild(box)
            content.add(handAnchor) // Add the hand anchor to the scene.

            // Create a sphere.
            let ball = ModelEntity(mesh: .generateSphere(radius: 0.15))
            ball.position = [0.0, 1.5, -1.0] // Initial position of the ball.
            ball.generateCollisionShapes(recursive: false) // Add collision.
            ball.name = "Sphere"
            content.add(ball)
            ballEntity = ball

            // Subscribe to collision events between the box and other entities.
            let event = content.subscribe(to: CollisionEvents.Began.self, on: box) { ce in
                print("Collision between \(ce.entityA.name) and \(ce.entityB.name) occurred")
                //ce.entityA.removeFromParent() // removes the colliding object
                //ce.entityB.removeFromParent()
            }
            Task {
                subs.append(event)
            }
        }
        // Add a drag gesture to the box, allowing the user to move it.
        .gesture(
            DragGesture()
                .targetedToEntity(box) // Target the drag gesture to the box.
                .onChanged({ value in
                    // Update the position of the box based on the drag gesture.
                    box.position = value.convert(value.location3D, from: .local, to: box.parent!)
                })
        )
    }
}

#Preview(immersionStyle: .full) {
    ImmersiveView()
        .environment(AppModel())
}
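One thing that may be worth trying, assuming visionOS 2.0 or later: an AnchorEntity appears to place its children in a physics simulation separate from the rest of the scene by default, which would match the symptom described (collisions stop once the box is parented to the hand anchor). The sketch below opts the hand anchor out of that behavior through the anchoring component. The helper name makeSharedPhysicsHandAnchor is hypothetical; the anchor setup itself is copied from the code above.

```swift
import RealityKit

/// Builds a left-palm anchor and attaches `entity` to it.
/// Assumption: setting physicsSimulation to .none keeps the children in the
/// scene's shared physics simulation (requires visionOS 2.0 or later).
@MainActor
func makeSharedPhysicsHandAnchor(holding entity: Entity) -> AnchorEntity {
    let handAnchor = AnchorEntity(.hand(.left, location: .palm), trackingMode: .continuous)
    // Opt out of the anchor's own isolated simulation so the box and the
    // sphere can be tested against each other.
    handAnchor.anchoring.physicsSimulation = .none
    handAnchor.addChild(entity)
    return handAnchor
}
```

In the RealityView closure this would replace the three hand-anchor lines, for example `content.add(makeSharedPhysicsHandAnchor(holding: box))`. The SpatialTrackingSession for hand tracking is still needed, as in the original code.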
Topic: Spatial Computing
SubTopic: General
Hi guys,
In visionOS, when using a ZStack decorated with .glassBackgroundEffect(), you can see the 3D glass background from the front, but when viewed from the side, the view appears to have no thickness.
However, I noticed that in an app built by Apple, when viewing a glass background view from the side, it appears to have thickness.
I tried adding .frame(depth:) to a glass background view, but it appears as two separate layers spaced by the depth value.
My question is:
Is there a view modifier that adds visual thickness to a glass background view, as shown in the picture?
Or, if not, how should I write a custom view modifier to achieve this effect? Thanks!
According to the official documentation, the .blur(radius:) modifier can apply a Gaussian blur to a RealityView. However, when applied directly to a RealityView, nothing inside it (neither 2D attachments nor 3D entities) appears to be blurred.
Here’s the test code:
struct ContentView: View {
    var body: some View {
        VStack(spacing: 20) {
            Text("Above the RealityView")
                .font(.title)
            RealityView { content, attachments in
                if let text = attachments.entity(for: "2dView") {
                    text.position.y = 0.1
                    content.add(text)
                }
                let box = ModelEntity(
                    mesh: .generateBox(size: 0.1),
                    materials: [SimpleMaterial(color: .red, isMetallic: true)]
                )
                content.add(box)
            } attachments: {
                Attachment(id: "2dView") {
                    Text("Above the Box")
                        .font(.title)
                }
            }
            .frame(width: 300, height: 300)
            .border(.blue)
            .blur(radius: 99) // Has no visual effect
            Text("Below the RealityView")
                .font(.subheadline)
        }
        .padding()
    }
}
My question:
How can I make .blur(radius:) visually affect the content rendered in a RealityView?
Can you provide a working example that uses .blur() to visually affect any part of a RealityView?
Thanks!
Could anyone share ideas or a node setup for implementing a Gaussian blur in a Shader Graph material, with a blur size parameter? Thanks!
I have an entity that was created using Mixamo, and it has an animation.
After the animation completes, the mesh of the robot is not where the entity is positioned.
When the animation finishes, I want to set the root entity's transform to the mesh's transform. No transformations are applied to any of the children of the model's root, which means the movement comes from the skeleton as a result of playing the animation.
Is there a way to apply the final position of the skeleton's root to the root entity, so that the entity is positioned where the animation ended just before the next animation plays?
Hello,
I'm developing a visionOS application for Apple Vision Pro that aims to scan unknown physical objects, capture their 3D data (such as meshes or point clouds), and export them as 3D models. Ideally, I'd also like to visualize these reconstructions in real-time within the headset.
This functionality is similar to what's available in Reality Composer on iPad and iPhone, but I'm seeking to implement it natively on Vision Pro.
I've reviewed the visionOS documentation but haven't found clear guidance on accessing LiDAR depth data or performing scene reconstruction.
Specifically, I'm interested in:
1. Accessing LiDAR or depth data from Vision Pro's sensors.
2. Utilizing ARKit's scene reconstruction capabilities on visionOS.
3. Exporting captured 3D data as models (e.g., USDZ or OBJ formats).
Are there APIs or frameworks in visionOS that support these features?
Topic: Spatial Computing
SubTopic: General
In my Reality Composer Pro workflow for Vision Pro development, I’m using xcrun realitytool image to pre-compress textures into .ktx format, typically using ASTC block compression. These textures are used for cubemaps and environment assets.
I’ve noticed that regardless of the image content—whether it’s a highly detailed photo or a completely black image—once compressed with the same ASTC block size (e.g., ASTC_8x8), the resulting .ktx file size is nearly identical. There appears to be no content-aware logic that adapts the compression ratio to the actual texture complexity.
In contrast, Unreal Engine behaves differently: even when all cubemap faces are imported at the same resolution as DDS textures, the engine performs content-aware compression during packaging:
• Low-complexity images are compressed more aggressively
• The final packaged file size varies based on content complexity
Since Reality Composer Pro requires textures to be pre-compressed as .ktx, there’s no opportunity for runtime optimization or per-image compression adjustment.
Just wondering: is there any recommended way to implement content-aware compression for .ktx textures in Reality Composer Pro?
Or any best practices to optimize .ktx sizes based on image complexity?
Thanks!
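The near-identical sizes are consistent with how ASTC works: it is a fixed-rate format in which every block is stored in 128 bits regardless of what the pixels contain, so the payload size depends only on image dimensions and block footprint. Below is a minimal sketch of that arithmetic, assuming a single mip level and ignoring the .ktx container header. The function name and the 2048x2048 face size are illustrative and not taken from realitytool.

```swift
/// Bytes of ASTC payload for one 2D image at a single mip level.
/// Every ASTC block occupies 16 bytes (128 bits), whatever the content.
func astcPayloadBytes(width: Int, height: Int, blockWidth: Int, blockHeight: Int) -> Int {
    // Partial blocks at the right and bottom edges still occupy a full block.
    let blocksAcross = (width + blockWidth - 1) / blockWidth
    let blocksDown = (height + blockHeight - 1) / blockHeight
    return blocksAcross * blocksDown * 16
}

// Hypothetical cubemap: six 2048x2048 faces.
let face8x8 = astcPayloadBytes(width: 2048, height: 2048, blockWidth: 8, blockHeight: 8)
let face4x4 = astcPayloadBytes(width: 2048, height: 2048, blockWidth: 4, blockHeight: 4)
print("8x8 cubemap: \(face8x8 * 6) bytes, 4x4 cubemap: \(face4x4 * 6) bytes")
```

Under this model a black image and a detailed photo at the same resolution and block size produce the same payload, so the available levers are the block footprint, the resolution, and whether mipmaps are stored. Choosing them per image would have to happen before compression.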
I'm developing an AR application for the iPad Pro whose primary purpose is to overlay 3D design data on top of production parts. For alignment, we are using Vuforia (model targets), which works really well locally. However, the further the device moves from the point of original alignment, the more overlay error (drift?) we see.
My primary questions are:
1. Are there any best practices to stabilize frame-to-frame tracking when using model targets? We notice drift as soon as the device starts moving (the drift appears to occur specifically in the direction the device is moving). After about 15 feet of movement, we observe about 3-6" of overlay error.
2. These use cases can be over 100 feet long. To reset drift, we understand we'll need multiple alignment points (model targets) along the way. Is there a standard or best practice for this? For example, a new alignment point every x feet?
3. We are using plane anchors to set our alignment. Typically we attach the anchor to the nearest plane; however, the anchor point (the origin of the model) can be very far away and is often not near where the virtual content is. Could this be the issue? Would moving the anchor closer to the plane we attach it to improve stability? After a few steps, the plane we originally attached to will be out of the field of view anyway.
Thanks in advance!
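As a rough planning aid for the spacing question, the observed figures (3 to 6 inches of error after about 15 feet) can be turned into an estimate. The sketch below assumes error grows linearly with distance travelled, which is a simplification and not a documented property of ARKit or Vuforia. The function name and the 1-inch tolerance are made-up example values.

```swift
/// Distance in feet that can be travelled before drift exceeds `toleranceInches`,
/// assuming drift accumulates linearly: `driftInches` of error per `overFeet` of travel.
func realignmentSpacingFeet(driftInches: Double, overFeet: Double, toleranceInches: Double) -> Double {
    // tolerance / (drift per foot), rearranged to avoid an intermediate division.
    return toleranceInches * overFeet / driftInches
}

// Observed range from the post: 3 to 6 inches after about 15 feet.
let bestCase = realignmentSpacingFeet(driftInches: 3, overFeet: 15, toleranceInches: 1)
let worstCase = realignmentSpacingFeet(driftInches: 6, overFeet: 15, toleranceInches: 1)
print("Re-align every \(worstCase) to \(bestCase) feet for a 1-inch tolerance")
```

With these numbers a 1-inch tolerance works out to a model target roughly every 2.5 to 5 feet, so a 100-foot run would need many targets unless the drift rate itself can be reduced.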
Topic: Spatial Computing
SubTopic: ARKit
Hi Apple Team and Developers,
First of all, I’d like to express my appreciation for the incredible results achieved using PhotogrammetrySession. I’ve been developing a portrait scanning app using Object Capture, and in many tests—especially with human models—I’ve found the reconstructed body surfaces are remarkably smooth and clean, often outperforming tools like Metashape and RealityCapture in terms of aesthetic results.
However, I’ve encountered some challenges when working with complex areas like long hair overlapping the face. For instance, with female models where strands of hair partially occlude the face, the resulting mesh tends to merge the hair and facial geometry. This leads to distorted or “melted” facial features, likely due to ambiguity in the geometry estimation phase.
Feature Suggestion:
Would it be possible to allow developers to supply two versions of the input images:
• One version (original) for texture generation
• A pre-processed version (e.g., contrast-enhanced or CLAHE filtered) to guide mesh reconstruction only
This would give us the flexibility to enhance edge features or shadow detail without affecting the final texture appearance. In other photogrammetry pipelines, applying image enhancement selectively before dense reconstruction improves geometry quality in low-contrast areas.
Question:
Is there any plan to support this kind of two-path workflow in future versions of PhotogrammetrySession? Or perhaps expose more intermediate stages or tunable parameters to developers?
Also, any hints on what we can expect from WWDC 2025 regarding improvements to Object Capture or related vision/3D technologies?
Thanks again for this powerful API. Looking forward to hearing insights from the team and other developers.
Warm regards,
KitCheng
Hi Apple Team,
I’m working on a human portrait scanning application using PhotogrammetrySession, and I’ve been very impressed by the results. Thank you for building such a powerful and accessible photogrammetry solution into macOS!
I do, however, have a question regarding mesh detail limitations on different Mac hardware configurations.
When using PhotogrammetrySession.Request.Detail.custom and trying to set maximumPolygonCount = 1000000, I see the following log message:
Clamped max poly count: 1000000 to device limit. 250000 is used.
This is on an M1 Max with 32 GB RAM.
I’m aware that PhotogrammetrySession.limits can report values like maximumInputImageDimension and maximumNumberOfInputImages, but I haven’t found documentation on how the maximumPolygonCount is determined, and what hardware specs influence it.
Is it tied more to:
• GPU performance (e.g. neural/graphics cores)?
• CPU architecture?
• Memory size or bandwidth?
• Or is it fixed per SoC generation?
I’d love to understand what kind of hardware upgrades (e.g. moving to M4 Pro or increasing RAM) could allow me to increase mesh complexity and generate more detailed models.
Any insights would be greatly appreciated—and if this is covered in upcoming WWDC sessions or documentation, I’d be happy to tune in.
Thanks in advance!
KitCheng
I want to display a huge image in a RealityView in 3D space on Vision Pro. Instead of one giant file, I'm using many large images as tiles.
To achieve this, I generate multiple planes placed exactly beside each other and put one image on each. Although the planes are exactly adjacent, there is still a white gap between them (image below).
**Does anybody know how to fix this issue?**
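For reference, tile positions can be derived from one formula so that neighbouring planes share an edge exactly. If gaps still appear with positions computed this way, the cause may lie elsewhere, for example in how each texture is sampled near its border, though that is an assumption and not something confirmed here. The sketch uses hypothetical names and plain Float coordinates.

```swift
/// Centre of each tile in a `columns` x `rows` grid of tiles sized
/// `tileWidth` x `tileHeight`, laid out so that adjacent tiles share an edge
/// and the whole grid is centred on the origin. Row 0 is the top row.
func tileCenters(columns: Int, rows: Int, tileWidth: Float, tileHeight: Float) -> [(x: Float, y: Float)] {
    let totalWidth = Float(columns) * tileWidth
    let totalHeight = Float(rows) * tileHeight
    var centers: [(x: Float, y: Float)] = []
    for row in 0..<rows {
        for column in 0..<columns {
            let x = -totalWidth / 2 + (Float(column) + 0.5) * tileWidth
            let y = totalHeight / 2 - (Float(row) + 0.5) * tileHeight
            centers.append((x: x, y: y))
        }
    }
    return centers
}

// Example: a 2 x 2 grid of 1.0 m x 0.5 m tiles.
let centers = tileCenters(columns: 2, rows: 2, tileWidth: 1.0, tileHeight: 0.5)
print(centers)
```

Each centre would become the x and y position of one plane entity whose mesh has exactly the tile's width and height.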
Topic: Spatial Computing
SubTopic: General
Tags: RealityKit, Reality Composer Pro, Shader Graph Editor, visionOS
I am allowing users to capture different rooms and add a custom label to each room. Is there a way to store this data in the captured room so that it persists into the final merge? As it is now, my users mark all their merges with custom labels, but after merging there is no way to tell which room is which, so they have to go through and manually add the labels back. For larger floor plans this is not ideal.
In several visionOS apps, we readjust our scenes to the user's eye level (their head position). However, we have encountered issues where the WorldTrackingProvider returns incorrect positions for the first several frames.
See the code below, which you can copy and paste into any immersive space. Relaunch the space and observe that the numberOfBadWorldInfos value is inconsistent between launches.
a. What is the most reliable way to get the device's position?
b. Is this indeed a bug?
c. Are we using worldInfo improperly?
d. As a workaround, our apps let 10 frames pass before using worldInfo. Should we set our threshold differently?
import ARKit
import Combine
import OSLog
import SwiftUI
import RealityKit
import RealityKitContent

let SUBSYSTEM = Bundle.main.bundleIdentifier!

struct ImmersiveView: View {
    let logger = Logger(subsystem: SUBSYSTEM, category: "ImmersiveView")
    let session = ARKitSession()
    let worldInfo = WorldTrackingProvider()
    @State var sceneUpdateSubscription: EventSubscription? = nil
    @State var deviceTransform: simd_float4x4? = nil
    @State var numberOfBadWorldInfos = 0
    @State var isBadWorldInfoLoged = false

    var body: some View {
        RealityView { content in
            try? await session.run([worldInfo])
            sceneUpdateSubscription = content.subscribe(to: SceneEvents.Update.self) { event in
                guard let pose = worldInfo.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) else {
                    return
                }
                // `worldInfo` does not return correct values for the first few frames (exact number of frames is unknown)
                // - known SO: https://stackoverflow.com/questions/78396187/how-to-determine-the-first-reliable-position-of-the-apple-vision-pro-device
                deviceTransform = pose.originFromAnchorTransform
                if deviceTransform!.columns.3.y < 1.6 {
                    numberOfBadWorldInfos += 1
                    logger.warning("\(#function) \(#line) deviceTransform.columns.3.y \(deviceTransform!.columns.3.y), numberOfBadWorldInfos \(numberOfBadWorldInfos)")
                } else {
                    if !isBadWorldInfoLoged {
                        logger.info("\(#function) \(#line) deviceTransform.columns.3.y \(deviceTransform!.columns.3.y), numberOfBadWorldInfos \(numberOfBadWorldInfos)")
                    }
                    isBadWorldInfoLoged = true // stop logging.
                }
            }
        }
    }
}
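Regarding (d), one alternative to a fixed frame count is to wait until consecutive readings agree with each other. The sketch below is not an ARKit API: PoseWarmUpGate, the window of samples, and the 2 cm tolerance are all assumptions that would need tuning on device. It takes plain Float heights, such as the columns.3.y value logged above, so it carries no framework dependency.

```swift
/// Reports "stable" once the last `window` height samples all lie within
/// `tolerance` metres of each other. Purely illustrative; not part of ARKit.
struct PoseWarmUpGate {
    let window: Int
    let tolerance: Float
    private var recent: [Float] = []

    init(window: Int = 10, tolerance: Float = 0.02) {
        self.window = window
        self.tolerance = tolerance
    }

    /// Feed one height sample; returns true once the recent samples agree.
    mutating func accept(_ height: Float) -> Bool {
        recent.append(height)
        if recent.count > window { recent.removeFirst() }
        guard recent.count == window,
              let low = recent.min(), let high = recent.max() else { return false }
        return high - low <= tolerance
    }
}

// Example with made-up samples: early readings jump around, then settle.
var gate = PoseWarmUpGate(window: 3, tolerance: 0.02)
let samples: [Float] = [0.0, 0.4, 1.2, 1.61, 1.62, 1.62]
let results = samples.map { gate.accept($0) }
print(results)
```

In the update handler, the scene would be repositioned only after accept(_:) first returns true. Unlike the 1.6 m check in the code above, this does not depend on the user's height or on whether they are seated.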
I want to implement the functionality shown in this video. How should I set up the window?
I am trying to launch a fully immersive game from Unity on a SwiftUI view. The game is using Metal Rendering with Compositor Services.
I added the Unity Xcode project to the workspace and added the necessary bridge code. When I click the button that calls ufw?.showUnityWindow(), the game does not start and I get the following in the console:
AR session failed to start after 5 seconds. Is the app configured to use an immersive space?
I have two RealityViews: ParentView and ChildView. When I click the button in ParentView, ChildView is shown as a full-screen cover, but the camera feed in ChildView does not appear; there is only a black screen.
If I show ChildView directly, the camera feed works.
Could you please help me with this issue? Thanks.
import RealityKit
import SwiftUI

struct ParentView: View {
    @State private var showIt = false

    var body: some View {
        ZStack {
            RealityView { content in
                content.camera = .virtual
                let box = ModelEntity(mesh: MeshResource.generateSphere(radius: 0.2), materials: [createSimpleMaterial(color: .red)])
                content.add(box)
            }
            Button("Click here") {
                showIt = true
            }
        }
        .fullScreenCover(isPresented: $showIt) {
            ChildView()
                .overlay(
                    Button("Close") {
                        showIt = false
                    }.padding(20),
                    alignment: .bottomLeading
                )
        }
        .ignoresSafeArea(.all)
    }
}

import ARKit
import RealityKit
import SwiftUI

struct ChildView: View {
    var body: some View {
        RealityView { content in
            content.camera = .spatialTracking
        }
    }
}
SpatialEventGesture Not Working to Show Hidden Menu in Immersive Panorama View - visionOS
Problem Description
I'm developing a Vision Pro app that displays 360° panoramic photos in a full immersive space. I have a floating menu that auto-hides after 5 seconds, and I want users to be able to show the menu again using spatial gestures (particularly pinch gestures) when it's hidden.
However, the SpatialEventGesture implementation is not working as expected. The menu doesn't appear when users perform pinch gestures or other spatial interactions in the immersive space.
Current Implementation
Here's the relevant gesture detection code in my ImmersiveView:
import SwiftUI
import RealityKit

struct ImmersiveView: View {
    @EnvironmentObject var appModel: AppModel
    @Environment(\.openWindow) private var openWindow

    var body: some View {
        RealityView { content in
            // RealityView content setup with panoramic sphere...
            let rootEntity = Entity()
            content.add(rootEntity)
            // Load panoramic content here...
        }
        // Using SpatialEventGesture to handle multiple spatial gestures
        .gesture(
            SpatialEventGesture()
                .onEnded { eventCollection in
                    // Check menu visibility state
                    if !appModel.isPanoramaMenuVisible {
                        // Iterate through event collection to handle various gestures
                        for event in eventCollection {
                            switch event.kind {
                            case .touch:
                                print("Detected spatial touch gesture, showing menu")
                                showMenuWithGesture()
                                return
                            case .indirectPinch:
                                print("Detected spatial pinch gesture, showing menu")
                                showMenuWithGesture()
                                return
                            case .pointer:
                                print("Detected spatial pointer gesture, showing menu")
                                showMenuWithGesture()
                                return
                            @unknown default:
                                print("Detected unknown spatial gesture: \(event.kind)")
                                showMenuWithGesture()
                                return
                            }
                        }
                    }
                }
        )
        // Keep long press gesture as backup
        .simultaneousGesture(
            LongPressGesture(minimumDuration: 1.5)
                .onEnded { _ in
                    if !appModel.isPanoramaMenuVisible {
                        print("Detected long press gesture, showing menu")
                        showMenuWithGesture()
                    }
                }
        )
    }

    private func showMenuWithGesture() {
        if !appModel.isPanoramaMenuVisible {
            appModel.showPanoramaMenu()
            if !appModel.windowExists(id: "PanoramaMenu") {
                openWindow(id: "PanoramaMenu", value: "menu")
            }
        }
    }
}
What I've Tried
1. Multiple SpatialTapGesture approaches: Originally tried using multiple .gesture() modifiers with SpatialTapGesture(count: 1) and SpatialTapGesture(count: 2), but realized they override each other.
2. SpatialEventGesture implementation: Switched to SpatialEventGesture to handle multiple event types (.touch, .indirectPinch, .pointer), but pinch gestures still don't trigger the menu.
3. Added debugging: Console logs show that the gesture callbacks are never called when performing pinch gestures in the immersive space.
4. Backup LongPressGesture: Added a simultaneous long press gesture as backup, which also doesn't work consistently.
Expected Behavior
When the panorama menu is hidden (after 5-second auto-hide), users should be able to:
• Perform a pinch gesture (indirect pinch) to show the menu
• Tap in space to show the menu
• Use other spatial gestures to show the menu
Questions
1. Is SpatialEventGesture the correct approach for detecting gestures in a full immersive RealityView?
2. Are there any special considerations for gesture detection when the RealityView contains a large panoramic sphere that might be intercepting gestures?
3. Should I be using a different gesture approach for visionOS immersive spaces?
4. Is there a way to ensure gestures work even when the RealityView content (panoramic sphere) might be blocking them?
Environment
• Xcode 16.1
• visionOS 2.5
• Testing on Vision Pro device
• App uses SwiftUI + RealityKit
Any guidance on the proper way to implement spatial gesture detection in visionOS immersive spaces would be greatly appreciated!
Additional Context
The app manages multiple windows and the gesture detection should work specifically when in the immersive panorama mode with the menu hidden.
Thank you for any help or suggestions!
Greetings. I am having this issue with a Unity PolySpatial visionOS app.
We have our main Bounded Volume for our app.
We have other Native UI windows that appear when we interact with objects in our Bounded Volume.
If a user closes our main Bounded Volume...sometimes it quits the app. Sometimes it doesn't.
If we go back to the home screen and reopen the app, our main Bounded Volume doesn't always appear, and just the Native UI windows we left open are visible. But, we can sometimes still hear sounds that are playing in our Bounded Volume.
What solutions are there to make sure our Bounded Volume always appears when the app is open?