Discuss spatial computing on Apple platforms and how to design and build an entirely new universe of apps and games for Apple Vision Pro.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
The WWDC25 video and notes titled “Learn About Apple Immersive Video Technologies” introduced the Apple Spatial Audio Format (ASAF) and codec (APAC). However, despite references throughout on using immersive video, there is scant information on ASAF/APAC (including no code examples and no framework references), and I’ve found no documentation in Apple’s APIs/Frameworks about its implementation and use months on.
I want to leverage ambisonic audio in my app. I don’t want to write a custom AU if APAC will be opened up to developers. If you read the notes below along with the iPhone 17 advertising (“Video is captured with Spatial Audio for immersive listening”), it sounds like this is very much a live feature in iOS26.
Anyone know the state of play? I’m across how the PHASE engine works, which is unrelated to what I’m asking about here.
Original quote from video referenced above: “ASAF enables truly externalized audio experiences by ensuring acoustic cues are used to render the audio. It’s composed of new metadata coupled with linear PCM, and a powerful new spatial renderer that’s built into Apple platforms. It produces high resolution Spatial Audio through numerous point sources and high resolution sound scenes, or higher order ambisonics.”
”ASAF is carried inside of broadcast Wave files with linear PCM signals and metadata. You typically use ASAF in production, and to stream ASAF audio, you will need to encode that audio as an mp4 APAC file.”
”APAC efficiently distributes ASAF, and APAC is required for any Apple immersive video experience. APAC playback is available on all Apple platforms except watchOS, and supports Channels, Objects, Higher Order Ambisonics, Dialogue, Binaural audio, interactive elements, as well as provisioning for extendable metadata.”
Topic:
Spatial Computing
SubTopic:
General
I tried to show spatial photo on my application by swiftUI's Image but it just show flat version of it even I Use Vision Pro,
so, how can I show spatial photo to users,
does there any options for this?
I have recently started testing ARKit on an iPhone 16 Pro and I have noticed that the AutoFocus reaction on this device is much slower than other devices. For example, if I point the camera to a close object AutoFocus takes 4-5 seconds to stabilize, the focal length is adjusted very very slowly. In some cases (although this is rare) AutoFocus seems almost stuck and requires a bit of device movement to trigger.
This is quite problematic when using some ARKit features like Image and Object detection as the detection algorithms struggle with out-of-focus images.
This problem is limited to ARKit. AutoFocus is significantly more responsive when the standard AVFoundation Camera API is used.
This behavior is easy to reproduce with any of the ARKit samples like https://developer.apple.com/documentation/arkit/arkit_in_ios/content_anchors/tracking_and_visualizing_planes
Is anybody else experiencing this problem?
I am develop visionOS app. I am now very interested in Metal and Compositor Services, but I have not explored them in depth. I know that Metal has a higher degree of control freedom. I am wondering if using Compositor Services will have fewer functions than RealityKit in AR technology (such as scene reconstruction and understanding, hover effect, etc.).
We are using the ARKit image tracking feature on visionOS 2.0 with three pre-registered images. The image tracking works, but only one image is actively tracked at a time. When more than one target image is visible to the camera, it has difficulty detecting and tracking the other images.
Is this the expected behavior in visionOS, or is there something we need to do to resolve this issue?
How do you call the effect where the edges around the central image gradually become transparent? This effect is also seen when viewing immersive mode of spatial photos in Vision Pro. How can I achieve this effect using SwiftUI or ShaderGraph? I want to use this effect when displaying images in my app.
I have an open Feedback conversation with Apple on this topic, but I am curious if others have run into this, or want to try out my sample code in their set up.
there are two API’s for reading controller buttons, axis, and D pads: GCPhysicalInputProfile and GCControllerLiveInput. There are inconsistencies in behaviour between the two of them. Apple recommends we use GCControllerLiveInput, however, there are some capabilities on these controllers that are only accessible through GCPhysicalInputProfile, as I’ll discuss below.
PSVR2 R2/L2 buttons, a.k.a. triggers, have force input analogue values. These can only be accessed on GCPhysicalInputProfile
PSVR2 thumbstick direction values are read through “axes” on GCPhysicalInputProfile, but only “dpads” on GCControllerLiveInput
on both GCPhysicalInputProfile and GCControllerLiveInput, All pressed events of all buttons are fired properly using generic aliases ( Trigger, Grip ,Menu, Right Thumbstick, Left Thumbstick, Right Button A & B (Circle & Cross), Left Button A&B (Triangle and Square) ). Apple reserves the system button as the equivalent of a home button for the OS.
on GCPhysicalInputProfile, touch events are fired when the button is also pressed, but not for only touches.
on GCControllerLiveInput , Touch events only works for the following buttons: Left Thumbstick, Right Thumbstick, Right Button A (Circle), and Right Button B (Cross). But Right Button B touch event isn’t labelled correctly, it fires as the Right Button A event.
I observed this inside ALVR which uses a polling based approach to event processing:
https://github.com/alvr-org/alvr-visionos/blob/17b5968f9d894944b53e97134b39dfce0993302a/ALVRClient/WorldTracker.swift#L301
To simplify to see this on a very simple app, I used the Apple example TrackingAccessories application:
https://developer.apple.com/documentation/ARKit/tracking-accessories-in-volumetric-windows
I’ve attached the code that replaces the AccessoryTrackingModel class. I added code that prints out what is touched/pressed, see the trackAllConnectedSpatialControllers method:
https://github.com/svrc/TrackingAccessories
I downloaded the official sample project “Accessing the Main Camera”, but I found that it’s not able to retrieve the camera feed on visionOS 26.1. After checking the debug logs, it seems the issue is caused by the system being unable to find the expected format.
I tested on a device running visionOS 2, and the camera feed worked correctly — but only when using the sample code from the visionOS 2 version, not the current one. I also noticed that some of the APIs have changed between versions.
Has anyone managed to successfully access the camera feed on visionOS 26.1?
I made an animation in Blender using geometry nodes that I exported to USDC file (then I used Reality Converter to convert to USDZ) and I can see the animation when viewing from the finder but does not play after importing to RCP. Any idea how I can play the animation? Or can the animation be accessed through Xcode?
Thanks!
Hi,
since iOS 15 I've repeatedly noticed the console warning »ARSessionDelegate is retaining X ARFrames. This can lead to future camera frames being dropped« even for rather simple projects using RealityKit and ARKit. Could someone from the ARKit team please elaborate what causes this warning and what can be done to avoid it?
If I remember correctly I didn't even assign an ARSessionDelegate.
Thank you!
The issue reproducible with empty project. When you run it and tap "Open immersive space" it takes a couple of minutes to respond. The issue only reproducible on real device with debugger attached. Reproducible other developers too (not specific to my environment). Issue doesn't exists in Xcode 16.
Afer initial long delay subsequent opens works fine.
Console logs:
nw_socket_copy_info [C1:2] getsockopt TCP_INFO failed [102: Operation not supported on socket]
nw_socket_copy_info getsockopt TCP_INFO failed [102: Operation not supported on socket]
Failed to set dependencies on asset 9303749952624825765 because NetworkAssetManager does not have an asset entity for that id.
void * _Nullable NSMapGet(NSMapTable * _Nonnull, const void * _Nullable): map table argument is NULL
PSO compilation completed for driver shader copyFromBufferToTexture so=0 sbpr=256 sbpi=16384 ss=(64, 64, 1) p=70 sc=1 ds=0 dl=0 do=(0, 0, 0) in 1997
XPC connection interrupted
<<<< FigAudioSession(AV) >>>> audioSessionAVAudioSession_CopyMXSessionProperty signalled err=-19224 (kFigAudioSessionError_UnsupportedOperation) (getMXSessionProperty unsupported) at FigAudioSession_AVAudioSession.m:606
Failed to load item AXCodeItem<0x14706f250> [Rank:6000] SpringBoardUIServices [AXBundle name:/System/Library/AccessibilityBundles/SpringBoardUIServices.axbundle/SpringBoardUIServices] [Platforms and Targets:{ iOS = SpringBoardUIServices; } Framework] [Excluded: (null)]. error: Error Domain=AXLoading Code=0 "URL does not exist: file:///System/Library/AccessibilityBundles/SpringBoardUIServices.axbundle" UserInfo={NSLocalizedDescription=URL does not exist: file:///System/Library/AccessibilityBundles/SpringBoardUIServices.axbundle}
Failed to load item AXCodeItem<0x14706f250> [Rank:6000] SpringBoardUIServices [AXBundle name:/System/Library/AccessibilityBundles/SpringBoardUIServices.axbundle/SpringBoardUIServices] [Platforms and Targets:{ iOS = SpringBoardUIServices; } Framework] [Excluded: (null)]. error: Error Domain=AXLoading Code=0 "URL does not exist: file:///System/Library/AccessibilityBundles/SpringBoardUIServices.axbundle" UserInfo={NSLocalizedDescription=URL does not exist: file:///System/Library/AccessibilityBundles/SpringBoardUIServices.axbundle}
[b30780-MRUIFeedbackTypeButtonWithBackgroundTouchDown] Playback timed out before completion (after 3111 ms)
Failed to set dependencies on asset 7089614247973236977 because NetworkAssetManager does not have an asset entity for that id.
the following documentation tells me that the CameraFrame.Sample.Parameters.extrinsics is of type simd_float4x4, great!
https://developer.apple.com/documentation/arkit/cameraframe/sample/parameters/4443449-extrinsics
I have read in the answer of another post that this extrinsics represents the pose of the physical camera relative to the device anchor.
Did I understand correctly that the device anchor is where the scene is rendered from onto the user's display?
What is the coordinate system in which this offset is defined, which axis is left, which one is up, which one is forward?
The last column of the extrinsics seems to define a translation of approximately 2 cm along the x axis, -2cm along the y axis and -5 cm along the z axis. I tried to measure the physical distance between the main left and right cameras in order to find out if it's rather 2cm or 5 cm from the "middle", it looks more like 5, so I assume that the z axis is looking towards the right (from the user's perspective). Is that so? For x and y, I assume that the physical camera is approximately 2 cm to the front of the user and 2cm to the bottom, which of x and y is horizontal, which on vertical?
How is the camera image indexed, is it row-major and is the origin on the top left?
I am looking forward to learning about all the details on these extrinsics in order to make use of it.
Hi, I'm very new to 3D and am currently porting a SwiftUI iOS app to visionOS 2.0.
I saw WWDC24 feature Blender in multiple spatial videos, and have begun integrating Blender models and animations into my VisionOS app (I would also like to integrate skeletons and programmatic rigging, more on that later).
I'm wondering if there are “Best Practices” for this workflow - from Blender to USD to RCP 2.0 to visionOS 2 in Xcode. I’ve cobbled together the following that has some obvious holes:
I’ve been able to find some pre-rigged and pre-animated models online that can serve as a great starting point. As a reference, here is a free model from SketchFab - a simple rigged skeleton with 6 built in animations:
https://sketchfab.com/3d-models/skeleton-character-low-poly-8856e0138f424d68a8e0b40e185951f6
When exporting to USD from Blender, I haven’t been able to export more than one animation per USD file. Is there a workflow to export multiple animations in a single USDC file, or is this just not possible?
As a temporary workaround, here is a python script I’ve been using to loop through all Blender animations, and export a model for each animation:
import bpy
import os
# Set the directory where you want to save the USD files
output_directory = “/path/to/export”
# Ensure the directory exists
if not os.path.exists(output_directory):
os.makedirs(output_directory)
# Function to export current scene as USD
def export_scene_as_usd(output_path, start_frame, end_frame):
bpy.context.scene.frame_start = start_frame
bpy.context.scene.frame_end = end_frame
# Export the scene as a USD file
bpy.ops.wm.usd_export(
filepath=output_path,
export_animation=True
)
# Save the current scene name
original_scene = bpy.context.scene.name
# Iterate through each action and export it as a USD file
for action in bpy.data.actions:
# Create a new scene for each action
bpy.context.window.scene = bpy.data.scenes[original_scene].copy()
new_scene = bpy.context.scene
# Link the action to all relevant objects
for obj in new_scene.objects:
if obj.animation_data is not None:
obj.animation_data.action = action
# Determine the frame range for the action
start_frame, end_frame = action.frame_range
# Export the scene as a USD file
output_path = os.path.join(output_directory, f"{action.name}.usdc")
export_scene_as_usd(output_path, int(start_frame), int(end_frame))
# Delete the temporary scene to free memory
bpy.data.scenes.remove(new_scene)
print("Export completed.")
I have also been able to successfully export rigging armatures as a single Skeleton - each “bone” showing getting imported into Reality Composer Pro 2.0 when exporting/importing manually.
I would like to have all of these animations available in a single scene to be used in a RealityView in visionOS - so I have placed all animation models in a RCP scene and created named Timeline Action animations for each, showing the correct model and hiding the rest when triggering specific animations.
I apply materials/textures to each so they appear the same, using Shader Graph.
Then in SwiftUI I use notifications (as shown here - https://forums.developer.apple.com/forums/thread/756978) to trigger each RCP Timeline Action animation from code.
Two questions:
Is there a better way than to have multiple models of the same skeleton - each with a different animation - in a scene to be able to trigger multiple animations? Or would this require recreating Blender animations using skeleton rigging and keyframes from within RCP Timelines?
If I want to programmatically create custom animations and move parts of the skeleton/armatures - do I need to do this by defining custom components in RCP, using IKRig and define movement of each of the “bones” in Xcode?
I’m looking for any tips/tricks/workflow from experienced engineers or 3D artists that can create a more efficient/optimized workflow using Blender, USD, RCP 2 and visionOS 2 with SwiftUI.
Thanks so much, I appreciate any help! I am very excited about all the new tools that keep evolving to make spatial apps really fun to build!
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
Tags:
RealityKit
Reality Composer Pro
visionOS
Hi, in visionOS2, I wonder if there is a method to add/remove referenceObjects of ObjectTrackingProvider after ARSession started without stop it.
Now I must stop it and run it again. The current tracking is stopped.
Thanks.
I’d like to use ARKit world tracking and display both the back camera feed and the front camera feeds, using the front feed as as a PIP. This would work great for an internet streaming use case.
However, it’s impossible. As soon as ARKit is told to use one mode, the camera for the other side freezes/doesn’t work. This page also says you have to pick one camera to show: https://developer.apple.com/documentation/arkit/arkit_in_ios/choosing_which_camera_feed_to_augment?language=objc
A question to the developers: why is this limitation in-place? Are there any work-arounds for the use case of ARKit world tracking + displaying the back camera feed + displaying the front camera feed as an overlay?
It’s possible to do this with plain camera initialization without ARKit. (There’s an official example.) With ARKit, it no longer works.
It’s strange that I cannot access the front feed via one of the other frameworks, but I guess that ARKit blocks that.
I was watching the Developer videos, and there was mention that RealityView handles persistent world data differently and also automatically for us.
I am having an issue finding the material I need to get up to speed on that.
In ARKit, I was able to place a model with the world data and recall that .map data. It even stored a reference image for the scene to help match the world data.
I'm looking for the information on how to implement and work with those same features with RealityView, as it seems to be better/automatically integrated?
I need help being pointed in the right direction. Sample code would be amazing.
Topic:
Spatial Computing
SubTopic:
ARKit
I would like to visualize a point cloud taken from a lidar. Assuming I can get the XYZ values of every point (of which there may be hundreds or thousands), what is the most efficient way for me to create a point cloud using this information?
Hello,
For GuessTogether source code, it seems like the code assumes that you're already in a FaceTime call before pressing the custom SharePlay button (labeled "Play Guess Together"). If not already on a FaceTime call, my Apple Vision Pro and the visionOS simulator both do nothing after throwing warnings. Is this intended behavior?
If so, how do I make it so that pressing the button can also initiate FaceTime calls? Is this allowed?
Thank you!
Hi,
On visionOS to manage entity rotation we can rely on RotateGesture3D. We can even with the constrainedToAxis parameter authorize only rotation on an x, y or z axis or even make combinations.
What I want to know is if it is possible to constrain the rotation on axis automatically.
Let me explain, the functionality that I would like to implement is to constrain the rotation on an axis only once the user has started his gesture. The initial gesture the user makes should let us know which axis they want to rotate on.
This would be equivalent to activating a constraint automatically on one of the axes, as if we were defining the gesture on one of the axes.
RotateGesture3D(constrainedToAxis: .x)
RotateGesture3D(constrainedToAxis: .y)
RotateGesture3D(constrainedToAxis: .z)
Is it possible to do this?
If so, what would be the best way to do it?
A code example would be greatly appreciated.
Regards
Tof