I recently started testing ARKit on an iPhone 16 Pro and noticed that AutoFocus reacts much more slowly on this device than on other devices. For example, if I point the camera at a close object, AutoFocus takes 4-5 seconds to stabilize and the focus adjusts very slowly. In some cases (although this is rare) AutoFocus seems almost stuck and requires a bit of device movement to trigger.
This is quite problematic when using some ARKit features like Image and Object detection as the detection algorithms struggle with out-of-focus images.
This problem is limited to ARKit. AutoFocus is significantly more responsive when the standard AVFoundation Camera API is used.
This behavior is easy to reproduce with any of the ARKit samples like https://developer.apple.com/documentation/arkit/arkit_in_ios/content_anchors/tracking_and_visualizing_planes
Is anybody else experiencing this problem?
ARKit
Integrate iOS device camera and motion features to produce augmented reality experiences in your app or game using ARKit.
Subject: Combining ARKit Face Tracking with High-Resolution AVCapture and Perspective Rendering on Front Camera
Message:
Hello Apple Developer Community,
We’re developing an application using the front camera that requires both real-time ARKit face tracking/guidance and the capture of high-resolution still images via AVCaptureSession. Our goal is to leverage ARKit’s depth and face data to render a captured image from another perspective post-capture, maintaining high image quality.
Our Approach:
Real-Time ARKit Guidance:
Utilize ARKit (e.g., ARFaceTrackingConfiguration) for continuous face tracking, depth, and scene understanding to guide the user in real time.
High-Resolution Capture Transition:
At the moment of capture, we plan to pause the ARKit session and switch to an AVCaptureSession to take a high-resolution image.
We assume that for a front-facing image, the subject’s face is directly front-on, and the relative pose between the face and camera remains the same during the transition. The only variation we expect is a change in distance.
Our intention is to minimize the delay between the last ARKit frame and the high-res capture to maintain temporal consistency, assuming that aside from distance, the face-camera relative pose remains unchanged.
Post-Processing Perspective Rendering:
Using the last ARKit face data (depth, pose, and landmarks) along with the high-resolution 2D image, we aim to render the scene from another perspective.
We want to correct the perspective of the 2D image using SceneKit or RealityKit, leveraging the collected ARKit scene information to achieve a natural, high-quality rendering from a different viewpoint.
The rendering should match the quality of a normally captured high-resolution image, adjusting for the difference in distance while using the stored ARKit data to correct perspective.
Our Questions:
Session Transition Best Practices:
What are the recommended best practices to seamlessly pause ARKit and switch to a high-resolution AVCapture session on the front camera?
How can we minimize user movement or other issues during this brief transition, given our assumption that the face-camera pose remains largely consistent except for distance changes?
Data Integration for Perspective Rendering:
How can we effectively integrate stored ARKit face, depth, and pose data with the high-res image to perform accurate perspective correction or rendering from another viewpoint?
Given that we assume the relative pose is constant except for distance, are there strategies or APIs to leverage this assumption for simplifying the perspective transformation?
Perspective Correction with SceneKit/RealityKit:
What techniques or workflows using SceneKit or RealityKit are recommended for correcting the perspective of a captured 2D image based on ARKit scene data?
How can we use these frameworks to render the high-resolution image from an alternative perspective, while maintaining image quality and fidelity?
Pitfalls and Guidelines:
What common pitfalls should we be aware of when combining ARKit tracking data with high-res capture and post-processing for perspective rendering?
Are there performance considerations, recommended thresholds for acceptable temporal consistency, or validation techniques to ensure the ARKit data remains applicable at the moment of high-res capture?
We appreciate any advice, sample code references, or documentation pointers that could assist us in implementing this workflow effectively.
Thank you!
I'm trying to implement a prototype that renders virtual objects in a mixed immersive space on the camera frames captured by CameraFrameProvider.
Here is what I have done:
Get the camera's intrinsics from frame.primarySample.parameters.intrinsics
Get the camera's extrinsics from frame.primarySample.parameters.extrinsics
Get the device anchor with worldTrackingProvider.queryDeviceAnchor(atTimestamp: CACurrentMediaTime())
Set up a RealityKit.RealityRenderer to render virtual objects on the captured camera frames
let realityRenderer = try RealityKit.RealityRenderer()
realityRenderer.cameraSettings.colorBackground = .outputTexture()
let cameraEntity = PerspectiveCamera()
// see https://developer.apple.com/forums/thread/770235
let cameraTransform = deviceAnchor.originFromAnchorTransform * extrinsics.inverse
cameraEntity.setTransformMatrix(cameraTransform, relativeTo: nil)
cameraEntity.camera.near = 0.01
cameraEntity.camera.far = 100
cameraEntity.camera.fieldOfViewOrientation = .horizontal
// manually calculated based on camera intrinsics
cameraEntity.camera.fieldOfViewInDegrees = 105
realityRenderer.entities.append(cameraEntity)
realityRenderer.activeCamera = cameraEntity
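The hard-coded 105 degree horizontal field of view above is described as being calculated from the camera intrinsics. A minimal sketch of that calculation under a pinhole camera model follows. It assumes `fx` is the horizontal focal length in pixels (the first diagonal element of the intrinsics matrix) and that the principal point is close to the image center. The function name and the sample numbers are illustrative and do not come from the original post.

```swift
import Foundation

/// Horizontal field of view, in degrees, of a pinhole camera.
/// - Parameters:
///   - fx: horizontal focal length in pixels (hypothetical input taken from the intrinsics).
///   - imageWidth: width of the camera frame in pixels.
func horizontalFieldOfViewDegrees(focalLengthX fx: Double, imageWidth: Double) -> Double {
    // Half of the image width subtends half of the field of view.
    let radians = 2.0 * atan(imageWidth / (2.0 * fx))
    return radians * 180.0 / Double.pi
}

// Made-up example values, chosen only to land near the 105 degrees used above.
let fov = horizontalFieldOfViewDegrees(focalLengthX: 736.0, imageWidth: 1920.0)
print(fov)
```

Computing the value from each frame's intrinsics, rather than hard-coding it, keeps the virtual camera consistent if the frame resolution changes.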
With this setup, virtual objects that should be visible in the camera frames are clipped out by the camera transform.
If I use deviceAnchor.originFromAnchorTransform as the camera transform, virtual objects are rendered on the camera frames, but at the wrong positions (I think this is because the camera extrinsics are not used to move the camera to the correct position).
My question is: how should the camera extrinsics matrix be used for this purpose?
Do the camera extrinsics describe an orientation similar to the device anchor's, with only a minor change in rotation and position? Here is the extrinsics matrix from one camera frame. It seems that the directions of the Y-axis and Z-axis are flipped by the extrinsics, so the camera is pointing in the wrong direction.
simd_float4x4([[0.9914258, 0.012555369, -0.13006608, 0.0], // X-axis
[-0.0009778949, -0.9946325, -0.10346654, 0.0], // Y-axis
[-0.13066702, 0.10270659, -0.98609203, 0.0], // Z-axis
[0.024519, -0.019568002, -0.058280986, 1.0]]) // translation
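The flipped Y and Z axes in the matrix above would be consistent with a camera frame that follows the common computer-vision convention (+Y down, +Z forward) instead of the RealityKit convention (+Y up, camera looking along -Z). That interpretation is an assumption and is not confirmed by anything in this thread. The sketch below only illustrates the arithmetic: negating the Y and Z columns of the rotation part leaves a rotation that is close to identity with a small tilt. The helper name is hypothetical.

```swift
// Rotation part of the extrinsics printed above, stored column by column
// (the X, Y and Z axes, in that order).
let extrinsicsRotation: [[Double]] = [
    [0.9914258, 0.012555369, -0.13006608],
    [-0.0009778949, -0.9946325, -0.10346654],
    [-0.13066702, 0.10270659, -0.98609203],
]

/// Right-multiplies the rotation by diag(1, -1, -1), which negates the Y and Z columns.
func flippingYAndZ(_ columns: [[Double]]) -> [[Double]] {
    let signs: [Double] = [1, -1, -1]
    return zip(columns, signs).map { column, sign in
        column.map { $0 * sign }
    }
}

let converted = flippingYAndZ(extrinsicsRotation)
for (name, column) in zip(["X", "Y", "Z"], converted) {
    print(name, column)
}
```

If the assumption holds, the same flip would be applied to the full 4x4 transform before it is used as the RealityKit camera transform. Whether it belongs before or after the inverse depends on how the extrinsics are defined, so the result should be checked against an object at a known position.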
I was watching the Developer videos, and there was mention that RealityView handles persistent world data differently and also automatically for us.
I am having an issue finding the material I need to get up to speed on that.
In ARKit, I was able to place a model with the world data and recall that .map data. It even stored a reference image for the scene to help match the world data.
I'm looking for the information on how to implement and work with those same features with RealityView, as it seems to be better/automatically integrated?
I need help being pointed in the right direction. Sample code would be amazing.
We're developing a visionOS application in which we would like to do product recognition (for example, of food items).
We have enterprise entitlements and therefore also main camera access on visionOS. We send the live camera frames to a trained Core ML model, which returns 2D coordinates for each detection.
Now we would like to create a 3D anchor on each detected item so that it is visible to the user. The 3D anchor will be labeled with the class name of the detected item.
How do we transform this 2D coordinate from the model prediction to a 3D anchor?
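One common way to approach the 2D-to-3D step is to unproject the detected pixel through the camera intrinsics, which gives a point in camera space once a depth is chosen. The sketch below assumes a pinhole model with +Z pointing forward from the camera. The `PinholeIntrinsics` and `Point3` types are hypothetical, and the depth has to come from somewhere else (for example, by intersecting the ray with scene geometry), which this thread does not cover.

```swift
/// Hypothetical container for the four pinhole parameters read from the intrinsics matrix.
struct PinholeIntrinsics {
    var fx: Double, fy: Double   // focal lengths in pixels
    var cx: Double, cy: Double   // principal point in pixels
}

struct Point3 {
    var x: Double, y: Double, z: Double
}

/// Returns the camera-space point that projects to pixel (u, v) at the given depth.
/// `depth` is the distance along the camera's forward axis, in meters.
func unproject(u: Double, v: Double, depth: Double, intrinsics k: PinholeIntrinsics) -> Point3 {
    let x = (u - k.cx) / k.fx * depth
    let y = (v - k.cy) / k.fy * depth
    return Point3(x: x, y: y, z: depth)
}

// Made-up intrinsics and a detection centered 500 px to the right of the principal point.
let intrinsics = PinholeIntrinsics(fx: 1000, fy: 1000, cx: 960, cy: 540)
let cameraPoint = unproject(u: 1460, v: 540, depth: 2.0, intrinsics: intrinsics)
print(cameraPoint)
```

The camera-space point would then be transformed into world space with the camera pose for the same frame before placing the anchor.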
We are using the ARKit image tracking feature on visionOS 2.0 with three pre-registered images. The image tracking works, but only one image is actively tracked at a time. When more than one target image is visible to the camera, it has difficulty detecting and tracking the other images.
Is this the expected behavior in visionOS, or is there something we need to do to resolve this issue?
I am working on a project that requires access to the main camera on the Vision Pro. My main account holder applied for the necessary enterprise entitlement, and we were approved and received the Enterprise.license file by email. I have added the Enterprise.license file to my project. I also manually added the com.apple.developer.arkit.main-camera-access.allow entitlement to the entitlements file and set it to true, since it was not available in the list when I tried the + Capability button in the Signing & Capabilities tab.
I am getting an error: Provisioning profile "iOS Team Provisioning Profile: " doesn't include the com.apple.developer.arkit.main-camera-access.allow entitlement. I have checked the provisioning profile settings online, and there is no manual option for adding the main camera access entitlement, and it does not seem to be getting the approval from the license.
Hello All,
I'm desperate to find a solution and I need your help, please.
I've created a simple cube in visionOS. I can grab it by hand (by closing my hand on it) and move it pretty much wherever I want. But I would like to throw it (like a basketball, for example). I don't want to push it; I want to hold it in my hand and throw it away from me, with a velocity and direction that match my hand movement (opening my fingers to release it).
Please point me in the right direction to do that.
Cheers and thanks
Mathis
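A throw like the one described above is often approximated by estimating the hand's velocity over the last few tracked positions before release and handing that velocity to the physics simulation. The sketch below shows only the velocity estimate, as a finite difference between the oldest and newest samples. The `HandSample` type and the sample values are hypothetical, and how the hand positions are obtained is not covered here.

```swift
/// Hypothetical record of one tracked hand position.
struct HandSample {
    var time: Double                                  // seconds
    var position: (x: Double, y: Double, z: Double)   // meters
}

/// Average velocity (meters per second) between the first and last sample,
/// or nil if there are no samples or no time has elapsed.
func releaseVelocity(from samples: [HandSample]) -> (x: Double, y: Double, z: Double)? {
    guard let first = samples.first, let last = samples.last else { return nil }
    let dt = last.time - first.time
    guard dt > 0 else { return nil }
    return (x: (last.position.x - first.position.x) / dt,
            y: (last.position.y - first.position.y) / dt,
            z: (last.position.z - first.position.z) / dt)
}

// Made-up samples covering the last 0.1 s before the fingers open.
let recentSamples = [
    HandSample(time: 0.0, position: (x: 0.0, y: 0.0, z: 0.0)),
    HandSample(time: 0.1, position: (x: 0.1, y: 0.05, z: -0.2)),
]
let throwVelocity = releaseVelocity(from: recentSamples)
print(throwVelocity as Any)
```

At release, the cube would typically switch to a dynamic physics body and receive this vector as its initial linear velocity, for example through RealityKit's `PhysicsMotionComponent`. A short window of samples tends to be less noisy than a single frame-to-frame difference.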
What is the reason the hand-tracking joints have these axes? I'm trying to create a virtual hands model and that's a mess.
Hello Community,
I'm encountering an issue with the latest iOS 17 update, specifically related to RoomPlan version-2. In iOS 16, when using RoomPlan version-1, we were able to display stairs in our app. However, after upgrading to iOS 17 and implementing RoomPlan version-2, the stairs are no longer visible.
Despite thorough investigation, I couldn't find any option within the code to show or hide stairs, or any other objects for that matter. It seems like a specific issue with the update rather than a coding error on our part.
Has anyone else encountered a similar problem? If so, I would greatly appreciate any insights or solutions you might have. It's crucial for our app functionality to have stairs displayed accurately, and we're currently at a loss on how to address this issue.
Thank you in advance for any assistance you can provide.
Best regards
I am working with MeshAnchors, and I am having trouble getting the classification of the triangles/faces.
This post references the MeshAnchor.Geometry, and that struct does have a property named "classifications", but it is of type GeometrySource. I cannot find any classification information in GeometrySource. Am I missing something there?
I think I am looking for something of type MeshAnchor.MeshClassification, but I cannot find any structs with this as a property.
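A GeometrySource exposes its data as a raw buffer described by an offset, a stride and a count, not as typed properties, so the per-face values have to be read out of the buffer by index. The sketch below shows that indexing on a plain byte array. It assumes each face's classification is stored as a single unsigned byte whose value corresponds to a MeshAnchor.MeshClassification raw value; that layout is an assumption and should be checked against the source's format. The function name is hypothetical.

```swift
/// Reads the classification byte for one face from raw buffer contents.
/// - Parameters:
///   - bytes: a copy of the buffer contents (on device these would come from the source's Metal buffer).
///   - offset: byte offset of the first element.
///   - stride: byte distance between consecutive elements.
///   - faceIndex: index of the face to look up.
func classificationRawValue(in bytes: [UInt8], offset: Int, stride: Int, faceIndex: Int) -> UInt8? {
    guard offset >= 0, stride > 0, faceIndex >= 0 else { return nil }
    let position = offset + stride * faceIndex
    guard position < bytes.count else { return nil }
    return bytes[position]
}

// Made-up buffer: two padding bytes followed by one byte per face.
let buffer: [UInt8] = [9, 9, 0, 1, 2]
let secondFace = classificationRawValue(in: buffer, offset: 2, stride: 1, faceIndex: 1)
print(secondFace as Any)
```

On visionOS, the returned value would then be passed to `MeshAnchor.MeshClassification(rawValue:)` after converting it to `Int`.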
We are working on a world scale AR app that leverages the device location and heading to place objects in the streets, so that they are correctly and stably anchored to certain locations.
Since geo-tracking imagery is only available in certain cities and areas, we are trying to figure out how to fall back when geo-tracking becomes unavailable as the device moves away, while still retaining good AR camera accuracy. We might need to come up with an algorithm that uses the device GPS to line up the ARCamera with our objects.
Question: Does geo-tracking always provide accuracy greater than or equal to that of world tracking for a GPS-based outdoor AR experience?
If so, we can simply use the ARGeoTrackingConfiguration for the entire time, and rely on the ARView keeping itself aligned. Otherwise, we need to switch between it and ARWorldTrackingConfiguration when geo-tracking is not available and/or its accuracy is low, then roll our own algorithm to keep the camera aligned.
Thanks.
Error:
RoomCaptureSession.CaptureError.exceedSceneSizeLimit
Apple Documentation Explanation:
An error that indicates when the scene size grows past the framework’s limitations.
Issue:
This error pops up on my iPhone 14 Pro (128 GB) after a few RoomPlan scans have been done. It shows up even if the room is small. It occurs immediately after I start the RoomCaptureSession following relocalisation of the previous AR session (in world tracking configuration). I am having trouble understanding exactly why this error appears and how to debug or solve it.
Does anyone have any idea how to approach this issue?
Hi everyone! I am working on an AR app and want to implement object occlusion because it removes most of the apparent drift from the object. This works great in the RealityKit sample, but I am unable to replicate the behaviour with SceneKit because SceneKit does not offer object occlusion. Does this mean SceneKit is being deprecated and that we should rewrite the app in RealityKit (which is obviously a big task)?
Hi,
since iOS 15 I've repeatedly noticed the console warning »ARSessionDelegate is retaining X ARFrames. This can lead to future camera frames being dropped« even for rather simple projects using RealityKit and ARKit. Could someone from the ARKit team please elaborate what causes this warning and what can be done to avoid it?
If I remember correctly I didn't even assign an ARSessionDelegate.
Thank you!