Combining ARKit Face Tracking with High-Resolution AVCapture and Perspective Rendering on Front Camera

Hello Apple Developer Community,

We’re developing an application using the front camera that requires both real-time ARKit face tracking/guidance and the capture of high-resolution still images via AVCaptureSession. Our goal is to leverage ARKit’s depth and face data to render a captured image from another perspective post-capture, maintaining high image quality.

Our Approach:

  1. Real-Time ARKit Guidance:
  • Utilize ARKit (e.g., ARFaceTrackingConfiguration) for continuous face tracking, depth, and scene understanding to guide the user in real time.
  2. High-Resolution Capture Transition:
  • At the moment of capture, we plan to pause the ARKit session and switch to an AVCaptureSession to take a high-resolution image.
  • We assume that for a front-facing image, the subject’s face is directly front-on, and the relative pose between the face and camera remains the same during the transition. The only variation we expect is a change in distance.
  • Our intention is to minimize the delay between the last ARKit frame and the high-res capture to maintain temporal consistency, assuming that aside from distance, the face-camera relative pose remains unchanged.
  3. Post-Processing Perspective Rendering:
  • Using the last ARKit face data (depth, pose, and landmarks) along with the high-resolution 2D image, we aim to render the scene from another perspective.
  • We want to correct the perspective of the 2D image using SceneKit or RealityKit, leveraging the collected ARKit scene information to achieve a natural, high-quality rendering from a different viewpoint.
  • The rendering should match the quality of a normally captured high-resolution image, adjusting for the difference in distance while using the stored ARKit data to correct perspective.
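
To make item 3 of our approach concrete, here is a rough sketch of the direction we have in mind (all names are our own, and the texture mapping, which is the hard part, is only hinted at in a comment): re-render the last tracked face mesh, textured with the captured photo, from a virtual SceneKit camera placed at a different viewpoint.

import ARKit
import SceneKit
import Metal
import UIKit

// Hypothetical helper (our own sketch): render the last tracked face mesh, textured
// with the captured photo, from an alternative camera pose via an offscreen SceneKit renderer.
func renderFaceFromNewViewpoint(faceAnchor: ARFaceAnchor,
                                texture: UIImage,
                                newViewpoint: simd_float4x4,
                                outputSize: CGSize) -> UIImage? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let faceGeometry = ARSCNFaceGeometry(device: device) else { return nil }
    faceGeometry.update(from: faceAnchor.geometry)
    // Glossed over: the texture coordinates must come from projecting the mesh vertices
    // with the original camera's intrinsics so that the photo lines up with the geometry.
    faceGeometry.firstMaterial?.diffuse.contents = texture

    let scene = SCNScene()
    let faceNode = SCNNode(geometry: faceGeometry)
    faceNode.simdTransform = faceAnchor.transform      // face pose in world space
    scene.rootNode.addChildNode(faceNode)

    let cameraNode = SCNNode()
    cameraNode.camera = SCNCamera()
    cameraNode.simdTransform = newViewpoint            // the alternative viewpoint
    scene.rootNode.addChildNode(cameraNode)

    let renderer = SCNRenderer(device: device, options: nil)
    renderer.scene = scene
    renderer.pointOfView = cameraNode
    return renderer.snapshot(atTime: 0, with: outputSize, antialiasingMode: .multisampling4X)
}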

Our Questions:

  1. Session Transition Best Practices:
  • What are the recommended best practices to seamlessly pause ARKit and switch to a high-resolution AVCapture session on the front camera?
  • How can we minimize user movement or other issues during this brief transition, given our assumption that the face-camera pose remains largely consistent except for distance changes?
  2. Data Integration for Perspective Rendering:
  • How can we effectively integrate stored ARKit face, depth, and pose data with the high-res image to perform accurate perspective correction or rendering from another viewpoint?
  • Given that we assume the relative pose is constant except for distance, are there strategies or APIs to leverage this assumption for simplifying the perspective transformation?
  3. Perspective Correction with SceneKit/RealityKit:
  • What techniques or workflows using SceneKit or RealityKit are recommended for correcting the perspective of a captured 2D image based on ARKit scene data?
  • How can we use these frameworks to render the high-resolution image from an alternative perspective, while maintaining image quality and fidelity?
  4. Pitfalls and Guidelines:
  • What common pitfalls should we be aware of when combining ARKit tracking data with high-res capture and post-processing for perspective rendering?
  • Are there performance considerations, recommended thresholds for acceptable temporal consistency, or validation techniques to ensure the ARKit data remains applicable at the moment of high-res capture?
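
To make question 4 a bit more concrete, this is roughly the kind of validation we have in mind: check that the subject was holding still just before the ARKit session is paused and that the gap to the photo is short. It is a heuristic of our own with illustrative thresholds, not an Apple API.

import Foundation
import simd

// Our own heuristic sketch (not an Apple API): decide whether the last stored ARKit
// face pose is still applicable when the high-resolution photo arrives.
struct CaptureConsistencyCheck {
    var maxGapSeconds: TimeInterval = 0.3      // illustrative threshold
    var maxDriftMeters: Float = 0.02           // illustrative threshold

    func isStillValid(previousFaceTransform: simd_float4x4,
                      lastFaceTransform: simd_float4x4,
                      gapToPhoto: TimeInterval) -> Bool {
        // Translation delta between the last two tracked poses (was the user holding still?).
        let drift = simd_distance(previousFaceTransform.columns.3,
                                  lastFaceTransform.columns.3)
        return drift <= maxDriftMeters && gapToPhoto <= maxGapSeconds
    }
}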

We appreciate any advice, sample code references, or documentation pointers that could assist us in implementing this workflow effectively.

Thank you!

Hi @ONIO,

Have you tried the ARKit session API for capturing a high-resolution image:

https://developer.apple.com/documentation/arkit/arsession/3975720-capturehighresolutionframe

This allows you to grab a high-resolution image while keeping the ARKit session running, so it can continue to provide guidance. There is a WWDC video from 2022 that explains more:

https://developer.apple.com/videos/play/wwdc2022/10126
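
A minimal sketch of the call, assuming iOS 16 or later (arSession here stands for whatever ARSession drives your view):

import ARKit

if #available(iOS 16.0, *) {
    arSession.captureHighResolutionFrame { frame, error in
        if let frame {
            // frame.capturedImage is a CVPixelBuffer at the higher capture resolution.
            print("Hi-res frame:",
                  CVPixelBufferGetWidth(frame.capturedImage), "x",
                  CVPixelBufferGetHeight(frame.capturedImage))
        } else if let error {
            print("High-resolution capture failed:", error)
        }
    }
}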

Hi @Vision Pro Engineer, thank you for the links. I did try captureHighResolutionFrame(completion:) on my iPhone 14 Pro with iOS 18.1.1 and was able to get a 1512 × 2016 frame instead of the standard 1080 × 1440. I tested it with the Tracking and Visualizing Faces sample app. Unfortunately, our use case requires at least the 7 MP (2316 × 3088) resolution of the front camera. Is this (1512 × 2016) actually the highest resolution frame I can get with my setup and ARKit, or do I need to pay attention to other configuration settings?

In the video everything is about ARWorldTrackingConfiguration. Does it also apply to ARFaceTrackingConfiguration? I am asking because I was not able to get a higher-resolution stream. The following returned nil for me:

if let hiResCaptureVideoFormat = ARFaceTrackingConfiguration.recommendedVideoFormatForHighResolutionFrameCapturing {
    // Assign the video format that supports hi-res capturing.
    config.videoFormat = hiResCaptureVideoFormat
}
// Run the session.
session.run(config)
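
For reference, a small sketch of how the available formats can be inspected on a given device (the hi-res flag needs iOS 16+):

import ARKit

// List the video formats ARFaceTrackingConfiguration offers on this device and
// whether any of them is flagged for high-resolution frame capturing.
for format in ARFaceTrackingConfiguration.supportedVideoFormats {
    if #available(iOS 16.0, *) {
        print(format.imageResolution, format.framesPerSecond,
              "hi-res recommended:", format.isRecommendedForHighResolutionFrameCapturing)
    } else {
        print(format.imageResolution, format.framesPerSecond)
    }
}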

Since features like triggering focus events and other device settings could also be beneficial, I tried to access the capture device as described in the video:

if let device = ARFaceTrackingConfiguration.configurableCaptureDeviceForPrimaryCamera {
    do {
        try device.lockForConfiguration()
        // configure AVCaptureDevice settings
        …
        device.unlockForConfiguration()
    } catch {
        // error handling
        …
    }
}

But I was not able to access it. Should it be possible?

I also investigated fast session switching, but was not able to get it below 1.6 seconds, which heavily breaks the user experience. Below you can find the code that I used to switch sessions and capture an image.

import ARKit
import SceneKit
import UIKit
import AVFoundation

class ViewController: UIViewController, ARSessionDelegate {
    
    // MARK: - Outlets

    @IBOutlet var sceneView: ARSCNView!
    @IBOutlet weak var tabBar: UITabBar!

    // MARK: - Properties
    
    var faceAnchorsAndContentControllers: [ARFaceAnchor: VirtualContentController] = [:]
    
    var selectedVirtualContent: VirtualContentType! {
        didSet {
            guard oldValue != nil, oldValue != selectedVirtualContent
                else { return }
            
            // Remove existing content when switching types.
            for contentController in faceAnchorsAndContentControllers.values {
                contentController.contentNode?.removeFromParentNode()
            }
            
            // If there are anchors already (switching content), create new controllers and generate updated content.
            // Otherwise, the content controller will place it in `renderer(_:didAdd:for:)`.
            for anchor in faceAnchorsAndContentControllers.keys {
                let contentController = selectedVirtualContent.makeController()
                if let node = sceneView.node(for: anchor),
                   let contentNode = contentController.renderer(sceneView, nodeFor: anchor) {
                    node.addChildNode(contentNode)
                    faceAnchorsAndContentControllers[anchor] = contentController
                }
            }
        }
    }
    
    // MARK: - AVCaptureSession Properties
    
    var captureSession: AVCaptureSession?
    var photoOutput: AVCapturePhotoOutput?
    var captureDevice: AVCaptureDevice?
    var captureCompletion: ((UIImage?, Error?) -> Void)?
    
    // Dedicated serial queue for AVCaptureSession
    let captureSessionQueue = DispatchQueue(label: "com.yourapp.captureSessionQueue")
    
    // Activity Indicator for user feedback
    var activityIndicator: UIActivityIndicatorView!
    
    // MARK: - ARFrame Storage
    
    // Property to store the captured ARFrame's image
    var capturedARFrameImage: UIImage?
    
    // MARK: - View Controller Life Cycle

    override func viewDidLoad() {
        super.viewDidLoad()
        
        sceneView.delegate = self
        sceneView.session.delegate = self
        sceneView.automaticallyUpdatesLighting = true
        
        // Set the initial face content.
        tabBar.selectedItem = tabBar.items!.first!
        selectedVirtualContent = VirtualContentType(rawValue: tabBar.selectedItem!.tag)
        
        // Initialize and configure AVCaptureSession
        setupCaptureSession()
        
        // Initialize Activity Indicator
        setupActivityIndicator()
        
        // Add Capture Button
        setupCaptureButton()
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        
        // Run ARSession
        resetTracking()
    }
    
    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        
        // Pause ARSession
        sceneView.session.pause()
    }

    // MARK: - Setup AVCaptureSession
    
    func setupCaptureSession() {
        captureSession = AVCaptureSession()
        captureSession?.sessionPreset = .photo
        
        // Select the front camera
        guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                 for: .video,
                                                 position: .front) else {
            print("Front camera is not available.")
            return
        }
        captureDevice = device
        
        do {
            let input = try AVCaptureDeviceInput(device: device)
            if let captureSession = captureSession, captureSession.canAddInput(input) {
                captureSession.addInput(input)
            } else {
                print("Unable to add input to capture session.")
                return
            }
        } catch {
            print("Error configuring capture device input: \(error)")
            return
        }
        
        // Configure photo output
        photoOutput = AVCapturePhotoOutput()
        if let photoOutput = photoOutput, let captureSession = captureSession, captureSession.canAddOutput(photoOutput) {
            captureSession.addOutput(photoOutput)
        } else {
            print("Unable to add photo output to capture session.")
            return
        }
        
        // Configure photo settings
        photoOutput?.isHighResolutionCaptureEnabled = true
    }
    
    // MARK: - Setup Activity Indicator
    
    func setupActivityIndicator() {
        if #available(iOS 13.0, *) {
            activityIndicator = UIActivityIndicatorView(style: .large)
        } else {
            // Fallback on earlier versions
            activityIndicator = UIActivityIndicatorView(style: .gray)
        }
        activityIndicator.center = view.center
        activityIndicator.hidesWhenStopped = true
        view.addSubview(activityIndicator)
    }

    // MARK: - ARSessionDelegate

    func session(_ session: ARSession, didFailWithError error: Error) {
        guard error is ARError else { return }
        
        let errorWithInfo = error as NSError
        let messages = [
            errorWithInfo.localizedDescription,
            errorWithInfo.localizedFailureReason,
            errorWithInfo.localizedRecoverySuggestion
        ]
        let errorMessage = messages.compactMap({ $0 }).joined(separator: "\n")
        
        DispatchQueue.main.async {
            self.displayErrorMessage(title: "The AR session failed.", message: errorMessage)
        }
    }
    
    /// - Tag: ARFaceTrackingSetup
    func resetTracking() {
        guard ARFaceTrackingConfiguration.isSupported else { return }
        let configuration = ARFaceTrackingConfiguration()
        if #available(iOS 13.0, *) {
            configuration.maximumNumberOfTrackedFaces = 1 // Limit to 1 for performance
        }
        configuration.isLightEstimationEnabled = true
        sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
        
        faceAnchorsAndContentControllers.removeAll()
    }
    
    // MARK: - Error Handling
    
    func displayErrorMessage(title: String, message: String) {
        // Present an alert informing about the error that has occurred.
        let alertController = UIAlertController(title: title, message: message, preferredStyle: .alert)
        let restartAction = UIAlertAction(title: "Restart Session", style: .default) { _ in
            alertController.dismiss(animated: true, completion: nil)
            self.resetTracking()
        }
        alertController.addAction(restartAction)
        present(alertController, animated: true, completion: nil)
    }
    
    // Auto-hide the home indicator to maximize immersion in AR experiences.
    override var prefersHomeIndicatorAutoHidden: Bool {
        return true
    }
    
    // Hide the status bar to maximize immersion in AR experiences.
    override var prefersStatusBarHidden: Bool {
        return true
    }
    
    // MARK: - Setup Capture Button
    
    func setupCaptureButton() {
        // Create button
        let captureButton = UIButton(type: .system)
        captureButton.setTitle("Capture", for: .normal)
        captureButton.backgroundColor = UIColor.systemBlue.withAlphaComponent(0.7)
        captureButton.setTitleColor(.white, for: .normal)
        captureButton.layer.cornerRadius = 10
        captureButton.translatesAutoresizingMaskIntoConstraints = false
        
        // Add target action
        captureButton.addTarget(self, action: #selector(captureHiResFrame(_:)), for: .touchUpInside)
        
        // Add to view
        self.view.addSubview(captureButton)
        
        // Set constraints (e.g., bottom center)
        NSLayoutConstraint.activate([
            captureButton.bottomAnchor.constraint(equalTo: view.safeAreaLayoutGuide.bottomAnchor, constant: -20),
            captureButton.centerXAnchor.constraint(equalTo: view.centerXAnchor),
            captureButton.widthAnchor.constraint(equalToConstant: 100),
            captureButton.heightAnchor.constraint(equalToConstant: 50)
        ])
    }
    
    // MARK: - Capture High-Resolution Frame with Session Switching

    @objc func captureHiResFrame(_ sender: UIButton) {
        // Ensure iOS 17+ is available
        guard #available(iOS 17.0, *), ARFaceTrackingConfiguration.isSupported else {
            print("High-resolution capture requires iOS 17+ and a supported device.")
            return
        }
        
        // Ensure ARSession is running and has at least one frame
        guard let currentFrame = sceneView.session.currentFrame else {
            print("ARSession is not running or no frame is available.")
            return
        }
        
        // Convert ARFrame's captured image to UIImage
        let arImage = convertPixelBufferToUIImage(pixelBuffer: currentFrame.capturedImage)
        capturedARFrameImage = arImage // Store the ARFrame image
        
        // Record the start time
        let startTime = Date()
        
        // Pause the ARSession on the main thread
        DispatchQueue.main.async { [weak self] in
            self?.sceneView.session.pause()
        }
        
        // Start the AVCaptureSession on a background thread
        captureSessionQueue.async { [weak self] in
            guard let self = self else { return }
            
            // Start running the capture session
            self.captureSession?.startRunning()
            
            // Wait until the capture session is running
            let maxWaitTime: TimeInterval = 2.0 // Maximum wait time in seconds
            let pollingInterval: TimeInterval = 0.05 // Polling interval in seconds
            var waitedTime: TimeInterval = 0.0
            
            while !(self.captureSession?.isRunning ?? false) && waitedTime < maxWaitTime {
                Thread.sleep(forTimeInterval: pollingInterval)
                waitedTime += pollingInterval
            }
            
            if self.captureSession?.isRunning == true {
                // Configure photo settings
                let photoSettings = AVCapturePhotoSettings()
                photoSettings.isHighResolutionPhotoEnabled = true
                
                // Set up the capture completion handler
                self.captureCompletion = { [weak self] image, error in
                    guard let self = self else { return }
                    
                    // Stop the AVCaptureSession
                    self.captureSession?.stopRunning()
                    
                    // Resume the ARSession on the main thread
                    DispatchQueue.main.async {
                        if let configuration = self.sceneView.session.configuration {
                            self.sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
                        }
                        
                        // Record the end time
                        let endTime = Date()
                        let timeInterval = endTime.timeIntervalSince(startTime)
                        
                        // Print the time taken
                        print("Time taken to capture image: \(timeInterval) seconds")
                        
                        if let error = error {
                            print("Error capturing image: \(error)")
                            return
                        }
                        
                        if let image = image {
                            // Show Activity Indicator
                            DispatchQueue.main.async {
                                self.activityIndicator.startAnimating()
                            }
                            
                            // Save the image to the photo library
                            UIImageWriteToSavedPhotosAlbum(image, self, #selector(self.image(_:didFinishSavingWithError:contextInfo:)), nil)
                            print("Successfully captured and saved AVCapture image with size: \(image.size.width)x\(image.size.height)")
                            
                            // Optionally, save the ARFrame image as well
                            if let arImage = self.capturedARFrameImage {
                                UIImageWriteToSavedPhotosAlbum(arImage, self, #selector(self.image(_:didFinishSavingWithError:contextInfo:)), nil)
                                print("Successfully saved ARSession frame image with size: \(arImage.size.width)x\(arImage.size.height)")
                            }
                        } else {
                            print("Failed to capture AVCapture image.")
                        }
                    }
                }
                
                // Capture the photo
                self.photoOutput?.capturePhoto(with: photoSettings, delegate: self)
            } else {
                print("Capture session failed to start within \(maxWaitTime) seconds.")
                
                // Optionally, inform the user about the failure
                DispatchQueue.main.async { [weak self] in
                    self?.displayErrorMessage(title: "Capture Failed", message: "Unable to start the camera for capturing the image. Please try again.")
                }
            }
        }
    }
    
    // MARK: - Convert CVPixelBuffer to UIImage
    
    func convertPixelBufferToUIImage(pixelBuffer: CVPixelBuffer) -> UIImage? {
        let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
        let context = CIContext()
        if let cgImage = context.createCGImage(ciImage, from: ciImage.extent) {
            let uiImage = UIImage(cgImage: cgImage)
            return uiImage
        }
        return nil
    }
    
    // MARK: - Handle Image Save Completion
    
    @objc func image(_ image: UIImage, didFinishSavingWithError error: Error?, contextInfo: UnsafeRawPointer) {
        if let error = error {
            // Handle the error
            print("Error saving image: \(error.localizedDescription)")
            
            // Optionally, inform the user
            DispatchQueue.main.async { [weak self] in
                let alert = UIAlertController(title: "Save Error", message: error.localizedDescription, preferredStyle: .alert)
                alert.addAction(UIAlertAction(title: "OK", style: .default))
                self?.present(alert, animated: true)
            }
        } else {
            // Image saved successfully
            print("Image saved to photo library successfully.")
            
            // Optionally, inform the user
            DispatchQueue.main.async { [weak self] in
                let alert = UIAlertController(title: "Saved!", message: "Your images have been saved to your photo library.", preferredStyle: .alert)
                alert.addAction(UIAlertAction(title: "OK", style: .default))
                self?.present(alert, animated: true)
            }
        }
        
        // Stop Activity Indicator
        DispatchQueue.main.async { [weak self] in
            self?.activityIndicator.stopAnimating()
        }
    }
}

// MARK: - UITabBarDelegate

extension ViewController: UITabBarDelegate {
    func tabBar(_ tabBar: UITabBar, didSelect item: UITabBarItem) {
        guard let contentType = VirtualContentType(rawValue: item.tag)
            else { fatalError("unexpected virtual content tag") }
        selectedVirtualContent = contentType
    }
}

// MARK: - ARSCNViewDelegate

extension ViewController: ARSCNViewDelegate {
    
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor else { return }
        
        // If this is the first time with this anchor, get the controller to create content.
        // Otherwise (switching content), will change content when setting `selectedVirtualContent`.
        DispatchQueue.main.async {
            let contentController = self.selectedVirtualContent.makeController()
            if node.childNodes.isEmpty, let contentNode = contentController.renderer(renderer, nodeFor: faceAnchor) {
                node.addChildNode(contentNode)
                self.faceAnchorsAndContentControllers[faceAnchor] = contentController
            }
        }
    }
    
    /// - Tag: ARFaceGeometryUpdate
    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor,
              let contentController = faceAnchorsAndContentControllers[faceAnchor],
              let contentNode = contentController.contentNode else {
            return
        }
        
        contentController.renderer(renderer, didUpdate: contentNode, for: anchor)
    }
    
    func renderer(_ renderer: SCNSceneRenderer, didRemove node: SCNNode, for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor else { return }
        
        faceAnchorsAndContentControllers[faceAnchor] = nil
    }
}

// MARK: - AVCapturePhotoCaptureDelegate

extension ViewController: AVCapturePhotoCaptureDelegate {
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        if let error = error {
            captureCompletion?(nil, error)
            return
        }
        
        guard let photoData = photo.fileDataRepresentation(),
              let image = UIImage(data: photoData) else {
            captureCompletion?(nil, NSError(domain: "PhotoCapture", code: -1, userInfo: [NSLocalizedDescriptionKey: "Failed to convert photo data to UIImage."]))
            return
        }
        
        captureCompletion?(image, nil)
    }
}

Since we only need the face orientation and face landmarks from ARKit, we looked into other methods to get this data and found the Vision framework. Are there other options?

How would you detect a head pose with the back camera?
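
For context, the Vision-based direction we are considering looks roughly like this (a sketch only; it works on frames from either camera, and the pitch value needs iOS 15+):

import Vision
import CoreVideo

// Sketch, assuming Vision's coarse pose estimate is sufficient for our needs:
// estimate head pose and landmarks from a single camera frame without ARKit.
func detectHeadPose(in pixelBuffer: CVPixelBuffer,
                    orientation: CGImagePropertyOrientation) {
    let request = VNDetectFaceLandmarksRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: orientation,
                                        options: [:])
    do {
        try handler.perform([request])
        guard let face = request.results?.first else { return }
        // Coarse head pose angles in radians.
        print("roll:", face.roll ?? 0, "yaw:", face.yaw ?? 0)
        if #available(iOS 15.0, *) {
            print("pitch:", face.pitch ?? 0)
        }
        // 2D landmarks in normalized image coordinates.
        if let points = face.landmarks?.allPoints?.normalizedPoints {
            print("landmark count:", points.count)
        }
    } catch {
        print("Vision request failed:", error)
    }
}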

Thank you in advance; this is a really hot topic on our side.
