How to extract the stereo image pair from spatial photos generated by visionOS 2.0

Hi,

My app allows users to share and view spatial photos.

For viewing spatial photos, I'm using a plane in a RealityView that has a camera index switch material node, which takes the stereo images as the inputs.

For sharing native spatial photos taken on Apple Vision Pro, prior to visionOS 2.0, I extracted the stereo image pair and merged them into a single side-by-side image to upload to the app's backend.

However, since visionOS 2.0 introduced generating spatial photos from normal photos, I've been seeing unexpected behaviours in my app, even though the same photos can be viewed correctly in the system Photos app:

  • Sometimes the extracted images have different sizes; the right image is smaller than the left image. See the first image in the Google Drive folder below, taken with an iPhone 15 Pro.
  • Even when the image pair has the same size, viewing it in my app shows artefacts, especially around the edges of objects that are close to the camera. See the second image in the Google Drive folder below, taken with an iPhone 11.

Google Drive link here: https://drive.google.com/drive/folders/1UTfpxvO3-ChqshwfyzY5E_KCgk8VgUaa

I know that the Quick Look preview application now supports viewing spatial photos, but I would like to keep my existing implementation for compatibility reasons.

Below is a code snippet that deals with the extraction. Please point out the correct way to extract a stereo image pair from a generated spatial photo.

Happy to submit a code-level support request if more information is needed.

// The data comes from a PhotosPicker item; both loadTransferable and
// CGImageSourceCreateWithData return optionals, so unwrap them.
guard let data = try await photo.loadTransferable(type: Data.self),
      let source = CGImageSourceCreateWithData(data as CFData, nil)
else { return }
let sbsImage = source.extractSpatialPhoto()

import UIKit
import ImageIO

extension CGImageSource {
    func extractSpatialPhoto() -> UIImage? {
        guard let leftCIImage = extractSpatialImage(at: 0),
              let rightCIImage = extractSpatialImage(at: 1)
        else {
            return nil
        }

        let leftImage = UIImage(ciImage: leftCIImage)
        let rightImage = UIImage(ciImage: rightCIImage)

        guard leftImage.size == rightImage.size else {
            return nil
        }

        // merge left + right
        let size = CGSize(width: leftImage.size.width * 2, height: leftImage.size.height)
        UIGraphicsBeginImageContextWithOptions(size, true, 1.0)

        leftImage.draw(in: CGRect(x: 0, y: 0, width: leftImage.size.width, height: leftImage.size.height))
        rightImage.draw(in: CGRect(x: leftImage.size.width, y: 0, width: rightImage.size.width, height: rightImage.size.height))

        let mergedImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()

        return mergedImage
    }

    // not sure if this actually works
    func extractSpatialImage(at index: Int) -> CIImage? {
        guard let cgImage = CGImageSourceCreateImageAtIndex(self, index, nil) else {
            return nil
        }

        var ciImage = CIImage(cgImage: cgImage)

        if let properties = CGImageSourceCopyPropertiesAtIndex(self, index, nil) as? [String: Any],
           let heifDictionary = properties[kCGImagePropertyHEIFDictionary as String] as? [String: Any],
           let extrinsics = heifDictionary[kIIOMetadata_CameraExtrinsicsKey as String] as? [String: Any],
           let position = extrinsics[kIIOCameraExtrinsics_Position as String] as? [Double]
        {
            // Default baseline is 64mm (0 for left camera, 0.064m for right camera)
            let standardBaseline = 0.064

            // Check if it's the right image (should be at [0.064, 0, 0])
            let isRightImage = (index == 1)
            let expectedPosition = isRightImage ? standardBaseline : 0.0

            // Calculate the translation needed to align to standard baseline
            let positionDelta = position[0] - expectedPosition

            // Apply translation only if there's a mismatch in position
            if positionDelta != 0 {
                let transform = CGAffineTransform(translationX: CGFloat(positionDelta), y: 0)
                ciImage = ciImage.transformed(by: transform)
            }
        }

        return ciImage
    }
}

Answered by Vision Pro Engineer in 807332022

Hello! The left-eye and right-eye images in a CGImageSource for a spatial photo are not always at indexes 0 and 1. For example, spatial photos created by the Create Spatial feature in visionOS, or captured on iPhone 16, contain three images, not two. The image at index 0 is a higher-quality monoscopic image for display on non-stereo-aware platforms.

The code below shows how to discover the correct image indexes to use for the left and right eyes:

import ImageIO

extension CGImageSource {

    /// Returns a tuple containing the left-eye and right-eye image indexes for a stereo or spatial HEIC.
    ///
    /// Returns nil if the image is not stereo.
    func stereoImageIndexes() -> (left: Int, right: Int)? {

        // Extract container-level metadata.
        guard let sourceProperties = CGImageSourceCopyProperties(self, nil) as? [String: Any] else { return nil }

        // Check for any groups.
        guard let groups = sourceProperties[kCGImagePropertyGroups as String] as? [[String: Any]], !groups.isEmpty else { return nil }

        // Find the first stereo pair group, if one exists.
        guard let groupDictionary = groups.first(where: { groupDictionary in
            guard let groupType = groupDictionary[kCGImagePropertyGroupType as String] as? String, groupType == kCGImagePropertyGroupTypeStereoPair as String else { return false }
            return true
        }) else { return nil }

        // Retrieve the left and right indexes.
        guard let leftIndex = groupDictionary[kCGImagePropertyGroupImageIndexLeft as String] as? Int else { return nil }
        guard let rightIndex = groupDictionary[kCGImagePropertyGroupImageIndexRight as String] as? Int else { return nil }

        return (leftIndex, rightIndex)

    }

}
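Putting the two pieces together: instead of hard-coding indexes 0 and 1 as in the original `extractSpatialPhoto()`, resolve the indexes via `stereoImageIndexes()` first and then extract each eye's image by its discovered index. This is a minimal sketch; the function name `extractStereoPair` is hypothetical and the `stereoImageIndexes()` helper is the extension from the answer above.

```swift
import Foundation
import CoreGraphics
import ImageIO

// Hypothetical wiring of the answer's helper into the original flow:
// discover the left/right indexes, then extract each eye's CGImage.
func extractStereoPair(from data: Data) -> (left: CGImage, right: CGImage)? {
    guard let source = CGImageSourceCreateWithData(data as CFData, nil),
          // stereoImageIndexes() is the CGImageSource extension shown above;
          // it returns nil for non-stereo images (e.g. plain photos).
          let indexes = source.stereoImageIndexes(),
          let left = CGImageSourceCreateImageAtIndex(source, indexes.left, nil),
          let right = CGImageSourceCreateImageAtIndex(source, indexes.right, nil)
    else {
        return nil
    }
    return (left, right)
}
```

With the indexes resolved this way, a spatial photo that also carries a monoscopic image at index 0 (such as one created by Create Spatial or captured on iPhone 16) yields the correct eye images, which should also resolve the size mismatch described in the question.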