Explore ARKit 4

ARKit 4 enables you to build the next generation of augmented reality apps to transform how people connect with the world around them. We'll walk you through the latest improvements to Apple's augmented reality platform, including how to use Location Anchors to connect virtual objects with a real-world longitude, latitude, and altitude. Discover how to harness the LiDAR Scanner on iPad Pro and obtain a depth map of your environment. And learn how to track faces in AR on more devices, including the iPad Air (3rd generation), iPad mini (5th generation), and all devices with the A12 Bionic chip or later that have a front-facing camera.

To get the most out of this session, you should be familiar with how your apps can take advantage of LiDAR Scanner on iPad Pro. Watch “Advanced Scene Understanding in AR” for more information.

Once you've learned how to leverage ARKit 4 in your iOS and iPadOS apps, explore realistic rendering improvements in “What's New in RealityKit” and other ARKit features like People Occlusion and Motion Capture with “Introducing ARKit 3”.

Related Videos

WWDC21
- Explore ARKit 5
WWDC19
- Introducing ARKit 3
- Introducing RealityKit and Reality Composer
♪ Voiceover: Hello, and welcome to WWDC.
Quinton: Hi, my name's Quinton, and I'm an engineer on the ARKit team.
Today both Praveen and I get to show you some of the new features in ARKit with iOS 14.
So let's jump right in and explore ARKit 4.
This release adds many advancements to ARKit which already powers the world's largest AR platform, iOS.
ARKit gives you the tools to create AR experiences that change the way your users see the world.
Some of these tools include device motion tracking, camera scene capture, and advanced scene processing which all help to simplify the task of building a realistic and immersive AR experience.
Let's see what's next with ARKit.
So first, we're going to take a look at the location anchor API.
Location anchors bring your AR experience onto the global scale by allowing you to position virtual content in relation to the globe.
Then we'll see what the new LiDAR sensor brings to ARKit with scene geometry.
Scene geometry provides apps with a mesh of the surrounding environment that can be used for everything from occlusion to lighting.
Next we'll look at the technology that enables scene geometry, the depth API.
We're opening up this API to give apps access to a dense depth map to enable new possibilities using the LiDAR sensor.
And additionally, the LiDAR sensor improves object placement.
We'll go over some best practices to make sure your apps take full advantage of the newest object-placement techniques.
And we'll wrap up with some improvements to FaceTracking.
Let's start with location anchors.
Before we get too far, let's look at how we got to this point.
ARKit started on iOS with the best tracking.
No QR codes. No external equipment needed.
Just start an AR experience by placing content around you.
Then we added multi-user experiences.
Your AR content could then be shared with a friend using a separate device to make experiences social.
And last year we brought people into ARKit.
AR experiences are now aware of the people on the scene.
Motion capture is possible with just a single iOS device, and people occlusion makes AR content even more immersive as people can walk right in front of a virtual object.
All these features combine to make some amazing experiences.
But what's next? So now we're bringing AR into the outdoors with location anchors.
Location anchors enable you to place AR content in relation to the globe.
This means you can now place virtual objects and create AR experiences by specifying a latitude, longitude, and altitude.
ARKit will take your geographic coordinates as well as high-resolution map data from Apple Maps to place your AR experiences at the specific world location.
This whole process is called visual localization, and it will precisely locate your device in relation to the surrounding environment more accurately than could be done before with just GPS.
All this is possible due to advanced machine learning techniques running right on your device.
There's no processing in the cloud and no images sent back to Apple.
ARKit also takes care of merging the local coordinate system to the geographic coordinate system, so you can work in one unified system, regardless of how you want to create your AR experiences.
To access these features, we've added a new configuration, ARGeoTrackingConfiguration, and ARGeoAnchors are what you'll use to place content, the same way as other ARKit anchors.
Let's see some location anchors in action.
We've got a video here in front of the Ferry Building in San Francisco.
You can see a large virtual sculpture.
That's actually the Companion sculpture created by KAWS, viewed in the Acute Art app.
Since it was placed with location anchors, everyone who uses the app at the Ferry Building can enjoy the virtual art in the same place and the same way.
Let's see what's under the hood in ARKit to make this all work.
So when using geo tracking, we download all the detailed map data from Apple Maps around your current location.
Part of this data is a localization map that contains feature points of the surrounding area that can be seen from the street.
Then with the localization map, your current location, and images from your device, we can use advanced machine learning to visually localize and determine your device's position.
All this is happening under the hood in ARKit to give you a precise, globally-aware pose without worrying about any of this complexity.
The location anchor API can be broken down into three main parts.
ARGeoTrackingConfiguration is the configuration that you'll use to take advantage of all of the new location anchor features.
This configuration contains a subset of the world-tracking features that are compatible with geo tracking.
Then once you've started an AR session with the geo-tracking configuration, you'll be able to create ARGeoAnchors just like any other ARKit anchor.
And also while using geo tracking, there's a new tracking status that's important to monitor.
This is contained in ARGeoTrackingStatus and provides valuable feedback to improve the geo-tracking experience.
So building an app with location anchors can be broken down into a few steps.
The first is checking availability of geo tracking.
ARGeoTrackingConfiguration has a few methods that let us check the preconditions using the rest of the location anchor API.
Then location anchors can be added once you know there's full geo-tracking support.
And after anchors are added, we can use a rendering engine to place virtual content.
We'll then need to take care of geo-tracking transitions.
Once started, geo tracking will move through a few states that may need some user intervention to ensure the best geo-tracking experience.
So let's build a simple point of interest app to see what these steps look like in practice.
In our app, we're going to start with helping our users find the iconic Ferry Building in San Francisco, California.
As you can see, we've placed a sign to make the building easy to spot.
To begin the app, let's first start with checking availability.
As with many ARKit features, we need to make sure the current device is supported before attempting to start an experience.
Location anchors are available on devices with an A12 Bionic chip or newer, as well as GPS.
ARGeoTrackingConfiguration's isSupported class method should be used to check for this support.
For geo tracking, we also need to check if the current location is supported.
We need to be in a location that has all the required maps data to localize.
The geo tracking configuration has a method to check your current location support as well as an arbitrary latitude and longitude.
Additionally, once a geo tracking session is started, ARKit will ask a user for permission for both camera and location.
ARKit has always asked for camera permission, but location permission is needed to do geo tracking.
Let's see what this looks like in code.
ARGeoTrackingConfiguration has all the class methods that we need to check before starting our AR session.
We'll first check if the current device is supported with "isSupported," and then we'll check if our current location is available for geo tracking with "checkAvailability." If this check fails, we'll get an error with more info to display to the user.
For example, if the user hasn't given the app location permissions.
Then once we know our current device and location are supported, we can go ahead and start the session.
Since we're using RealityKit, we'll need our ARView and then update the configuration.
By default, ARView uses a world-tracking configuration, and so we need to pass in a GeoTrackingConfiguration when running the session.
The next step is adding a location anchor.
To do this, we'll use the new ARAnchor subclass ARGeoAnchor.
geoAnchors are similar to existing ARKit anchors in many ways.
However, because geoAnchors operate on global coordinates, we can't create them with just transforms.
We need to specify their geographic coordinates with latitude, longitude, and altitude.
The most common way to create geoAnchors will be by specifying just latitude and longitude, which allows ARKit to fill in the altitude based on maps data of the ground level.
Let's now add a location anchor to our point of interest app.
So for our app, we need to start by finding the Ferry Building's location.
One way we can get the latitude and longitude is through the Maps app.
When we place a marker in the Maps app, we now get up to six digits of precision after the decimal point.
It's important to use six or more digits so that we get a precise location to place our contents.
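As a rough check on why six digits matter: one degree of latitude spans roughly 111 km, so each extra decimal place shrinks the smallest representable step by a factor of ten. The sketch below works through that arithmetic. The constant is an approximation and the function name is illustrative; neither is part of ARKit.

```swift
import Foundation

// Approximate length of one degree of latitude in meters (assumed value).
let approximateMetersPerDegreeOfLatitude = 111_320.0

/// Smallest north-south step, in meters, that a latitude written with the
/// given number of decimal places can express.
func latitudeStepInMeters(decimalPlaces: Int) -> Double {
    return approximateMetersPerDegreeOfLatitude / pow(10.0, Double(decimalPlaces))
}
```

With six decimal places the step is about 0.11 m, while four decimal places would leave a step of roughly 11 m, which is far too coarse for placing a sign in front of a building.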
Once we have a latitude and longitude, we can make a geoAnchor.
We don't need to specify an altitude because we'll let ARKit use maps data to determine the elevation of the ground level.
Then we'll add the geoAnchor to our session.
And since we're using RealityKit to render our virtual content, and we've already created our geoAnchor, we can go ahead and attach the anchor to an entity to mark the Ferry Building.
Let's run our app and see what it looks like.
We'll start near the Ferry Building in San Francisco looking towards Market Street.
And as we pan around, we can see some of the palm trees that line the city.
And soon the Ferry Building will come into view.
Our sign looks to be on the ground, which is expected, but the text is rotated.
Since we'd like to find the Ferry Building easily from a distance, we'd really like to have the sign floating a few meters in the air and facing towards the city.
So how do we do this? Well, to position this content, we need to first look at the coordinate system of the geoAnchor.
geoAnchors are fixed to cardinal directions.
Their axes are set when you create the anchor, and this orientation will remain unchanged for the rest of the session.
A geoAnchor's X-axis is always pointed east, and the Z-axis is always pointed south for any geographic coordinate.
Since we're using a right-handed coordinate system, this leaves positive y pointing up away from the ground.
geoAnchors, like all other ARKit anchors, are immutable.
This means we'll need to use our rendering engine to rotate or translate our virtual objects from the geoAnchor's origin.
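To make the axis convention concrete, here is a minimal sketch that converts an offset expressed as east, north, and up distances into the anchor's local coordinates (X east, Y up, Z south). The type and function names are hypothetical helpers, not ARKit or RealityKit API.

```swift
// Hypothetical value type for a position relative to a geo anchor's origin.
struct AnchorLocalOffset {
    var x: Double  // meters toward east
    var y: Double  // meters up, away from the ground
    var z: Double  // meters toward south
}

/// Converts east/north/up distances in meters into the anchor's local axes.
/// North maps to negative Z because the Z-axis points south.
func anchorLocalOffset(east: Double, north: Double, up: Double) -> AnchorLocalOffset {
    return AnchorLocalOffset(x: east, y: up, z: -north)
}
```

For example, content that should float 35 meters above a point 10 meters north of the anchor ends up at x = 0, y = 35, z = -10, and those values could then be handed to the rendering engine as a position relative to the anchor entity.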
Let's clean up our sign that we placed in front of the Ferry Building.
So here's some RealityKit code to start updating our sign.
After getting the signEntity and adding it to the geoAnchor entity, we want to rotate the sign towards the city.
To do this, we'll rotate it by a little less than 90 degrees clockwise and we'll elevate the sign's position by 35 meters.
Both of these operations are in relation to the geoAnchor entity that we had previously created.
Let's see what this looks like in the app.
Now when we pan around and we get to our Ferry Building, our sign is high in the air, and we can see it from a distance.
The text is much easier to read in this orientation.
This looks great, but we're missing some crucial information here about the geo-tracking state that we can use to guide the user to the best geo-tracking experience.
When using a GeoTrackingConfiguration, there's a new GeoTrackingStatus object that's available on ARFrame and ARSessionObserver.
ARGeoTrackingStatus encapsulates all the current state information of geo tracking, similar to the world-tracking information that's available on ARCamera.
Within geo tracking status, there is a state.
This state indicates how far along geo tracking is during localization.
There's also a property that provides more information about the current localization state, called geoTrackingStateReason, and there's an accuracy provided once geo tracking localizes.
Let's take a closer look at the geo tracking state.
When an AR session begins, geoTrackingState starts at initializing.
At this point, geo tracking is waiting for the world tracking to initialize.
From initializing, the tracking state can immediately go to not available if geo tracking isn't supported in the current location.
If you're using the checkAvailability class method on ARGeoTrackingConfiguration, you should rarely get into this state.
Once geo tracking moves to localizing, ARKit is receiving images as well as maps data and is trying to compute pose.
However, during both the initializing and localizing states, there could be issues detected that prevent localization.
These issues are communicated through geoTrackingStateReason.
This reason should be used to inform the user how to help geo tracking localize.
Some possible reasons include the device is pointed too low, which would then inform the user to raise the device, or geoDataNotLoaded, and we'd inform the user that a network connection is required.
For all possible reasons, have a look at ARGeoTrackingTypes.h.
In general we want to encourage users to point their devices at buildings and other stationary structures that are visible from the street.
Parking lots, open fields, and other environments that dynamically change have a lower chance of localizing.
After addressing any geoTrackingStateReasons, geo tracking should become localized.
It's at this point that you should start your AR experience.
If you place objects before localization, the objects could jump to unintended locations.
Additionally, once localized, ARGeoTrackingAccuracy is provided to help you gauge what experiences should be enabled.
It's also important to always monitor geo tracking state as it's possible for geo tracking to move back to localizing or even initializing such as when tracking is lost or map data isn't available.
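The state handling described here can be sketched as a small decision function. The enums below are hypothetical stand-ins that mirror only the states and reasons named in the talk; a real app would read the corresponding values from ARGeoTrackingStatus, and the message strings are placeholders.

```swift
// Hypothetical stand-ins for the states and reasons mentioned in the talk.
enum SketchGeoTrackingState { case notAvailable, initializing, localizing, localized }
enum SketchGeoTrackingReason { case none, devicePointedTooLow, geoDataNotLoaded }

/// Content should only be placed once geo tracking has localized.
func canPlaceContent(in state: SketchGeoTrackingState) -> Bool {
    return state == .localized
}

/// Returns a coaching message for the user, or nil when no guidance is needed.
func coachingMessage(state: SketchGeoTrackingState,
                     reason: SketchGeoTrackingReason) -> String? {
    if state == .localized { return nil }
    if state == .notAvailable { return "Geo tracking is not available here." }
    // Still initializing or localizing: the reason tells the user how to help.
    switch reason {
    case .devicePointedTooLow:
        return "Raise your device and point it at nearby buildings."
    case .geoDataNotLoaded:
        return "A network connection is required to load map data."
    case .none:
        return "Localizing..."
    }
}
```

Because the state can fall back from localized to localizing or initializing, a function like this would be re-evaluated on every status change rather than once at startup.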
Let's take a look at how we can add this tracking state to improve our sample app.
Now we can see this whole time we were actually localizing when looking at Market Street and the surrounding buildings.
As we pan around, we can see from the tracking state that we localize and then the accuracy increases to high.
I think we've got our app just about ready, at least for the Ferry Building.
So we've added a more expansive location anchor sample project on developer.apple.com that I encourage you to check out after this talk.
For more information on the RealityKit features used, check out last year's talk introducing RealityKit and Reality Composer.
In our sample app, we saw how to create location anchors by directly specifying coordinates.
We already knew the geographic coordinates for the Ferry Building.
However, these coordinates could have come from any source, such as our app bundle, our web backend, or really any database.
Another way to create a location anchor is via user interaction.
We could expand on our app in the future by allowing users to tap the screen to save their own point of interest.
getGeoLocation(for point) on ARSession allows us to get geographic coordinates from any world point in ARKit coordinate space.
For example, this could have come from a raycast, or location on a plane.
Location anchors are available for you today with iOS 14, and we're starting with support in the San Francisco Bay Area, New York, Los Angeles, Chicago, and Miami, with more cities coming through the summer.
All iPhones and iPads with an A12 Bionic chip or newer, as well as GPS, are supported.
Also, for any apps that require location anchors exclusively, you can use device capability keys to limit your app in the App Store to only compatible hardware.
In addition to the GPS key, you'll need to use the new key for devices with an A12 Bionic chip or newer that is available in iOS 14.
So with location anchors, you can now bring your AR experiences onto the global scale.
We went over how ARGeoTrackingConfiguration is the entry point to adding location anchors to your app.
We saw how to add ARGeoAnchors to your AR scene and how to position content in relation to those anchors.
We also saw how ARGeoTrackingStatus can be used to help guide the user to the best geo-tracking experience.
And now here's Praveen to tell you more about scene geometry.
Praveen Gowda: Hi everyone.
I'm Praveen Gowda. I'm an engineer on the ARKit team.
Today I'm going to take you through some of the APIs available in iOS 14 that help bring the power of the LiDAR scanner to your applications.
In ARKit 3.5, we introduced the scene geometry API powered by the LiDAR scanner on the new iPad Pro.
Before we go into scene geometry, let's take a look at how the LiDAR scanner works.
The LiDAR shoots light onto the surroundings and then collects the light reflected off the surfaces in the scene.
The depth is estimated by measuring the time it took for the light to go from the LiDAR to the environment and reflect back to the scanner.
And this entire process runs millions of times every second.
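The time-of-flight relationship described above can be written as a one-line formula: the light covers the distance twice, so depth is half the round-trip time multiplied by the speed of light. The sketch below is only the textbook arithmetic with an illustrative function name; it says nothing about how the scanner hardware actually measures time.

```swift
// Speed of light in a vacuum, in meters per second.
let speedOfLightMetersPerSecond = 299_792_458.0

/// Distance to a surface given the round-trip travel time of a light pulse.
/// The pulse travels to the surface and back, hence the division by two.
func depthInMeters(roundTripSeconds: Double) -> Double {
    return speedOfLightMetersPerSecond * roundTripSeconds / 2
}
```

A round trip of 20 nanoseconds corresponds to a surface about 3 meters away, which gives a sense of the timing precision involved at room scale.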
The LiDAR scanner is used by the scene geometry API to provide a topological map of the environment.
This can be optionally fused with semantic classification, which enables apps to recognize and classify physical objects.
This provides an opportunity for creating richer AR experiences where apps can now occlude virtual objects with the real world or use physics to enable realistic interactions between virtual and physical objects.
Or they can use virtual lighting on real-world surfaces, among many other use cases that we have yet to imagine.
Let's take a quick look at scene geometry in action.
Here is a living room and once the scene geometry API is turned on the entire visible room is meshed.
Triangles vary in size to show the optimum detail for each surface.
The colored mesh appears once semantic classification is enabled.
Each color represents a different classification such as blue for the seats and green for the floor.
As we saw, the scene geometry feature is built by leveraging the depth data gathered from the LiDAR scanner.
In iOS 14, we have a new ARKit depth API that provides access to the same depth data.
The API provides a dense depth image where a pixel in the image corresponds to depth in meters from the camera.
What we see here is a debug visualization of this depth where there's a gradient from blue to red, where blue represents regions closer to the camera and red represents those far away.
The depth data is available at 60 Hz, associated with each AR frame.
The scene geometry feature is built on top of this API where depth data across multiple frames are aggregated and processed to construct a 3D mesh.
This API is powered by the LiDAR scanner and is available on devices which have LiDAR.
Here is an illustration of how the depth map is generated.
The colored RGB image from the wide-angle camera and the depth readings from the LiDAR scanner are fused together using advanced machine learning algorithms to create a dense depth map that is exposed through the API.
This operation runs 60 times per second, with the depth map available on every AR frame.
To access the depth data, each AR frame will have a new property called sceneDepth.
This provides an object of type, ARDepthData.
ARDepthData is a container for two buffers.
One is a depthMap and the other is a confidenceMap.
The depthMap is a CVPixelBuffer in which each pixel represents depth in meters, and this depth corresponds to the distance from the plane of the camera to a point in the world.
One thing to note is that the depth map is smaller in resolution than the captured image on the AR frame, while still preserving the same aspect ratio.
The other buffer on the ARDepthData object is the confidenceMap.
Since the measurement of depth using LiDAR is based on the light which reflects from objects, the accuracy of the depth map can be impacted by the nature of the surrounding environment.
Challenging surfaces, such as those which are highly reflective or those with high absorption, can lower the accuracy of the depth.
This accuracy is expressed through a value we call confidence.
For each depth pixel, there is a corresponding confidence value of type ARConfidenceLevel, and this value can either be low, medium, or high and will help to filter depth based on the requirements of your application.
Let's see how we can use the depth API.
I begin by creating an ARSession and an ARWorldTrackingConfiguration.
There is a new frame semantic called sceneDepth, which allows you to turn on the depth API.
As always, I check if the frameSemantic is supported on the device using the supportsFrameSemantics method on the configuration class.
Then we can set the frameSemantic to sceneDepth and run the configuration.
After this, I can access the depth data from the sceneDepth property on ARFrame using the didUpdate frame delegate method.
Additionally, if you have an AR app that uses the people occlusion feature and thereby sets the personSegmentationWithDepth frameSemantic, then you will automatically get sceneDepth on devices that support the sceneDepth frameSemantic, with no additional power cost to your application.
Here is a demo of an app that we built using the depth API.
The depth from the depthMap is unprojected to 3D to form a point cloud.
The point cloud is colored using the captured image on the ARFrame.
By accumulating depth data across multiple AR frames, you get a dense 3D point cloud like the one we see here.
I can also filter the point clouds based on the confidence level.
This is the point cloud formed by all the depth pixels including those with low confidence.
And here is the point cloud when filtering for depth whose confidence is medium or high.
And this is the point cloud we get by using only the depth that has high confidence.
This gives us a clear picture of how the physical properties of surfaces can impact the confidence level of their depth.
Your application and its tolerance to inaccuracies in depth will determine how you will filter the depth based on its confidence level.
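The confidence filtering shown in the demo amounts to keeping only the depth samples at or above a chosen level. Here is a minimal sketch over plain Swift values; the enum and struct are hypothetical stand-ins for ARConfidenceLevel and for pixels read out of the depthMap and confidenceMap buffers, which a real app would have to read from the pixel buffers itself.

```swift
// Hypothetical stand-in for the three confidence levels named in the talk.
enum SketchConfidence: Int, Comparable {
    case low = 0, medium = 1, high = 2

    static func < (lhs: SketchConfidence, rhs: SketchConfidence) -> Bool {
        return lhs.rawValue < rhs.rawValue
    }
}

// One depth pixel paired with its confidence value.
struct DepthSample {
    var depthInMeters: Float
    var confidence: SketchConfidence
}

/// Keeps only the samples whose confidence is at least `minimum`.
func filterSamples(_ samples: [DepthSample],
                   minimum: SketchConfidence) -> [DepthSample] {
    return samples.filter { $0.confidence >= minimum }
}
```

Passing `.low` keeps every sample, `.medium` drops only the low-confidence ones, and `.high` gives the sparsest but most trustworthy point cloud, matching the three views in the demo.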
Let's take a closer look at how we built this app.
For each ARFrame, we access the sceneDepth property, which holds the ARDepthData object providing us with the depthMap and the confidenceMap.
The key part of the app is a metal vertex shader called unproject.
As the name suggests, it unprojects the depth data from the depth map to 3D space using parameters on the ARCamera such as the camera's transform, its intrinsics, and the projection matrix.
The shader also uses captured image to sample color for each depth pixel.
What we get as an output of this is the 3D point cloud which is then rendered using Metal.
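The core of unprojection can be sketched on the CPU with the standard pinhole camera model. This is not the shader from the demo: it is an illustrative version in plain Swift with hypothetical type and function names, it works in the image convention (x right, y down, z forward), and it stops at camera space, so the camera transform would still have to be applied to reach world space. Because the depth map is smaller than the captured image, the intrinsics are first scaled to the depth map's resolution; the scaling ignores sub-pixel offsets.

```swift
// Pinhole intrinsics: focal lengths and principal point, in pixels.
struct PinholeIntrinsics {
    var fx: Double
    var fy: Double
    var cx: Double
    var cy: Double
}

// A 3D point in camera space, in meters.
struct CameraSpacePoint {
    var x: Double
    var y: Double
    var z: Double
}

/// Rescales intrinsics given for the captured image to the depth map.
/// Assumes both images share the same aspect ratio.
func scaleIntrinsics(_ k: PinholeIntrinsics,
                     imageWidth: Int, depthWidth: Int) -> PinholeIntrinsics {
    let s = Double(depthWidth) / Double(imageWidth)
    return PinholeIntrinsics(fx: k.fx * s, fy: k.fy * s, cx: k.cx * s, cy: k.cy * s)
}

/// Unprojects depth-map pixel (u, v) with a depth measured from the camera plane.
func unprojectPixel(u: Double, v: Double, depth: Double,
                    intrinsics k: PinholeIntrinsics) -> CameraSpacePoint {
    let x = (u - k.cx) * depth / k.fx
    let y = (v - k.cy) * depth / k.fy
    return CameraSpacePoint(x: x, y: y, z: depth)
}
```

A pixel at the principal point unprojects to a point straight ahead of the camera, and pixels farther from the center spread out in proportion to their depth, which is what turns a flat depth image into a point cloud.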
To summarize, we have a new depth API in ARKit 4 which gives a highly accurate representation of the world.
There is a frame semantic called sceneDepth which allows you to enable the feature.
Once enabled, the depth data will be available at 60 Hz on each AR frame.
The depth data will have a depthMap and a confidenceMap, and the API is supported on devices with the LiDAR scanner.
One of the fundamental tasks in many AR apps is placing objects, and in ARKit 3, we introduced the raycasting API to make object placement easier.
In ARKit 4, the LiDAR scanner brings some great improvements to raycasting.
Raycasting is highly optimized for object placement and makes it easy to precisely place virtual objects in your AR app.
Placing objects in ARKit 4 is more precise and quicker, thanks to the LiDAR scanner.
Your apps that already use raycasting will automatically benefit on a LiDAR-enabled device.
Raycasting also leverages scene depth or scene geometry when available to instantly place objects in AR.
This works great even on featureless surfaces such as white walls.
In iOS 14, the raycast API is recommended over hit-testing for object placement.
Before you start raycasting, you will need to create a raycast query.
A raycast query describes the direction and the behavior of the ray used for raycasting.
It is composed of a raycast target which describes the type of surface that a ray can intersect with.
Existing planes correspond to planes detected by ARKit, while considering the shape and size of the plane.
Infinite planes are the same planes but with the shape and size ignored.
And estimated planes are planes of arbitrary orientation formed from the feature points around the surface.
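The difference between the existing-plane and infinite-plane targets can be illustrated with a small geometric sketch: the same ray is intersected with a horizontal plane, once respecting the plane's extent and once ignoring it. This is only a geometric illustration with hypothetical types, not ARKit's implementation, and it covers horizontal planes only.

```swift
struct Vec3 {
    var x: Double
    var y: Double
    var z: Double
}

// A horizontal rectangle at a given height, centered at (centerX, centerZ).
struct HorizontalPlane {
    var height: Double
    var centerX: Double
    var centerZ: Double
    var extentX: Double
    var extentZ: Double
}

/// Intersects a ray with the plane. When `respectExtent` is true the hit must
/// fall inside the rectangle; when false the plane is treated as infinite.
func intersectRay(origin: Vec3, direction: Vec3,
                  with plane: HorizontalPlane, respectExtent: Bool) -> Vec3? {
    // A ray parallel to the plane never hits it.
    guard direction.y != 0 else { return nil }
    let t = (plane.height - origin.y) / direction.y
    // Hits behind the ray origin do not count.
    guard t > 0 else { return nil }
    let hit = Vec3(x: origin.x + t * direction.x,
                   y: plane.height,
                   z: origin.z + t * direction.z)
    if respectExtent {
        let insideX = abs(hit.x - plane.centerX) <= plane.extentX / 2
        let insideZ = abs(hit.z - plane.centerZ) <= plane.extentZ / 2
        guard insideX && insideZ else { return nil }
    }
    return hit
}
```

A ray that lands just outside a detected tabletop misses when the extent is respected but still returns a hit on the infinite version of the same plane, which is why the choice of target changes where content can be placed.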
The raycast target alignment specifies the alignment of surfaces that a ray can intersect with.
This can be horizontal, vertical, or any.
There are two types of raycasts.
There are single-shot raycasts which return a one-time result.
And then there are tracked raycasts, which continuously update the results as ARKit's understanding of the world evolves.
In order to get the latest object-placement features, we recommend migrating to the raycasting API as we deprecate hit-testing.
The code we see on the top is extracted from a sample app which uses hit-testing to place objects.
It performs a test with three different kinds of hit-test options.
And it is usually followed by some custom heuristics to filter those results and figure out where to place the object.
All of that can be replaced with a few lines of raycasting code like the ones we see below, and ARKit will do the heavy lifting under the hood to make sure that your virtual objects always stay at the right place.
Raycasting makes it easier than ever before to precisely place virtual objects in your ARKit applications.
Let's move over to FaceTracking.
FaceTracking allows you to detect faces in your front camera AR experience, overlay virtual content on them, and animate facial expressions in real time.
This is supported on all devices with the TrueDepth camera.
Now with ARKit 4, FaceTracking support is extended to devices without a TrueDepth camera, as long as they have an Apple A12 Bionic processor or later.
This includes devices without the TrueDepth camera, such as the new iPhone SE.
Elements of FaceTracking, such as face anchors, face geometry, and blendshapes, will be available on all supported devices, but captured depth data will be limited to devices with the TrueDepth camera.
And that is ARKit 4.
With location anchors, you can now bring your AR experiences onto the global scale.
And we looked at how we can use the LiDAR to build rich AR apps using the scene geometry and depth APIs.
There are exciting improvements in raycasting to make object placement in AR easier than ever before.
And finally, FaceTracking is now supported on a wider range of devices.
Thank you.
And we can't wait to check out all the great apps that you will build using ARKit 4.

// Check device support for geo-tracking
guard ARGeoTrackingConfiguration.isSupported else {
    // Geo-tracking not supported on this device
    return
}

// Check current location is supported for geo-tracking
ARGeoTrackingConfiguration.checkAvailability { (available, error) in
    guard available else {
        // Geo-tracking not supported at current location
        return
    }
    // Run ARSession
    let arView = ARView()
    arView.session.run(ARGeoTrackingConfiguration())
}

8:38 - Adding Location Anchors

// Create coordinates
let coordinate = CLLocationCoordinate2D(latitude: 37.795313, longitude: -122.393792)

// Create Location Anchor
let geoAnchor = ARGeoAnchor(name: "Ferry Building", coordinate: coordinate)

// Add Location Anchor to session
arView.session.add(anchor: geoAnchor)

// Create a RealityKit anchor entity 
let geoAnchorEntity = AnchorEntity(anchor: geoAnchor)

// Anchor content under the RealityKit anchor
geoAnchorEntity.addChild(generateSignEntity())

// Add the RealityKit anchor to the scene
arView.scene.addAnchor(geoAnchorEntity)

10:32 - Positioning Content

// Create a new entity for our virtual content
let signEntity = generateSignEntity()

// Add the virtual content entity to the Geo Anchor entity
geoAnchorEntity.addChild(signEntity)

// Rotate text to face the city
let orientation = simd_quatf(angle: -Float.pi / 3.5, axis: SIMD3<Float>(0, 1, 0))
signEntity.setOrientation(orientation, relativeTo: geoAnchorEntity)

// Elevate text to 35 meters above ground level
let position = SIMD3<Float>(0, 35, 0)
signEntity.setPosition(position, relativeTo: geoAnchorEntity)

14:08 - User Interactive Location Anchors

let session = ARSession()
let worldPosition = raycastLocationFromUserTap()
session.getGeoLocation(forPoint: worldPosition) { (location, altitude, error) in
    if let error = error {
        ...
    }
    let geoAnchor = ARGeoAnchor(coordinate: location, altitude: altitude)
}

20:32 - Enabling the Depth API

// Enabling the depth API

let session = ARSession()
let configuration = ARWorldTrackingConfiguration()

// Check if configuration and device supports .sceneDepth
if type(of: configuration).supportsFrameSemantics(.sceneDepth) {
    // Activate sceneDepth
    configuration.frameSemantics = .sceneDepth
}
session.run(configuration)

...

// Accessing depth data
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    guard let depthData = frame.sceneDepth else { return }
    // Use depth data
}

21:12 - Depth API alongside person occlusion

// Using the depth API alongside person occlusion

let session = ARSession()
let configuration = ARWorldTrackingConfiguration()

// Set required frame semantics
let semantics: ARConfiguration.FrameSemantics = .personSegmentationWithDepth
        
// Check if configuration and device supports the required semantics
if type(of: configuration).supportsFrameSemantics(semantics) {
    // Activate .personSegmentationWithDepth
    configuration.frameSemantics = semantics
}
session.run(configuration)

25:41 - Raycasting

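let session = ARSession()

// Before: hit-testing with several result types (being deprecated)
hitTest(point, types: [.existingPlaneUsingGeometry,
                       .estimatedVerticalPlane,
                       .estimatedHorizontalPlane])

// After: one raycast query against estimated planes of any alignment
let query = arView.makeRaycastQuery(from: point,
                                    allowing: .estimatedPlane,
                                    alignment: .any)

// Tracked raycast: the handler runs again as ARKit refines its results
let raycast = session.trackedRaycast(query) { results in
   // result updates
}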
