Are there any tutorials or guides for AR apps?

Hello.
Are there any tutorials or guides to follow on developing AR apps? From what I see, the documentation is mostly a reference.

As someone new to developing AR apps for iOS, I was wondering if there is some documentation that gives an overview of the general approach and structure of AR apps.

Thanks,
Val

Accepted Reply

Hi vlttnv,

You are absolutely correct that Apple's ARKit Developer Documentation will be your best resource as you dive into the world of ARKit and Augmented Reality. I find myself referencing that documentation multiple times per day, and it has certainly become my strongest resource. I can recall having many similar questions when I began exploring building AR apps, and I hope that a few thoughts may be helpful as you continue your journey. With that said, please do refer to Apple's Developer Documentation and sample projects first and foremost.

ARKit


ARKit is the underlying framework that handles the "heavy lifting" of Augmented Reality experiences. ARKit configures the camera, gathers the relevant sensor data, and is responsible for detecting and locating the "anchors" that will tether your 3D content to the real world, as seen through the camera. In a sense, Augmented Reality is all about displaying 3D content in the real world: tethering your content to anchors that are tracked and followed, so that it appears as though it truly is in front of your user. As a whole, ARKit finds those anchors, tracks them, and handles the computations and augmentations that keep your 3D content tethered to them, making the experience seem realistic.

Anchors can come in a variety of forms:

  • Planes - the most common anchors: a horizontal plane, like a floor, table top, or the ground, or a vertical plane, like a wall, window, or door.

  • Faces - a human face.

  • Images - you provide your app an image, and when the camera detects that image, it becomes the "anchor" for your 3D content.

  • Objects - you provide your app a 3D object, and when the camera detects that object in the real world, it becomes the "anchor" for your 3D content.

  • Bodies - for the purposes of tracking the movement of joints and applying that movement to a 3D character.

  • Locations - using ARGeoAnchor, which anchors your 3D content to a specific set of latitude/longitude/altitude coordinates (as a CLLocation from the Core Location framework), if you are in a supported location.

  • Meshes - if your device has a LiDAR scanner, ARKit can detect more nuanced surfaces, such as recognizing a floor plane vs. a table-top plane, or a door plane vs. a wall plane.

In all, your 3D content has to be anchored to something in the real world, and ARKit handles finding these anchors and providing them to you for your use.
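
As a rough sketch of how those anchor families map to code, each one is produced by a corresponding session configuration. The class names below are real ARKit types, though note that image and object detection can also be layered onto a world tracking configuration:

Code Block
import ARKit

// Each anchor family comes from a corresponding session configuration
let world = ARWorldTrackingConfiguration() // ARPlaneAnchor (and ARMeshAnchor on LiDAR devices)
let face  = ARFaceTrackingConfiguration()  // ARFaceAnchor
let image = ARImageTrackingConfiguration() // ARImageAnchor
let body  = ARBodyTrackingConfiguration()  // ARBodyAnchor
let geo   = ARGeoTrackingConfiguration()   // ARGeoAnchor (supported locations only)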

Content Technology


Whereas ARKit handles the heavy lifting of configuring the camera, finding anchors, and tracking those anchors, you have a choice of which Content Technology you will use to actually render your 3D content. The Content Technology is the framework doing the heavy lifting of either loading your 3D model (which you probably created elsewhere, such as in a 3D modeling program or in Reality Composer) or creating 3D content programmatically. There are four main choices for Content Technology:

  • RealityKit - RealityKit was announced at WWDC 2019 and is the newest of the 3D graphics technologies available in iOS. Much like the other 3D technologies available in iOS, RealityKit offers you the ability to load 3D models you may have created in other 3D modeling programs, create 3D content (such as boxes, spheres, text, etc.), and create 3D lights, cameras, and more. As described in the RealityKit documentation, RealityKit allows you to "simulate and render 3D content for use in your augmented reality apps." To your comment, RealityKit complements ARKit; ARKit gathers the information from the camera and sensors, and RealityKit renders the 3D content.

  • SceneKit - SceneKit is another popular choice for working with ARKit, and it is wildly popular in iOS development for generating 3D content. Similar to RealityKit, SceneKit offers the ability to load and create 3D models, handle lighting, reflections, shadows, etc., and it works hand-in-hand with ARKit. SceneKit is also popular in game development, and given that many developers have experience with SceneKit from building 3D games, it is a great way to bring that understanding to the world of Augmented Reality, as many of the same principles from 3D game development apply to AR.

  • SpriteKit - SpriteKit is another popular choice for game development, and its principles, when brought into the world of AR, still apply. SpriteKit is a highly performant framework that traditionally deals in 2D content. Again, this is a hugely popular framework for iOS game development, and its ability to work hand-in-hand with ARKit allows developers with existing knowledge to implement AR experiences.

  • Metal - Metal is a hugely powerful, low-level graphics framework. In its simplest form, Metal allows you to take control of the entire graphics pipeline, offering you the ability to develop experiences from the ground up while maintaining exceptional performance. Metal talks directly to your device's GPU and allows more nuanced control of how everything from the camera feed to your 3D content appears. All of the aforementioned frameworks are built on top of Metal, and all are built to offer the same performance and security that Metal provides. If you find yourself needing to work more directly with the GPU, Metal is your best choice.

It is worth saying that Apple's sample projects for ARKit leverage different content technologies at different times. I encourage you to review the sample projects relevant to the app you are building and see which technology may ideally fit your use case.
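
To give a taste of RealityKit as the Content Technology, here is a minimal sketch that creates a box and tethers it to the first horizontal plane ARKit finds. It assumes an ARView named arView, like the one in the code sample later in this thread:

Code Block
import RealityKit

// Create a simple 10 cm box with a basic metallic material
let box = ModelEntity(mesh: .generateBox(size: 0.1),
                      materials: [SimpleMaterial(color: .blue, isMetallic: true)])

// Tether the box to the first horizontal plane ARKit discovers
let anchor = AnchorEntity(plane: .horizontal)
anchor.addChild(box)
arView.scene.addAnchor(anchor)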

(Continued in a second reply with follow-up thoughts, below.)

Replies

There are many resources and tutorials available for developing AR apps. Depending on your experience with iOS, you may find some of the terminology and thought processes easy to digest, whereas in other cases, since you're dealing with 3D objects, there can often be a different way of looking at things, given the use of planes, anchors, and coordinates.

One ideal way to get started is to create a sample AR app in Xcode. In Xcode 12, when you choose to start a new project, an Augmented Reality App is a template option to start with. Once you select that, you can choose a "Content Technology" and an "Interface." I've found that choosing RealityKit as the Content Technology and SwiftUI as the Interface allows Xcode to create a great deal of the boilerplate code, so you can focus more on the AR itself.
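
For reference, the boilerplate that template generates looks roughly like this (a sketch from memory, not the exact generated file); it wraps RealityKit's ARView in a SwiftUI view via UIViewRepresentable:

Code Block
import SwiftUI
import RealityKit

// A SwiftUI wrapper around RealityKit's ARView,
// roughly what the Augmented Reality App template generates
struct ARViewContainer: UIViewRepresentable {
    func makeUIView(context: Context) -> ARView {
        let arView = ARView(frame: .zero)
        // The template loads a Reality Composer scene here;
        // you can also create and anchor entities programmatically
        return arView
    }
    func updateUIView(_ uiView: ARView, context: Context) {}
}

struct ContentView: View {
    var body: some View {
        ARViewContainer().edgesIgnoringSafeArea(.all)
    }
}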

Additionally, Reality Composer is a great tool for getting started with AR. It is available as a standalone app for iPhone and iPad, as well as on the Mac through Xcode (Xcode -> Open Developer Tool -> Reality Composer).

At the core, I think the question you'd have to ask is: am I looking to quickly get started creating 3D/AR content and seeing it come to life, or does it interest me more to know the inner workings and have more nuanced control of how the AR experience works? Apple's Tracking and Visualizing Planes is a great starting-point project, as are the ARKit sessions from past WWDCs (available in Apple's Developer app).
Thanks Brandon.

I spent some more time going through the documentation and watching some of the WWDC videos. I followed a very well-made YouTube tutorial to create a simple RealityKit app.

I guess the thing that is a bit unclear to me is the life-cycle of an AR app. I haven't been able to find an explanation of it, but I have a feeling that if I look at the general iOS app development documentation I might find it there.

Something else that is still a bit unclear to me is the difference between ARKit and RealityKit. From what I understand, RealityKit is a wrapper for ARKit that makes it easier to develop AR apps, and if you want to do something more advanced, you'd have to use ARKit directly.

What I am looking to do is understand how the AR frameworks work, what the patterns for development are, and what to use/do depending on the app I am trying to make. I guess I am more interested in knowing enough of the inner workings to make informed decisions rather than seeing something quickly without understanding it.


Lifecycle

Regardless of which Content Technology you choose, there are certain principles that apply across the board when creating an ARKit experience. Namely:
  • Define the type of configuration relevant for your AR experience (world tracking, face tracking, body tracking, image tracking, object tracking, geolocation tracking), as well as any relevant parameters (if you're doing world tracking, perhaps you want to specify you are only looking for vertical planes to tether 3D content to).

It is worth noting that you can somewhat mix-and-match different use cases (for example, as noted in the Combining User Face-Tracking and World Tracking sample project, you can set up a world tracking configuration to find horizontal and vertical planes while still receiving face tracking information - that sample demonstrates generating facial expressions on a 3D character in the "real world," while using your face to drive those expressions).
  • Configure any relevant configuration parameters, if supported (examples include finding only horizontal vs. vertical planes, detecting meshes with their classifications if your device has a LiDAR scanner, or configuring the environmental texturing parameters). Most configurations have a sensible default set of parameters, so even if you do not configure anything manually, you can still have a great experience.

  • Configure any parameters relevant to the view that will display the AR content (if you are working with RealityKit, that comes in the form of an ARView; with SceneKit, an ARSCNView; with Metal, an MTKView). This could include enabling physics or object occlusion on RealityKit's ARView, or debug options to evaluate performance when working in SceneKit. Each Content Technology has its own set of configurable view parameters (see the short sketch after this list).

  • Configure a delegate for your AR session so you can receive relevant updates from ARKit (such as the camera frame, callbacks when anchors are added, updated, or removed, and notices of interruptions or performance concerns that could impact the user experience).

  • Run the session with the configuration you set up. This begins an AR session using the configuration you requested. Based on the type of anchor your configuration is looking for, each time a new anchor is added you receive a callback in your delegate method; from there, you can use that anchor to add your 3D content.
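
As an illustration of the view-configuration step mentioned above, here is a small sketch for RealityKit's ARView; the scene understanding options assume a LiDAR-equipped device, and arView is the same assumed ARView as in the sample below:

Code Block
// Debug visualizations that are handy while developing
arView.debugOptions = [.showAnchorOrigins, .showFeaturePoints]

// On LiDAR-equipped devices, let real-world geometry occlude
// virtual content and participate in physics simulation
arView.environment.sceneUnderstanding.options.insert(.occlusion)
arView.environment.sceneUnderstanding.options.insert(.physics)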

Here's a very simple setup of an ARWorldTrackingConfiguration using RealityKit. This exists in my ViewController's viewDidLoad() method:

Code Block
// arView is an ARView created in the storyboard or in code
arView.session.delegate = self
let configuration = ARWorldTrackingConfiguration()
// Plane detection is off by default; request horizontal and vertical planes
configuration.planeDetection = [.horizontal, .vertical]
// Generate environment textures automatically for realistic rendering
configuration.environmentTexturing = .automatic
arView.session.run(configuration)


While that is about as simple as it gets, ARKit is handling the work of setting up the camera, looking for any horizontal or vertical plane in the real world (note that plane detection must be requested explicitly, as in the planeDetection line above), applying automatic environment texturing for a more realistic appearance of content in the environment, and running the session.

Once an anchor is found, you'll receive the didAdd anchors: [ARAnchor] callback in your ARSession delegate, at which point you can load or generate 3D content and add it to your Content Technology's view hierarchy.
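
To make that concrete, here is a minimal sketch of that delegate callback using RealityKit, placing a small sphere at each newly detected plane; ViewController and arView are assumed names, matching the sample above:

Code Block
import ARKit
import RealityKit

extension ViewController: ARSessionDelegate {
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let planeAnchor as ARPlaneAnchor in anchors {
            // Wrap the ARKit anchor in a RealityKit AnchorEntity
            let anchorEntity = AnchorEntity(anchor: planeAnchor)
            // Generate simple 3D content: a 5 cm sphere
            let sphere = ModelEntity(mesh: .generateSphere(radius: 0.05),
                                     materials: [SimpleMaterial(color: .white, isMetallic: false)])
            anchorEntity.addChild(sphere)
            arView.scene.addAnchor(anchorEntity)
        }
    }
}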

Closing

There is a great deal to learn about ARKit, and many different ways to build experiences. Your vision for your app may help inform your choice of underlying Content Technology. Do review the ARKit Developer Documentation and sample projects as a starting point. This community, too, has been very helpful to me, and hopefully it will be able to assist as you move further into your development.
