Build a custom AR view by rendering camera images and using position-tracking information to display overlay content.
ARKit includes view classes for easily displaying AR experiences with SceneKit or SpriteKit. However, if you instead build your own rendering engine (or integrate with a third-party engine), ARKit also provides all the support necessary to display an AR experience with a custom view.
In any AR experience, the first step is to configure an ARSession object to manage camera capture and motion processing. A session defines and maintains a correspondence between the real-world space the device inhabits and a virtual space where you model AR content. To display your AR experience in a custom view, you’ll need to:
1. Retrieve video frames and tracking information from the session.
2. Render those frame images as the backdrop for your view.
3. Use the tracking information to position and draw AR content atop the camera image.
Get Video Frames and Tracking Data from the Session
Create and maintain your own ARSession instance, and run it with a session configuration appropriate for the kind of AR experience you want to support. The session captures video from the camera, tracks the device’s position and orientation in a modeled 3D space, and provides ARFrame objects. Each such object contains both an individual video frame image and position tracking information from the moment that frame was captured.
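As a minimal sketch of owning and running a session: the SessionController class name is hypothetical, and world tracking is only an assumed choice of configuration. Pick whichever configuration suits your experience.

```swift
import ARKit

/// Hypothetical owner object that keeps the session alive for the
/// lifetime of the custom view.
final class SessionController {
    let session = ARSession()

    /// Starts camera capture and motion processing.
    func start() {
        // World tracking is an assumption made for this sketch.
        guard ARWorldTrackingConfiguration.isSupported else { return }
        session.run(ARWorldTrackingConfiguration())
    }

    /// Stops capture, for example when the view goes off screen.
    func pause() {
        session.pause()
    }
}
```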
There are two ways to access ARFrame objects produced by an AR session, depending on whether your app favors a pull or a push design pattern.
If you prefer to control frame timing (the pull design pattern), use the session’s currentFrame property to get the current frame image and tracking information each time you redraw your view’s contents. The ARKit Xcode template uses this approach.
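A minimal sketch of the pull pattern, assuming the custom view is an MTKView. The FrameRenderer class and its render(cameraImage:cameraTransform:) stub are hypothetical names chosen for illustration, not the template's actual code.

```swift
import ARKit
import MetalKit

/// Hypothetical renderer that pulls the latest frame on every redraw.
final class FrameRenderer: NSObject, MTKViewDelegate {
    let session: ARSession

    init(session: ARSession) {
        self.session = session
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    // MTKView calls this once per redraw, so the app decides frame timing.
    func draw(in view: MTKView) {
        // currentFrame is nil until the session has produced its first frame.
        guard let frame = session.currentFrame else { return }
        render(cameraImage: frame.capturedImage,
               cameraTransform: frame.camera.transform)
    }

    /// Placeholder for your own drawing code (intentionally empty here).
    func render(cameraImage: CVPixelBuffer, cameraTransform: simd_float4x4) {}
}
```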
Alternatively, if your app design favors a push pattern, implement the session:didUpdateFrame: delegate method (session(_:didUpdate:) in Swift). The session calls it once for each video frame it captures (at 60 frames per second by default).
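The push pattern might look like the following sketch. FrameReceiver, its onFrame callback, and the attach helper are hypothetical names introduced for illustration.

```swift
import ARKit

/// Hypothetical delegate object that forwards each new frame's image and
/// camera transform to a callback supplied by the renderer.
final class FrameReceiver: NSObject, ARSessionDelegate {
    var onFrame: ((CVPixelBuffer, simd_float4x4) -> Void)?

    // ARKit calls this each time the session produces a new frame.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        onFrame?(frame.capturedImage, frame.camera.transform)
    }
}

/// The session holds its delegate weakly, so the caller must keep a strong
/// reference to the receiver for as long as frames are needed.
func attach(_ receiver: FrameReceiver, to session: ARSession) {
    session.delegate = receiver
}
```

Passing along only the values the renderer needs, rather than storing ARFrame objects, avoids holding on to frames longer than necessary.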
Upon obtaining a frame, you’ll need to draw the camera image, and update and render any overlay content your AR experience includes.
Draw the Camera Image
Each ARFrame object’s capturedImage property contains a pixel buffer captured from the device camera. To draw this image as the backdrop for your custom view, you’ll need to create textures from the image content and submit GPU rendering commands that use those textures.
The pixel buffer’s contents are encoded in a biplanar YCbCr (also called YUV) data format; to render the image you’ll need to convert this pixel data to a drawable RGB format. For rendering with Metal, you can perform this conversion most efficiently in GPU shader code. Use CVMetalTextureCache APIs to create two Metal textures from the pixel buffer, one each for the buffer’s luma (Y) and chroma (CbCr) planes.
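One way to create the two plane textures is sketched below. The function names are hypothetical, and the pixel formats (.r8Unorm for luma, .rg8Unorm for chroma) assume an 8-bit biplanar buffer. The cache itself is assumed to have been created once with CVMetalTextureCacheCreate.

```swift
import CoreVideo
import Metal

/// Wraps one plane of a biplanar pixel buffer in a Metal-backed texture.
func makeTexture(from pixelBuffer: CVPixelBuffer,
                 pixelFormat: MTLPixelFormat,
                 planeIndex: Int,
                 cache: CVMetalTextureCache) -> CVMetalTexture? {
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
    var texture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        nil, cache, pixelBuffer, nil, pixelFormat,
        width, height, planeIndex, &texture)
    return status == kCVReturnSuccess ? texture : nil
}

/// Returns textures for the luma (plane 0) and chroma (plane 1) planes.
/// Assumes an 8-bit biplanar YCbCr buffer.
func makeCameraTextures(from pixelBuffer: CVPixelBuffer,
                        cache: CVMetalTextureCache)
    -> (luma: CVMetalTexture, chroma: CVMetalTexture)? {
    guard CVPixelBufferGetPlaneCount(pixelBuffer) >= 2,
          let luma = makeTexture(from: pixelBuffer, pixelFormat: .r8Unorm,
                                 planeIndex: 0, cache: cache),
          let chroma = makeTexture(from: pixelBuffer, pixelFormat: .rg8Unorm,
                                   planeIndex: 1, cache: cache)
    else { return nil }
    return (luma, chroma)
}
```

Call CVMetalTextureGetTexture on each result to obtain the MTLTexture to bind, and keep the CVMetalTexture objects alive until the GPU has finished using them.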
Next, encode render commands that draw those two textures using a fragment function that performs YCbCr to RGB conversion with a color transform matrix.
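To illustrate the arithmetic such a fragment function performs, here is a CPU-side reference in Swift rather than actual shader code. The coefficients are the common full-range BT.601 values, which is an assumption: the right matrix depends on the pixel format and color space of the captured image. The function name is hypothetical.

```swift
/// Converts one full-range YCbCr sample (each component in 0...1) to RGB.
/// Coefficients are full-range BT.601, assumed for this sketch.
/// Results are not clamped to 0...1.
func rgb(fromY y: Float, cb: Float, cr: Float) -> (r: Float, g: Float, b: Float) {
    // Chroma is stored with an offset of 0.5, so re-center it around zero.
    let cbCentered = cb - 0.5
    let crCentered = cr - 0.5
    let r = y + 1.4020 * crCentered
    let g = y - 0.3441 * cbCentered - 0.7141 * crCentered
    let b = y + 1.7720 * cbCentered
    return (r, g, b)
}
```

In a shader, the same three equations are usually folded into a single 4x4 matrix applied to the vector (Y, Cb, Cr, 1), with Y sampled from the luma texture and Cb and Cr from the chroma texture.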
Track and Render Overlay Content
AR experiences typically focus on rendering 3D overlay content so that the content appears to be part of the real world seen in the camera image. To achieve this illusion, use the ARAnchor class to model the position and orientation of your own 3D content relative to real-world space. Anchors provide transforms that you can reference during rendering.
For example, the Xcode template creates an anchor located about 20 cm in front of the device whenever a user taps on the screen.
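Placing an anchor in front of the camera can be sketched as follows. The function names are hypothetical; the 0.2 m default mirrors the 20 cm distance mentioned above, and the sketch relies on ARKit's camera space looking down the negative Z axis.

```swift
import ARKit

/// Returns a world-space transform located `distance` meters in front of
/// the given camera transform.
func transform(inFrontOf cameraTransform: simd_float4x4,
               distance: Float = 0.2) -> simd_float4x4 {
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -distance   // negative Z is "forward"
    return simd_mul(cameraTransform, translation)
}

/// Adds an anchor in front of the device, e.g. from a tap handler.
func addAnchor(to session: ARSession, distance: Float = 0.2) {
    // No frame yet means there is no camera pose to place the anchor against.
    guard let frame = session.currentFrame else { return }
    let anchorTransform = transform(inFrontOf: frame.camera.transform,
                                    distance: distance)
    session.add(anchor: ARAnchor(transform: anchorTransform))
}
```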
In your rendering engine, use the transform property of each ARAnchor object to place visual content. The Xcode template positions a simple cube mesh at each anchor that its handleTap method added to the session.
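A sketch of turning anchors into per-instance model matrices follows. The InstanceUniforms struct, the function name, and the instance cap of 64 are assumptions for illustration, not the template's actual code.

```swift
import ARKit

/// Hypothetical per-instance data that a vertex shader would read.
struct InstanceUniforms {
    var modelMatrix: simd_float4x4
}

/// Builds one model matrix per anchor, capped at `maxInstances` so the
/// result fits a fixed-size GPU buffer.
func instanceUniforms(for anchors: [ARAnchor],
                      maxInstances: Int = 64) -> [InstanceUniforms] {
    return anchors.prefix(maxInstances).map { anchor in
        // The anchor's transform already maps local space to world space.
        InstanceUniforms(modelMatrix: anchor.transform)
    }
}
```

Each redraw, pass frame.anchors to this function and copy the result into the buffer your shaders read.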
Render with Realistic Lighting
When you configure shaders for drawing 3D content in your scene, use the estimated lighting information in each ARFrame object to produce more realistic shading.
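One possible way to fold the light estimate into a shader uniform is sketched below. ARKit reports ambientIntensity in lumens, with 1000 representing neutral lighting. The function names and the mid-gray base color are assumptions made for this example.

```swift
import ARKit

/// Scales a base ambient color by the estimated intensity.
/// Falls back to neutral lighting (1000 lumens) when no estimate exists.
func ambientLightColor(intensityInLumens: Float?,
                       baseColor: SIMD3<Float> = SIMD3<Float>(0.5, 0.5, 0.5))
    -> SIMD3<Float> {
    let scale = (intensityInLumens ?? 1000) / 1000
    return baseColor * scale
}

/// Convenience wrapper that reads the estimate from a frame.
func ambientLightColor(for frame: ARFrame) -> SIMD3<Float> {
    // lightEstimate is nil when light estimation is unavailable.
    let lumens = frame.lightEstimate.map { Float($0.ambientIntensity) }
    return ambientLightColor(intensityInLumens: lumens)
}
```

Write the resulting color into the uniforms your fragment function uses for its ambient term.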