Hi vlttnv,
You are absolutely correct that referencing Apple's ARKit Developer Documentation will be your best resource as you dive into the world of ARKit and Augmented Reality. I find myself referencing that documentation multiple times per day, and it has certainly become my strongest resource. I recall having many similar questions when I began building AR apps, and I hope a few thoughts may be helpful as you continue your journey. With that said, please do refer to Apple's Developer Documentation and sample projects first and foremost.
ARKit is the underlying framework that handles the "heavy lifting" of Augmented Reality experiences. ARKit configures the camera, gathers the relevant sensor data, and is responsible for detecting and tracking the "anchors" that tether your 3D content to the real world as seen through the camera. In a sense, Augmented Reality is all about displaying 3D content in the real world: your content is tethered to anchors that are tracked and followed, making it appear as though it truly sits in front of your user. As a whole, ARKit does the work of finding those anchors, tracking them, and handling the computations that keep your 3D content tethered to them, so the experience seems realistic.
Anchors can come in a variety of forms. Your 3D content has to be anchored to something in the real world, and ARKit handles finding these anchors and providing them to you for your use. The main types are:
Planes - the most common anchor type; a horizontal plane, like a floor, table top, or the ground, or a vertical plane, like a wall, window, or door.
Faces - a human face.
Images - you provide your app an image, and when the camera detects that image, it becomes the "anchor" for your 3D content.
Objects - you provide your app a 3D object, and when the camera detects that object in the real world, it becomes the "anchor" for your 3D content.
Bodies - for tracking the movement of a person's joints and applying that movement to a 3D character.
Locations - using ARGeoAnchors, which anchor your 3D content to a specific set of latitude/longitude/altitude coordinates (as a CLLocation from the CoreLocation framework), if in a supported location.
Meshes - if your device has a LiDAR scanner, ARKit becomes capable of detecting more nuanced surfaces, such as recognizing a floor plane vs. a table-top plane, or a door plane vs. a wall plane.
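For the common plane case, you opt in to the anchors you want when configuring the session, and ARKit delivers them through its delegate. A minimal sketch (the SessionHandler class name is my own, not from any sample project):

```swift
import ARKit

// Hypothetical delegate: ARKit calls session(_:didAdd:) as it finds anchors.
class SessionHandler: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for anchor in anchors {
            if let plane = anchor as? ARPlaneAnchor {
                print("Found a \(plane.alignment == .horizontal ? "horizontal" : "vertical") plane")
            }
        }
    }
}

// Ask ARKit to detect both horizontal and vertical planes.
let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = [.horizontal, .vertical]
// session.run(configuration) // run on the ARSession owned by your AR view
```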
Whereas ARKit handles the heavy lifting of configuring the camera, finding anchors, and tracking those anchors, you have a choice of which Content Technology to use to actually render/show your 3D content. The Content Technology is the framework doing the heavy lifting of either loading your 3D model (which you probably created elsewhere, such as in a 3D modeling program or Reality Composer) or creating 3D content programmatically. There are four main choices for Content Technology:
RealityKit - RealityKit was announced at WWDC 2019 and is the newest of the 3D graphics technologies available in iOS. Much like the other 3D technologies available in iOS, RealityKit offers you the ability to load 3D models you may have created in other 3D modeling programs, create 3D content (such as boxes, spheres, text, etc.), and create 3D lights, cameras, and more. As described in the RealityKit Documentation, RealityKit allows you to "simulate and render 3D content for use in your augmented reality apps." To your comment, RealityKit complements ARKit; ARKit gathers the information from the camera and sensors, and RealityKit renders the 3D content.
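To give a feel for that split, here is a minimal sketch of RealityKit placing a programmatically created box on the first horizontal plane ARKit detects (RealityKit's ARView runs its own ARSession for you):

```swift
import RealityKit
import UIKit

let arView = ARView(frame: .zero)

// Create a simple 10 cm blue box programmatically.
let box = ModelEntity(mesh: .generateBox(size: 0.1),
                      materials: [SimpleMaterial(color: .blue, isMetallic: false)])

// Anchor it to a horizontal plane; RealityKit waits for ARKit to find one.
let anchor = AnchorEntity(plane: .horizontal)
anchor.addChild(box)
arView.scene.addAnchor(anchor)
```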
SceneKit - SceneKit is another popular choice for working with ARKit. SceneKit is wildly popular in iOS development for generating 3D content. Similar to RealityKit, SceneKit offers the ability to load and create 3D models, handle lighting, reflections, shadows, etc., and works hand-in-hand with ARKit. SceneKit is also popular in game development, and given that many developers have experience with SceneKit from developing 3D games, it is a great way to bring that understanding to the world of Augmented Reality, as many of the same principles from 3D game development apply to AR.
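With SceneKit, the bridge to ARKit is ARSCNView, which keeps SceneKit's coordinate space aligned with the camera. A minimal sketch, just to show the shape of it:

```swift
import ARKit
import SceneKit

let sceneView = ARSCNView(frame: .zero)

// A small sphere placed half a meter in front of the session's origin
// (roughly where the camera starts).
let sphere = SCNNode(geometry: SCNSphere(radius: 0.05))
sphere.position = SCNVector3(0, 0, -0.5)
sceneView.scene.rootNode.addChildNode(sphere)

// ARSCNView owns an ARSession; run it with a world-tracking configuration.
let configuration = ARWorldTrackingConfiguration()
sceneView.session.run(configuration)
```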
SpriteKit - SpriteKit is another popular choice for game development, and its principles still apply when brought into the world of AR. SpriteKit is a highly performant framework that traditionally deals in 2D content. Again, this is a hugely popular framework for iOS game development, and its ability to work hand-in-hand with ARKit allows developers with existing knowledge to implement AR experiences.
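SpriteKit's AR bridge is ARSKView: you return a 2D node for each anchor, and ARKit positions it in space, facing the camera. A minimal sketch (the AnchorSprites class name is my own):

```swift
import ARKit
import SpriteKit

// Hypothetical delegate: ARSKView asks for a 2D node to display
// at each anchor ARKit discovers.
class AnchorSprites: NSObject, ARSKViewDelegate {
    func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode? {
        // Any SKNode works; here, a simple text label marks the anchor.
        return SKLabelNode(text: "Anchor")
    }
}
```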
Metal - Metal is a low-level graphics framework that is hugely powerful. In its simplest form, Metal allows you to take control of the entire graphics pipeline, offering you the ability to build experiences from the ground up while maintaining exceptional performance. Metal talks directly to your device's GPU and gives you more nuanced control over how everything, from the camera feed to your 3D content, appears. All of the aforementioned frameworks are built on top of Metal and are built to offer the performance that Metal provides. If you find yourself needing to work more directly with the GPU, Metal is your best choice.
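A full Metal renderer is beyond a forum post, but the starting point looks like this: you own the draw loop, pulling each ARFrame's camera image and camera pose from the session yourself rather than letting a higher-level framework draw for you. A rough sketch, not a complete renderer:

```swift
import ARKit
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not supported on this device")
}
let commandQueue = device.makeCommandQueue()

// Called once per display frame by your own render loop.
func render(session: ARSession) {
    guard let frame = session.currentFrame else { return }
    let cameraImage = frame.capturedImage    // CVPixelBuffer to texture and draw as the background
    let cameraPose = frame.camera.transform  // camera transform for positioning your 3D content
    // ... encode and commit draw calls via commandQueue ...
    _ = (cameraImage, cameraPose)
}
```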
It is worth saying that Apple's sample projects for ARKit leverage different content technologies at different times. I encourage you to review the sample projects relevant to the app you are building and see which best fits your use case.
(Adding a second reply with follow-up thoughts).