Create images from rectangular shapes found in the user’s environment, and augment their appearance.
- iOS 13.0+
- Xcode 11.0+
To demonstrate general image recognition, this sample app uses Vision to detect rectangular shapes in the user’s environment that are most likely artwork or photos. Run the app on an iPhone or iPad, and point the device’s camera at a movie poster or wall-mounted picture frame. When the app detects a rectangular shape, you extract the pixel data defined by that shape from the camera feed to create an image.
The sample app changes the appearance of the image by applying a Core ML model that performs a stylistic alteration. By repeating this action in succession, you achieve real-time image processing using a trained neural network.
To complete the effect of augmenting an image in the user’s environment, you use ARKit’s image tracking feature. ARKit can hold an altered image steady over the original image as the user moves the device in their environment. ARKit also tracks the image if it moves on its own, as when the app recognizes a banner on the side of a bus, and the bus begins to drive away.
This sample app uses SceneKit to render its graphics.
Detect Rectangular Shapes in the User’s Environment
You can use Vision in real time to check the camera feed for rectangles. You perform this check up to 10 times a second by scheduling a repeating timer with an update interval of 0.1 seconds.
Because Vision requests can be taxing on the processor, check the camera feed no more than 10 times a second. Checking for rectangles more frequently may cause the app’s frame rate to decrease, without noticeably improving the app’s results.
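A minimal sketch of the repeating timer described above, using Foundation's `Timer`. The type name `RectangleSearchTimer` and the `onTick` callback are hypothetical and only illustrate the 0.1-second schedule; the sample's own scheduling code may be organized differently.

```swift
import Foundation

/// Hypothetical scheduler; the names here are illustrative, not the sample's own.
final class RectangleSearchTimer {
    /// 0.1 seconds between checks caps the search rate at 10 per second.
    let updateInterval: TimeInterval = 0.1
    private var timer: Timer?

    /// True while the repeating timer is scheduled.
    var isRunning: Bool { timer?.isValid ?? false }

    /// Calls `onTick` on the current run loop every `updateInterval` seconds.
    func start(onTick: @escaping () -> Void) {
        stop()
        timer = Timer.scheduledTimer(withTimeInterval: updateInterval, repeats: true) { _ in
            onTick()
        }
    }

    /// Cancels the timer so that no further checks are scheduled.
    func stop() {
        timer?.invalidate()
        timer = nil
    }
}
```

Call `start` from the main thread so that the timer is attached to a run loop that is actually running.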
When you make Vision requests in real time with an ARKit-based app, you should do so serially. By waiting for one request to finish before invoking another, you ensure that the AR experience remains smooth and free of interruptions. In the search function, you use a Boolean flag to ensure you’re only checking for one rectangle at a time.
The sample sets this flag to false when a Vision request completes or fails.
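The serial pattern can be sketched as follows. This is an illustration under assumptions, not the sample's implementation: `SerialRectangleSearch` and `isSearching` are hypothetical names, and the pixel buffer is assumed to come from the current camera frame (for example, `ARFrame.capturedImage`).

```swift
import Foundation
import CoreVideo
import Vision

/// Hypothetical helper that runs at most one Vision rectangle search at a time.
/// Call `search` from the main thread so that `isSearching` is only touched there.
final class SerialRectangleSearch {
    private(set) var isSearching = false

    /// Returns false without doing any work if a previous search is still in flight.
    @discardableResult
    func search(in pixelBuffer: CVPixelBuffer,
                completion: @escaping (VNRectangleObservation?) -> Void) -> Bool {
        guard !isSearching else { return false }
        isSearching = true

        DispatchQueue.global(qos: .userInitiated).async {
            let request = VNDetectRectanglesRequest()
            request.maximumObservations = 1
            let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])

            // `perform` is synchronous; a thrown error is treated as "nothing found".
            try? handler.perform([request])
            let observation = request.results?.first as? VNRectangleObservation

            DispatchQueue.main.async {
                // Clear the flag whether the request succeeded or failed.
                self.isSearching = false
                completion(observation)
            }
        }
        return true
    }
}
```

Because the flag is cleared on both the success and the failure path, a failed request can't block later searches.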
Crop the Camera Feed to an Observed Rectangle
When Vision finds a rectangle in the camera feed, it provides you with the rectangle’s precise coordinates through a VNRectangleObservation. You apply those coordinates to a Core Image perspective correction filter to crop the camera image, leaving you with just the image data inside the rectangular shape.
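A sketch of the cropping step using Core Image's `CIPerspectiveCorrection` filter. The function name `perspectiveCorrected` is hypothetical. It takes the four corners as normalized points, which is how a Vision rectangle observation reports them, and it assumes the image's extent starts at the origin.

```swift
import CoreGraphics
import CoreImage

/// Hypothetical helper: crops `image` to a quadrilateral given as normalized corners
/// (0...1, origin at the bottom-left), the convention Vision uses for rectangles.
/// Assumes `image.extent` starts at the origin, as it does for a camera pixel buffer.
func perspectiveCorrected(_ image: CIImage,
                          topLeft: CGPoint, topRight: CGPoint,
                          bottomLeft: CGPoint, bottomRight: CGPoint) -> CIImage? {
    let size = image.extent.size

    // Scale a normalized corner to pixel coordinates in the image.
    func vector(_ point: CGPoint) -> CIVector {
        CIVector(x: point.x * size.width, y: point.y * size.height)
    }

    guard let filter = CIFilter(name: "CIPerspectiveCorrection") else { return nil }
    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(vector(topLeft), forKey: "inputTopLeft")
    filter.setValue(vector(topRight), forKey: "inputTopRight")
    filter.setValue(vector(bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(vector(bottomRight), forKey: "inputBottomRight")
    return filter.outputImage
}
```

Given an observation `rectangle`, you would pass `rectangle.topLeft`, `rectangle.topRight`, `rectangle.bottomLeft`, and `rectangle.bottomRight` as the four corners.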
Using the first image in the Overview, the camera image is:
The cropped result is:
Create a Reference Image
To prepare to track the cropped image, you create an ARReferenceImage, which provides ARKit with everything it needs, like the image’s look and physical size, to locate that image in the physical environment.
ARKit requires that reference images contain sufficient detail to be recognizable; for example, ARKit can’t track a plain white image. To prevent ARKit from failing to track a reference image, you validate it before attempting to use it.
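The create-then-validate step might look like the sketch below. The function name and the `physicalWidth` parameter are hypothetical; the width is in meters and is an assumption about the real-world size of the artwork, since a camera image alone doesn't reveal it.

```swift
import ARKit

/// Hypothetical helper: wraps a cropped image in a reference image and hands it
/// back only if ARKit considers it detailed enough to track.
func makeValidatedReferenceImage(from cgImage: CGImage,
                                 physicalWidth: CGFloat,
                                 completion: @escaping (ARReferenceImage?) -> Void) {
    // `physicalWidth` is in meters and is an assumed value supplied by the caller.
    let referenceImage = ARReferenceImage(cgImage, orientation: .up, physicalWidth: physicalWidth)

    // Validation is asynchronous; a non-nil error means the image is unsuitable.
    referenceImage.validate { error in
        completion(error == nil ? referenceImage : nil)
    }
}
```

Don't assume the completion handler runs on the main thread; dispatch to the main queue before touching UI or session state.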
Track the Image Using ARKit
Provide the reference image to ARKit to get updates on where the image lies in the camera feed when the user moves their device. Do that by creating an image tracking session and passing the reference image in to the configuration’s
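A minimal sketch of an image tracking configuration that holds one reference image. The function name is hypothetical, and limiting tracking to a single image is an assumption based on the sample tracking one image at a time.

```swift
import ARKit

/// Hypothetical helper: builds a configuration that tracks exactly one image.
func makeImageTrackingConfiguration(for referenceImage: ARReferenceImage) -> ARImageTrackingConfiguration {
    let configuration = ARImageTrackingConfiguration()
    configuration.trackingImages = [referenceImage]
    configuration.maximumNumberOfTrackedImages = 1
    return configuration
}

// Usage, assuming `sceneView` is an ARSCNView owned by your view controller:
// let configuration = makeImageTrackingConfiguration(for: referenceImage)
// sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
```

Running the session with `.removeExistingAnchors` discards the anchor for any previously tracked image, so only the new image produces updates.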
Vision made the initial observation about where the image lies in 2D space in the camera feed, but ARKit resolves its location in 3D space, in the physical environment. When ARKit succeeds in recognizing the image, it creates an ARImageAnchor and a SceneKit node at the right position. You save the anchor and node that ARKit gives you by passing them to an
Alter the Image’s Appearance Using Core ML
This sample app is bundled with a Core ML model that performs image processing. Given an input image and an integer index, the model outputs a visually modified version of that image in one of eight different styles. The particular style of the output depends on the value of the index you pass in. The first style resembles burned paper, the second style resembles a mosaic, and there are six other styles as shown in the following image.
When Vision finds a rectangular shape in the user’s environment, you pass the camera’s image data defined by that rectangle into a new
The following code shows how you choose the artistic style to apply to the image by inputting the integer index to the Core ML model. Then, you process the image by calling the Core ML model’s
The following figure shows the result when you process the input image with a style index of 2.
Display the Altered Image in Augmented Reality
To complete the augmented reality effect, you cover the original image with the altered image. First, add a visualization node to hold the altered image as a child of the node provided by ARKit.
When Core ML produces the output image, you pass it into the visualization node’s display function, where you set the image as the visualization node’s contents.
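One way to build such a visualization node with SceneKit is sketched below. The function name is hypothetical, and the sketch assumes the node is added as a child of the node ARKit provides for the image anchor, where the detected image lies in the anchor's x-z plane.

```swift
import SceneKit

/// Hypothetical helper: a flat plane that shows `contents` (for example, a CGImage)
/// and is sized, in meters, to match the tracked image.
func makeVisualizationNode(size: CGSize, contents: Any?) -> SCNNode {
    let plane = SCNPlane(width: size.width, height: size.height)

    // Use an explicit material so the altered image is the only thing drawn.
    let material = SCNMaterial()
    material.diffuse.contents = contents
    plane.materials = [material]

    let node = SCNNode(geometry: plane)
    // SCNPlane stands upright in its local x-y plane; rotate it to lie flat on the image.
    node.eulerAngles.x = -.pi / 2
    return node
}
```

To show a newer altered image later, assign it to the material's `diffuse.contents` rather than rebuilding the node.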
The visualization node’s contents overlap the original image when SceneKit displays it. In the case of the image above, the following screenshot shows the end result as seen through a user’s device:
Continually Update the Image’s Appearance
This sample demonstrates real-time image processing by switching artistic styles over time. By selecting a new style index each time, you make successive alterations of the original image. The style index is the integer input to the Core ML model that determines the style of the output.
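Cycling through the styles can be sketched as a wrapping counter. `StyleSelector` and `advance()` are hypothetical names; the count of eight comes from the number of styles the bundled model is described as supporting.

```swift
/// Hypothetical value type that steps through the model's eight styles in order.
struct StyleSelector {
    /// Number of styles the bundled model offers, per the description above.
    static let styleCount = 8

    /// Index of the style currently applied.
    private(set) var styleIndex = 0

    /// Moves to the next style, wrapping back to the first after the last one.
    @discardableResult
    mutating func advance() -> Int {
        styleIndex = (styleIndex + 1) % Self.styleCount
        return styleIndex
    }
}
```

Each call to `advance()` yields the index to feed into the next model prediction.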
The visualization node fades between two images of differing style, which creates the effect that the tracked image is constantly transforming into a new look. You accomplish this effect by defining two SceneKit nodes. One node displays the current altered image, and the other displays the previous altered image.
You fade between these two nodes by running an opacity animation:
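A cross-fade between the two nodes can be sketched with `SCNAction`. The function name, the parameter names, and the one-second default duration are assumptions for illustration; the sample's actual animation may be sequenced differently.

```swift
import Foundation
import SceneKit

/// Hypothetical helper: fades `previous` out while fading `current` in.
/// `completion` runs when the fade-in finishes, possibly off the main thread.
func crossFade(from previous: SCNNode,
               to current: SCNNode,
               duration: TimeInterval = 1.0,
               completion: @escaping () -> Void) {
    // Start with the old image fully visible and the new one hidden.
    previous.opacity = 1
    current.opacity = 0

    previous.runAction(.fadeOut(duration: duration))
    current.runAction(.fadeIn(duration: duration), completionHandler: completion)
}
```

SceneKit only advances actions for nodes in a scene that is being rendered, so the completion handler fires only once both nodes are attached to the displayed scene.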
When the animation finishes, you begin altering the original image with the next artistic style by calling
Respond to Image Tracking Updates
As part of the image tracking feature, ARKit continues to look for the image throughout the AR session. If the image itself moves, ARKit updates the ARImageAnchor with its corresponding image’s new location in the physical environment, and calls your delegate’s renderer(_:didUpdate:for:) to notify your app of the change.
The sample app tracks a single image at a time. To do that, you invalidate the current image tracking session if an image the app was tracking is no longer visible. This, in turn, enables Vision to start looking for a new rectangular shape in the camera feed.
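The sketch below shows one way to notice that the tracked image is no longer visible. It is a simplification under assumptions: `ImageTrackingObserver` and `onTrackingLost` are hypothetical names, and it reacts as soon as the anchor reports that it isn't tracked, whereas an app may prefer to wait briefly before giving up on the image.

```swift
import ARKit
import SceneKit

/// Hypothetical delegate that reports when ARKit stops tracking an image anchor.
final class ImageTrackingObserver: NSObject, ARSCNViewDelegate {
    /// Called on the main queue when a tracked image is no longer visible.
    var onTrackingLost: (() -> Void)?

    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        // Only image anchors are of interest here.
        guard let imageAnchor = anchor as? ARImageAnchor else { return }

        if !imageAnchor.isTracked {
            // Delegate callbacks arrive off the main thread, so hop back before reacting.
            DispatchQueue.main.async { [weak self] in
                self?.onTrackingLost?()
            }
        }
    }
}
```

In `onTrackingLost`, you would discard the current reference image and let the rectangle search resume.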