- iOS 12.0+
- Xcode 11.3+
The Vision framework can detect and track rectangles, faces, and other salient objects across a sequence of images.
This sample shows how to create requests to track human faces and interpret the results of those requests. In order to visualize the geometry of observed facial features, the code draws paths around the primary detected face and its most prominent features.
The sample app applies computer vision algorithms to find a face in the provided image. Once it finds a face, it attempts to track that face across subsequent frames of the video. Finally, it draws a green box around the observed face, as well as yellow paths outlining facial features, on Core Animation layers.
To see this sample app in action, build and run the project on a device running iOS 12.0 or later, the minimum version listed above, and grant the app permission to use the camera.
Configure the Camera to Capture Video
This section shows how to set up a camera capture session using delegates to prepare images for Vision. Configuring the camera involves the following steps.
Query the user’s input device and configure it for video data output by specifying its resolution and camera.
Next, create a serial dispatch queue. This queue ensures that video frames, received asynchronously through delegate callback methods, are delivered in order. Establish a capture session for the AVMediaType.video media type, and set its device and resolution.
Finally, designate the video’s preview layer and add it to your view hierarchy, so the camera knows where to display video frames as they are captured.
Most of this code is boilerplate setup that enables you to handle video input properly. Tweak the values only if you choose a different camera arrangement.
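The steps above can be sketched roughly as follows. This is a minimal illustration rather than the sample's own code: the helper names makeVideoOutput and makeCaptureSession, the CameraSetupError type, the front-camera choice, and the 640x480 preset are all assumptions made for the example.

```swift
import AVFoundation

/// Hypothetical error type for this sketch.
enum CameraSetupError: Error { case noCamera, cannotAddInput, cannotAddOutput }

/// Hypothetical helper: creates a video data output that delivers frames,
/// in order, to `delegate` on the given serial queue.
func makeVideoOutput(delegate: AVCaptureVideoDataOutputSampleBufferDelegate,
                     queue: DispatchQueue) -> AVCaptureVideoDataOutput {
    let output = AVCaptureVideoDataOutput()
    output.alwaysDiscardsLateVideoFrames = true  // drop frames we can't keep up with
    output.setSampleBufferDelegate(delegate, queue: queue)
    return output
}

/// Hypothetical helper: builds a configured session and its preview layer.
func makeCaptureSession(delegate: AVCaptureVideoDataOutputSampleBufferDelegate,
                        queue: DispatchQueue) throws
    -> (session: AVCaptureSession, previewLayer: AVCaptureVideoPreviewLayer) {
    // Query the input device (front wide-angle camera is an assumed choice).
    guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                               for: .video,
                                               position: .front) else {
        throw CameraSetupError.noCamera
    }

    let session = AVCaptureSession()
    session.beginConfiguration()
    defer { session.commitConfiguration() }
    session.sessionPreset = .vga640x480  // assumed resolution

    let input = try AVCaptureDeviceInput(device: device)
    guard session.canAddInput(input) else { throw CameraSetupError.cannotAddInput }
    session.addInput(input)

    let output = makeVideoOutput(delegate: delegate, queue: queue)
    guard session.canAddOutput(output) else { throw CameraSetupError.cannotAddOutput }
    session.addOutput(output)

    // The preview layer shows frames as they are captured.
    let previewLayer = AVCaptureVideoPreviewLayer(session: session)
    previewLayer.videoGravity = .resizeAspectFill
    return (session, previewLayer)
}
```

A view controller would typically pass a serial queue such as DispatchQueue(label: "com.example.VideoDataOutputQueue"), add the returned preview layer to its view's layer, set the layer's frame, and then call startRunning() on the session.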
Parse Face Detection Results
You can provide a completion handler for Vision to execute when a request finishes. The completion handler indicates whether the request succeeded or resulted in an error. If the request succeeded, its results property contains data specific to the type of request, which you can use to identify the object’s location and bounding box.
For face rectangle requests, each VNFaceObservation provided via the callback includes the bounding box of one detected face. The sample uses this bounding box to draw paths around each of the detected face landmarks on top of the preview image.
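As a rough sketch of the completion-handler pattern described here, the following wraps a face rectangle request so that the caller receives either the observations or the error. The helper name makeFaceRectanglesRequest and the use of Result are assumptions for illustration, not part of the sample.

```swift
import Vision

/// Hypothetical helper: builds a face rectangle request whose completion
/// handler reports either the observed faces or the error.
func makeFaceRectanglesRequest(
    onResult: @escaping (Result<[VNFaceObservation], Error>) -> Void
) -> VNDetectFaceRectanglesRequest {
    return VNDetectFaceRectanglesRequest(completionHandler: { request, error in
        // The handler is told whether the request failed...
        if let error = error {
            onResult(.failure(error))
            return
        }
        // ...and on success, `results` holds one observation per detected face.
        let faces = request.results as? [VNFaceObservation] ?? []
        onResult(.success(faces))
    })
}
```

Each observation's boundingBox is expressed in normalized coordinates (0 to 1) with the origin at the lower left, so it needs converting before you draw it over the preview.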
In addition to drawing paths on a CALayer to visualize the features, you can access specific facial-feature data such as eye, pupil, nose, and lip classifications in the face observation’s landmarks property. Your app can leverage this information to track the user’s face and apply custom effects. For a face landmarks request, the face rectangle detector also runs implicitly.
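To make the landmark access concrete, here is a minimal sketch that turns a face observation into a single path containing the bounding box and a few landmark regions. The function names imageRect and landmarkPath, and the choice of regions, are assumptions for the example; the sample's own drawing code may differ.

```swift
import Vision
import CoreGraphics

/// Converts Vision's normalized rect (0...1, lower-left origin) to pixel units.
func imageRect(forNormalized rect: CGRect, imageSize: CGSize) -> CGRect {
    return VNImageRectForNormalizedRect(rect, Int(imageSize.width), Int(imageSize.height))
}

/// Hypothetical helper: builds a path with the face's bounding box and
/// outlines of selected landmark regions, in image pixel coordinates.
func landmarkPath(for face: VNFaceObservation, imageSize: CGSize) -> CGPath {
    let path = CGMutablePath()
    path.addRect(imageRect(forNormalized: face.boundingBox, imageSize: imageSize))

    // `landmarks` is nil unless a face landmarks request produced the observation.
    guard let landmarks = face.landmarks else { return path }

    // Regions chosen for illustration; each one is optional.
    let regions = [landmarks.leftEye, landmarks.rightEye,
                   landmarks.nose, landmarks.outerLips].compactMap { $0 }
    for region in regions {
        let points = region.pointsInImage(imageSize: imageSize)
        guard !points.isEmpty else { continue }
        path.addLines(between: points)
        path.closeSubpath()
    }
    return path
}
```

The resulting coordinates still use Vision's lower-left origin, so a layer in UIKit's upper-left coordinate space needs a vertical flip (for example, through the layer's affine transform) before the path lines up with the preview.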
Perform any image preprocessing in the capture delegate method. In this delegate method, create a pixel buffer to hold image contents, determine the device’s orientation, and check whether you have a face to track.
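The preprocessing steps might look like the sketch below. It assumes the delegate method is captureOutput(_:didOutput:from:) from AVCaptureVideoDataOutputSampleBufferDelegate, which the text does not name. The FrameProcessor class, its detect and track closures, and the orientation mapping (which presumes a front camera) are hypothetical.

```swift
import AVFoundation
import ImageIO
import UIKit

/// Maps a device orientation to the EXIF orientation passed to Vision.
/// Assumed mapping for a mirrored front camera; adjust for other cameras.
func exifOrientation(for deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portraitUpsideDown: return .rightMirrored
    case .landscapeLeft:      return .downMirrored
    case .landscapeRight:     return .upMirrored
    default:                  return .leftMirrored
    }
}

/// Hypothetical delegate that routes each frame to detection or tracking.
final class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    /// Update this when the device rotates; a real app should synchronize access.
    var orientation: CGImagePropertyOrientation = .leftMirrored
    /// True while a face observation is available to track.
    var isTrackingFace = false
    /// Hypothetical hooks supplied by the caller.
    var detect: (CVPixelBuffer, CGImagePropertyOrientation) -> Void = { _, _ in }
    var track: (CVPixelBuffer, CGImagePropertyOrientation) -> Void = { _, _ in }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Obtain the pixel buffer that holds this frame's image contents.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Track if a face is already known; otherwise look for one.
        if isTrackingFace {
            track(pixelBuffer, orientation)
        } else {
            detect(pixelBuffer, orientation)
        }
    }
}
```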
Before the Vision framework can track an object, it must first know which object to track. Determine which face to track by creating a VNImageRequestHandler and passing it a still image frame. In the case of video, submit individual frames to the request handler as they arrive in the delegate method.
VNImageRequestHandler handles detection of faces and objects in still images, but it doesn’t carry information from one frame to the next. For tracking an object, create a VNSequenceRequestHandler, which can handle tracking requests across a sequence of frames.
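A minimal sketch of per-frame detection, assuming each frame arrives as a CVPixelBuffer; the function name detectFaces is hypothetical. Because the image request handler holds no state between frames, a new one is created for each frame.

```swift
import Vision
import CoreVideo
import ImageIO

/// Hypothetical helper: runs face rectangle detection on one still frame.
func detectFaces(in pixelBuffer: CVPixelBuffer,
                 orientation: CGImagePropertyOrientation) throws -> [VNFaceObservation] {
    let request = VNDetectFaceRectanglesRequest()

    // One handler per frame: it works on a single image and keeps no history.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: orientation,
                                        options: [:])

    // perform(_:) is synchronous, so results are available when it returns.
    try handler.perform([request])
    return request.results as? [VNFaceObservation] ?? []
}
```

A VNSequenceRequestHandler, by contrast, would be created once and kept alive for as long as the face is being tracked.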
Track the Detected Face
Once you have an observation from the image request handler’s face detection, input it to the sequence request handler.
If the detector hasn’t found a face, create an image request handler to detect a face. Once that detection succeeds, and you have a face observation, track it by creating a tracking request seeded with that observation. Then call the sequence handler’s perform(_:on:) function. This method runs synchronously, so use a background queue to avoid blocking the main queue as it executes, and call back to the main queue only if you need to perform UI updates such as path drawing.
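The tracking step could be sketched as follows. This assumes the tracking request is a VNTrackObjectRequest, which the text above does not name. The FaceTracker class and the 0.3 confidence cutoff are hypothetical choices for the example.

```swift
import Vision
import CoreVideo
import ImageIO

/// Hypothetical wrapper that follows one face across successive frames.
final class FaceTracker {
    // Kept alive across frames so Vision can carry tracking state forward.
    private let sequenceHandler = VNSequenceRequestHandler()
    private var request: VNTrackObjectRequest?

    /// Seeds tracking with an observation from the detection step.
    func startTracking(_ face: VNDetectedObjectObservation) {
        let newRequest = VNTrackObjectRequest(detectedObjectObservation: face)
        newRequest.trackingLevel = .accurate
        request = newRequest
    }

    /// Tracks the seeded face in the next frame. Returns nil when there is
    /// nothing to track or tracking is lost. Synchronous: call off the main queue.
    func track(in pixelBuffer: CVPixelBuffer,
               orientation: CGImagePropertyOrientation) throws -> VNDetectedObjectObservation? {
        guard let request = request else { return nil }

        try sequenceHandler.perform([request], on: pixelBuffer, orientation: orientation)

        // Assumed cutoff: treat low-confidence results as a lost face.
        guard let updated = request.results?.first as? VNDetectedObjectObservation,
              updated.confidence > 0.3 else {
            self.request = nil
            return nil
        }

        // Feed the new observation back in for the next frame.
        request.inputObservation = updated
        return updated
    }
}
```

If track(in:orientation:) is called from the capture delegate's serial queue, it is already off the main queue; wrap only the drawing of the returned bounding box in DispatchQueue.main.async.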