- iOS 12.0+
- Xcode 11.3+
With the Vision framework, you can detect and track objects or rectangles through a sequence of frames coming from video, live capture, or other sources.
This sample app shows you how to pick an initial object to track, how to create Vision tracking requests to follow that object, and how to parse results from the object or rectangle tracker.
Preview the Sample App
To see this sample app in action, build and run the project in Xcode, then choose a video from your photo library. Once the video loads, choose to track either objects or rectangles.
Nominate Objects or Rectangles to Track
To track rectangles, select Rectangles. The app runs the rectangle detector and shows rectangles it finds in a preview of the scene.
Otherwise, to track objects, select Objects. Then nominate objects to track by touching them in the preview and dragging boxes around them. You can select multiple objects; the app identifies each one by its `UUID` and draws it with a differently colored rectangle.
The `VNTrackObjectRequest` class requires a detected object observation to initialize. The sample provides this observation either by running `VNDetectRectanglesRequest`, or by creating a `VNDetectedObjectObservation` from the bounding box you drew in the preview. It tracks multiple objects by iterating through each selection and creating a `VNDetectedObjectObservation` from its bounding box.
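The nomination step can be sketched like this; the box values below are hypothetical stand-ins for the rectangles a user draws in the preview (in Vision's normalized, lower-left-origin coordinates):

```swift
import Vision

// Hypothetical input: normalized bounding boxes the user drew in the preview.
let selectedBoxes: [CGRect] = [
    CGRect(x: 0.10, y: 0.20, width: 0.30, height: 0.25),
    CGRect(x: 0.55, y: 0.40, width: 0.20, height: 0.20)
]

// Create one tracking request per nominated object, seeding each request
// with an observation built from that object's bounding box.
let trackingRequests: [VNTrackObjectRequest] = selectedBoxes.map { box in
    let observation = VNDetectedObjectObservation(boundingBox: box)
    return VNTrackObjectRequest(detectedObjectObservation: observation)
}
```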
In your own app, if you prefer to nominate objects programmatically, you can use observations returned from Vision’s own object detection algorithms. For example, the Vision framework’s `VNImageRequestHandler` accepts face detection, text detection, and barcode detection requests, and those requests return their results in subclasses of `VNDetectedObjectObservation`. You can pass these observations directly into a tracking request’s initializer.
Selecting a salient object heavily influences the performance of the tracking algorithm; provide the best initial bounding box segmentation of your object that you can.
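Programmatic nomination might look like the following sketch, which assumes `pixelBuffer` holds the nomination frame. Because `VNFaceObservation` is a subclass of `VNDetectedObjectObservation`, face detection results can seed tracking requests directly:

```swift
import Vision

// A sketch of programmatic nomination: detect faces in one frame, then
// seed a tracking request from each resulting observation.
func makeTrackingRequests(for pixelBuffer: CVPixelBuffer) throws -> [VNTrackObjectRequest] {
    let detectFaces = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try handler.perform([detectFaces])

    // Each face observation is a VNDetectedObjectObservation subclass.
    let faces = detectFaces.results ?? []
    return faces.map { VNTrackObjectRequest(detectedObjectObservation: $0) }
}
```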
Track Objects or Rectangles with a Request Handler
The Vision framework handles tracking requests through a `VNSequenceRequestHandler`. Whereas the `VNImageRequestHandler` handles object detection requests on a single, still image, the `VNSequenceRequestHandler` handles tracking requests across a sequence of frames.
Create a tracking request for each rectangle or object you’d like to track. Seed each tracking request with the observation created during nomination.
For each such request, call the sequence request handler’s `perform(_:on:orientation:)` method, making sure to pass in the video reader’s orientation to ensure upright tracking. This method runs synchronously; use a background queue, such as the dedicated queue the sample code creates, so that the main queue isn’t blocked while your requests execute.
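A minimal sketch of this loop follows; the queue label and the idea of passing the decoded frames in as an array are illustrative assumptions, not the sample’s exact structure:

```swift
import Vision

// A sketch of the tracking loop: run the seeded tracking requests over a
// sequence of frames on a background queue.
func track(frames: [CVPixelBuffer],
           orientation: CGImagePropertyOrientation,
           requests: [VNTrackObjectRequest]) {
    let sequenceHandler = VNSequenceRequestHandler()
    let trackingQueue = DispatchQueue(label: "com.example.tracking") // hypothetical label

    trackingQueue.async {
        for frame in frames {
            do {
                // Synchronous; pass the reader's orientation for upright tracking.
                try sequenceHandler.perform(requests, on: frame, orientation: orientation)
            } catch {
                print("Tracking failed: \(error)")
                break
            }
            // Read each request's results here and update the UI on the main queue.
        }
    }
}
```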
By iterating through each selected object or rectangle, creating a tracking request from it, and calling `perform(_:on:orientation:)` on the request handler, you direct Vision to follow the object or rectangle over the image sequence and return results through each request’s `results` property or completion handler.
Interpret Tracking Results
Access tracking results through the request’s
results property or its completion handler. A single tracking request represents a single tracked object in a one-to-one relationship. If a tracking request succeeds, its
results property contains
VNDetected objects describing the tracked object’s new location in the frame.
Use the observation’s `boundingBox` to determine its location, so you can update your app or UI with the tracked object’s new position. Also use it to seed the next round of tracking.
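Result handling for one request might be sketched as follows; the function name is hypothetical, and converting the normalized rectangle into view coordinates is left to the caller:

```swift
import Vision

// A sketch of handling one tracking request's results after a frame runs
// through the sequence handler.
func handleResults(of request: VNTrackObjectRequest) {
    guard let observation = request.results?.first as? VNDetectedObjectObservation else {
        return
    }
    // Normalized rect with a lower-left origin; convert before drawing in UIKit.
    let newLocation = observation.boundingBox

    // Seed the next round of tracking with the latest observation.
    request.inputObservation = observation

    // ... update the UI with newLocation on the main queue ...
    _ = newLocation
}
```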
In practice, you should periodically run a new tracking request with an updated set of input observations to capture objects that weren’t present in your initial nomination frame. For instance, you could create a new `VNTrackObjectRequest` every ten frames.