Framework

Vision

Apply computer vision algorithms to perform a variety of tasks on input images and video.

Overview

The Vision framework performs face and face landmark detection, text detection, barcode recognition, image registration, and general feature tracking. Vision also allows the use of custom Core ML models for tasks like classification or object detection.

Topics

Still Image Analysis

Detecting Objects in Still Images

Locate and demarcate rectangles, faces, barcodes, and text in images using the Vision framework.

class VNImageRequestHandler

An object that processes one or more image analysis requests pertaining to a single image.

class VNImageBasedRequest

The abstract superclass for image analysis requests that focus on a specific part of an image.

class VNRequest

The abstract superclass for analysis requests.

class VNObservation

The abstract superclass for analysis results.
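
As a rough illustration of how these classes fit together, the sketch below creates one VNImageRequestHandler for a single image and performs two requests on it in one call. The function name analyze(_:) and the choice of requests are assumptions made for this example. The `count` on `results` works whether the SDK exposes results as untyped or typed arrays.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: runs two requests against one still image
/// and reports how many observations each request produced.
func analyze(_ image: CGImage) throws -> (faces: Int, rectangles: Int) {
    let faceRequest = VNDetectFaceRectanglesRequest()
    let rectangleRequest = VNDetectRectanglesRequest()

    // One handler per image; it can perform several requests together.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([faceRequest, rectangleRequest])

    // Each request carries its own results after perform(_:) returns.
    return (faces: faceRequest.results?.count ?? 0,
            rectangles: rectangleRequest.results?.count ?? 0)
}
```

Batching requests on one handler lets Vision share work on the same image, which is why the handler is scoped to a single image.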

Image Sequence Analysis

class VNSequenceRequestHandler

An object that processes image analysis requests for each frame in a sequence.

Object Tracking

Tracking the User’s Face in Real Time

Detect and track faces from the front-facing camera feed in real time.

Tracking Multiple Objects or Rectangles in Video

Apply Vision algorithms to track objects or rectangles throughout a video.

class VNTrackingRequest

The abstract superclass for image analysis requests that track unique features across multiple images or video frames.

class VNTrackRectangleRequest

An image analysis request that tracks movement of a previously identified rectangular object across multiple images or video frames.

class VNTrackObjectRequest

An image analysis request that tracks movement of a previously identified arbitrary object across multiple images or video frames.

class VNDetectedObjectObservation

An image analysis result that provides the position and extent of a detected image feature.

class VNRecognizedObjectObservation

A detected object observation with an array of classification labels that classify the recognized object.

Recognizing Objects in Live Capture

Apply Vision algorithms to identify objects in real-time video.
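
The following is a minimal sketch of frame-by-frame tracking, assuming the caller already has an initial bounding box (in normalized coordinates) and receives frames as CVPixelBuffer values. The ObjectTracker type and its members are hypothetical names for this example. It keeps one VNSequenceRequestHandler alive across frames and feeds each result back in as the next input observation.

```swift
import Vision
import CoreVideo
import CoreGraphics

/// Hypothetical wrapper that tracks one object across successive frames.
final class ObjectTracker {
    // The sequence handler must persist across frames to keep tracking state.
    private let sequenceHandler = VNSequenceRequestHandler()
    private var lastObservation: VNDetectedObjectObservation

    /// The most recent bounding box, in normalized coordinates.
    var currentBoundingBox: CGRect { lastObservation.boundingBox }

    init(initialBoundingBox: CGRect) {
        lastObservation = VNDetectedObjectObservation(boundingBox: initialBoundingBox)
    }

    /// Tracks the object into the next frame; returns nil if no result is produced.
    func track(in pixelBuffer: CVPixelBuffer) throws -> CGRect? {
        let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
        request.trackingLevel = .accurate   // trade speed for accuracy

        try sequenceHandler.perform([request], on: pixelBuffer)

        guard let observation = request.results?.first as? VNDetectedObjectObservation else {
            return nil
        }
        // Feed the new observation back in for the next frame.
        lastObservation = observation
        return observation.boundingBox
    }
}
```

VNTrackRectangleRequest follows the same pattern, seeded with a VNRectangleObservation instead.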

Rectangle Detection

class VNDetectRectanglesRequest

An image analysis request that finds projected rectangular regions in an image.

class VNRectangleObservation

Information about projected rectangular regions detected by an image analysis request.
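
A short sketch of configuring a rectangle request and reading back corner points is shown below. The function name and the threshold values are illustrative assumptions, not recommended settings. Corner points are converted from normalized to pixel coordinates with VNImagePointForNormalizedPoint.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns the four corners of each detected rectangle,
/// in pixel coordinates with a lower-left origin.
func detectQuadrilaterals(in image: CGImage) throws -> [[CGPoint]] {
    let request = VNDetectRectanglesRequest()
    request.maximumObservations = 8     // illustrative limit
    request.minimumConfidence = 0.6     // illustrative threshold
    request.minimumAspectRatio = 0.3    // illustrative threshold

    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let rectangles = request.results as? [VNRectangleObservation] ?? []
    return rectangles.map { rectangle in
        [rectangle.topLeft, rectangle.topRight, rectangle.bottomRight, rectangle.bottomLeft]
            .map { VNImagePointForNormalizedPoint($0, image.width, image.height) }
    }
}
```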

Face Detection

class VNDetectFaceRectanglesRequest

An image analysis request that finds faces within an image.

class VNDetectFaceLandmarksRequest

An image analysis request that finds facial features (such as the eyes and mouth) in an image.

class VNFaceObservation

Face or facial-feature information detected by an image analysis request.
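
As an example of how a face request and its observations might be used together, the sketch below runs VNDetectFaceLandmarksRequest and returns each face's bounding box and left-eye outline in pixel coordinates. The DetectedFace type and detectFaces(in:) are hypothetical names for this example.

```swift
import Vision
import CoreGraphics

/// Hypothetical result type for this example.
struct DetectedFace {
    let boundingBox: CGRect      // pixel coordinates, lower-left origin
    let leftEyePoints: [CGPoint] // pixel coordinates, lower-left origin
}

/// Hypothetical helper: detects faces and one landmark region per face.
func detectFaces(in image: CGImage) throws -> [DetectedFace] {
    let request = VNDetectFaceLandmarksRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let imageSize = CGSize(width: image.width, height: image.height)
    let faces = request.results as? [VNFaceObservation] ?? []

    return faces.map { face in
        DetectedFace(
            boundingBox: VNImageRectForNormalizedRect(face.boundingBox, image.width, image.height),
            // Landmarks are optional; a face may be found without them.
            leftEyePoints: face.landmarks?.leftEye?.pointsInImage(imageSize: imageSize) ?? []
        )
    }
}
```

If only face positions are needed, VNDetectFaceRectanglesRequest can be substituted; its observations leave the landmarks property nil.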

Barcode Detection

struct VNBarcodeSymbology

Symbologies supported by the Vision framework.

class VNDetectBarcodesRequest

An image analysis request that finds and recognizes barcodes in an image.

class VNBarcodeObservation

Barcode information detected by an image analysis request.
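
The sketch below shows one way these three types might be combined: it runs a VNDetectBarcodesRequest with its default symbologies and returns each barcode's symbology identifier and decoded payload. The function name readBarcodes(in:) is an assumption for this example.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns the symbology and payload of each barcode found.
func readBarcodes(in image: CGImage) throws -> [(symbology: String, payload: String?)] {
    // Leaving `symbologies` untouched searches for all supported symbologies.
    let request = VNDetectBarcodesRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let barcodes = request.results as? [VNBarcodeObservation] ?? []
    return barcodes.map { barcode in
        // payloadStringValue is nil when the payload is not representable as a string.
        (symbology: barcode.symbology.rawValue, payload: barcode.payloadStringValue)
    }
}
```

Restricting the request's symbologies property to the formats an app expects typically reduces work, at the cost of ignoring other barcode types.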

Text Detection

class VNDetectTextRectanglesRequest

An image analysis request that finds regions of visible text in an image.

class VNTextObservation

Information about regions of text detected by an image analysis request.
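
A minimal sketch of text-region detection follows. It locates text but does not recognize characters. The function name textRegions(in:) is hypothetical. Setting reportCharacterBoxes asks Vision to include a box per character in each observation's characterBoxes.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns the pixel-space bounding box of each text region
/// together with the number of character boxes reported for it.
func textRegions(in image: CGImage) throws -> [(box: CGRect, characterCount: Int)] {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = true

    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let regions = request.results as? [VNTextObservation] ?? []
    return regions.map { region in
        (box: VNImageRectForNormalizedRect(region.boundingBox, image.width, image.height),
         characterCount: region.characterBoxes?.count ?? 0)
    }
}
```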

Horizon Detection

class VNDetectHorizonRequest

An image analysis request that determines the horizon angle in an image.

class VNHorizonObservation

Horizon angle information detected by an image analysis request.
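
The sketch below illustrates reading the horizon angle from a single image. The function name horizonAngle(in:) is an assumption for this example, and a nil return means no horizon observation was produced.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns the detected horizon angle, or nil if none was reported.
func horizonAngle(in image: CGImage) throws -> CGFloat? {
    let request = VNDetectHorizonRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    // The request yields at most one horizon observation for the image.
    let horizon = request.results?.first as? VNHorizonObservation
    return horizon?.angle
}
```

VNHorizonObservation also exposes a transform property, which may be more convenient than the raw angle when the goal is to straighten the image.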

Image Alignment

class VNTargetedImageRequest

The abstract superclass for image analysis requests that operate on both the processed image and a secondary image.

class VNImageRegistrationRequest

The abstract superclass for image analysis requests that align images based on their content.

class VNImageAlignmentObservation

The abstract superclass for image analysis results that describe the relative alignment of two images.

class VNTranslationalImageRegistrationRequest

An image analysis request that determines the affine transform needed to align the content of two images.

class VNImageTranslationAlignmentObservation

Affine transform information produced by an image alignment request.

class VNHomographicImageRegistrationRequest

An image analysis request that determines the perspective warp matrix needed to align the content of two images.

class VNImageHomographicAlignmentObservation

Perspective warp information produced by an image alignment request.
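
To show how the targeted-request pattern works, the sketch below registers a "floating" image against a reference image. The request is created with the floating image as its target, and the handler is created with the reference image. The function name and parameter labels are hypothetical.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns the transform that aligns `floating` with `reference`,
/// or nil if no alignment observation was produced.
func translation(aligning floating: CGImage, to reference: CGImage) throws -> CGAffineTransform? {
    // The targeted (secondary) image is supplied to the request itself.
    let request = VNTranslationalImageRegistrationRequest(targetedCGImage: floating, options: [:])

    // The handler holds the image that the request is processed against.
    try VNImageRequestHandler(cgImage: reference, options: [:]).perform([request])

    let alignment = request.results?.first as? VNImageTranslationAlignmentObservation
    return alignment?.alignmentTransform
}
```

VNHomographicImageRegistrationRequest is used the same way; its VNImageHomographicAlignmentObservation reports a warpTransform matrix instead of an affine transform.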

Machine-Learning Image Analysis

Classifying Images with Vision and Core ML

Preprocess photos using the Vision framework and classify them with a Core ML model.

Training a Create ML Model to Classify Flowers

Train a flower classifier using Create ML in Swift Playgrounds, and apply the resulting model to real-time image classification using Vision.

class VNCoreMLRequest

An image analysis request that uses a Core ML model to process images.

class VNClassificationObservation

Classification information produced by an image analysis request.

class VNPixelBufferObservation

An output image produced by a Core ML image analysis request.

class VNCoreMLFeatureValueObservation

A collection of key-value information produced by a Core ML image analysis request.
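
A minimal sketch of running a Core ML classifier through Vision follows, assuming the caller supplies an already loaded MLModel that is an image classifier. The function name classify(_:using:) is hypothetical. For models that are not classifiers, the results would instead be VNPixelBufferObservation or VNCoreMLFeatureValueObservation instances, and the cast below would yield an empty array.

```swift
import Vision
import CoreML
import CoreGraphics

/// Hypothetical helper: classifies an image and returns labels sorted by confidence.
func classify(_ image: CGImage, using model: MLModel) throws -> [VNClassificationObservation] {
    // Wrap the Core ML model so Vision can drive it.
    let visionModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: visionModel)

    // Vision scales and crops the input to the model's expected size.
    request.imageCropAndScaleOption = .centerCrop

    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let results = request.results as? [VNClassificationObservation] ?? []
    return results.sorted { $0.confidence > $1.confidence }
}
```

Each returned observation exposes an identifier (the class label) and a confidence value.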

Coordinate Conversion

Vision uses a normalized coordinate space from 0.0 to 1.0 with the origin at the lower left. For observations nested inside another observation, such as landmarks inside a face bounding box, the coordinates are relative to the parent observation.

func VNImagePointForNormalizedPoint(CGPoint, Int, Int) -> CGPoint

Projects a point from normalized coordinate space into image coordinates.

func VNImageRectForNormalizedRect(CGRect, Int, Int) -> CGRect

Projects a rectangle from normalized coordinate space into image coordinates.

func VNNormalizedRectForImageRect(CGRect, Int, Int) -> CGRect

Normalizes a rectangle from image coordinates.

let VNNormalizedIdentityRect: CGRect

The normalized identity rectangle with origin (0,0) and unit width and height.

func VNNormalizedRectIsIdentityRect(CGRect) -> Bool

Returns true if the rectangle has origin (0,0) and unit width and height.

func VNImagePointForFaceLandmarkPoint(vector_float2, CGRect, Int, Int) -> CGPoint

Returns the image coordinates of a given face landmark point.

func VNNormalizedFaceBoundingBoxPointForLandmarkPoint(vector_float2, CGRect, Int, Int) -> CGPoint

Returns the coordinates of a given face landmark point, in bounding box coordinates.
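
Because Vision's normalized space has a lower-left origin, drawing results in a top-left coordinate system (as many UI frameworks use) needs a vertical flip after scaling. The sketch below shows one way to do this with VNImageRectForNormalizedRect. The helper name topLeftPixelRect(forNormalizedRect:imageWidth:imageHeight:) and the sample values are assumptions for this example.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: converts a normalized Vision rectangle into
/// pixel coordinates with a top-left origin.
func topLeftPixelRect(forNormalizedRect normalizedRect: CGRect,
                      imageWidth: Int,
                      imageHeight: Int) -> CGRect {
    // Scale from the unit square to pixels; the origin is still lower left.
    let imageRect = VNImageRectForNormalizedRect(normalizedRect, imageWidth, imageHeight)

    // Flip vertically so that y grows downward from the top edge.
    return CGRect(x: imageRect.minX,
                  y: CGFloat(imageHeight) - imageRect.maxY,
                  width: imageRect.width,
                  height: imageRect.height)
}

// Usage with illustrative values: a box covering the middle half of the width,
// sitting in the third quarter of the height (measured from the bottom).
let normalizedBox = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let pixelBox = topLeftPixelRect(forNormalizedRect: normalizedBox,
                                imageWidth: 400,
                                imageHeight: 200)
print(pixelBox)
```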

Errors

let VNErrorDomain: String

The domain for NSError objects produced by Vision framework methods.

enum VNErrorCode

Error codes in NSError objects produced by Vision framework methods.
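
One possible way to inspect a failure thrown by a Vision method is sketched below: bridge the error to NSError, compare its domain against VNErrorDomain, and map the code to VNErrorCode. The function name describeVisionFailure(_:) and the message strings are hypothetical.

```swift
import Vision
import Foundation

/// Hypothetical helper: produces a short description of an error thrown by Vision.
func describeVisionFailure(_ error: Error) -> String {
    let nsError = error as NSError

    // Only interpret the code if the error really comes from Vision.
    guard nsError.domain == VNErrorDomain else {
        return "Non-Vision error: \(nsError.domain) (\(nsError.code))"
    }

    switch VNErrorCode(rawValue: nsError.code) {
    case .requestCancelled?:
        return "Vision request was cancelled"
    case .unsupportedRevision?:
        return "Vision request revision is not supported"
    default:
        return "Vision error code \(nsError.code)"
    }
}
```

A typical call site would wrap a handler's perform(_:) call in do/catch and pass the caught error to this function.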

Vision Framework Version

var VNVisionVersionNumber: Double

The current version number of the Vision framework.

Revision Specification

protocol VNRequestRevisionProviding

A protocol for specifying the revision number of a request.

let VNRequestRevisionUnspecified: Int

A constant indicating that no revision has been specified for a VNRequest.

let VNTrackObjectRequestRevision1: Int

A constant for specifying revision 1 of VNTrackObjectRequest.

let VNCoreMLRequestRevision1: Int

A constant for specifying revision 1 of VNCoreMLRequest.
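
As a sketch of how the revision constants might be used, the example below pins a tracking request to revision 1 only after checking that the running framework supports it. The function name makePinnedTrackingRequest(for:) is an assumption for this example.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: creates a tracking request pinned to revision 1 when available.
func makePinnedTrackingRequest(for boundingBox: CGRect) -> VNTrackObjectRequest {
    let seed = VNDetectedObjectObservation(boundingBox: boundingBox)
    let request = VNTrackObjectRequest(detectedObjectObservation: seed)

    // supportedRevisions lists the revisions the installed framework can run.
    if VNTrackObjectRequest.supportedRevisions.contains(VNTrackObjectRequestRevision1) {
        request.revision = VNTrackObjectRequestRevision1
    }
    return request
}
```

Pinning a revision keeps algorithm behavior stable across OS updates. Leaving it unset lets Vision pick a default for the SDK the app was linked against.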