How to get the saliency mask from VNDetectDocumentSegmentationRequest

In one of the WWDC videos, the VNDetectDocumentSegmentationRequest result is described in the following way:

The result of the request is a low resolution segmentation mask, where each pixel represents a confidence if that pixel is part of the detected document or not. In addition it provides the four corner points of the quadrilateral.

Similarly, in the VNDetectDocumentSegmentationRequest docs there's the following statement:

The result that the request generates contains the four corner points of a document’s quadrilateral and saliency mask.

The first part ("four corner points of a document’s quadrilateral") is easy: it is in the request's results, which are VNRectangleObservation instances:

let request = VNDetectDocumentSegmentationRequest { (request, error) in
    guard let results = request.results as? [VNRectangleObservation] else {
        // Failed
        return
    }
    // Process VNRectangleObservations
}

But how do I obtain the "low resolution segmentation mask" / "saliency mask" from VNDetectDocumentSegmentationRequest?

Replies

I figured it out: the results are always VNRectangleObservation instances, but VNRectangleObservation inherits from VNDetectedObjectObservation, which has a globalSegmentationMask property. So:

let request = VNDetectDocumentSegmentationRequest { (request, error) in
    guard let results = request.results as? [VNRectangleObservation],
          let result = results.first else {
        // No results
        return
    }

    // globalSegmentationMask is an optional VNPixelBufferObservation
    guard let segmentationMask = result.globalSegmentationMask else {
        // No mask available
        return
    }

    // pixelBuffer is a non-optional CVPixelBuffer, so it needs no unwrapping
    let pixelBuffer = segmentationMask.pixelBuffer

    // And then for example:
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
}
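
For completeness, here is a rough sketch of the whole flow, from running the request to getting a mask that lines up with the input image. The function name documentMask(for:) is my own and not part of Vision, and the scaling step assumes you want the low resolution mask stretched to the source image size. It needs iOS 15 / macOS 12 or later.

```swift
import CoreImage
import Vision

/// Hypothetical helper (not a Vision API): runs document segmentation on
/// `cgImage` and returns the mask scaled to the image size, or nil if no
/// document (or no mask) was found.
@available(iOS 15.0, macOS 12.0, *)
func documentMask(for cgImage: CGImage) throws -> CIImage? {
    let request = VNDetectDocumentSegmentationRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

    // perform(_:) is synchronous, so call this off the main thread.
    try handler.perform([request])

    guard let observation = request.results?.first,
          let maskObservation = observation.globalSegmentationMask else {
        return nil
    }

    let mask = CIImage(cvPixelBuffer: maskObservation.pixelBuffer)

    // The mask is smaller than the input, so stretch it to the same size.
    let scaleX = CGFloat(cgImage.width) / mask.extent.width
    let scaleY = CGFloat(cgImage.height) / mask.extent.height
    return mask.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
}
```

The corner points are still available on the same observation (topLeft, topRight, bottomLeft, bottomRight), so you can use both from a single request.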