VisionKit


Scan documents with the camera on iPhone and iPad devices using VisionKit.

VisionKit Documentation

Posts under VisionKit tag

65 Posts
Post not yet marked as solved
4 Replies
1.8k Views
Hi, when using VNFeaturePrintObservation and computing the distance between two images, the returned values vary heavily. When two identical images (the same image file) are passed into the comparison function below, the distance does not return 0, even though identical images are expected to produce 0. Also, what is the upper limit of computeDistance? I am trying to find the percentage similarity between two images, which of course I can't do until the issue above is resolved. The code I have used is below:

func featureprintObservationForImage(image: UIImage) -> VNFeaturePrintObservation? {
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!, options: [:])
    let request = VNGenerateImageFeaturePrintRequest()
    request.usesCPUOnly = true // Simulator testing
    do {
        try requestHandler.perform([request])
        return request.results?.first as? VNFeaturePrintObservation
    } catch {
        print("Vision Error: \(error)")
        return nil
    }
}

func compare(origImg: UIImage, drawnImg: UIImage) -> Float? {
    let oImgObservation = featureprintObservationForImage(image: origImg)
    let dImgObservation = featureprintObservationForImage(image: drawnImg)
    if let oImgObservation = oImgObservation {
        if let dImgObservation = dImgObservation {
            var distance: Float = -1
            do {
                try oImgObservation.computeDistance(&distance, to: dImgObservation)
            } catch {
                fatalError("Failed to Compute Distance")
            }
            if distance == -1 {
                return nil
            } else {
                return distance
            }
        } else {
            print("Drawn Image Observation found Nil")
        }
    } else {
        print("Original Image Observation found Nil")
    }
    return nil
}

Thanks for all the help!
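For the percentage-similarity part, here is a minimal sketch that maps the unbounded feature-print distance onto a 0-1 score, reusing the compare(origImg:drawnImg:) helper above. Vision does not document an upper bound for computeDistance, so the 1 / (1 + distance) mapping is an illustrative heuristic, not an official scale.

import UIKit
import Vision

// Illustrative only: squash an unbounded distance into (0, 1].
// Identical images should land near 1.0; larger distances approach 0.
func similarityScore(_ a: UIImage, _ b: UIImage) -> Float? {
    guard let distance = compare(origImg: a, drawnImg: b) else { return nil }
    return 1.0 / (1.0 + distance)
}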
Posted
by chewethan.
Last updated
.
Post not yet marked as solved
2 Replies
903 Views
Hello, I'm doing a iOS app and I'm trying to find a way to extract programmatically a person from his identity picture (and to leave behind the background) I'm watching WWDC "Lift subjects from images in your app" video (a really cool feature) and i'm wondering if this feature would be possible programmatically, without the need of a human person interaction. Thank you.
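A minimal sketch of a programmatic route, assuming iOS 17's VNGenerateForegroundInstanceMaskRequest is available; it is the Vision-side counterpart of the interactive lifting shown in the session, and the helper name here is illustrative.

import Vision
import CoreImage

// Returns the detected subject(s) composited over a transparent background,
// or nil if no foreground instances were found.
func liftSubject(from cgImage: CGImage) -> CIImage? {
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    let request = VNGenerateForegroundInstanceMaskRequest()
    do {
        try handler.perform([request])
        guard let observation = request.results?.first else { return nil }
        // Masked image of all detected instances, cropped to their extent.
        let buffer = try observation.generateMaskedImage(
            ofInstances: observation.allInstances,
            from: handler,
            croppedToInstancesExtent: true)
        return CIImage(cvPixelBuffer: buffer)
    } catch {
        print("Subject lift failed: \(error)")
        return nil
    }
}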
Posted
by Gohoro.
Last updated
.
Post not yet marked as solved
2 Replies
627 Views
In visionOS, is it possible to detect when a user is touching a physical surface in the real world, and also to project 2D graphics onto that surface? Imagine a windowless 2D app projected onto a surface, essentially turning a physical wall, table, etc. into a giant touchscreen, kind of like this: https://appleinsider.com/articles/23/06/23/vision-pro-will-turn-any-surface-into-a-display-with-touch-control But I want every surface in the room to be touchable and able to display 2D graphics on the face of that surface, not floating in space. So essentially turning every physical surface in the room into a UIView. Thanks!
Posted
by coderkid.
Last updated
.
Post not yet marked as solved
1 Reply
752 Views
I'm using VNDocumentCameraViewController to scan some documents, which works fine. But not infrequently, the app becomes slow after dismissing the VNDocumentCameraViewController. There is no VNDocumentCameraViewController instance still allocated, but according to the allocations gathered by Instruments, there is an ICDocCamViewController still alive and using between 200 and 300 MB. I guess that ICDocCamViewController is an internal component of VNDocumentCameraViewController. Are there any known issues? Unfortunately, I do not see any way to free the ICDocCamViewController. The relevant part of the allocation call tree:

281.37 MB 93.1% 104708 start_wqthread
281.37 MB 93.1% 104708 _pthread_wqthread
256.04 MB 84.7% 42747 _dispatch_workloop_worker_thread
256.03 MB 84.7% 42564 _dispatch_lane_invoke
256.03 MB 84.7% 42564 _dispatch_lane_serial_drain
249.91 MB 82.7% 22258 _dispatch_client_callout
237.01 MB 78.4% 8562 _dispatch_call_block_and_release
236.50 MB 78.3% 3836 __77-[ICDocCamViewController saveCapturedImage:metaData:rects:completionHandler:]_block_invoke_3
236.50 MB 78.3% 3836 -[ICDocCamViewController cropAndFilterImage:rects:filterType:]
236.49 MB 78.3% 3678 +[ICDocCamImageFilters filteredImage:orientation:imageFilterType:]
236.49 MB 78.3% 3678 +[ICDocCamImageFilters colorDocument:orientation:]
236.47 MB 78.3% 3476 -[CIContext(createCGImage) createCGImage:fromRect:]
236.47 MB 78.3% 3476 -[CIContext(_createCGImageInternal) _createCGImage:fromRect:format:premultiplied:colorSpace:deferred:renderCallback:]
236.40 MB 78.2% 2096 -[CIContext(CIRenderDestination) startTaskToRender:fromRect:toDestination:atPoint:error:]
236.40 MB 78.2% 2096 -[CIContext(CIRenderDestination) _startTaskToRender:toDestination:forPrepareRender:forClear:error:]
236.40 MB 78.2% 2096 CI::RenderToBitmap::render(CI::Image*, CI::Context*) const
236.40 MB 78.2% 2096 CI::image_get_bitmap(CI::Context*, CI::Image*, CGRect, CGColorSpace*, CI::Bitmap*, CI::RenderDestination const*)
236.33 MB 78.2% 716 CI::tile_node_graph(CI::Context*, CI::RenderDestination const*, char const*, CI::Node*, CGRect const&, CI::PixelFormat, CI::swizzle_info const&, CI::TileTask* (CI::ProgramNode*, CGRect) block_pointer)
236.33 MB 78.2% 716 CI::recursive_tile(CI::RenderTask*, CI::Context*, CI::RenderDestination const*, char const*, CI::Node*, CGRect const&, CI::PixelFormat, CI::swizzle_info const&, CI::TileTask* (CI::ProgramNode*, CGRect) block_pointer)
236.26 MB 78.2% 212 invocation function for block in CI::image_get_bitmap(CI::Context*, CI::Image*, CGRect, CGColorSpace*, CI::Bitmap*, CI::RenderDestination const*)
236.26 MB 78.2% 212 CI::Context::render(CI::ProgramNode*, CGRect const&)
236.26 MB 78.2% 212 CI::Context::recursive_render(CI::TileTask*, CI::roiKey const&, CI::Node*, bool)
236.26 MB 78.2% 211 CI::Context::recursive_render(CI::TileTask*, CI::roiKey const&, CI::Node*, bool)
236.24 MB 78.2% 196 CI::Context::recursive_render(CI::TileTask*, CI::roiKey const&, CI::Node*, bool)
236.23 MB 78.2% 181 CI::Context::recursive_render(CI::TileTask*, CI::roiKey const&, CI::Node*, bool)
236.21 MB 78.2% 166 CI::Context::recursive_render(CI::TileTask*, CI::roiKey const&, CI::Node*, bool)
236.19 MB 78.2% 151 CI::Context::recursive_render(CI::TileTask*, CI::roiKey const&, CI::Node*, bool)
17.02 KB 0.0% 14 CI::MetalTextureManager::create_intermediate(CI::IntermediateDescriptor const&, unsigned long long, CGRect const&, unsigned long, unsigned long, bool)
128 Bytes 0.0% 1 CI::MetalContext::render_intermediate_node(CI::TileTask*, CI::ProgramNode*, CGRect const&, CI::intermediate_t*, bool, void () block_pointer)
17.02 KB 0.0% 14 CI::MetalTextureManager::create_intermediate(CI::IntermediateDescriptor const&, unsigned long long, CGRect const&, unsigned long, unsigned long, bool)
128 Bytes 0.0% 1 CI::MetalContext::render_intermediate_node(CI::TileTask*, CI::ProgramNode*, CGRect const&, CI::intermediate_t*, bool, void () block_pointer)
17.02 KB 0.0% 14 CI::MetalTextureManager::create_intermediate(CI::IntermediateDescriptor const&, unsigned long long, CGRect const&, unsigned long, unsigned long, bool)
128 Bytes 0.0% 1 CI::MetalContext::render_intermediate_node(CI::TileTask*, CI::ProgramNode*, CGRect const&, CI::intermediate_t*, bool, void () block_pointer)
17.02 KB 0.0% 14 CI::MetalTextureManager::create_intermediate(CI::IntermediateDescriptor const&, unsigned long long, CGRect const&, unsigned long, unsigned long, bool)
128 Bytes 0.0% 1 CI::MetalContext::render_intermediate_node(CI::TileTask*, CI::ProgramNode*, CGRect const&, CI::intermediate_t*, bool, void () block_pointer)
128 Bytes 0.0% 1 CI::MetalContext::render_root_node(CI::TileTask*, CI::ProgramNode*, CGRect const&, void () block_pointer, void () block_pointer)
69.00 KB 0.0% 504 CI::gather_rois_for_program_graph(CI::Context*, char const*, CI::ProgramNode*, CGRect)
71.72 KB 0.0% 1380 CI::prepare_initial_graph(CI::Context*, char const*, CI::Image*, CI::RenderDestination const*, CGRect, CGColorSpace*, CI::PixelFormat, CI::swizzle_info, CI::Affine const&, bool, CI::TextureDescriptor*)
71.72 KB 0.0% 1380 CI::prepare_initial_graph(CI::Context*, char const*, CI::Image*, CI::RenderDestination const*, CGRect, CGColorSpace*, CI::PixelFormat, CI::swizzle_info, CI::Affine const&, bool, CI::TextureDescriptor*)
18.34 KB 0.0% 202 -[CIPaperWash outputImage]
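For context, a minimal sketch of how the scanned pages might be handed off and the controller dismissed promptly, assuming the standard VNDocumentCameraViewControllerDelegate flow; this does not address the internal ICDocCamViewController allocations, which appear to be framework-side.

import VisionKit
import UIKit

final class ScanReceiver: NSObject, VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        // Copy the pages out inside an autoreleasepool so large intermediate
        // images can be released as early as possible.
        var pages: [UIImage] = []
        autoreleasepool {
            for index in 0..<scan.pageCount {
                pages.append(scan.imageOfPage(at: index))
            }
        }
        controller.dismiss(animated: true)
        // ... persist `pages` ...
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }
}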
Posted
by chkpnt.
Last updated
.
Post marked as solved
4 Replies
2.1k Views
I work on an iOS app that displays images that often contain text, and I'm adding support for ImageAnalysisInteraction as described in this WWDC 2022 session. I have gotten as far as making the interaction show up and being able to select text and get the system selection menu, and even add my own action to the menu via the buildMenuWithBuilder API. But what I really want to do with my custom action is get the selected text and do a custom lookup-like thing to check the text against other content in my app. So how do I get the selected text from an ImageAnalysisInteraction on a UIImageView? The docs show methods to check if there is selected text, but I want to know what the text is.
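A sketch of one possible approach, assuming the selectedText and hasActiveTextSelection additions to ImageAnalysisInteraction in iOS 17; on earlier releases these properties are not available, so the availability annotation is load-bearing. The lookup helper is a hypothetical app-side function.

import UIKit
import VisionKit

@available(iOS 17.0, *)
func handleCustomLookup(for interaction: ImageAnalysisInteraction) {
    // hasActiveTextSelection is true once the user has selected text in the
    // live-text overlay; selectedText returns that selection as a String.
    guard interaction.hasActiveTextSelection else { return }
    let text = interaction.selectedText
    performCustomLookup(with: text)   // hypothetical app-side helper
}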
Posted Last updated
.
Post not yet marked as solved
2 Replies
1.5k Views
With the release of Xcode 13, a large section of my Vision framework processing code produces errors and no longer compiles; all of these APIs have become deprecated. This is my original code:

do {
    // Perform VNDetectHumanHandPoseRequest
    try handler.perform([handPoseRequest])
    // Continue only when a hand was detected in the frame.
    // Since we set the maximumHandCount property of the request to 1, there will be at most one observation.
    guard let observation = handPoseRequest.results?.first else {
        self.state = "no hand"
        return
    }
    // Get points for thumb and index finger.
    let thumbPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyThumb)
    let indexFingerPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyIndexFinger)
    let middleFingerPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyMiddleFinger)
    let ringFingerPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyRingFinger)
    let littleFingerPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyLittleFinger)
    let wristPoints = try observation.recognizedPoints(forGroupKey: .all)

    // Look for tip points.
    guard let thumbTipPoint = thumbPoints[.handLandmarkKeyThumbTIP],
          let thumbIpPoint = thumbPoints[.handLandmarkKeyThumbIP],
          let thumbMpPoint = thumbPoints[.handLandmarkKeyThumbMP],
          let thumbCMCPoint = thumbPoints[.handLandmarkKeyThumbCMC] else {
        self.state = "no tip"
        return
    }

    guard let indexTipPoint = indexFingerPoints[.handLandmarkKeyIndexTIP],
          let indexDipPoint = indexFingerPoints[.handLandmarkKeyIndexDIP],
          let indexPipPoint = indexFingerPoints[.handLandmarkKeyIndexPIP],
          let indexMcpPoint = indexFingerPoints[.handLandmarkKeyIndexMCP] else {
        self.state = "no index"
        return
    }

    guard let middleTipPoint = middleFingerPoints[.handLandmarkKeyMiddleTIP],
          let middleDipPoint = middleFingerPoints[.handLandmarkKeyMiddleDIP],
          let middlePipPoint = middleFingerPoints[.handLandmarkKeyMiddlePIP],
          let middleMcpPoint = middleFingerPoints[.handLandmarkKeyMiddleMCP] else {
        self.state = "no middle"
        return
    }

    guard let ringTipPoint = ringFingerPoints[.handLandmarkKeyRingTIP],
          let ringDipPoint = ringFingerPoints[.handLandmarkKeyRingDIP],
          let ringPipPoint = ringFingerPoints[.handLandmarkKeyRingPIP],
          let ringMcpPoint = ringFingerPoints[.handLandmarkKeyRingMCP] else {
        self.state = "no ring"
        return
    }

    guard let littleTipPoint = littleFingerPoints[.handLandmarkKeyLittleTIP],
          let littleDipPoint = littleFingerPoints[.handLandmarkKeyLittleDIP],
          let littlePipPoint = littleFingerPoints[.handLandmarkKeyLittlePIP],
          let littleMcpPoint = littleFingerPoints[.handLandmarkKeyLittleMCP] else {
        self.state = "no little"
        return
    }

    guard let wristPoint = wristPoints[.handLandmarkKeyWrist] else {
        self.state = "no wrist"
        return
    }
    ...
}

Now every line from thumbPoints onwards results in an error. I have fixed the first part (not sure whether it is correct, as it still cannot compile) to:

let thumbPoints = try observation.recognizedPoints(forGroupKey: VNHumanHandPoseObservation.JointsGroupName.thumb.rawValue)
let indexFingerPoints = try observation.recognizedPoints(forGroupKey: VNHumanHandPoseObservation.JointsGroupName.indexFinger.rawValue)
let middleFingerPoints = try observation.recognizedPoints(forGroupKey: VNHumanHandPoseObservation.JointsGroupName.middleFinger.rawValue)
let ringFingerPoints = try observation.recognizedPoints(forGroupKey: VNHumanHandPoseObservation.JointsGroupName.ringFinger.rawValue)
let littleFingerPoints = try observation.recognizedPoints(forGroupKey: VNHumanHandPoseObservation.JointsGroupName.littleFinger.rawValue)
let wristPoints = try observation.recognizedPoints(forGroupKey: VNHumanHandPoseObservation.JointsGroupName.littleFinger.rawValue)

I tried many different things but just could not get the retrieval of individual points to work. Can anyone help fix this?
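As a reference point, a sketch of the same lookups using the typed accessors that replaced the deprecated group keys, assuming a VNHumanHandPoseObservation from the current Vision SDK. Only the thumb and wrist are shown; the other fingers follow the same pattern with .indexFinger, .middleFinger, .ringFinger, and .littleFinger and joint names such as .indexTip, .indexDIP, .indexPIP, .indexMCP.

import Vision

func extractThumbAndWrist(from observation: VNHumanHandPoseObservation) throws {
    // recognizedPoints(_:) takes a JointsGroupName and returns
    // [VNHumanHandPoseObservation.JointName: VNRecognizedPoint].
    let thumbPoints = try observation.recognizedPoints(.thumb)
    let allPoints = try observation.recognizedPoints(.all)

    guard let thumbTip = thumbPoints[.thumbTip],
          let thumbIP = thumbPoints[.thumbIP],
          let thumbMP = thumbPoints[.thumbMP],
          let thumbCMC = thumbPoints[.thumbCMC],
          let wrist = allPoints[.wrist] else { return }

    // Each VNRecognizedPoint exposes a normalized location and a confidence.
    print(thumbTip.location, thumbIP.location, thumbMP.location,
          thumbCMC.location, wrist.confidence)
}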
Posted Last updated
.
Post not yet marked as solved
1 Reply
809 Views
Is there a framework that allows classic image processing operations in real time on incoming imagery from the front-facing cameras, before it is displayed on the OLED screens? Things like spatial filtering, histogram equalization, and image warping. I saw the documentation for the Vision framework, but it seems to address high-level tasks, like object detection and recognition. Thank you!
Posted Last updated
.
Post marked as solved
5 Replies
2.7k Views
Did something change in face detection / the Vision framework on iOS 15? Using VNDetectFaceLandmarksRequest and reading VNFaceLandmarkRegion2D to detect eyes does not work on iOS 15 the way it did before. I am running the exact same code on an iOS 14 and an iOS 15 device, and the coordinates are different, as seen in the screenshot. Any ideas?
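One thing worth checking is the request revision, since a newer OS can default a request to a newer revision with a different landmark constellation. A sketch that pins the revision explicitly is below; the choice of VNDetectFaceLandmarksRequestRevision3 here is illustrative, not a confirmed explanation for the iOS 14/15 difference.

import Vision

func detectLandmarksPinned(in cgImage: CGImage) throws -> [VNFaceObservation] {
    let request = VNDetectFaceLandmarksRequest()
    // Pin an explicit revision so both OS versions run the same model.
    request.revision = VNDetectFaceLandmarksRequestRevision3
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    return request.results ?? []
}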
Posted
by Ships66.
Last updated
.
Post not yet marked as solved
1 Reply
624 Views
How can I achieve full control over Vision Pro's display and effectively render a 2D graph plot on it? I would appreciate guidance on the necessary steps or code snippets. P.S. As per the Apple documentation, "For a more immersive experience, an app can open a dedicated Full Space where only that app's content will appear." This still does not fulfill the 'flat bounded 2D' requirement, as Spaces provide an unbounded 3D immersive view.
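For the flat, bounded 2D case specifically, a sketch of a plain visionOS window drawing a graph with SwiftUI's Canvas; a regular WindowGroup gives a bounded 2D plane in the Shared Space without needing an immersive Full Space. The sample data and window size are placeholders.

import SwiftUI
import Foundation

@main
struct GraphApp: App {
    var body: some Scene {
        // A regular window is a flat, bounded 2D surface on visionOS.
        WindowGroup {
            GraphView()
        }
        .defaultSize(width: 800, height: 500)
    }
}

struct GraphView: View {
    let samples: [Double] = (0..<60).map { sin(Double($0) / 6) }

    var body: some View {
        Canvas { context, size in
            var path = Path()
            for (i, y) in samples.enumerated() {
                let point = CGPoint(
                    x: size.width * CGFloat(i) / CGFloat(samples.count - 1),
                    y: size.height * CGFloat(0.5 - 0.4 * y))
                if i == 0 { path.move(to: point) } else { path.addLine(to: point) }
            }
            context.stroke(path, with: .color(.blue), lineWidth: 2)
        }
        .padding()
    }
}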
Posted Last updated
.
Post not yet marked as solved
0 Replies
480 Views
First of all, this Vision API is amazing; the OCR is very accurate. I've been looking into multiprocessing with the Vision API. I have about 2 million PDFs I want to OCR, and I want to run multiple threads / parallel processes to OCR each one. I tried PyObjC, but it does not work very well. Any suggestions for tackling this problem?
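A sketch of one way to parallelize recognition natively in Swift instead of going through PyObjC, assuming the PDF pages have already been rendered to image files on disk; DispatchQueue.concurrentPerform fans the work out across cores, and the batching and file handling are illustrative.

import Foundation
import Vision

// OCR a batch of page images concurrently and return the text keyed by URL.
func recognizeText(in imageURLs: [URL]) -> [URL: String] {
    var results = [URL: String]()
    let lock = NSLock()

    DispatchQueue.concurrentPerform(iterations: imageURLs.count) { index in
        let url = imageURLs[index]
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        let handler = VNImageRequestHandler(url: url, options: [:])
        do {
            try handler.perform([request])
            let text = (request.results ?? [])
                .compactMap { $0.topCandidates(1).first?.string }
                .joined(separator: "\n")
            lock.lock(); results[url] = text; lock.unlock()
        } catch {
            print("OCR failed for \(url.lastPathComponent): \(error)")
        }
    }
    return results
}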
Posted
by jsunghop.
Last updated
.
Post marked as solved
2 Replies
766 Views
Trying to use VNGeneratePersonSegmentationRequest. It seems to work, but the output mask isn't at the same resolution as the source image, so compositing the result with the source produces a bad result. This is not the full code, but hopefully enough to see what I'm doing:

var imageRect = CGRect(x: 0, y: 0, width: image.size.width, height: image.size.height)
let imageRef = image.cgImage(forProposedRect: &imageRect, context: nil, hints: nil)!
let request = VNGeneratePersonSegmentationRequest()
let handler = VNImageRequestHandler(cgImage: imageRef)
do {
    try handler.perform([request])
    guard let result = request.results?.first else { return }
    // Is this the right way to do this?
    let output = result.pixelBuffer
    // This CIImage alpha mask is a different resolution than the source image,
    // so I don't know how to combine it with the source to cut out the foreground;
    // they don't line up, and it's not even the right aspect ratio.
    let ciImage = CIImage(cvPixelBuffer: output)
    ...
}
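A sketch of one common way to handle the size mismatch: raise the request's qualityLevel (the lower levels return smaller masks), scale the mask up to the source extent with Core Image, and then blend. This is one pattern under those assumptions, not the only approach.

import CoreImage
import CoreImage.CIFilterBuiltins
import Vision

func maskedPerson(source: CGImage) throws -> CIImage? {
    let request = VNGeneratePersonSegmentationRequest()
    request.qualityLevel = .accurate           // larger, more detailed mask
    request.outputPixelFormat = kCVPixelFormatType_OneComponent8

    let handler = VNImageRequestHandler(cgImage: source, options: [:])
    try handler.perform([request])
    guard let maskBuffer = request.results?.first?.pixelBuffer else { return nil }

    let sourceImage = CIImage(cgImage: source)
    var mask = CIImage(cvPixelBuffer: maskBuffer)

    // Scale the mask so it covers the same extent as the source image.
    let scaleX = sourceImage.extent.width / mask.extent.width
    let scaleY = sourceImage.extent.height / mask.extent.height
    mask = mask.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    // Keep the person, make the background transparent.
    let blend = CIFilter.blendWithMask()
    blend.inputImage = sourceImage
    blend.backgroundImage = CIImage(color: .clear).cropped(to: sourceImage.extent)
    blend.maskImage = mask
    return blend.outputImage
}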
Posted
by dank.
Last updated
.
Post not yet marked as solved
1 Reply
679 Views
Why does iOS 17's subject lifting return a jagged-edged, low-resolution result image? The output quality is totally different from iOS 16. Does this only occur in the beta, or will it be the same in the final iOS 17 release?
Posted
by hmpark.
Last updated
.
Post not yet marked as solved
2 Replies
782 Views
My app uses the Vision framework to find images that are visually similar. Before WWDC23, this code worked predictably to request image observations and compare images:

private func observe(cgImage: CGImage?) -> VNFeaturePrintObservation? {
    var returnVal: VNFeaturePrintObservation? = nil
    guard let cgImage = cgImage else { return returnVal }
    let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    let request = VNGenerateImageFeaturePrintRequest()
    request.usesCPUOnly = true
    do {
        try imageRequestHandler.perform([request])
        returnVal = request.results?.first as? VNFeaturePrintObservation
    } catch {
    }
    return returnVal
}

func similarity(to compareAsset: Asset) -> Float {
    var dist = Float.infinity
    if let obs = self.observation, let compareObs = compareAsset.observation {
        try? obs.computeDistance(&dist, to: compareObs)
    }
    return dist
}

In the new frameworks there is a new VNGenerateImageFeaturePrintRequestRevision value, and observations made with different request revisions can't be compared. If you try, you get an error: Error Domain=com.apple.vis Code=12 "The revision of the observations do not match" UserInfo={NSLocalizedDescription=The revision of the observations do not match}. The docs state that by explicitly setting the new revision property on my requests, I can force the request to use a particular version. I've updated the code to do this, but explicitly setting the revision of my request doesn't work, and my app still gets tons of errors about mismatched request revisions. Here's the updated code:

private func observe(cgImage: CGImage?) -> VNFeaturePrintObservation? {
    var returnVal: VNFeaturePrintObservation? = nil
    guard let cgImage = cgImage else { return returnVal }
    let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    let request = VNGenerateImageFeaturePrintRequest()
    if #available(iOS 17.0, *) {
        request.revision = VNGenerateImageFeaturePrintRequestRevision2
    } else {
        request.revision = VNGenerateImageFeaturePrintRequestRevision1
    }
    request.usesCPUOnly = true
    do {
        try imageRequestHandler.perform([request])
        returnVal = request.results?.first as? VNFeaturePrintObservation
    } catch {
        print("\(type(of: self)) :: \(#function) :: error in observation request \(error)")
    }
    return returnVal
}

func similarity(to compareAsset: Asset) -> Float {
    var dist = Float.infinity
    if let obs = self.observation, let compareObs = compareAsset.observation {
        do {
            try obs.computeDistance(&dist, to: compareObs)
        } catch {
            print("\(type(of: self)) :: \(#function) :: error in simvalue \(error)")
            if (error as NSError).code == 12 {
                let revision = obs.requestRevision
                let compareRevision = compareObs.requestRevision
                print("\(type(of: self)) :: \(#function) :: simValue req mismatch \(revision) \(compareRevision)")
            }
        }
    }
    return dist
}

This breaks my app, and I can't figure out how to reliably force requests to the revision number I need. What am I doing wrong here? Will this behavior sort itself out as the SDK evolves? Thanks, y'all.
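A small defensive sketch along the same lines, assuming observations may be cached from different OS releases and using the requestRevision property already referenced in the error path above: compare revisions up front and bail out (or regenerate) instead of relying on the thrown error. The regenerate step is only indicated as a comment because its shape depends on how observations are stored.

import Vision

func distanceIfComparable(_ a: VNFeaturePrintObservation,
                          _ b: VNFeaturePrintObservation) -> Float? {
    // Feature prints can only be compared when both were produced by the
    // same request revision.
    guard a.requestRevision == b.requestRevision else {
        // e.g. regenerate the stale observation here with the matching
        // revision before retrying (hypothetical app-specific step).
        return nil
    }
    var distance: Float = .infinity
    do {
        try a.computeDistance(&distance, to: b)
        return distance
    } catch {
        return nil
    }
}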
Posted
by hova414.
Last updated
.
Post not yet marked as solved
1 Reply
802 Views
I got a crash when right-clicking on an image in a web page loaded by WKWebView. The log and stack trace show that it might be related to VisionKit; I think it may be related to the copy-text-on-image feature. I don't know how to fix it, so I just want to disable VisionKit in the WKWebView to check whether that fixes the issue. Do you know how to do that? Stack trace and the log before the crash:

[com.apple.VisionKit.processing] Error processing request from MAD on result: Error Domain=NSOSStatusErrorDomain Code=-50 "paramErr: error in user parameter list" UserInfo={NSLocalizedDescription=Error allocating CVPixelBuffer} request: <VKCImageAnalyzerRequest: 0x6000025562b0> requestID: 22 madRequestID: (Not Set) cancelled: NO
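One setting that may be worth trying as a diagnostic is WKPreferences.isTextInteractionEnabled; whether turning it off also suppresses VisionKit's image analysis in web content is an assumption on my part, not documented behavior, but the sketch below shows where the flag lives.

import WebKit

// Diagnostic only: disable text interaction in the web view.
// Whether this also disables VisionKit's Live Text analysis is an assumption.
func makeWebView() -> WKWebView {
    let configuration = WKWebViewConfiguration()
    configuration.preferences.isTextInteractionEnabled = false
    return WKWebView(frame: .zero, configuration: configuration)
}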
Posted
by huync.
Last updated
.
Post not yet marked as solved
1 Reply
413 Views
I'm using the Vision framework for text recognition and for detecting rectangles in an image, via the VNRecognizeText and VNDetectRectangles requests. Between macOS and iOS, I found a slight difference in the boundingBox coordinates of the text and the rectangles detected for the same image. Is this expected? Can we do anything to make the results identical? Also, on macOS, when I use the same Vision features from Python (using the pyobjc-framework-Vision package), I also get slightly different results.
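Two things that may help narrow the gap, sketched below under the assumption that the input images and orientations really are identical: pin the request revision explicitly so both platforms run the same model, and compare results in pixel space via VNImageRectForNormalizedRect so rounding of normalized coordinates isn't misread as a platform difference. Only the text request is shown; the rectangle request can be pinned the same way.

import Vision

func detectText(in cgImage: CGImage) throws -> [CGRect] {
    let request = VNRecognizeTextRequest()
    // Pin the revision so macOS and iOS use the same recognition model.
    request.revision = VNRecognizeTextRequestRevision3
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    let width = cgImage.width, height = cgImage.height
    return (request.results ?? []).map { observation in
        // Convert the normalized boundingBox into image (pixel) coordinates.
        VNImageRectForNormalizedRect(observation.boundingBox, width, height)
    }
}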
Posted Last updated
.
Post not yet marked as solved
0 Replies
892 Views
When trying the code from this session: https://developer.apple.com/videos/play/wwdc2023/10176/ Specifically:

result.generateScaledMaskForImage(forInstances: result.allInstances, from: handler)

It throws the error:

-[VNInstanceMaskObservation generateScaledMaskForImageForInstances:fromRequestHandler:error:]: unrecognized selector sent to instance

Any ideas?
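An unrecognized selector here usually means the method is missing from the OS that is actually running the code (for example, a device or simulator on a release earlier than the one that introduced the API). A small sketch that keeps the call behind an availability annotation, assuming the iOS 17 / macOS 14 API from the session:

import Vision

@available(iOS 17.0, macOS 14.0, *)
func scaledMask(for observation: VNInstanceMaskObservation,
                handler: VNImageRequestHandler) throws -> CVPixelBuffer {
    // The scaled mask covers the same extent as the source image; on earlier
    // systems this selector does not exist at runtime, hence the annotation.
    try observation.generateScaledMaskForImage(
        forInstances: observation.allInstances,
        from: handler)
}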
Posted
by kcusson.
Last updated
.
Post not yet marked as solved
0 Replies
736 Views
Hi! Wondering if there's a way to add subject lifting to images in my SwiftUI app without relying on UIViewControllerRepresentable and Coordinators to adopt the ImageAnalysisInteraction protocol. Thank you!
Posted
by SAIK1065.
Last updated
.