VisionKit

Scan documents with the camera on iPhone and iPad devices using VisionKit.

Posts under VisionKit tag

169 Posts

ImageAnalysisInteraction doesn't call contentsRect delegate's method
Hello, I am struggling with an issue: the contentsRect(for:) method of ImageAnalysisInteractionDelegate is never called. I've set up a demo project where the interaction is added to the root view of the view controller, while I'm analyzing the image of a UIImageView that is added to this view. I want to be able to define the contents rect for the highlights of text found in that image.

P.S. I know that I could simply add the interaction to the image view, but that's not the point: what I want to achieve in the real project is to display Live Text on a paused video player, so the image view here is for simplicity only.

    import UIKit
    import VisionKit

    class ViewController: UIViewController {

        private let imageView = UIImageView()
        private let imageAnalyzer = ImageAnalyzer()
        private let interaction = ImageAnalysisInteraction()

        override func viewDidLoad() {
            super.viewDidLoad()

            view.addSubview(imageView)
            imageView.translatesAutoresizingMaskIntoConstraints = false

            NSLayoutConstraint.activate([
                imageView.leadingAnchor.constraint(equalTo: view.leadingAnchor),
                imageView.trailingAnchor.constraint(equalTo: view.trailingAnchor),
                imageView.centerYAnchor.constraint(equalTo: view.centerYAnchor),
                imageView.heightAnchor.constraint(equalTo: view.heightAnchor, multiplier: 0.7)
            ])

            interaction.delegate = self

            // Some image with text that I have in assets
            imageView.image = UIImage(named: "IMG_5564")
            imageView.contentMode = .scaleAspectFit

            view.addInteraction(interaction)

            interaction.setContentsRectNeedsUpdate()

            DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) {
                self.analyze()
            }
        }

        private func analyze() {
            Task {
                let imageAnalysis = try? await imageAnalyzer.analyze(
                    self.imageView.image!,
                    configuration: .init([.machineReadableCode, .text])
                )

                self.interaction.analysis = imageAnalysis
                self.interaction.preferredInteractionTypes = .automatic
                self.interaction.setContentsRectNeedsUpdate()
            }
        }
    }

    extension ViewController: ImageAnalysisInteractionDelegate {
        func presentingViewController(for interaction: ImageAnalysisInteraction) -> UIViewController? {
            return nil
        }

        func contentsRect(for interaction: ImageAnalysisInteraction) -> CGRect {
            // >>> This method is never being called <<<
            return CGRect(x: 0, y: 0, width: 1.0, height: 0.7)
        }

        func contentView(for interaction: ImageAnalysisInteraction) -> UIView? {
            return nil
        }

        func interaction(_ interaction: ImageAnalysisInteraction, highlightSelectedItemsDidChange highlightSelectedItems: Bool) {
            debugPrint("highlight: \(highlightSelectedItems)")
        }
    }
Replies
1
Boosts
0
Views
1.2k
Activity
Aug ’22
How to get the filters in VNDocumentCameraViewController without opening the camera?
There are four filters in VNDocumentCameraViewController: "Color", "Grayscale", "Black & White", and "Photo". Is there a way to apply these filters to UIImages directly, without opening the camera?
Replies
1
Boosts
1
Views
1.2k
Activity
Aug ’22
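For the filter question above, one possible workaround is to approximate the look with Core Image. This is only a sketch: it does not use the scanner's own filters, and the function name `approximateGrayscale` is hypothetical. It applies the built-in `CIPhotoEffectMono` filter as a stand-in for the "Grayscale" option.

```swift
import UIKit
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: approximates a grayscale "scanner" look with Core Image.
/// This is NOT the filter used by VNDocumentCameraViewController.
func approximateGrayscale(_ image: UIImage, context: CIContext = CIContext()) -> UIImage? {
    guard let input = CIImage(image: image) else { return nil }

    // Built-in monochrome photo effect.
    let filter = CIFilter.photoEffectMono()
    filter.inputImage = input

    // Render the filtered image back into a CGImage.
    guard let output = filter.outputImage,
          let cgImage = context.createCGImage(output, from: output.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage, scale: image.scale, orientation: image.imageOrientation)
}
```

Creating a `CIContext` is relatively expensive, so reusing one context across calls is usually preferable to relying on the default argument.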
DataScannerViewController Scan Text is not working
Hi, I'm using Xcode 14.0 beta 4 and iOS 16.0 beta 2. I followed the tutorial, but when I try to scan text, it does not show up in the CameraScanner. I think the dataScanner delegate function is never called; instead I keep getting this warning message: Custom words array can only contain strings. Ignoring custome words array. So I can see the camera and the highlight anchor, but I cannot extract the text that VisionKit detected. Is there any workaround? Thanks in advance.
Replies
1
Boosts
0
Views
1.1k
Activity
Aug ’22
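For reference alongside the question above, here is a minimal sketch of a text-only DataScannerViewController with a delegate that receives recognized items. The class name `TextScannerCoordinator` and its `makeScanner()` method are hypothetical, and the sketch does not claim to explain the warning quoted in the post.

```swift
import UIKit
import VisionKit

@MainActor
final class TextScannerCoordinator: NSObject, DataScannerViewControllerDelegate {

    /// Builds a scanner that looks for text only; other options keep their defaults.
    /// Returns nil when the device does not support scanning or it is unavailable.
    func makeScanner() -> DataScannerViewController? {
        guard DataScannerViewController.isSupported,
              DataScannerViewController.isAvailable else { return nil }

        let scanner = DataScannerViewController(recognizedDataTypes: [.text()])
        scanner.delegate = self
        return scanner
    }

    /// Called when the scanner starts tracking new items.
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didAdd addedItems: [RecognizedItem],
                     allItems: [RecognizedItem]) {
        for case .text(let text) in addedItems {
            print("Recognized text: \(text.transcript)")
        }
    }
}
```

The caller still has to present the returned controller and call `try scanner.startScanning()`; the coordinator must also be kept alive, since delegates are typically held weakly.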
DataScannerViewController in Objective-C
Hi, is DataScannerViewController available to be called directly from Objective-C? I see the header file has an "objc" attribute on it, but trying to initialize it from an Objective-C file doesn't seem to be working for me. Maybe it's something I'm doing wrong, but I wanted to first confirm whether it is indeed possible to use it directly in Objective-C or not.
Replies
1
Boosts
0
Views
1.1k
Activity
Aug ’22
Getting title of document via VNRecognizedTextObservation
I am reading the text of an image using the Vision OCR capabilities and trying to find the title of the document. This is fairly obvious when the title is at the top of the document. But in some cases, for example when I am reading a business card, the title appears somewhere in the middle of the card. While debugging, I found that there is an isTitle field (screenshot attached) on VNRecognizedTextObservation, but I am not able to access it. Is it private? I don't see a clear reason for this property to be private.
Replies
1
Boosts
1
Views
937
Activity
Jul ’22
Using DataScannerViewController with async stream
Hi, the presentation "Capture Machine Readable Codes and Text with VisionKit" mentions at the end that the DataScannerViewController can be used with an async stream. In the presentation, there is a code snippet for the updateViewAsyncStream method, but it's not really used anywhere. How do I utilize this when the DataScannerViewController is active, and how do I capture the recognized items? Also, there is a sendDidChangeNotification() function at the end, but the compiler complains that it's not in scope. Thanks.
Replies
2
Boosts
0
Views
2.0k
Activity
Jul ’22
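As a rough illustration for the async stream question above, the sketch below iterates over the scanner's `recognizedItems` stream and prints the transcripts of text items. The function name `logRecognizedItems(from:)` is hypothetical, and this is not the code from the presentation.

```swift
import VisionKit

/// Hypothetical helper: consumes the scanner's async stream of recognized items.
/// Each element of the stream is the full array of currently recognized items.
@MainActor
func logRecognizedItems(from scanner: DataScannerViewController) async {
    for await items in scanner.recognizedItems {
        // Keep only text items and extract their transcripts.
        let transcripts = items.compactMap { item -> String? in
            if case .text(let text) = item { return text.transcript }
            return nil
        }
        print(transcripts)
    }
}
```

A possible call site, assuming `scanner` has already been presented and started, is `Task { await logRecognizedItems(from: scanner) }`; cancelling that task stops the iteration.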
Use the camera for keyboard input in your app
Please give some example code for scanning text using a button.
Replies
1
Boosts
0
Views
1.1k
Activity
Jun ’22
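A minimal sketch for the request above, assuming iOS 15 or later: `UIAction.captureTextFromCamera(responder:identifier:)` creates an action that opens the camera and inserts the scanned text into the given responder. The view controller name and layout below are hypothetical.

```swift
import UIKit

final class NoteViewController: UIViewController {

    private let textField = UITextField()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = .systemBackground
        textField.borderStyle = .roundedRect

        // System-provided action that scans text with the camera
        // and inserts it into the text field.
        let scanAction = UIAction.captureTextFromCamera(responder: textField, identifier: nil)
        let scanButton = UIButton(primaryAction: scanAction)

        // Simple vertical layout: text field above the scan button.
        let stack = UIStackView(arrangedSubviews: [textField, scanButton])
        stack.axis = .vertical
        stack.spacing = 12
        stack.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(stack)

        NSLayoutConstraint.activate([
            stack.leadingAnchor.constraint(equalTo: view.layoutMarginsGuide.leadingAnchor),
            stack.trailingAnchor.constraint(equalTo: view.layoutMarginsGuide.trailingAnchor),
            stack.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor, constant: 20)
        ])
    }
}
```

The button takes its title and image from the system action, so no extra configuration is needed for a basic setup.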
[VisionKit Text Recognition] boundingBox(for:) returns wrong results when used with .accurate recognition level
I'm using the Vision OCR (with VNRecognizeTextRequest) to detect text in images. For our specific use case, we need to know the position of each letter, and we can get this with the function recognizedText.boundingBox(for: (idx1..<idx2)) (where idx2 = idx1 + 1). However, this result is only valid when the recognition level of the request is set to .fast. When it is set to .accurate, the bounding box for any letter is not the bounding box of the letter itself, but the bounding box of the whole word containing the letter. Basically, this is the same problem as the one described here: https://developer.apple.com/forums/thread/131510

The issue is that we cannot use the .fast recognition level: the text might be tilted, and the letters are often hard to read with pretty bad contrast, which produces unusable results with the .fast setting. Does anyone know:
- if there is a way to directly extract the bounding box of the letters from the VNRecognizedTextObservation with the .accurate setting?
- if there is an update or feature adjustment planned for this issue, or if the Vision dev team doesn't care about it?
- if there is even a way to ask the dev team for a bug fix on this issue?

We really do need this feature, so any info is good info. Thanks in advance for your answers.
Replies
1
Boosts
0
Views
1.2k
Activity
May ’22
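To make the per-letter call described above concrete, here is a small sketch that requests a bounding box for every character of a `VNRecognizedText`. The function name `characterBoxes(in:)` is hypothetical. As the post reports, with the .accurate level the returned boxes may cover whole words, so this shows the call pattern and is not a fix.

```swift
import Vision

/// Hypothetical helper: asks Vision for one bounding box per character.
/// Boxes are in normalized image coordinates (origin at the lower left).
func characterBoxes(in text: VNRecognizedText) -> [CGRect] {
    let string = text.string
    return string.indices.compactMap { index -> CGRect? in
        // One-character range: index ..< index + 1.
        let next = string.index(after: index)
        let observation = try? text.boundingBox(for: index..<next)
        return observation?.boundingBox
    }
}
```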
Dynamic Text overlay on live camera feed
Has anyone attempted to keep the camera on and overlay dynamic text depending on what is identified in the view? For example, if I point the camera at a fruit, the app should identify the fruit and display text over the camera feed. The moment I move to the next object, it should ask me whether I want to save this result or discard it and move on to the new object.
Replies
0
Boosts
0
Views
720
Activity
Apr ’22
No option to scan in notes
Hi, I have followed the instructions, but I do not have the option to scan a document in Notes. Any suggestions for what I could do? This was a feature I was looking forward to :-(
Replies
0
Boosts
0
Views
500
Activity
Apr ’22
Finding API on "scanning the document"
I'm looking for the "scanning the document" API that is used in the Notes app. I want to build a feature that scans a document automatically, so I need it. Please tell me where to find it.
Replies
0
Boosts
0
Views
746
Activity
Mar ’22
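For the question above, VisionKit's VNDocumentCameraViewController presents a document-scanning camera with automatic capture. The sketch below shows one way to present it and collect the scanned pages; the class name `ScanViewController` and the `scannedPages` property are hypothetical.

```swift
import UIKit
import VisionKit

final class ScanViewController: UIViewController, VNDocumentCameraViewControllerDelegate {

    /// Scanned page images, filled in when the user saves a scan.
    private(set) var scannedPages: [UIImage] = []

    func startScan() {
        // Document scanning is not supported on every device.
        guard VNDocumentCameraViewController.isSupported else { return }

        let camera = VNDocumentCameraViewController()
        camera.delegate = self
        present(camera, animated: true)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        // Copy every page out of the scan before dismissing.
        scannedPages = (0..<scan.pageCount).map { scan.imageOfPage(at: $0) }
        controller.dismiss(animated: true)
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error) {
        print("Scanning failed: \(error.localizedDescription)")
        controller.dismiss(animated: true)
    }
}
```

The app also needs a camera usage description (NSCameraUsageDescription) in its Info.plist before the camera can be shown.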
Order of points in VNFaceLandmarkRegion2D
In my app, I am performing a VNDetectFaceLandmarksRequest with a VNSequenceRequestHandler. The video that serves as my input comes from my iPhone's selfie camera. The request returns a VNFaceLandmarkRegion2D, from which I get all the landmarks as an array of CGPoints via VNFaceLandmarkRegion2D.normalizedPoints. I want to compare all the CGPoint arrays over time, but I am not sure whether a point at a certain index always represents the same landmark. Can I assume that a specific landmark, e.g. the left-most landmark of the right eye, always has the same index in the CGPoint array?
Replies
1
Boosts
0
Views
910
Activity
Mar ’22
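The comparison described above could be sketched as follows, under the assumption the post is asking about: that a given index refers to the same landmark in every frame. The function name `landmarkDisplacements(from:to:)` is hypothetical, and the sketch does not confirm that Vision guarantees this ordering.

```swift
import Foundation

/// Hypothetical helper: per-index distance between two frames of landmark points.
/// ASSUMPTION: index i refers to the same landmark in both arrays.
/// Returns nil when the arrays differ in length and cannot be compared by index.
func landmarkDisplacements(from previous: [CGPoint], to current: [CGPoint]) -> [CGFloat]? {
    guard previous.count == current.count else { return nil }
    return zip(previous, current).map { old, new -> CGFloat in
        let dx = new.x - old.x
        let dy = new.y - old.y
        return (dx * dx + dy * dy).squareRoot()
    }
}
```

A differing point count between frames is at least a cheap signal that index-based comparison is not valid for that pair of frames.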
dynamic reading of text in live camera feed
I want to read text and dynamically display what the word means on the live camera feed. Any ideas? It should be able to read a maximum of two words on screen, but the user should also be able to tap to select a word or image and save the screen.
Replies
0
Boosts
0
Views
849
Activity
Mar ’22
Cannot find type "..." in scope
I am trying to make a QR code scanner. However, it keeps saying that the scanner is not in scope. How can I fix this?
Replies
3
Boosts
0
Views
6.7k
Activity
Feb ’22
Bad quality of scanned documents
Hey guys, I'm facing the issue that documents scanned on my iPhone 12 Pro Max with the Files app are of pretty bad quality. I guess it started with iOS 15 beta 3. Unfortunately, the issue still persists with the current non-beta iOS 15 release. It's the same on iPadOS 15. When I launch "scan with iPhone" using the Preview app on macOS, the quality is as good as always. Hence it looks like the issue is related to the Files app or to PDF processing on the iPhone. Has anybody else seen the same? Thanx and cheers, Flory
Replies
33
Boosts
1
Views
15k
Activity
Jan ’22
I am a new developer. How can i make my output text editable?
I have made a Scan to Text app with the help of sources from the internet, but I can’t figure out a way to get my output text to be editable. Here’s my code:

    private func makeScannerView() -> ScannerView {
        ScannerView(completion: { textPerPage in
            if let outputText = textPerPage?.joined(separator: "\n").trimmingCharacters(in: .whitespacesAndNewlines) {
                let newScanData = ScanData(content: outputText)
                self.texts.append(newScanData)
            }
            self.showScannerSheet = false
        })
    }
Replies
0
Boosts
0
Views
752
Activity
Jan ’22
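One way to make the scanned text editable, sketched here under assumptions: the post does not show how `ScanData` is defined, so the struct below is a hypothetical version with a mutable `content` property, and `EditableScanList` is a made-up view name. The idea is to bind each item's text to a SwiftUI `TextEditor`.

```swift
import SwiftUI

// Hypothetical model: the post only shows ScanData(content:), not its definition.
struct ScanData: Identifiable {
    let id = UUID()
    var content: String   // must be `var` so the editor can write to it
}

struct EditableScanList: View {
    @State private var texts: [ScanData] = [ScanData(content: "Scanned text")]

    var body: some View {
        // Iterating over a binding gives each row a writable `$scan`.
        List($texts) { $scan in
            TextEditor(text: $scan.content)
                .frame(minHeight: 80)
        }
    }
}
```

Iterating over `$texts` requires iOS 15 or later; the key design point is that the row edits the array element in place instead of a copy.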
Vision: VNDetectRectanglesRequest error in iOS 15
I am facing this error only on iOS 15 while finding observations:

    The VNDetectorOption_OriginatingRequestSpecifier required option was not found" UserInfo={NSLocalizedDescription=The VNDetectorOption_OriginatingRequestSpecifier required option was not found
Replies
4
Boosts
0
Views
1.4k
Activity
Jan ’22
How to get the saliency mask of the VNDetectDocumentSegmentationRequest
In one of the WWDC videos, the VNDetectDocumentSegmentationRequest result is described in the following way: The result of the request is a low resolution segmentation mask, where each pixel represents a confidence if that pixel is part of the detected document or not. In addition it provides the four corner points of the quadrilateral. Similarly, in the VNDetectDocumentSegmentationRequest docs there's the following statement: The result that the request generates contains the four corner points of a document’s quadrilateral and saliency mask.

So the first part ("four corner points of a document’s quadrilateral") is easy: it's in the results of the request, which are in VNRectangleObservation format:

    let request = VNDetectDocumentSegmentationRequest { (request, error) in
        guard let results = request.results as? [VNRectangleObservation] else {
            // Failed
            return
        }
        // Process VNRectangleObservations
    }

But how do I obtain the "low resolution segmentation mask" / "saliency mask" for VNDetectDocumentSegmentationRequest?
Replies
1
Boosts
0
Views
1.1k
Activity
Dec ’21
VNRecognizedText confidence values are only 0.5 and 1.0
I'm using Vision to conduct some OCR from a live camera feed. I've set up my VNRecognizeTextRequests as follows:

    let request = VNRecognizeTextRequest(completionHandler: recognizeTextCompletionHandler)
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = false

And I handle the results as follows:

    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        if let recognizedText = observation.topCandidates(1).first {
            guard recognizedText.confidence >= self.confidenceLimit, // set to 0.5
                  let foundText = validateRegexPattern(text: recognizedText.string, regexPattern: self.regexPattern),
                  let foundDecimal = Double(foundText) else { continue }
        }
    }

This is actually working great and yielding very accurate results, but the confidence values I'm receiving from the results are generally either 0.5 or 1.0, and rarely 0.3. I find these to be pretty nonsensical confidence values, and I'm wondering if this is the intended result or some sort of bug. Conversely, using recognitionLevel = .fast yields more realistic and varied confidence values, but much less accurate results overall. Even though .fast is recommended for OCR from a live camera feed, I've had significantly better results with the .accurate recognition level, which is why I've been using it.
Replies
0
Boosts
0
Views
1.2k
Activity
Nov ’21
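The filtering logic in the post above can be exercised without a camera. The sketch below is an assumption-based stand-in: `Candidate` is a hypothetical substitute for VNRecognizedText, and `validateRegexPattern` is a guess at what the poster's helper does (the post does not show its body). It keeps candidates at or above the confidence limit whose text contains a match that parses as a number.

```swift
import Foundation

// Hypothetical stand-in for VNRecognizedText (string plus confidence).
struct Candidate {
    let string: String
    let confidence: Float
}

/// Guess at the poster's helper: returns the first substring matching the pattern.
func validateRegexPattern(text: String, regexPattern: String) -> String? {
    guard let range = text.range(of: regexPattern, options: .regularExpression) else {
        return nil
    }
    return String(text[range])
}

/// Keeps candidates that pass the confidence limit and contain a parsable number.
func parseReadings(_ candidates: [Candidate],
                   confidenceLimit: Float,
                   regexPattern: String) -> [Double] {
    candidates.compactMap { candidate -> Double? in
        guard candidate.confidence >= confidenceLimit,
              let found = validateRegexPattern(text: candidate.string,
                                               regexPattern: regexPattern) else {
            return nil
        }
        return Double(found)
    }
}
```

With confidence values clustered at 0.3, 0.5 and 1.0 as the post describes, a limit of 0.5 effectively acts as a two-level switch, so the regex check carries most of the filtering.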
Recognizing text
I'm using VNImageRequestHandler to recognize text using the camera. In my handler I'm using the topLeft, topRight, bottomLeft, bottomRight properties, which I'm scaling to the size of the canvas, to draw an outline around each text object. When I do this the Y position and Height are correct, but the Width is slightly smaller, and the X position centers the outline around the text. Any idea why this would be a different size?
Replies
2
Boosts
0
Views
1.2k
Activity
Nov ’21
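For the scaling step described above, here is a minimal sketch of converting one of Vision's normalized corner points into canvas coordinates. Vision's normalized coordinates have their origin at the lower left, while UIKit's origin is at the upper left, so the y value is flipped. The function name `viewPoint(fromNormalized:in:)` is hypothetical, and the sketch assumes the image fills the canvas exactly.

```swift
import Foundation

/// Hypothetical helper: maps a normalized Vision point (origin lower left)
/// to a point in a canvas of the given size (origin upper left).
/// ASSUMPTION: the image is stretched to fill the canvas exactly.
func viewPoint(fromNormalized point: CGPoint, in canvasSize: CGSize) -> CGPoint {
    CGPoint(x: point.x * canvasSize.width,
            y: (1 - point.y) * canvasSize.height)
}
```

If the camera preview is shown with an aspect-fill or aspect-fit mode, the visible image is scaled and offset relative to the canvas, and this simple mapping no longer holds; that kind of mismatch is one possible cause of a width and x offset like the one described, though the post does not give enough detail to confirm it.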