VisionKit

Scan documents with the camera on iPhone and iPad devices using VisionKit.

Vision - Identifying Trajectories in Video and use them for angle measurement

Hi, I would like to detect a trajectory in a video, like this example. As I understand it, Vision will give me back the trajectory of the moving part in the video. My question is: once I have the measured trajectory (the iPhone is mounted on a tripod), can I send a photo (taken with the iPhone on the tripod in the same spot) to Vision together with the trajectory sequence and get back the position index along the trajectory? What I want to know is the angle/position of a rotating object. For example, the iPhone is always mounted in the same spot, and so is the object. The object follows an arc trajectory, rotating from -180° (left of the iPhone camera) through 0° (in front) to 180° (right). Once the trajectory has been recorded and we know the trajectory formula, is it possible to determine the current angular position of the object by taking a picture and sending it to Vision with the recorded trajectory? My goal is to query the angular position of the object every second using Vision. Best regards, Michael

Machine Learning & AI General Vision VisionKit

825

Jan ’23

Thread 1: EXC_BAD_ACCESS

Hi! I am making a phone number recognition app based on Apple's example code. I am new to Swift and to coding in general. When I run the project I get the error "Thread 1: EXC_BAD_ACCESS (code=257, address=0x7e700019ec0ad79)" on line 68, "previewView.session = captureSession". I think it has something to do with line 26, "@IBOutlet weak var previewView: PreviewView!". I have a view controller with two views and one label. The IBOutlets look fine for two of them, but line 26 looks different: the text "PreviewView!" is gray instead of blue like the others. Could this be the problem?

    import AVFoundation
    import Vision

    class TextScanViewController: UIViewController {
        // MARK: - UI objects
        @IBOutlet weak var previewView: PreviewView!
        @IBOutlet weak var cutoutView: UIView!
        @IBOutlet weak var numberView: UILabel!
        var maskLayer = CAShapeLayer()
        // Device orientation. Updated whenever the orientation changes to a
        // different supported orientation.
        var currentOrientation = UIDeviceOrientation.portrait

        // MARK: - Capture related objects
        private let captureSession = AVCaptureSession()
        let captureSessionQueue = DispatchQueue(label: "com.example.apple-samplecode.CaptureSessionQueue")
        var captureDevice: AVCaptureDevice?
        var videoDataOutput = AVCaptureVideoDataOutput()
        let videoDataOutputQueue = DispatchQueue(label: "com.example.apple-samplecode.VideoDataOutputQueue")

        // MARK: - Region of interest (ROI) and text orientation
        // Region of video data output buffer that recognition should be run on.
        // Gets recalculated once the bounds of the preview layer are known.
        var regionOfInterest = CGRect(x: 0, y: 0, width: 1, height: 1)
        // Orientation of text to search for in the region of interest.
        var textOrientation = CGImagePropertyOrientation.up

        // MARK: - Coordinate transforms
        var bufferAspectRatio: Double!
        // Transform from UI orientation to buffer orientation.
        var uiRotationTransform = CGAffineTransform.identity
        // Transform bottom-left coordinates to top-left.
        var bottomToTopTransform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
        // Transform coordinates in ROI to global coordinates (still normalized).
        var roiToGlobalTransform = CGAffineTransform.identity
        // Vision -> AVF coordinate transform.
        var visionToAVFTransform = CGAffineTransform.identity

        // MARK: - View controller methods
        override func viewDidLoad() {
            super.viewDidLoad()
            // Set up preview view.
            previewView.session = captureSession

Programming Languages Swift Swift Xcode Vision VisionKit

1.4k

Jan ’23

VNDocumentCameraViewController button not working

The button for manually capturing a document is not centered and does not work. Tapping it has no effect, and it shows no highlighted state. Auto mode still captures documents, but nothing happens when you press the button to scan manually.

Machine Learning & AI General VisionKit

840

Jan ’23

How to convert VNRectangleObservation item to UIImage in SwiftUI

I was able to identify squares in an image using VNDetectRectanglesRequest. Now I want to store those rectangles as separate images (UIImage or CGImage). Below is what I tried.

    let rectanglesDetection = VNDetectRectanglesRequest { request, error in
        rectangles = request.results as! [VNRectangleObservation]
        rectangles.sort { $0.boundingBox.origin.y > $1.boundingBox.origin.y }
        for rectangle in rectangles {
            let rect = rectangle.boundingBox
            let imageRef = cgImage.cropping(to: rect)
            let image = UIImage(cgImage: imageRef!, scale: image!.scale, orientation: image!.imageOrientation)
            checkBoxImages.append(image)
        }
    }

Can anybody point out what's wrong, or what the best approach would be?
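One likely cause, offered as a sketch rather than a confirmed diagnosis: boundingBox on a Vision observation is normalized (0 to 1) with its origin at the lower-left corner, while CGImage.cropping(to:) expects pixel coordinates with a top-left origin. The helper name pixelRect(forNormalized:imageWidth:imageHeight:) and the sample numbers below are hypothetical and only illustrate the conversion.

```swift
import Foundation

// Hypothetical helper (not part of Vision): converts a normalized,
// lower-left-origin rectangle into a pixel rectangle with a top-left origin,
// which is the coordinate space CGImage.cropping(to:) works in.
func pixelRect(forNormalized rect: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let width = CGFloat(imageWidth)
    let height = CGFloat(imageHeight)
    return CGRect(
        x: rect.origin.x * width,
        // Flip the vertical axis: measure from the top edge instead of the bottom.
        y: (1 - rect.origin.y - rect.size.height) * height,
        width: rect.size.width * width,
        height: rect.size.height * height
    )
}

// Assumed sample values: a 400 x 200 pixel image and one normalized box.
let normalized = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let cropRect = pixelRect(forNormalized: normalized, imageWidth: 400, imageHeight: 200)
print(cropRect)
```

In the posted code, the converted rectangle would be passed to cgImage.cropping(to:) in place of the raw boundingBox, using cgImage.width and cgImage.height as the image size. If the source UIImage has a non-up orientation, the pixel coordinates may need further adjustment.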

Programming Languages Swift Swift Vision VisionKit

Jan ’23

Triggering ImageAnalysisInteraction from custom button/UI

I currently use Live Text over images via an ImageAnalysisInteraction instance added as an interaction to a UIImageView instance, with preferredInteractionTypes set to [.automatic]. I get the default Live Text button (at the bottom right) as usual, and it works well. However, I would like to implement the button within our own UI so it better matches our app's style and interface. I see there is an isSupplementaryInterfaceHidden property that hides the default buttons, but how does one trigger the same interaction from a custom UIButton once the default buttons are hidden? I see that a couple of apps have implemented this, but I'm unable to find helpful pointers on how to achieve it. Any information on this is highly appreciated. TIA.

UI Frameworks UIKit UIKit Vision VisionKit

829

Dec ’22

VisionKit - get bounding boxes from ImageAnalysis

I am developing a command line application to extract text from images and PDF files. The ImageAnalysis class from VisionKit provides high quality OCR but does not appear to have functionality to get the position of extracted text (words, etc.). This functionality appears to be in place in a private unexposed API, since the ImageAnalysisOverlayView is able to leverage it to show the live text interface. Is there any way to get this information in a terminal application with no displayed UI? (Note: I filed a feedback request for this over 3 months ago and have yet to hear back)

Machine Learning & AI General VisionKit

Dec ’22

Why is my ML model returning VNCoreMLFeatureValueObservation instead of VNRecognizedObjectObservation?

I'm training a machine learning model in PyTorch using YOLOv5 from Ultralytics. Apple's CoreMLTools is used to convert the PyTorch (.pt) model into a Core ML model (.mlmodel). This works fine, and I can use it in my iOS app, but I have to access the prediction output of the model "manually". The output of the model is a Float32 MultiArray of shape 1 × 25500 × 46. From the VNCoreMLRequest I receive only VNCoreMLFeatureValueObservation; from this I can get the MultiArray and iterate through it to find the data I need. But I see that Apple offers the VNRecognizedObjectObservation type for object detection models, and it is not returned for my model. Why is my model not supported for returning VNRecognizedObjectObservation? Can I use CoreMLTools to enable it?
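For the manual route described above, a minimal sketch of decoding one row of the output, assuming the usual YOLOv5 row layout of [centerX, centerY, width, height, objectness, class scores...] (so 46 values would mean 41 classes). The layout, the Detection type, decodeRow, and the 0.25 threshold are assumptions for illustration, not something the post confirms.

```swift
// Hypothetical container for one decoded detection.
struct Detection {
    let centerX: Float
    let centerY: Float
    let width: Float
    let height: Float
    let classIndex: Int
    let confidence: Float
}

// Decodes a single output row under the assumed YOLOv5 layout.
// Returns nil when the row is too short or the combined score is below the threshold.
func decodeRow(_ row: [Float], threshold: Float = 0.25) -> Detection? {
    guard row.count > 5 else { return nil }
    let objectness = row[4]
    // Find the best class score; offset is the class index (0-based).
    guard let best = row[5...].enumerated().max(by: { $0.element < $1.element }) else {
        return nil
    }
    let confidence = objectness * best.element
    guard confidence >= threshold else { return nil }
    return Detection(centerX: row[0], centerY: row[1], width: row[2], height: row[3],
                     classIndex: best.offset, confidence: confidence)
}

// Assumed sample rows with three classes each.
let strongRow: [Float] = [0.5, 0.5, 0.2, 0.1, 0.9, 0.1, 0.8, 0.3]
let weakRow: [Float] = [0.5, 0.5, 0.2, 0.1, 0.1, 0.1, 0.8, 0.3]
let strong = decodeRow(strongRow)
let weak = decodeRow(weakRow)
```

Overlapping detections that survive the threshold would still need non-maximum suppression, which this sketch leaves out.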

Machine Learning & AI General Vision VisionKit Machine Learning Core ML

Dec ’22

Recognizing LCD/LED number digits

I want to build a feature in my app where a user can use the camera to capture the data shown on an LCD screen, for example a thermostat or a digital clock. I downloaded the sample text recognition project and tweaked it to search only for numbers. But whenever I show it a photo of a digital number (LCD/LED), it cannot recognize it. I searched for an answer and came to the conclusion that I have to train a model on this kind of number. My question is: how should I do it, and what kind of photos should I take for training? Thank you in advance for the help! Have a nice WWDC.

Machine Learning & AI Core ML Vision VisionKit Core ML

2.4k

Dec ’22

ImageAnalyzer running VNRecognizeTextRequest in the background?

Using VisionKit, there are two main ways to get text from images: ImageAnalyzer and VNRecognizeTextRequest. Based on some OCR tests, I'm seeing that the outputs from these two methods differ. Initially, I thought ImageAnalyzer was running with VNRequestTextRecognitionLevel.fast because it's meant for Live Text, but the outputs from ImageAnalyzer are sometimes better than those from VNRequestTextRecognitionLevel.accurate. Is ImageAnalyzer running VNRecognizeTextRequest in the background? If it isn't, what pipeline is it using to detect text?

App & System Services General VisionKit Live Text

901

Dec ’22

Frames dropped after too many VNRecognizeTextRequest and VNDetectBarcodesRequest are executed within a few seconds

I am developing an iOS application that recognizes text and barcodes with the camera using VNRecognizeTextRequest and VNDetectBarcodesRequest. The application runs detections in a loop, and I am facing a dropped-frames issue when 13 VNRecognizeTextRequest and VNDetectBarcodesRequest are performed in a row within a few seconds. I do not see any errors during the 13 detections performed before the frames are dropped. Frames are then dropped for an undetermined period of time, but detection manages to start again afterwards. Has anyone experienced this problem before? Is there a hardware limitation that prevents performing more than 13 detections in a row, to preserve the device battery or limit resource use? I had already noticed that frames are dropped when the analysis frequency is lower than the image arrival frequency. This is understandable, since the buffer queue is a FIFO. For example, when I test the application on an old iPad, the analysis frequency is too low (on average one detection takes 1.2 seconds), so frames are dropped during the analysis, but the problem with the 13 detections does not occur. However, I do not understand why an analysis frequency that is too high is a problem. Thanks in advance.

Machine Learning & AI General Vision VisionKit

1.1k

Nov ’22

VisionKit DataScanner

I’m using DataScanner to scan timestamps, and a few issues are affecting my ability to use it properly. First, setting the text content type to dateTimeDuration seems like it should be my preferred option, but sometimes the scanner locks onto only the date portion of the timestamp rather than the combined date and time. Is there any way to tell it to look for the time as well? Because that didn’t work, I set it to scan all text and then use a regex to search the larger block of text for the timestamp format I’m looking for. My next issue may be due to fonts: quite often ":" and "3" get read as "8", so I end up with invalid timestamps where either the colons are 8s or there’s something like 85 minutes instead of 35. It seems that if VisionKit had an idea of the font classes I’m primarily looking at, it could scan more reliably. Is there any way to do that, or any way to suggest it for the future? I’ve done a little of my own processing, but I feel like I’m applying multiple band-aids rather than solving the underlying problem.
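The regex filtering step described above could be sketched as follows, assuming the time portion has the form HH:mm:ss (the post does not state the exact format). The function name validTimestamps(in:) and the sample string are hypothetical. The pattern bakes the valid ranges into the match, so a misread such as 85 minutes is rejected rather than accepted; it does not recover the misread digits.

```swift
import Foundation

// Hypothetical helper: returns every substring that looks like a valid
// HH:mm:ss time (hours 00-23, minutes and seconds 00-59).
func validTimestamps(in text: String) -> [String] {
    let pattern = #"\b(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d\b"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert the NSRange of each match back into a Swift substring.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// Assumed sample scan result: one plausible time and one with a misread minute.
let scanned = "start 12:35:07 end 12:85:07"
let timestamps = validTimestamps(in: scanned)
print(timestamps)
```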

Machine Learning & AI General VisionKit

952

Nov ’22

Xcode 14 - iOS 16.0.2 - attempt to insert nil object from objects[0]

Since I updated Xcode (13 -> 14) and installed iOS 16 on my iPhone, I get this fatal error when I try to launch VNDocumentCameraViewController. I do not understand why it happens or where the error is. Before the update, the scanner worked and I did not get this message:

    'NSInvalidArgumentException', reason: '*** -[__NSPlaceholderArray initWithObjects:count:]: attempt to insert nil object from objects[0]'

    let scannerViewController = VNDocumentCameraViewController()
    scannerViewController.delegate = self
    present(scannerViewController, animated: true)

Machine Learning & AI General VisionKit

3.4k

Oct ’22

VNDocumentCameraViewController - crash with fatal error (iOS 16)

With iOS 16, I get a fatal error when I launch VNDocumentCameraViewController:

    let myScanViewController = VNDocumentCameraViewController()
    myScanViewController.delegate = self
    self.present(myScanViewController, animated: false)

In my project, this function worked fine with older operating systems (such as iOS 14 and iOS 15). In info.plist, NSCameraUsageDescription is present with a description. Before the camera view appears, I get a fatal error, and I do not know where to check or where to add a breakpoint to find out where it tries to add a nil object:

    Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[__NSPlaceholderArray initWithObjects:count:]: attempt to insert nil object from objects[0]'

Machine Learning & AI General VisionKit

1.7k

Oct ’22

Detect images in document scans with Vision?

Hello, I am working on an app that scans documents and recognizes the text with the help of the Vision framework. This works great. I would also like to "recognize" or detect individual images that are part of the document. Does Vision have any support for this, or should I be looking into training my own ML model? Below is an example document: I would like to extract the text (already done) and also the image of the building.

App & System Services Core OS iOS Vision VisionKit

1.2k

Oct ’22

How to customize the text recognition order in the Vision framework

Hi, I want to recognize text that has the structure shown below. Currently, I use text recognition in the Vision framework to detect text in an image laid out like the one below. The result after detection is "amazing crictic wink fold disagree robot access segment jar ennery brain club" (read column by column). But the result I expect is "amazing disagree jar crictic robot ennery wink access brain fold segment club" (read from top to bottom and left to right). How can I detect text from top to bottom and left to right? Thanks.
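One possible approach, sketched under assumptions: keep each recognized string together with its bounding box, group the boxes into rows by vertical position, and then read each row left to right. The TextBox type, the rowTolerance value, and the sample coordinates are hypothetical; the coordinates follow the lower-left-origin, normalized convention that Vision bounding boxes use.

```swift
// Hypothetical stand-in for a recognized text observation: the string plus the
// left edge and vertical midpoint of its normalized bounding box.
struct TextBox {
    let text: String
    let minX: Double
    let midY: Double
}

// Groups boxes into rows by vertical midpoint, then reads each row left to right.
// rowTolerance is an assumed tuning value in normalized units.
func readingOrder(_ boxes: [TextBox], rowTolerance: Double = 0.02) -> [String] {
    // With a lower-left origin, a larger midY is higher up in the image.
    let topToBottom = boxes.sorted { $0.midY > $1.midY }
    var rows: [[TextBox]] = []
    for item in topToBottom {
        if let anchor = rows.last?.first, abs(anchor.midY - item.midY) <= rowTolerance {
            rows[rows.count - 1].append(item)
        } else {
            rows.append([item])
        }
    }
    return rows.flatMap { row in
        row.sorted { $0.minX < $1.minX }.map { $0.text }
    }
}

// Assumed layout: three columns of four words, listed column by column.
let columns: [[String]] = [
    ["amazing", "crictic", "wink", "fold"],
    ["disagree", "robot", "access", "segment"],
    ["jar", "ennery", "brain", "club"],
]
let columnX = [0.1, 0.4, 0.7]
let rowY = [0.8, 0.6, 0.4, 0.2]
var detected: [TextBox] = []
for (c, words) in columns.enumerated() {
    for (r, word) in words.enumerated() {
        detected.append(TextBox(text: word, minX: columnX[c], midY: rowY[r]))
    }
}
let ordered = readingOrder(detected)
print(ordered.joined(separator: " "))
```

The tolerance matters for real scans, where words on the same line rarely share an exact midpoint; a value tied to the typical text height is a reasonable starting point.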

Machine Learning & AI General Vision VisionKit

788

Oct ’22

In Live Text API, text selection or clicking the highlight button on the right does not work

I am implementing a Live Text feature using ImageAnalysisInteraction and ImageAnalyzer. After loading an image file into a UIImageView, I am trying to make its text selectable, as in a text view, using Live Text. A button that toggles the Live Text highlight state is displayed on the right side of the UIImageView, but this button cannot be tapped. Also, even when the highlight state is changed, the text cannot be selected as it can in a UITextView. I added the interaction to the image view like this:

    imageView.addInteraction(interaction)

Even when I change preferredInteractionTypes to various types, nothing can be selected. I also set isUserInteractionEnabled to true:

    interaction.view!.isUserInteractionEnabled = true
    imageView.isUserInteractionEnabled = true

Has anyone solved this problem?

App & System Services General Vision VisionKit Live Text

Oct ’22

Error when using Live Text API

I am developing a feature using Live Text. (I used ImageAnalyzer and ImageAnalysisInteraction.) When it runs, the following error log is displayed, and the Live Text button is displayed but cannot be tapped or highlighted. That is, it does nothing.

    [Unknown process name] Error: this application, or a library it uses, has passed an invalid numeric value (NaN, or not-a-number) to CoreGraphics API and this value is being ignored. Please fix this problem.
    [Unknown process name] If you want to see the backtrace, please set CG_NUMERICS_SHOW_BACKTRACE environmental variable.
    [api] -[CIImage initWithCVPixelBuffer:options:] failed because the buffer is nil.

Are there any possible causes and solutions for this not working? Or is there a setting I'm missing in order to use Live Text? Below is part of the code. Changing preferredInteractionTypes to any of the types doesn't change anything. The UIImageViews are displayed by paging left and right through a UIPageViewController, which calls the Live Text function.

    Task {
        var analyzer: ImageAnalyzer? = ImageAnalyzer()
        if analyzer == nil { return }
        let configuration = ImageAnalyzer.Configuration([.text])
        do {
            let pAnalysis = try await analyzer!.analyze(image!, configuration: configuration)
            DispatchQueue.main.async {
                //interaction!.preferredInteractionTypes = .automatic
                interaction.preferredInteractionTypes = .textSelection
                //interaction!.preferredInteractionTypes = .dataDetectors
                //interaction!.preferredInteractionTypes = [.textSelection, .dataDetectors]
                //interaction.preferredInteractionTypes = [.textSelection, .dataDetectors, .automatic]
                interaction.selectableItemsHighlighted = true
                interaction.analysis = pAnalysis
                interaction.setContentsRectNeedsUpdate()
            }
        } catch {}
        analyzer = nil
    }

App & System Services General Vision VisionKit Live Text

1.4k

Sep ’22

VisionKit

I might be mistaken, but I believe VisionKit has been made integral to iOS 16 and/or Xcode 14. My app now has a lot of VisionKit entries in the log. I have a sense that this is expensive on a scrolling table view of videos. First, how is it possible to simply prevent the prying eyes of VisionKit from performing their invasive tasks? Second, am I mistaken about the added weight, especially once the kit gets its teeth into a video? Third, is this why the controls do not show up? Having no controls displayed is very confusing to the app user. TIA

Machine Learning & AI General VisionKit

927

Sep ’22

Unable to turn on Voice Control

After I upgraded to the latest public beta of iOS 16.1, Voice Control would not turn on, and I got the message “Unable to turn on Voice Control. Failed to download necessary files”. I was using Voice Control before installing the update, and I was connected to Wi-Fi. The same problem occurred on iPadOS when I updated to iPadOS 16.1.

Machine Learning & AI General VisionKit

676

Sep ’22

Live Text API fails: "failed because the buffer is nil"

Hello, I am trying to play around with the Live Text API according to these docs: https://developer.apple.com/documentation/visionkit/enabling_live_text_interactions_with_images?changes=latest_minor But it always fails with:

    [api] -[CIImage initWithCVPixelBuffer:options:] failed because the buffer is nil.

I am running this on a UIImage instance that I got from VNDocumentCameraViewController. This is my current implementation, which I run after the scanned image is displayed:

    private func setupLiveText() {
        guard let image = imageView.image else { return }
        let interaction = ImageAnalysisInteraction()
        imageView.addInteraction(interaction)
        Task {
            let configuration = ImageAnalyzer.Configuration([.text])
            let analyzer = ImageAnalyzer()
            do {
                let analysis = try await analyzer.analyze(image, configuration: configuration)
                DispatchQueue.main.async {
                    interaction.analysis = analysis
                }
            } catch {
                print(error.localizedDescription)
            }
        }
    }

The call does not throw and returns a non-nil analysis object, but setting it on the interaction does nothing. I am testing this on an iPhone SE (2020), which has the A13 chip. This feature requires A12 and up.

Machine Learning & AI General VisionKit Vision

2.4k

Sep ’22
