Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Vision Documentation

Posts under Vision tag

80 Posts
Post not yet marked as solved
0 Replies
302 Views
Hi, Developer Friends! Is iOS 15's Vision boundingBox behaviour different from iOS 14's? Since iOS 15, the boundingBox of a scan is not working as effectively as it did in iOS 14 on an iPhone 11 Pro when multiple boxes are present together. For example, this project from Apple's WWDC 2019 doesn't work well on iOS 15: the automatic boundingBox is sometimes warped or not focused on the logical rectangular box of "Scan Other" when multiple rectangular text boxes are adjacent. My "Scan Other" target is a square text box with thick black borders and has adjacent (text) boxes. The previous behaviour cropped at the black borders; now the scan crops at edges further out from the thick black (obvious) borders. See the Business Companion sample introduced at WWDC 2019: https://developer.apple.com/documentation/vision/structuring_recognized_text_on_a_document Do you have any thoughts? Environment and steps to reproduce: load the Apple sample app above with iOS 15, Xcode 13, and an iPhone 11 Pro, then tap "Scan Other". The document I'm using is here; it has 6 boxes and I want the scanning to see only one, so I try to focus on only one box: https://docs.google.com/document/d/1jVS72iQJ18W-ax8fRdzwyh_FmMDb-X14/edit?usp=sharing&ouid=114863480924239772650&rtpof=true&sd=true
Posted by
Post not yet marked as solved
0 Replies
355 Views
I'm using VNRecognizeTextRequest with:

    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = false
    request.recognitionLanguages = ["en-US", "de-DE"]

The code is basically taken from https://developer.apple.com/documentation/vision/reading_phone_numbers_in_real_time But when I perform the request via VNImageRequestHandler, I get the following warnings:

    Could not determine an appropriate width index for aspect ratio 0.0062
    Could not determine an appropriate width index for aspect ratio 0.0078
    Could not determine an appropriate width index for aspect ratio 0.0089
    ...

I tried using .fast for recognitionLevel and that helped, but the results are not as good as with the .accurate level. Can you suggest how to fix the problem while keeping the accurate level?
Post not yet marked as solved
0 Replies
240 Views
Hi, apologies, but I am completely new to Apple development, struggling to find the right information, and would really appreciate some pointers from experienced developers as to the best approach for a project I am starting. The use case relates to using properties of colour to predict the density of a fluid from a photograph. Each photograph will simply be a single colour; the properties of the photograph (colour intensity / brightness / saturation) will vary between photographs as the density of the fluid changes, and I am looking to use these (or possibly other similar properties) to determine a value for the fluid density. What I would like to ask is:
1. Do you think Core ML is the best approach for predicting the density based upon the colour properties of the photograph, or should I start somewhere else?
2. Can you point me to any helpful related documentation which will help me get started?
I hope someone can help. Many thanks in advance, Steve
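Before committing to Core ML, it may be worth checking how far simple colour statistics get you. This is only a sketch of one possible starting point, not a recommendation from Apple's docs: Core Image's CIAreaAverage filter reduces an image to its average colour, which you could then correlate with known densities (for example by fitting a calibration curve against measured samples).

```swift
import CoreImage

// Reduce a whole photo to its average colour with CIAreaAverage, then read
// back the single resulting pixel. Returns normalized RGB in 0...1.
func averageColor(of image: CIImage, context: CIContext = CIContext()) -> (r: Double, g: Double, b: Double)? {
    guard let filter = CIFilter(name: "CIAreaAverage",
                                parameters: [kCIInputImageKey: image,
                                             kCIInputExtentKey: CIVector(cgRect: image.extent)]),
          let output = filter.outputImage else { return nil }
    var pixel = [UInt8](repeating: 0, count: 4)
    context.render(output, toBitmap: &pixel, rowBytes: 4,
                   bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
                   format: .RGBA8, colorSpace: CGColorSpaceCreateDeviceRGB())
    return (Double(pixel[0]) / 255, Double(pixel[1]) / 255, Double(pixel[2]) / 255)
}
```

If the colour-to-density relationship turns out to be nonlinear or noisy, training a Create ML tabular regressor on features like these would be a natural next step.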
Post not yet marked as solved
0 Replies
326 Views
I already installed the latest TensorFlow version using the documentation given (link). But when I tried to run a notebook with the command "%tensorflow_version 2.x", it's giving the error "UsageError: Line magic function %tensorflow_version not found.". Please tell me what to do?
Posted by 006
Post not yet marked as solved
3 Replies
404 Views
I have a SwiftUI app in which I want to limit a barcode scanner to an AVCaptureVideoPreviewLayer with a size of CGRect(x: 0, y: 0, width: 335, height: 150) (see image). I am using Apple Vision to detect the barcode. However, barcodes that are outside the CGRect also get picked up, and I would like to limit the detection area to my preview layer. I notice the pixel buffer uses the entire screen (width=1920, height=1080), as my device is an iPhone 11 Pro. Can I limit the buffer size before passing it to Vision?

    extension CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
        func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
            print("output \(output)")
            print("sampleBuffer \(sampleBuffer)")
            print("connection \(connection)")
            print("pixelBuffer \(pixelBuffer)")
            let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
            do {
                try imageRequestHandler.perform([detectBarcodeRequest])
            } catch {
                print(error)
            }
        } //captureOutput
    } //extension
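One option, rather than cropping the pixel buffer, is Vision's regionOfInterest property on VNImageBasedRequest, which restricts processing to a normalized sub-rectangle of the image. The sketch below assumes you can reach your preview layer at configuration time; note that Vision's normalized coordinates have a lower-left origin, and resulting boundingBoxes are then reported relative to the region of interest.

```swift
import Vision
import AVFoundation

// Restrict detection to the preview layer's visible area instead of cropping
// the buffer. `previewLayer` and the barcode request come from the post's code.
func configureRegionOfInterest(previewLayer: AVCaptureVideoPreviewLayer,
                               request: VNDetectBarcodesRequest) {
    // Convert the layer's bounds to metadata-output coordinates
    // (normalized 0...1, top-left origin).
    let metadataRect = previewLayer.metadataOutputRectConverted(
        fromLayerRect: previewLayer.bounds)
    // Vision uses a lower-left origin, so flip the y axis.
    request.regionOfInterest = CGRect(
        x: metadataRect.origin.x,
        y: 1 - metadataRect.origin.y - metadataRect.height,
        width: metadataRect.width,
        height: metadataRect.height)
}
```

Depending on your session's video orientation, the conversion may need further adjustment, so it is worth verifying the resulting rect against a test barcode at each screen edge.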
Post marked as solved
2 Replies
631 Views
In my mobile application, I observe a memory leak when running inference with my image convolution model. The memory leak occurs when getting predictions from the model. Given a pointer to a loaded MLModel object called module and an input feature provider feature_provider (of type MLDictionaryFeatureProvider*), the memory leak is observed each time a prediction is made by calling

    [module predictionFromFeatures:feature_provider error:NULL];

The amount of memory leaked in each iteration appears to be related to the output size of the model. Assuming the mobile GPU backend is running in half precision (float16), I observe the following for the given output sizes:

Output image of dimension [1,3,3840,2160] (of size 1*3*3840*2160*16 bits / (8 bits * 1000^2) == 49.7664 MB): constant increase in memory of approximately 91.7 MB after each image prediction.
Output image of dimension [1,3,2048,1080] (of size 1*3*2048*1080*16 bits / (8 bits * 1000^2) == 13.27104 MB): constant increase in memory of approximately 23.7 MB after each image prediction.

Is there a known issue with Core ML's MLModel predictionFromFeatures: which allocates memory each time it is called? Or is this the intended behaviour? At the moment this is preventing me from running inference on mobile devices, and I was wondering if anyone has a suggested workaround, patch, or advice? Thank you in advance, and please find the information to reproduce the issue below.

To reproduce:
To reproduce the problem, a simple model with three convolutions and one pixel-shuffle layer was converted from PyTorch to an MLModel. The MLModel was then run with a debugger in a mobile application. A breakpoint was set on the line computing the predictions in a loop, and the memory use was observed to increase after each iteration.
Alternatively to setting a breakpoint, the number of prediction iterations can be set to 50 (assuming the output size is [1,3,3840,2160] and phone memory is 4 GB), which causes the application to run out of memory at runtime.

The PyTorch model:

    import torch.nn as nn

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            upscale_factor = 8
            self.Conv1 = nn.Conv2d(in_channels=48, out_channels=48, kernel_size=3, stride=1)
            self.Conv2 = nn.Conv2d(48, 48, 3, 1)
            self.Conv3 = nn.Conv2d(48, 3 * (upscale_factor * upscale_factor), 3, 1)
            self.PS = nn.PixelShuffle(upscale_factor)

        def forward(self, x):
            Conv1 = self.Conv1(x)
            Conv2 = self.Conv2(Conv1)
            Conv3 = self.Conv3(Conv2)
            y = self.PS(Conv3)
            return y

The PyTorch to MLModel converter:

    import torch
    import coremltools

    def convert_torch_to_coreml(torch_model, input_shapes, save_path):
        torchscript_model = torch.jit.script(torch_model)
        mlmodel = coremltools.converters.convert(
            torchscript_model,
            inputs=[coremltools.TensorType(name=f'input_{i}', shape=input_shape)
                    for i, input_shape in enumerate(input_shapes)],
        )
        mlmodel.save(save_path)

Generate the MLModel using the above definitions:

    if __name__ == "__main__":
        torch_model = Model()
        # input_shapes = [[1,48,256,135]]  # 2K
        input_shapes = [[1,48,480,270]]  # 4K
        coreml_model_path = "./toy.mlmodel"
        convert_torch_to_coreml(torch_model, input_shapes, coreml_model_path)

Mobile application:
The mobile application was generated using PyTorch's iOS TestApp and adapted for our use case. The adapted TestApp is available here.
The most relevant lines in the application for loading the model and running inference are included below.

Set the MLMultiArray contents from the input tensor's data pointer:

    + (MLMultiArray*) tensorToMultiArray:(at::Tensor) input {
        float* input_ptr = input.data_ptr<float>();
        int batch = (int) input.size(0);
        int ch = (int) input.size(1);
        int height = (int) input.size(2);
        int width = (int) input.size(3);
        int pixels = ch * height * width;
        NSArray* shape = @[[NSNumber numberWithInt:batch], [NSNumber numberWithInt:ch],
                           [NSNumber numberWithInt:height], [NSNumber numberWithInt:width]];
        MLMultiArray* output = [[MLMultiArray alloc] initWithShape:shape
                                                          dataType:MLMultiArrayDataTypeFloat32
                                                             error:NULL];
        float* output_ptr = (float *) output.dataPointer;
        for (int pixel_index = 0; pixel_index < pixels; ++pixel_index) {
            output_ptr[pixel_index] = input_ptr[pixel_index];
        }
        return output;
    }

Load the model, set the input feature provider, and run inference over multiple iterations:

    NSError* __autoreleasing __nullable* __nullable error = nil;
    NSString* modelPath = [NSString stringWithUTF8String:model_path.c_str()];
    NSURL* modelURL = [NSURL fileURLWithPath:modelPath];
    NSURL* compiledModel = [MLModel compileModelAtURL:modelURL error:error];
    MLModel* module = [MLModel modelWithContentsOfURL:compiledModel error:NULL];
    NSMutableDictionary* feature_inputs = [[NSMutableDictionary alloc] init];
    for (int i = 0; i < inputs.size(); ++i) {
        NSString* key = [NSString stringWithFormat:@"input_%d", i];
        [feature_inputs setValue:[Converter tensorToMultiArray: inputs[i].toTensor()] forKey: key];
    }
    MLDictionaryFeatureProvider* feature_provider =
        [[MLDictionaryFeatureProvider alloc] initWithDictionary:feature_inputs error:NULL];
    // Running inference on the model results in a memory leak
    for (int i = 0; i < iter; ++i) {
        [module predictionFromFeatures:feature_provider error:NULL];
    }

Complete example source:
The complete minimal example of both the MLModel generation and the TestApp is available here.

System environment:
Original environment:
coremltools version: 5.0b5
OS: built on macOS targeting iOS for the mobile application
macOS version: Big Sur (version 11.4)
iOS version: 14.7.1 (run on iPhone 12)
Xcode version: Version 12.5.1 (12E507)
How you install Python: install from source
Python version: 3.8.10
How you install PyTorch: install from source
PyTorch version: 1.8.1

Updated to 'latest' environment:
coremltools version: 5.0b5
OS: built on macOS targeting iOS for the mobile application
macOS version: Big Sur (version 11.4)
iOS version: 15.0.2 (run on iPhone 12)
Xcode version: Version 13.0 (13A233)
How you install Python: install from source
Python version: 3.8.10
How you install PyTorch: install from source
PyTorch version: 1.10.0-rc2

Additional information:
Given the model definition and tensor output shapes above, the corresponding tensor input shapes for the model are as follows:
Output shape of [1,3,3840,2160] has input shape [1,48,480,270]
Output shape of [1,3,2048,1080] has input shape [1,48,256,135]
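Not a confirmed Core ML fix, but a common workaround for per-iteration memory growth in prediction loops is to drain autoreleased objects every iteration rather than letting them accumulate until the enclosing pool drains. A minimal Swift sketch follows (in Objective-C, the equivalent is wrapping the loop body in @autoreleasepool { ... }); `model` and `provider` stand in for the post's module and feature_provider:

```swift
import CoreML

// Wrap each prediction in its own autorelease pool so temporary output
// objects are released per iteration instead of accumulating.
func runInference(model: MLModel, provider: MLFeatureProvider, iterations: Int) throws {
    for _ in 0..<iterations {
        try autoreleasepool {
            _ = try model.prediction(from: provider)
        }
    }
}
```

If memory still grows with the pool in place, the growth is more likely inside Core ML itself and worth a feedback report with the minimal example attached.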
Post not yet marked as solved
1 Replies
371 Views
In Vision's hand detection, we can work with one hand's landmarks to classify a pose. Is it possible to detect both hands' landmarks at the same time, so that we can detect a two-hand pose?
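For reference, VNDetectHumanHandPoseRequest has a maximumHandCount property, so a single request can return one observation per hand. A minimal sketch (the two-hand classifier itself is left out; combining both observations' landmarks into one feature vector is one way to feed it):

```swift
import Vision

// Ask Vision to track up to two hands in each frame.
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2

func handleResults(_ request: VNRequest) {
    guard let observations = request.results as? [VNHumanHandPoseObservation] else { return }
    // With two observations present, each hand's landmarks are available
    // separately and can be combined for a two-hand pose classifier.
    for hand in observations {
        if let points = try? hand.recognizedPoints(.all) {
            print("Detected a hand with \(points.count) landmarks")
        }
    }
}
```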
Post not yet marked as solved
1 Replies
362 Views
Hi, is it possible to get the code for the demo app used in this presentation for the dynamic style transfer example, please? Thanks.
Post not yet marked as solved
3 Replies
624 Views
I'm facing this error, only on iOS 15, while finding observations:

    "The VNDetectorOption_OriginatingRequestSpecifier required option was not found" UserInfo={NSLocalizedDescription=The VNDetectorOption_OriginatingRequestSpecifier required option was not found}
Post not yet marked as solved
0 Replies
210 Views
VNContoursObservation is taking 715 times as long as OpenCV's findContours() when creating directly comparable results. VNContoursObservation creates comparable results when I set the maximumImageDimension property to 1024. If I set it lower, it runs a bit faster but creates lower-quality contours, and it still takes over 100 times as long. I have a hard time believing Apple doesn't know what they are doing, so does anyone have an idea what is going on and how to get it to run much faster? There don't seem to be many options, and nothing I've tried closes the gap. Setting the detectsDarkOnLight property to true makes it run even slower. OpenCV's findContours runs on a binary image, but I am passing an RGB image to Vision, assuming it would convert it to an appropriate format.

OpenCV:

    double taskStart = CFAbsoluteTimeGetCurrent();
    int contoursApproximation = CV_CHAIN_APPROX_NONE;
    int contourRetrievalMode = CV_RETR_LIST;
    findContours(input, contours, hierarchy, contourRetrievalMode, contoursApproximation, cv::Point(0,0));
    NSLog(@"###### opencv findContours: %f", CFAbsoluteTimeGetCurrent() - taskStart);

    ###### opencv findContours: 0.017616 seconds

Vision:

    let taskStart = CFAbsoluteTimeGetCurrent()
    let contourRequest = VNDetectContoursRequest.init()
    contourRequest.revision = VNDetectContourRequestRevision1
    contourRequest.contrastAdjustment = 1.0
    contourRequest.detectsDarkOnLight = false
    contourRequest.maximumImageDimension = 1024
    let requestHandler = VNImageRequestHandler.init(cgImage: sourceImage.cgImage!, options: [:])
    try! requestHandler.perform([contourRequest])
    let contoursObservation = contourRequest.results?.first as! VNContoursObservation
    print(" ###### contoursObservation: \(CFAbsoluteTimeGetCurrent() - taskStart)")

    ###### contoursObservation: 12.605962038040161

The image I am providing OpenCV is 2048 pixels and the image I am providing Vision is 1024.
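One experiment that might narrow the comparison (purely a suggestion, not a confirmed fix): give Vision the same binary input that OpenCV gets by thresholding with Core Image first, so the contour request does not also have to do colour-to-contrast work. CIColorThreshold is available from iOS 14 / macOS 11; the threshold value here is an assumed placeholder:

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Binarize the source image before handing it to VNDetectContoursRequest,
// mirroring the binary input that OpenCV's findContours receives.
func binarized(_ sourceImage: CIImage, threshold: Float = 0.5) -> CIImage? {
    let filter = CIFilter.colorThreshold()
    filter.inputImage = sourceImage
    filter.threshold = threshold
    return filter.outputImage
}
```

Even if the gap remains, timing the request against a pre-binarized input separates Vision's contrast handling from its actual contour tracing in the comparison.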
Post not yet marked as solved
2 Replies
390 Views
Hi, I have seen this video: https://developer.apple.com/videos/play/wwdc2021/10041/ and in my project I am trying to draw detected barcodes. I am using the Vision framework, and I have the barcode position in the boundingBox parameter, but I don't understand the CGRect of that parameter. I am programming in Objective-C and I don't see resources for it; as a further complication, I don't have an image, since I am capturing barcodes from a video camera session. Two parts:
1. How can I draw a detected barcode like in the video (from an image)?
2. How can I draw a detected barcode in a capture session?
I have used VNImageRectForNormalizedRect to go from normalized to pixel coordinates, but the result is not correct. Thank you very much.
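For reference, here is a Swift sketch of both conversions (the same calls exist in Objective-C). Vision's boundingBox is normalized (0...1) with a lower-left origin, while UIKit draws with a top-left origin, so the y axis usually needs flipping; for a live session, the preview layer can perform the final mapping. Depending on the buffer orientation, further adjustment may be needed:

```swift
import Vision
import AVFoundation
import UIKit

// 1. Drawing over a still image of known pixel size.
func imageRect(for observation: VNBarcodeObservation,
               imageWidth: Int, imageHeight: Int) -> CGRect {
    var rect = VNImageRectForNormalizedRect(observation.boundingBox,
                                            imageWidth, imageHeight)
    rect.origin.y = CGFloat(imageHeight) - rect.origin.y - rect.height  // flip y for UIKit
    return rect
}

// 2. Drawing over a live AVCaptureVideoPreviewLayer.
func layerRect(for observation: VNBarcodeObservation,
               previewLayer: AVCaptureVideoPreviewLayer) -> CGRect {
    let box = observation.boundingBox
    // Metadata-output coordinates use a top-left origin, so flip y first,
    // then let the layer account for its own videoGravity and rotation.
    let metadataRect = CGRect(x: box.origin.x,
                              y: 1 - box.origin.y - box.height,
                              width: box.width,
                              height: box.height)
    return previewLayer.layerRectConverted(fromMetadataOutputRect: metadataRect)
}
```

The second conversion is usually why "raw" VNImageRectForNormalizedRect results look wrong in a capture session: the preview layer scales and crops the video, and layerRectConverted(fromMetadataOutputRect:) accounts for that.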
Post not yet marked as solved
0 Replies
272 Views
Context: A SwiftUI app that uses the phone's camera to detect what hand sign I am doing in front of it and update a Text view in SwiftUI. The detection part is done in a UIViewController (primarily with Vision), and then that view is used in the main ContentView in SwiftUI (via UIViewControllerRepresentable).

Problem: I am able to print what hand sign I am doing in front of the screen in the UIViewController, but I am not able to send that value to update the text label in SwiftUI. Here are my three main code files that pertain to this issue:

ContentView.swift

    struct ContentView: View {
        @State var text = ""

        var body: some View {
            ZStack {
                Color(hex: "3E065F")
                    .ignoresSafeArea()
                VStack {
                    Text(text)
                        .foregroundColor(Color.white)
                        .font(.largeTitle)
                    MyCameraView(value: $text)
                }
            }
        }
    }

MyCameraView.swift

    struct MyCameraView: UIViewControllerRepresentable {
        @Binding var value: String

        func makeUIViewController(context: Context) -> CameraViewController {
            let cvc = CameraViewController()
            return cvc
        }

        func updateUIViewController(_ uiViewController: CameraViewController, context: Context) {
            value = uiViewController.currText // Only called once!
        }
    }

CameraViewController.swift

    final class CameraViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
        var currText = "A"
        ...
        func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
            ...
            guard let handSignsModel = try? VNCoreMLModel(for: SavedModel().model) else { print("Fail"); return }
            let request = VNCoreMLRequest(model: handSignsModel) { (finishedRequest, err) in
                guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
                //print(results.first?.identifier)
                self.currText = results.first!.identifier
                print(self.currText)
            }
            try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
        }
    }

Please look at the print(self.currText) statement in the last file. I want to pass that value (every frame) to update the text in the ContentView. I tried to use the updateUIViewController method in MyCameraView.swift, but it does not get called to update the text label every frame, only the first time it loads.
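A common fix for this pattern, sketched below reusing the post's type names with a hypothetical onPrediction closure added to CameraViewController: updateUIViewController only runs when SwiftUI state changes, so it cannot observe a plain property on the view controller. Data has to flow back into SwiftUI through a callback (or an ObservableObject) instead:

```swift
import SwiftUI
import UIKit

// Sketch: CameraViewController is assumed to gain a `onPrediction` closure
// that its Vision completion handler calls with each new classification.
struct MyCameraView: UIViewControllerRepresentable {
    @Binding var value: String

    func makeUIViewController(context: Context) -> CameraViewController {
        let cvc = CameraViewController()
        cvc.onPrediction = { prediction in
            // Hop to the main queue before mutating SwiftUI state.
            DispatchQueue.main.async { self.value = prediction }
        }
        return cvc
    }

    func updateUIViewController(_ uiViewController: CameraViewController, context: Context) {}
}
```

In CameraViewController, declare `var onPrediction: ((String) -> Void)?` and call `self.onPrediction?(self.currText)` in the VNCoreMLRequest completion handler instead of only printing; the binding then updates the Text view every frame.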
Post not yet marked as solved
1 Replies
339 Views
Apple's sample code Identifying Trajectories in Video contains the following delegate callback:

    func cameraViewController(_ controller: CameraViewController, didReceiveBuffer buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) {
        let visionHandler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
        if gameManager.stateMachine.currentState is GameManager.TrackThrowsState {
            DispatchQueue.main.async {
                // Get the frame of the rendered view
                let normalizedFrame = CGRect(x: 0, y: 0, width: 1, height: 1)
                self.jointSegmentView.frame = controller.viewRectForVisionRect(normalizedFrame)
                self.trajectoryView.frame = controller.viewRectForVisionRect(normalizedFrame)
            }
            // Perform the trajectory request in a separate dispatch queue.
            trajectoryQueue.async {
                do {
                    try visionHandler.perform([self.detectTrajectoryRequest])
                    if let results = self.detectTrajectoryRequest.results {
                        DispatchQueue.main.async {
                            self.processTrajectoryObservations(controller, results)
                        }
                    }
                } catch {
                    AppError.display(error, inViewController: self)
                }
            }
        }
    }

However, instead of drawing UI whenever detectTrajectoryRequest.results exist (https://developer.apple.com/documentation/vision/vndetecttrajectoriesrequest/3675672-results), I'm interested in using the CMTimeRange provided by each result to construct a new video. In effect, this would filter the original video down to only the frames with trajectories. How might I accomplish this, perhaps by writing only specific time ranges' frames from one AVFoundation video to a new AVFoundation video?
Post not yet marked as solved
0 Replies
410 Views
I'm using Vision to perform some OCR from a live camera feed. I've set up my VNRecognizeTextRequest as follows:

    let request = VNRecognizeTextRequest(completionHandler: recognizeTextCompletionHandler)
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = false

And I handle the results as follows:

    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        if let recognizedText = observation.topCandidates(1).first {
            guard recognizedText.confidence >= self.confidenceLimit, // set to 0.5
                  let foundText = validateRegexPattern(text: recognizedText.string, regexPattern: self.regexPattern),
                  let foundDecimal = Double(foundText)
            else { continue }
        }
    }

This is actually working great and yielding very accurate results, but the confidence values I'm receiving from the results are generally either 0.5 or 1.0, and rarely 0.3. I find these to be pretty nonsensical confidence values, and I'm wondering if this is the intended result or some sort of bug. Conversely, using recognitionLevel = .fast yields more realistic and varied confidence values, but much less accurate results overall. (Even though .fast is recommended for OCR from a live camera feed, I've had significantly better results with the .accurate recognition level, which is why I've been using it.)
Post not yet marked as solved
1 Replies
364 Views
Given an AVAsset, I'm performing a Vision trajectory request on it and would like to write out a video asset that only contains frames with trajectories (filtering out the downtime in sports footage where there's no ball moving). I'm unsure what a good approach would be, but as a starting point I tried the following pipeline:

1. Copy a sample buffer from the source AVAssetReaderOutput.
2. Perform a trajectory request with a Vision handler parameterized by the sample buffer.
3. For each resulting VNTrajectoryObservation (trajectory detected), use its associated CMTimeRange to configure a new AVAssetReader set to that time range.
4. Append the time-range-constrained sample buffer to one AVAssetWriterInput until the forEach is complete.

In code:

    private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                               to writerInput: AVAssetWriterInput,
                                               onQueue queue: DispatchQueue,
                                               sampleBufferProcessor: SampleBufferProcessor,
                                               completionHandler: @escaping () -> Void) {
        /* The writerInput continuously invokes this closure until finished or cancelled.
           It throws an NSInternalInconsistencyException if called more than once for the same writer. */
        writerInput.requestMediaDataWhenReady(on: queue) {
            var isDone = false
            /* While the writerInput accepts more data, process the sampleBuffer and
               then transfer the processed sample to the writerInput. */
            while writerInput.isReadyForMoreMediaData {
                if self.isCancelled {
                    isDone = true
                    break
                }
                // Get the next sample from the asset reader output.
                guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                    // The asset reader output has no more samples to vend.
                    isDone = true
                    break
                }
                let visionHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer,
                                                          orientation: self.orientation,
                                                          options: [:])
                do {
                    try visionHandler.perform([self.detectTrajectoryRequest])
                    if let results = self.detectTrajectoryRequest.results {
                        try results.forEach { result in
                            let assetReader = try AVAssetReader(asset: self.asset)
                            assetReader.timeRange = result.timeRange
                            let trackOutput = AVTrackOutputs.firstTrackOutput(ofType: .video,
                                                                              fromTracks: self.asset.tracks,
                                                                              withOutputSettings: nil)
                            assetReader.add(trackOutput)
                            assetReader.startReading()
                            guard let sampleBuffer = trackOutput.copyNextSampleBuffer() else {
                                // The asset reader output has no more samples to vend.
                                isDone = true
                                return
                            }
                            // Append the sample to the asset writer input.
                            guard writerInput.append(sampleBuffer) else {
                                /* The writer could not append the sample buffer.
                                   The `readingAndWritingDidFinish()` function handles
                                   any error information from the asset writer. */
                                isDone = true
                                return
                            }
                        }
                    }
                } catch {
                    print(error)
                }
            }
            if isDone {
                /* Calling `markAsFinished()` on the asset writer input does the following:
                   1. Unblocks any other inputs needing more samples.
                   2. Cancels further invocations of this "request media data" callback block. */
                writerInput.markAsFinished()
                // Tell the caller the reader output and writer input finished transferring samples.
                completionHandler()
            }
        }
    }

    private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter,
                                            completionHandler: @escaping FinishHandler) {
        if isCancelled {
            completionHandler(.success(.cancelled))
            return
        }
        // Handle any error during processing of the video.
        guard sampleTransferError == nil else {
            assetReaderWriter.cancel()
            completionHandler(.failure(sampleTransferError!))
            return
        }
        // Evaluate the result of reading the samples.
        let result = assetReaderWriter.readingCompleted()
        if case .failure = result {
            completionHandler(result)
            return
        }
        // Finish writing, and asynchronously evaluate the results from writing the samples.
        assetReaderWriter.writingCompleted { result in
            completionHandler(result)
            return
        }
    }

When run, I get the following: no error is caught in the first catch clause, none is caught in readingAndWritingDidFinish(assetReaderWriter:completionHandler:), and the completion handler is called. Help with any of the following questions would be appreciated:

1. What is causing what appears to be indefinite loading? How might I isolate the problem further?
2. Am I misusing or misunderstanding how to selectively read from time ranges of AVAssetReader objects?
3. Should I forego the AVAssetReader / AVAssetWriter route entirely and use the time ranges with AVAssetExportSession instead? I don't know how the two approaches compare, or what to consider when choosing between the two.
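On question 3: AVAssetExportSession does accept a timeRange, which sidesteps the manual sample-buffer plumbing at the cost of per-sample control. A sketch of one export per detected trajectory range (error handling kept minimal; `asset` and the ranges are assumed inputs):

```swift
import AVFoundation

// Export just one trajectory's time range from the source asset to a file.
func exportClip(from asset: AVAsset, timeRange: CMTimeRange,
                to outputURL: URL, completion: @escaping (Error?) -> Void) {
    guard let session = AVAssetExportSession(asset: asset,
                                             presetName: AVAssetExportPresetHighestQuality) else {
        completion(NSError(domain: "Export", code: -1))
        return
    }
    session.outputURL = outputURL
    session.outputFileType = .mov
    session.timeRange = timeRange  // export only the frames with a trajectory
    session.exportAsynchronously {
        completion(session.error)
    }
}
```

To produce a single output file instead of one clip per range, the ranges can be appended into an AVMutableComposition via insertTimeRange(_:of:at:) and the composition exported once.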
Post not yet marked as solved
0 Replies
227 Views
wwdc20-10673 briefly shows how to visualize optical flow generated by VNGenerateOpticalFlowRequest and sample code is available through the developer app. But how can we build the OpticalFlowVisualizer.ci.metallib file from the CI-kernel code provided as OpticalFlowVisualizer.cikernel?
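Not from the session itself, but Apple's documentation on building Metal-based Core Image kernels describes a two-step command-line compile that may apply here, assuming the .cikernel file actually contains Metal shading language source (rename it with a .ci.metal extension first; the filenames below follow the post's):

```shell
# Assumption: OpticalFlowVisualizer.cikernel holds Metal CIKernel source.
cp OpticalFlowVisualizer.cikernel OpticalFlowVisualizer.ci.metal
# Compile with the Core Image kernel flag, then link into a .ci.metallib.
xcrun metal -c -fcikernel OpticalFlowVisualizer.ci.metal -o OpticalFlowVisualizer.ci.air
xcrun metallib -cikernel OpticalFlowVisualizer.ci.air -o OpticalFlowVisualizer.ci.metallib
```

If the file is instead written in the older Core Image Kernel Language, it is loaded at runtime from the source string via CIKernel(source:) rather than precompiled into a metallib.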