Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Vision Documentation

Posts under Vision tag

80 Posts
Post not yet marked as solved
3 Replies
624 Views
I'm getting the error: "The VNDetectorOption_OriginatingRequestSpecifier required option was not found" UserInfo={NSLocalizedDescription=The VNDetectorOption_OriginatingRequestSpecifier required option was not found}. I'm facing this error only on iOS 15, while finding observations.
Posted
by iOS15.
Last updated
.
Post not yet marked as solved
0 Replies
394 Views
I use VNDetectHumanBodyPoseRequest to detect a body in an image from my Xcode asset catalog (downloaded from an image website), but I get the error below:

2021-12-24 21:50:19.945976+0800 Guess My Exercise[91308:4258893] [espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "I/O error": Missing weights path cnn_human_pose.espresso.weights status=-2
Unable to perform the request: Error Domain=com.apple.vis Code=9 "Unable to setup request in VNDetectHumanBodyPoseRequest" UserInfo={NSLocalizedDescription=Unable to setup request in VNDetectHumanBodyPoseRequest}.

Below is my code:

let image = UIImage(named: "image2")
guard let cgImage = image?.cgImage else { return }
let requestHandler = VNImageRequestHandler(cgImage: cgImage)
let request = VNDetectHumanBodyPoseRequest(completionHandler: bodyPoseHandler)
do {
    // Perform the body pose-detection request.
    try requestHandler.perform([request])
} catch {
    print("Unable to perform the request: \(error).")
}

func bodyPoseHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNHumanBodyPoseObservation] else { return }
    // Process each observation to find the recognized body pose points.
    let poses = Pose.fromObservations(observations)
    self.drawPoses(poses, onto: self.simage!)
}
Posted
by zhouxinle.
Last updated
.
Post not yet marked as solved
0 Replies
361 Views
Modifying guidance given in an answer on AVFoundation + Vision trajectory detection, I'm instead saving the time ranges of frames that carry a specific ML label from my custom action classifier:

private lazy var detectHumanBodyPoseRequest: VNDetectHumanBodyPoseRequest = {
    let detectHumanBodyPoseRequest = VNDetectHumanBodyPoseRequest(completionHandler: completionHandler)
    return detectHumanBodyPoseRequest
}()

var timeRangesOfInterest: [Int: CMTimeRange] = [:]

private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, asset completionHandler: @escaping FinishHandler) {
    if isCancelled {
        completionHandler(.success(.cancelled))
        return
    }
    // Handle any error during processing of the video.
    guard sampleTransferError == nil else {
        assetReaderWriter.cancel()
        completionHandler(.failure(sampleTransferError!))
        return
    }
    // Evaluate the result of reading the samples.
    let result = assetReaderWriter.readingCompleted()
    if case .failure = result {
        completionHandler(result)
        return
    }
    /*
     Finish writing, and asynchronously evaluate the results from writing
     the samples.
    */
    assetReaderWriter.writingCompleted { result in
        self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.value }) { result in
            completionHandler(result)
        }
    }
}

func exportVideoTimeRanges(timeRanges: [CMTimeRange], completion: @escaping (Result<OperationStatus, Error>) -> Void) {
    let inputVideoTrack = self.asset.tracks(withMediaType: .video).first!
    let composition = AVMutableComposition()
    let compositionTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)!
    var insertionPoint: CMTime = .zero
    for timeRange in timeRanges {
        try! compositionTrack.insertTimeRange(timeRange, of: inputVideoTrack, at: insertionPoint)
        insertionPoint = insertionPoint + timeRange.duration
    }
    let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetHighestQuality)!
    try? FileManager.default.removeItem(at: self.outputURL)
    exportSession.outputURL = self.outputURL
    exportSession.outputFileType = .mov
    exportSession.exportAsynchronously {
        var result: Result<OperationStatus, Error>
        switch exportSession.status {
        case .completed:
            result = .success(.completed)
        case .cancelled:
            result = .success(.cancelled)
        case .failed:
            // The `error` property is non-nil in the `.failed` status.
            result = .failure(exportSession.error!)
        default:
            fatalError("Unexpected terminal export session status: \(exportSession.status).")
        }
        print("export finished: \(exportSession.status.rawValue) - \(exportSession.error)")
        completion(result)
    }
}

This worked fine with results vended from Apple's trajectory detection, but with my custom action classifier TennisActionClassifier (a Core ML model exported from Create ML), I get the console error:

getSubtractiveDecodeDuration signalled err=-16364 (kMediaSampleTimingGeneratorError_InvalidTimeStamp) (Decode timestamp is earlier than previous sample's decode timestamp.) at MediaSampleTimingGenerator.c:180

Why might this be?
Posted
by Curiosity.
Last updated
.
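A possible cause of the decode-timestamp error in the post above: timeRangesOfInterest is a dictionary, and mapping over it yields its values in arbitrary order, so segments can be inserted into the composition non-chronologically. A minimal hedged sketch of sorting the ranges before building the composition, assuming the same [Int: CMTimeRange] storage as in the post:

```swift
import AVFoundation

// Sort the stored ranges by start time before inserting them into the
// composition, so decode timestamps increase monotonically.
func sortedRanges(_ timeRangesOfInterest: [Int: CMTimeRange]) -> [CMTimeRange] {
    timeRangesOfInterest.values.sorted { CMTimeCompare($0.start, $1.start) < 0 }
}
```

If any of the sorted ranges overlap, they would additionally need to be coalesced before insertion, since inserting an overlapping span re-plays earlier media time.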
Post not yet marked as solved
0 Replies
257 Views
My goal is to mark any tennis video's timestamps of both the start of each rally/point and the end of each rally/point. I tried trajectory detection, but the "end time" is when the ball bounces rather than when the rally/point ends. I'm not quite sure what direction to go from here to improve on this. Would action classification of body poses in each frame (two classes, "playing" and "not playing") be the best way to split the video into segments? A different technique?
Posted
by Curiosity.
Last updated
.
Post marked as solved
5 Replies
1.5k Views
Hello everybody, I am trying to run inference on a Core ML model I created using Create ML. I am following the sample code provided by Apple on the Core ML documentation page, and every time I try to classify an image I get this error: "Could not create Espresso context". Has this ever happened to anyone? How did you solve it? Here is my code:

import Foundation
import Vision
import UIKit
import ImageIO

final class ButterflyClassification {

    var classificationResult: Result?

    lazy var classificationRequest: VNCoreMLRequest = {
        do {
            let model = try VNCoreMLModel(for: ButterfliesModel_1(configuration: MLModelConfiguration()).model)
            return VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
                self?.processClassification(for: request, error: error)
            })
        }
        catch {
            fatalError("Failed to load model.")
        }
    }()

    func processClassification(for request: VNRequest, error: Error?) {
        DispatchQueue.main.async {
            guard let results = request.results else {
                print("Unable to classify image.")
                return
            }
            let classifications = results as! [VNClassificationObservation]
            if classifications.isEmpty {
                print("No classification was provided.")
                return
            }
            else {
                let firstClassification = classifications[0]
                self.classificationResult = Result(speciesName: firstClassification.identifier, confidence: Double(firstClassification.confidence))
            }
        }
    }

    func classifyButterfly(image: UIImage) -> Result? {
        guard let ciImage = CIImage(image: image) else {
            fatalError("Unable to create ciImage")
        }
        DispatchQueue.global(qos: .userInitiated).async {
            let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
            do {
                try handler.perform([self.classificationRequest])
            }
            catch {
                print("Failed to perform classification.\n\(error.localizedDescription)")
            }
        }
        return classificationResult
    }
}

Thank you for your help!
Posted
by tmsm1999.
Last updated
.
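Separate from the Espresso error, note that classifyButterfly(image:) in the post above dispatches the request asynchronously but returns classificationResult immediately, so callers will almost always see a stale or nil result. One way to restructure it, sketched with a hypothetical completion-based signature (the model parameter and callback are assumptions, not the poster's API):

```swift
import UIKit
import Vision

// Hypothetical reshaping of the post's classifyButterfly(image:): the result
// is delivered only after Vision finishes, instead of being read back
// synchronously from a stored property.
func classifyButterfly(image: UIImage,
                       model: VNCoreMLModel,
                       completion: @escaping (VNClassificationObservation?) -> Void) {
    guard let ciImage = CIImage(image: image) else {
        completion(nil)
        return
    }
    let request = VNCoreMLRequest(model: model) { request, _ in
        // Hand the top candidate, if any, to the caller.
        completion(request.results?.first as? VNClassificationObservation)
    }
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
        try? handler.perform([request])
    }
}
```

The caller then reads the species name and confidence inside the completion closure rather than from a property that may not have been set yet.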
Post not yet marked as solved
0 Replies
281 Views
I'm building a feature to automatically edit out all the downtime of a tennis video. I have a partial implementation that stores the start and end times of Vision trajectory detections and writes only those segments to an AVFoundation export session. I've encountered a major issue: the returned trajectories end whenever the ball bounces, so each segment is just one tennis shot and nowhere close to an entire rally with multiple bounces. I'm unsure if I should continue down the trajectory route, maybe stitching together the trajectories and somehow only splitting at the start and end of a rally. Any general guidance would be appreciated. Is there a different Vision or ML approach that would more accurately model the start and end time of a rally? I considered creating a custom action classifier to classify frames as either "playing tennis" or "inactivity," but I started with Apple's trajectory detection since it was already built and trained. Maybe a custom classifier is needed, but I'm not sure.
Posted
by Curiosity.
Last updated
.
Post not yet marked as solved
1 Reply
364 Views
Given an AVAsset, I'm performing a Vision trajectory request on it and would like to write out a video asset that only contains frames with trajectories (to filter out downtime in sports footage where there's no ball moving). I'm unsure what would be a good approach, but as a starting point I tried the following pipeline:

Copy a sample buffer from the source AVAssetReaderOutput.
Perform the trajectory request on a Vision handler parameterized by the sample buffer.
For each resulting VNTrajectoryObservation (trajectory detected), use its associated CMTimeRange to configure a new AVAssetReader set to that time range.
Append the time-range-constrained sample buffer to one AVAssetWriterInput until the forEach is complete.

In code:

private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                           to writerInput: AVAssetWriterInput,
                                           onQueue queue: DispatchQueue,
                                           sampleBufferProcessor: SampleBufferProcessor,
                                           completionHandler: @escaping () -> Void) {
    /*
     The writerInput continuously invokes this closure until finished or
     cancelled. It throws an NSInternalInconsistencyException if called
     more than once for the same writer.
    */
    writerInput.requestMediaDataWhenReady(on: queue) {
        var isDone = false
        /*
         While the writerInput accepts more data, process the sampleBuffer
         and then transfer the processed sample to the writerInput.
        */
        while writerInput.isReadyForMoreMediaData {
            if self.isCancelled {
                isDone = true
                break
            }
            // Get the next sample from the asset reader output.
            guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                // The asset reader output has no more samples to vend.
                isDone = true
                break
            }
            let visionHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: self.orientation, options: [:])
            do {
                try visionHandler.perform([self.detectTrajectoryRequest])
                if let results = self.detectTrajectoryRequest.results {
                    try results.forEach { result in
                        let assetReader = try AVAssetReader(asset: self.asset)
                        assetReader.timeRange = result.timeRange
                        let trackOutput = AVTrackOutputs.firstTrackOutput(ofType: .video, fromTracks: self.asset.tracks, withOutputSettings: nil)
                        assetReader.add(trackOutput)
                        assetReader.startReading()
                        guard let sampleBuffer = trackOutput.copyNextSampleBuffer() else {
                            // The asset reader output has no more samples to vend.
                            isDone = true
                            return
                        }
                        // Append the sample to the asset writer input.
                        guard writerInput.append(sampleBuffer) else {
                            /*
                             The writer could not append the sample buffer.
                             The `readingAndWritingDidFinish()` function handles
                             any error information from the asset writer.
                            */
                            isDone = true
                            return
                        }
                    }
                }
            } catch {
                print(error)
            }
        }
        if isDone {
            /*
             Calling `markAsFinished()` on the asset writer input does the following:
             1. Unblocks any other inputs needing more samples.
             2. Cancels further invocations of this "request media data" callback block.
            */
            writerInput.markAsFinished()
            /*
             Tell the caller the reader output and writer input finished
             transferring samples.
            */
            completionHandler()
        }
    }
}

private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, completionHandler: @escaping FinishHandler) {
    if isCancelled {
        completionHandler(.success(.cancelled))
        return
    }
    // Handle any error during processing of the video.
    guard sampleTransferError == nil else {
        assetReaderWriter.cancel()
        completionHandler(.failure(sampleTransferError!))
        return
    }
    // Evaluate the result of reading the samples.
    let result = assetReaderWriter.readingCompleted()
    if case .failure = result {
        completionHandler(result)
        return
    }
    /*
     Finish writing, and asynchronously evaluate the results from writing
     the samples.
    */
    assetReaderWriter.writingCompleted { result in
        completionHandler(result)
        return
    }
}

When run, no error is caught in the first catch clause, none are caught in readingAndWritingDidFinish(assetReaderWriter:completionHandler:), and the completion handler is called. Help with any of the following questions would be appreciated: What is causing what appears to be indefinite loading? How might I isolate the problem further? Am I misusing or misunderstanding how to selectively read from time ranges of AVAssetReader objects? Should I forego the AVAssetReader / AVAssetWriter route entirely, and use the time ranges with AVAssetExportSession instead? I don't know how the two approaches compare, or what to consider when choosing between the two.
Posted
by Curiosity.
Last updated
.
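For comparison with the reader/writer pipeline in the post above, exporting a single detected time range is considerably simpler with AVAssetExportSession, whose timeRange property constrains the export to that span of the source. A hedged sketch (the function name and minimal error handling are assumptions for illustration):

```swift
import AVFoundation

/// Exports just one time range of `asset` to `outputURL` as a QuickTime movie.
/// Sketch only: overwrite checks and richer error handling are omitted.
func exportSegment(of asset: AVAsset,
                   in timeRange: CMTimeRange,
                   to outputURL: URL,
                   completion: @escaping (Error?) -> Void) {
    guard let session = AVAssetExportSession(asset: asset,
                                             presetName: AVAssetExportPresetHighestQuality) else {
        completion(nil)
        return
    }
    session.outputURL = outputURL
    session.outputFileType = .mov
    session.timeRange = timeRange // Only this span of the source is exported.
    session.exportAsynchronously {
        // `session.error` is non-nil only when the export failed.
        completion(session.error)
    }
}
```

The trade-off versus AVAssetReader/AVAssetWriter: the export session re-encodes with a fixed preset and offers less control over samples, but it avoids the per-buffer bookkeeping entirely.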
Post not yet marked as solved
0 Replies
249 Views
I am saving time ranges from an input video asset where trajectories are found, then exporting only those segments to an output video file. Currently I track these time ranges in a stored property var timeRangesOfInterest: [Double: CMTimeRange], which is set in the trajectory request's completion handler:

func completionHandler(request: VNRequest, error: Error?) {
    guard let request = request as? VNDetectTrajectoriesRequest else { return }
    if let results = request.results,
       results.count > 0 {
        for result in results {
            var timeRange = result.timeRange
            timeRange.start = timeRange.start - self.assetWriterStartTime
            self.timeRangesOfInterest[timeRange.start.seconds] = timeRange
        }
    }
}

Then these time ranges of interest are used in an export session to export only those segments:

/*
 Finish writing, and asynchronously evaluate the results from writing
 the samples.
*/
assetReaderWriter.writingCompleted { result in
    self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.1 }) { result in
        completionHandler(result)
    }
}

Unfortunately, however, I'm getting repeated trajectory video segments in the output video. Is this maybe because trajectory requests return "in progress" repeated trajectory results with slightly different time range start times? What might be a good strategy for avoiding or removing them? I noticed trajectory segments appear out of order in the output as well.
Posted
by Curiosity.
Last updated
.
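Both symptoms described above (duplicate and out-of-order segments) are plausible given the storage scheme: a trajectory that is still in progress is re-reported with a slightly different time range, and a dictionary's values come out in arbitrary order. A hedged sketch that keeps only the latest observation per trajectory, keying by the observation's uuid (which stays stable while the same trajectory is updated), and sorts before export; the names and offset parameter are illustrative assumptions:

```swift
import AVFoundation
import Vision

var latestRangeForTrajectory: [UUID: CMTimeRange] = [:]

// Called from the VNDetectTrajectoriesRequest completion handler. Keying by
// `uuid` means an in-progress trajectory overwrites its own earlier, shorter
// time range instead of adding a near-duplicate entry.
func record(_ observations: [VNTrajectoryObservation], offsetBy startTime: CMTime) {
    for observation in observations {
        var timeRange = observation.timeRange
        timeRange.start = timeRange.start - startTime
        latestRangeForTrajectory[observation.uuid] = timeRange
    }
}

// Sort by start before handing the ranges to the export step, since
// dictionary iteration order is arbitrary.
func rangesForExport() -> [CMTimeRange] {
    latestRangeForTrajectory.values.sorted { CMTimeCompare($0.start, $1.start) < 0 }
}
```

Any ranges that still overlap after this pass would need to be merged, since inserting overlapping ranges back-to-back replays the shared frames.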
Post not yet marked as solved
0 Replies
227 Views
wwdc20-10673 briefly shows how to visualize optical flow generated by VNGenerateOpticalFlowRequest, and sample code is available through the Developer app. But how can we build the OpticalFlowVisualizer.ci.metallib file from the CIKernel code provided as OpticalFlowVisualizer.cikernel?
Posted
by dabx.
Last updated
.
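For Metal-based Core Image kernels, the .ci.metallib is produced by the Metal toolchain with the Core Image kernel flags; a .cikernel file written in the older CIKernel language is instead loaded at runtime via CIKernel(source:), with no metallib step. A sketch of the build commands for the Metal route (the source file name is an assumption based on the post):

```
# Compile Metal CIKernel source to AIR with the Core Image kernel flag,
# then link it into a metallib the app can load at runtime.
xcrun metal -c -fcikernel OpticalFlowVisualizer.ci.metal -o OpticalFlowVisualizer.ci.air
xcrun metallib -cikernel OpticalFlowVisualizer.ci.air -o OpticalFlowVisualizer.ci.metallib
```

In Xcode these two flags can equivalently be set as the Other Metal Compiler Flags and Other Metal Linker Flags build settings, so the metallib is produced as part of the normal build.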
Post not yet marked as solved
0 Replies
410 Views
I'm using Vision to conduct some OCR from a live camera feed. I've set up my VNRecognizeTextRequest as follows:

let request = VNRecognizeTextRequest(completionHandler: recognizeTextCompletionHandler)
request.recognitionLevel = .accurate
request.usesLanguageCorrection = false

And I handle the results as follows:

guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
for observation in observations {
    if let recognizedText = observation.topCandidates(1).first {
        guard recognizedText.confidence >= self.confidenceLimit, // set to 0.5
              let foundText = validateRegexPattern(text: recognizedText.string, regexPattern: self.regexPattern),
              let foundDecimal = Double(foundText) else { continue }
    }
}

This is actually working great and yielding very accurate results, but the confidence values I'm receiving are generally either 0.5 or 1.0, and rarely 0.3. I find these to be pretty nonsensical confidence values, and I'm wondering if this is the intended result or some sort of bug. Conversely, using recognitionLevel = .fast yields more realistic and varied confidence values, but much less accurate results overall. Even though fast is recommended for OCR from a live camera feed, I've had significantly better results with the accurate recognition level, which is why I've stuck with it.
Posted
by ctj388.
Last updated
.
Post not yet marked as solved
1 Reply
339 Views
Apple's sample code Identifying Trajectories in Video contains the following delegate callback:

func cameraViewController(_ controller: CameraViewController, didReceiveBuffer buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) {
    let visionHandler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
    if gameManager.stateMachine.currentState is GameManager.TrackThrowsState {
        DispatchQueue.main.async {
            // Get the frame of the rendered view.
            let normalizedFrame = CGRect(x: 0, y: 0, width: 1, height: 1)
            self.jointSegmentView.frame = controller.viewRectForVisionRect(normalizedFrame)
            self.trajectoryView.frame = controller.viewRectForVisionRect(normalizedFrame)
        }
        // Perform the trajectory request in a separate dispatch queue.
        trajectoryQueue.async {
            do {
                try visionHandler.perform([self.detectTrajectoryRequest])
                if let results = self.detectTrajectoryRequest.results {
                    DispatchQueue.main.async {
                        self.processTrajectoryObservations(controller, results)
                    }
                }
            } catch {
                AppError.display(error, inViewController: self)
            }
        }
    }
}

However, instead of drawing UI whenever detectTrajectoryRequest.results exist (https://developer.apple.com/documentation/vision/vndetecttrajectoriesrequest/3675672-results), I'm interested in using the CMTimeRange provided by each result to construct a new video. In effect, this would filter the original video down to only the frames with trajectories. How might I accomplish this, perhaps by writing only specific time ranges' frames from one AVFoundation video to a new one?
Posted
by Curiosity.
Last updated
.
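One way to approach the question above, sketched under the assumption that the trajectory time ranges have already been collected, sorted, and de-overlapped: build an AVMutableComposition that concatenates just those ranges of the source video track, then hand the composition to an export session.

```swift
import AVFoundation

/// Builds a composition containing only `timeRanges` of `asset`'s video track.
/// Sketch only: assumes the ranges are sorted and non-overlapping.
func composition(of asset: AVAsset, keeping timeRanges: [CMTimeRange]) throws -> AVMutableComposition {
    let composition = AVMutableComposition()
    guard let sourceTrack = asset.tracks(withMediaType: .video).first,
          let compositionTrack = composition.addMutableTrack(withMediaType: .video,
                                                             preferredTrackID: kCMPersistentTrackID_Invalid) else {
        return composition
    }
    var cursor = CMTime.zero
    for range in timeRanges {
        // Each kept range is appended back-to-back, removing the downtime between them.
        try compositionTrack.insertTimeRange(range, of: sourceTrack, at: cursor)
        cursor = cursor + range.duration
    }
    return composition
}
```

The resulting composition can be exported with AVAssetExportSession, which avoids driving AVAssetReader/AVAssetWriter by hand for this use case.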
Post not yet marked as solved
0 Replies
272 Views
Context: A SwiftUI app that uses the phone's camera to detect what hand sign I am making in front of it and update a Text view in SwiftUI. The detection part is done in a UIViewController (primarily with Vision), and that view is used in the main ContentView in SwiftUI via UIViewControllerRepresentable.

Problem: I am able to print what hand sign I am making in front of the screen in the UIViewController, but not able to send that value to update the text label in SwiftUI. Here are my three main code files that pertain to this issue:

ContentView.swift

struct ContentView: View {
    @State var text = ""
    var body: some View {
        ZStack {
            Color(hex: "3E065F")
                .ignoresSafeArea()
            VStack {
                Text(text)
                    .foregroundColor(Color.white)
                    .font(.largeTitle)
                MyCameraView(value: $text)
            }
        }
    }
}

MyCameraView.swift

struct MyCameraView: UIViewControllerRepresentable {
    @Binding var value: String
    func makeUIViewController(context: Context) -> CameraViewController {
        let cvc = CameraViewController()
        return cvc
    }
    func updateUIViewController(
        _ uiViewController: CameraViewController,
        context: Context
    ) {
        value = uiViewController.currText // Only called once!
    }
}

CameraViewController.swift

final class CameraViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    var currText = "A"
    ...
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        ...
        guard let handSignsModel = try? VNCoreMLModel(for: SavedModel().model) else { print("Fail"); return }
        let request = VNCoreMLRequest(model: handSignsModel) { (finishedRequest, err) in
            guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
            //print(results.first?.identifier)
            self.currText = results.first!.identifier
            print(self.currText)
        }
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }
}

Please look at the print(self.currText) statement in the last file. I want to pass that value (every frame) to update the text in ContentView. I tried to use the updateUIViewController method in MyCameraView.swift, but it does not get called to update the text label every frame, only the first time the view loads.
Posted
by DecoderDE.
Last updated
.
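In the post above, updateUIViewController flows data from SwiftUI into the controller, not the other way around, so it will not fire when currText changes. One common pattern is to push each new label back into the binding through a callback. A hedged sketch, assuming CameraViewController gains a hypothetical onPrediction closure property that its Vision completion handler calls:

```swift
import SwiftUI

// Hypothetical reshaping of MyCameraView: the controller exposes a callback,
// and the representable routes each predicted label into the SwiftUI binding.
struct MyCameraView: UIViewControllerRepresentable {
    @Binding var value: String

    func makeUIViewController(context: Context) -> CameraViewController {
        let cvc = CameraViewController()
        // Assumes `var onPrediction: ((String) -> Void)?` exists on the
        // controller and is invoked with each classification result.
        cvc.onPrediction = { label in
            DispatchQueue.main.async { self.value = label }
        }
        return cvc
    }

    func updateUIViewController(_ uiViewController: CameraViewController, context: Context) {}
}
```

An ObservableObject shared between the controller and the view is an equivalent design; either way, the update must reach SwiftUI on the main queue.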
Post not yet marked as solved
0 Replies
624 Views
I am trying to train an object detection model using transfer learning with a small dataset (roughly 650 images and two classes) using Create ML v2.0 (53.2.2) with "prefer external GPU" checked. I am using a 2018 Mac mini (3.2 GHz i7, 16 GB of RAM) and an AMD Radeon Pro 580 eGPU. The problem I am having is that I can only do about 3,500 iterations before I run out of memory and need to pause the training. When I resume training, my loss increases again, and it takes a while for it to get back down to where it was before I paused. So I am wondering if there is a better way to set up the hardware, or any other suggestions, so I can get through all of the iterations without having to pause. I don't recall having this issue with Create ML v1.0, so any suggestions would be appreciated.
Posted
by bbarry.
Last updated
.
Post not yet marked as solved
2 Replies
390 Views
Hi, I have seen this video: https://developer.apple.com/videos/play/wwdc2021/10041/ and in my project I am trying to draw detected barcodes. I am using the Vision framework and I have the barcode position in the boundingBox parameter, but I don't understand the CGRect of that parameter. I am programming in Objective-C and I don't see resources for it. As a further complication, I don't have an image; I am capturing barcodes from a video camera session. In two parts: 1. How can I draw a detected barcode like in the video (from an image)? 2. How can I draw a detected barcode in a capture session? I have used VNImageRectForNormalizedRect to go from normalized to pixel coordinates, but the result is not correct. Thank you very much.
Posted Last updated
.
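A common pitfall with boundingBox in the post above: Vision's rectangles are normalized (0 to 1) with the origin at the bottom-left, while UIKit's origin is at the top-left, so after VNImageRectForNormalizedRect the rect still needs a vertical flip. A sketch in Swift (the same Vision functions exist in Objective-C; the helper name is an assumption):

```swift
import UIKit
import Vision

/// Converts a Vision boundingBox (normalized, bottom-left origin) into a
/// UIKit rect (pixels, top-left origin) for an image of the given size.
func uiKitRect(for boundingBox: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    // Scale from the 0-1 normalized space up to pixel coordinates.
    let pixelRect = VNImageRectForNormalizedRect(boundingBox, imageWidth, imageHeight)
    // Flip vertically: Vision's y grows upward, UIKit's y grows downward.
    return CGRect(x: pixelRect.origin.x,
                  y: CGFloat(imageHeight) - pixelRect.origin.y - pixelRect.height,
                  width: pixelRect.width,
                  height: pixelRect.height)
}
```

For the live-capture case, AVCaptureVideoPreviewLayer's layerRectConverted(fromMetadataOutputRect:) handles rotation and video gravity for you; since metadata-output coordinates are top-left-origin, flip the Vision rect's y (y = 1 - maxY) before converting.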
Post not yet marked as solved
0 Replies
210 Views
VNContoursObservation is taking 715 times as long as OpenCV's findContours() when creating directly comparable results. VNContoursObservation creates comparable results when I set the maximumImageDimension property to 1024. If I set it lower, it runs a bit faster, but creates lower-quality contours and still takes over 100 times as long. I have a hard time believing Apple doesn't know what they are doing, so does anyone have an idea what is going on and how to get it to run much faster? There don't seem to be many options, and nothing I've tried closes the gap. Setting the detectsDarkOnLight property to true makes it run even slower. OpenCV's findContours runs on a binary image, but I am passing an RGB image to Vision, assuming it would convert it to an appropriate format.

OpenCV:

double taskStart = CFAbsoluteTimeGetCurrent();
int contoursApproximation = CV_CHAIN_APPROX_NONE;
int contourRetrievalMode = CV_RETR_LIST;
findContours(input, contours, hierarchy, contourRetrievalMode, contoursApproximation, cv::Point(0,0));
NSLog(@"###### opencv findContours: %f", CFAbsoluteTimeGetCurrent() - taskStart);

###### opencv findContours: 0.017616 seconds

Vision:

let taskStart = CFAbsoluteTimeGetCurrent()
let contourRequest = VNDetectContoursRequest.init()
contourRequest.revision = VNDetectContoursRequestRevision1
contourRequest.contrastAdjustment = 1.0
contourRequest.detectsDarkOnLight = false
contourRequest.maximumImageDimension = 1024
let requestHandler = VNImageRequestHandler.init(cgImage: sourceImage.cgImage!, options: [:])
try! requestHandler.perform([contourRequest])
let contoursObservation = contourRequest.results?.first as! VNContoursObservation
print(" ###### contoursObservation: \(CFAbsoluteTimeGetCurrent() - taskStart)")

###### contoursObservation: 12.605962038040161

The image I am providing OpenCV is 2048 pixels and the image I am providing Vision is 1024.
Posted
by 3DTOPO.
Last updated
.
Post not yet marked as solved
1 Reply
362 Views
Hi, is it possible to get the code for the demo app used in this presentation for the dynamic style transfer example, please? Thanks.
Posted
by Saddif.
Last updated
.
Post not yet marked as solved
1 Reply
578 Views
Hello, I found a bug using Vision for face detection on the iPhone SE (2nd generation). When I run the Apple demo project for face detection (https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time), no face is detected and I get the following error logs:

2021-04-16 10:36:29.110317+0200 VisionFaceTrack[2899:16783309] Metal API Validation Enabled
2021-04-16 10:36:29.443963+0200 VisionFaceTrack[2899:16783492] [espresso] [Espresso::handle_ex_plan] exception=ANECF error: loadModel:sandboxExtension:options:qos:withReply:: Program load failure FaceDetection error: Optional(Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}).
2021-04-16 10:36:29.445968+0200 VisionFaceTrack[2899:16783492] Failed to perform FaceRectangleRequest: Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}

(The same ANECF Program load failure and Failed to perform FaceRectangleRequest pair repeats for every subsequent frame.)

This bug is very annoying because it breaks all apps using Vision, at least on this iPhone and iOS version. I have an app using this technology on the App Store; that's how I first discovered the problem, and I am hoping for a quick fix.
iOS version: 14.4 (18D52)
iPhone SE (2nd generation)
Xcode version: 12.4 (12D4e)
Edit: this seems to be solved with iOS 14.4.2. Can we have more information about this bug? Is it specific to this device model, or will all devices on iOS 14.4 have it?
Posted
by Hschou.
Last updated
.