Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Vision Documentation

Posts under Vision tag

111 results found
Post not yet marked as solved
205 Views

Vision Framework rightLeg detecting incorrectly

This sounds insane, but I'm unable to detect the right leg with the Vision framework.

```swift
import Vision

let rightLeg = try observation.recognizedPoints(.rightLeg)
print(rightLeg)
```

The result is:

```
[__C.VNHumanBodyPoseObservationJointName(_rawValue: left_foot_joint): [0.000000; 1.000000],
 __C.VNHumanBodyPoseObservationJointName(_rawValue: left_leg_joint): [0.751968; 0.281050],
 __C.VNHumanBodyPoseObservationJointName(_rawValue: left_upLeg_joint): [0.767090; 0.715324]]
```

As you can see, it seems to be tracking the left leg. The result is similar if I change the group name to .leftLeg. .leftArm and .rightArm work as expected; only .rightLeg is not working. I've tested detection on several people with the same results. How is this possible?
Asked by shmck.
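A hedged workaround sketch for the question above (the helper names are mine, and the substring filter is an assumption — note it would miss right_foot_joint, whose name contains no "leg"): request .all and filter joints by name instead of relying on the .rightLeg group.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

// Pure helper: keep only entries whose key names a given side and limb,
// e.g. "right_upLeg_joint" for side "right", limb "leg".
func jointsMatching<V>(_ all: [String: V], side: String, limb: String) -> [String: V] {
    all.filter { key, _ in
        let k = key.lowercased()
        return k.contains(side.lowercased()) && k.contains(limb.lowercased())
    }
}

#if canImport(Vision)
// Hypothetical workaround: ignore the .rightLeg group and filter .all instead.
@available(iOS 14.0, macOS 11.0, *)
func rightLegPoints(from observation: VNHumanBodyPoseObservation) throws -> [String: VNRecognizedPoint] {
    let all = try observation.recognizedPoints(.all)
    let byName = Dictionary(uniqueKeysWithValues: all.map { ($0.key.rawValue.rawValue, $0.value) })
    return jointsMatching(byName, side: "right", limb: "leg")
}
#endif
```

Foot joints (right_foot_joint, right_ankle_joint) would need their own filter pass if you need the full leg group.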
Post not yet marked as solved
22 Views

Function like template matching in OpenCV

Hi, is it possible to do the same thing as "template matching" in OpenCV using Swift and the Vision framework? I don't want to go to object recognition in ML because of the accuracy issue.
Asked by Yuqi.
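Vision has no direct template-matching call, but one hedged alternative (a sketch, not a drop-in OpenCV replacement; the function names are mine) is VNGenerateImageFeaturePrintRequest, whose feature-print distances act as a similarity score. Picking the best candidate is then a pure minimum search:

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

// Pure helper: choose the candidate with the smallest feature-print distance.
func bestMatch(_ distances: [String: Float]) -> (name: String, distance: Float)? {
    distances.min(by: { $0.value < $1.value }).map { ($0.key, $0.value) }
}

#if canImport(Vision)
// Sketch: compute a feature print for one CGImage.
@available(iOS 13.0, macOS 10.15, *)
func featurePrint(for image: CGImage) throws -> VNFeaturePrintObservation? {
    let request = VNGenerateImageFeaturePrintRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    return request.results?.first as? VNFeaturePrintObservation
}

// Distance between two feature prints (smaller means more similar).
@available(iOS 13.0, macOS 10.15, *)
func distance(_ a: VNFeaturePrintObservation, _ b: VNFeaturePrintObservation) throws -> Float {
    var d: Float = 0
    try a.computeDistance(&d, to: b)
    return d
}
#endif
```

Unlike OpenCV's matchTemplate this gives whole-image similarity, not the template's location within a larger image, so it suits "which catalog image is this?" rather than "where is the template?".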
Post not yet marked as solved
60 Views

iOS 15 Vision person segmentation

I tried the sample code "Applying Matte Effects to People in Images and Videos" on an iPhone 12 mini, but it's not accurate near the boundaries (especially hair). I even tried the .accurate segmentation quality level, which causes the iPhone to overheat quickly, but the segmentation is still not good enough for live video. One thing that may matter: the segmentation results are not as good as matting, which applies an alpha channel so that hair blends accurately with the background. But if I am missing something, please do point it out.
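For reference, a hedged sketch of generating the person mask directly, choosing the quality level by use case as above — .accurate for stills, .balanced for live video (the helper names and the quality heuristic are mine):

```swift
import Foundation
#if canImport(Vision)
import Vision
import CoreVideo
#endif

// Hypothetical heuristic: name the quality level to use per use case.
func segmentationQuality(isLiveVideo: Bool) -> String {
    isLiveVideo ? "balanced" : "accurate"
}

#if canImport(Vision)
// Sketch: produce a one-channel person mask for a frame.
@available(iOS 15.0, macOS 12.0, *)
func personMask(in pixelBuffer: CVPixelBuffer, live: Bool) throws -> CVPixelBuffer? {
    let request = VNGeneratePersonSegmentationRequest()
    request.qualityLevel = live ? .balanced : .accurate
    request.outputPixelFormat = kCVPixelFormatType_OneComponent8
    try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    return (request.results?.first as? VNPixelBufferObservation)?.pixelBuffer
}
#endif
```

As the question notes, this is a segmentation mask, not a matte: fine hair detail comparable to AVFoundation's Portrait matte is beyond what the request produces.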
Post not yet marked as solved
44 Views

Is it possible to get a bounding box for each character in VNRecognizedTextObservation

Mostly with Chinese characters, Vision recognizes a line of text as a single 'word' when in fact there could be two or more. For example, this string (肖丹销售部銷售经理) includes a name (the first 2 characters) and a job title (everything else). The first 2 characters are about twice the height of the others. I've been trying to break this string into two, but I can't find a way to do it because the bounding box relates to the whole 'word' and not to each character. If I could get each character's bounding box, I could compare them and decide to split into multiple strings when appropriate. I also tried running VNDetectTextRectanglesRequest, but the results rarely match what you get with VNRecognizeTextRequest; for example, these 9 characters return 12 VNTextObservations. Does anyone have an idea? Thanks.
Asked by LCG.
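VNRecognizedText does expose per-range boxes via boundingBox(for:), which seems to be exactly what the question asks for. A sketch (the height-jump splitting heuristic and its 1.5 ratio are my assumptions):

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

// Pure helper: index of the first character whose height differs from its
// predecessor by more than `ratio` - a hypothetical cue for splitting a
// large-type name from a smaller job title.
func splitIndex(heights: [Double], ratio: Double = 1.5) -> Int? {
    guard heights.count > 1 else { return nil }
    for i in 1..<heights.count {
        let a = heights[i - 1], b = heights[i]
        if max(a, b) / min(a, b) > ratio { return i }
    }
    return nil
}

#if canImport(Vision)
// Sketch: one bounding box per character of a recognized candidate.
@available(iOS 13.0, macOS 10.15, *)
func characterBoxes(of candidate: VNRecognizedText) throws -> [CGRect] {
    let text = candidate.string
    var boxes: [CGRect] = []
    var i = text.startIndex
    while i < text.endIndex {
        let next = text.index(after: i)
        if let obs = try candidate.boundingBox(for: i..<next) {
            boxes.append(obs.boundingBox)  // normalized image coordinates
        }
        i = next
    }
    return boxes
}
#endif
```

Feeding the box heights into splitIndex would then give the position at which to cut the string into name and title.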
Post marked as solved
314 Views

CoreML Inference Error: "Could not create Espresso context"

Hello everybody, I am trying to run inference on a Core ML model I created using Create ML. I am following the sample code provided on the Core ML documentation page, and every time I try to classify an image I get this error: "Could not create Espresso context". Has this ever happened to anyone? How did you solve it? Here is my code:

```swift
import Foundation
import Vision
import UIKit
import ImageIO

final class ButterflyClassification {

    var classificationResult: Result?

    lazy var classificationRequest: VNCoreMLRequest = {
        do {
            let model = try VNCoreMLModel(for: ButterfliesModel_1(configuration: MLModelConfiguration()).model)
            return VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
                self?.processClassification(for: request, error: error)
            })
        } catch {
            fatalError("Failed to load model.")
        }
    }()

    func processClassification(for request: VNRequest, error: Error?) {
        DispatchQueue.main.async {
            guard let results = request.results else {
                print("Unable to classify image.")
                return
            }
            let classifications = results as! [VNClassificationObservation]
            if classifications.isEmpty {
                print("No classification was provided.")
            } else {
                let firstClassification = classifications[0]
                self.classificationResult = Result(speciesName: firstClassification.identifier,
                                                   confidence: Double(firstClassification.confidence))
            }
        }
    }

    func classifyButterfly(image: UIImage) -> Result? {
        guard let ciImage = CIImage(image: image) else {
            fatalError("Unable to create ciImage")
        }
        DispatchQueue.global(qos: .userInitiated).async {
            let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
            do {
                try handler.perform([self.classificationRequest])
            } catch {
                print("Failed to perform classification.\n\(error.localizedDescription)")
            }
        }
        return classificationResult
    }
}
```

Thank you for your help!
Asked by tmsm1999.
Post not yet marked as solved
52 Views

VNStatefulRequest in Core Image

The new VNGeneratePersonSegmentationRequest is a stateful request, i.e. it keeps state and improves the segmentation mask generation for subsequent frames. There is also the new CIPersonSegmentationFilter as a convenient way to use the API with Core Image. But since the Vision request is stateful, I was wondering how this is handled by the Core Image filter. Does the filter also keep state between subsequent calls? And how is the "The request requires the use of CMSampleBuffers with timestamps as input" requirement of VNStatefulRequest ensured?
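I don't know the Core Image filter's internals either, but for comparison, a sketch of driving the stateful Vision request directly with timestamped sample buffers, so the state handling is explicit on your side (the class name is mine):

```swift
import Foundation
#if canImport(Vision)
import Vision
import CoreMedia

@available(iOS 15.0, macOS 12.0, *)
final class SegmentationStream {
    // Keep one request instance alive across frames so Vision can use its
    // temporal state to stabilize successive masks.
    private let request = VNGeneratePersonSegmentationRequest()

    func mask(for sampleBuffer: CMSampleBuffer) throws -> CVPixelBuffer? {
        // CMSampleBuffers carry presentation timestamps, which satisfies the
        // VNStatefulRequest timestamp requirement quoted above.
        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, options: [:])
        try handler.perform([request])
        return (request.results?.first as? VNPixelBufferObservation)?.pixelBuffer
    }
}
#endif

// Pure sanity check: stateful requests assume strictly increasing timestamps.
func timestampsAreMonotonic(_ seconds: [Double]) -> Bool {
    guard seconds.count > 1 else { return true }
    for i in 1..<seconds.count where seconds[i] <= seconds[i - 1] { return false }
    return true
}
```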
Post not yet marked as solved
73 Views

Meal App Personalization Sample Code

Hello! I was wondering if it would be possible for the sample code for the Meal App to be posted. There are some things I'd like to see regarding MLLinearRegressor and how models can be personalized with context and data.
Asked by SAIK1065.
Post not yet marked as solved
143 Views

Getting object data (length, width, height, etc.) from 3d objects/PhotogrammetrySession

Hi, I was wondering if the new PhotogrammetrySession will allow us developers to obtain object data like length, width, and height after 3D reconstruction? Is this functionality present, or will we have to compute that information manually based on depth, focal length, etc.?
Asked by jzooms.
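I don't believe the session reports dimensions directly, but as a hedged fallback, once the generated model is loaded you can derive extents from its axis-aligned bounds yourself (a sketch with plain min/max points standing in for something like a RealityKit entity's visualBounds; the type and function names are mine):

```swift
import Foundation

// Width/height/depth of an axis-aligned bounding box, e.g. the min/max
// corners of a loaded model's bounds, in the model's own units.
struct Extents {
    let width: Double
    let height: Double
    let depth: Double
}

func extents(min: (x: Double, y: Double, z: Double),
             max: (x: Double, y: Double, z: Double)) -> Extents {
    Extents(width: max.x - min.x,
            height: max.y - min.y,
            depth: max.z - min.z)
}
```

Whether those units correspond to real-world meters depends on the capture pipeline providing scale (e.g. depth data at capture time), so the numbers may need calibration.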
Post marked as solved
148 Views

iPhone 7 limited functionality?

I installed the new iOS beta on an iPhone 7, and I don't see the new details in maps, and I don't see the new Live Text features in photos / camera. I didn't see anything in the release notes about required hardware...
Asked by tdc.
Post not yet marked as solved
76 Views

Vision, CoreImage, ML

We have around 2,000 2D catalog .jpg photos of fitness equipment (with no background). When we take photos of the real equipment, we would like to identify whether the photo is of the same kind of object (not a car or a motorcycle) and also blur its background. Blurring backgrounds is easy with Vision in portrait mode, but only for human faces. What is the best approach in this scenario? We would appreciate some pointers or guidelines. Thanks!
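One hedged direction for the question above: VNClassifyImageRequest can check that the photo is the right kind of object, and an objectness-based saliency mask (rather than person segmentation) can isolate the main object so its background can be blurred. A sketch — the keyword list, threshold, and helper names are my assumptions:

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

// Pure helper: does any sufficiently confident label mention the product
// category? Keywords and threshold are illustration values only.
func matchesCategory(labels: [(identifier: String, confidence: Double)],
                     keywords: [String],
                     minConfidence: Double = 0.3) -> Bool {
    labels.contains { label in
        label.confidence >= minConfidence &&
        keywords.contains { label.identifier.lowercased().contains($0) }
    }
}

#if canImport(Vision)
// Sketch: a saliency mask highlighting the dominant object, usable as the
// mask input to a blur (e.g. CIBlendWithMask over a blurred copy).
@available(iOS 13.0, macOS 10.15, *)
func saliencyMask(for image: CGImage) throws -> CVPixelBuffer? {
    let request = VNGenerateObjectnessBasedSaliencyImageRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    return (request.results?.first as? VNSaliencyImageObservation)?.pixelBuffer
}
#endif
```

The saliency mask is coarse (a small heat map), so it would need upscaling and edge softening before it looks like portrait-mode blur.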
Post not yet marked as solved
69 Views

Comparing Faces in iOS.

In my iOS application I have two photos: one grabbed from the user's identity card, and another a selfie the user takes. The task is then to compare those two faces and get a similarity response, i.e. whether they refer to the same person or not. I was wondering whether, apart from bundling an ML model into my app, there is a built-in framework or class I can use in Swift?
Asked by iMoei941.
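As far as I know, Vision has no built-in identity verification; it can detect faces and landmarks, but matching identities needs a face-recognition model. A hedged, rough heuristic (not suitable for real identity checks; helper names and threshold are mine) is to crop the detected face rectangles and compare generic feature prints:

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

// Pure helper: turn a feature-print distance into a same/different verdict.
// The threshold is a made-up illustration value, not a calibrated one.
func sameFace(distance: Float, threshold: Float = 0.5) -> Bool {
    distance < threshold
}

#if canImport(Vision)
// Sketch: detect the face rectangle so the comparison can run on a crop.
func faceRect(in image: CGImage) throws -> CGRect? {
    let request = VNDetectFaceRectanglesRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    let faces = request.results as? [VNFaceObservation]
    return faces?.first?.boundingBox  // normalized image coordinates
}
#endif
```

Generic feature prints capture overall image similarity, not facial identity, so for an ID-verification use case a dedicated (likely third-party or custom Core ML) face-embedding model is the realistic option.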
Post not yet marked as solved
150 Views

GPU selection for Vision

I know that it's possible to select the GPU on which to run the Metal code. But is it possible to select the GPU for Vision?
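I'm not aware of a Vision-wide GPU selector, but for Core ML-backed Vision requests (VNCoreMLRequest), one hedged route is MLModelConfiguration, which exposes computeUnits and preferredMetalDevice; this steers the model inference, not all of Vision's internal processing. The name-picking helper is a made-up heuristic for illustration:

```swift
import Foundation
#if canImport(CoreML) && canImport(Metal)
import CoreML
import Metal
#endif

// Hypothetical pure helper: given Metal device names, prefer a discrete or
// external GPU over an integrated one (naming heuristic, illustration only).
func preferredDeviceName(from names: [String]) -> String? {
    names.first { !$0.lowercased().contains("intel") } ?? names.first
}

#if canImport(CoreML) && canImport(Metal)
// Sketch: build a configuration that steers the Core ML model behind a
// VNCoreMLRequest toward a chosen GPU.
func configuration(preferring device: MTLDevice?) -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = .all            // CPU, GPU, and Neural Engine allowed
    config.preferredMetalDevice = device  // e.g. chosen from MTLCopyAllDevices()
    return config
}
#endif
```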
Post not yet marked as solved
571 Views

VNImageRequestHandler - Failing with Error code 11 in iOS/iPadOS 14.5

Has anyone been seeing errors from VNImageRequestHandler since upgrading to iOS/iPadOS 14.5? Specifically: Error Domain=com.apple.vis Code=11 "encountered unknown exception"  It works for some images, but seems to fail on many that work fine on prior iOS/iPadOS versions.
Post marked as solved
286 Views

Vision + RealityKit: Convert a point in ARFrame.capturedImage to 3D World Transform

Background: I am prototyping with RealityKit on iOS 14.1 on the latest iPad Pro 11-inch. My goal is to track a hand. When using skeleton tracking, the skeleton scales don't appear to be adjusted correctly, so I get about 15 cm of error in some of my samples. So I am experimenting with using Vision to identify the hand and then projecting back into 3D space.

1. Run image recognition on ARFrame.capturedImage:

```swift
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .up, options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
// ...
try handler.perform([handPoseRequest])
```

2. Convert the point to a 3D world transform (where the problem is):

```swift
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame,
                                    _ viewSize: CGSize) -> Transform? {
    let pointX = (point.x / Double(frame.camera.imageResolution.width)) * Double(viewSize.width)
    let pointY = (point.y / Double(frame.camera.imageResolution.height)) * Double(viewSize.height)
    let query = frame.raycastQuery(from: CGPoint(x: pointX, y: pointY), allowing: .estimatedPlane, alignment: .any)
    let results = session.raycast(query)
    if let first = results.first {
        return Transform(matrix: first.worldTransform)
    } else {
        return nil
    }
}
```

I wonder if I am doing the right conversion. The issue is that the ARSession.raycast documentation (https://developer.apple.com/documentation/arkit/arsession/3132065-raycast) says it converts a UI screen point to a 3D point, but I am not sure how ARFrame.capturedImage maps onto the UI screen. Thanks
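One possible source of error in the snippet above: VNRecognizedPoint coordinates are already normalized to [0, 1] (origin at the bottom-left), so dividing them by the image resolution collapses them toward zero. A hedged sketch of converting a normalized Vision point to a top-left-origin view point before raycasting, assuming the captured image fills the view exactly with no aspect-fill cropping (in practice ARFrame.displayTransform(for:viewportSize:) handles that mapping):

```swift
import Foundation

// Convert a Vision normalized point (origin bottom-left, x/y in 0...1)
// to a point in a view's coordinate space (origin top-left).
// Assumption: the image fills the view exactly (no aspect-fill cropping).
func viewPoint(fromVisionX x: Double, y: Double, viewSize: CGSize) -> CGPoint {
    CGPoint(x: CGFloat(x) * viewSize.width,
            y: (1.0 - CGFloat(y)) * viewSize.height)  // flip the y-axis
}
```

The resulting CGPoint is what frame.raycastQuery(from:allowing:alignment:) expects; for a view that crops or letterboxes the camera image, run the normalized point through the frame's display transform first.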