Post not yet marked as solved · 205 Views
This sounds insane, but I'm unable to detect the right leg with the Vision framework.
import Vision
let rightLeg = try observation.recognizedPoints(.rightLeg)
print(rightLeg)
The result is:
[__C.VNHumanBodyPoseObservationJointName(_rawValue: left_foot_joint): [0.000000; 1.000000], __C.VNHumanBodyPoseObservationJointName(_rawValue: left_leg_joint): [0.751968; 0.281050], __C.VNHumanBodyPoseObservationJointName(_rawValue: left_upLeg_joint): [0.767090; 0.715324]]
As you can see, it seems to be tracking the left leg: the result is essentially the same if I change the groupName value to .leftLeg.
.leftArm & .rightArm work as expected as well. Only .rightLeg is not working.
I've tested detection on several people with the same results.
How is this possible?
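For now I'm working around it by querying the .all group and filtering the right-leg joints by name. A sketch (it assumes the .rightHip, .rightKnee, and .rightAnkle joint names):

let allPoints = try observation.recognizedPoints(.all)
// Pick out the right-leg joints by name instead of relying on the
// .rightLeg group; a confidence of 0 means the joint wasn't found.
let rightLegJoints: [VNHumanBodyPoseObservation.JointName] = [.rightHip, .rightKnee, .rightAnkle]
for joint in rightLegJoints {
    if let point = allPoints[joint], point.confidence > 0 {
        print(joint, point.location)
    }
}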
Post not yet marked as solved · 22 Views
Hi,
Is it possible to do the equivalent of OpenCV's "template matching" using Swift and the Vision framework?
I'd rather not go down the ML object-recognition route because of accuracy concerns.
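The closest built-in thing I've found so far is Vision's image registration. A sketch (it only estimates a translation, so it's not a full templateMatching replacement):

import Vision

// Align a template against a larger image; the resulting transform says
// where the template best lines up (translation only, no scale/rotation).
func matchTemplate(_ template: CGImage, in scene: CGImage) throws -> CGAffineTransform? {
    let request = VNTranslationalImageRegistrationRequest(targetedCGImage: template)
    try VNImageRequestHandler(cgImage: scene, options: [:]).perform([request])
    return (request.results?.first as? VNImageTranslationAlignmentObservation)?.alignmentTransform
}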
Post not yet marked as solved · 60 Views
I tried the sample code "Applying Matte Effects to People in Images and Videos" on an iPhone 12 mini, but the result is not accurate near the boundaries (especially hair). I even tried the .accurate segmentation quality level, which makes the iPhone overheat quickly, and the segmentation is still not good enough for live video. One thing that may matter: segmentation results are not as good as matting, which applies an alpha channel so hair blends accurately with the background. But if I am missing something, please do point it out.
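For reference, this is the quality setting I'm describing (a minimal sketch of the request configuration):

import Vision
import CoreVideo

// The person-segmentation request at its highest quality level; .balanced
// and .fast are the levels I fall back to for live video to avoid overheating.
let request = VNGeneratePersonSegmentationRequest()
request.qualityLevel = .accurate
request.outputPixelFormat = kCVPixelFormatType_OneComponent8   // 8-bit grayscale mask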
Post not yet marked as solved · 44 Views
Mostly with Chinese characters, Vision recognizes a line of text as a single 'word' when in fact it may contain two or more.
For example, this string (肖丹销售部銷售经理) contains a name (the first 2 characters) and a job title (everything else). The first 2 characters are about twice the height of the others.
I've been trying to break this string in two, but I can't find a way to do it, as the bounding box relates to the whole 'word' and not to each character. If I could get each character's bounding box, I could compare them and split into multiple strings where appropriate.
I also tried running VNDetectTextRectanglesRequest, but its results rarely match what VNRecognizeTextRequest returns. For example, these 9 characters return 12 VNTextObservations.
Does anyone have an idea?
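One thing I'm going to try, sketched below: VNRecognizedText has a boundingBox(for:) that takes a string range, which might give per-character boxes even when the whole line comes back as a single candidate.

// `observation` is a VNRecognizedTextObservation from VNRecognizeTextRequest.
if let candidate = observation.topCandidates(1).first {
    let string = candidate.string
    for index in string.indices {
        let range = index..<string.index(after: index)
        if let box = try? candidate.boundingBox(for: range) {
            print(string[range], box.boundingBox)   // per-character rect (normalized)
        }
    }
}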
Thanks.
Post marked as solved · 314 Views
Hello everybody,
I am trying to run inference on a Core ML model I created with Create ML. I am following the sample code provided by Apple on the Core ML documentation page, and every time I try to classify an image I get this error: "Could not create Espresso context".
Has this ever happened to anyone? How did you solve it?
Here is my code:
import Foundation
import Vision
import CoreML
import UIKit
import ImageIO

final class ButterflyClassification {
    var classificationResult: Result?   // Result is my own type (species name + confidence)

    lazy var classificationRequest: VNCoreMLRequest = {
        do {
            let model = try VNCoreMLModel(for: ButterfliesModel_1(configuration: MLModelConfiguration()).model)
            return VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
                self?.processClassification(for: request, error: error)
            })
        }
        catch {
            fatalError("Failed to load model.")
        }
    }()

    func processClassification(for request: VNRequest, error: Error?) {
        DispatchQueue.main.async {
            guard let results = request.results else {
                print("Unable to classify image.")
                return
            }
            let classifications = results as! [VNClassificationObservation]
            if classifications.isEmpty {
                print("No classification was provided.")
                return
            }
            else {
                let firstClassification = classifications[0]
                self.classificationResult = Result(speciesName: firstClassification.identifier,
                                                   confidence: Double(firstClassification.confidence))
            }
        }
    }

    func classifyButterfly(image: UIImage) -> Result? {
        guard let ciImage = CIImage(image: image) else {
            fatalError("Unable to create ciImage")
        }
        DispatchQueue.global(qos: .userInitiated).async {
            let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
            do {
                try handler.perform([self.classificationRequest])
            }
            catch {
                print("Failed to perform classification.\n\(error.localizedDescription)")
            }
        }
        // Note: perform(_:) runs on a background queue here, so this return
        // usually happens before classificationResult has been set.
        return classificationResult
    }
}
Thank you for your help!
Post not yet marked as solved · 52 Views
The new VNGeneratePersonSegmentationRequest is a stateful request, i.e. it keeps state and improves the segmentation mask generation for subsequent frames.
There is also the new CIPersonSegmentationFilter as a convenient way for using the API with Core Image. But since the Vision request is stateful, I was wondering how this is handled by the Core Image filter.
Does the filter also keep state between subsequent calls? How is the "The request requires the use of CMSampleBuffers with timestamps as input" requirement of VNStatefulRequest ensured?
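For context, here is the Core Image call shape I mean (iOS 15; whether the filter keeps Vision's per-frame state between these calls is exactly my question):

import CoreImage
import CoreImage.CIFilterBuiltins

let filter = CIFilter.personSegmentation()
filter.inputImage = frameImage   // CIImage for the current frame (assumed)
// The filter also exposes a quality level mirroring the Vision request's levels.
let mask = filter.outputImage    // grayscale person mask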
Post not yet marked as solved · 73 Views
Hello!
I was wondering if it would be possible for the sample code for the Meal App to be posted. There are some things I'd like to see regarding MLLinearRegressor and how models can be personalized with context and data.
Post not yet marked as solved · 59 Views
Where can I find a comprehensive list of all the classes that the built in Sound Classifier model supports?
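The closest I've found so far is asking the request itself, sketched here on the assumption that the built-in model is the .version1 classifier identifier (iOS 15):

import SoundAnalysis

// Every label the built-in sound classifier can emit, straight from the request.
if let request = try? SNClassifySoundRequest(classifierIdentifier: .version1) {
    print(request.knownClassifications)
}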
Post not yet marked as solved · 143 Views
Hi,
I was wondering if the new PhotogrammetrySession will allow us developers to obtain object measurements like length, width, and height after 3D reconstruction.
Is this functionality present, or will we have to compute that information manually from depth, focal length, etc.?
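One route I'm planning to try (an unverified sketch, assuming a folder of input photos; PhotogrammetrySession is macOS-only): request .bounds from the session and read the extents of the returned bounding box.

import RealityKit

let session = try PhotogrammetrySession(input: photosDirectoryURL)  // assumed folder URL
try session.process(requests: [.bounds])
Task {
    for try await output in session.outputs {
        if case let .requestComplete(_, .bounds(box)) = output {
            // Extents are in meters: x = width, y = height, z = depth.
            print("object extents:", box.extents)
        }
    }
}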
Post marked as solved · 148 Views
I installed the new iOS beta on an iPhone 7, but I don't see the new detail in Maps, and I don't see the new Live Text features in Photos/Camera.
I didn't see anything in the release notes about required hardware...
Post not yet marked as solved · 76 Views
We have around 2,000 2D catalog .jpg photos of fitness equipment (with no background). When we take photos of the real equipment, we would like to identify whether the photo shows the same kind of object (not, say, a car or a motorcycle) and also blur its background.
Blurring backgrounds is easy with Vision in portrait mode, but only for human faces.
What is the best approach in this scenario? We would appreciate some pointers or guidelines.
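The only lead we have so far is objectness-based saliency, sketched below: it doesn't verify that the object is fitness equipment, but it localizes the main subject without requiring a person, so everything outside it could be blurred.

import Vision

let request = VNGenerateObjectnessBasedSaliencyImageRequest()
try VNImageRequestHandler(cgImage: photo, options: [:]).perform([request])  // `photo` assumed
if let observation = request.results?.first as? VNSaliencyImageObservation {
    let mask = observation.pixelBuffer            // soft heat map of the subject
    let boxes = observation.salientObjects ?? []  // bounding boxes of salient objects
    print(boxes.map(\.boundingBox))
    // `mask` could drive a CIBlendWithMask over a blurred copy of the photo.
    _ = mask
}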
Thanks!
Post not yet marked as solved · 69 Views
In my iOS application I have two photos: one grabbed from the user's identity card, and one a selfie the user takes. I then need to compare the two faces and get a similarity response indicating whether they belong to the same person. Apart from shipping an MLModel in my app, is there a built-in framework or class I can use in Swift?
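The closest built-in thing I've found so far is sketched below. It is not true face recognition: Vision's image feature prints give a distance between any two images, which on tightly cropped faces might serve as a rough similarity signal (the threshold is up to you and untested here).

import Vision

func featurePrint(for image: CGImage) throws -> VNFeaturePrintObservation? {
    let request = VNGenerateImageFeaturePrintRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    return request.results?.first as? VNFeaturePrintObservation
}

// `idCardFaceCrop` and `selfieFaceCrop` are assumed CGImage face crops.
if let idPrint = try featurePrint(for: idCardFaceCrop),
   let selfiePrint = try featurePrint(for: selfieFaceCrop) {
    var distance = Float(0)
    try idPrint.computeDistance(&distance, to: selfiePrint)
    print("distance:", distance)   // smaller distance = more similar
}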
Post not yet marked as solved · 150 Views
I know that it's possible to select the GPU on which to run the Metal code. But is it possible to select the GPU for Vision?
Post not yet marked as solved · 571 Views
Has anyone been seeing errors from VNImageRequestHandler since upgrading to iOS/iPadOS 14.5?
Specifically: Error Domain=com.apple.vis Code=11 "encountered unknown exception"
It works for some images, but seems to fail on many that work fine on prior iOS/iPadOS versions.
Post marked as solved · 286 Views
Background: I am prototyping with RealityKit on iOS 14.1 on the latest 11-inch iPad Pro. My goal is to track a hand. With skeleton tracking, the skeleton scale did not seem to be adjusted correctly, so I was about 15 cm off in some of my samples. So I am experimenting with using Vision to identify the hand and then project it back into 3D space.
1> Run image recognition on ARFrame.capturedImage
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .up, options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
....
try handler.perform([handPoseRequest])
2> Convert point to 3D world transform (where the problem is).
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame,
                                    _ viewSize: CGSize) -> Transform?
{
    let pointX = (point.x / Double(frame.camera.imageResolution.width)) * Double(viewSize.width)
    let pointY = (point.y / Double(frame.camera.imageResolution.height)) * Double(viewSize.height)
    let query = frame.raycastQuery(from: CGPoint(x: pointX, y: pointY), allowing: .estimatedPlane, alignment: .any)
    let results = session.raycast(query)
    if let first = results.first {
        return Transform(matrix: first.worldTransform)
    }
    else {
        return nil
    }
}
I wonder if I am doing the right conversion. The issue is that the ARSession.raycast documentation (https://developer.apple.com/documentation/arkit/arsession/3132065-raycast) says it converts a UI screen point to a 3D point, but I am not sure how ARFrame.capturedImage maps onto the UI screen.
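One alternative I'm considering, as a hedged sketch: VNRecognizedPoint is already normalized (origin bottom-left), so instead of dividing by imageResolution it could be mapped through ARFrame.displayTransform, which accounts for how capturedImage is fitted to the screen, and the raycast run from the resulting view point.

import ARKit
import Vision
import UIKit

// Map a normalized Vision point into view coordinates suitable for raycasting.
func viewPoint(for point: VNRecognizedPoint, frame: ARFrame,
               viewSize: CGSize, orientation: UIInterfaceOrientation) -> CGPoint {
    // Flip Vision's bottom-left origin to the top-left origin ARKit expects.
    let normalized = CGPoint(x: point.x, y: 1 - point.y)
    let transform = frame.displayTransform(for: orientation, viewportSize: viewSize)
    let inView = normalized.applying(transform)
    return CGPoint(x: inView.x * viewSize.width, y: inView.y * viewSize.height)
}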
Thanks