Post not yet marked as solved
Summary:
I am using the Vision framework, in conjunction with AVFoundation, to detect facial landmarks of each face in the camera feed (by way of the VNDetectFaceLandmarksRequest). From here, I am taking the found observations and unprojecting each point to a SceneKit View (SCNView), then using those points as the vertices to draw a custom geometry that is textured with a material over each found face.
Effectively, I am working to recreate how an ARFaceTrackingConfiguration functions. In general, this task is functioning as expected, but only when my device is using the front camera in landscape right orientation. When I rotate my device, or switch to the rear camera, the unprojected points do not properly align with the found face as they do in landscape right/front camera.
Problem:
When testing this code, the mesh appears properly (that is, appears affixed to a user's face), but again, only when using the front camera in landscape right. While the code runs as expected (that is, generating the face mesh for each found face) in all orientations, the mesh is wildly misaligned in all other cases.
I believe the issue stems from how I convert the face's bounding box: I am using VNImageRectForNormalizedRect with the width/height of my SCNView rather than of my pixel buffer (which is typically much larger). However, every modification I have tried results in the same misalignment.
Outside of that, I also believe this could be an issue with my SCNCamera, as I am a bit unsure how the transform/projection matrix works and whether that would be needed here.
Sample of Vision Request Setup:
// Set up Vision request options
var requestHandlerOptions: [VNImageOption: AnyObject] = [:]

// Pass along the camera intrinsics, if available
let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil)
if cameraIntrinsicData != nil {
    requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData
}

// Set the EXIF orientation
let exifOrientation = self.exifOrientationForCurrentDeviceOrientation()

// Set up the Vision request handler
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                    orientation: exifOrientation,
                                    options: requestHandlerOptions)

// Set up the completion handler
let completion: VNRequestCompletionHandler = { request, error in
    let observations = request.results as! [VNFaceObservation]
    // Draw faces
    DispatchQueue.main.async {
        drawFaceGeometry(observations: observations)
    }
}

// Set up the image request
let request = VNDetectFaceLandmarksRequest(completionHandler: completion)

// Perform the request
do {
    try handler.perform([request])
} catch {
    print(error)
}
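For reference, the exifOrientationForCurrentDeviceOrientation() helper is not shown in the post. A minimal sketch of such a mapping, assuming the front (mirrored) camera, might look like the following; the exact cases depend on your capture setup, so treat these as a starting point rather than a definitive mapping:

```swift
import UIKit
import ImageIO

// Sketch of an orientation helper, assuming a front (mirrored) camera.
// The exact mapping depends on the AVCaptureConnection configuration.
func exifOrientationForCurrentDeviceOrientation() -> CGImagePropertyOrientation {
    switch UIDevice.current.orientation {
    case .portrait:
        return .leftMirrored        // Home button at the bottom
    case .portraitUpsideDown:
        return .rightMirrored       // Home button at the top
    case .landscapeLeft:
        return .downMirrored        // Home button on the right
    case .landscapeRight:
        return .upMirrored          // Home button on the left
    default:
        return .leftMirrored
    }
}
```

For the rear camera, the non-mirrored counterparts (.right, .left, .up, .down) would apply, which is one reason front- and rear-camera paths need to be handled separately.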
Sample of SCNView Setup:
// Set up the SCNView
let scnView = SCNView()
scnView.translatesAutoresizingMaskIntoConstraints = false
self.view.addSubview(scnView)
scnView.showsStatistics = true
NSLayoutConstraint.activate([
    scnView.leadingAnchor.constraint(equalTo: self.view.leadingAnchor),
    scnView.topAnchor.constraint(equalTo: self.view.topAnchor),
    scnView.bottomAnchor.constraint(equalTo: self.view.bottomAnchor),
    scnView.trailingAnchor.constraint(equalTo: self.view.trailingAnchor)
])

// Set up the scene
let scene = SCNScene()
scnView.scene = scene

// Set up the camera
let cameraNode = SCNNode()
let camera = SCNCamera()
cameraNode.camera = camera
scnView.scene?.rootNode.addChildNode(cameraNode)
cameraNode.position = SCNVector3(x: 0, y: 0, z: 16)

// Set up the light
let ambientLightNode = SCNNode()
ambientLightNode.light = SCNLight()
ambientLightNode.light?.type = .ambient
ambientLightNode.light?.color = UIColor.darkGray
scnView.scene?.rootNode.addChildNode(ambientLightNode)
Sample of "face processing":
func drawFaceGeometry(observations: [VNFaceObservation]) {
    // An array of face nodes, one SCNNode for each detected face
    var faceNodes = [SCNNode]()
    // The origin point, projected into screen space
    let projectedOrigin = scnView.projectPoint(SCNVector3Zero)
    // Iterate through each found face
    for observation in observations {
        // Set up an SCNNode for the face
        let face = SCNNode()
        // Convert the face's normalized bounding box into view coordinates
        let faceBounds = VNImageRectForNormalizedRect(observation.boundingBox, Int(self.scnView.bounds.width), Int(self.scnView.bounds.height))
        // Verify we have landmarks
        if let landmarks = observation.landmarks {
            // Landmarks are relative to, and normalized within, the face bounds
            let affineTransform = CGAffineTransform(translationX: faceBounds.origin.x, y: faceBounds.origin.y)
                .scaledBy(x: faceBounds.size.width, y: faceBounds.size.height)
            // Add all points as vertices
            var vertices = [SCNVector3]()
            // Verify we have points
            if let allPoints = landmarks.allPoints {
                // Map each point into the face's bounding box range, then unproject it into the scene
                for point in allPoints.normalizedPoints {
                    let mappedPoint = point.applying(affineTransform)
                    let projected = SCNVector3(mappedPoint.x, mappedPoint.y, CGFloat(projectedOrigin.z))
                    vertices.append(scnView.unprojectPoint(projected))
                }
            }
            // Set up indices
            var indices = [UInt16]()
            // Add indices
            // ... Removed for brevity ...
            // Set up texture coordinates
            var coordinates = [CGPoint]()
            // Add texture coordinates
            // ... Removed for brevity ...
            // Normalize texture coordinates against the texture image size
            let imageWidth = 2048.0
            let normalizedCoordinates = coordinates.map { coord in
                CGPoint(x: coord.x / CGFloat(imageWidth), y: coord.y / CGFloat(imageWidth))
            }
            // Set up sources
            let sources = SCNGeometrySource(vertices: vertices)
            let textureCoordinates = SCNGeometrySource(textureCoordinates: normalizedCoordinates)
            // Set up elements
            let elements = SCNGeometryElement(indices: indices, primitiveType: .triangles)
            // Set up the geometry
            let geometry = SCNGeometry(sources: [sources, textureCoordinates], elements: [elements])
            geometry.firstMaterial?.diffuse.contents = textureImage
            // Set up the node and add it to the scene
            let customFace = SCNNode(geometry: geometry)
            scnView.scene?.rootNode.addChildNode(customFace)
            // Append the face to the face nodes array
            faceNodes.append(face)
        }
    }
    // Add the face container nodes to the scene
    for node in faceNodes {
        scnView.scene?.rootNode.addChildNode(node)
    }
}
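One thing worth checking in the conversion above: Vision's normalized coordinates have their origin in the lower-left corner, while UIKit views use an upper-left origin, and VNImageRectForNormalizedRect does not perform that flip for you. A small helper (hypothetical here, since the rest of the pipeline is not shown) for flipping a Vision-normalized point into view coordinates could look like:

```swift
import CoreGraphics

// Sketch: convert a Vision-normalized point (lower-left origin) into
// view coordinates (upper-left origin). viewSize would be the
// SCNView's bounds size in this post's setup.
func viewPoint(forVisionPoint point: CGPoint, viewSize: CGSize) -> CGPoint {
    CGPoint(x: point.x * viewSize.width,
            y: (1 - point.y) * viewSize.height)
}
```

Because the mesh only lines up in one orientation/camera combination, a missing or doubled flip like this (combined with the EXIF orientation handed to the request handler) is a plausible place to look.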
Post not yet marked as solved
Where can I find a comprehensive list of all the classes that the built-in Sound Classifier model supports?
Did something change on face detection / Vision Framework on iOS 15?
Using VNDetectFaceLandmarksRequest and reading the VNFaceLandmarkRegion2D to detect eyes is not working on iOS 15 as it did before. I am running the exact same code on an iOS 14 and an iOS 15 device, and the coordinates are different, as seen in the screenshot.
Any ideas?
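One thing worth trying (a sketch, not a confirmed explanation): Vision requests are versioned, and a new OS release can change the default revision, which can shift landmark coordinates. Pinning the request to an explicit revision can keep results consistent across OS versions:

```swift
import Vision

// Sketch: pin the face landmarks request to a fixed algorithm revision so
// results do not silently change when a newer OS ships a newer default.
let request = VNDetectFaceLandmarksRequest()
if VNDetectFaceLandmarksRequest.supportedRevisions.contains(VNDetectFaceLandmarksRequestRevision2) {
    request.revision = VNDetectFaceLandmarksRequestRevision2
}
```

Comparing request.revision on the iOS 14 and iOS 15 devices would confirm or rule out a revision change as the cause.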
Post not yet marked as solved
VNContoursObservation is taking 715 times as long as OpenCV’s findContours() when creating directly comparable results.
VNContoursObservation creates comparable results when I have set the maximumImageDimension property to 1024. If I set it lower, it runs a bit faster, but creates lower quality contours and still takes over 100 times as long.
I have a hard time believing Apple doesn’t know what they are doing, so does anyone have an idea what is going on and how to get it to run much faster? There doesn’t seem to be many options, but nothing I’ve tried closes the gap. Setting the detectsDarkOnLight property to true makes it run even slower.
OpenCV's findContours runs on a binary image, but I am passing an RGB image to Vision, assuming it would convert it to an appropriate format.
OpenCV:
double taskStart = CFAbsoluteTimeGetCurrent();
int contoursApproximation = CV_CHAIN_APPROX_NONE;
int contourRetrievalMode = CV_RETR_LIST;
findContours(input, contours, hierarchy, contourRetrievalMode, contoursApproximation, cv::Point(0,0));
NSLog(@"###### opencv findContours: %f", CFAbsoluteTimeGetCurrent() - taskStart);
###### opencv findContours: 0.017616 seconds
Vision:
let taskStart = CFAbsoluteTimeGetCurrent()
let contourRequest = VNDetectContoursRequest.init()
contourRequest.revision = VNDetectContourRequestRevision1
contourRequest.contrastAdjustment = 1.0
contourRequest.detectsDarkOnLight = false
contourRequest.maximumImageDimension = 1024
let requestHandler = VNImageRequestHandler.init(cgImage: sourceImage.cgImage!, options: [:])
try! requestHandler.perform([contourRequest])
let contoursObservation = contourRequest.results?.first as! VNContoursObservation
print(" ###### contoursObservation: \(CFAbsoluteTimeGetCurrent() - taskStart)")
###### contoursObservation: 12.605962038040161
The image I am providing OpenCV is 2048 pixels and the image I am providing Vision is 1024.
Here is the setup.
I have a UIImageView in which I write some text, using UIGraphicsBeginImageContext.
I pass this image to the OCR func:
func ocrText(onImage: UIImage?) {
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            fatalError("Received invalid observations")
        }
        print("observations", observations.count)
        for observation in observations {
            if observation.topCandidates(1).isEmpty {
                continue
            }
        }
    } // end of request handler
    request.recognitionLanguages = ["fr"]
    let requests = [request]
    DispatchQueue.global(qos: .userInitiated).async {
        let ocrGroup = DispatchGroup()
        guard let img = onImage?.cgImage else { return } // Conversion to cgImage works OK
        print("img", img, img.width)
        let (_, _) = onImage!.logImageSizeInKB(scale: 1)
        ocrGroup.enter()
        let handler = VNImageRequestHandler(cgImage: img, options: [:])
        try? handler.perform(requests)
        ocrGroup.leave()
        ocrGroup.wait()
    }
}
The problem is that observations is an empty array. I get the following logs:
img <CGImage 0x7fa53b350b60> (DP)
<<CGColorSpace 0x6000032f1e00> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)>
width = 398, height = 164, bpc = 8, bpp = 32, row bytes = 1600
kCGImageAlphaPremultipliedFirst | kCGImageByteOrder32Little | kCGImagePixelFormatPacked
is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 398
ImageSize(KB): 5 ko
2022-06-02 17:21:03.734258+0200 App[6949:2718734] Metal API Validation Enabled
observations 0
Which shows image is loaded and converted correctly to cgImage.
But no observations.
Now, if I use the same func on a snapshot image of the text drawn on screen, it works correctly.
Is there a difference between an image captured by the camera and an image drawn in a CGContext?
Here is how mainImageView!.image (used in the OCR) is created, in a subclass of UIImageView:
override func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
    // Merge tempImageView into mainImageView
    UIGraphicsBeginImageContext(mainImageView!.frame.size)
    mainImageView!.image?.draw(in: CGRect(x: 0, y: 0, width: frame.size.width, height: frame.size.height), blendMode: .normal, alpha: 1.0)
    tempImageView!.image?.draw(in: CGRect(x: 0, y: 0, width: frame.size.width, height: frame.size.height), blendMode: .normal, alpha: opacity)
    mainImageView!.image = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    tempImageView?.image = nil
}
I also draw the created image in a test UIImageView and get the correct image.
Here are the logs for the drawn text and for the screen capture:
Drawing doesn't work
img <CGImage 0x7fb96b81a030> (DP)
<<CGColorSpace 0x600003322160> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)>
width = 398, height = 164, bpc = 8, bpp = 32, row bytes = 1600
kCGImageAlphaPremultipliedFirst | kCGImageByteOrder32Little | kCGImagePixelFormatPacked
is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 398
ImageSize(KB): 5 ko
2022-06-02 15:38:51.115476+0200 Numerare[5313:2653328] Metal API Validation Enabled
observations 0
Screenshot: works
img <CGImage 0x7f97641720f0> (IP)
<<CGColorSpace 0x60000394c960> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; iMac)>
width = 570, height = 276, bpc = 8, bpp = 32, row bytes = 2280
kCGImageAlphaNoneSkipLast | 0 (default byte order) | kCGImagePixelFormatPacked
is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 570
ImageSize(KB): 5 ko
2022-06-02 15:43:32.158701+0200 Numerare[5402:2657059] Metal API Validation Enabled
2022-06-02 15:43:33.122941+0200 Numerare[5402:2657057] [WARNING] Resource not found for 'fr_FR'. Character language model will be disabled during language correction.
observations 1
Is there an issue with kCGColorSpaceModelRGB ?
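One difference visible in the two logs is the alpha channel: the failing image is kCGImageAlphaPremultipliedFirst, while the working screenshot is kCGImageAlphaNoneSkipLast. As an experiment (a sketch, not a confirmed fix), the merged image could be re-rendered into an opaque bitmap before being handed to Vision:

```swift
import UIKit

// Sketch: re-render a UIImage into an opaque bitmap (no alpha channel)
// before passing it to VNImageRequestHandler.
func opaqueCopy(of image: UIImage) -> UIImage {
    let format = UIGraphicsImageRendererFormat()
    format.opaque = true   // drop the alpha channel
    format.scale = image.scale
    let renderer = UIGraphicsImageRenderer(size: image.size, format: format)
    return renderer.image { context in
        // Fill with white first so transparent areas become a solid background
        UIColor.white.setFill()
        context.fill(CGRect(origin: .zero, size: image.size))
        image.draw(at: .zero)
    }
}
```

If text recognition succeeds on the opaque copy, the transparent background of the drawn image (rather than the color space) is the likely culprit.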
Post not yet marked as solved
Hi,
When using VNFeaturePrintObservation and then computing the distance between two images, the returned values vary heavily. When two identical images (the same image file) are passed into the function below that I use to compare images, the computed distance is not 0, even though it should be, since the images are identical.
Also, what is the upper limit of computeDistance? I am trying to find the percentage similarity between the two images. (Of course, this cannot be done unless the issue above is resolved).
The code I have used is below:
func featureprintObservationForImage(image: UIImage) -> VNFeaturePrintObservation? {
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!, options: [:])
    let request = VNGenerateImageFeaturePrintRequest()
    request.usesCPUOnly = true // Simulator testing
    do {
        try requestHandler.perform([request])
        return request.results?.first as? VNFeaturePrintObservation
    } catch {
        print("Vision Error: \(error)")
        return nil
    }
}

func compare(origImg: UIImage, drawnImg: UIImage) -> Float? {
    let oImgObservation = featureprintObservationForImage(image: origImg)
    let dImgObservation = featureprintObservationForImage(image: drawnImg)
    if let oImgObservation = oImgObservation {
        if let dImgObservation = dImgObservation {
            var distance: Float = -1
            do {
                try oImgObservation.computeDistance(&distance, to: dImgObservation)
            } catch {
                fatalError("Failed to Compute Distance")
            }
            if distance == -1 {
                return nil
            } else {
                return distance
            }
        } else {
            print("Drawn Image Observation found Nil")
        }
    } else {
        print("Original Image Observation found Nil")
    }
    return nil
}
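On the percentage question: computeDistance is not documented with a fixed upper bound, so a percentage cannot be read off directly. One possible workaround (an assumption to be tuned empirically, not anything defined by the Vision framework) is to map the distance through a decaying function:

```swift
import Foundation

// Sketch: map an unbounded feature-print distance to a 0...1 similarity
// score. The `scale` constant is a hypothetical tuning parameter, not a
// value defined by Vision; it would need to be chosen empirically.
func similarity(fromDistance distance: Float, scale: Float = 1.0) -> Float {
    exp(-distance / scale)
}
```

With this mapping, a distance of 0 yields a similarity of 1.0, and larger distances approach 0.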
Thanks for all the help!
Post not yet marked as solved
I saw there is a way to track hands with Vision, but is there also a way to record that movement and export it to FBX? And is there a way to record only one hand, or both at the same time? The implementation will be in SwiftUI.
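Vision itself does not export FBX, but for the one-hand versus both-hands part of the question, the request exposes a limit (a minimal sketch):

```swift
import Vision

// Sketch: limit hand-pose detection to a single hand.
// Set maximumHandCount to 2 to track both hands at once.
let handPoseRequest = VNDetectHumanHandPoseRequest()
handPoseRequest.maximumHandCount = 1
```

The per-frame joint positions from the observations would then need to be recorded and converted to FBX by a separate exporter, which is outside what Vision provides.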
Post not yet marked as solved
The 2D image frame is extracted from a live or pre-recorded video where the camera is placed behind one player, so the complete tennis court is visible in the frame. Court detection and ball detection have been done using the CoreML and Vision APIs. The next step is to detect the trajectory and the bounce point of the ball, to determine whether the ball is in or out of the court for scoring and analysis. I've used VNDetectTrajectoryRequest to draw the trajectory of the ball, with the detected court's boundingBox as the ROI for trajectory detection. The problem is that I am not able to remove the extra noise (coming from player movement in each frame) from the detection, since the player is also inside the ROI. Next, how should I proceed with ball bounce detection?
private func detectTrajectories(_ controller: CameraViewController, _ buffer: CMSampleBuffer, _ orientation: CGImagePropertyOrientation) throws {
    let visionHandler = VNImageRequestHandler(cmSampleBuffer: buffer,
                                              orientation: orientation,
                                              options: [:])
    let normalizedFrame = CGRect(x: 0, y: 0, width: 1, height: 1)
    DispatchQueue.main.async {
        // Get the frame of the rendered view.
        self.trajectoryView.frame = controller.viewRectForVisionRect(normalizedFrame)
        self.trajectoryView.roi = controller.viewRectForVisionRect(normalizedFrame)
    }
    // Set up the trajectory request
    setUpDetectTrajectoriesRequestWithMaxDimension()
    do {
        // Help manage the real-time use case to improve the precision-versus-delay tradeoff.
        detectTrajectoryRequest.targetFrameTime = .zero
        // The region of interest where the object is moving, in normalized image space.
        detectTrajectoryRequest.regionOfInterest = normalizedFrame
        try visionHandler.perform([detectTrajectoryRequest])
    } catch {
        print("Failed to perform the trajectory request: \(error.localizedDescription)")
        return
    }
}

func setUpDetectTrajectoriesRequestWithMaxDimension() {
    detectTrajectoryRequest = VNDetectTrajectoriesRequest(frameAnalysisSpacing: .zero, trajectoryLength: trajectoryLength, completionHandler: completionHandler)
    detectTrajectoryRequest.objectMinimumNormalizedRadius = 0.003
    detectTrajectoryRequest.objectMaximumNormalizedRadius = 0.005
}

private func completionHandler(request: VNRequest, error: Error?) {
    if let e = error {
        print(e)
        return
    }
    guard let observations = request.results as? [VNTrajectoryObservation] else { return }
    let relevantTrajectories = observations.filter { $0.confidence > trajectoryDetectionConfidence }
    if let trajectory = relevantTrajectories.first {
        DispatchQueue.main.async {
            print(trajectory.projectedPoints.count)
            self.trajectoryView.duration = trajectory.timeRange.duration.seconds
            self.trajectoryView.points = trajectory.detectedPoints
            self.trajectoryView.performTransition(.fadeIn, duration: 0.05)
            if !self.trajectoryView.fullTrajectory.isEmpty {
                self.trajectoryView.roi = CGRect(x: 0, y: 0, width: 1, height: 1)
            }
        }
        DispatchQueue.main.asyncAfter(deadline: .now() + 1.5) {
            self.trajectoryView.resetPath()
        }
    }
}
In the completion handler, I have removed all VNTrajectoryObservation results with a confidence below 0.9. After that, I create a trajectoryView that displays the detected trajectory over the frame.
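For bounce detection, one possible approach (a sketch, under the assumption that the trajectory's points are ordered in time and expressed in image coordinates with y increasing downward) is to look for the frame where the ball's vertical direction reverses, i.e. a local maximum in y:

```swift
import CoreGraphics

// Sketch: find the index of a bounce candidate in a trajectory's points,
// assuming the points are time-ordered and y increases downward (image
// coordinates). The bounce is where the ball stops descending and rises.
func bounceIndex(in points: [CGPoint]) -> Int? {
    guard points.count >= 3 else { return nil }
    for i in 1..<(points.count - 1) {
        let wasDescending = points[i].y > points[i - 1].y
        let nowRising = points[i + 1].y < points[i].y
        if wasDescending && nowRising {
            return i
        }
    }
    return nil
}
```

The candidate point could then be tested against the detected court polygon for the in/out call. Note that Vision's normalized coordinates have a lower-left origin, so the comparison signs flip if the points have not been converted to image coordinates first.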
Post not yet marked as solved
I am having issues with VNRecognizeTextRequest, where the binary files used for extracting text from images fail to load. Here are some logs I have gotten:
WARNING: File mapping at offset 80 of file /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/d8e5c8c195c5d8c6372e99004e20e5562158a0d4.asset/AssetData/en.lm/fst.dat could not be honored, reading instead.
WARNING: File mapping at offset 10400 of file /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/d8e5c8c195c5d8c6372e99004e20e5562158a0d4.asset/AssetData/en.lm/fst.dat could not be honored, reading instead.
Post not yet marked as solved
Hi All,
I am using the Vision framework for barcode scanning, and now I want to support UPU S18 4-state barcodes too. Can you please guide me on how I can achieve this functionality?
TIA
A
Hello,
I am reaching out for some assistance regarding integrating a CoreML action classifier into a SwiftUI app. Specifically, I am trying to implement this classifier to work with the live camera of the device. I have been doing some research, but unfortunately, I have not been able to find any relevant information on this topic.
I was wondering if you could provide me with any examples, resources, or information that could help me achieve this integration? Any guidance you can offer would be greatly appreciated.
Thank you in advance for your help and support.
Post not yet marked as solved
Hi -
I apologize if this question has been answered in the past, I can't seem to find a clear answer. I'm wondering if there is a reliable way to leverage individual landmark points from VNDetectHumanHandPoseRequest to calculate a real-world distance. Like the wrist point to the tip of the middle finger and return a calculated result like 7.5" for example.
My assumption is that the same methods used with a manual hitTest to find the distance between two points (like in the official Measure app) could work here. However, with hitTest being deprecated, that leaves me at a bit of a loss.
I'm happy to continue to dig through the documentation, but before I do, I was hoping someone could let me know if this is even possible or if we're still not quite there yet to be able to leverage the Vision points to calculate an accurate distance (on modern devices that support it)?
I appreciate any feedback or points in the right direction!
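Since hitTest is deprecated in favor of raycasting, one possible direction (a sketch, assuming an ARSCNView session named sceneView; this is an illustration, not a confirmed approach) is to raycast from the two projected Vision landmark points and measure the world-space distance between the results:

```swift
import ARKit

// Sketch: estimate a real-world distance between two screen points (for
// example, the projected wrist and middle-finger-tip landmarks) by
// raycasting against estimated planes. `sceneView` is an assumed ARSCNView.
func worldDistance(between a: CGPoint, and b: CGPoint, in sceneView: ARSCNView) -> Float? {
    func worldPosition(of point: CGPoint) -> simd_float3? {
        guard let query = sceneView.raycastQuery(from: point,
                                                 allowing: .estimatedPlane,
                                                 alignment: .any),
              let result = sceneView.session.raycast(query).first else { return nil }
        let t = result.worldTransform.columns.3
        return simd_float3(t.x, t.y, t.z)
    }
    guard let pa = worldPosition(of: a), let pb = worldPosition(of: b) else { return nil }
    return simd_distance(pa, pb)   // meters
}
```

The caveat is that a raycast lands on surfaces behind the hand, not on the hand itself, so a depth-capable device (TrueDepth or LiDAR) sampling the depth map at the landmark points would likely be more accurate.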
Post not yet marked as solved
I have a 27-inch Mac running macOS Sierra version 10.12.6. After a few hours of use, the back of the computer gets very hot.
Any idea how to fix this?
Thanks
Post not yet marked as solved
I'm referring to this talk:
https://developer.apple.com/videos/play/wwdc2021/10152
I was wondering if the code for the "Image composition" project he demonstrates at the end of the talk (around 24:00) is available somewhere?
Would much appreciate any help.
Post not yet marked as solved
Hello guys,
I am trying to run this sample project on my iPad, but I get a black screen and the camera does not initialize.
I tried updating the Info.plist and asking for camera permission.
I updated all the devices. Has anyone tried this demo?
https://developer.apple.com/documentation/vision/detecting_animal_body_poses_with_vision
Post not yet marked as solved
Will there be VNStatefulRequest versions of these classes?
Post not yet marked as solved
I need to detect the distance of a user from the camera using depth data. I am able to detect the user in the 2D image, but how can I detect the user using only depth data?
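As a starting point (a sketch, assuming you already have an AVDepthData from the capture pipeline and a pixel location for the user from your 2D detection), the depth map can be sampled directly to get the distance in meters:

```swift
import AVFoundation

// Sketch: read the depth value (meters, for a depth-format map) at a
// given pixel from an AVDepthData's underlying pixel buffer.
func depth(at point: CGPoint, in depthData: AVDepthData) -> Float? {
    // Convert disparity data to depth if necessary.
    let converted = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
    let buffer = converted.depthDataMap
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }
    let width = CVPixelBufferGetWidth(buffer)
    let height = CVPixelBufferGetHeight(buffer)
    let x = Int(point.x), y = Int(point.y)
    guard x >= 0, x < width, y >= 0, y < height,
          let base = CVPixelBufferGetBaseAddress(buffer) else { return nil }
    let rowBytes = CVPixelBufferGetBytesPerRow(buffer)
    let rowPointer = base.advanced(by: y * rowBytes)
    return rowPointer.assumingMemoryBound(to: Float32.self)[x]
}
```

Note the depth map is usually lower resolution than the video frame, so the 2D detection coordinates must be scaled into the depth map's dimensions before sampling.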
Hi everyone!
I'm implementing a multithreaded approach on text recognition through Vision's VNSequenceRequestHandler and VNRecognizeTextRequest.
I've created multiple threads (let's use 3 for example) and created 3 instances of VNSequenceRequestHandler for each thread.
My AVSession sends me sample buffers (60 per second), and I'm trying to handle them one by one in 3 different threads. These threads constantly try to consume sample buffers from my temporary sample buffer queue (1 to 3 sample buffers are in the queue; they get deleted after handling). Sample buffers are not shared between these threads; each sample buffer goes to only one thread's VNSequenceRequestHandler. For each performRequests operation I create a new VNRecognizeTextRequest.
By this I was trying to increase count of sample buffers handled per second.
But what I found out is no matter how many threads I've created (1 or 3), the speed is always about 10 fps (iPhone 13 Pro).
When I use 1 thread, only one instance of VNSequenceRequestHandler is created and used. In this case the [requestHandler performRequests:@[request] onCMSampleBuffer:sampleBuffer error:&error] takes about 100-150ms.
When I use 3 threads, each instance of VNSequenceRequestHandler takes up to 600ms to handle the request with [requestHandler performRequests:@[request] onCMSampleBuffer:sampleBuffer error:&error].
When I have 2 threads, the average time is about 300-400ms.
Does this mean that the VNSequenceRequestHandler instances inside the Vision framework share some buffer or request queue, so they're not able to work separately? Or maybe a single GPU core is used for detection?
I saw in the debug session window that VNSequenceRequestHandler creates separate concurrent dispatch queues for handling the requests (for 2 instances, 2 queues are created), which in my opinion should not block resources so heavily that request execution time doubles.
Any ideas what causing the problem?
Hi,
Has anyone gotten the human body pose in 3D sample provided at the following link working?
https://developer.apple.com/documentation/vision/detecting_human_body_poses_in_3d_with_vision
I installed iPadOS 17 on a 9th Gen iPad. The sample loads on Mac and iPad; however, after selecting an image, it goes into a spinning wheel without returning anything. I hope to play with and learn more about the sample. Any pointers or help is greatly appreciated.
Similarly, the Detecting animal body poses with Vision is showing up as blank for me.
https://developer.apple.com/documentation/vision/detecting_animal_body_poses_with_vision
Or do the samples require a device with LiDAR? Thank you in advance.
Post not yet marked as solved
I'm using the Vision framework for text recognition and for detecting rectangles in an image, via the VNRecognizeText and VNDetectRectangles requests. Between macOS and iOS, I found a slight difference in the boundingBox coordinates of the text and the rectangles detected for the same image. Is this expected? Can we do anything to make the results identical? Also, on macOS, when I use the same Vision features from Python (via the pyobjc-framework-Vision package), I also get slightly different results.