Hello, I am Pieter Bikkel. I study Software Engineering at HAN University of Applied Sciences, and I am working on an app that can recognize volleyball actions using machine learning. A volleyball coach can put an iPhone on a tripod and analyze a volleyball match, for example where the ball lands in the court or how hard the ball is served. I was inspired by this session and wondered whether I could interview one of the experts in this field, which would allow me to develop my app even further. I hope you can help me with this.
Vision
Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.
Posts under Vision tag
90 Posts
Hello,
I am looking for a way to anchor a webview component to the user, so that it follows their line of sight as they move.
I tried using a RealityView with an AnchorEntity, but it raises the error "Presentations are not permitted within volumetric window scene". Can I anchor the window instead?
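Not an official answer, but one common workaround, sketched under the assumption that the content can live in an ImmersiveSpace instead of a volumetric window: head anchoring is available there via AnchorEntity(.head). The attachment id "webPanel" is a hypothetical name.

```swift
import SwiftUI
import RealityKit

// Sketch: inside an ImmersiveSpace (not a volumetric window), content can
// be anchored to the user's head via AnchorEntity(.head). "webPanel" is a
// hypothetical attachment id for a SwiftUI view.
struct HeadAnchoredView: View {
    var body: some View {
        RealityView { content, attachments in
            let headAnchor = AnchorEntity(.head)
            if let panel = attachments.entity(for: "webPanel") {
                panel.position = [0, 0, -1]  // one meter in front of the user
                headAnchor.addChild(panel)
            }
            content.add(headAnchor)
        } attachments: {
            Attachment(id: "webPanel") {
                Text("Web content placeholder")  // swap in your web view here
            }
        }
    }
}
```

Note that head-anchored content follows the head rigidly; a lazy-follow behavior would need custom update logic.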
I'm trying to create a sky mask on pictures taken with my iPhone. I've seen in the documentation that Core Image supports semantic segmentation for sky, among other types for person (skin, hair, etc.).
So far, I haven't found the proper workflow to use it.
First, I watched https://developer.apple.com/videos/play/wwdc2019/225/
I understood that images must be captured with segmentation mattes enabled, using code like this:
photoSettings.enabledSemanticSegmentationMatteTypes = self.photoOutput.availableSemanticSegmentationMatteTypes
photoSettings.embedsSemanticSegmentationMattesInPhoto = true
I capture the image on my iPhone, save it in HEIC format, and then later try to load the matte like this:
let skyMatte = CIImage(contentsOf: imageURL, options: [.auxiliarySemanticSegmentationSkyMatte: true])
Unfortunately, self.photoOutput.availableSemanticSegmentationMatteTypes always gives me a list of person-related types and never a sky type.
In fact, AVSemanticSegmentationMatte.MatteType is just [Hair, Skin, Teeth, Glasses]... no Sky!
So how am I supposed to use semanticSegmentationSkyMatteImage? Is there any simple workaround?
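For completeness, assuming a HEIC that does embed a sky matte, applying it as a mask could be sketched with standard Core Image calls (CIBlendWithMask); this does not address why the capture side never offers a sky type:

```swift
import CoreImage

// Sketch: load the base image and its embedded sky matte (if present),
// then keep only the sky pixels. Returns nil when no sky matte is embedded.
func skyMasked(imageURL: URL) -> CIImage? {
    guard let base = CIImage(contentsOf: imageURL),
          let matte = CIImage(contentsOf: imageURL,
                              options: [.auxiliarySemanticSegmentationSkyMatte: true])
    else { return nil }

    // The matte is typically lower resolution than the photo; scale it up.
    let scaleX = base.extent.width / matte.extent.width
    let scaleY = base.extent.height / matte.extent.height
    let scaledMatte = matte.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    // Blend the image against an empty background using the matte as mask.
    let filter = CIFilter(name: "CIBlendWithMask")!
    filter.setValue(base, forKey: kCIInputImageKey)
    filter.setValue(CIImage.empty(), forKey: kCIInputBackgroundImageKey)
    filter.setValue(scaledMatte, forKey: kCIInputMaskImageKey)
    return filter.outputImage
}
```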
Hello, I have created a view with a full 360° image, and I need to perform a task when the user taps anywhere on the screen (to leave the dome), but no matter what I try it just does not work; it doesn't print anything at all.
import SwiftUI
import RealityKit
import RealityKitContent
struct StreetWalk: View {
@Binding var threeSixtyImage: String
@Binding var isExitFaded: Bool
var body: some View {
RealityView { content in
// Create a material with a 360 image
guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
let resource = try? await TextureResource(contentsOf: url) else {
// If the asset isn't available, something is wrong with the app.
fatalError("Unable to load starfield texture.")
}
var material = UnlitMaterial()
material.color = .init(texture: .init(resource))
// Attach the material to a large sphere.
let streetDome = Entity()
streetDome.name = "streetDome"
streetDome.components.set(ModelComponent(
mesh: .generateSphere(radius: 1000),
materials: [material]
))
// Ensure the texture image points inward at the viewer.
streetDome.scale *= .init(x: -1, y: 1, z: 1)
content.add(streetDome)
}
update: { updatedContent in
// Create a material with a 360 image
guard let url = Bundle.main.url(forResource: threeSixtyImage,
withExtension: "jpeg"),
let resource = try? TextureResource.load(contentsOf: url) else {
// If the asset isn't available, something is wrong with the app.
fatalError("Unable to load starfield texture.")
}
var material = UnlitMaterial()
material.color = .init(texture: .init(resource))
updatedContent.entities.first?.components.set(ModelComponent(
mesh: .generateSphere(radius: 1000),
materials: [material]
))
}
.gesture(tap)
}
var tap: some Gesture {
SpatialTapGesture().targetedToAnyEntity().onChanged{ value in
// Access the tapped entity here.
print(value.entity)
print("maybe you can tap the dome")
// isExitFaded.toggle()
}
}
}
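Not shown in the question, but a likely cause: a RealityKit entity only receives spatial tap gestures when it carries both an InputTargetComponent and a CollisionComponent. A sketch of what could be added right after creating the dome entity (assuming the entity variable from the code above):

```swift
// Sketch: make the dome hit-testable. Without these two components,
// SpatialTapGesture().targetedToAnyEntity() never matches the entity.
streetDome.components.set(InputTargetComponent())
streetDome.components.set(CollisionComponent(
    shapes: [.generateSphere(radius: 1000)],
    isStatic: true
))
```

With an inward-facing dome, the collision sphere may also need to be large enough that taps from inside still intersect it.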
Hello,
I am reaching out for some assistance regarding integrating a CoreML action classifier into a SwiftUI app. Specifically, I am trying to implement this classifier to work with the live camera of the device. I have been doing some research, but unfortunately, I have not been able to find any relevant information on this topic.
I was wondering if you could provide me with any examples, resources, or information that could help me achieve this integration? Any guidance you can offer would be greatly appreciated.
Thank you in advance for your help and support.
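One common pipeline, sketched here as an assumption rather than official sample code: a Create ML action classifier consumes windows of body-pose keypoints, so each camera frame is run through VNDetectHumanBodyPoseRequest, the keypoint arrays are accumulated, and a full window is fed to the model. The model name ActionClassifier and the 60-frame window are hypothetical.

```swift
import Vision
import CoreML

// Sketch: per-frame body-pose extraction feeding a Create ML action
// classifier. `ActionClassifier` and the 60-frame window are hypothetical.
final class ActionPredictor {
    private var poseWindow: [MLMultiArray] = []
    private let windowSize = 60

    func process(pixelBuffer: CVPixelBuffer) {
        let request = VNDetectHumanBodyPoseRequest()
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
        try? handler.perform([request])

        guard let observation = request.results?.first,
              let keypoints = try? observation.keypointsMultiArray() else { return }

        poseWindow.append(keypoints)
        if poseWindow.count == windowSize {
            // Concatenate the window and classify, e.g.:
            // let input = try MLMultiArray(concatenating: poseWindow, axis: 0, dataType: .float32)
            // let prediction = try? ActionClassifier().prediction(poses: input)
            poseWindow.removeAll()
        }
    }
}
```

The `process(pixelBuffer:)` method would be called from the camera's sample-buffer delegate, off the main thread.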
I am trying to use VNDetectFaceRectanglesRequest to detect face bounding boxes on frames obtained by ARKit callbacks.
I have my app in portrait device orientation, and I am passing the .right orientation to the perform method on VNSequenceRequestHandler,
something like:
private let requestHandler = VNSequenceRequestHandler()
private var facePoseRequest: VNDetectFaceRectanglesRequest!
// ...
try? self.requestHandler.perform([self.facePoseRequest], on: currentBuffer, orientation: orientation)
I'm setting .right for the orientation above, in the hope that the Vision framework will re-orient the buffer before running inference.
I'm trying to draw the returned bounding box on top of the image. Here's my results-processing code:
guard let faceRes = self.facePoseRequest.results?.first as? VNFaceObservation else {
return
}
//Option1: Assuming reported BB is in coordinate space of orientation-adjusted pixel buffer
// Problems/Observations:
// BoundingBox turns into a square with equal width and height
// Also BB does not cover entire face, but only from chin to eyes
//Notice Height & Width are flipped below
let flippedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufHeight, currBufWidth)
//vs
//Option2: Assuming, reported BB is in coordinate-system of original un-oriented pixel-buffer
// Problem/Observations:
// while the drawn BB does appear like a rectangle and covering most of the face, it is not always centered on the face.
// It moves around the screen when I tilt the device or my face.
let currBufWidth = CVPixelBufferGetWidth(currentBuffer)
let currBufHeight = CVPixelBufferGetHeight(currentBuffer)
let reportedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufWidth, currBufHeight)
In Option1 above:
The bounding box becomes a square, with width and height equal. I noticed that the reported normalized BB has the same aspect ratio as the input pixel buffer, which is 1.33. This is why, when I flip the width and height params in VNImageRectForNormalizedRect, width and height become equal.
In Option2 above:
The BB seems to be roughly the right size, but it jumps around the screen when I tilt the device or my head.
What coordinate system are the reported bounding boxes in?
Do I need to adjust for y-flippedness of Vision framework before I perform above operations?
What's the best way to draw these BB on the captured-frame and or ARview?
Thank you
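For reference, Vision documents boundingBox as normalized coordinates with a lower-left origin, relative to the image after the requested orientation has been applied. A sketch of converting to top-left-origin pixel coordinates of the oriented buffer (with .right, the oriented width/height are the raw buffer's height/width swapped):

```swift
import Vision

// Sketch: convert a Vision boundingBox (normalized, lower-left origin,
// in the orientation-corrected image space) to top-left-origin pixels.
func pixelRect(for boundingBox: CGRect,
               orientedWidth: Int,
               orientedHeight: Int) -> CGRect {
    let rect = VNImageRectForNormalizedRect(boundingBox, orientedWidth, orientedHeight)
    // Flip y for a top-left-origin drawing coordinate system.
    return CGRect(x: rect.origin.x,
                  y: CGFloat(orientedHeight) - rect.origin.y - rect.height,
                  width: rect.width,
                  height: rect.height)
}
```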
Hi,
I want to control a hand model via hand motion capture.
I know there is a sample project and some articles about rigging a model for motion capture in the ARKit documentation, but that solution is encapsulated in BodyTrackedEntity, and I can't find an appropriate entity for controlling just a hand model.
By using VNDetectHumanHandPoseRequest provided by Vision framework, I can get hand joint info, but I don't know how to use that info in RealityKit to control a 3d hand model.
Do you know how to do that or do you have any idea on how should it be implemented?
Thanks
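Not a full solution, but the Vision side can be sketched as follows; mapping the resulting 2D points onto the joint transforms of a rigged 3D hand entity would then be application-specific. The joint names used are real VNHumanHandPoseObservation joints, and the 0.3 confidence threshold is an arbitrary assumption.

```swift
import Vision

// Sketch: extract a few example hand joints from a frame. Driving a rigged
// 3D hand model from these 2D normalized points is left to the app.
func handJoints(in pixelBuffer: CVPixelBuffer) -> [VNHumanHandPoseObservation.JointName: CGPoint] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try? handler.perform([request])

    guard let hand = request.results?.first,
          let points = try? hand.recognizedPoints(.all) else { return [:] }

    var result: [VNHumanHandPoseObservation.JointName: CGPoint] = [:]
    for joint in [VNHumanHandPoseObservation.JointName.wrist, .thumbTip, .indexTip] {
        if let p = points[joint], p.confidence > 0.3 {
            result[joint] = CGPoint(x: p.location.x, y: p.location.y)
        }
    }
    return result
}
```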
When I customize a gesture interaction, how do I set the key values? It depends on the accuracy of finger-joint recognition and distance detection. What is the accuracy of finger-joint detection, and of distance detection?
Can you share the source code for the demo of the Vision face detector with the metrics (roll, yaw, and pitch) displayed? You provide some code online, but not for this portion of the presentation.
Summary:
I am using the Vision framework, in conjunction with AVFoundation, to detect facial landmarks of each face in the camera feed (by way of the VNDetectFaceLandmarksRequest). From here, I am taking the found observations and unprojecting each point to a SceneKit View (SCNView), then using those points as the vertices to draw a custom geometry that is textured with a material over each found face.
Effectively, I am working to recreate how an ARFaceTrackingConfiguration functions. In general, this task is functioning as expected, but only when my device is using the front camera in landscape right orientation. When I rotate my device, or switch to the rear camera, the unprojected points do not properly align with the found face as they do in landscape right/front camera.
Problem:
When testing this code, the mesh appears properly (that is, appears affixed to a user's face), but again, only when using the front camera in landscape right. While the code runs as expected (that is, generating the face mesh for each found face) in all orientations, the mesh is wildly misaligned in all other cases.
My belief is that this issue stems from how I convert the face's bounding box (using VNImageRectForNormalizedRect, which I calculate with the width/height of my SCNView rather than of my pixel buffer, which is typically much larger), though all the modifications I have tried result in the same issue.
Outside of that, I also believe this could be an issue with my SCNCamera, as I am a bit unsure how the transform/projection matrix works and whether it is needed here.
Sample of Vision Request Setup:
// Setup Vision request options
var requestHandlerOptions: [VNImageOption: AnyObject] = [:]
// Setup Camera Intrinsics
let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil)
if cameraIntrinsicData != nil {
requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData
}
// Set EXIF orientation
let exifOrientation = self.exifOrientationForCurrentDeviceOrientation()
// Setup vision request handler
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
orientation: exifOrientation,
options: requestHandlerOptions)
// Setup the completion handler
let completion: VNRequestCompletionHandler = {request, error in
let observations = request.results as! [VNFaceObservation]
// Draw faces
DispatchQueue.main.async {
drawFaceGeometry(observations: observations)
}
}
// Setup the image request
let request = VNDetectFaceLandmarksRequest(completionHandler: completion)
// Handle the request
do {
try handler.perform([request])
} catch {
print(error)
}
Sample of SCNView Setup:
// Setup SCNView
let scnView = SCNView()
scnView.translatesAutoresizingMaskIntoConstraints = false
self.view.addSubview(scnView)
scnView.showsStatistics = true
NSLayoutConstraint.activate([
scnView.leadingAnchor.constraint(equalTo: self.view.leadingAnchor),
scnView.topAnchor.constraint(equalTo: self.view.topAnchor),
scnView.bottomAnchor.constraint(equalTo: self.view.bottomAnchor),
scnView.trailingAnchor.constraint(equalTo: self.view.trailingAnchor)
])
// Setup scene
let scene = SCNScene()
scnView.scene = scene
// Setup camera
let cameraNode = SCNNode()
let camera = SCNCamera()
cameraNode.camera = camera
scnView.scene?.rootNode.addChildNode(cameraNode)
cameraNode.position = SCNVector3(x: 0, y: 0, z: 16)
// Setup light
let ambientLightNode = SCNNode()
ambientLightNode.light = SCNLight()
ambientLightNode.light?.type = SCNLight.LightType.ambient
ambientLightNode.light?.color = UIColor.darkGray
scnView.scene?.rootNode.addChildNode(ambientLightNode)
Sample of "face processing"
func drawFaceGeometry(observations: [VNFaceObservation]) {
// An array of face nodes, one SCNNode for each detected face
var faceNode = [SCNNode]()
// The origin point
let projectedOrigin = sceneView.projectPoint(SCNVector3Zero)
// Iterate through each found face
for observation in observations {
// Setup a SCNNode for the face
let face = SCNNode()
// Setup the found bounds
let faceBounds = VNImageRectForNormalizedRect(observation.boundingBox, Int(self.scnView.bounds.width), Int(self.scnView.bounds.height))
// Verify we have landmarks
if let landmarks = observation.landmarks {
// Landmarks are relative to and normalized within face bounds
let affineTransform = CGAffineTransform(translationX: faceBounds.origin.x, y: faceBounds.origin.y)
.scaledBy(x: faceBounds.size.width, y: faceBounds.size.height)
// Add all points as vertices
var vertices = [SCNVector3]()
// Verify we have points
if let allPoints = landmarks.allPoints {
// Iterate through each point
for point in allPoints.normalizedPoints {
// Apply the transform to convert each point to the face's bounding box range
let normalizedPoint = point.applying(affineTransform)
let projected = SCNVector3(normalizedPoint.x, normalizedPoint.y, CGFloat(projectedOrigin.z))
let unprojected = sceneView.unprojectPoint(projected)
vertices.append(unprojected)
}
}
// Setup Indices
var indices = [UInt16]()
// Add indices
// ... Removed for brevity ...
// Setup texture coordinates
var coordinates = [CGPoint]()
// Add texture coordinates
// ... Removed for brevity ...
// Setup texture image
let imageWidth = 2048.0
let normalizedCoordinates = coordinates.map { coord -> CGPoint in
let x = coord.x / CGFloat(imageWidth)
let y = coord.y / CGFloat(imageWidth)
let textureCoord = CGPoint(x: x, y: y)
return textureCoord
}
// Setup sources
let sources = SCNGeometrySource(vertices: vertices)
let textureCoordinates = SCNGeometrySource(textureCoordinates: normalizedCoordinates)
// Setup elements
let elements = SCNGeometryElement(indices: indices, primitiveType: .triangles)
// Setup Geometry
let geometry = SCNGeometry(sources: [sources, textureCoordinates], elements: [elements])
geometry.firstMaterial?.diffuse.contents = textureImage
// Setup node
let customFace = SCNNode(geometry: geometry)
sceneView.scene?.rootNode.addChildNode(customFace)
// Append the face to the face nodes array
faceNode.append(face)
}
// Iterate the face nodes and append to the scene
for node in faceNode {
sceneView.scene?.rootNode.addChildNode(node)
}
}
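One possible contributor, offered as an assumption rather than a confirmed fix: when the camera preview fills the view with aspect-fill, normalized Vision coordinates must be mapped through the buffer-to-view scaling rather than directly through the view's bounds, since the cropped overflow shifts every point. A hypothetical helper:

```swift
import CoreGraphics

// Sketch: map a normalized (0...1) point from buffer space into view
// space under aspect-fill, where the buffer is scaled uniformly to cover
// the view and the overflow is cropped equally on both sides.
func viewPoint(forNormalized p: CGPoint,
               bufferSize: CGSize,
               viewSize: CGSize) -> CGPoint {
    let scale = max(viewSize.width / bufferSize.width,
                    viewSize.height / bufferSize.height)
    let scaledSize = CGSize(width: bufferSize.width * scale,
                            height: bufferSize.height * scale)
    let xOffset = (scaledSize.width - viewSize.width) / 2
    let yOffset = (scaledSize.height - viewSize.height) / 2
    return CGPoint(x: p.x * scaledSize.width - xOffset,
                   y: p.y * scaledSize.height - yOffset)
}
```

Rotating the device changes both the buffer orientation and this crop, which would explain why only one orientation lines up.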