Hello, I am Pieter Bikkel. I study Software Engineering at HAN University of Applied Sciences, and I am working on an app that can recognize volleyball actions using machine learning. A volleyball coach can put an iPhone on a tripod and analyze a volleyball match, for example where the ball lands in the court or how hard the ball is served. I was inspired by this session and wondered whether I could interview one of the experts in this field, which would allow me to develop my app even further. I hope you can help me with this.
Vision
Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.
Posts under Vision tag
90 Posts
Hello,
I am looking for a way to anchor a webview component to the user, so that it follows their line of sight as they move.
I tried using a RealityView with an AnchorEntity, but it raises the error "Presentations are not permitted within volumetric window scene". Can I anchor the window instead?
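Not an official answer, but one common workaround, sketched under the assumption that the content can live in an ImmersiveSpace instead of a volumetric window: head anchoring is available there via AnchorEntity(.head). The attachment id "webPanel" is a hypothetical name.

```swift
import SwiftUI
import RealityKit

// Sketch: inside an ImmersiveSpace (not a volumetric window), content can
// be anchored to the user's head via AnchorEntity(.head). "webPanel" is a
// hypothetical attachment id for a SwiftUI view.
struct HeadAnchoredView: View {
    var body: some View {
        RealityView { content, attachments in
            let headAnchor = AnchorEntity(.head)
            if let panel = attachments.entity(for: "webPanel") {
                panel.position = [0, 0, -1]  // one meter in front of the user
                headAnchor.addChild(panel)
            }
            content.add(headAnchor)
        } attachments: {
            Attachment(id: "webPanel") {
                Text("Web content placeholder")  // swap in your web view here
            }
        }
    }
}
```

Note that head-anchored content follows the head rigidly; a lazy-follow behavior would need custom update logic.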
I'm trying to create a sky mask on pictures taken with my iPhone. I've seen in the documentation that Core Image supports semantic segmentation for sky, among other types for person (skin, hair, etc.).
So far, I haven't found the proper workflow to use it.
First, I watched https://developer.apple.com/videos/play/wwdc2019/225/
I understood that images must be captured with segmentation mattes enabled, using code like this:
photoSettings.enabledSemanticSegmentationMatteTypes = self.photoOutput.availableSemanticSegmentationMatteTypes
photoSettings.embedsSemanticSegmentationMattesInPhoto = true
I capture the image on my iPhone, save it in HEIC format, and then later try to load the matte like this:
let skyMatte = CIImage(contentsOf: imageURL, options: [.auxiliarySemanticSegmentationSkyMatte: true])
Unfortunately, self.photoOutput.availableSemanticSegmentationMatteTypes always gives me a list of person-related types and never a sky type.
In fact, AVSemanticSegmentationMatte.MatteType is just [Hair, Skin, Teeth, Glasses]... no Sky!
So how am I supposed to use semanticSegmentationSkyMatteImage? Is there any simple workaround?
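For completeness, assuming a HEIC that does embed a sky matte, applying it as a mask could be sketched with standard Core Image calls (CIBlendWithMask); this does not address why the capture side never offers a sky type:

```swift
import CoreImage

// Sketch: load the base image and its embedded sky matte (if present),
// then keep only the sky pixels. Returns nil when no sky matte is embedded.
func skyMasked(imageURL: URL) -> CIImage? {
    guard let base = CIImage(contentsOf: imageURL),
          let matte = CIImage(contentsOf: imageURL,
                              options: [.auxiliarySemanticSegmentationSkyMatte: true])
    else { return nil }

    // The matte is typically lower resolution than the photo; scale it up.
    let scaleX = base.extent.width / matte.extent.width
    let scaleY = base.extent.height / matte.extent.height
    let scaledMatte = matte.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    // Blend the image against an empty background using the matte as mask.
    let filter = CIFilter(name: "CIBlendWithMask")!
    filter.setValue(base, forKey: kCIInputImageKey)
    filter.setValue(CIImage.empty(), forKey: kCIInputBackgroundImageKey)
    filter.setValue(scaledMatte, forKey: kCIInputMaskImageKey)
    return filter.outputImage
}
```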
Hello, I have created a view with a full 360° image, and I need to perform a task when the user taps anywhere on the screen (to leave the dome), but no matter what I try it just does not work; it doesn't print anything at all.
import SwiftUI
import RealityKit
import RealityKitContent
struct StreetWalk: View {
@Binding var threeSixtyImage: String
@Binding var isExitFaded: Bool
var body: some View {
RealityView { content in
// Create a material with a 360 image
guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
let resource = try? await TextureResource(contentsOf: url) else {
// If the asset isn't available, something is wrong with the app.
fatalError("Unable to load starfield texture.")
}
var material = UnlitMaterial()
material.color = .init(texture: .init(resource))
// Attach the material to a large sphere.
let streetDome = Entity()
streetDome.name = "streetDome"
streetDome.components.set(ModelComponent(
mesh: .generateSphere(radius: 1000),
materials: [material]
))
// Ensure the texture image points inward at the viewer.
streetDome.scale *= .init(x: -1, y: 1, z: 1)
content.add(streetDome)
}
update: { updatedContent in
// Create a material with a 360 image
guard let url = Bundle.main.url(forResource: threeSixtyImage,
withExtension: "jpeg"),
let resource = try? TextureResource.load(contentsOf: url) else {
// If the asset isn't available, something is wrong with the app.
fatalError("Unable to load starfield texture.")
}
var material = UnlitMaterial()
material.color = .init(texture: .init(resource))
updatedContent.entities.first?.components.set(ModelComponent(
mesh: .generateSphere(radius: 1000),
materials: [material]
))
}
.gesture(tap)
}
var tap: some Gesture {
SpatialTapGesture().targetedToAnyEntity().onChanged{ value in
// Access the tapped entity here.
print(value.entity)
print("maybe you can tap the dome")
// isExitFaded.toggle()
}
}
}
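Not shown in the question, but a likely cause: a RealityKit entity only receives spatial tap gestures when it carries both an InputTargetComponent and a CollisionComponent. A sketch of what could be added right after creating the dome entity (assuming the entity variable from the code above):

```swift
// Sketch: make the dome hit-testable. Without these two components,
// SpatialTapGesture().targetedToAnyEntity() never matches the entity.
streetDome.components.set(InputTargetComponent())
streetDome.components.set(CollisionComponent(
    shapes: [.generateSphere(radius: 1000)],
    isStatic: true
))
```

With an inward-facing dome, the collision sphere may also need to be large enough that taps from inside still intersect it.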
Hello,
I am reaching out for some assistance regarding integrating a CoreML action classifier into a SwiftUI app. Specifically, I am trying to implement this classifier to work with the live camera of the device. I have been doing some research, but unfortunately, I have not been able to find any relevant information on this topic.
I was wondering if you could provide me with any examples, resources, or information that could help me achieve this integration? Any guidance you can offer would be greatly appreciated.
Thank you in advance for your help and support.
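One common pipeline, sketched here as an assumption rather than official sample code: a Create ML action classifier consumes windows of body-pose keypoints, so each camera frame is run through VNDetectHumanBodyPoseRequest, the keypoint arrays are accumulated, and a full window is fed to the model. The model name ActionClassifier and the 60-frame window are hypothetical.

```swift
import Vision
import CoreML

// Sketch: per-frame body-pose extraction feeding a Create ML action
// classifier. `ActionClassifier` and the 60-frame window are hypothetical.
final class ActionPredictor {
    private var poseWindow: [MLMultiArray] = []
    private let windowSize = 60

    func process(pixelBuffer: CVPixelBuffer) {
        let request = VNDetectHumanBodyPoseRequest()
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
        try? handler.perform([request])

        guard let observation = request.results?.first,
              let keypoints = try? observation.keypointsMultiArray() else { return }

        poseWindow.append(keypoints)
        if poseWindow.count == windowSize {
            // Concatenate the window and classify, e.g.:
            // let input = try MLMultiArray(concatenating: poseWindow, axis: 0, dataType: .float32)
            // let prediction = try? ActionClassifier().prediction(poses: input)
            poseWindow.removeAll()
        }
    }
}
```

The `process(pixelBuffer:)` method would be called from the camera's sample-buffer delegate, off the main thread.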
I am trying to use VNDetectFaceRectanglesRequest to detect face bounding boxes on frames obtained by ARKit callbacks.
I have my app in portrait device orientation, and I am passing the .right orientation to the perform method on VNSequenceRequestHandler,
something like:
private let requestHandler = VNSequenceRequestHandler()
private var facePoseRequest: VNDetectFaceRectanglesRequest!
// ...
try? self.requestHandler.perform([self.facePoseRequest], on: currentBuffer, orientation: orientation)
I'm setting .right for the orientation above, in the hope that the Vision framework will re-orient the buffer before running inference.
I'm trying to draw the returned bounding box on top of the image. Here's my results-processing code:
guard let faceRes = self.facePoseRequest.results?.first as? VNFaceObservation else {
return
}
//Option1: Assuming reported BB is in coordinate space of orientation-adjusted pixel buffer
// Problems/Observations:
// BoundingBox turns into a square with equal width and height
// Also BB does not cover entire face, but only from chin to eyes
//Notice Height & Width are flipped below
let flippedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufHeight, currBufWidth)
//vs
//Option2: Assuming, reported BB is in coordinate-system of original un-oriented pixel-buffer
// Problem/Observations:
// while the drawn BB does appear like a rectangle and covering most of the face, it is not always centered on the face.
// It moves around the screen when I tilt the device or my face.
let currBufWidth = CVPixelBufferGetWidth(currentBuffer)
let currBufHeight = CVPixelBufferGetHeight(currentBuffer)
let reportedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufWidth, currBufHeight)
In Option1 above:
The bounding box becomes a square, with width and height equal. I noticed that the reported normalized BB has the same aspect ratio as the input pixel buffer, which is 1.33. This is why, when I flip the width and height params in VNImageRectForNormalizedRect, width and height become equal.
In Option2 above:
The BB seems to be roughly the right size, but it jumps around the screen when I tilt the device or my head.
What coordinate system are the reported bounding boxes in?
Do I need to adjust for y-flippedness of Vision framework before I perform above operations?
What's the best way to draw these BB on the captured-frame and or ARview?
Thank you
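For reference, Vision documents boundingBox as normalized coordinates with a lower-left origin, relative to the image after the requested orientation has been applied. A sketch of converting to top-left-origin pixel coordinates of the oriented buffer (with .right, the oriented width/height are the raw buffer's height/width swapped):

```swift
import Vision

// Sketch: convert a Vision boundingBox (normalized, lower-left origin,
// in the orientation-corrected image space) to top-left-origin pixels.
func pixelRect(for boundingBox: CGRect,
               orientedWidth: Int,
               orientedHeight: Int) -> CGRect {
    let rect = VNImageRectForNormalizedRect(boundingBox, orientedWidth, orientedHeight)
    // Flip y for a top-left-origin drawing coordinate system.
    return CGRect(x: rect.origin.x,
                  y: CGFloat(orientedHeight) - rect.origin.y - rect.height,
                  width: rect.width,
                  height: rect.height)
}
```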
Hi,
I want to control a hand model via hand motion capture.
I know there is a sample project and some articles about rigging a model for motion capture in the ARKit documentation, but that solution is encapsulated in BodyTrackedEntity, and I can't find an appropriate entity for controlling just a hand model.
By using VNDetectHumanHandPoseRequest provided by Vision framework, I can get hand joint info, but I don't know how to use that info in RealityKit to control a 3d hand model.
Do you know how to do that or do you have any idea on how should it be implemented?
Thanks
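Not a full solution, but the Vision side can be sketched as follows; mapping the resulting 2D points onto the joint transforms of a rigged 3D hand entity would then be application-specific. The joint names used are real VNHumanHandPoseObservation joints, and the 0.3 confidence threshold is an arbitrary assumption.

```swift
import Vision

// Sketch: extract a few example hand joints from a frame. Driving a rigged
// 3D hand model from these 2D normalized points is left to the app.
func handJoints(in pixelBuffer: CVPixelBuffer) -> [VNHumanHandPoseObservation.JointName: CGPoint] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try? handler.perform([request])

    guard let hand = request.results?.first,
          let points = try? hand.recognizedPoints(.all) else { return [:] }

    var result: [VNHumanHandPoseObservation.JointName: CGPoint] = [:]
    for joint in [VNHumanHandPoseObservation.JointName.wrist, .thumbTip, .indexTip] {
        if let p = points[joint], p.confidence > 0.3 {
            result[joint] = CGPoint(x: p.location.x, y: p.location.y)
        }
    }
    return result
}
```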
When I customize a gesture interaction, how do I set the key values? It depends on the accuracy of finger-joint recognition and distance detection. What is the accuracy of finger-joint detection, and of distance detection?
Can you share the source code for the demo of the Vision face detector with the metrics (roll, yaw, and pitch) displayed? You provide some code online, but not for this portion of the presentation.
Summary:
I am using the Vision framework, in conjunction with AVFoundation, to detect facial landmarks of each face in the camera feed (by way of the VNDetectFaceLandmarksRequest). From here, I am taking the found observations and unprojecting each point to a SceneKit View (SCNView), then using those points as the vertices to draw a custom geometry that is textured with a material over each found face.
Effectively, I am working to recreate how an ARFaceTrackingConfiguration functions. In general, this task is functioning as expected, but only when my device is using the front camera in landscape right orientation. When I rotate my device, or switch to the rear camera, the unprojected points do not properly align with the found face as they do in landscape right/front camera.
Problem:
When testing this code, the mesh appears properly (that is, appears affixed to a user's face), but again, only when using the front camera in landscape right. While the code runs as expected (that is, generating the face mesh for each found face) in all orientations, the mesh is wildly misaligned in all other cases.
My belief is that this issue stems from how I convert the face's bounding box (using VNImageRectForNormalizedRect, which I calculate with the width/height of my SCNView rather than of my pixel buffer, which is typically much larger), though all the modifications I have tried result in the same issue.
Outside of that, I also believe this could be an issue with my SCNCamera, as I am a bit unsure how the transform/projection matrix works and whether it is needed here.
Sample of Vision Request Setup:
// Setup Vision request options
var requestHandlerOptions: [VNImageOption: AnyObject] = [:]
// Setup Camera Intrinsics
let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil)
if cameraIntrinsicData != nil {
requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData
}
// Set EXIF orientation
let exifOrientation = self.exifOrientationForCurrentDeviceOrientation()
// Setup vision request handler
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
orientation: exifOrientation,
options: requestHandlerOptions)
// Setup the completion handler
let completion: VNRequestCompletionHandler = {request, error in
let observations = request.results as! [VNFaceObservation]
// Draw faces
DispatchQueue.main.async {
drawFaceGeometry(observations: observations)
}
}
// Setup the image request
let request = VNDetectFaceLandmarksRequest(completionHandler: completion)
// Handle the request
do {
try handler.perform([request])
} catch {
print(error)
}
Sample of SCNView Setup:
// Setup SCNView
let scnView = SCNView()
scnView.translatesAutoresizingMaskIntoConstraints = false
self.view.addSubview(scnView)
scnView.showsStatistics = true
NSLayoutConstraint.activate([
scnView.leadingAnchor.constraint(equalTo: self.view.leadingAnchor),
scnView.topAnchor.constraint(equalTo: self.view.topAnchor),
scnView.bottomAnchor.constraint(equalTo: self.view.bottomAnchor),
scnView.trailingAnchor.constraint(equalTo: self.view.trailingAnchor)
])
// Setup scene
let scene = SCNScene()
scnView.scene = scene
// Setup camera
let cameraNode = SCNNode()
let camera = SCNCamera()
cameraNode.camera = camera
scnView.scene?.rootNode.addChildNode(cameraNode)
cameraNode.position = SCNVector3(x: 0, y: 0, z: 16)
// Setup light
let ambientLightNode = SCNNode()
ambientLightNode.light = SCNLight()
ambientLightNode.light?.type = SCNLight.LightType.ambient
ambientLightNode.light?.color = UIColor.darkGray
scnView.scene?.rootNode.addChildNode(ambientLightNode)
Sample of "face processing"
func drawFaceGeometry(observations: [VNFaceObservation]) {
// An array of face nodes, one SCNNode for each detected face
var faceNode = [SCNNode]()
// The origin point
let projectedOrigin = sceneView.projectPoint(SCNVector3Zero)
// Iterate through each found face
for observation in observations {
// Setup a SCNNode for the face
let face = SCNNode()
// Setup the found bounds
let faceBounds = VNImageRectForNormalizedRect(observation.boundingBox, Int(self.scnView.bounds.width), Int(self.scnView.bounds.height))
// Verify we have landmarks
if let landmarks = observation.landmarks {
// Landmarks are relative to and normalized within face bounds
let affineTransform = CGAffineTransform(translationX: faceBounds.origin.x, y: faceBounds.origin.y)
.scaledBy(x: faceBounds.size.width, y: faceBounds.size.height)
// Add all points as vertices
var vertices = [SCNVector3]()
// Verify we have points
if let allPoints = landmarks.allPoints {
// Iterate through each point
for point in allPoints.normalizedPoints {
// Apply the transform to convert each point to the face's bounding box range
let normalizedPoint = point.applying(affineTransform)
let projected = SCNVector3(normalizedPoint.x, normalizedPoint.y, CGFloat(projectedOrigin.z))
let unprojected = sceneView.unprojectPoint(projected)
vertices.append(unprojected)
}
}
// Setup Indices
var indices = [UInt16]()
// Add indices
// ... Removed for brevity ...
// Setup texture coordinates
var coordinates = [CGPoint]()
// Add texture coordinates
// ... Removed for brevity ...
// Setup texture image
let imageWidth = 2048.0
let normalizedCoordinates = coordinates.map { coord -> CGPoint in
let x = coord.x / CGFloat(imageWidth)
let y = coord.y / CGFloat(imageWidth)
let textureCoord = CGPoint(x: x, y: y)
return textureCoord
}
// Setup sources
let sources = SCNGeometrySource(vertices: vertices)
let textureCoordinates = SCNGeometrySource(textureCoordinates: normalizedCoordinates)
// Setup elements
let elements = SCNGeometryElement(indices: indices, primitiveType: .triangles)
// Setup Geometry
let geometry = SCNGeometry(sources: [sources, textureCoordinates], elements: [elements])
geometry.firstMaterial?.diffuse.contents = textureImage
// Setup node
let customFace = SCNNode(geometry: geometry)
sceneView.scene?.rootNode.addChildNode(customFace)
// Append the face to the face nodes array
faceNode.append(face)
}
// Iterate the face nodes and append to the scene
for node in faceNode {
sceneView.scene?.rootNode.addChildNode(node)
}
}
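One possible contributor, offered as an assumption rather than a confirmed fix: when the camera preview fills the view with aspect-fill, normalized Vision coordinates must be mapped through the buffer-to-view scaling rather than directly through the view's bounds, since the cropped overflow shifts every point. A hypothetical helper:

```swift
import CoreGraphics

// Sketch: map a normalized (0...1) point from buffer space into view
// space under aspect-fill, where the buffer is scaled uniformly to cover
// the view and the overflow is cropped equally on both sides.
func viewPoint(forNormalized p: CGPoint,
               bufferSize: CGSize,
               viewSize: CGSize) -> CGPoint {
    let scale = max(viewSize.width / bufferSize.width,
                    viewSize.height / bufferSize.height)
    let scaledSize = CGSize(width: bufferSize.width * scale,
                            height: bufferSize.height * scale)
    let xOffset = (scaledSize.width - viewSize.width) / 2
    let yOffset = (scaledSize.height - viewSize.height) / 2
    return CGPoint(x: p.x * scaledSize.width - xOffset,
                   y: p.y * scaledSize.height - yOffset)
}
```

Rotating the device changes both the buffer orientation and this crop, which would explain why only one orientation lines up.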