Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Posts under Vision tag

90 Posts

Interview: University Research on Vision Machine Learning
Hello, I am Pieter Bikkel. I study Software Engineering at HAN University of Applied Sciences, and I am working on an app that can recognize volleyball actions using machine learning. A volleyball coach can put an iPhone on a tripod and analyze a volleyball match, for example, where the ball lands in the field and how hard it is served. I was inspired by this session and wondered whether I could interview one of the experts in this field, which would allow me to develop my app even better. I hope you can help me with this.
Replies: 0 · Boosts: 0 · Views: 414 · Activity: Sep ’23
How to produce an image with semanticSegmentationSkyMatteImage information on iOS?
I'm trying to create a sky mask for pictures taken on my iPhone. The documentation says Core Image supports semantic segmentation for sky, in addition to the person types (skin, hair, etc.), but so far I haven't found the proper workflow to use it. First, I watched https://developer.apple.com/videos/play/wwdc2019/225/ and understood that images must be captured with segmentation enabled, with code like this:

    photoSettings.enabledSemanticSegmentationMatteTypes = self.photoOutput.availableSemanticSegmentationMatteTypes
    photoSettings.embedsSemanticSegmentationMattesInPhoto = true

I capture the image on my iPhone, save it in HEIC format, and later try to load the matte like this:

    let skyMatte = CIImage(contentsOf: imageURL, options: [.auxiliarySemanticSegmentationSkyMatte: true])

Unfortunately, self.photoOutput.availableSemanticSegmentationMatteTypes only ever gives me the person-related types and never a sky type. In fact, AVSemanticSegmentationMatte.MatteType is just [Hair, Skin, Teeth, Glasses]... no Sky! So how am I supposed to use semanticSegmentationSkyMatteImage? Is there any simple workaround?
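For reference, a minimal sketch of the loading side (not from the original post), assuming the photo already has a sky matte embedded, for example one captured by the system Camera app; the imageURL parameter and the skyMaskedImage helper name are illustrative, and the matte will simply be nil if the file carries no sky segmentation:

    import CoreImage

    // Hypothetical helper: returns a sky-masked version of the photo at imageURL,
    // or nil if no sky matte is embedded in the file.
    func skyMaskedImage(at imageURL: URL) -> CIImage? {
        // The auxiliary sky matte is only present if it was embedded at capture time.
        guard let skyMatte = CIImage(contentsOf: imageURL,
                                     options: [.auxiliarySemanticSegmentationSkyMatte: true]),
              let baseImage = CIImage(contentsOf: imageURL) else {
            return nil
        }

        // The matte is lower resolution than the photo; scale it up to match.
        let scaleX = baseImage.extent.width / skyMatte.extent.width
        let scaleY = baseImage.extent.height / skyMatte.extent.height
        let scaledMatte = skyMatte.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

        // Use the matte as a mask: sky pixels come from the photo, everything else is cleared.
        let blend = CIFilter(name: "CIBlendWithMask", parameters: [
            kCIInputImageKey: baseImage,
            kCIInputBackgroundImageKey: CIImage.empty(),
            kCIInputMaskImageKey: scaledMatte
        ])
        return blend?.outputImage
    }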
Replies: 1 · Boosts: 0 · Views: 699 · Activity: Aug ’23
RealityView is not responding to tap gesture
Hello, I have created a view with a full-screen 360° image, and I need to perform a task when the user taps anywhere on the screen (to leave the dome), but no matter what I try, the gesture just does not fire; it doesn't print anything at all.

    import SwiftUI
    import RealityKit
    import RealityKitContent

    struct StreetWalk: View {
        @Binding var threeSixtyImage: String
        @Binding var isExitFaded: Bool

        var body: some View {
            RealityView { content in
                // Create a material with a 360 image
                guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
                      let resource = try? await TextureResource(contentsOf: url) else {
                    // If the asset isn't available, something is wrong with the app.
                    fatalError("Unable to load starfield texture.")
                }
                var material = UnlitMaterial()
                material.color = .init(texture: .init(resource))

                // Attach the material to a large sphere.
                let streeDome = Entity()
                streeDome.name = "streetDome"
                streeDome.components.set(ModelComponent(
                    mesh: .generatePlane(width: 1000, depth: 1000),
                    materials: [material]
                ))

                // Ensure the texture image points inward at the viewer.
                streeDome.scale *= .init(x: -1, y: 1, z: 1)
                content.add(streeDome)
            } update: { updatedContent in
                // Create a material with a 360 image
                guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
                      let resource = try? TextureResource.load(contentsOf: url) else {
                    // If the asset isn't available, something is wrong with the app.
                    fatalError("Unable to load starfield texture.")
                }
                var material = UnlitMaterial()
                material.color = .init(texture: .init(resource))

                updatedContent.entities.first?.components.set(ModelComponent(
                    mesh: .generateSphere(radius: 1000),
                    materials: [material]
                ))
            }
            .gesture(tap)
        }

        var tap: some Gesture {
            SpatialTapGesture().targetedToAnyEntity().onChanged { value in
                // Access the tapped entity here.
                print(value.entity)
                print("maybe you can tap the dome")
                // isExitFaded.toggle()
            }
        }
    }
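One likely cause, offered as a hedged suggestion rather than a confirmed fix: targeted spatial gestures only hit entities that carry both an InputTargetComponent and a CollisionComponent, and the dome entity above never receives either. A minimal sketch of that addition, where the collision shape and radius are assumptions meant to roughly match the dome:

    // Inside the RealityView make closure, after configuring streeDome.
    // Without these two components, SpatialTapGesture().targetedToAnyEntity()
    // has nothing to hit-test against, so onChanged never fires.
    streeDome.components.set(InputTargetComponent())
    streeDome.components.set(CollisionComponent(
        shapes: [.generateSphere(radius: 1000)]   // assumed to approximate the dome's size
    ))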
Replies: 1 · Boosts: 0 · Views: 1.4k · Activity: Aug ’23
ActionClassifier in SwiftUI App
Hello, I am reaching out for some assistance with integrating a Core ML action classifier into a SwiftUI app. Specifically, I am trying to make the classifier work with the device's live camera. I have been doing some research, but unfortunately I have not been able to find any relevant information on this topic. If you could point me to any examples, resources, or information that would help me achieve this integration, any guidance would be greatly appreciated. Thank you in advance for your help and support.
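A rough sketch of the usual pipeline (not from the original post), under the assumption that the classifier is a Create ML action classifier consuming a sliding window of body-pose keypoints; the ActionClassifierPipeline class, the VolleyballActionClassifier model name, the 60-frame window, and the prediction input name are all hypothetical and depend on the exported model:

    import Vision
    import CoreML

    final class ActionClassifierPipeline {
        // Hypothetical Create ML action classifier; the real class name comes from the .mlmodel file.
        // let classifier = try? VolleyballActionClassifier(configuration: MLModelConfiguration())

        private var poseWindow: [MLMultiArray] = []
        private let windowSize = 60   // assumed frame count the model was trained on

        // Call this for every CVPixelBuffer delivered by AVCaptureVideoDataOutput.
        func process(pixelBuffer: CVPixelBuffer) {
            let request = VNDetectHumanBodyPoseRequest()
            // .right assumes a portrait device feeding landscape-oriented buffers.
            let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: [:])
            try? handler.perform([request])

            guard let observation = request.results?.first as? VNHumanBodyPoseObservation,
                  let keypoints = try? observation.keypointsMultiArray() else { return }

            // Accumulate a sliding window of per-frame keypoint arrays.
            poseWindow.append(keypoints)
            if poseWindow.count == windowSize {
                // Concatenate the window and run the classifier here, e.g.
                // let input = try? MLMultiArray(concatenating: poseWindow, axis: 0, dataType: .float32)
                // let prediction = try? classifier?.prediction(poses: input)
                poseWindow.removeAll()
            }
        }
    }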
Replies: 1 · Boosts: 0 · Views: 932 · Activity: Aug ’23
Question regarding the coordinate space of bounding boxes reported by VNFaceObservation
I am trying to use VNDetectFaceRectanglesRequest to detect face bounding boxes on frames obtained from ARKit callbacks. My app is locked to portrait device orientation, and I am passing .right as the orientation to the perform method on VNSequenceRequestHandler, something like:

    private let requestHandler = VNSequenceRequestHandler()
    private var facePoseRequest: VNDetectFaceRectanglesRequest!
    // ...
    try? self.requestHandler.perform([self.facePoseRequest], on: currentBuffer, orientation: orientation)

I'm setting .right for the orientation above in the hope that the Vision framework will re-orient the buffer before running inference. I'm then trying to draw the returned bounding box on top of the image. Here's my results-processing code:

    guard let faceRes = self.facePoseRequest.results?.first as? VNFaceObservation else { return }

    let currBufWidth = CVPixelBufferGetWidth(currentBuffer)
    let currBufHeight = CVPixelBufferGetHeight(currentBuffer)

    // Option 1: Assume the reported bounding box is in the coordinate space of the
    // orientation-adjusted pixel buffer.
    // Problems/observations:
    // - The bounding box turns into a square with equal width and height.
    // - It also does not cover the entire face, but only from chin to eyes.
    // Notice that height and width are flipped below.
    let flippedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufHeight, currBufWidth)

    // Option 2: Assume the reported bounding box is in the coordinate system of the
    // original, un-oriented pixel buffer.
    // Problems/observations:
    // - The drawn box does look like a rectangle and covers most of the face,
    //   but it is not always centered on the face; it moves around the screen
    //   when I tilt the device or my face.
    let reportedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufWidth, currBufHeight)

In Option 1, the bounding box becomes a square with equal width and height. I noticed that the reported normalized box has the same aspect ratio as the input pixel buffer (1.33), which is why width and height become equal when I flip those parameters in VNImageRectForNormalizedRect. In Option 2, the box seems to be roughly the right size, but it jumps around when I tilt the device or my head.

What coordinate system are the reported bounding boxes in? Do I need to adjust for the y-flipped coordinate space of the Vision framework before performing the operations above? And what's the best way to draw these boxes on the captured frame and/or the ARView? Thank you
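For reference, a minimal sketch of one common interpretation (not from the original post): it assumes the normalized box is expressed in the coordinate space of the buffer after the .right orientation is applied (so width and height swap relative to the raw buffer) and that Vision's origin is in the lower left. The drawableRect helper name and the viewSize parameter are illustrative:

    import Vision
    import UIKit

    // Hypothetical helper: map a Vision normalized rect (lower-left origin, normalized
    // to the *oriented* image) into a top-left-origin rect for drawing in a view.
    func drawableRect(for normalizedBox: CGRect,
                      orientedImageSize: CGSize,
                      viewSize: CGSize) -> CGRect {
        // 1. Scale the normalized rect up to oriented-image pixels.
        let imageRect = VNImageRectForNormalizedRect(normalizedBox,
                                                     Int(orientedImageSize.width),
                                                     Int(orientedImageSize.height))
        // 2. Flip the y-axis (Vision is lower-left origin, UIKit is top-left).
        let flipped = CGRect(x: imageRect.origin.x,
                             y: orientedImageSize.height - imageRect.maxY,
                             width: imageRect.width,
                             height: imageRect.height)
        // 3. Scale from oriented-image pixels to view points; this assumes the view
        //    shows the full image without aspect-fill cropping, which is itself an assumption.
        let scaleX = viewSize.width / orientedImageSize.width
        let scaleY = viewSize.height / orientedImageSize.height
        return CGRect(x: flipped.origin.x * scaleX,
                      y: flipped.origin.y * scaleY,
                      width: flipped.width * scaleX,
                      height: flipped.height * scaleY)
    }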
Replies: 0 · Boosts: 0 · Views: 455 · Activity: Aug ’23
How to control a rigged 3d hand model via hand motion capture?
Hi, I want to control a hand model via hand motion capture. I know there is a sample project and some articles about rigging a model for motion capture in the ARKit documentation, but that solution is entirely encapsulated in BodyTrackedEntity, and I can't find an appropriate entity for controlling just a hand model. Using VNDetectHumanHandPoseRequest from the Vision framework, I can get hand joint info, but I don't know how to use that info in RealityKit to control a 3D hand model. Do you know how to do that, or do you have any idea of how it should be implemented? Thanks
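A rough sketch of one possible direction (not a confirmed, supported pipeline), with heavy assumptions: it presumes the hand model is loaded as a ModelEntity whose skeleton exposes jointNames/jointTransforms, that the rig contains a joint named something like "index_tip", and that deriving a rotation from a single 2D Vision point is acceptable for illustration:

    import Vision
    import RealityKit

    // Hypothetical: drive one joint of a rigged hand ModelEntity from a Vision hand pose.
    func updateHandModel(_ handModel: ModelEntity, from pixelBuffer: CVPixelBuffer) {
        let request = VNDetectHumanHandPoseRequest()
        request.maximumHandCount = 1

        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])

        guard let observation = request.results?.first as? VNHumanHandPoseObservation,
              let points = try? observation.recognizedPoints(.all),
              let indexTip = points[.indexTip], indexTip.confidence > 0.3 else { return }

        // Find the rig joint to drive; "index_tip" is an assumed joint name and
        // depends entirely on how the 3D model was rigged.
        guard let jointIndex = handModel.jointNames.firstIndex(where: { $0.contains("index_tip") }) else { return }

        // Crude illustration: turn the normalized 2D point into a small joint rotation.
        // A real solution would solve joint angles from several keypoints together.
        var transforms = handModel.jointTransforms
        let angle = Float(indexTip.location.y) * .pi / 4
        transforms[jointIndex].rotation = simd_quatf(angle: angle, axis: [1, 0, 0])
        handModel.jointTransforms = transforms
    }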
Replies: 0 · Boosts: 0 · Views: 598 · Activity: Aug ’23
Properly projecting points with different orientations and camera positions?
Summary: I am using the Vision framework, in conjunction with AVFoundation, to detect the facial landmarks of each face in the camera feed (by way of VNDetectFaceLandmarksRequest). From there, I take the resulting observations and unproject each point into a SceneKit view (SCNView), then use those points as the vertices of a custom geometry that is textured with a material over each found face. Effectively, I am trying to recreate how ARFaceTrackingConfiguration functions. In general, this works as expected, but only when the device is using the front camera in landscape-right orientation. When I rotate the device, or switch to the rear camera, the unprojected points no longer align with the detected face the way they do with landscape right and the front camera.

Problem: When testing this code, the mesh appears correctly (that is, it appears affixed to the user's face), but again, only with the front camera in landscape right. While the code runs in all orientations (that is, it generates a face mesh for each detected face), the mesh is wildly misaligned in every other case. I believe the issue stems from how I convert the face's bounding box (using VNImageRectForNormalizedRect, which I calculate with the width/height of my SCNView rather than of my pixel buffer, which is typically much larger), though every modification I have tried results in the same issue. Beyond that, it could also be an issue with my SCNCamera, as I am a bit unsure how the transform/projection matrix works and whether it is needed here.

Sample of Vision request setup:

    // Setup Vision request options
    var requestHandlerOptions: [VNImageOption: AnyObject] = [:]

    // Setup camera intrinsics
    let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil)
    if cameraIntrinsicData != nil {
        requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData
    }

    // Set EXIF orientation
    let exifOrientation = self.exifOrientationForCurrentDeviceOrientation()

    // Setup vision request handler
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientation, options: requestHandlerOptions)

    // Setup the completion handler
    let completion: VNRequestCompletionHandler = { request, error in
        let observations = request.results as! [VNFaceObservation]
        // Draw faces
        DispatchQueue.main.async {
            drawFaceGeometry(observations: observations)
        }
    }

    // Setup the image request
    let request = VNDetectFaceLandmarksRequest(completionHandler: completion)

    // Handle the request
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }

Sample of SCNView setup:

    // Setup SCNView
    let scnView = SCNView()
    scnView.translatesAutoresizingMaskIntoConstraints = false
    self.view.addSubview(scnView)
    scnView.showsStatistics = true
    NSLayoutConstraint.activate([
        scnView.leadingAnchor.constraint(equalTo: self.view.leadingAnchor),
        scnView.topAnchor.constraint(equalTo: self.view.topAnchor),
        scnView.bottomAnchor.constraint(equalTo: self.view.bottomAnchor),
        scnView.trailingAnchor.constraint(equalTo: self.view.trailingAnchor)
    ])

    // Setup scene
    let scene = SCNScene()
    scnView.scene = scene

    // Setup camera
    let cameraNode = SCNNode()
    let camera = SCNCamera()
    cameraNode.camera = camera
    scnView.scene?.rootNode.addChildNode(cameraNode)
    cameraNode.position = SCNVector3(x: 0, y: 0, z: 16)

    // Setup light
    let ambientLightNode = SCNNode()
    ambientLightNode.light = SCNLight()
    ambientLightNode.light?.type = SCNLight.LightType.ambient
    ambientLightNode.light?.color = UIColor.darkGray
    scnView.scene?.rootNode.addChildNode(ambientLightNode)

Sample of "face processing":

    func drawFaceGeometry(observations: [VNFaceObservation]) {
        // An array of face nodes, one SCNNode for each detected face
        var faceNode = [SCNNode]()

        // The origin point
        let projectedOrigin = sceneView.projectPoint(SCNVector3Zero)

        // Iterate through each found face
        for observation in observations {
            // Setup a SCNNode for the face
            let face = SCNNode()

            // Setup the found bounds
            let faceBounds = VNImageRectForNormalizedRect(observation.boundingBox, Int(self.scnView.bounds.width), Int(self.scnView.bounds.height))

            // Verify we have landmarks
            if let landmarks = observation.landmarks {
                // Landmarks are relative to and normalized within face bounds
                let affineTransform = CGAffineTransform(translationX: faceBounds.origin.x, y: faceBounds.origin.y)
                    .scaledBy(x: faceBounds.size.width, y: faceBounds.size.height)

                // Add all points as vertices
                var vertices = [SCNVector3]()

                // Verify we have points
                if let allPoints = landmarks.allPoints {
                    // Iterate through each point
                    for (index, point) in allPoints.normalizedPoints.enumerated() {
                        // Apply the transform to convert each point to the face's bounding box range
                        _ = index
                        let normalizedPoint = point.applying(affineTransform)
                        let projected = SCNVector3(normalizedPoint.x, normalizedPoint.y, CGFloat(projectedOrigin.z))
                        let unprojected = sceneView.unprojectPoint(projected)
                        vertices.append(unprojected)
                    }
                }

                // Setup indices
                var indices = [UInt16]()
                // Add indices
                // ... Removed for brevity ...

                // Setup texture coordinates
                var coordinates = [CGPoint]()
                // Add texture coordinates
                // ... Removed for brevity ...

                // Setup texture image
                let imageWidth = 2048.0
                let normalizedCoordinates = coordinates.map { coord -> CGPoint in
                    let x = coord.x / CGFloat(imageWidth)
                    let y = coord.y / CGFloat(imageWidth)
                    let textureCoord = CGPoint(x: x, y: y)
                    return textureCoord
                }

                // Setup sources
                let sources = SCNGeometrySource(vertices: vertices)
                let textureCoordinates = SCNGeometrySource(textureCoordinates: normalizedCoordinates)

                // Setup elements
                let elements = SCNGeometryElement(indices: indices, primitiveType: .triangles)

                // Setup geometry
                let geometry = SCNGeometry(sources: [sources, textureCoordinates], elements: [elements])
                geometry.firstMaterial?.diffuse.contents = textureImage

                // Setup node
                let customFace = SCNNode(geometry: geometry)
                sceneView.scene?.rootNode.addChildNode(customFace)
            }

            // Append the face to the face nodes array
            faceNode.append(face)
        }

        // Iterate the face nodes and append to the scene
        for node in faceNode {
            sceneView.scene?.rootNode.addChildNode(node)
        }
    }
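Since the misalignment only appears when the orientation or camera changes, one thing worth checking is the exifOrientationForCurrentDeviceOrientation() helper referenced above. The sketch below shows one common mapping (not the poster's actual implementation), assuming a mirrored front camera and an un-mirrored rear camera; the function name, the cameraPosition parameter, and the exact mapping are assumptions:

    import UIKit
    import AVFoundation
    import ImageIO

    // Hypothetical mapping from device orientation plus camera position to the EXIF
    // orientation handed to VNImageRequestHandler. The front camera is mirrored,
    // so it uses the *Mirrored variants.
    func exifOrientation(for deviceOrientation: UIDeviceOrientation,
                         cameraPosition: AVCaptureDevice.Position) -> CGImagePropertyOrientation {
        switch deviceOrientation {
        case .portraitUpsideDown:
            return cameraPosition == .front ? .rightMirrored : .left
        case .landscapeLeft:
            return cameraPosition == .front ? .downMirrored : .up
        case .landscapeRight:
            return cameraPosition == .front ? .upMirrored : .down
        default:
            // .portrait, .faceUp, .faceDown, .unknown
            return cameraPosition == .front ? .leftMirrored : .right
        }
    }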
Replies: 3 · Boosts: 0 · Views: 1.8k · Activity: Jul ’23