Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Posts under Vision tag

102 Posts

Post

Replies

Boosts

Views

Activity

**Title:** Front-Facing Camera Rotation Matrix in ARKit: Consistency, Transformations, and `ARFrame.camera` Alignment
I'm seeking detailed information about the rotation matrix of the iPhone's front-facing (selfie) camera when using ARKit. Specifically, I need to understand: The exact rotation matrix applied to the front-facing camera's output in ARKit. Whether this matrix is consistent across all iPhone models or if there are variations. If there are any transformations applied to align the camera's coordinate system with the device's orientation, particularly in portrait mode. How this rotation matrix relates to the transform property of `ARFrame.camera`
0
0
222
2w
Unable to Get Result from DetectHorizonRequest - Result is nil
I am using Apple’s Vision framework with DetectHorizonRequest to detect the horizon in an image. Here is my code: func processHorizonImage(_ ciImage: CIImage) async { let request = DetectHorizonRequest() do { let result = try await request.perform(on: ciImage) print(result) } catch { print(error) } } After calling the perform method, I am getting result as nil. To ensure the request's correctness, I have verified the following: The input CIImage is valid and contains a visible horizon. No errors are being thrown. The relevant frameworks are properly imported. Given that my image contains a clear horizon, why am I still not getting any results? I would appreciate any help or suggestions to resolve this issue. Thank you for your support! This is the image
0
0
186
3w
"failed to processImage" in videoProcessor
Hello, I’m working on a program that analyzes video files frame by frame to detect human poses in each frame. However, during the process of reading observations from the stream, the analysis frequently stops with the following error: [LOG_ERROR] /Library/Caches/com.apple.xbs/Sources/MediaAnalysis/VideoProcessing/VCPHumanPoseImageRequest.mm[85]: code -18 [LOG_ERROR] /Library/Caches/com.apple.xbs/Sources/MediaAnalysis/VideoProcessing/VCPHumanPoseImageRequest.mm[178]: code -18 The error was caught and printed using a do-catch block, and here is the output: Error Domain=NSOSStatusErrorDomain Code=-18 "Error: failed to processImage" UserInfo={NSLocalizedDescription=Error: failed to processImage} While the do-catch block helps prevent the app from crashing, the frames following the error cannot be analyzed. I’m hoping to understand the cause of this error, or find a way to skip the problematic frames and continue analyzing the subsequent ones. My development environment is Xcode Version 16.0 (16A242d) and iOS 18.0. Thank you for your help. (Attaching my code below.) let videoProcessor = VideoProcessor(videoURL) let bodyPoseRequest = DetectHumanBodyPoseRequest() let asset = AVURLAsset(url: videoURL) let videoTrack = try await asset.loadTracks(withMediaType: .video).first let bodyPoseStream = try await videoProcessor.addRequest(bodyPoseRequest) videoProcessor.startAnalysis() do { for try await observations in bodyPoseStream { guard let observation = observations.first else { continue } if let timeRange = observation.timeRange { /// do something... } } } catch { print("\(error.localizedDescription)") }
0
0
146
3w
New Vision API
Hey everyone, I've been updating my code to take advantage of the new Vision API for text recognition in macOS 15. I'm noticing some very odd behavior, though: in general, the new Vision API seems to consistently produce worse results than the old API. For reference, here is how I'm setting up my request: var request = RecognizeTextRequest() request.recognitionLevel = getOCRMode() // generally accurate request.usesLanguageCorrection = !disableLanguageCorrection // generally true request.recognitionLanguages = language.split(separator: ",").map { Locale.Language(identifier: String($0)) } // generally 'en' let observations = try? await request.perform(on: image) as [RecognizedTextObservation] Then I process the results and take the top candidate, which, as mentioned above, is typically of worse quality than the same request performed with the old API. Am I doing something wrong here?
0
0
187
3w
Vision framework OCR missing Swedish support?
WWDC 2024 mentioned that the OCR feature from the Vision framework has support for "Korean, Swedish, and Chinese", but the Swedish support does not seem to be available... Running either print(try? VNRecognizeTextRequest().supportedRecognitionLanguages()) or var ocrRequest = RecognizeTextRequest(.revision3) print(ocrRequest.supportedRecognitionLanguages) did not print out Swedish as one of the supported languages, but Korean and Chinese are. Tested on early versions of iOS 18 developer beta, and the latest version of iOS 18.1 (22B5054e).
1
0
258
Oct ’24
Use Vision framework to detect a graph in Swift
I would like to let the user aim the camera at a graph (including axes and scales) so that the app detects the graph and replicates it from the image. I have the whole camera setup finished with an AVCaptureSession, VNDetectContoursRequest, VNImageRequestHandler, etc. However, I now get a very large number of results, so I guess I need to tell the image processing what I am looking for, i.e. filter the VNContoursObservations. I think I first need to detect two perpendicular lines (the two axes). How do I do that? If I do not see them, I can just ignore that input and wait for the next VNContoursObservation. Once I have found the axes of the graph, I will need to find the curve (graph) that I need to scan. Any tips on how I can find that curve and turn it into a list of coordinates? Thanks! Wouter
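For the "two perpendicular lines" step, one possible approach is to compare candidate line segments pairwise and keep pairs whose angle is close to 90 degrees. The following is only a minimal sketch: it assumes the contours have already been reduced to straight segments, and `Point`, `Segment`, `angleBetween`, `isRoughlyPerpendicular` and the 5-degree tolerance are hypothetical names and values chosen for illustration, not part of Vision.

```swift
import Foundation

// Plain value types (hypothetical) so the sketch has no framework dependency.
struct Point { var x: Double; var y: Double }
struct Segment { var start: Point; var end: Point }

/// Returns the angle between the two segments' directions in degrees (0...90).
func angleBetween(_ a: Segment, _ b: Segment) -> Double {
    let ax = a.end.x - a.start.x, ay = a.end.y - a.start.y
    let bx = b.end.x - b.start.x, by = b.end.y - b.start.y
    let dot = ax * bx + ay * by
    let cross = ax * by - ay * bx
    // Taking absolute values ignores which way each segment points.
    return atan2(abs(cross), abs(dot)) * 180 / .pi
}

/// True when the two segments are within `toleranceDegrees` of a right angle.
/// The default tolerance is an assumption; tune it for real camera input.
func isRoughlyPerpendicular(_ a: Segment, _ b: Segment,
                            toleranceDegrees: Double = 5) -> Bool {
    angleBetween(a, b) >= 90 - toleranceDegrees
}
```

A pair that passes this check and also shares an endpoint near a common corner would be a plausible candidate for the two axes; frames without such a pair can be skipped.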
1
0
280
Oct ’24
Palm Menu Button Issue
Hi, our app has an immersive space, and we thought the palm menu button was not available in immersive spaces, but when I look at my hand and tap, the menu button appears. Is it possible to keep it hidden? We have a hand-tracking feature in the palm, and when we try to press a button that overlaps the palm, it triggers the menu button; when the user then presses again by mistake, it sends the application to the background. This is very important for us because we would like to release this hand-tracking feature as soon as possible. Here is a link to a video showing the problem: https://drive.google.com/file/d/1cfOcdzF19h_mbmpvkVNCJjXEBJecVeJL/view?usp=sharing
1
0
301
Sep ’24
Issue with OCR on Swift iOS App: Roboflow API Bounding Boxes Missing After Response
Hi everyone, I'm working on an iOS app built in Swift using Xcode, where I'm integrating Roboflow's object detection API to extract items from grocery receipts. My goal is to identify key information (like items, total, tax, etc.) from the images of these receipts. I'm successfully sending images to the Roboflow API and receiving predictions with bounding box data, but when I attempt to extract text from the detected regions (bounding boxes), it appears that the text extraction is failing—no text is being recognized. The issue seems to be that the bounding boxes are either not properly being handled or something is going wrong in the way I process the API response. Here's a brief breakdown of what I'm doing: The image is captured, converted to base64, and sent to the Roboflow API. The API response comes back with bounding boxes for the detected elements (items, date, subtotal, etc.). The problem occurs when I try to extract the text from the image using the bounding box data—it seems like the bounding boxes are being found, but no text is returned. I suspect the issue might be happening because the app’s segue to the results view controller is triggered before the OCR extraction completes, or there might be a problem in my code handling the bounding box response. Response Data: { "inference_id": "77134cce-91b5-4600-a59b-fab74350ca06", "time": 0.09240847699993537, "image": { "width": 370, "height": 502 }, "predictions": [ { "x": 163.5, "y": 250.5, "width": 313.0, "height": 127.0, "confidence": 0.9357666373252869, "class": "Item", "class_id": 1, "detection_id": "753341d5-07b6-42a1-8926-ecbc61128243" }, { "x": 52.5, "y": 417.5, "width": 89.0, "height": 23.0, "confidence": 0.8819760680198669, "class": "Date", "class_id": 0, "detection_id": "b4681149-d538-47b1-8700-d9528bf1daa0" }, ... 
] } And the log showing bounding boxes: Prediction: ["width": 313, "y": 250.5, "x": 163.5, "detection_id": 753341d5-07b6-42a1-8926-ecbc61128243, "class": Item, "height": 127, "confidence": 0.9357666373252869, "class_id": 1] No bounding box found in prediction. I've double-checked the bounding box coordinates, and everything seems fine. Does anyone have experience with using OCR alongside object detection APIs in Swift? Any help on how to ensure the bounding boxes are properly processed and used for OCR would be greatly appreciated! Also, would it help to delay the segue to the results view controller until OCR is complete? Thank you!
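The response shown above carries `x`, `y`, `width` and `height` directly on each prediction, so the "No bounding box found in prediction" log line may simply mean the parsing code looks for a key that is not there. The following is a minimal sketch of decoding those fields and turning them into a crop rectangle. It assumes `x` and `y` are the box center in pixels with a top-left origin; the type and function names (`Prediction`, `ImageSize`, `InferenceResponse`, `Box`, `pixelBox`, `normalizedBox`) are hypothetical and not part of any Roboflow or Apple API.

```swift
import Foundation

// Hypothetical model of one entry in the "predictions" array.
struct Prediction: Decodable {
    let x: Double
    let y: Double
    let width: Double
    let height: Double
    let confidence: Double
    let label: String

    enum CodingKeys: String, CodingKey {
        case x, y, width, height, confidence
        case label = "class"   // "class" is a reserved word in Swift
    }
}

struct ImageSize: Decodable {
    let width: Double
    let height: Double
}

struct InferenceResponse: Decodable {
    let image: ImageSize
    let predictions: [Prediction]
}

// Plain rectangle type so the sketch does not depend on CoreGraphics.
struct Box: Equatable {
    var x: Double, y: Double, width: Double, height: Double
}

// Assumption: (x, y) is the box center in pixels, origin at the top-left.
func pixelBox(for p: Prediction) -> Box {
    Box(x: p.x - p.width / 2, y: p.y - p.height / 2,
        width: p.width, height: p.height)
}

// Normalized rectangle with a lower-left origin, the convention Vision
// uses for a request's regionOfInterest.
func normalizedBox(for p: Prediction, in image: ImageSize) -> Box {
    let b = pixelBox(for: p)
    return Box(x: b.x / image.width,
               y: 1 - (b.y + b.height) / image.height,
               width: b.width / image.width,
               height: b.height / image.height)
}
```

With the first prediction from the response, `pixelBox` gives a rectangle starting at (7, 187) with size 313 × 127. On the segue question: if text recognition runs asynchronously, presenting the results screen only from its completion path avoids showing an empty result.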
0
0
304
Sep ’24
The Vision request does not work in simulator with Error "Could not create inference context"
When I use VNGenerateForegroundInstanceMaskRequest to generate the mask in the simulator from a SwiftUI app, I get the error "Could not create inference context". I then added code to force Vision to run on the CPU: let request = VNGenerateForegroundInstanceMaskRequest() let handler = VNImageRequestHandler(ciImage: inputImage) #if targetEnvironment(simulator) if #available(iOS 18.0, *) { let allDevices = MLComputeDevice.allComputeDevices for device in allDevices { if(device.description.contains("MLCPUComputeDevice")){ request.setComputeDevice(.some(device), for: .main) break } } } else { // Fallback on earlier versions request.usesCPUOnly = true } #endif do { try handler.perform([request]) if let result = request.results?.first { let mask = try result.generateScaledMaskForImage(forInstances: result.allInstances, from: handler) return CIImage(cvPixelBuffer: mask) } } catch { print(error) } Even though I force the simulator to run the code on the CPU, I still get the error: "Could not create inference context"
2
0
307
Sep ’24
Detect animal poses in Vision: Detected joints and connection are drawn correctly only on iPhone without ignoring safe area
Hi, I'm trying to customize the Detect animal poses in Vision example (WWDC 23). After some tests, I saw that the landmark and connection drawings work only if I do not ignore the safe area; if I ignore it (removing the toggle) or use the app on the iPad, the drawings are no longer positioned correctly. In the example, GeometryReader is used to detect the size of the view: ... ZStack { GeometryReader { geo in AnimalSkeletonView(animalJoint: animalJoint, size: geo.size) } }.frame(maxWidth: .infinity) ... struct AnimalSkeletonView: View { // Get the animal joint locations. @StateObject var animalJoint = AnimalPoseDetector() var size: CGSize var body: some View { DisplayView(animalJoint: animalJoint) if animalJoint.animalBodyParts.isEmpty == false { // Draw the skeleton of the animal. // Iterate over all recognized points and connect the joints. ZStack { ZStack { // left head if let nose = animalJoint.animalBodyParts[.nose] { if let leftEye = animalJoint.animalBodyParts[.leftEye] { Line(points: [nose.location, leftEye.location], size: size) .stroke(lineWidth: 5.0) .fill(Color.orange) } } ... } } } } } // Create a transform that converts the pose's normalized point. struct Line: Shape { var points: [CGPoint] var size: CGSize func path(in rect: CGRect) -> Path { let pointTransform: CGAffineTransform = .identity .translatedBy(x: 0.0, y: -1.0) .concatenating(.identity.scaledBy(x: 1.0, y: -1.0)) .concatenating(.identity.scaledBy(x: size.width, y: size.height)) var path = Path() path.move(to: points[0]) for point in points { path.addLine(to: point) } return path.applying(pointTransform) } } Looking online, I saw that it was recommended to change the property cameraView.previewLayer.videoGravity from: cameraView.previewLayer.videoGravity = .resizeAspectFill to: cameraView.previewLayer.videoGravity = .resizeAspect but it doesn't work for me. Could you help me understand where I'm going wrong? Thanks!
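The `Line` shape in the post scales normalized points by the view size directly, which only lines up when the preview fills the view without cropping. One possible cause of the offset, assuming the preview uses `.resizeAspectFill` and the view's aspect ratio differs from the camera image's, is that the cropped overflow is not accounted for. The following is a minimal sketch of that mapping; `Size`, `Point` and `viewPoint(forNormalized:imageSize:viewSize:)` are hypothetical names, and plain `Double` types stand in for `CGSize` and `CGPoint` to keep it self-contained.

```swift
// Plain value types (hypothetical) so the sketch has no UI framework dependency.
struct Size { var width: Double; var height: Double }
struct Point: Equatable { var x: Double; var y: Double }

/// Maps a normalized point (origin bottom-left, range 0...1) to view
/// coordinates (origin top-left) for content displayed with aspect-fill.
func viewPoint(forNormalized p: Point, imageSize: Size, viewSize: Size) -> Point {
    // Aspect-fill uses the larger scale factor, so the image covers the
    // whole view and overflows along one axis.
    let scale = max(viewSize.width / imageSize.width,
                    viewSize.height / imageSize.height)
    let scaledWidth = imageSize.width * scale
    let scaledHeight = imageSize.height * scale
    // The content is centered, so the overflow is split evenly on both sides.
    let xOffset = (viewSize.width - scaledWidth) / 2
    let yOffset = (viewSize.height - scaledHeight) / 2
    // Flip y because the normalized origin is at the bottom-left.
    return Point(x: p.x * scaledWidth + xOffset,
                 y: (1 - p.y) * scaledHeight + yOffset)
}
```

For a 100 × 200 image shown in a 100 × 100 view, the normalized center (0.5, 0.5) maps to (50, 50), while the image's top-left corner maps to (0, -50), outside the visible area. Whether this accounts for the safe-area difference depends on how the preview layer is sized in the sample, which the post does not show.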
1
0
376
Sep ’24
Symbol Not Found Error in VNFaceLandmarkRegion2D with MacCatalyst on macOS 14.6.1 (Xcode 16)
We have updated our cross-platform applications to support iOS 18 and are in the final stages of releasing versions built with MacCatalyst. After merging the MacCatalyst changes with those for iOS 18, we are now required to build the app using Xcode 16. However, since transitioning to Xcode 16, the app builds successfully but crashes immediately on startup with the following error: dyld[45279]: Symbol not found: _$sSo22VNFaceLandmarkRegion2DC6VisionE16normalizedPointsSaySo7CGPointVGvg Referenced from: <211097A0-6612-3A9A-80B5-AE12915EBA2A> /Users/***/Library/Developer/Xcode/DerivedData/DM_iOS_Apps-gzpzdsacfldxxwclyngreqkbhtey/Build/Products/Debug-maccatalyst/MyApp.app/Contents/Frameworks/Filters_MyApp.framework/Versions/A/Filters_MyApp Expected in: <50DB755E-C83C-3FC7-A0BB-9C4DF9FEA374> /System/Library/Frameworks/Vision.framework/Versions/A/Vision This crash occurs only when building the app with Xcode 16 for MacCatalyst on macOS 14.6.1. On iOS and macOS 15, it functions as expected, and it also worked prior to the iOS 18 changes, which are independent of the Vision framework code, when building with Xcode 15. Here are the environment details where the error occurs: Xcode Version: Xcode 16.0 (16A242d) macOS Version: macOS Sonoma 14.6.1 And the setup where it works: Xcode Version: Xcode 16.0 (16A242d) macOS Version: macOS Sequoia 15.0 Additionally, attempting to implement a workaround using pointsInImage(imageSize:) resulted in a similar issue, where the symbol for this method is also missing. Is this a known issue? Are there any workarounds or fixes available? We have already submitted this issue as feedback (FB15164375), along with a demo project to illustrate the problem.
2
0
397
3w
Vision framework not working on Apple Vision Pro
com.apple.Vision Code=9 "Could not build inference plan - ANECF error: failed to load ANE model file:///System/Library/Frameworks/Vision.framework/anodv4_drop6_fp16.H14G.espresso.hwx The following code raises this error: func imageToHeadBox(image: CVPixelBuffer) async throws -> [CGRect] { let request:DetectFaceRectanglesRequest = DetectFaceRectanglesRequest() let faceResult:[FaceObservation] = try await request.perform(on: image) let faceBoxs:[CGRect] = faceResult.map { face in let faceBoundingBox:CGRect = face.boundingBox.cgRect return faceBoundingBox } return faceBoxs }
1
0
517
Sep ’24
Difficulty Locating Center of Pupil Using ARKit – Vision vs. ARKit for Fine Detail?
Hi everyone, I'm working on an AR application where I need to accurately locate the center of the pupil and measure anatomical distances between the pupil and eyelids. I’ve been using ARKit’s face tracking, but I’m having trouble pinpointing the exact center of the pupil. My Questions: Locating Pupil Center in ARKit: Is there a reliable way to detect the exact center of the pupil using ARKit? If so, how can I achieve this? Framework Recommendation: Given the need for fine detail in measurements, would ARKit be sufficient, or would it be better to use the Vision framework for more accurate 2D facial landmark detection? Alternatively, would a hybrid approach, combining Vision for precision and ARKit for 3D tracking, be more effective? What I've Tried: Using ARKit’s ARFaceAnchor to detect face landmarks, but the results for the pupil position seem imprecise for my needs. Considering Vision for 2D detection, but concerned about integrating it into a 3D AR experience. Any insights, code snippets, or guidance would be greatly appreciated! Thanks in advance!
0
1
293
Aug ’24
VisionOS Enterprise API: fail to get cameraFrame in cameraFrameUpdates{}
I am developing an app for visionOS and need to use the main camera access provided by the Enterprise API. I have applied for an enterprise license and added the main camera access capability and the license file in Xcode. In my code, I used await arKitSession.queryAuthorization(for: [.cameraAccess]) to request user permission for camera access. After obtaining the permission, I used arKitSession to run the cameraFrameProvider. However, when running for await cameraFrame in cameraFrameUpdates{ print("hello") guard let mainCameraSample = cameraFrame.sample(for: .left) else { continue } pixelBuffer = mainCameraSample.pixelBuffer } , I am unable to receive any frames from the camera, and even print("hello") within the braces does not execute. The app does not crash or throw any errors. Here is my full code: import SwiftUI import ARKit struct cameraTestView: View { @State var pixelBuffer: CVPixelBuffer? var body: some View { VStack{ Button(action:{ Task { await loadCameraFeed() } }){ Text("test") } if let pixelBuffer = pixelBuffer { let ciImage = CIImage(cvPixelBuffer: pixelBuffer) let context = CIContext(options: nil) if let cgImage = context.createCGImage(ciImage, from: ciImage.extent) { Image(uiImage: UIImage(cgImage: cgImage)) } }else{ Image("exampleCase") .resizable() .scaledToFill() .frame(width: 400,height: 400) } } } func loadCameraFeed() async { // Main Camera Feed Access Example let formats = CameraVideoFormat.supportedVideoFormats(for: .main, cameraPositions:[.left]) let cameraFrameProvider = CameraFrameProvider() let arKitSession = ARKitSession() // main camera feed access example var cameraAuthorization = await arKitSession.queryAuthorization(for: [.cameraAccess]) guard cameraAuthorization == [ARKitSession.AuthorizationType.cameraAccess:ARKitSession.AuthorizationStatus.allowed] else { return } do { try await arKitSession.run([cameraFrameProvider]) } catch { return } let cameraFrameUpdates = cameraFrameProvider.cameraFrameUpdates(for: formats[0]) if cameraFrameUpdates != nil { print("identify cameraFrameUpdates") } else{ print("fail to get cameraFrameUpdates") return } for await cameraFrame in cameraFrameUpdates! { print("hello") guard let mainCameraSample = cameraFrame.sample(for: .left) else { continue } pixelBuffer = mainCameraSample.pixelBuffer } } } #Preview(windowStyle: .automatic) { cameraTestView() } When I click the button, the console prints: identify cameraFrameUpdates It seems to be stuck waiting for a cameraFrame from cameraFrameUpdates. This occurs on visionOS 2.0 Beta (just updated) with Xcode 16 Beta 6 (just updated). Does anyone have a workaround for this? I would be grateful for any help.
2
1
476
Aug ’24
ModelContainer working but ModelContext not finding items with SwiftData
I am trying to count the rows of a database table from inside some of my classes, as shown below. **My problem is that count1 works, but count2 does not.** class AppState{ private(set) var context: ModelContext? .... func setModelContext(_ context: ModelContext) { self.context = context } @MainActor func count()async{ let container1 = try ModelContainer(for: Item.self) let descriptor = FetchDescriptor<Item>() let count1 = try container1.mainContext.fetchCount(descriptor) let count2 = try context.fetchCount(descriptor) print("WORKING COUNT: \(count1)") print("NOTWORKING COUNT: \(count2) -> always 0") } I am passing the context like this: ... @main @MainActor struct myApp: App { @State private var appState = AppState() @Environment(\.modelContext) private var modelContext WindowGroup { ItemView(appState: appState) .task { appState.setModelContext(modelContext) } } .windowStyle(.plain) .windowResizability(.contentSize) .modelContainer(for: [Item.self, Category.self]) { result in ... } Can I get some guidance on why this is happening? Which one is better to use? If I should use count2, how can I fix it? Is this the correct way to search inside an application using SwiftData? I don't want to search from the View with something like @Query, because this operation is going to happen in the background of the app.
1
0
336
Aug ’24
VisionFramework does not work with VisionOS2.0
I am trying the Vision framework on Vision Pro, and it fails only on visionOS 2.0. When I perform requests, they fail and the error below is caught. The same code works on visionOS 1.2 and on iOS 18.0 beta. I also tried the new beta API (e.g. GenerateForegroundInstanceMaskRequest), but it fails with the same error. Do you have any idea? Is any permission required to use the Vision framework on visionOS 2.0? Here is what I tried. With visionOS 2.0 beta 4: GenerateForegroundInstanceMaskRequest (fails, Error1), VNGenerateForegroundInstanceMaskRequest (fails, Error1), VNRecognizeTextRequest (fails, Error2). With visionOS 1.2: VNRecognizeTextRequest (works). With iOS 18 beta: GenerateForegroundInstanceMaskRequest (works). My development environments: Env1: Vision Pro: visionOS 2.0 beta 4; Xcode: 16.0 beta 4 and 16.0 beta 2; macOS: 14.5 (23F79). Env2: Vision Pro: visionOS 1.2; Xcode: 15.4; macOS: 14.5 (23F79). Error1: Error Domain=com.apple.Vision Code=9 "Could not build inference plan - ANECF error: failed to load ANE model file:///System/Library/Frameworks/Vision.framework/subject_lifting_gen1_rev5_gv8dsz6vxu_multihead_int8.espresso.net Error= (DESIGN)" UserInfo={NSLocalizedDescription=Could not build inference plan - ANECF error: failed to load ANE model file:///System/Library/Frameworks/Vision.framework/subject_lifting_gen1_rev5_gv8dsz6vxu_multihead_int8.espresso.net Error= (DESIGN)} Error2: Error Domain=com.apple.Vision Code=11 "VNRecognizeTextRequest produced an internal error" UserInfo={NSLocalizedDescription=VNRecognizeTextRequest produced an internal error, NSUnderlyingError=0x3001f6850 {Error Domain=CRImageReaderErrorDomain Code=-5 "Unknown error" UserInfo={NSLocalizedDescription=Unknown error}}}
8
0
696
Sep ’24