Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Posts under Vision tag

110 Posts

Implementing multi-pass rendering in visionOS
I’m working on a Vision Pro app using Metal and need to implement multi-pass rendering. Specifically, I want to render intermediate results to a texture, then use that texture in a second pass for post-processing before presenting the final output. What’s the best approach in visionOS? Should I use multiple render passes in a single command buffer or separate command buffers? Any insights on efficiently handling this in RealityKit or Metal? Thanks!
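Here's the two-pass structure I have in mind, sketched with placeholder pipeline states and draw calls (not a confirmed visionOS recipe):

```swift
import Metal

// Offscreen texture that pass 1 renders into and pass 2 samples.
func makeIntermediateTexture(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let desc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
    desc.usage = [.renderTarget, .shaderRead] // render target in pass 1, sampled in pass 2
    desc.storageMode = .private
    return device.makeTexture(descriptor: desc)
}

// Both passes share one command buffer, so pass 1's writes are visible
// to pass 2 without explicit synchronization.
func encodeFrame(commandQueue: MTLCommandQueue,
                 intermediate: MTLTexture,
                 finalTarget: MTLTexture,               // e.g. the drawable's texture
                 scenePipeline: MTLRenderPipelineState, // placeholder pipeline states
                 postPipeline: MTLRenderPipelineState) {
    guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }

    // Pass 1: render the scene into the intermediate texture.
    let pass1 = MTLRenderPassDescriptor()
    pass1.colorAttachments[0].texture = intermediate
    pass1.colorAttachments[0].loadAction = .clear
    pass1.colorAttachments[0].storeAction = .store // keep the result for pass 2
    if let enc = commandBuffer.makeRenderCommandEncoder(descriptor: pass1) {
        enc.setRenderPipelineState(scenePipeline)
        // ... scene draw calls ...
        enc.endEncoding()
    }

    // Pass 2: post-process, sampling the intermediate texture into the final target.
    let pass2 = MTLRenderPassDescriptor()
    pass2.colorAttachments[0].texture = finalTarget
    pass2.colorAttachments[0].loadAction = .dontCare
    pass2.colorAttachments[0].storeAction = .store
    if let enc = commandBuffer.makeRenderCommandEncoder(descriptor: pass2) {
        enc.setRenderPipelineState(postPipeline)
        enc.setFragmentTexture(intermediate, index: 0)
        // ... full-screen quad draw ...
        enc.endEncoding()
    }

    commandBuffer.commit()
}
```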
Replies: 0 · Boosts: 0 · Views: 114 · Activity: 5d
Body segmentation/occlusion on the Apple Vision Pro
Hello, I am currently working on a Unity project for the Apple Vision Pro. I would like people passing in front of virtual objects to occlude the virtual objects behind them, similar to this: https://developer.apple.com/documentation/arkit/occluding-virtual-content-with-people. Unfortunately, I could not find any documentation about this. Is it possible to implement body segmentation or occlusion on the Apple Vision Pro? If it's not currently supported, are there plans to add it? Any ideas on how to achieve this with existing tools? Thanks! Mehdi
Replies: 1 · Boosts: 0 · Views: 155 · Activity: 1w
Efficient Clustering of Images Using VNFeaturePrintObservation.computeDistance
Hi everyone, I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances.

Current approach: I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in O(n^2) complexity, which quickly becomes a bottleneck: with 5,000 images it takes around 10 seconds to complete, which is too slow for my use case.

Question: are there any efficient algorithms or data structures I can use to improve performance? If anyone has experience with optimizing feature-vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
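For context, here's roughly what my current brute-force code looks like (simplified sketch; the image loading is omitted and helper names are illustrative):

```swift
import Vision

// Compute a feature print for one image (CGImage assumed already loaded).
func featurePrint(for image: CGImage) throws -> VNFeaturePrintObservation? {
    let request = VNGenerateImageFeaturePrintRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results?.first as? VNFeaturePrintObservation
}

// The O(n^2) pairwise distance matrix described above.
func pairwiseDistances(_ prints: [VNFeaturePrintObservation]) -> [[Float]] {
    var matrix = [[Float]](repeating: [Float](repeating: 0, count: prints.count),
                           count: prints.count)
    for i in 0..<prints.count {
        for j in (i + 1)..<prints.count {
            var distance: Float = 0
            try? prints[i].computeDistance(&distance, to: prints[j])
            matrix[i][j] = distance
            matrix[j][i] = distance // distance is symmetric
        }
    }
    return matrix
}
```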
Replies: 0 · Boosts: 0 · Views: 216 · Activity: 1w
The multiview video screen turns blank when navigating back
I am encountering an issue while using the multiview video demo provided at https://developer.apple.com/documentation/avkit/creating-a-multiview-video-playback-experience-in-visionos/. Specifically, when running on versions of visionOS prior to 2.2, navigating back results in a blank screen. Has anyone else experienced this problem and found a solution? Any advice or workaround would be greatly appreciated.
Replies: 1 · Boosts: 0 · Views: 173 · Activity: 3d
Issues using ClassifyImageRequest() in the Xcode Simulator
Hello, I am developing an app for the Swift Student Challenge; however, I keep encountering an error when using ClassifyImageRequest from the Vision framework in Xcode:

```
VTEST: error: perform(_:): inside 'for await result in resultStream' error: internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
```

It works perfectly when testing on a physical device, and I saw on another thread that ClassifyImageRequest doesn't work on simulators. Will this cause problems with my submission to the challenge? Thanks
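For reference, my call is essentially the standard pattern for the newer Swift-only Vision API (simplified sketch):

```swift
import Vision

// Classify the contents of an image file with the modern Vision API.
// On a physical device this returns classification labels; in the
// Simulator it fails with the "espresso context" internal error above.
func classify(imageAt url: URL) async throws -> [ClassificationObservation] {
    let request = ClassifyImageRequest()
    return try await request.perform(on: url)
}
```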
Replies: 5 · Boosts: 1 · Views: 365 · Activity: 1w
Feature Request – Support for GS1 DataBar Stacked in Vision Framework
Dear Apple Developer Team,

I am writing to request the addition of GS1 DataBar Stacked (both regular and expanded variants) to the barcode symbologies supported by the Vision framework (VNBarcodeSymbology) and VisionKit's DataScannerViewController.

Currently, Vision supports several GS1 DataBar formats, such as:
- VNBarcodeSymbology.gs1DataBar
- VNBarcodeSymbology.gs1DataBarExpanded
- VNBarcodeSymbology.gs1DataBarLimited

However, GS1 DataBar Stacked is widely used in industries such as retail, pharmaceuticals, and logistics, where space constraints prevent the use of the standard GS1 DataBar format. Many businesses rely on this symbology to encode GTINs and other product data, but Apple's barcode scanning API does not explicitly support it.

Why this feature matters:
- Essential for small packaging: GS1 DataBar Stacked is commonly used on small product labels where a standard linear barcode does not fit.
- Widespread industry adoption: many point-of-sale (POS) systems and inventory management tools require this symbology.
- Improves iOS adoption for enterprise use: adding support would make Apple's Vision framework a more viable solution for businesses that currently rely on third-party barcode scanning SDKs.

Feature request: please add GS1 DataBar Stacked and GS1 DataBar Expanded Stacked to the recognized symbologies in:
- VNBarcodeSymbology (for the Vision framework)
- DataScannerViewController (for VisionKit)

This addition would enhance the versatility of Apple's barcode scanning tools and reduce the need for third-party libraries. I appreciate your consideration of this request and would be happy to provide more details or test implementations if needed.

Thank you for your time and support!

Best regards
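For context, here is how the currently supported GS1 DataBar variants are requested today (sketch); a Stacked case would presumably slot into the same symbologies array:

```swift
import VisionKit
import Vision

// Scanner configured for the GS1 DataBar variants Vision supports today.
let barcodeType = DataScannerViewController.RecognizedDataType.barcode(
    symbologies: [.gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited]
)

let scanner = DataScannerViewController(
    recognizedDataTypes: [barcodeType],
    qualityLevel: .accurate,
    isHighlightingEnabled: true
)
```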
Replies: 2 · Boosts: 5 · Views: 279 · Activity: 2w
Is it possible to read Japanese tategaki with the Vision framework?
We are building an app which reads text. It can read English and normal (horizontal) Japanese text successfully, but in some cases we need to read Japanese tategaki (vertically aligned text), and there the same code gives no output. Is there any configuration change needed to read Japanese tategaki? Or is it possible at all to read Japanese tategaki using the Vision framework?

```swift
lazy var detectTextRequest = VNRecognizeTextRequest { request, error in
    self.resStr = "\n"
    self.words = [:]
    // Get the OCR results
    guard let res = request.results as? [VNRecognizedTextObservation] else { return }
    // Separate the words by spaces
    let text = res.compactMap { $0.topCandidates(1).first?.string }.joined(separator: " ")
    var n = 0
    self.wordArr = [[]]
    self.xs = 1
    self.ys = 1
    var hs = 0.0 // To compare the heights of the words
    // To get the original axis (topmost word's axis), only once
    for r in res {
        let word = r.topCandidates(1).first?.string
        self.words[word ?? ""] = [r.topLeft.x, r.topLeft.y]
        if self.cartLabelType == 1 {
            if (word?.components(separatedBy: CharacterSet(charactersIn: "//")).count ?? 0) > 2 {
                self.xs = r.topLeft.x
                self.ys = r.topLeft.y
            }
        }
    }
}
```
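These are the request options I'm aware of (sketch); as far as I can tell none of them is documented as a vertical-text switch:

```swift
// Configuration knobs available on VNRecognizeTextRequest; note there is
// no documented option specifically for vertical (tategaki) layout.
let request = VNRecognizeTextRequest()
request.recognitionLanguages = ["ja-JP"] // explicitly request Japanese
request.recognitionLevel = .accurate     // favor accuracy over speed
request.usesLanguageCorrection = true
```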
Replies: 2 · Boosts: 1 · Views: 318 · Activity: 3w
Using Vision to detect corners and lines
End goal: to detect 3 lines and 2 corners accurately. I'm trying contours, but they are a bit off. Is there a way, or are there settings in the contours request, to detect corners and lines more accurately, perhaps favoring fewer but sharper-edged corner contours? Or is there some other API or method? I would love a reply by email please ;) thank you. A second issue: there is also an overlay/scale problem.
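For reference, here is a simplified sketch of my current contours setup, showing the settings that can be tuned (the values are illustrative starting points, not recommendations):

```swift
import Vision

// Contour detection with the tunable settings Vision exposes.
func detectContours(in image: CGImage) throws -> VNContoursObservation? {
    let request = VNDetectContoursRequest()
    request.contrastAdjustment = 2.0     // boost contrast before detection (1.0 = none)
    request.detectsDarkOnLight = true    // expect dark lines on a light background
    request.maximumImageDimension = 1024 // larger input can yield finer contours

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results?.first as? VNContoursObservation
}
```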
Replies: 3 · Boosts: 0 · Views: 440 · Activity: Jan ’25
Filtering Contours from Vision
Hello, I need help selecting/filtering the contours on an image; I'm not sure of the best way to do that. One idea: select/filter for the bottom-left-most contour? See the image attached, please. I will also need the end points or court corners, and the contour needs to be a fine, smooth line that accurately traces only the court end line and side lines. Thank you :) I'm also glad to hear other ideas or APIs for determining the lines/corners I need, and glad to discuss by email if that is better/easier; I'd actually prefer that. Thanks.
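Here is a sketch of the bottom-left filtering idea, comparing contour bounding boxes; note that Vision's normalized coordinates have their origin at the lower left, and the scoring heuristic is just my guess:

```swift
import Vision

// Pick the top-level contour whose bounding box sits closest to the
// bottom-left corner of the (normalized, lower-left-origin) image space.
func bottomLeftMostContour(in observation: VNContoursObservation) -> VNContour? {
    observation.topLevelContours.min { a, b in
        let boxA = a.normalizedPath.boundingBox
        let boxB = b.normalizedPath.boundingBox
        // Heuristic score: squared distance of the box origin from (0, 0).
        let scoreA = boxA.minX * boxA.minX + boxA.minY * boxA.minY
        let scoreB = boxB.minX * boxB.minX + boxB.minY * boxB.minY
        return scoreA < scoreB
    }
}
```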
Replies: 3 · Boosts: 0 · Views: 283 · Activity: Jan ’25
Help Needed: SwiftUI View with Camera Integration and Core ML Object Recognition
Hi everyone, I'm working on a SwiftUI app and need help building a view that integrates the device's camera and uses a pre-trained Core ML model for real-time object recognition. Here's what I want to achieve:
- Open the device's camera from a SwiftUI view.
- Capture frames from the camera feed and analyze them using a Create ML-trained Core ML model.
- If a specific figure/object is recognized, automatically close the camera view and navigate to another screen in my app.

I'm looking for guidance on:
- Setting up live camera capture in SwiftUI.
- Using the Core ML and Vision frameworks for real-time object recognition in this context.
- Managing navigation between views when the recognition condition is met.

Any advice, code snippets, or examples would be greatly appreciated! Thanks in advance!
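Here is the kind of capture-and-classify pipeline I'm imagining (sketch; MyClassifier is a placeholder for my Create ML model class, and error handling is omitted):

```swift
import AVFoundation
import Vision
import CoreML

// Captures camera frames and runs a Core ML classifier on each one.
final class FrameAnalyzer: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    var onMatch: (() -> Void)? // called when the target object is seen

    private lazy var request: VNCoreMLRequest? = {
        // "MyClassifier" is a placeholder for the Create ML-generated model class.
        guard let model = try? VNCoreMLModel(for: MyClassifier(configuration: .init()).model)
        else { return nil }
        return VNCoreMLRequest(model: model) { [weak self] request, _ in
            guard let top = (request.results as? [VNClassificationObservation])?.first
            else { return }
            if top.identifier == "targetObject", top.confidence > 0.8 {
                DispatchQueue.main.async { self?.onMatch?() }
            }
        }
    }()

    func start() {
        guard let camera = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: camera) else { return }
        session.addInput(input)

        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "frames"))
        session.addOutput(output)
        session.startRunning() // better done off the main thread in real code
    }

    // Delegate callback: run the Vision request on each camera frame.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let request,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}
```

On the SwiftUI side I'd set onMatch to flip a navigation state flag, and bridge the camera preview layer with UIViewRepresentable.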
Replies: 1 · Boosts: 0 · Views: 454 · Activity: Jan ’25
[Vision, visionOS] Is it possible to use the Vision framework on visionOS for body tracking?
Hello, I checked the following documentation: Vision | Apple Developer Documentation, and "Discover Swift enhancements in the Vision framework" (WWDC24 session video). I saw that the Vision framework is available on visionOS, so I want to know whether it's possible to use the Vision framework on visionOS for tracking human and animal body poses, or whether there are limits to using it on visionOS.
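For reference, the request I'd like to run is the standard body-pose pattern as written on other platforms (sketch); whether it behaves the same on visionOS is exactly my question:

```swift
import Vision

// Human body pose detection with the Vision framework.
func detectBodyPose(in image: CGImage) throws -> [VNHumanBodyPoseObservation] {
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results ?? []
}
```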
Replies: 1 · Boosts: 0 · Views: 345 · Activity: Jan ’25
How to Implement a Curved Surface Effect for Video Playback and Allow Dynamic Width Adjustment in visionOS?
Dear Apple Engineers,

I am working on a project in visionOS and need to implement a curved surface effect for video playback, where the width of the surface can be dynamically adjusted. Specifically, I want the video to be displayed on a curved surface (similar to a scroll unfolding), and the user should be able to adjust the width of this surface. I have the following specific questions:
1. How can I implement a curved surface for video playback and ensure the video content is not stretched or distorted on the surface?
2. How can I create a dynamic curved surface (such as a bending plane) in RealityKit or visionOS, where the width can be adjusted by the user?
3. Is it possible to achieve more complex curved surface effects (such as scroll unfolding or bending) using shaders or other techniques?

Thank you very much for your help!
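Here is the direction I've sketched for the mesh side: a cylindrical-segment mesh built with MeshDescriptor and a RealityKit VideoMaterial, where regenerating the mesh with a new arc angle would change the width. The parameters, UV mapping, and URL are placeholders, not a confirmed approach:

```swift
import RealityKit
import AVFoundation

// Builds a curved (cylindrical-segment) mesh; `arc` controls the visible width.
func makeCurvedScreen(radius: Float, height: Float, arc: Float,
                      segments: Int = 32) throws -> MeshResource {
    var positions: [SIMD3<Float>] = []
    var uvs: [SIMD2<Float>] = []
    var indices: [UInt32] = []

    for i in 0...segments {
        let t = Float(i) / Float(segments)
        let angle = (t - 0.5) * arc // arc centered in front of the viewer
        let x = radius * sin(angle)
        let z = -radius * cos(angle)
        positions += [SIMD3(x, -height / 2, z), SIMD3(x, height / 2, z)]
        uvs += [SIMD2(t, 1), SIMD2(t, 0)] // map the video across the arc
    }
    for i in 0..<segments {
        let base = UInt32(i * 2)
        indices += [base, base + 2, base + 1, // two triangles per segment;
                    base + 1, base + 2, base + 3] // winding may need flipping
    }

    var descriptor = MeshDescriptor(name: "curvedScreen")
    descriptor.positions = MeshBuffer(positions)
    descriptor.textureCoordinates = MeshBuffer(uvs)
    descriptor.primitives = .triangles(indices)
    return try MeshResource.generate(from: [descriptor])
}

// Usage sketch: play video on the curved surface (URL is a placeholder).
let player = AVPlayer(url: URL(string: "https://example.com/video.mp4")!)
let mesh = try makeCurvedScreen(radius: 2, height: 1, arc: .pi / 3)
let screen = ModelEntity(mesh: mesh, materials: [VideoMaterial(avPlayer: player)])
player.play()
```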
Replies: 1 · Boosts: 0 · Views: 403 · Activity: Dec ’24
Vision Framework Causes EXC_BREAKPOINT Error in Xcode App Playground (.swiftpm) File
I’m trying to use the Vision framework in a Swift Playground to perform face detection on an image. The following code works perfectly when I run it in a regular Xcode project, but in an App Playground, I get the error:

Thread 12: EXC_BREAKPOINT (code=1, subcode=0x10321c2a8)

Here's the code:

```swift
import SwiftUI
import Vision

struct ContentView: View {
    var body: some View {
        VStack {
            Text("Face Detection")
                .font(.largeTitle)
                .padding()
            Image("me")
                .resizable()
                .aspectRatio(contentMode: .fit)
                .onAppear {
                    detectFace()
                }
        }
    }

    func detectFace() {
        guard let cgImage = UIImage(named: "me")?.cgImage else { return }

        let request = VNDetectFaceRectanglesRequest { request, error in
            if let results = request.results as? [VNFaceObservation] {
                print("Detected \(results.count) face(s).")
                for face in results {
                    print("Bounding Box: \(face.boundingBox)")
                }
            } else {
                print("No faces detected.")
            }
        }

        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try handler.perform([request]) // This line causes the error.
        } catch {
            print("Failed to perform Vision request: \(error)")
        }
    }
}
```

The error occurs on this line:

```swift
try handler.perform([request])
```

Details:
- This code runs fine in a normal Xcode project (.xcodeproj).
- I'm using an App Playground instead (.swiftpm).
- The image is included in the .xcassets folder.

Is there any way I can mitigate this issue? Please do not recommend switching to .xcodeproj, as I am making a submission for Apple's Swift Student Challenge, and they require that I use .swiftpm.
Replies: 1 · Boosts: 0 · Views: 315 · Activity: Dec ’24
Vision - Time travel door
Hello all, we're building a scene that works like a time-travel door: when the user selects the scene, they pass through the door into the destination scene. The transition in the middle needs to feel natural, and it would be even better if the user could walk through it into an immersive space. There is very little information available on this right now. How can I start building it? Is there any material I can refer to? Thanks.
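One direction I'm considering, assuming RealityKit's portal APIs on visionOS fit this effect (sketch with placeholder sizes and content):

```swift
import RealityKit

// A door-shaped portal showing a separate "destination" world.
func makePortalScene() -> Entity {
    let root = Entity()

    // The destination world: content here is only visible through the portal.
    let world = Entity()
    world.components.set(WorldComponent())
    // ... add the "other time" scene as children of `world` ...
    root.addChild(world)

    // The door: a plane whose surface renders the destination world.
    let door = ModelEntity(
        mesh: .generatePlane(width: 1.0, height: 2.0),
        materials: [PortalMaterial()]
    )
    door.components.set(PortalComponent(target: world))
    root.addChild(door)

    return root
}
```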
Replies: 2 · Boosts: 0 · Views: 409 · Activity: Dec ’24