As in the environments, where we get real-time reflections of movies on a screen or reflections of the surrounding Mt. Hood scenery in the background...
Could I get a metallic surface that shows accurate reflections of a box on top of it?
I don't mean using a probe or an HDR cubemap; I mean the same accurate reflections that the water in the Mt. Hood environment shows of the movie I'm watching in another app.
Vision
Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.
Posts under Vision tag
106 Posts
Hi!
I attempted to run a sample project for detecting human pose in photos, which can be found here:
https://developer.apple.com/documentation/vision/detecting-human-body-poses-in-3d-with-vision
The project works perfectly when run on my MacBook Pro M1, but it fails on Apple Vision Pro. After selecting the photo, an endless loading screen is presented and the following output is produced in the console:
Failed to initialize 2D Detection Algorithm.
Failed to initialize 2D Pose Estimation Algorithm.
Failed to initialize algorithm modules
Network path is nil: (null)
Failed to initialize 2D Detection Algorithm.
Failed to initialize 2D Pose Estimation Algorithm.
Failed to initialize algorithm modules
Unable to perform the request: Error Domain=com.apple.Vision Code=9 "Async status object reported as failed but without an error" UserInfo={NSLocalizedDescription=Async status object reported as failed but without an error}.
de-activating session 70138 after timeout
It seems that VNDetectHumanBodyPose3DRequest is failing on Vision Pro for some reason. Are there any additional requirements for running the Vision framework on visionOS that I might be missing?
Hi!
I attempted to run a sample project for detecting human pose in 3D with the Vision framework, which can be found here: https://developer.apple.com/documentation/vision/detecting-human-body-poses-in-3d-with-vision.
It works perfectly on my MacBook Pro M1, but fails on Apple Vision Pro. After selecting a photo, an endless loading screen is displayed and the following message is produced in the console:
Failed to initialize 2D Detection Algorithm.
Failed to initialize 2D Pose Estimation Algorithm.
Failed to initialize algorithm modules
Network path is nil: (null)
Failed to initialize 2D Detection Algorithm.
Failed to initialize 2D Pose Estimation Algorithm.
Failed to initialize algorithm modules
Unable to perform the request: Error Domain=com.apple.Vision Code=9 "Async status object reported as failed but without an error" UserInfo={NSLocalizedDescription=Async status object reported as failed but without an error}.
de-activating session 70138 after timeout
Is human pose detection expected to work on visionOS? Is there any special configuration required that I might be missing?
Hi,
One can configure the languages of a (VN)RecognizeTextRequest with either:
.automatic: language to be detected
a specific language, say Spanish
If the request is configured with .automatic and successfully detects Spanish, will the results be exactly equivalent compared to a request made with Spanish set as language?
I could not find any information about this, and this is very important for the core architecture of my app.
Thanks!
The goal is to achieve precise joint tracking for clinical assessment. The doctor wears the AVP and observes the patient's movement.
Do you have any recommended best practices for integrating real-time joint tracking and displaying them on the patient within visionOS?
We attempted to use VNHumanBodyPose3DObservation, which should work in theory, but we are unable to display the detected joints in an Immersive Space for real-time validation. This makes it difficult for the doctor to ensure accurate tracking. If possible, a photo or video of the range-of-motion assessment would also be needed for the patient record.
Are there alternative methods to achieve precise real-time joint tracking without requiring main camera access (com.apple.developer.arkit.main-camera-access.allow)?
Our app is downloading a zip of an .mlpackage file, which is then compiled into an .mlmodelc file using MLModel.compileModel(at:). This model is then run using a VNCoreMLRequest.
Two users, after only a very small rollout, are reporting issues running the VNCoreMLRequest. The error message from their logs:
Error Domain=com.apple.CoreML Code=0 "Failed to build the model execution plan using a model architecture file '/private/var/mobile/Containers/Data/Application/F93077A5-5508-4970-92A6-03A835E3291D/Documents/SKDownload/Identify-image-iOS/mobile_img_eu_v210.mlmodelc/model.mil' with error code: -5."
The URL there is to a file inside the compiled model. The error is happening when the perform function of VNImageRequestHandler is run. (i.e. the model compiled without an error.)
Has anyone else seen this issue? It's only picked up in a few web results, and none of them are directly relevant or have a fix.
I know that a CoreML error Code=0 is a generic error, but does anyone know what error code -5 is? I'm not even sure which framework it's coming from.
I keep getting this error:
Prediction Failed: The VNCoreMLTransform request failed
I have tried loading the image with both a file picker and the photo library, with the same result.
I have been debugging the resize to 360x360 but still face this error.
The model I'm trying to implement was created with CreateMLComponents.
The process follows the banana ripeness example from WWDC 2022; I used an index for each .jpg.
Is there a way to solve this, or is the error somewhere in the training of the model?
I’m working on a Vision Pro app using Metal and need to implement multi-pass rendering. Specifically, I want to render intermediate results to a texture, then use that texture in a second pass for post-processing before presenting the final output.
What’s the best approach in visionOS? Should I use multiple render passes in a single command buffer or separate command buffers? Any insights on efficiently handling this in RealityKit or Metal?
Thanks!
Hello,
I am currently working on a Unity project for the Apple Vision Pro. I would like to have people passing in front of the virtual objects occlude the virtual objects that are behind. Something similar to this: https://developer.apple.com/documentation/arkit/occluding-virtual-content-with-people
I could unfortunately not find any documentation about this. Is it possible to implement body segmentation or occlusion on the Apple Vision Pro? If it's not currently supported, are there plans to add it? Any ideas on how to achieve this with existing tools?
Thanks!
Mehdi
Hi everyone,
I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances.
Current Approach
Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in an O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case.
Question
Are there any efficient algorithms or data structures I can use to improve performance?
If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
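One way to avoid comparing every image against every other image is a greedy "leader" clustering pass, sketched below. This is only an illustration under stated assumptions, not a drop-in answer: the function name leaderClusters, the threshold parameter, and the euclidean stand-in distance are all hypothetical, and in a real app the distance closure would wrap computeDistance on the two feature prints. Each item is compared only against one representative per existing cluster, so the number of distance computations is roughly n times the number of clusters rather than n squared. The result depends on input order and is an approximation of full pairwise clustering.

```swift
import Foundation

/// Greedy "leader" clustering: each item joins the first cluster whose
/// leader lies within `threshold`; otherwise it starts a new cluster.
/// Returns the member indices of each cluster.
func leaderClusters<Item>(
    _ items: [Item],
    threshold: Float,
    distance: (Item, Item) -> Float
) -> [[Int]] {
    var leaders: [Int] = []     // index of the representative item of each cluster
    var clusters: [[Int]] = []  // member indices per cluster
    for (index, item) in items.enumerated() {
        // Compare only against cluster leaders, not against every item.
        if let c = leaders.firstIndex(where: { distance(items[$0], item) <= threshold }) {
            clusters[c].append(index)
        } else {
            leaders.append(index)
            clusters.append([index])
        }
    }
    return clusters
}

/// Hypothetical stand-in for a feature-print distance, used here so the
/// sketch runs without the Vision framework.
func euclidean(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +).squareRoot()
}

let samplePoints: [[Float]] = [[0, 0], [0.1, 0], [5, 5], [5, 5.1], [0, 0.1]]
let sampleClusters = leaderClusters(samplePoints, threshold: 1.0, distance: euclidean)
```

The threshold has to be tuned for the data; a value that is too small produces many clusters and drifts back toward quadratic cost.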
I am encountering an issue while using the multiview video demo provided at this link "https://developer.apple.com/documentation/avkit/creating-a-multiview-video-playback-experience-in-visionos/". Specifically, when running on versions of visionOS prior to 2.2, navigating back results in a blank screen. Has anyone else experienced this problem and found a solution? Any advice or workaround would be greatly appreciated.
Hello,
I am developing an app for the Swift Student Challenge; however, I keep encountering an error when using ClassifyImageRequest from the Vision framework in Xcode:
VTEST: error: perform(_:): inside 'for await result in resultStream' error: internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
It works perfectly when testing it on a physical device, and I saw on another thread that ClassifyImageRequest doesn't work on simulators. Will this cause problems with my submission to the challenge?
Thanks
Topic: Machine Learning & AI
SubTopic: General
Tags: Swift Student Challenge, Swift, Swift Playground, Vision
Dear Apple Developer Team,
I am writing to request the addition of GS1 DataBar Stacked (both regular and expanded variants) to the barcode symbologies supported by the Vision framework (VNBarcodeSymbology) and VisionKit's DataScannerViewController.
Currently, Vision supports several GS1 DataBar formats, such as:
VNBarcodeSymbology.gs1DataBar
VNBarcodeSymbology.gs1DataBarExpanded
VNBarcodeSymbology.gs1DataBarLimited
However, GS1 DataBar Stacked is widely used in industries such as retail, pharmaceuticals, and logistics, where space constraints prevent the use of the standard GS1 DataBar format. Many businesses rely on this symbology to encode GTINs and other product data, but Apple's barcode scanning API does not explicitly support it.
Why This Feature Matters:
Essential for Small Packaging: GS1 DataBar Stacked is commonly used on small product labels where a standard linear barcode does not fit.
Widespread Industry Adoption: Many point-of-sale (POS) systems and inventory management tools require this symbology.
Improves iOS Adoption for Enterprise Use: Adding support would make Apple’s Vision framework a more viable solution for businesses that currently rely on third-party barcode scanning SDKs.
Feature Request:
Please add GS1 DataBar Stacked and GS1 DataBar Expanded Stacked to the recognized symbologies in:
VNBarcodeSymbology (for Vision framework)
DataScannerViewController (for VisionKit)
This addition would enhance the versatility of Apple’s barcode scanning tools and reduce the need for third-party libraries.
I appreciate your consideration of this request and would be happy to provide more details or test implementations if needed.
Thank you for your time and support!
Best regards
We are building an app that reads text. It reads English and normal (horizontal) Japanese text successfully. In some cases, however, we need to read Japanese tategaki (vertically set text), and in those cases the same code gives no output. Do we need to change any configuration to read Japanese tategaki? Is it actually possible to read Japanese tategaki using the Vision framework?
lazy var detectTextRequest = VNRecognizeTextRequest { request, error in
    // Reset the state left over from the previous recognition pass
    self.resStr = "\n"
    self.words = [:]
    // Get the OCR result; stop if the request produced no text observations
    guard let res = request.results as? [VNRecognizedTextObservation] else { return }
    self.wordArr = [[]]
    self.xs = 1
    self.ys = 1
    for r in res {
        // Best candidate string for this observation (empty if there is none)
        let word = r.topCandidates(1).first?.string ?? ""
        // Remember the top-left corner of each recognized string
        self.words[word] = [r.topLeft.x, r.topLeft.y]
        // For label type 1, a string containing at least two "/" characters
        // marks the origin of the label
        if self.cartLabelType == 1, word.components(separatedBy: "/").count > 2 {
            self.xs = r.topLeft.x
            self.ys = r.topLeft.y
        }
    }
}
If I change the window size on Vision Pro, the text field and button do not update to match the new size.
It works on some screens but not on others.
Please refer to the screenshot below.
Based on the iPhone 14 Max camera, implement model recognition and draw a rectangular box around the recognized object. The width and height are calculated using LiDAR and displayed in centimeters on the real-time updated image.
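The size calculation described above can be sketched with the pinhole camera model: an object that spans p pixels at depth d, seen through a camera with focal length f (in pixels), is about p * d / f wide. The function name and parameters below are hypothetical. The sketch assumes the bounding box, the depth value, and the focal lengths all refer to the same image resolution, and that the depth in metres comes from the LiDAR depth map at the centre of the box.

```swift
import Foundation

/// Approximate physical size, in centimetres, of a bounding box seen by a
/// pinhole camera. All parameter names are illustrative.
/// - pixelWidth / pixelHeight: box size in pixels
/// - depthMeters: distance from the camera to the object, in metres
/// - focalLengthX / focalLengthY: camera focal lengths, in pixels
func physicalSizeCM(
    pixelWidth: Double,
    pixelHeight: Double,
    depthMeters: Double,
    focalLengthX: Double,
    focalLengthY: Double
) -> (width: Double, height: Double) {
    // size in metres = pixels * depth / focal length; multiply by 100 for cm
    let width = pixelWidth * depthMeters / focalLengthX * 100
    let height = pixelHeight * depthMeters / focalLengthY * 100
    return (width, height)
}

let sampleSize = physicalSizeCM(
    pixelWidth: 300, pixelHeight: 600,
    depthMeters: 0.5,
    focalLengthX: 1500, focalLengthY: 1500
)
```

The estimate is only as good as the depth sample: a box that is tilted relative to the camera, or a depth value taken from the background, will skew the result.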
Hello all... is there a way to close a contour if you have found, say, two points on each side of the top "extension"? See the attached image. The desired end result is a trapezoid-type shape. A code example would be much appreciated. Thank you :) I think I have it as a CGPath. So is there a way to edit a CGPath, or to close the top from a top-left to a top-right point?
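A minimal sketch of two ways to get a closed shape with Core Graphics follows. The function names are hypothetical. The first assumes the open contour was traced as a single subpath that starts at the top-left point and ends at the top-right point, so closing the subpath adds the missing top edge. The second ignores the detected contour and rebuilds the trapezoid from four corner points that are assumed to be known already.

```swift
import CoreGraphics

/// Returns a copy of `open` with its current subpath closed by a straight
/// line from the last point back to the subpath's starting point.
func closingTop(of open: CGPath) -> CGPath {
    guard let path = open.mutableCopy() else { return open }
    path.closeSubpath()
    return path
}

/// Builds a closed trapezoid directly from four corner points.
func trapezoid(
    topLeft: CGPoint,
    topRight: CGPoint,
    bottomRight: CGPoint,
    bottomLeft: CGPoint
) -> CGPath {
    let path = CGMutablePath()
    path.addLines(between: [topLeft, topRight, bottomRight, bottomLeft])
    path.closeSubpath()
    return path
}

// Example: an open U-shaped contour traced from top-left to top-right.
let openContour = CGMutablePath()
openContour.move(to: CGPoint(x: 2, y: 10))
openContour.addLine(to: CGPoint(x: 0, y: 0))
openContour.addLine(to: CGPoint(x: 10, y: 0))
openContour.addLine(to: CGPoint(x: 8, y: 10))
let closedContour = closingTop(of: openContour)

let sampleTrapezoid = trapezoid(
    topLeft: CGPoint(x: 2, y: 10),
    topRight: CGPoint(x: 8, y: 10),
    bottomRight: CGPoint(x: 10, y: 0),
    bottomLeft: CGPoint(x: 0, y: 0)
)
```

If the contour contains several subpaths or is traced in a different order, closeSubpath() will not produce the intended top edge, and rebuilding from the corner points is the safer option.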
I have to decrease the main window size when the user opens an Immersive Space in my project.
I tried using frame, but it does not update the main window size; it only updates the view's frame.
We are using VNRecognizeTextRequest to detect text in documents, and we have noticed that even in some very clear and well-formatted documents, text blocks are still missed. Live Text has the same issue.
End goal: to detect 3 lines and 2 corners accurately. I am trying contours, but they are a bit off. Is there a way, or a setting in contour detection, to detect corners and lines more accurately, perhaps with fewer, sharper-edged contours? Or is there some other API or method?
I would love an email please ;) thank you. 2. also an overlay/scale issue