Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Vision Documentation

Posts under Vision tag

100 Posts
Post not yet marked as solved
0 Replies
127 Views
For example: we use DockKit for birdwatching, so the field distance and direction are unknown (distance = ?, direction = ?); for example, observing from a rock. The task is to recognize the number of birds caught in the frame, add a detection frame around each, and collect statistics. Question: what is the maximum number of frames that can be processed with custom object recognition? If that is not enough, can I do the calculations myself and hand the results to DockKit for fast movement?
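The detection-and-counting part of the question can be sketched with Vision's Core ML integration. This is a minimal sketch, assuming a hypothetical custom object-detection model class named `BirdDetector` (any Create ML object-detection model would take its place):

```swift
import Vision
import CoreML

// A minimal sketch, assuming a hypothetical Create ML object-detection
// model named "BirdDetector". Counts detections in one frame.
func countBirds(in image: CGImage) throws -> Int {
    let mlModel = try BirdDetector(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: mlModel)
    var count = 0
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        let observations = request.results as? [VNRecognizedObjectObservation] ?? []
        // Each observation carries a normalized boundingBox you can draw
        // as a detection frame and feed into your statistics.
        count = observations.count
    }
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request]) // completion runs synchronously here
    return count
}
```

The achievable frame rate depends on the model and hardware, so measuring `perform(_:)` latency per frame on the target device is the practical way to answer the throughput part of the question.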
Posted Last updated
Post not yet marked as solved
1 Reply
216 Views
I want to get only spatial videos when opening the photo library in my app. How can I achieve this? One more thing: if I select a video using the photo library, how can I identify whether the selected video is a spatial video or not?

```swift
self.presentPicker(filter: .videos)

/// - Tag: PresentPicker
private func presentPicker(filter: PHPickerFilter?) {
    var configuration = PHPickerConfiguration(photoLibrary: .shared())
    // Set the filter type according to the user's selection.
    configuration.filter = filter
    // Set the mode to avoid transcoding, if possible, if your app supports arbitrary image/video encodings.
    configuration.preferredAssetRepresentationMode = .current
    // Set the selection behavior to respect the user's selection order.
    configuration.selection = .ordered
    // Set the selection limit to enable multiselection.
    configuration.selectionLimit = 1
    let picker = PHPickerViewController(configuration: configuration)
    picker.delegate = self
    present(picker, animated: true)
}

func picker(_ picker: PHPickerViewController, didFinishPicking results: [PHPickerResult]) {
    picker.dismiss(animated: true) {
        // do something on dismiss
    }
    guard let provider = results.first?.itemProvider else { return }
    provider.loadFileRepresentation(forTypeIdentifier: "public.movie") { url, error in
        guard error == nil else {
            print(error!)
            return
        }
        // receiving the video's local URL / file path
        guard let url = url else { return }
        // create a new filename
        let fileName = "\(Int(Date().timeIntervalSince1970)).\(url.pathExtension)"
        // create a new URL
        let newUrl = URL(fileURLWithPath: NSTemporaryDirectory() + fileName)
        print(newUrl)
        // copy item to app storage
        // try? FileManager.default.copyItem(at: url, to: newUrl)
        // self.parent.videoURL = newUrl.absoluteString
    }
}
```
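For the "is this a spatial video?" part: as far as I know, PHPickerFilter has no spatial-video filter, so one approach is to inspect the picked file's video tracks for the stereo multiview media characteristic introduced alongside spatial video (iOS 17+). A hedged sketch, under that assumption:

```swift
import AVFoundation

// A hedged sketch: checks whether a picked video file carries the
// stereo multiview characteristic used by spatial video (assumption:
// iOS 17+ / AVMediaCharacteristic.containsStereoMultiviewVideo).
func isSpatialVideo(at url: URL) async throws -> Bool {
    let asset = AVURLAsset(url: url)
    for track in try await asset.loadTracks(withMediaType: .video) {
        let characteristics = try await track.load(.mediaCharacteristics)
        if characteristics.contains(.containsStereoMultiviewVideo) {
            return true
        }
    }
    return false
}
```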
Posted Last updated
Post not yet marked as solved
0 Replies
197 Views
Hello. Currently, only the iOS version is on sale on the App Store. The app offers an iCloud-linked, auto-renewable subscription. I want to sell a visionOS version in App Store Connect at the same time, with the same identifier (App ID).

I initially just added visionOS to the existing app project to ship a visionOS version early, but the existing UI-related code and the location-related code are not compatible. So I used the same identifier and name, duplicated the project, optimized only what could be implemented, and got it running without problems on the actual device. However, when I added the visionOS platform in App Store Connect and tried to upload the separately created visionOS app through an archive, the upload was blocked by an identifier and provisioning error.

What I found while trying to solve the problem:

App Groups: I read about this feature, but it is intended for integrating separate apps into one service, so it did not seem to fit my case.

Adding a target to the existing app project and manually adjusting the platform in Xcode -> Build Phases -> Compile Sources, then uploading the archive: would that succeed? (I have not been able to try this stage yet.)

That is the current situation. Please give me some advice on how to implement this. visionOS has a lot of constraints, so many features need to be removed.
Posted Last updated
Post not yet marked as solved
0 Replies
173 Views
I am using VNRecognizeTextRequest to read Chinese characters. It works fine with text written horizontally, but if even two characters are written vertically, nothing is recognized. Does anyone know how to get the Vision framework to either handle vertical text or recognize characters individually when working with Chinese?

I am setting the VNRequestTextRecognitionLevel to accurate, since setting it to fast does not recognize any Chinese characters at all. I would love to be able to use fast recognition and handle the characters individually, but it just doesn't seem to work with Chinese. And when using accurate, if I take a picture of any amount of text arranged vertically, nothing is recognized. I can take a picture of one character and it works, but if I add just one more character below it, nothing is recognized. It's bizarre.

I've tried setting usesLanguageCorrection = false and tried using VNRecognizeTextRequestRevision3, ...Revision2, and ...Revision1. Strangely enough, revision 2 seems to recognize some vertical text, but the bounding boxes are off, or sometimes the recognized text is wrong.

I tried playing with DataScannerViewController, and it is able to recognize characters in vertical text, but I can't figure out how to replicate that with VNRecognizeTextRequest. The problem with using DataScannerViewController is that it treats the whole text block as one item, and it uses the live camera buffer; as soon as I capture a photo, I still have to use VNRecognizeTextRequest.

Below is a code snippet of how I'm using VNRecognizeTextRequest. There's not really much to it, and there aren't many other parameters to try (plus I've already played around with them). I've also attached a sample image with text laid out vertically.
```swift
func detectText(
    in sourceImage: CGImage,
    oriented orientation: CGImagePropertyOrientation
) async throws -> [VNRecognizedTextObservation] {
    return try await withCheckedThrowingContinuation { continuation in
        let request = VNRecognizeTextRequest { request, error in
            // ...
            let observations = request.results as? [VNRecognizedTextObservation] ?? []
            continuation.resume(returning: observations)
        }
        request.recognitionLevel = .accurate
        request.recognitionLanguages = ["zh-Hant", "zh-Hans"]
        // doesn't seem to have any impact:
        // request.usesLanguageCorrection = false
        do {
            let requestHandler = VNImageRequestHandler(
                cgImage: sourceImage,
                orientation: orientation
            )
            try requestHandler.perform([request])
        } catch {
            continuation.resume(throwing: error)
        }
    }
}
```
Posted
by kpcwats.
Last updated
Post not yet marked as solved
0 Replies
198 Views
Currently, I'm bringing the iOS version sold on the App Store to visionOS with the same app ID and identifier, and I'm trying to upload it using the same identifier, developer ID, and iCloud, but I can't upload it because of an error. I didn't know what the problem was, so I shipped an early update of the iOS version first, and that uploaded for review without any problems. Uploading the standalone visionOS version fails with a provisioning profile problem, so I would appreciate it if you could tell me the solution. In addition, a lot of the code used in the iOS version is not compatible with visionOS, so we have created a new project for visionOS.
Posted Last updated
Post not yet marked as solved
0 Replies
304 Views
Context: I've trained my model for object detection with 4k+ images. Under Preview I'm able to check the prediction for image "A", which detects two labels at 100%, and its bounding boxes look accurate.

The problem itself: inside a Swift playground, when I perform object detection using the same model and the same image, I don't get the same results.

What I expected: that after performing the request, the resulting array of VNRecognizedObjectObservation would show the very same results that appear in the Create ML preview.

Notes: I'm importing the model into the playground by drag and drop. I trained the images in JPEG format. The test image is rotated so that it looks vertical, using the macOS Finder rotation tool. I've tried passing a different orientation when creating the VNImageRequestHandler, with the same result.

Swift playground code. This is the code I'm using:

```swift
import UIKit
import Vision

do {
    let model = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration())
    let mlModel = model.model
    let coreMLModel = try VNCoreMLModel(for: mlModel)
    let request = VNCoreMLRequest(model: coreMLModel) { request, error in
        guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
        results.forEach { result in
            print(result.labels)
            print(result.boundingBox)
        }
    }
    let image = UIImage(named: "TEST_IMAGE.HEIC")!
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!)
    try requestHandler.perform([request])
} catch {
    print(error)
}
```

Additional notes and uncertainties, in case they're relevant: I trained the model using pictures I took with my iPhone in 48 MP HEIC format. All photos were in vertical position. With a Python script I overwrote the EXIF orientation to 1 (Normal); this was in order to annotate the images with the CVAT tool and then convert to the Create ML annotation format.
Assumption #1: I've read that object detection in Create ML is based on the YOLOv3 architecture, whose first layer resizes the image dimensions, meaning that I don't have to worry about using very large images to train my model. Is this correct? Assumption #2: that also makes me assume the same resizing happens when I make a prediction?
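Two settings commonly explain Create ML preview vs. Vision runtime mismatches like this: the orientation passed to the handler (a CGImage carries no EXIF orientation, unlike the file the preview reads) and the request's crop/scale option. A hedged sketch; the UIImage-to-CGImagePropertyOrientation mapping below is hand-written, since the SDK does not ship that conversion:

```swift
import Vision
import UIKit

// A sketch: build a handler that tells Vision the UIImage's actual
// orientation instead of defaulting to .up. The mapping is a manual
// assumption; Apple's sample code ships a similar conversion.
func makeHandler(for image: UIImage) -> VNImageRequestHandler? {
    guard let cgImage = image.cgImage else { return nil }
    let orientation: CGImagePropertyOrientation
    switch image.imageOrientation {
    case .up:            orientation = .up
    case .down:          orientation = .down
    case .left:          orientation = .left
    case .right:         orientation = .right
    case .upMirrored:    orientation = .upMirrored
    case .downMirrored:  orientation = .downMirrored
    case .leftMirrored:  orientation = .leftMirrored
    case .rightMirrored: orientation = .rightMirrored
    @unknown default:    orientation = .up
    }
    return VNImageRequestHandler(cgImage: cgImage, orientation: orientation, options: [:])
}

// Separately, the request's preprocessing can be varied:
// request.imageCropAndScaleOption = .scaleFill  // also try .centerCrop / .scaleFit
```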
Posted
by joe_dev.
Last updated
Post not yet marked as solved
1 Reply
293 Views
Hello, I have been working for a few days to create a scanner that scans a PDF417 barcode from your photo library, and I have come to a dead end. Every time I run my function on the photo, my array of observations returns empty ([]). This example tries it with an automatically generated image, because I think that if it works with this, it will work with a real screenshot. That said, I have already tried all sorts of images that aren't pre-generated, and they have also failed to work. Code below.

Calling the function:

```swift
createVisionRequest(image: generatePDF417Barcode(from: "71238-12481248-128035-40239431")!)
```

Creating the barcode:

```swift
static func generatePDF417Barcode(from key: String) -> UIImage? {
    let data = key.data(using: .utf8)!
    let filter = CIFilter.pdf417BarcodeGenerator()
    filter.message = data
    filter.rows = 7
    let transform = CGAffineTransform(scaleX: 3, y: 4)
    if let outputImage = filter.outputImage?.transformed(by: transform) {
        let context = CIContext()
        if let cgImage = context.createCGImage(outputImage, from: outputImage.extent) {
            return UIImage(cgImage: cgImage)
        }
    }
    return nil
}
```

Main function for scanning the barcode:

```swift
static func desynthesizeIDScreenShot(from image: UIImage, completion: @escaping (String?) -> Void) {
    guard let ciImage = CIImage(image: image) else {
        print("Empty image")
        return
    }
    let imageRequestHandler = VNImageRequestHandler(ciImage: ciImage, orientation: .up)
    let request = VNDetectBarcodesRequest { request, error in
        guard error == nil else {
            completion(nil)
            return
        }
        guard let observations = request.results as? [VNBarcodeObservation] else {
            completion(nil)
            return
        }
        print("Observations", observations)
        if let result = observations.first?.payloadStringValue {
            completion(result)
            print(result)
        } else {
            print(error?.localizedDescription ?? "no barcode found") // returns nil
            completion(nil)
        }
    }
    // Configure the request before performing it; setting revision inside
    // the completion handler (as in my first attempt) has no effect.
    request.revision = VNDetectBarcodesRequestRevision2
    request.symbologies = [VNBarcodeSymbology.pdf417]
    try? imageRequestHandler.perform([request])
}
```

Thanks!
Posted
by ZBomb_.
Last updated
Post not yet marked as solved
1 Reply
852 Views
I saw there is a way to track hands with Vision, but is there also a way to record that movement and export it to FBX? And is there a way to record only one hand, or both at the same time? The implementation will be in SwiftUI.
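The "one hand or both" part maps directly onto Vision's hand-pose request. A minimal sketch (the FBX export is outside Vision's scope and would need a separate writer):

```swift
import Vision

// A minimal sketch: VNDetectHumanHandPoseRequest exposes
// maximumHandCount, so a single hand or both hands can be tracked.
// Recorded joint positions per frame could later feed an FBX exporter.
func detectHands(in image: CGImage, maxHands: Int) throws -> [VNHumanHandPoseObservation] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = maxHands // 1 = only one hand, 2 = both
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results ?? []
}
```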
Posted
by chaert-s.
Last updated
Post not yet marked as solved
0 Replies
241 Views
After entering an immersive space on visionOS, can the user adjust the immersion mode and the transparency of the surroundings? And will developers be able to adjust transparency arbitrarily in code, so that users see reality and the immersive mode overlapping at the same time?
Posted Last updated
Post not yet marked as solved
1 Reply
373 Views
I'm unable to figure out how to know when my app no longer has focus. ScenePhase only changes when the WindowGroup gets created or closed. UIApplication.didBecomeActiveNotification and UIApplication.didEnterBackgroundNotification are not posted either when, say, you move focus to Safari. What's the trick?
Posted
by Lucky7.
Last updated
Post not yet marked as solved
2 Replies
496 Views
I have trained a model to classify some symbols using Create ML. In my app I am using VNImageRequestHandler and VNCoreMLRequest to classify image data.

If I use a CVPixelBuffer obtained from an AVCaptureSession, the classifier runs as I would expect: if I point it at the symbols it works fairly accurately, so I know the model is trained correctly and works in my app.

If I try to use a CGImage obtained by cropping a section out of a larger image (from the gallery), the classifier does not work. It always seems to return the same result (although the confidence is not 1.0 and varies for each image, it is within several decimal places of it, e.g. 0.9999).

If I pause the app when I have the cropped image, use the debugger to grab that image (via the little eye icon, then Open in Preview), and then drop it into the Preview section of the MLModel file or into Create ML, the model correctly classifies it.

If I scale the cropped image to the same size I get from my camera, and convert the CGImage to a CVPixelBuffer with the same size and colour space as the camera (1504 x 1128, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange), then I get some difference in output. It's not accurate, but it returns different results if I specify the centerCrop or scaleFit options, so I know that 'something' is happening, but it's not the correct thing.

I was under the impression that passing a CGImage to the VNImageRequestHandler would perform the necessary conversions, but experimentation shows this is not the case. However, when using the preview tool on the model or in Create ML, this conversion is obviously being done behind the scenes, because the cropped part is being detected. What am I doing wrong?
tl;dr:

my model works, as backed up by using video input directly and also by dropping cropped images into the preview sections

passing the cropped images directly to the VNImageRequestHandler does not work

modifying the cropped images can produce different results, but I cannot see what I should be doing to get reliable results

I'd like my app to behave the same way the preview behaves: I give it a cropped part of an image, it does some processing, it goes to the classifier, and it returns the same result as in Create ML.
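One knob worth isolating in a situation like this is the request's preprocessing, since Vision (not the model) decides how a CGImage is fitted to the model's input size. A hedged sketch for comparing the options side by side:

```swift
import Vision
import CoreML

// A hedged sketch: classify one cropped CGImage while varying
// imageCropAndScaleOption, which controls how Vision fits the image to
// the model's input size. Comparing the three options against the
// Create ML preview result can reveal which preprocessing the preview
// uses; this is a diagnostic aid, not a guaranteed fix.
func classify(cgImage: CGImage,
              with model: VNCoreMLModel,
              cropOption: VNImageCropAndScaleOption) throws -> [VNClassificationObservation] {
    let request = VNCoreMLRequest(model: model)
    request.imageCropAndScaleOption = cropOption // .centerCrop, .scaleFit, or .scaleFill
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    return request.results as? [VNClassificationObservation] ?? []
}
```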
Posted
by Bergasms.
Last updated
Post not yet marked as solved
1 Reply
342 Views
I used Metal and CompositorLayer to render an immersive-space skybox. In this space, the window created with SwiftUI only displays the gray frosted-glass background effect (it seems to ignore the Metal-rendered skybox and samples only a black background). Why is that? Is there any solution to display the normal frosted-glass background? Thank you very much!
Posted
by zane1024.
Last updated
Post not yet marked as solved
1 Reply
391 Views
Hey guys! I'm building an app which detects cars via Vision and then retrieves the distance to a detected car from a synchronized depthDataMap. However, I'm having trouble finding the correct corresponding pixel in that depthDataMap. While the CGRect of the object observation ranges from 0 to 300 (x) and 0 to 600 (y), the width x height of the depthDataMap is only 320 x 180, so I can't get the right corresponding pixel. Any idea how to solve this? Kind regards
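Since Vision bounding boxes are normalized to [0, 1] with a bottom-left origin, they can be mapped onto a depth map of any resolution without going through view coordinates. A minimal sketch of that mapping:

```swift
import Vision
import CoreVideo

// A minimal sketch: map the center of a normalized Vision bounding box
// onto depth-map pixel coordinates. Vision's origin is bottom-left, so
// the y axis is flipped to the buffer's top-left origin.
func depthMapCoordinate(for observation: VNRecognizedObjectObservation,
                        depthMap: CVPixelBuffer) -> (x: Int, y: Int) {
    let box = observation.boundingBox                 // normalized [0, 1] rect
    let width = CVPixelBufferGetWidth(depthMap)       // e.g. 320
    let height = CVPixelBufferGetHeight(depthMap)     // e.g. 180
    let centerX = box.midX
    let centerY = 1.0 - box.midY                      // flip to top-left origin
    return (Int(centerX * CGFloat(width)), Int(centerY * CGFloat(height)))
}
```

The 300 x 600 range suggests the observation was already converted into view coordinates; scaling the normalized box directly, as above, avoids that mismatch.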
Posted Last updated
Post not yet marked as solved
0 Replies
275 Views
My project uses Objective-C and is an iOS app, and now I need to bring it to visionOS (not as an unmodified "Designed for iPhone" app). Question one: how can I differentiate visionOS in code? Do I need macro definitions? Otherwise it cannot be compiled. Question two: are there any other tips, or other things I need to know? Thanks.
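Platform-specific code can be fenced off at compile time. In Swift this is `#if os(visionOS)`; in Objective-C the equivalent is the `TARGET_OS_VISION` macro from TargetConditionals.h (available from Xcode 15, to the best of my knowledge). A minimal Swift sketch:

```swift
// A minimal sketch: conditional compilation excludes incompatible code
// on visionOS. In Objective-C, use:
//   #if TARGET_OS_VISION ... #endif
#if os(visionOS)
let platformName = "visionOS"
#else
let platformName = "iOS or other"
#endif
```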
Posted
by lowinding.
Last updated
Post not yet marked as solved
0 Replies
293 Views
I want to make an iCloud backup using SwiftData in visionOS, so I need to get SwiftData working first, but I get the following error even though I followed the steps below.

I created a model:

```swift
import Foundation
import SwiftData

@Model
class NoteModel {
    @Attribute(.unique) var id: UUID
    var date: Date
    var title: String
    var text: String

    init(id: UUID = UUID(), date: Date, title: String, text: String) {
        self.id = id
        self.date = date
        self.title = title
        self.text = text
    }
}
```

I added a modelContainer:

```swift
WindowGroup(content: {
    NoteView()
})
.modelContainer(for: [NoteModel.self])
```

And I'm doing inserts to test:

```swift
import SwiftUI
import SwiftData

struct NoteView: View {
    @Environment(\.modelContext) private var context

    var body: some View {
        Button(action: {
            // new note
            let note = NoteModel(date: Date(), title: "New Note", text: "")
            context.insert(note)
        }, label: {
            Image(systemName: "note.text.badge.plus")
                .font(.system(size: 24))
                .frame(width: 30, height: 30)
                .padding(12)
                .background(
                    RoundedRectangle(cornerRadius: 50)
                        .foregroundStyle(.black.opacity(0.2))
                )
        })
        .buttonStyle(.plain)
        .hoverEffectDisabled(true)
    }
}

#Preview {
    NoteView().modelContainer(for: [NoteModel.self])
}
```
Posted
by OVRIDOO.
Last updated