Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Posts under Vision tag

78 Posts

Post | Replies | Boosts | Views | Activity

How to Implement a Curved Surface Effect for Video Playback and Allow Dynamic Width Adjustment in visionOS?
Dear Apple Engineers, I am working on a project in visionOS and need to implement a curved surface effect for video playback, where the width of the surface can be dynamically adjusted. Specifically, I want the video to be displayed on a curved surface (similar to a scroll unfolding), and the user should be able to adjust the width of this surface. I have the following specific questions:
1. How can I implement a curved surface for video playback and ensure the video content is not stretched or distorted on the surface?
2. How can I create a dynamic curved surface (such as a bending plane) in RealityKit or visionOS, where the width can be adjusted by the user?
3. Is it possible to achieve more complex curved surface effects (such as scroll unfolding or bending) using shaders or other techniques?
Thank you very much for your help!
Replies: 0, Boosts: 0, Views: 514, Activity: Dec ’24
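For the curved-surface question above, one possible starting point is to build the surface as a section of a cylinder whose arc length equals the desired width, with texture coordinates spaced evenly along the arc so the video is not stretched horizontally. The sketch below only computes vertex positions and UVs in plain Swift; the type `CurvedVertex` and the function `curvedScreenVertices` are hypothetical names, and feeding the result into a RealityKit mesh (for example through `MeshDescriptor`) together with a `VideoMaterial` is an assumption that would need to be checked against the RealityKit documentation.

```swift
import Foundation

/// One vertex of the curved screen: a 3D position plus a texture coordinate.
/// (Hypothetical helper type, not part of any Apple framework.)
struct CurvedVertex {
    var x: Double, y: Double, z: Double   // position in meters
    var u: Double, v: Double              // texture coordinate in 0...1
}

/// Builds a row-major grid of vertices on a cylindrical arc.
/// `width` is the arc length of the surface, so changing it changes how much
/// of the cylinder is covered while the curvature (radius) stays the same.
func curvedScreenVertices(width: Double, height: Double, radius: Double,
                          columns: Int, rows: Int) -> [CurvedVertex] {
    precondition(columns >= 1 && rows >= 1 && radius > 0)
    let totalAngle = width / radius              // arc length = radius * angle
    var vertices: [CurvedVertex] = []
    vertices.reserveCapacity((columns + 1) * (rows + 1))
    for row in 0...rows {
        let v = Double(row) / Double(rows)
        for column in 0...columns {
            let u = Double(column) / Double(columns)
            // Equal steps in u are equal steps along the arc,
            // which keeps the horizontal texel density constant.
            let angle = (u - 0.5) * totalAngle
            vertices.append(CurvedVertex(
                x: radius * sin(angle),
                y: (v - 0.5) * height,
                z: radius * (1 - cos(angle)),    // edges bend toward +z, center at z = 0
                u: u, v: v))
        }
    }
    return vertices
}
```

To let the user adjust the width, the grid could be regenerated with a new `width` whenever the value changes. Choosing `height` so that `width / height` matches the video's aspect ratio should avoid distortion in the vertical direction as well. Triangle indices are left out of this sketch.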
Vision Framework Causes EXC_BREAKPOINT Error in Xcode App Playground (.swiftpm) File
I’m trying to use the Vision framework in a Swift Playground to perform face detection on an image. The following code works perfectly when I run it in a regular Xcode project, but in an App Playground, I get the error: Thread 12: EXC_BREAKPOINT (code=1, subcode=0x10321c2a8) Here's the code: import SwiftUI import Vision struct ContentView: View { var body: some View { VStack { Text("Face Detection") .font(.largeTitle) .padding() Image("me") .resizable() .aspectRatio(contentMode: .fit) .onAppear { detectFace() } } } func detectFace() { guard let cgImage = UIImage(named: "me")?.cgImage else { return } let request = VNDetectFaceRectanglesRequest { request, error in if let results = request.results as? [VNFaceObservation] { print("Detected \(results.count) face(s).") for face in results { print("Bounding Box: \(face.boundingBox)") } } else { print("No faces detected.") } } let handler = VNImageRequestHandler(cgImage: cgImage, options: [:]) do { try handler.perform([request]) // This line causes the error. } catch { print("Failed to perform Vision request: \(error)") } } } The error occurs on this line: try handler.perform([request]) Details: This code runs fine in a normal Xcode project (.xcodeproj). I'm using an App Playground instead (.swiftpm). The image is being included in the .xcassets folder. Is there any way I can mitigate this issue? Please do not recommend switching to .xcodeproj, as I am making a submission for Apple's Swift Student Challenge, and they require that I use .swiftpm.
Replies: 1, Boosts: 0, Views: 425, Activity: Dec ’24
Vision - Time travel door
Hello all, we are building a scene that works like a time-travel door. When the user selects the scene, they pass through the door and the current scene is shown. The transition in between needs to feel natural, and it would be even better if the user could walk through an immersive space. There is very little information available on this right now. How can I get started, and is there any material I can refer to? Thanks.
Replies: 2, Boosts: 0, Views: 549, Activity: Dec ’24
Immersive Space not working
If I set UIApplicationPreferredDefaultSceneSessionRole to UISceneSessionRoleImmersiveSpaceApplication, my Immersive Space for the image works fine. But when I use UIWindowSceneSessionRoleApplication instead and try to open the Immersive Space from a particular sub-screen, the image is not shown (the Immersive Space does not open). Does anyone have an idea what the issue is? My Info.plist value is as follows: <key>UIApplicationSceneManifest</key> <dict> <key>UIApplicationPreferredDefaultSceneSessionRole</key> <string>UIWindowSceneSessionRoleApplication</string> <key>UIApplicationSupportsMultipleScenes</key> <true/> <key>UISceneConfigurations</key> <dict> <key>UISceneSessionRoleImmersiveSpaceApplication</key> <array> <dict> <key>UISceneInitialImmersionStyle</key> <string>UIImmersionStyleFull</string> </dict> </array> </dict> </dict>
Replies: 1, Boosts: 0, Views: 498, Activity: Dec ’24
VisionKit: Improve barcode scanning accuracy
Hi all, I am developing an app that scans barcodes using VisionKit, but I am facing some difficulties. The accuracy is not where I hoped it would be. Changing the “qualityLevel” parameter from balanced to accurate made the barcode reading slightly better, but it still misreads some barcodes. I previously implemented the same barcode scanning app with AVFoundation, and that had much better accuracy. I tested it, and barcodes that were read correctly with AVFoundation were read incorrectly with VisionKit. Is there any way to improve the accuracy of barcode reading in VisionKit? Or is this built in and something the developer cannot change? Either way, any ideas on how to improve reading accuracy would help. Thanks in advance!
Replies: 0, Boosts: 0, Views: 429, Activity: Dec ’24
New Vision API
Hey everyone, I've been updating my code to take advantage of the new Vision API for text recognition in macOS 15. I'm noticing some very odd behavior, though: in general, the new Vision API seems to consistently produce worse results than the old API. For reference, here is how I'm setting up my request. var request = RecognizeTextRequest() request.recognitionLevel = getOCRMode() // generally accurate request.usesLanguageCorrection = !disableLanguageCorrection // generally true request.recognitionLanguages = language.split(separator: ",").map { Locale.Language(identifier: String($0)) } // generally 'en' let observations = try? await request.perform(on: image) as [RecognizedTextObservation] Then I process the results and just take the top candidate, which, as mentioned above, is typically of worse quality than the same request made with the old API. Am I doing something wrong here?
Replies: 3, Boosts: 0, Views: 679, Activity: Dec ’24
Inference with non-square Images
I'm trying to set up Facebook AI's "Segment Anything" MLModel to compare its performance and efficacy on-device against the Vision library's Foreground Instance Mask Request. The Vision request accepts any reasonably-sized image for processing, and then has a method to produce an output at the same resolution as the input image. Conversely, the MLModel for Segment Anything accepts a 1024x1024 image for inference and produces a 1024x1024 image as output. What is the best way to work with non-square images, such as 4:3 camera photos? I can think of three methods for accomplishing this:
1. Scale the image to 1024x1024, ignoring aspect ratio, then inversely scale the output back to the original size. However, I have a big concern that squashing the content will result in poor inference results.
2. Scale the image, preserving its aspect ratio, so its minimum dimension is 1024, then run the model multiple times on a sliding 1024x1024 window and aggregate the results. My main concern here is the complexity of de-duping the output, when each run could produce different outputs based on how objects are cropped.
3. Fit the image within 1024x1024 and pad with black pixels to make a square. I'm not sure if the border will muck up the inference.
Anyway, this seems like it must be a well-solved problem in ML, but I'm having difficulty finding an authoritative best practice.
Replies: 0, Boosts: 0, Views: 420, Activity: Dec ’24
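The third method in the post above (fit inside the square and pad) mostly comes down to bookkeeping: a scale factor, two padding offsets, and the inverse mapping from model output back to source pixels. Here is a minimal sketch of that arithmetic in plain Swift, assuming centered padding and a 1024-pixel square. `Letterbox` and its members are hypothetical names, and the sketch says nothing about whether padding helps or hurts the model's results.

```swift
import Foundation

/// Describes how a source image is fitted into a square model input.
/// (Hypothetical helper type for illustration.)
struct Letterbox {
    let scale: Double          // factor applied to the source image
    let offsetX: Double        // left padding inside the square, in model pixels
    let offsetY: Double        // top padding inside the square, in model pixels
    let contentWidth: Double   // width of the scaled image inside the square
    let contentHeight: Double  // height of the scaled image inside the square

    init(sourceWidth: Double, sourceHeight: Double, side: Double = 1024) {
        // Scale so the longer edge exactly fills the square.
        let s = side / max(sourceWidth, sourceHeight)
        let w = sourceWidth * s
        let h = sourceHeight * s
        scale = s
        contentWidth = w
        contentHeight = h
        offsetX = (side - w) / 2
        offsetY = (side - h) / 2
    }

    /// Maps a point in model space back to source pixels.
    /// Returns nil when the point lies in the padded border.
    func sourcePoint(modelX: Double, modelY: Double) -> (x: Double, y: Double)? {
        let x = modelX - offsetX
        let y = modelY - offsetY
        guard x >= 0, y >= 0, x <= contentWidth, y <= contentHeight else { return nil }
        return (x / scale, y / scale)
    }
}
```

For a 4032x3024 photo this gives a 1024x768 content area with 128 pixels of padding above and below. A mask produced by the model can then be cropped to that area and scaled back by `1 / scale` to recover the original resolution.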
Running out of memory analyzing images with ImageRequestHandler
Hi, I'm trying to analyze images in my Photos library with the following code: func analyzeImages(_ inputIDs: [String]) { let manager = PHImageManager.default() let option = PHImageRequestOptions() option.isSynchronous = true option.isNetworkAccessAllowed = true option.resizeMode = .none option.deliveryMode = .highQualityFormat let concurrentTasks=1 let clock = ContinuousClock() let duration = clock.measure { let group = DispatchGroup() let sema = DispatchSemaphore(value: concurrentTasks) for entry in inputIDs { if let asset=PHAsset.fetchAssets(withLocalIdentifiers: [entry], options: nil).firstObject { print("analyzing asset: \(entry)") group.enter() sema.wait() manager.requestImage(for: asset, targetSize: PHImageManagerMaximumSize, contentMode: .aspectFit, options: option) { (result, info) in if let result = result { Task { print("retrieved asset: \(entry)") let aestheticsRequest = CalculateImageAestheticsScoresRequest() let fingerprintRequest = GenerateImageFeaturePrintRequest() let inputImage = result.cgImage! let handler = ImageRequestHandler(inputImage) let (aesthetics,fingerprint) = try await handler.perform(aestheticsRequest, fingerprintRequest) // save Results print("finished asset: \(entry)") sema.signal() group.leave() } } else { group.leave() } } } } group.wait() } print("analyzeImages: Duration \(duration)") } When running this code, only two requests are being processed simultaneously (due to the semaphore). However, if I call the function with a large list of images (>100), memory usage balloons to over 1.6 GB and the app crashes. If I call it with a smaller number of images, the loop completes and the memory is freed. When I use Instruments to look for memory leaks, it reports that no leaks were found, but there are 150+ VM:IOSurfaces allocated by CMPhoto, CoreVideo and CoreGraphics at 35 MB each. Shouldn't each surface be released when the task is complete?
Replies: 2, Boosts: 0, Views: 593, Activity: Dec ’24
VNCoreMLRequest Callback Not Triggered in Modified Video Classification App
Hi everyone, I'm working on integrating object recognition from live video feeds into my existing app by following Apple's sample code. My original project captures video and records it successfully. However, after integrating the Vision-based object detection components (VNCoreMLRequest), no detections occur, and the callback for the request is never triggered. To debug this issue, I’ve added the following functionality: Set up AVCaptureVideoDataOutput for processing video frames. Created a VNCoreMLRequest using my Core ML model. The video recording functionality works as expected, but no object detection happens. I’d like to know: How to debug this further? Which key debug points or logs could help identify where the issue lies? Have I missed any key configurations? Below is a diff of the modifications I’ve made to my project for the new feature. Diff of Changes: (Attach the diff provided above) Specific Observations: The captureOutput method is invoked correctly, but there is no output or error from the Vision request callback. Print statements in my setup function setForVideoClassify() show that the setup executes without errors. Questions: Could this be due to issues with my Core ML model compatibility or configuration? Is the VNCoreMLRequest setup incorrect, or do I need to ensure specific image formats for processing? Platform: Xcode 16.1, iOS 18.1, Swift 5, SwiftUI, iPhone 11, Darwin MacBook-Pro.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:27 PDT 2024; root:xnu-11215.41.3~2/RELEASE_X86_64 x86_64 Any guidance or advice is appreciated! Thanks in advance.
Replies: 1, Boosts: 0, Views: 604, Activity: Nov ’24
How to roll a ball with physics in RealityKit
I want to hit a ball with a club and have it roll on the turf in RealityKit, but so far the ball only slides and does not roll. I added collision to the turf (static), the club (kinematic), and the ball (dynamic), and set some parameters: radius and mass. From these I calculate the linear damping and inertia, and I use the time between frames and the club position to calculate the speed. The code looks like this: let radius: Float = 0.025 let mass: Float = 0.04593 // mass, in kg var inertia = 2/5 * mass * pow(radius, 2) let currentPosition = entity.position(relativeTo: nil) let distance = distance(currentPosition, rgfc.lastPosition) let deltaTime = Float(context.deltaTime) let speed = distance / deltaTime let C_d: Float = 0.47 // drag coefficient let linearDamping = 0.5 * 1.2 * pow(speed, 2) * .pi * pow(radius, 2) * C_d // linear damping (1.2 is the air density) entity.components[PhysicsBodyComponent.self]?.massProperties.inertia = SIMD3<Float>(inertia, inertia, inertia) entity.components[PhysicsBodyComponent.self]?.linearDamping = linearDamping // force let acceleration = speed / deltaTime let forceDirection = normalize(currentPosition - rgfc.lastPosition) let forceMultiplier: Float = 1.0 let appliedForce = forceDirection * mass * acceleration * forceMultiplier entityCollidedWith.addForce(appliedForce, at: rgfc.hitPosition, relativeTo: nil) I also tried applyImpulse instead of addForce, like: let linearImpulse = forceDirection * speed * forceMultiplier * mass No matter how I adjust the friction (static, dynamic) and restitution, and whether I use addForce or applyImpulse, the ball only slides. How can I solve this problem?
Replies: 0, Boosts: 0, Views: 585, Activity: Nov ’24
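As a sanity check on the numbers in the post above, the rigid-body relations for a solid sphere can be worked through outside the engine. The sketch below uses the post's radius and mass, computes the inertia I = 2/5 · m · r², and shows how a horizontal impulse applied at a given height above the ball's center splits into linear speed and spin. The names `BallState`, `stateAfterImpulse`, and `slipSpeed` are hypothetical, and this is textbook mechanics for a ball initially at rest, not a description of how RealityKit's solver behaves.

```swift
import Foundation

/// Linear speed (m/s) and spin (rad/s) of the ball. (Hypothetical helper type.)
struct BallState {
    var speed: Double
    var spin: Double
}

/// Moment of inertia of a uniform solid sphere about its center.
func solidSphereInertia(mass: Double, radius: Double) -> Double {
    return 2.0 / 5.0 * mass * radius * radius
}

/// State of a ball initially at rest after a horizontal impulse (in N·s)
/// applied `heightAboveCenter` meters above its center of mass.
func stateAfterImpulse(_ impulse: Double, heightAboveCenter: Double,
                       mass: Double, radius: Double) -> BallState {
    let inertia = solidSphereInertia(mass: mass, radius: radius)
    return BallState(
        speed: impulse / mass,                      // change in linear velocity
        spin: impulse * heightAboveCenter / inertia // torque impulse divided by inertia
    )
}

/// Speed of the contact point relative to the ground.
/// Zero means rolling without slipping (speed == spin * radius).
func slipSpeed(_ state: BallState, radius: Double) -> Double {
    return state.speed - state.spin * radius
}
```

With these relations, an impulse through the center gives the ball no spin at all, so any rolling has to come from friction with the turf afterwards, while an impulse applied 2/5 of a radius above the center starts the ball in pure rolling. Comparing the ball's linear and angular velocity against `slipSpeed` after the hit may help show whether the contact point or the friction setup is the limiting factor.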
How to attach point cloud(or depth data) to heic?
I'm developing a 3D scanner that runs on iPad. I'm using AVCapturePhoto and PhotogrammetrySession. My PhotoCaptureDelegate looks like this: extension PhotoCaptureDelegate: AVCapturePhotoCaptureDelegate { func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) { let fileUrl = CameraViewModel.instance.imageDir!.appendingPathComponent("\(PhotoCaptureDelegate.name)\(id).heic") let img = CIImage(cvPixelBuffer: photo.pixelBuffer!, options: [ .auxiliaryDepth: true, .properties: photo.metadata ]) let depthData = photo.depthData!.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32) let colorSpace = CGColorSpace(name: CGColorSpace.sRGB) let fileData = CIContext().heifRepresentation(of: img, format: .RGBA8, colorSpace: colorSpace!, options: [ .avDepthData: depthData ]) try? fileData!.write(to: fileUrl, options: .atomic) } } However, the photogrammetry session emits these warning messages: Sample 0 missing LiDAR point cloud! Sample 1 missing LiDAR point cloud! Sample 2 missing LiDAR point cloud! Sample 3 missing LiDAR point cloud! Sample 4 missing LiDAR point cloud! Sample 5 missing LiDAR point cloud! Sample 6 missing LiDAR point cloud! Sample 7 missing LiDAR point cloud! Sample 8 missing LiDAR point cloud! Sample 9 missing LiDAR point cloud! Sample 10 missing LiDAR point cloud! The session creates a USDZ 3D model, but the scale is not correct. I think the point cloud could help the photogrammetry session find the right scale, but I don't know how to attach a point cloud.
Replies: 3, Boosts: 2, Views: 1.2k, Activity: Oct ’24
Front-Facing Camera Rotation Matrix in ARKit: Consistency, Transformations, and `ARFrame.camera` Alignment
I'm seeking detailed information about the rotation matrix of the iPhone's front-facing (selfie) camera when using ARKit. Specifically, I need to understand: The exact rotation matrix applied to the front-facing camera's output in ARKit. Whether this matrix is consistent across all iPhone models or if there are variations. If there are any transformations applied to align the camera's coordinate system with the device's orientation, particularly in portrait mode. How this rotation matrix relates to the transform property of `ARFrame.camera
Replies: 0, Boosts: 0, Views: 620, Activity: Oct ’24
Symbol Not Found Error in VNFaceLandmarkRegion2D with MacCatalyst on macOS 14.6.1 (Xcode 16)
We have updated our cross-platform applications to support iOS 18 and are in the final stages of releasing versions built with MacCatalyst. After merging the MacCatalyst changes with those for iOS 18, we are now required to build the app using Xcode 16. However, since transitioning to Xcode 16, the app builds successfully but crashes immediately on startup with the following error: dyld[45279]: Symbol not found: _$sSo22VNFaceLandmarkRegion2DC6VisionE16normalizedPointsSaySo7CGPointVGvg Referenced from: <211097A0-6612-3A9A-80B5-AE12915EBA2A> /Users/***/Library/Developer/Xcode/DerivedData/DM_iOS_Apps-gzpzdsacfldxxwclyngreqkbhtey/Build/Products/Debug-maccatalyst/MyApp.app/Contents/Frameworks/Filters_MyApp.framework/Versions/A/Filters_MyApp Expected in: <50DB755E-C83C-3FC7-A0BB-9C4DF9FEA374> /System/Library/Frameworks/Vision.framework/Versions/A/Vision This crash occurs only when building the app with Xcode 16 for MacCatalyst on macOS 14.6.1. On iOS and macOS 15, it functions as expected, and it also worked prior to the iOS 18 changes, which are independent of the Vision framework code, when building with Xcode 15. Here are the environment details where the error occurs: Xcode Version: Xcode 16.0 (16A242d) macOS Version: macOS Sonoma 14.6.1 And the setup where it works: Xcode Version: Xcode 16.0 (16A242d) macOS Version: macOS Sequoia 15.0 Additionally, attempting to implement a workaround using pointsInImage(imageSize:) resulted in a similar issue, where the symbol for this method is also missing. Is this a known issue? Are there any workarounds or fixes available? We have already submitted this issue as feedback (FB15164375), along with a demo project to illustrate the problem.
Replies: 2, Boosts: 0, Views: 793, Activity: Oct ’24