Post not yet marked as solved
The VNDetectorOption_OriginatingRequestSpecifier required option was not found" UserInfo={NSLocalizedDescription=The VNDetectorOption_OriginatingRequestSpecifier required option was not found
I'm seeing this error only on iOS 15 when retrieving observations.
Post not yet marked as solved
I'm using VNDetectHumanBodyPoseRequest to detect a body in an image from my Xcode asset catalog (downloaded from an image website), but I get the error below:
2021-12-24 21:50:19.945976+0800 Guess My Exercise[91308:4258893] [espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "I/O error": Missing weights path cnn_human_pose.espresso.weights status=-2
Unable to perform the request: Error Domain=com.apple.vis Code=9 "Unable to setup request in VNDetectHumanBodyPoseRequest" UserInfo={NSLocalizedDescription=Unable to setup request in VNDetectHumanBodyPoseRequest}.
Below is my code:
let image = UIImage(named: "image2")
guard let cgImage = image?.cgImage else { return }
let requestHandler = VNImageRequestHandler(cgImage: cgImage)
let request = VNDetectHumanBodyPoseRequest(completionHandler: bodyPoseHandler)
do {
    // Perform the body pose-detection request.
    try requestHandler.perform([request])
} catch {
    print("Unable to perform the request: \(error).")
}

func bodyPoseHandler(request: VNRequest, error: Error?) {
    guard let observations =
            request.results as? [VNHumanBodyPoseObservation] else {
        return
    }
    // Process each observation to find the recognized body pose points.
    let poses = Pose.fromObservations(observations)
    self.drawPoses(poses, onto: self.simage!)
}
Post not yet marked as solved
Modifying guidance given in an answer on AVFoundation + Vision trajectory detection, I'm instead saving time ranges of frames that have a specific ML label from my custom action classifier:
private lazy var detectHumanBodyPoseRequest: VNDetectHumanBodyPoseRequest = {
    let detectHumanBodyPoseRequest = VNDetectHumanBodyPoseRequest(completionHandler: completionHandler)
    return detectHumanBodyPoseRequest
}()
var timeRangesOfInterest: [Int : CMTimeRange] = [:]
private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter,
                                        completionHandler: @escaping FinishHandler) {
    if isCancelled {
        completionHandler(.success(.cancelled))
        return
    }
    // Handle any error during processing of the video.
    guard sampleTransferError == nil else {
        assetReaderWriter.cancel()
        completionHandler(.failure(sampleTransferError!))
        return
    }
    // Evaluate the result of reading the samples.
    let result = assetReaderWriter.readingCompleted()
    if case .failure = result {
        completionHandler(result)
        return
    }
    /*
     Finish writing, and asynchronously evaluate the results from writing
     the samples.
     */
    assetReaderWriter.writingCompleted { result in
        self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.value }) { result in
            completionHandler(result)
        }
    }
}
func exportVideoTimeRanges(timeRanges: [CMTimeRange], completion: @escaping (Result<OperationStatus, Error>) -> Void) {
    let inputVideoTrack = self.asset.tracks(withMediaType: .video).first!
    let composition = AVMutableComposition()
    let compositionTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)!
    var insertionPoint: CMTime = .zero
    for timeRange in timeRanges {
        try! compositionTrack.insertTimeRange(timeRange, of: inputVideoTrack, at: insertionPoint)
        insertionPoint = insertionPoint + timeRange.duration
    }
    let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetHighestQuality)!
    try? FileManager.default.removeItem(at: self.outputURL)
    exportSession.outputURL = self.outputURL
    exportSession.outputFileType = .mov
    exportSession.exportAsynchronously {
        var result: Result<OperationStatus, Error>
        switch exportSession.status {
        case .completed:
            result = .success(.completed)
        case .cancelled:
            result = .success(.cancelled)
        case .failed:
            // The `error` property is non-nil in the `.failed` status.
            result = .failure(exportSession.error!)
        default:
            fatalError("Unexpected terminal export session status: \(exportSession.status).")
        }
        print("export finished: \(exportSession.status.rawValue) - \(String(describing: exportSession.error))")
        completion(result)
    }
}
This worked fine with results vended from Apple's trajectory detection. But using my custom action classifier TennisActionClassifier (a Core ML model exported from Create ML), I get this console error: getSubtractiveDecodeDuration signalled err=-16364 (kMediaSampleTimingGeneratorError_InvalidTimeStamp) (Decode timestamp is earlier than previous sample's decode timestamp.) at MediaSampleTimingGenerator.c:180. Why might this be?
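One possible culprit, and this is my assumption rather than anything confirmed in the post: a Swift Dictionary's values come back in no particular order, so `timeRangesOfInterest.map { $0.value }` can hand segments to the composition with non-monotonic timestamps, which would match the "decode timestamp is earlier than previous" error. A minimal sketch of sorting ranges by start time before building the composition, using a hypothetical stand-in struct with plain Doubles in place of CMTimeRange:

```swift
// Hypothetical stand-in for CMTimeRange, in seconds.
struct SimpleTimeRange: Equatable {
    var start: Double
    var duration: Double
}

// Sort ranges by start time so segments are inserted in presentation
// order; inserting them in arbitrary dictionary order can yield a
// decode timestamp earlier than the previous sample's.
func orderedRanges(_ rangesByStart: [Double: SimpleTimeRange]) -> [SimpleTimeRange] {
    return rangesByStart.values.sorted { $0.start < $1.start }
}
```

The same `sorted { $0.start < $1.start }` call would apply directly to `[CMTimeRange]` values before the `insertTimeRange` loop.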
Post not yet marked as solved
Is it possible to use SNAudioFileAnalyzer with a live HLS (m3u8) stream? Do we need to somehow extract the audio from it first?
Also, can we use SNAudioFileAnalyzer with a remote URL, or does it only work with files in the local file system?
Post not yet marked as solved
My goal is to mark any tennis video's timestamps of both the start of each rally/point and the end of each rally/point. I tried trajectory detection, but the "end time" is when the ball bounces rather than when the rally/point ends. I'm not quite sure what direction to go from here to improve on this. Would action classification of body poses in each frame (two classes, "playing" and "not playing") be the best way to split the video into segments? A different technique?
Hello everybody,
I am trying to run inference on a Core ML model I created using Create ML. I am following the sample code provided by Apple on the Core ML documentation page, and every time I try to classify an image I get this error: "Could not create Espresso context".
Has this ever happened to anyone? How did you solve it?
Here is my code:
import Foundation
import Vision
import UIKit
import ImageIO

final class ButterflyClassification {
    var classificationResult: Result?

    lazy var classificationRequest: VNCoreMLRequest = {
        do {
            let model = try VNCoreMLModel(for: ButterfliesModel_1(configuration: MLModelConfiguration()).model)
            return VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
                self?.processClassification(for: request, error: error)
            })
        } catch {
            fatalError("Failed to load model.")
        }
    }()

    func processClassification(for request: VNRequest, error: Error?) {
        DispatchQueue.main.async {
            guard let results = request.results else {
                print("Unable to classify image.")
                return
            }
            let classifications = results as! [VNClassificationObservation]
            if classifications.isEmpty {
                print("No classification was provided.")
            } else {
                let firstClassification = classifications[0]
                self.classificationResult = Result(speciesName: firstClassification.identifier,
                                                   confidence: Double(firstClassification.confidence))
            }
        }
    }

    func classifyButterfly(image: UIImage) -> Result? {
        guard let ciImage = CIImage(image: image) else {
            fatalError("Unable to create ciImage")
        }
        DispatchQueue.global(qos: .userInitiated).async {
            let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
            do {
                try handler.perform([self.classificationRequest])
            } catch {
                print("Failed to perform classification.\n\(error.localizedDescription)")
            }
        }
        return classificationResult
    }
}
Thank you for your help!
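A side note, separate from the Espresso error: classifyButterfly returns classificationResult before the async perform finishes, so callers will usually see nil. A framework-free sketch of the completion-handler shape that avoids this, with a dummy classifier (my own placeholder, not the real Vision request) standing in for the model call:

```swift
import Dispatch

// Dummy stand-in for the asynchronous classification step; the input
// string and the "monarch" label are hypothetical.
func classifyAsync(_ input: String, completion: @escaping (String) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let label = input.isEmpty ? "unknown" : "monarch"
        completion(label)  // deliver the result when it is ready
    }
}

// The caller receives the result in the closure instead of reading a
// stored property that may not have been set yet.
let semaphore = DispatchSemaphore(value: 0)
var received: String?
classifyAsync("butterfly.jpg") { label in
    received = label
    semaphore.signal()
}
semaphore.wait()
```

In the real class, that would mean giving classifyButterfly a `(Result?) -> Void` completion parameter and calling it from processClassification rather than returning a value.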
Post not yet marked as solved
I'm building a feature to automatically edit out all the downtime of a tennis video. I have a partial implementation that stores the start and end times of Vision trajectory detections and writes only those segments to an AVFoundation export session.
I've encountered a major issue: the trajectories returned end whenever the ball bounces, so each segment is just one tennis shot, nowhere close to an entire rally with multiple bounces. I'm unsure whether I should continue down the trajectory route, perhaps stitching the trajectories together and somehow splitting only at the start and end of a rally.
Any general guidance would be appreciated.
Is there a different Vision or ML approach that would more accurately model the start and end time of a rally? I considered creating a custom action classifier to classify frames to be either "playing tennis" or "inactivity," but I started with Apple's trajectory detection since it was already built and trained. Maybe a custom classifier would be needed, but not sure.
Post not yet marked as solved
Given an AVAsset, I'm performing a Vision trajectory request on it and would like to write out a video asset that only contains frames with trajectories (filter out downtime in sports footage where there's no ball moving).
I'm unsure what would be a good approach, but as a starting point I tried the following pipeline:
1. Copy a sample buffer from the source AVAssetReaderOutput.
2. Perform the trajectory request on a Vision handler parameterized by the sample buffer.
3. For each resulting VNTrajectoryObservation (trajectory detected), use its associated CMTimeRange to configure a new AVAssetReader set to that time range.
4. Append the time-range-constrained sample buffer to one AVAssetWriterInput until the forEach is complete.
In code:
private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                           to writerInput: AVAssetWriterInput,
                                           onQueue queue: DispatchQueue,
                                           sampleBufferProcessor: SampleBufferProcessor,
                                           completionHandler: @escaping () -> Void) {
    /*
     The writerInput continuously invokes this closure until finished or
     cancelled. It throws an NSInternalInconsistencyException if called more
     than once for the same writer.
     */
    writerInput.requestMediaDataWhenReady(on: queue) {
        var isDone = false
        /*
         While the writerInput accepts more data, process the sampleBuffer
         and then transfer the processed sample to the writerInput.
         */
        while writerInput.isReadyForMoreMediaData {
            if self.isCancelled {
                isDone = true
                break
            }
            // Get the next sample from the asset reader output.
            guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                // The asset reader output has no more samples to vend.
                isDone = true
                break
            }
            let visionHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: self.orientation, options: [:])
            do {
                try visionHandler.perform([self.detectTrajectoryRequest])
                if let results = self.detectTrajectoryRequest.results {
                    try results.forEach { result in
                        let assetReader = try AVAssetReader(asset: self.asset)
                        assetReader.timeRange = result.timeRange
                        let trackOutput = AVTrackOutputs.firstTrackOutput(ofType: .video, fromTracks: self.asset.tracks,
                                                                          withOutputSettings: nil)
                        assetReader.add(trackOutput)
                        assetReader.startReading()
                        guard let sampleBuffer = trackOutput.copyNextSampleBuffer() else {
                            // The asset reader output has no more samples to vend.
                            isDone = true
                            return
                        }
                        // Append the sample to the asset writer input.
                        guard writerInput.append(sampleBuffer) else {
                            /*
                             The writer could not append the sample buffer.
                             The `readingAndWritingDidFinish()` function handles any
                             error information from the asset writer.
                             */
                            isDone = true
                            return
                        }
                    }
                }
            } catch {
                print(error)
            }
        }
        if isDone {
            /*
             Calling `markAsFinished()` on the asset writer input does the
             following:
             1. Unblocks any other inputs needing more samples.
             2. Cancels further invocations of this "request media data"
                callback block.
             */
            writerInput.markAsFinished()
            /*
             Tell the caller the reader output and writer input finished
             transferring samples.
             */
            completionHandler()
        }
    }
}
private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter,
                                        completionHandler: @escaping FinishHandler) {
    if isCancelled {
        completionHandler(.success(.cancelled))
        return
    }
    // Handle any error during processing of the video.
    guard sampleTransferError == nil else {
        assetReaderWriter.cancel()
        completionHandler(.failure(sampleTransferError!))
        return
    }
    // Evaluate the result of reading the samples.
    let result = assetReaderWriter.readingCompleted()
    if case .failure = result {
        completionHandler(result)
        return
    }
    /*
     Finish writing, and asynchronously evaluate the results from writing
     the samples.
     */
    assetReaderWriter.writingCompleted { result in
        completionHandler(result)
        return
    }
}
When run I get the following:
No error is caught in the first catch clause, none surface in private func readingAndWritingDidFinish(assetReaderWriter:completionHandler:), and its completion handler is called.
Help with any of the following questions would be appreciated:
What is causing what appears to be indefinite loading?
How might I isolate the problem further?
Am I misusing or misunderstanding how to selectively read from time ranges of AVAssetReader objects?
Should I forgo the AVAssetReader / AVAssetWriter route entirely and use the time ranges with AVAssetExportSession instead? I don't know how the two approaches compare or what to consider when choosing between them.
Post not yet marked as solved
I am saving time ranges from an input video asset where trajectories are found, then exporting only those segments to an output video file.
Currently I track these time ranges in a stored property var timeRangesOfInterest: [Double : CMTimeRange], which is set in the trajectory request's completion handler
func completionHandler(request: VNRequest, error: Error?) {
    guard let request = request as? VNDetectTrajectoriesRequest else { return }
    if let results = request.results,
       results.count > 0 {
        for result in results {
            var timeRange = result.timeRange
            timeRange.start = timeRange.start - self.assetWriterStartTime
            self.timeRangesOfInterest[timeRange.start.seconds] = timeRange
        }
    }
}
Then these time ranges of interest are used in an export session to only export those segments
/*
 Finish writing, and asynchronously evaluate the results from writing
 the samples.
 */
assetReaderWriter.writingCompleted { result in
    self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.1 }) { result in
        completionHandler(result)
    }
}
Unfortunately, I'm getting repeated trajectory segments in the output video. Is this perhaps because trajectory requests return repeated "in progress" results with slightly different time-range start times? What might be a good strategy for avoiding or removing them? I've also noticed trajectory segments appear out of order in the output.
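One approach that could address both symptoms (my assumption, not something the Vision documentation prescribes): since an in-progress trajectory keeps reporting overlapping versions of the same time range, sort the collected ranges by start time and merge any that overlap before building the export. A sketch using a hypothetical Segment struct with plain Doubles in place of CMTimeRange:

```swift
// Hypothetical stand-in for CMTimeRange, in seconds.
struct Segment: Equatable {
    var start: Double
    var end: Double
}

// Sort by start time, then coalesce segments that overlap or touch.
// Repeated "in progress" results with slightly different starts
// collapse into a single segment, and the output is ordered.
func mergeSegments(_ segments: [Segment]) -> [Segment] {
    let sorted = segments.sorted { $0.start < $1.start }
    var merged: [Segment] = []
    for segment in sorted {
        if var last = merged.last, segment.start <= last.end {
            last.end = max(last.end, segment.end)
            merged[merged.count - 1] = last
        } else {
            merged.append(segment)
        }
    }
    return merged
}
```

Running the saved ranges through a merge like this before exportVideoTimeRanges would also remove the need to key the dictionary by start time.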
Post not yet marked as solved
wwdc20-10673 briefly shows how to visualize optical flow generated by VNGenerateOpticalFlowRequest and sample code is available through the developer app. But how can we build the OpticalFlowVisualizer.ci.metallib file from the CI-kernel code provided as OpticalFlowVisualizer.cikernel?
Post not yet marked as solved
I'm using Vision to conduct some OCR from a live camera feed. I've setup my VNRecognizeTextRequests as follows:
let request = VNRecognizeTextRequest(completionHandler: recognizeTextCompletionHandler)
request.recognitionLevel = .accurate
request.usesLanguageCorrection = false
And I handle the results as follows:
guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
for observation in observations {
    if let recognizedText = observation.topCandidates(1).first {
        guard recognizedText.confidence >= self.confidenceLimit, // set to 0.5
              let foundText = validateRegexPattern(text: recognizedText.string, regexPattern: self.regexPattern),
              let foundDecimal = Double(foundText) else { continue }
    }
}
This is actually working great and yielding very accurate results, but the confidence values I'm receiving are generally either 0.5 or 1.0, and rarely 0.3. These seem like pretty nonsensical confidence values, and I'm wondering whether this is the intended behavior or some sort of bug. Conversely, recognitionLevel = .fast yields more realistic and varied confidence values, but much less accurate results overall. (Even though .fast is recommended for OCR from a live camera feed, I've had significantly better results with the .accurate recognition level, which is why I've been using it.)
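The post does not show the validateRegexPattern helper it calls, so for completeness, here is one plausible shape such a helper might take (an illustrative sketch, not the poster's actual implementation): return the first substring matching the pattern, or nil.

```swift
import Foundation

// Return the first substring of `text` matching `pattern`, or nil if
// the pattern is invalid or nothing matches.
func firstMatch(in text: String, pattern: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let fullRange = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, range: fullRange),
          let range = Range(match.range, in: text) else { return nil }
    return String(text[range])
}
```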
Post not yet marked as solved
Apple's sample code Identifying Trajectories in Video contains the following delegate callback:
func cameraViewController(_ controller: CameraViewController, didReceiveBuffer buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) {
    let visionHandler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
    if gameManager.stateMachine.currentState is GameManager.TrackThrowsState {
        DispatchQueue.main.async {
            // Get the frame of the rendered view.
            let normalizedFrame = CGRect(x: 0, y: 0, width: 1, height: 1)
            self.jointSegmentView.frame = controller.viewRectForVisionRect(normalizedFrame)
            self.trajectoryView.frame = controller.viewRectForVisionRect(normalizedFrame)
        }
        // Perform the trajectory request in a separate dispatch queue.
        trajectoryQueue.async {
            do {
                try visionHandler.perform([self.detectTrajectoryRequest])
                if let results = self.detectTrajectoryRequest.results {
                    DispatchQueue.main.async {
                        self.processTrajectoryObservations(controller, results)
                    }
                }
            } catch {
                AppError.display(error, inViewController: self)
            }
        }
    }
}
However, instead of drawing UI whenever detectTrajectoryRequest.results exist (https://developer.apple.com/documentation/vision/vndetecttrajectoriesrequest/3675672-results), I'm interested in using the CMTimeRange provided by each result to construct a new video. In effect, this would filter down the original video to only frames with trajectories. How might I accomplish this, perhaps through writing only specific time ranges' frames from one AVFoundation video to a new AVFoundation video?
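One possible direction, sketched under my own assumptions rather than taken from Apple's sample: collect each observation's timeRange, then build an AVMutableComposition that inserts only those ranges from the source video track and export the result. The names sourceAsset, timeRanges, and outputURL here are placeholders for whatever the surrounding code provides:

```swift
import AVFoundation

// Sketch: build a composition containing only the given time ranges of
// the source asset's video track, then export it to `outputURL`.
func exportTrajectoryClips(from sourceAsset: AVAsset,
                           timeRanges: [CMTimeRange],
                           to outputURL: URL,
                           completion: @escaping (Error?) -> Void) {
    let composition = AVMutableComposition()
    guard let sourceTrack = sourceAsset.tracks(withMediaType: .video).first,
          let compositionTrack = composition.addMutableTrack(
              withMediaType: .video,
              preferredTrackID: kCMPersistentTrackID_Invalid) else {
        completion(nil)  // no video track or track creation failed
        return
    }
    // Insert the ranges back to back, in presentation order.
    var cursor = CMTime.zero
    for range in timeRanges.sorted(by: { $0.start < $1.start }) {
        do {
            try compositionTrack.insertTimeRange(range, of: sourceTrack, at: cursor)
            cursor = cursor + range.duration
        } catch {
            completion(error)
            return
        }
    }
    guard let session = AVAssetExportSession(asset: composition,
                                             presetName: AVAssetExportPresetHighestQuality) else {
        completion(nil)
        return
    }
    session.outputURL = outputURL
    session.outputFileType = .mov
    session.exportAsynchronously {
        completion(session.error)
    }
}
```

Compared with driving an AVAssetReader/AVAssetWriter pair by hand, a composition plus export session lets AVFoundation handle sample ordering and re-timing of the concatenated segments.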
Post not yet marked as solved
Context
A SwiftUI app that uses the phone's camera to detect what hand sign I am doing in front of it and update a Text view in SwiftUI. The detection part is done in a UIViewController (primarily with Vision) and then that view is used in the main ContentView in SwiftUI. (UIViewControllerRepresentable)
Problem
I am able to print what hand sign I am doing in the front of the screen in the UIViewController, but not able to send that value to update the text label in SwiftUI.
Here are my three main code files that pertain to this issue:
ContentView.swift
struct ContentView: View {
    @State var text = ""

    var body: some View {
        ZStack {
            Color(hex: "3E065F")
                .ignoresSafeArea()
            VStack {
                Text(text)
                    .foregroundColor(Color.white)
                    .font(.largeTitle)
                MyCameraView(value: $text)
            }
        }
    }
}
MyCameraView.swift
struct MyCameraView: UIViewControllerRepresentable {
    @Binding var value: String

    func makeUIViewController(context: Context) -> CameraViewController {
        let cvc = CameraViewController()
        return cvc
    }

    func updateUIViewController(
        _ uiViewController: CameraViewController,
        context: Context
    ) {
        value = uiViewController.currText // Only called once!
    }
}
CameraViewController.swift
final class CameraViewController: UIViewController,
                                  AVCaptureVideoDataOutputSampleBufferDelegate {
    var currText = "A"
    ...
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        ...
        guard let handSignsModel = try? VNCoreMLModel(for: SavedModel().model) else { print("Fail"); return }
        let request = VNCoreMLRequest(model: handSignsModel) { (finishedRequest, err) in
            guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
            // print(results.first?.identifier)
            self.currText = results.first!.identifier
            print(self.currText)
        }
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }
}
Please look at the print(self.currText) statement in the last file. I want to pass that value (every frame) to update the text in the ContentView.
I tried using the updateUIViewController method in MyCameraView.swift, but it does not get called every frame to update the text label; it only runs the first time the view loads.
Post not yet marked as solved
I am trying to train an object detection model using transfer learning with a small dataset (roughly 650 images and two classes) in Create ML v2.0 (53.2.2) with "prefer external GPU" checked. I am using a 2018 Mac mini (3.2 GHz i7, 16 GB RAM) and an AMD Radeon Pro 580 eGPU.
The problem is that I can only run about 3,500 iterations before I run out of memory and need to pause training. When I resume, the loss increases again and takes a while to get back down to where it was before the pause.
So I am wondering if there is a better way to set up the hardware, or any other suggestions, so I can get through all of the iterations without having to pause. I don't recall having this issue with Create ML v1.0, so any suggestions would be appreciated.
Post not yet marked as solved
Hi,
I have seen this video: https://developer.apple.com/videos/play/wwdc2021/10041/
and in my project I am trying to draw detected barcodes.
I am using the Vision framework, and I have the barcode position in the boundingBox parameter, but I don't understand the CGRect in that parameter.
I am programming in Objective-C and haven't found resources for it. To complicate things further, I don't have an image; I am capturing barcodes from a live camera session.
In two parts:
1. How can I draw a detected barcode as in the video (from an image)?
2. How can I draw a detected barcode in a capture session?
I have used VNImageRectForNormalizedRect to convert from normalized to pixel coordinates, but the result is not correct.
Thank you very much.
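The usual gotcha, and my assumption about what is going wrong here: Vision's normalized rects use a lower-left origin, while UIKit and most drawing code use an upper-left origin, so after scaling to pixels you still need to flip the y axis. A framework-free sketch of the conversion, using plain Doubles instead of CGRect so the same arithmetic applies in Objective-C:

```swift
// A pixel rect with a top-left origin, as UIKit drawing expects.
struct PixelRect: Equatable {
    var x: Double, y: Double, width: Double, height: Double
}

// Convert a normalized Vision rect (origin bottom-left, values 0...1)
// into pixel coordinates with a top-left origin.
func pixelRect(normalizedX x: Double, y: Double,
               width w: Double, height h: Double,
               imageWidth: Double, imageHeight: Double) -> PixelRect {
    // Scale from normalized units to pixels.
    let px = x * imageWidth
    let pw = w * imageWidth
    let ph = h * imageHeight
    // Flip the y axis: Vision's origin is bottom-left, UIKit's is top-left.
    let py = (1.0 - y - h) * imageHeight
    return PixelRect(x: px, y: py, width: pw, height: ph)
}
```

For a capture session the same flip applies, but the rect must then be mapped through the preview layer (e.g. AVCaptureVideoPreviewLayer's metadata-to-layer conversion) rather than scaled by raw image dimensions.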
Post not yet marked as solved
VNContoursObservation is taking 715 times as long as OpenCV’s findContours() when creating directly comparable results.
VNContoursObservation creates comparable results when I have set the maximumImageDimension property to 1024. If I set it lower, it runs a bit faster, but creates lower quality contours and still takes over 100 times as long.
I have a hard time believing Apple doesn't know what they are doing, so does anyone have an idea what is going on and how to get it to run much faster? There don't seem to be many options, and nothing I've tried closes the gap. Setting the detectsDarkOnLight property to true makes it run even slower.
OpenCV's findContours runs on a binary image, but I am passing an RGB image to Vision, assuming it converts the input to an appropriate format.
OpenCV:
double taskStart = CFAbsoluteTimeGetCurrent();
int contoursApproximation = CV_CHAIN_APPROX_NONE;
int contourRetrievalMode = CV_RETR_LIST;
findContours(input, contours, hierarchy, contourRetrievalMode, contoursApproximation, cv::Point(0,0));
NSLog(@"###### opencv findContours: %f", CFAbsoluteTimeGetCurrent() - taskStart);
###### opencv findContours: 0.017616 seconds
Vision:
let taskStart = CFAbsoluteTimeGetCurrent()
let contourRequest = VNDetectContoursRequest.init()
contourRequest.revision = VNDetectContourRequestRevision1
contourRequest.contrastAdjustment = 1.0
contourRequest.detectsDarkOnLight = false
contourRequest.maximumImageDimension = 1024
let requestHandler = VNImageRequestHandler.init(cgImage: sourceImage.cgImage!, options: [:])
try! requestHandler.perform([contourRequest])
let contoursObservation = contourRequest.results?.first as! VNContoursObservation
print(" ###### contoursObservation: \(CFAbsoluteTimeGetCurrent() - taskStart)")
###### contoursObservation: 12.605962038040161
The image I am providing OpenCV is 2048 pixels and the image I am providing Vision is 1024.
Post not yet marked as solved
I want to use an ESP32-CAM to capture a video stream and do object recognition through Vision. How can I feed HTTP Live Streams into Vision?
Is it safe to assume once the observation has been generated, the points are cached in some structure internal to the observation?
Post not yet marked as solved
Hi,
Is it possible to get the code for the demo app used in this presentation, for the dynamic style transfer example?
Thanks.
Post not yet marked as solved
Hello,
I found a bug using Vision for face detection on the iPhone SE (2nd generation).
When I run Apple's face detection demo project (https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time), no face is detected, and I get the following error logs:
2021-04-16 10:36:29.110317+0200 VisionFaceTrack[2899:16783309] Metal API Validation Enabled
2021-04-16 10:36:29.443963+0200 VisionFaceTrack[2899:16783492] [espresso] [Espresso::handle_ex_plan] exception=ANECF error: loadModel:sandboxExtension:options:qos:withReply:: Program load failure
FaceDetection error: Optional(Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}).
2021-04-16 10:36:29.445968+0200 VisionFaceTrack[2899:16783492] Failed to perform FaceRectangleRequest: Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}
2021-04-16 10:36:29.460822+0200 VisionFaceTrack[2899:16783492] [espresso] [Espresso::handle_ex_plan] exception=ANECF error: loadModel:sandboxExtension:options:qos:withReply:: Program load failure
FaceDetection error: Optional(Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}).
2021-04-16 10:36:29.461645+0200 VisionFaceTrack[2899:16783492] Failed to perform FaceRectangleRequest: Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}
2021-04-16 10:36:29.482482+0200 VisionFaceTrack[2899:16783492] [espresso] [Espresso::handle_ex_plan] exception=ANECF error: loadModel:sandboxExtension:options:qos:withReply:: Program load failure
FaceDetection error: Optional(Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}).
2021-04-16 10:36:29.483395+0200 VisionFaceTrack[2899:16783492] Failed to perform FaceRectangleRequest: Error Domain=com.apple.vis Code=9 "encountered an unexpected condition: Unspecified error" UserInfo={NSLocalizedDescription=encountered an unexpected condition: Unspecified error}
This bug is very annoying because it breaks all apps using Vision, at least on this iPhone and iOS version.
I have an app using this technology on the App Store, which is how I first discovered the problem, and I'm hoping for a quick fix.
iOS version : 14.4 (18D52)
iPhone SE second generation
Xcode version 12.4 (12D4e)
Edit: this seems to be resolved with iOS 14.4.2.
Can we have more information about this bug? Is it specific to this device model, or will all devices on iOS 14.4 have it?