AVAssetReader lets you read video frames from a movie file. It vends CMSampleBuffers, each of which contains a CVPixelBuffer that can be fed directly to Core ML. Many color image models expect the 32BGRA pixel format, and you can ask AVAssetReader to decode into that format through the outputSettings: parameter of AVAssetReaderTrackOutput.
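If you want to confirm that the reader honored the format request, you can inspect a decoded buffer's pixel format. A minimal sketch (isBGRA is a hypothetical helper introduced here for illustration, not part of any framework):

import CoreVideo

// Hypothetical helper: returns true when the buffer uses the 32BGRA pixel format.
func isBGRA(_ pixelBuffer: CVPixelBuffer) -> Bool {
    CVPixelBufferGetPixelFormatType(pixelBuffer) == kCVPixelFormatType_32BGRA
}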
The following example reads from a movie file named "IMG_0522.mov" and runs Resnet50 image classification on each frame. Note that the Resnet50 class is auto-generated when you add Resnet50.mlmodel to your Xcode project. You can download the model from our model gallery (https://developer.apple.com/machine-learning/models/).
import AVFoundation
import CoreML

// Locate the movie in the package's resource bundle and load the model.
let movieURL = Bundle.module.url(forResource: "IMG_0522", withExtension: "mov")!
let model = try! await Resnet50.load(configuration: MLModelConfiguration())

let asset = AVURLAsset(url: movieURL)
let assetTrack = try! await asset.loadTracks(withMediaType: .video).first!
let assetReader = try! AVAssetReader(asset: asset)

// Ask the reader to decode frames as 224x224 32BGRA, the size and pixel
// format that Resnet50 expects.
let outputSettings: [String: Any] = [
    String(kCVPixelBufferPixelFormatTypeKey): kCVPixelFormatType_32BGRA,
    String(kCVPixelBufferWidthKey): 224,
    String(kCVPixelBufferHeightKey): 224,
]
let assetReaderTrack = AVAssetReaderTrackOutput(track: assetTrack, outputSettings: outputSettings)
assetReader.add(assetReaderTrack)
assetReader.startReading()

// copyNextSampleBuffer() returns nil once the track is exhausted (or if reading fails).
while let sampleBuffer = assetReaderTrack.copyNextSampleBuffer() {
    // Skip sample buffers that carry no image data.
    guard let pixelBuffer = sampleBuffer.imageBuffer else {
        continue
    }
    let prediction = try! model.prediction(image: pixelBuffer)
    let frameTime = String(format: "% 4.2f", sampleBuffer.presentationTimeStamp.seconds)
    print("\(frameTime) seconds: \(prediction.classLabel)")
}
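Because copyNextSampleBuffer() returns nil both when the track is exhausted and when reading fails, it is worth checking the reader's status once the loop exits. A minimal sketch, assuming the assetReader from the example above is still in scope:

switch assetReader.status {
case .completed:
    print("Finished reading all frames.")
case .failed:
    // assetReader.error describes what went wrong, e.g. an unreadable file.
    print("Reading failed: \(assetReader.error?.localizedDescription ?? "unknown error")")
default:
    break
}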