CoreML on video files

Is it possible to use Core ML on video files? My goal is to take a recorded video and process it for text. When text is found, mark it and then extract clips from the video based on where the text was found, basically making highlights. I am kind of lost as to where to start.

CoreML supports image input / output. You can feed CVPixelBuffer objects to the CoreML API and get predictions back. Here's a good starting point for reading CVPixelBuffers from videos: https://developer.apple.com/documentation/accelerate/reading_from_and_writing_to_core_video_pixel_buffers.
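As a minimal sketch of that first step, here's one way to walk a recorded video frame by frame using AVAssetReader and hand each CVPixelBuffer (with its timestamp, which you'd need later for clipping) to a callback. The function name, pixel format choice, and callback shape are my assumptions, not from the docs linked above:

```swift
import AVFoundation
import CoreVideo

// Hypothetical sketch: iterate over a video's frames as CVPixelBuffers.
func readFrames(from videoURL: URL, handler: (CVPixelBuffer, CMTime) -> Void) throws {
    let asset = AVAsset(url: videoURL)
    guard let track = asset.tracks(withMediaType: .video).first else { return }

    let reader = try AVAssetReader(asset: asset)
    // Ask for 32BGRA so the buffers are easy to hand to Core ML / Vision.
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ])
    reader.add(output)
    reader.startReading()

    while let sample = output.copyNextSampleBuffer() {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sample) else { continue }
        let timestamp = CMSampleBufferGetPresentationTimeStamp(sample)
        // Run the model on this frame; keep the timestamp for extracting clips later.
        handler(pixelBuffer, timestamp)
    }
}
```

For long videos you'd likely want to sample every Nth frame rather than run the model on all of them.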

For this, you need to first build a model that takes an image and produces a boolean indicating whether there's text in the image. I don't have a specific pointer for that, but you might be able to find existing model architectures / trained models that can do this. Once you have a model, you can use coremltools to convert it into the CoreML format: https://coremltools.readme.io.
