I've rewritten my problem more concisely below.
I'd like to perform pose analysis on user-imported video, automatically producing an AVFoundation video output in which only frames with a detected pose (see Detecting Human Actions in a Live Video Feed: https://developer.apple.com/documentation/createml/detecting_human_actions_in_a_live_video_feed) are included. In the Building a Feature-Rich App for Sports Analysis sample code (https://developer.apple.com/documentation/vision/building_a_feature-rich_app_for_sports_analysis), analysis happens by implementing the func cameraViewController(_ controller: CameraViewController, didReceiveBuffer buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) delegate callback, as on line 326 of GameViewController.swift.
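For context, here is roughly what my analysis step looks like. This is only a sketch: I'm running a plain VNDetectHumanBodyPoseRequest rather than the full action-classification pipeline from the sample, and detectedFrameTimes is my own bookkeeping, not something from the sample code.

```swift
import AVFoundation
import Vision

// Timestamps of frames where a pose was found (my own bookkeeping).
var detectedFrameTimes = [CMTime]()

func cameraViewController(_ controller: CameraViewController,
                          didReceiveBuffer buffer: CMSampleBuffer,
                          orientation: CGImagePropertyOrientation) {
    // Simplified: a single body-pose request per frame, instead of the
    // sample's action-classification pipeline.
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(cmSampleBuffer: buffer,
                                        orientation: orientation)
    do {
        try handler.perform([request])
        if let observations = request.results, !observations.isEmpty {
            // A pose was detected in this frame; remember its timestamp
            // so I can identify the frame again later.
            detectedFrameTimes.append(CMSampleBufferGetPresentationTimeStamp(buffer))
        }
    } catch {
        print("Pose request failed: \(error)")
    }
}
```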
Where I'm stuck is using this analysis to keep only the particular frames in which a pose was detected. Say I've analyzed all CMSampleBuffer frames and classified which ones contain the pose I want. How would I write only those specific frames into the new video output?
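My current (untested) idea is to re-read the asset with an AVAssetReader, skip the frames that weren't flagged, and re-time the kept frames through an AVAssetWriterInputPixelBufferAdaptor so the output plays back without gaps. Something like the sketch below, where writeFilteredVideo and the shouldKeep predicate (e.g. built from the detectedFrameTimes above) are names I made up, and audio is ignored entirely:

```swift
import AVFoundation
import Foundation

// Sketch: copy only the frames whose source timestamps pass `shouldKeep`
// into a new movie file, re-timing them to a constant frame rate.
func writeFilteredVideo(from sourceURL: URL, to outputURL: URL,
                        shouldKeep: (CMTime) -> Bool) throws {
    let asset = AVAsset(url: sourceURL)
    guard let track = asset.tracks(withMediaType: .video).first else { return }

    let reader = try AVAssetReader(asset: asset)
    let readerOutput = AVAssetReaderTrackOutput(
        track: track,
        outputSettings: [kCVPixelBufferPixelFormatTypeKey as String:
                         kCVPixelFormatType_32BGRA])
    reader.add(readerOutput)

    let writer = try AVAssetWriter(outputURL: outputURL, fileType: .mov)
    let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: track.naturalSize.width,
        AVVideoHeightKey: track.naturalSize.height
    ])
    let adaptor = AVAssetWriterInputPixelBufferAdaptor(
        assetWriterInput: writerInput, sourcePixelBufferAttributes: nil)
    writer.add(writerInput)

    guard writer.startWriting(), reader.startReading() else { return }
    writer.startSession(atSourceTime: .zero)

    // Re-time kept frames sequentially so the output has no gaps.
    let fps = max(track.nominalFrameRate, 1)
    let frameDuration = CMTime(value: 1, timescale: Int32(fps.rounded()))
    var outputTime = CMTime.zero

    while let sample = readerOutput.copyNextSampleBuffer() {
        let sourceTime = CMSampleBufferGetPresentationTimeStamp(sample)
        guard shouldKeep(sourceTime),
              let pixelBuffer = CMSampleBufferGetImageBuffer(sample) else { continue }

        // Crude back-pressure; a real implementation would use
        // requestMediaDataWhenReady(on:using:) instead of polling.
        while !writerInput.isReadyForMoreMediaData {
            Thread.sleep(forTimeInterval: 0.01)
        }
        _ = adaptor.append(pixelBuffer, withPresentationTime: outputTime)
        outputTime = CMTimeAdd(outputTime, frameDuration)
    }

    writerInput.markAsFinished()
    writer.finishWriting { } // async; completion handling omitted in this sketch
}
```

Is this reader/writer approach the right direction, or is there a more direct way to assemble a video from a selected subset of CMSampleBuffer frames?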