ARKit and streaming body motion keypoints, video and audio.

I would like to use an iPhone 12 Pro as the frontend in a project where we will evaluate people's behavior. The processing will be done on an external Ubuntu machine, so I need to transfer body motion keypoints, IMU data, video frames, and sound to that machine. We are able to extract body motion keypoints and IMU data, but are struggling to combine this with video and audio streaming. Is there a method in ARKit to extract the video frame that was used for the body pose estimation? Similarly, is there a way to extract the audio samples in ARKit, or can this be done another way?

I am not an experienced iOS programmer, but would like to know if it is possible to achieve what we want.

Each ARFrame has a capturedImage, which provides you with the image from either the front- or rear-facing (depending on your configuration) wide-angle camera.

You can get each image from the session(_:didUpdate:) method of the ARSessionDelegate.
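As a minimal sketch of how this might look (class and variable names here are my own, not part of ARKit), a delegate that pulls the pixel buffer backing each body-pose update could be:

```swift
import ARKit

// Illustrative sketch: an ARSessionDelegate that grabs the camera image
// backing each body-pose update, so keypoints and video stay in sync.
final class BodyTrackingReceiver: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        session.delegate = self
        let configuration = ARBodyTrackingConfiguration()
        session.run(configuration)
    }

    // Called once per frame. frame.capturedImage is the pixel buffer the
    // body-pose estimation ran on; frame.timestamp lets you align it with
    // the keypoints you extract from frame.anchors.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let pixelBuffer: CVPixelBuffer = frame.capturedImage
        let timestamp = frame.timestamp
        // Encode pixelBuffer (e.g. with VideoToolbox) and stream it to the
        // Ubuntu machine together with the keypoints and the timestamp.
        _ = (pixelBuffer, timestamp)
    }
}
```

Because the keypoints and capturedImage arrive in the same ARFrame, no extra synchronization between pose data and video is needed on the phone side.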

Similarly (assuming you've set providesAudioData on your configuration), you can get the audio samples from session(_:didOutputAudioSampleBuffer:).
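A sketch of the audio side (again, the class name is illustrative; the delegate method and CoreMedia calls are standard):

```swift
import ARKit
import AVFoundation

// Illustrative sketch: receiving microphone audio from ARKit. Requires
// configuration.providesAudioData = true before session.run(_:), plus an
// NSMicrophoneUsageDescription entry in Info.plist.
final class AudioReceiver: NSObject, ARSessionDelegate {
    // Each CMSampleBuffer carries a short run of PCM samples whose
    // presentation timestamp is on the same clock as ARFrame.timestamp,
    // so audio can be aligned with the video frames and keypoints.
    func session(_ session: ARSession,
                 didOutputAudioSampleBuffer audioSampleBuffer: CMSampleBuffer) {
        let presentationTime = CMSampleBufferGetPresentationTimeStamp(audioSampleBuffer)
        if let blockBuffer = CMSampleBufferGetDataBuffer(audioSampleBuffer) {
            let length = CMBlockBufferGetDataLength(blockBuffer)
            // Copy out the raw bytes here and stream them to the external machine.
            _ = (presentationTime, length)
        }
    }
}
```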
