I would like to use the iPhone 12 Pro as a frontend in a project in which we will evaluate people's behavior. The processing will be done on an external Ubuntu machine, so I would like to transfer body motion keypoints, IMU data, video frames, and sound to that machine. We are able to extract the body motion keypoints and the IMU data, but we are struggling to combine this with video and sound streaming. Is there a method in ARKit to extract the video frame that was used for the body pose estimation? Similarly, is there a way to extract the audio samples in ARKit, or can this be done some other way?
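To make the question concrete, here is a minimal sketch of what we are hoping is possible. It assumes that `ARFrame.capturedImage` is (or closely corresponds to) the camera image used for the body pose estimation, and that `providesAudioData` on `ARConfiguration` also applies to a body tracking session; the serialization and network transfer parts are left as placeholders.

```swift
import ARKit

// Sketch only — untested; assumptions are noted in the comments.
class SessionHandler: NSObject, ARSessionDelegate {
    func start(session: ARSession) {
        let config = ARBodyTrackingConfiguration()
        // Assumption: providesAudioData (inherited from ARConfiguration)
        // also works with body tracking, so ARKit captures mic audio for us.
        config.providesAudioData = true
        session.delegate = self
        session.run(config)
    }

    // Each ARFrame carries the camera image it was built from
    // (capturedImage, a CVPixelBuffer) along with any body anchors.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let pixelBuffer: CVPixelBuffer = frame.capturedImage
        for anchor in frame.anchors {
            guard let body = anchor as? ARBodyAnchor else { continue }
            let joints = body.skeleton.jointModelTransforms
            // ... serialize joints + pixelBuffer + frame.timestamp
            //     and send them to the Ubuntu machine here
            _ = (pixelBuffer, joints)
        }
    }

    // Called when providesAudioData is true.
    func session(_ session: ARSession,
                 didOutputAudioSampleBuffer audioSampleBuffer: CMSampleBuffer) {
        // ... forward the CMSampleBuffer's PCM data to the Ubuntu machine
    }
}
```

Is this roughly the intended way to get the per-frame image and the audio samples, or is a separate `AVCaptureSession` needed for the audio?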
I am not an experienced iOS programmer, but I would like to know whether it is possible to achieve what we want.