Hi,
I'm aiming to render frames as close as possible to the presentation time. It's for a smartphone-based VR headset (Google Cardboard style), where ideally a "late warp" runs just before each new frame is presented. The warp applies both lens distortion and an orientation correction that reduces the error in the predicted head pose by using the very latest motion sensor data, so leaving it as late as possible gives better pose predictions.
This late warp is a pretty simple pass - just a textured mesh - so it's typically <2 ms of GPU time. At the Developer Labs it was suggested that I could use a compute shader for the warp so that it can share GPU resources with any ongoing rendering work (Metal doesn't have a public per-queue priority that would allow pre-emption of other rendering work, which is how this is generally handled on Android).
What I'm trying to figure out now is how best to schedule the rendering. With CAMetalLayer maximumDrawableCount set to 2, you're pretty much guaranteed that the frame will be displayed on the next vsync if rendering completes quickly enough. However, sometimes the system seems to hold onto the drawables a bit longer than expected, which blocks nextDrawable.
With a maximumDrawableCount of 3, it seems easy enough to maintain 60 FPS, but looking in Instruments the CPU-to-display latency varies: there are times when it's around 50 ms (i.e. there are already 2 frames in the queue to be presented first, and nextDrawable blocks), some periods when it's 30 ms (generally 1 other frame queued), and some when it drops down to the ideal 16 ms or less.
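To show where those numbers seem to come from, here is a rough sketch of the arithmetic. It assumes a 60 Hz display and that every frame already queued takes up exactly one refresh interval; estimatedLatencyMs is just a name I made up for illustration, not anything from Metal or Core Animation.

```swift
// Rough upper bound on CPU-to-display latency, in milliseconds.
// Assumption: each frame already queued occupies exactly one refresh interval,
// so a new frame is shown on the (framesQueuedAhead + 1)-th vsync.
func estimatedLatencyMs(framesQueuedAhead: Int, refreshHz: Double = 60.0) -> Double {
    let intervalMs = 1000.0 / refreshHz
    return Double(framesQueuedAhead + 1) * intervalMs
}
```

With 2, 1 and 0 frames queued ahead this gives roughly 50 ms, 33 ms and 17 ms, which is close to what I see in Instruments.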
Is there any way to call present that will just drop any other queued frames in the layer? I've tried presentDrawable:drawable atTime:0 and afterMinimumDuration:0 but to no avail.
It seems like with CAMetalLayer I'll just have to use addPresentedHandler blocks to keep track of how many frames are still queued for display, so I can make sure the queue is generally empty before presenting the next frame.
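The bookkeeping I have in mind is roughly the following. It's only a minimal sketch: PresentQueueTracker and its method names are my own hypothetical helper (not a Metal API), and I'm assuming the presented handler may be called on a different thread, hence the lock.

```swift
import Foundation

// Hypothetical helper: counts frames that have been handed to present()
// but have not yet reached the display. Plain bookkeeping, no Metal calls.
final class PresentQueueTracker {
    private let lock = NSLock()
    private var queued = 0

    // Call when the present is encoded, before the command buffer is committed.
    func didSchedulePresent() {
        lock.lock(); defer { lock.unlock() }
        queued += 1
    }

    // Call from the drawable's presented handler.
    func didReachDisplay() {
        lock.lock(); defer { lock.unlock() }
        queued = max(0, queued - 1)
    }

    // Number of frames currently waiting to be displayed.
    var queuedCount: Int {
        lock.lock(); defer { lock.unlock() }
        return queued
    }

    // True when a frame presented now should not sit behind an older one.
    var isQueueEmpty: Bool { queuedCount == 0 }
}

// Intended wiring (not compiled here because it needs Metal):
//   drawable.addPresentedHandler { _ in tracker.didReachDisplay() }
//   commandBuffer.present(drawable)
//   tracker.didSchedulePresent()
//   commandBuffer.commit()
```

The render loop would then check isQueueEmpty before kicking off the late warp and skip or delay the frame otherwise.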
A related question is the deadline for completing the rendering. The CAMetalLayer is in the compositing fast path, but it seems rendering still needs to complete (i.e. all the GPU workload finished) around 5 ms before the next vsync for the frame to be displayed on that vsync. I suspect there's a deadline for the frame just in case it needs to be composited, but any hints / ideas for handling that would be appreciated. It seems to be slightly device-specific; somewhat unexpectedly, the iPod touch 7 latches frames that finish much closer to the vsync time than the iPhone 12 Pro does.
I've also just come across AVSampleBufferDisplayLayer, which I'm taking a look at now. It seems to offer better control of the queue and still enables the compositing fast path, but I can't see any feedback like addPresentedHandler that would let me judge what the deadline is for having a frame shown on the next vsync.
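To make the timing concrete, this is roughly how I'd compute when the warp has to be submitted. It's a sketch under assumptions: the 5 ms latch margin and 2 ms warp time are just the figures I've observed above (and the margin looks device-specific), the function and parameter names are hypothetical, and all times are in seconds on one clock such as CACurrentMediaTime().

```swift
import Foundation

// Predicts the next vsync that is not earlier than `now`, extrapolating from
// one observed vsync timestamp. Always returns a vsync after `lastVsync`.
func nextVsync(after now: Double,
               lastVsync: Double,
               refreshInterval: Double = 1.0 / 60.0) -> Double {
    let elapsed = max(0, now - lastVsync)
    let periods = (elapsed / refreshInterval).rounded(.up)
    return lastVsync + max(1, periods) * refreshInterval
}

// Latest time the warp can be submitted and still be latched for `nextVsync`.
// latchMargin: assumed gap the system needs before vsync (about 5 ms observed).
// gpuWarpDuration: assumed GPU time for the warp pass (under 2 ms observed).
func latestWarpSubmitTime(nextVsync: Double,
                          latchMargin: Double = 0.005,
                          gpuWarpDuration: Double = 0.002) -> Double {
    return nextVsync - latchMargin - gpuWarpDuration
}
```

For a 60 Hz display that leaves about 9.7 ms after each vsync before the warp has to be submitted, if the 5 ms figure holds.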