Metal Best Practices Guide

Triple Buffering

Best Practice: Implement a triple buffering model to update dynamic buffer data.

Dynamic buffer data refers to frequently updated data stored in a buffer. To avoid creating new buffers per frame and to minimize processor idle time between frames, implement a triple buffering model.

Prevent Access Conflicts and Reduce Processor Idle Time

Dynamic buffer data is typically written by the CPU and read by the GPU. An access conflict occurs if these operations happen at the same time; the CPU must finish writing the data before the GPU can read it, and the GPU must finish reading that data before the CPU can overwrite it. Storing dynamic buffer data in a single buffer forces the two processors to take turns, which causes extended periods of idle time: the CPU stalls while the GPU reads, and the GPU starves while the CPU writes. For the processors to work in parallel, the CPU should be working at least one frame ahead of the GPU. This solution requires multiple instances of dynamic buffer data, so the CPU can write the data for frame n+1 while the GPU reads the data for frame n.

Reduce Memory Overhead and Frame Latency

You can manage multiple instances of dynamic buffer data with a FIFO queue of reusable buffers. However, each additional buffer increases memory overhead and may limit the memory available for other resources. Extra buffers also increase frame latency when they allow the CPU work to run too far ahead of the GPU work.
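
As an illustration of the FIFO queue described above, the following is a minimal sketch in plain C, not part of the Metal API. The names buffer_pool, pool_acquire, pool_release, and POOL_CAPACITY are hypothetical and chosen only for this example. The pool hands out slot indices in order and refuses to hand out a slot while every slot is still in flight, so the capacity bounds both the memory used and how many frames the CPU can run ahead. The sketch is single-threaded and does no locking; a real renderer would pair it with a synchronization primitive.

```c
#include <stddef.h>

#define POOL_CAPACITY 3  /* hypothetical capacity chosen for this sketch */

/* Bookkeeping for a FIFO queue of reusable buffer slots. */
typedef struct {
    size_t next;       /* slot the CPU will write next */
    size_t in_flight;  /* slots handed out and not yet released */
} buffer_pool;

/* Returns the index of the slot to write next,
   or -1 if every slot is still in flight. */
int pool_acquire(buffer_pool *pool)
{
    if (pool->in_flight == POOL_CAPACITY) {
        return -1;
    }
    int slot = (int)pool->next;
    pool->next = (pool->next + 1) % POOL_CAPACITY;
    pool->in_flight++;
    return slot;
}

/* Releases the oldest in-flight slot (FIFO order) and returns its index,
   or -1 if no slot is in flight. */
int pool_release(buffer_pool *pool)
{
    if (pool->in_flight == 0) {
        return -1;
    }
    size_t oldest = (pool->next + POOL_CAPACITY - pool->in_flight) % POOL_CAPACITY;
    pool->in_flight--;
    return (int)oldest;
}
```

Starting from a zero-initialized pool, three calls to pool_acquire return slots 0, 1, and 2, and a fourth call returns -1 until pool_release frees slot 0.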

Allow Time for Command Buffer Transactions

Dynamic buffer data is encoded and bound to a transient command buffer. It takes a certain amount of time to transfer this command buffer from the CPU to the GPU after it has been committed for execution. Similarly, it takes a certain amount of time for the GPU to notify the CPU that it has completed the execution of this command buffer. This sequence is detailed below, for a single frame:

  1. The CPU writes to the dynamic data buffer and encodes commands into a command buffer.

  2. The CPU schedules a completion handler (addCompletedHandler:), commits the command buffer (commit), and transfers the command buffer to the GPU.

  3. The GPU executes the command buffer and reads from the dynamic data buffer.

  4. The GPU completes its execution and calls the command buffer completion handler (MTLCommandBufferHandler).

This sequence can be parallelized with two dynamic data buffers, but the command buffer transactions may cause the CPU to stall or the GPU to starve if either processor is waiting on a busy dynamic data buffer.
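
To make the effect of command buffer transactions concrete, the following plain C sketch models the four steps above as a simple timeline. It is an idealized model under stated assumptions, not a measurement of any real device: every frame has the same CPU cost and GPU cost, the transfer to the GPU and the completion notification each take the same fixed latency, and the GPU executes frames in order. The function name finish_time and all of its parameters are hypothetical.

```c
#define MAX_FRAMES 256  /* hypothetical upper bound for this sketch */

static double later_of(double a, double b) { return a > b ? a : b; }

/* Returns the time at which the GPU finishes the last of `frames` frames
   when `buffers` dynamic data buffers are reused in FIFO order.
   Returns -1.0 for arguments outside the supported range. */
double finish_time(int frames, int buffers,
                   double cpu_cost, double gpu_cost, double latency)
{
    double cpu_done[MAX_FRAMES];
    double gpu_done[MAX_FRAMES];

    if (frames < 1 || frames > MAX_FRAMES || buffers < 1) {
        return -1.0;
    }
    for (int i = 0; i < frames; i++) {
        /* Step 1: the CPU starts after its previous frame, and only once it
           has been notified that the GPU finished with the buffer it reuses. */
        double cpu_start = (i > 0) ? cpu_done[i - 1] : 0.0;
        if (i >= buffers) {
            cpu_start = later_of(cpu_start, gpu_done[i - buffers] + latency);
        }
        cpu_done[i] = cpu_start + cpu_cost;

        /* Steps 2 and 3: the committed command buffer reaches the GPU after
           the transfer latency; the GPU also finishes the previous frame first. */
        double gpu_start = cpu_done[i] + latency;
        if (i > 0) {
            gpu_start = later_of(gpu_start, gpu_done[i - 1]);
        }

        /* Step 4: the GPU completes the frame. */
        gpu_done[i] = gpu_start + gpu_cost;
    }
    return gpu_done[frames - 1];
}
```

For 4 frames with a CPU cost of 1, a GPU cost of 1, and a latency of 0.5, this model finishes at 11.5 with one buffer, 6.5 with two buffers, and 5.5 with three buffers. In this particular configuration the third buffer removes the remaining stall; other cost ratios would give different numbers.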

Implement a Triple Buffering Model

Adding a third dynamic data buffer is the ideal solution when considering processor idle time, memory overhead, and frame latency. Figure 4-1 shows a triple buffering timeline, and Listing 4-1 shows a triple buffering implementation.

Figure 4-1  Triple buffering timeline

Listing 4-1  Triple buffering implementation

    static const NSUInteger kMaxInflightBuffers = 3;
    /* Additional constants */

    @implementation Renderer
    {
        dispatch_semaphore_t _frameBoundarySemaphore;
        NSUInteger _currentFrameIndex;
        NSArray <id <MTLBuffer>> *_dynamicDataBuffers;
        /* Additional variables */
    }

    - (void)configureMetal
    {
        // Create a semaphore that gets signaled at each frame boundary.
        // The GPU signals the semaphore once it completes a frame's work, allowing the CPU to work on a new frame
        _frameBoundarySemaphore = dispatch_semaphore_create(kMaxInflightBuffers);
        _currentFrameIndex = 0;
        /* Additional configuration */
    }

    - (void)makeResources
    {
        // Create a FIFO queue of three dynamic data buffers
        // This ensures that the CPU and GPU are never accessing the same buffer simultaneously
        MTLResourceOptions bufferOptions = /* ... */;
        NSMutableArray *mutableDynamicDataBuffers = [NSMutableArray arrayWithCapacity:kMaxInflightBuffers];
        for (NSUInteger i = 0; i < kMaxInflightBuffers; i++)
        {
            // Create a new buffer with enough capacity to store one instance of the dynamic buffer data
            id <MTLBuffer> dynamicDataBuffer = [_device newBufferWithLength:sizeof(DynamicBufferData) options:bufferOptions];
            [mutableDynamicDataBuffers addObject:dynamicDataBuffer];
        }
        _dynamicDataBuffers = [mutableDynamicDataBuffers copy];
    }

    - (void)update
    {
        // Advance the current frame index, which determines the correct dynamic data buffer for the frame
        _currentFrameIndex = (_currentFrameIndex + 1) % kMaxInflightBuffers;

        // Update the contents of the dynamic data buffer
        DynamicBufferData *dynamicBufferData = [_dynamicDataBuffers[_currentFrameIndex] contents];
        /* Perform updates */
    }

    - (void)render
    {
        // Wait until the inflight command buffer has completed its work
        dispatch_semaphore_wait(_frameBoundarySemaphore, DISPATCH_TIME_FOREVER);

        // Update the per-frame dynamic buffer data
        [self update];

        // Create a command buffer and render command encoder
        id <MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
        id <MTLRenderCommandEncoder> renderCommandEncoder = [commandBuffer renderCommandEncoderWithDescriptor:_renderPassDescriptor];

        // Set the dynamic data buffer for the frame
        [renderCommandEncoder setVertexBuffer:_dynamicDataBuffers[_currentFrameIndex] offset:0 atIndex:0];
        /* Additional encoding */
        [renderCommandEncoder endEncoding];

        // Schedule a drawable presentation to occur after the GPU completes its work
        [commandBuffer presentDrawable:view.currentDrawable];

        // Capture the semaphore in a strong local variable. The block then keeps the
        // semaphore alive until the handler runs, without retaining self.
        dispatch_semaphore_t semaphore = _frameBoundarySemaphore;
        [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> commandBuffer) {
            // GPU work is complete
            // Signal the semaphore to start the CPU work
            dispatch_semaphore_signal(semaphore);
        }];

        // CPU work is complete
        // Commit the command buffer and start the GPU work
        [commandBuffer commit];
    }
    @end