- macOS 10.13+
- Xcode 10.0+
macOS supports systems that have multiple GPUs, enabling Metal apps to increase their computational power. An example is an iMac Pro that has a built-in discrete GPU and is connected to multiple external GPUs. External GPUs can be more powerful than built-in GPUs, so they’re a great choice for compute-intensive workloads. However, to take advantage of this added power, Metal apps should be prepared to safely handle adding and removing any external GPUs for more robust execution. Metal apps should always perform well, whether one GPU or multiple GPUs are available to the system.
Sample Simulation Modes
This sample executes an N-body simulation and continuously renders a subset of the simulation’s intermediate results as it progresses to completion. After the simulation has completed, the sample renders all the N-body particles in their final state, presenting the full set of final results.
The sample runs in one of two modes, depending on how many GPUs are available to the system.
Simulation with a single Metal device. When only one Metal device is available, the sample executes the simulation and renders the intermediate results serially. The compute simulation and the graphics rendering are performed on the same thread by the same device.
Simulation with multiple Metal devices. When multiple Metal devices are available, the sample spawns a second thread to separate the compute processing and graphics rendering work between two GPUs. The threads run concurrently and repeatedly, sharing data as follows:
The simulation thread produces and transfers intermediate results to system memory.
The render thread—the main thread—consumes and renders intermediate results from system memory. (All graphics rendering occurs on the main thread.)
The threads don’t wait on each other; they work independently at different rates. The simulation thread runs as fast as possible and the render thread runs as fast as the display’s frame rate.
Handle External GPU Notifications
When an external GPU is connected to the system, this sample performs compute simulations on that external GPU and graphics rendering on a built-in GPU. Otherwise, the sample performs both compute and graphics work on a single built-in GPU.
When the sample receives a notification for an external GPU removal, it transfers all compute simulation data from the external GPU to the app’s view controller, which then transfers the data to a built-in GPU. This dynamic response ensures that the results of the compute simulation are efficiently retained and transferred for continued processing on a new GPU. It also ensures that the work in progress isn’t discarded and the simulation isn’t restarted.
When the sample receives a notification for an external GPU addition, it first completes the current simulation with the built-in GPU and then starts the next simulation with the external GPU.
This sample implements many techniques described in the Device Selection and Fallback for Graphics Rendering sample. For information about handling external GPU notifications, see the following sections from that sample:
Set a GPU Eject Policy
Register for External GPU Notifications
Respond to External GPU Notifications
Deregister from Notifications
Transfer Simulation Data Between Devices
This sample uses two separate classes to encode Metal commands:
AAPLSimulation for compute commands and
AAPLRenderer for graphics commands. The sample uses the simulation class to create a compute pipeline, initialize an N-body data set, and execute the simulation. The sample uses the renderer class to create a render pipeline and draw N-body data produced by the simulation. (The sample uses the
AAPLView class to separate work between the simulation and renderer classes.)
When the sample runs on a single device, the view controller executes the simulation and renderer work serially. For each frame, they both encode commands to the same
MTLCommand and they both share the positions of the N-body particles via the same
When the sample runs on multiple devices, the view controller assigns the simulation work to one device and the renderer work to another. The simulation executes repeatedly in a loop on a separate simulation thread. For each iteration, it updates the positions of the N-body particles and blits this data to a new
MTLBuffer backed by system memory. The sample passes this system memory backing to the view controller, which then passes it along to the renderer. The renderer executes repeatedly in a loop on the main thread. For each iteration, it creates a new
MTLBuffer, backed by the same system memory populated by the simulation thread, and renders the N-body particles based on the latest available position data.
Allocate System Memory for a Buffer
The sample calls the
vm function to allocate a page-aligned buffer,
update, backed by system memory. The sample then calls the
new method to create a new
_update, backed by the same system memory used for the previous buffer.
Because the app is directly responsible for managing this system memory, the sample calls the
init method to wrap the
update buffer in an
NSData object. This method allows the sample to register a deallocator,
dealloc, to free the system memory when the app has no more references to the buffer.