About GPU Bandwidth

Learn about some of the main factors that affect bandwidth between a GPU and a system on a Mac.

Overview

The bandwidth between a GPU and a system is a crucial consideration when developing high-performance Metal apps. Some GPUs are very powerful on their own, but their effective performance can be severely degraded if a user has a suboptimal system setup or if your app uses a GPU that's poorly suited to a specific task.

In general, external GPUs are more powerful than many built-in GPUs (integrated or discrete). However, the connection between an external GPU and a system typically has a lower bandwidth than the connection between a built-in GPU and a system. Thus, data transfers between a system and an external GPU can be more expensive than data transfers between a system and its built-in GPUs. Additionally, data transfers between GPUs incur a significant cost because data can't be transferred directly between GPUs; this process typically requires intermediary data transfers to the system.

GPU Buses

Bandwidth is largely determined by the bus that connects a GPU to a system. This bus varies according to different types of GPUs:

  • Integrated GPUs are built-in GPUs that use the same system memory and bus as the CPU; they don't have a separate interface.

  • Discrete GPUs are built-in GPUs that are connected to a system by an internal PCIe bus. Depending on the specific GPU and Mac model, this type of bus can have a width of 8 lanes (PCIe x8) or 16 lanes (PCIe x16).

  • External GPUs are connected to a system by an external Thunderbolt 3 bus.

A system diagram that shows an iMac Pro connected to both a built-in discrete GPU with an internal PCIe bus and an external GPU with an external Thunderbolt 3 bus.
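
The three GPU types in the list above can be told apart at runtime. The following is a minimal sketch, not a definitive implementation. It assumes an Intel-based Mac, where the integrated GPU typically reports itself as low power. The names GPUKind, classifyGPU, and printAvailableGPUs are hypothetical helpers introduced for illustration; MTLCopyAllDevices, isRemovable, and isLowPower are existing Metal APIs.

```swift
#if os(macOS)
import Metal
#endif

/// The three GPU categories described in this article.
enum GPUKind: String {
    case integrated, discrete, external
}

/// Classifies a GPU from two Boolean properties that `MTLDevice` exposes.
/// Assumption: the integrated GPU is the one that reports itself as low power.
func classifyGPU(isRemovable: Bool, isLowPower: Bool) -> GPUKind {
    if isRemovable { return .external }   // connected by a Thunderbolt bus
    if isLowPower { return .integrated }  // shares system memory and bus with the CPU
    return .discrete                      // built in, connected by a PCIe bus
}

#if os(macOS)
/// Prints the name and category of each GPU that Metal can see.
func printAvailableGPUs() {
    for device in MTLCopyAllDevices() {
        let kind = classifyGPU(isRemovable: device.isRemovable,
                               isLowPower: device.isLowPower)
        print("\(device.name): \(kind.rawValue)")
    }
}
#endif
```

The classification logic is kept separate from the Metal query so that it can be exercised without any particular GPU being present.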

PCIe x16 has twice as much bandwidth as PCIe x8 and four times as much bandwidth as Thunderbolt 3.

A horizontal bar chart that shows the relative bandwidths of Thunderbolt 3 (1x), PCIe x8 (2x), and PCIe x16 (4x).
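
To make these ratios concrete, the sketch below expresses transfer time for a fixed amount of data relative to PCIe x16. It uses only the relative bandwidths stated above (1x, 2x, and 4x) and assumes that transfer time scales inversely with bandwidth, which ignores latency and protocol overhead. The names Bus and relativeTransferTime are hypothetical and aren't part of Metal.

```swift
/// The buses named in this article, with bandwidth relative to Thunderbolt 3.
enum Bus: Double, CaseIterable {
    case thunderbolt3 = 1.0
    case pcieX8 = 2.0
    case pcieX16 = 4.0
}

/// Time to transfer a fixed amount of data over `bus`, as a multiple of the
/// time the same transfer would take over PCIe x16.
/// Assumption: time is inversely proportional to bandwidth.
func relativeTransferTime(over bus: Bus) -> Double {
    return Bus.pcieX16.rawValue / bus.rawValue
}

for bus in Bus.allCases {
    print("\(bus): \(relativeTransferTime(over: bus))x the PCIe x16 transfer time")
}
```

Under this simplified model, a transfer that takes one unit of time over PCIe x16 takes two units over PCIe x8 and four units over Thunderbolt 3.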

Resource Storage Modes

Bandwidth costs are minimized when data transfers across a bus are also minimized. This optimization is largely influenced by the storage mode of a resource, which determines the memory location and access permissions of a resource.

Shared resources are stored in system memory. Shared resources can be accessed by both the CPU and the GPU. This memory location means that a discrete GPU can access the resource only via a PCIe bus, and an external GPU can access the resource only via a Thunderbolt 3 bus. Compared to accessing video memory, accessing system memory is relatively slow for a discrete GPU and considerably slower for an external GPU. Thus, shared resources incur the highest bandwidth and data transfer costs.

A system diagram showing two shared resources in the same system memory that can be accessed by a discrete GPU via a PCIe bus and by an external GPU via a Thunderbolt 3 bus.
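
As an illustration of the shared storage mode, the following sketch creates a buffer in system memory from an array of values. The function name makeSharedBuffer is a hypothetical helper; makeBuffer(bytes:length:options:) and the storageModeShared option are existing Metal APIs.

```swift
#if os(macOS)
import Metal

/// Creates a shared buffer that holds a copy of `values`.
/// Both the CPU and the GPU can access the result, but a discrete or
/// external GPU has to reach it across its bus.
func makeSharedBuffer(on device: MTLDevice, values: [Float]) -> MTLBuffer? {
    guard !values.isEmpty else { return nil }  // Metal rejects zero-length buffers
    let length = values.count * MemoryLayout<Float>.stride
    return values.withUnsafeBytes { bytes in
        device.makeBuffer(bytes: bytes.baseAddress!,
                          length: length,
                          options: .storageModeShared)
    }
}
#endif
```

A shared buffer like this is convenient for small data that the CPU updates often, at the cost of slower GPU access on discrete and external GPUs.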

Private resources are stored in video memory. Private resources can be accessed only by the GPU. This memory location means that discrete and external GPUs can access a resource directly from within their own video memory. Compared to accessing system memory, accessing video memory is much faster for discrete and external GPUs. Thus, private resources incur the lowest bandwidth and data transfer costs.

A system diagram showing two private resources in separate video memory that can be accessed directly by a discrete GPU and an external GPU.
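
Because the CPU can't write to a private resource, data usually reaches it through a one-time blit from a CPU-accessible staging buffer. The following is a minimal sketch of that pattern. The function name makePrivateCopy is hypothetical, and waiting synchronously for completion is a simplification that a real app would likely replace with a completion handler.

```swift
#if os(macOS)
import Metal

/// Copies `source`, a CPU-accessible staging buffer, into a new private
/// buffer in video memory. Returns nil if a Metal object can't be created.
/// Assumption: `source` was created on the same device as `queue`.
func makePrivateCopy(of source: MTLBuffer, using queue: MTLCommandQueue) -> MTLBuffer? {
    guard let destination = queue.device.makeBuffer(length: source.length,
                                                    options: .storageModePrivate),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else {
        return nil
    }
    // The data crosses the bus once here; later GPU reads stay in video memory.
    blit.copy(from: source, sourceOffset: 0,
              to: destination, destinationOffset: 0,
              size: source.length)
    blit.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()  // simple, but blocks the calling thread
    return destination
}
#endif
```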

Managed resources are stored as a dual copy in both system memory and video memory. The resource copy in system memory can be accessed only by the CPU, and the resource copy in video memory can be accessed only by the GPU. These memory locations mean that discrete and external GPUs have fast access to the resource copy in video memory, but slower access via a blit operation to the resource copy in system memory. Managed resources have some costs associated with accessing system memory, but these costs are reduced by efficient blits. (Infrequent blits between system memory and video memory are much faster than frequent and direct system memory access.)

A system diagram showing two managed resources with one copy in the same system memory and another copy in separate video memory. The resource copies in system memory can be accessed by a discrete GPU via a PCIe bus and by an external GPU via a Thunderbolt 3 bus. The resource copies in video memory can be accessed directly by a discrete GPU and an external GPU.
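
The two copies of a managed resource stay consistent only when your app tells Metal about changes. The sketch below shows both directions for a managed buffer, assuming the buffer was created with the storageModeManaged option. The function names updateManagedBuffer and encodeSynchronize are hypothetical helpers; didModifyRange(_:) and synchronize(resource:) are existing Metal APIs.

```swift
#if os(macOS)
import Metal

/// Writes `values` into the system memory copy of a managed buffer, then
/// reports the changed bytes so Metal can update the video memory copy.
func updateManagedBuffer(_ buffer: MTLBuffer, with values: [Float]) {
    let byteCount = values.count * MemoryLayout<Float>.stride
    precondition(buffer.storageMode == .managed, "expects a managed buffer")
    precondition(byteCount <= buffer.length, "values don't fit in the buffer")
    buffer.contents().copyMemory(from: values, byteCount: byteCount)
    buffer.didModifyRange(0..<byteCount)
}

/// Encodes a blit that copies GPU-side changes back to the system memory
/// copy. The CPU can read the result after `commandBuffer` completes.
func encodeSynchronize(_ buffer: MTLBuffer, in commandBuffer: MTLCommandBuffer) {
    guard let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.synchronize(resource: buffer)
    blit.endEncoding()
}
#endif
```

Batching CPU writes before a single didModifyRange(_:) call, and synchronizing only when the CPU actually needs the GPU's results, keeps the number of transfers across the bus low.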

GPU-Driven Displays

Presenting a drawable on a display incurs significant bandwidth costs if the drawable has to be transferred between GPUs. Each display, whether it's built in or external, is driven by a single GPU. Therefore, the fastest path to present a drawable to any given display is to render that drawable with the GPU that drives the display. Otherwise, the drawable has to be transferred from the GPU that renders it to the GPU that drives the display.

An example is a Mac with a discrete GPU, connected to an external GPU that's also connected to an external display (where the external GPU drives the external display). If a drawable is rendered with a discrete GPU, the system has to transfer this drawable to the external GPU via the Thunderbolt 3 bus. To avoid this transfer, the drawable should instead be rendered with the external GPU.

A system diagram that shows two possible pathways for a drawable. The recommended pathway renders a drawable with an external GPU and presents it on an external display. The not recommended pathway renders a drawable with a discrete GPU and transfers it to an external GPU before presenting it on an external display.

In Macs with multiple built-in GPUs, drawable transfers may also occur if different GPUs render and present the drawable. An example is a MacBook Pro with an integrated GPU and a discrete GPU, with automatic graphics switching enabled (where the integrated GPU can drive the MacBook Pro's display). If a drawable is rendered with the discrete GPU, the system has to transfer this drawable to the integrated GPU via the PCIe bus. To avoid this transfer, the drawable should instead be rendered with the integrated GPU.

A system diagram that shows two possible pathways for a drawable. The recommended pathway renders a drawable with an integrated GPU and presents it on a built-in display. The not recommended pathway renders a drawable with a discrete GPU and transfers it to an integrated GPU before presenting it on a built-in display.
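
To avoid the transfers described in both examples, an app first needs to know which GPU drives a given display. The following sketch shows one way to query it. The function names deviceDriving and shouldSwitchDevice are hypothetical helpers; CGDirectDisplayCopyCurrentMetalDevice, the NSScreenNumber device description key, and registryID are existing system APIs.

```swift
#if os(macOS)
import AppKit
import Metal

/// Returns the Metal device that currently drives `screen`, or nil if the
/// display ID or the device can't be determined.
func deviceDriving(_ screen: NSScreen) -> MTLDevice? {
    let key = NSDeviceDescriptionKey("NSScreenNumber")
    guard let number = screen.deviceDescription[key] as? NSNumber else {
        return nil
    }
    let displayID = CGDirectDisplayID(number.uint32Value)
    return CGDirectDisplayCopyCurrentMetalDevice(displayID)
}

/// Reports whether rendering with `current` would require a transfer to
/// `driving` before presentation. Devices are compared by registry ID.
func shouldSwitchDevice(current: MTLDevice, driving: MTLDevice) -> Bool {
    return current.registryID != driving.registryID
}
#endif
```

An app might call deviceDriving(_:) when its window moves to another screen, and switch its rendering device when shouldSwitchDevice(current:driving:) returns true. Whether switching is worthwhile depends on the cost of recreating the app's resources on the new device.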

See Also

Selecting GPUs on Mac

Device Selection and Fallback for Graphics Rendering

Demonstrates how to work with multiple GPUs and efficiently render to a display.

Device Selection and Fallback for Compute Processing

Demonstrates how to work with multiple GPUs and efficiently execute a compute-intensive simulation.

About External GPUs

Learn how to support external GPUs in your macOS apps and games.

About Multi-GPU and Multi-Display Setups

Learn about the different ways that a user can connect external GPUs and external displays to a Mac computer.

Handling External GPU Additions and Removals

Register and respond to external GPU notifications initiated by a user.

Getting Different Types of GPUs

Obtain, identify, and choose suitable GPUs for your app.

Getting the GPU that Drives a View's Display

Keep up to date with the optimal device for your display.