Hi everyone,
We are in the process of migrating a legacy KEXT for our external multi-disk RAID enclosure to the modern DriverKit framework. During the performance validation of our KEXT, we observed a large and consistent maximum throughput difference between Intel-based Macs and Apple Silicon-based Macs. We would like to share our findings and hope to discuss with others in the community to see if you have had similar experiences that could confirm or correct our understanding.
The Observation: A Consistent Performance Gap
When using the exact same external RAID hardware (an 8-HDD RAID 5 array), driven by our mature KEXT, we see the following results in high-throughput benchmarks (AJA System Test, large sequential writes):
- On a 2020 Intel-based Mac: We consistently achieve a throughput of ~2500 MB/s.
- On modern M-series Macs (from M1 to M4): The throughput is consistently capped at ~1500 MB/s.
This performance difference of nearly 40% is significant and is present across the entire Apple Silicon product line.
Our Hypothesis: A Shift in Architectural Design Philosophy
Since the KEXT and external hardware are identical in both tests, we believe this performance difference is not a bug but a fundamental platform architecture distinction. Our hypothesis is as follows:
1. The Intel Mac Era ("Dedicated Throughput") The Intel-based Macs we tested use a dedicated, discrete Intel Thunderbolt controller chip. This chip has its own dedicated PCIe lanes and resources, and its design appears to be singularly focused on maximizing raw, sustained data throughput for external peripherals.
2. The Apple Silicon Era ("Integrated Efficiency") In contrast, M-series Macs use a deeply integrated I/O controller inside the SoC. This controller must share resources, such as the total unified memory bandwidth and the chip's overall power budget, with all other functional units (CPU, GPU, etc.).
We speculate that the design priority for this integrated I/O controller has shifted from "maximizing single-task raw throughput" to "maximizing overall system efficiency, multi-task responsiveness, and low latency." As a result, in a pure, single-task storage benchmark, its performance ceiling may be lower than that of the older, dedicated-chip architecture.
Our Question to the Community:
Is our understanding correct? Have other developers of high-performance storage drivers or peripherals also observed a similar performance ceiling for external storage on Apple Silicon Macs, when compared to high-end Intel Macs?
We believe that understanding this as a deliberate architectural trade-off is crucial for setting realistic performance targets for our DEXT. Our current goal has been adjusted to have our DEXT match the KEXT's ~1500 MB/s on the M-series platform.
Any insights, confirmations, or corrections from the community or Apple engineers would be greatly appreciated.
Thank you very much!
Charles