Master Abort on a link side of the Root Complex

I'm encountering a bug while developing a (Thunderbolt) PCIDriverKit dext. The system runs well for a while, and then the OS (Sonoma 14.7.6) shuts down the hardware instance for some unknown reason. The dext itself keeps running, so hardware reads return 0xFFFFFFFF (device disconnected). There is no crash report or diagnostic report, and I don't see anything related in a sysdiagnose capture. The only hint I can find is in the kernel logs:

2026-07-01 16:18:15.929 Df kernel[0:139] (AppleT8122PCIeC) apciec[pcic1-bridge]::handleCompletionTimeoutInterrupt Completion timeout detected at address 0xe00510020
2026-07-01 16:18:15.929 Df kernel[0:139] (AppleT8122PCIeC) apciec[pcic1-bridge]::handleCompleterAbortInterrupt Completer Abort received (pri_status = 0x00100407, sec_status = 0x200000f0):
2026-07-01 16:18:15.929 Df kernel[0:139] (AppleT8122PCIeC) apciec[pcic1-bridge]::handleCompleterAbortInterrupt   * Received Master Abort on a link side of the Root Complex

That address (0xe00510020) looks like a read of our PCI device (BAR0, offset 0x0020).

What does this error explicitly mean? Did the device take too long to respond to the read request?

After the event, every device on that Thunderbolt bus is disconnected, and a reboot seems to be required. Hardware is a 14" 2023 MacBook Pro M3.

What does this error explicitly mean?

So, for future reference, breaking down the message sequence here:

handleCompletionTimeoutInterrupt Completion timeout detected at address 0xe00510020

As you noted, the address here is the PCI device address.
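If you want to check that kind of mapping in code, here is a minimal C++ sketch. It assumes BAR0 is mapped at 0xe00510000, as the question implies, and uses a made-up BAR size of 64 KiB. The BarWindow struct and the offsetInBar helper are hypothetical names for this example and are not part of PCIDriverKit.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical description of one mapped BAR. The values passed in are
// assumptions for illustration, not values read from real hardware.
struct BarWindow {
    uint64_t base;  // address where the BAR is mapped
    uint64_t size;  // length of the BAR in bytes
};

// Returns the offset of `address` inside `bar`, or std::nullopt when the
// address falls outside the window.
std::optional<uint64_t> offsetInBar(const BarWindow& bar, uint64_t address) {
    if (address < bar.base || address - bar.base >= bar.size) {
        return std::nullopt;
    }
    return address - bar.base;
}
```

With the assumed base, the logged address 0xe00510020 resolves to offset 0x20, which matches the register offset mentioned in the question.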

handleCompleterAbortInterrupt Completer Abort received (pri_status = 0x00100407, sec_status = 0x200000f0):

pri_status and sec_status are the primary and secondary status register values.

handleCompleterAbortInterrupt   * Received Master Abort on a link side of the Root Complex

This is our interpretation of those two register values. Currently, that comes from section 7.5.1.1.14 of PCI Express Base Specification, Rev. 4.0 Version 1.0.
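As a rough illustration of how such values can map to the message, here is a minimal C++ sketch. It assumes (the log does not confirm this) that each logged value is a 32-bit configuration-space dword with the 16-bit status register in its upper half, that is, Command/Status at offset 0x04 and I/O Base/I/O Limit/Secondary Status at offset 0x1C of a Type 1 header. The bit positions are the standard PCI status error bits; the constant and function names are made up for this example and are not the driver's actual code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Error bits shared by the PCI Status and Secondary Status registers.
// Bit 14 reads as "Signaled System Error" in Status and as
// "Received System Error" in Secondary Status.
constexpr uint16_t kMasterDataParityError = 1u << 8;
constexpr uint16_t kSignaledTargetAbort   = 1u << 11;
constexpr uint16_t kReceivedTargetAbort   = 1u << 12;
constexpr uint16_t kReceivedMasterAbort   = 1u << 13;
constexpr uint16_t kSystemError           = 1u << 14;
constexpr uint16_t kDetectedParityError   = 1u << 15;

// ASSUMPTION: the logged value is a whole config dword and the status
// register occupies its upper 16 bits.
uint16_t statusField(uint32_t loggedDword) {
    return static_cast<uint16_t>(loggedDword >> 16);
}

// Lists the names of the error bits that are set in a status value.
std::vector<std::string> decodeErrorBits(uint16_t status) {
    std::vector<std::string> names;
    if (status & kMasterDataParityError) names.push_back("Master Data Parity Error");
    if (status & kSignaledTargetAbort)   names.push_back("Signaled Target Abort");
    if (status & kReceivedTargetAbort)   names.push_back("Received Target Abort");
    if (status & kReceivedMasterAbort)   names.push_back("Received Master Abort");
    if (status & kSystemError)           names.push_back("System Error");
    if (status & kDetectedParityError)   names.push_back("Detected Parity Error");
    return names;
}
```

Under that assumption, sec_status = 0x200000f0 gives a status field of 0x2000, which is bit 13 (Received Master Abort) and is consistent with the logged interpretation. pri_status = 0x00100407 gives 0x0010, which has none of these error bits set.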

https://pcisig.com/PCIExpress/Specs/Base/_4.0_1.0

Did the device take too long to respond to the read request?

I'm not sure what would have triggered it, but I believe a command timeout would have been handled by a different interrupt type.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
