DriverKit kernel crash, possible PCI bridge issue?
While working on a DriverKit driver for a legacy SCSI controller, we have finally arrived at the expected (and usually joyful) point in driver development - a repeatable kernel crash and reboot. It happens when the controller first attempts to do "real work" by accessing requests or replies provided to it by the driver. I will provide additional information below (and can share more), but it’s a bit tricky because there are a lot of steps required to get to the point where the PCIe card is sufficiently initialized to respond, presumably attempt a memory access, and bring down the whole system.
- The system freezes and reboots right after the first MemoryWrite32 with an address of a request.
- The address is properly (we believe) mapped, using IODMACommand et al, approximately like so:
uint64_t reqArrSize = MPT_REQUEST_AREA * kMaxTasks;
IOAddressSegment arrseg = {};
IOMemoryMap * arrmap = nullptr;
ret = IOBufferMemoryDescriptor::Create(kIOMemoryDirectionInOut, reqArrSize, 0, &ivars->fReqArrayDesc);
if (ret != kIOReturnSuccess) { … }
IODMACommandSpecification dmaspec = {};
dmaspec.options = kIODMACommandSpecificationNoOptions;
dmaspec.maxAddressBits = 32;
// tried 31 here but that fails to allocate, the addresses start at 80000000, then leads to a crash
// but that seems to be a bridge / SID issue, not the address itself
ret = IODMACommand::Create(ivars->pciDevice, kIODMACommandCreateNoOptions, &dmaspec, &ivars->fReqArrayDMA);
if (ret != kIOReturnSuccess) { … }
ivars->fReqArrayDMA->retain(); // Create already returns a retained object, so this extra retain needs its own matching release()
uint64_t dmaFlags = 0;
uint32_t segCount = 1; // capacity in; actual out
IOAddressSegment segs[1] = {};
ret = ivars->fReqArrayDMA->PrepareForDMA(kIODMACommandPrepareForDMANoOptions, ivars->fReqArrayDesc, 0,
reqArrSize, &dmaFlags, &segCount, segs);
if ((ret != kIOReturnSuccess ) || (segCount != 1)) {…}
ivars->fReqArrayPhys = segs[0].address;
ret = ivars->fReqArrayDesc->CreateMapping(0, 0, 0, 0, 0, &arrmap); // virtual mapping
if ((ret != kIOReturnSuccess) || (arrmap == nullptr)) { … }
ivars->fReqArray = (uint8_t *)arrmap->GetAddress(); // virtual
ivars->fReqFreemap.set(); // mark all request entries as free
memset(ivars->fReqArray, 0, reqArrSize);
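Before handing the address to the controller, it may be worth confirming that the single segment returned by PrepareForDMA really covers the whole request array and stays below 4 GiB. The following is a minimal standalone sketch of such a check, not DriverKit code: `Segment` is a hypothetical stand-in for IOAddressSegment, and `segmentUsableFor32BitDevice` is an illustrative helper name, not an existing API.

```cpp
#include <cstdint>

// Hypothetical stand-in for IOAddressSegment (address + length) so that
// this sketch compiles outside of DriverKit.
struct Segment {
    uint64_t address;
    uint64_t length;
};

// Returns true when the segment holds at least `needed` bytes and all of
// those bytes are addressable with 32 bits (i.e. end at or below 4 GiB).
bool segmentUsableFor32BitDevice(const Segment &seg, uint64_t needed) {
    const uint64_t limit = 1ULL << 32;            // first address beyond 32 bits
    if (needed == 0 || seg.length < needed) {
        return false;                             // segment too short (or nothing to map)
    }
    if (seg.address >= limit) {
        return false;                             // start is already above 4 GiB
    }
    return needed <= limit - seg.address;         // end must not cross 4 GiB
}
```

In the driver this would amount to calling it with `segs[0]` and `reqArrSize` right after the `segCount` check, and treating a false result as a setup failure.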
- The crash is very violent and consistent:
panic(...): "dart-apciec0 (...): DART DART exception SID 0 ERROR_STATUS 0x80000001
ERROR_ADDRESS 0x0000000080000000 (no exceptionInfo)" @AppleT8110DART.cpp:1720
- The fault address is the address we write to one of the controller's control registers.
- We suspect that the issue is that this card uses an 8114 bridge, which issues the request under its own SID instead of the SID of the endpoint (the SCSI controller chip). The mapping, meanwhile, is under the endpoint’s SID. So when the controller attempts its first request using the physical address, the mapping does not exist at the right level - which brings down the whole system.
pcic0-bridge 0x106b/0x1015 Apple TB PCIe port
└ pci-bridge 0x1b21/0x2461 ASMedia (TB tunnel)
└ pci-bridge 0x1b21/0x2461 ASMedia IOPCITunnelled=Yes
└ pci-bridge 0x10b5/0x8114 - (on the controller card) PLX PEX 8114, PCIe-to-PCI-X bridge
└ scsi@8 0x1000/0x0030 - (on the controller card), LSI, on bus 4 (PCI-X secondary side)
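To make the suspected mismatch concrete, here is a small standalone sketch of the PCI requester ID arithmetic for this topology. It assumes the usual bus/device/function encoding, assumes the SCSI chip is function 0 of device 8 on bus 4, and assumes the 8114 takes ownership of the request in the way PCIe-to-PCI/PCI-X bridges are allowed to (secondary bus number, device 0, function 0). Whether the 8114 actually does this here, and how the DART derives its SID from the requester ID, are not something we know; the function names are illustrative only.

```cpp
#include <cstdint>

// PCI requester ID layout: bus in bits 15..8, device in bits 7..3,
// function in bits 2..0.
constexpr uint16_t requesterId(uint8_t bus, uint8_t device, uint8_t function) {
    return static_cast<uint16_t>((bus << 8) | ((device & 0x1F) << 3) | (function & 0x07));
}

// Requester ID a bridge would present if it takes ownership of a request
// coming from its secondary bus: secondary bus number, device 0, function 0.
// (Assumption: the PEX 8114 behaves this way in our setup.)
constexpr uint16_t bridgeOwnedRequesterId(uint8_t secondaryBus) {
    return requesterId(secondaryBus, 0, 0);
}
```

With these assumptions, `requesterId(4, 8, 0)` is 0x0440 for the endpoint, while `bridgeOwnedRequesterId(4)` is 0x0400, so a mapping keyed to the endpoint's identity would not match a request that arrives carrying the bridge's.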
- Worse yet, there appears to be no way from within a DEXT to ask for a mapping at a different level, and the 8114 doesn’t have a mapper of its own anyway (attempting to do so with a separate DEXT fails).
- That last part we know because we tried a workaround with two DEXTs, one that successfully matches the 8114 bridge and the other that matches the SCSI controller chip itself. However, the bridge doesn’t have a mapper attached, so the OS does not seem to give us a useful address (using the same IOBufferMemoryDescriptor::Create, IODMACommand::Create, PrepareForDMA sequence), and a 32-bit request simply fails with kIOReturnMessageTooLarge.
Attached is a quick diagram of our suspicions and a longer write-up (fair warning: unlike the above, the write-up is mostly AI output, but it has been reviewed by the team).
Questions:
- Have others dealt with this problem of legacy PCI devices that have bridges that don’t fit neatly into the restrictions, and what did you do?
- Is there a way to get the right virtual<->physical mappings recognized at the right level?
- Is there a way to temporarily turn off SID checking or is that (as I assume) intrinsic to how this works?
- Are we on the wrong track entirely?