Hello everyone,
We are in the process of migrating a high-performance storage KEXT to DriverKit. During our initial validation phase, we noticed a performance gap between the DEXT and the KEXT, which prompted us to try and optimize our I/O handling process.
Background and Motivation:
Our test hardware is a RAID 0 array of two HDDs. According to AJA System Test, our legacy KEXT achieves a write speed of about 645 MB/s on this hardware, whereas the new DEXT reaches about 565 MB/s. We suspect the primary reason for this performance gap might be that the DEXT, by default, uses a serial work-loop to submit I/O commands, which fails to fully leverage the parallelism of the hardware array.
Therefore, to eliminate this bottleneck and improve performance, we configured a dedicated parallel dispatch queue (MyParallelIOQueue) for the UserProcessParallelTask method.
However, during our implementation attempt, we encountered a critical issue that caused a system-wide crash.
The Operation Causing the Panic:
We configured MyParallelIOQueue using the following combination of methods:
- In the
.iigfile: We appended theQUEUENAME(MyParallelIOQueue)macro after theoverridekeyword of theUserProcessParallelTaskmethod declaration. - In the
.cppfile: We manually created a queue with the same name by calling theIODispatchQueue::Create()function within ourUserInitializeControllermethod.
The Result:
This results in a macOS kernel panic during the DEXT loading process, forcing the user to perform a hard reboot.
After the reboot, checking with the systemextensionsctl list command reveals the DEXT's status as [activated waiting for user], which indicates that it encountered an unrecoverable, fatal error during its initialization.
Key Code Snippets to Reproduce the Panic:
-
In
.iigfile - this was our exact implementation:class DRV_MAIN_CLASS_NAME: public IOUserSCSIParallelInterfaceController { public: virtual kern_return_t UserProcessParallelTask(...) override QUEUENAME(MyParallelIOQueue); }; -
In
.hfile:struct DRV_MAIN_CLASS_NAME_IVars { // ... IODispatchQueue* MyParallelIOQueue; }; -
In
UserInitializeControllerimplementation:kern_return_t IMPL(DRV_MAIN_CLASS_NAME, UserInitializeController) { // ... // We also included code to manually create the queue. kern_return_t ret = IODispatchQueue::Create("MyParallelIOQueue", kIODispatchQueueReentrant, 0, &ivars->MyParallelIOQueue); if (ret != kIOReturnSuccess) { // ... error handling ... } // ... return kIOReturnSuccess; }
Our Question:
What is the officially recommended and most stable method for configuring UserProcessParallelTask_Impl() to use a parallel I/O queue?
Clarifying this is crucial for all developers pursuing high-performance storage solutions with DriverKit. Any explanation or guidance would be greatly appreciated.
Best Regards,
Charles
Therefore, to eliminate this bottleneck and improve performance, we configured a dedicated parallel dispatch queue (MyParallelIOQueue) for the UserProcessParallelTask method.
Yeah... that won't work. UserProcessParallelTask is an OSAction target, which is already targeting a queue. I'd be curious to see how the panic() played out*, but I'm not surprised that you panicked.
*I suspect you deadlocked command submission long enough that the SCSI stack gave up and panicked, but that's purely a guess.
That leads to here:
What is the officially recommended and most stable method for configuring UserProcessParallelTask_Impl() to use a parallel I/O queue?
The answer here is to shift to UserProcessBundledParallelTasks. The architecture is a bit more complex, but it allows you to asynchronously receive and complete tasks in parallel while also reusing command and response buffers to minimize wiring cost. Take a look at the header files from SCSIControllerDriverKit for details on how this architecture works.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware