System Panic with IOUserSCSIParallelInterfaceController during Dispatch Queue Configuration

Hello everyone,

We are in the process of migrating a high-performance storage KEXT to DriverKit. During our initial validation phase, we noticed a performance gap between the DEXT and the KEXT, which prompted us to try and optimize our I/O handling process.

Background and Motivation:

Our test hardware is a RAID 0 array of two HDDs. According to AJA System Test, our legacy KEXT achieves a write speed of about 645 MB/s on this hardware, whereas the new DEXT reaches about 565 MB/s. We suspect the primary reason for this performance gap might be that the DEXT, by default, uses a serial work-loop to submit I/O commands, which fails to fully leverage the parallelism of the hardware array.

Therefore, to eliminate this bottleneck and improve performance, we configured a dedicated parallel dispatch queue (MyParallelIOQueue) for the UserProcessParallelTask method.

However, during our implementation attempt, we encountered a critical issue that caused a system-wide crash.

The Operation Causing the Panic:

We configured MyParallelIOQueue using the following combination of methods:

  1. In the .iig file: We appended the QUEUENAME(MyParallelIOQueue) macro after the override keyword of the UserProcessParallelTask method declaration.
  2. In the .cpp file: We manually created a queue with the same name by calling the IODispatchQueue::Create() function within our UserInitializeController method.

The Result:

This results in a macOS kernel panic during the DEXT loading process, forcing the user to perform a hard reboot.

After the reboot, checking with the systemextensionsctl list command reveals the DEXT's status as [activated waiting for user], which indicates that it encountered an unrecoverable, fatal error during its initialization.

Key Code Snippets to Reproduce the Panic:

  1. In .iig file - this was our exact implementation:

    class DRV_MAIN_CLASS_NAME: public IOUserSCSIParallelInterfaceController
    {
    public:
        virtual kern_return_t UserProcessParallelTask(...) override
            QUEUENAME(MyParallelIOQueue);
    };
    
  2. In .h file:

    struct DRV_MAIN_CLASS_NAME_IVars {
        // ...
        IODispatchQueue*    MyParallelIOQueue;
    };
    
  3. In UserInitializeController implementation:

    kern_return_t
    IMPL(DRV_MAIN_CLASS_NAME, UserInitializeController)
    {
        // ...
        // We also included code to manually create the queue.
        kern_return_t ret = IODispatchQueue::Create("MyParallelIOQueue",
                                                    kIODispatchQueueReentrant,
                                                    0,
                                                    &ivars->MyParallelIOQueue);
        if (ret != kIOReturnSuccess) {
            // ... error handling ...
        }
        // ...
        return kIOReturnSuccess;
    }
    

Our Question:

What is the officially recommended and most stable method for configuring UserProcessParallelTask_Impl() to use a parallel I/O queue?

Clarifying this is crucial for all developers pursuing high-performance storage solutions with DriverKit. Any explanation or guidance would be greatly appreciated.

Best Regards,

Charles

Answered by DTS Engineer in 865478022

Therefore, to eliminate this bottleneck and improve performance, we configured a dedicated parallel dispatch queue (MyParallelIOQueue) for the UserProcessParallelTask method.

Yeah... that won't work. UserProcessParallelTask is an OSAction target, which is already targeting a queue. I'd be curious to see how the panic() played out*, but I'm not surprised that you panicked.

*I suspect you deadlocked command submission long enough that the SCSI stack gave up and panicked, but that's purely a guess.

That leads to here:

What is the officially recommended and most stable method for configuring UserProcessParallelTask_Impl() to use a parallel I/O queue?

The answer here is to shift to UserProcessBundledParallelTasks. The architecture is a bit more complex, but it allows you to asynchronously receive and complete tasks in parallel while also reusing command and response buffers to minimize wiring cost. Take a look at the header files from SCSIControllerDriverKit for details on how this architecture works.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Therefore, to eliminate this bottleneck and improve performance, we configured a dedicated parallel dispatch queue (MyParallelIOQueue) for the UserProcessParallelTask method.

Yeah... that won't work. UserProcessParallelTask is an OSAction target, which is already targeting a queue. I'd be curious to see how the panic() played out*, but I'm not surprised that you panicked.

*I suspect you deadlocked command submission long enough that the SCSI stack gave up and panicked, but that's purely a guess.

That leads to here:

What is the officially recommended and most stable method for configuring UserProcessParallelTask_Impl() to use a parallel I/O queue?

The answer here is to shift to UserProcessBundledParallelTasks. The architecture is a bit more complex, but it allows you to asynchronously receive and complete tasks in parallel while also reusing command and response buffers to minimize wiring cost. Take a look at the header files from SCSIControllerDriverKit for details on how this architecture works.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

System Panic with IOUserSCSIParallelInterfaceController during Dispatch Queue Configuration
 
 
Q