DriverKit. Data Integrity issue.

Dear Apple engineers and DriverKit developers,
We have developed a DriverKit (DEXT) driver for an HBA RAID controller.
The RAID controller is connected to hosts through Thunderbolt (PCIe port of the Thunderbolt controller).
We use an IO script to verify the developed driver. The test fails after a few (10-12) hours of running with an error:
“BUG IN CLIENT OF LIBDISPATCH: dispatch_sync called on queue already owned by current thread”.

We inspected the stack trace of the crash report. This error happens in the interrupt handler.

Thread 5 Crashed: 

0   libdispatch.dylib                   0x19671aa8c __DISPATCH_WAIT_FOR_QUEUE__ + 484 
1   libdispatch.dylib                   0x19671a5d0 _dispatch_sync_f_slow + 152 
2   DriverKit                           0x195d3fc1c IODispatchQueue::DispatchSync_f(void*, void (*)(void*)) + 296 
3   DriverKit                           0x195d40860 IOInterruptDispatchSourceThread(void*) + 380 
4   libsystem_pthread.dylib             0x1968a3738 _pthread_start + 140 
5   libsystem_pthread.dylib             0x1968ac6c8 thread_start + 8

On our side we created 5 DispatchQueue(s) for the interrupt processing and configured 5 interrupt handlers using ConfigureInterrupts(kIOInterruptTypePCIMessagedX, 1, 5). It gives 5 interrupts, as requested and it is not clear what is the reason for the issue.

Our code samples are below

uint32_t configureInterrupts(uint32_t requested, uint32_t required) 

{ 

    const kern_return_t ret = ioPCIdevice->ConfigureInterrupts(kIOInterruptTypePCIMessagedX, required, requested); 

    if (ret != kIOReturnSuccess) 
        return 0; 

    uint64_t interruptType = 0; 
    uint32_t interruptsCount = 0; 
    uint32_t interruptIndex = 0; 

 
    for ( ;; ++interruptIndex) { 
        if (IOInterruptDispatchSource::GetInterruptType(ioPCIdevice, interruptIndex, &interruptType) != kIOReturnSuccess) 
            break; 

        if ((interruptType & kIOInterruptTypePCIMessagedX) == 0) 
            continue; 

        ++interruptsCount; 
    } 
    return interruptsCount; 
} 
.....


// Create DQs
    for(int i = 0; i < maxInterrupts; ++i) { 
        ret = IODispatchQueue::Create(INTERRUPT_DQ_NAME, 0, 0, &ivars->interruptQueue[i]); 
        if (kIOReturnSuccess != ret || nullptr == ivars->interruptQueue[i]) { 
            GH_PRINT_ERR("Interrupts queue %d creation failed with error %d", i, ret); 
            return false; 
        } 
    } 
.....


// Link DQ with interrupt 
 for(int index = 0; index < maxInterrupts; ++ index) { 
    kern_return_t ret = CreateActionHandleInterruptRequest(size, &ivars->interruptActions[index]); 
    if (kIOReturnSuccess != ret) { 
        GH_PRINT_ERR("Create action for interrupt handler %u failed.", index); 
        return ; 
    } 

    ret = IOInterruptDispatchSource::Create(ivars->PCI_io.dev, index, ivars->interruptQueue[index], &ivars->interruptSources[index]); 
    if (kIOReturnSuccess != ret || nullptr == ivars->interruptSources[index]) { 
        GH_PRINT_ERR("Creating interrupt source %u failed for interrupt index %u.", index, index); 
        return ; 
    } 

    ret = ivars->interruptSources[index]->SetHandler(ivars->interruptActions[index]); 
    if (kIOReturnSuccess != ret) { 
        GH_PRINT_ERR("Setting the handler for interrupt source %u failed.", index); 
        return ; 
    } 
    ivars->contexts[index] = ivars->interruptActions[index]->GetReference();     
} 
.....
 

// definition for interrupt handler
void    HandleInterruptRequest ( OSAction *action, uint64_t count, uint64_t time) TYPE (IOInterruptDispatchSource::InterruptOccurred); 

 

Do you have any clue how we can fix this error? Or directions and ways for investigation?
Please let us know if you need more details.

Please post the full crash log and I'll se what I can determine.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hello, Please find crash log attached

Regards

So, let me start here:

“BUG IN CLIENT OF LIBDISPATCH: dispatch_sync called on queue already owned by current thread”.

This particular error happens any time you make a call into a dispatch queue which would inherently deadlock that queue. The most straightforward case would be calling DispatchSync inside code that's already running on that same queue, though the design of DriverKit can make it less obvious that this is what's going on. That leads me to thread 8:

Thread 8:
0   DriverKit                       	       0x195d4d29c iokit_user_client_trap + 8
1   DriverKit                       	       0x195d3cf00 OSMetaClassBase::Invoke(IORPC) + 312 
2   SCSIControllerDriverKit         	       0x195e8b364 IOUserSCSIParallelInterfaceController::ParallelTaskCompletion(OSAction*, SCSIUserParallelResponse, int (*)(OSMetaClassBase*, IORPC)) + 156 
3   com.mycompany.driverkit.MyDevice	       0x1042a6a20 invocation function for block in OsBridge::SCMD_DONE(void*, megasas_cmd_fusion*, SCSIUserParallelResponse*, megasas_cmd_fusion*)
4   DriverKit                       	       0x195d3f918 IODispatchQueueBlockFunction(void*) + 32 
5   DriverKit                       	       0x195d3f9dc invocation function for block in IODispatchQueue::DispatchAsync_f(void*, void (*)(void*)) + 164 

What is "SCMD_DONE" actually doing? What queue was it called on and what is calling out "to"?

More broadly, do your other crash logs show the same pattern or do they point to other locations?

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

DriverKit. Data Integrity issue.
 
 
Q