[DEXT Migration Issue] IOUserSCSIParallelInterfaceController fails to handle low-level I/O from `diskutil`

Hello everyone,

We are migrating our KEXT for a Thunderbolt storage device to a DEXT based on IOUserSCSIParallelInterfaceController.

We've run into a fundamental issue where the driver's behavior splits based on the I/O source: high-level I/O from the file system (e.g., Finder, cp) is mostly functional (with a minor ls -al sorting issue for Traditional Chinese filenames), while low-level I/O directly to the block device (e.g., diskutil) fails or acts unreliably. Basic read/write with dd appears to be mostly functional.

We suspect that our DEXT is failing to correctly register its full device "personality" with the I/O Kit framework, unlike its KEXT counterpart. As a result, low-level I/O requests with special attributes (like cache synchronization) sent by diskutil are not being handled correctly by the IOUserSCSIParallelInterfaceController framework of our DEXT.

Actions Performed & Relevant Logs

1. Discrepancy: diskutil info Shows Different Device Identities for DEXT vs. KEXT

For the exact same hardware, the KEXT and DEXT are identified by the system as two different protocols.

KEXT Environment:

   Device Identifier:         disk5
   Protocol:                  Fibre Channel Interface
   ...
   Disk Size:                 66.0 TB
   Device Block Size:         512 Bytes

DEXT Environment:

   Device Identifier:         disk5
   Protocol:                  SCSI
   SCSI Domain ID:            2
   SCSI Target ID:            0
   ...
   Disk Size:                 66.0 TB
   Device Block Size:         512 Bytes

2. Divergent I/O Behavior: Partial Success with Finder/cp vs. Failure with diskutil

  • High-Level I/O (Partially Successful): In the DEXT environment, if we operate on an existing volume (e.g., /Volumes/MyVolume), file copy operations using Finder or cp succeed. Furthermore, the logs we've placed in our single I/O entry point, UserProcessParallelTask_Impl, are triggered.

    • Side Effect: However, running ls -al on such a volume shows an incorrect sorting order for files with Traditional Chinese names (they appear before . and ..).
  • Low-Level I/O (Contradictory Behavior): In the DEXT environment, when we operate directly on the raw block device (/dev/disk5):

    • diskutil partitionDisk ... -> Fails 100% of the time with the error: Error: -69825: Wiping volume data to prevent future accidental probing failed.
    • dd command -> Basic read/write operations appear to work correctly (a write can be immediately followed by a read within the same DEXT session, and the data is correct).

3. Evidence of Cache Synchronization Failure (Non-deterministic Behavior)

The success of the dd command is not deterministic. Cross-environment tests prove that its write operations are unreliable:

  • First Test:

    1. In the DEXT environment, write a file with random data to /dev/disk5 using dd.
    2. Reboot into the KEXT environment.
    3. Read the data back from /dev/disk5 using dd. The result is a file filled with all zeros.
    • Conclusion: The write operation only went to the hardware cache, and the data was lost upon reboot.
  • Second Test:

    1. In the DEXT environment, write the same random file to /dev/disk5 using dd.
    2. Key Variable: Immediately after, still within the DEXT environment, read the data back once for verification. The content is correct!
    3. Reboot into the KEXT environment.
    4. Read the data back from /dev/disk5. This time, the content is correct!
    • Conclusion: The additional read operation in the second test unintentionally triggered a hardware cache flush. This proves that the dd (in our DEXT) write operation by itself does not guarantee synchronization, making its behavior unreliable.

Our Problem

Based on the observations above, we have reached the following conclusion:

  1. High-Level Path (triggered by Finder/cp): When an I/O request originates from the high-level file system, the framework seems to enter a fully-featured mode. In this mode, all SCSI commands, including READ/WRITE, INQUIRY, and SYNCHRONIZE CACHE, are correctly packaged and dispatched to our UserProcessParallelTask_Impl entry point. Therefore, Finder operations are mostly functional.

  2. Low-Level Path (triggered by dd/diskutil): When an I/O request originates from the low-level raw block device layer:

    • The most basic READ/WRITE commands can be dispatched (which is why dd appears to work).
    • However, critical management commands, such as INQUIRY and SYNCHRONIZE CACHE, are not being correctly dispatched or handled. This leads to the incorrect device identification in diskutil info and the failure of diskutil partitionDisk due to its inability to confirm cache synchronization.

We would greatly appreciate any guidance, suggestions, or insights on how to resolve this discrepancy. Specifically, what is the recommended approach within DriverKit to ensure that a DEXT based on IOUserSCSIParallelInterfaceController can properly declare its capabilities and handle both high-level and low-level I/O requests uniformly?

Thank you.

Charles

Answered by DTS Engineer in 865046022

Based on the setProperty calls from our KEXT's source code and the properties from the .ioreg analysis, we have implemented the following four-part configuration in our DEXT:

I'm confused. Above I told you that the problem was:

  1. You've failed to define kIOMaximumSegmentByteCount* keys, as UserReportHBAConstraints basically "requires".

Have you defined that key?

More specifically, UserReportHBAConstraints() has a list of required keys:

Key:                                        Required:

kIOMaximumSegmentCountReadKey               Yes
kIOMaximumSegmentCountWriteKey              Yes
kIOMaximumSegmentByteCountReadKey           Yes
kIOMaximumSegmentByteCountWriteKey          Yes
kIOMinimumSegmentAlignmentByteCountKey      Yes
kIOMaximumSegmentAddressableBitCountKey     Yes
kIOMinimumHBADataAlignmentMaskKey           Yes

Your DEXT has not defined all of them and, as they are required, you should not expect your DEXT to function properly until you've defined all of them.
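
As a rough illustration of that check, here is a minimal standalone C++ sketch (not DriverKit code) that compares a set of published property names against the seven required keys. The registry spellings for the "SegmentByteCount" and "HBADataAlignmentMask" entries are assumed to follow the same naming pattern as the properties visible in the registry dumps later in this thread, and MissingConstraintKeys is a hypothetical helper name.

```cpp
#include <array>
#include <set>
#include <string>
#include <vector>

// Registry names for the seven required constraint keys. The
// "IOMaximumSegmentByteCount*" and "IOMinimumHBADataAlignmentMask" spellings
// are assumptions based on the naming pattern of the other entries.
const std::array<std::string, 7> kRequiredConstraintKeys = {
    "IOMaximumSegmentCountRead",
    "IOMaximumSegmentCountWrite",
    "IOMaximumSegmentByteCountRead",
    "IOMaximumSegmentByteCountWrite",
    "IOMinimumSegmentAlignmentByteCount",
    "IOMaximumSegmentAddressableBitCount",
    "IOMinimumHBADataAlignmentMask",
};

// Hypothetical helper: returns the required keys that are absent from
// `published`, in the order they appear in the list above.
std::vector<std::string> MissingConstraintKeys(const std::set<std::string>& published) {
    std::vector<std::string> missing;
    for (const auto& key : kRequiredConstraintKeys) {
        if (published.count(key) == 0) {
            missing.push_back(key);
        }
    }
    return missing;
}
```

Feeding it the four controller-level properties that appear in the DEXT registry snapshot (segment count read/write, addressable bit count, alignment byte count) would report the two segment byte count keys and the HBA alignment mask as missing.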

Similarly:

Low-Level Sync (UserGetDMASpecification): We also set maxTransferSize to 512KB to ensure consistency with the HBA layer.

I haven't gotten explicit confirmation from the engineering team, but at this point I'm fairly convinced that maxTransferSize MUST satisfy either:

maxTransferSize >= kIOMaximumSegmentCountReadKey * kIOMaximumSegmentByteCountReadKey
OR
maxTransferSize >= kIOMaximumSegmentCountWriteKey * kIOMaximumSegmentByteCountWriteKey

...whichever of the two is larger. maxTransferSize basically defines "the largest possible transfer your controller could EVER handle", which would obviously be your segment count * segment size. Critically, using a smaller maxTransferSize won't cause an immediate failure, only a later failure if/when the kernel actually tries to "give you" a large enough transfer.
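
To make the arithmetic concrete, here is a small standalone C++ sketch of that lower bound. The 0x81 segment count is the value visible in the registry snapshots in this thread; the 64 KiB per-segment byte count is purely a placeholder, and MinimumMaxTransferSize is a hypothetical helper, not a DriverKit API.

```cpp
#include <algorithm>
#include <cstdint>

// Smallest maxTransferSize consistent with the rule above: the larger of
// (segment count * segment byte count) across the read and write directions.
constexpr uint64_t MinimumMaxTransferSize(uint64_t segCountRead, uint64_t segBytesRead,
                                          uint64_t segCountWrite, uint64_t segBytesWrite) {
    return std::max(segCountRead * segBytesRead, segCountWrite * segBytesWrite);
}

// 0x81 segments comes from the registry snapshot; 64 KiB per segment is an
// assumed placeholder value, not a recommendation.
constexpr uint64_t kExampleMinimum =
    MinimumMaxTransferSize(0x81, 64 * 1024, 0x81, 64 * 1024);

static_assert(kExampleMinimum == 129ull * 65536ull, "129 segments of 64 KiB each");
static_assert(kExampleMinimum > 512 * 1024, "a 512 KiB maxTransferSize would be too small");
```

Under those assumed numbers the bound works out to roughly 8 MiB, well above the 512KB value discussed here.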

I've actually just filed a bug on this (r.164177660) asking that IOUserSCSIParallelInterfaceController fail completely if its configuration is in any way incomplete.

(We have confirmed that IORegistryExplorer shows the Protocol Characteristics were set successfully, but IOMaximumBlockCountWrite remains 0xffff.)

Yes. Your DEXT cannot set kIOMaximumByteCountWriteKey, which means it will return to its default value of 0xffff.

The failure of bs=768k, however, shows that IOKit's IOBreaker splitting functionality has not been successfully activated. If it had been, we should have at least seen the first 512KB sub-request in our DEXT's log, but in fact, we saw nothing.

No, that's NOT what it shows. Your DEXT is going to fail ANY transfer larger than 512KB because that's what YOU defined maxTransferSize as:

The failure of bs=768k, however, shows that IOKit's IOBreaker splitting functionality has not been successfully activated.

You're right, it's not activating. That's because you're currently telling the kernel that you can handle a transfer up to:

IOMaximumBlockCountRead = 0xffff -> 31 MB

...so any transfer smaller than 31 MB will be passed directly down the I/O stack. However, IOUserSCSIParallelInterfaceController is going to fail any transfer larger than maxTransferSize.
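
The numbers behind that can be checked with a few lines of standalone C++. This is only a sketch of the arithmetic: the 512-byte block size comes from the diskutil output at the top of the thread, and the two predicates are hypothetical stand-ins for the decisions made by the upper and lower layers, not real IOKit calls.

```cpp
#include <cstdint>

constexpr uint64_t kBlockSize = 512;               // "Device Block Size" from diskutil info
constexpr uint64_t kMaxBlockCount = 0xffff;        // default IOMaximumBlockCountRead/Write
constexpr uint64_t kMaxTransferSize = 512 * 1024;  // value the DEXT reports at this point

// Upper-layer limit in bytes: 65535 blocks * 512 bytes = 33,553,920 bytes.
constexpr uint64_t kUpperLimitBytes = kMaxBlockCount * kBlockSize;

// Hypothetical predicates modelling the two decisions described above.
constexpr bool PassedDownUnsplit(uint64_t bytes) { return bytes <= kUpperLimitBytes; }
constexpr bool ExceedsMaxTransferSize(uint64_t bytes) { return bytes > kMaxTransferSize; }

// Just under 32 MiB, i.e. 31 whole MiB.
static_assert(kUpperLimitBytes / (1024 * 1024) == 31, "0xffff blocks is about 31 MB");
// bs=768k is not split by the upper layer, and then exceeds maxTransferSize.
static_assert(PassedDownUnsplit(768 * 1024) && ExceedsMaxTransferSize(768 * 1024), "768k fails");
// bs=512k is not split either, but fits within maxTransferSize.
static_assert(PassedDownUnsplit(512 * 1024) && !ExceedsMaxTransferSize(512 * 1024), "512k works");
```

This matches the dd results reported for bs=512k and bs=768k.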

Again, the way to fix this is to:

  1. Define the two kIOMaximumSegmentByteCount keys, so that IOBreaker will divide the requests into something "reasonable".

  2. Set maxTransferSize to a large enough value that it can handle the configuration you're creating in #1.

Our plan for the next step is to manually write a "second-layer I/O splitter" from scratch within our UserProcessParallelTask_Impl (for example, to split a received 512KB request into eight 64KB hardware commands).

That depends on what you mean by "splitter". The expected implementation here is that you'll take the value fBufferIOVMAddr and use basic math to divide it up into smaller chunks, each of which will be a scatter gather entry you pass over to your card.

Historically, I believe this was done by subdividing the IOMemoryDescriptor and generating individual IODMACommands; however, the nature of the DART means that this is somewhat silly and unnecessary, so it's done with a single IODMACommand over the entire descriptor.
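
The "basic math" involved might look something like the following standalone C++ sketch. SGEntry and SplitIntoChunks are hypothetical names; the real scatter/gather entry layout depends entirely on your hardware, and the base address would come from fBufferIOVMAddr in the parallel task.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One scatter/gather entry handed to the hardware (hypothetical layout).
struct SGEntry {
    uint64_t address;  // bus address of the start of this chunk
    uint64_t length;   // number of bytes in this chunk
};

// Divide the range [base, base + total) into consecutive chunks of at most
// maxChunk bytes each. The last chunk may be shorter than maxChunk.
std::vector<SGEntry> SplitIntoChunks(uint64_t base, uint64_t total, uint64_t maxChunk) {
    std::vector<SGEntry> entries;
    if (maxChunk == 0) {
        return entries;  // no sensible split exists
    }
    for (uint64_t offset = 0; offset < total; offset += maxChunk) {
        entries.push_back({base + offset, std::min(maxChunk, total - offset)});
    }
    return entries;
}
```

For the example in the question, a 512KB request with a 64KB hardware limit yields eight entries, each 64KB long and contiguous in the I/O address space.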

Finally, making sure this is as clear as possible, this does mean that maxTransferSize is likely to be much, potentially MUCH, larger than you'd "expect" in the older architecture. I don't have the data at hand to validate the exact number, but I believe there is a Fibre Channel DEXT which is setting maxTransferSize to ~1 GB.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

So, let me start with a clarification here:

In the DEXT environment, when we operate directly on the raw block device (/dev/disk5):

The "raw" block device here would be "/dev/rdisk5", not "/dev/disk5". You're actually interacting with the "cached" device, not the "raw" device.

That leads to here:

First Test:

  1. In the DEXT environment, write a file with random data to /dev/disk5 using dd.
  2. Reboot into the KEXT environment.
  3. Read the data back from /dev/disk5 using dd. The result is a file filled with all zeros.

Conclusion: The write operation only went to the hardware cache, and the data was lost upon reboot.

What actually happened in your DEXT for #1? Are you sure you received ANY I/O command? Also, have you tried this test entirely in the KEXT environment? In particular, what happens if you pull power from the machine (instead of cleanly shutting down)?

On the KEXT side, I think what's actually happening here is the following:

  • The write hits the UBC (Unified Buffer Cache) without actually generating any I/O commands.

  • The shutdown process flushes everything out of the cache and the KEXT ensures everything is "finalized" to disk before it allows termination.

I'd expect this to generally work the same in a DEXT, but if your DEXT isn't ensuring everything is flushed during teardown, then that could cause data loss.

Second Test:

  1. In the DEXT environment, write the same random file to /dev/disk5 using dd.
  2. Key Variable: Immediately after, still within the DEXT environment, read the data back once for verification. The content is correct!

I'd be curious to see how the command stream differed here, but my guess about what happened here is that the system had to commit the write to disk before it triggered the read. However, it's also possible that the UBC returned the data you'd written directly from the cache, while also issuing a "flush" on its own.

One thing I'd suggest testing here is exactly how your DEXT behaves if you run "diskutil eject". I think that should behave similar to what happens at shutdown, without the headache of trying to log/debug across reboots.

However, that leads to here:

diskutil partitionDisk ... -> Fails 100% of the time with the error: Error: -69825: Wiping volume data to prevent future accidental probing failed

Something different is going on here. In general, diskutil uses the rdisk node, which means you're getting direct I/O requests without the write cache being involved.

The error code itself basically means "wiping failed", but that does narrow the scope enough that I'm sure this targeted the rdisk, not the disk. Do any writes make it to your disk when you get this error?

Our code uses that error in two places, once for errors coming out of "pwrite" and the other if something goes wrong when setting up the initial buffers that are used as the I/O source.

In terms of that second path, take a look at the code in IOMediaBSDClient.cpp, particularly dkioctl. IOMediaBSDClient is where things like ioctl() calls are "converted" into IOKit's architecture and, as you'll see, most of them are just reading properties off their underlying IOMedia object, all of which were propagated "up" from your controller. If you're not getting any writes, then I would start by comparing the IOMedia configuration of KEXT vs. DEXT and see if you can find anything that's wrong.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

We've encountered an I/O issue with our storage DEXT and were hoping for your expert insight.

Our storage DEXT works correctly when handling I/O requests directed to the buffered device node (/dev/disk5).

However, when I/O requests target the raw device node (/dev/rdisk5), the operation fails with an Input/output error if the transfer size is 128KB or larger. In contrast, all transfers of 64KB or smaller succeed consistently.

Detailed Debugging Steps & Observations:

After a full reboot of macOS to ensure a clean test environment, we performed a series of tests exclusively on the /dev/rdisk5 node.

A 64KB transfer succeeds:

$ sudo dd if=/dev/zero of=/dev/rdisk5 bs=64k count=1
1+0 records in
1+0 records out
65536 bytes transferred in 0.003461 secs (18935568 bytes/sec)

A 128KB transfer consistently fails:

$ sudo dd if=/dev/zero of=/dev/rdisk5 bs=128k count=1
dd: /dev/rdisk5: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.000247 secs (0 bytes/sec)

Log Analysis:

  • When the 64KB transfer succeeds, our DEXT's core I/O handling logic is triggered, and we can see detailed logs for the WRITE(10) command being prepared, including the CDB content and a reported SGCount of 1.
[UserProcessParallelTask_Impl] fRequestedTransferCount  = 65536
[UserProcessParallelTask_Impl] CDB                      = 0x2a 0x00 ... 0x00 0x80 0x00
...
[AME_DAS_SCSI_Prepare]     > Opcode (from CDB[0]): 0x2a : WRITE (10)
[AME_DAS_SCSI_Prepare]     > DataLength: 0x00010000
[AME_DAS_SCSI_Prepare]     > SGCount: 1
  • When the 128KB transfer fails, we see absolutely no logs from our core I/O handling function. The I/O request seemingly disappears before it reaches the core part of our DEXT that prepares the SCSI command.

Related Code Implementation:

We have implemented the UserGetDMASpecification method in our DEXT to report our hardware's DMA capability limits to the system. The current code, where we explicitly set maxTransferSize to 64KB, is shown below.

kern_return_t
IMPL (DRV_MAIN_CLASS_NAME, UserGetDMASpecification)
{
    // The maximum size of a single DMA transfer. 64KB is a safe value.
    *maxTransferSize = 64 * 1024;
    
    // Standard 4-byte alignment.
    *alignment = 4;
    
    // We explicitly tell macOS that our hardware can ONLY handle
    // 32-bit physical addresses.
    *numAddressBits = 32;
    *segmentType = kDMAOutputSegmentHost64;

    Log("Reporting DMA Constraints: maxTransferSize=%llu, alignment=%u, numAddressBits=%u",
        *maxTransferSize, *alignment, *numAddressBits);
    
    return kIOReturnSuccess;
}

This implementation appears to be consistent with the 64KB threshold behavior we are observing.

Our Questions:

  1. Based on our test results and the UserGetDMASpecification implementation, what is the expected behavior of the DriverKit framework when an I/O request larger than 64KB is sent to the rdisk node?
  2. Is the maxTransferSize reported by UserGetDMASpecification the reason why the I/O request is rejected before reaching our DEXT's core logic?
  3. What is the recommended "best practice" within the DriverKit framework for handling raw I/O requests that are larger than maxTransferSize? Should we implement the request splitting logic within the DEXT ourselves, or should I/O Kit automatically handle splitting the request for us as long as we correctly report our limits?

Thank you again!

Charles

Hi Kevin,

We conducted a test in our legacy KEXT environment, and the results indicate a behavioral difference between the KEXT and DriverKit frameworks in handling I/O requests.

Test Scenario (KEXT Environment):

We executed a 10MB raw I/O write command to an rdisk node:

sudo dd if=/dev/zero of=/dev/rdisk7 bs=10m count=1

Observations:

This command is a single 10MB write request. Our KEXT logs show that the driver did not receive one 10MB request, but instead received 20 sequential WRITE(10) commands.

Upon analysis, we confirmed that the size of each request was 512KB.

  • Evidence 1: DataLength

    • The DataLength for each request in the log is 0x00080000, which is 524,288 bytes (512KB).
  • Evidence 2: CDB Content

    • In the CDB of the first request, the transfer length is 0x0400 blocks (1024 blocks). 1024 blocks * 512 bytes/block = 512KB.
    • The LBAs of subsequent requests were sequential (LBA 0, LBA 1024, LBA 2048, ...).

Partial log for reference:

[AME_DAS_SCSI_Prepare] --- [KEXT FINAL DUMP] AME_SCSI_REQUEST_DAS packet for BufferID 46 ---
[AME_DAS_SCSI_Prepare]     Opcode (from CDB[0]): 0x2a (WRITE (10))
[AME_DAS_SCSI_Prepare]     CDB (hex): 2a 00 00 00 00 00 00 04 00 00
[AME_DAS_SCSI_Prepare]     DataLength: 0x00080000
...
[AME_DAS_SCSI_Prepare] --- [KEXT FINAL DUMP] AME_SCSI_REQUEST_DAS packet for BufferID 47 ---
[AME_DAS_SCSI_Prepare]     Opcode (from CDB[0]): 0x2a (WRITE (10))
[AME_DAS_SCSI_Prepare]     CDB (hex): 2a 00 00 00 04 00 00 04 00 00
[AME_DAS_SCSI_Prepare]     DataLength: 0x00080000
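
For reference, the LBA and transfer length can be read out of those CDBs with a short standalone C++ sketch. It relies on the standard WRITE(10) layout (big-endian LBA in bytes 2 to 5, big-endian transfer length in bytes 7 to 8); Write10Fields and DecodeWrite10 are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cstdint>

// Fields of interest from a 10-byte WRITE(10) CDB.
struct Write10Fields {
    uint32_t lba;         // CDB bytes 2..5, big-endian
    uint16_t blockCount;  // CDB bytes 7..8, big-endian
};

// Extract the logical block address and transfer length from a WRITE(10) CDB.
inline Write10Fields DecodeWrite10(const std::array<uint8_t, 10>& cdb) {
    Write10Fields fields;
    fields.lba = (static_cast<uint32_t>(cdb[2]) << 24) |
                 (static_cast<uint32_t>(cdb[3]) << 16) |
                 (static_cast<uint32_t>(cdb[4]) << 8) |
                 static_cast<uint32_t>(cdb[5]);
    fields.blockCount = static_cast<uint16_t>((cdb[7] << 8) | cdb[8]);
    return fields;
}
```

Applied to the two CDBs in the log, this gives LBA 0 and LBA 1024, each with a block count of 1024, which at 512 bytes per block is the 0x00080000 DataLength shown.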

Our Conclusion:

This test shows that in the KEXT environment, the I/O Kit storage stack splits a large raw I/O request that exceeds an internal limit into multiple smaller chunks (512KB in this case) before sending them to the driver.

This is different from the behavior we observed in our DEXT. In the DEXT, after we declared a maxTransferSize of 64KB via UserGetDMASpecification, any rdisk request larger than this limit is rejected by the framework, not split.

Our Updated Questions:

  1. Is this behavior in DriverKit an intentional design change?
  2. Is the request-splitting functionality of I/O Kit no longer a part of the DriverKit framework, with the expectation that DEXT developers should now implement the logic for splitting large raw I/O requests?

Thank you for your clarification.

Best, Charles

We conducted a test in our legacy KEXT environment, and the results indicate a behavioral difference between the KEXT and DriverKit frameworks in handling I/O requests.

So, the first thing to understand is that IOKit and DriverKit are NOT fundamentally different/separate technologies. The best way to understand DriverKit is that it's implemented as a very specialized user client built onto our existing IOKit infrastructure. Note that this dynamic is quite direct. For example, a DEXT's "IOKitPersonalities" dictionary doesn't just "look like" an IOKit match dictionary, it IS an IOKit matching dictionary. How DEXT loading/matching actually works is:

  1. The matching dictionary is added into the kernel's KEXT matching set just like any KEXT would be.

  2. When hardware is attached, that matching dictionary is used to match and load an in-kernel driver in EXACTLY the same way ANY other KEXT would be.

  3. Once the kernel driver finishes loading, the system then uses the DEXT keys inside that matching dictionary to create your DEXT process, which then launches and proceeds with its own initialization.

...at which point the DEXT then operates by controlling and manipulating its underlying IOKit support driver. Understanding this dynamic is critical because the basic assumption you're making here is that there is an "I/O Kit storage stack" and a "DriverKit storage stack":

This test shows that in the KEXT environment, the I/O Kit storage stack splits a large raw I/O request that exceeds an internal limit into multiple smaller chunks (512KB in this case) before sending them to the driver.

This is different from the behavior we observed in our DEXT. In the DEXT, after we declared a maxTransferSize of 64KB via UserGetDMASpecification, any rdisk request larger than this limit is rejected by the framework, not split.

...but that assumption is false. Both of these cases are actually using EXACTLY the same "I/O Kit storage stack". That then leads to here:

  1. Is the request-splitting functionality of I/O Kit no longer a part of the DriverKit framework, with the expectation that DEXT developers should now implement the logic for splitting large raw I/O requests?

I think you've misunderstood what's causing the difference you're seeing here. The difference here isn't caused by IOKit vs. DriverKit, it's caused by different property configurations causing different behavior. Putting that another way, I believe that if you configured your KEXT such that it exported exactly the same property configuration as your DEXT is, then you'd see exactly the same behavior as your DEXT.

I'm not sure what you'll find, but the next step here is to get an IORegistryExplorer snapshot of both cases (KEXT and DEXT) and then compare the property configuration of both cases. I believe you'll find that one or more of the keys you specified through UserReportHBAConstraints is different than you're expecting, which is then leading to the failure.

Related to that point, returning to here:

In the DEXT, after we declared a maxTransferSize of 64KB via UserGetDMASpecification, any rdisk request larger than this limit is rejected by the framework, not split.

maxTransferSize is the maximum size of the entire transfer you're willing to receive, not the limitation of an individual DMA request. In most cases, I'd expect it to be the same or larger than:

kIOMaximumSegmentCount<Read/Write>Key * kIOMaximumSegmentByteCount<Read/Write>Key

However, setting it to less than kIOMaximumSegmentByteCount<Read/Write>Key would probably create exactly the error you're seeing. The high level storage stack would segment that transfer based on kIOMaximumSegmentByteCount<Read/Write>Key but the low level storage stack would then fail the request for exceeding maxTransferSize.
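
That interaction can be pictured with a toy model in standalone C++. This is only a sketch of the reasoning above and not how the storage stack is actually implemented: RequestSucceeds is a hypothetical function, upperSplitLimit stands in for whatever limit the high level stack ends up splitting at, and the byte values used with it are illustrative.

```cpp
#include <algorithm>
#include <cstdint>

// Toy model, not Apple's implementation: the upper layer cuts a request into
// pieces of at most upperSplitLimit bytes, and the lower layer then rejects
// any piece that is larger than maxTransferSize.
bool RequestSucceeds(uint64_t requestBytes, uint64_t upperSplitLimit, uint64_t maxTransferSize) {
    if (upperSplitLimit == 0) {
        return false;  // no valid split size
    }
    // The largest piece the lower layer will ever see for this request.
    const uint64_t largestPiece = std::min(requestBytes, upperSplitLimit);
    return largestPiece <= maxTransferSize;
}
```

In this model, a 128KB request with an assumed 512KB upper limit and a 64KB maxTransferSize arrives unsplit and is rejected, while the same request with a 64KB upper limit is split into two pieces and succeeds.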

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

Thank you again for your guidance. We have an important update based on your advice.

Following your suggestion to "compare property configurations," we analyzed our KEXT's source code, as we could not find the relevant properties in the KEXT's IORegistryExplorer snapshot. We then attempted to replicate the KEXT's behavior in our DEXT, which has led us to a final, specific contradiction that we hope you can help us resolve.

Here is a summary of our findings and final questions:

1 - KEXT Source Code Contains setProperty Calls

We discovered that our KEXT's Start() function programmatically sets several IOBlockStorageCharacteristics using setProperty. The key value is:

// In KEXT's Start() function:
#define MAX_IO_LENGTH (512 * 1024) // 512KB
setProperty(kIOMaximumByteCountWriteKey, (UInt64)MAX_IO_LENGTH, 64);
// ... plus other similar properties.

2 - DEXT Fails Despite Replicating KEXT's Configuration

Based on this finding and your previous advice, we implemented the following in our DEXT:

  • We implemented SetupHBAConstraints() and call the UserReportHBAConstraints virtual function.
  • Inside it, we populated the dictionary with the exact same values found in our KEXT's source, including setting kIOMaximumByteCountWriteKey to 512KB.
  • We also kept our UserGetDMASpecification's maxTransferSize at 64KB, as this reflects our hardware's actual single DMA transaction limit.

However, a dd bs=128k command to the rdisk node still fails with an Input/output error.

3 - The Final Contradiction: KEXT's "Hidden" Splitting Behavior

This is the core of our confusion. When we test the KEXT with a large I/O (dd bs=10m), our driver logs show that IOKit is, in fact, splitting the request into 64KB chunks, not the 512KB size we specified in the KEXT's source code.

This leads to our final questions:

1 - Why does IOKit split I/O into 64KB chunks for our KEXT, even when kIOMaximumByteCountWriteKey is set to 512KB? Is there another, higher-priority property or mechanism (perhaps related to Protocol Characteristics) that is forcing this smaller split size?

2 - In the DEXT environment, this 64KB splitting behavior is not being triggered. Instead, it appears the upper layers of IOKit are respecting our 512KB hint (by not splitting a 128KB request), while the lower layers reject it for exceeding the 64KB maxTransferSize. How can we correctly replicate the KEXT's robust "64KB splitting" behavior in our DEXT?

We are now unable to explain the discrepancy between the KEXT's declared properties and its actual I/O splitting behavior. Any clarification you can provide would be immensely helpful.

Best regards, Charles

We are now unable to explain the discrepancy between the KEXT's declared properties and its actual I/O splitting behavior. Any clarification you can provide would be immensely helpful.

So, I believe this is all handled by IOBlockStorageDriver using a class called IOBreaker. You can find the start of this in IOBlockStorageDriver::breakUpRequest(), which then proceeds into IOBreaker. Take a look at the code and see how that correlates with what you're seeing.

Following your suggestion to "compare property configurations," we analyzed our KEXT's source code, as we could not find the relevant properties in the KEXT's IORegistryExplorer snapshot. We then attempted to replicate the KEXT's behavior in our DEXT, which has led us to a final, specific contradiction that we hope you can help us resolve.

If you want me to take a look at this, I'll need to see snapshots from both configurations. Please use IORegistryExplorer.app to save out the files, upload both files to a bug, then post the bug number back here.

*I hate trying to read ioreg text traces.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

Thank you for the detailed explanation and the pointer to IOBlockStorageDriver::breakUpRequest() and IOBreaker. That is incredibly helpful information.

As you requested, I have filed a Feedback report containing the .ioreg snapshots from both the KEXT and DEXT configurations.

A sysdiagnose log, generated by Feedback Assistant during the submission process, is also attached.

The Feedback ID is: FB20924370

We look forward to your analysis of the registry files.

Best regards,

Charles

We could not find the relevant properties in the KEXT's IORegistryExplorer snapshot.

Looking at the snapshots you sent, I've listed the configuration of both drivers. Note that the first section lists the properties of the direct driver itself, while the second is the actual IOSCSIPeripheralDeviceType00, the parent of IOBlockStorageServices. Here are the two configurations:

(1) KEXT configuration:

KEXT (subclass of IOSCSIParallelInterfaceController):

IOMaximumSegmentAddressableBitCount = 0x20
IOMaximumSegmentCountRead = 0x81
IOMaximumSegmentCountWrite = 0x81
IOMaximumByteCountRead = 0x80000
IOMaximumByteCountWrite = 0x80000
IOMinimumSegmentAlignmentByteCount = 0x4

IOSCSIPeripheralDeviceType00:
IOMaximumBlockCountRead = 0x400
IOMaximumBlockCountWrite = 0x400
IOMaximumByteCountWrite = 0x80000
IOMaximumByteCountRead = 0x80000

(2) DEXT Configuration

DEXT:
IOMaximumSegmentAddressableBitCount = 0x40
IOMaximumSegmentCountRead = 0x81
IOMaximumSegmentCountWrite = 0x81
IOMinimumSegmentAlignmentByteCount = 0

IOSCSIPeripheralDeviceType00:
IOMaximumBlockCountRead = 0xffff
IOMaximumBlockCountWrite = 0xffff

So, first off, IOMaximumBlockCount<Read/Write> is "0xffff" because that property can't be controlled by DriverKit. I don't know the EXACT reason for that, but I think it's because it acted as an unnecessarily complex "override" of the actual hardware configuration (see below).

Next, there are actually two issues:

  1. You've failed to define kIOMaximumSegmentByteCount* keys, as UserReportHBAConstraints basically "requires".

  2. maxTransferSize is too small for the configuration you’re actually creating.

Breaking that down in detail, above I said:

maxTransferSize is the maximum size of the entire transfer you're willing to receive, not the limitation of an individual DMA request. In most cases, I'd expect it to be the same or larger than:

kIOMaximumSegmentCount<Read/Write>Key * kIOMaximumSegmentByteCount<Read/Write>Key

That's because the expected flow here is the following:

  1. IOBreaker in IOBlockStorageDriver segments the transfer using those two factors.

  2. Your DEXT receives one call to UserProcessParallelTask which passes in a single large physical segment.

  3. Your DEXT breaks that physical segment into individual transfers that match whatever your hardware requires.

  4. Once you've finished processing the entire physical segment you complete the task.

That leads me to your comment here:

// The maximum size of a single DMA transfer. 64KB is a safe value.

That limit doesn't really make sense on any hardware that DriverKit supports. The DART and the MMU (see here for more background) basically allow arbitrarily large chunks of non-contiguous memory to be mapped into the PCI address space. To the extent there is a limit, it's much, much larger than kilobytes.

That leads back to the comment I made in #2 about receiving a single segment, which comes from this comment in the documentation:

"Next, it sets the metadata for this I/O operation. The kernel generates one long physical segment with fBufferIOVMAddr as its start address, as seen here in the call to a hypothetical..."

That behavior DOESN'T actually come from the SCSI layer, but actually comes from IODMACommand. I discuss this in more detail here, but the summary is that the fact IODMACommand.PrepareForDMA() returns a segment list doesn't mean that it actually "can" do that. Even in the kernel itself, it appears to only be possible when using some of our more "obscure" IOMemory subclasses, none of which are available through DriverKit.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

We have an update after completing the DEXT configuration as you advised. The final result is that any dd command with a block size larger than 512KB to the rdisk node still fails, just as before.

The tests and configurations that led to this result are detailed below.

Final Configuration (We have tried all approaches):

Based on the setProperty calls from our KEXT's source code and the properties from the .ioreg analysis, we have implemented the following four-part configuration in our DEXT:

  • HBA Level (UserReportHBAConstraints): We implemented this function to report all key properties from the KEXT's HBA layer (e.g., kIOMaximumByteCountWriteKey = 512KB).
  • LUN Level (UserCreateTargetForID): In AsyncCreateTargetForID, we built a personalities dictionary for each LUN and attempted to set IOMaximumBlockCountWriteKey = 0x400 through it.
  • Identity Declaration (UserInitializeController): In UserInitializeController(), we set the Protocol Characteristics dictionary via this->SetProperties.
  • Low-Level Sync (UserGetDMASpecification): We also set maxTransferSize to 512KB to ensure consistency with the HBA layer.

(We have confirmed that IORegistryExplorer shows the Protocol Characteristics were set successfully, but IOMaximumBlockCountWrite remains 0xffff.)

dd Test Results:

After applying all the configurations above, we ran the following tests on the /dev/rdisk node:

  • dd bs=64k count=1 -> Succeeded
  • dd bs=128k count=1 -> Succeeded
  • dd bs=512k count=1 -> Succeeded
  • dd bs=768k count=1 -> Failed (Input/output error)

Our Analysis:

The success of bs=512k indicates that our property configuration at the HBA level (UserReportHBAConstraints + UserGetDMASpecification) is effective. IOKit allows a single request up to 512KB to pass through.

The failure of bs=768k, however, shows that IOKit's IOBreaker splitting functionality has not been successfully activated. If it had been, we should have at least seen the first 512KB sub-request in our DEXT's log, but in fact, we saw nothing.

This seems to indicate that merely declaring size limits and protocol characteristics is insufficient to get IOBreaker to work for an IOUserSCSIParallelInterfaceController type DEXT.

Our Question:

You previously described a two-layer splitting model:

  1. IOBreaker ... segments the transfer...
  2. Your DEXT receives ... a single large physical segment.
  3. Your DEXT breaks that physical segment into individual transfers that match whatever your hardware requires.

Combining all our experimental evidence (especially our inability to influence IOMaximumBlockCountWrite from the DEXT), we believe the reason our KEXT succeeded is that its parent class implemented step 3, whereas our DEXT has not.

We would like to confirm with you:

Our plan for the next step is to manually write a "second-layer I/O splitter" from scratch within our UserProcessParallelTask_Impl (for example, to split a received 512KB request into eight 64KB hardware commands).

Is this the expected and correct solution for handling such hardware limits in DriverKit?

Thank you very much!

Best Regards,

Charles

Accepted Answer

Based on the setProperty calls from our KEXT's source code and the properties from the .ioreg analysis, we have implemented the following four-part configuration in our DEXT:

I'm confused. Above I told you that the problem was:

  1. You've failed to define kIOMaximumSegmentByteCount* keys, as UserReportHBAConstraints basically "requires".

Have you defined that key?

More specifically, UserReportHBAConstraints() has a list of required keys:

Key                                        Required

kIOMaximumSegmentCountReadKey              Yes
kIOMaximumSegmentCountWriteKey             Yes
kIOMaximumSegmentByteCountReadKey          Yes
kIOMaximumSegmentByteCountWriteKey         Yes
kIOMinimumSegmentAlignmentByteCountKey     Yes
kIOMaximumSegmentAddressableBitCountKey    Yes
kIOMinimumHBADataAlignmentMaskKey          Yes

Your DEXT has not defined all of them and, as they are required, you should not expect your DEXT to function properly until you've defined all of them.
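As a rough illustration of the "all seven keys are required" rule, here is a minimal sketch of a completeness check. It is plain C++ rather than DriverKit code: ConstraintMap, RequiredHBAConstraintKeys, and MissingHBAConstraintKeys are hypothetical names, and the key constants are represented by their names as strings purely for readability.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the dictionary a DEXT fills in
// UserReportHBAConstraints(); keyed by constant name for illustration.
using ConstraintMap = std::map<std::string, uint64_t>;

// The seven keys listed as required in the table above.
inline const std::vector<std::string> &RequiredHBAConstraintKeys() {
    static const std::vector<std::string> keys = {
        "kIOMaximumSegmentCountReadKey",
        "kIOMaximumSegmentCountWriteKey",
        "kIOMaximumSegmentByteCountReadKey",
        "kIOMaximumSegmentByteCountWriteKey",
        "kIOMinimumSegmentAlignmentByteCountKey",
        "kIOMaximumSegmentAddressableBitCountKey",
        "kIOMinimumHBADataAlignmentMaskKey",
    };
    return keys;
}

// Returns the required keys that are absent from `provided`,
// in the same order as the table.
inline std::vector<std::string> MissingHBAConstraintKeys(const ConstraintMap &provided) {
    std::vector<std::string> missing;
    for (const auto &key : RequiredHBAConstraintKeys()) {
        if (provided.find(key) == provided.end()) {
            missing.push_back(key);
        }
    }
    return missing;
}
```

A check like this could run in a debug build before the constraints are reported, so that an incomplete configuration is caught at startup rather than surfacing later as an I/O error.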

Similarly:

Low-Level Sync (UserGetDMASpecification): We also set maxTransferSize to 512KB to ensure consistency with the HBA layer.

I haven't gotten explicit confirmation from the engineering team, but at this point I'm fairly convinced that maxTransferSize MUST satisfy either:

maxTransferSize >= kIOMaximumSegmentCountReadKey * kIOMaximumSegmentByteCountReadKey
OR
maxTransferSize >= kIOMaximumSegmentCountWriteKey * kIOMaximumSegmentByteCountWriteKey

...whichever of the two is larger. maxTransferSize basically defines "the largest possible transfer your controller could EVER handle", which would obviously be your segment count * segment size. Critically, using a smaller maxTransferSize won't cause an immediate failure, only a later failure if/when the kernel actually tries to "give you" a large enough transfer.
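The bound above is simple arithmetic. A minimal sketch, in plain C++ rather than DriverKit code: the HBASegmentLimits struct and MinimumMaxTransferSize function are hypothetical names, and the fields stand for the values reported under the four segment count and segment byte count keys.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical container for the segment limits a DEXT reports in
// UserReportHBAConstraints(); the field names are illustrative only.
struct HBASegmentLimits {
    uint64_t maxSegmentCountRead;
    uint64_t maxSegmentCountWrite;
    uint64_t maxSegmentByteCountRead;
    uint64_t maxSegmentByteCountWrite;
};

// Smallest maxTransferSize consistent with the rule described above:
// the larger of (segment count * segment byte count) for reads and writes.
inline uint64_t MinimumMaxTransferSize(const HBASegmentLimits &limits) {
    const uint64_t readBytes =
        limits.maxSegmentCountRead * limits.maxSegmentByteCountRead;
    const uint64_t writeBytes =
        limits.maxSegmentCountWrite * limits.maxSegmentByteCountWrite;
    return std::max(readBytes, writeBytes);
}
```

For example, 256 read segments of 512 KB each and 64 write segments of 1 MB each give a read bound of 128 MB and a write bound of 64 MB, so maxTransferSize would need to be at least 128 MB.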

I've actually just filed a bug on this (r.164177660) asking that IOUserSCSIParallelInterfaceController fail completely if its configuration is in any way incomplete.

(We have confirmed that IORegistryExplorer shows the Protocol Characteristics were set successfully, but IOMaximumBlockCountWrite remains 0xffff.)

Yes. Your DEXT cannot set IOMaximumBlockCountWrite, which means it stays at its default value of 0xffff.

The failure of bs=768k, however, shows that IOKit's IOBreaker splitting functionality has not been successfully activated. If it had been, we should have at least seen the first 512KB sub-request in our DEXT's log, but in fact, we saw nothing.

No, that's NOT what it shows. Your DEXT is going to fail ANY transfer larger than 512KB because that's what YOU defined maxTransferSize as:

The failure of bs=768k, however, shows that IOKit's IOBreaker splitting functionality has not been successfully activated.

You're right, it's not activating. That's because you're currently telling the kernel that you can handle a transfer up to:

IOMaximumBlockCountRead = 0xffff -> ~32 MB (0xffff blocks x 512 bytes)

...so any transfer smaller than ~32 MB will be passed directly down the I/O stack. However, IOUserSCSIParallelInterfaceController is going to fail any transfer larger than maxTransferSize.
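To make the interaction concrete, here is a deliberately simplified model of the two limits in the misconfigured state. It is an assumption-laden sketch, not how the kernel is implemented: Outcome and Classify are hypothetical names, the 512-byte block size comes from the diskutil output earlier in the thread, and "SplitUpstream" only stands for "something above the controller would have to break the request up".

```cpp
#include <cstdint>

// Possible fates of a request in this simplified model.
enum class Outcome { Delivered, RejectedByController, SplitUpstream };

constexpr uint64_t kBlockSizeBytes = 512;          // device block size
constexpr uint64_t kDefaultMaxBlockCount = 0xffff; // IOMaximumBlockCount default

// Model: requests up to (block count limit * block size) pass straight down
// the stack; the controller then rejects anything above maxTransferSize.
inline Outcome Classify(uint64_t requestBytes, uint64_t maxTransferSize) {
    const uint64_t passThroughLimit = kDefaultMaxBlockCount * kBlockSizeBytes;
    if (requestBytes > passThroughLimit) {
        return Outcome::SplitUpstream;
    }
    if (requestBytes > maxTransferSize) {
        return Outcome::RejectedByController;
    }
    return Outcome::Delivered;
}
```

With maxTransferSize set to 512 KB, this model delivers a 512 KB request and rejects a 768 KB one, which matches the dd results reported above.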

Again, the way to fix this is to:

  1. Define the two kIOMaximumSegmentCount keys, so that IOBreaker will divide the requests into something "reasonable".

  2. Set maxTransferSize to a large enough value that it can handle the configuration you're creating in #1.

Our plan for the next step is to manually write a "second-layer I/O splitter" from scratch within our UserProcessParallelTask_Impl (for example, to split a received 512KB request into eight 64KB hardware commands).

That depends on what you mean by "splitter". The expected implementation here is that you'll take the value fBufferIOVMAddr and use basic math to divide it up into smaller chunks, each of which will be a scatter gather entry you pass over to your card.
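The "basic math" can be sketched as follows. This is plain C++ for illustration, not DriverKit code: SGEntry and BuildSGList are hypothetical names, the entry layout is an assumption (real hardware defines its own descriptor format), and baseAddr stands in for the fBufferIOVMAddr value of the task.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical scatter-gather entry; a real controller defines its own layout.
struct SGEntry {
    uint64_t address; // bus address of this chunk
    uint32_t length;  // chunk length in bytes
};

// Divide one contiguous range [baseAddr, baseAddr + totalBytes) into chunks
// of at most maxChunkBytes. The last chunk may be shorter than the rest.
inline std::vector<SGEntry> BuildSGList(uint64_t baseAddr,
                                        uint64_t totalBytes,
                                        uint32_t maxChunkBytes) {
    std::vector<SGEntry> list;
    if (maxChunkBytes == 0) {
        return list; // nothing sensible to do with a zero chunk size
    }
    uint64_t offset = 0;
    while (offset < totalBytes) {
        const uint64_t remaining = totalBytes - offset;
        const uint32_t length = static_cast<uint32_t>(
            std::min<uint64_t>(maxChunkBytes, remaining));
        list.push_back(SGEntry{baseAddr + offset, length});
        offset += length;
    }
    return list;
}
```

For the case mentioned in the question, a 512 KB request with a 64 KB chunk size yields eight entries, each starting 64 KB after the previous one.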

Historically, I believe this was done by subdividing the IOMemoryDescriptor and generating individual IODMACommands; however, the nature of the DART means that this is somewhat silly and unnecessary, so it's done with a single IODMACommand over the entire descriptor.

Finally, making sure this is as clear as possible, this does mean that maxTransferSize is likely to be much, potentially MUCH, larger than you'd "expect" in the older architecture. I don't have the data at hand to validate the exact number, but I believe there is a Fibre Channel DEXT which is setting maxTransferSize to ~1 GB.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

Final update: It works! We have successfully resolved the issues related to I/O splitting and size.

The solution was to:

  • Correctly and completely implement all 7 required keys in UserReportHBAConstraints, based on the KEXT's configuration.
  • Set maxTransferSize in UserGetDMASpecification to a sufficiently large value (>= SegmentCount * SegmentByteCount).

Once we implemented these two changes, everything fell into place. Our DEXT can now handle large I/O requests of any size (we've successfully tested up to 1GB in a single transfer). dd bs=2m now works as expected, and diskutil can also successfully partition and format the disk. We observed that the IOMaximumBlockCount property on IOSCSIPeripheralDeviceType00 still shows as 0xffff, but as you anticipated, this value does not seem to affect the behavior of IOBreaker.

Our tests confirm that our hardware can handle the resulting 512KB segments directly.

We could not have solved this problem without your persistent and incredibly precise guidance. Thank you for taking the time to analyze our ioreg files and for explaining the complex behaviors of the IOKit storage stack and DriverKit.

Thank you again!

Charles

Final update: It works! We have successfully resolved the issues related to I/O splitting and size.

Fabulous!

Once we implemented these two changes, everything fell into place. Our DEXT can now handle large I/O requests of any size (we've successfully tested up to 1GB in a single transfer).

Great! FYI, I believe there is a hard-coded limit of ~2 GB on what the kernel will "wire". However, that limit has very broad effects on the entire system (for example, I believe IOBufferMemoryDescriptor allocation will start failing at the same threshold). If you want to experiment, you can try pushing all your values to 3 GB, then issuing a single transfer of that size. I suspect the system will just break the transfer up before it reaches you (so you'll never get a 3 GB transfer), but it's possible something more "interesting" will happen.

We observed that the IOMaximumBlockCount property on IOSCSIPeripheralDeviceType00 still shows as 0xffff, but as you anticipated, this value does not seem to affect the behavior of IOBreaker.

Yes. I'm not sure of the full history, but I suspect one of the engineers working on SCSIControllerDriverKit recognized that the property configuration had become a nest of confused and contradictory values, so they just picked the two values that were most "sensible" and only let your DEXT set those. IOMaximumBlockCount was then left in place for legacy reasons.

We could not have solved this problem without your persistent and incredibly precise guidance. Thank you for taking the time to analyze our ioreg files and for explaining the complex behaviors of the IOKit storage stack and DriverKit.

You're very welcome.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
