DEXT (IOUserSCSIParallelInterfaceController): Direct I/O Succeeds, but Buffered I/O Fails with Data Corruption on Large File Copies

Hi all,

We are migrating a SCSI HBA driver from KEXT to DriverKit (DEXT), with our DEXT inheriting from IOUserSCSIParallelInterfaceController. We've encountered a data corruption issue that is reliably reproducible under specific conditions and are hoping for some assistance from the community.

Hardware and Driver Configuration:

  • Controller: LSI 3108
  • DEXT Configuration: We are reporting our hardware limitations to the framework via the UserReportHBAConstraints function, with the following key settings:
    // UserReportHBAConstraints...
    addConstraint(kIOMaximumSegmentAddressableBitCountKey, 0x20); // 32-bit
    addConstraint(kIOMaximumSegmentCountWriteKey, 129);
    addConstraint(kIOMaximumByteCountWriteKey, 0x80000); // 512KB
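
For reference, a hedged sketch of how we populate that dictionary in code (the helper and function names below are placeholders, not our literal source; the keys and values mirror the settings above):

    // Hedged sketch (placeholder names, not our literal source): filling in the
    // dictionary that our UserReportHBAConstraints override receives. The
    // kIOMaximumSegment* key macros come from the IOKit/SCSIControllerDriverKit
    // headers; values match the settings listed above.
    #include <DriverKit/OSDictionary.h>
    #include <DriverKit/OSNumber.h>

    static void SetUInt64(OSDictionary *dict, const char *key, uint64_t value)
    {
        OSNumber *number = OSNumber::withNumber(value, 64);
        if (number != nullptr) {
            dict->setObject(key, number);
            number->release();
        }
    }

    static void FillHBAConstraints(OSDictionary *constraints)
    {
        SetUInt64(constraints, kIOMaximumSegmentAddressableBitCountKey, 0x20);    // 32-bit
        SetUInt64(constraints, kIOMaximumSegmentCountWriteKey,          129);
        SetUInt64(constraints, kIOMaximumByteCountWriteKey,             0x80000); // 512KB
        // ...plus the matching Read keys and the rest of the required key set.
    }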
    

Observed Behavior: Direct I/O vs. Buffered I/O

We've observed that the I/O behavior differs drastically depending on whether it goes through the system file cache:

1. Direct I/O (Bypassing System Cache) -> 100% Successful

When we use fio with the direct=1 flag, our read/write and data verification tests pass perfectly for all file sizes, including 20GB+.

2. Buffered I/O (Using System Cache) -> 100% Failure at >128MB

Whether we use the standard cp command or fio with the direct=1 option removed to simulate buffered I/O, we observe the exact same, clear failure threshold:

  • Test Results:

    • File sizes ≤ 128MB: Success. Data checksums match perfectly.
    • File sizes ≥ 256MB: Failure. Checksums do not match, and the destination file is corrupted.
  • Evidence of failure reproduced with fio (buffered_integrity_test.fio, with direct=1 removed):

    • fio --size=128M buffered_integrity_test.fio -> Test Succeeded (err=0).
    • fio --size=256M buffered_integrity_test.fio -> Test Failed (err=92), reporting the following error, which proves a data mismatch during the verification phase:
      verify: bad header ... at file ... offset 1048576, length 1048576
      fio: ... error=Illegal byte sequence
      

Our Analysis and Hypothesis

The phenomenon of "Direct I/O succeeding while Buffered I/O fails" suggests the problem may be related to the cache synchronization mechanism at the end of the I/O process:

  1. Our UserProcessParallelTask_Impl function correctly handles READ and WRITE commands.
  2. When cp or fio (buffered) runs, the WRITE commands are successfully written to the LSI 3108 controller's onboard DRAM cache, and success is reported up the stack.
  3. At the end of the operation, to ensure data is flushed to disk, the macOS file system issues an fsync, which is ultimately translated into a SYNCHRONIZE CACHE SCSI command (Opcode 0x35 or 0x91) and sent to our UserProcessParallelTask_Impl.
  4. We hypothesize that our code may not be correctly identifying or handling this SYNCHRONIZE CACHE opcode (see the opcode-check sketch after this list). It might be reporting "success" up the stack without actually commanding the hardware to flush its cache to the physical disk.
  5. The OS receives this "success" status and assumes the operation is safely complete.
  6. In reality, however, the last batch of data remains only in the controller's volatile DRAM cache and is eventually lost.
  7. This results in an incomplete or incorrect file tail, and while the file size may be correct, the data checksum will inevitably fail.
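
For reference, the opcode check we intend to audit looks conceptually like the sketch below (the helper name is ours, and how the CDB bytes are obtained from the parallel task is omitted; only the 0x35/0x91 opcodes come from the SCSI spec):

    // Hedged sketch: classify a CDB as SYNCHRONIZE CACHE (10) or (16).
    #include <stdint.h>

    static bool IsSynchronizeCache(const uint8_t *cdb)
    {
        const uint8_t kSyncCache10 = 0x35; // SYNCHRONIZE CACHE (10)
        const uint8_t kSyncCache16 = 0x91; // SYNCHRONIZE CACHE (16)
        return (cdb[0] == kSyncCache10) || (cdb[0] == kSyncCache16);
    }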

Summary

Our DEXT driver performs correctly when handling Direct I/O but consistently fails with data corruption when handling Buffered I/O for files larger than 128MB. We can reliably reproduce this issue using fio with the direct=1 option removed.

The root cause is very likely improper handling of the SYNCHRONIZE CACHE command within our UserProcessParallelTask implementation.

P.S. This issue did not exist in the original KEXT version of the driver.

We would appreciate any advice or guidance on this issue.

Thank you.

We've observed that the I/O behavior differs drastically depending on whether it goes through the system file cache:

Quick question: how are you validating what the actual issue is? More specifically, are you unmounting the device, pulling it, and testing it with a known-good driver? Or are you testing with your development DEXT? That's crucial, because testing through your DEXT means you don't know whether this is a write or a read issue.

That leads to here:

At the end of the operation, to ensure data is flushed to disk, the macOS file system issues an fsync, which is ultimately translated into a SYNCHRONIZE CACHE SCSI command (Opcode 0x35 or 0x91) and sent to our UserProcessParallelTask_Impl.

Are you sure about that? How have you validated that? I haven't tried to validate the entire I/O path, but I’m fairly sure that copyfile() (what cp calls) does not call fsync().

FYI, the history here is somewhat complicated and "ugly", but in general, if the system were specifically trying to flush data, it would probably have used F_FULLFSYNC. However, the cost of that is fairly high, so it isn't something the system routinely does.

That leads me to here:

verify: bad header ... at file ... offset 1048576, length 1048576

That's an oddly specific offset, as it's exactly 1 MB. I don't see how that would track with your current theory.

In terms of next steps, I think I'd start by making sure you know EXACTLY what happened. On the testing side, my suggestion would be to do something like this:

  1. Zero out the entire drive.

  2. Partition the drive into "small-ish" volumes (so you minimize how much of the device the system will target).

  3. Format that partition with the simplest possible file system. I'd start with FAT or possibly unencrypted, non-journaled HFS+. I definitely would NOT use APFS. Too much complexity/noise.

  4. Reproduce the issue in the most controlled way possible.

  5. Read the drive back with a trustworthy controller so you can see EXACTLY what happened.

On the driver side, I'd focus on trying to figure out exactly what's being sent to the device. I'm not sure the kernel log will handle it, but I'd be tempted to build your own user client and use that to log out EVERY command that goes in/out of your DEXT. You won't be able to see the data itself, but you don't need to; the offsets should be "enough" to follow what's going on.

Related to that point, I'd focus your attention on unmount, NOT the copy itself. Unmount is the only point the system ACTUALLY promises to flush "everything", so that's the point you "know" the data had to reach your driver.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

TL;DR (Core Issues)

  1. 100% Reproducible: We used two completely independent test methods (cp + sync and fio buffered mode), and both consistently reproduced the data corruption issue under Buffered I/O.
  2. Clear Failure Threshold: Both tests point to the exact same failure boundary: file sizes 128MB (inclusive) and below pass, but 256MB (inclusive) and above consistently fail.
  3. Key Clue Confirmed: The fio test again reported the exact same error as our initial finding: data corruption starts at a very precise location, offset 1048576 (1MB). You mentioned last time that this offset was "odd"; we can now confirm this is not a coincidence, but a core characteristic of the problem.

Test 1: System-Level File Copy Integrity Test (cp + sync)

This test aims to simulate standard user operations to rule out tool-specific issues.

  • Methodology: We used a shell script to automate cp for file copying, used sync to force the system to flush I/O caches, and finally compared the SHA256 hashes of the source and destination files.
  • Results:
    • 128MB File: Success. Hashes match perfectly.
    • 256MB File: Failure. Hash mismatch, confirming data corruption during the copy process.
  • Conclusion: This rules out a tool-specific problem and points to a defect in how our DEXT handles standard buffered writes through the system I/O stack.

Test 2: fio Buffered I/O Integrity Test

This test leverages fio's robust verification capabilities to obtain lower-level error details.

Settings & Expected Behavior

  • Settings (buffered_integrity_test.fio):

    • We removed direct=1 to force fio to use the OS file cache, consistent with the behavior of cp.
    • We set do_verify=1 and verify=sha256 to require data integrity checks after I/O operations.
  • fio Mechanism: The verification process consists of two independent phases (it does not verify-as-it-writes):

    1. Write Phase: fio writes the entire 256MB data (each 1MB block carries a unique header for verification) to the file.
    2. Verify Phase: After writing concludes, fio reads back the entire 256MB starting from the beginning (offset 0), checking the headers block by block.

Test Results

  • The results match the cp test exactly: 128MB passes, 256MB fails.
  • During the 256MB failure, we received the following error message:
    [*] Testing file size: 256M
    ...
    verify: bad header ... at file ... offset 1048576, length 1048576
    fio: ... error=Illegal byte sequence
    ...
    [!!!] TEST FAILED at file size: 256M
    

Error Message Analysis

The fio error report indicates the following:

  1. verify: bad header: This means the data block read back by fio during the "Verify Phase" has a corrupted header.

  2. at file ... offset 1048576: This tells us:

    • fio successfully verified the first 1MB (offset 0 to 1048575).
    • The error occurred when fio attempted to verify the second 1MB data block (starting at offset 1,048,576).
    • This indicates that our driver or controller starts failing when handling I/O requests that cross this 1MB boundary.

Synthesis & Next Steps

We suspect there is a bug in our DEXT driver when handling Buffered I/O writes. The trigger conditions are:

  • The I/O must be Buffered I/O.
  • The total I/O for a single file must exceed a threshold (> 128MB).
  • When these conditions are met, the driver or hardware fails on write requests that cross the 1MB boundary, corrupting all subsequent data.

This is likely related to the processing flow of the SYNCHRONIZE CACHE command, and how I/O requests are segmented or addressed in memory.

Our Next Plan: Following your suggestion, we will implement detailed logging within the DEXT. Our goal is to capture all SCSI commands and their parameters (specifically LBA and transfer length) entering and exiting the driver during the failed 256MB fio test. We will focus on observing exactly what happens when I/O requests touch and cross this critical 1MB location.
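
As a concrete starting point for that logging, here is a hedged sketch of decoding a 10-byte READ/WRITE CDB (opcodes 0x28/0x2A: big-endian LBA in bytes 2-5, block count in bytes 7-8); the function name is ours and the os_log call is just one way to emit it:

    // Hedged sketch: decode and log the LBA / transfer length of a 10-byte CDB.
    #include <stdint.h>
    #include <os/log.h>

    static void LogTenByteCDB(const uint8_t *cdb)
    {
        const uint32_t lba    = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
                                ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
        const uint32_t blocks = ((uint32_t)cdb[7] << 8)  |  (uint32_t)cdb[8];
        os_log(OS_LOG_DEFAULT, "[DEXT_CHECK] op=0x%02x LBA=%u blocks=%u",
               (unsigned int)cdb[0], (unsigned int)lba, (unsigned int)blocks);
    }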

Best Regards,

Charles


(Attachments include the test scripts and full logs for both cp and fio.)

The error occurred when fio attempted to verify the second 1MB data block (starting at offset 1,048,576).

So what IS there? And what should have been? These tests use hashes for verification because that's the quickest way to validate the data, but the question I'm interested in here is what your driver actually "put" on disk.

Part of my concern here is that unless you artificially cut power to the device, we shouldn't have needed any explicit cache sync as part of the copy. The system should have flushed all of its buffers to your DEXT as part of unmount, and you should have flushed them to disk shortly after that. If this is really about cache flushing, that's the failure case I'd really be concerned* about here, not the individual copy.

*Failure to flush a file loses file data. Failure to flush file system blocks loses volumes.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

The data corruption issue has been resolved!

Special thanks for asking the key question: "So what IS there?" This prompted us to shift our focus from high-level hash verification to inspecting the raw bytes written to the disk. This investigation revealed that the root cause was not related to cache flushing, but rather a hardware limitation on single transfer lengths.

Here is a summary of our findings and the solution:

By using a Python script to verify the data on disk byte-by-byte, we discovered:

  • When macOS coalesced writes into 2MB chunks (we had previously set maxTransferSize to 128MB), data corruption consistently began exactly at offset 1MB + 16KB within the command (manifesting as 0x00 or garbage data).
  • The LSI 3108 controller/firmware apparently cannot correctly handle a single SCSI command with a data length exceeding 1MB in the DEXT environment.

We implemented a two-layer fix:

  • In UserGetDMASpecification, we explicitly set maxTransferSize to 1MB (1,048,576 bytes). This forces macOS to split large I/O requests into smaller chunks that the hardware can safely digest.
  • To align with SCSI best practices and ensure maximum stability, we implemented logic within the driver to further split these 1MB buffers into multiple 64KB SGL segments when populating hardware descriptors (sketched below).
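
For reference, the splitting half of the fix looks conceptually like this (the descriptor struct and function name are ours; only the 64KB-per-segment cap reflects what we actually enforce):

    // Hedged sketch: carve one contiguous DMA range into <=64KB pieces before
    // writing the hardware SGL descriptors. Struct/function names are placeholders.
    #include <stdint.h>

    struct HwSGLEntry {
        uint64_t physicalAddress;
        uint32_t length;
    };

    static uint32_t SplitInto64KBSegments(uint64_t physAddr, uint64_t totalLen,
                                          HwSGLEntry *out, uint32_t maxEntries)
    {
        const uint32_t kMaxChunk = 64 * 1024; // 64KB per hardware SGL segment
        uint32_t count = 0;
        while (totalLen > 0 && count < maxEntries) {
            const uint32_t chunk = (totalLen > kMaxChunk) ? kMaxChunk
                                                          : (uint32_t)totalLen;
            out[count].physicalAddress = physAddr;
            out[count].length          = chunk;
            physAddr += chunk;
            totalLen -= chunk;
            ++count;
        }
        return count; // number of descriptors produced
    }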

With these fixes applied, all previously failing test scenarios (including cp + sync and fio, with file sizes ranging from 256MB up to 5GB) now PASS 100%. The checksums match perfectly.

Thanks again for your guidance — it pointed us directly to the core issue!

Best Regards,

Charles

Special thanks for asking the key question: "So what IS there?" This prompted us to shift our focus from high-level hash verification to inspecting the raw bytes written to the disk. This investigation revealed that the root cause was not related to cache flushing, but rather a hardware limitation on single transfer lengths.

Good, I'm glad that was helpful. One of the things I've learned in DTS is that it's very easy for a bug investigation to be derailed by jumping straight to "what went wrong" without really having looked closely at "what actually happened". We're so used to our code "doing what we think it does" that we skip right past the possibility that it simply ISN'T doing what we think it is.

With these fixes applied, all previously failing test scenarios (including cp + sync and fio, with file sizes ranging from 256MB up to 5GB) now PASS 100%. The checksums match perfectly.

Fabulous and congratulations!

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

This is a follow-up on your previous analysis of the KEXT configuration in this post, specifically where you pointed out that IOMaximumByteCount is actually 0x80000 (512KB).

We reviewed the runtime logs of the legacy KEXT, and the results align perfectly with your observation. Even when copying a 2GB file, the KEXT actually splits the request into 512KB (0x80000) chunks per transfer.

[ProcessParallelTask] [KEXT_CHECK] WRITE (0x2a) | T:0 L:0 | Size: 524288 bytes (512 KB)

This confirms that our current 1MB limit in the DEXT already exceeds anything the legacy driver used per transfer. Since the current setting has passed all data integrity verifications, we consider the transfer size issue resolved and will not pursue larger transfer sizes at this stage.

As per your recommendation, we are currently implementing the UserProcessBundledParallelTasks architecture to strictly follow DriverKit best practices and optimize performance (it is currently causing the DEXT to crash, so we are in the middle of debugging it).

Thanks again for your precise insights!

Best Regards,

Charles

Hi Kevin,

Apologies, but it seems our previous celebration was premature. I must retract the conclusion from my last post regarding the issue being resolved.

Here is the current situation:

While our fixes regarding maxTransferSize (1MB) and SGL splitting have indeed achieved a 100% pass rate for CLI tools (cp and fio) across all file sizes, we have discovered that Finder still fails with Error -36 during file copies.

Based on our latest findings, I would appreciate your insights on the following:

  • CLI (cp, fio): Read/Write operations are 100% successful, even with large files (e.g., 4GB).
  • Finder:
    • Copying "Pre-existing files" (Cold Data): Success for files < 64MB.
    • Copying "Freshly created files" (Dirty Data/Hot Cache): Immediate failure (Error -36) for files > 1MB.

We suspected that "freshly created files" were highly fragmented in memory, causing the DEXT to construct a Scatter/Gather List (SGL) that exceeded our hardware limit (129 Segments).

To verify this, we implemented "Trap Logs" inside the DEXT to monitor the SGCount. The logs proved this hypothesis wrong. Finder is not sending requests that exceed the limit.
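
For reference, the trap amounts to something like this sketch (the names are ours, and obtaining the byte/segment counts from the task is omitted):

    // Hedged sketch of the "trap log": record each request's byte count and
    // scatter/gather segment count, and flag anything over our reported limit.
    #include <stdint.h>
    #include <os/log.h>

    static void TrapLogRequest(uint64_t byteCount, uint32_t sgCount)
    {
        const uint32_t kReportedSegmentLimit = 129;
        os_log(OS_LOG_DEFAULT, "[TRAP] bytes=%llu SGCount=%u",
               (unsigned long long)byteCount, (unsigned int)sgCount);
        if (sgCount > kReportedSegmentLimit) {
            os_log(OS_LOG_DEFAULT, "[TRAP] SGCount %u exceeds reported limit %u",
                   (unsigned int)sgCount, (unsigned int)kReportedSegmentLimit);
        }
    }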

At the exact moment Finder reports the error, the DEXT is receiving a series of extremely small and clean requests, not fragmented large I/O:

  • Request sizes are mostly 4KB (0x1000) or 16KB (0x4000).
  • The SGCount for each request is only 0 or 1.
  • This is far below our hardware limit of 129.

This leaves us with a confusing contradiction:

  • CLI (fio/cp): Sends larger chunks (e.g., 512KB), SGCount = 1 (likely due to contiguous allocation by fio). Result: Success.
  • Finder: Sends tiny chunks (4KB/16KB), SGCount = 1. Result: Failure (Error -36).

To rule out configuration variables, we have set the DEXT's UserReportHBAConstraints to match our stable Legacy KEXT exactly:

  • IOMaximumByteCount = 512 KB (0x80000)
  • IOMaximumSegmentCount = 129

Since the logs prove that Finder's requests fully comply with our reported constraints, why would DriverKit or macOS report a failure?

Does Finder's I/O behavior trigger specific SCSI command sequences or timing issues that CLI tools do not?

We observed that Finder issues multiple TEST UNIT READY and SYNCHRONIZE CACHE commands immediately before the write failure.

Does this imply that some error handling mechanism is being triggered prematurely?

Thank you again for your time and assistance.

Best Regards,

Charles

Does Finder's I/O behavior trigger specific SCSI command sequences or timing issues that CLI tools do not?

Sort of. The difference here isn't caused by any fundamental difference between "the Finder" and "CLI tools" - at the level you're interacting with, they're both just "processes doing reads and writes". Indeed, ironically, I believe we moved copying out of process a few releases ago, so as far as most of the system can tell, the actual copying is occurring from very similar components.

What is different is that they're almost certain to have different I/O patterns. On the CLI side, I'd expect cp is using copyfile(), as are many of our higher-level APIs (notably, I believe NSFileManager has converted to it). On the other hand, the Finder uses its own copy engine, both because it's older and because the details of its behavior are different from most APIs (for example, it preflights copies and uses VNOP_COPYFILE for smb copies).

The big thing I'd keep in mind here is that the reason you tend to see "common" I/O patterns across different tools/apps is that those components happen to be using the same APIs, NOT because that pattern is fundamentally "better" or required. This can make it very easy to trick yourself into a false sense of security, as testing with "a bunch of apps" can make it look like you've done some kind of broad validation when all you've really done is prove that copyfile() works exactly the same in a bunch of different processes.

A few suggestions:

  • When you find a particular I/O pattern that fails, try to reproduce that I/O pattern in a dedicated test tool you directly "control". That will let you build up a test suite that validates that what's "supposed" to work continues to work going forward.

  • When you find any particular issue, make sure to test the I/O patterns "around" whatever you find, not JUST the specific failure.

  • In terms of real-world testing, I'd focus on intentionally diversifying the set of tools you’re testing with, not just testing "lots of apps". For example, "cp", "ditto", and "asr" all "copy stuff", but they were written at completely different times for very different reasons. As a special case there, "asr" can directly replicate/update snapshots between volumes, which I'd expect to generate I/O patterns that are very different than basically "any" other copy process (see the man pages for more details).

  • In terms of API-level testing, I would test with "copyfile()", NSFileManager, and FSCopyObject(). Note that while FSCopyObject is long deprecated, it's actually the API that is most likely to behave like the Finder.

This revealed the root cause: UserReportHBAConstraints appears to have an internal filtering mechanism that ignores "optional" keys not listed in the documentation.

That's true, but slightly misleading. The documentation for UserReportHBAConstraints() includes the list of keys you're required to set, and those are the only keys you should be manipulating.

Since the standard API filters these keys, we switched to using SetProperties() to bypass the filter and directly inject these limits into the IORegistry.

I can't really argue with success, but I don't think you should need to set IOMaximumByteCountWrite. Are you sure you're setting all of the required keys, particularly "kIOMaximumSegmentByteCountWriteKey"? IORegistry should actually be ignoring IOMaximumByteCountWrite if the full set of required keys is present.

If you've confirmed that all of the required keys are set and correct, then please file a bug on this that includes an IORegistryExplorer snapshot of your full config, then post that bug number back here. I don't see any issue with you setting IOMaximumByteCountWrite yourself, but you shouldn't have to and, theoretically*, it might not work in the future.

*Meaning, "this isn't a key we specifically intended DEXT to need/use", NOT "we have an active concern/issue/problem with this key and are actively going to do something about it".

Part of what the SCSIControllerDriverKit framework is trying to "clean up" is standardizing a specific set of keys that control how I/O is broken up, instead of the odd mashup of overlapping keys we currently use/support.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

I have filed the bug report as requested. The Feedback ID is FB21256805. I've included the IORegistry dumps showing the missing keys when using the standard API versus the successful injection using the SetProperties workaround.

To answer your specific question:

"Are you sure you're setting all of the required key set, particularly 'kIOMaximumSegmentByteCountWriteKey'?"

Yes. We confirmed that kIOMaximumSegmentByteCountWriteKey was correctly set to 65536 (64 KB), and kIOMaximumSegmentCountWriteKey was set to 129.

Thanks again for your help in narrowing this down.

Best Regards,

Charles

Yes. We confirmed that kIOMaximumSegmentByteCountWriteKey was correctly set to 65536 (64 KB), and kIOMaximumSegmentCountWriteKey was set to 129.

Did you? Because I didn't find "IOMaximumSegmentByteCountWrite" or "IOMaximumSegmentCountWrite" in either of the files you uploaded. More to the point, the class reference for UserReportHBAConstraints lists these seven keys as "required":

kIOMaximumSegmentCountReadKey
kIOMaximumSegmentCountWriteKey
kIOMaximumSegmentByteCountReadKey
kIOMaximumSegmentByteCountWriteKey
kIOMinimumSegmentAlignmentByteCountKey
kIOMaximumSegmentAddressableBitCountKey
kIOMinimumHBADataAlignmentMaskKey

Your DEXT should be defining all of them and, as far as I can tell, it isn't.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

Just a quick update: I have uploaded the source code (SetupHBAConstraints_Workaround.cpp) to the bug report (FB21256805).

This file confirms that we are indeed defining all the required keys (including kIOMaximumSegmentByteCount) before calling UserReportHBAConstraints, and it also demonstrates the SetProperties workaround we implemented to bypass the issue.

Best regards,

Charles

This file confirms that we are indeed defining all the required keys (including kIOMaximumSegmentByteCount) before calling UserReportHBAConstraints, and it also demonstrates the SetProperties workaround we implemented to bypass the issue.

Ahh, I see what’s going on. I believe UserReportHBAConstraints actually sets those values at the controller level (not the peripheral), and your ioreg snippet didn’t include the controller. Can you upload a full IORegistryExplorer snapshot just so I can confirm everything is correct?

Aside from that, having read through the bug again, I think setting kIOMaximumByteCountRead/WriteKey on the peripheral is probably your best option. I want to double-check that with the engineering lead (he’s out sick today) just in case I’ve missed something, but I don’t think that will change anything.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

I have generated a full recursive IORegistry dump starting from our DriverKit controller class and attached it to FB21256805 (Filename: IORegistry_Snapshot_Fail.txt).

Best regards,

Charles
