DEXT (IOUserSCSIParallelInterfaceController): Direct I/O Succeeds, but Buffered I/O Fails with Data Corruption on Large File Copies

Hi all,

We are migrating a SCSI HBA driver from KEXT to DriverKit (DEXT), with our DEXT inheriting from IOUserSCSIParallelInterfaceController. We've encountered a data corruption issue that is reliably reproducible under specific conditions and are hoping for some assistance from the community.

Hardware and Driver Configuration:

  • Controller: LSI 3108
  • DEXT Configuration: We are reporting our hardware limitations to the framework via the UserReportHBAConstraints function, with the following key settings:
    // Simplified excerpt from our UserReportHBAConstraints override.
    // addConstraint() is our own convenience wrapper, not a framework API.
    addConstraint(kIOMaximumSegmentAddressableBitCountKey, 0x20); // 32-bit segment addressing
    addConstraint(kIOMaximumSegmentCountWriteKey, 129);           // max 129 segments per write
    addConstraint(kIOMaximumByteCountWriteKey, 0x80000);          // 512 KiB max transfer per write
    

Observed Behavior: Direct I/O vs. Buffered I/O

We've observed that the I/O behavior differs drastically depending on whether it goes through the system file cache:

1. Direct I/O (Bypassing System Cache) -> 100% Successful

When we use fio with the direct=1 flag, our read/write and data verification tests pass perfectly for all file sizes, including 20GB+.

2. Buffered I/O (Using System Cache) -> 100% Failure at >128MB

Whether we use the standard cp command or fio with the direct=1 option removed (i.e., genuinely buffered I/O, not a simulation of it), we observe the exact same clear failure threshold:

  • Test Results:

    • File sizes ≤ 128MB: Success. Data checksums match perfectly.
    • File sizes ≥ 256MB: Failure. Checksums do not match, and the destination file is corrupted.
  • Evidence of failure reproduced with fio (buffered_integrity_test.fio, with direct=1 removed):

    • fio --size=128M buffered_integrity_test.fio -> Test Succeeded (err=0).
    • fio --size=256M buffered_integrity_test.fio -> Test Failed (err=92), reporting the following error, which proves a data mismatch during the verification phase:
      verify: bad header ... at file ... offset 1048576, length 1048576
      fio: ... error=Illegal byte sequence
      

Our Analysis and Hypothesis

The phenomenon of "Direct I/O succeeding while Buffered I/O fails" suggests the problem may be related to the cache synchronization mechanism at the end of the I/O process:

  1. Our UserProcessParallelTask_Impl function correctly handles READ and WRITE commands.
  2. When cp or fio (buffered) runs, the WRITE commands are successfully written to the LSI 3108 controller's onboard DRAM cache, and success is reported up the stack.
  3. At the end of the operation, to ensure data is flushed to disk, the macOS file system issues an fsync, which is ultimately translated into a SYNCHRONIZE CACHE SCSI command (Opcode 0x35 or 0x91) and sent to our UserProcessParallelTask_Impl.
  4. We hypothesize that our code may not be correctly identifying or handling this SYNCHRONIZE CACHE opcode. It might be reporting "success" up the stack without actually commanding the hardware to flush its cache to the physical disk.
  5. The OS receives this "success" status and assumes the operation is safely complete.
  6. In reality, however, the last batch of data remains only in the controller's volatile DRAM cache and is eventually lost.
  7. This results in an incomplete or incorrect file tail, and while the file size may be correct, the data checksum will inevitably fail.
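
If step 4 above is right, the first fix is simply recognizing the opcode before completing the task, so a cache-flush request is forwarded to the controller rather than completed as a no-op. A minimal classification check (the helper name is ours, not a DriverKit API; the SYNCHRONIZE CACHE opcodes are per the SCSI Block Commands spec):

```cpp
#include <cstdint>

// SCSI opcodes for SYNCHRONIZE CACHE (10) and SYNCHRONIZE CACHE (16).
constexpr uint8_t kSynchronizeCache10 = 0x35;
constexpr uint8_t kSynchronizeCache16 = 0x91;

// Sketch: classify a CDB before dispatching it, so SYNCHRONIZE CACHE is
// never silently completed without commanding the hardware to flush.
// Helper name is our own, not part of the framework.
bool IsSynchronizeCacheCDB(const uint8_t *cdb)
{
    return cdb[0] == kSynchronizeCache10 || cdb[0] == kSynchronizeCache16;
}
```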

Summary

Our DEXT driver performs correctly when handling Direct I/O but consistently fails with data corruption when handling Buffered I/O for files larger than 128MB. We can reliably reproduce this issue using fio with the direct=1 option removed.

The root cause is very likely the improper handling of the SYNCHRONIZE CACHE command within our UserProcessParallelTask. Notably, this issue did not exist in the original KEXT version of the driver.

We would appreciate any advice or guidance on this issue.

Thank you.

We've observed that the I/O behavior differs drastically depending on whether it goes through the system file cache:

Quick question: how are you validating what the actual issue is? More specifically, are you unmounting the device, pulling it, and testing with a known-good driver? Or are you testing with your development DEXT? That's crucial, because testing through your DEXT means you don't know whether this is a write or a read issue.

That leads to here:

At the end of the operation, to ensure data is flushed to disk, the macOS file system issues an fsync, which is ultimately translated into a SYNCHRONIZE CACHE SCSI command (Opcode 0x35 or 0x91) and sent to our UserProcessParallelTask_Impl.

Are you sure about that? How have you validated that? I haven't tried to validate the entire I/O path, but I’m fairly sure that copyfile() (what cp calls) does not call fsync().

FYI, the history here is somewhat complicated and "ugly," but in general, if the system were specifically trying to flush data, it would probably have called F_FULLFSYNC. However, the cost of that is fairly high, so it isn't something the system routinely does.

That leads me to here:

verify: bad header ... at file ... offset 1048576, length 1048576

That's an oddly specific offset, as it's exactly 1 MB. I don't see how that would track with your current theory.

In terms of next steps, I think I'd start by making sure you know EXACTLY what happened. On the testing side, my suggestion would be to do something like this:

  1. Zero out the entire drive.

  2. Partition the drive into "small-ish" volumes (so you minimize how much of the device the system will target).

  3. Format that partition with the simplest possible file system. I'd start with FAT or possibly unencrypted, non-journaled HFS+. I definitely would NOT use APFS. Too much complexity/noise.

  4. Reproduce the issue in the most controlled way possible.

  5. Read the drive back with a trustworthy controller so you can see EXACTLY what happened.

On the driver side, I'd focus on trying to figure out exactly what's being sent to the device. I'm not sure the kernel log will handle the volume, so I'd be tempted to build your own user client and use that to log out EVERY command that goes in/out of your DEXT. You won't be able to see the data itself, but you don't need to see the data, as the offsets should be "enough" to follow what's going on.
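
To be concrete about what "log every command" means, something as simple as a hex dump of each CDB is usually enough to reconstruct the sequence later. A placeholder sketch (names are mine, not an API):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Placeholder sketch: render a CDB as a hex string so a user client can
// log every command the DEXT sees. One line per command is enough to
// reconstruct the opcode/offset sequence after a failed run.
std::string FormatCDB(const uint8_t *cdb, size_t len)
{
    std::string out;
    char buf[4];
    for (size_t i = 0; i < len; i++) {
        std::snprintf(buf, sizeof(buf), "%02X ", cdb[i]);
        out += buf;
    }
    if (!out.empty()) {
        out.pop_back(); // drop the trailing space
    }
    return out;
}
```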

Related to that point, I'd focus your attention on unmount, NOT the copy itself. Unmount is the only point the system ACTUALLY promises to flush "everything", so that's the point you "know" the data had to reach your driver.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

TL;DR (Core Issues)

  1. 100% Reproducible: We used two completely independent test methods (cp + sync and fio buffered mode), and both consistently reproduced the data corruption issue under Buffered I/O.
  2. Clear Failure Threshold: Both tests point to the exact same failure boundary: file sizes of 128MB and below pass, while 256MB and above consistently fail.
  3. Key Clue Confirmed: The fio test again reported the exact same error as our initial finding: data corruption starts at the very precise offset of 1048576 bytes (1MB). You noted last time that this offset was oddly specific; we can now confirm it is not a coincidence, but a core characteristic of the problem.

Test 1: System-Level File Copy Integrity Test (cp + sync)

This test aims to simulate standard user operations to rule out tool-specific issues.

  • Methodology: We used a shell script to automate cp for file copying, used sync to force the system to flush I/O caches, and finally compared the SHA256 hashes of the source and destination files.
  • Results:
    • 128MB File: Success. Hashes match perfectly.
    • 256MB File: Failure. Hash mismatch, confirming data corruption during the copy process.
  • Conclusion: This shows that the issue lies within the system I/O stack, and that our DEXT has a defect when handling standard buffered writes.
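
For reference, the core of the script looks like this. This is a self-contained sketch: paths default to temp locations so it runs anywhere, whereas the real run points DEST_DIR at the volume our DEXT serves, and SIZE_MB at 128 or 256:

```shell
#!/bin/sh
# Sketch of the cp + sync integrity test described above. In the real
# test DEST_DIR is the volume served by our DEXT; here it defaults to a
# temp directory so the script is self-contained.
set -e
DEST_DIR=${1:-$(mktemp -d)}
SIZE_MB=${2:-4}
SRC=$(mktemp)

# Generate a random source file of SIZE_MB megabytes.
dd if=/dev/urandom of="$SRC" bs=1048576 count="$SIZE_MB" 2>/dev/null

cp "$SRC" "$DEST_DIR/copy.bin"
sync   # force the system to flush buffered I/O

# Portable SHA-256: prefer sha256sum, fall back to shasum (macOS).
hash_of() {
    if command -v sha256sum >/dev/null 2>&1; then sha256sum "$1"
    else shasum -a 256 "$1"
    fi | awk '{print $1}'
}

src_hash=$(hash_of "$SRC")
dst_hash=$(hash_of "$DEST_DIR/copy.bin")

if [ "$src_hash" = "$dst_hash" ]; then echo "PASS"; else echo "FAIL"; fi
```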

Test 2: fio Buffered I/O Integrity Test

This test leverages fio's robust verification capabilities to obtain lower-level error details.

Settings & Expected Behavior

  • Settings (buffered_integrity_test.fio):

    • We removed direct=1 to force fio to use the OS file cache, consistent with the behavior of cp.
    • We set do_verify=1 and verify=sha256 to require data integrity checks after I/O operations.
  • fio Mechanism: The verification process consists of two independent phases (it does not verify-as-it-writes):

    1. Write Phase: fio writes the entire 256MB data (each 1MB block carries a unique header for verification) to the file.
    2. Verify Phase: After writing concludes, fio reads back the entire 256MB starting from the beginning (offset 0), checking the headers block by block.
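
For completeness, a job file consistent with the settings described above would look roughly like this (the filename, block size, and section name are our assumptions; size is supplied on the command line):

```ini
; buffered_integrity_test.fio -- sketch; details beyond the options
; named above are assumptions on our part.
; note: direct=1 is deliberately omitted, so I/O goes through the cache.
[global]
rw=write
bs=1M
do_verify=1
verify=sha256

[integrity]
filename=/Volumes/TestVol/fio_testfile
; size is passed on the command line, e.g. --size=256M
```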

Test Results

  • The results match the cp test exactly: 128MB passes, 256MB fails.
  • During the 256MB failure, we received the following error message:
    [*] Testing file size: 256M
    ...
    verify: bad header ... at file ... offset 1048576, length 1048576
    fio: ... error=Illegal byte sequence
    ...
    [!!!] TEST FAILED at file size: 256M
    

Error Message Analysis

The fio error report indicates the following:

  1. verify: bad header: This means the data block read back by fio during the "Verify Phase" has a corrupted header.

  2. at file ... offset 1048576: This tells us:

    • fio successfully verified the first 1MB (offset 0 to 1048575).
    • The error occurred when fio attempted to verify the second 1MB data block (starting at offset 1,048,576).
    • This indicates that our driver or controller starts failing when handling I/O requests that cross this 1MB file-offset boundary.

Synthesis & Next Steps

We suspect there is a bug in our DEXT driver when handling Buffered I/O writes. The trigger conditions are:

  • The I/O must be Buffered I/O.
  • The total I/O for a single file must exceed a threshold (> 128MB).
  • When these conditions are met, the driver or hardware errs when processing write requests that cross the 1MB byte boundary, causing corruption for all subsequent data.

This is likely related to the processing flow of the SYNCHRONIZE CACHE command, and how I/O requests are segmented or addressed in memory.
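
On the segmentation point: with the 512 KiB kIOMaximumByteCountWrite limit we report, a 1 MiB stream is exactly two maximum-size transfers, so offset 1048576 is the first byte after the second full-size write. A toy sketch of that split (our names and structure, not the framework's):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Toy illustration of how a buffered stream would be carved into
// transfers given the kIOMaximumByteCountWrite = 0x80000 (512 KiB)
// constraint we report. Names are ours, not DriverKit's.
constexpr uint64_t kMaxWriteBytes = 0x80000;

std::vector<std::pair<uint64_t, uint64_t>>  // (offset, length) pairs
SplitTransfer(uint64_t offset, uint64_t length)
{
    std::vector<std::pair<uint64_t, uint64_t>> chunks;
    while (length > 0) {
        uint64_t n = length < kMaxWriteBytes ? length : kMaxWriteBytes;
        chunks.emplace_back(offset, n);
        offset += n;
        length -= n;
    }
    return chunks;
}
```

Splitting a 1 MiB transfer starting at offset 0 yields exactly two 512 KiB chunks, the second beginning at offset 524288; the reported corruption offset (1048576) is where a third maximum-size transfer would begin.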

Our Next Plan: Following your suggestion, we will implement detailed logging within the DEXT. Our goal is to capture all SCSI commands and their parameters (specifically LBA and transfer length) entering and exiting the driver during the failed 256MB fio test. We will focus on observing exactly what happens when I/O requests touch and cross this critical 1MB location.
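
To make that log useful, we plan to decode at least the LBA and transfer length from each write CDB. For WRITE(10) (opcode 0x2A), the LBA is bytes 2-5 (big-endian) and the transfer length in blocks is bytes 7-8; a sketch of the decode (helper name and return type are ours):

```cpp
#include <cstdint>
#include <utility>

// Sketch of the per-command decode we plan to log for WRITE(10) CDBs:
// returns (LBA, transfer length in blocks). Field positions follow the
// standard WRITE(10) CDB layout.
std::pair<uint32_t, uint16_t> DecodeWrite10(const uint8_t *cdb)
{
    uint32_t lba = (uint32_t(cdb[2]) << 24) | (uint32_t(cdb[3]) << 16) |
                   (uint32_t(cdb[4]) << 8)  |  uint32_t(cdb[5]);
    uint16_t blocks = uint16_t((cdb[7] << 8) | cdb[8]);
    return {lba, blocks};
}
```

With 512-byte blocks, LBA 2048 corresponds to byte offset 1048576, which is exactly the kind of correlation we hope to spot in the captured log.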

Best Regards,

Charles


(Attachments include the test scripts and full logs for both cp and fio.)

The error occurred when fio attempted to verify the second 1MB data block (starting at offset 1,048,576).

So what IS there? And what should have been? These tests use hashes for verification because that's the quickest way to validate the data, but the question I'm interested in here is what your driver actually "put" on disk.

Part of my concern here is that unless you artificially cut power to the device, we shouldn't have needed any explicit cache sync as part of the copy. The system should have flushed all of its buffers to your DEXT as part of unmount, and you should have flushed them to disk shortly after that. If this is really about cache flushing, that's the failure case I'd really be concerned* about here, not the individual copy.

*Failure to flush a file loses file data. Failure to flush file system blocks loses volumes.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
