vnop_strategy unexpectedly zero-extends files

After implementing vnop_mmap, vnop_strategy, and other related VNOPs as suggested in https://developer.apple.com/forums/thread/756358, my vnop_strategy routine ends up zero-extending files.

I don't understand why my filesystem behaves as described above.

Perusing the source code of both the relevant parts of Darwin/XNU and SMBClient did not clarify things for me.

A nudge in the right direction would be greatly appreciated.

The technical details of the issue are given in the attached plain-text file, because the forum flagged some of the text as sensitive; I'm not sure exactly which part it was.

vnop_strategy unexpectedly zero-extends files

After implementing vnop_mmap, vnop_strategy, and other related VNOPs as suggested in https://developer.apple.com/forums/thread/756358, my vnop_strategy routine ends up zero-extending files.

The end result is shown below.

Original file:
% hexdump -Cs 0x012085a0 progit.pdf
012085a0  30 30 20 6e 20 0a 74 72  61 69 6c 65 72 0a 3c 3c  |00 n .trailer.<<|
012085b0  20 2f 53 69 7a 65 20 33  35 30 39 0a 2f 52 6f 6f  | /Size 3509./Roo|
012085c0  74 20 32 20 30 20 52 0a  2f 49 6e 66 6f 20 31 20  |t 2 0 R./Info 1 |
012085d0  30 20 52 0a 3e 3e 0a 73  74 61 72 74 78 72 65 66  |0 R.>>.startxref|
012085e0  0a 31 38 38 33 38 33 39  30 0a 25 25 45 4f 46 0a  |.18838390.%%EOF.|
012085f0

New file:
% hexdump -Cs 0x012085a0 /Volumes/myfs/progit_cp.pdf 
012085a0  30 30 20 6e 20 0a 74 72  61 69 6c 65 72 0a 3c 3c  |00 n .trailer.<<|
012085b0  20 2f 53 69 7a 65 20 33  35 30 39 0a 2f 52 6f 6f  | /Size 3509./Roo|
012085c0  74 20 32 20 30 20 52 0a  2f 49 6e 66 6f 20 31 20  |t 2 0 R./Info 1 |
012085d0  30 20 52 0a 3e 3e 0a 73  74 61 72 74 78 72 65 66  |0 R.>>.startxref|
012085e0  0a 31 38 38 33 38 33 39  30 0a 25 25 45 4f 46 0a  |.18838390.%%EOF.|
012085f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0120c000

The size of the original file is 0x012085f0 or 18908656 in decimal.

The size of the new file is 0x0120c000 or 18923520 in decimal.

The difference between the two file sizes is 14864 bytes, all of which are filled with zeros.
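Notably, the new size lands exactly on a 16 KiB page boundary: 18908656 rounded up to the next multiple of 16384 is 1155 × 16384 = 18923520, i.e. 0x0120c000.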

The blocks of code in my vnop_write that set IO_HEADZEROFILL and IO_TAILZEROFILL are never entered.


The last cluster_write call is passed a uio whose offset is 18874368 and resid is 34288:

2024-07-24 11:34:05 vnops.c:497 myfs_vnop_write: uio_offset(uio): 18874368 uio_resid(uio): 34288
2024-07-24 11:34:05 vnops.c:498 myfs_vnop_write: -> cluster_write(vp, uio, originalEOF: 18874368, newEOF: 18908656, zero_head_off: 0, zero_tail_off: 18908656, lflag: 0x100001)
2024-07-24 11:34:05 vnops.c:500 myfs_vnop_write: <- cluster_write -> 0
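
For context, my vnop_write boils down to the following. This is a simplified sketch in which myfs_node and n_size are placeholders for my per-vnode state and the exact flag handling is elided (the logged lflag was 0x100001); the cluster_write prototype is the KPI one from <sys/ubc.h>:

#include <sys/param.h>
#include <sys/vnode.h>
#include <sys/ubc.h>
#include <sys/uio.h>

struct myfs_node {                  /* placeholder per-vnode state */
    off_t n_size;                   /* logical EOF */
};

static int
myfs_vnop_write(struct vnop_write_args *ap)
{
    vnode_t vp  = ap->a_vp;
    uio_t   uio = ap->a_uio;
    struct myfs_node *np = vnode_fsnode(vp);

    off_t originalEOF = np->n_size;                   /* 18874368 in the last call */
    off_t newEOF = uio_offset(uio) + uio_resid(uio);  /* 18908656 */
    if (newEOF < originalEOF)
        newEOF = originalEOF;

    /* The branches that would set IO_HEADZEROFILL/IO_TAILZEROFILL
     * (e.g. when uio_offset(uio) > originalEOF) are never taken here. */
    int error = cluster_write(vp, uio, originalEOF, newEOF,
                              0 /* zero_head_off */, newEOF /* zero_tail_off */,
                              ap->a_ioflag | IO_UNIT);
    if (error == 0 && newEOF > np->n_size)
        np->n_size = newEOF;        /* extend the logical EOF on success */
    return error;
}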

The last vnop_strategy call, however, is handed a buf_count of 49152 bytes, which is then used to create a new uio that is passed to the myfs_write routine:

2024-07-24 11:34:05 vnops.c:670 myfs_vnop_strategy: uio_offset(uio): 18874368 uio_resid(uio): 49152
2024-07-24 11:34:05 vnops.c:671 myfs_vnop_strategy: -> myfs_write 
2024-07-24 11:34:05 vnops.c:673 myfs_vnop_strategy: <- myfs_write -> 49152


The difference between 49152 and 34288, i.e. 14864 bytes, is exactly what ends up being zero-extended (18874368 + 49152 = 18923520, the size of the new file).
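
In essence, my vnop_strategy was building the uio like this (a simplified fragment with error handling elided; myfs_write and MYFS_BLOCK_SIZE are placeholders for my driver's internals):

buf_t   bp = ap->a_bp;
vnode_t vp = buf_vnode(bp);
caddr_t data;
int     error = buf_map(bp, &data);

off_t offset = (off_t)buf_blkno(bp) * MYFS_BLOCK_SIZE;      /* 18874368 */
uio_t uio = uio_create(1, offset, UIO_SYSSPACE, UIO_WRITE);

/* BUG: buf_count(bp) is the page-rounded size of the buffer (49152 here),
 * not the amount of logical file data it contains (34288). */
uio_addiov(uio, (user_addr_t)(uintptr_t)data, buf_count(bp));

error = myfs_write(vp, uio);                                /* writes 49152 bytes */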


dtrace(1)'ing the operation revealed the following sequence of calls.

About halfway through its operation, cluster_write calls my vnop_strategy:

  4  -> cluster_write                         proc: cp -> cluster_write(vp: progit_cp.pdf, uio, originalEOF: 8388608, newEOF: 8912896, zero_head_off: 0, zero_tail_off: 8912896, lflag: 0x100001)

  4    -> myfs_vnop_strategy                proc: cp file: progit_cp.pdf b_bufsize: 0 b_bcount: 65536 b_resid: 0

              kernel.release.t8101`0xfffffe0014bfb610+0xcec
              kernel.release.t8101`0xfffffe0014bff2f0+0x300
              kernel.release.t8101`0xfffffe0014c00bd0+0x54c
              kernel.release.t8101`0xfffffe0014bfd898+0x724
              kernel.release.t8101`cluster_write_ext+0x2a4
              kernel.release.t8101`cluster_write+0x1c
              myfs`myfs_vnop_write+0xb18
              kernel.release.t8101`VNOP_WRITE+0x84

It then picks up where it left off:

  4  -> cluster_write                         proc: cp -> cluster_write(vp: progit_cp.pdf, uio, originalEOF: 8912896, newEOF: 9437184, zero_head_off: 0, zero_tail_off: 9437184, lflag: 0x100001)

And eventually makes the final cluster_write call:

  4    -> cluster_write                       proc: cp -> cluster_write(vp: progit_cp.pdf, uio, originalEOF: 18874368, newEOF: 18908656, zero_head_off: 0, zero_tail_off: 18908656, lflag: 0x100001)


This is followed by another series of vnop_strategy calls:

  4      -> myfs_vnop_strategy              proc: cp file: progit_cp.pdf b_bufsize: 0 b_bcount: 65536 b_resid: 0

              kernel.release.t8101`0xfffffe0014bfb610+0xcec
              kernel.release.t8101`0xfffffe0014bff2f0+0x300
              kernel.release.t8101`0xfffffe0014c00bd0+0x54c
              kernel.release.t8101`0xfffffe0014bfd898+0x724
              kernel.release.t8101`cluster_write_ext+0x2a4
              kernel.release.t8101`cluster_write+0x1c
              myfs`myfs_vnop_write+0xb18
              kernel.release.t8101`VNOP_WRITE+0x84


dtrace(1)'ing the same operation on smbfs shows a different sequence of calls.

cluster_write runs from start to finish:

  4  -> cluster_write                         proc: cp -> cluster_write(vp: progit_cp.pdf, uio, originalEOF: 0, newEOF: 524288, zero_head_off: 0, zero_tail_off: 524288, lflag: 0x100001)
...
  4  -> cluster_write                         proc: cp -> cluster_write(vp: progit_cp.pdf, uio, originalEOF: 18874368, newEOF: 18908656, zero_head_off: 0, zero_tail_off: 18908656, lflag: 0x100001)

              smbfs`smbfs_vnop_write+0x354

With vnop_strategy invoked through the cluster_push path when the file is closed:

  2  -> smbfs_vnop_strategy                   proc: cp file: progit_cp.pdf b_bufsize: 0 b_bcount: 8388608 b_resid: 0

              kernel.release.t8101`0xfffffe0026cdb610+0xcec
              kernel.release.t8101`0xfffffe0026cdf2f0+0x300
              kernel.release.t8101`cluster_push_err+0x2e0
              smbfs`smbfs_vnop_close+0x204


I don't understand why my filesystem behaves as described above.

Perusing the source code of both the relevant parts of Darwin/XNU and SMBClient did not clarify things for me.

A nudge in the right direction would be greatly appreciated.

Accepted Answer

Looking at your specific example:

49152 / 16384 (16k) = 3

34288 / 16384 (16k) = ~2.09

Have you looked closely at exactly what inputs you received and what you then wrote back out? The simplest explanation here would be that you wrote back out the "full" pages you received from the kernel (which would have been zero filled) instead of only writing out the logical data of the file.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Passing the smaller of buf_count and the difference between the file size and the current offset into the file, rather than the raw value returned by buf_count, to the myfs_write routine fixes the problem.
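
For anyone who hits the same thing, the corrected routine now looks roughly like this; myfs_node, n_size, myfs_write, and MYFS_BLOCK_SIZE are placeholders for my driver's internals, and only the write side is shown:

static int
myfs_vnop_strategy(struct vnop_strategy_args *ap)
{
    buf_t   bp = ap->a_bp;
    vnode_t vp = buf_vnode(bp);
    struct myfs_node *np = vnode_fsnode(vp);
    caddr_t data;
    int     error;

    /* (Read handling omitted; the real routine dispatches on
     * buf_flags(bp) & B_READ.) */

    if ((error = buf_map(bp, &data)) != 0) {
        buf_seterror(bp, error);
        buf_biodone(bp);
        return error;
    }

    off_t offset = (off_t)buf_blkno(bp) * MYFS_BLOCK_SIZE;

    /* The fix: clamp the transfer to the logical EOF so the zero-filled
     * tail of the last page(s) is never written out. Here this turns
     * 49152 into 18908656 - 18874368 = 34288 bytes. */
    int64_t xfer = MIN((int64_t)buf_count(bp), np->n_size - offset);

    uio_t uio = uio_create(1, offset, UIO_SYSSPACE, UIO_WRITE);
    uio_addiov(uio, (user_addr_t)(uintptr_t)data, (user_size_t)xfer);

    error = myfs_write(vp, uio);    /* placeholder backend write */

    uio_free(uio);
    buf_unmap(bp);
    buf_setresid(bp, error ? buf_count(bp) : 0);
    if (error)
        buf_seterror(bp, error);
    buf_biodone(bp);
    return error;
}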

Thanks very much Kevin Elliott for the tip.

Happy to help!

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
