Ditto cannot extract ZIP file into filesystem-compressed files

It's quite common for app bundles to be distributed in .zip files, and to be stored on-disk as filesystem-compressed files. However, having them both appears to be an edge case that's broken for at least two major releases! (FB19048357, FB19329524)

I'd expect a simple ditto -x -k appbundle.zip ~/Applications (-x: extract, -k: work on a zip file) to work. Instead, it spits out countless errors and leaves 0-byte files in its wake 😭

Please fix.

Answered by DTS Engineer in 859331022

It's quite common for app bundles to be distributed in .zip files, and to be stored on-disk as filesystem-compressed files. However, having them both appears to be an edge case that's broken for at least two major releases! (FB19048357, FB19329524)

Let me start with a more basic question, namely, what are you actually trying to do here? The file system's support for compressed files is a fairly obscure implementation detail that we never formally documented and never intended for widespread use. Note, for example, that its implementation is incompatible with custom icons*. Similarly, support for compressed files in the broader ecosystem is inconsistent, and copying them incorrectly will result in a non-functional file. That isn't an issue for the role the system intended them for**, but it is a problem in broader usage.

*Compressed files "repurposed" the resource fork, while custom file icons use essentially the same resource fork-based architecture that they used on Classic Mac OS.

**Basically, further reducing the size of very small read-only data files.

Moving to the specific command here:

I'd expect a simple ditto -x -k appbundle.zip ~/Applications (-x: extract, -k: work on a zip file) to work.

To be honest, it worked better than I was expecting. What I expected was that it would write out the resource fork data while dropping the attribute that marks a file as compressed, leaving you with a compressed but broken file. It appears that what it's actually doing is writing the uncompressed data to the data fork, then erroring out while trying to deal with the compressed data. I'm not entirely sure how that's playing out, but I suspect the original zip archive saved the files in their uncompressed state while also preserving the file attribute that marked them as "compressed". Note that the archive extracts just fine with Archive Utility, but that produces uncompressed copies.

The summary here is that if you choose to use this file system feature, you may also need to work around parts of the system that don't handle it properly. File-level compression isn't a full part of the system and never has been.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

EDIT: ditto -x -k --hfsCompression appbundle.zip ~/Applications

what are you actually trying to do here?

Compressing bloated text files and binaries in userland .app bundles, for sure. Space savings are usually ~50%.

never intended for widespread use

Really? I would count having all system .app compressed on everyone's Mac as "widespread use".

Moreover, everyone loves macOS's compressed .dmg! It's a shame the app bundle I dragged out of it takes more than 2x the space than the distribution .dmg.[1]

Transparent decompression for static data is also very welcome in a world of "structured plain text as file format".

you may also need to work around parts of the system that don't handle it properly

Well, it sort of works out in the current state - I just need to extract the .zip somewhere, and then ditto --hfsCompression /tmp/Some.app ~/Applications to get a compressed version of it. Other than a waste of Total Bytes Written, it's fine.

[1]: .dmg files are easy to deal with: just point ditto at the mounted volume, like ditto --hfsCompression /Volumes/SomeApp/SomeApp.app ~/Applications, and everything usually works out. However, I cannot mount a .zip file, so I need to rely on ditto itself.
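
For anyone wanting to script the extract-then-copy workaround described above, here is a minimal sketch. It assumes the archive contains the app bundle at its top level. The function name install_compressed and its two arguments are made up for illustration; only ditto's -x, -k and --hfsCompression options come from this thread. Since ditto merges the contents of a source directory into the destination directory, the sketch names each item explicitly in the destination path.

```shell
#!/bin/sh
# Sketch of the two-step workaround: a plain extraction into a scratch
# directory, followed by a compressing copy into the final location.
# install_compressed is a hypothetical helper name, not part of ditto.
install_compressed() {
    zip_path=$1    # e.g. appbundle.zip
    dest_dir=$2    # e.g. "$HOME/Applications"

    # Scratch directory that receives the uncompressed extraction.
    staging=$(mktemp -d) || return 1

    # Step 1: extract the zip without asking for file system compression.
    if ditto -x -k "$zip_path" "$staging"; then
        status=0
        # Step 2: copy every extracted top-level item into place,
        # this time asking ditto to compress the destination files.
        for item in "$staging"/*; do
            [ -e "$item" ] || continue
            ditto --hfsCompression "$item" "$dest_dir/$(basename "$item")" || status=1
        done
    else
        status=1
    fi

    # The staging copy is the "wasted" write mentioned above; clean it up.
    rm -rf "$staging"
    return "$status"
}

# Usage (assumed paths):
#   install_compressed appbundle.zip "$HOME/Applications"
```

This still writes everything twice, so it only hides the extra step rather than avoiding it.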

Really? I would count having all system .app compressed on everyone's Mac as "widespread use".

Yes. I think well-designed, file system-level compression is a very reasonable feature that could be very useful. Unfortunately, that's not really what we implemented in HFS+ or APFS. There's a reason our APIs don't really support compressed files, and that's because their file system implementation means that they aren't fully supported by the entire system.

Moreover, everyone loves macOS's compressed .dmg! It's a shame the app bundle I dragged out of it takes more than 2x the space than the distribution .dmg.[1]

FYI, part of the reason this works so well is the other side of the same problem that makes it less valuable in the general file system. The file system manages storage in terms of allocation blocks (typically 4 KB on our systems), each of which is "used" regardless of the file's actual contents. You can see this if you create a file with a few bytes of text in it, then "Get Info" on it using the Finder. On APFS, you'll see something like this for its size:

X bytes (4 KB on disk)

...where "X" is the file’s logical size (how many bytes you actually typed) and 4 KB is the allocation block that's being consumed to store it.
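
The same comparison can be made from Terminal. This is only a rough sketch using wc and du; the exact on-disk figure depends on the file system in use, so no particular value should be expected for it.

```shell
#!/bin/sh
# Compare a tiny file's logical size with the space allocated for it.
tmpfile=$(mktemp)
printf 'hello' > "$tmpfile"    # 5 bytes of actual content

# Logical size: the number of bytes in the file.
logical_bytes=$(wc -c < "$tmpfile" | tr -d ' ')

# Allocated size: what the file system reports as consumed, in KB.
allocated_kb=$(du -k "$tmpfile" | cut -f1)

echo "logical: $logical_bytes bytes, on disk: $allocated_kb KB"
rm -f "$tmpfile"
```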

What actually led to compressed files being introduced on HFS+ WASN'T simply that they were "smaller", but was actually another innovation which allowed small amounts of file data to be stored directly in the catalog file, instead of consuming individual allocation blocks. Compression improved that benefit further, by both shrinking the existing "small files" and creating more "small files". However, that architecture doesn't really work for APFS.

In any case, returning to here:

Moreover, everyone loves macOS's compressed .dmg! It's a shame the app bundle I dragged out of it takes more than 2x the space than the distribution .dmg.[1]

The file system on a disk image has exactly the same allocation block issue; however, the compression is operating on the disk image file itself, NOT on the individual files inside it. So, when you compress a disk image full of small files, what's actually being compressed is a long stream of zeros (the unused part of each allocation block) with small amounts of actual data mixed in, which is something most compression algorithms are particularly good at compressing.
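
A rough way to see that effect for yourself is sketched below. It builds a file that imitates 64 allocation blocks of 4096 bytes, each holding 7 bytes of data followed by zero padding, then compresses the whole thing as one stream. Note that gzip is only a stand-in here; it is an assumption for illustration and not the algorithm a disk image actually uses.

```shell
#!/bin/sh
# Imitate a volume full of tiny files: 64 blocks of 4096 bytes, each
# holding 7 bytes of data followed by 4089 bytes of zero padding.
img=$(mktemp)
i=0
while [ "$i" -lt 64 ]; do
    printf 'data %02d' "$i" >> "$img"
    dd if=/dev/zero bs=4089 count=1 2>/dev/null >> "$img"
    i=$((i + 1))
done

# Size of the raw "image" versus the same bytes compressed as one stream.
raw=$(wc -c < "$img" | tr -d ' ')
packed=$(gzip -c "$img" | wc -c | tr -d ' ')

echo "raw: $raw bytes, compressed as a single stream: $packed bytes"
rm -f "$img"
```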

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
