Actually, for now, I'm going to put the project on ice, since I can't get the satisfactory performance out of it.
I totally understand, but I'd still appreciate you filing a bug for an App Sandbox compliant, non-block FS mount API. This is an area that we're actively looking at, and developer bugs are helpful for scheduling and work prioritization.
To compare, for the same functionality, the macFUSE implementation uses about 40% of CPU, while the FSKit one uses between 100 and 150%.
I tried squeezing the last drops of performance out of the FSKit one, but it seems I hit the ceiling.
Unfortunately, the answer here is basically "yes, that's where things stand today". We haven't done much to optimize the FSVolumeReadWriteOperations path, so its current performance is not great. That's definitely something I'd expect to improve over time.
Unfortunately, I also can't seem to use FSVolumeKernelOffloadedIOOperations, since this is a virtual filesystem, without the underlying block device or similar (correct me if I'm wrong).
Hmmm.... well, yes and no. Strictly speaking, you're absolutely right. FSVolumeKernelOffloadedIOOperations works by passing dev node offsets into the kernel, so you can't use it without a dev node.
However, you can actually pass the "diskimage-class=CRawDiskImage" option into hdiutil, which will then attach the file as a raw disk image and use it as the backing store for a device node. There's actually a post on this here by someone trying to get MFSLives working again.
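To make the hdiutil step concrete, here is a minimal sketch of how that invocation could be assembled. The helper names and the example path are hypothetical, and the use of "-nomount" is my assumption about what you'd want here (attach the dev node without mounting anything), so treat this as a starting point rather than a tested recipe.

```python
import subprocess

def build_attach_command(image_path: str) -> list[str]:
    """Build an hdiutil command that attaches a plain file as a raw disk image.

    -imagekey diskimage-class=CRawDiskImage asks hdiutil to treat the file
    as raw blocks; -nomount (an assumption for this use case) creates the
    device node without trying to mount any volume on it.
    """
    return [
        "hdiutil", "attach",
        "-imagekey", "diskimage-class=CRawDiskImage",
        "-nomount",
        image_path,
    ]

def attach_raw_image(image_path: str) -> str:
    """Run the command (macOS only) and return hdiutil's raw output,
    which should include the /dev/diskN node that was created."""
    result = subprocess.run(
        build_attach_command(image_path),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Calling attach_raw_image("/path/to/backing.img") would only work on macOS; build_attach_command() by itself just returns the argument list, so you can inspect it anywhere.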
Actually, using this technique on an arbitrary file might require zero padding the file so that its size is an even multiple of the block size, but file cloning means that isn't a particularly slow/expensive operation. Aside from that, I think this would let you use FSVolumeKernelOffloadedIOOperations for anything that only targeted a single file. Ironically, it would also sidestep the app sandbox mount issue, as you'd now be able to mount using DAMount.
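As a rough illustration of the padding step, the sketch below rounds a file's size up to the next block multiple and extends it with zeros. The 512-byte block size and the function names are assumptions for illustration only, and you'd want to run this on a clone of the file rather than the original.

```python
import os

def padded_size(size: int, block_size: int = 512) -> int:
    """Round size up to the next multiple of block_size (512 is an assumed default)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-size // block_size) * block_size  # ceiling division

def pad_to_block_multiple(path: str, block_size: int = 512) -> int:
    """Extend the file at path with zero bytes so its length is a block multiple.

    Returns the resulting file size. Intended to be run on a clone/copy,
    since it modifies the file in place.
    """
    size = os.path.getsize(path)
    target = padded_size(size, block_size)
    if target != size:
        # Growing a file with truncate fills the new range with zeros.
        os.truncate(path, target)
    return target
```

For example, a 1000-byte file would be extended to 1024 bytes with a 512-byte block size, while a file that is already 1024 bytes would be left untouched.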
Having said that, using FSVolumeKernelOffloadedIOOperations also requires you to be directly returning the source data (not post-processing the data first), so I'm not sure how well it would work for what you're doing. I'm also not sure how the performance and effort would work out, but there certainly are cases where it might be a useful approach.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware