My filesystem is very similar to a passthrough filesystem. Operations on the mounted folder get mapped to a folder in the user's home directory, with some extra steps (encryption, plus some metadata saved in a local database that facilitates that encryption).
Just to let you know, stacking file systems are something we've pretty clearly recommended against:
"Apple does not support the development of stacking VFS plug-ins on Mac OS X (r. 4383626) ."
I achieve the expected result after a swap on my Virtual Filesystem Tree. The inode IDs are preserved while the file contents get swapped.
Yes, but are they preserved FOREVER? Jumping to here:
My need for the renamex_np RENAME_SWAP flag was to atomically swap the underlying files.
What need? As I said earlier, your file system is not required to support atomic swaps and the system will take care of it if it doesn't. In my view, there are only two options here that are a good idea:
- Call renamex_np so that RENAME_SWAP does in fact occur (see the sketch after this list).
- Tell the system that you don't support atomic swaps.
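For the first option, here's a minimal sketch of what that call looks like from your plug-in's side; the backing-store paths are placeholders for wherever your plug-in maps those files:

```c
#include <stdio.h>   /* renamex_np() and RENAME_SWAP are declared here on macOS */

int main(void)
{
    /* Atomically exchange the two backing files. Per the expectation laid
     * out later in this reply, each name keeps its own inode number and
     * only the contents trade places; the operation succeeds or fails as
     * a unit, so there's nothing to retry or roll back. */
    if (renamex_np("/Users/me/.backing/Foo", "/Users/me/.backing/Bar",
                   RENAME_SWAP) != 0) {
        perror("renamex_np(RENAME_SWAP)");
        return 1;
    }
    return 0;
}
```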
Doing something else, like simulating the swap yourself, is not viable unless you're willing to FULLY simulate all of the necessary long-term ID preservation.
I achieve this by just calling rename with retries and rollback in case of errors.
What you're describing there is NOT equivalent to RENAME_SWAP. In my earlier message, I laid this out for rename swap:
ID 5, Name: Foo, Content "ABC" ID 10, Name: Bar, Content "EFG"
The expected result after a swap is:
ID 5, Name: Foo, Content "EFG" ID 10, Name: Bar, Content "ABC"
What you're describing does this:
ID 10, Name: Foo, Content "EFG"
ID 5, Name: Bar, Content "ABC"
That's a perfectly valid series of renames, but it's NOT a valid implementation of RENAME_SWAP. The required/expected behavior of RENAME_SWAP is SPECIFICALLY that the inode number of the target and source will NOT change, only their contents.
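For contrast, here's a rough sketch of what the rename-with-retries approach reduces to (paths and the temporary name are placeholders, and a real version would add retries and rollback). Each rename moves an existing inode to a new name, which is why the IDs end up traveling with the contents instead of staying with the names:

```c
#include <stdio.h>   /* rename(), perror() */

/* Emulating a swap with three renames: afterwards, ID 5 and ID 10 have
 * traded NAMES while keeping their contents -- rather than the names
 * keeping their IDs while the contents trade places, which is what
 * RENAME_SWAP requires. */
static int swap_by_rename(const char *foo, const char *bar, const char *tmp)
{
    if (rename(foo, tmp) != 0) return -1;   /* ID 5 now lives at tmp   */
    if (rename(bar, foo) != 0) return -1;   /* ID 10 now lives at "Foo" */
    if (rename(tmp, bar) != 0) return -1;   /* ID 5 now lives at "Bar"  */
    return 0;
}

int main(void)
{
    if (swap_by_rename("Foo", "Bar", ".swap-tmp") != 0) {
        perror("swap_by_rename");
        return 1;
    }
    return 0;
}
```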
The reason this matters is that the larger system (outside of the VFS system) also uses inode numbers to track files, and it expects those inode numbers to "work" the way they "should". More specifically, if your file system supports "VOL_CAP_FMT_PERSISTENTOBJECTIDS" (see the man page for getattrlist), then there are MANY situations where the system* is going to store the persistent ID and then use that ID to attempt to access the file again. In the sequence above, that would mean you'd return "Foo" at the beginning and "Bar" at the end, which would cause exactly what you're seeing.
*I believe the file access scoping system in the sandbox also uses persistent IDs, which could explain why this particular case was tied to sandboxed apps.
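If it's useful, here's a small sketch of checking what a volume actually advertises for that capability bit via getattrlist; the mount path is a placeholder:

```c
#include <stdio.h>
#include <string.h>
#include <sys/attr.h>
#include <sys/types.h>
#include <unistd.h>

/* Buffer layout getattrlist fills when asked for ATTR_VOL_CAPABILITIES. */
struct vol_caps_buf {
    u_int32_t               length;
    vol_capabilities_attr_t caps;
} __attribute__((aligned(4), packed));

int main(void)
{
    struct attrlist     attrs;
    struct vol_caps_buf buf;

    memset(&attrs, 0, sizeof(attrs));
    attrs.bitmapcount = ATTR_BIT_MAP_COUNT;
    attrs.volattr     = ATTR_VOL_INFO | ATTR_VOL_CAPABILITIES;

    /* "/Volumes/MyMount" stands in for wherever the file system is mounted. */
    if (getattrlist("/Volumes/MyMount", &attrs, &buf, sizeof(buf), 0) != 0) {
        perror("getattrlist");
        return 1;
    }

    int valid   = buf.caps.valid[VOL_CAPABILITIES_FORMAT]        & VOL_CAP_FMT_PERSISTENTOBJECTIDS;
    int claimed = buf.caps.capabilities[VOL_CAPABILITIES_FORMAT] & VOL_CAP_FMT_PERSISTENTOBJECTIDS;

    printf("VOL_CAP_FMT_PERSISTENTOBJECTIDS: %s\n",
           (valid && claimed) ? "supported" : "not supported");
    return 0;
}
```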
The issue with open (1) is that it calls getattrlist (2) on myFile.txt, and then it tries getattrlist (2) on myFile.txt.sb-123../myFile.txt exactly 3 times, each time failing with ENOENT before it throws the fnfError: File Not Found.
I think we need to be a bit more precise about what EXACTLY you're seeing and how it's playing out. Case in point:
a) When you say "open(1)" calls "getattrlist", how are you actually determining that? Are you specifically tracing calls to "getattrlist" from the "open" process? Or are you correlating activity in your file system against "open"?
b) When you say "getattrlist", do you SPECIFICALLY mean "a process called the syscall getattrlist", or do you mean "my file system received requests that translate to getattrlist"?
c) How are you determining the target path? Is that the specific input passed in by the calling process (if so, how are you tracing that?), or is it the path your file system generated from the vnode_t it was called with?
The difference here matters because it expands the range of code and syscalls we have to consider. What you're describing is slightly odd if you mean something very specific, and easy to explain if you mean something "broader". More specifically:
"calls getattrlist (2) on myFile.txt"-> this was open(1) generating the bookmark it passed to launchservicesd
"calls getattrlist (2) on myFile.txt.sb-123../myFile.txt"-> this was launchservicesd or one of it's supporting daemon's attempting to resolve the bookmark it received, probably by calling "fsgetpath", though it might have been a call secondary to that. Note that multiple calls occur because bookmark resolution tries slightly different resolution techniques before failing.
Looking at the code for "fsgetpath", its actual implementation may not really be "visible" to your plugin, either because you overlooked it or because FUSE handles "build_path" entirely within the kernel.
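For reference, ID-based resolution from user space looks roughly like the sketch below: a client records the volume's fsid plus the file's object ID, and later asks the kernel to turn that pair back into a path. This is just an illustration with placeholder paths (and fsgetpath may require appropriate privileges), not a claim about how launchservicesd literally does it, but it shows why a changed inode number sends the lookup to the wrong file or to ENOENT:

```c
#include <limits.h>        /* PATH_MAX */
#include <stdint.h>
#include <stdio.h>
#include <sys/attr.h>
#include <sys/fsgetpath.h> /* fsgetpath(), macOS 10.13+ */
#include <sys/mount.h>     /* statfs() */
#include <sys/param.h>
#include <sys/stat.h>

int main(void)
{
    const char   *path = "/Volumes/MyMount/myFile.txt";  /* placeholder */
    struct stat   st;
    struct statfs sfs;

    if (stat(path, &st) != 0 || statfs(path, &sfs) != 0) {
        perror("stat/statfs");
        return 1;
    }

    /* A bookmark-style client stores the (fsid, object ID) pair now... */
    fsid_t   fsid   = sfs.f_fsid;
    uint64_t obj_id = st.st_ino;

    /* ...and later maps it back to a path. If the file system hands out a
     * different inode for this name in the meantime, the stored ID now
     * resolves to some other file, or to nothing at all (ENOENT). */
    char resolved[PATH_MAX];
    if (fsgetpath(resolved, sizeof(resolved), &fsid, obj_id) < 0) {
        perror("fsgetpath");
        return 1;
    }
    printf("object %llu resolves to: %s\n", (unsigned long long)obj_id, resolved);
    return 0;
}
```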
Using the macFUSE loopback filesystem example
I took a quick look, and the most obvious difference is that its implementation of "exchangeDataOfItemAtPath" calls "renamex_np".
I think there is no vnode confusion; I have also tried the option to have macFUSE not cache the vnodes. I am a bit stuck in brain rot territory, and I find any new insight helpful, as I might be missing something obvious.
I think your biggest block here is that your thinking is overly focused on paths. File systems don't actually "think" in terms of paths. They need to be able to track an object before it exists, as it moves, changes names, and is eventually deleted. What you're saying here:
"calls getattrlist (2) on myFile.txt.sb-123../myFile.txt"
CANNOT have happened. Once that file was deleted, the higher-level system "lost" that file, so there's no way it could have "come up" with that path. Even if there were more complicated components involved (like our file versioning system), there's no reason that would have been looked at during the launch process. All that part of the system cares about is the file it was told to start with. The only component that could (easily) continue to track that object is the VFS system. If you keep pulling on that side of the issue, I think you'll eventually find the problem.
Kevin Elliott
DTS Engineer, CoreOS/Hardware