open (1) fails with fnfErr while open (2) succeeds on custom filesystem

Hello,

I have developed a custom filesystem in golang, that relies on macFUSE.

High-level apps on osx (TextEdit, Numbers, Preview) rely on syscall.renamex_np with the flag RENAME_SWAP in order to save edits.

In golang, the renamex_np and renameat2 syscalls are not supported, thus I had to implement the logic for it myself.

The discussion opened on the google group for macFUSE can be followed here: https://groups.google.com/g/osxfuse-group/c/Kh0qVRGIVv4

On my mounted filesystem, edits work and performing system calls works. However, after I perform a series of edits in TextEdit and completely exit TextEdit, calling open (1) on the file gives the following error:

The application cannot be opened for an unexpected reason, error=Error Domain=NSOSStatusErrorDomain Code=-43 "fnfErr: File not found" UserInfo={_LSLine=4129, _LSFunction=_LSOpenStuffCallLocal}

From the logs of my app, there is no open (2) called on the file.

I have tried to trace the open call for Numbers/TextEdit with dtruss, but when I perform the above scenario, my Mac freezes and the piped output from dtruss is 0 bytes after rebooting.

How can I debug my issue? Where can I find more documentation of the order of system calls for open (1)? I couldn't find the source code for renamex_np, thus my implementation relied on the linux kernel implementation of renameat2. Does renamex_np do something different?

I note that if I open TextEdit first, for example, and then open my file from within it, there is no problem. Also, calling cat on the file in the terminal displays the content correctly. The problem seems to come from open (1). Furthermore, if I perform a rename of the file, open (1) succeeds in opening the file, until I perform at least another edit from a high-level app (that calls rename with the swap flag). Also, if I unmount my filesystem and mount it again, open (1) behaves correctly.

How can I understand what open (1) is doing under the hood? For the high-level apps I could trace the system calls and figure out why they didn't work, but now I reached a point (scenario) where I can't trace the system calls for open (1) due to my whole system freezing.

Any input is appreciated.

I managed to debug a bit and finally trace the open (1) call.

It seems that before the fail, it tries to get the attribute for the file, it succeeds, but then it also tries to get the attribute for the myFile.sb-6c28314c-zqOUh7/myFile.txt which returns ENOENT.

From the logs of my app I can see that unlink (rmdir) was called on the .sb directory when I closed TextEdit.

My mystery is, why would it try to get the attribute for the file that was removed by a system call triggered by the app on close, and fail so disastrously on ENOENT.

Posting something I'd started working on before next message....

Starting here, as I think that context will clarify some other issues you're having:

I couldn't find the source code for renamex_np thus my implementation relied on the linux kernel implementation of renameat2,

The reason you're having trouble finding "its implementation" is that, like most VFS operations, it doesn't really have "an" implementation as such. VFS drivers work by exporting a set of function pointers to the higher level system, each of which implements a specific file system operation. So, what a basic operation like "open(2)" actually "does" is basically "call the VNOP_OPEN operation implemented by the filesystem targeted by that path". What THAT file system actually does in response to VNOP_OPEN is entirely up to it. There's obviously an expected "semantic", but it's not like the system has any way to "make" that happen.
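
As a rough illustration of the function-pointer idea, here is a toy Go model. It is not XNU or macFUSE code: vnodeOps, memFS, stubFS and openSyscall are all made-up names, and the real VNOP interface is considerably richer. The only point is that the generic layer forwards the call and the outcome depends entirely on what the target file system exported.

```go
package main

import (
	"errors"
	"fmt"
)

// vnodeOps is a hypothetical stand-in for the table of function pointers
// a VFS driver exports. The single field only loosely mirrors VNOP_OPEN.
type vnodeOps struct {
	open func(path string) (string, error)
}

// Two toy "file systems" that answer the same operation differently.
var memFS = vnodeOps{
	open: func(path string) (string, error) {
		return "memfs opened " + path, nil
	},
}

var stubFS = vnodeOps{
	open: func(path string) (string, error) {
		return "", errors.New("stubfs: open not supported")
	},
}

// openSyscall plays the role of the generic layer: it has no behavior of
// its own and simply calls whatever the target file system exported.
func openSyscall(fs vnodeOps, path string) (string, error) {
	return fs.open(path)
}

func main() {
	for _, fs := range []vnodeOps{memFS, stubFS} {
		res, err := openSyscall(fs, "/a.txt")
		fmt.Printf("result=%q err=%v\n", res, err)
	}
}
```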

Anyway, what can make this hard to follow in our source code is that the layer between the incoming syscall (like renamex_np) and its target VNOP (like VNOP_RENAMEX) is pretty convoluted. I've ended up trying to follow it a few different times, but it's invariably easier to just "jump" to the lower level and infer the connection.

For reference, the VNOP definitions are in "vnode_if.h", which is also pretty well commented.

does renamex_np do something different?

Yes, very different. The man page for renamex_np lists the flags renamex_np accepts and the most important flag there is:

RENAME_SWAP
                   On file systems that support it (see getattrlist(2)
                   VOL_CAP_INT_RENAME_SWAP), it will cause the source and
                   target to be atomically swapped.  Source and target need
                   not be of the same type, i.e. it is possible to swap a file
                   with a directory.  EINVAL is returned in case of bitwise-
                   inclusive OR with RENAME_EXCL.

renamex_np is a replacement/extension of the older "exchangedata" function (see the exchangedata man page for more context). Both of those functions exist to support file systems that have been designed to allow two objects to be "swapped" in a single operation.

That leads to this:

In golang, the renamex_np and renameat2 syscalls are not supported, thus I had to implement the logic for it myself.

Do NOT do this. Many (most?) file systems do not support atomic object exchange, something the higher level system is designed to handle. For example, the reason this file manager method has a "backupItemName" is that it's specifically used on file systems where atomic exchange is not supported. Claiming to support atomic exchange ("VOL_CAP_INT_RENAME_SWAP") and then failing to implement it correctly will inevitably create data loss. I don't know how FUSE handles this, but I'm sure it does.

Related to that point, you need to be careful about inferring expected file system behavior from syscall activity. Much of our code always calls "renamex_np" because, within the kernel, both functions end up calling "renameat_internal". Rename just passes in "0" for the "flags" argument.

The "actual" issue here is "can my file system support atomic file swaps" and if the answer is no then that's what you should tell the system.

Moving on to the earlier question:

On my mounted filesystem, edits work and performing system calls works. However, after I perform a series of edits in TextEdit and completely exit TextEdit, calling open (1) on the file gives the following error: ... From the logs of my app, there is no open (2) called on the file.

Yes. That's because the command line "open" does not "open" the file, nor is it really anything like a "syscall". The system (and people) are using the word "open" to refer to COMPLETELY different operations.

From open(1) man page:

The open command opens a file (or a directory or URL), just as if you had double-clicked the file's icon. If no application name is specified, the default application as determined via LaunchServices is used to open the specified files.

When you double click on a file in the Finder, what actually happens is a series of messages between daemons that:

  1. Figure out which app(s) have said they're able to work with that file.
  2. Pick a target app from that list based on the system's own heuristics.
  3. Create a reference (scoped bookmarks/NSURL) to that file which will allow the target app to open the file.
  4. If necessary, open the app.
  5. Send an open AppleEvent to the app, handing it the reference created in #3.

In terms of our public API, this is the same process that NSWorkspace's various "openURL:configuration..." methods perform.

Shifting to the error here:

The application cannot be opened for an unexpected reason, error=Error Domain=NSOSStatusErrorDomain Code=-43 "fnfErr: File not found" UserInfo={_LSLine=4129, _LSFunction=_LSOpenStuffCallLocal}

This is just a guess, but I suspect you haven't properly implemented persistent file identifiers, which ends up breaking bookmark resolution. The system then resolves your reference to set up the open event for the app, but fails the operation because it can't find the file. It would be very hard to get the timing right, but I think you'd get exactly the same error if you called open and then IMMEDIATELY deleted the file.

Shifting to the debugging/investigation side.

How can I understand what open (1) is doing under the hood? For the high-level apps I could trace the system calls and figure out why they didn't work

Even with complete access to the system and no hangs, I'm not sure this would actually tell you anything. This kind of tracing works in straightforward cases like "a function returned an error which shouldn't have" but is much less helpful when you're dealing with failures that are side effects of your own behavior. Indeed, thinking in terms of "errors" is often misleading. Something like bookmark resolution failing is not an "error", it's simply one of the possible resolution results.


Kevin Elliott
DTS Engineer, CoreOS/Hardware

It seems that before the fail, it tries to get the attribute for the file, it succeeds, but then it also tries to get the attribute for the myFile.sb-6c28314c-zqOUh7/myFile.txt which returns ENOENT. From the logs of my app I can see that unlink (rmdir) was called on the .sb directory when I closed TextEdit.

I think you may be asking the wrong question. As far as the file system is concerned, paths are basically an incidental form of metadata that it doesn't really pay attention to. The VFS layer doesn't really use them at all. If you look in vnode_if.h, most VFS operations (including VNOP_OPEN) operate on vnode_t's, not file paths. I think what's happened here is that your attempt to implement atomic saves "yourself" has completely messed up the ID persistence. First, as context, imagine you start with:

ID 5, Name: Foo, Content "ABC"
ID 10, Name: Bar, Content "EFG"

The expected result after a swap is:

ID 5, Name: Foo, Content "EFG"
ID 10, Name: Bar, Content "ABC"

This gives you the same ID stability that "open"-> "write"-> "close" would, without the risk of partial contents or file "damage".
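
The Foo/Bar example can be written out as a small Go model. This is a sketch of the semantics only: the object type and swapContents are hypothetical names, and a real file system would do this inside its own on-disk structures.

```go
package main

import "fmt"

// object is a toy file system object. The field names are illustrative.
type object struct {
	ID      uint64
	Name    string
	Content string
}

// swapContents models the expected swap result: each (ID, Name) pair
// stays together and only the contents trade places.
func swapContents(a, b *object) {
	a.Content, b.Content = b.Content, a.Content
}

func main() {
	foo := &object{ID: 5, Name: "Foo", Content: "ABC"}
	bar := &object{ID: 10, Name: "Bar", Content: "EFG"}

	swapContents(foo, bar)

	fmt.Printf("%+v\n%+v\n", *foo, *bar)
}
```

After the call, ID 5 is still named Foo and ID 10 is still named Bar; only the Content fields have moved.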

Side Note: Just to be clear, the modification above CANNOT be implemented from user space. File IDs are directly controlled by the file system itself and no file system I'm aware of would allow them to be modified.

The ID issue is critical here because it explains what can create something like this:

My mystery is, why would it try to get the attribute for the file that was removed by a system call triggered by the app on close, and fail so disastrously on ENOENT.

I don't think it did. As far as user space is concerned, that file should be "gone".
I'm not completely sure about what's going on, but my guess is that the vnode reference you're returning for "myFile.txt" is actually a reference to "myFile.sb-6c28314c-zqOUh7/myFile.txt". The system then says "what's the path for this vnode_t", which then fails because "myFile.sb-6c28314c-zqOUh7/myFile.txt" no longer exists.
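
To make that guess concrete, here is a toy Go model of how a stale parent link could produce such a path. It is purely an assumption about one possible failure mode, not a description of macFUSE, the kernel, or the poster's code; node, pathOf and the directory name myFile.sb-tmp are all invented for the sketch.

```go
package main

import "fmt"

// node is a toy in-memory inode that remembers its parent.
type node struct {
	name   string
	parent *node
}

// pathOf rebuilds a path by walking parent links, roughly what a
// "what is the path for this node" query has to do.
func pathOf(n *node) string {
	if n.parent == nil {
		return n.name
	}
	p := pathOf(n.parent)
	if p == "/" {
		return "/" + n.name
	}
	return p + "/" + n.name
}

func main() {
	root := &node{name: "/"}
	tmpDir := &node{name: "myFile.sb-tmp", parent: root} // hypothetical temp dir
	saved := &node{name: "myFile.txt", parent: tmpDir}   // file written by the app

	// A buggy swap: the node is served under /myFile.txt, but its parent
	// link still points at the temp directory, which is later removed.
	byPath := map[string]*node{"/myFile.txt": saved}

	fmt.Println(pathOf(byPath["/myFile.txt"])) // the stale path, not /myFile.txt

	// Re-parenting the node is what keeps the two views consistent.
	saved.parent = root
	fmt.Println(pathOf(byPath["/myFile.txt"]))
}
```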

That would also explain why this worked:

Furthermore, if I perform a rename of the file, open (1) succeeds in opening the file, until I perform at least another edit from a high-level app (that calls rename with the swap flag).

I suspect that your "basic" rename either cleared the vnode confusion above, or never generated the "swap" process (so nothing got confused).


Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thank you for the detailed responses!

I achieve the expected result after a swap on my Virtual Filesystem Tree. The Inode IDs are preserved while the file contents get swapped.

My filesystem is very similar to a passthrough filesystem. Operations on the mounted folder get mapped to a folder in the user's home directory with some extra steps (encryption and some metadata saved in a local database that facilitates that encryption).

My DirectoryNodes and FileNodes do not keep file descriptors open for the underlying folder, just a path.

On each system call Open/Read/Write/... on my mounted folder, I get the file descriptor for the mounted folder that I have control over in my code in my version of the Virtual Filesystem Tree and map it on the node's path on the underlying folder.

My need for renamex_np RENAME_SWAP flag was needed to atomically swap the underlying files. I achieve this by calling just rename with retries and rollback in case of errors.

While this happens, my DirectoryNode and TargetDirectoryNode are under lock, the same as the files being swapped.

At the end of the rename with swap, I just swap between the FileNodes the underlying path, and the attribute size; and release all locks.

I can perform an unlimited number of edits in TextEdit and an unlimited number of rename with Swap on my mounted folder, and open (2) followed by read (2) the virtual file to inspect the contents or write (2) to modify it.

The problem arises after I exit TextEdit: the call to open (1) fails with the details from the above post.

I suspect that your "basic" rename either cleared the vnode confusion above

I hope yes, on my basic rename, the Inode ID is preserved, just the name is changed and rename (2) is called on the underlying folder, and my FileNode might be moved in my Virtual Filesystem Tree to a different parent under the new name, and removed from the previous parent as the old name.

From the logs of my app, there was never an issue of failing to access the underlying folder's files.

The issue with open (1) is that it calls getattrlist (2) on myFile.txt

and then it tries getattrlist (2) on myFile.txt.sb-123../myFile.txt exactly 3 times, each time failing with ENOENT before it throws the fnfErr: File not found.

Using the macFUSE loopback filesystem example, or cgofuse passthrough example from the winfsp developer, I cannot reproduce the issue.

It just seems it is affecting my version of implementation of a filesystem.

I think there is no vnode confusion, I have also tried the option to let macFUSE not cache the vnodes. I am stuck a bit in brain rot territory, and I find any new insight helpful as I might be missing something obvious.

Accepted Answer

My filesystem is very similar to a passthrough filesystem. Operations on the mounted folder get mapped to a folder in the user's home directory with some extra steps (encryption and some metadata saved in a local database that facilitates that encryption).

Just to let you know, stacking file systems are something we've pretty clearly recommended against:

"Apple does not support the development of stacking VFS plug-ins on Mac OS X (r. 4383626) ."

I achieve the expected result after a swap on my Virtual Filesystem Tree. The Inode IDs are preserved while the file contents get swapped.

Yes, but are they preserved FOREVER? Jumping to here:

My need for renamex_np RENAME_SWAP flag was needed to atomically swap the underlying files.

What need? As I said earlier, your file system is not required to support atomic swaps and the system will take care of it if it doesn't. In my view, there are only two options here that are a good idea:

-Call renamex_np so that RENAME_SWAP does in fact occur.

-Tell the system that you don't support atomic swaps.

Doing something like this is not viable unless you're willing to FULLY simulate all of the necessary long term ID preservation.

I achieve this by calling just rename with retries and rollback in case of errors.

What you're describing there is NOT equivalent to RENAME_SWAP. In my earlier message, I laid this out for rename swap:

ID 5, Name: Foo, Content "ABC"
ID 10, Name: Bar, Content "EFG"

The expected result after a swap is:

ID 5, Name: Foo, Content "EFG"
ID 10, Name: Bar, Content "ABC"

What you're describing does this:

ID 10, Name: Foo, Content "EFG"
ID 5, Name: Bar, Content "ABC"

That's a perfectly valid series of renames, but it's NOT a valid implementation of RENAME_SWAP. The required/expected behavior of RENAME_SWAP is SPECIFICALLY that the inode number of the target and source will NOT change, only their contents.

The reason this matters is that the larger system (outside of the VFS system) also uses inode numbers to track files, and it expects those inode numbers to "work" the way they "should". More specifically, if your file system supports "VOL_CAP_FMT_PERSISTENTOBJECTIDS" (see the man page for getattrlist), then there are MANY situations where the system* is going to store the persistent ID and then use that ID to attempt to access the file again. In the sequence above, that would mean that you'd return "Foo" at the beginning and "Bar" at the end, which would cause exactly what you're seeing.

*I believe the file access scoping system in the sandbox also uses persistent IDs, which could explain why this particular case was tied to sandbox'd apps.
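
A small Go model of that "Foo at the beginning, Bar at the end" effect follows. It only illustrates the bookkeeping; volume, bookmark, resolve and swapByRenames are hypothetical names and do not correspond to any real bookmark or getattrlist API.

```go
package main

import "fmt"

// volume is a toy directory that maps persistent object IDs to names.
type volume struct {
	nameByID map[uint64]string
}

// bookmark is a hypothetical stand-in for any saved reference that
// remembers an object by its ID rather than by its path.
type bookmark struct {
	id uint64
}

// resolve looks the saved ID up again, as a later access would.
func (v *volume) resolve(b bookmark) (string, bool) {
	name, ok := v.nameByID[b.id]
	return name, ok
}

// swapByRenames models exchanging two entries with plain renames:
// each ID travels with its content and ends up under the other name.
func (v *volume) swapByRenames(idA, idB uint64) {
	v.nameByID[idA], v.nameByID[idB] = v.nameByID[idB], v.nameByID[idA]
}

func main() {
	v := &volume{nameByID: map[uint64]string{5: "Foo", 10: "Bar"}}

	saved := bookmark{id: 5} // taken while ID 5 was named "Foo"
	before, _ := v.resolve(saved)

	v.swapByRenames(5, 10)
	after, _ := v.resolve(saved)

	fmt.Printf("before=%s after=%s\n", before, after)
}
```

With a swap that exchanges contents instead, the same saved ID would keep resolving to Foo.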

The issue with open (1) is that it calls getattrlist (2) on myFile.txt and then it tries getattrlist (2) on myFile.txt.sb-123../myFile.txt exactly 3 times, each time failing with ENOENT before it throws the fnfErr: File not found.

I think we need to be a bit more precise about what EXACTLY you're seeing and how it's playing out. Case in point:

a) When you say "open(1)" calls "getattrlist", how are you actually determining that? Are you specifically tracing calls to "getattrlist" from the "open" process? Or are you correlating activity on your file system with "open"?

b) When you say "getattrlist" do you SPECIFICALLY mean "a process called the syscall getattrlist", or do you mean "my file system received requests that translate to getattrlist"?

c) How are you determining the target path? Is that the specific input passed in by the calling process (if so, how are you tracing that?) or is it the path your file system generated from the vnode_t it was called with?

The distinction matters because the broader reading widens the range of code and syscalls that could be involved. What you're describing is slightly odd if you mean something very specific, and easy to explain if you mean something "broader". More specifically:

"calls getattrlist (2) on myFile.txt"-> this was open(1) generating the bookmark it passed to launchservicesd

"calls getattrlist (2) on myFile.txt.sb-123../myFile.txt"-> this was launchservicesd or one of its supporting daemons attempting to resolve the bookmark it received, probably by calling "fsgetpath", though it might have been a call secondary to that. Note that multiple calls occur because bookmark resolution tries slightly different resolution techniques before failing.

Looking at the code for "fsgetpath", its actual implementation may not really be "visible" to your plugin, either because you overlooked it or because FUSE handles "build_path" entirely within the kernel.

Using the macFUSE loopback filesystem example

I took a quick look and the most obvious difference is that its implementation of "exchangeDataOfItemAtPath" calls "renamex_np".

I think there is no vnode confusion, I have also tried the option to let macFUSE not cache the vnodes. I am stuck a bit in brain rot territory, and I find any new insight helpful as I might be missing something obvious.

I think your biggest block here is that your thinking is overly focused on paths. File systems don't actually "think" in terms of paths. They need to be able to track an object before it exists, as it moves, changes names, and is eventually deleted. What you're saying here:

"calls getattrlist (2) on myFile.txt.sb-123../myFile.txt"

CANNOT have happened. Once that file was deleted, the higher level system "lost" that file, so there's no way they could have "come up" with that path. Even if there were more complicated components involved (like our file versioning system), there's no reason that would have been looked at during the launch process. All that part of the system cares about is the file it was told to start with. The only component that could (easily) continue to track that object is the VFS system. If you keep pulling on that side of the issue, I think you'll eventually find the problem.


Kevin Elliott
DTS Engineer, CoreOS/Hardware

your file system is not required to support atomic swaps and the system will take care of it if it doesn't

When the rename with the swap flag is called, returning syscall.EINVAL, syscall.ENOTSUP, or syscall.ENOSYS has the effect that edits using TextEdit/Numbers/Preview cannot be saved. I need to support rename with swap.

For the underlying filesystem it is indeed not rename swap. The underlying Inodes are not preserved, and that should be ok. In the programming language of the project (golang), the sys call renamex_np is not supported.

My filesystem assigns its own Inode numbers in my lookup method (the Inodes will be different across filesystem restarts). The rename with swap just swaps the two underlying files, which has no effect on my Inode numbers. In my read/write/getattr methods I just read/write with an offset from the underlying file, and for getattr I just stat the underlying file.

the inode number of the target and source will NOT change, only their contents.

The above holds true for my filesystem's rename with swap. The underlying filesystem has different Inodes, and the corresponding Inode changes when I swap the data, but that should be ok, since I do not keep any open file descriptors to that Inode from my filesystem.

a) When you say "open(1)" calls "getattrlist", how are you actually determining that?

I am using dtruss open myFile.txt 2> ../logs.txt, I am tracing the system calls for open (1).

b) When you say "getattrlist" do you SPECIFICALLY mean "a process called the syscall getattrlist"

A process specifically called getattrlist. From my filesystem logs, my filesystem received a lookup, to which it replied with syscall.ENOENT.

c) How are you determining the target path? Is that the specific input passed in by the calling process

According to the output of the stderr of dtruss open myFile.txt, it was the input for getattrlist (2) passed by the calling process. My filesystem Root Directory Entry calls lookup for the myFile.txt.sb-123.. and it fails with ENOENT, if it would find it, then the Directory Entry corresponding to myFile.txt.sb-123.. would call lookup for myFile.txt and if found it would call Getattr.

The only component that could (easily) continue to track that object is the VFS system. If you keep pulling on that side of the issue, I think you'll eventually find the problem.

Thank you, I will hammer through on this direction.

The only component that could (easily) continue to track that object is the VFS system. If you keep pulling on that side of the issue, I think you'll eventually find the problem.

The library I am using for the macFUSE communication doesn't drop the directory Inode on a Rmdir call. It considers it still alive, thus making it visible; however, I try to drop the Inode after I perform the actual unlink, which ends up with an ENOENT on the system.

Thank you for pointing it out, I would have been stuck on this for a while. I think that once I figure out why the Inode doesn't get removed (even if there are no open File Descriptors in it, but children still present), the problem should be solved.

Thank you for pointing it out, I would have been stuck on this for a while. I think that once I figure out why the Inode doesn't get removed (even if there are no open File Descriptors in it, but children still present), the problem should be solved.

You're very welcome, glad to hear you were able to get to the bottom of this.

One quick comment on this:

your file system is not required to support atomic swaps and the system will take care of it if it doesn't

When the rename with the swap flag is called, returning syscall.EINVAL, syscall.ENOTSUP, or syscall.ENOSYS has the effect that edits using TextEdit/Numbers/Preview cannot be saved. I need to support rename with swap.

The issue here is, IMHO, caused by a bug in FUSE. I believe what's going on here is that they don't provide any way for you to set "VOL_CAP_INT_EXCHANGEDATA" and/or "VOL_CAP_INT_RENAME_SWAP" to "false". This is what they SHOULD be doing in your case (and probably MANY other file systems). That's also why the failure above happens: your file system said "I support rename swap" but is then failing every call to "RENAME_SWAP". We never try anything else because the file system told us that it would work.

What SHOULD have happened here is that FUSE should have returned "false" for VOL_CAP_INT_EXCHANGEDATA/VOL_CAP_INT_RENAME_SWAP, at which point the system would never have tried RENAME_SWAP.

I would strongly encourage you to follow up with the FUSE team on this. Many file systems don't support atomic exchanges, and claiming support without a valid implementation risks data loss.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
