Hardlinks reported as non-existing on macOS Sequoia for 3rd party FS

After creating a hardlink on a distributed filesystem of my own via:

% ln f.txt hlf.txt

Neither the original file, f.txt, nor the hardlink, hlf.txt, is immediately accessible: cat(1), for example, fails with ENOENT. A short time later though, both the original file and the hardlink are accessible. Even while cat(1) fails, both files can be stat(1)ed, which confirms that vnop_getattr returns success for both files.

dtruss(1) indicates that it is the open(2) syscall that fails:

% sudo dtruss -f cat hlf.txt
2038/0x4f68:  open("hlf.txt\0", 0x0, 0x0)		 = -1 Err#2 ;ENOENT
 2038/0x4f68:  write_nocancel(0x2, "cat: \0", 0x5)		 = 5 0
 2038/0x4f68:  write_nocancel(0x2, "hlf.txt\0", 0x7)		 = 7 0
 2038/0x4f68:  write_nocancel(0x2, ": \0", 0x2)		 = 2 0
 2038/0x4f68:  write_nocancel(0x2, "No such file or directory\n\0", 0x1A)		 = 26 0

Dtrace(1)ing my KEXT no longer works on macOS Sequoia, so based on the diagnostics print statements I inserted into my KEXT, the following sequence of calls is observed:

vnop_lookup(hlf.txt) -> EJUSTRETURN ;ln(1)
vnop_link(hlf.txt) -> KERN_SUCCESS ;ln(1)
vnop_lookup(hlf.txt) -> KERN_SUCCESS ;cat(1)
vnop_open(/) ; I expected to see vnop_open(hlf.txt) here instead of the parent directory.

Internally, vnop_link creates the hardlink by calling vnode_setmultipath, and calls cache_purge_negatives on the destination directory.

On macOS Monterey for example, where the same code does result in hardlinks being accessible, the following calls are made:

vnop_lookup(hlf.txt) -> EJUSTRETURN ;ln(1)
vnop_link(hlf.txt) -> KERN_SUCCESS ;ln(1)
vnop_lookup(hlf.txt) -> KERN_SUCCESS ;cat(1)
vnop_open(hlf.txt) -> KERN_SUCCESS ;cat(1)

Not sure how else to debug this.

Perusing the kernel sources for uses of VISHARDLINK, VNOP_LINK and vnode_setmultipath call sites did not clear things up for me.

Any pointers would be greatly appreciated.

Any pointers would be greatly appreciated.

So, the basic thing to understand is that, by definition, the VFS is only operating on the information you give it and nothing else. Whatever is going on is happening because of data that came from your driver. Shifting to specifics, the place I would focus on here is on EXACTLY what occurred and what was returned at the three specific points you identified. Those would be:

(1) Your first working case:

On macOS Monterey for example, where the same code does result in hardlinks being accessible

(2) Your second working case:

A short time later though, both the original file and the hardlink are accessible.

(3) And the failure case:

on macOS Sequoia, so based on the diagnostics print statements I inserted into my KEXT, the following sequence of calls is observed:

My next move here would be to collect as much data as possible from those three cases*, including both the inputs and outputs.

*Depending on the work involved, I might just focus on #2 and #3, returning to #1 if those didn't yield results.

Related to that point, I think you're making a common mistake here that can really confuse the investigation of this sort of issue:

vnop_open(/) ; I expected to see vnop_open(hlf.txt) here instead of the parent directory.

You were NOT given "/" or any sort of path. Paths are an artificial construct that your VFS driver is responsible for inventing, NOT reality. You were given a vnode_t, because that's the reality your VFS driver actually interacts with. More importantly, that vnode_t also CAME from your VFS driver, so the only possibilities here are:

  1. The system has changed and is in fact calling open() on a vnode_t it retrieved at some earlier point. Note that this would also imply there are "more" syscalls after this point (I think this is unlikely, but you can rule this out by looking at #2).

  2. The system contains a catastrophic bug (I can't really rule this out, but I'm not aware of any such bug, and it seems unlikely given that everything else works fine).

  3. The vnode_t is in fact for the target file, but your logic is mapping it back to the wrong thing.

  4. The vnode_t is in fact for "/" because that's what you returned from vnop_lookup.

I think #4 is the most likely cause, but my real advice here is to focus on the data, instead of immediately trying to look for any specific cause. Everything that happens in VFS is driven by the data you're returning to the system, so the answer is almost certain to be in that data.
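Part of "the data" can be checked from user space by looking at what the driver reports through getattr for both names. The sketch below is an assumption-laden illustration, not a diagnosis: same_file_identity and link_and_verify are hypothetical helpers, and the check only covers the fields visible through lstat(2) (file type, device, inode number, link count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true when both names lstat(2) successfully and
 * describe the same regular file with a link count of at least 2. */
bool same_file_identity(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    struct stat sa, sb;
    if (lstat(a, &sa) == -1 || lstat(b, &sb) == -1)
        return false;

    /* Log the raw values so they can be compared with the KEXT's log. */
    printf("%s: ino=%llu nlink=%lu isreg=%d\n", a,
           (unsigned long long)sa.st_ino, (unsigned long)sa.st_nlink,
           S_ISREG(sa.st_mode) ? 1 : 0);
    printf("%s: ino=%llu nlink=%lu isreg=%d\n", b,
           (unsigned long long)sb.st_ino, (unsigned long)sb.st_nlink,
           S_ISREG(sb.st_mode) ? 1 : 0);

    return S_ISREG(sa.st_mode) && S_ISREG(sb.st_mode)
        && sa.st_dev == sb.st_dev
        && sa.st_ino == sb.st_ino
        && sa.st_nlink >= 2;
}

/* Hypothetical helper mirroring `ln src dst`.
 * Returns -1 if link(2) fails (errno is set), 1 if the link was created
 * but the attributes look inconsistent, and 0 otherwise. */
int link_and_verify(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    if (link(src, dst) == -1)
        return -1;
    return same_file_identity(src, dst) ? 0 : 1;
}
```

If link_and_verify("f.txt", "hlf.txt") returned 1 on the volume in question, that would suggest the attributes themselves are inconsistent (for example, a directory type or a mismatched inode number); a return of 0 would point the investigation back at which vnode_t is handed out by lookup.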

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for your feedback and suggestions.

I double-checked which vnode_t is returned by my vnop_lookup. It is indeed the one that references both the original file and the hardlink.

vnop_lookup: cat-1235
vnop_lookup: -> lookuprpc(/hlf.txt)
lookuprpc: cat-1235
lookuprpc: lookup successful entry exists
lookuprpc: -> cache_lookup(/hlf.txt)
lookuprpc: <- cache_lookup(/hlf.txt) -> -1 ;VFS_CACHE_HIT
vnop_lookup: <- vp fffffe2a45cf6b60 /hlf.txt

The vnop_open call that comes immediately after the previous vnop_lookup call is for the parent directory, not the file being returned by the previous lookup:

vnop_open: zsh-570
vnop_open: vnode_isdir( root) -> 1

No further vnop_open calls are made afterwards.

Here's a similar trace for the original file being looked up:

vnop_lookup: cat-1236
vnop_lookup: -> lookuprpc(/f.txt)
lookuprpc: cat-1236
lookuprpc: lookup successful entry exists
lookuprpc: -> cache_lookup(/f.txt)
lookuprpc: <- cache_lookup(/f.txt) -> -1 ;VFS_CACHE_HIT
vnop_lookup: <- vp fffffe2a45cf6b60 /f.txt

If my vnop_lookup returns the correct vnode_t for the file being looked up, what causes ENOENT to be returned to open(2) in userspace? How do we track it down?

Accepted Answer

So, I can't dig into this in depth, but I can outline where to go from here.

If my vnop_lookup returns the correct vnode_t for the file being looked up, what causes ENOENT to be returned to open(2) in userspace?

First off, if you're not already, stop testing with standard command line tools and replicate this in your own code. One of the critical details here is exactly which syscall returned ENOENT. It's possible, even likely, that it was "open", but as far as I can tell you're assuming that because vnop_open was the last VFS hook called. However, it's also possible that another syscall occurred and then returned ENOENT before it called into your driver (assuming it would have called into "you").
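As a rough illustration of what "your own code" could look like, the sketch below issues stat(2), open(2) and read(2) one at a time and records which call fails first and with which errno. The names probe_path, report_probe and probe_result are hypothetical, and the choice of these three syscalls is an assumption about what cat(1) does, not something taken from its source.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result record, introduced for this sketch only. */
typedef struct {
    const char *failed_call;  /* NULL when every call succeeded */
    int         err;          /* errno of the failing call, 0 otherwise */
} probe_result;

/* Runs stat(2), open(2) and read(2) on `path` in order and stops at the
 * first call that fails. */
probe_result probe_path(const char *path)
{
    assert(path != NULL);

    probe_result r = { NULL, 0 };
    struct stat st;
    char buf[64];

    if (stat(path, &st) == -1) {
        r.failed_call = "stat";
        r.err = errno;
        return r;
    }

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        r.failed_call = "open";
        r.err = errno;
        return r;
    }

    if (read(fd, buf, sizeof buf) == -1) {
        r.failed_call = "read";
        r.err = errno;  /* saved before close(2) can overwrite errno */
    }
    close(fd);
    return r;
}

/* Prints a one-line summary of the probe for `path`. */
void report_probe(const char *path)
{
    probe_result r = probe_path(path);
    if (r.failed_call == NULL)
        printf("%s: stat, open and read all succeeded\n", path);
    else
        printf("%s: %s failed with errno %d (%s)\n",
               path, r.failed_call, r.err, strerror(r.err));
}
```

Running report_probe("hlf.txt") and report_probe("f.txt") right after link(2) returns removes the shell and cat(1) from the picture, so every VFS call logged by the KEXT can be attributed to a known syscall in a known process.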

How do we track it down?

Once you're sure which syscall returned ENOENT, the next step is to dig into the source. The VFS system is largely open source, so it's basically a matter of starting from whatever known point you have and then following the logic until you find "ENOENT". As a starting point, vfs_syscalls.c is the entry point for most file system syscalls (including open).

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
