Kernel panic in mac_label_verify()

Accessing a directory on my custom distributed filesystem results in a kernel panic.

According to the backtrace, the last function called before the panic is triggered is mac_label_verify().

See the backtrace file attached.

The panic manifests itself given the following conditions:

  1. Machine-a: make a directory in Finder.
  2. Machine-b: remove the directory created on machine-a in Finder.
  3. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues.

The panic is reproducible on both Apple Silicon and x86-64.

The backtrace is for x86-64 as I wasn't able to symbolicate it on Apple Silicon.

Not sure how to tackle this one.

Any pointers would be much appreciated.

First off, a quick note on this point:

I wasn't able to symbolicate it on Apple Silicon.

There's a difference in the load location of KEXTs that our current tools don't account for, but this forum post shows how you can account for that.

Moving to the panic itself:

0xffffffba572afbf0 : 0xffffff8007fbc833 mach_kernel : _mac_vnode_check_getattrlist + 0xb3

The open source code for this is in mac_vfs.c. That code calls into mac_vnode_label(), which dereferences the label field of the vnode and then calls mac_label_verify().

0xffffffba572afb00 : 0xffffff8007fafdf4 mach_kernel : _mac_label_verify + 0x4

The open source code for this is in mac_label.c, and you're panicking on the first line of the function, when it dereferences the label argument.

struct label *
mac_label_verify(struct label **labelp)
{
	struct label *label = *labelp;

Basically, you're dealing with some kind of memory corruption.

Moving to your reproduction steps:

  1. Machine-a: make a directory in Finder.
  2. Machine-b: remove the directory created on machine-a in Finder.
  3. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues.

In VFS terms, a valid vnode existed at #1 (since that's how the directory was created) and somewhere between #2 and #3 your code failed to properly manage that vnode, damaging its label, which then caused the panic you're seeing. Note that I don't think the Finder itself is relevant here, as I'd expect you to see exactly the same failure if you directly created a directory (#1), removed it on the remote machine (#2), and then called getattrlist (#3) on it. I suspect the key issue here is actually that the vnode from #1 is still in the cache, not the specifics of how it's manipulated.

The next step here is to look very closely at exactly what #2 "does". Some suggestions on that:

  • Start by simply inspecting the full code "flow" between #2 and #3, looking for any problems. Sometimes a basic review with a narrower focus is enough to get you to the problem, and a bit of luck can save a ton of time.

If that doesn't work and you need to dig into the issue more deeply:

  • Verify you understand the "flow" here correctly. For example, my theory assumes that the vnode from #1 is the same node as #3, but I haven't proven that. Take the time to print out the vnode and "prove" these details. It's very easy to end up wasting a lot of investigation time because you've started with a set of assumptions about your code's state that are simply wrong.

  • How did Machine-a "discover" that the directory was gone? Did Machine-b proactively "inform" Machine-a of the removal, or was there an earlier access where the file was determined to be "gone"?

  • What did Machine-a actually "do" (particularly to the vnode) when it was told the directory was gone?

  • Print debugging is a critical tool here. You know that the vnode was modified and you know what field was modified. In theory, if you added a check that compared the field value at entry and exit, then the first function that changed that value would be the point the failure started.
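The entry/exit comparison from the last bullet could look roughly like the sketch below. This is a user-space model, not kext code: struct fake_vnode, FIELD_WATCH_BEGIN/FIELD_WATCH_END, harmless_path() and buggy_remove_path() are all hypothetical names made up for illustration, and in a real driver you would log through the kernel's printf() and compare whatever state your own code can actually reach.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the object being damaged; NOT the real struct vnode. */
struct fake_vnode {
    void *label;
};

/* Number of times a watched function was caught changing the field. */
static int watch_hits = 0;

/* Snapshot the watched field on entry to a function. */
#define FIELD_WATCH_BEGIN(vp) void *watch_saved_ = (vp)->label

/* Compare on exit and name the function in which the field changed. */
#define FIELD_WATCH_END(vp)                                          \
    do {                                                             \
        if ((vp)->label != watch_saved_) {                           \
            watch_hits++;                                            \
            printf("%s: label changed %p -> %p\n", __func__,         \
                   watch_saved_, (vp)->label);                       \
        }                                                            \
    } while (0)

/* Hypothetical code path that leaves the field alone. */
static void harmless_path(struct fake_vnode *vp)
{
    FIELD_WATCH_BEGIN(vp);
    (void)vp;
    FIELD_WATCH_END(vp);
}

/* Hypothetical code path that clears the field, as the buggy path would. */
static void buggy_remove_path(struct fake_vnode *vp)
{
    FIELD_WATCH_BEGIN(vp);
    vp->label = NULL;
    FIELD_WATCH_END(vp);
}
```

With nested calls, the innermost function that changed the field reports first, which is usually the one you want to look at.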

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Oops...

Yes, the correct forum post is:

https://developer.apple.com/forums/thread/762661

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

My filesystem's vnop_rmdir calls cache_purge() and vnode_recycle() on RPC returning success.

The v_label associated with the vnode of the directory being removed is set to NULL and free()'d via the vnode_recycle() call path: vnode_recycle()->vnode_drop_and_unlock()->vnode_drop_internal():

vnode_drop_internal:

vp->v_label = NULL;

if (tmpl) {
	mac_vnode_label_free(tmpl);
}

So, client A, residing on machine-a, is still referencing the vnode that just got recycled by client-B residing on machine-b.

Seeing that mac_vnode_check_getattrlist() gets called before vnode_getattr() is, which calls my filesystem's vnop_getattr(), how do I make client A, and any other machine with a mount point to my filesystem, become aware of vnodes getting recycled by other clients on the grid?

I've observed the following behaviour both in Finder and on command line with respect to my initial query.

In Finder, I have to switch between windows for vnop_getattr() to be called on the parent directory and the filesystem changes made on a different machine to be picked up.

On the command line, running % ls -ld /parent/directory results in vnop_getattr() being called and the changes being picked up.

The same behaviour is observed when, e.g., writing files. The file size doesn't get updated in real time. I have to switch between windows for file size changes to be picked up.

What am I missing? Thanks.

Accepted Answer

First off, I think it's important to clarify the vocabulary here. I'm borrowing the vocabulary below from MFSLives and I would strongly recommend that you download and review it closely, particularly "HashNode.h".

In any case, it's built around the idea that there are three different components any VFS driver interacts with:

  1. The "disk", meaning the actual underlying "static" data. In a standard block storage file system, these are the blocks that are actually written to physical media, in a network or distributed file system they're something else.

  2. FSNodes, meaning the data your VFS driver has in memory about its "native" format.

  3. vnodes, meaning the data your VFS driver is currently "sharing" with the larger system.

Note that within this framework:

  • Every vnode is backed by an FSNode, since the FSNode is both the source of any data the vnode contains AND (in most cases) the mechanism you would use to push data back to disk.

  • Not every FSNode (necessarily) has a corresponding vnode. Your file system will often acquire information from the disk that isn't (currently) needed by any vnode. Similarly, the system may recycle a vnode while you still have changes which haven't been pushed to disk.

  • The fundamental "pipeline" your driver is built around is the process of exporting data from disk-> FSNode-> vnode and committing data "back" from vnode-> FSNode-> disk.
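As a rough illustration of that pipeline, here is a small user-space model. It is not taken from MFSLives or from any real driver: struct model_fsnode, struct model_vnode and every function name are hypothetical, and the "disk" is reduced to a single size value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vnode: what the rest of the system sees. */
struct model_vnode {
    uint64_t size;
};

/* Hypothetical FSNode: the driver's own in-memory record for one object. */
struct model_fsnode {
    uint64_t file_id;           /* identity on the "disk" (the server, for a network FS) */
    uint64_t size;              /* the driver's current view of the object */
    bool dirty;                 /* changes not yet pushed back to disk */
    struct model_vnode *vnode;  /* NULL while no vnode is attached */
};

/* disk -> FSNode: the driver learns about an object; no vnode exists yet. */
static void fsnode_load(struct model_fsnode *fsn, uint64_t file_id, uint64_t disk_size)
{
    fsn->file_id = file_id;
    fsn->size = disk_size;
    fsn->dirty = false;
    fsn->vnode = NULL;
}

/* FSNode -> vnode: export the object to the system. */
static void fsnode_attach_vnode(struct model_fsnode *fsn, struct model_vnode *vn)
{
    fsn->vnode = vn;
    vn->size = fsn->size;
}

/* vnode -> FSNode: a change made through the vnode lands in the FSNode. */
static void fsnode_write_through_vnode(struct model_fsnode *fsn, uint64_t new_size)
{
    fsn->vnode->size = new_size;
    fsn->size = new_size;
    fsn->dirty = true;
}

/* The system recycles the vnode; the FSNode and its dirty state survive. */
static void fsnode_detach_vnode(struct model_fsnode *fsn)
{
    fsn->vnode = NULL;
}

/* FSNode -> disk: commit and return the value that would be written. */
static uint64_t fsnode_flush(struct model_fsnode *fsn)
{
    fsn->dirty = false;
    return fsn->size;
}
```

The point of the model is that losing the vnode (fsnode_detach_vnode) loses nothing: the FSNode still holds the pending change until it is flushed.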

This architecture is important, because it prevents you from falling into traps like this:

So, client A, residing on machine-a, is still referencing the vnode that just got recycled by client-B residing on machine-b.

Modifying a vnode doesn't actually generate any "fundamental" change. Modifications to vnodes modify FSNodes, but it's your job to decide what that actually means. In a classic file sharing system, you basically have something like this:

  • Client vnode modifications modify the FSNodes of the local client.

  • The server operates as the central disk/truth, receiving changes from each client and exporting changes out to each client.

  • How the server exports those changes is ENTIRELY up to it. You asked:

how do I make client A, and any other machine with a mount point to my filesystem, become aware of vnodes getting recycled by other clients on the grid?

...and the answer is basically "you do this however you want to do this". You need to decide how your file system is going to behave and then implement that logic. When you see issues like this:

In Finder, I have to switch between windows for vnop_getattr() to be called on the parent directory and the filesystem changes made on a different machine to be picked up.

The answer is simply that you never told the system anything had changed, so nothing changed. Within the VFS API itself, vnode_notify() is how your driver notifies the system that a vnode has changed. In a remote file system, that's normally combined with VNOP_MONITOR so your file system can restrict what it's actually monitoring for changes.
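To make the "tell the system something changed" step concrete, here is a user-space sketch of a client reacting to changes reported by the server. Everything in it is hypothetical: model_notify() only records the event and merely marks the spot where a real driver would call vnode_notify(), and the MODEL_EVENT_* flags play the role of the kernel's event flags without matching their values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event flags for this model only. */
#define MODEL_EVENT_DELETE 0x1u
#define MODEL_EVENT_ATTRIB 0x2u

/* Hypothetical vnode stand-in that just accumulates delivered events. */
struct model_vnode {
    uint32_t pending_events;
};

/* Hypothetical FSNode: the client's local record of a remote object. */
struct model_fsnode {
    bool removed;
    uint64_t size;
    struct model_vnode *vnode;  /* NULL if the system holds no vnode for it */
};

/* Stand-in for notifying the system; a real driver would call vnode_notify() here. */
static void model_notify(struct model_vnode *vn, uint32_t events)
{
    vn->pending_events |= events;
}

/* The server reports that another client removed the object. */
static void on_remote_remove(struct model_fsnode *fsn)
{
    fsn->removed = true;                 /* update the local truth first */
    if (fsn->vnode != NULL) {
        model_notify(fsn->vnode, MODEL_EVENT_DELETE);
    }
}

/* The server reports that another client changed the object's size. */
static void on_remote_resize(struct model_fsnode *fsn, uint64_t new_size)
{
    fsn->size = new_size;
    if (fsn->vnode != NULL) {
        model_notify(fsn->vnode, MODEL_EVENT_ATTRIB);
    }
}
```

How the server's report reaches the client (push, polling, lease expiry) is exactly the design decision left to the file system; the sketch only covers what happens once it arrives.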

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for a very detailed response.

Regrettably, VNOP_MONITOR is off limits to 3rd party filesystems, as it's a XNU_KERNEL_PRIVATE symbol.

I'll have to find another way of dealing with this. Thanks.

Regrettably, VNOP_MONITOR is off limits to 3rd party filesystems, as it's a XNU_KERNEL_PRIVATE symbol.

No, or at least not exactly.

You're right that "VNOP_MONITOR" is marked XNU_KERNEL_PRIVATE, but that's because most of the "VNOP_*" defines are marked that way. That includes basic (and critical) operations like "VNOP_OPEN". That's because the VFS API is structured and documented as:

- The "VNOP_*" functions are the functions the VFS system itself "calls" into. For example, here is a call to VNOP_MONITOR inside the VFS implementation.

- Every "VNOP_*" function is actually implemented as a wrapper function to a function pointer your VFS driver provides. Here's the implementation of VNOP_MONITOR inside kpi_vfs.c.

- Every "VNOP_*" function has a corresponding argument structure and descriptor, which are what your VFS driver actually implements. You can find them just above the entry for the "VNOP_" definition.

- Those are what your VFS driver actually implements. Here's the structure entry in smbfs and the function that implements the actual vnop.

The basic structure above is how most of the VNOP functions are marked, with the (very) OCCASIONAL exceptions being VNOPs that are considered "useful" outside of the VFS layer's internal implementation.
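The wrapper-plus-function-pointer arrangement described above can be modeled in user space as follows. This is a simplified sketch, not the XNU code: the MODEL_* names, the two-field argument structure, and the ENOTSUP fallback for a missing entry are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct model_vnode;

/* Argument structure handed to the driver (hypothetical, simplified). */
struct model_monitor_args {
    struct model_vnode *a_vp;
    uint32_t a_events;
};

/* Every driver entry point takes a pointer to its argument structure. */
typedef int (*model_vnop_fn)(void *args);

/* Slots in the per-filesystem operations table (hypothetical). */
enum { MODEL_OP_OPEN, MODEL_OP_MONITOR, MODEL_OP_COUNT };

struct model_vnode {
    const model_vnop_fn *ops;     /* table supplied by the owning driver */
    uint32_t monitored_events;
};

/* VFS side: the wrapper packs the arguments and calls through the table. */
static int MODEL_VNOP_MONITOR(struct model_vnode *vp, uint32_t events)
{
    struct model_monitor_args a = { vp, events };
    model_vnop_fn fn = vp->ops[MODEL_OP_MONITOR];
    return fn ? fn(&a) : ENOTSUP;
}

/* Driver side: the function the driver registers for the monitor slot. */
static int myfs_vnop_monitor(void *args)
{
    struct model_monitor_args *ap = args;
    ap->a_vp->monitored_events = ap->a_events;
    return 0;
}

/* Driver side: the table entry that ties the slot to the function. */
static const model_vnop_fn myfs_ops[MODEL_OP_COUNT] = {
    [MODEL_OP_MONITOR] = myfs_vnop_monitor,
};
```

In this model the driver never calls MODEL_VNOP_MONITOR itself; it only needs the argument structure and the table slot, which is why the visibility of the argument structure is what matters to a driver author.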

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

vnode_if.h distributed with Kernel.framework does not define struct vnop_monitor_args:

% grep -c 'struct vnop_monitor_args' `xcrun --show-sdk-path`/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys/vnode_if.h
0

That's because it's defined as a symbol that is private to the kernel. Even if I downloaded vnode_if.h from the kernel sources, I wouldn't be able to link against com.apple.kpi.private, would I?

All other vnop_name_args structures that aren't defined as kernel private are available in vnode_if.h distributed with Kernel.framework:

% awk '/^struct vnop_.*_args/{print $2}' `xcrun --show-sdk-path`/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys/vnode_if.h | sort
vnop_access_args
vnop_advlock_args
vnop_allocate_args
vnop_blktooff_args
vnop_blockmap_args
vnop_bwrite_args
vnop_clonefile_args
vnop_close_args
vnop_copyfile_args
vnop_create_args
vnop_exchange_args
vnop_fsync_args
vnop_getattr_args
vnop_getattrlistbulk_args
vnop_getnamedstream_args
vnop_getxattr_args
vnop_inactive_args
vnop_ioctl_args
vnop_kqfilt_add_args
vnop_kqfilt_remove_args
vnop_link_args
vnop_listxattr_args
vnop_lookup_args
vnop_makenamedstream_args
vnop_mkdir_args
vnop_mknod_args
vnop_mmap_args
vnop_mmap_check_args
vnop_mnomap_args
vnop_offtoblk_args
vnop_open_args
vnop_pagein_args
vnop_pageout_args
vnop_pathconf_args
vnop_read_args
vnop_readdir_args
vnop_readdirattr_args
vnop_readlink_args
vnop_reclaim_args
vnop_remove_args
vnop_removenamedstream_args
vnop_removexattr_args
vnop_rename_args
vnop_renamex_args
vnop_revoke_args
vnop_rmdir_args
vnop_searchfs_args
vnop_select_args
vnop_setattr_args
vnop_setlabel_args
vnop_setxattr_args
vnop_strategy_args
vnop_symlink_args
vnop_verify_args
vnop_whiteout_args
vnop_write_args

Samba client is Apple's own implementation, and therefore private symbols are available to it. Which is not the case with the filesystem I'm implementing.

Or were you suggesting that there is a way to access those symbols from 3rd party filesystems?

vnode_if.h distributed with Kernel.framework does not define struct vnop_monitor_args:

Huh. Please file a bug on that, as I don't really see any clear reason why it wasn't included in the public set.

Or were you suggesting that there is a way to access those symbols from 3rd party filesystems?

No, you shouldn't use it unless it's public.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Bug report FB16082106 submitted. Thanks.
