Kernel panic in mac_label_verify()

Accessing a directory on my custom distributed filesystem results in a kernel panic.

According to the backtrace, the last function called before the panic is triggered is mac_label_verify().

See the backtrace file attached.

panic(cpu 4 caller 0xffffff80079b5293): Kernel trap at 0xffffff8007fafdf4, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x00000000000000e8, CR3: 0x0000000360b400d6, CR4: 0x00000000003626e0
RAX: 0xffffff800a8f5290, RBX: 0xffffff800a8b0a3b, RCX: 0xffffff800a8f5290, RDX: 0x0000000000000000
RSP: 0xffffffba572afbf0, RBP: 0xffffffba572afbf0, RSI: 0x0000000000000000, RDI: 0x00000000000000e8
R8:  0x0000000000001001, R9:  0xffffff8008497ffc, R10: 0xffffff9a56870600, R11: 0xffffff80086ee180
R12: 0x0000000000000000, R13: 0x0000000000000002, R14: 0x00000000000000e8, R15: 0xffffff800a8f5230
RFL: 0x0000000000010246, RIP: 0xffffff8007fafdf4, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x00000000000000e8, Error code: 0x0000000000000000, Fault CPU: 0x4, PL: 0, VF: 0

Panicked task 0xffffff95893da038: 7 threads: pid 466: Finder
Backtrace (CPU 4), panicked thread: 0xffffff9a5576fb30, Frame : Return Address
0xffffffba572af5d0 : 0xffffff8007870c7d mach_kernel : _handle_debugger_trap + 0x4ad
0xffffffba572af620 : 0xffffff80079c52e4 mach_kernel : _kdp_i386_trap + 0x114
0xffffffba572af660 : 0xffffff80079b4df7 mach_kernel : _kernel_trap + 0x3b7
0xffffffba572af6b0 : 0xffffff8007811971 mach_kernel : _return_from_trap + 0xc1
0xffffffba572af6d0 : 0xffffff8007870f5d mach_kernel : _DebuggerTrapWithState + 0x5d
0xffffffba572af7c0 : 0xffffff8007870607 mach_kernel : _panic_trap_to_debugger + 0x1a7
0xffffffba572af820 : 0xffffff8007fdb8db mach_kernel : _panic + 0x84
0xffffffba572af910 : 0xffffff80079b5293 mach_kernel : _sync_iss_to_iks + 0x2c3
0xffffffba572afa90 : 0xffffff80079b4f7d mach_kernel : _kernel_trap + 0x53d
0xffffffba572afae0 : 0xffffff8007811971 mach_kernel : _return_from_trap + 0xc1
0xffffffba572afb00 : 0xffffff8007fafdf4 mach_kernel : _mac_label_verify + 0x4
0xffffffba572afbf0 : 0xffffff8007fbc833 mach_kernel : _mac_vnode_check_getattrlist + 0xb3
0xffffffba572afc50 : 0xffffff8007a16c50 mach_kernel : _fgetattrlist + 0x290
0xffffffba572afd20 : 0xffffff8007a19dd3 mach_kernel : _getattrlist + 0x153
0xffffffba572aff40 : 0xffffff8007e5b230 mach_kernel : _unix_syscall64 + 0x1e0
0xffffffba572affa0 : 0xffffff8007811db6 mach_kernel : _hndl_unix_scall64 + 0x16

Process name corresponding to current thread (0xffffff9a5576fb30): Finder
Boot args: keepsyms=1 amfi_get_out_of_my_way=0x1

Mac OS version:
22H123

Kernel version:
Darwin Kernel Version 22.6.0: Wed Jul 31 21:42:48 PDT 2024; root:xnu-8796.141.3.707.4~1/RELEASE_X86_64
Kernel UUID: 15D2842B-072B-3795-BE93-B063FF3D4AA0
roots installed: 0
KernelCache slide: 0x0000000007400000
KernelCache base:  0xffffff8007600000
Kernel slide:      0x00000000074dc000
Kernel text base:  0xffffff80076dc000
__HIB  text base: 0xffffff8007500000
System model name: MacBookPro14,3 (Mac-551B86E5744E2388)
System shutdown begun: NO
Panic diags file available: YES (0x0)
Hibernation exit count: 0

System uptime in nanoseconds: 103011182653
Last Sleep:           absolute           base_tsc          base_nano
  Uptime  : 0x00000017fbf1ec78
  Sleep   : 0x0000000000000000 0x0000000000000000 0x0000000000000000
  Wake    : 0x0000000000000000 0x0000000a84213378 0x0000000000000000
Compressor Info: 0% of compressed pages limit (OK) and 0% of segments limit (OK) with 0 swapfiles and OK swap space
Zone info:
  Zone map: 0xffffff8a55f2f000 - 0xffffffaa55f2f000
  . PGZ   : 0xffffff8a55f2f000 - 0xffffff8a57f30000
  . VM    : 0xffffff8a57f30000 - 0xffffff8f2472f000
  . RO    : 0xffffff8f2472f000 - 0xffffff90bdf2f000
  . GEN0  : 0xffffff90bdf2f000 - 0xffffff958a72f000
  . GEN1  : 0xffffff958a72f000 - 0xffffff9a56f2f000
  . GEN2  : 0xffffff9a56f2f000 - 0xffffff9f2372f000
  . GEN3  : 0xffffff9f2372f000 - 0xffffffa3eff2f000
  . DATA  : 0xffffffa3eff2f000 - 0xffffffaa55f2f000
  Metadata: 0xffffff8a1ad1d000 - 0xffffff8a3ad1d000
  Bitmaps : 0xffffff8a3ad1d000 - 0xffffff8a3dd1d000
  Extra   : 0 - 0

The panic manifests itself given the following conditions:

  1. Machine-a: make a directory in Finder.
  2. Machine-b: remove the directory created on machine-a in Finder.
  3. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues.

The panic is reproducible on both Apple Silicon and x86-64.

The backtrace is for x86-64 as I wasn't able to symbolicate it on Apple Silicon.

Not sure how to tackle this one.

Any pointers would be much appreciated.

First off, a quick note on this point:

I wasn't able to symbolicate it on Apple Silicon.

There's a difference in the load location of KEXTs that our current tools don't account for, but this forum post shows how you can account for that.

Moving to the panic itself:

0xffffffba572afbf0 : 0xffffff8007fbc833 mach_kernel : _mac_vnode_check_getattrlist + 0xb3

The open source code for this is in mac_vfs.c. That code calls into mac_vnode_label(), which dereferences the label field of the vnode and then calls mac_label_verify().

0xffffffba572afb00 : 0xffffff8007fafdf4 mach_kernel : _mac_label_verify + 0x4

The open source code for this is in mac_label.c, and you're panicking on the first line of the function, where the labelp argument is dereferenced.

struct label *
mac_label_verify(struct label **labelp)
{
	struct label *label = *labelp;

Basically, you're dealing with some kind of memory corruption.

Moving to your reproduction steps:

  1. Machine-a: make a directory in Finder.
  2. Machine-b: remove the directory created on machine-a in Finder.
  3. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues.

In vfs terms, a valid vnode existed at #1 (since that's how the directory was created) and somewhere between #2 and #3 your code failed to properly manage that vnode, damaging its label, which then caused the panic you're seeing. Note that I don't think the Finder itself is relevant here, as I'd expect you to see exactly the same failure if you directly created a directory (#1), removed it on the remote machine (#2), and then called getattrlist (#3) on it. I suspect the key issue here is actually that the vnode from #1 is still in the cache, not the specifics of how it's manipulated.

The next step here is to look very closely at exactly what #2 "does". Some suggestions on that:

  • Start by simply inspecting the full code "flow" between #2 and #3, looking for any problems. Sometimes a basic review with a narrower focus is enough to get you to the problem, and a bit of luck can save a ton of time.

If that doesn't work and you need to dig into the issue more deeply:

  • Verify you understand the "flow" here correctly. For example, my theory assumes that the vnode from #1 is the same vnode as in #3, but I haven't proven that. Take the time to print out the vnode and "prove" these details. It's very easy to end up wasting a lot of investigation time because you've started with a set of assumptions about your code's state that are simply wrong.

  • How did Machine-a "discover" that the directory was gone? Did Machine-b proactively "inform" Machine-a of the removal, or was there an earlier access where the file was determined to be "gone"?

  • What did Machine-a actually "do" (particularly to the vnode) when it was told the directory was gone?

  • Print debugging is a critical tool here. You know that the vnode was modified and you know what field was modified. In theory, if you added a check that compared the field value at entry and exit, then the first function that changed that value would be the point the failure started.
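As a minimal sketch of that entry/exit idea (everything prefixed my_ is invented, and since struct vnode is opaque to a kext, my_label_snapshot() stands in for however you're able to observe the suspect field, e.g. instrumentation in a development kernel):

#include <sys/vnode_if.h>
#include <libkern/libkern.h>

/* Wrap a suspect VNOP handler and report any change to the label field. */
static int
my_traced_rmdir(struct vnop_rmdir_args *ap)
{
    void *before = my_label_snapshot(ap->a_vp);  /* hypothetical accessor */
    int err = my_real_vnop_rmdir(ap);            /* your actual handler   */
    void *after = my_label_snapshot(ap->a_vp);

    if (before != after) {
        printf("my_fs: vnop_rmdir changed label %p -> %p on vnode %p\n",
            before, after, ap->a_vp);
    }
    return err;
}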

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Oops...

Yes, the correct forum post is:

https://developer.apple.com/forums/thread/762661

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

My filesystem's vnop_rmdir calls cache_purge() and vnode_recycle() on RPC returning success.

The v_label associated with the vnode of the directory being removed is set to NULL and free()'d via the vnode_recycle() call path: vnode_recycle()->vnode_drop_and_unlock()->vnode_drop_internal():

vnode_drop_internal():

vp->v_label = NULL;

if (tmpl) {
        mac_vnode_label_free(tmpl);
}

So, client A, residing on machine-a, is still referencing the vnode that just got recycled by client-B residing on machine-b.

Seeing that mac_vnode_check_getattrlist() gets called before vnode_getattr(), which is what calls my filesystem's vnop_getattr(), how do I make client A, and any other machine with a mount point to my filesystem, become aware of vnodes getting recycled by other clients on the grid?

I've observed the following behaviour both in Finder and on the command line with respect to my initial query.

In Finder, I have to switch between windows for vnop_getattr() to be called on the parent directory and the filesystem changes made on a different machine to be picked up.

On the command line, running % ls -ld /parent/directory results in vnop_getattr() being called and the changes being picked up.

The same behaviour is observed when, e.g., writing files. The file size doesn't get updated in real time. I have to switch between windows for file size changes to be picked up.

What am I missing? Thanks.

Accepted Answer

First off, I think it's important to clarify the vocabulary here. I'm borrowing the vocabulary below from MFSLives and I would strongly recommend that you download and review it closely, particularly "HashNode.h".

In any case, it's built around the idea that there are three different components any VFS driver interacts with:

  1. The "disk", meaning the actual underlying "static" data. In a standard block storage file system, these are the blocks that are actually written to physical media, in a network or distributed file system they're something else.

  2. FSNodes, meaning the data your VFS driver has in memory about its "native" format.

  3. vnodes, meaning the data your VFS driver is currently "sharing" with the larger system.

Note that within this framework:

  • Every vnode is backed by an FSNode, since the FSNode is both the source of any data the vnode contains AND (in most cases) the mechanism you would use to push data back to disk.

  • Every FSNode does not (necessarily) have a corresponding vnode. Your file system will often acquire information from the disk that isn't (currently) needed by any vnode. Similarly, the system may recycle a vnode while you still have changes which haven't been pushed to disk.

  • The fundamental "pipeline" your driver is built around is the process of exporting data from disk -> FSNode -> vnode and committing data "back" from vnode -> FSNode -> disk.
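To make that split concrete, here's a minimal sketch of an FSNode in the spirit of MFSLives' "HashNode.h"; every name below is invented:

#include <sys/queue.h>
#include <sys/time.h>
#include <sys/vnode.h>

/* Invented example of the FSNode side of the disk -> FSNode -> vnode split. */
struct my_fsnode {
    LIST_ENTRY(my_fsnode) hash_link; /* linkage in the driver's FSNode table */
    uint64_t        file_id;         /* identity of the object on "disk"     */
    uint64_t        size;            /* cached native-format metadata...     */
    struct timespec mtime;           /* ...refreshed from the server         */
    vnode_t         vp;              /* NULLVP while no vnode is attached    */
};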

This architecture is important, because it prevents you from falling into traps like this:

So, client A, residing on machine-a, is still referencing the vnode that just got recycled by client-B residing on machine-b.

Modifying a vnode doesn't actually generate any "fundamental" change. Modifications to vnodes modify FSNodes, but it's your job to decide what that actually means. In a classic file sharing system, you basically have something like this:

  • Client vnode modifications modify the FSNodes of the local client.

  • The server operates as the central disk/truth, receiving changes from each client and exporting changes out to each client.

  • How the server exports those changes is ENTIRELY up to it. You asked:

how do I make client A, and any other machine with a mount point to my filesystem, become aware of vnodes getting recycled by other clients on the grid?

...and the answer is basically "you do this however you want to do this". You need to decide how your file system is going to behave and then implement that logic. When you see issues like this:

In Finder, I have to switch between windows for vnop_getattr() to be called on the parent directory and the filesystem changes made on a different machine to be picked up.

The answer is simply that you never told the system anything had changed, so nothing changed. Within the VFS API itself, vnode_notify() is how your driver notifies the system that a vnode has changed. In a remote file system, that's normally combined with VNOP_MONITOR so your file system can restrict what it's actually monitoring for changes.
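As a hedged sketch of the driver side (vnode_notify() and the VNODE_EVENT_* masks are the public KPI; the my_fs_* callback and the exact event mask chosen are assumptions):

#include <sys/vnode.h>

/* Invoked by your driver when the server reports that another client
 * removed a directory entry; invented plumbing around the real KPI. */
static void
my_fs_remote_remove_note(vnode_t dvp, vnode_t vp)
{
    struct vnode_attr va;

    VATTR_INIT(&va);

    /* The child itself is gone... */
    vnode_notify(vp, VNODE_EVENT_DELETE, &va);

    /* ...and the parent directory's contents changed. */
    vnode_notify(dvp, VNODE_EVENT_WRITE | VNODE_EVENT_DIR_REMOVED, &va);
}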

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for a very detailed response.

Regrettably, VNOP_MONITOR is off limits to 3rd party filesystems, as it's a XNU_KERNEL_PRIVATE symbol.

I'll have to find another way of dealing with this. Thanks.

Regrettably, VNOP_MONITOR is off limits to 3rd party filesystems, as it's a XNU_KERNEL_PRIVATE symbol.

No, or at least not exactly.

You're right that "VNOP_MONITOR" is marked XNU_KERNEL_PRIVATE, but that's because most of the "VNOP_*" defines are marked that way. That includes basic (and critical) operations like "VNOP_OPEN". That's because the VFS API is structured and documented as follows:

-The "VNOP_*" functions are the functions the VFS system itself "calls" into. For example, here is a call to VNOP_MONITOR inside the VFS implementation.

-Every "VNOP_*" function is actually implemented as a wrapper function to a function pointer your VFS driver provides. Here's the implementation of VNOP_MONITOR inside kpi_vfs.c.

-Every "VNOP_*" function has a corresponding argument structure and descriptor, which are what your vfs driver actually implement. You can find them just above the entry for the "VNOP_" definition.

-Those are what your vfs actually implement. Here's the structure entry in smbfs and the actual function call that implements the actual vnop.

The basic structure above is how most of the VNOP functions are marked and with the (very) OCCASIONAL exceptions being VNOPs that are considered "useful" outside of the vfs layers internal implementation.
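To illustrate that pattern, here's a minimal sketch of the table a driver registers so the VNOP_* wrappers can dispatch into it (the my_* names are invented; the vnodeopv_* types, the vnop_*_desc descriptors, and vn_default_error come from the Kernel.framework headers):

#include <sys/vnode.h>
#include <sys/vnode_if.h>

static int my_vnop_open(struct vnop_open_args *ap);  /* your handler */

/* The function-pointer vector the VNOP_* wrappers call through. */
static int (**my_vnodeop_p)(void *);

static struct vnodeopv_entry_desc my_vnodeop_entries[] = {
    { &vnop_default_desc, (int (*)(void *))vn_default_error },
    { &vnop_open_desc,    (int (*)(void *))my_vnop_open },
    /* ...one entry per vnop your driver implements... */
    { NULL, NULL }
};

static struct vnodeopv_desc my_vnodeopv_desc = {
    &my_vnodeop_p, my_vnodeop_entries
};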

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

vnode_if.h distributed with Kernel.framework does not define struct vnop_monitor_args:

% grep -c 'struct vnop_monitor_args' `xcrun --show-sdk-path`/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys/vnode_if.h
0

Because it's defined as a symbol that is private to the kernel. Even if I downloaded vnode_if.h from the kernel sources, I wouldn't be able to link against com.apple.kpi.private, would I?

All the other vnop_*_args structures that aren't defined as kernel private are available in the vnode_if.h distributed with Kernel.framework:

% awk '/^struct vnop_.*_args/{print $2}' `xcrun --show-sdk-path`/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys/vnode_if.h | sort
vnop_access_args
vnop_advlock_args
vnop_allocate_args
vnop_blktooff_args
vnop_blockmap_args
vnop_bwrite_args
vnop_clonefile_args
vnop_close_args
vnop_copyfile_args
vnop_create_args
vnop_exchange_args
vnop_fsync_args
vnop_getattr_args
vnop_getattrlistbulk_args
vnop_getnamedstream_args
vnop_getxattr_args
vnop_inactive_args
vnop_ioctl_args
vnop_kqfilt_add_args
vnop_kqfilt_remove_args
vnop_link_args
vnop_listxattr_args
vnop_lookup_args
vnop_makenamedstream_args
vnop_mkdir_args
vnop_mknod_args
vnop_mmap_args
vnop_mmap_check_args
vnop_mnomap_args
vnop_offtoblk_args
vnop_open_args
vnop_pagein_args
vnop_pageout_args
vnop_pathconf_args
vnop_read_args
vnop_readdir_args
vnop_readdirattr_args
vnop_readlink_args
vnop_reclaim_args
vnop_remove_args
vnop_removenamedstream_args
vnop_removexattr_args
vnop_rename_args
vnop_renamex_args
vnop_revoke_args
vnop_rmdir_args
vnop_searchfs_args
vnop_select_args
vnop_setattr_args
vnop_setlabel_args
vnop_setxattr_args
vnop_strategy_args
vnop_symlink_args
vnop_verify_args
vnop_whiteout_args
vnop_write_args

The SMB client is Apple's own implementation, and therefore private symbols are available to it. That is not the case with the filesystem I'm implementing.

Or were you suggesting that there is a way to access those symbols from 3rd party filesystems?

vnode_if.h distributed with Kernel.framework does not define struct vnop_monitor_args:

Huh. Please file a bug on that, as I don't really see any clear reason why it wasn't included in the public set.

Or were you suggesting that there is a way to access those symbols from 3rd party filesystems?

No, you shouldn't use it unless it's public.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Bug report FB16082106 submitted. Thanks.

With VNOP_MONITOR() still unavailable, I thought I'd explore other alternatives.

Passing VFS_TBLNOMACLABEL into vfs_fsadd() has revealed that the issue was with the vnode being NULL when accessing directories that were removed on other client machines.

Further debugging has confirmed that the vnode was being set to NULL via a call to cache_enter() with cn_flags set to MAKEENTRY in vnop_lookup() on receiving a response from the server saying that the directory was no longer there.

dtrace(1)-ing vnodes' MAC labels showed that for my filesystem they were NULL at all times, whether or not VFS_TBLNOMACLABEL was set.

My filesystem being distributed, I have the benefit of querying the server about the presence of files or directories in my vnop_lookup().

What I attempted next was to modify my lookup algorithm to remove the local vnode that represents a file or directory the server reports as no longer existing, i.e. one removed by another client machine.

Below is pseudocode for the part of my lookup algorithm that deals with negative server responses.

ret = query_server(item);
if (ret == ENOENT) {
    if (ISSET(cnp->cn_flags, ISLASTCN)) {
        err = cache_lookup(cnp);
        if (err == CACHE_HIT) {
            /* item rm'd elsewhere: remove the local vnode */
            vn_revoke(vp);
            cache_purge(dvp);
            vnode_update_identity(VNODE_UPDATE_PURGE);
        } else if (err == ENOENT) {
            /* no cache entry: clear any stale negative entries */
            cache_purge_negatives(dvp);
        } else if (cnp->cn_nameiop == CREATE || cnp->cn_nameiop == RENAME) {
            /* lookup on behalf of create/rename: let the caller proceed */
            ret = EJUSTRETURN;
        } else if (ISSET(cnp->cn_flags, MAKEENTRY)) {
            /* cache the negative result */
            cache_enter(dvp, NULLVP, cnp);
        }
    }
}

vn_revoke() ends up calling vnop_reclaim(), which in turn does the following:

remove_np_from_hashtable();   /* drop the FSNode from my hash table   */
vnode_removefsref(vp);        /* release the vnode's FS reference     */
vnode_clearfsnode(vp);        /* detach the FSNode from the vnode     */
cache_purge(vp);              /* purge name cache entries for vp      */

The above does prevent kernel panics from happening when clicking on files or directories that were removed elsewhere in Finder. On stepping back to the parent directory, the removed item is no longer displayed.

However, it's still showing in the terminal when queried with ls(1) or stat(1).

I've instrumented relevant parts of code with debugging print statements, but I still don't understand why this is happening.

Any pointers on what I'm missing would be greatly appreciated.

My filesystem being distributed, I have the benefit of querying the server about the presence of files or directories in my vnop_lookup().

As a side comment, I think this sort of thinking is inherently dangerous in the context of a network file system. The problem with any (or at least "most") distributed file system is that:

  1. Multiple clients are interacting with the same data at the same time.

  2. Those clients have very limited (if any) visibility into each other's activity.

  3. The communication between server and client is very high latency compared to any local file system.

Taken together, that means that any given client can never ACTUALLY "know" the current state of any fs object, only the "past" state of that object at the point it happened to ask. In concrete terms, it's entirely possible that fs object "X" has ALREADY been deleted by the time your client receives the response back from the server telling it about fs object "X". This means that your primary focus shouldn't be on your VFS driver fetching data in response to specific events (like vnop_lookup) but on how your client finds out about and processes changes that have happened to vnode objects that are already active.

The diabolical part here is that this dynamic only matters once the file system is actually under "load" and multiple clients are generating changes to the common data set. That makes it very easy to build something that works fine under basic testing and falls apart completely under real world usage.

What I attempted next was to modify my lookup algorithm to remove the local vnode that represents a file or directory the server reports as no longer existing, i.e. one removed by another client machine.

Below is pseudocode for the part of my lookup algorithm that deals with negative server responses.

I think you've overlooked a lot of complexity here. For example, Unix semantics allow I/O to deleted files, which means you can't actually clear out your vnode until it's been completely closed.
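One hedged sketch of respecting that (vnode_isinuse(), vn_revoke(), REVOKEALL, and vfs_context_current() are real KPIs; the my_* helpers are invented):

#include <sys/vnode.h>

/* Called when the server reports that another client removed this object. */
static void
my_fs_handle_remote_delete(vnode_t vp)
{
    if (vnode_isinuse(vp, 0)) {
        /* Still open somewhere: mark the FSNode deleted and let
         * vnop_inactive/vnop_reclaim finish the teardown later. */
        my_fsnode_mark_deleted(vp);
    } else {
        vn_revoke(vp, REVOKEALL, vfs_context_current());
    }
}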

I've instrumented relevant parts of code with debugging print statements, but I still don't understand why this is happening.

In terms of seeing different behavior from similar operations, this is almost always caused by differences in syscall usage exercising different VNOPs. Breaking this down with the most "direct" cases in your example:

Case 1, running "stat" in Terminal:

However, it's still showing in the terminal when queried with ls(1) or stat(1).

The "stat" syscall will even eventually call into VNOP_GETATTR against a specific object, so there is basically a 1 to 1 relationship between stat and VNOP_GETATTR.

Case 2, the Finder:

The above does prevent kernel panics from happening when clicking on files or directories that were removed elsewhere in Finder. On stepping back to the parent directory, the removed item is no longer displayed.

As a general rule, our higher level APIs (which the Finder is built on) fairly actively avoid stat/VNOP_GETATTR in favor of getattrlistbulk/VNOP_GETATTRLISTBULK (and/or possibly getdirentriesattr/VNOP_READDIRATTR).

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks very much for your response.

I did manage to get it to work by purging the local name cache, as well as the hash table where FSNodes are kept, in vnop_getattr() on receiving a notification from the server that a directory entry no longer exists.

I do appreciate that it's not an ideal solution, though, for the reasons you described.

Apple have made VNOP_MONITOR() available as a public symbol in macOS 15.4 beta in response to the bug report I submitted, FB16082106.

Is there anywhere I could look for VNOP_MONITOR()+vnode_notify() usage examples other than the SMBClient and Darwin/XNU sources as it's not immediately clear how remote filesystem changes are propagated across all connected clients through the use of those functions?

Apple have made VNOP_MONITOR() available as a public symbol in macOS 15.4 beta in response to the bug report I submitted, FB16082106.

Yep. It doesn't always happen, but the bug process can work.

Is there anywhere I could look for VNOP_MONITOR()+vnode_notify() usage examples other than the SMBClient and Darwin/XNU sources as it's not immediately clear how remote filesystem changes are propagated across all connected clients through the use of those functions?

A few different answers:

  • In terms of fs drivers, VNOP_MONITOR is used by NFS and AFP as well as SMB, but I'm not sure those will show you anything fundamentally different.

  • In terms of what vnode_notify() actually "does", the vfs system automatically generates* FSEvents based on its own interactions with the VFS driver. vnode_notify() lets a vfs driver manually trigger that process.

  • The basic "flow" here is that VNOP_MONITOR is how the vfs system tells your driver what the larger system is currently "interested" in and vnode_notify() is how your driver tells the system about whatever changes you want to share.

  • Note that there isn't necessarily a direct connection between the two APIs. That is, you can call vnode_notify() on directories that aren't being monitored, and there isn't any way for the system to "know" that a VNOP_MONITOR'd location had changes it wasn't told about. As a concrete example, something like a high performance SAN file system might ignore VNOP_MONITOR but vnode_notify() "everything", since that's effectively how local file systems work.

*That's all that's required for a standard block file system (HFS+/APFS/FAT/etc.) because, by definition, all of the modifications to the file system either came through the VFS system or don't generate FSEvents (for example, APFS snapshots).
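For the driver side, here's a minimal hedged sketch of a vnop_monitor handler (the vnop_monitor_args layout and the VNODE_MONITOR_* flags are from xnu's vnode_if.h; the my_fs_server_* helpers stand in for your RPC layer):

#include <sys/vnode_if.h>

static int
my_vnop_monitor(struct vnop_monitor_args *ap)
{
    if (ap->a_flags & VNODE_MONITOR_BEGIN) {
        /* First watcher: ask the server to push change notifications,
         * which the driver then surfaces via vnode_notify(). */
        return my_fs_server_subscribe(ap->a_vp, ap->a_events);
    }
    if (ap->a_flags & VNODE_MONITOR_END) {
        /* Last watcher gone: tear the subscription down. */
        return my_fs_server_unsubscribe(ap->a_vp);
    }
    return 0;  /* VNODE_MONITOR_UPDATE: nothing to do in this sketch */
}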

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
