Kernel panic in mac_label_verify()

Accessing a directory on my custom distributed filesystem results in a kernel panic.

According to the backtrace, the last function called before the panic is triggered is mac_label_verify().

See the backtrace file attached.

The panic manifests itself given the following conditions:

  1. Machine-a: make a directory in Finder.
  2. Machine-b: remove the directory created on machine-a in Finder.
  3. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues.

The panic is reproducible on both Apple Silicon and x86-64.

The backtrace is for x86-64 as I wasn't able to symbolicate it on Apple Silicon.

Not sure how to tackle this one.

Any pointers would be much appreciated.

First off, a quick note on this point:

I wasn't able to symbolicate it on Apple Silicon.

There's a difference in the load location of KEXTs that our current tools don't account for, but this forum post shows how you can account for that.

Moving to the panic itself:

0xffffffba572afbf0 : 0xffffff8007fbc833 mach_kernel : _mac_vnode_check_getattrlist + 0xb3

The open source code for this is in mac_vfs.c. That code calls into mac_vnode_label(), which dereferences the label field of the vnode and then calls mac_label_verify().

0xffffffba572afb00 : 0xffffff8007fafdf4 mach_kernel : _mac_label_verify + 0x4

The open source code for this is in mac_label.c, and you're panicking on the first line of the function, where it dereferences the labelp argument.

struct label *
mac_label_verify(struct label **labelp)
{
	struct label *label = *labelp;

Basically, you're dealing with some kind of memory corruption.

Moving to your reproduction steps:

  1. Machine-a: make a directory in Finder.
  2. Machine-b: remove the directory created on machine-a in Finder.
  3. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues.

In VFS terms, a valid vnode existed at #1 (since that's how the directory was created), and somewhere between #2 and #3 your code failed to properly manage that vnode, damaging its label, which then caused the panic you're seeing. Note that I don't think the Finder itself is relevant here, as I'd expect you to see exactly the same failure if you directly created a directory (#1), removed it on the remote machine (#2), and then called getattrlist (#3) on it. I suspect the key issue here is actually that the vnode from #1 is still in the cache, not the specifics of how it's manipulated.
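To take the Finder out of the picture, step #3 can be driven from a small user-space helper. The following is only a sketch: probe_attrs() is a hypothetical name, and the choice of ATTR_CMN_OBJTYPE is an assumption, since any single common attribute should do. The non-macOS branch exists only so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#ifdef __APPLE__
#include <stdint.h>
#include <sys/attr.h>
#include <unistd.h>
#else
#include <sys/stat.h>
#endif

/* Hypothetical helper: asks the kernel for one attribute of `path`.
 * Returns 0 on success or the errno value on failure. */
static int probe_attrs(const char *path)
{
    assert(path != NULL);
#ifdef __APPLE__
    struct attrlist request = {0};
    request.bitmapcount = ATTR_BIT_MAP_COUNT;
    request.commonattr = ATTR_CMN_OBJTYPE;

    /* Reply layout: 4-byte total length, then the requested attribute. */
    struct {
        uint32_t length;
        fsobj_type_t objtype;
    } __attribute__((aligned(4), packed)) reply;

    if (getattrlist(path, &request, &reply, sizeof(reply), 0) != 0)
        return errno;
#else
    /* Not macOS: stat() only keeps the sketch compilable. It does not
     * exercise the getattrlist path discussed above. */
    struct stat st;
    if (stat(path, &st) != 0)
        return errno;
#endif
    return 0;
}
```

On machine-a, create the directory with mkdir(2), remove it from machine-b, then call probe_attrs() on the same path from machine-a. If the panic still occurs, the Finder is not a factor.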

The next step here is to look very closely at exactly what #2 "does". Some suggestions on that:

  • Start by simply inspecting the full code "flow" between #2 and #3, looking for any problems. Sometimes a basic review with a narrower focus is enough to get you to the problem, and a bit of luck can save a ton of time.

If that doesn't work and you need to dig into the issue more deeply:

  • Verify you understand the "flow" here correctly. For example, my theory assumes that the vnode from #1 is the same vnode as #3, but I haven't proven that. Take the time to print out the vnode and "prove" these details. It's very easy to end up wasting a lot of investigation time because you've started with a set of assumptions about your code's state that are simply wrong.

  • How did Machine-a "discover" that the directory was gone? Did Machine-b proactively "inform" Machine-a of the removal, or was there an earlier access where the file was determined to be "gone"?

  • What did Machine-a actually "do" (particularly to the vnode) when it was told the directory was gone?

  • Print debugging is a critical tool here. You know that the vnode was modified and you know what field was modified. In theory, if you added a check that compared the field value at entry and exit of each function, then the first function that changed that value would be the point where the failure started.
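The entry/exit comparison from the last suggestion could be sketched roughly as follows. This is plain user-space C with made-up names (watched_node, watch_enter(), watch_exit(), and the two fs_* functions are all hypothetical). It is not KEXT code, and since struct vnode is opaque to a KEXT, how you actually read the field you care about is left open.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the state being watched. */
struct watched_node {
    uintptr_t label;   /* the field that ends up damaged */
};

/* First function observed changing the field; NULL while none has. */
static const char *first_offender = NULL;

/* Snapshot taken at function entry. */
struct field_watch {
    const struct watched_node *node;
    uintptr_t value_at_entry;
    const char *func;
};

/* Call at function entry: remember the field's current value. */
static struct field_watch watch_enter(const struct watched_node *node,
                                      const char *func)
{
    assert(node != NULL);
    struct field_watch w = { node, node->label, func };
    return w;
}

/* Call at function exit: log and record the function if the value changed.
 * Returns 1 if the field changed, 0 otherwise. */
static int watch_exit(const struct field_watch *w)
{
    if (w->node->label == w->value_at_entry)
        return 0;
    printf("%s: label changed 0x%" PRIxPTR " -> 0x%" PRIxPTR "\n",
           w->func, w->value_at_entry, w->node->label);
    if (first_offender == NULL)
        first_offender = w->func;
    return 1;
}

/* Hypothetical entry point that leaves the field alone. */
static void fs_lookup(struct watched_node *node)
{
    struct field_watch w = watch_enter(node, __func__);
    /* ... real work that only reads the node ... */
    watch_exit(&w);
}

/* Hypothetical entry point with a simulated bug that clobbers the field. */
static void fs_handle_remote_remove(struct watched_node *node)
{
    struct field_watch w = watch_enter(node, __func__);
    node->label = 0xdeadbeef;
    watch_exit(&w);
}
```

Because an inner function exits before its caller, nested calls record the innermost function that changed the value, which is usually the one you want to look at first.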

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Oops...

Yes, the correct forum post is:

https://developer.apple.com/forums/thread/762661

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
