Pinpointing dangling pointers in 3rd-party KEXTs

I'm debugging the following kernel panic to do with my custom filesystem KEXT:

panic(cpu 0 caller 0xfffffe004cae3e24): [kalloc.type.var4.128]: element modified after free (off:96, val:0x00000000ffffffff, sz:128, ptr:0xfffffe2e7c639600)

My reading of this is that somewhere in my KEXT I'm holding a stale reference 0xfffffe2e7c639600 to a 128-byte zone element, and that 0x00000000ffffffff was written at offset 96 after that particular chunk of memory had been released and zeroed out by the kernel.

The panic itself is emitted when my KEXT requests the memory chunk that's been tampered with, via the following set of calls.

zalloc_uaf_panic()

__abortlike
static void
zalloc_uaf_panic(zone_t z, uintptr_t elem, size_t size)
{
...
	(panic)("[%s%s]: element modified after free "
	"(off:%d, val:0x%016lx, sz:%d, ptr:%p)%s",
	zone_heap_name(z), zone_name(z),
	first_offs, first_bits, esize, (void *)elem, buf);
...
}

zalloc_validate_element()

static void
zalloc_validate_element(
	zone_t                  zone,
	vm_offset_t             elem,
	vm_size_t               size,
	zalloc_flags_t          flags)
{
...
	if (memcmp_zero_ptr_aligned((void *)elem, size)) {
		zalloc_uaf_panic(zone, elem, size);
	}
...
}

The panic is triggered if memcmp_zero_ptr_aligned(), which is implemented in assembly, detects that an n-byte chunk of memory is no longer all zeros, i.e. that it was written to after being freed.

/* memcmp_zero_ptr_aligned() checks string s of n bytes contains all zeros.
 * Address and size of the string s must be pointer-aligned.
 * Return 0 if true, 1 otherwise. Also return 0 if n is 0.
 */
extern int
memcmp_zero_ptr_aligned(const void *s, size_t n);
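
For my own understanding, a portable C approximation of what that routine checks (my reading of the documented contract above, not the real assembly) would be something like:

#include <stddef.h>
#include <stdint.h>

/* Rough C equivalent of the documented behaviour: return 0 if the
 * pointer-aligned region s of n bytes contains only zeros, 1 otherwise. */
static int
memcmp_zero_ptr_aligned_sketch(const void *s, size_t n)
{
	const uintptr_t *p = (const uintptr_t *)s;
	size_t words = n / sizeof(uintptr_t);

	for (size_t i = 0; i < words; i++) {
		if (p[i] != 0) {
			return 1;	/* someone wrote here after the free */
		}
	}
	return 0;
}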

Normally, KASAN would be the tool to reach for here. However, the KDK README states that KASAN kernels won't load on Apple Silicon. Attempting to follow the README's instructions for Intel-based machines does indeed fail for me on Apple Silicon.

I then stumbled on the Pishi project. But the custom boot kernel collection that gets created doesn't contain any of the KEXTs that were passed to kmutil(8) via the --explicit-only flag, so my KEXT can't be instrumented in Ghidra. This is confirmed by running:

% kmutil inspect -B boot.kc.kasan
boot kernel collection at /Users/user/boot.kc.kasan
(AEB8F757-E770-8195-458D-B87CADCAB062):

Extension Information:

I'd appreciate any pointers on how to tackle UAFs in kernel space.


Normally, KASAN would be the tool to reach for here. However, the KDK README states that KASAN kernels won't load on Apple Silicon. Attempting to follow the README's instructions for Intel-based machines does indeed fail for me on Apple Silicon.

Have you tried testing on Intel, either on real hardware or running in a VM?

Beyond that:

I'd appreciate any pointers on how to tackle UAFs in kernel space.

How reproducible is the issue? A lot of your options here depend on how much time/work it takes to create the failure and how "real world" the testing environment needs to be.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Have you tried testing on Intel, either on real hardware or running in a VM?

I haven't. I don't have access to an Intel-based Mac at the moment. And I don't have a VM readily available.

The following write-up[1] on instrumenting KEXTs claims that being able to load a KASAN kernel doesn't mean that one's own KEXT would get instrumented as well. Is that an accurate statement?

[1] https://r00tkitsmm.github.io/fuzzing/2025/04/10/Pishi2.html

How reproducible is the issue?

One issue is reproducible only when mounting my filesystem on an M4 Max machine; it doesn't occur on the M1 or M2 machines I tried.

Another UAF issue is reproducible universally, though: it manifests itself when attempting to open multiple PDFs at the same time.

Normally, KASAN would be the tool to reach for here. However, the KDK README states that KASAN kernels won't load on Apple Silicon

After some discussion with the engineering team, it turns out that the information above isn't necessarily true. I haven't tested this and I can't provide formal directions on the process, but I believe the instructions at the end of this forum post do work:

https://kernelshaman.blogspot.com/2021/02/building-xnu-for-macos-112-intel-apple.html

Note that I'm only suggesting using the instructions at the end of that post to install our debug/kasan kernels from the KDK, not that you actually build XNU (the primary focus of that article).

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the link.

I was able to build a bootable Kext Collection as instructed in the post you referenced.

I was then able to boot into a KASAN instrumented kernel on my Apple Silicon machine.

On reproducing a kernel panic via a UAF, I got the following symbolication, which I didn't find useful in identifying the source of the UAF in my KEXT.

% symbolicateKernelPanicBacktrace.sh ~/2025-09-24-114045.kernel.core.kasan.myfs.uninstrumented.log /System/Volumes/Data/Library/Developer/KDKs/KDK_12.5.1_21G83.kdk/System/Library/Kernels/kernel.kasan.t8101
ASCII text
panic(cpu 2 caller 0xfffffe0024926790): KASan: UaF of quarantined object 0xfffffe167506f880
handle_debugger_trap (in kernel.kasan.t8101) (debug.c:1431)
kdp_trap (in kernel.kasan.t8101) (kdp_machdep.c:363)
sleh_synchronous (in kernel.kasan.t8101) (sleh.c:854)
fleh_synchronous (in kernel.kasan.t8101) + 40
DebuggerTrapWithState (in kernel.kasan.t8101) (debug.c:662)
panic_trap_to_debugger (in kernel.kasan.t8101) (debug.c:1074)
Assert (in kernel.kasan.t8101) (debug.c:688)
ubsan_json_init.cold.1 (in kernel.kasan.t8101) (ubsan.c:0)
asan.module_ctor (in kernel.kasan.t8101) + 0
kasan_crash_report (in kernel.kasan.t8101) (kasan-report.c:136)
kasan_violation (in kernel.kasan.t8101) (kasan-report.c:0)
kasan_free_internal (in kernel.kasan.t8101) (kasan-classic.c:815)
kasan_free (in kernel.kasan.t8101) (kasan-classic.c:843)
kfree_zone (in kernel.kasan.t8101) (kalloc.c:2416)
kfree_ext (in kernel.kasan.t8101) (kalloc.c:0)
IOFree_internal (in kernel.kasan.t8101) (IOLib.cpp:360)
__asan_global_.str.102 (in kernel.kasan.t8101) + 8
nx_netif_na_txsync (in kernel.kasan.t8101) (nx_netif.c:1682)
netif_ring_tx_refill (in kernel.kasan.t8101) (nx_netif.c:4025)
nx_netif_na_txsync (in kernel.kasan.t8101) (nx_netif.c:1708)
netif_transmit (in kernel.kasan.t8101) (nx_netif.c:3770)
nx_netif_host_output (in kernel.kasan.t8101) (nx_netif_host.c:0)
dlil_output (in kernel.kasan.t8101) (dlil.c:6776)
ip_output_list (in kernel.kasan.t8101) (ip_output.c:1626)
tcp_ip_output (in kernel.kasan.t8101) (tcp_output.c:0)
tcp_output (in kernel.kasan.t8101) (tcp_output.c:2713)
tcp_input (in kernel.kasan.t8101) (tcp_input.c:0)
ip_proto_dispatch_in (in kernel.kasan.t8101) (ip_input.c:0)
ip_input (in kernel.kasan.t8101) (ip_input.c:0)
proto_input (in kernel.kasan.t8101) (kpi_protocol.c:0)
ether_inet_input (in kernel.kasan.t8101) (ether_inet_pr_module.c:221)
dlil_ifproto_input (in kernel.kasan.t8101) (dlil.c:5696)
dlil_input_packet_list_common (in kernel.kasan.t8101) (dlil.c:6121)
dlil_input_thread_cont (in kernel.kasan.t8101) (dlil.c:3169)
Call_continuation (in kernel.kasan.t8101) + 216

I also was able to instrument my KEXT as described in the Pishi project.

But the instrument_kext.py Ghidra script ended up garbling the LR register in my KC by overwriting the two most significant bytes of the address:

panic(cpu 7 caller 0xfffffe002d0984c0): Kernel data abort. at pc 0xfffffe002d46a9b0, lr 0xc8b2fe002d46a9ac (saved state: 0xfffffe3d227eed70)
	  x0:  0x0000000000447ed0 x1:  0xfffffe0030f6d1c0  x2:  0x0000000000000000  x3:  0xfffffe1017381410
	  x4:  0x00000000000000fd x5:  0x0000000000000000  x6:  0xfffffe002b81606c  x7:  0xfffffe3d227ee980
	  x8:  0x0000000000000000 x9:  0x64627135a6d70010  x10: 0x0000000000000005  x11: 0xfffffe1014d87e40
	  x12: 0xfffffe1014d7c000 x13: 0x0000000000000000  x14: 0x0000000000000000  x15: 0x0000000000000008
	  x16: 0x0000020077cba83c x17: 0xfffffe0030259920  x18: 0x0000000000000000  x19: 0x0000000000000000
	  x20: 0xfffffe167f157890 x21: 0xfffffe167f15617c  x22: 0x00000000e00002bd  x23: 0x0000000000000000
	  x24: 0x000000000014b4dc x25: 0x0000000000000000  x26: 0xfffffe167e137840  x27: 0x0000000000000000
	  x28: 0x0000000000000000 fp:  0xfffffe3d227ef140  lr:  0xc8b2fe002d46a9ac  sp:  0xfffffe3d227ef0c0
	  pc:  0xfffffe002d46a9b0 cpsr: 0x60401208         esr: 0x96000006          far: 0x0000000000000000

Debugger message: panic
Memory ID: 0x6
OS release type: User
OS version: 21G83
Kernel version: Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:18 PDT 2022; root:xnu_kasan-8020.141.5~2/KASAN_ARM64_T8101

This has just about exhausted my options for locating the source of the UAFs in my KEXT.

If you have any other practical advice to offer, it would be greatly appreciated.

How do kernel devs at Apple debug UAFs?

On reproducing a kernel panic via a UAF, I got the following symbolication, which I didn't find useful in identifying the source of a UAF in my KEXT.

Have you tried doing a full symbolication of all threads at the panic point (not just the thread that panicked)? It's possible to do this through lldb as well, but this forum post has a full run-through of how to manually symbolicate one of our panic logs (which include the stack of every process on the system).

There aren't any guarantees, but it's fairly common for the source of the free to be visible on other threads (either directly or "close" to that point).

If you have any other practical advice to offer, it would be greatly appreciated.

First off, how reproducible is the panic, particularly in terms of either time or particular "action"? I have a few different suggestions below, but almost all of them will work "better" the more reliably you can reproduce the problem.

Next, I'd focus on how the information you've learned changes what/where the panic might be occurring. Putting that another way, what do you now know ISN'T panicking? The KASAN trace shows the issue coming from the network stack, so presumably, anything that isn't interacting with the network stack isn't part of this panic (ignoring buffer underruns).

One technique I've had some success with is to try and turn this into more of a "search" problem instead of an investigation. For example, the nature of a UAF means that you wrote "something", so what happens if you... don't write? For instance, in a local file system, you can short-circuit the normal I/O process by simply discarding any incoming writes while returning "success" to the VFS layer. If the panic still occurs, then you just "proved" (to at least some degree) that this particular component ISN'T involved in the panic. With a bit of luck, you can find the problem by narrowing the range of possibility down to something you can directly monitor/analyze.
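
To make that concrete, a minimal sketch of such a stub for a VFS-based file system might look like the following. The function name is made up, you'd only wire it in temporarily for the experiment, and the point is just to consume the uio so the VFS layer sees a successful write while none of your own buffers are touched:

#include <sys/uio.h>
#include <sys/vnode.h>
#include <sys/vnode_if.h>

/* Temporary experiment only: accept the write, consume the uio so the
 * caller sees all bytes "written", but never touch the data or any of
 * the KEXT's own buffers.  If the UAF still reproduces with this in
 * place, the normal write path probably isn't the culprit. */
static int
myfs_vnop_write_discard(struct vnop_write_args *ap)
{
	uio_t uio = ap->a_uio;

	uio_update(uio, (user_size_t)uio_resid(uio));
	return 0;	/* report success to the VFS layer */
}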

As another approach, conceptually, the "basic" information you need to solve an issue like this isn't actually that complicated. If you had a list of every address your KEXT interacted with and the points that interaction occurred, then "all" you'd need to do is match that address list with the address you'd panicked at and you'd have found the problem. Unfortunately, the problem here is that you're routinely interacting with so many addresses that you can't really "export" such a list, as it would simply be too unwieldy to create/manage/export, not to mention the performance distortions it creates.

However, if you can "narrow" the scope of the panic enough, you might be able to collect exactly the kind of list I'm describing by not starting the collection process until just before you "know" the problem is going to happen.
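
As a sketch of what I mean (names and sizes here are entirely illustrative), a tiny fixed-size interaction log that stays off until just before the failure window could look like:

#include <stdint.h>
#include <libkern/OSAtomic.h>

#define MYFS_TRACE_SLOTS 4096		/* arbitrary; size to taste */

struct myfs_trace_entry {
	void		*addr;		/* pointer the KEXT touched */
	uint32_t	 site;		/* tag identifying the code location */
};

static struct myfs_trace_entry	myfs_trace[MYFS_TRACE_SLOTS];
static volatile SInt32		myfs_trace_next;
static volatile SInt32		myfs_trace_enabled;	/* set to 1 just before the repro */

static inline void
myfs_trace_touch(void *addr, uint32_t site)
{
	if (!myfs_trace_enabled) {
		return;
	}
	uint32_t idx = (uint32_t)OSIncrementAtomic(&myfs_trace_next) % MYFS_TRACE_SLOTS;
	myfs_trace[idx].addr = addr;
	myfs_trace[idx].site = site;
}

After the panic, you'd inspect that array in the core file and match its entries against the element address in the panic string.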

Another idea based on this:

My reading of this is that somewhere in my KEXT I'm holding a stale reference 0xfffffe2e7c639600 to a 128-byte zone element, and that 0x00000000ffffffff was written at offset 96 after that particular chunk of memory had been released and zeroed out by the kernel.

...

The panic itself is emitted when my KEXT requests the memory chunk that's been tampered with, via the following set of calls.

Since allocations are triggering the panic, hypothetically, you might be able to "catch" the panic earlier by sprinkling allocations and frees from exactly the same zone throughout your code. I'd actually do something like:

  • Allocate the memory.

  • Write some "marker" value to multiple locations in the buffer.

  • Free the buffer.

...possibly repeating that allocate/mark/free sequence several times at each site. If you strategically "sprinkle" your code with those calls, then with luck one of two things will happen:

  1. You'll trigger a panic inside your code, at which point you can try moving the allocation to further narrow the source.

  2. The buffer involved in whatever panic occurs outside your code will contain one of the markers you wrote above, giving you a different "hint" as to where the problem is originating.

I'd also look at trying to log the addresses of those allocations. I think the allocator is sufficiently deterministic (particularly if you actively minimize any other activity) that relative addresses of your allocations vs. the panic point might tell you something about what "part" of your KEXT is involved with the panic.
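
A rough sketch of that kind of probe is below. The name is made up, and be aware that a third-party KEXT's IOMalloc allocations won't necessarily land in the exact kalloc.type zone named in your panic, so treat the 128-byte size as a best-effort match rather than a guarantee:

#include <stdint.h>
#include <IOKit/IOLib.h>

/* Call this from suspect points in the KEXT.  Each call allocates an
 * element of the same size class as the panicking zone, stamps it with
 * a marker that encodes the call site, logs the address, and frees it. */
static void
myfs_uaf_probe(uint32_t site_tag)
{
	enum { kProbeSize = 128 };	/* size of kalloc.type.var4.128 elements */
	uint32_t *buf = (uint32_t *)IOMalloc(kProbeSize);

	if (buf == NULL) {
		return;
	}
	for (uint32_t i = 0; i < kProbeSize / sizeof(uint32_t); i++) {
		buf[i] = 0xC0DE0000u | (site_tag & 0xFFFFu);	/* recognizable marker */
	}
	IOLog("myfs: UAF probe site %u elem 0x%lx\n", site_tag, (unsigned long)(uintptr_t)buf);
	IOFree(buf, kProbeSize);
}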

How do kernel devs at Apple debug UAFs?

I actually posted that question on one of our internal communication channels and basically got a few answers:

  1. We try REALLY, REALLY hard to avoid this kind of bug. This is about maintaining a clear line between "my memory" (which a component manages and doesn't let anyone else touch) and "other people’s memory" (meaning, memory it got from someone else), then validating those two different cases.

  2. When it works, KASAN is great!

  3. Slowly and painfully.

The truth is crashes like this are often inherently difficult and there isn't really any "fixed" way to find them.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
