Symbolicating kernel backtraces on Apple Silicon

I am able to symbolicate kernel backtraces for addresses that belong to my kext.

Is it possible to symbolicate kernel backtraces for addresses that lie beyond my kext and reference kernel code?

Answered by DTS Engineer in 836958022

Let me step back to the beginning and explain this in a bit more detail:

  1. When using "-l", what atos is actually "getting" there is the load address starting from the first __TEXT segment in the executable.

  2. In arm64, (very generally) things shifted about code was loading out of __TEXT_EXEC, NOT _TEXT.

SO, to correctly symbolicate, you need to properly shift the load address to account for that. Starting with loadable KEXTs, this is fairly easy. They're __TEXT segment is at 0, so the math here is:

(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->

KEXTs also tend to have a fixed opening __TEXT size, so the value of __TEXT_EXEC.vmaddr ends up being "0x4000"

0x4000 - 0 -> 0x4000

That is where my original command came from:

atos -arch arm64e -o <symbol file path)  -l <KEXT load address - 0x4000>  <address to symbol>

NOTE: My earlier post was wrong, you need to subtract not add.

For kext, that load address comes directly from the panic log. Here is what that looks like in the panic from the other thread on this issue:

          lr: 0xfffffe0013723894  fp: 0x0000000000000000
      Kernel Extensions in backtrace:
         com.company.product(1.4.21d119)[92BABD94-80A4-3F6D-857A-3240E4DA8009]@0xfffffe001203bfd0->0xfffffe00120533ab
            dependency: com.apple.iokit.IOPCIFamily(2.9)[6D6666E6-340F-3A5E-9464-DE05164C0658]@0xfffffe0015e65e90->0xfffffe0015e93b3f

Now, the math is actually exactly the same for the kernel, but the numbers are totally different. Running otool -l on kernel.release.t8112 gets you:

__TEXT.vmaddr -> 
...
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 552
  segname __TEXT
   vmaddr 0xfffffe0007004000
   vmsize 0x0000000000100000
...

AND:

__TEXT_EXEC.vmaddr:
...
Load command 2
      cmd LC_SEGMENT_64
  cmdsize 312
  segname __TEXT_EXEC
   vmaddr 0xfffffe00072b8000
   vmsize 0x00000000008f0000
...

(You may notice there are multiple __TEXT_EXEC section, in this case all we care about is the first)

SO, plugging that that into our math above:

(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->
0xfffffe00072b8000 - 0xfffffe0007004000 -> 0x2B4000

That number is then subtracted from:

Kernel text exec base:  0xfffffe001cd8c000

0xfffffe001cd8c000 - 0x2B4000 -> 0xFFFFFE001CAD8000

Plugging that value into your atos command:

atos -o kernel.release.t8112 -arch arm64e -l 0xFFFFFE001CAD8000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04

Gives us:

handle_debugger_trap (in kernel.release.t8112) (debug.c:1839)
handle_uncategorized (in kernel.release.t8112) (sleh.c:1591)
sleh_synchronous (in kernel.release.t8112) (sleh.c:1477)
fleh_synchronous (in kernel.release.t8112) + 44
panic_trap_to_debugger (in kernel.release.t8112) (debug.c:1400)
panic_with_options_and_initiator (in kernel.release.t8112) (debug.c:1213)
0xfffffe001d86a708
0xfffffe001d86941c
0xfffffe001d866504
0xfffffe001e234bc8
sleh_irq (in kernel.release.t8112) (sleh.c:3840)
fleh_irq (in kernel.release.t8112) + 64
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
ubc_create_upl_kernel (in kernel.release.t8112) (ubc_subr.c:2643)
vnode_pageout (in kernel.release.t8112) (vnode_pager.c:407)
vnode_pager_data_return (in kernel.release.t8112) (bsd_vm.c:451)
vm_pageout_iothread_external_continue (in kernel.release.t8112) (vm_pageout.c:4237)
Call_continuation (in kernel.release.t8112) + 196

Note the references to "panic..." which are an easy way to validate that the symbolication is valid. By it's nature, the kernel cannot "crash" in the same sense any other process can, as the kernel IS the core component that controls all of the mechanisms that cause OTHER processes to crash. Because of that, "all" kernel panics end in a call to "panic" because that's the function that ends up generating the log data your looking at and then stopping normal execution.

Anticipating your question, no, I don't know why the other 4 frame aren't symbolicating, but I don't think it's an issue with atos. Apple obviously has our own internal symbolication tools and they're getting exactly the same result. Are they from your KEXT and you just didn't include the KEXT address info?

Having said that, they also don't really matter for this particular panic. That's because of this message at the top of the log:

panic(cpu 4 caller 0xfffffe001d86a708): watchdog timeout: no checkins from watchdogd in 90 seconds (25 total checkins since monitoring last enabled)

This message basically means exactly what it sounds like. There's system in place (managed by "watchdogd") to verify that user space that requires specific, critical system daemons to "checkin" with the kernel. If they fail to do so, then the system times out and the kernel panics. That dynamic also means that the specific stack traces is largely irrelevant, as it simply shows what happened to be executing at the time the panic occurred, NOT what actually caused the panic.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Is it possible to symbolicate kernel backtraces for addresses that lie beyond my kext and reference kernel code?

Yes, it should be. Each Kernel Debug Kit (KDKs) release contains symbols files for that particular system. You can find those download here.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for your response.

I've downloaded and extracted Kernel_Debug_Kit_15.4_build_24E248.dmg.

Here's what I've tried so far to sybolicate the sample kernel panic log.

Note: the kernel passed in to the -o flag of atos(1) is the one that is found in kernel.release.t8112.dSYM/Contents/Resources/DWARF/kernel.release.t8112.

The load address computed as the kernel base addresses given in the kernel panic log minus the corresponding slide

atos -o kernel.release.t8112 -arch arm64e -l FFFFFE0007004000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04

No output generated

The value of Kernel slide given in the kernel panic log passed to the -s flag

atos -o kernel.release.t8112 -arch arm64e -s 0000000014610000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04

No output generated

The value of KernelCache slide given in the kernel panic log passed to the -s flag

atos -o kernel.release.t8112 -arch arm64e -s 0000000014608000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04

No output generated

The value of Kernel text exec slide given in the kernel panic log passed to the -s flag

atos -o kernel.release.t8112 -arch arm64e -s 0000000015D88000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04

Output: Looks bogus

__doprnt.digits (in kernel.release.t8112) + 103398
bootstrap_pagetables (in kernel.release.t8112) + 329876
bootstrap_pagetables (in kernel.release.t8112) + 322192
table_10bits (in kernel.release.t8112) + 10400
__doprnt.digits (in kernel.release.t8112) + 101598
recvmsg_x (in kernel.release.t8112) + 332
IOMachPort::noMoreSendersForObject(OSObject*, unsigned int, unsigned int*) (in kernel.release.t8112) + 440
IOUserClient::init(OSDictionary*) (in kernel.release.t8112) + 0
IOServiceMessageUserNotification::MetaClass::~MetaClass() (in kernel.release.t8112) + 528
bootstrap_pagetables (in kernel.release.t8112) + 336156
table_10bits (in kernel.release.t8112) + 10520
ropagetable_end (in kernel.release.t8112) + 2688
ropagetable_end (in kernel.release.t8112) + 2688
ipsec_cleanup_family (in kernel.release.t8112) + 920
pf_test (in kernel.release.t8112) + 3700
ropagetable_begin (in kernel.release.t8112) + 324884
nfU16InvMasks (in kernel.release.t8112) + 112

Can you please instruct me how to symbolicate my sample kernel panic log properly.

Can you please instruct me how to symbolicate my sample kernel panic log properly.

I think you're running into this issue. The short summary of the situation here is:

  • KEXT changed on ARM64, which ends up shifting the load offset from base address you see in an unsymbolicated backtrace. You can find the actual offset by running this command and then looking for the value of "vmaddr" in __TEXT_EXEC in the output:
otool -l <path to kext executable>
...
cmd LC_SEGMENT_64
cmdsize 232
segname __TEXT_EXEC
vmaddr 0x0000000000004000
vmsize 0x0000000000004000
  • While the value is theoretically dynamic, in practice it's actually 0x4000. Practically speaking, I'd add that as a fixed value to the load address and only check vmaddr if atos returns something silly*. Here is what that atos command looks like:
atos -arch arm64e -o <symbol file path)  -l <load address + (0x4000/vmaddr)>  <address to symbol>

*Keep in mind that the vmaddr offset is large enough that atos will tend to return garbage, so any issue is unlikely to be subtle.

  • atos needs to be update (r.135959767) to take care of this itself, probably by adding a new flag option like "-textExecAddress" that you'd use instead of "-l". However, until that available you'll need to do the math yourself.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks.

Just a couple of points to clarify.

atos -arch arm64e -o <symbol file path)  -l <load address + (0x4000/vmaddr)>  <address to symbol>

In that particular panic log, there's not load address specified for my kext.

What address do I specify in that case?

0x4000/vmaddr is that divided by vmaddr or is that saying that 0x4000 is the value of vmaddr?

Accepted Answer

Let me step back to the beginning and explain this in a bit more detail:

  1. When using "-l", what atos is actually "getting" there is the load address starting from the first __TEXT segment in the executable.

  2. In arm64, (very generally) things shifted about code was loading out of __TEXT_EXEC, NOT _TEXT.

SO, to correctly symbolicate, you need to properly shift the load address to account for that. Starting with loadable KEXTs, this is fairly easy. They're __TEXT segment is at 0, so the math here is:

(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->

KEXTs also tend to have a fixed opening __TEXT size, so the value of __TEXT_EXEC.vmaddr ends up being "0x4000"

0x4000 - 0 -> 0x4000

That is where my original command came from:

atos -arch arm64e -o <symbol file path)  -l <KEXT load address - 0x4000>  <address to symbol>

NOTE: My earlier post was wrong, you need to subtract not add.

For kext, that load address comes directly from the panic log. Here is what that looks like in the panic from the other thread on this issue:

          lr: 0xfffffe0013723894  fp: 0x0000000000000000
      Kernel Extensions in backtrace:
         com.company.product(1.4.21d119)[92BABD94-80A4-3F6D-857A-3240E4DA8009]@0xfffffe001203bfd0->0xfffffe00120533ab
            dependency: com.apple.iokit.IOPCIFamily(2.9)[6D6666E6-340F-3A5E-9464-DE05164C0658]@0xfffffe0015e65e90->0xfffffe0015e93b3f

Now, the math is actually exactly the same for the kernel, but the numbers are totally different. Running otool -l on kernel.release.t8112 gets you:

__TEXT.vmaddr -> 
...
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 552
  segname __TEXT
   vmaddr 0xfffffe0007004000
   vmsize 0x0000000000100000
...

AND:

__TEXT_EXEC.vmaddr:
...
Load command 2
      cmd LC_SEGMENT_64
  cmdsize 312
  segname __TEXT_EXEC
   vmaddr 0xfffffe00072b8000
   vmsize 0x00000000008f0000
...

(You may notice there are multiple __TEXT_EXEC section, in this case all we care about is the first)

SO, plugging that that into our math above:

(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->
0xfffffe00072b8000 - 0xfffffe0007004000 -> 0x2B4000

That number is then subtracted from:

Kernel text exec base:  0xfffffe001cd8c000

0xfffffe001cd8c000 - 0x2B4000 -> 0xFFFFFE001CAD8000

Plugging that value into your atos command:

atos -o kernel.release.t8112 -arch arm64e -l 0xFFFFFE001CAD8000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04

Gives us:

handle_debugger_trap (in kernel.release.t8112) (debug.c:1839)
handle_uncategorized (in kernel.release.t8112) (sleh.c:1591)
sleh_synchronous (in kernel.release.t8112) (sleh.c:1477)
fleh_synchronous (in kernel.release.t8112) + 44
panic_trap_to_debugger (in kernel.release.t8112) (debug.c:1400)
panic_with_options_and_initiator (in kernel.release.t8112) (debug.c:1213)
0xfffffe001d86a708
0xfffffe001d86941c
0xfffffe001d866504
0xfffffe001e234bc8
sleh_irq (in kernel.release.t8112) (sleh.c:3840)
fleh_irq (in kernel.release.t8112) + 64
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
ubc_create_upl_kernel (in kernel.release.t8112) (ubc_subr.c:2643)
vnode_pageout (in kernel.release.t8112) (vnode_pager.c:407)
vnode_pager_data_return (in kernel.release.t8112) (bsd_vm.c:451)
vm_pageout_iothread_external_continue (in kernel.release.t8112) (vm_pageout.c:4237)
Call_continuation (in kernel.release.t8112) + 196

Note the references to "panic..." which are an easy way to validate that the symbolication is valid. By it's nature, the kernel cannot "crash" in the same sense any other process can, as the kernel IS the core component that controls all of the mechanisms that cause OTHER processes to crash. Because of that, "all" kernel panics end in a call to "panic" because that's the function that ends up generating the log data your looking at and then stopping normal execution.

Anticipating your question, no, I don't know why the other 4 frame aren't symbolicating, but I don't think it's an issue with atos. Apple obviously has our own internal symbolication tools and they're getting exactly the same result. Are they from your KEXT and you just didn't include the KEXT address info?

Having said that, they also don't really matter for this particular panic. That's because of this message at the top of the log:

panic(cpu 4 caller 0xfffffe001d86a708): watchdog timeout: no checkins from watchdogd in 90 seconds (25 total checkins since monitoring last enabled)

This message basically means exactly what it sounds like. There's system in place (managed by "watchdogd") to verify that user space that requires specific, critical system daemons to "checkin" with the kernel. If they fail to do so, then the system times out and the kernel panics. That dynamic also means that the specific stack traces is largely irrelevant, as it simply shows what happened to be executing at the time the panic occurred, NOT what actually caused the panic.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks. That makes perfect sense now.

I got the same symbolicated output on my system following your instructions.

Are they from your KEXT and you just didn't include the KEXT address info?

That particular kernel panic is reproducible whenever a very large zip file is being extracted over my filesystem.

With watchdog panics, is there a way of telling what caused watchdogd to emit a panic?

That particular kernel panic is reproducible whenever a very large zip file is being extracted over my filesystem.

With watchdog panics, is there a way of telling what caused watchdogd to emit a panic?

Not with certainty, but you may be able to get better "clarity". A few things to try:

  • I'm not sure exactly where you've been getting your panic log data, but the "full" log should also include a spindump of every thread on the system. The log is quite large and painful to symbolicate but if you file a bug with one of those full logs and post the number back here, I can tak a look and see if that offers any hints.

  • The KDK instructions also describe how to set up two machine debugging and/or a core dump server. You can also find a pretty good overview of the kernel debugging tool inside the XNU source.

You can also look at this from the general issue side as well. Going back to here:

...reproducible whenever a very large zip file is being extracted over my filesystem.

Particularly if you're not using our flash (reading between the lines of your panic tace), at it's core, zip extraction is basically just pure I/O. The data is being read from a source and written to a target, and the CPU itself is virtually guaranteed to be able to process (decompress) data faster than one of those two targets. SO, depending on the nature of the data and the devices, one of a limited set of things will occur:

  • Extraction will saturate the read source (because it's the slowest).

  • Extraction will saturate the VFS system (because the content is lots of small objects).

  • Extraction will saturate the write target (because it's the slowest).

A panic then occurs because some kind of stall/hang occurs which delays critical system function long enough that the watchdog panics. That typically because:

  1. Either of the storage devices ends up bottlenecking in a way that stalls other critical activity.

  2. "Something else" REACTS to what's happening in a way that generates the stall.

First off, the second point is what I would actually look at first, particularly by looking for any kind of Endpoint Security client. The ES API is unavoidably designed/positioned such that it can create EXACTLY this kind of failure, so there's no point trying to debug further until you've ruled on ES as the failure source.

Beyond that, you may also be able to debug this by looking at this as an hang/stall, NOT a panic. That is, if the system panics with a "1 GB" file but does NOT panic with a "900 MB" file, that is (probably) NOT because of any fundamental difference between those to sizes but simply because the smaller case finished before the watchdog gave up. Looking at the console log or running tools like spindump against the non-panic case may give you enough information to resolves the bottleneck, which will then prevent the final panic.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks a lot for your really helpful and detailed responses.

Two-machine debugging on Apple Silicon has certain limitations, which are worded in the KDK Readme as follows:

Note: Apple silicon doesn’t support active kernel debugging. You may inspect the current state of the kernel when it is halted due to a panic or NMI. However, you cannot set breakpoints, continue code execution, step into code, step over code, or step out of the current instruction.

x86-64 is better suited for remote kernel debugging. This particular panic doesn't occur on x86-64 though. Only on Apple Silicon.

I could try and set up a core dump server/client and see how that pans out.

Before I go ahead and file a bug report you suggested by way of submitting a full panic log, you mentioned about ES. In the full log, I found a reference to this:

"name":"AFKDCPEXTEndpoint9"},"1597":{"id":1597,"system_usec":104537416,"schedPriority":91,"state":["TH_RUN"],"snapshotFlags":["kKernel64_p","kThreadIOPassive","kThreadOnCore"],"user_usec":0,"kernelFrames":[[0,377776],[0,9311856],[10,23512],[10,18668],[10,6612],[11,6952],[0,1810716],[0,31032],[0,1395328],[0,1395328],[0,6300680],[0,6673156],[0,1049164],[0,1389844],[0,68612]],"basePriority":91,"lastMadeRunnableTime":3.1102084579999998,"userTime":0,"lastRunTime":3.1102656249999998,"systemTime":104.537416333,

Which corresponds to the panicked thread id:

Panicked thread: 0xfffffe31ffe31fc0, backtrace: 0xfffffe81838478d0, tid: 1597

That thread name, AFKDCPEXTEndpoint9, wouldn't a part of the ES API you were referring to, would it?

That thread name, AFKDCPEXTEndpoint9, wouldn't a part of the ES API you were referring to, would it?

It is not. I won't attempt to describe what it's actually "for" but it's part of a fairly general purpose architecture for interacting with coprocessors. It's possible that it does have some role in this case, but that role is very likely a secondary ("you did something that eventually caused it to stall, which then caused the panic") not primary ("it stalled on it's own, which then caused the panic").

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hello Kevin,

I was wondering if you've had a chance to take a look at FB17497530 yet?

Thanks.

I was wondering if you've had a chance to take a look at FB17497530 yet?

I spent an hour or so looking at it today and have a few thoughts.

First off, something is clearly happening in the VM system. I don't know what the issue might actually be, but 200+ thread are blocked in vm_page_wait and I didn't find any thread that was showing clear "progress" in the VM system.

How much space is available on the boot volume and is there ANY possibility That it's becoming constrained? Like most modern OSs, the system is going to "fail" if it actually exhausts available swap. There are systems in place to mitigate that (first by warning the user, then by terminating processes) but there isn't any way to guarantee that will work and it's possible that some nuance of your implementation is bypassing that those protections.

Related to that point, what's the content you're actually decompressing? One (or several) large files or a bunch of small ones? The thought that crossed my mind is that it's possible decompressing a very large file might cause a pre-allocation that would end up "suddenly" filling your boot system faster than the normal mechanism could react. That's not something I'd expect to be happening, but it's worth ruling out.

That particular kernel panic is reproducible whenever a very large zip file is being extracted over my filesystem.

Is your file system the source or the target for the zip file? I only found one VFS stack that looks like it might be your file system and it's a read, not a write call:

13524	Kernel Frames
0	kernel.release.t8112	0xfffffe0008acbaf4	thread_invoke + 1180	(sched_prim.c:3566)
1	kernel.release.t8112	0xfffffe0008aca59c	thread_block_reason + 244	(sched_prim.c:4183)
2	kernel.release.t8112	0xfffffe0008ba0ce4	vm_page_wait + 436	(vm_resident.c:5443)
3	kernel.release.t8112	0xfffffe0008ba29bc	vm_page_alloc_list + 96	(vm_resident.c:8581)
4	kernel.release.t8112	0xfffffe0008b562f8	kmem_alloc_guard_internal + 404	(vm_kern.c:908)
5	kernel.release.t8112	0xfffffe0008aa7298	kalloc_large + 360	(kalloc.c:2364)
6	kernel.release.t8112	0xfffffe0008fdbe40	_MALLOC + 68	(kern_malloc.c:119)
7	[15, 181932]
8	[15, 184316]
9	[15, 317620]
10	[15, 316216]
11	[15, 71024]
12	[15, 70676]
13	[15, 236844]
14	kernel.release.t8112	0xfffffe0008c7b994	cluster_io + 3416	(vfs_cluster.c:2336)
15	kernel.release.t8112	0xfffffe0008c8549c	advisory_read_ext + 756	(vfs_cluster.c:6258)
16	kernel.release.t8112	0xfffffe0008c857a4	cluster_read_ahead + 652	(vfs_cluster.c:2568)
17	kernel.release.t8112	0xfffffe0008c82d38	cluster_read_copy + 3524	(vfs_cluster.c:4942)
18	kernel.release.t8112	0xfffffe0008c81f28	cluster_read_ext + 376	(vfs_cluster.c:4456)
19	[15, 233920]
20	kernel.release.t8112	0xfffffe0008cc1eec	vn_read_common + 388	(vfs_vnops.c:1154)
21	kernel.release.t8112	0xfffffe0008cc1b6c	vn_read + 860	(vfs_vnops.c:1204)
22	kernel.release.t8112	0xfffffe0009022768	readv_internal + 560	(sys_generic.c:340)
23	kernel.release.t8112	0xfffffe0009022964	read + 300	(sys_generic.c:391)
24	kernel.release.t8112	0xfffffe000912aea0	unix_syscall + 772	(systemcalls.c:181)
25	kernel.release.t8112	0xfffffe0008beeb1c	sleh_synchronous + 888	(sleh.c:1257)
26	kernel.release.t8112	0xfffffe0008a3f8c0	fleh_synchronous + 44	
27	kernel.release.t8112	0xfffffe0008a3f884	fleh_dispatch64 + 412	

User Frames
0	libsystem_kernel.dylib	0x1804307dc	read + 8	
1	libarchive.2.dylib	0x18e446a00	client_read_proxy + 64	(archive_read.c:168)
2	libarchive.2.dylib	0x18e444e9c	__archive_read_filter_ahead + 628	(archive_read.c:1407)
3	libarchive.2.dylib	0x18e444c1c	__archive_read_ahead + 48	(archive_read.c:1340)
4	libarchive.2.dylib	0x18e4bad54	zip_read_data_none + 924	(archive_read_support_format_zip.c:1740)
5	libarchive.2.dylib	0x18e4b53f8	archive_read_format_zip_read_data + 512	(archive_read_support_format_zip.c:3189)
6	libarchive.2.dylib	0x18e446728	_archive_read_data_block + 192	(archive_read.c:989)
7	libarchive.2.dylib	0x18e4c9950	archive_read_data_block + 64	(archive_virtual.c:167)
8	libarchive.2.dylib	0x18e4f02e8	copy_data + 104	(archive_read_extract2.c:137)
9	libarchive.2.dylib	0x18e4f01f4	archive_read_extract2 + 208	(archive_read_extract2.c:99)
10	libarchive.2.dylib	0x18e46d588	archive_read_extract + 244	(archive_read_extract.c:74)

Related to that point, what's actually providing the storage for your file system? From the context in the bug, it sounds like this might be some form of network/remote file system, but I didn't find any obvious support daemon. Is it possible that some supporting component has failed?

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks very much for looking into this.

Here are the points you wanted clarified.

How much space is available on the boot volume and is there ANY possibility That it's becoming constrained?

That doesn't seem to be the case. See

what's the content you're actually decompressing? One (or several) large files or a bunch of small ones?

It is a bunch of files of varying sizes. Here's a

Is your file system the source or the target for the zip file?

It's actually both.

what's actually providing the storage for your file system? Is it possible that some supporting component has failed?

It is a network filesystem with I/O happening via RPC calls. Internally, the filesystem implements a socket polling algorithm that relies on the use of sock_upcall callback that's passed into sock_socket() on socket creation. The backing store on the remote server has been verified to have ample disk space available.

Do let me know if there's anything else I can provide that would help identify the cause of the problem.

Note: The message below is structured in the order I wrote it, but the later section on the VM system is probably what's most important. The short summary is that I think "something" you're doing is pushing the VM system past where it should be operating. The watchdog panic just happened to be how that underlying problem presented.

It is a bunch of files of varying sizes. Here's a

Yeah, that's only 60gb for free space, so you shouldn't be that "tight".

Is your file system the source or the target for the zip file?

It's actually both.

Interesting. A few questions on that point then:

  • Does the panic happen when you're just the source and/or destination or only when your "both"? I suspect it will happen when you're just the destination and won't happen when you''re just the source, but I'd like to confirm that.

  • How fast is the I/O path back to the server? Are you saturating that connection?

  • Is the I/O pipeline here simply "read compressed file from network-> decompress data-> write data out to network"? Or is there any intermediary in that process?

  • What actually makes it "back" to the server before the panic occurs? How much data were you actually able to write?

This is just a guess here, but the compression process here means there is an asymmetry between your effective read performance and your write performance. How does your write process actually "work"? Is there anything that would limit/constrain how much data you have pending (waiting to be sent over the network)? This is purely speculative, but I could imagine a situation where you run the machine out of memory by blindly accepting data to be written much faster than you're able to send the data out.

Do let me know if there's anything else I can provide that would help identify the cause of the problem.

First off, just to be clear, given that your file system is involved it's very likely that your file system is the underlying cause, even though the panic doesn't directly implicate you. There are a few different things I would look at here:

  • I would get Archive Utility out the process. My guess is that you can replicate that panic a basic tool that simply opens lot of files and writes a similar data volume at high speed, so that's probably where I'd start. One suggestion here is that I'd either generate random data or use the uncompressed data a the source, NOT "NULL" or simple patterns. If the data you work with is easily compressed, then the memory compressor could end up masking the problem.

  • Related to that point, I would write this as an app and have it write progress/diagnostic data to the the screen. Part of what makes this process difficult is getting data "close" to the point of failure, and writing to the screen means you can can data much closer to the point the panic actually happens.

  • I'd also try adding the Network Link Conditioner into the mix and see how that changes things. If my theory is write, then reducing network performance should make the problem worse (as the machine is forced to sit on more data waiting to be written). If you're not familiar with the Network Link Conditioner, it's a system preference that allows you artificially restrict the mac's network connection, simulating arbitrarily poor network conditions. You can find it in our "Additional Tools for Xcode", which you can download here.

The overall goal here is first get a clearer understanding of what's actually happening leading up to the panel and, if that doesn't provide clarity, allow you to reliably get the machine close "enough" to the point it panics that you can see what's going on while the machine is still running (so you can get diagnostic data).

Finally, a question here:

Internally, the filesystem implements a socket polling algorithm that relies on the use of sock_upcall callback that's passed into sock_socket() on socket creation.

My big question here is what memory is actually holding the file data that's waiting to be written to your server. My concern here is that this is happening directly in the kernel and/or is entangled with kernel memory in a way that ultimately forces the panic you're seeing. For reference, here is that state of memory from the panic:

active       695500
fileBacked   1390993
free         903
inactive     1
purgeable    0
speculative  695500
throttled    0
wired        100550

Note that both of these numbers jump out at me as VERY large:

fileBacked   1390993-> ~21g
speculative  695500-> ~10g

And this is downright SCARY:

free         903-> ~14m

For comparison, here are the same values from my machine when I run "vm_stat", reordered and edited to match the list above:

Mach Virtual Memory Statistics: (page size of 16384 bytes)
...
Pages active:                            188532.
File-backed pages:                       136625.
Pages free:                               13765.
Pages inactive:                          180253.
Pages purgeable:                           1509.
Pages speculative:                         5770.
Pages throttled:                              0.
Pages wired down:                        299012.

Note that my machine is not REMOTELY idle. This was my machine state as I was writing this with 20+ apps running and an uptime of 28 days.

My informal "rule of thumb" is that the system tends to run at ~80-90% of available memory, leaving the extra 10-20% as the "cushion" it uses to to deal with demand spikes. Pulling from the numbers below, the machine I'm working from has 16gb and my free pages are:

Pages free: 13765-> ~2g

...exactly what I'd expect. Frankly, I'm surprised you managed to get as low as 14m before you panic'd. As an immediate test, I'd suggest running vm_stat while testing with "larger" archives (basically, the biggest archive that doesn't panic) and see how the numbers shift. I think you'll find that you're causing serious spikes as well, they're just not big enough to cause the panic.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Does the panic happen when you're just the source and/or destination or only when your "both"? I suspect it will happen when you're just the destination and won't happen when you''re just the source, but I'd like to confirm that.

You were right in assuming that the panic occurred when my filesystem was the destination. I was able to verify that.

How fast is the I/O path back to the server? Are you saturating that connection?

The connection is not likely to be saturated as this is a 100Gb Link on a thunderbolt 5 interface.

Is the I/O pipeline here simply "read compressed file from network-> decompress data-> write data out to network"? Or is there any intermediary in that process?

The I/O pipeline is as you described it with no intermediary involved.

What actually makes it "back" to the server before the panic occurs? How much data were you actually able to write?

On two subsequent runs, around 41-42 Gb out of 64Gb of data were written before the panic ensued.

du -smx ./25116_CINEGRAPHER_MARCCAIN
41303	./25116_CINEGRAPHER_MARCCAIN

du -smx ./25116_CINEGRAPHER_MARCCAIN
42600	./25116_CINEGRAPHER_MARCCAIN

How does your write process actually "work"? Is there anything that would limit/constrain how much data you have pending (waiting to be sent over the network)?

The source uio_t buffer passed into vnop_write() is userspace. Before passing it down to sock_send() which operates on kernel resident memory buffers, we create a kernelspace copy of the userspace uio_t buffer of size equal to uio_resid(uspace_uio) with copying performed by uiomove() incrementally in chunks equal to either the value of the amount of data left in the userspace buffer or the value of the kernel's copysize_limit_panic, whichever happens to be the smaller of the two.

The kernelspace uio_t buffer is further split up into smaller chunks of data pertinent to the filesystem design which end up being passed into sock_send().

Reading is done in a similar fashion, the only difference being the use of sock_receive_mbuf() in stead of sock_receive() which uses an uio_t buffer instead of an mbuf.

I'm onto the debugging strategies you suggested now. I'll report back on my findings as they emerge.

Thanks once again for all your help.

Hopefully, we'll be able to resolve this soon.

You were right in assuming that the panic occurred when my filesystem was the destination. I was able to verify that.

OK, that matches up with the memory side of this since the kernel can't just "throw away" data it need/thinks it needs to write in the same way read data can be.

Shifting to here:

The connection is not likely to be saturated as this is a 100Gb Link on a thunderbolt 5 interface.

So, one thing I would immediately to is DRAMATICALLY restrict that connection, probably down to Mb/s, then what what happens with files in the 1-2 GB range. That should slow the output rate down enough that you'll be able to see accumulation occurring with transfers that are small enough that you won't threaten system stability.

I suspect what you'll find is that you're accepting writes at "full" speed, leaving the kernel "stuck" with 1+ GB of memory slowing streaming out over that connection. That's not acceptable since it means:

  • It's inevitably going to cause some kind of panic.

  • Even without a directly panic, any other kind of failure means all that unwritten data is lost.

Expanding on that last point, while write caching is a basic part of the systems design, you don't want to create a situation where you've told the user that something like a copy is "done" but the data transfer isn't actually finished until minutes later. The bottom line here is that you ultimately need to constrain how quickly you complete write calls to be roughly in line with your underlying ability to transfer data.

Reading is done in a similar fashion, the only difference being the use of sock_receive_mbuf() in stead of sock_receive() which uses an uio_t buffer instead of an mbuf.

Note that reads can't really cause the same issue as writes. Your VFS data "returns" that read data to the larger system, at which point the VM system has it's full "toolbox" for managing that memory. In the case of something like decompression (and lots of other use cases) that toolbox doesn't even matter that much, as user space is disposing of the read data at roughly the same rate you're returning it, preventing any accumulation.

BTW, how much memory does this machine have? Is it 32g or 16g?

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

BTW, how much memory does this machine have? Is it 32g or 16g?

It's 36Gb.

% expr `sysctl -n hw.memsize` / 1024 / 1024 / 1024
36

The bottom line here is that you ultimately need to constrain how quickly you complete write calls to be roughly in line with your underlying ability to transfer data.

My attempt at this was through modifying the socket polling algorithm.

The decision on whether it's a read or a write event that's occurred is taken in the sock_upcall callback.

The socket's receive buffer is queried about whether there's any data available via a call to sock_getsockopt(SO_NREAD). If there is, a thread waiting to read data is sent a read event wakeup() call. Otherwise, the algorithm considers this to be a write event and a thread waiting to write data is sent a wakeup() call. This doesn't take into account whether or not there's room in the send buffer.

It's a 'poor' man's socket polling algo, due to functions like sock_setupcalls(), soreadable(), and sowriteable() being a part of the private API.

To try and bring the rate the data is being written in line with the rate it's being transferred, I tried modifying the write part of the algorithm by considering it a write event only if there's no more data left in the socket's send buffer via a call to sock_getsockopt(SO_NWRITE). That didn't help remedy the problem.

I can't think of other ways of achieving this at the moment.

My attempt at this was through modifying the socket polling algorithm.

I don't know enough about the internals of your code to be sure, but my intuition is that this is the wrong place to try and adress this.

Moving to here....

The decision on whether it's a read or a write event that's occurred is taken in the sock_upcall callback.

Making things more explicit, my understanding is that your current logic basically looks like this:

  • write() called.
  • VNOP_WRITE is called in your driver.
  • Driver copies input data.
  • Driver returns from VNOP_WRITE
  • write() returns.
  • Driver finishes uploading data "at some point".

That basic approach is how any kind of write cache works, however, unless you're able to transfer data to disk faster than the system can call write()*, you're going to accumulate memory in the kernel and eventually panic. That means that at some point you need to decide to NOT return from "VNOP_WRITE" and instead block.

*Hint, this cannot be done, particularly since user space is allowed to "cheat" and have multiple threads calling write().

Note that this issue doesn't really apply to read(). You can't return from read() until you "have" the data, so a certain amount of blocking in implicit in it's design/"thinking". You might get into trouble if you allow your read cach to become to large, but that risk is more obvious and avoidable.

The socket's receive buffer is queried about whether there's any data available via a call to sock_getsockopt(SO_NREAD). If there is, a thread waiting to read data is sent a read event wakeup() call. Otherwise, the algorithm considers this to be a write event and a thread waiting to write data is sent a wakeup() call. This doesn't take into account whether or not there's room in the send buffer.

Unless I'm missing something, this is already to late as you've already returned from VNOP_WRITE/"write()".

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Your description of the write operation logic my filesystem implements is fairly close. I thought I'd feel in a few blanks though.

  • write() called.
  • VNOP_WRITE gets called.
  • Input data gets copied from user space into kernel space.
  • Input data equal to the size of uio_resid() gets split up into smaller chunks and is iteratively transmitted over network until there's no more input data left with sock_send(MSG_DONTWAIT) returning the number of the bytes written in sentlen, which is equal to the value of uio_resid() of the chunk passed into sock_send().
  • VNOP_WRITE returns once all input data has been transmitted.
  • write() returns.
  • Not sure about the 'driver finishes uploading data at some point' part though. Or were you referring to the TCP stack queueing up the input data after sock_send(MSG_DONTWAIT) returns for subsequent transmission?

At some point you need to decide to NOT return from "VNOP_WRITE" and instead block

Without first understanding what's using up all that VM, I can't see how such a point may be determined. How would VNOP_WRITE block?

For some reason, IOLog() data doesn't wind up in the output of log show|stream on macOS 15.5 running on M4 Max.

I've added debug print statements to output buffer sizes and memory (de)allocation for uio_t buffers used to store the chunks of input data. I'll run the zip file extraction while collecting the diagnostics via log stream on an M2 Mac where logging is working and see if that reveals anything of interest.

First off, I wanted to add a quick comment about something you said earlier:

Two-machine debugging on Apple Silicon has certain limitations, which are worded in the KDK Readme as follows:

Note: Apple silicon doesn’t support active kernel debugging. You may inspect the current state of the kernel when it is halted due to a panic or NMI. However, you cannot set breakpoints, continue code execution, step into code, step over code, or step out of the current instruction.

x86-64 is better suited for remote kernel debugging. This particular panic doesn't occur on x86-64 though. Only on Apple Silicon.

This is much less of a limitation than it might seem in practice because, in practice, being able to break and resume the kernel has always been far less useful that it might seem. So much of the system behavior overall behavior is driven by timing that it's always been the case that pausing the kernel for any significant length of time was so disruptive to it's normal operation that resuming it was of limited value.

One suggestion I do have here it think of IOPanic() (or, more likely, Debugger()) as a investigative tool, not just an error path. If you're able broadly identify the point at which the issue is starting to occur, you can then use IOPanic to halt the kernel inside your driver at whatever point will give you the best access to your drivers state, then use the debugger to investigate the specifics in more detail.

Looking at some details:

I thought I'd feel in a few blanks though.

One broad comment here. Looking at things a bit more closely, none of our file systems actually do "direct" I/O through VNOP_read/write. What they actually do is use the cluster I/O functions. Those function the generate actually I/O calls by calling into VNOP_STRATEGY. The cluster I/O functions are what control the UBC, so this also means that the system has much more control over what data is living in memory and when it knows it's flushed to disk.

Input data equal to the size of uio_resid() gets split up into smaller chunks and is iteratively transmitted over network until there's no more input data left with sock_send(MSG_DONTWAIT) returning the number of the bytes written in sentlen, which is equal to the value of uio_resid() of the chunk passed into sock_send().

My concern here is that the system is actually queuing up far more data here that you realize, allowing a very large backlog of unwritten data to accumulate. Have you looked into how that flow is working? Note that this is another case where manipulating network bandwidth can be helpful, as you'd immediately be able to see the effect, even with relatively small I/O requests.

Without first understanding what's using up all that VM, I can't see how such a point may be determined.

Sure, but the issue here is that if you're not controlling that point, then something else in the system is (or isn't...).

How would VNOP_WRITE block?

Because you blocked waiting on "something"? Typically that would either be an I/O call or lock.

For some reason, IOLog() data doesn't wind up in the output of log show|stream on macOS 15.5 running on M4 Max.

Is all kernel logging missing, or are your messages being redacted? On the redaction side, DriverKit actually uses the kernel's logging infrastructure, so I think the solution described in this thread would also apply to the kernel.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

I was able to collect some diagnostics which revealed an interesting pattern of things that were happening prior to the kernel panic occurring.

Depending on the amount of memory available, one or more files are able to get extracted from the zip file before the panic occurs.

cluster_write() gets called repeatedly until all file data is stored in VM.

Then VNOP_PAGEOUT gets called. I thought that VNOP_PAGEOUT was only called for mmapped files. But there are no VNOP_MMAP/VNOP_MNOMAP calls being made for the files being extracted.

The current VNOP_PAGEOUT implementation ends up calling err_pageout() for files that didn't get tagged as being memory mapped by VNOP_MMAP.

After a number of VNOP_PAGEOUT calls, VNOP_STRATEGY is called which commits file data to remote storage via a do_write() rpc.

This pattern is repeated until the system ultimately runs out of VM with kernel panic ensuing.

My intuition is telling me that those VNOP_PAGEOUT calls are being made for a reason. Possibly, to free up the memory pages used up by the cluster API.

I tried calling cluster_pageout() from VNOP_PAGEOUT despite VNOP_MMAP never being called. But that resulted in VNOP_STRATEGY being called through two separate paths, VNOP_PAGEOUT and cluster API, which resulted in I/O stalling.

Any further pointers would be much appreciated.

Then VNOP_PAGEOUT gets called. I thought that VNOP_PAGEOUT was only called for mmapped files.

No that's not correct, that's definitely not correct. The idea behind the UBC ("Universal Buffer Cache") is unify the I/O and VM system into a common architecture. In concrete terms, these to cases:

  1. Reading the first 16kb of a file.

  2. Mapping the first 16kb of the same file.

...should both return a VM page that's actually backed by EXACTLY the same page in the UBC system. Similarly, writing to the first 16kb of that file and modifying a mapped file both have EXACTLY the same result- that is, they both modify the same page in the UBC.

But there are no VNOP_MMAP/VNOP_MNOMAP calls being made for the files being extracted.

No, you called cluster_write(), which basically means "please modify this pages in the UBC".

The current VNOP_PAGEOUT implementation ends up calling err_pageout() for files that didn't get tagged as being memory mapped by VNOP_MMAP.

I don't think that's going to end very well. Have you looked at what it's actually asking to pageout and how that relates to the data you've returned?

My intuition is telling me that those VNOP_PAGEOUT calls are being made for a reason. Possibly, to free up the memory pages used up by the cluster API.

Trying to sort out exactly how this works is more than an afternoon's work, but what I think is going on here is that the system is actually dealing with two different concerns:

After a number of VNOP_PAGEOUT calls, VNOP_STRATEGY is called which commits file data to remote storage via a do_write() rpc.

VNOP_STRATEGY-> I think this is the system flushing dirty pages to disk as part of it's normal "lets get stuff to disk" strategy.

VNOP_PAGEOUT-> Yes, I think this the system letting you know that it's trying to reclaim pages. What exactly it's doing in your particular case isn't clear, but a comment in the SMB source does hint at one possibility:

/*
 * One case where the file is not currently open and we get called.
 * vnode_vput->vnode_reclaim_internal->vgone->vclean will call
 * ubc_msync_internal which calls vnode_pageout even though all dirty
 * data was pushed out on the last vnop_close.
 *
 * If file is not open, no way to push out data, so just return an
 * error.
 */

But that resulted in VNOP_STRATEGY being called through two separate paths, VNOP_PAGEOUT and cluster API, which resulted in I/O stalling.

This makes me very nervous, as my intuition is basically "yes, VNOP_STRATEGY can be called on a whole lot of different paths". Setting all other issue aside, I think you could make this happen by having multiple process open the same file and then shoving I/O at it (particularly if mapping and direct I/O were both being used).

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Further diagnostics have shown that ubc_msync(UBC_PUSHDIRTY|UBC_SYNC|UBC_INVALIDATE) was being called, for the file being written, through the VNOP_GETATTR path on each change of the file size while cluster_write() of the same file was in progress.

The change I made now calls ubc_msync() only if there's no cluster_write() in progress when getting file attributes.

To test the patch I extracted the zip file a few times in a row with the system no longer crashing and the following metrics being reported by vm_stat(1):

% vm_stat 30 | awk '/Statistics/; !/Statistics/{print $3,"|",$13}'
Mach Virtual Memory Statistics: (page size of 16384 bytes)
specul | file-backed
165368 | 653350
65896 | 888301
66724 | 920760
65872 | 907261
63935 | 898648
67795 | 885978
63295 | 871778
59696 | 863204

3309 | 807539
188881 | 859100
60996 | 1018163
63041 | 1015598
59276 | 1013489
57392 | 1013524
60394 | 1013068
56646 | 1009342

634 | 953419
333949 | 1003401
59119 | 1132207
60277 | 1131932
56650 | 1128616
59147 | 1124974
56620 | 1124759
57165 | 1123408
55714 | 1122087

Thanks very much for all your help.

Are you at liberty to disclose how you obtained the backtraces from the full panic long I'd supplied?

Are you at liberty to disclose how you obtained the backtraces from the full panic long I'd supplied?

Well, to be honest, I tossed a log into an internal tool that did the symbolication for me. In terms of doing the process yourself, the process isn't actually all that complicated (in principle) it would just be REALLY painful to do "by hand", as the format was basically designed to be as SMALL as possible, not to be at all readable. If you haven't already, please file a bug asking for us to provide a tool/tools for this process and then post the bug number back here.

In terms of how you'd do it yourself, if you look at the unsymbolicated crash log* you'll find a section that starts with:

  "processByPid" : {
    "0" : {"timesThrottled":0,"pageIns":0,"rawFlags":"0x20800001","timesDidThrottle":3,"procname":"kernel_task",....
    "1" : {"timesThrottled":0,"codeSigningAuxiliaryInfo":139586437120,"pageIns":2297,"systemTimeTask":2.57985825,...
    "103" : {"timesThrottled":0,"codeSigningAuxiliaryInfo":288230376151711744,"pageIns":75,"systemTimeTask":0.111604833,...
  ...

*Make sure you use a text editor that can disable line wrapping, otherwise the log is an undecipherable mess.

Each of those lines contain "all" of the information about that particular PID. Going back to symbolicated "chunk" I sent earlier, it included this unsymbolicated chunk:

13524	Kernel Frames
kernel.release.t8112	0xfffffe0008acbaf4	thread_invoke + 1180	(sched_prim.c:3566)
kernel.release.t8112	0xfffffe0008aca59c	thread_block_reason + 244	(sched_prim.c:4183)
kernel.release.t8112	0xfffffe0008ba0ce4	vm_page_wait + 436	(vm_resident.c:5443)
kernel.release.t8112	0xfffffe0008ba29bc	vm_page_alloc_list + 96	(vm_resident.c:8581)
kernel.release.t8112	0xfffffe0008b562f8	kmem_alloc_guard_internal + 404	(vm_kern.c:908)
kernel.release.t8112	0xfffffe0008aa7298	kalloc_large + 360	(kalloc.c:2364)
kernel.release.t8112	0xfffffe0008fdbe40	_MALLOC + 68	(kern_malloc.c:119)
[15, 181932]
[15, 184316]
[15, 317620]
[15, 316216]
[15, 71024]
[15, 70676]
[15, 236844]
kernel.release.t8112	0xfffffe0008c7b994	cluster_io + 3416	(vfs_cluster.c:2336)
kernel.release.t8112	0xfffffe0008c8549c	advisory_read_ext + 756	(vfs_cluster.c:6258)

Searching for "181932" will get you to this line:

"1413" : {"timesThrottled":0,"codeSigningAuxiliaryInfo":0,"pageIns":3922,"systemTimeTask":37.890620249999998,"rawFlags":"0x4C790801",...

Which, after replacing "]," with "],\n", will let you find this:

"user_usec":3287993,"kernelFrames":[[0,604916],
[0,599452],
[0,1477860],
[0,1485244],
[0,1172216],
[0,455320],
[0,5914176],
[15,181932],
[15,184316],
[15,317620],
[15,316216],
[15,71024],
[15,70676],
[15,236844],
[0,2374036],
[0,2413724],

Those are the actual frames you're symbolicating. The second value is the address, offset (I think) from the start of the library. The first value is an index in the "binaryImages" list which follows "processByPid":

  "binaryImages" : [["31040a93-b7a4-3b28-8a52-1d311ee62ecc",18446741874828394496,"T"],["d097e7da-1aa7-319f-bfb0-703c46b031b9",18446741874884756224,"T"]...

SO "0" is:

["31040a93-b7a4-3b28-8a52-1d311ee62ecc",18446741874828394496,"T"]

Note this is also the kernel UUID earlier in the log:

Kernel UUID: 31040A93-B7A4-3B28-8A52-1D311EE62ECC

And "15" is:

["1473bf7b-1487-3fe3-94bb-649224e65f21",18446741874883219200,"T"],

...which I assume is the UUID of your KEXT.

The second value is then the load address of the libary

I'd need to experiment a bit ot sort out the exact symbolication from there, but the sequence would basically be:

On that last step, I think you could just use the offset from the original frame and a load address of "0", but you might need to add the offset to the load and then symbolicate. That's something I'd need to experiment with to be 100% sure about.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

If you haven't already, please file a bug asking for us to provide a tool/tools for this process and then post the bug number back here.

That's very kind of you to suggest that. Please see FB17733912. Please advise if I should close my previous bug report, FB17497530, which you've already looked into.

That's very kind of you to suggest that. Please see FB17733912.

Perfect. Just to clarify, we probably won't release "our" tool as such, but this is something Xcode should address.

Please advise if I should close my previous bug report, FB17497530, which you've already looked into.

I think it's head back to you as 3rd party to resolves, but you're welcome to close it yourself.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Symbolicating kernel backtraces on Apple Silicon
 
 
Q