Apple Developer Connection

Two-Machine Debugging

For debugging a device driver or indeed any code that resides in the kernel, two computers are a necessity. Buggy kernel code has a nasty tendency to crash or hang a system, and so directly debugging that system is often impossible.

Two-machine debugging with gdb is the main pathway to finding bugs in driver code. This section takes you through the procedure for setting up two computers for debugging, offers a few tips on using gdb in kernel code, introduces you to the kernel debugging macros, and discusses techniques for finding bugs causing kernel panics and hangs.

In this section:

Setting Up for Two-Machine Debugging
Using the Kernel Debugging Macros
Tips on Using gdb
Debugging Kernel Panics
Debugging System Hangs
Debugging Boot Drivers


Setting Up for Two-Machine Debugging

This section summarizes the steps required to set up two computers for kernel debugging. It draws heavily on the following documents, which you should refer to for more detailed information:

In two-machine debugging, one system is called the target computer and the other the host (or development) computer. The host computer is the computer that actually runs gdb. It is typically the computer on which your driver is developed; hence, it’s also referred to as the development computer. The target computer is the system on which the driver to be debugged is run. Your host and target computers should be running the same version of the Darwin kernel, or as close as possible to the same version. (Of course, if you’re debugging a panic-prone version of the kernel, you’ll want the host computer to run the most recent stable version of Darwin.) For optimal source-level debugging, the host computer should have the source code of the driver, any kernel extensions related to your driver (such as its client or provider), and perhaps even the kernel itself (/xnu).

In order for two-machine debugging to be feasible, the following must be true:

Note: The following steps include instructions on setting up a permanent network connection via ARP (step 3). This is unnecessary if you are running Mac OS X v. 10.2 or later on both machines and if you set the NVRAM debug variable to 0x144 (as in step 1). This configuration allows you to set up two-machine debugging on two computers that are not necessarily on the same subnet. If you are running an earlier version of Mac OS X, however, you do need to follow step 3 and both computers must be on the same subnet.
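To see why the single value 0x144 covers both behaviors, it can help to break it into its individual bits. The following is a minimal sketch in POSIX shell, not part of any Apple tool. The function name decode_debug_flags is made up for illustration, and the flag names and bit values are the ones listed for the debug boot argument in Kernel Programming Guide as best recalled here, so verify them against the documentation for your kernel version.

```shell
# decode_debug_flags VALUE
# Prints the names of the debug flags set in VALUE (for example, 0x144).
# ASSUMPTION: the name/bit pairs below follow the debug flag list in
# Kernel Programming Guide; confirm them for your kernel version.
decode_debug_flags() {
    value=$(($1))
    names=
    for entry in 0x01:DB_HALT 0x02:DB_PRT 0x04:DB_NMI 0x08:DB_KPRT \
                 0x10:DB_KDB 0x20:DB_SLOG 0x40:DB_ARP 0x80:DB_KDP_BP_DIS \
                 0x100:DB_LOG_PI_SCRN; do
        bit=${entry%%:*}     # text before the colon: the bit value
        name=${entry#*:}     # text after the colon: the flag name
        if [ $((value & $bit)) -ne 0 ]; then
            names="${names:+$names }$name"
        fi
    done
    printf '%s\n' "$names"
}
```

Calling decode_debug_flags 0x144 reports three flags: one for dropping into the debugger on an NMI, one for the ARP behavior described in the note, and one that affects how panic information is displayed.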

When all this is in place, complete the following steps:

  1. Target Set the NVRAM debug variable to 0x144, which lets you drop into the debugger upon a non-maskable interrupt (NMI) and, if you’re running Mac OS X v. 10.2 or later, lets you debug two computers not on the same subnet. You can use setenv to set the flag within Open Firmware itself (on PowerPC-based Macintosh computers), or you can use the nvram utility. For the latter, enter the following as root at the command line:

    nvram boot-args="debug=0x144"

    It’s a good idea to enter nvram boot-args (no argument) first to get any current NVRAM variables in effect; then include these variables along with the debug flag when you give the nvram boot-args command a second time. Reboot the system.

    Note: If your target machine contains a PMU (for example, a PowerBook G4 or an early G5 desktop computer), you may find that it shuts down when you exit kernel debugging mode. One reason for this is that if a breakpoint causes kernel debugger entry in the middle of a PMU transaction, the PMU watchdog may trigger on a timeout and cause the machine to shut down. If you experience this, you may find that forcing the PMU driver to operate in polled mode fixes the problem. To do this, set the NVRAM variable pmuflags to 1, as shown below:

    nvram boot-args="pmuflags=1"

    You can set the pmuflags variable separately, as shown above, or you can set it at the same time you set the debug variable, as shown below:

    nvram boot-args="debug=0x144 pmuflags=1"

  2. Host or Target Copy the driver (or any other kernel extension) to a working directory on the target computer.

  3. Host Set up a permanent network connection to the target computer via ARP. The following example assumes that your test computer is target.goober.com:

    $ ping -c 1 target.goober.com
    ping results: ....
    $ arp -an
    target.goober.com (10.0.0.69): 00:a0:13:12:65:31
    $ arp -s target.goober.com 00:a0:13:12:65:31
    $ arp -an
    target.goober.com (10.0.0.69) at 00:a0:13:12:65:31 permanent

    This sequence of commands establishes a connection to the target computer (via ping), displays the information on recent connections ARP knows about (arp -an), makes the connection to the target computer permanent by specifying the Ethernet hardware address (arp -s), and issues the arp -an command a second time to verify this.

  4. Target Create symbol files for the driver and any other kernel extensions it depends on. First create a directory to hold the symbols; then run the kextload command-line tool, specifying the directory as the argument of the -s option:

    $ kextload -l -s /tmp/symbols /tmp/MyDriver.kext

    This command loads MyDriver.kext but, because of the -l option, doesn’t start the matching process yet (that happens in a later step). If you don’t want the driver to load just yet, specify the -n option along with the -s option. See “Using kextload, kextunload, and kextstat” for the kextload procedure for debugging a driver’s start-up code.

  5. Target or Host Copy the symbol files to the host computer.

  6. Host Optionally, if you want to debug your driver with access to all the symbols in the kernel, obtain or build a symboled kernel. For further information, contact Apple Developer Technical Support. You can find the instructions for building the Darwin kernel from the open source code in “Building and Debugging Kernels” in Kernel Programming Guide.

  7. Host Run gdb on the kernel.

    $ gdb /mach_kernel

    If you have a symboled kernel, specify the path to it rather than /mach_kernel. It is important that you run gdb on a kernel of the same version and build as the one that runs on the target computer. If the versions are different, you should obtain a symboled copy of the target’s kernel and use that.

  8. Host In gdb, add the symbol file of your driver.

    (gdb) add-symbol-file /tmp/symbols/com.acme.driver.MyDriver.sym

    Add the symbol files of the other kernel extensions in your driver’s dependency chain.

  9. Host Tell gdb that you will be debugging remotely.

    (gdb) target remote-kdp

  10. Target Break into kernel debugging mode. Depending on the model of your target system, either issue the appropriate keyboard command or press the programmer’s button. On USB keyboards, hold down the Command key and the Power button; on ADB keyboards, hold down the Control key and the Power button. If you’re running Mac OS X version 10.4 or later, hold down the following five keys: Command, Option, Control, Shift, and Escape.

    You may have to hold down the keys or buttons for several seconds until you see the “Waiting for remote debugger connection” message.

  11. Host Attach to the target computer and set breakpoints.

    (gdb) attach target.goober.com
    (gdb) break 'MyDriverClass::WriteData(char *)'
    (gdb) continue

    Be sure you give the continue command; otherwise the target computer is unresponsive.

  12. Target Start the driver running.

    $ kextload -m -t /tmp/MyDriver.kext

    The -m option starts the matching process for the driver. The -t option, which tells kextload to conduct extensive validation checks, is really optional here; ideally, your driver should have passed these checks during an earlier stage of debugging (see “Using kextload, kextunload, and kextstat”). After starting the driver, perform the actions necessary to trigger the breakpoint.

  13. Host When the breakpoint you set is triggered, you can begin debugging your driver using gdb commands. If you “source” the kernel debugging macros (see the following section, “Using the Kernel Debugging Macros”), you can use those as well.
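Step 1 recommends carrying any existing boot arguments forward when you add the debug flag. One way to do that merge without retyping is sketched below in POSIX shell. The helper name merge_boot_args is hypothetical, and the sketch assumes you pass it only the value portion of what nvram boot-args reports, not the whole output line.

```shell
# merge_boot_args CURRENT NEW_SETTING...
# Prints CURRENT with each NEW_SETTING (key=value) replacing an existing
# setting that has the same key, or appended if the key is not present.
# ASSUMPTION: CURRENT is just the value of boot-args, split on whitespace.
merge_boot_args() {
    current=$1
    shift
    result=
    for word in $current; do
        keep=1
        for new in "$@"; do
            # Compare the text before the first "=" on both sides.
            if [ "${word%%=*}" = "${new%%=*}" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            result="${result:+$result }$word"
        fi
    done
    for new in "$@"; do
        result="${result:+$result }$new"
    done
    printf '%s\n' "$result"
}

# Possible use on the target, as root (the existing value is an example):
#   nvram boot-args="$(merge_boot_args "-v debug=0x4" debug=0x144 pmuflags=1)"
```

Existing settings keep their original order, and the new settings go at the end, so the result is easy to compare against what nvram boot-args printed before the change.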

Using the Kernel Debugging Macros

Apple includes a set of kernel debugging macros as part of Darwin. They have been written by engineers with an intimate knowledge of how the Darwin kernel works. Although it is possible to debug driver code without these macros, they will make the task much easier.

The kernel debugging macros probe the internal structures of a running Mac OS X system in considerable depth. With them you can get summary and detailed snapshots of tasks and their threads in the kernel, including such information as thread priority, executable names, and invoked functions. The kernel debugging macros also yield information on the kernel stacks for all or selected thread activations, on IPC spaces and port rights, on virtual-memory maps and map entries, and on allocation zones. See Table 7-2 for a summary of the kernel debugging macros.

Important: The kernel debugging macros described in this section will not work unless you have a symboled kernel. You can either build the Darwin kernel from the open source (see “Building and Debugging Kernels” in Kernel Programming Guide for details) or you can refer to the Kernel Debug Kit, available at http://developer.apple.com/sdk, which includes a copy of the kernel debug macros.

You can obtain the kernel debugging macros from the Darwin Open Source repository. They are in the .gdbinit file in the /xnu/osfmk branch of the source tree. Because .gdbinit is the standard name of the initialization file for gdb, you might already have your own .gdbinit file to set up your debugging sessions. If this is the case, you can combine the contents of the files or have a “source” statement in one .gdbinit file that references the other file. To include the macros in a .gdbinit file for a debugging session, specify the following gdb command shortly after running gdb on mach_kernel:

(gdb) source /tmp/.gdbinit

(In this example, /tmp represents any directory that holds the copy of the .gdbinit file you obtained from the Open Source repository.) Because the kernel debugging macros can change between versions of the kernel, make sure that you use the macros that match as closely as possible the version of the kernel you’re debugging.

Table 7-2  Kernel debugging macros

Macro             Description
showalltasks      Displays a summary listing of tasks
showallacts       Displays a summary listing of all activations
showallstacks     Displays the kernel stacks for all activations
showallvm         Displays a summary listing of all the VM maps
showallvme        Displays a summary listing of all the VM map entries
showallipc        Displays a summary listing of all the IPC spaces
showallrights     Displays a summary listing of all the IPC rights
showallkmods      Displays a summary listing of all the kernel extension binaries
showtask          Displays status of the specified task
showtaskacts      Displays the status of all activations in the task
showtaskstacks    Displays all kernel stacks for all activations in the task
showtaskvm        Displays status of the specified task's VM map
showtaskvme       Displays a summary list of the task's VM map entries
showtaskipc       Displays status of the specified task's IPC space
showtaskrights    Displays a summary list of the task's IPC space entries
showact           Displays status of the specified thread activation
showactstack      Displays the kernel stack for the specified activation
showmap           Displays the status of the specified VM map
showmapvme        Displays a summary list of the specified VM map's entries
showipc           Displays the status of the specified IPC space
showrights        Displays a summary list of all the rights in an IPC space
showpid           Displays the status of the process identified by PID
showproc          Displays the status of the process identified by a proc pointer
showkmod          Displays information about a kernel extension binary
showkmodaddr      Given an address, displays the kernel extension binary and offset
zprint            Displays zone information
paniclog          Displays the panic log information
switchtoact       Switch thread context
switchtoctx       Switch context
resetctx          Reset context

A subset of the kernel debugging macros is particularly useful for driver writers: showallstacks, switchtoact, showkmodaddr, showallkmods, and switchtoctx. The output of showallstacks lists all tasks in the system and, for each task, the threads and the stacks associated with each thread. Listing 7-5 shows the information on a couple of tasks as emitted by showallstacks.

Listing 7-5  Example thread stacks shown by showallstacks

(gdb) showallstacks
...
task        vm_map      ipc_space  #acts   pid  proc        command
0x00c1e620  0x00a79a2c  0x00c10ce0    2     51  0x00d60760  kextd
            activation  thread      pri  state  wait_queue  wait_event
            0x00c2a1f8  0x00ccab0c   31  W      0x00c9fee8  0x30a10c <ipc_mqueue_rcv>
                        continuation=0x1ef44 <ipc_mqueue_receive_continue>
            activation  thread      pri  state  wait_queue  wait_event
            0x00c29a48  0x00cca194   31  W      0x00310570  0x30a3a0 <kmod_cmd_queue>
                kernel_stack=0x04d48000
                stacktop=0x04d4bbe0
                0x04d4bbe0  0xccab0c
                0x04d4bc40  0x342d8 <thread_invoke+1104>
                0x04d4bca0  0x344b4 <thread_block_reason+212>
                0x04d4bd00  0x334e0 <thread_sleep_fast_usimple_lock+56>
                0x04d4bd50  0x81ee0 <kmod_control+248>
                0x04d4bdb0  0x45f1c <_Xkmod_control+192>
                0x04d4be00  0x2aa70 <ipc_kobject_server+276>
                0x04d4be50  0x253e4 <mach_msg_overwrite_trap+2848>
                0x04d4bf20  0x257e0 <mach_msg_trap+28>
                0x04d4bf70  0x92078 <.L_syscall_return>
                0x04d4bfc0  0x10000000
                stackbottom=0x04d4bfc0
 
task        vm_map      ipc_space  #acts   pid  proc        command
0x00c1e4c0  0x00a79930  0x00c10c88    1     65  0x00d608c8  update
            activation  thread      pri  state  wait_queue  wait_event
            0x00ddaa50  0x00ddbe34   31  W      0x00310780  0xd608c8 <rld_env+10471956>
                        continuation=0x1da528 <_sleep_continue>

The typical number of stacks revealed by showallstacks runs into the dozens. Most of the threads associated with these stacks are asleep, blocked on continuation (as is that for the second task shown in the above example). Stacks such as these you can usually ignore. The remaining stacks are significant because they reflect the activity going on in the system at a particular moment and context (as happens when an NMI or kernel panic occurs).
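Because the interesting threads are the ones that still have a kernel stack, you may find it convenient to filter a saved copy of the showallstacks output down to just those threads. The sketch below is one possible approach using awk from POSIX shell. The function name filter_active_stacks is hypothetical, and the sketch assumes the output has the layout shown in Listing 7-5: each thread begins with an "activation" header line, and only threads that are not blocked on a continuation have a kernel_stack= line.

```shell
# filter_active_stacks [FILE...]
# Reads saved showallstacks output and drops every thread block that has
# no kernel_stack= line (that is, threads blocked on a continuation).
# Task header lines and their value lines are always kept.
filter_active_stacks() {
    awk '
        # Print the buffered thread block only if it had a kernel stack.
        function flush() {
            if (active) printf "%s", block
            block = ""; active = 0; in_block = 0
        }
        $1 == "task"       { flush(); print; next }
        $1 == "activation" { flush(); in_block = 1 }
        /kernel_stack=/    { active = 1 }
        in_block           { block = block $0 "\n"; next }
                           { print }
        END                { flush() }
    ' "$@"
}
```

Applied to the output in Listing 7-5, this keeps the second kextd thread (the one with a stack from stacktop to stackbottom) and drops both threads that show only a continuation. The filter is purely textual, so if the macro output format differs in your kernel version, adjust the patterns accordingly.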

Thread activations and stacks in the kernel—including those of drivers—belong to the task named kernel_task (under the command column). When you’re debugging a driver, you look in the active stacks in kernel_task for any indication of your driver or its provider, client, or any other object it communicates with. If you add the symbol files for these driver objects before you begin the debugging session, the indication will be much clearer. Listing 7-6 shows an active driver-related thread in kernel_task in the context of adjacent threads.

Listing 7-6  Kernel thread stacks as shown by showallstacks

       activation  thread      pri  state  wait_queue  wait_event
       0x0101ac38  0x010957e4   80  UW     0x00311510  0x10b371c <rld_env+13953096>
                 continuation=0x2227d0 <_ZN10IOWorkLoop22threadMainContinuationEv>
       activation  thread      pri  state  wait_queue  wait_event
       0x0101aaf0  0x01095650   80  R
             stack_privilege=0x07950000
             kernel_stack=0x07950000
             stacktop=0x07953b90
             0x07953b90  0xdf239e4 <com.apple.driver.AppleUSBProKeyboard + 0x19e4>
             0x07953be0  0xe546694 <com.apple.iokit.IOUSBFamily + 0x2694>
             0x07953c40  0xe5a84b4 <com.apple.driver.AppleUSBOHCI + 0x34b4>
             0x07953d00  0xe5a8640 <com.apple.driver.AppleUSBOHCI + 0x3640>
             0x07953d60  0xe5a93bc <com.apple.driver.AppleUSBOHCI + 0x43bc>
             0x07953df0  0x2239a8 <_ZN22IOInterruptEventSource12checkForWorkEv+18>
             0x07953e40  0x222864 <_ZN10IOWorkLoop10threadMainEv+104>
             0x07953e90  0x2227d0 <_ZN10IOWorkLoop22threadMainContinuationEv>
             stackbottom=0x07953e90
       activation  thread      pri  state  wait_queue  wait_event
       0x0101b530  0x0101c328   80  UW     0x00311500  0x10b605c <rld_env+13963656>
                 continuation=0x2227d0 <_ZN10IOWorkLoop22threadMainContinuationEv>

You can use showallstacks in debugging panics, hangs, and wedges. For instance, it might reveal a pair of threads that are deadlocked against each other or it might help to identify a thread that is not handling interrupts properly, thus causing a system hang.

Another common technique using the kernel debugging macros is to run the showallstacks macro and find the stack or stacks that are most of interest. Then run the switchtoact macro, giving it the address of a thread activation, to switch to the context of that thread and its stack. From there you can get a backtrace, inspect frames and variables, and so on. Listing 7-7 shows this technique.

Listing 7-7  Switching to thread activation and examining it

(gdb) switchtoact 0x00c29a48
(gdb) bt
#0  0x00090448 in cswnovect ()
#1  0x0008f84c in switch_context (old=0xcca194, continuation=0, new=0xccab0c) at
/SourceCache/xnu/xnu-327/osfmk/ppc/pcb.c:235
#2  0x000344b4 in thread_block_reason (continuation=0, reason=0) at
/SourceCache/xnu/xnu-327/osfmk/kern/sched_prim.c:1629
#3  0x000334e0 in thread_sleep_fast_usimple_lock (event=0xeec500, lock=0x30a3ac,
interruptible=213844) at /SourceCache/xnu/xnu-327/osfmk/kern/sched_prim.c:626
#4  0x00081ee0 in kmod_control (host_priv=0xeec500, id=4144, flavor=213844,
data=0xc1202c, dataCount=0xc12048) at /SourceCache/xnu/xnu-327/osfmk/kern/kmod.c:602
#5  0x00045f1c in _Xkmod_control (InHeadP=0xc12010, OutHeadP=0xc12110) at
mach/host_priv_server.c:958
#6  0x0002aa70 in ipc_kobject_server (request=0xc12000) at
/SourceCache/xnu/xnu-327/osfmk/kern/ipc_kobject.c:309
#7  0x000253e4 in mach_msg_overwrite_trap (msg=0xf0080dd0, option=3, send_size=60,
rcv_size=60, rcv_name=3843, timeout=12685100, notify=172953600, rcv_msg=0x0,
scatter_list_size=0) at /SourceCache/xnu/xnu-327/osfmk/ipc/mach_msg.c:1601
#8  0x000257e0 in mach_msg_trap (msg=0xeec500, option=13410708, send_size=213844,
rcv_size=4144, rcv_name=172953600, timeout=178377984, notify=256) at
/SourceCache/xnu/xnu-327/osfmk/ipc/mach_msg.c:1853
#9  0x00092078 in .L_syscall_return ()
#10 0x10000000 in ?? ()
Cannot access memory at address 0xf0080d10
(gdb) f 4
#4  0x00081ee0 in kmod_control (host_priv=0xeec500, id=4144, flavor=213844,
data=0xc1202c, dataCount=0xc12048) at /SourceCache/xnu/xnu-327/osfmk/kern/kmod.c:602
602                     res = thread_sleep_simple_lock((event_t)&kmod_cmd_queue,
(gdb) l
597                 simple_lock(&kmod_queue_lock);
598
599                 if (queue_empty(&kmod_cmd_queue)) {
600                     wait_result_t res;
601
602                     res = thread_sleep_simple_lock((event_t)&kmod_cmd_queue,
603                                        &kmod_queue_lock,
604                                        THREAD_ABORTSAFE);
605                     if (queue_empty(&kmod_cmd_queue)) {
606                         // we must have been interrupted!

Remember that when you use the switchtoact macro, you actually change the value of the stack pointer. You are in a different context than before. If you want to return to the former context, use the resetctx macro.

The showallkmods and showkmodaddr macros are also useful in driver debugging. The former macro lists all loaded kernel extensions in a format similar to the kextstat command-line utility (Listing 7-8 shows a few lines of output). If you give the showkmodaddr macro the address of an “anonymous” frame in a stack, and if the frame belongs to a driver (or other kernel extension), the macro prints information about the kernel extension.

Listing 7-8  Sample output from the showallkmods macro

(gdb) showallkmods
kmod        address     size        id      refs    version  name
0x0ebc39f4  0x0eb7d000  0x00048000  71      0       3.2  com.apple.filesystems.afpfs
0x0ea09480  0x0ea03000  0x00007000  70      0       2.1  com.apple.nke.asp_atp
0x0e9e0c60  0x0e9d9000  0x00008000  69      0       3.0  com.apple.nke.asp_tcp
0x0e22b13c  0x0e226000  0x00006000  68      0       1.2  com.apple.nke.IPFirewall
0x0e225600  0x0e220000  0x00006000  67      0       1.2  com.apple.nke.SharedIP
0x0df5d868  0x0df37000  0x00028000  62      0       1.2  com.apple.ATIRage128
0x0de96454  0x0de79000  0x0001e000  55      3       1.3  com.apple.iokit.IOAudioFamily
...
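The address and size columns in this output are what make it possible to map an anonymous frame address to a kernel extension and an offset within it. The sketch below shows that range check and subtraction in POSIX shell. It is an illustration of the idea, not the code of the showkmodaddr macro; the function name kmod_offset and the frame address used in the example comment are hypothetical.

```shell
# kmod_offset ADDRESS LOAD_ADDRESS SIZE
# If ADDRESS falls inside [LOAD_ADDRESS, LOAD_ADDRESS + SIZE), prints the
# offset of ADDRESS within the kernel extension; otherwise returns 1.
kmod_offset() {
    addr=$(($1))
    base=$(($2))
    size=$(($3))
    if [ "$addr" -ge "$base" ] && [ "$addr" -lt $((base + size)) ]; then
        printf '0x%x\n' $((addr - base))
    else
        return 1
    fi
}

# Example with the com.apple.ATIRage128 row from Listing 7-8 and a
# made-up frame address:
#   kmod_offset 0x0df38abc 0x0df37000 0x00028000
```

An offset computed this way corresponds to the "+ 0x..." part of frames such as those shown for the USB drivers in Listing 7-6.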

Tips on Using gdb

If you hope to become proficient at I/O Kit driver debugging, you’ll have to become proficient in the use of gdb. There’s no getting around this requirement. But even if you are already familiar with gdb, you can always benefit from insights garnered by other driver writers from their experience.

Examining Computer Instructions

If you don’t have symbols for a driver binary—and even if you do—you should try examining the computer instructions in memory to get a detailed view of what is going on in that binary. You use the gdb command x to examine memory in the current context; usually, x is followed by a slash (“/”) and one to three parameters, one of which is i. The examine-memory parameters are:

For example, if you want to examine 10 instructions before and 10 instructions after the current context (as described in “Tips on Debugging Panics”), you could issue a command such as:

(gdb) x/20i $pc -40

This command says “show me 20 instructions, but starting 40 bytes” (4 bytes per instruction) “before the current address in the program counter” (the $pc variable). Of course, you could be less elaborate and give a simple command such as:

(gdb) x/10i 0x001c220c

which shows you 10 computer instructions starting at a specified address. Listing 7-9 shows you a typical block of instructions.

Listing 7-9  Typical output of the gdb “examine memory” command

(gdb) x/20i $pc-40
0x8257c <kmod_control+124>:     addi    r3,r27,-19540
0x82580 <kmod_control+128>:     bl      0x8d980 <get_cpu_data>
0x82584 <kmod_control+132>:     addi    r0,r30,-19552
0x82588 <kmod_control+136>:     lwz     r31,-19552(r30)
0x8258c <kmod_control+140>:     cmpw    r31,r0
0x82590 <kmod_control+144>:     bne+    0x825c0 <kmod_control+192>
0x82594 <kmod_control+148>:     mr      r3,r31
0x82598 <kmod_control+152>:     addi    r4,r27,-19540
0x8259c <kmod_control+156>:     li      r5,2
0x825a0 <kmod_control+160>:     bl      0x338a8 <thread_sleep_fast_usimple_lock>
0x825a4 <kmod_control+164>:     lwz     r0,-19552(r30)
0x825a8 <kmod_control+168>:     cmpw    r0,r31
0x825ac <kmod_control+172>:     bne+    0x825c0 <kmod_control+192>
0x825b0 <kmod_control+176>:     addi    r3,r27,-19540
0x825b4 <kmod_control+180>:     bl      0x8da00 <fast_usimple_lock+32>
0x825b8 <kmod_control+184>:     li      r3,14
0x825bc <kmod_control+188>:     b       0x82678 <kmod_control+376>
0x825c0 <kmod_control+192>:     lis     r26,49
0x825c4 <kmod_control+196>:     li      r30,0
0x825c8 <kmod_control+200>:     lwz     r0,-19552(r26)
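The arithmetic behind a command such as x/20i $pc-40 is simple: the count is the number of instructions wanted before plus the number wanted after, and the starting address is the program counter minus 4 bytes per instruction before it. A small POSIX shell sketch of that calculation follows. The function name examine_window is hypothetical, and the 4-byte instruction size is the fixed-width assumption stated above, so it does not carry over to architectures with variable-length instructions.

```shell
# examine_window PC BEFORE AFTER
# Prints a gdb examine-memory command that shows BEFORE instructions
# ahead of PC and AFTER instructions from PC onward.
# ASSUMPTION: every instruction is 4 bytes long.
examine_window() {
    pc=$(($1))
    before=$2
    after=$3
    printf 'x/%di 0x%x\n' $((before + after)) $((pc - 4 * before))
}
```

For instance, Listing 7-9 begins at 0x8257c, which is the start address you get for a program counter of 0x825a4 with 10 instructions before and 10 after.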

Needless to say, you need to know some assembler in order to make sense of the output of the examine-memory command. You don’t need to be an expert in assembler, just knowledgeable enough to recognize patterns. For example, it would be beneficial to know how pointer indirection with an object looks in computer instructions. With an object, there are two indirections really, one to get the data (and that could be null) and one to an object’s virtual table (the first field inside the object). If that field doesn’t point to either your code or kernel code, then there’s something that might be causing a null-pointer exception. If your assembler knowledge is rusty or non-existent, you can examine the computer instructions for your driver’s code that you know to be sound. By knowing how “healthy” code looks in assembler, you’ll be better prepared to spot divergences from the pattern.

Breakpoints

Using breakpoints to debug code inside the kernel can be a frustrating experience. Often kernel functions are called so frequently that, if you put a breakpoint on a function, it’s difficult to determine which particular case is the one with the problem. There are a few things you can do with breakpoints to ameliorate this.

Single-Stepping

Single-stepping through source code does not necessarily take you from one line to the next. You can bounce around in the source code quite a bit because the compiler does various things with the symbols to optimize them.

There are two things you can do to get around this. If it’s your code you’re stepping through, you can turn off optimizations. Or you can single-step through the computer instructions in assembler because one line of source code typically generates several consecutive lines of assembler. So, if you find it hard to figure things out by single-stepping through source, try single-stepping through assembler.

To single-step in gdb, use the stepi command (si for short). You can get a better view of your progress if you also use the display command, as in this example:

(gdb) display/4i $pc

This displays the program counter and the next three instructions as you step.

Debugging Kernel Panics

You might be familiar with kernel panics: those unexpected events that cripple a system, leaving it completely unresponsive. When a panic occurs on Mac OS X, the kernel prints information about the panic that you can analyze to find the cause of the panic. On pre-Jaguar systems, this information appears on the screen as a black and white text dump. Starting with the Jaguar release, a kernel panic causes the display of a message informing you that a problem occurred and requesting that you restart your computer. After rebooting, you can find the debug information on the panic in the file panic.log at /Library/Logs/.

If you’ve never seen it before, the information in panic.log might seem cryptic. Listing 7-10 shows a typical entry in the panic log.

Listing 7-10  Sample log entry for a kernel panic

Unresolved kernel trap(cpu 0): 0x300 - Data access DAR=0x00000058 PC=0x0b4255b4
Latest crash info for cpu 0:
   Exception state (sv=0x0AD86A00)
      PC=0x0B4255B4; MSR=0x00009030; DAR=0x00000058; DSISR=0x40000000; LR=0x0B4255A0;
      R1=0x04DE3B50; XCP=0x0000000C (0x300 - Data access)
      Backtrace:
         0x0B4255A0 0x000BA9F8 0x001D41F8 0x001D411C 0x001D6B90 0x0003ACCC
         0x0008EC84 0x0003D69C 0x0003D4FC 0x000276E0 0x0009108C 0xFFFFFFFF
      Kernel loadable modules in backtrace (with dependencies):
         com.acme.driver.MyDriver(1.6)@0xb409000
Proceeding back via exception chain:
   Exception state (sv=0x0AD86A00)
      previously dumped as "Latest" state. skipping...
   Exception state (sv=0x0B2BBA00)
      PC=0x90015BC8; MSR=0x0200F030; DAR=0x012DA94C; DSISR=0x40000000; LR=0x902498DC;
      R1=0xBFFFE140; XCP=0x00000030 (0xC00 - System call)
 
Kernel version:
Darwin Kernel Version 6.0:
Wed May  1 01:04:14 PDT 2002; root:xnu/xnu-282.obj~4/RELEASE_PPC

This block of information has several different parts, each with its own significance for debugging the problem.
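When you work through a panic log, the backtrace addresses are the part you will handle most often, so it can be handy to pull them out one per line. The following is a minimal sketch in POSIX shell using awk. The function name panic_backtrace is hypothetical, and the sketch assumes the layout of Listing 7-10: a line containing "Backtrace:" followed by lines made up only of hexadecimal addresses.

```shell
# panic_backtrace [FILE...]
# Prints the addresses of the first backtrace in a panic log, one per
# line, in the order they appear. Stops at the first line after
# "Backtrace:" that does not start with a hexadecimal address.
panic_backtrace() {
    awk '
        /Backtrace:/                       { in_bt = 1; next }
        in_bt && $1 !~ /^0x[0-9A-Fa-f]+$/  { exit }
        in_bt                              { for (i = 1; i <= NF; i++) print $i }
    ' "$@"
}
```

For the entry in Listing 7-10 this yields the twelve addresses from 0x0B4255A0 through 0xFFFFFFFF. The PC and LR values are on a separate line of the log and are not included.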

Table 7-3  Types of kernel exceptions

Trap Value    Type of Kernel Trap
0x100         System reset
0x200         Machine check
0x300         Data access
0x400         Instruction access
0x500         External interrupt
0x600         Alignment exception
0x700         Illegal instruction

General Procedure

There are many possible ways to debug a kernel panic, but the following course of action has proven fruitful in practice.

  1. Get as many binaries with debugging symbols as possible.

    Make a note of all the kernel extensions listed under “Kernel loadable modules in backtrace”. If you don’t have debugging symbols for some of them, try to obtain a symboled version of them or get the source and build one with debugging symbols. This would include mach_kernel, the I/O Kit families, and other KEXTs that are part of the default install. You need to have the same version of the kernel and KEXT binaries that the panicked computer does, or the symbols won’t line up correctly.

  2. Generate and add symbol files for each kernel extension in the backtrace.

    Once you’ve got the kernel extension binaries with (or without) debugging symbols, generate relocated symbol files for each KEXT in the backtrace. Use kextload with the -s and -n options to do this; kextload prompts you for the load address of each kernel extension, which you can get from the backtrace. Alternatively, you can specify the -a option with -s when using kextload to specify KEXTs and their load addresses. Although you don’t need to relocate symbol files for all kernel extensions, you can only decode stack frames in the kernel or in KEXTs that you have done this for. After you run gdb on mach_kernel (preferably symboled), use gdb’s add-symbol-file command for each relocated symbol file you’ve generated; see “Setting Up for Two-Machine Debugging” for details.

  3. Decode the addresses in the panic log.

    Start with the PC register and possibly the LR (Link Register). (The contents of the LR should look like a valid text address, usually a little smaller than the PC-register address.) Then process each address in the backtrace, remembering to subtract four from each of the stack addresses to get the last instruction executed in that frame. One possible way to go about it is to use a pair of gdb commands for each address:

    (gdb) x/i <address>-4
    ...
    (gdb) info line *<address>-4

    You need the asterisk in front of the address in the info command because you are passing a raw address rather than the symbol gdb expects. The x command, on the other hand, expects a raw address so no asterisk is necessary.

    Listing 7-11 gives an example of a symbolic backtrace generated from x/i <address>-4. You’ll know you’ve succeeded when all the stack frames decode to some sort of branch instruction in assembler.

  4. Interpret the results.

    Interpreting the results of the previous step is the hardest phase of debugging panics because it isn’t mechanical in nature. See the following section, “Tips on Debugging Panics,” for some suggestions.
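Typing the pair of commands from step 3 for every backtrace address gets tedious, so you might generate them instead. The sketch below is a POSIX shell helper that prints the two gdb commands for each address it is given. The function name decode_commands and the file name in the comment are hypothetical. It applies the subtract-four rule to every argument, so pass it backtrace addresses only and handle the PC value separately.

```shell
# decode_commands ADDRESS...
# For each backtrace address, prints the two gdb commands from step 3:
# one to disassemble the last instruction executed in that frame and one
# to look up the matching source line.
decode_commands() {
    for addr in "$@"; do
        printf 'x/i %s-4\n' "$addr"
        printf 'info line *%s-4\n' "$addr"
    done
}

# Possible use: write the commands to a file, then read it into gdb with
# the source command.
#   decode_commands 0x000BA9F8 0x001D41F8 > /tmp/decode.gdb
#   (gdb) source /tmp/decode.gdb
```

The generated text is exactly what you would otherwise type at the gdb prompt, so you can inspect or edit the file before sourcing it.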

Listing 7-11  Example of symbolic backtrace

(gdb) x/i 0x001c2200-4
0x1c21fc <IOService::PMstop(void)+320>: bctrl
0xa538260 <IODisplay::stop(IOService *)+36>:    bctrl
0x1bbc34 <IOService::actionStop(IOService *, IOService *)+160>: bctrl
0x1ccda4 <runAction__10IOWorkLoopPFP8OSObjectPvn3_iPB2Pvn3+92>: bctrl
0x1bc434 <IOService::terminateWorker(unsigned long)+1824>:      bctrl
0x1bb1f0 <IOService::terminatePhase1(unsigned long)+928>:       bl 0x1bb20c <IOService::scheduleTerminatePhase2(unsigned long)>
0x1edfcc <IOADBController::powerStateWillChangeTo(unsigned long, unsigned long, IOService *)+88>:       bctrl
0x1c54e8 <IOService::inform(IOPMinformee *, bool)+204>: bctrl
0x1c5118 <IOService::notifyAll(bool)+84>:       bl 0x1c541c <IOService::inform(IOPMinformee *, bool)>
0x1c58b8 <IOService::parent_down_05(void)+36>:  bl 0x1c50c4 <IOService::notifyAll(bool)>
0x1c8364 <IOService::allowCancelCommon(void)+356>:      bl 0x1c5894 <IOService::parent_down_05(void)>
0x1c80a0 <IOService::serializedAllowPowerChange2(unsigned long)+84>:    bl 0x1c8200 <IOService::allowCancelCommon(void)>
0x1ce198 <IOCommandGate::runAction(int (*)(OSObject *, void *, void *, void *, void *), void *, void *, void *, void *)+184>: bctrl
0x1c802c <IOService::allowPowerChange(unsigned long)+72>:       bctrl
0xa52e6c ????
0x3dfe0 <_call_thread_continue+440>:    bctrl
0x333fc <thread_continue+144>:  bctrl
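The address arithmetic behind this listing is easy to script when a backtrace has many frames. The following is a minimal sketch, not part of the original procedure: the function name call_site_commands is hypothetical, and it assumes a PowerPC target, where every instruction is 4 bytes long, so a saved return address minus 4 is the branch that made the call. The two sample addresses are the return addresses implied by the first two frames of Listing 7-11.

```shell
#!/bin/sh
# Print one gdb "x/i" command per backtrace address.
# Each argument is a return address taken from the panic backtrace.
# Subtracting 4 lands on the branch instruction that made the call
# (assumption: PowerPC, where instructions are always 4 bytes long).
call_site_commands() {
    for addr in "$@"; do
        printf 'x/i %s-4\n' "$addr"
    done
}

# Return addresses implied by the first two frames of Listing 7-11
# (0x1c21fc + 4 and 0xa538260 + 4).
call_site_commands 0x001c2200 0x0a538264
```

If you redirect the output to a file, you can run all of the commands at once with gdb's source command instead of typing each one at the prompt.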

Tips on Debugging Panics

The following tips might make debugging a panic easier for you.

Debugging System Hangs

System hangs are, after kernel panics, the most serious condition caused by badly behaved kernel code. A hung system may not be completely unresponsive, but it is unusable because you cannot effectively click the mouse button or type a key. You can categorize system hangs, and their probable cause, by the behavior of the mouse cursor.

For system hangs with the first symptom (the cursor doesn’t spin and won’t move), your first aim should be to find out what caused the interrupt. Why is the hardware controlled by your driver raising the interrupt? If your driver is using a filter interrupt event source (IOFilterInterruptEventSource), you might want to investigate that, too. With a filter event source, a driver can ignore interrupts that it thinks aren’t its responsibility, so check that the filter routine is not rejecting interrupts that your hardware actually raised.

With any system hang, you should launch gdb on the kernel, attach to the hung system and run the showallstacks macro. Scan the output for threads that are deadlocked against each other. Or, if it’s an unhandled primary interrupt that you suspect, find the running thread; if it is the one that took the interrupt, it is probably the thread that’s gone into an infinite loop. If the driver is in an infinite loop, you can set a breakpoint in a frame of the thread’s stack that is the possible culprit; when you continue and hit the breakpoint almost immediately, you know you’re in an infinite loop. You can single-step from there to find the problem.
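Because you may attach to a hung target many times while chasing one bug, it can help to keep the gdb commands in a file. The following sketch only generates such a file. It assumes the remote-kdp connection and the kernel debugging macro file set up earlier in this chapter, and that the target is already waiting for a debugger connection. The function name hang_commands, the macro file path, and the target name are placeholders, not values from this document.

```shell
#!/bin/sh
# Emit the gdb commands for inspecting a hung target.
#   $1  path to the kernel debugging macro file (placeholder)
#   $2  host name or IP address of the hung target (placeholder)
hang_commands() {
    macros=$1
    target=$2
    cat <<EOF
source $macros
target remote-kdp
attach $target
showallstacks
EOF
}

# Example invocation with made-up values.
hang_commands ./kgmacros target.example.com
```

Redirect the output to a file and, at the gdb prompt on the host, load it with the source command. The showallstacks output is long, so expect to scroll back through it when you look for deadlocked threads.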

Debugging Boot Drivers

The Mac OS X BootX booter copies drivers for hardware required in the boot process into memory for the kernel’s boot-time loading code to load. Because boot drivers are already loaded by the time the system comes up, you do not have as much control over them as you do over non-boot drivers. In addition, a badly behaving boot driver can cause your system to become unusable until you are able to unload it. For these reasons, debugging techniques for boot drivers vary somewhat from those for other drivers.

The most important step you can take is to treat your boot driver as a non-boot driver while you are in the development phase. Remove the OSBundleRequired property from your driver’s Info.plist file and use the techniques described in this chapter to make sure the driver is performing all its functions correctly before you declare it to be a boot driver.

After you’ve thoroughly tested your driver, add the OSBundleRequired property to its Info.plist (see the document Loading Kernel Extensions at Boot Time to determine which value your driver should declare). This will cause the BootX booter to load your driver into memory during the boot process.

If your boot driver does have bugs you were unable to find before, you cannot use gdb to debug it because it is not possible to attach to a computer while it is booting. Instead, you must rely on IOLog output to find out what is happening. IOLog is synchronous when you perform a verbose boot, so you can place IOLog statements throughout your boot driver’s code to track down the bugs. See “Using IOLog” for more information on this function.

To perform a verbose boot, reboot holding down both the Command and V keys. To get even more detail from the I/O Kit, you can set a boot-args flag before rebooting. Using the sudo command to assume root privileges, type the following on the command line:

% sudo nvram boot-args="io=0xffff"
Password:
% sudo shutdown -r now

Although this technique produces voluminous output, it can be difficult to examine because it scrolls off the screen during the boot process. If your boot driver does not prevent the system from completing the boot process, you can view the information in its entirety in the system log at /var/log/system.log.
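Once the system is up, you can pull only your driver's messages out of the log rather than paging through all of it. This is a minimal sketch: the function name driver_log_lines and the tag MyBootDriver are hypothetical, and it assumes that your IOLog calls begin each message with a recognizable tag such as the driver's class name.

```shell
#!/bin/sh
# Print the lines of a log file that contain a driver's tag.
#   $1  fixed string your IOLog messages contain (placeholder)
#   $2  log file to search; defaults to the system log
driver_log_lines() {
    tag=$1
    log=${2:-/var/log/system.log}
    # -F treats the tag as a literal string, not a regular expression.
    grep -F -- "$tag" "$log"
}

# Example (not run here): driver_log_lines MyBootDriver
```

Passing the log path as a second argument lets you run the same filter over a copy of the log saved from an earlier boot.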



Last updated: 2007-03-06





Copyright © 2007 Apple Inc.
All rights reserved.