Debugging Thunderbolt Drivers

This chapter contains debugging tips that may be useful when tracking down problems in Thunderbolt drivers.

Debugging VT-d I/O MMU Virtualization

Newer Thunderbolt-capable Macs provide support for I/O MMU virtualization (VT-d). This technology allows virtual machines to have direct access to hardware. As a consequence of this support, when your device performs DMA operations, the I/O addresses it uses may be different from the physical addresses as seen by the OS X kernel.

For the complete specification, see the Intel® Virtualization Technology for Directed I/O Specification.

Ivy Bridge systems running OS X v10.8.2 and later enable the Intel VT-d unit as a DMA remapper. This functionality is supported by the current APIs in much the same way as the DART controller on Power Mac G5 computers. (To learn how this works at a high level, read Supporting DMA on 64-Bit System Architectures.)

When VT-d is enabled, some of the changes you'll see as a result are:

  • If you ask for unmapped physical addresses and try to use them for DMA, your driver will break, because the device must be given I/O addresses rather than physical addresses. This happens when you call an IOMemoryDescriptor object’s getPhysicalSegment method with the option kIOMapperNone, or when you call an IODMACommand object’s initWithSpecification method with mappingOptions set to something other than kMapped.
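To make the difference between I/O addresses and physical addresses concrete, here is a toy user-space model of a DMA remapper. It is not I/O Kit code and does not reflect how the VT-d hardware is programmed; the ToyRemapper class, its methods, the page size, and the starting I/O address are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>

// Toy model of a DMA remapper. ToyRemapper is a hypothetical name used only
// for this sketch; it is not an I/O Kit class.
class ToyRemapper {
public:
    // Map one page-aligned physical page and return the I/O address that the
    // device must use to reach it.
    std::uint64_t map(std::uint64_t physicalPage) {
        std::uint64_t ioAddress = nextIoAddress_;
        nextIoAddress_ += kPageSize;
        table_[ioAddress] = physicalPage;
        return ioAddress;
    }

    // Translate an address issued by the device. Returns false when no
    // mapping exists, which models a remapping fault.
    bool translate(std::uint64_t deviceAddress, std::uint64_t *physicalOut) const {
        std::uint64_t page = deviceAddress & ~(kPageSize - 1);
        std::map<std::uint64_t, std::uint64_t>::const_iterator entry = table_.find(page);
        if (entry == table_.end()) {
            return false;  // unmapped I/O address: the access faults
        }
        *physicalOut = entry->second + (deviceAddress - page);
        return true;
    }

private:
    static constexpr std::uint64_t kPageSize = 0x1000;  // assumed page size
    std::uint64_t nextIoAddress_ = 0x80000000;          // arbitrary start of I/O space
    std::map<std::uint64_t, std::uint64_t> table_;      // I/O page -> physical page
};
```

In this model, a device that is handed the value returned by map() reaches the intended page, while a device that is handed the raw physical address has no mapping to use and the access fails.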

Enabling VT-d Panics

By default, VT-d faults are logged to /var/log/system.log. A VT-d fault looks like the following:

vtd[0] fault: device 13:0:0 reason 0x6 R:0x3dff000

In the example above, 13:0:0 is the bus, device, and function of the device that generated the fault. You can determine whether your device produced the fault by comparing these values with the pci-debug field shown in the output of the ioreg tool or in the IORegistryExplorer application.

For a list of reason codes, see the Intel® Virtualization Technology for Directed I/O Specification.

The R indicates that a read operation triggered the fault. (A W indicates a write.) The final value is the address that the device was trying to read or write (expressed as an I/O-space address).
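If you need to scan a long log for faults from one device, a small user-space parser can help. The following is a minimal sketch that assumes every fault line has exactly the format of the example above. The VtdFault structure and the parseVtdFault function are hypothetical names, not part of any Apple API. The bus, device, and function values are kept as text, so no assumption is made about their radix.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Fields of one VT-d fault line. Hypothetical type for this sketch.
struct VtdFault {
    unsigned unit = 0;          // index inside "vtd[...]"
    std::string device;         // "bus:device:function", kept as text
    unsigned reason = 0;        // reason code (see the Intel specification)
    char access = '?';          // 'R' for a read, 'W' for a write
    std::uint64_t address = 0;  // faulting I/O-space address
};

// Returns true and fills *out if the line contains a fault record in the
// assumed format. Any text before "vtd[" (such as a timestamp) is skipped.
bool parseVtdFault(const std::string &line, VtdFault *out) {
    std::size_t start = line.find("vtd[");
    if (start == std::string::npos) {
        return false;
    }

    char device[32] = { 0 };
    unsigned unit = 0;
    unsigned reason = 0;
    char access = 0;
    unsigned long long address = 0;

    // %x and %llx accept an optional "0x" prefix.
    int matched = std::sscanf(line.c_str() + start,
                              "vtd[%u] fault: device %31s reason %x %c:%llx",
                              &unit, device, &reason, &access, &address);
    if (matched != 5 || (access != 'R' && access != 'W')) {
        return false;
    }

    out->unit = unit;
    out->device = device;
    out->reason = reason;
    out->access = access;
    out->address = address;
    return true;
}
```

After parsing, compare the device member with the pci-debug value of your device to decide whether the fault belongs to your driver.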

When debugging faults, it can be useful to configure your kernel to panic whenever a fault occurs. To do this, add the following flag to your kernel boot args:

pci=0x100

Disabling VT-d

When debugging PCIe device drivers, it is often useful to temporarily disable VT-d so that I/O addresses are the same as the corresponding physical addresses. To disable VT-d, add the following to your kernel boot args:

dart=0x0

If the problem you are debugging goes away in this mode, it usually indicates one of the following mistakes:

  • Some part of your code is incorrectly passing a physical address in RAM to your device for DMA purposes instead of an I/O address.

  • Your code failed to call prepare or called prepare incorrectly on an IOMemoryDescriptor object before using it to perform DMA.

Debugging PCIe Pause

By default, the OS X kernel spreads out PCI address allocations. As a result, PCIe pause events occur infrequently, which makes these events challenging to debug. You can make debugging easier by disabling this allocation spreading behavior. With spreading disabled, nearly every hot plug event triggers a pause event.

To disable allocation spreading, add the following to your kernel boot args:

pci=0x200

Avoiding Memory Leaks

Drivers should release every resource they acquire over their lifetime. In particular, make sure that memory-mapped I/O ranges, memory, and objects are all freed, and double-check for leaks. Tools such as ioclasscount can help identify some of these leaks. A common kind of leak is caused by unbalanced references (that is, retain counts), and the source of such references is often difficult to determine. One way to find it is to override the taggedRetain() method and obtain a backtrace of the caller, which you can then send to printf(), kprintf(), or IOTimeStampConstant().

Listing 4-1  Searching for Memory Leaks

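   void
   AppleSamplePCI::taggedRetain ( const void * tag ) const
   {
             void *    bt[16] = { 0 };

             // Capture up to 16 return addresses from the caller's stack.
             OSBacktrace ( &bt[0], sizeof ( bt ) / sizeof ( bt[0] ) );

             // Log the addresses in bt here (for example, with kprintf()),
             // then forward to the superclass so the retain still happens.
             super::taggedRetain ( tag );
   }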

Using standard symbolication tools allows you to determine which functions or methods caused the references to be taken. Furthermore, you can override taggedRelease() and match the retains and releases to find calls to taggedRetain() that have no corresponding call to taggedRelease().
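The matching step can be done offline from the events that your overrides log. The following user-space sketch assumes that each logged event carries the tag value passed to taggedRetain() or taggedRelease(). RetainLedger and its methods are hypothetical names, not I/O Kit classes. Several callers may share a tag, so the logged backtraces are still needed to pin down the exact call site.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Tallies logged retain and release events per tag. Hypothetical helper for
// offline analysis; it does not run in the kernel.
class RetainLedger {
public:
    void noteRetain(std::uintptr_t tag)  { ++balance_[tag]; }
    void noteRelease(std::uintptr_t tag) { --balance_[tag]; }

    // Returns the tags whose retains and releases do not cancel out,
    // in ascending order.
    std::vector<std::uintptr_t> unbalancedTags() const {
        std::vector<std::uintptr_t> tags;
        for (const auto &entry : balance_) {
            if (entry.second != 0) {
                tags.push_back(entry.first);
            }
        }
        return tags;
    }

private:
    std::map<std::uintptr_t, long> balance_;  // tag -> retains minus releases
};
```

Feed every logged retain and release into the ledger, then inspect the backtraces recorded for the tags it reports as unbalanced.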