Could you share your understanding of the crash and give any hints on how we can fix it?
So, let me actually start by commenting on this:
At first, we assumed that the issue is with hardware.
The first thing to understand here is that DEXTs are FULLY capable of panicking the kernel and probably always will be, particularly PCI DEXTs. The main benefit DEXTs provide is that they DRAMATICALLY improve overall system security and reduce risk by constraining the "range" of what it's POSSIBLE for a component to do. Your DEXT only has access to a very limited set of kernel data, so that's the ONLY kernel data your DEXT can interact with. It's possible for a network DEXT to disrupt the network stack, but it's very difficult to see how it would disrupt the file system.
However, your DEXT is still being given access to many of the same resources it would have access to as a KEXT, and many of those resources are inherently dangerous. In the case of the PCI family, that issue is quite direct: I don't know of any way to build a "safe" API that allows for the direct manipulation of physical memory, bus addresses, and DMA.
Shifting to the panic logs:
Please let us know if you need any additional data. Thank you
For reference, this forum thread outlines how to symbolicate our modern kernel panic format. The process is a bit laborious, but it will ultimately give you a stack trace for every thread in the system at the point you panic. In any case, if you symbolicate either panic, you'll find that both panics are from your driver:
0 kernel.release.t6041 0xfffffe0008af1e58 panic_trap_to_debugger + 944 (debug.c:1403)
1 kernel.release.t6041 0xfffffe00093f59c8 panic + 60 (debug.c:1159)
2 kernel.release.t6041 0xfffffe0008c8f334 generic_platform_error_handler + 2220 (generic_platform_error_handler.c:803)
3 kernel.release.t6041 0xfffffe0008c65de4 sleh_synchronous + 412 (sleh.c:1442)
4 kernel.release.t6041 0xfffffe0008aa3d48 fleh_synchronous + 72
5 [2, 0]
User Frames
0 PCIDriverKit 0x18018f798 IOPCIDevice::MemoryRead32(unsigned char, unsigned long long, unsigned int*, unsigned int) + 96 (IOPCIDevice.cpp:295)
1 [305, 70596]
2 [305, 46508]
3 [305, 16428]
4 [305, 27460]
5 [305, 24932]
6 DriverKit 0x1800731e8 IOTimerDispatchSource::TimerOccurred_Invoke(IORPC, OSMetaClassBase*, void (*)(OSMetaClassBase*, OSAction*, unsigned long long), OSMetaClass const*) + 152 (IOTimerDispatchSource.iig.cpp:845)
7 [305, 83236]
8 [305, 81620]
9 DriverKit 0x1800490cc OSMetaClassBase::Invoke(IORPC) + 772 (uioserver.cpp:1614)
10 DriverKit 0x18006a700 IODispatchSource::CheckForWork(bool, int (*)(OSMetaClassBase*, IORPC)) + 316 (IODispatchSource.iig.cpp:631)
11 DriverKit 0x18004dffc invocation function for block in IOTimerDispatchSource::Create_Impl(IODispatchQueue*, IOTimerDispatchSource**) + 192 (uioserver.cpp:4159)
12 libdispatch.dylib 0x180ad4c48 _dispatch_continuation_pop + 600 (queue.c:349)
13 libdispatch.dylib 0x180ae7c84 _dispatch_source_invoke + 2712 (source.c:966)
14 libdispatch.dylib 0x180ad8f44 _dispatch_lane_serial_drain + 336 (queue.c:3991)
15 libdispatch.dylib 0x180ad9bf0 _dispatch_lane_invoke + 440 (queue.c:4082)
16 libdispatch.dylib 0x180adaf0c _dispatch_workloop_invoke + 1624 (queue.c:4761)
17 libdispatch.dylib 0x180ae42b8 _dispatch_root_queue_drain_deferred_wlh + 292 (queue.c:7265)
18 libdispatch.dylib 0x180ae3ba8 _dispatch_workloop_worker_thread + 692 (queue.c:6859)
19 libsystem_pthread.dylib 0x180c6e66c _pthread_wqthread + 408 (pthread.c:2696)
20 libsystem_pthread.dylib 0x180c756fc start_wqthread + 8
I don't have the symbol data necessary to symbolicate your DEXT's frames (the bracketed entries such as [305, 70596]), but you can do it using the instructions I referenced earlier. That leads to here:
The error points to the same physical address FAR=0xa40100008 even if the hosts are different.
My guess is that you've got some kind of memory corruption issue in your DEXT, which is then leading to the same invalid address being passed into MemoryRead32 on every host. The invalid address then panics the kernel.
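One cheap way to test that theory is to validate the arguments in your DEXT before they ever reach MemoryRead32, and log anything that fails. What follows is only a rough sketch, not anything PCIDriverKit provides: `BarRange` and `IsSafeRead32` are hypothetical names, and it assumes your driver records the size of each BAR it maps at start-up. It also only catches a bad index or offset; it won't tell you where the corruption itself is coming from.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record of a BAR's mapped size, captured by the driver at start-up.
struct BarRange {
    uint64_t sizeInBytes;
};

// Returns true only if a 4-byte read at `offset` lies entirely inside the BAR
// selected by `memoryIndex`. Intended to run before IOPCIDevice::MemoryRead32.
inline bool IsSafeRead32(const BarRange *bars, size_t barCount,
                         uint8_t memoryIndex, uint64_t offset)
{
    // Reject a missing table or a BAR index the driver never mapped.
    if (bars == nullptr || memoryIndex >= barCount) {
        return false;
    }

    const uint64_t size = bars[memoryIndex].sizeInBytes;

    // A BAR smaller than one register cannot satisfy a 32-bit read.
    if (size < sizeof(uint32_t)) {
        return false;
    }

    // 32-bit register reads are expected to be 4-byte aligned.
    if ((offset & 0x3) != 0) {
        return false;
    }

    // Written this way so the comparison cannot overflow.
    return offset <= size - sizeof(uint32_t);
}
```

If a check like this starts failing inside your timer callback, log the index and offset at that point. That gives you the corrupted value to trace back, instead of a kernel panic.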
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware