sysext crashed while sending lots of logs to host app

Hi all.

I subscribe to the notify write event. Every time I receive a notify write event message, I send log data to the host app with an async method (Obj-C XPC API), passing a reply block that does nothing. The host app replies immediately once it receives the data. After a while my sysext crashed, and when I checked the system log I found the line below.

launchd: exited with exit reason (namespace: 30 code: 0xc40000000004aaaa) - (unknown reason)

Is it because I exceeded the maximum length of XPC's block queue, or made too many memory allocations, or something else?

By the way, the host app didn't crash.

How exactly did this happen, and how can I solve it?
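For reference, my setup looks roughly like the sketch below. This is a reconstruction for illustration only (and in Swift rather than my real Obj-C code); the protocol, class, and method names are placeholders.

import EndpointSecurity
import Foundation

// Placeholder XPC protocol exposed by the host app.
@objc protocol HostAppLogging {
    func receiveLog(_ data: Data, reply: @escaping () -> Void)
}

final class LogForwarder {
    private let connection: NSXPCConnection

    init(machServiceName: String) {
        connection = NSXPCConnection(machServiceName: machServiceName)
        connection.remoteObjectInterface = NSXPCInterface(with: HostAppLogging.self)
        connection.resume()
    }

    // Called once for every ES_EVENT_TYPE_NOTIFY_WRITE message I receive.
    func handle(_ message: UnsafePointer<es_message_t>) {
        let logData = makeLogData(from: message)   // serialize whatever I log
        let proxy = connection.remoteObjectProxy as? HostAppLogging
        proxy?.receiveLog(logData) {
            // Reply block does nothing.
        }
    }

    private func makeLogData(from message: UnsafePointer<es_message_t>) -> Data {
        Data()   // placeholder
    }
}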

Answered by DTS Engineer in 789091022

I’d like to clarify your architecture here. You wrote:

Every time I receive a notify write event message, I send log data to the host app with an async method (Obj-C XPC API), passing a reply block that does nothing.

What do you mean by “host app” in this context? Do you mean the container app? That is, the app in which the sysext is embedded?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Accepted Answer

So, starting with the error:

launchd: exited with exit reason (namespace: 30 code: 0xc40000000004aaaa) - (unknown reason)

The "namespace" values comes from reason.h in XNU and "30" is "OS_REASON_PORT_SPACE". Basically, something went wrong with your mach port. The error code is in the kernel's structured error format, which uses a bit field structured as (system | subsystem | code). IOKit's IOReturn.h has the clearest description of the format that I'm aware though, to be clear, that is NOT an IOKit error. You could decode the specific error by digging into mach's error handling, but I'm fairly sure I know what's going wrong based on this:

I subscribe to the notify write event. Every time I receive a notify write event message, I send log data

The key word there is "every time". Those words are a warning sign with any ES message, especially when tied to a high-volume message like notify write. What you're describing means that every call to "write" across the entire monitored system will end up generating a mach message to a single mach port being processed by a single thread. If that thread can't process those messages as fast as they're generated, then the only question is what will break first*.

*As a side note here, I assume you muted the host app (the receiving app)? If you haven't and the host app then writes every log message to disk... well, that's going to fail a lot faster.
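If you haven't muted it, the sketch below shows the kind of thing I mean. It assumes macOS 12 or later (for es_mute_path), and the host app's executable path is just a placeholder.

import EndpointSecurity

// Sketch: mute events *initiated by* the receiving app, so its own disk
// writes (for example, persisting the log data you just sent it) don't feed
// straight back into your ES client. The path is a placeholder.
func muteHostApp(_ client: OpaquePointer) {
    let hostAppExecutable = "/Applications/MyHostApp.app/Contents/MacOS/MyHostApp"
    if es_mute_path(client, hostAppExecutable, ES_MUTE_PATH_TYPE_LITERAL) != ES_RETURN_SUCCESS {
        // Handle the failure (log it, fall back to es_mute_process(), etc.).
    }
}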

The big thing to understand is that the actual risk here ISN'T normal system load, it's an attacker who is actively trying to break your ES client. How many write calls per second can a process generate? Say, 100 threads calling "write(1 byte)" in a tight loop? I'd need to test it to say for certain, but it's certainly "1000s", probably "10,000s", maybe "100,000s"? And, of course, there's no reason an attacker needs to stick to one process. Whatever the "base" count is, an attacker can multiply it indefinitely by duplicating itself. The key point is that this isn't something you can fix by optimizing or tweaking your current approach. The attacker can generate LOTS of writes with very little effort.

The solution here is to break the direct connection between notify messages and your own messaging. Typically, that means your ES client collects the data over time, then pushes it to your other process at a slower/throttled pace.
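As a rough sketch of what I mean (the names, interval, and buffer limit are all placeholders, not a drop-in implementation):

import Foundation

// Sketch: decouple ES event handling from XPC sending. The ES handler only
// appends to an in-memory buffer; a timer drains the buffer and sends one
// batched XPC message at a fixed pace.
final class ThrottledLogSender {
    private let queue = DispatchQueue(label: "log-sender")  // serializes buffer access
    private var buffer: [Data] = []
    private let maxBuffered = 10_000                        // hard cap so the buffer can't grow forever
    private let timer: DispatchSourceTimer
    private let send: ([Data]) -> Void                      // e.g. wraps your XPC proxy call

    init(interval: TimeInterval, send: @escaping ([Data]) -> Void) {
        self.send = send
        timer = DispatchSource.makeTimerSource(queue: queue)
        timer.schedule(deadline: .now() + interval, repeating: interval)
        timer.setEventHandler { [weak self] in self?.flush() }
        timer.resume()
    }

    // Called from the ES event handler; cheap and non-blocking.
    func append(_ record: Data) {
        queue.async {
            if self.buffer.count < self.maxBuffered {
                self.buffer.append(record)
            } else {
                // Buffer is full: drop (or coalesce/count) rather than queue without bound.
            }
        }
    }

    private func flush() {
        guard !buffer.isEmpty else { return }
        let batch = buffer
        buffer.removeAll(keepingCapacity: true)
        send(batch)   // one XPC message per flush, not one per ES event
    }
}

The important property is that the rate of your XPC traffic is now set by the timer, not by how fast an attacker can call "write".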


-Kevin Elliott
DTS Engineer, CoreOS Hardware

I dug into the XNU code and found the SENDING_NOTIFICATION__THIS_PROCESS_HAS_TOO_MANY_MACH_PORTS function in task.c; the error code generated there seems to match the log I mentioned before. I should clarify that I use dispatch_async to dispatch my es_msg_handling to a concurrent queue. I don't know much about XPC's underlying mechanism, but it seems that too many threads using XPC to send too much data ends up generating too many mach ports.

In another scenario, I kept the amount of log data to send the same, but the threads handed the send operation off to a serial queue. Eventually, the system extension exited because of OS_REASON_JETSAM. So it seems OK to send data through XPC on a single thread no matter how much data there is, or at least this crash may not happen before jetsam kicks in.

What you're describing means that every call to "write" across the entire monitored system will end up generating a mach message to a single mach port being processed by a single thread.

I dug into the XNU code and found the SENDING_NOTIFICATION__THIS_PROCESS_HAS_TOO_MANY_MACH_PORTS function in task.c; the error code generated there seems to match the log I mentioned before.

Yes, that's probably it.

I should clarify that I use dispatch_async to dispatch my es_msg_handling to a concurrent queue. I don't know much about XPC's underlying mechanism, but it seems that too many threads using XPC to send too much data ends up generating too many mach ports.

XPC effectively operates as a "wrapper" around standard mach port communication and, yes, that could lead to excessive mach port usage if used incorrectly. Similarly:

In another scenario, I kept the amount of log data to send the same, but the threads handed the send operation off to a serial queue. Eventually, the system extension exited because of OS_REASON_JETSAM. So it seems OK to send data through XPC on a single thread no matter how much data there is, or at least this crash may not happen before jetsam kicks in.

Moving everything to a single thread will consolidate mach port usage. The problem is that IF your messages are coming in faster than you can send them, then you'll eventually crash because of "something else", as the collection of queued messages grows over time. That's why you also need to consider how many messages you actually need/want to send and constrain your activity to that.
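For example (purely illustrative, not necessarily what your product needs), instead of queueing one record per write you might coalesce them, so memory is bounded by the number of distinct writers rather than by how fast an attacker can call "write":

import Foundation

// Illustrative sketch: fold per-write records into per-(pid, path) counters.
// Intended to be used from a single serial queue, like the sender sketch above.
struct WriteKey: Hashable {
    let pid: pid_t
    let path: String
}

final class CoalescingLogBuffer {
    private var counts: [WriteKey: Int] = [:]

    func record(pid: pid_t, path: String) {
        counts[WriteKey(pid: pid, path: path), default: 0] += 1
    }

    // Drain the current summary; called from the throttled sender.
    func drain() -> [WriteKey: Int] {
        defer { counts.removeAll(keepingCapacity: true) }
        return counts
    }
}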


Kevin Elliott
DTS Engineer, CoreOS/Hardware
