NEMachServiceName failure to access after network extension upgrade

We have a product that uses a Network Extension (a socket filter and a packet content filter). The application contains the network extension, as well as an un-sandboxed LaunchDaemon that connects to the XPC service the extension publishes under its NEMachServiceName.

Occasionally, usually after an upgrade in which the system extension is replaced with the new version, our un-sandboxed process can't contact the network extension. The logging in the un-sandboxed process shows the following XPC error:

(libxpc.dylib) [com.apple.xpc:connection] [0x7fd6d0307f40] failed to do a bootstrap look-up: xpc_error=[3: No such process]

Eventually, we receive an invalidation callback on the XPC connection with the error “Couldn’t communicate with a helper application.” We have confirmed with the launchctl command that the appropriate service is running, and the network extension process appears to have initialised correctly. However, we see no indication that the network extension process received a connection (probably not surprising given the error).

Once a system enters this state, repeated attempts to connect are unsuccessful and continue to produce the same error.

We've also confirmed that there are no XPC codec exceptions apparent that might cause the connection to fail.

I'm at a bit of a loss to explain why this failure might be occurring, other than launchd's bootstrap look-up failing to find the appropriate service. Could there be a problem with un-sandboxed processes accessing the sandboxed network extension via XPC? They are both provisioned in the same app group. Or could attempting to connect at a critical point during network extension installation leave the service inaccessible?

We've observed this specifically on macOS 14.5 (23F79), but we've also noticed it on other versions of macOS and of our code. The problem isn't systematic; systems end up in this state only occasionally. Some customers seem to hit it more often than others, but we haven't managed to tease out a common thread that would explain why.

I think I’ve seen this before. Check out this thread.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Ah yes, this one looks familiar; thanks for pointing me to it. If I follow it correctly it seems the current workaround available is to unload and reload the system extension?

Our app (and a few other things) is distributed as a .pkg rather than as a drag-and-drop DMG. Installing the system extension is done by our host app's main binary: we install a LaunchAgent that detects when installation needs to take place and triggers an app launch via NSWorkspace in the user's session. Most customers perform the pkg installation through fleet management tools (such as Jamf). I suppose it's possible they are doing this outside a user session, which could cause some problems?
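For illustration, here's a minimal sketch of the kind of check such a LaunchAgent might make before launching the host app. It assumes, hypothetically, that the agent compares the extension version bundled in the app with the last version it recorded as activated, both as dotted numeric strings; the post doesn't say how the real detection works, and the function names are made up.

```swift
// Hypothetical helper: decides whether the LaunchAgent should launch the host
// app so that it can submit a system extension activation request.
// Assumes both versions are dotted numeric strings such as "2.10.1"
// (e.g. CFBundleVersion); how the "last activated" version is stored is up to you.

/// Parses "2.10.1" into [2, 10, 1]. Non-numeric components become 0.
func versionComponents(_ version: String) -> [Int] {
    version.split(separator: ".").map { Int($0) ?? 0 }
}

/// Compares two dotted versions numerically, padding the shorter one with
/// zeros, so "2.10" > "2.9" and "2.1" == "2.1.0".
/// Returns -1, 0, or 1.
func compareVersions(_ a: String, _ b: String) -> Int {
    let x = versionComponents(a)
    let y = versionComponents(b)
    for i in 0..<max(x.count, y.count) {
        let l = i < x.count ? x[i] : 0
        let r = i < y.count ? y[i] : 0
        if l != r { return l < r ? -1 : 1 }
    }
    return 0
}

/// True when the bundled extension is newer than the one last activated,
/// or when nothing has been recorded as activated yet.
func needsActivation(bundled: String, lastActivated: String?) -> Bool {
    guard let activated = lastActivated else { return true }
    return compareVersions(bundled, activated) > 0
}
```

Comparing components numerically rather than as strings avoids the classic "2.10" < "2.9" mistake.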

Is there anything we can provide that would let Apple do further diagnostics on this? Most of our customers probably wouldn't notice, since they manage system extension permissions via MDM, but for the few who don't, the user experience is pretty sub-optimal.

Sorry I didn’t reply earlier. Due to a technical mixup at my end, I missed your post.

If I follow it correctly it seems the current workaround available is to unload and reload the system extension?

That’s the way I interpret it as well, but I’ve not actually tried replicating the issue myself.
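If you automate that workaround, one option is to stop retrying once the failure looks persistent and escalate instead, since repeated connection attempts keep failing once a system is in this state. Below is a minimal sketch of that decision logic, assuming a capped exponential backoff and a threshold of five consecutive failures. The type names, the defaults, and what escalation actually does (for example, having the host app deactivate and reactivate the extension) are assumptions, not part of the thread.

```swift
// Hypothetical reconnect policy for the daemon's XPC client. Retries with a
// capped exponential backoff, then escalates after a fixed number of
// consecutive failures instead of retrying forever. What "escalate" triggers
// is left to the caller (an assumption, e.g. reloading the system extension).

enum ReconnectAction: Equatable {
    case retry(afterSeconds: Double)
    case escalate
}

struct ReconnectPolicy {
    var baseDelay: Double = 1.0   // delay before the first retry, in seconds
    var maxDelay: Double = 60.0   // upper bound on any single delay
    var maxAttempts: Int = 5      // consecutive failures at which we escalate

    /// `consecutiveFailures` counts failed connection attempts since the
    /// last success (1 for the first failure).
    func action(afterFailures consecutiveFailures: Int) -> ReconnectAction {
        if consecutiveFailures >= maxAttempts { return .escalate }
        // Exponential backoff: base, 2*base, 4*base, ... capped at maxDelay.
        let exponent = max(consecutiveFailures - 1, 0)
        var delay = baseDelay
        for _ in 0..<exponent { delay *= 2 }
        return .retry(afterSeconds: min(delay, maxDelay))
    }
}
```

Keeping the policy as a pure value type makes it easy to test separately from the XPC code that acts on its decisions.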

Is there anything we can provide that would let Apple do further diagnostics on this?

It’s tricky, given that you can’t reproduce it at will. I see two avenues to explore here:

  • Try to make it more reproducible

  • Debug it in the field

There’s a definite trade-off here. If you’re able to reproduce it, you’ll be able to file a better bug. And you’ll be able to test any workarounds you come up with. OTOH, it’s possible that it only reproduces at random, in which case you’ve spent a lot of time with no benefit.

Anyway, I don’t have any good advice on how you might try to reproduce it. You know your product much better than I do!

In terms of debugging it in the field, I have some general advice on that front in Using a Sysdiagnose Log to Debug a Hard-to-Reproduce Problem.
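As a small aid to that, here's a sketch of a helper that scans exported log text for the bootstrap look-up failure quoted at the start of this thread and pulls out the xpc_error code and message. It assumes the line format from that single example; the function and type names are hypothetical, and other macOS releases may word the message differently.

```swift
import Foundation

// Hypothetical field-diagnostics helper. The marker text comes from the one
// log line quoted in this thread; treat it as an assumption, not a stable format.

struct BootstrapFailure: Equatable {
    let code: Int
    let message: String
}

/// Returns the failure details if `line` contains
/// "failed to do a bootstrap look-up: xpc_error=[<code>: <message>]".
func parseBootstrapFailure(_ line: String) -> BootstrapFailure? {
    let marker = "failed to do a bootstrap look-up: xpc_error=["
    guard let markerRange = line.range(of: marker) else { return nil }
    // Text between the "[" and the next "]", e.g. "3: No such process".
    let rest = line[markerRange.upperBound...]
    guard let close = rest.firstIndex(of: "]") else { return nil }
    let body = rest[..<close]
    guard let colon = body.firstIndex(of: ":"),
          let code = Int(body[..<colon]) else { return nil }
    let message = String(body[body.index(after: colon)...])
        .trimmingCharacters(in: .whitespaces)
    return BootstrapFailure(code: code, message: message)
}

/// Counts lines in a block of log text that match the failure pattern.
func countBootstrapFailures(in logText: String) -> Int {
    logText.split(separator: "\n")
        .filter { parseBootstrapFailure(String($0)) != nil }
        .count
}
```

Run against the log line from the original report, this yields code 3 with the message "No such process".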

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
