How to know when `NEPacketTunnelProvider` has been cleaned up?

I have noticed race conditions on macOS when tearing down and re-configuring an NEPacketTunnelProvider.

My goal is to handle switching out one VPN profile for another identical/near identical one (I'll add some context for this below).

The flow I tested waits for the NEVPNStatusDidChange notification to report the NEVPNStatus.disconnected state, and then starts re-configuring the VPN with a new profile.
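A minimal sketch of that flow, assuming the completion-handler style used in this thread: a small helper (the `StatusWatcher` name is hypothetical) that fires once, the first time a specific configuration's connection reports `.disconnected`. As the next paragraph explains, this signal on its own did not prove that the system had finished cleaning up.

```swift
import Foundation
import NetworkExtension

/// Hypothetical helper: calls `handler` once, the first time `connection`
/// reports `.disconnected`. Call from the main thread and keep a strong
/// reference to the watcher while waiting.
final class StatusWatcher {
    private var token: NSObjectProtocol?
    private var fired = false

    func onDisconnected(of connection: NEVPNConnection,
                        handler: @escaping () -> Void) {
        // Observe only this configuration's connection, not every VPN.
        token = NotificationCenter.default.addObserver(
            forName: .NEVPNStatusDidChange, object: connection, queue: .main
        ) { [weak self] _ in
            if connection.status == .disconnected {
                self?.fire(handler)
            }
        }
        // The connection may already be down, in which case no notification arrives.
        if connection.status == .disconnected {
            fire(handler)
        }
    }

    func cancel() {
        if let token = token {
            NotificationCenter.default.removeObserver(token)
        }
        token = nil
    }

    private func fire(_ handler: () -> Void) {
        guard !fired else { return }   // at-most-once delivery
        fired = true
        cancel()
        handler()
    }
}
```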

In practice, however, I've found that I must wait a couple of seconds between the NEVPNStatus.disconnected state being reported and setting up the new tunnel. Otherwise, the system routing table ends up wrong while the VPN reports the NEVPNStatus.connected state, leaving a tunnel that appears healthy but can't be reached.

So, do you have any suggestions for OS state I can observe to know deterministically that the system has fully cleaned up my packet tunnel, and that it's safe to configure another? That would be much better than a hard-coded delay.

Additional context:

Jamf is a common solution for deploying MDM configuration profiles. However, in my tests, it doesn't support Apple's recommended approach of using the PayloadIdentifier to mark profiles for replacement, because Jamf automatically rewrites each profile's PayloadIdentifier to match its PayloadUUID on upload. That said, given what I've observed, I'm not sure Apple's recommended approach would work here anyway.

I'd also like to transition cleanly from a non-MDM to an MDM configuration; however, this also requires an indeterminate wait between the non-MDM configuration being disconnected and removed, and the MDM one being configured.

In both scenarios, we need to be able to add a second configuration (possibly with identical VPN settings), then remove the old one so that the system transitions to the new configuration.

For the MDM case, the pattern I've noticed is that when the current profile is suddenly deleted, the connection goes into the disconnected state and then NEVPNConfigurationChange fires. The new profile can be configured from the NEVPNConfigurationChange handler, but some extra time is needed to avoid races.

For the non-MDM case, I experimented with polling for MDM configurations to appear. When one does, I remove my previous notification observers and set up a new NEVPNStatusDidChange observer that removes the non-MDM VPN configuration after it enters the disconnected state. After the removal, I call a function to reconfigure the VPN with the new configuration. Once this logic is in place, I call stopVPNTunnel(). Again, a hard-coded delay is required between stopping and removing the old configuration and setting up the new one.

Thanks!

Answer 1: DTS Engineer (Apple)

I want to get a better understanding of what you’re actually doing. Lemme summarise my understanding so far:

- You have a VPN app for macOS.
- It contains a Network Extension packet tunnel provider.
- From the app, you want to change the app’s VPN configuration.

Is that right?

If so, some questions:

- Is the provider packaged as an appex? Or a sysex? Or have you tried both and seen no difference?
- I presume you’re doing this configuration using NETunnelProviderManager from the container app. If so, what sequence of APIs are you calling?

I’m asking because the usual way to change a tunnel’s settings from the container app is:

1. Load the preferences.
2. Make the changes.
3. Save the preferences.
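For reference, those three steps might look roughly like this with the completion-handler APIs. This is a sketch, assuming the app owns a single NETunnelProviderManager; `serverAddress` is only an example of a setting to change, and `updateServerAddress(to:completion:)` is a hypothetical name.

```swift
import Foundation
import NetworkExtension

/// Hypothetical example: load, modify, save. Assumes the first loaded
/// manager is the app's own (non-MDM) configuration.
func updateServerAddress(to newServerAddress: String,
                         completion: @escaping (Error?) -> Void) {
    // 1. Load the preferences.
    NETunnelProviderManager.loadAllFromPreferences { managers, error in
        guard error == nil, let manager = managers?.first else {
            completion(error)   // nil error here means "no configuration found"
            return
        }
        // 2. Make the changes.
        guard let proto = manager.protocolConfiguration as? NETunnelProviderProtocol else {
            completion(nil)
            return
        }
        proto.serverAddress = newServerAddress
        manager.protocolConfiguration = proto
        // 3. Save the preferences. This doesn't necessarily tear down a running provider.
        manager.saveToPreferences(completionHandler: completion)
    }
}
```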

That last operation doesn’t necessarily tear down the packet tunnel provider. Rather, the provider is expected to observe its configuration — via KVO on the protocolConfiguration property — and adjust accordingly, meaning that no tear down is necessary.
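On the provider side, that might look something like the following sketch. It assumes, as described above, that protocolConfiguration can be observed with KVO; `applyConfiguration(_:)` is a hypothetical placeholder for your own logic, and the actual tunnel setup is elided.

```swift
import Foundation
import NetworkExtension

final class PacketTunnelProvider: NEPacketTunnelProvider {
    private var configObservation: NSKeyValueObservation?

    override func startTunnel(options: [String: NSObject]?,
                              completionHandler: @escaping (Error?) -> Void) {
        // React to new settings saved by the container app instead of being torn down.
        // The KVO handler may run on an arbitrary thread.
        configObservation = observe(\.protocolConfiguration, options: [.new]) { provider, _ in
            provider.applyConfiguration(provider.protocolConfiguration)
        }
        applyConfiguration(protocolConfiguration)
        // ... configure the tunnel (setTunnelNetworkSettings etc.) before completing.
        completionHandler(nil)
    }

    override func stopTunnel(with reason: NEProviderStopReason,
                             completionHandler: @escaping () -> Void) {
        configObservation?.invalidate()
        configObservation = nil
        completionHandler()
    }

    /// Hypothetical: read the new settings and adjust the running tunnel in place.
    private func applyConfiguration(_ configuration: NEVPNProtocol) {
        guard let proto = configuration as? NETunnelProviderProtocol else { return }
        _ = proto.serverAddress
        _ = proto.providerConfiguration
    }
}
```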

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Answer 2: dylandylandylan (OP)

Thanks for the quick response, and apologies for my extra-long answer -- I thought it best to be thorough here! Admittedly, I'm still learning the ropes a little with these APIs, so you may be able to spot some key detail I'm missing.

> Is that right?

Your understanding is mostly correct. However, from the container app, I'm specifically interested in dynamically switching configurations (NETunnelProviderManager instances), e.g., going from a non-MDM configuration to an MDM configuration. Apart from starting and stopping connections, I don't need to modify the configurations themselves (for example, by changing a configuration's server address).

> Is the provider packaged as an appex? Or a sysex? Or have you tried both and seen no difference?

Sorry, I should have specified. It's packaged as a sysex; I haven't experimented with an appex.

> I presume you’re doing this configuration using NETunnelProviderManager from the container app. If so, what sequence of APIs are you calling?

Yes, I'm using the completion-handler (non-async) NETunnelProviderManager APIs from the container app.

I'll provide the example of transitioning from a non-MDM configuration to an MDM configuration as it feels like I have tighter control over it. The sequence is:

1. Call NETunnelProviderManager.loadAllFromPreferences(), then parse the managers to detect whether an MDM configuration exists (our MDM profile has a key for this).
2. (In the step 1 load completion handler) If one is found, call loadAllFromPreferences() again and grab a reference to the VPN profile to be removed. From that completion handler, temporarily remove the NEVPNStatusDidChange and NEVPNConfigurationChange observers so they don't interfere with the upcoming logic.
3. (Still in the step 2 completion handler) Confirm that the NEVPNConnection associated with the old manager is not in the NEVPNStatus.disconnected state. In the problem path, it isn't.
4. (Still in the step 2 completion handler) Set up a temporary NEVPNStatusDidChange handler to watch for the NEVPNStatus.disconnected state. This handler only observes changes to the manager being removed.
   - When the notification reports NEVPNStatus.disconnected, call oldManager.removeFromPreferences().
     - In the removal completion handler, call loadAllFromPreferences(), re-register the previously removed VPN notification handlers, and call startVPNTunnel() on the MDM-added manager.
5. (Back in the step 2 completion handler) Call oldManager.connection.stopVPNTunnel(), which eventually triggers the code in step 4.

In practice, when I do this, I end up with a routing table that's missing the route for my VPN network. To "fix" this, I've been adding a hard-coded delay between entering the removeFromPreferences() callback and starting the new VPN tunnel.
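Condensed into Swift, the sequence above might look like this sketch. It uses the async variants of the same NETunnelProviderManager APIs rather than nested completion handlers (`NotificationCenter.notifications(named:object:)` needs macOS 12 or later), skips re-registering the app's other observers, and keeps the hard-coded delay as a `settleDelay` parameter. `hypotheticalMDMKey` is invented; the thread only says the MDM profile has a key that identifies it.

```swift
import Foundation
import NetworkExtension

/// Hypothetical marker key; the real key name isn't given in the thread.
let hypotheticalMDMKey = "com.example.isMDMManaged"

/// True if the manager's provider configuration carries the hypothetical marker.
func isMDM(_ manager: NETunnelProviderManager) -> Bool {
    let proto = manager.protocolConfiguration as? NETunnelProviderProtocol
    return proto?.providerConfiguration?[hypotheticalMDMKey] as? Bool == true
}

/// Non-MDM -> MDM switch, condensed from steps 1-5. `settleDelay` is the
/// hard-coded workaround from the thread, not a recommended or safe value.
@MainActor
func switchToMDM(settleDelay: TimeInterval = 2) async throws {
    // Steps 1-2: load all configurations; continue only if both kinds exist.
    let managers = try await NETunnelProviderManager.loadAllFromPreferences()
    guard managers.contains(where: isMDM),
          let old = managers.first(where: { !isMDM($0) }) else { return }

    // Steps 3-5: if the old tunnel isn't disconnected yet, stop it and wait.
    if old.connection.status != .disconnected {
        // Create the sequence before stopping, to reduce the chance of missing
        // the transition. A real version should also add a timeout.
        let changes = NotificationCenter.default.notifications(
            named: .NEVPNStatusDidChange, object: old.connection)
        old.connection.stopVPNTunnel()
        for await _ in changes where old.connection.status == .disconnected {
            break
        }
    }

    // Step 4 (inner): remove the old configuration.
    try await old.removeFromPreferences()

    // The workaround: without this pause the routing table could lack the VPN
    // route even though the new tunnel reported .connected.
    try await Task.sleep(nanoseconds: UInt64(settleDelay * 1_000_000_000))

    // Reload and start the MDM-installed configuration (not saved first, per the thread).
    let fresh = try await NETunnelProviderManager.loadAllFromPreferences()
    guard let mdm = fresh.first(where: isMDM) else { return }
    try mdm.connection.startVPNTunnel()
}
```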

> I’m asking because the usual way to change a tunnel’s settings from the container app is: ... 3. Save the preferences

Note: Since the newly added profile is added to the system preferences by MDM, I do not call saveToPreferences() after loading it and prior to calling startVPNTunnel() on it. I only call saveToPreferences when setting up non-MDM configurations.

The steps for transitioning from one MDM-managed VPN configuration to another are shorter and cruder, since they involve yanking an active configuration out of the system:

1. Set up an NEVPNConfigurationChange observer that calls loadAllFromPreferences() and then, from its callback, startVPNTunnel().
2. With one MDM-added configuration already active and connected, add another MDM configuration (with Jamf, in this case), so that there are now two VPN configurations in the system settings: the current one in the .connected state and the new one in the .disconnected state.
3. Use the MDM management tool (Jamf) to remove the currently .connected profile. From what I've observed, this puts it into the disconnected state and, at the same time, triggers the NEVPNConfigurationChange handler from step 1.
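A sketch of the step 1 observer, with the roughly five-second delay described below kept as a parameter. The class name, and the guard that skips starting anything while a tunnel is already up, are assumptions added for illustration; the guard is there in case other configuration changes (such as adding the second profile) also post NEVPNConfigurationChange.

```swift
import Foundation
import NetworkExtension

/// Hypothetical observer for the MDM -> MDM case. `settleDelay` mirrors the
/// ~5 s the thread found necessary; it is a workaround, not a guarantee.
final class ConfigurationChangeHandler {
    private var token: NSObjectProtocol?
    let settleDelay: TimeInterval

    init(settleDelay: TimeInterval = 5) {
        self.settleDelay = settleDelay
    }

    func start() {
        token = NotificationCenter.default.addObserver(
            forName: .NEVPNConfigurationChange, object: nil, queue: .main
        ) { [weak self] _ in
            self?.scheduleReloadAndStart()
        }
    }

    func stop() {
        if let token = token {
            NotificationCenter.default.removeObserver(token)
        }
        token = nil
    }

    private func scheduleReloadAndStart() {
        DispatchQueue.main.asyncAfter(deadline: .now() + settleDelay) {
            NETunnelProviderManager.loadAllFromPreferences { managers, error in
                guard error == nil, let managers = managers else { return }
                // Assumption: only act when no tunnel is up or coming up.
                let active: [NEVPNStatus] = [.connected, .connecting, .reasserting]
                guard !managers.contains(where: { active.contains($0.connection.status) }),
                      let next = managers.first else { return }
                do {
                    try next.connection.startVPNTunnel()
                } catch {
                    print("startVPNTunnel failed: \(error)")
                }
            }
        }
    }
}
```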

In this case, I've noticed that a wait of roughly five seconds after the NEVPNConfigurationChange notification is the margin I need to configure and start the new tunnel without the routing table issues described above. Since the profile is just being yanked out in this case, I definitely understand why there's more room for error when taking down one configuration and adding another.

Answer 3: DTS Engineer (Apple), marked Recommended

Thanks for all that info.

> apologies for my extra-long answer

Au contraire, that’s exactly the sort of info I need (-:

First up, let’s talk about how the API should work. The NETunnelProviderManager APIs are asynchronous for a reason: They’re not supposed to complete until they’re done. In your first example, either one of the following should be true:

- Either removeFromPreferences() should not return [1] until the tunnel is closed.
- Or the open tunnel shouldn’t get in the way of the system establishing the new settings.

The fact that you have to add arbitrary delays is clearly a bug IMO, and I encourage you to file it as such.

Given that you can easily reproduce this, I recommend that you enable extra NE logging by following the VPN (Network Extension) for macOS instructions on our Bug Reporting > Profiles and Logs page, and attach a sysdiagnose log taken shortly after seeing the issue.

Please post your bug number, just for the record.

As to a workaround, it’s hard to say anything definitive without seeing it in action, and that’s gonna be hard given that you’re neck deep in MDM stuff. However, my best suggestion is to monitor the state of the System Configuration dynamic store during this process. It’s likely that the system state that’s causing you problems is reflected there and, if so, you can use its notification mechanism to wait for it to stabilise.

The dynamic store is pretty obscure. I have some general background and links to documentation in this post. For tests like this I usually start out by prototyping my work with the scutil command-line tool. That gives me some assurance that dealing with this gnarly CF-level API will be worth it (-:

For more about this tool, see the scutil man page.
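To make the dynamic-store suggestion concrete, here’s a rough sketch using the SystemConfiguration C API from Swift. It registers for changes to some IPv4 state keys and reports once they’ve been quiet for a while. The key and pattern choices, the quiet-period approach, and the `DynamicStoreSettleWatcher` name are all assumptions: which keys actually reflect the stale-route problem is exactly what prototyping with scutil (for example, its `list`, `show`, `n.add`, and `n.watch` commands) should reveal. If a specific key turns out to track the missing route, waiting on that key’s value via `SCDynamicStoreCopyValue` would be more deterministic than a quiet period.

```swift
import Foundation
import SystemConfiguration

/// Hypothetical helper: calls `onSettled` once none of the watched keys has
/// changed for `quietPeriod` seconds. Use from the main thread.
final class DynamicStoreSettleWatcher {
    private var store: SCDynamicStore?
    private var pending: DispatchWorkItem?
    private let quietPeriod: TimeInterval
    private let onSettled: () -> Void

    init(quietPeriod: TimeInterval, onSettled: @escaping () -> Void) {
        self.quietPeriod = quietPeriod
        self.onSettled = onSettled
    }

    deinit { stop() }

    /// Assumed keys: global IPv4 state plus per-service IPv4 state (POSIX regex).
    @discardableResult
    func start(keys: [String] = ["State:/Network/Global/IPv4"],
               patterns: [String] = ["State:/Network/Service/.*/IPv4"]) -> Bool {
        var context = SCDynamicStoreContext(
            version: 0,
            info: Unmanaged.passUnretained(self).toOpaque(),
            retain: nil, release: nil, copyDescription: nil)
        // C callback: recover `self` from `info` and restart the quiet-period timer.
        let callback: SCDynamicStoreCallBack = { _, _, info in
            guard let info = info else { return }
            Unmanaged<DynamicStoreSettleWatcher>.fromOpaque(info)
                .takeUnretainedValue()
                .restartTimer()
        }
        guard let store = SCDynamicStoreCreate(
                nil, "DynamicStoreSettleWatcher" as CFString, callback, &context),
              SCDynamicStoreSetNotificationKeys(store, keys as CFArray, patterns as CFArray),
              SCDynamicStoreSetDispatchQueue(store, .main) else {
            return false
        }
        self.store = store
        restartTimer()   // if nothing changes at all, report after one quiet period
        return true
    }

    func stop() {
        pending?.cancel()
        pending = nil
        if let store = store {
            SCDynamicStoreSetDispatchQueue(store, nil)
        }
        store = nil
    }

    private func restartTimer() {
        pending?.cancel()
        let item = DispatchWorkItem { [weak self] in
            guard let self = self else { return }
            self.stop()
            self.onSettled()
        }
        pending = item
        DispatchQueue.main.asyncAfter(deadline: .now() + quietPeriod, execute: item)
    }
}
```

This still ends in a timeout, but the timeout measures quiet time in the store rather than a fixed delay after an NE callback.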

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] I’m using the Swift async function terminology here because it’s much more convenient. I realise that you’re using the completion handler variant, and that’s fine. Just read “complete” for “return”.
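To illustrate the footnote, these two hypothetical functions wait for the same event: one through the completion handler, and one through the async variant that Swift generates from it.

```swift
import Foundation
import NetworkExtension

/// Completion-handler variant: "complete" means this handler has run.
func removeWithHandler(_ manager: NETunnelProviderManager) {
    manager.removeFromPreferences { error in
        if let error = error {
            print("removeFromPreferences failed: \(error)")
            return
        }
        // Ideally the old tunnel is fully torn down by this point; the thread
        // suggests that isn't currently guaranteed.
    }
}

/// Async variant: "return" means the `await` below has resumed.
func removeAsync(_ manager: NETunnelProviderManager) async throws {
    try await manager.removeFromPreferences()
    // Same point in time as the completion handler above being called.
}
```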
