I have noticed race conditions on macOS when tearing down and re-configuring an NEPacketTunnelProvider.
My goal is to handle switching out one VPN profile for another identical/near identical one (I'll add some context for this below).
The flow that I have tested was to wait for the NEVPNStatusDidChange notification to report a NEVPNStatus.disconnected state, and then start the process of re-configuring the VPN with a new profile.
In practice however, I have noticed that I must wait a couple of seconds between NEVPNStatus.disconnected state being reported and setting up a new tunnel. Otherwise, the system routing table gets messed up but the VPN reports being in NEVPNStatus.connected state, resulting in a tunnel that appears healthy but can't be accessed.
With this, I wanted to ask if you have any suggestions on any OS items I can observe, in order to deterministically know that the system has fully cleaned up my packet tunnel, and that I am safe to configure another? This would be much better than a hard-coded delay.
Additional context:
Jamf is a common solution for deploying MDM configuration profiles. However, in my tests, it doesn't support Apple's recommended approach of using the PayloadIdentifier to mark profiles for replacement, as PayloadIdentifiers are automatically updated to match the PayloadUUID of that same profile on upload. Although given what I've observed, I'm not sure the Apple recommended approach would work here in any case.
Additionally, it would be nice to transition from non-MDM to MDM cleanly, however, this also requires an indeterminate wait time between the non-MDM configuration being disconnected and subsequently removed, and the MDM one being configured.
With these scenarios, we need to be able to add a second configuration, with possibly identical VPN settings, then remove the old one, allowing the system to transition to the new configuration.
For the MDM case, the pattern I've noticed on the system is that when the current profile is suddenly deleted, the connection will go into disconnected state, then NEVPNConfigurationChange will fire. The new profile can be configured from NEVPNConfigurationChange, however some time is needed to avoid races.
For non-MDM, I had experimented with an approach of polling for MDM configurations appearing. When they do, I'd remove my previous notification observers, and set up a new NEVPNStatusDidChange notification observer, to remove the non-MDM VPN configuration after it enters a disconnected state. Following the removal, I would call a function to reconfigure the VPN with the new configuration. Once this logic is in place, the call to stopVPNTunnel() is made. Again, a hard-coded delay is required between stopping and removing the old configuration and setting up a new one.
Thanks!
Thanks for all that info.
apologies for my extra-long answer
Au contraire, that’s exactly the sort of info I need (-:
First up, let’s talk about how the API should work. The NETunnelProviderManager APIs are asynchronous for a reason: They’re not supposed to complete until they’re done. In your first example, either one of the following should be true:
- Either removeFromPreferences() should not return [1] until the tunnel is closed.
- Or the open tunnel shouldn’t get in the way of the system establishing the new settings.
The fact that you have to add arbitrary delays is clearly a bug IMO, and I encourage you to file it as such.
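To make that expectation concrete, here’s a minimal sketch of the replace sequence using the Swift async variants. The replaceConfiguration helper and its parameters are hypothetical names for illustration only. This shows the contract as I’d expect it to work, with no delay between the steps. It is not a workaround for the race you’re seeing.

```swift
import NetworkExtension

/// Hypothetical helper (not part of NetworkExtension): removes `old` and
/// brings up a new configuration built from `proto`.
func replaceConfiguration(
    old: NETunnelProviderManager,
    with proto: NETunnelProviderProtocol,
    description: String
) async throws -> NETunnelProviderManager {
    // Expectation: this does not return until the system is done with the
    // old configuration, including any tunnel it had open.
    try await old.removeFromPreferences()

    let new = NETunnelProviderManager()
    new.protocolConfiguration = proto
    new.localizedDescription = description
    new.isEnabled = true
    try await new.saveToPreferences()

    // Reload so the in-memory manager matches what the system stored.
    try await new.loadFromPreferences()

    // This only starts the connection process; watch NEVPNStatusDidChange
    // to learn when the tunnel actually connects.
    try new.connection.startVPNTunnel()
    return new
}
```

If this sequence only works with a sleep between the remove and the save, that’s the behaviour worth capturing in the bug report.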
Given that you can easily reproduce this, I recommend that you enable extra NE logging via the VPN (Network Extension) for macOS instructions on our page Bug Reporting > Profiles and Logs and attach a sysdiagnose log taken shortly after seeing the issue.
Please post your bug number, just for the record.
As to a workaround, it’s hard to say anything definitive without seeing it in action, and that’s gonna be hard given that you’re neck deep in MDM stuff. However, my best suggestion is to monitor the state of the System Configuration dynamic store during this process. It’s likely that the system state that’s causing you problems is reflected there and, if so, you can use its notification mechanism to wait for it to stabilise.
The dynamic store is pretty obscure. I have some general background and links to documentation in this post. For tests like this I usually start out by prototyping my work with the scutil command-line tool. That gives me some assurance that dealing with this gnarly CF-level API will be worth it (-:
For more about this tool, see the scutil man page.
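Once scutil has shown which keys are interesting, watching them from code looks roughly like the sketch below. The two key patterns and the "TunnelTeardownWatcher" name are assumptions, just a starting point for exploring. I don’t know which keys, if any, actually reflect the state that’s tripping you up.

```swift
import Foundation
import SystemConfiguration

// Called by the dynamic store whenever a watched key changes.
let callback: SCDynamicStoreCallBack = { store, changedKeys, _ in
    let keys = (changedKeys as NSArray) as? [String] ?? []
    for key in keys {
        // A nil value means the key was removed from the store.
        let value = SCDynamicStoreCopyValue(store, key as CFString)
        print("changed:", key, value ?? "<removed>")
    }
}

guard let store = SCDynamicStoreCreate(
    nil, "TunnelTeardownWatcher" as CFString, callback, nil
) else {
    fatalError("SCDynamicStoreCreate failed: \(String(cString: SCErrorString(SCError())))")
}

// Assumed patterns (regular expressions); adjust after exploring with scutil.
let patterns = [
    "State:/Network/Global/IPv4",
    "State:/Network/Service/.*/IPv4",
]

guard SCDynamicStoreSetNotificationKeys(store, nil, patterns as CFArray) else {
    fatalError("SCDynamicStoreSetNotificationKeys failed")
}

// Deliver callbacks on the main queue.
guard SCDynamicStoreSetDispatchQueue(store, DispatchQueue.main) else {
    fatalError("SCDynamicStoreSetDispatchQueue failed")
}

dispatchMain()
```

Run this as a command-line tool while you tear down and re-configure the tunnel, and look at whether the changes settle at a recognisable point before it’s safe to set up the new configuration.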
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
[1] I’m using the Swift async function terminology here because it’s much more convenient. I realise that you’re using the completion handler variant. And that’s fine. Just read complete for return.