We've received logs and have sporadically reproduced the following behavior:
- Calls to setTunnelNetworkSettings completing with NETunnelProviderError where the code is networkSettingsInvalid, and the error domain string is empty.
- After subsequent calls to setTunnelNetworkSettings, the tunnel is stopped via the userInitiated stop reason within around 1 second of the first failure.
This happens after a number of successful calls to setTunnelNetworkSettings have been made in the lifetime of a given packet tunnel process.
We can confirm that no user ever initiates the disconnection. We can confirm that the only significant changes between the different calls to setTunnelNetworkSettings are that the parameters contain different private IPs for the tunnel settings - the routes and DNS settings remain the same.
In our limited testing, it seems that we can replicate the behavior we're observing by removing the VPN profile while the tunnel is up. However, we are certain the same behavior happens under other circumstances without any user interaction. Is this what memory starvation looks like? Or is this something else?
Our main concern is that the tunnel is killed and it is not brought back up even though our profile is set to be on-demand. It's difficult to give any promises about leaks to our users if the tunnel can be killed at any point and not be brought back.
The spurious disconnections are a security issue for our app. We'd like to know if there's anything we can do differently so that this does not happen.
We tried to get DTS support, but we have no way to reproduce the real issue with a minimal project. We can, however, reproduce the behavior (killing the tunnel by removing its profile) from a minimal Xcode project. Is that considered good enough for a reproduction?
the only significant changes between the different calls [are] different private IPs for the tunnel settings
Given that, and the hard-to-reproduce nature of this issue, it seems likely that this is a problem on the OS side of things. Normally I’d suggest that you file a bug about this, but I doubt that’d get traction with the info you’ve presented here. I have a couple of suggestions for how to improve that.
First, you really need a sysdiagnose log, taken shortly after reproducing the problem. That’s hard to get if you can’t reproduce the problem, but I have some hints on that topic in Using a Sysdiagnose Log to Debug a Hard-to-Reproduce Problem.
Ideally this would be from a machine with NE debugging enabled. See VPN (Network Extension) for xyzOS on our Bug Reporting > Profiles and Logs. Still, that’s gonna be hard to get in this case, so there’s no need to aim for the ideal here. If you can get any sysdiagnose log for this — well, any one taken shortly after reproducing the problem — that’ll be fine for your bug report.
Second, you wrote:
the error domain string is empty.
I suspect that’s a problem with your logging, because that error should always be in NETunnelProviderErrorDomain. NE errors are derived from Objective-C, so I recommend that you convert the Swift error to an NSError (let ns = e as NSError) and then poke around in that. That’ll give you access to both the error domain and the user info dictionary, and you should definitely look at the latter because it might contain some useful hints.
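As a rough sketch of that logging change, something like the following might do. The helper name describeForLog is hypothetical (it is not part of any Apple API), and the snippet uses only Foundation so it can be tried outside a packet tunnel target.

```swift
import Foundation

/// Hypothetical helper (not an Apple API): bridges any Swift Error to
/// NSError and flattens its domain, code, and user info into one log line.
func describeForLog(_ error: Error) -> String {
    // The bridge is what exposes the Objective-C style domain and user info.
    let ns = error as NSError
    var parts = ["domain=\(ns.domain)", "code=\(ns.code)"]
    // Sort the keys so repeated log lines are easy to compare.
    for key in ns.userInfo.keys.sorted() {
        if let value = ns.userInfo[key] {
            parts.append("userInfo[\(key)]=\(String(describing: value))")
        }
    }
    return parts.joined(separator: " ")
}
```

In the provider, the idea would be to pass the non-nil error from the setTunnelNetworkSettings completion handler to this helper and write the result to your log. If the domain still comes out empty after that, it is worth mentioning in the bug report.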
If you do file a bug, please post your bug number, just for the record.
Oh, one last thing. When I wrote “shortly after reproducing” I meant “after reproducing the real problem”. The tricks you’ve done to reproduce the symptoms are cool, and they’d be a good way to test the error logging changes I’ve suggested, but investigating the real issue requires a sysdiagnose log taken after seeing the real problem.
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"