OnDemand VPN connection stuck in NO INTERNET

We create a custom VPN tunnel by overriding PacketTunnelProvider on macOS. A normal VPN connection works seamlessly. But if we enable on-demand rules on the VPN manager, then intermittently, while the tunnel is being created via on-demand, the machine loses internet access and the connection gets stuck.

Why does the internet go away during tunnel creation?

Answered by DTS Engineer in 876714022

But we use some other auth as well where URLSession isn't used

OK. But what API is used in that case?

we still land in a no-internet state for a few ms

Should I interpret “ms” as milliseconds?

I believe on-demand flow itself has some issues.

Oh, I’m not disagreeing with you. Rather, I’m trying to characterise this problem so that I can advise you as to how best to proceed.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  1. Lower-level C++ APIs for creating a TCP socket and reading/writing over it. This always works perfectly.
  2. Yes ms is milliseconds.
  3. I think the loss of internet is seen at the app level (HTTP clients, APIs). That's why URLSession also fails and Teams calls also experience drops.

In this second auth scenario, the internet drop lasts only a few ms, the on-demand connection eventually succeeds, and we don't experience any issues.

Yes ms is milliseconds.

OK, so we’re talking about very short transient failures here, right?

If so, that’s not super unexpected. As the networking reconfigures, existing connections can fail and their replacements might not connect immediately.

Our preferred networking APIs have a waits-for-connectivity feature so, when you start a connection, it won’t fail immediately but instead will wait for the connection to start. This is very different from the traditional BSD Sockets model.

I talk about this in some depth in TN3151 Choosing the right networking API, and specifically in the Connect by name and BSD Sockets best practices sections.

This is one of the reasons why I asked how NWConnection behaves in this scenario.
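
For anyone following along, a probe of this kind might look like the sketch below. It is only an illustration, not the exact test discussed earlier in the thread: the `makeProbe` and `describe` names and the `example.com` host are placeholders. The point is to log every state transition, in particular whether the connection parks in `.waiting` rather than failing outright.

```swift
import Foundation
import Network

// Human-readable label for each connection state, used for logging.
func describe(_ state: NWConnection.State) -> String {
    switch state {
    case .setup: return "setup"
    case .preparing: return "preparing"
    case .waiting(let error): return "waiting (\(error))"
    case .ready: return "ready"
    case .failed(let error): return "failed (\(error))"
    case .cancelled: return "cancelled"
    @unknown default: return "unknown"
    }
}

// Builds a TLS connection that logs each state change.
// The host is supplied by the caller; nothing here is specific to a VPN setup.
func makeProbe(host: String, port: NWEndpoint.Port = .https) -> NWConnection {
    let connection = NWConnection(host: NWEndpoint.Host(host), port: port, using: .tls)
    connection.stateUpdateHandler = { state in
        print("probe state:", describe(state))
    }
    return connection
}

// Placeholder host: substitute the endpoint your provider actually contacts.
let probe = makeProbe(host: "example.com")
probe.start(queue: .main)
```

Run it from the same process that shows the problem (for example, inside the packet tunnel provider) and compare the logged states with and without on-demand enabled.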

One further thing to note here is that, for compatibility reasons, the waits-for-connectivity feature is not the default with URLSession. You have to enable it via the waitsForConnectivity property on your session configuration. So it’d be interesting to see how that behaves in this environment.
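
As a minimal sketch of enabling that property (the `makeWaitingSession` helper name and the 60-second timeout are illustrative assumptions, not values from this thread):

```swift
import Foundation

// Minimal sketch: a session whose tasks wait for connectivity rather than
// failing straight away when no usable network path exists.
// `makeWaitingSession` and the 60-second default are illustrative assumptions.
func makeWaitingSession(resourceTimeout: TimeInterval = 60) -> URLSession {
    let config = URLSessionConfiguration.default
    // Off by default for compatibility reasons, so it must be set explicitly.
    config.waitsForConnectivity = true
    // Caps how long a task may take overall, including time spent waiting.
    config.timeoutIntervalForResource = resourceTimeout
    return URLSession(configuration: config)
}

let session = makeWaitingSession()
print(session.configuration.waitsForConnectivity)
```

Tasks created from this session should then wait for a usable path instead of failing during the brief window while the network reconfigures.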

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

The waitsForConnectivity property is not enabled on our URLSession configuration. URLSession leads to no internet only intermittently, but when it does, there is no mitigation other than disabling always-on. That's the biggest problem right now.

OK.

We’ve been talking about URLSession both inside your packet tunnel provider and inside client apps. Does this problem affect both?

And earlier I described a diagnostic test involving NWConnection. Have you run that yet? If not, please do. And once you’re done, reply back here with the results.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  1. We are using URLSession inside startTunnel(), as provided by the packet tunnel provider.

  2. Tested with NWConnection as well; it also does not receive any response.

A response is received only after on-demand is disabled, which brings internet connectivity back.

I think there's an issue with the on-demand flow itself rather than with URLSession / NWConnection. Our second test, which mentioned the Teams call drops, also supports this (it doesn't use either of them).

2- Tested with NWConnection as well - it is also not receiving any response.

And just to be clear, that was from within your packet tunnel provider?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Yes

The packet tunnel provider is calling another app, which is using URLSession

The packet tunnel provider is calling another app, which is using URLSession

Huh? Apps can’t ‘call’ other apps on iOS, and similarly for app extensions. So what do you mean by this?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  • It's macOS
  • We are calling some custom extension using Apple's enterprise single sign-on feature.

https://developer.apple.com/documentation/authenticationservices

It's macOS

D’oh! Yes, sorry, I had my iOS coloured glasses on )-:

We are calling some custom extension using Apple's enterprise single sign-on feature.

So your packet tunnel provider is using URLSession to issue a request that’s triggering enterprise SSO which then ends up calling an SSO app extension?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Yes

OK. In that case I don’t see any way to make this work )-:

When you set an on-demand rule, connections that match that rule are held until the demand is satisfied. This makes sense when you think about the intended use case for on-demand rules, namely, a split VPN. Typically this pans out as follows:

  1. There’s a site that’s only available on the organisation’s intranet.
  2. The device manager deploys an on-demand VPN configuration to access that intranet.
  3. The user runs an app that connects to that site.
  4. The system treats that as demand and starts the VPN connection.
  5. And holds the app’s connection until the VPN connection is established.
  6. Once that’s done, it releases the app’s connection, which then connects to the site over the VPN.

This yields an obvious chicken’n’egg problem when the VPN provider relies on a connection that also matches the on-demand rule. The system can avoid this problem if the provider does it directly, from within its own process. This is the same sort of logic that NECP uses to avoid VPN loops. But if the provider’s connection somehow depends on some other unrelated process, tracking that dependency is hard and AFAIK there’s no facility within the system to do it.
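
To make the loop concrete, here is a toy model of the hold behaviour described above. It is purely illustrative: the `Flow` type, the function names, and the process names are invented for this sketch, and it says nothing about how the system is actually implemented.

```swift
// Toy model only: it mimics the hold behaviour described in this post and
// does not reflect the system's real implementation.
struct Flow {
    let process: String      // name of the process that opened the connection
    let matchesRule: Bool    // whether the destination matches the on-demand rule
}

/// A flow is held while the tunnel is down if it matches the on-demand rule,
/// unless it comes from the provider's own process.
func isHeld(_ flow: Flow, tunnelUp: Bool, provider: String) -> Bool {
    guard !tunnelUp, flow.matchesRule else { return false }
    return flow.process != provider
}

/// The tunnel can come up only if the request it depends on is not held.
func tunnelCanStart(dependency: Flow, provider: String) -> Bool {
    !isHeld(dependency, tunnelUp: false, provider: provider)
}

// The provider issues its auth request itself: exempt, so the tunnel can start.
let direct = Flow(process: "provider", matchesRule: true)
// The request is issued on the provider's behalf by another process: held.
let viaSSO = Flow(process: "sso-extension", matchesRule: true)

print(tunnelCanStart(dependency: direct, provider: "provider"))  // true
print(tunnelCanStart(dependency: viaSSO, provider: "provider"))  // false
```

In the second case the tunnel waits on a request that is itself waiting on the tunnel, which is the stuck state.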

You could file a bug about this, requesting that we tweak the system to understand this dependency. However, that’s unlikely to be an easy fix.

Note If you do file a bug:

  • Enable relevant debugging on a test machine (definitely VPN (Network Extension) but I think that Single Sign-On also makes sense).
  • Attach a sysdiagnose log taken from a machine in this stuck state.
  • Please post your bug number, just for the record.

As to what you can do about this right now, you need to find a way to break this dependency loop. For example:

  • You might limit the scope of your on-demand rules.
  • Or change how you authenticate these requests.
  • Or switch to per-app VPN, targeting a specific list of apps that doesn’t include your SSO app.
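
As a rough sketch of the first option, on-demand rules can be scoped to specific domains with an evaluate-connection rule, so that traffic to other hosts (a public auth endpoint, say) is not treated as demand. The helper names and the domain below are hypothetical, and whether this fits depends on your deployment:

```swift
import NetworkExtension

// Sketch: treat only connections to the listed intranet domains as demand.
// `scopedOnDemandRules` and the domain list are illustrative assumptions.
func scopedOnDemandRules(intranetDomains: [String]) -> [NEOnDemandRule] {
    let evaluate = NEOnDemandRuleEvaluateConnection()
    evaluate.interfaceTypeMatch = .any
    evaluate.connectionRules = [
        // Start the VPN only when a connection targets one of these domains.
        NEEvaluateConnectionRule(matchDomains: intranetDomains,
                                 andAction: .connectIfNeeded)
    ]
    return [evaluate]
}

// Applies the rules to a manager that the caller has already loaded.
func applyScopedRules(to manager: NETunnelProviderManager, domains: [String]) {
    manager.onDemandRules = scopedOnDemandRules(intranetDomains: domains)
    manager.isOnDemandEnabled = true
    manager.saveToPreferences { error in
        if let error = error {
            print("saving on-demand rules failed:", error)
        }
    }
}

// Hypothetical intranet domain, for illustration only.
let rules = scopedOnDemandRules(intranetDomains: ["intranet.example.com"])
print(rules.count)
```

The design point is that the SSO extension's traffic should fall outside the matched domains, which breaks the dependency loop.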

Just as an FYI, extremely wide on-demand VPN rules are a common source of problems. I often see them deployed when folks are trying to use a packet tunnel provider for something that isn’t VPN. TN3120 Expected use cases for Network Extension packet tunnel providers has a general discussion of that issue.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

But in our case,

  1. URLSession is calling a public endpoint. That's why it succeeds even while the VPN connection is being established.

  2. It happens intermittently, mostly after sleep/wake or, at random, when enabling on-demand.

Scenario that always works:

  1. Connect the VPN manually. The VPN is established successfully.
  2. Disconnect manually.
  3. Enable on-demand. The VPN is established successfully and automatically.

So I think it's not the same issue as the one described above.
