Intermittently browser is getting disconnected with NetworkExtension

Hi,

Greetings for the day!

We would like to update you that we have created Content Filter NetworkExtension and this extension is working fine till Big Sur M1 however we are facing some strange problem in M1 Monterey. Intermittently, When we try to browse websites, it does not respond and after 3-5 minutes its opened the websites correctly.

We would like to update you that our subclass overrides handleNewFlow, handleInboundDataFromFlow, handleOutboundDataFromFlow, handleInboundDataCompleteForFlow and handleOutboundDataCompleteForFlow. In all these methods we first check whether NEFilterFlow is nil or not and then pauseVerdict and once asynchronous methods completes execution then we call resumeFlow with verdict (allowVerdict/dropVerdict).

When above mentioned issue generated we collected console streaming log and found these lines in the logs (Not from our application):

Ignoring resume command for flow 3c8faf3c4a9f7 which does not exist

Ignoring resume command for flow 3c90795d4d6f9 which does not exist

Ignoring resume command for flow 3c9086d1ede69 which does not exist

Ignoring resume command for flow 3c909b251d53b which does not exist

We are not sure how above line get printed because we don’t have this logs in our source code so we would need your help to understand this problem and resolution so that we can solve this issue.

We have couple of extra queries:

  1. What is flow mentioned in above logs in bold text? Is it NEFilterFlow's identifier or something else?
  2. How we can validate whether NEFilterFlow is valid or not before calling resumeFlow
  3. Why above line is getting printed in log which says flow does not exist.
  4. Is there any timeout maintained by NetworkExtension?
  5. We are using XPC for interprocess communication so our question is that, Is NetworkExtension/XPC maintain the queue size and if it overflow the size then above line is getting printed. If this is the case then how we can handle that?
  6. Is it known issue in NetworkExtension framework itself on M1 Monterey?

Thanks & Regards,

Mohmad Vasim

Replies

We are not sure how above line get printed because we don’t have this logs in our source code Why above line is getting printed in log which says flow does not exist.

This log looks like it's being emitted from our system.

How we can validate whether NEFilterFlow is valid or not before calling resumeFlow

There is no API to tell if the flow is valid or not. Certainly if the flow has been dropped then it will be invalid but your app's code should be able to detect this. Also, if your app pauses a UDP based flow for more than 10 seconds then the flow will be dropped automatically, so be mindful of this behavior.

Is there any timeout maintained by NetworkExtension?

TCP based flows can be paused forever, UDP based flows can only be paused for 10 seconds before they are dropped by the system.

We are using XPC for interprocess communication so our question is that, Is NetworkExtension/XPC maintain the queue size and if it overflow the size then above line is getting printed. If this is the case then how we can handle that?

The logs that added in your post do not look like they are the result of XPC issues in your network extension.

Is it known issue in NetworkExtension framework itself on M1 Monterey?

Like every framework Network Extension has bugs. My advice here would be if you can reproduce this issue with the SimpleFirewall example on an M1 Mac then it would be grounds for a bug report.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Thanks Matt for your valuable response!

There is no API to tell if the flow is valid or not. Certainly if the flow has been dropped then it will be invalid but your app's code should be able to detect this. Also, if your app pauses a UDP based flow for more than 10 seconds then the flow will be dropped automatically, so be mindful of this behavior.

if app pause UDP flow more than 10 seconds and we see "Ignoring resume command for flow 3c8faf3c4a9f7 which does not exist" does any property of NEFilterFlow say remoteEndpoint or some other property is set to nil which we can check before invoking resumeFlow call.

This log looks like it's being emitted from our system.

As its very intermittent to reproduce the problem It would be appreciated if you elaborate more on this and provide us some hints why and when it will be generated by seeing the system code so that we will focus on that and will try to generate this again.

Thanks & Regards Mohmad Vasim

does any property of NEFilterFlow say remoteEndpoint or some other property is set to nil which we can check before invoking resumeFlow call.

Not that I'm aware of. You could check the flow description in your logs, but I would only use this for debugging purposes.

Regarding:

Ignoring resume command for flow 3c9086d1ede69 which does not exist It would be appreciated if you elaborate more on this and provide us some hints why and when it will be generated by seeing the system code so that we will focus on that and will try to generate this again.

I looked at this under the hood and this log is indeed generated by our Network Extension framework code when resumeFlow is called on a NEFilterFlow that is now consider nil by the provider. It looks like the main reasons this can happen is that flow identifier is lost, or becomes nil because it cannot be picked up as a NEFilterSocketFlow or NEFilterBrowserFlow (NEFilterBrowserFlow being for iOS only).

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com