Hello everyone,
I am currently experiencing an issue with a network extension that we've developed, which seems to be causing problems on a select number of machines.
Problem Overview:
Our extension is designed to analyze all DNS traffic and block access to suspicious domains. Most of the time, this works seamlessly. However, I've observed repeated log messages on a few machines that hint at some network-related problems. Accompanying these logs, I've also noticed occasional spikes in CPU usage.
Logs:
Below are the logs captured from the Console that are repeated over and over again:
[C659 IPv6#3738d855.53 udp, attribution: developer] restart
[C659 IPv6#3738d855.53 waiting parent-flow (unsatisfied (No network route), dns)] event: path:restart @14383.841s, uuid: C7F27BD5-E86F-4076-A03E-1BD6A9C4C405
[C659 IPv6#3738d855.53 waiting parent-flow (unsatisfied (No network route), dns)] event: path:unsatisfied @14383.841s, uuid: C7F27BD5-E86F-4076-A03E-1BD6A9C4C405
[C659 IPv6#3738d855.53 in_progress parent-flow (unsatisfied (No network route), dns)] event: flow:start_child @14383.841s
nw_connection_report_state_with_handler_on_nw_queue [C659] reporting state preparing
[C659.157 IPv6#3738d855.53 initial path ((null))] event: path:start @14383.841s
[C659.157 IPv6#3738d855.53 waiting path (unsatisfied (No network route), dns)] event: path:unsatisfied @14383.842s, uuid: C7F27BD5-E86F-4076-A03E-1BD6A9C4C405
[C659.157 IPv6#3738d855.53 failed path (unsatisfied (No network route), dns)] event: null:null @14383.842s
[C659 IPv6#3738d855.53 waiting parent-flow (unsatisfied (No network route), dns)] event: flow:child_failed @14383.842s
nw_connection_report_state_with_handler_on_nw_queue [C659] reporting state waiting
[C659 IPv6#3738d855.53 waiting parent-flow (unsatisfied (No network route), dns)] event: path:restart @14383.842s, uuid: C7F27BD5-E86F-4076-A03E-1BD6A9C4C405
[C659 IPv6#3738d855.53 udp, attribution: developer] restart
Details & Observations:
- The high CPU usage appears randomly, and I haven't discerned any specific pattern.
- When CPU usage spikes, the only way to bring it back down to 0-0.3% is to restart the computer or restart the extension.
- A huge volume of the above logs is generated over and over again in the Console in a very short time (roughly 500k log entries in 15 seconds).
- Of the 600 machines using this extension, only 4 are exhibiting this issue.
- I've thoroughly checked the network configuration of the problematic machines, and I haven't found any disparities when compared to the ones working seamlessly.
- In cases where our extension can't determine whether access to a specific domain should be blocked, we forward a request to our backend for verification. The code used for this is roughly: URLRequest(url: requestUrl) ... urlSession.dataTask(with: request)...resume(). Given that this path works perfectly on other machines, I'm inclined to believe it isn't the root issue.
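For context, here is a minimal, self-contained sketch of what that backend verification call looks like in our extension. The endpoint URL, the query parameter name, and the function names (`makeVerificationRequest`, `checkDomain`) are illustrative placeholders, not our real implementation:

```swift
import Foundation

// Hypothetical backend endpoint; the real URL is internal.
let verificationEndpoint = URL(string: "https://backend.example.com/verify")!

// Build the verification request for a domain (illustrative sketch).
func makeVerificationRequest(for domain: String) -> URLRequest {
    var components = URLComponents(url: verificationEndpoint,
                                   resolvingAgainstBaseURL: false)!
    components.queryItems = [URLQueryItem(name: "domain", value: domain)]
    var request = URLRequest(url: components.url!)
    request.timeoutInterval = 10 // bound the wait so a stalled path can't pile up tasks
    return request
}

// Fire the check; the completion handler runs on URLSession's delegate queue.
func checkDomain(_ domain: String,
                 using session: URLSession = .shared,
                 completion: @escaping (Bool) -> Void) {
    let task = session.dataTask(with: makeVerificationRequest(for: domain)) { _, response, error in
        if error == nil,
           let http = response as? HTTPURLResponse,
           http.statusCode == 200 {
            completion(true)   // backend says the domain is allowed
        } else {
            completion(false)  // on any failure, fail closed (block)
        }
    }
    task.resume()
}
```

The dataTask-plus-resume pattern itself is standard, which is partly why I doubt this is where the runaway behavior comes from.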
I'm reaching out to ask whether anyone has encountered a similar problem, or has any insight into what might be causing this. Any guidance or suggestions would be greatly appreciated. It would also be helpful if someone could break down the anatomy of these logs for me.
Here are the main classes that handle the entire UDP flow; maybe you can spot an issue in the code.
If I had to guess, a single flow fails and then automatically retries over and over again, but I don't know how to add a counter for the number of retries.
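To make the retry-counter idea concrete, here is a small sketch of the kind of bookkeeping I have in mind: count restarts per flow UUID (the logs include one) and stop retrying past a cap, instead of letting a failed flow restart forever. The class name, the cap of 5, and the dictionary-based approach are all my assumptions, not existing code:

```swift
import Foundation

// Hypothetical retry bookkeeping: track restart attempts per flow UUID
// and give up once a cap is reached.
final class FlowRetryTracker {
    private var retries: [UUID: Int] = [:]
    private let maxRetries: Int
    // Serialize access, since flow callbacks may arrive on different queues.
    private let queue = DispatchQueue(label: "flow-retry-tracker")

    init(maxRetries: Int = 5) { self.maxRetries = maxRetries }

    /// Record one retry attempt; returns true while the flow may still retry.
    func shouldRetry(flow id: UUID) -> Bool {
        queue.sync {
            let count = retries[id, default: 0] + 1
            retries[id] = count
            return count <= maxRetries
        }
    }

    /// Call when a flow completes or is dropped so its entry doesn't leak.
    func forget(flow id: UUID) {
        queue.sync { retries[id] = nil }
    }
}
```

If something like this is a reasonable approach, I'd appreciate pointers on where in the flow-handling code the check belongs.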
Thank you in advance for your assistance!
PS: A ScreenShot from the console to see the times between logs: https://ibb.co/pvjVD89