Big Sur reproducible DNS resolution issues

Hello,

Since upgrading to Big Sur, I have noticed network issues regardless of which network interface is used (Wi-Fi or wired LAN).

After some testing the issue is now reproducible as follows:
  • Mount a Samba share and copy a file to it; in my case it was a PDF file of approx. 1.6 MB

  • The copy process never finishes, and no Samba share is accessible any longer

  • Furthermore, DNS resolution stops working: e.g. dig apple.com fails, opening any website in any browser just times out, etc.

  • Strangely enough ping apple.com works (...?)

  • If the browser has cached a domain name, the website opens just fine
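
One possible reason dig and ping can disagree, as in the symptoms above: on macOS, dig (like nslookup and host) sends its queries directly to the servers listed in /etc/resolv.conf, while ping and browsers go through the system resolver. The sketch below compares the two paths. It assumes that dscacheutil -q host exercises the system resolver path; the function names are made up for illustration, and it does not explain the root cause.

```shell
#!/bin/sh
# Sketch: compare the two name-resolution paths on macOS.
# Function names are illustrative, not part of any Apple tool.

# Path 1: dig queries the servers from /etc/resolv.conf directly.
resolve_direct() {
    dig +short +time=2 +tries=1 "$1" A 2>/dev/null | grep -E '^[0-9.]+$' | head -n 1
}

# Path 2: dscacheutil asks the system resolver
# (assumed to be the same path that ping and browsers use).
resolve_system() {
    dscacheutil -q host -a name "$1" 2>/dev/null | awk '/^ip_address:/ { print $2; exit }'
}

# Turn the two results (empty string = lookup failed) into a one-line verdict.
diagnose() {
    direct="$1"
    system="$2"
    if [ -n "$direct" ] && [ -n "$system" ]; then
        echo "both paths resolve"
    elif [ -n "$direct" ]; then
        echo "system resolver broken, direct query works"
    elif [ -n "$system" ]; then
        echo "direct query broken, system resolver works"
    else
        echo "both paths fail"
    fi
}
```

While the problem is occurring, it could be run as: diagnose "$(resolve_direct apple.com)" "$(resolve_system apple.com)"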

What I tried so far to investigate the issue:
  • I scrolled through the logs in Console.app .. found nothing

  • Killed the macOS services mDNSResponderHelper, mDNSResponder .. nope

  • Flushed the DNS cache: sudo dscacheutil -flushcache .. nothing

  • Searched the web for related issues and found these: Big Sur Network Connectivity Issue, Big Sur DNS Issue .. these two might be related, but who knows..

  • So far, only a reboot fixes this...

Can anyone else reproduce this issue ?
Any further ideas ?

Best Regards
SH
  • I'll add my "me too" to the list. This started with Big Sur, though not with the original release; I'm not sure with which update in particular. I see several different DNS-related issues:

    Cisco AnyConnect client 4.9.04043: DNS is routed through the tunnel and works great. When I walk away, the system sleeps and the tunnel disconnects. The tunnel reconnects upon wake, but DNS resolution is broken. HUP the mDNSResponder and everything is back to normal. This never happens if the laptop doesn't sleep.

    Safari will randomly have issues with DNS and return NXDOMAIN, even though DNS resolution via dig and ping is working fine. It doesn't do this with pages that are already cached. Other browsers work fine when Safari is broken. I need to close and re-open Safari to resolve this.

Replies

There's one thought I'd like to add:
The issue is very annoying, since the stated procedure is most probably not the only way to trigger this issue.
I work all day with my Mac and at least once per day I have to reboot.
That is odd, especially if this is only happening with Samba. I would suggest opening a bug report here with an attached sysdiagnose while the issue is taking place.

Please install the following debug profiles before you take a sysdiagnose and reproduce the issue:
  • Network Diagnostics

  • Net-diagnose

  • mDNSResponder

https://developer.apple.com/bug-reporting/profiles-and-logs/?platform=macos

Please note the time and date the issue occurred on your Bug Report.


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
I already filed a bug report: https://feedbackassistant.apple.com/feedback/8963072.
The sysdiagnose file is attached; however, the profiles you mentioned were not enabled at the time the sysdiagnose file was created.

Referring to your comment "That is odd, especially if this is only happening with Samba", there's an additional thought I already posted here replying to my own initial post: https://developer.apple.com/forums/thread/670856?answerId=655592022#655592022
I am also experiencing a similar problem when using the Cisco AnyConnect VPN client: some sites become unresolvable, but after resetting mDNSResponder (sudo killall -HUP mDNSResponder) they start to work again for some time, or other hosts behind the VPN may become unresolvable.
The most irritating part is that during these problems nslookup can always resolve the hostname, while a subsequent ping refuses to resolve it.
Post not yet marked as solved Up vote reply of Ajax Down vote reply of Ajax
I have the same problem.
If you find a solution, please write it here.
I am having issues with Big Sur and DNS resolution as well.

For me, the situation appears most repeatably when running docker login against an amazonaws.com elastic container repository, but it happens at other times as well.

The symptom is that the machine comes almost to a stand-still eventually showing a rainbow spinner. After 1-3 minutes things recover.

I thought it might be related to specific programs running (openvpn client, docker, zoom, dropbox, google drive, etc) but even with those closed/logged out it happens so I am pretty sure it's related to DNS.

It is hard to do anything to troubleshoot while the problem is happening because the system is unresponsive.
I am seeing the exact same problem as Ajax. I am using Cisco AnyConnect and some sites are periodically unresolvable. Frustrating. Just like Ajax, I can't ping the name, but I can do an nslookup to get the IP. Once I have the IP, I can ping the address with no problems. No problems at all hitting the internet while connected to VPN. We use a split tunnel.

One observation... Chrome does not seem to be as susceptible to the issue. Firefox and Safari will not load certain pages when the issue occurs. So why does Chrome work? Does it somehow use the IP address? I am not sure how it resolves the address unless it is cached.
I confirm that I have had that exact same issue since I upgraded to Big Sur. I have to reboot at least once or twice a day because of the DNS issue.
Any news on the issue yet?
I'm also facing this issue on my iMac Pro and it's very annoying. Sometimes it happens even when I'm trying to access a Docker container running on localhost.

I was thinking about reinstalling macOS, since my MacBook Pro does not have the problem, but that would be the last thing to try.
I'm having a very similar, if not identical, issue. I'm running Big Sur (11.2.3) and Cisco AnyConnect 4.9.05042. I think I only have this problem while I'm connected to the VPN. When it does happen, I've noticed that a ping to google.com will fail. To fix it, I've been running "sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder" and it recovers almost immediately. I really would like to resolve this issue, so I'm going to look at installing some of the tools suggested above and submitting a bug report. I don't know if this is related to a Samba connection, as I do occasionally connect to our fileshare at work. It's very annoying and I haven't been able to pinpoint exactly which circumstances cause it.
I am also facing a similar issue with my VPN client on Big Sur. However, no issues are seen on Catalina.
I have posted it here: https://developer.apple.com/forums/thread/677032

Can someone please tell me if there is any workaround for this?
I'm having the same issue when using MotionPro Plus after I just upgraded to Big Sur. Under Catalina it worked just fine.
Now, when I enable Secure Tunnel I cannot resolve any internet DNS names anymore. I can connect to any internet IP, it just doesn't resolve any DNS. Any ping will just hang after I connect over VPN using MotionPro Plus.

This does not have anything to do with how the VPN is set up. I'm using the exact same configuration as under Catalina.
Wish I didn't upgrade now.
We're experiencing the same issue with Catalina and Big Sur (up through 11.3.1), running Cisco AnyConnect 4.9.06093 and 4.10.00093. When connected to the VPN, DNS does not work for anything...
I've been experiencing this issue on Big Sur 11.3.1, using OpenVPN and a variety of other commercial VPN clients. The issue that I see, which isn't being reported here, is that macOS is effectively holding onto resolver information and not allowing it to be cleared out via mechanisms that would normally work.

When you disconnect your VPN client and run scutil --dns | egrep -i '(domain|nameserver)', entries that were pushed by the VPN client are left behind, which is definitely new and broken behavior. If you try to clear these entries with sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder, you'll find that scutil --dns | egrep -i '(domain|nameserver)' still yields the same resolver entries that your VPN client inserted (i.e. the above commands aren't removing these stale entries).
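
One way to check for the leftover entries described above, without reading the full output by eye, is to filter the scutil --dns output against the nameservers you expect to see. This is only a sketch: the function names are made up for illustration, and the list of expected servers (here the 1.1.1.1 and 1.0.0.1 from my own setup) is something you have to supply yourself.

```shell
#!/bin/sh
# Sketch: flag nameservers in `scutil --dns` output that are not expected.
# Function names and the "expected" list are illustrative assumptions.

# Read `scutil --dns` output on stdin and print each distinct nameserver once.
list_nameservers() {
    awk '/nameserver\[[0-9]+\]/ { print $3 }' | LC_ALL=C sort -u
}

# Print the nameservers from stdin that are missing from the
# space-separated list of expected servers given as $1.
find_stale() {
    expected=" $1 "
    list_nameservers | while read -r ns; do
        case "$expected" in
            *" $ns "*) ;;        # expected, ignore
            *) echo "$ns" ;;     # not expected: possibly stale
        esac
    done
}
```

With the VPN disconnected, it could be used as: scutil --dns | find_stale "1.1.1.1 1.0.0.1". Any address it prints is a candidate for a stale entry.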

The way I've been able to fix it without a reboot is to use the scutil shell to directly modify and save the state of the DNS-related dictionaries that are populated/deleted by the VPN software (note: I've added prompt> to denote where commands need to be entered manually, since I can't figure out how to escape the prompt so the formatting doesn't go awry):

# In this case, the OpenVPN client is disconnected and all of these entries referring to 172.16.1.53 and 172.16.1.54 shouldn't be in the resolver dictionaries.
$ scutil --dns | egrep -i '(domain|nameserver)'
nameserver[0] : 1.1.1.1
nameserver[1] : 1.0.0.1
domain : sanitized.com
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.dev
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : local
domain : 254.169.in-addr.arpa
domain : 8.e.f.ip6.arpa
domain : 9.e.f.ip6.arpa
domain : a.e.f.ip6.arpa
domain : b.e.f.ip6.arpa
nameserver[0] : 1.1.1.1
nameserver[1] : 1.0.0.1
# Let's try to remove them via conventional OS tools / processes that have always worked in the past:
$ sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
Password:
# You can see here that the stale entries have not been removed by the above conventional means.
$ scutil --dns | egrep -i '(domain|nameserver)'
nameserver[0] : 1.1.1.1
nameserver[1] : 1.0.0.1
domain : sanitized.com
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.dev
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : sanitized.local
nameserver[0] : 172.16.1.53
nameserver[1] : 172.16.1.54
domain : local
domain : 254.169.in-addr.arpa
domain : 8.e.f.ip6.arpa
domain : 9.e.f.ip6.arpa
domain : a.e.f.ip6.arpa
domain : b.e.f.ip6.arpa
nameserver[0] : 1.1.1.1
nameserver[1] : 1.0.0.1
# Here, we remove the stale entries directly in the scutil shell. I know that it is OpenVPN leaving this behind, so I choose that network service (you may have to choose something else)
$ sudo scutil
Password:
prompt> list ".*DNS"
subKey [0] = State:/Network/Global/DNS
subKey [1] = State:/Network/MulticastDNS
subKey [2] = State:/Network/PrivateDNS
subKey [3] = State:/Network/Service/CEEA9BD1-D467-461C-844D-A4C7E4640418/DNS
subKey [4] = State:/Network/Service/OpenVPNConnect/DNS
prompt> get State:/Network/Service/OpenVPNConnect/DNS
prompt> d.show
<dictionary> {
ServerAddresses : <array> {
0 : 172.16.1.53
1 : 172.16.1.54
}
SupplementalMatchDomains : <array> {
0 : sanitized.dev
1 : sanitized.local
2 : sanitized.local
3 : sanitized.local
4 : sanitized.local
5 : sanitized.local
6 : sanitized.local
7 : sanitized.local
8 : sanitized.com
}
SupplementalMatchDomainsNoSearch : 1
}
prompt> d.remove ServerAddresses
prompt> d.show
<dictionary> {
SupplementalMatchDomains : <array> {
0 : sanitized.dev
1 : sanitized.local
2 : sanitized.local
3 : sanitized.local
4 : sanitized.local
5 : sanitized.local
6 : sanitized.local
7 : sanitized.local
8 : sanitized.com
}
SupplementalMatchDomainsNoSearch : 1
}
prompt> d.remove SupplementalMatchDomains
prompt> d.show
<dictionary> {
SupplementalMatchDomainsNoSearch : 1
}
prompt> d.remove SupplementalMatchDomainsNoSearch
prompt> d.show
<dictionary> {
}
prompt> set State:/Network/Service/OpenVPNConnect/DNS
prompt> quit
# Now you can see the stale entries are gone, and the VPN client and general DNS resolution is restored to working order
$ scutil --dns | egrep -i '(domain|nameserver)'
nameserver[0] : 1.1.1.1
nameserver[1] : 1.0.0.1
domain : local
domain : 254.169.in-addr.arpa
domain : 8.e.f.ip6.arpa
domain : 9.e.f.ip6.arpa
domain : a.e.f.ip6.arpa
domain : b.e.f.ip6.arpa
nameserver[0] : 1.1.1.1
nameserver[1] : 1.0.0.1


You can see from the last scutil --dns | egrep -i '(domain|nameserver)' that I have restored things to what I would normally have when I boot the system (i.e. DHCP providing DNS information over Wi-Fi).

From this point, I can fire up any client VPN software and the interaction with the OS seems to work appropriately (until it breaks again, which is probably when there's a client disconnect / sleep, something like that).

Consequently, you can script this behavior, though it would be nice to know why this is happening. A reboot will fix it, but that's way too cumbersome -- especially if you're someone who has to use an array of VPN client software throughout your day.
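
The manual scutil session shown earlier in this post could be scripted roughly as follows. This is a sketch under assumptions, not a tested fix: it relies on scutil accepting its commands on standard input, and it writes an empty dictionary (d.init) to the key instead of removing the entries one by one, which should leave the same end state as the manual steps. The function names are made up, and the key is the one from my OpenVPN example; yours may differ.

```shell
#!/bin/sh
# Sketch: non-interactive version of the manual scutil session.
# Assumption: overwriting the key with an empty dictionary is equivalent
# to removing its entries one by one with d.remove.

# Print the scutil commands that overwrite the given key with an empty dictionary.
clear_dns_commands() {
    printf 'd.init\nset %s\nquit\n' "$1"
}

# Feed those commands to scutil; root is needed to modify the store.
clear_stale_dns() {
    clear_dns_commands "$1" | sudo scutil
}
```

For the example above that would be: clear_stale_dns State:/Network/Service/OpenVPNConnect/DNS, followed by scutil --dns | egrep -i '(domain|nameserver)' to confirm the stale entries are gone.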

I hope this helps.

  • I was able to track down some other VPN issues with the tools and information you provided - thanks!

    On my system the wrong name servers are stored in the DNS configuration. I am using SonicWall Mobile Connect. The UI of the VPN app shows the correct name servers, but the ones that are set in the config after activating the VPN are wrong. Has anyone experienced something similar?


Just to add to this thread: I have also been experiencing the same problem since upgrading to Big Sur in the last month. No issues before the upgrade. I have to reboot several times a day to resolve it. Network connectivity completely drops out on both wired and wireless. The problem does appear to be DNS: the browser cannot resolve hostnames. I have noticed that the problem is often triggered when I start Docker Desktop; however, I have also had it fail intermittently at other times (presumably when some other service starts that I haven't been able to identify?)