Remote access to test system dying every few days

I'm using a Mac mini as a Jenkins agent to run our Xcode tests on physical iOS devices. It's configured for remote access over SSH and for screen sharing over VNC. Every few days, both SSH and VNC stop working completely. Sometimes one of them stays up a bit longer than the other, but usually they're both down.

The GUI reports that both services are running, and the correct ports are listening.

In Console I can see that the sshd process exits with 255 the instant it's started, but I haven't been able to get anything more specific.

I've found that I can get SSH & VNC access back with

launchctl bootout system/<svc name>
launchctl disable system/<svc name>
launchctl enable system/<svc name>
launchctl bootstrap system <plist file name>
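
For repeated use, the four commands above could be wrapped in a small helper. This is only a sketch: reload_service and the LAUNCHCTL override variable are made-up names, and the service label and plist path are placeholders you pass in yourself. It has to run as root.

```shell
#!/bin/sh
# Hypothetical helper: re-register one launchd system service using the
# same bootout / disable / enable / bootstrap sequence shown above.
# LAUNCHCTL can be overridden (e.g. LAUNCHCTL=echo) for a dry run.
LAUNCHCTL="${LAUNCHCTL:-launchctl}"

reload_service() {
    label="$1"   # service label, as used in system/<svc name>
    plist="$2"   # path to the service's plist file

    # bootout can fail if the service is not currently loaded; keep going.
    "$LAUNCHCTL" bootout "system/$label" ||
        echo "bootout failed (service may not be loaded); continuing" >&2

    # Stop at the first failing step so a broken state is not hidden.
    "$LAUNCHCTL" disable "system/$label" &&
    "$LAUNCHCTL" enable "system/$label" &&
    "$LAUNCHCTL" bootstrap system "$plist"
}
```

Setting LAUNCHCTL=echo before calling reload_service prints the commands instead of running them, which is a cheap way to check the arguments first.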

The problem is that I can't tell from the remote device that it's not accessible by SSH/VNC. The different interfaces say that sure, everything's fine.

When I do a launchctl print there are some differences between the non-working and working versions. I don't know if these are actual indicators that it's down, or artifacts of the way I restarted them. The differences are consistent for both VNC & SSH:

Not working but apparently running:

path = (submitted by smd.215)
submitted job. ignore execute allowed.
system service = 0

Working after the launchctl bootout/bootstrap sequence above:

path = /System/Library/LaunchDaemons/<plist file>
system service = 1
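
If those differences do turn out to be reliable indicators (which is not confirmed), a check along these lines could flag the suspect state. It is a rough sketch: classify_print_output is a made-up name, and it only pattern-matches the lines quoted above, so it may misfire on other macOS versions.

```shell
#!/bin/sh
# Hypothetical check: reads `launchctl print system/<svc name>` output on
# stdin and prints "suspect" if it contains either marker seen in the
# non-working state, otherwise "ok".
# Assumption: these markers really do indicate the broken state.
classify_print_output() {
    if grep -Eq 'path = \(submitted by smd|system service = 0'; then
        echo suspect
    else
        echo ok
    fi
}
```

It would be used by piping the real output into it, for example: launchctl print system/<svc name> | classify_print_output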

So, a few questions:

  • Has anyone else seen this?
  • Is there some way to get more error information about why sshd is exiting in the logs/console?
  • Is there a way to detect that sshd is failing, even though there's no system log entry for the failure, and the various interfaces show that everything's fine?
  • What's the cleanest way to tell the system to restart remote access every day just in case it can't be identified any other way?
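
On the detection question, one possible approach is to probe SSH from the outside instead of trusting the service state. ssh-keyscan only prints host keys if a real sshd answers and completes the key exchange, so an empty result suggests the port is open but nothing usable is behind it. This is a sketch under assumptions: check_sshd and the KEYSCAN override are made-up names, the 5-second timeout is arbitrary, and it covers SSH only, not VNC.

```shell
#!/bin/sh
# Hypothetical probe: prints "up" if an SSH server at host:port returns at
# least one host key within the timeout, otherwise "down".
# KEYSCAN can be overridden for testing; it defaults to ssh-keyscan.
KEYSCAN="${KEYSCAN:-ssh-keyscan}"

check_sshd() {
    host="$1"
    port="${2:-22}"

    # -T sets the connection timeout in seconds, -p the port to contact.
    keys=$("$KEYSCAN" -T 5 -p "$port" "$host" 2>/dev/null)

    if [ -n "$keys" ]; then
        echo up
    else
        echo down
    fi
}
```

Run from the Jenkins controller or from the Mac itself against its own address, a "down" result could be used to trigger a restart of the service or an alert.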

Accepted Reply

Looks like it was related to Microsoft Defender. Not sure exactly what was causing the problem, but removing MS Defender completely caused the issue to go away.
