WindowServer userspace-watchdog wedge escalates to recoveryOS forced reactivation — fires even idle & memory-clean (Mac14,5 / macOS 26.5.1 25F80)

Summary

On an M2 Max (macOS 26.5.1, 25F80), WindowServer intermittently stops checking in with the userspace watchdog ("hung 40/80 seconds since last successful checkin") and the machine resets. The part I'm asking about: since ~2026-06-17 the wedge no longer ends in a normal panic-reboot — it escalates to recoveryOS and forces a full Mac reactivation (local password + iCloud). Four times in 55 hours. The most recent was captured live and fired on a machine that was idle, memory-clean (1.2 GB free, flat swap, no Jetsam), with the underlying IOKit power-assertion count plateaued — so this is not resource exhaustion at the moment of the wedge.

It reproduces under a heavy third-party workload (an Electron app's per-session process fan-out, tracked separately by the vendor). I'm not asking Apple to fix that app — I'm asking about the OS behaviors that turn "an app uses a lot over many hours" into "the owner is locked out pending iCloud reactivation."

Apple-side vs app-side (so this isn't dismissed as a third-party issue)

Apple's — and these should hold regardless of any app, because no userspace workload should be able to cause them:

  1. A userspace WindowServer watchdog timeout escalating to a firmware/recoveryOS reset that invalidates the boot policy and demands iCloud reactivation. On disk, the four 06-18→06-20 events have the WindowServer watchdog .spin/.ips but no panic-full and no ResetCounter — they did not take the normal panic-reboot path.
  2. The watchdog resetting the whole machine instead of restarting the wedged compositor ("WindowServer has not exited since first loaded").
  3. WindowServer/SkyLight degrading cumulatively under sustained use so the wedge fires even when the machine is currently idle and memory-clean (below).
  4. IOKit never reclaiming RootDomainUserClient (IOPMrootDomain) registrations — they accumulate unbounded and clear only on reboot, and the dominant holders are Apple's own daemons (apsd, WebThumbnailExtension), not the app.

App-side (separate tracker): unbounded per-session MCP process fan-out + an Electron footprint + a held NoIdleSleep assertion forcing sustained display-on operation. That controls how fast you reach the degraded state; the OS controls whether reaching it is a graceful degrade or an owner-lockout. Event 4 shows the OS failure with no resource exhaustion present at all.

System

MacBook Pro 14" (Mac14,5), M2 Max (T6020), 38-core GPU, 32 GB — a high-end, fully capable machine I had essentially never needed to reboot before this; it now forces a reactivation roughly once per day of use. macOS 26.5.1 (25F80), kernel xnu-12377.121.6~2 (Darwin 25.5.0). FileVault on, no third-party kexts, 161 GB free disk. Same kernel build as 25F71 — the update didn't change it.

Signature

userspace watchdog timeout: no successful checkins from WindowServer (0 induced crashes) in 120 seconds
WindowServer has not exited since first loaded
service: logd / opendirectoryd / configd  — last checkin: 0 seconds ago
service: WindowServer                      — last checkin: 120 seconds ago
Panicked task ... watchdogd;  KEXT backtrace: AppleARMWatchdogTimer -> AppleARMPlatform
Compressor Info: NN% (OK) ... 0 swapfiles   <-- memory-clean

Only WindowServer is stuck; logd/opendirectoryd/configd check in normally. Pre-reset spindumps show ws_main_thread off-CPU ~59 s, wedged in SkyLight → QuartzCore CA::Transaction::commit / CALayer render-commit. Two spindumps per event (the 40 s/80 s checkpoints before the 120 s reset).

Timeline & the escalation point (verifiable on disk by panic-full/ResetCounter presence)

Normal panic-reboots (panic-full + ResetCounter written): 06-15 22:24 (~6.2 h uptime), 06-16 11:39 (~13.2 h), plus 06-14 / 06-15 16:07 / 06-16 20:08. Latest panic-full on disk = 06-16 20:08.

Escalated to recoveryOS reactivation (NO panic-full, NO ResetCounter):

#WhenWindowServer uptime at wedgestate
R106-18 01:23~27.6 h (99,278 s)
R206-18 19:53~18.4 h (66,201 s)+ Jetsam pressure
R306-19 15:30~19.2 h (68,991 s)
R406-20 08:19~16.7 h (60,180 s)idle, 1.2 GB free, leak plateaued 583

Uptime = WindowServer-process uptime from the .ips "M checkins since K seconds ago" field (the coarse uptime JSON field — 99000/66000/69000/60000 — corroborates). Time-to-wedge is not a fixed interval — it ranges ~6–28 h and scales inversely with GPU/compositor load; the invariant is sustained use, not a clock value. After 06-16 there are zero panic-full and zero ResetCounter on disk — the fingerprint of the recoveryOS escalation.

Decisive evidence — R4, captured live

A monitor sampling every 2.5 min when WindowServer wedged (08:19, 16.7 h):

08:14  iopm=581  free=1833MB  swap=3732MB  load=2.88
08:19  iopm=583  free=1192MB  swap=3724MB  load=3.03   <-- watchdog spindump written

Idle (load ~3), 1.2 GB free, flat swap, no Jetsam, the IOKit power-assertion count plateaued at 583. Nothing to exhaust — it still wedged and escalated to reactivation. The trigger is cumulative WindowServer/SkyLight state, not the resource level.

The leak (an aggravator)

RootDomainUserClient/IOPMrootDomain clients climb without bound (baseline ~120 → 583–923 here). Dominant holders are Apple daemons (live ioreg walk 2 min pre-panic: apsd 228/596, then Safari/WindowServer/powerd/loginwindow). Killing the top creating process does not reclaim them (ioclasscount 526→526) — kernel-orphaned, reboot-only. Per R4 the leak is an aggravator, not the threshold (583 wedged; 923 had not earlier).

Ruled out

Hardware (survived 25F71→25F80, same kernel; peripheral + driver removal); memory exhaustion (R4 and others memory-clean); a fixed clock (load-dependent, 6–28 h); an iopm threshold (R4 plateaued at 583).

Questions for Apple engineering

  1. Why does a userspace WindowServer watchdog timeout escalate to recoveryOS + forced reactivation (boot-policy re-verification) instead of a normal panic-reboot? What invalidates LocalPolicy / triggers Activation-Lock re-verification on this path?
  2. What in WindowServer/SkyLight degrades cumulatively over sustained use so a render-commit can't complete within the watchdog window even on an idle, memory-clean machine?
  3. Can the watchdog restart the wedged WindowServer ("has not exited since first loaded") instead of resetting the whole machine into a reactivation lockout?
  4. Can IOKit lifecycle-bound RootDomainUserClient (IOPMrootDomain) registrations so they don't accumulate unreclaimable? Dominant creators are Apple daemons (apsd, WebThumbnailExtension).

Apple Feedback FB22947849 has the per-event spindumps, .ips, the live monitor log, and a sysdiagnose. I'd most appreciate a pointer on #1 — the reactivation escalation is what turns a recoverable crash into a repeated owner-lockout.

WindowServer userspace-watchdog wedge escalates to recoveryOS forced reactivation — fires even idle & memory-clean (Mac14,5 / macOS 26.5.1 25F80)
 
 
Q