System-wide deadlock in removexattr from revisiond / APFS

We're experiencing a deadlock on certain systems when our software is installed, which is causing side effects in our process (and likely others) such as blocked queues and increased memory usage.

According to the spindump, revisiond appears to be holding an exclusive lock within the kernel.

Process:          revisiond [426]
UUID:             5E9B9E04-984B-31AD-A4FF-A1A90B7D53A1
Path:             /System/Library/PrivateFrameworks/GenerationalStorage.framework/Versions/A/Support/revisiond
Codesigning ID:   com.apple.revisiond
Shared Cache:     25AE5A2A-FE2A-3998-8D4E-F3C5C6E6CEB6 slid base address 0x189834000, slide 0x9834000 (System Primary)
Architecture:     arm64e
Parent:           launchd [1]
UID:              0
Sudden Term:      Tracked
Memory Limit:     50MB
Jetsam Priority:  40
Footprint:        6225 KB
Time Since Fork:  1740319s
Num samples:      940 (1-940)
Num threads:      5
Note:             1 idle work queue thread omitted

[...]

  Thread 0xc0616d    940 samples (1-940)    priority 46 (base 4)    last ran 241692.754s ago
  940  start_wqthread + 8 (libsystem_pthread.dylib + 7068) [0x189d0ab9c]
    940  _pthread_wqthread + 292 (libsystem_pthread.dylib + 11852) [0x189d0be4c]
      940  _dispatch_workloop_worker_thread + 692 (libdispatch.dylib + 85356) [0x189b65d6c]
        940  _dispatch_root_queue_drain_deferred_wlh + 292 (libdispatch.dylib + 87156) [0x189b66474]
          940  _dispatch_lane_invoke + 440 (libdispatch.dylib + 45048) [0x189b5bff8]
            940  _dispatch_lane_serial_drain + 944 (libdispatch.dylib + 42420) [0x189b5b5b4]
              940  _dispatch_client_callout + 16 (libdispatch.dylib + 113364) [0x189b6cad4]
                940  _dispatch_call_block_and_release + 32 (libdispatch.dylib + 7004) [0x189b52b5c]
                  940  ??? (revisiond + 168768) [0x10494d340]
                    940  ??? (revisiond + 165940) [0x10494c834]
                      940  ??? (revisiond + 40264) [0x10492dd48]
                        940  ??? (revisiond + 56680) [0x104931d68]
                          940  <patched truncated backtrace>
                            940  removexattr + 8 (libsystem_kernel.dylib + 23768) [0x189cd1cd8]
                             *940  ??? (kernel.release.t6000 + 15240) [0xfffffe000886fb88]
                               *940  ??? (kernel.release.t6000 + 1886348) [0xfffffe0008a3888c]
                                 *940  ??? (kernel.release.t6000 + 7730436) [0xfffffe0008fcb504]
                                   *940  ??? (kernel.release.t6000 + 2759592) [0xfffffe0008b0dba8]
                                     *940  ??? (kernel.release.t6000 + 2808244) [0xfffffe0008b199b4]
                                       *940  apfs_vnop_removexattr + 1044 (apfs + 474512) [0xfffffe000be8d4d0]
                                         *940  decmpfs_cnode_set_vnode_state + 80 (kernel.release.t6000 + 2945816) [0xfffffe0008b3b318]
                                           *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
                                             *940  ??? (kernel.release.t6000 + 494624) [0xfffffe00088e4c20]
                                               *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
                                                 *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758]

The bulk of the other processes are waiting for that lock.

(suspended, blocked by krwlock for reading owned by revisiond [426] thread 0xc0616d)

(blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

Around the time of the event, these messages were logged by revisiond:

2026-03-06 18:49:37.781673-0500 0x16b7     Error       0x7f92f364           426    14   revisiond: [com.apple.revisiond:default] [ERROR] CSCopyChunkIDsForToken failed for 41639
2026-03-06 18:49:37.781716-0500 0x16b7     Error       0x7f92f365           426    14   revisiond: [com.apple.revisiond:default] [ERROR] updateEntry for new entry <private> failed
2026-03-06 18:49:37.781738-0500 0x16b7     Error       0x7f92f366           426    14   revisiond: [com.apple.revisiond:default] [ERROR] no entry for '<private>'
2026-03-06 18:49:37.781754-0500 0x16b7     Error       0x7f92f367           426    14   revisiond: [com.apple.revisiond:default] [ERROR] failed assembleInfoForOffset for fsid 16777234 fileid 359684022 offset 0 size 14334 (path <private>)

Our agent uses the Endpoint Security framework to monitor events and provide anti-tamper functionality for installed components and processes. While several EndpointSecurity calls appear in the spindump stack traces, we don't have any evidence that any calls from revisiond were blocked.

What we'd really like is to understand what that lock is (it appears to be involved in decompressing an object on an APFS volume), what revisiond and APFS are doing with it, and what might cause it to deadlock.

Of note, one of our processes is also waiting on that lock, one thread for reading and the other for writing.

This issue affects machines running several macOS versions (15.x, 26.x). The machine in the examples is running macOS 26.3 (25D125).

We're experiencing a deadlock on certain systems when our software is installed, which is causing side effects in our process (and likely others) such as blocked queues and increased memory usage.

According to the spindump, revisiond appears to be holding an exclusive lock within the kernel.

Can you file a bug on this and upload the full spindump there? That's the easiest way to transfer large data files and I may eventually want a sysdiagnose.

Having said that...

Our agent uses the Endpoint Security framework to monitor events and provide anti-tamper functionality for installed components and processes. While several EndpointSecurity calls appear in the spindump stack traces, we don't have any evidence that any calls from revisiond were blocked.

...this is almost CERTAINLY caused by your ES client.

What we'd really like is to understand what that lock is (it appears to be involved in decompressing an object on an APFS volume), what revisiond and APFS are doing with it, and what might cause it to deadlock.

You can actually find the code for decmpfs_cnode_set_vnode_state here; however, I think you're asking the wrong question. The question here isn't what revisiond is doing, it's who revisiond is waiting on. Some other thread locked that lock and is now stuck, which is the real issue you need to sort out.

Case in point, you're getting closer:

Of note, one of our processes is also waiting on that lock, one thread for reading and the other for writing.

However, the problem thread ISN'T going to be waiting on that lock; that thread is going to be stuck "somewhere else", unable to complete its work and release the lock.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Can you file a bug on this and upload the full spindump there?

I opened one last week. Case-ID: 18860388

...this is almost CERTAINLY caused by your ES client.

Yes.

The question here isn't what revisiond is doing, it's who revisiond is waiting on. Some other thread locked that lock and is now stuck, which is the real issue you need to sort out.

It's definitely revisiond (well, APFS) that's holding the lock. There are dozens of threads where the shared lock points at revisiond [426] thread 0xc0616d, both for reading and writing.

 *940  apfs_vnop_getattr + 312 (apfs + 604336) [0xfffffe000beacff0]
   *940  IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]
	 *940  ??? (kernel.release.t6000 + 497548) [0xfffffe00088e578c]
	   *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
		 *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (suspended, blocked by krwlock for reading owned by revisiond [426] thread 0xc0616d)

 *940  apfs_vnop_read + 708 (apfs + 555972) [0xfffffe000bea1304]
   *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
	 *940  ??? (kernel.release.t6000 + 494304) [0xfffffe00088e4ae0]
	   *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
		 *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

*940  icp_lock_inode + 72 (apfs + 757908) [0xfffffe000bed27d4]
 *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
   *940  ??? (kernel.release.t6000 + 494304) [0xfffffe00088e4ae0]
	 *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
	   *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

Only that revisiond thread doesn't blame another thread / lock.

The IORWLockWrite stack seems to point at machine_switch_context, i.e. the case where the lock is owned by another thread, so the current thread is suspended / yields to another one, waiting for the lock to become available again.

But then that's a bit incoherent with all the other threads reporting "blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d" (it can't be the owner and not the owner at the same time…).

Is it possible for machine_switch_context to be called even if you were able to take ownership of the lock? In what kind of scenario? The stack doesn't seem to tell. And we don't have the source code of IORWLockWrite.

It's like something suspended the revisiond thread in the kernel when it executed IORWLockWrite, but this "something" is unable to resume it because it is blocked itself (on this same lock?). But that doesn't align with the machine_switch_context symbol in the stack.

I opened one last week. Case-ID: 18860388

I don't think that's a valid bug number. Details on the bug filing process are here, and the numbers are prefixed "FB". Again, please upload the full spindump to that bug and then post the bug number back here.

It's definitely revisiond (well, APFS) that's holding the lock.

Sure, but the question is "why", not "who". What causes this kind of hang is the interaction between multiple locks and multiple processes. It's hard to pick out unless you're looking at the full log and know what you're looking for, but the basic form is that there are two locks:

  1. The "outer" lock, which the blocking thread (in this case, thread 0xc0616d) is inside and holding.

  2. The "inner" lock, which the blocking thread (thread 0xc0616d) is stuck waiting on.

You can actually see this dynamic in the traces you sent. This lock is an APFS-owned lock:

 *940  apfs_vnop_getattr + 312 (apfs + 604336) [0xfffffe000beacff0]
   *940  IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]

And, unsurprisingly, deleting an xattr requires holding that lock:

*940  apfs_vnop_removexattr + 1044 (apfs + 474512) [0xfffffe000be8d4d0]

However, this is NOT an APFS lock, but is actually a lock in the kernel vfs system:

*940  decmpfs_cnode_set_vnode_state + 80 (kernel.release.t6000 + 2945816) [0xfffffe0008b3b318]
  *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]

Again, you can see that second lock here if you're curious.
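The outer/inner dynamic above can be modeled in user space. This is an illustrative sketch, not the kernel code: the real locks are kernel rw-locks inside APFS and decmpfs, and the names here are assumptions for the demo only. It shows how a thread that holds an "outer" lock gets blamed by everyone blocked on it, even while it is itself the victim of an "inner" lock someone else owns:

```python
# Illustrative user-space model of the outer/inner lock dynamic above.
# The real locks are kernel rw-locks inside APFS and decmpfs; the names
# here are assumptions for the sketch, not actual kernel objects.
import threading

outer = threading.Lock()        # stands in for the apfs vnode lock
inner = threading.Lock()        # stands in for the decmpfs lock

inner.acquire()                 # some other thread already owns "inner"

blocker_ready = threading.Event()

def blocker():
    # Plays the revisiond role: enters holding "outer", then stalls on
    # "inner". The timeout exists only so this demo terminates.
    with outer:
        blocker_ready.set()
        inner.acquire(timeout=0.2)

t = threading.Thread(target=blocker)
t.start()
blocker_ready.wait()

# Plays the role of all the waiting threads: everyone blocked on "outer"
# blames the blocker thread, even though the blocker is itself stuck.
waiter_got_outer = outer.acquire(timeout=0.1)
print("waiter acquired outer:", waiter_got_outer)   # False

t.join()
inner.release()
```

The spindump's "blocked by krwlock owned by ..." annotation corresponds to the waiter's view here: it correctly names the blocker as owner, but says nothing about what the blocker itself is stuck on.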

There are dozens of threads where the shared lock points at revisiond [426] thread 0xc0616d, both for reading and writing.

All of the APFS locks tend to be held for very short periods of time, so it's not unusual for work to pile up very quickly. More to the point, all of those other threads are (mostly) irrelevant to the issue. I'd actually be looking for any other reference to compression/decompression or xattrs.

Only that revisiond thread that doesn't blame another thread / lock.

The blame logic is not infallible, particularly when pure kernel threads are involved.

Is it possible that machine_switch_context is called if you were able to get the ownership of the lock? In which kind of scenario? The stack doesn't seem to tell it.

No, not really.

And we don't have the source code of IORWLockWrite

Yes, you do. It's defined in IOLocks.h, which maps it to lck_rw_lock_exclusive. However, I wouldn't expect that to lead you anywhere useful.

The bigger point here is that I think you already know revisiond isn't actually blocking itself. In your first post, you said "deadlock", not "panic":

We're experiencing a deadlock on certain systems when our software is installed, which is causing side effects in our process (and likely others) such as blocked queues and increased memory usage.

...which means I suspect you meant "hang", meaning that the problem eventually cleared itself. If the problem clears on its own, that means someone had to release revisiond to make forward progress.

That leads to here:

It's like something suspended the revisiond thread in the kernel when it executed IORWLockWrite, but then this "something" is unable to resume it because it is blocked itself (on this same lock?).

See above for the description of how this happens.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

All of the APFS locks tend to be held for very short periods of time, so it's not unusual for work to pile up very quickly. More to the point, all of those other threads are (mostly) irrelevant to the issue. I'd actually be looking for any other reference to compression/decompression or xattrs.

If they are held for a very short amount of time, shouldn't we rarely see other threads waiting for them? That's what I would expect, at least.

And here we can see that all the other threads were waiting for it for the whole spindump duration (Num samples: 940 (1-940) / IORWLockWrite & IORWLockRead → 940). I know this counts the number of times the sampler sees these symbols each time it samples the processes (i.e. it doesn't mean the code was running between samples), but I would be surprised if these exact same stacks kept re-occurring at precisely the sample times by chance: those threads were most likely blocked there the whole time.


Yes, you do. It's defined in IOLocks.h, which maps it to lck_rw_lock_exclusive. However, I wouldn't expect that to lead you anywhere useful.

Yep, I noticed that, but since we see IORWLock... in the stacks, I concluded that IOLOCKS_INLINE wasn't set and that it really uses the IORWLock... functions (#define is a preprocessor macro, so there's no reason for these functions to appear in the stack if IOLOCKS_INLINE is set).


No, not really.

But then I don't understand why all the threads report that the revisiond thread owns the lock, while the revisiond thread's stack seems to say that it wasn't able to acquire it (and so is suspended)…

Or, as you said, the blame logic is just all wrong, everyone is blaming this revisiond thread by mistake, and revisiond is just blocked on someone else, like everyone else.


I let the OP answer the other points.

I don't think that’s a valid bug number.

Sorry, it was a TSI, not a bug report. I replied to the DTS email yesterday with the full spin dump.

Let me know if you still want a bug report as well.

Again, you can see that second lock here if you're curious.

There are only 2 other threads in the logs that appear relevant, both from our helper process and both down in APFS. One of them is also stuck inside decmpfs_read_compressed:

  Thread 0xc06805    DispatchQueue "com.apple.root.default-qos"(13)    940 samples (1-940)    priority 46 (base 31)    last ran 241692.753s ago
  940  start_wqthread + 8 (libsystem_pthread.dylib + 7068) [0x189d0ab9c]
    940  _pthread_wqthread + 232 (libsystem_pthread.dylib + 11792) [0x189d0be10]
      940  _dispatch_worker_thread2 + 180 (libdispatch.dylib + 83844) [0x189b65784]
        940  _dispatch_root_queue_drain + 736 (libdispatch.dylib + 82236) [0x189b6513c]
          940  <deduplicated_symbol> + 32 (libdispatch.dylib + 231900) [0x189b899dc]
            940  _dispatch_client_callout + 16 (libdispatch.dylib + 113364) [0x189b6cad4]
              940  _dispatch_call_block_and_release + 32 (libdispatch.dylib + 7004) [0x189b52b5c]
                940  __34-[MyConcurrentQueue performBlock:]_block_invoke (in MyFramework) (MyConcurrentQueue.m:228) + 355392  [0x102a42c40]
                  940  __66-[MyPathHelper fetchFileTypeForFileAtPath:statData:queue:handler:]_block_invoke.125 (in MyFramework) (MyPathHelper.m:1056) + 43416  [0x1029f6998]
                    940  pread + 8 (libsystem_kernel.dylib + 9788) [0x189cce63c]
                     *940  ??? (kernel.release.t6000 + 15240) [0xfffffe000886fb88]
                       *940  ??? (kernel.release.t6000 + 1886348) [0xfffffe0008a3888c]
                         *940  ??? (kernel.release.t6000 + 7730436) [0xfffffe0008fcb504]
                           *940  ??? (kernel.release.t6000 + 6448868) [0xfffffe0008e926e4]
                             *940  ??? (kernel.release.t6000 + 6447764) [0xfffffe0008e92294]
                               *940  ??? (kernel.release.t6000 + 2790292) [0xfffffe0008b15394]
                                 *940  ??? (kernel.release.t6000 + 2791252) [0xfffffe0008b15754]
                                   *940  apfs_vnop_read + 508 (apfs + 555772) [0xfffffe000bea123c]
                                     *940  decmpfs_read_compressed + 300 (kernel.release.t6000 + 2955192) [0xfffffe0008b3d7b8]
                                       *940  ??? (kernel.release.t6000 + 2946852) [0xfffffe0008b3b724]
                                         *940  ??? (kernel.release.t6000 + 2800480) [0xfffffe0008b17b60]
                                           *940  apfs_inode_getxattr + 716 (apfs + 1541792) [0xfffffe000bf91de0]
                                             *940  IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]
                                               *940  ??? (kernel.release.t6000 + 497548) [0xfffffe00088e578c]
                                                 *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
                                                   *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for reading owned by revisiond [426] thread 0xc0616d)

  Thread 0xc06362    940 samples (1-940)    priority 46 (base 31)    last ran 241692.753s ago
  940  start_wqthread + 8 (libsystem_pthread.dylib + 7068) [0x189d0ab9c]
    940  _pthread_wqthread + 232 (libsystem_pthread.dylib + 11792) [0x189d0be10]
      940  _dispatch_worker_thread2 + 180 (libdispatch.dylib + 83844) [0x189b65784]
        940  _dispatch_root_queue_drain + 736 (libdispatch.dylib + 82236) [0x189b6513c]
          940  <deduplicated_symbol> + 32 (libdispatch.dylib + 231900) [0x189b899dc]
            940  _dispatch_client_callout + 16 (libdispatch.dylib + 113364) [0x189b6cad4]
              940  _dispatch_call_block_and_release + 32 (libdispatch.dylib + 7004) [0x189b52b5c]
                940  __34-[MyConcurrentQueue performBlock:]_block_invoke (in MyFramework) (MyConcurrentQueue.m:228) + 355392  [0x102a42c40]
                  940  __66-[MyPathHelper fetchFileTypeForFileAtPath:statData:queue:handler:]_block_invoke.125 (in MyFramework) (MyPathHelper.m:1056) + 43416  [0x1029f6998]
                    940  <patched truncated backtrace>
                      940  pread + 8 (libsystem_kernel.dylib + 9788) [0x189cce63c]
                       *940  ??? (kernel.release.t6000 + 15240) [0xfffffe000886fb88]
                         *940  ??? (kernel.release.t6000 + 1886348) [0xfffffe0008a3888c]
                           *940  ??? (kernel.release.t6000 + 7730436) [0xfffffe0008fcb504]
                             *940  ??? (kernel.release.t6000 + 6448868) [0xfffffe0008e926e4]
                               *940  ??? (kernel.release.t6000 + 6447764) [0xfffffe0008e92294]
                                 *940  ??? (kernel.release.t6000 + 2790292) [0xfffffe0008b15394]
                                   *940  ??? (kernel.release.t6000 + 2791252) [0xfffffe0008b15754]
                                     *940  apfs_vnop_read + 708 (apfs + 555972) [0xfffffe000bea1304]
                                       *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
                                         *940  ??? (kernel.release.t6000 + 494304) [0xfffffe00088e4ae0]
                                           *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
                                             *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

I will review these calls and locks some more, but have limited insight into the APFS implementation here.

...which means I suspect you meant "hang", meaning that the problem eventually cleared itself. If the problem clears on its own, that meant someone had to release revisiond to make forward progress.

No, it was 3 days later and the problem hadn't resolved. At that point we tried killing revisiond, but it was left as a zombie (due to the held lock?). A reboot was required to resolve the problem.

Sorry, it was a TSI, not a bug report. I replied to the DTS email yesterday with the full spin dump.

Ahh, I see it now. So, the first thing I actually did was search for "/usr/lib/libEndpointSecurity.dylib" (the library link path), which is a convenient way to find all ES clients.

What you'll find is that there are two clients: yours and a (presumably unrelated) 3rd party client. That's a critical factor here because it means that there are now two independent entities with "veto" power over each other's activities, particularly activity that's coming from helper components (not just the direct ES client).
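That library-path search can be scripted. A rough sketch; the sample text (process names, addresses) is invented for illustration and is not taken from the actual spindump:

```python
# A scripted version of the search described above: walk a spindump's text
# and report which processes link /usr/lib/libEndpointSecurity.dylib.
# The sample below is invented for illustration.
def es_client_processes(spindump_text):
    clients, current = [], None
    for line in spindump_text.splitlines():
        if line.startswith("Process:"):
            current = line.split(":", 1)[1].strip()
        elif "/usr/lib/libEndpointSecurity.dylib" in line and current:
            if current not in clients:
                clients.append(current)
    return clients

sample = """\
Process:          our_es_agent [512]
       0x104000000 - 0x10400ffff  libEndpointSecurity.dylib  /usr/lib/libEndpointSecurity.dylib
Process:          revisiond [426]
       0x189d0a000 - 0x189d1afff  libsystem_pthread.dylib  /usr/lib/system/libsystem_pthread.dylib
Process:          third_party_logger [733]
       0x105000000 - 0x10500ffff  libEndpointSecurity.dylib  /usr/lib/libEndpointSecurity.dylib
"""
print(es_client_processes(sample))   # ['our_es_agent [512]', 'third_party_logger [733]']
```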

That leads to here:

There are only 2 other threads in the logs that appear relevant, both from our helper process and both down in APFS. One of them is also stuck inside decmpfs_read_compressed:

That's the deadlock. More specifically, looking at your thread's stack when you enter decmpfs_read_compressed, the first thing it does is call "decmpfs_lock_compressed_data":

(1)

*940  decmpfs_read_compressed + 300 (kernel.release.t6000 + 2955192) [0xfffffe0008b3d7b8]
  *940  ??? (kernel.release.t6000 + 2946852) [0xfffffe0008b3b724]
   *940  ??? (kernel.release.t6000 + 2800480) [0xfffffe0008b17b60]


That is the lock that revisiond is waiting on here:

(2)

*940  decmpfs_cnode_set_vnode_state + 80 (kernel.release.t6000 + 2945816) [0xfffffe0008b3b318]
  *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]


However, jumping back to your thread, the lock that's being waited on here:

(3)

*940 apfs_inode_getxattr + 716 (apfs + 1541792) [0xfffffe000bf91de0]
  *940 IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]


Is NOT the lock in #2. It's actually the more general apfs lock that revisiond took just before it entered decmpfs_cnode_set_vnode_state:

(4)

*940 apfs_vnop_removexattr + 1044 (apfs + 474512) [0xfffffe000be8d4d0]


Do you know exactly what "kicked off" your helper process? Both in terms of the triggering event and the target file? The timing required here is EXTREMELY specific, as you can see in their "last ran" times:

decmpfs_cnode_set_vnode_state-> last ran 241692.754s ago

decmpfs_read_compressed -> last ran 241692.753s ago

That's VERY narrow timing, so if this issue is repeatable then something is happening to "line up" the race between the threads with JUST the right timing.

Moving back to revisiond, as background, what revisiond actually does is maintain the backing data used by the system file versioning architecture. From that perspective, it's certainly something I'd consider muting, since everything it does is going to be driven by other external activity, not its own "work".

In terms of what it specifically did here, it called "remove()" (trying to delete an old file) on a compressed file. It isn't actually trying to modify the file's compression state, it's trying to delete it.

If they are held for a very short amount of time, shouldn't we rarely see other threads waiting for them? That's what I would expect, at least.

Yes... unless a process deadlocks, at which point other access starts piling up. Interestingly, I think you're deadlocked against a very specific file that isn't actually THAT essential to overall system function. Much of the system is still operating fairly "normally", which is why it hasn't already panicked.

And here we can see that all other threads are waiting for it for the whole spindump duration (Num samples: 940 (1-940) / IORWLockWrite & IORWLockRead → 940).

Oh, they've been waiting a LOT longer than that… note the "last ran" entry on each thread. That's the last time it actually executed:

Thread 0xc0616d    940 samples (1-940)    priority 46 (base 4)    last ran 241692.754s ago

241692s → ~2.8 days

The strange part here is that you've got ~350 zombie processes all waiting on the same lock in apfs_vnop_getattr. Here's the earliest and latest thread header:

Thread 0xc0879f    940 samples (1-940)    priority 46 (base 31)    last ran 241070.127s ago
...
Thread 0xf0adec    940 samples (1-940)    priority 46 (base 31)    last ran 708.289s ago

...all of them spawned by your helper.

Yep, I noticed that, but as we see IORWLock... in the stacks, I concluded that IOLOCKS_INLINE wasn't set, and that it really uses IORWLock... function (#define is a preprocessor macro, no reason for this function to appear in the stack if IOLOCKS_INLINE is set)

It's not really visible in your partial symbolization, but there's some compiler magic going on to provide better symbolization than would occur with a direct interpretation of the define chain. I'm not sure how the mechanics of it work, but the symbolication system is using the #define name for the underlying kernel function.

But then I don't understand why all the threads report that the revisiond thread owns the lock,

I outlined this above, but revisiond does own the lock that those threads are blocked on. That lock was the one it took when it entered "apfs_vnop_removexattr". What's missing is that the system isn't accounting for the thread it's blocked on:

while the revisiond thread's stack seems to say that it wasn't able to acquire it (and so is suspended)…

I believe that's happening because the tracking system here is only tracking the most recent lock a thread is holding/blocked on, not EVERY lock a thread may be holding. So it can't tell that this thread is the thread that's blocking revisiond:

*940  apfs_vnop_read + 508 (apfs + 555772) [0xfffffe000bea123c]
 *940  decmpfs_read_compressed + 300 (kernel.release.t6000 + 2955192) [0xfffffe0008b3d7b8]
   *940  ??? (kernel.release.t6000 + 2946852) [0xfffffe0008b3b724]
	 *940  ??? (kernel.release.t6000 + 2800480) [0xfffffe0008b17b60]
	   *940  apfs_inode_getxattr + 716 (apfs + 1541792) [0xfffffe000bf91de0]
		 *940  IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]

Or, as you said, the blame logic is just all wrong, everyone is blaming this revisiond thread by mistake, and revisiond is just blocked on someone else, like everyone else.

Not wrong, just limited in scope.

Finally moving to cause:

One of them is also stuck inside decmpfs_read_compressed:

My big question here is "how did you get here"? Do you know what syscall you approved leading into this? What are you actually trying to do here?

Finally, as a shot in the dark, is it possible this was a "..namedfork" path? There are some issues with those, and it's possible that pushed you down a weird code path, leading to this deadlock.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Lots of interesting things to check in your answer, thanks!

That said, I have one last question (on my side) regarding Endpoint Security.

In the past, when I had to investigate deadlocks where ES was involved, we 100% of the time saw symbols like EndpointSecurityEventManager::es_something and/or EndpointSecurityEventManager::sendSomething, plus something showing that they were waiting for an answer before being killed by the ES kext itself. That's for auth events, but I guess it can also happen with notify events when the queue is full and ES is not willing to drop the new events; I think someone from the ES team explained to us some years ago that ES can try to slow down incoming events to give the client a chance to drain the queue a bit…

But here, I don't see such things. When EndpointSecurityEventManager symbols appear in the spindump, it's only because those threads are blocked, like everyone else, on one of these 2 locks.

Is there a known typical scenario where ES can result in this kind of double-lock interlocking, without it even being a clear smoking gun?

Let me start by saying two slightly contradictory things:

  1. It's very likely that there is an APFS bug here, so if you haven't already, please file a bug on this and post the bug number back here.

  2. It's ALSO very likely that your ES client (possibly both ES clients) are required to actually experience this bug, and you should address it as well.

Next, I think I want to be clear that some of what I'm saying is a bit of an "educated guess". A big part of the problem here is that the data we're looking at is being collected LONG (2+ days) after the problem itself actually occurred, which means we don't REALLY know EXACTLY what happened. That doesn't invalidate what I'm describing above, but I do think it's important to have some awareness of the limitation of the data at hand and that, most importantly, new data could totally change our understanding of the overall state of things.

Getting into the details:

But here, I don't see such things. When EndpointSecurityEventManager symbols appear in the spindump, it's only because those threads are blocked, like everyone else, on one of these 2 locks.

What you're talking about here is the simple/direct case. Things like accidentally calling into a daemon as part of handling a request that's actually coming (directly or indirectly) from that daemon. Or simply slowing down "something" in the system so much that things start breaking down.

Is there a known typical scenario where ES can result in this kind of double-lock interlocking, without it even being a clear smoking gun?

...however, that's NOT what I think is going on here. The more subtle issue with the ES API is how it can create and expose issues that not only wouldn't, but COULDN'T, otherwise happen. Jumping back to my earlier forum post, the basic summary is that there are two different locks at play here:

  1. The apfs_inode_getxattr lock

  2. The decmpfs lock

And the deadlock happens when a sequence like this occurs on exactly the same file:

  • Thread 1-> hold lock 2
  • Thread 2-> hold lock 1
  • Thread 2-> reach lock 2
  • Thread 1 -> reach lock 1
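The four-step interleaving above can be replayed deterministically in user space. This is a sketch under stated assumptions, not the kernel code: the lock names are stand-ins, and a barrier forces the exact ordering that the scheduler would only rarely produce on its own. The acquire timeouts exist only so the script terminates instead of hanging:

```python
# Deterministic replay of the four-step interleaving described above.
# lock1/lock2 are stand-ins for the kernel locks, not the real objects.
import threading

lock1 = threading.Lock()    # stands in for the apfs_inode_getxattr lock
lock2 = threading.Lock()    # stands in for the decmpfs lock

step = threading.Barrier(2)
results = {}

def thread1():                  # revisiond's shape (removexattr path)
    lock2.acquire()             # step 1: hold lock 2
    step.wait()                 # step 2 happens on the other thread
    step.wait()                 # step 3 happens on the other thread
    results["t1"] = lock1.acquire(timeout=0.2)   # step 4: reach lock 1
    step.wait()                 # keep locks held until both attempts fail
    lock2.release()

def thread2():                  # the helper's shape (pread path)
    step.wait()
    lock1.acquire()             # step 2: hold lock 1
    step.wait()
    results["t2"] = lock2.acquire(timeout=0.2)   # step 3: reach lock 2
    step.wait()
    lock1.release()

threads = [threading.Thread(target=f) for f in (thread1, thread2)]
for t in threads: t.start()
for t in threads: t.join()

# Neither acquire can ever succeed: each thread owns the lock the other
# needs. In the kernel there is no timeout, so both park forever.
print(results["t1"], results["t2"])   # False False
```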

On the surface, that doesn't seem all that unlikely; however, that overlooks the role time plays here. In both cases, the total time to complete the entire sequence is EXTREMELY small, probably one quantum (the scheduler's minimum scheduling interval) or less. That means that triggering this deadlock effectively requires two threads to be targeting exactly the same file AND then reach EXACTLY the right code point at EXACTLY the wrong time. In theory, that's possible; however, in practice, the randomness of the scheduler and the tiny time window mean... it can't really happen.

HOWEVER... now we add ES into the mix. With ES added, the code flow here looks like this:

  • removexattr (in kernel)
  • ES check start -> callout to user space
  • user space check
  • call into kernel -> ES check completes
  • take lock 2
  • take lock 1

That ES check can now act as a sort of "starting gun", since you're now able to generate work targeting the relevant file within a FAR narrower time window than would otherwise occur. Indeed, when I said there were 2 threads involved, I was wrong. There are actually 4.

You can find them by starting with revisiond's last run time:

last ran 241692.754s ago

If you then search for "last ran 241692.75" (dropping the last 1000th of a second), you'll find 3 additional threads with a last run of:

last ran 241692.753s ago

Two of those threads are in revisiond, two of them are in your helper. More importantly, if you compare the two threads, I think you'll find that they're basically parallels of each other. My best explanation for that is that you basically "raced" both threads into the kernel with JUST the right/wrong timing that you triggered this deadlock.

In terms of what you do about it, I think the best option is to remove the timing collision, either by performing the work before approval or by adding a small delay to the I/O so that you don't collide. Of course, that assumes you don't just mute revisiond (which is probably what I would do).
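For the "small delay" option, here's a rough sketch of what deferring the work off the event path might look like (Python, with hypothetical function names; respond_auth stands in for actually responding to the ES event, e.g. es_respond_auth_result with ES_AUTH_RESULT_ALLOW):

```python
import random
import threading
import time

log = []

def respond_auth(path):
    # Hypothetical stand-in for responding to the ES auth event
    # (in a real client: es_respond_auth_result with ES_AUTH_RESULT_ALLOW).
    log.append(("respond", path))

def scan_file(path):
    # The actual file work (scanning/reporting), now OFF the event path.
    log.append(("scan", path))

def handle_event(path):
    respond_auth(path)                       # 1. approve immediately, don't block the kernel
    delay = 0.05 + random.uniform(0, 0.05)   # 2. jittered delay breaks the timing collision
    threading.Timer(delay, scan_file, args=(path,)).start()

handle_event("/private/tmp/example")
time.sleep(0.3)  # give the deferred scan time to run
print(log)       # "respond" is always recorded before "scan"
```

The key property is that the kernel's caller is never waiting on your I/O, and the jitter means your work no longer starts inside the same narrow window as the operation that triggered it.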

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

What you'll find is that there are two clients: yours and a (presumably unrelated) 3rd party client.

The other ES client is basically a 3rd party logger that's not active. Ours does some auth work, but we're fairly certain in this case it's just monitoring to do reporting and follow-up scans.

Do you know exactly what "kicked off" your helper process? Both in terms of the triggering event and the target file?

No. At the time, the system had been idle for 2 hours. We do observe a system update being staged in the background in this case, but not in all of the other instances.

I suspect it was a FILE_NOTIFY_CLOSE. It would be insightful to know if APFS triggers that or related ES events when decompressing a file.

A big part of the problem here is that the data we're looking at is being collected LONG (2+ days) after the problem itself actually occurred, which means we don't REALLY know EXACTLY what happened.

I will submit unified logs that cover the period in question, but your point remains.

That ES check can now act as a sort of "starting gun", since you're now able to generate work targeting the relevant file within a FAR narrower time window than would otherwise occur. Indeed, when I said there were 2 threads involved, I was wrong.

That's the conclusion we've been circling too. Not that our ES hooks are directly causing the deadlock, but that by hooking certain OS events and quickly acting on them, we're exercising the file system code in a way that may cause problems.

My best explanation for that is that you basically "raced" both threads into the kernel with JUST the right/wrong timing that you triggered this deadlock.

The problem is that it doesn't affect many machines, is slow to reproduce, and is not deterministic. That said, we have a tester that tries to simulate the problem with APFS transparent compression, concurrent threads, etc. Will advise if we find something.

  1. It's very likely that there is an APFS bug here, so if you haven't already, please file a bug on this and post the bug number back here.

FB22345232

  2. It's ALSO very likely that your ES client (possibly both ES clients) are required to actually experience this bug, and you should address it as well.

We do whitelist several Apple housekeeping processes and daemons, but not revisiond yet. Are there any other tools directly involved with APFS compression that we should consider?

I mistakenly created that FB ticket on the wrong account. Here is the correct ID: FB22345323

I suspect it was a FILE_NOTIFY_CLOSE.

Are you sure it isn't an auth event? Also, are you using separate es_client_t's for your notify and auth events? The timing issue I outlined is theoretically possible in both cases, but MUCH easier to do with auth events.

It would be insightful to know if APFS triggers that or related ES events when decompressing a file.

Open is what typically triggers decompression; however, in this case, revisiond was actually deleting a file (by calling "remove" on its path), so it's pretty likely that it had closed the file just prior to the remove call.

I will submit unified logs that cover the period in question, but your point remains.

I'll try and take a look, but I'm not necessarily optimistic. Frankly, these issues generally have a high degree of uncertainty about them, so my comment there was really about acknowledging that reality.

Not that our ES hooks are directly causing the deadlock, but that by hooking certain OS events and quickly acting on them, we're exercising the file system code in a way that may cause problems.

What are you actually "doing" to the files? I suspect you're copying them, but wouldn't mind confirming that.

We do whitelist several Apple housekeeping processes and daemons, but not revisiond yet. Are there any other tools directly involved with APFS compression that we should consider?

To be honest, I don't know. Originally, compressed files were a rarely used "oddity" that most of the system fairly actively avoided. Frankly, that's why their implementation (reusing the resource fork) is a bit of a "hack" - they weren't intended to be "general purpose", so the implementation was designed to minimize change to the file system (HFS+), not really to be "right". However, their usage has broadened since then and will probably continue to, so I wouldn't even try to guess at this point.

Having said that, my suggestion to whitelist RevisionD is really about what it "does", not how it's doing it. It's effectively creating and managing file-level backups, so "actively" screening its activity isn't necessarily all that valuable. As two other examples, I'd probably whitelist Time Machine and Spotlight's components (mds and its various helpers). Those are all daemons that end up spending most of their time "looking at arbitrary files" for very straightforward reasons. Scrutinizing their work in the same way you would any other process is going to end up generating a whole lot of extra noise without actually providing much value.

In the case of RevisionD, I haven't looked at its actual activity in detail, but I suspect the vast majority of its I/O is spent reading/writing files inside its own data store, with all of the underlying data being duplicates of other files in the system. I'm not sure actively screening that gets you much value.

FB22345323

Perfect, thank you. Your other bug should eventually get closed as a duplicate (to that one).

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Are you sure it isn't an auth event?

Without going into too much detail, it's possible, but unlikely given the methods in the stack that led to the deadlock. Auth events are primarily for security and integrity protection.

What are you actually "doing" to the files? I suspect you're copying them, but wouldn't mind confirming that.

The method on the stack is -[MyPathHelper fetchFileTypeForFileAtPath:], which is mostly used for scanning and reporting.

Having said that, my suggestion to whitelist RevisionD is really about what it "does", not how it's doing it.

I was mostly thinking of its interaction with extended attributes, as we see the target threads touching those as well.

I will post more detail in the FB ticket, and we can dive deeper there.

I was mostly thinking of its interaction with extended attributes, as we see the target threads touching those as well.

Just to clarify something, what's really going on here isn't "really" xattr access. The resource fork has a bit of a strange history, as it predates general "xattr" support by ~15 years and doesn't really "fit" the same mold as any other xattr. It originally had its own API access path through "FSReadFork" and the "..namedfork/rsrc" suffix passed to open() and, more importantly, is routinely MUCH larger than any other xattr, most of which are measured in "bytes". The resource fork was basically "added" to the more general xattr implementation, rather than being implemented as a "native" part of the original implementation.

Several years ago, we repurposed the resource fork and used it to implement file system level compression of specific files, which is what's happening here. Critically, these files look like standard data files and are accessed by reading "the file" using "open". The underlying xattr access is an implementation detail, though it probably is visible to the ES client.

That does lead to a note here:

Without going into too much detail, it's possible, but unlikely given the methods in the stack that led to the deadlock. Auth events are primarily for security and integrity protection.

Because of how compressed files, the resource fork, and xattrs are entangled, it's very possible that you're receiving multiple auth events for what's actually the same "action". For example, I wouldn't be surprised if you received ES_EVENT_TYPE_AUTH_UNLINK and then ES_EVENT_TYPE_AUTH_DELETEEXTATTR, even though the app only called "remove()". Similarly, I don't know what the consequences of denying delete xattr on an unlinked file are, but I wouldn't be surprised if they're "weird".

I will post more detail in the FB ticket, and we can dive deeper there.

In terms of preventing issues like this in a "general" way, my only suggestion would be that you clone the file yourself and then operate on the clone. I don't think the clone operation itself can create this issue [1] and once the clone is created, you’re working with a completely new file, so there isn't any interaction with the original file data.

That might seem like an "extreme" solution, but I think it will be significantly safer and I suspect the performance impact will be less than you might think. Cloning REALLY is that fast.

[1] While the result is the same as creating "a copy", the cloning operation is more similar to creating a hard link than a traditional copy operation.
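As a rough illustration of the clone-then-inspect pattern (a sketch only: shutil.copy2 is a portable stand-in for the clone step, which on APFS you'd get from clonefile(2) or copyfile(3) with COPYFILE_CLONE; all other names here are hypothetical):

```python
import os
import shutil
import tempfile

def inspect_via_clone(path, scratch_dir):
    # 1. Clone the target into a private scratch area. shutil.copy2 is a
    #    portable stand-in; on APFS you'd use clonefile(2) or copyfile(3)
    #    with COPYFILE_CLONE so no data blocks are actually duplicated.
    clone = os.path.join(scratch_dir, "clone-" + os.path.basename(path))
    shutil.copy2(path, clone)
    try:
        # 2. All inspection happens against the clone, so there's no
        #    further interaction with the original file's data path.
        with open(clone, "rb") as f:
            return f.read(8)
    finally:
        os.unlink(clone)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target.bin")
    with open(target, "wb") as f:
        f.write(b"\xcf\xfa\xed\xfe" + b"\x00" * 64)  # Mach-O-style magic + padding
    header = inspect_via_clone(target, d)
    print(header.hex())
```

On APFS the clone is copy-on-write, so even for large files the clone step is nearly instantaneous and the clone's contents are stable no matter what later happens to the original.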

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Yeah, the resource fork is just another named fork. For APFS compression, my understanding is that it's using a com.apple.decmpfs named fork, and we see those methods from the APFS kext in the stack.

Disassembling revisiond and checking the stack crawl, it appears to be working with com.apple.genstore.* xattrs.

We are still trying to identify which file(s) are involved, but so far any logging perturbs the timing enough to hide the problem, and lsof just hangs due to the rwlock; perhaps it's down in .DocumentRevisions-V100.

The underlying xattr access is an implementation detail, though it probably is visible to the ES client.

I will check to see if calls to extended attributes and named forks (directly or indirectly) are reported that way to the ES hooks.

clone the file yourself and then operate on the clone.

That's an interesting thought, especially for the copy-on-write features. I will take it back to the team to consider.
