Combating `kFSEventStreamEventFlagMustScanSubDirs`

Hello!

In our application we use FSEventStream to track file system changes. This is required to keep a cached view of the file system up to date and to manage auxiliary state such as indices. To support tracking symlinks that may point outside the directory of interest, we have set up an FSEventStream that watches the root ('/') of the file system.

The issue is that the application receives a significant number of MustScanSubDirs events with the UserDropped flag set. This makes our application put a non-trivial load on the system, as we need to check whether all the auxiliary state is still valid. Tangentially, we noted that the count of these events is higher on Apple Silicon systems than on older Intel MacBooks.
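For illustration, a minimal sketch of how such events can be told apart when counting them. The flag values mirror those in `<CoreServices/FSEvents.h>` so the snippet builds anywhere; `drop_stats` and `tally_flags` are hypothetical names used only for this example, not our production code.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values mirrored from <CoreServices/FSEvents.h> so the sketch builds
   off-macOS; a real callback would use the kFSEventStreamEventFlag*
   constants from the header instead. */
#define FLAG_MUST_SCAN_SUBDIRS 0x00000001u
#define FLAG_USER_DROPPED      0x00000002u
#define FLAG_KERNEL_DROPPED    0x00000004u

/* Hypothetical counters, one struct per stream. */
typedef struct {
    uint64_t total;          /* every event seen */
    uint64_t must_scan;      /* events carrying MustScanSubDirs */
    uint64_t user_dropped;   /* ... of which UserDropped was also set */
    uint64_t kernel_dropped; /* ... of which KernelDropped was also set */
} drop_stats;

/* Tally one callback batch; `flags` stands in for the eventFlags array
   that the FSEventStream callback receives. */
static void tally_flags(drop_stats *s, const uint32_t *flags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        s->total++;
        if (flags[i] & FLAG_MUST_SCAN_SUBDIRS) {
            s->must_scan++;
            if (flags[i] & FLAG_USER_DROPPED)   s->user_dropped++;
            if (flags[i] & FLAG_KERNEL_DROPPED) s->kernel_dropped++;
        }
    }
}
```

Keeping the user-dropped and kernel-dropped counts separate makes it possible to see which side of the pipeline reports the drop.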

The MustScanSubDirs events inevitably come en masse under high file system load, which is not surprising in itself, but we are looking for a way to reduce their count. Interestingly, in some cases we have observed a steady stream of such events for at least several minutes after our load test, which creates and deletes files, had completed.

What we have tried:

  1. Since the UserDropped flag indicates that our application cannot keep up with the stream, we removed all work from the event processing callback, reducing it to an event counter increment.
  2. For the same reason, we set the priority of the thread that backs the processing run loop to soft real-time. Instruments' System Trace showed that our thread was then always scheduled within several microseconds of being woken by fseventsd and was never preempted.
  3. Used a DispatchQueue instead of the deprecated RunLoop scheduling of the FSEventStream. The expectation was that this would reduce context switches. We experimented with serial and concurrent dispatch queues with high QoS.
  4. Reduced the scope to the directory of interest instead of watching the file system root.
  5. Tried different combinations of kFSEventStreamCreateFlag* flags with different values of the latency parameter (from 0 to 5 seconds).
  6. Recreated the FSEventStream after a MustScanSubDirs event, starting at the event ID directly preceding it (the last non-dropped event). The expectation was that reading from the event log would allow us to bypass the "keep up" requirement.
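To make item 6 concrete, here is a rough sketch of the bookkeeping involved, under the assumption that the resume point is simply the newest event ID seen before the first MustScanSubDirs event in a batch. `resume_state` and `observe_batch` are hypothetical names for this example, and the flag value is mirrored from `<CoreServices/FSEvents.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors kFSEventStreamEventFlagMustScanSubDirs from FSEvents.h. */
#define FLAG_MUST_SCAN_SUBDIRS 0x00000001u

/* Hypothetical per-stream state. */
typedef struct {
    uint64_t last_good_id; /* newest event ID seen before any drop */
    int      need_restart; /* set once a MustScanSubDirs event is seen */
} resume_state;

/* Walk one callback batch in order. `ids` and `flags` stand in for the
   eventIds and eventFlags arrays passed to the FSEventStream callback. */
static void observe_batch(resume_state *st, const uint64_t *ids,
                          const uint32_t *flags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (flags[i] & FLAG_MUST_SCAN_SUBDIRS) {
            st->need_restart = 1; /* stop here: later IDs are not trusted */
            return;
        }
        st->last_good_id = ids[i];
    }
}
```

Once `need_restart` is set, the old stream would be stopped, invalidated and released, and a new one created with `last_good_id` passed as the `sinceWhen` argument of `FSEventStreamCreate`.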

None of this definitively lowered the count of MustScanSubDirs events. What's worse, as mentioned above, even after the load test completed we would occasionally continue to receive such events for a while, without our test generating any load. Judging from the total count of events during that period, no other application was creating a massive load either. So far we have only observed this when watching the file system root.

Approach (6) helps somewhat: recreating the stream can eventually allow our application to proceed without a new MustScanSubDirs event following the last successfully read event. It also has drawbacks, as the application can get "stuck" for longer periods of time (30-60 seconds), recreating the stream at the same event ID and immediately receiving a MustScanSubDirs event again each time. Eventually it gets through that event (hopefully without losing anything in the process) and starts receiving new events again. However, it very soon encounters the next MustScanSubDirs event and can get stuck on it, too.

In the end, we're looking for the answer to the following questions:

  1. How exactly does FSEventStream decide that MustScanSubDirs should be sent? Why do we get UserDropped events even if our application can seemingly keep up with the stream of events?
  2. Are there any other knobs that can help with reducing the count of such events? Is it possible to somehow increase buffers or affect fseventsd in some other way to help our application to "keep up"? Or can we do something on the application side?
  3. Is there any other API that doesn't require root privileges to reliably track file system events? We are aware of kqueue, but it is not a good fit, as it requires setting up a watch for every individual subdirectory and/or file and can easily max out the permitted number of open file descriptors.
  4. What has changed with Apple Silicon to increase the count of such events? Our assumption was that the heterogeneous cores and the changes to scheduling were responsible, but we still receive a lot of these events even with real-time threads.
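To illustrate the kqueue concern in question 3: each watched file or directory needs its own open descriptor, so the budget is bounded by the process's `RLIMIT_NOFILE`. A minimal sketch of that check follows; `watches_fit` is a hypothetical helper, and the number of entries to watch is whatever the tree of interest contains.

```c
#include <stdint.h>
#include <sys/resource.h>

/* Returns 1 if `needed` descriptors fit under the soft RLIMIT_NOFILE,
   0 if they do not, and -1 if the limit cannot be read.
   Hypothetical helper for illustration; it ignores descriptors the
   process already holds for other purposes. */
static int watches_fit(uint64_t needed)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return needed <= (uint64_t)rl.rlim_cur ? 1 : 0;
}
```

For a tree with hundreds of thousands of entries this check fails under typical default soft limits, which is why a per-descriptor approach does not scale for us.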

Thanks!
