Hello!
In our application we use `FSEventStream` to track file system changes; this is required to keep an up-to-date cached view of the file system and to manage auxiliary state such as indices. To support tracking symlinks that may point outside of the directory of interest, we have set up an `FSEventStream` that watches the root (`/`) of the file system.
Now, the issue is that the application receives a significant number of `MustScanSubDirs` events with the `UserDropped` flag raised, which creates a non-trivial load on the system from our application, as we need to check whether all the auxiliary state is still adequate. Tangentially, we noted that the count of these events is higher on Apple Silicon systems than on older-generation Intel MacBooks.
The `MustScanSubDirs` events inevitably arrive en masse during high file system load, which is not surprising by itself, but we are looking for a way to reduce their count. Interestingly, in some cases we have observed a steady stream of such events even after our load test (which creates/deletes files) had completed, for at least several minutes afterwards.
What we have tried:
1. Given that the events carry the `UserDropped` flag, which indicates that our application cannot keep up with the stream, we completely removed any work from the event processing callback, reducing it to an event counter increment.
2. For the same reason as above, set the priority of the thread that backs the processing run loop to soft real-time. Instruments' System Trace showed that our thread is now always scheduled within several microseconds after being woken by `fseventsd` and is never preempted.
3. Use a `DispatchQueue` instead of the deprecated `RunLoop` combination with `FSEventStream`. The expectation was that this would reduce context switches. We experimented with serial and concurrent dispatch queues at high QoS.
4. Reduce the scope down to the directory of interest instead of watching the file system root.
5. Different combinations of `kFSEventStreamCreateFlag*` flags with different values of the `latency` parameter (from `0` to `5` seconds).
6. Recreate the `FSEventStream` after a `MustScanSubDirs` event, starting at the event ID directly preceding it (the last non-dropped event). The expectation was that reading from the event log would allow us to bypass the "keep up" requirement.
None of that helped to definitively lower the count of `MustScanSubDirs` events. What's worse, as mentioned above, even after the load test completed we would occasionally continue to receive such events for a while, without our test generating any load; and judging from the total count of events during that period, no other application was creating a massive load either. So far we have only observed this when watching the file system root.
Approach (6) somewhat helps with the issue: recreating the stream can allow our application to proceed further at some point, without a new `MustScanSubDirs` event following the last successfully read event. But it also has drawbacks, as the application can get "stuck" for longer periods of time (30-60 seconds), recreating the stream at the same event ID and immediately receiving a `MustScanSubDirs` event again. Eventually it gets past that event (hopefully without losing anything in the process) and starts receiving new events again, although it will very soon encounter the next `MustScanSubDirs` event and can get stuck on it, too.
In the end, we're looking for answers to the following questions:
- How exactly does `FSEventStream` decide that `MustScanSubDirs` should be sent? Why do we get `UserDropped` events even though our application seemingly can keep up with the stream of events?
- Are there any other knobs that can help reduce the count of such events? Is it possible to somehow increase buffers or affect `fseventsd` in some other way to help our application "keep up"? Or can we do something on the application side?
- Is there any other API that doesn't require root privileges to reliably track file system events? We are aware of kqueue, but it doesn't suit us well, as it requires setting up a watch for every individual subdirectory and/or file and can easily exhaust the permitted number of open file descriptors.
- What has changed with Apple Silicon to increase the count of such events? Our assumption was the heterogeneous cores and changes to the scheduling procedure, but we still receive many events even with real-time threads.
Thanks!