What’s the recommended way to recursively walk through a directory tree using File Coordination? From what I understand, coordinating a read of a directory only performs a “shallow” lock; this would mean that I’d need to implement the recursive walk myself rather than use FileManager.enumerator(at:includingPropertiesForKeys:options:errorHandler:) plus a single NSFileCoordinator.coordinate(with:queue:byAccessor:) call.
I’m trying to extract information from all files of a particular type, so I think using NSFileCoordinator.ReadingOptions.immediatelyAvailableMetadataOnly on each file before acquiring a full read lock on it (if it’s the right file type) would make sense. Am I on the right track?
I’m trying to extract information from all files of a particular type, so I think using NSFileCoordinator.ReadingOptions.immediatelyAvailableMetadataOnly on each file before acquiring a full read lock on it (if it’s the right file type) would make sense. Am I on the right track?
So, the first question here is "what are you actually trying to do"?
The problem here is that, by design, file systems are basically shared databases with minimal locking, which means that file coordination for JUST metadata isn't necessarily all that useful. As a concrete example, take a basic task like calculating the size of a directory. You do that by summing up the size of every file, but it's always possible for a file to change after you've looked at it, at which point the size of the directory is no longer "right". Now, adding file coordination into that process may (depending on the configuration you pick) change the "answer" you get, but that's only because it delayed writes until later than they would otherwise have happened. The writes are still going to happen, at which point you're just getting a different "wrong" than you would have gotten without file coordination.
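For example, the naive version of that size calculation is nothing more than a snapshot. A minimal, uncoordinated sketch (the function name, property keys, and lack of error handling are purely illustrative):

```swift
import Foundation

// A minimal sketch: sum the sizes of every regular file under `root`.
// The total is only a snapshot; any file can change after we've read its size.
func approximateSize(of root: URL) -> Int {
    let keys: [URLResourceKey] = [.isRegularFileKey, .fileSizeKey]
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: keys
    ) else { return 0 }

    var total = 0
    for case let url as URL in enumerator {
        guard let values = try? url.resourceValues(forKeys: Set(keys)),
              values.isRegularFile == true else { continue }
        total += values.fileSize ?? 0
    }
    return total
}
```

Wrapping each size read (or the whole walk) in file coordination doesn't change the fact that the number can be stale by the time you return it.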
Looking at "NSFileCoordinator.ReadingOptions.immediatelyAvailableMetadataOnly" in particular, a careful read of its documentation has a good "hint" about what it's "for". It starts by saying:
"Specifying this option grants the coordinated read immediately"
...but the key point there is actually what follows:
"...(barring any conflicts with other readers, writers, or file presenters on the same system)"
In other words, what you're actually saying is "let me read the metadata but let anyone who happens to be doing something finish first". Whether or not that's what you want... depends on what you're trying to do. Looking at the two "extremes":
- If you're working inside a directory you "own" and/or where the contents are modified with some consistent "pattern", then it can help you get more "coherent" results. For example, if two files are always modified in the same coordinated write, then immediatelyAvailableMetadataOnly means you're less likely* to see the "intermediate" state where only one of them has been modified.
- If you're scanning the "public" file system (like the user’s home or Documents directory on macOS), then immediatelyAvailableMetadataOnly has a lot less value. You don't really have any control over what's going on, so it's likely that at least part of the waiting will be for things you don't care about. More to the point, the "public" file system is also much more dynamic, so the longer it takes to scan, the more likely it is that whatever you've scanned has ALREADY changed.
*One lesson I've been taught over and over again is to avoid relying on the file system working EXACTLY the way you expect. Good file system code should be flexible enough that an unexpected situation doesn't break the app.
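To make that concrete, here's roughly what the "check the metadata first, then read" pattern from the original question looks like on a single file. This is just a sketch; the function name, the UTType check, and the resource keys are placeholder examples:

```swift
import Foundation
import UniformTypeIdentifiers

// A minimal sketch: take a metadata-only coordinated read first, then escalate
// to a full coordinated read only if the file turns out to be interesting.
func extractInfoIfRelevant(from fileURL: URL) {
    let coordinator = NSFileCoordinator(filePresenter: nil)
    var isInteresting = false

    coordinator.coordinate(readingItemAt: fileURL,
                           options: .immediatelyAvailableMetadataOnly,
                           error: nil) { url in
        // "Let anyone who happens to be doing something finish first",
        // then look only at the metadata.
        let values = try? url.resourceValues(forKeys: [.contentTypeKey])
        isInteresting = values?.contentType?.conforms(to: .plainText) ?? false
    }

    guard isInteresting else { return }

    // Full coordinated read of the file's contents.
    coordinator.coordinate(readingItemAt: fileURL, error: nil) { url in
        let data = try? Data(contentsOf: url)
        // ... extract whatever information you need from `data` ...
        _ = data
    }
}
```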
Covering a few other details:
What’s the recommended way to recursively walk through a directory tree using File Coordination?
The race conditions inherent to the file system are what lead to the file presenter "side" of the file coordination system. That is, you can't scan a directory "fast enough" that you’re GUARANTEED to have an accurate "view" of the file system state. However, you CAN tell the system "this is what I'm interested in" and let the system tell you when something changes.
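As a rough illustration of that "presenter" side (just a sketch; the class name and the callbacks you actually need are up to you), a file presenter for a directory looks something like this:

```swift
import Foundation

// A minimal sketch: register interest in one directory and let the system
// call back when something inside it changes.
final class DirectoryWatcher: NSObject, NSFilePresenter {
    let presentedItemURL: URL?
    let presentedItemOperationQueue = OperationQueue()

    init(directory: URL) {
        self.presentedItemURL = directory
        super.init()
        NSFileCoordinator.addFilePresenter(self)
    }

    deinit {
        NSFileCoordinator.removeFilePresenter(self)
    }

    // Something inside the presented directory changed.
    func presentedSubitemDidChange(at url: URL) {
        // Re-scan or update your cached state for `url` here.
    }

    // The directory itself changed (e.g. its attributes).
    func presentedItemDidChange() {
    }
}
```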
From what I understand, coordinating a read of a directory only performs a “shallow” lock; this would mean that I’d need to implement the recursive walk myself...
I'll warn you now, the words "implement the recursive walk myself" are a bit of a red flag. It is (probably) possible to read the contents of a directory faster than NSDirectoryEnumerator. It is NOT easy to do* and can't really be done with any of our high-level APIs. If you want to "recursively coordinate", I would probably do this:
*Case in point, what contentsOfDirectory(at:includingPropertiesForKeys:...) actually does is use NSDirectoryEnumerator to "fill up" an array, then return that array.
1. Use coordinate(with:queue:byAccessor:) to issue an asynchronous coordinated read against a directory.
2. Inside that block, use enumerator(at:includingPropertiesForKeys:...) with "skipsSubdirectoryDescendants" to enumerate the directory’s immediate contents (a "shallow" enumeration).
3. For every directory you encounter, issue another (asynchronous) call to "#1".
This will walk through the entire hierarchy without a lot of extra overhead and without issuing "nested" coordinated reads across the whole hierarchy. One note on the concurrency side of this: to avoid a thread explosion, you'll want to use the same OperationQueue for #1; however, if you're dealing with very large hierarchies, increasing maxConcurrentOperationCount above "1" might have some** benefit. A rough sketch of this pattern follows the note below.
**Unfortunately, being more specific than that is very difficult, as it depends on the file system and the performance of the target device. If performance is critical, this is something you'd need to experiment with and "tune", possibly on a device/target basis. I mentioned this for "completeness", but I would just use "1" unless this is a critical part of your product.
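Putting those three steps together, a sketch might look something like the following. The type name, the queue setup, and the per-file callback are all placeholders, not a drop-in implementation:

```swift
import Foundation

// A rough sketch of the pattern above: one shared queue, one shallow
// coordinated read per directory, and a new asynchronous coordinated read
// for every subdirectory found along the way.
final class CoordinatedWalker {
    private let queue: OperationQueue = {
        let queue = OperationQueue()
        queue.maxConcurrentOperationCount = 1   // raise only if profiling shows a win
        return queue
    }()
    private let coordinator = NSFileCoordinator(filePresenter: nil)

    func walk(_ directory: URL, visitFile: @escaping (URL) -> Void) {
        // #1: an asynchronous coordinated read of this one directory.
        let intent = NSFileAccessIntent.readingIntent(with: directory)
        coordinator.coordinate(with: [intent], queue: queue) { [weak self] error in
            guard error == nil, let self = self else { return }

            // #2: a *shallow* enumeration of the directory's immediate contents.
            guard let enumerator = FileManager.default.enumerator(
                at: intent.url,
                includingPropertiesForKeys: [.isDirectoryKey],
                options: [.skipsSubdirectoryDescendants]
            ) else { return }

            for case let url as URL in enumerator {
                let values = try? url.resourceValues(forKeys: [.isDirectoryKey])
                if values?.isDirectory == true {
                    // #3: recurse with another asynchronous coordinated read.
                    self.walk(url, visitFile: visitFile)
                } else {
                    visitFile(url)
                }
            }
        }
    }
}
```

The point is that each directory gets its own short-lived coordinated read on the shared queue, rather than one coordinated read being held across the entire walk.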
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware