So, in my earlier post, I said:
"Misunderstanding what the system was actually doing"
Which now leads to:
To add more context, suppose the user is copying a file from one destination to a Network location.
Am I correct that you've only actually seen this on Network copies? And (possibly) only some Network copies? As a side test, I'd be curious what happens if you tested with an AFP server instead of smb.
I ask because of this:
If it took longer than 5 seconds to complete the file inspection (which is well below the deadline for the auth event), then DesktopServicesHelper just deletes this file created at the destination. This we are observing in macOS Tahoe only.
Strictly speaking, 5s is a somewhat odd amount of time. That may not sound like a long time, but it's an eternity at the time scale the kernel operates at, particularly for any kind of "local" file system operation. You haven't actually said this, but I suspect you've found that the timing here is fairly precise. That is, it works fine with a delay of 4 or 4.5s, and it ALWAYS fails with a delay greater than 5s.
That's because I think what's actually going on here is a timeout in the SMB layer. That is, your ES client is inadvertently stalling the ES event long enough for the SMB driver to time out and unwind the entire operation. That leads to here:
Our concern is with the observed behavior where DesktopServicesHelper appears to proceed with unlinking the source file before the ES authorization event associated with the operation has received a response.
You said "appears", but what actually happened? Did you receive an unexpected ES event, or did the file system just "change"? If the SMB driver is involved, then it can (and does) make changes "outside" of the ES system's normal "view".
In addition:
This we are observing in macOS Tahoe only.
...Most changes to SMB are tied to major system releases (not updates).
Finally, to this point:
Ideally, DesktopServicesHelper should wait till ES event is responded to before going ahead with deletion of the file.
In my experience, this kind of thinking is a trap many ES client developers fall into. As an ES developer, it's critical that you take responsibility for adapting and working within the system implementation, NOT expect the system to adapt to your expectations. The system is complex and actively evolving, a reality you need to anticipate and design around. More to the point:
- Any behavior a given system component implements could always be implemented by some other app/component, so changing the system component just moves the problem somewhere else without actually fixing anything.
- Most of these behavioral issues also represent exploit opportunities an attacker could use to break your client.
As a concrete example of that second point:
If it took longer than 5 seconds to complete the file inspection
File cloning makes it very easy to generate a very large number of files very quickly. I don't know how your file scanning infrastructure is implemented, but I'd be willing to bet that I could generate and open new files fast enough that either:
- I bottleneck your scanner, stalling your ES client long enough that the system terminates you.
- Your scanner starts skipping file scans (to avoid termination), allowing me to sneak files "past" your scanner.
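To put rough numbers on the first scenario, here is a back-of-the-envelope model of how quickly a sustained burst of new files pushes queue wait time past a deadline. It is only a sketch: the function name, the single FIFO scanner, and the rates in the usage note are all hypothetical, not measurements of any real scanner or of the system's actual deadline.

```python
def seconds_until_deadline_missed(arrival_rate, scan_rate, deadline_s):
    """Return how long a sustained burst takes before a newly queued file
    would wait longer than deadline_s, or None if the scanner keeps up.

    Assumed model (hypothetical): files arrive at arrival_rate per second
    and one FIFO scanner completes scan_rate files per second.
    """
    if arrival_rate <= scan_rate:
        return None  # the backlog never grows
    # After t seconds the backlog is (arrival_rate - scan_rate) * t files,
    # and a file arriving at that moment waits backlog / scan_rate seconds.
    growth = arrival_rate - scan_rate
    return deadline_s * scan_rate / growth
```

For example, with 1000 new files per second against a scanner that finishes 200 per second, `seconds_until_deadline_missed(1000, 200, 5)` says a 5 second wait is reached after 1.25 seconds of sustained load. The exact figures don't matter; the point is that any scanner slower than the attacker's creation rate gets there eventually.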
There really isn't any way to avoid that problem as long as your implementation relies on stalling auth requests until scans complete. The approach that does work is a combination of:
- A higher level architecture which informs the user that a file is being blocked pending scanning completion, at which point you can simply deny whatever you want until scanning is complete.
- Using things like background scanning and intelligent heuristics to minimize the need for the "visible" scanner above.
On that second point, note my comment above about file scanning and DesktopServicesHelper.
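As a minimal sketch of the "deny until scanned" approach described above, the policy below answers every authorization request immediately from a verdict cache and hands unknown files to a background scanner. This models the decision logic only and is not Endpoint Security API code; `ScanGate`, `Verdict`, `scan_fn`, and `notify_fn` are hypothetical names, and a real client would also need to invalidate verdicts when a file changes.

```python
from collections import deque
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    MALICIOUS = "malicious"


class ScanGate:
    """Answers auth requests immediately instead of stalling on a scan."""

    def __init__(self, scan_fn, notify_fn):
        self._scan = scan_fn      # hypothetical: file identity -> Verdict
        self._notify = notify_fn  # hypothetical: tells the user a file is held
        self._verdicts = {}       # file identity -> Verdict
        self._pending = deque()   # files awaiting a background scan

    def authorize(self, file_id):
        """Return True to allow, False to deny. Never blocks on scanning."""
        verdict = self._verdicts.get(file_id)
        if verdict is Verdict.CLEAN:
            return True
        if verdict is None and file_id not in self._pending:
            # Unknown file: queue it once and tell the user it is on hold.
            self._pending.append(file_id)
            self._notify(file_id)
        return False  # pending or malicious: deny for now

    def scan_one(self):
        """Background worker step: scan the oldest pending file, if any."""
        if not self._pending:
            return False
        file_id = self._pending.popleft()
        self._verdicts[file_id] = self._scan(file_id)
        return True
```

The design choice is that `authorize` does a constant amount of work, so a flood of new files grows the pending queue rather than the time spent holding an auth event. In a real client, the return value would drive the allow or deny response to the auth message, and `scan_one` would run on a separate worker.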
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware