Unexpected Permission denied error on file sharing volume

I am getting recurring errors running code on macOS 15.1 on arm that is using a volume mounted from a machine running macOS 14.7.1 on x86. The code I am running copies files to the remote volume and deletes files and directories on the remote volume. The files and directories it deletes are typically files it previously had copied.

The problem is that I get permission failures trying to delete certain directories.

After this happens, if I try to list the directory using Terminal on the 15.1 system, I get a strange error:

ls -lA TestVAppearances.app/Contents/runtime-arm/Contents
total 0
ls: fts_read: Permission denied

If I try to list the directory on the target (14.7.1) system, there is no error:

TestVAppearances.app/Contents/runtime-arm/Contents:
total 0

Answered by DTS Engineer in 819722022

I am somewhat surprised that moving the application to a directory that is not and has never been displayed by Finder (before trying to delete the application) does not fix the problem.

There's an odd difference in the listing output that might explain the issue. The values for "runtime-arm" match:

Client:
drwxr-xr-x@ 1 alan  staff  16384 Dec  8 09:37 runtime-arm

Server:
drwxr-xr-x@ 3 alan  staff  102 Dec  8 09:37 runtime-arm

But the values for the contents of "runtime-arm" do NOT match:

Client: 
drwxr-xr-x  1 alan  staff  16384 Dec 12 11:34 Contents

Server:
drwxr-xr-x@ 2 alan  staff  68 Dec 12 11:34 Contents

The "@" symbol above indicates that an extended attribute has been attached, so what does the command "xattr -lx <path>" return for the 4 objects above?

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Is 14.7.1 when this specifically started happening? There was a significant issue with security scoped bookmark resolution that was introduced in 14.7.1 and was only resolved in 14.7.3.

Interesting point. I don't believe I experienced this problem on older systems. I will upgrade to 14.7.4 and see if that makes a difference.

Is the server machine logged into by multiple users?

No.

Does the volume(s) you're seeing this in have the "Ignore Ownership on this volume" checkbox checked?

Yes for the external drive, no for the internal drive. (I tested both cases.)

The broader point here is that IF resetting the server resolves the issue, then the underlying issue is actually with the server's state/configuration, NOT the actual file system data.

I believe we are in violent agreement.

I don't follow how you think this works

The Finder is not involved. The client is a command line program running on the client machine. It is presumably talking to the file sharing daemon on the server machine (via the client OS). My thought was that the file sharing daemon itself might cache some data to improve performance.

The requirement that a directory needs to be empty is something individual file systems enforce, NOT the higher level system.

I can imagine the file sharing daemon enforcing this requirement when using a cache (if it actually does use a cache).

Alas, no change running 14.7.4 on the server.

Quick comment here:

Alas, no change running 14.7.4 on the server.

Since this is affecting a particular device, one thing I would try here is to reset the file sharing system as completely as possible. Basically, delete the existing configuration, turn off SMB, restart the machine, and turn everything on again.

Does the volume(s) you're seeing this in have the "Ignore Ownership on this volume" checkbox checked?

Yes for the external drive, no for the internal drive. (I tested both cases.)

If this is a designated "shared volume", then I would turn that setting off and properly configure the volume's permissions.

I can imagine the file sharing daemon enforcing this requirement when using a cache (if it actually does use a cache).

Ahh, no. This is definitely not a requirement we'd try and enforce outside the kernel. The check itself is fast enough that the cost of trying to maintain a somewhat current cache (just for empty) would be higher than the performance gain. The Finder caches the directory size because that calculation requires retrieving all of the file records, which can be pretty expensive. Interestingly, it doesn't actually use that cache when you delete directories, probably because it's typically faster to check with the VFS system vs. the cache.

So, the last thing here would be what (if anything) is being printed to system log on the server. I can't think of anything that would really explain everything you've described, so it's basically a question of looking/hoping for the right clue from the system log.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

If I turn off File Sharing on the server, turn it back on, and reconnect the client to the remote volume, the client is then able to list the remote directory without getting Permission Denied errors. I believe this result demonstrates the bad state is somewhere in the file sharing pipeline (most likely in the file sharing daemon).

The check itself is fast enough that the cost of trying to maintain a somewhat current cache (just for empty) would be higher than the performance gain.

What I had in mind was caching directory contents to speed up operations like listing the directory. It would then make sense to check the cache before trying to delete a directory.

An explanation is needed for why the files that fail are usually, but not always, the same each time. I suspect the predictability results from the operation sequence generated by rm -rf, which is always the same, and the variability is the result of small timing differences for each run.

When a client performs a sequence of operations on a remote volume, is it guaranteed that the operations are performed on the actual file system in the same order? I'm suspicious of delete operations, which only return a status. Does the entire pipeline block until the file system returns a status for a delete operation?

The other possibility, which now seems remote, is that the client Finder is interfering in some way. I use the Finder to connect to the remote volume and immediately close the Finder window, but that might not be enough. Is there a way to connect to the remote volume from the command line?
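For reference, macOS ships a mount_smbfs command that mounts an SMB share without going through Finder. The following is a minimal sketch that only builds the command line; the server, share, user, and mount point names are hypothetical, and it assumes none of them contain characters that need escaping.

```python
def smb_mount_command(server, share, mount_point, user=None):
    """Build an argv list for mount_smbfs (macOS).

    The target has the form //[user@]server/share. All names passed in
    are placeholders chosen for illustration, not values from this thread.
    """
    prefix = f"{user}@" if user else ""
    target = f"//{prefix}{server}/{share}"
    return ["mount_smbfs", target, mount_point]


if __name__ == "__main__":
    # Hypothetical host and share; the mount point directory must already exist.
    print(" ".join(smb_mount_command("server.local", "Shared", "/tmp/shared", user="alan")))
```

On the client, the resulting list could be passed to subprocess.run once the mount point directory has been created. mount_smbfs prompts for a password unless other options are supplied.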

I just noticed something interesting in these (old) results:

Mac-mini:test alan$ rm -rf VAquaManager.app
rm: VAquaManager.app/Contents/runtime-arm/Contents/Home: Permission denied
rm: VAquaManager.app/Contents/runtime-arm/Contents: Permission denied
rm: VAquaManager.app/Contents/runtime-arm: Permission denied
rm: VAquaManager.app/Contents/runtime-x86/Contents/Home: Permission denied
rm: VAquaManager.app/Contents/runtime-x86/Contents: Permission denied
rm: VAquaManager.app/Contents/runtime-x86: Permission denied
rm: VAquaManager.app/Contents: Permission denied
rm: VAquaManager.app: Permission denied
Mac-mini:test alan$ xattr -lr VAquaManager.app
xattr: [Errno 13] Permission denied: 'VAquaManager.app/Contents/runtime-arm/Contents/Home/conf'

The file VAquaManager.app/Contents/runtime-arm/Contents/Home/conf was not deleted, but rm did not complain about it. It only complained about its parent, which could not be deleted because it was not empty.

That tells me that the system call did not return an error, which makes me think that the file sharing daemon returned an OK status before it was known that the directory was not deleted. I observed this problem earlier when using my own program to delete. It is what made me think that operations were being performed out of order.

Regarding Finder, the client Finder responds quickly when a change is made on the server. I assume it is getting notifications of changes, which implies that the file sharing daemon is also getting notifications of changes. That could explain how caching done by the file sharing daemon could get out of sync with the file system.

It could also be that the file system's implementation of FSEvents is causing the problem. When file sharing is turned off, the FSEvents listeners are unregistered and any associated state in the file system is presumably flushed. The problem is to explain why local FS operations do not fail. Does the file sharing daemon use alternative or private APIs to read directories?

That tells me that the system call did not return an error, which makes me think that the file sharing daemon returned an OK status before it was known that the directory was not deleted. I observed this problem earlier when using my own program to delete. It is what made me think that operations were being performed out of order.

I haven't looked at the details in depth but it's possible something like this does occur. From a performance perspective there is a pretty ENORMOUS performance benefit to SMB "assuming" operations will succeed and proceeding forward with a command stream instead of individually waiting on every single action.

However, keep in mind that this is a problem in the other direction as well. That is, the client cannot delete all the files it knows about and then assume the directory is empty, as someone else can be creating files at the same time it's deleting them.

Regarding Finder, the client Finder responds quickly when a change is made on the server. I assume it is getting notifications of changes,

Sort of. They actually come through the VFS system and FSEvents, using a mechanism I happen to have been describing in this thread.

which implies that the file sharing daemon is also getting notifications of changes.

Well... yes. More specifically, everything the Finder "knows" comes from the daemon.

That could explain how caching done by the file sharing daemon could get out of sync with the file system.

In theory I suppose, except the smb server isn't really maintaining the kind of cache you're describing, as it just wouldn't provide very much performance benefit. It also doesn't explain what makes this particular directory different.

It could also be that the file system's implementation of FSEvents is causing the problem. When file sharing is turned off, the FSEvents listeners are unregistered and any associated state in the file system is presumably flushed.

So, since you're curious: strictly speaking, the SMB server is actually monitoring through kqueue (this is the API FSEvents is built on), not FSEvents. Note that this can be confusing if you're looking at kernel source, because "fsevent" in the kernel means "file system event we're going to send out through kqueue". The "FSEvents" API is built on top of kqueue. In any case, a few important details from the kqueue side:

  • Local file systems don't generate those events; the VFS layer itself does, based on its own interactions with the VFS driver.

  • The API is entirely stateless. ALL it does is say "this happened" and nothing else.
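To make the "stateless" point concrete, here is a minimal sketch of a kqueue vnode watch using Python's select module. The function name and the choice of watched flags are assumptions made for illustration; this is not how the SMB server is implemented. The event that comes back carries only flag bits such as "the directory was written", with no record of which entry changed.

```python
import os
import select

# Vnode event bits this sketch asks for (an illustrative selection).
_NOTE_NAMES = ("KQ_NOTE_WRITE", "KQ_NOTE_DELETE", "KQ_NOTE_RENAME", "KQ_NOTE_ATTRIB")


def wait_for_vnode_event(path, trigger, timeout=2.0):
    """Watch `path` with kqueue, run `trigger()`, and return the flag names seen.

    Returns None on platforms without kqueue (for example Linux).
    """
    if not hasattr(select, "kqueue"):
        return None
    wanted = 0
    for name in _NOTE_NAMES:
        wanted |= getattr(select, name)
    fd = os.open(path, os.O_RDONLY)
    kq = select.kqueue()
    try:
        change = select.kevent(
            fd,
            filter=select.KQ_FILTER_VNODE,
            flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
            fflags=wanted,
        )
        kq.control([change], 0, 0)  # register the watch, collect nothing yet
        trigger()                   # perform the file system change
        names = []
        for event in kq.control(None, 1, timeout):
            # The event says what kind of thing happened, not which entry.
            names.extend(n for n in _NOTE_NAMES if event.fflags & getattr(select, n))
        return names
    finally:
        kq.close()
        os.close(fd)
```

A caller that wants to know what actually changed has to re-read the directory itself after the event arrives.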

The problem is to explain why local FS operations do not fail. Does the file sharing daemon use alternative or private APIs to read directories?

No and, to be honest, there isn't really any private API that it COULD call. The "base" APIs (meaning, the Unix layer syscalls which call into the VFS layer) are all* public and all VFS access has to go through them.

*Technically, there is one "semi-private" VFS API, but it's for copying, not directory access.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

However, keep in mind that this is a problem in the other direction as well. That is, the client cannot delete all the files it knows about and then assume the directory is empty, as someone else can be creating files at the same time it's deleting them.

I don't believe that this particular problem can be blamed on interference from other known parties, such as Finder or MDS. Such interference would only temporarily prevent a directory being deleted. As far as I can tell, the inability to delete the directory persists until File Sharing is restarted.

(I'm not worried about rogue agents that perpetually create and delete files.)

Recall that after restarting File Sharing, the client regained the ability to list the directory tree. Just now, in this state where clients can list the directory tree, I made another attempt to delete the (remaining) tree, and it failed (but not at the same directory).

Mac-mini:test6 alan$ rm -rf V*
rm: VAquaManager.app/Contents: Permission denied
rm: VAquaManager.app: Permission denied
Mac-mini:test6 alan$ ll V*
total 32
drwxr-xr-x  1 alan  staff  16384 Feb 28 12:02 Contents
Mac-mini:test6 alan$ ll -R
total 32
drwxr-xr-x@ 1 alan  staff  16384 Jan 14 19:59 VAquaManager.app

./VAquaManager.app:
total 32
drwxr-xr-x  1 alan  staff  16384 Feb 28 12:02 Contents

./VAquaManager.app/Contents:
total 32
drwxr-xr-x  1 alan  staff  16384 Feb 28 12:02 runtime-arm

./VAquaManager.app/Contents/runtime-arm:
total 0
ls: fts_read: Permission denied

There are problems reading the directory, but apparently no problems reading the directory metadata. (I'm not sure if this is new, as I had not tried all of these commands before now.)

Mac-mini:test6 alan$ ls -l VAquaManager.app/Contents/runtime-arm
total 0
ls: fts_read: Permission denied
Mac-mini:test6 alan$ ls VAquaManager.app/Contents/runtime-arm
ls: fts_read: Permission denied
Mac-mini:test6 alan$ xattr -l VAquaManager.app/Contents/runtime-arm
Mac-mini:test6 alan$ stat VAquaManager.app/Contents/runtime-arm
905970052 275529348 drwxr-xr-x 1 alan staff 0 16384 "Feb 28 12:13:31 2025" "Feb 28 12:02:56 2025" "Feb 28 12:02:56 2025" "Jan 14 19:59:24 2025" 4096 32 0 VAquaManager.app/Contents/runtime-arm
Mac-mini:test6 alan$ ls -ld VAquaManager.app/Contents/runtime-arm
drwxr-xr-x  1 alan  staff  16384 Feb 28 12:02 VAquaManager.app/Contents/runtime-arm
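The split seen above (metadata reads succeed, content reads fail) can be checked programmatically. The following is a minimal sketch with a hypothetical helper name; it tries the two kinds of read separately and reports the errno name for each failure.

```python
import errno
import os


def probe_directory(path):
    """Try a metadata read and a content read on `path` and report each result.

    Each value is "ok" or the symbolic errno name (for example "EACCES").
    """
    report = {"stat": "ok", "list": "ok"}
    try:
        os.stat(path)  # metadata only, like `stat` or `ls -ld`
    except OSError as exc:
        report["stat"] = errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        with os.scandir(path) as entries:  # reads the contents, like `ls`
            for _ in entries:
                pass
    except OSError as exc:
        report["list"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return report
```

For a directory in the state shown in the transcript, one would expect "stat" to be "ok" and "list" to carry an error name, though that has not been run against the failing volume.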

The evidence at this point, along with your explanations, is that the problem is bad state in the file sharing daemon. What kind of state could this be? You say that caching is not useful, but it sure seems like some caching is happening.

From a performance perspective there is a pretty ENORMOUS performance benefit to SMB "assuming" operations will succeed and proceeding forward with a command stream instead of individually waiting on every single action.

If this is known behavior, clients can be written to double check when needed (accepting the performance hit). However, I won't hold my breath waiting for a new command line argument to rm.
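A client that double checks might look like the following minimal sketch. The function name is hypothetical, and the check (looking the entry up again after each removal) is an assumption about what "double checking" would mean here; it costs an extra round trip per entry.

```python
import os


def remove_tree_checked(root):
    """Delete `root` bottom-up and confirm each removal.

    Returns a list of (path, reason) tuples; an empty list means every
    removal both reported success and was confirmed by a second lookup.
    """
    problems = []

    def note_list_error(exc):
        problems.append((exc.filename, f"list failed: {exc.strerror}"))

    def remove(path, func, label):
        try:
            func(path)
        except OSError as exc:
            problems.append((path, f"{label} failed: {exc.strerror}"))
            return
        if os.path.lexists(path):
            # The call returned success but the entry is still visible.
            problems.append((path, f"{label} reported success but entry remains"))

    for dirpath, dirnames, filenames in os.walk(root, topdown=False, onerror=note_list_error):
        for name in filenames:
            remove(os.path.join(dirpath, name), os.unlink, "unlink")
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Symlinks to directories are not walked; remove the link itself.
                remove(path, os.unlink, "unlink")
        remove(dirpath, os.rmdir, "rmdir")
    return problems
```

Unlike rm -rf, this records the case where a removal appears to succeed but the entry is still present, which is the situation described for Contents/Home/conf.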

It remains to be seen if only trying to delete directories that are known to be empty avoids this problem.

It also doesn't explain what makes this particular directory different.

When trying to delete the last item in a directory, returning success too soon could cause failure on the attempt to delete the directory. Perhaps something about these directories increases the odds of a delay in the deletion of the last item that is too large.

Revising my last point, if actual deletion is performed asynchronously, then presumably there is a queue of items waiting to be deleted, and it does not matter which directory elements are still present when the attempt is made to delete the directory. It does not have to be the last one.

There does seem to be a special case. When rm tries to delete the parent of a directory that is known to be non-empty, it gets an error response.

It remains to be seen if only trying to delete directories that are known to be empty avoids this problem.

I tried this and it does not solve the problem.

The directory that mysteriously fails to delete is reported to be empty just before I try to delete it.
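One way to capture that contradiction in a log is to record what the listing showed immediately before the rmdir call, together with the exact error code. This is a minimal sketch with a hypothetical helper name, not the program used in the tests above.

```python
import errno
import os


def rmdir_with_evidence(path):
    """List `path`, then try to remove it, and return both observations.

    Returns (entries_seen, outcome). `entries_seen` is the sorted listing
    taken just before rmdir, or None if the listing itself failed.
    `outcome` is "removed" or the symbolic errno name of the failure.
    """
    try:
        entries_seen = sorted(os.listdir(path))
    except OSError:
        entries_seen = None
    try:
        os.rmdir(path)
        outcome = "removed"
    except OSError as exc:
        outcome = errno.errorcode.get(exc.errno, str(exc.errno))
    return entries_seen, outcome
```

A result such as an empty listing paired with an error name would document the mismatch between what the client sees and what the server does. It also shows whether the failure is reported as a permission error or as a non-empty directory.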

Although deferred deletes may play a role in determining which directory fails to delete, I think an incorrect cache is the root of the problem.

Some surprising results:

  • (client) I copied runtime-x86 to the server [R]
  • (server) I duplicated R [R2]
  • (client) I tried to delete R2 — it fails (Contents/Home/conf not deleted)
  • (server) restarted File Sharing
  • (server) I duplicated R [R3]
  • (client) no errors listing R, R2, R3
  • (client) I tried to delete R3 — success
  • (server) I duplicated R [R4]
  • (client) I tried to delete R4 — success
  • (client) I tried to delete R and R2 — success

In the above test, the remote volume is APFS. If I try using an HFS+ volume, the client can delete R2 with no problem. I figured the difference was APFS using copy on write. However, if I rename (mv) the copied volume on the server, the client still can delete it on the HFS+ volume (and still fails to delete it on APFS).
