NSFileManager getRelationship:ofDirectoryAtURL:toItemAtURL:error: returning NSURLRelationshipSame for Different Directories

I'll try to ask a question that makes sense this time :). I'm using the following method on NSFileManager:

    - (BOOL)getRelationship:(NSURLRelationship *)outRelationship ofDirectoryAtURL:(NSURL *)directoryURL toItemAtURL:(NSURL *)otherURL error:(NSError **)error;

Sets 'outRelationship' to NSURLRelationshipContains if the directory at 'directoryURL' directly or indirectly contains the item at 'otherURL', meaning 'directoryURL' is found while enumerating parent URLs starting from 'otherURL'. Sets 'outRelationship' to NSURLRelationshipSame if 'directoryURL' and 'otherURL' locate the same item, meaning they have the same NSURLFileResourceIdentifierKey value. If 'directoryURL' is not a directory, or does not contain 'otherURL' and they do not locate the same file, then sets 'outRelationship' to NSURLRelationshipOther. If an error occurs, returns NO and sets 'error'.

So this method falsely returns NSURLRelationshipSame for different directories. One is empty, one is not. Really weird behavior. Two file path URLs pointing to two different file paths have the same NSURLFileResourceIdentifierKey? Could it be related to https://developer.apple.com/forums/thread/813641?

One URL in the check lived at the same file path as the other URL at one time (but no longer does). No symlinks or anything going on. Just plain directory URLs.

And YES, calling -removeCachedResourceValueForKey: with NSURLFileResourceIdentifierKey causes the proper result of NSURLRelationshipOther to be returned. And I'm doing the check on a background queue.

Answered by DTS Engineer in 878053022

So this method falsely returns NSURLRelationshipSame for different directories. One is empty, one is not. Really weird behavior.

Do you know where/what the directories "were"? The problem here is that there's a pretty wide variation between the "basic" case of "a bunch of files and directories sitting on a standard volume" and "the range of ALL possible edge cases".

Two file path URLs pointing to two different file paths have the same NSURLFileResourceIdentifierKey?

Yes, this is possible. As one example, the data volume basically ends up in the hierarchy "twice" meaning that, for example, the path "/System/Volumes/Data/Users/" and "/Users/" are in fact the same directory. And, yes, getRelationship returns NSURLRelationshipSame for those directories.
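The firmlinked data volume is macOS-specific, but the underlying idea (two different paths, one object) can be shown with plain POSIX calls. The sketch below is in C (which also compiles as Objective-C) and assumes that the (st_dev, st_ino) pair from stat() is a fair stand-in for NSURLFileResourceIdentifierKey; that is an analogy, not how NSURL computes the key. It uses a hard link so it runs on any POSIX system, and `same_object`, `touch`, and `demo_two_names` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both paths resolve to the same file system object, 0 if they
   do not, and -1 if either path cannot be examined. The (st_dev, st_ino)
   pair is used here only as a stand-in for NSURLFileResourceIdentifierKey. */
static int same_object(const char *a, const char *b) {
    struct stat sa, sb;
    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Creates an empty file at path; returns 0 on success. */
static int touch(const char *path) {
    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) return -1;
    return close(fd);
}

/* Hypothetical demo: in a fresh temporary directory, creates "file", a hard
   link "alias" to it, and an unrelated "other", then compares them. */
static int demo_two_names(int *alias_same, int *other_same) {
    char base[64] = "/tmp/sameobj.XXXXXX";
    char file[256], alias[256], other[256];

    if (mkdtemp(base) == NULL) return -1;
    snprintf(file, sizeof file, "%s/file", base);
    snprintf(alias, sizeof alias, "%s/alias", base);
    snprintf(other, sizeof other, "%s/other", base);

    if (touch(file) != 0 || touch(other) != 0) return -1;
    if (link(file, alias) != 0) return -1;   /* a second name for one object */

    *alias_same = same_object(file, alias);  /* two paths, one object: 1 */
    *other_same = same_object(file, other);  /* two paths, two objects: 0 */

    unlink(alias);
    unlink(file);
    unlink(other);
    rmdir(base);
    return 0;
}
```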

Now, this:

One is empty, one is not.

...is definitely "weirder". Ignoring the cache issue below, I don't think you could do it within a standard volume, but you might be able to do it using multiple volumes, particularly with duplicated disk images and/or network file systems.

However, in this case:

Could it be related to https://developer.apple.com/forums/thread/813641?

One URL in the check lived at the same file path as the other URL at one time (but no longer does). No symlinks or anything going on. Just plain directory URLs.

...yes, it's a/the cache. The proof of that is this:

And YES calling -removeCachedResourceValueForKey: with NSURLFileResourceIdentifierKey causes the proper result of NSURLRelationshipOther to be returned. And I'm doing the check on a background queue.

...since any issue that is fixed by clearing the cache is, by definition, "caused" by the cache. That's a good excuse to revisit this thread here, which I'm afraid I missed:

Could it be related to https://developer.apple.com/forums/thread/813641 ?

The core of the issue here is the inherent tension between a few facts:

  1. The entire file system is essentially a lock-free database being simultaneously modified by an unconstrained number of processes/threads.

  2. Your ability to monitor file system state is relatively limited. Basically, you can either ask for the current state and receive an answer with unknown latency or ask the system to update you as things change, at which point you'll receive a stream of events... with unknown latency.

  3. Accessing the file system is sufficiently slow that it's worth avoiding/minimizing that access.

Jumping back to here, there's actually a VERY straightforward way to do this:

Two file path URLs pointing to two different file paths have the same NSURLFileResourceIdentifierKey?

That is, have two processes where:

Process 1 calls "getRelationship".

Process 2 manipulates the file system such that the following sequence occurs:

  1. Process 1 retrieves the metadata of the source object.
  2. Process 2 deletes the existing directory at the target location.
  3. Process 2 moves the source object to the target location.
  4. Process 2 deletes the contents of the target object.
  5. Process 1 retrieves the metadata of the target object.

...and process 1 now compares #1 and #5, returning NSURLRelationshipSame because they are in fact the same. Now, you might say this seems far-fetched/impossible to time; however, I never said process 2 was running on the same system. With SMB over a slow connection, I suspect you could replicate the scenario above pretty easily.
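Nothing here requires a second machine to understand. The sketch below replays steps 1 to 5 in a single process with POSIX calls, in C (which also compiles as Objective-C). `replay_sequence` is a hypothetical name, and the (st_dev, st_ino) pair again stands in for the resource identifier; that is an assumption, not NSURL's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical single-process replay of steps 1-5. Returns 1 if the step-1
   and step-5 snapshots identify the same object, 0 if they do not, and -1
   if setup fails. */
static int replay_sequence(void) {
    char base[64] = "/tmp/replay.XXXXXX";
    char source[256], target[256], seed_file[256], moved_file[256];
    struct stat step1, step5;
    int fd, same;

    if (mkdtemp(base) == NULL) return -1;
    snprintf(source, sizeof source, "%s/source", base);
    snprintf(target, sizeof target, "%s/target", base);
    snprintf(seed_file, sizeof seed_file, "%s/source/item", base);
    snprintf(moved_file, sizeof moved_file, "%s/target/item", base);

    /* Setup: a non-empty "source" directory and an empty "target" directory. */
    if (mkdir(source, 0700) != 0 || mkdir(target, 0700) != 0) return -1;
    if ((fd = open(seed_file, O_CREAT | O_WRONLY, 0600)) < 0) return -1;
    close(fd);

    if (stat(source, &step1) != 0) return -1;   /* 1. P1 reads source metadata  */
    if (rmdir(target) != 0) return -1;          /* 2. P2 deletes the target     */
    if (rename(source, target) != 0) return -1; /* 3. P2 moves source to target */
    if (unlink(moved_file) != 0) return -1;     /* 4. P2 empties the moved dir  */
    if (stat(target, &step5) != 0) return -1;   /* 5. P1 reads target metadata  */

    /* Different paths, but genuinely the same object. */
    same = step1.st_dev == step5.st_dev && step1.st_ino == step5.st_ino;

    rmdir(target);
    rmdir(base);
    return same;
}
```

Note that the step-1 snapshot describes a non-empty directory and the step-5 snapshot an empty one, yet both identify one object, which matches the "one is empty, one is not" symptom from the original question.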

The point here is that the system’s caching behavior is simply one dynamic among many. That is, caching increases the probability of strange behavior (like the one above) because it increases the time gap between #1 and #5, and the wider the gap between actions, the more likely it is that "something" has changed. However, you can't actually shrink the gap to the point where it goes away.

One solution to these issues is for the interested processes to communicate with each other to coordinate their actions (for example, by using "File Coordination"). However, that requires all of the processes involved to participate in that mechanism, which they definitely don't today.

Realistically, the reason this all isn't a total disaster is that most of the activity here is either:

  • Directly controlled/managed by the user, who is both being careful about what they do and moving "slow" enough that collisions don't happen.

OR

  • Happening in "private" parts of the file system where only one "entity" is manipulating the data (for example, an app’s data container).

All of which leads to the big question... what are you actually trying to do?

If this is a one-off event that you're concerned/confused about, then the answer is basically, yes, the file system can be way weirder than it looks, and sometimes that means calling removeCachedResourceValueForKey "just in case".

However, if this is something that is a recurring problem for your app, then it might be worth stepping back and rethinking your approach to minimize the possibility and consequences of these kinds of "oddities".

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the reply! I actually stumbled across this while reworking things in my app to account for NSURL caching behavior I mentioned in the other thread.

What I was doing not too long ago was using an NSCache on top of NSURL for resource values. At some point, when responding to metadata changes, I was calling -removeAllCachedResourceValues on a background thread to get refreshed data, and I discovered that -removeAllCachedResourceValues could crash if another thread was reading at the same time. I guess at some point in my frustration I just moved some stuff around to stop the crashes (and it did stop them).

I had either forgotten or just never realized that NSURL caches only for a run loop turn (or maybe just sometimes? More on that in a second). I guess this is cool in the middle of a dragging session, but apparently at some point I must've just assumed that NSURL was caching for a more meaningful period of time (from the perspective of my app, anyway), because if I didn't call -removeAllCachedResourceValues I'd sometimes get stale values. So why cache on top of a cache? I chucked my NSCache, which I never really loved, but apparently that was a mistake. My bad.

I guess my wish would be for NSURL to either cache forever until values are explicitly cleared or not cache at all, because if we're caching on a cache that may not be a cache but sometimes seems like a cache, it's hard to cache. Maybe I'm just being selfish though.

But back to the collision. So I'm reworking all this (not using NSCache this time). As I was rewriting my caching code, I commented out a few things here and there to check some error-handling code paths that seem extremely unlikely to really occur, and I stumbled across this collision. But there are many run loop turns in between these events, so I don't understand why the cached values are living so long in this particular case. Maybe something like cancelPreviousPerformRequestsWithTarget: causes cached values to live longer, but I'm not supposed to worry about the implementation details.

I can easily reproduce this with NSFileManager using the following steps:

  1. -trashItemAtURL:resultingItemURL:error: - grab the resultingItemURL.
  2. Put an empty new folder in the exact same location you just trashed.
  3. Compare the NSURLFileResourceIdentifierKey of the URL you got from resultingItemURL with the new folder at its old location, and they match - until you programmatically remove the cached value.

I guess my wish would be for NSURL to either cache forever until explicitly cleared values or not cache at all because if we're caching on a cache that may not be a cache but sometimes it seems like a cache it's hard to cache.

So, the first issue here is that "not caching at all" isn't really an option. Most of the data you retrieve from NSURL comes from the same API (getattrlist), and, much of the time, that data is ALWAYS retrieved in every call. getattrlist() is a "bulk" fetch API (it's designed to return a bunch of data at once), and the vast majority of the performance cost here is the cost of the syscall itself, NOT the retrieval of the data or the copy out of the kernel. Putting that in concrete terms, let’s say you ask for "all" of the times for a file (ATTR_CMN_CRTIME, ATTR_CMN_MODTIME, ATTR_CMN_CHGTIME, ATTR_CMN_ACCTIME, ATTR_CMN_BKUPTIME):

  • Basically "every" file system is going to end up storing all of those values inside some kind of file system-specific structure, so the only "cost" here is the act of finding that record, not the individual time.

  • All the values involved are so small that the transit cost "out" of the kernel is basically fixed.

...so asking for one of them costs exactly the same as asking for all 5.

Putting that another way, there's a fundamental disconnect between how file system calls work and how NSURL works. File system APIs are built as "retrieval APIs" which return as much data as possible in a single call (stat being an obvious example). All of the data returned by each system call represents the exact state of that object at a particular "instant" in time. It may not be right "now" (the file system can be constantly changing) but it WAS right at some moment in time.

On the other hand, NSURL (and lots of other API layers) want to let you retrieve individual elements separately, but that means the API then needs to decide whether to:

  1. Return the data it retrieved in an earlier call, which is both faster and provides a more "coherent" picture of the file system state, since the data being retrieved is coming from the same "fetch".

  2. Fetch new data, which is more accurate but creates inconsistent results between the "current" state and the "previous" state.

ACTUALLY doing #2 for every call is a terrible idea for both performance and coherence issues, but that means we're basically stuck trying to sort out when to reset, not if we're going to cache.
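Option 1 with an explicit escape hatch is easy to model. The sketch below is a hypothetical C analogue (C also compiles as Objective-C), not NSURL's implementation: a single stat() call fills a coherent snapshot of every field, later reads are served from that snapshot, and only an explicit invalidate, playing roughly the role of removeCachedResourceValueForKey:, triggers a refetch. `meta_cache` and its functions are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical metadata cache, loosely modeled on NSURL's resource-value
   cache. One stat() call fills every field at once, so all reads come
   from one coherent snapshot; nothing is refetched until the caller
   explicitly invalidates. */
typedef struct {
    const char *path;
    struct stat snapshot;
    int valid;
} meta_cache;

static void meta_cache_init(meta_cache *c, const char *path) {
    assert(c != NULL && path != NULL);
    c->path = path;
    c->valid = 0;
}

static void meta_cache_invalidate(meta_cache *c) {
    c->valid = 0;
}

/* Returns the snapshot, fetching it only if none is cached; NULL on error. */
static const struct stat *meta_cache_get(meta_cache *c) {
    if (!c->valid) {
        if (stat(c->path, &c->snapshot) != 0) return NULL;
        c->valid = 1;
    }
    return &c->snapshot;
}

/* Hypothetical demo: records the file size seen (0) after a 3-byte write,
   (1) after 2 more bytes without invalidating, (2) after invalidating. */
static int demo_cache(long long sizes[3]) {
    char path[] = "/tmp/metacache.XXXXXX";
    const struct stat *st;
    meta_cache c;
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0) return -1;
    meta_cache_init(&c, path);

    if (write(fd, "abc", 3) != 3 || (st = meta_cache_get(&c)) == NULL) goto done;
    sizes[0] = (long long)st->st_size;   /* fresh fetch: 3 */

    if (write(fd, "de", 2) != 2 || (st = meta_cache_get(&c)) == NULL) goto done;
    sizes[1] = (long long)st->st_size;   /* cached snapshot: still 3 */

    meta_cache_invalidate(&c);
    if ((st = meta_cache_get(&c)) == NULL) goto done;
    sizes[2] = (long long)st->st_size;   /* refetched after invalidate: 5 */
    ok = 0;

done:
    close(fd);
    unlink(path);
    return ok;
}
```

The trade-off is the one described above: the snapshot is internally consistent and cheap to read, but it stays stale until something decides to invalidate it.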

As a side note here, an API like URL.resourceValues(forKeys:) gets you much closer to how the file system itself works, since you're retrieving a fixed dictionary from a particular instant, NOT an ambiguous data smear.

I can easily reproduce this with NSFileManager using the following steps:

  1. -trashItemAtURL:resultingItemURL:error: - grab the resultingItemURL.
  2. Put an empty new folder in the exact same location you just trashed.
  3. Compare the NSURLFileResourceIdentifierKey of the URLs you got from resultingItemURL with the new folder at its old location and they match - until you programmatically remove the cached value.

Huh. That's really weird. How did you construct those URLs? Are you building them from string paths or getting them from the system (like through an open panel or by enumerating the directory)? What does "isFileReferenceURL" return, and what happens if you do the same check but call "fileReferenceURL" on both URLs first?

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

ACTUALLY doing #2 for every call is a terrible idea for both performance and coherence issues, but that means we're basically stuck trying to sort out when to reset, not if we're going to cache.

I agree. IMO the problem is not that NSURL is caching; the problem is the way it caches. The way it caches forces me to cache on top of it. The documentation claims it only caches for one run loop turn, but as previously mentioned, that is not always the case, and certain values tend to get 'stuck.'

Building a cache on top of NSURL resource values, which may or may not be stale, can cause all sorts of weird behavior if you don't call -removeAllCachedResourceValues so I can cache on the true value... but NSURL, I assume, makes its cache thread-safe, so -removeAllCachedResourceValues probably isn't so cheap. Doesn't that mean the URL cache is costing me performance by requiring me to clear it to get to the true values I really want to cache?

Huh. That's really weird. How did you construct those URLs?

Originally the URL came through NSFileManager enumeration, or maybe -createDirectoryAtURL:; I can't remember. I'll have to try it out later when I have a little more time.

But I just stumbled across some really weird behavior when passing a file type from Finder to my app. It could be unrelated, but I wouldn't be completely surprised if it was related to this topic. I might file a bug on that later. It would be great if this forum supported private messages; I'm not sure I'm ready to provide more details in the open yet.

The documentation claims it only caches for 1 run loop turn, but as previously mentioned, that is not always the case, and certain values tend to get 'stuck.'

FYI, I think there are actually two different issues at work here:

  1. The run loop itself doesn't actually "turn" at a predictable rate. Depending on how your app is architected and the overall app state, it's entirely possible for an app to go seconds or even minutes without the main thread ever running.

  2. The documentation says that values are "automatically removed after each pass through the run loop", but that's not quite accurate. NSURL is tracking the main loop activity through a runloop observer, but it doesn't actually flush the cache until the first time "something" tries to access that URL from the main thread. If nothing on the main thread accesses that URL, then it could theoretically return the old values "forever".

...with #2 obviously being the most significant issue.

Building a cache on top of NSURL resource values, which may or may not be stale, can cause all sorts of weird behavior if you don't call -removeAllCachedResourceValues so I can cache on the true value... but NSURL, I assume, makes its cache thread-safe, so -removeAllCachedResourceValues probably isn't so cheap. Doesn't that mean the URL cache is costing me performance by requiring me to clear it to get to the true values I really want to cache?

Hypothetically, yes, but if you ACTUALLY run into performance problems, then I think you have a bigger issue. In terms of the lock itself, there's an os_unfair_lock that's used to protect access to the data, which means the cost of uncontested access is fairly minimal. The problem here is that having contention means that you have multiple threads attempting to manage/manipulate the same file at the same time... which is a bad idea regardless of performance.

That leads back to here:

Building a cache on top of NSURL resource values

The real question here is basically "what are you trying to do"? The problem here is that NSURL is basically a low-level primitive, not really "the" solution for file tracking. For example:

  1. Document-based apps are better off using a class like NSDocument, which manages things like file coordination and safe saves.

  2. Longer-term file tracking is better done with bookmarks, since they're harder to break and allow an app to restore access to the target as needed.

  3. Apps that manipulate files "in bulk" often end up using lower-level APIs to improve performance.

One final note here is that it's not difficult to get an NSURL object that doesn't have the automatic flushing behavior. All you need to do is take the NSURL you're starting with, pass it into CFURLCreateFilePathURL() (or whichever CFURL creation function fits what you're starting with) to create a CFURLRef, then cast that CFURLRef back to NSURL. Toll-free bridging means that a CFURLRef can be used exactly like an NSURL, so the only difference is that it won't automatically flush its own cache.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

The run loop itself doesn't actually "turn" at a predictable rate. Depending on how your app is architected and the overall app state, it's entirely possible for an app to go seconds or even minutes without the main thread ever running.

Doesn't appear to be what's going on in this case. I made this dumb little test which can easily reproduce the issue (sorry, can't get code to format well on these forums).

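#import <Foundation/Foundation.h>

@interface MachoManURLTester : NSObject
+ (MachoManURLTester *)sharedTester;
- (void)startURLTrashDance;
@end

@implementation MachoManURLTester

+(MachoManURLTester*)sharedTester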
{
	static MachoManURLTester *sharedTester = nil;
	
	static dispatch_once_t token;
	dispatch_once(&token,^{
		sharedTester = [[self alloc]init];
	});
	return sharedTester;
}

-(void)startURLTrashDance
{
	NSAssert(NSThread.currentThread.isMainThread, @"Main thread only.");
	
	NSFileManager *fm = [NSFileManager defaultManager];
	NSURL *wrapperDir = [[NSURL fileURLWithPath:NSTemporaryDirectory() isDirectory:YES] URLByAppendingPathComponent:NSUUID.UUID.UUIDString isDirectory:YES];
	if (![fm createDirectoryAtURL:wrapperDir withIntermediateDirectories:YES attributes:nil error:nil])
		{
			NSLog(@"Test failed");
			return;
		}
	
	//[[NSWorkspace sharedWorkspace] activateFileViewerSelectingURLs:@[wrapperDir]];
	
	NSURL *untitledFour = [wrapperDir URLByAppendingPathComponent:@"Untitled 4" isDirectory:YES];
	if (![fm createDirectoryAtURL:untitledFour withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Test failed");
		return;
	}
	
	NSLog(@"Created untitled 4.");
	
	NSURL *resultingURL = nil;
	
	if (![fm trashItemAtURL:untitledFour resultingItemURL:&resultingURL error:nil])
		{
			NSLog(@"trash failed");
			return;
		}	
	
	NSLog(@"Moved Untitled 4 to the trash.");
	
	[self performSelector:@selector(replaceTrashedURL:) withObject:untitledFour afterDelay:1.0];
	[self performSelector:@selector(compareBothURLS:) withObject:@[untitledFour,resultingURL] afterDelay:4.0];
	
}


-(void)replaceTrashedURL:(NSURL*)originalURL
{
	NSFileManager *fm = [NSFileManager defaultManager];
	if ([fm createDirectoryAtURL:originalURL withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Recreated Untitled 4");
	}
}

-(void)compareBothURLS:(NSArray<NSURL*>*)twoURLsArray
{
	NSLog(@"4 seconds is up - let's check");
	NSFileManager *fm = [NSFileManager defaultManager];
	NSURL *untitledFour = twoURLsArray.firstObject;
	NSURL *resultingURL = twoURLsArray.lastObject;
	
	// Uncommenting these lines fixes the relationship check:
	//[untitledFour removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];
	//[resultingURL removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];
	
	NSURLRelationship relationship;
	NSError *error = nil;
	if ([fm getRelationship:&relationship ofDirectoryAtURL:untitledFour toItemAtURL:resultingURL error:&error])
		{
			if (relationship == NSURLRelationshipSame)
				{
					NSLog(@"NSURLRelationshipSame: %@ - %@?",untitledFour,resultingURL);
				}
			else if (relationship == NSURLRelationshipContains)
				{
					NSLog(@"NSURLRelationshipContains");
				}
			else  if (relationship == NSURLRelationshipOther)
				{
					NSLog(@"NSURLRelationshipOther");
				}
			else {
				NSLog(@"Unknown");
			}
		}			
	else 
		{
			NSLog(@"Error reading relationship: %@",error);
		}
}

@end

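Just use that class and do this in a test program (one with a running main run loop, since the delayed performSelector:withObject:afterDelay: calls won't fire otherwise):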

	MachoManURLTester *URLTester = [MachoManURLTester sharedTester];
	[URLTester startURLTrashDance];

And to answer your earlier question, YES, the file reference URLs do collide.

Accepted Answer

Doesn't appear to be what's going on in this case. I made this dumb little test which can easily reproduce the issue (sorry, can't get code to format well on these forums).

Interesting. So, I can actually explain what's going on, and it's actually not the cache.

So, architecturally, NSURL has two different mechanisms for tracking file location: "path" and "file reference". Path works exactly the way you'd expect (it's a string-based path to a fixed location), while file reference relies on low-level file system metadata to track files. Critically, this means that the file reference will track the object as it's moved/modified within a volume.

Second, keep in mind that NSURLs are generally "data" objects, meaning they don't "proactively" update their content.

So, the actual issue starts here:

if (![fm trashItemAtURL:untitledFour resultingItemURL:&resultingURL error:nil])

At the point that method returns, "untitledFour" is no longer entirely coherent, as its path points to the original location, but its reference points to the file in the trash. You can see this for yourself by running this at the top of compareBothURLS:

NSURL* pathURL = untitledFour.filePathURL;
NSURL* refURL = untitledFour.fileReferenceURL;

NSLog(@"1 %@", untitledFour.path);
NSLog(@"2 %@", pathURL.path);
NSLog(@"3 %@", refURL.path);
	
NSLog(@"A %@", untitledFour.fileReferenceURL.description);
NSLog(@"B %@", pathURL.fileReferenceURL.description);
NSLog(@"C %@", refURL.fileReferenceURL.description);

What you'll find is that:

  • In the first log set, "1" & "2" will match, both pointing to the original file location. "3" will not, pointing to the trash instead.

  • In the second log set, "A" & "C" will match, while "B" will not.

More specifically, the strings returned in the second log set will have this format:

file:///.file/id=<number>.<number>/

...and the second number will be different for "B".

With all that context:

(1) The reason getRelationship is returning "same" is that it primarily relies on file reference data, and the reference data points to the file in the trash. There's an argument that it shouldn't do this. In its defense, however, using the reference data makes it much easier to sort out issues like hard-linked files and/or symbolic links allowing multiple references to the same file.

(2) The reason "removeCachedResourceValueForKey" changed the behavior is that it deleted the file reference data, forcing NSURL to resolve the data again. You'll actually get exactly the same effect if you test with "untitledFour.filePathURL".

What I'd highlight here is that the "right" behavior here isn't entirely clear. That is, is the problem that "getRelationship" is claiming that two different paths are "the same file"? Or is the problem that NSURL is returning the wrong path value for a specific file?

That question doesn't have a direct answer because the system doesn't really "know" what you actually want: are you trying to track a particular "object" (fileReferenceURL) or are you trying to reference a particular "path" (filePathURL)? It doesn't "know", so it's ended up with a slightly different object that's tracking both...

...but you can tell it what you want, at which point the API will now do exactly what you'd expect. More specifically, you can change the behavior by forcing the URL type you want immediately after you create the directory:

    if (![fm createDirectoryAtURL:untitledFour withIntermediateDirectories:YES attributes:nil error:nil])
    {
        NSLog(@"Test failed");
        return;
    }
    
#if 1
    untitledFour = untitledFour.fileReferenceURL;
#else
    untitledFour = untitledFour.filePathURL;
#endif

Strictly speaking, you could set "filePathURL" anywhere you want, but you can't create a fileReferenceURL to a non-existent object, so it needs to be after the create. In any case, either of those two configurations works the way you'd expect.
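The same distinction exists at the POSIX level, which may make the two configurations easier to reason about. In this loose analogy (an assumption, not a description of NSURL internals), a path string behaves like filePathURL and an open directory descriptor behaves like fileReferenceURL. The sketch replays the trash-and-recreate test in C (which also compiles as Objective-C); rename() into a local "Trash" directory stands in for trashItemAtURL:resultingItemURL:error:, and `track_path_vs_object` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical replay of the trash-and-recreate test with POSIX calls.
   The path string behaves like filePathURL (it names whatever is at that
   location now); the open descriptor behaves like fileReferenceURL (it
   follows the object it was opened on). On success, returns 0 and sets
   *path_sees_new to 1 if the path names the recreated directory, and
   *fd_sees_trash to 1 if the descriptor still names the trashed one. */
static int track_path_vs_object(int *path_sees_new, int *fd_sees_trash) {
    char base[64] = "/tmp/trackdemo.XXXXXX";
    char original[256], trash[256], trashed[256];
    struct stat created, at_path, via_fd, in_trash;
    int fd;

    assert(path_sees_new != NULL && fd_sees_trash != NULL);
    if (mkdtemp(base) == NULL) return -1;
    snprintf(original, sizeof original, "%s/Untitled 4", base);
    snprintf(trash, sizeof trash, "%s/Trash", base);
    snprintf(trashed, sizeof trashed, "%s/Trash/Untitled 4", base);
    if (mkdir(original, 0700) != 0 || mkdir(trash, 0700) != 0) return -1;

    /* Pick the "object" view right after creating the directory. */
    fd = open(original, O_RDONLY | O_DIRECTORY);
    if (fd < 0) return -1;
    if (fstat(fd, &created) != 0) { close(fd); return -1; }

    /* Stand-ins for trashing the directory and recreating it in place. */
    if (rename(original, trashed) != 0 || mkdir(original, 0700) != 0) {
        close(fd);
        return -1;
    }

    if (stat(original, &at_path) != 0 || fstat(fd, &via_fd) != 0 ||
        stat(trashed, &in_trash) != 0) {
        close(fd);
        return -1;
    }
    *path_sees_new = at_path.st_dev != created.st_dev || at_path.st_ino != created.st_ino;
    *fd_sees_trash = via_fd.st_dev == in_trash.st_dev && via_fd.st_ino == in_trash.st_ino;

    close(fd);
    rmdir(original);
    rmdir(trashed);
    rmdir(trash);
    rmdir(base);
    return 0;
}
```

Each handle is unambiguous on its own; the trouble in the test comes from a single NSURL instance answering with both views at once.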

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Very interesting. Thanks a lot for the detailed responses. As I mentioned briefly in a previous post, I stumbled across this while commenting out some things to test an error-handling code path. As a result of the matching NSURLFileResourceIdentifierKey, the wrong error message was logged. In my app's case this is basically a harmless bug, because I do nothing beyond logging it, but the behavior did spark my curiosity. In any case, I will most likely be removing -getRelationship: calls from my app entirely soon.

I'm surprised that a fileReferenceURL would be cached in a filePathURL at all. My expectation when calling fileReferenceURL on a file path URL is to get a reference to the file at the exact file path right now if it is there (or nil), and that I would have to hold on to the fileReferenceURL from first access to follow the file around.

If caching the fileReferenceURL in the URL itself has been determined to be necessary, I'm also somewhat surprised an existing fileReferenceURL isn't cleared or updated when files are manipulated via high-level APIs like NSFileManager -createDirectoryAtURL:..., etc. After recreation, as you mentioned, if you did untitledFour.fileReferenceURL you'd be manipulating the folder in the trash, not the new folder you created. Based on your previous reply, it sounds like it would be chalked up as an app bug, since you recommend grabbing the fileReferenceURL early. But IMO it isn't obvious from the public interface that NSURL may return a 'cached/stale' file reference URL. I'm not actually doing this, and I'm glad I'm aware of this possibility. It doesn't seem so far-fetched that this could be the source of a data-loss bug or something worse.

In my silly example it is obvious that I'm recycling the untitledFour NSURL instance. In a real complex app where you are passing NSURLs around like hot potatoes to various objects it may not be so obvious.

I'm surprised that a fileReferenceURL would be cached in a filePathURL at all. My expectation is when calling fileReferenceURL on a file path URL is to get a reference to the file at the exact file path right now if it is there (or nil) and I would have to hold the fileReferenceURL on first access to follow the file around.

I can understand that thinking, but it's not the system’s perspective. The system’s "view" here is that the file reference is considered more "authoritative" than the path. The reason for this is pretty simple: it's easy for an app to track "a path" (just store it as a string), but the only way an app can track "a file system object" is by doing what file references "do".

The preference for the "reference" object also ends up masking all sorts of common behaviors which would otherwise be highly disruptive. For example, it allows users to rename directories without having to worry about the consequences that might have for whatever files their apps happen to be interacting with.

Based on your previous reply it sounds like it would be chalked up as an app bug since you recommend grabbing the fileReferenceURL early.

So, my own, entirely personal and slightly radical, perspective is that string-based file paths are a fairly broken defect that's become so ingrained in how most developers think about files that it's basically become "stuck" in "all" file system APIs even though they don't actually make ANY sense. Architecturally, every file system in the world ACTUALLY tracks its objects using some kind of artificial identifier (typically "a number") which is separate from object metadata (like "names"). Paths are then constructed by mapping IDs to names and then stringing those names together to make a path.

Relying on paths as the core file identifier means that every operation you perform is going through that same mapping process, opening the door to all sorts of problems which don't really have to exist at all. For example, take "Time-of-check to time-of-use (TOCTOU)" attacks. In the file system context, those attacks all look something like this:

  1. Get the system to check a particular object.
  2. Replace the object the system checked with a different object.
  3. The system now does "something" to a different file/directory than intended.

These attacks are possible because you can't communicate the ACTUAL object you want to manipulate, but are instead forced to pass in a made-up reference to it. macOS itself is still heavily reliant on paths, but fileReferenceURLs are the closest construct we have to an API that doesn't fall into this "trap".
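The usual POSIX mitigation is to resolve the path exactly once: open the object, then do both the check and the use through the descriptor, so a swap between the two steps can no longer redirect the operation. This is a minimal sketch in C (which also compiles as Objective-C); `read_regular_file` and `demo_checks` are hypothetical names, and O_NOFOLLOW only guards the final path component, not intermediate directories.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: reads up to len-1 bytes from path, but only if it is
   a regular file and not a symbolic link. The check (fstat) and the use
   (read) go through the same descriptor, so they refer to the same object
   even if the path is swapped in between. */
static ssize_t read_regular_file(const char *path, char *buf, size_t len) {
    struct stat st;
    ssize_t n;
    int fd;

    assert(buf != NULL && len > 0);
    /* O_NOFOLLOW rejects a symlink as the last component; O_NONBLOCK keeps
       open() from hanging if the path turns out to be a FIFO. */
    fd = open(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {  /* check the opened object */
        close(fd);
        return -1;
    }
    n = read(fd, buf, len - 1);                          /* use the same object */
    close(fd);
    if (n >= 0) buf[n] = '\0';
    return n;
}

/* Hypothetical demo: results for a regular file, a directory, and a
   symlink to the regular file, in that order. */
static int demo_checks(ssize_t results[3], char *text, size_t len) {
    char base[64] = "/tmp/checkuse.XXXXXX";
    char file[256], dir[256], alias[256];
    int fd;

    if (mkdtemp(base) == NULL) return -1;
    snprintf(file, sizeof file, "%s/file.txt", base);
    snprintf(dir, sizeof dir, "%s/dir", base);
    snprintf(alias, sizeof alias, "%s/alias", base);

    if ((fd = open(file, O_CREAT | O_WRONLY, 0600)) < 0) return -1;
    if (write(fd, "hello", 5) != 5) { close(fd); return -1; }
    close(fd);
    if (mkdir(dir, 0700) != 0 || symlink(file, alias) != 0) return -1;

    results[0] = read_regular_file(file, text, len);   /* accepted: 5 bytes */
    results[1] = read_regular_file(dir, text, len);    /* rejected: directory */
    results[2] = read_regular_file(alias, text, len);  /* rejected: symlink */

    unlink(alias);
    rmdir(dir);
    unlink(file);
    rmdir(base);
    return 0;
}
```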

That leads to here:

But IMO it isn't obviously clear by the public interface that NSURL may return a 'cached/stale' file reference URL. I'm not actually doing this and I’m glad I’m aware of this possibility.

The underlying question here is which of these two objects "should" untitledFour be referencing? Is it tracking a fixed path location ("tmp" inside the app data container), or is it tracking a file system object (which has now been moved to "Trash")?

My own view is that "object tracking" is the better default, which means the bug here is that "path" is returning an incorrect value, NOT that getRelationship is returning NSURLRelationshipSame.

However, you're right that the current behavior in this particular case is a bit of a mess, as some APIs are relying on the reference (like "getRelationship") but other APIs are directly pulling the path (like "activateFileViewerSelectingURLs"). I'm not sure what's causing that behavior, but it's absolutely broken and it's not as simple as caching. Expanding on my earlier code, if you add this logging after setting fileReferenceURL:

untitledFour = untitledFour.fileReferenceURL;
NSLog(@"1 %@", untitledFour.path);

...you'll find that the code above logs "tmp" (because the directory hasn't been moved) and the logging in compareBothURLS logs "Trash". Similarly, using "filePathURL" gets you a fixed reference to "tmp". More to the point, if you run this sequence on untitledFour immediately after trashing the directory:

[[NSWorkspace sharedWorkspace] activateFileViewerSelectingURLs:@[untitledFour.fileReferenceURL.filePathURL]];
sleep(1);
[[NSWorkspace sharedWorkspace] activateFileViewerSelectingURLs:@[untitledFour]];

...then Finder will open BOTH directories (trash, then app container). In other words, the issue here isn't that NSURL has a stale path value, it's that it's ended up in a weird state that splits the behavior of those two cases and that is definitely a bug (r.171663816).

In my silly example it is obvious that I'm recycling the untitledFour NSURL instance. In a real complex app where you are passing NSURLs around like hot potatoes to various objects it may not be so obvious.

Not at all. I think you've actually found an issue that we need to fix. I think the actual lesson here is that when you actively manipulate a URL, I would recommend deciding what kind of URL you want to "end up with" and then "pull" that type out using filePathURL/fileReferenceURL.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
