Copying NSMutableArray within a NSOperation?

I'm looking at some Apple sample code as guide. Just wanted to ask a question about this...


They have a class, that wraps an NSMutableArray.


The array is declared as non-mutable in the public API but it is backed by a mutable array ivar. In the public API, a client can call methods to add, remove items in the mutable array (but the client isn't to modify the mutable array directly).


The app also periodically can get system events, and does some processing in a NSOperation:


[scanQueue addOperationWithBlock:^{

        NSMutableArray *filesToProcess = [self.files mutableCopy];
       //DO work....comparing our files to process with what's current.

}];


Couldn't the call to mutableCopy on the scanQueue potentially cause issues with this mutable array? The public methods for clients to access this array are called on the main thread.

If they're executing on different threads, yes, that could cause problems. NSMutableArray is not thread safe to my knowledge. You would have to add synchronization around it or dispatch that -mutableCopy call to ensure ALL access to self.files occurs on the main thread (maybe performSelectorOnMainThread:withObject:waitUntilDone:).

They did wrap the scanQueue call in @synchronized like this:


@synchronized(scanQueue)
{
     

      [scanQueue addOperationWithBlock:^{

        NSMutableArray *filesToProcess = [self.files mutableCopy];


       //DO work....comparing our files to process with what's current. 


      //when done, commit detected changes on the main queue
       NSOperationQueue *mainQueue = [NSOperationQueue mainQueue];
         [mainQueue addOperationWithBlock:^{
               
             [self.files removeObjectsInArray:removedItems];
             //ect....
          }];


}];



}




All other access to self.files is on the main thread,and no @synchronized blocks wrap those calls. Is wrapping just this queue's access to the self.files sufficient to ensure that a client does not mutate the array while it is being copied in the background? (code formatting is ugly, but the cursor location on this forum software is going all over the place... having a hard time with that).

The code — specifically the mutableCopy — certainly looks wrong. The removeObjects… on the main thread is unprotected by the @synchronized, and therefore can conflict with the next operation on the background thread, regardless of what's happening elsewhere in the app.

I was thinking it could potentially cause issues.


Actually they use mutableArrayValueForKey: to make changes to the array...instead of what I posted in the block...


So it's


[[NSOperationQueue mainQueue] addOperationWithBlock:^{

       [[self mutableArrayValueForKey:@"files"]removeObjectsInArray:removedItems];

  }];


but that doesn't make any difference, does it?

No, that makes it KVO compliant, but doesn't help with the thread safety.


It's certainly possible that the code overall has a structure that prevents apparent thread safety issues from becoming real issues. However, I'd suggest it's not worth the time trying to follow that up. It would be more useful just to not use this code as a guide (to multithreading, at least).

>No, that makes it KVO compliant, but doesn't help with the thread safety.

Yeah that's what I thought, and didn't think it'd have any thready safety implications, but figured I'd throw it out there. I guess I could just either do the mutable copy bit on the main thread, in the unlikely event that there is a half a million files, it could be a performance problem...or perhaps wrap the code in the main queue operation in a @synchronized(self)

Synchronizing on "self" would be overkill. It seems like it's self.files that needs the controlled access. Personally I like to keep synchronization as fine-grained as possible, so I would synchronize on self.files. Everywhere that accesses (read, modify, write, or any combination / sequence of those that should be atomic) self.files needs to be in a synchronized block or be guaranteed to run on the same thread.


Anything you do synchronization wise could block the main thread while you make that copy of the half million files, so not much point worrying about that. (Either you're doing it on the main thread, or the main thread could be blocked waiting for your @synchronized code doing the work in the other thread to release the lock.) Plus I bet making a mutable copy is very fast even with large numbers of items. But if you're worried about that you should test it.


Yes it gets messy. Multi threading is a pain. 🙂

You are right. The property is atomic, so the extra @synchronized block shouldn't be necessary, should it?

No, atomicity doesn't change anything. The value of the property is a pointer (to a mutable array), so atomicity would protect only the accessing of the pointer, not the data structures pointed to.


It's a complete waste of time looking for a generic solution, such as encasing random blocks of code in a synchronizing construct. You'll either leave hard-to-find bugs in the edge cases, or hard-to-find deadlocks.


In this case, we can't even speculate on a real solution, because we don't know what's going on. We don't even know if scanQueue is serial or concurrent. We don't know what the point of using NSOperation is: is it to get parallel processing, to offload long tasks from the main thread, etc? Maybe NSOperation isn't the best API to use for this, perhaps direct use of GCD is easier. The code you showed has each NSOperation using, or not using, the resulting array of the previous operation, according to the timing of the block placed back on the main thread. We don't know the consequences of this sort of non-deterministic sequencing. We don't know if you're referring to this Apple sample code for it's demonstration of using NSOperation, or for what it does inside each operation.


I really think you're at the point where you should disregard this sample code, and design your own actual implementation.

The atomic / nonatomic specifier on the property is almost never sufficient to protect your data in a multithreaded situation. It only guarantees that the object pointer will always have a consistent value. In real life it's the *contents / state* of the object, not the pointer to it, that needs to be consistent. Typically, you would have code that reads the array, maybe adds or removes items, etc. ALL that has to be done while other threads are blocked (i.e. both the code that's doing the read-modify-write AND the code that's read-only must be synchronized on the same lock) if you want to guarantee that the contents of that object are always in a consistent state for everyone once they have acquired the lock.

Cool. I was able to consistently reproduce an issue with this.


1) Start a scan operation on scanQueue and self.files only has one or two objects in it (so it will complete very quickly).

2) Call the method to start another second scan operation (which will attempt to cancel any in progress operations before it begins by calling cancelAllOperations on the scanQueue).

3) The second operation will start before the first operation completes, and even if I change the code to use a NSBlockOperation instead of addOperationWithBlock: and check the isCancelled property before doing the add/remove on the main queue within the execution block, the call to cancel can be too late and the first operation will have already added some objects to self.files on the main queue.

4). Now the second operation copied the self.files before the first operation completed. So code within the operation that checks for duplicates will fail within the second operation. When the second operation completes it will add duplicate objects to self.files. This will happen even if you copy the array on the main thread outside of the operation block, because the snapshot taken prior to the second operation no longer reflects the current state of affairs.

In the sample, they do call cancelAllOperations on the queue before the start of another operation, though it is unclear to me why because they add the operation with addOperationWithBlock: which will continue executing even if cancelled because they are not/cannot check the isCancelled property and dump out?


Changing the code to this seems to work nicely (at least in limited testing):


[[NSOperationQueue mainQueue]addOperationWithBlock:^{

     NSArray *currentFiles = [self.files copy];

      //do scan queue operation here....

}];


The operation added to the main queue to copy currentFiles now happens after the operation added on the main queue at the end of the first scan (which potentially mutates the array) though I suspect that is not guarenteed.


I'm not sure if it necessary to use a lock, if all changes are gathered in the background in collections, but all mutations to self.files happen on the main thread? Calling copy on the array does a shallow not a deep copy so I think that your suggesstion, not to worry about it impacting performance too much unless testing proves otherwise is good advice.

If the main thread is mutating the collection while a background thread is simultaneously trying to read / copy it, then I could see that causing problems.


If you suspect some scenario could cause issues but it's hard to get the timing right, feel free to add some sleep() calls while you're testing, to artificially extend the critical section (the part of code that is expected to be atomic, e.g. where your main thread is adding items to the collection). If the code is done right, it should not matter where you put the sleep() calls; it should behave correctly no matter when each thread gets a chance to run.


My advice, mostly learned the hard way, is to either have ALL access (both reads and writes) via the main thread, or add synchronization on self.files around ALL places where it's accessed (both reads and writes).

>If the main thread is mutating the collection while a background thread is simultaneously trying to read / copy it, then I could see that causing problems.


Yea I thought the same thing when I first saw the sample code.

I've decided to do the copy on the main thread. So all mutations made to the array are done on the main thread. The background thread only does work when the system dispatches an event indicating that a change was made by another process. When that happens, I make the copy of the current contents on the main thread, start the operation on scanQueue and validate what's on disk to what we already have, gathering added objects and removed objects. Then when this is done on the main thread I remove the removed objects gathered in the background from files and add the added objects gathered.

Another system event can be called while scanQueue has an operation in progress, so I cancel all operations and start another fresh scan every time. I also set scanQueue's maxConcurrentOperationsCount to 1.

It still is possible to get duplicated added files because an the operation on scanQueue could have been cancelled too late and already inserted objects in files, but the next operation got its copy of files before they were inserted and duplicate checking will fail, so maybe I ought to use synchronized blocks around all access to the array like you suggest.

I suppose I could possibly use a dirty trick of grabbing the count of files on disk, and comparing it to see if it'll match the count of self.files plus the count of added files minus the count of removed files. If they don't match, then I've hit an error and I could spin of a complete rescan, dumping all objects out and stuffing all new objects for all files with what is on disk.

Copying NSMutableArray within a NSOperation?
 
 
Q