Core Data: Data insertion in multithreaded environment creates duplicate records

I am getting duplicate entries after importing data from JSON into SQLite. Since Core Data is not a relational database, it has no primary key concept, so I cannot rely on a primary key to prevent duplicates. Following suggestions in many other discussion threads, I added a check for whether a record already exists in the database: if it exists, the record is updated; if not, a new entry is inserted. This check works fine in a single-threaded environment. However, in a multi-context environment it creates duplicate entries for some records. For background updates I am using the parent-child concurrency model, in which my main MOC is set as the parent of the background MOC.

Any help in this regard will be greatly appreciated.


Thanks.

First of all, parent-child contexts are not designed for multi-threading, and especially not for bulk inserting. The reason is that when you save the child context, all the records are pushed into the parent on the main thread and are then saved on the main thread too. This negates any performance or memory benefit from the background thread and context, and is probably even less efficient than simply inserting using only the main context. The proper way to achieve this is with a separate context and a separate persistent store coordinator; see Apple's Earthquake example from the WWDC 2015 sample code.
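To make the separate-context idea concrete, here is a minimal sketch in Swift. It is not the code from Apple's sample. The function name `makeImportContext` and its parameters are my own, and I am assuming that the main context picks up the background changes by merging the did-save notification, which is one common way to wire this up.

```swift
import CoreData

// Hypothetical helper: builds a background context that has its own
// persistent store coordinator pointing at the same SQLite file as the
// main stack, so that saves do not go through the main context.
func makeImportContext(model: NSManagedObjectModel,
                       storeURL: URL,
                       mainContext: NSManagedObjectContext) throws
    -> (context: NSManagedObjectContext, observer: NSObjectProtocol) {

    // A second coordinator, independent of the one used by the main context.
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
    _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                           configurationName: nil,
                                           at: storeURL,
                                           options: nil)

    // Private-queue context: all work on it must go through perform { }.
    let importContext = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    importContext.persistentStoreCoordinator = coordinator

    // Assumption: the main context is kept current by merging each
    // background save into it on its own queue.
    let observer = NotificationCenter.default.addObserver(
        forName: .NSManagedObjectContextDidSave,
        object: importContext,
        queue: nil
    ) { notification in
        mainContext.perform {
            mainContext.mergeChanges(fromContextDidSave: notification)
        }
    }
    return (importContext, observer)
}
```

Keep the returned observer token around and remove it from NotificationCenter when the import context is no longer needed.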


Second, Apple recently added a feature that does what you require: Constraints in the model designer (the deployment target must be iOS 9) or uniquenessConstraints in code. You set this to the name of your unique key, and it is implemented in SQLite as a unique key. You also need to set a merge policy on your context; in your case, where you don't want duplicate records to be inserted, it would be NSMergeByPropertyStoreTrumpMergePolicy. The way this works is that the duplicate objects are still inserted into the context, but when the context is saved and a record with the same constraint value already exists, the object is not saved, and I believe the object is updated with the values from the existing record. Thus the duplicate check happens on save rather than on insert (so think of it as working backwards from what you have now). If you don't care about the values in the record being read into this background context, it might be worth checking whether NSBatchUpdateRequest will do what you require instead. I haven't used it yet, so I'm not sure whether it works with constraints, and I'm also not sure whether the main context can be updated automatically as it is in the Earthquake example.
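A rough sketch of the constraint-plus-merge-policy setup, done in code rather than in the model designer. The entity name `Item`, the attributes `remoteID` and `title`, and both function names are made up for illustration; substitute your own entity and unique key.

```swift
import CoreData

// Hypothetical model: one entity "Item" whose "remoteID" must be unique.
func makeItemModel() -> NSManagedObjectModel {
    let remoteID = NSAttributeDescription()
    remoteID.name = "remoteID"
    remoteID.attributeType = .stringAttributeType
    remoteID.isOptional = false

    let title = NSAttributeDescription()
    title.name = "title"
    title.attributeType = .stringAttributeType
    title.isOptional = true

    let entity = NSEntityDescription()
    entity.name = "Item"
    entity.managedObjectClassName = NSStringFromClass(NSManagedObject.self)
    entity.properties = [remoteID, title]
    // Same effect as adding "remoteID" under Constraints in the model designer.
    entity.uniquenessConstraints = [["remoteID"]]

    let model = NSManagedObjectModel()
    model.entities = [entity]
    return model
}

// Hypothetical import routine: inserts every record without checking for
// an existing one first, and leaves conflict handling to the save.
func importItems(_ records: [[String: String]],
                 using coordinator: NSPersistentStoreCoordinator) {
    let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator
    // Constraint conflicts are resolved by this policy at save time.
    context.mergePolicy = NSMergeByPropertyStoreTrumpMergePolicy

    context.perform {
        for record in records {
            let item = NSEntityDescription.insertNewObject(forEntityName: "Item", into: context)
            item.setValue(record["remoteID"], forKey: "remoteID")
            item.setValue(record["title"], forKey: "title")
        }
        do {
            try context.save()
        } catch {
            print("Import failed: \(error)")
        }
    }
}
```

Note there is no fetch-then-insert check in the loop any more, which is what removes the race between contexts.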


In case you are interested in what parent-child contexts are designed for: imagine you have a UI for editing a new record. You create a child context and use it for the editing, along with setting up undo management, which I think might be more efficient on an empty context. If the user cancels, you just throw away the context. If the user saves, then when the child context is saved the record is pushed into the main context, allowing it to be shown in the main table. This has the added advantage that while the new record is being worked on, the main table of records (which likely uses an NSFetchedResultsController) is not being updated by context change events while it is off-screen. See Apple's Books example code for this.
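As a sketch of that editing pattern (not taken from Apple's sample; `makeEditingContext` and `commitEdit` are names I made up):

```swift
import CoreData

// Hypothetical helper: a scratch context for an "add/edit record" screen.
func makeEditingContext(parent mainContext: NSManagedObjectContext) -> NSManagedObjectContext {
    let child = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    child.parent = mainContext
    // Undo support scoped to this editing session only.
    child.undoManager = UndoManager()
    return child
}

// Hypothetical helper, called when the user taps Save.
func commitEdit(_ child: NSManagedObjectContext) throws {
    guard child.hasChanges else { return }
    // Saving the child only pushes the changes up into the parent context.
    try child.save()
    // Saving the parent is what actually writes to the store.
    try child.parent?.save()
}
```

For Cancel there is nothing to call: just drop the reference to the child context and its unsaved objects go with it.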


Any questions, let me know!
