Core Data for "a large data set"?

I'm building a document-based app that allows users to enter a hierarchy of geographical data. Something analogous to a tree hierarchy of areas and cities. An area has a boundary and can contain sub-areas. An area can contain cities which are just points.


An average document may contain 20 to 30 areas and 500 to 2000 cities. The largest document would contain about 100 areas and 25000 cities.


I'm new to Cocoa programming and I don't know if a data set in that size range qualifies as "small", "medium", "large" or "very large". Knowing that would help me determine how to store and load the data.


Two quotations I've read on the topic:


"If you have a large data set or require a managed object model, you may want to use

NSPersistentDocument
to create a document-based app that uses the Core Data framework." - Document-Based App Programming Guide for Mac


"Applications using very large data sets would likely benefit from a custom persistance implementation. Core Data's reliance on KVC carries with it a significant overhead. We expect pure Swift objects would offer the best performance in terms of low overhead property access" - Cocoa Programming for OS X 5th edition


For my data set size, would I be best served by archiving with the NSCoding protocol, by Core Data, or by something custom?

Would you be served at all by archiving your data using the NSCoding protocol? No, especially not if you're dealing with mobile devices.


NSPersistentDocument is essentially just a convenience class that takes care of some of the housekeeping you'd otherwise have to do yourself when you use Core Data.
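
For a rough idea of what that housekeeping buys you, here is a minimal sketch of an NSPersistentDocument subclass. The "Area" entity name is an assumption for illustration, not something from your question:

```swift
import Cocoa

// NSPersistentDocument owns the managed object context, the persistent store
// coordinator, and the document read/write plumbing for you.
class Document: NSPersistentDocument {

    override class var autosavesInPlace: Bool { return true }

    // Any model change goes through the context the document provides.
    func addArea(named name: String) {
        guard let context = managedObjectContext else { return }
        let area = NSEntityDescription.insertNewObject(forEntityName: "Area",
                                                       into: context)
        area.setValue(name, forKey: "name")
    }
}
```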


But I think there's a problem with the question you're asking. You're asking for advice about which storage mechanism to use without explaining what you're actually going to do with the data.


If you're going to try to display the regions and points in a map overlay or something, then you're probably going to spend your time writing a GIS-style indexing system for your data. If you're just going to query the data for point hits, a few points at a time, then Core Data's fetch mechanics are probably going to be fine.
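
To make the "point hits" case concrete, here is a hedged sketch of that kind of fetch, assuming a "City" entity with latitude/longitude Double attributes (those names are illustrative, not taken from your model):

```swift
import CoreData

// Fetch only the cities inside a bounding box. Core Data materializes just
// the matching objects, so this stays cheap even with ~25,000 cities stored.
func cities(in context: NSManagedObjectContext,
            latitude: ClosedRange<Double>,
            longitude: ClosedRange<Double>) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "City")
    request.predicate = NSPredicate(
        format: "latitude >= %f AND latitude <= %f AND longitude >= %f AND longitude <= %f",
        latitude.lowerBound, latitude.upperBound,
        longitude.lowerBound, longitude.upperBound)
    request.fetchBatchSize = 100   // keep memory use flat for large result sets
    return try context.fetch(request)
}
```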

I would suggest that you design your model API first. Whether the data is backed by Core Data, pure SQLite, or a big in-memory array/dictionary should not matter, except in how the model is implemented.


If you can decide on a reasonable model API, then how you implement the backend is an implementation detail of the model. This way you can use whatever backend technology you want, and change it easily enough if you find your requirements change.
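
As a sketch of what that could look like (all names here are hypothetical, just to illustrate the shape of a backend-agnostic model API):

```swift
import Foundation

// Plain value types for the model objects.
struct GeoPoint { var latitude: Double; var longitude: Double }
struct City     { var name: String; var location: GeoPoint }
struct Area     { var name: String; var boundary: [GeoPoint]; var subAreas: [Area]; var cities: [City] }

// The rest of the app talks only to this protocol. An in-memory store,
// a Core Data store, or an SQLite store can each conform to it, so the
// backend can be swapped without touching the UI layer.
protocol GeoStore {
    func rootAreas() throws -> [Area]
    func cities(inAreaNamed name: String) throws -> [City]
    func add(_ city: City, toAreaNamed name: String) throws
    func save(to url: URL) throws
    func load(from url: URL) throws
}
```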


Since you referenced the OS X programming book, I assume this is for OS X. Unless your objects are incredibly huge, you can probably get away with keeping the model completely resident in memory; 25,000 objects is not many at all. However, remember that loading and saving will take longer when you atomically load/save the whole model at once.
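
If you start with that in-memory approach, a whole-document load/save can be as simple as the following sketch using Codable (hypothetical types; the point is that every save rewrites the entire file, which is where the time goes):

```swift
import Foundation

struct City: Codable { var name: String; var latitude: Double; var longitude: Double }
struct GeoModel: Codable { var cities: [City] }

// Load the entire model into memory in one shot.
func load(from url: URL) throws -> GeoModel {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(GeoModel.self, from: data)
}

// Save rewrites the whole document atomically; fine for tens of thousands
// of small objects, but the cost grows with the size of the document.
func save(_ model: GeoModel, to url: URL) throws {
    let data = try JSONEncoder().encode(model)
    try data.write(to: url, options: .atomic)
}
```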


Thus, design your model API to be independent of how the storage and retrieval is implemented, and start with the easiest backend implementation to do the job, knowing you can swap it out at any time.
