How well does Core Data scale with 'many' entity types?

Hi,


I wondered if I could ask a question which may have a fundamental impact on how I model some data I'm working on.


I'm working with some medical data which I want to model with SNOMED CT terms (https://en.wikipedia.org/wiki/SNOMED_CT for an overview).


SNOMED describes each medical object as a 'concept'. Concepts are hierarchical, i.e. a concept can be a subclass of another concept.


Within each term, I'm going to want to store a set of different values. I'm also going to want to query those concepts to find out which relate to my patient.


So - in a nutshell I think I'm describing entities, attributes and relationships.


The difficulty is that SNOMED allows MANY different codes: there are 311,000 concept codes to date, and the number is growing.



I'm hopefully only dealing with a small subset of those codes for now. I was thinking about the feasibility of modelling 'known' concepts as entities with a parent entity of 'concept', which can take all the detail we are expecting to store. Unknown entities would have to be modelled as generics, which would lose all the power of Core Data to manage their data efficiently, but would mean we could store the data without a precise model.



This plan rests on the idea that Core Data scales reasonably well with lots of entities in a model. Has anyone had experience of how well Core Data scales in terms of the number of entities, or with a large hierarchy of entities that have parent entities?

First, a disclaimer: my first impression of SNOMED CT is that it doesn't map well at all to relational databases, and that's going to cause a few problems for Core Data's relational database backend. But if you implement SNOMED CT on plain SQL, you'll get to learn first-hand why Core Data makes some of the implementation decisions that it does.


"This plan rests on the idea that Core Data scales reasonably well with lots of entities in a model. Has anyone had experience of how well Core Data scales in terms of the number of entities, or with a large hierarchy of entities that have parent entities?"


Scaling in terms of the number of entities and scaling with a large number of parent entities are two different things. A huge number of tables is going to scale about the same as in any other database, especially if you're using the SQLite data store. On the other hand, the "large number of parent entities" may cause database inflation. Historically, if entity types A, B, and C all have an is-a relationship to type D (that is, D is their parent entity), Core Data produces one unified table for A, B, C, and D. In other words, it optimizes for query processing instead of compactness.
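
To make the "one unified table" point concrete, here is a rough sketch using plain SQLite from Python. It is only an illustration of the layout described above, not Core Data's actual schema: the table name, the column names, and the entity_type discriminator column are all made up for the example.

```python
import sqlite3

# Illustration only: one table holds the parent entity D and its sub-entities
# A, B and C. Table and column names are hypothetical, not Core Data's schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE concept (
        pk          INTEGER PRIMARY KEY,
        entity_type TEXT NOT NULL,   -- which entity this row represents
        name        TEXT,            -- attribute declared on parent D
        a_value     TEXT,            -- attribute declared only on A
        b_value     TEXT,            -- attribute declared only on B
        c_value     TEXT             -- attribute declared only on C
    )
""")
rows = [
    ("A", "first", "a-data", None, None),
    ("B", "second", None, "b-data", None),
    ("C", "third", None, None, "c-data"),
]
conn.executemany(
    "INSERT INTO concept (entity_type, name, a_value, b_value, c_value) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Fetching "every D, including sub-entities" is a single-table query, no joins.
all_names = [r[0] for r in conn.execute("SELECT name FROM concept ORDER BY pk")]

# Every row carries a column for every sub-entity attribute, so most stay NULL.
subentity_columns = ("a_value", "b_value", "c_value")
null_cells = sum(
    conn.execute(f"SELECT COUNT(*) FROM concept WHERE {col} IS NULL").fetchone()[0]
    for col in subentity_columns
)
total_cells = len(rows) * len(subentity_columns)
print(all_names, f"{null_cells} of {total_cells} sub-entity cells are NULL")
```

With only three sub-entities, two thirds of the sub-entity cells are already empty. With hundreds or thousands of sub-entities under one parent, each adding its own attributes, the table gets very wide, which is the inflation mentioned above.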


If the size of the entity type tables becomes an issue, you're going to have to model SNOMED CT data without using Core Data relationships, and you'll end up using manually created entity IDs and entity lookups, just as if you were dealing with references to entities across persistent stores. The good news is that those manually created entity IDs should be easier to deal with using the beta unique attribute value support.
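
As a rough sketch of what "manually created entity IDs and entity lookups" means in practice, here is the idea in plain SQLite from Python. Everything in it is an assumption for illustration: the table layout, the lookup and ancestors helpers, and the IDs and terms (C1, C2, C3 are placeholders, not real SNOMED CT codes). In Core Data the ID would be an ordinary attribute and the lookup a fetch on that attribute.

```python
import sqlite3

# Illustration only: concepts reference each other by a manually assigned ID
# instead of a modelled relationship. All names and IDs here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        concept_id TEXT NOT NULL UNIQUE,  -- manually managed identifier
        term       TEXT NOT NULL
    );
    CREATE TABLE is_a (
        child_id  TEXT NOT NULL,          -- plain ID values, not foreign keys
        parent_id TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    ("C1", "disorder"),
    ("C2", "heart disorder"),
    ("C3", "heart valve disorder"),
])
conn.executemany("INSERT INTO is_a VALUES (?, ?)", [("C2", "C1"), ("C3", "C2")])


def lookup(concept_id):
    """Resolve an ID to its term by hand; returns None if the ID is unknown."""
    row = conn.execute(
        "SELECT term FROM concept WHERE concept_id = ?", (concept_id,)
    ).fetchone()
    return row[0] if row else None


def ancestors(concept_id):
    """Collect every ancestor ID by following parent IDs upward."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT parent_id FROM is_a WHERE child_id = ?
            UNION
            SELECT is_a.parent_id FROM is_a JOIN up ON is_a.child_id = up.id
        )
        SELECT id FROM up
    """, (concept_id,)).fetchall()
    return [r[0] for r in rows]


print(lookup("C3"), sorted(ancestors("C3")))
```

The UNIQUE constraint plays the role that a unique attribute would play in the model: it stops two records from claiming the same ID, so a lookup can never be ambiguous. The cost is that nothing maintains the references for you, so a dangling parent ID simply resolves to nothing.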


On the other hand, if you implemented your own custom database back end and used an NSIncrementalStore to talk to it, then the SNOMED CT data would probably map well to a huge collection of entity types and sub-types. But you'll probably end up putting a lot of work into that database back end.
