How to implement and use a search-tokens object in Core Data?

In WWDC 2013 Session 211, "Core Data Performance Optimization and Debugging", pages 128-131 of the slides discuss using a canonicalized search in Core Data. I am attempting to configure my Core Data model to support this method of searching. I've created a Tokens entity with a token attribute of type String (I'm considering whether changing this to type Transformable, to support saving an array of strings, would be better). I also created a relationship between the Tokens entity and the base entity in my data model. From the talk, it's not precisely clear how one would use this setup. For example, I know that I probably want an inexpensive query against the Tokens entity with an operator like BEGINSWITH, but would ANY and other operators be too costly to use?


Also, I can't find any sample projects showing how to implement the search-token algorithm. It's unclear whether I should store one word per token; and if I instead store a whole string in each token attribute, it's not clear how to parse that efficiently (e.g. if I parsed the string into an attribute of type Transformable so that I could use CONTAINS on the array; is that recommended?). The talk did imply that one could derive an array from a string using components and work with that somehow, but it was mentioned only in passing.
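For what it's worth, the components idea mentioned in passing in the talk might look something like this in modern Swift (splitting on non-alphanumerics and lowercasing are my own assumptions, not from the session):

```swift
import Foundation

// Sketch of a tokenizer: split a free-text field into lowercase,
// de-duplicated words, each of which could become one Tokens record.
func tokenize(_ text: String) -> Set<String> {
    let words = text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
    // Splitting on punctuation produces empty fragments; drop them.
    return Set(words.filter { !$0.isEmpty })
}

let tokens = tokenize("Core Data: Performance Optimization")
// tokens == ["core", "data", "performance", "optimization"]
```

Returning a `Set` de-duplicates repeated words for free, which matters if each token becomes its own record.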


Since performance is such a problematic and widespread topic, I'd appreciate any insights (snippets or instructions) on how I should complete the implementation and make use of the search-tokens within an app.


Finally, I note that model attributes now support Spotlight indexing. Should I enable this? Would it be an alternative to the search-token approach? I am trying to stay within Apple's APIs (no third-party dependencies). Ultimately I want to use a search controller and table view to quickly provide full-text search using the search-token paradigm.

The point of canonicalized searches is that non-canonicalized searches degenerate into table scans instead of index-based searches.


Consider if you have the elements

  • one
  • two
  • three
  • four
  • five
  • six
  • seven
  • eight

in your database, and you do a search for strings containing the letter 'e'. To do that, you end up having to do a full table scan. That happens because your table isn't going to be indexed based on the letters each string contains; it's going to be indexed based on putting those strings in a sort order. The same degenerate case happens when you try to do a case-insensitive search, or to search for larger substrings.


Likewise, if you have the elements

  • One
  • tWo
  • thRee
  • FOur
  • Five
  • Six
  • Seven
  • Eight

in your table and you try to search for "four" case-insensitively, that degenerates into a table scan because the index is constructed from the literal value of each string ("FOur" and "four" are indexed into different slots), so matching "FOur" against "four" requires scanning the whole table of values. No one does the 2^N binary searches needed to cover all the case variations of a length-N string (matching "four" against "Four", "fOur", etc.); you just go straight to a full table scan.


Short version: strings are indexed to support matching the characters at the beginning of the string, directly, without transforming them. Doing anything else degenerates into a table scan, and the whole point of indexing a value is to avoid full table scans.
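To make this concrete, here's a small in-memory illustration (modern Swift, evaluated against a plain array rather than a persistent store; the values are from the example above): canonicalize once at insert time, then query the canonical copy with a plain, case-sensitive prefix match.

```swift
import Foundation

// Canonicalize once at insert time...
let raw = ["One", "tWo", "thRee", "FOur", "Five"]
let canonical = raw.map { $0.lowercased() }

// ...then a plain, case-SENSITIVE BEGINSWITH answers a case-insensitive
// question. Against an indexed attribute this form can stay on the
// index, whereas BEGINSWITH[cd] on the raw values cannot.
let query = "FO".lowercased()
let prefixMatch = NSPredicate(format: "SELF BEGINSWITH %@", query)
let hits = (canonical as NSArray).filtered(using: prefixMatch) as! [String]
// hits == ["four"]
```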

Thanks for the response. This clarifies why I should use strings, and leads me to conclude that a single word per record is the way to proceed. Given that structure, I would just use predicates with BEGINSWITH. A bit of research indicates that the Spotlight option supplies data to the system Spotlight service. I plan to implement token insertion using the existence check and usage below. Later I will access the token object as part of the search process. I am still hoping that someone who has implemented search tokens can share their experience and point me in the right direction with specific pointers.


For example ~ a possible solution:

(And to begin with, I am not sure why Apple doesn't cover this in their documentation.)

These are my initial thoughts on how this might work (I am still in the process of figuring this out, so caveat emptor):

  1. Create a simple Tokens entity containing one string attribute to hold a single word
  2. This word attribute becomes a search index
  3. Add a to-many relationship to the Tokens entity from a significant object that has a tags attribute
  4. Create a token and link the related significant object whenever a tag is added to that significant object
  5. Later, when retrieving the significant object, use a BEGINSWITH predicate on the token attribute
  6. Present the resulting significant objects to the user (results start empty and are incrementally filled in)
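Step 5 above might be sketched like this in modern Swift, with placeholder names I've made up for illustration ("Note" for the significant entity, a to-many "tokens" relationship, and a "token" string attribute):

```swift
import CoreData

// Hypothetical names: "Note" is the significant entity, "tokens" its
// to-many relationship, "token" the word attribute on each token record.
// The typed prefix is lowercased so that lowercased stored tokens can
// be matched with a plain, index-friendly BEGINSWITH.
func tokenSearchPredicate(for typedPrefix: String) -> NSPredicate {
    return NSPredicate(format: "ANY tokens.token BEGINSWITH %@",
                       typedPrefix.lowercased())
}

// Fetch the significant objects themselves, filtered through the
// token relationship, ready to hand to a fetched results controller.
func notesMatching(prefix: String,
                   in context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Note")
    request.predicate = tokenSearchPredicate(for: prefix)
    return try context.fetch(request)
}
```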


Some questions about this that are not immediately clear to me:


Have I missed some important aspects of this?

What are the gotchas?

How do I efficiently check, when adding a tag, whether it already exists, and then add a new relationship?

(String equality checks are slow, but I can't see a faster method; should I be comparing a hash value?)

Doesn't Apple have a simple implementation of this?


func tokenExists(aToken: String) -> Bool {
    // A count fetch is cheaper than fetching the matching objects.
    let request: NSFetchRequest = NSFetchRequest(entityName: self.className!)
    request.predicate = NSPredicate(format: "token == %@", argumentArray: [aToken])

    var error: NSError?
    let count = self.managedObjectContext!.countForFetchRequest(request, error: &error)

    // countForFetchRequest returns NSNotFound when the fetch fails.
    if count == NSNotFound {
        return false
    }
    // The token exists only if at least one match was found.
    return count > 0
}


Usage:

func insertToken(value: String) {
    if !tokenExists(value) {
        self.token = value
        saveState()
    }
}
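On the hash question above: an `==` predicate against an indexed attribute is a single index lookup in the underlying store, so comparing hashes by hand shouldn't be necessary. The existence check and insert could also be folded into one fetch-or-create, sketched here in modern Swift with "Token"/"token" as placeholder names:

```swift
import CoreData

// Fetch an existing token record by exact value, or insert one if absent.
// An == match on an indexed attribute is an index probe, not a table scan.
func fetchOrCreateToken(_ value: String,
                        in context: NSManagedObjectContext) throws -> NSManagedObject {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Token")
    request.predicate = NSPredicate(format: "token == %@", value)
    request.fetchLimit = 1  // one hit is enough to prove existence
    if let existing = try context.fetch(request).first {
        return existing
    }
    let token = NSEntityDescription.insertNewObject(forEntityName: "Token",
                                                    into: context)
    token.setValue(value, forKey: "token")
    return token
}
```

The caller can then add the returned token to the significant object's to-many relationship in either branch, which avoids the separate count-then-insert round trip.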