With the Live Caller ID example server, the caller lookup dataset is defined in an input.txtpb file and processed by running a ConstructDatabase command, which creates a block.binpb and an identity.binpb file. In other words, a static input file is being processed into static block and identity files.
However, in the real world, the data for identified and blocked numbers is in a constant state of flux: new numbers become available, old ones become stale, numbers which were initially considered safe come to be considered malicious, and so on. Is the example server just that, merely an example using fixed datasets, while an actual production server is able to use live, ever-changing data to formulate its response to the iPhone OS query?
Here's a concrete use case - suppose it's a requirement to permit US NANP numbers but to block anything else. The total number of non-US NANP numbers is so large and ever changing that it would be infeasible to attempt to capture them in an input.txtpb file and then process that, and then to re-capture and re-process it endlessly. Instead, what would be required is the ability for the Live Caller ID server to evaluate at query time, using a regular expression for example, whether a number is NANP or not. Is this possible?
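For illustration, the kind of query-time check described above could be sketched as follows. This is only a sketch of the pattern match itself: the function name is hypothetical, it assumes numbers arrive in E.164 form, and it says nothing about whether a Live Caller ID server is in a position to run such a check.

```python
import re

# Assumption: numbers are in E.164 form, e.g. "+14085551234".
# NANP shape: country code 1, then NXX-NXX-XXXX, where N is a digit 2-9.
# Note that NANP covers more than the US (Canada and several Caribbean
# countries share country code 1), so this checks the NANP shape only.
NANP_E164 = re.compile(r"\+1[2-9][0-9]{2}[2-9][0-9]{6}")


def is_nanp(number: str) -> bool:
    """Return True if `number` has the shape of a NANP number in E.164 form."""
    return NANP_E164.fullmatch(number) is not None
```

With this sketch, `is_nanp("+14085551234")` is true, while a UK number such as `"+442071838750"` is rejected.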
Therefore how could the LCIDS use a live, ever changing data set to formulate its response if it doesn't have a fixed dataset like the Apple example server?
First off, keep in mind that "live" here is very relative. Certainly some data is going to be changing every day, particularly if we're talking about data being added into the dataset. However, I'd expect that changes to this kind of data happen in terms of "changes per day/hour", not "changes per minute".
In any case, here's a very rough overview of how a large scale, real world implementation works:
- The full data set being shared is broken up and encrypted into one or more shards.
- The server knows what data is in each shard but has no ability to decrypt that data or infer what specific data it's returning from the request it receives. In practical terms, the process creating the shard both encrypted it and securely reordered its contents.
- When data is modified, the server throws away its "old" shard and generates a new shard.
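As a rough illustration of the steps above, here is a minimal sketch of "rebuild only the shard that changed". It is not how swift-homomorphic-encryption implements its processing: the function names, the shard count, and the hash-based shard assignment are all assumptions made for the example, and the encryption/encoding step is replaced by a plain dictionary.

```python
import hashlib
from typing import Dict, Optional, Set

SHARD_COUNT = 4  # hypothetical; a real deployment would size this to the dataset


def shard_index(number: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a phone number to a shard using a stable hash (an assumption)."""
    digest = hashlib.sha256(number.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shard(records: Dict[str, str], index: int,
                shard_count: int = SHARD_COUNT) -> Dict[str, str]:
    """Stand-in for the real processing step, which would also encode the shard."""
    return {n: v for n, v in records.items()
            if shard_index(n, shard_count) == index}


def apply_changes(records: Dict[str, str],
                  shards: Dict[int, Dict[str, str]],
                  changes: Dict[str, Optional[str]],
                  shard_count: int = SHARD_COUNT) -> Set[int]:
    """Apply adds/updates (value) and removals (None), then regenerate
    only the shards those numbers fall into. Returns the rebuilt indexes."""
    dirty: Set[int] = set()
    for number, value in changes.items():
        if value is None:
            records.pop(number, None)
        else:
            records[number] = value
        dirty.add(shard_index(number, shard_count))
    for index in dirty:
        # Throw away the "old" shard and generate a new one.
        shards[index] = build_shard(records, index, shard_count)
    return dirty
```

The point of the sketch is that a change to one number only touches the shard that number maps to, so the cost of an update is bounded by shard size rather than by the size of the whole data set.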
Note that at very large scale, the sheer number of records involved may very well mean that the change rate does approach "changes per minute". However, the shard count increases along with it, allowing this to scale up.
For more specific guidance on the technical details of this, see the article "Encoding pipeline" in the swift-homomorphic-encryption documentation.
That leads to here:
Suppose there exists elsewhere an ever changing/evolving dataset of information about a phone number,
If you're imagining a situation where a large portion of the data is ACTUALLY changing at a very high frequency ("changes per minute") AND it's important that the "current" value be returned "immediately", then you may be right and this wouldn't work very well.
However, I also can't think of a situation where that would actually be an issue for the use cases Live Caller ID is designed to address.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware