Can a Live Caller ID server supply live data or must it be static?

With the Live Caller ID example server, the caller lookup dataset is defined in an input.txtpb file and processed by running the ConstructDatabase command, which creates a block.binpb and an identity.binpb file. In other words, a static input file is processed into static block and identity files.

However, in the real world, the data for identified and blocked numbers is in a constant state of flux: new numbers become available, old ones become stale, numbers which were initially considered safe come to be considered malicious, and so on. Is the example server just that, merely an example using fixed datasets, and an actual production server is able to use live, ever-changing data to formulate its response back to the iPhone OS query?

Here's a concrete use case: suppose it's a requirement to permit US NANP numbers but to block anything else. The total number of non-US-NANP numbers is so large and ever-changing that it would be infeasible to capture them in an input.txtpb file, process that, and then re-capture and re-process it endlessly. Instead, what would be required is the ability for the Live Caller ID server to evaluate at query time, using a regular expression for example, whether a number is NANP or not. Is this possible?

Accepted Answer

Is the example server just that, merely an example using fixed datasets,

Yes, that's exactly what it is. The example server is using a trivial implementation to demonstrate the basic architecture and design. Its internal implementation is definitely not how real world implementations are expected to function.

and an actual production server is able to use live, ever-changing data to formulate its response back to the iPhone OS query?

Yes, that's exactly how I'd expect a real implementation to function. Frankly, the architecture we've built only really makes sense because the data CAN'T be captured in a straightforward fixed data set, even a very large one.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

@Kevin Elliott

In the example Live Caller ID provided by Apple, the database is fixed, and the above question/answer is saying that's not how a real implementation would function.

But how could this be possible if the calling number is not exposed to the LCID server?

Suppose there exists elsewhere an ever changing/evolving dataset of information about a phone number. Wouldn't the LCIDS be unable to perform a live lookup against it, because it doesn't know the calling number?

Therefore how could the LCIDS use a live, ever changing data set to formulate its response if it doesn't have a fixed dataset like the Apple example server?

Therefore how could the LCIDS use a live, ever changing data set to formulate its response if it doesn't have a fixed dataset like the Apple example server?

First off, keep in mind that "live" here is very relative. Certainly some data is going to be changing every day, particularly if we're talking about data being added into the dataset. However, I'd expect that changes to this kind of data happen in terms of "changes per day/hour", not "changes per minute".

In any case, here's a very rough overview of how a large scale, real world implementation works:

  1. The full data set being shared is broken up and encrypted into one or more shards.

  2. The server knows what data is in each shard but has no ability to decrypt that data or infer what specific data it's returning from the request it receives. In practical terms, the process creating the shard both encrypted it and securely reordered its contents.

  3. When data is modified, the server throws away its "old" shard and generates a new shard.

Note that at very large scale, the sheer number of records involved may very well mean that the change rate does approach "changes per minute". However, the shard count increases along with it, allowing this to scale up.
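
As a rough illustration of step 3 and the scaling note, here is a minimal Python sketch of "rebuild only the shards that changed". It is not the swift-homomorphic-encryption API and does no encryption at all. The hash-based shard assignment and the names shard_index, build_shards and apply_updates are assumptions made up for this example.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional, Set


def shard_index(number: str, shard_count: int) -> int:
    """Map a phone number to a shard using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(number.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(records: Dict[str, str], shard_count: int) -> Dict[int, Dict[str, str]]:
    """Split the full dataset into shard_count plain (unencrypted) shards."""
    shards: Dict[int, Dict[str, str]] = {i: {} for i in range(shard_count)}
    for number, value in records.items():
        shards[shard_index(number, shard_count)][number] = value
    return shards


def apply_updates(
    shards: Dict[int, Dict[str, str]],
    updates: Dict[str, Optional[str]],
    shard_count: int,
) -> Set[int]:
    """Apply upserts (a value) and deletions (None).

    Only shards that receive a change are regenerated; the indices of the
    regenerated shards are returned.
    """
    changes_by_shard: Dict[int, Dict[str, Optional[str]]] = defaultdict(dict)
    for number, value in updates.items():
        changes_by_shard[shard_index(number, shard_count)][number] = value

    for idx, changes in changes_by_shard.items():
        fresh = dict(shards[idx])  # copy of the old shard's contents
        for number, value in changes.items():
            if value is None:
                fresh.pop(number, None)
            else:
                fresh[number] = value
        # The "old" shard is thrown away and replaced by the new one. A real
        # pipeline would re-run its processing step on the new shard here.
        shards[idx] = fresh
    return set(changes_by_shard)
```

In this sketch a batch of updates touches at most as many shards as it has changed numbers, so the rebuild cost per batch tracks the size of the changed shards rather than the size of the whole dataset.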

For more specific guidance on the technical details of this, see the article "Encoding pipeline" in the swift-homomorphic-encryption documentation.

That leads to here:

Suppose there exists elsewhere an ever changing/evolving dataset of information about a phone number,

If you're imagining a situation where a large portion of the data is ACTUALLY changing at a very high frequency ("changes per minute") AND it's important that the "current" value be returned "immediately", then you may be right and this wouldn't work very well.

However, I also can't think of a situation where that would actually be an issue for the use cases Live Caller ID is designed to address.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

@Kevin Elliott

Thank you for the reply.

Does there need to be, or is it recommended that there be, any relationship between the structure of the shard contents and the structure of user tiers?

For example, suppose one user tier provides names only, and another user tier provides names plus images. Is it then a good idea (with respect to lookup speed, for example) to have shard(s) which contain data without images and separate shard(s) which contain images? Or can the data in the shards and the user tiers all be jumbled up?

Does there need to be, or is it recommended that there be, any relationship between the structure of the shard contents and the structure of user tiers?

Per the engineering team, no, not really.

For example, suppose one user tier provides names only, and another user tier provides names plus images. Is it then a good idea (with respect to lookup speed, for example) to have shard(s) which contain data without images and separate shard(s) which contain images? Or can the data in the shards and the user tiers all be jumbled up?

The right answer here is something you'd probably need to sort out based on your own testing, as the actual performance will depend almost entirely on the specifics of your data and larger infrastructure.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
