GUI + XPC Service App Architecture Performance

Let's imagine that someone wants to use a background service to keep track of FSEvents activity at the file level (a firehose, some might say).

I chose this example to indicate the volume and rate of data transmission in question. I'm not creating a front-end for FSEvents data, but my background service may generate data at a similar pace. The service runs off user-defined documents that specify the FSEvents filtering to be applied in the background. Events that match get stored in a database, but the filters can match on almost all the data being emitted by FSEvents.

The user decides to check on the service's activity and database writes by launching a GUI that sends requests to the background service using XPC. So the GUI can request historical data from a database, but also get a real-time view of what FS events the service is busy filtering.

So it's a client-server approach that's concerned with monitoring an event stream over XPC. I understand XPC is a request/response mechanism, and I might look into using a reverse connection here, but my main concern is performance. Is XPC capable of coping with such a high volume of data transmission? Could it cope with 1000s of rows of table data updates per second sent to a GUI frontend?

I know there are streaming protocol options that involve a TCP connection, but I really want to stay away from opening sockets.

Is XPC capable of coping with such a high volume of data transmission?

Yes, but that’s a qualified yes. Hitting that goal is going to be a challenge.

I know there are streaming protocol options that involve a TCP connection, but I really want to stay away from opening sockets.

I’m not sure that sockets will help with this. XPC is, in general, more efficient than sockets.

However, if you want to play around with sockets, you can always open a socket pair and pass one end over the XPC connection.
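If you do go that route, here's a minimal sketch of the shape of it. The service name and protocol are hypothetical; the relevant detail is that NSXPCConnection can transfer a FileHandle, which carries its underlying descriptor, so the service ends up holding its own end of the socket pair.

import Foundation

// Hypothetical setup protocol; FileHandle travels across the
// connection along with its underlying file descriptor.
@objc protocol StreamSetup {
    func adopt(socket: FileHandle)
}

var fds: [Int32] = [0, 0]
guard socketpair(AF_UNIX, SOCK_STREAM, 0, &fds) == 0 else {
    fatalError("socketpair failed: \(String(cString: strerror(errno)))")
}
let localEnd = FileHandle(fileDescriptor: fds[0], closeOnDealloc: true)
let remoteEnd = FileHandle(fileDescriptor: fds[1], closeOnDealloc: true)

let connection = NSXPCConnection(serviceName: "com.example.fs-monitor") // hypothetical
connection.remoteObjectInterface = NSXPCInterface(with: StreamSetup.self)
connection.resume()

(connection.remoteObjectProxy as? StreamSetup)?.adopt(socket: remoteEnd)
// localEnd is now a private byte stream to the service, usable with
// DispatchIO, read(2)/write(2), and so on.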


A key question here is your definition of “real time”. IPC always has some per-message overhead, so you get more throughput if you batch your messages. However, batching introduces latency. Whether that latency matters depends on how “real” your real-time goal is.

One option that you didn’t mention is shared memory. It’s not uncommon for subsystems like this one to set up a shared memory region between the server and the client and then either use some fancy lock-free data structure or use IPC to coordinate access to that memory.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

So, the key sentence here is this (edited for clarity):

the GUI can ... but also get a real-time view of what FS events the service is busy filtering.

and then:

Could it cope with 1000s of rows of table data updates per second sent to a GUI frontend?

The issue here isn't XPC, it's your GUI. That is, what are you expecting your GUI to do with 1000s of updates per second?

What you're running into here is an example of a very common phenomenon, where a backend component of your app is able to generate "data" far faster than your GUI is capable of processing it, not processing it in the sense of "examine the data", but in the sense of "displaying useful information to the user". You can't draw to the screen 1000x per second and, more importantly, even if you COULD do so, all you'd do is create a meaningless blur of unreadable data.

However, that's not the only problem here. None of this activity is free and, in aggregate, the cost of that activity CAN add up to quite a significant overhead. It's very easy to assume that messaging is relatively "free", but I've seen MANY cases like this where the single BIGGEST performance drag on the entire process was simple message volume.

In my experience, there are basically two good solutions to this issue:

  1. (most common) Move the aggregator to the "backend". The reality is that your GUI was never going to update at "1000/s". It was going to take the data it had, condense/combine/process that data into something useful/meaningful, then update its own interface at a far slower rate (typically 10-60/s). This approach means that you move that aggregation process into the backend, then pull that data "out" of the backend at/near the same rate the interface will update. Note that "aggregation" here doesn't necessarily mean that any data has actually been "lost". For example, even if your interface IS actually receiving ALL the data (for example, something like a scrolling log list), simply changing the protocol to send a batch of updates instead of one message per update can DRAMATICALLY improve performance (see the sketch after this list).

  2. (far less common) Change the mechanism to one that removes the real "overhead". In most cases this involves some form of shared memory, but the key here is that the GUI is "spying" on the backend, NOT "messaging" with it. One classic example of this is how ring buffers are used in audio processing. No locking is involved, nor do the producer and consumer directly "interact" with each other. The producer writes to memory, the consumer reads from it, and it all works out because they've both timed their actions so they won't overlap and (hopefully) structured the data in a way that either side will be fine if "something" goes wrong. The key point here is that the producing side doesn't worry about what the consumer actually received or processed. It simply writes to memory and assumes the consumer will "figure things out".
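To make option #1 concrete, here's a rough sketch of that protocol change: the service coalesces rows and sends one XPC message per batch on a short timer, trading a little latency (50 ms here, an arbitrary choice) for throughput. All names are hypothetical.

import Foundation

// Hypothetical row type; encoded to Data because a Swift struct can't
// cross an @objc protocol boundary directly.
struct ActivityRow: Codable {
    let path: String
    let eventFlags: UInt32
}

// One call per *batch*, not one per event.
@objc protocol ActivityClient {
    func didRecord(batch: Data) // JSON-encoded [ActivityRow]
}

final class ActivityBatcher {
    private let client: ActivityClient
    private let queue = DispatchQueue(label: "activity.batcher")
    private var pending: [ActivityRow] = []
    private var timer: DispatchSourceTimer?

    init(client: ActivityClient) {
        self.client = client
        let t = DispatchSource.makeTimerSource(queue: queue)
        t.schedule(deadline: .now(), repeating: .milliseconds(50))
        t.setEventHandler { [weak self] in self?.flush() }
        t.resume()
        timer = t
    }

    func record(_ row: ActivityRow) {
        queue.async { self.pending.append(row) }
    }

    private func flush() {
        guard !pending.isEmpty,
              let data = try? JSONEncoder().encode(pending) else { return }
        pending.removeAll(keepingCapacity: true)
        client.didRecord(batch: data) // one XPC message for N events
    }
}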

Note that shared memory can be used in either of these approaches. As Quinn said:

It’s not uncommon for subsystems like this one to set up a shared memory region between the server and the client and then...

Option #2 above:

either use some fancy lock-free data structure

Option #1 above:

or use IPC to coordinate access to that memory.

Finally, I do feel obligated to comment on this:

...to keep track of FSEvents activity, at the file level (a firehose, some might say).

FSEvents is actually a CLASSIC example of design #1 above. Notably:

  • The offline event system is based on directories, not files, dramatically reducing "catch up" volume.

  • The "latency" argument to event stream creation is there so that the client can control how much "noise" the API generates.

  • Even at latency "0", I think there are other points where consolidation is occurring, so you STILL won't get EVERY event that actually occurred on the file.

This design is all intentional. We do have a true "firehose" API (kqueue), which is why FSEvents was designed as the configurable aggregator for that firehose.
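For reference, that latency knob is literally an argument to FSEventStreamCreate. A minimal sketch; the watched path and the one-second latency are placeholder choices:

import Foundation
import CoreServices

// Watch a directory with file-level events, letting FSEvents coalesce
// activity for up to 1 second before invoking the callback.
let callback: FSEventStreamCallback = { _, _, numEvents, _, _, _ in
    // Events arrive pre-aggregated; numEvents can cover many raw changes.
    print("received \(numEvents) coalesced events")
}

if let stream = FSEventStreamCreate(
    kCFAllocatorDefault,
    callback,
    nil,                                                 // context
    ["/Users/me/Documents"] as CFArray,                  // placeholder path
    FSEventStreamEventId(kFSEventStreamEventIdSinceNow),
    1.0,                                                 // latency, in seconds
    FSEventStreamCreateFlags(kFSEventStreamCreateFlagFileEvents)
) {
    FSEventStreamSetDispatchQueue(stream, DispatchQueue(label: "fsevents"))
    FSEventStreamStart(stream)
}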
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Quinn...

IPC always has some per-message overhead, so you get more throughput if you batch your messages. However, batching introduces latency. Whether that latency matters depends on how “real” your real-time goal is.

The aim is to try to get the user to believe the processing is happening in the app, instead of the background service, so any user-perceived lag would be missing the goal.

It’s not uncommon for subsystems like this one to set up a shared memory region between the server and the client and then either use some fancy lock-free data structure or use IPC to coordinate access to that memory.

This is something that I'll have to learn about. A month or two ago I had a brief search online for negative sentiment around shared memory access, trying to identify how promising this approach is for someone with no experience in it to base an entire app's functionality on. But if that's the correct approach here, then I'm willing to get it done.

With shared memory, should I be looking for Apple APIs, or will it be POSIX APIs that I might be able to learn from the Linux world? I'm still in the design phase, so details aren't really necessary right now; I'm just trying to gauge how rocky the learning path will be.

Kevin...

... what are you expecting your GUI to do with 1000s of updates per second?

Good question. 1000s is a bit of an exaggeration, but I thought it might tease out some information from Apple engineers about other people's experiences with XPC performance. As you mentioned, you've seen cases where the single biggest performance drag was simple message volume.

I understand what you're saying about displaying useful information to the user. I ran the Red Canary Endpoint Security client a while back to see how they handled the performance issues around the Endpoint Security framework, and noticed that their GUI isn't dealing particularly well with the high volume. A feature to filter the live view in the GUI would be how I'd solve the overload. Or the data may simply be sent over XPC to then be processed further as per the user's commands. But as you say, it's better to move all this to the backend.

You and Quinn mentioning shared memory is much closer to my use case. While it might be uncommon (one reason why I wasn't too keen on it), "spying" on the backend is a very accurate way of describing what I plan to do. In fact, you mention ring buffers, which is one of the data structures I'll need to spy on, though I'm not working with audio.

Thanks for your input, both of you, it's very helpful.

With shared memory, should I be looking for Apple APIs, or will it be POSIX APIs that I might be able to learn from the Linux world?

Both (-:

I think it makes sense to use XPC to set up the shared memory. That way you can coordinate that work with other housekeeping stuff that you do over XPC. However, once the shared memory is in place, you can consult cross-platform resources for suggestions on how to manage the actual sharing.
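Here's a rough sketch of that setup: the service creates a file-backed shared mapping (the classic shm_open route is variadic in C and doesn't import cleanly into Swift, and a plain file-backed mapping behaves the same way for this purpose), then hands the descriptor to the GUI over the existing connection as a FileHandle. The region name and the adoptRegion call are hypothetical.

import Foundation

let regionSize = 1 << 20 // 1 MiB; arbitrary for this sketch

// Service side: create and size the backing object.
let path = FileManager.default.temporaryDirectory
    .appendingPathComponent("activity-region").path // hypothetical name
let fd = open(path, O_CREAT | O_RDWR, 0o600)
guard fd >= 0, ftruncate(fd, off_t(regionSize)) == 0 else {
    fatalError("region setup failed: \(String(cString: strerror(errno)))")
}
let serverBase = mmap(nil, regionSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
guard serverBase != MAP_FAILED else { fatalError("server mmap failed") }

// Hand the descriptor to the GUI over the existing NSXPCConnection;
// FileHandle travels with its underlying descriptor.
let regionHandle = FileHandle(fileDescriptor: fd, closeOnDealloc: false)
// clientProxy.adoptRegion(regionHandle, length: regionSize) // hypothetical call

// Client side, on receipt: map the same object read-only and "spy".
func adoptRegion(_ handle: FileHandle, length: Int) {
    let base = mmap(nil, length, PROT_READ, MAP_SHARED, handle.fileDescriptor, 0)
    guard base != MAP_FAILED else { fatalError("client mmap failed") }
    // ... interpret `base` as the shared data structure ...
}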

IMPORTANT While doing this, be aware that Apple silicon [1] uses a much weaker memory model than Intel. If you’re reading, say, a blog post and it doesn’t talk about memory models, you can’t necessarily trust its advice.
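To illustrate why that matters, here's a sketch of a single-producer/single-consumer ring living in a region like the one mapped above, using the swift-atomics package (an assumed dependency; any atomics facility with ordering control would do). The release store pairs with the acquire load; on Intel a plain store often happens to work, on Apple silicon it may not.

import Atomics // swift-atomics package; an assumed dependency
import Foundation

// Hypothetical layout: two atomic indices at the front of the shared
// region, then `capacity` fixed-size slots (capacity a power of two).
struct SharedRing {
    let write: UnsafeAtomic<Int>
    let read: UnsafeAtomic<Int>
    let slots: UnsafeMutablePointer<UInt64>
    let capacity: Int

    init(region: UnsafeMutableRawPointer, capacity: Int) {
        let head = region.bindMemory(to: Int.AtomicRepresentation.self, capacity: 2)
        write = UnsafeAtomic(at: head)
        read = UnsafeAtomic(at: head + 1)
        slots = (region + 2 * MemoryLayout<Int.AtomicRepresentation>.stride)
            .bindMemory(to: UInt64.self, capacity: capacity)
        self.capacity = capacity
    }

    // Producer (service): write the payload *first*, then publish the new
    // index with release ordering so the consumer's acquire load sees it.
    func push(_ value: UInt64) -> Bool {
        let w = write.load(ordering: .relaxed)
        guard w - read.load(ordering: .acquiring) < capacity else { return false } // full
        slots[w & (capacity - 1)] = value
        write.store(w + 1, ordering: .releasing)
        return true
    }

    // Consumer (GUI): acquire the producer's index before touching slots.
    func pop() -> UInt64? {
        let r = read.load(ordering: .relaxed)
        guard r < write.load(ordering: .acquiring) else { return nil } // empty
        let value = slots[r & (capacity - 1)]
        read.store(r + 1, ordering: .releasing)
        return value
    }
}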

Finally, I consider shared memory an option of last resort. Before going down that path I recommend that you prototype a solution based on messages and then measure its performance. If that is Fast Enough™, you’ll save yourself a lot of grief.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] As did PowerPC, back in the day (-:

IPC always has some per-message overhead, so you get more throughput if you batch your messages. However, batching introduces latency. Whether that latency matters depends on how “real” your real-time goal is.

The aim is to try to get the user to believe the processing is happening in the app, instead of the background service, so any user-perceived lag would be missing the goal.

The key phrase there is "user-perceived lag". The actual activity volume is so much higher and faster than the user's ability to process data that any other latency is basically irrelevant.

What you're REALLY talking about here is that you'd like to create an animation that is:

- Fast enough that the user thinks things look really speedy.

- Slow enough that the user has a sense of what's going on.

- "Truth-y" enough that the contents generally correspond to whatever the user thinks is actually happening.

The word "animation" is critical here. This is NOT about "real time data"; it's about presenting an attractive experience to the user. You're thinking about this in terms of XPC/IPC overhead, but the reality is:

- XPC/IPC latency is FAR smaller than screen latency.

- The user's ability to read/process anything you put on the screen is FAR slower than the screen update latency.

There's no point trying to get your app more data faster. XPC can easily send data far faster than the screen can show it, just like the screen can show it far faster than the user could read it.

My actual suggestion here is that you START by figuring out what you want this to look like, THEN decide what data to back-fill into that experience. I would quite literally start by experimenting with Timers and fake data to figure out what looks attractive/useful/etc.
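That mock can be tiny. For instance, a throwaway SwiftUI view driven by a Timer publisher; the rate and the fake formatting are arbitrary:

import SwiftUI

// Throwaway mock: fake "activity" lines at a configurable rate, purely to
// judge what update frequency reads well. No XPC involved.
struct ActivityMock: View {
    @State private var line = ""
    let updatesPerSecond: Double = 10 // try 10, 30, 120...

    var body: some View {
        Text(line)
            .font(.system(.body, design: .monospaced))
            .onReceive(Timer.publish(every: 1 / updatesPerSecond,
                                     on: .main, in: .common).autoconnect()) { _ in
                line = String(format: "/fake/path-%04d flags=0x%06x",
                              Int.random(in: 0 ..< 10_000),
                              Int.random(in: 0 ..< 0xFFFFFF))
            }
    }
}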

I was curious enough that I actually ended up mocking this up myself using a timer, a text field, and some simple numeric formatting to fake something short and vaguely textual. Basically, ~10 updates/s was very fast but readable, ~30/s was very fast/blurry, and 120/s was a vaguely numeric blur. Do the same tests yourself, and I believe what you'll discover is that an interface that actually looks good is SO slow relative to the volume of data the ES client COULD provide that any kind of performance/bandwidth concern is totally irrelevant. The best approach here isn't the ES client sending "all" its activity to the app; it's the ES client cherry-picking the most "interesting" actions and sending those to the app at regular intervals.

I understand what you're saying about displaying useful information to the user. I ran the Red Canary Endpoint Security client a while back to see how they handled the performance issues around the Endpoint Security framework, and noticed that their GUI isn't dealing particularly well with the high volume

Note that, in my experience, many of these issues are caused by the GUI being drowned in noise, NOT because the GUI didn't have enough data to look at. That is, the GUI ends up looking slow and clunky because it's wasting its time deciding what to show "next" instead of ACTUALLY showing something. Getting the data to the app "faster" just slows the interface down more.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

Thank you very much for your insightful perspective on this.

- XPC/IPC latency is FAR smaller than screen latency

I hadn't given much thought to screen latency in relation to XPC, so thank you for highlighting this.

I was curious enough that I actually ended up mocking this up myself
~30/s was very fast/blurry,

I will definitely take your advice regarding a mock-up and timers. But even your results have put my mind at ease regarding XPC performance and what I'm hoping to achieve.

My actual suggestion here is that you START by figuring out what you want this to look like.

I'm always thinking about not so much worst-case scenarios as extreme-case scenarios. If my backend is capable of running any number of concurrent background tasks for a user, and they want to try to keep an eye on them, whether using a summary overview or multiple task detail views, I wonder how well that would perform. But as you say, it comes down to how much information the user can interpret in their perceived "real time".

Note that, in my experience, many of these issues are caused by the GUI being drowned in noise, NOT because the GUI didn't have enough data to look at.

I was thinking I might run into trouble when users attempt to interact with a table view containing historical activity, something like Instruments' detail views. If a user wants to scroll down through a fine-grained list of activity, and that's being pulled from a DB that the backend controls, the experience is probably going to be quite smooth, given the numbers you're seeing, and assuming the disk reads are sequential, which they will be.

I haven't worked in web development for over a decade, but having spent so much time in that field, developing a macOS app with a backend architecture really does appeal to me. I know a lot of apps simply don't require an architecture like this (Pages, Keynote, etc.), but add any unattended background processing and you really have to start looking at things differently.

Thanks to both of you for the time you've spent answering my questions, and for taking the time to mock something up out of curiosity.
