Is SystemMemoryReset related to my application's abnormal exit?

Hello, we are currently developing a VPN application. Recently, we have encountered several cases where the Network Extension process terminates unexpectedly. We cannot find any related crash logs on the device, but we can find system SystemMemoryReset logs. The timestamps in these logs closely match (with millisecond accuracy) the times when our VPN process terminated unexpectedly. We have a few questions:

1. How is the SystemMemoryReset event generated, and can this event be avoided?

2. When a SystemMemoryReset occurs, can it cause our VPN background process to be killed or the system to reboot?

3. If the background process can be killed, what conditions need to be met for this to happen, and what methods can we use to prevent the VPN background process from being killed?

4. Based on these logs, does our application have any related issues (the process name is CorplinkTunnel)? Do you have any suggestions for modifications?

1. How is the SystemMemoryReset event generated, and can this event be avoided?

There isn't any single case. The "SystemMemoryReset" log is a general-purpose log structure that's intended to capture the broad state of overall system memory. Exactly what data it finds and what that data means will depend entirely on what caused the log.

In the case of your logs, all three of them had some variation of this:

"eventReason" : "User reclaimable memory dropped below the limit. User reclaimable current: 59%. User reclaimable minimum: 65%",

This isn't a case I'd seen recently, and I was confused for a while. Typically in a log like this you'll see indications that processes were terminated (along with why they were), but there isn't any of that here. What caught my eye and made me dig into this deeper was the timestamps:


  "date" : "2024-06-25 23:21:08.54 +0800",
  "date" : "2024-07-01 02:46:55.41 +0800",
  "date" : "2024-06-29 02:53:53.00 +0800",

...in other words, all of these happened in "the middle of the night". What actually happened here wasn't that any single process was terminated, but instead that "all" of them were terminated by the kernel "userspace rebooting" the device. Basically, the device restarted most of the system but without doing a "full power off" (which would have fully locked the device and required a complete unlock).
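As a rough way to check your own logs for the same pattern, here is a minimal Python sketch that pulls the two percentages out of the "eventReason" string and the local hour out of each "date" value quoted above. It only handles the exact wording and timestamp format shown in this thread. The function names and the 22:00 to 05:59 "overnight" window are my own assumptions for illustration, not anything the system documents.

```python
import re
from datetime import datetime

# Values copied verbatim from the log excerpts quoted above.
EVENT_REASON = (
    "User reclaimable memory dropped below the limit. "
    "User reclaimable current: 59%. User reclaimable minimum: 65%"
)
DATES = [
    "2024-06-25 23:21:08.54 +0800",
    "2024-07-01 02:46:55.41 +0800",
    "2024-06-29 02:53:53.00 +0800",
]

# Assumption: this pattern matches only the wording seen in this one message.
REASON_PATTERN = re.compile(
    r"User reclaimable current: (\d+)%\. User reclaimable minimum: (\d+)%"
)

# Assumption: "overnight" means 22:00 to 05:59 local time. The real
# scheduling window is not stated anywhere in this thread.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 6


def parse_reclaimable(reason):
    """Return (current, minimum) percentages, or None if the wording differs."""
    match = REASON_PATTERN.search(reason)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_hour(stamp):
    """Return the hour of day in the timestamp's own UTC offset."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f %z").hour


def is_overnight(stamp):
    """True if the timestamp falls inside the assumed overnight window."""
    hour = local_hour(stamp)
    return hour >= NIGHT_START_HOUR or hour < NIGHT_END_HOUR


if __name__ == "__main__":
    print(parse_reclaimable(EVENT_REASON))
    print([(local_hour(d), is_overnight(d)) for d in DATES])
```

If a log's reason string uses different wording, `parse_reclaimable` returns `None` rather than guessing, which fits the point that what the data means depends on what caused the log.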

What's going on here is the (longstanding) solution to a fairly tricky side effect of the system's ongoing evolution. Early in iOS's history the system was simple enough and total device memory constrained enough that memory issues were (relatively) self-correcting: if "something" went wrong in any given component, eventually memory pressure would "push" the device to the point where that component would be terminated and things would go "back to normal" (until the problem happened again). Obviously these were bugs that still needed to be fixed, but the basic cycle was straightforward and the device couldn't really get "stuck" in any odd state.

However, what makes this tricky is how a few, largely positive, factors have made this much more complicated:

  1. Devices have far more memory, creating much more space for "stuff" to leak.

  2. The system has fine-grained tools for constraining the memory usage of individual system components, limiting components' ability to impact each other.

  3. There are FAR more system components, both in terms of the "total" daemons/processes/etc AND in how much variation there is between specific devices (think of all possible extension point combinations).

The combination of these factors creates a new failure case. Basically, you can end up in the state where the device is still "working" (apps still run, no visible failure, etc.) in the general sense even though large amounts of memory are being held by "stuff" that shouldn't really be holding them. This has secondary effects on the user experience (for example, suspended apps being killed more frequently than they "should" be), but not in a way that would necessarily be clear or obvious. Further complicating things, these states aren't really straightforward or predictable, as they're caused as much by the complexity created by #3 as any single bug. In many cases there are bugs involved (which we do find and fix), but the system is complex enough that simply "fixing bugs" won't actually prevent the problem from reappearing.

The solution we landed on is what's triggering this log. The system detects that the device is entering this state so it generates a log (so we can look for and fix bugs), then reboots userspace so that the device is starting with a "clean slate". The system does this in "the middle of the night" so that the user isn't aware it happened and, in practice, that generally keeps the device in a more "normal" operating configuration for long periods of time.

2. When a SystemMemoryReset occurs, can it cause our VPN background process to be killed or the system to reboot?

In this particular case, the system IS in fact rebooting but that is not true of all (or even most) of these logs.

3. If the background process can be killed, what conditions need to be met for this to happen, and what methods can we use to prevent the VPN background process from being killed?

I don't have any reason to believe your VPN client had any particular role in triggering this, particularly since the VPN extension point's limits are SO small that your system-wide impact is relatively minimal.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
