Issue with app intermittently not waking up on PushKit (VoIP) pushes

I am developing a VoIP service. Normally, when a VoIP push arrives, the app wakes up and the CallKit UI is shown immediately. However, very intermittently (and not reliably reproducible), the app does not wake up when a VoIP push is sent to it. Only after a long time does the app wake up and CallKit activate. (A long time after receiving the call…)

Has anyone experienced the above phenomenon? I wonder whether there are known reports tied to specific OS versions. (So far I have not seen it on 17.x, but I can't be sure because it occurs so rarely.)

The app is not running in the background, but... Could this be happening if there are a lot of pending operations in the background?

I need help urgently

Answered by DTS Engineer in 826466022

I am developing a VoIP service. Normally, when a VoIP push arrives, the app wakes up and the CallKit UI is shown immediately. However, very intermittently (and not reliably reproducible), the app does not wake up when a VoIP push is sent to it. Only after a long time does the app wake up and CallKit activate. (A long time after receiving the call…)

Has anyone experienced the above phenomenon?

Yes. What you're describing is long-standing behavior common to basically "all" VoIP apps. It's caused by the interaction between two different behaviors:

  1. Network-level issues can and will prevent normal push delivery. The EXACT cause varies widely, but all of them are outside the app's control.

  2. Due to the way APNs queues messages for delivery, it's possible for a push to be queued for delivery while the device was disconnected and then delivered well after its expiration.

If you haven't already, I would encourage you to file a bug on #2, as the best behavior here would be for the system to simply discard these expired pushes. However, until that happens, your only option is to report an incoming call and then immediately end it.

I wonder whether there are known reports tied to specific OS versions. (So far I have not seen it on 17.x, but I can't be sure because it occurs so rarely.)

I have two answers when it comes to identifying the "reason" behind this kind of failure:

  1. IF you've identified a specific network where this is happening AND that network is critical to your business AND you have some ability to influence/control that network, then it is possible to determine and resolve these issues. Doing so generally requires very in-depth analysis of the ENTIRE system, including the physical space, network infrastructure, user habits, etc. The key here is to start with the assumption that the problem is some external factor, NOT iOS or your app.

  2. If you cannot meet the criteria of #1, then don't waste time thinking or trying to investigate it.

The problem here is that these issues are not "random", in that "something" is nearly always happening. However, that "something" is basically always some external factor like:

  • A particular network configuration is somewhat broken or at least "odd".

  • The user takes the subway every Thursday.

  • Maintenance put in new venting directly in front of a critical AP.

...but those factors aren't something you'll find without REALLY digging into a specific failure in great detail, often in person and directly interacting with users. If you spend a lot of time focused entirely on your logging or other app/system level data, you'll find plenty of patterns... most of which are either noise or unrelated correlations.

The app is not running in the background, but... Could this be happening if there are a lot of pending operations in the background?

No.

And additionally, is there a way to actually check the time when the APNs VoIP Push was sent to the terminal?

Take a look at the "Push Notifications Console".

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

And additionally, is there a way to actually check the time when the APNs VoIP Push was sent to the terminal? I couldn't find it even after looking for Production. I wonder if there is a way.

Thank you for your response. It was helpful.

The core issue is that VoIP push notifications are not being received by the app. Although the server sends APNs VoIP push notifications, the app does not receive them immediately. (Occurs approximately once in 50 attempts.)

As you suggested, I separated the cases and identified the reproduction path. The reproduction steps are as follows:

  1. Both Wi-Fi and Cellular are turned on. (In some cases, Wi-Fi is connected, but the Wi-Fi icon does not appear as active.)
  2. The Wi-Fi network is unstable or problematic (e.g., public or secured Wi-Fi networks), while cellular connectivity is good.
  3. The server sends an APNs VoIP push notification, but the app does not receive it.
  4. After a short delay, the push notification is eventually received.
  5. (This case occurs with a 90% probability when Airplane Mode is toggled from Off to On and the push notification is received for the first time.)

From this case, it seems that all iOS apps fail to receive push notifications under these conditions. Once the device returns to a state where it can receive push notifications, the pending push notification is delivered.

Questions: Is there a way to ensure that APNs push notifications are received via Cellular Mode?

In the described case, it appears that if push notifications cannot be received via Wi-Fi, they are held until Cellular Mode becomes available, at which point they are delivered to the app. If a push notification is pending under such conditions, we would rather cancel the sent push notification. Is this possible?

First off, anecdotally, the sandbox (vs production) push system seems somewhat more susceptible to this sort of failure. From looking at lots of logs, the underlying issue seems to be that the system prioritizes establishing the production connection (for obvious reasons) and under the right/wrong conditions that can mean the sandbox connection ends up offline much longer than the production connection. That doesn't mean you won't see the same issues in production (you absolutely will), but my experience is that you will see this "more" in the sandbox.

Next, moving to here:

After a short delay, the push notification is eventually received.

How short is "short"? More broadly, what's the delay that's actually problematic to you?

My general experience here is that VoIP push latency from server submission to end device is ~4s and that anything less than ~10s should be considered "normal" behavior. Much higher delays than that are absolutely possible (see below), but that 4-10s range is what I recommend developers treat as "expected".
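As a rough illustration of those thresholds, here is a minimal Python sketch for bucketing latencies in your own server-side logs. It assumes you already record a submission timestamp and a device-reported receipt timestamp from reasonably synchronized clocks; the function name, the constants, and the bucket labels are all hypothetical, and the numbers are just the rules of thumb quoted above, not documented limits.

```python
# Rule-of-thumb figures taken from the reply above; not documented APNs limits.
TYPICAL_SECONDS = 4.0
NORMAL_LIMIT_SECONDS = 10.0


def classify_latency(sent_at: float, received_at: float) -> str:
    """Bucket one push by its submission-to-receipt latency in seconds.

    Both arguments are Unix timestamps. Clock skew between the server and
    the device will distort the result, so treat the buckets as approximate.
    """
    latency = received_at - sent_at
    if latency < 0:
        raise ValueError("received_at is earlier than sent_at (clock skew?)")
    if latency <= TYPICAL_SECONDS:
        return "typical"
    if latency < NORMAL_LIMIT_SECONDS:
        return "normal"
    return "delayed"


if __name__ == "__main__":
    for sent, received in [(100.0, 103.2), (100.0, 108.0), (100.0, 160.0)]:
        print(received - sent, classify_latency(sent, received))
```

Only pushes that land in the "delayed" bucket are worth a closer look, and per the earlier reply, usually only when they cluster on a network you can actually influence.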

Next moving to here:

Questions: Is there a way to ensure that APNs push notifications are received via Cellular Mode?

No, and it wouldn't really help. The system actually does try to maintain connections on both interfaces, but that isn't always possible and, more to the point, it only helps if cellular connectivity is reliable at the point of failure. That's far from guaranteed.

Reordering slightly:

If a push notification is pending under such conditions, we would rather cancel the sent push notification. Is this possible?

The big issue here is that you need to send VoIP pushes with "apns-expiration = 0". The details are below, but any non-zero expiration will dramatically increase the likelihood of pushes arriving much later than you'd prefer.
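For reference, a minimal Python sketch of where that header goes on the provider side. It only builds the pieces of an APNs HTTP/2 request (URL, headers, body) and does not send anything; the function name, its parameters, and the example payload key are hypothetical, and the provider token is assumed to have been generated elsewhere.

```python
import json

# APNs provider API hosts for the two environments.
APNS_HOSTS = {
    "production": "api.push.apple.com",
    "sandbox": "api.sandbox.push.apple.com",
}


def build_voip_request(device_token, bundle_id, provider_jwt, payload,
                       environment="production"):
    """Return (url, headers, body) for a VoIP push with apns-expiration 0.

    Hypothetical helper: sending still requires an HTTP/2 client, and
    provider_jwt is assumed to be a valid provider authentication token.
    """
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    headers = {
        "authorization": f"bearer {provider_jwt}",
        "apns-push-type": "voip",
        # VoIP pushes use the app's bundle ID with a ".voip" suffix as topic.
        "apns-topic": f"{bundle_id}.voip",
        "apns-priority": "10",
        # 0 asks APNs not to store the push for later delivery.
        "apns-expiration": "0",
    }
    url = f"https://{APNS_HOSTS[environment]}/3/device/{device_token}"
    return url, headers, body


if __name__ == "__main__":
    url, headers, body = build_voip_request(
        "abc123", "com.example.app", "<provider-jwt>", {"call_id": "42"},
        environment="sandbox",
    )
    print(url)
    print(headers["apns-expiration"], headers["apns-topic"])
```

Building the request separately from sending it makes it easy to log or assert on the exact headers your server used for a call that later turned out to be delayed.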

In the described case, it appears that if push notifications cannot be received via Wi-Fi, they are held until Cellular Mode becomes available, at which point they are delivered to the app.

In very broad terms, the delivery flow on the server works something like this:

  1. Check whether there is an "open" connection to the device.
  2. Yes-> place push into "delivery" queue.
  3. No-> place push into "pending" queue.

Architecturally, what "apns-expiration = 0" basically means is "don't put this push into the pending queue". In addition, the expiration time is assessed when the payload enters the pending queue or when it's moved from the pending queue into the delivery queue. It is NOT assessed at the point the packet actually leaves the delivery queue.

So, if a payload was placed into the delivery queue and the connection is then disrupted, it can create a situation where that payload is delivered later, after the connection is reestablished.
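To make that failure mode concrete, here is a toy Python model of the flow as described above. It is only a sketch of the description in this thread, not how APNs is actually implemented; the class names, fields, and the use of plain integer timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Push:
    push_id: str
    expiration: int  # toy stand-in for apns-expiration; 0 means "do not store"


@dataclass
class ToyServer:
    """Toy model of the delivery/pending queues described in this thread."""
    connected: bool = True
    delivery: list = field(default_factory=list)
    pending: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    def submit(self, push, now):
        if self.connected:
            # The connection looks open: straight into the delivery queue.
            self.delivery.append(push)
        elif push.expiration == 0 or push.expiration <= now:
            # Expiration is checked on entry to the pending queue.
            self.dropped.append(push)
        else:
            self.pending.append(push)

    def reconnect(self, now):
        self.connected = True
        # Expiration is checked again when moving pending -> delivery.
        for push in self.pending:
            if push.expiration > now:
                self.delivery.append(push)
            else:
                self.dropped.append(push)
        self.pending.clear()

    def flush(self):
        # No expiration check when leaving the delivery queue.
        if not self.connected:
            return []
        sent, self.delivery = self.delivery, []
        return sent


if __name__ == "__main__":
    server = ToyServer(connected=True)
    server.submit(Push("call-1", expiration=0), now=0)  # queued for delivery
    server.connected = False                            # link drops first
    print(server.flush())                               # nothing goes out
    server.reconnect(now=120)
    print([p.push_id for p in server.flush()])          # "call-1" arrives late
```

In this model the push with expiration 0 is never stored in the pending queue, yet it still arrives two minutes late because it was already sitting in the delivery queue when the connection dropped.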

If you're experiencing this case, then I would strongly recommend that you file a bug report on the issue. Other than that, all you can really do about this case is report the dead call and end it immediately.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
