Notifications False Success Delivery

We operate a social network application, SportsYou, with over 3 million monthly active users, and we are experiencing significant issues with push notification delivery through APNs.

We have a large number of users reporting they are not receiving push notifications. Our infrastructure uses AWS SNS integrated with APNs to deliver notifications. However, AWS CloudWatch consistently reports successful delivery (Success response), even though users confirm they never received the notifications.

Because we receive success responses from AWS SNS, our system does not attempt to recreate or refresh the device endpoints. This leaves us unable to detect or recover from these delivery failures automatically.
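For illustration, here is a minimal sketch of an endpoint check that does not depend on a delivery failure being reported. It assumes the app sends its current device token to the backend on every launch, and that `attributes` is the dictionary returned by the SNS GetEndpointAttributes action, where `Token` and `Enabled` are string attributes. The function name `endpoint_needs_update` is hypothetical.

```python
def endpoint_needs_update(attributes: dict, current_token: str) -> bool:
    """Return True when a stored SNS endpoint should be refreshed.

    `attributes` is assumed to mirror the SNS endpoint attributes, where
    "Enabled" is the string "true" or "false" and "Token" is the device
    token the endpoint was created with.
    """
    # SNS stores "Enabled" as a string; a missing value is treated as disabled.
    enabled = attributes.get("Enabled", "false").lower() == "true"
    # A token that differs from the one the app just reported is stale.
    token_matches = attributes.get("Token") == current_token
    return (not enabled) or (not token_matches)
```

When this returns True, the backend would call SetEndpointAttributes with the current token and `Enabled` set to "true". Running the check on every app launch is a design assumption here, not something the success metrics in CloudWatch would trigger on their own.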

This issue is widespread and inconsistent. It affects users across different iOS versions, different device models, and different versions of our application. We cannot identify a clear pattern that would help us isolate the root cause.

With millions of active users, even a small percentage of delivery failures represents thousands of users experiencing a degraded service. This is significantly impacting user engagement and satisfaction.

We need guidance on how to properly diagnose this issue and ensure reliable notification delivery to our users. Specifically, we'd like to understand why we're receiving success responses when notifications aren't being delivered, and what steps we can take to detect and prevent these failures.

The success response from APNs only indicates that the push request was accepted. It says nothing about what happens to the notification after that: neither whether it was actually delivered to the end device, nor whether, once delivered, it was processed or displayed.
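As a rough sketch of what this means on the provider side, the helper below maps an APNs HTTP/2 response to an action. The status codes and reason strings are the ones Apple documents for APNs responses. The function name and the action labels are hypothetical, and a provider that sends through AWS SNS would not see these responses directly.

```python
from typing import Optional


def classify_apns_response(status: int, reason: Optional[str] = None) -> str:
    """Map an APNs response to a provider-side action (illustrative labels)."""
    if status == 200:
        # Accepted by APNs only. This is not a delivery or display receipt.
        return "accepted"
    if status == 410 or reason in ("BadDeviceToken", "Unregistered"):
        # The token is invalid or no longer active for this app.
        return "drop_token"
    if status in (429, 500, 503):
        # Throttling or a server-side problem: the same request may succeed later.
        return "retry_later"
    # Any other 4xx points at the request itself (payload, headers, auth).
    return "fix_request"
```

Note that the 200 branch is labeled "accepted" rather than "delivered": nothing in the response distinguishes a notification that reached the device from one that did not.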

There are many reasons a notification will not be seen by the user. The most common are the device being offline (or unable to keep a persistent connection to APNs), the wrong device token being used, the user having turned off notifications, and malformed payloads that cannot be displayed.

If the issue is widespread and seemingly random, the most likely explanation (and the most common cause) is that the device is not connected to APNs. In those cases the notification is stored until the device reconnects, but a second notification sent to the same device before that happens will overwrite the first one. As I don't know how you are determining whether notifications were delivered, I can't say whether that is indeed the problem. The second most common cause is users knowingly or accidentally turning off or silencing notifications for your app.
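The overwrite behavior described above can be pictured with a toy model. This is only an illustration of the stated rule (one stored notification per device token, with a newer one replacing the older), not a description of how APNs is implemented. The class and method names are made up for the example.

```python
class OfflineStore:
    """Toy model: at most one pending notification is kept per device token."""

    def __init__(self) -> None:
        self._pending = {}  # device token -> most recent payload

    def send(self, token: str, payload: str) -> None:
        # A newer notification replaces any one already waiting for this token.
        self._pending[token] = payload

    def reconnect(self, token: str) -> list:
        # On reconnect, only the single stored notification (if any) is delivered.
        payload = self._pending.pop(token, None)
        return [] if payload is None else [payload]


store = OfflineStore()
store.send("token-1", "first message")
store.send("token-1", "second message")  # overwrites "first message"
delivered = store.reconnect("token-1")   # ["second message"]
```

Both sends would have been reported as successful, yet the user only ever sees the second message.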

Indeed, when I check our logs for a sample of notifications for the app you mentioned, I see that the top reason for non-delivery is the device being offline (in this context, offline means the device is not connected to APNs, or that you are using an old token which has since changed). The second most common reason is the device not being in a state to receive the notification.

That said, I noticed that the share of your notifications that go undelivered because the target device is not connected is higher than what we usually see. So it is possible that you have device tokens in your database that no longer point to active devices. I do see a certain number of stale tokens being used.

If you would like us to take a deeper look into why a particular notification was not delivered, please share the apns-id of a recent notification (from the last couple of days) and we can check what happened to it.

Thank you for the detailed explanation regarding notification delivery and the analysis of our undelivered notifications. Based on your feedback, it's clear that we have a higher than normal rate of device tokens pointing to devices that are not connected to APNs, which suggests we may indeed have stale tokens in our database. To better address this issue, I have a few questions about token validation:

  1. Is there a recommended approach or API call from APNs to proactively validate whether a device token is still associated with an active device before sending notifications? We could call it whenever the app starts.

  2. Since we're using AWS SNS as our provider, are there any specific AWS CLI commands or API calls you'd recommend to help us identify and manage stale tokens?

  3. What would be considered a best practice for token lifecycle management? For instance, should we be implementing periodic health check notifications to validate tokens, or is there a more efficient method you'd recommend?

We'd appreciate any guidance on strategies to maintain a clean and active token database to improve our overall delivery rates.

Thank you again for your assistance.
