We operate a social network application, SportsYou with over 3 million monthly active users and are experiencing significant issues with push notification delivery through APNs.
We have a large number of users reporting they are not receiving push notifications. Our infrastructure uses AWS SNS integrated with APNs to deliver notifications. However, AWS CloudWatch consistently reports successful delivery (Success response), even though users confirm they never received the notifications.
Because we receive success responses from AWS SNS, our system does not attempt to recreate or refresh the device endpoints. This leaves us unable to detect or recover from these delivery failures automatically.
This issue is widespread and inconsistent. It affects users across multiple variables including different iOS versions, different device models, and different versions of our application. We cannot identify a clear pattern that would help us isolate the root cause.
With millions of active users, even a small percentage of delivery failures represents thousands of users experiencing a degraded service. This is significantly impacting user engagement and satisfaction.
We need guidance on how to properly diagnose this issue and ensure reliable notification delivery to our users. Specifically, we'd like to understand why we're receiving success responses when notifications aren't being delivered, and what steps we can take to detect and prevent these failures.