Socket Becomes Unresponsive in Local Connectivity Extension After Lock Screen

I’m developing an app designed for hospital environments, where public internet access may not be available. The app includes two components: the main app and a Local Connectivity Extension. Both rely on persistent TCP socket connections to communicate with a local server.

We’re observing a recurring issue where the extension’s socket becomes unresponsive every 1–3 hours, but only when the device is on the lock screen, even if the main app remains in the foreground. When the screen is not locked, the connection is stable and no disconnections occur.

❗ Issue Details:

• What’s going on: The extension sends a keep-alive ping packet every second, and the server replies with a pong and a system time packet.

• The bug: The server stops receiving keep-alive packets from the extension.

 • On the server, we see a gap of about 30 seconds during which no packets were received from the extension. This was confirmed via server logs and Wireshark.

 • On the extension, from our logs there was no gap in sending packets. From its perspective, all packets were sent with no error.

 • Because the server receives no packets, it sends none back to the extension. Eventually the server closes the connection due to the keep-alive timeout.

 • FYI we log when the NEAppPushProvider subclass sleeps and it did NOT go to sleep while we were debugging.
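
A minimal Python sketch of the server-side timeout behavior described above. It is not the real server code: the class name `KeepAliveWatchdog` is hypothetical, and the 16 second warning and 30 second close thresholds are assumptions inferred from the server log messages in this post.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveWatchdog:
    """Tracks keep-alive silence for one connection (hypothetical helper)."""
    warn_after: float = 16.0   # assumption: taken from the "16 seconds" log line
    close_after: float = 30.0  # assumption: taken from the ~30 second gap
    last_seen: float = 0.0
    warned: bool = False

    def on_keep_alive(self, now: float) -> None:
        # A ping arrived: reset the silence window and the warning flag.
        self.last_seen = now
        self.warned = False

    def check(self, now: float) -> str:
        # Returns "close", "warn" (once per silence period), or "ok".
        silence = now - self.last_seen
        if silence >= self.close_after:
            return "close"
        if silence >= self.warn_after and not self.warned:
            self.warned = True
            return "warn"
        return "ok"
```

With pings arriving every second, `check()` keeps returning "ok"; it only reaches "close" when the pings stop for the full timeout, which is the situation the server log shows.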

🧾 Example Logs:

Extension log:

2025-03-24 18:34:48.808 sendKeepAliveRequest()

2025-03-24 18:34:49.717 sendKeepAliveRequest()

2025-03-24 18:34:50.692 sendKeepAliveRequest()

... // continuous sending of the ping packet to the server, no problems here

2025-03-24 18:35:55.063 sendKeepAliveRequest()

2025-03-24 18:35:55.063 keepAliveTimer IS TIME OUT... in CoreService. // this is triggered because we did not receive any packets from the server

Server log:

2025-03-24 18:34:16.298 No keep-alive received for 16 seconds... connection ID=95b3... // this shows that no packets have been received from the extension ...

2025-03-24 18:34:30.298 Connection timed out on keep-alive. connection ID=95b3... // eventually closes due to no packets being received

2025-03-24 18:34:30.298 Remote Subsystem Disconnected {name=iPhone|Replica-Ext|...}

✅ Observations:

• The extension process continues running and logging keep-alive attempts.
• However, network traffic stops reaching the server, and no inbound packets are received by the extension.
• It looks like the socket becomes silently suspended or frozen, without being properly closed or throwing an error.

❓Questions:

• Do you know why this might happen within a Local Connectivity Extension, especially when the main app is in the foreground and the device is locked?
• Is there any known system behavior that might cause the socket to be suspended or blocked in this way after running for a few hours?

Any insights or recommendations would be greatly appreciated.

Thank you!

Correction for the original post:

"On the extension, from our logs there was no gap in sending packets. From its perspective, all packets were sent with no error."

This is wrong. There is a gap in the sendKeepAlive messages from the extension to the server, and this gap is what we want to understand:

2025-03-24 18:34:50.692 sendKeepAliveRequest()

// gap of about 64 seconds

2025-03-24 18:35:55.063 sendKeepAliveRequest()
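
A gap like this can be located mechanically in a longer extension log. The following is a minimal Python sketch, assuming every relevant line starts with a timestamp in the format shown above and contains `sendKeepAliveRequest()`; the function name `find_gaps` and the 5 second threshold are hypothetical choices for illustration.

```python
from datetime import datetime


def find_gaps(lines, threshold_s=5.0):
    """Return (previous, current, gap_seconds) for consecutive
    sendKeepAliveRequest() entries more than threshold_s apart."""
    stamps = []
    for line in lines:
        line = line.strip()
        if "sendKeepAliveRequest()" not in line:
            continue
        # The first 23 characters hold the timestamp,
        # e.g. "2025-03-24 18:34:50.692".
        stamps.append(datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f"))
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = (cur - prev).total_seconds()
        if delta > threshold_s:
            gaps.append((prev, cur, delta))
    return gaps


log = [
    "2025-03-24 18:34:48.808 sendKeepAliveRequest()",
    "2025-03-24 18:34:49.717 sendKeepAliveRequest()",
    "2025-03-24 18:34:50.692 sendKeepAliveRequest()",
    "2025-03-24 18:35:55.063 sendKeepAliveRequest()",
]
for prev, cur, delta in find_gaps(log):
    print(f"{prev.time()} -> {cur.time()}: {delta:.3f} s without a send")
```

Run against the four log lines quoted in this thread, it reports a single gap, between the 18:34:50.692 and 18:35:55.063 entries.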


A few questions and speculations here:

  • What network(s) have you reproduced this on? Are you seeing this on multiple networks (particularly controlled networks with very simple configurations) or on a specific network?

  • Is your server located on the same local network as the clients, or is it on a remote network or otherwise isolated from the clients?

  • Is there an intermediate NAT server involved?

  • What happens when the client identifies the problem and reconnects? Is it able to reconnect immediately or is there some delay/issue?

This is just a guess, but one issue I've seen is that poorly behaved NAT servers will basically cause EXACTLY the behavior you're seeing. What they actually do is terminate the outer connection (from the server to the NAT router) but do NOT notify or otherwise disturb the inner connection (local device to NAT router). That means the server "knows" about the failure immediately but the client won't find out until it sends data and times out (or misses some expected transmission).
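
To make the asymmetry concrete, here is a toy Python model of that NAT scenario. It is only an illustration under stated assumptions, not a description of any real NAT implementation: `ToyNat`, `simulate`, the one-tick-per-second loop, and the numbers (mapping dropped at t=10, client gives up after 5 seconds without a pong) are all hypothetical.

```python
class ToyNat:
    """Toy NAT: forwards client packets until it drops the mapping."""

    def __init__(self, drop_at):
        self.drop_at = drop_at  # hypothetical time the mapping is dropped

    def forwards(self, now):
        return now < self.drop_at


def simulate(drop_at=10, pong_timeout=5, duration=30):
    """Return (time the server learns of the failure,
    time the client learns of the failure)."""
    nat = ToyNat(drop_at)
    last_pong = 0
    server_sees_failure = None
    client_sees_failure = None
    for now in range(1, duration + 1):
        # The client sends one ping per tick. The send always succeeds
        # locally because the inner connection is never disturbed.
        if nat.forwards(now):
            last_pong = now  # server answers with a pong right away
        elif server_sees_failure is None:
            # The outer connection was torn down, so the server is told.
            server_sees_failure = now
        # The client only notices through the missing pongs.
        if client_sees_failure is None and now - last_pong >= pong_timeout:
            client_sees_failure = now
    return server_sees_failure, client_sees_failure


print(simulate())
```

In this model the server learns about the failure at the tick where the mapping disappears, while the client keeps sending without errors and only finds out once its own pong timeout expires.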

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
