Safari 18+ network bug - randomly - The network connection was lost

We are experiencing an issue with Safari in all versions from 18.0 to 18.5 that does not occur in version 17. It affects both iPhones and Macs, and it does not happen in Chrome or on Windows.

The problem is impacting our customers, and our monitoring tools show a dramatic increase in error volume as more users buy/upgrade to iOS 18.

The issue relates to network connectivity that is lost randomly. I can reliably reproduce the issue online in production, as well as on my local development environment.

For example, our website back office sends a ping every X seconds, and user actions such as increasing the quantity of an item in the cart trigger backend validation. When these requests happen at certain intervals, the issue is noticeable.

To test this, I ran a JavaScript snippet that simulates a ping: a timer calls a local-dev API (a probe that waits 2s to simulate "work") and delays each subsequent HTTP request by a dynamic value to simulate network conditions:

Note: To make the issue even clearer, I'm using GET with a Content-Type: application/json header so that the request is not a CORS "simple" request and requires a preflight request, which doubles the chance of hitting the issue.
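To illustrate why that header matters, here is a minimal sketch of the preflight decision. The function name `needsPreflight` is hypothetical, and the check is deliberately simplified: it only looks at the method and header names, and it treats every header outside a small safelist (including Range) as requiring a preflight, whereas the Fetch standard has additional conditions.

```javascript
// Simplified approximation of the CORS "simple request" rules.
// Assumption: only method and header names / Content-Type are inspected.
const SAFELISTED_METHODS = new Set(['GET', 'HEAD', 'POST']);
const SAFELISTED_HEADERS = new Set([
  'accept',
  'accept-language',
  'content-language',
  'content-type',
]);
const SAFELISTED_CONTENT_TYPES = new Set([
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
]);

function needsPreflight(method, headers = {}) {
  if (!SAFELISTED_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    // Any author-set header outside the safelist triggers a preflight.
    if (!SAFELISTED_HEADERS.has(lower)) return true;
    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=utf-8".
      const essence = String(value).split(';')[0].trim().toLowerCase();
      if (!SAFELISTED_CONTENT_TYPES.has(essence)) return true;
    }
  }
  return false;
}

console.log(needsPreflight('GET', { 'Content-Type': 'application/json' })); // true
console.log(needsPreflight('GET', { 'Content-Type': 'text/plain' }));       // false
```

With application/json the browser sends an OPTIONS request first, so each iteration of the test uses the connection twice.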

(async () => {
  for (let i = 0; i < 30; i++) {
    try {
      console.log(`Request start ${i} ${new Date().toLocaleString()}`);
      const res = await fetch(`https://api.redated.com:8090/1/*****/probe?`, {
        method: 'GET',
        mode: 'cors',
        //headers: {'Content-Type': 'text/plain'},
        // A non-safelisted Content-Type forces a CORS preflight.
        headers: { 'Content-Type': 'application/json' },
      });
      console.log(`Request end ${i} ${new Date().toLocaleString()} status:`, res.status);
    } catch (err) {
      console.error(`Request ${i} ${new Date().toLocaleString()} error:`, err);
    }
    // Wait just under 1s (1000 ms minus 0-9 ms) before the next request.
    const delta = Math.floor(Math.random() * 10);
    console.log("wait delta", delta);
    await new Promise(r => setTimeout(r, 1000 - delta));
  }
})();

For simplicity, let's look at a case where only 1 out of 10 requests fails. (Adjusting the "delta" variable in the time interval creates more or fewer errors.)

These are the results:

Safari reports a "The network connection was lost" error, which is false, since this is running on my local machine. It happens many times and is very reproducible both locally and in production.

The dev tools Network tab shows empty values for the status, error, IP, connection ID, etc. It looks as if the request is being terminated very early.

Later I did detailed debugging with Safari and Wireshark to nail down the network flow of the problem:

I will explain what this means:

Frame 10824 – 18:52:03.939197: New connection initiated (SYN, ACK, ECE).

Frame 10831 – 18:52:04.061531: Client sends payload (preflight request) to the server.

Frame 10959 – 18:52:09.207686: Server sends the preflight response to the client.

Frame 10960 – 18:52:09.207856: Client acknowledges (ACK) receipt of the preflight response.

Frame 10961 – 18:52:09.212188: Client sends the actual request payload after preflight OK and then server replies with ACK.

Frame 11092 – 18:52:14.332951: Server sends the final payload (main request response) to the client.

Frame 11093 – 18:52:14.333093: Client acknowledges the final server response, which marks the successful completion of the main request.

Frame 11146 – 18:52:15.348433: [IMPORTANT] The client sends a new request about one second later, which is extremely close to the keep-alive timeout of 1 second. The last message from the server was at 18:52:14.332951, so the connection’s keep-alive timeout is expected to expire around 18:52:15.332951, but the server has not yet closed the connection at that point. The new request is sent at 18:52:15.348433, about 15 ms after the expected timeout. The request leaves before the browser knows the connection is closed, but by the time it arrives at the server, the connection is already dead.

Frame 11147 – 18:52:15.356910: The server finally sends the FIN, ACK to indicate the connection is closed. This happens about 24 ms later than expected (18:52:15.356910 versus the expected 18:52:15.332951). The FIN, ACK corresponds to sequence 1193 from the ACK of the last data packet in frame 11093.
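The timing gaps in the capture can be checked with a few lines of arithmetic. This is only a sketch using the three timestamps quoted above; the helper name `toMicros` is hypothetical.

```javascript
// Convert "HH:MM:SS.ffffff" to integer microseconds to avoid float rounding.
function toMicros(ts) {
  const [h, m, rest] = ts.split(':');
  const [s, frac] = rest.split('.');
  const wholeSeconds = (Number(h) * 60 + Number(m)) * 60 + Number(s);
  return wholeSeconds * 1_000_000 + Number(frac.padEnd(6, '0'));
}

const lastResponse = toMicros('18:52:14.332951'); // frame 11092
const nextRequest = toMicros('18:52:15.348433');  // frame 11146
const serverFin = toMicros('18:52:15.356910');    // frame 11147

// Idle time on the connection when the client reused it.
console.log('idle before reuse (ms):', (nextRequest - lastResponse) / 1000); // 1015.482
// Idle time when the server actually closed the connection.
console.log('idle before FIN (ms):', (serverFin - lastResponse) / 1000);     // 1023.959
// Window between the reused request leaving and the FIN being sent.
console.log('request to FIN (ms):', (serverFin - nextRequest) / 1000);       // 8.477
```

So the request was sent roughly 15 ms past the nominal 1 s timeout and about 8.5 ms before the server's FIN, ACK appeared in the capture.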

Conclusions: The root cause is a timing issue in network handling. When the server uses keep-alive with a short keep-alive timeout (in this case 1s), Safari can reuse a connection that the server is closing, and it does not retry. In this situation the browser should retry the request, which is what other browsers do and what Safari did before version 18, since it did not suffer from this issue.

This behaviour must differ from previous Safari versions (however, I read all the public change logs and could not identify the change that caused the regression).

It is also more pronounced with HTTP/1.1 connections because of how keep-alive is handled.

The probability of failure is high when the server is configured with a short keep-alive timeout of 1 second and requests are sent at roughly one-second intervals, such as API pings at fixed intervals or user actions like incrementing a cart quantity that trigger backend calls.

This effect is even more apparent when a POST request uses a preflight, because that doubles the chance of failure, although GET requests are also affected.

This was just a test case, but in production our monitoring tools started to detect a big increase in this network error at scale, across many requests per day. This is very disruptive, because user actions are randomly dropped when they happen to fall just around the moment the keep-alive timeout of a previous connection kicks in. Because the browser has not yet been notified, it reuses the same connection, but by the time the request arrives, the server has already closed it. Safari does nothing about it: it does not retry, whether the request is a preflight or not, and just reports this error.
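Until the browser behaviour changes, one possible application-level mitigation is to retry once when fetch() rejects with a network error. The following is only a sketch: `fetchWithRetryOnce` and its `fetchImpl` parameter are hypothetical names, it assumes the request body (if any) is a reusable value such as a string, and it assumes the endpoint tolerates a repeated request, since the page cannot tell whether the server processed the first attempt.

```javascript
// Retry a request exactly once if fetch() rejects with a network error.
// Assumption: the endpoint is safe to call twice (for example it is
// idempotent or protected by a server-side idempotency key).
async function fetchWithRetryOnce(url, options = {}, fetchImpl = globalThis.fetch) {
  try {
    return await fetchImpl(url, options);
  } catch (err) {
    // fetch() signals network failures with a TypeError; other errors
    // (such as an AbortError from an AbortController) are rethrown as-is.
    if (!(err instanceof TypeError)) throw err;
    return fetchImpl(url, options);
  }
}

// Usage (hypothetical endpoint):
// const res = await fetchWithRetryOnce('/api/cart', { method: 'GET' });
```

The retry normally goes out on a fresh connection, because the browser has by then learned that the old one is closed.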

Other browsers don't have this issue.

Thanks!

Would you file a feedback for this issue and post the feedback number? Does the server response contain a Keep-Alive: timeout=1 header field?

We recommend adopting HTTP/2 and HTTP/3. For HTTP/1, use a keep alive timeout of 30s or longer on the server side.

Hello,

Yes, I already have a feedback ID: FB19582986

Yes, the server has a short keep-alive timeout of 1s, as explained. It has always worked fine in other browsers (Chrome, Android, etc.) and even Safari versions before 18.

Our business case relies on this short timeout to improve API response speed for quick user actions or specific endpoints; we don't want to keep connections alive longer than needed.

Also, changing the timeout to 30s doesn’t fully solve the issue; it only postpones it. If a user action triggers a request just before the timeout, the connection may already be closed when it reaches the server, and the request will still fail.

We’re aware of HTTP/2 and 3, but that’s not the topic here. This behavior worked before Safari 18 and continues to work on all other platforms. You can’t “break the internet” like this; I can’t be the only one affected.

Thanks.

Anyone? It looks like no one cares. Is there a more specific place where I can discuss this issue? I already opened the feedback and have had no reply whatsoever.

Safari does not reuse an HTTP/1 connection after it is idle for more than 29s, so increasing the idle timeout to more than 30s should resolve the issue. Leaving idle connections longer improves the responsiveness of the service.
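As an illustration of that recommendation, the sketch below raises the idle timeout on a Node.js HTTP server. The server stack used in this thread is not stated, so Node.js, the 35 s and 40 s values, and the handler are assumptions chosen only to show the shape of the setting.

```javascript
// Sketch only: assumes a Node.js backend, which the thread does not confirm.
const http = require('node:http');

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ ok: true }));
});

// Keep idle HTTP/1.1 connections open longer than 30 s, so the server
// does not close a connection that the client still considers reusable.
server.keepAliveTimeout = 35_000; // milliseconds
// Commonly set higher than keepAliveTimeout.
server.headersTimeout = 40_000; // milliseconds

// server.listen(8090); // uncomment to actually serve requests
```

Other servers expose an equivalent idle or keep-alive timeout setting under a different name.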

The reason I asked about the Keep-Alive response header field was that the header field affects client behavior and might explain the difference in iOS 18. Does the server send that particular header in the response?

GET requests are automatically retried if the connection is lost before a response is received. POST requests are not idempotent so they cannot be retried. This has always been the case in Safari.

Also GET requests are not allowed to contain a body. I do not think it is possible to add a JSON body to GET. Would you confirm whether the request has a body?

  1. We use short Keep-Alive to limit idle connections to a few seconds for quick API bursts, then close them to avoid wasting server resources.

  2. About idempotent requests: I'm aware of the specification. However, even if GETs are being retried, the issue also occurs with the "connection is lost" error when the GET includes Content-Type: application/json. That header turns the request into a CORS “non-simple” request with a preflight; Safari reuses a closed connection for the preflight, the preflight fails, and the GET request fails entirely.

  3. POST should also be retried if the connection closed before sending, as other browsers do and Safari used to. No backend change occurs because the request never reached the server, so retrying is safe, even if it's not in the specification. This is what Google does, what all browsers actually do, and what Safari did previously.

This Safari 18+ behaviour is a regression in connection reuse that also causes failed preflights and POSTs (breaking apps that perform POST actions as well).

Other browsers work fine. I'm aware of many other users/devs reporting the same issue, so this change is widely breaking expected web behavior.

When keep-alive is turned off, this issue does not occur. So if this issue was introduced by Safari 18+ and only happens in Safari, then for me it is a regression of expected behaviour and should be fixed.

Also about the argument of the idempotency... No side effects can occur if connection is reset before body transmission. Other browsers open a new connection or retry in this case.

I think Safari should either open a new connection when the close timeout is imminent (reading the timeout directive of Keep-Alive), or retry the POST when no bytes were transmitted. That is the case here: because the connection is closed, nothing was transmitted, so a retry does not create side effects like "double POSTing"; it can therefore be considered safe and can be retried.

Spec 9110 9.2.2-6: "For example, a client might automatically retry a POST request if the underlying transport connection closed before any part of a response is received, particularly if an idle persistent connection was used." (https://datatracker.ietf.org/doc/html/rfc9110#section-9.2.2-6)

Also related to https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.7

And the truth is that Safari <18 works fine. So what changed in Safari from 17 to 18?

According to https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.7

6.5.7. 408 Request Timeout

The 408 (Request Timeout) status code indicates that the server did not receive a complete request message within the time that it was prepared to wait. A server SHOULD send the "close" connection option (Section 6.1 of [RFC7230]) in the response, since 408 implies that the server has decided to close the connection rather than continue waiting. If the client has an outstanding request in transit, the client MAY repeat that request on a new connection.

The client "MAY" repeat the request (I believe it does not have to be idempotent actually)!

When Safari reuses a connection that the server has already decided to close while the request is in transit, but the client (Safari) doesn't know it yet (a very short delay), the transmission of data fails (the socket is no longer open, since the connection was closed). The request was therefore never initiated or completed because of the keep-alive timeout, which would correspond to a 408 Request Timeout error.

Safari could fix this by not reusing connections that are within a very short time of expiring, as indicated by the Keep-Alive timeout=1 directive, or by doing a single retry of the request, even for POST requests or the preflights that precede them.

I was also searching the Chrome source code to see why this works fine there and how they solved it (apparently Chrome had this issue as well before 2014 and then fixed it).
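The first option could look roughly like the sketch below. It is not how Safari or any other browser is implemented; `parseKeepAliveTimeoutMs`, `shouldReuseConnection`, and the 100 ms safety margin are hypothetical and only illustrate the idea of avoiding a pooled connection that is about to expire.

```javascript
// Extract the timeout directive (in seconds) from a Keep-Alive header
// value such as "timeout=1, max=100" and return it in milliseconds.
function parseKeepAliveTimeoutMs(headerValue) {
  const match = /(?:^|,)\s*timeout\s*=\s*(\d+)/i.exec(headerValue || '');
  return match ? Number(match[1]) * 1000 : null;
}

// Hypothetical policy: do not reuse an idle connection whose idle time is
// within a safety margin of the timeout advertised by the server.
function shouldReuseConnection(idleMs, keepAliveHeader, safetyMarginMs = 100) {
  const timeoutMs = parseKeepAliveTimeoutMs(keepAliveHeader);
  if (timeoutMs === null) return true; // server gave no hint
  return idleMs < timeoutMs - safetyMarginMs;
}

// The failing request in the capture was sent after about 1015 ms of idle time.
console.log(shouldReuseConnection(1015, 'timeout=1')); // false: open a new connection
console.log(shouldReuseConnection(200, 'timeout=1'));  // true: safe to reuse
```

A margin like this trades a few extra connection setups for fewer requests that race the server's close.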

The code changes: https://codereview.chromium.org/303443011/patch/20001/30002

and the discussion of the change here: https://issues.chromium.org/issues/41110072
