When my app launches, it makes maybe 9 or so network requests to load initial data. It also reads some data from disc.
Sporadically, I'm seeing an issue where some of the network requests succeed, but anything involving reading from disc does not load immediately. I'm able to move around in the app, tap buttons, swap tabs, swipe pages, so my main actor isn't stuck. Other data that doesn't involve disc reading or writing is also blank. About 2 minutes in, suddenly everything loads (both stuff from disc and stuff from the network), nearly instantly, the way it should have done when the app launched.
Server logs show more initial network requests succeed than we can see data loaded in the app, and then about 2 minutes later, there's a flood of the rest of the requests which then succeed.
The responses to some of these initial network requests cause us to make other network requests, and the server sees some of those start right away.
However, other consequences of these first requests are to touch the disc (to search for manually-cached data), and anything that is supposed to happen after that does not succeed until the 2 minute mark.
But what bothers me is some things in the app which don't touch the disc also seem to have successful network requests.
I'm seeing it on an iPhone 14 Pro running iOS 18.2.1, with 607 GB of disc space available.
When I take screenshots of the loading screens in my app during the apparent freeze, the clock in the screenshots is right: it reflects the time at the moment I took the screenshot. But the EXIF data in all dozen or so images shows the exact second, 2 minutes later, when the server gets the resulting flood of network requests. Screenshots taken after the freeze is over have EXIF timestamps that match the screenshots, as short as 5 seconds after the freeze ends. The screenshot file names, though sequential, are out of order. For instance, some screenshots from 12:58 have file names numbered after screenshots taken at 12:59, but not all are out of order.
This seems like disc contention has spread outside the app, and is impacting the system writing the images to disc.
How do I diagnose a cause for this? How does disc contention affect the networking? I have caching turned off for my network requests. We only have a manual image cache, but I don't know how that would stall the display of data that should fetch and display without attempting to hit the image cache.
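One way to narrow this down is to time each disc read and log the ones that stall. Below is a minimal sketch, not a definitive diagnostic: `timed`, its `threshold`, and the `report` closure are hypothetical names introduced here for illustration. It uses `ProcessInfo.processInfo.systemUptime` rather than the wall clock so the measurement does not depend on the system time.

```swift
import Foundation

/// Hypothetical helper (not part of any Apple API): runs `work`, measures how
/// long it took, and calls `report` if it took at least `threshold` seconds.
func timed<T>(_ label: String,
              threshold: TimeInterval = 1.0,
              report: (String) -> Void = { print($0) },
              _ work: () throws -> T) rethrows -> T {
    // systemUptime is monotonic, so clock adjustments do not skew the result.
    let start = ProcessInfo.processInfo.systemUptime
    defer {
        let elapsed = ProcessInfo.processInfo.systemUptime - start
        if elapsed >= threshold {
            report("SLOW \(label): \(String(format: "%.3f", elapsed))s")
        }
    }
    return try work()
}
```

A disc read could then be wrapped as `try timed("read cache") { try Data(contentsOf: cacheURL) }`, where `cacheURL` stands in for whatever file the app actually reads. If the stalled reads all report roughly 2 minutes, that would at least confirm the delay sits inside the file call rather than in the code scheduling it.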
This happens maybe a couple of times a day for some people, maybe once every couple of weeks for others, but of course, it never happens when we're trying to debug it.
I'm coding resumable uploads using iOS 17's URLSession's uploadTask(withResumeData:). This function returns a non-Optional URLSessionUploadTask and does not throw. In cases where the system determines the resumeData is no longer valid, how do I detect that (so I can create a new URLSessionUploadTask from scratch)?
I'm doing this for background uploads, so it's all URLSessionDelegate APIs, but what are the failure modes, and what Error types and Codes would we get specifically?
Obviously, I expect the resume data is no longer usable or necessary when we get a server success, i.e. in the 2xx range. Does the resume data also become invalid for other server responses, like 4xx's or 5xx's?
I expect the resume data usually shouldn't become invalid when getting URLErrors like .networkConnectionLost, since that's like half the point of having the feature in the first place, to resume after a broken network connection. But I do expect that if the resumeData is invalid, then I should be able to reach the server and get a server response, so in that case what Code would we get?
I'm assuming the system is caching our upload file somewhere, and the resume data somehow makes a reference to it, so does that file get purged at some point when left untouched, requiring us to start a fresh upload? We are also saving the file for potential future re-uploads, until we get certain assurances of completion from our backend, but I am just wondering on which logic branches I need to determine that the resumeData I thought I could use is no longer usable.
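To make the question concrete, the branching described above can be sketched roughly as follows. This is an illustration under stated assumptions, not documented behavior: `NextStep` and `nextStep(statusCode:resumeData:)` are hypothetical names, and treating "no resume data came back" as the signal to restart from the saved file is exactly the assumption being asked about.

```swift
import Foundation

/// Hypothetical outcomes after one upload attempt finishes.
enum NextStep: Equatable {
    case done               // server answered 2xx; resume data no longer needed
    case resume(Data)       // the failure carried resume data; try resuming
    case restartFromFile    // nothing usable; re-upload the saved file from scratch
}

/// Decides what to do once urlSession(_:task:didCompleteWithError:) fires.
/// - statusCode: HTTP status from the task's response, if there was one.
/// - resumeData: resume data attached to the error, if any. On iOS 17 this
///   would presumably come from (error as? URLError)?.uploadTaskResumeData.
func nextStep(statusCode: Int?, resumeData: Data?) -> NextStep {
    if let code = statusCode, (200..<300).contains(code) {
        return .done
    }
    // ASSUMPTION: non-empty resume data is still valid. Whether 4xx/5xx
    // responses invalidate it is the open question.
    if let data = resumeData, !data.isEmpty {
        return .resume(data)
    }
    return .restartFromFile
}
```

What this sketch cannot express is the case where the resume data is present but stale, which is the branch where the expected Error type and Code are unknown.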