Establishing stable multi peer data transfer with Network framework

Hi all,

I'm relatively new to posting in forums, but here we go. I am currently working on an enterprise platform for iPad, and we've decided to use Network framework to link the iPads at a particular location together for data transfer. Basically, the iPads are kept constantly in sync with one another by changing each iPad's Core Data memory store accordingly. This works great when all iPads are within range! However, when an iPad leaves the area and returns, it does not reliably catch the others' updates or send out its own. Sometimes it gets them, sometimes it does not.

There is a lot of nuance to networking code that I'm not familiar with yet, and I frequently get very low-level network errors that I do not understand very well and that yield little to no search results when looked up. There aren't a lot of examples out there, and I'm not sure I've set up the network to do peer-to-peer properly, which is reflected in the errors I wanted to ask about.

There are two in particular, and one is far less reproducible. The first yields a crash log, so if anyone can point me to a tool to debug this one, that would be great. It does not crash the app or cause permanent loss of functionality, but that's not a good reason to ignore a bug, as that could be very different in production. I'd like to know how to handle this correctly. I have some basic print debug statements on the fail states of NWBrowser, NWConnection, and NWListener, and none of them are ever hit when these particular errors surface.

__nwlog_err_simulate_crash simulate crash already simulated

nw_channel_disconnect_flow protocol ip_listener has invalid disconnected callback, dumping backtrace:
    [arm64] libnetcore-2288.140.7
    0   libnetwork.dylib          0x00000001a14e9d2c __nw_create_backtrace_string + 120
    1   libnetwork.dylib          0x00000001a1177b28 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3210024
    2   libnetwork.dylib          0x00000001a11865a0 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3270048
    3   libnetwork.dylib          0x00000001a1184964 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3262820
    4   libnetwork.dylib          0x00000001a118335c 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3257180
    5   libnetwork.dylib          0x00000001a1182da4 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3255716
    6   libnetwork.dylib          0x00000001a146f124 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 6320420
    7   libnetwork.dylib          0x00000001a13d9f0c 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 5709580
    8   libnetwork.dylib          0x00000001a13d9b34 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 5708596
    9   libnetwork.dylib          0x00000001a13d92bc 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 5706428
    10  libnetwork.dylib          0x00000001a1180888 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3246216
    11  libnetwork.dylib          0x00000001a117e2b0 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3236528
    12  libnetwork.dylib          0x00000001a117d91c 40081C9D-213A-3EFC-8E66-3FC0E3110449 + 3234076
    13  libdispatch.dylib         0x0000000102555de0 _dispatch_client_callout + 20
    14  libdispatch.dylib         0x0000000102558ed0 _dispatch_continuation_pop + 616
    15  libdispatch.dylib         0x000000010256eca0 _dispatch_source_invoke + 1384
    16  libdispatch.dylib         0x0000000102560324 _dispatch_workloop_invoke + 2200
    17  libdispatch.dylib         0x000000010256ba50 _dispatch_workloop_worker_thread + 1600
    18  libsystem_pthread.dylib   0x00000001eb4c27a4 _pthread_wqthread + 276
    19  libsystem_pthread.dylib   0x00000001eb4c974c start_wqthread + 8

The other error is about a TCP listener failing to receive data within a time limit, and the connection in that instance will permanently die. Meaning, the iPad will never communicate with the others again after that. I haven't seen that one since I made some changes, but I'd rather be sure it is addressed. However it again has to do with connections, which leads me to suspect I haven't got them set up quite right.

Has anyone encountered these issues before, or done something similar and can provide a bit of direction? I would love some insights 🙇‍♂️
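The print statements described above cover only the fail states. As a minimal sketch, and only an assumption about what might help here, logging every NWConnection state (including the non-terminal .waiting state, which a connection can sit in without ever reaching .failed) gives a fuller picture of what happens when an iPad leaves and returns. The names describe and startLogged are hypothetical and do not come from the sample project.

```swift
import Network

// Hypothetical helper: turns every NWConnection state into a log line,
// including .waiting, which fail-only logging never reports.
func describe(_ state: NWConnection.State) -> String {
    switch state {
    case .setup:
        return "setup"
    case .waiting(let error):
        return "waiting: \(error)"
    case .preparing:
        return "preparing"
    case .ready:
        return "ready"
    case .failed(let error):
        return "failed: \(error)"
    case .cancelled:
        return "cancelled"
    @unknown default:
        return "unknown state"
    }
}

// Hypothetical helper: attaches the logger, then starts the connection.
func startLogged(_ connection: NWConnection, label: String) {
    connection.stateUpdateHandler = { state in
        print("[\(label)] \(describe(state))")
    }
    connection.start(queue: .main)
}
```

The same pattern should carry over to the stateUpdateHandler of NWListener and NWBrowser, each of which has its own State type.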

The backtrace on the crashed thread is not symbolicated so it's hard to tell for sure what is going on.

Regarding:

The other error is about a TCP listener failing to receive data within a time limit, and the connection in that instance will permanently die. Meaning, the iPad will never communicate with the others again after that. I haven't seen that one since I made some changes, but I'd rather be sure it is addressed. However it again has to do with connections, which leads me to suspect I haven't got them set up quite right.

This sounds like you are coming back to a local network and your NWListener may have gone stale. What is the scenario where your listener fails to receive data within a time limit?

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Yeah that was my feeling as well. I don't know how one would go about debugging that, or how I could get it in a symbolicated form. And yes, that is exactly what is happening with the TCP listener error. It only occurs when returning to the network.
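Since the failure only shows up on returning to the network, one way to test the stale-listener theory is to rebuild the listener whenever the network path becomes usable again. The sketch below is an assumption, not something this thread confirms as the fix: RestartableListener and shouldRestart are hypothetical names, and a real app would also set the listener's Bonjour service and hand new connections to its own peer logic.

```swift
import Foundation
import Network

// Hypothetical wrapper: owns one NWListener and rebuilds it when the
// network path goes from unsatisfied to satisfied.
final class RestartableListener {
    private var listener: NWListener?
    private let monitor = NWPathMonitor()
    private let queue = DispatchQueue(label: "listener.restart") // assumed label
    private var wasSatisfied = false

    // Pure decision helper: restart only on an unsatisfied -> satisfied edge.
    static func shouldRestart(wasSatisfied: Bool, isSatisfied: Bool) -> Bool {
        return !wasSatisfied && isSatisfied
    }

    func start() {
        monitor.pathUpdateHandler = { [weak self] path in
            guard let self = self else { return }
            let isSatisfied = (path.status == .satisfied)
            // The first satisfied update also creates the initial listener.
            if RestartableListener.shouldRestart(wasSatisfied: self.wasSatisfied,
                                                 isSatisfied: isSatisfied) {
                self.restartListener()
            }
            self.wasSatisfied = isSatisfied
        }
        monitor.start(queue: queue)
    }

    private func restartListener() {
        listener?.cancel() // drop the possibly stale listener
        do {
            let fresh = try NWListener(using: .tcp)
            fresh.stateUpdateHandler = { state in
                print("listener state: \(state)")
            }
            fresh.newConnectionHandler = { connection in
                // Placeholder: pass the connection to the app's own peer code.
                connection.start(queue: .main)
            }
            fresh.start(queue: queue)
            listener = fresh
        } catch {
            print("could not create listener: \(error)")
        }
    }
}
```

Keeping the restart decision in a separate static function makes it possible to check the edge detection without touching the network.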

I found a way to reliably reproduce the TCP listener failure. My code is based on the "Building a Custom Peer-to-Peer Protocol" sample project and the "Advances in Networking, Part 2" WWDC session, so I've created Peer Browser, Peer Listener, and Peer Connection classes. In the Peer Browser and Peer Listener, if I account only for the .service case in the connection's stateUpdateHandler and the browser results, as the sample does, and do not also account for the .hostPort case, I get the following crash.

TCP listener failed to receive inbound connection within timeout, dropping connection, dumping backtrace:
    [arm64] libnetcore-2288.140.7
    0   libnetwork.dylib          0x00000001a14e9d2c __nw_create_backtrace_string + 120
    1   libusrtcp.dylib           0x00000001a39fe2b8 62F90A19-A963-3029-ACBB-56B13A9CAFF8 + 348856
    2   libusrtcp.dylib           0x00000001a3a0008c 62F90A19-A963-3029-ACBB-56B13A9CAFF8 + 356492
    3   libusrtcp.dylib           0x00000001a39ffce8 62F90A19-A963-3029-ACBB-56B13A9CAFF8 + 355560
    4   libusrtcp.dylib           0x00000001a3a00ed8 62F90A19-A963-3029-ACBB-56B13A9CAFF8 + 360152
    5   libdispatch.dylib         0x0000000101159de0 _dispatch_client_callout + 20
    6   libdispatch.dylib         0x000000010115ced0 _dispatch_continuation_pop + 616
    7   libdispatch.dylib         0x0000000101172ca0 _dispatch_source_invoke + 1384
    8   libdispatch.dylib         0x0000000101164324 _dispatch_workloop_invoke + 2200
    9   libdispatch.dylib         0x000000010116fa50 _dispatch_workloop_worker_thread + 1600
    10  libsystem_pthread.dylib   0x00000001eb4c27a4 _pthread_wqthread + 276
    11  libsystem_pthread.dylib   0x00000001eb4c974c start_wqthread + 8

But if I add both cases to my set of connections to monitor, this never happens. I still get the ip_listener error when I come back online though. But that one doesn't seem to affect the performance and things begin functioning as normal again.
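For anyone landing here with the same problem, handling both endpoint forms might look roughly like the sketch below. It is an illustration under assumptions, not the sample project's code: peerKey is a hypothetical helper, and using the Bonjour service name or host:port as the tracking key is a choice made only for this example.

```swift
import Network

// Hypothetical helper: derives a key for tracking a peer, whichever
// endpoint form the browser result or connection reports.
func peerKey(for endpoint: NWEndpoint) -> String? {
    switch endpoint {
    case let .service(name, _, _, _):
        // Bonjour service form, the only one the sample project handles.
        return name
    case let .hostPort(host, port):
        // Host/port form, which this thread found must be handled as well.
        return "\(host):\(port.rawValue)"
    default:
        // Unix-socket, URL, and any other endpoint kinds are ignored here.
        return nil
    }
}
```

A browser or listener callback could then call peerKey(for: connection.endpoint) and skip only the endpoints for which it returns nil, rather than silently dropping everything that is not a .service endpoint.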

Yeah that was my feeling as well. I don't know how one would go about debugging that, or how I could get it in a symbolicated form.

In Xcode, when your app crashes, lldb should come up in the console. Type bt there and it will give you the symbolicated backtrace of the crashing thread. If this happened on the device, then you will need to get the crash logs from the device, so see this article here.

But if I add both cases to my set of connections to monitor, this never happens.

Well done debugging and finding this missing case.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com