Debugging Networking Problems
When writing networking-based software, developers often make a few common design and usage mistakes that can cause serious performance problems, crashes, and other misbehavior. This chapter highlights a few of those mistakes and describes how to avoid or fix them.
Clean Up Your Connections
TCP connections remain open until either the connection is explicitly closed or a timeout occurs. Unless TCP keepalive is enabled for the connection, a timeout occurs only if there is data waiting to be transmitted that cannot be delivered. This means that if you do not close your idle TCP connections, they will remain open until your program quits.
The recommended way to enable TCP keepalive is by setting the
SO_KEEPALIVE flag on the socket with
Avoid POSIX Sockets and CFSocket on iOS Where Possible
Using POSIX sockets directly has both advantages and disadvantages relative to using them through a higher-level API. The main advantages are:
Sockets greatly simplify porting networking code to and from non-Apple platforms.
You can support protocols other than TCP.
The main disadvantages are:
Sockets have many complexities that are handled for you by higher-level APIs. Thus, you will have to write more code, which usually means more bugs.
In iOS, using sockets directly using POSIX functions or
CFSocketdoes not automatically activate the device’s cellular modem or on-demand VPN.
The most appropriate times to use sockets directly are when you are developing a cross-platform tool or high-performance server software. In other circumstances, you typically should use a higher-level API.
Avoid Synchronous Networking Calls on the Main Thread
If you are performing network operations on your main thread, you must use only asynchronous calls.
Network communication is inherently prone to delays. For example, a DNS request that times out can take upwards of half a minute, and a
connect can take even longer. If you perform a synchronous networking call—a call that waits for a response and then returns the data—the thread that made the call becomes blocked in the kernel until the operation either completes or fails. If that thread happens to be your program’s main thread, your program becomes unresponsive.
In an OS X GUI app, this causes the spinning wait cursor to appear. Menus become unresponsive, clicks and keystrokes are delayed or lost, and your users begin shouting words that should not be repeated in polite company.
In iOS, your app is killed by the watchdog timer if it doesn’t respond to user interface events for a predetermined amount of time. The timeouts for most networking operations are longer than that of the iOS watchdog timer. Thus, your app is guaranteed to be killed if your network connection fails during a synchronous networking call.
If your iOS app generates a crash report with the exception code
0x8badfood, this means that the iOS watchdog timer killed your app because it was unresponsive; such a crash may have been caused by a synchronous networking call.
Cocoa (Foundation) and CFNetwork (Core Foundation) Code
The easiest and most common way to perform networking asynchronously is to schedule your networking object in the current thread’s run loop.
All Foundation or
CFNetwork networking objects—including
CFSocket—have built-in run-loop integration. Each of these objects has a set of delegate methods or callback functions that are called throughout your object’s network communication.
If you want to perform a computationally intensive task (such as processing a large downloaded file) over the course of your network communication, you should do so on a separate thread. The
NSOperation class lets you encapsulate such a task in an operation object, which you can easily run on a separate thread by adding it to an
For more information, read “Run Loops” in Threading Programming Guide. For sample code, see LinkedImageFetcher and ListAdder.
POSIX networking presents some unique challenges, and thus has some unique tips:
Create sockets correctly. The proper call for creating a TCP socket is:
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP); // IPv4
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP); // IPv6
You can create a UDP socket by replacing
IPPROTO_UDP in the snippet above and replacing
If you use the select system call, keep track of which sockets are actually in use. A number of poorly written socket-based programs incorrectly assume that they own every socket or file descriptor from 3 (or 0) up through the highest-numbered socket that they currently have open, and use a simple
for loop to fill in the file descriptor set. This not only can result in poor performance, but also can cause incorrect behavior if the daemon opens files that may end up with file descriptor numbers in that range.
Instead, you should keep two pieces of state in your networking code:
An integer that keeps track of the highest-numbered socket that you currently have open. You must update this (if needed) whenever you open or close a new socket.
FD_SETin which the relevant bit is set for every open socket. Instead of passing this set to the
selectsystem call (which destroys the contents of any descriptor sets passed as parameters), you should copy the descriptor set with
FD_COPYand pass the copy to
Alternatively, if you have an array of file descriptors (or some other way to keep track of them), you can construct a new
FD_SETfrom that array.
Use run loops and nonblocking I/O to manage asynchronous reads. In all but the most trivial command-line tools, POSIX networking should always be performed in a run-loop-based fashion using either GCD, the
kqueue API, or the
select system call.
Avoid synchronous networking on the main thread. If you are using synchronous calls, POSIX networking should never be performed on the main program thread of any GUI application or other interactive tool. This includes:
Reading from and writing to sockets that are not set to non-blocking.
Connecting a socket.
Performing DNS lookups (particularly with
Instead, either perform the synchronous calls on a separate thread or use an API that performs these operations asynchronously, such as GCD, the
kqueue API, or higher-level APIs that integrate with Cocoa run loops (
CFStream, for example).
For more information about the options available to you, read “Assessing Your Networking Needs” and “Using Sockets and Socket Streams.”
Set appropriate timeouts if you are using select. The
select system call allows you to specify a period of time to wait before returning control to your code. The value you choose for this timeout is very important:
If your code needs to perform other operations in the background, this timeout value determines how often those other operations run. The shorter the value, the more frequently your code can do other things in the background, but the more CPU cycles it uses while waiting.
In general, if your app is waking up more than a few times per second (or even once per second on iOS), you should probably be doing that work on a separate thread. Moreover, this is usually a sign that you’re doing something wrong, such as polling to see whether a file has been deleted or modified instead of using a more appropriate notification-based technology such as the
If you are using this wakeup to check to see if another thread within your application has done something, you should consider using a pair of connected sockets instead. To do this, first a socket pair with
socketpair. Then add one of the connected sockets to your descriptor set. When the other thread needs to tell your networking thread about an event, it can wake your networking thread by writing data into the other socket. Be sure to keep track of which sockets were created in this way so that your code can handle the data differently depending on whether it came from an outside connection or from within your app.
If your code does not need to perform any operations in the background, you should pass
NULLensures that your code takes as little CPU as possible while waiting for incoming connections or data.
In addition, if you are using timeouts, be aware that the
select system call returns
EINTR when the timer fires. Your code should be prepared for this return value and should not treat it as an error.
Use POSIX sockets efficiently (if at all). If you are using POSIX sockets directly:
Handle or disable
When a connection closes, by default, your process receives a
SIGPIPEsignal. If your program does not handle or ignore this signal, your program will quit immediately. You can handle this in one of two ways:
Ignore the signal globally with the following line of code:
Tell the socket not to send the signal in the first place with the following lines of code (substituting the variable containing your socket in place of
int value = 1;
setsockopt(sock, SOL_SOCKET, SO_NOSIGPIPE, &value, sizeof(value));
For maximum compatibility, you should set this flag on each incoming socket immediately after calling
acceptin addition to setting the flag on the listening socket itself.
Use nonblocking sockets where possible.
Keep the socket’s send buffer full to the extent possible.
Handle incoming data early and often to keep the socket’s receive buffer empty.
Support both IPv4 and IPv6.
Check return values from socket reads and writes.
Be prepared to handle
EWOULDBLOCKerrors, which indicate that no data is available when reading, or that no output buffer space is available when writing. These errors are normal, non-fatal errors; your program should not close the connection when it gets them.
For an example of POSIX code that follows these rules, see “Writing a TCP-Based Server” in Networking Programming Topics.
Avoid Resolving DNS Names Before Connecting to a Host
The preferred way to connect to a host is with an API that accepts a DNS name, such as
Although your program can resolve a DNS name and then connect to the resulting IP address, you should generally avoid doing so. DNS lookups usually return multiple IP addresses, and (at the application layer) it is not always obvious which IP address is best for connecting to the remote host. For example:
Most modern computers and other mobile devices are multihomed. This means that they exist on more than one physical network at the same time. Your computer might be on Wi-Fi and Ethernet; your cellular phone might be on Wi-Fi and 3G; and so on. However, not all hosts are necessarily available over every connection.
For example, the Remote app on your iPhone lets you control your Apple TV, but only over the Wi-Fi connection. The two devices cannot communicate with one another over your phone’s cellular connection because your Apple TV has no public IP address.
If both your device and the server you are connecting to have multiple IP addresses on different networks, the best IP address for connecting to the server may depend on which network the connection will pass through.
For example, if your home media server has one IP address on your wired LAN and a second IP address on a Wi-Fi network, the operating system can often detect the performance difference and favor the faster, LAN-based IP address. By contrast, if your program looks up the IP address, it has approximately an equal chance of connecting to either IP address.
If the server has both IPv4 and IPv6 protocols, it may not be reachable using both protocols. By using a host-name-based API, the operating system can try both simultaneously and use the first one that connects successfully. If your program looks up the IP address, it might not connect successfully, depending on which address it chose.
If the DNS server is an older server that does not handle IPv6 and you request an
AAAA(IPv6 address) record, synchronous DNS queries will block until the request times out (30 seconds by default). If you use an API that takes a hostname, the operating system hides this problem from you by issuing IPv4 and IPv6 requests in parallel, and then canceling the outstanding IPv6 lookup as soon as an IPv4 connection is established successfully.
If a server is using Bonjour to advertise a service over a link-local IP address (because DHCP is not working on the network for some reason), if your app looks up that service and resolves it to an IP address, the resulting address will work only as long as the device retains that IP address. If the DHCP server comes online, the IP address may change, at which time your program will no longer be able to connect to the old address.
If you connect using the advertised name instead, your program will continue to be able to connect even after the server finally obtains an IP address (so long as the computer or device running that program receives the updated advertisement).
If the server is on the other side of an on-demand VPN that becomes available only when the user tries to access a whitelisted host, connecting by IP does not activate that VPN, which means that the host will never become reachable.
If you cannot avoid resolving a DNS name yourself, first check to see whether the
CFHost API fulfills your requirements; it provides a list of addresses for a given host instead of just a single address. If the
CFHost API does not meet your needs, use the DNS Service Discovery API.
For more information, read Networking Concepts, “Writing a TCP-Based Server” in Networking Programming Topics, CFHost Reference, and DNS Service Discovery C Reference. For sample code, see SRVResolver.
Understand the IPv6 Transition
The currently predominant version of the Internet Protocol (IPv4) has run out of unique addresses. To fix this problem, Internet Protocol Version 6 (IPv6) is steadily replacing IPv4. Depending on the nature of your program, this transition has different implications:
If you are writing a client-side program using high-level networking APIs and you connect by name, you should not need to change anything in your program for it to work with IPv6 addresses. (If you are not connecting by name, you probably should be. See “Avoid Resolving DNS Names Before Connecting to a Host.”)
If you are writing a server-side program or other low-level networking program, you need to make sure your socket code works correctly with both IPv4 and IPv6 addresses. The RemoteCurrency sample code project illustrates how to do this.
Do Not Use NSSocketPort (OS X) or NSFileHandle for General Socket Communication
Despite its name, the
NSSocketPort class (available in OS X only) is not intended for general network communication. The
NSSocketPort class is part of Cocoa’s distributed objects system, which is intended for controlled communication between Cocoa applications on a single machine or on a local network. For more information on the distributed objects system, see Distributed Objects Programming Topics.
NSFileHandle class is not designed for general networking. The
NSFileHandle class circumvents the standard networking stack, which carries the following drawbacks:
Network connections made with
NSFileHandlecan be significantly less efficient than those made with the standard networking APIs.
NSFileHandlefor networking has resulted in either extremely poor performance or strange, hard-to-debug failures.
There is no straightforward way to use TLS authentication and encryption on connections made with
NSFileHandledoes not automatically activate the device’s cellular modem or on-demand VPN.
NSStream for remote connections and
CFSocket for listening. For details, see “Writing a TCP-Based Server” in Networking Programming Topics.