Performance Tips

If your app works with a lot of files, the performance of its file-related code is very important. Relative to other types of operations, accessing files on disk is one of the slowest operations a computer can perform. Depending on the size and number of files, reading them from a disk-based hard drive can take anywhere from a few milliseconds to several minutes. Therefore, you should make sure your code performs as efficiently as possible under even light to moderate workloads.

If your app slows down or becomes less responsive when it starts working with files, use the Instruments app to gather some baseline metrics. Instruments can show you how much time your app spends operating on files and help you monitor various file-related activities. As you fix each problem, be sure to run your code in Instruments again and record the results so that you can verify whether your changes worked.

Things to Look For in Your Code

If you are not sure where to start looking for potential fixes to your file-related code, the following tips suggest some starting points.

Use Modern File System Interfaces

When deciding which routines to call, choose ones that let you specify paths using NSURL objects over those that specify paths using strings. Most of the URL-based routines were introduced in OS X 10.6 or later and were designed from the beginning to take advantage of technologies like Grand Central Dispatch. This gives your code an immediate advantage on multicore computers without requiring much work on your part.

You should also prefer routines that accept block objects over those that accept callback functions or methods. Blocks are a convenient and more efficient way to implement callback-type behaviors. In practice, blocks often require much less code to implement because they do not require you to define and manage a context data structure for passing data. Some routines might also execute your block by scheduling it in a GCD queue, which can also improve performance.

General Tips

What follows are some basic recommendations for reducing the I/O activity of your program. These may help improve your file-system-related performance, but as with all tips, be sure to measure before and after so that you can verify any performance gains.
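Instruments remains the primary tool for measurement, but a quick timing harness can be handy for comparing two versions of the same read path. The following is a minimal sketch, assuming a POSIX environment; the helper name time_file_read is hypothetical and is not part of any system API.

```c
/* Ask strict compilers for the POSIX.1-2008 declarations (clock_gettime). */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not a system API): reads the file at `path` once,
   front to back, and returns the elapsed wall-clock time in seconds.
   The number of bytes read is stored in *bytes_out.
   Returns -1.0 if the file cannot be opened or read. */
double time_file_read(const char *path, long long *bytes_out)
{
    assert(path != NULL && bytes_out != NULL);
    *bytes_out = 0;

    struct timespec start, end;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        return -1.0;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1.0;
    }

    char buf[64 * 1024];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        *bytes_out += n;
    }
    close(fd);
    if (n < 0) {
        return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        return -1.0;
    }
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Keep in mind that a second read of the same file is often served from the system cache, so compare like with like: time each variant several times under the same conditions rather than trusting a single run.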

The System Has Its Own File Caching Mechanism

Disk caching can be a good way to accelerate access to file data, but its use is not appropriate in every situation. Caching increases the memory footprint of your app and if used inappropriately can be more expensive than simply reloading data from the disk.

Caching is most appropriate for files you plan to access multiple times. If you have files you only intend to use once, you should either disable the caches or map the file into memory.

Disabling File-System Caching

When reading data that you are certain you won’t need again soon, such as streaming a large multimedia file, tell the file system not to add that data to the file-system caches. By default, the system maintains a buffer cache with the data most recently read from disk. This disk cache is most effective when it contains frequently used data. If you leave file caching enabled while streaming a large multimedia file, you can quickly fill up the disk cache with data you won’t use again. Even worse, this process is likely to push out of the cache other data that would have benefited from being there.

Apps can call the BSD fcntl function with the F_NOCACHE command to turn caching off or back on for an open file. For more information about this function, see the fcntl man page.
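The fcntl call described above can be sketched as follows. This is a minimal example rather than a definitive implementation: the helper names open_uncached and stream_once are hypothetical, and the F_NOCACHE request is wrapped in a preprocessor check so that the code still compiles on systems that do not define it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not a system API): opens `path` read-only and asks
   the system not to cache data read through the returned descriptor.
   Returns the descriptor, or -1 if the file cannot be opened. */
int open_uncached(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

#ifdef F_NOCACHE
    /* A nonzero argument turns data caching off for this descriptor;
       passing 0 later would turn it back on. */
    if (fcntl(fd, F_NOCACHE, 1) == -1) {
        /* Not fatal: reads still work, they are simply cached as usual. */
        perror("fcntl(F_NOCACHE)");
    }
#endif

    return fd;
}

/* Hypothetical helper: streams the whole file once in fixed-size chunks,
   as you might when playing back media. Returns the total number of
   bytes read, or -1 on failure. */
long long stream_once(const char *path)
{
    int fd = open_uncached(path);
    if (fd < 0) {
        return -1;
    }

    char buf[64 * 1024];
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        total += n;   /* A real app would consume the chunk here. */
    }
    close(fd);
    return (n < 0) ? -1 : total;
}
```

Because the setting applies to the open descriptor, other code that opens the same file separately is not affected by it.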

Using Mapped I/O Instead of Caching

If you intend to read data randomly from a file, you can improve performance in some situations by mapping that file directly into your app’s virtual memory space. File mapping is a programming convenience for files you want to access with read-only permissions. It lets the kernel take advantage of the virtual memory paging mechanism to read the file data only when it is needed. You can also use file mapping to overwrite existing bytes in a file; however, you cannot extend the size of a file using this technique. Mapped files bypass the system disk caches, so only one copy of the file is stored in memory.

For more information about mapping files into memory, see File System Advanced Programming Topics.
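As a rough illustration of read-only file mapping, the following sketch uses the POSIX mmap function. The helper names map_file_readonly and unmap_file are hypothetical, and the example assumes the whole file fits comfortably in your app's address space.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a system API): maps the entire file at `path`
   into memory with read-only permissions. On success, returns a pointer
   to the first byte and stores the file length in *len_out.
   Returns NULL on failure or if the file is empty. */
const unsigned char *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    *len_out = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);   /* A zero-length file cannot be mapped. */
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);       /* The mapping stays valid after the descriptor is closed. */
    if (addr == MAP_FAILED) {
        return NULL;
    }

    *len_out = (size_t)st.st_size;
    return addr;
}

/* Releases a mapping created by map_file_readonly. */
void unmap_file(const unsigned char *addr, size_t len)
{
    if (addr != NULL) {
        munmap((void *)addr, len);
    }
}
```

After mapping, you can index into the returned pointer at arbitrary offsets; the kernel pages in only the parts of the file you actually touch. Call unmap_file when you are done with the data.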

Zero-Fill Delays Provide Security at a Cost

For security reasons, file systems are supposed to zero out areas on disk when they are allocated to a file. This behavior prevents data left over from a previously deleted file from being included with the new file. The HFS Plus file system used by OS X has always implemented this zero-fill behavior.

For both reading and writing operations, the system delays the writing of zeroes until the last possible moment. When you close a file after writing to it, the system writes zeroes to any portions of the file your code did not touch. When reading from a file, the system writes zeroes to new areas only when your code attempts to read from that area or when it closes the file. This delayed-write behavior avoids redundant I/O operations to the same area of a file.

If you notice a delay when closing your files, it is likely because of this zero-fill behavior. Make sure you do the following when working with files:

  • Write data to files sequentially. Gaps in writing must be filled with zeroes when the file is saved.

  • Do not move the file pointer past the end of the file and then close the file.

  • Truncate files to match the length of the data you wrote. For scratch files you plan to delete, truncate the file to zero length.
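The guidelines above can be sketched with the POSIX write and ftruncate functions. This is only an illustration under stated assumptions: the helper name write_all_and_trim is hypothetical, and the descriptor is assumed to refer to a regular file opened for writing.

```c
/* Ask strict compilers for the POSIX.1-2008 declarations (ftruncate). */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a system API): writes `len` bytes from `data`
   sequentially, starting at the beginning of the file, and then truncates
   the file to exactly the number of bytes written. Passing len == 0
   truncates the file to zero length, which suits scratch files that are
   about to be deleted. Returns 0 on success, -1 on failure. */
int write_all_and_trim(int fd, const void *data, size_t len)
{
    assert(fd >= 0 && (data != NULL || len == 0));

    const char *p = data;
    size_t written = 0;

    /* Start at offset 0 so the writes leave no gaps to be zero-filled. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        return -1;
    }

    while (written < len) {
        ssize_t n = write(fd, p + written, len - written);
        if (n > 0) {
            written += (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;            /* Interrupted by a signal; retry. */
        } else {
            return -1;
        }
    }

    /* Drop anything past the data just written, so the file length
       matches the data and nothing remains to be zero-filled on close. */
    return ftruncate(fd, (off_t)written);
}
```

The key design choice is that the file pointer never moves past the end of the data: every byte of the final file length was written explicitly by your code.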