Apple Developer Connection

Command-line Backup Solutions on Mac OS X

Backing up your hard drive is like going to the dentist: important, but unpleasant enough that it is easy to put off or forget. Because of that, the best way to keep your data safe is to schedule regular backups. While there are numerous commercial applications that address this need (including Apple's .Mac Backup), they tend either to require the user to be logged in or to require a connection to an enterprise data center.

The traditional UNIX alternative is to run a command-line backup tool as part of a 'cron' script, but many system administrators are unaware of the variety of options available for unattended backups under Mac OS X.

This article discusses some of the issues faced in making backups, the special considerations of Mac OS X systems, several options for creating viable backups, and a few potential pitfalls of developing a backup system. This article does not cover BSD flags or systems prior to Mac OS X v10.4 Tiger. For simplicity, it also assumes that you have not turned on ACLs (using Mac OS X Server or fsaclctl) on the filesystem you want to back up; there are some subtleties in how these tools interact with ACLs that are beyond the scope of this article. Please consult the latest documentation for each tool for more details.

Some History, Including dump and tar

The essential goal of a backup is to reproduce, as accurately as possible, the state of a filesystem. In most cases it is not enough to duplicate the contents of files; the filesystem hierarchy, permissions, dates, and other metadata must also be preserved. The two major historical UNIX backup programs, dump and tar, took very different approaches to this.

The dump utility reads the underlying filesystem structures directly, producing an exact image of the filesystem's contents. In historical UNIX, it does this by scanning the disk's inode tables to build a complete list of the files on the disk and their filesystem data structures, then scanning all of the directories, and finally scanning the contents of files. This depends heavily on the UNIX filesystem structure, which HFS+ does not share. The tar program, by contrast, works through the filesystem API to build an image of a directory tree and the files in it. This approach is simpler and more portable, but the tar file format does not directly address Mac OS X features such as resource forks or filesystem metadata.
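To make the tar approach concrete, the sketch below uses Python's standard tarfile module as a stand-in for the tar command (tar_round_trip is a hypothetical name). It builds the archive purely through ordinary directory listings and stat calls, and the round trip preserves the hierarchy, permission bits, and modification time. File contents are read with a plain open(), so only the data fork is captured.

```python
import os
import tarfile
import tempfile


def tar_round_trip(src_dir: str, work_dir: str) -> str:
    """Archive src_dir into work_dir/backup.tar, extract it, return the copy's path."""
    name = os.path.basename(os.path.normpath(src_dir))
    archive = os.path.join(work_dir, "backup.tar")
    # tarfile walks the tree through the ordinary filesystem API (directory
    # listings plus lstat) and stores each entry's name, mode, owner and
    # mtime in a header; contents come from open(), i.e. the data fork only.
    with tarfile.open(archive, "w") as tar:
        tar.add(src_dir, arcname=name)
    dest = os.path.join(work_dir, "restored")
    # Use the safer "data" extraction filter where this Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        tar.extractall(dest, **kwargs)
    return os.path.join(dest, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "project")
        os.makedirs(src)
        path = os.path.join(src, "notes.txt")
        with open(path, "w") as f:
            f.write("hello\n")
        os.chmod(path, 0o640)
        os.utime(path, (1_000_000_000, 1_000_000_000))
        restored = os.stat(os.path.join(tar_round_trip(src, tmp), "notes.txt"))
        # Permission bits and modification time survive the round trip.
        print(oct(restored.st_mode & 0o777), int(restored.st_mtime))
```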

In either case, reliable backups generally require a quiescent filesystem, that is, a filesystem which is not being actively modified while the backup is being made. For instance, if you move a file from one directory to another during a tar backup, it may show up in both directories in the backup, or (worse) in neither. Other errors, both subtle and obvious, can occur. If a large file is being rewritten during a backup, the backup may capture a truncated copy of the file, or an inconsistent copy containing some chunks of the original file and some of the modified one. Restoring such a copy can result in catastrophic corruption and data loss.
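The risk can be reduced, though never eliminated, by checking whether a file changed while it was being copied. The Python sketch below (copy_if_unchanged is a hypothetical name) compares the file's size and modification time before and after each attempt and retries if they differ. It is only a heuristic: HFS+ records modification times to the nearest second, so a rewrite within the same second can go unnoticed, and it does nothing about related files changing relative to one another.

```python
import os
import shutil
import tempfile


def copy_if_unchanged(src: str, dst: str, attempts: int = 3) -> bool:
    """Copy src's contents to dst, retrying if src seems to change mid-copy.

    Returns True once a copy finishes with src's size and modification time
    identical before and after; False if every attempt saw a change.
    This detects many concurrent writes but cannot rule them all out.
    """
    for _ in range(attempts):
        before = os.stat(src)
        shutil.copyfile(src, dst)  # copies file contents only, no metadata
        after = os.stat(src)
        if (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns):
            return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "ledger.dat")
        with open(src, "wb") as f:
            f.write(b"balance=42\n")
        # Nothing else is writing to the file, so the first attempt succeeds.
        print(copy_if_unchanged(src, os.path.join(tmp, "ledger.bak")))
```

A False result is the signal to log the file as suspect rather than silently trusting the copy.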

Unfortunately, modern systems are often in use 24/7, making it impossible to schedule true downtime for backups. In that case, an aggressive backup strategy becomes especially important, because any given backup is much more likely to be corrupt in some way. It may also be useful to consider backup strategies which do not operate at the filesystem level; for instance, exporting all of the tables from a database is often guaranteed to produce a reliable image of that database, which can then be safely copied.
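As an illustration of backing up at the database level rather than the file level, here is a sketch using SQLite through Python's sqlite3 module, standing in for whatever database you actually run (dump_database is a hypothetical name). It reads every table inside one read transaction, so all of them come from the same snapshot, and writes the result as SQL text that any file-level tool can then copy safely.

```python
import os
import sqlite3
import tempfile


def dump_database(db_path: str, dump_path: str) -> None:
    """Export every table in an SQLite database as a text file of SQL."""
    con = sqlite3.connect(db_path)
    try:
        # One read transaction, so every table is read from the same
        # snapshot and concurrent writers cannot make them disagree.
        con.execute("BEGIN")
        with open(dump_path, "w", encoding="utf-8") as out:
            for statement in con.iterdump():  # CREATE and INSERT statements
                out.write(statement + "\n")
        con.rollback()  # nothing was modified; just end the transaction
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "app.db")
        con = sqlite3.connect(db)
        con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
        con.execute("INSERT INTO t (name) VALUES ('alpha')")
        con.commit()
        con.close()
        dump_database(db, os.path.join(tmp, "app.sql"))
        # Restoring is just replaying the SQL into a fresh database.
        restored = sqlite3.connect(":memory:")
        with open(os.path.join(tmp, "app.sql"), encoding="utf-8") as f:
            restored.executescript(f.read())
        print(restored.execute("SELECT name FROM t").fetchall())
        restored.close()
```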

HFS/HFS+ Considerations

The HFS+ filesystem used on Mac OS X has a number of features and traits very much unlike those of a traditional UNIX filesystem. Quite simply, a backup program which is written in terms of the portable UNIX filesystem API cannot create reliable backups of an HFS+ filesystem.

The first example everyone points to is resource forks. Programs using the traditional UNIX API simply ignore resource forks, because opening a file through that API yields only its data fork. However, there are other issues which can pose significant challenges to a backup strategy. File attributes (Finder flags, date stamps, and file type and creator codes, as set by SetFile and read by GetFileInfo) are additional data that UNIX-aware backup programs have no way to read and no way to store. Collectively, these are referred to hereafter as "extended attributes".
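On Mac OS X a file's resource fork is reachable through the special path file/..namedfork/rsrc, which an ordinary open() of the file never touches. Python's own shutil documentation warns that on the Mac its copy functions lose resource forks and type and creator codes, a good example of a portable copy routine that is not a Mac backup tool. The sketch below (helper names hypothetical) uses that special path to list files that a data-fork-only copy would damage; on systems without named forks the path does not resolve and the size is reported as 0.

```python
from __future__ import annotations

import os
import tempfile


def resource_fork_path(path: str) -> str:
    """Path through which Mac OS X exposes a file's resource fork."""
    return os.path.join(path, "..namedfork", "rsrc")


def resource_fork_size(path: str) -> int:
    """Size of path's resource fork in bytes; 0 if it has none or none can exist."""
    try:
        return os.path.getsize(resource_fork_path(path))
    except OSError:
        return 0


def files_with_resource_forks(root: str) -> list[str]:
    """Regular files under root whose resource fork a plain copy would drop."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                if resource_fork_size(full) > 0:
                    found.append(full)
    return sorted(found)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        # A freshly created file has only a data fork.
        print(resource_fork_size(tmp.name))
```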

The HFS+ filesystem does not use inodes in the sense that traditional UNIX filesystems do. (In fact, hard links are emulated through an elaborate procedure designed to produce the same behavior as traditional UNIX hard links.) As a result, a backup strategy based on scanning inodes is simply impossible.

The filesystem's B-tree structures could in principle be used to back up files, but in practice it is best to ignore the implementation and work through the API; this yields more portable and consistent results.
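Working through the API also covers hard links, even though HFS+ emulates them: a backup tool can identify linked files by their (st_dev, st_ino) pair and a link count above one, then store the data only once. This sketch assumes, as the emulation is designed to ensure, that linked files report the same pair through the API; hard_link_groups is a hypothetical helper.

```python
from __future__ import annotations

import os
import stat
import tempfile
from collections import defaultdict


def hard_link_groups(root: str) -> list[list[str]]:
    """Group the regular files under root that are hard links to one another."""
    groups: dict[tuple[int, int], list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # A link count above 1 marks a candidate; (device, inode number)
            # identifies which paths share the same underlying file.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")
        with open(first, "w") as f:
            f.write("shared data")
        os.link(first, os.path.join(tmp, "second.txt"))
        # A backup would store first.txt once and record second.txt as a link.
        print(hard_link_groups(tmp))
```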

Because of the differences between HFS+ and traditional UNIX filesystems (such as UFS, the Berkeley filesystem available in Mac OS X), special utilities must be used to create backups of Mac OS X systems and to restore them. Furthermore, non-Mac systems may not be able to copy or store files coming from a Mac without losing data. If you unpack an archive on another machine and then repack it from the resulting files, any data that can't be represented on that machine will be lost.

What Works on Mac OS X v10.4 Tiger

There are several commercial backup systems available for Mac OS X. Details of their compatibility and feature sets are beyond the scope of this article, but the vendors will probably be happy to answer questions. Be sure to ask about all the features you need; not every program compatible with Mac OS X fully supports its features.

There are a number of utilities distributed with Mac OS X which can back up files at least partially. The most widely known is ditto, a utility specific to Mac OS X, which has been resource-fork aware since it first appeared. (In versions of Mac OS X prior to 10.4, the -rsrc flag must be used for ditto to copy resource forks.) The ditto utility can be adapted to nearly any backup requirement, as it can copy whole file hierarchies, make archive files, or restore archive files. The archive files created by ditto can be in either PkZip or cpio format. Because ditto runs from the command line, backups that use it are easy to automate, and problems with them are easy to detect without constant supervision.
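Because ditto is a command-line tool, a scheduled job only has to assemble the right arguments and check the exit status. The Python sketch below does that; the helper names are hypothetical, and the flags (-c, -k, -x, --sequesterRsrc, --keepParent) are taken from the ditto(1) manual page, so confirm them with man ditto on the machines you actually back up.

```python
from __future__ import annotations

import subprocess


def ditto_archive_cmd(src: str, archive: str, pkzip: bool = True) -> list[str]:
    """Build a ditto command that archives the hierarchy at src into archive."""
    if pkzip:
        # -k selects PKZip; --sequesterRsrc keeps resource forks and HFS
        # metadata in a __MACOSX folder; --keepParent stores src's own name.
        return ["ditto", "-c", "-k", "--sequesterRsrc", "--keepParent", src, archive]
    # Without -k, ditto -c writes a cpio archive.
    return ["ditto", "-c", src, archive]


def ditto_extract_cmd(archive: str, dest: str, pkzip: bool = True) -> list[str]:
    """Build a ditto command that restores archive into dest."""
    if pkzip:
        return ["ditto", "-x", "-k", archive, dest]
    return ["ditto", "-x", archive, dest]


def run(cmd: list[str]) -> int:
    """Run cmd and return its exit status; non-zero means the step failed."""
    return subprocess.run(cmd).returncode
```

For example, a nightly job might call run(ditto_archive_cmd("/Users/me", "/Volumes/Backup/me.zip")) and send mail whenever the result is non-zero.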

Another option is tar. Starting with Mac OS X v10.4, tar recognizes resource forks and attributes and can correctly archive and restore them; on older versions of Mac OS X, tar preserves nothing beyond data forks and standard UNIX file permissions and ownership. The key weakness of tar is that it cannot handle arbitrarily long filenames or pathnames, so a deep directory tree can prevent it from storing file names correctly. (This is reported as an error.)
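A pre-flight check can find troublesome paths before a scheduled run. The exact limit depends on the archive format your tar writes; as one concrete case, the POSIX ustar header stores a pathname in a 100-byte name field plus an optional 155-byte prefix, split at a slash. The sketch below (hypothetical helper names) tests paths against that layout only, so treat it as an approximation of what your tar will accept.

```python
from __future__ import annotations

import os

NAME_MAX, PREFIX_MAX = 100, 155  # ustar name and prefix field sizes, in bytes


def fits_ustar(path: str) -> bool:
    """True if path fits the ustar name field, optionally split at a '/'."""
    raw = os.fsencode(path)
    if len(raw) <= NAME_MAX:
        return True
    # Otherwise try each '/' as the split between prefix and name.
    for i, byte in enumerate(raw):
        if byte == ord("/") and i <= PREFIX_MAX and 0 < len(raw) - i - 1 <= NAME_MAX:
            return True
    return False


def paths_too_long(root: str) -> list[str]:
    """Paths under root, named as tar would from root's parent, that don't fit."""
    root = os.path.abspath(root)
    parent = os.path.dirname(root)
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), parent)
            if not fits_ustar(rel):
                bad.append(rel)
    return sorted(bad)
```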

The pax utility is somewhat more flexible than tar, supporting multiple header formats. It also preserves extended attributes. With some of the alternative header formats, pax can support much deeper file hierarchies than tar can.
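pax can be driven the same way. The sketch below (hypothetical helper names) builds write and read commands; the format names are those listed in the pax(1) manual page, and "-p e" asks pax to preserve everything it can on extraction. Which format best suits a deep hierarchy is for you to test; the cpio-family sv4cpio format, for instance, records each pathname's length in its own header field rather than using tar's fixed-size name slots.

```python
from __future__ import annotations

import subprocess

# Header formats listed in the BSD pax(1) manual page.
PAX_FORMATS = {"cpio", "bcpio", "sv4cpio", "sv4crc", "tar", "ustar"}


def pax_write_cmd(archive: str, paths: list[str], fmt: str = "ustar") -> list[str]:
    """pax command that writes paths into archive using header format fmt."""
    if fmt not in PAX_FORMATS:
        raise ValueError(f"unknown pax format: {fmt}")
    # -w: write an archive; -x: header format; -f: archive file
    return ["pax", "-w", "-x", fmt, "-f", archive, *paths]


def pax_read_cmd(archive: str) -> list[str]:
    """pax command that extracts archive into the current directory."""
    # -r: read (extract); -p e: preserve owner, mode and times where possible
    return ["pax", "-r", "-p", "e", "-f", archive]


def run_in(cmd: list[str], cwd: str) -> int:
    """Run cmd from directory cwd (pax stores paths relative to it)."""
    return subprocess.run(cmd, cwd=cwd).returncode
```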

The rsync utility can make repeated backups of large file hierarchies much faster, because it skips files that have not changed since the previous run. To copy extended attributes with it, you must specify the -E flag. You can also use rsync across a network to make centralized backups.
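A sketch of an rsync invocation built from Python (rsync_cmd is a hypothetical helper, and the remote destination in the usage note is a made-up host). Note that -E is the extended-attribute flag in the rsync that ships with Mac OS X, as described above; rsync builds from elsewhere may give -E a different meaning (in rsync 3.x it means --executability), which is exactly the kind of substitution the Caveats section warns about.

```python
from __future__ import annotations

import subprocess


def rsync_cmd(src: str, dest: str) -> list[str]:
    """rsync command copying the contents of src into dest (local or user@host:/path)."""
    # -a: archive mode (recurse; keep permissions, times, symlinks, ...)
    # -E: also copy extended attributes (Apple's rsync; see the note above)
    # The trailing slash on src means "the contents of src", not src itself.
    return ["rsync", "-a", "-E", src.rstrip("/") + "/", dest]


def run(cmd: list[str]) -> int:
    """Run cmd; a non-zero exit status means the backup needs attention."""
    return subprocess.run(cmd).returncode
```

For a centralized backup, the destination is simply a remote path such as backup@server.example.com:/Backups/me.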

For backup quality and speed, it's hard to beat Apple Software Restore (ASR). This is the program used to build software restore CDs on HFS+ volumes. To use ASR, make an image of your disk using Disk Utility (use Create Image From Directory, not Create Image From Device); this backup can then be restored onto other disks, or even the same disk. ASR can restore in place, or by reformatting a disk and copying files onto it. In many cases the latter is much faster, but it erases any existing files on the target disk.

The disk images used by ASR are standard Mac disk images, and can be mounted on the desktop to copy specific files.

There is a command-line asr utility, which can be used in the same ways. Furthermore, you can specify an existing volume as a source, rather than a compressed image. If you use asr without the -erase command-line option, it will simply extract files in place. Note that ASR cannot work on any source or target which is not a volume; you can't copy files into a subdirectory with it.
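A sketch of the command-line form driven from Python, assuming the Tiger-era option syntax (-source, -target, -erase, -noprompt); later releases of Mac OS X moved asr to a verb-based syntax, so check man asr on your system, and note that asr generally needs to run as root. The helper names are hypothetical; require_volume encodes the rule above that ASR works only on volumes, using os.path.ismount as a rough test.

```python
from __future__ import annotations

import os
import subprocess


def require_volume(path: str) -> None:
    """Refuse anything that is not a mount point, since asr works on volumes only."""
    if not os.path.ismount(path):
        raise ValueError(f"{path} is not a mounted volume")


def asr_cmd(source: str, target: str, erase: bool = False) -> list[str]:
    """Tiger-style asr command restoring source (image or volume) onto target."""
    cmd = ["asr", "-source", source, "-target", target]
    if erase:
        # Reformat the target first (often faster, destroys its files);
        # -noprompt skips the confirmation so the job can run unattended.
        cmd += ["-erase", "-noprompt"]
    return cmd


def restore(source: str, target: str, erase: bool = False) -> int:
    """Check the target, run asr, and return its exit status."""
    require_volume(target)
    return subprocess.run(asr_cmd(source, target, erase)).returncode
```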

To quickly restore machines to a known-good configuration, you can also use ASR in server mode, where ASR will provide a multicast stream from a source image. Client systems cannot use this for an in-place restore; they must reformat the target volume. This is a good way to back up standard system configurations, but you will need to use something else for personal files.

Caveats

There are a number of things to keep in mind when developing a backup strategy. One of the most important is to test your backup strategy under a variety of circumstances, and the most obvious thing to test is whether you can actually restore files. A developer who wrote an archive utility as a recreational coding project observed that people are always so impressed with the quality of its backups that no one has ever asked him whether it can restore files. (It can't, yet.) Before you begin relying on your backup mechanism, test it by restoring files! Restore a broad variety of files that exercise all your possibilities: files with resource forks, files with ACLs (if you use them), and so on.
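One way to check a test restore is to compare checksums of the original and restored trees. This Python sketch (helper names hypothetical) hashes only each file's data fork with SHA-256, so it catches missing, extra, and altered files but says nothing about resource forks, attributes, or ACLs; those still need to be inspected separately, for example by opening restored files in the applications that use them.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def tree_digests(root: str) -> dict[str, str]:
    """Map each regular file under root (relative path) to a SHA-256 of its data."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symbolic links are not hashed here
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(original: str, restored: str) -> list[str]:
    """Describe files missing from, added to, or changed in the restored tree."""
    a, b = tree_digests(original), tree_digests(restored)
    problems = [f"missing: {p}" for p in sorted(a.keys() - b.keys())]
    problems += [f"unexpected: {p}" for p in sorted(b.keys() - a.keys())]
    problems += [f"changed: {p}" for p in sorted(a.keys() & b.keys()) if a[p] != b[p]]
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as orig, tempfile.TemporaryDirectory() as rest:
        for root, text in ((orig, "v1"), (rest, "v2")):
            with open(os.path.join(root, "report.txt"), "w") as f:
                f.write(text)
        print(compare_trees(orig, rest))  # ['changed: report.txt']
```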

Making backups of live filesystems gives you a significant chance of losing some data. Any strategy for backing up live filesystems should acknowledge this; if you can't afford that risk, make sure you have redundant storage to begin with. The risk is dramatically compounded when a single working data set comprises multiple files, since you can end up with copies of those files taken at different times, in mutually inconsistent states.

In many cases, moving files to non-Mac systems will lose all or part of the additional data associated with them. (In extreme cases, even filename capitalization may be lost, or any data beyond the first 2 GB or 4 GB of a single file.) ACLs are not particularly portable, and Mac-specific filesystem data, such as Finder flags or type and creator codes, tends to get lost in the conversion. Because of this, simply copying files to a server may not preserve them entirely.
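Before copying a tree to a non-Mac server, it can help to list the files most at risk of truncation. This sketch flags files above a size limit; the two constants are assumptions standing in for the 2 GB and 4 GB limits mentioned above (the largest sizes that signed and unsigned 32-bit fields can hold), and files_over is a hypothetical name. It does not detect the loss of Mac metadata, which needs a Mac-aware tool.

```python
from __future__ import annotations

import os
import tempfile

LIMIT_2GB = 2**31 - 1  # assumed limit: largest signed 32-bit file size
LIMIT_4GB = 2**32 - 1  # assumed limit: largest unsigned 32-bit file size


def files_over(root: str, limit_bytes: int) -> list[str]:
    """Regular files under root larger than limit_bytes, largest first."""
    big = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                size = os.path.getsize(full)
                if size > limit_bytes:
                    big.append((size, full))
    return [path for _size, path in sorted(big, reverse=True)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "small.bin"), "wb") as f:
            f.write(b"tiny")
        # Nothing here comes close to a 2 GB limit, so the list is empty.
        print(files_over(tmp, LIMIT_2GB))
```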

Be especially wary of side effects that changes elsewhere on a system can have on a previously working backup mechanism. If you don't retest your backups after a major change, you may lose data. For example, the fink software installer installs a new copy of tar (in /sw/bin) which does not handle resource forks, even though the one distributed with Mac OS X does. Be sure you know exactly which program your backup scripts run; invoking tools by absolute path, such as /usr/bin/tar, removes any doubt.
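Where a script does rely on a bare name such as tar, it can check what that name resolves to before trusting it. check_tool is a hypothetical helper built on Python's shutil.which; passing search_path lets you test the PATH a scheduled job will actually use, which may differ from your interactive shell's, and reveals when a directory such as /sw/bin has been placed ahead of /usr/bin.

```python
from __future__ import annotations

import os
import shutil
import tempfile


def check_tool(name: str, expected: str, search_path: str | None = None) -> tuple[bool, str | None]:
    """Return (ok, found): which program PATH lookup picks, and whether it's expected.

    search_path defaults to the PATH of the running process.
    """
    found = shutil.which(name, path=search_path)
    return found == expected, found


if __name__ == "__main__":
    # Simulate a PATH where another directory shadows /usr/bin.
    with tempfile.TemporaryDirectory() as fake_bin:
        fake_tar = os.path.join(fake_bin, "tar")
        with open(fake_tar, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(fake_tar, 0o755)
        search = os.pathsep.join([fake_bin, "/usr/bin", "/bin"])
        ok, found = check_tool("tar", "/usr/bin/tar", search)
        if not ok:
            print(f"warning: tar resolves to {found}, not /usr/bin/tar")
```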

For More Information

For more information, see the Mac OS X Server Command-Line Administration (PDF) documentation on the Mac OS X Server resources page in the ADC Reference Library.

Posted: 2005-10-29