
Backing up your hard drive is like going to the dentist: important, but
unpleasant enough that it is easy to put off or forget. Because of that, the
best way to keep your data safe is to schedule regular backups. While there are
numerous commercial applications that address this need (including Apple's .Mac
Backup), they tend either to require the user to be logged in or to require a
connection to an enterprise data center.
The traditional UNIX alternative is to run a command-line backup tool as part
of a 'cron' script, but many system administrators are unaware of the variety of
options available for unattended backups under Mac OS X.
This article discusses some of the issues involved in making backups, the
special considerations of Mac OS X systems, a number of tools that can create
viable backups, and a few potential pitfalls in designing a backup system. This
article does not cover BSD flags, or systems prior to Mac OS X v10.4 Tiger. For
simplicity, it also assumes that you have not turned on ACLs (using Mac OS X
Server or fsaclctl) on the filesystem you want to back up. There are some
subtleties in how these tools interact with ACLs that are beyond the scope of
this article; please consult the latest documentation for each tool for more
details.
Some History, Including dump and tar
The essential goal of a backup is to reproduce, as accurately as possible, the
state of a filesystem. In most cases it is not enough to duplicate the contents
of files; the filesystem hierarchy, permissions, dates, and other metadata must
also be preserved. The two major historical UNIX backup programs, dump and tar,
took very different approaches to this. The dump utility reads the underlying
filesystem structures directly, producing an exact image of the contents of the
filesystem. In traditional UNIX implementations, dump scans the disk's inode
tables to build a complete list of the files on the disk and their filesystem
data structures, then scans all of the directories, and finally scans the
contents of the files. This approach depends heavily on the on-disk structure
of UNIX filesystems, which the HFS+ filesystem does not share.
The tar program, by contrast, works through the filesystem API to build an
archive of a directory tree and the files in it. This approach is simpler and
more portable, but the tar file format does not directly address Mac OS X
features such as resource forks or filesystem metadata.
In either case, reliable backups generally require a quiescent filesystem,
that is, a filesystem that is not being actively modified while the backup
is being made. For instance, if you move a file from one directory to another
during a tar backup, it may show up in both directories in the backup or,
worse, in neither. Other errors, both subtle and obvious, can also occur. If a
large file is being rewritten during a backup, the backup may end up with a
truncated copy of the file, or with an inconsistent copy containing some chunks
of the original file and some of the modified one. This can result in
catastrophic corruption and data loss.
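A backup script can detect this problem, though it cannot prevent it. One way is to compare a file's size and modification time before and after copying it, and to treat the copy as suspect if either has changed. The following Python sketch does this with `shutil.copyfile`. That call copies only the data fork, so the sketch illustrates the consistency check rather than a complete Mac OS X backup. The function name `copy_and_check` is hypothetical.

```python
import os
import shutil


def copy_and_check(src: str, dst: str) -> bool:
    """Copy src to dst; return False if src appears to have changed mid-copy.

    Only the data fork is copied (shutil.copyfile uses the portable API),
    so this illustrates the consistency check, not a full Mac backup.
    """
    before = os.stat(src)
    shutil.copyfile(src, dst)
    after = os.stat(src)
    # A change in size or modification time means the copy may mix old and
    # new contents; the caller should retry the copy or flag it as suspect.
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

This check is heuristic. A file that is rewritten within the timestamp's granularity and ends up the same size would slip through.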
Unfortunately, modern systems are often in use 24/7, making it impossible to
schedule true downtime for backups. In that case, an aggressive backup strategy
becomes especially important, because any given backup is much more likely to
be corrupt in some way. It may also be useful to consider backup strategies
that do not operate at the filesystem level; for instance, exporting all of the
tables from a database with the database's own tools is often guaranteed to
produce a consistent image of that database, which can then be safely copied.
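One part of an aggressive strategy is keeping several generations of backups, so that a single corrupt backup does not leave you without a usable copy. Below is a minimal Python sketch of generation rotation under a backup directory. The `backup-YYYYMMDD-HHMMSS` naming scheme and both function names are hypothetical.

```python
import os
import shutil
import time
from typing import List, Optional

PREFIX = "backup-"  # hypothetical naming scheme: backup-YYYYMMDD-HHMMSS


def new_generation(root: str, stamp: Optional[str] = None) -> str:
    """Create and return an empty, timestamped directory for the next backup."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(root, PREFIX + stamp)
    os.makedirs(path)
    return path


def prune_generations(root: str, keep: int) -> List[str]:
    """Delete all but the newest `keep` generations; return the names removed.

    Call this only after the newest backup has been written and checked.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Zero-padded timestamps sort chronologically, oldest first.
    gens = sorted(n for n in os.listdir(root) if n.startswith(PREFIX))
    doomed = gens[:-keep]
    for name in doomed:
        shutil.rmtree(os.path.join(root, name))
    return doomed
```

Pruning is a separate step by design. Old generations are removed only once a new one is known to be good, never while it is still being written.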
HFS/HFS+ Considerations
The HFS+ filesystem used on Mac OS X has a number of features and traits quite
unlike those of a traditional UNIX filesystem. Quite simply, a backup program
written only in terms of the portable UNIX filesystem API cannot create
complete, reliable backups of an HFS+ filesystem.
The first example everyone points to is resource forks. Programs using the
traditional UNIX API simply ignore resource forks, because opening a file
through that API yields only its data fork. However, there are other issues that
can pose significant challenges to a backup strategy. File attributes (Finder
flags, date stamps, and file type and creator codes, as set by SetFile and
obtained by GetFileInfo) are additional data that traditional UNIX backup
programs have no way to read, let alone store. Collectively, these are referred
to hereafter as "extended attributes".
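The data fork and the resource fork really are separate streams. HFS+ on Mac OS X exposes a file's resource fork through the special path suffix `/..namedfork/rsrc`. A program can therefore reach it with ordinary file calls, but only if it knows to look there. This minimal Python sketch reports a resource fork's size. It returns None wherever that path is not available, such as on non-Mac systems. The function name is hypothetical.

```python
import os
from typing import Optional


def resource_fork_size(path: str) -> Optional[int]:
    """Return the size of path's resource fork, or None if it can't be reached.

    On HFS+ the resource fork is exposed at "<file>/..namedfork/rsrc";
    a plain open(path) sees only the data fork.
    """
    try:
        return os.path.getsize(os.path.join(path, "..namedfork", "rsrc"))
    except OSError:
        # Non-Mac systems (or filesystems without forks) end up here.
        return None
```

A backup tool that only ever calls `open(path)` never visits that second stream, which is why such tools silently drop resource forks.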
The HFS+ filesystem does not use inodes in the sense that traditional UNIX
filesystems do. (In fact, hard links are emulated through an elaborate
procedure designed to produce the same behavior as traditional UNIX hard
links.) As a result, a backup strategy based on scanning inodes is simply
impossible.
The filesystem's B-tree structures themselves offer a possible way to back up
files, but in practice it is best to ignore the implementation and focus on the
API, which gives more portable and consistent results.
Because of the differences between HFS+ and traditional UNIX filesystems (such
as UFS, the Berkeley filesystem available in Mac OS X), special utilities must
be used to create backups of Mac OS X systems and to restore them. Furthermore,
non-Mac systems may not be able to copy or store files coming from a Mac system
intact. If you unpack an archive on another machine and then repack it from the
resulting files, any data that can't be represented on that machine will be
lost.
What Works on Mac OS X v10.4 Tiger
There are several commercial backup systems available for Mac OS X. Details
of their compatibility and feature sets are beyond the scope of this article,
but the vendors will probably be happy to answer questions. Be sure to ask
about all the features you need; not every program compatible with Mac OS X
fully supports its features.
A number of utilities distributed with Mac OS X can back up files, at least
partially. The most widely known is ditto, a utility specific to Mac OS X, which
has been resource-fork aware since it first appeared. (In versions of Mac OS X
prior to 10.4, the -rsrc flag must be used for ditto to copy resource forks.)
The ditto utility can be adapted to nearly any backup requirement, as it can
copy whole file hierarchies, create archive files, or restore archive files.
The archive files created by ditto can be in either PKZip or cpio format.
Because ditto runs from the command line, backups using it are easy to automate,
and problems with them are easy to detect without constant supervision.
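Because ditto is a command-line tool, a backup script can simply build its argument list, run it, and check the exit status. This Python sketch assumes Mac OS X 10.4's `/usr/bin/ditto` and its `-c`, `-k`, `--sequesterRsrc`, and `--keepParent` options; verify these against ditto(1) on your system. The function names are hypothetical, and the paths in the tests are placeholders.

```python
import subprocess
from typing import List


def ditto_zip_command(source: str, archive: str) -> List[str]:
    """Build a ditto command that archives source into a PKZip file."""
    return [
        "/usr/bin/ditto",
        "-c",               # create an archive rather than copying a tree
        "-k",               # use PKZip format instead of the default cpio
        "--sequesterRsrc",  # keep resource forks and HFS metadata in __MACOSX
        "--keepParent",     # store the source directory's own name in the archive
        source,
        archive,
    ]


def run_backup(cmd: List[str]) -> int:
    """Run a backup command and return its exit status (nonzero = problem)."""
    return subprocess.run(cmd).returncode
```

An unattended job, for example one started from cron, can then report or mail any nonzero status instead of relying on someone to watch it.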
Another option is tar. Starting in Mac OS X 10.4, tar recognizes resource forks
and extended attributes, and can correctly archive and restore them. On older
versions of Mac OS X, tar does not preserve any data other than data forks and
standard UNIX file permissions and ownership. The key weakness of tar is that it
cannot handle arbitrarily long filenames or pathnames; a deep directory tree can
prevent it from storing file names correctly. (This will be reported as an
error.)
The pax utility is somewhat more flexible than tar, supporting multiple header
formats. It also preserves extended attributes. With some of the alternative
header formats, pax can support much deeper file hierarchies than tar can.
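Path-length limits like these come from the archive header format. The POSIX ustar header stores a path as a name of at most 100 bytes plus a prefix of at most 155 bytes, split at a "/". A longer or unsplittable path simply cannot be represented. The following sketch illustrates this with Python's standard `tarfile` module, not Apple's tar or pax. The helper name `fits_ustar` is hypothetical.

```python
import tarfile


def fits_ustar(path: str) -> bool:
    """Return True if path can be stored in a POSIX ustar header.

    ustar splits a path into a prefix (up to 155 bytes) and a name
    (up to 100 bytes) at a "/" boundary; anything longer is rejected.
    """
    try:
        tarfile.TarInfo(path).tobuf(format=tarfile.USTAR_FORMAT)
        return True
    except ValueError:
        return False
```

Formats with extended headers (Python's `tarfile.PAX_FORMAT`, for instance) lift this limit. Which header formats a particular tar or pax binary offers is best checked in its man page.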
The rsync utility can make backups of large file hierarchies substantially
faster, because after the first run it transfers only what has changed. To copy
extended attributes with it, you must specify the -E flag. You can also use
rsync across a network to make centralized backups.
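A backup script might build the rsync command as in the Python sketch below. It assumes Mac OS X 10.4's `/usr/bin/rsync` with Apple's `-E` option. The destination host and paths in the tests are hypothetical.

```python
from typing import List


def rsync_command(source_dir: str, destination: str) -> List[str]:
    """Build an rsync command for a mirror-style backup on Mac OS X 10.4.

    -a recurses and preserves permissions, ownership, times, and symlinks;
    -E (Apple's rsync) also copies extended attributes and resource forks.
    """
    # A trailing slash makes rsync copy the directory's contents rather
    # than creating a nested directory of the same name at the destination.
    if not source_dir.endswith("/"):
        source_dir += "/"
    return ["/usr/bin/rsync", "-a", "-E", source_dir, destination]
```

Adding rsync's `--delete` option turns this into a true mirror by removing files that have disappeared from the source. It also propagates accidental deletions, however, so it is best combined with keeping older backups.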
For backup quality and speed, it's hard to beat Apple Software Restore (ASR).
This is the program used to build software restore CDs, and it operates on HFS+
volumes. To use ASR, make an image of your disk using Disk Utility (use Create
Image From Directory, not Create Image From Device); this backup can then be
restored onto other disks, or even onto the same disk. ASR can restore in place,
or by reformatting a disk and copying files onto it. In many cases the latter is
much faster, but of course it erases any existing files on the disk.
The disk images used by ASR are standard Mac disk images, and can be mounted
on the desktop to copy specific files.
There is also a command-line asr utility, which can be used in the same ways.
Furthermore, with asr you can specify an existing volume as the source, rather
than a compressed image. If you use asr without the -erase command-line option,
it will simply extract files in place. Note that ASR cannot work on any source
or target that is not a volume or a volume image; you can't copy files into a
subdirectory with it.
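Because asr works only with whole volumes, a wrapper script can refuse obviously wrong targets before running it. This Python sketch assumes `/usr/sbin/asr` with the `-source`, `-target`, and `-erase` options. That is the older asr syntax, so check asr(8) on your system, and note that asr generally must be run as root. The function name is hypothetical, and `os.path.ismount` serves only as a rough test that the target is a mounted volume rather than an ordinary directory.

```python
import os
from typing import List


def asr_restore_command(image: str, target_volume: str, erase: bool) -> List[str]:
    """Build an asr command that restores image onto target_volume.

    ASR works only on whole volumes, so refuse targets that are not
    mount points (for example, an ordinary subdirectory).
    """
    if not os.path.ismount(target_volume):
        raise ValueError(target_volume + " is not a mounted volume")
    cmd = ["/usr/sbin/asr", "-source", image, "-target", target_volume]
    if erase:
        cmd.append("-erase")  # reformat the target first; destroys its contents
    return cmd
```

Depending on the asr version, an image may also need to be scanned with asr's imagescan operation before it can be restored.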
To quickly restore machines to a known-good configuration, you can also use
ASR in server mode, where ASR will provide a multicast stream from a source
image. Client systems cannot use this for an in-place restore; they must
reformat the target volume. This is a good way to back up standard system
configurations, but you will need to use something else for personal files.
Caveats
There are a number of things to keep in mind when developing a backup
strategy. One of the most important is to test your backup strategy under a
variety of circumstances, and the most obvious thing to test is whether you can
actually restore files. A developer who wrote an archiving utility as a
recreational coding project observed that people are always so impressed with
the quality of its backups that no one has ever asked him whether it can
restore files. (It can't, yet.) Before you begin relying on your backup
mechanism, test it by restoring files! Restore a broad variety of files that
exercise all of your possibilities: files with resource forks, files with ACLs
(if you use them), and so on.
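A test restore is only useful if you check the result. One simple check, sketched here in Python with hypothetical function names, is to restore into a scratch directory and compare SHA-256 digests of every file's data against the original tree. It compares data forks only; resource forks, Finder information, and ACLs would need separate checks.

```python
import hashlib
import os
from typing import Dict, List


def tree_digests(root: str) -> Dict[str, str]:
    """Map each regular file's path (relative to root) to a SHA-256 of its data."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(original: str, restored: str) -> Dict[str, List[str]]:
    """Report files missing, extra, or different after a test restore."""
    a, b = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(set(a) - set(b)),
        "extra": sorted(set(b) - set(a)),
        "changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```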
Making backups on live filesystems gives you a significant chance of losing
some data. Any strategy for backing up live filesystems should acknowledge
this; if you can't afford that risk, make sure you have redundant storage to
begin with. The risk is dramatically compounded when a single working data set
comprises multiple files; the backup can end up with copies of two files taken
at different times, in mutually inconsistent states.
In many cases, moving files to non-Mac systems will lose all or part of the
additional data associated with them. (In extreme cases, even the capitalization
of filenames may be lost, or any data past 2 GB or 4 GB in a single file!) ACLs
are not particularly portable, and Mac-specific filesystem data, such as Finder
flags or type and creator codes, tends to get lost in the conversion. Because of
this, simply copying files over to a server may not preserve them entirely.
Be especially wary of side effects that changes to your system can have on a
previously working backup mechanism. If you don't retest your backups after
changing a major filesystem feature or installing new software, you may lose
data. For example, the Fink software installer installs a new copy of tar (in
/sw/bin) that does not handle resource forks, even though the one distributed
with Mac OS X does. Be sure you know exactly which program you're specifying.
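One way to know exactly which program you're running is to resolve its name the same way the shell does and compare the result with the system copy. This Python sketch uses `shutil.which` for that. The table of expected locations and the function name are assumptions for illustration.

```python
import shutil
from typing import Optional, Tuple

# Assumed locations of the Mac OS X system tools; adjust for your systems.
EXPECTED = {"tar": "/usr/bin/tar", "rsync": "/usr/bin/rsync", "ditto": "/usr/bin/ditto"}


def check_tool(name: str, search_path: Optional[str] = None) -> Tuple[Optional[str], bool]:
    """Return (resolved path, ok), where ok means the system copy would run.

    search_path defaults to $PATH, exactly as the shell would search it.
    """
    found = shutil.which(name, path=search_path)
    expected = EXPECTED.get(name)
    ok = found is not None and (expected is None or found == expected)
    return found, ok
```

Calling tools by absolute path (such as `/usr/bin/tar`) in backup scripts avoids the ambiguity altogether. A check like this is still useful for catching interactive shells and cron environments whose PATH puts `/sw/bin` first.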
For More Information
For more information, see the Mac OS X Server
Command-Line Administration (PDF) documentation on the Mac OS X Server
resources page in the ADC Reference Library.
Posted: 2005-10-29