This Technote
describes the on-disk format for an HFS Plus volume. It does
not describe any programming interfaces for HFS Plus
volumes.
This technote is directed at developers who need to work
with HFS Plus at a very low level, below the abstraction
provided by the File Manager programming interface. This
includes developers of disk recovery utilities and
programmers implementing HFS Plus support on other
platforms.
This technote assumes that you have a conceptual
understanding of the HFS volume format, as described in
Inside
Macintosh: Files.
[Mar 05, 2004]
|
HFS Plus Basics
HFS Plus is a volume format for Mac OS. HFS
Plus was introduced with Mac
OS 8.1. HFS Plus is architecturally very similar to
HFS, although there have been a number of changes. The
following table summarizes the important differences.
Table 1 HFS and HFS Plus Compared
Feature | HFS | HFS Plus | Benefit/Comment
User visible name | Mac OS Standard | Mac OS Extended |
Number of allocation blocks | 16 bits worth | 32 bits worth | Radical decrease in disk space used on large volumes, and a larger number of files per volume.
Long file names | 31 characters | 255 characters | Obvious user benefit; also improves cross-platform compatibility.
File name encoding | MacRoman | Unicode | Allows for international-friendly file names, including mixed script names.
File/folder attributes | Support for fixed size attributes (FileInfo and ExtendedFileInfo) | Allows for future meta-data extensions | Future systems may use metadata for a richer Finder experience.
OS startup support | System Folder ID | Also supports a dedicated startup file | May help non-Mac OS systems to boot from HFS Plus volumes.
Catalog node size | 512 bytes | 4 KB | Maintains efficiency in the face of the other changes. (The larger catalog node size is due to the much longer file names [512 bytes as opposed to 32 bytes] and the larger catalog records [more/larger fields].)
Maximum file size | 2^31 bytes | 2^63 bytes | Obvious user benefit, especially for multimedia content creators.
The extent to which these HFS Plus features are available
through a programming interface is OS dependent. Mac OS versions
prior to 9.0 do not provide programming interfaces for any
HFS Plus-specific features.
To summarize, the key goals that guided the design of the
HFS Plus volume format were:
- efficient use of disk space,
- international-friendly file names,
- future support for named forks, and
- ease booting on non-Mac OS operating systems.
The following sections describe these goals, and the
differences between HFS and HFS Plus required to meet these
goals.
Efficient Use of Disk Space
HFS divides the total space on a volume into equal-sized
pieces called allocation blocks. It uses 16-bit fields to
identify a particular allocation block, so there must be
fewer than 2^16 (65,536) allocation blocks on an
HFS volume. The size of an allocation block is typically the
smallest multiple of 512 such that there are fewer than
65,536 allocation blocks on the volume (i.e., the volume
size divided by 65,535, rounded up to a multiple of 512).
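For illustration only, a minimal sketch of this calculation (the helper name is an assumption, not part of any Apple header; real implementations may choose block sizes differently):
    #include <stdint.h>

    /* Hypothetical helper: typical HFS allocation block size for a volume
       of 'volumeSize' bytes -- the volume size divided by 65,535, rounded
       up to a multiple of 512, so the block count stays below 65,536. */
    static uint32_t HFSTypicalAllocationBlockSize(uint64_t volumeSize)
    {
        uint64_t raw = (volumeSize + 65534) / 65535;   /* ceil(volumeSize / 65535) */
        uint64_t rounded = ((raw + 511) / 512) * 512;  /* round up to a multiple of 512 */
        return (uint32_t)(rounded < 512 ? 512 : rounded);
    }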
Any non-empty fork must occupy an integral number of
allocation blocks. This means that the amount of space
occupied by a fork is rounded up to a multiple of the
allocation block size. As volumes (and therefore allocation
blocks) get bigger, the amount of allocated but unused space
increases.
HFS Plus uses 32-bit values to identify allocation
blocks. This allows up to 2^32 (4,294,967,296)
allocation blocks on a volume. More allocation blocks means
a smaller allocation block size, especially on volumes of 1
GB or larger, which in turn means less average wasted space
(the fraction of an allocation block at the end of a fork,
where the entire allocation block is not actually used). It
also means you can have more files, since the available
space can be more finely distributed among a larger number
of files. This change is especially beneficial if the volume
contains a large number of small files.
International-Friendly File Names
HFS uses 31-byte strings to store file names. HFS does
not store any kind of script information with the file name
to indicate how it should be interpreted. File names are
compared and sorted using a routine that assumes a Roman
script, wreaking havoc for names that use some other script
(such as Japanese). Worse, this algorithm is buggy, even for
Roman scripts. The Finder and other applications interpret
the file name based on the script system in use at runtime.
Note:
The problem with using non-Roman scripts in an HFS
file name is that HFS compares file names in a case-
insensitive fashion. The case-insensitive
comparison algorithm assumes a MacRoman encoding.
When presented with non-Roman text, this algorithm
fails in strange ways. The upshot is that HFS
decides that certain non-Roman file names are
duplicates of other file names, even though they
are not duplicates in the source encoding.
|
HFS Plus uses up to 255 Unicode characters to store file
names. Allowing up to 255 characters makes it easier to have
very descriptive names. Long names are especially useful
when the name is computer-generated (such as Java class
names).
The HFS catalog B-tree uses 512-byte nodes. An HFS Plus
file name can occupy up to 512 bytes (including the length
field). Since a B-tree index node must store at least two
keys (plus pointers and node descriptor), the HFS Plus
catalog must use a larger node size. The typical node size
for an HFS Plus catalog B-tree is 4 KB.
In the HFS catalog B-tree, the keys stored in an index
node always occupy a fixed amount of space, the maximum key
size. In HFS Plus, the keys in an index node may occupy a
variable amount of space determined by the actual size of
the key. This allows for less wasted space in index nodes
and creates, on typical disks, a substantially larger
branching factor in the tree (requiring fewer node accesses
to find any given record).
Future Support for Named Forks
Files on an HFS volume have two forks: a data fork and a
resource fork, either of which may be empty (zero length).
Files and directories also contain a small amount of
additional information (known as catalog information or
metadata) such as the modification date or Finder info.
Apple software teams and third-party developers often
need to store information associated with particular files
and directories. In some cases (for example, custom icons for
files), the data or resource fork is appropriate. But in
other cases (for example, custom icons for directories, or File
Sharing access privileges), using the data or resource fork
is not appropriate or not possible.
A number of products have implemented special-purpose
solutions for storing their file- and directory-related
data. But because these are not managed by the file system,
they can become inconsistent with the file and directory
structure.
HFS Plus has an attribute file, another B-tree, that can
be used to store additional information for a file or
directory. Since it is part of the volume format, this
information can be kept with the file or directory as it is
moved or renamed, and can be deleted when the file or
directory is deleted. The contents of the attribute file's
records have not been fully defined yet, but the goal is to
provide an arbitrary number of forks, identified by Unicode
names, for any file or directory.
Note:
Because the attributes file has not been fully
defined yet, current implementations are unable to
delete named forks when a file or directory is
deleted. Future implementations that properly
delete named forks will need to check for these
orphaned named forks and delete them when the
volume is mounted. The
lastMountedVersion field of the volume
header can be used to detect when such a check
needs to take place.
Whenever possible, an application should delete
named forks rather than orphan them.
|
Easy Startup of Alternative Operating Systems
HFS Plus defines a special startup file,
an unstructured fork that can be found easily during system
startup. The location and size of the startup file is
described in the volume header.
The startup file is especially useful on systems that don't
have HFS or HFS Plus support in ROM. In many respects, the
startup file is a generalization of the HFS boot blocks, one
that provides a much larger, variable-sized amount of
storage.
Core Concepts
HFS Plus uses a number of interrelated structures to
manage the organization of data on the volume. These
structures include:
- the volume header,
- the allocation file,
- the catalog file,
- the extents overflow file,
- the attributes file, and
- the startup file.
Each of these complex structures is described in its own
section. The purpose of this section is to give an overview
of the volume format, describe how the structures fit
together, and define the primitive data types used by HFS
Plus.
Terminology
HFS Plus is a specification of how a volume (files
that contain user data, along with the structure to retrieve
that data) exists on a disk (the medium on which user
data is stored). The storage space on a disk is divided into
units called sectors. A sector is the smallest
part of a disk that the disk's driver software will read or write
in a single operation (without having to read or write additional
data before or after the requested data). The size of a sector is
usually based on the way the data is physically laid out on the disk.
For hard disks, sectors are typically 512 bytes. For optical media,
sectors are typically 2048 bytes.
Most of the data structures on an HFS Plus volume do not
depend on the size of a sector, with the exception of the
journal. Because the journal does rely
on accessing individual sectors, the sector size is stored
in the jhdr_size field of the
journal header (if the
volume has a journal).
HFS Plus allocates space in units called allocation
blocks; an allocation block is simply a group of
consecutive bytes. The size (in bytes) of an allocation
block is a power of two, greater than or equal to 512, which
is set when the volume is initialized. This value cannot be
easily changed without reinitializing the volume. Allocation
blocks are identified by a 32-bit allocation block
number, so there can be at most 2^32
allocation blocks on a volume. Current implementations of the
file system are optimized for 4K allocation blocks.
Note:
For the best performance, the allocation block size should
be a multiple of the sector size. If the
volume has an HFS wrapper, the
wrapper's allocation block size and allocation block start
should also be multiples of the sector size to
allow the best performance.
|
All of the volume's structures, including the volume
header, are part of one or more allocation blocks (with the possible
exception of the alternate volume header, discussed
below). This differs from HFS,
which has several structures (including the boot blocks, master
directory block, and bitmap) which are not part of any
allocation block.
To promote file contiguity and avoid
fragmentation, disk space is typically allocated to files in
groups of allocation blocks, or clumps. The clump size
is always a multiple of the allocation block size. The default
clump size is specified in the volume header.
IMPORTANT:
The actual algorithm used to extend files is not part
of this specification. The implementation is not
required to act on the clump values in the volume
header; it merely provides space to store those
values.
|
Note:
The current non-contiguous algorithm in Mac OS will
begin allocating at the next free block it finds.
It will extend its allocation up to a multiple of
the clump size if there is sufficient free space
contiguous with the end of the requested
allocation. Space is not allocated in contiguous
clump-sized pieces.
|
Every HFS Plus volume must have a volume header.
The volume header contains sundry information about the
volume, such as the date and time of the volume's creation
and the number of files on the volume, as well as the
location of the other key structures on the volume. The
volume header is always located at 1024 bytes from the
start of the volume.
A copy of the volume header, known as the alternate
volume header, is stored starting at 1024 bytes before
the end of the volume. The first 1024 bytes of the volume
(before the volume header), and the last 512 bytes of the
volume (after the alternate volume header) are
reserved.
All of the allocation blocks containing the volume header,
alternate volume header, or the reserved areas before the
volume header or after the alternate volume header, are marked
as used in the allocation file.
The actual number of allocation blocks marked this way depends
on the allocation block size.
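As a rough sketch of that bookkeeping (the function name and out-parameters are assumptions), the number of blocks to reserve can be computed from the allocation block size alone:
    #include <stdint.h>

    /* Hypothetical sketch: how many allocation blocks at the start and end
       of a volume must be marked used in the allocation file, per the rules
       described above. 'blockSize' is a power of two, >= 512. */
    static void HFSPlusReservedBlocks(uint32_t blockSize,
                                      uint32_t *startBlocks,
                                      uint32_t *endBlocks)
    {
        /* First 1024 reserved bytes plus the 512-byte volume header = 1536 bytes. */
        *startBlocks = (1536 + blockSize - 1) / blockSize;

        /* Alternate volume header (512 bytes) plus the last 512 reserved bytes.
           With 512-byte allocation blocks this spans two blocks; otherwise one. */
        *endBlocks = (blockSize == 512) ? 2 : 1;
    }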
An HFS Plus volume contains five special files,
which store the file system structures required to access
the file system payload: folders, user files, and
attributes. The special files are the catalog file, the
extents overflow file, the allocation file, the attributes
file and the startup file. Special files only have a single
fork (the data fork) and the extents of that fork are
described in the volume header.
The catalog file is a special file that describes
the folder and file hierarchy on a volume. The catalog file
contains vital information about all the files and folders
on a volume, as well as the catalog information for
the files and folders that are stored in the catalog file.
The catalog file is organized as a B-tree (or "balanced
tree") to allow quick and efficient searches through a large
folder hierarchy.
The catalog file stores the file and folder names, which
consist of up to 255 Unicode characters, as described
below.
Note:
The B-Trees section contains
an in-depth description of the B-trees used by HFS
Plus. |
The attributes file is another special file which
contains additional data for a file or folder. Like the
catalog file, the attributes file is organized as a B-tree.
In the future, it will be used to store information about
additional forks. (This is similar to the way the catalog
file stores information about the data and resource forks of
a file.)
HFS Plus tracks which allocation blocks belong to a fork
by maintaining a list of the fork's extents. An
extent is a contiguous range of allocation blocks
allocated to some fork, represented by a pair of numbers:
the first allocation block number and the number of
allocation blocks. For a user file, the first eight extents
of each fork are stored in the volume's catalog file. Any
additional extents are stored in the extents overflow
file, which is also organized as a B-tree.
The extents overflow file also stores additional extents
for the special files except for the extents overflow file
itself. However, if the startup file requires more than the
eight extents in the Volume Header (and thus requires
additional extents in the extents overflow file), it would
be much harder to access, and defeat the purpose of the
startup file. So, in practice, a startup file should be
allocated such that it doesn't need additional extents in
the extents overflow file.
The allocation file is a special file which
specifies whether an allocation block is used or free. This
performs the same role as the HFS volume bitmap, although
making it a file adds flexibility to the volume format.
The startup file is another special file which
facilitates booting of non-Mac OS computers from HFS Plus
volumes.
Finally, the bad block file prevents the volume
from using certain allocation blocks because the portion of
the media that stores those blocks is defective. The bad
block file is neither a special file nor a user file; this
is merely a convention used in the extents overflow file. See
Bad Block File for more details.
Broad Structure
The bulk of an HFS Plus volume consists of seven types of
information or areas:
- user file forks,
- the allocation file (bitmap),
- the catalog file,
- the extents overflow file,
- the attributes file,
- the startup file, and
- unused space.
The general structure of an HFS Plus volume is
illustrated in Figure 1.
Figure 1. Organization of an HFS Plus
Volume.
The volume header is always at a fixed
location (1024 bytes from the start of the volume).
However, the special files can appear anywhere between
the volume header block and the alternate volume header
block. These files can appear in any order and are not
necessarily contiguous.
The information on HFS Plus volumes (with the possible
exception of the alternate volume header, as discussed
below) is organized solely
in allocation blocks. Allocation blocks are simply a means
of grouping space on the media into convenient parcels.
The size of an allocation block is a power of two,
and at least 512. The allocation block size is a volume
header parameter whose value is set when the volume is
initialized; it cannot be changed easily without
reinitializing the volume.
Note:
The allocation block size is a classic
speed-versus-space tradeoff. Increasing the
allocation block size decreases the size of the
allocation file, and often reduces the number of
separate extents that must be manipulated for every
file. It also tends to increase the average size of
a disk I/O, which decreases overhead. Decreasing
the allocation block size reduces the average
number of wasted bytes per file, making more
efficient use of the volume's space.
|
WARNING:
While HFS Plus disks with an allocation block size
smaller than 4 KB are legal, DTS recommends that
you always use a minimum 4 KB allocation block
size. Disks with a smaller allocation block size
will be markedly slower when used on systems that
do 4 KB clustered I/O, such as Mac OS X Server.
|
Primitive Data Types
This section describes the primitive data types used on
an HFS Plus volume. All data structures in this document are
defined in the C language. The specification assumes that
the compiler will not insert any padding fields. Any
necessary padding fields are explicitly declared.
IMPORTANT:
The HFS Plus volume format is largely derived from
the HFS volume format. When defining the new
format, it was decided to remove unused fields
(primarily legacy MFS fields) and arrange all the
remaining fields so that similar fields were
grouped together and that all fields had proper
alignment (using PowerPC alignment rules).
|
Reserved and Pad
Fields
In many places this specification describes a field, or
bit within a field, as reserved. This has a definite
meaning, namely:
- When creating a structure with a reserved field, an
implementation must set the field to zero.
- When reading existing structures, an implementation
must ignore any value in the field.
- When modifying a structure with a reserved field, an
implementation must preserve the value of the reserved
field.
This definition allows for backward-compatible
enhancements to the volume format.
Pad fields have exactly the same semantics as a reserved
field. The different name merely reflects the designer's
goals when including the field, not the behavior of the
implementation.
Integer Types
All integer values are defined by one of the following
primitive types: UInt8 , SInt8 ,
UInt16 , SInt16 ,
UInt32 , SInt32 ,
UInt64 , and SInt64 . These
represent unsigned and signed (2's complement) 8-bit,
16-bit, 32-bit, and 64-bit numbers.
All multi-byte integer values are stored in big-endian
format. That is, the bytes are stored in order from most
significant byte through least significant byte, in
consecutive bytes, at increasing offset from the start of a
block. Bits are numbered from 0 to n-1 (for the n-bit types
UIntn and SIntn), with bit 0 being the least
significant bit.
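Since all multi-byte values on disk are big-endian, an implementation on a little-endian host must swap bytes as it reads them. A minimal sketch, with the helper name being an assumption:
    #include <stdint.h>

    /* Hypothetical helper: read a big-endian 32-bit integer from a byte
       buffer, independent of the host CPU's byte order. */
    static uint32_t ReadBigEndianUInt32(const uint8_t *p)
    {
        return ((uint32_t)p[0] << 24) |
               ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |
               (uint32_t)p[3];
    }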
HFS Plus Names
File and folder names on HFS Plus consist of up to 255
Unicode characters with a preceding 16-bit length, defined
by the type HFSUniStr255 .
struct HFSUniStr255 {
UInt16 length;
UniChar unicode[255];
};
typedef struct HFSUniStr255 HFSUniStr255;
typedef const HFSUniStr255 *ConstHFSUniStr255Param;
|
UniChar is a UInt16 that
represents a character as defined in the Unicode character
set defined by The Unicode Standard, Version 2.0
[Unicode, Inc. ISBN 0-201-48345-9].
HFS Plus stores strings fully decomposed and in canonical
order. HFS Plus compares strings in a case-insensitive
fashion. Strings may contain Unicode characters that must
be ignored by this comparison. For more details on these
subtleties, see Unicode
Subtleties.
A variant of HFS Plus, called HFSX,
allows volumes whose names are compared in a case-sensitive
fashion. The names are fully decomposed and in canonical order,
but no Unicode characters are ignored during the comparison.
Text Encodings
Traditional Mac OS programming interfaces pass filenames as
Pascal strings (either as a StringPtr or as a
Str63 embedded in an FSSpec ). The
characters in those strings are not Unicode; the encoding
varies depending on how the system software was localized
and what language kits are installed. Identical sequences of
bytes can represent vastly different Unicode character
sequences. Similarly, many Unicode characters belong to more
than one Mac OS text encoding.
HFS Plus includes two features specifically designed to
help Mac OS handle the conversion between Mac OS-encoded
Pascal strings and Unicode. The first feature is the
textEncoding field of the file and folder
catalog records. This field is defined as a hint to be used
when converting the record's Unicode name back to a Mac OS-
encoded Pascal string.
The valid values for the textEncoding field
are defined in Table 2.
Table 2 Text Encodings
Encoding Name      | Value | Encoding Name        | Value
MacRoman           | 0     | MacThai              | 21
MacJapanese        | 1     | MacLaotian           | 22
MacChineseTrad     | 2     | MacGeorgian          | 23
MacKorean          | 3     | MacArmenian          | 24
MacArabic          | 4     | MacChineseSimp       | 25
MacHebrew          | 5     | MacTibetan           | 26
MacGreek           | 6     | MacMongolian         | 27
MacCyrillic        | 7     | MacEthiopic          | 28
MacDevanagari      | 9     | MacCentralEurRoman   | 29
MacGurmukhi        | 10    | MacVietnamese        | 30
MacGujarati        | 11    | MacExtArabic         | 31
MacOriya           | 12    | MacSymbol            | 33
MacBengali         | 13    | MacDingbats          | 34
MacTamil           | 14    | MacTurkish           | 35
MacTelugu          | 15    | MacCroatian          | 36
MacKannada         | 16    | MacIcelandic         | 37
MacMalayalam       | 17    | MacRomanian          | 38
MacSinhalese       | 18    | MacFarsi             | 140 (49)
MacBurmese         | 19    | MacUkrainian         | 152 (48)
MacKhmer           | 20    |                      |
IMPORTANT:
Non-Mac OS implementations of HFS Plus may choose
to simply ignore the textEncoding
field. In this case, the field must be treated as
a reserved
field.
|
Note:
Mac OS uses the textEncoding field in
the following way. When a file or folder is created
or renamed, Mac OS converts the supplied Pascal
string to a HFSUniStr255 . It stores
the source text encoding in the
textEncoding field of the catalog
record. When Mac OS needs to create a Pascal string
for that record, it uses the
textEncoding as a hint to the text
conversion process. This hint ensures a high-degree
of round-trip conversion fidelity, which in turn
improves compatibility.
|
The second use of text encodings in HFS Plus is the
encodingsBitmap field of the volume header. For
each encoding used by a catalog node on the volume, the
corresponding bit in the encodingsBitmap field
must be set.
It is acceptable for a bit in this bitmap to be set even
though no names on the volume use that encoding. This means
that when an implementation deletes or renames an object, it
does not have to clear the encoding bit if that was the last
name to use the given encoding.
IMPORTANT:
The text encoding value is used as the number of
the bit to set in encodingsBitmap to
indicate that the encoding is used on the volume.
However, encodingsBitmap is only 64
bits long, and thus the text encoding values for
MacFarsi and MacUkrainian cannot be used as bit
numbers. Instead, another bit number (shown in
parentheses in Table 2) is used.
|
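A minimal sketch of maintaining this mapping, including the substitute bit numbers for MacFarsi and MacUkrainian (the function name is an assumption):
    #include <stdint.h>

    /* Hypothetical helper: mark a text encoding as used in encodingsBitmap.
       MacFarsi (140) and MacUkrainian (152) do not fit in a 64-bit bitmap,
       so they use substitute bit numbers 49 and 48, respectively. */
    static uint64_t SetEncodingBit(uint64_t encodingsBitmap, uint32_t textEncoding)
    {
        uint32_t bit = textEncoding;
        if (textEncoding == 140)      /* MacFarsi */
            bit = 49;
        else if (textEncoding == 152) /* MacUkrainian */
            bit = 48;
        if (bit < 64)
            encodingsBitmap |= (uint64_t)1 << bit;
        return encodingsBitmap;
    }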
Note:
Mac OS uses the encodingsBitmap field
to determine which text encoding conversion tables
to load when the volume is mounted. Text encoding
conversion tables are large, and loading them
unnecessarily is a waste of memory. Most systems
only use one text encoding, so there is a
substantial benefit to recording which encodings
are required on a volume-by-volume basis.
|
WARNING:
Non-Mac OS implementations of HFS Plus must
correctly maintain the encodingsBitmap
field. Specifically, if the implementation sets the
textEncoding field of a catalog record to
a text-encoding value, it must ensure that the
corresponding bit is set in
encodingsBitmap to ensure correct
operation when that disk is mounted on a system
running Mac OS.
|
HFS Plus Dates
HFS Plus stores dates in several data structures,
including the volume header and catalog records. These dates
are stored in unsigned 32-bit integers (UInt32 )
containing the number of seconds since midnight, January 1,
1904, GMT. This is slightly different from HFS, where the
value represents local time.
The maximum representable date is February 6, 2040 at
06:28:15 GMT.
The date values do not account for leap seconds. They do
include a leap day in every year that is evenly divisible by
four. This is sufficient given that the range of
representable dates does not contain 1900 or 2100, neither
of which have leap days.
The implementation is responsible for converting these
times to the format expected by client software. For
example, the Mac OS File Manager passes dates in local time;
the Mac OS HFS Plus implementation converts dates between
local time and GMT as appropriate.
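For example, converting an HFS Plus date to a Unix-style time value is a fixed offset of 2,082,844,800 seconds (the interval from January 1, 1904 to January 1, 1970, both GMT). A sketch, with the helper and constant names being assumptions:
    #include <stdint.h>
    #include <time.h>

    /* Seconds from 1904-01-01 00:00:00 GMT to 1970-01-01 00:00:00 GMT. */
    #define kHFSPlusEpochToUnixOffset 2082844800UL

    /* Hypothetical helper: convert an HFS Plus date (seconds since 1904, GMT)
       to a Unix time_t (seconds since 1970, GMT). Dates before 1970 clamp to 0. */
    static time_t HFSPlusDateToUnixTime(uint32_t hfsPlusDate)
    {
        if (hfsPlusDate < kHFSPlusEpochToUnixOffset)
            return (time_t)0;
        return (time_t)(hfsPlusDate - kHFSPlusEpochToUnixOffset);
    }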
Note:
The creation date stored in
the Volume Header is NOT stored in GMT; it is
stored in local time. The reason for this is that
many applications (including backup utilities) use
the volume's creation date as a relatively unique
identifier. If the date was stored in GMT, and
automatically converted to local time by an
implementation (like Mac OS), the value would
appear to change when the local time zone or
daylight savings time settings change (and thus
cause some applications to improperly identify the
volume). The use of the volume's creation date as a
unique identifier outweighs its use as a date. This
change was introduced late in the Mac OS 8.1
project.
|
HFS Plus Permissions
For each file and folder, HFS Plus maintains a record
containing access permissions, defined by the
HFSPlusBSDInfo structure.
struct HFSPlusBSDInfo {
UInt32 ownerID;
UInt32 groupID;
UInt8 adminFlags;
UInt8 ownerFlags;
UInt16 fileMode;
union {
UInt32 iNodeNum;
UInt32 linkCount;
UInt32 rawDevice;
} special;
};
typedef struct HFSPlusBSDInfo HFSPlusBSDInfo;
|
The fields have the following meaning:
ownerID
- The Mac OS X user ID of the owner of the file or folder.
Mac OS X versions prior to 10.3 treat user ID 99 as if it were the user ID of the
user currently logged in to the console. If no user is logged in to the
console, user ID 99 is treated as user ID 0 (root). Mac OS X version 10.3
treats user ID 99 as if it was the user ID of the process making the call
(in effect, making it owned by everyone simultaneously). These substitutions
happen at run time. The actual user ID on disk is not changed.
groupID
- The Mac OS X group ID of the group associated with the
file or folder. Mac OS X typically maps group ID 99 to the group
named "unknown." There is no run time substitution of group IDs in Mac OS X.
adminFlags
- BSD flags settable by the super-user only. This field
corresponds to bits 16 through 23 of the
st_flags field of
struct stat in Mac OS X. See the
manual page for chflags(2) for more information. The following table
gives the bit position in the adminFlags field and the name of the
corresponding mask used in the st_flags field.
Bit | st_flags mask | Meaning |
0 | SF_ARCHIVED | File has been archived |
1 | SF_IMMUTABLE | File may not be changed |
2 | SF_APPEND | Writes to file may only append |
ownerFlags
- BSD flags settable by the owner of the file or directory,
or by the super-user. This field corresponds to bits 0 through 7 of the
st_flags field of
struct stat in Mac OS X. See the
manual page for chflags(2) for more information. The following table
gives the bit position in the ownerFlags field and the name of the
corresponding mask used in the st_flags field.
Bit | st_flags mask | Meaning |
0 | UF_NODUMP | Do not dump (back up or archive) this file |
1 | UF_IMMUTABLE | File may not be changed |
2 | UF_APPEND | Writes to file may only append |
3 | UF_OPAQUE | Directory is opaque (see below) |
fileMode
- BSD file type and mode bits. Note that the constants from the header
shown below are in octal (base eight), not hexadecimal.
#define S_ISUID 0004000 /* set user id on execution */
#define S_ISGID 0002000 /* set group id on execution */
#define S_ISTXT 0001000 /* sticky bit */
#define S_IRWXU 0000700 /* RWX mask for owner */
#define S_IRUSR 0000400 /* R for owner */
#define S_IWUSR 0000200 /* W for owner */
#define S_IXUSR 0000100 /* X for owner */
#define S_IRWXG 0000070 /* RWX mask for group */
#define S_IRGRP 0000040 /* R for group */
#define S_IWGRP 0000020 /* W for group */
#define S_IXGRP 0000010 /* X for group */
#define S_IRWXO 0000007 /* RWX mask for other */
#define S_IROTH 0000004 /* R for other */
#define S_IWOTH 0000002 /* W for other */
#define S_IXOTH 0000001 /* X for other */
#define S_IFMT 0170000 /* type of file mask */
#define S_IFIFO 0010000 /* named pipe (fifo) */
#define S_IFCHR 0020000 /* character special */
#define S_IFDIR 0040000 /* directory */
#define S_IFBLK 0060000 /* block special */
#define S_IFREG 0100000 /* regular */
#define S_IFLNK 0120000 /* symbolic link */
#define S_IFSOCK 0140000 /* socket */
#define S_IFWHT 0160000 /* whiteout */
|
In some versions of Unix, the sticky bit, S_ISTXT , is used
to indicate that an executable file's code should remain in memory
after the executable finishes; this can help performance if the same
executable is used again soon. Mac OS X does not use this optimization.
If the sticky bit is set for a directory, then Mac OS X restricts
movement, deletion, and renaming of files in that directory.
Files may be removed or renamed only if the user has write access
to the directory; and is the owner of the file or the directory,
or is the super-user.
special
- This field is used only for certain special kinds of files.
For directories and most files, this field is unused and
reserved. When used,
it contains one of the following:
iNodeNum
- For hard link files, this field contains the link reference number.
See the Hard Links section for more
information.
linkCount
- For indirect node files, this field contains the number of hard links
that point at this indirect node file. See the
Hard Links section for more information.
rawDevice
- For block and character special device files (when the
file type, as selected by the S_IFMT mask,
is S_IFCHR or S_IFBLK ), this field
contains the device number.
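Returning to the fileMode field: the file type is obtained by masking with S_IFMT and comparing the result against the type constants above. A brief sketch; the function names and the _HFS-suffixed macros are assumptions chosen to avoid clashing with the host's <sys/stat.h>:
    #include <stdint.h>

    #define S_IFMT_HFS   0170000  /* type-of-file mask (octal), as above */
    #define S_IFDIR_HFS  0040000  /* directory */
    #define S_IFLNK_HFS  0120000  /* symbolic link */

    /* Hypothetical helpers: classify a catalog record's fileMode field. */
    static int HFSPlusModeIsDirectory(uint16_t fileMode)
    {
        return (fileMode & S_IFMT_HFS) == S_IFDIR_HFS;
    }

    static int HFSPlusModeIsSymlink(uint16_t fileMode)
    {
        return (fileMode & S_IFMT_HFS) == S_IFLNK_HFS;
    }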
WARNING:
Mac OS 8 and 9 treat the permissions as
reserved.
|
Note:
The S_IFWHT and UF_OPAQUE
values are used when the file system is mounted as part
of a union mount. A union mount presents the
combination (union) of several file systems as a single
file system. Conceptually, these file systems are
layered, one on top of another. If a file or directory
appears in multiple layers, the one in the top most
layer is used. All changes are made to the top most
file system only; the others are read-only. To delete a
file or directory that appears in a layer other than the
top layer, a whiteout entry (file type
S_IFWHT ) is created in the top layer. If a
directory that appears in a layer other than the top
layer is deleted and later recreated, the contents in
the lower layer must be hidden by setting the
UF_OPAQUE flag in the directory in the top
layer. Both S_IFWHT and
UF_OPAQUE hide corresponding names in lower
layers by preventing a union mount from accessing the
same file or directory name in a lower layer.
|
Note:
If the S_IFMT field (upper 4 bits) of the fileMode
field is zero, then Mac OS X assumes that the permissions structure is
uninitialized, and internally uses default values for all of the fields.
The default user and group IDs are 99, but can be changed at the time the
volume is mounted. This default ownerID is then subject to
substitution as described above.
This means that files created by Mac OS 8 and 9, or any other implementation
that sets the permissions fields to zeroes, will behave as if the
"ignore ownership" option is enabled for those files, even if "ignore
ownership" is disabled for the volume as a whole.
|
Fork Data Structure
HFS Plus maintains information about the contents of a
file using the HFSPlusForkData structure. Two
such structures -- one for the resource fork and one for the data
fork -- are stored in the catalog record for each user file.
In addition, the volume header contains a fork data
structure for each special file.
An unused extent descriptor in an extent record would
have both startBlock and
blockCount set to zero. For example, if a given
fork occupied three extents, then the last five extent
descriptors would be all zeroes.
struct HFSPlusForkData {
UInt64 logicalSize;
UInt32 clumpSize;
UInt32 totalBlocks;
HFSPlusExtentRecord extents;
};
typedef struct HFSPlusForkData HFSPlusForkData;
typedef HFSPlusExtentDescriptor HFSPlusExtentRecord[8];
|
The fields have the following meaning:
logicalSize
- The size, in bytes, of the valid data in the fork.
clumpSize
- For
HFSPlusForkData structures in the
volume header, this is the fork's
clump size, which is used in preference to the
default clump size in the volume header.
For HFSPlusForkData structures in a
catalog record, this field was intended to store a per-fork
clump size to override the default clump size
in the volume header. However, Apple
implementations prior to Mac OS X version 10.3 ignored this field.
As of Mac OS X version 10.3, this field is used to keep track of the
number of blocks actually read from the fork. See the Hot
Files section for more information.
totalBlocks
- The total number of allocation blocks used by all the
extents in this fork.
extents
- An array of extent descriptors for the fork. This
array holds the first eight extent descriptors. If more
extent descriptors are required, they are stored in the
extents overflow file.
IMPORTANT:
The HFSPlusExtentRecord is also the
data record used in the
extents overflow
file (the extent record).
|
The HFSPlusExtentDescriptor structure is
used to hold information about a specific extent.
struct HFSPlusExtentDescriptor {
UInt32 startBlock;
UInt32 blockCount;
};
typedef struct HFSPlusExtentDescriptor HFSPlusExtentDescriptor;
|
The fields have the following meaning:
startBlock
- The first allocation block in the extent.
blockCount
- The length, in allocation blocks, of the extent.
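To illustrate how these initial extents are used, the sketch below maps a block index within a fork to an allocation block on the volume using only the eight descriptors stored in the HFSPlusForkData; anything beyond them must be looked up in the extents overflow file. The function name and its zero return for "not found" are assumptions.
    /* Hypothetical sketch; assumes the HFSPlusExtentRecord and
       HFSPlusExtentDescriptor types defined above are in scope.
       Translates 'forkBlock' (a block index within the fork) into a volume
       allocation block number using only the eight initial descriptors.
       Returns 0 if the block lies beyond them (it must then be found via the
       extents overflow file); allocation block 0 is never valid fork data
       because it falls in the reserved area at the start of the volume. */
    static UInt32 ForkBlockToAllocationBlock(const HFSPlusExtentRecord extents,
                                             UInt32 forkBlock)
    {
        UInt32 remaining = forkBlock;
        int i;
        for (i = 0; i < 8; i++) {
            if (extents[i].blockCount == 0)
                break;                          /* unused descriptor: end of list */
            if (remaining < extents[i].blockCount)
                return extents[i].startBlock + remaining;
            remaining -= extents[i].blockCount;
        }
        return 0;  /* consult the extents overflow file */
    }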
Volume Header
Each HFS Plus volume contains a volume header
1024 bytes from the start of the volume. The volume
header -- analogous to the master directory block (MDB)
for HFS -- contains information about the volume as a whole,
including the location of other key structures in the volume.
The implementation is responsible for ensuring that this
structure is updated before the volume is unmounted.
A copy of the volume header, the alternate volume header,
is stored starting 1024 bytes before the end of the volume. The
implementation should only update this copy when the length
or location of one of the special files changes. The alternate volume
header is intended for use solely by disk repair utilities.
The first 1024 bytes and the
last 512 bytes of the volume are
reserved.
Note:
The first 1024 bytes are reserved for use as boot
blocks; the traditional Mac OS Finder will write to them when
the System Folder changes. The boot block format is
outside the scope of this specification. It is
defined in
Inside
Macintosh: Files.
The last 512 bytes were used during Apple's CPU
manufacturing process.
|
The allocation block (or blocks)
containing the first 1536 bytes (reserved space plus volume header)
are marked as used in the allocation file (see the
Allocation File section).
Also, in order to accommodate the alternate volume header and
the reserved space following it, the last allocation block (or two
allocation blocks, if the volume is formatted with 512-byte
allocation blocks) is also marked as used in the allocation
file.
IMPORTANT:
The alternate volume header is always stored at offset
1024 bytes from the end of the volume. If the
disk size is not an even multiple of the allocation
block size, this area may lie beyond the last
allocation block. However, the last allocation
block (or two allocation blocks for a volume
formatted with 512-byte allocation blocks) is still
reserved even if the alternate volume header is not
stored there.
|
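As an illustration, an implementation locating the volume header might read 512 bytes at offset 1024 and then check the signature and version fields described below. The numeric values follow from the big-endian storage of the two-character signatures ('H+' is 0x482B, 'HX' is 0x4858); the validation function itself is an assumption, not a defined API:
    #include <stdint.h>

    #define kHFSPlusSigWord  0x482B  /* 'H+' */
    #define kHFSXSigWord     0x4858  /* 'HX' */
    #define kHFSPlusVersion  4
    #define kHFSXVersion     5

    /* Hypothetical sketch: sanity-check the signature and version of a volume
       header read from offset 1024, after byte-swapping the fields from
       big-endian to host order. Returns nonzero if the header looks valid. */
    static int VolumeHeaderLooksValid(uint16_t signature, uint16_t version)
    {
        if (signature == kHFSPlusSigWord && version == kHFSPlusVersion)
            return 1;  /* HFS Plus volume */
        if (signature == kHFSXSigWord && version == kHFSXVersion)
            return 1;  /* HFSX volume */
        return 0;
    }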
The volume header is described by the
HFSPlusVolumeHeader type.
struct HFSPlusVolumeHeader {
UInt16 signature;
UInt16 version;
UInt32 attributes;
UInt32 lastMountedVersion;
UInt32 journalInfoBlock;
UInt32 createDate;
UInt32 modifyDate;
UInt32 backupDate;
UInt32 checkedDate;
UInt32 fileCount;
UInt32 folderCount;
UInt32 blockSize;
UInt32 totalBlocks;
UInt32 freeBlocks;
UInt32 nextAllocation;
UInt32 rsrcClumpSize;
UInt32 dataClumpSize;
HFSCatalogNodeID nextCatalogID;
UInt32 writeCount;
UInt64 encodingsBitmap;
UInt32 finderInfo[8];
HFSPlusForkData allocationFile;
HFSPlusForkData extentsFile;
HFSPlusForkData catalogFile;
HFSPlusForkData attributesFile;
HFSPlusForkData startupFile;
};
typedef struct HFSPlusVolumeHeader HFSPlusVolumeHeader;
|
The fields have the following meaning:
signature
- The volume signature, which must be
kHFSPlusSigWord ('H+' ) for an
HFS Plus volume, or kHFSXSigWord ('HX' )
for an HFSX volume.
version
- The version of the volume format, which is currently
4 (
kHFSPlusVersion ) for HFS Plus volumes, or
5 (kHFSXVersion ) for HFSX
volumes.
attributes
- Volume attributes, as
described below.
lastMountedVersion
- A value which uniquely identifies the implementation
that last mounted this volume for writing. This value can
be used by future implementations to detect volumes that
were last mounted by older implementations and check them
for deficiencies. Any code which modifies the on disk
structures must also set this field to a unique value which
identifies that code. Third-party implementations of HFS
Plus should place a
registered creator
code in this field. The value used by Mac OS 8.1 to
9.2.2 is
'8.10' .
The value used by Mac OS X is '10.0' . The
value used by a journaled volume
(including HFSX) in Mac OS X is 'HFSJ' .
The value used by fsck_hfs on Mac OS X is 'fsck' .
Note:
It is very important for implementations (and
utilities that directly modify the volume!) to set
the lastMountedVersion . It is also
important to choose different values when
non-trivial changes are made to an implementation
or utility. If a bug is found in an implementation
or utility, and it sets the
lastMountedVersion correctly, it will
be much easier for other implementations and
utilities to detect and correct any problems.
|
journalInfoBlock
- The allocation block number of the allocation block
which contains the
JournalInfoBlock
for this volume's journal. This field is valid only if bit
kHFSVolumeJournaledBit is set in the
attributes field; otherwise, this field is
reserved.
createDate
- The date and time when the volume was created. See
HFS Plus Dates for a
description of the format.
modifyDate
- The date and time when the volume was last modified.
See HFS Plus Dates for a
description of the format.
backupDate
- The date and time when the volume was last backed up.
The volume format requires no special action on this
field; it simply defines the field for the benefit of
user programs. See HFS Plus
Dates for a description of the format.
checkedDate
- The date and time when the volume was last checked
for consistency. Disk checking tools, such as Disk First
Aid, must set this when they perform a disk check. A disk
checking tool may use this date to perform periodic
checking of a volume.
fileCount
- The total number of files on the volume. The
fileCount field does not include the special
files. It should equal the number of file records found
in the catalog file.
folderCount
- The total number of folders on the volume.
The
folderCount field does not include the
root folder. It should equal the number of folder records
in the catalog file, minus one (since the root folder has
a folder record in the catalog file).
blockSize
- The allocation block size, in bytes.
totalBlocks
- The total number of allocation blocks on the disk.
For a disk whose size is an even
multiple of the allocation block size, all areas
on the disk are included in an allocation block,
including the volume header and alternate volume header.
For a disk whose size is not an
even multiple of the allocation block size, only the
allocation blocks that will fit entirely on the disk are
counted here. The remaining space at the end of the
disk is not used by the volume format (except for storing
the alternate volume header, as described above).
freeBlocks
- The total number of unused allocation blocks on the
disk.
nextAllocation
- Start of next allocation search. The
nextAllocation field is used by Mac OS as a
hint for where to start searching for free allocation blocks
when allocating space for a file. It contains the allocation
block number where the search should begin. An
implementation that doesn't want to use this kind of hint
can just treat the field as reserved. [Implementation
details: traditional Mac OS implementations typically
set it to the first allocation block of the extent most
recently allocated. It is not set to the allocation block
immediately following the most recently allocated extent
because of the likelihood of that extent being shortened
when the file is closed (since a whole clump may have been allocated but not
actually used).] See Allocation
File section for details.
rsrcClumpSize
- The default clump
size for resource forks, in bytes. This is a hint to the
implementation as to the size by which a growing file should
be extended. All Apple implementations to date ignore the
rsrcClumpSize and use
dataClumpSize for both data and resource
forks.
dataClumpSize
- The default clump
size for data forks, in bytes. This is a hint to the
implementation as to the size by which a growing file should
be extended. All Apple implementations to date ignore the
rsrcClumpSize and use
dataClumpSize for both data and resource
forks.
nextCatalogID
- The next unused catalog ID. See
Catalog File for a description
of catalog IDs.
writeCount
- This field is incremented every time a volume is
mounted. This allows an implementation to keep the volume
mounted even when the media is ejected (or otherwise
inaccessible). When the media is re-inserted, the
implementation can check this field to determine when the
media has been changed while it was ejected. It is very
important that an implementation or utility change the
writeCount field if it modifies the volume's
structures directly. This is particularly important if it
adds or deletes items on the volume.
encodingsBitmap
- This field keeps track of the text encodings used in
the file and folder names on the volume. This bitmap
enables some performance optimizations for
implementations that don't use Unicode names directly.
See the Text Encoding
sections for details.
finderInfo
-
This array of 32-bit items contains information used by the Mac OS
Finder, and the system software boot process.
finderInfo[0] contains the directory ID of the
directory containing the bootable system (for example, the
System Folder in Mac OS 8 or 9, or /System/Library/CoreServices
in Mac OS X). It is zero if there is no bootable system on the volume.
This value is typically equal to either finderInfo[3]
or finderInfo[5] .
finderInfo[1] contains the parent directory ID of
the startup application (for example, Finder), or zero if the volume
is not bootable.
finderInfo[2] contains the directory ID of a directory
whose window should be displayed in the Finder when the volume is
mounted, or zero if no directory window should be opened. In
traditional Mac OS, this is the first in a linked list of windows
to open; the frOpenChain field of the directory's
Finder Info contains the next directory ID
in the list. The open window list is deprecated. The Mac OS X
Finder will open this directory's window, but ignores the rest
of the open window list. The Mac OS X Finder does not modify
this field.
finderInfo[3] contains the directory ID of a bootable
Mac OS 8 or 9 System Folder, or zero if there isn't one.
finderInfo[4] is reserved.
finderInfo[5] contains the directory ID of a bootable
Mac OS X system (the /System/Library/CoreServices
directory), or zero if there is no bootable Mac OS X system on
the volume.
finderInfo[6] and finderInfo[7] are
used by Mac OS X to contain a 64-bit unique volume identifier.
One use of this identifier is for tracking whether a given
volume's ownership (user ID) information should be honored.
These elements may be zero if no such identifier has been
created for the volume.
allocationFile
- Information about the location and size of the
allocation file. See Fork
Data Structure for a description of the
HFSPlusForkData type.
extentsFile
- Information about the location and size of the
extents file. See Fork Data
Structure for a description of the
HFSPlusForkData type.
catalogFile
- Information about the location and size of the
catalog file. See Fork Data
Structure for a description of the
HFSPlusForkData type.
attributesFile
- Information about the location and size of the
attributes file. See Fork
Data Structure for a description of the
HFSPlusForkData type.
startupFile
- Information about the location and size of the
startup file. See Fork Data
Structure for a description of the
HFSPlusForkData type.
Volume Attributes
The attributes field of a volume header is
treated as a set of one-bit flags. The definition of the
bits is given by the constants listed below.
enum {
/* Bits 0-6 are reserved */
kHFSVolumeHardwareLockBit = 7,
kHFSVolumeUnmountedBit = 8,
kHFSVolumeSparedBlocksBit = 9,
kHFSVolumeNoCacheRequiredBit = 10,
kHFSBootVolumeInconsistentBit = 11,
kHFSCatalogNodeIDsReusedBit = 12,
kHFSVolumeJournaledBit = 13,
/* Bit 14 is reserved */
kHFSVolumeSoftwareLockBit = 15
/* Bits 16-31 are reserved */
};
|
The bits have the following meaning:
- bits 0-7
- An implementation must treat these as
reserved fields.
kHFSVolumeUnmountedBit (bit 8)
- This bit is set if the volume was correctly flushed
before being unmounted or ejected. An implementation must
clear this bit on the media when it mounts a volume for
writing. An implementation must set this bit on the media
as the last step of unmounting a writable volume, after
all other volume information has been flushed. If an
implementation is asked to mount a volume where this bit
is clear, it must assume the volume is inconsistent, and
do appropriate
consistency
checking before using the volume.
kHFSVolumeSparedBlocksBit (bit 9)
- This bit is set if there are any records in the
extents overflow file for bad blocks (belonging to file
ID
kHFSBadBlockFileID ). See
Bad Block File for details.
kHFSVolumeNoCacheRequiredBit (bit 10)
- This bit is set if the blocks from this volume should
not be cached. For example, a RAM or ROM disk is actually
stored in memory, so using additional memory to cache the
volume's contents would be wasteful.
kHFSBootVolumeInconsistentBit (bit 11)
- This bit is similar to
kHFSVolumeUnmountedBit , but inverted in
meaning. An implementation must set this bit on the media
when it mounts a volume for writing. An implementation
must clear this bit on the media as the last step of
unmounting a writable volume, after all other volume
information has been flushed. If an implementation is
asked to mount a volume where this bit is set, it must
assume the volume is inconsistent, and do appropriate
consistency
checking before using the volume.
kHFSCatalogNodeIDsReusedBit (bit 12)
- This bit is set when the
nextCatalogID
field overflows 32 bits, forcing smaller catalog node IDs to be reused. When this
bit is set, it is common (and not an error) for catalog
records to exist with IDs greater than or equal to
nextCatalogID . If this bit is set, you must
ensure that IDs assigned to newly created catalog records do
not conflict with IDs used by existing records.
kHFSVolumeJournaledBit (bit 13)
- If this bit is set, the volume has a
journal, which can be located using the
journalInfoBlock
field of the Volume Header.
- bit 14
- An implementation must treat this bit as
reserved.
kHFSVolumeSoftwareLockBit (bit 15)
- This bit is set if the volume is write-protected due
to a software setting. Any implementations must refuse to
write to a volume with this bit set. This flag is
especially useful for write-protecting a volume on a
media that cannot be write-protected otherwise, or for
protecting an individual partition on a partitioned
device.
- bits 16-31
- An implementation must treat these bits as
reserved.
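For example, an implementation deciding at mount time whether a consistency check is required might combine the two consistency bits like this (a sketch only; the function name is an assumption):
    #include <stdint.h>

    /* Hypothetical sketch: decide whether a volume needs a consistency check
       before use, based on the attributes field of its volume header.
       Bit 8 is kHFSVolumeUnmountedBit; bit 11 is kHFSBootVolumeInconsistentBit. */
    static int VolumeNeedsConsistencyCheck(uint32_t attributes)
    {
        int cleanlyUnmounted = (attributes >> 8) & 1;   /* kHFSVolumeUnmountedBit */
        int bootInconsistent = (attributes >> 11) & 1;  /* kHFSBootVolumeInconsistentBit */
        return !cleanlyUnmounted || bootInconsistent;
    }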
Note:
Mac OS X versions 10.0 to 10.3 don't properly honor
kHFSVolumeSoftwareLockBit . They incorrectly
allow such volumes to be modified. This bug is expected
to be fixed in a future version of Mac OS X. (r. 3507614)
|
Note:
An implementation may keep a copy of the attributes
in memory and use bits 0-7 for its own runtime
flags. As an example, Mac OS uses bit 7,
kHFSVolumeHardwareLockBit , to indicate
that the volume is write-protected due to some
hardware setting.
|
Note:
The existence of two volume consistency bits
(kHFSVolumeUnmountedBit and
kHFSBootVolumeInconsistentBit )
deserves an explanation. Macintosh ROMs check the
consistency of a boot volume if
kHFSVolumeUnmountedBit is clear. The
ROM-based check is very slow, annoyingly so. This
checking code was significantly optimized in Mac OS
7.6. To prevent the ROM check from being used, Mac
OS 7.6 (and higher) leaves the original consistency
check bit (kHFSVolumeUnmountedBit ) set
at all times. Instead, an alternative flag
(kHFSBootVolumeInconsistentBit ) is
used to signal that the disk needs a consistency
check.
|
Note:
For the boot volume, the
kHFSBootVolumeInconsistentBit should
be used as described but
kHFSVolumeUnmountedBit should remain
set; for all other volumes, use the
kHFSVolumeUnmountedBit as described
but keep the
kHFSBootVolumeInconsistentBit clear.
This is an optimization that prevents the Mac OS
ROM from doing a very slow consistency check when
the boot volume is mounted since it only checks
kHFSVolumeUnmountedBit , and won't do a
consistency check; later on, the File Manager will
see the kHFSBootVolumeInconsistentBit
set and do a better, faster consistency check. (It
would be OK to always use both bits at the expense
of a slower Mac OS boot.)
|
B-Trees
Note:
For a practical description of the algorithms used
to maintain a B-tree, see Algorithms in
C, Robert Sedgewick, Addison-Wesley, 1992.
ISBN: 0201514257.
Many textbooks describe B-trees in which an
index node contains N keys and N+1 pointers, and
where keys less than key #X lie in the subtree
pointed to by pointer #X, and keys greater than key
#X lie in the subtree pointed to by pointer #X+1.
(The B-tree implementor defines whether to use
pointer #X or #X+1 for equal keys.)
HFS and HFS Plus are slightly different; in a
given subtree, there are no keys less than the
first key of that subtree's root node.
|
This section describes the B-tree structure used for the
catalog, extents overflow, and attributes files. A B-tree is
stored in a file's data fork. Each B-tree has a
HFSPlusForkData
structure in the volume header that describes the size and
initial extents of that data fork.
Note:
Special files do not have a resource fork because
there is no place to store its
HFSPlusForkData in the volume header.
However, it's still important that the B-tree is in
the data fork because the fork is part of the key
used to store B-tree extents in the extents
overflow file.
|
A B-tree file is divided up into fixed-size nodes,
each of which contains records, which consist of a
key and some data. The purpose of the B-tree is to
efficiently map a key into its corresponding data. To
achieve this, keys must be ordered, that is, there must be a
well-defined way to decide whether one key is smaller than,
equal to, or larger than another key.
The node size (which is expressed in bytes) must
be a power of two, from 512 through 32,768, inclusive. The
node size of a B-tree is determined when the B-tree is
created. The logical length of a B-tree file is just the
number of nodes times the node size.
There are four kinds of nodes.
- Each B-tree contains a single header node. The
header node is always the first node in the B-tree. It
contains the information needed to find any other
node in the tree.
- Map nodes contain map records, which
hold any allocation data (a bitmap that describes the
free nodes in the B-tree) that overflows the map record
in the header node.
- Index nodes hold pointer records that
determine the structure of the B-tree.
- Leaf nodes hold data records that
contain the data associated with a given key. The key for
each data record must be unique.
All nodes share a common structure, described in the next
section.
Node Structure
Nodes are indicated by number. The node's number can be
calculated by dividing its offset into the file by the node
size. Each node has the same general structure, consisting
of three main parts: a node descriptor at the beginning of
the node, a list of record offsets at the end of the node,
and a list of records. This structure is depicted in Figure
2.
Figure 2. The structure of a node.
The node descriptor contains basic information
about the node as well as forward and backward links to
other nodes. The BTNodeDescriptor data type
describes this structure.
struct BTNodeDescriptor {
UInt32 fLink;
UInt32 bLink;
SInt8 kind;
UInt8 height;
UInt16 numRecords;
UInt16 reserved;
};
typedef struct BTNodeDescriptor BTNodeDescriptor;
|
The fields have the following meaning:
fLink
- The node number of the next node of this type, or 0
if this is the last node.
bLink
- The node number of the previous node of this type, or
0 if this is the first node.
kind
- The type of this node. There are four node kinds,
defined by the constants listed below.
height
- The level, or depth, of this node in the B-tree
hierarchy. For the header node, this field must be zero.
For leaf nodes, this field must be one. For index nodes,
this field is one greater than the height of the child
nodes it points to. The height of a map node is zero,
just like for a header node. (Think of map nodes as
extensions of the map record in the header node.)
numRecords
- The number of records contained in this node.
reserved
- An implementation must treat this as a
reserved field.
A node descriptor is always 14 (which is
sizeof(BTNodeDescriptor) ) bytes long, so the
list of records contained in a node always starts 14
bytes from the start of the node. The size of each record
can vary, depending on the record's type and the amount of
information it contains.
The records are accessed using the list of record
offsets at the end of the node. Each entry in this list
is a UInt16 which contains the offset, in
bytes, from the start of the node to the start of the
record. The offsets are stored in reverse order, with the
offset for the first record in the last two bytes of the
node, the offset for the second record in the previous
two bytes, and so on. Since the first record is always at
offset 14, the last two bytes of the node contain the value
14.
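A sketch of how an implementation might locate a record inside a node using this reverse-order offset list (the function name and raw byte-buffer representation are assumptions):
    #include <stdint.h>

    /* Hypothetical sketch: return a pointer to record 'index' (0-based) within
       a B-tree node. 'node' points to the start of the node and 'nodeSize' is
       the B-tree's node size in bytes. The offset list is stored in reverse
       order at the end of the node, one big-endian UInt16 per record. */
    static const uint8_t *NodeRecord(const uint8_t *node, uint16_t nodeSize,
                                     uint16_t index)
    {
        const uint8_t *entry = node + nodeSize - 2 * (index + 1);
        uint16_t offset = (uint16_t)((entry[0] << 8) | entry[1]);  /* big-endian */
        return node + offset;
    }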
IMPORTANT:
The list of record offsets always contains one more
entry than there are records in the node. This entry
contains the offset to the first byte of free space
in the node, and thus indicates the size of the
last record in the node. If there is no free space
in the node, the entry contains its own byte offset
from the start of the node.
|
The kind field of the node descriptor
describes the type of a node, which indicates what kinds of
records it contains and, therefore, its purpose in the
B-tree hierarchy. There are four kinds of node types given
by the following constants:
enum {
kBTLeafNode = -1,
kBTIndexNode = 0,
kBTHeaderNode = 1,
kBTMapNode = 2
};
|
It's important to realize that the B-tree node type
determines the type of records found in the node. Leaf nodes
always contain data records. Index nodes always contain
pointer records. Map nodes always contain map records. The
header node always contains a header record, a reserved
record, and a map record. The four node types and their
corresponding records are described in the subsequent
sections.
Header Nodes
The first node (node 0) in every B-tree file
is a header node, which contains essential information about
the entire B-tree file. There are three records in the header
node. The first record is the B-tree header record. The second
record is the user data record and is always 128 bytes long.
The last record is the B-tree map record; it occupies all of
the remaining space between the user data record and the record
offsets. The header node is shown in Figure 3.
Figure 3 Header node structure
The fLink field of the header node's node
descriptor contains the node number of the first map node,
or 0 if there are no map nodes. The bLink field
of the header node's node descriptor must be set to zero.
Header Record
The B-tree header record contains general
information about the B-tree such as its size, maximum key
length, and the location of the first and last leaf nodes.
The data type BTHeaderRec describes the
structure of a header record.
struct BTHeaderRec {
UInt16 treeDepth;
UInt32 rootNode;
UInt32 leafRecords;
UInt32 firstLeafNode;
UInt32 lastLeafNode;
UInt16 nodeSize;
UInt16 maxKeyLength;
UInt32 totalNodes;
UInt32 freeNodes;
UInt16 reserved1;
UInt32 clumpSize; // misaligned
UInt8 btreeType;
UInt8 keyCompareType;
UInt32 attributes; // long aligned again
UInt32 reserved3[16];
};
typedef struct BTHeaderRec BTHeaderRec;
|
Note:
The root node can be a leaf node (in the case where
there is only a single leaf node, and therefore no
index nodes, as might happen with the catalog file
on a newly initialized volume). If a tree has no
leaf nodes (like the extents overflow file on a
newly initialized volume), the
firstLeafNode ,
lastLeafNode , and
rootNode fields will all be zero. If
there is only one leaf node (as may be the case
with the catalog file on a newly initialized
volume), firstLeafNode ,
lastLeafNode , and
rootNode will all have the same value
(i.e., the node number of the sole leaf node). The
firstLeafNode and
lastLeafNode fields just make it easy
to walk through all the leaf nodes by just
following fLink/bLink fields.
|
The fields have the following meaning:
treeDepth
- The current depth of the B-tree. Always equal to the
height field of the root node.
rootNode
- The node number of the root node, the index node that
acts as the root of the B-tree. See
Index Nodes for details. There
is a possibility that the
rootNode is a leaf
node. See Inside
Macintosh: Files, pp. 2-69 for details.
leafRecords
- The total number of records contained in all of the
leaf nodes.
firstLeafNode
- The node number of the first leaf node. This may be
zero if there are no leaf nodes.
lastLeafNode
- The node number of the last leaf node. This may be
zero if there are no leaf nodes.
nodeSize
- The size, in bytes, of a node. This is a power of
two, from 512 through 32,768, inclusive.
maxKeyLength
- The maximum length of a key in an index or leaf node.
HFSVolumes.h has the
maxKeyLength values for
the catalog and extents files for both HFS and HFS Plus
(kHFSPlusExtentKeyMaximumLength ,
kHFSExtentKeyMaximumLength ,
kHFSPlusCatalogKeyMaximumLength ,
kHFSCatalogKeyMaximumLength ). The maximum
key length for the attributes B-tree will probably be a
little larger than for the catalog file. In general,
maxKeyLength has to be small enough
(compared to nodeSize ) so that a single node
can fit two keys of maximum size plus the node descriptor
and offsets.
totalNodes
- The total number of nodes (be they free or used) in
the B-tree. The length of the B-tree file is this value
times the
nodeSize .
freeNodes
- The number of unused nodes in the B-tree.
reserved1
- An implementation must treat this as a
reserved field.
clumpSize
- Ignored for HFS Plus B-trees. The
clumpSize field of the
HFSPlusForkData
record is used instead. For maximum compatibility, an
implementation should probably set the
clumpSize in the node descriptor to the same
value as the clumpSize in the
HFSPlusForkData when initializing a volume.
Otherwise, it should treat the header record's
clumpSize as reserved.
btreeType
- The value stored in this field is of type
BTreeTypes :
enum BTreeTypes {
kHFSBTreeType = 0, // control file
kUserBTreeType = 128, // user btree type starts from 128
kReservedBTreeType = 255
};
|
This field must be equal to kHFSBTreeType
for the catalog, extents, and attributes B-trees. This field
must be equal to kUserBTreeType for the hot file B-tree. Historically, values of
1 to 127 and kReservedBTreeType were used in
B-trees used by system software in Mac OS 9 and earlier.
keyCompareType
- For HFSX volumes, this field in the
catalog B-tree header defines the ordering of the keys (whether
the volume is case-sensitive or case-insensitive). In all
other cases, an implementation must treat this as a
reserved field.
Constant name | Value | Meaning |
kHFSCaseFolding | 0xCF | Case folding (case-insensitive) |
kHFSBinaryCompare | 0xBC | Binary compare (case-sensitive) |
attributes
- A set of bits used to describe various attributes of
the B-tree. The meaning of these bits is given below.
reserved3
- An implementation must treat this as a
reserved field.
The following constants define the various bits that may
be set in the attributes field of the header
record.
enum {
kBTBadCloseMask = 0x00000001,
kBTBigKeysMask = 0x00000002,
kBTVariableIndexKeysMask = 0x00000004
};
|
The bits have the following meaning:
kBTBadCloseMask
- This bit indicates that the B-tree was not closed
properly and should be checked for consistency. This bit
is not used for HFS Plus B-trees. An implementation must
treat this as
reserved.
kBTBigKeysMask
- If this bit is set, the
keyLength field
of the keys in index and leaf nodes is
UInt16 ; otherwise, it is a
UInt8 . This bit must be set for all HFS Plus
B-trees.
kBTVariableIndexKeysMask
- If this bit is set, the keys in index nodes occupy
the number of bytes indicated by their
keyLength field; otherwise, the keys in
index nodes always occupy maxKeyLength
bytes. This bit must be set for the HFS Plus Catalog
B-tree, and cleared for the HFS Plus Extents B-tree.
Bits not specified here must be treated as
reserved.
User Data Record
The second record in a header node is always 128 bytes long.
It provides a small space to store information associated with
a B-tree.
In the HFS Plus catalog, extents, and attributes B-trees, this record is
unused and reserved. In
the HFS Plus hot file B-tree, this
record contains general information about the hot file
recording process.
Map Record
The remaining space in the header node is occupied by a
third record, the map record. It is a bitmap that
indicates which nodes in the B-tree are used and which are
free. The bits are interpreted in the same way as the bits
in the allocation file.
All told, the node descriptor, header record, reserved
record, and record offsets occupy 256 bytes of the header
node. So the size of the map record (in bytes) is
nodeSize minus 256. If there are more nodes in
the B-tree than can be represented by the map record in the
header node, map nodes are used to store additional
allocation data.
Map Nodes
If the map record of the header node is not large enough
to represent all of the nodes in the B-tree, map
nodes are used to store the remaining allocation data.
In this case, the fLink field of the header
node's node descriptor contains the node number of the first
map node.
A map node consists of the node descriptor and a single
map record. The map record is a continuation of the map
record contained in the header node. The size of the map
record is the size of the node, minus the size of the node
descriptor (14 bytes), minus the size of two offsets (4
bytes), minus two bytes of free space. That is, the size of
the map record is the size of the node minus 20 bytes; this
keeps the length of the map record an even multiple of 4
bytes. Note that the start of the map record is not
aligned to a 4-byte boundary: it starts immediately after
the node descriptor (at an offset of 14 bytes).
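As an illustration only (the byte counts follow from the record sizes
discussed above), the number of map bytes carried by the header node and
by each map node could be computed like this:
/* Map bytes in the header node: nodeSize minus the node descriptor (14),
   B-tree header record (106), user data record (128), and four record
   offsets (8), i.e., nodeSize - 256. */
static UInt32 MapBytesInHeaderNode(UInt32 nodeSize)
{
    return nodeSize - 256;
}
/* Map bytes in a map node: nodeSize minus the node descriptor (14),
   two record offsets (4), and two bytes of free space. */
static UInt32 MapBytesInMapNode(UInt32 nodeSize)
{
    return nodeSize - 20;
}
|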
The B-tree uses as many map nodes as needed to provide
allocation data for all of the nodes in the B-tree. The map
nodes are chained through the fLink fields of
their node descriptors, starting with the header node. The
fLink field of the last map node's node
descriptor is zero. The bLink field is not used
for map nodes and must be set to zero for all map nodes.
Note:
Not using the bLink field is
consistent with the HFS volume format, but not
really consistent with the overall design.
|
Keyed Records
The records in index and leaf nodes share a common
structure. They contain a keyLength , followed
by the key itself, followed by the record data.
The first part of the record, keyLength , is
either a UInt8 or a UInt16 ,
depending on the attributes field in the
B-tree's header record. If the kBTBigKeysMask
bit is set in attributes , the
keyLength is a UInt16 ; otherwise,
it's a UInt8 . The length of the key, as stored
in this field, does not include the size of the
keyLength field itself.
IMPORTANT:
All HFS Plus B-trees use a UInt16 for
their key length.
|
Immediately following the keyLength is the
key itself. The length of the key is determined by the node
type and the B-tree attributes. In leaf nodes, the length is
always determined by keyLength . In index nodes,
the length depends on the value of the
kBTVariableIndexKeysMask bit in the B-tree
attributes in the header record.
If the bit is clear, the key occupies a constant number of
bytes, determined by the maxKeyLength field of
the B-tree header record. If the bit is set, the key length
is determined by the keyLength field of the
keyed record.
Following the key is the record's data. The format of
this data depends on the node type, as explained in the next
two sections. However, the data is always aligned on a
two-byte boundary and occupies an even number of bytes. To
meet the first alignment requirement, a pad byte must be
inserted between the key and the data if the size of the
keyLength field plus the size of the key is
odd. To meet the second alignment requirement, a pad byte
must be added after the data if the data size is odd.
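As a minimal sketch (assuming the record has already been read into
memory and the B-tree's attributes have been checked), the start of a
keyed record's data could be located as follows:
/* Returns a pointer to the data portion of a keyed record. This form
   applies to leaf records, and to index records when
   kBTVariableIndexKeysMask is set; otherwise an index key occupies
   maxKeyLength bytes regardless of keyLength. */
static const UInt8 * KeyedRecordData(const UInt8 *record, Boolean bigKeys)
{
    UInt32 keyLength;
    UInt32 keyLengthSize;
    if (bigKeys) {              /* kBTBigKeysMask set; all HFS Plus B-trees */
        keyLength = ((UInt32) record[0] << 8) | record[1];
        keyLengthSize = 2;
    } else {                    /* HFS-style UInt8 key length */
        keyLength = record[0];
        keyLengthSize = 1;
    }
    UInt32 dataOffset = keyLengthSize + keyLength;
    if (dataOffset & 1) {
        dataOffset += 1;        /* pad byte keeps the data 2-byte aligned */
    }
    return record + dataOffset;
}
|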
Index Nodes
The records in an index node are called pointer
records. They contain a keyLength , a key,
and a node number, expressed as a UInt32 . The node
whose number is in a pointer record is called a child
node of the index node. An index node has two or more
children, depending on the size of the node and the size of
the keys in the node.
Note:
A root node does not need to exist (if the tree is
empty). And even if one does exist, it need not
be an index node (i.e., it could be a leaf node
if all the records fit in a single node).
|
Leaf Nodes
The bottom level of a B-tree is occupied exclusively by
leaf nodes, which contain data records instead
of pointer records. The data records contain a
keyLength , a key, and the data associated with
that key. The data may be of variable length.
In an HFS Plus B-tree, the data in the data record is the
HFS Plus volume structure (such as a
CatalogRecord , ExtentRecord , or
AttributeRecord ) associated with the key.
Searching for
Keyed Records
A B-tree is highly structured to allow for efficient
searching, insertion, and removal. This structure primarily
affects the keyed records (pointer records and data records)
and the nodes in which they are stored (index nodes and leaf
nodes). The following are the ordering requirements for
index and leaf nodes:
- Keyed records must be placed in a node such that
their keys are in ascending order.
- All the nodes in a given level (whose
height field is the same) must be chained
via their fLink and bLink
field. The node with the smallest keys must be first in
the chain and its bLink field must be zero.
The node with the largest keys must be last in the chain
and its fLink field must be zero.
- For any given node, all the keys in the node must be
less than all the keys in the next node in the chain
(pointed to by
fLink ). Similarly, all the
keys in the node must be greater than all the keys in the
previous node in the chain (pointed to by
bLink ).
Keeping the keys ordered in this way makes it possible to
quickly search the B-tree to find the data associated with a
given key. Figure 4 shows a sample B-tree containing
hypothetical keys (in this case, the keys are simply
integers).
When an implementation needs to find the data associated
with a particular search key, it begins searching at
the root node. Starting with the first record, it searches
for the record with the greatest key that is less than or
equal to the search key. It then moves to the child node
(typically an index node) and repeats the same process.
This process continues until a leaf node is reached. If
the key found in the leaf node is equal to the search key,
the found record contains the desired data associated with
the search key. If the found key is not equal to the search
key, the search key is not present in the B-tree.
Figure 4. A sample B-Tree
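The following sketch outlines that search. It is illustrative only:
ReadNode, GetNodeDescriptor, GetRecordKey, GetRecordData,
GetPointerRecordChild, and CompareKeys are hypothetical helpers, not
part of the volume format.
/* Hypothetical helpers assumed by this sketch. */
extern const UInt8 * ReadNode(UInt32 nodeNumber);
extern BTNodeDescriptor GetNodeDescriptor(const UInt8 *node);
extern const void * GetRecordKey(const UInt8 *node, UInt16 index);
extern const void * GetRecordData(const UInt8 *node, UInt16 index);
extern UInt32 GetPointerRecordChild(const UInt8 *node, UInt16 index);
extern int CompareKeys(const void *a, const void *b);
/* Returns 1 and sets *foundData if searchKey exists in the B-tree,
   otherwise returns 0. */
static int BTreeSearch(UInt32 rootNode, const void *searchKey,
                       const void **foundData)
{
    UInt32 nodeNumber = rootNode;
    for (;;) {
        const UInt8 *node = ReadNode(nodeNumber);
        BTNodeDescriptor desc = GetNodeDescriptor(node);
        SInt32 found = -1;
        /* Find the record with the greatest key that is less than or
           equal to the search key; keys are stored in ascending order. */
        for (UInt16 i = 0; i < desc.numRecords; i++) {
            if (CompareKeys(GetRecordKey(node, i), searchKey) <= 0) {
                found = i;
            } else {
                break;
            }
        }
        if (found < 0) {
            return 0;           /* search key is smaller than every key */
        }
        if (desc.kind == kBTLeafNode) {
            if (CompareKeys(GetRecordKey(node, found), searchKey) == 0) {
                *foundData = GetRecordData(node, found);
                return 1;       /* exact match in a leaf node */
            }
            return 0;           /* search key is not in the B-tree */
        }
        /* Index node: descend into the child named by the pointer record. */
        nodeNumber = GetPointerRecordChild(node, found);
    }
}
|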
HFS and HFS Plus B-Trees Compared
The structure of the B-trees on an HFS Plus volume is
closely related to the
B-tree
structure used on an HFS volume. There are three
principal differences: the size of nodes, the size of keys
within index nodes, and the size of a key length (UInt8 vs.
UInt16).
Node Sizes
In an HFS B-tree, nodes always have a fixed size of 512
bytes.
In an HFS Plus B-tree, the node size is determined by a
field (nodeSize ) in the header node. The node
size must be a power of two from 512 through 32,768. An
implementation must use the nodeSize field to
determine the actual node size.
Note:
The header node is always located at the start of
the B-tree, so you can find it without knowing the
B-tree node size.
|
HFS Plus uses the following default node sizes:
- 4 KB (8KB in Mac OS X) for the catalog file
- 1 KB (4KB in Mac OS X) for the extents overflow file
- 4 KB for the attributes file
These sizes are set when the volume is initialized and
cannot be easily changed. It is legal to initialize an HFS
Plus volume with different node sizes, but the node sizes
must be large enough for an index node to contain two keys
of maximum size (plus the other overhead such as a node
descriptor, record offsets, and pointers to children).
IMPORTANT:
The node size of the catalog file must be at least
kHFSPlusCatalogMinNodeSize (4096).
|
IMPORTANT:
The node size of the attributes file must be at
least kHFSPlusAttrMinNodeSize (4096).
|
Key Size in an Index Node
In an HFS B-tree, all of the keys in an index node occupy
a fixed amount of space: the maximum key length for that
B-tree. This simplifies the algorithms for inserting and
deleting records because, within an index node, one key can
be replaced by another key without worrying whether there is
adequate room for the new key. However, it is also somewhat
wasteful when the keys are variable length (such as in the
catalog file, where the key length varies with the length of
the file name).
In an HFS Plus B-tree, the keys in an index node are
allowed to vary in size. This complicates the algorithms for
inserting and deleting records, but reduces wasted space
when the length of a key can vary (such as in the catalog
file). It also means that the number of keys in an index
node will vary with the actual size of the keys.
Back to top
Catalog File
HFS Plus uses a catalog file to maintain information
about the hierarchy of files and folders on a volume. A
catalog file is organized as a B-tree
file, and hence consists of a header node, index nodes, leaf
nodes, and (if necessary) map nodes. The location of the
first extent of the catalog file (and hence of the file's
header node) is stored in the volume header. From the
catalog file's header node, an implementation can obtain the
node number of the root node of the B-tree. From the root
node, an implementation can search the B-tree for keys, as
described in the
previous section.
The B-Trees chapter defined a standard rule for the
node size of HFS Plus B-trees. As
the catalog file is a B-tree, it inherits the requirements
of this rule. In addition, the node size of the catalog file
must be at least 4 KB
(kHFSPlusCatalogMinNodeSize ).
Each file or folder in the catalog file is assigned a
unique catalog node ID (CNID). For folders, the CNID is the
folder ID, sometimes called a directory ID, or dirID;
for files, it's the file ID. For any given file or
folder, the parent ID is the CNID of the folder containing
the file or folder, known as the parent folder.
The catalog node ID is defined by the
CatalogNodeID data type.
typedef UInt32 HFSCatalogNodeID;
|
The first 16 CNIDs are reserved for use by Apple
Computer, Inc., and include the following standard
assignments:
enum {
kHFSRootParentID = 1,
kHFSRootFolderID = 2,
kHFSExtentsFileID = 3,
kHFSCatalogFileID = 4,
kHFSBadBlockFileID = 5,
kHFSAllocationFileID = 6,
kHFSStartupFileID = 7,
kHFSAttributesFileID = 8,
kHFSRepairCatalogFileID = 14,
kHFSBogusExtentFileID = 15,
kHFSFirstUserCatalogNodeID = 16
};
|
These constants have the following meaning:
kHFSRootParentID
- Parent ID of the root folder.
kHFSRootFolderID
- Folder ID of the root folder.
kHFSExtentsFileID
- File ID of the extents
overflow file.
kHFSCatalogFileID
- File ID of the catalog
file.
kHFSBadBlockFileID
- File ID of the bad block
file. The bad block file is not a file in the same
sense as the special files or user files. See
Bad Block File for details.
kHFSAllocationFileID
- File ID of the allocation
file (introduced with HFS Plus).
kHFSStartupFileID
- File ID of the startup
file (introduced with HFS Plus).
kHFSAttributesFileID
- File ID of the attributes
file (introduced with HFS Plus).
kHFSRepairCatalogFileID
- Used temporarily by
fsck_hfs when
rebuilding the catalog file.
kHFSBogusExtentFileID
- Used temporarily during
ExchangeFiles
operations.
kHFSFirstUserCatalogNodeID
- First CNID available for use by user files and
folders.
In addition, the CNID of zero is never used and serves as
a nil value.
Typically, CNIDs are allocated sequentially, starting at
kHFSFirstUserCatalogNodeID . Versions of the HFS
Plus specification prior to Jan. 18, 2000, required the
nextCatalogID field of the volume header to be greater than the
largest CNID used on the volume (so that an implementation
could use nextCatalogID to determine the CNID to
assign to a newly created file or directory). However, this
can be a problem for volumes that create files or directories
at a high rate (for example, a busy server), since they might
run out of CNID values.
HFS Plus volumes now allow CNID values to wrap around and be
reused. The kHFSCatalogNodeIDsReusedBit in the
attributes field of the
volume header is set to indicate when CNID values have
wrapped around and been reused. When
kHFSCatalogNodeIDsReusedBit is set, the
nextCatalogID field is no longer required to be
greater than any existing CNID.
When kHFSCatalogNodeIDsReusedBit is set,
nextCatalogID may still be used as a hint for the
CNID to assign to newly created files or directories, but the
implementation must verify that CNID is not currently in use
(and pick another value if it is in use). When CNID number
nextCatalogID is already in use, an implementation
could just increment nextCatalogID until it finds
a CNID that is not in use. If nextCatalogID
overflows to zero, kHFSCatalogNodeIDsReusedBit
must be set and nextCatalogID set to
kHFSFirstUserCatalogNodeID (to avoid using any
reserved CNID values).
Note:
Mac OS X versions 10.2 and later, and all versions
of Mac OS 9 support
kHFSCatalogNodeIDsReusedBit .
|
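The following sketch shows one way an implementation could apply these
rules when assigning a CNID. It is not a normative algorithm:
CatalogIDInUse is a hypothetical helper that checks the catalog (for
example, by looking for a thread record with that CNID), and the two
parameters stand in for the volume header's nextCatalogID field and its
kHFSCatalogNodeIDsReusedBit flag.
/* Hypothetical helper assumed by this sketch. */
extern Boolean CatalogIDInUse(HFSCatalogNodeID cnid);
static HFSCatalogNodeID AllocateCNID(HFSCatalogNodeID *nextCatalogID,
                                     Boolean *cnidsReused)
{
    HFSCatalogNodeID cnid = *nextCatalogID;
    /* Once CNIDs have wrapped around, nextCatalogID is only a hint and
       must be verified against the catalog. */
    if (*cnidsReused) {
        while (CatalogIDInUse(cnid)) {
            cnid++;
            if (cnid == 0) {
                cnid = kHFSFirstUserCatalogNodeID;  /* skip reserved CNIDs */
            }
        }
    }
    /* Advance the hint; on overflow, record the wraparound and restart
       above the reserved range. */
    *nextCatalogID = cnid + 1;
    if (*nextCatalogID == 0) {
        *cnidsReused = 1;
        *nextCatalogID = kHFSFirstUserCatalogNodeID;
    }
    return cnid;
}
|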
As the catalog file is a B-tree file, it inherits its
basic structure from the definition in
B-Trees. Beyond that, you need to know
only two things about an HFS Plus catalog file to interpret
its data:
- the format of the key used both in index and leaf
nodes, and
- the format of the leaf node data records (file,
folder, and thread records).
Catalog File Key
For a given file, folder, or thread record, the catalog file
key consists of the parent folder's CNID
and the name of the file or folder. This structure is described
using the HFSPlusCatalogKey type.
struct HFSPlusCatalogKey {
UInt16 keyLength;
HFSCatalogNodeID parentID;
HFSUniStr255 nodeName;
};
typedef struct HFSPlusCatalogKey HFSPlusCatalogKey;
|
The fields have the following meaning:
keyLength
- The
keyLength field is required by all
keyed records in a B-tree.
The catalog file, in common with all HFS Plus B-trees,
uses a large key length (UInt16 ).
parentID
- For file and folder records, this is the folder
containing the file or folder represented by the record. For
thread records, this is the CNID of the
file or folder itself.
nodeName
- This field contains Unicode characters,
fully decomposed and in
canonical order. For file or folder records, this is
the name of the file or folder inside the
parentID
folder. For thread records, this is the empty string.
IMPORTANT:
The length of the key varies with the length of the
string stored in the nodeName field;
it occupies only the number of bytes required to
hold the name. The keyLength field
determines the actual length of the key; it varies
from
kHFSPlusCatalogKeyMinimumLength (6) to
kHFSPlusCatalogKeyMaximumLength (516).
|
Note:
The catalog file key mirrors the standard way you
specify a file or folder with the Mac OS File
Manager programming interface, with the exception
of the volume reference number, which determines
which volume's catalog to search.
|
Catalog file keys are compared first by
parentID and then by nodeName . The
parentID is compared as an unsigned 32-bit
integer. For case-sensitive HFSX volumes,
the characters in nodeName are compared as a
sequence of unsigned 16-bit integers. For case-insensitive
HFSX volumes and HFS Plus volumes, the nodeName
must be compared in a case-insensitive way, as described in the
Case-Insensitive String
Comparison Algorithm section.
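For illustration, key ordering on a case-sensitive HFSX volume (where
nodeName is compared as a sequence of unsigned 16-bit code units) could
be expressed by a comparison function like the sketch below. The fields
are assumed to already be in host byte order; a case-insensitive volume
would substitute the case-insensitive comparison described later.
static int CompareCatalogKeys(const HFSPlusCatalogKey *a,
                              const HFSPlusCatalogKey *b)
{
    /* Keys are ordered first by parentID, compared as an unsigned
       32-bit integer... */
    if (a->parentID != b->parentID) {
        return (a->parentID < b->parentID) ? -1 : 1;
    }
    /* ...and then by nodeName, one 16-bit code unit at a time. */
    UInt16 minLength = (a->nodeName.length < b->nodeName.length)
                           ? a->nodeName.length : b->nodeName.length;
    for (UInt16 i = 0; i < minLength; i++) {
        if (a->nodeName.unicode[i] != b->nodeName.unicode[i]) {
            return (a->nodeName.unicode[i] < b->nodeName.unicode[i]) ? -1 : 1;
        }
    }
    /* A name that is a prefix of another sorts first. */
    if (a->nodeName.length != b->nodeName.length) {
        return (a->nodeName.length < b->nodeName.length) ? -1 : 1;
    }
    return 0;
}
|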
For more information about how catalog keys are used to
find file, folder, and thread records within the catalog
tree, see Catalog Tree
Usage.
Catalog File Data
A catalog file leaf node can contain four different types
of data records:
- A folder record contains information about a
single folder.
- A file record contains information about a
single file.
- A folder thread record provides a link between
a folder and its parent folder, and lets you find a
folder record given just the folder ID.
- A file thread record provides a link between a
file and its parent folder, and lets you find a file
record given just the file ID. (In both the folder thread
and the file thread record, the thread record is used to
map the file or folder ID to the actual parent directory
ID and name.)
Each record starts with a recordType field,
which describes the type of catalog data record. The
recordType field contains one of the following
values:
enum {
kHFSPlusFolderRecord = 0x0001,
kHFSPlusFileRecord = 0x0002,
kHFSPlusFolderThreadRecord = 0x0003,
kHFSPlusFileThreadRecord = 0x0004
};
|
The values have the following meaning:
kHFSPlusFolderRecord
- This record is a
folder record. You can
use the
HFSPlusCatalogFolder type to
interpret the data.
kHFSPlusFileRecord
- This record is a file
record. You can use the
HFSPlusCatalogFile type to interpret the
data.
kHFSPlusFolderThreadRecord
- This record is a folder
thread record. You can
use the
HFSPlusCatalogThread type to
interpret the data.
kHFSPlusFileThreadRecord
- This record is a file
thread record. You can
use the
HFSPlusCatalogThread type to
interpret the data.
The next three sections describe the folder, file, and
thread records in detail.
Note:
The position of the recordType field,
and the constants chosen for the various record
types, are especially useful if you're writing
common code to handle HFS and HFS Plus volumes.
In HFS, the record type field is one byte, but
it's always followed by a one-byte reserved field
whose value is always zero. In HFS Plus, the record
type field is two bytes. You can use the HFS Plus
two-byte record type to examine an HFS record if
you use the appropriate constants, as shown below.
|
enum {
kHFSFolderRecord = 0x0100,
kHFSFileRecord = 0x0200,
kHFSFolderThreadRecord = 0x0300,
kHFSFileThreadRecord = 0x0400
};
|
The values have the following meaning:
kHFSFolderRecord
- This record is an HFS folder record. You can
use the
HFSCatalogFolder type to
interpret the data.
kHFSFileRecord
- This record is an HFS file record. You can
use the
HFSCatalogFile type to
interpret the data.
kHFSFolderThreadRecord
- This record is an HFS folder thread record.
You can use the
HFSCatalogThread
type to interpret the data.
kHFSFileThreadRecord
- This record is an HFS file thread record.
You can use the
HFSCatalogThread
type to interpret the data.
Catalog Folder Records
The catalog folder record is used in the catalog B-tree
file to hold information about a particular folder on the
volume. The data of the record is described by the
HFSPlusCatalogFolder type.
struct HFSPlusCatalogFolder {
SInt16 recordType;
UInt16 flags;
UInt32 valence;
HFSCatalogNodeID folderID;
UInt32 createDate;
UInt32 contentModDate;
UInt32 attributeModDate;
UInt32 accessDate;
UInt32 backupDate;
HFSPlusBSDInfo permissions;
FolderInfo userInfo;
ExtendedFolderInfo finderInfo;
UInt32 textEncoding;
UInt32 reserved;
};
typedef struct HFSPlusCatalogFolder HFSPlusCatalogFolder;
|
The fields have the following meaning:
recordType
- The catalog data record type. For folder records,
this is always
kHFSPlusFolderRecord .
flags
- This field contains bit flags about the folder. No
bits are currently defined for folder records. An
implementation must treat this as a
reserved field.
valence
- The number of files and folders directly contained by
this folder. This is equal to the number of file and
folder records whose key's
parentID is equal
to this folder's folderID .
IMPORTANT:
The traditional Mac OS File Manager programming
interfaces require folders to have a valence less
than 32,767. An implementation must enforce this
restriction if it wants the volume to be usable by
Mac OS. Values of 32,768 and larger are
problematic; 32,767 and smaller are OK. It's an
implementation restriction for the older Mac OS
APIs; items 32,768 and beyond would be unreachable
by PBGetCatInfo . As a practical
matter, many programs are likely to fail with
anywhere near that many items in a single folder.
So, the volume format allows more than 32,767 items
in a folder, but it's probably not a good idea to
exceed that limit right now.
|
folderID
- The CNID of this folder.
Remember that the key for a folder record contains the
CNID of the folder's parent, not the CNID of the folder
itself.
createDate
- The date and time the folder was created. See
HFS Plus Dates for a
description of the format. Again, the
createDate of the Volume Header is NOT
stored in GMT; it is local time. (Further, if the volume
has an HFS wrapper, the creation date in the MDB should
be the same as the createDate in the Volume
Header).
contentModDate
- The date and time the folder's contents were last
changed. This is the time when a file or folder was
created or deleted inside this folder, or when a file or
folder was moved in or out of this folder. See
HFS Plus Dates for a
description of the format.
Note:
The traditional Mac OS APIs use the
contentModDate when getting and
setting the modification date. The traditional Mac OS
APIs treat attributeModDate as a
reserved field.
|
attributeModDate
- The last date and time that any field in the
folder's catalog record was changed. An implementation may treat
this field as reserved.
In Mac OS X, the BSD APIs use this field as the folder's change time
(returned in the
st_ctime field of struct stat ).
All versions of Mac OS 8 and 9 treat this field as reserved. See
HFS Plus Dates for a description of
the format.
accessDate
- The date and time the folder's contents were last
read. This field has no analog in the HFS catalog record.
It represents the last time the folder's contents were
read. This field exists to support POSIX semantics when
the volume is mounted on Mac OS X and some non-Mac OS platforms. See
HFS Plus Dates for a
description of the format.
IMPORTANT:
The traditional Mac OS implementation of HFS Plus does not
maintain the accessDate field. Folders
created by traditional Mac OS have an
accessDate of zero.
|
backupDate
- The date and time the folder was last backed up. The
volume format requires no special action on this field;
it simply defines the field for the benefit of user
programs. See HFS Plus Dates
for a description of the format.
permissions
- This field contains folder permissions, similar to
those defined by POSIX or AFP. See
HFS Plus Permissions
for a description of the format.
IMPORTANT:
The traditional Mac OS implementation of HFS Plus does not use
the permissions field. Folders created
by traditional Mac OS have the entire field set to 0.
|
userInfo
- This field contains information used by the Mac OS
Finder. The contents of this structure are not strictly part of the HFS Plus
specification, but general information is in the
Finder Info section of this note.
finderInfo
- This field contains information used by the Mac OS
Finder. The contents of this structure are not strictly part of the HFS Plus
specification, but general information is in the
Finder Info section of this note.
textEncoding
- A hint as to the text encoding from which the folder name
was derived. This hint can be used to improve the quality
of the conversion of the name to a Mac OS-encoded Pascal
string. See Text Encodings
for details.
reserved
- An implementation must treat this as a
reserved field.
Catalog File Records
The catalog file record is used in the catalog B-tree
file to hold information about a particular file on the
volume. The data of the record is described by the
HFSPlusCatalogFile type.
struct HFSPlusCatalogFile {
SInt16 recordType;
UInt16 flags;
UInt32 reserved1;
HFSCatalogNodeID fileID;
UInt32 createDate;
UInt32 contentModDate;
UInt32 attributeModDate;
UInt32 accessDate;
UInt32 backupDate;
HFSPlusBSDInfo permissions;
FileInfo userInfo;
ExtendedFileInfo finderInfo;
UInt32 textEncoding;
UInt32 reserved2;
HFSPlusForkData dataFork;
HFSPlusForkData resourceFork;
};
typedef struct HFSPlusCatalogFile HFSPlusCatalogFile;
|
The fields have the following meaning:
recordType
- The catalog data record type. For files records, this
is always
kHFSPlusFileRecord .
flags
- This field contains bit flags about the file. The
currently defined bits are described
below. An implementation must treat undefined bits as
reserved.
reserved1
- An implementation must treat this as a
reserved field.
fileID
- The CNID of this file.
createDate
- The date and time the file was created. See
HFS Plus Dates for a
description of the format.
contentModDate
- The date and time the file's contents were last
changed by extending, truncating, or writing either of
the forks. See HFS Plus Dates
for a description of the format.
attributeModDate
- The last date and time that any field in the
file's catalog record was changed. An implementation may treat
this field as reserved.
In Mac OS X, the BSD APIs use this field as the file's change time
(returned in the
st_ctime field of struct stat ).
All versions of Mac OS 8 and 9 treat this field as reserved. See
HFS Plus Dates for a description of
the format.
accessDate
- The date and time the file's contents were last read.
This field has no analog in the HFS catalog record. It
represents the last time either of a file's forks was
read. This field exists to support POSIX semantics when
the volume is mounted on Mac OS X and some non-Mac OS platforms. See
HFS Plus Dates for a
description of the format.
IMPORTANT:
The traditional Mac OS implementation of HFS Plus does not
maintain the accessDate field. Files
created by traditional Mac OS have an
accessDate of zero.
|
backupDate
- The date and time the file was last backed up. The
volume format requires no special action on this field;
it simply defines the field for the benefit of user
programs. See HFS Plus Dates
for a description of the format.
permissions
- This field contains file permissions, similar to
those defined by POSIX. See
HFS Plus Permissions
for a description of the format.
IMPORTANT:
The traditional Mac OS implementation of HFS Plus does not use
the permissions field. Files created
by traditional Mac OS have the entire field set to 0.
|
userInfo
- This field contains information used by the Mac OS
Finder. For more information, see the
Finder Info section of this note.
finderInfo
- This field contains information used by the Mac OS
Finder. The contents of this structure are not strictly part of the HFS Plus
specification, but general information is in the
Finder Info section of this note.
textEncoding
- A hint as to the text encoding from which the file name
was derived. This hint can be used to improve the
quality of the conversion of the name to a Mac OS-encoded
Pascal string. See Text
Encodings for details.
reserved2
- An implementation must treat this as a
reserved field.
dataFork
- Information about the location and size of the data
fork. See Fork Data
Structure for a description of the
HFSPlusForkData type.
resourceFork
- Information about the location and size of the
resource fork. See Fork Data
Structure for a description of the
HFSPlusForkData type.
For each fork, the first eight extents are described by
the HFSPlusForkData field in the catalog file
record. If a fork is sufficiently fragmented to require more
than eight extents, the remaining extents are described by
extent records in the extents
overflow file.
The following constants define
bit flags in the file record's flags field:
enum {
kHFSFileLockedBit = 0x0000,
kHFSFileLockedMask = 0x0001,
kHFSThreadExistsBit = 0x0001,
kHFSThreadExistsMask = 0x0002
};
|
The values have the following meaning:
kHFSFileLockedBit ,
kHFSFileLockedMask
- If
kHFSFileLockedBit is set, then none
of the forks may be extended, truncated, or written to.
They may only be opened for reading (not for writing).
The catalog information (like finderInfo and
userInfo ) may still be changed.
kHFSThreadExistsBit ,
kHFSThreadExistsMask
- This bit indicates that the file has a thread record.
As all files in HFS Plus have thread records, this bit
must be set.
Catalog Thread Records
The catalog thread record is used in the catalog B-tree file
to link a CNID to the file or folder record
using that CNID. The data of the record is described by the
HFSPlusCatalogThread type.
IMPORTANT:
In HFS, thread records were required for folders
but optional for files. In HFS Plus, thread records
are required for both files and folders.
|
struct HFSPlusCatalogThread {
SInt16 recordType;
SInt16 reserved;
HFSCatalogNodeID parentID;
HFSUniStr255 nodeName;
};
typedef struct HFSPlusCatalogThread HFSPlusCatalogThread;
|
The fields have the following meaning:
recordType
- The catalog data record type. For thread records,
this is
kHFSPlusFileThreadRecord or
kHFSPlusFolderThreadRecord , depending on whether
the thread record refers to a file or a folder. Both
types of thread record contain the same data.
reserved
- An implementation must treat this as a
reserved field.
parentID
- The CNID of the parent of the file
or folder referenced by this thread record.
nodeName
- The name of the file or folder referenced by this
thread record.
The next section explains how thread records can be used
to find a file or folder using just its CNID.
Catalog Tree Usage
File and folder records always have a key that contains a
non-empty nodeName . The file and folder records
for the children are all consecutive in the catalog, since
they all have the same parentID in the key, and
vary only by nodeName .
The key for a thread record is the file's or folder's CNID (not the CNID of the parent folder) and
an empty (zero length) nodeName . This allows a
file or folder to be found using just the CNID. The thread
record contains the parentID and
nodeName field of the file or folder itself.
Finding a file or folder by its CNID is a two-step
process. The first step is to use the CNID to look up the
thread record for the file or folder. This yields the file
or folder's parent folder ID and name. The second step is to
use that information to look up the real file or folder
record.
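A sketch of this two-step lookup follows. SearchCatalog is a
hypothetical helper that searches the catalog B-tree for an exact key
match and returns a pointer to the matching leaf record's data (or
NULL); filling in keyLength and handling byte order are omitted.
/* Hypothetical helper assumed by this sketch. */
extern const void * SearchCatalog(const HFSPlusCatalogKey *key);
static Boolean LookupByCNID(HFSCatalogNodeID cnid,
                            HFSPlusCatalogKey *outKey)
{
    HFSPlusCatalogKey threadKey;
    /* Step 1: the key of a thread record is the CNID itself plus an
       empty nodeName. */
    threadKey.parentID        = cnid;
    threadKey.nodeName.length = 0;
    const HFSPlusCatalogThread *thread = SearchCatalog(&threadKey);
    if (thread == NULL) {
        return 0;       /* no such CNID on this volume */
    }
    /* Step 2: the thread record gives the parent folder ID and the name
       of the item itself; together they form the key of the real file
       or folder record. */
    outKey->parentID = thread->parentID;
    outKey->nodeName = thread->nodeName;
    return SearchCatalog(outKey) != NULL;
}
|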
Since files do not contain other files or folders, there
are no catalog records whose key has a parentID
equal to a file's CNID and nodeName with
non-zero length. These unused key values are reserved.
Finder Info
See the
Finder Interface Reference for more detailed information
about these data types and how the Finder uses them.
struct Point {
SInt16 v;
SInt16 h;
};
typedef struct Point Point;
struct Rect {
SInt16 top;
SInt16 left;
SInt16 bottom;
SInt16 right;
};
typedef struct Rect Rect;
/* OSType is a 32-bit value made by packing four 1-byte characters
together. */
typedef UInt32 FourCharCode;
typedef FourCharCode OSType;
/* Finder flags (finderFlags, fdFlags and frFlags) */
enum {
kIsOnDesk = 0x0001, /* Files and folders (System 6) */
kColor = 0x000E, /* Files and folders */
kIsShared = 0x0040, /* Files only (Applications only) If */
/* clear, the application needs */
/* to write to its resource fork, */
/* and therefore cannot be shared */
/* on a server */
kHasNoINITs = 0x0080, /* Files only (Extensions/Control */
/* Panels only) */
/* This file contains no INIT resource */
kHasBeenInited = 0x0100, /* Files only. Clear if the file */
/* contains desktop database resources */
/* ('BNDL', 'FREF', 'open', 'kind'...) */
/* that have not been added yet. Set */
/* only by the Finder. */
/* Reserved for folders */
kHasCustomIcon = 0x0400, /* Files and folders */
kIsStationery = 0x0800, /* Files only */
kNameLocked = 0x1000, /* Files and folders */
kHasBundle = 0x2000, /* Files only */
kIsInvisible = 0x4000, /* Files and folders */
kIsAlias = 0x8000 /* Files only */
};
/* Extended flags (extendedFinderFlags, fdXFlags and frXFlags) */
enum {
kExtendedFlagsAreInvalid = 0x8000, /* The other extended flags */
/* should be ignored */
kExtendedFlagHasCustomBadge = 0x0100, /* The file or folder has a */
/* badge resource */
kExtendedFlagHasRoutingInfo = 0x0004 /* The file contains routing */
/* info resource */
};
struct FileInfo {
OSType fileType; /* The type of the file */
OSType fileCreator; /* The file's creator */
UInt16 finderFlags;
Point location; /* File's location in the folder. */
UInt16 reservedField;
};
typedef struct FileInfo FileInfo;
struct ExtendedFileInfo {
SInt16 reserved1[4];
UInt16 extendedFinderFlags;
SInt16 reserved2;
SInt32 putAwayFolderID;
};
typedef struct ExtendedFileInfo ExtendedFileInfo;
struct FolderInfo {
Rect windowBounds; /* The position and dimension of the */
/* folder's window */
UInt16 finderFlags;
Point location; /* Folder's location in the parent */
/* folder. If set to {0, 0}, the Finder */
/* will place the item automatically */
UInt16 reservedField;
};
typedef struct FolderInfo FolderInfo;
struct ExtendedFolderInfo {
Point scrollPosition; /* Scroll position (for icon views) */
SInt32 reserved1;
UInt16 extendedFinderFlags;
SInt16 reserved2;
SInt32 putAwayFolderID;
};
typedef struct ExtendedFolderInfo ExtendedFolderInfo;
|
Back to top
Extents Overflow File
HFS Plus tracks which allocation blocks belong to a
file's forks by maintaining a list of extents (contiguous
allocation blocks) that belong to that file, in the
appropriate order. Each extent is represented by a pair of
numbers: the first allocation block number of the extent and
the number of allocation blocks in the extent. The file
record in the catalog B-tree contains a record of the first
eight extents of each fork. If there are more than eight
extents in a fork, the remaining extents are stored in the
extents overflow file.
Like the catalog file, the extents overflow file is
a B-tree. However, the structure of the
extents overflow file is relatively simple compared to that
of a catalog file. The extents overflow file has a simple,
fixed length key and a single type of data record.
Extents Overflow File Key
The structure of the key for the extents overflow file is
described by the HFSPlusExtentKey type.
struct HFSPlusExtentKey {
UInt16 keyLength;
UInt8 forkType;
UInt8 pad;
HFSCatalogNodeID fileID;
UInt32 startBlock;
};
typedef struct HFSPlusExtentKey HFSPlusExtentKey;
|
The fields have the following meaning:
keyLength
- The
keyLength field is required by all
keyed records in a B-tree.
The extents overflow file, in common with all HFS Plus
B-trees, uses a large key length (UInt16 ).
Keys in the extents overflow file always have the same
length, kHFSPlusExtentKeyMaximumLength (10).
forkType
- The type of fork for which this extent record
applies. This must be either 0 for the data fork or 0xFF
for the resource fork.
pad
- An implementation must treat this as a
pad field.
fileID
- The CNID of the file for which this
extent record applies.
startBlock
- The offset, in allocation blocks, into the fork of
the first extent described by this extent record. The
startBlock field lets you directly find the particular
extents for a given offset into a fork.
Note:
Typically, an implementation will keep a copy of
the initial extents from the catalog record. When
trying to access part of the fork, it checks whether
that position is beyond the extents described in
the catalog record; if so, it uses that offset (in
allocation blocks) to find the appropriate extents
B-tree record. See
Extents
Overflow File Usage for more information.
|
Two HFSPlusExtentKey structures are compared
by comparing their fields in the following order:
fileID , forkType ,
startBlock . Thus, all the extent records for a
particular fork are grouped together in the B-tree, right
next to all the extent records for the other fork of the
file.
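For illustration (assuming the fields are already in host byte order),
that ordering corresponds to a comparison function like this:
static int CompareExtentKeys(const HFSPlusExtentKey *a,
                             const HFSPlusExtentKey *b)
{
    /* Compare fileID, then forkType, then startBlock. */
    if (a->fileID != b->fileID) {
        return (a->fileID < b->fileID) ? -1 : 1;
    }
    if (a->forkType != b->forkType) {
        return (a->forkType < b->forkType) ? -1 : 1;
    }
    if (a->startBlock != b->startBlock) {
        return (a->startBlock < b->startBlock) ? -1 : 1;
    }
    return 0;
}
|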
Extents Overflow File Data
The data records for an extents overflow file (the
extent records) are described by the
HFSPlusExtentRecord type, which is described in
detail in Fork Data
Structure.
IMPORTANT:
Remember that the HFSPlusExtentRecord
contains descriptors for eight extents. The first
eight extents in a fork are held in its
catalog file
record. So the number of extent records for a
fork is ((number of extents - 8 + 7) / 8).
|
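As a quick illustration of that formula, the fork in the next section's
example has 23 extents: 8 live in the catalog record, and
(23 - 8 + 7) / 8 = 2 extent records live in the extents overflow file.
static UInt32 OverflowExtentRecordCount(UInt32 totalExtents)
{
    if (totalExtents <= 8) {
        return 0;                       /* all extents fit in the catalog record */
    }
    return (totalExtents - 8 + 7) / 8;  /* integer division rounds up */
}
|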
Extents Overflow
File Usage
The most important thing to remember about extents
overflow file is that it is only used for forks with more
than eight extents. In most cases, forks have fewer extents,
and all the extents information for the fork is held in its
catalog file record. However, for more fragmented forks, the
extra extents information is stored in the extents overflow
file.
When an implementation needs to map a fork offset into a
position on disk, it first looks through the extent records in
the catalog file record. If the fork offset is within one
these extents, the implementation can find the corresponding
position without consulting the extents overflow file.
If, on the other hand, the fork offset is beyond the last
extent recorded in the catalog file record, the
implementation must look in the next extent record, which is
stored in the extents overflow file. To find this record,
the implementation must form a key, which consists of
information about the fork (the fork type and the file ID)
and the offset into the fork (the start block).
Because extent records are partially keyed off the fork
offset of the first extent in the record, the implementation
must have all the preceding extent records in order to know
the fork offset to form the key of the next extent record.
For example, if the fork has two extent records in the
extents overflow file, the implementation must read the
first extent record to calculate the fork offset for the key
for the second extent record.
However, you can use the startBlock in the
extent key to go directly to the record you need. Here's a
complicated example:
We've got a fork with a total of 23 extents (very
fragmented!). The blockCounts for the extents,
in order, are as follows: one extent of 6 allocation blocks,
14 extents of one allocation block each, two extents of two
allocation blocks each, one extent of 7 allocation blocks,
and five more extents of one allocation block each. The fork
contains a total of 36 allocation blocks.
The block counts for the catalog's fork data are: 6, 1,
1, 1, 1, 1, 1, 1. There is an extent overflow record whose
startBlock is 13 (0+6+1+1+1+1+1+1+1), and has the following
block counts: 1, 1, 1, 1, 1, 1, 1, 2. There is a second
extent overflow record whose startBlock is 22
(13+1+1+1+1+1+1+1+2), and has the following block counts: 2,
7, 1, 1, 1, 1, 1, 0. Note this last record only contains
seven extents.
Suppose the allocation block size for the volume is 4K.
Suppose we want to start reading from the file at an offset
of 108K. We want to know where the data is on the volume,
and how much contiguous data is there.
First, we divide 108K (the fork offset) by 4K (the
allocation block size) to get 27, which is the number of
allocation blocks from the start of the fork. So, we want to
know where fork allocation block #27 is. We notice that 27
is greater than or equal to 13 (the number of allocation
blocks in the catalog's fork data), so we're going to have
to look in the extents B-tree.
We construct a search key with the appropriate
fileID and forkType , and set
startBlock to 27 (the desired fork allocation
block number). We then search the extents B-tree for the
record with the greatest key less than or equal to our search key. We
find the second extent overflow record (the one with
startBlock =22). It has the same
fileID and forkType , so things are
good. Now we just need to figure out which extent within
that record to use.
We compute 27 (the desired fork allocation block) minus
22 (the startBlock ) and get 5. So, we want the
extent that is 5 allocation blocks "into" the record. We try
the first extent. It's only two allocation blocks long, so
the desired extent is 3 allocation blocks after that first
extent in the record. The next extent is 7 allocation blocks
long. Since 7 is greater than 3, we know the desired fork
position is within this extent (the second extent in the
second overflow record). Further, we know that there are
7-3=4 contiguous allocation blocks (i.e., 16K).
We grab the startBlock for that second
extent (i.e., the one whose blockCount is 7);
suppose this number is 444. We add 3 (since the desired
position was 3 allocation blocks into the extent we found).
So, the desired position is in allocation block 444+3=447 on
the volume. That is 447*4K=1788K from the start of the HFS
Plus volume. (Since the Volume Header always starts 1K after
the start of the HFS Plus volume, the desired fork position
is 1787K after the start of the Volume Header.)
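The final step of this example can be expressed as a short sketch. It is
illustrative only: it assumes the correct extent record has already been
found (as described above) and that its fields are in host byte order.
/* Given an extent record and the fork-relative allocation block at which
   it starts (the key's startBlock), find the volume allocation block for
   a desired fork position and how many contiguous blocks follow it.
   Returns 0 if the position lies beyond this record. */
static Boolean MapForkBlock(const HFSPlusExtentRecord extents,
                            UInt32 recordStartBlock,   /* e.g., 22 */
                            UInt32 desiredForkBlock,   /* e.g., 27 */
                            UInt32 *volumeBlock,
                            UInt32 *contiguousBlocks)
{
    UInt32 offset = desiredForkBlock - recordStartBlock;   /* e.g., 5 */
    for (int i = 0; i < 8; i++) {
        if (offset < extents[i].blockCount) {
            /* The desired position lies within this extent. */
            *volumeBlock      = extents[i].startBlock + offset;
            *contiguousBlocks = extents[i].blockCount - offset;
            return 1;
        }
        offset -= extents[i].blockCount;
    }
    return 0;   /* beyond this record; consult the next extent record */
}
With the example values above (recordStartBlock 22, desiredForkBlock 27,
and a second extent starting at allocation block 444), this returns
volume allocation block 447 with 4 contiguous allocation blocks.
|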
Bad Block File
The extent overflow file is also used to hold information
about the bad block file. The bad block file is used to mark
areas on the disk as bad, unable to be used for storing
data. This is typically used to map out bad sectors on the
disk.
Note:
All space on an HFS Plus volume is allocated in
terms of allocation blocks. Typically, allocation
blocks are larger than sectors. If a sector is
found to be bad, the entire allocation block is
unusable.
|
When an HFS Plus volume is embedded within an HFS wrapper
(the way Mac OS normally initializes a hard disk), the space
used by the HFS Plus volume is marked as part of the bad
block file within the HFS wrapper itself. (This
sounds confusing because you have a volume within another
volume.)
The bad block file is not a file in the same sense as a
user file (it doesn't have a file record in the catalog) or
one of the special files (it's not referenced by the volume
header). Instead, the bad block file uses a special
CNID
(kHFSBadBlockFileID ) as the key for extent
records in the extents overflow file. When a block is marked
as bad, an extent with this CNID and encompassing the bad
block is added to the extents overflow file. The block is
marked as used in the allocation
file. These steps prevent the block from being used for
data by the file system.
IMPORTANT:
The bad block file is necessary because marking a
bad block as used in the allocation file is
insufficient. One
common
consistency check for HFS Plus volumes is to
verify that all the allocation blocks on the volume
are being used by real data. If such a check were
run on a volume with bad blocks that weren't also
covered by extents in the bad block file, the bad
blocks would be freed and might be reused for file
system data.
|
Bad block extent records are always assumed to reference
the data fork. The forkType field of the key
must be 0.
Note:
Because an extent record holds up to eight extents,
adding a bad block extent to the bad block file
does not necessarily require the addition of a new
extent record.
|
HFS uses a similar mechanism to store information about
bad blocks. This facility is used by the
HFS Wrapper to hold an entire HFS
Plus volume as bad blocks on an HFS disk.
Back to top
Allocation File
HFS Plus uses an allocation file to keep track of whether
each allocation block in a volume is currently allocated to
some file system structure or not. The contents of the
allocation file is a bitmap. The bitmap contains one bit for
each allocation block in the volume. If a bit is set, the
corresponding allocation block is currently in use by some
file system structure. If a bit is clear, the corresponding
allocation block is not currently in use, and is available
for allocation.
Note:
HFS stores allocation information in a special area
on the volume, known as the volume bitmap.
The allocation file mechanism used by HFS Plus has
a number of advantages.
- Using a file allows the bitmap itself to be
allocated from allocation blocks. This
simplifies the design, since volumes are now
composed of only one type of block -- the
allocation block. HFS is slightly more
complex because it uses 512-byte blocks to hold the
allocation bitmap and allocation blocks to hold
file data.
- The allocation file does not have to be
contiguous, which allows allocation information
and user data to be interleaved. Many modern
file systems do this to reduce head travel when
growing files.
- The allocation file can be extended, which
makes it significantly easier to increase the
number of allocation blocks on a disk. This is
useful if you want to either decrease the
allocation block size on a disk, or increase the
total disk size.
- The allocation file may be shrunk. This
makes it easy to create a disk image suitable
for volumes of varying sizes. The allocation
file in the disk image is sized to hold enough
allocation data for the largest disk, and shrunk
back when the image is written to a smaller disk.
|
Each byte in the allocation file holds the state of eight
allocation blocks. The byte at offset X into the file
contains the allocation state of allocation blocks (X * 8)
through (X * 8 + 7). Within each byte, the most significant
bit holds information about the allocation block with the
lowest number, the least significant bit holds information
about the allocation block with the highest number. Listing
1 shows how you would test whether an allocation block is in
use, assuming that you've read the entire allocation file
into memory.
static Boolean IsAllocationBlockUsed(UInt32 thisAllocationBlock,
UInt8 *allocationFileContents)
{
UInt8 thisByte;
/* Each byte of the allocation file covers eight allocation blocks. */
thisByte = allocationFileContents[thisAllocationBlock / 8];
/* Within a byte, the most significant bit corresponds to the
lowest-numbered allocation block. */
return (thisByte & (1 << (7 - (thisAllocationBlock % 8)))) != 0;
}
|
Listing 1 Determining whether an
allocation block is in use.
|
The size of the allocation file depends on the number of
allocation blocks in the volume, which in turn depends both
on the size of the disk and on the size of the
volume's allocation blocks. For example, a volume on a 1 GB disk and
having an allocation block size of 4 KB needs an allocation
file size of 256 Kbits (32 KB, or 8 allocation blocks).
Since the allocation file itself is allocated using
allocation blocks, it always occupies an integral number of
allocation blocks (its size may be rounded up).
The allocation file may be larger than the minimum number
of bits required for the given volume size. Any unused bits
in the bitmap must be set to zero.
Note:
Since the number of allocation blocks is determined
by a 32-bit number, the allocation file
can be up to 512 MB in size, a radical increase
over HFS's 8 KB limit.
|
IMPORTANT:
Because the entire volume is composed of allocation
blocks (with the possible
exception of the alternate volume header, as
described above), the volume header,
alternate volume header, and reserved areas (the
first 1024 bytes and the last 512 bytes) must be
marked as allocated in the allocation file. The
actual number of allocation blocks allocated for
these areas varies with the size of the
allocation blocks. Any allocation block that
contains any of these areas must be marked
allocated.
For example, if 512-byte allocation blocks are
used, the first three and last two allocation
blocks are allocated. With 1024-byte allocation
blocks, the first two and the last allocation
blocks are allocated. For larger allocation block
sizes, only the first and last allocation blocks
are allocated for these areas.
See the Volume
Header section for a description of these
areas.
|
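As an illustration (not a normative procedure), an implementation
initializing a volume might mark those areas like this.
SetAllocationBlockUsed is a hypothetical helper that sets a bit in the
allocation file, and the sketch assumes the volume length is an exact
multiple of the allocation block size.
/* Hypothetical helper assumed by this sketch. */
extern void SetAllocationBlockUsed(UInt32 allocationBlock);
static void MarkReservedAreas(UInt32 blockSize, UInt32 totalBlocks)
{
    /* The first 1024 reserved bytes plus the 512-byte volume header. */
    UInt32 headerBlocks = (1024 + 512 + blockSize - 1) / blockSize;
    for (UInt32 i = 0; i < headerBlocks; i++) {
        SetAllocationBlockUsed(i);
    }
    /* The 512-byte alternate volume header plus the last 512 reserved
       bytes occupy the final 1024 bytes of the volume. */
    UInt32 trailerBlocks = (1024 + blockSize - 1) / blockSize;
    for (UInt32 i = 0; i < trailerBlocks; i++) {
        SetAllocationBlockUsed(totalBlocks - 1 - i);
    }
}
For 512-byte allocation blocks this marks the first three and last two
blocks, matching the example above.
|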
Back to top
Attributes File
The HFS Plus attributes file is reserved for implementing
named forks in the future. An attributes file is organized
as a B-tree file. It is a special file,
described by an HFSPlusForkData record in the
volume header, with no entry in the catalog file. An
attributes file has a variable-length key and three data
record types, which makes it roughly as complex as the
catalog file.
It is possible for a volume to have no attributes file.
If the first extent of the attributes file (stored in the
volume header) has zero allocation blocks, the attributes
file does not exist.
The B-Trees chapter defined a standard rule for the
node size of HFS Plus B-trees. As
the attributes file is a B-tree, it inherits the
requirements of this rule. In addition, the node size of the
attributes file must be at least 4 KB
(kHFSPlusAttrMinNodeSize ).
IMPORTANT:
The exact organization of the attributes B-tree has
not been fully designed. Specifically:
- the structure of the keys in the attribute
B-tree has not been finalized and is subject to
change, and
- additional attributes file data record types
may be defined.
An implementation written to this specification
may use the details that are final to perform basic
consistency checks on attributes. These checks will
be compatible with future implementations written
to a final attributes specification. See
Attributes
and the Allocation File Consistency Check.
|
Attributes File Data
IMPORTANT:
Several types of attributes file data records are
defined. It is possible that additional record
types will be defined in future specifications.
Implementations written to this specification must
ignore record types not defined here.
|
The leaf nodes of an attributes file contain data
records, known as attributes. There are two types of
attributes:
- Fork data attributes are used for attributes
whose data is large. The attribute's data is stored in
extents on the volume and the attribute merely contains a
reference to those extents.
- Extension attributes augment fork data
attributes, allowing a fork data attribute to have more
than eight extents.
Each record starts with a recordType field,
which describes the type of attribute data record. The
recordType field contains one of the following
values.
enum {
kHFSPlusAttrInlineData = 0x10,
kHFSPlusAttrForkData = 0x20,
kHFSPlusAttrExtents = 0x30
};
|
The values have the following meaning:
kHFSPlusAttrInlineData
- Reserved for future use.
kHFSPlusAttrForkData
- This record is a fork
data attribute. You can use the
HFSPlusAttrForkData type to interpret the
data.
kHFSPlusAttrExtents
- This record is an
extension attribute.
You can use the
HFSPlusAttrExtents type to
interpret the data. A record of type
kHFSPlusAttrExtents is really just overflow
extents for a corresponding record of type
kHFSPlusAttrForkData . (Think of
kHFSPlusAttrForkData as being like a catalog
record and kHFSPlusAttrExtents as being like
an extents overflow record.)
The next two sections describe the fork data and
extension attributes in detail.
Fork Data Attributes
A fork data attribute is defined by the
HFSPlusAttrForkData data type.
struct HFSPlusAttrForkData {
UInt32 recordType;
UInt32 reserved;
HFSPlusForkData theFork;
};
typedef struct HFSPlusAttrForkData HFSPlusAttrForkData;
|
The fields have the following meaning:
recordType
- The attribute data record type. For fork data
attributes, this is always
kHFSPlusAttrForkData .
reserved
- An implementation must treat this as a
reserved field.
theFork
- Information about the location and size of the
attribute data. See Fork
Data Structure for a description of the
HFSPlusForkData type.
Extension Attributes
An extension attribute is defined by the
HFSPlusAttrExtents data type.
struct HFSPlusAttrExtents {
UInt32 recordType;
UInt32 reserved;
HFSPlusExtentRecord extents;
};
typedef struct HFSPlusAttrExtents HFSPlusAttrExtents;
|
The fields have the following meaning:
recordType
- The attribute data record type. For extension
attributes, this is always
kHFSPlusAttrExtents .
reserved
- An implementation must treat this as a
reserved field.
extents
- The eight extents of the attribute data described by
this record. See Fork Data
Structure for a description of the
HFSPlusExtentRecord type.
Attributes
and the Allocation File Consistency Check
While the key structure for the attributes file is not
fully specified, it is still possible for an implementation
to use attribute file information in its allocation file
consistency check. The leaf records of the attribute file
are fully defined, so the implementation can simply iterate
over them to determine which allocation blocks on the disk
are being used by fork data attributes.
See Allocation
File Consistency Check for details.
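While the attribute key format is not final, the leaf record types above are, so a checker can at least account for the space they occupy. The sketch below is one way this might look; it assumes the record has already been byte-swapped to host order, that the attribute record types and HFSPlusExtentDescriptor are declared as elsewhere in this note, and that MarkBlockUsed is a hypothetical callback supplied by the checker to mark a block in its shadow bitmap.
/*
 * Sketch: mark the allocation blocks referenced by one attributes
 * file leaf record.  Record types not defined in this note are
 * ignored, as required.
 */
static void MarkAttributeRecordBlocks(const void *record,
                                      void (*MarkBlockUsed)(UInt32 block))
{
    UInt32 recordType = *(const UInt32 *)record;  /* first field of every record */
    const HFSPlusExtentDescriptor *extents;
    UInt32 i, block;

    switch (recordType) {
    case kHFSPlusAttrForkData:
        extents = ((const HFSPlusAttrForkData *)record)->theFork.extents;
        break;
    case kHFSPlusAttrExtents:
        extents = ((const HFSPlusAttrExtents *)record)->extents;
        break;
    default:
        return;  /* unknown record types must be ignored */
    }

    /* Mark every allocation block covered by the eight extents. */
    for (i = 0; i < 8; i++) {
        for (block = 0; block < extents[i].blockCount; block++)
            MarkBlockUsed(extents[i].startBlock + block);
    }
}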
Back to top
Startup File
The startup file is a special file intended to hold
information needed when booting a system that does not have
built-in (ROM) support for HFS Plus. A boot loader can find
the startup file without full knowledge of the HFS Plus
volume format (B-trees, catalog file, and so on). Instead,
the volume header contains the
location of the first eight extents of the startup file.
IMPORTANT:
It is legal for the startup file to contain more than eight
extents, and for the remaining extents to be placed in the
extents overflow file. However, doing so defeats the purpose
of the startup file.
|
Back to top
Hard Links
Hard links are a feature that allows multiple directory entries
to refer to a single file's content. They are a way to give a single
file multiple names, possibly in multiple directories. This section
describes how Mac OS X implements hard links on HFS Plus volumes.
The Mac OS X implementation of hard links on HFS Plus volumes
was done using the existing metadata fields of the catalog records.
This makes it possible to back up and restore a volume using hard
links, by backing up and restoring individual files, without having
to understand or interpret the hard links. An HFS Plus implementation
may choose to automatically follow hard links, or not.
Hard links in HFS Plus are represented by a set of related files.
The actual file content (which is shared by each of the hard links)
is stored in a special indirect node file. This indirect node file
is the equivalent of an inode in a traditional UNIX file system.
HFS Plus uses special hard link files (or links)
to refer (or point) to an indirect node file. There is one hard link
file for each directory entry or name that refers to the file content.
Indirect node files exist in a special directory called the
metadata directory. This directory exists in the volume's root
directory. The name of the metadata directory is four null
characters followed by the string "HFS+ Private Data". The
directory's creation date is set to the creation date of the
volume's root directory. The kIsInvisible and
kNameLocked bits are set in the directory's
Finder information. The icon
location in the Finder info is set to the point
(22460, 22460). These Finder info settings are not mandatory,
but they tend to reduce accidental changes to the metadata directory.
An implementation that automatically follows hard links should
make the metadata directory inaccessible from its normal
file system interface.
Note:
The case-insensitive Unicode
string comparison used by HFS Plus and case-insensitive
HFSX sorts null characters after all other
characters, so the metadata directory will typically be the last
item in the root directory. On case-sensitive HFSX
volumes, null characters sort before other characters, so the
metadata directory will typically be the first item in the root directory.
|
Indirect node files have a special identifying number called a
link reference. The link reference is unique among indirect
node files on a given volume. The link reference is not related to
catalog node IDs. When a new indirect node
file is created, it is assigned a new link reference randomly chosen
from the range 100 to 1073741923.
The file name of an indirect node file is the string "iNode"
immediately followed by the link reference converted to decimal text,
with no leading zeroes. For example, an indirect node file with link
reference 123 would have the name "iNode123".
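For illustration, a minimal sketch of constructing that name is shown below; UInt32 is the same unsigned 32-bit type used throughout this note, and the caller supplies the buffer. Converting the resulting C string to the Unicode form stored in the catalog is a separate step.
#include <stdio.h>

/* Build the indirect node file name ("iNode" followed by the link
   reference in decimal, with no leading zeroes). */
static void BuildIndirectNodeFileName(UInt32 linkReference,
                                      char *name, size_t nameSize)
{
    /* For example, a link reference of 123 yields "iNode123". */
    snprintf(name, nameSize, "iNode%lu", (unsigned long)linkReference);
}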
An indirect node file must be a file, not a directory.
Hard links to directories are not allowed because they could cause cycles
in the directory hierarchy if a hard link pointed to one of its ancestor
directories.
The linkCount field in the
permissions is an estimate of
the number of links referring to this indirect node file. An
implementation that understands hard links should increment this
value when creating an additional link, and decrement the value
when removing a link. However, some implementations (such as
traditional Mac OS) do not understand hard links and may make
changes that cause the linkCount to be inaccurate.
Similarly, it is possible for a link to refer to an indirect
node file that does not exist. When removing a link, an
implementation should not allow the linkCount
to underflow; if it is already zero, do not change it.
Note:
The inode number returned by the POSIX stat
or lstat routines in the st_ino
field of the stat structure is actually the
catalog node ID of the indirect
node file, not the link reference mentioned above.
The reason for using a separate link reference number, instead of a
catalog node ID, is to allow hard links to be
backed up and restored by utilities that are not specifically aware
of hard links. As long as they preserve filenames, Finder info,
and permissions, then
the hard links will be preserved.
|
Hard link files are ordinary files in the catalog. The
catalog node ID of a hard link
file is different from the catalog node ID
of the indirect node file it refers to, and different from the
catalog node ID of any other hard link file.
The fileType and fileCreator fields
of the userInfo in the
catalog record of a hard link file must be set to
kHardLinkFileType and kHFSPlusCreator ,
respectively. The hard link file's creation date should be set to
the creation date of the metadata directory. The hard link file's
creation date may also be set to the creation date of the volume's
root directory (if it differs from the creation date of the metadata
directory), though this is deprecated. The iNodeNum field
in the permissions is set to the
link reference of the indirect node file that the link refers to.
For better compatibility with older versions of the Mac OS Finder,
the kHasBeenInited flag should be set in the Finder
flags. The other Finder information, and other dates in the catalog
record are reserved.
enum {
kHardLinkFileType = 0x686C6E6B, /* 'hlnk' */
kHFSPlusCreator = 0x6866732B /* 'hfs+' */
}; |
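A sketch of how an implementation that understands hard links might recognize a hard link file from its catalog file record is shown below. It assumes the catalog file record and permissions declarations given earlier in this note, with fields already in host byte order; in particular, the path permissions.special.iNodeNum assumes the iNodeNum/linkCount union is named special, so adjust it to your own declarations.
/* Sketch: returns 1 and fills in *linkReference if the given catalog
   file record is a hard link file; returns 0 otherwise. */
static int IsHardLinkFile(const HFSPlusCatalogFile *file,
                          UInt32 *linkReference)
{
    if (file->userInfo.fileType    == kHardLinkFileType &&
        file->userInfo.fileCreator == kHFSPlusCreator) {
        /* The iNodeNum field of the permissions holds the link
           reference of the indirect node file this link refers to. */
        *linkReference = file->permissions.special.iNodeNum;
        return 1;
    }
    return 0;
}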
POSIX semantics allow an open file to
be unlinked (deleted). These open but unlinked files are stored on HFS
Plus volumes much like a hard link. When the open file is deleted, it
is renamed and moved into the metadata directory. The new name is the
string "temp" followed by the catalog node ID
converted to decimal text. When the file is eventually closed, this
temporary file may be removed. All such temporary files may be removed
when repairing an unmounted HFS Plus volume.
Repairing the Metadata Directory
When repairing an HFS Plus volume with hard links or a metadata
directory, there are several conditions that might need to be repaired:
- Opened but deleted files (which are now orphaned).
- Orphaned indirect node files (no hard links refer to them).
- Broken hard link (hard link exists, but indirect node file does not).
- Incorrect link count.
- Link reference was 0.
Opened but deleted files are files whose names start with "temp",
and are in the metadata directory. If the volume is not in use
(not mounted, and not being used by any other utility), then these
files can be deleted. A volume with a journal,
even one with no active transactions, may still contain orphaned
temporary files of this kind that need to be deleted.
Detecting an orphaned indirect node file, broken hard link, or incorrect link
count requires finding all hard link files in the catalog, and comparing
the number of found hard links for each link reference with the link
count of the corresponding indirect node file.
A hard link with a link reference equal to 0 is invalid. Such a hard
link may be the result of a hard link being copied or restored by an
implementation or utility that does not use the
permissions in catalog records. It may be possible to repair the
hard link by determining the proper link reference. Otherwise, the
hard link should be deleted.
Back to top
Symbolic Links
Similar to a hard link, a symbolic link is
a special kind of file that refers to another file or directory.
A symbolic link stores the path name of the file or directory it
refers to.
On an HFS Plus volume, a symbolic link is stored
as an ordinary file with special values in some of the fields of its
catalog record. The pathname of the
file being referred to is stored in the data fork. The file type in
the fileMode field of the
permissions is set to S_IFLNK . For compatibility
with Carbon and Classic applications, the file type of a symbolic
link is set to kSymLinkFileType , and the creator code
is set to kSymLinkCreator . The resource fork of the
symbolic link has zero length and is
reserved.
enum {
kSymLinkFileType = 0x736C6E6B, /* 'slnk' */
kSymLinkCreator = 0x72686170 /* 'rhap' */
}; |
Note:
The pathname stored in a symbolic link is assumed to be a POSIX
pathname, as used by the Mac OS X BSD and Cocoa programming interfaces.
It is not a traditional Mac OS, or Carbon, pathname. The path
is encoded in UTF-8. It must be a valid UTF-8 sequence, with no null
(zero) bytes. The path may refer to another volume. The path need
not refer to any existing file or directory. The path may be full
or partial (with or without a leading forward slash). For maximum
compatibility, the length of the path should be 1024 bytes or less.
|
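For example, an implementation might identify a symbolic link as sketched below, assuming the catalog file record and permissions declarations from earlier in this note and fields in host byte order; S_IFMT and S_IFLNK are the standard POSIX file-mode constants.
#include <sys/stat.h>   /* S_IFMT, S_IFLNK */

/* Sketch: returns 1 if the catalog file record describes a symbolic
   link.  The file type bits in the BSD fileMode are the authoritative
   test; the Finder fileType/fileCreator (kSymLinkFileType and
   kSymLinkCreator) are set for compatibility. */
static int IsSymbolicLink(const HFSPlusCatalogFile *file)
{
    return (file->permissions.fileMode & S_IFMT) == S_IFLNK;
}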
Back to top
Journal
An HFS Plus volume may have an optional journal to speed
recovery when mounting a volume that was not unmounted safely
(for example, as the result of a power outage or crash). The
journal makes it quick and easy to restore the volume
structures to a consistent state, without having to scan all of
the structures. The journal is used only for the volume
structures and metadata; it does not protect the contents of a
fork.
Background
A single change to the volume may require writing coordinated
changes to many different places on the volume. If a failure
happens after some, but not all, of the changes have been
written, then the volume may be seriously damaged and may result
in catastrophic loss of data.
For example, creating a file or directory requires adding two
records (the file or folder record, and its thread record) to the
catalog B-tree. A leaf node may not have enough room for a new
record, so it may have to be split. That means that some records
will be removed from the existing node and put into a newly
allocated node. This requires adding a new key and pointer to
the index node that is the parent of the leaf being split (which
might require splitting the index node, and so on). If a failure
occurs after the split, but before the index node is updated,
all of the items in the new node would become inaccessible.
Recovery from this sort of damage is time consuming at best,
and may be impossible.
The purpose of the journal is to ensure that when a group of
related changes is being made, either all of those changes
are actually made, or none of them are made. This is done by
gathering up all of the changes, and storing them in a separate
place (in the journal). Once the journal copy of the changes
is completely written to disk, the changes can actually be
written to their normal locations on disk. If a failure happens
at that time, the changes can simply be copied from the
journal to their normal locations. If a failure happens when
the changes are being written to the journal, but before they are
marked complete, then all of those changes are ignored.
A group of related changes is called a transaction.
When all of the changes of a transaction have been written to
their normal locations on disk, that transaction has been
committed, and is removed from the journal. The journal
may contain several transactions. Copying changes from all
transactions to their normal locations on disk is called
replaying the journal.
IMPORTANT:
Implementations accessing a journaled volume with transactions
must either refuse to access the volume, or replay the journal
to be sure the volume is consistent. If the
lastModifiedVersion field of the
volume header does not match the
signature of an implementation known to properly use and update
the journal, then the journal must not be replayed (since it may
no longer match the on-disk structures, and could cause
corruption if replayed).
|
Overview of Journal Data Structures
If kHFSVolumeJournaledBit is set in the volume header's attributes field, the
volume has a journal. The journal data structures consist of a
journal info block, journal header, and journal buffer. The
journal info block indicates the location and size of the
journal header and journal buffer. The journal buffer is
the space set aside to hold transactions. The journal
header describes which part of the journal buffer is active
and contains transactions waiting to be committed.
Figure 7. Overview of an HFS Plus Journal.
On HFS Plus volumes, the journal info block is stored as a
file (so that its space can be properly represented in a catalog
record and the allocation bitmap). The name of that file is
".journal_info_block" and it is stored in the
volume's root directory. The journal header and journal buffer
are stored together in a different file named
".journal" , also in the volume's root directory.
Each of these files is contiguous on disk (each occupies exactly
one extent). An implementation that uses the journal must
prevent the files from being accessed as normal files.
Note:
An implementation must find the journal info block by using the
journalInfoBlock field of the volume header, not by
the file name. Similarly, an implementation must find the
journal header and journal buffer using the contents of the
journal info block, not the file name.
|
A single transaction consists of several blocks, including
both the data to be written, and the location where that data is
to be written. This is represented on disk by a block list
header, which describes the number and sizes of the blocks,
immediately followed by the contents of those blocks.
Figure 8. A Simple Transaction.
Since block list headers are of limited size, a single
transaction may consist of several block list headers and their
associated block contents (one block list header followed by the
contents of the blocks that header describes, then the next
block list header and its block contents, and so on). If the
next field of the first block_info is
non-zero, then the next block list header is a continuation of
the same transaction.
Figure 9. A Transaction with Multiple Block Lists.
The journal buffer is treated as a circular buffer. When
reading or writing the journal buffer, the I/O operation must
stop at the end of the journal buffer and resume (wrap around)
immediately following the journal header. Block list headers or
the contents of blocks may wrap around in this way. Only a
portion of the journal buffer is active at any given time; this
portion is indicated by the start and
end fields of the journal header. The part of the
journal buffer that is not active contains no meaningful data,
and must be ignored.
To prevent ambiguity when start equals
end , the journal is never allowed to be perfectly
full (all of the journal buffer used by block lists and blocks).
If the journal was perfectly full, and start was
not equal to jhdr_size , then end would
be equal to start . You would then be unable to
differentiate between an empty and full journal.
When the journal is not empty (contains transactions),
it must be replayed to be sure the volume
is consistent. That is, the data from each of the transactions must be
written to the correct blocks on disk.
Journal Info Block
The journal info block describes where the journal header and
journal buffer are stored. The journal info block is stored at the
beginning of the allocation block whose number is stored in the
journalInfoBlock field of the
volume header. The journal info block
is described by the data type JournalInfoBlock .
struct JournalInfoBlock {
UInt32 flags;
UInt32 device_signature[8];
UInt64 offset;
UInt64 size;
UInt32 reserved[32];
};
typedef struct JournalInfoBlock JournalInfoBlock;
|
The fields have the following meaning:
flags
- Contains a set of one-bit flags. The currently
defined bits are described below.
An implementation must treat the undefined bits as
reserved.
device_signature
- This space is reserved
for describing the device containing the journal when the
journal is not stored in the volume itself (when
kJIJournalOnOtherDeviceMask is set).
offset
- The offset in bytes from the start of the device to
the start of the journal header. When the journal is stored
in the volume itself (
kJIJournalInFSMask
is set), this offset is relative to the start of the volume.
size
- The size in bytes of the journal, including the journal
header and the journal buffer. This size does not include
the journal info block.
reserved
- This space is reserved.
The following constants define bit flags in the flags field:
enum {
kJIJournalInFSMask = 0x00000001,
kJIJournalOnOtherDeviceMask = 0x00000002,
kJIJournalNeedInitMask = 0x00000004
};
|
The values have the following meaning:
kJIJournalInFSMask
- When set, the space for the journal header and
transactions resides inside the volume being journaled.
The
offset field of the journal info block
is relative to the start of the volume (allocation
block number zero).
kJIJournalOnOtherDeviceMask
- When set, the space for the journal header and
journal buffer does not reside inside the volume being
journaled. The
device_signature field
in the journal info block describes the device containing
the journal.
kJIJournalNeedInitMask
- This bit is set to indicate that the journal header is invalid
and needs to be initialized. This bit is typically set when the
journal is first created, and the space has been allocated; the first
mount of the journaled volume typically initializes the journal header
and clears this bit. When this bit is set, there are no valid transactions
in the journal.
Note:
Implementations must currently set
kJIJournalInFSMask , but not
kJIJournalOnOtherDeviceMask . Journals stored on a
separate device are not currently supported. The format of the
device_signature field is not yet defined.
|
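Putting the pieces together, here is a minimal sketch of locating the journal, assuming an in-volume journal (kJIJournalInFSMask set), the volume header and JournalInfoBlock declarations used in this note, and fields already byte-swapped to host order.
/* Byte offset, from the start of the volume, of the journal info block. */
static UInt64 JournalInfoBlockOffset(const HFSPlusVolumeHeader *vhb)
{
    return (UInt64)vhb->journalInfoBlock * vhb->blockSize;
}

/* Byte offset, from the start of the volume, of the journal header.
   Valid only when kJIJournalInFSMask is set (the offset field is then
   relative to the start of the volume). */
static UInt64 JournalHeaderOffset(const JournalInfoBlock *jib)
{
    return jib->offset;
}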
Journal Header
The journal begins with a journal header, whose main purpose
is to describe the location of transactions in the journal
buffer. The journal header is stored using the
journal_header data type.
typedef struct journal_header {
UInt32 magic;
UInt32 endian;
UInt64 start;
UInt64 end;
UInt64 size;
UInt32 blhdr_size;
UInt32 checksum;
UInt32 jhdr_size;
} journal_header;
#define JOURNAL_HEADER_MAGIC 0x4a4e4c78
#define ENDIAN_MAGIC 0x12345678
|
The fields have the following meaning:
magic
- Contains the value
JOURNAL_HEADER_MAGIC (0x4a4e4c78).
This is used to verify the integrity of the journal header.
endian
- Contains the value
ENDIAN_MAGIC (0x12345678).
This is used to verify the integrity of the journal header.
start
- Contains the offset in bytes from the start of the journal
header to the start of the first (oldest) transaction.
end
- Contains the offset in bytes from the start of the journal
header to the end of the last (newest) transaction. Note
that this field may be less than the
start field,
indicating that the transactions wrap around the end of the
journal's circular buffer. If end equals
start , then the journal is empty, and there are
no transactions that need to be replayed.
size
- The size of the journal, in bytes. This includes the journal
header and the journal buffer. This value must
be equal to the value in the
size field of the
journal info block.
blhdr_size
- The size of one block list header,
in bytes. This value typically ranges from 4096 to 16384.
checksum
- The checksum of the journal header, computed as described
below.
jhdr_size
- The size of the journal header, in bytes. The journal header
always occupies exactly one sector so that it
can be updated atomically. Therefore, this value is equal to the
sector size (for example, 2048 on many types of optical
media).
Block List Header
The block list header describes a list of blocks
included in a transaction. A transaction may include several
block lists if it modifies more blocks than can be represented
in a single block list. The block list header is stored in a
structure of type block_list_header .
typedef struct block_list_header {
UInt16 max_blocks;
UInt16 num_blocks;
UInt32 bytes_used;
UInt32 checksum;
UInt32 pad;
block_info binfo[1];
} block_list_header;
|
The fields have the following meaning:
max_blocks
- The maximum number of blocks (
block_info
items) this block list can describe. This field is used
while in memory to keep track of journal buffer sizes. On
disk, this field is reserved.
num_blocks
- The number of elements in the
binfo array. Since
the first element of the binfo array is used to
chain multiple block lists into a single transaction, the
actual number of data blocks is num_blocks - 1 .
bytes_used
- The number of bytes occupied in the journal for this block list,
including the block list header and the data for each of the blocks
in the list. The next block list header (if any) will be
bytes_used bytes from the start of the current block
list header, wrapping from the end of the journal buffer to the
start of the journal buffer if needed.
checksum
- The checksum of the block list header,
including the first element of the
binfo array
(a total of 32 bytes).
pad
- Alignment padding. An implementation must treat this
field as reserved.
binfo
- A variable-sized array of block_info structures. The array contains
num_blocks entries. The first entry is used
when a single transaction occupies multiple block lists,
using the next field as described above. The
remaining num_blocks - 1 entries describe where the
data from this block list will be written to disk.
The first element of the binfo array is
used to indicate whether the transaction contains additional
block lists. Each of the other elements of the
binfo array represents a single block of data
in the journal buffer which must be copied to its correct
location on disk. The fields have the following meaning:
typedef struct block_info {
UInt64 bnum;
UInt32 bsize;
UInt32 next;
} block_info;
|
bnum
- The sector number where the data in this block
must be written. If this field is 0xFFFFFFFFFFFFFFFF (all
64 bits set), then this block must be skipped and not
written. This field is reserved for the first element
of the
binfo array.
bsize
- The number of bytes to be copied from the journal buffer
to the above sector number. This value will be a
multiple of 512. This field is reserved for the first element
of the
binfo array.
next
- This field is used while in memory to keep track of
transactions that span multiple block lists. If this field
is zero in the first
block_info of a block
list, then the transaction ends with this block list;
otherwise, the transaction has one or more additional block
lists. This field is meaningful only for the first element
of the block list array. The actual on-disk value has no
meaning beyond testing for zero or non-zero.
Journal Checksums
The journal header and
block list header both contain
checksum fields. These checksums can be verified as part of a
basic consistency check of these structures. To verify the
checksum, temporarily set the checksum field to zero and then call
the calc_checksum routine with the address and size of the
header being checksummed. The function result should equal the
original value of the checksum field.
static int
calc_checksum(unsigned char *ptr, int len)
{
int i, cksum=0;
for(i=0; i < len; i++, ptr++) {
cksum = (cksum << 8) ^ (cksum + *ptr);
}
return (~cksum);
}
|
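For example, a block list header's checksum might be verified as in the sketch below. It assumes the structure is still in the representation in which the checksum was originally computed (byte-swapping the fields first would change the result), and that the 32 checksummed bytes cover the header plus binfo[0] as described above.
/* Sketch: verify a block list header's checksum.  The checksum covers
   the first 32 bytes of the block list header (which include
   binfo[0]), computed with the checksum field temporarily set to
   zero.  Returns 1 if the stored checksum matches. */
static int VerifyBlockListChecksum(block_list_header *blhdr)
{
    UInt32 saved = blhdr->checksum;
    int    ok;

    blhdr->checksum = 0;
    ok = (calc_checksum((unsigned char *)blhdr, 32) == (int)saved);
    blhdr->checksum = saved;
    return ok;
}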
Replaying the Journal
In order to replay the journal, an implementation just loops
over the transactions, copying each individual block in the
transaction from the journal to its proper location on the
volume. Once those blocks have been flushed to the media (not
just the driver!), it may update the journal header to remove
the transactions.
Note:
Replaying the journal does not guarantee that the temporary files
associated with open but unlinked
files are deleted. After replaying the journal, these temporary
files may be deleted.
|
Here are the steps to replay the journal:
- Read the volume header
into variable
vhb . The volume may have an
HFS wrapper; if so, you will need to use
it to determine the location of the volume header.
- Test the
kHFSVolumeJournaledBit in the
attributes field of the volume header. If
it is not set, there is no journal to replay, and you
are done.
- Read the journal info block
from the allocation block number
vhb.journalInfoBlock ,
into variable jib .
- If
kJIJournalNeedInitMask is set in jib.flags ,
the journal was never initialized, so there is no journal to replay.
- Verify that
kJIJournalInFSMask is set and
kJIJournalOnOtherDeviceMask is clear in
jib.flags .
- Read the journal header at
jib.offset bytes from the start of the volume, and
place it in variable jhdr .
- If
jhdr.start equals jhdr.end , the
journal does not have any transactions, so there is nothing
to replay.
- Set the current offset in the journal (typically a local variable)
to the offset of the first (oldest) transaction,
jhdr.start .
- While
jhdr.start does not equal jhdr.end ,
perform the following steps:
- Read a block list header of
jhdr.blhdr_size bytes from the current offset
in the journal into variable blhdr .
- For each block in
blhdr.binfo[1] to
blhdr.binfo[blhdr.num_blocks - 1] , inclusive, copy
bsize bytes from the current offset in the
journal to sector bnum on the volume (to byte
offset bnum*jhdr.jhdr_size ). Remember that
jhdr_size is the size of a sector, in bytes.
- If
blhdr.binfo[0].next is zero, you have completed
the last block list of the current transaction; set
jhdr.start to the current offset in the journal.
Note:
Remember that the journal is a circular buffer. When reading a
block list header or block from
the journal buffer (in the loop described above), you will need
to check whether it wraps around the end of the journal buffer.
If it would extend beyond the end of the journal buffer, you
must stop reading at the end of the journal buffer, and resume
reading at the start of the journal buffer (offset
jhdr.jhdr_size bytes from the start of the
journal).
|
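The wrap-around handling described in the note above can be sketched as follows. ReadJournalBytes is a hypothetical device-read helper, the journal_header is assumed to be in host byte order, and sector alignment of the individual reads is ignored for clarity.
/* ReadJournalBytes is a hypothetical helper that reads 'length' bytes
   located 'offset' bytes from the start of the journal header. */
extern void ReadJournalBytes(UInt64 offset, void *buffer, UInt32 length);

/* Read 'len' bytes from the journal's circular buffer, starting at
   'offset' bytes from the start of the journal header, wrapping from
   the end of the journal back to just after the journal header.
   Returns the offset immediately following the last byte read. */
static UInt64 ReadCircular(const journal_header *jhdr,
                           UInt64 offset, void *buf, UInt32 len)
{
    UInt8 *dst = buf;

    while (len > 0) {
        UInt64 avail = jhdr->size - offset;       /* bytes left before the end */
        UInt32 chunk = (len < avail) ? len : (UInt32)avail;

        ReadJournalBytes(offset, dst, chunk);
        dst    += chunk;
        len    -= chunk;
        offset += chunk;

        if (offset >= jhdr->size)                 /* wrap around... */
            offset = jhdr->jhdr_size;             /* ...to just past the header */
    }
    return offset;
}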
After replaying an entire transaction (all blocks
in a block list, when blhdr.binfo[0].next is zero), or after
replaying all transactions, you may update the value of the start
field in the journal header to the current
offset in the journal. This will remove those block lists from the
journal since they have been written to their correct locations on disk.
WARNING:
You must ensure that previous block writes complete before updating
the journal header's start field on disk. One way to
do this is to issue a flush to the device driver and wait until
the device driver has written all dirty blocks, and then flush the
device itself and wait for the device to write all dirty blocks
to the media.
|
Back to top
HFSX
HFSX is an extension to HFS Plus to allow additional features
that are incompatible with HFS Plus. The only such feature currently
defined is case-sensitive filenames.
HFSX volumes have a signature of 'HX'
(0x4858 ) in the signature field of the
volume header. The version
field identifies the version of HFSX used on the volume; the only
value currently defined is 5 . If features are added that
would be incompatible with older versions (that is, older versions
cannot safely access or modify the volume because of the new features),
then a different version number will be used.
Note:
A new signature was required because some utilities
did not use the version field properly. They
would attempt to use or repair the volume (including changing the
version field) when they encountered a
version value that was not previously documented.
|
WARNING:
If your implementation encounters an HFSX volume with a
version value it does not recognize, it must
not attempt to access or repair the volume. Catastrophic
data loss may result. In particular, do NOT change the
version field.
|
It is intended that future HFSX features will result in the definition
of new volume attribute bits, and that
those bits will be used to indicate which features are in use on the
volume.
An HFSX volume never has an HFS wrapper.
In an Apple partition map, the partition type (PMPartType )
of an HFSX volume is set to "Apple_HFSX".
HFSX Version 5
Introduced in Mac OS X 10.3, HFSX version 5 allows volumes with
case-sensitive file and directory names. Case-sensitive names
means that you can have two objects, whose names differ only by
the case of the letters, in the same directory at the same time.
For example, you could have "Bob", "BOB", and "bob" in the same
directory.
An HFSX volume may be either case-sensitive or case-insensitive.
Case sensitivity (or lack thereof) is global to the volume; the
setting applies to all file and directory names on the volume.
To determine whether an HFSX volume is case-sensitive, use the
keyCompareType field of the
B-tree header of the catalog file.
A value of kHFSBinaryCompare means the volume is
case-sensitive. A value of kHFSCaseFolding means the
volume is case-insensitive.
Note:
Do not assume that an HFSX volume is case-sensitive.
Always use the keyCompareType to determine
case-sensitivity or case-insensitivity.
|
A case-insensitive HFSX volume (one whose keyCompareType
is kHFSCaseFolding ) uses the same
Unicode string comparison algorithm as HFS Plus.
A case-sensitive HFSX volume (one whose keyCompareType
is kHFSBinaryCompare ) simply compares each character of
the name as an unsigned 16-bit integer. The first character (the one
with the smallest offset from the start of the string) that is
different determines the relative order. The string with the
numerically smaller character value is ordered before the string
with the larger character value. For example, the string "Bob"
would sort before the string "apple" because the code for the
character "B" (decimal 66) is less than the code for the character
"a" (decimal 97).
IMPORTANT:
Case-sensitive names do not ignore Unicode "ignorable"
characters. This means that a single directory may contain several
names which would be considered equivalent using Unicode comparison
rules, but which are considered distinct on a case-sensitive
HFSX volume.
|
Note:
The null character (0x0000 ), as used in the name
of the "HFS+ Private Data" directory used by
hard links, sorts first with
case-sensitive compares, but last with case-insensitive
compares.
|
Back to top
Metadata Zone
Mac OS X version 10.3 introduced a new policy for determining
where to allocate space for files, which improves performance
for most users. This policy places the volume metadata and
frequently used small files ("hot files")
near each other on disk, which reduces the seek time for typical
accesses. This area on disk is known as the metadata
zone.
The volume metadata are the structures that let the file system
manage the contents of the volume. This includes the
allocation bitmap file, the
extents overflow file,
the catalog file, and the
journal file. The
volume header and alternate volume
header are also metadata, but they have fixed locations within
the volume, so they are not located in the hot file area. Mac OS
X may use a quota users file and quota groups file to manage disk
space quotas on a volume. These files aren't strictly metadata,
but they are included in the metadata zone because of their
heavy use by the OS and they are too large to be considered
ordinary hot files.
Implementations are encouraged not to interfere with the metadata
zone policy. For example, a disk optimizer should avoid moving files
into the metadata zone unless those files are known to be
frequently accessed, in which case they may be added to the
"hot file" list. Similarly, files in the
metadata zone should not be moved elsewhere on disk unless they
are also removed from the hot file list.
This policy is only applied to volumes whose size is at least
10GB, and which have journaling enabled.
The metadata zone is established when the volume is mounted. The
size of the zone is based upon the following sizes:
Item |
Contribution to the Metadata Zone size |
Allocation Bitmap File |
Physical size (totalBlocks times the volume's
allocation block size) of the allocation bitmap file. |
Extents Overflow File |
4MB, plus 4MB per 100GB (up to 128MB maximum) |
Journal File |
8MB, plus 8MB per 100GB (up to 512MB maximum) |
Catalog File |
10 bytes per KB (1GB minimum) |
Hot Files |
5 bytes per KB (10MB minimum; 512MB maximum) |
Quota Users File |
Described below |
Quota Groups File |
Described below |
In Mac OS X version 10.3, the amount of space reserved for the
allocation file is actually the minimum allocation file size for
the volume (the total number of allocation blocks, divided by 8,
rounded up to a multiple of the allocation block size). If the
allocation file is larger than that (which is sometimes done to
allow a volume to be more easily grown at a later time), then
there will be less space available for other metadata or
hot files in the metadata zone. This
is a bug (r. 3522516).
The amount of space reserved for each type of metadata (except for
the allocation bitmap file) is based on the total size of the volume.
For the purposes of these computations, the total size of the volume is
the allocation block size multiplied by the total number of allocation blocks.
The sizes reserved for quota users and groups files are the result of
complex calculations. In each case, the size reserved is a value of
the form (items + 1) * 64 bytes, where items
is based on the size of the volume in gigabytes, rounded down. For the
quota users file, items is 256 per gigabyte, rounded up to
a power of 2, with a minimum of 2048, and a maximum of 2097152 (2M).
For the quota groups file, items is 32 per gigabyte,
rounded up to a power of 2, with a minimum of 2048, and a maximum of
262144 (256K). The quota files are considered hot files, and occupy
the hot file area, even though they are larger
than the maximum file size normally eligible to be a hot file.
The total size of the metadata zone is the sum of the above sizes,
rounded up so that the metadata zone is represented by a whole number
of allocation blocks within the volume bitmap. That is, the start and
end of the metadata zone fall on allocation block boundaries in the
volume bitmap. That means that the size of the metadata zone is rounded
up to a multiple of 8 times the square of the allocation block size.
In Mac OS X version 10.3, the extra space due to the round up of the
metadata zone is split up between the catalog and the
hot file area (2/3 and 1/3, respectively).
The calculations for the extents overflow file and journal file divide
the total size of the volume by 100GB, rounding down. Then they add one
(to compensate for any remainder lost as part of the rounding). The result
is then multiplied by 4MB or 8MB, respectively. If the volume's total
size is not a multiple of 100GB, this is equivalent to 4MB (or 8MB) per 100GB,
rounded up.
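As a concrete (non-normative) illustration of that calculation, the extents overflow and journal contributions might be computed as sketched below, where volumeSize is the allocation block size multiplied by the total number of allocation blocks. The use of binary megabytes/gigabytes and the application of the maximum to the final result are assumptions of this sketch.
#define MEGABYTE   (1024ULL * 1024)
#define GIGABYTE   (1024ULL * 1024 * 1024)

/* Extents overflow file reservation: 4MB, plus 4MB per 100GB,
   up to a 128MB maximum. */
static UInt64 ExtentsOverflowReservation(UInt64 volumeSize)
{
    UInt64 size = 4 * MEGABYTE * (volumeSize / (100 * GIGABYTE) + 1);
    return (size > 128 * MEGABYTE) ? 128 * MEGABYTE : size;
}

/* Journal file reservation: 8MB, plus 8MB per 100GB,
   up to a 512MB maximum. */
static UInt64 JournalReservation(UInt64 volumeSize)
{
    UInt64 size = 8 * MEGABYTE * (volumeSize / (100 * GIGABYTE) + 1);
    return (size > 512 * MEGABYTE) ? 512 * MEGABYTE : size;
}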
In Mac OS X version 10.3, the metadata zone is located at the
start of the volume, following the
volume header. The
hot file area is located towards the end of
the metadata zone.
When performing normal file allocations, the allocator will
skip over the metadata zone. This ensures that the metadata will be
less fragmented, and all of the metadata will be located in the
same area on the disk. If the area outside the metadata zone is
exhausted, the allocator will then use space inside the metadata
zone for normal file allocations. Similarly, when allocating
space for metadata, the allocator will use space inside the
metadata zone first. If all of the metadata zone is in use,
then metadata allocations will use space outside the metadata
zone.
Hot Files
Most files on a disk are rarely, if ever, accessed. Most
frequently accessed (hot) files are small. To improve
performance of these small, frequently accessed files, they are
moved near the volume's metadata, into the metadata zone. This
reduces seek times for most accesses. As files are moved into
the metadata zone, they are also defragmented (allocated in a
single extent), which further improves performance. This
process is known as adaptive hot file clustering.
The relative importance of a
frequently used (hot) file is called its temperature.
Files with the hottest (largest) temperatures are the ones
actually moved into the metadata zone. In Mac OS X version 10.3,
a file's temperature is computed as the number of bytes read
from the file during the recording period divided by the file's
size in bytes. This is a measure of how often the file is
read.
This section describes the on-disk structures used for
tracking hot files. The algorithms used at run time are subject
to change, and are not documented here.
Migration of files into or out of the hot file area of the metadata zone is a
gradual process, based upon the user's actual file access
patterns. The migration happens in several phases:
- Recording
- Watch file accesses to determine which files are used most
- Evaluation
- Merge recently used hot files with previously found hot files
- Eviction
- Move older and less frequently used hot files out of metadata zone
to make room for newer, hotter files
- Adoption
- Move newer and hotter files into the metadata zone
Hot File B-Tree
A B-Tree is used to keep track of the
files that currently occupy the hot file area of the metadata zone. The hot file B-tree is an
ordinary file on the volume (that is, it has records in the catalog). It is a file named
".hotfiles.btree " in the root directory. To avoid
accidental manipulation of this file, the
kIsInvisible and kNameLocked bits in
the finderFlags field of the Finder info should be set.
The node size of the hot file B-tree is at least 512 bytes,
and is typically the same as the volume's allocation block
size. Like other B-trees on an HFS Plus volume, the key length
field is 16 bits, and kBTBigKeysMask is set in the
B-tree header's attributes . The
btreeType in the header
record must be set to kUserBTreeType .
The B-tree's user data record
contains information about hot file recording. The format of the user
data is described by the HotFilesInfo structure:
#define HFC_MAGIC 0xFF28FF26
#define HFC_VERSION 1
#define HFC_DEFAULT_DURATION (3600 * 60)
#define HFC_MINIMUM_TEMPERATURE 16
#define HFC_MAXIMUM_FILESIZE (10 * 1024 * 1024)
char hfc_tag[] = "CLUSTERED HOT FILES B-TREE ";
struct HotFilesInfo {
UInt32 magic;
UInt32 version;
UInt32 duration; /* duration of sample period */
UInt32 timebase; /* recording period start time */
UInt32 timeleft; /* recording period stop time */
UInt32 threshold;
UInt32 maxfileblks;
UInt32 maxfilecnt;
UInt8 tag[32];
};
typedef struct HotFilesInfo HotFilesInfo;
|
The fields have the following meaning:
magic
- Must contain the value
HFC_MAGIC (0xFF28FF26).
version
- Contains the version of the
HotFilesInfo
structure. Version 1 of the structure is described here.
If your implementation encounters any other version number,
it should not read or modify the hot file B-tree.
duration
- Contains the duration of the current recording phase, in seconds.
In Mac OS X 10.3, this value is typically
HFC_DEFAULT_DURATION
(60 hours).
timebase
- Contains the time that the current recording phase began, in seconds
since Jan 1, 1970 GMT.
timeleft
- Contains the time remaining in the current recording phase, in seconds.
threshold
- Contains the minimum temperature for a file to be eligible to be
moved into the hot file area. Files whose temperature is less than
this value will be moved out of the hot file area.
maxfileblks
- Contains the maximum file size, in allocation blocks, for a
file to be eligible to be moved into the hot file area. Files
larger than this size will not be moved into the hot file area.
In Mac OS X 10.3, this value is typically
HFC_MAXIMUM_FILESIZE divided by the volume's
allocation block size.
maxfilecnt
- Contains the maximum number of files to place into the hot file
area. Note that the hot file area may actually contain more than
this number of files, especially if they previously existed in the
hot file area before the beginning of the recording phase. This number
represents the number of files that the hot file recording code
intends to track and eventually place into the hot file area.
tag
- Contains the null-terminated (C-style) string of ASCII text
"CLUSTERED HOT FILES B-TREE " (not including the
quotes). Note that the last six bytes are five spaces and the
null (zero) byte. This field exists to make it easier to recognize
the hot file B-tree when debugging or using a disk editor. An
implementation should not attempt to verify or change this field.
Hot File Record Key
A key in the hot file B-tree is of type HotFileKey .
struct HotFileKey {
UInt16 keyLength;
UInt8 forkType;
UInt8 pad;
UInt32 temperature;
UInt32 fileID;
};
typedef struct HotFileKey HotFileKey;
#define HFC_LOOKUPTAG 0xFFFFFFFF
#define HFC_KEYLENGTH (sizeof(HotFileKey) - sizeof(UInt32))
|
The fields have the following meaning:
keyLength
- The length of a hot file key, not including the
keyLength
field itself. Hot file keys are of fixed size. This field must contain
the value 10.
forkType
- Indicates whether the fork being tracked is a data fork
(value
0x00 ) or a resource fork (value 0xFF ).
In Mac OS X version 10.3, only data forks are eligible for placement
into the hot file area.
pad
- An implementation must treat this as a
pad field.
temperature
- The fork's temperature. For hot file
thread records, this field contains the value
HFC_LOOKUPTAG
(0xFFFFFFFF ).
fileID
- The catalog node ID of the file being tracked.
Hot file keys are compared first by temperature ,
then fileID , and lastly by forkType .
All of these comparisons are unsigned.
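A sketch of that key comparison, assuming both keys are already in host byte order:
/* Compare two hot file B-tree keys: first by temperature, then by
   fileID, then by forkType, all as unsigned values.  Returns <0, 0,
   or >0 as key1 orders before, equal to, or after key2. */
static int CompareHotFileKeys(const HotFileKey *key1, const HotFileKey *key2)
{
    if (key1->temperature != key2->temperature)
        return (key1->temperature < key2->temperature) ? -1 : 1;
    if (key1->fileID != key2->fileID)
        return (key1->fileID < key2->fileID) ? -1 : 1;
    if (key1->forkType != key2->forkType)
        return (key1->forkType < key2->forkType) ? -1 : 1;
    return 0;
}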
Hot File Records
Much like the catalog file,
the hot file B-tree stores two kinds of records: hot file records and thread
records. Every fork in the hot file area has both a hot file record and
a thread record in the hot file B-tree. Hot file records are used to find
hot files based on their temperature. Thread records are used to find
hot files based on their catalog node ID and fork type.
Thread records in the hot file B-tree use a special value
(HFC_LOOKUPTAG ) in the temperature
field of the key. The data for a thread record is the
temperature of that fork, stored as a
UInt32 . So, given a catalog node
ID and fork type, it is possible to construct a key for the
fork's thread record. If a thread record exists, you can get
the temperature from the thread's data to construct the key for
the hot file record. If a thread record does not exist, then
the fork is not being tracked as a hot file.
Hot file records use all of the key fields as described
above. The data for a hot file record is 4 bytes. The data in a
hot file record is not meaningful. To aid in debugging, Mac OS X
version 10.3 typically stores the first four bytes of the file
name (encoded in UTF-8), or the ASCII text "????".
When an implementation changes a hot file's temperature, the
old hot file record must be removed, a new hot file record with the new
temperature must be inserted, and the thread record's data must
be changed to contain the new temperature.
Recording Hot File Temperatures
The recording phase gathers information about file usage over time.
In order to gather useful statistics, the recording phase may last longer
than the duration of a single mount. Therefore, information about file
usage is stored on disk so that it can accumulate over time.
The clumpSize field of the
fork data structure is used to record the amount of data actually
read from a fork. Since the field is only 32 bits long, it stores
the number of allocation blocks read from the file. The fork's
temperature can be computed by dividing its clumpSize by
its totalBlocks .
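In other words, a fork's temperature during recording can be derived from its own fork data structure, as in the sketch below; the clumpSize field is the overloaded value described above, and the division is an integer approximation.
/* Compute a fork's temperature from its fork data structure: the
   number of allocation blocks read during the recording period
   (stored in clumpSize) divided by the fork's size in allocation
   blocks. */
static UInt32 ComputeTemperature(const HFSPlusForkData *fork)
{
    if (fork->totalBlocks == 0)
        return 0;               /* empty fork: nothing to measure */
    return fork->clumpSize / fork->totalBlocks;
}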
Back to top
Unicode Subtleties
HFS Plus makes heavy use of Unicode strings to store file
and folder names. However, Unicode is still evolving, and
its use within a file system presents a number of
challenges. This section describes some of the challenges,
along with the solutions used by HFS Plus.
IMPORTANT:
Before reading this section, you should read
HFS Plus Names.
|
IMPORTANT:
An implementation must not use the Unicode utilities
implemented by its native platform (for decomposition and
comparison), unless those algorithms are equivalent to the
HFS Plus algorithms defined here, and are guaranteed to be
so forever. This is rarely the case. Platform algorithms
tend to evolve with the Unicode standard. The HFS Plus
algorithms cannot evolve because such evolution would
invalidate existing HFS Plus volumes.
|
Note:
The Mac OS Text Encoding Converter provides several
constants that let you convert to and from the canonical,
decomposed form stored on HFS Plus volumes. When using
CreateTextEncoding to create a text encoding,
you should set the TextEncodingBase to
kTextEncodingUnicodeV2_0 , set the
TextEncodingVariant to
kUnicodeCanonicalDecompVariant , and set the
TextEncodingFormat to
kUnicode16BitFormat . Using these values ensures
that the Unicode will be in the same form as on an HFS Plus
volume, even as the Unicode standard evolves.
|
Canonical Decomposition
Unicode allows some sequences of characters to be
represented by multiple, equivalent forms. For example, the
character ""
can be represented as the single Unicode character
u+00E9 (latin small letter e with acute), or as
the two Unicode characters u+0065 and u+0301 (the letter "e"
plus a combining acute symbol).
To reduce complexity in the B-tree key comparison
routines (which have to compare Unicode strings), HFS Plus
defines that Unicode strings will be stored in fully
decomposed form, with composing characters stored in
canonical order. The other equivalent forms are illegal in
HFS Plus strings. An implementation must convert these
equivalent forms to the fully decomposed form before storing
the string on disk.
The Unicode Decomposition
table contains a list of characters that are illegal as part
of an HFS Plus string, and the equivalent character(s) that
must be used instead. Any character appearing in a column
titled "Illegal", must be replaced by the character(s) in
the column immediately to the right (titled "Replace With").
Note:
Mac OS versions 8.1 through 10.2.x used decompositions based on
Unicode 2.1. Mac OS X version 10.3 and later use decompositions
based on Unicode 3.2. Most of the characters whose decomposition
changed are not used by any Mac encoding, so they are unlikely to
occur on an HFS Plus volume. The MacGreek encoding had the largest
number of decomposition changes.
The Unicode decomposition table mentioned above indicates
which decompositions were added, removed, or changed between
Unicode 2.1 and Unicode 3.2.
|
In addition, the Korean Hangul characters with codes in
the range u+AC00 through u+D7A3 are illegal and must be
replaced with the equivalent sequence of conjoining jamos,
as described in the Unicode 2.0 book, section 3.10.
IMPORTANT:
The characters with codes in the range u+2000
through u+2FFF are punctuation, symbols, dingbats,
arrows, box drawing, etc. The u+24xx block, for
example, has single characters for things like
"(a)". The characters in this range are
not fully decomposed; they are left
unchanged in HFS Plus strings. This allows strings
in Mac OS encodings to be converted to Unicode and
back without loss of information. This is not
unnatural since a user would not necessarily expect
a dingbat "(a)" to be equivalent to the three
character sequence "(", "a", ")" in a file name.
The characters in the range u+F900 through u+FAFF
are CJK compatibility ideographs, and are not
decomposed in HFS Plus strings.
|
So, for the example given earlier, "é"
must be stored as the two Unicode characters u+0065 and
u+0301 (in that order). The Unicode character u+00E9 may not
appear in a Unicode string used as part of an HFS Plus
B-tree key.
Case-Insensitive
String Comparison Algorithm
In HFS Plus and case-insensitive HFSX,
strings must be compared in a case-insensitive fashion. The
Unicode standard does not strictly define upper and lower
case equivalence, although it does suggest some equivalences.
The HFS Plus string comparison algorithm (defined below)
includes a concrete case equivalence definition. An
implementation must use the equivalence expressed by this
algorithm.
Furthermore, Unicode requires that certain formatting
characters be ignored (skipped over) during string
comparisons. The algorithm and tables used for case
equivalence also arrange to ignore these characters. An
implementation must ignore the characters that are ignored
by this algorithm.
Note:
Case-sensitive HFSX volumes do
not ignore the Unicode ignorable characters.
Those characters are significant for the purposes of
name comparison on case-sensitive HFSX.
|
The HFS Plus case-insensitive string
comparison algorithm is defined by the FastUnicodeCompare
routine, shown below. This routine returns a value that
tells the caller how the strings are ordered relative to
each other: whether the first string is less than, equal to,
or greater than the second string. An HFS Plus implementation
may use this routine directly, or use another routine that
produces the same relative ordering.
Note:
The FastUnicodeCompare routine does not handle
composed Unicode characters since they are illegal in HFS
Plus strings. As described in
Canonical
Decomposition, all HFS Plus strings must be fully
decomposed, with composing characters in canonical order.
|
/*
FastUnicodeCompare - Compare two Unicode strings;
produce a relative ordering
IF RESULT
--------------------------
str1 < str2 => -1
str1 = str2 => 0
str1 > str2 => +1
The lower case table starts with 256 entries (one for
each of the upper bytes of the original Unicode char).
If that entry is zero, then all characters with that
upper byte are already case folded. If the entry is
non-zero, then it is the _index_ (not byte offset) of
the start of the sub-table for the characters with
that upper byte. All ignorable characters are folded
to the value zero.
In pseudocode:
Let c = source Unicode character
Let table[] = lower case table
lower = table[highbyte(c)]
if (lower == 0)
lower = c
else
lower = table[lower+lowbyte(c)]
if (lower == 0)
ignore this character
To handle ignorable characters, we now need a loop to
find the next valid character. Also, we can't pre-compute
the number of characters to compare; the string length
might be larger than the number of non-ignorable characters.
Further, we must be able to handle ignorable characters at
any point in the string, including as the first or last
characters. We use a zero value as a sentinel to detect
both end-of-string and ignorable characters. Since the
File Manager doesn't prevent the NULL character (value
zero) as part of a file name, the case mapping table is
assumed to map u+0000 to some non-zero value (like 0xFFFF,
which is an invalid Unicode character).
Pseudocode:
while (1) {
c1 = GetNextValidChar(str1) -- returns zero if
-- at end of string
c2 = GetNextValidChar(str2)
if (c1 != c2) break; -- found a difference
if (c1 == 0) -- reached end of string on
-- both strings at once?
return 0; -- yes, so strings are equal
}
-- When we get here, c1 != c2. So, we just
-- need to determine which one is less.
if (c1 < c2)
return -1;
else
return 1;
*/
SInt32 FastUnicodeCompare (
register ConstUniCharArrayPtr str1,
register ItemCount length1,
register ConstUniCharArrayPtr str2,
register ItemCount length2) {
register UInt16 c1,c2;
register UInt16 temp;
register UInt16* lowerCaseTable;
lowerCaseTable = gLowerCaseTable;
while (1) {
/* Set default values for c1, c2 in
case there are no more valid chars */
c1 = 0;
c2 = 0;
/* Find next non-ignorable char from
str1, or zero if no more */
while (length1 && c1 == 0) {
c1 = *(str1++);
--length1;
/* is there a subtable for
this upper byte? */
if ((temp = lowerCaseTable[c1>>8]) != 0)
/* yes, so fold the char */
c1 = lowerCaseTable[temp + (c1 & 0x00FF)];
}
/* Find next non-ignorable char
from str2, or zero if no more */
while (length2 && c2 == 0) {
c2 = *(str2++);
--length2;
/* is there a subtable
for this upper byte? */
if ((temp = lowerCaseTable[c2>>8]) != 0)
/* yes, so fold the char */
c2 = lowerCaseTable[temp + (c2 & 0x00FF)];
}
/* found a difference, so stop looping */
if (c1 != c2)
break;
/* did we reach the end of
both strings at the same time? */
if (c1 == 0)
/* yes, so strings are equal */
return 0;
}
if (c1 < c2)
return -1;
else
return 1;
}
/* The lower case table consists of a 256-entry high-byte
table followed by some number of 256-entry subtables. The
high-byte table contains either an offset to the subtable
for characters with that high byte or zero, which means
that there are no case mappings or ignored characters in
that block. Ignored characters are mapped to zero. */
UInt16 gLowerCaseTable[] = {
/* High-byte indices ( == 0 if no case mapping and
no ignorables ) Full data tables omitted for brevity.
See the Downloadables section for URL to download
the code. */
};
|
Back to top
HFS Wrapper
An HFS Plus volume may be contained within an HFS volume
in a way that makes the volume look like an HFS volume to
systems without HFS Plus support. This has two important
advantages:
- It allows a computer with HFS (but no HFS Plus)
support in ROM to start up from an HFS Plus volume. When
creating the wrapper, Mac OS includes a System file
containing the minimum code to locate and mount the
embedded HFS Plus volume and continue booting from its
System file.
- It improves the user experience when an HFS Plus
volume is inserted in a computer that has HFS support but
no HFS Plus support. On such a computer, the HFS wrapper
will be mounted as a volume, which prevents error dialogs
that might confuse the user into thinking the volume is
empty, damaged, or unreadable. The HFS wrapper may also
contain a Read Me document to explain the steps the user
should take to access their files.
The rest of this section describes how the HFS wrapper is
laid out and how the HFS Plus volume is embedded within the
wrapper.
IMPORTANT:
This section does not describe the HFS Plus volume format;
instead, it describes additions to the HFS volume format
that allow an HFS Plus volume (or some other volume) to be
embedded in an HFS volume. However, as all Mac OS volumes
are formatted with an HFS wrapper, all implementations
should be able to parse the wrapper to find the embedded HFS
Plus volume.
|
Note:
An HFS Plus volume is not required to have an HFS wrapper.
In that case, the volume will start at the beginning of
the disk, and the volume header will be at offset 1024 bytes.
However, Apple software currently initializes all HFS Plus
volumes with an HFS wrapper.
|
HFS Master Directory Block
An HFS volume always contains a Master Directory Block
(MDB), at offset 1024 bytes. The MDB is similar to an HFS Plus
volume header. In order to
support volumes embedded within an HFS volume, several
unused fields of the MDB have been changed, and are now used
to indicate the type, location, and size of the embedded
volume.
What was formerly the drVCSize field (at
offset 0x7C) is now named drEmbedSigWord . This
two-byte field contains a unique value that identifies the
type of embedded volume. When an HFS Plus volume is
embedded, drEmbedSigWord must be
kHFSPlusSigWord ('H+' ), the same
value stored in the signature field of an HFS
Plus volume header.
What were formerly the drVBMCSize and
drCtlCSize fields (at offset 0x7E )
have been combined into a single field occupying four bytes.
The new structure is named drEmbedExtent and is
of type HFSExtentDescriptor . It contains the
starting allocation block number (startBlock )
where the embedded volume begins and number of allocation
blocks (blockCount ) the embedded volume
occupies. The embedded volume must be contiguous. Both of
these values are in terms of the HFS wrapper's allocation
blocks, not HFS Plus allocation blocks.
Note:
The description of the HFS volume format in
Inside
Macintosh: Files describes these fields as being used to
store the size of various caches, and labels each one as
"used internally".
|
To actually find the embedded volume's location on disk,
an implementation must use the drAlBlkSiz and
drAlBlSt fields of the MDB. The
drAlBlkSiz field contains the size (in bytes)
of the HFS allocation blocks. The drAlBlSt
field contains the offset, in 512-byte blocks, of the
wrapper's allocation block 0 relative to the start of the
volume.
IMPORTANT:
This embedding introduces a transform between HFS Plus
volume offsets and disk offsets. The HFS Plus volume exists
on a virtual disk embedded within the real disk. When
accessing an HFS Plus structure on an embedded disk, an
implementation must add the offset of the embedded disk to
the HFS Plus location. Listing 2 shows how one might do this,
assuming 512-byte sectors.
|
static UInt32 HFSPlusSectorToDiskSector(UInt32 hfsPlusSector)
{
    UInt32 embeddedDiskOffset;

    /* Offset, in 512-byte sectors, of the embedded volume's first
       sector relative to the start of the wrapper volume. */
    embeddedDiskOffset = gMDB.drAlBlSt +
        gMDB.drEmbedExtent.startBlock * (gMDB.drAlBlkSiz / 512);

    return embeddedDiskOffset + hfsPlusSector;
}
|
Listing 2. Sector transform for
embedded volumes.
|
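For example, since the HFS Plus volume header is stored
1024 bytes into the volume (sector 2, assuming 512-byte
sectors), an implementation could locate the embedded
volume header on disk by reading the sector returned by
HFSPlusSectorToDiskSector(2). The alternate volume header,
stored in the 1024 bytes before the end of the embedded
volume, can be located the same way once the embedded
volume's length in sectors is known.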
In order to prevent accidentally changing the files in
the HFS wrapper, the wrapper volume must be marked as
software-write-protected by setting
kHFSVolumeSoftwareLockBit in the
drAtrb (volume attributes) field of the MDB.
All correct HFS implementations will prevent any changes to
the wrapper volume.
To improve performance of HFS Plus volumes, the size of
the wrapper's allocation blocks should be a multiple of the
size of the HFS Plus volume's allocation blocks. In
addition, the wrapper's allocation block start
(drAlBlSt) should be a multiple of the HFS Plus
volume's allocation block size (or perhaps 4 KB, if the HFS
Plus allocation blocks are larger). If these recommendations
are followed, the HFS Plus allocation blocks will be
properly aligned on the disk. And, if the HFS Plus
allocation block size is a multiple of the sector size,
then blocking and deblocking at the device driver level
will be minimized.
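A rough sketch of a check for these recommendations
follows. The function and parameter names are assumptions
made for this example: wrapperBlockSize and
hfsPlusBlockSize are the allocation block sizes, in bytes,
of the wrapper and the embedded volume, and wrapperAlBlSt
is the wrapper's drAlBlSt value, in 512-byte blocks.
/* Returns true if the wrapper's geometry follows the alignment
   recommendations described above. */
static Boolean EmbeddedVolumeIsWellAligned(UInt32 wrapperBlockSize,
                                           UInt32 hfsPlusBlockSize,
                                           UInt16 wrapperAlBlSt)
{
    /* Align to 4 KB if the HFS Plus allocation blocks are larger. */
    UInt32 alignment = (hfsPlusBlockSize > 4096) ? 4096 : hfsPlusBlockSize;

    if (hfsPlusBlockSize == 0)
        return false;

    return ((wrapperBlockSize % hfsPlusBlockSize) == 0) &&
           ((((UInt32) wrapperAlBlSt * 512) % alignment) == 0);
}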
Allocating Space for the Embedded Volume
The space occupied by the embedded volume must be marked
as allocated in the HFS wrapper's volume bitmap (similar to
the HFS Plus allocation file)
and placed in the HFS wrapper's bad block file (similar to
the HFS Plus bad block file).
This doesn't mean the blocks are actually bad; it merely
prevents the HFS Plus volume from being overwritten by newly
created files in the HFS wrapper, being deleted
accidentally, or being marked as free, usable space by HFS
disk repair utilities.
The kHFSVolumeSparedBlocksMask bit of the
drAtrb (volume attributes) field of the MDB
must be set to indicate that the volume has a bad blocks
file.
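For illustration, a tool creating a wrapper might set both
of the attribute bits described above in one step. This
sketch uses the software lock and spared blocks bit
constants from the Volume Attributes section of this
technote and assumes the gMDB global from Listing 2.
/* Mark the wrapper volume as software locked and as having a bad
   blocks (spared blocks) file.  drAtrb is the wrapper's 16-bit
   volume attributes field. */
static void LockAndSpareWrapper(void)
{
    gMDB.drAtrb |= (UInt16) (1 << kHFSVolumeSoftwareLockBit);
    gMDB.drAtrb |= (UInt16) (1 << kHFSVolumeSparedBlocksBit);
}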
Read Me and System Files
IMPORTANT:
This section is not part of the HFS Plus volume format. It
describes how the existing Mac OS implementation of HFS Plus
creates HFS wrappers. It is provided for your information
only.
|
As initialized by the Mac OS Disk Initialization Package,
the HFS wrapper volume contains five files in the root
folder.
- Read Me -- The Read Me file, whose name is actually
"Where_have_all_my_files_gone?", contains text explaining
that this volume is really an HFS Plus volume but the
contents cannot be accessed because HFS Plus is not
currently installed on the computer. It also describes
the steps needed to install HFS Plus support. Localized
system software will also create a localized version of
the file, with a localized file name and text content.
- System and Finder (invisible) -- The System file
contains the minimum code to locate and mount the
embedded HFS Plus volume, and to continue booting from
the System file in the embedded volume. The Finder file
is empty; it is there to prevent older versions of the
Finder from de-blessing the wrapper's root directory,
which would prevent booting from the volume.
- Desktop DB and Desktop DF (invisible) -- The Desktop
DB and Desktop DF files are an artifact of the way the
files on the wrapper volume are created.
In addition, the root folder is set as the blessed folder
by placing its folder ID in the first SInt32 of
the drFndrInfo (Finder information) field of
the MDB.
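A minimal sketch of that last step, treating drFndrInfo as
an array of eight SInt32 values and again assuming the
gMDB global:
/* Bless the wrapper's root folder so that the wrapper remains
   bootable.  The root folder of an HFS volume always has CNID 2
   (kHFSRootFolderID); storing it in the first SInt32 of drFndrInfo
   marks that folder as the blessed (startup) folder. */
static void BlessWrapperRootFolder(void)
{
    gMDB.drFndrInfo[0] = kHFSRootFolderID;
}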
Volume Consistency Checks
An HFS Plus volume is a complex structure consisting of
many inter-related data structures.
Inconsistencies between these data structures could cause
serious data loss. When an HFS Plus implementation mounts a
volume, it must perform basic checks to ensure
that the volume is consistent. In addition, the
implementation may choose to perform other, more advanced,
consistency checks.
Many of these consistency checks take a significant
amount of time to run. While a safe implementation might run
these checks every time a volume is mounted, most
implementations will want to rely on the correctness of the
previous implementation that modified the disk. The
implementation may avoid unnecessary checking by determining
whether the volume was last unmounted cleanly. If it was,
the implementation may choose to skip a consistency check.
An implementation can determine whether a volume was
unmounted cleanly by looking at various flag bits in the
volume header. See Volume
Attributes for details.
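For example, a clean-unmount test might look like the
following sketch. It uses the attribute bit constants
defined in the Volume Attributes section; the attributes
parameter is the attributes field of the volume header.
/* Returns true if the volume was unmounted cleanly: the
   volume-unmounted bit must be set and the boot-volume-inconsistent
   bit must be clear. */
static Boolean VolumeWasUnmountedCleanly(UInt32 attributes)
{
    return ((attributes & (1UL << kHFSVolumeUnmountedBit)) != 0) &&
           ((attributes & (1UL << kHFSBootVolumeInconsistentBit)) == 0);
}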
Next Catalog Node ID Consistency Check
For an HFS Plus volume to work correctly with
many implementations, it is vital that the nextCatalogID
field of the volume header be greater than
all CNIDs currently used in the
catalog file. The algorithm to ensure this is as follows.
- The implementation must iterate through all the leaf
nodes of the catalog file, looking for file and folder
records, determining the maximum CNID of any file or
folder in the catalog.
- Once it knows the maximum CNID value, the
implementation must set
nextCatalogID to a
value greater than it.
Note:
The consistency check of nextCatalogID must be
skipped if kHFSCatalogNodeIDsReusedBit is set
in the attributes field of the
volume header.
|
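Combining these steps with the note above, a minimal sketch
of the check might look like the following. The
CatalogRecord union and the GetNextCatalogLeafRecord
iterator are assumptions of this sketch, not part of the
volume format; a real implementation would walk the catalog
B-tree's leaf nodes directly and handle byte order itself.
/* Hypothetical view of a catalog leaf record; the real folder and
   file record structures each begin with a recordType field. */
typedef union {
    SInt16                  recordType;
    HFSPlusCatalogFolder    folderRecord;
    HFSPlusCatalogFile      fileRecord;
} CatalogRecord;

/* Hypothetical iterator over the catalog file's leaf records;
   returns false when no records remain. */
extern Boolean GetNextCatalogLeafRecord(CatalogRecord *rec);

static void CheckNextCatalogID(HFSPlusVolumeHeader *vh)
{
    UInt32          maxCNID = 0;
    CatalogRecord   rec;

    /* Skip the check entirely if CNIDs have wrapped around and are
       being reused (see the note above). */
    if (vh->attributes & (1UL << kHFSCatalogNodeIDsReusedBit))
        return;

    /* Find the largest CNID of any file or folder in the catalog. */
    while (GetNextCatalogLeafRecord(&rec)) {
        if (rec.recordType == kHFSPlusFolderRecord &&
            rec.folderRecord.folderID > maxCNID) {
            maxCNID = rec.folderRecord.folderID;
        } else if (rec.recordType == kHFSPlusFileRecord &&
                   rec.fileRecord.fileID > maxCNID) {
            maxCNID = rec.fileRecord.fileID;
        }
    }

    /* nextCatalogID must be strictly greater than any CNID in use. */
    if (vh->nextCatalogID <= maxCNID)
        vh->nextCatalogID = maxCNID + 1;
}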
Allocation File
Consistency Check
For an HFS Plus volume to work correctly, it's vital that
any allocation block in use by file system structures be
marked as allocated in the allocation file. The algorithm to
ensure this is as follows:
- The implementation must first walk the allocation
file, marking every allocation block as free. (This step
can be skipped to improve the performance of the
consistency check. All that will happen is that some
allocation blocks may remain marked as in-use even though
they are not actually used by any extent.)
- The implementation must then mark the allocation
blocks containing the first 1536 bytes and the last
1024 bytes of the volume as allocated. These areas are
reserved or used by the volume header and alternate
volume header.
- The implementation must then mark the allocation
blocks used by all extents in all special files (the
catalog file, the extents overflow file, the allocation
file, the attributes file, and the startup file) as
allocated. These extents are all described in the
volume header.
- The implementation must then walk the leaf nodes of
the catalog file, marking all allocation blocks used by
extents in file records (in the
HFSPlusForkData structures for the data and
resource forks) as allocated.
- The implementation must then walk the leaf nodes of
the extents overflow
file, marking all allocation blocks used by all
extents in all extent records as allocated.
- The implementation must then walk the leaf nodes of
the attributes file, marking all allocation blocks used
by all extents described in fork data attributes and
extension attributes as allocated.
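Most of these steps reduce to marking the allocation
blocks covered by the extents of some fork. As an
illustration, here is a minimal sketch for the extents
stored directly in an HFSPlusForkData structure (as used
in steps 3 and 4); SetAllocationBits is a hypothetical
routine that sets a run of bits in the allocation file,
and any extents beyond the first eight must be found
through the extents overflow file (step 5).
/* Hypothetical routine: marks blockCount allocation blocks,
   starting at startBlock, as allocated in the allocation file. */
extern void SetAllocationBits(UInt32 startBlock, UInt32 blockCount);

/* Mark as allocated every block covered by the eight extent
   descriptors stored directly in a fork's HFSPlusForkData. */
static void MarkForkBlocksUsed(const HFSPlusForkData *fork)
{
    int i;

    for (i = 0; i < 8; i++) {
        const HFSPlusExtentDescriptor *extent = &fork->extents[i];

        if (extent->blockCount == 0)
            break;      /* the remaining extent descriptors are empty */

        SetAllocationBits(extent->startBlock, extent->blockCount);
    }
}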
WARNING:
To prevent the loss of user data, an implementation must
perform this check every time it mounts a volume that wasn't
unmounted cleanly. It is most important that an allocation
block that is in use be marked in the allocation file. It is
less important that an allocation block that is not in use
be cleared in the allocation file. If an allocation block is
marked as in-use by the allocation file, but not actually in
use by any extent, then that allocation block is really just
wasting space; it isn't otherwise dangerous.
|
Summary
Volume format specifications are fun, but exhausting.
References
Inside Macintosh: Files, especially the Data Organization on Volumes section.
The Finder Interface Reference section of the Carbon user experience documentation.
Technical Note 1189: The Monster Disk Driver Technote, especially the Secrets of the Partition Map section.
Algorithms in C, Robert Sedgewick, Addison-Wesley, 1992, especially the section on B-trees.