File Encodings and Fonts
Unicode is generally considered the native encoding for OS X and should be used in nearly all situations. Previous versions of Mac OS supported file encodings such as MacRoman, but most modern OS X libraries support Unicode inherently. If you use Cocoa or Core Foundation routines, you will probably never need to worry about other file encodings. If your software supports legacy file formats, however, you might need to consider file encoding issues when importing files in those formats. The following sections describe some of the issues related to Unicode support and legacy file encodings.
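As a rough illustration of the legacy-import case, the sketch below decodes MacRoman bytes into Unicode text and re-encodes them as UTF-8. It relies on Python's standard mac_roman codec rather than any Cocoa or Core Foundation routine, and the function name import_legacy_text and the sample bytes are invented for the example.

```python
def import_legacy_text(raw: bytes, encoding: str = "mac_roman") -> str:
    """Decode bytes read from a legacy file into a Unicode string.

    The default encoding is an assumption made for this sketch; a real
    importer has to know (or detect) which legacy encoding the file uses.
    """
    return raw.decode(encoding)


# In MacRoman, byte 0x8E is "é"; in UTF-8 the same character takes two bytes.
legacy_bytes = b"caf\x8e"
text = import_legacy_text(legacy_bytes)
utf8_bytes = text.encode("utf-8")
```

Decoding with the wrong legacy encoding usually succeeds silently and produces the wrong characters, so the encoding should come from the file format's definition rather than from a guess.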
File Systems and Unicode Support
Different file systems in OS X have different levels of Unicode support:
Mac OS Extended (HFS+) uses canonically decomposed Unicode 3.2 in UTF-16 format, which consists of a sequence of 16-bit codes. (Characters in the ranges U+2000-U+2FFF, U+F900-U+FA6A, and U+2F800-U+2FA1D are not decomposed.)
The UFS file system allows any character from Unicode 2.1 or later, but uses the UTF-8 format, in which ASCII characters occupy a single 8-bit code and all other characters are encoded as multibyte sequences. (Characters in the ranges U+2000-U+2FFF, U+F900-U+FA6A, and U+2F800-U+2FA1D are not decomposed.)
Locking the canonical decomposition to a particular version of Unicode does not exclude the use of characters defined in a newer version of Unicode. Because the Unicode Consortium has guaranteed not to add any more precomposed characters, applications can expect to store characters defined in future versions of Unicode without compatibility issues.
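The decomposition rule described above can be approximated in a few lines. The following is a minimal sketch, not the file system's actual implementation: it assumes that standard NFD normalization against Python's bundled Unicode 3.2 database (unicodedata.ucd_3_2_0) is a close enough stand-in, and it simply copies characters in the listed ranges through unchanged. The names EXCLUDED_RANGES, is_excluded, hfs_style_decompose, and utf16_code_units are hypothetical.

```python
from itertools import groupby
from unicodedata import ucd_3_2_0  # Unicode 3.2 database shipped with Python

# Ranges that the text above says are left undecomposed (inclusive bounds).
EXCLUDED_RANGES = (
    (0x2000, 0x2FFF),
    (0xF900, 0xFA6A),
    (0x2F800, 0x2FA1D),
)


def is_excluded(ch: str) -> bool:
    """Return True if ch falls in a range that is not decomposed."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def hfs_style_decompose(name: str) -> str:
    """Approximate the decomposition described above (assumption: NFD).

    Runs of ordinary characters are normalized to NFD; characters in the
    excluded ranges are copied through unchanged.
    """
    parts = []
    for excluded, run in groupby(name, key=is_excluded):
        chunk = "".join(run)
        parts.append(chunk if excluded else ucd_3_2_0.normalize("NFD", chunk))
    return "".join(parts)


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-be")) // 2
```

For instance, the precomposed é (U+00E9) becomes two code units under this sketch, while the ohm sign (U+2126), which plain NFD would replace with the Greek capital omega (U+03A9), is left alone because it lies in the first excluded range.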
All BSD system functions expect their string parameters to be in UTF-8 encoding and nothing else. Code that calls BSD system routines should ensure that the contents of all const char * parameters are in canonical UTF-8 encoding. In a canonical UTF-8 string, all decomposable characters are decomposed; for example, é (U+00E9) is represented as e (U+0065) followed by the combining acute accent ´ (U+0301). To put strings into canonical UTF-8 encoding, use the “file-system representation” interfaces defined in Cocoa (including Core Foundation).
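To make the é example concrete, the sketch below shows what the decomposed form looks like as UTF-8 bytes. It uses Python's unicodedata module purely for illustration; the helper name canonical_utf8 is hypothetical, and plain NFD does not reproduce the range exclusions noted earlier, so this is not a substitute for the file-system representation interfaces.

```python
import unicodedata


def canonical_utf8(path: str) -> bytes:
    """Return path as decomposed (NFD) UTF-8 bytes.

    Illustration only: this applies standard NFD and ignores any
    file-system-specific exceptions to decomposition.
    """
    return unicodedata.normalize("NFD", path).encode("utf-8")


precomposed = "\u00e9"  # é as a single code point
encoded = canonical_utf8(precomposed)
# encoded holds e (0x65) followed by U+0301, which is 0xCC 0x81 in UTF-8.
```

Passing an already decomposed string through the same function yields identical bytes, which is the property that makes the canonical form useful for file names.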
Getting Canonical Strings
Both Cocoa and Core Foundation provide routines for accessing canonical and non-canonical Unicode strings. Cocoa string manipulations are all handled through the NSString class and its subclasses. In Core Foundation, you can use functions such as CFStringGetCStringPtr to obtain a C string with the desired encoding.
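The difference between canonical (decomposed) and non-canonical (precomposed) strings matters most when comparing names. The sketch below illustrates the idea with Python's unicodedata module instead of NSString or Core Foundation calls; the helper names to_decomposed, to_precomposed, and same_text are made up for the example.

```python
import unicodedata


def to_decomposed(text: str) -> str:
    """Return the canonically decomposed (NFD) form of text."""
    return unicodedata.normalize("NFD", text)


def to_precomposed(text: str) -> str:
    """Return the canonically composed (NFC) form of text."""
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after putting both into the same form."""
    return to_decomposed(a) == to_decomposed(b)


precomposed_name = "r\u00e9sum\u00e9"    # six code points
decomposed_name = "re\u0301sume\u0301"   # eight code points
naive_equal = precomposed_name == decomposed_name             # False
normalized_equal = same_text(precomposed_name, decomposed_name)  # True
```

A name typed by a user is often precomposed, while a name read back from a decomposing file system is not, so normalizing both sides before comparing avoids spurious mismatches.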
Cocoa employs Unicode for character encoding, making any Cocoa application capable of displaying most human languages. Although Cocoa supports vertical and bidirectional text, the NSTypesetter class only supports layout for horizontal text. If you want to lay out vertical text, you need to define your own custom typesetter class.