Important: The information in this document is obsolete and should not be used for new development.
The Standard Roman Character Set
The Standard Roman character set is an extended version of the Macintosh character set, documented in Volume I of the original Inside Macintosh. The Macintosh character set is itself an extended version of the ASCII character set. The conventional ASCII character set, also called low ASCII, defines control codes, symbols, numbers, and letters, assigning them character codes $00 through $7F. The Macintosh character set adds codes $80 through $D8, representing accented characters and additional symbols. Current Macintosh file-system sorting, as well as the sorting order used by several Text Utilities routines such as RelString, is based on the Macintosh character set.
The Standard Roman character set adds more accented forms, symbols, and diacritical marks, assigning them character codes $D9 through $FF. It thus covers every character code from $00 through $FF, and it includes uppercase versions of all of the lowercase accented forms, a number of symbols, and other forms. See Figure A-1.
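The three ranges described above can be made concrete with a short sketch. It assumes Python's built-in `mac_roman` codec as a stand-in for the Standard Roman character set; that codec follows a later revision of Apple's mapping (for example, it decodes $DB as the euro sign), so individual codes may not match Figure A-1 exactly. The helper names `roman_range` and `decode_roman` are hypothetical.

```python
def roman_range(code: int) -> str:
    """Name the part of the Standard Roman character set that a code belongs to."""
    if not 0x00 <= code <= 0xFF:
        raise ValueError("Standard Roman codes run from $00 through $FF")
    if code <= 0x7F:
        return "low ASCII"                         # $00-$7F
    if code <= 0xD8:
        return "original Macintosh character set"  # $80-$D8
    return "Standard Roman addition"               # $D9-$FF


def decode_roman(code: int) -> str:
    """Decode one code with Python's mac_roman codec (a later revision of the set)."""
    return bytes([code]).decode("mac_roman")


if __name__ == "__main__":
    # $D8 is the last code of the original Macintosh set; $D9 is the first addition.
    for code in (0x41, 0x8A, 0xD8, 0xD9):
        print(f"${code:02X} ({code}) {roman_range(code)}: {decode_roman(code)}")
```

Under this codec the low-ASCII half decodes exactly as ASCII does, which matches the description of the set as an extension of ASCII.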
Within the Roman script system, the Standard Roman character set is the closest thing to a universal character encoding. Most Roman outline fonts include all of the Standard Roman characters, but the Apple bitmapped versions of Chicago, Geneva, New York, and Monaco do not include all of them.
Figure A-1 The Standard Roman character set
Nonprinting Characters
Table A-1 lists the nonprinting characters in the Standard Roman character set. The Unicode 1.0 name and the Macintosh character code (in hexadecimal and decimal) are also provided. (Unicode is an ISO standard for 16-bit universal worldwide character encoding.)
Using Roman Character Codes as Delimiters
Your application may need to use a character code or range of codes to represent noncharacter data (such as field delimiters). Character codes below $20 are never affected by the script system. Some of these character codes can be used safely for special purposes. Note, however, that most characters in this range are already assigned special meanings by parts of Macintosh system software, such as TextEdit, or by programming languages like C. Table A-2 lists the low-ASCII characters to avoid in your application.
Table A-2 Low-ASCII characters to avoid as delimiters
Character              Hexadecimal representation
Null                   $00
Home                   $01
Enter                  $03
End                    $04
Help                   $05
Backspace              $08
Tab                    $09
Page up                $0B
Page down              $0C
Carriage return        $0D
F1 through F15         $10
System characters      $11, $12, $13, $14[7]
Clear                  $1B
Arrow keys             $1C, $1D, $1E, $1F
For certain writing systems, font layouts (tables that map glyph codes to glyphs) may use some of these character codes internally, for ligatures or other contextual forms. Also, as noted in Table A-2, system fonts use codes $11 through $14 for printing special symbols such as the Apple logo. Thus, in unusual situations, a font change may affect the glyphs displayed for stored character codes below $20, even though a user cannot type those codes directly.
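Table A-2 can be turned into a mechanical check. The following Python sketch lists the codes below $20 that the table does not rule out and uses one of them to join fields of Mac Roman text. The names `AVOID`, `candidate_delimiters`, `join_fields`, and `split_fields` are hypothetical, not part of any Macintosh API. Absence from the table is not a guarantee of safety: $0A, for example, is the ASCII line feed, which many C libraries and text formats treat as a line break, and, as noted above, some font layouts may use low codes internally.

```python
from __future__ import annotations

# Codes below $20 that Table A-2 lists as unsuitable for delimiters.
AVOID = {
    0x00,                    # Null
    0x01,                    # Home
    0x03,                    # Enter
    0x04,                    # End
    0x05,                    # Help
    0x08,                    # Backspace
    0x09,                    # Tab
    0x0B,                    # Page up
    0x0C,                    # Page down
    0x0D,                    # Carriage return
    0x10,                    # F1 through F15
    0x11, 0x12, 0x13, 0x14,  # System characters (printed as symbols by system fonts)
    0x1B,                    # Clear
    0x1C, 0x1D, 0x1E, 0x1F,  # Arrow keys
}


def candidate_delimiters() -> list[int]:
    """Return the codes $00-$1F that Table A-2 does not list."""
    return [code for code in range(0x20) if code not in AVOID]


def join_fields(fields: list[bytes], delim: int) -> bytes:
    """Join encoded fields with a single delimiter byte, rejecting ambiguous input."""
    if delim not in candidate_delimiters():
        raise ValueError(f"${delim:02X} is not a candidate delimiter")
    sep = bytes([delim])
    for field in fields:
        if sep in field:
            # No escaping scheme here: a field containing the delimiter is an error.
            raise ValueError("a field already contains the delimiter byte")
    return sep.join(fields)


def split_fields(data: bytes, delim: int) -> list[bytes]:
    """Split data produced by join_fields back into its fields."""
    return data.split(bytes([delim]))


if __name__ == "__main__":
    print([f"${code:02X}" for code in candidate_delimiters()])
    # $1A is an arbitrary choice from the candidate list.
    record = join_fields([s.encode("mac_roman") for s in ("Müller", "Zoë")], 0x1A)
    print([f.decode("mac_roman") for f in split_fields(record, 0x1A)])
```

Rejecting fields that already contain the delimiter keeps the sketch short; an application that must store arbitrary bytes would need an escaping or length-prefix scheme instead.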
Printing Characters
Table A-3 shows the printing characters that exist in the Standard Roman character set. Macintosh applications can assume that glyphs for these characters exist in every Roman font. (However, see also the discussion of Roman fonts on page A-18.) The Unicode 1.0 and PostScript names and the Macintosh character code (in hexadecimal and decimal) are provided, along with a glyph example for each printable character. Modified versions of the Standard Roman character set exist for Croatian, Romanian, Turkish, and Icelandic/Faroese; they assign different characters to some of the same codes. See Tables A-4 through A-7.
Variations in the Character Set
Two types of variations from the Standard Roman character set can occur. First, several languages and regional variations of Roman reassign parts of the character set; second, many specialized Roman fonts completely override the character set to provide other types of symbols.
Table A-4 shows the glyph assignments in the Croatian version of the Roman character set that diverge from the Standard Roman character set, their Unicode and PostScript names, and their Macintosh character codes in hexadecimal and decimal. For example, code $A9, which the Standard Roman character set assigns to the copyright sign, is assigned instead to Scaron (the Roman capital letter "S" with a hacek). The copyright sign moves to $D9, which the Standard Roman character set assigns to the Latin capital letter "Y" with diaeresis.
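The Croatian example above shows why stored character codes cannot be interpreted without knowing which variant produced them. This Python sketch assumes that the built-in `mac_croatian` and `mac_roman` codecs reflect the assignments described above; the helper name `reinterpret` is hypothetical. It encodes text with one variant and decodes the same bytes with another, the way a document written on a Croatian system would be misread by software that assumes Standard Roman.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one Mac Roman variant, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Croatian stores S-caron at $A9, the code Standard Roman uses for the copyright sign.
    print("Š".encode("mac_croatian").hex())
    print(reinterpret("Š", "mac_croatian", "mac_roman"))
    # The Croatian copyright sign sits at $D9, Standard Roman's capital Y with diaeresis.
    print(reinterpret("©", "mac_croatian", "mac_roman"))
```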
Table A-5 shows the glyph assignments in the Romanian version of the Roman character set that diverge from the Standard Roman character set, their Unicode and PostScript names, and their Macintosh character codes in hexadecimal and decimal.
Table A-6 shows the glyph assignments in the Turkish version of the Roman character set that diverge from the Standard Roman character set, their Unicode and PostScript names, and their Macintosh character codes in hexadecimal and decimal.
Table A-7 shows the glyph assignments in the Icelandic and Faroese versions of the Roman character set that diverge from the Standard Roman character set, their Unicode and PostScript names, and their Macintosh character codes in hexadecimal and decimal.
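The divergent assignments described in Tables A-4 through A-7 can be approximated programmatically. This sketch assumes that Python's `mac_croatian`, `mac_romanian`, `mac_turkish`, and `mac_iceland` codecs correspond to those four variants and that `mac_roman` stands in for Standard Roman. These codecs follow later revisions of Apple's mapping tables, so the listing may differ in detail from the printed tables. The names `VARIANTS` and `diverging_codes` are hypothetical.

```python
from __future__ import annotations

# Assumed correspondence between the regional variants and Python codec names.
VARIANTS = {
    "Croatian": "mac_croatian",
    "Romanian": "mac_romanian",
    "Turkish": "mac_turkish",
    "Icelandic/Faroese": "mac_iceland",
}


def diverging_codes(codec: str, base: str = "mac_roman") -> dict[int, tuple[str, str]]:
    """Map each code whose character differs from base to (base_char, variant_char)."""
    diffs: dict[int, tuple[str, str]] = {}
    for code in range(256):
        raw = bytes([code])
        standard = raw.decode(base, errors="replace")
        variant = raw.decode(codec, errors="replace")
        if standard != variant:
            diffs[code] = (standard, variant)
    return diffs


if __name__ == "__main__":
    for language, codec in VARIANTS.items():
        diffs = diverging_codes(codec)
        print(f"{language}: {len(diffs)} codes differ from Standard Roman")
        for code, (standard, variant) in sorted(diffs.items()):
            print(f"  ${code:02X} ({code:3d}): {standard!r} -> {variant!r}")
```

Every variant leaves low ASCII untouched, so the differences all fall in the range $80 through $FF.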
In addition to regional variations in the character set, the Roman script system in particular contains many fonts with unique glyphs. Because the character encoding is limited to 256 values, specialized fonts such as Symbol and ITC Zapf Dingbats override the Standard Roman character encoding. For example, code $70 corresponds to lowercase "p" in the Standard Roman character set, but it is the symbol for pi (π) in the Symbol font, an outlined square in Zapf Dingbats, and the musical symbol pianissimo ("play very quietly") in the Sonata font. Hence, there is no guarantee that a Roman character code will always represent the same character in every font.
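Because the meaning of a code can depend on the font, one option is to store the font name alongside the character codes and refuse to interpret runs in fonts known to replace the encoding. The sketch below assumes a hypothetical `StyledRun` record and a hypothetical, hand-maintained set of font names taken from the examples above; it is not how any particular Macintosh manager represents styled text, and a real application would need its own per-font knowledge.

```python
from dataclasses import dataclass

# Hypothetical list: fonts named above that replace the Standard Roman encoding.
NON_ROMAN_ENCODED_FONTS = {"Symbol", "ITC Zapf Dingbats", "Zapf Dingbats", "Sonata"}


@dataclass(frozen=True)
class StyledRun:
    """Character codes stored together with the name of the font they were entered in."""
    font: str
    data: bytes


def as_standard_roman(run: StyledRun) -> str:
    """Interpret a run as Standard Roman text, refusing fonts known to reassign codes."""
    if run.font in NON_ROMAN_ENCODED_FONTS:
        raise ValueError(f"{run.font} does not use the Standard Roman encoding")
    return run.data.decode("mac_roman")


if __name__ == "__main__":
    print(as_standard_roman(StyledRun("Geneva", b"\x70")))
    try:
        as_standard_roman(StyledRun("Symbol", b"\x70"))
    except ValueError as err:
        print(err)
```

Raising an error, rather than guessing, keeps a symbol-font run from being silently converted into the wrong letters.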
[7] System fonts use these codes for the printing characters PROPELLER (the Command-key symbol), RADICAL (check mark), LOZENGE (diamond), and APPLE LOGO, respectively.