Legacy Document

Important: The information in this document is obsolete and should not be used for new development.

Inside Macintosh: Text /: Chapter 6 - Script Manager / Using the Script Manager

Obtaining Information
The second principal use for the Script Manager is in obtaining script-specific information. Many of the routines described in this section are of general interest and are used by most text applications. You can use these Script Manager routines to

determine script codes for the current script system or any other available script system, based on font information
analyze characters in your text for size (in bytes) or other properties
directly access the contents of a script system's international resources, to pass that information to other text-handling calls or to inspect or modify the information

Most text-processing applications need script-code information and character-type information, and may need to pass specific tables from international resources to some script-aware text routines. If you format currencies, you need access to the numeric-format resource. If you use special symbols or if you format numbers, you
need access to the untoken table and perhaps the number parts table of the tokens resource. If your needs are more specialized, you can obtain the contents of other tables and other resources.

Determining Script Codes From Font Information
The script management system asssociates a script system with a sequence of text by examining the font of that text. Your application may also need the same information--to test for the presence of a particular script system, to load its resources, to pass its code as a parameter to a script-aware routine, or to execute script-specific conditional code. You may need to determine what script system is currently active for displaying text, what script system is being used to sort and format text, or what script system would be used if text of a particular font were to be displayed or formatted. The Script Manager provides three routines for that purpose: FontScript, FontToScript, and IntlScript.
The FontScript function tells you which script system the font of the current graphics port belongs to. The FontToScript function tells you which (available) script system a font of any ID number belongs to. The IntlScript function tells you which script system is used by the Text Utilities to determine the number, date, time, currency, and sorting formats.
The FontToScript function returns a script code for a specified font family ID, but the FontScript and IntlScript functions return the code for the current script, the presently active script system for text manipulation. Many script-aware routines in QuickDraw, Text Utilities, the Script Manager, and other parts of the Macintosh script management system need not take an explicit script code or international resource handle as a parameter; in that case they use the current script as the script system under which they are to function.
The current script for text display is normally the font script. The current script for date and time formatting and string sorting is by default the system script. However, the settings of two flags--the font force flag and the international resources selection flag--can affect which script system is considered current at any one moment. Furthermore, if the mapping from font to script results in a request for a script system that is not available, the result defaults to the system script.
The next subsection lists the steps taken by FontScript, FontToScript, and IntlScript to determine the script codes they return, and the following subsections discuss the font force flag and the international resources selection flag in more detail.

How a Script Code Is Determined
The FontScript, FontToScript, and IntlScript functions all use a font family ID to determine the script code they return. The formula they use is presented in the discussion of resource ID numbers and script codes in the appendix "International Resources" in this book. Fonts with IDs below 16384 ($4000) are all Roman; starting with 16384 each non-Roman script system has a range of 512 ($200) font IDs available.
Nevertheless, you should always call the functions instead of hardcoding any formula, because it may change in the future. Furthermore, the function results are influenced by the states of the font force flag and the international resources selection flag, and by the availability of the determined script. Figure 6-1 shows the method the functions follow:

The three functions initialize two result flags, the script-forced result flag and the script-defaulted result flag, to FALSE. These flags are Script Manager variables, accessed through the GetScriptManagerVariable function selectors smForced and smDefault.
The three functions map the two special font designations 0 and 1, meaning the system and application fonts, to their true font family ID numbers.
FontScript and IntlScript calculate the script code from the font family ID of the current font of the active port; FontToScript calculates the script code from the supplied font family ID. If the ID is in the range $4000 to $BFFF, it is a non-Roman font; otherwise, it is Roman.
Once the initial determination of the script code has been made, the three functions diverge:

If the font is Roman, FontScript and FontToScript examine the font force flag, which can be accessed through the GetScriptManagerVariable function selector smFontForce. If the flag is TRUE, the two functions substitute the system script for the font script, and set the script-forced result flag to TRUE. If the font is non-Roman, FontScript and FontToScript ignore the state of the font
force flag.
Regardless of the font type (Roman or non-Roman), IntlScript examines the international resources selection flag, which can be accessed through the GetScriptManagerVariable function selector smIntlForce. If the flag is TRUE and the font script does not equal the system script, IntlScript substitutes the system script for the font script and sets the script-forced result flag to TRUE.

Figure 6-1 Determining script code from font family ID

A final check is made to be sure that the resulting script is installed and enabled. If it is not, the three functions substitute the system script for the script code previously determined, set the script-forced result flag to FALSE, and set the script-defaulted result flag to TRUE.
The functions return the resulting script code in their function results.

Call FontScript when you want to know which script system will be used for text layout and display. The script code returned by FontScript tells you which script system controls the functioning of such calls as CharToPixel, CharacterType, FindWordBreaks, DrawText, and DrawJustified. Typically, FontScript returns the script code for the font script; in most situations the font force flag is FALSE,
because applications usually expect to format and draw text according to the rules of the
font script.
Call FontToScript when you want to know whether the script system for text of a particular font is available, or when you wish to manipulate text of a certain script system without setting the current font to that font's ID.

Note
Because a user can set the value of the font force flag from the Text control panel, the result returned from the FontToScript or FontScript function for a font whose ID number is in the Roman range can vary from call to call.
Call IntlScript when you want to know which script system will be used for formatting dates and numbers, and for sorting strings. The script code returned by IntlScript tells you which script system controls the functioning of such calls as DateString, LongTimeString, and CompareText, when no explicit script code or resource handle is supplied to those calls. In many localized versions of sysem software, IntlScript by default returns the script code for the system script, because the international resources selection flag is by default TRUE. The Finder and other parts of system software usually expect to present dates, times, and lists of files according to the rules of the system script.
Because the two flags are independent of each other, two different meanings for current script can exist simultaneously. For example, your application might be sorting a set of strings by one script's rules, but displaying them by another's. If that is not appropriate, set the flags as needed before formatting or drawing. See the following discussion.

Using the Font Force Flag
You access and control the font force flag through the GetScriptManagerVariable and SetScriptManagerVariable functions, with the selector smfontForce. This flag directly affects the results of the FontScript and FontToScript functions, and indirectly affects the operation of script-aware text measuring and drawing routines.
At startup, the Script Manager sets the font force flag to the value specified in the system script's international configuration ('itlc') resource. Typically, that value is FALSE.
When the font force flag is set to TRUE and the system script is non-Roman, the script management system interprets font family ID numbers in the range of the Roman script system ($0002 to $3FFF) as belonging to the system script instead. Character codes representing non-Roman characters in the system script are drawn using the system font instead of in the specified Roman font. This feature exists to allow users to enter and read non-Roman text in those few applications that have hardcoded font numbers.
For example, an application may hardcode Geneva as its font; it may force the txFont field of its graphics ports to always have a value of 3. (Note that this is a violation of good programming practice.) If the application is running on a system with Hebrew as the system script, it would normally be impossible to write properly in Hebrew because the hardcoded font ID would require the font script to be Roman. However, if the font force flag is set to TRUE, the script management system notes that the current font has an ID number in the Roman range and draws glyphs from the Hebrew system font for any character codes that represent valid Hebrew characters.
Thus to enter or read non-Roman text in these applications, the user can set the font force flag to TRUE from the Text control panel. Setting the font force flag is only partially effective, because it cannot give users full control over fonts. The user cannot choose, for example, which font belonging to the system script is to be substituted for Roman.
The font force flag has no effect on non-Roman fonts and has no effect if the system script is Roman. It affects only Roman fonts when the system script is non-Roman.
You can determine the status of font forcing by inspecting the script-forced result flag and the script-defaulted result flag immediately after calling FontScript or FontToScript; see Figure 6-1.
Although the font force flag exists primarily to accommodate restrictions in certain existing applications, it is a user-changeable setting that your application should be aware of and accommodate. For example:

If you are writing any application in which the user has control over fonts, you should always set the font force flag to FALSE. There is no need to force fonts if the user can choose them.
If the user sets the font force flag to TRUE, you will get the system script when you call FontScript or FontToScript for fonts in the Roman range, even if your application allows mixed text. To preserve Roman text, you can change the setting of the font force flag before calling FontScript or FontToScript, or before calling any other script-aware text routine. If you do that, be sure to save the previous value and restore it when your application exits or becomes inactive.

Using the International Resources Selection Flag
You access and control the international resources selection flag through the GetScriptManagerVariable and SetScriptManagerVariable functions, with the selector smIntlForce. This flag directly affects the results of the IntlScript function, and indirectly affects the operation of the GetIntlResource function and the script-aware Text Utilities sorting and formatting routines.
At startup, the Script Manager sets the international resources selection flag to the value specified in the system script's international configuration ('itlc') resource. Typically, that value is TRUE.
The international resources selection flag affects the results of the GetIntlResource function (see page 6-90). GetIntlResource returns a handle to certain international resources, and the state of the international resources selection flag controls whether it is the system script or the font script whose international resources are loaded. When the flag is set to TRUE, GetIntlResource fetches the resources for the system script. When the flag is set to FALSE, GetIntlResource uses the current font in the active port to determine the script system whose resources will be fetched.
You can use the international resources selection flag to make sure that date formats, sorting, and so forth reflect the appropriate script in your application. Whenever you change the setting of the international resources selection flag, be sure to save the previous value and restore it when your application exits or becomes inactive.

Analyzing Characters
The Script Manager provides routines that let you analyze the size and type of individual characters. For example, with script systems that use 2-byte characters,
you may need to determine what part of a character a single byte represents. In either 1-byte or 2-byte script systems, you may need to know whether a particular character is a letter or a punctuation mark, whether or not it is uppercase, or whether it is part of a subscript (Roman within Cyrillic, Hiragana within Japanese, and so on).

Searching Text With Mixed Character Sizes
When searching for a single 1-byte character in text that may contain 2-byte characters, your application must not mistake part of a 2-byte character for the character you are seeking. The CharacterByteType and FillParseTable functions tell you whether a given character is 1-byte or whether it is the first or second byte of a 2-byte character.
These functions use the fact that, in a 2-byte script system, only a restricted set of values within the high-ASCII range are used as the first bytes of 2-byte characters, and those values are never used for 1-byte characters in that script system. All other byte values represent single-byte characters, control characters, or the second bytes of 2-byte characters. The ranges reserved for initial bytes of 2-byte characters vary from script system to script system, but every font has a table that gives that information, and CharacterByteType and FillParseTable use those tables to perform their calculations. For an illustration of this concept, see the discussion of character encoding in the chapter "Introduction to Text on the Macintosh" in this book.
Listing 6-6 shows a search procedure that accounts for 2-byte characters. This routine uses the Text Utilities Munger function to find a match to a key string. Because Munger might find a match beginning at the second byte of a 2-byte character, the routine checks for this case (using the CharacterByteType function) and continues searching if
it occurs.
The sample assumes two application global variables: gMainTextHandle, which is a handle to the application's text buffer, and gNewLocation, a long-integer offset into the buffer at which to start searching. The parameters keyPtr and keySize specify the string to be matched in the text buffer; scriptNum is an explicit script code. On return, the routine updates gNewLocation to point to the location at which the search string was found, or sets it to -1 if no match was found.
Listing 6-6 Handling 2-byte characters in a search procedure
PROCEDURE MySearch (keyPtr: Ptr; keySize: LongInt; scriptNum: 
Integer);
VAR
   byteType: Integer;
BEGIN
   HLock(gMainTextHandle);    {CharacterByteType can move memory}
   REPEAT BEGIN
      gNewLocation := Munger(gMainTextHandle, gNewLocation,
                              keyPtr, keySize, NIL, 0);
                              {if we matched second byte of }
                              { 2-byte char in text, continue}
      IF (gNewLocation >= 0) AND (scriptNum > 0) THEN
         byteType := CharacterByteType(gMainTextHandle^, 
                                       gNewLocation, scriptNum)
      ELSE
         byteType := smSingleByte;
   END UNTIL byteType <> smLastByte;
   HUnlock(gMainTextHandle);
   IF (gNewLocation >= 0) AND {range-check, update global}
      (gNewLocation + keySize > GetHandleSize(gMainTextHandle)) 
THEN
      gNewLocation := -1;
END;
The FillParseTable function is similar to CharacterByteType, in that it helps you find 2-byte characters. However, you don't send FillParseTable the character code to be analyzed. Instead, FillParseTable fills in an entire 256-byte table of information for you, showing every byte value that is the first byte of a 2-byte character for the current font. You can use the table filled out by FillParseTable to find 2-byte characters in a large body of text much more rapidly than you could by calling CharacterByteType for each byte value in the text.

Getting Character-Type Information
You may want to know more about a byte than whether it is part of a 2-byte character. If you are simply searching for sequences of Roman text in a buffer, or if you wish to divide a run of Japanese into Kanji, Katakana, Hiragana, and Romaji components, you can use the FindScriptRun function described in the chapter "Text Utilities" in this book. But if you have other reasons to isolate specific types of characters, you can use CharacterType.
The CharacterType function is similar to CharacterByteType, in that it tells you what kind of character occurs at a given offset in a text buffer. But the kind of information it returns is different. CharacterType tells you what the character's line direction is, whether it's uppercase, whether it belongs to a subscript within its script, whether it's a 2-byte character, and what the character's specific type and class are--letter or punctuation, low-ASCII or high-ASCII Roman letter, Katakana or Hiragana, Jamo or Hangul, and so on.
When you call the CharacterType function, you pass it a byte offset; it returns a
value that is an integer bit field giving information about the character at that offset. See Figure 6-2. The paragraphs following the figure describe the fields.
Figure 6-2 Fields in the CharacterType return value
Bits 0-3 of the CharacterType function result describe the character type of the character in question.

The Roman script system recognizes three basic character types, defined by the following constants:
Character type Hex. value Explanation
smCharPunct $0000 Punctuation (anything but a letter)
smCharAscii $0001 ASCII letter (not a number or symbol,
character code <= $7F)
smCharExtAscii $0007 High-ASCII Roman letter (not a number or symbol, character code >= $80)

Additional character-type constants are provided for Japanese Katakana and Hiragana; the ideographic subscripts such as Hanzi, Kanji, and Hanja; 2-byte Cyrillic and Greek in 2-byte systems; bidirectional script systems such as Arabic and Hebrew; and Korean Hangul and Jamo subscripts:
Character type Hex. value Explanation
smCharKatakana $0002 Japanese Katakana
smCharHiragana $0003 Japanese Hiragana
smCharIdeographic $0004 Hanzi, Kanji, Hanja
smCharTwoByteGreek $0005 2-byte Greek in 2-byte scripts
smCharTwoByteRussian $0006 2-byte Cyrillic in 2-byte scripts
smCharBidirect $0008 Arabic, Hebrew
smCharContextualLR $0009 Thai, Indic, etc.
smCharNonContextualLR $000A Cyrillic, Greek, etc.
smCharHangul $000C Korean Hangul
smCharJamo $000D Korean Jamo
smCharBopomofo $000E Chinese Bopomofo (Zhuyinfuhao)

Bits 8-11 of the CharacterType function result describe the character class of the character in question. Character classes can be considered as subtypes of character types; a given character type can have several classes that belong to it.

If the character type is smCharPunct, the following character classes are defined that include punctuation for both 1-byte and 2-byte script systems:
Character class Hex. value Explanation
smPunctNormal $0000 Normal punctuation (such as ! , . ;?)
smPunctNumber $0100 Number character (such as 0-9)
smPunctSymbol $0200 Nonpunctuation symbol (such as # $ &)
smPunctBlank $0300 Blank character (such as ASCII $00, $0D, $20)
smPunctRepeat $0400 Repeat marker in 2-byte script
smPunctGraphic $0500 Line graphics in 2-byte script

In the Korean script system, if the character type is smCharJamo, the following character classes are defined. They determine whether a given byte contains a simple or complex consonant or a simple or complex vowel:
Character class Hex. value Explanation
smJamoJaeum $0000 Simple consonant character
smJamoBogJaeum $0100 Complex consonant character
smJamoMoeum $0200 Simple vowel character
smJamoBogMoeum $0300 Complex vowel character

The Jamo and Hangul subscripts of Korean are discussed briefly along with input methods in the chapter "Introduction to Text on the Macintosh" in this book.
In the Japanese script system, if the character type is smCharKatakana or smCharHiragana, the following character classes are defined:
Character class Hex. value Explanation
$0000 (none of the following defined classes)
smKanaSmall $0001 Small Kana character
smKanaHardOK $0002 Can have dakuten
smKanaSoftOK $0003 Can have dakuten or han-dakuten

A small Kana character is a special form of Kana used to modify the pronunciation of a previous (full-sized) Kana character. Dakuten and han-dakuten are pronunciation marks that soften consonant sounds in Kana.
In 2-byte script systems, if the character type is smCharIdeographic, the following character classes are defined:
Character class Hex. value Explanation
smIdeographicLevel1 $0000 Level 1 characters
smIdeographicLevel2 $0100 Level 2 characters
smIdeographicUser $0200 User characters

The characters specified by the smIdeographicLevel1 constant are part of the
level 1 Han character set specified by Japanese, Chinese, and Korean government standards. Approximately 90 percent of normal text consists of characters from the level 1 set.
The characters specified by the smIdeographicLevel2 constant are part of the
level 2 Han character set, which includes obscure characters. The level 1 and level 2 character sets combined contain 98 percent of the character set used in the Kanji subscript.
The characters specified by smIdeographicUser represent custom characters created by the user.

Bits 12-15 of the CharacterType function result are the character modifiers of the character in question. One bit describes each modifier.

Bit 12 specifies the orientation of the character: whether it is intended for horizontal or vertical writing.
Character orientation Hex. value Explanation
smCharHorizontal $0000 Character form is for horizontal writing, or for both horizontal and vertical
smCharVertical $1000 Character form is for vertical writing only

Bit 13 specifies the direction of the character: whether its line direction is left-to-right or right-to-left.
Character direction Hex. value Explanation
smCharLeft $0000 Character with left-to-right line direction
smCharRight $2000 Character with right-to-left line direction

Bit 14 specifies the case of the character: whether it is lowercase or uppercase.
Character case Hex. value Explanation
smCharLower $0000 Lowercase character
smCharUpper $4000 Uppercase character

Bit 15 specifies the size of the character: whether it is 1 or 2 bytes long.
Character size Hex. value Explanation
smChar1byte $0000 1-byte character
smChar2byte $8000 2-byte character

You can describe individual characters with combinations of these constants. For example, if the byte being examined by CharacterType is a 1-byte English uppercase "A", then the value of the result could be expressed as smChar1Byte + smCharUpper + smCharLeft + smCharASCII. CharacterType indicates blank characters by a type smCharPunct and a class smCharBlank.
Some values are meaningful only in certain subscripts or script systems. The value smCharUpper is meaningless in a subscript that has no uppercase characters, for example; the value smIdeographicLevel is meaningless in 1-byte script systems.
You can use CharacterType for a variety of purposes--to validate input in numeric fields, to filter non-phonetic characters in an input method, or to search for punctuation, uppercase letters, and symbols. If you are breaking lines of text and are not using the Text Utilities StyledLineBreak function, you can use CharacterType to locate and skip whitespace characters at the ends of lines; see the description of text drawing in the chapter "QuickDraw Text" in this book.
The CharacterType function is described further on page 6-85.

Directly Accessing International Resources
This section shows how you can directly access the international resources of a script system. Such direct access can help you be more efficient in creating bilingual applications, formatting numbers in different scripts, accessing character information, and using tokens. Several script-aware Text Utilities calls can take a handle to an international resource as an input parameter; you can use the calls in this section to obtain those handles.
Your application can examine the international resources that determine numeric formats, date formats, string sorting, conversion to tokens, and character encoding or rendering by making the calls described here. You can also retrieve individual tables from some of the resources.
This access also helps you to provide your own versions or regional variations of certain international resources. See "Replacing a Script System's Default International Resources" beginning on page 6-48 for more information.

Note
Although you can access the international resources independently through the Resource Manager function GetResource and related calls, you can be sure to get the preferred resource of the current script system by using the calls described here.
The calls you make to access the international resources are ClearIntlResourceCache, GetIntlResource, and GetIntlResourceTable. With them, you have access to the contents of a script system's numeric-format ('itl0'), long-date-format ('itl1'), string-manipulation ('itl2'), tokens ('itl4'), and encoding/rendering ('itl5') resources.
To access one of these resources for the current script, follow these steps:

Make sure the current script is the script system containing the international resource you want to access. See "Determining Script Codes From Font Information" on page 6-21. You may need to verify the settings of the font script, the system script, and the international resources selection flag. See "Using the International Resources Selection Flag" on page 6-25.
If you need access to any version of the current script's string-manipulation or tokens resources other than its default version, call ClearIntlResourceCache first. See "Replacing a Script System's Default International Resources" on page 6-48.
Call GetIntlResource, specifying the type of resource you need. GetIntlResource returns a handle to the resource.

For an example of using GetIntlResource to extract information from an international resource, see the next section, "Using Currency, Number, and Date Formats."
To access a specific table within a string-manipulation or tokens resource, follow
these steps:

If you don't already have it, determine the script code of the script system containing the international resource you want to access. See "Determining Script Codes From Font Information" on page 6-21.
If you need access to any other than the script's default version of that resource, call ClearIntlResourceCache first. See "Replacing a Script System's Default International Resources" on page 6-48.
Call GetIntlResourceTable to get the specified table within the specified resource belonging to the specified script system. Depending on the resource, you can get its number-parts, untoken, word-selection, line-break, or whitespace table.

For more information about these tables, see the following sections: "Using Number Parts," "Retrieving Text From Tokens," "Using Word-Break Tables," and "Using Whitespace Information."

IMPORTANT
Any time you replace the default international resources for a script system, whether or not you subsequently call GetIntlResource or GetIntlResourceTable, you need to call ClearIntlResourceCache, to make sure that the replacements are used by all script-aware calls. See "Replacing a Script System's Default International Resources" beginning on page 6-48.

Using Currency, Number, and Date Formats
In general, you should use the Text Utilities routines for date, time, and number formatting. See the chapter "Text Utilities" in this book. If, however, you need to directly access fields in the numeric-format ('itl0') and long-date-format ('itl1') resources to find the characters, separators, strings, and orders for formatting numbers, dates, and times, you can do so with GetIntlResource.
Listing 6-7 shows how to determine the decimal, thousands, and list separators for number formatting in the current script. To access the numeric-format resource, the routine specifies a resource selector of 0 (for 'itl0') in the parameter theID of the GetIntlResource function. It then extracts the values it wants from the decimalPt, thousSep, and listSep fields.
Listing 6-7 Determining the number separators for the current script
PROCEDURE MyGetNumberSeparators (VAR myDecimal:Char; 
                                 VAR myThousands:Char; 
                                 VAR myListSep:Char);
VAR
   myHandle:      Intl0Hndl;
                           {make sure the desired script is set }
                           { before calling this routine}
BEGIN
   myHandle := Intl0Hndl(GetIntlResource(0));{Get 'itl0' resource}
   myDecimal := myHandle^^.decimalPt;  {for example, 1.234}
   myThousands := myHandle^^.thousSep; {for example, 1,234,567}
   myListSep := myHandle^^.listSep;    {for example, 1;2;3}
END;
IMPORTANT
Do not assume that the components of dates and times are always ordered in a left-to-right direction when displayed. If you are drawing individual time components, be careful not to simply draw them from left to right in all cases. For instance, the AM/PM characters in an English time string are on the right, whereas in an Arabic time string the equivalent characters may be on the left or right, depending on the primary line direction--even though in both cases these characters are at the end of the time string in memory.

Using Number Parts
You can access information on how separators and other parts of formatted numbers are represented in a particular script system by examining the number parts table in the script's tokens ('itl4') resource. Unlike the numeric-format resource, the number parts table supports 2-byte characters; it also contains more information, especially for complicated number formats such as scientific notation.
Your most common reason for obtaining the number parts table may be to pass it as a parameter to the Text Utilities functions StringToFormatRec, FormatRecToString, StringToExtended, and ExtendedToString. But you can also examine its contents. Listing 6-8 shows how to call the GetIntlResourceTable procedure, with a table selector of smNumberPartsTable, to obtain the number parts table associated with a given script. The routine obtains the character associated with the number part specified by thePart and saves it as a wide character, which is a character of either 1 or 2 bytes. (See the discussion of the tokens resource in the appendix "International Resources" for a definition of the WideChar data type.) To specify the system script, the parameter theScript would have the value smSystemScript. The parameter thePart can have such values as tokDecPoint and tokThousands. For a complete list of number-parts constants, see the description of the tokens resource in the appendix "International Resources" in this book.
Listing 6-8 Getting number parts from a script system's number parts table
PROCEDURE MyMapNumPartToWideChar(theScript: ScriptCode; 
                                 thePart: Integer; 
                                 VAR theWChar: WideChar);
VAR
   itlHandle: Handle;
   numpartsOffset: Longint;
   numpartsLength: LongInt;
   numpartsPtr: NumberPartsPtr;

BEGIN
   GetIntlResourceTable(theScript, smNumberPartsTable, 
                        itlHandle, numpartsOffset, 
                        numpartsLength);
   
   IF itlHandle = NIL THEN    {handle errors, }
      theWChar.b := 0         { return null WideChar}
   ELSE BEGIN                 {make numpartsPtr point to }
                              { beginning of number parts table}

      numpartsPtr := NumberPartsPtr(LongInt(itlHandle^) + 
                     numpartsOffset);
      IF thePart > tokMaxSymbols THEN  {invalid number part-- }
                                       { handle  error, }
         theWChar.b := 0               { return null WideChar}
      ELSE BEGIN
         theWChar := numpartsPtr^.data[thePart];
      END;
   END;
END;
Retrieving Text From Tokens
Tokens are abstract entities that stand for classes of text items such as alphanumeric strings, various symbols, and quoted literals. The Script Manager IntlTokenize function converts programming-language text into script-independent tokens useful to compilers or interpreters. See "Tokenization" on page 6-38. The untoken table in a script system's tokens ('itl4') resource has the opposite purpose; it helps you convert script-independent tokens into the text of a given script system.
The untoken table lists the characters associated with each fixed (invariant) token defined by that script. (An invariant token is one that, like tokenColon, represents a unique symbol. Other types of tokens, like tokenAlpha, represent an arbitrary sequence of characters.) If you need to find out, for example, how a given script system represents the "less than or equal to" symbol (is it the 1-byte character "\xE6", a 2-byte encoding of the character "\xE6", the 2-byte, 2-character sequence "<=", or something else altogether?), you can look up the values of tokenLessEqual1 and tokenLessEqual2 in that script's untoken table.
The untoken table is most useful for obtaining script-specific forms for individual common symbols, such as the ellipsis or center dot. If you truncate strings with the ellipsis character (...) or use the center dot () such as AppleShare does for echoing passwords, don't hardcode their character codes; they may not be valid in some script systems. Instead, specify tokenEllipsis or tokenCenterDot, and use the untoken table of the current script system to obtain the proper text for those tokens.

Note
If a script system has no defined character or string that corresponds to a particular token, the untoken table contains either a null string or the string "??" for that token.
You access the untoken table by calling the GetIntlResourceTable procedure with a table selector of smNumberPartsTable. Listing 6-9 provides an example of how to access the untoken table in the tokens resource. This code sample extracts the canonical string associated with a token. It sets the parameter theString to the string that corresponds to the token theToken. (Usually, this string is 4 bytes or less.) To specify the system script, the parameter theScript would have the value smSystemScript. The parameter theToken can have such values as tokenNoBreakSpace, tokenEllipsis, and tokenCenterDot. For a complete list of defined constants for tokens, see "Token Codes" beginning on page 6-58.
Listing 6-9 Getting a token string from the untoken table
PROCEDURE MyMapTokenToString(theScript: ScriptCode; theToken: 
Integer; VAR theString: Str255);

VAR
   itlHandle:        Handle;
   untokenOffset:    LongInt;
   untokenLength:    LongInt;
   untokenPtr:       UntokenTablePtr;
   untokenStringPtr: StringPtr;

BEGIN
   GetIntlResourceTable(theScript, smUnTokenTable, itlHandle, 
                        untokenOffset, untokenLength);
   
   IF itlHandle = NIL THEN    {handle errors, return null string}
      theString := ''
   ELSE BEGIN                 {make untokenPtr point to the }
                              { beginning of the untoken table}
      untokenPtr := UntokenTablePtr(LongInt(itlHandle^) + 
                     untokenOffset);
      IF theToken > untokenPtr^.lastToken THEN  {this token is }
                                             { not in table-- } 
         theString := ''                     { return null string}
      ELSE BEGIN              {index[theToken] is the offset }
                              { of the desired string from the }
                              { beginning of the untoken table}
         untokenStringPtr := StringPtr(LongInt(untokenPtr) +
untokenPtr^.index[theToken]);
         theString := untokenStringPtr^;
      END;
   END;
END;
Even though using the untoken table is conceptually the converse of calling the IntlTokenize function, their purposes are different. IntlTokenize is used as a first step toward compiling or interpreting programming-language source text, and its results are rarely returned or reconverted to source text. The untoken table is most commonly used to supply localized text for individual common tokens.

Using Word-Break Tables
If you use the Text Utilities FindWordBreaks procedure to determine the boundaries of a word, you normally do not need to pass it an explicit pointer to a word-break table. However, if you want to use a custom word-break table you can pass FindWordBreaks a pointer to that table. Word-break tables are in a script system's string-manipulation ('itl2') resource; you can gain access to them by calling the GetIntlResourceTable procedure with a table selector of smWordSelectTable or smWordWrapTable.
There are two possible table selectors because a script system may have different word breaks for word selection than it does for line breaking. If you are using FindWordBreaks to select an individual word, use smWordSelectTable when you call GetIntlResourceTable to obtain the word-break table. If you are using FindWordBreaks to find line breaks, use smWordWrapTable when you call GetIntlResourceTable.

Using Whitespace Information
Most applications that need whitespace information, such as when eliminating extra spaces in text or searching for non-space characters, can get it by calling the CharacterType function. However, if your application needs a listing of all valid whitespace characters in a script system, you can call GetIntlResourceTable with a table selector of smWhiteSpaceList. GetIntlResourceTable returns the whitespace table from the script system's tokens resource.

Character type	Hex. value	Explanation
smCharPunct	$0000	Punctuation (anything but a letter)
smCharAscii	$0001	ASCII letter (not a number or symbol, character code <= $7F)
smCharExtAscii	$0007	High-ASCII Roman letter (not a number or symbol, character code >= $80)

Character type	Hex. value	Explanation
smCharKatakana	$0002	Japanese Katakana
smCharHiragana	$0003	Japanese Hiragana
smCharIdeographic	$0004	Hanzi, Kanji, Hanja
smCharTwoByteGreek	$0005	2-byte Greek in 2-byte scripts
smCharTwoByteRussian	$0006	2-byte Cyrillic in 2-byte scripts
smCharBidirect	$0008	Arabic, Hebrew
smCharContextualLR	$0009	Thai, Indic, etc.
smCharNonContextualLR	$000A	Cyrillic, Greek, etc.
smCharHangul	$000C	Korean Hangul
smCharJamo	$000D	Korean Jamo
smCharBopomofo	$000E	Chinese Bopomofo (Zhuyinfuhao)

Character class	Hex. value	Explanation
smPunctNormal	$0000	Normal punctuation (such as ! , . ;?)
smPunctNumber	$0100	Number character (such as 0-9)
smPunctSymbol	$0200	Nonpunctuation symbol (such as # $ &)
smPunctBlank	$0300	Blank character (such as ASCII $00, $0D, $20)
smPunctRepeat	$0400	Repeat marker in 2-byte script
smPunctGraphic	$0500	Line graphics in 2-byte script

Character class	Hex. value	Explanation
smJamoJaeum	$0000	Simple consonant character
smJamoBogJaeum	$0100	Complex consonant character
smJamoMoeum	$0200	Simple vowel character
smJamoBogMoeum	$0300	Complex vowel character

Character class	Hex. value	Explanation
	$0000	(none of the following defined classes)
smKanaSmall	$0001	Small Kana character
smKanaHardOK	$0002	Can have dakuten
smKanaSoftOK	$0003	Can have dakuten or han-dakuten

Character class	Hex. value	Explanation
smIdeographicLevel1	$0000	Level 1 characters
smIdeographicLevel2	$0100	Level 2 characters
smIdeographicUser	$0200	User characters

	Character orientation	Hex. value	Explanation
	smCharHorizontal	$0000	Character form is for horizontal writing, or for both horizontal and vertical
	smCharVertical	$1000	Character form is for vertical writing only

	Character direction	Hex. value	Explanation
	smCharLeft	$0000	Character with left-to-right line direction
	smCharRight	$2000	Character with right-to-left line direction

	Character case	Hex. value	Explanation
	smCharLower	$0000	Lowercase character
	smCharUpper	$4000	Uppercase character

	Character size	Hex. value	Explanation
	smChar1byte	$0000	1-byte character
	smChar2byte	$8000	2-byte character

Shop the Apple Online Store (1-800-MY-APPLE), visit an Apple Retail Store, or find a reseller.