Important: The information in this document is obsolete and should not be used for new development.
Tables for 2-Byte Script Systems
In 2-byte script systems, the encoding/rendering resource contains byte-type and character-type information. The tables immediately follow the directory in the 'itl5' header. A byte-type table contains character-size information about a specific byte in the range of $00 to $FF. A character-type table contains detailed information about the character represented by a specific byte, given a particular character-encoding scheme.
Table B-9 shows the general structure of a typical encoding/rendering resource for a 2-byte script system.
Byte-Type Table
A byte-type table has 256 integer entries, one for each possible byte value in the range $00 to $FF. Each byte value is an index into the table. At each byte value, the table entry can have one of three values, specifying what kind of character or part of a character that byte value can represent.
When processing text sequentially in a buffer, you encounter a 2-byte character's high-order byte before its low-order byte. Thus you can determine the character relationship of a given byte (1-byte or 2-byte, high-order or low-order byte) by determining its byte type and, if necessary, comparing it with the byte type of the previous byte in the buffer.
- 1 = 1-byte character only
- 0 = 1-byte character or low-order byte of a 2-byte character
- -1 = high-order or low-order byte of a 2-byte character
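The sequential scan described above can be sketched as follows. This is a minimal illustration, not the Script Manager's actual implementation: the function names and role labels are hypothetical, and the demonstration table is filled with byte ranges loosely modeled on Shift-JIS rather than taken from a real 'itl5' resource. The sketch assumes scanning starts at a character boundary, so a byte of type -1 that does not follow a high-order byte is treated as a high-order byte.

```python
# Byte-type values as listed in the byte-type table description.
SINGLE_ONLY = 1      # 1-byte character only
SINGLE_OR_LOW = 0    # 1-byte character or low-order byte of a 2-byte character
HIGH_OR_LOW = -1     # high-order or low-order byte of a 2-byte character


def build_demo_byte_type_table():
    """Build a 256-entry table with illustrative (assumed) values.

    The ranges are loosely modeled on Shift-JIS and are NOT the contents
    of any real encoding/rendering resource.
    """
    table = [SINGLE_ONLY] * 256
    for b in range(0x40, 0xFD):          # bytes that may appear as a low-order byte
        table[b] = SINGLE_OR_LOW
    table[0x7F] = SINGLE_ONLY            # assumed never to be a low-order byte
    for b in list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD)):
        table[b] = HIGH_OR_LOW           # bytes that only occur in 2-byte characters
    return table


def classify_bytes(buffer, table):
    """Return the role of each byte: "single", "high", or "low".

    The role of a byte depends on its own byte type and on whether the
    previous byte in the buffer was a high-order byte.
    """
    roles = []
    expecting_low = False                # True right after a high-order byte
    for b in buffer:
        byte_type = table[b]
        if expecting_low:
            if byte_type == SINGLE_ONLY:
                raise ValueError(f"byte ${b:02X} cannot be a low-order byte")
            roles.append("low")
            expecting_low = False
        elif byte_type == HIGH_OR_LOW:
            roles.append("high")
            expecting_low = True
        else:
            roles.append("single")
    if expecting_low:
        raise ValueError("buffer ends after a high-order byte")
    return roles


demo_table = build_demo_byte_type_table()
print(classify_bytes(bytes([0x41, 0xEA, 0x40, 0x42]), demo_table))
# prints ['single', 'high', 'low', 'single']
```

Note that byte $40 is classified as a low-order byte only because the preceding byte was a high-order byte; on its own, its byte type of 0 would make it a 1-byte character.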
Character-Type Table
The character-type table consists of one high-order byte table and a series of low-order byte tables. The high-order byte table contains 256 word-length entries. The index position of each entry represents a high-order byte value. Nonzero entries mark valid high-order bytes of a 2-byte character. If a given entry is nonzero, it specifies which low-order byte table to consult to get character-type information.
There are one or more low-order byte tables, each of which can contain either a single entry or 256 word-length entries. If all low-order bytes for a given high-order byte have the same character type, the low-order byte "table" for that high-order byte consists of a single character-type value. Otherwise, every possible low-order byte is represented by an index into the table, with an appropriate character-type value at each valid index position.
For example, to find character-type information for the Japanese character with character code $EA40, you would examine location $EA in the high-order byte table; it would indicate the existence of a low-order byte table, and in that table you would examine location $40. That location would contain information showing that the character is a 2-byte JIS level-2 ideographic character that is part of the main character set.
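The two-level lookup can be sketched as shown here. This is a hedged illustration under several assumptions: the nonzero high-order entry is modeled as a simple key into a dictionary of low-order tables (the real resource format may encode it differently, for example as an offset), the character-type values are placeholder strings rather than real word-length values, and all names are hypothetical.

```python
def lookup_char_type(char_code, high_table, low_tables):
    """Return the character-type entry for a 2-byte character code.

    Returns None when the high-order byte's entry is zero, meaning it is
    not a valid high-order byte of a 2-byte character.
    """
    high = (char_code >> 8) & 0xFF       # high-order byte indexes the first table
    low = char_code & 0xFF               # low-order byte indexes the second table
    selector = high_table[high]
    if selector == 0:
        return None
    low_table = low_tables[selector]
    if len(low_table) == 1:
        # Every low-order byte under this high-order byte shares one type.
        return low_table[0]
    return low_table[low]


# Placeholder character types (assumed labels, not real CharacterType values).
LEVEL2_IDEOGRAPH = "2-byte JIS level-2 ideographic, main character set"
SHARED_TYPE = "single type shared by all low-order bytes"
INVALID = "invalid"

# High-order byte table: 256 entries, zero means "not a valid high-order byte".
demo_high_table = [0] * 256
demo_high_table[0xEA] = 1                # consult low-order table 1
demo_high_table[0x88] = 2                # consult low-order table 2

# Low-order byte tables: either a full 256-entry table or a single entry.
full_table = [INVALID] * 256
full_table[0x40] = LEVEL2_IDEOGRAPH
demo_low_tables = {1: full_table, 2: [SHARED_TYPE]}

print(lookup_char_type(0xEA40, demo_high_table, demo_low_tables))
```

For character code $EA40, the sketch reads entry $EA of the high-order byte table, follows it to a 256-entry low-order byte table, and returns the entry at $40. For a high-order byte such as $88 in this demonstration data, the single-entry table is returned regardless of the low-order byte.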
Character types are discussed under the description of the CharacterType function in the chapter "Script Manager" in this book.