The 'cmap' table
General table information
The 'cmap' table maps character codes to glyph indices. The choice of encoding for a particular font is dependent upon the conventions used by the intended platform. A font intended to run on multiple platforms with different encoding conventions will require multiple encoding tables. As a result, the 'cmap' table may contain multiple subtables, one for each supported encoding scheme.
Character codes that do not correspond to any glyph in the font should be mapped to glyph index 0. At this location in the font there must be a special glyph representing a missing character, typically a box. No character code should be mapped to glyph index -1 (0xFFFF), which is a special value reserved in processing to indicate the position of a glyph deleted from the glyph stream.
The 'cmap' table begins with a table version number followed by the number of encoding tables. The encoding subtables follow.
Note that only one of these encoding subtables is used at a time. If multiple encoding subtables are found, the 'cmap' parsing software determines which one to use. The exception is a type 14 encoding subtable, which can only be used in conjunction with another subtable. If multiple type 14 encoding subtables are present, which one is used is undefined.
The original definition of the 'cmap' table used either eight or sixteen bits for character codes. In support for versions of Unicode from 2.0 onwards, it is possible that fonts may require references to data that uses a mixture of sixteen and thirty-two bits per character, or simply thirty-two bits per character.
Type | Name | Description |
---|---|---|
UInt16 | version | Version number (Set to zero) |
UInt16 | numberSubtables | Number of encoding subtables |
The 'cmap' encoding subtables
Each 'cmap' subtable consists of three fields:
Type | Name | Description |
---|---|---|
UInt16 | platformID | Platform identifier |
UInt16 | platformSpecificID | Platform-specific encoding identifier |
UInt32 | offset | Offset of the mapping table |
The 'cmap' subtables must be sorted first in ascending order by platform identifier and then by platform-specific identifier.
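As a minimal sketch of how the header and the encoding records can be read, assuming the whole 'cmap' table is held in memory as big-endian bytes (TrueType data is stored big-endian). The names `rd16`, `rd32`, and `cmap_find_subtable` are hypothetical helpers for illustration only and are not part of any platform API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read big-endian integers from a byte buffer. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: returns the offset (from the start of the 'cmap'
   table) of the subtable recorded for the given platformID and
   platformSpecificID, or 0 if there is no such record or the table is
   malformed. */
static uint32_t cmap_find_subtable(const uint8_t *cmap, size_t size,
                                   uint16_t platformID,
                                   uint16_t platformSpecificID)
{
    assert(cmap != NULL);
    if (size < 4 || rd16(cmap) != 0)        /* version must be zero */
        return 0;
    uint16_t n = rd16(cmap + 2);            /* numberSubtables */
    if (size < 4 + (size_t)n * 8)           /* each record is 8 bytes */
        return 0;
    for (uint16_t i = 0; i < n; i++) {
        const uint8_t *rec = cmap + 4 + (size_t)i * 8;
        if (rd16(rec) == platformID && rd16(rec + 2) == platformSpecificID) {
            uint32_t off = rd32(rec + 4);
            return off < size ? off : 0;    /* reject offsets past the table */
        }
    }
    return 0;
}
```

The sketch scans the records linearly; because the records are sorted, a binary search would also work.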
The platformID and platformSpecificID fields use the same values as are used by the equivalent fields in the 'name' table:
Platform ID | Platform | Platform-specific ID |
---|---|---|
0 | Unicode | Indicates Unicode version. |
1 | Macintosh | Script Manager code. |
2 | (reserved; do not use) | |
3 | Microsoft | Microsoft encoding. |
Unicode encoding
When the platformID is 0 (Unicode), the platformSpecificID is interpreted as follows:
Platform-specific ID code | Meaning |
---|---|
0 | Version 1.0 semantics |
1 | Version 1.1 semantics |
2 | ISO 10646 1993 semantics (deprecated) |
3 | Unicode 2.0 or later semantics (BMP only) |
4 | Unicode 2.0 or later semantics (non-BMP characters allowed) |
5 | Unicode Variation Sequences |
6 | Last Resort |
platformID values other than 0, 1, or 3 are allowed but cmaps using them will be ignored.
The Unicode platform's platform-specific ID 6 was intended to mark a 'cmap' subtable as one used by a last resort font. This is not required by any Apple platform.
Macintosh encoding
When the platformID is 1 (Macintosh), the platformSpecificID is a QuickDraw script code. See the 'name' table documentation for a list of these.
The use of the Macintosh platformID is currently discouraged. Subtables with a Macintosh platformID are only required for backwards compatibility with QuickDraw and will be synthesized from Unicode-based subtables if ever needed.
Windows encoding
When the platformID is 3 (Windows), the platformSpecificID is interpreted as follows:
Platform-specific ID code | Meaning |
---|---|
0 | Symbol |
1 | Unicode BMP-only (UCS-2) |
2 | Shift-JIS |
3 | PRC |
4 | BigFive |
5 | Johab |
10 | Unicode UCS-4 |
Subtable requirements
There are three basic types of encoding subtables:
- Unicode subtables
- Unicode variation sequence subtables
- Non-Unicode subtables (everything else)
A Unicode subtable is one supporting the Unicode text-encoding standard. Such subtables have a Unicode platformID and a platformSpecificID other than 5, or a Microsoft platformID and a platformSpecificID of 1 or 10.
A Unicode variation sequence subtable is one with a Unicode platformID and a platformSpecificID of 5; such a subtable uses format 14.
Unicode subtable requirements
Most fonts should have a Unicode encoding subtable. Fonts with a Windows/Symbol encoding subtable (3/0) should have no Unicode cmaps.
If a font has multiple Unicode encoding subtables, each character should be mapped to the same glyph by every Unicode subtable in which it appears.
Unicode variation sequence subtable requirements
Fonts should not have more than one Unicode variation sequence subtable. If a font does have multiple variation sequence subtables, only one will be used and the others ignored. Which one is used is not defined.
Fonts with a Unicode variation sequence subtable require a Unicode encoding subtable of format 4 or 12.
Non-Unicode subtable requirements
Fonts with a Windows/Symbol encoding subtable (3/0) should have no Unicode cmaps.
The use of subtables with a Macintosh platformID is discouraged.
Subtable search order
Other than a Unicode variation sequence subtable, exactly one encoding subtable will be used to map characters to glyphs. The encoding subtables are searched for the one most appropriate for use, determined by their platform/platformSpecificID values. Apple does not guarantee that Unicode encoding subtables will be looked for in any particular order. However:
- Unicode encoding subtables are used in preference to non-Unicode encoding subtables.
- Unicode encoding subtables not restricted to the BMP are used in preference to subtables restricted to the BMP.
- Unicode variation sequence subtables are always processed if another Unicode cmap of type 4 or 12 is present.
For example, if a font has both a 0/4 (Unicode, UCS-4) cmap and a 0/3 (Unicode/BMP-only) cmap, the former will be used and the latter ignored. Nonetheless, they should have identical mappings for Unicode's BMP.
If a font has a 3/10 cmap (Windows, UCS-4), it should also have a 3/1 (Windows, BMP-only) cmap for backwards compatibility with Windows XP. There is no equivalent requirement for Apple platforms.
See the OpenType specification for further requirements for the Windows platform.
Note that because an encoding subtable record uses an offset to the actual data for its platform/platformSpecificID pair, it's possible for a 0/4 cmap and a 3/10 cmap to point to the same data in the file, rather than to two identical copies.
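The preferences above can be sketched as a ranking function. This is only an illustration of the stated rules, not the search any platform actually performs: the numeric ranks, the tie-break (first record wins), and the names `EncodingID`, `cmap_rank`, and `cmap_pick` are all assumptions. In particular, how a 0/6 (Last Resort) subtable should rank is not stated in the text; the sketch treats it like the BMP-restricted Unicode subtables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t platformID;
    uint16_t platformSpecificID;
} EncodingID;

/* Hypothetical ranking: higher is better; 0 means "never used as the
   main character-to-glyph mapping". */
static int cmap_rank(EncodingID e)
{
    if (e.platformID == 0) {
        if (e.platformSpecificID == 5) return 0; /* variation sequences */
        if (e.platformSpecificID == 4) return 3; /* Unicode, non-BMP allowed */
        return 2;                                /* other Unicode (assumption for 0/6) */
    }
    if (e.platformID == 3) {
        if (e.platformSpecificID == 10) return 3; /* Unicode UCS-4 */
        if (e.platformSpecificID == 1) return 2;  /* Unicode BMP-only */
        return 1;                                 /* non-Unicode */
    }
    if (e.platformID == 1) return 1;              /* Macintosh, non-Unicode */
    return 0;                                     /* other platformIDs are ignored */
}

/* Returns the index of the preferred record, or -1 if none is usable. */
static int cmap_pick(const EncodingID *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    int best = -1, bestRank = 0;
    for (size_t i = 0; i < n; i++) {
        int r = cmap_rank(recs[i]);
        if (r > bestRank) { bestRank = r; best = (int)i; }
    }
    return best;
}
```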
The 'cmap' table and language codes
Each 'cmap' subtable has a two-byte language code associated with it. This language code is only used for subtables with a Macintosh platformID. For such subtables, it is interpreted as one more than a QuickDraw language code, or zero if the subtable is language independent. In all other subtables, it should be zero.
Note that the use of the Macintosh platformID is currently discouraged. Subtables with a Macintosh platformID are only required for backwards compatibility with QuickDraw and will be synthesized from Unicode-based subtables if ever needed.
The 'cmap' formats
Each 'cmap' subtable is in one of nine currently available formats. These are format 0, format 2, format 4, format 6, format 8, format 10, format 12, format 13, and format 14 described in the next section.
The Macintosh standard character to glyph mapping is supported by format 0. Format 2 supports a mixed 8/16 bit mapping useful for Japanese, Chinese and Korean. Format 4 is used for 16 bit mappings. Format 6 is used for dense 16 bit mappings.
Formats 8, 10, 12, 13, and 14 are used for mixed 16/32-bit and pure 32-bit mappings. This supports text encoded with surrogates in Unicode 2.0 and later.
Many of the cmap formats are either obsolete or were designed to meet anticipated needs which never materialized. Modern font generation tools might not need to be able to write general-purpose cmaps in formats other than 4 and 12. Formats 13 and 14 are both for specialized uses. Format 13 is structurally the same as format 12 (but with a different interpretation of the data), so support for it (if needed) is relatively easy to provide. Support for format 14 encoding subtables is required for use with Unicode variation selectors.
All cmap formats are supported by Apple platforms except 0, 8, and 10. See the OpenType specification for information on encoding subtable format support on other platforms.
'cmap' format 0
Format 0 is suitable for fonts whose character codes and glyph indices are restricted to single bytes. This was a very common situation when TrueType was introduced but is rarely encountered now.
Type | Name | Description |
---|---|---|
UInt16 | format | Set to 0 |
UInt16 | length | Length in bytes of the subtable (set to 262 for format 0) |
UInt16 | language | Language code (see above) |
UInt8 | glyphIndexArray[256] | An array that maps character codes to glyph index values |
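A format 0 lookup is a single array access. The following is a minimal sketch, assuming the subtable is available as raw big-endian bytes; `cmap0_lookup` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Hypothetical helper: maps a character code through a format 0 subtable.
   Codes above 255, or a subtable with the wrong format or length, give
   glyph 0 (the missing character glyph). */
static uint16_t cmap0_lookup(const uint8_t *sub, uint32_t c)
{
    assert(sub != NULL);
    if (rd16(sub) != 0 || rd16(sub + 2) != 262)
        return 0;
    if (c > 255)
        return 0;
    /* glyphIndexArray follows the format, length, and language fields. */
    return sub[6 + c];
}
```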
'cmap' format 2
The format 2 mapping subtable type is used for fonts containing Japanese, Chinese, or Korean characters. The code standards used in this table are supported on Macintosh systems in Asia. These fonts contain a mixed 8/16-bit encoding, in which certain byte values are set aside to signal the first byte of a 2-byte character. These special values are also legal as the second byte of a 2-byte character.
The following table shows the format of a format 2 encoding subtable. The subHeaderKeys array maps each possible high byte into a particular member of the subHeaders array. This allows the determination of whether or not a second byte is used. In either case, the path leads into the glyphIndexArray from which the mapped glyph index is obtained. The sequence of operations is as follows:
Consider a high byte, i, designating an integer between 0 and 255. The value subHeaderKeys[i], divided by 8, is the index k into the subHeaders array. The value k = 0 is special. It means that i is a one-byte code and no second byte will be referenced. If k is positive, then i is the high byte of a two-byte code and its second byte, j, will be consumed.
Type | Name | Description |
---|---|---|
UInt16 | format | Set to 2 |
UInt16 | length | Total table length in bytes |
UInt16 | language | Language code (see above) |
UInt16 | subHeaderKeys[256] | Array that maps high bytes to subHeaders: value is index * 8 |
UInt16 * 4 | subHeaders[variable] | Variable length array of subHeader structures |
UInt16 | glyphIndexArray[variable] | Variable length array containing subarrays |
The subHeader data type is a 4-word structure defined by the C-language structure shown below:
typedef struct { UInt16 firstCode; UInt16 entryCount; int16 idDelta; UInt16 idRangeOffset; } subHeader;
If k is positive, then the four values belonging to subHeaders[k] are used as follows, with firstCode and entryCount defining the allowable range for the second byte j:
firstCode <= j < (firstCode + entryCount)
If j is outside this range, index 0 (the missing character glyph) is returned. Otherwise, idRangeOffset is used to identify the associated range within the glyphIndexArray. The glyphIndexArray immediately follows the subHeaders array and may be loosely viewed as an extension to it. The value of the idRangeOffset is the number of bytes past the actual location of the idRangeOffset word where the glyphIndexArray element corresponding to firstCode appears. Let p be the glyphIndexArray element for j, which lies (j - firstCode) words past that element. If p is zero, it is returned directly. If p is nonzero, p + idDelta is returned. The sum is reduced modulo 65536, if necessary.
For the one-byte case with k = 0, the structure subHeaders[0] will show firstCode = 0, entryCount = 256, and idDelta = 0. The idRangeOffset will point, as previously discussed, to the beginning of the glyphIndexArray. Indexing i words into this array gives the returned value p = glyphIndexArray[i].
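The sequence of operations for format 2 can be sketched as follows, assuming the subtable is available as raw big-endian bytes. The function name `cmap2_lookup`, its parameters, and the bounds checks against the length field are assumptions made for illustration. The fixed offsets follow from the table above: subHeaderKeys starts at byte 6 and subHeaders at byte 6 + 512 = 518.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Hypothetical helper: i is the first byte of the text, j the byte after
   it. *usedSecond is set to 1 if j was consumed as a second byte. */
static uint16_t cmap2_lookup(const uint8_t *sub, uint8_t i, uint8_t j,
                             int *usedSecond)
{
    assert(sub != NULL && usedSecond != NULL);
    uint32_t length = rd16(sub + 2);
    uint32_t k = rd16(sub + 6 + 2 * (uint32_t)i) / 8; /* subHeaderKeys[i] / 8 */
    uint32_t shOff = 518 + 8 * k;                     /* subHeaders[k] */
    *usedSecond = (k != 0);
    if (shOff + 8 > length)
        return 0;
    const uint8_t *sh = sub + shOff;
    uint32_t firstCode = rd16(sh);
    uint32_t entryCount = rd16(sh + 2);
    uint16_t idDelta = rd16(sh + 4);   /* read unsigned; sums are modulo 65536 */
    uint32_t idRangeOffset = rd16(sh + 6);
    uint32_t code = (k == 0) ? i : j;  /* one-byte codes index by i itself */
    if (code < firstCode || code >= firstCode + entryCount)
        return 0;
    /* idRangeOffset counts bytes from the idRangeOffset word itself. */
    uint32_t at = shOff + 6 + idRangeOffset + 2 * (code - firstCode);
    if (at + 2 > length)
        return 0;
    uint16_t p = rd16(sub + at);
    return p == 0 ? 0 : (uint16_t)(p + idDelta);
}
```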
The format 2 cmap is targeted at mixed 8/16-bit encodings such as Big Five and Shift-JIS. Such encodings are still widely used, and Apple platforms will correctly support them even if a format 2 cmap is not present.
'cmap' format 4
Format 4 is a two-byte encoding format. It should be used when the character codes for a font fall into several contiguous ranges, possibly with holes in some or all of the ranges. That is, some of the codes in a range may not be associated with glyphs in the font. Two-byte fonts that are densely mapped should use Format 6.
The table begins with the format number, the length and language. The format-dependent data follows. It is divided into three parts:
- A four-word header giving parameters needed for an optimized search of the segment list
- Four parallel arrays describing the segments (one segment for each contiguous range of codes)
- A variable-length array of glyph IDs
Type | Name | Description |
---|---|---|
UInt16 | format | Format number is set to 4 |
UInt16 | length | Length of subtable in bytes |
UInt16 | language | Language code (see above) |
UInt16 | segCountX2 | 2 * segCount |
UInt16 | searchRange | 2 * (2**FLOOR(log2(segCount))) |
UInt16 | entrySelector | log2(searchRange/2) |
UInt16 | rangeShift | (2 * segCount) - searchRange |
UInt16 | endCode[segCount] | Ending character code for each segment, last = 0xFFFF. |
UInt16 | reservedPad | This value should be zero |
UInt16 | startCode[segCount] | Starting character code for each segment |
UInt16 | idDelta[segCount] | Delta for all character codes in segment |
UInt16 | idRangeOffset[segCount] | Offset in bytes to glyphIndexArray, or 0 |
UInt16 | glyphIndexArray[variable] | Glyph index array |
The number of segments is specified by the variable segCount. This variable is not stored explicitly in the Format 4 table (only segCountX2 is), but it is the number from which all of the table parameters are derived. The segCount is the number of contiguous code ranges in the font. The searchRange value is twice the largest power of 2 that is less than or equal to segCount.
The searchRange, entrySelector, and rangeShift fields are not used on Apple platforms but should be set correctly for compatibility with other platforms.
Example Format 4 subtable values are shown in this table:
Name | Value | Derivation |
---|---|---|
segCount | 39 | Not calculated; determined from the organization of the glyph indices |
searchRange | 64 | (2 * (largest power of 2 <= 39)) = 2 * 32 |
entrySelector | 5 | log2(searchRange / 2) = log2(32) |
rangeShift | 14 | (2 * segCount) - searchRange = (2 * 39) - 64 |
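For a font tool that writes format 4 subtables, the header values can be derived from segCount as in the following sketch. The type `Cmap4SearchParams` and the function `cmap4_search_params` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t segCountX2;
    uint16_t searchRange;
    uint16_t entrySelector;
    uint16_t rangeShift;
} Cmap4SearchParams;

/* Hypothetical helper: computes the four header words from segCount. */
static Cmap4SearchParams cmap4_search_params(uint16_t segCount)
{
    assert(segCount >= 1 && segCount <= 32767);
    uint16_t pow2 = 1;      /* largest power of 2 <= segCount */
    uint16_t exponent = 0;  /* log2 of that power */
    while ((uint32_t)pow2 * 2 <= segCount) {
        pow2 = (uint16_t)(pow2 * 2);
        exponent++;
    }
    Cmap4SearchParams p;
    p.segCountX2 = (uint16_t)(2 * segCount);
    p.searchRange = (uint16_t)(2 * pow2);
    p.entrySelector = exponent;                 /* log2(searchRange / 2) */
    p.rangeShift = (uint16_t)(p.segCountX2 - p.searchRange);
    return p;
}
```

For segCount = 39 this gives segCountX2 = 78, searchRange = 64, entrySelector = 5, and rangeShift = 14.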
Each segment is described by a startCode, an endCode, an idDelta and an idRangeOffset. These are used for mapping the character codes in the segment. The segments are sorted in order of increasing endCode values.
To use these arrays, it is necessary to search for the first endCode that is greater than or equal to the character code to be mapped. If the corresponding startCode is less than or equal to the character code, then use the corresponding idDelta and idRangeOffset to map the character code to the glyph index. Otherwise, the missing character glyph is returned. To ensure that the search will terminate, the final endCode value must be 0xFFFF. This segment need not contain any valid mappings. It can simply map the single character code 0xFFFF to the missing character glyph, glyph 0.
If the idRangeOffset value for the segment is not 0, the mapping of the character codes relies on the glyphIndexArray. The character code offset from startCode is added to the idRangeOffset value. This sum is used as an offset from the current location within idRangeOffset itself to locate the correct glyphIndexArray value. This indexing method works because glyphIndexArray immediately follows idRangeOffset in the font file. The address of the glyph index is given by the following equation:
glyphIndexAddress = idRangeOffset[i] + 2 * (c - startCode[i]) + (Ptr) &idRangeOffset[i]
Multiplication by 2 in this equation is required to convert the value into bytes.
Alternatively, one may use an expression such as:
glyphIndex = *( &idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]) )
This form depends on idRangeOffset being an array of UInt16's.
Once the glyph indexing operation is complete, the glyph ID at the indicated address is checked. If it's not 0 (that is, if it's not the missing glyph), the value is added to idDelta[i] to get the actual glyph ID to use.
If the idRangeOffset is 0, the idDelta value is added directly to the character code to get the corresponding glyph index:
glyphIndex = idDelta[i] + c
NOTE: All idDelta[i] arithmetic is modulo 65536.
The following table gives an example of the parameters required to map characters 10-20, 30-90, and 100-153 to a contiguous range of glyph indices. The example data demonstrates how the character-to-glyph index mapping values are calculated. For this example segCount is 4, so segCountX2 is 8, searchRange is 8, entrySelector is 2, and rangeShift is 0.
Name | Segment 1 (Chars 10-20) | Segment 2 (Chars 30-90) | Segment 3 (Chars 100-153) | Segment 4 (Missing Glyph) |
---|---|---|---|---|
endCode | 20 | 90 | 153 | 0xFFFF |
startCode | 10 | 30 | 100 | 0xFFFF |
idDelta | -9 | -18 | -27 | 1 |
idRangeOffset | 0 | 0 | 0 | 0 |
This table performs the following mappings:
- 10 is mapped to 10 - 9, or 1
- 20 is mapped to 20 - 9, or 11
- 30 is mapped to 30 - 18, or 12
- 90 is mapped to 90 - 18, or 72
and so on.
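The format 4 lookup described above can be sketched as follows, assuming the subtable is available as raw big-endian bytes. The name `cmap4_lookup` and the bounds checks against the length field are assumptions; the sketch uses a linear search over endCode for brevity, where a real implementation might use a binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Hypothetical helper: maps character code c through a format 4 subtable. */
static uint16_t cmap4_lookup(const uint8_t *sub, uint32_t c)
{
    assert(sub != NULL);
    if (c > 0xFFFF)
        return 0;
    uint32_t length = rd16(sub + 2);
    uint32_t segCount = rd16(sub + 6) / 2;
    /* Byte offsets of the four parallel arrays within the subtable. */
    uint32_t endCode = 14;
    uint32_t startCode = endCode + 2 * segCount + 2;  /* skips reservedPad */
    uint32_t idDelta = startCode + 2 * segCount;
    uint32_t idRangeOffset = idDelta + 2 * segCount;
    if (idRangeOffset + 2 * segCount > length)
        return 0;
    /* Find the first segment whose endCode is >= c. */
    uint32_t i = 0;
    while (i < segCount && rd16(sub + endCode + 2 * i) < c)
        i++;
    if (i == segCount)
        return 0;
    uint32_t start = rd16(sub + startCode + 2 * i);
    if (start > c)
        return 0;                       /* c falls in a gap between segments */
    uint16_t delta = rd16(sub + idDelta + 2 * i);
    uint32_t slot = idRangeOffset + 2 * i;  /* location of idRangeOffset[i] */
    uint32_t rangeOffset = rd16(sub + slot);
    if (rangeOffset == 0)
        return (uint16_t)(c + delta);   /* arithmetic is modulo 65536 */
    uint32_t at = slot + rangeOffset + 2 * (c - start);
    if (at + 2 > length)
        return 0;
    uint16_t g = rd16(sub + at);
    return g == 0 ? 0 : (uint16_t)(g + delta);
}
```

With the four example segments above, negative idDelta values are stored as their 16-bit two's complement (for example, -9 as 0xFFF7), and the cast to a 16-bit result performs the modulo 65536 reduction.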
Type 4 cmaps are required for backwards compatibility in Windows and are generally useful for BMP-only Unicode fonts. The redundancies in the header are for historical purposes—by pre-calculating these values, the performance of the lookup algorithm was substantially improved on older, slower processors.
'cmap' format 6
Format 6 is used to map 16-bit, 2-byte, characters to glyph indexes. It is sometimes called the trimmed table mapping. It should be used when character codes for a font fall into a single contiguous range. This results in what is termed a dense mapping. Two-byte fonts that are not densely mapped (due to their multiple contiguous ranges) should use Format 4. Character-to-glyph index mapping subtable Format 6 is shown in the following table:
Type | Name | Description |
---|---|---|
UInt16 | format | Format number is set to 6 |
UInt16 | length | Length in bytes |
UInt16 | language | Language code (see above) |
UInt16 | firstCode | First character code of subrange |
UInt16 | entryCount | Number of character codes in subrange |
UInt16 | glyphIndexArray[entryCount] | Array of glyph index values for character codes in the range |
The firstCode and entryCount values in the subtable specify the useful subrange within the range of possible character codes. The range begins with firstCode and has a length equal to entryCount. Codes outside of this subrange are assumed to be missing and are mapped to the glyph with index 0. For a code within the subrange, its offset from the firstCode in the subrange is used as an index into the glyphIndexArray. That array provides the glyph index associated with that character code.
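A format 6 lookup reduces to a range check and an array index. The following is a minimal sketch, assuming the subtable is available as raw big-endian bytes; `cmap6_lookup` is a hypothetical name and the check against the length field is an added safeguard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Hypothetical helper: maps character code c through a format 6 subtable. */
static uint16_t cmap6_lookup(const uint8_t *sub, uint32_t c)
{
    assert(sub != NULL);
    uint32_t length = rd16(sub + 2);
    uint32_t firstCode = rd16(sub + 6);
    uint32_t entryCount = rd16(sub + 8);
    if (c < firstCode || c - firstCode >= entryCount)
        return 0;                        /* outside the subrange */
    uint32_t at = 10 + 2 * (c - firstCode);
    if (at + 2 > length)
        return 0;                        /* array runs past the subtable */
    return rd16(sub + at);
}
```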
Type 6 cmaps are primarily useful for BMP-only Unicode fonts.
'cmap' format 8–Mixed 16-bit and 32-bit coverage
Format 8 is a bit like format 2, in that it provides for mixed-length character codes. If a font contains Unicode surrogates, it's likely that it will also include other, regular 16-bit Unicodes as well. This requires a format to map a mixture of 16-bit and 32-bit character codes, just as format 2 allows a mixture of 8-bit and 16-bit codes. A simplifying assumption is made: namely, that there are no 32-bit character codes which share the same first 16 bits as any 16-bit character code. This means that the determination as to whether a particular 16-bit value is a standalone character code or the start of a 32-bit character code can be made by looking at the 16-bit value directly, with no further information required.
Here's the format 8 subtable format:
Type | Name | Description |
---|---|---|
UInt16 | format | Subtable format; set to 8 |
UInt16 | reserved | Set to 0 |
UInt32 | length | Byte length of this subtable (including the header) |
UInt32 | language | Language code (see above) |
UInt8 | is32[8192] | Tightly packed array of bits (8K bytes total, one bit for each of the 65536 possible 16-bit values) indicating whether the particular 16-bit (index) value is the start of a 32-bit character code |
UInt32 | nGroups | Number of groupings which follow |
Here follow the individual groups. Each group has the following format:
Type | Name | Description |
---|---|---|
UInt32 | startCharCode | First character code in this group; note that if this group is for one or more 16-bit character codes (which is determined from the is32 array), this 32-bit value will have the high 16-bits set to zero |
UInt32 | endCharCode | Last character code in this group; same condition as listed above for the startCharCode |
UInt32 | startGlyphCode | Glyph index corresponding to the starting character code |
A few notes here. The endCharCode is used, rather than a count, because comparisons for group matching are usually done on an existing character code, and having the endCharCode be there explicitly saves the necessity of an addition per group.
The presence of the packed array of bits indicating whether a particular 16-bit value is the start of a 32-bit character code is useful even when the font contains no glyphs for a particular 16-bit start value. This is because the system software often needs to know how many bytes ahead the next character begins, even if the current character maps to the missing glyph. By including this information explicitly in this table, no "secret" knowledge needs to be encoded into the OS.
Thus, although cmap format 8 is theoretically well-suited for Unicode text encoded using surrogates, it also has the flexibility to be used with other character set encodings.
To determine if a particular word (cp) is the first half of a thirty-two bit code point, one can use an expression such as ( is32[ cp / 8 ] & ( 1 << ( 7 - ( cp % 8 ) ) ) ). If this is non-zero, then the word is the first half of a thirty-two bit code point.
0 is not a special value for the high word of a 32-bit code point. A font may not have both a glyph for the code point 0x0000 and glyphs for code points with a high word of 0x0000.
Type 8 cmaps have not seen any particular use since the format was introduced. Rendering engines generally do not deal with Unicode surrogates directly—the character codes are usually converted to UTF-32 before conversion to glyphs—and no other character encoding uses mixed 16/32-bit characters. The use of this format is discouraged.
'cmap' format 10–Trimmed array
Format 10 is a bit like format 6, in that it defines a trimmed array for a tight range of 32-bit character codes:
Type | Name | Description |
---|---|---|
UInt16 | format | Subtable format; set to 10 |
UInt16 | reserved | Set to 0 |
UInt32 | length | Byte length of this subtable (including the header) |
UInt32 | language | Language code (see above) |
UInt32 | startCharCode | First character code covered |
UInt32 | numChars | Number of character codes covered |
UInt16 | glyphs[] | Array of glyph indices for the character codes covered |
The format 10 cmap has seen little use since its introduction. It is not supported on Windows and is the best choice only for fonts whose character repertoire is almost entirely in a contiguous block outside of Unicode's BMP. Such fonts are rare.
'cmap' format 12–Segmented coverage
Format 12 is a bit like format 4, in that it defines segments for sparse representation in 4-byte character space. Here's the subtable format:
Type | Name | Description |
---|---|---|
UInt16 | format | Subtable format; set to 12 |
UInt16 | reserved | Set to 0. |
UInt32 | length | Byte length of this subtable (including the header) |
UInt32 | language | Language code (see above) |
UInt32 | nGroups | Number of groupings which follow |
Here follow the individual groups, each of which has the following format:
Type | Name | Description |
---|---|---|
UInt32 | startCharCode | First character code in this group |
UInt32 | endCharCode | Last character code in this group |
UInt32 | startGlyphCode | Glyph index corresponding to the starting character code; subsequent characters are mapped to sequential glyphs |
Again, the endCharCode is used, rather than a count, because comparisons for group matching are usually done on an existing character code, and having the endCharCode be there explicitly saves the necessity of an addition per group.
Format 12 is required for Unicode fonts covering characters above U+FFFF on Windows. It is the most useful of the cmap formats with 32-bit support.
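A format 12 lookup can be sketched as a binary search over the groups, assuming the subtable is available as raw big-endian bytes and that the groups are sorted by startCharCode and do not overlap, as the OpenType specification requires. The name `cmap12_lookup` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: maps character code c through a format 12 subtable. */
static uint32_t cmap12_lookup(const uint8_t *sub, uint32_t c)
{
    assert(sub != NULL);
    uint32_t length = rd32(sub + 4);
    uint32_t nGroups = rd32(sub + 12);
    /* The header is 16 bytes and each group is 12 bytes. */
    if (length < 16 || nGroups > (length - 16) / 12)
        return 0;
    uint32_t lo = 0, hi = nGroups;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const uint8_t *g = sub + 16 + (size_t)mid * 12;
        uint32_t start = rd32(g);
        uint32_t end = rd32(g + 4);
        if (c < start)
            hi = mid;
        else if (c > end)
            lo = mid + 1;
        else
            return rd32(g + 8) + (c - start); /* sequential glyphs */
    }
    return 0;                                 /* missing character glyph */
}
```

Comparing against endCharCode directly, rather than computing it from a count, is the saving the paragraph above refers to.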
'cmap' format 13–Many-to-one mappings
Format 13 is a modified version of the type 12 'cmap' subtable, used internally by Apple for its last resort font and by Unicode's last resort font. It would, in general, not be appropriate for any font other than a last resort font.
Structurally, formats 13 and 12 are identical. The only difference lies in the interpretation of the glyph code in each range. In a format 13 'cmap' subtable, all the characters in the range are mapped to the same glyph code, whereas in a format 12 'cmap' subtable they are mapped to sequential glyph codes, starting with the given one.
To illustrate, suppose we have the following group in both a format 12 'cmap' subtable and a format 13 'cmap' subtable:
Type | Name | Value | Description |
---|---|---|---|
UInt32 | startCharCode | 0x4E00 | First character code in this group |
UInt32 | endCharCode | 0x9FCB | Last character code in this group |
UInt32 | glyphCode | 47 | Glyph index for this group |
In a type 12 'cmap' subtable, U+4E95 would be mapped to glyph (0x4E95 - 0x4E00) + 47 = 196. In a format 13 'cmap' subtable, it would be mapped to glyph 47.
Type 13 cmaps are only useful for fonts which use the same glyph for all the characters in a Unicode block—that is, only a Last Resort-like font will ever be likely to need one.
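The difference between the two interpretations fits in one expression. The following sketch uses a hypothetical function `cmap_group_glyph` that takes the fields of a single group as already-decoded values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: maps c through one group of a format 12 or
   format 13 subtable; returns 0 if c is outside the group. */
static uint32_t cmap_group_glyph(int format, uint32_t startCharCode,
                                 uint32_t endCharCode, uint32_t glyphCode,
                                 uint32_t c)
{
    assert(format == 12 || format == 13);
    if (c < startCharCode || c > endCharCode)
        return 0;
    if (format == 13)
        return glyphCode;                   /* many-to-one */
    return glyphCode + (c - startCharCode); /* sequential glyphs */
}
```

For the group above, U+4E95 yields glyph 196 when the format is 12 and glyph 47 when it is 13.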
Format 14: Unicode Variation Sequences
Subtable format 14 specifies the Unicode Variation Sequences (UVSes) supported by the font. A Variation Sequence, according to the Unicode Standard, comprises a base character followed by a variation selector; e.g. <U+82A6, U+E0101>.
The subtable partitions the UVSes supported by the font into two categories: “default” and “non-default” UVSes. Given a UVS, if the glyph obtained by looking up the base character of that sequence in the Unicode encoding subtable (i.e. the UCS-4 or the BMP encoding subtable) is the glyph to use for that sequence, then the sequence is a “default” UVS; otherwise it is a “non-default” UVS, and the glyph to use for that sequence is specified in the format 14 subtable itself.
The example below shows how a font vendor can use format 14 for a JIS-2004–aware font.
(Note the presence of 24-bit integers in the structures used. The type 14 'cmap' subtable does not keep data aligned to four-byte boundaries. This is also the only 'cmap' subtable which does not stand by itself and is not completely independent of all others; a 'cmap' may not consist of a type 14 subtable alone.)
Type | Name | Description |
---|---|---|
uint16 | format | Subtable format. Set to 14. |
uint32 | length | Byte length of this subtable (including this header) |
uint32 | numVarSelectorRecords | Number of variation Selector Records |
This is immediately followed by ‘numVarSelectorRecords’ Variation Selector Records.
Type | Name | Description |
---|---|---|
uint24 | varSelector | Variation selector |
uint32 | defaultUVSOffset | Offset to Default UVS Table. May be 0. |
uint32 | nonDefaultUVSOffset | Offset to Non-Default UVS Table. May be 0. |
The Variation Selector Records are sorted in increasing order of ‘varSelector’. No two records may have the same ‘varSelector’. All offsets in a record are relative to the beginning of the format 14 encoding subtable.
A Variation Selector Record and the data its offsets point to specify those UVSes supported by the font for which the variation selector is the ‘varSelector’ value of the record. The base characters of the UVSes are stored in the tables pointed to by the offsets. The UVSes are partitioned by whether they are default or non-default UVSes.
Glyph IDs to be used for non-default UVSes are specified in the Non-Default UVS table.
Default UVS Table
A Default UVS Table is simply a range-compressed list of Unicode scalar values, representing the base characters of the default UVSes which use the varSelector of the associated Variation Selector Record.
Type | Name | Description |
---|---|---|
uint32 | numUnicodeValueRanges | Number of ranges that follow |
This is immediately followed by numUnicodeValueRanges Unicode Value Ranges, each of which represents a contiguous range of Unicode values.
Type | Name | Description |
---|---|---|
uint24 | startUnicodeValue | First value in this range |
BYTE | additionalCount | Number of additional values in this range |
For example, the range U+4E4D–U+4E4F (3 values) will set startUnicodeValue to 0x004E4D and additionalCount to 2. A singleton range will set additionalCount to 0.
(startUnicodeValue + additionalCount) must not exceed 0xFFFFFF.
The Unicode Value Ranges are sorted in increasing order of startUnicodeValue. The ranges must not overlap; i.e., (startUnicodeValue + additionalCount) must be less than the startUnicodeValue of the following range (if any).
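Because the ranges are sorted and do not overlap, membership in a Default UVS Table can be tested with a binary search. The following is a minimal sketch, assuming `table` points at the first byte of the Default UVS Table as raw big-endian bytes; `rd24` and `default_uvs_contains` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd24(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: returns 1 if base is listed in the Default UVS
   Table, 0 otherwise. Each range is 4 bytes: a uint24 start value
   followed by a one-byte additionalCount. */
static int default_uvs_contains(const uint8_t *table, uint32_t base)
{
    assert(table != NULL);
    uint32_t lo = 0, hi = rd32(table);   /* numUnicodeValueRanges */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const uint8_t *r = table + 4 + (size_t)mid * 4;
        uint32_t start = rd24(r);
        uint32_t end = start + r[3];     /* start + additionalCount */
        if (base < start)
            hi = mid;
        else if (base > end)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```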
Non-Default UVS Table
A Non-Default UVS Table is a list of pairs of Unicode scalar values and glyph IDs. The Unicode values represent the base characters of all non-default UVSes which use the ‘varSelector’ of the associated Variation Selector Record, and the glyph IDs specify the glyph IDs to use for the UVSes.
Type | Name | Description |
---|---|---|
uint32 | numUVSMappings | Number of UVS Mappings that follow |
This is immediately followed by ‘numUVSMappings’ UVS Mappings.
Type | Name | Description |
---|---|---|
uint24 | unicodeValue | Base Unicode value of the UVS |
uint16 | glyphID | Glyph ID of the UVS |
The UVS Mappings are sorted in increasing order of unicodeValue. No two mappings in this table may have the same unicodeValue values.
Example
Here is an example of how a format 14 encoding subtable may be used in a font that is aware of JIS-2004 variant glyphs. The CIDs (character IDs) in this example refer to those in the Adobe Character Collection “Adobe-Japan1”, and may be assumed to be identical to the glyph IDs in the font in our example.
JIS-2004 changed the default glyph variants for some of its code points. For example:
JIS-90: U+82A6 ⇒ CID 1142
JIS-2004: U+82A6 ⇒ CID 7961
Both of these glyph variants are supported through the use of UVSes, as the following examples from Unicode’s UVS registry show:
U+82A6 U+E0100 ⇒ CID 1142
U+82A6 U+E0101 ⇒ CID 7961
If the font wants to support the JIS-2004 variants by default, it will:
- encode glyph ID 7961 at U+82A6 in the Unicode encoding subtable,
- specify <U+82A6, U+E0101> in the UVS encoding subtable’s Default UVS Table (‘varSelector’ will be 0x0E0101 and ‘defaultUVSOffset’ will point to data containing a 0x0082A6 Unicode value),
- specify <U+82A6, U+E0100> ⇒ glyph ID 1142 in the UVS encoding subtable’s Non-Default UVS Table (‘varSelector’ will be 0x0E0100 and ‘nonDefaultUVSOffset’ will point to data containing a ‘unicodeValue’ 0x0082A6 and ‘glyphID’ 1142).
If, however, the font wants to support the JIS-90 variants by default, it will:
- encode glyph ID 1142 at U+82A6 in the Unicode encoding subtable,
- specify <U+82A6, U+E0100> in the UVS encoding subtable’s Default UVS Table,
- specify <U+82A6, U+E0101> ⇒ glyph ID 7961 in the UVS encoding subtable’s Non-Default UVS Table.
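Resolving a variation sequence against a format 14 subtable can be sketched as follows, assuming the subtable is available as raw big-endian bytes. The function `uvs_glyph` and its parameters are hypothetical: `baseGlyph` is the glyph that the Unicode encoding subtable gives for the base character, and a return value of 0 here means only that the sequence is not listed, leaving any fallback to the caller. The sketch scans linearly for brevity and omits bounds checks against the length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd24(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: returns the glyph for <base, selector>. */
static uint16_t uvs_glyph(const uint8_t *sub, uint32_t base,
                          uint32_t selector, uint16_t baseGlyph)
{
    assert(sub != NULL);
    uint32_t n = rd32(sub + 6);                   /* numVarSelectorRecords */
    for (uint32_t i = 0; i < n; i++) {
        const uint8_t *rec = sub + 10 + (size_t)i * 11; /* 11-byte records */
        if (rd24(rec) != selector)
            continue;
        uint32_t defOff = rd32(rec + 3);
        uint32_t nonDefOff = rd32(rec + 7);
        if (defOff != 0) {                        /* Default UVS Table */
            const uint8_t *t = sub + defOff;
            uint32_t count = rd32(t);
            for (uint32_t k = 0; k < count; k++) {
                const uint8_t *r = t + 4 + (size_t)k * 4;
                uint32_t start = rd24(r);
                if (base >= start && base - start <= r[3])
                    return baseGlyph;             /* default UVS */
            }
        }
        if (nonDefOff != 0) {                     /* Non-Default UVS Table */
            const uint8_t *t = sub + nonDefOff;
            uint32_t count = rd32(t);
            for (uint32_t k = 0; k < count; k++) {
                const uint8_t *m = t + 4 + (size_t)k * 5;
                if (rd24(m) == base)
                    return rd16(m + 3);           /* glyphID of the mapping */
            }
        }
        return 0;   /* selector found, but base is not listed for it */
    }
    return 0;       /* selector not present in the subtable */
}
```

For a font that supports the JIS-2004 variants by default, a call with base U+82A6, selector U+E0101, and baseGlyph 7961 returns 7961 through the Default UVS Table, while selector U+E0100 returns 1142 through the Non-Default UVS Table.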
Platform-specific Information
Apple platforms do not support format 0, 8, or 10 encoding subtables.