Updated Unicode Decompositions
The Text Encoding Converter now uses the decompositions defined in Unicode 3.2. The changes are limited to Greek, Thai, Gurmukhi, and Arabic/Farsi characters. They affect conversion between Unicode and the Mac encodings for these scripts.
MacThai
xD3=u0E33 for composed Unicode. It now maps to u0E33 for decomposed Unicode too, instead of to uF860+u0E4D+u0E32 (the old mapping is loosely mapped back to xD3).
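The Thai change follows from the Unicode data. U+0E33 THAI CHARACTER SARA AM has only a *compatibility* decomposition, so a canonical decomposition leaves it as a single code point. The following is a minimal sketch that uses Python's bundled Unicode 3.2 database (`unicodedata.ucd_3_2_0`). It assumes that the TEC's "decomposed Unicode" corresponds to canonical decomposition (NFD). The name `SARA_AM` is illustrative only.

```python
import unicodedata

# Python ships the Unicode 3.2 character database as ucd_3_2_0.
ucd = unicodedata.ucd_3_2_0

SARA_AM = "\u0e33"  # MacThai xD3 (illustrative name)

# The decomposition is tagged <compat>, so it is not canonical.
print(ucd.decomposition(SARA_AM))  # <compat> 0E4D 0E32

# Canonical decomposition (NFD) leaves the character unchanged...
nfd = ucd.normalize("NFD", SARA_AM)
# ...while compatibility decomposition (NFKD) splits it in two.
nfkd = ucd.normalize("NFKD", SARA_AM)

print([f"U+{ord(c):04X}" for c in nfd])   # ['U+0E33']
print([f"U+{ord(c):04X}" for c in nfkd])  # ['U+0E4D', 'U+0E32']
```

The old mapping's leading uF860 is a private-use character. Unicode normalization knows nothing about it, so the sketch does not reproduce that sequence.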
MacGurmukhi
x91=u0A5C for composed Unicode. It now maps to u0A5C for decomposed Unicode too, instead of to uF860+u0A21+u0A3C (the old mapping is loosely mapped back to x91).
xD5 is now always mapped (for both composed and decomposed Unicode) to uF860+u0A38+u0A3C instead of to u0A36, because u0A36 is listed in CompositionExclusions.txt (the old mapping is loosely mapped back to xD5).
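The two Gurmukhi entries reflect two different facts in Unicode 3.2:

- U+0A5C GURMUKHI LETTER RRA has no decomposition at all.
- U+0A36 GURMUKHI LETTER SHA decomposes canonically to U+0A38 U+0A3C. Because it is a composition exclusion, composition never restores it.

The sketch below checks both facts with Python's `unicodedata.ucd_3_2_0`. It assumes that "composed" and "decomposed" Unicode correspond to NFC and NFD. The names `RRA`, `SHA`, `NEW_XD5`, and `code_points` are hypothetical.

```python
import unicodedata

# Unicode 3.2 character database bundled with Python.
ucd = unicodedata.ucd_3_2_0

def code_points(s):
    """Format each character of s as U+XXXX (hypothetical helper)."""
    return [f"U+{ord(c):04X}" for c in s]

RRA = "\u0a5c"                  # MacGurmukhi x91
SHA = "\u0a36"                  # old target of MacGurmukhi xD5
NEW_XD5 = "\uf860\u0a38\u0a3c"  # new target of xD5, as listed above

# U+0A5C has no decomposition, so its composed and decomposed forms agree.
print(repr(ucd.decomposition(RRA)))            # ''
print(code_points(ucd.normalize("NFD", RRA)))  # ['U+0A5C']

# U+0A36 decomposes canonically to U+0A38 U+0A3C. It is a composition
# exclusion, so NFC does not recombine the pair.
print(ucd.decomposition(SHA))                  # 0A38 0A3C
print(code_points(ucd.normalize("NFC", SHA)))  # ['U+0A38', 'U+0A3C']

# The new sequence (private-use U+F860, then SA and NUKTA) is unchanged
# by both normalization forms.
print(ucd.normalize("NFC", NEW_XD5) == ucd.normalize("NFD", NEW_XD5) == NEW_XD5)  # True
```

This is why a single target sequence can serve both composed and decomposed Unicode for xD5: a normalized composed form can never contain u0A36.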
MacGreek
For mapping to decomposed Unicode, all of the decompositions that formerly used u030D now use u0301. The affected characters (with their mappings for composed Unicode) are:
x87=u0385, xC0=u03AC, xCD=u0386, xCE=u0388, xD7=u0389, xD8=u038A, xD9=u038C, xDA=u038E, xDB=u03AD, xDC=u03AE, xDD=u03AF, xDE=u03CC, xDF=u038F, xE0=u03CD, xF1=u03CE, xFD=u0390, xFE=u03B0 (the old mappings are loosely mapped back)
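In Unicode 3.2, each Greek character in the list above decomposes canonically to a base letter followed by U+0301 COMBINING ACUTE ACCENT, not U+030D COMBINING VERTICAL LINE ABOVE. The sketch below copies the MacGreek-to-Unicode pairs from the list into a dictionary. It then decomposes each character with Python's `unicodedata.ucd_3_2_0`, on the assumption that the TEC's decomposed form matches NFD. `MAC_GREEK_TONOS` is a hypothetical name.

```python
import unicodedata

ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 database bundled with Python

# MacGreek byte -> composed Unicode, copied from the list above.
MAC_GREEK_TONOS = {
    0x87: "\u0385", 0xC0: "\u03ac", 0xCD: "\u0386", 0xCE: "\u0388",
    0xD7: "\u0389", 0xD8: "\u038a", 0xD9: "\u038c", 0xDA: "\u038e",
    0xDB: "\u03ad", 0xDC: "\u03ae", 0xDD: "\u03af", 0xDE: "\u03cc",
    0xDF: "\u038f", 0xE0: "\u03cd", 0xF1: "\u03ce", 0xFD: "\u0390",
    0xFE: "\u03b0",
}

COMBINING_ACUTE = "\u0301"          # used by Unicode 3.2 decompositions
COMBINING_VERTICAL_LINE = "\u030d"  # used by the old TEC decompositions

for byte, composed in sorted(MAC_GREEK_TONOS.items()):
    decomposed = ucd.normalize("NFD", composed)
    # Every decomposition ends in U+0301 and none contains U+030D.
    assert decomposed.endswith(COMBINING_ACUTE)
    assert COMBINING_VERTICAL_LINE not in decomposed
    print(f"x{byte:02X}", " ".join(f"U+{ord(c):04X}" for c in decomposed))
# First line printed: x87 U+00A8 U+0301
```

The characters with dialytika and tonos (xFD, xFE) decompose fully to three code points, for example U+03B9 U+0308 U+0301. Even so, the acute accent is still the final mark.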
MacArabic (all variants), MacFarsi (both variants)
Table A-1 shows the mapping from composed to decomposed Unicode. The characters in the table were not decomposed previously.
Table A-1 MacArabic/MacFarsi mapping from composed to decomposed
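Arabic letters that carry a madda or hamza are typical examples of composed characters with canonical decompositions in Unicode 3.2. The sketch below decomposes a few of them with Python's `unicodedata.ucd_3_2_0` and checks that NFC recomposes each one. The selection in `SAMPLES` is an assumption made for illustration. It is not necessarily the set of rows in Table A-1. The names `SAMPLES` and `code_points` are hypothetical.

```python
import unicodedata

ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 database bundled with Python

# Illustrative Arabic letters with canonical decompositions (assumed sample).
SAMPLES = ["\u0622", "\u0623", "\u0624", "\u0625", "\u0626"]

def code_points(s):
    """Format s as space-separated U+XXXX values (hypothetical helper)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for ch in SAMPLES:
    decomposed = ucd.normalize("NFD", ch)
    # These are not composition exclusions, so NFC restores the original.
    assert ucd.normalize("NFC", decomposed) == ch
    print(f"{code_points(ch)} -> {code_points(decomposed)}  {ucd.name(ch)}")
# First line printed: U+0622 -> U+0627 U+0653  ARABIC LETTER ALEF WITH MADDA ABOVE
```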
These encodings are now supported by the Text Encoding Converter:
Full support for the new Chinese standard has been added to the Text Encoding Converter (TEC), along with new system fonts that support the new characters.
DOS encodings for Simba