Updated Unicode Decompositions

The Text Encoding Converter now uses the decompositions defined in Unicode 3.2. The changes are limited to characters in Greek, Thai, Gurmukhi, and Arabic/Farsi. This change affects conversion of characters between Unicode and the Mac encodings for these scripts.

  1. MacThai

    xD3 = u0E33 for composed Unicode; now maps to u0E33 for decomposed Unicode too, instead of to uF860+u0E4D+u0E32 (old mapping is loosely mapped to xD3)

  2. MacGurmukhi

    x91=u0A5C for composed Unicode; now maps to u0A5C for decomposed Unicode too, instead of to uF860+u0A21+u0A3C (old mapping is loosely mapped to x91)

    xD5 is now always (composed & decomposed) mapped to uF860+u0A38+u0A3C instead of u0A36, since the latter is in CompositionExclusions.txt (the old mapping is loosely mapped back)

  3. MacGreek

    For mapping to decomposed Unicode, all of the decompositions that formerly used u030D now use u0301. The affected characters (and their mappings for composed Unicode) are:

    x87=u0385, xC0=u03AC, xCD=u0386, xCE=u0388, xD7=u0389, xD8=u038A, xD9=u038C, xDA=u038E, xDB=u03AD, xDC=u03AE, xDD=u03AF, xDE=u03CC, xDF=u038F, xE0=u03CD, xF1=u03CE, xFD=u0390, xFE=u03B0 (the old mappings are loosely mapped back)

  4. MacArabic (all variants), MacFarsi (both variants)

    Table A-1 shows the mapping from composed to decomposed Unicode. The items in the table were not previously decomposed.

    Table A-1  MacArabic/MacFarsi mapping from composed to decomposed

    Char    Composed    Decomposed
    xC2     u0622       u0627+u0653
    xC3     u0623       u0627+u0654
    xC4     u0624       u0648+u0654
    xC5     u0625       u0627+u0655
    xC6     u0626       u064A+u0654
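
The decompositions listed above can be cross-checked against the Unicode character database. The following is a minimal sketch, not part of the Text Encoding Converter: it uses Python's standard unicodedata module, which follows a later Unicode version than 3.2 (this is assumed not to matter here, since canonical decompositions are kept stable across versions). The names nfd_hex, ARABIC, GREEK_COMPOSED and the result variables are hypothetical and chosen only for illustration.

```python
import unicodedata


def nfd_hex(ch: str) -> str:
    """Return the canonical decomposition (NFD) of ch in 'uXXXX+uXXXX' notation."""
    return "+".join(f"u{ord(c):04X}" for c in unicodedata.normalize("NFD", ch))


# Table A-1: Mac code point -> (composed Unicode, expected decomposed Unicode)
ARABIC = {
    0xC2: ("\u0622", "u0627+u0653"),
    0xC3: ("\u0623", "u0627+u0654"),
    0xC4: ("\u0624", "u0648+u0654"),
    0xC5: ("\u0625", "u0627+u0655"),
    0xC6: ("\u0626", "u064A+u0654"),
}
arabic_ok = all(nfd_hex(comp) == decomp for comp, decomp in ARABIC.values())

# MacGreek: composed characters whose decompositions should use u0301, not u030D
GREEK_COMPOSED = [
    0x0385, 0x03AC, 0x0386, 0x0388, 0x0389, 0x038A, 0x038C, 0x038E, 0x03AD,
    0x03AE, 0x03AF, 0x03CC, 0x038F, 0x03CD, 0x03CE, 0x0390, 0x03B0,
]
greek_ok = all(
    "\u0301" in unicodedata.normalize("NFD", chr(cp))
    and "\u030D" not in unicodedata.normalize("NFD", chr(cp))
    for cp in GREEK_COMPOSED
)

# MacThai xD3 (u0E33) and MacGurmukhi x91 (u0A5C) have no canonical
# decomposition, so the decomposed form equals the composed form.
unchanged_ok = all(
    unicodedata.normalize("NFD", ch) == ch for ch in ("\u0E33", "\u0A5C")
)

# u0A36 is a composition exclusion: normalizing to composed form (NFC)
# still yields the decomposed sequence u0A38+u0A3C.
excluded_ok = unicodedata.normalize("NFC", "\u0A36") == "\u0A38\u0A3C"

if __name__ == "__main__":
    for mac, (comp, _) in ARABIC.items():
        print(f"x{mac:02X}  u{ord(comp):04X}  {nfd_hex(comp)}")
    print(arabic_ok, greek_ok, unchanged_ok, excluded_ok)
```

The checks cover only the Unicode side of each mapping. The Mac code points and the uF860 grouping character belong to the converter's own mapping tables and are not verified by this sketch.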

These encodings are now supported by the Text Encoding Converter: