Text Encoding Conversion Manager Reference

Framework
CoreServices/CoreServices.h
Declared in
TextCommon.h
TextEncodingConverter.h
TextEncodingPlugin.h
UnicodeConverter.h

Overview

The Text Encoding Conversion (TEC) Manager provides two facilities—the Text Encoding Converter and the Unicode Converter—that your application can use to handle text encoding conversion on the Mac OS. You will find the Text Encoding Conversion Manager helpful if you develop Internet applications, such as Web browsers or e-mail applications, applications that transfer text across different platforms, or applications based in Unicode.

Functions by Task

Creating a Text Encoding Specification

Obtaining Information From a Text Encoding Specification

Converting Between Script Manager Values and Text Encodings

Obtaining Information About Available Text Encodings

Identifying Direct Encoding Conversions

Identifying Possible Destination Encodings

Obtaining Converter Information

Creating and Deleting Converter Objects

Converting Text Between Encodings

Converting to Multiple Encoding Runs

Using Sniffers to Investigate Encodings

Getting Information About Internet and Regional Text Encoding Names

Converting to Unicode

Converting From Unicode

Converting From Unicode to Multiple Encodings

Converting Between Unicode and Pascal Strings

Obtaining Unicode Mapping Information

Truncating Strings Before Converting Them

Setting the Fallback Handler

Working With Universal Procedure Pointers

Getting UniChar Property Values

Functions

ChangeTextToUnicodeInfo

Changes the mapping information for the specified Unicode converter object used to convert text to Unicode to the new mapping you provide.

OSStatus ChangeTextToUnicodeInfo (
   TextToUnicodeInfo ioTextToUnicodeInfo,
   ConstUnicodeMappingPtr iUnicodeMapping
);

Parameters
ioTextToUnicodeInfo

The Unicode converter object of type TextToUnicodeInfo containing the mapping to be modified. You use the function CreateTextToUnicodeInfo to obtain one.

iUnicodeMapping

A structure of type UnicodeMapping identifying the new mapping to be used. This is the mapping that replaces the existing mapping in the Unicode converter object.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

The function replaces the mapping table information that currently exists in the Unicode converter object pointed to by the ioTextToUnicodeInfo parameter with the information contained in the UnicodeMapping structure you supply as the iUnicodeMapping parameter.

ChangeTextToUnicodeInfo resets the Unicode converter object’s fields as necessary.

If an error is returned, the Unicode converter object is invalid.

Availability
Declared In
UnicodeConverter.h

ChangeUnicodeToTextInfo

Changes the mapping information contained in the specified Unicode converter object used to convert Unicode text to a non-Unicode encoding.

OSStatus ChangeUnicodeToTextInfo (
   UnicodeToTextInfo ioUnicodeToTextInfo,
   ConstUnicodeMappingPtr iUnicodeMapping
);

Parameters
ioUnicodeToTextInfo

The Unicode converter object of type UnicodeToTextInfo to be modified. You use the function CreateUnicodeToTextInfo or CreateUnicodeToTextInfoByEncoding to obtain a Unicode converter object of this type.

iUnicodeMapping

The structure of type UnicodeMapping to be used. This is the new mapping that replaces the existing mapping in the Unicode converter object.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

The function replaces the mapping table information that currently exists in the specified Unicode converter object with the information contained in the new Unicode mapping structure you provide.

ChangeUnicodeToTextInfo resets the Unicode converter object’s fields as necessary. However, it does not initialize or reset the conversion state maintained by the Unicode converter object.

This function is especially useful for converting a string from Unicode if the Unicode string contains characters that require multiple destination encodings and you know the next destination encoding.

For example, you can change the other (destination) encoding of the Unicode mapping structure pointed to by the iUnicodeMapping parameter before you call the function ConvertFromUnicodeToText to convert the next character or sequence of characters that require a different destination encoding.

If an error is returned, the Unicode converter object is invalid.

Availability
Declared In
UnicodeConverter.h

ConvertFromPStringToUnicode

Converts a Pascal string in a Mac OS text encoding to a Unicode string.

OSStatus ConvertFromPStringToUnicode (
   TextToUnicodeInfo iTextToUnicodeInfo,
   ConstStr255Param iPascalStr,
   ByteCount iOutputBufLen,
   ByteCount *oUnicodeLen,
   UniChar oUnicodeStr[]
);

Parameters
iTextToUnicodeInfo

A Unicode converter object of type TextToUnicodeInfo for the Pascal string to be converted. You can use the function CreateTextToUnicodeInfo or CreateTextToUnicodeInfoByEncoding to create the Unicode converter object.

iPascalStr

The Pascal string to be converted to Unicode.

iOutputBufLen

The length in bytes of the output buffer pointed to by the oUnicodeStr parameter. Your application supplies this buffer to hold the returned converted string. The oUnicodeLen parameter may return a byte count that is less than this value if the converted string is smaller than the buffer size you allocated.

oUnicodeLen

On return, a pointer to the length in bytes of the converted Unicode string returned in the oUnicodeStr parameter.

oUnicodeStr

A pointer to a Unicode character array. On return, this array holds the converted Unicode string.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

The ConvertFromPStringToUnicode function provides an easy and efficient way to convert a short Pascal string to a Unicode string without incurring the overhead associated with the function ConvertFromTextToUnicode.

If necessary, this function automatically uses fallback characters to map the text elements of the string.
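The iPascalStr parameter uses Pascal-string layout: one length byte followed by up to 255 bytes of text. The following is a minimal sketch in portable C of building such a buffer from a C string. The names PStr255 and make_pascal_string are illustrative stand-ins and not part of the Text Encoding Conversion Manager.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for Str255: byte 0 holds the length, bytes 1..255 hold the text. */
typedef unsigned char PStr255[256];

/* Copies at most 255 bytes of a C string into Pascal-string layout.
   Returns the number of text bytes stored (the value of the length byte). */
size_t make_pascal_string(const char *src, PStr255 dst)
{
    size_t len;

    assert(src != NULL && dst != NULL);
    len = strlen(src);
    if (len > 255) {
        len = 255;   /* one length byte cannot describe more than 255 bytes */
    }
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
    return len;
}
```

Note that cutting a string at 255 bytes can split a multibyte character in encodings such as Shift-JIS, so a real application would choose the cut point with more care than this sketch does.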

Availability
Declared In
UnicodeConverter.h

ConvertFromTextToUnicode

Converts a string from any encoding to Unicode.

OSStatus ConvertFromTextToUnicode (
   TextToUnicodeInfo iTextToUnicodeInfo,
   ByteCount iSourceLen,
   ConstLogicalAddress iSourceStr,
   OptionBits iControlFlags,
   ItemCount iOffsetCount,
   const ByteOffset iOffsetArray[],
   ItemCount *oOffsetCount,
   ByteOffset oOffsetArray[],
   ByteCount iOutputBufLen,
   ByteCount *oSourceRead,
   ByteCount *oUnicodeLen,
   UniChar oUnicodeStr[]
);

Parameters
iTextToUnicodeInfo

A Unicode converter object of type TextToUnicodeInfo containing mapping and state information used for the conversion. The contents of this Unicode converter object are modified by the function. Your application obtains a Unicode converter object using the function CreateTextToUnicodeInfo.

iSourceLen

The length in bytes of the source string to be converted.

iSourceStr

The address of the source string to be converted.

iControlFlags

Conversion control flags. You can use “Conversion Masks” to set the iControlFlags parameter.

iOffsetCount

The number of offsets in the iOffsetArray parameter. Your application supplies this value. The number of entries in iOffsetArray must be fewer than the number of bytes specified in iSourceLen. If you don’t want offsets returned to you, specify 0 (zero) for this parameter.

iOffsetArray

An array of type ByteOffset. On input, you specify the array that contains an ordered list of significant byte offsets pertaining to the source string. These offsets may identify font or style changes, for example, in the source string. All array entries must be less than the length in bytes specified by the iSourceLen parameter. If you don’t want offsets returned to your application, specify NULL for this parameter and 0 (zero) for iOffsetCount.

oOffsetCount

On return, a pointer to the number of offsets that were mapped in the output stream.

oOffsetArray

An array of type ByteOffset. On return, this array contains the corresponding new offsets for the Unicode string produced by the converter.

iOutputBufLen

The length in bytes of the output buffer pointed to by the oUnicodeStr parameter. Your application supplies this buffer to hold the returned converted string. The oUnicodeLen parameter may return a byte count that is less than this value if the converted byte string is smaller than the buffer size you allocated. The relationship between the size of the source string and the Unicode string is complex and depends on the source encoding and the contents of the string.

oSourceRead

On return, a pointer to the number of bytes of the source string that were converted. If the function returns a kTECUnmappableElementErr result code, this parameter returns the number of bytes that were converted before the error occurred.

oUnicodeLen

On return, a pointer to the length in bytes of the converted stream.

oUnicodeStr

A pointer to an array used to hold a Unicode string. On input, this value points to the beginning of the array for the converted string. On return, this buffer holds the converted Unicode string. (For guidelines on estimating the size of the buffer needed, see the discussion.)

Return Value

A result code. See “TEC Manager Result Codes.” The function returns a noErr result code if it has completely converted the input string to Unicode without using fallback characters.

Discussion

You specify the source string’s encoding in the Unicode mapping structure that you pass to the function CreateTextToUnicodeInfo to obtain a Unicode converter object for the conversion. You pass the Unicode converter object returned by CreateTextToUnicodeInfo to ConvertFromTextToUnicode as the iTextToUnicodeInfo parameter.

In addition to converting a text string in any encoding to Unicode, the ConvertFromTextToUnicode function can map offsets for style or font information from the source text string to the returned converted string. The converter reads the application-supplied offsets, which apply to the source string, and returns the corresponding new offsets in the converted string. If you do not want the offsets at which font or style information occurs mapped to the resulting string, you should pass NULL for iOffsetArray and 0 (zero) for iOffsetCount.
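The constraints stated for iOffsetCount and iOffsetArray can be checked before calling the converter. The following is a minimal sketch in portable C; the function name offsets_are_valid is hypothetical, and it assumes that "ordered" means non-decreasing.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the offsets satisfy the stated constraints: fewer entries
   than source bytes, every entry less than source_len, and entries in
   non-decreasing order (an assumption about what "ordered" means).
   Returns 0 otherwise. A count of 0 means no offset mapping is requested. */
int offsets_are_valid(const unsigned long *offsets, size_t count,
                      unsigned long source_len)
{
    size_t i;

    if (count == 0) {
        return 1;                       /* NULL array with a count of 0 is allowed */
    }
    assert(offsets != NULL);            /* a nonzero count requires an array */
    if (count >= source_len) {
        return 0;                       /* must be fewer entries than bytes */
    }
    for (i = 0; i < count; i++) {
        if (offsets[i] >= source_len) {
            return 0;                   /* offset points past the source string */
        }
        if (i > 0 && offsets[i] < offsets[i - 1]) {
            return 0;                   /* offsets are out of order */
        }
    }
    return 1;
}
```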

Your application must allocate a buffer to hold the resulting converted string and pass a pointer to the buffer in the oUnicodeStr parameter. To determine the size of the output buffer to allocate, you should consider the size of the source string, its encoding type, and its content in relation to the resulting Unicode string.

For example, for 1-byte encodings such as MacRoman, the Unicode string will be at least double the size (more if it uses noncomposed Unicode); for MacArabic and MacHebrew, the corresponding Unicode string could be up to six times as big. For most 2-byte encodings, for example Shift-JIS, the Unicode string will be less than double the size. For international robustness, your application should allocate a buffer three to four times larger than the source string. If the output Unicode text is actually UTF-8, which is possible beginning with version 1.2.1 of the Text Encoding Conversion Manager, the UTF-8 buffer pointer must be cast to UniCharArrayPtr before it can be passed as the oUnicodeStr parameter. Also, the output buffer length will have a wider range of variation than for UTF-16: for ASCII input, the output will be the same size; for Han input, the output will be twice as big, and so on.
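The sizing guidance above can be turned into a small helper, and the helper can feed a call to ConvertFromTextToUnicode. The following is a sketch under stated assumptions, not a definitive implementation. The names estimate_unicode_buffer_bytes and macroman_to_unicode and the opt-in macro USE_CORESERVICES are hypothetical. The guarded part assumes a Mac OS X build linked against CoreServices and has not been verified against any particular system release; it also assumes that NULL is acceptable for the two output offset parameters when no offsets are requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns a starting estimate, in bytes, for the UTF-16 output buffer.
   `factor` is the multiplier applied to the source length: 4 follows the
   "three to four times" guidance, 6 covers the MacArabic/MacHebrew case. */
size_t estimate_unicode_buffer_bytes(size_t source_bytes, size_t factor)
{
    assert(factor > 0);
    return source_bytes * factor;
}

#ifdef USE_CORESERVICES   /* hypothetical opt-in macro, defined by the builder */
#include <CoreServices/CoreServices.h>

/* Converts MacRoman bytes to UTF-16 without offset mapping. On success the
   caller owns *outStr and releases it with free(). */
OSStatus macroman_to_unicode(const char *src, ByteCount srcLen,
                             UniChar **outStr, ByteCount *outLen)
{
    TextToUnicodeInfo info = NULL;
    ByteCount bufLen = (ByteCount)estimate_unicode_buffer_bytes(srcLen, 4);
    ByteCount sourceRead = 0;
    UniChar *buf;
    OSStatus err;

    *outStr = NULL;
    *outLen = 0;
    err = CreateTextToUnicodeInfoByEncoding(kTextEncodingMacRoman, &info);
    if (err != noErr) {
        return err;
    }
    buf = (UniChar *)malloc(bufLen > 0 ? bufLen : sizeof(UniChar));
    if (buf == NULL) {
        DisposeTextToUnicodeInfo(&info);
        return memFullErr;
    }
    /* No offset mapping: 0 for iOffsetCount, NULL for the offset arrays. */
    err = ConvertFromTextToUnicode(info, srcLen, src, 0,
                                   0, NULL, NULL, NULL,
                                   bufLen, &sourceRead, outLen, buf);
    DisposeTextToUnicodeInfo(&info);
    if (err != noErr) {
        free(buf);
        *outLen = 0;
        return err;
    }
    *outStr = buf;
    return noErr;
}
#endif
```

Because the control flags are 0 here, no fallback characters are used, so any result other than noErr is treated as a failure. An application that wants partial output would instead inspect oSourceRead and keep the buffer.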

Availability
Declared In
UnicodeConverter.h

ConvertFromUnicodeToPString

Converts a Unicode string to a Pascal string in a Mac OS text encoding.

OSStatus ConvertFromUnicodeToPString (
   UnicodeToTextInfo iUnicodeToTextInfo,
   ByteCount iUnicodeLen,
   const UniChar iUnicodeStr[],
   Str255 oPascalStr
);

Parameters
iUnicodeToTextInfo

A Unicode converter object. You use the CreateUnicodeToTextInfo or CreateUnicodeToTextInfoByEncoding function to obtain the Unicode converter object for the conversion.

iUnicodeLen

The length in bytes of the Unicode string to be converted. This is the string your application provides in the iUnicodeStr parameter.

iUnicodeStr

A pointer to an array containing the Unicode string to be converted.

oPascalStr

A buffer. On return, this buffer contains the converted Pascal string.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

The ConvertFromUnicodeToPString function provides an easy and efficient way to convert a Unicode string to a Pascal string in a Mac OS text encoding without incurring the overhead associated with use of the function ConvertFromUnicodeToText or ConvertFromUnicodeToScriptCodeRun.

If necessary, this function uses the loose mapping and fallback characters to map the text elements of the string. For fallback mappings, it uses the handler associated with the Unicode converter object.
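The string returned in oPascalStr is length-prefixed rather than null-terminated. The following is a minimal sketch in portable C of copying such a result into a C string; the function name pascal_to_c_string is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the text of a Pascal string (length byte followed by text) into
   dst as a null-terminated C string, truncating if dst is too small.
   Returns the number of text bytes copied. */
size_t pascal_to_c_string(const unsigned char *pstr, char *dst, size_t dst_size)
{
    size_t len;

    assert(pstr != NULL && dst != NULL && dst_size > 0);
    len = pstr[0];                      /* byte 0 is the length */
    if (len > dst_size - 1) {
        len = dst_size - 1;             /* leave room for the terminator */
    }
    memcpy(dst, pstr + 1, len);
    dst[len] = '\0';
    return len;
}
```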

Availability
Declared In
UnicodeConverter.h

ConvertFromUnicodeToScriptCodeRun

Converts a string from Unicode to one or more scripts.

OSStatus ConvertFromUnicodeToScriptCodeRun (
   UnicodeToTextRunInfo iUnicodeToTextInfo,
   ByteCount iUnicodeLen,
   const UniChar iUnicodeStr[],
   OptionBits iControlFlags,
   ItemCount iOffsetCount,
   const ByteOffset iOffsetArray[],
   ItemCount *oOffsetCount,
   ByteOffset oOffsetArray[],
   ByteCount iOutputBufLen,
   ByteCount *oInputRead,
   ByteCount *oOutputLen,
   LogicalAddress oOutputStr,
   ItemCount iScriptRunBufLen,
   ItemCount *oScriptRunOutLen,
   ScriptCodeRun oScriptCodeRuns[]
);

Parameters
iUnicodeToTextInfo

You use the function CreateUnicodeToTextRunInfoByScriptCode to obtain a Unicode converter object to specify for this parameter.

iUnicodeLen

The length in bytes of the Unicode string to be converted.

iUnicodeStr

A pointer to the Unicode string to be converted.

iControlFlags

Conversion control flags. You can use “Conversion Masks” and “Directionality Masks” to set the iControlFlags parameter.

If the text-run control flag is clear, ConvertFromUnicodeToScriptCodeRun attempts to convert the Unicode text to the single script from the list of scripts in the Unicode converter object that produces the best result, that is, that provides for the greatest amount of source text conversion. If the complete source text can be converted into more than one of the scripts specified in the array, then the converter chooses among them based on their order in the array. If this flag is clear, the oScriptRunOutLen parameter always points to a value equal to 1.

If you set the use-fallbacks control flag, the converter uses the default fallback characters for the current script. If the converter cannot handle a character using the current encoding, even using fallbacks, the converter attempts to convert the character using the other scripts, beginning with the first one specified in the list and skipping the one where it failed.

If you set the kUnicodeTextRunBit control flag, the converter attempts to convert the complete Unicode text string into the first script specified in the Unicode mapping structures array you passed to CreateUnicodeToTextRunInfo, CreateUnicodeToTextRunInfoByEncoding, or CreateUnicodeToTextRunInfoByScriptCode to create the Unicode converter object used for this conversion. If it cannot do this, the converter then attempts to convert the first text element that failed to the remaining scripts, in their specified order in the array. What the converter does with the next text element depends on the setting of the keep-same-encoding control flag:

If the keep-same-encoding control flag is clear, the converter returns to the original script and attempts to continue conversion with that script; this is equivalent to converting each text element to the first one that works, in the order specified.

If the keep-same-encoding control flag is set, the converter continues with the new destination script until it encounters a text element that cannot be converted using the new script. This attempts to minimize the number of script code changes in the output text. When the converter cannot convert a text element using any of the scripts in the list and the keep-same-encoding control flag is set, the converter uses the default fallback characters for the current script.

iOffsetCount

The number of offsets in the array pointed to by the iOffsetArray parameter. Your application supplies this value. The number of entries in iOffsetArray must be fewer than half the number of bytes specified in iUnicodeLen. If you don’t want offsets returned to you, specify 0 (zero) for this parameter.

iOffsetArray

An array of type ByteOffset. On input, you specify the array that contains an ordered list of significant byte offsets pertaining to the source Unicode string. These offsets may identify font or style changes, for example, in the Unicode string. If you don’t want offsets returned to your application, specify NULL for this parameter and 0 (zero) for iOffsetCount.

oOffsetCount

On return, a pointer to the number of offsets that were mapped in the output stream.

oOffsetArray

An array of type ByteOffset. On return, this array contains the corresponding new offsets for the resulting converted string.

iOutputBufLen

The length in bytes of the output buffer pointed to by the oOutputStr parameter. Your application supplies this buffer to hold the returned converted string. The oOutputLen parameter may return a byte count that is less than this value if the converted byte string is smaller than the buffer size you allocated.

oInputRead

On return, a pointer to the number of bytes of the Unicode source string that were converted. If the function returns a result code other than noErr, then this parameter returns the number of bytes that were converted before the error occurred.

oOutputLen

On return, a pointer to the length in bytes of the converted string.

oOutputStr

A buffer address. On input, this value points to the beginning of the buffer for the converted string. On return, this buffer contains the converted string in one or more encodings. When an error occurs, the ConvertFromUnicodeToScriptCodeRun function returns the converted string up to the character that caused the error.

iScriptRunBufLen

The number of script code run elements you allocated for the script code run array pointed to by the oScriptCodeRuns parameter. The converter returns the number of valid script code runs in the location pointed to by oScriptRunOutLen. Each entry in the script code run array specifies the beginning offset in the converted text and its associated script code.

oScriptRunOutLen

A pointer to a value of type ItemCount. On output, this value contains the number of valid script code runs returned in the oScriptCodeRuns parameter.

oScriptCodeRuns

An array of elements of type ScriptCodeRun. Your application should allocate an array with the number of elements you specify in the iScriptRunBufLen parameter. On return, this array contains the script code runs for the converted text string. Each entry in the array specifies the beginning offset in the converted text string and the associated script code specification.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

To use the ConvertFromUnicodeToScriptCodeRun function, you must first set up an array of script codes containing in order of precedence the scripts to be used for the conversion. To create a Unicode converter object, you call the function CreateUnicodeToTextRunInfoByScriptCode. You pass the returned Unicode converter object as the iUnicodeToTextInfo parameter when you call the ConvertFromUnicodeToScriptCodeRun function.

Availability
Declared In
UnicodeConverter.h

ConvertFromUnicodeToText

Converts a Unicode text string to the destination encoding you specify.

OSStatus ConvertFromUnicodeToText (
   UnicodeToTextInfo iUnicodeToTextInfo,
   ByteCount iUnicodeLen,
   const UniChar iUnicodeStr[],
   OptionBits iControlFlags,
   ItemCount iOffsetCount,
   const ByteOffset iOffsetArray[],
   ItemCount *oOffsetCount,
   ByteOffset oOffsetArray[],
   ByteCount iOutputBufLen,
   ByteCount *oInputRead,
   ByteCount *oOutputLen,
   LogicalAddress oOutputStr
);

Parameters
iUnicodeToTextInfo

A Unicode converter object of type UnicodeToTextInfo for converting text from Unicode. You use the function CreateUnicodeToTextInfo or CreateUnicodeToTextInfoByEncoding to obtain a Unicode converter object to specify for this parameter. This function modifies the contents of the iUnicodeToTextInfo parameter.

iUnicodeLen

The length in bytes of the Unicode string to be converted.

iUnicodeStr

A pointer to the Unicode string to be converted. If the input text is UTF-8, which is supported for versions 1.2.1 or later of the converter, you must cast the UTF-8 buffer pointer to ConstUniCharArrayPtr before you can pass it as this parameter.

iControlFlags

Conversion control flags. You can use “Conversion Masks” and “Directionality Masks” to set the iControlFlags parameter.

iOffsetCount

The number of offsets contained in the array provided by the iOffsetArray parameter. Your application supplies this value. If you don’t want offsets returned to you, specify 0 (zero) for this parameter.

iOffsetArray

An array of type ByteOffset. On input, you specify the array that gives an ordered list of significant byte offsets pertaining to the Unicode source string to be converted. These offsets may identify font or style changes, for example, in the source string. If you don’t want offsets returned to your application, specify NULL for this parameter and 0 (zero) for iOffsetCount. All offsets must be less than iUnicodeLen.

oOffsetCount

On return, a pointer to the number of offsets that were mapped in the output stream.

oOffsetArray

An array of type ByteOffset. On return, this array contains the corresponding new offsets for the converted string in the new encoding.

iOutputBufLen

The length in bytes of the output buffer pointed to by the oOutputStr parameter. Your application supplies this buffer to hold the returned converted string. The oOutputLen parameter may return a byte count that is less than this value if the converted byte string is smaller than the buffer size you allocated.

oInputRead

On return, a pointer to the number of bytes of the Unicode string that were converted. If the function returns a kTECUnmappableElementErr result code, this parameter returns the number of bytes that were converted before the error occurred.

oOutputLen

On return, a pointer to the length in bytes of the converted text stream.

oOutputStr

A value of type LogicalAddress. On input, this value points to a buffer for the converted string. On return, the buffer holds the converted text string. (For guidelines on estimating the size of the buffer needed, see the following discussion.)

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

This function can also map offsets for style or font information from the source text string to the returned converted string. The converter reads the application-supplied offsets and returns the corresponding new offsets in the converted string. If you do not want font or style information offsets mapped to the resulting string, you should pass NULL for iOffsetArray and 0 (zero) for iOffsetCount.

Your application must allocate a buffer to hold the resulting converted string and pass a pointer to the buffer in the oOutputStr parameter. To determine the size of the output buffer to allocate, you should consider the size and content of the Unicode source string in relation to the type of encoding to which it will be converted. For example, for many encodings, such as MacRoman and Shift-JIS, the size of the returned string will be between half the size and the same size as the source Unicode string. However, for some encodings that are not Mac OS ones, such as EUC-JP, which has some 3-byte characters for Kanji, the returned string could be larger than the source Unicode string. For MacArabic and MacHebrew, the result will usually be less than half the size of the Unicode string.
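Two points from the text are easy to get wrong: iUnicodeLen is a length in bytes rather than in characters, and the output size depends on the destination encoding. The following is a minimal sketch in portable C. The type DemoUniChar and both function names are illustrative stand-ins, and the per-unit byte counts in the comments (1 for MacRoman, 2 for Shift-JIS, 3 for EUC-JP) are assumptions drawn from the examples above that ignore any expansion caused by fallback sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t DemoUniChar;   /* stand-in for UniChar, a 16-bit code unit */

/* Length in bytes of a UTF-16 string with `unit_count` code units. This is
   the kind of value the iUnicodeLen parameter expects. */
size_t unicode_byte_length(size_t unit_count)
{
    return unit_count * sizeof(DemoUniChar);
}

/* Starting estimate for the output buffer: the number of UTF-16 code units
   times the largest number of bytes one unit is assumed to produce in the
   destination encoding (1 for MacRoman, 2 for Shift-JIS, 3 for EUC-JP). */
size_t estimate_text_buffer_bytes(size_t unit_count, size_t max_bytes_per_unit)
{
    assert(max_bytes_per_unit > 0);
    return unit_count * max_bytes_per_unit;
}
```

For a five-unit string this gives a 10-byte source, a 5-byte estimate for MacRoman (half the source size), and a 15-byte estimate for EUC-JP (larger than the source), which matches the ranges described above.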

Availability
Declared In
UnicodeConverter.h

ConvertFromUnicodeToTextRun

Converts a string from Unicode to one or more encodings.

OSStatus ConvertFromUnicodeToTextRun (
   UnicodeToTextRunInfo iUnicodeToTextInfo,
   ByteCount iUnicodeLen,
   const UniChar iUnicodeStr[],
   OptionBits iControlFlags,
   ItemCount iOffsetCount,
   const ByteOffset iOffsetArray[],
   ItemCount *oOffsetCount,
   ByteOffset oOffsetArray[],
   ByteCount iOutputBufLen,
   ByteCount *oInputRead,
   ByteCount *oOutputLen,
   LogicalAddress oOutputStr,
   ItemCount iEncodingRunBufLen,
   ItemCount *oEncodingRunOutLen,
   TextEncodingRun oEncodingRuns[]
);

Parameters
iUnicodeToTextInfo

You use the function CreateUnicodeToTextRunInfo, CreateUnicodeToTextRunInfoByEncoding, or CreateUnicodeToTextRunInfoByScriptCode to obtain a Unicode converter object to specify for this parameter.

iUnicodeLen

The length in bytes of the Unicode string to be converted.

iUnicodeStr

A pointer to the Unicode string to be converted.

iControlFlags

Conversion control flags. You can use “Conversion Masks” and “Directionality Masks” to set the iControlFlags parameter.

If the text-run control flag is clear, ConvertFromUnicodeToTextRun attempts to convert the Unicode text to the single encoding it chooses from the list of encodings in the Unicode mapping structures array that you provide when you create the Unicode converter object. This is the encoding that produces the best result, that is, that provides for the greatest amount of source text conversion. If the complete source text can be converted into more than one of the encodings specified in the Unicode mapping structures array, then the converter chooses among them based on their order in the array. If this flag is clear, the oEncodingRunOutLen parameter always points to a value equal to 1.

If you set the use-fallbacks control flag, the converter uses the default fallback characters for the current encoding. If the converter cannot handle a character using the current encoding, even using fallbacks, the converter attempts to convert the character using the other encodings, beginning with the first encoding specified in the list and skipping the encoding where it failed.

If you set the kUnicodeTextRunBit control flag, the converter attempts to convert the complete Unicode text string into the first encoding specified in the Unicode mapping structures array you passed to CreateUnicodeToTextRunInfo, CreateUnicodeToTextRunInfoByEncoding, or CreateUnicodeToTextRunInfoByScriptCode when you created the Unicode converter object for this conversion. If it cannot do this, the converter then attempts to convert the first text element that failed to the remaining encodings, in their specified order in the array. What the converter does with the next text element depends on the setting of the keep-same-encoding control flag.

If the keep-same-encoding control flag is clear, the converter returns to the original encoding and attempts to continue conversion with that encoding; this is equivalent to converting each text element to the first encoding that works, in the order specified.

If the keep-same-encoding control flag is set, the converter continues with the new destination encoding until it encounters a text element that cannot be converted using the new encoding. This attempts to minimize the number of encoding changes in the output text. When the converter cannot convert a text element using any of the encodings in the list and the keep-same-encoding control flag is set, the converter uses the default fallback characters for the current encoding.

iOffsetCount

The number of offsets in the array pointed to by the iOffsetArray parameter. Your application supplies this value. If you don’t want offsets returned to you, specify 0 (zero) for this parameter.

iOffsetArray

An array of type ByteOffset. On input, you specify the array that contains an ordered list of significant byte offsets pertaining to the source Unicode string. These offsets may identify font or style changes, for example, in the Unicode string. If you don’t want offsets returned to your application, specify NULL for this parameter and 0 (zero) for iOffsetCount. All offsets must be less than iUnicodeLen.

oOffsetCount

On return, a pointer to the number of offsets that were mapped in the output stream.

oOffsetArray

An array of type ByteOffset. On return, this array contains the corresponding new offsets for the resulting converted string.

iOutputBufLen

The length in bytes of the output buffer pointed to by the oOutputStr parameter. Your application supplies this buffer to hold the returned converted string. The oOutputLen parameter may return a byte count that is less than this value if the converted byte string is smaller than the buffer size you allocated.

oInputRead

On return, a pointer to the number of bytes of the Unicode source string that were converted. If the function returns a result code other than noErr, then this parameter returns the number of bytes that were converted before the error occurred.

oOutputLen

On return, a pointer to the length in bytes of the converted string.

oOutputStr

A value of type LogicalAddress. On input, this value points to the start of the buffer for the converted string. On output, this buffer contains the converted string in one or more encodings. When an error occurs, the ConvertFromUnicodeToTextRun function returns the converted string up to the character that caused the error. (For guidelines on estimating the size of the buffer needed, see the discussion following the parameter descriptions.)

iEncodingRunBufLen

The number of text encoding run elements you allocated for the encoding run array pointed to by the oEncodingRuns parameter. The converter returns the number of valid encoding runs in the location pointed to by oEncodingRunOutLen. Each entry in the encoding runs array specifies the beginning offset in the converted text and its associated text encoding.

oEncodingRunOutLen

On return, a pointer to the number of valid encoding runs returned in the oEncodingRuns parameter.

oEncodingRuns

On input, an array of structures of type TextEncodingRun. Your application should allocate an array with the number of elements you specify in the iEncodingRunBufLen parameter. On return, this array contains the encoding runs for the converted text string. Each entry in the encoding run array specifies the beginning offset in the converted text string and the associated encoding specification.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

To use the ConvertFromUnicodeToTextRun function, you must first set up an array of structures of type UnicodeMapping containing, in order of precedence, the mapping information for the conversion. To create a Unicode converter object, you call the CreateUnicodeToTextRunInfo function, passing it the Unicode mapping array, or you can call the CreateUnicodeToTextRunInfoByEncoding or CreateUnicodeToTextRunInfoByScriptCode functions, which take arrays of text encodings or script codes instead of an array of Unicode mappings. You pass the returned Unicode converter object as the iUnicodeToTextInfo parameter when you call the ConvertFromUnicodeToTextRun function.

Two of the control flags that you can set for the iControlFlags parameter, the text-run flag and the keep-same-encoding flag, allow you to control how the Unicode Converter uses the multiple encodings in converting the text string. These flags are explained in the description of the iControlFlags parameter.
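The effect of the keep-same-encoding flag when the text-run flag is set can be illustrated with a toy model in portable C. This is a sketch of the behavior described for iControlFlags, not the converter's actual algorithm. The names choose_runs and count_runs are hypothetical, and the model assumes that each text element is described by a bitmask of the listed encodings able to represent it.

```c
#include <assert.h>
#include <stddef.h>

/* can_encode[i] is a bitmask: bit e set means encoding number e in the
   priority list (0 = first) can represent text element i. chosen[i]
   receives the encoding picked for element i, or -1 when none of the
   listed encodings can represent it. */
void choose_runs(const unsigned *can_encode, size_t n, int encoding_count,
                 int keep_same_encoding, int *chosen)
{
    int current = 0;   /* conversion starts with the first encoding */
    size_t i;

    assert(encoding_count > 0 && encoding_count <= 32);
    for (i = 0; i < n; i++) {
        int e, pick = -1;

        if (keep_same_encoding && ((can_encode[i] >> current) & 1u)) {
            pick = current;             /* stay with the encoding in use */
        } else {
            for (e = 0; e < encoding_count; e++) {
                if ((can_encode[i] >> e) & 1u) {
                    pick = e;           /* first encoding that works, in order */
                    break;
                }
            }
        }
        if (pick >= 0 && keep_same_encoding) {
            current = pick;
        }
        chosen[i] = pick;
    }
}

/* Counts the runs of identical values in chosen[]. */
size_t count_runs(const int *chosen, size_t n)
{
    size_t i, runs = 0;

    for (i = 0; i < n; i++) {
        if (i == 0 || chosen[i] != chosen[i - 1]) {
            runs++;
        }
    }
    return runs;
}
```

With two encodings and four text elements where only the second element requires encoding 1, the model yields three runs (0, 1, 0, 0) when the flag is clear and two runs (0, 1, 1, 1) when it is set, which is the reduction in encoding changes that the flag is meant to provide.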

The ConvertFromUnicodeToTextRun function returns the converted string in the array pointed to by the oOutputStr parameter. Beginning with the first text element in the oOutputStr array, the elements of the array pointed to by the oEncodingRuns parameter identify the encodings of the converted string. The number of elements in the oEncodingRuns array may not correspond to the number of elements in the oOutputStr array. This is because the oEncodingRuns array includes only elements for the beginning of each new encoding run in the converted string.
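As a rough illustration of how the encoding run array relates to the output buffer, the following sketch looks up the run that covers a given byte offset and computes the length of a run. It is a portable model and does not call the Unicode Converter: the ExampleEncodingRun structure and the ExampleFindRunForOffset and ExampleRunLength helpers are hypothetical stand-ins, and the sketch assumes that runs are stored in ascending offset order, as the description above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one encoding run entry: the byte offset at
   which the run begins in the converted text, and the run's encoding. */
typedef struct {
    uint32_t offset;
    uint32_t textEncoding;
} ExampleEncodingRun;

/* Returns the index of the run that covers byteOffset, or -1 if there are
   no runs or the offset lies before the first run.
   Assumes the runs are sorted by ascending offset. */
long ExampleFindRunForOffset(const ExampleEncodingRun *runs,
                             size_t runCount,
                             uint32_t byteOffset)
{
    long found = -1;
    size_t i;

    assert(runs != NULL || runCount == 0);
    for (i = 0; i < runCount; i++) {
        if (runs[i].offset > byteOffset)
            break;              /* later runs start after this offset */
        found = (long)i;
    }
    return found;
}

/* Returns the length in bytes of run i. A run ends where the next run
   begins; the last run ends at the end of the converted text. */
uint32_t ExampleRunLength(const ExampleEncodingRun *runs,
                          size_t runCount,
                          size_t i,
                          uint32_t outputLength)
{
    uint32_t end;

    assert(runs != NULL && i < runCount);
    end = (i + 1 < runCount) ? runs[i + 1].offset : outputLength;
    return end - runs[i].offset;
}
```

Because each entry records only where a run begins, the run lengths have to be derived from the next entry's offset or, for the last run, from the total output length.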

Availability
Declared In
UnicodeConverter.h

CountUnicodeMappings

Counts available mappings that meet the specified matching criteria.

OSStatus CountUnicodeMappings (
   OptionBits iFilter,
   ConstUnicodeMappingPtr iFindMapping,
   ItemCount *oActualCount
);

Parameters
iFilter

Filter control flags representing the six subfields of the Unicode mapping structure that this function matches against when determining which mappings on the system to count. The filter control enumeration, described in “Unicode Matching Masks,” defines the constants for the subfields’ flags and their masks. You can include in the search criteria any of the three text encoding subfields for both the Unicode encoding and the other specified encoding. For any flag not turned on, the subfield value is ignored and the function does not check the corresponding subfield of the mappings on the system.

iFindMapping

A structure of type UnicodeMapping containing the text encodings whose field values are to be matched.

oActualCount

On return, a pointer to the number of matching mappings found.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

You can filter on any of the three text encoding subfields of the Unicode mapping structure’s unicodeEncoding specification and on any of the three text encoding subfields of the structure’s otherEncoding specification. The iFilter parameter consists of a set of six control flags that you set to identify which of the corresponding six subfields to include in the match count. No filtering is performed on fields for which you do not set the corresponding filter control flag.
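The matching rule described above can be modeled in a few lines. The following sketch is illustrative only and does not call CountUnicodeMappings: the kExample... flag names, the ExampleMapping structure (which stores the three subfields unpacked rather than as a packed text encoding), and both helper functions are hypothetical. It shows that a subfield is compared only when its flag is set, so a filter of 0 matches every mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter flags, one per subfield of the mapping. */
enum {
    kExampleMatchUnicodeBase    = 1u << 0,
    kExampleMatchUnicodeVariant = 1u << 1,
    kExampleMatchUnicodeFormat  = 1u << 2,
    kExampleMatchOtherBase      = 1u << 3,
    kExampleMatchOtherVariant   = 1u << 4,
    kExampleMatchOtherFormat    = 1u << 5
};

/* Hypothetical unpacked view of one text encoding specification. */
typedef struct {
    uint32_t base;
    uint32_t variant;
    uint32_t format;
} ExampleEncodingFields;

/* Hypothetical stand-in for a Unicode mapping. */
typedef struct {
    ExampleEncodingFields unicodeEncoding;
    ExampleEncodingFields otherEncoding;
} ExampleMapping;

/* Returns 1 if every subfield selected by filter is equal in want and have.
   Subfields whose flag is not set are ignored. */
int ExampleMappingMatches(uint32_t filter,
                          const ExampleMapping *want,
                          const ExampleMapping *have)
{
    assert(want != NULL && have != NULL);
    if ((filter & kExampleMatchUnicodeBase) &&
        want->unicodeEncoding.base != have->unicodeEncoding.base) return 0;
    if ((filter & kExampleMatchUnicodeVariant) &&
        want->unicodeEncoding.variant != have->unicodeEncoding.variant) return 0;
    if ((filter & kExampleMatchUnicodeFormat) &&
        want->unicodeEncoding.format != have->unicodeEncoding.format) return 0;
    if ((filter & kExampleMatchOtherBase) &&
        want->otherEncoding.base != have->otherEncoding.base) return 0;
    if ((filter & kExampleMatchOtherVariant) &&
        want->otherEncoding.variant != have->otherEncoding.variant) return 0;
    if ((filter & kExampleMatchOtherFormat) &&
        want->otherEncoding.format != have->otherEncoding.format) return 0;
    return 1;
}

/* Counts the entries of table that match want under filter. */
size_t ExampleCountMappings(uint32_t filter,
                            const ExampleMapping *want,
                            const ExampleMapping *table,
                            size_t tableCount)
{
    size_t i, count = 0;

    for (i = 0; i < tableCount; i++) {
        if (ExampleMappingMatches(filter, want, &table[i]))
            count++;
    }
    return count;
}
```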

Availability
Declared In
UnicodeConverter.h

CreateTextEncoding

Creates and returns a text encoding specification.

TextEncoding CreateTextEncoding (
   TextEncodingBase encodingBase,
   TextEncodingVariant encodingVariant,
   TextEncodingFormat encodingFormat
);

Parameters
encodingBase

A base text encoding.

encodingVariant

A variant of the base text encoding. To specify the default variant for the base encoding given in the encodingBase parameter, you can use the kTextEncodingDefaultVariant constant.

encodingFormat

A format for the base text encoding. To specify the default format for the base encoding, you can use the kTextEncodingDefaultFormat constant. If you want to obtain a TextEncoding value that references UTF-16 or UTF-8, pass kUnicode16BitFormat or kUnicodeUTF8Format.

Return Value

The text encoding specification that the function creates from the values you pass it.

Discussion

When you create a text encoding specification, the three values that you specify are packed into an unsigned integer, which you can then pass by value to the functions that use text encodings. See the data type TextEncodingRun.
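To make the idea of packing three values into one unsigned integer concrete, here is a minimal sketch. The field widths used (16 bits for the base, 10 for the variant, 6 for the format) are an assumption chosen for illustration; this reference does not state the actual layout, and all of the ExamplePack... and ExampleGet... names are hypothetical. In real code, call CreateTextEncoding and the accessor functions listed under "Obtaining Information From a Text Encoding Specification" rather than relying on any particular bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout, for illustration only:
   bits 0-15 base, bits 16-25 variant, bits 26-31 format. */
#define EXAMPLE_BASE_MASK     0xFFFFu
#define EXAMPLE_VARIANT_MASK  0x3FFu
#define EXAMPLE_FORMAT_MASK   0x3Fu
#define EXAMPLE_VARIANT_SHIFT 16
#define EXAMPLE_FORMAT_SHIFT  26

/* Packs the three values into one 32-bit integer that can be passed
   by value. Each value must fit in its assumed field. */
uint32_t ExamplePackEncoding(uint32_t base, uint32_t variant, uint32_t format)
{
    assert(base <= EXAMPLE_BASE_MASK);
    assert(variant <= EXAMPLE_VARIANT_MASK);
    assert(format <= EXAMPLE_FORMAT_MASK);
    return base
         | (variant << EXAMPLE_VARIANT_SHIFT)
         | (format << EXAMPLE_FORMAT_SHIFT);
}

/* Accessors that recover each value from the packed integer. */
uint32_t ExampleGetBase(uint32_t encoding)
{
    return encoding & EXAMPLE_BASE_MASK;
}

uint32_t ExampleGetVariant(uint32_t encoding)
{
    return (encoding >> EXAMPLE_VARIANT_SHIFT) & EXAMPLE_VARIANT_MASK;
}

uint32_t ExampleGetFormat(uint32_t encoding)
{
    return (encoding >> EXAMPLE_FORMAT_SHIFT) & EXAMPLE_FORMAT_MASK;
}
```

Because the result is a plain integer, two specifications built from the same three values compare equal with ==, and no allocation or disposal is involved.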

Availability
Carbon Porting Notes
Declared In
TextCommon.h

CreateTextToUnicodeInfo

Creates and returns a Unicode converter object containing information required for converting strings from a non-Unicode encoding to Unicode.

OSStatus CreateTextToUnicodeInfo (
   ConstUnicodeMappingPtr iUnicodeMapping,
   TextToUnicodeInfo *oTextToUnicodeInfo
);

Parameters
iUnicodeMapping

A pointer to a structure of type UnicodeMapping. Your application provides this structure to identify the mapping to use for the conversion. You must supply a value of type TextEncoding in the unicodeEncoding field of this structure. A TextEncoding is a triple composed of an encoding base, an encoding variant, and a format. You can obtain a TextEncoding value by calling the function CreateTextEncoding.

oTextToUnicodeInfo

On return, a pointer to a Unicode converter object that holds the mapping table information you supplied as the iUnicodeMapping parameter and state information related to the conversion. This information is required for conversion of a text stream in a non-Unicode encoding to Unicode.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

You pass a Unicode converter object returned from the function CreateTextToUnicodeInfo to the function ConvertFromTextToUnicode or ConvertFromPStringToUnicode to identify the information to be used for the conversion. These two functions modify the contents of the object.

You pass a Unicode converter object returned from CreateTextToUnicodeInfo to the function TruncateForTextToUnicode to identify the information to be used to truncate the string. This function does not modify the contents of the Unicode converter object.

If an error is returned, the Unicode converter object is invalid.

Availability
Declared In
UnicodeConverter.h

CreateTextToUnicodeInfoByEncoding

Based on the given text encoding specification, creates and returns a Unicode converter object containing information required for converting strings from the specified non-Unicode encoding to Unicode.

OSStatus CreateTextToUnicodeInfoByEncoding (
   TextEncoding iEncoding,
   TextToUnicodeInfo *oTextToUnicodeInfo
);

Parameters
iEncoding

The text encoding specification for the source text.

oTextToUnicodeInfo

The Unicode converter object of type TextToUnicodeInfo returned by the function.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

You can use this function instead of the CreateTextToUnicodeInfo function when you do not need to create a Unicode mapping structure. You simply specify the text encoding of the source text. However, this method is less efficient because the text encoding parameter must be resolved internally into a Unicode mapping.

You cannot specify a version of Unicode. The function uses a 16-bit form of Unicode as the default.

You pass a Unicode converter object returned from CreateTextToUnicodeInfoByEncoding to the function ConvertFromTextToUnicode or ConvertFromPStringToUnicode to identify the information to be used for the conversion. These two functions modify the contents of the Unicode converter object.

You pass a Unicode converter object returned from CreateTextToUnicodeInfoByEncoding to the function TruncateForTextToUnicode to identify the information to be used to truncate the string. This function does not modify the contents of the Unicode converter object.

If you are converting the text stream to Unicode as an intermediary encoding, and then from Unicode to the final destination encoding, you use the function CreateUnicodeToTextInfo to create a Unicode converter object for the second part of the process.

Availability
Carbon Porting Notes
Declared In
UnicodeConverter.h

CreateUnicodeToTextInfo

Creates and returns a Unicode converter object containing information required for converting strings from Unicode to a non-Unicode encoding.

OSStatus CreateUnicodeToTextInfo (
   ConstUnicodeMappingPtr iUnicodeMapping,
   UnicodeToTextInfo *oUnicodeToTextInfo
);

Parameters
iUnicodeMapping

A pointer to a structure of type UnicodeMapping. Your application provides this structure to identify the mapping to be used for the conversion. The unicodeEncoding field of this structure can specify a Unicode format of kUnicode16BitFormat or kUnicodeUTF8Format. Note that the versions of the Unicode Converter prior to 1.2.1 do not support kUnicodeUTF8Format.

oUnicodeToTextInfo

On return, a pointer to a Unicode converter object that holds the mapping table information you supply as the iUnicodeMapping parameter and the state information related to the conversion. The information contained in the Unicode converter object is required for the conversion of a Unicode string to a non-Unicode encoding.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

You pass the Unicode converter object returned from CreateUnicodeToTextInfo to the function ConvertFromUnicodeToText or ConvertFromUnicodeToPString to identify the information to be used for the conversion. These two functions modify the contents of the Unicode converter object.

If an error is returned, the Unicode converter object is invalid.

Availability
Declared In
UnicodeConverter.h

CreateUnicodeToTextInfoByEncoding

Based on the given text encoding specification for the converted text, creates and returns a Unicode converter object containing information required for converting strings from Unicode to the specified non-Unicode encoding.

OSStatus CreateUnicodeToTextInfoByEncoding (
   TextEncoding iEncoding,
   UnicodeToTextInfo *oUnicodeToTextInfo
);

Parameters
iEncoding

The text encoding specification for the destination, or converted, text.

oUnicodeToTextInfo

A pointer to a Unicode converter object of type UnicodeToTextInfo.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

You can use this function instead of the CreateUnicodeToTextInfo function to create a Unicode converter. However, this method is less efficient internally because the destination text encoding you specify must be resolved into a Unicode mapping. Using this function, you cannot specify a version of Unicode, so a default version of Unicode is used; 16-bit format is assumed.

You pass a Unicode converter object returned from the function CreateUnicodeToTextInfoByEncoding to the function ConvertFromUnicodeToText or ConvertFromUnicodeToPString to identify the information to be used for the conversion. These two functions modify the contents of the Unicode converter object.

You pass a Unicode converter object returned from CreateUnicodeToTextInfoByEncoding to the function TruncateForUnicodeToText to identify the information to be used to truncate the string. This function does not modify the contents of the Unicode converter object.

Availability
Declared In
UnicodeConverter.h

CreateUnicodeToTextRunInfo

Creates and returns a Unicode converter object containing the information required for converting a Unicode text string to strings in one or more non-Unicode encodings.

OSStatus CreateUnicodeToTextRunInfo (
   ItemCount iNumberOfMappings,
   const UnicodeMapping iUnicodeMappings[],
   UnicodeToTextRunInfo *oUnicodeToTextInfo
);

Parameters
iNumberOfMappings

The number of mappings specified by your application for converting from Unicode to any other encoding types, including other forms of Unicode. If you pass 0 for this parameter, the converter will use all of the scripts installed in the system. The primary script is the one with highest priority; ScriptOrder ('itlm' resource) determines the priority of the rest. If you set the high-order bit for this parameter, the Unicode converter assumes that the iUnicodeMappings parameter contains a single element specifying the preferred encoding. This feature is supported for versions 1.2 or later of the converter.

iUnicodeMappings

A pointer to an array of structures of type UnicodeMapping. Your application provides this structure to identify the mappings to be used for the conversion. The order in which you specify the mappings determines the priority of the destination encodings. For this function, the Unicode mapping structure can specify a Unicode format of kUnicode16BitFormat or kUnicodeUTF8Format. Note that the versions of the Unicode Converter prior to the Text Encoding Conversion Manager 1.2.1 do not support kUnicodeUTF8Format. Also, note that the unicodeEncoding field should be the same for all of the entries in iUnicodeMappings. If you pass NULL for the iUnicodeMappings parameter, the converter uses all of the scripts installed in the system, assuming the default version of Unicode with 16-bit format. The primary script is the one with the highest priority and ScriptOrder('itlm' resource) determines the priority of the rest. This is supported beginning with version 1.2 of the Text Encoding Conversion Manager.

oUnicodeToTextInfo

A pointer to a Unicode converter object for converting Unicode text strings to strings in one or more non-Unicode encodings. On return, a pointer to a Unicode converter object that holds the mapping table information you supply as the iUnicodeMappings parameter and the state information related to the conversion.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

You pass a Unicode converter object returned from the function CreateUnicodeToTextRunInfo to the function ConvertFromUnicodeToTextRun or ConvertFromUnicodeToScriptCodeRun to identify the information to be used for the conversion. These two functions modify the contents of the Unicode converter object.

Availability
Declared In
UnicodeConverter.h

CreateUnicodeToTextRunInfoByEncoding

Based on the given text encoding specifications for the converted text runs, creates and returns a Unicode converter object containing information required for converting strings from Unicode to one or more specified non-Unicode encodings.

OSStatus CreateUnicodeToTextRunInfoByEncoding (
   ItemCount iNumberOfEncodings,
   const TextEncoding iEncodings[],
   UnicodeToTextRunInfo *oUnicodeToTextInfo
);

Parameters
iNumberOfEncodings

The number of desired encodings. If you pass 0 for this parameter, the converter will use all of the scripts installed in the system. The primary script is the one with highest priority; ScriptOrder('itlm' resource) determines the priority of the rest. If you set the high-order bit for this parameter, the Unicode converter assumes that the iEncodings parameter contains a single element specifying the preferred encoding. This feature is supported for versions 1.2 or later of the converter.

iEncodings

An array of text encoding specifications for the desired encodings. Your application provides this array to identify the encodings to be used for the conversion. The order in which you specify the encodings determines the priority of the destination encodings. If you pass NULL for this parameter, the converter will use all of the scripts installed in the system. The primary script is the one with highest priority and ScriptOrder('itlm' resource) determines the priority of the rest. This feature is supported for versions 1.2 or later of the converter.

oUnicodeToTextInfo

A pointer to a Unicode converter object for converting Unicode text strings to strings in one or more non-Unicode encodings. On return, a pointer to a Unicode converter object that holds the encodings you supply as the iEncodings parameter and the state information related to the conversion.

Return Value

A result code. See “TEC Manager Result Codes.”

Discussion

You pass a Unicode converter object returned from CreateUnicodeToTextRunInfoByEncoding to the function ConvertFromUnicodeToTextRun or ConvertFromUnicodeToScriptCodeRun to identify the information to be used for the conversion. These two functions modify the contents of the Unicode converter object.

If an error is returned, the converter object is invalid.

Availability
Declared In
UnicodeConverter.h