Text Encoding Conversion Manager

Handle text encoding conversion between apps and transfer text across different platforms.

Topics

Creating a Text Encoding Specification

CreateTextEncoding

Creates and returns a text encoding specification.

Obtaining Information From a Text Encoding Specification

GetTextEncodingBase

Returns the base encoding of the specified text encoding.

GetTextEncodingFormat

Returns the format value of the specified text encoding.

GetTextEncodingName

Returns the localized name for a specified text encoding.

GetTextEncodingVariant

Returns the variant from the specified text encoding.

ResolveDefaultTextEncoding

Returns a text encoding specification in which any meta-values have been resolved to real values. Currently, this affects only the base encoding values packed into the text encoding specification.

Converting Between Script Manager Values and Text Encodings

RevertTextEncodingToScriptInfo

Converts the given Mac OS text encoding specification to the corresponding script code and, if possible, language code and font name.

UpgradeScriptInfoToTextEncoding

Converts any combination of a Mac OS script code, a language code, a region code, and a font name to a text encoding.

Obtaining Information About Available Text Encodings

TECCountAvailableTextEncodings

Counts and returns the number of text encodings currently configured in the Text Encoding Converter.

TECCountSubTextEncodings

Counts and returns the number of subencodings a text encoding supports.

TECGetAvailableTextEncodings

Returns the text encoding specifications currently configured in the Text Encoding Converter.

TECGetSubTextEncodings

Returns the text encoding specifications for the subencodings the encoding scheme supports.

NearestMacTextEncodings

Obtains the best and alternate Mac text encoding.

Identifying Direct Encoding Conversions

TECCountDirectTextEncodingConversions

Counts and returns the number of direct conversions currently configured in the Text Encoding Converter.

TECGetDirectTextEncodingConversions

Returns the types of direct conversions currently configured in the Text Encoding Converter.

Identifying Possible Destination Encodings

TECCountDestinationTextEncodings

Counts and returns the number of destination encodings to which a specified source encoding can be converted in one step.

TECGetDestinationTextEncodings

Returns the encoding specifications for all the destination text encodings to which the Text Encoding Converter can directly convert the specified source encoding.

Obtaining Converter Information

TECGetInfo

Allocates a converter information structure of type TECInfo in the application heap using NewHandle, fills it out, and returns a handle.

Creating and Deleting Converter Objects

TECCreateConverter

Determines a conversion path for a source and destination encoding, then creates a text encoding converter object and returns a pointer to it.

TECCreateConverterFromPath

Creates a converter object for a specific conversion path—from a source encoding through intermediate encodings to a destination encoding—and returns a pointer to it.

TECClearConverterContextInfo

Resets a converter object to its initial state so you can reuse it.

TECDisposeConverter

Disposes of a converter object.

Converting Text Between Encodings

TECConvertText

Converts a stream of text from a source encoding to a destination encoding. It uses the conversion path specified by the converter object you supply.

TECFlushText

Flushes out any data in a converter object’s temporary buffers and resets the converter object.

Converting to Multiple Encoding Runs

TECConvertTextToMultipleEncodings

Converts text in the source encoding to runs of text in multiple destination encodings. It uses the conversion path specified in the converter object you supply.

TECCreateOneToManyConverter

Determines a conversion path for the source encoding and destination encodings you specify, creates a text encoding converter object, and returns a reference to it.

TECFlushMultipleEncodings

Flushes out any encodings that may be stored in a converter object’s temporary buffers and shifts encodings back to their default state, if any.

TECGetEncodingList

Gets the list of destination encodings from a converter object.

Using Sniffers to Investigate Encodings

TECCreateSniffer

Creates a sniffer object and returns a reference to it.

TECClearSnifferContextInfo

Resets a sniffer object to its initial settings so you can reuse it.

TECDisposeSniffer

Disposes of a sniffer object.

TECCountAvailableSniffers

Counts and returns the number of sniffers available in all installed plug-ins.

TECGetAvailableSniffers

Returns the list of sniffers available in all installed plug-ins.

TECSniffTextEncoding

Analyzes a text stream and returns the probable encodings in a ranked list, based on an array of possible encodings you supply. It also returns the number of errors and features for each encoding.

Getting Information About Internet and Regional Text Encoding Names

TECCountMailTextEncodings

Counts and returns the number of currently supported e-mail encodings for a specified region.

TECCountWebTextEncodings

Counts and returns the number of currently supported web text encodings for a region code.

TECGetMailTextEncodings

Returns the currently supported mail encoding specifications for a region code.

TECGetTextEncodingFromInternetName

Returns the Mac OS text encoding specification that corresponds to an Internet encoding name.

TECGetTextEncodingInternetName

Returns the Internet encoding name that corresponds to a Mac OS text encoding.

TECGetWebTextEncodings

Returns the currently supported web text encoding specifications for a region code.

Converting to Unicode

ChangeTextToUnicodeInfo

Changes the mapping information for the specified Unicode converter object used to convert text to Unicode to the new mapping you provide.

ConvertFromTextToUnicode

Converts a string from any encoding to Unicode.

CreateTextToUnicodeInfo

Creates and returns a Unicode converter object containing information required for converting strings from a non-Unicode encoding to Unicode.

CreateTextToUnicodeInfoByEncoding

Based on the given text encoding specification, creates and returns a Unicode converter object containing information required for converting strings from the specified non-Unicode encoding to Unicode.

DisposeTextToUnicodeInfo

Releases the memory allocated for the specified Unicode converter object.

ResetTextToUnicodeInfo

Reinitializes all state information kept by the context objects.

Converting From Unicode

ChangeUnicodeToTextInfo

Changes the mapping information contained in the specified Unicode converter object used to convert Unicode text to a non-Unicode encoding.

ConvertFromUnicodeToText

Converts a Unicode text string to the destination encoding you specify.

CreateUnicodeToTextInfo

Creates and returns a Unicode converter object containing information required for converting strings from Unicode to a non-Unicode encoding.

CreateUnicodeToTextInfoByEncoding

Based on the given text encoding specification for the converted text, creates and returns a Unicode converter object containing information required for converting strings from Unicode to the specified non-Unicode encoding.

DisposeUnicodeToTextInfo

Releases the memory allocated for the specified Unicode converter object.

ResetUnicodeToTextInfo

Reinitializes all state information kept by a Unicode converter object.

Converting From Unicode to Multiple Encodings

ConvertFromUnicodeToTextRun

Converts a string from Unicode to one or more encodings.

ConvertFromUnicodeToScriptCodeRun

Converts a string from Unicode to one or more scripts.

CreateUnicodeToTextRunInfo

Creates and returns a Unicode converter object containing the information required for converting a Unicode text string to strings in one or more non-Unicode encodings.

CreateUnicodeToTextRunInfoByEncoding

Based on the given text encoding specifications for the converted text runs, creates and returns a Unicode converter object containing information required for converting strings from Unicode to one or more specified non-Unicode encodings.

CreateUnicodeToTextRunInfoByScriptCode

Based on the given script codes for the converted text runs, creates and returns a Unicode converter object containing information required for converting strings from Unicode to one or more specified non-Unicode encodings.

DisposeUnicodeToTextRunInfo

Releases the memory allocated for the specified Unicode converter object.

ResetUnicodeToTextRunInfo

Reinitializes all state information kept by the context objects in TextRun conversions.

Converting Between Unicode and Pascal Strings

ConvertFromPStringToUnicode

Converts a Pascal string in a Mac OS text encoding to a Unicode string.

ConvertFromUnicodeToPString

Converts a Unicode string to a Pascal string in a Mac OS text encoding.

Obtaining Unicode Mapping Information

CountUnicodeMappings

Counts available mappings that meet the specified matching criteria.

QueryUnicodeMappings

Returns a list of the conversion mappings available on the system that meet specified matching criteria and returns the number of mappings found.

Truncating Strings Before Converting Them

TruncateForTextToUnicode

Identifies where your application can safely break a multibyte string to be converted to Unicode so that the string is not broken in the middle of a multibyte character.
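
The idea of a safe break point can be illustrated without the manager itself. The following is a minimal sketch, assuming well-formed UTF-8 input; safe_utf8_break is a hypothetical helper written for this illustration and is not part of the Text Encoding Conversion Manager, which handles many other multibyte encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Text Encoding Conversion Manager function).
 * Returns the largest length <= max_len at which a well-formed UTF-8 buffer
 * can be cut without splitting a multibyte character. */
size_t safe_utf8_break(const unsigned char *text, size_t len, size_t max_len)
{
    assert(text != NULL || len == 0);

    if (max_len >= len) {
        return len;                 /* the whole buffer fits */
    }

    size_t cut = max_len;
    /* UTF-8 continuation bytes have the form 10xxxxxx. If the byte just
     * after the cut is a continuation byte, the cut falls inside a
     * character, so back up until it lands on a lead byte. */
    while (cut > 0 && (text[cut] & 0xC0) == 0x80) {
        cut--;
    }
    return cut;
}
```

A caller would convert the first safe_utf8_break(...) bytes and keep the remainder for the next pass.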

TruncateForUnicodeToText

Identifies where your application can safely break a Unicode string to be converted to any encoding so that the string is broken in a way that preserves the text element integrity.
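
As a rough illustration of text element integrity, the sketch below backs a cut position up so that it does not separate a UTF-16 surrogate pair or detach a combining mark from its base character. It is a simplification under stated assumptions: safe_utf16_break is a hypothetical helper, and only U+0300 through U+036F are treated as combining marks, which is far narrower than what a real converter considers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: only U+0300..U+036F count as combining marks. */
static int is_combining_mark(uint16_t u)
{
    return u >= 0x0300 && u <= 0x036F;
}

static int is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Hypothetical helper (not a Text Encoding Conversion Manager function).
 * Returns the largest length <= max_len, in UTF-16 code units, at which the
 * string can be cut without splitting a surrogate pair or separating a
 * combining mark from the character it modifies. */
size_t safe_utf16_break(const uint16_t *text, size_t len, size_t max_len)
{
    assert(text != NULL || len == 0);

    if (max_len >= len) {
        return len;
    }

    size_t cut = max_len;
    /* The unit just after the cut must not continue the element before it. */
    while (cut > 0 &&
           (is_low_surrogate(text[cut]) || is_combining_mark(text[cut]))) {
        cut--;
    }
    return cut;
}
```

A result of 0 means that not even the first text element fits within max_len units.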

Setting the Fallback Handler

SetFallbackUnicodeToText

Specifies a fallback handler to be used for converting a Unicode text segment to another encoding when the Unicode Converter cannot convert the text using the mapping table specified by the Unicode converter object.

SetFallbackUnicodeToTextRun

Specifies a fallback handler to be used for converting a Unicode text segment to another encoding when the Unicode Converter cannot convert the text using the mapping table specified by a Unicode converter object.

Working With Universal Procedure Pointers

NewUnicodeToTextFallbackUPP

Creates a new universal procedure pointer (UPP) to a Unicode-to-text fallback callback.

DisposeUnicodeToTextFallbackUPP

Disposes of a universal procedure pointer (UPP) to a Unicode-to-text fallback callback.

InvokeUnicodeToTextFallbackUPP

Calls your Unicode-to-text fallback callback.

Getting UniChar Property Values

UCGetCharProperty

Obtains the value associated with a property type for the specified UniChar characters.

Callbacks

UnicodeToTextFallbackProcPtr

Defines a pointer to a function that converts a Unicode text element for which there is no destination encoding equivalent in the appropriate mapping table to the fallback character sequence defined by your fallback handler, and returns the converted character sequence to the Unicode Converter.

TECPluginGetPluginDispatchTablePtr

Defines a pointer to a function that returns a pointer to a plug-in dispatch table.

TECPluginNewEncodingConverterPtr

Defines a pointer to a function that determines a conversion path for a source and destination encoding, then creates a text encoding converter object and returns a pointer to it.

TECPluginClearContextInfoPtr

Defines a pointer to a function that resets a converter object to its initial state.

TECPluginConvertTextEncodingPtr

Defines a pointer to a function that converts a stream of text from a source encoding to a destination encoding, using the conversion path specified by the converter object you supply.

TECPluginFlushConversionPtr

Defines a pointer to a function that flushes out any data in a converter object’s temporary buffers and resets the converter object.

TECPluginDisposeEncodingConverterPtr

Defines a pointer to a function that disposes of a converter object.

TECPluginNewEncodingSnifferPtr

Defines a pointer to a function that creates a sniffer object and returns a reference to it.

TECPluginClearSnifferContextInfoPtr

Defines a pointer to a function that resets a sniffer object to its initial settings.

TECPluginSniffTextEncodingPtr

Defines a pointer to a function that analyzes a text stream and returns the probable encodings in a ranked list, based on an array of possible encodings you supply; it also returns the number of errors and features for each encoding.

TECPluginDisposeEncodingSnifferPtr

Defines a pointer to a function that disposes of a sniffer object.

TECPluginGetCountAvailableTextEncodingsPtr

Defines a pointer to a function that obtains the available text encodings.

TECPluginGetCountAvailableTextEncodingPairsPtr

Defines a pointer to a function that obtains the available text encoding pairs.

TECPluginGetCountDestinationTextEncodingsPtr

Defines a pointer to a function that counts and returns the number of destination encodings to which a specified source encoding can be converted in one step.

TECPluginGetCountSubTextEncodingsPtr

Defines a pointer to a function that obtains the text encoding specifications for the subencodings the encoding scheme supports.

TECPluginGetCountAvailableSniffersPtr

Defines a pointer to a function that counts and returns the number of sniffers available in all installed plug-ins.

TECPluginGetCountWebEncodingsPtr

Defines a pointer to a function that obtains the available web text encodings.

TECPluginGetCountMailEncodingsPtr

Defines a pointer to a function that obtains the text encodings available for email.

TECPluginGetTextEncodingInternetNamePtr

Defines a pointer to a function that obtains the Internet text encoding name for a text encoding specification.

TECPluginGetTextEncodingFromInternetNamePtr

Defines a pointer to a function that obtains the text encoding for an Internet text encoding name.

Data Types

ConstScriptCodeRunPtr

Defines a constant script code run pointer.

ConstTextEncodingRunPtr

Defines a constant text encoding run pointer.

ConstTextPtr

Defines a constant text pointer.

ConstTextToUnicodeInfo

Defines a constant text to Unicode converter object.

ConstUniCharArrayPtr

Defines a constant Unicode character array pointer.

ConstUnicodeMappingPtr

Defines a constant Unicode mapping pointer.

ConstUnicodeToTextInfo

Defines a constant Unicode to text converter object.

ScriptCodeRun

Contains script code information for a text run.

TECBufferContextRec

Contains buffers for text and text encoding runs.

TECConversionInfo

Contains text encoding conversion information.

TECConverterContextRec

Contains converter information used by a Text Encoding Converter plug-in.

TECInfo

Contains information about the Unicode Converter, the Text Encoding Converter, and Basic Text Types.

TECObjectRef

Defines an opaque reference to a converter object.

TECPluginDispatchTable

Contains version and signature information and pointers to the callback functions used by a text encoding converter plug-in.

TECPluginSig

Defines a data type for a Text Encoding Converter plug-in signature.

TECPluginSignature

Defines a data type for a Text Encoding Converter plug-in signature.

TECPluginStateRec

Contains state information for a Text Encoding Converter plug-in.

TECPluginVersion

Defines a data type for a Text Encoding Converter plug-in version.

TECSnifferContextRec

Contains information used by a sniffer object.

TECSnifferObjectRef

Defines a reference to an opaque sniffer object.

TextEncoding

Defines a data type for a text encoding value.

TextEncodingRun

Contains text encoding information for a text run.

TextEncodingVariant

Defines a data type for a text encoding variant.

TextToUnicodeInfo

Defines a reference to an opaque Unicode converter object.

UniCharArrayOffset

Represents the boundary between two characters.

UnicodeMapping

Contains information for mapping to or from Unicode encoding.

UnicodeToTextFallbackUPP

Defines a universal procedure pointer to a Unicode-to-text-fallback callback function.

UnicodeToTextInfo

Defines a reference to an opaque Unicode to text converter object.

UnicodeToTextRunInfo

Defines a reference to an opaque Unicode to text run information converter object.

Feature Selectors

Conversion Flags

Specify how to perform conversion of text from one encoding to another.

Conversion Masks

Set or test for conversion flags.

Directionality Flags

Specify a text direction.

Directionality Masks

Set or test for directionality bits.

Unicode Converter Flags

Specify features for bug fixes in the Unicode Converter.

Unicode Converter Masks

Set or test for Unicode converter flags.

Unicode Fallback Sequencing Flag

Specifies options for setting fallback sequencing.

Unicode Fallback Sequencing Masks

Set or test for the Unicode fallback sequencing flag.

Unicode Matching Flags

Specify matching criteria for Unicode mappings.

Unicode Matching Masks

Set or test for Unicode matching flags.

Fallback Handler Selectors

Specify a fallback handler for the Unicode Converter to use.

Encodings and Variants

TextEncodingBase

Specify base text encodings.

Compatibility TextEncodings

Specify text encodings that are provided for backward compatibility.

EBCDIC and IBM Host Text Encodings

Specify text encodings used by IBM computers.

Encoding Variants for Big-5

Specify variants of Big-5 encoding.

Encoding Variants for Mac OS Encodings

Specify variant Mac OS encodings that use script codes other than 0.

Encoding Variants for MacArabic

Specify variants of MacArabic.

Encoding Variants for MacCroatian

Specify variants of MacCroatian.

Encoding Variants for MacCyrillic

Specify variants of MacCyrillic.

Encoding Variants for MacFarsi

Specify variants of MacFarsi.

Encoding Variants for MacHebrew

Specify variants of MacHebrew.

Encoding Variants for MacIcelandic

Specify variants of MacIcelandic.

Encoding Variants for MacJapanese

Specify variants of MacJapanese.

Encoding Variants for MacRoman

Specify variants of MacRoman.

Encoding Variants for MacRoman Related to Currency

Specify variants of MacRoman that are related to currency.

Encoding Variants for MacRomanian

Specify variants of MacRomanian.

Encoding Variants for MacRomanLatin1

Specify variants of MacRomanLatin1.

Encoding Variants for MacVT100

Specify variants of MacVT100.

Encoding Variants for Unicode

Specify variants of Unicode.

EUC Text Encodings

Specify Extended Unix Code text encodings.

HFS Text Encoding

Specifies a Mac OS HFS text encoding.

ISO 2022 Text Encodings

Specify text encodings for ISO 2022.

ISO 8-bit and 7-bit Text Encodings

Specify text encodings for ISO 8-bit and 7-bit.

Mac Unicode Text Encoding

Specifies a script code that should be handled as a special Mac OS script code.

Miscellaneous Text Encoding Standards

Specify miscellaneous text encodings.

MS-DOS and Windows Text Encodings

Specify text encodings for MS-DOS and Windows.

National Standard Text Encodings

Specify text encodings for various national standards.

NextStep Platform Encodings

Specify text encodings for the NextStep platform.

Special Text Encoding Values

Specify special cases of text encodings.

TextEncodingFormat

Specify a text encoding format.

TextEncodingNameSelector

Specify the part of an encoding name you want to obtain.

Text Encoding Variants

Specify minor variants of a base encoding or group of base encodings.

Unicode and ISO UCS Text Encodings

Specify Unicode and ISO UCS text encodings.

Unsupported Unicode Variants

Represent Unicode variants that are not yet supported or fully defined.

Assorted Constants

Bidirectional Character Values

Specify bidirectional character properties.

Common and Special Unicode Values

Specify common and special Unicode code values.

TEC Plugin Dispatch Table Versions

Specify a version for a TEC plug-in dispatch table.

TEC Plug-in Signatures

Specify a TEC plug-in signature.

UCCharPropertyType

Specify property types for a Unicode character.

UCCharPropertyValue

Specify a property value for a Unicode character.

UnicodeMapVersion

Specify a Unicode mapping version.

Unwanted Data Constants

Specify data you don’t care about receiving.

Result Codes

The most common result codes returned by Text Encoding Conversion Manager are listed below.

kTextUnsupportedEncodingErr

The encoding or mapping is not supported for this function by the current set of tables or plug-ins.

kTextMalformedInputErr

The text input contains a sequence that is not legal in the specified encoding, such as a DBCS high byte followed by an invalid low byte (0x8120 in Shift-JIS).
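
The 0x8120 case can be checked with a small scan. This is a minimal sketch, assuming the standard Shift-JIS byte ranges (lead bytes 0x81 to 0x9F and 0xE0 to 0xFC, trail bytes 0x40 to 0x7E and 0x80 to 0xFC); find_malformed_sjis is a hypothetical helper for illustration and does not reflect how the converter detects the error internally.

```c
#include <assert.h>
#include <stddef.h>

/* Standard Shift-JIS lead-byte ranges for double-byte characters. */
static int is_sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Standard Shift-JIS trail-byte ranges. */
static int is_sjis_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Hypothetical helper (not a Text Encoding Conversion Manager function).
 * Returns the offset of the first lead byte that is followed by an illegal
 * trail byte, or -1 if there is none. A lead byte at the very end of the
 * buffer is a truncated character rather than a malformed one, so it is
 * not reported here. */
long find_malformed_sjis(const unsigned char *text, size_t len)
{
    assert(text != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        if (!is_sjis_lead(text[i])) {
            i++;                    /* single-byte character */
            continue;
        }
        if (i + 1 >= len) {
            break;                  /* truncated, not malformed */
        }
        if (!is_sjis_trail(text[i + 1])) {
            return (long)i;         /* e.g. 0x81 followed by 0x20 */
        }
        i += 2;                     /* well-formed double-byte character */
    }
    return -1;
}
```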

kTextUndefinedElementErr

The text input contains a code point that is undefined in the specified encoding. The function did not completely convert the input string. You can resume conversion from a point beyond the offending character, or take some other action.

kTECMissingTableErr

The specified encoding is partially supported, but a specific table required for this function is missing.

kTECTableChecksumErr

A specific table required for this function has a checksum error, indicating that it has become corrupted.

kTECTableFormatErr

The table format is either invalid or it cannot be handled by the current version of the code. The function did not convert the string.

kTECCorruptConverterErr

The converter object is invalid. Returned by the Text Encoding Converter functions only.

kTECNoConversionPathErr

The converter supports both the source and target encodings, but cannot convert between them either directly or indirectly. Returned by the Text Encoding Converter functions only.

kTECBufferBelowMinimumSizeErr

The output text buffer is too small to accommodate the result of processing the first input text element. No part of the input string was processed.

kTECPartialCharErr

The input text ends in the middle of a multibyte character and conversion stopped. Append the unconverted input from this call to the beginning of the subsequent input text and call the function again.
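
The carry-over described here can be sketched with a toy stand-in for the converter. Everything in this example is hypothetical: toy_count_chars merely counts complete UTF-8 characters and reports how many bytes it consumed, and Stream and stream_feed are illustrative names, not part of the API. Only the pattern of prepending the unconverted tail to the next chunk is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a UTF-8 sequence, taken from its lead byte. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Toy stand-in for a conversion call: counts complete characters and stops
 * before a character that is cut off by the end of the buffer. */
size_t toy_count_chars(const unsigned char *buf, size_t len, size_t *used)
{
    size_t i = 0, count = 0;
    while (i < len) {
        size_t n = utf8_seq_len(buf[i]);
        if (i + n > len) break;     /* partial character at the end */
        i += n;
        count++;
    }
    *used = i;
    return count;
}

/* Hypothetical stream state: leftover bytes plus a running total. */
typedef struct {
    unsigned char carry[4];
    size_t carry_len;
    size_t total;
} Stream;

/* Feeds one chunk: prepend the leftover from the previous call, process
 * the combined buffer, and save whatever was not consumed this time. */
void stream_feed(Stream *s, const unsigned char *chunk, size_t len)
{
    unsigned char work[64];
    assert(s->carry_len + len <= sizeof work);

    memcpy(work, s->carry, s->carry_len);
    memcpy(work + s->carry_len, chunk, len);

    size_t work_len = s->carry_len + len;
    size_t used = 0;
    s->total += toy_count_chars(work, work_len, &used);

    s->carry_len = work_len - used; /* at most 3 bytes remain */
    memcpy(s->carry, work + used, s->carry_len);
}
```

Feeding a character split across two chunks leaves one byte in carry after the first call and none after the second.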

kTECUnmappableElementErr

An input text element cannot be mapped to the specified output encoding(s) using the specified options. For the Unicode Converter, this error can occur only if the kUnicodeUseFallbacksBit control flag is not set.

kTECIncompleteElementErr

The input text ends with a text element that might be incomplete, or contains a text element that is too long for the internal buffers.

kTECDirectionErr

An error, such as a direction stack overflow, occurred in directionality processing.

kTECGlobalsUnavailableErr

Global variables have already been deallocated, which indicates premature termination. The function did not convert the string.

kTECItemUnavailableErr

An item (for example, a name) is not available for the specified region (and encoding, if relevant).

kTECUsedFallbacksStatus

The function has completely converted the input string to the specified target using one or more fallbacks. For the Unicode Converter, this status code can only occur if the kUnicodeUseFallbacksBit control flag is set.

kTECNeedFlushStatus

The application disposed of a converter object by calling TECDisposeConverter, but there is still text contained in internal buffers. Returned by the Text Encoding Converter functions only.

kTECOutputBufferFullStatus

The converter successfully converted part of the input text, but the output buffer was not large enough to accommodate the entire input text after conversion. Convert the remaining text beginning from the position where the conversion stopped.
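
The resume-from-where-it-stopped loop might look like the sketch below. It does not call the Text Encoding Converter: toy_convert is a hypothetical ISO 8859-1 to UTF-8 routine, and TOY_OK and TOY_OUTPUT_FULL are made-up status values standing in for the real result codes. What carries over is the shape of the loop, which advances the input position by the amount consumed on each call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status values for this sketch, not the real result codes. */
enum { TOY_OK = 0, TOY_OUTPUT_FULL = 1 };

/* Toy converter (ISO 8859-1 to UTF-8). Reports how much input it consumed
 * and how much output it produced, stopping when the output is full. */
int toy_convert(const unsigned char *in, size_t in_len, size_t *in_used,
                unsigned char *out, size_t out_cap, size_t *out_used)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        size_t need = in[i] < 0x80 ? 1 : 2;
        if (o + need > out_cap) {
            *in_used = i;
            *out_used = o;
            return TOY_OUTPUT_FULL;
        }
        if (need == 1) {
            out[o++] = in[i];
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
        i++;
    }
    *in_used = i;
    *out_used = o;
    return TOY_OK;
}

/* Converts the whole input through a deliberately small chunk buffer,
 * resuming after each "output full" status. Returns the number of bytes
 * written to result, or (size_t)-1 if result is too small. */
size_t convert_all(const unsigned char *in, size_t in_len,
                   unsigned char *result, size_t result_cap)
{
    unsigned char chunk[4];
    size_t pos = 0, total = 0;
    int status;

    do {
        size_t in_used = 0, out_used = 0;
        status = toy_convert(in + pos, in_len - pos, &in_used,
                             chunk, sizeof chunk, &out_used);
        if (total + out_used > result_cap) {
            return (size_t)-1;
        }
        memcpy(result + total, chunk, out_used);
        total += out_used;
        pos += in_used;             /* resume where conversion stopped */
    } while (status == TOY_OUTPUT_FULL);

    return total;
}
```

The chunk buffer is kept at least as large as the longest single output character so that every call makes progress.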

See Also

Managers

Alias Manager

Create and resolve alias records that describe file system objects such as files, directories, and volumes.

Component Manager

Find and use components in your app or add custom components to system-provided services, such as QuickTime and Core Audio.

File Manager

Interact with files, folders, and volumes.

Gestalt Manager

Investigate the operating environment of your app.