Apple Developer Connection

Text Utilities Reference

Framework
CoreServices/CoreServices.h, Carbon/Carbon.h
Declared in
NumberFormatting.h
StringCompare.h
TextUtils.h
TypeSelect.h

Overview

The Text Utilities provide you with an integrated collection of routines for performing a variety of operations on textual information, ranging from modifying the contents of a string, to sorting strings from different languages, to converting times, dates, and numbers from internal representations to formatted strings and back. These routines work in conjunction with QuickDraw text drawing routines to help you display and modify text in applications that are distributed to an international audience.

The Text Utilities functions are used for numerous text-handling tasks; the Functions by Task section groups them by purpose.

Carbon supports the majority of Text Utilities. However, Apple recommends that you use the comparison and word breaking utilities supplied by Unicode Utilities instead.

A number of obsolete Text Utilities functions, such as those prefixed with iu or IU, are not supported.

Functions by Task

Comparing Strings for Equality

Converting Between Integers and Strings

Converting Between Strings and Floating-Point Numbers

Converting Between C and Pascal Strings

Defining and Specifying Strings

Determining Sorting Order for Strings in Different Languages

Determining Sorting Order for Strings in the Same Language

Modifying Characters and Diacritical Marks

Searching for and Replacing Strings

Using Number Format Specification Strings for International Number Formatting

Working With Word, Script, and Line Boundaries

Working With Universal Procedure Pointers

Working With Type Select Records

Functions

Munger

Searches text for a specified string pattern and replaces it with another string.

long Munger (
   Handle h,
   long offset,
   const void *ptr1,
   long len1,
   const void *ptr2,
   long len2
);

Parameters
h

A handle to the text string that is being manipulated.

offset

The byte offset in the destination string at which Munger begins its operation.

ptr1

A pointer to the first character in the string for which Munger is searching.

len1

The number of bytes in the string for which Munger is searching.

ptr2

A pointer to the first character in the substitution string.

len2

The number of bytes in the substitution string.

Return Value

A negative value if Munger cannot find the designated string.

Discussion

Munger manipulates bytes in a string to which you specify a handle in the h parameter. The manipulation begins at a byte offset, specified in offset, in the string. Munger searches for the string specified by ptr1 and len1; when it finds an instance of that string, it replaces it with the substitution string, which is specified by ptr2 and len2.

Munger operates on a byte-by-byte basis, which can produce inappropriate results for 2-byte script systems. The ReplaceText function works properly for all languages. You are encouraged to use ReplaceText instead of Munger whenever possible.

Munger takes special action if either of the specified pointer values is NULL or if either of the length values is 0.

Be careful not to specify an offset with a value that is greater than the length of the destination string. Unpredictable results may occur.

Munger calls the GetHandleSize and SetHandleSize functions to access or modify the length of the string it is manipulating.
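The basic search-and-replace behavior described above can be sketched in portable C. This is a model for illustration only, not Apple's implementation: ByteBuffer and munge_bytes are hypothetical names, a malloc-allocated buffer stands in for the Memory Manager handle, and the special cases for NULL pointers and zero lengths are simply rejected rather than modeled. The value returned on success (the offset just past the replacement) is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a relocatable block: a resizable byte buffer. */
typedef struct {
    unsigned char *bytes;  /* must be allocated with malloc/realloc */
    long size;             /* number of bytes currently in use */
} ByteBuffer;

/*
 * Byte-by-byte search and replace, starting the search at `offset`.
 * Returns a negative value if the target is not found or an argument is
 * out of range; otherwise returns the offset just past the replacement
 * (an assumption of this sketch).
 */
long munge_bytes(ByteBuffer *buf, long offset,
                 const void *target, long targetLen,
                 const void *repl, long replLen)
{
    if (buf == NULL || target == NULL || repl == NULL) return -1;
    if (targetLen <= 0 || replLen < 0) return -1;
    /* Refuse offsets beyond the end instead of behaving unpredictably. */
    if (offset < 0 || offset > buf->size) return -1;

    /* Find the first occurrence of the target at or after offset. */
    long found = -1;
    for (long i = offset; i + targetLen <= buf->size; i++) {
        if (memcmp(buf->bytes + i, target, (size_t)targetLen) == 0) {
            found = i;
            break;
        }
    }
    if (found < 0) return -1;

    long newSize = buf->size - targetLen + replLen;
    long tailLen = buf->size - (found + targetLen);

    /* Grow the buffer first if the replacement is longer than the target. */
    if (newSize > buf->size) {
        unsigned char *grown = realloc(buf->bytes, (size_t)newSize);
        if (grown == NULL) return -1;
        buf->bytes = grown;
    }

    /* Shift the tail into place, then copy in the replacement bytes. */
    memmove(buf->bytes + found + replLen,
            buf->bytes + found + targetLen, (size_t)tailLen);
    memcpy(buf->bytes + found, repl, (size_t)replLen);
    buf->size = newSize;
    return found + replLen;
}
```

Because the comparison is done one byte at a time, a match can begin in the middle of a 2-byte character, which is the reason the text above steers you toward ReplaceText for multibyte scripts.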

Special Considerations

Munger may move memory; your application should not call this function at interrupt time.

The destination string must be in a relocatable block that was allocated by the Memory Manager.

Availability
Declared In
TextUtils.h

Callbacks

IndexToStringProcPtr

Defines a pointer to your index-to-string callback function that retrieves the string associated with an index value.

Not recommended.

typedef Boolean (*IndexToStringProcPtr)
(
   short item,
   ScriptCode * itemsScript,
   StringPtr * itemsStringPtr,
   void * yourDataPtr
);

If you name your function MyIndexToStringProc, you would declare it like this:

Boolean MyIndexToStringProc (
   short item,
   ScriptCode * itemsScript,
   StringPtr * itemsStringPtr,
   void * yourDataPtr
);

Parameters
item

The index value for which the TypeSelect function requests a string.

itemsScript

The script code of the string specified by itemsStringPtr.

itemsStringPtr

On return, points to the string that matches the index specified by the item parameter.

yourDataPtr

A pointer to your data structure. This is passed to your index-to-string callback, and can be NULL, depending on how you implement your callback function.

Return Value

Returns true if a string matching that index value was found; false otherwise.

Discussion

The use of this function is not recommended in a Unicode-based application. If you want to use this function in an application that uses the Unicode character set, you must first convert Unicode text strings to Macintosh encoded Pascal text strings. You must also provide the encoding type or be able to determine it by extracting it from the text or by examining the system or keyboard script.
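A callback with this signature might look like the following minimal sketch. The typedefs are local stand-ins so the example compiles without the Carbon headers; in a real project they come from the system headers. MyItemList is a hypothetical application structure passed through yourDataPtr, and treating item as a zero-based index is an assumption of this sketch.

```c
#include <stddef.h>

/* Local stand-ins for the system types (assumed equivalents). */
typedef unsigned char Boolean;
typedef short ScriptCode;
typedef unsigned char *StringPtr;

/* Hypothetical application data: Pascal strings (length byte first)
   that all share a single script code. */
typedef struct {
    short count;
    ScriptCode script;
    StringPtr *names;
} MyItemList;

/* Matches the IndexToStringProcPtr signature shown above. */
Boolean MyIndexToStringProc(short item,
                            ScriptCode *itemsScript,
                            StringPtr *itemsStringPtr,
                            void *yourDataPtr)
{
    MyItemList *list = (MyItemList *)yourDataPtr;

    /* No data or index out of range: report that no string was found. */
    if (list == NULL || item < 0 || item >= list->count) {
        return 0;
    }

    *itemsScript = list->script;         /* script code of the string */
    *itemsStringPtr = list->names[item]; /* the Pascal string itself  */
    return 1;
}
```

The callback hands back a pointer to storage owned by the application, so the strings must remain valid for as long as the caller uses them.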

Availability
Declared In
TypeSelect.h

Data Types

BreakTable

Contains information used to determine the boundaries of a word.

struct BreakTable {
   char charTypes[256];
   short tripleLength;
   short triples[1];
};
typedef struct BreakTable BreakTable;
typedef BreakTable * BreakTablePtr;

Discussion

You can supply a BreakTable as a parameter to the function FindWordBreaks.

Availability
Declared In
TextUtils.h

FormatClass

Defines a data type used to access entries in a triple integer array.

typedef SInt8 FormatClass;

Discussion

Each of the three FVector entries in a triple integer array is accessed by one of the values of the FormatClass type. See FVector for more information.

Availability
Declared In
NumberFormatting.h

FormatStatus

Defines a data type used to denote the confidence level for a conversion.

typedef short FormatStatus;

Discussion

A FormatStatus value is returned by the functions ExtendedToString, StringToExtended, FormatRecToString, and StringToFormatRec.

Availability
Declared In
NumberFormatting.h

FVector

Contains position and length information for one portion of a formatted numeric string.

struct FVector {
   short start;
   short length;
};
typedef struct FVector FVector;
typedef FVector TripleInt[3];

Fields
start

The starting byte position in the string of the specification information.

length

The number of bytes used in the string for the specification information.

Discussion

The FVector data structure is used in the TripleInt array.

Availability
Declared In
NumberFormatting.h

IndexToStringUPP

Defines a universal procedure pointer to an index-to-string callback.

typedef IndexToStringProcPtr IndexToStringUPP;

Discussion

For more information, see the description of the IndexToStringProcPtr callback function.

Availability
Declared In
TypeSelect.h

NBreakTable

Contains information used by the FindWordBreaks function to determine word boundaries.

struct NBreakTable {
   SInt8 flags1;
   SInt8 flags2;
   short version;
   short classTableOff;
   short auxCTableOff;
   short backwdTableOff;
   short forwdTableOff;
   short doBackup;
   short length;
   char charTypes[256];
   short tables[1];
};
typedef struct NBreakTable NBreakTable;
typedef NBreakTable * NBreakTablePtr;

Fields
flags1

The high-order byte of the break table format flags. If the high-order bit of this byte is set to 1, this break table is in the format used by FindWordBreaks.

flags2

The low-order byte of the break table format flags. If the value in this byte is 0, the break table belongs to a 1-byte script system; in this case FindWordBreaks does not check for 2-byte characters.

version

The version of this break table.

classTableOff

The offset in bytes from the beginning of the break table to the beginning of the class table.

auxCTableOff

The offset in bytes from the beginning of the break table to the beginning of the auxiliary class table.

backwdTableOff

The offset in bytes from the beginning of the break table to the beginning of the backward-processing table.

forwdTableOff

The offset in bytes from the beginning of the break table to the beginning of the forward-processing table.

doBackup

The minimum byte offset into the buffer for doing backward processing. If the selected character for FindWordBreaks has a byte offset less than doBackup, FindWordBreaks skips backward processing altogether and starts from the beginning of the buffer.

length

The length in bytes of the entire break table, including the individual tables.

charTypes

The class table.

tables

The data of the auxiliary class table, backward table, and forward table.
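The flag and offset fields described above can be inspected with a few small helpers. This is a sketch under stated assumptions: SInt8 is redeclared locally so the example compiles without the Carbon headers, and the three helper function names are hypothetical, not part of Text Utilities.

```c
/* Local stand-in for the system type (assumed equivalent). */
typedef signed char SInt8;

struct NBreakTable {
    SInt8 flags1;
    SInt8 flags2;
    short version;
    short classTableOff;
    short auxCTableOff;
    short backwdTableOff;
    short forwdTableOff;
    short doBackup;
    short length;
    char  charTypes[256];
    short tables[1];
};
typedef struct NBreakTable NBreakTable;

/* True if the high-order bit of flags1 is set, meaning the table is in
   the format used by FindWordBreaks. */
int UsesFindWordBreaksFormat(const NBreakTable *table)
{
    return (((unsigned char)table->flags1) & 0x80u) != 0;
}

/* True if flags2 is 0, meaning the table belongs to a 1-byte script
   system and 2-byte characters are not checked for. */
int IsSingleByteScriptTable(const NBreakTable *table)
{
    return table->flags2 == 0;
}

/* True if a selected character at byteOffset lies before doBackup, the
   case in which backward processing is skipped. */
int SkipsBackwardProcessing(const NBreakTable *table, long byteOffset)
{
    return byteOffset < table->doBackup;
}
```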

Discussion

The tables have this format and content:

Availability
Declared In
TextUtils.h

NumFormatString

Contains data that represents the internal number formatting specification.

struct NumFormatString {
   UInt8 fLength;
   UInt8 fVersion;
   char data[254];
};
typedef struct NumFormatString NumFormatString;
typedef NumFormatString NumFormatStringRec;

Fields
fLength

The number of bytes in the data actually used for this number formatting specification.

fVersion

The version number of the number formatting specification.

data

The data that comprises the number formatting specification.

Discussion
Availability
Declared In
NumberFormatting.h

NumFormatStringRec

Defines an internal numeric representation that is independent of region, language, and other multicultural considerations.

typedef NumFormatString NumFormatStringRec;

Discussion

To allow for all of the international variations in numeric presentation styles, you need to include in your function calls a number parts table from a tokens ('itl4') resource. You can usually use the number parts table in the standard tokens resource that is supplied with the system. You also need to define the format of input and output numeric strings, including which characters (if any) to use as thousand separators, whether to indicate negative values with a minus sign or by enclosing the number in parentheses, and how to display zero values.

To make it possible to map a number that was formatted for one specification into another format, the Mac OS defines an internal numeric representation that is independent of region, language, and other multicultural considerations: the NumFormatStringRec structure. This structure is created from a number format specification string that defines the appearance of numeric strings.

Four of the numeric string functions use the number formatting specification, defined by the NumFormatStringRec data type: StringToFormatRec, FormatRecToString, StringToExtended, and ExtendedToString. The number format specification structure contains the data that represents the internal number formatting specification information. This data is stored in a private format.

Availability
Declared In
NumberFormatting.h

ScriptRunStatus

Contains script-specific information for a script run.

struct ScriptRunStatus {
   SInt8 script;
   SInt8 runVariant;
};
typedef struct ScriptRunStatus ScriptRunStatus;

Fields
script

The script code of the subscript run. Zero indicates the Roman script system.

runVariant

Script-specific information about the run, in the same format as that returned by the CharacterType function. This information includes the type of subscript—for example, Kanji, Katakana, or Hiragana for a Japanese script system.

Discussion

The FindScriptRun function finds a run of subscript text in a string. When it completes its processing, it returns a script run status structure, defined by the ScriptRunStatus data type.

Availability
Declared In
TextUtils.h

TripleInt

Defines a data type used to return the position and length information for three different portions of a formatted numeric string.

typedef FVector TripleInt[3];

Discussion

The FormatRecToString function uses the triple-integer array, defined by the TripleInt data type, to return the starting position and length in a string of three different portions of a formatted numeric string: the positive value string, the negative value string, and the zero value string. Each element of the triple integer array is an FVector structure. Each of the three FVector entries in the triple integer array is accessed by one of the values of the FormatClass type.
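Indexing a TripleInt with the FormatClass values can be sketched as follows. The type and constant declarations repeat those in this reference so the example compiles on its own. CopyFormatPortion is a hypothetical helper, and the sketch assumes that start is a zero-based byte offset into the text passed in; check how your string is stored before relying on that.

```c
#include <stddef.h>
#include <string.h>

struct FVector {
    short start;
    short length;
};
typedef struct FVector FVector;
typedef FVector TripleInt[3];

enum {
    fPositive = 0,
    fNegative = 1,
    fZero = 2
};

/*
 * Copies the portion of `text` selected by `which` (fPositive, fNegative,
 * or fZero) into `out` as a NUL-terminated C string.
 * Returns the number of bytes copied, or -1 if the index or the recorded
 * range is invalid or `out` is too small.
 */
int CopyFormatPortion(const char *text, size_t textLen,
                      const FVector *parts, int which,
                      char *out, size_t outSize)
{
    if (which < fPositive || which > fZero) return -1;

    short start = parts[which].start;
    short length = parts[which].length;

    if (start < 0 || length < 0) return -1;
    if ((size_t)start + (size_t)length > textLen) return -1;
    if ((size_t)length + 1 > outSize) return -1;

    memcpy(out, text + start, (size_t)length);
    out[length] = '\0';
    return length;
}
```

For example, given the made-up text "#,##0;-#,##0;0" and the entries {0, 5}, {6, 6}, and {13, 1}, selecting fNegative copies "-#,##0".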

Availability
Declared In
NumberFormatting.h

TypeSelectRecord

Contains a buffer of keystrokes, the script code associated with the keystrokes, and timer information.

struct TypeSelectRecord {
   unsigned long tsrLastKeyTime;
   ScriptCode tsrScript;
   Str63 tsrKeyStrokes;
};
typedef struct TypeSelectRecord TypeSelectRecord;

Fields
tsrLastKeyTime

A value that indicates timeout information.

tsrScript

A script code.

tsrKeyStrokes

The keystroke buffer.

Discussion

The TypeSelectRecord data structure is passed as a parameter to the functions TypeSelectNewKey, TypeSelectFindItem, TypeSelectCompare, and TypeSelectClear.

Availability
Declared In
TypeSelect.h

Constants

Format Result Types

Specify values that can be returned in the low byte of a format status (FormatStatus) value.

enum {
   fFormatOK = 0,
   fBestGuess = 1,
   fOutOfSynch = 2,
   fSpuriousChars = 3,
   fMissingDelimiter = 4,
   fExtraDecimal = 5,
   fMissingLiteral = 6,
   fExtraExp = 7,
   fFormatOverflow = 8,
   fFormStrIsNAN = 9,
   fBadPartsTable = 10,
   fExtraPercent = 11,
   fExtraSeparator = 12,
   fEmptyFormatString = 13
};
typedef SInt8 FormatResultType;

Constants
fFormatOK

Specifies format is okay.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fBestGuess

Specifies the format is the best guess.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fOutOfSynch

Specifies the format is out of sync.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fSpuriousChars

Specifies the format contains spurious characters.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fMissingDelimiter

Specifies a missing delimiter.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fExtraDecimal

Specifies the format contains an extra decimal sign.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fMissingLiteral

Specifies the format is missing a literal.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fExtraExp

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fFormatOverflow

Specifies a format overflow.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fFormStrIsNAN

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fBadPartsTable

Specifies the parts table is bad.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fExtraPercent

Specifies the format contains an extra percent sign.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fExtraSeparator

Specifies an extra separator.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fEmptyFormatString

Specifies the format string is empty.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

Discussion

A format result type is returned in the low byte of a format status (FormatStatus) value. A FormatStatus value is returned by the functions ExtendedToString, StringToExtended, FormatRecToString, and StringToFormatRec. A format status value denotes the confidence level for a conversion.
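Extracting the result type from the low byte of a status value can be sketched as shown here. The typedefs and constants repeat the declarations in this reference so the example compiles on its own; ResultTypeOf and ResultTypeName are hypothetical helper names, and the sketch makes no assumption about what the high byte contains.

```c
typedef short FormatStatus;
typedef signed char FormatResultType;  /* SInt8 in the system headers */

enum {
    fFormatOK = 0,
    fBestGuess = 1,
    fOutOfSynch = 2,
    fSpuriousChars = 3,
    fMissingDelimiter = 4,
    fExtraDecimal = 5,
    fMissingLiteral = 6,
    fExtraExp = 7,
    fFormatOverflow = 8,
    fFormStrIsNAN = 9,
    fBadPartsTable = 10,
    fExtraPercent = 11,
    fExtraSeparator = 12,
    fEmptyFormatString = 13
};

/* Returns the format result type stored in the low byte of a status. */
FormatResultType ResultTypeOf(FormatStatus status)
{
    return (FormatResultType)(status & 0xFF);
}

/* Maps a result type to its constant name, which is handy for logging. */
const char *ResultTypeName(FormatResultType result)
{
    switch (result) {
        case fFormatOK:          return "fFormatOK";
        case fBestGuess:         return "fBestGuess";
        case fOutOfSynch:        return "fOutOfSynch";
        case fSpuriousChars:     return "fSpuriousChars";
        case fMissingDelimiter:  return "fMissingDelimiter";
        case fExtraDecimal:      return "fExtraDecimal";
        case fMissingLiteral:    return "fMissingLiteral";
        case fExtraExp:          return "fExtraExp";
        case fFormatOverflow:    return "fFormatOverflow";
        case fFormStrIsNAN:      return "fFormStrIsNAN";
        case fBadPartsTable:     return "fBadPartsTable";
        case fExtraPercent:      return "fExtraPercent";
        case fExtraSeparator:    return "fExtraSeparator";
        case fEmptyFormatString: return "fEmptyFormatString";
        default:                 return "unknown";
    }
}
```

A caller would typically compare ResultTypeOf(status) against fFormatOK and fBestGuess and treat the remaining values as conversion problems.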

TripleInt Index Values

Specify an index for a TripleInt array.

enum {
   fPositive = 0,
   fNegative = 1,
   fZero = 2
};

Constants
fPositive

Specifies the positive value string.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fNegative

Specifies the negative value string.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

fZero

Specifies the zero value string.

Available in Mac OS X v10.0 and later.

Declared in NumberFormatting.h

Discussion

See TripleInt for more information.

NumFormatString Version

Specifies the first version of the NumFormatString data structure.

enum {
   fVNumber = 0
};

Discussion

See NumFormatString for more information.

Implicit Language Codes

Specify implicit language codes.

enum {
   systemCurLang = -2,
   systemDefLang = -3,
   currentCurLang = -4,
   currentDefLang = -5,
   scriptCurLang = -6,
   scriptDefLang = -7
};

Constants
systemCurLang

Specifies the current language for system script (from 'itlb').

Available in Mac OS X v10.0 and later.

Declared in StringCompare.h

systemDefLang

Specifies the default language for system script (from 'itlm').

Available in Mac OS X v10.0 and later.

Declared in StringCompare.h

currentCurLang

Specifies the current language for current script (from 'itlb').

Available in Mac OS X v10.0 and later.

Declared in StringCompare.h

currentDefLang

Specifies the default language for current script (from 'itlm').

Available in Mac OS X v10.0 and later.

Declared in StringCompare.h

scriptCurLang

Specifies the current language for specified script (from 'itlb').

Available in Mac OS X v10.0 and later.

Declared in StringCompare.h

scriptDefLang

Specifies the default language for specified script (from 'itlm').

Available in Mac OS X v10.0 and later.

Declared in StringCompare.h

Discussion

The functions LanguageOrder, StringOrder, and TextOrder accept as parameters implicit language codes listed here, as well as explicit language codes.

Type Select Modes

Contains type-select code information.

typedef SInt16 TSCode;
enum {
   tsPreviousSelectMode = -1,
   tsNormalSelectMode = 0,
   tsNextSelectMode = 1
};

Constants
tsPreviousSelectMode

Specifies previous-select mode.

Available in Mac OS X v10.0 and later.

Declared in TypeSelect.h

tsNormalSelectMode

Specifies normal-select mode.

Available in Mac OS X v10.0 and later.

Declared in TypeSelect.h

tsNextSelectMode

Specifies next-select mode.

Available in Mac OS X v10.0 and later.

Declared in TypeSelect.h

Discussion

A value of type TSCode is passed as a parameter to the function TypeSelectFindItem.

Obsolete Language Code Values

Specify language code values that are no longer used.

enum {
   iuSystemCurLang = systemCurLang,
   iuSystemDefLang = systemDefLang,
   iuCurrentCurLang = currentCurLang,
   iuCurrentDefLang = currentDefLang,
   iuScriptCurLang = scriptCurLang,
   iuScriptDefLang = scriptDefLang
};



Last updated: 2007-05-29


Copyright © 2007 Apple Inc.
All rights reserved.