| Framework | CoreServices/CoreServices.h, Carbon/Carbon.h |
| Declared in | NumberFormatting.h StringCompare.h TextUtils.h TypeSelect.h |
The Text Utilities provide you with an integrated collection of routines for performing a variety of operations on textual information, ranging from modifying the contents of a string, to sorting strings from different languages, to converting times, dates, and numbers from internal representations to formatted strings and back. These routines work in conjunction with QuickDraw text drawing routines to help you display and modify text in applications that are distributed to an international audience.
The Text Utilities functions are used for numerous text-handling tasks, including:
defining strings, including functions for allocating strings in the heap and for loading strings from resources
comparing and sorting strings, including functions for testing whether two strings are equal and functions for finding the sorting relationship between two strings
modifying the contents of strings, including routines for converting the case of characters, stripping diacritical marks, replacing substrings, and truncating strings
finding breaks and boundaries in text, including routines for finding word and line breaks, and for finding different script runs in a line of text
converting and formatting date and time strings, including routines that convert numeric and string representations of dates and times into record format, and routines that convert numeric and record representations of dates and times into strings
converting and formatting numeric strings, including routines that convert string representations of numbers into numeric representations
Carbon supports the majority of Text Utilities. However, Apple recommends that you use the comparison and word breaking utilities supplied by Unicode Utilities instead.
A number of obsolete Text Utilities functions, such as those prefixed with iu or IU, are not supported.
EqualString Deprecated in Mac OS X v10.4
IdenticalString Deprecated in Mac OS X v10.4
IdenticalText Deprecated in Mac OS X v10.4
NumToString Deprecated in Mac OS X v10.4
StringToNum Deprecated in Mac OS X v10.4
ExtendedToString Deprecated in Mac OS X v10.4
StringToExtended Deprecated in Mac OS X v10.4
c2pstr Deprecated in Mac OS X v10.4
C2PStr Deprecated in Mac OS X v10.4
c2pstrcpy Deprecated in Mac OS X v10.4
CopyCStringToPascal Deprecated in Mac OS X v10.4
CopyPascalStringToC Deprecated in Mac OS X v10.4
p2cstr Deprecated in Mac OS X v10.4
P2CStr Deprecated in Mac OS X v10.4
p2cstrcpy Deprecated in Mac OS X v10.4
GetIndString Deprecated in Mac OS X v10.4
GetString Deprecated in Mac OS X v10.4
NewString Deprecated in Mac OS X v10.4
SetString Deprecated in Mac OS X v10.4
LanguageOrder Deprecated in Mac OS X v10.4
ScriptOrder Deprecated in Mac OS X v10.4
StringOrder Deprecated in Mac OS X v10.4
TextOrder Deprecated in Mac OS X v10.4
CompareString Deprecated in Mac OS X v10.4
CompareText Deprecated in Mac OS X v10.4
RelString Deprecated in Mac OS X v10.4
relstring Deprecated in Mac OS X v10.4
LowercaseText Deprecated in Mac OS X v10.4
StripDiacritics Deprecated in Mac OS X v10.4
UppercaseStripDiacritics Deprecated in Mac OS X v10.4
UppercaseText Deprecated in Mac OS X v10.4
UpperString Deprecated in Mac OS X v10.4
upperstring Deprecated in Mac OS X v10.4
Munger
ReplaceText Deprecated in Mac OS X v10.4
FormatRecToString Deprecated in Mac OS X v10.4
StringToFormatRec Deprecated in Mac OS X v10.4
FindScriptRun Deprecated in Mac OS X v10.4
FindWordBreaks Deprecated in Mac OS X v10.4
DisposeIndexToStringUPP Deprecated in Mac OS X v10.4
InvokeIndexToStringUPP Deprecated in Mac OS X v10.4
NewIndexToStringUPP Deprecated in Mac OS X v10.4
TypeSelectClear Deprecated in Mac OS X v10.4
TypeSelectCompare Deprecated in Mac OS X v10.4
TypeSelectFindItem Deprecated in Mac OS X v10.4
TypeSelectNewKey Deprecated in Mac OS X v10.4
Searches text for a specified string pattern and replaces it with another string.
long Munger ( Handle h, long offset, const void *ptr1, long len1, const void *ptr2, long len2 );
h
A handle to the text string that is being manipulated.
offset
The byte offset in the destination string at which Munger begins its operation.
ptr1
A pointer to the first character in the string for which Munger is searching.
len1
The number of bytes in the string for which Munger is searching.
ptr2
A pointer to the first character in the substitution string.
len2
The number of bytes in the substitution string.
Returns a negative value if Munger cannot find the designated string.
Munger manipulates bytes in a string to which you specify a handle in the h parameter. The manipulation begins at a byte offset, specified in offset, in the string. Munger searches for the string specified by ptr1 and len1; when it finds an instance of that string, it replaces it with the substitution string, which is specified by ptr2 and len2.
Munger operates on a byte-by-byte basis, which can produce inappropriate results for 2-byte script systems. The ReplaceText function works properly for all languages. You are encouraged to use ReplaceText instead of Munger whenever possible.
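The 2-byte hazard can be illustrated with a minimal sketch that does not use Text Utilities at all. It assumes Shift-JIS text, where lead bytes fall in the ranges 0x81-0x9F and 0xE0-0xFC; the helper names find_byte and find_char_sjis are hypothetical. A byte-wise search for the backslash byte 0x5C reports a match inside the 2-byte character 0x95 0x5C, whereas a search that steps over whole characters does not.

```c
/* Nonzero if c is the first byte of a 2-byte Shift-JIS character. */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-wise search, in the style of Munger: returns the first offset
   at which the byte `target` occurs, or -1 if it does not occur. */
long find_byte(const unsigned char *s, long len, unsigned char target)
{
    long i;
    for (i = 0; i < len; i++)
        if (s[i] == target)
            return i;
    return -1;
}

/* Character-aware search: skips both bytes of every 2-byte character,
   so a trail byte can never be mistaken for a 1-byte character. */
long find_char_sjis(const unsigned char *s, long len, unsigned char target)
{
    long i = 0;
    while (i < len) {
        if (is_sjis_lead(s[i])) {
            i += 2;                 /* step over lead byte and trail byte */
            continue;
        }
        if (s[i] == target)
            return i;
        i++;
    }
    return -1;
}
```

For the bytes 0x95 0x5C 'A' 0x5C, find_byte reports offset 1 (the trail byte of the first character), while find_char_sjis reports offset 3 (the real backslash).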
Munger takes special action if either of the specified pointer values is NULL or if either of the length values is 0.
If ptr1 is NULL, Munger replaces characters without searching. It replaces len1 characters starting at the offset location with the substitution string.
If ptr1 is NULL and len1 is negative, Munger replaces all of the characters from the offset location to the end of the string with the substitution string.
If len1 is 0, Munger inserts the substitution string without replacing anything. Munger inserts the string at the offset location and returns the offset of the first byte past where the insertion occurred.
If ptr2 is NULL, Munger searches but does not replace. In this case, Munger returns the offset at which the string was found.
If len2 is 0 and ptr2 is not NULL, Munger searches and deletes. In this case, Munger returns the offset at which it deleted.
If the portion of the string from the offset location to its end matches the beginning of the string that Munger is searching for, Munger replaces that portion with the substitution string.
Be careful not to specify an offset with a value that is greater than the length of the destination string. Unpredictable results may occur.
Munger calls the GetHandleSize and SetHandleSize functions to access or modify the length of the string it is manipulating.
Munger may move memory; your application should not call this function at interrupt time.
The destination string must be in a relocatable block that was allocated by the Memory Manager.
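The search, replace, insert, and delete cases described above can be modeled in plain C. The following is a sketch under stated assumptions, not Apple's implementation: ByteBuf, splice, and munger_sketch are hypothetical names, a malloc-allocated buffer stands in for the relocatable block, and the value returned after an ordinary replacement (the first byte past the new text) is an assumption chosen to agree with the documented insertion case. The partial-match rule for the end of the string is deliberately omitted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a relocatable Memory Manager block. */
typedef struct {
    char *data;
    long  size;
} ByteBuf;

/* Replace oldLen bytes at position `at` with newLen bytes from repl.
   Returns 0 on success, -1 if the buffer cannot grow. */
static int splice(ByteBuf *b, long at, long oldLen, const void *repl, long newLen)
{
    long newSize = b->size - oldLen + newLen;
    long tail = b->size - at - oldLen;      /* bytes after the replaced span */

    if (newSize > b->size) {
        char *p = (char *)realloc(b->data, (size_t)newSize);
        if (p == NULL)
            return -1;
        b->data = p;
    }
    memmove(b->data + at + newLen, b->data + at + oldLen, (size_t)tail);
    if (newLen > 0)
        memcpy(b->data + at, repl, (size_t)newLen);
    b->size = newSize;
    return 0;
}

long munger_sketch(ByteBuf *b, long offset,
                   const void *ptr1, long len1,
                   const void *ptr2, long len2)
{
    long at = offset;

    /* Offsets past the end are documented as unpredictable; reject them here. */
    if (offset < 0 || offset > b->size)
        return -1;

    if (ptr1 == NULL) {
        /* No search: act on len1 bytes at offset, or on the rest if len1 < 0. */
        if (len1 < 0 || len1 > b->size - offset)
            len1 = b->size - offset;
    } else if (len1 > 0) {
        /* Byte-wise search for the target, starting at offset. */
        while (at + len1 <= b->size &&
               memcmp(b->data + at, ptr1, (size_t)len1) != 0)
            at++;
        if (at + len1 > b->size)
            return -1;                      /* target not found */
    } else {
        len1 = 0;                           /* pure insertion at offset */
    }

    if (ptr2 == NULL)
        return at;                          /* search only: report the position */
    if (len2 < 0)
        len2 = 0;
    if (splice(b, at, len1, ptr2, len2) != 0)
        return -1;
    return at + len2;                       /* first byte past the new text */
}
```

With a buffer holding "the quick fox", replacing "quick" with "slow" leaves "the slow fox"; a later call with ptr2 set to NULL and the target "fox" only reports where "fox" starts.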
TextUtils.h
Defines a pointer to your index-to-string callback function that retrieves the string associated with an index value.
Not recommended.
typedef Boolean (*IndexToStringProcPtr) ( short item, ScriptCode * itemsScript, StringPtr * itemsStringPtr, void * yourDataPtr );
If you name your function MyIndexToStringProc, you would declare it like this:
Boolean MyIndexToStringProc ( short item, ScriptCode * itemsScript, StringPtr * itemsStringPtr, void * yourDataPtr );
item
The index value for which the TypeSelect function requests a string.
itemsScript
The script code of the string specified by itemsStringPtr.
itemsStringPtr
On return, points to the string that matches the index specified by the item parameter.
yourDataPtr
A pointer to your data structure. This is passed to your index-to-string callback, and can be NULL, depending on how you implement your callback function.
Returns true if a string matching that index value was found; false otherwise.
The use of this function is not recommended in a Unicode-based application. If you want to use this function in an application that uses the Unicode character set, you must first convert Unicode text strings to Macintosh encoded Pascal text strings. You must also provide the encoding type or be able to determine it by extracting it from the text or by examining the system or keyboard script.
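As an illustration of the callback contract, here is a minimal sketch of one possible implementation. The typedefs are stand-ins so that the example compiles without the Carbon headers, the MyItemList structure is hypothetical, and treating item as a zero-based index is an assumption made for the example only.

```c
#include <stddef.h>

/* Stand-in typedefs so the sketch compiles without the Carbon headers. */
typedef unsigned char  Boolean;
typedef short          ScriptCode;
typedef unsigned char *StringPtr;

/* Hypothetical application data handed to the callback via yourDataPtr. */
typedef struct {
    short      count;     /* number of entries in strings */
    StringPtr *strings;   /* Pascal strings: length byte, then characters */
} MyItemList;

Boolean MyIndexToStringProc(short item, ScriptCode *itemsScript,
                            StringPtr *itemsStringPtr, void *yourDataPtr)
{
    const MyItemList *list = (const MyItemList *)yourDataPtr;

    /* Assumption: item is treated as a zero-based index into the list. */
    if (list == NULL || item < 0 || item >= list->count)
        return 0;                       /* false: no string for this index */

    *itemsScript = 0;                   /* 0 denotes the Roman script system */
    *itemsStringPtr = list->strings[item];
    return 1;                           /* true: a matching string was found */
}
```

The caller owns the strings; the callback only hands back a pointer to an existing Pascal string and reports whether the index was valid.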
TypeSelect.h
Contains information used to determine the boundaries of a word.
struct BreakTable {
char charTypes[256];
short tripleLength;
short triples[1];
};
typedef struct BreakTable BreakTable;
typedef BreakTable * BreakTablePtr;
You can supply a BreakTable as a parameter to the function FindWordBreaks.
TextUtils.h
Defines a data type used to access entries in a triple integer array.
typedef SInt8 FormatClass;
Each of the three FVector entries in a triple integer array is accessed by one of the values of the FormatClass type. See FVector for more information.
NumberFormatting.h
Defines a data type used to denote the confidence level for a conversion.
typedef short FormatStatus;
A FormatStatus value is returned by the functions ExtendedToString, StringToExtended, FormatRecToString, and StringToFormatRec.
NumberFormatting.h
Contains position and length information for one portion of a formatted numeric string.
struct FVector {
short start;
short length;
};
typedef struct FVector FVector;
typedef FVector TripleInt[3];
start
The starting byte position in the string of the specification information.
length
The number of bytes used in the string for the specification information.
The FVector data structure is used in the TripleInt array.
NumberFormatting.h
Defines a universal procedure pointer to an index-to-string callback.
typedef IndexToStringProcPtr IndexToStringUPP;
For more information, see the description of the IndexToStringProcPtr callback function.
TypeSelect.h
Contains information used by the FindWordBreaks function to determine word boundaries.
struct NBreakTable {
SInt8 flags1;
SInt8 flags2;
short version;
short classTableOff;
short auxCTableOff;
short backwdTableOff;
short forwdTableOff;
short doBackup;
short length;
char charTypes[256];
short tables[1];
};
typedef struct NBreakTable NBreakTable;
typedef NBreakTable * NBreakTablePtr;
flags1
The high-order byte of the break table format flags. If the high-order bit of this byte is set to 1, this break table is in the format used by FindWordBreaks.
flags2
The low-order byte of the break table format flags. If the value in this byte is 0, the break table belongs to a 1-byte script system; in this case FindWordBreaks does not check for 2-byte characters.
version
The version of this break table.
classTableOff
The offset in bytes from the beginning of the break table to the beginning of the class table.
auxCTableOff
The offset in bytes from the beginning of the break table to the beginning of the auxiliary class table.
backwdTableOff
The offset in bytes from the beginning of the break table to the beginning of the backward-processing table.
forwdTableOff
The offset in bytes from the beginning of the break table to the beginning of the forward-processing table.
doBackup
The minimum byte offset into the buffer for doing backward processing. If the selected character for FindWordBreaks has a byte offset less than doBackup, FindWordBreaks skips backward processing altogether and starts from the beginning of the buffer.
length
The length in bytes of the entire break table, including the individual tables.
charTypes
The class table.
tables
The data of the auxiliary class table, backward table, and forward table.
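The flag tests and byte offsets described for these fields can be sketched as follows. This is a minimal illustration, not code from the headers: the structure is repeated so that the example stands alone, SInt8 is given a stand-in typedef, and the helper names prefixed with nbt_ are hypothetical.

```c
#include <stddef.h>

typedef signed char SInt8;   /* stand-in for the Carbon typedef */

struct NBreakTable {
    SInt8 flags1;
    SInt8 flags2;
    short version;
    short classTableOff;
    short auxCTableOff;
    short backwdTableOff;
    short forwdTableOff;
    short doBackup;
    short length;
    char  charTypes[256];
    short tables[1];
};
typedef struct NBreakTable NBreakTable;

/* Nonzero if the high-order bit of flags1 is set, which marks the
   table as being in the format used by FindWordBreaks. */
int nbt_is_find_word_breaks_format(const NBreakTable *t)
{
    return (t->flags1 & 0x80) != 0;
}

/* Nonzero if flags2 is 0, meaning a 1-byte script system. */
int nbt_is_single_byte_script(const NBreakTable *t)
{
    return t->flags2 == 0;
}

/* Resolve one of the ...Off fields: each is a byte offset measured
   from the beginning of the break table. */
const void *nbt_sub_table(const NBreakTable *t, short byteOffset)
{
    return (const char *)t + byteOffset;
}
```

For example, nbt_sub_table(t, t->forwdTableOff) yields the address at which the forward-processing table begins.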
The tables have this format and content:
The class table is an array of 256 signed bytes. Offsets into the table represent byte values; if the entry at a given offset in the table is positive, it means that a byte whose value equals that offset is a single-byte character, and the entry at that offset is the class number for the character. If the entry is negative, it means that the byte is the first byte of a 2-byte character code, and the auxiliary class table must be used to determine the character class. Odd negative numbers are handled differently from even negative numbers.
The auxiliary class table assigns character classes to 2-byte characters. It is used when the class table determines that a byte value represents the first byte of a 2-byte character.
The auxiliary class table handles odd negative values from the class table as follows. If the first word of the auxiliary class table is equal to or greater than zero, it represents the default class number for 2-byte character codes—the class assigned to every odd negative value from the class table. If the first word is less than zero, it is the negative of the offset from the beginning of the auxiliary class table to a first-byte class table (a table of 2-byte character classes that can be determined from just the first byte). The value from the class table is negated, 1 is subtracted from it to obtain an even offset, and the value at that offset into the first-byte class table is the class to be assigned.
The auxiliary class table handles even negative values from the class table as follows. The auxiliary class table begins with a variable-length word array. The words that follow the first word are offsets to row tables. Row tables have the same format as the class table, but are used to map the second byte of a 2-byte character code to a class number. If the entry in the class table for a given byte is an even negative number, FindWordBreaks negates this value to obtain the offset from the beginning of the auxiliary class table to the appropriate word, which in turn contains an offset to the appropriate row table. That row table is then used to map the second byte of the character to a class number.
The backward-processing table is a state table used by FindWordBreaks for backward searching. Using the backward-processing table, FindWordBreaks starts at a specified character, moving backward as necessary until it encounters a word boundary.
The forward-processing table is a state table used by FindWordBreaks for forward searching. Using the forward-processing table, FindWordBreaks starts at one word boundary and moves forward until it encounters another word boundary.
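The first step of the class lookup described above, reading the class table and deciding which rule applies, can be sketched as follows. The names ClassLookup and class_table_lookup are hypothetical, and treating an entry of zero like a positive entry is an assumption. The sketch stops before reading the auxiliary class table, because the size of its entries is not specified here; it only computes the offsets that the text describes.

```c
typedef enum {
    kSingleByteClass,   /* *value receives the character class */
    kOddNegative,       /* *value receives -entry - 1, the even offset into
                           the first-byte class table */
    kEvenNegative       /* *value receives -entry, the offset from the start
                           of the auxiliary class table */
} ClassLookup;

ClassLookup class_table_lookup(const char charTypes[256],
                               unsigned char byte, int *value)
{
    /* The class table holds signed bytes, but plain char may be unsigned
       on some platforms, so convert explicitly. */
    int entry = (signed char)charTypes[byte];

    if (entry >= 0) {                   /* single-byte character */
        *value = entry;
        return kSingleByteClass;
    }
    if ((-entry) % 2 != 0) {            /* odd negative value */
        *value = -entry - 1;
        return kOddNegative;
    }
    *value = -entry;                    /* even negative value */
    return kEvenNegative;
}
```

An entry of -3 is therefore reported as kOddNegative with an offset of 2, and an entry of -4 as kEvenNegative with an offset of 4.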
TextUtils.h
Contains data that represents the internal number formatting specification.
struct NumFormatString {
UInt8 fLength;
UInt8 fVersion;
char data[254];
};
typedef struct NumFormatString NumFormatString;
typedef NumFormatString NumFormatStringRec;
fLength
The number of bytes in the data actually used for this number formatting specification.
fVersion
The version number of the number formatting specification.
data
The data that comprises the number formatting specification.
NumberFormatting.h
Defines an internal numeric representation that is independent of region, language, and other multicultural considerations.
typedef NumFormatString NumFormatStringRec;
To allow for all of the international variations in numeric presentation styles, you need to include in your function calls a number parts table from a tokens ('itl4') resource. You can usually use the number parts table in the standard tokens resource that is supplied with the system. You also need to define the format of input and output numeric strings, including which characters (if any) to use as thousand separators, whether to indicate negative values with a minus sign or by enclosing the number in parentheses, and how to display zero values.
To make it possible to map a number that was formatted for one specification into another format, the Mac OS defines an internal numeric representation that is independent of region, language, and other multicultural considerations: the NumFormatStringRec structure. This structure is created from a number format specification string that defines the appearance of numeric strings.
Four of the numeric string functions use the number formatting specification, defined by the NumFormatStringRec data type: StringToFormatRec, FormatRecToString, StringToExtended, and ExtendedToString. The number format specification structure contains the data that represents the internal number formatting specification information. This data is stored in a private format.
NumberFormatting.h
Contains script-specific information for a script run.
struct ScriptRunStatus {
SInt8 script;
SInt8 runVariant;
};
typedef struct ScriptRunStatus ScriptRunStatus;
script
The script code of the subscript run. Zero indicates the Roman script system.
runVariant
Script-specific information about the run, in the same format as that returned by the CharacterType function. This information includes the type of subscript, for example Kanji, Katakana, or Hiragana for a Japanese script system.
The FindScriptRun function finds a run of subscript text in a string and, when it completes, returns a script run status structure, defined by the ScriptRunStatus data type.
TextUtils.h
Defines a data type used to return the position and length information for three different portions of a formatted numeric string.
typedef FVector TripleInt[3];
The FormatRecToString function uses the triple-integer array, defined by the TripleInt data type, to return the starting position and length in a string of three different portions of a formatted numeric string: the positive value string, the negative value string, and the zero value string. Each element of the triple integer array is an FVector structure. Each of the three FVector entries in the triple integer array is accessed by one of the values of the FormatClass type.
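To show how the three FVector entries might be consumed, here is a minimal sketch that copies one portion of a string out by its FormatClass index. The types are repeated so that the example stands alone; copy_format_part is a hypothetical helper, and the sketch assumes that start is a zero-based byte offset into the text buffer, which this reference does not state.

```c
#include <string.h>

struct FVector {
    short start;
    short length;
};
typedef struct FVector FVector;
typedef FVector TripleInt[3];

enum { fPositive = 0, fNegative = 1, fZero = 2 };

/* Copy the portion selected by `which` into out as a C string.
   Returns the number of bytes copied, or -1 if the index is invalid,
   the span lies outside the text, or out is too small. */
long copy_format_part(const char *text, long textLen, const FVector *parts,
                      int which, char *out, long outSize)
{
    const FVector *v;

    if (which < fPositive || which > fZero)
        return -1;
    v = &parts[which];
    if (v->start < 0 || v->length < 0 ||
        (long)v->start + v->length > textLen || v->length >= outSize)
        return -1;

    memcpy(out, text + v->start, (size_t)v->length);
    out[v->length] = '\0';
    return v->length;
}
```

A TripleInt can be passed directly as the parts argument, because the array decays to a pointer to its first FVector.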
NumberFormatting.h
Contains a buffer of keystrokes, the script code associated with the keystrokes, and timer information.
struct TypeSelectRecord {
unsigned long tsrLastKeyTime;
ScriptCode tsrScript;
Str63 tsrKeyStrokes;
};
typedef struct TypeSelectRecord TypeSelectRecord;
tsrLastKeyTime
A value that indicates timeout information.
tsrScript
A script code.
tsrKeyStrokes
The keystroke buffer.
The TypeSelectRecord data structure is passed as a parameter to the functions TypeSelectNewKey, TypeSelectFindItem, TypeSelectCompare, and TypeSelectClear.
TypeSelect.h
Specify values that can be returned in the low byte of a format status (FormatStatus) value.
enum {
fFormatOK = 0,
fBestGuess = 1,
fOutOfSynch = 2,
fSpuriousChars = 3,
fMissingDelimiter = 4,
fExtraDecimal = 5,
fMissingLiteral = 6,
fExtraExp = 7,
fFormatOverflow = 8,
fFormStrIsNAN = 9,
fBadPartsTable = 10,
fExtraPercent = 11,
fExtraSeparator = 12,
fEmptyFormatString = 13
};
typedef SInt8 FormatResultType;
fFormatOK
Specifies format is okay.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fBestGuess
Specifies the format is the best guess.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fOutOfSynch
Specifies the format is out of sync.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fSpuriousChars
Specifies the format contains spurious characters.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fMissingDelimiter
Specifies a missing delimiter.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fExtraDecimal
Specifies the format contains an extra decimal sign.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fMissingLiteral
Specifies the format is missing a literal.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fExtraExp
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fFormatOverflow
Specifies a format overflow.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fFormStrIsNAN
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fBadPartsTable
Specifies the parts table is bad.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fExtraPercent
Specifies the format contains an extra percent sign.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fExtraSeparator
Specifies an extra separator.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fEmptyFormatString
Specifies the format string is empty.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
A format result type is returned in the low byte of a format status (FormatStatus) value. A FormatStatus value is returned by the functions ExtendedToString, StringToExtended, FormatRecToString, and StringToFormatRec. A format status value denotes the confidence level for a conversion.
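Extracting the format result type from the low byte of a format status can be sketched as follows. The typedefs and constants are repeated from this reference so that the example stands alone; format_result and format_result_name are hypothetical helpers, and nothing is assumed about the contents of the high byte.

```c
typedef short       FormatStatus;       /* as declared in this reference */
typedef signed char FormatResultType;   /* SInt8 in this reference */

enum {
    fFormatOK = 0, fBestGuess = 1, fOutOfSynch = 2, fSpuriousChars = 3,
    fMissingDelimiter = 4, fExtraDecimal = 5, fMissingLiteral = 6,
    fExtraExp = 7, fFormatOverflow = 8, fFormStrIsNAN = 9,
    fBadPartsTable = 10, fExtraPercent = 11, fExtraSeparator = 12,
    fEmptyFormatString = 13
};

/* Keep only the low byte of the status, where the result type lives. */
FormatResultType format_result(FormatStatus status)
{
    return (FormatResultType)(status & 0xFF);
}

/* Map the low byte to the constant's name, for logging or debugging. */
const char *format_result_name(FormatStatus status)
{
    static const char *const names[] = {
        "fFormatOK", "fBestGuess", "fOutOfSynch", "fSpuriousChars",
        "fMissingDelimiter", "fExtraDecimal", "fMissingLiteral",
        "fExtraExp", "fFormatOverflow", "fFormStrIsNAN",
        "fBadPartsTable", "fExtraPercent", "fExtraSeparator",
        "fEmptyFormatString"
    };
    int r = status & 0xFF;

    return (r <= fEmptyFormatString) ? names[r] : "unknown";
}
```

A caller would typically compare format_result(status) against fFormatOK and fBestGuess to decide whether to accept a conversion.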
Specify an index for a TripleInt array.
enum {
fPositive = 0,
fNegative = 1,
fZero = 2
};
fPositive
Specifies the positive value string.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fNegative
Specifies the negative value string.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
fZero
Specifies the zero value string.
Available in Mac OS X v10.0 and later.
Declared in NumberFormatting.h
See TripleInt for more information.
Specifies the first version of the NumFormatString data structure.
enum {
fVNumber = 0
};
See NumFormatString for more information.
Specify implicit language codes.
enum {
systemCurLang = -2,
systemDefLang = -3,
currentCurLang = -4,
currentDefLang = -5,
scriptCurLang = -6,
scriptDefLang = -7
};
systemCurLang
Specifies the current language for system script (from 'itlb').
Available in Mac OS X v10.0 and later.
Declared in StringCompare.h
systemDefLang
Specifies the default language for system script (from 'itlm').
Available in Mac OS X v10.0 and later.
Declared in StringCompare.h
currentCurLang
Specifies the current language for current script (from 'itlb').
Available in Mac OS X v10.0 and later.
Declared in StringCompare.h
currentDefLang
Specifies the default language for current script (from 'itlm').
Available in Mac OS X v10.0 and later.
Declared in StringCompare.h
scriptCurLang
Specifies the current language for specified script (from 'itlb').
Available in Mac OS X v10.0 and later.
Declared in StringCompare.h
scriptDefLang
Specifies the default language for specified script (from 'itlm').
Available in Mac OS X v10.0 and later.
Declared in StringCompare.h
The functions LanguageOrder, StringOrder, and TextOrder accept as parameters implicit language codes listed here, as well as explicit language codes.
Contains type-select code information.
typedef SInt16 TSCode;
enum {
tsPreviousSelectMode = -1,
tsNormalSelectMode = 0,
tsNextSelectMode = 1
};
tsPreviousSelectMode
Specifies previous-select mode.
Available in Mac OS X v10.0 and later.
Declared in TypeSelect.h
tsNormalSelectMode
Specifies normal-select mode.
Available in Mac OS X v10.0 and later.
Declared in TypeSelect.h
tsNextSelectMode
Specifies next-select mode.
Available in Mac OS X v10.0 and later.
Declared in TypeSelect.h
A TSCode value is passed as a parameter to the function TypeSelectFindItem.
Specify language code values that are no longer used.
enum {
iuSystemCurLang = systemCurLang,
iuSystemDefLang = systemDefLang,
iuCurrentCurLang = currentCurLang,
iuCurrentDefLang = currentDefLang,
iuScriptCurLang = scriptCurLang,
iuScriptDefLang = scriptDefLang
};
Last updated: 2007-05-29