Enumeration

Tokenization Modifiers

Tokenization options are used with CFStringTokenizerCreate to specify how a string should be tokenized.

Declaration

enum : CFOptionFlags {
    ...
};

Overview

You use the tokenization unit options with CFStringTokenizerCreate to specify how a string should be tokenized.

You use the modifiers together with a tokenization unit to modify the way the string is tokenized.

You use the attribute specifiers to tell the tokenizer to prepare the specified attribute when it tokenizes the given string. You can retrieve the attribute value by calling CFStringTokenizerCopyCurrentTokenAttribute with one of the attribute options.

The locale sensitivity of the tokenization unit options may change in a future release.

Topics

Constants

kCFStringTokenizerUnitWord

Specifies that a string should be tokenized by word. The locale parameter of CFStringTokenizerCreate is ignored.

kCFStringTokenizerUnitSentence

Specifies that a string should be tokenized by sentence. The locale parameter of CFStringTokenizerCreate is ignored.

kCFStringTokenizerUnitParagraph

Specifies that a string should be tokenized by paragraph. The locale parameter of CFStringTokenizerCreate is ignored.

kCFStringTokenizerUnitLineBreak

Specifies that a string should be tokenized by line break. The locale parameter of CFStringTokenizerCreate is ignored.

kCFStringTokenizerUnitWordBoundary

Specifies that a string should be tokenized by locale-sensitive word boundary.

kCFStringTokenizerAttributeLatinTranscription

When used with kCFStringTokenizerUnitWord, tells the tokenizer to prepare the Latin transcription when it tokenizes the string.

kCFStringTokenizerAttributeLanguage

Tells the tokenizer to prepare the language (specified as an RFC 3066bis string) when it tokenizes the string.