A tokenizer that segments natural language text into semantic units.


class NLTokenizer : NSObject


NLTokenizer segments natural language text into individual units. Specify the unit of tokenization (word, sentence, paragraph, or document, as defined by NLTokenUnit) when you create the tokenizer, and then assign the string to tokenize. enumerateTokens(in:using:) reports the range of each token in the string for the chosen unit.
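The workflow described above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the sample text and the variable names (text, words) are assumptions chosen for the example.

```swift
import NaturalLanguage

// Sample text is an assumption for illustration.
let text = "The quick brown fox jumps over the lazy dog."

// 1. Choose the unit of tokenization when creating the tokenizer.
let tokenizer = NLTokenizer(unit: .word)

// 2. Assign the string to tokenize.
tokenizer.string = text

// 3. Enumerate the token ranges over the whole string and slice out each token.
var words: [String] = []
tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, _ in
    words.append(String(text[tokenRange]))
    return true // keep enumerating
}

print(words)
```

The tokenizer reports ranges rather than substrings, so the caller decides whether to copy each token (as here) or work with the ranges directly.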


Creating a Tokenizer

init(unit: NLTokenUnit)

Creates a tokenizer with the specified unit.

Configuring a Tokenizer

var string: String?

The text to be tokenized.

func setLanguage(NLLanguage)

Sets the language of the text to be tokenized.
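As a hedged sketch, setting the language explicitly might look like the following. The French sample sentence is an assumption, and the example sets the language after assigning the string, which is one reasonable ordering rather than a documented requirement.

```swift
import NaturalLanguage

// Sample text is an assumption for illustration.
let text = "Le renard brun saute par-dessus le chien paresseux."

let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Tell the tokenizer which language the text is written in.
tokenizer.setLanguage(.french)

// Collect the word ranges and slice the original string with them.
let ranges = tokenizer.tokens(for: text.startIndex..<text.endIndex)
let words = ranges.map { String(text[$0]) }

print(words)
```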

var unit: NLTokenUnit

The linguistic unit that this tokenizer uses.

struct NLTokenizer.Attributes

Hints about the contents of the string for the tokenizer.

Enumerating the Tokens

func enumerateTokens(in: Range<String.Index>, using: (Range<String.Index>, NLTokenizer.Attributes) -> Bool)

Enumerates the tokens in the given range of the string and calls the specified block for each token. Return true from the block to continue enumerating, or false to stop.
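The sketch below illustrates the two block parameters and the Boolean return value: it inspects each token's NLTokenizer.Attributes and stops once a fixed number of tokens has been collected. The sample text, the limit of 3, and the firstTokens name are assumptions for illustration.

```swift
import NaturalLanguage

// Sample text is an assumption for illustration.
let text = "Order 66 shipped on day 12 of the sale."

let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Hypothetical limit: stop after collecting this many tokens.
let limit = 3
var firstTokens: [(text: String, isNumeric: Bool)] = []

tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, attributes in
    // The attributes hint at the token's contents, for example whether it is numeric.
    firstTokens.append((text: String(text[tokenRange]),
                        isNumeric: attributes.contains(.numeric)))
    // Returning false ends the enumeration early.
    return firstTokens.count < limit
}

for token in firstTokens {
    print(token.text, token.isNumeric ? "(numeric)" : "")
}
```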

func tokens(for: Range<String.Index>) -> [Range<String.Index>]

Tokenizes the string within the provided range.
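When early exit is not needed, collecting every range in one call can be simpler than enumerating. The following is a minimal sketch using the sentence unit; the sample text is an assumption, and trimming whitespace from each sentence is a presentation choice for the example rather than something the tokenizer requires.

```swift
import Foundation
import NaturalLanguage

// Sample text is an assumption for illustration.
let text = "Tokenizers split text. They report ranges. You slice the string."

let tokenizer = NLTokenizer(unit: .sentence)
tokenizer.string = text

// One call returns the range of every sentence in the requested span.
let sentenceRanges = tokenizer.tokens(for: text.startIndex..<text.endIndex)

// Slice the original string and trim surrounding whitespace for display.
let sentences = sentenceRanges.map {
    text[$0].trimmingCharacters(in: .whitespacesAndNewlines)
}

for sentence in sentences {
    print(sentence)
}
```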

func tokenRange(at: String.Index) -> Range<String.Index>

Finds the range of the token at the given index.
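One plausible use is finding the word under a cursor position. In this sketch the sample text and the cursor offset are assumptions chosen so that the index falls inside the second word.

```swift
import NaturalLanguage

// Sample text is an assumption for illustration.
let text = "Natural language processing"

let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Hypothetical cursor position: offset 10 falls inside the second word.
let cursor = text.index(text.startIndex, offsetBy: 10)

// Ask the tokenizer for the range of the token that covers the cursor.
let wordRange = tokenizer.tokenRange(at: cursor)

print(text[wordRange])
```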


Inherits From

NSObject

See Also


Tokenizing Natural Language Text

Enumerate the words in a string.