A tokenizer that segments natural language text into semantic units.


class NLTokenizer : NSObject


NLTokenizer splits natural language text into individual units. Create a tokenizer for the unit you want (word, sentence, paragraph, or document, as defined by NLTokenUnit), then assign the string to tokenize. enumerateTokens(in:using:) reports the range of each token in the string for that unit.
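The flow described above can be sketched as follows. This is a minimal illustration, not an official sample: the sample sentence and the `words` array are assumptions introduced for the example.

```swift
import NaturalLanguage

// Sample text (an assumption for this sketch).
let text = "The quick brown fox jumps over the lazy dog."

// Create a tokenizer for word units, then assign the string to tokenize.
let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Collect the substring for each token range the tokenizer reports.
var words: [String] = []
tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, _ in
    words.append(String(text[tokenRange]))
    return true // keep enumerating
}

print(words)
```

The tokenizer reports ranges rather than strings, so each token is recovered by subscripting the original string with the range.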


Creating a Tokenizer

init(unit: NLTokenUnit)

Creates a tokenizer with the specified unit.

Configuring a Tokenizer

var string: String?

The text to be tokenized.

func setLanguage(NLLanguage)

Sets the language of the text to be tokenized.
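As a hedged sketch, setting the language explicitly might look like this. The Japanese sample sentence and the `tokens` array are assumptions made for illustration, and the exact token boundaries depend on the system's language support.

```swift
import NaturalLanguage

// Sample text without spaces between words (an assumption for this sketch).
let text = "今日は良い天気です。"

let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Tell the tokenizer which language the text is written in.
tokenizer.setLanguage(.japanese)

// Map each reported range back to its substring.
let tokens = tokenizer
    .tokens(for: text.startIndex..<text.endIndex)
    .map { String(text[$0]) }

print(tokens)
```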

var unit: NLTokenUnit

The linguistic unit that this tokenizer uses.

struct NLTokenizer.Attributes

Hints about the contents of the string for the tokenizer.

Enumerating the Tokens

func enumerateTokens(in: Range<String.Index>, using: (Range<String.Index>, NLTokenizer.Attributes) -> Bool)

Enumerates the tokens in the given range of the string, calling the specified block with the range and attributes of each token. Return false from the block to stop the enumeration.
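A minimal sketch of the block's two parameters and its Boolean return value, assuming the goal is to inspect only the first three tokens. The sample text and the `firstTokens` array are hypothetical; `.numeric` is one of the options declared by NLTokenizer.Attributes.

```swift
import NaturalLanguage

// Sample text (an assumption for this sketch).
let text = "Flight 370 departs at 9 tomorrow morning."

let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Record each token together with whether the tokenizer flagged it as numeric.
var firstTokens: [(text: String, isNumeric: Bool)] = []
tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, attributes in
    firstTokens.append((String(text[tokenRange]), attributes.contains(.numeric)))
    // Returning false ends the enumeration once three tokens are collected.
    return firstTokens.count < 3
}

for token in firstTokens {
    print(token.text, token.isNumeric)
}
```

Stopping early this way avoids tokenizing the rest of a long string when only a prefix is needed.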

func tokens(for: Range<String.Index>) -> [Range<String.Index>]

Tokenizes the string within the provided range.
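When all ranges are wanted at once, this method can be simpler than enumerating. A short sketch, with a sample paragraph and a `sentences` array that are assumptions for the example:

```swift
import NaturalLanguage

// Sample text (an assumption for this sketch).
let text = "Tokenizers split text. They return ranges. You map ranges to substrings."

// Use sentence units instead of words.
let tokenizer = NLTokenizer(unit: .sentence)
tokenizer.string = text

// Get every sentence range in one call, then convert ranges to strings.
let ranges = tokenizer.tokens(for: text.startIndex..<text.endIndex)
let sentences = ranges.map { String(text[$0]) }

for sentence in sentences {
    print(sentence)
}
```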

func tokenRange(at: String.Index) -> Range<String.Index>

Finds the range of the token at the given index.
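One possible use is finding the word under a cursor or tap position. The sketch below assumes a short sample string and picks an index that falls inside its second word; the `index` and `word` names are introduced only for illustration.

```swift
import NaturalLanguage

// Sample text (an assumption for this sketch).
let text = "Natural language processing"

let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Offset 10 lands on the "n" inside the second word, "language".
let index = text.index(text.startIndex, offsetBy: 10)

// Ask for the range of the token that contains that index.
let wordRange = tokenizer.tokenRange(at: index)
let word = String(text[wordRange])

print(word)
```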


See Also


Tokenizing Natural Language Text

Enumerate the words in a string.

Beta Software

This documentation contains preliminary information about an API or technology in development. This information is subject to change, and software implemented according to this documentation should be tested with final operating system software.
