Enumerate the words in a string.
When you work with natural language text, it’s often useful to tokenize the text into individual words. Using NLTokenizer to enumerate words, rather than simply splitting the string on whitespace, ensures correct behavior across multiple scripts and languages. For example, neither Chinese nor Japanese uses spaces to delimit words.
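To make the contrast concrete, here is a minimal sketch that runs both approaches on a Chinese sentence. The sample sentence and the names `chinese`, `bySpaces`, and `words` are assumptions chosen for illustration; the exact segmentation the tokenizer returns may vary by OS version.

```swift
import Foundation
import NaturalLanguage

// Hypothetical sample: a Chinese sentence with no spaces between words.
let chinese = "我喜欢学习自然语言处理"

// Splitting on spaces finds no delimiter, so the sentence stays in one piece.
let bySpaces = chinese.split(separator: " ")
print(bySpaces.count)

// NLTokenizer segments the same text using language-aware word boundaries.
let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = chinese
let words = tokenizer.tokens(for: chinese.startIndex..<chinese.endIndex)
    .map { String(chinese[$0]) }
print(words)
```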
The example and accompanying steps below show how you use NLTokenizer to enumerate over the words in natural language text.
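The steps below can be sketched as follows. This is a minimal sketch rather than a definitive listing: the sample text (an excerpt from the Universal Declaration of Human Rights), the closure parameter name `tokenRange`, and the `words` array are assumptions added for illustration.

```swift
import NaturalLanguage

// Assumed sample text; any natural language string works here.
let text = """
All human beings are born free and equal in dignity and rights.
They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
"""

// Create a tokenizer that splits text into words.
let tokenizer = NLTokenizer(unit: .word)

// Give the tokenizer the text to process.
tokenizer.string = text

// Collect the words as well as printing them (the array is an illustrative addition).
var words: [String] = []

// Enumerate the tokens across the entire range of the string.
tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, _ in
    // The substring at the token's range is one word.
    let word = String(text[tokenRange])
    words.append(word)
    print(word)
    return true // Return false instead to stop enumerating early.
}
```

The second closure argument, ignored here, carries attributes that describe the token.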
1. Create an instance of NLTokenizer, specifying NLTokenUnit.word as the unit to tokenize.
2. Set the string property of the tokenizer to the natural language text.
3. Enumerate over the tokens by calling the enumerateTokens(in:using:) method, specifying the entire range of the string to process.
4. In the enumeration block, take a substring of the original text at the token's range to obtain each word.
5. Run this code to print each word in text on a new line.