NSLinguisticTagger Class Reference

Inherits from
Conforms to
Framework
/System/Library/Frameworks/Foundation.framework
Availability
Available in OS X v10.7 and later.
Declared in
NSLinguisticTagger.h

Overview

The NSLinguisticTagger class is used to automatically segment natural-language text and tag it with information, such as parts of speech. It can also tag languages, scripts, stem forms of words, etc. An instance of this class is assigned a string to tag, and clients can then obtain tags and ranges for tokens in that string appropriate to a given tag scheme.
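The basic workflow is to create a tagger, assign it a string, and then ask for tags. The following is a minimal sketch of that workflow. It assumes an ARC-enabled macOS build that imports Foundation. The helper name WordTokensInString is hypothetical; the tagger calls are the ones documented in this reference.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the word tokens of `text`, in order.
NSArray<NSString *> *WordTokensInString(NSString *text) {
    // 1. Create a tagger for the broad token-type scheme only.
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeTokenType]
                                               options:0];
    // 2. Assign the string to tag.
    tagger.string = text;

    // 3. Enumerate tokens, skipping whitespace, punctuation, and other tokens.
    NSLinguisticTaggerOptions opts = NSLinguisticTaggerOmitWhitespace |
                                     NSLinguisticTaggerOmitPunctuation |
                                     NSLinguisticTaggerOmitOther;
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeTokenType
                         options:opts
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        // isEqualToString: works for tags of every scheme.
        if ([tag isEqualToString:NSLinguisticTagWord]) {
            [words addObject:[text substringWithRange:tokenRange]];
        }
    }];
    return words;
}
```

For example, WordTokensInString(@"The quick fox jumps.") should return the four words without the final period.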

Thread Safety

A given instance of NSLinguisticTagger should not be used from more than one thread simultaneously.

Tasks

Creating a Linguistic Tagger

Getting the Tag Schemes

Getting and Setting the Analyzed String

Getting and Setting Orthography

Enumerating Linguistic Tags

Determining a Sentence for a Range

Class Methods

availableTagSchemesForLanguage:

Returns the tag schemes supported by the linguistic tagger for a particular language.

+ (NSArray *)availableTagSchemesForLanguage:(NSString *)language
Parameters
language

A standard language abbreviation, such as “en” or “fr”, as used with NSOrthography.

Return Value

An array of tag schemes. See “Linguistic Tag Schemes” for the possible values.

Discussion

Use this method to find out which tag schemes an NSLinguisticTagger instance supports for a particular language. Specify the language using a standard abbreviation, as with NSOrthography.
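One typical use of this method is to check whether a scheme is available before creating a tagger with it. The following is a minimal sketch, assuming ARC and Foundation on macOS. SupportsScheme is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `scheme` is available for `language` ("en", "fr", ...).
BOOL SupportsScheme(NSString *scheme, NSString *language) {
    NSArray<NSString *> *schemes =
        [NSLinguisticTagger availableTagSchemesForLanguage:language];
    return [schemes containsObject:scheme];
}
```

A caller might test SupportsScheme(NSLinguisticTagSchemeLemma, @"en") before requesting lemmas, and fall back to NSLinguisticTagSchemeTokenType otherwise.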

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

Instance Methods

enumerateTagsInRange:scheme:options:usingBlock:

Enumerates the tokens in the specified range of the string, passing each located tag to the Block.

- (void)enumerateTagsInRange:(NSRange)range scheme:(NSString *)tagScheme options:(NSLinguisticTaggerOptions)opts usingBlock:(void (^)(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop))block
Parameters
range

The range to analyze.

tagScheme

The tag scheme.

opts

The linguistic tagger options to use. See “NSLinguisticTaggerOptions” for the constants. These constants can be combined using the C bitwise OR operator (|).

block

The Block to apply to ranges of the string.

The Block takes four arguments:

tag

The located linguistic tag.

tokenRange

The range of the token to which the tag applies.

sentenceRange

The range of the sentence in which the tag occurs.

stop

A reference to a Boolean value. The Block can set the value to YES to stop further processing of the string. The stop argument is an out-only argument. You should only ever set this Boolean to YES within the Block.

Discussion

The tagger segments the string into sentences and tokens as needed, and passes those ranges to the Block along with a tag for tagScheme, which can be any scheme in the tagger’s array of tag schemes.

This is the fundamental tagging method of NSLinguisticTagger. It iterates over all tokens intersecting a given range, passing tags and ranges to the Block. Several convenience methods cover common cases: sentenceRangeForRange: obtains a sentence range, tagAtIndex:scheme:tokenRange:sentenceRange: returns information about a single token, and tagsInRange:scheme:options:tokenRanges: returns information about all tokens intersecting a given range at once.

For example, if the tag scheme is NSLinguisticTagSchemeLexicalClass, the tags will specify the part of speech (for word tokens) or the type of whitespace or punctuation (for whitespace or punctuation tokens). If the tag scheme is NSLinguisticTagSchemeLemma, the tags will specify the stem form of the word (if known) for each word token.

Note that this method reports every token that intersects the given range, so the first and last token ranges passed to the Block may extend beyond range.
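The following sketch shows the Block’s arguments in use: it records each token with its lexical class and sets stop after a fixed number of tokens. It also illustrates that a range covering only part of a word still yields the whole token. It assumes ARC and Foundation on macOS. LexicalTags and its limit parameter are hypothetical, and the actual tags depend on the system’s linguistic data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects "token/tag" strings for the lexical-class
// scheme, stopping after `limit` tokens.
NSArray<NSString *> *LexicalTags(NSString *text, NSRange range, NSUInteger limit) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass] options:0];
    tagger.string = text;

    NSMutableArray<NSString *> *pairs = [NSMutableArray array];
    [tagger enumerateTagsInRange:range
                          scheme:NSLinguisticTagSchemeLexicalClass
                         options:NSLinguisticTaggerOmitWhitespace
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        // tokenRange covers the whole token, even if `range` only
        // overlaps part of it. A nil tag prints as "(null)".
        NSString *token = [text substringWithRange:tokenRange];
        [pairs addObject:[NSString stringWithFormat:@"%@/%@", token, tag]];
        if (pairs.count >= limit) {
            *stop = YES;  // only ever set *stop to YES
        }
    }];
    return pairs;
}
```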

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

initWithTagSchemes:options:

Creates a linguistic tagger instance using the specified tag schemes and options.

- (id)initWithTagSchemes:(NSArray *)tagSchemes options:(NSUInteger)opts
Parameters
tagSchemes

An array of tag schemes. See “Linguistic Tag Schemes” for the possible values.

opts

The linguistic tagger options to use. See “NSLinguisticTaggerOptions” for the constants. These constants can be combined using the C bitwise OR operator (|).

Return Value

An initialized linguistic tagger.
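A single tagger can be created for several schemes at once, with options combined using the bitwise OR operator. The following is a minimal sketch, assuming ARC and Foundation on macOS. MakeMultiSchemeTagger is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: a tagger that can answer queries for three schemes.
NSLinguisticTagger *MakeMultiSchemeTagger(NSString *text) {
    NSArray<NSString *> *schemes = @[NSLinguisticTagSchemeTokenType,
                                     NSLinguisticTagSchemeLexicalClass,
                                     NSLinguisticTagSchemeLemma];
    // Options are bit flags combined with |.
    NSUInteger opts = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitOther;
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:schemes options:opts];
    tagger.string = text;
    return tagger;
}
```

The tagSchemes method of the returned tagger reports the three schemes it was created with.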

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

orthographyAtIndex:effectiveRange:

Returns the orthography at the specified character index and, by reference, the range over which that orthography applies.

- (NSOrthography *)orthographyAtIndex:(NSUInteger)charIndex effectiveRange:(NSRangePointer)effectiveRange
Parameters
charIndex

The index of the character to examine.

effectiveRange

An NSRangePointer that, on return, contains the range of text that shares the orthography found at charIndex.

Return Value

The orthography for the location.
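The following sketch reads the dominant language of the orthography at an index, together with its effective range. It assumes ARC and Foundation on macOS. DominantLanguageAt is a hypothetical helper, and it tolerates a nil orthography because detection depends on the text and the system.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the dominant language (for example "en") at `index`,
// or nil if none is known; the effective range is written to *outRange.
NSString *DominantLanguageAt(NSString *text, NSUInteger index, NSRange *outRange) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeLanguage] options:0];
    tagger.string = text;

    NSRange effective = NSMakeRange(NSNotFound, 0);
    NSOrthography *orthography = [tagger orthographyAtIndex:index
                                             effectiveRange:&effective];
    if (outRange != NULL) *outRange = effective;
    return orthography.dominantLanguage;  // messaging nil yields nil
}
```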

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

possibleTagsAtIndex:scheme:tokenRange:sentenceRange:scores:

Returns an array of possible tags for the given scheme at the specified character index, along with their scores.

- (NSArray *)possibleTagsAtIndex:(NSUInteger)charIndex scheme:(NSString *)tagScheme tokenRange:(NSRangePointer)tokenRange sentenceRange:(NSRangePointer)sentenceRange scores:(NSArray **)scores
Parameters
charIndex

The index of the character to examine.

tagScheme

The tag scheme. See “Linguistic Tag Schemes” for the possible values.

tokenRange

On return, the range of the token that contains charIndex.

sentenceRange

On return, the range of the sentence that contains charIndex.

scores

On return, an array of numeric scores, wrapped as NSValue objects, indicating the likelihood of each returned tag.

Return Value

An array of possible tags for tagScheme at the specified location, beginning with the most likely tag. For some tag schemes only a single tag is returned; for others, a list of possibilities is provided.
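The following sketch pairs each candidate tag with its score, most likely tag first. It assumes ARC and Foundation on macOS, and it assumes that scores[i] belongs to tags[i]. CandidateTags is a hypothetical helper. The scores are printed with %@ rather than unwrapped, because this reference only says they are wrapped as NSValue objects.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: "tag score" strings for the token at `index`,
// most likely tag first; the token's range is written to *outTokenRange.
NSArray<NSString *> *CandidateTags(NSString *text, NSUInteger index,
                                   NSRange *outTokenRange) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass] options:0];
    tagger.string = text;

    NSRange tokenRange = NSMakeRange(NSNotFound, 0);
    NSRange sentenceRange = NSMakeRange(NSNotFound, 0);
    NSArray<NSValue *> *scores = nil;
    NSArray<NSString *> *tags =
        [tagger possibleTagsAtIndex:index
                             scheme:NSLinguisticTagSchemeLexicalClass
                         tokenRange:&tokenRange
                      sentenceRange:&sentenceRange
                             scores:&scores];
    if (outTokenRange != NULL) *outTokenRange = tokenRange;

    NSMutableArray<NSString *> *lines = [NSMutableArray array];
    for (NSUInteger i = 0; i < tags.count; i++) {
        // Assumes scores[i] belongs to tags[i]; %@ prints the wrapped value.
        id score = (i < scores.count) ? (id)scores[i] : (id)@"n/a";
        [lines addObject:[NSString stringWithFormat:@"%@ %@", tags[i], score]];
    }
    return lines;
}
```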

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

sentenceRangeForRange:

Returns the range of the sentence that contains the specified range.

- (NSRange)sentenceRangeForRange:(NSRange)charRange
Parameters
charRange

A range of characters in the analyzed string, such as a token range.

Return Value

Returns the range of a sentence that contains charRange.

Discussion

This method can be used to obtain the enclosing sentence range given a token range.
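The following sketch expands a word’s range to its enclosing sentence. It assumes ARC and Foundation on macOS. SentenceContaining is a hypothetical helper that locates the word with a plain substring search before asking the tagger for the sentence.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the sentence containing the first occurrence of
// `word`, or nil if `word` does not occur in `text`.
NSString *SentenceContaining(NSString *text, NSString *word) {
    NSRange wordRange = [text rangeOfString:word];
    if (wordRange.location == NSNotFound) return nil;

    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeTokenType] options:0];
    tagger.string = text;
    NSRange sentence = [tagger sentenceRangeForRange:wordRange];
    return [text substringWithRange:sentence];
}
```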

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

setOrthography:range:

Sets the orthography for the specified range.

- (void)setOrthography:(NSOrthography *)orthography range:(NSRange)charRange
Parameters
orthography

The orthography.

charRange

The range of characters to which the orthography applies.

Discussion

If no orthography has been set, the linguistic tagger determines it automatically from the contents of the text. Call this method only if you already know the language of the text by some other means.
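The following sketch tells the tagger that the entire string is English in Latin script, so the tagger does not have to guess. It assumes ARC and Foundation on macOS. TaggerForKnownEnglish is a hypothetical helper, and the orthography is built with NSOrthography’s orthographyWithDominantScript:languageMap: class method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: a lexical-class tagger whose whole string is
// declared to be English written in Latin script.
NSLinguisticTagger *TaggerForKnownEnglish(NSString *text) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass] options:0];
    // Set the string before the orthography; charRange indexes into it.
    tagger.string = text;

    NSOrthography *english =
        [NSOrthography orthographyWithDominantScript:@"Latn"
                                         languageMap:@{@"Latn" : @[@"en"]}];
    [tagger setOrthography:english range:NSMakeRange(0, text.length)];
    return tagger;
}
```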

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

setString:

Sets the string to be analyzed by the linguistic tagger.

- (void)setString:(NSString *)string
Parameters
string

The string to analyze.

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

string

Returns the string being analyzed by the linguistic tagger.

- (NSString *)string
Return Value

The string.

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

stringEditedInRange:changeInLength:

Notifies the linguistic tagger that its string, if mutable, has been edited as described by the parameters.

- (void)stringEditedInRange:(NSRange)newCharRange changeInLength:(NSInteger)delta
Parameters
newCharRange

The range in the final string that was edited.

delta

The change in length caused by the edit: positive when characters were inserted, negative when characters were removed.
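The following sketch edits a mutable string that a tagger is analyzing and then reports the edit. It assumes ARC and Foundation on macOS. It also assumes that the tagger keeps a reference to the mutable string rather than a copy, which is the situation this method addresses. TokenAfterInsertion is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: inserts a word into a string being tagged, notifies
// the tagger, and returns the token found at the inserted word.
NSString *TokenAfterInsertion(void) {
    NSMutableString *text = [@"Birds sing." mutableCopy];
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeTokenType] options:0];
    tagger.string = text;  // assumed to be retained, not copied

    // Insert " loudly" just before the period.
    NSString *insertion = @" loudly";
    NSUInteger at = [text rangeOfString:@"."].location;
    [text insertString:insertion atIndex:at];

    // newCharRange is the edited range in the final string; delta is the
    // number of characters added (it would be negative for a deletion).
    [tagger stringEditedInRange:NSMakeRange(at, insertion.length)
                 changeInLength:(NSInteger)insertion.length];

    // The character after the inserted space starts the word "loudly".
    NSRange tokenRange = NSMakeRange(NSNotFound, 0);
    (void)[tagger tagAtIndex:at + 1
                      scheme:NSLinguisticTagSchemeTokenType
                  tokenRange:&tokenRange
               sentenceRange:NULL];
    if (tokenRange.location == NSNotFound) return nil;
    return [text substringWithRange:tokenRange];
}
```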

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

tagAtIndex:scheme:tokenRange:sentenceRange:

Returns a tag for a single scheme at the specified index.

- (NSString *)tagAtIndex:(NSUInteger)charIndex scheme:(NSString *)tagScheme tokenRange:(NSRangePointer)tokenRange sentenceRange:(NSRangePointer)sentenceRange
Parameters
charIndex

The index of the character to examine.

tagScheme

The tag scheme. See “Linguistic Tag Schemes” for the possible values.

tokenRange

On return, the range of the token that contains charIndex. Pass NULL if you do not need this value.

sentenceRange

On return, the range of the sentence that contains charIndex. Pass NULL if you do not need this value.

Return Value

The tag for the requested tagScheme, or nil if there is no tag for that scheme and token.

Discussion

Because this method returns a single tag rather than an array, a missing tag is reported as nil rather than as an NSNull placeholder.
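The following sketch uses this method to look up the lemma of a single word. It falls back to the word itself when the method returns nil. It assumes ARC and Foundation on macOS. LemmaAt is a hypothetical helper, and whether a lemma is known depends on the system’s linguistic data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the lemma of the word at `index`, or the word itself
// when the tagger has no lemma for it.
NSString *LemmaAt(NSString *text, NSUInteger index) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeLemma] options:0];
    tagger.string = text;

    NSRange tokenRange = NSMakeRange(NSNotFound, 0);
    NSString *lemma = [tagger tagAtIndex:index
                                  scheme:NSLinguisticTagSchemeLemma
                              tokenRange:&tokenRange
                           sentenceRange:NULL];  // sentence range not needed
    if (lemma != nil) return lemma;
    if (tokenRange.location == NSNotFound) return nil;
    return [text substringWithRange:tokenRange];
}
```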

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

tagSchemes

Returns the tag schemes with which the linguistic tagger was initialized.

- (NSArray *)tagSchemes
Return Value

An array of tag schemes. See “Linguistic Tag Schemes” for the possible values.

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

tagsInRange:scheme:options:tokenRanges:

Returns an array of linguistic tags and token ranges.

- (NSArray *)tagsInRange:(NSRange)range scheme:(NSString *)tagScheme options:(NSLinguisticTaggerOptions)opts tokenRanges:(NSArray **)tokenRanges
Parameters
range

The range from which to return tags.

tagScheme

The tag scheme. See “Linguistic Tag Schemes” for the possible values.

opts

The linguistic tagger options to use. See “NSLinguisticTaggerOptions” for the constants. These constants can be combined using the C bitwise OR operator (|).

tokenRanges

On return, an array of NSValue objects, each wrapping the NSRange of one token.

Return Value

An array of tags, one for each entry in the tokenRanges array.
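Because the returned tags and tokenRanges line up entry by entry, the two arrays can be walked together. The following is a minimal sketch, assuming ARC and Foundation on macOS. TokensWithTags is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: "token=tag" strings for every non-whitespace token.
NSArray<NSString *> *TokensWithTags(NSString *text) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeTokenType] options:0];
    tagger.string = text;

    NSArray<NSValue *> *tokenRanges = nil;
    NSArray<NSString *> *tags =
        [tagger tagsInRange:NSMakeRange(0, text.length)
                     scheme:NSLinguisticTagSchemeTokenType
                    options:NSLinguisticTaggerOmitWhitespace
                tokenRanges:&tokenRanges];

    NSMutableArray<NSString *> *pairs = [NSMutableArray array];
    // tags[i] is the tag for the token whose range is tokenRanges[i].
    for (NSUInteger i = 0; i < tags.count && i < tokenRanges.count; i++) {
        NSRange r = tokenRanges[i].rangeValue;
        [pairs addObject:[NSString stringWithFormat:@"%@=%@",
                          [text substringWithRange:r], tags[i]]];
    }
    return pairs;
}
```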

Availability
  • Available in OS X v10.7 and later.
Declared In
NSLinguisticTagger.h

Constants

NSLinguisticTaggerOptions

These constants specify the linguistic tagger options. They can be combined using the C bitwise OR operator (|).

enum {
    NSLinguisticTaggerOmitWords         = 1 << 0,
    NSLinguisticTaggerOmitPunctuation   = 1 << 1,
    NSLinguisticTaggerOmitWhitespace    = 1 << 2,
    NSLinguisticTaggerOmitOther         = 1 << 3,
    NSLinguisticTaggerJoinNames         = 1 << 4
};
typedef NSUInteger NSLinguisticTaggerOptions;
Constants
NSLinguisticTaggerOmitWords

Omit tokens of type NSLinguisticTagWord (items considered to be words).

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTaggerOmitPunctuation

Omit tokens of type NSLinguisticTagPunctuation (all punctuation).

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTaggerOmitWhitespace

Omit tokens of type NSLinguisticTagWhitespace (whitespace of all sorts).

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTaggerOmitOther

Omit tokens of type NSLinguisticTagOther (non-linguistic items such as symbols).

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTaggerJoinNames

Typically, multiple-word names will be returned as multiple tokens, following the standard tokenization practice of the tagger. If this option is set, then multiple-word names will be joined together and returned as a single token.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.
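The following sketch shows the effect of NSLinguisticTaggerJoinNames by counting the tokens reported with and without it. Joining can only merge tokens, never split them, so the joined count should not be larger. It assumes ARC and Foundation on macOS. NameTokenCount is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: number of non-whitespace, non-punctuation tokens,
// optionally with multiple-word names joined into single tokens.
NSUInteger NameTokenCount(NSString *text, BOOL joinNames) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeNameType] options:0];
    tagger.string = text;

    NSLinguisticTaggerOptions opts = NSLinguisticTaggerOmitWhitespace |
                                     NSLinguisticTaggerOmitPunctuation;
    if (joinNames) opts |= NSLinguisticTaggerJoinNames;

    __block NSUInteger count = 0;
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameType
                         options:opts
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        count++;
    }];
    return count;
}
```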

Linguistic Tag Schemes

These constants specify the linguistic tag schemes used by initWithTagSchemes:options: to create the linguistic tagger instance. The method tagSchemes returns an array of the schemes the instance was created with.

NSString *const NSLinguisticTagSchemeTokenType;
NSString *const NSLinguisticTagSchemeLexicalClass;
NSString *const NSLinguisticTagSchemeNameType;
NSString *const NSLinguisticTagSchemeNameTypeOrLexicalClass;
NSString *const NSLinguisticTagSchemeLemma;
NSString *const NSLinguisticTagSchemeLanguage;
NSString *const NSLinguisticTagSchemeScript;
Constants
NSLinguisticTagSchemeTokenType

This tag scheme classifies tokens according to their broad type: word, punctuation, whitespace, etc. The possible tags are: NSLinguisticTagWord, NSLinguisticTagPunctuation, NSLinguisticTagWhitespace, or NSLinguisticTagOther. For this scheme a client may use pointer equality to compare the values with the tag constants.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSchemeLexicalClass

This tag scheme classifies tokens according to class: part of speech for words, type of punctuation or whitespace, etc. The value will be one of the constants specified in “NSLinguisticTagSchemeLexicalClass.” For this scheme a client may use pointer equality to compare the values with the tag constants.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSchemeNameType

This tag scheme classifies tokens according to whether they are part of a named entity and, if so, of which type. The possible tags are: NSLinguisticTagPersonalName, NSLinguisticTagPlaceName, or NSLinguisticTagOrganizationName. For this scheme a client may use pointer equality to compare the values with the tag constants.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSchemeNameTypeOrLexicalClass

This tag scheme follows NSLinguisticTagSchemeNameType for names, NSLinguisticTagSchemeLexicalClass for all other tokens. The possible tags are those specified in “NSLinguisticTagSchemeLexicalClass” or “NSLinguisticTagSchemeNameType.” For this scheme a client may use pointer equality to compare the values with the tag constants.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSchemeLemma

This tag scheme supplies the stem form of each word, if known.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSchemeLanguage

This tag scheme tags tokens according to their language. The tag values will be standard language abbreviations such as “en”, “fr”, “de”, etc., as used with the NSOrthography class. Note that the tagger generally attempts to determine the language of text at the level of an entire sentence or paragraph, rather than word by word.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSchemeScript

This tag scheme tags tokens according to their script. The tag values will be standard script abbreviations such as “Latn”, “Cyrl”, “Jpan”, “Hans”, “Hant”, etc.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.
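The following sketch queries the script and language schemes for the same token. It assumes ARC and Foundation on macOS. ScriptAndLanguageAt is a hypothetical helper, and “?” stands in for a missing tag.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: "script/language" for the token at `index`,
// for example "Latn/en", with "?" for any missing tag.
NSString *ScriptAndLanguageAt(NSString *text, NSUInteger index) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeScript,
                             NSLinguisticTagSchemeLanguage]
                   options:0];
    tagger.string = text;

    NSString *script = [tagger tagAtIndex:index
                                   scheme:NSLinguisticTagSchemeScript
                               tokenRange:NULL
                            sentenceRange:NULL];
    // Language is usually decided per sentence or paragraph, not per word.
    NSString *language = [tagger tagAtIndex:index
                                     scheme:NSLinguisticTagSchemeLanguage
                                 tokenRange:NULL
                              sentenceRange:NULL];
    return [NSString stringWithFormat:@"%@/%@", script ?: @"?", language ?: @"?"];
}
```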

NSLinguisticTagSchemeTokenTypes

These constants are the tags that NSLinguisticTagSchemeTokenType uses to classify tokens by broad type.

NSString *const NSLinguisticTagWord;
NSString *const NSLinguisticTagPunctuation;
NSString *const NSLinguisticTagWhitespace;
NSString *const NSLinguisticTagOther;
Constants
NSLinguisticTagWord

The token indicates a word.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagPunctuation

The token indicates punctuation.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagWhitespace

The token indicates whitespace of any sort.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagOther

The token is something other than a word, punctuation, or whitespace, such as a symbol.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSchemeLexicalClass

These constants specify the lexical class of a token.

NSString *const NSLinguisticTagNoun;
NSString *const NSLinguisticTagVerb;
NSString *const NSLinguisticTagAdjective;
NSString *const NSLinguisticTagAdverb;
NSString *const NSLinguisticTagPronoun;
NSString *const NSLinguisticTagDeterminer;
NSString *const NSLinguisticTagParticle;
NSString *const NSLinguisticTagPreposition;
NSString *const NSLinguisticTagNumber;
NSString *const NSLinguisticTagConjunction;
NSString *const NSLinguisticTagInterjection;
NSString *const NSLinguisticTagClassifier;
NSString *const NSLinguisticTagIdiom;
NSString *const NSLinguisticTagOtherWord;
NSString *const NSLinguisticTagSentenceTerminator;
NSString *const NSLinguisticTagOpenQuote;
NSString *const NSLinguisticTagCloseQuote;
NSString *const NSLinguisticTagOpenParenthesis;
NSString *const NSLinguisticTagCloseParenthesis;
NSString *const NSLinguisticTagWordJoiner;
NSString *const NSLinguisticTagDash;
NSString *const NSLinguisticTagOtherPunctuation;
NSString *const NSLinguisticTagParagraphBreak;
NSString *const NSLinguisticTagOtherWhitespace;
Constants
NSLinguisticTagNoun

The token is a noun.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagVerb

This token is a verb.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagAdjective

This token is an adjective.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagAdverb

This token is an adverb.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagPronoun

This token is a pronoun.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagDeterminer

This token is a determiner.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagParticle

This token is a particle.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagPreposition

This token is a preposition.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagNumber

This token is a number.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagConjunction

This token is a conjunction.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagInterjection

This token is an interjection.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagClassifier

This token is a classifier.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagIdiom

This token is an idiom.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagOtherWord

This token is a word that does not fall into any of the other lexical classes.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagSentenceTerminator

This token is a sentence terminator.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagOpenQuote

This token is an open quote.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagCloseQuote

This token is a close quote.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagOpenParenthesis

This token is an open parenthesis.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagCloseParenthesis

This token is a close parenthesis.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagWordJoiner

This token is a word joiner.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagDash

This token is a dash.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagOtherPunctuation

This token is punctuation not recognized as another token type.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagParagraphBreak

This token is a paragraph break.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagOtherWhitespace

This token is whitespace other than a paragraph break.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.
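The following sketch tallies the lexical-class tags in a text, such as how many nouns and verbs it contains. It assumes ARC and Foundation on macOS. LexicalClassCounts is a hypothetical helper, and the counts depend on the system’s tagging model.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: how many tokens carry each lexical-class tag.
NSCountedSet *LexicalClassCounts(NSString *text) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass] options:0];
    tagger.string = text;

    NSCountedSet *counts = [NSCountedSet set];
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeLexicalClass
                         options:NSLinguisticTaggerOmitWhitespace
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        if (tag != nil) [counts addObject:tag];
    }];
    return counts;
}
```

For example, [counts countForObject:NSLinguisticTagNoun] gives the number of tokens tagged as nouns.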

NSLinguisticTagSchemeNameType

These constants define linguistic tags for named entities: personal names, place names, and organization names.

NSString *const NSLinguisticTagPersonalName;
NSString *const NSLinguisticTagPlaceName;
NSString *const NSLinguisticTagOrganizationName;
Constants
NSLinguisticTagPersonalName

This token is a personal name.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagPlaceName

This token is a place name.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.

NSLinguisticTagOrganizationName

This token is an organization name.

Available in OS X v10.7 and later.

Declared in NSLinguisticTagger.h.
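The following sketch combines the name-type scheme with NSLinguisticTaggerJoinNames to extract named entities. It keeps only tokens tagged with one of the three constants above. It assumes ARC and Foundation on macOS. NamedEntities is a hypothetical helper, and which names are recognized depends on the system’s linguistic data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: "name (tag)" strings for each named entity found.
NSArray<NSString *> *NamedEntities(NSString *text) {
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeNameType] options:0];
    tagger.string = text;

    NSArray<NSString *> *nameTags = @[NSLinguisticTagPersonalName,
                                      NSLinguisticTagPlaceName,
                                      NSLinguisticTagOrganizationName];
    // Join multiple-word names so "New York" arrives as one token.
    NSLinguisticTaggerOptions opts = NSLinguisticTaggerOmitWhitespace |
                                     NSLinguisticTaggerOmitPunctuation |
                                     NSLinguisticTaggerJoinNames;
    NSMutableArray<NSString *> *entities = [NSMutableArray array];
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameType
                         options:opts
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        if (tag != nil && [nameTags containsObject:tag]) {
            [entities addObject:[NSString stringWithFormat:@"%@ (%@)",
                                 [text substringWithRange:tokenRange], tag]];
        }
    }];
    return entities;
}
```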