NSCharacterSet Class Reference

Inherits from
NSObject
Conforms to
NSCoding
NSCopying
NSMutableCopying
Framework
/System/Library/Frameworks/Foundation.framework
Availability
Available in OS X v10.0 and later.
Declared in
NSCharacterSet.h
NSURL.h

Overview

An NSCharacterSet object represents a set of Unicode-compliant characters. NSString and NSScanner objects use NSCharacterSet objects to group characters together for searching operations, so that they can find any of a particular set of characters during a search. The cluster’s two public classes, NSCharacterSet and NSMutableCharacterSet, declare the programmatic interface for static and dynamic character sets, respectively.

The objects you create using these classes are referred to as character set objects (and when no confusion will result, merely as character sets). Because of the nature of class clusters, character set objects aren’t actual instances of the NSCharacterSet or NSMutableCharacterSet classes but of one of their private subclasses. Although a character set object’s class is private, its interface is public, as declared by these abstract superclasses, NSCharacterSet and NSMutableCharacterSet. The character set classes adopt the NSCopying and NSMutableCopying protocols, making it convenient to convert a character set of one type to the other.

The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSString class cluster specification for information on Unicode). NSCharacterSet’s principal primitive method, characterIsMember:, provides the basis for all other instance methods in its interface. A subclass of NSCharacterSet needs only to implement this method, plus mutableCopyWithZone:, for proper behavior. For optimal performance, a subclass should also override bitmapRepresentation, which otherwise works by invoking characterIsMember: for every possible Unicode value.

NSCharacterSet is “toll-free bridged” with its Core Foundation counterpart, CFCharacterSetRef. See “Toll-Free Bridging” for more information on toll-free bridging.

The mutable subclass of NSCharacterSet is NSMutableCharacterSet.

Adopted Protocols

NSCoding
NSCopying
NSMutableCopying

Tasks

Creating a Standard Character Set

Creating a Character Set for URL Encoding

Creating a Custom Character Set

Creating and Managing Character Sets as Bitmap Representations

Testing Set Membership

Class Methods

alphanumericCharacterSet

Returns a character set containing the characters in the categories Letters, Marks, and Numbers.

+ (id)alphanumericCharacterSet
Return Value

A character set containing the characters in the categories Letters, Marks, and Numbers.

Discussion

Informally, this set is the set of all characters used as basic units of alphabets, syllabaries, ideographs, and digits.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

capitalizedLetterCharacterSet

Returns a character set containing the characters in the category of Titlecase Letters.

+ (id)capitalizedLetterCharacterSet
Return Value

A character set containing the characters in the category of Titlecase Letters.

Availability
  • Available in OS X v10.2 and later.
Declared In
NSCharacterSet.h

characterSetWithBitmapRepresentation:

Returns a character set containing characters determined by a given bitmap representation.

+ (id)characterSetWithBitmapRepresentation:(NSData *)data
Parameters
data

A bitmap representation of a character set.

Return Value

A character set containing characters determined by data.

Discussion

This method is useful for creating a character set object with data from a file or other external data source.

A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes). The value of the bit at position n represents the presence in the character set of the character with decimal Unicode value n. To add a character with decimal Unicode value n to a raw bitmap representation, use a statement such as the following:

unsigned char bitmapRep[8192];
bitmapRep[n >> 3] |= (((unsigned int)1) << (n & 7));

To remove that character:

bitmapRep[n >> 3] &= ~(((unsigned int)1) << (n & 7));
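
The two statements above can be wrapped in small helpers. The following is a minimal C sketch, not part of the Foundation API: the helper names (bitmap_add, bitmap_remove, bitmap_contains, bitmap_fill_lowercase) and the BITMAP_BYTES macro are hypothetical, and the code assumes n is a 16-bit value below 65536.

```c
#include <assert.h>
#include <string.h>

#define BITMAP_BYTES 8192  /* 2^16 bits, one per 16-bit Unicode value */

/* Mark the character with value n as present. */
static void bitmap_add(unsigned char *bitmapRep, unsigned int n) {
    assert(n < 65536u);
    bitmapRep[n >> 3] |= (unsigned char)(1u << (n & 7));
}

/* Mark the character with value n as absent. */
static void bitmap_remove(unsigned char *bitmapRep, unsigned int n) {
    assert(n < 65536u);
    bitmapRep[n >> 3] &= (unsigned char)~(1u << (n & 7));
}

/* Return nonzero if the character with value n is present. */
static int bitmap_contains(const unsigned char *bitmapRep, unsigned int n) {
    assert(n < 65536u);
    return (bitmapRep[n >> 3] >> (n & 7)) & 1;
}

/* Clear the whole bitmap, then add the lowercase English letters 'a'..'z'. */
static void bitmap_fill_lowercase(unsigned char *bitmapRep) {
    memset(bitmapRep, 0, BITMAP_BYTES);
    for (unsigned int n = 'a'; n <= 'z'; n++) {
        bitmap_add(bitmapRep, n);
    }
}
```

A buffer prepared this way could then be wrapped in an NSData object and passed to characterSetWithBitmapRepresentation:.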
Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

characterSetWithCharactersInString:

Returns a character set containing the characters in a given string.

+ (id)characterSetWithCharactersInString:(NSString *)aString
Parameters
aString

A string containing characters for the new character set.

Return Value

A character set containing the characters in aString. Returns an empty character set if aString is empty.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

characterSetWithContentsOfFile:

Returns a character set read from the bitmap representation stored in the file at a given path.

+ (id)characterSetWithContentsOfFile:(NSString *)path
Parameters
path

A path to a file containing a bitmap representation of a character set. The path name must end with the extension .bitmap.

Return Value

A character set read from the bitmap representation stored in the file at path.

Discussion

To read a bitmap representation from any file, use the NSData method dataWithContentsOfFile:options:error: and pass the result to characterSetWithBitmapRepresentation:.

This method doesn’t use filenames to check for the uniqueness of the character sets it creates. To prevent duplication of character sets in memory, cache them and make them available through an API that checks whether the requested set has already been loaded.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

characterSetWithRange:

Returns a character set containing characters with Unicode values in a given range.

+ (id)characterSetWithRange:(NSRange)aRange
Parameters
aRange

A range of Unicode values.

aRange.location is the value of the first character to return; aRange.location + aRange.length - 1 is the value of the last.

Return Value

A character set containing characters whose Unicode values are given by aRange. If aRange.length is 0, returns an empty character set.

Discussion

This code excerpt creates a character set object containing the lowercase English alphabetic characters:

NSRange lcEnglishRange;
NSCharacterSet *lcEnglishLetters;
 
lcEnglishRange.location = (unsigned int)'a';
lcEnglishRange.length = 26;
lcEnglishLetters = [NSCharacterSet characterSetWithRange:lcEnglishRange];
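
The range arithmetic described for aRange can be checked with a small C sketch. CharRange is a hypothetical stand-in for NSRange, and range_last and range_contains are illustrative helpers rather than Foundation functions. The example assumes an ASCII-compatible character encoding, so that 'a' plus 25 is 'z'.

```c
#include <assert.h>

/* Hypothetical stand-in for NSRange: a starting value and a count. */
typedef struct {
    unsigned long location;
    unsigned long length;
} CharRange;

/* Value of the last character in the range; the range must not be empty. */
static unsigned long range_last(CharRange r) {
    assert(r.length > 0);
    return r.location + r.length - 1;
}

/* Nonzero if value c falls inside the range. An empty range contains nothing. */
static int range_contains(CharRange r, unsigned long c) {
    return c >= r.location && c - r.location < r.length;
}
```

With location set to 'a' and length set to 26, range_last yields 'z', which matches the lowercase English alphabet built in the excerpt above.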
Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

controlCharacterSet

Returns a character set containing the characters in the categories of Control or Format Characters.

+ (id)controlCharacterSet
Return Value

A character set containing the characters in the categories of Control or Format Characters.

Discussion

These characters are specifically the Unicode values U+0000 to U+001F and U+007F to U+009F.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

decimalDigitCharacterSet

Returns a character set containing the characters in the category of Decimal Numbers.

+ (id)decimalDigitCharacterSet
Return Value

A character set containing the characters in the category of Decimal Numbers.

Discussion

Informally, this set is the set of all characters used to represent the decimal values 0 through 9. These characters include, for example, the decimal digits of the Indic scripts and Arabic.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

decomposableCharacterSet

Returns a character set containing all individual Unicode characters that can also be represented as composed character sequences.

+ (id)decomposableCharacterSet
Return Value

A character set containing all individual Unicode characters that can also be represented as composed character sequences (such as for letters with accents), by the definition of “standard decomposition” in version 3.2 of the Unicode character encoding standard.

Discussion

These characters include compatibility characters as well as pre-composed characters.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

illegalCharacterSet

Returns a character set containing values in the category of Non-Characters or that have not yet been defined in version 3.2 of the Unicode standard.

+ (id)illegalCharacterSet
Return Value

A character set containing values in the category of Non-Characters or that have not yet been defined in version 3.2 of the Unicode standard.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

letterCharacterSet

Returns a character set containing the characters in the categories Letters and Marks.

+ (id)letterCharacterSet
Return Value

A character set containing the characters in the categories Letters and Marks.

Discussion

Informally, this set is the set of all characters used as letters of alphabets and ideographs.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

lowercaseLetterCharacterSet

Returns a character set containing the characters in the category of Lowercase Letters.

+ (id)lowercaseLetterCharacterSet
Return Value

A character set containing the characters in the category of Lowercase Letters.

Discussion

Informally, this set is the set of all characters used as lowercase letters in alphabets that make case distinctions.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

newlineCharacterSet

Returns a character set containing the newline characters.

+ (id)newlineCharacterSet
Return Value

A character set containing the newline characters (U+000A–U+000D, U+0085).

Availability
  • Available in OS X v10.5 and later.
Declared In
NSCharacterSet.h

nonBaseCharacterSet

Returns a character set containing the characters in the category of Marks.

+ (id)nonBaseCharacterSet
Return Value

A character set containing the characters in the category of Marks.

Discussion

This set is also defined as all legal Unicode characters with a non-spacing priority greater than 0. Informally, this set is the set of all characters used as modifiers of base characters.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

punctuationCharacterSet

Returns a character set containing the characters in the category of Punctuation.

+ (id)punctuationCharacterSet
Return Value

A character set containing the characters in the category of Punctuation.

Discussion

Informally, this set is the set of all non-whitespace characters used to separate linguistic units in scripts, such as periods, dashes, parentheses, and so on.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

symbolCharacterSet

Returns a character set containing the characters in the category of Symbols.

+ (id)symbolCharacterSet
Return Value

A character set containing the characters in the category of Symbols.

Discussion

These characters include, for example, the dollar sign ($) and the plus (+) sign.

Availability
  • Available in OS X v10.3 and later.
Declared In
NSCharacterSet.h

uppercaseLetterCharacterSet

Returns a character set containing the characters in the categories of Uppercase Letters and Titlecase Letters.

+ (id)uppercaseLetterCharacterSet
Return Value

A character set containing the characters in the categories of Uppercase Letters and Titlecase Letters.

Discussion

Informally, this set is the set of all characters used as uppercase letters in alphabets that make case distinctions.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

URLFragmentAllowedCharacterSet

Returns the character set for characters allowed in a fragment URL component.

+ (id)URLFragmentAllowedCharacterSet
Discussion

The fragment component of a URL is the component after a # symbol. For example, in the URL http://www.example.com/index.html#jumpLocation, the fragment is jumpLocation.

Availability
  • Available in OS X v10.9 and later.
Declared In
NSURL.h

URLHostAllowedCharacterSet

Returns the character set for characters allowed in a host URL subcomponent.

+ (id)URLHostAllowedCharacterSet
Discussion

The host component of a URL is usually the component immediately after the first two leading slashes. If the URL contains a username and password, the host component is the component after the @ sign. For example, in the URL http://username:password@www.example.com/index.html, the host component is www.example.com.

Availability
  • Available in OS X v10.9 and later.
Declared In
NSURL.h

URLPasswordAllowedCharacterSet

Returns the character set for characters allowed in a password URL subcomponent.

+ (id)URLPasswordAllowedCharacterSet
Discussion

The password component of a URL is the component immediately following the colon after the username component of the URL, and ends at the @ sign. For example, in the URL http://username:password@www.example.com/index.html, the password component is password.

Availability
  • Available in OS X v10.9 and later.
Declared In
NSURL.h

URLPathAllowedCharacterSet

Returns the character set for characters allowed in a path URL component.

+ (id)URLPathAllowedCharacterSet
Discussion

The path component of a URL is the component immediately following the host component (if present). It ends wherever the query or fragment component begins. For example, in the URL http://www.example.com/index.php?key1=value1, the path component is /index.php.

Availability
  • Available in OS X v10.9 and later.
Declared In
NSURL.h

URLQueryAllowedCharacterSet

Returns the character set for characters allowed in a query URL component.

+ (id)URLQueryAllowedCharacterSet
Discussion

The query component of a URL is the component immediately following a question mark (?). For example, in the URL http://www.example.com/index.php?key1=value1#jumpLink, the query component is key1=value1.
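
As a rough illustration of how an allowed-character set can drive URL encoding, the C sketch below percent-encodes every byte that is not in an allowed set. It is not the Foundation implementation. The is_allowed predicate is a hypothetical stand-in that accepts only the RFC 3986 unreserved characters; it does not reproduce the membership of URLQueryAllowedCharacterSet, which this reference does not list. The percent_encode name and its buffer-size convention are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allowed set: the RFC 3986 "unreserved" characters only. */
static int is_allowed(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-encode the NUL-terminated string src into dst.
   Returns the number of characters written (excluding the terminator),
   or (size_t)-1 if dst is too small. */
static size_t percent_encode(const char *src, char *dst, size_t dstSize) {
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        size_t need = is_allowed(*p) ? 1 : 3;
        if (out + need + 1 > dstSize) {
            return (size_t)-1;           /* no room for this byte plus NUL */
        }
        if (need == 1) {
            dst[out++] = (char)*p;       /* allowed: copy unchanged */
        } else {
            dst[out++] = '%';            /* not allowed: emit %XX */
            dst[out++] = hex[*p >> 4];
            dst[out++] = hex[*p & 0x0F];
        }
    }
    if (out + 1 > dstSize) {
        return (size_t)-1;               /* no room for the terminator */
    }
    dst[out] = '\0';
    return out;
}
```

Swapping in a different predicate for is_allowed corresponds to choosing a different allowed-character set for a different URL component.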

Availability
  • Available in OS X v10.9 and later.
Declared In
NSURL.h

URLUserAllowedCharacterSet

Returns the character set for characters allowed in a user URL subcomponent.

+ (id)URLUserAllowedCharacterSet
Discussion

The user component of a URL is an optional component that precedes the host component, and ends at either a colon (if a password is specified) or an @ sign (if no password is specified). For example, in the URL http://username:password@www.example.com/index.html, the user component is username.

Availability
  • Available in OS X v10.9 and later.
Declared In
NSURL.h

whitespaceAndNewlineCharacterSet

Returns a character set containing Unicode General Category Z*, U+000A–U+000D, and U+0085.

+ (id)whitespaceAndNewlineCharacterSet
Return Value

A character set containing Unicode General Category Z*, U+000A–U+000D, and U+0085.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

whitespaceCharacterSet

Returns a character set containing only the in-line whitespace characters space (U+0020) and tab (U+0009).

+ (id)whitespaceCharacterSet
Return Value

A character set containing only the in-line whitespace characters space (U+0020) and tab (U+0009).

Discussion

This set doesn’t contain the newline or carriage return characters.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

Instance Methods

bitmapRepresentation

Returns an NSData object encoding the receiver in binary format.

- (NSData *)bitmapRepresentation
Return Value

An NSData object encoding the receiver in binary format.

Discussion

This format is suitable for saving to a file or otherwise transmitting or archiving.

A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes). The value of the bit at position n represents the presence in the character set of the character with decimal Unicode value n. To test for the presence of a character with decimal Unicode value n in a raw bitmap representation, use an expression such as the following:

unsigned char bitmapRep[8192];
if (bitmapRep[n >> 3] & (((unsigned int)1) << (n & 7))) {
    /* Character is present. */
}
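
The same bit layout also supports scanning a whole bitmap. The following C sketch is illustrative only: bitmap_count and bitmap_first_member are hypothetical helper names, and the buffer is assumed to hold the 8192-byte raw bitmap described above.

```c
#include <assert.h>
#include <stddef.h>

#define BITMAP_BYTES 8192  /* 2^16 bits, one per 16-bit Unicode value */

/* Count how many 16-bit Unicode values are marked present. */
static unsigned int bitmap_count(const unsigned char *bitmapRep) {
    unsigned int count = 0;

    assert(bitmapRep != NULL);
    for (size_t i = 0; i < BITMAP_BYTES; i++) {
        /* Each pass clears the lowest set bit of the byte. */
        for (unsigned char b = bitmapRep[i]; b != 0; b &= (unsigned char)(b - 1)) {
            count++;
        }
    }
    return count;
}

/* Return the value of the smallest member, or -1 if the bitmap is empty. */
static long bitmap_first_member(const unsigned char *bitmapRep) {
    assert(bitmapRep != NULL);
    for (long n = 0; n < BITMAP_BYTES * 8L; n++) {
        if (bitmapRep[n >> 3] & (1u << (n & 7))) {
            return n;
        }
    }
    return -1;
}
```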
Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

characterIsMember:

Returns a Boolean value that indicates whether a given character is in the receiver.

- (BOOL)characterIsMember:(unichar)aCharacter
Parameters
aCharacter

The character to test for membership of the receiver.

Return Value

YES if aCharacter is in the receiving character set, otherwise NO.

Availability
  • Available in OS X v10.0 and later.
Declared In
NSCharacterSet.h

hasMemberInPlane:

Returns a Boolean value that indicates whether the receiver has at least one member in a given character plane.

- (BOOL)hasMemberInPlane:(uint8_t)thePlane
Parameters
thePlane

A character plane.

Return Value

YES if the receiver has at least one member in thePlane, otherwise NO.

Discussion

This method makes it easier to find the plane containing the members of the current character set. The Basic Multilingual Plane is plane 0.
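
As a rough illustration of what a plane is: Unicode divides its code space into 17 planes of 0x10000 values each, so the plane index of a code point is that value shifted right by 16 bits. The helpers in this C sketch (plane_of, plane_first, plane_last) are hypothetical names used for illustration and are not part of Foundation.

```c
#include <assert.h>
#include <stdint.h>

/* Plane index (0 through 16) of a Unicode code point. */
static uint8_t plane_of(uint32_t codePoint) {
    assert(codePoint <= 0x10FFFFu);
    return (uint8_t)(codePoint >> 16);
}

/* First code point belonging to a plane. */
static uint32_t plane_first(uint8_t plane) {
    assert(plane <= 16);
    return (uint32_t)plane << 16;
}

/* Last code point belonging to a plane. */
static uint32_t plane_last(uint8_t plane) {
    assert(plane <= 16);
    return ((uint32_t)plane << 16) | 0xFFFFu;
}
```

Under this scheme every value that fits in 16 bits falls in plane 0, the Basic Multilingual Plane.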

Availability
  • Available in OS X v10.2 and later.
Declared In
NSCharacterSet.h

invertedSet

Returns a character set containing only characters that don’t exist in the receiver.

- (NSCharacterSet *)invertedSet
Return Value

A character set containing only characters that don’t exist in the receiver.

Discussion

Inverting an immutable character set is much more efficient than inverting a mutable character set.

Availability
  • Available in OS X v10.0 and later.
See Also
  • invert (NSMutableCharacterSet)
Declared In
NSCharacterSet.h

isSupersetOfSet:

Returns a Boolean value that indicates whether the receiver is a superset of another given character set.

- (BOOL)isSupersetOfSet:(NSCharacterSet *)theOtherSet
Parameters
theOtherSet

A character set.

Return Value

YES if the receiver is a superset of theOtherSet, otherwise NO.

Availability
  • Available in OS X v10.2 and later.
Declared In
NSCharacterSet.h

longCharacterIsMember:

Returns a Boolean value that indicates whether a given long character is a member of the receiver.

- (BOOL)longCharacterIsMember:(UTF32Char)theLongChar
Parameters
theLongChar

A UTF-32 character.

Return Value

YES if theLongChar is in the receiver, otherwise NO.

Discussion

This method supports the specification of 32-bit characters.
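
Because unichar is a 16-bit type, a character outside the Basic Multilingual Plane appears in UTF-16 text as a surrogate pair, and the pair has to be combined into one 32-bit value before it can serve as theLongChar. The C sketch below shows that arithmetic under the standard UTF-16 rules; the function names are hypothetical and the code is not taken from Foundation.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if u is a high (leading) surrogate, 0xD800 through 0xDBFF. */
static int is_high_surrogate(uint16_t u) {
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Nonzero if u is a low (trailing) surrogate, 0xDC00 through 0xDFFF. */
static int is_low_surrogate(uint16_t u) {
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a UTF-16 surrogate pair into a single 32-bit code point. */
static uint32_t utf32_from_surrogates(uint16_t high, uint16_t low) {
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000u
         + (((uint32_t)high - 0xD800u) << 10)
         + ((uint32_t)low - 0xDC00u);
}
```

For example, the pair 0xD83D, 0xDE00 combines to 0x1F600, a value in plane 1 that cannot be expressed as a single unichar.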

Availability
  • Available in OS X v10.2 and later.
Declared In
NSCharacterSet.h

Constants

NSOpenStepUnicodeReservedBase

Specifies the lower bound for a Unicode character range reserved for Apple’s corporate use.

enum {
   NSOpenStepUnicodeReservedBase = 0xF400
};
Constants
NSOpenStepUnicodeReservedBase

Specifies the lower bound for a Unicode character range reserved for Apple’s corporate use (the range is 0xF400–0xF8FF).

Available in OS X v10.0 and later.

Declared in NSCharacterSet.h.

Declared In
NSCharacterSet.h