Character Sets

In Core Foundation, a character set, represented by a CFCharacterSet object, is a set of Unicode characters. Functions use character sets to group characters for searching and parsing operations, so that a search can find, or skip over, any character that belongs to a particular set. Apart from supporting membership tests, a character-set object simply holds a set of character values that limits which characters a string operation considers.

You use character sets in search, parsing, and comparison operations involving strings. Programmatic interfaces that require references to CFCharacterSet objects are currently under development in both Core Foundation and Carbon.

To obtain a CFCharacterSet object to pass to a function, you can either use one of the predefined character sets or create your own. To use a predefined set (such as whitespace, alphanumeric characters, or decimal digits), call CFCharacterSetGetPredefined with one of the CFCharacterSetPredefinedSet constants. Some CFCharacterSet functions create character sets from strings or bitmapped data, and others create mutable character sets. You can also use a predefined set as the starting point for a custom set: make a mutable copy of it and modify the copy.

Because character sets often participate in performance-critical code, you should be aware of the aspects of their use that can affect the performance of your application. Mutable character sets are generally much more expensive than immutable character sets. They consume more memory and are costly to invert (an operation often performed in scanning a string). Because of this, you should follow these guidelines: