Guesses the language of a given string and returns the guess as a BCP 47 string.


CFStringRef CFStringTokenizerCopyBestStringLanguage(CFStringRef string, CFRange range);



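Parameters

string
The string to test to identify the language.

range
The range of string to use for the test. Because CFRange is a structure passed by value, it cannot be NULL; pass an explicit range that lies within the bounds of string, for example one covering its first few hundred characters.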

Return Value

A language tag in BCP 47 form, or NULL if the language in string could not be identified. Ownership follows the Create Rule: the caller owns the returned string and must release it with CFRelease.


The result is not guaranteed to be accurate. Typically, the function requires 200-400 characters to reliably guess the language of a string.

CFStringTokenizer recognizes the following languages:

ar (Arabic), bg (Bulgarian), cs (Czech), da (Danish), de (German), el (Greek), en (English), es (Spanish), fi (Finnish), fr (French), he (Hebrew), hr (Croatian), hu (Hungarian), is (Icelandic), it (Italian), ja (Japanese), ko (Korean), nb (Norwegian Bokmål), nl (Dutch), pl (Polish), pt (Portuguese), ro (Romanian), ru (Russian), sk (Slovak), sv (Swedish), th (Thai), tr (Turkish), uk (Ukrainian), zh-Hans (Simplified Chinese), zh-Hant (Traditional Chinese).
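For code that needs to check a returned tag against this list, a lookup table is one option. The sketch below is illustrative only: IsListedLanguageTag and kListedTags are hypothetical names, the tags are copied from the list above, and the comparison is an exact, case-sensitive match (an assumption; BCP 47 itself treats tags as case-insensitive). A CFStringRef result would first need to be converted to a C string, for example with CFStringGetCString.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Tags copied from the list of recognized languages above.
static const char *const kListedTags[] = {
    "ar", "bg", "cs", "da", "de", "el", "en", "es", "fi", "fr",
    "he", "hr", "hu", "is", "it", "ja", "ko", "nb", "nl", "pl",
    "pt", "ro", "ru", "sk", "sv", "th", "tr", "uk", "zh-Hans", "zh-Hant",
};

// Hypothetical helper: returns true if `tag` exactly matches a listed tag.
static bool IsListedLanguageTag(const char *tag) {
    if (tag == NULL) {
        return false;
    }
    size_t count = sizeof kListedTags / sizeof kListedTags[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(tag, kListedTags[i]) == 0) {
            return true;
        }
    }
    return false;
}
```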

