How does one get the locale-specific character set encoding on a Cocoa App

If (in Terminal) I type 'env', I'll see a line that looks like:

LANG=en_GB.UTF-8

And I can parse that to get the 2-char 'en' locale-code, the sub-domain 'GB' and the character-set encoding of UTF-8. All well and good.
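For reference, the parsing step can be sketched in C roughly as follows. This assumes the usual POSIX layout language[_territory][.codeset][@modifier]; the struct locale_parts and parse_locale_spec names are made up for illustration, and the buffer sizes are arbitrary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the pieces of a spec such as "en_GB.UTF-8". */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
};

/* Copy len bytes of src into dst, truncating to fit, and NUL-terminate. */
static void copy_span(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "language[_territory][.codeset][@modifier]" into its parts.
 * Missing parts come back as empty strings. Returns 0 on success, -1 on bad input. */
int parse_locale_spec(const char *spec, struct locale_parts *out)
{
    if (spec == NULL || out == NULL || *spec == '\0')
        return -1;

    size_t total     = strcspn(spec, "@");   /* everything before "@modifier" */
    size_t lang_terr = strcspn(spec, ".@");  /* length of "en_GB" */
    size_t lang      = strcspn(spec, "_.@"); /* length of "en" */

    copy_span(out->language, sizeof out->language, spec, lang);

    if (spec[lang] == '_')
        copy_span(out->territory, sizeof out->territory,
                  spec + lang + 1, lang_terr - lang - 1);
    else
        out->territory[0] = '\0';

    if (spec[lang_terr] == '.')
        copy_span(out->codeset, sizeof out->codeset,
                  spec + lang_terr + 1, total - lang_terr - 1);
    else
        out->codeset[0] = '\0';

    return 0;
}
```

A spec without a codeset, such as "C" or "de_DE@euro", leaves the codeset field empty, so the caller still has to decide what to assume in that case.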

However in a Cocoa app, I can't seem to find the equivalent for the "UTF-8" part. This is a cross-platform app, but at this point I'll go with any solution...

I've tried:

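    NSLocale *loc     = NSLocale.currentLocale;
    NSString *lang    = loc.localeIdentifier;

    // Note: passing NULL only queries the current locale; it does not change it.
    setlocale(LC_ALL, NULL);
    char *text        = nl_langinfo(CODESET);
    // Declare charset outside the if so it is still in scope for NSLog.
    NSString *charset = nil;
    if (text)
        charset = [NSString stringWithUTF8String:text];
    NSLog(@"lang:%@\nchar:%@\n", lang, charset);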

which displays:

lang:en-GB
char:US-ASCII

Also tried:

    // Search for locale info by preferred environment variable
    NSProcessInfo *pi = NSProcessInfo.processInfo;
    NSDictionary<NSString *,NSString *> *env = pi.environment;
    NSString *spec = env[@"LC_ALL"];
    if (spec == nil)
        spec = env[@"LC_CTYPE"];
    if (spec == nil)
        spec = env[@"LANG"];
    NSLog(@"spec:%@\n", spec);

which displays:

spec:(null)

Also tried:

    CFStringEncoding sys = CFStringGetSystemEncoding();
    CFStringRef enc = CFStringConvertEncodingToIANACharSetName(sys);
    NSString *nsEnc = (__bridge NSString *)enc;
    NSLog(@"iana:%@", nsEnc);
      
    enc = CFStringGetNameOfEncoding(sys);
    nsEnc = (__bridge NSString *)enc;
    NSLog(@"name:%@", nsEnc);
  
    CFStringEncoding compat = CFStringGetMostCompatibleMacStringEncoding(sys);
    enc = CFStringGetNameOfEncoding(compat);
    nsEnc = (__bridge NSString *)enc;
    NSLog(@"name:%@", nsEnc);

which displays:

iana:macintosh
name:Western (Mac OS Roman)
name:Western (Mac OS Roman)

Any ideas?

A character encoding defines how the characters of a coded character set are mapped to bytes in memory. When you use Terminal on macOS, you interact with the system by typing characters, and the system needs to know how those characters are laid out in memory, which is the character encoding. Terminal uses UTF-8 by default, which is what the LANG environment variable describes, as you have noticed.

In a Cocoa app, you handle text, which is a series of characters, via the Cocoa framework. Typically, you create a piece of text using String in Swift, or NSString in Objective-C, and pass it to the system for text rendering.

When creating a string, you need to specify the encoding of your content: the init methods of Swift.String and NSString typically have an encoding parameter. If you don't, you are assumed to use the default encoding, which is UTF-8 for Swift.String and UTF-16 for NSString. Other than that, you most likely don't need to care about the character encoding.

Having said that, I am curious why you need the encoding information for your Cocoa app and how you would use it. If you can share more context, I can probably comment more concretely.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.

Hi :)

The context is a cross-platform email application. Text can arrive encoded in any one of a large number of encodings, and similarly be sent out in the encoding that the recipient has deemed as the preferred encoding.

Because it's cross-platform, I'm using iconv (shipped by default on the Mac) to do the test for "can we decode this format to something the local platform can display", and similarly in reverse. The iconv call requires a 'from' and 'to' encoding specification. Depending on direction, the "local default" encoding spec is what I'm trying to find with the above.
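For anyone following along, that capability test might look roughly like the sketch below. The can_convert name is hypothetical; iconv_open and iconv_close are the standard calls. Note that iconv_open takes the target encoding first, and that a successful open only says the conversion is supported, not that a particular message will decode cleanly. On the Mac this typically needs linking with -liconv.

```c
#include <iconv.h>

/* Return 1 if this platform's iconv can convert from `from` to `to`, else 0.
 * can_convert is an illustrative helper name, not part of iconv itself. */
int can_convert(const char *from, const char *to)
{
    /* Argument order is (tocode, fromcode). */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;   /* conversion pair not supported here */
    iconv_close(cd);
    return 1;
}
```

The 'to' argument is where the "local default" encoding would go when decoding incoming mail, and the 'from' argument when encoding outgoing mail.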

Now I could just say "It's a Mac, we'll use UTF-8", but every other platform responds to either the nl_langinfo() call, or at least one of the environment variables is set. Seems odd that the platform I'm developing it on is the one that doesn't provide the API :)
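One way to fold both approaches into a single lookup is sketched below, under a few assumptions: the names first_nonempty and local_codeset are made up, the fallback value is whatever the caller chooses (for example "UTF-8" on the Mac), and the locale is adopted with setlocale(LC_CTYPE, ""), since an empty string asks the C library to read the environment while NULL only queries the current setting.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Return the first argument that is neither NULL nor empty, or NULL if none is. */
const char *first_nonempty(const char *a, const char *b, const char *c)
{
    const char *candidates[] = { a, b, c };
    for (size_t i = 0; i < 3; i++)
        if (candidates[i] != NULL && candidates[i][0] != '\0')
            return candidates[i];
    return NULL;
}

/* Best-effort name of the local codeset; returns `fallback` when the
 * environment says nothing (as observed for a GUI app in this thread). */
const char *local_codeset(const char *fallback)
{
    /* Same precedence as the environment lookup earlier in the thread. */
    const char *spec = first_nonempty(getenv("LC_ALL"),
                                      getenv("LC_CTYPE"),
                                      getenv("LANG"));
    if (spec == NULL)
        return fallback;

    /* "" adopts the environment's locale; this changes process-wide state. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return fallback;

    const char *codeset = nl_langinfo(CODESET);
    return (codeset != NULL && codeset[0] != '\0') ? codeset : fallback;
}
```

Because setlocale affects the whole process, a call like this probably belongs once at startup rather than on every conversion.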

Having said all that...

On the Mac, the design for the email app is going to closely follow that of Mail.app, where there's a LaunchAgent that handles all the background stuff and is the only thing with permission to actually read email content. The application (iOS or macOS) will interact with the LaunchAgent over XPC, and if the LaunchAgent has a "Terminal.app" kind of environment rather than the restricted one I see for applications, I could obtain the LANG variable contents after launching the app, via an XPC call.

For an email, you might look at the Content-Type field in the email header for the text encoding. If that piece of information isn't enough, you will have to guess. Although there are some heuristics that can help you guess, I don't see any way to do it accurately.
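As a rough illustration of the header lookup, here is a simplified C sketch that pulls the charset parameter out of an already-unfolded Content-Type value. The content_type_charset name is hypothetical, and this is not a full MIME parser: it ignores comments, encoded parameter values, and quoted values elsewhere in the field that happen to contain a semicolon.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive test that s starts with prefix. */
static int has_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Copy the charset parameter of a Content-Type value into out.
 * Returns 1 if a non-empty charset was found, 0 otherwise. */
int content_type_charset(const char *value, char *out, size_t out_size)
{
    if (value == NULL || out == NULL || out_size == 0)
        return 0;

    const char *p = strchr(value, ';');      /* parameters follow the media type */
    while (p != NULL) {
        p++;
        while (*p == ' ' || *p == '\t')
            p++;
        if (has_prefix_ci(p, "charset=")) {
            p += strlen("charset=");
            int quoted = (*p == '"');
            if (quoted)
                p++;
            size_t n = 0;
            while (*p != '\0' && n + 1 < out_size) {
                if (quoted ? (*p == '"')
                           : (*p == ';' || *p == ' ' || *p == '\t'))
                    break;
                out[n++] = *p++;
            }
            out[n] = '\0';
            return n > 0;
        }
        p = strchr(p, ';');                  /* try the next parameter */
    }
    return 0;
}
```

For "text/plain; charset=ISO-8859-1" this yields "ISO-8859-1", which could then be passed to iconv as the 'from' encoding. When the function returns 0, the guessing described above is all that is left.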

For a piece of text whose encoding you don't know, if it comes from macOS, it seems reasonable to me to bet on UTF-8 first, given that some built-in apps, including TextEdit and Mail, use UTF-8 as the default encoding (and that APFS uses UTF-8 for file names).
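One cheap way to check that bet is to test whether the bytes are well-formed UTF-8 before committing to it. The sketch below (is_valid_utf8 is an illustrative name) rejects stray continuation bytes, truncated sequences, overlong forms, surrogates, and values above U+10FFFF. Passing the check doesn't prove the text was meant as UTF-8, since pure ASCII passes too, but failing it is a strong sign that some other encoding is in use.

```c
#include <stddef.h>

/* Return 1 if the len bytes at s form well-formed UTF-8, else 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;            /* number of continuation bytes expected */
        unsigned long min, cp;   /* smallest legal code point for this length */

        if (c < 0x80) {                     /* single-byte ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; min = 0x80;    cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; min = 0x800;   cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; min = 0x10000; cp = c & 0x07;
        } else {
            return 0;                       /* stray continuation or invalid lead byte */
        }

        if (len - i <= extra)
            return 0;                       /* sequence truncated at end of input */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                       /* overlong, out of range, or surrogate */

        i += extra + 1;
    }
    return 1;
}
```

A lone 0xE9 byte (é in ISO-8859-1), for example, fails this check, which is a hint to try a legacy encoding instead.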

I don't have any insight into why the system doesn't provide a "local default" encoding. Folks who do are more than welcome to weigh in. However, it would be a bit confusing to me if we said, or provided an API to indicate, that the "local default" encoding for the system is UTF-8 because, as mentioned, NSString, which is used quite extensively, uses UTF-16 by default. In that case, it seems reasonable, if not better, for the system to provide nothing and let developers figure out what encoding they are handling.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.
