Encoding URL Data

To URL-encode (percent-encode) a string, use the Core Foundation function CFURLCreateStringByAddingPercentEscapes; to decode it again, use CFURLCreateStringByReplacingPercentEscapesUsingEncoding. The encoding function lets you specify a list of characters to encode in addition to high-ASCII (0x80 through 0xff) and nonprintable characters.

According to RFC 3986, the reserved characters in a URL are:

      reserved    = gen-delims / sub-delims
 
      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
 
      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
 

Therefore, to properly URL-encode a UTF-8 string for inclusion in a URL, you should do the following:

CFStringRef originalString = ...
 
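CFStringRef encodedString = CFURLCreateStringByAddingPercentEscapes(
    kCFAllocatorDefault,
    originalString,
    NULL,                         // no characters exempted from escaping
    CFSTR(":/?#[]@!$&'()*+,;="),  // also escape the RFC 3986 reserved characters
    kCFStringEncodingUTF8);
// Create Rule: the caller owns encodedString and must CFRelease it when done.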
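
To make the effect of that call concrete, the following is a minimal, portable sketch of the same idea in plain C. It is not Core Foundation's implementation and may differ from it in edge cases. The helper names is_unreserved and percent_encode_utf8 are hypothetical, and the sketch assumes the input is already a NUL-terminated UTF-8 byte string. It escapes every byte that is not in the RFC 3986 "unreserved" set (letters, digits, "-", ".", "_", "~"), which covers the reserved characters listed above as well as high-ASCII and nonprintable bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 for RFC 3986 "unreserved" characters, which never need escaping. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper (not a Core Foundation API): percent-encodes a
   NUL-terminated UTF-8 string. Every byte outside the unreserved set
   becomes %XX. The caller owns the result and must free() it.
   Returns NULL if the allocation fails. */
static char *percent_encode_utf8(const char *input)
{
    static const char hex[] = "0123456789ABCDEF";
    assert(input != NULL);

    size_t len = strlen(input);
    char *out = malloc(len * 3 + 1);  /* worst case: each byte becomes 3 */
    if (out == NULL) {
        return NULL;
    }

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)input[i];
        if (is_unreserved(c)) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];       /* high nibble */
            *p++ = hex[c & 0x0F];     /* low nibble */
        }
    }
    *p = '\0';
    return out;
}
```

Because the sketch works on bytes, a multibyte UTF-8 character is emitted as one escape per byte, which is the form a URL expects for UTF-8 data.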

If you want to decode part of a URL, you must first split the URL string into its constituent parts (fields and path parts) and only then decode each part. If you decode the string before splitting it, you will be unable to tell the difference (for example) between an encoded ampersand that was originally part of the contents of a field and a bare ampersand that indicated the end of the field.

After you have broken the URL into parts, you can decode each part as follows:

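CFStringRef decodedString = CFURLCreateStringByReplacingPercentEscapesUsingEncoding(
    kCFAllocatorDefault,
    encodedString,
    CFSTR(""),                    // empty string: decode every percent escape
    kCFStringEncodingUTF8);
// Create Rule: the caller owns decodedString and must CFRelease it when done.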
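
The split-then-decode order described above can be sketched in portable C as follows. This is an illustration under stated assumptions, not Core Foundation's implementation: the helpers hex_value, percent_decode, and decode_field are hypothetical names, the input is assumed to be a single undecoded "name=value" field that the caller has already cut out of the query string at a bare "&", and malformed escapes are simply copied through unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Converts one hex digit to its value, or returns -1 if it is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes %XX escapes in the first len bytes of s.
   Malformed escapes are copied through unchanged.
   The caller owns the result and must free() it. */
static char *percent_decode(const char *s, size_t len)
{
    assert(s != NULL);
    char *out = malloc(len + 1);  /* decoding never lengthens the string */
    if (out == NULL) {
        return NULL;
    }

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        int hi = -1, lo = -1;
        if (s[i] == '%' && i + 2 < len) {
            hi = hex_value(s[i + 1]);
            lo = hex_value(s[i + 2]);
        }
        if (hi >= 0 && lo >= 0) {
            out[j++] = (char)(hi * 16 + lo);
            i += 2;               /* skip the two hex digits */
        } else {
            out[j++] = s[i];
        }
    }
    out[j] = '\0';
    return out;
}

/* Hypothetical helper: splits one undecoded "name=value" field at the
   first bare '=' and decodes each half separately.
   Returns 0 on success, -1 on allocation failure. */
static int decode_field(const char *field, size_t len,
                        char **name, char **value)
{
    assert(field != NULL && name != NULL && value != NULL);

    const char *eq = memchr(field, '=', len);
    size_t name_len = eq ? (size_t)(eq - field) : len;

    *name = percent_decode(field, name_len);
    *value = eq ? percent_decode(eq + 1, len - name_len - 1)
                : percent_decode("", 0);
    if (*name == NULL || *value == NULL) {
        free(*name);
        free(*value);
        *name = *value = NULL;
        return -1;
    }
    return 0;
}
```

For a query string such as q=Tom%20%26%20Jerry&page=2, the caller would locate the first bare "&" (for instance with strchr) and pass the bytes before it to decode_field. The encoded ampersand inside the value stays opaque until after the split, so the value comes back as "Tom & Jerry" rather than being cut at the ampersand.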