Problem decoding AttributedString containing emoji

I am trying to encode an AttributedString to JSON and then decode it back to an AttributedString. But when the AttributedString both (1) contains emoji, and (2) has any attributes assigned, the decoding seems to fail, producing a truncated AttributedString. By dumping the decoded value, I can see that the full string is still in there (in the _guts property), but it is missing in normal uses of the AttributedString.

Below is an example that reproduces the problem.

import Foundation

// An arbitrary AttributedString with emoji
var attrString = AttributedString("12345💕☺️💕☺️💕☺️12345")

// Set an attribute (doesn't seem to matter which one)
attrString.imageURL = URL(string: "http://www.dummy.com/dummy.jpg")!

// Encode the AttributedString
let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted
let data = try! encoder.encode(attrString)

// Print the encoded JSON
print("encoded JSON for AttributedString:")
print(String(data: data, encoding: .utf8)!)

// Output from above omitted, but it looks correct with the full string represented

// Decode the AttributedString and print it
let decoder = JSONDecoder()
let decodedAttrString = try! decoder.decode(AttributedString.self, from: data)

print("decoded AttributedString:")
print(decodedAttrString)

// Output from above is a truncated AttributedString:
// 
//    12345💕☺️ {
//  	  NSImageURL = http://www.dummy.com/dummy.jpg
//   }

print("dump of AttributedString:")
dump(decodedAttrString)

// Interestingly, `dump` shows that the full string is still in there:
//  
// ▿ 12345💕☺️ {
// 	NSImageURL = http://www.dummy.com/dummy.jpg
//  }
//   ▿ _guts: Foundation.AttributedString.Guts #0
//     - string: "12345💕☺️💕☺️💕☺️12345"
//     ▿ runs: 1 element
//       ...
// 
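
To check quickly whether a given OS release is affected, the encode and decode steps above can be folded into a single round-trip comparison. This is only a sketch: `roundTripsIntact` is a hypothetical helper name, not a Foundation API, and it compares only the characters, not the attributes.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): encodes the value to JSON,
// decodes it again, and compares the visible characters of both values.
func roundTripsIntact(_ original: AttributedString) throws -> Bool {
    let data = try JSONEncoder().encode(original)
    let decoded = try JSONDecoder().decode(AttributedString.self, from: data)
    return String(decoded.characters) == String(original.characters)
}

// Same sample as above: emoji plus one attribute.
var sample = AttributedString("12345💕☺️💕☺️💕☺️12345")
sample.imageURL = URL(string: "http://www.dummy.com/dummy.jpg")!

do {
    // Per the behaviour described in this thread, this should report false
    // on affected systems and true on systems that have the fix.
    print("round trip intact:", try roundTripsIntact(sample))
} catch {
    print("encoding or decoding threw:", error)
}
```

Running the same check on each platform of interest gives a direct yes/no answer without reading `dump` output.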


Would this work? (I did not test):

and better

    @CodableConfiguration(from: MyAttributes.self) var attrString = AttributedString("12345💕☺️💕☺️💕☺️12345")

This appears to be a Unicode multi-byte string length defect, where the decoder assumes that the number of characters equals the number of bytes.

Any attributed string with multi-byte Unicode scalars will truncate. For example, ~! (0x7E 0x21) will not truncate, but ¡! (0xC2 0xA1 0x21) will truncate to ¡ (0xC2 0xA1) when decoded.
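
The mismatch between those two counts is easy to see in plain Swift. The sketch below only illustrates how character and byte counts diverge for the two example strings; it is not the Foundation decoding code, and whether that code really confuses these counts is a guess.

```swift
import Foundation

let ascii = "~!"          // U+007E U+0021
let latin = "\u{A1}!"     // U+00A1 (¡) followed by U+0021

// Every character here is one UTF-8 byte, so both counts agree.
print(ascii.count, ascii.utf8.count)   // 2 2

// "¡" takes two UTF-8 bytes (0xC2 0xA1), so the byte count is larger.
print(latin.count, latin.utf8.count)   // 2 3

// The raw bytes match the values quoted above.
print(Array(latin.utf8).map { String($0, radix: 16, uppercase: true) })   // ["C2", "A1", "21"]
```

Any code that measures a length in one of these units and applies it in the other would cut a string like this short, which is consistent with the truncation seen here.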

Attributed strings with no attributes (fast path?) will not truncate when decoded.

@cmonsour

Looks like this was fixed at some point.

Yep. Based on the info in FB9973907 — thanks for filing that and posting it here! — this seems to have been fixed in iOS 15.5 and friends.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thanks. I did briefly try to exploit this on macOS and found no obvious attack vectors. Next time I'll poke at iOS and iPadOS.

It works on Mac Catalyst, Designed for iPad, visionOS, and iOS, but doesn't seem to be fixed on tvOS, macOS, or watchOS 🙁.

That’s surprising, given that all of these use the same code for wrangling attributed strings. If you continue to have problems here, I recommend that you file a new bug with the details. In this case, it’d be super helpful if you attached a small test project that demonstrates the issue.

Please post your bug number, just for the record.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

So I was putting together a bug report when I realized that the reason it was working on macOS was that I was using a web view and not NSAttributedString. Here is a very simple example that shows the issue; perhaps you can find out why it's not converting. It could be in the HTML attributing of the string...

import SwiftUI

struct ContentView: View {
    let html = "<html><head><meta name=\"viewport\" content=\"width=device-width\" /></head><body style=\"font-family: -apple-system;color: rgb(255,255,255);\"><p>Feature Answer 5</p><p><strong>This should be bold</strong></p><p><em>This should be italic</em></p><blockquote><p>Happy Christmas emoji should be supported! 🎅🎄🎁 😉</p></blockquote></body></html>"
    
    var attributedString: AttributedString {
        let data = Data(html.utf8)
        if let attributedString = try? NSAttributedString(data: data, options: [.documentType: NSAttributedString.DocumentType.html], documentAttributes: nil) {
            return AttributedString(attributedString)
        } else {
            return "Unable to pull NSAttributed string from data."
        }
    }
    
    var body: some View {
        VStack {
            Image(systemName: "globe")
                .imageScale(.large)
                .foregroundStyle(.tint)
            Text("Hello, world!")
            Text(attributedString)
        }
        .padding()
    }
}

#Preview {
    ContentView()
}

Text encodings in HTML are… well… complex. You can’t rely on an HTML parser defaulting to UTF-8 (they often default to Latin 1). If you add an explicit text encoding to your HTML (using <meta charset="utf-8" />), does that fix things?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
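
The suggestion above amounts to a one-line change in the HTML. Here is a minimal sketch of it, assuming the same NSAttributedString HTML import as in the ContentView example; the name `htmlWithCharset` and the shortened markup are made up for illustration, and nothing in this thread confirms yet that the change fixes the emoji.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

// Shortened version of the markup from the example, with an explicit
// charset declaration added to <head>.
let htmlWithCharset = """
<html><head><meta charset="utf-8" /></head>
<body><p>Happy Christmas emoji should be supported! 🎅🎄🎁 😉</p></body></html>
"""

#if canImport(UIKit) || canImport(AppKit)
// The HTML importer should be called on the main thread.
let parsed = try? NSAttributedString(
    data: Data(htmlWithCharset.utf8),
    options: [.documentType: NSAttributedString.DocumentType.html],
    documentAttributes: nil
)
print(parsed?.string ?? "HTML import failed")
#endif
```

If the emoji still come out garbled, passing the `.characterEncoding` reading option with `String.Encoding.utf8.rawValue` alongside `.documentType` is another thing to try; that option is not mentioned in this thread.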
