Remove \u{ef} from a string that comes from NSAttributedString

I am creating an e-mail handling app.

The app fetches e-mail messages and shows simplified texts.

The app uses NSAttributedString to strip unneeded HTML tags, but the output sometimes contains characters such as "\u{ef}".

That's OK but I don't need the characters.

So I tried to remove the characters as shown below, but it didn't work.

How can I do that?


let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
    NSAttributedString.DocumentReadingOptionKey.documentType: NSAttributedString.DocumentType.html,
    NSAttributedString.DocumentReadingOptionKey.characterEncoding: String.Encoding.utf8.rawValue
]
let attributedString = try? NSMutableAttributedString(data: encoded, options: attributedOptions, documentAttributes: nil)


1)

var stripped = attributedString?.string.replacingOccurrences(of: "\u{ef}", with: "", options: NSString.CompareOptions.literal, range:nil)


2)

var stripped2 = attributedString?.string.trimmingCharacters(in: ["\u{ef}"])


The message of TestFlight's new version available notification e-mail seems to contain the strange characters.

U+00EF is Latin Small Letter I With Diaeresis, which seems like a character you’d want to keep. It’s hard to say what’s going on here without knowing more about the input data. Can you post a hex dump of a small Data value that reproduces this problem?

Note: The easiest way to get a hex dump of a Data value is via this code:
NSLog("%@", myData as NSData)
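
If the NSData description does not give you a plain hex string on your OS version, a small helper can produce one directly. This is only a sketch: hexDump is a hypothetical name, not something from this thread, and the 4-byte grouping is just chosen to resemble the dump format used later in this thread.

```swift
import Foundation

// Hypothetical helper (not from the thread): formats Data as lowercase hex,
// grouped into 4-byte words separated by spaces.
func hexDump(_ data: Data) -> String {
    var words: [String] = []
    var current = ""
    for (index, byte) in data.enumerated() {
        let hex = String(byte, radix: 16)
        // Pad single-digit values so every byte is exactly two characters.
        current += hex.count == 1 ? "0" + hex : hex
        if index % 4 == 3 {
            words.append(current)
            current = ""
        }
    }
    // Keep a trailing partial word, if any.
    if !current.isEmpty { words.append(current) }
    return words.joined(separator: " ")
}

let sample = Data("<div>\n".utf8)
print(hexDump(sample))   // 3c646976 3e0a
```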

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

I tested (in playground)


let attributedString = NSMutableAttributedString(string: "Hello ï Hello")     // ï is "\u{ef}"
var stripped = attributedString.string.replacingOccurrences(of: "\u{ef}", with: "", options: NSString.CompareOptions.literal, range:nil)

and get:

"Hello Hello"


Which is the expected result.


So most likely, as Quinn pointed out, there is an issue in data.

Thanks for your replies.

Below is the code to reproduce my problem.

I believe "raw", "stripped" and "stripped2" all contain "\u{ef}" because Xcode shows their values as "(String) \n\u{ef}\n\u{ef}\n" when I break at that line.


What I want to do is trim the useless "\n" characters, but the "\u{ef}" characters seem to prevent that.


var demo = """
<div style="padding-bottom: 20px;"></div><div><!DOCTYPE HTML PUBLIC "-/
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>iTunes Connect</title>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8" />
<style type="text/css">
/<![CDATA[*/
        A { text-decoration: none; }
        A:link { color: #0088FF; text-decoration: none; }
        A:visited { text-decoration: none; }
        A:active { color: #696969; text-decoration: underline;}
        A:hover { text-decoration: underline; }
        BODY, TD, P, UL, OL {
            font-family: Helvetica, Lucida Grande, Arial, sans-serif;
            font-size: 14px;
            line-height: 18px;
            color: #666666;
            text-align: left;
        }
        P {
            margin-bottom: 1em;
        }
        H1 {
            line-height:145%;
        }
        HR {
            border: 0;
            border-top: 1px solid #dddddd;
            margin: 10px 0px 15px 0px;
        }
        @media only screen and (max-device-width: 481px) and (min-device-pixel-ratio : 2), only screen and (min-device-width: 481px) and (-webkit-min-device-pixel-ratio : 2) {
            *[id=header] { background:url('https://itunesconnect.apple.com/itc/images/email/email-itc-logo@2X.png') no-repeat 0 top; -webkit-background-size: 173px 45px; }
            *[id=hr-fade] { background:url('https://itunesconnect.apple.com/itc/images/email/email-hr@2X.png') no-repeat 0 top; -webkit-background-size: 648px 18px; }
        }
        @media only screen and (max-device-width: 480px) {
            table[class="table"], td[class="cell"] {
                width: 270px !important;
            }
            table[class="table3"], td[class="cell3"] {
                width: 270px !important;
                text-align: left !important;
            }
            img[id="header"] {
                width: 135px !important;
                height: 35px !important;
            }
            td[id="header"] {
                -webkit-background-size: 135px 35px;
            }
            table[class="footer_table"] {
                display: none !important;
            }
            .hide { max-height: none !important; font-size: 11px !important; display: block !important; }
            p {
                text-align: left !important;
            }
        }
/]]>*/
</style>
</head>
<body>
<table class="table" border="0" cellspacing="0" cellpadding="0"
align="center" width="700">
<tr align="center">
<td class="spacer" align="left" valign="top" bgcolor="#FFFFFF"
width="20"><img src=
"https://itunesconnect.apple.com/itc/images/email/spacer.gif"
border="0" alt="" width="20" height="1" /></td>
<td align="left">
<table class="table3" width="648" border="0" cellpadding="0"
cellspacing="0" align="center" bgcolor="#FFFFFF">
<tr>
<td class="cell3" id="header" width="648" align="left" style=
"padding-bottom: 10px;"><img id="header" src=
"https://itunesconnect.apple.com/itc/images/email/email-itc-logo@2X.png"
border="0" alt="iTunes Connect" width="173" height="45" name=
"header" /></td>
</tr>
</table>
</body>
"""
        guard let encoded = demo.data(using: .utf8) else {
            return ""
        }
        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            NSAttributedString.DocumentReadingOptionKey.documentType: NSAttributedString.DocumentType.html,
            NSAttributedString.DocumentReadingOptionKey.characterEncoding: String.Encoding.utf8.rawValue
        ]
        guard let attributedString = try? NSMutableAttributedString(data: encoded, options: attributedOptions, documentAttributes: nil) else {
            return ""
        }
        let raw = attributedString.string
        let stripped = attributedString.string.replacingOccurrences(of: "\u{ef}", with: "", options: NSString.CompareOptions.literal, range:nil)
        let stripped2 =  attributedString.string.trimmingCharacters(in: ["\u{ef}"])

I just posted the reproduction code but got "Currently being moderated."

The source I passed to NSAttributedString contains URLs, which may have triggered the moderation.


I tried removing the URL strings from the source, but then the problem no longer reproduced.

A hex dump of the Data is below.

It is long...


"<3c646976 20737479 6c653d22 70616464 696e672d 626f7474 6f6d3a20 32307078 3b223e3c 2f646976 3e3c6469 763e3c21 444f4354 59504520 48544d4c 20505542 4c494320 222d2f2f 5733432f 2f445444 2048544d 4c20342e 30312054 72616e73 6974696f 6e616c2f 2f454e22 3e0a3c68 746d6c20 786d6c6e 733d2268 7474703a 2f223e0a 3c686561 643e0a3c 7469746c 653e6954 756e6573 20436f6e 6e656374 3c2f7469 746c653e 0a3c6d65 74612068 7474702d 65717569 763d2243 6f6e7465 6e742d54 79706522 20636f6e 74656e74 3d0a2274 6578742f 68746d6c 3b206368 61727365 743d7574 662d3822 202f3e0a 3c737479 6c652074 7970653d 22746578 742f6373 73223e0a 2f2a3c21 5b434441 54415b2a 2f0a2020 20202020 20204120 7b207465 78742d64 65636f72 6174696f 6e3a206e 6f6e653b 207d0a0a 20202020 20202020 413a6c69 6e6b207b 20636f6c 6f723a20 23303038 3846463b 20746578 742d6465 636f7261 74696f6e 3a206e6f 6e653b20 7d0a0a20 20202020 20202041 3a766973 69746564 207b2074 6578742d 6465636f 72617469 6f6e3a20 6e6f6e65 3b207d0a 0a202020 20202020 20413a61 63746976 65207b20 636f6c6f 723a2023 36393639 36393b20 74657874 2d646563 6f726174 696f6e3a 20756e64 65726c69 6e653b7d 0a0a2020 20202020 2020413a 686f7665 72207b20 74657874 2d646563 6f726174 696f6e3a 20756e64 65726c69 6e653b20 7d0a0a20 20202020 20202042 4f44592c 2054442c 20502c20 554c2c20 4f4c207b 0a202020 20202020 20202020 20666f6e 742d6661 6d696c79 3a204865 6c766574 6963612c 204c7563 69646120 4772616e 64652c20 41726961 6c2c2073 616e732d 73657269 663b0a20 20202020 20202020 20202066 6f6e742d 73697a65 3a203134 70783b0a 20202020 20202020 20202020 6c696e65 2d686569 6768743a 20313870 783b0a20 20202020 20202020 20202063 6f6c6f72 3a202336 36363636 363b0a20 20202020 20202020 20202074 6578742d 616c6967 6e3a206c 6566743b 0a202020 20202020 207d0a0a 20202020 20202020 50207b0a 20202020 20202020 20202020 6d617267 696e2d62 6f74746f 6d3a2031 656d3b0a 20202020 20202020 7d0a0a20 20202020 20202048 31207b0a 20202020 20202020 20202020 6c696e65 2d686569 6768743a 31343525 3b0a2020 20202020 20207d0a 0a202020 20202020 20485220 7b0a2020 
20202020 20202020 2020626f 72646572 3a20303b 0a202020 20202020 20202020 20626f72 6465722d 746f703a 20317078 20736f6c 69642023 64646464 64643b0a 20202020 20202020 20202020 6d617267 696e3a20 31307078 20307078 20313570 78203070 783b0a20 20202020 2020207d 0a0a2020 20202020 2020406d 65646961 206f6e6c 79207363 7265656e 20616e64 20286d61 782d6465 76696365 2d776964 74683a20 34383170 78292061 6e642028 6d696e2d 64657669 63652d70 6978656c 2d726174 696f203a 2032292c 206f6e6c 79207363 7265656e 20616e64 20286d69 6e2d6465 76696365 2d776964 74683a20 34383170 78292061 6e642028 2d776562 6b69742d 6d696e2d 64657669 63652d70 6978656c 2d726174 696f203a 20322920 7b0a2020 20202020 20202020 20202a5b 69643d68 65616465 725d207b 20626163 6b67726f 756e643a 75726c28 27687474 70733a2f 2729206e 6f2d7265 70656174 20302074 6f703b20 2d776562 6b69742d 6261636b 67726f75 6e642d73 697a653a 20313733 70782034 3570783b 207d0a20 20202020 20202020 2020202a 5b69643d 68722d66 6164655d 207b2062 61636b67 726f756e 643a7572 6c282768 74747073 3a2f2729 206e6f2d 72657065 61742030 20746f70 3b202d77 65626b69 742d6261 636b6772 6f756e64 2d73697a 653a2036 34387078 20313870 783b207d 0a202020 20202020 207d0a0a 20202020 20202020 406d6564 6961206f 6e6c7920 73637265 656e2061 6e642028 6d61782d 64657669 63652d77 69647468 3a203438 30707829 207b0a0a 20202020 20202020 20202020 7461626c 655b636c 6173733d 22746162 6c65225d 2c207464 5b636c61 73733d22 63656c6c 225d207b 0a202020 20202020 20202020 20202020 20776964 74683a20 32373070 78202169 6d706f72 74616e74 3b0a2020 20202020 20202020 20207d0a 20202020 20202020 20202020 7461626c 655b636c 6173733d 22746162 6c653322 5d2c2074 645b636c 6173733d 2263656c 6c33225d 207b0a20 20202020 20202020 20202020 20202077 69647468 3a203237 30707820 21696d70 6f727461 6e743b0a 20202020 20202020 20202020 20202020 74657874 2d616c69 676e3a20 6c656674 2021696d 706f7274 616e743b 0a202020 20202020 20202020 207d0a20 20202020 20202020 20202069 6d675b69 643d2268 65616465 72225d20 7b0a2020 20202020 20202020 20202020 
20207769 6474683a 20313335 70782021 696d706f 7274616e 743b0a20 20202020 20202020 20202020 20202068 65696768 743a2033 35707820 21696d70 6f727461 6e743b0a 20202020 20202020 20202020 7d0a2020 20202020 20202020 20207464 5b69643d 22686561 64657222 5d207b0a 20202020 20202020 20202020 20202020 2d776562 6b69742d 6261636b 67726f75 6e642d73 697a653a 20313335 70782033 3570783b 0a202020 20202020 20202020 207d0a20 20202020 20202020 20202074 61626c65 5b636c61 73733d22 666f6f74 65725f74 61626c65 225d207b 0a202020 20202020 20202020 20202020 20646973 706c6179 3a206e6f 6e652021 696d706f 7274616e 743b0a20 20202020 20202020 2020207d 0a0a2020 20202020 20202020 20202e68 69646520 7b206d61 782d6865 69676874 3a206e6f 6e652021 696d706f 7274616e 743b2066 6f6e742d 73697a65 3a203131 70782021 696d706f 7274616e 743b2064 6973706c 61793a20 626c6f63 6b202169 6d706f72 74616e74 3b207d0a 0a202020 20202020 20202020 2070207b 0a202020 20202020 20202020 20202020 20746578 742d616c 69676e3a 206c6566 74202169 6d706f72 74616e74 3b0a2020 20202020 20202020 20207d0a 0a202020 20202020 207d0a2f 2a5d5d3e 2a2f0a3c 2f737479 6c653e0a 3c2f6865 61643e0a 3c626f64 793e0a3c 7461626c 6520636c 6173733d 22746162 6c652220 626f7264 65723d22 30222063 656c6c73 70616369 6e673d22 30222063 656c6c70 61646469 6e673d22 30220a61 6c69676e 3d226365 6e746572 22207769 6474683d 22373030 223e0a3c 74722061 6c69676e 3d226365 6e746572 223e0a3c 74642063 6c617373 3d227370 61636572 2220616c 69676e3d 226c6566 74222076 616c6967 6e3d2274 6f702220 6267636f 6c6f723d 22234646 46464646 220a7769 6474683d 22323022 3e3c696d 67207372 633d0a22 68747470 733a2f22 0a626f72 6465723d 22302220 616c743d 22222077 69647468 3d223230 22206865 69676874 3d223122 202f3e3c 2f74643e 0a3c7464 20616c69 676e3d22 6c656674 223e0a3c 7461626c 6520636c 6173733d 22746162 6c653322 20776964 74683d22 36343822 20626f72 6465723d 22302220 63656c6c 70616464 696e673d 2230220a 63656c6c 73706163 696e673d 22302220 616c6967 6e3d2263 656e7465 72222062 67636f6c 6f723d22 23464646 46464622 3e0a3c74 
723e0a3c 74642063 6c617373 3d226365 6c6c3322 2069643d 22686561 64657222 20776964 74683d22 36343822 20616c69 676e3d22 6c656674 22207374 796c653d 0a227061 6464696e 672d626f 74746f6d 3a203130 70783b22 3e3c696d 67206964 3d226865 61646572 22207372 633d0a22 68747470 733a2f22 0a626f72 6465723d 22302220 616c743d 22695475 6e657320 436f6e6e 65637422 20776964 74683d22 31373322 20686569 6768743d 22343522 206e616d 653d0a22 68656164 65722220 2f3e3c2f 74643e0a 3c2f7472 3e0a3c2f 7461626c 653e0a3c 2f626f64 793e>"

Here is what I get once converted to ASCII (which seems to be the right encoding, judging by the result).


I do not see the U+00EF character.


Did you edit it? It seems incomplete, for example img src= with no source defined (is it the URL on the following line?)


<div style="padding-bottom: 20px;"></div><div><!DOCTYPE HTML PUBLIC "-/

<html xmlns="http:/">

<head>

<title>iTunes Connect</title>

<meta http-equiv="Content-Type" content=

"text/html; charset=utf-8" />

<style type="text/css">

/<![CDATA[*/

A { text-decoration: none; }

A:link { color: #0088FF; text-decoration: none; }

A:visited { text-decoration: none; }

A:active { color: #696969; text-decoration: underline;}

A:hover { text-decoration: underline; }

BODY, TD, P, UL, OL {

font-family: Helvetica, Lucida Grande, Arial, sans-serif;

font-size: 14px;

line-height: 18px;

color: #666666;

text-align: left;

}

P {

margin-bottom: 1em;

}

H1 {

line-height:145%;

}

HR {

border: 0;

border-top: 1px solid #dddddd;

margin: 10px 0px 15px 0px;

}

@media only screen and (max-device-width: 481px) and (min-device-pixel-ratio : 2), only screen and (min-device-width: 481px) and (-webkit-min-device-pixel-ratio : 2) {

*[id=header] { background:url('https:/') no-repeat 0 top; -webkit-background-size: 173px 45px; }

*[id=hr-fade] { background:url('https:/') no-repeat 0 top; -webkit-background-size: 648px 18px; }

}

@media only screen and (max-device-width: 480px) {

table[class="table"], td[class="cell"] {

width: 270px !important;

}

table[class="table3"], td[class="cell3"] {

width: 270px !important;

text-align: left !important;

}

img[id="header"] {

width: 135px !important;

height: 35px !important;

}

td[id="header"] {

-webkit-background-size: 135px 35px;

}

table[class="footer_table"] {

display: none !important;

}

.hide { max-height: none !important; font-size: 11px !important; display: block !important; }

p {

text-align: left !important;

}

}

/]]>*/

</style>

</head>

<body>

<table class="table" border="0" cellspacing="0" cellpadding="0"

align="center" width="700">

<tr align="center">

<td class="spacer" align="left" valign="top" bgcolor="#FFFFFF"

width="20"><img src=

"https:/"

border="0" alt="" width="20" height="1" /></td>

<td align="left">

<table class="table3" width="648" border="0" cellpadding="0"

cellspacing="0" align="center" bgcolor="#FFFFFF">

<tr>

<td class="cell3" id="header" width="648" align="left" style=

"padding-bottom: 10px;"><img id="header" src=

"https:/"

border="0" alt="iTunes Connect" width="173" height="45" name=

"header" /></td>

</tr>

</table>

</body>

With reference to your second post, I took the code from there and put it in a small test project (Xcode 9.0, targeting iOS 11.0). I ran it and set a breakpoint on line 94. When I stopped in the debugger I used the console to print raw:
(lldb) po raw
"\n\n\n"

No weird occurrences of \u{ef}.

Thinking that DevForums might have munged your string I took your hex dump, used Hex Fiend to create a test.dat file from it, added that to my project, and then loaded it with this code:
let dataURL = Bundle.main.url(forResource: "test", withExtension: "dat")!
let encoded = try! Data(contentsOf: dataURL)

I repeated my earlier test and it produced the same results (three LFs, no weird characters).

Finally, I took test.dat and used Hex Fiend to search for C3 AF within it (that’s the UTF-8 of U+00EF). No hits.
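
The same byte search can also be done in code rather than in a hex editor. A minimal sketch, assuming the input is already loaded into a Data value; the sample bytes below are a made-up stand-in for test.dat, not the real file.

```swift
import Foundation

// UTF-8 encoding of U+00EF LATIN SMALL LETTER I WITH DIAERESIS: C3 AF.
let latinSmallIDiaeresis = Data("\u{ef}".utf8)

// Hypothetical stand-in for the contents of test.dat.
let encoded = Data("<body>\n</body>".utf8)

// range(of:) returns nil when the byte sequence does not occur in the data.
let hit = encoded.range(of: latinSmallIDiaeresis)
print(hit == nil ? "no C3 AF in input" : "found C3 AF")
```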

I recommend that you take some time to create a small focused test project that definitely illustrates the problem. Upload the project to a file sharing site, then post that URL along with specific instructions on how you ran it and the problem you saw.


I uploaded the Xcode project below.


github.com/mota-raven/NSAttributedStringTest


I embed the source string to reproduce this problem.

The source contains some URLs and I think that is the key.


Please open and execute the sample app in a simulator, and push "test" button to go.

You can see the "\u{ef}" in the Xcode GUI when you set a breakpoint, but "po raw" shows only "\n" for me as well.


I am trying to trim the "\n" characters, but because they sit between the \u{ef} characters ("\u{ef}\n\u{ef}"), I can't.

Accepted Answer

OK, I see what you’re seeing. Yay! Specifically, if I set a breakpoint on line 125 of ViewController.swift I see the \u{ef} in the GUI but not via po. I can see this junk using po if I render it to UTF-8 bytes:
(lldb) po raw.data(using: .utf8)! as NSData 
<0aefbfbc 0aefbfbc 0a>

The UTF-8 bytes EF BF BC correspond to U+FFFC OBJECT REPLACEMENT CHARACTER. Regardless of the following, you should definitely file a bug against Xcode’s GUI debugger because it’s clearly rendering this incorrectly. Consider this code snippet:

let raw = "xxx\u{fffc}yyy"
print(raw)

If you put it in a command line tool and set a breakpoint on line 2, the GUI debugger shows raw as xxx\u{ef}yyy, which is very wrong.
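
Since the debugger display can't be trusted here, one way to see what a string really contains is to list its Unicode scalars yourself. This is a sketch only; scalarDump is a hypothetical helper name, not an existing API.

```swift
// Hypothetical helper: lists each Unicode scalar in a string as U+XXXX.
func scalarDump(_ text: String) -> [String] {
    return text.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Left-pad to at least four hex digits, the usual U+ notation.
        let padded = String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        return "U+" + padded
    }
}

let raw = "\n\u{fffc}\n"
print(scalarDump(raw))   // ["U+000A", "U+FFFC", "U+000A"]
```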

Please post your bug number, just for the record.

As to where this U+FFFC is coming from, I’m not sure. I took a quick look at how NSMutableAttributedString parses HTML and decided that’s way too deep a rabbit hole for me to dive down in the context of DevForums.

On the other hand, getting rid of these characters is easy once you know what they are. Changing line 126 to this:

let stripped = attributedString.string.replacingOccurrences(of: "\u{fffc}", with: "", options: NSString.CompareOptions.literal, range:nil)

does the trick.
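
To also get the trimming the original question asked for, the replacement can be followed by a trim. A minimal sketch, assuming plain String input; cleaned is a hypothetical helper name. Note that trimmingCharacters(in:) only strips from the two ends of a string, which is why it could never remove a character sitting between the line feeds. U+FFFC is the character attributed strings conventionally use as the placeholder for attachments, so the img tags are a plausible source, but nothing in this thread confirms that.

```swift
import Foundation

// Hypothetical helper combining the U+FFFC removal with trimming.
func cleaned(_ text: String) -> String {
    // Drop every U+FFFC OBJECT REPLACEMENT CHARACTER, wherever it occurs.
    let withoutPlaceholders = text.replacingOccurrences(of: "\u{fffc}", with: "")
    // Then strip leading and trailing whitespace and newlines.
    return withoutPlaceholders.trimmingCharacters(in: .whitespacesAndNewlines)
}

let raw = "\n\u{fffc}\n\u{fffc}\n"
print(cleaned(raw).isEmpty)   // true
```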


I got the answer. Thank you so much!


I reported this as #35266044.

Another alternative is to use:
(lldb) p raw

(String) $R0 = "\n\u{ef}\n\u{ef}\n"


I find 'p' works better for a lot of Swift types. You have to break the muscle memory of Objective-C but on the other hand, you get to type one less character!

Thanks so much for the help!!
