Internationalization, using escape chars with right-to-left languages

Question

Created Mar ’17

Does anyone know the correct format for escape characters in right-to-left languages when using iOS's internationalization feature?

A newline in left-to-right languages is \n (backslash n). Does iOS's internationalization interpreter want to see \n (backslash n) or n\ (n backslash) in the Localizable.strings file for a newline character? I am adding Hebrew, along with a handful of left-to-right languages, to an app and haven't found the correct syntax for this case. So if any of those fine Hebrew-speaking iOS programmers can help out, it'd be appreciated!

In addition, an English-to-Hebrew translation of some text yielded a double quote inside the translated Hebrew. When the Hebrew text was dropped into an entry in the Hebrew (he) Localizable.strings file, Xcode parsed that double quote as the end of the entry's string data. The escaping backslash appeared to have to go after the double quote character, because it didn't escape the double quote in the Hebrew text when it came before, as it does in left-to-right languages.

Empirically, it seems that the position of the backslash relative to the escaped character in Xcode depends on the direction of the language's text. It would certainly be nice if Apple would stand up and give a definitive response, so as not to cause a bug further down the road, because Xcode is acting as the interpreter by providing the internationalization function. Literally and figuratively.

Answer 1

DTS Engineer OP

Apple

Mar ’17

Does iOS's internationalization interpreter want to see a \n (slash n) or a n\ (n slash) in the Localizable.strings file when using newline char?

The former.

When the Hebrew text was dropped into a tag in the (he(ebrew) language) Localizable.strings file, xcode parsed the first double quote as an end to that tags' string data.

I don’t understand what you’re getting at here. Can you post a concrete example?

Empirically it seems that the relationship of the slash in Xcode to the escaped char is dependant on the direction of the depicted spoken languages' text.

That’s not my understanding of it; in general Unicode stores (and thus Xcode’s .strings file parser parses) text in the order in which it is to be interpreted. This can be confusing when dealing with directionality boundaries, as a weak or neutral character can be associated with either side of the boundary. I suspect that’s what’s going on with your second issue but it’s hard to be sure without more detail.

Share and Enjoy
—
Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Answer 2

Peter- OP

Mar ’17

Quinn, thank you for your comments. A quick example: an entry in the Localizable.strings file located in the en.lproj subdirectory has a string like the following:

Speak the email address

In Xcode the file is referred to as Localizable.strings (English). For the sake of clarity I'm leaving out the leading and trailing double quotes that are used to define an NSString string.

Because the project has been localized and includes Hebrew, there is another Localizable.strings file located in the he.lproj subdirectory. That file is referred to in Xcode as Localizable.strings (Hebrew).

Because I don't speak Hebrew, I use Google Translate to convert the text Speak the email address to Hebrew. On Google's conversion page there is a Copy button that sticks the translated text into the sys's copy buffer. The text, fortunately, is also formatted, and one bit of the formatting data is a reference to the fact that the text is from a right-to-left language. The result of the Google translation is:

אומר את כתובת הדוא"ל

Notice the second char from the left, the double quote. So when you paste the buffer into the Localizable.strings (Hebrew) file, that double quote needs to be escaped.

My question is this: within a string of right-to-left characters, it seems, empirically speaking, that the escaping backslash needs to go to the right of the character. <newline> and <linefeed> also seem to need the escaping backslash on the right, e.g.: n\ and \r

What I'd rather find out ahead of time is the correct use and placement of an escaping character within right-to-left text in a mixed right-to-left and left-to-right programming environment, such as Apple's localization feature.

Answer 3

Peter- OP

Mar ’17

Eskimo dude, I just realized you're an Apple employee! Okay . . . looks like I, as well as the whole Apple developer community, am going to get a solid answer. I didn't realize you probably know the file layout within Xcode like the back of your hand. I wasn't "talking down" to you, just wanted to be exceptionally clear. One final note is that I'm also using <newline> \n and <tab> \t chars in the "Localizable" text being displayed in MainStoryboard labels.

Separately, Quinn, because you responded to this localization issue: is there a way to use Xcode's localization feature to upload localized app descriptions to the App Store? I mean the "app description" text you have to type in by hand when uploading a new or updated app for release. From what I've been able to figure out, it seems we have to use multiple iTunes Connect web interface sessions to provide the app description text the App Store needs for each localized language. That is a huge waste of time after separately localizing a dozen languages within Xcode. So, because Xcode is Apple's chosen method of uploading an app and its associated metadata to the App Store, and Xcode does provide app localization capabilities: is there a feature in Xcode that "localizes" all the text strings requested by the iTunes Connect web interface upload session, and also incorporates them into the upload bundle? If there is, I'll open a separate discussion thread.

Thanks in advance Quinn!

Answer 4

DTS Engineer OP

Apple

Mar ’17

Accepted Answer

On Google's conversion page there is a Copy button that sticks the translated text into the sys's copy buffer.

OK, let’s start with that string. If you paste that into a standalone document (UTF-16 big endian, just to make things easier to read) you’ll see this:

$ hexdump -Cv standalone.txt
00000000  05 d0 05 d5 05 de 05 e8  00 20 05 d0 05 ea 00 20  |......... ..... |
00000010  05 db 05 ea 05 d5 05 d1  05 ea 00 20 05 d4 05 d3  |........... ....|
00000020  05 d5 05 d0 00 22 05 dc                          |....."..|
00000028

The last three code points are the ones we care about, namely:

U+05D0 HEBREW LETTER ALEF
U+0022 QUOTATION MARK
U+05DC HEBREW LETTER LAMED
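
The dump and the code point names above can be cross-checked outside Xcode. What follows is a minimal Python sketch (Python is used purely for illustration and is not part of the original investigation; the variable names are made up) that encodes the translated string as UTF-16 big endian and names its last three code points.

```python
import unicodedata

# The translated string from the thread, written in storage (logical) order.
text = (
    "\u05d0\u05d5\u05de\u05e8 "              # first word
    "\u05d0\u05ea "                          # second word
    "\u05db\u05ea\u05d5\u05d1\u05ea "        # third word
    "\u05d4\u05d3\u05d5\u05d0\"\u05dc"       # last word, containing U+0022
)

# Encode as UTF-16 big endian without a byte-order mark, like the hex dump.
data = text.encode("utf-16-be")
print(len(data))             # total number of bytes
print(data[-6:].hex(" "))    # the last three 16-bit code units

# Name the last three code points.
last_three = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text[-3:]]
for line in last_three:
    print(line)
```

The byte count should match the final offset of the dump (0x28, that is 40 bytes), and the last six bytes should match the end of its third row.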

btw The real problem here seems to be with your translation. The U+0022 is a compatibility thing, and the actual character to use would be U+05F4 HEBREW PUNCTUATION GERSHAYIM. If I change the doc to use that, the escaping issue becomes irrelevant and the text looks a lot nicer. However, your overall question is still valid — you may eventually encounter some right-to-left text that actually needs escaping — so let’s continue.
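
To illustrate the U+05F4 suggestion, here is a small Python sketch (an illustration only, not something from the original post) that swaps the ASCII quotation mark for HEBREW PUNCTUATION GERSHAYIM and checks that no string delimiter is left in the text. Replacing every U+0022 is an assumption that only holds when each one really is an abbreviation mark.

```python
import unicodedata

GERSHAYIM = "\u05f4"

# Last word of the translation in storage order, with U+0022 before the lamed.
word = "\u05d4\u05d3\u05d5\u05d0\"\u05dc"

# Assumption: every quotation mark in this text stands in for a gershayim.
fixed = word.replace('"', GERSHAYIM)

print(unicodedata.name(GERSHAYIM))
print('"' in fixed)   # False: nothing left that would need escaping
```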

I then pasted that text into the Hebrew variant of a Localizable.strings file. Here’s a hex dump of that (again in UTF-16 big endian):

$ hexdump -Cv he.lproj/Localizable.strings
…
000000f0          …        00 22  00 53 00 70 00 65 00 61    …    ".S.p.e.a|
00000100  00 6b 00 20 00 74 00 68  00 65 00 20 00 65 00 6d  |.k. .t.h.e. .e.m|
00000110  00 61 00 69 00 6c 00 20  00 61 00 64 00 64 00 72  |.a.i.l. .a.d.d.r|
00000120  00 65 00 73 00 73 00 22  00 20 00 3d 00 20 00 22  |.e.s.s.". .=. ."|
00000130  05 d0 05 d5 05 de 05 e8  00 20 05 d0 05 ea 00 20  |......... ..... |
00000140  05 db 05 ea 05 d5 05 d1  05 ea 00 20 05 d4 05 d3  |........... ....|
00000150  05 d5 05 d0 00 22 05 dc  00 22 00 3b 00 0a        |....."...".;..|
0000015e

So Xcode’s editor hasn’t changed things; the sequence we’re focusing on, U+05D0 U+0022 U+05DC, is intact. However, the U+0022 in the middle still needs escaping.

As I mentioned earlier, Xcode interprets escape characters in Unicode storage order, so the '\' (U+005C REVERSE SOLIDUS) needs to go before the U+0022. I inserted this with my favourite hex editor and that makes Xcode happy.
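
The hex-editor step amounts to inserting U+005C immediately before U+0022 in storage order. A minimal Python sketch of that idea follows; `escape_strings_value` is a hypothetical helper name, and it only handles the backslash and the double quote, not every escape a .strings file supports.

```python
def escape_strings_value(value: str) -> str:
    """Escape backslashes and double quotes for a .strings value (sketch)."""
    out = []
    for ch in value:               # iterates in storage (logical) order
        if ch in ("\\", '"'):
            out.append("\\")       # the backslash is stored BEFORE the character
        out.append(ch)
    return "".join(out)


# Last word of the translation, in storage order.
word = "\u05d4\u05d3\u05d5\u05d0\"\u05dc"
escaped = escape_strings_value(word)

# The four code points around the quote, in storage order.
code_points = [f"U+{ord(ch):04X}" for ch in escaped[-4:]]
print(code_points)
```

Because the loop never looks at how the text is displayed, the same helper works unchanged for left-to-right and right-to-left strings.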

What’s interesting is that, when I open the file in Xcode, it renders the '\' to the right of the quote (which is the key point of confusion on your part). This happens due to Unicode’s BiDi algorithm. The Hebrew characters in play (U+05D0 and U+05DC) are both strongly right-to-left (BiDi class R, per Table 4 of UAX #9). OTOH, the punctuation characters (U+0022 and U+005C) are ‘neutrals’ (specifically BiDi class ON), which means they take on the directionality of their surroundings, which is why Xcode is showing them right-to-left in this context.
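
The BiDi classes of the four characters can be looked up directly. This short Python sketch (an illustration, assuming the standard `unicodedata` module, which reports the Bidi_Class property from the Unicode Character Database) prints the class of each one.

```python
import unicodedata

# The two Hebrew letters plus the two punctuation characters discussed above.
chars = ["\u05d0", "\u05dc", '"', "\\"]

classes = {f"U+{ord(ch):04X}": unicodedata.bidirectional(ch) for ch in chars}
for code_point, bidi_class in classes.items():
    print(code_point, bidi_class)
```

The letters should report a strong right-to-left class and the punctuation a neutral one, which is why the backslash and quote follow the direction of the Hebrew around them on screen.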

So, to reiterate:

Unicode text is stored in the order in which it is to be interpreted
Xcode’s localisation support interprets escape characters in that storage order
Hebrew characters are rendered right-to-left
Punctuation characters are neutral, so rendered right-to-left when they appear inside right-to-left text
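
The first two points can be sketched as a toy parser. This is an illustration in Python, not Xcode's actual .strings parser; `unescape` and `ESCAPES` are hypothetical names, and only four escapes are recognised. It walks the text in storage order, so a backslash always applies to the code point stored right after it, whatever the line looks like on screen.

```python
# Escapes understood by this sketch (a real .strings parser supports more).
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(raw: str) -> str:
    """Resolve backslash escapes by walking the text in storage order."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            out.append(ESCAPES.get(nxt, nxt))  # backslash applies to what FOLLOWS it
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Stored as: alef, backslash, quote, lamed, backslash, n, alef
raw = "\u05d0\\\"\u05dc\\n\u05d0"
result = unescape(raw)
print([f"U+{ord(ch):04X}" for ch in result])

# "n" followed by a backslash is not a newline: nothing follows the backslash.
print(unescape("n\\") == "n\\")
```

The same rule covers \n and \t inside Hebrew text: in the file, the backslash is stored first and the letter second.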

I suspect that all of this would be eminently routine to someone who is used to working with right-to-left text, so if you gave this file to a Hebrew localiser they wouldn’t be fazed at all (-:

Is there a feature in Xcode that "localizes" all the text strings requested by the iTunes Connect web interface upload session.

Not within Xcode. However, there is a way to automate the process of updating all of your app’s metadata, namely iTunes Connect Transporter. For a high-level intro, watch WWDC 2015 Session 304 iTunes Connect: Development to Distribution.

Share and Enjoy
—
Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Answer 5

Peter- OP

Mar ’17

Quite an explanation Quinn . . . on every front . . . depth of detail . . . use of examples . . . perseverance . . . projection of concept . . . man, if I wasn't riding in a car when I read your response I would have jumped for joy at what you brought to the table! Yours is one of the most significant explanations I've ever received from Apple! So Quinn, as my Aussie friends say . . . Good on ya!

And with respect to my two Indian friends you mentioned, I thought I'd never be impacted by their fundamental, their basic, disagreement ever again. Especially after Apple diss'd the Big-Indian in favor of the puny one . . . or shall I use the more politically correct term . . . "Little-Indian". But I must say, Quinn, I find it honorable that someone from the Eskimo nation worked into a phenomenal answer to a complex technical question support for fellow Native Americans and their opposition to DAPL. (I too, with all due respect, stand in support of Big Indians. Especially those who oppose DAPL, for without Big Indians we wouldn't be able to communicate at all.)

I'll just end by acknowledging your reference to BiDi-versity. I was going to mention that in my initial outline of the issue, but I didn't want to overwhelm any potential response. But your inclusion of BiDi (especially in this age where we see so many hard-fought gains by those previously marginalized taken by . . . well anyway) Well, again, your inclusion of BiDi in your excellent response brought you, and fellow maven of the day Professor Robert Kelly, into my dinnertime conversation. No small feat, Quinn. Thank you again for your exceptional explanation! Keep up the excellent work!

Best,

Peter
