Unexpected Insertion of U+2004 (Space) When Using UITextView with Pinyin Input on iOS 18

I encountered an issue with UITextView on iOS 18 where, when typing Pinyin, extra Unicode characters such as U+2004 are inserted unexpectedly. This occurs when using a Chinese input method.

Steps to Reproduce:

1. Set up a UITextView with a standard delegate implementation.
2. Use a Pinyin input method to type the character “ㄨ”.
3. Observe that after the character “ㄨ” is typed, extra spaces (U+2004) are inserted automatically between the characters.

Code Example:

class ViewController: UIViewController {
    @IBOutlet weak var textView: UITextView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
    }
}

extension ViewController: UITextViewDelegate {
    func textView(_ textView: UITextView, shouldChangeTextIn range: NSRange, replacementText text: String) -> Bool {
        print("shouldChangeTextIn: range \(range)")
        print("shouldChangeTextIn: replacementText \(text)")
        return true
    }

    func textViewDidChange(_ textView: UITextView) {
        let currentText = textView.text ?? ""
        let unicodeValues = currentText.unicodeScalars.map { String(format: "U+%04X", $0.value) }.joined(separator: " ")
       
        print("textViewDidChange: textView.text: \(currentText)")
        print("textViewDidChange: Unicode Scalars: \(unicodeValues)")
    }
}

Output:

shouldChangeTextIn: range {0, 0}
shouldChangeTextIn: replacementText ㄨ
textViewDidChange: textView.text: ㄨ
textViewDidChange: Unicode Scalars: U+3128
------------------------
shouldChangeTextIn: range {1, 0}
shouldChangeTextIn: replacementText ㄨ
textViewDidChange: textView.text: ㄨ ㄨ
textViewDidChange: Unicode Scalars: U+3128 U+2004 U+3128
------------------------
shouldChangeTextIn: range {3, 0}
shouldChangeTextIn: replacementText ㄨ
textViewDidChange: textView.text: ㄨ ㄨ ㄨ
textViewDidChange: Unicode Scalars: U+3128 U+2004 U+3128 U+2004 U+3128

This issue may affect text processing, especially in cases where precise text manipulation is required, such as calculating ranges in shouldChangeTextIn.

The screenshot you provided shows a Japanese Kana keyboard, not a Chinese Pinyin keyboard. Trying a Simplified Chinese Pinyin keyboard on my iPhone running iOS 18.2.1, I see something similar but not identical:

When typing "X" (capital) three times, I got:

textViewDidChange: textView.text: XXX
textViewDidChange: Unicode Scalars: U+0058 U+0058 U+0058

When typing "x" three times, I got:

textViewDidChange: textView.text: x x x
textViewDidChange: Unicode Scalars: U+0078 U+2006 U+0078 U+2006 U+0078

So there is indeed an extra U+2006 (six-per-em space).

This behavior seems reasonable to me, though, because the marked text "x" here implies a Chinese character, while "X" (capital) represents itself, as you can tell from the candidate window, and the extra U+2006 makes the difference clear.
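
When logging, it can help to pick out these spacing scalars explicitly instead of reading them off the full scalar dump. The following is a minimal sketch, not part of the original posts: the helper name `unusualSpaces(in:)` is hypothetical, and it assumes the inserted characters belong to the Unicode "space separator" category, as U+2004 and U+2006 do.

```swift
import Foundation

/// Hypothetical helper: lists every space-separator scalar in `text`
/// other than the ordinary ASCII space, formatted like the log output above.
func unusualSpaces(in text: String) -> [String] {
    text.unicodeScalars
        .filter { $0.properties.generalCategory == .spaceSeparator && $0.value != 0x20 }
        .map { String(format: "U+%04X", $0.value) }
}

// Marked text as reported in the thread for three lowercase "x" keystrokes.
print(unusualSpaces(in: "x\u{2006}x\u{2006}x"))
```

Calling this from textViewDidChange would show at a glance whether the input method has added spacing to the marked text.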

Regarding the following:

This issue may affect text processing, especially in cases where precise text manipulation is required, such as calculating ranges in shouldChangeTextIn

Assuming that your text manipulation operates on the confirmed text, you can use the following code to retrieve the marked text, and then remove it from textView.text to get the confirmed text:

if let markedTextRange = textView.markedTextRange {
    let markedText = textView.text(in: markedTextRange) ?? ""
    print("\(#function): range = \(markedTextRange), markedText = \(markedText)")
}
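
The snippet above only reads the marked text. The removal step could be sketched as follows, assuming the marked range has already been converted to an NSRange in UTF-16 units. The helper name `confirmedText(from:removing:)` is hypothetical, and the conversion shown in the comment is an assumption about how you would bridge from UITextRange in your own view code.

```swift
import Foundation

/// Hypothetical helper: returns `fullText` without the marked (uncommitted) text.
/// `markedRange` is expressed in UTF-16 units; pass nil when nothing is marked.
///
/// In a UITextView, one way to build the NSRange (assumption, not from the thread):
///   let location = textView.offset(from: textView.beginningOfDocument, to: marked.start)
///   let length = textView.offset(from: marked.start, to: marked.end)
func confirmedText(from fullText: String, removing markedRange: NSRange?) -> String {
    guard let markedRange = markedRange,
          let range = Range(markedRange, in: fullText) else {
        // No marked text, or the range does not fit the string: leave it unchanged.
        return fullText
    }
    var result = fullText
    result.removeSubrange(range)
    return result
}

// "Hi " is confirmed; "x", U+2006, "x" is marked text still being composed.
let sample = "Hi x\u{2006}x"
print(confirmedText(from: sample, removing: NSRange(location: 3, length: 3)))
```

Because the whole marked range is dropped, any spacing the input method adds inside it never reaches the calculation.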

Other than that, I am curious why the extra space in the marked text becomes an issue in your use case. If you don't mind explaining a bit more, I'll see if I can comment.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.

Thank you for your attention and any suggestions!

The issue arises because I calculate the position of mentions by predicting the number of characters added. When recalculating the mention positions, I rely on the shouldChangeTextIn range and replacementText to estimate the changes. However, the replacementText only gives me the character “ㄨ”, which leads me to believe that only one character is being added. In reality, it also includes an invisible U+2004, causing the calculation to fail due to an incorrect prediction.
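
To make the mismatch concrete, here is a minimal sketch using the values from the second keystroke in the log (range {1, 0}, replacementText “ㄨ”). The function names are hypothetical. The sketch compares the length change predicted from the delegate arguments with the change measured on the resulting text, counting UTF-16 units as NSRange does.

```swift
import Foundation

/// Length change predicted from the shouldChangeTextIn arguments alone.
func predictedDelta(range: NSRange, replacementText: String) -> Int {
    replacementText.utf16.count - range.length
}

/// Length change measured by comparing the text before and after the edit.
func actualDelta(before: String, after: String) -> Int {
    after.utf16.count - before.utf16.count
}

// Values taken from the log: one "ㄨ" before, "ㄨ U+2004 ㄨ" after.
let before = "\u{3128}"
let after = "\u{3128}\u{2004}\u{3128}"

let predicted = predictedDelta(range: NSRange(location: 1, length: 0),
                               replacementText: "\u{3128}")
let actual = actualDelta(before: before, after: after)
print("predicted: \(predicted), actual: \(actual)")
```

The prediction is off by the one U+2004 that the input method adds, which is consistent with the next shouldChangeTextIn call reporting range {3, 0} rather than {2, 0}.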

I am still not quite clear on what "position of mentions" means or how you calculate it. However, since markedTextRange gives you the range that contains the marked text (including the extra spaces, if any), as mentioned in my previous post, I'd guess you can still figure out the target text needed for your calculation from there.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.

@DTS Engineer Thank you for your response and clarification. My intention is to predict the text that will be added, but it seems that I cannot achieve this within shouldChangeTextIn at the moment.

I also looked into markedTextRange, but it doesn’t seem to work for my purpose, as it cannot tell me the text that is about to be added to the screen. Additionally, I was wondering if the replacementText in shouldChangeTextIn is supposed to include characters like U+2006 or U+2004.

I appreciate your assistance and time in helping me with this. Thank you again!

...I was wondering if the replacementText in shouldChangeTextIn is supposed to include characters like U+2006 or U+2004.

This is a question for Apple's Text engineering team (although the current behavior seems reasonable to me), so I’d suggest that you file a feedback report to get the team's comment. If you do so, please share your report ID here.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.
