NSPredicate returns wrong result

NSPredicate(format: "SELF MATCHES %@", "^[0-9A-Z]+$").evaluate(with: "126𝒥ℰℬℬ𝒢𝒦𝒮33")

This returns true, and I don't know why. The characters 𝒥ℰℬℬ𝒢𝒦𝒮 are not in the ranges 0-9 or A-Z, so why does it return true? How can I avoid similar problems when using NSPredicate?

Answered by DTS Engineer in 826051022

According to this doc, NSPredicate uses ICU regular expressions. That can have some weird side effects, this being one of them.

Let’s simplify your code to this:

let didMatch = NSPredicate(format: "SELF MATCHES %@", "^J$").evaluate(with: "𝒥")
print(didMatch)     // -> true

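This matches because, internally, NSPredicate normalises the string. Specifically, it uses normal form KC, that is, compatibility decomposition followed by canonical composition. You can see the effect of the compatibility mapping with this code (the property shown yields form KD, but for this character the result is the same as form KC):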

print(("𝒥" as NSString).decomposedStringWithCompatibilityMapping)

which prints

J
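
To see how this affects the string from the original question, here is a small sketch. It assumes the predicate's internal normalisation behaves like NSString's precomposedStringWithCompatibilityMapping property (form KC); the variable names are only for illustration.

```swift
import Foundation

// The string from the original question.
let original = "126𝒥ℰℬℬ𝒢𝒦𝒮33"

// Form KC: compatibility decomposition followed by canonical composition.
// Assumption: this approximates what NSPredicate does before matching.
let normalised = (original as NSString).precomposedStringWithCompatibilityMapping
print(normalised)   // -> 126JEBBGKS33
```

Every script letter maps to its plain ASCII counterpart, which would explain why the pattern ^[0-9A-Z]+$ matches.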

Curiously, enabling case-insensitivity causes the match to fail:

let didMatch = NSPredicate(format: "SELF MATCHES[c] %@", "^J$").evaluate(with: "𝒥")
print(didMatch)     // -> false

The reasons for that are… complex |-:


As to what you do about this, it kinda depends on the context. If, for example, you’re only trying to match ASCII, one option would be to preflight the string for any non-ASCII characters.
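
A minimal sketch of that preflight, assuming an ASCII-only pattern like the one in the question. The function name matchesASCIIOnly is hypothetical, not part of any Apple API.

```swift
import Foundation

/// Hypothetical helper: evaluates a MATCHES predicate only if the
/// candidate string is pure ASCII, so normalisation cannot change it.
func matchesASCIIOnly(_ candidate: String, pattern: String) -> Bool {
    // Preflight: reject any string containing a non-ASCII scalar.
    guard candidate.unicodeScalars.allSatisfy({ $0.isASCII }) else {
        return false
    }
    return NSPredicate(format: "SELF MATCHES %@", pattern).evaluate(with: candidate)
}

print(matchesASCIIOnly("126ABC33", pattern: "^[0-9A-Z]+$"))        // -> true
print(matchesASCIIOnly("126𝒥ℰℬℬ𝒢𝒦𝒮33", pattern: "^[0-9A-Z]+$"))   // -> false
```

This only makes sense when non-ASCII input should always be rejected; if the pattern is meant to accept some non-ASCII text, a different check would be needed.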

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

It’s better to reply as a reply, rather than in the comments; see Quinn’s Top Ten DevForums Tips for this and other titbits.

Does this mean regex on iOS is very dangerous?

So, I misled you a bit here. Sorry about that. The behaviour you’re seeing is not implemented by NSRegularExpression, but rather by NSPredicate. Consider this example:

let re = try! NSRegularExpression(pattern: "^J$")
let s = "𝒥"
let m = re.matches(in: s, range: .init(s.startIndex..<s.endIndex, in: s))
print(m)    // -> []

Do you need to use a predicate here? NSPredicate is a very Objective-C thing. Are you using it with some subsystem that requires it, like Core Data? Or just because it’s generally convenient?
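
If a predicate is not required, one option is to run the original pattern through NSRegularExpression directly. This is only a sketch; the helper name fullyMatches is hypothetical, and it relies on the pattern itself being anchored with ^ and $.

```swift
import Foundation

/// Hypothetical helper: true if the pattern matches somewhere in the
/// candidate. With a ^...$ pattern that means the whole string.
func fullyMatches(_ candidate: String, pattern: String) throws -> Bool {
    let regex = try NSRegularExpression(pattern: pattern)
    // NSRegularExpression works in UTF-16 offsets, so convert the range.
    let range = NSRange(candidate.startIndex..<candidate.endIndex, in: candidate)
    return regex.firstMatch(in: candidate, options: [], range: range) != nil
}

do {
    print(try fullyMatches("126ABC33", pattern: "^[0-9A-Z]+$"))
    print(try fullyMatches("126𝒥ℰℬℬ𝒢𝒦𝒮33", pattern: "^[0-9A-Z]+$"))
} catch {
    print("Invalid pattern: \(error)")
}
```

Because the string is not normalised first, the script letters stay outside the A-Z range and the second call should not match.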