New Vision API

Hey everyone,

I've been updating my code to take advantage of the new Vision API for text recognition in macOS 15. I'm noticing some very odd behavior, though: in general, the new Vision API consistently produces worse results than the old API. For reference, here is how I'm setting up my request:

import Vision

var request = RecognizeTextRequest()
request.recognitionLevel = getOCRMode()  // generally .accurate
request.usesLanguageCorrection = !disableLanguageCorrection  // generally true
request.recognitionLanguages = language.split(separator: ",").map { Locale.Language(identifier: String($0)) }  // generally "en"
let observations = try? await request.perform(on: image) as [RecognizedTextObservation]

Then I process the results and take just the top candidate, which, as mentioned above, is typically of worse quality than the result of the same request formed with the old API.
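The processing itself is minimal; it boils down to something like this (a simplified sketch, with observations coming from the request above):

// Take the single best candidate from each observation and join the lines.
let recognizedText = (observations ?? [])
    .compactMap { $0.topCandidates(1).first?.string }
    .joined(separator: "\n")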

Am I doing something wrong here?

I've just done a prototype app for recognising text from photos of product labels, using the new Vision API with Xcode 16.2 beta 2 and images from an iPad Pro (2020 M1 chip) and iPhone 15 Pro. The recognition is very accurate, even correctly recognising neatly handwritten label text.

These are the settings I'm using:

ocrRequest.recognitionLevel = .accurate
ocrRequest.usesLanguageCorrection = true
ocrRequest.automaticallyDetectsLanguage = true
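Putting it together, the whole request path is roughly this (a minimal sketch; cgImage stands in for however you get the photo into a CGImage):

import Vision

// Sketch: run the new RecognizeTextRequest on a CGImage and collect
// the top candidate string for each recognized line of text.
func recognizeText(in cgImage: CGImage) async throws -> [String] {
    var ocrRequest = RecognizeTextRequest()
    ocrRequest.recognitionLevel = .accurate
    ocrRequest.usesLanguageCorrection = true
    ocrRequest.automaticallyDetectsLanguage = true
    let observations = try await ocrRequest.perform(on: cgImage)
    return observations.compactMap { $0.topCandidates(1).first?.string }
}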

Some of the labels are very small (1 cm x 2 cm), but even then, with the iPhone camera at 2x or 3x zoom and in macro mode, the recognition is near perfect.

Regards, Michaela

Hello @ecdye,

Are you saying that, for the same input and request configurations, you are getting different outputs when using RecognizeTextRequest versus VNRecognizeTextRequest?

If so, could you provide a sample project that demonstrates this?

Best regards,

Greg

Hi @DTS Engineer,

Yes, that is correct. I've made a separate branch of the project that led me to discover this issue, which should help demonstrate the problem. It's a program I've been developing that uses the Vision API to perform OCR on bitmap subtitles. If you compile and run the program using

swift build
.build/debug/macSubtitleOCR --force-old-api --save-images --json Tests/Resources/sintel.sup oldAPI
.build/debug/macSubtitleOCR --save-images --json Tests/Resources/sintel.sup newAPI

This will output the results for the test file into the oldAPI and newAPI directories, respectively. If you compare the JSON results of the two runs and check the associated images in the image directory, you will see differences in the output, with the new API delivering worse results. This behavior is consistent across all of the test material I have used; it is not unique to this sample.
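For reference, the --force-old-api path configures the legacy request with the equivalent settings, roughly like this (a sketch of the idea, not the exact project code; cgImage stands in for the subtitle bitmap):

import Vision

// Sketch: the same OCR settings expressed with the legacy VNRecognizeTextRequest.
let request = VNRecognizeTextRequest { request, _ in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    let lines = observations.compactMap { $0.topCandidates(1).first?.string }
    print(lines.joined(separator: "\n"))
}
request.recognitionLevel = .accurate
request.usesLanguageCorrection = true
request.recognitionLanguages = ["en"]

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try handler.perform([request])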

I have noticed that results generally improve when the text is placed against a plain white background, so I have removed the code that does that, to make the differences between the two APIs clearer in this example.
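For context, the preprocessing I removed was along these lines (a hypothetical sketch of the idea, not the exact code from the project):

import CoreGraphics

// Sketch: flatten the subtitle bitmap onto a plain white background
// before handing it to the OCR request.
func flattenOntoWhite(_ image: CGImage) -> CGImage? {
    let rect = CGRect(x: 0, y: 0, width: image.width, height: image.height)
    guard let context = CGContext(
        data: nil,
        width: image.width,
        height: image.height,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }
    context.setFillColor(CGColor(red: 1, green: 1, blue: 1, alpha: 1))
    context.fill(rect)
    context.draw(image, in: rect)
    return context.makeImage()
}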

Please let me know if you have any further questions.

Thanks,

Ethan
