I'm using Vision's OCR (VNRecognizeTextRequest) to detect text in images.
For our specific use case, we need to know the position of each individual letter, which we can get with recognizedText.boundingBox(for: idx1..<idx2) (where idx2 is the index right after idx1, i.e. a single-character range).
However, this result is only valid when the recognition level of the request is set to .fast. When it is set to .accurate, the bounding box returned for any letter is not the box of the letter itself, but the box of the whole word containing it.
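For context, here is a simplified sketch of how we iterate over the characters and query their boxes (the image loading and handler setup are omitted; the print is just illustrative):

```swift
import Vision

let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        guard let candidate = observation.topCandidates(1).first else { continue }
        let text = candidate.string
        var index = text.startIndex
        while index < text.endIndex {
            let next = text.index(after: index)
            // With .fast this is the box of the single character;
            // with .accurate it comes back as the box of the containing word.
            if let box = try? candidate.boundingBox(for: index..<next) {
                print(text[index..<next], box.boundingBox)
            }
            index = next
        }
    }
}
request.recognitionLevel = .fast   // per-letter boxes only behave as expected here
```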
Basically, this is the same problem as the one described here: https://developer.apple.com/forums/thread/131510
The issue is that we cannot use the .fast recognition level: the text may be tilted, and the letters are often hard to read due to poor contrast, so .fast produces unusable results for us.
Does anyone know:
- if there is a way to directly extract the bounding box of each letter from a VNRecognizedTextObservation with the .accurate setting?
- if an update or fix is planned for this issue, or if the Vision team is not tracking it? Is there a way to file a bug report on this so the team can look at it?
We really need this feature, so any information is welcome.
Thanks in advance for your answers.