[VisionKit Text Recognition] boundingBox(for:) returns wrong results when used with .accurate recognition level

I'm using the Vision OCR (with VNRecognizeTextRequest) to detect text on images.

For our specific use-case, we need to know the position of each of the letters, and we can do this with the function: recognizedText.boundingBox(for: (idx1..<idx2)) (where idx2 = idx1 + 1).

However, this result is only valid when the request's recognition level is set to .fast. When it is set to .accurate, the bounding box returned for any letter is not the box of the letter itself, but the box of the whole word containing that letter.
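For reference, here is a minimal sketch of the kind of code involved. The function name characterBoxes(in:level:) is only illustrative, and it assumes a recent SDK where request.results is already typed as [VNRecognizedTextObservation]. It asks for one bounding box per character of the top candidate, so the .fast and .accurate outputs can be compared on the same image.

```swift
import Vision
import CoreGraphics

/// Illustrative helper (hypothetical name): returns one (character, box) pair
/// per character of the top candidate of each text observation.
/// Boxes are in Vision's normalized image coordinates (0...1, origin bottom-left).
func characterBoxes(in image: CGImage,
                    level: VNRequestTextRecognitionLevel) throws -> [(Character, CGRect)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = level

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    var result: [(Character, CGRect)] = []
    // Assumption: a recent SDK where `results` is [VNRecognizedTextObservation]?.
    for observation in request.results ?? [] {
        guard let candidate = observation.topCandidates(1).first else { continue }
        let text = candidate.string
        for index in text.indices {
            // One-character range, i.e. idx1..<idx2 with idx2 = idx1 + 1.
            let range = index..<text.index(after: index)
            // boundingBox(for:) can throw and can also return nil.
            if let box = try candidate.boundingBox(for: range) {
                result.append((text[index], box.boundingBox))
            }
        }
    }
    return result
}
```

Calling this once with .fast and once with .accurate on the same CGImage shows the difference described above: with .accurate, all characters of a word come back with the same rectangle.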

Basically, this is the same problem as the one described here: https://developer.apple.com/forums/thread/131510

The issue is that we cannot use the .fast recognition level: the text might be tilted, and the letters are often hard to read because of poor contrast, so the .fast setting produces unusable results.

Does anyone know:

  • if there is a way to directly extract the bounding box of each letter from the VNRecognizedTextObservation with the .accurate setting?
  • if there is an update or feature change planned for this issue, or if the Vision dev team doesn't care about it? Is there even a way to request a bug fix from the dev team?

We really do need this feature, so any information is welcome.

Thanks in advance for your answers.

Accepted Reply

Hello,

The other forums post that you linked to says:

I decided to get in touch with Apple Developer Tech Support. Sure enough, this is a bug! 

That is not quite correct. The “accurate” recognitionLevel looks at whole words (as opposed to a traditional character-by-character OCR approach), and therefore does not have per-character bounding boxes. This is the expected behavior. Please file an enhancement request for per-character bounding boxes when using the “accurate” recognitionLevel.
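To check at runtime whether the per-character boxes you got back are really just the word box, a small comparison along these lines might help. This is only a sketch: the helper name boxesCollapseToOneRect and the tolerance value are hypothetical and not part of Vision.

```swift
import Foundation  // brings in CGRect and CGFloat

/// Hypothetical helper: returns true when all boxes are (nearly) the same
/// rectangle, which is what word-level results look like when queried
/// one character at a time. A single box is not enough to decide.
func boxesCollapseToOneRect(_ boxes: [CGRect], tolerance: CGFloat = 0.001) -> Bool {
    guard let first = boxes.first, boxes.count > 1 else { return false }
    return boxes.dropFirst().allSatisfy { box in
        abs(box.minX - first.minX) <= tolerance &&
        abs(box.minY - first.minY) <= tolerance &&
        abs(box.width - first.width) <= tolerance &&
        abs(box.height - first.height) <= tolerance
    }
}
```

Passing it the boxes returned by boundingBox(for:) for the characters of a single word should give true with “accurate” and, typically, false with “fast”.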

if there is an update or feature change planned for this issue, or if the Vision dev team doesn't care about it? Is there even a way to request a bug fix from the dev team?

We do not comment on future plans, but we would certainly appreciate your enhancement request, which you can submit through Feedback Assistant.

  • Thanks a lot for this clarification.

    I do understand the intention behind the feature (word-level recognition as opposed to character-by-character), but I must confess that, to me, it doesn't match the documentation, which says "boundingBox(for: stringRange) - Calculates the bounding box around the characters in the range of the string." The text, the identifier, and the types are misleading: if I give a string range with character-level granularity, I expect character-level precision in the results.

    Maybe someone could update the documentation and/or add a paragraph about this topic to the "Discussion" section?

    I filed an enhancement request through Feedback Assistant, thanks for the advice. Do you happen to know the time frame for such a feature to come out of the Feature Enhancement team? (A simple estimate like "within weeks / within months / at least a year" is fine.)

    Thanks for your answers.

  • Hey BigGui,

    Thank you for filing the enhancement request! The appropriate engineering team will evaluate it. I recommend following up through your bug report for any additional info.

    I agree that the documentation for "boundingBox(for: stringRange)" should mention the differences depending on the recognitionLevel. Please file a bug report against the documentation and share the FB number with me.
