I'm using Vision's OCR (VNRecognizeTextRequest) to detect text in images.
For our specific use case, we need to know the position of each individual letter, which we can get with recognizedText.boundingBox(for: idx1..<idx2) (where idx2 is the index right after idx1, i.e. a single-character range).
However, this result is only valid when the recognition level of the request is set to .fast. When it is set to .accurate, the bounding box returned for any letter is not the box of the letter itself, but the box of the whole word containing it.
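For context, here is a simplified sketch of how we iterate over the characters and query their boxes (the image loading and handler setup are omitted; the print is just illustrative):

```swift
import Vision

let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        guard let candidate = observation.topCandidates(1).first else { continue }
        let text = candidate.string
        var index = text.startIndex
        while index < text.endIndex {
            let next = text.index(after: index)
            // With .fast this is the box of the single character;
            // with .accurate it comes back as the box of the containing word.
            if let box = try? candidate.boundingBox(for: index..<next) {
                print(text[index..<next], box.boundingBox)
            }
            index = next
        }
    }
}
request.recognitionLevel = .fast   // per-letter boxes only behave as expected here
```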
Basically, this is the same problem as the one described here: https://developer.apple.com/forums/thread/131510
The issue is that we cannot use the .fast recognition level: the text may be tilted, and the letters are often hard to read due to poor contrast, so .fast produces unusable results for us.
Does anyone know:
- if there is a way to directly extract the bounding box of each letter from a VNRecognizedTextObservation with the .accurate setting?
- if an update or fix is planned for this issue, or if the Vision team is not tracking it? Is there a way to file a bug report on this so the team can look at it?
We really need this feature, so any information is welcome.
Thanks in advance for your answers.