I'm building a custom machine learning model to extract fields from an invoice, so I need to feed the words and their bounding boxes into it. To get those, I tokenize the PDF page string and then search for each token with

    func findString(_ string: String, withOptions options: NSString.CompareOptions = []) -> [PDFSelection]

For the NSString.CompareOptions I pass .regularExpression, but I'm not getting any results.
Here's my code.
    // tokenize the page string and remove empty tokens
    var dummy = pdfString!.components(separatedBy: "\n").joined(separator: " ").components(separatedBy: " ").filter { $0 != "" }

    // loop over every token and search for it with its position
    var resultsFound = [[PDFSelection]]()
    for word in dummy {
        let pattern = "\\b" + word + "\\b"
        resultsFound.append(facturaPDF!.findString(pattern, withOptions: .regularExpression))
    }
    resultsFound.count

    // add the results to the page
    for results in resultsFound {
        for result in results {
            let highlight = PDFAnnotation(bounds: result.bounds(for: paginaPDF!), forType: .highlight, withProperties: nil)
            highlight.endLineStyle = .square
            highlight.color = UIColor.orange.withAlphaComponent(0.5)
            paginaPDF!.addAnnotation(highlight)
        }
    }

If anyone has any suggestions I'll be grateful 🙂
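
One thing that may be worth ruling out, separately from whether findString honors .regularExpression at all: the tokens are concatenated into the pattern unescaped, so a token such as $1,250.00 or (copy) would be read as regex syntax, and \b only matches next to a letter, digit or underscore, so it cannot anchor a token that starts or ends with punctuation. Below is a minimal sketch, assuming only Foundation, of how the pattern could be built and checked against a plain string with NSRegularExpression. The helper names tokenize, tokenPattern and matchRanges and the sample text are made up for illustration, and the whitespace lookarounds are a swapped-in replacement for \b. It says nothing about how PDFKit itself searches.

```swift
import Foundation

// Hypothetical helper: split text on any whitespace or newline and drop empty tokens.
func tokenize(_ text: String) -> [String] {
    return text.components(separatedBy: CharacterSet.whitespacesAndNewlines)
        .filter { !$0.isEmpty }
}

// Hypothetical helper: escape regex metacharacters in the token, then require
// whitespace (or the start/end of the text) on both sides instead of \b.
func tokenPattern(for token: String) -> String {
    let escaped = NSRegularExpression.escapedPattern(for: token)
    return "(?<!\\S)" + escaped + "(?!\\S)"
}

// Hypothetical helper: UTF-16 ranges of every whole-token match in `text`.
func matchRanges(of token: String, in text: String) -> [NSRange] {
    guard let regex = try? NSRegularExpression(pattern: tokenPattern(for: token)) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { $0.range }
}

// Made-up sample text standing in for a page string.
let sampleText = "Total: $1,250.00\nInvoice (copy) 125 1.5"
let tokens = tokenize(sampleText)
let hitCounts = tokens.map { matchRanges(of: $0, in: sampleText).count }
```

With the escaping in place, a token like 1.5 no longer matches 125, and tokens with leading or trailing punctuation can still be anchored. Whether findString accepts the same pattern is a separate question that this sketch does not answer.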