Word Tagging Model: How to change tagging unit

I created a word tagging model in CreateML and am trying to make predictions with it using the following code:

    let text = "$30.00 7/1/2023"
    let model = TaggingModel()
    let input = TaggingModelInput(text: text)
    guard let output = try? model.prediction(input: input) else {
        fatalError("Unexpected runtime error.")
    }

However, the output splits "$" and "30.00" into separate tokens, and does the same with "7", "/", "1", "/", etc. Is there any way to keep prices and dates grouped together and simply split tokens on whitespace? Any help is appreciated!

Replies

One approach that could resolve your issue is to format your input data as a DataFrame using the TabularData framework. That lets you tokenize the training data however you like (on whitespace, in your case).
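As a rough sketch of the whitespace tokenization step, something like the following could prepare one training example. The `TaggedExample` type, the `makeExample` helper, and the "PRICE"/"DATE" labels are made up for illustration; the resulting token and label arrays are what you would then place into the token and label columns of your DataFrame.

```swift
// Split on whitespace only, so "$30.00" and "7/1/2023" each stay one token.
func whitespaceTokens(_ text: String) -> [String] {
    text.split(whereSeparator: { $0.isWhitespace }).map(String.init)
}

// Hypothetical container for one training example: one label per token.
struct TaggedExample {
    let tokens: [String]
    let labels: [String]
}

// Builds an example, returning nil when the number of labels does not
// match the number of whitespace-separated tokens.
func makeExample(text: String, labels: [String]) -> TaggedExample? {
    let tokens = whitespaceTokens(text)
    guard tokens.count == labels.count else { return nil }
    return TaggedExample(tokens: tokens, labels: labels)
}

// Illustrative labels only; use whatever label set your model was trained on.
let example = makeExample(text: "$30.00 7/1/2023", labels: ["PRICE", "DATE"])
```

Checking the token and label counts up front should help catch rows where the labeling was done against a different tokenization than the one used here.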

On inference (which seems the more pointed part of your question), the Create ML framework offers multiple public APIs to run predictions from a model on a specified input. func prediction(from: String) throws -> [String] will use spaces as delimiters by default, but you can also use public func prediction(from tokens: [Token]) throws -> [String] and split the input sequence into tokens however you like.
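To illustrate the token-based route, here is a minimal sketch that splits the input on whitespace and passes the tokens to a predictor. The `tagWhitespaceTokens` helper and the `dummy` predictor are hypothetical; in practice you would presumably pass your trained tagger's token-based prediction method in place of `dummy`, which is an assumption not verified here.

```swift
// Tokenize on whitespace, then hand the tokens to any token-based predictor.
// `predict` is a stand-in for a real model call (assumption: it returns
// exactly one label per input token, in order).
func tagWhitespaceTokens(
    _ text: String,
    using predict: ([String]) throws -> [String]
) rethrows -> [(token: String, label: String)] {
    let tokens = text.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    let labels = try predict(tokens)
    // Pair each token with its predicted label.
    return zip(tokens, labels).map { (token: $0.0, label: $0.1) }
}

// Dummy predictor for illustration only; it is not a trained model.
let dummy: ([String]) -> [String] = { tokens in
    tokens.map { $0.hasPrefix("$") ? "PRICE" : "OTHER" }
}

let tagged = tagWhitespaceTokens("$30.00 7/1/2023", using: dummy)
for pair in tagged {
    print(pair.token, pair.label)
}
```

Because you control the tokenization, "$30.00" and "7/1/2023" each reach the predictor as a single token. The model should be trained on data tokenized the same way, otherwise the labels may not line up with what it learned.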