I'm trying to train a text classifier model in Create ML. The Create ML app/framework offers five algorithms. I can successfully train the model with all of the algorithms except the BERT transfer learning option. When I select this algorithm, Create ML simply stops the training process immediately after the initial feature extraction phase (with no reported error).
What I've tried:
I tried simplifying the dataset to just a few classes and short examples in case there was a problem with the data.
I tried experimenting with the number of iterations and language/script options.
I checked Console.app for logged errors and found the following for the Create ML app:
error 10:38:28.385778+0000 Create ML Couldn't read event column - category is invalid. Format string is : <private>
error 10:38:30.902724+0000 Create ML Could not encode the entity <private>. Error: <private>
I'm not sure if these errors are normal or indicative of a problem. I don't know what it means by the "event" column – I don't have an event column in my data and I don't believe there should be one. These errors are not reported when using the other algorithms.
Given that I couldn't get the app to work with BERT, I switched over to the CreateML framework and followed the code samples given in the documentation. (By the way, there's an error in the docs: the line let (trainingData, testingData) = data.stratifiedSplit(on: "text", by: 0.8) should be stratifying on "label", not on "text").
The main chunk of code looks like this:
var parameters = MLTextClassifier.ModelParameters(
validation: .split(strategy: .automatic),
algorithm: .transferLearning(.bertEmbedding, revision: 1),
language: .english
)
parameters.maxIterations = 100
let sentimentClassifier = try MLTextClassifier(
trainingData: trainingData,
textColumn: "text",
labelColumn: "label",
parameters: parameters
)
Ultimately I want to train a single multilingual model, and I believe that BERT is the best choice for this. The problem is that there doesn't seem to be a way to choose the multilingual Latin script option in the API. In the Create ML app you can theoretically do this by selecting the Latin script with language set to "Automatic", as recommended in this WWDC video (relevant section starts at around 8:02). But, as far as I can tell, ModelParameters only lets you pick a specific language.
I presume the framework must provide some way to do this, since the Create ML app uses the framework under the hood, but I can't see a way to do it. Another possibility is that the Create ML app might be misrepresenting the framework – perhaps selecting a specific language in the app doesn't actually make any difference – for example, maybe all Latin languages actually use the same model under the hood and the language selector is just there to guide people to the right choice (but this is just my speculation).
Any help would be much appreciated! If possible, I'd prefer to use the Create ML app if I can get the BERT option to work – is this actually working for anyone? Or failing that, I want to use the framework to train a multilingual Latin model with BERT, so I'm looking for instructions on how to choose that specific option or confirmation that I can just choose .english to get the correct Latin multilingual model.
I'm running Xcode 26.2 on Tahoe 21.1 on an M1 Pro MacBook Pro. I have version 6.2 of the Create ML app.
Topic:
Machine Learning & AI
SubTopic:
Create ML