Train a machine learning model to classify natural language text.
- Create ML
- Create MLUI
A text classifier is a machine learning model that’s been trained to recognize patterns in natural language text, like the sentiment expressed by a sentence.
You train a text classifier by showing it lots of examples of text you’ve already labeled—for example, movie reviews that you’ve already labeled as positive, negative, or neutral.
Import Your Data
Start by gathering textual data and importing it into an MLDataTable instance. You can create a data table from data in formats like JSON and CSV. For example, consider a JSON file containing movie reviews that you’ve categorized by sentiment, where each entry pairs a text key with a label key.
In a macOS playground, create the data table using the init(contentsOf:) method of MLDataTable.
The resulting data table has two columns, named text and label.
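The import step might look like the following minimal sketch. The sample reviews, the file name reviews.json, and the use of a temporary directory are assumptions made so the example is self-contained; in practice you would point the URL at your own file.

```swift
import CreateML
import Foundation

// Hypothetical sample reviews; each entry pairs a "text" key with a "label" key.
let sampleJSON = """
[
    { "text": "The movie was fantastic!", "label": "positive" },
    { "text": "Very boring. Fell asleep.", "label": "negative" },
    { "text": "It was just OK.", "label": "neutral" }
]
"""

// Write the sample to a temporary .json file so the sketch is self-contained.
let dataURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("reviews.json")
try sampleJSON.write(to: dataURL, atomically: true, encoding: .utf8)

// Build the data table from the JSON file.
let data = try MLDataTable(contentsOf: dataURL)

// Print the table to inspect its columns and rows.
print(data)
```

Printing the table is a quick way to confirm that the keys in the JSON file became the text and label columns.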
Prepare Your Data for Training and Evaluation
The data you use to train the model needs to be different from the data you use to evaluate the model. Use the randomSplit(by:seed:) method of MLDataTable to split your data into two tables, one for training and the other for testing. The training data table contains the majority of your data, while the testing data table contains the remaining 10–20%.
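As a sketch of the split, the example below assumes an 80/20 ratio and a seed of 5; both values are illustrative choices, and the in-memory reviews are hypothetical stand-ins for an imported data table.

```swift
import CreateML

// Hypothetical in-memory reviews standing in for an imported data table.
let columns: [String: MLDataValueConvertible] = [
    "text": ["The movie was fantastic!", "Very boring. Fell asleep.",
             "It was just OK.", "Loved every minute.", "A waste of time."],
    "label": ["positive", "negative", "neutral", "positive", "negative"]
]
let data = try MLDataTable(dictionary: columns)

// Keep roughly 80% of the rows for training; the rest become testing data.
// A fixed seed makes the split repeatable between runs.
let (trainingData, testingData) = data.randomSplit(by: 0.8, seed: 5)

print("Training rows: \(trainingData.rows.count)")
print("Testing rows: \(testingData.rows.count)")
```

Rows are assigned to each table at random, so the ratio is approximate, especially for a table this small.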
Create and Train the Text Classifier
Create an instance of MLTextClassifier with your training data table and the names of your columns. Training begins right away.
During training, Create ML sets aside a small percentage of the training data to validate the model’s progress. The validation data lets the training process gauge the model’s performance on examples it hasn’t been trained on. Because the split is done randomly, you might get a different result each time you train the model.
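A minimal sketch of the training step follows. The file path is a hypothetical placeholder, and the column names assume a table with text and label columns; the name sentimentClassifier is chosen for illustration only.

```swift
import CreateML
import Foundation

// Hypothetical path; point this at your own labeled JSON or CSV file.
let dataURL = URL(fileURLWithPath: "/path/to/reviews.json")
let data = try MLDataTable(contentsOf: dataURL)
let (trainingData, _) = data.randomSplit(by: 0.8, seed: 5)

// Training starts as soon as the classifier is created.
let sentimentClassifier = try MLTextClassifier(trainingData: trainingData,
                                               textColumn: "text",
                                               labelColumn: "label")

// classificationError is a fraction between 0 and 1; convert it to percent accuracy.
let trainingAccuracy =
    (1.0 - sentimentClassifier.trainingMetrics.classificationError) * 100
let validationAccuracy =
    (1.0 - sentimentClassifier.validationMetrics.classificationError) * 100

print("Training accuracy: \(trainingAccuracy)%")
print("Validation accuracy: \(validationAccuracy)%")
```

Comparing the two figures gives a rough sense of how well the model generalizes: a training accuracy far above the validation accuracy can suggest overfitting.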
Evaluate the Classifier’s Accuracy
Next, evaluate your trained model’s performance by testing it against sentences it’s never seen before. You do this by passing your testing data table to the evaluation(on:) method, which returns an MLClassifierMetrics instance.
If the evaluation performance isn’t good enough, you may need to retrain with more data or make other adjustments. For information about improving model performance, see Improving Your Model’s Accuracy.
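The evaluation step could be sketched as below, again assuming a hypothetical file path and a table with text and label columns. Newer SDKs also provide evaluation(on:textColumn:labelColumn:) and may flag the shorter form used here as deprecated, so treat the exact call as an assumption about your SDK version.

```swift
import CreateML
import Foundation

// Hypothetical path; point this at your own labeled JSON or CSV file.
let dataURL = URL(fileURLWithPath: "/path/to/reviews.json")
let data = try MLDataTable(contentsOf: dataURL)
let (trainingData, testingData) = data.randomSplit(by: 0.8, seed: 5)

// Train on the larger table only; the testing table stays unseen.
let sentimentClassifier = try MLTextClassifier(trainingData: trainingData,
                                               textColumn: "text",
                                               labelColumn: "label")

// Measure performance on the held-out testing data.
let evaluationMetrics = sentimentClassifier.evaluation(on: testingData)

// classificationError is a fraction between 0 and 1; convert it to percent accuracy.
let evaluationAccuracy = (1.0 - evaluationMetrics.classificationError) * 100
print("Evaluation accuracy: \(evaluationAccuracy)%")
```

An evaluation accuracy well below the training accuracy is one sign that the model needs more, or more varied, training data.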