Creating a Text Classifier Model

Train a machine learning model to classify natural language text.

Overview

A text classifier is a machine learning model that’s been trained to recognize patterns in natural language text, like the sentiment expressed by a sentence.

Diagram showing how a text string maps to a label.

You train a text classifier by showing it lots of examples of text you’ve already labeled—for example, movie reviews that you’ve already labeled as positive, negative, or neutral.

Diagram showing how you train a text classifier with Create ML using training data.

Import Your Data

Start by gathering textual data and importing it into an MLDataTable instance. You can create a data table from data in formats like JSON and CSV. For example, consider a JSON file containing movie reviews that you’ve categorized by sentiment.

[
    {
        "text": "The movie was fantastic!",
        "label": "positive"
    }, {
        "text": "Very boring. Fell asleep.",
        "label": "negative"
    }, {
        "text": "It was just OK.",
        "label": "neutral"
    } ...
]

In a macOS playground, create the data table using the init(contentsOf:) initializer of MLDataTable.

import CreateML

let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "<#/path/to/read/data.json#>"))

The resulting data table has two columns, named text and label.
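For quick experiments, the same two-column table can also be built in memory rather than read from a file. The following is a minimal sketch that assumes the init(dictionary:) initializer of MLDataTable and reuses the three sample reviews from the JSON above; the names columns and sampleTable are illustrative only.

```swift
import CreateML

// Column name mapped to column values; every array must have the same length.
let columns: [String: MLDataValueConvertible] = [
    "text": ["The movie was fantastic!", "Very boring. Fell asleep.", "It was just OK."],
    "label": ["positive", "negative", "neutral"]
]

// Build the table in memory instead of reading it from disk.
let sampleTable = try MLDataTable(dictionary: columns)

// Check the shape and contents before training.
print(sampleTable.size)
print(sampleTable)
```

Three rows are far too few to train a useful classifier; a table like this is only handy for confirming that the column names match what you pass to the classifier later.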

Prepare Your Data for Training and Evaluation

The data you use to train the model needs to be different from the data you use to evaluate the model. Use the randomSplit(by:seed:) method of MLDataTable to split your data into two tables, one for training and the other for testing. The training table should hold the majority of your data, with the remaining 10–20% reserved for testing. In the following example, by: 0.8 assigns about 80% of the rows to training, and the fixed seed makes the split repeatable.

let (trainingData, testingData) = data.randomSplit(by: 0.8, seed: 5)
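To see why a fixed seed makes the split repeatable, consider the following plain-Swift sketch. It is an illustration of the idea only, not Create ML's implementation: SeededGenerator and split(_:by:seed:) are hypothetical helpers, and Create ML may divide rows differently (for example, without producing an exact 80/20 count).

```swift
// A small deterministic generator (SplitMix64) so that shuffling is reproducible.
struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Shuffle with the seeded generator, then cut the array at the requested fraction.
func split<T>(_ items: [T], by fraction: Double, seed: UInt64) -> (training: [T], testing: [T]) {
    var generator = SeededGenerator(seed: seed)
    let shuffled = items.shuffled(using: &generator)
    let cut = Int((Double(items.count) * fraction).rounded())
    return (Array(shuffled[..<cut]), Array(shuffled[cut...]))
}

let reviews = (1...10).map { "review \($0)" }
let firstSplit = split(reviews, by: 0.8, seed: 5)
let secondSplit = split(reviews, by: 0.8, seed: 5)

// The same seed yields the same partition on every run.
print(firstSplit.training == secondSplit.training)
print(firstSplit.training.count, firstSplit.testing.count)
```

Keeping the seed constant while you experiment means that changes in accuracy come from changes to the model or data, not from a different partition.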

Create and Train the Text Classifier

Create an instance of MLTextClassifier with your training data table and the names of your columns. Training begins right away.

let sentimentClassifier = try MLTextClassifier(trainingData: trainingData,
                                               textColumn: "text",
                                               labelColumn: "label")

During training, Create ML sets aside a small percentage of the training data to validate the model’s progress. The training and validation data both affect training, but in different ways: the model learns from the training data, while the validation data gauges how well the model performs on examples it hasn’t been trained on. Because the split is done randomly, you might get a different result each time you train the model.

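To see how accurately the model performed on the training and validation data, read the classificationError property of the model’s trainingMetrics and validationMetrics. The classification error is the fraction of examples the model labeled incorrectly, so subtracting it from 1 gives the accuracy.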

// Training accuracy as a percentage
let trainingAccuracy = (1.0 - sentimentClassifier.trainingMetrics.classificationError) * 100

// Validation accuracy as a percentage
let validationAccuracy = (1.0 - sentimentClassifier.validationMetrics.classificationError) * 100

Evaluate the Classifier’s Accuracy

Next, evaluate your trained model’s performance by testing it against sentences it’s never seen before. You do this by passing your testing data table to the evaluation(on:) method, which returns an MLClassifierMetrics instance.

let evaluationMetrics = sentimentClassifier.evaluation(on: testingData)

To get the evaluation accuracy, use the classificationError property of the returned MLClassifierMetrics instance.

// Evaluation accuracy as a percentage
let evaluationAccuracy = (1.0 - evaluationMetrics.classificationError) * 100
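The relationship between classification error and accuracy can be shown with a small plain-Swift sketch. The labels below are made-up placeholders rather than real model output, and classificationError(actual:predicted:) is a hypothetical helper that mirrors the definition (the fraction of examples labeled incorrectly), not Create ML's own code.

```swift
// Hypothetical ground-truth labels and predictions for four test reviews.
let actual    = ["positive", "negative", "neutral", "positive"]
let predicted = ["positive", "negative", "positive", "positive"]

// Classification error: the fraction of examples whose predicted label is wrong.
func classificationError(actual: [String], predicted: [String]) -> Double {
    precondition(actual.count == predicted.count && !actual.isEmpty,
                 "Both arrays need the same, nonzero length")
    let mistakes = zip(actual, predicted).filter { pair in pair.0 != pair.1 }.count
    return Double(mistakes) / Double(actual.count)
}

let errorRate = classificationError(actual: actual, predicted: predicted)

// Same conversion as above: accuracy as a percentage.
let accuracy = (1.0 - errorRate) * 100
print("error: \(errorRate), accuracy: \(accuracy)%")
```

Here one of the four predictions is wrong, so the error is 0.25 and the accuracy is 75%.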

If the evaluation performance isn’t good enough, you may need to retrain with more data or make other adjustments. For information about improving model performance, see Improving Your Model’s Accuracy.
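When deciding what to adjust, it can help to look at which examples the model gets wrong. The sketch below is one possible approach: misclassified(_:predict:) is a hypothetical helper that takes any text-to-label function, and alwaysPositive is a stand-in predictor used only so the example runs without a trained model.

```swift
// Runs `predict` over hand-labeled samples and returns the ones it gets wrong.
func misclassified(_ samples: [(text: String, label: String)],
                   predict: (String) throws -> String) rethrows
    -> [(text: String, expected: String, predicted: String)] {
    var failures: [(text: String, expected: String, predicted: String)] = []
    for sample in samples {
        let predicted = try predict(sample.text)
        if predicted != sample.label {
            failures.append((text: sample.text, expected: sample.label, predicted: predicted))
        }
    }
    return failures
}

// Stand-in predictor for illustration only; it always answers "positive".
let alwaysPositive: (String) -> String = { _ in "positive" }

let samples = [
    (text: "The movie was fantastic!", label: "positive"),
    (text: "Very boring. Fell asleep.", label: "negative")
]

// Collect the samples the stand-in predictor labels incorrectly.
let failures = misclassified(samples, predict: alwaysPositive)
print(failures.map { $0.text })
```

With a trained classifier, the predictor could instead be a closure that calls the model, for example { try sentimentClassifier.prediction(from: $0) }, assuming the prediction(from:) method of MLTextClassifier; the call to misclassified then needs try.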

Save the Core ML Model

When your model is performing well enough, you’re ready to save it so you can use it in your app. Use the write(to:metadata:) method to write the Core ML model file (.mlmodel) to disk. Provide any information about the model, like its author, version, or description, in an MLModelMetadata instance.

let metadata = MLModelMetadata(author: "John Appleseed",
                               shortDescription: "A model trained to classify movie review sentiment",
                               version: "1.0")

try sentimentClassifier.write(to: URL(fileURLWithPath: "<#/path/to/save/SentimentClassifier.mlmodel#>"),
                              metadata: metadata)

See Also

Natural Language

struct MLTextClassifier

A model you train to classify natural language text.

Beta
struct MLWordTagger

A model you train to classify natural language text at the word level.

Beta