Initializer

init(trainingData:tokenColumn:labelColumn:parameters:)

Creates a word tagger from a training set in a data table.

Declaration

init(trainingData: MLDataTable, tokenColumn: String, labelColumn: String, parameters: MLWordTagger.ModelParameters = ModelParameters(validationData: [])) throws

Parameters

trainingData

The tokens and their labels contained in an MLDataTable.

tokenColumn

The name of the column containing the tokens.

labelColumn

The name of the column containing the labels for the tokens.

parameters

The model parameters to use when creating the word tagger.

Discussion

You use this initializer to create a word tagger when your training data are in an MLDataTable. Typically, you’d create a data table by importing a JSON or CSV file, but you can also build a data table programmatically.

As an example, start with a JSON file that contains a sequence of objects at its root. Each object in that root sequence contains:

  • A sequence of token strings

  • A sequence of label strings, one for each token

// JSON file
[
  {
    "tokens": [
      "the",
      "quick",
      "brown",
      "fox",
      "jumped",
      "over",
      "the",
      "lazy",
      "dog"
    ]
    ,
    "labels": [
      "article",
      "adjective",
      "adjective",
      "noun",
      "verb",
      "preposition",
      "article",
      "adjective",
      "noun"
    ]
  }
]

Alternatively, start with a CSV file that has at least two columns: one for tokens and one for labels. Each entry in the token column must be an array of token strings. Each corresponding entry in the label column must be an array of label strings. The lengths of these two arrays must be the same, because the elements of one array correspond to the other. For example, the first token string, "the", corresponds to the first label string, "article", and so on.

// CSV File
tokens, labels
["the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"],["article", "adjective", "adjective", "noun", "verb", "preposition", "article", "adjective", "noun"]

Import the JSON or CSV file into a data table by passing the file’s URL to the initializer.

let file = Bundle.main.url(forResource: "FoxAndDog",
                           withExtension: "json")

let dataTable = try MLDataTable(contentsOf: file!)

Create the word tagger by passing in the data table to its initializer.

let wordTagger = try MLWordTagger(trainingData: dataTable,
                                  tokenColumn: "tokens",
                                  labelColumn: "labels")

See Also

Creating and Training a Word Tagger

init(trainingData: [(tokens: [MLWordTagger.Token], labels: [String])], parameters: MLWordTagger.ModelParameters)

Creates a word tagger from a training set in an array of tokens and their labels.

typealias MLWordTagger.Token

The token type of a word tagger, which is String.

struct MLWordTagger.ModelParameters

Parameters that affect the process of training a model.

let modelParameters: MLWordTagger.ModelParameters

The configuration parameters that the word tagger used for training during initialization.