Creating Data Tables for Training and Evaluation

Import and format data to create and evaluate a machine learning model.


Use a table to store your training data when you want to use a standard JSON or CSV file to create a model, or when you want to train a model from custom types you've created in your own code. Each row contains an example of the data you're training the model to classify.

At the point in your code where you use the data to train a model, you select a column to be the target of the machine learning model's predictions. The remaining columns contain the features you provide to the model to make a prediction. Figure 1 illustrates how to structure training data for a classifier that will predict a book’s genre.

Figure 1

Structured training data for a classifier that will predict a book’s genre

A table of information about a book. Columns named "title", "author", "number of pages", and "genre". The columns are all labeled as "Features", and the genre column is highlighted and additionally labeled as a "Target". Each row in the table is labeled as an "Observation".

When you create an MLDataTable from a CSV or JSON file, the Create ML framework translates the structure of your input data directly into tabular data.
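
The target selection described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: it assumes a hypothetical "books.json" file with enough rows to train on, uses MLClassifier as one possible model type, and picks the 80/20 split and seed arbitrarily.

```swift
import CreateML
import Foundation

// Assumption: "books.json" is a hypothetical file in the working directory
// with "title", "author", "pageCount", and "genre" keys in every object.
let bookTable = try MLDataTable(contentsOf: URL(fileURLWithPath: "books.json"))

// Hold back part of the rows so the model can be evaluated on unseen data.
// The 0.8 proportion and the seed are arbitrary choices for this sketch.
let (trainingData, testingData) = bookTable.randomSplit(by: 0.8, seed: 5)

// "genre" is the target column; the remaining columns serve as features.
let classifier = try MLClassifier(trainingData: trainingData, targetColumn: "genre")

// Evaluate on the held-out rows and report the fraction misclassified.
let metrics = classifier.evaluation(on: testingData)
print("Classification error: \(metrics.classificationError)")
```

A table with only a handful of rows, like the examples in this article, is too small to train a useful model; the sketch only shows where the target column is named.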

Import JSON Data

To create a data table from JSON data, use the init(contentsOf:) initializer, which creates a row from each dictionary in the root JSON array. The keys in each dictionary become the names of the columns in the table.

 JSON file:
   [{
     "title": "Alice in Wonderland",
     "author": "Lewis Carroll",
     "pageCount": 124,
     "genre": "Fantasy"
   }, {
     "title": "Hamlet",
     "author": "William Shakespeare",
     "pageCount": 98,
     "genre": "Drama"
   }, ...
   ]

let bookTable = try MLDataTable(contentsOf: URL(fileURLWithPath: "books.json"))
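
After importing, it can help to confirm that the keys became columns as expected. The following is a small sketch of such a check, assuming the same hypothetical "books.json" file exists in the working directory.

```swift
import CreateML
import Foundation

// Assumption: "books.json" is a hypothetical file shaped like the listing above.
let bookTable = try MLDataTable(contentsOf: URL(fileURLWithPath: "books.json"))

// List the column names derived from the JSON keys.
print(bookTable.columnNames)

// Report the table's dimensions: one row per JSON dictionary.
print("Rows: \(bookTable.size.rows), columns: \(bookTable.size.columns)")

// Printing the table shows a preview of its contents.
print(bookTable)
```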

Import Tabular Data

Alternatively, MLDataTable can import data from an in-memory dictionary or a CSV file. A CSV (comma-separated values) file is a plain-text representation of a table. You can create a CSV file programmatically or use an app like Numbers to export a spreadsheet. Both formats translate directly into the rows and columns of a data table.

For example, the init(dictionary:) initializer uses the keys in the provided dictionary as column names. The value for each column key is an array of the values for that column. You can use an MLDataValue to represent a column of values, or any type that conforms to the MLDataValueConvertible protocol.

import CreateML

// Each key is a column name; each value is the array of that column's values.
let data: [String: MLDataValueConvertible] = [
    "title": ["Alice in Wonderland", "Hamlet", "Treasure Island", "Peter Pan"],
    "author": ["Lewis Carroll", "William Shakespeare", "Robert L. Stevenson", "J. M. Barrie"],
    "pageCount": [124, 98, 280, 94],
    "genre": ["Fantasy", "Drama", "Adventure", "Fantasy"]
]

let bookTable = try MLDataTable(dictionary: data)
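
The same initializer offers one way to build a table from custom types in your own code: collect each property into a column array. The sketch below assumes a hypothetical Book struct; the type and its property names are illustrative and not part of Create ML.

```swift
import CreateML

// Hypothetical custom type, defined only for this illustration.
struct Book {
    let title: String
    let author: String
    let pageCount: Int
    let genre: String
}

let books = [
    Book(title: "Alice in Wonderland", author: "Lewis Carroll", pageCount: 124, genre: "Fantasy"),
    Book(title: "Hamlet", author: "William Shakespeare", pageCount: 98, genre: "Drama"),
    Book(title: "Treasure Island", author: "Robert L. Stevenson", pageCount: 280, genre: "Adventure"),
    Book(title: "Peter Pan", author: "J. M. Barrie", pageCount: 94, genre: "Fantasy")
]

// Turn the array of structs into one array per column.
// Every column array has the same length because each comes from the same list.
let columns: [String: MLDataValueConvertible] = [
    "title": books.map { $0.title },
    "author": books.map { $0.author },
    "pageCount": books.map { $0.pageCount },
    "genre": books.map { $0.genre }
]

let customTable = try MLDataTable(dictionary: columns)
print(customTable)
```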

When you import a CSV file with the init(contentsOf:) initializer, it creates a row in the table from each line of data in the CSV file. The column names come from the file's header line.

 CSV file:
   title,author,pageCount,genre
   Alice in Wonderland,Lewis Carroll,124,Fantasy
   Hamlet,William Shakespeare,98,Drama
   Treasure Island,Robert L. Stevenson,280,Adventure
   Peter Pan,J. M. Barrie,94,Fantasy

let bookTable = try MLDataTable(contentsOf: URL(fileURLWithPath: "books.csv")) 
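
As a sketch of creating a CSV file programmatically and loading it back, the example below writes a small file to the temporary directory and imports it. The file location and the explicit MLDataTable.ParsingOptions values are assumptions chosen for illustration; passing the options spells out how the header line and delimiter are expected to be interpreted.

```swift
import CreateML
import Foundation

// Build the CSV text, starting with a header line that names the columns.
let csvText = """
title,author,pageCount,genre
Alice in Wonderland,Lewis Carroll,124,Fantasy
Hamlet,William Shakespeare,98,Drama
Treasure Island,Robert L. Stevenson,280,Adventure
Peter Pan,J. M. Barrie,94,Fantasy
"""

// Assumption: the temporary directory is an acceptable scratch location.
let csvURL = FileManager.default.temporaryDirectory.appendingPathComponent("books.csv")
try csvText.write(to: csvURL, atomically: true, encoding: .utf8)

// State explicitly that the first line is a header and that commas separate fields.
let options = MLDataTable.ParsingOptions(containsHeader: true, delimiter: ",")
let csvTable = try MLDataTable(contentsOf: csvURL, options: options)

// The column names should match the header line written above.
print(csvTable.columnNames)
```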

See Also

Structured Data

struct MLDataTable

A table of data for training or evaluating a machine learning model.

enum MLDataValue

The value of a cell in a data table.