Structure

MLDataTable

A table of data for training or evaluating a machine learning model.

Declaration

struct MLDataTable

Overview

MLDataTable is Create ML’s version of a spreadsheet in which each column, represented by MLDataColumn or MLUntypedColumn, is an observable feature, and each row is an entry of those observations. As an example, the figure below shows a data table of book information where each row represents a book and each column is a feature of the books.

Figure: A table of information about books, with columns named "Title", "Author", "Pages", and "Genre". The first row is "Alice in Wonderland", "Lewis Carroll", "124", and "Fantasy".

In most cases you interact with columns using the typed MLDataColumn, especially when you need to directly access the contents of a column. You can also interact with columns using MLUntypedColumn if the underlying type of the column isn’t important.

After you create a data table as described in Creating Data Tables for Training and Evaluation, you can modify it with methods like append(contentsOf:), addColumn(_:named:), and removeColumn(named:). You can also filter or map the contents of the data table to derive new data tables or new columns by using various subscripts and methods like dropDuplicates() or map(_:).
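
As a rough illustration of the modifying methods named above, the sketch below builds a small table and then adds, renames, and removes columns. The book rows other than "Alice in Wonderland" and the "PageCount" name are made up for the example, and the sequence initializer on MLUntypedColumn is assumed to behave as shown.

```swift
import CreateML

// Illustrative data only: "Sample Book A" and "Sample Book B" are hypothetical rows.
let data: [String: MLDataValueConvertible] = [
    "Title": ["Alice in Wonderland", "Sample Book A", "Sample Book B"],
    "Pages": [124, 300, 80]
]
var books = try MLDataTable(dictionary: data)

// Add an untyped column; it has one value per existing row.
let genres = MLUntypedColumn(["Fantasy", "Mystery", "Poetry"])
books.addColumn(genres, named: "Genre")

// Rename one column and remove another. Both methods mutate the table in place.
books.renameColumn(named: "Pages", to: "PageCount")
books.removeColumn(named: "Genre")

print(books.columnNames)
print(books.size)
```

Because these methods mutate the table, it is declared with var rather than let.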

Finally, when your data table is ready, use it to train and evaluate a model, such as an MLClassifier or MLRegressor.

Topics

Creating a Data Table

Creating Data Tables for Training and Evaluation

Import and format data to create and evaluate a machine learning model.

init(contentsOf: URL, options: MLDataTable.ParsingOptions)

Creates a data table from an imported JSON or CSV file.

init(dictionary: [String : MLDataValueConvertible])

Creates a data table from a dictionary of column names and data values.

init(namedColumns: [String : MLUntypedColumn])

Creates a data table from a dictionary of column names and untyped columns.

init()

Creates an empty table containing no rows or columns.
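
A minimal sketch of importing a CSV file, assuming the default parsing options treat the first line as a header row. The temporary file name "books-example.csv" is hypothetical and exists only to keep the example self-contained.

```swift
import CreateML
import Foundation

// Write a tiny CSV file to a temporary location so the example has something to read.
let csv = """
Title,Author,Pages,Genre
Alice in Wonderland,Lewis Carroll,124,Fantasy
"""
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("books-example.csv") // hypothetical file name
try csv.write(to: url, atomically: true, encoding: .utf8)

// Import the file. The initializer throws if the file can't be read or parsed.
let table = try MLDataTable(contentsOf: url)

print(table.columnNames)
print(table.size)
```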

Getting the Size of a Data Table

var size: (rows: Int, columns: Int)

The number of rows and columns in the data table.

Transforming Rows to Generate a Data Column

func map<T>((MLDataTable.Row) -> T) -> MLDataColumn<T>

Creates a new column by applying a given thread-safe transform to every row in the data table.

func map<T>((MLDataTable.Row) -> T?) -> MLDataColumn<T>

Creates a new column, potentially with missing values, by applying a given thread-safe transform to every row in the data table.
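
The following sketch shows one way map(_:) might be used to derive a column from each row. The sample rows and the "TitleLength" column name are assumptions for illustration. The closure reads only from its row argument, which keeps it safe to call from multiple threads.

```swift
import CreateML

// Illustrative data only.
let data: [String: MLDataValueConvertible] = [
    "Title": ["Alice in Wonderland", "Sample Book A"],
    "Pages": [124, 300]
]
var books = try MLDataTable(dictionary: data)

// Compute the character count of each title. The transform has no side effects.
let titleLengths: MLDataColumn<Int> = books.map { row in
    row["Title"]?.stringValue?.count ?? 0
}

// Store the derived column under a new (hypothetical) name by using the typed subscript.
books["TitleLength"] = titleLengths
print(books)
```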

Adding Columns

struct MLDataColumn

A column of typed values in a data table.

func addColumn(MLUntypedColumn, named: String)

Adds an untyped column to the table.

struct MLUntypedColumn

A column of untyped values in a data table.

Accessing Columns

subscript<T>(String, T.Type) -> MLDataColumn<T>?

Retrieves a column with the specified name and type.

subscript<Element>(String) -> MLDataColumn<Element>

Retrieves or adds a typed column with the specified name.

subscript(String) -> MLUntypedColumn

Retrieves or adds an untyped column with the specified name.
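
A short sketch of the three column subscripts, using made-up sample data. The type annotations tell the compiler which subscript to use when the element type can't be inferred from context.

```swift
import CreateML

// Illustrative data only.
let data: [String: MLDataValueConvertible] = [
    "Title": ["Alice in Wonderland", "Sample Book A"],
    "Pages": [124, 300]
]
let books = try MLDataTable(dictionary: data)

// Name-and-type subscript: returns an optional, so unwrap it before use.
if let pages = books["Pages", Int.self] {
    print(pages)
}

// Typed subscript: the annotation supplies the element type.
let titles: MLDataColumn<String> = books["Title"]
print(titles)

// Untyped subscript: useful when the underlying type isn't important.
let untypedPages: MLUntypedColumn = books["Pages"]
print(untypedPages.type)
```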

Renaming Columns

func renameColumn(named: String, to: String)

Changes the name of an existing column.

Removing Columns

func removeColumn(named: String)

Removes the column with the specified name.

Masking Rows to Generate a Data Table

subscript(MLDataColumn<Bool>) -> MLDataTable

Creates a subset of the table by masking the rows with the given column of Booleans.

subscript(MLUntypedColumn) -> MLDataTable

Creates a subset of the table by masking the rows with the given untyped column.
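
As a sketch of masking, the example below keeps only the rows whose page count exceeds a threshold. It assumes that MLDataColumn provides an element-wise > operator that compares a column to a value and yields a column of Booleans. The sample rows and the threshold of 200 are illustrative.

```swift
import CreateML

// Illustrative data only.
let data: [String: MLDataValueConvertible] = [
    "Title": ["Alice in Wonderland", "Sample Book A", "Sample Book B"],
    "Pages": [124, 300, 80]
]
let books = try MLDataTable(dictionary: data)

// Build a Boolean mask with one entry per row (assumed element-wise comparison).
let pages: MLDataColumn<Int> = books["Pages"]
let isLong = pages > 200

// Keep only the rows where the mask is true.
let longBooks = books[isLong]
print(longBooks)
```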

Discarding Rows to Generate a Data Table

func dropMissing() -> MLDataTable

Creates a subset of the table by removing any row missing one or more values.

func dropDuplicates() -> MLDataTable

Creates a subset of the table by removing all duplicate rows.

Selecting Rows to Generate a Data Table

subscript(Range<Int>) -> MLDataTable

Creates a subset of the table given a range of rows.

subscript<R>(R) -> MLDataTable

Creates a subset of the table given a range expression of rows.

func prefix(Int) -> MLDataTable

Creates a subset of the table given a number of initial rows.

func suffix(Int) -> MLDataTable

Creates a subset of the table given a number of final rows.

Selecting Columns to Generate a Data Table

subscript<S>(S) -> MLDataTable

Creates a subset of the table given a sequence of column names.
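
The row and column selectors can be sketched together as follows. The data is made up for the example. Each call returns a new table and leaves the original untouched.

```swift
import CreateML

// Illustrative data only.
let data: [String: MLDataValueConvertible] = [
    "Title": ["Alice in Wonderland", "Sample Book A", "Sample Book B"],
    "Pages": [124, 300, 80]
]
let books = try MLDataTable(dictionary: data)

// Select rows by range, or from the start or end of the table.
let firstTwo = books[0..<2]
let firstOne = books.prefix(1)
let lastOne = books.suffix(1)

// Select columns by passing a sequence of column names.
let titlesOnly = books[["Title"]]

print(firstTwo.size, firstOne.size, lastOne.size, titlesOnly.size)
```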

Filling in Missing Values to Generate a Data Table

func fillMissing(columnNamed: String, with: MLDataValue) -> MLDataTable

Creates a modified copy of the table by filling in the missing values in the named column.
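
The sketch below contrasts dropMissing() with fillMissing(columnNamed:with:). To get a column with gaps, it uses the optional-returning map(_:) on made-up data. The "LongPageCount" column name and the fill value of 0 are assumptions chosen for illustration.

```swift
import CreateML

// Illustrative data only.
let data: [String: MLDataValueConvertible] = [
    "Title": ["Alice in Wonderland", "Sample Book A", "Sample Book B"],
    "Pages": [124, 300, 80]
]
var books = try MLDataTable(dictionary: data)

// Derive a column with missing values: rows with 100 pages or fewer get nil.
let longPageCounts: MLDataColumn<Int> = books.map { row -> Int? in
    guard let pages = row["Pages"]?.intValue, pages > 100 else { return nil }
    return pages
}
books["LongPageCount"] = longPageCounts // hypothetical column name

// Option 1: discard every row that has a missing value.
let complete = books.dropMissing()

// Option 2: keep every row and substitute a default for the missing values.
let filled = books.fillMissing(columnNamed: "LongPageCount", with: .int(0))

print(complete.size, filled.size)
```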

Splitting a Data Table into Two New Tables

func randomSplit(by: Double, seed: Int) -> (MLDataTable, MLDataTable)

Creates two mutually exclusive, randomly divided subsets of the table.
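
A minimal sketch of splitting a table into training and testing subsets. The ten generated rows, the 0.8 proportion, and the seed of 5 are illustrative choices. Because the split is random, the two row counts only approximate the requested proportion, but together they account for every row.

```swift
import CreateML

// Ten generated rows of illustrative data.
let data: [String: MLDataValueConvertible] = [
    "Title": (1...10).map { "Sample Book \($0)" },
    "Pages": Array(1...10)
]
let books = try MLDataTable(dictionary: data)

// Roughly 80% of the rows go to the first table and the rest to the second.
let (trainingData, testingData) = books.randomSplit(by: 0.8, seed: 5)

print(trainingData.size.rows, testingData.size.rows)
```

Passing the same seed again should reproduce the same split, which helps when comparing models.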

Appending to a Data Table

func append(contentsOf: MLDataTable)

Appends the contents of the given data table to the end of this data table.

Getting Information About a Data Table’s Rows

var rows: MLDataTable.Rows

The rows of data in the table.

struct MLDataTable.Rows

A collection of rows in a data table.

Getting Information About a Data Table’s Columns

var columnNames: MLDataTable.ColumnNames

The names of the columns in the data table.

struct MLDataTable.ColumnNames

A collection of the names of the columns in a data table.

var columnTypes: [String : MLDataValue.ValueType]

The type of the data in each column.

Getting a Description of a Data Table

var description: String

A text representation of the data table.

var playgroundDescription: Any

A description of the data table shown in a playground.

Handling Data Table Errors

var isValid: Bool

A Boolean value indicating whether the data table is valid.

var error: Error?

The underlying error present when the data table is invalid.
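
One possible pattern for checking a table before using it is sketched below. The data is made up, and the example makes no claim about which operations leave a table invalid. It only shows how the two properties might be read together.

```swift
import CreateML

// Illustrative data only.
let data: [String: MLDataValueConvertible] = [
    "Title": ["Alice in Wonderland"],
    "Pages": [124]
]
let books = try MLDataTable(dictionary: data)

// Check validity before training, and report the underlying error if there is one.
if books.isValid {
    print("Table is valid with \(books.size.rows) row(s).")
} else if let error = books.error {
    print("Table is invalid: \(error)")
}
```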

See Also

Tabular Data

Creating Data Tables for Training and Evaluation

Import and format data to create and evaluate a machine learning model.

enum MLClassifier

A model you train to classify data into discrete categories.

enum MLRegressor

A model you train to estimate continuous values.

enum MLDataValue

The value of a cell in a data table.