Structure

MLSoundClassifier

A structure used to train a model to classify audio data programmatically.

Declaration

struct MLSoundClassifier

Overview

Use a sound classifier to train a machine-learning model that you can use with the SoundAnalysis framework to categorize audio data.

When you create a model, you give it a training data set made up of labeled sounds, along with parameters that control the training process. For example, you can provide the model with sounds of laughter and applause, in two folders labeled Laughter and Applause, to train it to recognize these sounds.
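For example, a classifier trained from labeled folders might be created like this. This is a sketch: the directory path is hypothetical, and it assumes the subfolder names (such as Laughter and Applause) serve as the class labels, as described above.

```swift
import CreateML
import Foundation

// Hypothetical path to a directory whose subfolders ("Laughter", "Applause")
// each contain audio files for one class; the folder names become the labels.
let trainingDirectory = URL(fileURLWithPath: "/Users/you/Sounds")

// Train a sound classifier from the labeled directories.
let classifier = try MLSoundClassifier(
    trainingData: .labeledDirectories(at: trainingDirectory)
)
```

Alternatively, you can pass a dictionary of labels to audio-file URLs using the `init(trainingData:parameters:)` initializer that takes a `[String : [URL]]`.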

Use single-channel audio files for training. Although the classifier accepts multichannel audio, it uses only the first channel and discards the rest. This audio can be in any compressed or uncompressed format supported by Core Audio, including M4A, MP3, AIFF, and WAV.

The sound classifier operates natively on 16 kHz audio. To ensure the best performance of your model, train it using audio with a sample rate of 16 kHz or higher. All audio in your training data should use the same bit depth and sample rate to prevent adding bias to your model, which can adversely impact its performance.

After training completes, you evaluate the trained model by showing it a data set containing labeled sounds that it hasn’t seen before. The metrics provided by this evaluation tell you whether the model performs with the accuracy you need. For example, you can see how often the model mistakes laughter for applause. If it makes too many mistakes, you can add more or better data, or change the parameters, and try again.
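An evaluation pass might look like the following sketch. The file paths are hypothetical, and it assumes `classifier` is a trained `MLSoundClassifier`; the held-out clips must not appear in the training data.

```swift
// Hypothetical held-out test data, labeled the same way as the training set.
let testingData: [String: [URL]] = [
    "Laughter": [URL(fileURLWithPath: "/Users/you/Testing/Laughter/clip1.m4a")],
    "Applause": [URL(fileURLWithPath: "/Users/you/Testing/Applause/clip1.m4a")]
]

// Evaluate the trained classifier on data it hasn't seen before.
let metrics = classifier.evaluation(on: testingData)
print("Classification error: \(metrics.classificationError)")
```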

When your model performs as needed, you save it as a Core ML model file with the mlmodel extension. You use the SoundAnalysis framework to load this model, adding audio recognition and categorization capabilities to your app.
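Exporting the finished model might look like this sketch; the output path and metadata values are hypothetical, and it assumes `classifier` is the trained `MLSoundClassifier` from above.

```swift
// Hypothetical metadata describing the exported model.
let metadata = MLModelMetadata(
    author: "Jane Appleseed",
    shortDescription: "Distinguishes laughter from applause",
    version: "1.0"
)

// Write the Core ML model file for use with the SoundAnalysis framework.
try classifier.write(
    to: URL(fileURLWithPath: "/Users/you/SoundClassifier.mlmodel"),
    metadata: metadata
)
```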

Topics

Creating and Training a Sound Classifier

init(trainingData: [String : [URL]], parameters: MLSoundClassifier.ModelParameters)

Creates a sound classifier from a training data set represented by a dictionary.

init(trainingData: MLSoundClassifier.DataSource, parameters: MLSoundClassifier.ModelParameters)

Creates a sound classifier from a training data set represented by a data source.

enum MLSoundClassifier.DataSource

An enumeration that describes various ways to label audio-file URLs stored on disk.

struct MLSoundClassifier.ModelParameters

A structure that describes additional model parameters that you can set on the sound classifier.

let modelParameters: MLSoundClassifier.ModelParameters

The configuration parameters used to train the model during initialization.

Assessing Model Accuracy

var trainingMetrics: MLClassifierMetrics

Measurements of the classifier’s performance on the training data set.

var validationMetrics: MLClassifierMetrics

Measurements of the classifier’s performance on the validation data set.

Evaluating a Sound Classifier

func evaluation(on: [String : [URL]]) -> MLClassifierMetrics

Returns metrics describing the classifier’s performance on labeled data, provided in a dictionary.

func evaluation(on: MLSoundClassifier.DataSource) -> MLClassifierMetrics

Returns metrics describing the classifier’s performance on the data source.

Testing a Sound Classifier

func predictions(from: [URL]) -> [String]

Returns a classification label for each of the audio files at the specified URLs.

Saving a Sound Classifier

func write(to: URL, metadata: MLModelMetadata?)

Exports a Core ML model file for use in your app.

func write(toFile: String, metadata: MLModelMetadata?)

Exports a Core ML model file for use in your app.

Describing a Sound Classifier

var model: MLModel

The underlying Core ML model of the sound classifier stored in memory.

var description: String

A string representation of the sound classifier.

var debugDescription: String

A string representation of the sound classifier that’s suitable for output during debugging.

var playgroundDescription: Any

A description of the sound classifier shown in a playground.