A structure used to train a model to classify audio data programmatically.
- macOS 10.15+
- Xcode 11.0+
- Create ML
Use a sound classifier to train a machine learning model that works with the SoundAnalysis framework to categorize audio data.
When you create a model, you give it a training data set made up of labeled sounds, along with parameters that control the training process. For example, you can provide the model with sounds of laughter and applause, in two folders labeled Laughter and Applause, to train it to recognize these sounds.
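In code, training on such a folder might look like the following minimal sketch. The directory path is hypothetical; the data source reads the class labels from the subfolder names.

```swift
import CreateML
import Foundation

// Hypothetical path: a folder whose subfolders ("Laughter", "Applause")
// provide the class labels for the audio files they contain.
let trainingDirectory = URL(fileURLWithPath: "/Users/me/Sounds/Train")

// Train a classifier with the default parameters; training throws on failure.
let classifier = try MLSoundClassifier(
    trainingData: .labeledDirectories(at: trainingDirectory))
```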
Use single-channel audio files for training. Although the classifier accepts multi-channel audio, it selects only the first channel and discards the others. The audio can be in any compressed or uncompressed format supported by Core Audio, including M4A, MP3, AIFF, and WAV.
The sound classifier operates natively on 16 kHz audio. To ensure the best performance of your model, train it using audio with a sample rate of 16 kHz or higher. All audio in your training data should use the same bit depth and sample rate to prevent adding bias to your model, which can adversely impact its performance.
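If you want to audit your training audio against these guidelines before training, one approach is a quick pass with AVFoundation. This is a sketch; the file list is a placeholder you'd fill in with your own URLs.

```swift
import AVFoundation

// Placeholder: populate with your training audio file URLs.
let fileURLs: [URL] = []

for url in fileURLs {
    // AVAudioFile exposes the on-disk format of an audio file.
    let format = try AVAudioFile(forReading: url).fileFormat
    if format.channelCount > 1 {
        print("\(url.lastPathComponent): only the first of \(format.channelCount) channels is used")
    }
    if format.sampleRate < 16_000 {
        print("\(url.lastPathComponent): \(format.sampleRate) Hz is below the native 16 kHz rate")
    }
}
```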
After training completes, you evaluate the trained model by showing it a data set containing labeled sounds that it hasn’t seen before. The metrics provided by this evaluation tell you whether the model performs with the accuracy you need. For example, you can see how often the model mistakes laughter for applause. If it makes too many mistakes, you can add more or better data, or change the parameters, and try again.
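A sketch of this evaluation step follows; the testing directory path is hypothetical. The returned metrics include values such as the overall classification error and a confusion table.

```swift
// Hypothetical path to a held-out, labeled testing data set.
let testingData = MLSoundClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/Users/me/Sounds/Test"))

let metrics = classifier.evaluation(on: testingData)

// Overall fraction of misclassified examples.
print("Classification error: \(metrics.classificationError)")

// The confusion table shows, for example, how often the model
// mistakes laughter for applause.
print(metrics.confusion)
```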
When your model performs as needed, you save it as a Core ML model file with the .mlmodel extension. You then use the SoundAnalysis framework to load this model, adding audio recognition and categorization capabilities to your app.
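Putting these last steps together, saving the model and loading it for SoundAnalysis might look like this sketch; the file paths are hypothetical.

```swift
import CoreML
import SoundAnalysis

// Hypothetical destination for the trained model file.
let modelURL = URL(fileURLWithPath: "/Users/me/Models/SoundClassifier.mlmodel")
try classifier.write(to: modelURL)

// In your app: compile the model, load it, and wrap it in a
// sound-classification request.
let compiledURL = try MLModel.compileModel(at: modelURL)
let model = try MLModel(contentsOf: compiledURL)
let request = try SNClassifySoundRequest(mlModel: model)

// Add the request to an SNAudioStreamAnalyzer or SNAudioFileAnalyzer,
// together with an SNResultsObserving observer, to receive results.
```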