Use the SoundAnalysis framework and Core ML to analyze audio.
The SoundAnalysis framework uses a trained Core ML model to analyze and classify streamed or file-based audio.
Prepare a Model
The SoundAnalysis framework operates on a model that you’ve trained using a Create ML MLSoundClassifier and subsequently bundled with your app. When you add the model file to your app, Xcode automatically generates a class with the same name (minus the .mlmodel extension). You create an instance of this class to load the MLModel for use with the SoundAnalysis framework. For example, if you have a model file named Instrument, create an instance of its generated class and read that instance’s model property to get the associated MLModel.
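A minimal sketch of this step follows. It assumes the bundled model file yields an Xcode-generated class named Instrument, matching the file name used above; the loadInstrumentModel() wrapper is a hypothetical helper, not part of any framework.

```swift
import CoreML

// Assumption: `Instrument` is the class Xcode generates from the bundled
// model file. It is not defined here, so this compiles only inside an app
// target that contains that model.
func loadInstrumentModel() throws -> MLModel {
    // Generated model classes expose a throwing initializer that takes
    // an MLModelConfiguration.
    let classifier = try Instrument(configuration: MLModelConfiguration())
    // The underlying MLModel is what SoundAnalysis consumes.
    return classifier.model
}
```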
Capture Streaming Audio
You can get data for analysis by using AVFoundation’s AVAudioEngine to capture audio data from a built-in or external microphone. Creating an audio engine instance automatically configures an input node that lets you access audio data from the device’s default microphone. Call the engine’s start() method to begin the flow of data through the audio pipeline.
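The capture setup might look like the sketch below. The startAudioEngine() function name is hypothetical, and platform-specific setup (such as requesting microphone permission, or configuring an audio session on iOS) is assumed to have happened elsewhere.

```swift
import AVFoundation

// Hypothetical helper: creates an engine, reads the input node's format,
// and starts the audio pipeline.
func startAudioEngine() throws -> AVAudioEngine {
    let audioEngine = AVAudioEngine()

    // Accessing inputNode gives the engine its connection to the
    // default microphone; bus 0 carries the captured audio.
    let inputFormat = audioEngine.inputNode.outputFormat(forBus: 0)
    print("Input: \(inputFormat.sampleRate) Hz, \(inputFormat.channelCount) channel(s)")

    // Begin the flow of data. This call throws if the hardware
    // can't be started.
    try audioEngine.start()
    return audioEngine
}
```

Keep the returned engine in a property or other long-lived reference; if it is deallocated, capture stops.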
Create a Stream Analyzer
You use an SNAudioStreamAnalyzer to analyze the captured audio. Create an instance of this class with a PCM audio format that matches the input device’s native format.
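As a rough illustration, the analyzer can be built from the format that the engine’s input node reports. The makeStreamAnalyzer(for:) helper is a hypothetical name used only for this sketch.

```swift
import AVFoundation
import SoundAnalysis

// Hypothetical helper: builds a stream analyzer whose format matches
// the native format of the engine's input bus.
func makeStreamAnalyzer(for engine: AVAudioEngine) -> SNAudioStreamAnalyzer {
    let inputFormat = engine.inputNode.outputFormat(forBus: 0)
    return SNAudioStreamAnalyzer(format: inputFormat)
}
```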
To observe the analysis process, create an object that adopts the SNResultsObserving protocol. This protocol defines the methods that you implement for the analyzer to call as it produces new results, encounters errors, or completes its processing.
The example uses a sound classifier model that’s been trained to recognize and classify various musical instrument sounds. When the model recognizes an instrument in the analyzed audio, the analyzer delivers the result to the observer, along with the model’s confidence in the accuracy of its prediction. A simple observer implementation logs each result to the console as it arrives.
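One possible observer is sketched here. The ResultsObserver class name and the describe(identifier:confidence:) formatting helper are assumptions made for illustration; only the three protocol methods come from SNResultsObserving.

```swift
import Foundation
import SoundAnalysis

// Hypothetical helper: formats a label and a 0...1 confidence for logging.
func describe(identifier: String, confidence: Double) -> String {
    let percent = String(format: "%.2f", confidence * 100.0)
    return "\(identifier): \(percent)% confidence"
}

// Hypothetical observer that logs each classification result.
final class ResultsObserver: NSObject, SNResultsObserving {
    // Called each time the analyzer produces a new result.
    func request(_ request: SNRequest, didProduce result: SNResult) {
        // Only classification results carry labels; take the first
        // classification in the list.
        guard let result = result as? SNClassificationResult,
              let classification = result.classifications.first else { return }
        print(describe(identifier: classification.identifier,
                       confidence: classification.confidence))
    }

    // Called if the analysis fails.
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The analysis failed: \(error.localizedDescription)")
    }

    // Called when the request finishes without error.
    func requestDidComplete(_ request: SNRequest) {
        print("The request completed successfully.")
    }
}
```

Keeping the formatting in a small free function makes the log format easy to change without touching the protocol methods.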
Next, create an SNClassifySoundRequest for your MLModel and add it to the analyzer, along with an instance of your observer object. The analyzer doesn’t retain the observer, so maintain a strong reference to it to keep it alive.
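A sketch of this step, assuming you already hold a loaded MLModel, a stream analyzer, and an observer. The addClassification(of:to:observer:) function is a hypothetical wrapper.

```swift
import CoreML
import SoundAnalysis

// Hypothetical helper: creates a classification request for the model
// and registers it, with its observer, on the analyzer.
func addClassification(of model: MLModel,
                       to analyzer: SNAudioStreamAnalyzer,
                       observer: SNResultsObserving) throws {
    // Throws if the model isn't a usable sound classifier.
    let request = try SNClassifySoundRequest(mlModel: model)
    // The caller must keep its own strong reference to `observer`.
    try analyzer.add(request, withObserver: observer)
}
```

Storing the observer in a property of the object that owns the analyzer is one simple way to keep it alive for the duration of the analysis.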
Analyze Streaming Audio
To ensure the best performance, use a dedicated serial dispatch queue to perform your audio analysis. Dispatching this work to a separate queue ensures that the audio engine can continue to efficiently process new buffers while analysis is in progress.
Install an audio tap on the audio engine’s input node, which lets you access the stream of data captured by the microphone. As the audio engine delivers new buffers to your audio tap, dispatch each to your analysis queue and ask the analyzer to analyze it at the current frame position.
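The two steps above can be sketched as follows. The queue label, the startAnalyzing function name, and the 8192-frame buffer size are illustrative assumptions rather than required values.

```swift
import AVFoundation
import SoundAnalysis

// Dedicated serial queue for analysis work (the label is hypothetical).
let analysisQueue = DispatchQueue(label: "com.example.AnalysisQueue")

// Hypothetical helper: taps the engine's input node and forwards each
// captured buffer to the analyzer on the given queue.
func startAnalyzing(engine: AVAudioEngine,
                    analyzer: SNAudioStreamAnalyzer,
                    on queue: DispatchQueue) {
    let inputNode = engine.inputNode
    let format = inputNode.outputFormat(forBus: 0)

    // The tap block runs as the engine delivers each new buffer.
    inputNode.installTap(onBus: 0, bufferSize: 8192, format: format) { buffer, time in
        // Hand the buffer to the analysis queue so the tap block
        // returns quickly.
        queue.async {
            analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
        }
    }
}
```

When analysis is no longer needed, calling removeTap(onBus:) on the input node stops the delivery of buffers.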
When the analyzer finishes processing the audio, it sends the results to the observer object, which logs them. Each logged result indicates which instrument was recognized in the most recently processed audio buffers and the model’s level of confidence in that prediction.
Analyze Audio Files Offline
You can also perform offline analysis of audio file data by creating a new instance of the SNAudioFileAnalyzer class with the audio file you want to analyze. Unlike SNAudioStreamAnalyzer, which only works with audio data in PCM format, SNAudioFileAnalyzer operates on any compressed or uncompressed format supported by Core Audio.
As you did when performing streaming analysis, you add a request to the analyzer and an observer that’s called as the analyzer produces results.
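Offline analysis could look like the sketch below. The analyzeFile(at:with:observer:) function is a hypothetical wrapper, and the file URL, model, and observer are assumed to be supplied by the caller.

```swift
import CoreML
import Foundation
import SoundAnalysis

// Hypothetical helper: classifies the sounds in an audio file.
func analyzeFile(at url: URL,
                 with model: MLModel,
                 observer: SNResultsObserving) throws {
    // Throws if the file can't be opened or read.
    let analyzer = try SNAudioFileAnalyzer(url: url)

    let request = try SNClassifySoundRequest(mlModel: model)
    try analyzer.add(request, withObserver: observer)

    // Analyzes the whole file, delivering results to the observer.
    // This call blocks until the analysis finishes.
    analyzer.analyze()
}
```

Because analyze() is synchronous, call it off the main thread for long files, or use the analyzer’s analyze(completionHandler:) variant instead.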