Article

Analyzing Audio to Classify Sounds

Use the SoundAnalysis framework and Core ML to analyze audio.

Overview

The SoundAnalysis framework uses a trained Core ML model to analyze and classify streamed or file-based audio.

Prepare a Model

The SoundAnalysis framework operates on a model that you’ve trained using a Create ML MLSoundClassifier, and subsequently bundled with your app. When you add the model file to your app, Xcode automatically generates a class with the same name (minus the .mlmodel extension). You create an instance of this class to load the MLModel for use with the SoundAnalysis framework. For example, if you have a model file named InstrumentClassifier.mlmodel, load its associated MLModel as shown below.

let instrumentClassifier = InstrumentClassifier()
let model: MLModel = instrumentClassifier.model
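
If you haven’t trained a model yet, the snippet below is a minimal sketch of one way to do so with Create ML, for example in a macOS playground. The paths are placeholders, and it assumes your training audio is organized into one folder per label so it can use Create ML’s labeledDirectories(at:) data source.

import CreateML
import Foundation

// Train a classifier from a folder whose subdirectories are named for the
// labels they contain (for example, "Acoustic Guitar", "Snare Drum").
// The paths below are placeholders for your own training data and output.
let trainingData = MLSoundClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/path/to/TrainingData"))

let trainedClassifier = try MLSoundClassifier(trainingData: trainingData)

// Export the trained model so you can add it to your Xcode project.
try trainedClassifier.write(to: URL(fileURLWithPath: "/path/to/InstrumentClassifier.mlmodel"))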

Capture Streaming Audio

As demonstrated in the code below, you can capture audio for analysis from a built-in or external microphone by using AVFoundation’s AVAudioEngine. Creating an audio engine instance automatically configures an inputNode that lets you access audio data from the device’s default microphone. Call the engine’s start() method to begin the flow of data through the audio pipeline.

func startAudioEngine() {
    // Create a new audio engine.
    audioEngine = AVAudioEngine()
    
    do {
        // Start the stream of audio data.
        try audioEngine.start()
    } catch {
        print("Unable to start AVAudioEngine: \(error.localizedDescription)")
    }
}

Create a Stream Analyzer

You use an SNAudioStreamAnalyzer to analyze the captured audio. Create an instance of this class with a PCM audio format that matches the input device’s native format.

// Get the native audio format of the engine's input bus.
let inputFormat = audioEngine.inputNode.inputFormat(forBus: 0)

// Create a new stream analyzer.
streamAnalyzer = SNAudioStreamAnalyzer(format: inputFormat)

To observe the analysis process, create an object that adopts the SNResultsObserving protocol. This protocol defines the methods that you implement for the analyzer to call as it produces new results, encounters errors, or completes its processing.

The example uses a sound classifier model that’s been trained to recognize and classify various musical instrument sounds. As shown below, when the model recognizes an instrument in the analyzed audio, it delivers the result to the observer, along with its confidence in that prediction. As results arrive, a simple observer implementation logs them to the console.

// Observer object that is called as analysis results are found.
class ResultsObserver: NSObject, SNResultsObserving {
    
    func request(_ request: SNRequest, didProduce result: SNResult) {
        
        // Get the top classification.
        guard let result = result as? SNClassificationResult,
            let classification = result.classifications.first else { return }
        
        // Determine the time of this result.
        let formattedTime = String(format: "%.2f", result.timeRange.start.seconds)
        print("Analysis result for audio at time: \(formattedTime)")
        
        let confidence = classification.confidence * 100.0
        let percent = String(format: "%.2f%%", confidence)

        // Print the result as Instrument: percentage confidence.
        print("\(classification.identifier): \(percent) confidence.\n")
    }
    
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The the analysis failed: \(error.localizedDescription)")
    }
    
    func requestDidComplete(_ request: SNRequest) {
        print("The request completed successfully!")
    }
}

Next, create an SNClassifySoundRequest for your MLModel and add it to the analyzer, along with an instance of your observer object. The analyzer doesn’t retain the observer, so maintain a strong reference to it to keep it alive.

// Create a new observer that will be notified of analysis results.
// Keep a strong reference to this object.
resultsObserver = ResultsObserver()

do {
    // Prepare a new request for the trained model.
    let request = try SNClassifySoundRequest(mlModel: model)
    try streamAnalyzer.add(request, withObserver: resultsObserver)
} catch {
    print("Unable to prepare request: \(error.localizedDescription)")
    return
}

Analyze Streaming Audio

To ensure the best performance, use a dedicated serial dispatch queue to perform your audio analysis. Dispatching this work to a separate queue ensures that the audio engine can continue to efficiently process new buffers while analysis is in progress.

// Serial dispatch queue used to analyze incoming audio buffers.
let analysisQueue = DispatchQueue(label: "com.apple.AnalysisQueue")

Install an audio tap on the audio engine’s input node, which lets you access the stream of data captured by the microphone. As the audio engine delivers new buffers to your audio tap, dispatch each to your analysis queue and ask the analyzer to analyze it at the current frame position.

// Install an audio tap on the audio engine's input node.
audioEngine.inputNode.installTap(onBus: 0,
                                 bufferSize: 8192, // 8k buffer
                                 format: inputFormat) { buffer, time in
    
    // Analyze the current audio buffer.
    self.analysisQueue.async {
        self.streamAnalyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
    }
}

When the analyzer finishes processing the audio, it sends the results to the observer object, which produces output similar to the following. The output indicates what instrument was recognized in the last-processed audio buffers and the model’s level of confidence in that prediction.

Analysis result for audio at time: 1.45
Acoustic Guitar: 92.39% confidence.

...

Analysis result for audio at time: 8.74
Acoustic Guitar: 94.45% confidence.

...

Analysis result for audio at time: 14.15
Tambourine: 85.39% confidence.

...

Analysis result for audio at time: 20.92
Snare Drum: 96.87% confidence.
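
When you’re done capturing audio, you can stop the engine and tell the stream analyzer that no more buffers are coming. The snippet below is a minimal sketch of that teardown; completing the analysis causes the analyzer to call your observer’s requestDidComplete(_:) method.

// Stop capturing audio and remove the tap from the input node.
audioEngine.inputNode.removeTap(onBus: 0)
audioEngine.stop()

// Tell the analyzer the stream has ended so it can finish any pending work
// and notify the observer that the request is complete.
analysisQueue.async {
    self.streamAnalyzer.completeAnalysis()
}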

Analyze Audio Files Offline

You can also perform offline analysis of audio file data by creating a new instance of the SNAudioFileAnalyzer class with the audio file you want to analyze. Unlike SNAudioStreamAnalyzer, which only works with audio data in PCM format, SNAudioFileAnalyzer operates on any compressed or uncompressed format supported by Core Audio.

let audioFileURL = // URL of audio file to analyze (m4a, wav, mp3, etc.)
            
// Create a new audio file analyzer.
audioFileAnalyzer = try SNAudioFileAnalyzer(url: audioFileURL)

As you did when performing streaming analysis, you add a request to the analyzer and an observer that’s called as the analyzer produces results.

// Create a new observer that will be notified of analysis results.
resultsObserver = ResultsObserver()

// Prepare a new request for the trained model.
let request = try SNClassifySoundRequest(mlModel: model)
try audioFileAnalyzer.add(request, withObserver: resultsObserver)

Call the analyze() or analyze(completionHandler:) methods to perform the analysis of the audio data.

// Analyze the audio data.
audioFileAnalyzer.analyze()
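
The analyze() call shown above runs synchronously and blocks the current thread until analysis finishes. If you’d rather not block, one option is analyze(completionHandler:), sketched below under the assumption that the handler’s Boolean parameter indicates whether the analyzer processed the file successfully.

// Analyze the audio data asynchronously.
audioFileAnalyzer.analyze { didSucceed in
    if didSucceed {
        print("Finished analyzing the audio file.")
    } else {
        print("Analysis did not complete successfully.")
    }
}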

See Also

Audio Analyzers

class SNAudioFileAnalyzer

An object you create to analyze an audio file and provide the results to your app.

class SNAudioStreamAnalyzer

An object you create to analyze a stream of audio data and provide the results to your app.
