- iOS 12.0+
- Xcode 11.0+
This sample project demonstrates how to use the Speech framework to recognize words from captured audio. When you tap the Start Recording button, SpokenWord begins capturing audio from the device’s microphone. It routes that audio to the APIs of the Speech framework, which process the audio and send back any recognized text. The app displays the recognized text in its text view, continuously updating that text until you tap the Stop Recording button.
Configure the Microphone Using AVFoundation
SpokenWord uses AV Foundation to communicate with the device’s microphone. Specifically, the app configures the shared AVAudioSession object to manage the app’s audio interactions with the rest of the system, and it configures an AVAudioEngine object to retrieve the microphone input.
When you tap the Start Recording button, the app retrieves the shared AVAudioSession object, configures it for recording, and makes it the active session. Activating the session lets the system know that the app needs the microphone resource. If that resource is unavailable, perhaps because the user is talking on the phone, the setActive(_:options:) method throws an error.
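The session setup described above can be sketched roughly as follows. This is a minimal sketch rather than the sample's exact code: the function name activateRecordingSession is hypothetical, and the mode and option values are illustrative assumptions.

```swift
import AVFoundation

/// Hypothetical helper: configures the shared audio session for recording
/// and makes it the active session.
func activateRecordingSession() throws {
    let audioSession = AVAudioSession.sharedInstance()

    // Assumed configuration: record-only category, measurement mode,
    // and ducking of other audio while recording.
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)

    // Throws if the session cannot be activated, for example while the
    // microphone is in use by a phone call.
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
}

/// Hypothetical caller that reports a failure instead of crashing.
func beginRecordingIfPossible() {
    do {
        try activateRecordingSession()
    } catch {
        print("Could not activate the audio session: \(error)")
    }
}
```

Wrapping both calls in one throwing function keeps the error handling in a single place, so the caller can decide whether to show an alert or disable recording.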
Once the session is active, the app retrieves the AVAudioInputNode object from its audio engine and stores it in the local inputNode variable. The input node represents the current audio input path, which can be the device’s built-in microphone or a microphone connected to a set of headphones.
To begin recording, the app installs a tap on the input node and starts the audio engine, which begins collecting samples into an internal buffer. When a buffer is full, the audio engine calls the provided block. The app’s implementation of that block passes the samples directly to the request object’s append(_:) method, which accumulates the audio samples and delivers them to the speech recognition system.
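As an illustration of the tap described above, the following sketch retrieves the input node, installs a tap, and starts the engine. The function name startCapturing, its parameters, and the buffer size of 1024 frames are assumptions made for this example.

```swift
import AVFoundation
import Speech

/// Hypothetical helper: routes microphone buffers from the audio engine
/// into a speech recognition request.
func startCapturing(with audioEngine: AVAudioEngine,
                    into request: SFSpeechAudioBufferRecognitionRequest) throws {
    // The input node represents the current audio input path.
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)

    // The engine calls this block each time a buffer of samples is ready.
    // The buffer size is a requested value chosen for illustration.
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        request.append(buffer)
    }

    audioEngine.prepare()
    try audioEngine.start()
}
```

When recording ends, the matching cleanup would be to stop the engine and call removeTap(onBus:) on the same input node.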
Create the Speech Recognition Request
To recognize speech from live audio, SpokenWord creates and configures an SFSpeechAudioBufferRecognitionRequest object. When it receives recognition results, the app updates its text view accordingly. The app sets the request object’s shouldReportPartialResults property to true, which causes the speech recognition system to return intermediate results as they are recognized.
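Creating the request takes only a couple of lines. The sketch below shows one way to do it; the function name makeLiveAudioRequest is hypothetical.

```swift
import Speech

/// Hypothetical factory for a request that recognizes speech from live audio.
func makeLiveAudioRequest() -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()

    // Ask for intermediate results so the UI can update while the user speaks.
    request.shouldReportPartialResults = true
    return request
}
```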
To begin the speech recognition process, the app calls recognitionTask(with:resultHandler:) on its SFSpeechRecognizer object. That method uses the information in the provided request object to configure the speech recognition system and to begin processing audio asynchronously. Shortly after calling it, the app begins appending audio samples to the request object. When you tap the Stop Recording button, the app stops adding samples and ends the speech recognition process.
Because the request’s shouldReportPartialResults property is true, the recognitionTask(with:resultHandler:) method executes its block periodically to deliver partial results. The app uses that block to update its text view with the text in the bestTranscription property of the result object. If it receives an error instead of a result, the app stops the recognition process altogether.
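The result handling described above might look like the following sketch. The function name startRecognition and its parameters are hypothetical, and the cleanup steps are one plausible reading of "stops the recognition process". The sketch also assumes the recognizer delivers results on its default queue, which is the main queue, so the text view can be updated directly.

```swift
import AVFoundation
import Speech
import UIKit

/// Hypothetical helper: starts a recognition task and shows partial
/// results in a text view as they arrive.
func startRecognition(using recognizer: SFSpeechRecognizer,
                      request: SFSpeechAudioBufferRecognitionRequest,
                      audioEngine: AVAudioEngine,
                      textView: UITextView) -> SFSpeechRecognitionTask {
    return recognizer.recognitionTask(with: request) { result, error in
        var isFinal = false

        if let result = result {
            // Show the best transcription received so far.
            textView.text = result.bestTranscription.formattedString
            isFinal = result.isFinal
        }

        if error != nil || isFinal {
            // Assumed cleanup: stop capturing audio and close the request.
            audioEngine.stop()
            audioEngine.inputNode.removeTap(onBus: 0)
            request.endAudio()
        }
    }
}
```

Keeping a reference to the returned SFSpeechRecognitionTask lets the app cancel recognition later if it needs to.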
Respond to Availability Changes for Speech Recognition
The availability of speech recognition services can change at any time. For some languages, speech recognition relies on Apple servers, which requires an active Internet connection. If that Internet connection is lost, your app must be ready to handle the disruption of service that can occur.
Whenever the availability of speech recognition services changes, the SFSpeechRecognizer object notifies its delegate. SpokenWord provides a delegate object and implements the speechRecognizer(_:availabilityDidChange:) method to respond to availability changes. When services become unavailable, the method disables the Start Recording button and updates its title. When services become available, the method reenables the button and restores its original title.
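A delegate along these lines could handle the availability callback. This is a sketch under assumptions: the class name AvailabilityObserver is hypothetical, and the "Recognition Not Available" title is a placeholder, since the text does not state the exact wording. The update is dispatched to the main queue as a precaution.

```swift
import Speech
import UIKit

/// Hypothetical delegate that mirrors speech recognition availability
/// in the state of a record button.
final class AvailabilityObserver: NSObject, SFSpeechRecognizerDelegate {
    private let recordButton: UIButton

    init(recordButton: UIButton) {
        self.recordButton = recordButton
        super.init()
    }

    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        let button = recordButton
        // UIKit must be touched on the main thread.
        DispatchQueue.main.async {
            button.isEnabled = available
            if available {
                button.setTitle("Start Recording", for: .normal)
            } else {
                // Placeholder title for the disabled state.
                button.setTitle("Recognition Not Available", for: .disabled)
            }
        }
    }
}
```

To use it, assign an instance to the recognizer's delegate property and keep a strong reference to it, because the delegate property is weak.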