With the user's permission, recognize live and prerecorded speech, and receive transcriptions, alternative interpretations, and confidence levels.
- iOS 10.0+
iOS users are accustomed to using Siri to interact with apps and—when a keyboard is visible—using dictation to capture their speech. The Speech APIs let you extend and enhance the speech recognition experience within your app, without requiring a keyboard.
Getting Started with Speech Recognition
The Speech APIs perform speech recognition by communicating with Apple's servers or using an on-device speech recognizer, if available. To find out if a speech recognizer is available for a specific language, you adopt the SFSpeechRecognizerDelegate protocol.
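As a minimal sketch of checking availability, the class below adopts SFSpeechRecognizerDelegate and records the latest availability value. The class name `AvailabilityMonitor` and its `isRecognitionAvailable` property are hypothetical names chosen for illustration; only `SFSpeechRecognizer`, its `isAvailable` property, and the delegate callback come from the Speech framework.

```swift
import Speech

// Hypothetical helper (not part of the Speech framework) that tracks whether
// recognition is currently available for one locale.
final class AvailabilityMonitor: NSObject, SFSpeechRecognizerDelegate {
    private let recognizer: SFSpeechRecognizer?
    private(set) var isRecognitionAvailable = false

    init(locale: Locale) {
        // The initializer returns nil when the locale is not supported.
        recognizer = SFSpeechRecognizer(locale: locale)
        super.init()
        recognizer?.delegate = self
        isRecognitionAvailable = recognizer?.isAvailable ?? false
    }

    // Called by the framework when the recognizer's availability changes.
    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        isRecognitionAvailable = available
    }
}
```

An app might create one monitor per language it offers and enable or disable its speech controls whenever the stored value changes.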
Because your app may need to connect to the servers to perform recognition, it's essential that you respect the privacy of your users and treat their utterances as sensitive data. For this reason, you must get the user's explicit permission before you initiate speech recognition.
To start using speech recognition in your app:
Write a sentence that tells users how they can use speech recognition in your app.
For example, if your to-do list app changes an item's status to finished when the user speaks "done," you might write "Lets you mark an item as finished by saying Done."
Add the NSSpeechRecognitionUsageDescription key to your Info.plist file and provide the sentence you wrote as the string value.
Use requestAuthorization(_:) to request the user's permission by displaying the sentence you wrote in an alert.
If the user denies permission (or if speech recognition is unavailable), handle it gracefully. For example, you might disable user interface items that indicate the availability of speech recognition.
After the user grants your app permission to perform speech recognition, create an SFSpeechRecognizer object and create a speech recognition request.
Use the SFSpeechURLRecognitionRequest class to perform recognition on a prerecorded, on-disk audio file, and use the SFSpeechAudioBufferRecognitionRequest class to recognize live audio or in-memory content.
Pass the request to your SFSpeechRecognizer object to begin recognition.
Speech is recognized incrementally, so your recognizer's handler may be called more than once. (Check the value of the isFinal property to find out when recognition is finished.) If you're working with live audio, you use SFSpeechAudioBufferRecognitionRequest and append audio buffers to a request during the recognition process.
When recording is finished, signal the recognizer that no more audio is expected, so that recognition can finish. Note that starting a new recognition task before the previous one finishes interrupts the in-progress task.
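The steps above can be sketched for live audio roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: `canRecognize` and `LiveTranscriber` are hypothetical names, the audio session settings are one plausible choice for recording, and the app is assumed to also declare NSMicrophoneUsageDescription in its Info.plist file so that it can use the microphone.

```swift
import AVFoundation
import Speech

// Hypothetical helper: only `.authorized` allows recognition to start.
func canRecognize(_ status: SFSpeechRecognizerAuthorizationStatus) -> Bool {
    switch status {
    case .authorized:
        return true
    case .denied, .restricted, .notDetermined:
        return false
    @unknown default:
        return false
    }
}

// Hypothetical class name; a sketch of live-audio recognition.
final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer() // nil if the current locale is unsupported
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    // Displays the NSSpeechRecognitionUsageDescription sentence in an alert.
    func requestPermission(_ completion: @escaping (Bool) -> Void) {
        SFSpeechRecognizer.requestAuthorization { status in
            // The handler is not guaranteed to run on the main queue.
            DispatchQueue.main.async { completion(canRecognize(status)) }
        }
    }

    // Starts recording and reports each partial or final transcription.
    func start(onText: @escaping (_ text: String, _ isFinal: Bool) -> Void) throws {
        guard let recognizer = recognizer, recognizer.isAvailable else { return }

        // Assumed audio session setup for recording on iOS.
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.record, mode: .measurement, options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)

        let request = SFSpeechAudioBufferRecognitionRequest()
        self.request = request

        // Append microphone buffers to the request while recording.
        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024,
                         format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        // The handler may be called several times; isFinal marks the last result.
        task = recognizer.recognitionTask(with: request) { [weak self] result, error in
            if let result = result {
                onText(result.bestTranscription.formattedString, result.isFinal)
            }
            if error != nil || result?.isFinal == true {
                self?.stopRecording()
            }
        }
    }

    // Signals that no more audio is expected so recognition can finish.
    func finish() {
        stopRecording()
        request?.endAudio()
    }

    private func stopRecording() {
        guard audioEngine.isRunning else { return }
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
    }
}
```

A caller would typically invoke `requestPermission` first, call `start` only when the completion reports `true`, and call `finish` when the user stops speaking. If permission is denied, the app can disable its speech-related controls instead.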
Creating a Speech Recognizer
Here is a way to create a simple recognizer that defaults to the user's current locale and initiates speech recognition.
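A minimal sketch of such a recognizer, assuming the audio to recognize is a file on disk: the function name `recognizeFile(at:)` and the `print` calls are illustrative choices, and the caller is assumed to have already obtained the user's permission.

```swift
import Speech

// Hypothetical function name; `url` is assumed to point at an audio file on disk.
@discardableResult
func recognizeFile(at url: URL) -> SFSpeechRecognitionTask? {
    // The no-argument initializer uses the user's current locale and
    // returns nil if that locale is not supported.
    guard let recognizer = SFSpeechRecognizer() else {
        return nil
    }
    // The recognizer can be temporarily unavailable, for example when offline.
    guard recognizer.isAvailable else {
        return nil
    }

    let request = SFSpeechURLRecognitionRequest(url: url)
    return recognizer.recognitionTask(with: request) { result, error in
        guard let result = result else {
            // Recognition failed; inspect the error and handle it.
            print("Recognition failed: \(String(describing: error))")
            return
        }
        // The handler may run several times; act only on the final result here.
        if result.isFinal {
            print("Speech in the file is \(result.bestTranscription.formattedString)")
        }
    }
}
```

Returning the task lets the caller cancel recognition if the user leaves the screen before it finishes.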
Best Practices for a Great User Experience
Be prepared to handle the failures that can be caused by reaching speech recognition limits. Because speech recognition is a network-based service, limits are enforced so that the service can remain freely available to all apps. Individual devices may be limited in the number of recognitions that can be performed per day, and an individual app may be throttled globally based on the number of requests it makes per day. For example, if a recognition request fails quickly (within a second or two of starting), the recognition service may be temporarily unavailable to your app and you may want to ask users to try again later.
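One way to act on the "fails quickly" hint is to time each attempt and classify early failures separately. The sketch below is plain Swift with no Speech framework calls; the `FailureAdvice` type, the `advice(forFailureAfter:quickFailureWindow:)` function, and the two-second default are assumptions made for illustration, based only on the "within a second or two" wording.

```swift
import Foundation

// Hypothetical classification of a failed recognition attempt.
enum FailureAdvice: Equatable {
    case tryAgainLater   // failed almost immediately; the service may be limited
    case reportError     // failed after running for a while
}

// `quickFailureWindow` is an assumed threshold, not a documented value.
func advice(forFailureAfter elapsed: TimeInterval,
            quickFailureWindow: TimeInterval = 2.0) -> FailureAdvice {
    return elapsed <= quickFailureWindow ? .tryAgainLater : .reportError
}

// Example: a task that failed 0.8 seconds after it started.
let started = Date(timeIntervalSinceReferenceDate: 0)
let failed = Date(timeIntervalSinceReferenceDate: 0.8)
let suggestion = advice(forFailureAfter: failed.timeIntervalSince(started))
```

In an app, the start time would be captured just before the recognition task begins, and the elapsed time computed inside the result handler when it receives an error.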
Plan for a one-minute limit on audio duration. Speech recognition can place a relatively high burden on battery life and network usage. In iOS 10, utterance audio duration is limited to about one minute, which is similar to the limit for keyboard-related dictation.
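To plan for the duration limit with live audio, an app could keep a running total of the audio it has appended and stop before reaching about one minute. The sketch below is plain Swift; `AudioBudget` is a hypothetical type, and the 60-second default is an assumption taken from the "about one minute" figure rather than an exact framework constant.

```swift
import Foundation

// Hypothetical helper that tracks how much audio has been appended so far.
struct AudioBudget {
    let limit: TimeInterval               // assumed limit in seconds
    private(set) var used: TimeInterval = 0

    init(limit: TimeInterval = 60) {
        self.limit = limit
    }

    // Adds one buffer's duration and returns true while the total
    // stays under the limit, meaning more audio may be appended.
    mutating func consume(frames: Int, sampleRate: Double) -> Bool {
        used += Double(frames) / sampleRate
        return used < limit
    }
}
```

When `consume` returns `false`, the app can stop recording and signal the recognizer that no more audio is expected, then tell the user that the recording limit was reached.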
Remind the user when your app is recording. For example, you can play "now recording" sounds and display a visual indicator that helps users understand that they're being actively recorded. You can also display speech as it is being recognized so that users understand what your app is doing and when recognition errors occur.
Do not perform speech recognition on private or sensitive information. Some speech is simply not appropriate for recognition. Avoid sending passwords, health or financial data, and other sensitive speech for recognition.