Framework

Speech

With the user's permission, recognize live and prerecorded speech, and receive transcriptions, alternative interpretations, and confidence levels.

Overview

iOS users are accustomed to using Siri to interact with apps and—when a keyboard is visible—using dictation to capture their speech. The Speech APIs let you extend and enhance the speech recognition experience within your app, without requiring a keyboard.

Getting Started with Speech Recognition

The Speech APIs perform speech recognition by communicating with Apple's servers or using an on-device speech recognizer, if available. To find out if a speech recognizer is available for a specific language, you adopt the SFSpeechRecognizerDelegate protocol.
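As a rough illustration of adopting the delegate, the sketch below watches one recognizer for availability changes. The class name `AvailabilityObserver` and its `onChange` callback are hypothetical names chosen for this example; only `SFSpeechRecognizer(locale:)`, `isAvailable`, and `speechRecognizer(_:availabilityDidChange:)` come from the framework.

```swift
import Foundation
import Speech

// Hypothetical wrapper that reports availability changes for one locale.
final class AvailabilityObserver: NSObject, SFSpeechRecognizerDelegate {
    // The initializer returns nil when the locale is not supported.
    private let recognizer: SFSpeechRecognizer?

    // Hypothetical callback invoked whenever availability changes.
    var onChange: ((Bool) -> Void)?

    init(locale: Locale) {
        recognizer = SFSpeechRecognizer(locale: locale)
        super.init()
        recognizer?.delegate = self
    }

    // False when the locale is unsupported or the recognizer is unavailable.
    var isAvailable: Bool {
        return recognizer?.isAvailable ?? false
    }

    // SFSpeechRecognizerDelegate
    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        onChange?(available)
    }
}
```

You might keep one observer per language your app offers and use `onChange` to enable or disable the related controls.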

Because your app may need to connect to the servers to perform recognition, it's essential that you respect the privacy of your users and treat their utterances as sensitive data. For this reason, you must get the user's explicit permission before you initiate speech recognition.

To start using speech recognition in your app:

  1. Write a sentence that tells users how they can use speech recognition in your app.

    For example, if your to-do list app changes an item's status to finished when the user speaks "done," you might write "Lets you mark an item as finished by saying Done."

  2. Add the NSSpeechRecognitionUsageDescription key to your Info.plist file and provide the sentence you wrote as the string value.

  3. Use requestAuthorization(_:) to request the user's permission by displaying the sentence you wrote in an alert.

    If the user denies permission (or if speech recognition is unavailable), handle it gracefully. For example, you might disable user interface items that indicate the availability of speech recognition.

  4. After the user grants your app permission to perform speech recognition, create an SFSpeechRecognizer object and create a speech recognition request.

    Use the SFSpeechURLRecognitionRequest class to perform recognition on a prerecorded, on-disk audio file, and use the SFSpeechAudioBufferRecognitionRequest class to recognize live audio or in-memory content.

  5. Pass the request to your SFSpeechRecognizer object to begin recognition.

    Speech is recognized incrementally, so your recognizer's handler may be called more than once. (Check the value of the isFinal property to find out when recognition is finished.) If you're working with live audio, use SFSpeechAudioBufferRecognitionRequest and append audio buffers to the request while recognition is in progress.

  6. When recording is finished, call endAudio() on the buffer request to signal that no more audio is expected, so that recognition can finish. Note that starting a new recognition task before the previous one finishes interrupts the in-progress task.
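The authorization step above can be sketched as follows. This is a minimal example, not the only way to structure it: `canRecognize(_:)` and `requestSpeechPermission(completion:)` are hypothetical helpers, and the code assumes your Info.plist already contains the NSSpeechRecognitionUsageDescription key. Because the authorization handler is not guaranteed to run on the main queue, the sketch hops to the main queue before reporting the outcome.

```swift
import Foundation
import Speech

// Hypothetical helper: only the .authorized status permits recognition.
func canRecognize(_ status: SFSpeechRecognizerAuthorizationStatus) -> Bool {
    switch status {
    case .authorized:
        return true
    case .denied, .restricted, .notDetermined:
        return false
    @unknown default:
        return false
    }
}

// Hypothetical helper: asks for permission and reports the outcome on the
// main queue so the caller can update user interface items safely.
func requestSpeechPermission(completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        DispatchQueue.main.async {
            completion(canRecognize(status))
        }
    }
}
```

A caller might use the Boolean to enable or disable the controls that start recognition, which is one way to handle a denial gracefully.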

Creating a Speech Recognizer

Listing 1 shows one way to create a simple recognizer that defaults to the user's current locale and start recognition on a prerecorded audio file.

Listing 1

Getting a speech recognizer and making a recognition request

import Speech

func recognizeFile(url: URL) {
   // Create a recognizer for the user's current locale
   guard let recognizer = SFSpeechRecognizer() else {
      // A recognizer is not supported for the current locale
      return
   }

   if !recognizer.isAvailable {
      // The recognizer is not available right now
      return
   }

   // Build a request for the on-disk audio file and start recognition
   let request = SFSpeechURLRecognitionRequest(url: url)
   recognizer.recognitionTask(with: request) { (result, error) in
      guard let result = result else {
         // Recognition failed, so check error for details and handle it
         return
      }
      if result.isFinal {
         // Recognition is finished, so print the complete transcription
         print("Speech in the file is \(result.bestTranscription.formattedString)")
      }
   }
}
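
Listing 1 handles a prerecorded file. For live audio, a comparable sketch appends microphone buffers to an SFSpeechAudioBufferRecognitionRequest and calls endAudio() when recording stops. The class name `LiveTranscriber`, the `onText` callback, and the buffer size of 1024 are assumptions made for this example. Audio session configuration and microphone permission (NSMicrophoneUsageDescription) are omitted and would be needed in a real app.

```swift
import AVFoundation
import Speech

// Hypothetical wrapper around AVAudioEngine and a buffer-based request.
final class LiveTranscriber {
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    private var tapInstalled = false

    var isRunning: Bool { return audioEngine.isRunning }

    // onText receives the best transcription so far and whether it is final.
    func start(with recognizer: SFSpeechRecognizer,
               onText: @escaping (String, Bool) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        // Feed microphone buffers to the request as they arrive.
        let input = audioEngine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        tapInstalled = true

        audioEngine.prepare()
        try audioEngine.start()

        // The handler may be called several times; isFinal marks the last result.
        task = recognizer.recognitionTask(with: request) { [weak self] result, error in
            if let result = result {
                onText(result.bestTranscription.formattedString, result.isFinal)
            }
            if error != nil || result?.isFinal == true {
                self?.stop()
            }
        }
    }

    // Stops recording and tells the request that no more audio is expected.
    func stop() {
        if audioEngine.isRunning {
            audioEngine.stop()
        }
        if tapInstalled {
            audioEngine.inputNode.removeTap(onBus: 0)
            tapInstalled = false
        }
        request?.endAudio()
        request = nil
        task = nil
    }
}
```

Calling `stop()` more than once is harmless in this sketch, which is why the result handler can call it again after the final result arrives.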

Best Practices for a Great User Experience

Be prepared to handle the failures that can be caused by reaching speech recognition limits. Because speech recognition is a network-based service, limits are enforced so that the service can remain freely available to all apps. Individual devices may be limited in the number of recognitions that can be performed per day and an individual app may be throttled globally, based on the number of requests it makes per day. For example, if a recognition request fails quickly (within a second or two of starting), the recognition service may be temporarily unavailable to your app and you may want to ask users to try again later.
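One way to act on the quick-failure hint is to time each request and treat very early failures differently from others. The helper below is a sketch only: `looksThrottled` is a hypothetical name, and the two-second default threshold is an assumption based on the "within a second or two" guidance, not a value defined by the Speech framework.

```swift
import Foundation

// Hypothetical helper: returns true when a request failed soon enough after
// starting that the service may be temporarily unavailable to the app.
func looksThrottled(startedAt: Date,
                    failedAt: Date,
                    threshold: TimeInterval = 2.0) -> Bool {
    let elapsed = failedAt.timeIntervalSince(startedAt)
    return elapsed >= 0 && elapsed <= threshold
}
```

You might record `Date()` just before starting a recognition task, then call this helper in the error path to decide whether to show a "try again later" message.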

Plan for a one-minute limit on audio duration. Speech recognition can place a relatively high burden on battery life and network usage. In iOS 10, utterance audio duration is limited to about one minute, which is similar to the limit for keyboard-related dictation.
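To stay within the limit, you might stop recording yourself shortly before a minute has elapsed. The sketch below uses a cancellable DispatchWorkItem for that purpose. `RecordingCutoff` is a hypothetical class, and the 55-second default is an assumed safety margin rather than a documented value.

```swift
import Foundation

// Hypothetical helper that runs a closure once after a delay unless canceled.
final class RecordingCutoff {
    private var workItem: DispatchWorkItem?

    // Schedules onExpire; re-arming cancels any previously scheduled cutoff.
    func arm(after seconds: TimeInterval = 55,
             on queue: DispatchQueue = .main,
             onExpire: @escaping () -> Void) {
        cancel()
        let item = DispatchWorkItem(block: onExpire)
        workItem = item
        queue.asyncAfter(deadline: .now() + seconds, execute: item)
    }

    // Call this when the user stops recording before the cutoff fires.
    func cancel() {
        workItem?.cancel()
        workItem = nil
    }
}
```

Arm the cutoff when recording starts, pass a closure that stops your audio capture and ends the request, and cancel it if the user finishes early.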

Remind the user when your app is recording. For example, you can play "now recording" sounds and display a visual indicator that helps users understand that they're being actively recorded. You can also display speech as it is being recognized so that users understand what your app is doing and when recognition errors occur.

Do not perform speech recognition on private or sensitive information. Some speech is simply not appropriate for recognition. Avoid sending passwords, health or financial data, and other sensitive speech for recognition.

Topics

Getting a Speech Recognizer

class SFSpeechRecognizer

A supported speech recognizer.

Requesting Recognition and Monitoring Progress

class SFSpeechAudioBufferRecognitionRequest

A request to recognize speech provided in audio buffers.

class SFSpeechRecognitionRequest

A request to recognize speech from an audio source.

class SFSpeechURLRecognitionRequest

A request to recognize speech in a recorded audio file.

class SFSpeechRecognitionTask

A speech recognition task that lets you monitor recognition progress.

Working with Recognition Results and Transcriptions

class SFSpeechRecognitionResult

A recognized utterance that corresponds to a segment of recorded speech and that contains one or more transcription hypotheses.

class SFTranscription

A hypothesized textual representation of recognized speech.

class SFTranscriptionSegment

A part of the entire hypothesized transcription.

protocol SFSpeechRecognitionTaskDelegate

A protocol that supports complex or multi-utterance speech recognition requests.

protocol SFSpeechRecognizerDelegate

A protocol that helps you track the availability of a speech recognizer.

Constants

Speech Enumerations

Constants that specify types of speech recognition, the state of a recognition task, and the status of the authorization request.