Create a seamless speech experience in your apps
Augment your app's accessibility experience with speech synthesis: Discover the best times and places to add speech APIs so that everyone who uses your app can benefit. Learn how to use AVSpeechSynthesizer to complement assistive technologies like VoiceOver, and when to implement alternative APIs. And we'll show you how to route audio to the appropriate source and create apps that integrate speech seamlessly for all who need or want it. To get the most out of this session, you should be familiar with AVFoundation and the basics of speech synthesis. For an overview, watch “AVSpeechSynthesizer: Making iOS Talk.”
Hello and welcome to WWDC.
Hi, I'm Dan, and today we're going to talk about how to create a seamless speech experience in your apps.
In this talk, I'll go over when to leverage AVSpeechSynthesizer and when you might want to consider other APIs instead. Then, I'll go over some of the API basics before jumping into some more advanced topics, like how to route speech through an outgoing call or how to opt speech audio out of your application's shared audio session.
Let's start by taking a look at a couple of different examples of when you might consider using speech in your apps. If you're trying to provide speech for everyone using your app, AVSpeechSynthesizer is likely a good choice. Perhaps your app is designed to be used without looking at the screen, like how the Maps app uses speech to provide spoken directions.
If you're trying to provide speech only for people using your app with a screen reader or other assistive technology, AVSpeechSynthesizer may not be the right choice.
All of our devices ship with VoiceOver, a built-in screen reader with a rich feature set. So there's no need for you to create your own screen reader that's local to your app using AVSpeechSynthesizer.
Instead, make your app accessible using the UIAccessibility APIs. If you need help getting started with this, we have some great talks from past WWDCs that can help.
For the cases where you want to provide infrequent additional speech to VoiceOver, such as describing an event that has occurred, you can do so by posting an announcement notification. Use UIAccessibility.post and pass in a notification type of announcement and a string that you'd like to have spoken. This will delegate the speech request to VoiceOver so it can manage it for you.
If you're building an app that has an experience centered around speech, AVSpeechSynthesizer might be able to provide you with some additional flexibility even if your app is going to be used in conjunction with an assistive technology.
Augmentative and Alternative Communication, or AAC, apps are one example of this kind of experience. And Proloquo2Go is an AAC app that does a great job.
It's a tool for people who are nonverbal, and it uses pictures and other visual cues to help people construct sentences that are spoken aloud via AVSpeechSynthesizer so they may communicate with others.
Apps designed for people who are blind or low-vision might also have experiences centered around speech.
Seeing AI is an app that helps people interact with the world around them by describing objects in their environment.
Notice that Seeing AI is still completely accessible to VoiceOver, and all of its UI is properly labeled.
AVSpeechSynthesizer can still help here: it's used to describe the objects in the viewfinder, and it gives the app additional control over how and when those speech requests are spoken. Now that we know when it's appropriate to use AVSpeechSynthesizer, let's dive into some of the API.
Getting started is pretty simple. All you need to do is create an AVSpeechSynthesizer object.
Make sure that this object is retained until speech is concluded.
Next, create an AVSpeechUtterance, passing in the string you'd like to have spoken.
Finally, call the speak method on your synthesizer, passing in the utterance you've created. Getting started is that simple. If you're trying to provide basic amounts of speech for everyone using your app, this is likely all you'll need to do. By default, AVSpeechSynthesizer will configure your utterance using the settings on your device in Accessibility Spoken Content. Keep in mind that although Siri voices are available to be selected in Spoken Content Settings, they are not available through the AVSpeechSynthesizer API.
In the case that a Siri voice is the selected voice, we'll automatically configure your utterance using an appropriate fallback voice that matches the same language code as the selected Siri voice. People using your app with an assistive technology like VoiceOver likely have other speech settings configured specifically for that technology. Until now, it hasn't been possible to request that AVSpeechSynthesizer respect those settings instead of the ones on your device in Spoken Content.
Well, this year we're changing that with the new prefersAssistiveTechnologySettings API.
Setting this property to "true" on your AVSpeechUtterance will request that AVSpeechSynthesizer apply the settings from the currently running assistive technology to your utterance.
This includes speech properties such as the selected voice, speaking rate and pitch.
It's important to note that these properties will take precedence over any properties you are explicitly setting on the utterance.
If no assistive technology is running, we'll continue to use the settings from Spoken Content or whichever properties you are setting on the utterance.
We encourage you to set this on your utterances, especially if your app is likely to be used with an assistive technology like VoiceOver.
However, this API may not be appropriate for all apps, like AAC apps, so make sure you evaluate your use case accordingly. Let's take a look at how this API affects speech. In this app, tapping on each of the buttons will speak the string "Hello, world." The first button uses the default settings, whereas the second uses the prefersAssistiveTechnologySettings API. On this device, VoiceOver is running with a non-default voice and a higher speaking rate. Let's listen to how the speech sounds.
Speak "Hello, world" button. Hello, world. Speak "Hello, world" button. Hello, world.
Notice how the second speech request used the same voice and speaking rate as VoiceOver so that speech fit in much more seamlessly with the VoiceOver experience. Meanwhile, the first button spoke the string quite differently, and it may not be what the person who is using this app would've expected.
Sometimes, you might want to customize speech within your app rather than respecting the settings on the device. For example, an AAC app might want to provide people with a list of voices to choose from, and the ability to customize other speech properties, so they can create a voice that sounds more unique to them. You can set a voice on your utterance, choosing one using either a language code or an identifier. You can also get a list of voices available on the system by calling the speechVoices() method on AVSpeechSynthesisVoice. This list will be updated as new voices are downloaded in Accessibility Settings. You can further customize speech by setting additional properties on your AVSpeechUtterance, such as the speaking rate, a pitch multiplier and a volume. The AVSpeechSynthesizer talk from WWDC18 covers all of these properties in more detail, so I encourage you to watch that if you'd like to learn more.
An API to consider if you're building an app centered around speech is the mixToTelephonyUplink property.
Setting this property to "true" on your AVSpeechSynthesizer will route speech through the currently active outgoing call, such as a phone call or a FaceTime call.
If no call is active, we'll continue to route speech through the default audio route. This is a really powerful API for apps like AAC apps, as it can empower people who are nonverbal to communicate using more traditional methods.
Another API to consider if you're building an app that has an experience centered around speech, or if your app uses a lot of other audio, is the usesApplicationAudioSession API. By default, this is set to "true" on your AVSpeechSynthesizer, and speech audio will use your application's shared audio session.
You can set it to "false" to delegate management of speech audio to the system.
This way, we'll take care of certain things for you, such as handling audio interruptions and setting your audio session to active or inactive.
Additionally, speech will duck and mix with the other audio coming from your app, so you don't have to worry about it interfering with things like sound effects, music or video.
That's an overview of some of the AVSpeechSynthesizer API. Remember, AVSpeechSynthesizer is not a replacement for the UIAccessibility APIs and, instead, should be used in conjunction with those APIs to create a great experience.
Use the prefersAssistiveTechnologySettings API to request that AVSpeechSynthesizer apply the speech settings from the currently running assistive technology to your utterances.
And if you're building an experience centered around speech, consider whether the mixToTelephonyUplink or usesApplicationAudioSession APIs can improve the experience for people using your app.
Speech can be a really powerful tool for people who use assistive technologies, so I really encourage you to take a look at how you can incorporate speech into your apps to empower those people to do some incredible things. Thanks for watching, and enjoy WWDC.
-
1:25 - Post an Announcement to the Running Assistive Technology
UIAccessibility.post(notification: .announcement, argument: "Hello World")
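If you also need to know when VoiceOver has finished speaking an announcement, you can observe UIAccessibility.announcementDidFinishNotification. A minimal sketch, assuming you remove the observer token when you no longer need it; this snippet isn't from the session itself:

import UIKit

// Observe when the running assistive technology finishes an announcement.
let token = NotificationCenter.default.addObserver(
    forName: UIAccessibility.announcementDidFinishNotification,
    object: nil,
    queue: .main
) { notification in
    // The string that was announced and whether it finished successfully.
    let announced = notification.userInfo?[UIAccessibility.announcementStringValueUserInfoKey] as? String
    let success = notification.userInfo?[UIAccessibility.announcementWasSuccessfulUserInfoKey] as? Bool
    print("Announced: \(announced ?? ""), finished: \(success ?? false)")
}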
-
2:55 - Getting Started with AVSpeechSynthesizer
// Retain the synthesizer until speech has concluded
self.synthesizer = AVSpeechSynthesizer()

let utterance = AVSpeechUtterance(string: "Hello World")
self.synthesizer.speak(utterance)
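Because the synthesizer must stay alive until speech concludes, a common pattern is to hold it as a property and adopt AVSpeechSynthesizerDelegate to find out when an utterance finishes. A minimal sketch; SpeechController is an illustrative name, not part of the session's code:

import AVFoundation

class SpeechController: NSObject, AVSpeechSynthesizerDelegate {
    // Retain the synthesizer so it isn't deallocated mid-speech.
    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        synthesizer.speak(utterance)
    }

    // Called by the synthesizer when it finishes speaking an utterance.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        print("Finished speaking: \(utterance.speechString)")
    }
}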
-
4:08 - Respecting the Currently Running Assistive Technology's Speech Settings
self.synthesizer = AVSpeechSynthesizer()

let utterance = AVSpeechUtterance(string: "Hello World")

// Request that the currently running assistive technology's speech settings be applied
utterance.prefersAssistiveTechnologySettings = true

self.synthesizer.speak(utterance)
-
5:42 - Customizing Speech - Choosing a Voice
let utterance = AVSpeechUtterance(string: "Hello World")

// Choose a voice using a language code
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

// Choose a voice using an identifier
utterance.voice = AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)

// Get a list of installed voices
let voices = AVSpeechSynthesisVoice.speechVoices()
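If you're presenting a voice picker, you might filter the installed voices by language before showing them. A minimal sketch; the filtering and sorting choices here are illustrative, not part of the session's code:

import AVFoundation

// List installed voices for a single language, sorted by name,
// e.g. to populate a voice picker in an AAC app.
let englishVoices = AVSpeechSynthesisVoice.speechVoices()
    .filter { $0.language == "en-US" }
    .sorted { $0.name < $1.name }

for voice in englishVoices {
    print(voice.name, voice.identifier)
}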
-
6:16 - Customizing Speech - Pitch and Rate
let utterance = AVSpeechUtterance(string: "Hello World")

// Choose a rate between 0 and 1; 0.5 is the default rate
utterance.rate = 0.75

// Choose a pitch multiplier between 0.5 and 2; 1 is the default multiplier
utterance.pitchMultiplier = 1.5

// Choose a volume between 0 and 1; 1 is the default value
utterance.volume = 0.5
-
6:34 - Mix Speech With an Outgoing Call
self.synthesizer = AVSpeechSynthesizer()

// Route speech through the active outgoing call, if there is one
self.synthesizer.mixToTelephonyUplink = true
-
7:02 - Opting Speech Out of Application's Audio Session
self.synthesizer = AVSpeechSynthesizer()

// Let the system manage speech audio instead of the app's shared audio session
self.synthesizer.usesApplicationAudioSession = false
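Conversely, if you leave usesApplicationAudioSession at its default of "true", speech uses your app's shared audio session, which your app configures itself. A minimal sketch, assuming a playback category that ducks other audio; your category, mode, and options may differ:

import AVFoundation

// With usesApplicationAudioSession left at its default (true), speech
// goes through the app's shared session, configured by the app itself.
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(.playback, mode: .spokenAudio, options: [.duckOthers])
    try session.setActive(true)
} catch {
    print("Failed to configure audio session: \(error)")
}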