Audio Session Basics
Read this chapter to learn which iOS development problems audio sessions solve and to get acquainted with audio session features.
Why Does iOS Need to Manage Audio?
On your morning commute you unlock your iPhone and start listening to a new episode of a podcast, which plays back through the built-in speaker. As your seat mate frowns, you quickly plug in your headset and the podcast continues, its output now rerouted and at the volume you last used for the headset. Maybe you start a sudoku game, which plays its own sound effects that mix with the podcast output. A few seconds later the podcast fades to silence, an alarm sounds, and an alert appears, reminding you of a birthday. You dismiss the alert. The podcast fades back in and resumes where it left off. Sounds in your sudoku game resume working.
You might do all this in the space of a minute—and without touching any audio settings. The remarkable simplicity of the audio user experience on an iOS device belies an underlying complexity greater than that on a Mac Pro. The infrastructure that makes the simplicity possible is exposed to your application through an audio session object.
An audio session lets you provide a seamless audio experience in your application. In fact, any iOS application that uses the AV Foundation framework, Audio Queue Services, OpenAL, or the I/O audio unit must use the audio session programming interface to meet Apple’s recommendations as laid out in iOS Human Interface Guidelines.
What Is an Audio Session?
An audio session is the intermediary between your application and iOS for configuring audio behavior. Upon launch, your application automatically gets a singleton audio session. You configure it to express your application’s audio intentions. For example:
Do you intend to mix your application’s sounds with those from other applications (such as the iPod), or do you intend to silence other audio?
How do you want your application to respond to an audio interruption, such as a Clock alarm?
How should your application respond when a user plugs in or unplugs a headset?
Audio session configuration influences all audio activity while your application is running, except for user-interface sound effects that you play. You can query the audio session to discover hardware characteristics of the device your application is on—characteristics such as channel count, sample rate, and availability of audio input. These can vary by device and can change due to user actions while your application runs.
You can explicitly activate and deactivate your audio session. For application sound to play, or for recording to work, your audio session must be active. The system can also deactivate your audio session—which it does, for example, when a phone call arrives or an alarm sounds. Such a deactivation is called an interruption. The audio session APIs provide ways to respond to and recover from interruptions.
What Is an Audio Session Category?
An audio session category is a key that identifies a set of audio behaviors for your application. By setting a category, you indicate your audio intentions to the system—such as whether your audio should continue when the screen locks. The six audio session categories in iOS, along with a set of override and modifier switches, let you customize your app’s audio behavior.
Each audio session category specifies a particular pattern of “yes” and “no” for each of the following behaviors, as detailed in “Audio Session Categories”:
Allows mixing: if yes, audio from other applications (such as the iPod) can continue playing when your application plays sound.
Silenced by the Silent switch and by screen locking: if yes, your audio is silenced when the user moves the Silent switch to silent and when the screen locks. (On iPhone, this switch is called the Ring/Silent switch.)
Supports audio input: if yes, application audio input, such as for recording, is allowed.
Supports audio output: if yes, application audio output, such as for playback, is allowed.
Most applications need only set the category once, at launch. That said, you can change the category as often as you need to, and can do so whether your session is active or inactive. If your session is inactive, your category request is sent when you activate your session. If your session is already active, your category request is sent immediately.
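The timing rule above can be sketched as a small model in plain C. This is only an illustration of the described behavior: SessionModel, Category, and the session_model_ functions are hypothetical names invented here and are not part of AVAudioSession or Audio Session Services.

```c
#include <stdbool.h>

/* Toy model only: hypothetical types, not the real audio session API. */
typedef enum {
    CATEGORY_SOLO_AMBIENT,
    CATEGORY_AMBIENT,
    CATEGORY_PLAYBACK
} Category;

typedef struct {
    bool     active;     /* is the session currently active? */
    Category requested;  /* category the application last asked for */
    Category applied;    /* category the system is currently honoring */
} SessionModel;

/* A new model session uses the default category. The real session is
   activated at launch; this one starts inactive so both cases can be shown. */
static SessionModel session_model_new(void) {
    SessionModel s = { false, CATEGORY_SOLO_AMBIENT, CATEGORY_SOLO_AMBIENT };
    return s;
}

/* Record the request; send it immediately only if the session is active. */
static void session_model_set_category(SessionModel *s, Category c) {
    s->requested = c;
    if (s->active) {
        s->applied = c;
    }
}

/* Activating the session sends whatever category request is pending. */
static void session_model_activate(SessionModel *s) {
    s->active = true;
    s->applied = s->requested;
}
```

In this model, a category set while the session is inactive shows up in the applied field only after session_model_activate is called, whereas a category set on an active session takes effect at once.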
Audio Session Default Behavior—What You Get for Free
An audio session comes with some default behavior. Specifically:
Playback is enabled and recording is disabled.
When the user moves the Silent switch (or Ring/Silent switch on iPhone) to the “silent” position, your audio is silenced.
When the user presses the Sleep/Wake button to lock the screen, or when the Auto-Lock period expires, your audio is silenced.
When your audio starts, other audio on the device—such as iPod audio that was already playing—is silenced.
This collection of behavior is encapsulated in, and named by, the AVAudioSessionCategorySoloAmbient audio session category, which is the default category.
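One way to picture a category is as a record of the four yes/no behaviors. The sketch below transcribes the default behavior listed above into such a record in plain C. CategoryBehavior, kSoloAmbientBehavior, and the two helper functions are hypothetical names used only for illustration; the real categories are opaque keys, and their full behavior table is the one in “Audio Session Categories.”

```c
#include <stdbool.h>

/* Hypothetical record type for illustration; not an audio session API type. */
typedef struct {
    bool allows_mixing;               /* other apps' audio may keep playing */
    bool silenced_by_switch_and_lock; /* Silent switch or screen lock mutes it */
    bool supports_input;              /* recording is allowed */
    bool supports_output;             /* playback is allowed */
} CategoryBehavior;

/* Solo ambient, transcribed from the default behavior described above. */
static const CategoryBehavior kSoloAmbientBehavior = {
    .allows_mixing               = false,
    .silenced_by_switch_and_lock = true,
    .supports_input              = false,
    .supports_output             = true
};

/* True if starting audio under this behavior silences other apps' audio. */
static bool silences_other_audio(const CategoryBehavior *b) {
    return b->supports_output && !b->allows_mixing;
}

/* True if audio keeps playing after the screen locks. */
static bool plays_when_locked(const CategoryBehavior *b) {
    return b->supports_output && !b->silenced_by_switch_and_lock;
}
```

Under this model, the default category silences other audio and does not play while the screen is locked, which matches the list above.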
Your audio session is automatically activated on application launch. This allows you to play (or record, if you specify one of the categories that supports audio input). However, relying on default activation is risky. For example, if an iPhone rings and the user ignores the call, leaving your application running, your audio may no longer play, depending on which playback technology you’re using. The next section describes some strategies for dealing with such issues, and “Handling Audio Interruptions” goes into depth on the topic.
You can take advantage of default behavior as you’re bringing your application to life during development. However, the only times you can safely ignore the audio session for a shipping application are these:
Your application uses System Sound Services or the UIKit playInputClick method for audio and uses no other audio APIs.
System Sound Services is the iOS technology for playing user-interface sound effects and for invoking vibration. It is unsuitable for other purposes. (See System Sound Services Reference and the Audio UI Sounds (SysSound) sample code project.)
The playInputClick method lets you play standard keyboard clicks in a custom input or keyboard accessory view. Its audio session behavior is handled automatically by the system. See “Playing Input Clicks” in Text Programming Guide for iOS.
Your application uses no audio at all.
In all other cases, do not ship your application with the default audio session. You may elect to use the default category, but explicitly using and managing an audio session that employs the “solo ambient” category provides different behavior than using it by default, as described next.
Why a Default Audio Session Usually Isn’t What You Want
If you don’t initialize, configure, and explicitly use your audio session, your application cannot respond to interruptions or audio hardware route changes. Moreover, your application has no voice in OS decisions about audio mixing between applications.
Here are some scenarios that clarify audio session default behavior and how you can change it:
Scenario 1. You write an audio book application. A user begins listening to The Merchant of Venice. As soon as Lord Bassanio enters, Auto-Lock times out, the screen locks, and your audio goes silent.
To ensure that audio continues upon screen locking, configure your audio session to support that. For playback that continues when the screen is locked, use the “playback” category.
Scenario 2. You write a first-person shooter game that uses OpenAL-based sound effects. You also provide a background soundtrack but include an option for the user to turn it off, instead using iPod Library Access to play a song from their iPod library. The user starts up their favorite killing song. The first time they fire a photon torpedo at an enemy ship, the iPod music stops.
To ensure that iPod music is not interrupted, configure your audio session to allow mixing. Use the AVAudioSessionCategoryAmbient category, or modify the playback category to support mixing.
Scenario 3. You write a streaming radio application that uses Audio Queue Services for playback. While a user is listening, a phone call arrives and stops your sound, as expected. The user chooses to ignore the call and dismisses the alert. The user taps Play again to resume the music stream, but nothing happens. To resume playback, the user must quit your application and restart it.
To handle the interruption of an audio queue gracefully, implement delegate methods or write an audio session callback function to allow your application to continue playing automatically or to allow the user to manually resume playing. See “Responding to Audio Session Interruptions.”
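The failure in Scenario 3 and its fix can be sketched as a small state model in plain C. This is a toy under stated assumptions, not the delegate or callback API: PlayerModel and its functions are hypothetical names, and the model simply assumes that playback can start only while the session is active.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not the real interruption API. */
typedef struct {
    bool session_active;               /* audio session is active */
    bool playing;                      /* audio is currently playing */
    bool was_playing_when_interrupted; /* remembered across the interruption */
} PlayerModel;

/* Playback can start only while the session is active. */
static bool player_play(PlayerModel *p) {
    if (!p->session_active) {
        return false; /* Scenario 3: tapping Play does nothing */
    }
    p->playing = true;
    return true;
}

/* The system interrupts: the session is deactivated and audio stops. */
static void player_interruption_began(PlayerModel *p) {
    p->was_playing_when_interrupted = p->playing;
    p->playing = false;
    p->session_active = false;
}

/* The interruption ends: reactivate the session, then resume if needed. */
static void player_interruption_ended(PlayerModel *p) {
    p->session_active = true;
    if (p->was_playing_when_interrupted) {
        p->playing = true;
    }
    p->was_playing_when_interrupted = false;
}
```

An application that never runs the equivalent of player_interruption_ended stays in the state where player_play returns false, which is the stuck Play button described in the scenario.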
How the System Resolves Competing Audio Demands
As your iOS application launches, built-in applications (Messages, iPod, Safari, the phone) may be running in the background. Each of these may want to produce audio: a text message arrives, a podcast you started 10 minutes ago continues playing, and so on.
If you think of an iPhone as an airport, with applications represented as taxiing planes, the system serves as a sort of control tower. Your application can make audio requests and state its desired priority, but final authority over what happens “on the tarmac” comes from the system. You communicate with the “control tower” using the audio session. Figure 1-1 illustrates a typical scenario—your application wanting to use audio while the iPod is already playing. In this scenario, your application, in effect, interrupts the iPod.
In step 1 in the figure, your application requests activation of its audio session. You’d make such a request, for example, on application launch, or perhaps in response to a user tapping the Play button in an audio recording and playback application. In step 2, the system considers the activation request. Specifically, it considers the category you’ve assigned to your audio session. In Figure 1-1, the SpeakHere application uses a category that requires other audio to be silenced.
In steps 3 and 4 in the figure, the system deactivates the iPod application’s audio session, stopping its audio playback. Finally, in step 5, the system activates the SpeakHere application’s audio session and playback can begin.
The system manages competing audio demands by having final authority to activate or deactivate any of the audio sessions present on a device. In deciding, it follows the inviolable rule that “the phone always wins.” No application, no matter how vehemently it demands priority, can trump the phone. When a call arrives, the user gets notified and your application is interrupted—no matter what audio operation you have in progress and no matter what category you have set.
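The arbitration described above can be modeled in a few lines of plain C. This is a simplified sketch, not how iOS is implemented: AppSession, system_activate, and system_phone_call_arrived are hypothetical names, and the model reduces each category to a single “allows mixing” flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not a system API. */
typedef struct {
    const char *name;          /* application name, for readability */
    bool        allows_mixing; /* simplified stand-in for the category */
    bool        active;        /* session currently active */
} AppSession;

/* Steps 2 through 5: consider the requester's category, deactivate the
   other sessions unless the requester allows mixing, then activate it. */
static void system_activate(AppSession *sessions, size_t count,
                            size_t requester) {
    assert(requester < count);
    if (!sessions[requester].allows_mixing) {
        for (size_t i = 0; i < count; i++) {
            if (i != requester) {
                sessions[i].active = false;
            }
        }
    }
    sessions[requester].active = true;
}

/* "The phone always wins": a call interrupts every session,
   regardless of category. */
static void system_phone_call_arrived(AppSession *sessions, size_t count) {
    for (size_t i = 0; i < count; i++) {
        sessions[i].active = false;
    }
}
```

With an active iPod session and a SpeakHere session that does not allow mixing, calling system_activate for SpeakHere leaves only SpeakHere active. If the requester allows mixing, both sessions remain active.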
To ensure that your audio is not disrupted by a phone call or a Clock alarm, the user must turn on Airplane Mode. This highlights another inviolable rule: the user, not your application, is in control of the device. For example, there is no programmatic way to silence a Clock alarm. To prevent an alarm from ruining a recording, the user must turn off scheduled alarms. Similarly, there is no way to programmatically set hardware playback volume. The user is always in control of the hardware volume using the volume buttons on the side of the device.
The Two Audio Session APIs
iOS offers two APIs for working with the audio session object, each with its own advantages:
The AVAudioSession class, described in AVAudioSession Class Reference, provides a convenient, Objective-C interface that works well with the other Objective-C code in your application. This API provides access to a core set of audio session features.
AVAudioSession has two key advantages. First, when you use it to obtain the shared instance of the audio session, the session is implicitly initialized, so there is no separate initialization to perform as there is when using the C API. Second, you can take advantage of simple delegate methods for handling audio interruptions and changes to the hardware configuration such as sample rate and channel count. You can use these delegate methods no matter which audio technology you are using for playback, recording, or processing.
Audio Session Services, described in Audio Session Services Reference, is a full-featured C API that provides access to all basic and advanced features of the audio session.
You need this API for handling audio hardware route changes as well as for making modifications to the standard behavior of audio session categories. For example, the kAudioSessionProperty_OverrideAudioRoute property lets you switch playback from the receiver to the speaker when using the “play and record” audio session category. This API also provides a callback mechanism for handling interruptions.
You can mix and match calls to the two audio session APIs—they are completely compatible with each other. The rest of this document goes into depth on using the various features of these APIs.
Developing with the Audio Session APIs
When you add audio session support to your application, you can run your app in the Simulator or on a device. However, the Simulator does not simulate audio session behavior and does not have access to the hardware features of a device. When running in the Simulator, you cannot:
Invoke an interruption
Change the setting of the Silent switch
Simulate screen lock
Simulate the plugging in or unplugging of a headset
Query audio route information or test audio session category behavior
Test audio mixing behavior—that is, playing your audio along with audio from another application (such as the iPod)
To test the behavior of your audio session code, you need to run on a device. For some development tips when working with the Simulator, see “Running Your App in the Simulator.”