Discover how spatial audio can help you provide a theater-like experience for media in your apps and on the web. We'll show you how you can easily bring immersive audio to those listening with compatible hardware, and how to automatically deliver different listening experiences depending on someone's bandwidth or connection — all with little to no change to your code. And get recommendations on how you can tailor the experience in your app and use spatial audio to tell stories in new, exciting ways.
♪ Bass music playing ♪ ♪ Simon Goldrei: Hello! In this session, we'll explore how to immerse your app in spatial audio.
I'm Simon and I'm part of the streaming media team here at Apple.
Do you want to offer your customers, and differentiate your service, with the experience of a movie theater? Would you like to offer immersive audio with rendering of multipoint audio sources that provides that sense of being there? Can we do all this from the convenience of the mobile device in our customer's pocket? In this session, we're going to explore spatial audio and how to deliver it with the Core AVFoundation playback APIs and WebKit.
We've got a full agenda.
Together we'll cover what spatial audio is by contrasting it with existing technology we're familiar with.
Then we'll enumerate the technologies and treatments that the feature offers.
In the second half, I'll introduce API and highlight different treatments that are applied.
Next up, we'll review the levels of support for spatial audio in prior releases so that you can target features appropriately.
I'll also reveal what's new this year in our fall 2021 OS releases.
Then to top it all off, we'll end with a demo.
I'm excited to share that with you.
You're in for a treat! To understand what spatial audio is, let's start by considering classic stereo.
Be it headphones or stereo speaker arrangements of yesteryear, the soundstage we perceive is rather limited.
We don't hear sounds from behind us, directly in front, or above us.
It's missing lifelike, positional reproduction.
And in the case of headphones, the sound emanates from tiny speakers in, or on, our heads; we call this an in-head experience.
As we naturally move our heads while watching a movie, those tiny speakers move with us.
This is not a theater-like experience, but this is where spatial audio can help.
Spatial audio offers a theater-like experience.
It's a psychoacoustic technology that has the effect of producing a compelling virtual soundstage.
It works best with multichannel content, but it offers a compelling experience for stereo content as well.
Finally, spatial audio support is offered for audiovisual and audio-only media sources.
Best of all, it works with a variety of Apple products your customers already have.
We made it simple to bring the spatial audio experience to your customers.
As I just alluded to, the best way to enjoy spatial audio in your applications is to provide multichannel audio.
That experience is most adaptive to the customer's environment when providing HLS variants that reference multichannel audio alternates.
In fact, you may already have in your content library multichannel audio source media.
Simply publishing this will enable, by default, spatial audio in your application.
There's absolutely no software change needed.
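For illustration, a multivariant playlist that pairs stereo and multichannel audio alternates might look like the following sketch; the URIs, bandwidths, codec strings, and group IDs are all hypothetical, and your own playlists will differ:

```
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud-stereo",NAME="English",LANGUAGE="en",CHANNELS="2",DEFAULT=YES,AUTOSELECT=YES,URI="audio/en/stereo.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud-atmos",NAME="English",LANGUAGE="en",CHANNELS="16/JOC",DEFAULT=YES,AUTOSELECT=YES,URI="audio/en/atmos.m3u8"

#EXT-X-STREAM-INF:BANDWIDTH=3000000,CODECS="avc1.640028,mp4a.40.2",AUDIO="aud-stereo"
video/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=9000000,CODECS="avc1.640028,ec-3",AUDIO="aud-atmos"
video/1080p.m3u8
```

The CHANNELS attribute on each EXT-X-MEDIA tag declares the channel count of the alternate, letting the player select the multichannel rendition when conditions permit and fall back to stereo otherwise.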
Multichannel audio tracks in regular media files -- and WebKit's MSE in the fall 2021 releases -- also benefit from limited support that I'll detail later.
Let me tell you about media experiences you can now expect to create.
There are so many experiences you can now recreate with spatial audio.
You can deliver music that surrounds us, that feels like being at the concert.
You can build full-motion video games with interactive scenes that take gamers on their own immersive adventures.
But how does this technology work? When spatial audio is used, the virtual soundstage is static.
The soundstage doesn't move with casual head movement, unlike the stereo experience we saw earlier.
What we get is the same audible effect and feeling we expect from the theater.
This effect is possible both from the built-in speakers in many of our products and now is also available in select headphone products.
When spatial-capable headphones are used, measurements from inertial measurement units in the playback device are compared with similar measurements in the headphones to determine the customer's head pose.
This is used to dynamically alter the audio rendering to maintain that static soundstage effect.
The result is a feeling like the audio is emanating from the original placement around the camera, or listener, for an out-of-head experience.
It even works on a turning bus or a banking airplane.
We also offer a technique to up-mix stereo sources to reproduce a 5.1 channel experience.
We provide this feature to offer spatial audio along with your existing library of stereo content.
For supported headphones, it is the default stereo treatment in our fall 2021 releases.
We also use this treatment implicitly to make spatial audio even more compelling for you to adopt and offer, because right about now, you're probably thinking that distributing multichannel audio might impede the visual quality of your media.
After all, multichannel audio is much higher bitrate than the stereo AAC renditions you offer today.
How can you possibly fit both in a constrained network bandwidth environment? This is a real problem.
We solved this by making the spatial audio experience adaptive to your customer's bandwidth.
When bandwidth is insufficient to deliver a high-quality audiovisual experience, the audio seamlessly degrades to a stereo, up-mixed -- but a still spatial -- treatment.
Head-tracking, if offered before the transition, is maintained.
Soon after, when bandwidth reliably recovers, the full multichannel spatial treatment is restored.
With this adaptive spatial audio experience, it is ever more important to normalize the volume levels between stereo and multichannel renditions.
In addition, please provide DRC -- Dynamic Range Control -- and dialnorm metadata in your media encodings as is appropriate.
This is described in more detail in the HLS Authoring Specification available at developer.apple.com.
Let's take a look now at the interfaces you can use to tailor the spatial audio experience.
To customize the default spatial audio experience in your application -- be it via AVPlayerItem or now, AVSampleBufferAudioRenderer -- you specify one of four AVAudioSpatializationFormats.
These permit the spatialization of mono and stereo; multichannel; or the combination of the two -- that is, mono, stereo, and multichannel source audio formats.
You can also specify zero to inhibit audio spatialization.
Do note that our platforms provide system-level controls for customers to tailor the experience further, depending on the type of audio route, through Control Center and Bluetooth settings.
We take one of these four values and set it on the allowedAudioSpatializationFormats property on an AVPlayerItem and now, new in our fall 2021 releases, an AVSampleBufferAudioRenderer.
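As a minimal sketch, setting this property on an AVPlayerItem might look like the following; the stream URL is hypothetical:

```swift
import AVFoundation

// Hypothetical HLS stream URL, for illustration only.
let url = URL(string: "https://example.com/main.m3u8")!
let item = AVPlayerItem(url: url)

// Spatialize mono, stereo, and multichannel source audio.
item.allowedAudioSpatializationFormats = .monoStereoAndMultichannel

// Or restrict spatialization to multichannel sources only:
// item.allowedAudioSpatializationFormats = .multichannel

// An empty option set (raw value zero) inhibits spatialization:
// item.allowedAudioSpatializationFormats = []

let player = AVPlayer(playerItem: item)
```

The same property name applies to AVSampleBufferAudioRenderer in the fall 2021 releases.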
Now, you may be wondering, how do you use AVFoundation APIs to discover if an audio route supports spatial audio? How do you know if you should deliver multichannel audio to your AVSampleBufferAudioRenderer instance? Well, in the fall 2021 releases, we're introducing a property that indicates this on an AVAudioSessionPortDescription.
In addition, on AVAudioSession, we're introducing a mechanism for you to advertise to the system that your application is able to offer multichannel audio.
This indication is shown if the customers haven't enabled the spatial audio treatment in Control Center or Bluetooth preferences.
Note that if your application uses AVPlayer, these indications are managed for you.
The isSpatialAudioEnabled property indicates that the port is capable of both rendering spatial audio and that the customer permits it.
You are encouraged to observe route change notifications and to check isSpatialAudioEnabled at each event.
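A sketch of that pattern, checking isSpatialAudioEnabled across the current route's outputs and re-checking on each route change (the handler bodies are placeholders):

```swift
import AVFoundation

// Returns true when any output port on the current route can render
// spatial audio and the customer has it enabled.
func spatialAudioEnabledOnCurrentRoute() -> Bool {
    AVAudioSession.sharedInstance().currentRoute.outputs
        .contains { $0.isSpatialAudioEnabled }
}

// Re-check whenever the audio route changes, e.g. headphones connect.
NotificationCenter.default.addObserver(
    forName: AVAudioSession.routeChangeNotification,
    object: nil,
    queue: .main
) { _ in
    if spatialAudioEnabledOnCurrentRoute() {
        // Deliver multichannel audio to the renderer.
    } else {
        // Fall back to stereo renditions.
    }
}
```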
Similarly, AVAudioSession will emit a spatialPlaybackCapabilitiesChangedNotification when the customer alters the spatial preferences in Control Center and Bluetooth settings.
As a convenience, this notification carries information about the state of spatial audio enablement.
Use the AVAudioSessionSpatialAudioEnabledKey to retrieve the state as it pertains to this notification.
Finally, to indicate to the system that your software or service can provide multichannel content, you call the function setSupportsMultichannelContent with your intent.
This is used to relay to the customer that a spatial experience is available if network conditions permit and if the treatment is enabled.
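Putting those two pieces together, a sketch might look like this (error handling and the reaction to the notification are placeholders):

```swift
import AVFoundation

let session = AVAudioSession.sharedInstance()

// Advertise that this app can supply multichannel content.
// The call throws, so handle or ignore errors as appropriate.
try? session.setSupportsMultichannelContent(true)

// React when the customer toggles spatial audio in Control Center
// or Bluetooth settings.
NotificationCenter.default.addObserver(
    forName: AVAudioSession.spatialPlaybackCapabilitiesChangedNotification,
    object: session,
    queue: .main
) { notification in
    let enabled = notification
        .userInfo?[AVAudioSessionSpatialAudioEnabledKey] as? Bool ?? false
    // Adjust rendition selection or UI accordingly.
    print("Spatial audio enabled:", enabled)
}
```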
Let's now survey the feature support across the last three release years.
In macOS Catalina, iOS and iPadOS 13, spatial audio is offered via built-in speakers with AVPlayerItem and the WebKit video tag by specifying any URL with an http scheme.
It is available to customers with 2018 and later year model MacBook, iPhone, and iPad Pro product lines.
The default is to offer spatialization by selecting multichannel audio renditions where available.
In macOS Big Sur, iOS and iPadOS 14, we introduced support for the AirPods Pro and AirPods Max head-track-capable headphones.
Spatialization capabilities via these accessories are offered on paired iPhone and iPad devices from 2016 and later.
The default remains to offer spatialization by selecting multichannel audio renditions where available.
That brings us to the all-new support in macOS Monterey, iOS, iPadOS, and now, tvOS 15.
Here we offer support via AVPlayerItem, AVSampleBufferAudioRenderer, and limited WebKit support for W3C Media Source Extensions, MSE.
The MSE path offers no interface to tailor the spatialization experience.
However, there does exist an interface to detect the availability of spatial audio support via the AudioConfiguration dictionary within the Media Capabilities API set.
The default in these releases is to spatialize all of mono, stereo, and multichannel sources where available and conditions permit.
For audio-only presentations, including all AVSampleBufferAudioRenderer uses, only multichannel audio renditions are offered the treatment by default.
Now that we know what spatial audio is and how to use it, we've got something really special lined up for you today.
We're going to show you how you can use spatial audio in your software and services to help tell stories in new, creative ways.
Let's have a listen! ♪ Upbeat music playing ♪ < Uh-oh. Let's try that again.
Um... Uh... Cupertino? We have a problem.
Offscreen voice: What is it this time? Simon: I know. I know. Anything? Offscreen whisper: Sorry, Simon. Simon: No? Offscreen whisper: It's not going to work.
Simon: Really? Offscreen whisper: I know...
Simon: Bugger it. All right. All right. All right.
So we made this really great demo to demonstrate all the cool things you can do with spatial audio but... you know...
it seems like this isn't happening today.
We're going to try something.
I can't show you this video but what if I could, um...
...describe it? OK. So I don't know where you are right now, but I want you to close your eyes and let's imagine we're in a WWDC hall.
You know what that sounds like, right? What that feels like? Picture it in your mind.
Picture the stage and the big screen.
We're just about to dim the lights and start playing this video.
Oi! You! Dim the lights.
We're high in the sky above San Francisco.
Whooshing from the bay, down through the tall buildings.
The wind is rushing all around us, and then we're zooming out of the city and down the peninsula, all the way to Apple Park.
We soar into the park and you see somebody with amazing hair duck as we fly past.
He shouts, "Slow down!" which has this really cool left-to-right Doppler effect.
Slow down! And we're flying through the Apple orchard now.
Feel the whoosh of the trees.
We find ourselves at the pond for a moment of peace and tranquility.
Whispered voice: I'm honestly not sold on spatial audio.
Simon: And then, we're on the move again, whooshing over the birds until we reach the big glass doors of Caffè Macs sliding across the smooth terrazzo floor until we find this woman.
She's eating a pizza with her iPad propped up on the table, totally absorbed in this movie that's really, really tense.
It's this teeming jungle at the earliest hours of dusk.
The audio is literally -- no really! It's -- It's pulling us into the scene.
Whispered voice: He's doing a great job describing this demo.
I almost feel like I'm there. It's so vivid.
Simon: ...and distant monkey calls. But suddenly it gets eerie... ...and all the creatures go quiet.
All we can hear is the rustling of leaves.
Then a c-c-c-crash as a tree falls.
Something is coming! Something big! Thump. Thump. Thump.
We can hear our heart beating, right in the chest.
And then... silence.
We're about to relax when...
a dinosaur bursts through the trees right across from us! And we look up, straight into its gaping maw! Voice over PA: That's a wrap on 42...
Simon: We pull back to reveal we're on a film set.
Man: I'm not sure I love that lighting...
Simon: The dinosaur has stopped moving and a crew of people have appeared to clear the set.
Woman: Go for Kelly.
Voice over radio: You got a 20 on actors? Woman: Yeah. They're in makeup watching the launch. Going off comms.
Woman: Are you coming? Simon: We follow two of them into a trailer to watch today's space launch -- streaming, of course, in surround audio.
The audience is holding their breath but we move through them and into the TV.
Woman: Go for launch. Man: Good across the board.
Simon: And now we're inside the capsule.
Woman: Let's put the pedal to the floor.
Simon: The rocket ignites! Woman: Here we go! Express service to the moon! Next stop, Tranquility.
Man: All still good. Booster separation in three, two, one...
Clean and smooth. Simon: The second stage of the rocket has dropped away, and we drop with it...
...and follow the stage as it falls back to Earth, getting faster and faster. We're dropping through the atmosphere now, back towards the ground.
A big jet plane.
Getting bigger, and bigger, and -- Woman on PA: Good morning, folks.
This is your captain speaking from the flight deck.
...our descent towards our final destination today.
Shouldn't be more...
Simon: It's beautiful! Well, so, uh... that was fun.
Let's summarize what we saw and heard.
We discovered how easy it is to offer our customers a spatial audio experience.
In fact, you may not need to do anything to your application to take advantage of spatial audio.
Simply offering multichannel audio in your HLS variant playlists is often sufficient.
Remember, it is important to normalize volume levels between stereo and multichannel renditions, and to include DRC metadata.
Finally, we've seen how you can offer spatial audio to a wide customer base across the last three years of OS releases.
In our related sessions, you can learn how to discover if your HLS resources have multichannel audio.
Learn all about that as you explore HLS variants in AVFoundation.
I hope you've enjoyed this session as much as the team here and I have.
We hope you'll immerse yourself, and your app, in spatial audio and enjoy the rest of WWDC 2021.