Deliver a better HLS audio experience
Discover techniques for streaming high-quality audio to bandwidth-limited networks and new audio codec support. We'll share some best practices for supporting the xHE-AAC, FLAC, and Apple Lossless audio codecs, including limited support for multichannel AAC.
WWDC20
Hello, everyone. I hope you're having a great conference.
Now, if you're looking to improve the data efficiency, and at the same time, the fidelity of your HLS audio streams, you've found the right session. I'm Simon, and I'm a media streaming engineer here at Apple. Together, in this session, we're going to discover how to deliver a better HLS audio experience.
Before we begin, I want to say that I'm going to provide to you some additional guidance that supplements the existing HLS authoring specification for Apple devices, a document that is available at developer.apple.com. I encourage you to become familiar with the contents of that document before we get into this session. So, if you need to, by all means, pause me. I'm a video on demand. Go check that out, get familiar with the recommendations there, and come right back in here. Without further ado, let's get into it.
Today we're going to cover two topics. First, I'm going to introduce three audio codecs that are new to HLS in the 2020 OS releases, and then I'm going to have a conversation with you about using two of these audio codecs in a multichannel setting. Let's discover the new audio codecs for the 2020 OS releases. The first one is xHE-AAC. That stands for Extended High-Efficiency Advanced Audio Coding. And all of those adjectives are there to remind you that this audio codec is a very efficient audio codec at low to medium bit rates. Bit rates below, say, 200 kilobits per second.
The other two audio codecs that are new to the 2020 OS releases are Lossless Audio Codecs. They are FLAC, which stands for the Free Lossless Audio Codec, and Apple Lossless. xHE-AAC is new to HLS in the 2020 releases, but it was available for file-based playback in the 2019-based releases. That's iOS 13 and macOS Catalina. FLAC and Apple Lossless have been available for file-based playback for quite some time. Let's chat about the first one, xHE-AAC. xHE-AAC also has another name in the MPEG-D standard, and that's USAC, and that stands for Unified Speech and Audio Coding.
That name is there to remind you that this is also a codec that is specifically tuned to speech reproduction. It's also very good as a general-purpose audio codec, specifically, like I said before, at those low to medium data rates. Let's have a look now at how xHE-AAC compares with the AAC family at large. It's a little bit different. The AAC family starts with a codec that you're all familiar with, which is AAC-LC. That stands for AAC Low Complexity. We recommended the use of this codec to data rates as low as 96 kilobits per second.
We identify this codec using the ISO syntax for the codec attribute, which is mp4a.40.2.
This codec evolved into another codec, which is HE-AAC. That's High-Efficiency Advanced Audio Coding. It does so with the addition of a coding tool called SBR. That stands for Spectral Band Replication, where high frequencies are reconstructed from lower frequencies that are present in the core AAC media encoding.
We recommended the use of HE-AAC to data rates as low as 48 kilobits per second.
You identify HE-AAC with the codec string mp4a.40.5.
But it doesn't stop there. This evolved into a version two, HE-AAC v2, with another coding tool called Parametric Stereo. Parametric Stereo reconstructs a second audio channel from a single audio channel-- a mono audio channel-- with some additional parametric data. And we recommended the use of HE-AAC v2 to data rates as low as 32 kilobits per second.
Now, that's where the interoperability ends. All three of these audio codecs have a level of interoperability. You can decode a HE-AAC v2 with a HE-AAC decoder. There is a caveat to this, of course, which is that you'll only get one channel of audio, because the earlier codecs don't know how to deal with Parametric Stereo.
So let's take a look at what xHE-AAC does.
The backwards compatibility isn't there in xHE-AAC. The coding tools remain, or are quite similar, but they're even more advanced. They've been refined. They're more efficient. It's very important, therefore, to identify xHE-AAC correctly in your master playlists, with the ISO syntax for the codec attribute being mp4a.40.42.
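As an illustrative aside, the codec strings mentioned so far can be collected into a small lookup. This is only a sketch, not an Apple API: the helper name `describe_codecs` is hypothetical, and the HE-AAC v2 string `mp4a.40.29` is not stated in the talk (it is included on the assumption that it matches the HLS authoring specification).

```python
# Codec strings for the AAC family, as used in the CODECS attribute.
AAC_FAMILY = {
    "mp4a.40.2": "AAC-LC",
    "mp4a.40.5": "HE-AAC",
    "mp4a.40.29": "HE-AAC v2",  # assumption: not stated in the talk
    "mp4a.40.42": "xHE-AAC",
}


def describe_codecs(codecs_attribute: str) -> list[str]:
    """Map each entry of a comma-separated CODECS value to a codec name.

    Entries outside the AAC family (for example video codecs) are
    reported as "unknown" rather than guessed at.
    """
    names = []
    for entry in codecs_attribute.split(","):
        names.append(AAC_FAMILY.get(entry.strip(), "unknown"))
    return names


print(describe_codecs("mp4a.40.42"))
print(describe_codecs("avc1.640020,mp4a.40.2"))
```

Because xHE-AAC is not backwards compatible, a lookup like this is one simple way to catch a variant that is labeled with an older AAC string by mistake.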
This is such an advanced codec, and so efficient, that we recommend it's used down to 24 kilobits per second. Another way that xHE-AAC differs from the rest of the AAC family is how the standardization bodies have approached it. Now, we've always recommended in the HLS authoring guideline to include loudness and DRC, or Dynamic Range Control, metadata. What is Dynamic Range Control? Well, it's extra metadata that allows the media system to continuously adjust the audio signal levels to reduce the level difference between loud and soft passages.
We've recommended that you include this metadata, or ensure that your program content and any interstitials within it are all normalized to the same volume level.
Our recommendations are consistent with the new standard from ANSI/CTA-2075, which also has some informative text recommending the inclusion of this metadata.
Another standard that differs from this and, in fact, goes a little bit further, is CMAF.
CMAF stands for the Common Media Application Format. It's a format that seeks to unify media encodings between MPEG-DASH and HLS. It goes a step further in this regard in that it mandates the inclusion of this metadata in your xHE-AAC media encodings.
For the rest of the AAC family, CMAF merely recommends that you include this metadata. So the takeaway is that DRC is becoming more relevant throughout our industry, and your inclusion of this metadata is the way forward.
Let's take a look now at how HLS intends to support xHE-AAC on Apple devices. So, as I mentioned before, it's really important for xHE-AAC that we advertise its use through the codecs attribute, again, with the syntax mp4a.40.42.
AVPlayer, from the AVFoundation framework, supports mono and stereo channel configurations. There is no multichannel support at this time.
Carriage is restricted to the fragmented MP4 container type, and the only encryption mechanism supported is common encryption.
So, how can you leverage xHE-AAC in your software and services? Well, first of all, let me reiterate, it is a codec well suited to data rates as low as 24 kilobits per second, all the way up to the maximum that we have recommended for AAC, which is 160 kilobits per second for stereo.
And the simplest way to leverage xHE-AAC in your software and services is to add additional low bit rate audio variants to your master playlist. The motivations for doing this are twofold. One, you want to reach customers on low data rate networks, in scenarios where they would otherwise stall. The second motivation is to reach customers on data rate constricted devices-- devices that have multiple different paths of network connectivity. An example of such a device is Apple Watch. And we have another session entitled "What's New in Streaming Audio on Apple Watch," and I encourage you to check it out. But I've got an example right here. Suppose you've got an existing master playlist in your content library that advertises two audio variants. The first audio variant is leveraging HE-AAC and uses 48 kilobits per second. The second audio variant in this playlist uses AAC-LC and is at 64 kilobits per second. To reach customers on low data rate networks and prevent them from stalling during playback, and to reach customers on data rate constricted devices, you need only introduce a new variant leveraging xHE-AAC advertising its codec string correctly. And now you've got a variant at 24 kilobits per second.
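The playlist described above is shown on a slide that the transcript does not reproduce. A minimal sketch of what such a master playlist might look like is generated below. The URIs and the function name `build_master_playlist` are hypothetical, the BANDWIDTH values simply reuse the nominal bit rates from the talk, and a production playlist would normally carry further attributes that are left out here.

```python
# Each tuple: (bandwidth in bits per second, CODECS string, hypothetical URI).
VARIANTS = [
    (24_000, "mp4a.40.42", "audio/xhe_aac_24k/prog_index.m3u8"),  # new xHE-AAC variant
    (48_000, "mp4a.40.5", "audio/he_aac_48k/prog_index.m3u8"),    # existing HE-AAC variant
    (64_000, "mp4a.40.2", "audio/aac_lc_64k/prog_index.m3u8"),    # existing AAC-LC variant
]


def build_master_playlist(variants):
    """Render one EXT-X-STREAM-INF tag plus URI line per audio variant."""
    lines = ["#EXTM3U"]
    for bandwidth, codecs, uri in variants:
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},CODECS="{codecs}"')
        lines.append(uri)
    return "\n".join(lines) + "\n"


playlist = build_master_playlist(VARIANTS)
print(playlist)
```

The only change relative to the existing two-variant playlist is the first entry: one extra variant whose CODECS value identifies it as xHE-AAC.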
There are some additional ways that you can leverage xHE-AAC in your software and services. The first way is you can parallel some or all of your AAC codecs or your AAC variants with xHE-AAC. And your motivation here is to provide high-fidelity variants for the same given bit budget.
Another way that you can leverage this codec is you could see it as an opportunity to introduce DRC support to your playlists, migrating your library to a future where DRC is becoming increasingly relevant. So, you may be wondering, how can you coerce an AVPlayer into choosing this high-fidelity audio variant over your existing set of audio variants? Well, the answer is, we've introduced a new attribute to the EXT-X-STREAM-INF tag. It's called the SCORE attribute. We cover the SCORE attribute in more detail in the session entitled "Improve Stream Authoring with HLS Tools." I encourage you to check it out. However, I've got an example right here. In this example, I've got two audio variants.
The first audio variant is advertised as xHE-AAC.
Its bit rate is advertised at 94 kilobits per second.
I've got a second audio variant at AAC-LC at 96 kilobits per second.
And I've scored the xHE-AAC higher than the AAC-LC variant.
And you might also note that the bandwidth of the xHE-AAC variant is lower than the AAC-LC variant.
Using the SCORE attribute, the AVPlayer will prefer the xHE-AAC variant where support exists.
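The two-variant example can be sketched as follows. The SCORE values (2.0 and 1.0), the URIs, and the `preferred` helper are all hypothetical. The helper is a deliberately simplified stand-in for the idea that a higher SCORE wins among playable variants, and it does not claim to reproduce AVPlayer's actual selection logic.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    codecs: str
    bandwidth: int  # bits per second
    score: float    # SCORE attribute value (illustrative)
    uri: str        # hypothetical URI

    def to_playlist_lines(self):
        tag = (
            f"#EXT-X-STREAM-INF:BANDWIDTH={self.bandwidth},"
            f'SCORE={self.score},CODECS="{self.codecs}"'
        )
        return [tag, self.uri]


VARIANTS = [
    Variant("mp4a.40.42", 94_000, 2.0, "xhe_aac_94k.m3u8"),  # xHE-AAC, scored higher
    Variant("mp4a.40.2", 96_000, 1.0, "aac_lc_96k.m3u8"),    # AAC-LC
]


def preferred(variants, supported_codecs):
    """Pick the highest-scored variant among those the device can decode."""
    playable = [v for v in variants if v.codecs in supported_codecs]
    return max(playable, key=lambda v: v.score)


for variant in VARIANTS:
    print("\n".join(variant.to_playlist_lines()))

print(preferred(VARIANTS, {"mp4a.40.42", "mp4a.40.2"}).uri)  # device with xHE-AAC support
print(preferred(VARIANTS, {"mp4a.40.2"}).uri)                # device without it
```

On a device without xHE-AAC support the higher-scored variant is simply not playable, so the AAC-LC variant is still chosen.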
Let's switch gears now and talk about lossless audio. The new audio codecs, as I've already mentioned, are FLAC and Apple Lossless. Both of these are open source, but each has an advantage over the other. FLAC is in wider general use throughout the industry, whereas Apple Lossless has more established carriage in MPEG-4. How does HLS intend to support lossless audio in the 2020 OS releases on Apple devices? Well, we have to advertise its use correctly.
Again, it's very important that we advertise its use using the codec strings "fLaC" where the "L" and the "C" are capitalized, no matter how strange that looks. And for Apple Lossless, it's "alac." AVPlayer in the AVFoundation framework supports all the channel configurations from these two audio codecs, up to eight channels. More about that in just a minute.
Carriage is restricted to the fragmented MP4 container type.
And the only encryption mechanism supported is common encryption.
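The support constraints stated so far can be condensed into a small authoring check. This is a sketch under stated assumptions: `check_variant` and the `"fmp4"` container label are hypothetical names, only the codecs introduced in this session are covered, and the encryption requirement is not modelled.

```python
# Maximum channel counts as stated in the talk for the 2020 OS releases.
MAX_CHANNELS = {
    "mp4a.40.42": 2,  # xHE-AAC: mono and stereo only
    "fLaC": 8,        # FLAC: up to eight channels
    "alac": 8,        # Apple Lossless: up to eight channels
}


def check_variant(codecs: str, container: str, channels: int) -> list[str]:
    """Return a list of authoring problems; an empty list means none found."""
    problems = []
    if codecs not in MAX_CHANNELS:
        # Codec strings are case-sensitive, so look for a near miss.
        hint = next((k for k in MAX_CHANNELS if k.lower() == codecs.lower()), None)
        if hint:
            problems.append(f'codec string "{codecs}" has the wrong case; expected "{hint}"')
        else:
            problems.append(f'codec string "{codecs}" is not covered by this check')
        return problems
    if container != "fmp4":
        problems.append("carriage must be fragmented MP4")
    if not 1 <= channels <= MAX_CHANNELS[codecs]:
        problems.append(f"{channels} channels not supported for {codecs}")
    return problems


print(check_variant("fLaC", "fmp4", 6))
print(check_variant("flac", "fmp4", 2))
print(check_variant("mp4a.40.42", "fmp4", 6))
```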
So, how can you leverage lossless audio in your software and services? Well, the first one is you can add additional high bit rate audio variants to your playlists. And you would only do this if you know that your customers have plentiful bandwidth.
You would only also consider this when your customers expect exceptionally high audio fidelity.
Let's now move to our second topic for this session and have a look at how we're going to use these lossless audio codecs in a multichannel setting. So, as I alluded to earlier, FLAC and Apple Lossless support up to eight channels. That includes channel configurations of 5.1 and 7.1.
Now one thing to note is that the channel layouts of the two audio codecs are slightly different. And you must follow these if you want the audio to come out the right speakers. Apple Lossless requires the center channel first, then left and right. Whereas FLAC requires left and right channel first, and then the center channel. By now you'll hopefully understand that lossless audio requires a much higher data rate than we're used to with lossy audio codecs such as AAC. But how much more? If I'm going to show you how much more FLAC requires when compared to AAC-LC, I'm going to need to make a lot more space in my graph. I'm going to have to change the y-axis. Hang on a second. I'm going to do that right here.
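Returning to the channel ordering difference mentioned above, here is a sketch of reordering one frame of samples between the two layouts. The talk only states the order of the front channels (Apple Lossless: center, left, right; FLAC: left, right, center). The positions of the surround and LFE channels in the 5.1 lists below are assumptions based on the codecs' own specifications, and `reorder` is a hypothetical helper.

```python
# 5.1 channel orders; surround and LFE positions are assumptions (see above).
FLAC_5_1 = ["L", "R", "C", "LFE", "Ls", "Rs"]
ALAC_5_1 = ["C", "L", "R", "Ls", "Rs", "LFE"]


def reorder(frame, source_layout, target_layout):
    """Reorder one frame of per-channel samples from source to target layout."""
    if sorted(source_layout) != sorted(target_layout):
        raise ValueError("layouts must contain the same channels")
    if len(frame) != len(source_layout):
        raise ValueError("frame length must match the layout")
    by_name = dict(zip(source_layout, frame))
    return [by_name[name] for name in target_layout]


# One frame in FLAC order: L=10, R=20, C=30, LFE=40, Ls=50, Rs=60.
flac_frame = [10, 20, 30, 40, 50, 60]
alac_frame = reorder(flac_frame, FLAC_5_1, ALAC_5_1)
print(alac_frame)
```

Encoders normally handle this mapping for you; the sketch is only meant to make concrete why feeding one codec's channel order to the other sends audio to the wrong speakers.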
And there we go. Okay. AAC-LC at 48 kilohertz is still 160 kilobits per second. But now that I've changed my graph around a little bit, it's really, really small. That gives me lots of space to introduce lossless audio. The lossless audio is nearly four times more. At 16 bits per sample, 48 kilohertz, we're nearly one megabit. On the high end, where we've got 24 bits per sample and 96 kilohertz sampling rate, we're pushing nearly three megabits. But it doesn't stop there. You see, with lossy audio codecs like AAC, you're able to configure the encoder to deliver a specific target bit rate. You can't do that with lossless audio codecs. They will consume as much data as they need in order to deliver the audio fidelity that you have requested.
So these are average data rates. The full picture is when I add the peak data rates. Now, at the closest audio fidelity to our AAC-- that's 16 bits per sample, 48 kilohertz sampling rate-- we are over one megabit. A similar story is present in multichannel. Note that the y-axis now changed, and AAC-LC is 400 kilobits per second. The next closest in lossless is in excess of two megabits per second, and on the high end, it's eight megabits per second for 24 bits per sample, 96 kilohertz sampling rate, and six channels of audio.
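To put these figures in context, the uncompressed PCM bit rate is simply sample rate times bits per sample times channel count. The sketch below computes that for the configurations mentioned in the talk. The helper name `pcm_bit_rate` is hypothetical, and the lossless figures quoted from the graph are averages and peaks of compressed audio, which vary with content, so they are not derived here.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


CASES = [
    ("stereo, 16-bit, 48 kHz", 48_000, 16, 2),
    ("stereo, 24-bit, 96 kHz", 96_000, 24, 2),
    ("six channels, 24-bit, 96 kHz", 96_000, 24, 6),
]

for label, rate, bits, channels in CASES:
    kbps = pcm_bit_rate(rate, bits, channels) / 1000
    print(f"{label}: {kbps:.0f} kbit/s uncompressed")
```

A lossless encoder removes redundancy but cannot discard information, so its output tracks the complexity of the source material rather than a target bit rate. That is why the talk quotes average and peak rates separately.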
So it's very important that we consider how to scale up adaptively to these very high bit rates.
And the way that we recommend you do that is to include multichannel AAC in your master playlists. Apple's software package, Compressor, can encode it.
But one thing to note about multichannel AAC is that it doesn't enjoy uniform support across Apple devices: it can't always be decoded to its full channel complement. On devices where it can't be decoded to its full channel complement, you'll get two channels, or stereo. But do note, this does not exempt us from the requirement of including stereo AAC for backwards compatibility. Let me demonstrate with an example. Here we've got a playlist with the media tags up top. They declare the number of channels. Let's move that out of the way because it's not really relevant to our discussion here. The highlighted variants are the variants that are eligible for playback on a device that is able to decode multichannel AAC to its full channel complement. Assuming we're on such a device, and the device has an audio route that can render multichannel audio, we can scale adaptively from HE-AAC multichannel up to AAC-LC multichannel, all the way up to two megabits per second, lossless multichannel. The same story is true of stereo.
We can scale adaptively from an AAC stereo in a low bit rate to a high stereo AAC bit rate, and then scale adaptively up to a full megabit for stereo lossless.
Now, suppose we omit the multichannel AAC from our playlist. I don't recommend this, so don't try this at home. Here's the same playlist with the multichannel AAC omitted. Again, I've got my media tags at the top, and I'm just going to remove them.
Now, if we're on a device that supports a multichannel output-- its current audio route supports multichannel rendering-- there's a single audio variant that is eligible for playback. It's the lossless audio variant, and it requires two megabits per second. If your customer cannot sustain two megabits per second, their playback will stall.
Now, because we mandate the inclusion of stereo AAC in all playlists, the playback of the stereo will still adaptively scale from a low bit rate AAC to a high bit rate AAC, all the way up to lossless stereo.
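The two playlists compared above can be modelled with a toy selection rule: among variants matching the output channel count, take the highest bit rate that fits the available throughput. This is a simplified sketch, not AVPlayer's algorithm. The `pick` helper is hypothetical, and the HE-AAC bit rates are illustrative values the talk does not give.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    channels: int
    bandwidth: int  # bits per second


FULL_LADDER = [
    Variant("HE-AAC stereo", 2, 48_000),       # illustrative rate
    Variant("AAC-LC stereo", 2, 160_000),
    Variant("lossless stereo", 2, 1_000_000),
    Variant("HE-AAC 5.1", 6, 160_000),         # illustrative rate
    Variant("AAC-LC 5.1", 6, 400_000),
    Variant("lossless 5.1", 6, 2_000_000),
]

# The same ladder with multichannel AAC omitted (not recommended).
NO_MULTICHANNEL_AAC = [
    v for v in FULL_LADDER if v.channels == 2 or v.name.startswith("lossless")
]


def pick(ladder, channels, throughput):
    """Highest-bandwidth variant that fits, or None if nothing fits (a stall)."""
    eligible = [
        v for v in ladder if v.channels == channels and v.bandwidth <= throughput
    ]
    return max(eligible, key=lambda v: v.bandwidth, default=None)


throughput = 500_000  # bits per second available to the customer
print(pick(FULL_LADDER, 6, throughput))
print(pick(NO_MULTICHANNEL_AAC, 6, throughput))
print(pick(NO_MULTICHANNEL_AAC, 2, throughput))
```

With the full ladder, a multichannel listener at 500 kilobits per second lands on AAC-LC 5.1. With multichannel AAC omitted, nothing fits below the two-megabit lossless variant, while the stereo ladder is unaffected.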
So we've learned a lot here. Let's summarize everything that we've learned.
I've introduced three new audio codecs. And we've talked about the need for including DRC metadata in your media encodings. And it's important in the industry going forward. We've also talked about the considerations for using multichannel lossless and utilizing multichannel AAC as a means to scale up to those very high data rates. So, what should you do when you get home? Well, maybe you are already home. Well, consider how to employ xHE-AAC to target customers on low throughput networks.
And then, how you can use this codec to better utilize their existing throughput to deliver better audio fidelity.
And do consider how to use lossless audio codecs if it's applicable to your software or service. I hope you've learnt lots, and have a great rest of the conference. And as always, we wish you safe travels home, even if that's to the adjacent room.