Discover techniques for streaming high-quality audio to bandwidth-limited networks and new audio codec support. We'll share some best practices for supporting the xHE-AAC, FLAC, and Apple Lossless audio codecs, including limited support for multichannel AAC.
Hello everyone. I hope you're having a great conference. Now, if you're looking to improve the data efficiency and, at the same time, the fidelity of your HLS audio streams, you've found the right session. I'm Simon and I'm a media streaming engineer here at Apple. Together, in this session, we're going to discover how to deliver a better HLS audio experience.
Before we begin, I want to say that I'm going to provide to you some additional guidance that supplements the existing HLS authoring specification for Apple devices, a document that is available at developer.apple.com.
I encourage you to become familiar with the contents of that document before we get into this session. So, if you need to, by all means pause me. I'm a video on demand.
Go check that out. Get familiar with the recommendations there and come right back in here. Without further ado, let's get into it. Today we're going to cover two topics. For the first topic, I'm going to introduce you to three audio codecs that are new to HLS in the 2020 OS releases.
And then, I'm going to have a conversation with you about using two of these audio codecs in a multichannel setting. Let's discover the new audio codecs for the 2020 OS releases. The first one is xHE-AAC, which stands for extended high efficiency advanced audio codec. And all of those adjectives are there to remind you that this audio codec is a very efficient audio codec at low to medium bitrates: bitrates below, say, 200 kilobits per second. The other two audio codecs that are new to the 2020 OS releases are lossless audio codecs. They are FLAC, which stands for the Free Lossless Audio Codec, and Apple Lossless. xHE-AAC is new to HLS in the 2020 releases, but it was available for file-based playback in the 2019-based releases. That's iOS 13 and macOS Catalina. FLAC and Apple Lossless have been available for file-based playback for quite some time.
Let's chat about the first one, xHE-AAC. xHE-AAC also has another name in the MPEG-D standard and that's USAC, which stands for Unified Speech and Audio Coding. That name is there to remind you that this is also a codec that is specifically tuned to speech reproduction. It's also very good as a general purpose audio codec, specifically, like I said before, at those low to medium data rates. Let's have a look, now, at how xHE-AAC compares with the AAC family at large. It's a little bit different. The AAC family starts with a codec that you're all familiar with, which is AAC-LC. That stands for AAC Low Complexity. We recommend the use of this codec at data rates as low as 96 kilobits per second. We identify this codec using the ISO syntax of the codec attribute, which is "mp4a.40.2".
This codec evolved into another codec, which is HE-AAC. That's the High Efficiency Advanced Audio Codec. It does so with the addition of a coding tool called SBR. That stands for Spectral Band Replication, where high frequencies are reconstructed from lower frequencies that are present in the core AAC media encoding.
We recommend the use of HE-AAC at data rates as low as 48 kilobits per second.
You identify HE-AAC with the codec string "mp4a.40.5". But it doesn't stop there. This evolved into a version 2, HE-AAC v2, with another coding tool called Parametric Stereo. Parametric Stereo reconstructs a second audio channel from a single audio channel, a mono audio channel, with some additional parametric data. And we recommend the use of HE-AAC v2 at data rates as low as 32 kilobits per second. Now, that's where the interoperability ends. All three of these audio codecs have a level of interoperability. You can decode a HE-AAC v2 with a HE-AAC decoder.
There is a caveat to this, of course, which is that you'll only get one channel of audio because the earlier codecs don't know how to deal with parametric stereo.
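As a quick recap of the three backwards-compatible AAC codecs described so far, here is a minimal Python sketch of a lookup table. The codec strings for AAC-LC and HE-AAC and all three bitrate floors come from this session; the HE-AAC v2 string "mp4a.40.29" was not quoted in the talk and is filled in here as an assumption, and the helper name is hypothetical.

```python
# Recap table for the AAC family as described in the session.
# Each entry: (name, CODECS string, lowest recommended bitrate in kbit/s).
AAC_FAMILY = [
    ("AAC-LC", "mp4a.40.2", 96),
    ("HE-AAC", "mp4a.40.5", 48),
    # Assumption: this string is not quoted in the talk.
    ("HE-AAC v2", "mp4a.40.29", 32),
]


def candidates_for(target_kbps):
    """Return the codec names whose recommended floor is at or below the target.

    Hypothetical helper for illustration only; it is not an Apple API.
    """
    return [name for name, _codecs, floor in AAC_FAMILY if floor <= target_kbps]


if __name__ == "__main__":
    for kbps in (96, 64, 24):
        print(kbps, candidates_for(kbps))
```

Note that at 24 kilobits per second the list comes back empty: none of these three codecs is recommended that low.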
So let's take a look at what xHE-AAC does. The backwards compatibility isn't there in xHE-AAC. The coding tools remain, or are quite similar, but they're even more advanced - they've been refined. They're more efficient.
It's very important, therefore, to identify xHE-AAC correctly in your master playlists with the ISO syntax for the codec attribute being "mp4a.40.42". This is such an advanced codec and so efficient that we recommend its use down to 24 kilobits per second. Another way that xHE-AAC differs from the rest of the AAC family, is how the standardization bodies have approached it.
Now, we've always recommended in the HLS authoring guideline that you include loudness and DRC (or Dynamic Range Control) metadata. What is dynamic range control? Well, it's extra metadata that allows the media system to continuously adjust the audio signal levels to reduce the level difference between loud and soft passages. We've recommended that you include this metadata, or else ensure that your program content and any interstitials within it are all normalized to the same volume level. Our recommendations are consistent with a new standard, ANSI/CTA-2075, which also has some informative text recommending the inclusion of this metadata. Another standard that differs from this and, in fact, goes a little bit further, is CMAF. CMAF stands for the Common Media Application Format. It's a format that seeks to unify media encodings between MPEG-DASH and HLS. It goes a step further in this regard, in that it mandates the inclusion of this metadata in your xHE-AAC media encodings. For the rest of the AAC family, CMAF merely recommends that you include this metadata.
So the take away is that DRC is becoming more relevant throughout our industry.
And your inclusion of this metadata is the way forward. Let's take a look now at how HLS intends to support xHE-AAC on Apple devices.
So as I mentioned before, it's really important for xHE-AAC that we advertise its use through the codecs attribute, again with the syntax "mp4a.40.42". AVPlayer from the AVFoundation framework supports mono and stereo channel configurations. There is no multi-channel support at this time. Carriage is restricted to the fMP4 container type. And the only encryption mechanism supported is common encryption. So, how can you leverage xHE-AAC in your software and services? Well, first of all, let me reiterate, it is a well-suited codec for data rates as low as 24 kilobits per second all the way up to the maximum that we have recommended for AAC, 160 kilobits per second for stereo. And the simplest way to leverage xHE-AAC in your software and services is to add additional low bitrate audio variants to your master playlist.
The motivations for doing this are twofold. One, you want to reach customers on low data rate networks, in scenarios where they would otherwise stall. The second motivation is to reach customers on data rate constricted devices, devices that have multiple different paths of network connectivity. An example of such a device is Apple Watch, and we have another session entitled "What's New in Streaming Audio on Apple Watch" that I encourage you to check out. But I've got an example right here.
Suppose you've got an existing master playlist in your content library that advertises two audio variants. The first audio variant leverages HE-AAC and uses 48 kilobits per second. The second audio variant in this playlist uses AAC-LC and is at 64 kilobits per second. To reach customers on low data rate networks and prevent them from stalling during playback, and to reach customers on data rate constricted devices, you need only introduce a new variant leveraging xHE-AAC, advertising its codec string correctly. And now you've got a variant at 24 kilobits per second.
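The playlist described above might look roughly like the output of the following Python sketch. The codec strings and bitrates are the ones quoted in the session; the URIs, the function name, and the choice to put the spoken bitrates directly into BANDWIDTH are assumptions made for illustration, not the contents of the actual slide.

```python
# Hypothetical audio-only ladder: two existing variants plus a new xHE-AAC one.
# The URIs are placeholders and do not come from the session.
VARIANTS = [
    {"bandwidth": 24_000, "codecs": "mp4a.40.42", "uri": "xhe-aac/24k/prog_index.m3u8"},
    {"bandwidth": 48_000, "codecs": "mp4a.40.5", "uri": "he-aac/48k/prog_index.m3u8"},
    {"bandwidth": 64_000, "codecs": "mp4a.40.2", "uri": "aac-lc/64k/prog_index.m3u8"},
]


def build_master_playlist(variants):
    """Render a minimal master playlist with one EXT-X-STREAM-INF per variant."""
    lines = ["#EXTM3U"]
    for v in variants:
        # Each variant is a tag line followed by the URI of its media playlist.
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={v["bandwidth"]},CODECS="{v["codecs"]}"'
        )
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist(VARIANTS), end="")
```

The only change to the existing two-variant playlist is the extra xHE-AAC entry; the HE-AAC and AAC-LC variants stay in place for clients that cannot decode xHE-AAC.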
There are some additional ways that you can leverage xHE-AAC in your software and services. The first way is you can parallel some or all of your AAC variants with xHE-AAC. And your motivation here is to provide higher fidelity variants for the same given bit budget. Another way that you can leverage this codec is to see it as an opportunity to introduce DRC support to your playlists, migrating your library to a future where DRC is becoming increasingly relevant. So you may be wondering, how can you coerce an AVPlayer into choosing this high fidelity audio variant over your existing set of audio variants? Well, the answer is we've introduced a new attribute to the stream tag. It's called the SCORE attribute.
We cover the SCORE attribute in more detail in the session entitled "Improved Stream Authoring with HLS Tools." I encourage you to check it out.
However, I've got an example right here. In this example, I've got two audio variants.
The first audio variant is advertised as xHE-AAC. Its bitrate is advertised at 94 kilobits per second. I've got a second audio variant, AAC-LC, at 96 kilobits per second, and I've scored the xHE-AAC variant higher than the AAC-LC variant. And you might also note that the bandwidth of the xHE-AAC variant is lower than that of the AAC-LC variant. Using the SCORE attribute, the AVPlayer will prefer the xHE-AAC variant, where support exists.
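To make the effect of SCORE concrete, here is a small Python sketch of the two-variant example. The bitrates and codec strings are from the session; the SCORE values, URIs, and the selection function are assumptions. In particular, `pick` is a deliberately simplified model of "prefer the higher score where the codec is supported" and is not AVPlayer's actual selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    bandwidth: int  # bits per second, as advertised in BANDWIDTH
    codecs: str     # CODECS attribute value
    score: float    # SCORE attribute value (hypothetical values below)
    uri: str        # placeholder URI, not from the session


VARIANTS = [
    Variant(94_000, "mp4a.40.42", 2.0, "xhe-aac/94k/prog_index.m3u8"),
    Variant(96_000, "mp4a.40.2", 1.0, "aac-lc/96k/prog_index.m3u8"),
]


def stream_inf(v):
    """Render one variant as an EXT-X-STREAM-INF tag plus its URI line."""
    return (
        f'#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth},SCORE={v.score},'
        f'CODECS="{v.codecs}"\n{v.uri}'
    )


def pick(variants, supported_codecs, available_bps):
    """Simplified model: among playable variants, take the highest SCORE."""
    playable = [
        v for v in variants
        if v.codecs in supported_codecs and v.bandwidth <= available_bps
    ]
    if not playable:
        return None
    # Ties on score fall back to the higher bandwidth.
    return max(playable, key=lambda v: (v.score, v.bandwidth))


if __name__ == "__main__":
    print("#EXTM3U")
    for v in VARIANTS:
        print(stream_inf(v))
```

In this model, a client that supports both codecs lands on the xHE-AAC variant even though its bandwidth is lower, while a client without xHE-AAC support falls back to AAC-LC.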
Let's switch gears now and talk about lossless audio. The new audio codecs, as I've already mentioned, are FLAC and Apple Lossless. Both of these are open source, but each has an advantage over the other. FLAC is in wider general use throughout the industry, whereas Apple Lossless has more established carriage in MPEG-4. How does HLS intend to support lossless audio in the 2020 OS releases on Apple devices? Well, we have to advertise its use correctly. Again, it's very important that we advertise its use using the codec strings. For FLAC, it's "fLaC", where the L and the C are capitalized, no matter how strange that looks. And, for Apple Lossless, it's "alac". AVPlayer, in the AVFoundation framework, supports all the channel configurations from these two audio codecs up to eight channels. More about that in just a minute.
Carriage is restricted to the fMP4 container type. And the only encryption mechanism supported is common encryption. So, how can you leverage lossless audio in your software and services? Well, you can add additional high-bitrate audio variants to your playlists.
And you would only do this if you know that your customers have plentiful bandwidth.
You would also only consider this when your customers expect exceptionally high audio fidelity. Let's now move to our second topic for this session and have a look at how we're going to use these lossless audio codecs in a multi-channel setting. So, as I alluded to earlier, FLAC and Apple Lossless support up to eight channels. That's in channel configurations of 5.1 and 7.1. Now, one thing to note is that the channel layouts of the two audio codecs are slightly different, and you must follow these if you want the audio to come out of the right speakers. Apple Lossless requires the center channel first, then left and right, whereas FLAC requires the left and right channels first and then the center channel.
By now, you'll hopefully understand that lossless audio requires a much higher data rate than we are used to with lossy audio codecs such as AAC. But how much more? If I'm going to show you how much more FLAC requires when compared to AAC-LC, I'm going to need to make a lot more space in my graph. I'm going to have to change the y-axis. Hang on a second. I'm going to do that right here. And there we go. OK. AAC-LC at 48 kHz is still 160 kilobits per second. But now that I've changed my graph around a little bit, it's really, really small. That gives me lots of space to introduce lossless audio. The lossless audio is nearly four times more. At 16 bits per sample, 48 kHz, we're nearly one megabit. On the high end, where we've got 24 bits per sample and a 96 kHz sampling rate, we're pushing nearly 3 megabits.
But it doesn't stop there. You see, with lossy audio codecs, like AAC, you're able to configure the encoder to deliver a specific target bitrate. You can't do that with lossless audio codecs.
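For a rough sense of scale behind the lossless figures just quoted, the following Python sketch computes the uncompressed PCM data rate for the two stereo configurations mentioned. This is plain arithmetic (bits per sample times sampling rate times channels), not a model of what FLAC or Apple Lossless will actually produce for a given program; the function name is hypothetical.

```python
def pcm_bitrate_bps(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM data rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


if __name__ == "__main__":
    # The two stereo configurations discussed in the session.
    for bits, rate in ((16, 48_000), (24, 96_000)):
        bps = pcm_bitrate_bps(bits, rate, channels=2)
        print(f"{bits}-bit / {rate} Hz stereo PCM: {bps} bit/s")
    # 16-bit / 48000 Hz stereo PCM: 1536000 bit/s
    # 24-bit / 96000 Hz stereo PCM: 4608000 bit/s
```

The quoted lossless averages (nearly 1 megabit and nearly 3 megabits) sit below these uncompressed rates, which is the saving the lossless encoder provides, and both remain far above the 160 kilobits per second of stereo AAC-LC.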
Lossless codecs will consume as much data as they need in order to deliver the audio fidelity that you have requested. So these are average data rates. The full picture is when I add the peak data rates. Now at the closest audio fidelity to our AAC, that's 16 bits per sample and a 48 kHz sampling rate, we are over 1 megabit. A similar story is present in multi-channel. Note that the y-axis has now changed and AAC-LC is 400 kilobits per second.
The next closest in lossless is in excess of 2 megabits per second, and on the high end it's 8 megabits per second for 24 bits per sample, a 96 kHz sampling rate and six channels of audio. So, it's very important that we consider how to scale up adaptively to these very high bitrates. And the way that we recommend you do that is to include multi-channel AAC in your master playlists. Apple's software package Compressor can encode it, but one thing to note about multi-channel AAC is that it doesn't enjoy uniform support across Apple devices. On some devices it can't be decoded to its full channel complement. On those devices, you'll get two channels, or stereo. But do note, this does not exempt us from the requirement of including stereo AAC for backwards compatibility. Let me demonstrate with an example. Here we've got a playlist with the media tags up top. They declare the number of channels. Let's move that out of the way because it's not really relevant to our discussion here.
The highlighted variants are the variants that are eligible for playback on a device that is able to decode multi-channel AAC to its full channel complement.
Assuming we're on such a device and the device has an audio route that can render multi-channel audio, we can scale adaptively from HE-AAC multi-channel up to AAC-LC multi-channel all the way up to 2 megabits per second lossless multi-channel. The same story is true of stereo. We can scale adaptively from a low-bitrate stereo AAC to a high-bitrate stereo AAC and then scale adaptively up to a full megabit for stereo lossless.
Now, suppose we omit the multi-channel AAC from our playlist. I don't recommend this so don't try this at home. Here's the same playlist with the multi-channel AAC omitted. Again, I've got my media tags at the top and I'm just going to remove them. Now, if we're on a device that supports multi-channel output, and its current audio route supports multi-channel rendering, there's a single audio variant that is eligible for playback. It's the lossless audio variant and it requires 2 megabits per second. If your customer cannot sustain 2 megabits per second, their playback will stall.
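The stall scenario just described can be sketched with a toy model in Python. The 160 and 400 kilobit AAC-LC rates and the 1 and 2 megabit lossless rates are taken from the session; the HE-AAC bitrates, all names, and the selection rules are assumptions. The model simply treats multi-channel variants as the eligible set on a multi-channel route and picks the highest bitrate that fits, which is a simplification and not AVPlayer's real behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioVariant:
    name: str
    channels: int
    bandwidth: int   # bits per second
    lossless: bool


FULL_LADDER = [
    AudioVariant("HE-AAC stereo", 2, 48_000, False),   # hypothetical bitrate
    AudioVariant("AAC-LC stereo", 2, 160_000, False),
    AudioVariant("lossless stereo", 2, 1_000_000, True),
    AudioVariant("HE-AAC 5.1", 6, 128_000, False),     # hypothetical bitrate
    AudioVariant("AAC-LC 5.1", 6, 400_000, False),
    AudioVariant("lossless 5.1", 6, 2_000_000, True),
]

# The playlist the session advises against: multi-channel AAC removed.
NO_MULTICHANNEL_AAC = [
    v for v in FULL_LADDER if v.channels <= 2 or v.lossless
]


def eligible(variants, route_is_multichannel):
    """Toy rule: a multi-channel route only considers multi-channel variants."""
    if route_is_multichannel:
        multi = [v for v in variants if v.channels > 2]
        if multi:
            return multi
    return [v for v in variants if v.channels <= 2]


def best_sustainable(variants, throughput_bps):
    """Highest-bitrate variant that fits the throughput, or None (a stall)."""
    fits = [v for v in variants if v.bandwidth <= throughput_bps]
    return max(fits, key=lambda v: v.bandwidth) if fits else None


if __name__ == "__main__":
    for label, ladder in (("full", FULL_LADDER), ("no 5.1 AAC", NO_MULTICHANNEL_AAC)):
        choice = best_sustainable(eligible(ladder, True), 1_000_000)
        print(label, "->", choice.name if choice else "stall")
```

With 1 megabit per second of throughput on a multi-channel route, the full ladder settles on AAC-LC 5.1, while the ladder without multi-channel AAC has nothing that fits.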
Now, because we mandate the inclusion of stereo AAC in all playlists, the playback of the stereo will still adaptively scale from a low bitrate AAC to a high bitrate AAC all the way up to lossless stereo. So we've learnt a lot here; let's summarize everything that we've learned. I've introduced three new audio codecs, and we've talked about the need for including DRC metadata in your media encodings and its importance in the industry going forward. And we've also talked about the considerations for using multi-channel lossless and utilizing multi-channel AAC as a means to scale up to those very high data rates. So what should you do when you get home? Well, maybe you are already home. Well, consider how to employ xHE-AAC to target customers on low throughput networks. And then consider how you can use this codec to better utilize their existing throughput to deliver better audio fidelity. And do consider how to use lossless audio codecs if it's applicable to your software or service. I hope you've learnt lots, and have a great rest of the conference.