2.0 Encryption

2.1 Encryption Overview

For each encrypted stream type, a protected block is identified: the unit of data over which the protection process is performed. A protected block of audio is typically an audio frame; H.264 video protected blocks are the bodies of specific types of network abstraction layer (NAL) units.

Within each protected block, an integer number of 16-byte blocks are encrypted using AES-128 cipher block chaining (CBC) mode with no padding, as specified in NIST Special Publication 800-38A. Chaining is confined to a single protected block: the initialization vector (IV) must be reset to its original value at the start of each new protected block.
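The chaining and IV-reset rule can be illustrated with a minimal Python sketch. It implements CBC mode generically over a caller-supplied 16-byte block function; the helper names (`cbc_encrypt`, `cbc_decrypt`, `encrypt_protected_blocks`) and the toy XOR "cipher" are hypothetical stand-ins so that the sketch runs with the standard library only. A real implementation would plug in AES-128 from a vetted cryptographic library.

```python
from typing import Callable, List

BLOCK = 16
BlockFn = Callable[[bytes], bytes]  # transforms exactly one 16-byte block


def xor16(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(encrypt_block: BlockFn, iv: bytes, data: bytes) -> bytes:
    """CBC-encrypt whole 16-byte blocks; no padding is added."""
    if len(data) % BLOCK:
        raise ValueError("input must be a multiple of 16 bytes")
    prev, out = iv, bytearray()
    for i in range(0, len(data), BLOCK):
        prev = encrypt_block(xor16(data[i:i + BLOCK], prev))  # chain
        out += prev
    return bytes(out)


def cbc_decrypt(decrypt_block: BlockFn, iv: bytes, data: bytes) -> bytes:
    if len(data) % BLOCK:
        raise ValueError("input must be a multiple of 16 bytes")
    prev, out = iv, bytearray()
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out += xor16(decrypt_block(block), prev)
        prev = block
    return bytes(out)


def encrypt_protected_blocks(encrypt_block: BlockFn, iv: bytes,
                             blocks: List[bytes]) -> List[bytes]:
    # Every protected block restarts from the same IV, so no chaining
    # state crosses a protected-block boundary.
    return [cbc_encrypt(encrypt_block, iv, b) for b in blocks]


# Toy stand-in for AES-128 so the sketch runs without third-party code.
# It is NOT a secure cipher.
TOY_KEY = bytes(range(16))


def toy_block(block: bytes) -> bytes:
    return xor16(block, TOY_KEY)  # XOR is its own inverse
```

Because the IV is reset, two protected blocks with identical contents produce identical ciphertext under the same key; chaining only affects blocks within one protected block.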

2.2 H.264 Video Streams

Stream encryption is performed within specified NAL units, in byte-stream form using start codes, as detailed in Annex B of ISO/IEC 14496-10. Video must be encoded with H.264 (AVC), ISO/IEC 14496-10, when this specification is in use. NAL units of type 1 and type 5 (coded slices of non-IDR and IDR pictures, respectively) must be encrypted to this specification; NAL units of all other types must not be encrypted. Listing 2-1 shows the format of a NAL unit that contains encrypted data.

Listing 2-1  Encryption of NAL units

Encrypted_nal_unit () {
    nal_unit_type_byte                // 1 byte
    unencrypted_leader                // 31 bytes
    while (bytes_remaining() > 0) {
        if (bytes_remaining() > 16) {
            encrypted_block           // 16 bytes
        }
        unencrypted_block             // MIN(144, bytes_remaining()) bytes
    }
}

Each NAL unit is formed with start code emulation prevention applied. The preceding start code is not part of the protected block and is not encrypted. The byte containing the nal_unit_type value, plus the 31 bytes that follow it, are unencrypted. The contiguous data that follows these 32 unencrypted bytes is the protected block. A protected block of 16 bytes or fewer has no encryption applied; therefore, a NAL unit of 48 bytes or fewer (32 unencrypted bytes plus at most 16 bytes of protected block) is completely unencrypted.

The protected block uses 10% skip encryption: each 16-byte block of encrypted data is followed by up to 144 bytes (nine 16-byte blocks) of unencrypted data. A 16-byte block is encrypted only if more than 16 bytes of the NAL unit remain, so an encrypted NAL unit always ends with at least one unencrypted byte. If any block is encrypted (because the NAL unit’s length is greater than 48 bytes), start code emulation prevention must again be applied over the entire NAL unit, including the unencrypted sections.
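The skip pattern of Listing 2-1 reduces to a small offset calculation. This Python sketch (the function name `encrypted_offsets` is hypothetical) returns the byte offsets, counted from the nal_unit_type byte, of each 16-byte block that is encrypted. It assumes the NAL unit length is measured with start code emulation prevention applied and with the start code excluded, as described above.

```python
from typing import List

LEADER = 32   # nal_unit_type byte + 31 unencrypted bytes
BLOCK = 16    # one encrypted AES block
SKIP = 144    # up to nine unencrypted 16-byte blocks


def encrypted_offsets(nal_len: int) -> List[int]:
    """Offsets of the 16-byte blocks that Listing 2-1 encrypts."""
    offsets, pos = [], LEADER
    while nal_len - pos > 0:
        if nal_len - pos > BLOCK:       # strictly more than 16 bytes remain
            offsets.append(pos)
            pos += BLOCK
        pos += min(SKIP, nal_len - pos)  # unencrypted run
    return offsets
```

For example, a 209-byte NAL unit has encrypted blocks at offsets 32 and 192, while a 208-byte unit has only the block at offset 32, because its final 16 bytes do not exceed the 16-byte threshold.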

To encrypt an H.264 stream, start with a byte stream that has had start code emulation prevention applied. NAL units of type 1 and type 5 with lengths greater than 48 bytes must be protected as defined above. Then, for those NAL units only, start code emulation prevention must be re-applied over the entire NAL unit.
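A sketch of these encryption steps in Python follows. It assumes each NAL unit has already been split out of the Annex B byte stream (start code removed) and takes the AES-128 block operation as a caller-supplied function. It also assumes that the CBC chain runs through the encrypted 16-byte blocks only, skipping the unencrypted runs between them. `add_emulation_prevention`, `encrypted_offsets`, and `encrypt_nal_unit` are illustrative names, not a published API, and the emulation prevention routine omits the H.264 rule for a trailing 0x00 byte.

```python
from typing import Callable, List

LEADER, BLOCK, SKIP = 32, 16, 144
BlockFn = Callable[[bytes], bytes]  # encrypts exactly one 16-byte block


def add_emulation_prevention(nal: bytes) -> bytes:
    """Insert 0x03 wherever 0x00 0x00 is followed by a byte 0x00-0x03.
    (Simplified: the trailing-0x00 rule for cabac_zero_words is omitted.)"""
    out, zeros = bytearray(), 0
    for byte in nal:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)


def encrypted_offsets(nal_len: int) -> List[int]:
    """Offsets of the 16-byte blocks that Listing 2-1 encrypts."""
    offsets, pos = [], LEADER
    while nal_len - pos > 0:
        if nal_len - pos > BLOCK:
            offsets.append(pos)
            pos += BLOCK
        pos += min(SKIP, nal_len - pos)
    return offsets


def encrypt_nal_unit(nal: bytes, encrypt_block: BlockFn, iv: bytes) -> bytes:
    """nal: one NAL unit (no start code) that already has emulation
    prevention applied, as it appears in the input byte stream."""
    if (nal[0] & 0x1F) not in (1, 5) or len(nal) <= 48:
        return nal                      # left exactly as it was
    data, prev = bytearray(nal), iv     # IV reset for each protected block
    for off in encrypted_offsets(len(data)):
        plain = data[off:off + BLOCK]
        prev = encrypt_block(bytes(p ^ c for p, c in zip(plain, prev)))
        data[off:off + BLOCK] = prev    # chain continues to the next block
    # Ciphertext may contain 00 00 0x patterns, so escape the whole unit again.
    return add_emulation_prevention(bytes(data))
```

Passing `lambda block: block` as `encrypt_block` makes the CBC arithmetic easy to trace by hand in unit tests; real content requires AES-128 with the content key.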

To decrypt an H.264 stream, NAL units of type 1 and type 5 must be identified and unprotected. For each NAL unit of either type, start code emulation prevention must be removed unless the NAL unit’s length is 48 bytes or fewer. The encrypted 16-byte blocks must then be located, following the pattern in Listing 2-1, and decrypted. Because this restores the emulation prevention bytes of the original stream, the resulting bitstream can be processed by a standard H.264 decoder.
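The decryption steps can be sketched the same way. Under the same assumptions (NAL unit without its start code, caller-supplied AES-128 block decryption, hypothetical function names, and a CBC chain that runs through the encrypted blocks only), the sketch removes one layer of emulation prevention and then decrypts the blocks selected by Listing 2-1.

```python
from typing import Callable, List

LEADER, BLOCK, SKIP = 32, 16, 144
BlockFn = Callable[[bytes], bytes]  # decrypts exactly one 16-byte block


def remove_emulation_prevention(nal: bytes) -> bytes:
    """Drop each 0x03 that follows 0x00 0x00 (inverse of insertion)."""
    out, zeros = bytearray(), 0
    for byte in nal:
        if zeros >= 2 and byte == 0x03:
            zeros = 0                   # skip the emulation prevention byte
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)


def encrypted_offsets(nal_len: int) -> List[int]:
    """Offsets of the 16-byte blocks that Listing 2-1 encrypts."""
    offsets, pos = [], LEADER
    while nal_len - pos > 0:
        if nal_len - pos > BLOCK:
            offsets.append(pos)
            pos += BLOCK
        pos += min(SKIP, nal_len - pos)
    return offsets


def decrypt_nal_unit(nal: bytes, decrypt_block: BlockFn, iv: bytes) -> bytes:
    """nal: one received NAL unit, start code excluded. Returns the unit with
    its original emulation prevention bytes, ready for an H.264 decoder."""
    if (nal[0] & 0x1F) not in (1, 5) or len(nal) <= 48:
        return nal                      # never encrypted, never re-escaped
    data, prev = bytearray(remove_emulation_prevention(nal)), iv
    for off in encrypted_offsets(len(data)):
        cipher = bytes(data[off:off + BLOCK])
        plain = decrypt_block(cipher)
        data[off:off + BLOCK] = bytes(p ^ c for p, c in zip(plain, prev))
        prev = cipher                   # next block chains off this ciphertext
    return bytes(data)
```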

2.3 Audio Streams

The encryption technology defined by this specification supports three audio formats: Advanced Audio Coding (AAC), ISO/IEC 14496-3; AC-3 audio (also known as Dolby Digital), ETSI TS 102 366 v1.3.1; and Enhanced AC-3, ETSI TS 102 366 v1.3.1.

2.3.1 Audio Formats

2.3.1.1 AAC

An AAC protected block is an audio frame that includes an audio data transport stream (ADTS) header, as shown in Listing 2-2.

Listing 2-2  Encryption of AAC Audio Frames

Encrypted_AAC_Frame () {
    ADTS_Header                        // 7 or 9 bytes
    unencrypted_leader                 // 16 bytes
    while (bytes_remaining() >= 16) {
        encrypted_block                // 16 bytes
    }
    unencrypted_trailer                // 0-15 bytes
}

The ADTS header, which is 7 bytes long (9 bytes when it carries a CRC, that is, when protection_absent is 0), and the first 16 bytes of the frame after it are unencrypted. The contiguous data section that follows is encrypted; its size, in bytes, must be an integer multiple of 16 and may be zero. The AAC frame ends with 0 to 15 unencrypted bytes. Start code emulation prevention is not performed on the encrypted frame.
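The AAC layout can be computed from the ADTS header alone. This Python sketch assumes `frame` holds exactly one ADTS frame; it reads protection_absent (the lowest bit of the second header byte in ISO/IEC 14496-3) to choose the 7- or 9-byte header length. `adts_header_length` and `aac_encrypted_range` are hypothetical helper names.

```python
from typing import Tuple


def adts_header_length(frame: bytes) -> int:
    """7 bytes when protection_absent == 1, 9 bytes when a CRC follows."""
    if len(frame) < 7 or frame[0] != 0xFF or (frame[1] & 0xF0) != 0xF0:
        raise ValueError("not an ADTS frame (syncword 0xFFF missing)")
    return 7 if frame[1] & 0x01 else 9


def aac_encrypted_range(frame: bytes) -> Tuple[int, int]:
    """(start, end) byte offsets of the encrypted section; may be empty."""
    start = min(adts_header_length(frame) + 16, len(frame))  # clear leader
    end = start + (len(frame) - start) // 16 * 16            # whole blocks
    return start, end                     # frame[end:] is the 0-15 byte trailer
```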

2.3.1.2 AC-3

An AC-3 protected block is the full audio frame, a syncframe(), as shown in Listing 2-3.

Listing 2-3  Encryption of AC-3 Audio Frames

Encrypted_AC3_Frame () {
    unencrypted_leader                 // 16 bytes
    while (bytes_remaining() >= 16) {
        encrypted_block                // 16 bytes
    }
    unencrypted_trailer                // 0-15 bytes
}

The first 16 bytes, starting with the syncframe() header, are not encrypted. The contiguous data section that follows is encrypted. The AC-3 frame ends with 0 to 15 unencrypted bytes. Start code emulation prevention is not performed on the encrypted part of the frame.
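For AC-3 the split depends only on the frame length. The sketch below (hypothetical `split_ac3_frame`, assuming `frame` is one complete syncframe) returns the unencrypted leader, the encrypted section, and the unencrypted trailer.

```python
from typing import Tuple

AC3_SYNCWORD = b"\x0b\x77"  # first two bytes of every AC-3 syncframe
LEADER = 16


def split_ac3_frame(frame: bytes) -> Tuple[bytes, bytes, bytes]:
    """Return (unencrypted_leader, encrypted_section, unencrypted_trailer)."""
    if frame[:2] != AC3_SYNCWORD:
        raise ValueError("frame does not start with the AC-3 syncword")
    start = min(LEADER, len(frame))
    end = start + (len(frame) - start) // 16 * 16  # whole 16-byte blocks only
    return frame[:start], frame[start:end], frame[end:]
```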

2.3.1.3 Enhanced AC-3

An Enhanced AC-3 audio frame contains one or more syncframes (as defined in ETSI TS 102 366 v1.3.1). An Enhanced AC-3 protected block is a single syncframe(). As an exception to the IV reset rule in Section 2.1, the AES-128 cipher block chaining (CBC) initialization vector (IV) is not reset at syncframe boundaries within an Enhanced AC-3 audio frame; it is reset only at the beginning of each audio frame.

Listing 2-4  Encryption of Enhanced AC-3 Audio Frames

Encrypted_Enhanced_AC3_syncframe () {
    unencrypted_leader                 // 16 bytes
    while (bytes_remaining() >= 16) {
        encrypted_block                // 16 bytes
    }
    unencrypted_trailer                // 0-15 bytes
}

The first 16 bytes, starting with the syncframe() header, are not encrypted. The contiguous data section that follows is encrypted. The syncframe ends with 0 to 15 unencrypted bytes. Start code emulation prevention is not performed on the encrypted part of the syncframe.
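One reading of the Enhanced AC-3 IV rule is that the CBC state carries over from the last encrypted block of one syncframe to the first encrypted block of the next, and returns to the IV only when a new audio frame begins. The Python sketch below follows that reading; `encrypt_eac3_audio_frame` is a hypothetical name, and the block cipher is a caller-supplied stand-in for AES-128.

```python
from typing import Callable, List

BLOCK = LEADER = 16
BlockFn = Callable[[bytes], bytes]  # encrypts exactly one 16-byte block


def encrypt_eac3_audio_frame(syncframes: List[bytes], encrypt_block: BlockFn,
                             iv: bytes) -> List[bytes]:
    """Encrypt all syncframes of one audio frame with a single CBC chain."""
    prev = iv                            # reset once per audio frame
    out = []
    for sf in syncframes:
        data = bytearray(sf)
        # Encrypt each whole 16-byte block after the 16-byte clear leader.
        for off in range(LEADER, len(sf) - BLOCK + 1, BLOCK):
            block = bytes(a ^ b for a, b in zip(data[off:off + BLOCK], prev))
            prev = encrypt_block(block)  # not reset between syncframes
            data[off:off + BLOCK] = prev
        out.append(bytes(data))
    return out
```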

Figure 2-1  Sample encryption of an Enhanced AC-3 packet

2.3.2 Audio Setup Information

The audio setup information must be supplied when a stream is encrypted in conformance with this specification. Its format, in which all multi-byte fields are big-endian, is shown in Listing 2-5.

Listing 2-5  Setup Information Format

audio_setup_information() {
    audio_type               // 4 bytes
    priming                  // 2 bytes
    version                  // 1 byte
    setup_data_length        // 1 byte
    setup_data               // setup_data_length bytes
}

The first field is a 32-bit format identifier, followed by a 16-bit priming field and an 8-bit version field. This is followed by format-specific data: first an 8-bit value containing the length, in bytes, of the format-specific data and then the format-specific data itself in an array of bytes. The setup information must be packed, with no alignment padding. The size of the setup information is 8 bytes plus the size of the format-specific data.

The field values are:

  • audio_type—as defined in the following sections; identifies the type of setup data carried.

  • priming—set to 0x0000 for AC-3 or Enhanced AC-3. For AAC, retrieve this value from the encoder using the Apple encoding API. If a non-Apple encoder is used and does not provide a priming value, set it to 0x0000. This may cause incorrect audio/video synchronization if the encoder’s actual priming differs from the value supplied to the AAC decoder when the content is rendered.

  • version—set to 0x01.

  • setup_data_length—the number of bytes in the following setup data.

  • setup_data—format-specific information, as defined in the following sections.
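A minimal Python packing of Listing 2-5 can rely on the standard `struct` module, whose `>` prefix selects big-endian byte order with no alignment padding, matching the packed layout described above. `pack_audio_setup_information` and `unpack_audio_setup_information` are hypothetical helper names.

```python
import struct
from typing import Tuple

HEADER = ">4sHBB"  # audio_type, priming, version, setup_data_length (8 bytes)


def pack_audio_setup_information(audio_type: bytes, priming: int,
                                 setup_data: bytes, version: int = 1) -> bytes:
    if len(audio_type) != 4:
        raise ValueError("audio_type must be a four-character code")
    if len(setup_data) > 255:
        raise ValueError("setup_data_length is a single byte")
    return struct.pack(HEADER, audio_type, priming, version,
                       len(setup_data)) + setup_data


def unpack_audio_setup_information(blob: bytes) -> Tuple[bytes, int, int, bytes]:
    audio_type, priming, version, length = struct.unpack_from(HEADER, blob)
    setup_data = blob[8:8 + length]
    if len(setup_data) != length:
        raise ValueError("truncated setup_data")
    return audio_type, priming, version, setup_data
```

For instance, packing 'zaac' with an illustrative priming value of 0x0840 and the two-byte AudioSpecificConfig 0x12 0x10 (AAC-LC, 44.1 kHz, stereo) yields 10 bytes: the 8-byte fixed part plus 2 bytes of setup data.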

Table 2-1  Audio_type format identifiers

Audio Format     Format Identifier
AAC-LC           'zaac'
AAC-HEv1         'zach'
AAC-HEv2         'zacp'
AC-3             'zac3'
Enhanced AC-3    'zec3'

2.3.2.1 AAC Setup

For AAC, the setup_data in the audio_setup_information is an AudioSpecificConfig() value, as defined in Section 1.6.2.1 of ISO/IEC 14496-3. This value is called DecoderSpecificInfo in MPEG-4.

2.3.2.2 AC-3 Setup

For AC-3, the setup_data in the audio_setup_information is the first 10 bytes of the audio data (the syncframe()). This comprises the syncinfo() structure and the initial part of the bsi() structure, as defined in sections 5.3.1 and 5.3.2 of ETSI TS 102 366 v1.3.1.
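Extracting the AC-3 setup data is a fixed-length copy. The sketch below (hypothetical `ac3_audio_setup_information`) takes the first 10 bytes of a syncframe and wraps them in the Listing 2-5 layout with the 'zac3' identifier from Table 2-1, priming 0x0000, and version 0x01.

```python
import struct


def ac3_audio_setup_information(syncframe: bytes) -> bytes:
    if len(syncframe) < 10 or syncframe[:2] != b"\x0b\x77":
        raise ValueError("expected an AC-3 syncframe starting with 0x0B77")
    # syncinfo() is 5 bytes; the next 5 bytes are the start of bsi().
    setup_data = syncframe[:10]
    return struct.pack(">4sHBB", b"zac3", 0x0000, 0x01,
                       len(setup_data)) + setup_data
```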

2.3.2.3 Enhanced AC-3 Setup

For Enhanced AC-3, the setup_data in the audio_setup_information is the contents of the 'dec3' EC3SpecificBox defined in section F.6 of ETSI TS 102 366 v1.3.1, excluding the BoxHeader.Size and BoxHeader.Type fields.
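For Enhanced AC-3 the work is stripping the 8-byte box header. This sketch (hypothetical `dec3_setup_data`) assumes the input is one complete EC3SpecificBox that uses a 32-bit size field rather than a 64-bit largesize.

```python
import struct


def dec3_setup_data(box: bytes) -> bytes:
    """Drop BoxHeader.Size (4 bytes) and BoxHeader.Type ('dec3')."""
    if len(box) < 8:
        raise ValueError("box is shorter than its 8-byte header")
    size, box_type = struct.unpack_from(">I4s", box)
    if box_type != b"dec3":
        raise ValueError("not a 'dec3' EC3SpecificBox")
    if size != len(box):
        raise ValueError("BoxHeader.Size does not match the buffer length")
    return box[8:]
```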

2.3.3 Audio Setup Carriage

2.3.3.1 Transport Audio Stream Setup

Format identifier: 'apad'

In transport streams, the audio setup information is carried in a registration_descriptor(), as defined in ISO/IEC 13818-1, sections 2.6.8 and 2.6.9 and Table 2-45.
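A transport stream multiplexer might build the descriptor as follows. The sketch assumes, as one reading of the text above, that the audio setup information occupies the registration_descriptor’s additional_identification_info bytes after the 'apad' format_identifier; descriptor_tag 0x05 identifies a registration_descriptor in ISO/IEC 13818-1. `apad_registration_descriptor` is a hypothetical name.

```python
REGISTRATION_DESCRIPTOR_TAG = 0x05


def apad_registration_descriptor(setup_info: bytes) -> bytes:
    # format_identifier 'apad', then (by assumption) the setup information
    body = b"apad" + setup_info
    if len(body) > 255:
        raise ValueError("descriptor_length is a single byte")
    return bytes([REGISTRATION_DESCRIPTOR_TAG, len(body)]) + body
```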

2.3.3.2 Elementary Audio Stream Setup

Format identifier: 'PRIV'

In elementary streams, the audio setup information is carried inside an ID3 Private Frame, as defined in ID3 tag version 2.4.0. The owner identifier is com.apple.streaming.audioDescription.
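For elementary streams, the PRIV frame can be wrapped in a minimal ID3v2.4.0 tag. ID3v2.4 encodes both the tag size and the frame size as 28-bit synchsafe integers (7 bits per byte), and a PRIV frame body is the NUL-terminated owner identifier followed by the private data. The helper names are hypothetical, and the sketch writes a tag with no flags and no other frames.

```python
OWNER = b"com.apple.streaming.audioDescription"


def synchsafe(n: int) -> bytes:
    """Encode n as a 4-byte synchsafe integer (high bit of each byte clear)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("value does not fit in 28 bits")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def id3_priv_tag(setup_info: bytes) -> bytes:
    body = OWNER + b"\x00" + setup_info               # owner id, NUL, data
    frame = b"PRIV" + synchsafe(len(body)) + b"\x00\x00" + body
    # Tag header: "ID3", version 2.4.0, no flags, size excluding this header
    return b"ID3" + bytes([4, 0, 0]) + synchsafe(len(frame)) + frame
```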

2.4 Other Stream Types

Stream types other than audio or video are not encrypted.