Apple Developer Connection

Sound Media

Sound media is used to store compressed and uncompressed audio data in QuickTime movies. It has a media type of 'soun'. This section describes the sound sample description and the storage format of sound files using various data formats.

In this section:

Sound Sample Descriptions
Sound Sample Data


Sound Sample Descriptions

The sound sample description contains information that defines how to interpret sound media data. This sample description is based on the standard sample description, as described in “Sample Description Atoms.”

The data format field contains the format of the audio data. This may specify a compression format or one of several uncompressed audio formats. Table 3-7 shows a list of some supported sound formats.

Table 3-7  Partial list of supported QuickTime audio formats.

Format                    4-Character code  Description

Not specified             0x00000000        This format descriptor should not be used, but may be found in some files. Samples are assumed to be stored in either 'raw ' or 'twos' format, depending on the sample size field in the sound description.
kSoundNotCompressed       'NONE'            This format descriptor should not be used, but may be found in some files. Samples are assumed to be stored in either 'raw ' or 'twos' format, depending on the sample size field in the sound description.
k8BitOffsetBinaryFormat   'raw '            Samples are stored uncompressed, in offset-binary format (values range from 0 to 255; 128 is silence). These are stored as 8-bit offset binaries.
k16BitBigEndianFormat     'twos'            Samples are stored uncompressed, in two’s-complement format (sample values range from -128 to 127 for 8-bit audio, and -32768 to 32767 for 16-bit audio; 0 is always silence). These samples are stored in 16-bit big-endian format.
k16BitLittleEndianFormat  'sowt'            16-bit little-endian, two’s-complement
kMACE3Compression         'MAC3'            Samples have been compressed using MACE 3:1. (Obsolete.)
kMACE6Compression         'MAC6'            Samples have been compressed using MACE 6:1. (Obsolete.)
kIMACompression           'ima4'            Samples have been compressed using IMA 4:1.
kFloat32Format            'fl32'            32-bit floating point
kFloat64Format            'fl64'            64-bit floating point
k24BitFormat              'in24'            24-bit integer
k32BitFormat              'in32'            32-bit integer
kULawCompression          'ulaw'            uLaw 2:1
kALawCompression          'alaw'            aLaw 2:1
kMicrosoftADPCMFormat     0x6D730002        Microsoft ADPCM-ACM code 2
kDVIIntelIMAFormat        0x6D730011        DVI/Intel IMA ADPCM-ACM code 17
kDVAudioFormat            'dvca'            DV Audio
kQDesignCompression       'QDMC'            QDesign music
kQDesign2Compression      'QDM2'            QDesign music version 2
kQUALCOMMCompression      'Qclp'            QUALCOMM PureVoice
kMPEGLayer3Format         0x6D730055        MPEG-1 layer 3, CBR only (pre-QT4.1)
kFullMPEGLay3Format       '.mp3'            MPEG-1 layer 3, CBR & VBR (QT4.1 and later)
kMPEG4AudioFormat         'mp4a'            MPEG-4 audio

Sound Sample Description (Version 0)

There are currently two versions of the sound sample description, version 0 and version 1. Version 0 supports only uncompressed audio in raw ('raw ') or twos-complement ('twos') format, although these are sometimes incorrectly specified as either 'NONE' or 0x00000000.
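A reader that meets one of these mis-specified codes can normalize it before decoding. The following is a minimal sketch, assuming that a sample size of 8 implies 'raw ' and a sample size of 16 implies 'twos'. The FOURCC macro and the normalize_data_format function are illustrative names, not part of the QuickTime API.

```c
#include <stdint.h>

/* Pack four characters into a 32-bit code, most significant byte first. */
#define FOURCC(a, b, c, d) \
    (((uint32_t)(uint8_t)(a) << 24) | ((uint32_t)(uint8_t)(b) << 16) | \
     ((uint32_t)(uint8_t)(c) << 8)  |  (uint32_t)(uint8_t)(d))

/* Hypothetical helper: map 0x00000000 and 'NONE' to the format they stand for.
   Assumption: sample size 8 means 'raw ', any other size means 'twos'.
   Every other data format is returned unchanged. */
uint32_t normalize_data_format(uint32_t data_format, uint16_t sample_size_bits)
{
    if (data_format != 0 && data_format != FOURCC('N', 'O', 'N', 'E'))
        return data_format;
    return (sample_size_bits == 8) ? FOURCC('r', 'a', 'w', ' ')
                                   : FOURCC('t', 'w', 'o', 's');
}
```

Normalizing once, at parse time, keeps the rest of a reader free of special cases for the two deprecated codes.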

Version

A 16-bit integer that holds the sample description version (currently 0 or 1).

Revision level

A 16-bit integer that must be set to 0.

Vendor

A 32-bit integer that must be set to 0.

Number of channels

A 16-bit integer that indicates the number of sound channels used by the sound sample. Set to 1 for monaural sounds, 2 for stereo sounds. Higher numbers of channels are not supported.

Sample size (bits)

A 16-bit integer that specifies the number of bits in each uncompressed sound sample. Allowable values are 8 or 16. Formats using more than 16 bits per sample set this field to 16 and use sound description version 1.

Compression ID

A 16-bit integer that must be set to 0 for version 0 sound descriptions. This may be set to –2 for some version 1 sound descriptions; see “Redefined Sample Tables.”

Packet size

A 16-bit integer that must be set to 0.

Sample rate

A 32-bit unsigned fixed-point number (16.16) that indicates the rate at which the sound samples were obtained. The integer portion of this number should match the media’s time scale. Many older version 0 files have values of 22254.5454 or 11127.2727, but most files have integer values, such as 44100. Sample rates greater than 2^16 are not supported.

Version 0 of the sound description format assumes uncompressed audio in 'raw ' or 'twos' format, 1 or 2 channels, 8 or 16 bits per sample, and a compression ID of 0.
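The field list above adds up to 20 bytes. As a rough sketch, the fields could be read as shown below, assuming big-endian storage and assuming that buf already points at the version field (that is, past the fields of the standard sample description). The SoundDescV0 type and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the version 0 fields listed above. */
typedef struct {
    uint16_t version;
    uint16_t revision;
    uint32_t vendor;
    uint16_t channels;
    uint16_t sample_size_bits;
    int16_t  compression_id;
    uint16_t packet_size;
    uint32_t sample_rate_fixed;   /* unsigned 16.16 fixed point */
} SoundDescV0;

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reinterpret a 16-bit pattern as a signed two's-complement value. */
static int16_t to_signed16(uint16_t u)
{
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Parse the 20 bytes starting at the version field.
   Returns 0 on success, -1 if fewer than 20 bytes are available. */
int parse_sound_desc_v0(const uint8_t *buf, size_t len, SoundDescV0 *out)
{
    if (len < 20)
        return -1;
    out->version           = read_be16(buf);
    out->revision          = read_be16(buf + 2);
    out->vendor            = read_be32(buf + 4);
    out->channels          = read_be16(buf + 8);
    out->sample_size_bits  = read_be16(buf + 10);
    out->compression_id    = to_signed16(read_be16(buf + 12));
    out->packet_size       = read_be16(buf + 14);
    out->sample_rate_fixed = read_be32(buf + 16);
    return 0;
}

/* Convert the 16.16 fixed-point sample rate to hertz. */
double sample_rate_hz(uint32_t fixed)
{
    return fixed / 65536.0;
}
```

A rate of 44100 Hz is stored as 0xAC440000: the integer part occupies the upper 16 bits and the fractional part the lower 16 bits.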

Sound Sample Description (Version 1)

The version field in the sample description is set to 1 for this version of the sound description structure. In version 1 of the sound description, introduced in QuickTime 3, the sound description record is extended by 4 fields, each 4 bytes long, and includes the ability to add atoms to the sound description.

These added fields are used to support out-of-band configuration settings for decompression and to allow some parsing of compressed QuickTime sound tracks without requiring the services of a decompressor.

These fields introduce the idea of a packet. For uncompressed audio, a packet is a sample from a single channel. For compressed audio, this field has no real meaning; by convention, it is treated as 1/number-of-channels.

These fields also introduce the idea of a frame. For uncompressed audio, a frame is one sample from each channel. For compressed audio, a frame is a compressed group of samples whose format is dependent on the compressor.

Important:  The value of all these fields has different meaning for compressed and uncompressed audio. The meaning may not be easily deducible from the field name.

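The four new fields, each an unsigned 32-bit integer, are samplesPerPacket, bytesPerPacket, bytesPerFrame, and bytesPerSample.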

When capturing or compressing audio using the QuickTime API, the value of these fields can be obtained by calling the Apple Sound Manager’s GetCompression function. Historically, however, the value returned for the bytes per frame field was not always reliable, so this field was set by multiplying bytes per packet by the number of channels.

To facilitate playback on devices that support only one or two channels of audio in 'raw ' or 'twos' format (such as most early Macintosh and Windows computers), all other uncompressed audio formats are treated as compressed formats, allowing a simple “decompressor” component to perform the necessary format conversion during playback. The audio samples are treated as opaque compressed frames for these data types, and the fields for sample size and bytes per sample are not meaningful.

The new fields correspond to the CompressionInfo structure used by the Macintosh Sound Manager (which uses 16-bit values) to describe the compression ratio of fixed ratio audio compression algorithms. If these fields are not used, they are set to 0. File readers only need to check to see if samplesPerPacket is 0.
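The two rules stated above (derive bytes per frame from bytes per packet, and treat a zero samplesPerPacket as "fields not used") can be sketched as follows. The V1Fields type and both function names are hypothetical; only the field names come from the sound description.

```c
#include <stdint.h>

/* The four fields added by the version 1 sound description. */
typedef struct {
    uint32_t samplesPerPacket;
    uint32_t bytesPerPacket;
    uint32_t bytesPerFrame;
    uint32_t bytesPerSample;
} V1Fields;

/* A reader only needs to test samplesPerPacket to learn whether
   the version 1 fields carry any information. */
int v1_fields_in_use(const V1Fields *f)
{
    return f->samplesPerPacket != 0;
}

/* Historical workaround: bytes per frame is bytes per packet
   multiplied by the number of channels. */
uint32_t derive_bytes_per_frame(uint32_t bytes_per_packet, uint16_t channels)
{
    return bytes_per_packet * (uint32_t)channels;
}
```

For example, a stereo track whose packets are 34 bytes long would get a bytes per frame value of 68 (the packet size here is only an illustrative number).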

Redefined Sample Tables

If the compression ID in the sample description is set to –2, the sound track uses redefined sample tables optimized for compressed audio.

Unlike video media, the data structures for QuickTime sound media were originally designed for uncompressed samples. The extended version 1 sound description structure provides a great deal of support for compressed audio, but it does not deal directly with the sample table atoms that point to the media data.

The ordinary sample tables do not point to compressed frames, which are the fundamental units of compressed audio data. Instead, they appear to point to individual uncompressed audio samples, each one byte in size, within the compressed frames. When used with the QuickTime API, QuickTime compensates for this fiction in a largely transparent manner, but attempting to parse the sound samples using the original sample tables alone can be quite complicated.

With the introduction of support for the playback of variable bit-rate (VBR) audio in QuickTime 4.1, the contents of a number of these fields were redefined, so that a frame of compressed audio is treated as a single media sample. The sample-to-chunk and chunk offset atoms point to compressed frames, and the sample size table documents the size of the frames. The size is constant for CBR audio, but can vary for VBR.

The time-to-sample table documents the duration of the frames. If the time scale is set to the sampling rate, which is typical, the duration equals the number of uncompressed samples in each frame, which is usually constant even for VBR (it is common to use a fixed frame duration). If a different media timescale is used, it is necessary to convert from timescale units to sampling rate units to calculate the number of samples.
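The conversion from timescale units to sample counts is a single proportion. The sketch below rounds to the nearest sample; the rounding choice and the function name are assumptions made for illustration.

```c
#include <stdint.h>

/* Convert a frame duration expressed in media timescale units into a
   count of uncompressed samples at the given sampling rate:
       samples = duration * sample_rate / timescale
   The result is rounded to the nearest sample. Returns 0 when the
   timescale is 0, which is not a valid timescale. */
uint64_t frame_duration_to_samples(uint32_t duration,
                                   uint32_t timescale,
                                   uint32_t sample_rate)
{
    if (timescale == 0)
        return 0;
    return ((uint64_t)duration * sample_rate + timescale / 2) / timescale;
}
```

When the timescale equals the sampling rate, the function returns the duration unchanged, which matches the typical case described above.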

This change in the meaning of the sample tables allows you to use the tables accurately to find compressed frames.

To indicate that this new meaning is used, a version 1 sound description is used and the compression ID field is set to –2. The samplesPerPacket field and the bytesPerSample field are not necessarily meaningful for variable bit rate audio, but these fields should be set correctly in cases where the values are constant; the other two new fields (bytesPerPacket and bytesPerFrame) are reserved and should be set to 0.

If the compression ID field is set to zero, the sample tables describe uncompressed audio samples and cannot be used directly to find and manipulate compressed audio frames. QuickTime has built-in support that allows programmers to act as if these sample tables pointed to uncompressed 1-byte audio samples.
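A file reader can decide how to interpret the sample tables from the version and compression ID alone. The following sketch encodes only the two cases described in this section and reports anything else as unknown; the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    TABLES_UNCOMPRESSED_SAMPLES,  /* compression ID 0: tables describe samples */
    TABLES_COMPRESSED_FRAMES,     /* version 1, compression ID -2: tables describe frames */
    TABLES_UNKNOWN                /* any combination not covered in this section */
} SampleTableMode;

SampleTableMode sample_table_mode(uint16_t version, int16_t compression_id)
{
    if (version == 1 && compression_id == -2)
        return TABLES_COMPRESSED_FRAMES;
    if (compression_id == 0)
        return TABLES_UNCOMPRESSED_SAMPLES;
    return TABLES_UNKNOWN;
}
```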

Sound Sample Description Extensions

Version 1 of the sound sample description also defines how extensions are added to the SoundDescription record.

struct SoundDescriptionV1 {
    // original fields
    SoundDescription    desc;
    // fixed compression ratio information
    unsigned long   samplesPerPacket;
    unsigned long   bytesPerPacket;
    unsigned long   bytesPerFrame;
    unsigned long   bytesPerSample;
    // optional, additional atom-based fields --
    // ([long size, long type, some data], repeat)
};

All extensions to the SoundDescription record are made using atoms. That means one or more atoms can be appended to the end of the SoundDescription record using the standard [size, type] mechanism used throughout the QuickTime movie architecture.
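Walking the appended atoms is a matter of reading a size and a type, handing the payload to the caller, and skipping ahead by the size. The sketch below assumes big-endian fields, assumes that the size counts the 8-byte header, and stops after an atom whose type is zero. The walk_atoms function and the AtomVisitor callback type are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*AtomVisitor)(uint32_t type, const uint8_t *payload,
                            size_t payload_len, void *ctx);

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Visit each [size, type, data] atom in buf[0..len).
   Returns the number of atoms visited, or -1 if an atom's size is
   smaller than its header or runs past the end of the buffer.
   Trailing bytes too short to hold a header are ignored. */
int walk_atoms(const uint8_t *buf, size_t len, AtomVisitor visit, void *ctx)
{
    size_t pos = 0;
    int count = 0;

    while (len - pos >= 8) {
        uint32_t size = read_be32(buf + pos);
        uint32_t type = read_be32(buf + pos + 4);

        if (size < 8 || size > len - pos)
            return -1;
        if (visit != NULL)
            visit(type, buf + pos + 8, size - 8, ctx);
        count++;
        pos += size;
        if (type == 0)      /* an atom with type zero ends the list */
            break;
    }
    return count;
}
```

Because a container atom's payload is itself a sequence of atoms, the same function can be called again on the payload to descend one level.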

siSlopeAndIntercept Atom

One possible extension to the SoundDescription record is the siSlopeAndIntercept atom, which contains slope, intercept, minClip, and maxClip parameters.

At runtime, the contents of the siSlopeAndIntercept and siDecompressorSettings atoms are provided to the decompressor component through the standard SetInfo mechanism of the Sound Manager.

struct SoundSlopeAndInterceptRecord {
    Float64                 slope;
    Float64                 intercept;
    Float64                 minClip;
    Float64                 maxClip;
};
typedef struct SoundSlopeAndInterceptRecord SoundSlopeAndInterceptRecord;

siDecompressionParam Atom ('wave')

A second extension is the siDecompressionParam atom, which provides the ability to store data specific to a given audio decompressor in the SoundDescription record. Some audio decompression algorithms, such as Microsoft’s ADPCM, require a set of out-of-band values to configure the decompressor. These are stored in an atom of type siDecompressionParam.

This atom contains other atoms with audio decompressor settings and is a required extension to the sound sample description for MPEG-4 audio. A 'wave' chunk for 'mp4a' typically contains (in order) at least a 'frma' atom, an 'mp4a' atom, an 'esds' atom, and a terminator atom.

The contents of other siDecompressionParam atoms are dependent on the audio decompressor.

Size

An unsigned 32-bit integer holding the size of the decompression parameters atom.

Type

An unsigned 32-bit field containing the four-character code 'wave'.

Extension atoms

Atoms containing the necessary out-of-band decompression parameters for the sound decompressor. For MPEG-4 audio ('mp4a'), this includes elementary stream descriptor ('esds'), format ('frma'), and terminator (0x00000000) atoms.

Format atom ('frma')

This atom shows the data format of the stored sound media.

Size

An unsigned 32-bit integer holding the size of the format atom.

Type

An unsigned 32-bit field containing the four-character code 'frma'.

Data format

The value of this field is copied from the data-format field of the Sample Description Entry.

Terminator atom (0x00000000)

This atom is present to indicate the end of the sound description. It contains no data, and has a type field of zero (0x00000000) instead of a four-character code.

Size

An unsigned 32-bit integer holding the size of the terminator atom (always set to 8).

Type

An unsigned 32-bit integer set to zero (0x00000000). This is a rare instance in which the type field is not a four-character ASCII code.
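The format and terminator atoms have fixed layouts, so writing them is straightforward. The sketch below assumes big-endian storage and a destination buffer that is large enough; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write a 'frma' atom: size (12), type, then the data format copied
   from the sample description. Returns the number of bytes written. */
size_t write_frma_atom(uint8_t *dst, uint32_t data_format)
{
    write_be32(dst, 12);
    write_be32(dst + 4, 0x66726D61u);   /* 'frma' */
    write_be32(dst + 8, data_format);
    return 12;
}

/* Write a terminator atom: size 8, type 0x00000000, no data.
   Returns the number of bytes written. */
size_t write_terminator_atom(uint8_t *dst)
{
    write_be32(dst, 8);
    write_be32(dst + 4, 0);
    return 8;
}
```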

MPEG-4 Elementary Stream Descriptor ('esds') Atom

This atom is a required extension to the sound sample description for MPEG-4 audio. This atom contains an elementary stream descriptor, which is defined in ISO/IEC FDIS 14496.

Size

An unsigned 32-bit integer holding the size of the elementary stream descriptor atom.

Type

An unsigned 32-bit field containing the four-character code 'esds'.

Version

An unsigned 32-bit field set to zero.

Elementary Stream Descriptor

An elementary stream descriptor for MPEG-4 audio, as defined in the MPEG-4 specification ISO/IEC 14496.

Sound Sample Data

The format of data stored in sound samples is completely dependent on the type of the compressed data stored in the sound sample description. The following sections discuss some of the formats supported by QuickTime.

Uncompressed 8-Bit Sound

Eight-bit audio is stored in offset-binary encodings. If the data is in stereo, the left and right channels are interleaved.

Uncompressed 16-Bit Sound

Sixteen-bit audio may be stored in two’s-complement encodings. If the data is in stereo, the left and right channels are interleaved.
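The two uncompressed encodings differ only in how a stored value maps to a signed amplitude. The following sketch decodes single samples and splits an interleaved stereo buffer. It assumes that the left channel comes first in each interleaved pair; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 'raw ': 8-bit offset binary, 0..255 with 128 as silence.
   Returns the signed amplitude in the range -128..127. */
int offset_binary_to_signed(uint8_t b)
{
    return (int)b - 128;
}

/* Reinterpret a 16-bit pattern as a signed two's-complement value. */
static int16_t to_signed16(uint16_t u)
{
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* 'twos': 16-bit two's complement, big-endian. */
int16_t decode_twos16(const uint8_t *p)
{
    return to_signed16((uint16_t)(((unsigned)p[0] << 8) | p[1]));
}

/* 'sowt': 16-bit two's complement, little-endian. */
int16_t decode_sowt16(const uint8_t *p)
{
    return to_signed16((uint16_t)(((unsigned)p[1] << 8) | p[0]));
}

/* Split interleaved stereo into separate channel buffers.
   Assumption: each pair is stored left sample first. */
void deinterleave_stereo16(const int16_t *in, size_t frames,
                           int16_t *left, int16_t *right)
{
    size_t i;
    for (i = 0; i < frames; i++) {
        left[i]  = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```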

IMA, uLaw, and aLaw

Floating-Point Formats

Both kFloat32Format and kFloat64Format are floating-point uncompressed formats. Depending upon codec-specific data associated with the sample description, the floating-point values may be in big-endian (network) or little-endian (Intel) byte order. This differs from the 16-bit formats, where there is a single format for each endian layout.
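Because the byte order is not fixed by the format code, a reader has to take it as an input. The sketch below reads one 'fl32' sample. It assumes the host represents float as a 32-bit IEEE 754 value, and it leaves the question of how the byte order is discovered from the codec-specific data to the caller. The function name and the little_endian parameter are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read one 32-bit floating-point sample from p.
   little_endian is nonzero for little-endian data, zero for big-endian.
   Assumption: float on the host is a 32-bit IEEE 754 value. */
float read_f32_sample(const uint8_t *p, int little_endian)
{
    uint32_t bits;
    float value;

    if (little_endian)
        bits = ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
    else
        bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    /* Copy the bit pattern instead of casting pointers. */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```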

24- and 32-Bit Integer Formats

Both k24BitFormat and k32BitFormat are integer uncompressed formats. Depending upon codec-specific data associated with the sample description, the integer values may be in big-endian (network) or little-endian (Intel) byte order.
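A 24-bit sample has no native C type, so it must be assembled from three bytes and sign-extended. The following sketch assumes two's-complement samples and takes the byte order as a parameter; the function names are hypothetical.

```c
#include <stdint.h>

/* Read one 24-bit sample ('in24') and sign-extend it to 32 bits.
   little_endian is nonzero for little-endian data, zero for big-endian.
   Assumption: samples are two's-complement integers. */
int32_t read_s24_sample(const uint8_t *p, int little_endian)
{
    uint32_t u = little_endian
        ? ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[0]
        : ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];

    /* Bit 23 is the sign bit of a 24-bit value. */
    return (u & 0x800000u) ? (int32_t)u - 0x1000000 : (int32_t)u;
}

/* Read one 32-bit sample ('in32') under the same assumptions. */
int32_t read_s32_sample(const uint8_t *p, int little_endian)
{
    uint32_t u = little_endian
        ? ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
          ((uint32_t)p[1] << 8)  |  (uint32_t)p[0]
        : ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    /* Convert without relying on implementation-defined casts. */
    return (u & 0x80000000u)
        ? (int32_t)(u - 0x80000000u) - 2147483647 - 1
        : (int32_t)u;
}
```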

kMicrosoftADPCMFormat and kDVIIntelIMAFormat Sound Codecs

The kMicrosoftADPCMFormat and the kDVIIntelIMAFormat codec provide QuickTime interoperability with AVI and WAV files. The four-character codes used by Microsoft for their formats are numeric. To construct a QuickTime-supported codec format of this type, the Microsoft numeric ID is taken to generate a four-character code of the form 'msxx' where xx takes on the numeric ID.
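The construction can be written as a single expression: the characters 'm' and 's' occupy the upper two bytes, and the 16-bit Microsoft numeric ID occupies the lower two. The function name below is hypothetical; the resulting values match the codes listed in Table 3-7.

```c
#include <stdint.h>

/* Build the 'msxx' format code for a Microsoft numeric format ID.
   0x6D73 is the ASCII encoding of "ms". */
uint32_t ms_format_code(uint16_t microsoft_id)
{
    return 0x6D730000u | (uint32_t)microsoft_id;
}
```

For example, ID 2 (Microsoft ADPCM) yields 0x6D730002 and ID 17 (DVI/Intel IMA ADPCM) yields 0x6D730011.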

kDVAudioFormat Sound Codec

The DV audio sound codec, kDVAudioFormat, decodes audio found in a DV stream. Since a DV frame contains both video and audio, this codec knows how to skip video portions of the frame and only retrieve the audio portions. Likewise, the video codec skips the audio portions and renders only the image.

kQDesignCompression Sound Codec

The kQDesignCompression sound codec is the QDesign 1 (pre-QuickTime 4) format. Note that there is also a QDesign 2 format whose four-character code is 'QDM2'.

MPEG-1 Layer 3 (MP3) Codecs

The QuickTime MPEG layer 3 (MP3) codecs come in two flavors, as shown in Table 3-7. The first (kMPEGLayer3Format) is used exclusively in the constant bitrate (CBR) case (pre-QuickTime 4.1). The other (kFullMPEGLay3Format) is used in both the CBR and variable bitrate (VBR) cases. Note that they are the same codec underneath.

MPEG-4 Audio

MPEG-4 audio is stored as a sound track with data format 'mp4a' and certain additions to the sound sample description and sound track atom. Specifically:

The audio data is stored as an elementary MPEG-4 audio stream, as defined in ISO/IEC specification 14496-1.

Formats Not Currently in Use: MACE 3:1 and 6:1

These compression formats are obsolete: MACE 3:1 and 6:1.

These are 8-bit sound codec formats, defined as follows:

kMACE3Compression = FOUR_CHAR_CODE('MAC3'), /*MACE 3:1*/
kMACE6Compression = FOUR_CHAR_CODE('MAC6'), /*MACE 6:1*/




Last updated: 2007-09-04





Copyright © 2007 Apple Inc.