QuickTime 7 breaks free of the limitations of the Sound Manager, adding many new features and capabilities that developers can take advantage of in their audio playback and capture applications.
Notably, QuickTime 7 now supports high-resolution audio, that is, audio sampled at sample rates higher than 64 kHz and up to 192 kHz, with up to 24 channels and support for surround sound. This is in stark contrast to the implementation of the Sound Manager, which only supported mono and stereo. High-resolution audio is supported by Apple’s Core Audio technology.
The result of these new audio enhancements is as follows:
A much richer approach to sound in QuickTime, with support for higher sampling rates, such as 96 kHz and 192 kHz, multiple channels and multiple channel layouts, including 5.1 surround sound and up to 24 discrete channels, meaning channels without any layout imposed on them. Support is also provided for a variety of more accurate audio representations, such as 24-bit uncompressed audio, during capture, playback, and export. Synchronization and access to uncompressed audio on a per-sample basis is also greatly improved, including access to raw PCM audio samples from VBR-compressed audio sources.
The introduction of a new abstraction layer: the audio context. An audio context represents a connection to a particular audio device. Using an audio context allows you to easily connect a movie to an audio device.
A more flexible architecture for capturing audio. For instance, multiple sequence grabber audio channels (SGAudioMediaType) can capture from a single device at the same time, even if the device doesn’t permit multiple clients directly, and devices with different channel layouts or different PCM audio formats can be interconnected seamlessly.
Conversion of audio from one format to another on the fly, performing channel mix-down or remapping, upsampling or downsampling, and sample conversion as needed. This conversion can be performed during export, or as part of the output chain to a device with different playback characteristics than the stored audio, or as part of the capture and storage chain to map input from one or more devices into one or more storage formats.
Most components, with a few exceptions such as streaming and MPEG-4 exporting, will be able to make use of these new capabilities immediately. This release of QuickTime updates a number of components so that it is possible to capture, play back, edit, and export a broad variety of enhanced audio right away.
In brief, QuickTime 7 includes the following enhancements, discussed in this section:
A new abstraction layer for audio
A new sound description
A suite of sound description functions
New movie property to prevent pitch-shifting
New functions for gain, balance, and mute
New level and frequency metering API
New audio extraction and conversion API
New audio compression configuration component
New movie export properties to support high-resolution audio
New sequence grabber component for audio (SGAudioMediaType)
New Abstraction Layer For Audio
High-Resolution Audio Support
Sound Description Creation and Accessor Functions
Audio Playback Enhancements
Audio Conversion, Export, and Extraction
Standard Audio Compression Enhancements
Audio Export Enhancements
Audio Capture Enhancements
Using Sequence Grabber Audio Features
QuickTime 7 introduces the audio context, a new abstraction that represents playing to an audio device.
As defined, a QuickTime audio context is an abstraction for a connection to an audio device. This allows you to work more easily and efficiently with either single or multiple audio devices in your application.
To create an audio context, you call QTAudioContextCreateForAudioDevice and pass in the UID of the device, which is typically a CFString. An audio context is then returned. You can then either pass that audio context into NewMovieFromProperties, as you would pass in a visual context, or open your movie however you would normally open it and call SetMovieAudioContext. Doing so routes all the sound tracks of the movie to that particular device.
Note that if you want to route two different movies to the same device, you cannot use the same audio context, because an audio context is a single connection to that device. Instead, call QTAudioContextCreateForAudioDevice again, passing in the same device UID, to get another audio context for the same device, and pass that to your second movie.
High-resolution audio makes use of an enhanced sound description with the ability to describe high sampling rates, multiple channels, and more accurate audio representation and reproduction.
Significantly, the new sound description has larger fields to describe the sampling rate and number of channels, so that the sound description is no longer the limiting factor for these characteristics.
The sound description has built-in support for variable-bit-rate (VBR) audio encoding with variable-duration compressed frames. Extensions to the sound description allow you to describe the spatial layout of the channels, such as quadraphonic and 5.1 surround sound, or to label channels as discrete, that is, not tied to a particular geometry. For more information, see “SoundDescriptionV2”.
New movie audio properties include a summary channel layout property, providing a nonredundant listing of all the channel types used in the movie—such as L/R for stereo, or L/R/Ls/Rs/C for 5-channel surround sound—and a device channel layout, listing all the channel types used by the movie’s output device.
Figure 2-28 shows the layout of surround speakers. The terminology is defined in Table 1-1.
| Speaker | Definition |
|---|---|
| L | Left speaker |
| R | Right speaker |
| C | Center speaker |
| Ls | Left surround speaker |
| Rs | Right surround speaker |
| LFE | Subwoofer (LFE is an abbreviation for low-frequency effects) |
The new sound description is supported by the data types and structures used in the Core Audio framework for Mac OS X (see Core Audio documentation). While the Core Audio API itself is not available to Windows programmers, QuickTime for Windows may include the relevant data structures, such as audio buffers and stream descriptions, audio time stamps and channel layouts, and so on, described in the Core Audio documentation.
A suite of functions has been included to support the handling of sound descriptions opaquely.
Playback at the high level is automatic and transparent; if you play a movie that contains 96 kHz or 192 kHz sound, it should just work. You should not have to modify your code. The same is true for cut-and-paste editing. If the chosen output device does not support the channel layout, sampling rate, or sample size of the movie audio, mix-down and resampling are performed automatically.
Import of high-resolution audio is automatic, provided the import component has been updated to support high-resolution audio.
Export of high-resolution audio is likewise transparent at the high level. Export at the lower levels requires some additional code. Your application must “opt in” to the new audio features explicitly if it “talks” directly to an export component instance. You do this by calling QTSetComponentProperty on the exporter component instance and passing in the kQTMovieExporterPropertyID_EnableHighResolutionAudioFeatures property. This is illustrated in Listing 2-1.
Capturing high-resolution audio requires new code to configure and use the new sequence grabber component for audio. The new audio capture API offers a number of improvements, including the ability to share an input device among multiple sequence grabber channels and the usage of multiple threads for increased efficiency.
When all components in a chain are able to work with high-resolution audio, clock information can be preserved across operations for sample-accurate synchronization.
QuickTime 7 provides new functions that let you create, access, and convert sound descriptions.
Sound descriptions can take three basic inputs: an AudioStreamBasicDescription, a channel layout, and a magic cookie. Sound descriptions are now treated as if they are opaque. In QuickTime 7, when you are handed a sound description, for example, you don’t have to go in and look at the version field.
If you want to create a sound description, you simply supply an AudioStreamBasicDescription, an optional channel layout if you have one, and an optional magic cookie if you need one for the described audio format. Note that it is the format (codec) of the audio that determines whether it needs a magic cookie, not the format of the sound description.
By calling QTSoundDescriptionCreate, you can make a sound description of any version you choose: for example, one of the lowest possible version, given that the audio is stereo and 16-bit, or one of any particular version you request.
The main features of the new API are the ability to create a sound description and the new property getters and setters. To retrieve everything a sound description contains, follow these steps:
Get an AudioStreamBasicDescription from a sound description.
Get a channel layout from a sound description (if there is one).
Get the magic cookie from a sound description (if there is one).
At this point, you have all the information you need to talk to Core Audio about this audio. You can also:
Get a user-readable textual description of the format described by the SoundDescription.
Add a channel layout to an existing sound description, or replace the one it has. For example, this is what QuickTime Player does in the properties panel where the user can change the channel assignments.
Add a magic cookie to a sound description. (This is not needed very often unless you are writing a movie importer, for example.)
To convert an existing QuickTime sound description into the new V2 sound description, you call QTSoundDescriptionConvert. This lets you convert sound descriptions from one version to another.
For a description of versions 0 and 1 of the SoundDescription record, see the documentation for the QuickTime File Format. For a description of version 2 of the SoundDescription record, see “SoundDescriptionV2”. For details of the sound description functions, see QTSoundDescriptionCreate and QTSoundDescriptionConvert.
In addition to playing back high-resolution audio, QuickTime 7 introduces the following audio playback enhancements:
The ability to play movies at a nonstandard rate without pitch-shifting the audio.
Getting and setting the gain, balance, and mute values for a movie, or the gain and mute values for a track.
Providing audio level and frequency metering during playback.
A new property is available for use with the NewMovieFromProperties function: kQTAudioPropertyID_RateChangesPreservePitch. When this property is set, changing the movie playback rate will not result in pitch-shifting of the audio. This allows you to fast-forward through a movie without hearing chipmunks.
Setting this property also affects playback of scaled edits, making it possible to change the tempo of a sound segment or scale it to line up with a video segment, for example, without changing the pitch of the sound.
New functions are available to set the left-right balance for a movie, set the gain for a movie or track, or to mute and unmute a movie or track without changing the gain or balance settings.
The gain and mute functions duplicate existing functions for setting track and movie volume, but the new functions present a simpler and more consistent programmer interface.
For example, to mute the movie using the old SetMovieVolume function, you would pass in a negative volume value; to preserve the current volume over a mute and unmute operation, you had to first read the volume, then negate it and set it for muting, then negate it and set it again to unmute. By comparison, the new SetMovieAudioMute function simply mutes or unmutes the movie without changing the gain value.
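The difference between the two approaches can be modeled in plain C. This is only a sketch of the bookkeeping, not QuickTime code: MovieModel and the four helper functions are hypothetical names invented for illustration, and a real application would call SetMovieVolume or SetMovieAudioMute instead.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a movie's audio state; not a QuickTime type. */
typedef struct {
    short volume;  /* old style: a negative value means "muted" */
    float gain;    /* new style: the gain is stored on its own... */
    bool  muted;   /* ...and mute is an independent flag */
} MovieModel;

/* Old approach: mute by negating the volume, so the sign carries the state. */
static void old_style_set_muted(MovieModel *m, bool mute)
{
    short magnitude = (short)(m->volume < 0 ? -m->volume : m->volume);
    m->volume = mute ? (short)-magnitude : magnitude;
}

/* New approach: flip the flag; the gain value is never touched. */
static void new_style_set_muted(MovieModel *m, bool mute)
{
    m->muted = mute;
}

/* The level that is actually heard under each model. */
static short old_style_audible_volume(const MovieModel *m)
{
    return m->volume < 0 ? 0 : m->volume;
}

static float new_style_audible_gain(const MovieModel *m)
{
    return m->muted ? 0.0f : m->gain;
}
```

In the old model the volume field does double duty, so any code that reads it must remember to check the sign. In the new model the gain can be read or changed at any time without affecting the mute state.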
Note: The values set using these functions are not persistent; that is, they are not saved with the movie.
For details, see
It is now easy to obtain real-time measurements of the average audio output power level in one or more frequency bands.
You can specify the number of frequency bands to meter. QuickTime divides the possible frequency spectrum (approximately half the audio sampling rate) into that many bands. You can ask QuickTime for the center frequency of each resulting band for display in your user interface.
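As a rough illustration of the band division, the sketch below computes band centers in plain C. It assumes equal-width bands spanning 0 Hz to half the sampling rate; the text does not say how QuickTime actually spaces its bands, so treat the spacing as an assumption and use the frequencies QuickTime reports in real code. The function name band_center_frequencies is hypothetical.

```c
#include <stddef.h>

/* Fills centers[0..num_bands-1] with the center frequency, in Hz, of each
 * band, ASSUMING equal-width bands from 0 Hz to half the sampling rate.
 * Returns 0 on success, -1 on invalid arguments. */
static int band_center_frequencies(double sample_rate_hz, size_t num_bands,
                                   double *centers)
{
    if (sample_rate_hz <= 0.0 || num_bands == 0 || centers == NULL)
        return -1;

    double nyquist_hz = sample_rate_hz / 2.0;
    double band_width_hz = nyquist_hz / (double)num_bands;

    /* Band i covers [i * width, (i + 1) * width); its center is the midpoint. */
    for (size_t i = 0; i < num_bands; i++)
        centers[i] = ((double)i + 0.5) * band_width_hz;
    return 0;
}
```

Under this assumption, four bands at a 48 kHz sampling rate are each 6 kHz wide, with the first band centered at 3 kHz.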
You can measure the levels either before or after any mix-down or remapping to an output device. For example, if you are playing four-channel surround sound into a stereo output device, you might want to meter the audio levels of all four channels, or you might prefer to see the actual output values delivered to the stereo device.
To use the frequency metering API, follow these steps:
Set the number of frequency bands to meter using SetMovieAudioFrequencyMeteringNumBands.
Call GetMovieAudioFrequencyMeteringBandFrequencies if you need to know the frequencies of the resulting bands.
Finally, make periodic calls to GetMovieAudioFrequencyLevels to obtain measurements in all specified bands. You can obtain the average values, the peak hold values, or both.
For details, see
The new audio extraction API lets you retrieve mixed, uncompressed audio from a movie.
Note that the audio extraction API currently only mixes audio from sound tracks. Other media types, such as muxed MPEG-1 audio inside a program stream, are not currently supported.
To use the audio extraction API, follow these steps:
Begin by calling MovieAudioExtractionBegin. This returns an opaque session object that you pass to subsequent extraction routines.
You can then get the AudioStreamBasicDescription or the channel layout for the audio. Note that some properties, such as the channel layout, are of variable size, depending on the audio format, so getting the information involves a two-step process.
First, call MovieAudioExtractionGetPropertyInfo to find out how much space to allocate.
Next, call MovieAudioExtractionGetProperty to obtain the actual value of the property.
You can use the AudioStreamBasicDescription to specify a different uncompressed format than Float 32. This causes the extraction API to automatically convert from the stored audio format into your specified format.
Use the MovieAudioExtractionSetProperty function to specify channel remapping (that is, a different layout), sample rate conversion, and preferred sample size. You can also use this function to specify interleaved samples (the default is non-interleaved) or to set the movie time to an arbitrary point.
Note that there are basically two things you set here: an audio stream basic description (ASBD) and a channel layout. (The ASBD sets the format, sample rate, number of channels, interleaving, and so on.)
Setup is now complete. You can now make a series of calls to MovieAudioExtractionFillBuffer to receive uncompressed PCM audio in your chosen format.
By default, the first call begins extracting audio at the start of the movie, and each subsequent call begins where the last call left off, but you can set the extraction point anywhere in the movie timeline by calling MovieAudioExtractionSetProperty and setting the movie time.
MovieAudioExtractionFillBuffer sets kMovieAudioExtractionComplete in outFlags when you reach the end of the movie audio.
You must call MovieAudioExtractionEnd when you are done. This deallocates internal buffers and data structures that would otherwise continue to use memory and resources.
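The control flow of the fill-buffer steps can be sketched in plain C without QuickTime. Everything below is a hypothetical model: MockExtraction, mock_fill_buffer, mock_set_position, and MOCK_EXTRACTION_COMPLETE only imitate the roles of the session object, MovieAudioExtractionFillBuffer, the movie time property, and kMovieAudioExtractionComplete. They are not the real signatures.

```c
#include <stddef.h>

/* Plays the role of kMovieAudioExtractionComplete in this model. */
enum { MOCK_EXTRACTION_COMPLETE = 1 };

/* Hypothetical stand-in for an extraction session over mono float audio. */
typedef struct {
    const float *samples;   /* the "movie audio" */
    size_t total_frames;
    size_t position;        /* next frame to extract */
} MockExtraction;

/* Copies up to *io_frames frames, reports how many were copied, and raises
 * the completion flag once the end of the audio has been reached. */
static void mock_fill_buffer(MockExtraction *s, size_t *io_frames,
                             float *out, unsigned *out_flags)
{
    size_t remaining = s->total_frames - s->position;
    size_t n = *io_frames < remaining ? *io_frames : remaining;
    for (size_t i = 0; i < n; i++)
        out[i] = s->samples[s->position + i];
    s->position += n;
    *io_frames = n;
    *out_flags = (s->position == s->total_frames) ? MOCK_EXTRACTION_COMPLETE : 0;
}

/* Moves the extraction point, like setting the movie time on a session. */
static void mock_set_position(MockExtraction *s, size_t frame)
{
    s->position = frame < s->total_frames ? frame : s->total_frames;
}

/* Requests chunk_frames at a time until the completion flag is raised.
 * dest must have room for every remaining frame. Returns frames written. */
static size_t extract_all(MockExtraction *s, float *dest, size_t chunk_frames)
{
    size_t written = 0;
    unsigned flags = 0;
    if (chunk_frames == 0)
        return 0;
    while (!(flags & MOCK_EXTRACTION_COMPLETE)) {
        size_t frames = chunk_frames;
        mock_fill_buffer(s, &frames, dest + written, &flags);
        written += frames;
    }
    return written;
}
```

The loop tests the flag rather than the frame count, which matches the description above: the final call may return fewer frames than requested, and the flag is what signals the end of the audio.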
A caveat: Ideally, the uncompressed samples would be bitwise identical whether you obtained the samples by starting at the beginning of the movie and iterating through it, or by randomly setting the movie time and extracting audio samples. This is typically the case, but for some compression schemes the output of the decompressor depends not only on the compressed sample, but also on the seed value in the decompressor that remains after previous operations.
The current release of QuickTime does not perform the necessary work to determine what the seed value would be when the movie time is changed prior to extracting audio; while the extracted audio is generally indistinguishable by ear, it may not always be bitwise identical.
For details about audio conversion, export, and extraction, refer to the information about the following functions:
QuickTime 7 introduces a new standard compressor component, StandardCompressionSubTypeAudio, that adds the ability to configure high-resolution audio output formats. It uses Core Audio internally instead of the Sound Manager, and has a full set of component properties to make configuration easier, especially when the developer wishes to bring up an application-specific dialog, or no dialog, rather than the typical compression dialog.
This component essentially replaces the StandardCompressionSubTypeSound component, which is limited to 1 or 2 channel sound with sampling rates of 65 kHz or less. That component is retained for backward compatibility with existing code, but its use is no longer recommended.
The StandardCompressionSubTypeAudio component is configured by getting and setting component properties, instead of using GetInfo and SetInfo calls. These properties have a class and ID, instead of just a single selector.
The component property API allows configuration at any level of detail without requiring a user interface dialog or direct communication with low-level components.
For details, refer to the sections “SGAudio Component Property Classes” and “SGAudio Component Property IDs.”
Note: You can also configure the new standard audio compression component by calling SCSetSettingsFromAtomContainer. You can pass the new standard audio compression component either a new atom container obtained from SCGetSettingsAsAtomContainer or an old atom container returned by calling the same function (SCGetSettingsAsAtomContainer) on the old SubTypeSound component.
If you use MovieExportToDataRefFromProcedures, your getProperty proc will need to support some of these property IDs as new selectors. Note that the Movie Exporter getProperty proc API is not changing to add a class (the class is implied).
Note: Not all properties can be implemented by getProperty procs; the properties that getProperty procs can implement are marked with the word "DataProc". See the inline documentation in QuickTimeComponents.h for more information.
Some movie export components now support high-resolution audio.
Export of high-resolution audio is transparent at the high level. If you export from a movie containing high-resolution audio to a format whose export component supports it, the transfer of data is automatic; if the export component does not support high-resolution audio, mix-down, resampling, and sound description conversion are automatic.
Export at the lower levels requires some additional code. Your application must “opt in” to the new audio features explicitly if it talks directly to an export component instance. (This is to prevent applications that have inadvisedly chosen to “walk” the opaque atom settings structure from crashing when they encounter the new and radically different structure.) The following code snippet (Listing 2-1) illustrates the opt-in process.
Listing 2-1 Opting in for high-resolution audio export
    ComponentInstance exporterCI;
    ComponentDescription search = { 'spit', 'MooV', 'appl', 0, 0 };
    Boolean useHighResolutionAudio = true, canceled;
    OSStatus err = noErr;

    // Find and open the QuickTime movie exporter component.
    Component c = FindNextComponent(NULL, &search);
    exporterCI = OpenComponent(c);

    // Hey exporter, I understand high-resolution audio!!
    (void) QTSetComponentProperty( // disregard error
        exporterCI,
        kQTPropertyClass_MovieExporter,
        kQTMovieExporterPropertyID_EnableHighResolutionAudioFeatures,
        sizeof(Boolean),
        &useHighResolutionAudio);

    // myMovie is assumed to be a Movie that the application has already opened.
    err = MovieExportDoUserDialog(exporterCI, myMovie, NULL, 0, 0, &canceled);
For additional details, see “Movie Exporter Properties”.
There is a new sequence grabber channel component ('sgch') subtype for audio, SGAudioMediaType ('audi'), which allows capture of high-resolution audio, supporting multi-channel, high sample rate, high accuracy sound. This is intended to replace the older SoundMediaType component.
Important: The new component still captures a sound track of type SoundMediaType ('soun'); only the sequence grabber media type changes, not the final track media type.
The new audio channel component has a number of noteworthy features, including:
audio capture to VBR compressed formats
enabling or disabling of source channels on a multi-channel input device
mix-down and remapping of multi-channel audio source material
discrete and spatial labeling of channels (for example, 5.1 or discrete)
audio format and sample rate conversion during capture
sharing of audio input devices among multiple sequence grabber audio channels
sharing of audio playback devices among multiple sequence grabber audio channels
notification of audio device hotplug/unplug events
audio preview of source data or compressed data
splitting audio channels from a record device to separate tracks in a movie
redundant capture of multichannel audio to separate tracks in a movie (with independent data rates and compression settings)
client callbacks of audio pre- and post-mixdown, and pre- and post-conversion with propagation of audio time stamps and audio samples to interested clients
improved A/V sync
improved threading model compared with the legacy SoundMediaType
lower latency audio grabs
reduced dependency on frequent SGIdle calls
This new, advanced functionality makes extensive use of Core Audio methodology and data structures.
The audio channel component can be configured using component properties. This has several advantages over using a sequence grabber panel. For one thing, it can be configured without a user dialog, or using an application-specific dialog. For another, it is possible to test for properties and get or set them dynamically, allowing the same code to configure multiple audio input devices, including unfamiliar devices.
The application does not need to bypass the channel component and connect directly to an input device, such as a SoundInputDriver, to set low-level properties. This allows multiple capture channels to share a single input device, and keeps application code from becoming tied to a particular device type.
For a full list of the SGAudioMediaType component properties, see “SGAudio Component Property IDs”.
For a full list of component property classes, see “SGAudio Component Property Classes”.
Once the component is configured, the audio capture, along with any desired mixdown, format or sample-rate conversion, and compression, takes place in a combination of real-time and high-priority threads. Multichannel data is interleaved and samples are put into a queue. You can set up callbacks to watch the data at any of several points in the chain: pre-mixdown, post-mixdown, pre-conversion, or post-conversion.
The actual writing of the captured audio to a storage medium, such as a disk file, takes place during calls to SGIdle.
One input device can be shared by multiple sequence grabber channels, as illustrated in Figure 2-29. Because independent mix and conversion stages exist for each sequence grabber audio channel, the sequence grabber audio channels can capture different channel mixes, sampling rates, sample sizes, or compression schemes from the same source. Similarly, multiple sequence grabber audio channels can share a common output device for previewing.
Channel mixdown or remapping, sample conversion, and any compression are all performed on high-priority threads. Each sequence grabber channel receives data from only those audio channels it has requested, in the format it has specified. The following processing may occur in the background:
software gain adjustment
mixing
sample rate conversion
bit-depth widening or shortening
float to integer conversion
byte-order conversion (big-endian to little-endian or vice-versa)
encoding of frames into compressed packets of data in the specified format
interleaving
The resulting frames or packets are held in a queue, to be written to file or broadcast stream on the main thread. This is accomplished during calls to SGIdle, at which time the audio is chunked and interleaved with any video data being captured.
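Two of the background stages listed above, float to integer conversion and byte-order conversion, are simple enough to sketch in plain C. This is an illustration only: the function names are hypothetical, and the 32767 scale factor and the rounding rule are one common convention, not necessarily what the sequence grabber uses internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts one float sample, nominally in [-1.0, 1.0], to signed 16-bit.
 * Out-of-range input is clipped first. */
static int16_t float_to_int16(float sample)
{
    if (sample > 1.0f)  sample = 1.0f;
    if (sample < -1.0f) sample = -1.0f;
    float scaled = sample * 32767.0f;
    /* Round half away from zero before truncating. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Swaps the two bytes of a 16-bit word (big-endian <-> little-endian). */
static uint16_t swap_bytes16(uint16_t value)
{
    return (uint16_t)((value << 8) | (value >> 8));
}

/* Converts count float samples into raw 16-bit words, optionally swapping
 * each one for a destination with the opposite byte order. */
static void convert_buffer(const float *in, uint16_t *out, size_t count,
                           int swap_byte_order)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t word = (uint16_t)float_to_int16(in[i]);
        out[i] = swap_byte_order ? swap_bytes16(word) : word;
    }
}
```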
Figure 2-29 is a high-level diagram that shows some of the internal workings of the sequence grabber audio channel, such as the Core Audio matrix mixer and the audio converter that lets you convert, compress, and interleave audio, and then queue the audio. From the queue, the audio can be written to disk in desired chunk sizes. One distinct advantage of this process is that you can take a single device and share it among multiple channels. This results in simultaneous recording from multiple devices into multiple tracks in a QuickTime movie. In addition, you can record multiple tracks from a single device.
Figure 2-30 illustrates a usage case that involves client channel mapping. This shows how a client can instantiate multiple sequence grabber audio channels that share a recording device. This enables the “splitting” of device channels across multiple tracks in a QuickTime movie. In Figure 2-30, there is a single recording device with four channels. The first two channels record into Track 1 in a QuickTime movie. The second sequence grabber audio channel, which records into Track 2 in a QuickTime movie, only wants channel 4 from the recording device, so that you can get one stereo track and one mono track.
In this example, device channels 0 and 1 go into Movie Track 1, while Movie Track 2 has only one slot to fill, which takes device channel 3. You can mix and match different channel map valences in such a way as to disable certain tracks in a movie and get submixes, for example. In code, it looks like this:
    SInt32 map1[] = { 0, 1 };
    SInt32 map2[] = { 3 };
Figure 2-31 shows another usage case that also involves client channel mapping.
A sequence grabber audio channel, as shown in the illustration, can get four channels from a device in any order that makes sense for the client. Consider, for instance, a device that supports four channels of audio. Using the channel map property IDs (“SGAudio Component Property IDs”), you can reorder channels from a recording device to a desired movie channel valence. In code, it looks like this:
    SInt32 map[4] = { 3, 2, 1, 0 };
Figure 2-32 shows another example of what you can do with channel mapping, in this case mult-ing, that is, duplicating channels from a recording device into multiple output channels in QuickTime movie tracks. For instance, you can take advantage of this channel mapping feature if you have one recording device and two sequence grabber audio channels, and they’re both going to make the same movie. The first sequence grabber audio channel wants the first stereo pair twice (1, 2, 1, 2), while the second wants the second stereo pair twice (3, 4, 3, 4). In code, it looks like this (zero-based indexing):
    SInt32 map1[] = { 0, 1, 0, 1 };
    SInt32 map2[] = { 2, 3, 2, 3 };
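To make the effect of these maps concrete, here is a plain-C model of what a channel map does to interleaved audio. It is a sketch under stated assumptions, not the sequence grabber's implementation: apply_channel_map is a hypothetical helper, int32_t stands in for SInt32, and negative map entries are simply rejected in this model.

```c
#include <stddef.h>
#include <stdint.h>

/* Builds interleaved output frames from interleaved device frames: output
 * channel i of every frame is copied from device channel map[i] (zero-based).
 * out must hold frames * map_len samples.
 * Returns 0 on success, -1 if a map entry names a nonexistent device channel. */
static int apply_channel_map(const float *device, size_t device_channels,
                             size_t frames,
                             const int32_t *map, size_t map_len,
                             float *out)
{
    for (size_t i = 0; i < map_len; i++)
        if (map[i] < 0 || (size_t)map[i] >= device_channels)
            return -1;

    for (size_t f = 0; f < frames; f++)
        for (size_t i = 0; i < map_len; i++)
            out[f * map_len + i] = device[f * device_channels + (size_t)map[i]];
    return 0;
}
```

The same function covers all three usage cases: a shorter map splits off a subset of device channels, a permuted map reorders them, and a map with repeated entries duplicates them.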
Figure 2-33 illustrates what you can do with multiple mixes. Because you can duplicate device channels onto multiple output tracks, you can create a movie containing multiple mixes of the same source material.
This is useful for a recording situation where you have a six-channel recording device and are presenting 5.1 material. You could make a QuickTime movie that has four tracks in it. In this case, the first track gets the raw, unmixed source, that is, channels one through six. You will have a track of six discrete channels, meaning that the first channel plays out to the first speaker, the second channel out to the second speaker, and so on.
In sequence grabber audio channel #2, you’ll get a 5.1 mix and apply spatial orientation to the six channels, specifying the speakers to which the audio will play. All four tracks are going into a QuickTime movie. Sequence grabber audio channel #3 presents a stereo mix-down, while sequence grabber audio channel #4 presents a mono mix-down.
Figure 2-34 shows channel mapping with multiple data rates, similar to multiple mixes, except that you can also apply compression to the mixes. As a result, you can broadcast multiple streams at once.
Figure 2-35 shows sequence grabber audio callbacks, which are analogous to the VideoMediaType sequence grabber channel video bottlenecks. The callbacks provide developers with different places in the audio chain where they can “pipe in” and look at the samples.
Figure 2-35 Sequence grabber audio callbacks, analogous to sequence grabber video bottlenecks callbacks

Figure 2-36 shows sequence grabber audio callbacks, with real-time preview. Clients can specify what they want to preview, using the sequence grabber channel play flags.
To make use of the new sequence grabber audio features, follow these steps:
Instantiate a sequence grabber channel of subtype SGAudioMediaType ('audi') by calling SGNewChannel(sg, SGAudioMediaType, &audiChannel).
Use the QuickTime component property API to obtain a list of available input and preview devices from the sequence grabber channel, by getting the property kQTSGPropertyID_DeviceListWithAttributes ('#dva').
Use the same component property API to get the input device characteristics and set the desired audio format and device settings. See “SGAudio Component Property Classes” and “SGAudio Component Property IDs” for details. Note that this is sometimes a two-stage process, as described next.
Use QTGetComponentPropertyInfo to determine the size of the property value.
Allocate the necessary container and use QTGetComponentProperty to obtain the actual value. This is necessary with properties such as channel layout, which is a variable-length structure.
Call SGStartRecord or SGStartPreview, enabling the sequence grabber, and then make periodic calls to SGIdle.
If you are capturing only sequence grabber audio media, it is no longer necessary to make extremely frequent calls to SGIdle, since this function is only used to write the samples to storage, not to capture data from the input device. When capturing video or using an old-style sequence grabber sound media component, however, you must still call SGIdle frequently (at a frequency greater than the video sample rate or the sound chunk rate).
By setting the appropriate sequence grabber channel properties and setting up a callback, you can examine samples at various points in the input chain, such as premix, postmix, preconversion, and postconversion. For details, see SGAudioCallbackProc, SGAudioCallbackStruct, “SGAudio Component Property Classes” and “SGAudio Component Property IDs”.
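The two-stage property retrieval mentioned in the steps above (ask for the size, allocate, then fetch the value) follows a pattern that can be shown in plain C. The sketch below uses a mock property store; MockProperty, mock_get_property_info, mock_get_property, and copy_property are hypothetical names that only imitate the roles of QTGetComponentPropertyInfo and QTGetComponentProperty, not their real signatures.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-size property, standing in for something like a
 * channel layout whose length depends on the audio format. */
typedef struct {
    const void *data;
    size_t size;
} MockProperty;

/* Stage 1: report how many bytes the value occupies. */
static int mock_get_property_info(const MockProperty *p, size_t *out_size)
{
    if (p == NULL || out_size == NULL)
        return -1;
    *out_size = p->size;
    return 0;
}

/* Stage 2: copy the value into a caller-supplied buffer that is big enough. */
static int mock_get_property(const MockProperty *p, size_t buffer_size,
                             void *buffer, size_t *out_used)
{
    if (p == NULL || buffer == NULL || buffer_size < p->size)
        return -1;
    memcpy(buffer, p->data, p->size);
    if (out_used != NULL)
        *out_used = p->size;
    return 0;
}

/* The full pattern: query the size, allocate, then fetch.
 * Returns a malloc'd copy that the caller must free, or NULL on failure. */
static void *copy_property(const MockProperty *p, size_t *out_size)
{
    size_t size = 0;
    if (mock_get_property_info(p, &size) != 0 || size == 0)
        return NULL;
    void *buffer = malloc(size);
    if (buffer == NULL)
        return NULL;
    if (mock_get_property(p, size, buffer, out_size) != 0) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```

Querying the size first is what lets the same code handle devices with different channel counts, because the caller never hard-codes the size of the structure.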
Last updated: 2005-04-29