
Does your application need to create a custom visual display from QuickTime movie audio, similar to the iTunes Visualizer? Or extract the audio from a home movie so it can
be converted to your own custom audio format?
The iTunes Visualizer uses the audio signal to generate a visual effect.
Perhaps you have worked with iMovie or QuickTime Player and liked how these applications are able to easily substitute audio from a video clip.
For example, you may have an interesting home video, and you would like to use the same audio from this clip with some other video and images.
If so, then the new audio extraction API, introduced with QuickTime 7, is the tool for you. This new API allows you to very easily extract and mix the audio from multiple
sound tracks contained in a QuickTime movie and convert this audio to raw PCM (pulse code modulation) data. You can then use the raw audio data for playback, export or direct manipulation
in your application. Or, you may want to extract the movie audio so you can then perform signal-processing using Core Audio. With the audio extraction API, you can do all this and more.
Previously, extracting audio from a movie was difficult due to the lack of a convenient, easy-to-use API. Developers would have
to either: (a) manually step through all the movie audio samples using GetMediaSample which is cumbersome and has limitations; (b) use audio export
components (for example, with the PutMovieIntoTypedHandle function) which do not offer optimal
performance because they export all of the audio at once instead of in smaller chunks; or (c) create a sound output component which is difficult to write.
Now, the audio extraction API offers many advantages over the traditional techniques described above for performing audio extraction, including:
- a more efficient execution path;
- mixing of all enabled sound tracks in the movie;
- control over the start position for extraction and over exactly how much data is extracted;
- a caller-specified channel layout for the resulting mix;
- automatic support for new track types that mix into the AudioContext;
- the same execution path as playback, so what you hear is what you get;
- a convenient, easy-to-use API.
The audio extraction APIs described in this article are available in QuickTime 7 and beyond.
NOTE: the audio extraction API currently only mixes audio from sound tracks. Other media types,
such as muxed MPEG-1 audio inside a program stream, are not currently supported.
Getting Started
The basic sequence of events for audio extraction goes like this:
- Begin Extraction
- Get/Set Extraction Properties
- Fill Buffers With Audio Samples
- End Extraction
Note that the movie must be active when you start audio extraction (use SetMovieActive). All the sound tracks you want to hear must also be enabled (use
SetTrackEnabled).
Simply configure the extraction session as you'd like, then start. The extraction starts when the first call to MovieAudioExtractionFillBuffer is made. Once you call
MovieAudioExtractionFillBuffer, the configuration for the current extraction session is set and cannot be changed. If you want to change what you are extracting, you
must close the session and configure a new session as appropriate.
Note that audio extraction will return valid data up until the end of the movie, as determined by GetMovieDuration. If all the audio tracks end before that, audio extraction
will return silence from the end of the last audio track until the movie duration is reached. You can use GetTrackDuration to determine the actual length of the audio
tracks of interest, and limit your extraction pulls.
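One way to limit the extraction pulls is to convert the track duration into a PCM frame count before starting to fill buffers. The sketch below is illustrative only: FramesForDuration is a hypothetical helper, not part of QuickTime. It assumes the duration (for example, the value returned by GetTrackDuration) is expressed in the movie time scale, and that the sample rate is the one reported by the extraction session's ASBD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a QuickTime API): convert a duration given in
   movie time-scale units into a count of PCM frames at sampleRate.
   The result is rounded up so a final partial frame is not lost. */
int64_t FramesForDuration(int64_t duration, int32_t timeScale, double sampleRate)
{
    assert(duration >= 0 && timeScale > 0 && sampleRate > 0.0);

    double seconds = (double)duration / (double)timeScale;
    double frames  = seconds * sampleRate;
    int64_t whole  = (int64_t)frames;

    return (frames > (double)whole) ? whole + 1 : whole;
}
```

For example, a track lasting 6000 units at a time scale of 600 (10 seconds) extracted at 44100 Hz gives a budget of 441000 frames. Once that many frames have been pulled, the application can stop rather than extract trailing silence.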
Now let's go through the process step-by-step, showing in detail how to perform audio extraction using the new API.
Step 1: Begin Extraction
Begin by calling MovieAudioExtractionBegin. This API must be called before doing any movie audio extraction. This returns an opaque session object that you pass to
subsequent extraction routines as shown in Listing 1:
Listing 1: Obtaining the Audio Extraction Session Opaque Object.
#if TARGET_OS_MAC
#include <QuickTime/QuickTime.h>
#elif TARGET_OS_WIN32
#include <Movies.h>
#endif
OSStatus err = noErr;
MovieAudioExtractionRef extractionSessionRef = nil;
err = MovieAudioExtractionBegin(movie, 0, &extractionSessionRef);
Note that the extracted audio format defaults to the aggregate channel layout of the movie (for example, all Rights mixed together,
all Left Surrounds mixed together, and so forth), 32-bit float, de-interleaved, with the sample rate set to the highest sample rate found in the movie from all the tracks.
The audio extraction API supports all varieties of PCM data (uncompressed) coming out of the audio extraction. If you ultimately require compressed data, you must do
your own compression of the returned data. You can set the format of the extracted audio as described in "Get/Set Audio Extraction Session Properties" (see below); this
configuration must be completed before the first call to MovieAudioExtractionFillBuffer.
Step 2: Get/Set Audio Extraction Session Properties
Audio Extraction defines a number of properties which you can get/set for a given audio extraction session. For example, you can get the audio stream basic
description (ASBD) property or the audio channel layout.
Note that some properties are of variable size, such as the channel layout, so getting the information involves a two-step process:
- First, you call
MovieAudioExtractionGetPropertyInfo to find out how much space to allocate for the property.
- Next, call
MovieAudioExtractionGetProperty to obtain the actual value of the property.
For example, the code in Listing 2 shows how to use this technique to obtain the channel layout:
Listing 2: Obtaining the Channel Layout for the Extraction Session.
#if TARGET_OS_MAC
#include <CoreAudio/CoreAudio.h>
#include <QuickTime/QuickTime.h>
#elif TARGET_OS_WIN32
#include <CoreAudioTypes.h>
#include <Movies.h>
#endif
OSStatus err = noErr;
AudioChannelLayout *layout = NULL;
UInt32 size = 0;
// First get the size of the extraction output layout
err = MovieAudioExtractionGetPropertyInfo(extractionSessionRef,
                                          kQTPropertyClass_MovieAudioExtraction_Audio,
                                          kQTMovieAudioExtractionAudioPropertyID_AudioChannelLayout,
                                          NULL, &size, NULL);
if (err == noErr)
{
    // Allocate memory for the channel layout
    layout = (AudioChannelLayout *) calloc(1, size);
    if (layout == nil)
    {
        err = memFullErr;
        goto bail; // cleanup label, defined elsewhere in the enclosing function
    }
    // Get the layout for the current extraction configuration.
    // This will have already been expanded into channel descriptions.
    err = MovieAudioExtractionGetProperty(extractionSessionRef,
                                          kQTPropertyClass_MovieAudioExtraction_Audio,
                                          kQTMovieAudioExtractionAudioPropertyID_AudioChannelLayout,
                                          size, layout, nil);
}
You can use the ASBD to specify a different uncompressed format than Float32. This causes the extraction API to automatically convert from the stored audio format
into your specified format.
To set the ASBD for the extraction session, use the MovieAudioExtractionSetProperty function. For example, Listing 3 shows how to set the ASBD to return
interleaved 16-bit PCM instead of the default non-interleaved Float32:
Listing 3: Using MovieAudioExtractionSetProperty to set the ASBD to Return Interleaved 16-bit PCM for an Extraction Session.
#if TARGET_OS_MAC
#include <CoreAudio/CoreAudio.h>
#include <QuickTime/QuickTime.h>
#elif TARGET_OS_WIN32
#include <CoreAudioTypes.h>
#include <Movies.h>
#endif
OSStatus err;
AudioStreamBasicDescription asbd;
// Get the default audio extraction ASBD
err = MovieAudioExtractionGetProperty(extractionSessionRef,
                                      kQTPropertyClass_MovieAudioExtraction_Audio,
                                      kQTMovieAudioExtractionAudioPropertyID_AudioStreamBasicDescription,
                                      sizeof (asbd), &asbd, nil);
if (err == noErr)
{
    // Convert the ASBD to return interleaved 16-bit PCM instead of non-interleaved Float32.
    // Assigning the flags wholesale also clears the non-interleaved flag.
    asbd.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked
                        | kAudioFormatFlagsNativeEndian;
    asbd.mBitsPerChannel = sizeof (SInt16) * 8;
    asbd.mBytesPerFrame = sizeof (SInt16) * asbd.mChannelsPerFrame;
    asbd.mBytesPerPacket = asbd.mBytesPerFrame;
    // Set the new audio extraction ASBD
    err = MovieAudioExtractionSetProperty(extractionSessionRef,
                                          kQTPropertyClass_MovieAudioExtraction_Audio,
                                          kQTMovieAudioExtractionAudioPropertyID_AudioStreamBasicDescription,
                                          sizeof (asbd), &asbd);
}
The ASBD is represented by the AudioStreamBasicDescription structure. It is the fundamental descriptive structure in Core Audio (the modern audio
architecture in Mac OS X). It contains all the information needed for describing streams
of audio data. Definitions for the ASBD and channel layout structures can be found in the CoreAudio.framework/Headers/CoreAudioTypes.h header file.
Listing 4 shows what the structure looks like along with a description of each of the fields.
Listing 4: The AudioStreamBasicDescription (ASBD) structure.
typedef struct AudioStreamBasicDescription {
Float64 mSampleRate;
UInt32 mFormatID;
UInt32 mFormatFlags;
UInt32 mBytesPerPacket;
UInt32 mFramesPerPacket;
UInt32 mBytesPerFrame;
UInt32 mChannelsPerFrame;
UInt32 mBitsPerChannel;
UInt32 mReserved;
} AudioStreamBasicDescription;
mSampleRate
The number of sample frames per second of the data in the stream.
mFormatID
A four char code indicating the general kind of data in the stream.
mFormatFlags
Flags specific to each format.
mBytesPerPacket
The number of bytes in a packet of data.
mFramesPerPacket
The number of sample frames in each packet of data.
mBytesPerFrame
The number of bytes in a single sample frame of data.
mChannelsPerFrame
The number of channels in each frame of data.
mBitsPerChannel
The number of bits of sample data for each channel in a frame of data.
mReserved
Pads the structure out to force an even 8 byte alignment.
This structure encapsulates all the information for describing the basic format properties of a stream of audio data.
In audio data a frame is one sample across all channels. If the ASBD describes non-interleaved audio, the byte and frame count fields describe one channel
(mBytesPerPacket would be 2 for non-interleaved stereo 16-bit PCM). For interleaved audio, the fields describe the set of n channels (mBytesPerPacket would be 4 for
interleaved stereo 16-bit PCM). In uncompressed audio, a packet is one frame, (mFramesPerPacket == 1).
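The size rules above can be sketched in a few lines of C. This is a minimal illustration, not Core Audio code: PCMSizeFields is a stand-in for the three size-related ASBD fields, and PCMSizeFieldsFor is a hypothetical helper introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the size-related ASBD fields discussed above; the real
   structure is AudioStreamBasicDescription in CoreAudioTypes.h. */
typedef struct PCMSizeFields {
    uint32_t bytesPerPacket;
    uint32_t framesPerPacket;
    uint32_t bytesPerFrame;
} PCMSizeFields;

/* Hypothetical helper: derive the size fields for uncompressed PCM. */
PCMSizeFields PCMSizeFieldsFor(uint32_t bitsPerChannel, uint32_t channels, int interleaved)
{
    assert(bitsPerChannel % 8 == 0 && channels > 0);

    PCMSizeFields f;
    uint32_t bytesPerSample = bitsPerChannel / 8;

    /* Interleaved: a frame holds one sample for every channel.
       Non-interleaved: the fields describe a single channel. */
    f.bytesPerFrame   = interleaved ? bytesPerSample * channels : bytesPerSample;
    f.framesPerPacket = 1;  /* uncompressed audio: one frame per packet */
    f.bytesPerPacket  = f.bytesPerFrame * f.framesPerPacket;
    return f;
}
```

For stereo 16-bit PCM this yields 4 bytes per packet when interleaved and 2 bytes per packet when non-interleaved, matching the values quoted above.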
You may get the ASBD value at any time during movie audio extraction, but if you wish to set the value you must set it before the first MovieAudioExtractionFillBuffer call.
If you get this property immediately after beginning an audio extraction session, it will tell you the default extraction format for the movie. This will include the number of
channels in the default movie mix.
If you set the output ASBD, it is recommended that you also set the output channel layout. If your output ASBD has a different number of channels than the default
extraction mix, you _must_ set the output channel layout.
You can only set PCM output formats. Setting a compressed output format will fail.
Use the MovieAudioExtractionSetProperty function to specify channel mixing (that is, a different channel layout), sample rate conversion, and preferred
sample size.
You can also use this function to request interleaved samples (the default is non-interleaved). There are basically two things you set to accomplish this: the
audio stream basic description (ASBD), as described above, and the channel layout. (The ASBD sets the format, sample rate, number of channels, interleaving, and so on.)
Also, instead of setting an audio channel layout you can disable all mixing of audio channels and extract them all individually using the
kQTMovieAudioExtractionMoviePropertyID_AllChannelsDiscrete property as shown in Listing 5.
Listing 5: Using kQTMovieAudioExtractionMoviePropertyID_AllChannelsDiscrete to Disable Mixing of Audio Channels.
#if TARGET_OS_MAC
#include <QuickTime/QuickTime.h>
#elif TARGET_OS_WIN32
#include <Movies.h>
#endif
OSStatus err;
Boolean allChannelsDiscrete = true;
// disable mixing of audio channels
err = MovieAudioExtractionSetProperty(extractionSessionRef,
                                      kQTPropertyClass_MovieAudioExtraction_Movie,
                                      kQTMovieAudioExtractionMoviePropertyID_AllChannelsDiscrete,
                                      sizeof (Boolean), &allChannelsDiscrete);
Other properties you can set for the audio extraction session include the movie current time. The current time can be set to any arbitrary point in the movie. You can set
the current time anytime during extraction, even after you've already extracted some audio data. Listing 6 shows how to set the current time property.
Listing 6: Setting the Movie Audio Extraction Current Time.
#if TARGET_OS_MAC
#include <QuickTime/QuickTime.h>
#elif TARGET_OS_WIN32
#include <Movies.h>
#endif
OSStatus err;
TimeRecord timeRec;
Movie movie;
movie = MyGetMovie();
timeRec.scale = GetMovieTimeScale(movie);
timeRec.base = NULL;
timeRec.value.hi = 0;
timeRec.value.lo = 60 * timeRec.scale; // for instance, to start at time 1:00.00
// Set the extraction current time. The duration will
// be determined by how much is pulled.
err = MovieAudioExtractionSetProperty(extractionSessionRef,
                                      kQTPropertyClass_MovieAudioExtraction_Movie,
                                      kQTMovieAudioExtractionMoviePropertyID_CurrentTime,
                                      sizeof(TimeRecord), &timeRec);
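Listing 6 stores the start time in the low half of the time value only, which works as long as the product of seconds and time scale fits in 32 bits. For very long movies or large time scales, the value can be computed in 64 bits and split into its two halves. The sketch below assumes the two halves are a signed upper word and an unsigned lower word, as the hi and lo fields in Listing 6 suggest. TimeValue64Parts and SecondsToTimeValue are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hi/lo halves of TimeRecord.value (hypothetical type). */
typedef struct TimeValue64Parts {
    int32_t  hi;  /* upper 32 bits (signed)   */
    uint32_t lo;  /* lower 32 bits (unsigned) */
} TimeValue64Parts;

/* Hypothetical helper: express a start time in whole seconds as a 64-bit
   count of time-scale units, split into hi/lo halves. */
TimeValue64Parts SecondsToTimeValue(int64_t seconds, int32_t timeScale)
{
    assert(seconds >= 0 && timeScale > 0);

    int64_t units = seconds * (int64_t)timeScale;
    TimeValue64Parts parts;
    parts.hi = (int32_t)(units >> 32);
    parts.lo = (uint32_t)(units & 0xFFFFFFFF);
    return parts;
}
```

With a time scale of 600, a start time of 60 seconds gives hi = 0 and lo = 36000, the same value Listing 6 computes directly.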
Step 3: Fill Buffers With Audio Samples
Setup is now complete. You can now make a series of calls to MovieAudioExtractionFillBuffer to receive uncompressed PCM
audio in your chosen format.
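Before the first fill call you need buffers that match the extraction format. The following is a minimal sketch, assuming the default non-interleaved Float32 format, in which each channel gets its own buffer. So that it compiles on its own, it uses stand-in structures (SketchAudioBuffer and SketchAudioBufferList) that mirror the shape of AudioBuffer and AudioBufferList in CoreAudioTypes.h. The helper names are hypothetical and are not part of QuickTime or Core Audio; real code would use the Core Audio types directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins that mirror the shape of AudioBuffer / AudioBufferList. */
typedef struct SketchAudioBuffer {
    uint32_t mNumberChannels;
    uint32_t mDataByteSize;
    void    *mData;
} SketchAudioBuffer;

typedef struct SketchAudioBufferList {
    uint32_t          mNumberBuffers;
    SketchAudioBuffer mBuffers[1];  /* variable length: one entry per buffer */
} SketchAudioBufferList;

/* Release a list created by AllocNonInterleavedFloat32 (NULL is allowed). */
void FreeBufferList(SketchAudioBufferList *list)
{
    if (list == NULL) return;
    for (uint32_t i = 0; i < list->mNumberBuffers; i++)
        free(list->mBuffers[i].mData);
    free(list);
}

/* Hypothetical helper: one mono Float32 buffer per channel, each large
   enough for numFrames frames. Returns NULL on allocation failure. */
SketchAudioBufferList *AllocNonInterleavedFloat32(uint32_t channels, uint32_t numFrames)
{
    assert(channels > 0 && numFrames > 0);

    size_t listSize = offsetof(SketchAudioBufferList, mBuffers)
                    + channels * sizeof(SketchAudioBuffer);
    SketchAudioBufferList *list = calloc(1, listSize);
    if (list == NULL) return NULL;

    list->mNumberBuffers = channels;
    for (uint32_t i = 0; i < channels; i++) {
        list->mBuffers[i].mNumberChannels = 1;
        list->mBuffers[i].mDataByteSize   = numFrames * (uint32_t)sizeof(float);
        list->mBuffers[i].mData           = calloc(numFrames, sizeof(float));
        if (list->mBuffers[i].mData == NULL) {
            FreeBufferList(list);  /* unallocated entries are still NULL */
            return NULL;
        }
    }
    return list;
}
```

For an interleaved format the layout would instead be a single buffer whose channel count equals the number of channels in the ASBD.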
The default is for the first call to begin extracting audio at the start of the movie, and for subsequent calls to begin where the last call left off, but
you can set the extraction point anywhere in the movie timeline by setting the movie current time property as shown in the previous section (see Listing 6).
Note that MovieAudioExtractionFillBuffer will extract as many of the requested PCM frames as it can, given the limits of the buffer(s) supplied, and the limits of the input movie.
ioNumFrames will be updated with the exact number of valid frames being returned. When there is no more audio to extract from the movie,
MovieAudioExtractionFillBuffer will continue to return noErr, but no audio data will be returned. outFlags will have the kQTMovieAudioExtractionComplete bit set in this
case, as shown in Listing 7.
Listing 7: Checking for Extraction Complete.
err = MovieAudioExtractionFillBuffer(extractionSessionRef, &numFrames, slice->mBufferList, &flags);
if (err == noErr && (flags & kQTMovieAudioExtractionComplete))
{
    // extraction complete! (numFrames may still report valid frames to consume)
}
It is possible that the kQTMovieAudioExtractionComplete bit will accompany the last buffer of valid data.
Step 4: End Extraction
You must call MovieAudioExtractionEnd when you are done, as shown in Listing 8. This deallocates internal buffers and data structures that would otherwise continue to use memory and
resources.
Listing 8: Ending a Movie Extraction Session.
OSStatus err;
err = MovieAudioExtractionEnd(extractionSessionRef);
NOTE: Ideally, the uncompressed samples would be bitwise identical whether you obtained them
by starting at the beginning of the movie and iterating through it, or by setting the movie time to an
arbitrary point and extracting audio samples from there. This is typically the case (particularly in QuickTime 7.0.4 and later), but
there are some unavoidable latencies when decompressing MP3 audio data, resulting in up to 2048 zeroes
preceding the start of valid data (the exact number of zeroes is encoder-dependent).
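If this leading run of zeroes matters to your application, one possible approach is to measure it after seeking and decide whether to skip it. The sketch below is an assumption about how an application might do that, not a QuickTime facility: CountLeadingZeroSamples is a hypothetical helper that scans one channel of Float32 samples. Genuine digital silence at the seek point is indistinguishable from decoder latency, so whether to trim remains an application decision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the exact-zero samples at the start of a mono
   Float32 buffer, looking at no more than maxScan samples. */
size_t CountLeadingZeroSamples(const float *samples, size_t count, size_t maxScan)
{
    assert(samples != NULL || count == 0);

    size_t limit = (count < maxScan) ? count : maxScan;
    size_t n = 0;
    while (n < limit && samples[n] == 0.0f)
        n++;
    return n;
}
```

Passing 2048 as maxScan keeps the scan within the upper bound mentioned in the note above.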
Audio Extraction from a Background Thread
Mac OS X v10.3 and later allows certain QuickTime operations to be performed from background threads (for a complete discussion of threaded programming with
QuickTime, see Technical Note TN2125: Thread-safe Programming in QuickTime). This means you can now open a
movie on a background thread and perform audio extraction from this thread (or alternatively open the movie on the main thread, detach it from the main thread, attach it
to a background thread and then perform the extraction from the background thread).
The process is described in more detail below.
Calling EnterMoviesOnThread
Applications using QuickTime on background threads must first call EnterMoviesOnThread on each background thread before calling any other QuickTime APIs on
those threads. EnterMoviesOnThread is used to indicate to QuickTime that an application will be using
QuickTime APIs on the current thread.
Calling EnterMoviesOnThread also informs the Component Manager that it should not allow the use of any non-thread-safe components on that thread. If the
Component Manager is about to open a non-thread-safe component to perform a certain function, it will return a componentNotThreadSafeErr (-2098) error without
opening the component. This error code will then propagate up to the caller so be prepared to handle these types of errors in your program.
NOTE: Your application can receive a componentNotThreadSafeErr from any QuickTime API called from a
background thread, and should use this as a notification that the work being performed on the background
thread needs to be shifted over to the main thread. Be prepared to handle these errors.
When QuickTime will no longer be used on a background thread, the thread should call ExitMoviesOnThread. This indicates to QuickTime that the
application will no longer be using QuickTime from that thread.
Using QuickTime Movies in Threads for Audio Extraction
QuickTime Movies must know which thread they belong to at any given time. Obtaining a Movie reference using any of the NewMovie... APIs such as NewMovie,
NewMovieFromDataRef, NewMovieFromFile and so on will create a Movie that is already attached to the current thread.
This means if you wish to perform audio extraction from a background thread you must either:
- Open the movie from the background thread, or:
- Open the movie initially from the main thread, then move it to the background thread.
Two APIs must be called whenever you move a QuickTime Movie from one thread to another:
call DetachMovieFromCurrentThread in the old thread and AttachMovieToCurrentThread in the new thread. This lets QuickTime know which thread owns the Movie and
ensures that the Movie is not incorrectly tasked on the wrong thread. Calls to AttachMovieToCurrentThread will fail if the movie is already attached to a thread.
Summary
Here's a brief summary of the steps/calls necessary to perform audio extraction from a background thread:
- EnterMoviesOnThread
- Open the movie (NewMovie, NewMovieFromDataRef, and so on)
- Perform audio extraction (MovieAudioExtractionBegin, and so on)
- ExitMoviesOnThread
Alternatively, first open the movie from the main thread:
- Open movie from the main thread (NewMovie, NewMovieFromDataRef, and so on)
- Detach movie from main thread (DetachMovieFromCurrentThread)
Then from the background thread:
- EnterMoviesOnThread
- Attach movie to background thread (AttachMovieToCurrentThread)
- Perform audio extraction (MovieAudioExtractionBegin, and so on)
- ExitMoviesOnThread
Conclusion
The new QuickTime 7 audio extraction API makes it very easy to retrieve mixed, uncompressed audio from any QuickTime movie. This API also offers many
advantages over traditional audio export components for performing audio extraction, such as mixing of all enabled sound tracks in the movie, specifying a channel
layout for the resulting mix, and much more. You can take advantage of this powerful, easy to use API in your next audio programming project.
For More Information
Sample Code
The following sample code projects demonstrate the audio extraction APIs using the techniques described in this article:
Other References
Posted: 2005-12-19