The internal architecture of an audio unit consists of scopes, elements, connections, and channels, all of which serve the audio processing code. Figure 2-1 illustrates these parts as they exist in a typical effect unit. This section describes each of these parts in turn. For a discussion of the section marked DSP in the figure, which represents the audio processing code in an effect unit, see “Synthesis, Processing, and Data Format Conversion Code.”
Audio Unit Scopes
Audio Unit Elements
Audio Unit Connections
Audio Unit Channels
An audio unit scope is a programmatic context. Unlike the general computer science notion of scopes, however, audio unit scopes cannot be nested. Each scope is a discrete context.
You use scopes when writing code that sets or retrieves values of parameters and properties. For example, Listing 2-1 shows an implementation of a standard GetProperty method, as used in the effect unit you build in “Tutorial: Building a Simple Effect Unit with a Generic View”:
Listing 2-1 Using “scope” in the GetProperty method
ComponentResult TremoloUnit::GetProperty (
    AudioUnitPropertyID    inID,
    AudioUnitScope         inScope,     // the host specifies the scope
    AudioUnitElement       inElement,
    void                   *outData
) {
    return AUEffectBase::GetProperty (inID, inScope, inElement, outData);
}
When a host application calls this method to retrieve the value of a property, the host specifies the scope in which the property is defined. The implementation of the GetProperty method, in turn, can respond to various scopes with code such as this:
if (inScope == kAudioUnitScope_Global) {
    // respond to requests targeting the global scope
} else if (inScope == kAudioUnitScope_Input) {
    // respond to requests targeting the input scope
} else {
    // respond to other requests
}
There are five scopes defined by Apple in the AudioUnitProperties.h header file in the Audio Unit framework, shown in Listing 2-2:
Listing 2-2 Audio unit scopes
enum {
    kAudioUnitScope_Global = 0,
    kAudioUnitScope_Input  = 1,
    kAudioUnitScope_Output = 2,
    kAudioUnitScope_Group  = 3,
    kAudioUnitScope_Part   = 4
};
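To try the scope branching outside the SDK, you can combine the constants from Listing 2-2 with the if/else pattern shown earlier. The following is a minimal sketch, not SDK code: the AudioUnitScope typedef is a stand-in so that the example compiles without the Audio Unit framework, and DescribeScope is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for the SDK's AudioUnitScope type (an assumption made so the
// sketch compiles without the Audio Unit framework).
typedef std::uint32_t AudioUnitScope;

// Scope constants, with the values given in Listing 2-2.
enum {
    kAudioUnitScope_Global = 0,
    kAudioUnitScope_Input  = 1,
    kAudioUnitScope_Output = 2,
    kAudioUnitScope_Group  = 3,
    kAudioUnitScope_Part   = 4
};

// Hypothetical helper: reports which branch a request for the given
// scope would take.
std::string DescribeScope(AudioUnitScope inScope) {
    // This sketch accepts only the five scopes from Listing 2-2.
    assert(inScope <= kAudioUnitScope_Part);

    if (inScope == kAudioUnitScope_Global) {
        return "global";   // requests targeting the global scope
    } else if (inScope == kAudioUnitScope_Input) {
        return "input";    // requests targeting the input scope
    } else if (inScope == kAudioUnitScope_Output) {
        return "output";   // requests targeting the output scope
    } else {
        return "other";    // group and part scopes
    }
}
```

In a real audio unit, each branch would typically read or write a property value rather than return a label.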
The three most important scopes are:
Input scope: The context for audio data coming into an audio unit. Code in an audio unit, a host application, or an audio unit view can address an audio unit’s input scope for such things as the following:
An audio unit defining additional input elements
An audio unit or a host setting an input audio data stream format
An audio unit view setting the various input levels on a mixer audio unit
A host application connecting audio units into an audio processing graph
Host applications also use the input scope when registering a render callback, as described in “Render Callback Connections.”
Output scope: The context for audio data leaving an audio unit. The output scope is used for most of the same things as input scope: connections, defining additional output elements, setting an output audio data stream format, and setting output levels in the case of a mixer unit with multiple outputs.
A host application, or a downstream audio unit in an audio processing graph, also addresses the output scope when invoking rendering.
Global scope: The context for audio unit characteristics that apply to the audio unit as a whole. Code within an audio unit addresses its own global scope for setting or getting the values of such properties as:
latency,
tail time, and
supported number(s) of channels.
Host applications can also query the global scope of an audio unit to get these values.
There are two additional audio unit scopes, intended for instrument units, defined in AudioUnitProperties.h:
Group scope: A context specific to the rendering of musical notes in instrument units
Part scope: A context specific to managing the various voices of multitimbral instrument units
This version of Audio Unit Programming Guide does not discuss group scope or part scope.
An audio unit element is a programmatic context that is nested within a scope. Most commonly, elements come into play in the input and output scopes. Here, they serve as programmatic analogs of the signal buses used in hardware audio devices. Because of this analogy, audio unit developers often refer to elements in the input or output scopes as buses; this document follows suit.
As you may have noticed in Listing 2-1, hosts specify the element as well as the scope they are targeting when getting or setting properties or parameters. Here is that method again, with the inElement parameter highlighted:
Listing 2-3 Using “element” in the GetProperty method
ComponentResult TremoloUnit::GetProperty (
    AudioUnitPropertyID    inID,
    AudioUnitScope         inScope,
    AudioUnitElement       inElement,   // the host specifies the element here
    void                   *outData
) {
    return AUEffectBase::GetProperty (inID, inScope, inElement, outData);
}
Elements are identified by integer numbers and are zero indexed. In the input and output scopes, element numbering must be contiguous. In the typical case, the input and output scopes each have one element, namely element (or bus) 0.
The global scope in an audio unit is unusual in that it always has exactly one element. Therefore, the global scope’s single element is always element 0.
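The numbering rules above can be expressed as a small validity check. This is a sketch under stated assumptions: the typedefs are stand-ins for the SDK types, and BusLayout and IsValidElement are hypothetical names invented for illustration. The sketch models only the global, input, and output scopes.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the SDK types (assumptions, so the sketch is self-contained).
typedef std::uint32_t AudioUnitScope;
typedef std::uint32_t AudioUnitElement;

enum {
    kAudioUnitScope_Global = 0,
    kAudioUnitScope_Input  = 1,
    kAudioUnitScope_Output = 2
};

// Hypothetical description of how many buses an audio unit exposes.
struct BusLayout {
    std::uint32_t inputBusCount;
    std::uint32_t outputBusCount;
};

// Hypothetical helper: returns true if (scope, element) names an element
// that exists. Elements are zero indexed and contiguous, so an element is
// valid exactly when its number is below the count for its scope.
bool IsValidElement(const BusLayout &layout,
                    AudioUnitScope scope,
                    AudioUnitElement element) {
    switch (scope) {
        case kAudioUnitScope_Global:
            return element == 0;   // the global scope has only element 0
        case kAudioUnitScope_Input:
            return element < layout.inputBusCount;
        case kAudioUnitScope_Output:
            return element < layout.outputBusCount;
        default:
            return false;          // group and part scopes are not modeled
    }
}
```

For the typical layout of one input bus and one output bus, only element 0 is valid in each of the three scopes.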
A bus (that is, an input or output element) always has exactly one stream format. The stream format specifies a variety of characteristics for the bus, including sample rate and number of channels. Stream format is described by the audio stream description structure (AudioStreamBasicDescription), declared in the CoreAudioTypes.h header file and shown in Listing 2-4:
Listing 2-4 The audio stream description structure
struct AudioStreamBasicDescription {
    Float64 mSampleRate;        // sample frames per second
    UInt32  mFormatID;          // a four-char code indicating stream type
    UInt32  mFormatFlags;       // flags specific to the stream type
    UInt32  mBytesPerPacket;    // bytes per packet of audio data
    UInt32  mFramesPerPacket;   // frames per packet of audio data
    UInt32  mBytesPerFrame;     // bytes per frame of audio data
    UInt32  mChannelsPerFrame;  // number of channels per frame
    UInt32  mBitsPerChannel;    // bit depth
    UInt32  mReserved;          // padding
};
typedef struct AudioStreamBasicDescription AudioStreamBasicDescription;
An audio unit can let a host application get and set the stream formats of its buses using the kAudioUnitProperty_StreamFormat property, declared in the AudioUnitProperties.h header file. This property’s value is an audio stream description structure.
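As an illustration of how the fields in Listing 2-4 relate to one another, the following sketch fills in a description for noninterleaved, 32-bit floating point linear PCM. It rests on several assumptions: the struct and typedefs are local stand-ins for the CoreAudioTypes.h declarations, the constant values are reproduced from memory of that header and should be checked against it, the host is assumed to be little-endian (so no big-endian flag is set), and MakeNoninterleavedFloat32Format is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the Core Audio types (assumptions).
typedef double        Float64;
typedef std::uint32_t UInt32;

struct AudioStreamBasicDescription {
    Float64 mSampleRate;
    UInt32  mFormatID;
    UInt32  mFormatFlags;
    UInt32  mBytesPerPacket;
    UInt32  mFramesPerPacket;
    UInt32  mBytesPerFrame;
    UInt32  mChannelsPerFrame;
    UInt32  mBitsPerChannel;
    UInt32  mReserved;
};

// Values believed to match CoreAudioTypes.h; verify against the header.
const UInt32 kAudioFormatLinearPCM            = 0x6C70636D; // 'lpcm'
const UInt32 kAudioFormatFlagIsFloat          = 1u << 0;
const UInt32 kAudioFormatFlagIsPacked         = 1u << 3;
const UInt32 kAudioFormatFlagIsNonInterleaved = 1u << 5;

// Hypothetical helper: describes noninterleaved Float32 linear PCM.
AudioStreamBasicDescription MakeNoninterleavedFloat32Format(
        Float64 sampleRate, UInt32 channelCount) {
    assert(sampleRate > 0.0 && channelCount > 0);
    const UInt32 bytesPerSample = static_cast<UInt32>(sizeof(float));

    AudioStreamBasicDescription format = {};   // zero every field first
    format.mSampleRate       = sampleRate;
    format.mFormatID         = kAudioFormatLinearPCM;
    format.mFormatFlags      = kAudioFormatFlagIsFloat
                             | kAudioFormatFlagIsPacked
                             | kAudioFormatFlagIsNonInterleaved;
    // With noninterleaved data each buffer carries one channel, so the
    // per-frame and per-packet sizes describe a single sample.
    format.mBytesPerFrame    = bytesPerSample;
    format.mFramesPerPacket  = 1;              // uncompressed audio
    format.mBytesPerPacket   = bytesPerSample;
    format.mChannelsPerFrame = channelCount;
    format.mBitsPerChannel   = 8 * bytesPerSample;
    return format;
}
```

A host would pass a structure like this as the value of the kAudioUnitProperty_StreamFormat property for a given bus.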
Typically, you will need just a single input bus and a single output bus in an audio unit. When you create an effect unit by subclassing the AUEffectBase class, you get one input and one output bus by default. Your audio unit can specify additional buses by overriding the main class’s constructor. You would then indicate additional buses using the kAudioUnitProperty_BusCount property, or its synonym kAudioUnitProperty_ElementCount, both declared in the AudioUnitProperties.h header file.
You might find additional buses helpful if you are building an interleaver or deinterleaver audio unit, or an audio unit that contains a primary audio data path as well as a sidechain path for modulation data.
A bus can have exactly one connection, as described next.
A connection is a hand-off point for audio data entering or leaving an audio unit. Fresh audio data samples move through a connection and into an audio unit when the audio unit calls a render callback. Processed audio data samples leave an audio unit when the audio unit’s render method gets called. The Core Audio SDK’s class hierarchy implements audio data hand-off, working with an audio unit’s rendering code.
Hosts establish connections at the granularity of a bus, and not of individual channels. You can see this in Figure 2-1. The number of channels in a connection is defined by the stream format, which is set for the bus that contains the connection.
To connect one audio unit to another, a host application sets a property in the destination audio unit. Specifically, it sets the kAudioUnitProperty_MakeConnection property in the input scope of the destination audio unit. When you build your audio units using the Core Audio SDK, this property is implemented for you.
In setting a value for this property, the host specifies the source and destination bus numbers using an audio unit connection structure (AudioUnitConnection), shown in Listing 2-5:
Listing 2-5 The audio unit connection structure
typedef struct AudioUnitConnection {
    AudioUnit sourceAudioUnit;     // the audio unit that supplies audio
                                   //   data to the audio unit whose
                                   //   connection property is being set
    UInt32    sourceOutputNumber;  // the output bus of the source unit
    UInt32    destInputNumber;     // the input bus of the destination unit
} AudioUnitConnection;
The kAudioUnitProperty_MakeConnection property and the audio unit connection structure are declared in the AudioUnitProperties.h file in the Audio Unit framework.
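The following sketch shows how a host might fill in the connection structure from Listing 2-5. It is illustrative only: OpaqueAudioUnit and the AudioUnit typedef are stand-ins for the real SDK type, and DescribeConnection is a hypothetical helper. The closing comment describes the property call a host would be expected to make and is not executed here.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t UInt32;

// Stand-in for the SDK's AudioUnit type (an assumption; the real type is
// declared by the Audio Unit framework).
struct OpaqueAudioUnit { int unused; };
typedef OpaqueAudioUnit *AudioUnit;

// Same layout as Listing 2-5.
typedef struct AudioUnitConnection {
    AudioUnit sourceAudioUnit;     // the upstream unit that supplies audio
    UInt32    sourceOutputNumber;  // the output bus of the source unit
    UInt32    destInputNumber;     // the input bus of the destination unit
} AudioUnitConnection;

// Hypothetical helper: builds the value a host would hand to the
// destination unit when connecting sourceBus of source to destBus.
AudioUnitConnection DescribeConnection(AudioUnit source,
                                       UInt32 sourceBus,
                                       UInt32 destBus) {
    assert(source != nullptr);
    AudioUnitConnection connection;
    connection.sourceAudioUnit    = source;
    connection.sourceOutputNumber = sourceBus;
    connection.destInputNumber    = destBus;
    return connection;
}

// A host would then set kAudioUnitProperty_MakeConnection on the
// DESTINATION unit, in its input scope, passing the address and size of
// this structure as the property value.
```

Note that the structure names the source unit but not the destination: the destination is implied by the audio unit on which the property is set.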
As an audio unit developer, you must make sure that your audio unit can be connected for it to be valid. You do this by supporting appropriate stream formats. When you create an audio unit by subclassing the classes in the SDK, your audio unit will be connectible. The default, required stream format for audio units is described in “Commonly Used Properties.”
Figure 1-7 illustrates that the entity upstream from an audio unit can be either another audio unit or a host application. Whichever it is, the upstream entity is typically responsible for setting an audio unit’s input stream format before a connection is established. If an audio unit cannot support the stream format being requested, it returns an error and the connection fails.
A host application can send audio data to an audio unit directly and can retrieve processed data from the audio unit directly. You don’t need to make any changes to your audio unit to support this sort of connection.
To prepare to send data to an audio unit, a host defines a render callback (shown in Figure 1-7) and registers it with the audio unit. The signature for the callback is declared in the AUComponent.h header file in the Audio Unit framework, as shown in Listing 2-6:
Listing 2-6 The render callback
typedef OSStatus (*AURenderCallback)(
    void                          *inRefCon,
    AudioUnitRenderActionFlags    *ioActionFlags,
    const AudioTimeStamp          *inTimeStamp,
    UInt32                        inBusNumber,
    UInt32                        inNumberFrames,
    AudioBufferList               *ioData
);
The host must explicitly set the stream format for the audio unit’s input as a prerequisite to making the connection. The audio unit calls the callback in the host when it’s ready for more audio data.
In contrast, for an audio processing graph connection, the upstream audio unit supplies the render callback. In a graph, the upstream audio unit also sets the downstream audio unit’s input stream format.
A host can retrieve processed audio data from an audio unit directly by calling the AudioUnitRender function on the audio unit, as shown in Listing 2-7:
Listing 2-7 The AudioUnitRender function
extern ComponentResult AudioUnitRender (
    AudioUnit                     ci,
    AudioUnitRenderActionFlags    *ioActionFlags,
    const AudioTimeStamp          *inTimeStamp,
    UInt32                        inOutputBusNumber,
    UInt32                        inNumberFrames,
    AudioBufferList               *ioData
);
The Core Audio SDK passes this function call into your audio unit as a call to the audio unit’s Render method.
You can see the similarity between the render callback and AudioUnitRender signatures, which reflects their coordinated use in audio processing graph connections. Like the render callback, the AudioUnitRender function is declared in the AUComponent.h header file in the Audio Unit framework.
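To make the pull model concrete, the following toy sketch mimics the hand-off without using the SDK: a render request arrives, the unit calls its registered callback to obtain fresh samples, and then it processes them in place. All types here are reduced stand-ins (assumptions), and ToyGainUnit, ToyRender, HostSupplyConstant, and kToyError are hypothetical names. Only the callback signature follows Listing 2-6.

```cpp
#include <cassert>
#include <cstdint>

// Reduced stand-ins for the Core Audio types (assumptions).
typedef std::int32_t  OSStatus;
typedef std::uint32_t UInt32;
typedef UInt32        AudioUnitRenderActionFlags;
struct AudioTimeStamp { double mSampleTime; };
struct AudioBuffer {
    UInt32 mNumberChannels;
    UInt32 mDataByteSize;
    void  *mData;
};
struct AudioBufferList {
    UInt32      mNumberBuffers;
    AudioBuffer mBuffers[1];   // this sketch handles one mono buffer only
};

// Same shape as the callback in Listing 2-6.
typedef OSStatus (*AURenderCallback)(
    void *inRefCon, AudioUnitRenderActionFlags *ioActionFlags,
    const AudioTimeStamp *inTimeStamp, UInt32 inBusNumber,
    UInt32 inNumberFrames, AudioBufferList *ioData);

const OSStatus kToyError = -1;   // hypothetical error code

// Hypothetical toy unit: one registered input callback and a gain.
struct ToyGainUnit {
    AURenderCallback inputCallback;
    void            *inputRefCon;
    float            gain;
};

// Hypothetical analog of a render call: pull input, then process it.
OSStatus ToyRender(ToyGainUnit &unit,
                   AudioUnitRenderActionFlags *ioActionFlags,
                   const AudioTimeStamp *inTimeStamp,
                   UInt32 inOutputBusNumber, UInt32 inNumberFrames,
                   AudioBufferList *ioData) {
    (void)inOutputBusNumber;     // the toy unit has a single output bus
    assert(ioData != nullptr && ioData->mNumberBuffers == 1);
    if (unit.inputCallback == nullptr) return kToyError;
    if (ioData->mBuffers[0].mDataByteSize < inNumberFrames * sizeof(float))
        return kToyError;

    // Pull: ask the upstream entity for fresh samples on input bus 0.
    OSStatus err = unit.inputCallback(unit.inputRefCon, ioActionFlags,
                                      inTimeStamp, 0, inNumberFrames, ioData);
    if (err != 0) return err;

    // Process in place.
    float *samples = static_cast<float *>(ioData->mBuffers[0].mData);
    for (UInt32 i = 0; i < inNumberFrames; ++i) samples[i] *= unit.gain;
    return 0;
}

// Hypothetical host callback: fills the buffer with the value that
// inRefCon points to.
OSStatus HostSupplyConstant(void *inRefCon, AudioUnitRenderActionFlags *,
                            const AudioTimeStamp *, UInt32,
                            UInt32 inNumberFrames, AudioBufferList *ioData) {
    const float value = *static_cast<const float *>(inRefCon);
    float *samples = static_cast<float *>(ioData->mBuffers[0].mData);
    for (UInt32 i = 0; i < inNumberFrames; ++i) samples[i] = value;
    return 0;
}
```

The near-identical parameter lists of ToyRender and the callback mirror the relationship between AudioUnitRender and the render callback: the request for output flows downstream to upstream, and the samples flow back the other way.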
An audio unit channel is, conceptually, a monaural, noninterleaved path for audio data samples that goes to or from an audio unit’s processing code. The Core Audio SDK represents channels as buffers. Each buffer is described by an audio buffer structure (AudioBuffer), as declared in the CoreAudioTypes.h header file in the Core Audio framework, as shown in Listing 2-8:
Listing 2-8 The audio buffer structure
struct AudioBuffer {
    UInt32 mNumberChannels;  // number of interleaved channels in the buffer
    UInt32 mDataByteSize;    // size, in bytes, of the buffer
    void   *mData;           // pointer to the buffer
};
typedef struct AudioBuffer AudioBuffer;
An audio buffer can hold a single channel, or multiple interleaved channels. However, most types of audio units, including effect units, use only noninterleaved data. These audio units expect the mNumberChannels field in the audio buffer structure to equal 1.
Output units and format converter units can accept interleaved channels, represented by an audio buffer with the mNumberChannels field set to 2 or greater.
An audio unit manages the set of channels in a bus as an audio buffer list structure (AudioBufferList), also defined in CoreAudioTypes.h, as shown in Listing 2-9:
Listing 2-9 The audio buffer list structure
struct AudioBufferList {
    UInt32      mNumberBuffers;                  // the number of buffers in the list
    AudioBuffer mBuffers[kVariableLengthArray];  // the list of buffers
};
typedef struct AudioBufferList AudioBufferList;
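Because mBuffers is declared with a placeholder length, a list that holds more than one buffer has to be allocated with extra room at the end. The following sketch shows one way to build a list of noninterleaved Float32 channels. The structs are local stand-ins for the declarations in Listings 2-8 and 2-9, kVariableLengthArray is assumed to equal 1, and AllocateNoninterleavedList and FreeNoninterleavedList are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

typedef std::uint32_t UInt32;
enum { kVariableLengthArray = 1 };   // assumed value of the placeholder

// Local stand-ins matching Listings 2-8 and 2-9.
struct AudioBuffer {
    UInt32 mNumberChannels;
    UInt32 mDataByteSize;
    void  *mData;
};
struct AudioBufferList {
    UInt32      mNumberBuffers;
    AudioBuffer mBuffers[kVariableLengthArray];
};

// Hypothetical helper: releases a list built by the allocator below.
void FreeNoninterleavedList(AudioBufferList *list) {
    if (list == nullptr) return;
    for (UInt32 ch = 0; ch < list->mNumberBuffers; ++ch) {
        std::free(list->mBuffers[ch].mData);
    }
    std::free(list);
}

// Hypothetical helper: one zeroed mono Float32 buffer per channel.
AudioBufferList *AllocateNoninterleavedList(UInt32 channelCount,
                                            UInt32 frameCount) {
    assert(channelCount > 0 && frameCount > 0);
    // Header plus room for channelCount AudioBuffer entries. Indexing
    // mBuffers past its declared length relies on the variable-length
    // array idiom that the structure is designed for.
    const std::size_t listBytes = offsetof(AudioBufferList, mBuffers)
                                + channelCount * sizeof(AudioBuffer);
    AudioBufferList *list =
        static_cast<AudioBufferList *>(std::calloc(1, listBytes));
    if (list == nullptr) return nullptr;

    list->mNumberBuffers = channelCount;
    for (UInt32 ch = 0; ch < channelCount; ++ch) {
        AudioBuffer &buffer = list->mBuffers[ch];
        buffer.mNumberChannels = 1;   // noninterleaved: one channel each
        buffer.mDataByteSize =
            static_cast<UInt32>(frameCount * sizeof(float));
        buffer.mData = std::calloc(frameCount, sizeof(float));
        if (buffer.mData == nullptr) {
            FreeNoninterleavedList(list);
            return nullptr;
        }
    }
    return list;
}
```

For a stereo bus and 512 frames, this produces two buffers of 2048 bytes each, with mNumberChannels equal to 1 in both.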
In the common case of building an n-to-n channel effect unit, such as the one you build in “Tutorial: Building a Simple Effect Unit with a Generic View,” the audio unit template and superclasses take care of managing channels for you. You create this type of effect unit by subclassing the AUEffectBase class in the SDK.
In contrast, when you build an m-to-n channel effect unit (for example, a stereo-to-mono effect unit), you must write code to manage channels. In this case, you create your effect unit by subclassing the AUBase class. (As with the rest of this document, this consideration applies to version 1.4.3 of the Core Audio SDK, current at the time of publication.)
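As a small illustration of per-channel handling in the m-to-n case, the following sketch averages two noninterleaved input channels into one output channel. It shows only the arithmetic and is not how AUBase structures its rendering code. DownmixStereoToMono is a hypothetical function name, and equal weighting of the two channels is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: mixes two mono (noninterleaved) input channels
// into a single mono output channel by averaging them.
void DownmixStereoToMono(const float *left,
                         const float *right,
                         float *mono,
                         std::size_t frameCount) {
    assert(left != nullptr && right != nullptr && mono != nullptr);
    for (std::size_t i = 0; i < frameCount; ++i) {
        // Equal weights keep the result within the range of the inputs.
        mono[i] = 0.5f * (left[i] + right[i]);
    }
}
```

In an actual audio unit, the three pointers would typically come from the mData fields of the input and output audio buffer lists.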
Last updated: 2007-10-31