Overview
This document describes compression types for storing uncompressed Y´CbCr data in a QuickTime file: '2vuy', 'yuv2', 'v308', 'v408', 'v216', 'v410', and 'v210'.
Then it describes ImageDescription extensions which must be read and written with these compression types: 'colr', 'fiel', 'pasp', and 'clap'.
Conventions
Symbols R, G, B, W, X, Y, and Z denote light measurements which are linearly related to physical radiance. Symbols R´, G´, B´, W´, X´ and Y´ denote light measurements which are nonlinearly related to physical radiance. Y is CIE luminance, and Y´ is video luma (often erroneously referred to as luminance in video standards). For nonlinear light measurements where there is no ambiguity (e.g., Cb, Cr), we will omit the prime symbol (´). For more information, see the 'colr' ImageDescription extension. floor(x) denotes the largest integer not greater than x. ceil(x) denotes the smallest integer not less than x.
ImageDescription Structures and Image Buffers
When using the compression types defined in this document, set the other fields of the ImageDescription like so:
struct ImageDescription
{
long idSize;
CodecType cType; // one of the compression types above
long resvd1;
short resvd2;
short dataRefIndex;
short version; // set to 2
short revisionLevel; // set to 0
long vendor; // set to your vendor type
CodecQ temporalQuality; // not used -- set to 0
CodecQ spatialQuality; // set to codecLosslessQuality
short width; // number of luma (Y´) sampling instants wide
short height; // number of picture lines high (incl. both fields)
Fixed hRes; // not used -- set to 72 << 16
Fixed vRes; // not used -- set to 72 << 16
long dataSize; // see below
short frameCount; // set to 1
Str31 name; // see below for codec names
short depth; // see below
short clutID; // set to -1
};
This document (except for an appendix which will be mentioned
below) can only be used to interpret a If you want to support the existing ( If your code is requested to interpret an ImageDescription with a
If your code is requested to interpret an ImageDescription with
The We will use the term "buffer" to describe an image in memory or in
a QuickTime file. "Address" refers to a memory address or a file
offset. An image buffer has The
The Define In QuickTime files, the ImageDescription ('stsd' atom)
At runtime, a QuickTime video digitizer must set the
ImageDescription As address increases,
The
QuickTime Image Rates and Video
The Problem
A QuickTime media has a 32-bit TimeScale. Each sample in a QuickTime media has a 32-bit duration. Together the duration and TimeScale let you specify the frame rate of your video data. A QuickTime movie has a 32-bit TimeScale. Each item in a QuickTime track's edit list contains a 32-bit duration in the movie TimeScale and a 32-bit media time in the media TimeScale. Therefore, the precision of track edit boundaries is determined by the movie TimeScale. A QuickTime movie also has a 32-bit duration in the movie TimeScale. All 32-bit quantities mentioned above are two's complement signed values. Since the track's edit list item's media time (in media TimeScale) and the movie duration (in movie TimeScale) are 32-bit quantities, it is important to choose TimeScales which will not overflow these fields under normal use.
This issue also arises at runtime. Although most of the QuickTime APIs for manipulating media and movie times use 64-bit quantities (e.g., TimeRecord, ICMFrameTime), some of QuickTime's internal time calculations use only 32 bits, and some QuickTime APIs (e.g., CallMeWhen()) have no 64-bit version.
Solutions
PAL and digital 625 video have exactly 25 frames per second. A media TimeScale of 25 and media sample durations of 1, along with a movie TimeScale of 25, provide more than 2 years of time in 32 bits. This works well for video-only movies.
NTSC and digital 525 video have exactly 30/1.001 frames per second. A media TimeScale of 30000 and media sample durations of 1001, along with a movie TimeScale of 30000, provide 19.9 hours of time in 32 bits.
The 30/1.001 rate of NTSC and digital 525 is sometimes approximated to (or mistaken to be) 29.97, which does not equal 30/1.001. The average rate of drop-frame timecode (e.g., LTC and VITC used in the video industry) over 24 hours is exactly 29.97, but the use of drop-frame timecode does not modify the video signal rate.
Some existing movies have a media TimeScale of 2997, media sample durations of 100, and a movie TimeScale of 2997. This provides 8.2 days of time before 32-bit overflow; however, after 4.6 hours, this representation deviates from the actual video timing by half a frame time. For tracks with a 2997 TimeScale which are longer than 4.6 hours, a frame-accurate representation of the video timing is only possible if the frames are allowed to have differing durations. Most applications today only deal with material shorter than 4.6 hours. These applications should be able to read video movies with either 2997 or 30000 TimeScales. They may write either, but the 30000 TimeScale is preferred since it has a sufficient overflow time and precisely models the video signal.
Finally, some existing movies have a media TimeScale of 30, media sample durations of 1, and a movie TimeScale of 30. This provides more than 2 years of time in 32 bits, but the representation deviates from NTSC/digital 525 video timing by half a frame time after only about 16.7 seconds (each frame is 1/30000 second too short, so the error reaches half a frame after 500.5 frames)! Any such movies intended for 30/1.001 frame/second video are mislabeled.
Some such movies may be intended for graphics output devices whose rates are really 30 frames/second, but typically these are low-bitrate compressed movies where precise timing is not required. Therefore, a QuickTime reader which encounters a 30 frame per second movie with a compression type from this document but without the required ImageDescription extensions from this document should assume that the actual image rate is 30/1.001. All new movies with the compression types from this document should be created with the extensions from this document and should avoid image rate 30 for standard definition video. The SMPTE HDTV formats (1920x1035, 1920x1080, 1280x720) have both a 30 (or 60) frame/second variety and a 30/1.001 (or 60/1.001) frame/second variety. Be careful to properly label HDTV data in QuickTime files.
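The overflow and drift figures above follow from simple arithmetic on 32-bit signed times. Here is a minimal C sketch that reproduces them, assuming the true NTSC frame period is exactly 1001/30000 second; the function names are illustrative and are not QuickTime APIs.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Largest positive value of a 32-bit two's complement time field. */
#define MAX_TIME32 2147483647.0

/* True NTSC / digital 525 frame period in seconds (1001/30000). */
#define NTSC_FRAME_PERIOD (1001.0 / 30000.0)

/* Seconds representable by a 32-bit time counted in `timescale` units. */
static double overflow_seconds(int32_t timescale) {
    assert(timescale > 0);
    return MAX_TIME32 / (double)timescale;
}

/* Seconds of playback after which frames labeled with duration `dur`
   (in `timescale` units) drift half a frame from true 30/1.001 timing.
   Returns INFINITY when the labeling matches the true rate exactly. */
static double half_frame_drift_seconds(int32_t timescale, int32_t dur) {
    assert(timescale > 0 && dur > 0);
    double labeled = (double)dur / (double)timescale;
    double err = fabs(labeled - NTSC_FRAME_PERIOD);   /* error per frame */
    if (err == 0.0)
        return INFINITY;
    double frames = (NTSC_FRAME_PERIOD / 2.0) / err;  /* frames to drift 1/2 frame */
    return frames * NTSC_FRAME_PERIOD;
}
```

For example, overflow_seconds(30000) is about 71,583 seconds (19.9 hours), half_frame_drift_seconds(2997, 100) is about 16,683 seconds (4.6 hours), and half_frame_drift_seconds(30, 1) is about 16.7 seconds.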
Audio and Video
For movies with audio and video tracks, it is sometimes desirable to specify edits on the audio track at a finer granularity than a video frame. Since all edits (regardless of track) are specified on the movie TimeScale, you must choose a TimeScale which satisfies the needs of edits on both the audio and video tracks. Ideally, you would choose a TimeScale which is the least common multiple of the audio and video media TimeScales. Audio tracks typically have a TimeScale equal to the number of audio frames per second (e.g., 22050, 44100, 48000). For some combinations of common audio and video media TimeScales, this least common multiple is so large that movie times again overflow 32 bits after a short duration. In that case, you must trade edit precision against maximum movie duration according to application needs.
Y´CbCr Spatial Relationship
For each 4:2:2 compression type, we will describe the bit position of a Y´0, Y´1, Cb, and Cr component. For this document, "pixels" in 4:2:2 implies luma (Y´) sampling instants. The leftmost luma (Y´) sample of each line in a QuickTime buffer is a Y´0 sample. The spatial relationship of the four components is like so:
For each 4:4:4 or 4:4:4:4 compression type, we will describe the bit position of a Y´, Cb, Cr, and (for 4:4:4:4) alpha (A) component. At each "pixel," there is a sample of all components:
Note: "Cb" is often erroneously called "U" and "Cr" is often erroneously called "V."
Y´CbCr Numerical Value and Color Parameters
For each compression type below, we will define how the following three canonical quantities map onto numerical values of that compression type's Y´, Cb, and Cr components:
Specifically, we will define how EY´, ECb, and ECr map onto the compression type's integer values, and how those integer values are encoded (unsigned, offset binary, two's complement signed). Each ImageDescription with these compression types must include a 'colr' extension with the 'nclc' type. This extension defines color parameters for EY´, ECb, and ECr. Together, the compression type and 'colr' extension allow correct display and color conversion of the image data.
Terminology Note: The terms EY´, ECb, and ECr are specific to this document and are not intended to correspond to any use of the same terms in other standards. In particular, the use of "E" should not be construed as voltage, and EY´, ECb, and ECr should not be construed to come from any particular video interface specification. This document uses EY´, ECb, and ECr as placeholder symbols which allow us to connect the compression type part of the document with the 'colr' ImageDescription extension part of the document.
We will define two mapping/encoding schemes. Each compression type will use one of these schemes. Other, slightly different encoding/mapping schemes exist in the video industry, so be careful that any Y´CbCr data you bring into QuickTime matches.
Scheme A: "Wide-Range" Mapping with Unsigned Y´, Two's Complement Cb, Cr
Scheme A transforms EY´,
ECb, and ECr into
For n=8 bits, this yields:
Y´ is an unsigned integer. Cb and Cr are two's complement signed integers.
Warning: In specifications such as ITU-R BT.601-4, JFIF 1.02, and SPIFF (Rec. ITU-T T.84), the symbols Cb and Cr are used to describe offset binary integers, not two's complement signed integers as are used in scheme A. The value -2^(n-1) (-128 for n=8 bits) may appear in a buffer due to filter undershoot. The writer of a QuickTime image may use the value. The reader of a QuickTime image must expect the value.
Warning: In scheme A, Cb and Cr have a 2^n - 2 (254 for n=8 bits) excursion while Y´ has a 2^n - 1 (255 for n=8 bits) excursion. Furthermore, ECb=0 and ECr=0 imply Cb=0 and Cr=0, respectively (Cb and Cr have a 0 center). You may encounter video data with two's complement Cb and Cr components that have other excursions and centers. In particular, you may encounter data with a 2^n - 1 (255 for n=8 bits) excursion and a -0.5 center, which is known as a "Full-Range" mapping. You may also encounter data with a 2^n excursion (256 for n=8 bits) and a 0 center. These forms of data are not representable using the labels described in this document. Be sure to convert the data properly when bringing it into QuickTime using a compression type from this document. Failure to do so could, for example, incorrectly generate the value -2^(n-1).
Warning: The -2^(n-1) value could also result from poor rounding or inappropriate brightness and contrast settings at capture time.
Scheme B: "Video-Range" Mapping with Unsigned Y´, Offset Binary Cb, Cr
Scheme B comes from digital video industry specifications such as Rec. ITU-R BT.601-4. All standard digital video tape formats (e.g., SMPTE D-1, SMPTE D-5) and all standard digital video links (e.g., SMPTE 259M-1997 serial digital video) use this scheme. Professional video storage and processing equipment from vendors such as Abekas, Accom, and SGI also uses this scheme. MPEG-2, DVC, and many other codecs specify source Y´CbCr pixels using this scheme. Scheme B transforms EY´,
ECb, and ECr into
For n=8 bits, this yields:
For n=10 bits, this yields:
Y´ is an unsigned integer. Cb and Cr are offset binary integers. Certain Y´, Cb, and Cr component values v are reserved as synchronization signals and must not appear in a buffer:
For n=8 bits, these are values 0 and 255. For n=10 bits, these are values 0, 1, 2, 3, 1020, 1021, 1022, and 1023. The writer of a QuickTime image is responsible for omitting these values. The reader of a QuickTime image may assume that they are not present. The remaining component values (e.g., 1-15 and 241-254 for n=8 bits and 4-63 and 961-1019 for n=10 bits) accommodate occasional filter undershoot and overshoot in image processing. In some applications, these values are used to carry other information (e.g., transparency). The writer of a QuickTime image may use these values and the reader of a QuickTime image must expect these values.
Compression Types
The compression type defines the memory/file layout of the Y´CbCr components. ImageDescription structures with these compression types must include certain extensions described below to allow correct interchange and conversion of the image data. We will show a packing diagram for each compression type:
Zero Bits
Some of the compression types below include zero bits which are not used to encode image data. The writer of a QuickTime image must place zero in these bits. Image processing operations must continue to place zero in these bits. The reader of a QuickTime image can assume the bits are zero.
'yuv2' vs. '2vuy'
We will define two 8-bit 4:2:2 compression types:
'2vuy' is the preferred format for hardware and software development for which the choice is otherwise arbitrary.
'2vuy' 4:2:2 Compression Type
The ImageCompression.h token to use is
This compression type uses scheme B ("Video-Range"
Mapping with Unsigned Y´, Offset Binary Cb, Cr) to get from
EY´, ECb, and
ECr to Y´, Cb, and Cr.
There are '2vuy' files in the field with a
'yuv2' 4:2:2 Compression Type
The ImageCompression.h token to use is
This compression type uses scheme A ("Wide-Range"
Mapping with Unsigned Y´, Two's Complement Cb, Cr) to get
from EY´, ECb, and
ECr to Y´, Cb, and Cr.
As described in QuickTime Ice Floe Dispatch 20: QuickTime Pixel Format FourCCs, the 'yuv2' file format is equivalent to the 'yuvu' pixel format. There are 'yuv2' files in the field with a
'v308' 4:4:4 Compression Type
The ImageCompression.h token to use is
This compression type uses scheme B ("Video-Range"
Mapping with Unsigned Y´, Offset Binary Cb, Cr) to get from
EY´, ECb, and
ECr to Y´, Cb, and Cr.
'v408' 4:4:4:4 Compression Type
The ImageCompression.h token to use is
This compression type uses scheme B ("Video-Range"
Mapping with Unsigned Y´, Offset Binary Cb, Cr) to get from
EY´, ECb, and
ECr to Y´, Cb, and Cr.
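For 'v408', the alpha behavior described in the next paragraph (following SMPTE RP 157-1995, with an A value of 16 fully transparent and 235 fully opaque) can be sketched in C as follows. Clamping alpha codes outside 16 to 235 is an assumption of this sketch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Blend weight from an 8-bit 'v408' alpha code: 16 -> 0.0 (transparent),
   235 -> 1.0 (opaque), linear in between. Clamping outside 16..235 is an
   assumption of this sketch. */
static double v408_alpha_weight(uint8_t a) {
    if (a <= 16) return 0.0;
    if (a >= 235) return 1.0;
    return (a - 16) / 219.0;
}

/* Composite one component of a foreground pixel over a background pixel.
   The same formula serves Y´ and offset binary Cb/Cr, because a linear
   blend of two offset binary values is itself offset binary. */
static uint8_t v408_blend(uint8_t fg, uint8_t bg, uint8_t a) {
    double w = v408_alpha_weight(a);
    double v = (double)bg + w * ((double)fg - (double)bg);
    assert(v >= 0.0 && v <= 255.0);   /* result lies between bg and fg */
    return (uint8_t)(v + 0.5);        /* round to nearest */
}
```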
In the absence of other information, assume the A (alpha) component behaves like Y´ as described in SMPTE RP 157-1995: an A value of 16 is completely transparent, 235 is completely opaque, and the Y´, Cb, and Cr components are intended to be blended linearly according to where A falls between 16 and 235.
'v216' 4:2:2 Compression Type
The ImageCompression.h token to use is
Each n-bit component is left justified in a 16 bit
little-endian word. The 16-n least significant bits
of the 16 bit word are zero bits (described
above). This compression type uses scheme B ("Video-Range"
Mapping with Unsigned Y´, Offset Binary Cb, Cr) to get from
EY´, ECb, and
ECr to Y´, Cb, and Cr.
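A C sketch of the 'v216' component storage just described: an n-bit component left justified in a 16-bit little-endian word, with the low 16-n bits as zero bits. In a real file n comes from the 'sgbt' extension; here it is simply a parameter, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store an n-bit component left justified in a 16-bit little-endian
   word; the 16-n low bits are the required zero bits. */
static void v216_store(uint8_t out[2], uint16_t value, int n) {
    assert(n >= 1 && n <= 16 && value < (1u << n));
    uint16_t word = (uint16_t)(value << (16 - n));
    out[0] = (uint8_t)(word & 0xFF);   /* low byte at the lower address */
    out[1] = (uint8_t)(word >> 8);
}

/* Recover the n-bit component; the zero bits are shifted out. */
static uint16_t v216_load(const uint8_t in[2], int n) {
    assert(n >= 1 && n <= 16);
    uint16_t word = (uint16_t)(in[0] | (in[1] << 8));
    return (uint16_t)(word >> (16 - n));
}
```

For example, the 10-bit value 940 is stored as the bytes 0x00, 0xEB.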
n is determined by a required 'sgbt' extension:
'v410' 4:4:4 Compression Type
The ImageCompression.h token to use is
Three 10-bit unsigned components are packed into a 32-bit little-endian word. Here are the bits of the 32-bit little-endian word in decreasing address order:
The 2 X bits are zero bits, described
above. This compression type uses scheme B ("Video-Range"
Mapping with Unsigned Y´, Offset Binary Cb, Cr) to get from
EY´, ECb, and
ECr to Y´, Cb, and Cr.
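The C sketch below packs and unpacks one 'v410' pixel. It assumes the layout commonly used by decoders, which should be verified against the bit diagram above: Cr in bits 31-22, Y´ in bits 21-12, Cb in bits 11-2, and the two zero bits in bits 1-0. Type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t cb, y, cr; } YCbCr10;

/* ASSUMED layout: bits 31-22 Cr, 21-12 Y´, 11-2 Cb, 1-0 zero bits. */
static YCbCr10 v410_unpack(const uint8_t p[4]) {
    uint32_t w = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    YCbCr10 px = { (uint16_t)((w >> 2) & 0x3FF),    /* Cb */
                   (uint16_t)((w >> 12) & 0x3FF),   /* Y´ */
                   (uint16_t)((w >> 22) & 0x3FF) }; /* Cr */
    return px;
}

static void v410_pack(uint8_t p[4], YCbCr10 px) {
    assert(px.cb < 1024 && px.y < 1024 && px.cr < 1024);
    uint32_t w = ((uint32_t)px.cr << 22) | ((uint32_t)px.y << 12) |
                 ((uint32_t)px.cb << 2);            /* bits 1-0 stay zero */
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(w >> (8 * i));             /* little-endian */
}
```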
'v210' 4:2:2 Compression Type
The ImageCompression.h token to use is
Twelve 10-bit unsigned components are packed into four 32-bit little-endian words. Here are the four 32-bit words in increasing address order:
Here are the bits of the four 32-bit little-endian words in decreasing address order. We have numbered each component with its spatial order. As you move from left to right in the QuickTime image, Y´ goes from 0 to 5 and Cb and Cr go from 0 to 2. Y´ number 0 is a Y´0 sample as described in Y´CbCr Spatial Relationship.
The X bits are zero bits, described above. This compression type uses scheme B ("Video-Range" Mapping with Unsigned Y´, Offset Binary Cb, Cr) to get from EY´, ECb, and ECr to Y´, Cb, and Cr.
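The C sketch below packs and unpacks one 16-byte 'v210' group (six pixels). It assumes the component order commonly used by decoders, which should be verified against the bit diagram above: each word holds three components in bits 9-0, 19-10, and 29-20, in the order Cb0 Y´0 Cr0, Y´1 Cb1 Y´2, Cr1 Y´3 Cb2, Y´4 Cr2 Y´5, with bits 31-30 as zero bits. Type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte 'v210' group: six Y´ samples and three Cb/Cr pairs. */
typedef struct { uint16_t y[6], cb[3], cr[3]; } V210Group;

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* ASSUMED order, three components per word in bits 9-0, 19-10, 29-20:
   word 0: Cb0 Y0 Cr0 | word 1: Y1 Cb1 Y2 | word 2: Cr1 Y3 Cb2 | word 3: Y4 Cr2 Y5 */
static void v210_pack(uint8_t block[16], const V210Group *g) {
    const uint16_t c[12] = {
        g->cb[0], g->y[0], g->cr[0], g->y[1], g->cb[1], g->y[2],
        g->cr[1], g->y[3], g->cb[2], g->y[4], g->cr[2], g->y[5] };
    for (int w = 0; w < 4; w++) {
        assert(c[3*w] < 1024 && c[3*w + 1] < 1024 && c[3*w + 2] < 1024);
        write_le32(block + 4 * w, (uint32_t)c[3*w] |
                   ((uint32_t)c[3*w + 1] << 10) | ((uint32_t)c[3*w + 2] << 20));
    }
}

static V210Group v210_unpack(const uint8_t block[16]) {
    uint16_t c[12];
    for (int w = 0; w < 4; w++) {
        uint32_t v = read_le32(block + 4 * w);
        c[3*w]     = (uint16_t)(v & 0x3FF);          /* zero bits ignored */
        c[3*w + 1] = (uint16_t)((v >> 10) & 0x3FF);
        c[3*w + 2] = (uint16_t)((v >> 20) & 0x3FF);
    }
    V210Group g = {
        .y  = { c[1], c[3], c[5], c[7], c[9], c[11] },
        .cb = { c[0], c[4], c[8] },
        .cr = { c[2], c[6], c[10] },
    };
    return g;
}
```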
As described in ImageDescription Structures
and Image Buffers, the As an example, say
The 'colr' ImageDescription Extension
This extension is always required when using the compression types in this document. It allows the reader to correctly display the video data and convert it to other formats. The goal of the 'colr' extension is to let you map the numerical values of pixels in the file to a common representation of color in which images can be correctly compared, combined, and displayed. The common representation is the CIE XYZ tristimulus values (defined in Publication CIE No. 15.2). The 'colr' extension is designed to work for multiple imaging applications such as video and print. Each application, driven by its own set of historical and economic realities, has its own set of parameters needed to map from pixel values to CIE XYZ. Therefore, the 'colr' extension has this format:
Currently, two values of
The 'nclc' 'colr' ImageDescription Extension: Y´CbCr Color Parameters
A 'colr' extension of type 'nclc' is always required when using the compression types in this document. It specifies the color parameters of the canonical EY´, ECb, and ECr components which we introduced above and mapped to each compression type. If you are not familiar with the terms and/or specifications below, but you know that your video data came from a specific kind of video signal and/or is intended to be output as a specific kind of video signal, we provide a simple table below showing the best values to set in your 'colr' extension. All current video systems use this model:
R, G, and B are tristimulus values (e.g.,
candelas/meter²), whose relationship to CIE XYZ
tristimulus values can be derived from the set of primaries and white
point chosen by the ER´, EG´, and EB´,
also in the range [0,1], are related to R, G, and B by a
nonlinear transfer function labeled as f() and g() above. The
f() is typically performed inside cameras and g() is typically performed inside displays, so ER´, EG´ and EB´ are often transmitted as voltage in video signals. f(W) is equal to g^-1(RI(W)) where RI (rendering intent) is an empirical factor described in documents such as Charles Poynton's "The rehabilitation of gamma," Human Vision and Electronic Imaging III, Proceedings of SPIE/IS&T Conference 3299 (San Jose, Calif., Jan. 26 - 30, 1998), ed. B. E. Rogowitz and T. N. Pappas (Bellingham, Wash.: SPIE, 1998). See also http://Home.InfoRamp.Net/~poynton/papers/IST_SPIE_9801/index.html.
Finally, a matrix operation on nonlinear components (an operation
referred to as "nonconstant luminance coding") gets us between
ER´, EG´, and EB´ and
the canonical EY´, ECb, and ECr
components introduced above. The
The values below correspond to those in the sequence display extension defined in MPEG-2 (Rec. ITU-T H.262 (1995 E) section 6.3.6).
Primaries
Here are the values of
The following values, from Recommendation ITU-R BT.470-4 System M and the 1953 FCC NTSC spec, are obsolete and have never been used to code any digital image. If you are told your image is coded with these values, you almost definitely want the SMPTE C primaries (code 6 above) instead:
The following values, from Recommendation ITU-R BT.470-4 System B, G, have been superseded by EBU Tech. 3213 (code 5 above). EBU Tech. 3213 values are used for 625-line digital and composite PAL video:
Transfer Function
Here are the values of
The MPEG-2 sequence display extension transfer_characteristics
defines a code 6 whose transfer function is identical to that in code
1. QuickTime writers should map 6 to 1 when converting from
transfer_characteristics to
Recommendation ITU-R BT.470-4 specified an "assumed gamma value of the receiver for which the primary signals are pre-corrected" as 2.2 for NTSC and 2.8 for PAL systems. This information is both incomplete and obsolete. Modern 525- and 625-line digital and NTSC/PAL systems use the transfer function with code 1 above.
The 'colr' extension supersedes the previously defined 'gama' ImageDescription extension. Writers of QuickTime files should never write both into an ImageDescription, and readers of QuickTime files should ignore 'gama' if 'colr' is present.
Some Macintosh-based video systems apply an additional transfer function to incoming Y´CbCr data at capture time so that when it is converted to nonlinear R´G´B´ for display on a Macintosh screen at playback time, it looks correct given the default Macintosh graphics backend transfer function. This operation is currently not representable or supported by the 'colr' parameters.
Matrix
Here are the values for
Then the formulas for ECb and ECr are:
The MPEG-2 sequence display extension matrix_coefficients defines
a code 5 whose matrix is identical to that in code 6. QuickTime
writers should map 5 to 6 when converting from matrix_coefficients to
The following values, from the 1953 FCC NTSC spec, are obsolete. If you are told your image is coded with these values, you almost definitely want the SMPTE 170M-1994 values (code 6 above) instead:
The following values, from Recommendation ITU-R BT.709-1, have been superseded by Recommendation ITU-R BT.709-2 (code 1 above):
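The nonconstant luminance matrix and the MPEG-2 code mappings described above (transfer 6 to 1, matrix 5 to 6) can be sketched in C. The luma coefficients are those of Rec. ITU-R BT.709 (code 1: KR = 0.2126, KB = 0.0722) and SMPTE 170M (code 6: KR = 0.299, KB = 0.114); the ECb and ECr formulas assume the conventional scaling ECb = (EB´ - EY´) / (2(1 - KB)) and ECr = (ER´ - EY´) / (2(1 - KR)), which maps them into [-0.5, 0.5]. Names are hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef struct { double kr, kb; } LumaCoeffs;

/* Sketch table covering only codes 1 and 6; returns 0 for others. */
static int lookup_matrix(int code, LumaCoeffs *out) {
    switch (code) {
    case 1: *out = (LumaCoeffs){ 0.2126, 0.0722 }; return 1; /* BT.709 */
    case 6: *out = (LumaCoeffs){ 0.299, 0.114 };   return 1; /* SMPTE 170M */
    default: return 0;
    }
}

/* MPEG-2 sequence display extension codes to 'nclc' codes. */
static int nclc_transfer_from_mpeg2(int tc) { return tc == 6 ? 1 : tc; }
static int nclc_matrix_from_mpeg2(int mc) { return mc == 5 ? 6 : mc; }

/* Nonlinear R´G´B´ in [0,1] to EY´ in [0,1] and ECb, ECr in [-0.5,0.5]. */
static void rgb_to_eycbcr(LumaCoeffs k, double r, double g, double b,
                          double *ey, double *ecb, double *ecr) {
    double kg = 1.0 - k.kr - k.kb;
    *ey  = k.kr * r + kg * g + k.kb * b;
    *ecb = (b - *ey) / (2.0 * (1.0 - k.kb));
    *ecr = (r - *ey) / (2.0 * (1.0 - k.kr));
}
```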
Sample 'colr' Settings
Unless you know better, here are the best values to use for some common video signal formats:
Some digital video signals can carry a video index (see SMPTE RP
186-1995) which explicitly labels the
The 'fiel' ImageDescription Extension: Field/Frame Information
This extension is always required when using the compression types
in this document. It defines the temporal and spatial relationship of
each of the
We give each line of the buffer, in order from lowest address to highest address, a number
rowBytes*n.
Note: there is a proposal to allow a gap between fields in the
split-field representation (when We give each line of the buffer, in spatial order from top to bottom, a number
We give each line of the buffer, in temporal order from earliest to latest, a number
For every setting of If
If
Pixel Aspect Ratio, Clean Aperture, and Picture Aspect Ratio
Every QuickTime image has a "pixel aspect ratio," which is the horizontal spacing of luma sampling instants vs. the vertical spacing of picture lines when displayed on a display device. It is specified by the 'pasp' ImageDescription extension. Applications must know the pixel aspect ratio in order to draw round circles, lines at a specified angle, etc. Every QuickTime image has a "clean aperture," which is a reference
rectangle specified relative to the Using the pixel aspect ratio and clean aperture dimensions, we can derive the "picture aspect ratio," which is the horizontal to vertical distance ratio of the clean aperture on a display device. The picture aspect ratio is typically 4:3 or 16:9. The clean aperture is used to relate locations in two QuickTime images. Given two QuickTime images with identical picture aspect ratio, you can assume that the top left corner of the clean aperture of each image is coincident, and the bottom right corner of the clean aperture of each image is coincident. The clean aperture also provides a deterministic mapping between a QuickTime image and the region of the video signal (as seen on a display device) from which it was captured or to which it will be played. Each video interface standard (e.g., NTSC, digital 525, PAL, digital 625) also defines a "clean aperture" in terms of its electrical signal. The term "clean aperture" actually originates in the video industry (see SMPTE RP 187-1995). You can think of a video interface standard's clean aperture as a fixed rectangular region on a display device of that standard. Given a QuickTime image and a QuickTime video component implementing a particular interface standard (e.g., video digitizer, image decompressor, video output), if the image and the standard have the same picture aspect ratio, then the component should map the clean aperture of the image to the clean aperture of the video signal. That is,
An example. Say a user imports two clips with a 4:3 picture
aspect ratio into an NLE application and performs a transition such
as a cross-fade between the two clips. The application needs to know
how to align the pixels of each source clip in the final result. Even
if both clips have the same The 'clap' ImageDescription extension eliminates this problem. First, QuickTime video digitizers label captured clips with the location of the clean aperture from the original video signal. Next, when the user imports these clips and performs a transition, the NLE application uses the clean aperture of each input clip to figure out how the pixels of each clip correspond. The NLE application produces an output clip with a clean aperture corresponding to the two input clips. Finally, the NLE application plays back the result, and the QuickTime image decompressor or video output component aligns the clean aperture of the clip to that of the output video signal. The clean aperture provides the missing piece of information so that the user sees no shifts. Ideally, all applications and devices would digitize and
output the same region of the video signal so that the clean aperture
would be fixed relative to the Even applications that produce synthetic imagery (e.g., animation applications) should label output clips with a 'clap' ImageDescription extension. The application should allow the user to position objects relative to the (typically 4:3 or 16:9) clean aperture. For example, this allows the output of multiple animation applications to be combined without manual alignment. And it allows an application to synthesize imagery using data originally derived from video signals (e.g., keys, mattes, motion capture data, shape capture data) and then re-combine that imagery with the original video signals without manual alignment.
The term "clean aperture" does not refer to:
The three items above can be carried by other ImageDescription extensions, but they are outside the scope of this document. The size and location of the clean aperture are fixed for a given video standard and sampling frequency; they do not depend on the capturing equipment used or the image content of the captured signal. Some NLE and many compositing applications allow the user to zoom and pan video clips relative to each other for artistic effect. The 'clap' ImageDescription extension is not intended to be used for this purpose. A QuickTime image's 'clap' ImageDescription extension should always specify where that single QuickTime image should be displayed relative to the standard clean aperture of a display device. For both QuickTime and video interface standards, the "picture center" is defined as the center of the clean aperture.
Often, applications digitize a region of the video signal which is
slightly larger than the clean aperture. In the video industry, this
region is called the "production aperture," is concentric with the
clean aperture, and is defined to have the familiar 720x486 and
720x576 dimensions for the 525- and 625-line digital signal formats.
Digitizing the production aperture accommodates edge-related
filtering artifacts by providing a fixed region where such artifacts
are allowed (see SMPTE RP 187-1995). So it is normal for a QuickTime
image to have a clean aperture whose dimensions differ from
The pixel aspect ratio, clean aperture, and picture aspect ratio may be constrained by a stated level of conformance, as described in The Production Level of Conformance.
The 'pasp' ImageDescription Extension: Pixel Aspect Ratio
This extension is required when using the compression types in this document if the pixel aspect ratio is not square (1:1). It specifies the horizontal spacing of luma sampling instants vs. the vertical spacing of picture lines on a display device:
Picture lines are defined with the 'fiel' ImageDescription extension above. You can think of these values as the ratio of the width of a
pixel to the height of a pixel. For example, say you want to draw a
circle that appears round on the display device and whose diameter is
Pixel aspect ratio is not the same as "picture aspect ratio," which is defined in Pixel Aspect Ratio, Clean Aperture, and Picture Aspect Ratio above. Examples of common picture aspect ratios are 4:3 and 16:9.
Standard Definition Pixel Aspect Ratios
There is widespread confusion about the pixel aspect ratio of standard definition video formats. If your device transfers one of these 4:3 picture aspect ratio video formats, these are the correct pixel aspect ratios:
The luma sampling frequencies shown above (13.5 MHz, (12+3/11) MHz, and 14.75 MHz) are ubiquitous in the video industry. Furthermore, all existing (and probably future) standard definition video equipment assumes that sampling a 525-line signal at (12+3/11) MHz, or a 625-line signal at 14.75 MHz, yields square pixels. Therefore, for the standard definition formats we derive the pixel aspect ratio of non-square pixels (13.5 MHz in both cases) by taking a ratio of 13.5 MHz and the square sampling frequency.
Typical applications manipulate 720 pixel wide images for non-square data and 640 (525-line) or 768 (625-line) pixel wide images for square data. Note that 640/720 does not equal 10/11, and 768/720 does not equal 59/54. If you want to convert images between square and non-square using the widths above, you will need to either crop the source image or pad the destination image: images with the widths above do not represent the same underlying region of the video signal. You should maintain the picture center when performing these padding or cropping operations.
In theory, the non-square pixel aspect ratios above, and therefore
also the square luma sampling frequencies, are superseded by the new
calculations in SMPTE RP 187-1995. However, because the standard
definition pixel aspect ratios from that spec do not reflect the
actual ratios used by any existing (and probably future) standard
definition equipment, you should use the values above. For the same
reason, the 13.5 MHz and 18 MHz clean aperture figures which we will
present in Typical
Here is a comparable chart for the existing standard definition formats with 16:9 picture aspect ratio. With the exception of the 18 MHz format, the formats below are electrically identical to the ones in the table above. Typically you use the 16:9 formats below by pushing a button labeled "16:9" on a standard 4:3 monitor; the monitor takes the same signal and displays it vertically compressed. The clean apertures of the 16:9 formats have the same number of pixels and lines as the 4:3 formats above. The difference is that 16:9 cameras will capture data 16 units wide by 9 units high into the clean aperture, and 16:9 monitors will display the clean aperture such that it is 16 units wide by 9 units high on the display surface, instead of 4 by 3. Since the clean aperture still has the same number of pixels and lines but now has a different picture aspect ratio, you can map 4:3 values above into 16:9 values below by simply multiplying by (16/9)/(4/3)=(4/3).
The 18 MHz SMPTE 267M-1995 standard is exceptional in that its clean aperture has 4/3 as many pixels per line so that the pixels can retain the pixel aspect ratio of 525-line 4:3 digital video (10/11).
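The frequency-ratio derivation, the picture aspect ratio of a clean aperture, and the 4/3 factor for 16:9 can all be checked with exact rational arithmetic. A C sketch with hypothetical helper names; the 704x480 clean aperture in the tests and usage note is an assumed example (the 4:3 aperture that 480 lines of 10:11 pixels imply), not a figure quoted from this document.

```c
#include <assert.h>
#include <stdint.h>

/* Exact ratio num:den, used for both frequencies and aspect ratios. */
typedef struct { uint64_t num, den; } Ratio;

static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

static Ratio reduce(Ratio r) {
    uint64_t g = gcd_u64(r.num, r.den);
    return (Ratio){ r.num / g, r.den / g };
}

/* Pixel aspect ratio (hSpacing:vSpacing) when sampling at `actual` Hz,
   given the frequency `square` Hz that yields square pixels. */
static Ratio pasp_from_sampling(Ratio square, Ratio actual) {
    assert(square.den != 0 && actual.num != 0 && actual.den != 0);
    return reduce((Ratio){ square.num * actual.den, square.den * actual.num });
}

/* Picture aspect ratio of a clean aperture of w pixels by h lines. */
static Ratio picture_aspect(uint64_t w, uint64_t h, Ratio pasp) {
    return reduce((Ratio){ w * pasp.num, h * pasp.den });
}

/* Same clean aperture relabeled from 4:3 to 16:9: multiply the pixel
   aspect ratio by (16/9)/(4/3) = 4/3. */
static Ratio pasp_4x3_to_16x9(Ratio pasp) {
    return reduce((Ratio){ pasp.num * 4, pasp.den * 3 });
}
```

Writing (12+3/11) MHz exactly as 135000000/11 Hz, pasp_from_sampling returns 10:11 against 13.5 MHz, and 14.75 MHz returns 59:54. A 704x480 aperture of 10:11 pixels gives a 4:3 picture; relabeled for 16:9, the pixel aspect ratio becomes 40:33 and the same aperture gives 16:9.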
High Definition Pixel Aspect Ratios
Design of the HDTV formats had the benefit of hindsight. These formats define a clean aperture whose picture aspect ratio is exactly 16:9. Looking at the number of pixels and lines in the clean aperture, you can compute the pixel aspect ratio. One need not consider the luma sampling frequency. For 1920x1035 HDTV (SMPTE 240M-1995, SMPTE 260M-1992), the clean aperture size is uncertain. SMPTE 240M-1995 and SMPTE 260M-1992 specify a clean aperture of 1888 pixels by 1017 lines. However, SMPTE RP 187-1995 specifies a clean aperture of 1888 pixels by 1018 lines. 1018 seems to be the more sensible value: since the picture center is located halfway between two lines, a clean aperture height which is 0 mod 2 puts the top and bottom of the clean aperture on lines instead of between lines. Industry practice will decide. Here are the values:
1920x1080 and 1280x720 HDTV are square-pixel:
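Since each HDTV clean aperture is exactly 16:9, the pixel aspect ratio follows from the clean aperture dimensions alone: (w × hSpacing) / (h × vSpacing) = 16/9 gives hSpacing:vSpacing = 16h : 9w. A C sketch with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t num, den; } Ratio;

static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Pixel aspect ratio hSpacing:vSpacing, in lowest terms, for a clean
   aperture of w pixels by h lines whose picture aspect ratio is 16:9. */
static Ratio hdtv_pasp(uint64_t w, uint64_t h) {
    assert(w > 0 && h > 0);
    uint64_t num = 16 * h, den = 9 * w;
    uint64_t g = gcd_u64(num, den);
    return (Ratio){ num / g, den / g };
}
```

This yields 1:1 for 1920x1080 and 1280x720, 509:531 for a 1888x1018 clean aperture, and 113:118 for 1888x1017, which shows how much the 1017/1018 ambiguity matters.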
The 'clap' ImageDescription Extension: Clean Aperture
This extension is always required when using the compression types in this document. It defines the position of the QuickTime image clean aperture, which we defined in Pixel Aspect Ratio, Clean Aperture, and Picture Aspect Ratio above, relative to the pixels and lines of the image. The picture center is defined to fall at the center of the clean aperture. This extension allows input, output, processing and compositing of video images with correct registration.
These parameters are represented as a fraction N/D. The fraction
may or may not be in reduced terms. We will refer to the set of
parameters Each line of your QuickTime image contains pixels 0 through
The picture center of your image falls at:
horizOff and vertOff are zero,
so your QuickTime image is centered about the picture center.
The leftmost/rightmost pixel and the topmost/bottommost line of the clean aperture fall at:
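The picture center and clean aperture edges can be computed as in the C sketch below. It assumes the conventional 'clap' formulas: picture center at horizOff + (width - 1)/2 and vertOff + (height - 1)/2, with the clean aperture extending (clean width - 1)/2 pixels and (clean height - 1)/2 lines to either side. The struct and field names are hypothetical, and doubles stand in for the N/D fractions.

```c
#include <assert.h>

/* Hypothetical in-memory form of the 'clap' parameters (N/D as double). */
typedef struct {
    double clean_width, clean_height;   /* clean aperture size */
    double horiz_off, vert_off;         /* horizOff, vertOff */
} CleanAperture;

typedef struct { double pc_x, pc_y, left, right, top, bottom; } ClapGeometry;

/* ASSUMED formulas: picture center = offset + (image size - 1)/2; the
   clean aperture edges lie (clean size - 1)/2 either side of it. */
static ClapGeometry clap_geometry(int width, int height, CleanAperture c) {
    assert(width > 0 && height > 0);
    ClapGeometry g;
    g.pc_x   = c.horiz_off + (width - 1) / 2.0;
    g.pc_y   = c.vert_off + (height - 1) / 2.0;
    g.left   = g.pc_x - (c.clean_width - 1) / 2.0;
    g.right  = g.pc_x + (c.clean_width - 1) / 2.0;
    g.top    = g.pc_y - (c.clean_height - 1) / 2.0;
    g.bottom = g.pc_y + (c.clean_height - 1) / 2.0;
    return g;
}
```

For a hypothetical 720x486 image with a 704x480 clean aperture and zero offsets, the clean aperture spans pixels 8 through 711 and lines 3 through 482, centered on the image.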
For QuickTime images representing unscaled video from some video
interface standard, you must set If your QuickTime image represents scaled video from some video interface standard whose clean aperture dimensions are caX by caY, and you applied a scale factor of scaleX and scaleY (QuickTime image pixels or lines per interface standard pixels or lines), then fill in the 'clap' extension with:
Typical
525-line video
625-line video