Video media is used to store compressed and uncompressed image data in QuickTime movies. It has a media type of 'vide'.
Video Sample Description
Video Sample Data
The video sample description contains information that defines how to interpret video media data. A video sample description begins with the four fields described in “General Structure of a Sample Description.”
The data format field of a video sample description indicates the type of compression that was used to compress the image data, or the color space representation of uncompressed video data. Table 3-1 shows some of the formats supported. The list is not exhaustive, and is subject to addition.
| Compression type | Description |
|---|---|
| | Cinepak |
| | JPEG |
| | Graphics |
| | Animation |
| | Apple video |
| | Kodak Photo CD |
| | Portable Network Graphics |
| | Motion-JPEG (format A) |
| | Motion-JPEG (format B) |
| | Sorenson video, version 1 |
| | Sorenson video 3 |
| 'mp4v' | MPEG-4 video |
| | NTSC DV-25 video |
| | PAL DV-25 video |
| | CompuServe Graphics Interchange Format |
| | H.263 video |
| | Tagged Image File Format |
| | Uncompressed RGB |
| | Uncompressed Y′CbCr, 8-bit-per-component 4:2:2 |
| | Uncompressed Y′CbCr, 8-bit-per-component 4:2:2 |
| | Uncompressed Y′CbCr, 8-bit-per-component 4:4:4 |
| | Uncompressed Y′CbCr, 8-bit-per-component 4:4:4:4 |
| | Uncompressed Y′CbCr, 10, 12, 14, or 16-bit-per-component 4:2:2 |
| | Uncompressed Y′CbCr, 10-bit-per-component 4:4:4 |
| | Uncompressed Y′CbCr, 10-bit-per-component 4:2:2 |
The video media sample description adds the following fields to the general sample description.
A 16-bit integer indicating the version number of the compressed data. This is set to 0, unless a compressor has changed its data format.
A 16-bit integer that must be set to 0.
A 32-bit integer that specifies the developer of the compressor that generated the compressed data. Often this field contains 'appl' to indicate Apple Computer, Inc.
A 32-bit integer containing a value from 0 to 1023 indicating the degree of temporal compression.
A 32-bit integer containing a value from 0 to 1024 indicating the degree of spatial compression.
A 16-bit integer that specifies the width of the source image in pixels.
A 16-bit integer that specifies the height of the source image in pixels.
A 32-bit fixed-point number containing the horizontal resolution of the image in pixels per inch.
A 32-bit fixed-point number containing the vertical resolution of the image in pixels per inch.
A 32-bit integer that must be set to 0.
A 16-bit integer that indicates how many frames of compressed data are stored in each sample. Usually set to 1.
A 32-byte Pascal string containing the name of the compressor
that created the image, such as "jpeg".
A 16-bit integer that indicates the pixel depth of the compressed image. Values of 1, 2, 4, 8, 16, 24, and 32 indicate the depth of color images. The value 32 should be used only if the image contains an alpha channel. Values of 34, 36, and 40 indicate 2-, 4-, and 8-bit grayscale images, respectively.
A 16-bit integer that identifies which color table to use. If this field is set to –1, the default color table should be used for the specified depth. For all depths below 16 bits per pixel, this indicates a standard Macintosh color table for the specified depth. Depths of 16, 24, and 32 have no color table.
If the color table ID is set to 0, a color table is contained within the sample description itself. The color table immediately follows the color table ID field in the sample description. See “Color Table Atoms” for a complete description of a color table.
Video sample descriptions can be extended by appending other atoms. These atoms are placed after the color table, if one is present. These extensions to the sample description may contain display hints for the decompressor or may simply carry additional information associated with the images. Table 3-2 lists the currently defined extensions to video sample descriptions.
| Extension type | Description |
|---|---|
| 'gama' | A 32-bit fixed-point number indicating the gamma level at which the image was captured. The decompressor can use this value to gamma-correct at display time. |
| | Two 8-bit integers that define field handling. This information is used by applications to modify decompressed image data or by decompressor components to determine field display order. This extension is mandatory for all uncompressed Y′CbCr data formats. The first byte specifies the field count, and may be set to 1 or 2. A value of 1 is used for progressive-scan images; a value of 2 indicates interlaced images. When the field count is 2, the second byte specifies the field ordering: which field contains the topmost scan line, which field should be displayed earliest, and which is stored first in each sample. Each sample consists of two distinct compressed images, each coding one field: the field with the topmost scan line, T, and the other field, B. The permitted values are: 0: There is only one field. 1: T is displayed earliest, T is stored first in the file. 6: B is displayed earliest, B is stored first in the file. 9: B is displayed earliest, T is stored first in the file. 14: T is displayed earliest, B is stored first in the file. |
| | The default quantization table for a Motion-JPEG data stream. |
| | The default Huffman table for a Motion-JPEG data stream. |
| 'esds' | An MPEG-4 elementary stream descriptor atom. This extension is required for MPEG-4 video. For details, see “MPEG-4 Elementary Stream Descriptor ('esds') Atom.” |
| 'pasp' | Pixel aspect ratio. This extension is mandatory for video formats that use non-square pixels. For details, see “Pixel Aspect Ratio ('pasp').” |
| 'colr' | Color parameters: an image description extension required for all uncompressed Y′CbCr video types. For details, see “Color Parameter Atoms ('colr').” |
| 'clap' | Clean aperture: spatial relationship of Y′CbCr components relative to a canonical image center. This allows accurate alignment for compositing of video images captured using different systems. This is a mandatory extension for all uncompressed Y′CbCr data formats. For details, see “Clean Aperture ('clap').” |
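The field-handling values described in the table can be decoded with a small lookup. The sketch below is illustrative only: it assumes the caller has already located the two payload bytes of the field-handling extension, and the function and constant names are hypothetical.

```python
# Field-ordering codes from the description of the field-handling extension.
# Each entry maps the second byte to (field displayed earliest, field stored first),
# where "T" is the field with the topmost scan line and "B" is the other field.
FIELD_ORDER = {
    1: ("T", "T"),
    6: ("B", "B"),
    9: ("B", "T"),
    14: ("T", "B"),
}


def decode_field_info(payload: bytes):
    """Return (field_count, displayed_first, stored_first) from the two payload bytes."""
    count, order = payload[0], payload[1]
    if count == 1:
        return 1, None, None  # progressive scan; ordering is not meaningful
    if count != 2 or order not in FIELD_ORDER:
        raise ValueError(f"unsupported field info: count={count}, order={order}")
    displayed, stored = FIELD_ORDER[order]
    return 2, displayed, stored
```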
This extension specifies the height-to-width ratio of pixels found in the video sample. This is a required extension for MPEG-4 and uncompressed Y′CbCr video formats when non-square pixels are used. It is optional when square pixels are used.
An unsigned 32-bit integer holding the size of the pixel aspect ratio atom.
An unsigned 32-bit field containing the four-character
code 'pasp'.
An unsigned 32-bit integer specifying the horizontal spacing of pixels, such as luma sampling instants for Y′CbCr or YUV video.
An unsigned 32-bit integer specifying the vertical spacing of pixels, such as video picture lines.
The units of measure for the hSpacing and vSpacing parameters
are not specified, as only the ratio matters. The units of measure
for height and width must be the same, however.
| Description | hSpacing | vSpacing |
|---|---|---|
| 4:3 square pixels (composite NTSC or PAL) | 1 | 1 |
| 4:3 non-square 525 (NTSC) | 10 | 11 |
| 4:3 non-square 625 (PAL) | 59 | 54 |
| 16:9 analog (composite NTSC or PAL) | 4 | 3 |
| 16:9 digital 525 (NTSC) | 40 | 33 |
| 16:9 digital 625 (PAL) | 118 | 81 |
| 1920x1035 HDTV (per SMPTE 260M-1992) | 113 | 118 |
| 1920x1035 HDTV (per SMPTE RP 187-1995) | 1018 | 1062 |
| 1920x1080 HDTV or 1280x720 HDTV | 1 | 1 |
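Because only the ratio of hSpacing to vSpacing matters, it is convenient to keep it as an exact fraction. The following sketch reads a complete 'pasp' atom and applies the ratio to a stored width. It assumes big-endian fields and assumes that a square-pixel display width is obtained by multiplying the stored width by hSpacing/vSpacing; the function names are hypothetical.

```python
import struct
from fractions import Fraction


def parse_pasp(atom: bytes) -> Fraction:
    """Return hSpacing/vSpacing from a complete 'pasp' atom (size, type, two integers)."""
    size, kind, h_spacing, v_spacing = struct.unpack_from(">I4sII", atom)
    if kind != b"pasp" or size < 16:
        raise ValueError("not a pixel aspect ratio atom")
    return Fraction(h_spacing, v_spacing)


def display_width(stored_width: int, ratio: Fraction) -> Fraction:
    # Assumption: square-pixel display width = stored width * hSpacing / vSpacing.
    return stored_width * ratio
```

For example, a 704-pixel-wide image with the 10:11 ratio from the table would map to 640 square pixels under this assumption.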
This atom contains an MPEG-4 elementary stream descriptor atom. This
is a required extension to the video sample description for MPEG-4
video. This extension appears in video sample descriptions only
when the codec type is 'mp4v'.
Note: The elementary stream descriptor which this atom contains is defined in the MPEG-4 specification ISO/IEC FDIS 14496-1.
An unsigned 32-bit integer holding the size of the elementary stream descriptor atom.
An unsigned 32-bit field containing the four-character code 'esds'.
An unsigned 8-bit integer set to zero.
A 24-bit field reserved for flags, currently set to zero.
An elementary stream descriptor for MPEG-4 video, as defined in the MPEG-4 specification ISO/IEC 14496-1 and subject to the restrictions for storage in MPEG-4 files specified in ISO/IEC 14496-14.
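A reader typically only needs to separate the atom header from the descriptor and hand the descriptor to an MPEG-4 parser. The sketch below shows that split, assuming big-endian fields in the order listed above; it does not decode the descriptor itself, and the function name is hypothetical.

```python
import struct


def split_esds(atom: bytes):
    """Split an 'esds' atom into (version, flags, descriptor_bytes)."""
    size, kind, version_flags = struct.unpack_from(">I4sI", atom)
    if kind != b"esds" or size < 12 or size > len(atom):
        raise ValueError("not an elementary stream descriptor atom")
    version = version_flags >> 24         # 8-bit version, expected to be 0
    flags = version_flags & 0xFFFFFF      # 24-bit flags, expected to be 0
    return version, flags, atom[12:size]  # descriptor is left undecoded
```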
This atom is a required extension for uncompressed Y′CbCr
data formats. The 'colr' extension
is used to map the numerical values of pixels in the file to a common representation
of color in which images can be correctly compared, combined, and displayed.
The common representation is the CIE XYZ tristimulus values (defined
in Publication CIE No. 15.2).
Use of a common representation also allows you to correctly map between Y′CbCr and RGB color spaces and to correctly compensate for gamma on different systems.
The 'colr' extension supersedes
the previously defined 'gama' Image
Description extension. Writers of QuickTime files should never write
both into an Image Description, and readers of QuickTime files should
ignore 'gama' if 'colr' is
present.
The 'colr' extension is designed
to work for multiple imaging applications such as video and print.
Each application, driven by its own set of historical and economic
realities, has its own set of parameters needed to map from pixel
values to CIE XYZ.
The CIE XYZ representation is mapped to various stored Y′CbCr formats using a common set of transfer functions and matrixes. The transfer function coefficients and matrix values are stored as indexes into a table of canonical references. This provides support for multiple video systems while limiting the scope of possible values to a set of recognized standards.
The 'colr' atom contains four
fields: a color parameter type and three indexes. The indexes are
to a table of primaries, a table of transfer function coefficients,
and a table of matrixes.
The table of matrixes specifies the matrix used during the translation, as shown in Figure 3-2.
A 32-bit field containing a four-character code for
the color parameter type. The currently defined types are 'nclc' for
video, and 'prof' for print. The color parameter type distinguishes between print and video mappings.
If
the color parameter type is 'prof', then
this field is followed by an ICC profile. This is the color model
used by Apple’s ColorSync. The contents of this type are not defined
in this document. Contact Apple Computer for more information on
the 'prof' type 'colr' extension.
If
the color parameter type is 'nclc' then
this atom contains the following fields:
A 16-bit unsigned integer containing an index into a table specifying the CIE 1931 xy chromaticity coordinates of the white point and the red, green, and blue primaries. The table of primaries specifies the white point and the red, green, and blue primary color points for a video system.
A 16-bit unsigned integer containing an index into a table specifying the nonlinear transfer function coefficients used to translate between RGB color space values and Y′CbCr values. The table of transfer function coefficients specifies the nonlinear function coefficients used to translate between the stored Y′CbCr values and a video capture or display system, as shown in Figure 3-2.
A 16-bit unsigned integer containing an index into a table specifying the transformation matrix coefficients used to translate between RGB color space values and Y′CbCr values. The table of matrixes specifies the matrix used during the translation, as shown in Figure 3-2.
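The layout described above (a color parameter type followed either by three 16-bit indexes or by an ICC profile) can be sketched as follows. This assumes big-endian fields preceded by the usual 32-bit size and 32-bit type; the function name and return shape are hypothetical.

```python
import struct


def parse_colr(atom: bytes):
    """Return ('nclc', primaries, transfer, matrix) or ('prof', icc_profile_bytes)."""
    size, kind, param_type = struct.unpack_from(">I4s4s", atom)
    if kind != b"colr":
        raise ValueError("not a color parameter atom")
    if param_type == b"nclc":
        primaries, transfer, matrix = struct.unpack_from(">HHH", atom, 12)
        return "nclc", primaries, transfer, matrix
    if param_type == b"prof":
        return "prof", atom[12:size]  # ICC profile, not interpreted here
    raise ValueError(f"unknown color parameter type {param_type!r}")
```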
The transfer function and matrix are used as shown in the following diagram.
The Y′CbCr values stored in a file are normalized to a range of [0, 1] for Y′ and [-0.5, +0.5] for Cb and Cr when performing these operations. The normalized values are then scaled to the proper bit depth for a particular Y′CbCr format before storage in the file.
Note: The symbols used for these values are not intended to correspond to the use of these same symbols in other standards. In particular, "E" should not be interpreted as voltage.
These normalized values can be mapped onto the stored integer values of a particular compression type's Y′, Cb, and Cr components using two different schemes, which we will call Scheme A and Scheme B.
Warning: Other, slightly different encoding/mapping schemes exist in the video industry, and data encoded using these schemes must be converted to one of the QuickTime schemes defined here.
Scheme A uses "Wide-Range" mapping (full scale) with unsigned Y′ and twos-complement Cb and Cr values.
This maps normalized values to stored values so that, for example, 8-bit unsigned values for Y′ go from 0 to 255 as the normalized value goes from 0 to 1, and 8-bit signed values for Cb and Cr go from -127 to +127 as the normalized values go from -0.5 to +0.5.
Warning: In specifications such as ITU-R BT.601-4, JFIF 1.02, and SPIFF (Rec. ITU-T T.84), the symbols Cb and Cr are used to describe offset binary integers, not twos-complement signed integers shown here.
Scheme B uses "Video-Range" mapping with unsigned Y′ and offset binary Cb and Cr values.
Note: Scheme B comes from digital video industry specifications such as Rec. ITU-R BT. 601-4. All standard digital video tape formats (e.g., SMPTE D-1, SMPTE D-5) and all standard digital video links (e.g., SMPTE 259M-1997 serial digital video) use this scheme. Professional video storage and processing equipment from vendors such as Abekas, Accom, and SGI also use this scheme. MPEG-2, DVC and many other codecs specify source Y′CbCr pixels using this scheme.
This maps the normalized values to stored values so that, for example, 8-bit unsigned values for Y′ go from 16 to 235 as the normalized value goes from 0 to 1, and 8-bit unsigned values for Cb and Cr go from 16 to 240 as the normalized values go from -0.5 to +0.5.
For 10-bit samples, Y′ has a range of 64 to 940 as the normalized value goes from 0 to 1, and Cb and Cr have a range of 64 to 960 as the normalized values go from -0.5 to +0.5.
Y′ is an unsigned integer. Cb and Cr are offset binary integers.
Certain Y′, Cb, and Cr component values v are reserved as synchronization signals and must not appear in a buffer. For n = 8 bits, these are values 0 and 255. For n = 10 bits, these are values 0, 1, 2, 3, 1020, 1021, 1022, and 1023. The writer of a QuickTime image is responsible for omitting these values. The reader of a QuickTime image may assume that they are not present.
The remaining component values that fall outside the mapping for scheme B (1-15 and 241-254 for n = 8 bits and 4–63 and 961–1019 for n = 10 bits) accommodate occasional filter undershoot and overshoot in image processing. In some applications, these values are used to carry other information (e.g., transparency). The writer of a QuickTime image may use these values and the reader of a QuickTime image must expect these values.
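The two mappings can be sketched in a few lines. The scale factors below (255 and 254 for Scheme A; 219, 224, 16, and 128 for Scheme B) are inferred from the 8-bit and 10-bit ranges quoted above rather than copied from a formula in this section, so treat this as an illustration of the ranges, not a normative definition. The function names are hypothetical.

```python
def scheme_a(e: float, bits: int, chroma: bool) -> int:
    """Wide-range mapping: unsigned Y', twos-complement (signed) Cb/Cr."""
    full = (1 << bits) - 1            # 255 for 8 bits
    if chroma:
        return round(e * (full - 1))  # -0.5..+0.5 -> -127..+127 for 8 bits
    return round(e * full)            # 0..1 -> 0..255 for 8 bits


def scheme_b(e: float, bits: int, chroma: bool) -> int:
    """Video-range mapping: unsigned Y', offset-binary Cb/Cr."""
    scale = 1 << (bits - 8)           # 1 for 8 bits, 4 for 10 bits
    if chroma:
        return round(scale * (224 * e + 128))  # -0.5..+0.5 -> 16..240 for 8 bits
    return round(scale * (219 * e + 16))       # 0..1 -> 16..235 for 8 bits
```

With in-range inputs, neither function produces the reserved synchronization values; a writer that accepts out-of-range inputs would still need to clip them.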
The following tables show the primary values, transfer functions,
and matrixes indicated by the index entries in the 'colr' atom.
The R, G, and B values below are tristimulus values (such as candelas/meter^2), whose relationship to CIE XYZ values can be derived from the primaries and white point specified in the table, using the method described in SMPTE RP 177-1993. In this instance, the R, G, and B values are normalized to the range [0,1].
| Index | Values |
|---|---|
| 0 | Reserved |
| 1 | Recommendation ITU-R BT.709-2, SMPTE 274M-1995, and SMPTE 296M-1997: white x = 0.3127, y = 0.3290 (CIE Ill. D65); red x = 0.640, y = 0.330; green x = 0.300, y = 0.600; blue x = 0.150, y = 0.060 |
| 2 | Primary values are unknown |
| 3–4 | Reserved |
| 5 | ITU-R BT.470-4 System B, G (EBU Tech. 3213): white x = 0.3127, y = 0.3290 (CIE Ill. D65); red x = 0.64, y = 0.33; green x = 0.29, y = 0.60; blue x = 0.15, y = 0.06 |
| 6 | SMPTE C (SMPTE RP 145-1993, SMPTE 170M-1994, 293M-1996, 240M-1995, and SMPTE 274M-1995): white x = 0.3127, y = 0.3290 (CIE Ill. D65); red x = 0.630, y = 0.340; green x = 0.310, y = 0.595; blue x = 0.155, y = 0.070 |
| 7–65535 | Reserved |
The transfer functions below are used as shown in Figure 3-2.
| Index | Video Standards |
|---|---|
| 0 | Reserved |
| 1 | Recommendation ITU-R BT.709-2, SMPTE 274M-1995, 296M-1997, 293M-1996, 170M-1994. See below for transfer function equations. |
| 2 | Coefficient values are unknown |
| 3–6 | Reserved |
| 7 | Recommendation SMPTE 240M-1995 and 274M-1995. See below for transfer function equations. |
| 8–65535 | Reserved |
The MPEG-2 sequence display extension transfer_characteristics defines
a code 6 whose transfer function is identical to that in code 1.
QuickTime writers should map 6 to 1 when converting from transfer_characteristics to transferFunction.
Recommendation ITU-R BT.470-4 specified an "assumed gamma value of the receiver for which the primary signals are pre-corrected" as 2.2 for NTSC and 2.8 for PAL systems. This information is both incomplete and obsolete. Modern 525- and 625-line digital and NTSC/PAL systems use the transfer function with code 1 below.
The matrix values are shown in Table 3-6 and in Figure 3-8, Figure 3-9, and Figure 3-10. These figures show a formula for obtaining the normalized value of Y′ in the range [0,1]. You can derive the formula for normalized values of Cb and Cr as follows:
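If the equation for normalized Y′ has the form (with matrix coefficients $K_G$, $K_B$, and $K_R$ applied to the nonlinear values $E'_G$, $E'_B$, and $E'_R$):
$$E'_Y = K_G E'_G + K_B E'_B + K_R E'_R$$
Then the formulas for normalized Cb and Cr are:
$$E'_{Cb} = \frac{0.5}{1 - K_B}\left(E'_B - E'_Y\right), \qquad E'_{Cr} = \frac{0.5}{1 - K_R}\left(E'_R - E'_Y\right)$$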
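The derivation of Cb and Cr from the Y′ weights can be checked numerically. The sketch below assumes the usual construction in which Cb and Cr are scaled blue-minus-luma and red-minus-luma differences, normalized so that each spans [-0.5, +0.5]. The default weights are the widely published ITU-R BT.601 luma coefficients, used only as an example input; they are not taken from the tables in this section, and the function name is hypothetical.

```python
def ycbcr_from_rgb(r: float, g: float, b: float,
                   kr: float = 0.299, kb: float = 0.114):
    """Normalized Y'CbCr from gamma-corrected R'G'B' values in [0, 1].

    The defaults are example BT.601 luma weights (an assumption of this sketch).
    """
    kg = 1.0 - kr - kb               # the three weights sum to 1
    y = kr * r + kg * g + kb * b     # Y' in [0, 1]
    cb = 0.5 * (b - y) / (1.0 - kb)  # Cb in [-0.5, +0.5]
    cr = 0.5 * (r - y) / (1.0 - kr)  # Cr in [-0.5, +0.5]
    return y, cb, cr
```

Pure blue reaches Cb = +0.5 and pure red reaches Cr = +0.5, which is what the 0.5/(1 - K) normalization is for.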
| Index | Video Standard |
|---|---|
| 0 | Reserved |
| 1 | Recommendation ITU-R BT.709-2 (1125/60/2:1 only), SMPTE 274M-1995, 296M-1997. See below for matrix values. |
| 2 | Coefficient values are unknown |
| 3–5 | Reserved |
| 6 | Recommendation ITU-R BT.601-4 and BT.470-4 System B and G, SMPTE 170M-1994, 293M-1996. See below for matrix values. |
| 7 | SMPTE 240M-1995, 274M-1995. See below for matrix values. |
| 8–65535 | Reserved |
The clean aperture extension defines the relationship between the pixels in a stored image and a canonical rectangular region of a video system from which it was captured or to which it will be displayed. This can be used to correlate pixel locations in two or more images—possibly recorded using different systems—for accurate compositing. This is necessary because different video digitizer devices can digitize different regions of the incoming video signal, causing pixel misalignment between images. In particular, a stored image may contain “edge” data outside the canonical display area for a given system.
The clean aperture is either coincident with the stored image or a subset of the stored image; if it is a subset, it may be centered on the stored image, or it may be offset positively or negatively from the stored image center.
The clean aperture extension contains a width in pixels, a height in picture lines, and a horizontal and vertical offset between the stored image center and a canonical image center for the given video system. The width is typically the width of the canonical clean aperture for a video system divided by the pixel aspect ratio of the stored data. The offsets also take into account any “overscan” in the stored image. The height and width must be positive values, but the offsets may be positive, negative, or zero.
These values are given as ratios of two 32-bit numbers, so that applications can calculate precise values with minimum roundoff error. For whole values, the value should be stored in the numerator field while the denominator field is set to 1.
A 32-bit unsigned integer containing the size of the 'clap' atom.
A 32-bit unsigned integer containing the four-character
code 'clap'.
A 32-bit signed integer containing either the width of the clean aperture in pixels or the numerator portion of a fractional width.
A 32-bit signed integer containing either the denominator portion of a fractional width or the number 1.
A 32-bit signed integer containing either the height of the clean aperture in picture lines or the numerator portion of a fractional height.
A 32-bit signed integer containing either the denominator portion of a fractional height or the number 1.
A 32-bit signed integer containing either the horizontal offset of the clean aperture center minus (width–1)/2 or the numerator portion of a fractional offset. This value is typically zero.
A 32-bit signed integer containing either the denominator portion of the horizontal offset or the number 1.
A 32-bit signed integer containing either the vertical offset of the clean aperture center minus (height–1)/2 or the numerator portion of a fractional offset. This value is typically zero.
A 32-bit signed integer containing either the denominator portion of the vertical offset or the number 1.
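Since every value is stored as a numerator and denominator pair, exact rational arithmetic is a natural fit. The sketch below reads a complete 'clap' atom into four fractions. It assumes big-endian fields in the order listed above; the function name is hypothetical.

```python
import struct
from fractions import Fraction


def parse_clap(atom: bytes):
    """Return (width, height, horiz_offset, vert_offset) as exact fractions."""
    size, kind = struct.unpack_from(">I4s", atom)
    if kind != b"clap" or size < 40:
        raise ValueError("not a clean aperture atom")
    n = struct.unpack_from(">8i", atom, 8)  # four numerator/denominator pairs
    return tuple(Fraction(n[i], n[i + 1]) for i in range(0, 8, 2))
```

A denominator of 0 raises `ZeroDivisionError`, which is a reasonable way for a sketch to reject a malformed atom.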
The format of the data stored in video samples is completely dependent on the type of the compression used, as indicated in the video sample description. The following sections discuss some of the video encoding schemes supported by QuickTime.
Uncompressed RGB data is stored in a variety of different formats. The format used depends on the depth field of the video sample description. For all depths, the image data is padded on each scan line to ensure that each scan line begins on an even byte boundary.
For depths of 1, 2, 4, and 8, the values stored are indexes into the color table specified in the color table ID field.
For a depth of 16, the pixels are stored as 5-5-5 RGB values with the high bit of each 16-bit integer set to 0.
For a depth of 24, the pixels are stored packed together in RGB order.
For a depth of 32, the pixels are stored with an 8-bit alpha channel, followed by 8-bit RGB components.
RGB data can be stored in composite or planar format. Composite format stores the RGB data for each pixel contiguously, while planar format stores the R, G, and B data separately, so the RGB information for a given pixel is found using the same offset into multiple tables. For example, the data for two pixels could be represented in composite format as RGB-RGB or in planar format as RR-GG-BB.
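Two details of the composite layouts are easy to get wrong: the per-line padding and the 16-bit packing. The sketch below computes the padded scan-line size and splits a 5-5-5 pixel. It assumes red occupies the most significant five bits below the unused high bit, which is the usual reading of "5-5-5 RGB"; both function names are hypothetical.

```python
def row_bytes(width: int, depth: int) -> int:
    """Bytes per scan line, padded so the next line starts on an even byte."""
    raw = (width * depth + 7) // 8
    return raw + (raw & 1)


def unpack_555(pixel: int):
    """Split a 16-bit 5-5-5 pixel into (r, g, b), each 0..31; the high bit is unused."""
    return (pixel >> 10) & 0x1F, (pixel >> 5) & 0x1F, pixel & 0x1F
```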
The Y′CbCr color space is widely used for digital video. In this data format, luminance is stored as a single value (Y), and chrominance information is stored as two color-difference components (Cb and Cr). Cb is the difference between the blue component and a reference value; Cr is the difference between the red component and a reference value.
This is commonly referred to as “YUV” format, with “U” standing in for Cb and “V” standing in for Cr. This usage is not strictly correct, as YUV, YIQ, and Y′CbCr are distinct color models for PAL, NTSC, and digital video, respectively, but most Y′CbCr data formats and codecs are described or even named as some variant of “YUV.”
The values of Y, Cb, and Cr can be represented using a variety of bit depths, trading off accuracy for file size. Similarly, the chrominance values can be sub-sampled, recording only one pixel’s color value out of two, for example, or averaging the color value of adjacent pixels. This sub-sampling is a form of compression, but if no additional lossy compression is performed on the sampled video, it is still referred to as “uncompressed” Y′CbCr video. In addition, a fourth component can be added to Y′CbCr video to record an alpha channel.
The number of components (Y′CbCr with or without alpha) and any sub-sampling are denoted using ratios of three or four numbers, such as 4:2:2 to indicate 4 samples of Y for every 2 samples each of Cb and Cr (chroma sub-sampling), or 4:4:4 for equal sampling of Y, Cb, and Cr (no sub-sampling), or 4:4:4:4 for Y′CbCr plus alpha with no sub-sampling. The ratios do not typically denote actual bit depths.
Uncompressed Y′CbCr video data is typically stored as follows:
Y′, Cb, and Cr components of each line are stored spatially left to right and temporally from earliest to latest.
The lines of a field or frame are stored spatially top to bottom and temporally earliest to latest.
Y′ is an unsigned integer. Cb and Cr are twos-complement signed integers.
The yuv2 stream, for example, is encoded in a series of 4-byte packets. Each packet represents two adjacent pixels on the same scan line. The bytes within each packet are ordered as follows:
y0 u y1 v
y0 is the luminance
value for the left pixel; y1 the
luminance for the right pixel. u and v are chromatic
values that are shared by both pixels.
Accurate conversion between RGB and Y′CbCr color spaces requires a computation for each component of each pixel. An example conversion from yuv2 into RGB is represented by the following equations:
r = 1.402 * v + y + .5
g = y - .7143 * v - .3437 * u + .5
b = 1.77 * u + y + .5
The r, g, and b values range from 0 to 255.
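Applied to one packet, the equations above look like this in code. The sketch assumes the byte order y0, u, y1, v, with unsigned luminance and twos-complement signed u and v, and it clamps results to 0..255 by truncation; the function names are hypothetical.

```python
import struct


def _clamp(x: float) -> int:
    # Truncate, then limit to the 0..255 range stated for r, g, and b.
    return max(0, min(255, int(x)))


def yuv2_packet_to_rgb(packet: bytes):
    """Decode one 4-byte yuv2 packet (y0 u y1 v) into two (r, g, b) pixels."""
    y0, u, y1, v = struct.unpack(">BbBb", packet)  # y unsigned, u and v signed
    pixels = []
    for y in (y0, y1):
        r = 1.402 * v + y + 0.5
        g = y - 0.7143 * v - 0.3437 * u + 0.5
        b = 1.77 * u + y + 0.5
        pixels.append((_clamp(r), _clamp(g), _clamp(b)))
    return pixels
```

Both pixels in a packet share the same u and v, so only the luminance differs between them.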
The coefficients in these equations are derived from matrix
operations and depend on the reference values used for the primary
colors and for white. QuickTime uses canonical values for these
reference coefficients based on published standards. The sample description
extension for Y′CbCr formats includes a 'colr' atom,
which contains indexes into a table of canonical references. This
provides support for multiple video standards without opening the
door to data entry errors for stored coefficient values. Refer to
the published standards for the formulas and methods used to derive
conversion coefficients from the table entries.
QuickTime stores JPEG images according to the rules described in the ISO JPEG specification, document number DIS 10918-1.
MPEG-4 video uses the 'mp4v' data
format. The sample description requires the elementary stream descriptor
('esds') extension to the standard
video sample description. If non-square pixels are used, the pixel
aspect ratio ('pasp') extension is
also required. For details on these extensions, see “Pixel Aspect Ratio ('pasp')” and “MPEG-4 Elementary Stream Descriptor Atom ('esds').”
MPEG-4 video conforms to ISO/IEC documents 14496-1/2000(E) and 14496-2:1999/Amd.1:2000(E).
Motion-JPEG (M-JPEG) is a variant of the ISO JPEG specification for use with digital video streams. Instead of compressing an entire image into a single bitstream, Motion-JPEG compresses each video field separately, returning the resulting JPEG bitstreams consecutively in a single frame.
There are two flavors of Motion-JPEG currently in use. These two formats differ based on their use of markers. Motion-JPEG format A supports markers; Motion-JPEG format B does not. The following paragraphs describe how QuickTime stores Motion-JPEG sample data. Figure 3-11 shows an example of Motion-JPEG A dual-field sample data. Figure 3-12 shows an example of Motion-JPEG B dual-field sample data.
Each field of Motion-JPEG format A fully complies with the ISO JPEG specification, and therefore supports application markers. QuickTime uses the APP1 marker to store control information, as follows (all of the fields are 32-bit integers):
Unpredictable; should be set to 0.
Identifies the data type; this field must be set to 'mjpg'.
The actual size of the image data for this field, in bytes.
Contains the size of the image data, including pad bytes. Some video hardware may append pad bytes to the image data; this field, along with the field size field, allows you to compute how many pad bytes were added.
The offset, in bytes, from the start of the field data to the start of the next field in the bitstream. This field should be set to 0 in the last field’s marker data.
The offset, in bytes, from the start of the field data to the quantization table marker. If this field is set to 0, check the image description for a default quantization table.
The offset, in bytes, from the start of the field data to the Huffman table marker. If this field is set to 0, check the image description for a default Huffman table.
The offset from the start of the field data to the start of image marker. This field should never be set to 0.
The offset, in bytes, from the start of the field data to the start of the scan marker. This field should never be set to 0.
The offset, in bytes, from the start of the field data to the start of the data stream. Typically, this immediately follows the start of scan data.
Note: The last two fields have been added since the original Motion-JPEG specification, and so they may be missing from some Motion-JPEG A files. You should check the length of the APP1 marker before using the start of scan offset and start of data offset fields.
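The length check recommended in the note can be folded into the parser. The sketch below assumes `payload` begins at the first 32-bit control field (that is, the caller has already skipped the APP1 marker and its length) and that the fields are big-endian. The field names are hypothetical labels for the ten entries listed above.

```python
import struct

# Illustrative names for the ten 32-bit fields, in the order listed above.
MJPEG_A_FIELDS = ("reserved", "tag", "field_size", "padded_field_size",
                  "next_field_offset", "quant_table_offset", "huffman_table_offset",
                  "start_of_image_offset", "start_of_scan_offset",
                  "start_of_data_offset")


def parse_mjpeg_a_info(payload: bytes) -> dict:
    """Parse the APP1 control fields; `payload` starts at the first 32-bit field."""
    count = min(len(payload) // 4, len(MJPEG_A_FIELDS))
    if count < 8:
        raise ValueError("APP1 payload too short")
    values = struct.unpack_from(f">{count}I", payload)
    info = dict(zip(MJPEG_A_FIELDS, values))  # older files omit the last two fields
    if info["tag"] != int.from_bytes(b"mjpg", "big"):
        raise ValueError("missing 'mjpg' tag")
    info["pad_bytes"] = info["padded_field_size"] - info["field_size"]
    return info
```

Callers can test for `"start_of_scan_offset" in info` before relying on the two newer fields.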
Motion-JPEG format B does not support markers. In place of the marker, therefore, QuickTime inserts a header at the beginning of the bitstream. Again, all of the fields are 32-bit integers.
Unpredictable; should be set to 0.
The data type; this field must be set to 'mjpg'.
The actual size of the image data for this field, in bytes.
The size of the image data, including pad bytes. Some video hardware may append pad bytes to the image data; this field, along with the field size field, allows you to compute how many pad bytes were added.
The offset, in bytes, from the start of the field data to the start of the next field in the bitstream. This field should be set to 0 in the second field’s header data.
The offset, in bytes, from the start of the field data to the quantization table. If this field is set to 0, check the image description for a default quantization table.
The offset, in bytes, from the start of the field data to the Huffman table. If this field is set to 0, check the image description for a default Huffman table.
The offset from the start of the field data to the field’s image data. This field should never be set to 0.
The offset, in bytes, from the start of the field data to the start of scan data.
The offset, in bytes, from the start of the field data to the start of the data stream. Typically, this immediately follows the start of scan data.
Note: The last two fields were “reserved, must be set to zero” in the original Motion-JPEG specification.
The Motion-JPEG format B header must be a multiple of 16 in size. When you add pad bytes to the header, set them to 0.
Because Motion-JPEG format B does not support markers, the JPEG bitstream does not have null bytes (0x00) inserted after data bytes that are set to 0xFF.
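Two practical consequences follow from these rules: a format B header may need zero padding to reach a multiple of 16 bytes, and scan data moved between format B and a marker-based JPEG stream needs the 0x00 stuffing bytes added or removed. The helpers below are a minimal sketch of both, with hypothetical names; they operate on raw entropy-coded scan data only, not on whole fields.

```python
def header_padding(header_size: int) -> int:
    """Number of zero bytes needed to round the header up to a multiple of 16."""
    return -header_size % 16


def stuff_scan_data(scan: bytes) -> bytes:
    """Insert 0x00 after every 0xFF data byte, as marker-based JPEG streams require."""
    return scan.replace(b"\xff", b"\xff\x00")


def unstuff_scan_data(scan: bytes) -> bytes:
    """Reverse of stuff_scan_data: drop the 0x00 that follows each 0xFF data byte."""
    return scan.replace(b"\xff\x00", b"\xff")
```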
Last updated: 2007-09-04