This section shows you how to work with SMIL to create a basic layout, define display regions, create a timeline with sequential and parallel media elements, specify media elements and set their durations, and make an element into a clickable link. This section also illustrates a technique to show different elements to different viewers using a switch. It illustrates use of the SMIL elements currently supported by QuickTime.
Note: QuickTime does not support all the tags and attributes defined for SMIL version 1.0 or 2.0. QuickTime does support all the tags and attributes used in this section.
SMIL Structure
SMIL Media Elements
SMIL HREF Links
Dynamic SMIL Elements
SMIL is based on XML, which is more rigidly structured than HTML but uses the same familiar <tag> and </tag> syntax.
Because it is XML-based, SMIL tags are case-sensitive (always lowercase) and every tag must be explicitly ended: either a tag is self-contained and ends with /> (<tag parameters="values"/>), or there is a pair of open and close tags that may enclose other elements (<tag parameters="values">elements</tag>).
Unlike HTML, SMIL does not routinely mix structure and content together in the same document; a SMIL file contains only structure and the URLs of content. Where an HTML document typically contains body text, for example, a SMIL document would contain the URL of a text file instead.
Like HTML, a SMIL document has a head and a body. The structure of a SMIL file is shown in Listing 2-1.
Listing 2-1 SMIL structure
<smil>
  <head>
    <layout>
      <!-- layout tags -->
    </layout>
  </head>
  <body>
    <!-- body tags -->
  </body>
</smil>
All the layout information is specified in the head. This determines where things can be displayed on the screen. Some or all of the screen is divided into any number of rectangular regions, which may partly or completely overlap. The media elements are listed in the body, which is also where the temporal sequencing is specified. Visual elements in the body are assigned to a region (defined in the head) where they are to be displayed.
The layout specifies the whole display area for the presentation, then defines regions where individual media elements can be displayed.
A SMIL layout always starts with a <root-layout/> tag that gives the dimensions of the display area in pixels and assigns a background color:
<layout>
  <root-layout id="main" width="320" height="240"
      background-color="red" />
</layout>
The id parameter gives the presentation a name; it can be anything you like. The height and width parameters define the display area for the presentation in pixels. You can specify the background color using hexadecimal values (#FF0000) or names (red). Listing 2-2 is a very simple SMIL presentation. It is just a red rectangle, but you can play it using QuickTime Player:
Listing 2-2 Simple SMIL presentation
<smil>
  <head>
    <layout>
      <root-layout id="main" width="320" height="240"
          background-color="red" />
    </layout>
  </head>
  <body> </body>
</smil>
The layout also defines regions within the display area. Regions themselves are invisible, but they define areas where visual media elements can be displayed. Regions can be positioned anywhere in the display area and can overlap, as shown in Listing 2-3.
Listing 2-3 Layout with root layout and two regions
<head>
  <layout>
    <root-layout id="main" width="320" height="240"
        background-color="red"/>
    <region id="r1" width="160" height="120" />
    <region id="r2" width="50%" height="100%"
        left="100" top="0" />
  </layout>
</head>
The first region is named r1, and is 160 x 120 pixels, extending from the top-left corner of the display area (the default position for a region).
The second region, r2, is half as wide as the display area (width="50%") and fills it from top to bottom (height="100%"). Region r2 is offset 100 pixels from the left edge of the display area (left="100"). Since the first region is 160 pixels wide, the two regions overlap by 60 pixels.
The top-left corner of a media element is always aligned with the top-left corner of the region it is displayed in. If you need to position an image somewhere else, just create another region at a different position—you can have as many regions as you like, and each one uses only a few bytes.
The <region/> tag accepts the following parameters:
id—gives each region a name, much like an HTML frame name.
height and width—define the size of the region, either in pixels or as a percentage of the display area.
top and left (optional)—specify the position of the region within the display area, either in pixels or as a percentage of the display area.
By default, a region extends from the top-left corner of the display area. You can change this by specifying a top and left offset. For example, top="50%" left="100" creates a region whose top-left corner is halfway down and 100 pixels from the left edge of the display area.
If you set the top or left parameter, you must specify both top and left as a pair, even if one of them is zero.
z-index (optional)—specifies the layering order when regions overlap.
When regions overlap, one lies on top of the other. By default, a region defined later in the layout is on top of any regions defined earlier. You can set the layering explicitly using the z-index parameter. The region with the highest z-index value is on top. The following example defines three regions with explicit z-index values.
<region id="r1" width="160" height="120" z-index="3" />
<region id="r2" width="160" height="120" z-index="2" />
<region id="r3" width="160" height="120" z-index="1" />
The three regions overlap completely, with r1 on top, r2 in the middle, and r3 at the bottom of the pile. If no z-index values had been specified, the layering would be reversed, with the last-defined region on top.
fit (optional)—defines how media elements are cropped or scaled if they have pixel dimensions different from the region they’re displayed in. There are four possible values for this parameter:
fit="hidden" (default)—images are not scaled. If an image is larger than the region, it is cropped. If an image is smaller than a region, part of the region is left empty.
fit="fill"—images are scaled to match the height and width of the region, so an image always fills the region completely. The image’s aspect ratio may be distorted to make it fit.
fit="meet"—images are scaled to meet the region’s boundaries while preserving each image’s aspect ratio, without cropping. An image may not fill the region completely, but always fills either the whole width or the whole height. The image is not cropped or distorted.
fit="slice"—images are scaled to fill the region completely while preserving each image’s aspect ratio, cropping if necessary. If the aspect ratio of an image differs from the region, the image is cropped by taking a slice from the edge or bottom where it would extend beyond the region.
Unlike HTML, scaling and cropping are applied to regions, not images. If you want to display different images on the same part of the screen, using different fit settings, create more than one region with the same area and x,y coordinates, but different fit values, and assign each image to the appropriate region. Create as many regions as you need.
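As a minimal sketch of this technique, the following presentation defines two regions that cover the same area but use different fit values, then assigns each image to the region whose fit behavior it needs. The region names (photo and banner) and the image file names are hypothetical; the tags and parameters are the ones described in this section.

```xml
<smil>
  <head>
    <layout>
      <root-layout id="main" width="320" height="240"
          background-color="black" />
      <!-- Same size and default top-left position, different fit values -->
      <region id="photo" width="320" height="240" fit="meet" />
      <region id="banner" width="320" height="240" fit="fill" />
    </layout>
  </head>
  <body>
    <seq>
      <!-- Scaled to fit inside the region; aspect ratio is preserved -->
      <img src="portrait.jpg" region="photo" dur="5 sec" />
      <!-- Stretched to fill the region exactly; may be distorted -->
      <img src="banner.jpg" region="banner" dur="5 sec" />
    </seq>
  </body>
</smil>
```

Because the two images play in sequence, only one of the overlapping regions shows content at any moment, so the layering order of the regions does not matter here.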
Listing 2-4 is a SMIL document with two overlapping regions. It looks like a red rectangle when you play it using QuickTime Player, because regions are invisible; they just define areas where media elements can be displayed.
Listing 2-4 SMIL with overlapping regions
<smil>
  <head>
    <layout>
      <root-layout id="main" width="320" height="240"
          background-color="red"/>
      <region id="r1" width="160" height="120" />
      <region id="r2" width="50%" height="100%" left="100"
          top="0" fit="fill" />
    </layout>
  </head>
  <body> </body>
</smil>
The body of a SMIL document specifies what media elements to present, which regions to display the visual elements in, and a timeline for the presentation.
The timeline groups media elements in two ways: things that happen in sequence and things that happen in parallel. If you don’t specify whether elements should be played sequentially or in parallel, QuickTime plays them in sequence. Sequences are surrounded by the <seq> and </seq> tags. Media elements in a sequence are presented one after the other—each element is presented after the previous element ends. There are different ways to determine when an element should end.
Media elements such as audio and video have an inherent duration, so they end when you would expect them to. For example:
<seq>
  <audio src="audio1.mp3" />
  <audio src="audio2.aiff" />
  <audio src="audio3.wav" />
</seq>
This sequence plays three audio files in a row. Each element ends when the audio has played all the way through. As soon as one element ends, the next begins.
Note that audio components have no visual part, so they are not assigned to a region.
Media elements such as still images and text have no inherent duration, so they’re usually assigned explicit durations:
<seq>
  <img src="image1.jpg" region="r1" dur="5 sec" />
  <img src="image2.gif" region="r1" dur="7 sec" />
</seq>
In this example, the first image ends after being displayed for 5 seconds, then the second image appears and is displayed for 7 seconds. If you specify an explicit duration for an element that has its own inherent duration, it either ends when it normally would or after the duration you specify, whichever comes first.
Media elements that are displayed at the same time are surrounded by the <par> and </par> tags. Parallel elements are presented starting at the same time, but they don’t necessarily end at the same time. For example:
<par>
  <audio src="themesong.mp3" />
  <img src="poster.jpg" region="r1" dur="30 sec" />
  <text src="lyrics.txt" region="r2" dur="30 sec" />
</par>
This example plays an MP3 audio file while simultaneously displaying a JPEG image in one region and some text in another. The image and the text are displayed for 30 seconds; the audio element ends whenever the MP3 finishes playing.
You can put a group of parallel elements into a sequence. The parallel group is treated as a single element in the sequence. All the elements in the parallel group start together at the appropriate point in the sequence. When the last element in the parallel group ends, the sequence continues, as shown in the following example.
<seq>
  <video src="Intro.mov" region="r1" />
  <par>
    <audio src="narration.aiff" />
    <video src="slides.mov" region="r1" />
  </par>
  <text src="credits.txt" dur="20 sec" region="r1" />
</seq>
In this example, Intro.mov plays first. The narration and the slides start together as soon as Intro.mov ends. When both the narration and the slides have ended, the credits are displayed.
You can combine parallel and sequential elements in different ways to achieve the same effect. When you have a choice, always choose the structure that creates the smallest number of elements. Each element in the SMIL structure takes time to import, slightly delaying the start of the presentation. As structures nest ever deeper, these slight delays can add up.
For example, suppose you want to display two sequences of images side by side. There are two obvious approaches.
You could create a sequence of parallel elements, with the left and right image pairs specified as parallel elements. If you have twenty image pairs, this results in a single sequence element containing twenty parallel elements, each with two media elements. This creates a file with a total of 21 structural elements, as shown in Listing 2-5.
Listing 2-5 Unnecessarily complex structure
<seq>
  <par>
    <img src="slide1A.jpeg" region="left" dur="30 sec" />
    <img src="slide1B.jpeg" region="right" dur="30 sec" />
  </par>
  <par>
    <img src="slide2A.jpeg" region="left" dur="30 sec" />
    <img src="slide2B.jpeg" region="right" dur="30 sec" />
  </par>
  .
  .
  .
</seq>
Alternatively, you could create a single parallel element with two sequences, a left sequence and a right sequence, each with twenty media elements, as shown in Listing 2-6. This specifies an identical presentation of media, but creates a file with only 3 structural elements.
Listing 2-6 Simpler alternative structure
<par>
  <seq>
    <img src="slide1A.jpeg" region="left" dur="30 sec" />
    <img src="slide2A.jpeg" region="left" dur="30 sec" />
    .
    .
    .
  </seq>
  <seq>
    <img src="slide1B.jpeg" region="right" dur="30 sec" />
    <img src="slide2B.jpeg" region="right" dur="30 sec" />
    .
    .
    .
  </seq>
</par>
This version of the presentation will open much more quickly than the version with 21 structural elements.
SMIL media elements are classified by type and specified by URL. Each visual media element is assigned to a region defined in the layout. The media type, the URL, and the region for visual media must be specified. All other parameters are optional.
There are currently six defined media types:
<audio/> (nonvisual)
<video/>
<img/>
<text/>
<textstream/>
<animation/>
Use the media type that most closely describes a given media element. For a sound-only QuickTime movie, for example, use the <audio/> media type. SMIL isn’t terribly strict about this, so you can specify a FLIC animation file, for example, using either <animation/> or <video/>. Each media element is specified by a src parameter whose value is a URL. The URL can be absolute or relative and can use any protocol that QuickTime understands, including HTTP and RTSP.
Some example media types and URLs:
<audio src="http://www.myserver.com/path/myaudio.mp3" />
<video src="rtsp://streamserver.com/VideoOnDemand.mov"/>
<img src="slides/slide01.jpg" />
<text src="subtitles.txt" />
<textstream src="rtsp://streamserver.com/streamtext.mov" />
<animation src="http://www.myserver.com/myanim.flc" />
A URL that refers to a local file begins with file:///.
Important: The QuickTime plug-in can resolve absolute or relative URLs, and QuickTime Player can resolve absolute URLs, but as of this writing, QuickTime Player cannot resolve relative URLs unless they refer to documents in the same folder as the SMIL document itself. In other words, if you’re targeting QuickTime Player, you can specify a relative URL such as src="movie.mov", but not src="../movie.mov" or src="subfolder/movie.mov".
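The following sketch restates that limitation as a set of media elements. The file names, server name, and region name are hypothetical, and the comments simply summarize the behavior described above rather than adding anything new.

```xml
<seq>
  <!-- Same folder as the SMIL document: resolves in the plug-in
       and in QuickTime Player -->
  <video src="movie.mov" region="r1" />
  <!-- Absolute URL: resolves in the plug-in and in QuickTime Player -->
  <video src="http://www.myserver.com/path/movie.mov" region="r1" />
  <!-- Relative URL into a subfolder: resolves in the plug-in,
       but not in QuickTime Player as of this writing -->
  <video src="subfolder/movie.mov" region="r1" />
</seq>
```

If a presentation has to play in both environments, keeping local media in the same folder as the SMIL document, or using absolute URLs throughout, avoids the problem.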
One URL protocol you may not be familiar with is data:, which lets you embed a media element inside your SMIL document. It’s normally used to embed small amounts of text that would otherwise require a separate file. Here’s an example of a data: URL:
<text region="aregion" dur="1:30" src="data:text/plain,Copyright Apple Computer, 2000" />
Note: The data: protocol identifier is followed immediately by the data format descriptor and a comma, then the actual data, with no blank spaces (except those that are part of the data). Because the SMIL file is a plain text file, binary data such as images must be encoded in a 7-bit ASCII format, such as Base64.
Every visual media element needs to be assigned to a display region defined in the layout. Only one element can be displayed in a region at any time (but you can have multiple regions covering the same screen area).
If the media element contains an image that is larger or smaller than its assigned display region, the image can be scaled, clipped, or both scaled and clipped, depending on the value of the fit parameter for that region.
Note: Clipping and scaling are attributes of a region, not a media element. To use different scaling or cropping guidelines for different images displayed at the same location, create multiple regions covering the same area but with different fit values.
Listing 2-7 illustrates the use of regions to display a sequence of images.
Listing 2-7 SMIL document that displays a series of JPEG images
<smil>
  <head>
    <layout>
      <root-layout id="slideshow" width="320" height="240"
          background-color="black"/>
      <region id="r1" width="100%" height="100%" fit="meet" />
    </layout>
  </head>
  <body>
    <seq>
      <img src="http://www.myserver.com/ourlogo.jpg"
          region="r1" dur="5sec" />
      <img src="slide1.jpg" region="r1" dur="5sec" />
      <img src="slide2.jpg" region="r1" dur="5sec" />
    </seq>
  </body>
</smil>
This example displays a sequence of three JPEG images. All the images are displayed in the same region and are automatically scaled to fill the region as completely as possible without clipping or changing their aspect ratios. Each image has a duration of 5 seconds.
Some media elements, such as audio and video, have inherent duration. Text and still images, however, have no inherent duration. The easiest way to assign a duration is with the dur parameter. For example:
<img src="slide1.jpg" region="r1" dur="30sec" />
You can assign an explicit duration to override an element’s inherent duration. For example, if you specify
<audio src="sound1.wav" dur="1:05" />
the audio file sound1.wav ends after 1 minute 5 seconds, or when the audio finishes naturally, whichever comes first.
Duration is specified in Hours:Minutes:Seconds.DecimalFractions. You can leave off the hours, or the hours and minutes, or the fractions. You can add the “sec” identifier to make things more readable. The following five expressions are all equivalent:
dur="00:00:05.000"
dur="00:05.000"
dur="05.000"
dur="05"
dur="5 Sec"
Another way to explicitly set an element’s duration is to specify an end time or an end event. An element ends when its duration is exceeded, its end time or end event occurs, or it reaches its inherent end, whichever comes first. Setting the begin and end parameters is discussed next.
You can specify an explicit start time and end time, or an event that triggers an element’s start or end, using the begin and end parameters. The time value that you specify is relative to when the element would normally begin.
For example, when you specify the following media element
<img src="slide1.jpg" region="r1" begin="5sec"/>
you get this timing:
If the element is part of a <seq></seq> sequence, it begins 5 seconds after the preceding element ends.
If the element is part of a <par></par> group, it begins 5 seconds after the parallel group as a whole begins.
If you specify an end time, the element ends that amount of time after it would naturally begin. For example:
<img src="slide1.jpg" region="r1" begin="5sec" end="35sec" />
In this example, the image begins 5 seconds after its natural start time, and it ends 35 seconds after its natural start time, giving it a duration of 30 seconds. The element’s duration is equal to its end time minus its start time. If no begin value is specified, an end value is the equivalent of a dur value.
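As a worked illustration of these rules inside a parallel group, consider the sketch below. The file names and region names are hypothetical. In a <par> group the natural start time of every element is the start of the group, so all the offsets are measured from that moment.

```xml
<par>
  <!-- Starts with the group and plays to its inherent end -->
  <audio src="narration.aiff" />
  <!-- Appears 5 seconds after the group starts and ends at the
       35-second mark: 35 - 5 = 30 seconds on screen -->
  <img src="slide1.jpg" region="r1" begin="5sec" end="35sec" />
  <!-- No begin value, so end="20sec" is equivalent to dur="20sec" -->
  <img src="logo.jpg" region="r2" end="20sec" />
</par>
```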
Alternatively, you can specify that an element should begin or end when another element begins, ends, or reaches a specified duration. Instead of using a time as the value of the begin or end parameter, use the string:
"id(idname)(event)"
where idname is the id value of another element, and event is either begin, end, or a time value. For example:
<par>
  <audio src="themesong.mp3" id="x" />
  <img src="poster.jpg" region="r1" end="id(x)(end)" />
  <text src="lyrics.txt" region="r2" end="id(x)(end)" />
</par>
This example assigns an id value of x to the audio and sets the end of the image and text elements to synchronize with the end of element x.
Another example:
<par>
  <audio src="Sound1.aif" id="master" />
  <audio src="Sound2.aif" begin="id(master)(5sec)" />
  <audio src="Sound3.aif" end="id(master)(end)" />
</par>
In this example, the element Sound1.aif begins normally and has the id of master. Sound2.aif begins 5 seconds after Sound1.aif begins. Sound3.aif begins normally (at the same time as Sound1.aif), but ends when Sound1.aif ends.
You can specify a clip from a media element using the clip-begin and clip-end attributes. This lets you play a selection from a longer file or stream. This works most efficiently with local content or streams. It does not work as well with files delivered by HTTP, because HTTP files always download in their entirety, even if only a small clip is actually played, and the file must download to the clip-begin point before it can begin playing.
The following example plays a clip from the movie some.mov, starting 4 seconds into the movie and ending 1 minute and 1 second into the movie:
<video src="some.mov" region="movieregion"
    clip-begin="npt=0:04"
    clip-end="npt=1:01"
/>
Note that the time stamp has a different format than other SMIL attributes or QuickTime parameters; it is preceded by the label npt= and the label and parameter value are jointly surrounded by quotes. The format is limited to MM:SS. Hours and fractional seconds are not supported. For values less than 60 seconds, minutes must be specified as zero and cannot be simply omitted (for example, clip-begin="npt=0:59").
You can specify clip-begin without specifying clip-end.
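A minimal sketch of a clip with only a starting point follows. It reuses the file and region names from the example above and assumes, since no clip-end is given, that playback simply continues to the movie's inherent end; that assumption is not confirmed by this section.

```xml
<!-- Skip the first 30 seconds of the movie; no clip-end is specified.
     Minutes are written explicitly as zero, as the format requires. -->
<video src="some.mov" region="movieregion"
    clip-begin="npt=0:30" />
```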
You can make any visual media element in a SMIL document into a clickable link by using the <a></a> tags. You can direct the URL to load in a browser window or to replace the current SMIL presentation.
To make a visual element into a link, take the following steps:
Precede the element with the <a> tag.
Put the URL of the link in the href parameter of the <a> tag.
Set the show parameter of the <a> tag to new or replace.
Follow the element with the </a> tag.
In the following example, the Apple website loads in the default browser window if the user clicks in region r1 while poster.jpg is being displayed.
<a href="http://www.apple.com/" show="new" >
  <img src="poster.jpg" region="r1" dur="00:05" />
</a>
The show parameter can have two possible values:
show="replace"—replaces the current SMIL presentation in the plug-in or QuickTime Player (whichever is active). In this case, the URL must specify something that QuickTime can play.
show="new"—opens the URL in the default browser window. In this case, the URL can specify a web page or anything the browser or one of its plug-ins can display.
You can use show="new" to target a specific browser frame, a specific browser window, or QuickTime Player, using the target SMIL extension. Refer to the section “QuickTime SMIL Extensions” for more information.
QuickTime doesn’t currently allow you to jump to a named point in SMIL presentations—you can’t use URLs of the form href=name.smi#name or href=#name.
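For comparison with the show="new" example earlier, here is a sketch of a link that uses show="replace". The server name and movie file name are hypothetical; the important point is that the href value must be something QuickTime itself can play, because it replaces the current presentation instead of opening in a browser.

```xml
<!-- Clicking the poster replaces the current SMIL presentation
     with nextmovie.mov in the plug-in or QuickTime Player -->
<a href="http://www.myserver.com/nextmovie.mov" show="replace">
  <img src="poster.jpg" region="r1" dur="00:30" />
</a>
```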
You can automatically present different elements to different viewers using the <switch></switch> tags.
SMIL supports a set of user attributes, such as screen resolution, color depth, maximum data rate, and language. Groups of elements can be listed between <switch> and </switch> tags. QuickTime selects one element from the list based on user attributes, much like QuickTime’s alternate track and alternate movie mechanism.
This can be used to select an audio track based on language, as shown in the following example:
<switch>
  <audio src="french.aif" system-language="fr"/>
  <audio src="german.aif" system-language="de"/>
  <audio src="english.aif" system-language="en"/>
</switch>
This example selects french.aif for French speakers, german.aif for German speakers, and english.aif for English speakers.
The <switch> element selects the first item in the list that matches the user’s system attributes. If you select an item based on connection speed, order the elements from highest speed to lowest speed—QuickTime loads the first element whose requirement is less than or equal to the viewer’s connection speed, as illustrated in the following example:
<switch>
  <audio src="192k.mp3" system-bitrate="192000"/>
  <audio src="128k.mp3" system-bitrate="128000"/>
  <audio src="qdesign.mov" system-bitrate="28800"/>
</switch>
To provide a default, make the default the last item in the list and don’t specify a required attribute. It’s usually a good idea to include a default, as there may be cases you haven’t allowed for explicitly. The following example selects french.aif for French speakers, german.aif for German speakers, and english.aif for all others.
<switch>
  <audio src="french.aif" system-language="fr"/>
  <audio src="german.aif" system-language="de"/>
  <audio src="english.aif"/>
</switch>
QuickTime supports the following user attributes:
system-bitrate—corresponds to the user’s connection speed in the QuickTime Settings control panel. The QuickTime settings are specified in kilobits per second, but system-bitrate is set in bits per second, so multiply the QuickTime setting by 1000 to get the correct system-bitrate. For example, the correct system-bitrate for QuickTime’s “56 Kbps Modem/ISDN” is 56000 and the system-bitrate corresponding to “256 Kbps DSL/Cable” is 256000.
system-screen-size—the minimum required screen resolution in pixels. The resolution is specified by HEIGHTxWIDTH. Note that this is contrary to common usage—a 640 x 480 minimum screen resolution is specified by setting system-screen-size="480x640".
system-screen-depth—the minimum required color depth, in bits. Common values are 8 (256 colors), 16 (thousands of colors), and 24 (millions of colors).
system-language—corresponds to the user’s system language setting. The language is specified by a two-character code matching the ISO 639 language code specification, such as these:
Arabic—AR
Chinese—ZH
Danish—DA
Dutch—NL
English—EN
French—FR
German—DE
Greek—EL
Italian—IT
Japanese—JA
Korean—KO
Persian (Farsi)—FA
Polish—PL
Portuguese—PT
Russian—RU
Spanish—ES
Swahili—SW
Swedish—SV
For additional language codes, refer to http://www.oasis-open.org/cover/iso639a.html
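The system-screen-size attribute is easy to get backwards, so a short sketch may help. The movie file names and the region name below are hypothetical. The first element requires a screen of at least 1024 x 768 pixels, written height first as described above, and the last element has no required attribute, so it acts as the default.

```xml
<switch>
  <!-- Requires at least a 1024 x 768 screen: HEIGHTxWIDTH = 768x1024 -->
  <video src="large.mov" region="r1" system-screen-size="768x1024" />
  <!-- Requires at least a 640 x 480 screen -->
  <video src="medium.mov" region="r1" system-screen-size="480x640" />
  <!-- Default for everything else -->
  <video src="small.mov" region="r1" />
</switch>
```

As with connection speed, the elements are ordered from the most demanding requirement to the least, because the switch takes the first item that matches.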
Last updated: 2005-06-04