
Assembly Language Statements

This chapter describes the assembly language statements that make up an assembly language program.

This is the general format of an assembly language statement:

[ label_field ] [ opcode_field [ operand_field ] ] [ comment_field ]

Each of the depicted fields is described in detail in one of the following sections.

A line may contain multiple statements, separated by the @ character for the PowerPC assembler or by a semicolon for the i386 assembler. The statements may be followed by a single comment, introduced by a semicolon for the PowerPC assembler or by a # character for the i386 assembler. The PowerPC form is:

[ statement [ @ statement ...] ] [ ; comment_field ]
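
The separator and comment characters described above can be modeled with a short Python sketch. It is a toy model of the line format, not the assembler's real parser; the function name split_line and the SYNTAX table are hypothetical, and the sketch assumes that no separator or comment character appears inside a string constant.

```python
# Toy model of the line format; not the assembler's actual parser.
# Assumption: no separator or comment character occurs inside a string constant.
SYNTAX = {
    "ppc": {"separator": "@", "comment": ";"},
    "i386": {"separator": ";", "comment": "#"},
}

def split_line(line, arch):
    """Return (statements, comment) for one source line."""
    rules = SYNTAX[arch]
    # Everything after the first comment character is the comment field.
    code, found, comment = line.partition(rules["comment"])
    # The remaining text holds zero or more statements.
    statements = [s.strip() for s in code.split(rules["separator"]) if s.strip()]
    return statements, (comment.strip() if found else None)

ppc = split_line("li r3,0 @ blr ; return zero", "ppc")
x86 = split_line("movl $0,%eax ; ret # return zero", "i386")
```

Both calls yield two statements and the comment text "return zero"; only the separator and comment characters differ between the two architectures.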

The following rules apply to the use of whitespace within a statement:

Spaces or tabs are used to separate fields.
At least one space or tab must occur between the opcode field and the operand field.
Spaces may appear within the operand field.
Spaces and tabs are significant when they appear in a character string.

Label Field

Labels are identifiers that you use to tag the locations of program and data objects. Each label is composed of an identifier and a terminating colon. The format of the label field is:

identifier: [ identifier: ] ...

The optional label field may occur only at the beginning of a statement. The following example shows a label field containing two labels, followed by a (PowerPC-style) comment:

var: VAR:  ; two labels defined here

As shown here, letters in identifiers are case sensitive, and both uppercase and lowercase letters may be used.

Operation Code Field

The operation code field of an assembly language statement identifies the statement as a machine instruction, an assembler directive, or a macro defined by the programmer:

A machine instruction is indicated by an instruction mnemonic. An assembly language statement that contains an instruction mnemonic is intended to produce a single executable machine instruction. The operation and use of each instruction is described in the manufacturer’s user manual.
An assembler directive (or pseudo-op) performs some function during the assembly process. It doesn’t produce any executable code, but it may assign space for data in the program.
Macros are defined with the .macro directive (see .macro, .endmacro, .macros_on, and .macros_off for more information).

One or more spaces or tabs must separate the operation code field from the following operand field in a statement. Spaces or tabs are optional between the label and operation code fields, but they help to improve the readability of the program.
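
The field rules above (labels end in a colon, whitespace after a label is optional, whitespace between the operation code and operands is required) can be sketched in Python as follows. The identifier pattern and the name split_fields are assumptions made for illustration; the sketch expects a single statement with any comment already removed.

```python
import re

# Hypothetical identifier pattern, for illustration only.
LABEL = re.compile(r"\s*([A-Za-z_.][\w.$]*):")

def split_fields(statement):
    """Split one statement into (labels, opcode, operands)."""
    labels = []
    pos = 0
    while True:
        m = LABEL.match(statement, pos)
        if not m:
            break
        labels.append(m.group(1))   # labels may only lead the statement
        pos = m.end()
    rest = statement[pos:].strip()
    if not rest:
        return labels, None, None   # label-only statement
    # The first run of spaces or tabs separates opcode from operands.
    parts = rest.split(None, 1)
    opcode = parts[0]
    operands = parts[1] if len(parts) > 1 else None
    return labels, opcode, operands

two_labels = split_fields("var: VAR:")
no_space = split_fields("loop:movl %eax, %ebx")
```

Here two_labels keeps var and VAR as distinct labels, and no_space shows that the label may be followed directly by the operation code.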

Intel i386 Architecture–Specific Caveats

i386 instructions can operate on byte, word, or long word data (the last is called “double word” by Intel). The desired size is indicated as part of the instruction mnemonic by adding a trailing b, w, or l:
Suffix	Description
`b`	Byte (8-bit) data.
`w`	Word (16-bit) data.
`l`	Long word (32-bit) data.
For instance, a movb instruction moves a byte of data, but a movw instruction moves a 16-bit word of data.
If no size is specified, the assembler attempts to determine the size from the operands. For example, if the 16-bit names for registers are used as operands, a 16-bit operation is performed. When both a size specifier and a size-specific register name are given, the size specifier is used. Thus, the following are all correct and result in the same operation:
movw %bx,%cx
mov %bx,%cx
movw %ebx,%ecx
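
The size-selection rule can be modeled with a small Python sketch. The function operand_bits and the register table are hypothetical and cover only a subset of registers; the caller is assumed to have already separated the size suffix from the base mnemonic.

```python
SUFFIX_BITS = {"b": 8, "w": 16, "l": 32}

# A small subset of i386 register names, grouped by width.
REGISTER_BITS = {}
for name in ("al", "bl", "cl", "dl", "ah", "bh", "ch", "dh"):
    REGISTER_BITS["%" + name] = 8
for name in ("ax", "bx", "cx", "dx", "si", "di", "bp", "sp"):
    REGISTER_BITS["%" + name] = 16
    REGISTER_BITS["%e" + name] = 32

def operand_bits(suffix, operands):
    """suffix is 'b', 'w', 'l', or '' when the mnemonic has none."""
    if suffix:
        return SUFFIX_BITS[suffix]      # an explicit suffix takes precedence
    for op in operands:
        if op in REGISTER_BITS:
            return REGISTER_BITS[op]    # otherwise infer from a register name
    return None                         # this toy model cannot decide

sizes = [
    operand_bits("w", ["%bx", "%cx"]),    # movw %bx,%cx
    operand_bits("", ["%bx", "%cx"]),     # mov %bx,%cx
    operand_bits("w", ["%ebx", "%ecx"]),  # movw %ebx,%ecx
]
```

All three entries in sizes are 16, matching the statement that the three forms result in the same operation.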

An i386 operation code can also contain optional prefixes, which are separated from the operation code by a slash (/) character. The prefix mnemonics are:

Prefix	Description
`data16`	Operation uses 16-bit data.
`addr16`	Operation uses 16-bit addresses.
`lock`	Exclusive memory lock.
`wait`	Wait for pending numeric exceptions.
`cs`, `ds`, `es`, `fs`, `gs`, `ss`	Segment register override.
`rep`, `repe`, `repne`	Repeat prefixes for string instructions.

More than one prefix may be specified for some operation codes. For example:

lock/fs/xchgl    %ebx,4(%ebp)

Segment register overrides and the 16-bit data specifications are usually given as part of the operation code itself or of its operands. For example, the following two lines of assembly generate the same instructions:

movw            %bx,%fs:4(%ebp)

data16/fs/movl  %bx,4(%ebp)

Not all prefixes are allowed with all instructions. The assembler checks that the repeat prefixes for string instructions are used correctly, but it doesn’t otherwise check for correct usage.
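
The slash-separated prefix notation can be illustrated with a short Python sketch. The name split_opcode is hypothetical, and the check only confirms that each prefix appears in the table above; it does not model which prefixes are legal with which instructions.

```python
PREFIXES = {
    "data16", "addr16", "lock", "wait",
    "cs", "ds", "es", "fs", "gs", "ss",
    "rep", "repe", "repne",
}

def split_opcode(opcode_field):
    """Split 'prefix/prefix/mnemonic' into (prefixes, mnemonic)."""
    *prefixes, mnemonic = opcode_field.split("/")
    unknown = [p for p in prefixes if p not in PREFIXES]
    if unknown:
        raise ValueError("unknown prefix: " + ", ".join(unknown))
    return prefixes, mnemonic

locked = split_opcode("lock/fs/xchgl")
plain = split_opcode("movw")
```

For lock/fs/xchgl the sketch returns the prefixes lock and fs with the mnemonic xchgl; an operation code without a slash has an empty prefix list.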

Operand Field

The operand field of an assembly language statement supplies the arguments to the machine instruction, assembler directive, or macro.

The operand field may contain one or more operands, depending on the requirements of the preceding machine instruction or assembler directive. Some machine instructions and assembler directives don’t take any operand, and some take two or more. If the operand field contains more than one operand, the operands are generally separated by commas, as shown here:

[ operand [ , operand ] ... ]

The following types of objects can be operands:

Register operands
Register pairs
Address operands
String constants
Floating-point constants
Register lists
Expressions

Register operands in a machine instruction refer to the machine registers of the processor or coprocessor. Register names may appear in mixed case.

Intel i386 Architecture–Specific Caveats

The OS X assembler orders operand fields for i386 instructions in the reverse order from Intel’s conventions. Intel’s convention is destination first, source second; OS X assembler’s convention is source first, destination second. Where Intel documentation would describe the Compare and Exchange instruction for 32-bit operands as follows:

CMPXCHG  r/m32,r32    # Intel processor manual convention

The OS X assembler syntax for this same instruction is:

cmpxchg  r32,r/m32    # OS X assembler syntax

So, an example of actual assembly code for the OS X assembler would be:

cmpxchg  %ebx,(%eax)  # OS X assembly code
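
The reversal of operand order can be expressed as a one-line transformation. The Python sketch below uses a hypothetical helper, intel_to_osx_order, and assumes the operand list is written with the placeholder forms from the Intel manual, so that no operand itself contains a comma.

```python
def intel_to_osx_order(operands):
    """Reverse a comma-separated operand list (destination-first to source-first)."""
    parts = [p.strip() for p in operands.split(",")]
    return ",".join(reversed(parts))

osx_form = intel_to_osx_order("r/m32,r32")
```

With the CMPXCHG operands shown above, osx_form is "r32,r/m32".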

Comment Field

The assembler recognizes two types of comments in source code:

A line whose first nonwhitespace character is the hash character (#) is a comment. This style of comment is useful for passing C preprocessor output through the assembler. Note that comments of the form:
# line_number file_name level
get turned into:
.line line_number; .file file_name
This can cause problems when comments of this form that aren’t intended to specify line numbers precede assembly errors, since the error is reported as occurring on a line relative to that specified in the comment. Suppose a program contains these two lines of assembly source:
# 500
.var
If .var hasn’t been defined, this fragment results in the following error message:
var.s:500:Unknown pseudo-op: .var
A comment field, appearing on a line after one or more statements. The comment field consists of the appropriate comment character and all the characters that follow it on the line:
Character	Description
`;`	Comment character for PowerPC processors
`#`	Comment character for i386 architecture processors
An assembly language source line can consist of just the comment field; in this case, it’s equivalent to using the hash character comment style:
# This is a comment.
; This is a comment.
Note the warning given above for hash character comments beginning with a number.
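
The effect of a hash comment that begins with a number on reported line numbers can be modeled as follows. This Python sketch is a simplified assumption about the bookkeeping, not the assembler's implementation; the name reported_lines is hypothetical, and file names in such comments are ignored.

```python
import re

# A hash comment whose text starts with a number resets the line counter.
LINE_COMMENT = re.compile(r"\s*#\s*(\d+)\b")

def reported_lines(lines):
    """Pair each remaining source line with the line number an error would cite."""
    result = []
    current = 1
    for text in lines:
        m = LINE_COMMENT.match(text)
        if m:
            current = int(m.group(1))   # the next line is reported as this number
            continue
        result.append((current, text))
        current += 1
    return result

numbered = reported_lines(["# 500", ".var"])
```

For the two-line fragment above, .var is paired with line 500, which is the number that appears in the error message.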

Direct Assignment Statements

This section describes direct assignment statements, which don’t conform to the normal statement syntax described earlier in this chapter. A direct assignment statement can be used to assign the value of an expression to an identifier. The format of a direct assignment statement is:

identifier = expression

If expression in a direct assignment is absolute, identifier is also absolute, and it may be treated as a constant in subsequent expressions. If expression is relocatable, identifier is also relocatable, and it is considered to be declared in the same program section as the expression.

Using a direct assignment statement is analogous to using the .set directive (described in .set), except that the .set directive makes the value of the expression absolute. The .set directive is used when an assembly-time constant is wanted for what would otherwise be a relocatable expression, namely the position-independent expression symbol1 - symbol2. For example, the size of a function is needed as one of the fields of the C++ exception information and is set with:

.set L_foo_size, L_foo_end - _foo

.long L_foo_size ; size of function _foo

A position-independent pointer to the function is another field of the C++ exception information and is set with:

.long _foo - .  ; position independent pointer to _foo

At run time, the runtime adds the address of the pointer to its contents to obtain a pointer to the function.
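
The arithmetic behind these two fields can be checked with a short Python sketch. All offsets and load addresses are made up for illustration, and layout is a hypothetical helper. The sketch shows that the size (a difference of two symbols) and the stored pointer value (_foo - .) are the same wherever the code is loaded, and that adding the address of the pointer field to its contents recovers the address of _foo.

```python
def layout(base):
    """Return hypothetical addresses for a given load address."""
    foo = base + 0x100         # _foo
    foo_end = base + 0x140     # L_foo_end
    field = base + 0x2000      # location of the '.long _foo - .' field
    return foo, foo_end, field

results = []
for base in (0x1000, 0x7000):          # two different load addresses
    foo, foo_end, field = layout(base)
    size = foo_end - foo               # .set L_foo_size, L_foo_end - _foo
    stored = foo - field               # .long _foo - .
    recovered = field + stored         # what the runtime computes
    results.append((size, stored, recovered == foo))
```

Both entries in results are identical: the size is 0x40, the stored value is -0x1f00, and the recovered address equals the address of _foo.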

Once an identifier has been defined by a direct assignment statement, it may be redefined—its value is then the result of the last assignment statement. There are a few restrictions, however, concerning the redefinition of identifiers:

Register identifiers may not be redefined.
An identifier that has already been used as a label should not be redefined, since this would amount to redefining the address of a place in the program. Moreover, an identifier that has been defined in a direct assignment statement cannot later be used as a label. Only the second situation produces an assembler error message.
