Assembly Language Statements

This chapter describes the assembly language statements that make up an assembly language program.

This is the general format of an assembly language statement:

[ label_field ] [ opcode_field [ operand_field ] ] [ comment_field ]

Each of the depicted fields is described in detail in one of the following sections.

A line may contain multiple statements. The PowerPC assembler separates statements with the @ character, and the i386 assembler separates them with a semicolon. The statements may be followed by a single comment, which begins with a semicolon for the PowerPC assembler or a # character for the i386 assembler:

[ statement [ @ statement ...] ] [ ; comment_field ]      (PowerPC)
[ statement [ ; statement ...] ] [ # comment_field ]      (i386)

The following rules apply to the use of whitespace within a statement:

Label Field

Labels are identifiers that you use to tag the locations of program and data objects. Each label is composed of an identifier and a terminating colon. The format of the label field is:

identifier: [ identifier: ] ...

The optional label field may occur only at the beginning of a statement. The following example shows a label field containing two labels, followed by a (PowerPC-style) comment:

var: VAR:  ; two labels defined here

As this example shows, identifiers are case sensitive: var and VAR are two distinct labels. Both uppercase and lowercase letters may be used.

Operation Code Field

The operation code field of an assembly language statement identifies the statement as a machine instruction, an assembler directive, or a macro defined by the programmer.

One or more spaces or tabs must separate the operation code field from the following operand field in a statement. Spaces or tabs are optional between the label and operation code fields, but they help to improve the readability of the program.

Intel i386 Architecture–Specific Caveats

  • i386 instructions can operate on byte, word, or long word data (the last is called “double word” by Intel). The desired size is indicated as part of the instruction mnemonic by adding a trailing b, w, or l:

    Suffix   Description
    b        Byte (8-bit) data.
    w        Word (16-bit) data.
    l        Long word (32-bit) data.

    For instance, a movb instruction moves a byte of data, but a movw instruction moves a 16-bit word of data.

    If no size is specified, the assembler attempts to determine the size from the operands. For example, if the 16-bit names for registers are used as operands, a 16-bit operation is performed. When both a size specifier and a size-specific register name are given, the size specifier is used. Thus, the following are all correct and result in the same operation:

     movw    %bx,%cx
     mov     %bx,%cx
     movw    %ebx,%ecx
  • An i386 operation code can also contain optional prefixes, which are separated from the operation code by a slash (/) character. The prefix mnemonics are:

    Prefix                   Description
    data16                   Operation uses 16-bit data.
    addr16                   Operation uses 16-bit addresses.
    lock                     Exclusive memory lock.
    wait                     Wait for pending numeric exceptions.
    cs, ds, es, fs, gs, ss   Segment register override.
    rep, repe, repne         Repeat prefixes for string instructions.

    More than one prefix may be specified for some operation codes. For example:

    lock/fs/xchgl    %ebx,4(%ebp)

    Segment register overrides and 16-bit data specifications are usually expressed through the operation code itself (a w suffix) or its operands (a segment override such as %fs:). The following two lines, for instance, generate the same instruction:

    movw            %bx,%fs:4(%ebp)
    data16/fs/movl  %bx,4(%ebp)

    Not all prefixes are allowed with all instructions. The assembler checks that the repeat prefixes for string instructions are used correctly, but it doesn’t otherwise check prefix usage.

Operand Field

The operand field of an assembly language statement supplies the arguments to the machine instruction, assembler directive, or macro.

The operand field may contain one or more operands, depending on the requirements of the preceding machine instruction or assembler directive. Some machine instructions and assembler directives take no operands, and some take two or more. If the operand field contains more than one operand, the operands are generally separated by commas, as shown here:

[ operand [ , operand ] ... ]

The following types of objects can be operands:

Register operands in a machine instruction refer to the machine registers of the processor or coprocessor. Register names may appear in mixed case.

Intel i386 Architecture–Specific Caveats

The OS X assembler orders the operands of i386 instructions in the reverse of Intel’s convention: Intel’s convention is destination first, source second, whereas the OS X assembler’s convention is source first, destination second. For example, Intel documentation describes the Compare and Exchange instruction for 32-bit operands as follows:

CMPXCHG  r/m32,r32    # Intel processor manual convention

The OS X assembler syntax for this same instruction is:

cmpxchg  r32,r/m32    # OS X assembler syntax

An example of actual assembly code for the OS X assembler is therefore:

cmpxchg  %ebx,(%eax)  # OS X assembly code

Comment Field

The assembler recognizes two types of comments in source code:

Direct Assignment Statements

This section describes direct assignment statements, which don’t conform to the normal statement syntax described earlier in this chapter. A direct assignment statement can be used to assign the value of an expression to an identifier. The format of a direct assignment statement is:

identifier = expression

If expression in a direct assignment is absolute, identifier is also absolute, and it may be treated as a constant in subsequent expressions. If expression is relocatable, identifier is also relocatable, and it is considered to be declared in the same program section as the expression.

Using a direct assignment statement is analogous to using the .set directive (see .set), except that the .set directive makes the value of the expression absolute. This is useful when an assembly-time constant is wanted for a position-independent expression of the form symbol1 - symbol2, which would otherwise generate a relocatable expression. For example, the size of a function is one of the fields of the C++ exception information and is set with:

.set L_foo_size, L_foo_end - _foo
.long L_foo_size ; size of function _foo

Another field of the C++ exception information is a position-independent pointer to the function, set with:

.long _foo - .  ; position independent pointer to _foo

The runtime adds the address of this pointer field to the field’s contents to obtain a pointer to the function.

Once an identifier has been defined by a direct assignment statement, it may be redefined—its value is then the result of the last assignment statement. There are a few restrictions, however, concerning the redefinition of identifiers: