Chapter 3. The NASM Language

Table of Contents

3.1. Layout of a NASM Source Line
3.2. Pseudo-Instructions
3.2.1. DB and Friends: Declaring Initialized Data
3.2.2. RESB and Friends: Declaring Uninitialized Data
3.2.3. INCBIN: Including External Binary Files
3.2.4. EQU: Defining Constants
3.2.5. TIMES: Repeating Instructions or Data
3.3. Effective Addresses
3.3.1. 64-bit Displacements
3.3.2. RIP Relative Addressing
3.4. Immediate Operands
3.5. Constants
3.5.1. Numeric Constants
3.5.2. Character Constants
3.5.3. String Constants
3.5.4. Floating-Point Constants
3.6. Expressions
3.6.1. |: Bitwise OR Operator
3.6.2. ^: Bitwise XOR Operator
3.6.3. &: Bitwise AND Operator
3.6.4. << and >>: Bit Shift Operators
3.6.5. + and -: Addition and Subtraction Operators
3.6.6. *, /, //, % and %%: Multiplication and Division
3.6.7. Unary Operators: +, -, ~ and SEG
3.6.8. SEG and WRT
3.7. STRICT: Inhibiting Optimization
3.8. Critical Expressions
3.9. Local Labels

3.1. Layout of a NASM Source Line

As in most assemblers, each NASM source line contains (unless it is a macro, a preprocessor directive or an assembler directive: see Chapter 5) some combination of the four fields:

label:  instruction operands        ; comment

As usual, most of these fields are optional: any combination of a label, an instruction and a comment may be present or absent. Whether the operand field is required or forbidden, of course, depends on the presence and nature of the instruction field.
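A few sketch lines showing how the fields combine. The labels start and done are hypothetical names chosen only for illustration:

```nasm
start:  mov     ax, 1           ; label, instruction, operands and comment
        cld                     ; CLD takes no operands, so none are given
done:                           ; a label on its own
        ; a line holding nothing but a comment
```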

NASM uses the backslash (\) as the line continuation character: if a line ends with a backslash, the next line is treated as part of the backslash-ended line.
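As a small illustration (the label message is hypothetical), the following two physical lines are joined into a single DB line, because the backslash is the last character on the first line:

```nasm
message db      'Hello, ', \
                'world', 0      ; read as: message db 'Hello, ', 'world', 0
```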

NASM places no restrictions on white space within a line: labels may have white space before them, instructions may have no white space before them, and so on. The colon after a label is also optional. Note that this means that if you intend to code lodsb alone on a line, and type lodab by accident, then that's still a valid source line which does nothing but define a label. Running NASM with the command-line option -w+orphan-labels will cause it to warn you if you define a label alone on a line without a trailing colon.
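The pitfall described above looks like this in practice; the first line is the hypothetical typo:

```nasm
        lodab                   ; typo for LODSB: silently defines a label "lodab"
        lodsb                   ; the intended instruction
```

With -w+orphan-labels added to the usual command line, NASM should warn about the first line. Writing lodab: with a trailing colon marks the label as deliberate and does not trigger that warning.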

Valid characters in labels are letters, numbers, _, $, #, @, ~, ., and ?. The only characters which may be used as the first character of an identifier are letters, . (with special meaning: see Section 3.9), _ and ?. An identifier may also be prefixed with a $ to indicate that it is intended to be read as an identifier and not a reserved word; thus, if some other module you are linking with defines a symbol called eax, you can refer to $eax in NASM code to distinguish the symbol from the register.
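A sketch of the $ prefix, assuming an object output format (such as elf32) in which EXTERN is available, and assuming some other module really does define a symbol named eax:

```nasm
        extern  $eax            ; the external symbol "eax", not the register
        mov     eax, [$eax]     ; register EAX := dword stored at symbol eax
```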

The instruction field may contain any machine instruction: Pentium and P6 instructions, FPU instructions, MMX instructions and even undocumented instructions are all supported. The instruction may be prefixed by LOCK, REP, REPE/REPZ or REPNE/REPNZ, in the usual way. Explicit address-size and operand-size prefixes A16, A32, O16 and O32 are provided. You can also use the name of a segment register as an instruction prefix: coding es mov [bx],ax is equivalent to coding mov [es:bx],ax. We recommend the latter syntax, since it is consistent with other syntactic features of the language, but for instructions such as LODSB, which have no operands and yet can require a segment override, there is no clean syntactic way to proceed apart from es lodsb.
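The prefix forms described above, sketched for 16-bit code. The BITS 16 line is an assumption about the context, and the particular operands are arbitrary:

```nasm
        bits    16
        mov     [es:bx], ax     ; recommended form of the ES override
        es mov  [bx], ax        ; same encoding, override written as a prefix
        es lodsb                ; LODSB has no operands, so the prefix form is needed
        a32 lodsb               ; address-size prefix: use ESI instead of SI
        lock inc word [bx]      ; LOCK prefix
        repne scasb             ; REPNE prefix on a string instruction
```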

A prefix does not have to be attached to an instruction: prefixes such as CS, A32, LOCK or REPE can appear on a line by themselves, and NASM will simply emit the prefix bytes.
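For example, writing the REP prefix on its own line and the string instruction on the next should produce the same two bytes as rep movsb:

```nasm
        rep                     ; a prefix alone: emits just the F3h byte
        movsb                   ; emits A4h; the pair matches "rep movsb"
```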

In addition to actual machine instructions, NASM also supports a number of pseudo-instructions, described in Section 3.2.

Instruction operands may take a number of forms: they can be registers, described simply by the register name (e.g. AX, BP, EBX, CR0; NASM does not use the gas-style syntax in which register names must be prefixed by a % sign), or they can be effective addresses (see Section 3.3), constants (Section 3.5) or expressions (Section 3.6).
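The operand forms listed above, sketched in 32-bit code (the specific registers and values are arbitrary choices):

```nasm
        bits    32
        mov     eax, ebx        ; register operands, with no % sign as in gas
        mov     eax, [ebx+4]    ; effective address (memory operand)
        mov     eax, 42         ; numeric constant
        mov     eax, 4*8+2      ; expression, evaluated by NASM to 34
```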

For floating-point instructions, NASM accepts a wide range of syntaxes: you can use the two-operand forms that MASM supports, or you can use NASM's native single-operand forms in most cases. For example, you can code:

        fadd    st1             ; this sets st0 := st0 + st1
        fadd    st0, st1        ; so does this

        fadd    st1, st0        ; this sets st1 := st1 + st0
        fadd    to st1          ; so does this

Almost any floating-point instruction that references memory must use one of the prefixes DWORD, QWORD, TWORD, DDQWORD, or OWORD to indicate the size of the memory operand it refers to.
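A minimal sketch of size keywords on memory operands. The labels x, y and z and their data are hypothetical:

```nasm
        section .text
        fld     dword [x]       ; load a 32-bit single-precision value
        fadd    qword [y]       ; add a 64-bit double-precision value
        fstp    tword [z]       ; store the result as an 80-bit extended value
        ; fld   [x]             ; rejected: the operand size cannot be inferred

        section .data
x:      dd      1.5
y:      dq      2.25
z:      dt      0.0
```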