Yasm defines a set of
standard macros in the NASM preprocessor which are already defined when it starts to
process any source file. If you really need a program to be assembled with no pre-defined
macros, you can use the %clear directive to empty the preprocessor
of everything.
Most user-level NASM syntax directives are implemented as macros which invoke primitive directives; these are described in Chapter 5. The rest of the standard macro set is described here.
The single-line macros
__YASM_MAJOR__, __YASM_MINOR__, and __YASM_SUBMINOR__
expand to the major, minor, and subminor parts of the version number of Yasm being used. In addition,
__YASM_VER__ expands to a string representation of the Yasm version and
__YASM_VERSION_ID__ expands to a 32-bit BCD-encoded representation of
the Yasm version, with the major version in the most significant 8 bits, followed by the
8-bit minor version and 8-bit subminor version, and 0 in the least significant 8 bits.
For example, under Yasm 0.5.1, __YASM_MAJOR__ would be
defined to be 0, __YASM_MINOR__ would be defined as 5,
__YASM_SUBMINOR__ would be defined as 1, __YASM_VER__ would be defined as "0.5.1",
and __YASM_VERSION_ID__ would be defined as 00050100h.
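As a minimal sketch of how the version macros might be used, the following guard refuses to assemble under a Yasm older than 0.5.1. The threshold is simply the example version from the text, and the %ifdef wrapper is an assumption made so that assemblers which do not define __YASM_VERSION_ID__ skip the check.

```nasm
; Assumption: 0.5.1 is an arbitrary minimum version chosen for illustration.
%ifdef __YASM_VERSION_ID__
  ; 00050100h = major 00, minor 05, subminor 01, low byte 00
  %if __YASM_VERSION_ID__ < 00050100h
    %error Yasm 0.5.1 or later is required
  %endif
%endif
```

Because the major version occupies the most significant byte, a plain numeric comparison orders versions correctly.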
In addition, the single-line macro __YASM_BUILD__ expands to the Yasm
“build” number, typically the
Subversion changeset number. It should be seen as less significant than the subminor
version, and is generally only useful in discriminating between Yasm nightly snapshots or
pre-release (e.g. release candidate) Yasm versions.
Like the C preprocessor, the NASM preprocessor allows the user to find out the file
name and line number containing the current instruction. The macro __FILE__ expands to
a string constant giving the name of the current input file (which may change through the
course of assembly if %include directives are used), and
__LINE__ expands to a numeric constant giving the current line number in
the input file.
These macros could be used, for example, to communicate debugging information to a
macro, since invoking __LINE__ inside a macro definition
(either single-line or multi-line) will return the line number of the macro call rather than that of the definition. So to determine where in a piece of code a crash
is occurring, one could write a routine stillhere, which is passed a line number in EAX and outputs something like “line 155: still here”. You could then write a macro:
%macro notdeadyet 0
push eax
mov eax, __LINE__
call stillhere
pop eax
%endmacro
and then pepper your code with calls to notdeadyet until
you find the crash point.
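The same idea can be applied to data. The sketch below uses a hypothetical macro, srcloc, which stores the current file name and the line number of its invocation so that a debugging routine could print them later. The macro and label names are illustrative only.

```nasm
; Hypothetical helper: emits the input file name and the line of the call site.
%macro srcloc 0
    db __FILE__, 0      ; NUL-terminated name of the current input file
    dd __LINE__         ; line number at which srcloc was invoked
%endmacro

section .data
here:
    srcloc              ; record describing this very line
```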
__YASM_OBJFMT__, and
its NASM-compatible alias __OUTPUT_FORMAT__, expand to the object
format keyword specified on the command line
with -f keyword
(see Section 1.3.1.2).
For example, if yasm is invoked with
-f elf, __YASM_OBJFMT__ expands
to elf.
These expansions match the option given on the command line exactly, even when the
object formats are equivalent. For example, -f elf and
-f elf32 are equivalent specifiers for the 32-bit ELF format,
and -f elf -m amd64 and -f elf64
are equivalent specifiers for the 64-bit ELF format, but __YASM_OBJFMT__ would expand to elf and
elf32 for the first two cases, and elf and elf64 for the second two cases.
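Because the expansion mirrors the command line exactly, code that tests the object format usually has to check every equivalent spelling. The following sketch does so with %ifidn. TARGET_IS_ELF is a hypothetical name, and the .note.GNU-stack section line is a common idiom on GNU/Linux toolchains, not something this section prescribes.

```nasm
; TARGET_IS_ELF is a hypothetical name used only in this sketch.
%ifidn __YASM_OBJFMT__, elf
  %define TARGET_IS_ELF
%elifidn __YASM_OBJFMT__, elf32
  %define TARGET_IS_ELF
%elifidn __YASM_OBJFMT__, elf64
  %define TARGET_IS_ELF
%endif

%ifdef TARGET_IS_ELF
; Common idiom: mark the stack as non-executable in ELF objects.
section .note.GNU-stack noalloc noexec nowrite progbits
%endif
```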
The NASM preprocessor is
sufficiently powerful that data structures can be implemented as a set of macros. The
macros STRUC and ENDSTRUC are used to define a structure
data type.
STRUC takes one parameter, which is the name of the data
type. This name is defined as a symbol with the value zero; the same name with the suffix
_size appended is defined as an EQU giving the size of the structure. Once STRUC has been issued, you are defining the structure, and should define
fields using the RESB family of pseudo-instructions, and
then invoke ENDSTRUC to finish the definition.
For example, to define a structure called mytype
containing a longword, a word, a byte and a string of bytes, you might code
struc mytype
mt_long: resd 1
mt_word: resw 1
mt_byte: resb 1
mt_str: resb 32
endstruc
The above code defines six symbols: mt_long as 0 (the
offset from the beginning of a mytype structure to the
longword field), mt_word as 4, mt_byte as 6, mt_str as 7, mytype_size as 39, and mytype itself as
zero.
The structure type name is defined as zero as a side effect of allowing structures to work with the local label mechanism: if your structure members tend to have the same names in more than one structure, you can define the above structure like this:
struc mytype
.long: resd 1
.word: resw 1
.byte: resb 1
.str: resb 32
endstruc
This defines the offsets to the structure fields as mytype.long, mytype.word, mytype.byte and mytype.str.
Since NASM syntax has no intrinsic structure
support, it does not support any form of period notation to refer to the elements of a
structure once you have one (except the above local-label notation), so code such as
mov ax,[mystruc.mt_word] is not valid. mt_word is a constant just like any other constant, so the correct
syntax is mov ax,[mystruc+mt_word] or mov ax,[mystruc+mytype.word].
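Since the field names are plain constants, they combine with a base register in the same way as with a label. The sketch below reserves room for several records and reads fields of the second one. The names records and touch_second, the record count, and the 32-bit mode are assumptions made for illustration.

```nasm
bits 32

struc mytype
.long:  resd 1
.word:  resw 1
.byte:  resb 1
.str:   resb 32
endstruc

section .bss
records:
    resb mytype_size * 4            ; room for four records (count is illustrative)

section .text
touch_second:
    mov ebx, records + mytype_size  ; address of the second record
    mov eax, [ebx + mytype.long]    ; load the longword field
    mov cx,  [ebx + mytype.word]    ; load the word field
    mov byte [ebx + mytype.byte], 'x'
    ret
```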
Having defined a structure type, the next thing
you typically want to do is to declare instances of that structure in your data segment.
The NASM preprocessor provides an easy way to do this in the ISTRUC mechanism. To
declare a structure of type mytype in a program, you code
something like this:
mystruc: istruc mytype
at mt_long, dd 123456
at mt_word, dw 1024
at mt_byte, db 'x'
at mt_str, db 'hello, world', 13, 10, 0
iend
The function of the AT macro is to make use of the
TIMES prefix to advance the assembly position to the correct
point for the specified structure field, and then to declare the specified data.
Therefore the structure fields must be declared in the same order as they were specified
in the structure definition.
If the data to go in a structure field requires more than one source line to specify,
the remaining source lines can easily come after the AT
line. For example:
at mt_str, db 123,134,145,156,167,178,189
db 190,100,0
Depending on personal taste, you can also omit the code part of the AT line completely, and start the structure field on the next line:
at mt_str
db 'hello, world'
db 13,10,0
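Because AT advances to each field with TIMES, an instance does not have to mention every field, as long as the fields that are mentioned appear in definition order. The sketch below initializes only two fields. It assumes, as in the classic NASM definitions of these macros, that the skipped space is filled with zero bytes and that IEND pads the instance out to mytype_size. The label partial is hypothetical.

```nasm
struc mytype
mt_long: resd 1
mt_word: resw 1
mt_byte: resb 1
mt_str:  resb 32
endstruc

section .data
partial:
    istruc mytype
        at mt_word, dw 1024         ; mt_long is skipped over
        at mt_str,  db 'hi', 0      ; mt_byte is skipped over
    iend                            ; remainder of mt_str is padded
```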
The ALIGN and ALIGNB macros provide a convenient way to
align code or data on a word, longword, paragraph or other boundary. The syntax of the
ALIGN and ALIGNB macros is
align 4 ; align on 4-byte boundary
align 16 ; align on 16-byte boundary
align 16,nop ; equivalent to previous line
align 8,db 0 ; pad with 0s rather than NOPs
align 4,resb 1 ; align to 4 in the BSS
alignb 4 ; equivalent to previous line
Both macros require their first argument to be a power of two; they both compute the
number of additional bytes required to bring the length of the current section up to a
multiple of that power of two, and either output NOP fill or apply the TIMES prefix to their second argument to perform the alignment.
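The padding computation can be written out by hand. The sketch below mirrors the expression used by the classic NASM definition of ALIGN (an assumption about the implementation, not something this section states) and assumes the section is empty before the first line.

```nasm
section .data
    db 1, 2, 3, 4, 5                ; the section is now 5 bytes long
    times ($$ - $) & (4 - 1) db 0   ; (-5) & 3 = 3, so three padding bytes
aligned_dword:
    dd 0                            ; starts at offset 8
```

Here $$ - $ is the negated length of the section so far, and masking it with the alignment minus one yields the number of bytes needed to reach the next multiple.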
If the second argument is not specified, the default for ALIGN is NOP, and the default for
ALIGNB is RESB 1. ALIGN treats a NOP argument specially by
generating maximal NOP fill instructions (not necessarily NOP opcodes) for the current
BITS setting, whereas ALIGNB
takes its second argument literally. Otherwise, the two macros are equivalent when a
second argument is specified. Normally, you can just use ALIGN in code and data sections and ALIGNB
in BSS sections, and never need the second argument except for special purposes.
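A short sketch of that division of labour follows. The label names and the 32-bit mode are illustrative assumptions.

```nasm
bits 32

section .text
    align 16                ; NOP fill up to a 16-byte boundary
hot_loop:
    dec ecx
    jnz hot_loop
    ret

section .data
    align 8, db 0           ; explicit zero fill instead of NOPs
table:
    dq 1, 2, 3

section .bss
    alignb 16               ; reserves space with RESB 1 rather than emitting bytes
buffer:
    resb 64
```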
ALIGN and ALIGNB, being
simple macros, perform no error checking: they cannot warn you if their first argument
fails to be a power of two, or if their second argument generates more than one byte of
code. In each of these cases they will silently do the wrong thing.
ALIGNB (or ALIGN with a
second argument of RESB 1) can be used within structure
definitions:
struc mytype2
mt_byte: resb 1
alignb 2
mt_word: resw 1
alignb 4
mt_long: resd 1
mt_str: resb 32
endstruc
This will ensure that the structure members are sensibly aligned relative to the base of the structure.
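Working through the arithmetic for mytype2 gives the offsets noted in the comments below, followed by an instance declared with ISTRUC. The instance label inst2 and the field values are illustrative.

```nasm
struc mytype2
mt_byte: resb 1         ; offset 0
         alignb 2       ; one byte of padding
mt_word: resw 1         ; offset 2
         alignb 4       ; already at 4, no padding
mt_long: resd 1         ; offset 4
mt_str:  resb 32        ; offset 8
endstruc                ; mytype2_size is 40

section .data
inst2:
    istruc mytype2
        at mt_byte, db 'x'
        at mt_word, dw 1024         ; lands at inst2 + 2
        at mt_long, dd 123456       ; lands at inst2 + 4
        at mt_str,  db 'hello', 0   ; lands at inst2 + 8
    iend
```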
A final caveat: ALIGNB works relative to the beginning of
the section, not the beginning of the address
space in the final executable. Aligning to a 16-byte boundary when the section you’re
in is only guaranteed to be aligned to a 4-byte boundary, for example, is a waste of
effort. Again, Yasm does not check that the section’s alignment characteristics are
sensible for the use of ALIGNB. ALIGN is more intelligent and does adjust the section alignment to be the maximum specified
alignment.
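One way to keep ALIGNB meaningful is to request a matching alignment for the section itself. The sketch below assumes an object format, such as ELF, whose section directive accepts an align= attribute. The label name is hypothetical.

```nasm
; Assumption: the selected object format accepts align= on the section directive.
section .bss align=16
    alignb 16               ; now also 16-byte aligned in the final image
vec_buffer:
    resb 64
```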