Formalized Intel Syntax for x86
The assembly language for x86 and x86-64 has two major syntax variants: the Microsoft assembler (MASM) syntax and the GNU assembler (GAS) syntax. The MASM syntax, also known as the Intel syntax, is the one prescribed by the Intel Software Developer Manual, and is used extensively by many non-GNU tools. The GAS syntax, also known as the AT&T syntax, derives from the PDP-11 assembler that was used to create Unix, and is the default and dominant syntax in the post-Unix world.
The advantages of the MASM syntax are:
- It looks more modern, closer to many other assembly languages, such as ARM, MIPS and RISC-V.
- It is the syntax in Intel and AMD documentation.
The disadvantages of the MASM syntax are:
- MASM is proprietary software, but it defines the de facto standard.
- It does not match some mnemonics well. For example, `cvtsi2sd` reads 'ConVerT Scalar Integer TO Scalar Double', i.e. source precedes destination, but it is actually written `cvtsi2sd xmm0, rax`, i.e. destination precedes source.
- The syntax has not been formally described, which causes occasional ambiguity.
For instance, the Intel Software Developer Manual contains
MOV EBX, RAM_START
This is ambiguous in two ways. First, it could be interpreted as either of
MOV EBX, OFFSET RAM_START ; `movl $RAM_START, %ebx`
MOV EBX, DWORD PTR [RAM_START] ; `movl RAM_START, %ebx`
Second, on x86-64 the address might be RIP-relative or absolute, as in
MOV EBX, DWORD PTR [RAM_START]
; x86 absolute ; 8B 1D RAM_START ; `movl RAM_START, %ebx`
; x86-64 RIP-relative ; 8B 1D RAM_START ; `movl RAM_START(%rip), %ebx`
; x86-64 absolute ; 8B 1C 25 RAM_START ; `movl RAM_START, %ebx`
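The three encodings above differ only in the ModRM byte and the optional SIB byte. As a minimal sketch (the helper names `modrm` and `sib` are illustrative and not part of this RFC), the bytes can be derived from the standard field layout, with EBX being register number 3:

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM byte: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def sib(scale: int, index: int, base: int) -> int:
    """Pack the SIB byte: scale (2 bits), index (3 bits), base (3 bits)."""
    return (scale << 6) | (index << 3) | base


EBX = 3  # register number of EBX in the reg field

# mod=00, r/m=101 means "disp32 only": an absolute address on x86,
# but a RIP-relative address on x86-64.
disp32_form = bytes([0x8B, modrm(0b00, EBX, 0b101)])

# mod=00, r/m=100 selects a SIB byte; index=100 (no index) and
# base=101 (disp32, no base) give an absolute address on x86-64.
sib_form = bytes([0x8B, modrm(0b00, EBX, 0b100), sib(0b00, 0b100, 0b101)])

print(disp32_form.hex(" ").upper())  # 8B 1D
print(sib_form.hex(" ").upper())     # 8B 1C 25
```

In each case the 32-bit displacement for `RAM_START` follows these bytes, so the same source line yields different machine code depending on the target and addressing mode.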
The first issue here is solved by interpreting the operand as a memory reference, but the ambiguity may still arise if the symbol comes from a high-level language, such as C.
When targeting x86, the Microsoft compiler decorates C identifiers: external names that denote objects, or functions with the `__cdecl` or `__stdcall` calling convention, are prefixed with an underscore `_`; external names that denote functions with the `__fastcall` calling convention are prefixed with an at sign `@`; and those that denote functions with the `__vectorcall` calling convention are suffixed with `@@` and the argument size. This decoration prevents symbols from conflicting with keywords in assembly.
But this is no longer the case for x86-64, nor for ARM and ARM64. If a user declares an external variable with the name `RSI`, the compiler may generate the ambiguous and incorrect
MOV EAX, DWORD PTR [RSI] ; parsed as `movl (%rsi), %eax`
; should have been `movl RSI, %eax`
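To illustrate why decoration matters, the following sketch models the x86 prefixes described above and checks the resulting symbol against register names. The function names, the `arg_bytes` parameter and the abbreviated register set are assumptions made for this example; they are not taken from any compiler or assembler.

```python
# A deliberately small subset of register names, for illustration only.
REGISTERS = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI",
             "RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "RIP"}


def decorate_x86(name: str, convention: str = "cdecl", arg_bytes: int = 0) -> str:
    """Decorate a C name the way the text describes for 32-bit x86."""
    if convention == "cdecl":       # also used for objects
        return "_" + name
    if convention == "stdcall":     # underscore prefix, argument size suffix
        return f"_{name}@{arg_bytes}"
    if convention == "fastcall":    # at-sign prefix, argument size suffix
        return f"@{name}@{arg_bytes}"
    raise ValueError(f"unsupported convention: {convention}")


def decorate_x64(name: str) -> str:
    """On x86-64 the C name is emitted without a prefix."""
    return name


def collides_with_register(symbol: str) -> bool:
    """Register names are case-insensitive in Intel syntax."""
    return symbol.upper() in REGISTERS


print(decorate_x86("RSI"), collides_with_register(decorate_x86("RSI")))  # _RSI False
print(decorate_x64("RSI"), collides_with_register(decorate_x64("RSI")))  # RSI True
```

The decorated x86 name can never be mistaken for a register, whereas the undecorated x86-64 name can, which is the case the rules in this RFC are meant to handle.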
This RFC proposes formalization of the Intel syntax, by disallowing certain constructions, to resolve ambiguity.
- If an indirect reference contains a symbol, the symbol shall always follow a mode specifier (`PTR` or `BCST`) or `OFFSET`. In other words, only registers and numeric displacements are enclosed within brackets. This idea is shared with GAS syntax.

      MOV EAX, DWORD PTR [RCX]             ; valid, complete: `movl (%rcx), %eax`
      MOV EAX, DWORD [RCX]                 ; valid, abbreviated: `movl (%rcx), %eax`
      MOV EAX, [RCX]                       ; valid, symbolless: `movl (%rcx), %eax`
      VMULPD ZMM0, ZMM1, QWORD BCST [RCX]  ; valid, complete: `vmulpd (%rcx){1to8}, %zmm1, %zmm0`
- An overriding segment register shall follow the operand size and mode specifier, if any; when there is no such specifier, it shall occur at the beginning of the operand.

      MOV EAX, DWORD PTR CS:[RCX]  ; valid: `movl %cs:(%rcx), %eax`
      MOV EAX, DWORD CS:[RCX]      ; valid: `movl %cs:(%rcx), %eax`
      MOV EAX, CS:[RCX]            ; valid: `movl %cs:(%rcx), %eax`
- If a valid symbol name follows `PTR`, `BCST` or `OFFSET`, after an overriding segment register if any, then it is always treated as a symbol, even when it is a keyword.

      LEA RAX, bx[RIP]                 ; invalid: `bx` is parsed as the register due to lack
                                       ;   of a mode specifier
      LEA RAX, BYTE PTR bx[RIP]        ; valid: `leaq bx(%rip), %rax`
      MOV EAX, printf                  ; invalid: `printf` is not a known register
      MOV EAX, OFFSET printf           ; valid: `movl $printf, %eax`
      MOV EAX, RCX                     ; invalid: operand size mismatch
      MOV EAX, OFFSET RCX              ; valid: `movl $RCX, %eax`
      MOV EAX, DWORD PTR [RCX]         ; valid: `movl (%rcx), %eax`
      MOV EAX, DWORD PTR RCX           ; valid: `movl RCX, %eax`
      MOV EAX, DWORD PTR RCX[RIP+12]   ; valid: `movl RCX+12(%rip), %eax`
- For instructions with a dummy memory operand (`LEA`, `NOP`, etc.) and those with an uncommon size (`FXSAVE`/`FXRSTOR`, `FNSAVE`/`FNRSTOR`, etc.), `BYTE PTR` should be used.

      NOP DWORD PTR [RAX], EAX   ; warning: `BYTE PTR` should be used
      NOP BYTE PTR [RAX], EAX    ; valid: 0F 1F 00
- An RIP-relative operand shall have `RIP` as its base register.

      MOV EBX, DWORD PTR foo        ; valid: `movl foo, %ebx`; might cause linker errors
                                    ;   on x86-64
      MOV EBX, DWORD PTR foo[RIP]   ; valid: `movl foo(%rip), %ebx`
- The base, index, scale and displacement parts of a memory operand shall appear in a uniform order. The displacement comes first, immediately following the mode specifier and overriding segment register. If there is a base or index register, the registers and scale are all placed in a single pair of square brackets. This idea is also shared with GAS syntax.

      MOV ECX, DWORD PTR [RSI+RDI*4+field]   ; warning: `field` is not a known register and is
                                             ;   assumed to be a symbol
      MOV ECX, DWORD PTR field[RSI+RDI*4]    ; valid: `movl field(%rsi,%rdi,4), %ecx`
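Taken together, the rules above suggest one canonical layout for a memory operand: size and mode specifier, then the segment override, then the symbol, then the bracketed registers. The sketch below renders an operand in that order. `MemOperand` and `render` are hypothetical names, and placing a numeric displacement inside the brackets is one reading of the examples (such as `RCX[RIP+12]`) rather than something this RFC states outright.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemOperand:
    size: str = "DWORD"             # BYTE, WORD, DWORD, QWORD, ...
    mode: str = "PTR"               # PTR or BCST
    segment: Optional[str] = None   # overriding segment register
    symbol: Optional[str] = None    # symbolic displacement
    base: Optional[str] = None
    index: Optional[str] = None
    scale: int = 1
    offset: int = 0                 # numeric displacement


def render(op: MemOperand) -> str:
    """Render the operand in the layout suggested by the rules above."""
    parts = []
    if op.base:
        parts.append(op.base)
    if op.index:
        # Always emit the scale so an index is never mistaken for a base.
        parts.append(f"{op.index}*{op.scale}")
    inner = "+".join(parts)
    if op.offset:
        inner = f"{inner}{op.offset:+d}" if inner else str(op.offset)
    if not op.symbol and not inner:
        raise ValueError("operand needs a symbol, a register or a displacement")

    text = f"{op.size} {op.mode} "
    if op.segment:
        text += f"{op.segment}:"
    if op.symbol:
        text += op.symbol           # symbol stays outside the brackets
    if inner:
        text += f"[{inner}]"        # registers and numbers go inside
    return text


print(render(MemOperand(symbol="field", base="RSI", index="RDI", scale=4)))
# DWORD PTR field[RSI+RDI*4]
print(render(MemOperand(symbol="RCX", base="RIP", offset=12)))
# DWORD PTR RCX[RIP+12]
```

Because the symbol is emitted only in the position directly after the specifier and segment override, a name such as `RCX` in that position is unambiguous, while anything inside the brackets must be a register or a number.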