Formalized Intel Syntax for x86
The assembly language for x86 and x86-64 has two major syntax variants: the Microsoft assembler (MASM) syntax and the GNU assembler (GAS) syntax. The MASM syntax, also known as the Intel syntax, is the one prescribed by the Intel Software Developer Manual, and is used extensively by many non-GNU tools. The GNU syntax, also known as the AT&T syntax, derives from the PDP-11 assembly language that was used to create Unix, and is the default and dominant syntax in the post-Unix world.
The advantages of the MASM syntax are:
- It looks more modern, closer to many other assembly languages, such as ARM, MIPS and RISC-V.
- It is the syntax in Intel and AMD documentation.
The disadvantages of the MASM syntax are:
- MASM is proprietary software.
- The syntax has never been formally defined, which sometimes causes ambiguity.
For instance, the Intel Software Developer Manual contains this line:
MOV EBX, RAM_START
This is ambiguous in two ways. First, it could be interpreted as either of
MOV EBX, OFFSET RAM_START ; `movl $RAM_START, %ebx`
MOV EBX, DWORD PTR [RAM_START] ; `movl RAM_START, %ebx`
Second, on x86-64 the address might be RIP-relative or absolute, as in
MOV EBX, DWORD PTR [RAM_START]
; x86 absolute ; 8B 1D RAM_START ; `movl RAM_START, %ebx`
; x86-64 RIP-relative ; 8B 1D RAM_START ; `movl RAM_START(%rip), %ebx`
; x86-64 absolute ; 8B 1C 25 RAM_START ; `movl RAM_START, %ebx`
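The three encodings listed above differ only in the ModRM byte and an optional SIB byte. The following Python sketch rebuilds them from their bit fields, to show where `1D` and `1C 25` come from. The function names and the `addressing` strings are hypothetical and exist only for this illustration; the bit layouts are the standard ModRM and SIB layouts.

```python
def modrm(mod, reg, rm):
    # ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits)
    return (mod << 6) | (reg << 3) | rm

def sib(scale, index, base):
    # SIB byte: scale (2 bits) | index (3 bits) | base (3 bits)
    return (scale << 6) | (index << 3) | base

EBX = 0b011  # register number of EBX in the `reg` field

def encode_mov_ebx_mem(addressing, disp32=0):
    """Encode `MOV EBX, DWORD PTR [disp32]` for one addressing form."""
    opcode = 0x8B  # MOV r32, r/m32
    disp = (disp32 & 0xFFFFFFFF).to_bytes(4, "little")
    if addressing in ("x86 absolute", "x86-64 RIP-relative"):
        # mod=00, rm=101 means disp32 on x86 but RIP+disp32 on x86-64,
        # so the same bytes denote two different operands.
        return bytes([opcode, modrm(0b00, EBX, 0b101)]) + disp
    if addressing == "x86-64 absolute":
        # rm=100 selects a SIB byte; index=100 means no index register,
        # base=101 with mod=00 means no base register, disp32 only.
        return bytes([opcode, modrm(0b00, EBX, 0b100),
                      sib(0b00, 0b100, 0b101)]) + disp
    raise ValueError("unknown addressing form: " + addressing)

print(encode_mov_ebx_mem("x86 absolute").hex(" "))
print(encode_mov_ebx_mem("x86-64 absolute").hex(" "))
```

The displacement is left at zero here; in a real object file it would be filled in by a relocation against `RAM_START`.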
The first issue here is solved by interpreting it as a memory reference, but the ambiguity may still arise if the symbol originates from a high-level language, such as C.
When targeting x86, the Microsoft compiler decorates C identifiers: external names that denote objects or functions with the `__cdecl` or `__stdcall` calling convention are prefixed with an underscore `_`; external names that denote functions with the `__fastcall` calling convention are prefixed with an at symbol `@`; and external names that denote functions with the `__vectorcall` calling convention carry a `@@` suffix followed by the size of their arguments. This technique prevents symbols from conflicting with keywords in assembly.
But this is no longer the case for x86-64 (as well as ARM and ARM64). If a user declares an external variable with the name `RSI`, the compiler may generate the ambiguous and incorrect
MOV EAX, DWORD PTR [RSI] ; parsed as `movl (%rsi), %eax`
; should have been `movl rsi, %eax`
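The effect of decoration on name collisions can be sketched in Python. This is only an illustration: `decorate`, `collides_with_register` and the short register list are hypothetical names, and only the `__cdecl`, `__stdcall` and `__fastcall` prefixes are modeled. The `@N` suffix (the size of the arguments in bytes) follows Microsoft's documented decoration scheme for `__stdcall` and `__fastcall` functions.

```python
# Hypothetical sample of x86-64 register names, not the full set.
X86_64_REGISTERS = {"RAX", "RBX", "RCX", "RDX", "RSI", "RDI",
                    "RSP", "RBP", "RIP"}

def decorate(name, convention="cdecl", arg_bytes=0, target="x86"):
    """Return the external symbol name emitted for a C identifier."""
    if target != "x86":
        # On x86-64 (and ARM, ARM64) C names are emitted undecorated.
        return name
    if convention == "cdecl":
        return "_" + name
    if convention == "stdcall":
        return "_%s@%d" % (name, arg_bytes)
    if convention == "fastcall":
        return "@%s@%d" % (name, arg_bytes)
    raise ValueError("convention not covered by this sketch: " + convention)

def collides_with_register(symbol):
    """True if the symbol would be read as a register name."""
    return symbol.upper() in X86_64_REGISTERS

for target in ("x86", "x86-64"):
    symbol = decorate("RSI", target=target)
    print(target, symbol, collides_with_register(symbol))
```

On x86 the variable becomes `_RSI`, which cannot be mistaken for a register; on x86-64 it stays `RSI`, which is exactly the collision described above.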
This RFC proposes a formalization of the Intel syntax that resolves these ambiguities by disallowing certain constructions.
- Indirect references shall always contain a mode specifier. Plain brackets are no longer allowed.

      MOV EAX, [RCX]                        ; invalid: operand size and mode specifier are required
      MOV EAX, DWORD [RCX]                  ; invalid: mode specifier is required
      MOV EAX, DWORD PTR [RCX]              ; valid: `movl (%rcx), %eax`
      VMULPD ZMM0, ZMM1, QWORD BCST [RCX]   ; valid: `vmulpd (%rcx){1to8}, %zmm1, %zmm0`
      LEA RAX, bx[RIP]                      ; invalid: operand size and mode specifier are required
      LEA RAX, BYTE PTR bx[RIP]             ; valid: `leaq bx(%rip), %rax`
- Overriding segment registers shall occur before the operand size and mode specifier.

      MOV EAX, DWORD PTR CS:[RCX]   ; maybe invalid: symbol name cannot contain `:`
      MOV EAX, CS:DWORD PTR [RCX]   ; valid: `movl %cs:(%rcx), %eax`
- If an identifier follows `PTR`, `BCST` or `OFFSET`, then it is always treated as a symbol, even when it is a keyword. In other words, only registers are enclosed within brackets. This idea is shared with GAS syntax.

      MOV EAX, printf                   ; invalid: `printf` is not a known register
      MOV EAX, OFFSET printf            ; valid: `movl $printf, %eax`
      MOV EAX, RCX                      ; invalid: operand size mismatch
      MOV EAX, OFFSET RCX               ; valid: `movl $RCX, %eax`
      MOV EAX, DWORD PTR [RCX]          ; valid: `movl (%rcx), %eax`
      MOV EAX, DWORD PTR RCX            ; valid: `movl rcx, %eax`
      MOV EAX, DWORD PTR RCX[RIP+10]    ; valid: `movl rcx+10(%rip), %eax`
- For instructions with a dummy memory operand (`LEA`, `NOP`, etc.) and those with an uncommon size (`FXSAVE`/`FXRSTOR`, `FNSAVE`/`FNRSTOR`, etc.), `BYTE PTR` shall be used.

      NOP DWORD PTR [RAX], EAX   ; invalid: `BYTE PTR` is required
      NOP BYTE PTR [RAX], EAX    ; valid: 0F 1F 00
- RIP-relative operands must have `RIP` as the base register.

      MOV EBX, DWORD PTR foo        ; valid: `movl foo, %ebx`
                                    ; note: might cause linker errors on x86-64
      MOV EBX, DWORD PTR foo[RIP]   ; valid: `movl foo(%rip), %ebx`
- The base, index, scale and displacement parts of a memory operand shall appear uniformly. The displacement comes first, immediately following the mode specifier. If there is at least a base or index register, they are all placed in a pair of square brackets. This idea is also shared with GAS syntax.

      MOV ECX, DWORD PTR [RSI+RDI*4+field]   ; invalid: `field` is not a known register
      MOV ECX, DWORD PTR field[RSI+RDI*4]    ; valid: `movl field(%rsi,%rdi,4), %ecx`
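Taken together, the rules above make operands simple enough to translate mechanically. The following Python sketch converts a single formalized operand into its AT&T form and rejects the constructions marked invalid above. It is a rough illustration under several assumptions: `to_att`, the register sample and the accepted size keywords are hypothetical, `BCST` operands are not handled, a decimal addend inside the brackets is folded into the displacement, and the spelling of a symbol is preserved as written.

```python
import re

# Hypothetical sample of register names; a real tool would list them all.
REGISTERS = {"EAX", "EBX", "ECX", "EDX", "RAX", "RBX", "RCX", "RDX",
             "RSI", "RDI", "RBP", "RSP", "RIP"}
IDENT = r"[A-Za-z_.$@?][\w.$@?]*"
MEMORY = re.compile(
    r"(?:(?P<seg>CS|DS|ES|FS|GS|SS):)?"     # segment override comes first
    r"(?:BYTE|WORD|DWORD|QWORD)\s+PTR\s+"   # operand size and mode specifier
    r"(?P<disp>" + IDENT + r")?\s*"         # displacement: always a symbol
    r"(?:\[(?P<regs>[^\[\]]+)\])?"          # registers: only inside brackets
)

def to_att(operand):
    """Translate one formalized Intel-syntax operand to AT&T syntax."""
    text = operand.strip()
    if text.upper() in REGISTERS:           # a bare identifier must be a register
        return "%" + text.lower()
    m = re.fullmatch(r"OFFSET\s+(" + IDENT + r")", text)
    if m:                                   # OFFSET: always a symbol
        return "$" + m.group(1)
    m = MEMORY.fullmatch(text)
    if not m or not (m.group("disp") or m.group("regs")):
        raise ValueError("invalid operand: " + operand)
    prefix = "%" + m.group("seg").lower() + ":" if m.group("seg") else ""
    offset = [m.group("disp")] if m.group("disp") else []
    if m.group("regs") is None:
        return prefix + offset[0]           # plain symbol, absolute address
    base = index = scale = ""
    for term in m.group("regs").split("+"):
        term = term.strip()
        if term.isdigit():                  # numeric addend, e.g. [RIP+10]
            offset.append(term)
            continue
        reg, _, factor = term.partition("*")
        reg = reg.strip()
        if reg.upper() not in REGISTERS:
            raise ValueError("`%s` is not a known register" % reg)
        if factor:
            index, scale = reg, factor.strip()
        elif not base:
            base = reg
        else:
            index, scale = reg, "1"
    inner = "%" + base.lower() if base else ""
    if index:
        inner += ",%" + index.lower() + "," + scale
    return prefix + "+".join(offset) + "(" + inner + ")"

print(to_att("DWORD PTR field[RSI+RDI*4]"))
print(to_att("CS:DWORD PTR [RCX]"))
```

Because a symbol can appear only directly after `PTR` or `OFFSET` and brackets can hold only registers, the sketch never has to guess whether `RCX` names a register or a variable.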