Commit b7b54d5

wazevo(docs): optimizing compiler (#2065)
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
1 parent 15cc0c5 commit b7b54d5

5 files changed

+1196
-1
lines changed

site/content/docs/_index.md

Lines changed: 2 additions & 1 deletion
@@ -143,7 +143,8 @@ Notably, the interpreter and compiler in wazero's [Runtime configuration][Runtim
 In wazero, a compiler is a runtime configured to compile modules to platform-specific machine code ahead of time (AOT)
 during the creation of [CompiledModule][CompiledModule]. This means your WebAssembly functions execute
 natively at runtime of the embedding Go program. Compiler is faster than Interpreter, often by an order of
-magnitude (10x) or more, and therefore enabled by default whenever available.
+magnitude (10x) or more, and therefore enabled by default whenever available. You can read more about wazero's
+[optimizing compiler in the detailed documentation]({{< relref "/how_the_optimizing_compiler_works" >}}).
 
 #### Interpreter
 
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+++
title = "How the Optimizing Compiler Works"
layout = "single"
+++

wazero supports two modes of execution: interpreter mode and compilation mode.
The interpreter mode is a fallback mode for platforms where compilation is not
supported. Compilation mode is otherwise the default mode of execution: it
translates Wasm modules to native code to get the best run-time performance.

Translating Wasm bytecode into machine code can take multiple forms. wazero
1.0 performs a straightforward translation from a given instruction to a native
instruction. wazero 2.0 introduces an optimizing compiler that is able to
perform nontrivial optimizing transformations, such as constant folding or
dead-code elimination, and it makes better use of the underlying hardware, such
as CPU registers. This document digs deeper into what we mean when we say
"optimizing compiler", and explains how it is implemented in wazero.

This document is intended for maintainers, researchers, developers and, in
general, anyone interested in understanding the internals of wazero.

What is an Optimizing Compiler?
-------------------------------

wazero supports an _optimizing_ compiler in the style of other optimizing
compilers such as LLVM's or V8's. Traditionally, an optimizing
compiler performs compilation in a number of steps.

Compare this to the **old compiler**, where compilation happens in one step or
two, depending on how you count:

```goat
   Input         +---------------+     +---------------+
Wasm Binary ---->| DecodeModule  |---->| CompileModule |----> wazero IR
                 +---------------+     +---------------+
```

That is, the module is (1) validated and then (2) translated to an Intermediate
Representation (IR). The wazero IR can then be executed directly (in the case
of the interpreter) or it can be further processed and translated into native
code by the compiler. This compiler performs a straightforward translation from
the IR to native code, without any further passes. The wazero IR is not intended
for further processing beyond immediate execution or straightforward
translation.

```goat
           +----- wazero IR -----+
           |                     |
           v                     v
    +--------------+      +--------------+
    |   Compiler   |      |  Interpreter |  - - - executable
    +--------------+      +--------------+
           |
     +-----+-------------+
     |                   |
     v                   v
+---------+         +---------+
|  ARM64  |         |  AMD64  |
| Backend |         | Backend | - - - - - - - - - executable
+---------+         +---------+
```

Validation and translation to an IR in a compiler are usually called the
**front-end** part of a compiler, while code-generation occurs in what we call
the **back-end** of a compiler. The front-end is the part of a compiler that is
closer to the input, and it generally covers machine-independent processing,
such as parsing and static validation. The back-end is the part of a compiler
that is closer to the output, and it generally includes machine-specific
procedures, such as code-generation.
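
To make the split concrete, here is a minimal sketch in which one machine-independent instruction, as a front-end might emit it, is lowered by two toy back-ends into architecture-specific text. All names (`IAdd`, `Backend`, the `v1`/`v2`/`v3` virtual registers) are hypothetical, and this is not how wazero's backends are structured. It only illustrates that the IR stays the same while each back-end makes its own machine-specific choices.

```go
package main

import "fmt"

// IAdd is a hypothetical machine-independent IR instruction, Dst = A + B,
// as a front-end might produce it without knowing the target architecture.
// Operands are virtual registers, since register allocation has not run yet.
type IAdd struct {
	Dst, A, B string
}

// Backend lowers IR instructions to architecture-specific assembly text.
type Backend interface {
	Lower(in IAdd) string
}

type arm64 struct{}
type amd64 struct{}

// ARM64 arithmetic has three operands, so the destination can differ from
// both sources (destination-first, GNU-style syntax).
func (arm64) Lower(in IAdd) string {
	return fmt.Sprintf("add %s, %s, %s", in.Dst, in.A, in.B)
}

// AMD64 arithmetic is two-operand (dst += src), so the back-end first copies
// A into Dst (destination-first, Intel-style syntax). This is only safe
// because Dst is a fresh value, distinct from A and B.
func (amd64) Lower(in IAdd) string {
	return fmt.Sprintf("mov %s, %s\nadd %s, %s", in.Dst, in.A, in.Dst, in.B)
}

func main() {
	in := IAdd{Dst: "v3", A: "v1", B: "v2"}
	for _, b := range []Backend{arm64{}, amd64{}} {
		fmt.Println(b.Lower(in))
	}
}
```

A real instruction selector has to respect many more constraints of this kind (operand forms, flags, addressing modes), which is exactly why such concerns are kept out of the front-end.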

In the **optimizing** compiler, we still decode and translate Wasm binaries to
an intermediate representation in the front-end, but we use a textbook
representation called **SSA** ("Static Single-Assignment" form) that is
intended for further transformation.
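
As an illustration of the "single assignment" property, the sketch below renames a straight-line sequence of assignments so that every variable is defined exactly once: each new definition gets a fresh versioned name, and later uses refer to the latest version. This is a toy that only handles code without branches; real SSA construction (wazero's included) must also merge values where control flow joins, using φ-nodes or equivalent block parameters, which this sketch does not attempt. The `Assign` type and `toSSA` function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Assign is a hypothetical three-address statement: Dst = Args[0] Op Args[1],
// or Dst = Args[0] when Op is empty. Operands are variable names or literals.
type Assign struct {
	Dst  string
	Op   string
	Args []string
}

// toSSA renames straight-line code into SSA form: each definition gets a
// fresh name (x -> x1, x2, ...), and each use refers to the most recent
// definition. Names never defined in the sequence (inputs, literals) are kept.
func toSSA(code []Assign) []Assign {
	version := map[string]int{} // variable -> latest version number
	current := func(name string) string {
		if v, ok := version[name]; ok {
			return fmt.Sprintf("%s%d", name, v)
		}
		return name
	}
	out := make([]Assign, 0, len(code))
	for _, s := range code {
		args := make([]string, len(s.Args))
		for i, a := range s.Args {
			args[i] = current(a) // uses are renamed before the new definition
		}
		version[s.Dst]++
		out = append(out, Assign{Dst: current(s.Dst), Op: s.Op, Args: args})
	}
	return out
}

func format(s Assign) string {
	if s.Op == "" {
		return fmt.Sprintf("%s = %s", s.Dst, s.Args[0])
	}
	return fmt.Sprintf("%s = %s", s.Dst, strings.Join(s.Args, " "+s.Op+" "))
}

func main() {
	code := []Assign{
		{Dst: "x", Args: []string{"a"}},
		{Dst: "x", Op: "+", Args: []string{"x", "1"}},
		{Dst: "y", Op: "*", Args: []string{"x", "x"}},
	}
	for _, s := range toSSA(code) {
		fmt.Println(format(s))
	}
	// x1 = a
	// x2 = x1 + 1
	// y1 = x2 * x2
}
```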

The benefit of choosing an IR that is meant for transformation is that many
optimization passes can operate directly on the IR, and are thus
machine-independent. The back-end can then be relatively simple, since it
only has to deal with machine-specific concerns.
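
A minimal sketch of why SSA suits machine-independent passes: because each value has exactly one definition, constant folding can inspect an operand's defining instruction directly, and dead-code elimination only needs to propagate liveness from instructions with side effects. The tiny IR below (the `Instr` type and its `const`/`add`/`ret` opcodes) is hypothetical and is not wazero's `ssa` package.

```go
package main

import "fmt"

// Instr is a hypothetical SSA instruction defining the value at its index.
// Op is "const", "add" or "ret"; Args refer to earlier instruction indices.
type Instr struct {
	Op   string
	Imm  int64 // payload for "const"
	Args []int
}

// foldConstants rewrites "add" instructions whose operands are both constants
// into a single "const". With one definition per value, looking at
// code[arg].Op is enough to know whether an operand is constant.
func foldConstants(code []Instr) {
	for i, in := range code {
		if in.Op != "add" {
			continue
		}
		a, b := code[in.Args[0]], code[in.Args[1]]
		if a.Op == "const" && b.Op == "const" {
			code[i] = Instr{Op: "const", Imm: a.Imm + b.Imm}
		}
	}
}

// eliminateDeadCode keeps the side-effecting roots ("ret" here) and
// everything they transitively use, returning the indices of live instructions.
func eliminateDeadCode(code []Instr) []int {
	live := make([]bool, len(code))
	// Definitions precede uses in straight-line SSA, so a single backward
	// pass propagates liveness to all operands.
	for i := len(code) - 1; i >= 0; i-- {
		if code[i].Op == "ret" {
			live[i] = true
		}
		if live[i] {
			for _, a := range code[i].Args {
				live[a] = true
			}
		}
	}
	var kept []int
	for i, ok := range live {
		if ok {
			kept = append(kept, i)
		}
	}
	return kept
}

func main() {
	code := []Instr{
		{Op: "const", Imm: 2},          // v0 = 2
		{Op: "const", Imm: 3},          // v1 = 3
		{Op: "add", Args: []int{0, 1}}, // v2 = v0 + v1, folds to 5
		{Op: "add", Args: []int{0, 0}}, // v3 = v0 + v0, folds to 4 but is unused
		{Op: "ret", Args: []int{2}},    // return v2
	}
	foldConstants(code)
	fmt.Println(code[2].Op, code[2].Imm) // const 5
	fmt.Println(eliminateDeadCode(code)) // [2 4]
}
```

Neither pass mentions registers or instruction encodings, which is what lets the same code run unchanged in front of every back-end.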

The wazero optimizing compiler implements the following compilation passes:

* Front-End:
  - Translation to SSA
  - Optimization
  - Block Layout
  - Control Flow Analysis

* Back-End:
  - Instruction Selection
  - Register Allocation
  - Finalization and Encoding

```goat
   Input        +-------------------+      +-------------------+
Wasm Binary --->|   DecodeModule    |----->|   CompileModule   |--+
                +-------------------+      +-------------------+  |
+-----------------------------------------------------------------+
|
|  +---------------+            +---------------+
+->|   Front-End   |----------->|   Back-End    |
   +---------------+            +---------------+
           |                            |
           v                            v
          SSA                 Instruction Selection
           |                            |
           v                            v
     Optimization              Register Allocation
           |                            |
           v                            v
     Block Layout             Finalization/Encoding
```

Like the other engines, the implementation can be found under `engine`, specifically
in the `wazevo` sub-package. The entry point is `internal/engine/wazevo/engine.go`,
where the `wasm.Engine` interface is implemented.

All the passes can be dumped to the console for debugging by enabling the build-time
flags in `internal/engine/wazevo/wazevoapi/debug_options.go`. The flags are disabled
by default and should only be enabled during debugging. They may also change in the future.

In the following, all paths are relative to `internal/engine/wazevo`,
so we omit the prefix.

## Index

- [Front-End](frontend/)
- [Back-End](backend/)
- [Appendix](appendix/)
Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
+++
title = "Appendix: Trampolines"
layout = "single"
+++

Trampolines are used to interface between the Go runtime and the generated
code, in two cases:

- when we need to **enter the generated code** from the Go runtime.
- when we need to **leave the generated code** to invoke a host function
  (written in Go).

In this section, we complete the picture of how a Wasm function gets
translated from Wasm to executable code in the optimizing compiler by
describing how execution jumps into the generated code at run-time.

## Entering the Generated Code

At run-time, user space invokes a Wasm function through the public
`api.Function` interface, using the methods `Call()` or `CallWithStack()`. The
implementation of these methods, in turn, eventually invokes an ASM
**trampoline**. The signature of this trampoline in Go code is:

```go
// Declared in Go without a body: the implementation is in assembly
// (backend/isa/<arch>/abi_entry_<arch>.s).
func entrypoint(
	preambleExecutable, functionExecutable *byte,
	executionContextPtr uintptr, moduleContextPtr *byte,
	paramResultStackPtr *uint64,
	goAllocatedStackSlicePtr uintptr)
```

- `preambleExecutable` is a pointer to the generated code for the preamble (see
  below).
- `functionExecutable` is a pointer to the generated code for the function (as
  described in the previous sections).
- `executionContextPtr` is a raw pointer to the `wazevo.executionContext`
  struct. This struct is used to save the state of the Go runtime before
  entering or leaving the generated code. It also holds shared state between the
  Go runtime and the generated code, such as the exit code that is used to
  terminate execution on failure, or to suspend it to invoke host functions.
- `moduleContextPtr` is a pointer to the `wazevo.moduleContextOpaque` struct.
  Its contents are essentially pointers to module-instance-specific objects as
  well as functions. This struct is sometimes called "VMContext" in other Wasm
  runtimes.
- `paramResultStackPtr` is a pointer to the slice where the arguments and
  results of the function are passed.
- `goAllocatedStackSlicePtr` is an aligned pointer to the Go-allocated stack
  for holding values and call frames. For further details, refer to
  [Backend § Prologue and Epilogue](../backend/#prologue-and-epilogue).
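
The `paramResultStackPtr` convention, where one `[]uint64` carries the arguments in and the results out, can be sketched in plain Go. We assume here (consistent with the shared `arg[i]/ret[i]` slots in the host-call stack diagram later in this appendix) that the slice must hold whichever is larger, the parameter count or the result count. `callWithStack` and the `compiled` function type are hypothetical stand-ins for the real `api.Function` machinery and the assembly trampoline.

```go
package main

import (
	"errors"
	"fmt"
)

// compiled is a hypothetical stand-in for a compiled Wasm function: it reads
// its arguments from stack[0:nParams] and writes results to stack[0:nResults].
type compiled func(stack []uint64)

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// callWithStack models the caller side of the convention: one slice carries
// the arguments in and the results out, so it needs max(nParams, nResults) slots.
func callWithStack(fn compiled, nParams, nResults int, stack []uint64) error {
	if len(stack) < maxInt(nParams, nResults) {
		return errors.New("stack too small for params/results")
	}
	fn(stack) // stands in for the jump through the entrypoint trampoline
	return nil
}

func main() {
	// (i64, i64) -> (i64, i64, i64): sum, difference and product.
	sumDiffProd := compiled(func(s []uint64) {
		a, b := s[0], s[1] // read the arguments before overwriting shared slots
		s[0], s[1], s[2] = a+b, a-b, a*b
	})
	stack := make([]uint64, maxInt(2, 3))
	stack[0], stack[1] = 7, 5
	if err := callWithStack(sumDiffProd, 2, 3, stack); err != nil {
		panic(err)
	}
	fmt.Println(stack) // [12 2 35]
}
```

Sharing a single slice avoids a second allocation per call; the cost is that the callee must read its arguments before writing results into the same slots.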

The trampoline can be found in `backend/isa/<arch>/abi_entry_<arch>.s`.

For each given architecture, the trampoline:

- moves the arguments to specific registers to match the behavior of the entry preamble or trampoline function, and
- finally, jumps into the execution of the generated code for the preamble.

The **preamble** that the `entrypoint` function jumps into is generated per function signature.

This is implemented in `machine.CompileEntryPreamble(*ssa.Signature)`.

The preamble sets the fields in the `wazevo.executionContext`.

At the beginning of the preamble:

- Set a register to point to the `*wazevo.executionContext` struct.
- Save the stack pointers, frame pointers, return addresses, etc. to that
  struct.
- Update the stack pointer to point to `goAllocatedStackSlicePtr`.

The generated preamble code assumes that it has been entered through the
aforementioned trampoline. Thus, it expects its arguments to be found in
specific registers.

The preamble then assigns the arguments pointed at by `paramResultStackPtr` to
the registers and stack locations that the generated code expects.
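
Deciding which register or stack location each argument goes to is a calling-convention assignment. A minimal sketch, loosely in the style of Go's register-based internal ABI referenced at the end of this appendix: integer and floating-point parameters consume separate register sequences, and any parameter that no longer fits spills to the next 8-byte stack slot. The register counts and the `int<k>`/`float<k>`/`stack[<n>]` naming are assumptions for illustration, not wazero's actual `arm64` or `amd64` conventions.

```go
package main

import "fmt"

// Kind distinguishes the two register classes.
type Kind int

const (
	Int Kind = iota
	Float
)

// assignParams returns, for each parameter, where it would be placed:
// "int<k>" / "float<k>" for the k-th register of that class, or "stack[<n>]"
// for the n-th 8-byte stack slot once that class runs out of registers.
// numIntRegs and numFloatRegs are hypothetical parameters.
func assignParams(params []Kind, numIntRegs, numFloatRegs int) []string {
	locs := make([]string, len(params))
	nextInt, nextFloat, nextSlot := 0, 0, 0
	for i, k := range params {
		switch {
		case k == Int && nextInt < numIntRegs:
			locs[i] = fmt.Sprintf("int%d", nextInt)
			nextInt++
		case k == Float && nextFloat < numFloatRegs:
			locs[i] = fmt.Sprintf("float%d", nextFloat)
			nextFloat++
		default:
			locs[i] = fmt.Sprintf("stack[%d]", nextSlot)
			nextSlot++
		}
	}
	return locs
}

func main() {
	params := []Kind{Int, Float, Int, Int, Float}
	fmt.Println(assignParams(params, 2, 4))
	// [int0 float0 int1 stack[0] float1]
}
```

Because the assignment depends only on the signature, it can be computed once per signature, which fits the preamble being generated per function signature rather than per function.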

Finally, it invokes the generated code for the function.

The epilogue reverses part of the process, finally returning control to the
caller of the `entrypoint()` function and to the Go runtime. The caller of
`entrypoint()` is also responsible for completing the cleanup procedure by
invoking `afterGoFunctionCallEntrypoint()` (again, implemented in
backend-specific ASM), which will restore the stack pointers and return
control to the caller of the function.

The arch-specific code can be found in
`backend/isa/<arch>/abi_entry_preamble.go`.

## Leaving the Generated Code

In "[How do compiler functions work?][how-do-compiler-functions-work]", we
already outlined how _leaving_ the generated code works with the help of a
function. Here we complete the picture by briefly describing the code that
is generated.

When the generated code needs to return control to the Go runtime, the compiler
emits a meta-instruction called `exitSequence` in both the `amd64` and `arm64`
backends. This meta-instruction sets the `exitCode` in the
`wazevo.executionContext` struct, restores the stack pointers and then returns
control to the caller of the `entrypoint()` function described above.

As described in "[How do compiler functions
work?][how-do-compiler-functions-work]", the mechanism is essentially the same
when invoking a host function or raising an error. However, when a host function
is invoked, the `exitCode` also indicates the identifier of the host function to
be invoked.

The magic really happens in the `backend.Machine.CompileGoFunctionTrampoline()`
method. This method is invoked when host modules are being
instantiated. It generates a trampoline that is used to invoke such functions
from the generated code.

This trampoline implements essentially the same prologue as the `entrypoint()`,
but it also reserves space for the arguments and results of the function to be
invoked.

A host function has the signature:

```go
func(ctx context.Context, stack []uint64)
```

The function arguments in the `stack` parameter are copied over to the reserved
slots of the real stack. For instance, on `arm64` the stack layout would look
as follows (on `amd64` it would be similar):

```goat
                      (high address)
         SP ------> +-----------------+ <----+
                    |     .......     |      |
                    |      ret Y      |      |
                    |     .......     |      |
                    |      ret 0      |      |
                    |      arg X      |      |  size_of_arg_ret
                    |     .......     |      |
                    |      arg 1      |      |
                    |      arg 0      | <----+ <-------- originalArg0Reg
                    | size_of_arg_ret |
                    |  ReturnAddress  |
                    +-----------------+ <----+
                    |      xxxx       |      |  ;; might be padded to make it 16-byte aligned.
               +--->|  arg[N]/ret[M]  |      |
      sliceSize|    |   ............  |      |  goCallStackSize
               |    |  arg[1]/ret[1]  |      |
               +--->|  arg[0]/ret[0]  | <----+ <-------- arg0ret0AddrReg
                    |    sliceSize    |
                    |   frame_size    |
                    +-----------------+
                       (low address)
```
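
The sizes in the lower half of the diagram follow from simple arithmetic. Assuming one 8-byte slot per value (as with the `[]uint64` stack), shared argument/result slots (as the `arg[i]/ret[i]` labels suggest), and `sliceSize` measured in bytes, the slot area needs `max(params, results)` slots, and padding rounds it up to the 16-byte alignment the diagram mentions. This sketches the arithmetic only: `frameSizes` and `alignUp` are hypothetical helpers, and the sketch ignores the extra words (such as `sliceSize` and `frame_size`) that the real trampoline also stores.

```go
package main

import "fmt"

// alignUp rounds n up to the next multiple of align, which must be a power of two.
func alignUp(n, align int) int {
	return (n + align - 1) &^ (align - 1)
}

// frameSizes is a hypothetical helper: with one 8-byte slot per value and
// shared argument/result slots, the slot area holds max(nParams, nResults)
// slots, padded to 16-byte alignment.
func frameSizes(nParams, nResults int) (sliceSize, goCallStackSize int) {
	slots := nParams
	if nResults > slots {
		slots = nResults
	}
	sliceSize = slots * 8
	goCallStackSize = alignUp(sliceSize, 16)
	return sliceSize, goCallStackSize
}

func main() {
	for _, c := range [][2]int{{3, 1}, {2, 2}, {0, 1}} {
		s, g := frameSizes(c[0], c[1])
		fmt.Printf("params=%d results=%d: sliceSize=%d goCallStackSize=%d\n", c[0], c[1], s, g)
	}
}
```

For example, three parameters and one result need 24 bytes of slots, which the padding (the `xxxx` slot in the diagram) rounds up to 32.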

Finally, the trampoline jumps into the execution of the host function using the
`exitSequence` meta-instruction.

Upon return, the process is reversed.

## Code

- The trampoline to enter the generated function is implemented by the
  `backend.Machine.CompileEntryPreamble()` method.
- The trampoline to return traps and invoke host functions is generated by the
  `backend.Machine.CompileGoFunctionTrampoline()` method.

You can find arch-specific implementations in
`backend/isa/<arch>/abi_go_call.go`,
`backend/isa/<arch>/abi_entry_preamble.go`, etc. The trampolines are found
under `backend/isa/<arch>/abi_entry_<arch>.s`.

## Further References

- Go's [internal ABI documentation][abi-internal] details a calling convention similar to the one we use in both the arm64 and amd64 backends.
- Raphael Poss's [The Go low-level calling convention on
  x86-64][go-call-conv-x86] is also an excellent reference for `amd64`.

[abi-internal]: https://tip.golang.org/src/cmd/compile/abi-internal
[go-call-conv-x86]: https://dr-knz.net/go-calling-convention-x86-64.html
[proposal-register-cc]: https://go.googlesource.com/proposal/+/master/design/40724-register-calling.md#background
[how-do-compiler-functions-work]: ../../how_do_compiler_functions_work/