Decompiles binary wasm modules into a text format that is significantly more compact and familiar (for users of C-style languages).
Example:
bin/wasm-decompile test.wasm -o test.dcmp
This tool is aimed at users who want to "read" large volumes of Wasm code, such as language, runtime and tool developers, or any programmers who do not have the source code of the generated wasm available or are trying to understand what the generated code does.
The syntax has been designed to be as light-weight and as readable as possible, while still allowing one to see the underlying Wasm constructs clearly.
It is not a goal of this format to be a programming language.
Though compiling this output code back into a wasm module is possible, such functionality is currently not provided. The format is very low-level, much like Wasm itself, so even though it looks higher level than the .wat format, it wouldn't be any more suitable for general purpose programming.
This section shows some aspects of the language in terms of how they map to Wasm and/or how they might differ from a typical C-like language. It does not try to define the actual semantics of Wasm; the reader is expected to already be mostly familiar with that.
wasm-decompile, much like wasm2wat, derives names from the name section (preferably), or linker symbols (if available), or import/export names (if not available in the other two). Things that have no names get generated names, starting from a, b, c and so forth.
In addition, prefixes are used for things that are not arguments/locals: f_ for functions, g_ for globals, etc.
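The naming scheme above can be sketched roughly as follows. This is an illustration only, not the tool's code: the helper names are hypothetical, and continuing with aa, ab, ... after z is an assumption, since the text only shows a, b, c.

```python
def alpha_name(index: int) -> str:
    """Map 0, 1, 2, ... to a, b, ..., z, then aa, ab, ... (assumed continuation)."""
    name = ""
    index += 1  # work in bijective base-26 so that 0 maps to "a"
    while index > 0:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("a") + rem) + name
    return name


# Only the two prefixes named in the text; other kinds are not listed there.
PREFIXES = {"function": "f_", "global": "g_"}


def generated_name(kind: str, index: int) -> str:
    """Arguments and locals get a bare name, other kinds get a prefix."""
    return PREFIXES.get(kind, "") + alpha_name(index)


print(generated_name("local", 0))     # a
print(generated_name("function", 1))  # f_b
print(generated_name("global", 26))   # g_aa
```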
Existing names may be "demangled" C++ function signatures, which in the case of functions using STL types may end up several hundred characters long. Besides removing characters not typically part of an identifier, the decompiler also strips common keywords/types from these in an effort to reduce their size.
Linker symbols are typically only available in wasm .o files, though if useful for naming they can be retained in fully linked wasm modules by passing the --emit-relocs flag to wasm-ld. This gives you names for most functions even when --strip-debug was used.
Top level items may be preceded with import or export.
Memory is declared like memory m(initial: 1, max: 0);
Globals: global my_glob:int;
Data: data d_a(offset: 0) = "Hello, World!";
Functions (see below for instructions that may appear between {}):
function f(a:int, b:int):int { return a + b; }
An expression is generated for any sequence of Wasm instructions that leave exactly 1 value on the stack.
For instructions that leave no values on the stack, a statement is generated, which is an expression that sits on its own line in the context of a control-flow block, or the function itself. A statement may also be generated for expressions that return a value through control flow, such as a branch instruction.
Instructions that leave multiple values on the stack, or otherwise do stack operations that break the "expression order", instead force the values to be written to temporary variables (named t1, t2, etc.) which the subsequent instructions can then operate upon (this does not happen with MVP-only code).
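As a rough illustration of how stack instructions fold into expressions and statements, here is a toy sketch. It is not the decompiler's algorithm: the function name and the (opcode, operand) tuple encoding are hypothetical, only a handful of opcodes are handled, and it always parenthesizes binary operations instead of applying operator precedence.

```python
def decompile(instructions):
    """Turn a flat list of (opcode, operand) pairs into statement strings."""
    binary_ops = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}
    stack = []       # expression text for each value currently on the stack
    statements = []  # finished statements, in program order
    for opcode, operand in instructions:
        if opcode == "local.get":
            stack.append(operand)
        elif opcode == "i32.const":
            stack.append(str(operand))
        elif opcode in binary_ops:
            rhs = stack.pop()  # the right operand was pushed last
            lhs = stack.pop()
            stack.append(f"({lhs} {binary_ops[opcode]} {rhs})")
        elif opcode == "local.set":
            # Leaves nothing on the stack, so it becomes a statement.
            statements.append(f"{operand} = {stack.pop()};")
        elif opcode == "return":
            statements.append(f"return {stack.pop()};")
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return statements


code = [
    ("local.get", "a"),
    ("local.get", "b"),
    ("i32.add", None),
    ("return", None),
]
print(decompile(code))  # ['return (a + b);']
```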
Arguments are defined in the function signature, as shown above.
Locals are defined upon first use: var my_local:int = 1;
The decompiler uses int and long for 32-bit and 64-bit integers, and float and double for 32-bit and 64-bit floating point numbers.
Besides these, there are the types byte and ubyte (8-bit), short and ushort (16-bit), and uint, which are used exclusively with certain load/store operations.
Loads and stores tend to be the hardest to "read" in Wasm code, as they've lost all context of the data structures and types that the source language was operating upon.
wasm-decompile has a few features to try and make these more readable.
The basic form looks like an array indexing operation, so o[2]:int says: read element 2 from o when seen as an array of ints. This thus accesses 4 bytes at byte-offset 8.
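To make the offset arithmetic concrete, the sketch below models linear memory as a Python bytearray and reads o[2]:int from it. The load helper and the FORMATS table are illustrative names, not part of any tool. The widths follow the type list earlier in this guide, while treating uint as unsigned 32-bit is an assumption.

```python
import struct

# struct format codes for the decompiler's type names (uint width is assumed).
FORMATS = {
    "byte": "b", "ubyte": "B",
    "short": "h", "ushort": "H",
    "int": "i", "uint": "I",
    "long": "q",
    "float": "f", "double": "d",
}


def load(memory, base, index, type_name):
    """Model `base[index]:type_name`: the index is scaled by the type size."""
    fmt = "<" + FORMATS[type_name]  # Wasm linear memory is little-endian
    size = struct.calcsize(fmt)
    return struct.unpack_from(fmt, memory, base + index * size)[0]


memory = bytearray(64)
o = 16                                     # a "pointer" is just an int
struct.pack_into("<i", memory, o + 8, 42)  # store 42 at byte-offset 8 from o
print(load(memory, o, 2, "int"))           # 42
```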
o is just declared as an int, since there is no such thing as a pointer type in Wasm. But wasm-decompile tries to derive them. For example, if the code is doing o[0]:int = o[1]:int + o[2]:int, then wasm-decompile assumes o points to a struct with 3 ints, and may instead decompile this to:
var o:{ a:int, b:int, c:int };
o.a = o.b + o.c
The {} type is a nameless struct declaration (named ones tbd) that hints the reader at what kind of memory layout o is accessing. This seems more informative than just uncorrelated indices all over the code.
Sadly, optimized output from a compiler like LLVM often reworks memory accesses in such crazy ways that this "struct detection" fails. For example, it falls back to indexing operations when there are holes or overlaps in the memory layout, or types are mixed, etc. This happens even more so when locals such as o are being re-used for unrelated things in memory.
For accesses that are not contiguous, but at least of the same type, the decompiler will change the pointer type from o:int to e.g. o:float_ptr (and similarly, it will omit the type from the actual access, o[2] instead of o[2]:int).
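The three outcomes described above (struct, typed pointer, plain indexing) can be pictured with a toy classifier over the accesses seen through one local. This is a simplified guess at the kind of check involved, not the decompiler's actual heuristic: the classify function, its return strings and the SIZES table are all hypothetical, and it treats two different types at the same offset as an overlap.

```python
SIZES = {
    "byte": 1, "ubyte": 1, "short": 2, "ushort": 2,
    "int": 4, "uint": 4, "float": 4, "long": 8, "double": 8,
}


def classify(accesses):
    """accesses: iterable of (byte_offset, type_name) seen through one local."""
    fields = sorted(set(accesses))
    # A struct needs fields that start at 0 and follow each other exactly:
    # any hole or overlap breaks the layout.
    expected = 0
    contiguous = True
    for offset, type_name in fields:
        if offset != expected:
            contiguous = False
            break
        expected = offset + SIZES[type_name]
    if contiguous:
        return "struct"
    # Not contiguous, but every access uses the same type: typed pointer.
    types = {type_name for _, type_name in fields}
    if len(types) == 1:
        return f"{types.pop()}_ptr"
    # Otherwise fall back to plain indexing such as o[2]:int.
    return "index"


print(classify([(0, "int"), (4, "int"), (8, "int")]))  # struct
print(classify([(0, "float"), (8, "float")]))          # float_ptr
print(classify([(0, "int"), (8, "float")]))            # index
```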
Additionally, wasm-decompile tries to clean up typical indexing operations. For example, when accessing any array of 32-bit elements, generated Wasm code often looks like (base + (index << 2))[0]:int, since Wasm has no built-in way to scale the index by the type of thing being loaded. wasm-decompile then transforms this into just base[index]:int, since the scaling of anything between the [] by the type size is already implied.
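The index clean-up can be sketched as a small pattern match on the address expression. The tuple encoding ("op", lhs, rhs) and the render/render_load helpers are hypothetical and only cover the single pattern described above; the real tool may recognize more shapes.

```python
TYPE_SIZES = {"byte": 1, "short": 2, "int": 4, "float": 4, "long": 8, "double": 8}


def render(expr):
    """Render a nested ("op", lhs, rhs) tuple, parenthesizing every operation."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return f"({render(lhs)} {op} {render(rhs)})"
    return str(expr)


def render_load(address, type_name):
    """Render a load of type_name at element 0 of address, simplifying
    `base + (index << k)` when 2**k equals the size of the loaded type."""
    if isinstance(address, tuple) and address[0] == "+":
        _, base, offset = address
        if (isinstance(offset, tuple) and offset[0] == "<<"
                and isinstance(offset[2], int)
                and (1 << offset[2]) == TYPE_SIZES[type_name]):
            return f"{render(base)}[{render(offset[1])}]:{type_name}"
    return f"{render(address)}[0]:{type_name}"


print(render_load(("+", "base", ("<<", "index", 2)), "int"))  # base[index]:int
print(render_load(("+", "base", ("<<", "index", 3)), "int"))  # (base + (index << 3))[0]:int
```

The second call is left alone because shifting by 3 scales by 8 bytes, which does not match the 4-byte int being loaded.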
Wasm's if-then maps fairly directly to a C-like if (c) { 1; } else { 2; }. Unlike most languages, these if-thens can also be expressions, as shown in this example (wasm-decompile does not currently use the ?: ternary).
Wasm's loop becomes a loop L { ...; continue L; } structure. The inclusion of a label means nested loops can continue any of them.
Wasm's blocks are little more than a label for forward jumps, and cause excessive amounts of nesting in other text formats such as .wat, so here they are reduced to what they naturally are: a label. This label uses {} for denoting a block only when used as an expression, so typically does not indent, and thus doesn't cause endless nesting:
if (c) goto L;
...
label L:
wasm-decompile uses the following operator precedence to reduce the amount of () needed in expressions, from high (needs no ()) to low (always needs () when nested):
(), a, 1, a()
[]
if () {} else {}
*, /, %
+, -
<<, >>
==, !=, <, >, >=, <=
&, |
min, max
=
Only + and * are associative, i.e. several of them can appear in sequence without additional ().
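One way to read these rules is the following toy expression printer. It is a sketch under assumptions, not the tool's code: it covers only the infix operators from the list above (min and max are left out), the tuple encoding and helper names are hypothetical, and it assumes that a nested operation of equal precedence gets () unless it is the same associative operator.

```python
# Infix operator levels from the list above, highest precedence first.
LEVELS = [
    ["*", "/", "%"],
    ["+", "-"],
    ["<<", ">>"],
    ["==", "!=", "<", ">", ">=", "<="],
    ["&", "|"],
    ["="],
]
# A larger number binds tighter.
PRECEDENCE = {op: len(LEVELS) - level
              for level, ops in enumerate(LEVELS) for op in ops}
ATOM = len(LEVELS) + 1      # names and constants never need ()
ASSOCIATIVE = {"+", "*"}


def prec(expr):
    return PRECEDENCE[expr[0]] if isinstance(expr, tuple) else ATOM


def render(expr):
    """Render a nested ("op", lhs, rhs) tuple with as few () as the rules allow."""
    if not isinstance(expr, tuple):
        return str(expr)
    op = expr[0]

    def child(sub):
        text = render(sub)
        if prec(sub) > prec(expr):
            return text                  # binds tighter: no () needed
        if isinstance(sub, tuple) and sub[0] == op and op in ASSOCIATIVE:
            return text                  # a + b + c, a * b * c
        return f"({text})"

    return f"{child(expr[1])} {op} {child(expr[2])}"


print(render(("+", ("*", "a", "b"), "c")))  # a * b + c
print(render(("*", ("+", "a", "b"), "c")))  # (a + b) * c
print(render(("+", ("+", "a", "b"), "c")))  # a + b + c
print(render(("-", ("-", "a", "b"), "c")))  # (a - b) - c
```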