Last updated: 09/04/2022
First of all, thank you for being interested in contributing to Maat! Here are the general steps to follow:
- clone Maat's repository on your development machine
- follow instructions in HACKING.md to build and test Maat in developer mode
- create a new branch and implement your feature in it. You should base your branch on master for bug fixes and on dev-next for new features
- create a pull request for your branch
If you're wondering whether a given feature is worth implementing, or if you're unsure about how it should be integrated into the existing codebase, feel free to open a discussion on our GitHub so that we can talk about it beforehand.
Similarly, if you're working on a new feature and questions arise before it is ready for review, don't hesitate to open a draft pull request for it. We'll use it to answer questions you might have, give traceable feedback, and make code comments and suggestions.
If you're interested in adding an interface for a new architecture in Maat, here are some pointers and general steps to follow. (Please note that here we're talking about adding architectures that are already supported by Ghidra to Maat, not writing a specification for an entirely new architecture.)
This guide takes the example of the X86 architecture.
The first step is to add the architecture definition in src/include/maat/arch.hpp.
You'll need to add the architecture to the Arch::Type enum (e.g. Arch::Type::X86), and add the corresponding CPU mode(s) to the CPUMode enum (e.g. CPUMode::X86).
Then, create a new namespace for the architecture which will contain unique identifiers for the architecture's CPU registers, and a specialisation of the generic Arch class:
/// Namespace for X86-32 specific definitions and classes
namespace X86
{
    /* Registers */
    static constexpr reg_t EAX = 0; ///< General purpose register
    static constexpr reg_t EBX = 1; ///< General purpose register
    ...
    static constexpr reg_t NB_REGS = 69;

    class ArchX86: public Arch
    {
        ...
    };
}
Finally, you can implement the Arch subclass (e.g. ArchX86) in the src/arch/ folder. Look at the other architectures implemented in src/arch and follow the same pattern. There are only a few methods to implement, such as:
- generic getters for the stack and program counter registers
- getting the size of a register
- register <-> register string translation (e.g. maat::X86::EAX <-> "eax")
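As a rough illustration of the methods listed above, here is a minimal, self-contained sketch of a toy architecture class. Everything in the `sketch` namespace (the `ToyArch` class, its method names, the four registers and the `reg_t` typedef) is a hypothetical stand-in, not Maat's actual `Arch` interface; refer to the existing classes in src/arch for the real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

namespace sketch
{
using reg_t = std::uint16_t; // stand-in for Maat's register identifier type

// Hypothetical registers of a toy 32-bit architecture
static constexpr reg_t R0 = 0;
static constexpr reg_t R1 = 1;
static constexpr reg_t SP = 2;
static constexpr reg_t PC = 3;
static constexpr reg_t NB_REGS = 4;

class ToyArch
{
public:
    ToyArch()
    {
        add("r0", R0);
        add("r1", R1);
        add("sp", SP);
        add("pc", PC);
    }

    // Generic getters for the stack pointer and program counter
    reg_t sp() const { return SP; }
    reg_t pc() const { return PC; }

    // Size of a register in bits (every toy register is 32 bits wide)
    std::size_t reg_size(reg_t reg) const
    {
        if (reg >= NB_REGS)
            throw std::out_of_range("unknown register id");
        return 32;
    }

    // Register id <-> register name translation;
    // both throw std::out_of_range on unknown input
    const std::string& reg_name(reg_t reg) const { return names_.at(reg); }
    reg_t reg_num(const std::string& name) const { return ids_.at(name); }

private:
    void add(const std::string& name, reg_t reg)
    {
        ids_[name] = reg;
        names_[reg] = name;
    }

    std::unordered_map<std::string, reg_t> ids_;
    std::unordered_map<reg_t, std::string> names_;
};

// Usage: names and ids translate back and forth
inline void demo()
{
    const ToyArch arch;
    assert(arch.reg_num("sp") == arch.sp());
    assert(arch.reg_name(PC) == "pc");
}
} // namespace sketch
```

Keeping both directions of the translation in one `add()` helper avoids the two tables drifting apart when a register is added.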
Once the architecture is implemented, don't forget to add the corresponding file to the source files list in the top-level CMakeLists.txt.
Also don't forget to register the new architecture by adding the appropriate line in bindings/python/py_arch.cpp so that the architecture becomes available in the ARCH enumeration of the Python bindings.
Maat optionally allows declaring registers that are aliases for other registers. For instance, for X86, Maat uses individual registers for the flags (CF, OF, ZF, etc.). However, we created the EFLAGS alias that corresponds to all flags combined. Aliases are defined in the register enumeration just like other registers:
namespace X86
{
    /* Registers */
    ....
    static constexpr reg_t EFLAGS = ... ;
    ...
}
Aliases then need to be supported by custom getter and setter functions, which are called when the alias is read or assigned in the CPU. Getters and setters for register aliases are implemented in src/arch/register_aliases.cpp. For X86 the functions are x86_alias_setter and x86_alias_getter. You can see how the setter takes a 32-bit value for EFLAGS and then assigns each individual flag register according to that value. Similarly, the getter reads each flag register and concatenates them to return a 32-bit value for EFLAGS.
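To make the getter/setter idea concrete, here is a small standalone sketch that packs and unpacks a few flags using the standard x86 EFLAGS bit positions (CF = bit 0, PF = 2, AF = 4, ZF = 6, SF = 7, DF = 10, OF = 11). The `Flags` struct and the `eflags_get`/`eflags_set` functions are hypothetical names for illustration only; they do not reproduce the actual code in src/arch/register_aliases.cpp, and reserved bits are simply ignored here.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch
{
// Stand-in for flag registers that are stored individually
struct Flags
{
    bool cf = false; // carry flag,           EFLAGS bit 0
    bool pf = false; // parity flag,          EFLAGS bit 2
    bool af = false; // auxiliary carry flag, EFLAGS bit 4
    bool zf = false; // zero flag,            EFLAGS bit 6
    bool sf = false; // sign flag,            EFLAGS bit 7
    bool df = false; // direction flag,       EFLAGS bit 10
    bool of = false; // overflow flag,        EFLAGS bit 11
};

// Place a single flag at the given bit position
inline std::uint32_t bit(bool flag, unsigned pos)
{
    return static_cast<std::uint32_t>(flag) << pos;
}

// Getter: concatenate the individual flags into one 32-bit value
inline std::uint32_t eflags_get(const Flags& f)
{
    return bit(f.cf, 0) | bit(f.pf, 2) | bit(f.af, 4) | bit(f.zf, 6)
         | bit(f.sf, 7) | bit(f.df, 10) | bit(f.of, 11);
}

// Setter: split a 32-bit value back into the individual flags
inline void eflags_set(Flags& f, std::uint32_t value)
{
    f.cf = ((value >> 0) & 1u) != 0;
    f.pf = ((value >> 2) & 1u) != 0;
    f.af = ((value >> 4) & 1u) != 0;
    f.zf = ((value >> 6) & 1u) != 0;
    f.sf = ((value >> 7) & 1u) != 0;
    f.df = ((value >> 10) & 1u) != 0;
    f.of = ((value >> 11) & 1u) != 0;
}

// Usage: 0x841 has bits 0, 6 and 11 set, i.e. CF, ZF and OF
inline void demo()
{
    Flags f;
    eflags_set(f, 0x841u);
    assert(f.cf && f.zf && f.of);
    assert(!f.pf && !f.af && !f.sf && !f.df);
    assert(eflags_get(f) == 0x841u);
}
} // namespace sketch
```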
Note that aliases must be explicitly listed as such. The set of aliases is passed to the CPU, which then knows whether a register is an alias, and can call the custom getter/setter when required:
std::set x86_aliases{X86::EFLAGS};
Last but not least, update the CPUContext::init_alias_getset() method to support the new architecture, e.g.:
void CPUContext::init_alias_getset(Arch::Type arch)
{
    if (arch == Arch::Type::X86)
    {
        alias_setter = x86_alias_setter;
        alias_getter = x86_alias_getter;
        aliased_regs = x86_aliases;
    }
    ...
    // Your architecture goes here
}
Keep in mind that register aliasing comes with a runtime cost, and should be used sparingly.
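The dispatch described above (check whether a register is in the alias set, then call the custom getter or setter) can be sketched as follows. This is a toy model under stated assumptions: `ToyCpu`, its function-pointer types, and the `LO`/`HI`/`PAIR` registers are all hypothetical and only mirror the general idea, not Maat's `CPUContext`. It also shows where the runtime cost comes from: every read and write pays for a set lookup.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

namespace sketch
{
using reg_t = std::uint16_t; // stand-in for Maat's register identifier type

static constexpr reg_t LO = 0;   // real 8-bit register
static constexpr reg_t HI = 1;   // real 8-bit register
static constexpr reg_t PAIR = 2; // alias for the 16-bit value HI:LO
static constexpr reg_t NB_REGS = 3;

class ToyCpu
{
public:
    using getter_t = std::uint32_t (*)(const ToyCpu&, reg_t);
    using setter_t = void (*)(ToyCpu&, reg_t, std::uint32_t);

    ToyCpu(std::set<reg_t> aliases, getter_t getter, setter_t setter)
        : aliased_regs_(std::move(aliases)), alias_getter_(getter), alias_setter_(setter)
    {}

    std::uint32_t get(reg_t reg) const
    {
        // Aliases are never stored: their value is computed on demand
        if (aliased_regs_.count(reg) != 0)
            return alias_getter_(*this, reg);
        return regs_.at(reg);
    }

    void set(reg_t reg, std::uint32_t value)
    {
        if (aliased_regs_.count(reg) != 0)
        {
            alias_setter_(*this, reg, value);
            return;
        }
        regs_.at(reg) = value;
    }

private:
    std::array<std::uint32_t, NB_REGS> regs_{};
    std::set<reg_t> aliased_regs_;
    getter_t alias_getter_;
    setter_t alias_setter_;
};

// Custom getter: PAIR is the only alias, built from HI and LO
inline std::uint32_t toy_alias_getter(const ToyCpu& cpu, reg_t)
{
    return ((cpu.get(HI) & 0xffu) << 8) | (cpu.get(LO) & 0xffu);
}

// Custom setter: split the 16-bit value into the two real registers
inline void toy_alias_setter(ToyCpu& cpu, reg_t, std::uint32_t value)
{
    cpu.set(LO, value & 0xffu);
    cpu.set(HI, (value >> 8) & 0xffu);
}

// Usage: writing the alias updates the real registers, and vice versa
inline void demo()
{
    ToyCpu cpu({PAIR}, toy_alias_getter, toy_alias_setter);
    cpu.set(PAIR, 0x1234u);
    assert(cpu.get(LO) == 0x34u);
    assert(cpu.get(HI) == 0x12u);
    cpu.set(LO, 0xffu);
    assert(cpu.get(PAIR) == 0x12ffu);
}
} // namespace sketch
```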
Now that the new architecture is specified, we need to integrate it in various places in the source code.
First of all, update the MaatEngine::MaatEngine() constructor in src/engine/engine.cpp to handle the new architecture.
Then, update the Lifter::Lifter() constructor in src/arch/lifter.cpp. The only thing you need to do is point it to the Ghidra sleigh specification files (.sla & .pspec) for the architecture you're adding. Tip: browse Ghidra's processor list to find the names of the spec files to use.
Last but not least, explicitly add the required sleigh specification files as targets in the top-level CMakeLists.txt. Use the maat_sleigh_compile() macro, which takes two arguments:
- the directory within Ghidra's processor list which corresponds to the architecture (e.g. x86/)
- the prefix name for the .sla and .pspec files (it should be the same prefix for both; if not, please report it to us)
For example for X86 and X64 you would add:
maat_sleigh_compile(x86 x86-64)
maat_sleigh_compile(x86 x86)
To finish integrating the new architecture in Maat, you have to declare a new register translator function in src/include/maat/sleigh_interface.hpp. It is a function that takes a register name (as defined by Ghidra's processor specification files), and returns the corresponding Maat register. For example, for X86 we have:
maat::ir::Param sleigh_reg_translate_X86(const std::string& reg_name);
Then implement the function in src/third-party/sleigh/native/reg_translator.cpp:
maat::ir::Param sleigh_reg_translate_X86(const std::string& reg_name)
{
    if (reg_name == "AL") return maat::ir::Reg(maat::X86::EAX, 7, 0);
    if (reg_name == "AH") return maat::ir::Reg(maat::X86::EAX, 15, 8);
    ...
}
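The translator above maps each Ghidra register name to a Maat register plus a bit range (high bit, low bit). The standalone sketch below illustrates that mapping with a lookup table and shows what the bit range means by extracting the corresponding bits from a 32-bit value. `RegSlice`, `translate` and `extract` are hypothetical helpers written for this illustration; they are not part of Maat's `maat::ir` API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

namespace sketch
{
using reg_t = std::uint16_t; // stand-in for Maat's register identifier type

static constexpr reg_t EAX = 0;
static constexpr reg_t EBX = 1;

// A register together with the bit range [high_bit:low_bit] being accessed
struct RegSlice
{
    reg_t reg;
    unsigned high_bit;
    unsigned low_bit;
};

// Returns the slice for a register name, or nullptr if the name is unknown
inline const RegSlice* translate(const std::string& name)
{
    static const std::unordered_map<std::string, RegSlice> table = {
        {"AL", {EAX, 7, 0}},
        {"AH", {EAX, 15, 8}},
        {"AX", {EAX, 15, 0}},
        {"EAX", {EAX, 31, 0}},
        {"BL", {EBX, 7, 0}},
    };
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}

// Extract the bits selected by a slice from a full 32-bit register value
inline std::uint32_t extract(std::uint32_t full, const RegSlice& slice)
{
    const unsigned width = slice.high_bit - slice.low_bit + 1;
    const std::uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (full >> slice.low_bit) & mask;
}

// Usage: AH designates bits 15..8 of EAX
inline void demo()
{
    const RegSlice* ah = translate("AH");
    assert(ah != nullptr);
    assert(ah->reg == EAX);
    assert(extract(0x12345678u, *ah) == 0x56u);
    assert(translate("NOT_A_REG") == nullptr);
}
} // namespace sketch
```

A table keeps the name list easy to audit against Ghidra's processor specification; a chain of `if` statements as in the X86 translator works just as well for a handful of registers.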
Finally, update the reg_name_to_maat_reg() function in src/third-party/sleigh/native/sleigh_interface.cpp to make it call your register translator function for the new architecture.
After the new architecture is implemented, we recommend adding some unit tests to check that instructions are properly emulated. This will help catch errors that might have gone unnoticed during implementation.
Unit tests live in the tests/unit-tests/ folder. You can create a new test file (e.g. test_archX86.cpp) in this folder with a function that runs all the tests we want on the architecture (e.g. void test_archX86();). For guidance on how we write tests, check out the Writing tests section.
We usually write a small dedicated test function for each assembly instruction we want to test. This test function runs the instruction with given inputs and checks that the outputs (CPU and memory changes) are correct with respect to the instruction semantics. It can run the instruction several times on different inputs, or run several variants of the instruction (for instance switching between register, immediate, and memory operands).
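The per-instruction pattern can be sketched with a table of inputs and expected outputs. The example below does not use Maat at all: `ToyCpu` and `exec_add` are hypothetical stand-ins that model a 32-bit "add r0, r1" by hand, so that the shape of the test function (set inputs, run, check registers and flags) can be shown in a self-contained way. A real test would run the instruction through the engine instead.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch
{
// Toy CPU state: two registers and two flags
struct ToyCpu
{
    std::uint32_t r0 = 0;
    std::uint32_t r1 = 0;
    bool cf = false; // carry flag
    bool zf = false; // zero flag
};

// Hand-written semantics of a toy "add r0, r1" instruction
inline void exec_add(ToyCpu& cpu)
{
    const std::uint64_t wide = static_cast<std::uint64_t>(cpu.r0) + cpu.r1;
    cpu.r0 = static_cast<std::uint32_t>(wide);
    cpu.cf = (wide >> 32) != 0;
    cpu.zf = (cpu.r0 == 0);
}

// One test case: inputs followed by the expected outputs
struct AddCase
{
    std::uint32_t a;
    std::uint32_t b;
    std::uint32_t expected;
    bool cf;
    bool zf;
};

// Dedicated test function for the instruction.
// Returns the number of checks performed, throws on the first failure.
inline unsigned test_add()
{
    const std::vector<AddCase> cases = {
        {1u, 2u, 3u, false, false},
        {0xffffffffu, 1u, 0u, true, true},           // wraps around to zero
        {0x80000000u, 0x80000001u, 1u, true, false}, // carry without zero
        {0u, 0u, 0u, false, true},
    };
    unsigned nb_checks = 0;
    for (const AddCase& c : cases)
    {
        ToyCpu cpu;
        cpu.r0 = c.a;
        cpu.r1 = c.b;
        exec_add(cpu);
        if (cpu.r0 != c.expected || cpu.cf != c.cf || cpu.zf != c.zf)
            throw std::runtime_error("add: unexpected result or flags");
        nb_checks += 3;
    }
    return nb_checks;
}

// Usage: four cases with three checks each
inline void demo()
{
    assert(test_add() == 12);
}
} // namespace sketch
```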
For a model of how to write unit tests for an architecture, you can check out the tests we wrote for X64 in tests/unit-tests/test_archX64.cpp. Please read the following important information before writing any arch testing code:
- at the time of writing, we only have testing code for X86 and X64. The arch testing code is huge and for the most part consists of old legacy code. It has undergone several API changes and thus can contain some awkward coding patterns, many of which would require too much time to rewrite. When writing tests for a new architecture, please feel free to come up with your own cleaner testing code and patterns
- we've written a lot of tests for X86 and X64. That was very useful for early testing, but for a new architecture there's no need to write tests for every single instruction! To some extent we can trust Ghidra's lifting. We recommend writing tests for a subset of the most common instructions, branching instructions, some conditional instructions, and for behaviours that are very arch-specific (for example switching between ARM/THUMB on 32-bit ARM)
Using Ghidra's sleigh to lift instructions is great. However, there are some instructions whose semantics cannot be described in its IR, p-code. The most obvious examples would be the syscall or cpuid Intel instructions. For unsupported instructions, sleigh will emit a special p-code operation:
CALLOTHER <NUM>
CALLOTHER means "I can't model this instruction, please do it yourself using a callback". NUM uniquely identifies the unsupported instruction for which this CALLOTHER has been emitted.
It is possible to add callbacks in Maat to execute CALLOTHER IR instructions. To do so, follow the steps below:
- Add a unique identifier for the unsupported instruction to the Id enum in src/include/maat/callother.hpp, e.g. Id::X64_SYSCALL
- Update the mnemonic_to_id() function in src/engine/callother.cpp. This function takes an architecture and a mnemonic and returns the corresponding Id value (for Arch::Type::X64 and "SYSCALL" it would return Id::X64_SYSCALL). This function is used by the sleigh interface when generating p-code
- Write an emulation callback for the unsupported instruction in src/engine/callother.cpp
- Don't forget to update the default_handler_map() function with the new instruction and handler
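The four steps above fit together roughly as in the following standalone sketch: an identifier enum, a mnemonic-to-identifier function, one callback per unsupported instruction, and a map from identifiers to callbacks. All names here (`toy_mnemonic_to_id`, `toy_default_handler_map`, `ToyEngine`, the handler signature) are assumptions made for the illustration; the real types and signatures are those in src/include/maat/callother.hpp.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

namespace sketch
{
enum class Arch { X64, OTHER };

// Step 1: one unique identifier per unsupported instruction
enum class Id { X64_SYSCALL, X64_CPUID, UNSUPPORTED };

// Stand-in for the engine state that a callback may modify
struct ToyEngine
{
    int syscalls = 0;
    int cpuids = 0;
};

using handler_t = std::function<void(ToyEngine&)>;

// Step 2: translate (architecture, mnemonic) into an identifier
inline Id toy_mnemonic_to_id(Arch arch, const std::string& mnemonic)
{
    if (arch == Arch::X64)
    {
        if (mnemonic == "SYSCALL") return Id::X64_SYSCALL;
        if (mnemonic == "CPUID") return Id::X64_CPUID;
    }
    return Id::UNSUPPORTED;
}

// Step 3: emulation callbacks (here they only count invocations)
inline void handle_syscall(ToyEngine& engine) { ++engine.syscalls; }
inline void handle_cpuid(ToyEngine& engine) { ++engine.cpuids; }

// Step 4: register the callbacks in the default handler map
inline std::map<Id, handler_t> toy_default_handler_map()
{
    return {
        {Id::X64_SYSCALL, handle_syscall},
        {Id::X64_CPUID, handle_cpuid},
    };
}

// Look up and run the handler; returns false if none is registered
inline bool dispatch(ToyEngine& engine, Arch arch, const std::string& mnemonic)
{
    static const std::map<Id, handler_t> handlers = toy_default_handler_map();
    const auto it = handlers.find(toy_mnemonic_to_id(arch, mnemonic));
    if (it == handlers.end())
        return false;
    it->second(engine);
    return true;
}

// Usage: known mnemonics reach their handler, unknown ones do not
inline void demo()
{
    ToyEngine engine;
    assert(dispatch(engine, Arch::X64, "SYSCALL"));
    assert(engine.syscalls == 1);
    assert(!dispatch(engine, Arch::X64, "NOP"));
    assert(!dispatch(engine, Arch::OTHER, "SYSCALL"));
}
} // namespace sketch
```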
Note: depending on the complexity of the instruction, implementing the emulation callback correctly can be delicate. Moreover, writing a CALLOTHER callback often requires using Maat's internals, which can be tricky at first. Don't hesitate to reach out to a maintainer for guidance!
Maat can emulate, to some extent, the environment in which a process is running. It has a built-in EnvEmulator class defined in src/include/maat/env/env.hpp. The default EnvEmulator holds the following essential information:
- a simple symbolic filesystem
- default ABIs to use for function calls and system calls
- a map of syscall numbers to callback functions used to emulate system calls
To add support for a new environment in Maat, you can simply create an emulator that inherits from EnvEmulator. For instance, we implemented LinuxEmulator, defined in src/include/maat/env/env.hpp and implemented in src/env/env_linux.cpp.
It is also possible to extend an existing environment implementation to support a new architecture, for instance adding ARM support to the existing LinuxEmulator.
When implementing a new environment, you'll likely need to specify one or several calling conventions for functions and syscalls.
Start by adding the new ABI to the maat::env::abi::Type enum defined in src/include/maat/env/library.hpp. Then create a class that inherits from the ABI class and implements the required methods to get/set arguments, return values, etc.
Note that ABI classes are intended to be used following a singleton pattern and need to implement the static const ABI& instance(); method.
Once the ABI class is implemented in src/env/abi.cpp, don't forget to update the _get_default_abi() and _get_syscall_abi() methods in src/env/env.cpp to return the new ABI for the relevant architecture/OS combination.
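For readers unfamiliar with the singleton pattern mentioned above, here is a minimal sketch of an ABI class exposing a `static const ABI& instance()` method. Only that method's signature comes from this guide; the `ABI` base class shown here, its `get_arg`/`set_ret_value` methods, `ToyCpu`, and the register-based calling convention are simplified assumptions and do not match Maat's real ABI interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

namespace sketch
{
// Toy CPU: regs[0] holds the return value, regs[1..3] hold the arguments
struct ToyCpu
{
    std::uint64_t regs[4] = {0, 0, 0, 0};
};

// Simplified stand-in for an ABI base class
class ABI
{
public:
    virtual ~ABI() = default;
    virtual std::uint64_t get_arg(const ToyCpu& cpu, std::size_t n) const = 0;
    virtual void set_ret_value(ToyCpu& cpu, std::uint64_t value) const = 0;

    // Singletons are neither copied nor assigned
    ABI(const ABI&) = delete;
    ABI& operator=(const ABI&) = delete;

protected:
    ABI() = default;
};

// Hypothetical calling convention that passes everything in registers
class ToyRegisterAbi : public ABI
{
public:
    // Single shared instance, created on first use
    // (initialization of function-local statics is thread-safe since C++11)
    static const ABI& instance()
    {
        static const ToyRegisterAbi abi;
        return abi;
    }

    std::uint64_t get_arg(const ToyCpu& cpu, std::size_t n) const override
    {
        if (n >= 3)
            throw std::out_of_range("toy ABI only supports three arguments");
        return cpu.regs[n + 1];
    }

    void set_ret_value(ToyCpu& cpu, std::uint64_t value) const override
    {
        cpu.regs[0] = value;
    }

private:
    ToyRegisterAbi() {} // private: only instance() can construct it
};

// Usage: every caller shares the same object
inline void demo()
{
    ToyCpu cpu;
    cpu.regs[1] = 42;
    const ABI& abi = ToyRegisterAbi::instance();
    assert(abi.get_arg(cpu, 0) == 42);
    abi.set_ret_value(cpu, 7);
    assert(cpu.regs[0] == 7);
    assert(&abi == &ToyRegisterAbi::instance());
}
} // namespace sketch
```

Because the ABI object holds no state of its own, a single shared `const` instance is enough, which is what makes the singleton a natural fit here.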
Once you have created a new environment and added new ABIs, you can write custom callbacks to emulate system calls on this environment. A syscall callback must have the following prototype:
FunctionCallback::return_t my_callback(
    MaatEngine& engine,
    const std::vector<Value>& args
)
It takes a reference to the MaatEngine that is executing the syscall, and a list of values for the syscall parameters. Thanks to the ABI class added earlier, Maat will automatically resolve the parameter values and seamlessly pass them to the callback. This way we abstract the underlying architecture from the callback, and only need to change the ABI to make the callback work for another architecture.
The callback returns a FunctionCallback::return_t, which is defined as a std::variant<> and can be either:
- nothing (std::monostate)
- a concrete integer value
- a Value instance (which can hold symbolic data)
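The three possible return kinds can be illustrated with a standalone `std::variant` (C++17). In this sketch, `Value`, `ToyEngine` and `return_t` are simplified stand-ins defined locally, with `std::int64_t` assumed as the concrete integer type; the actual `FunctionCallback::return_t` alternatives are defined in Maat's headers and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

namespace sketch
{
// Stand-in for a value that may hold symbolic data
struct Value
{
    std::string expr;
};

struct ToyEngine {};

// Assumed shape of the return type: nothing, a concrete integer, or a Value
using return_t = std::variant<std::monostate, std::int64_t, Value>;

// Returns nothing
inline return_t sys_nothing(ToyEngine&, const std::vector<Value>&)
{
    return std::monostate{};
}

// Returns a concrete integer: the number of arguments received
inline return_t sys_count(ToyEngine&, const std::vector<Value>& args)
{
    return static_cast<std::int64_t>(args.size());
}

// Returns a Value: the first argument, or a placeholder if there is none
inline return_t sys_first(ToyEngine&, const std::vector<Value>& args)
{
    if (args.empty())
        return Value{"<none>"};
    return args.front();
}

// Inspect which alternative a callback returned
inline std::string describe(const return_t& ret)
{
    if (std::holds_alternative<std::monostate>(ret))
        return "nothing";
    if (std::holds_alternative<std::int64_t>(ret))
        return "concrete " + std::to_string(std::get<std::int64_t>(ret));
    return "value " + std::get<Value>(ret).expr;
}

// Usage: each callback produces a different alternative
inline void demo()
{
    ToyEngine engine;
    assert(describe(sys_nothing(engine, {})) == "nothing");
    assert(describe(sys_count(engine, {Value{"a"}, Value{"b"}})) == "concrete 2");
    assert(describe(sys_first(engine, {Value{"x+1"}})) == "value x+1");
}
} // namespace sketch
```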
For examples of implemented system calls, check out src/env/emulated_syscalls/linux_syscalls.cpp.
Finally, after implementing syscall callbacks, make sure to add them to the new environment.
For example, for Linux we do it with a mapping {syscall_num:callback_function}. We have a helper function syscall_func_map_t linux_x86_syscall_map(), implemented in src/env/emulated_syscalls/linux_syscalls.cpp and declared in src/include/maat/env/syscalls.hpp, that generates the syscall map and is called from LinuxEmulator's constructor.
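As an illustration of the {syscall_num:callback_function} mapping, the sketch below builds a small map for two 32-bit Linux x86 syscalls (exit is number 1 and write is number 4 on that ABI). The alias `syscall_func_map_t` is redefined locally as an assumption about its general shape, and `Value`, `ToyEngine`, `return_t` and the `toy_*` functions are hypothetical stand-ins, not Maat's real types or callbacks.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <variant>
#include <vector>

namespace sketch
{
// Stand-in for Maat's Value, restricted to concrete integers
struct Value
{
    explicit Value(std::int64_t c) : concrete(c) {}
    std::int64_t concrete;
};

struct ToyEngine
{
    bool exited = false;
};

using return_t = std::variant<std::monostate, std::int64_t, Value>;
using callback_t = std::function<return_t(ToyEngine&, const std::vector<Value>&)>;

// Assumed shape of the map type: syscall number -> callback
using syscall_func_map_t = std::unordered_map<int, callback_t>;

// exit(status): mark the engine as exited, return nothing
inline return_t toy_sys_exit(ToyEngine& engine, const std::vector<Value>&)
{
    engine.exited = true;
    return std::monostate{};
}

// write(fd, buf, count): pretend that all 'count' bytes were written
inline return_t toy_sys_write(ToyEngine&, const std::vector<Value>& args)
{
    return args.at(2).concrete;
}

// Helper that generates the syscall map for the environment
inline syscall_func_map_t toy_linux_x86_syscall_map()
{
    return {
        {1, toy_sys_exit},  // exit
        {4, toy_sys_write}, // write
    };
}

// Usage: look up a callback by syscall number and call it
inline void demo()
{
    const syscall_func_map_t syscalls = toy_linux_x86_syscall_map();
    ToyEngine engine;
    const return_t written = syscalls.at(4)(engine, {Value(1), Value(0), Value(5)});
    assert(std::get<std::int64_t>(written) == 5);
    syscalls.at(1)(engine, {Value(0)});
    assert(engine.exited);
}
} // namespace sketch
```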
Native tests are where 99% of Maat's testing is done. There are two types of native tests:
- unit tests: they test basic low-level functionalities of the framework. The test files live in the tests/unit-tests/ folder
- advanced tests: they test Maat's API globally, often using Maat to perform some symbolic analysis on small programs and crackmes. They can be found in tests/adv-tests
For each type of test, there's a test_all.cpp file with a main() function that calls the test functions defined in the other test files.
Both test suites are compiled as test binaries named unit-tests and adv-tests respectively. Run the binaries to run the tests. If all tests succeed, the binaries exit normally. If any test fails, it raises a maat::test_exception() with an error message, which aborts the test binary and displays the error message.
For examples of how to write tests, just take a look at the existing tests in tests/unit-tests/.
Python tests use pytest and are written using Maat's Python API. They are used only for two things:
- verify that we didn't introduce breaking changes in the Python bindings
- serve as implicit reference scripts for how to use the Python bindings
Python tests mostly consist of crackmes, challenges, and small programs, on which we run advanced symbolic analysis. They are the Python counterpart of the native advanced tests mentioned above.
Non-goals of Python tests include:
- unit-testing of bindings implementation
- unit-testing of Maat functionalities