Rudroid - this might arguably be one of the worst Android emulators possible. In this blog, we'll write an emulator that can run a 'Hello World' Android ELF binary. While doing this, we will learn how to go about writing our own emulators.
Writing an emulator is an awesome way to study and probably master the low-level details of the system we are trying to emulate. I assume you have some working knowledge of Rust, a Linux machine with Rust installed or a Docker engine, and a lot of patience to go through the documentation of system calls, file formats, and more.
Topics we need to understand while writing Rudroid:
- Basic Android Operating System Architecture
- What are system calls
- How system calls are handled in AArch64
- How memory mapping works
- How the operating system loads an ELF into memory and runs it
- How we can emulate the operating system's behavior to load an ELF into memory and run it
Let's start by reading the definition of Android:
Android is an open-source, Linux-based software stack created for a wide array of devices and form factors. The following diagram shows the major components of the Android platform.
The basic architecture of Linux kernel:
Core functionalities of a kernel are:
- Process management
- Device management
- Memory management
- Interrupt handling
- Block I/O communication
- File System Management
For an emulator that only needs to run an Android ELF binary, the most interesting kernel components are memory management, file system management, process management, interrupt handling, and the system call interface through which the ELF communicates with the kernel.
Signals: The kernel uses signals to call into a process. For example, signals are used to notify a process of certain faults, such as division by zero.
Processes and Scheduler: Creates, schedules, and manages processes.
Virtual Memory: Allocates and manages virtual memory for processes.
File Systems: Implements the file and filesystem-related interfaces for user-space to communicate with the underlying disks.
Traps and faults: Handles traps and faults generated by the processor, such as a memory fault.
Physical memory: Manages the pool of page frames in real memory and allocates pages for virtual memory.
Interrupts: Handles all the interrupts from peripheral devices.
System calls: A system call is the means by which a process requests a specific kernel service, for example reading from a file, writing to a file, or executing a program. There are several hundred system calls, which can be roughly grouped into six categories:
* file system
* process
* scheduling
* interprocess communication (ipc)
* socket (networking)
* miscellaneous.
An emulator usually has an MMU to manage the guest's memory requests, an instruction interpreter (decode -> translate -> execute), signal handlers, and interrupt handlers.
These are the steps an emulator usually performs:
- load the target binary to memory
- figure out the ISA of target binary
- if emulator supports the ISA, initialize CPU
- initialize signal handlers
- initialize interrupt handlers
- initialize syscall handlers
- start CPU loop
What happens inside a CPU Loop:
- fetch opcode to execute at Program Counter
- increment Program Counter
- decode opcode
- translate opcode from emulated ISA to host ISA
- execute the translated opcode
- handle any raised signals/interrupts
- continue the loop
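To make the loop above concrete, here is a minimal sketch of a fetch, decode, execute cycle. It assumes a made-up two-byte instruction set (the opcodes 0x01, 0x02 and 0xff and the names ToyCpu, decode and run are all hypothetical), not AArch64, and it interprets instructions directly instead of translating them to host code.

```rust
/// Instructions of a made-up 2-byte ISA (opcode, operand); this is not AArch64.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Insn {
    LoadImm(u8), // acc = operand
    Add(u8),     // acc += operand
    Halt,
}

struct ToyCpu {
    pc: usize,
    acc: u64,
}

/// Decode one instruction; `None` stands in for an "illegal instruction" fault.
fn decode(opcode: u8, operand: u8) -> Option<Insn> {
    match opcode {
        0x01 => Some(Insn::LoadImm(operand)),
        0x02 => Some(Insn::Add(operand)),
        0xff => Some(Insn::Halt),
        _ => None,
    }
}

/// Run until `Halt`; returns the accumulator, or a message describing the fault.
fn run(cpu: &mut ToyCpu, memory: &[u8]) -> Result<u64, String> {
    loop {
        // fetch the opcode at the program counter
        let fetched_at = cpu.pc;
        let bytes = memory
            .get(fetched_at..fetched_at + 2)
            .ok_or_else(|| format!("fetch from unmapped address {:#x}", fetched_at))?;
        // increment the program counter
        cpu.pc += 2;
        // decode the opcode
        let insn = decode(bytes[0], bytes[1])
            .ok_or_else(|| format!("illegal opcode {:#x} at {:#x}", bytes[0], fetched_at))?;
        // execute (a real emulator would translate to the host ISA first)
        match insn {
            Insn::LoadImm(v) => cpu.acc = v as u64,
            Insn::Add(v) => cpu.acc += v as u64,
            Insn::Halt => return Ok(cpu.acc),
        }
    }
}
```

Running the byte program [0x01, 40, 0x02, 2, 0xff, 0] through run loads 40, adds 2 and halts. The two error paths play the role of the faults a real emulator would hand to its signal or interrupt handlers.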
So, Rudroid is just going to be a binary that implements an ELF loader, memory management, a system call interface, and a filesystem. The final Rudroid binary should take the ELF that prints 'Hello World' to stdout as a command-line argument and execute it on the host. The command should look something like this:
# ./Rudroid hello_world.elf
hello world
We are going to run Rudroid on a Linux machine. This is what Rudroid's architecture is going to look like:
We'll try not to delve too deep into the details of the ELF file format. Take a look at this comprehensive ELF standard here.
Executables (ELFs) and shared object files (libraries) statically represent programs. When you decide to run a binary, the operating system starts by setting up a new process for the program to run.
ELFs are composed of three major components:
- An executable header (Ehdr)
- Sections (section headers are represented as Shdr)
- Segments (also known as program headers, represented as Phdr)
Ehdr as defined in /usr/include/elf.h:
typedef struct {
unsigned char e_ident[16]; /* Magic number and other info */
uint16_t e_type; /* Object file type */
uint16_t e_machine; /* Architecture */
uint32_t e_version; /* Object file version */
uint64_t e_entry; /* Entry point virtual address */
uint64_t e_phoff; /* Program header table file offset */
uint64_t e_shoff; /* Section header table file offset */
uint32_t e_flags; /* Processor-specific flags */
uint16_t e_ehsize; /* ELF header size in bytes */
uint16_t e_phentsize; /* Program header table entry size */
uint16_t e_phnum; /* Program header table entry count */
uint16_t e_shentsize; /* Section header table entry size */
uint16_t e_shnum; /* Section header table entry count */
uint16_t e_shstrndx; /* Section header string table index*/
} Elf64_Ehdr;
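As an illustration of how these fields sit in the file, here is a small sketch that pulls a few of them out of a raw byte slice by hand. The byte offsets follow the Elf64_Ehdr layout above. The names EhdrSummary and parse_ehdr are made up for this example, and only little-endian 64-bit files are accepted. Rudroid itself relies on the xmas-elf crate for this.

```rust
const EM_AARCH64: u16 = 183; // e_machine value for AArch64

/// A few Elf64_Ehdr fields that a loader typically looks at first.
#[derive(Debug, PartialEq)]
struct EhdrSummary {
    e_type: u16,
    e_machine: u16,
    e_entry: u64,
    e_phoff: u64,
    e_phentsize: u16,
    e_phnum: u16,
}

fn le_u16(bytes: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([bytes[off], bytes[off + 1]])
}

fn le_u64(bytes: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[off..off + 8]);
    u64::from_le_bytes(buf)
}

/// Validate e_ident and read selected fields at their Elf64_Ehdr offsets.
fn parse_ehdr(bytes: &[u8]) -> Result<EhdrSummary, &'static str> {
    if bytes.len() < 64 {
        return Err("file is shorter than an Elf64_Ehdr");
    }
    if !bytes.starts_with(b"\x7fELF") {
        return Err("bad ELF magic");
    }
    if bytes[4] != 2 {
        return Err("not a 64-bit ELF (EI_CLASS != ELFCLASS64)");
    }
    if bytes[5] != 1 {
        return Err("not little-endian (EI_DATA != ELFDATA2LSB)");
    }
    Ok(EhdrSummary {
        e_type: le_u16(bytes, 16),
        e_machine: le_u16(bytes, 18),
        e_entry: le_u64(bytes, 24),
        e_phoff: le_u64(bytes, 32),
        e_phentsize: le_u16(bytes, 54),
        e_phnum: le_u16(bytes, 56),
    })
}
```

A caller could compare the returned e_machine against EM_AARCH64 before deciding whether the emulator supports the binary.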
Phdr as defined in /usr/include/elf.h:
typedef struct elf64_phdr {
Elf64_Word p_type;
Elf64_Word p_flags;
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment, file & memory */
} Elf64_Phdr;
Shdr as defined in /usr/include/elf.h:
typedef struct elf64_shdr {
Elf64_Word sh_name; /* Section name, index in string tbl */
Elf64_Word sh_type; /* Type of section */
Elf64_Xword sh_flags; /* Miscellaneous section attributes */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Size of section in bytes */
Elf64_Word sh_link; /* Index of another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
The kernel only really cares about Ehdr and Phdrs and only three types of program header entries:
- PT_LOAD : Loadable Segment
- PT_INTERP : Segment holding .interp section
- PT_GNU_STACK : flag to set program's stack to executable
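A rough sketch of how a loader might sort program headers into these three buckets is shown below. It reads one raw 56-byte Elf64_Phdr (little-endian) and classifies it by p_type. The Segment enum and classify_phdr function are hypothetical names for this example. The PT_* values are the ones from elf.h.

```rust
const PT_LOAD: u32 = 1;
const PT_INTERP: u32 = 3;
const PT_GNU_STACK: u32 = 0x6474_e551;
const PHDR_SIZE: usize = 56; // sizeof(Elf64_Phdr)

/// The program header kinds the loader acts on; everything else is ignored.
#[derive(Debug, PartialEq)]
enum Segment {
    Load { offset: u64, vaddr: u64, filesz: u64, memsz: u64, flags: u32 },
    Interp { offset: u64, filesz: u64 },
    GnuStack { flags: u32 },
    Ignored(u32),
}

fn read_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

fn read_u64(b: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(buf)
}

/// Classify one raw Elf64_Phdr; returns `None` if the slice is too short.
fn classify_phdr(raw: &[u8]) -> Option<Segment> {
    if raw.len() < PHDR_SIZE {
        return None;
    }
    let p_type = read_u32(raw, 0);
    let flags = read_u32(raw, 4);
    let offset = read_u64(raw, 8);
    let vaddr = read_u64(raw, 16);
    let filesz = read_u64(raw, 32);
    let memsz = read_u64(raw, 40);
    Some(match p_type {
        PT_LOAD => Segment::Load { offset, vaddr, filesz, memsz, flags },
        PT_INTERP => Segment::Interp { offset, filesz },
        PT_GNU_STACK => Segment::GnuStack { flags },
        other => Segment::Ignored(other),
    })
}
```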
The ELF loader in the kernel starts by examining the ELF header to check that the ELF is valid. After this, the loader loops over the program header entries, looking for PT_LOAD and PT_INTERP. For every PT_LOAD entry, the loader maps memory at load_address + phdr_header.p_vaddr of size phdr_header.p_memsz and copies the contents of the segment into the allocated memory. If PT_INTERP is found, the loader parses the interpreter as an ELF file too, maps it into memory, and keeps track of the entry points of both the main ELF file and the interpreter's ELF file.
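Because memory can only be mapped in whole pages, the address and size from a PT_LOAD entry have to be rounded before mapping. The sketch below shows that arithmetic, assuming 4 KiB pages. The function names are chosen for this example and are not taken from the kernel or from Rudroid.

```rust
const PAGE_SIZE: u64 = 0x1000; // assuming 4 KiB pages

/// Round an address down to the start of its page.
fn align_down(addr: u64) -> u64 {
    addr & !(PAGE_SIZE - 1)
}

/// Round an address up to the next page boundary (no-op if already aligned).
fn align_up(addr: u64) -> u64 {
    (addr + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}

/// Page-aligned (start, size) that covers a segment of `p_memsz` bytes
/// placed at `load_address + p_vaddr`.
fn segment_mapping(load_address: u64, p_vaddr: u64, p_memsz: u64) -> (u64, u64) {
    let start = align_down(load_address + p_vaddr);
    let end = align_up(load_address + p_vaddr + p_memsz);
    (start, end - start)
}
```

For example, a segment with p_vaddr 0x1f00 and p_memsz 0x200 loaded at address 0 straddles a page boundary, so it needs the two pages starting at 0x1000.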
Once this is done, the loader starts setting up and populating the stack with the auxiliary vector (ELF tables), environment variables, and command-line arguments passed to the ELF. An ELF auxiliary vector is an (id, value) pair that describes useful information about the program being run and the environment it is running in.
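As a small illustration, the auxiliary vector for a 64-bit little-endian target can be serialised as consecutive 8-byte ids and values, terminated by an AT_NULL entry. The sketch below assumes that layout. pack_auxv is a name invented for this example, and the AT_* constants are the standard values from elf.h.

```rust
const AT_NULL: u64 = 0; // terminates the vector
const AT_PHDR: u64 = 3; // address of the program headers
const AT_PHNUM: u64 = 5; // number of program headers
const AT_PAGESZ: u64 = 6; // system page size
const AT_ENTRY: u64 = 9; // entry point of the program

/// Serialise (id, value) pairs as little-endian u64s and append AT_NULL.
fn pack_auxv(entries: &[(u64, u64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity((entries.len() + 1) * 16);
    for &(id, value) in entries {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&value.to_le_bytes());
    }
    // the dynamic linker stops reading when it sees AT_NULL
    out.extend_from_slice(&AT_NULL.to_le_bytes());
    out.extend_from_slice(&0u64.to_le_bytes());
    out
}
```

Each entry takes 16 bytes, so a vector with AT_PHDR, AT_PHNUM, AT_PAGESZ and AT_ENTRY occupies 80 bytes including the terminator.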
For this, we need an ELF parser in Rust. We can either write our own ELF parser or use the existing xmas-elf crate.
Before we can start writing an ELF loader, we also need a memory manager, since we have to map the ELF into memory, manage the stack, etc. Let's look at how a memory manager works.
Linux memory management subsystem is responsible, as the name implies, for managing the memory in the system. This includes implementation of virtual memory and demand paging, memory allocation both for kernel internal structures and userspace programs, mapping of files into processes address space, and many other cool things.
It provides functionality to map and unmap memory allocations. We have to implement these functionalities:
- map memory at a given location or of a given size
- unmap memory at a given location or of a given size
- read from memory
- write to memory
- manage permissions of the memory
A mapping ranges from address to address + size_of_the_mapping. We can look at the mmap reference from the manual here.
void *mmap(void *addr, size_t length, int prot, int flags,
int fd, off_t offset);
mmap() creates a new mapping in the virtual address space of the calling process. The starting address for the new mapping is specified in addr. The length argument specifies the length of the mapping (which must be greater than 0).
Memory protections:
PROT_EXEC
Pages may be executed.
PROT_READ
Pages may be read.
PROT_WRITE
Pages may be written.
PROT_NONE
Pages may not be accessed.
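One detail worth keeping in mind is that ELF segment flags (p_flags) use a different bit order than these protection flags: in elf.h PF_X is 1 and PF_R is 4, while on Linux PROT_READ is 1 and PROT_EXEC is 4. The loader code later in this post calls a helper named to_uc_permissions that is not shown. The sketch below is only a guess at what such a translation could look like, with the constants defined locally.

```rust
// ELF segment permission bits (p_flags), as defined in elf.h
const PF_X: u32 = 1;
const PF_W: u32 = 2;
const PF_R: u32 = 4;

// Linux mmap protection bits
const PROT_READ: u32 = 1;
const PROT_WRITE: u32 = 2;
const PROT_EXEC: u32 = 4;

/// Translate ELF p_flags into mmap-style protection bits.
fn elf_flags_to_prot(p_flags: u32) -> u32 {
    let mut prot = 0; // PROT_NONE
    if p_flags & PF_R != 0 {
        prot |= PROT_READ;
    }
    if p_flags & PF_W != 0 {
        prot |= PROT_WRITE;
    }
    if p_flags & PF_X != 0 {
        prot |= PROT_EXEC;
    }
    prot
}
```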
Unicorn Engine offers this functionality:
/// Map a memory region in the emulator at the specified address.
///
/// `address` must be aligned to 4kb or this will return `Error::ARG`.
/// `size` must be a multiple of 4kb or this will return `Error::ARG`.
pub fn mem_map(&mut self,
address: u64,
size: libc::size_t,
perms: Protection
) -> Result<(), uc_error>;
/// Unmap a memory region.
///
/// `address` must be aligned to 4kb or this will return `Error::ARG`.
/// `size` must be a multiple of 4kb or this will return `Error::ARG`.
pub fn mem_unmap(&mut self,
address: u64,
size: libc::size_t
) -> Result<(), uc_error>;
/// Set the memory permissions for an existing memory region.
///
/// `address` must be aligned to 4kb or this will return `Error::ARG`.
/// `size` must be a multiple of 4kb or this will return `Error::ARG`.
pub fn mem_protect(&mut self,
address: u64,
size: libc::size_t,
perms: Protection
) -> Result<(), uc_error> {
let err = unsafe { ffi::uc_mem_protect(self.uc, address, size, perms.bits()) };
if err == uc_error::OK {
Ok(())
} else {
Err(err)
}
}
We can define protections and mappings as structs in Rust:
bitflags! {
#[repr(C)]
pub struct Protection : u32 {
const NONE = 0;
const READ = 1;
const WRITE = 2;
const EXEC = 4;
const ALL = 7;
}
}
pub struct MapInfo {
pub memory_start : u64,
pub memory_end : u64,
pub memory_perms : Protection,
pub description : String,
}
Using the mem_map and mem_unmap functions from Unicorn, we can implement our MMU as a hashmap from starting address to MapInfo struct.
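A minimal sketch of that bookkeeping is shown below. It is standalone and does not call into Unicorn: Region stands in for MapInfo with plain integer permissions, and the MapTable type and its methods are hypothetical names, so treat it as an outline of the idea rather than Rudroid's actual MMU.

```rust
use std::collections::HashMap;

const PAGE_SIZE: u64 = 0x1000;

/// Stand-in for `MapInfo`, with permissions as plain bits (1 = R, 2 = W, 4 = X).
#[derive(Debug, Clone, PartialEq)]
struct Region {
    start: u64,
    end: u64, // exclusive
    perms: u32,
    description: String,
}

/// Bookkeeping for mapped regions, keyed by starting address.
#[derive(Default)]
struct MapTable {
    regions: HashMap<u64, Region>,
}

impl MapTable {
    /// Record a new mapping; rejects unaligned, empty, or overlapping requests.
    fn map(&mut self, start: u64, size: u64, perms: u32, description: &str) -> Result<(), String> {
        if start % PAGE_SIZE != 0 || size == 0 || size % PAGE_SIZE != 0 {
            return Err(format!("unaligned mapping {:#x}+{:#x}", start, size));
        }
        let end = start + size;
        if self.regions.values().any(|r| start < r.end && r.start < end) {
            return Err(format!("mapping {:#x}-{:#x} overlaps an existing region", start, end));
        }
        let region = Region { start, end, perms, description: description.to_string() };
        self.regions.insert(start, region);
        Ok(())
    }

    /// Forget the mapping that starts at `start`, if any.
    fn unmap(&mut self, start: u64) -> Option<Region> {
        self.regions.remove(&start)
    }

    /// Find the region containing `addr`.
    fn find(&self, addr: u64) -> Option<&Region> {
        self.regions.values().find(|r| r.start <= addr && addr < r.end)
    }
}
```

In a real emulator each successful map or unmap here would be paired with the matching mem_map or mem_unmap call, so the table and the engine stay in sync.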
We'll also look at how system calls work and then start writing our Emulator.
A system call is a routine that allows a user application to request actions that require special privileges or functionalities. Adding system calls is one of several ways to extend the functions provided by the kernel.
In AArch64, there are special instructions for making such system calls. These instructions cause an exception, which allows controlled entry into a more privileged Exception level.
- SVC - Supervisor call: Causes an exception targeting EL1. Used by an application to call the OS.
- HVC - Hypervisor call: Causes an exception targeting EL2. Used by an OS to call the hypervisor, not available at EL0.
- SMC - Secure monitor call: Causes an exception targeting EL3. Used by an OS or hypervisor to call the EL3 firmware, not available at EL0.
In AArch64, the system call number is passed in the X8 register and the return value comes back in the X0 register. We will use Unicorn's hooks to hook onto these SVC calls, execute the corresponding system call, and return the results.
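To illustrate the convention, here is a toy dispatcher that works on a plain register array instead of a real CPU. The Regs, Outcome and handle_svc names are hypothetical. The syscall numbers (write = 64, exit = 93, getpid = 172) are the AArch64 Linux ones, and failures are returned as negative errno values in X0, the way the kernel reports them.

```rust
const NR_WRITE: u64 = 64;
const NR_EXIT: u64 = 93;
const NR_GETPID: u64 = 172;

const EBADF: i64 = 9;
const EFAULT: i64 = 14;
const ENOSYS: i64 = 38;

/// General-purpose registers X0..X30 of a pretend guest CPU.
struct Regs {
    x: [u64; 31],
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Continue,
    Exit(i32),
}

/// Handle one SVC: the syscall number is in X8, arguments in X0.., result goes to X0.
fn handle_svc(regs: &mut Regs, guest_mem: &[u8], stdout: &mut Vec<u8>) -> Outcome {
    let ret: i64 = match regs.x[8] {
        // any made-up PID will do for the guest
        NR_GETPID => 1337,
        NR_WRITE => {
            let (fd, ptr, len) = (regs.x[0], regs.x[1] as usize, regs.x[2] as usize);
            let bytes = ptr.checked_add(len).and_then(|end| guest_mem.get(ptr..end));
            match (fd, bytes) {
                (1, Some(bytes)) => {
                    stdout.extend_from_slice(bytes);
                    len as i64
                }
                (1, None) => -EFAULT, // buffer outside guest memory
                _ => -EBADF,          // only fd 1 is wired up in this sketch
            }
        }
        NR_EXIT => return Outcome::Exit(regs.x[0] as i32),
        _ => -ENOSYS,
    };
    regs.x[0] = ret as u64;
    Outcome::Continue
}
```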
Since emulating all the AArch64 instructions ourselves is a tedious job, we will make use of Unicorn Engine for emulating the instructions. We will still see how it works.
Finally, we'll start writing the code for our Rudroid. Let's see how easy or complex it will be.
I'm going to use a Linux Docker container on my Apple M1 as the host for running Rudroid.
Rudroid's Dockerfile:
FROM rust:latest
RUN apt update -y
RUN apt install -y nano cmake
WORKDIR /setup
RUN git clone https://github.com/unicorn-engine/unicorn/
WORKDIR /setup/unicorn/
RUN ./make.sh
RUN ./make.sh install
WORKDIR /setup/
RUN git clone https://github.com/keystone-engine/keystone/
RUN mkdir build
WORKDIR /setup/keystone/build
RUN ../make-share.sh
RUN make install
RUN cp /usr/local/lib/libkeystone.so* /usr/lib/
RUN apt-get install -y clang llvm binutils-dev libunwind-dev
WORKDIR /home/
run.sh:
#!/bin/bash
# Docker image names must be lowercase
image=rudroid
docker build -t $image .
docker run --rm -v `pwd`:/home -v `pwd`/resources/:/resources/ -it $image bash
$ chmod +x run.sh
$ ./run.sh
root@9346e6664ae9:/home/code#
Here we are installing the required rust, unicorn-engine, capstone-engine, and keystone-engine.
We will extend the Unicorn impl from the Unicorn Rust crate and add system call handlers, file system management, etc. I took only the required files and discarded the rest.
➜ src git:(main) ✗ tree core/unicorn/
| |____
| | |____unicorn_const.rs
| | |____ffi.rs
| | |____mod.rs
| | |____arch
| | | |____arm64.rs
| | | |____mod.rs
Let's set up the below directory structure:
We are going to need the libc crate to interact with the host and forward our system calls to it, and the xmas-elf crate for parsing the ELF file. Add libc = "0.2.101" and xmas-elf = "0.8.0" to the dependencies in Cargo.toml. I also added some helper functions in utilities.rs to print in color. 🎨
[package]
name = "Rudroid"
version = "0.1.0"
edition = "2018"
[dependencies]
libc = "0.2.101"
bitflags = ">=1.1.0"
xmas-elf = "0.8.0"
byteorder = "1.4.3"
keystone = "0.9.0"
capstone="0.10.0"
nix = "0.22.1"
So I deleted Unicorn's new implementation and struct definition and added a new struct definition inside core/rudroid.rs. Our new implementation declares a struct called Emulator that keeps track of the details of the ELF file, the filesystem, and the Unicorn hooks.
// #[derive(Debug)]
pub struct Emulator<D> {
pub debug : bool,
pub rootfs : String,
pub elf_path : String,
pub machine : header::Machine,
pub endian : header::Data,
pub arch : Arch,
pub uc : ffi::uc_handle,
pub uc_type : D,
pub filesystem : fs::FsScheme,
// mmu stuff
pub load_address : u64,
pub mmap_address : u64,
pub new_stack : u64,
pub interp_address : u64,
pub entry_point : u64,
pub elf_entry : u64,
pub brk_address : u64,
//elf arguments
pub args : Vec<String>,
pub env : Vec<String>,
pub map_infos : HashMap<u64, mmu::MapInfo>,
//hook
pub code_hooks : HashMap<*mut libc::c_void, Box<ffi::CodeHook<D>>>,
pub mem_hooks : HashMap<*mut libc::c_void, Box<ffi::MemHook<D>>>,
pub intr_hooks : HashMap<*mut libc::c_void, Box<ffi::InterruptHook<D>>>,
pub insn_in_hooks : HashMap<*mut libc::c_void, Box<ffi::InstructionInHook<D>>>,
pub insn_out_hooks : HashMap<*mut libc::c_void, Box<ffi::InstructionOutHook<D>>>,
pub insn_sys_hooks : HashMap<*mut libc::c_void, Box<ffi::InstructionSysHook<D>>>,
// syscalls stuff
pub sigmap : HashMap<u64, Vec<u8>>,
_pin : std::marker::PhantomPinned,
}
Now we have to implement Emulator.
impl<D> Emulator<D>
{
pub fn new(elf_path: &str, rootfs: &str, elf: &mut ElfFile, endian: header::Data, args: Vec<String>, env: Vec<String>, data: D, debug: bool) -> Result<Emulator<D>, uc_error> {
let mut machine = elf.header.pt2.machine().as_machine();
let (arch, mode) = match machine {
header::Machine::AArch64 => {
(Arch::ARM64, Mode::LITTLE_ENDIAN)
},
_ => {
panic!("Not implemented yet!")
}
};
let mut handle = std::ptr::null_mut();
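//uc_open: Create new instance of unicorn engine.
let err = unsafe { ffi::uc_open(arch, mode, &mut handle) };
// bail out early so we never load an ELF through a handle that was not initialised
if err != uc_error::OK {
return Err(err);
}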
//create a new Emulator and return.
let mut emu = Emulator {
debug : debug,
rootfs : String::from(rootfs),
elf_path : String::from(elf_path),
args : args,
env : env,
uc : handle,
uc_type : data,
arch : arch,
machine : machine,
endian : endian,
map_infos : HashMap::new(),
entry_point : 0,
elf_entry : 0,
brk_address : 0,
mmap_address : 0,
interp_address : 0,
new_stack : 0,
load_address : 0,
//hooks
code_hooks : HashMap::new(),
mem_hooks : HashMap::new(),
intr_hooks : HashMap::new(),
insn_in_hooks : HashMap::new(),
insn_out_hooks : HashMap::new(),
insn_sys_hooks : HashMap::new(),
_pin : std::marker::PhantomPinned,
//create a File System object
filesystem : fs::FsScheme::new(String::from(rootfs)),
sigmap : HashMap::new(),
};
//parse and load the ELF into memory
emu.load(elf);
// display the memory mapping
emu.display_mapped();
if err == uc_error::OK {
Ok(emu)
} else {
Err(err)
}
}
}
I replaced all the implementations of impl UnicornHandler with impl<D> Emulator<D>. This way, we already have all the capabilities of Unicorn, like memory management, hooks, the instruction interpreter, the CPU loop, etc. I think this is called lazy programming? 🙊
As explained in the ELF loader section above, we parse the ELF using the xmas-elf crate, go through the program headers, and map the respective segments into memory. We also set up the stack for the program.
impl<D> Emulator<D>
{
pub fn load(& mut self, elf: &mut ElfFile)
{
self.enable_vfp();
let profile = match self.machine {
header::Machine::AArch64 => {
(linux::OS64::stack_address, linux::OS64::stack_size)
},
_ => {
panic!("[load_with_ld] Not implemented yet!")
}
};
let mut stack_address = profile.0 as u64;
let stack_size = profile.1 as usize;
//initialise stack
self.mmu_map(stack_address, stack_size, Protection::READ|Protection::WRITE, "[stack]", self.null_mut());
// load ELF and linker into memory
self.load_with_ld(stack_address.checked_add(stack_size as u64).unwrap() , 0, self.machine, elf);
stack_address = self.new_stack;
self.reg_write(RegisterARM64::SP as i32, stack_address).unwrap();
}
fn load_with_ld(&mut self, stack_address: u64, load_address: u64, archbit: header::Machine, elf: &mut ElfFile) {
let mut load_address = match load_address {
0 => {
match archbit {
header::Machine::AArch64 => {
self.mmap_address = linux::OS64::mmap_address as u64;
linux::OS64::load_address as u64
},
_ => {
panic!("Shouldn't be here");
}
}
},
_ => {
panic!("Shouldn't be here");
}
};
let mut mem_start : u64 = 0xffff_ffff;
let mut mem_end : u64 = 0xffff_ffff;
let mut mem_s : u64 = 0;
let mut mem_e : u64 = 0;
let mut interp_path : String = String::new();
match elf.header.pt2.type_().as_type() {
header::Type::Executable => {
load_address = 0;
},
header::Type::SharedObject => {
}
_ => {
panic!("Unsupported ELF e_type: {:?}", elf.header.pt2.type_().as_type());
}
}
for header in elf.program_iter() {
match header.get_type().unwrap() {
program::Type::Interp => {
let offset = header.offset() as usize;
let end_offset = (header.offset()+header.mem_size()) as usize;
let data = elf.input.get(offset..end_offset).unwrap();
interp_path = self.null_str(std::str::from_utf8(data).unwrap());
},
program::Type::Load => {
if mem_start > header.virtual_addr() || mem_start == 0xffff_ffff {
mem_start = header.virtual_addr();
};
if mem_end < header.virtual_addr()+header.mem_size() || mem_end == 0xffff_ffff {
mem_end = header.virtual_addr()+header.mem_size();
}
},
_ => {
}
}
}
mem_start = self.uc_align_down(mem_start);
mem_end = self.uc_align_up(mem_end);
for header in elf.program_iter() {
match header.get_type().unwrap() {
program::Type::Load => {
mem_s = self.uc_align_down(load_address + header.virtual_addr());
mem_e = self.uc_align_up(load_address + header.virtual_addr() + header.file_size());
let perms = utilities::to_uc_permissions(header.flags());
let desc = self.elf_path.clone();
self.mmu_map(mem_s, (mem_e-mem_s) as usize, perms, &desc, self.null_mut());
let data = elf.input.get(header.offset() as usize..
(header.offset()+header.file_size()) as usize).unwrap();
self.write(load_address+header.virtual_addr(), data);
},
_ => {
}
}
}
let loaded_mem_end = load_address + mem_end;
if loaded_mem_end > mem_e {
let desc = self.elf_path.clone();
self.mmu_map( mem_e, (loaded_mem_end-mem_e) as usize, Protection::ALL, &desc, self.null_mut());
}
self.elf_entry = elf.header.pt2.entry_point() + load_address;
self.debug_print(format!("elf_entry {:x}", self.elf_entry));
self.brk_address = mem_end + load_address + 0x2000; //not sure why?? seems to be used in ql_syscall_brk
// load interpreter if there is an interpreter
if !interp_path.is_empty() {
self.debug_print(format!("Trying to load interpreter: {}{}", self.rootfs, interp_path));
let mut interp_full_path = String::new();
interp_full_path.push_str(&self.rootfs);
interp_full_path.push_str(&interp_path);
let interp_data = std::fs::read(&interp_full_path).unwrap();
let interp_elf = ElfFile::new(interp_data.get(0..).unwrap()).unwrap();
let mut interp_mem_size: u64 = 0;
let mut interp_address : u64 = 0;
for i_header in interp_elf.program_iter() {
match i_header.get_type().unwrap() {
program::Type::Load => {
if interp_mem_size < i_header.virtual_addr() + i_header.mem_size() || interp_mem_size == 0 {
interp_mem_size = i_header.virtual_addr() + i_header.mem_size();
}
},
_ => {
}
};
}
interp_mem_size = self.uc_align_up(interp_mem_size);
match archbit {
header::Machine::AArch64 => {
interp_address = linux::OS64::interp_address as u64;
}
_ => {
panic!("what?");
}
};
//map interpreter into memory
self.mmu_map(interp_address, interp_mem_size as usize , Protection::ALL, &interp_path, self.null_mut());
for i_header in interp_elf.program_iter() {
match i_header.get_type().unwrap() {
program::Type::Load => {
let data = interp_elf.input.get(i_header.offset() as usize..
(i_header.offset()+i_header.file_size()) as usize
).unwrap();
// segments are placed by virtual address, matching the size computed above
self.write( interp_address+i_header.virtual_addr(), data);
},
_ => {
}
};
}
self.interp_address = interp_address;
self.entry_point = interp_elf.header.pt2.entry_point() + self.interp_address;
}
// setup elf table
let mut elf_table: Vec<u8> = Vec::new();
let mut new_stack = stack_address;
// copy arg0 on to stack. elf_path
new_stack = self.copy_str(new_stack, &mut self.elf_path.clone());
elf_table.extend_from_slice(&self.pack(self.args.len() as u64 + 1)); // + 1 is for arg0 = elf path.
elf_table.extend_from_slice(&self.pack(new_stack));
// copy the remaining arguments in order, so argv[1..] keeps the command-line order
for mut arg in self.args.clone() {
new_stack = self.copy_str(new_stack, &mut arg);
elf_table.extend_from_slice(&self.pack(new_stack));
}
elf_table.extend_from_slice(&self.pack(0));
let mut envc = self.env.len();
loop {
if envc <=0 {
break;
}
envc -= 1;
let mut env = self.env[envc].clone();
new_stack = self.copy_str(new_stack, &mut env);
elf_table.extend_from_slice(&self.pack(new_stack));
}
elf_table.extend_from_slice(&self.pack(0));
new_stack = self.alignment(new_stack);
//our super secure random string
let mut randstr = "a".repeat(0x10);
let mut cpustr = String::from("aarch64");
let mut addr1 = self.copy_str(new_stack, &mut randstr);
new_stack = addr1;
let mut addr2 = self.copy_str(new_stack, &mut cpustr);
new_stack = addr2;
new_stack = self.alignment(new_stack);
// Set AUX
let head = elf.header;
let elf_phdr = load_address + head.pt2.ph_offset();
let elf_phent = head.pt2.ph_entry_size();
let elf_phnum = head.pt2.ph_count();
let elf_pagesz = 0x1000;
let elf_guid = linux::uid;
let elf_flags = 0;
let elf_entry = load_address + head.pt2.entry_point();
let randstraddr = addr1;
let cpustraddr = addr2;
let elf_hwcap: u64 = match head.pt2.machine().as_machine() {
header::Machine::AArch64 => {
0x078bfbfd
},
_ => {
panic!("");
}
};
//setup auxiliary vectors
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PHDR as u64, elf_phdr + mem_start));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PHENT as u64, elf_phent as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PHNUM as u64, elf_phnum as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PAGESZ as u64, elf_pagesz as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_BASE as u64, self.interp_address));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_FLAGS as u64, elf_flags));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_ENTRY as u64, elf_entry));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_UID as u64, elf_guid as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_EUID as u64, elf_guid as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_GID as u64, elf_guid as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_EGID as u64, elf_guid as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_HWCAP as u64, elf_hwcap as u64));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_CLKTCK as u64, 100));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_RANDOM as u64, randstraddr));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PLATFORM as u64, cpustraddr));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_SECURE as u64, 0));
elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_NULL as u64, 0));
let len = 0x10 - ((new_stack - elf_table.len() as u64) & 0xf) as usize;
let padding = std::iter::repeat('0').take(len).collect::<String>();
elf_table.extend_from_slice(padding.as_bytes());
let addr = new_stack - elf_table.len() as u64;
self.write( addr, &elf_table);
new_stack = new_stack - elf_table.len() as u64;
self.new_stack = new_stack;
self.load_address = load_address;
}
fn new_aux_ent(&self, key: u64, val: u64) -> Vec<u8>
{
//pack the aux key-val pair
let mut aux: Vec<u8> = Vec::new();
aux.extend_from_slice(&self.pack(key));
aux.extend_from_slice(&self.pack(val));
aux
}
// Run linker
pub fn run_linker(&mut self)
{
utilities::context_title(Some("Emulating linker64"));
let res = self.emu_start(self.entry_point, self.elf_entry, 0, 0);
self.handle_emu_exception(res);
utilities::context_title(Some("Emulating linker64 done"));
}
}
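The loader above leans on helpers such as copy_str and pack that are not shown in this post. The following is a standalone sketch of the idea behind them, under the assumption that copy_str pushes a NUL-terminated string onto a downward-growing stack and returns its address, and that pack produces a little-endian 8-byte value. GuestStack and its methods are hypothetical and operate on a plain Vec rather than on emulator memory.

```rust
/// A downward-growing stack backed by a plain buffer instead of guest memory.
struct GuestStack {
    base: u64, // lowest address of the stack region
    mem: Vec<u8>,
    sp: u64, // current stack pointer; starts at the top of the region
}

impl GuestStack {
    fn new(base: u64, size: usize) -> Self {
        GuestStack { base, mem: vec![0; size], sp: base + size as u64 }
    }

    /// Move the stack pointer down and copy `bytes` there; returns the new address.
    /// Panics if the stack region is exhausted.
    fn push_bytes(&mut self, bytes: &[u8]) -> u64 {
        let new_sp = self
            .sp
            .checked_sub(bytes.len() as u64)
            .filter(|sp| *sp >= self.base)
            .expect("guest stack exhausted");
        let off = (new_sp - self.base) as usize;
        self.mem[off..off + bytes.len()].copy_from_slice(bytes);
        self.sp = new_sp;
        new_sp
    }

    /// Push a NUL-terminated copy of `s`, as argv and envp strings require.
    fn push_cstr(&mut self, s: &str) -> u64 {
        let mut bytes = s.as_bytes().to_vec();
        bytes.push(0);
        self.push_bytes(&bytes)
    }

    /// AArch64 requires the stack pointer to be 16-byte aligned.
    fn align16(&mut self) {
        self.sp &= !0xf;
    }
}

/// Pack a 64-bit value the way a little-endian guest expects it.
fn pack(value: u64) -> [u8; 8] {
    value.to_le_bytes()
}
```

Pushing the string "hi" onto a stack whose top is 0x1100 places it at 0x10fd. After aligning, a pointer to that string can be pushed with push_bytes(&pack(addr)), which is how argv entries end up below the string data.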
We add three hooks in core/hooks.rs:
pub fn add_hooks(emu: &mut rudroid::Emulator<i64>) {
//handle SVC
emu.add_intr_hook(android::syscalls::hook_syscall).unwrap();
//handle MEM_FETCH_UNMAPPED
emu.add_mem_hook(unicorn_const::HookType::MEM_FETCH_UNMAPPED, 1, 0, callback_mem_error).unwrap();
//handle MEM_READ_UNMAPPED
emu.add_mem_hook(unicorn_const::HookType::MEM_READ_UNMAPPED, 1, 0, callback_mem_error).unwrap();
}
And in the hook_syscall function, we read the x8 register from the execution context, match it against Android's syscalls, and try to emulate the syscall. Instead of implementing every syscall in our code, we can just forward some of them to the host system, get the return values, and forward them to the emulated binary.
mod syscalls;
mod unistd;
use crate::{core::{rudroid::Emulator, unicorn::arch::arm64::RegisterARM64}, utilities};
pub fn get_syscall(uc: &mut Emulator<i64>) -> syscalls::Syscalls {
// syscall_num = UC_ARM64_REG_X8
let syscall = uc.reg_read(RegisterARM64::X8 as i32).unwrap();
unsafe { ::std::mem::transmute(syscall) }
}
pub fn hook_syscall(uc: &mut Emulator<i64>, intno: u32) {
let pc = uc.reg_read(RegisterARM64::PC as i32).unwrap();
let syscall = get_syscall(uc);
uc.syscall(syscall);
}
impl<D> Emulator<D> {
pub fn syscall(&mut self, syscall: syscalls::Syscalls) {
if self.debug {
utilities::draw_line();
self.debug_print(format!("got syscall: {:?}", syscall));
}
match syscall {
syscalls::Syscalls::__NR_getpid =>
{
self.sys_getpid();
}
_ => {
panic!("Syscall {:?} not implemented yet!", syscall);
}
};
}
pub fn empty_syscall_return(&mut self) {
self.reg_write(RegisterARM64::X0 as i32, 0).unwrap();
}
pub fn get_arg(&mut self, num: i32) -> u64 {
// 'x0', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'
match num {
0 => {
self.reg_read(RegisterARM64::X0 as i32).unwrap()
},
1 => {
self.reg_read(RegisterARM64::X1 as i32).unwrap()
},
2 => {
self.reg_read(RegisterARM64::X2 as i32).unwrap()
},
3 => {
self.reg_read(RegisterARM64::X3 as i32).unwrap()
},
4 => {
self.reg_read(RegisterARM64::X4 as i32).unwrap()
},
5 => {
self.reg_read(RegisterARM64::X5 as i32).unwrap()
},
6 => {
self.reg_read(RegisterARM64::X6 as i32).unwrap()
},
7 => {
self.reg_read(RegisterARM64::X7 as i32).unwrap()
},
_ => {
panic!("i do not support any more arguments :/");
}
}
}
pub fn set_return_val(&mut self, value: u64) {
self.reg_write(RegisterARM64::X0 as i32, value).unwrap();
}
}
And in the main.rs file, we parse the command-line arguments to Rudroid, take the target 'Hello World' ELF and the rootfs folder (the /system/ directory copied from an Android device) as two arguments, create an Emulator, load the ELF into memory, and start the CPU loop.
extern crate byteorder;
extern crate capstone;
extern crate keystone;
extern crate nix;
extern crate xmas_elf;
mod utilities;
mod core;
use std::env;
use xmas_elf::ElfFile;
use crate::utilities::context_title;
fn parse_args() -> env::Args {
//! Parse Command line arguments
let mut args = env::args();
if args.len() != 3 {
panic!("Please provide an ELF library and rootfs folder");
}
args
}
fn main()
{
utilities::context_title(Some("Hello, world!"));
let mut args = parse_args();
let mut elf_filename = args.nth(1).unwrap();
let rootfs = args.next().unwrap();
let mut elf_data = std::fs::read(&mut elf_filename).unwrap();
let mut elf: ElfFile = ElfFile::new(&mut elf_data).unwrap();
//our hello world program takes no arguments or environment variables
let program_args: Vec<String> = vec![];
let program_env: Vec<String> = Vec::new();
let endian = elf.header.pt1.data();
let mut emu = core::rudroid::Emulator::new( &elf_filename, &rootfs, &mut elf, endian, program_args, program_env, 0, true).expect("Emulator initialisation failed");
//set up hooks
core::hooks::add_hooks(&mut emu);
//run linker to load dependencies of ELF and then run the main from ELF
emu.run_linker();
emu.run_elf();
context_title(Some("Emulator created"))
}
We are now ready to execute the ELF binary, except that whenever the binary makes a syscall we panic with panic!("Syscall {:?} not implemented yet!", syscall);
The best part here is that we can do this on the fly, i.e., implement whichever syscall was requested in the panic above.
Let's compile and link it with Unicorn/Keystone/Capstone.
build:
RUSTFLAGS="-L /usr/lib/ -lunicorn -L /usr/local/lib/ -lkeystone -Awarnings" cargo run -- /setup/hello /setup/rootfs/
Now compile and execute with make:
You can see in the screenshot above that the emulator panicked with 'Syscall __NR_getpid not implemented yet!'. So, let's implement the __NR_getpid syscall. According to the getpid (__NR_getpid) documentation at https://man7.org/linux/man-pages/man2/getpid.2.html, it just returns the PID of the calling process. Since we are executing the binary in our own emulator, we can return any number as the PID. Let's return 1337.
So we create a file unistd.rs inside the syscalls folder and implement the __NR_getpid syscall.
use std::process;
use crate::core::rudroid::Emulator;
impl<D> Emulator<D> {
pub fn sys_getpid(&mut self) {
let pid = 1337;
self.set_return_val(pid as u64);
}
}
And now we run make again to execute.
As you can see, it executed [DEBUG]: got syscall: __NR_getpid and now panicked with 'Syscall __NR3264_mmap not implemented yet!'. If you are not sure what a syscall does, just search for it on bootlin. __NR3264_mmap is defined in https://elixir.bootlin.com/linux/latest/source/include/uapi/asm-generic/unistd.h#L645
__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)
This implements mmap. So, we go ahead and implement this as well.
impl<D> Emulator<D> {
pub fn sys_mmap(&mut self) {
let addr = self.get_arg(0);
let len = self.get_arg(1);
let prot = self.get_arg(2);
let flags = self.get_arg(3);
let fd : i32 = self.get_arg(4) as i32;
let off = self.get_arg(5) ;
let aligned_len = self.align_len(len);
let mut mmap_base = addr;
let mut need_map : bool = true;
if addr == 0 {
mmap_base = self.mmap_address;
self.mmap_address = mmap_base + aligned_len;
}
else {
need_map = false;
}
let is_fixed = (flags & MAP_FIXED) != 0;
if self.debug {
self.debug_print(format!("mmap_base 0x{:x} length 0x{:x} fixed: {} = ({:x}, {:x})", addr, len, is_fixed, mmap_base, aligned_len as usize));
}
if need_map {
self.mmu_map(mmap_base, aligned_len as usize, Protection::ALL, "[syscall_mmap]", self.null_mut());
}
if (( flags & MAP_ANONYMOUS) == 0 ) && fd < MAX_FDS && fd > 0 {
let mut data = vec![0u8; len as usize];
self.filesystem.pread(fd, &mut data, off).unwrap();
let mem_info: &str = &self.filesystem.get_path(fd).unwrap();
let map_info = MapInfo {
memory_start : mmap_base,
memory_end : mmap_base+((len+0x1000-1)/0x1000) * 0x1000,
memory_perms : Protection::ALL,
description : String::from(mem_info),
};
self.add_mapinfo(map_info);
self.write(mmap_base, &data);
}
self.set_return_val(mmap_base);
}
}
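The handler above relies on an align_len helper and on bumping mmap_address for requests without an address hint. Neither is spelled out in this post, so here is a standalone sketch of how that part could work, assuming 4 KiB pages. MmapArena and reserve are names made up for this example.

```rust
const PAGE_SIZE: u64 = 0x1000; // assuming 4 KiB pages

/// Round a length up to a whole number of pages.
fn align_len(len: u64) -> u64 {
    (len + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE
}

/// Hands out addresses for anonymous mappings by bumping a cursor.
struct MmapArena {
    next: u64,
}

impl MmapArena {
    /// Pick the (base, size) for an mmap request.
    /// With a non-zero hint the caller's address is used as-is (the MAP_FIXED case);
    /// otherwise the next free address is returned and the cursor moves past it.
    fn reserve(&mut self, hint: u64, len: u64) -> (u64, u64) {
        let size = align_len(len);
        if hint != 0 {
            return (hint, size);
        }
        let base = self.next;
        self.next += size;
        (base, size)
    }
}
```

A bump allocator like this never reuses addresses after munmap, which is wasteful but good enough for a short-lived 'Hello World' process.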
And we run make again.
Do you see where I am going? I just kept doing this for a few more syscalls until I saw the output 'Hello World' 💃🕺💃🕺
Uffff. That's a long post. Hope it's useful to someone. Please DM me if I made any boo-boo.