- Understand initial process state on process entry as specified by the SystemV x86-64 ABI.
- Build a no-std program to analyze & visualize the initial process state.
Before starting to implement a minimal dynamic linker, the first step is to understand the process initialization procedure.
This is important because when starting a dynamically-linked executable, control is first passed to the dynamic linker (interpreter) by the Linux Kernel, as mentioned in 01_dynamic_linking.
Once the dynamic linker is executing it needs to prepare the execution environment for the dynamically-linked executable. The dynamic linker's main tasks are:
- Load dependencies.
- Perform relocations.
- Run initialization routines.
After the execution environment is prepared the dynamic linker hands control to the user executable.
Due to all these requirements the dynamic linker must be a free-standing executable with no dependencies.
When launching an ELF executable the Linux Kernel maps in the memory segments from the ELF file and sets up some data on the stack according to the specification in the SystemV x86-64 ABI, chapter Initial Stack and Register State.
On process startup after execve(2) the stack looks as follows:
      +------------+ High Address
      |  ..        |
      |  ENV strs  |<-+
+---->|  ARG strs  |  |
|     |  ..        |  |
|     +------------+  |
|     |  ..        |  |
|     +------------+  |
|     |  AT_NULL   |  |
|     +------------+  |
|     |  AUXV      |  |
|     +------------+  |
|     |  0x0       |  |
|     +------------+  |
|     |  ENVP      |--+
|     +------------+
|     |  0x0       |
|     +------------+
+-----|  ARGV      |
      +------------+
$rsp->|  ARGC      |
      +------------+ Low Address
     | Offset (in bytes)     | Type                   | Description
-----+-----------------------+------------------------+--------------------
AUXV | &ENVP + 8*#ENVP + 8   | struct { uint64_t[2] } | Auxiliary Vector
0x0  | &ENVP + 8*#ENVP       |                        | 0 terminator (ENVP)
ENVP | &ARGV + 8*ARGC + 8    | const char* []         | Environment ptrs
0x0  | &ARGV + 8*ARGC        |                        | 0 terminator (ARGV)
ARGV | $rsp + 8              | const char* []         | Argument ptrs
ARGC | $rsp                  | uint64_t               | Argument count
- ARGV : const char* [] is an array of pointers to NUL-terminated strings holding the command line arguments. ARGV[0] is special as, by convention, it holds the path of the launched program.
- ARGC : uint64_t is the number of command line arguments + 1 (the count includes ARGV[0]).
- ENVP : const char* [] is an array of pointers to NUL-terminated strings holding the environment variables as seen by this process.
- AUXV : uint64_t[2] is the auxiliary vector providing additional information like the entry point or the program header of the program.
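The offsets from the table above can be tried out on a synthetic stack image without touching a real process stack. The following is only a minimal sketch: the StackView type and the stack_view helper are hypothetical names introduced for illustration, and every slot is treated as a plain uint64_t instead of a real pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type (not part of the original sources):
 * views into the data block the Kernel places on the stack. */
typedef struct {
    uint64_t        argc; /* ARGC                                   */
    const uint64_t* argv; /* ARGC slots followed by a 0 terminator  */
    const uint64_t* envp; /* #ENVP slots followed by a 0 terminator */
    size_t          envc; /* #ENVP, found by scanning for the 0     */
    const uint64_t* auxv; /* (tag, val) pairs, ended by AT_NULL     */
} StackView;

/* Apply the offsets from the table; `sp` plays the role of $rsp.
 * Pointer arithmetic on uint64_t* advances in steps of 8 bytes. */
static StackView stack_view(const uint64_t* sp) {
    StackView v;
    v.argc = sp[0];               /* ARGC at $rsp                */
    v.argv = sp + 1;              /* ARGV at $rsp + 8            */
    v.envp = v.argv + v.argc + 1; /* ENVP at &ARGV + 8*ARGC + 8  */
    v.envc = 0;
    while (v.envp[v.envc] != 0) { /* count entries up to the 0   */
        ++v.envc;
    }
    v.auxv = v.envp + v.envc + 1; /* AUXV at &ENVP + 8*#ENVP + 8 */
    return v;
}
```

For an image such as { 2, a0, a1, 0, e0, 0, ... } the helper places ARGV at slot 1, ENVP at slot 4 and AUXV at slot 6, matching the offsets in the table.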
The AUXV segment consists of consecutive AuxvEntry elements terminated by the AT_NULL element.
struct AuxvEntry {
    uint64_t tag;
    uint64_t val;
};
The Auxiliary Vector chapter in the x86-64 System V ABI specifies the following tags:
AT_NULL = 0
AT_IGNORE = 1
AT_EXECFD = 2
AT_PHDR = 3
AT_PHENT = 4
AT_PHNUM = 5
AT_PAGESZ = 6
AT_BASE = 7
AT_FLAGS = 8
AT_ENTRY = 9
AT_NOTELF = 10
AT_UID = 11
AT_EUID = 12
AT_GID = 13
AT_EGID = 14
Where AT_NULL is used to indicate the end of AUXV.
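To illustrate how the AT_NULL terminator is used, here is a minimal sketch of a tag lookup over an AUXV array. The helper name auxv_find is hypothetical, and the linear search is only one possible approach (a table indexed by tag works as well). The tag values are taken from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tag values as listed above (only the ones used in this sketch). */
#define AT_NULL   0
#define AT_PAGESZ 6
#define AT_ENTRY  9

struct AuxvEntry {
    uint64_t tag;
    uint64_t val;
};

/* Hypothetical helper: scan the entries until AT_NULL is reached.
 * Returns true and stores the value in `val` if `tag` is found. */
static bool auxv_find(const struct AuxvEntry* auxv, uint64_t tag, uint64_t* val) {
    for (; auxv->tag != AT_NULL; ++auxv) {
        if (auxv->tag == tag) {
            *val = auxv->val;
            return true;
        }
    }
    return false;
}
```

A caller distinguishes "tag not present" from "tag present with value 0" through the boolean return value.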
Regarding the state of general purpose registers on process entry, the x86-64 SystemV ABI states that all registers except the ones listed below are in an unspecified state:
- $rbp : content is unspecified, but user code should set it to zero to mark the deepest stack frame.
- $rsp : points to the beginning of the data block provided by the Kernel and is guaranteed to be 16-byte aligned at process entry.
- $rdx : function pointer that the application should register with atexit(BA_OS).
Not sure here if clearing $rbp is strictly required, as frame-pointer chaining is optional and can be omitted (gcc -fomit-frame-pointer).
Before exploring and visualizing the data passed by the Linux Kernel on the stack there is one more question to answer: How to run the first instruction in a process?
Typically when building a C program the user's entry point is the main function, however this won't contain the first instruction executed after process entry. This can be seen by extracting the entry point from the ELF header and checking it against the symbols in the program. Here the entry point is 0x1020, which belongs to the symbol _start and not main.
readelf -h main | grep Entry
Entry point address: 0x1020
nm main | grep '1020\|main'
0000000000001119 T main
0000000000001020 T _start
This is because by default the static linker adds some extra code & libraries to the program, for example the libc and the C-runtime (crt), which contains the _start symbol and hence the first instruction executed.
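The entry point address reported by readelf -h is the e_entry field of the ELF header. As a rough sketch, it can be read from an in-memory copy of the file using the Elf64_Ehdr definition from <elf.h>. The helper name elf64_entry is hypothetical, and the sketch assumes a 64-bit ELF file whose byte order matches the host (little-endian on x86-64).

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the ELF64 header out of `buf` and report
 * its entry point. Assumes the file's byte order matches the host. */
static bool elf64_entry(const void* buf, size_t len, uint64_t* entry) {
    Elf64_Ehdr eh;
    if (len < sizeof eh) {
        return false; /* too short to hold an ELF64 header */
    }
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) {
        return false; /* missing "\177ELF" magic */
    }
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) {
        return false; /* not a 64-bit ELF file */
    }
    *entry = eh.e_entry;
    return true;
}
```

Comparing the returned address against the symbol table (as done with nm above) then shows which symbol holds the first instruction.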
Passing --trace down to the static linker sheds some light onto which input files the static linker actually processes.
echo 'void main() {}' | gcc -x c -o /dev/null - -Wl,--trace
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/Scrt1.o
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/crti.o
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtbeginS.o
/tmp/ccjZdjYx.o
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/libgcc.a
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/libgcc_s.so
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/libc.so
/usr/lib/ld-linux-x86-64.so.2
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtendS.o
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/crtn.o
/tmp/ccjZdjYx.o is a temporary file created by the compiler containing the code echoed.
The static linker can be explicitly told to not include any default files by using the gcc -nostdlib argument.
echo 'void _start() {}' | gcc -x c -o /dev/null - -Wl,--trace -nostdlib
/tmp/ccbfkCoZ.o
Quoting man gcc
-nostdlib
Do not use the standard system startup files or libraries when linking.
With the capability to control the first instruction executed after process entry we can finally visualize the data passed by the Linux Kernel on the stack.
First we provide the symbol _start (default entry point) which saves a pointer to the Kernel data in $rdi and calls a function named entry.
The pointer is saved in $rdi because that's the register for the first argument of class INTEGER (SystemV ABI Function Arguments).
.section .text, "ax", @progbits
.global _start
_start:
    // Clear $rbp.
    xor rbp, rbp

    // Load ptr to Kernel data.
    lea rdi, [rsp]
    call entry
    ...
The full source code of the _start function is available in entry.S.
The pointer passed to the entry function can be used to compute ARGC, ARGV and ENVP accordingly.
void entry(uint64_t* prctx) {
    uint64_t argc = *prctx;
    const char** argv = (const char**)(prctx + 1);
    const char** envv = (const char**)(argv + argc + 1);
    ...
To collect the AUXV entries we first need to count the number of environment variables as follows.
// entry.c
    ...
    // Count the environment variables to find the end of ENVP.
    int envc = 0;
    for (const char** env = envv; *env; ++env) {
        ++envc;
    }

    // Table indexed by tag, zero-initialized.
    uint64_t auxv[AT_MAX_CNT];
    for (unsigned i = 0; i < AT_MAX_CNT; ++i) {
        auxv[i] = 0;
    }

    // AUXV starts right after the 0 terminator of ENVP.
    const Auxv64Entry* auxvp = (const Auxv64Entry*)(envv + envc + 1);
    for (; auxvp->tag != AT_NULL; ++auxvp) {
        if (auxvp->tag < AT_MAX_CNT) {
            auxv[auxvp->tag] = auxvp->val;
        }
    }
    ...
Finally the data can be printed as follows:
// entry.c
    ...
    pfmt("Got %d arg(s)\n", argc);
    for (const char** arg = argv; *arg; ++arg) {
        pfmt("\targ = %s\n", *arg);
    }

    const int max_env = 10;
    pfmt("Print first %d env var(s)\n", max_env - 1);
    // Stop after max_env - 1 entries so the loop matches the message above.
    for (const char** env = envv; *env && (env - envv < max_env - 1); ++env) {
        pfmt("\tenv = %s\n", *env);
    }

    pfmt("Print auxiliary vector\n");
    pfmt("\tAT_EXECFD: %ld\n", auxv[AT_EXECFD]);
    pfmt("\tAT_PHDR : %p\n", auxv[AT_PHDR]);
    pfmt("\tAT_PHENT : %ld\n", auxv[AT_PHENT]);
    pfmt("\tAT_PHNUM : %ld\n", auxv[AT_PHNUM]);
    pfmt("\tAT_PAGESZ: %ld\n", auxv[AT_PAGESZ]);
    pfmt("\tAT_BASE : %lx\n", auxv[AT_BASE]);
    pfmt("\tAT_FLAGS : %ld\n", auxv[AT_FLAGS]);
    pfmt("\tAT_ENTRY : %p\n", auxv[AT_ENTRY]);
    pfmt("\tAT_NOTELF: %lx\n", auxv[AT_NOTELF]);
    pfmt("\tAT_UID : %ld\n", auxv[AT_UID]);
    pfmt("\tAT_EUID : %ld\n", auxv[AT_EUID]);
    pfmt("\tAT_GID : %ld\n", auxv[AT_GID]);
    pfmt("\tAT_EGID : %ld\n", auxv[AT_EGID]);
    ...
The full source code of the entry function is available in entry.c.
Running the program as ./entry 1 2 3 4 yields the following output:
Got 5 arg(s)
arg = ./entry
arg = 1
arg = 2
arg = 3
arg = 4
Print first 9 env var(s)
env = I3SOCK=/run/user/1000/i3/ipc-socket.1200
env = LC_NAME=en_US.UTF-8
env = LC_NUMERIC=en_US.UTF-8
env = WINDOWID=46221701
env = LC_ADDRESS=en_US.UTF-8
env = GDM_LANG=en_US.utf8
env = PWD=/home/johannst/dev/dynld/02_process_init
env = MAIL=/var/spool/mail/johannst
env = XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session
Print auxiliary vector
AT_EXECFD: 0
AT_PHDR : 0x400040
AT_PHENT : 56
AT_PHNUM : 5
AT_PAGESZ: 4096
AT_BASE : 0
AT_FLAGS : 0
AT_ENTRY : 0x401000
AT_NOTELF: 0
AT_UID : 1000
AT_EUID : 1000
AT_GID : 1000
AT_EGID : 1000
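Values like these can be cross-checked from a regular libc-linked program. The sketch below walks from environ past the ENVP terminator to reach AUXV, mirroring the stack layout described earlier, and the result can be compared against getauxval(3). It assumes 64-bit Linux with glibc or musl, and that environ still points at the Kernel-provided array (no setenv or putenv call has happened yet). The helper names auxv_after_envp and auxv_get are hypothetical.

```c
#include <elf.h>
#include <stdint.h>
#include <sys/auxv.h>

extern char** environ;

/* Hypothetical helper: skip the environment pointers and their 0
 * terminator; AUXV starts in the slot right after it. Only valid
 * while `envp` still refers to the Kernel-provided array. */
static const Elf64_auxv_t* auxv_after_envp(char** envp) {
    while (*envp) {
        ++envp;
    }
    return (const Elf64_auxv_t*)(envp + 1);
}

/* Hypothetical helper: return the value for `tag`, or 0 if the tag
 * is not present before the AT_NULL terminator. */
static uint64_t auxv_get(const Elf64_auxv_t* auxv, uint64_t tag) {
    for (; auxv->a_type != AT_NULL; ++auxv) {
        if (auxv->a_type == tag) {
            return auxv->a_un.a_val;
        }
    }
    return 0;
}
```

Under these assumptions, auxv_get(auxv_after_envp(environ), AT_PAGESZ) is expected to agree with getauxval(AT_PAGESZ).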
- On process entry the Linux Kernel provides data on the stack as specified in the SystemV ABI.
- By default the static linker adds additional code which contains the _start symbol, the default process entry point.