|
| 1 | +# Dynamic Linking |
| 2 | + |
| 3 | +## Build dynamically linked shecc and programs |
| 4 | + |
| 5 | +Build the dynamically linked version of shecc. Note that shecc currently does not support dynamic linking for the RISC-V architecture: |
| 6 | + |
| 7 | +```shell |
| 8 | +$ make ARCH=arm DYNLINK=1 |
| 9 | +``` |
| 10 | + |
| 11 | +Next, you can use shecc to build dynamically linked programs by adding the `--dynlink` flag: |
| 12 | + |
| 13 | +```shell |
| 14 | +# Use the stage 0 compiler |
| 15 | +$ out/shecc --dynlink -o <output> <input.c> |
| 16 | +# Use the stage 1 or stage 2 compiler |
| 17 | +$ qemu-arm -L <LD_PREFIX> out/shecc-stage2.elf --dynlink -o <output> <input.c> |
| 18 | + |
| 19 | +# Execute the compiled program |
| 20 | +$ qemu-arm -L <LD_PREFIX> <output> |
| 21 | +``` |
| 22 | + |
| 23 | +When executing a dynamically linked program, set the ELF interpreter prefix (`<LD_PREFIX>`) so that `ld.so` can be found. It is typically `/usr/arm-linux-gnueabihf` if you installed the ARM GNU toolchain via `apt`. If you installed the toolchain manually, locate and specify the correct path instead. |
| 24 | + |
| 25 | +## Stack frame layout |
| 26 | + |
| 27 | +### Arm32 |
| 28 | + |
| 29 | +In both static and dynamic linking modes, the stack frame layout for each function can be illustrated as follows: |
| 30 | + |
| 31 | +``` |
| 32 | +High Address |
| 33 | ++------------------+ |
| 34 | +| incoming args | |
| 35 | ++------------------+ <- sp + total_size |
| 36 | +| saved lr | |
| 37 | ++------------------+ |
| 38 | +| saved r11 | |
| 39 | ++------------------+ |
| 40 | +| saved r10 | |
| 41 | ++------------------+ |
| 42 | +| saved r9 | |
| 43 | ++------------------+ |
| 44 | +| saved r8 | |
| 45 | ++------------------+ |
| 46 | +| saved r7 | |
| 47 | ++------------------+ |
| 48 | +| saved r6 | |
| 49 | ++------------------+ |
| 50 | +| saved r5 | |
| 51 | ++------------------+ |
| 52 | +| saved r4 | |
| 53 | ++------------------+ |
| 54 | +| (padding) | |
| 55 | ++------------------+ |
| 56 | +| local variables | |
| 57 | ++------------------+ <- sp + (MAX_PARAMS - MAX_ARGS_IN_REG) * 4 |
| 58 | +| outgoing args | |
| 59 | ++------------------+ <- sp (MUST be aligned to 8 bytes) |
| 60 | +Low Address |
| 61 | +``` |
| 62 | + |
| 63 | +* `total_size`: includes the size of the following elements: |
| 64 | + * `outgoing args`: a fixed size of `(MAX_PARAMS - MAX_ARGS_IN_REG) * 4` bytes |
| 65 | + * `local variables` |
| 66 | + * `saved r4-r11 and lr`: a fixed size of 36 bytes |
| 67 | + |
| 68 | +* Note that the space for `incoming args` belongs to the caller's stack frame, while the remaining space belongs to the callee's stack frame. |
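As a rough illustration of how `total_size` could be derived from the components listed above, here is a minimal C sketch. The values `MAX_PARAMS = 8` and `MAX_ARGS_IN_REG = 4`, the helper names `align8` and `frame_total_size`, and the rule that the padding rounds `total_size` up to a multiple of 8 are assumptions made for this example; they are not taken from shecc's source.

```c
#include <assert.h>

/* Assumed values, for illustration only; check shecc's own definitions. */
#define MAX_PARAMS 8
#define MAX_ARGS_IN_REG 4

#define OUTGOING_ARGS_SIZE ((MAX_PARAMS - MAX_ARGS_IN_REG) * 4)
#define SAVED_REGS_SIZE (9 * 4) /* r4-r11 and lr: 36 bytes */

/* Round n up to the next multiple of 8. */
int align8(int n)
{
    return (n + 7) & ~7;
}

/* Hypothetical helper: frame size for a function whose locals occupy
 * locals_size bytes. The padding is whatever keeps sp 8-byte aligned,
 * assuming sp was already 8-byte aligned on entry. */
int frame_total_size(int locals_size)
{
    assert(locals_size >= 0);
    return align8(OUTGOING_ARGS_SIZE + locals_size + SAVED_REGS_SIZE);
}
```

Under these assumptions, a function with 8 bytes of locals needs 16 + 8 + 36 = 60 bytes, which is padded by 4 bytes to give a 64-byte frame.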
| 69 | + |
| 70 | +### RISC-V |
| 71 | + |
| 72 | +(Currently not supported) |
| 73 | + |
| 74 | +## Calling convention |
| 75 | + |
| 76 | +### Arm32 |
| 77 | + |
| 78 | +Regardless of which mode is used, the caller performs the following operations to comply with the Arm Architecture Procedure Call Standard (AAPCS) when calling a function. |
| 79 | + |
| 80 | +* The first four arguments are put into registers `r0` - `r3`. |
| 81 | +* Any additional arguments are passed on the stack. Arguments are pushed onto the stack starting from the last argument, so the fifth argument resides at a lower address and the last argument at a higher address. |
| 82 | +* Align the stack pointer to 8 bytes, as external functions may access 8-byte objects that require such alignment. |
| 83 | + |
| 84 | +Then, the callee will perform these operations: |
| 85 | + |
| 86 | +* Preserve the contents of registers `r4` - `r11` on the stack upon function entry. |
| 87 | + * The callee also pushes the content of `lr` onto the stack to preserve the return address; however, this operation is not required by the AAPCS. |
| 88 | + |
| 89 | +* Restore these registers from the stack upon returning. |
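The caller-side rules above can be summarized with a small C sketch that reports where a given argument is placed. The `arg_slot` structure and the `locate_arg` helper are hypothetical names introduced for this example, and the sketch assumes every argument occupies one 4-byte word.

```c
#include <assert.h>

#define MAX_ARGS_IN_REG 4 /* r0-r3, per the AAPCS */

/* Where the caller places one argument. */
struct arg_slot {
    int in_register; /* 1: passed in r<index>; 0: passed on the stack */
    int index;       /* register number, or byte offset from sp at the call */
};

/* Hypothetical helper: locate the n-th argument (0-based), assuming
 * every argument occupies one 4-byte word. */
struct arg_slot locate_arg(int n)
{
    struct arg_slot slot;
    assert(n >= 0);
    if (n < MAX_ARGS_IN_REG) {
        slot.in_register = 1;
        slot.index = n;
    } else {
        slot.in_register = 0;
        slot.index = (n - MAX_ARGS_IN_REG) * 4;
    }
    return slot;
}
```

For a call with six arguments, the fifth argument ends up at `[sp]` and the sixth at `[sp, #4]`, matching the "fifth argument at a lower address" rule.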
| 90 | + |
| 91 | +### RISC-V |
| 92 | + |
| 93 | +In the RISC-V architecture, registers `a0` - `a7` are used as argument registers; that is, the first eight arguments are passed into these registers. |
| 94 | + |
| 95 | +Since the current implementation of shecc supports up to 8 arguments, no argument needs to be passed onto the stack. |
| 96 | + |
| 97 | +## Runtime execution flow of a dynamically linked program |
| 98 | + |
| 99 | +``` |
| 100 | + | +---------------------------+ |
| 101 | + | | program | |
| 102 | + | +-------------+ +----------------+ | | |
| 103 | + | | shell | | Dynamic linker | | +--------+ +----------+ | |
| 104 | +userspace | | | | +------+->| entry | | main | | |
| 105 | + | | $ ./program | | (ld.so) | | | point | | function | | |
| 106 | +program | +-----+-------+ +----------------+ | +-+------+ +-----+----+ | |
| 107 | + | | ^ | | ^ | | |
| 108 | + | | | +----+---------+----+-------+ |
| 109 | + | | | | | | |
| 110 | + | | | | | | |
| 111 | +----------+-------+---------------------------------------------+--------------------+---------+----+---------------------- |
| 112 | + | | | | | | |
| 113 | + | v | v | v |
| 114 | + | +-------+ (It may be another | +-------------+-----+ +------+ |
| 115 | +glibc | | execl | | | __libc_start_main +--->| exit | |
| 116 | + | +---+---+ equivalent call) | +-------------------+ +---+--+ |
| 117 | + | | | | |
| 118 | +----------+-------+---------------------------------------------+---------------------------------------------+------------ |
| 119 | +system | | | | |
| 120 | + | v | v |
| 121 | +call | +------+ (It may be another | +-------+ |
| 122 | + | | exec | | | _exit | |
| 123 | +interface | +---+--+ equivalent syscall) | +---+---+ |
| 124 | + | | | | |
| 125 | +----------+-------+---------------------------------------------+---------------------------------------------+------------ |
| 126 | + | | | | |
| 127 | + | v | v |
| 128 | + | +--------------+ +---------------+ +--------+-------------+ +---------------+ |
| 129 | + | | Validate the | | Create a new | | Startup the kernel's | | Delete the | |
| 130 | +kernel | | +--->| +--->| | | | |
| 131 | + | | executable | | process image | | program loader | | process image | |
| 132 | + | +--------------+ +---------------+ +----------------------+ +---------------+ |
| 133 | +``` |
| 134 | + |
| 135 | +1. A running process (e.g., a shell) executes the specified program (`program`), which is dynamically linked. |
| 136 | +2. The kernel validates the executable and creates a process image if the validation passes. |
| 137 | +3. The kernel's program loader invokes the dynamic linker (`ld.so`). |
| 138 | + * For the Arm architecture, the dynamic linker is `/lib/ld-linux-armhf.so.3`. |
| 139 | +4. The dynamic linker loads shared libraries such as `libc.so`. |
| 140 | +5. The dynamic linker resolves symbols and fills in the global offset table (GOT). |
| 141 | +6. Control transfers to the program, which starts at the entry point. |
| 142 | +7. The program calls `__libc_start_main` at the beginning. |
| 143 | +8. `__libc_start_main` calls the *main wrapper*, which pushes registers `r4`-`r11` and `lr` onto the stack, sets up a global stack for all global variables (excluding read-only variables), and initializes them. |
| 144 | +9. The *main wrapper* then invokes the `main` function. |
| 145 | +10. After the `main` function returns, the *main wrapper* restores the necessary registers and passes control back to `__libc_start_main`, which implicitly calls `exit(3)` to terminate the program. |
| 146 | + * Alternatively, the `main` function can call `exit(3)` or `_exit(2)` to terminate the program directly. |
| 147 | + |
| 148 | +## Dynamic sections |
| 149 | + |
| 150 | +When using dynamic linking, the following sections are generated for compiled programs: |
| 151 | + |
| 152 | +1. `.interp` - Path to dynamic linker |
| 153 | +2. `.dynsym` - Dynamic symbol table |
| 154 | +3. `.dynstr` - Dynamic string table |
| 155 | +4. `.rel.plt` - PLT relocations |
| 156 | +5. `.plt` - Procedure Linkage Table |
| 157 | +6. `.got` - Global Offset Table |
| 158 | +7. `.dynamic` - Dynamic linking information |
| 159 | + |
| 160 | +### Initialization of all GOT entries |
| 161 | + |
| 162 | +* `GOT[0]` is set to the starting address of the `.dynamic` section. |
| 163 | +* `GOT[1]` and `GOT[2]` are initialized to zero and reserved for the `link_map` and the resolver (`_dl_runtime_resolve`). |
| 164 | + * The dynamic linker modifies them to point to the actual addresses at runtime. |
| 165 | +* `GOT[3]` - `GOT[N]` are initially set to the address of `PLT[0]` at compile time, causing the first call to an external function to invoke the resolver at runtime. |
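The initial GOT contents described above could be produced by a routine along the following lines. This is only a sketch: the `init_got` helper, its parameters, and the `GOT_RESERVED` macro are hypothetical names chosen for this example and do not come from shecc's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GOT_RESERVED 3 /* GOT[0], GOT[1], GOT[2] */

/* Hypothetical helper: fill the compile-time image of the GOT for
 * n_funcs external functions. got must have room for
 * GOT_RESERVED + n_funcs 32-bit entries. */
void init_got(uint32_t *got, size_t n_funcs,
              uint32_t dynamic_addr, uint32_t plt0_addr)
{
    size_t i;
    assert(got != NULL);
    got[0] = dynamic_addr; /* start of the .dynamic section */
    got[1] = 0;            /* link_map, filled in by ld.so at runtime */
    got[2] = 0;            /* resolver, filled in by ld.so at runtime */
    for (i = 0; i < n_funcs; i++)
        got[GOT_RESERVED + i] = plt0_addr; /* first call lands in PLT[0] */
}
```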
| 166 | + |
| 167 | +### Explanation for PLT stubs (Arm32) |
| 168 | + |
| 169 | +Under the Arm architecture, the resolver assumes that the following three conditions are met: |
| 170 | + |
| 171 | +* `[sp]` contains the return address from the original function call. |
| 172 | +* `ip` stores the address of the callee's GOT entry. |
| 173 | +* `lr` stores the address of `GOT[2]`. |
| 174 | + |
| 175 | +Therefore, the first entry (`PLT[0]`) contains the following instructions to satisfy the first and third requirements, and then to invoke the resolver. |
| 176 | + |
| 177 | +``` |
| 178 | +push {lr} @ (str lr, [sp, #-4]!) |
| 179 | +movw sl, #:lower16:(&GOT[2]) |
| 180 | +movt sl, #:upper16:(&GOT[2]) |
| 181 | +mov lr, sl |
| 182 | +ldr pc, [lr] |
| 183 | +``` |
| 184 | + |
| 185 | +1. Push register `lr` onto the stack. |
| 186 | +2. Set register `sl` to the address of `GOT[2]`. |
| 187 | +3. Move the value of `sl` to `lr`. |
| 188 | +4. Load the value located at `[lr]` (the resolver's address stored in `GOT[2]`) into the program counter (`pc`). |
| 189 | + |
| 190 | +The remaining PLT entries correspond to all external functions, and each entry includes the following instructions to fulfill the second requirement: |
| 191 | + |
| 192 | +``` |
| 193 | +movw ip, #:lower16:(&GOT[x]) |
| 194 | +movt ip, #:upper16:(&GOT[x]) |
| 195 | +ldr pc, [ip] |
| 196 | +``` |
| 197 | + |
| 198 | +1. Set register `ip` to the address of `GOT[x]`. |
| 199 | +2. Load the value of `GOT[x]` into register `pc`. That is, jump to the callee once its GOT entry has been resolved, or to `PLT[0]` before that. |
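Each `movw`/`movt` pair in the stubs above loads one 32-bit address in two 16-bit halves. The following C sketch shows how the halves selected by `:lower16:` and `:upper16:` map onto the A32 instruction encodings. The function names are made up for this example, and shecc's own instruction emitters may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

#define REG_IP 12 /* ip is r12 */

/* The :lower16: and :upper16: operators select these halves. */
uint16_t lower16(uint32_t addr) { return (uint16_t) (addr & 0xFFFF); }
uint16_t upper16(uint32_t addr) { return (uint16_t) (addr >> 16); }

/* A32 encoding of "movw rd, #imm16" with condition AL. */
uint32_t encode_movw(int rd, uint16_t imm16)
{
    assert(rd >= 0 && rd <= 15);
    return 0xE3000000u | ((uint32_t) (imm16 >> 12) << 16) |
           ((uint32_t) rd << 12) | (imm16 & 0xFFFu);
}

/* A32 encoding of "movt rd, #imm16" with condition AL. */
uint32_t encode_movt(int rd, uint16_t imm16)
{
    assert(rd >= 0 && rd <= 15);
    return 0xE3400000u | ((uint32_t) (imm16 >> 12) << 16) |
           ((uint32_t) rd << 12) | (imm16 & 0xFFFu);
}
```

For a GOT entry at the made-up address `0x56781234`, the stub would contain `encode_movw(REG_IP, 0x1234)` followed by `encode_movt(REG_IP, 0x5678)`.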
| 200 | + |
| 201 | +## PLT execution path and performance overhead |
| 202 | + |
| 203 | +Since calling an external function needs a PLT stub for indirect invocation, the execution path of the first call is as follows: |
| 204 | + |
| 205 | +1. Call the corresponding PLT stub of the external function. |
| 206 | +2. The PLT stub reads the GOT entry. |
| 207 | +3. Since the GOT entry is initially set to point to the first PLT entry, the call jumps to `PLT[0]`, which in turn calls the resolver. |
| 208 | +4. The resolver handles the symbol and updates the GOT entry. |
| 209 | +5. Jump to the actual function to continue execution. |
| 210 | + |
| 211 | +For subsequent calls, the execution path only performs steps 1, 2 and 5. Whether it is the first call or a subsequent one, calling an external function requires executing additional instructions. Compared to a direct call, the overhead amounts to 3-8 instructions: 3 for the PLT stub on every call, plus the 5 instructions of `PLT[0]` on the first call (not counting the work done inside the resolver). |
| 212 | + |
| 213 | +For a bootstrapping compiler, this overhead is acceptable. |
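The difference between the first call and later calls can be modelled in plain C with a function pointer standing in for a GOT entry. This is a conceptual sketch only: all names are hypothetical, and a real resolver looks up the symbol by name through the dynamic linker rather than hard-coding the target.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

/* Stands in for an external function, such as one from libc. */
int real_function(int x) { return 2 * x; }

int resolver_calls; /* how many times the slow path ran */

int resolve_then_call(int x);

/* Models one GOT entry; it starts out pointing at the slow path,
 * just as GOT[3..N] start out pointing at PLT[0]. */
func_t got_entry = resolve_then_call;

/* Models PLT[0] plus the resolver: bind the symbol, patch the GOT
 * entry, then continue into the real function. */
int resolve_then_call(int x)
{
    resolver_calls++;
    got_entry = real_function;
    return got_entry(x);
}

/* Models the per-function PLT stub: an indirect jump through the GOT. */
int call_via_plt(int x)
{
    assert(got_entry != NULL);
    return got_entry(x);
}
```

Calling `call_via_plt` twice runs the slow path only once; the second call goes straight through the patched entry.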
| 214 | + |
| 215 | +## Binding |
| 216 | + |
| 217 | +Each external function must be relocated via the resolver; in other words, each symbol needs to be **bound** to its actual address. |
| 218 | + |
| 219 | +There are two types of binding: |
| 220 | + |
| 221 | +### Lazy binding |
| 222 | + |
| 223 | +The dynamic linker defers function call resolution until the function is called at runtime. |
| 224 | + |
| 225 | +### Immediate binding |
| 226 | + |
| 227 | +The dynamic linker resolves all symbols when the program is started, or when the shared library is loaded via `dlopen`. |
| 228 | + |
| 229 | +## Limitations |
| 230 | + |
| 231 | +For the current implementation of dynamic linking, note the following: |
| 232 | + |
| 233 | +* The GOT is located in a writable segment (`.data` segment). |
| 234 | +* The `PT_GNU_RELRO` program header has not yet been implemented. |
| 235 | +* `DT_BIND_NOW` (force immediate binding) is not set. |
| 236 | + |
| 237 | +This implies that: |
| 238 | + |
| 239 | +* GOT entries can be modified at runtime, which may create a potential ROP (Return-Oriented Programming) attack vector. |
| 240 | +* Function pointers (GOT entries) might be hijacked due to the absence of full RELRO protection. |
| 241 | + |
| 242 | +## Reference |
| 243 | + |
| 244 | +* man page: `ld(1)` |
| 245 | +* man page: `ld.so(8)` |
| 246 | +* glibc - [`_dl_runtime_resolve`](https://elixir.bootlin.com/glibc/glibc-2.41.9000/source/sysdeps/arm/dl-trampoline.S#L30) implementation (for Arm32) |
| 247 | +* Application Binary Interface for the Arm Architecture - [`abi-aa`](https://github.com/ARM-software/abi-aa) |
| 248 | + * `aaelf32` |
| 249 | + * `aapcs32` |