
Commit d8e1d52

[WIP] Add a new document to explain dynamic linking
Since the compiler supports both static linking and dynamic linking, this commit adds a new document to explain the following:

- Describe how to build the dynamic linking version of shecc.
- Stack frame layout for ARM.
- Calling convention for ARM and RISC-V.
- Runtime execution flow.
- Dynamic sections.
- PLT execution path and performance overhead.
- Lazy binding and immediate binding.
- Limitation of the current dynamic linking implementation.
1 parent 39acfa5 commit d8e1d52


1 file changed (+249 -0 lines)


docs/dynamic-linking.md

# Dynamic Linking

## Build dynamically linked shecc and programs

Build the dynamically linked version of shecc. Note that shecc currently does not support dynamic linking for the RISC-V architecture:

```shell
$ make ARCH=arm DYNLINK=1
```

Next, you can use shecc to build dynamically linked programs by adding the `--dynlink` flag:

```shell
# Use the stage 0 compiler
$ out/shecc --dynlink -o <output> <input.c>
# Use the stage 1 or stage 2 compiler
$ qemu-arm -L <LD_PREFIX> out/shecc-stage2.elf --dynlink -o <output> <input.c>

# Execute the compiled program
$ qemu-arm -L <LD_PREFIX> <output>
```

When executing a dynamically linked program, set the ELF interpreter prefix (`<LD_PREFIX>`) so that `ld.so` can be found. It is usually `/usr/arm-linux-gnueabihf` if you installed the Arm GNU toolchain via `apt`. If you installed the toolchain manually, find and specify the correct path instead.

## Stack frame layout

### Arm32

In both static and dynamic linking modes, the stack frame of each function is laid out as follows:

```
    High Address
+------------------+
|  incoming args   |
+------------------+ <- sp + total_size
|     saved lr     |
+------------------+
|    saved r11     |
+------------------+
|    saved r10     |
+------------------+
|     saved r9     |
+------------------+
|     saved r8     |
+------------------+
|     saved r7     |
+------------------+
|     saved r6     |
+------------------+
|     saved r5     |
+------------------+
|     saved r4     |
+------------------+
|    (padding)     |
+------------------+
| local variables  |
+------------------+ <- sp + (MAX_PARAMS - MAX_ARGS_IN_REG) * 4
|  outgoing args   |
+------------------+ <- sp (MUST be aligned to 8 bytes)
    Low Address
```

* `total_size` includes the size of the following elements:
  * `outgoing args`: a fixed size of `(MAX_PARAMS - MAX_ARGS_IN_REG) * 4` bytes
  * `local variables`
  * `padding`: as many bytes as needed to keep `sp` aligned to 8 bytes
  * `saved r4-r11 and lr`: a fixed size of 36 bytes

* Note that the space for `incoming args` belongs to the caller's stack frame, while the remaining space belongs to the callee's stack frame.
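
The `total_size` arithmetic can be sketched in C. This is a minimal sketch, not shecc's code. The values `MAX_PARAMS = 8` and `MAX_ARGS_IN_REG = 4`, the helper `compute_frame`, and the `frame_layout_t` struct are hypothetical. The sketch also assumes the padding sits between the local variables and `saved r4`, as the diagram shows.

```c
/* Hypothetical values: at most 8 parameters, 4 of them passed in r0-r3. */
#define MAX_PARAMS 8
#define MAX_ARGS_IN_REG 4
/* r4-r11 and lr: 9 registers * 4 bytes. */
#define SAVED_REGS_SIZE 36

typedef struct {
    int outgoing_size;   /* [sp, sp + outgoing_size): outgoing args */
    int locals_offset;   /* locals start right above the outgoing args */
    int padding;         /* bytes inserted to keep sp 8-byte aligned */
    int saved_r4_offset; /* lowest saved register, relative to sp */
    int saved_lr_offset; /* highest saved register, relative to sp */
    int total_size;      /* distance from sp to the incoming args */
} frame_layout_t;

static int align_up(int value, int align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical helper: lay out one Arm32 frame with `locals_size` bytes of locals. */
static frame_layout_t compute_frame(int locals_size)
{
    frame_layout_t f;
    int unpadded;

    f.outgoing_size = (MAX_PARAMS - MAX_ARGS_IN_REG) * 4;
    f.locals_offset = f.outgoing_size;
    unpadded = f.outgoing_size + locals_size + SAVED_REGS_SIZE;
    /* The caller's sp is 8-byte aligned, so total_size must be a multiple of 8. */
    f.total_size = align_up(unpadded, 8);
    f.padding = f.total_size - unpadded;
    f.saved_r4_offset = f.total_size - SAVED_REGS_SIZE;
    f.saved_lr_offset = f.total_size - 4;
    return f;
}
```

For example, with 8 bytes of locals the frame is 64 bytes: 16 bytes of outgoing args, 8 bytes of locals, 4 bytes of padding, and 36 bytes of saved registers. In that frame, `saved r4` is at `sp + 28` and `saved lr` is at `sp + 60`.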

### RISC-V

(Currently not supported)

## Calling Convention

### Arm32

Regardless of which linking mode is used, the caller performs the following operations to comply with the Procedure Call Standard for the Arm Architecture (AAPCS) when calling a function:

* The first four arguments are passed in registers `r0` - `r3`.
* Any additional arguments are passed on the stack. They are pushed starting from the last argument, so the fifth argument resides at the lowest address and the last argument at the highest address.
* The stack pointer is aligned to 8 bytes, because external functions may access 8-byte objects that require such alignment.
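
The caller-side rules can be modeled as a small C function that maps a zero-based argument index to its location. This is a hypothetical helper (`arm_arg_location`, `arg_loc_t`, `align_sp`) and is not part of shecc. It assumes that every argument occupies one 4-byte word and that the stack arguments start at the caller's `sp`, which matches the `outgoing args` area in the frame diagram.

```c
#include <stdint.h>

/* Where one argument lives at the moment of the call (hypothetical type). */
typedef struct {
    int in_reg;       /* 1 if passed in a register */
    int reg;          /* register number (r0-r3) when in_reg is 1 */
    int stack_offset; /* byte offset from sp when in_reg is 0 */
} arg_loc_t;

/* Map a zero-based argument index to its location, assuming every
 * argument is a single 4-byte word. */
static arg_loc_t arm_arg_location(int index)
{
    arg_loc_t loc = {0, -1, -1};

    if (index < 4) {
        loc.in_reg = 1;
        loc.reg = index; /* r0-r3 */
    } else {
        /* The fifth argument (index 4) sits at [sp]; later ones at higher addresses. */
        loc.stack_offset = (index - 4) * 4;
    }
    return loc;
}

/* Round a stack pointer down to an 8-byte boundary (the stack grows downward). */
static uint32_t align_sp(uint32_t sp)
{
    return sp & ~(uint32_t) 7;
}
```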

Then, the callee performs these operations:

- Preserve the contents of registers `r4` - `r11` on the stack upon function entry.
- Push the content of `lr` onto the stack as well, to preserve the return address. The AAPCS does not require this step.
- Restore these registers from the stack before returning.

### RISC-V

In the RISC-V architecture, registers `a0` - `a7` are used as argument registers; that is, the first eight arguments are passed in these registers.

Since the current implementation of shecc supports at most 8 arguments, no argument ever needs to be passed on the stack.
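
For comparison, the RISC-V rule reduces to a table lookup. The following is a hypothetical helper, not shecc's code, and it assumes at most eight arguments as stated above:

```c
#include <stddef.h>

/* Hypothetical helper: name of the RISC-V register that carries argument
 * `index`, or NULL if the argument would have to go on the stack (which the
 * current shecc implementation never needs, since it supports at most 8). */
static const char *riscv_arg_reg(int index)
{
    static const char *const regs[] = {"a0", "a1", "a2", "a3",
                                       "a4", "a5", "a6", "a7"};

    if (index < 0 || index >= 8)
        return NULL;
    return regs[index];
}
```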

## Runtime execution flow of a dynamically linked program

```
          |                                                           +------------------------------------+
          |                                                           |              program               |
          | +-------------+                       +----------------+  | +----------+   +----------------+  |
userspace | |    shell    |                       | Dynamic linker +--+>|  entry   |   |      main      |  |
program   | | $ ./program |                       |    (ld.so)     |  | |  point   |   |    function    |  |
          | +------+------+                       +-------+--------+  | +----+-----+   +--+----------+--+  |
          |        |                                      ^           |      |            ^          |     |
          |        |                                      |           +------+------------+----------+-----+
          |        |                                      |                  |            |          |
----------+--------+--------------------------------------+------------------+------------+----------+----------
          |        v                                      |                  v            |          v
          |    +---+---+                                  |             +----+------------+---+   +--+---+
glibc     |    | execl |  (It may be another              |             | __libc_start_main   +-->| exit |
          |    +---+---+  equivalent call)                |             +---------------------+   +--+---+
          |        |                                      |                                          |
----------+--------+--------------------------------------+------------------------------------------+----------
          |        v                                      |                                          v
system    |     +--+---+                                  |                                      +---+---+
call      |     | exec |  (It may be another              |                                      | _exit |
interface |     +--+---+  equivalent syscall)             |                                      +---+---+
          |        |                                      |                                          |
----------+--------+--------------------------------------+------------------------------------------+----------
          |        v                                      |                                          v
          | +------+-------+   +---------------+   +------+---------------+                  +-------+-------+
          | | Validate the |   | Create a new  |   | Startup the kernel's |                  |   Delete the  |
kernel    | |  executable  +-->| process image +-->|    program loader    |                  | process image |
          | +--------------+   +---------------+   +----------------------+                  +---------------+
```

1. A running process (e.g., a shell) executes the specified program (`program`), which is dynamically linked.
2. The kernel validates the executable and, if the validation passes, creates a new process image.
3. The kernel's program loader invokes the dynamic linker (`ld.so`).
   * For the Arm architecture, the dynamic linker is `/lib/ld-linux-armhf.so.3`.
4. The dynamic linker loads shared libraries such as `libc.so`.
5. The dynamic linker performs relocations and initializes the global offset table (GOT). With lazy binding, function symbols are resolved later, on their first call (see Binding).
6. Control transfers to the program, which starts at its entry point.
7. The entry point calls `__libc_start_main`.
8. `__libc_start_main` calls the *main wrapper*. The wrapper pushes registers `r4`-`r11` and `lr` onto the stack, sets up a global stack for all global variables (excluding read-only variables), and initializes them.
9. After this setup, the *main wrapper* invokes the `main` function.
10. After `main` returns, the *main wrapper* restores the saved registers and passes control back to `__libc_start_main`, which implicitly calls `exit(3)` to terminate the program.
    * Alternatively, `main` can call `exit(3)` or `_exit(2)` to terminate the program directly.
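
Steps 3 to 5 can be observed from inside a running program with glibc's `dl_iterate_phdr(3)`. This function visits every object that has been loaded into the process, starting with the main program. The following is a minimal sketch that assumes a glibc host. It is not part of shecc, and `count_loaded_objects` is a hypothetical name. In a dynamically linked program, the printed list typically also contains the shared libraries (such as `libc.so`) and the dynamic linker itself.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object; the main program comes first. */
static int count_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;

    (void) size;
    printf("%s\n", info->dlpi_name[0] ? info->dlpi_name : "(main program)");
    (*count)++;
    return 0; /* zero: keep iterating */
}

/* Hypothetical helper: print and count the objects mapped into this process. */
static int count_loaded_objects(void)
{
    int count = 0;

    dl_iterate_phdr(count_object, &count);
    return count;
}
```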

## Dynamic sections

When dynamic linking is used, the following sections are generated for compiled programs:

1. `.interp` - Path to the dynamic linker
2. `.dynsym` - Dynamic symbol table
3. `.dynstr` - Dynamic string table
4. `.rel.plt` - PLT relocations
5. `.plt` - Procedure Linkage Table
6. `.got` - Global Offset Table
7. `.dynamic` - Dynamic linking information
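
The `.dynamic` section ties the other sections together. It is an array of tag/value pairs, and several standard tags point at sections in the list above. The mapping below uses the tag constants from `<elf.h>`. The helper `section_for_tag` is hypothetical, and this sketch does not establish which tags shecc actually emits.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: which of the sections above a dynamic tag points at.
 * .interp is located through the PT_INTERP program header instead, and no
 * dynamic tag refers to .plt, so neither appears here. */
static const char *section_for_tag(Elf32_Sword tag)
{
    switch (tag) {
    case DT_STRTAB: return ".dynstr";  /* dynamic string table */
    case DT_SYMTAB: return ".dynsym";  /* dynamic symbol table */
    case DT_JMPREL: return ".rel.plt"; /* PLT relocations (DT_PLTREL == DT_REL) */
    case DT_PLTGOT: return ".got";     /* GOT base, i.e. &GOT[0] */
    default:        return NULL;
    }
}
```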

### Initialization of all GOT entries

* `GOT[0]` is set to the starting address of the `.dynamic` section.
* `GOT[1]` and `GOT[2]` are initialized to zero and reserved for the `link_map` and the resolver (`_dl_runtime_resolve`), respectively.
  * The dynamic linker modifies them to point to the actual addresses at runtime.
* `GOT[3]` - `GOT[N]` are initially set to the address of `PLT[0]` at compile time. As a result, the first call to an external function invokes the resolver at runtime.
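
The initialization rules can be restated as a short C function over a 32-bit GOT image. The helper name `init_got` and the addresses in the test are hypothetical. The code only restates the rules above and is not shecc's implementation.

```c
#include <stdint.h>

/* Hypothetical helper: fill a GOT image that has n_funcs external-function
 * slots, following the compile-time rules above. */
static void init_got(uint32_t *got, int n_funcs, uint32_t dynamic_addr,
                     uint32_t plt0_addr)
{
    got[0] = dynamic_addr; /* start of .dynamic */
    got[1] = 0;            /* link_map: filled in by ld.so at runtime */
    got[2] = 0;            /* resolver: filled in by ld.so at runtime */
    /* GOT[3]..GOT[N]: every slot initially sends its caller to PLT[0]. */
    for (int i = 0; i < n_funcs; i++)
        got[3 + i] = plt0_addr;
}
```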

### Explanation for PLT stubs (Arm32)

Under the Arm architecture, the resolver assumes that the following three conditions are met:

* `[sp]` contains the return address of the original function call.
* `ip` stores the address of the callee's GOT entry.
* `lr` stores the address of `GOT[2]`.

Therefore, the first entry (`PLT[0]`) contains the following instructions. They satisfy the first and third requirements and then invoke the resolver:

```
push {lr}                       @ (str lr, [sp, #-4]!)
movw sl, #:lower16:(&GOT[2])
movt sl, #:upper16:(&GOT[2])
mov lr, sl
ldr pc, [lr]
```

1. Push register `lr` onto the stack.
2. Set register `sl` to the address of `GOT[2]`.
3. Move the value of `sl` to `lr`.
4. Load the value located at `[lr]` (the resolver address stored in `GOT[2]`) into the program counter (`pc`).
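
The following C sketch encodes `PLT[0]` as A32 machine words for a given `&GOT[2]`. The helper names are hypothetical, and this is not shecc's code generator. The encodings follow the A1 forms of `MOVW`, `MOVT`, `MOV` (register), `STR` (pre-indexed with writeback), and `LDR` (immediate), all with the AL condition.

```c
#include <stdint.h>

#define REG_SL 10

/* movw rd, #imm16: cond=AL | 0011 0000 | imm4 | rd | imm12 */
static uint32_t enc_movw(int rd, uint16_t imm)
{
    return 0xE3000000u | ((uint32_t) (imm >> 12) << 16) |
           ((uint32_t) rd << 12) | (imm & 0xFFFu);
}

/* movt rd, #imm16: same layout with opcode bits 0011 0100 */
static uint32_t enc_movt(int rd, uint16_t imm)
{
    return 0xE3400000u | ((uint32_t) (imm >> 12) << 16) |
           ((uint32_t) rd << 12) | (imm & 0xFFFu);
}

/* Hypothetical helper: emit the five instruction words of PLT[0]. */
static void emit_plt0(uint32_t got2_addr, uint32_t out[5])
{
    out[0] = 0xE52DE004u;                          /* push {lr} == str lr, [sp, #-4]! */
    out[1] = enc_movw(REG_SL, got2_addr & 0xFFFF); /* movw sl, #:lower16:(&GOT[2]) */
    out[2] = enc_movt(REG_SL, got2_addr >> 16);    /* movt sl, #:upper16:(&GOT[2]) */
    out[3] = 0xE1A0E00Au;                          /* mov lr, sl */
    out[4] = 0xE59EF000u;                          /* ldr pc, [lr] */
}
```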

Each remaining PLT entry corresponds to one external function and contains the following instructions, which fulfill the second requirement:

```
movw ip, #:lower16:(&GOT[x])
movt ip, #:upper16:(&GOT[x])
ldr pc, [ip]
```

1. Set register `ip` to the address of `GOT[x]`.
2. Load the value of `GOT[x]` into `pc`, that is, jump to the address stored in the GOT entry. Before the first call, this address is `PLT[0]`. After resolution, it is the callee itself.
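
`PLT[0]` is five instructions (20 bytes) and every other entry is three instructions (12 bytes). Therefore, the address of each stub, and of the GOT slot it loads, follows from simple arithmetic. The sketch below makes two assumptions about shecc's layout: the entries are placed back to back with no padding, and the k-th external function (counting from 0) uses `PLT[k + 1]` and `GOT[k + 3]`. The helper names are hypothetical.

```c
#include <stdint.h>

#define PLT0_SIZE 20      /* push, movw, movt, mov, ldr: 5 instructions */
#define PLT_ENTRY_SIZE 12 /* movw, movt, ldr: 3 instructions */
#define GOT_RESERVED 3    /* GOT[0..2] */

/* Address of the PLT stub through which the k-th external function is called. */
static uint32_t plt_entry_addr(uint32_t plt_base, int k)
{
    return plt_base + PLT0_SIZE + (uint32_t) k * PLT_ENTRY_SIZE;
}

/* Address that the stub loads into ip, i.e. &GOT[k + 3]. */
static uint32_t got_slot_addr(uint32_t got_base, int k)
{
    return got_base + (uint32_t) (GOT_RESERVED + k) * 4;
}
```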

## PLT execution path and performance overhead

Calling an external function requires a PLT stub for indirect invocation. The first call follows this execution path:

1. Call the corresponding PLT stub of the external function.
2. The PLT stub reads the GOT entry.
3. Since the GOT entry initially points to the first PLT entry, the call jumps to `PLT[0]`, which in turn calls the resolver.
4. The resolver looks up the symbol and updates the GOT entry with its actual address.
5. Jump to the actual function to continue execution.

Subsequent calls perform only steps 1, 2, and 5. Whether it is the first call or a subsequent one, calling an external function requires executing additional instructions compared to a direct call. The overhead is 3 to 8 instructions. A subsequent call executes only the three-instruction PLT stub. The first call also runs the five instructions of `PLT[0]`, not counting the resolver itself.

For a bootstrapping compiler, this overhead is acceptable.
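
The first-call and subsequent-call paths can be simulated in plain C, with a function-pointer table standing in for the GOT. This is only an analogy and does not reflect how `ld.so` is implemented. In the simulation, `plt_square` plays the PLT stub, and the initial slot value `resolve_square` plays `PLT[0]` plus the resolver. All names are hypothetical.

```c
/* A one-slot "GOT": function pointers that the "PLT stub" jumps through. */
typedef int (*fn_t)(int);

static int resolver_calls; /* how many times the slow path ran */

static int real_square(int x) { return x * x; }
static int resolve_square(int x);

/* Initially the slot points at the resolver path, like GOT[3..N] -> PLT[0]. */
static fn_t got[1] = {resolve_square};

/* "Resolver": find the real target, patch the slot, then continue there. */
static int resolve_square(int x)
{
    resolver_calls++;
    got[0] = real_square; /* step 4: update the GOT entry */
    return got[0](x);     /* step 5: continue in the actual function */
}

/* "PLT stub": always an indirect call through the GOT slot (steps 1-2). */
static int plt_square(int x)
{
    return got[0](x);
}
```

If `plt_square` is called twice, `resolve_square` runs only once. The second call goes straight through the patched slot, which mirrors the shortened path of steps 1, 2, and 5.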

## Binding

Each call to an external function must be relocated by the dynamic linker; in other words, each symbol needs to **bind** to its actual address.

There are two types of binding:

### Lazy binding

The dynamic linker defers the resolution of each function until that function is first called at runtime.

### Immediate binding

The dynamic linker resolves all symbols when the program starts, or when the shared library is loaded via `dlopen`.
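
By contrast, immediate binding resolves every slot before the first call. The sketch below uses the same function-pointer analogy. `symtab` stands in for the dynamic symbol table, and `relocs` stands in for the `.rel.plt` entries (one per GOT slot). All names are hypothetical. The real `ld.so` binds eagerly in this way when `DT_BIND_NOW` is set or when the `LD_BIND_NOW` environment variable is non-empty.

```c
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

/* Stand-in for the symbols that the loaded libraries export. */
static const struct {
    const char *name;
    fn_t addr;
} symtab[] = {{"add_one", add_one}, {"twice", twice}};

/* Stand-in for .rel.plt: the symbol each GOT slot must be bound to. */
static const char *const relocs[] = {"twice", "add_one"};
#define N_SLOTS (sizeof(relocs) / sizeof(relocs[0]))

static fn_t got[N_SLOTS];
static int lookups; /* symbol lookups performed so far */

static fn_t lookup(const char *name)
{
    lookups++;
    for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    }
    return NULL; /* ld.so would report an undefined symbol here */
}

/* Immediate binding: resolve every slot up front, before any call is made. */
static void bind_now(void)
{
    for (size_t i = 0; i < N_SLOTS; i++)
        got[i] = lookup(relocs[i]);
}
```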

## Limitations

Note the following about the current implementation of dynamic linking:

* The GOT is located in a writable segment (the `.data` segment).
* The `PT_GNU_RELRO` program header has not been implemented yet.
* `DT_BIND_NOW` (force immediate binding) is not set.

This implies that:

* GOT entries can be modified at runtime, which may create a potential ROP (Return-Oriented Programming) attack vector.
* Function pointers (GOT entries) might be hijacked due to the absence of full RELRO protection.
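
You can check at runtime whether a binary carries a `PT_GNU_RELRO` segment by calling `dl_iterate_phdr(3)`, whose first visited object is the main program. The following is a minimal sketch that assumes a glibc host, and `main_has_relro` is a hypothetical name. Given the limitations above, a program built by shecc with `--dynlink` would be expected to report 0.

```c
#define _GNU_SOURCE
#include <link.h>

/* Callback: inspect only the first object, i.e. the main program. */
static int check_relro(struct dl_phdr_info *info, size_t size, void *data)
{
    int *has_relro = data;

    (void) size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_RELRO)
            *has_relro = 1;
    }
    return 1; /* nonzero: stop iterating after the main program */
}

/* Hypothetical helper: 1 if the running executable has a PT_GNU_RELRO segment. */
static int main_has_relro(void)
{
    int has_relro = 0;

    dl_iterate_phdr(check_relro, &has_relro);
    return has_relro;
}
```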

## Reference

* man page: `ld(1)`
* man page: `ld.so(8)`
* glibc - [`_dl_runtime_resolve`](https://elixir.bootlin.com/glibc/glibc-2.41.9000/source/sysdeps/arm/dl-trampoline.S#L30) implementation (for Arm32)
* Application Binary Interface for the Arm Architecture - [`abi-aa`](https://github.com/ARM-software/abi-aa)
  * `aaelf32`
  * `aapcs32`
