Commit 65a0658

Update gc_process.md
1 parent ad8e2e9 commit 65a0658

1 file changed

+63
-20
lines changed

doc/gc_process.md

Lines changed: 63 additions & 20 deletions
@@ -1,37 +1,80 @@
1+
12
# Copying Garbage Collector for Theta (WebAssembly Target)
23

34
## Overview
45

5-
This document outlines the design and implementation of a **copying garbage collector** for the Theta programming language, which compiles to WebAssembly. The primary challenge addressed in this design is that WebAssembly's internal stack is opaque to the developer, which complicates the process of tracking and updating heap references stored on the stack during garbage collection. To overcome this, a **shadow stack** is maintained to track references on the WebAssembly stack.
6+
This document outlines an optimized design and implementation of a **copying garbage collector** for the Theta programming language, which compiles to WebAssembly. Because WebAssembly's internal stack is opaque, we use a **shadow stack** to track heap references across function frames. This ensures that heap references are updated correctly during garbage collection (GC), even when multiple stack frames are involved. To reduce the overhead of this approach, we apply several optimizations: escape analysis, lazy stack updates, and selective GC triggering.
67

78
## Key Concepts
89

910
### WebAssembly Stack
10-
WebAssembly does not expose its internal stack to the developer, making it impossible to directly update stack-allocated pointers during garbage collection. However, by maintaining a parallel data structure (the shadow stack), the state of the WebAssembly stack can be tracked externally.
11+
The WebAssembly stack is managed by the WebAssembly runtime, and developers have no direct control over it. Each function call creates a new stack frame, and function frames are isolated, meaning functions can only access their own stack frame. This poses challenges for managing heap references during GC.
1112

1213
### Shadow Stack
13-
The **shadow stack** is an in-memory representation that mirrors the WebAssembly stack, containing all pointers to heap-allocated objects. The shadow stack is synchronized with the WebAssembly stack during normal execution, allowing the garbage collector to operate without needing direct access to the WebAssembly stack.
14+
The **shadow stack** is an in-memory data structure that mirrors the WebAssembly stack in terms of heap references. Each function that allocates or uses a heap reference pushes that reference onto the shadow stack, ensuring that the garbage collector can access all heap references across frames, even when WebAssembly stack frames are not directly accessible.
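As a rough illustration of the idea, the shadow stack can be modeled as a flat list of root slots plus a record of where each function's slots begin. This is a minimal Python sketch of the data structure only, not Theta's implementation; the class and method names (`ShadowStack`, `enter_frame`, `push`, `leave_frame`) are hypothetical.

```python
class ShadowStack:
    """Root set kept in ordinary memory, one slot per live heap reference."""

    def __init__(self):
        self.slots = []        # heap addresses currently treated as roots
        self.frame_bases = []  # index into `slots` where each frame starts

    def enter_frame(self):
        # Called on function entry: remember where this frame's roots begin.
        self.frame_bases.append(len(self.slots))

    def push(self, address):
        # Register a heap reference and return its slot index, so the
        # function can re-read the (possibly relocated) address later.
        self.slots.append(address)
        return len(self.slots) - 1

    def leave_frame(self):
        # Called on function exit: drop every root the frame registered.
        base = self.frame_bases.pop()
        del self.slots[base:]


stack = ShadowStack()
stack.enter_frame()                  # function A
slot_a = stack.push(0x10)
stack.enter_frame()                  # function B, called from A
slot_b = stack.push(0x20)
roots_during_b = list(stack.slots)   # the collector sees both frames
stack.leave_frame()                  # B returns
roots_after_b = list(stack.slots)
```

While B is running, the collector can reach A's reference even though A's WebAssembly frame is inaccessible, which is the property the shadow stack exists to provide.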
15+
16+
## Optimized GC Process
17+
18+
### 1. Pre-Garbage Collection: Shadow Stack Tracking
19+
- During program execution, each function keeps its heap references in its **WebAssembly stack** frame (as local variables) and also pushes them onto the **shadow stack**, where the collector can find them.
20+
- The shadow stack only tracks **heap references**, minimizing the number of push/pop operations.
21+
- **Escape analysis** is applied to determine if an object escapes its current function. Non-escaping objects are handled within their local frame and do not need to be pushed to the shadow stack.
22+
23+
### 2. GC Triggered During Execution
24+
- **Garbage collection** is triggered only at safe points, such as function entry or exit, and only when a condition such as a memory usage threshold is met.
25+
- When GC is triggered, the shadow stack is used to identify and update heap references across all stack frames, regardless of whether the references are from the current or previous function calls.
26+
27+
### 3. Heap Object Relocation
28+
- During GC, heap objects are moved from one memory region (the "from-space") to another (the "to-space"), and heap references in the shadow stack are updated with new memory addresses.
29+
- Since the shadow stack contains all live heap references from every function frame, it ensures that all references, even those from previous frames, are updated correctly.
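The relocation step can be sketched as a small copying collector whose only roots are the shadow stack slots. The version below is an assumption-laden model in Python: the heap is a dictionary from address to object, every object occupies one address unit, and the traversal is a Cheney-style scan. The name `collect` and the object layout are hypothetical and chosen only for illustration.

```python
def collect(from_space, roots):
    """Copy every object reachable from `roots` into a fresh to-space.

    from_space: dict mapping address -> {"value": ..., "refs": [addresses]}
    roots:      list of addresses (the shadow stack), updated in place
    Returns the new to-space dict.
    """
    to_space = {}
    forwarding = {}   # old address -> new address
    next_free = 0     # bump pointer; one address unit per object here

    def copy(old):
        nonlocal next_free
        if old not in forwarding:
            forwarding[old] = next_free
            obj = from_space[old]
            # Fields still hold old addresses until the scan below fixes them.
            to_space[next_free] = {"value": obj["value"], "refs": list(obj["refs"])}
            next_free += 1
        return forwarding[old]

    # 1. Relocate the roots and overwrite each shadow stack slot.
    for i, old in enumerate(roots):
        roots[i] = copy(old)

    # 2. Cheney-style scan: walk to-space in allocation order, copying
    #    whatever each object references and patching the field.
    scan = 0
    while scan < next_free:
        obj = to_space[scan]
        obj["refs"] = [copy(r) for r in obj["refs"]]
        scan += 1

    return to_space


heap = {
    100: {"value": "a", "refs": [300]},
    200: {"value": "garbage", "refs": []},
    300: {"value": "b", "refs": [100]},   # cycle back to 100
}
shadow_stack = [100, 100]   # two frames holding the same reference
new_heap = collect(heap, shadow_stack)
```

The forwarding table is what keeps two shadow stack slots that referred to the same object pointing at the same copy afterwards, and it also lets the scan terminate on cyclic structures.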
30+
31+
### 4. Post-GC: Rehydrating the WebAssembly Stack
32+
- After garbage collection, if a function's WebAssembly stack frame contains outdated heap references, those references are updated based on the shadow stack.
33+
- **Lazy updates** are applied, meaning that WebAssembly stack frames are only updated when necessary (i.e., when heap references have changed).
34+
- The updated heap references from the shadow stack are pushed back onto the WebAssembly stack when the function returns or resumes execution.
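One possible way to implement the "only when necessary" check is a collection counter: each frame remembers the counter value it last saw and reloads its local copy from the shadow stack only if the counter has moved. The sketch below assumes this epoch scheme, which the document does not specify; `Runtime`, `Frame`, `gc_epoch`, and `relocate_all` are hypothetical names, and `relocate_all` merely stands in for a real collection.

```python
class Runtime:
    def __init__(self):
        self.shadow_stack = []
        self.gc_epoch = 0      # incremented every time objects are moved

    def relocate_all(self, offset):
        # Stand-in for a real collection: shift every root by `offset`.
        self.shadow_stack = [addr + offset for addr in self.shadow_stack]
        self.gc_epoch += 1


class Frame:
    """Models one function's local copy of a heap reference."""

    def __init__(self, runtime, address):
        self.runtime = runtime
        self.slot = len(runtime.shadow_stack)
        runtime.shadow_stack.append(address)
        self.local = address                 # value held in the Wasm local
        self.seen_epoch = runtime.gc_epoch
        self.reloads = 0

    def resume(self):
        # Called after a callee returns: reload only if a GC ran meanwhile.
        if self.seen_epoch != self.runtime.gc_epoch:
            self.local = self.runtime.shadow_stack[self.slot]
            self.seen_epoch = self.runtime.gc_epoch
            self.reloads += 1
        return self.local


rt = Runtime()
frame_a = Frame(rt, 0x40)
before = frame_a.resume()      # no GC yet, nothing to reload
rt.relocate_all(0x100)         # GC runs while a callee is executing
after = frame_a.resume()       # stale local is refreshed from the shadow stack
again = frame_a.resume()       # no further GC, no further reload
```

The cost on the common path is a single integer comparison per resume, which is the kind of saving the lazy update is meant to deliver.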
35+
36+
### Example Flow
37+
38+
#### Function A allocates memory:
39+
1. Function **A** allocates an object on the heap and stores the pointer in its WebAssembly stack frame:
40+
```wasm
41+
(local $heap_ref i32) ;; Local variable in A's frame, a pointer to a heap object
42+
```
43+
2. The pointer is also pushed onto the **shadow stack**:
44+
```wasm
45+
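(i32.store (global.get $shadow_sp) (local.get $heap_ref)) ;; Write the reference to the top shadow stack slot ($shadow_sp is an assumed mutable global holding the top-of-stack address)
(global.set $shadow_sp (i32.add (global.get $shadow_sp) (i32.const 4))) ;; Advance to the next 4-byte slot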
46+
```
47+
48+
#### Function A calls function B:
49+
1. Function **A** calls function **B**, which creates a new stack frame:
50+
- Function **B** pushes its own heap references onto both the WebAssembly and shadow stacks.
51+
52+
#### GC is triggered in function B:
53+
1. Garbage collection is triggered during **function B**'s execution.
54+
2. The garbage collector uses the **shadow stack** to update heap references for both **function A** and **function B**.
55+
3. Heap objects are moved, and the **shadow stack** is updated with new memory addresses for all live objects.
56+
57+
#### Function B returns:
58+
1. When **function B** returns, **function A** resumes execution.
59+
2. If **function A**'s WebAssembly stack frame contains outdated heap references (due to GC), those references are updated using the **shadow stack**.
1460

15-
## GC Process: Stack-Swapping Approach
61+
### Optimizations
1662

17-
The copying garbage collector operates by swapping out the WebAssembly stack for the shadow stack during garbage collection, and then "rehydrating" the WebAssembly stack with updated heap references after the collection is complete.
63+
#### 1. Escape Analysis
64+
- **Escape analysis** minimizes shadow stack operations by identifying objects that do not escape their current function. These objects are handled within the local stack frame without being pushed to the shadow stack.
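A deliberately conservative, intraprocedural version of this analysis can be sketched over a toy instruction list. The IR below is invented for the example and is not Theta's; the rule it applies is that an allocation escapes if it (or any alias of it) is returned, stored into another object, or passed to a call. The sketch only decides which allocations need a shadow stack slot and says nothing about where non-escaping objects are allocated.

```python
def escaping_allocations(instructions):
    """Return the set of allocated variables that may outlive the function.

    `instructions` is a list of tuples in a toy IR:
      ("alloc", dst)            dst = new object
      ("assign", dst, src)      dst = src (alias)
      ("store", obj, src)       obj.field = src
      ("call", name, args)      name(*args)
      ("return", src)
    """
    escaping = set()
    aliases = []       # (dst, src) pairs
    allocated = set()

    for ins in instructions:
        op = ins[0]
        if op == "alloc":
            allocated.add(ins[1])
        elif op == "assign":
            aliases.append((ins[1], ins[2]))
        elif op == "store":
            escaping.add(ins[2])      # conservative: any stored value escapes
        elif op == "call":
            escaping.update(ins[2])   # conservative: every argument escapes
        elif op == "return":
            escaping.add(ins[1])

    # Propagate through aliases until nothing changes: if either name
    # of an alias pair escapes, treat both as escaping.
    changed = True
    while changed:
        changed = False
        for dst, src in aliases:
            if (dst in escaping) != (src in escaping):
                escaping.update((dst, src))
                changed = True

    return escaping & allocated


body = [
    ("alloc", "tmp"),
    ("alloc", "result"),
    ("alloc", "node"),
    ("assign", "alias", "node"),
    ("call", "log", ["alias"]),
    ("return", "result"),
]
needs_shadow_slot = escaping_allocations(body)
```

Here `tmp` is never returned, stored, or passed on, so it is the only allocation that could skip the shadow stack under this rule.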
1865

19-
### Stages of the GC Process
66+
#### 2. Lazy Stack Updates
67+
- WebAssembly stack frames are updated only when necessary (i.e., when heap objects have been moved during GC). This reduces the frequency of updates and improves performance.
2068

21-
### 1. Pre-Garbage Collection: Stack Synchronization
22-
- During normal execution, the shadow stack is kept in **parity** with the WebAssembly stack.
23-
- All function calls and local variables that reference heap-allocated objects are stored both on the WebAssembly stack and the shadow stack.
69+
#### 3. Selective GC Triggering
70+
- GC is triggered at function boundaries or based on memory usage thresholds, reducing unnecessary garbage collection cycles.
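One way to combine the two conditions is to let allocation merely request a collection once usage crosses a threshold, and to run the collection at the next function boundary. The sketch below assumes that policy; the `Allocator` class, the 0.75 default threshold, and the `live_bytes` parameter (a stand-in for what a real collector would find reachable) are all hypothetical.

```python
class Allocator:
    def __init__(self, capacity, threshold=0.75):
        self.capacity = capacity     # bytes available in the active semispace
        self.threshold = threshold   # fraction of capacity that arms a collection
        self.used = 0
        self.gc_requested = False
        self.collections = 0

    def allocate(self, size):
        # Allocation never collects on its own; it only raises a flag.
        if self.used + size > self.capacity:
            raise MemoryError("semispace exhausted before reaching a safe point")
        self.used += size
        if self.used >= self.capacity * self.threshold:
            self.gc_requested = True

    def safe_point(self, live_bytes):
        # Called at function entry or exit. `live_bytes` stands in for the
        # amount of data a real collector would find reachable.
        if self.gc_requested:
            self.used = live_bytes
            self.gc_requested = False
            self.collections += 1


alloc = Allocator(capacity=100)
alloc.allocate(40)
alloc.safe_point(live_bytes=40)   # below threshold: nothing happens
alloc.allocate(40)                # 80 >= 75, so a collection is requested
alloc.safe_point(live_bytes=30)   # collection runs here, at the boundary
```

Deferring the collection to a boundary means the collector never runs while a function holds unregistered intermediate values, at the price of needing headroom between the threshold and the semispace limit.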
2471

25-
### 2. GC Initiation: Stack Clearing
26-
- When garbage collection begins, the WebAssembly stack is **cleared**. This is safe because the shadow stack already holds all relevant references.
27-
- At this point, the shadow stack serves as the **primary** reference for heap-allocated objects.
72+
## Trade-offs
2873

29-
### 3. During Garbage Collection: Object Relocation
30-
- The garbage collector moves live objects from one region of memory (the "from-space") to another (the "to-space").
31-
- As objects are moved, their new addresses are updated in the **shadow stack**.
32-
- The WebAssembly stack remains empty during this phase, relying entirely on the shadow stack to track heap references.
74+
### Pros
75+
- **Correctness**: The shadow stack ensures that all heap references across all function frames are updated correctly during garbage collection.
76+
- **Optimized performance**: Escape analysis, lazy updates, and selective GC triggering reduce the overhead of maintaining the shadow stack and updating the WebAssembly stack.
3377

34-
### 4. Post-Garbage Collection: Stack Rehydration
35-
- Once the garbage collection process is complete and all live objects have been relocated, the WebAssembly stack is **rehydrated** using the updated shadow stack.
36-
- Each entry in the shadow stack is pushed back onto the WebAssembly stack, with any updated pointers reflecting new object locations in memory.
37-
- After rehydration, the WebAssembly stack resumes normal execution with updated references, and the shadow stack continues to track references in parallel
78+
### Cons
79+
- **Overhead**: There is still some overhead associated with pushing/popping heap references onto the shadow stack and updating the WebAssembly stack.
80+
- **Complexity**: The implementation requires careful management of the shadow stack and synchronization with the WebAssembly stack, which adds complexity to the system.
