2018-10-08 08:31 <no-defun-allowed> i’m going to create the world’s first conservative copying collector
2018 no-defun-allowed
did in fact not create a conservative copying collector,
nor would she have been even close to being first to do it. But 2020
no-defun-allowed
has a conservative copying collector at least.
+This is a rough implementation of
List Processing in Real Time on a Serial Computer , which can collect
allocated conses incrementally, and somewhat conservatively (on the stack and
registers). However, the client must use a read barrier (as implemented by the
car
and cdr
functions provided.)+
Actually, no, it’s a replication copying collector. It copies everything reachable from the roots to newspace, then updates all the references in referenced conses. This only requires a write barrier to replicate modifications if there is a newspace copy of a cons.
This utterly abuses setjmp
and how stacks appear to work on my machine, so
by using this code, you agree to not sue me for limbs lost to it. One can trace
roots (no pun intended) to SBCL’s semi-conservative copying collector and the
Shenandoah JVM garbage collector, which are not likely to blow your limbs off.
The stack walking code was also gleaned off
Gilbert Baumann’s mark-sweep implementation.
Conses are organised into pages (struct page
of pages.h
), with each page
16KiB wide. Each cons is a triple <forwarding pointer, car, cdr>. We allocate
from the start of a page to its end, as typical for compacting collectors.
At the start of a collection cycle, the stack and registers are scanned for
roots.The registers are scanned by using setjmp
to “spill” the registers into
the stack, and the stack can be carefully probed for values that look like
pointers into the heap.
Checking what looks like a pointer can be done better with some heuristics instead of iterating over pages most of the time; provided pages are allocated mostly contiguously, we can eliminate most pointers outside the heap by checking for presence in an interval holding all pages. Pages are also cached, so that most searching can be eliminated.
At the end of a collection cycle, the stack and registers are scanned for conses that must be pinned (as we cannot update the conservative roots), and have their slots fixed up, by copying them from the copy.
Each object has a forwarding pointer, either to itself, or its copy in newspace. With replication copying, this pointer is only used when writing, to replicate modifications in oldspace to newspace. Checking pointer (physical) equality also requires retrieving the forwarding pointer to get canonical pointers to compare.
Garbage collections start after some number of pages, the threshold, have been allocated after the last collection finished. In order of priority, the threshold is chosen so that:
- it is never below 10 pages (about 80kB),
- stack scanning completes in less than 3 milliseconds (adjustable with
PAUSE_TARGET
), and - the heap remains between 0.5x and 1.3x its last size after a collection.