Skip to content
This repository has been archived by the owner on Nov 4, 2023. It is now read-only.

Garbage collection

blitz-research edited this page Jun 8, 2013 · 1 revision

OK, a quick rundown on the Monkey C++ GC system:

Monkey's C++ GC is incremental, but not generational. All live objects must be visited before a sweep can occur and garbage collected.

GC works in 2 modes; GC_MODE 1, where GC occurs after each OnCreate, OnUpdate etc, or GC_MODE 2, where GC occurs every object allocation.

All objects include a header which contains: a succ/pred node for linking the object into a list; a size field; and a marked/unmarked/fixed flag.

Objects are always in one of 4 lists: UNMARKED, MARKED, QUEUED or FREE. Objects in the UNMARKED list have not yet be marked, and may be elligble for garbage collection. Objects in the MARKED list have been fully marked and are therefore 'live'. Objects in the QUEUED list have been marked, but their 'children' (ie: fields) have not yet been. Objects in the FREE list are garbage and are incrementally released/destroyed as new objects are allocated. Newly created objects are placed in the UNMARKED list.

All objects also have a virtual mark() method which contains translator generated code that marks child objects, ie: object fields. In general, unmarked child objects are marked by moving them to the QUEUED list, but may also be marked by moving them directly to the MARKED list in some cases.

A complete garbage collection cycle involves the folowing:

  1. First, all root objects are moved to the QUEUED list. Root objects are objects that are either assigned to global variables or, in the case of GC_MODE 2, objects that are assigned to local variables.

  2. While the QUEUED list is not empty, an object is popped off the front of the QUEUED list, moved to the MARKED list and its mark() method is called. This may result in more objects being added to the QUEUED list, and/or more objects being moved directly to the MARKED list.

  3. Once the QUEUED list is empty a 'sweep' occurs: The UNMARKED list is moved to the FREE list, the MARKED list is moved to the UNMARKED list, and the entire process begins again.

Non incremental GC simply performs the above steps in one hit.

Incremental GC performs step 2 in increments - ie: only some objects in the QUEUED list are marked each GC. The big question is how much 'work' to do each incremental step. The algorithm used is:

  • If START is the number of live bytes after the last sweep, ALLOCED is the number of bytes allocated since the last sweep and TRIGGER is the number of bytes to allocate before the next sweep occurs, incremental GC will pop/mark objects off the QUEUED list until at least (ALLOCED/TRIGGER)*START+ALLOCED bytes have been fully marked. This ensures the amount marked is (roughly) the same each GC, assuming the worse case where all allocated bytes will eventually be marked.

Incremental GC also requires the use of a 'write barrier': each time an unmarked object is assigned to a global/field variable, it will be moved to the QUEUED list. This is to handle cases where the variable being assigned to (or, more precisely, the object the variable contains prior to assignment) has already been marked, but the object being assigned is unmarked, either because it is newly created, or because it survived the last GC but is now in the UNMARKED list. GC_MODE 2 also requires a similar write barrier for local variables. The translator can logically eliminate some write barries.

Clone this wiki locally