Micropython: a quick gc tour

2014-06-07

In this post I will try to explain what I was able to gleam from looking over micropython's garbage collector, since the gc will most likely change over time this post is relevant as of the time of writing [1].

Also please note that I am neither an authority on micropython nor garbage collectors, I just enjoy working out how language implementations tick and micropython's gc seemed particularly well suited to picking apart over an evening.

Elevator summary

Micropython uses a simple mark-and-sweep ('stop the world') garbage collector; at runtime it is initialised and is given a range of memory to rule over, micropython has a simple 'allocation table' scheme for marking which parts of it's heap are in use, garbage collection is left up to implementations (ports to a platform) and is only triggered on allocation if there is insufficient remaining free memory to satisfy the request.

Magic numbers

Sadly there are a bunch of constants which I think are worth knowing, so please bear with me.

The garbage collector source lives in micropython/py/gc.c.

For the sake of throwing concrete numbers around I am going to be primarily talking about the unix implementation (micropython/unix/ in the distribution), and since my system is 64bit I will use a 64bit unix system as the example.

The following constants are fixed regardless of platform

BLOCKS_PER_ATB = 4
WORDS_PER_BLOCK = 4

The following is provided by the platform

# 64 bit unix system uses sizeof(unsigned long) = 8
# 32 bit unix system uses sizeof(int) = 4
BYTES_PER_WORD = 8

And the following constant is then derived from this

# BYTES_PER_BLOCK = WORDS_PER_BLOCK * BYTES_PER_WORD
BYTES_PER_BLOCK = 32

Our heap

Micropython leaves the heap size up to the platform, which passes the heap to gc_init via start and end pointers

def gc_init(start, end):
    ...

a simple subtraction is all that is required to find out the length our our heap.

On my 64 bit unix system we can see the size of the heap by looking at micropython/unix/main.c:63:

# heap size in bytes
# long heap_size = 128*1024 * (sizeof(machine_uint_t) / 4);
heap_size = 262144

and then the allocation of the heap and the call to gc_init:

# char *heap = malloc(heap_size);
heap = allocate(heap_size)
gc_init(heap, heap + heap_size);

So our gcs view of the world is currently

micropython unix heap of 262144 Bytes

Initialisation

Memory in micropython is managed in chunks of BYTES_PER_BLOCK (32 Bytes) called 'block(s)', the state of each block can be one of 4 things:

# block is not currently allocated
AT_FREE = 0
# block is the first in an allocated sequence
AT_HEAD = 1
# block is allocated but is not the first in this sequence (it will be release when the matching HEAD is release)
AT_TAIL = 2
# special value used for marking during tracing
AT_MARK = 3

To record these states micropython uses an allocation table made up of 'Allocation Table Blocks' (henceforce 'ATB(s)'), each 8 bits in size.

As each block has 4 possible states this requires 2 bits per block to encode, so each ATB 'manages' 4 blocks in the pool.

micropython ATB

Micropython's gc divides it's heap into 2 areas [2], these 2 sections are called the 'Allocation Table' ('AT') and the 'Pool'.

The size of the allocation table is found via

total_byte_len = end - start
#alloc_table_byte_len = total_byte_len / (1 + BITS_PER_BYTE / 2 * BYTES_PER_BLOCK)
alloc_table_byte_len = 2032

since each ATB manages 4 blocks this means our gc now has quite a few blocks [3]

# num_blocks = alloc_table_byte_len * 4
num_blocks = 8218 blocks

micropython heap divided into Allocation Table (2032 Bytes) and Pool (260112 Bytes)

Allocation

Allocation is also quite simple and takes the number of bytes the caller would like

def gc_alloc(n_bytes):
    ...

gc_alloc must first convert this number of bytes into a number of blocks (as each block is 32 bytes)

gc_alloc then steps through the allocation table looking at each 2bit pair, each representing the allocated-state for a block, looking for a continuous sequence of n free blocks.

If gc_alloc is unsuccessful in it's search it then triggers a gc_collect and then tries again, if it is still unsuccessful then it gives up (return null).

If successful then we must mark each of the blocks as allocated, we also want all of this sequence to be treated as one logical block (they are allocated together and so should be freed together) so we mark the first block in the sequence as a head and the remaining blocks as tails

For example consider a call of gc_alloc(5) where our allocation table looks as follows:

micropython alloc example 1

There is a space in there sufficient to satisfy the request:

micropython alloc example 2

So we mark the first block as a head (AT_HEAD) and the remaining 4 blocks as tails (AT_TAIL):

micropython alloc example 3

We then return a pointer to the first block in this sequence to the caller.

Ending notes

I think here is a good enough point to end on, Micropython's gc is very clean and simple and I recommend trying to dig through it yourself.

Of course there is more to cover as in this post I only went over initialisation and allocation, the remaining interesting part is collection which I hope to cover in a later post.

[1] https://github.com/micropython/micropython commit hash 95fd3528c12c0ae548fadc5e139e5e1dcf1da63e date Tue Jun 3 19:29:03 2014 +0100

[2] There would be a 3rd area for finalisers, but for the sake of brevity I am assuming they were disabled at compile time (MICROPY_ENABLE_FINALISER set to false).

[3] The astute reader may notice that this leaves 16 Bytes 'wasted' at the end of our pool (half a block)