BLoC

This project proposes a file format for programs encoded in binary lambda calculus (BLC). Its main goal is to minimize redundancy by incorporating reference-based term sharing, while remaining easy to work with. The included bloc tool converts BLC to BLoC and greatly reduces its size.

The additional tool blocade (found in BLoCade) can compile .bloc files to various targets, including a shared pure BLC representation.

Results

Before explaining the format or its optimization techniques, let me first show you some results:

x86 C compiler 8cc translated to lambda calculus:
- the original expression takes ~5M bytes in bit-encoded BLC
- the same expression in BLoC needs only ~640K bytes (which is around 2x the original 8cc!)
The bruijn expression fac (+30), where fac is the factorial implementation from std/Math:
- the original expression takes 1200 bytes in bit-encoded BLC
- the same expression in BLoC needs only 349 bytes
My solution for the “Advent of Code” challenge 2022/01 in bruijn:
- the original expression takes 6258 bytes in bit-encoded BLC
- the same expression in BLoC needs only 935 bytes

You can find these examples in test/.

Format

It’s assumed that all bit-encoded strings are padded with zeroes at the end.

Header

from	to	content
0x00	0x04	identifier: “BLoC”
0x04	0x06	number of entries
0x06	0x??	entries

Entry

This reflects the basic structure of an expression. It uses the following derivation of normal bit-encoded BLC:

prefix	content
010M	abstraction of expression `M`
00MN	application of `M` and `N`
1ⁱ⁺¹0	bruijn index `i`
011I	index* to an entry

(*): The encoding of indices is quite special: $I=XA$, where $X\in\{00,01,10,11\}$ and length of binary index $A$ is $L(A)\in\{1,2,4,8\}$ byte respectively (see src/{build,parse}.c).

The final program will be in the last entry. The indices start counting from the number of entries down to 0.

Example

Let E be some kind of expression like E=\x.(((M (\y.N)) M) N), where M and N are both arbitrary expressions of length 16.

The raw BLC expression of E would then be E=00010101M00NMN. This obviously has the drawback of redundant repetition of M and N.

A possible encoding in BLoC:

from	to	content
0x00	0x04	“BLoC”
0x04	0x06	number of entries: 3
0x06	0x17	encoded `M`: gets a bit longer due to different encoding
0x17	0x28	encoded `N`: gets a bit longer due to different encoding
0x28	0x34	`00010010010011<M>00011<N>011<M>011<N>`, where `<M>=00{1}` and `<N>=00{0}` are indices with length of 1.25 byte

Even in this small example BLoC uses less space than BLC (0x34 vs. 0x42 bytes). Depending on the content of M and N, this could have potentially been compressed even more.

You can dump the bloc table using the -d/--dump flag of bloc. Some additional dumps for testing can also be found in test/.

Optimizer

The optimizer converts a normal BLC expression to the BLoC format.

In order to find the largest repeating sub-expressions, the expression first gets converted to a hashed tree (similar to a Merkle tree).

The repeating sub-trees then get sorted by length and inserted into the final table, while replacing them with references in the original expression (see src/tree.c for more).

As of right now, expressions don’t get beta-reduced or manipulated in any other way. As an idea for the future, long expressions could get reduced using different techniques/depths and then get replaced with the shortest one (as fully reduced expressions aren’t necessarily shorter).

Libraries

pqueue [BSD 2-Clause]: Simple priority queue implementation
xxHash [BSD 2-Clause]: Extremely fast hash algorithm

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
inc		inc
src		src
test		test
.gitignore		.gitignore
license		license
makefile		makefile
options.ggo		options.ggo
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BLoC

Results

Format

Header

Entry

Example

Optimizer

Libraries

About

Releases

Packages

Languages

License

marvinborner/BLoC

Folders and files

Latest commit

History

Repository files navigation

BLoC

Results

Format

Header

Entry

Example

Optimizer

Libraries

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages