notes

int sregexec(Biobuf *bp, /* file buffer to read */
	 Reprog *progp,	 /* regex prog to excecute */
	 unsigned long start,	/* pos to start at in file */
	 unsigned long end,	/* pos to end at in file */
	 Srematch *m);	/* Srematch pointer */

Srematch *m is a struct for storing the beginning and end positions
in a file of the bytes matched by progp.

sregexec works as follows:
	1) allocate static thread list and set various variables
	2) move to index start in the file
	3) begin stage 1 execution:
		3.1) read through file until first char in progp is found
		3.2) once first char is found record its position in m->s
		3.3) begin normal execution
		3.4) if match is reached record current position in m->e
	4) if stage 1 success return success
	5) otherwise run stage 2:
		malloc thread list and call stage 1 with it

expression to match a C function:
^([A-Za-z_][A-Za-z_*0-9]* ?)+[\n \t]+[A-Za-z_][A-Za-z_0-9]*\((.*\n*)+\)[\n \t]*{$((^\t+.*$)|(^$)|(^[A-Za-z_]+[A-Za-z_0-9]*:$))+}\n

^([A-Za-z_][A-Za-z_*0-9]* ?)+\**[\n \t]+[A-Za-z_][A-Za-z_0-9]*\((.*(/\*.+\*/\n)?)+\)[\n \t]*{$((^\t+.*$)|(^$)|(^[A-Za-z_]+[A-Za-z_0-9]*:$))+}$

^([A-Za-z_][A-Za-z_*0-9]* ?)+\**[\n \t]+[A-Za-z_][A-Za-z_0-9]*\((.*\n*)+\)( /\*.*\*/)?[\n \t]+{$((^\t+.*$)|(^$)|(^[A-Za-z_]+[A-Za-z_0-9]*:$))+^}$


Using lazy matching and allowing `.` to match newlines, we can reduce a
function matching expression to the following:

^([A-Za-z_][A-Za-z_*0-9]* ?)+\**[\n ]*[A-Za-z_][A-Za-z_0-9]*\(.+\).+{$.+^}$
^([A-Za-z_][A-Za-z_*0-9]*[ \t]?)+\**[\n \t]*[A-Za-z_][A-Za-z_0-9]*\(([^)]+\n?)+\)([\n \t]*/\*.+\*/)?[\n \t]?{$.+^}$


3 programs:

- siv, a grep like program for finding things in streams, name is sieve with no e's
- pike, a stream editor with sam-like syntax, named after Rob Pike.
- auk, an awk implementation that accepts structural regular expressions, named after the great Auk.


In siv we'll store the Sresub's in a 2D


For each expression given to siv, read through the file and find the matches
for that expression, and store all these matches in a Sresubarr[]

Then for each layer, starting from layer 1 (second layer), check if each Sresub is contained within
an Sresub in the previous layer, and discard the Sresub's that aren't


Adding a check for successful extract() in siv() makes searching through files with few matches
approximately 75% faster.

The new algorithm for siv() should be faster in cases where there are lots of matches. *Need to test*