-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathnotes
64 lines (38 loc) · 2.39 KB
/
notes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
int sregexec(Biobuf *bp, /* file buffer to read */
Reprog *progp, /* regex prog to excecute */
unsigned long start, /* pos to start at in file */
unsigned long end, /* pos to end at in file */
Srematch *m); /* Srematch pointer */
Srematch *m is a struct for storing the beginning and end positions
in a file of the bytes matched by progp.
sregexec works as follows:
1) allocate static thread list and set various variables
2) move to index start in the file
3) begin stage 1 execution:
3.1) read through file until first char in progp is found
3.2) once first char is found record its position in m->s
3.3) begin normal execution
3.4) if match is reached record current position in m->e
4) if stage 1 success return success
5) otherwise run stage 2:
malloc thread list and call stage 1 with it
expression to match a C function:
^([A-Za-z_][A-Za-z_*0-9]* ?)+[\n \t]+[A-Za-z_][A-Za-z_0-9]*\((.*\n*)+\)[\n \t]*{$((^\t+.*$)|(^$)|(^[A-Za-z_]+[A-Za-z_0-9]*:$))+}\n
^([A-Za-z_][A-Za-z_*0-9]* ?)+\**[\n \t]+[A-Za-z_][A-Za-z_0-9]*\((.*(/\*.+\*/\n)?)+\)[\n \t]*{$((^\t+.*$)|(^$)|(^[A-Za-z_]+[A-Za-z_0-9]*:$))+}$
^([A-Za-z_][A-Za-z_*0-9]* ?)+\**[\n \t]+[A-Za-z_][A-Za-z_0-9]*\((.*\n*)+\)( /\*.*\*/)?[\n \t]+{$((^\t+.*$)|(^$)|(^[A-Za-z_]+[A-Za-z_0-9]*:$))+^}$
Using lazy matching and allowing `.` to match newlines, we can reduce a
function matching expression to the following:
^([A-Za-z_][A-Za-z_*0-9]* ?)+\**[\n ]*[A-Za-z_][A-Za-z_0-9]*\(.+\).+{$.+^}$
^([A-Za-z_][A-Za-z_*0-9]*[ \t]?)+\**[\n \t]*[A-Za-z_][A-Za-z_0-9]*\(([^)]+\n?)+\)([\n \t]*/\*.+\*/)?[\n \t]?{$.+^}$
3 programs:
- siv, a grep like program for finding things in streams, name is sieve with no e's
- pike, a stream editor with sam-like syntax, named after Rob Pike.
- auk, an awk implementation that accepts structural regular expressions, named after the great Auk.
In siv we'll store the Sresub's in a 2D
For each expression given to siv, read through the file and find the matches
for that expression, and store all these matches in a Sresubarr[]
Then for each layer, starting from layer 1 (second layer), check if each Sresub is contained within
an Sresub in the previous layer, and discard the Sresub's that aren't
Adding a check for successful extract() in siv() makes searching through files with few matches
approximately 75% faster.
The new algorithm for siv() should be faster in cases where there are lots of matches. *Need to test*