-
Notifications
You must be signed in to change notification settings - Fork 6
/
assignment-one.xml
190 lines (190 loc) · 12.9 KB
/
assignment-one.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
<sheaf>
<section>
<assignment title="Grammars, Lexing, and Parsing">
<instructions>
<text><![CDATA[
In this assignment you will be implementing lexing and parsing algorithms using Python. You must submit a single Python source file named <code>hw1/hw1.py</code>. <b style="color:firebrick;">An incorrect file path and/or file name will result in an automatic 10-point deduction.</b> Please follow the <a href="#A">gsubmit</a> directions.
]]></text>
<paragraph><![CDATA[
Your solutions to each of the problem parts below will be graded on their correctness, concision, and mathematical legibility. The different problems and problem parts rely on the lecture notes and on each other; carefully consider whether you can use functions from the lecture notes, or functions you define in one part within subsequent parts.
]]></paragraph>
<paragraph><![CDATA[
<b style="color:green;">A testing script with many test cases is available for download: <a href="hw1-tests.py"><code>hw1-tests.py</code></a>. You should be able to place it in the same directory as <code>hw1.py</code> and run it. Feel free to modify or extend it as you see fit.</b>
]]></paragraph>
</instructions>
<problems>
<problem>
<parts>
<part>
<text><![CDATA[
Implement a function <code>regexp()</code> that takes a list of tokens and returns a single string. This string should be a regular expression that matches only the tokens in that list and nothing else.
]]></text>
<code class="py"><![CDATA[
>>> re.match(regexp(["abc", "def", "xyz"]), "abc")
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>> re.match(regexp(["abc", "def", "xyz"]), "ghi")
None
]]></code>
</part>
<part>
<text><![CDATA[
Implement a function <code>tokenize()</code> that takes two arguments: a list of terminals and a concrete syntax character string to tokenize. It should build a regular expression, use that regular expression to convert the second string into a token sequence (represented in Python as a list of strings), and then return that token sequence. <b style="color:green;">You should not assume the tokens will always be separated by spaces in the concrete syntax</b>.
]]></text>
<code class="py"><![CDATA[
>>> tokenize(["red", "blue", "#"], "red#red blue# red#blue")
['red', '#', 'red', 'blue', '#', 'red', '#', 'blue']
]]></code>
</part>
<part>
<text><![CDATA[
Implement a predictive recursive descent parsing algorithm <code>tree()</code> that takes a token sequence (represented in Python as a list of strings) and parses that token sequence into a parse tree according to the following grammar definition:
]]></text>
<text hooks="math"><![CDATA[
\begin{eqnarray}
<i>tree</i> & ::= & <b>two</b> <b>children</b> <b>start</b> <i>tree</i> <b>;</b> <i>tree</i> <b>end</b> \\
& | & <b>one</b> <b>child</b> <b>start</b> <i>tree</i> <b>end</b>\\
& | & <b>zero</b> <b>children</b>
\end{eqnarray}
]]></text>
<text><![CDATA[
Each node in the parse tree should be labeled <code>"Two"</code>, <code>"One"</code>, or <code>"Zero"</code>. An example input and output are provided below. <b style="color:green;">Note that your implementation of <code>tree()</code> must return a tuple containing the parse tree as well as the token list containing the remaining unparsed tokens (the token list may be empty).</b>
]]></text>
<code class="py"><![CDATA[
>>> tree(tokenize(\
["children", "child", "two", "one", "zero", "start", "end", ";"],\
"one child start two children start one child start zero children end; zero children end end"\
))
({'One': [{'Two': [{'One': ['Zero']}, 'Zero']}]}, [])
]]></code>
<text><![CDATA[
For a different view of the parse tree component (the first element in the output tuple above), we reproduce below the parse tree with indentation using the <code>json</code> module. <b style="color:green;">Your implementation does not need to format its results in this way and can produce it in the above format.</b>
]]></text>
<code class="py"><![CDATA[
{'One': [
{'Two': [
{'One': [
'Zero'
]
},
'Zero'
]
}
]
}
]]></code>
</part>
</parts>
</problem>
<problem>
<text><![CDATA[
In this problem you will implement a complete parser for a programming language with the following grammar definition (note that in this language, variables always appear with an initial <b>$</b> character and numeric literals always appear with an initial <code>#</code> character, and these special characters could be separated by whitespace from the actual variable or number):
]]></text>
<text hooks="math"><![CDATA[
\begin{eqnarray}
<i>program</i> & ::= & <b>print</b> <i>term</i> <b>;</b> <i>program</i> \\
& | & <b>input</b> <b>$</b> <i>variable</i> <b>;</b> <i>program</i> \\
& | & <b>assign</b> <b>$</b> <i>variable</i> <b>:=</b> <i>term</i> <b>;</b> <i>program</i> \\
& | & <b>end</b> <b>;</b> \\
<i>formula</i> & ::= & <b>true</b> \\
& | & <b>false</b> \\
& | & <b>not (</b> <i>formula</i> <b>)</b> \\
& | & <b>xor (</b> <i>formula</i> <b>,</b> <i>formula</i> <b>)</b> \\
& | & <b>equal (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
& | & <b>less (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
& | & <b>greater (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
<i>term</i> & ::= & <b>#</b> <i>number</i> \\
& | & <b>$</b> <i>variable</i> \\
& | & <b>plus (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
& | & <b>max (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
& | & <b>if (</b> <i>formula</i> <b>,</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
<i>variable</i> & ::= & non-empty character strings containing only alphabetical characters (lowercase or uppercase) \\
<i>number</i> & ::= & <code>0|([1-9][0-9]*)</code>
\end{eqnarray}
]]></text>
<text><![CDATA[
In addition to a function corresponding to the tokenization algorithm, the parser implementation will consist of five interdependent Python functions: <code>program()</code>, <code>formula()</code>, <code>term()</code>, <code>variable()</code>, and <code>number()</code>. Each function corresponds to one of the productions. Below is an implementation of <code>number()</code>:
]]></text>
<code class="py"><![CDATA[
import re
def number(tokens):
if re.match(r"^(0|([1-9][0-9]*))$", tokens[0]):
return ({"Number": [int(tokens[0])]}, tokens[1:])
]]></code>
<text><![CDATA[
You should examine each of the other productions to determine whether a <a href="#47254608df414ace8d04c630a2b15689">predictive recursive descent</a> parser or a <a href="#95cb72f5b4d24ddba2c8081c7b42618a">backtracking recursive descent</a> parser is required. You should also make calls within each function to one or more of the other functions as needed.
]]></text>
<text><![CDATA[
<p>Your algorithm must parse all programs in this programming language correctly. However, it need not return useful error messages if the input provided does not conform to the grammar definition.</p>
]]></text>
<parts>
<part>
<text><![CDATA[
Implement a function <code>variable()</code> that takes a single token list as an argument and, if the token sequence conforms to the grammar definition, returns a tuple containing two values: an instance of the abstract syntax (i.e., parse tree) with the root node label <code>"Variable"</code>, and the remaining, unparsed suffix of the input token sequence.
]]></text>
</part>
<part>
<text><![CDATA[
Implement a function <code>term()</code> that takes a single token list as an argument and, if the token sequence conforms to the grammar definition, returns a tuple containing two values: an instance of the abstract syntax (i.e., parse tree) with an appropriate root node label (<code>"Plus"</code>, <code>"Max"</code>, <code>"If"</code>, <code>"Variable"</code>, or <code>"Number"</code>), and the remaining, unparsed suffix of the input token sequence.
]]></text>
<code class="py"><![CDATA[
>>> term(['max', '(', 'plus', '(', '#', '2', ',', '#', '3', ')', ',', '#', '4', ')'])
(
{ "Max": [
{ "Plus": [
{ "Number": [2]},
{ "Number": [3]}
]
},
{ "Number": [4]}
]
},
[] # Empty token sequence.
)
]]></code>
</part>
<part>
<text><![CDATA[
Implement a function <code>formula()</code> that takes a single token list as an argument and, if the token sequence conforms to the grammar definition, returns a tuple containing two values: an instance of the abstract syntax (i.e., parse tree) with an appropriate root node label (<code>"True"</code>, <code>"False"</code>, <code>"Not"</code>, <code>"Xor"</code>, <code>"Equal"</code>, <code>"Less"</code>, or <code>"Greater"</code>), and the remaining, unparsed suffix of the input token sequence.
]]></text>
</part>
<part>
<text><![CDATA[
Implement a function <code>program()</code> that takes a single token list as an argument and, if the token sequence conforms to the grammar definition, returns a tuple containing two values: an instance of the abstract syntax (i.e., parse tree) with an appropriate root node label (<code>"Print"</code>, <code>"Input"</code>, <code>"Assign"</code>, or <code>"End"</code>), and the remaining, unparsed suffix of the input token sequence.
]]></text>
</part>
<part>
<text><![CDATA[
Implement a function <code>parse()</code> that takes a single string as an argument. If the input string conforms to the grammar of the programming language, the function should return a parse tree corresponding to that input string.
]]></text>
</part>
</parts>
</problem>
<problem>
<text><![CDATA[
Extend or modify your parser from the previous problem to support the following grammar:
]]></text>
<text hooks="math"><![CDATA[
\begin{eqnarray}
<i>program</i> & ::= & <b>print</b> <i>term</i> <b>;</b> <i>program</i> \\
& | & <b>input</b> <b>$</b> <i>variable</i> <b>;</b> <i>program</i> \\
& | & <b>assign</b> <b>$</b> <i>variable</i> <b>:=</b> <i>term</i> <b>;</b> <i>program</i> \\
& | & <b>end</b> <b>;</b> \\
<i>formula</i> & ::= & <b>true</b> | <b>false</b> | <b>not (</b> <i>formula</i> <b>)</b> | <b>xor (</b> <i>formula</i> <b>,</b> <i>formula</i> <b>)</b> \\
& | & <b>equal (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> | <b>less (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> | <b>greater (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
& | & <b>(</b> <i>formula</i> <b>xor</b> <i>formula</i> <b>)</b> \\
& | & <b>(</b> <i>term</i> <b>==</b> <i>term</i> <b>)</b> | <b>(</b> <i>term</i> <b><</b> <i>term</i> <b>)</b> | <b>(</b> <i>term</i> <b>></b> <i>term</i> <b>)</b> \\
<i>term</i> & ::= & <b>#</b> <i>number</i> | <b>$</b> <i>variable</i> | <b>plus (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> | <b>max (</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> | <b>if (</b> <i>formula</i> <b>,</b> <i>term</i> <b>,</b> <i>term</i> <b>)</b> \\
& | & <b>(</b> <i>term</i> <b>+</b> <i>term</i> <b>)</b> | <b>(</b> <i>term</i> <b>max</b> <i>term</i> <b>)</b> \\
& | & <b>(</b> <i>formula</i> <b>?</b> <i>term</i> <b>:</b> <i>term</i> <b>)</b> \\
<i>variable</i> & ::= & non-empty character strings containing only alphabetical characters (lowercase or uppercase) \\
<i>number</i> & ::= & positive integers or negative integers preceded by <b>−</b>
\end{eqnarray}
]]></text>
<text><![CDATA[
The language that conforms to the above grammar is a strict superset of the language in the previous problem, so your solution should behave the same way as before on the subset of the language in the previous problem. You may use the same parse tree labels for both the prefix and infix notation for each construct and operator (<b>+</b> corresponds to <b>plus</b>, <b>?</b> and <b>:</b> are the <a href="https://en.wikipedia.org/wiki/%3F:">ternary operator</a> and correspond to <b>if</b>, <b>==</b> corresponds to <b>equal</b>, <b><</b> corresponds to <b>less</b>, and <b>></b> corresponds to <b>greater</b>).
]]></text>
</problem>
</problems>
</assignment>
</section>
</sheaf>