Elva is a general-purpose tokenizer written in Java. It is not tied to any particular language or grammar.
The goal of this project was to exercise my Java skills.
Currently recognized tokens:
- Numbers (integers and floats, positive and negative)
- Parentheses
- Common math operations (+, -, *, /, =)
- Commas
- Identifiers (can include digits and underscores)
- Whitespace
- EOF
- Any other unrecognized token is saved as an "UNKNOWN" token
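The list above can be illustrated with a minimal sketch of a character-by-character classifier. This is not Elva's actual code: the class name, the `Kind` names other than those visible in the example output (IDENT, WHITESPACE, EQUALS, MULTIPLY, NUMBER, EOF), the rule for telling a negative sign from a minus operator, and the choice to group a run of whitespace into one token are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TokenKindSketch {
    // Hypothetical kind names; the README does not list the real ones for every token.
    enum Kind { NUMBER, LPAREN, RPAREN, PLUS, MINUS, MULTIPLY, DIVIDE, EQUALS, COMMA, IDENT, WHITESPACE, EOF, UNKNOWN }

    // Returns the kinds of all tokens found in the input, ending with EOF.
    static List<Kind> kinds(String input) {
        List<Kind> out = new ArrayList<>();
        Kind prev = null; // last non-whitespace kind, used to tell a sign from a minus
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            Kind kind;
            // Assumption: '-' starts a negative number only when a digit follows
            // and the previous token could not end an operand.
            boolean sign = c == '-' && i + 1 < n && Character.isDigit(input.charAt(i + 1))
                    && prev != Kind.NUMBER && prev != Kind.IDENT && prev != Kind.RPAREN;
            if (Character.isWhitespace(c)) {
                // Assumption: a run of whitespace becomes a single token.
                while (i < n && Character.isWhitespace(input.charAt(i))) i++;
                kind = Kind.WHITESPACE;
            } else if (Character.isDigit(c) || sign) {
                i++; // consume the sign or the first digit
                while (i < n && Character.isDigit(input.charAt(i))) i++;
                // Optional fractional part: a dot followed by at least one digit.
                if (i + 1 < n && input.charAt(i) == '.' && Character.isDigit(input.charAt(i + 1))) {
                    i++;
                    while (i < n && Character.isDigit(input.charAt(i))) i++;
                }
                kind = Kind.NUMBER;
            } else if (Character.isLetter(c) || c == '_') {
                // Identifiers: letters, digits and underscores, not starting with a digit.
                while (i < n && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) i++;
                kind = Kind.IDENT;
            } else {
                kind = single(c);
                i++;
            }
            out.add(kind);
            if (kind != Kind.WHITESPACE) prev = kind;
        }
        out.add(Kind.EOF);
        return out;
    }

    // Maps a single punctuation character to its kind, or UNKNOWN.
    static Kind single(char c) {
        switch (c) {
            case '(': return Kind.LPAREN;
            case ')': return Kind.RPAREN;
            case '+': return Kind.PLUS;
            case '-': return Kind.MINUS;
            case '*': return Kind.MULTIPLY;
            case '/': return Kind.DIVIDE;
            case '=': return Kind.EQUALS;
            case ',': return Kind.COMMA;
            default:  return Kind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(kinds("x = y * 5"));
    }
}
```

Tracking the previous non-whitespace kind is one simple way to keep `a-1` as a subtraction while still reading `-1` on its own as a number; Elva itself may resolve this differently.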
Tokens, when stringified, look like XML-like tags.
For example, the input x = y * 5 results in the following tokens:
<IDENT start="1" end="1">x</IDENT>
<WHITESPACE start="2" end="2"/>
<EQUALS start="3" end="3"/>
<WHITESPACE start="4" end="4"/>
<IDENT start="5" end="5">y</IDENT>
<WHITESPACE start="6" end="6"/>
<MULTIPLY start="7" end="7"/>
<WHITESPACE start="8" end="8"/>
<NUMBER start="9" end="9">5</NUMBER>
<EOF start="9" end="9"/>
The tokenizer is tested with unit tests.
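The tag format used in the example output above can be sketched roughly as follows. The class and method names are hypothetical, and the sketch assumes that a token without text is rendered as a self-closing tag, which is what the example output suggests. No XML escaping is applied, since the format is only XML-like.

```java
class TokenFormatSketch {
    // Renders one token in the XML-like form shown in the README.
    // A null text produces a self-closing tag (assumption based on the example output).
    static String render(String kind, int start, int end, String text) {
        String open = "<" + kind + " start=\"" + start + "\" end=\"" + end + "\"";
        if (text == null) {
            return open + "/>";
        }
        return open + ">" + text + "</" + kind + ">";
    }

    public static void main(String[] args) {
        System.out.println(render("IDENT", 1, 1, "x"));
        System.out.println(render("WHITESPACE", 2, 2, null));
    }
}
```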