XLang

About

XLang is a parser framework for language modeling. It is also a demonstration of various ways one could build a yacc-based parser. Seven variations of parser are presented, varying in features.

The first variation (0_XLang_full), is a full-featured parser. With each successive variation, non-critical features are stripped until there is nothing left but a flex-less parser, the last variation (7_XLang_no-flex).

full, no-strings, no-comments, no-locations, no-reentrant, stdio, file, no-flex

The order in which features are removed is completely arbitrary. There's no good reason one cannot cannot have both "comments" and "file input" enabled, or both "reentrant" and "hand-rolled lexer" enabled.

The code is organized in a way such that anyone unfamiliar with Lex-Yacc can quickly isolate a variation that is most relevent, and most readily customizable to his/her specific requirements.

Variation vs. Feature Table

variation	lexer	input	reentrant	locations	comments	strings
full	flex	buffer	yes	yes	yes	yes
no-strings	flex	buffer	yes	yes	yes	no
no-comments	flex	buffer	yes	yes	no	no
no-locations	flex	buffer	yes	no	no	no
no-reentrant	flex	buffer	no	no	no	no
stdio	flex	stdio	no	no	no	no
file	flex	file	no	no	no	no
file-yyin	flex	file	no	no	no	no
no-flex	hand-rolled	file	no	no	no	no

HINT: The "stdio" variation is easiest to implement -- The flex lexer
      accepts standard stream input by default. It takes extra effort to
      hack flex into using "buffer input" or "file input".

Feature Detail

lexer:

    flex: Use flex to generate a lexer.
          This option is preferred when regex significantly simplifies the
          lexer.

    hand-rolled: Use a custom lexer.
                 This option is preferred when speed is critical -- Flex is
                 slow. This option is also preferred when unicode support
                 is needed -- Flex has no unicode support as of the time of
                 this writing.

input:

    buffer: Use in-memory buffer as input.
            This option is preferred when speed is critical -- Standard
            streams are slow. This option is also preferred when the input
            string already resides in memory and does not require passing
            through a standard stream in order to be used.

    stdio: Use standard input.
           This is the default behavior.

reentrant: Tell the parser not to use global variables.
           This option is required to make the parser multi-thread
           compliant.

locations: Tell the parser to provide a location variable "$@" for each BNF
           production. This option demonstrates the standard way to track
           syntax error line/column number using Lex-Yacc.

comments: Tell the lexer to conditionally enable/disable certain lexer
          rules for stateful scanning. This option demonstrates the
          standard way to ignore line/block comments.

strings: Tell the lexer to conditionally enable/disable certain lexer rules
         for stateful scanning. This option demonstrates *one* of the many
         ways to handle strings, characters, and escaped characters.

Requirements

Unix tools and 3rd party components (accessible from $PATH):

gcc (with -std=c++0x support), flex, bison, valgrind, cppcheck, doxygen, graphviz, ticpp

Environment variables:

$INCLUDE_PATH_EXTERN -- where "ticpp/ticpp.h" resides
$LIB_PATH_EXTERN -- where "libticppd.a" resides

Make Targets

target	action
all	make binaries
test	all + run tests
pure	test + use valgrind to check for memory leaks
dot	test + generate .png graph for tests
lint	use cppcheck to perform static analysis on .cpp files
doc	use doxygen to generate documentation
xml	test + generate .xml for tests
import	test + use ticpp to serialize-to/deserialize-from xml
clean	remove all intermediate files

FAQ

What is XLang ?

XLang is a starting point for people looking to construct their own language using Lex-Yacc. XLang is thoroughly tested and comes with its own test suite.
What isn't XLang ?

XLang isn't a unified parser front-end for every language under the sun. It makes some assumptions about the target language, meaning, literal values must be C-like. Otherwise, it places no restrictions on the grammar.
How should one use XLang ?

I recommend hacking the "stdio" example to your liking, but any of the other variations equally suitable as a base for beginners.
What licenses apply when using XLang ?

GPL3.
What's the motivation behind writing XLang ?

Lex-Yacc is a pragmatic solution that will get the job done, but it doesn't do so right out of the box. It needs some infrastructure to get most sizeable projects going.

I built XLang out of my own need for a parser SDK that takes care of the tedious tree building business so I could focus more on the fun stuff -- AST visitation.

But one may ask, isn't this exactly what production rule actions are for ? No. In order to perform multi-pass visitation to an AST, using Yacc, one would have to re-parse the input everytime, and that's wasteful. XLang builds an AST for you so you can visit it as many times as needed.

To make a long answer short, Tom Niemann's Lex-Yacc tutorial is my primary source of inspiration for building XLang. Please visit his site to learn more.
How does XLang differ from the example in Tom's tutorial ?

a) XLang uses C++, with unions used only where necessary. Tom's tutorial uses a C-style polymorphism technique where the last member of a class is a union of several types. I prefer C++ polymorphism.

b) XLang has its own memory management system based on allocators. Tom's tutorial omits AST node freeing entirely.

c) XLang offers several flavors of Lex-Yacc whereas Tom's tutorial offers just one "stdio" parser.

d) XLang's ASTs are "flattened", meaning lists are interpreted as lists instead of deep-recursing binary trees. This tree organization lends itself better to AST visitation (less likely to stack-overflow).
```
(1+2+3)        (1+2+3)
non-flattened: flattened:

    +             +
   / \          / | \
  +   3        1  2  3
 / \
1   2
```
e) XLang's visitor borrows concepts from Anand Shankar Krishnamoorthi's cooperative visitor and Jeremy Blosser's Java Tip 98 to enable visit double-dispatch without the need to implement the "accept" method for every single AST node class.

f) XLang uses namespaces and isolates all the AST building code into a static library.
Why allocators ?

I've incorporated a simple memory allocator in this project because I want to keep my AST node classes as clean as possible, without destructors that delete child nodes. But the way the allocator was written really doesn't improve the program's memory usage patterns. With some effort, I suppose one could upgrade that into a real allocator with true memory pooling.
Why c++0x ?

Lambda functions are one of the few c++0x feature used here, and only because they solve the problem elegantly. The goal is to deliberately avoid implementing clean-up code within the AST node destructors. This goes along a grander design decision to keep the AST node class clear of method clutter by moving all the predicate code into visitors. Lambdas help provide ad hoc anonymous callbacks for the allocator, whose job is to perform clean-up originally intended for the destructor.

References

"Flex your lexical analysis muscles": http://www.codeguru.com/cpp/cpp/algorithms/strings/article.php/c12717/Flex-Your-Lexical-Analysis-Muscles.htm; http://www.developer.com/net/cplus/article.php/3636641/Flex-Your-Lexical-Analysis-Muscles.htm

Additional Reading

http://osdir.com/ml/lex.flex.windows/2003-05/msg00017.html
http://tldp.org/HOWTO/Lex-YACC-HOWTO-5.html
http://net.pku.edu.cn/~course/cs201/2003/mirrorWebster.cs.ucr.edu/Page_softeng/softDevGuide_6.html

Keywords

Lex, Yacc, Flex, Bison, Parser, Reentrant

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.detail.md

README.detail.md

XLang

About

Variation vs. Feature Table

Feature Detail

Requirements

Make Targets

FAQ

References

Additional Reading

Keywords

Files

README.detail.md

Latest commit

History

README.detail.md

File metadata and controls

XLang

About

Variation vs. Feature Table

Feature Detail

Requirements

Make Targets

FAQ

References

Additional Reading

Keywords