This is now unused in favor of a more mature, pre-existing parsing framework
It may or may not be revisited later in the development of Caustic
The `basic_compiler` module is a less advanced compiler, but is used to bootstrap the `Compiler`
The `Compiler` class compiles grammars from Caustic grammar (`.cag`) files into nodes,
and uses a grammar system built in Caustic grammar format and compiled with the `basic_compiler` module
The `Compiler` is loaded through the `load_compiler()` function in the package,
and can be cached to disk using the `save_compiler()` function
The `nodes` module provides the nodes themselves, and allows manually building a grammar by supplying nodes
The `serialize` module provides functions for serializing and deserializing nodes
The `util` module provides small utilities
Pragmas are special directives embedded in the grammar
These are only supported by the bootstrapped `compiler` module
`$include [path]`
Allows putting multiple grammar files together
Relative paths provided as `[path]` will be checked against the following directories, in order:
- The path of the includer/importer (if possible)
- The `builtin_path` of the `compiler` module (the location of `compiler.py`)
- The current directory
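
The lookup order above can be pictured with a small, self-contained sketch. This is not Caustic's own code: `resolve_include` and its `search_dirs` parameter are hypothetical names used only for illustration, and the handling of absolute paths is an assumption.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_include(path: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing candidate for `path`, or None if none exists.

    Hypothetical helper: `search_dirs` stands in for the ordered directories
    (includer's directory, builtin path, current directory).
    """
    target = Path(path)
    if target.is_absolute():
        # Assumption: absolute paths bypass the directory search entirely.
        return target if target.exists() else None
    for directory in search_dirs:
        candidate = directory / target
        if candidate.exists():
            # Earlier directories win, so stop at the first hit.
            return candidate
    return None
```

Under these assumptions, a caller would pass something like `[includer_dir, builtin_dir, Path.cwd()]`, so that a file next to the includer shadows a builtin file of the same name.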
Comments may start with a `#`
A statement begins with an identifier, followed by an `=`, then an expression, and finally a `;`
An identifier is a sequence of alphanumeric characters, underscores, and periods
Note: `basic_compiler` will not accept identifiers with periods
Expressions consist of nodes, where a node can be as simple as a string or as complex as a group
`nodes.Node.name`
Named nodes are denoted by a name (alphanumeric characters, underscores, and periods), followed by a `:`, and then the node/expression
This controls the return value of containing groups
Note: `basic_compiler` will not accept node names with periods
"Anonymous" named nodes are expressions prefixed with `:`, but with no leading name
"Unpack" nodes are expressions prefixed with `^:`
Note: `basic_compiler` will not accept unpack nodes
`nodes.NodeGroup`
The top level of an expression is implicitly grouped
A simple group node is opened by `(` and closed by `)`
Groups match the nodes inside them in sequence
The return value of a group depends on how its contents are named:
- A group containing no named nodes will return a list of its nodes' results
- A group containing nodes with "anonymous" names returns the last matched anonymous node's return value
- A group containing named nodes returns a dict mapping the names to the nodes' results
- Any unpack nodes will unpack either their elements (sequence) or their names and values into the surrounding group's result
Mixing anonymous and named expressions in a single group will result in an error
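
As a rough model of these rules, the sketch below computes a group's result from a list of already-matched children. It is illustrative only and not taken from the `nodes` module: the `(kind, name, value)` tuples and the `group_result` function are hypothetical, and it assumes that unnamed results are dropped from dict results and that sequence unpacks are ignored in dict results.

```python
from typing import Any, List, Tuple

# Hypothetical child record: kind is "plain", "named", "anonymous" or "unpack".
Child = Tuple[str, str, Any]


def group_result(children: List[Child]) -> Any:
    kinds = {kind for kind, _, _ in children}
    if "named" in kinds and "anonymous" in kinds:
        raise ValueError("cannot mix anonymous and named nodes in one group")
    if "anonymous" in kinds:
        # The last matched anonymous node decides the result.
        return [value for kind, _, value in children if kind == "anonymous"][-1]
    unpacks_mapping = any(
        kind == "unpack" and isinstance(value, dict) for kind, _, value in children
    )
    if "named" in kinds or unpacks_mapping:
        result = {}
        for kind, name, value in children:
            if kind == "named":
                result[name] = value
            elif kind == "unpack" and isinstance(value, dict):
                result.update(value)  # merge names and values
            # Assumption: anything else is dropped from a dict result.
        return result
    items = []
    for kind, _, value in children:
        if kind == "unpack":
            items.extend(value)  # splice the elements in place
        else:
            items.append(value)
    return items
```

For example, `[("named", "key", b"k"), ("plain", "", b"="), ("named", "value", b"v")]` yields `{"key": b"k", "value": b"v"}` in this model, while the same children without names yield a three-element list.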
`nodes.NodeGroup`, `keep_whitespace=True`
A whitespace-sensitive group is opened by `{` and closed by `}`
The only difference between this type of group and a normal group is that it does not implicitly discard whitespace between its nodes
`nodes.UnionNode`
A union is opened by `[` and closed by `]`
Unions match any of their contained nodes
`nodes.NodeRange`
Can be created in the following ways:
- `- [node]`: Matches any amount of `[node]`
- `x- [node]`: Matches `x` or more of `[node]`
- `-x [node]`: Matches up to (but not including) `x` of `[node]`
- `a-b [node]`: Matches between `a` (inclusive) and `b` (exclusive) of `[node]`
Note that this should be placed after a [name](#naming)
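
The bounds are easy to get wrong by one, since the lower bound is inclusive and the upper bound is exclusive. The sketch below spells that out; `parse_bounds` and `count_allowed` are hypothetical helpers written for this illustration and are not part of the package.

```python
from typing import Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]


def parse_bounds(spec: str) -> Bounds:
    """Split a range prefix such as "-", "3-", "-4" or "2-5" into (lower, upper)."""
    if "-" not in spec:
        raise ValueError(f"not a range prefix: {spec!r}")
    lower, _, upper = spec.partition("-")
    # A missing side means "unbounded" on that side.
    return (int(lower) if lower else None, int(upper) if upper else None)


def count_allowed(count: int, lower: Optional[int] = None,
                  upper: Optional[int] = None) -> bool:
    """Check a repetition count: `lower` is inclusive, `upper` is exclusive."""
    if lower is not None and count < lower:
        return False
    if upper is not None and count >= upper:
        return False
    return True
```

With these definitions, a `2-5` prefix accepts 2, 3, or 4 repetitions but not 5, and a bare `-` accepts any count, including zero.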
Real nodes are nodes that actually match content, such as strings or patterns
`nodes.StringNode`
The simplest node, denoted either by single quotes (`''`) or double quotes (`""`)
Supports escape characters
Note: despite the name of this node, it is important to remember that the nodes only match bytes!
`nodes.PatternNode`
Matches a regular expression, denoted by slashes (`/`) in the following syntax:
*target group*`/`*pattern*`/`*flags*
In a pattern, if a target group is given (as an integer), the result of this node will be the bytes of that group instead of the entire match
Supports these common RegEx flags:
- `i`: ignore case / case insensitive
- `m`: multiline - `^` matches beginning of line or string, `$` matches end of either
- `s`: single-line / "dotall" - `.` matches newlines as well
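
The target group and flags behave much like their counterparts in Python's standard `re` module when it is applied to bytes. The sketch below shows that behavior; `match_pattern` and `FLAG_MAP` are hypothetical names, and anchoring the match at the start of the input is an assumption about how a pattern node is applied.

```python
import re
from typing import Optional

# Flag letters from the list above mapped onto the standard `re` flags.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def match_pattern(data: bytes, pattern: bytes, flags: str = "",
                  group: int = 0) -> Optional[bytes]:
    """Match `pattern` at the start of `data`; return the chosen group's bytes."""
    compiled_flags = 0
    for letter in flags:
        compiled_flags |= FLAG_MAP[letter]
    match = re.compile(pattern, compiled_flags).match(data)
    if match is None:
        return None
    # Group 0 is the whole match; any other integer selects a capture group.
    return match.group(group)
```

Here `match_pattern(b"Key=42", rb"key=(\d+)", "i", 1)` returns only the captured digits, while leaving `group` at its default returns the entire matched bytes.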
"Meta" nodes don't actually match anything, but can change some context
`nodes.Stealer`
A "stealer" node is denoted by a `!`, and is only acceptable in a group
If a group reaches a "stealer" node, then the group will raise an exception if any of the subsequent nodes fail
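
The effect is similar to a "cut" or commit point in other parsing tools: before it, a failure simply means "no match"; after it, a failure is an error. The sketch below models that with plain byte strings. `CUT`, `CommittedMatchError`, and `match_sequence` are hypothetical names for this illustration, and whitespace handling is left out.

```python
from typing import List, Optional

CUT = object()  # hypothetical stand-in for a stealer node


class CommittedMatchError(Exception):
    """Raised when a part fails after the sequence has passed CUT."""


def match_sequence(data: bytes, parts: List[object]) -> Optional[List[bytes]]:
    position = 0
    committed = False
    results: List[bytes] = []
    for part in parts:
        if part is CUT:
            committed = True  # from here on, failure is an error
            continue
        if data.startswith(part, position):
            results.append(part)
            position += len(part)
        elif committed:
            raise CommittedMatchError(f"expected {part!r} at offset {position}")
        else:
            return None  # ordinary failure: the caller may try something else
    return results
```

With `[b"let", CUT, b" x"]`, the input `b"var x"` returns `None`, but `b"let y"` raises, because the sequence had already committed after matching `let`.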
`nodes.Context`
A context is created with an opening `<` and closing `>`
Context nodes always match, with the result being the (string) contents
Context nodes should contain either a string, or a short sequence of alphanumeric characters and underscores
`nodes.NodeRef`
Denoted by an `@`, followed by a node name (as a string of alphanumeric characters, underscores, and periods)
Matches the targeted node, and returns that node's result
Must be bound using either its `.bind()` method, or automatically through the default compilers
Note: `basic_compiler` will not accept node references with periods
- Implemented node saving and loading through the `serialize` module
- Moved `compiler.bind_nodes()` to `util.bind_nodes()`
- Completely reworked compiler caching
- Removed `$import` pragma
- Moved `WHITESPACE_PATT` to `.util`
- Changed `nodes.Node.NO_RETURN` to singleton(ish) `util.NO_MATCH`
- Fixed an inaccuracy in README
- Added builtin `grammar.cag` to package
- Added precompiled `precompiled_nodes.pkl` to package

- Fixed error caused by `compiler.py` `Compiler.compile_buffermatcher()` passing an unneeded kwarg to `.pre_process()`
- Made `NodeSyntaxError` self-formatting also include exception notes

- Added support for periods in node names
- Fixed `Compiler.post_process_compile()` not actually doing anything

- Implemented unpacking nodes
- Fixed several nodes improperly stripping whitespace
- Fixed unpacking never triggering
- Fixed `NodeRange`s raising exceptions upon backtracking