Oliver Stueker edited this page May 5, 2015 · 1 revision

Almost all parsers will contain a block analyzer. Blocks are the finest granularity at which file structure is mapped onto classes. A punch/dump/archive file is usually a good example of a file with blocks. Here is GAMESS-UK's punch file (with ellipses to reduce size). Note that GAMESS uses the term "block" explicitly in the same way as we do.

GAMESS is very helpful to the parser-writer (not all codes are):

 * the blocks are explicitly delimited (by the 'block' records and the end of file). In this case the start of one block is the end of the previous one (there is no "end block").
 * the blocks have explicit and unique names (e.g. "lowdin_atomic_charges records").
 * each block has semi-explicit semantics about the file structure ("index" and "elements")
 * blocks and lines are clearly separated by newlines

The general strategy is:

 * read each block and create a new instance of GamessBlock for each.
 * fill each block with its unchanged lines
 * store all blocks in sequence in a BlockContainer
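
The three steps above can be sketched as follows. This is a minimal Python illustration, not the project's own implementation: the `GamessBlock` class here is a stand-in with the same name, a plain list plays the role of the BlockContainer, and the header layout (`block = <name> records = <n> ...`) and the sample text are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GamessBlock:
    """One block: its header line plus the unchanged lines that follow it."""
    header: str
    lines: list = field(default_factory=list)


def read_blocks(text):
    """Split punch-file text into blocks; a new header closes the previous block."""
    container = []  # stands in for the BlockContainer: blocks in file order
    current = None
    for line in text.splitlines():
        if line.lstrip().startswith("block"):
            # Assumed header layout: the line starts with the word "block".
            current = GamessBlock(header=line)
            container.append(current)
        elif current is not None:
            current.lines.append(line)  # stored verbatim, not interpreted yet
        # Lines before the first header are ignored in this sketch.
    return container


# Hypothetical sample text, not real GAMESS-UK output.
SAMPLE = """block = title records = 1
water test
block = lowdin_atomic_charges records = 3 index = 1 elements = 1
-0.6
0.3
0.3
"""

blocks = read_blocks(SAMPLE)
```

Because there is no "end block" record, the only way a block ends in this sketch is that the next header (or the end of the text) is reached.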

---

 * iterate through all blocks parsing each with a method specific to that block. Most blocks can be parsed independently of each other but sometimes it is useful or critical to know certain sizes (e.g. the number of atoms).
 * check the items in the block against a code-specific dictionary
 * create CML elements for each type of structure within the block (molecules, arrays, strings, etc.)
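
A rough sketch of this second phase, again in Python and under stated assumptions: the dictionary contents, the `gamessuk` prefix, the parser function names and the header layout are all hypothetical, and the standard-library `xml.etree.ElementTree` stands in for CMLElement. It shows one parsing method per block name, a dictionary check, and the creation of an element for the parsed values.

```python
import xml.etree.ElementTree as ET

# Hypothetical code-specific dictionary: the block names this parser knows.
DICTIONARY = {"title", "lowdin_atomic_charges"}


def parse_header(header):
    """Turn 'block = title records = 1' into {'block': 'title', 'records': '1'}."""
    tokens = header.replace("=", " = ").split()
    fields = {}
    for i, token in enumerate(tokens):
        if token == "=" and 0 < i < len(tokens) - 1:
            fields[tokens[i - 1]] = tokens[i + 1]
    return fields


def parse_title(lines):
    """Join the title lines into a single string-valued element."""
    element = ET.Element("scalar", dataType="xsd:string")
    element.text = " ".join(line.strip() for line in lines)
    return element


def parse_charges(lines):
    """Collect every number in the block into one array element."""
    values = [float(tok) for line in lines for tok in line.split()]
    element = ET.Element("array", dataType="xsd:double", size=str(len(values)))
    element.text = " ".join(str(v) for v in values)
    return element


# One parsing method per block name.
PARSERS = {"title": parse_title, "lowdin_atomic_charges": parse_charges}


def block_to_cml(header, lines):
    """Check the block name against the dictionary, then dispatch to its parser."""
    name = parse_header(header)["block"]
    if name not in DICTIONARY:
        raise KeyError(f"unknown block: {name}")
    element = PARSERS[name](lines)
    element.set("dictRef", f"gamessuk:{name}")  # prefix is an assumption
    return element


charges = block_to_cml(
    "block = lowdin_atomic_charges records = 3 index = 1 elements = 1",
    ["-0.6", "0.3", "0.3"],
)
```

A block that needs a global size such as the number of atoms could receive it as an extra argument to its parser; that is left out here to keep the sketch short.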

This results in a file of CML elements (CMLElement), termed XML or rawCML.

The rawCML is then parsed with a parser that understands high-level CML semantics. It creates semanticCML ("completeCML") and may add annotations or further properties. For example, it might calculate bonds from interatomic distances.
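
To illustrate the kind of property such a parser might add, here is a simple Python sketch of bond perception from distances. It is not the method the project uses: the rule (two atoms are bonded when their distance is below the sum of their covalent radii plus a tolerance), the approximate radii, the 0.4 Å tolerance and the water geometry are all assumptions chosen for the example.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (illustrative values only).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def find_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) of atoms close enough to be considered bonded.

    atoms is a list of (element_symbol, (x, y, z)) with coordinates in angstroms.
    """
    bonds = []
    for (i, (sym_i, xyz_i)), (j, (sym_j, xyz_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[sym_i] + COVALENT_RADIUS[sym_j] + tolerance
        if math.dist(xyz_i, xyz_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Hypothetical water geometry in angstroms.
water = [
    ("O", (0.000, 0.000, 0.0)),
    ("H", (0.757, 0.586, 0.0)),
    ("H", (-0.757, 0.586, 0.0)),
]

water_bonds = find_bonds(water)
```

In this sketch both O-H pairs fall under the cutoff while the H-H pair does not, so only the two O-H bonds are reported.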
