
Computational infrastructure

Current components

 * [[JUMBO-Converters]] (Java) for legacy2CML and other transformations
 * [[Lensfield2]] build system with dependencies
 * [[|RESTful]] system for uploading and aggregation
 * [[|Greenchain]] server on virtual machine at Cambridge - allows free upload (ca. 25 GB available)
 * [[Chempound]] - a standalone database server for archiving the outputs of computational chemistry calculations.

Current development

 * [[ANTLR]] technology for parsing (QB and Weerapong developing).

Dependencies and underlying technology

 * [[|Java 1.5 JDK]]
 * [[|Maven]] for resolving Java dependencies. Check [[Maven]] for a basic tutorial and known problems.
 * [[|Mercurial]] for interacting with software repositories.  Check [[Mercurial]] for a basic tutorial and known problems.
 * [[|Avogadro]] for visualizing the parsed output. Check [[Avogadro]] for instructions about how to install it.

Optional (open-source) components

 * [[NWChem]] - a powerful open-source electronic structure code.
 * [[|Avogadro]] - an open-source molecular modelling environment

Parsing (and chunking) QC datafiles

Ways to create semantic compchem

 * Embed calls in the code. Current libraries include:
    1. [[FoX]] (FORTRAN95, Toby White).
    1. [[JUMBO]]/[[CMLXOM]] (Java, Peter Murray-Rust)
    1. [[|LMX]] (move to another section if I am wrong)
 * Write scripts or programs that read files and convert into semantic form:
    1. [[JUMBO-Converters]]
    1. [[Openbabel]]
    1. [[|cclib]]
 * Write high-level parsers:
    1. [[ANTLR]]

The current approach adopted by the Quixote Project is to use the JUMBO-Converters.
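
To illustrate the second approach above (a program that reads a file and converts it into semantic form), here is a minimal sketch. It does not use the JUMBO-Converters API. The class name `EnergyToCml`, the sample line format and the dictionary reference `demo:scfEnergy` are assumptions made for illustration, and a real converter would handle many more fields than one energy.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical converter: pulls one energy out of log text and wraps it in a CML-style scalar. */
final class EnergyToCml {
    // Matches a line such as "Total SCF energy =    -74.962985614357" (illustrative format).
    private static final Pattern SCF_ENERGY =
            Pattern.compile("Total SCF energy\\s*=\\s*(-?\\d+\\.\\d+)");

    /** Returns a scalar element for the first matching line, or empty if nothing matches. */
    static Optional<String> convert(String logText) {
        Matcher m = SCF_ENERGY.matcher(logText);
        if (!m.find()) {
            return Optional.empty();
        }
        // "demo:scfEnergy" is a made-up dictionary reference, not a real dictionary entry.
        return Optional.of("<scalar dataType=\"xsd:double\" dictRef=\"demo:scfEnergy\">"
                + m.group(1) + "</scalar>");
    }
}
```

Calling `EnergyToCml.convert(logText)` on text without a matching line returns an empty `Optional`, so the caller can decide whether a missing energy is an error.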

Attributes for Compchem

**NB:** we are currently working out the prototype on the Prototype_data page.
Taken from the EtherPad at
**Metadata:**
 * author email in logfile (e.g. through title) (is this normally in the logfile, or would it be good practice?)
 * datacite DOI in logfile (pre-publication)
 * publication associated to the logfile (if published)
**Definition of the system:**
 * geometry/structure/nuclear coordinates (it's all the same thing, size n)
 * charge/spin/state (from my point of view, the spin and the state go in the provenance section, they are constraints to the wavefunction)
**Provenance (type of calculation):**
 * level of the theory (RHF, B3LYP, MP2, AM1, etc.)
 * basis set (either with an agreed-upon name, as in the Basis Set Exchange (BSE), or custom basis sets)
 * additional details to the level of the theory (frozen core, etc.)
 * convergence parameters for SCF, CC iterations, etc.
 * initial guess for the iterative procedures (e.g., Hückel guess for SCF)
 * algorithm used for the iterative procedures
**Results of the calculation (observables):**
 * energy
 * energy gradient (size n)
 * energy hessian (size n^2^)
 * wave function (size n)
 * density matrix (size n^2^)
 * Mulliken charges (or some other type) (size n)
 * Normal Modes; hessian eigenvalues, eigenvectors
**Performance of the calculation:**
 * wall-clock time
 * CPU time
 * number of cores it ran on
 * total RAM used
 * scratch disk space used
 * code exited successfully or unsuccessfully
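
One way to picture the attribute groups listed above (metadata, system definition, provenance, results, performance) is as a single record per calculation. The following is only a sketch under that assumption: the class `CompchemRecord` and all of its field names are hypothetical, it covers just a few attributes from each group, and it is neither CML nor part of JUMBO. The one piece of logic checks the size notes above (gradient of size n, Hessian of size n by n).

```java
/** Hypothetical container for a subset of the attributes listed above. */
final class CompchemRecord {
    // Metadata
    String authorEmail;
    String dataciteDoi;

    // Definition of the system
    double[] coordinates;       // nuclear coordinates, size n
    int charge;

    // Provenance (type of calculation)
    String levelOfTheory;       // e.g. "RHF", "B3LYP", "MP2"
    String basisSet;

    // Results of the calculation
    double energy;
    double[] gradient;          // size n, may be null if not computed
    double[][] hessian;         // size n x n, may be null if not computed

    // Performance of the calculation
    double wallClockSeconds;
    int cores;
    boolean exitedSuccessfully;

    /** Returns true if the gradient and Hessian sizes agree with the coordinates. */
    boolean sizesConsistent() {
        if (coordinates == null) {
            return false;
        }
        int n = coordinates.length;
        if (gradient != null && gradient.length != n) {
            return false;
        }
        if (hessian != null) {
            if (hessian.length != n) {
                return false;
            }
            for (double[] row : hessian) {
                if (row.length != n) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A check like `sizesConsistent()` is the kind of validation a parser could run before a record is uploaded, so that truncated logfiles are caught early.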
**CML examples:**
 * [[|GULP output]]
 * [[|CASTEP output (shows use of properties)]]
 * [[|similar CASTEP output]]
 * [[|DLPOLY output]]


 * [[Dictionaries_examples]]
 * [[Creating_dictionaries]]

Formats for storing structured QC data

General formats

 * [[CML]]
 * [[|HDF5]]
 * [[|Q5cost]]

QC formats

 * CMLcomp: [[|Old schema]], [[|Newer preprint]]

QC codes and their datafiles' structure

Some of the codes we intend to support

 * [[DALTON]]
 * [[Gaussian]]
 * [[GAMESS]]
 * [[GAMESS-UK]]
 * [[MOLCAS]]
 * [[MOLDEN]]
 * [[MOLPRO]]
 * [[MPQC]]
 * [[MOPAC7]]
 * [[NWChem]]
 * [[ORCA]]
 * [[QChem]]

A long list of quantum chemistry and solid state physics codes:

Please edit the pages that these point to. Add:

 * any home page(s)
 * examples of files
 * notes on any already existing parsers
 * person of contact

Open datasets

 * [[Pablo_Echenique's_dataset]]
 * [[cclib_test_set]]
 * [[|Mark Monroe's dataset]]
 * [[Three_NWChem_6.0_single-points]]

Uploading and downloading

We expect a variety of approaches to upload and download as people try out different mechanisms. Here we describe the simple REST approach.


We have set up a server at (The "patents" is historical - we may be able to rename it). This server is wide open in all respects, so please don't advertise it to spammers. We may tackle security later; if the server is hacked or damaged we will simply close it and start again.

You can use a REST-based approach to:

 * upload files to server
 * download files/URLs from server
 * list files in a server "directory"
 * delete files/URLs on server

The URL structure reflects the local file hierarchy directly, so I shall often simply call the webpages "files" and "directories".
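
Because URLs mirror the file hierarchy, the URL of a file can be derived from its path relative to the server root. The sketch below shows this with the standard `java.net.URI` class. The root `http://example.org/quixote/` is a placeholder and not the real server address, the class name `UrlLayout` is hypothetical, and the paths are assumed to contain only URL-safe characters.

```java
import java.net.URI;

/** Hypothetical helper that maps server-relative paths onto URLs. */
final class UrlLayout {
    /**
     * Resolves a relative path against the server root.
     * The root must end in "/" so that it is treated as a directory.
     */
    static URI urlFor(URI serverRoot, String relativePath) {
        return serverRoot.resolve(relativePath);
    }
}
```

With the placeholder root, `UrlLayout.urlFor(URI.create("http://example.org/quixote/"), "nwchem/h2o.cml")` gives `http://example.org/quixote/nwchem/h2o.cml`, and a path ending in `/` names a "directory".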

The power of REST is:

 * its simplicity
 * the library support
 * the warm feeling you get from doing something really simple that works exactly as you want.

The current HTTP commands are:

 * PUT (or possibly POST) puts a file
 * GET gets a file
 * DELETE deletes a file

That's it.
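
The three commands above map directly onto request objects in most HTTP libraries. As a rough sketch, here is how they might look with the JDK's own `java.net.http` client. This assumes JDK 11 or later (newer than the Java 1.5 listed under dependencies) and is not the project's own upload code. The class `RestSketch` and its method names are hypothetical, and listing a directory is assumed to be a GET on a URL ending in `/`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper; class and method names are illustrative only. */
final class RestSketch {
    /** PUT: upload content to the given URL. */
    static HttpRequest upload(String url, byte[] content) {
        return HttpRequest.newBuilder(URI.create(url))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    /** GET: download a file, or (by assumption) list a directory URL ending in "/". */
    static HttpRequest download(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    /** DELETE: remove a file from the server. */
    static HttpRequest delete(String url) {
        return HttpRequest.newBuilder(URI.create(url)).DELETE().build();
    }

    /** Sends a prepared request and returns the HTTP status code. */
    static int send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Building the request is kept separate from sending it, so the requests can be inspected or logged before anything touches the server. For example, `RestSketch.send(RestSketch.delete(url))` would issue the DELETE and return the status code.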

REST is supported in almost all languages (I don't know about FORTRAN and I wouldn't use it anyway for this). We shall use Java from: Check this out (if you haven't done so already). The code (if you need it) is in: . The routines we shall use are:




 * [[|The S stands for Simple]] Why PMR uses REST, not SOAP
 * [[|Learn REST: A Tutorial (by M. Elkstein)]]
