Resources_and_technology


Computational infrastructure

Current components

 * [[JUMBO-Converters]] (Java) for legacy2CML and other transformations
 * [[Lensfield2]] build system with dependencies
 * [[http://quixote.wikispot.org/Resources_and_technology|RESTful]] system for uploading and aggregation
 * [[http://quixote.wikispot.org/Resources_and_technology|Greenchain]] server on virtual machine at Cambridge - allows free upload (ca 25 GB available)
 * [[Chempound]] a standalone database server for archiving the outputs of computational chemistry calculations.

Current development

 * [[ANTLR]] technology for parsing (QB and Weerapong developing).

Dependencies and underlying technology

 * [[http://www.oracle.com/technetwork/java/javase/downloads/index.html|Java 1.5 JDK]]
 * [[http://quixote.wikispot.org/Tutorials_and_problems#Maven|Maven]] for resolving Java dependencies. Check [[Maven]] for a basic tutorial and known problems.
 * [[http://mercurial.selenic.com/|Mercurial]] for interacting with software repositories.  Check [[Mercurial]] for a basic tutorial and known problems.
 * [[http://sourceforge.net/projects/avogadro/files/avogadro/1.0.1/|Avogadro]] for visualizing the parsed output. Check [[Avogadro]] for instructions about how to install it.

Optional (Opensource) Components

 * [[NWChem]] - a powerful Opensource electronic structure code.
 * [[http://avogadro.openmolecules.net/|Avogadro]] - an Opensource molecular modelling environment

Parsing (and chunking) QC datafiles

Ways to create semantic compchem

 * Embed calls in the code. Current libraries include:

    1. [[FoX]] (FORTRAN95, Toby White).
    1. [[JUMBO]]/[[CMLXOM]] (Java, Peter Murray-Rust)
    1. [[http://www.codalogic.com/lmx/|LMX]] (move to another section if I am wrong)

 * Write scripts or programs that read files and convert into semantic form:

    1. [[JUMBO-Converters]]
    1. [[Openbabel]]
    1. [[http://cclib.sourceforge.net/wiki/index.php/Main_Page|cclib]]

 * Write high-level parsers:

    1. [[ANTLR]]

The current approach adopted by the Quixote Project is to use the JUMBO-Converters.
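
As an illustration of the second route above (a program that reads a file and emits semantic output), here is a minimal Java sketch. It is not how JUMBO-Converters works internally. The log line layout, the class and method names, and the `dictRef` and `units` values are all assumptions made for this example; only the `scalar` element and its attribute names are taken from CML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: turns one line of a hypothetical QC logfile into a CML-style scalar. */
final class LogToCmlSketch {

    // Hypothetical log line layout, e.g. "Total SCF energy =   -76.026765".
    private static final Pattern ENERGY =
        Pattern.compile("Total SCF energy\\s*=\\s*(-?\\d+\\.\\d+)");

    /** Returns a scalar element for the first energy found, or null if there is none. */
    static String energyToCml(String logText) {
        Matcher m = ENERGY.matcher(logText);
        if (!m.find()) {
            return null;
        }
        // The dictRef and units values are illustrative, not agreed dictionary entries.
        return "<scalar dictRef=\"cc:energy\" dataType=\"xsd:double\" units=\"units:hartree\">"
            + m.group(1) + "</scalar>";
    }

    public static void main(String[] args) {
        String log = "  iterations converged\n  Total SCF energy =   -76.026765\n";
        System.out.println(energyToCml(log));
    }
}
```

A real converter would need one such pattern per quantity and per code, which is why higher-level parsers such as ANTLR are also being explored.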

Attributes for Compchem

**NB:** we are currently working out the prototype on the Prototype_data page.

Taken from the EtherPad at http://okfnpad.org/zcam2010

**Metadata:**

 * author email in logfile (e.g. through the title) (is this normally in the logfile, or would it be good practice to add it?)
 * datacite DOI in logfile (pre-publication)
 * publication associated to the logfile (if published)

**Definition of the system:**

 * geometry/structure/nuclear coordinates (it's all the same thing, size n)
 * charge/spin/state (from my point of view, the spin and the state go in the provenance section, they are constraints to the wavefunction)

**Provenance (type of calculation):**

 * level of the theory (RHF, B3LYP, MP2, AM1, etc.)
 * basis set (either with an agreed-upon name, as in the Basis Set Exchange, or custom basis sets)
 * additional details to the level of the theory (frozen core, etc.)
 * convergence parameters for SCF, CC iterations, etc.
 * initial guess for the iterative procedures (e.g., Hückel guess for SCF)
 * algorithm used for the iterative procedures

**Results of the calculation (observables):**

 * energy
 * energy gradient (size n)
 * energy hessian (size n^2^)
 * wave function (size n)
 * density matrix (size n^2^)
 * Mulliken charges (or some other type) (size n)
 * Normal Modes; hessian eigenvalues, eigenvectors

**Performance of the calculation:**

 * wall-clock time
 * CPU time
 * number of cores it ran on
 * total RAM used
 * scratch disk space used
 * whether the code exited successfully or unsuccessfully
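
To make the grouping above concrete, here is a minimal Java sketch of a container for a few of these attributes. Every class, field, and method name is hypothetical and only a subset of the attributes is shown. It takes n to be the number of nuclear coordinates, as in the list, and checks that the gradient has n entries and the hessian n^2^.

```java
/** Sketch only: a container for some of the attribute groups listed above. */
final class CompchemRecordSketch {
    // Metadata
    String authorEmail;
    String doi;
    // Definition of the system
    double[] coordinates;     // nuclear coordinates, length n
    int charge;
    // Provenance
    String levelOfTheory;     // e.g. "B3LYP"
    String basisSet;
    // Results
    double energy;
    double[] gradient;        // length n when present
    double[] hessian;         // length n*n when present, stored row by row
    // Performance
    double wallClockSeconds;
    boolean exitedSuccessfully;

    /** Checks the size relations from the list: gradient has n entries, hessian n*n. */
    boolean sizesConsistent() {
        int n = coordinates.length;
        boolean gradientOk = gradient == null || gradient.length == n;
        boolean hessianOk = hessian == null || hessian.length == n * n;
        return gradientOk && hessianOk;
    }

    public static void main(String[] args) {
        CompchemRecordSketch r = new CompchemRecordSketch();
        r.coordinates = new double[9];   // e.g. three atoms, x/y/z each
        r.gradient = new double[9];
        r.hessian = new double[81];
        System.out.println(r.sizesConsistent());
    }
}
```

Results that a given calculation did not produce are left as null, so the check only applies to what is actually present.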

**CML examples:**

 * [[http://cml.svn.sourceforge.net/viewvc/cml/schema2/trunk/examples/complex/calcite1.xml?revision=161|GULP output]]
 * [[http://cml.svn.sourceforge.net/viewvc/cml/schema2/trunk/examples/complex/castep2.xml?revision=161|CASTEP output (shows use of properties)]]
 * [[http://cml.svn.sourceforge.net/viewvc/cml/schema2/trunk/examples/complex/castep3.xml?revision=161|similar CASTEP output]]
 * [[http://cml.svn.sourceforge.net/viewvc/cml/schema2/trunk/examples/complex/dlpoly.xml?revision=161|DLPOLY output]]

Dictionaries

 * [[Dictionaries_examples]]
 * [[Creating_dictionaries]]

Formats for storing structured QC data

General formats

 * [[CML]]
 * [[http://www.hdfgroup.org/HDF5/|HDF5]]
 * [[http://abigrid.cineca.it/abigrid/the-docs-archive/q5cost/index_html|Q5cost]]

QC formats

 * CMLcomp: [[http://cml.sourceforge.net/schema/cmlComp/HTMLDOCS/cmlcomp.pdf|Old schema]], [[http://como.cheng.cam.ac.uk/preprints/c4e-Preprint-97.pdf|Newer preprint]]

QC codes and their datafiles' structure

Some of the codes we intend to support

 * [[DALTON]]
 * [[Gaussian]]
 * [[GAMESS]]
 * [[GAMESS-UK]]
 * [[MOLCAS]]
 * [[MOLDEN]]
 * [[MOLPRO]]
 * [[MPQC]]
 * [[MOPAC7]]
 * [[NWChem]]
 * [[ORCA]]
 * [[QChem]]
 * [[TURBOMOLE]]

A long list of quantum chemistry and solid state physics codes: http://en.wikipedia.org/wiki/List_of_quantum_chemistry_and_solid_state_physics_software

Please edit the pages that these point to. Add:

 * any home page(s)
 * examples of files
 * notes on any already existing parsers
 * contact person

Open datasets

 * [[Pablo_Echenique's_dataset]]
 * [[cclib_test_set]]
 * [[http://www.oci.uzh.ch/group.pages/baldridge/efiles/UZH_GC3_M06_O3ADD6.tgz|Mark Monroe's dataset]]
 * [[Three_NWChem_6.0_single-points]]

Uploading and downloading

We expect a variety of approaches to upload and download as people try out different mechanisms. Here we describe the simple REST approach.

Server

We have set up a server at http://greenchain.ch.cam.ac.uk/patents/quixote/ (the "patents" is historical; we may be able to rename it). This server is wide open in all respects, so please don't advertise it to spammers. We may tackle security later; if the server is hacked or damaged, we will simply close it and start again.

You can use a REST-based approach to:

 * upload files to server
 * download files/URLs from server
 * list files in a server "directory"
 * delete files/URLs on server

The URL structure reflects the local file hierarchy directly, so I shall often simply call the webpages "files" and "directories".

The power of REST is:

 * its simplicity
 * the library support
 * the warm feeling you get from doing something really simple that works exactly as you want.

The current HTTP commands are:

 * PUT (or possibly POST) puts a file
 * GET gets a file
 * DELETE deletes a file

That's it.
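
The three commands can be sketched in Java with nothing beyond the JDK's `HttpURLConnection`. This is only an illustration under stated assumptions: the class and method names are made up for this page, `example.org` is a placeholder host, the code assumes Java 7 or later (for try-with-resources), and it is not the set of routines from the lensfieldjumbo repository.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch only: PUT, GET and DELETE against a server whose URLs mirror a file hierarchy. */
final class RestSketch {

    /** Maps a relative file path onto the server URL, mirroring the directory layout. */
    static URI resolve(String baseUrl, String relativePath) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return URI.create(base).resolve(relativePath);
    }

    /** PUT: uploads the bytes to the target URL and returns the HTTP status code. */
    static int put(URI target, byte[] content) throws IOException {
        HttpURLConnection c = (HttpURLConnection) target.toURL().openConnection();
        c.setRequestMethod("PUT");
        c.setDoOutput(true);
        try (OutputStream out = c.getOutputStream()) {
            out.write(content);
        }
        return c.getResponseCode();
    }

    /** GET: downloads the resource at the target URL and returns its bytes. */
    static byte[] get(URI target) throws IOException {
        HttpURLConnection c = (HttpURLConnection) target.toURL().openConnection();
        c.setRequestMethod("GET");
        try (InputStream in = c.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    /** DELETE: removes the resource at the target URL and returns the HTTP status code. */
    static int delete(URI target) throws IOException {
        HttpURLConnection c = (HttpURLConnection) target.toURL().openConnection();
        c.setRequestMethod("DELETE");
        return c.getResponseCode();
    }

    public static void main(String[] args) {
        // Placeholder host and path; no network request is made here.
        System.out.println(resolve("http://example.org/quixote", "nwchem/h2o.cml"));
    }
}
```

Because URLs mirror the file hierarchy, uploading a local file amounts to resolving its relative path against the server base and calling `put` with the file's bytes.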

REST is supported in almost all languages (I don't know about FORTRAN and I wouldn't use it anyway for this). We shall use Java from: http://bitbucket.org/petermr/lensfieldjumbo. Check this out (if you haven't done already). The code (if you need it) is in: . The routines we shall use are:


Download

Misc

 * [[http://harmful.cat-v.org/software/xml/soap/simple|The S stands for Simple]] Why PMR uses REST, not SOAP
 * [[http://rest.elkstein.org/|Learn REST: A Tutorial (by M. Elkstein)]]
