Oliver Stueker edited this page May 5, 2015 · 2 revisions

Introduction to CML

Extracted from a message by Peter Murray-Rust:


CML covers a wide range of published chemistry and is informed by what people publish and communicate. It is also useful for building in-memory models. So it's very broad (probably larger in scope than MathML/Mathematica and at least as complex as GML for geo). It also requires a lot of code.

The good news is that CML is now pretty stable. Joe's work on validation has helped it converge to something that a machine can decide is or is not valid.

To simplify things for this list, there are FIVE main areas for CML of which you really only need two:

 * CMLLite (primarily molecules). Sometimes called CMLCore in the past.
 * CMLComp. What we are talking about in Quixote.

...

 * CMLReact, for reactions. Not relevant unless we are computing reactions, and then only a little bit.
 * CMLCryst, for the solid state. Again, only required if we are doing solid-state calculations.
 * CMLSpect, for spectra. Again, only required if we are computing spectra (and then only a small part).
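To make the CMLLite core concrete, here is a minimal sketch that builds a small `molecule` with Python's standard-library `xml.etree.ElementTree`. It assumes the CML Schema namespace `http://www.xml-cml.org/schema` and uses the CML names `atomArray`, `atom` (`id`, `elementType`, `x3`/`y3`/`z3`), `bondArray` and `bond` (`atomRefs2`, `order`). The helper names `q` and `build_molecule` are hypothetical, and the water coordinates are approximate, illustrative values only. Whether a given set of attributes is acceptable in practice is decided by the relevant convention, not by this code.

```python
import xml.etree.ElementTree as ET

# Assumption: the CML Schema 3 namespace URI; registered as the default
# namespace so that output reads <molecule xmlns="..."> instead of <ns0:molecule>.
CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)


def q(tag: str) -> str:
    """Hypothetical helper: qualify a local name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"


def build_molecule(mol_id, atoms, bonds):
    """Build a minimal CMLLite-style <molecule> element.

    atoms: list of (atom id, element symbol, (x, y, z)) tuples
    bonds: list of (first atom id, second atom id, bond order) tuples
    """
    mol = ET.Element(q("molecule"), {"id": mol_id})
    atom_array = ET.SubElement(mol, q("atomArray"))
    for aid, element, (x, y, z) in atoms:
        ET.SubElement(atom_array, q("atom"), {
            "id": aid,
            "elementType": element,
            "x3": f"{x:.4f}", "y3": f"{y:.4f}", "z3": f"{z:.4f}",
        })
    bond_array = ET.SubElement(mol, q("bondArray"))
    for a1, a2, order in bonds:
        # atomRefs2 holds the two atom ids, separated by whitespace.
        ET.SubElement(bond_array, q("bond"), {"atomRefs2": f"{a1} {a2}", "order": order})
    return mol


# Illustrative, approximate coordinates for water.
water = build_molecule(
    "water",
    atoms=[("a1", "O", (0.0, 0.0, 0.1173)),
           ("a2", "H", (0.0, 0.7572, -0.4692)),
           ("a3", "H", (0.0, -0.7572, -0.4692))],
    bonds=[("a1", "a2", "S"), ("a1", "a3", "S")],
)
print(ET.tostring(water, encoding="unicode"))
```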

Note that CML also interoperates with other MLs. So for anything mathematical we use MathML, for text XHTML, for graphics SVG, for Geo (yes, we are doing atmospheric chemistry) KML or equivalent and so on... Almost everything can be done in this way while keeping CML for chemistry.
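The mixing of markup languages rests on XML namespaces: each vocabulary keeps its own namespace URI, so software can pick out the parts it understands and pass the rest on. The sketch below assumes the CML Schema namespace plus the standard W3C namespaces for XHTML, MathML and SVG; the tiny document and the function names `namespace_of` and `count_by_language` are hypothetical illustrations, not part of any CML tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI -> label. The CML URI is the CML Schema namespace (assumption);
# the others are the standard W3C namespaces for XHTML, MathML and SVG.
NAMESPACES = {
    "http://www.xml-cml.org/schema": "CML",
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.w3.org/1998/Math/MathML": "MathML",
    "http://www.w3.org/2000/svg": "SVG",
}

# Hypothetical document: chemistry in CML, prose in XHTML, a formula in MathML.
DOC = """<cml xmlns="http://www.xml-cml.org/schema"
     xmlns:h="http://www.w3.org/1999/xhtml"
     xmlns:m="http://www.w3.org/1998/Math/MathML">
  <molecule id="m1">
    <atomArray><atom id="a1" elementType="C"/></atomArray>
  </molecule>
  <h:p>Free-text description kept in XHTML.</h:p>
  <m:math><m:mi>E</m:mi><m:mo>=</m:mo><m:mi>h</m:mi><m:mi>&#957;</m:mi></m:math>
</cml>"""


def namespace_of(tag: str) -> str:
    """Return the namespace URI of an ElementTree tag ('' if unqualified)."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def count_by_language(xml_text: str) -> Counter:
    """Count elements (including the root) per markup language."""
    root = ET.fromstring(xml_text)
    return Counter(NAMESPACES.get(namespace_of(el.tag), "other") for el in root.iter())


print(count_by_language(DOC))
```

A CML-aware program would typically process only the elements in the CML namespace and hand the XHTML or MathML fragments to tools built for those languages.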

So we need to understand:

 * molecules, atoms, bonds (coordinates, charge, isotopes, stereochem, etc.)
 * computational job structure (modules)
 * numeric quantities (scalar/array/matrix, each combined with string/double/integer data types)
 * dictionaries and validation
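The "scalar/array/matrix combined with string/double/integer" point can be sketched as a small reader. It assumes CML's `scalar`, `array` and `matrix` elements with a `dataType` attribute (`xsd:double`, `xsd:integer`, `xsd:string`), an optional `size` on arrays, `rows`/`columns` on matrices, and whitespace-separated content. Custom `delimiter` attributes are not handled, and defaulting to `xsd:string` when `dataType` is absent is an assumption of this sketch; `read_quantity` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML Schema namespace

# XML Schema data types (as written in CML dataType attributes) -> converters.
CONVERTERS = {
    "xsd:double": float,
    "xsd:integer": int,
    "xsd:string": str,
}


def read_quantity(el):
    """Convert a CML <scalar>, <array> or <matrix> element into Python values."""
    local = el.tag.split("}", 1)[-1]
    # Assumption: treat a missing dataType as xsd:string.
    convert = CONVERTERS[el.get("dataType", "xsd:string")]
    text = (el.text or "").strip()
    if local == "scalar":
        return convert(text)
    # Assumption: whitespace-delimited values (no custom delimiter attribute).
    values = [convert(tok) for tok in text.split()]
    if local == "array":
        size = el.get("size")
        if size is not None and int(size) != len(values):
            raise ValueError(f"array size={size} but found {len(values)} values")
        return values
    if local == "matrix":
        rows, cols = int(el.get("rows")), int(el.get("columns"))
        if rows * cols != len(values):
            raise ValueError(f"matrix {rows}x{cols} but found {len(values)} values")
        # Values are read row by row.
        return [values[r * cols:(r + 1) * cols] for r in range(rows)]
    raise ValueError(f"not a numeric quantity element: {local}")


xml = f"""<list xmlns="{CML_NS}">
  <scalar dataType="xsd:double">1.5</scalar>
  <array dataType="xsd:integer" size="3">8 1 1</array>
  <matrix dataType="xsd:double" rows="2" columns="2">1.0 0.0 0.0 1.0</matrix>
</list>"""
for child in ET.fromstring(xml):
    print(read_quantity(child))
```

Checking `size`, `rows` and `columns` against the actual content is a small example of the kind of consistency rule that a schema alone does not enforce.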

Most of that is now contained within the problems we are addressing.

So at present we are learning by doing.

This was explored 3-4 years ago by the minerals/materials community and it worked out well there, so we are in known territory.

What is new is that we have a robust infrastructure for dictionaries, a better set of tools for creating CML from legacy formats, and a much better set of tools from the Blue Obelisk.


Conventions and validation

Extracted from a message by Peter Murray-Rust:


The latest CML spec (Schema 3) deliberately allows flexibility in XML content models. This means, for example, that molecule/spectrum is valid and so is spectrum/molecule; both make sense. To write software that can read documents with this flexibility, we need agreement in the community. Each community has the freedom to create its own conventions, subject to the requirement that documents must be valid against the (very flexible) CML Schema 3. The success of a convention is then a social, not a technical, phenomenon. If group A develops a convention and B, C and D adopt it, then there is wide interoperability. If A develops a convention and B develops an alternative, then there is fragmentation. Having more than one way to do something is not always bad, but it helps if there is a single approach. It's a disaster if, as now, there are more than 50 different ways of expressing the same computational concept.
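Without a convention, reading software must cope with every nesting the schema allows. The sketch below, assuming the CML Schema namespace and using only `xml.etree.ElementTree`, pairs each `molecule` with a `spectrum` whether the spectrum is nested inside the molecule or the other way round. It only looks at direct children, and `molecule_spectrum_pairs` is a hypothetical name; a convention that fixes one of the two forms would let the second loop disappear.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML Schema namespace
MOLECULE = f"{{{CML_NS}}}molecule"
SPECTRUM = f"{{{CML_NS}}}spectrum"


def molecule_spectrum_pairs(root):
    """Return (molecule id, spectrum id) pairs for either nesting direction.

    Both molecule/spectrum and spectrum/molecule are accepted; only direct
    parent/child relationships are considered in this sketch.
    """
    pairs = []
    # Form 1: <molecule><spectrum/></molecule>
    for mol in root.iter(MOLECULE):
        for spec in mol.findall(SPECTRUM):
            pairs.append((mol.get("id"), spec.get("id")))
    # Form 2: <spectrum><molecule/></spectrum>
    for spec in root.iter(SPECTRUM):
        for mol in spec.findall(MOLECULE):
            pairs.append((mol.get("id"), spec.get("id")))
    return pairs


doc_a = f'<molecule xmlns="{CML_NS}" id="m1"><spectrum id="s1"/></molecule>'
doc_b = f'<spectrum xmlns="{CML_NS}" id="s2"><molecule id="m2"/></spectrum>'
print(molecule_spectrum_pairs(ET.fromstring(doc_a)))
print(molecule_spectrum_pairs(ET.fromstring(doc_b)))
```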

There are currently more-or-less well-defined conventions for crystallography, solid-state CMLComp, spectra (JSpecView), reactions (MaCiE), and CMLDictionaries. These have grown organically and do not easily interoperate. So Joe and Weerapong have been developing a compchem convention, and Sam and Joe have done the same for dictionaries. Recently Sebastian and colleagues have offered to explore conventions for molecular dynamics.

The basis of conventions is expressed at http://www.xml-cml.org/convention/. Joe is working on a more formal account of a convention. A typical convention (for dictionaries) is given at http://www.xml-cml.org/convention/dictionary. This is a formal specification of what must, may and should be in a CML dictionary. Joe has written a validator which can enforce the constraints expressed in the convention. Thus, for example, when I run it on some of my own dictionaries I find they are not convention-valid and I have to edit them. This is a good example of being forced to clean up past experiments.
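Dictionaries are linked to data through `dictRef` attributes written as `prefix:term`, where the prefix is bound to a dictionary namespace in the document. The sketch below, using only the standard library, collects prefix bindings with `ElementTree.iterparse` (`start-ns` events) and splits a `dictRef` into (dictionary namespace, term). The prefix `ex`, the URI `http://example.org/dict/` and the function names are hypothetical. It also assumes each prefix is bound to a single URI for the whole file. How that pair is then looked up in an actual dictionary is defined by the dictionary convention, not by this code.

```python
import io
import xml.etree.ElementTree as ET


def prefix_map(xml_text):
    """Collect prefix -> namespace URI bindings declared anywhere in the document.

    Simplification: assumes each prefix is bound to one URI throughout the file.
    The default namespace appears under the empty prefix ''.
    """
    bindings = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings.setdefault(prefix, uri)
    return bindings


def resolve_dict_ref(dict_ref, bindings):
    """Split a dictRef such as 'ex:energy' into (dictionary namespace URI, term)."""
    prefix, sep, term = dict_ref.partition(":")
    if not sep or prefix not in bindings:
        raise KeyError(f"dictRef {dict_ref!r} uses an undeclared prefix")
    return bindings[prefix], term


# Hypothetical document using a made-up dictionary namespace.
DOC = ('<cml xmlns="http://www.xml-cml.org/schema" '
       'xmlns:ex="http://example.org/dict/">'
       '<scalar dictRef="ex:energy" dataType="xsd:double">1.0</scalar>'
       '</cml>')

bindings = prefix_map(DOC)
print(resolve_dict_ref("ex:energy", bindings))
```

A convention validator could use the same resolution step to report every `dictRef` whose prefix is undeclared or whose term is missing from the referenced dictionary.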

A convention offers the following:

 * an announcement that an identified community cares about a sub-domain of chemistry.
 * a prose description of the scope and constraints and practice of the convention
 * a validator that determines whether a given file conforms to a convention (and where it deviates)

In addition for software developers it offers:

 * a statement as to what the components in a convention are, and how they can be combined.
 * indications of what constraints may/must/should be imposed on CML documents valid against this convention.
 * an indication or a guarantee as to what CML components may be found in a conformant document
 * an indication of their semantics

The convention validator can be seen as an extension to the CML schema, since many concepts cannot be expressed in XML Schema (e.g. "each bond must have pointers to atoms and these atoms must exist within the local scope"). Joe's validator has some hundreds of such rules, and we would expect a cmlcomp convention to need a similar number. It is likely that the exciting job of writing the cmlcomp convention can be shared across the community.
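As an illustration of a rule that XML Schema cannot express, here is a minimal sketch of the bond rule quoted above, written with Python's `xml.etree.ElementTree`. It is not Joe's validator. It assumes the CML Schema namespace, treats "local scope" as the atoms in the same molecule's own `atomArray`, and reports deviations as readable messages, in the spirit of a validator that says where a file deviates. The function name `check_bond_atom_refs` is hypothetical.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML Schema namespace
NS = {"cml": CML_NS}


def check_bond_atom_refs(root):
    """Check one convention-style rule on every molecule in the tree.

    Rule (as sketched here): each bond must reference exactly two atoms via
    atomRefs2, and both atoms must exist in the same molecule's atomArray.
    Returns a list of violation messages (empty if the rule holds).
    """
    errors = []
    for mol in root.iter(f"{{{CML_NS}}}molecule"):
        # "Local scope" is approximated as this molecule's own atomArray;
        # a real convention would need to define the scope precisely.
        atom_ids = {a.get("id") for a in mol.findall("cml:atomArray/cml:atom", NS)}
        for bond in mol.findall("cml:bondArray/cml:bond", NS):
            refs = (bond.get("atomRefs2") or "").split()
            where = f"molecule {mol.get('id')}: bond {bond.get('id')}"
            if len(refs) != 2:
                errors.append(f"{where} needs 2 atomRefs, has {len(refs)}")
                continue
            for ref in refs:
                if ref not in atom_ids:
                    errors.append(f"{where} refers to missing atom {ref}")
    return errors


doc = f"""<molecule xmlns="{CML_NS}" id="m1">
  <atomArray><atom id="a1" elementType="C"/><atom id="a2" elementType="O"/></atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2" order="D"/>
    <bond id="b2" atomRefs2="a1 a9" order="S"/>
    <bond id="b3" atomRefs2="a1" order="S"/>
  </bondArray>
</molecule>"""
for message in check_bond_atom_refs(ET.fromstring(doc)):
    print(message)
```

Each rule of this kind is small on its own; the effort lies in agreeing on the rules and keeping hundreds of them implemented and tested.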

There will be clashes of opinion as to what should be in conventions. This is certain! We are exploring new ground and concepts are not well defined. Software has to be written for all the rules in the convention (not necessarily all at once). Software has to be written to process conformant documents (though much exists already). Many schemas and designs - e.g. SGML - have been too ambitious to implement so if you are thinking of developing a convention make sure there are people (including yourself!) who are prepared to implement it.

In many cases the best approach is iterative. If it cannot be implemented, then it should not be in the convention. Tweaking the convention to accommodate the software is often an alternative to writing software conformant to a difficult convention. And difficulty changes with time as more tools become available.

I am addicted to the IETF mantra "Rough consensus and running code". I hope we all are.


Examples

The Examples_of_CML page has some examples.

Some very simple example CML is found in the FoX tutorial
