diff --git a/README.md b/README.md
index 1706d6e..afbca56 100644
--- a/README.md
+++ b/README.md
@@ -16,13 +16,15 @@ and hosted as [OPTIMADE APIs](https://optimade.org), enabling enhanced data disc
 explorability.
 
 This prototype repository contains two Python packages that work towards this
-aim.
+aim, as well as example scripts for deployment.
 
 - `mc_optimade`: defines a config file format for annotating archives and registered the desired OPTIMADE entries, and a workflow for ingesting them and converting into OPTIMADE types using pre-existing parsers (e.g., ASE for structures). The archive is converted into an intermediate [OPTIMADE JSON Lines](https://github.com/Materials-Consortia/OPTIMADE/issues/471#issuecomment-1589274856) format that can be ingested into a database and used to serve a full OPTIMADE API.
 - `optimade_launch`: provides a platform for launching an OPTIMADE API server from such a JSON lines file. It does so using the [`optimade-python-tools`](https://github.com/Materials-Consortia/optimade-python-tools/) reference server implementation.
+- `mcloud_implementation`: A set of tools and configuration used to deploy the
+"archive watcher" and associated scrapers for MCA.
 
 ## Relevant links
 
diff --git a/src/mcloud_implementation/docs/index.md b/src/mcloud_implementation/docs/index.md
new file mode 100644
index 0000000..a3e82db
--- /dev/null
+++ b/src/mcloud_implementation/docs/index.md
@@ -0,0 +1,73 @@
+# Materials Cloud Archive OPTIMADE integration
+
+Users can now specify that their Materials Cloud Archive (MCA) submissions be
+hosted with an [OPTIMADE API](https://optimade.org), allowing structural and
+other data to be queried by OPTIMADE clients.
+This makes any structural data more discoverable, as structures and their
+properties will be returned alongside queries to other major [data
+providers](https://www.optimade.org/providers-dashboard/), and additionally
+enables future programmatic re-use of the data.
+This approach has already found use in select cases where AiiDA graphs were
+exported and stored by MCA, and subsequently exposed with OPTIMADE APIs, but now
+the functionality can be used on many common data types such as those understood
+by [ASE](https://wiki.fysik.dtu.dk/ase/) and [pymatgen](https://pymatgen.org).
+
+To enable this for an MCA submission, users must provide an additional config
+file at the top level of their submission, named `optimade.yml`.
+The contents of this file will instruct the MCA data pipelines to ingest data
+from supported formats, then create and expose a queryable database.
+The full config file format, with examples, is described in the
+[MCA-OPTIMADE integration GitHub repository](https://github.com/materialscloud-org/archive-optimade-integration/).
+
+## Example
+
+As a simple illustration of the functionality, let's say a user is submitting a
+`.zip` file containing Crystallographic Information Files (CIFs) describing the
+outputs of some calculations, with a simple `.csv` file describing computed
+properties of those crystals.
+
+In this case, the config file first has to describe where the structural
+data can be found, e.g.:
+
+```yaml
+entries:
+  - entry_type: structures
+    entry_paths:
+      - file: structures.zip
+        matches:
+          - structures/cifs/*.cif
+```
+
+Here, ASE will be used to parse the CIF files, and the
+[optimade-python-tools](https://github.com/Materials-Consortia/optimade-python-tools)
+library will be used to construct an OPTIMADE structure object.
+
+The location of the computed properties can then be defined in a similar way (continuing the `entries->entry_type` block):
+
+```yaml
+entries:
+  - entry_type: structures
+    property_paths:
+      - file:
+        matches:
+          - data/data.csv
+          - data/data2.csv
+```
+
+Finally, definitions for the properties found in the `.csv` files can be
+configured for enhanced sharing via OPTIMADE:
+
+```yaml
+entries:
+  - entry_type: structures
+    property_definitions:
+      - name: energy
+        title: Total energy per atom
+        description: The total energy per atom as computed by GGA-DFT.
+        unit: eV/atom
+        type: float
+```
+
+which will enable database queries over these properties, and easier re-use by other scientists.
+
+This full example, along with more complex examples, can be found on GitHub at [materialscloud-org/archive-optimade-integration](https://github.com/materialscloud-org/archive-optimade-integration/tree/main/src/mc_optimade/examples/folder_of_cifs).
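
Reviewer note: to make the `entry_paths` example in the new docs concrete, the following is a minimal sketch of how a `matches` glob such as `structures/cifs/*.cif` could select members of an archive like `structures.zip`. It uses only the Python standard library and is not taken from `mc_optimade`; the helper name `select_members` and the archive contents are hypothetical, and the actual pipeline may apply different matching rules.

```python
import fnmatch
import io
import zipfile


def select_members(archive: zipfile.ZipFile, patterns: list[str]) -> list[str]:
    """Return the sorted archive member names matching any glob pattern.

    Note: with fnmatch, `*` also matches `/`, which is looser than shell
    globbing. This is an illustrative assumption, not the pipeline's rule.
    """
    return sorted(
        name
        for name in archive.namelist()
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    )


# Build a small in-memory stand-in for `structures.zip` (contents are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("structures/cifs/a.cif", "data_a\n")
    zf.writestr("structures/cifs/b.cif", "data_b\n")
    zf.writestr("structures/README.txt", "not a structure\n")

# Apply the pattern from the `matches` list in the config example.
with zipfile.ZipFile(buffer) as zf:
    selected = select_members(zf, ["structures/cifs/*.cif"])

print(selected)  # ['structures/cifs/a.cif', 'structures/cifs/b.cif']
```

Only the two `.cif` members are selected; the README inside the archive is ignored. In the workflow described by the docs, files selected this way would then be handed to a parser such as ASE.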