
Cloud factory for accurate materials data


using the MPDS data platform, AiiDA workflows, and the CRYSTAL simulation engine.


Rationale

  • get accurate encyclopedic, reference, and benchmarking scientific data
  • get vast systematic training data for machine learning
  • use cheap commodity cloud environments (not necessarily an HPC cluster)
  • ensure provenance tracking and reproducibility of simulations with AiiDA

Installation

The code in this repo requires the aiida-crystal-dft, yascheduler, and mpds-ml-labs Python packages to be installed. These in turn depend on aiida, mpds_client, and other Python packages.

Installation proceeds as follows (replace pip with pip3 if needed, and consider using a virtual environment):

pip install git+https://github.com/tilde-lab/aiida-crystal-dft
pip install git+https://github.com/tilde-lab/yascheduler
pip install git+https://github.com/mpds-io/mpds-ml-labs
git clone https://github.com/mpds-io/mpds-aiida
pip install mpds-aiida/
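A quick way to confirm that the installation succeeded is to check that the packages can be imported. The sketch below assumes the import names aiida, aiida_crystal_dft, yascheduler, mpds_client, mpds_ml_labs, and mpds_aiida; the helper missing_modules is hypothetical and not part of any of these packages.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level import names of the packages installed above
# (hypothetical list; adjust if a package exposes a different module name).
REQUIRED_MODULES = [
    "aiida",
    "aiida_crystal_dft",
    "yascheduler",
    "mpds_client",
    "mpds_ml_labs",
    "mpds_aiida",
]


def missing_modules(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found on the import path."""
    # find_spec only locates a module; it does not execute the module's code.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```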

Some familiarity with AiiDA is assumed here. Since AiiDA does not support cloud environments, the custom cloud scheduler engine yascheduler is employed. This scheduler manages the CRYSTAL simulation engine on cloud VPS instances and encapsulates all the details of remote task submission, queueing, and results retrieval, as well as VPS management. It runs its own daemon on the same machine as AiiDA. However, AiiDA treats it as a remote service accessible via the SSH transport, so the command ssh $USER@localhost must succeed. To achieve that, one might run, e.g.:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh $USER@localhost

(Note that AiiDA must be made aware of the ~/.ssh/id_rsa.pub key file during the SSH setup!)
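The ssh $USER@localhost check can also be scripted, e.g. before running verdi computer test. This is a minimal standard-library sketch; the helper names ssh_check_command and ssh_login_works are hypothetical. The ssh options BatchMode=yes and ConnectTimeout=5 make ssh fail instead of prompting or hanging, so success indicates that non-interactive key-based login works, as a non-interactive client such as AiiDA requires.

```python
import getpass
import subprocess
from typing import List, Optional


def ssh_check_command(user: Optional[str] = None, host: str = "localhost") -> List[str]:
    """Build an ssh command that runs `true` remotely without any prompts."""
    user = user or getpass.getuser()
    # BatchMode=yes: fail instead of asking for a password or host-key confirmation.
    # ConnectTimeout=5: give up quickly if nothing listens on the SSH port.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", f"{user}@{host}", "true"]


def ssh_login_works(user: Optional[str] = None, host: str = "localhost") -> bool:
    """Return True if passwordless SSH login to user@host succeeds."""
    try:
        result = subprocess.run(ssh_check_command(user, host), capture_output=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # No ssh binary on PATH, or the connection stalled.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("ssh to localhost works:", ssh_login_works())
```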

For simplicity, yascheduler can share the database with AiiDA. Setting up yascheduler looks like this:

vi /etc/yascheduler/yascheduler.conf
yainit
service yascheduler start

AiiDA should be set up as usual, and a stub remote computer (e.g. cluster: yascheduler) as well as a stub CRYSTAL code (e.g. codes: Pcrystal) should be added:

reentry scan
verdi setup
verdi computer setup
verdi computer configure ssh $COMPUTER
verdi computer test $COMPUTER --print-traceback
verdi code setup

Why stubs? Because computer and code management is delegated to yascheduler, which takes care of managing the on-demand cloud resources.

The Gaussian basis sets used by the CRYSTAL engine should be added to the AiiDA database. We download the entire basis set library from the CRYSTAL website and save selected basis sets as *.basis files using the script scripts/bs_unito_download.py. Then, in the subfolder containing the *.basis files, one runs:

verdi data crystal_dft uploadfamily --name=$BASIS_FAMILY

or, to add the internal basis sets predefined in CRYSTAL:

verdi data crystal_dft createpredefined

Then the desired name ($BASIS_FAMILY) should be used in the calculation settings inside mpds_aiida/calc_templates (see below).
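Before running uploadfamily, it may help to check which *.basis files the subfolder actually contains. The sketch below assumes a hypothetical naming scheme in which each file is named after its chemical element (e.g. Si.basis); this scheme is not confirmed by the text above, so adapt the check to whatever scripts/bs_unito_download.py actually produces.

```python
from pathlib import Path
from typing import Dict, Union


def collect_basis_files(folder: Union[str, Path]) -> Dict[str, Path]:
    """Map element symbols to *.basis files found in `folder`.

    Assumes the hypothetical naming scheme '<Element>.basis' (e.g. 'Si.basis')
    and raises ValueError for file names that do not look like an element symbol.
    """
    files: Dict[str, Path] = {}
    for path in sorted(Path(folder).glob("*.basis")):
        symbol = path.stem
        # An element symbol is one or two letters: first upper-case, rest lower-case.
        if not (1 <= len(symbol) <= 2 and symbol.isalpha() and symbol == symbol.capitalize()):
            raise ValueError(f"unexpected basis file name: {path.name}")
        files[symbol] = path
    return files


if __name__ == "__main__":
    for symbol, path in collect_basis_files(".").items():
        print(symbol, path.name)
```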

Usage

The MPDS platform is the main data source for generating simulation inputs and checking simulation results. Access to the binary compounds data subset is free; one should log in at the MPDS and obtain an MPDS API key:

export MPDS_KEY=...

(Please do not forget to revoke, i.e. invalidate, the API key after finishing the work.)
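Scripts that rely on the exported key can fail early with a clear message when it is missing, instead of failing later at the first MPDS request. A minimal sketch; the helper get_mpds_key is hypothetical and not part of mpds_client.

```python
import os
from typing import Mapping, Optional


def get_mpds_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the MPDS API key from the MPDS_KEY environment variable."""
    env = os.environ if env is None else env
    key = env.get("MPDS_KEY", "").strip()
    if not key:
        # Fail fast: nothing MPDS-related can work without the key.
        raise RuntimeError("MPDS_KEY is not set: obtain an API key at the MPDS and export it")
    return key
```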

A template system controls the calculation parameters; see the mpds_aiida/calc_templates subfolder. Note that the options: resources template directive is meaningless with our custom cloud scheduler. The cluster, codes, and basis_family template directives must be specified exactly as defined above.
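Since a mismatch between a template and the AiiDA setup only shows up when a calculation is submitted, a small consistency check can catch it earlier. The sketch below assumes the template has already been loaded into a plain dict (the on-disk format is not shown here) with top-level keys cluster, codes, basis_family, and options; the helper check_template and the family name MINIMAL used in the example are hypothetical.

```python
import warnings
from typing import Any, Iterable, List, Mapping


def check_template(
    template: Mapping[str, Any],
    cluster: str,
    codes: Iterable[str],
    basis_family: str,
) -> List[str]:
    """Compare a calculation template with the AiiDA setup; return the problems found."""
    problems: List[str] = []
    if template.get("cluster") != cluster:
        problems.append(f"cluster is {template.get('cluster')!r}, expected {cluster!r}")
    # Assumption: `codes` may be a single label or a list of labels.
    tpl_codes = template.get("codes") or []
    if isinstance(tpl_codes, str):
        tpl_codes = [tpl_codes]
    known = set(codes)
    for code in tpl_codes:
        if code not in known:
            problems.append(f"code {code!r} is not among the configured codes")
    if template.get("basis_family") != basis_family:
        problems.append(
            f"basis_family is {template.get('basis_family')!r}, expected {basis_family!r}"
        )
    # options: resources is meaningless with the cloud scheduler, so only warn about it.
    if "resources" in (template.get("options") or {}):
        warnings.warn("options: resources is meaningless with yascheduler", stacklevel=2)
    return problems
```

For example, check_template(tpl, "yascheduler", ["Pcrystal"], "MINIMAL") returns an empty list when the template matches the stub computer, code, and basis family set up earlier.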

The following on-demand cloud providers are currently supported (the respective yascheduler directives are given in parentheses):

  • Hetzner (hetzner_token, hetzner_max_nodes): an API token must be issued for a project
  • Upcloud (upcloud_login, upcloud_pass, upcloud_max_nodes): API permissions are set in the account settings
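To see at a glance which of these providers a configuration file actually enables, the directives above can be read with the standard configparser module. This is a sketch under two assumptions: that /etc/yascheduler/yascheduler.conf is INI-style, and that the cloud directives live in a section named clouds. The rule "a provider counts as enabled when all its credentials are non-empty" is this sketch's own, not necessarily yascheduler's.

```python
import configparser
from typing import Dict, Optional

# Credential directives per provider, as listed in this README.
PROVIDER_CREDENTIALS = {
    "hetzner": ("hetzner_token",),
    "upcloud": ("upcloud_login", "upcloud_pass"),
}


def enabled_providers(conf_text: str, section: str = "clouds") -> Dict[str, Optional[int]]:
    """Return {provider: max_nodes} for providers whose credentials are all non-empty.

    The section name "clouds" is an assumption about the yascheduler.conf layout;
    max_nodes is None when the *_max_nodes directive is absent.
    """
    # interpolation=None keeps '%' characters in tokens or passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return {}
    enabled: Dict[str, Optional[int]] = {}
    for provider, keys in PROVIDER_CREDENTIALS.items():
        if all(parser.get(section, key, fallback="").strip() for key in keys):
            enabled[provider] = parser.getint(section, f"{provider}_max_nodes", fallback=None)
    return enabled


if __name__ == "__main__":
    with open("/etc/yascheduler/yascheduler.conf") as fh:
        print(enabled_providers(fh.read()))
```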

At the time of writing, the default Hetzner configuration (CX51) runs a test task in 2-2.5 hours on average and costs EUR 35.88 per month; the default Upcloud configuration (8 cores, 4 GB memory) runs a test task in 1.5 hours on average and costs USD 89 per month.
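These monthly prices can be turned into a rough cost per test task. The sketch assumes a 30-day month (720 hours) of continuous use with tasks running one after another on a single machine; real billing (e.g. hourly billing with a monthly cap) and task times vary, so treat the result as an order-of-magnitude estimate.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month of continuous use


def cost_per_task(monthly_price: float, task_hours: float,
                  hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Price of one task: hourly rate (monthly price / hours) times task duration."""
    return monthly_price / hours_per_month * task_hours


def tasks_per_month(task_hours: float, hours_per_month: float = HOURS_PER_MONTH) -> float:
    """How many tasks of this duration fit into one month if run back to back."""
    return hours_per_month / task_hours


if __name__ == "__main__":
    for label, price, hours in [("Hetzner, EUR", 35.88, 2.0), ("Hetzner, EUR", 35.88, 2.5),
                                ("Upcloud, USD", 89.0, 1.5)]:
        print(f"{label}: {cost_per_task(price, hours):.3f} per task, "
              f"{tasks_per_month(hours):.0f} tasks/month")
```

Under these assumptions the figures above give roughly EUR 0.10-0.12 per task on Hetzner (288-360 tasks per month) and about USD 0.19 per task on Upcloud (480 tasks per month). The currencies differ, so the two are not directly comparable.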

More examples are given in the scripts subfolder.

The operating principle is briefly illustrated below.

General workflow

Note: this repo is subject to change and is a work in progress.

Licensing

The resulting data are available on the MPDS platform under the CC BY 4.0 license.

Issues and troubleshooting

Please report any issues in the respective repositories: aiida-crystal-dft, yascheduler, mpds-ml-labs, aiida, mpds_client, etc.

Google Cloud machines must first be prepared via the web-browser SSH console (note sudo -i): the file /etc/ssh/sshd_config should be edited to allow the root user to log in.
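The sshd_config change boils down to the standard PermitRootLogin directive. The hypothetical helper below prepends the setting, because sshd uses the first value it obtains for each keyword, so a line at the very top wins over later global lines and over files pulled in by an Include. It also comments out existing global PermitRootLogin lines for clarity, leaving Match blocks untouched. We assume key-based root login, hence the default prohibit-password; use yes only if password login is really intended.

```python
import re

_DIRECTIVE = re.compile(r"^\s*PermitRootLogin\s", re.IGNORECASE)
_MATCH = re.compile(r"^\s*Match\s", re.IGNORECASE)


def allow_root_login(config_text: str, value: str = "prohibit-password") -> str:
    """Return sshd_config text with root login enabled via PermitRootLogin."""
    # First value wins in sshd_config, so the new setting goes on the first line.
    out = [f"PermitRootLogin {value}"]
    in_match = False
    for line in config_text.splitlines():
        if _MATCH.match(line):
            # Everything after a Match line is conditional; leave it unchanged.
            in_match = True
        if not in_match and _DIRECTIVE.match(line):
            out.append("# " + line.lstrip())  # superseded global setting
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

After writing the result back to /etc/ssh/sshd_config as root, the SSH daemon has to be restarted (the service is called ssh or sshd depending on the distribution).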

Amazon EC2 machines must first be accessed as the admin user (note sudo -i); then the file /root/.ssh/authorized_keys needs to be cleaned up to allow the root user to log in.
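On EC2 images, root's authorized_keys entries typically carry an options prefix (e.g. a command="..." that prints a message asking to log in as another user and then exits). Assuming that "cleaning" means removing this prefix, the hypothetical helpers below keep each entry from its key type onward. Only the common key types ssh-rsa, ssh-ed25519, ssh-dss, and ecdsa-sha2-nistp* are recognised; other lines are left unchanged.

```python
import re

# Key type followed by the base64 key blob; the options field, if any, precedes it.
_KEY = re.compile(r"(ssh-(?:rsa|ed25519|dss)|ecdsa-sha2-nistp(?:256|384|521))\s+\S+")


def strip_key_options(line: str) -> str:
    """Return an authorized_keys entry without its leading options field."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return stripped  # blank lines and comments pass through
    match = _KEY.search(stripped)
    # Leave entries without a recognised key type untouched.
    return stripped[match.start():] if match else stripped


def clean_authorized_keys(text: str) -> str:
    """Apply strip_key_options to every line of an authorized_keys file."""
    lines = [strip_key_options(line) for line in text.splitlines()]
    return "\n".join(lines) + "\n" if lines else ""
```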