Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding
Tatsunori Taniai, Ryo Igarashi, Yuta Suzuki, Naoya Chiba, Kotaro Saito, Yoshitaka Ushiku, and Kanta Ono
In The Twelfth International Conference on Learning Representations (ICLR 2024)
@inproceedings{taniai2024crystalformer,
  title     = {Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding},
  author    = {Tatsunori Taniai and Ryo Igarashi and Yuta Suzuki and Naoya Chiba and Kotaro Saito and Yoshitaka Ushiku and Kanta Ono},
  booktitle = {The Twelfth International Conference on Learning Representations},
  year      = {2024},
  url       = {https://openreview.net/forum?id=fxQiecl9HB}
}
cd docker/pytorch21_cuda121
docker build -t main/crystalformer:latest .
docker run --gpus=all --name crystalformer --shm-size=2g -v $(pwd)/../../:/workspace -it main/crystalformer:latest /bin/bash
In the docker container:
cd /workspace/data
python download_megnet_elastic.py
python download_jarvis.py  # This step may take about 1 hour and can be skipped for simple testing.
Download the pretrained weights [GoogleDrive]. Then, in the `/workspace` directory in the docker container:
unzip weights.zip
. demo.sh
Currently, pretrained models for MEGNet's bandgap and e_form targets with 4 or 7 attention blocks are available.
In the `/workspace` directory in the docker container:
CUDA_VISIBLE_DEVICES=0 python train.py -p latticeformer/default.json \
--save_path result \
--n_epochs 500 \
--experiment_name demo \
--num_layers 4 \
--value_pe_dist_real 64 \
--target_set jarvis__megnet-shear \
--targets shear \
--batch_size 128 \
--lr 0.0005 \
--model_dim 128 \
--embedding_dim 128
Setting `--value_pe_dist_real 0` yields the "simplified model" in the paper.
In the `/workspace` directory in the docker container:
CUDA_VISIBLE_DEVICES=0,1 python train.py -p latticeformer/default.json \
--save_path result \
--n_epochs 500 \
--experiment_name demo \
--num_layers 4 \
--value_pe_dist_real 64 \
--target_set jarvis__megnet-shear \
--targets shear \
--batch_size 128 \
--lr 0.0005 \
--model_dim 128 \
--embedding_dim 128
Currently, the throughput gain from multi-GPU training is limited; we suggest using at most 2 or 4 GPUs.
| target_set | targets | Unit | train | val | test |
|---|---|---|---|---|---|
| jarvis__megnet | e_form | eV/atom | 60000 | 5000 | 4239 |
| jarvis__megnet | bandgap | eV | 60000 | 5000 | 4239 |
| jarvis__megnet-bulk | bulk | log(GPa) | 4664 | 393 | 393 |
| jarvis__megnet-shear | shear | log(GPa) | 4664 | 392 | 393 |
| dft_3d_2021 | formation_energy | eV/atom | 44578 | 5572 | 5572 |
| dft_3d_2021 | total_energy | eV/atom | 44578 | 5572 | 5572 |
| dft_3d_2021 | opt_bandgap | eV | 44578 | 5572 | 5572 |
| dft_3d_2021-mbj_bandgap | mbj_bandgap | eV | 14537 | 1817 | 1817 |
| dft_3d_2021-ehull | ehull | eV | 44296 | 5537 | 5537 |
Use the following hyperparameters:
- For the `jarvis__megnet` datasets: `--n_epochs 500 --batch_size 128`
- For the `dft_3d_2021-mbj_bandgap` dataset: `--n_epochs 1600 --batch_size 256`
- For the other `dft_3d_2021` datasets: `--n_epochs 800 --batch_size 256` (see the example command below)
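For example, a run on the `dft_3d_2021` formation-energy target can combine these settings with the demo command above as follows. Apart from `--target_set`, `--targets`, `--n_epochs`, and `--batch_size`, the flags are simply carried over from the demo and may need tuning:
CUDA_VISIBLE_DEVICES=0 python train.py -p latticeformer/default.json \
--save_path result \
--n_epochs 800 \
--experiment_name dft3d_demo \
--num_layers 4 \
--value_pe_dist_real 64 \
--target_set dft_3d_2021 \
--targets formation_energy \
--batch_size 256 \
--lr 0.0005 \
--model_dim 128 \
--embedding_dim 128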
General training hyperparameters:
- `n_epochs` (int): The number of training epochs.
- `batch_size` (int): The batch size (i.e., the number of materials per training step).
- `loss_func` (`L1` | `MSE` | `Smooth_L1`): The regression loss function.
- `optimizer` (`adamw` | `adam`): The choice of optimizer.
- `adam_betas` (floats): beta1 and beta2 of Adam and AdamW.
- `lr` (float): The initial learning rate. The default setting (5e-4) works best in most cases.
- `lr_sch` (`inverse_sqrt_nowarmup` | `const`): The learning-rate schedule. `inverse_sqrt_nowarmup` sets the learning rate to `lr*sqrt(T/(t+T))`, where `T` is specified by `sch_params`; `const` uses a constant learning rate `lr` (see the sketch below).
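As a reference, here is a minimal sketch (not the repository's actual scheduler code) of the `inverse_sqrt_nowarmup` rule described above, assuming `t` counts training steps (or epochs) and `T` comes from `sch_params`; the numeric values are illustrative only:
import math

def inverse_sqrt_nowarmup(lr: float, t: int, T: int) -> float:
    """Return the learning rate lr * sqrt(T / (t + T)).

    Equals lr at t = 0 (no warmup) and decays roughly like 1/sqrt(t) for t >> T.
    """
    return lr * math.sqrt(T / (t + T))

# Illustrative values only; T is what sch_params specifies in the training scripts.
for t in (0, 100, 1000, 10000):
    print(t, inverse_sqrt_nowarmup(lr=5e-4, t=t, T=4000))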
Final MLP's hyperparameters:
- `embedding_dim` (ints): The intermediate dims of the final MLP after pooling, defining Pooling-Repeat[Linear-ReLU]-FinalLinear. The default setting (128) defines Pooling-Linear(128)-ReLU-FinalLinear(1) (see the sketch below).
- `norm_type` (`no` | `bn`): Whether or not to use BatchNorm in the MLP.
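For intuition, here is a rough PyTorch sketch of the head structure that `embedding_dim` and `norm_type` describe (an illustrative assumption, not the repository's exact implementation; the pooling itself happens upstream):
import torch.nn as nn

def build_head(model_dim: int, embedding_dims: list, use_bn: bool, out_dim: int = 1) -> nn.Sequential:
    """Build Repeat[Linear(-BatchNorm)-ReLU] followed by a final Linear."""
    layers = []
    in_dim = model_dim
    for dim in embedding_dims:
        layers.append(nn.Linear(in_dim, dim))
        if use_bn:  # corresponds to norm_type == 'bn'
            layers.append(nn.BatchNorm1d(dim))
        layers.append(nn.ReLU())
        in_dim = dim
    layers.append(nn.Linear(in_dim, out_dim))  # final regression output
    return nn.Sequential(*layers)

# Default setting (embedding_dim = 128): Linear(128)-ReLU-FinalLinear(1) after pooling.
head = build_head(model_dim=128, embedding_dims=[128], use_bn=False)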
Transformer's hyperparameters:
- `num_layers` (int): The number of self-attention blocks. Should be 4 or higher.
- `model_dim` (int): The feature dimension of the Transformer.
- `ff_dim` (int): The intermediate feature dimension of the feed-forward networks in the Transformer.
- `head_num` (int): The number of heads of multi-head attention (MHA).
Crystalformer's hyperparameters:
- `scale_real` (float or floats): "r_0" in the paper. (Passing multiple values allows different settings for individual attention blocks.)
- `gauss_lb_real` (float): The bound "b" for the rho function in the paper.
- `value_pe_dist_real` (int): The number of radial basis functions (i.e., the edge feature dim "K" in the paper). Should be a multiple of 16.
- `value_pe_dist_max` (float): "r_max" in the paper. A positive value directly specifies r_max in Å, while a negative value specifies r_max via r_max = (-value_pe_dist_max)*scale_real.
- `domain` (`real` | `multihead` | `real-reci`): Whether to use reciprocal-space attention via parallel MHA (`multihead`) or via block-wise interleaving between real and reciprocal space (`real-reci`); see the example below. When reciprocal-space attention is used, `scale_reci` and `gauss_lb_reci` can also be specified.
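As an illustration only (the numeric values below are placeholders, not recommended defaults), these options are passed on the command line like the other flags, e.g. to enable interleaved real/reciprocal-space attention:
CUDA_VISIBLE_DEVICES=0 python train.py -p latticeformer/default.json \
--target_set jarvis__megnet-shear \
--targets shear \
--domain real-reci \
--scale_reci 1.0 \
--gauss_lb_reci 0.5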
For each of the train, val, and test splits, make a list of dicts containing pymatgen Structures and label values, structured as follows (a construction sketch is given after the outline):
- list
  - dict
    - 'structure': pymatgen.core.structure.Structure
    - 'property1': a float value of property1 of this structure
    - 'property2': a float value of property2 of this structure
    - ...
  - dict
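For instance, such a list can be built roughly as in the following sketch (`property1`/`property2` and the lattice, species, and label values are placeholders for your own data):
from pymatgen.core import Lattice, Structure

# Build the per-split list of dicts expected by the dumping step below.
data_list = []
for _ in range(3):  # replace with a loop over your own materials
    structure = Structure(
        lattice=Lattice.cubic(4.2),           # placeholder lattice
        species=["Na", "Cl"],                 # placeholder species
        coords=[[0, 0, 0], [0.5, 0.5, 0.5]],  # fractional coordinates
    )
    data_list.append({
        "structure": structure,
        "property1": 1.23,  # placeholder label value
        "property2": 4.56,  # placeholder label value
    })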
Dump the list of each split into a directory named after your dataset, as follows:
import os
import pickle
target_set = 'your_dataset_name'
split = 'train' # or 'val' or 'test'
os.makedirs(f'data/{target_set}/{split}/raw', exist_ok=True)
with open(f'data/{target_set}/{split}/raw/raw_data.pkl', mode="wb") as fp:
pickle.dump(data_list, fp)
Then, you can specify your dataset and its target property name as follows:
python train.py -p latticeformer/default.json \
--target_set your_dataset_name \
--targets [property1|property2]