GRACE-1L/2L-OAM models (#202)
* update prediction scripts

* bugfix join_grace_preds.py

* join_grace_preds.py save relaxed structures

* add GRACE-OAM's yaml files

* tweak test_grace_discovery.py + join_grace_preds.py

- fix outdated phonondb_pbe_103_structures URL in data-files.yml
- simplify file path handling in hpc.py and test_hpc.py

* update GRACE model YAML files with pred_file paths and pr_url

- rename grace2l-r6.yml to grace-2L-mptrj.yml
- rename GRACE-(1|2)L-OAM model YAML files to grace-(1|2)l-oam.yml
- update Model enum in data.py to include new GRACE model paths

* rename metrics.geo_opt.pred_col to struct_col in model YAML files

* analyze_geo_opt.py add CLI

- use argparse to pass the models and symprec values to analyze and to toggle debug mode
- add example usage to the script docstring

* add discovery metrics to grace-1L-oam.yml and grace-2L-oam.yml

* calc both MP-corrected and uncorrected e_form_per_atom in models/grace/join_grace_preds.py

* contributing.md update recommended model artifact names

* add parallel processing to analyze_geo_opt.py and improve error handling

- uses ProcessPoolExecutor
- add CLI option for specifying number of worker processes
- improve logging and progress tracking

* add geo_opt metrics to GRACE Omat L1+L2 model YAMLs

- add figshare pred_file_url for geo_opt, discovery and kappa_103 in grace-1L-oam.yml and grace-2L-oam.yml
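The parallel analysis described above for analyze_geo_opt.py (a `ProcessPoolExecutor` plus a CLI option for the worker count) could be structured roughly as follows. This is a hypothetical sketch, not the script's actual code: the flag names, the `analyze_one` placeholder and the debug behavior are assumptions.

```python
from __future__ import annotations

import argparse
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed


def build_parser() -> argparse.ArgumentParser:
    """CLI with hypothetical flag names for models, symprec values and workers."""
    parser = argparse.ArgumentParser(description="Analyze ML-relaxed structures.")
    parser.add_argument("--models", nargs="+", default=["grace-1L-oam"])
    parser.add_argument("--symprec", nargs="+", type=float, default=[1e-2, 1e-5])
    parser.add_argument("--workers", type=int, default=4, help="number of processes")
    parser.add_argument("--debug", action="store_true", help="only analyze 1st model")
    return parser


def analyze_one(model: str, symprec: float) -> tuple[str, float]:
    # Placeholder for the real per-model symmetry/RMSD analysis. It must be a
    # module-level function so ProcessPoolExecutor can pickle it for its workers.
    return model, symprec


def run(
    args: argparse.Namespace, executor_cls: type[Executor] = ProcessPoolExecutor
) -> list[tuple[str, float]]:
    models = args.models[:1] if args.debug else args.models
    results = []
    with executor_cls(max_workers=args.workers) as executor:
        futures = [
            executor.submit(analyze_one, model, symprec)
            for model in models
            for symprec in args.symprec
        ]
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:  # log and continue if one task fails
                print(f"analysis failed: {exc!r}")
    return sorted(results)  # as_completed yields in completion order, so sort
```

A script would typically call `run(build_parser().parse_args())` under an `if __name__ == "__main__":` guard, which `ProcessPoolExecutor` needs on platforms that spawn worker processes; passing `ThreadPoolExecutor` as `executor_cls` keeps everything in one process for quick debugging.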

---------

Co-authored-by: Yury Lysogorskiy <yura.lysogorskii@gmail.com>
Co-authored-by: Janosh Riebesell <janosh.riebesell@gmail.com>
3 people authored Feb 7, 2025
1 parent 3516d23 commit 7a0f8c8
Showing 33 changed files with 710 additions and 213 deletions.
18 changes: 12 additions & 6 deletions contributing.md
@@ -18,24 +18,30 @@ There's also a [PyPI package](https://pypi.org/project/matbench-discovery) for f

To submit a new model to this benchmark and add it to our leaderboard, please create a pull request to the [`main` branch][repo] that includes at least these 3 required files:

1. `<yyyy-mm-dd>-<model_name>-preds.csv.gz`: Your model's energy predictions for all ~250k WBM compounds as compressed CSV. The recommended way to create this file is with `pandas.DataFrame.to_csv("<yyyy-mm-dd>-<model_name>-wbm-IS2RE.csv.gz")`. See e.g. [`test_mace_discovery`](https://github.com/janosh/matbench-discovery/blob/-/models/mace/test_mace_discovery.py) for code that generates this file.
1. `<yyyy-mm-dd>-<model_name>-preds.csv.gz`: Your model's energy predictions for all ~250k WBM compounds as compressed CSV. The recommended way to create this file is with `pandas.DataFrame.to_csv("<yyyy-mm-dd>-wbm-IS2RE.csv.gz")`. See e.g. [`test_mace_discovery`](https://github.com/janosh/matbench-discovery/blob/-/models/mace/test_mace_discovery.py) for code that generates this file.
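For illustration, the discovery predictions file could be produced with pandas roughly as follows. This is a minimal sketch, not the benchmark's reference code: the helper name `write_discovery_preds`, the `material_id` index name and the `e_form_per_atom_my_model` column name are assumptions.

```python
import pandas as pd


def write_discovery_preds(
    e_form_per_atom: dict[str, float], date: str, out_dir: str = "."
) -> str:
    """Write per-material formation energies (eV/atom) to <date>-wbm-IS2RE.csv.gz."""
    # Hypothetical column name; one row per WBM material ID
    df_preds = pd.DataFrame({"e_form_per_atom_my_model": pd.Series(e_form_per_atom)})
    df_preds.index.name = "material_id"
    path = f"{out_dir}/{date}-wbm-IS2RE.csv.gz"
    # pandas infers gzip compression from the .gz suffix
    df_preds.to_csv(path)
    return path
```

Calling `write_discovery_preds(preds, "2025-02-02")` would write `./2025-02-02-wbm-IS2RE.csv.gz`; `pd.read_csv` likewise infers gzip compression from the suffix when reading it back.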

### Sharing Model Prediction Files

You should share your model's predictions through a cloud storage service (e.g. Figshare, Zenodo, Google Drive, Dropbox, AWS, etc.) and include the download links in your PR description. Your cloud storage directory should contain:
You should share your model's predictions through a cloud storage service (e.g. Figshare, Zenodo, Google Drive, Dropbox, AWS, etc.) and include the download links in your PR description. Your cloud storage directory should contain files with the following naming convention: `<arch-name>/<model-variant>/<yyyy-mm-dd>-<eval-task>.{csv.gz|json.gz}`. For example, in the case of MACE-MP-0, the file paths would be:

1. `<yyyy-mm-dd>-<model_name>-wbm-geo-opt.json.gz`: The model's relaxed structures as compressed JSON containing:
- geometry optimization: `mace/mace-mp-0/2023-12-11-wbm-IS2RE-FIRE.json.gz`
- discovery: `mace/mace-mp-0/2023-12-11-wbm-IS2RE.csv.gz`
- phonons: `mace/mace-mp-0/2024-11-09-kappa-103-FIRE-dist=0.01-fmax=1e-4-symprec=1e-5.json.gz`
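The naming convention can be captured in a small helper that reproduces the MACE-MP-0 example paths. The function `pred_file_path` is hypothetical and only encodes the pattern stated above.

```python
from datetime import date


def pred_file_path(
    arch_name: str, model_variant: str, day: date, eval_task: str, ext: str
) -> str:
    """Build <arch-name>/<model-variant>/<yyyy-mm-dd>-<eval-task>.<ext>."""
    if ext not in ("csv.gz", "json.gz"):
        raise ValueError(f"{ext=} must be 'csv.gz' or 'json.gz'")
    # date.isoformat() gives the zero-padded yyyy-mm-dd form
    return f"{arch_name}/{model_variant}/{day.isoformat()}-{eval_task}.{ext}"
```

For example, `pred_file_path("mace", "mace-mp-0", date(2023, 12, 11), "wbm-IS2RE", "csv.gz")` returns `"mace/mace-mp-0/2023-12-11-wbm-IS2RE.csv.gz"`.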

The files should contain the following information:

1. `<arch-name>/<model-variant>/<yyyy-mm-dd>-wbm-geo-opt-<optimizer>.json.gz`: The model's relaxed structures as compressed JSON containing:

   - Final relaxed structures (as ASE `Atoms` or pymatgen `Structure` objects)
   - Final energies (eV), forces (eV/Å), stress (eV/ų) and volume (ų)
   - Material IDs matching the WBM test set

2. `<yyyy-mm-dd>-<model_name>-wbm-IS2RE.csv.gz`: A compressed CSV file with:
2. `<arch-name>/<model-variant>/<yyyy-mm-dd>-wbm-IS2RE.csv.gz`: A compressed CSV file with:

   - Material IDs matching the WBM test set
   - Final formation energies per atom (eV/atom)

3. `<yyyy-mm-dd>-<model_name>-wbm-kappa.json.gz`: A compressed JSON file with:
3. `<arch-name>/<model-variant>/<yyyy-mm-dd>-kappa-103-<values-of-dist|fmax|symprec>.json.gz`: A compressed JSON file with:

   - Material IDs matching the 103 phononDB structures
   - Predicted thermal conductivity (κ) values in W/(m·K)
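One possible way to write and read back the compressed JSON for geometry optimization results, using only the standard library, is sketched below. It assumes each relaxed structure was already converted to a JSON-serializable dict (for example with pymatgen's `Structure.as_dict()`); the record layout and the helper names are illustrative, not a schema required by the benchmark.

```python
import gzip
import json


def write_geo_opt_preds(records: dict[str, dict], path: str) -> None:
    """Write {material_id: {"structure": ..., "energy": ..., ...}} as gzipped JSON."""
    # Text mode ("wt") lets json.dump write str directly into the gzip stream
    with gzip.open(path, "wt", encoding="utf-8") as file:
        json.dump(records, file)


def read_geo_opt_preds(path: str) -> dict[str, dict]:
    """Load the records written by write_geo_opt_preds."""
    with gzip.open(path, "rt", encoding="utf-8") as file:
        return json.load(file)
```

Each record would hold the final structure plus energy (eV), forces (eV/Å), stress (eV/ų) and volume (ų), keyed by WBM material ID.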

@@ -78,7 +84,7 @@ To submit a new model to this benchmark and add it to our leaderboard, please cr
df_traj.to_csv("trajectory.csv.gz") # Save final structure and trajectory data
```

1. `test_<model_name>_discovery.(py|ipynb)`: The Python script that generated the WBM final energy predictions given the initial (unrelaxed) DFT structures. Ideally, this file should have comments explaining at a high level what the code is doing and how the model works so others can understand and reproduce your results. If the model deployed on this benchmark was trained specifically for this purpose (i.e. if you wrote any training/fine-tuning code while preparing your PR), please also include it as `train_<model_name>.(py|ipynb)`.
1. `test_<model_name>_discovery.py`: The Python script that generated the WBM final energy predictions given the initial (unrelaxed) DFT structures. Ideally, this file should have comments explaining at a high level what the code is doing and how the model works so others can understand and reproduce your results. If the model deployed on this benchmark was trained specifically for this purpose (i.e. if you wrote any training/fine-tuning code while preparing your PR), please also include it as `train_<model_name>.py`.
1. `<model_name>.yml`: A file to record all relevant metadata of your algorithm, such as model name and version, authors, package requirements, links to publications, notes, etc. Here's a template:

```yml
2 changes: 1 addition & 1 deletion matbench_discovery/data-files.yml
@@ -95,7 +95,7 @@ wbm_dft_geo_opt_symprec_1e_5:
  md5:

phonondb_pbe_103_structures:
  url: https://figshare.com/ndownloader/files/51680888
  url: https://figshare.com/ndownloader/files/52179965
  path: phonons/2024-11-09-phononDB-PBE-103-structures.extxyz
  description: 103 phononDB structures run by Togo with PBE settings received in private communication. See https://github.com/atztogo/phonondb/blob/bba206/README.md#url-links-to-phono3py-finite-displacement-method-inputs-of-103-compounds-on-mdr-at-nims-pbe for details.
  md5: a396d4c517fa6d57defeffc6c83f0118
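The `md5` field stores a checksum for the downloaded file. A minimal sketch of how such a checksum could be verified with Python's standard library follows; the function names are hypothetical, and this diff does not show whether or how matbench_discovery itself checks `md5`.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as file:
        # iter() with a sentinel stops once read() returns empty bytes (EOF)
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_md5: str) -> None:
    """Raise if the file's MD5 differs from the expected hex digest."""
    actual = md5_of_file(path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}: {actual=} != {expected_md5=}")
```

Reading in chunks keeps memory use flat for large files such as the extxyz structure sets.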
4 changes: 3 additions & 1 deletion matbench_discovery/data.py
@@ -472,7 +472,9 @@ class Model(Files, base_dir=f"{ROOT}/models"):
    eqv2_s_dens = "eqV2/eqV2-s-dens-mp.yml"
    eqv2_m = "eqV2/eqV2-m-omat-mp-salex.yml"

    grace2l_r6 = "grace/grace2l-r6.yml"
    grace_2l_mptrj = "grace/grace-2L-mptrj.yml"
    grace_2l_oam = "grace/grace-2L-oam.yml"
    grace_1l_oam = "grace/grace-1L-oam.yml"

    # --- Model Combos
    # # CHGNet-relaxed structures fed into MEGNet for formation energy prediction
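The `Model` enum maps attribute names to per-model YAML paths under `base_dir`. A simplified, hypothetical stand-in built only on the standard-library `enum` module illustrates the effect of this hunk, where the old `grace2l_r6` member gives way to three GRACE entries. The real class derives from matbench_discovery's `Files` helper, whose API is not shown here, and `MODELS_DIR` is an assumed relative base directory.

```python
from enum import Enum

MODELS_DIR = "models"  # assumption: stands in for the real f"{ROOT}/models"


class ModelYaml(str, Enum):
    """Hypothetical stand-in for Model: member values are YAML paths."""

    grace_2l_mptrj = "grace/grace-2L-mptrj.yml"
    grace_2l_oam = "grace/grace-2L-oam.yml"
    grace_1l_oam = "grace/grace-1L-oam.yml"

    @property
    def path(self) -> str:
        """Path of the model's metadata YAML relative to the repo root."""
        return f"{MODELS_DIR}/{self.value}"
```

Lookups by value (`ModelYaml("grace/grace-2L-oam.yml")`) and by name both work, while the removed `grace2l_r6` name no longer resolves.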
2 changes: 1 addition & 1 deletion matbench_discovery/hpc.py
@@ -88,7 +88,7 @@ def slurm_submit(
    # Copy the file to a temporary directory if submit_as_temp_file is True
    if submit_as_temp_file and SLURM_SUBMIT_KEY in sys.argv:
        temp_dir = tempfile.mkdtemp(prefix="slurm_job_")
        temp_file_path = os.path.join(temp_dir, os.path.basename(py_file_path))
        temp_file_path = f"{temp_dir}/{os.path.basename(py_file_path)}"
        shutil.copy2(py_file_path, temp_file_path)
        py_file_path = temp_file_path
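The temp-copy step in this hunk can be exercised on its own with a small helper (the name `copy_to_temp_dir` is hypothetical). On POSIX systems, where Slurm jobs run, the f-string join yields the same path as the previous `os.path.join` call, because `tempfile.mkdtemp` returns a directory path without a trailing separator.

```python
import os
import shutil
import tempfile


def copy_to_temp_dir(py_file_path: str) -> str:
    """Copy a script into a fresh temporary directory and return the copy's path."""
    temp_dir = tempfile.mkdtemp(prefix="slurm_job_")
    # "/" join as in the updated hpc.py line; os.path.join is the portable form
    temp_file_path = f"{temp_dir}/{os.path.basename(py_file_path)}"
    shutil.copy2(py_file_path, temp_file_path)  # copy2 also preserves mtime etc.
    return temp_file_path
```

Submitting the copy rather than the original means later edits to the source script do not affect jobs still waiting in the queue.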

27 changes: 13 additions & 14 deletions matbench_discovery/metrics/geo_opt.py
@@ -69,7 +69,7 @@ def write_geo_opt_metrics_to_yaml(
        round_trip_yaml.dump(model_metadata, file)


def calc_geo_opt_metrics(df_model_analysis: pd.DataFrame) -> pd.DataFrame:
def calc_geo_opt_metrics(df_model_analysis: pd.DataFrame) -> dict[str, float]:
    """Calculate geometry optimization metrics for a single model.
    Args:
@@ -81,22 +81,21 @@ def calc_geo_opt_metrics(df_model_analysis: pd.DataFrame) -> pd.DataFrame:
        model_name (str): Name of the model being analyzed.
    Returns:
        pd.DataFrame: DataFrame with geometry optimization metrics.
            Shape = (1, n_metrics). Columns include:
            - structure_rmsd_vs_dft: Mean RMSD between predicted and DFT structures
            - n_sym_ops_mae: Mean absolute error in number of symmetry operations
            - symmetry_decrease: Fraction of structures with decreased symmetry
            - symmetry_match: Fraction of structures with matching symmetry
            - symmetry_increase: Fraction of structures with increased symmetry
            - n_structs: Number of structures evaluated
        dict[str, float]: Geometry optimization metrics with keys:
            - structure_rmsd_vs_dft: Mean RMSD between predicted and DFT structures
            - n_sym_ops_mae: Mean absolute error in number of symmetry operations
            - symmetry_decrease: Fraction of structures with decreased symmetry
            - symmetry_match: Fraction of structures with matching symmetry
            - symmetry_increase: Fraction of structures with increased symmetry
            - n_structs: Number of structures evaluated
    """
    # Get relevant columns
    spg_diff = df_model_analysis[MbdKey.spg_num_diff]
    n_sym_ops_diff = df_model_analysis[MbdKey.n_sym_ops_diff]
    rmsd = df_model_analysis[MbdKey.structure_rmsd_vs_dft]

    # Count total number of structures (excluding NaN values)
    total = len(spg_diff.dropna())
    n_structs = len(spg_diff.dropna())

    # Calculate RMSD and MAE metrics
    mean_rmsd = rmsd.mean()
@@ -112,8 +111,8 @@ def calc_geo_opt_metrics(df_model_analysis: pd.DataFrame) -> pd.DataFrame:
    return {
        str(MbdKey.structure_rmsd_vs_dft): float(mean_rmsd),
        str(Key.n_sym_ops_mae): float(sym_ops_mae),
        str(Key.symmetry_decrease): float(sym_decreased.sum() / total),
        str(Key.symmetry_match): float(sym_matched.sum() / total),
        str(Key.symmetry_increase): float(sym_increased.sum() / total),
        str(Key.n_structures): total,
        str(Key.symmetry_decrease): float(sym_decreased.sum() / n_structs),
        str(Key.symmetry_match): float(sym_matched.sum() / n_structs),
        str(Key.symmetry_increase): float(sym_increased.sum() / n_structs),
        str(Key.n_structures): n_structs,
    }
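To make the metric definitions in `calc_geo_opt_metrics` concrete, here is a self-contained sketch on plain column names. The lines that define `sym_decreased`, `sym_matched` and `sym_increased` fall outside this diff, so the comparisons below are an assumption: decrease and increase are read from the sign of `n_sym_ops_diff`, and a match means `spg_num_diff == 0`.

```python
import pandas as pd


def calc_geo_opt_metrics_sketch(df: pd.DataFrame) -> dict[str, float]:
    """Hypothetical re-implementation on plain column names (see assumptions)."""
    spg_diff = df["spg_num_diff"]
    n_sym_ops_diff = df["n_sym_ops_diff"]
    rmsd = df["structure_rmsd_vs_dft"]

    # Rows with NaN spg_num_diff (e.g. failed analyses) are not counted
    n_structs = len(spg_diff.dropna())

    # NaN compares False, so missing rows never count as decrease/match/increase
    sym_decreased = n_sym_ops_diff < 0  # assumption
    sym_matched = spg_diff == 0  # assumption
    sym_increased = n_sym_ops_diff > 0  # assumption

    return {
        "structure_rmsd_vs_dft": float(rmsd.mean()),  # mean() skips NaN by default
        "n_sym_ops_mae": float(n_sym_ops_diff.abs().mean()),
        "symmetry_decrease": float(sym_decreased.sum() / n_structs),
        "symmetry_match": float(sym_matched.sum() / n_structs),
        "symmetry_increase": float(sym_increased.sum() / n_structs),
        "n_structures": n_structs,
    }
```

Because the three fractions come from different columns under this assumption, they need not sum exactly to 1.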
22 changes: 15 additions & 7 deletions matbench_discovery/structure.py
@@ -43,7 +43,7 @@ def perturb_structure(struct: Structure, gamma: float = 1.5) -> Structure:
def analyze_symmetry(
    structures: dict[str, Structure],
    *,
    pbar: bool | dict[str, str] = True,
    pbar: bool | dict[str, str | float | bool] = True,
    symprec: float = 1e-2,
    angle_tolerance: float | None = None,
) -> pd.DataFrame:
@@ -52,8 +52,8 @@ def analyze_symmetry(
    Args:
        structures (dict[str, Structure | Atoms]): Map of material IDs to pymatgen
            Structures or ASE Atoms objects
        pbar (bool | dict[str, str], optional): Whether to show progress bar.
            Defaults to True.
        pbar (bool | dict[str, str | float | bool], optional): Whether to show progress
            bar. Defaults to True.
        symprec (float, optional): Symmetry precision of moyopy. Defaults to 1e-2.
        angle_tolerance (float, optional): Angle tolerance of moyopy (in radians unlike
            spglib which uses degrees!). Defaults to None.
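Widening the `pbar` dict's value type to `str | float | bool` makes sense because progress-bar options mix types: tqdm's `desc` is a string, `mininterval` a float and `leave` or `disable` a bool. The hypothetical helper below shows how a `bool | dict` argument of this shape could be turned into tqdm keyword arguments; it does not import tqdm and is not the library's actual code.

```python
from __future__ import annotations


def resolve_pbar_kwargs(
    pbar: bool | dict[str, str | float | bool], default_desc: str
) -> dict[str, str | float | bool]:
    """Turn a bool-or-dict pbar argument into keyword args for a tqdm progress bar."""
    if pbar is False:
        return {"disable": True}  # tqdm's own switch for hiding the bar
    kwargs: dict[str, str | float | bool] = {"desc": default_desc}
    if isinstance(pbar, dict):
        kwargs.update(pbar)  # caller-supplied options override the defaults
    return kwargs
```

Usage would look like `tqdm(items, **resolve_pbar_kwargs({"leave": False, "mininterval": 0.5}, "Analyzing symmetry"))`.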
@@ -121,7 +121,7 @@ def pred_vs_ref_struct_symmetry(
    pred_structs: dict[str, Structure],
    ref_structs: dict[str, Structure],
    *,
    pbar: bool | dict[str, str] = True,
    pbar: bool | dict[str, str | float | bool] = True,
) -> pd.DataFrame:
    """Get RMSD and compare symmetry between ML and DFT reference structures.
@@ -135,12 +135,17 @@ def pred_vs_ref_struct_symmetry(
            analyze_symmetry.
        pred_structs (dict[str, Structure]): Map material IDs to ML-relaxed structures
        ref_structs (dict[str, Structure]): Map material IDs to reference structures
        pbar (bool | dict[str, str], optional): Whether to show progress bar.
            Defaults to True.
        pbar (bool | dict[str, str | float | bool], optional): Whether to show progress
            bar. Defaults to True.
    Returns:
        pd.DataFrame: with added columns for symmetry differences
    """
    if df_sym_ref.index.name != Key.mat_id:
        raise ValueError(f"{df_sym_ref.index.name=} must be {Key.mat_id!s}")
    if df_sym_pred.index.name != Key.mat_id:
        raise ValueError(f"{df_sym_pred.index.name=} must be {Key.mat_id!s}")

    df_result = df_sym_pred.copy()

    # Calculate differences
@@ -150,7 +155,10 @@ def pred_vs_ref_struct_symmetry(
    )

    structure_matcher = StructureMatcher()
    shared_ids = set(pred_structs) & set(ref_structs)
    ref_ids, pred_ids = set(ref_structs), set(pred_structs)
    shared_ids = ref_ids & pred_ids
    if len(shared_ids) == 0:
        raise ValueError(f"No shared IDs between:\n{pred_ids=}\n{ref_ids=}")

    # Initialize RMSD column
    df_result[MbdKey.structure_rmsd_vs_dft] = None
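The new guard clauses in `pred_vs_ref_struct_symmetry` fail fast on mis-indexed DataFrames and on disjoint ID sets. A standalone sketch of the same checks follows; `MAT_ID = "material_id"` is an assumed stand-in for `Key.mat_id`, the helper name is hypothetical, and the structure dicts only need their keys here.

```python
import pandas as pd

MAT_ID = "material_id"  # assumed stand-in for Key.mat_id


def check_symmetry_inputs(
    df_sym_pred: pd.DataFrame,
    df_sym_ref: pd.DataFrame,
    pred_structs: dict[str, object],
    ref_structs: dict[str, object],
) -> set[str]:
    """Validate inputs like the new guard clauses and return the shared IDs."""
    for name, df_sym in (("df_sym_ref", df_sym_ref), ("df_sym_pred", df_sym_pred)):
        if df_sym.index.name != MAT_ID:
            raise ValueError(f"{name}.index.name={df_sym.index.name!r} must be {MAT_ID}")
    ref_ids, pred_ids = set(ref_structs), set(pred_structs)
    shared_ids = ref_ids & pred_ids
    if not shared_ids:  # fail fast instead of returning a result with no comparisons
        raise ValueError(f"No shared IDs between:\n{pred_ids=}\n{ref_ids=}")
    return shared_ids
```

Raising early surfaces ID mismatches (e.g. a prediction file keyed by a different ID scheme) before any expensive structure matching starts.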
2 changes: 1 addition & 1 deletion models/bowsr/bowsr.yml
@@ -62,7 +62,7 @@ metrics:
  geo_opt:
    pred_file: models/bowsr/bowsr-megnet/2023-01-23-wbm-geo-opt.json.gz
    pred_file_url: https://figshare.com/ndownloader/files/52061984
    pred_col: structure_bowsr_megnet
    struct_col: structure_bowsr_megnet
    symprec=1e-5:
      rmsd: 0.043 # Å
      symmetry_decrease: 0.0037 # fraction
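After the rename of `metrics.geo_opt.pred_col` to `struct_col`, code that reads these YAML files needs the new key. A hypothetical lookup helper operating on an already-parsed YAML dict, which also accepts the pre-rename key for out-of-tree files, might look like this; whether matbench_discovery keeps such a fallback is not shown in this commit.

```python
def get_geo_opt_struct_col(model_metadata: dict) -> str:
    """Return the name of the relaxed-structure column from parsed model metadata."""
    geo_opt = model_metadata.get("metrics", {}).get("geo_opt", {})
    for key in ("struct_col", "pred_col"):  # new name first, then pre-rename name
        if key in geo_opt:
            return geo_opt[key]
    raise KeyError("metrics.geo_opt has neither 'struct_col' nor 'pred_col'")
```

Note that `metrics.discovery.pred_col` (e.g. `e_form_per_atom_grace`) is a separate key and is untouched by this rename.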
2 changes: 1 addition & 1 deletion models/chgnet/chgnet.yml
@@ -72,7 +72,7 @@ metrics:
  geo_opt:
    pred_file: models/chgnet/chgnet-0.3.0/2023-12-21-wbm-geo-opt.json.gz
    pred_file_url: https://figshare.com/ndownloader/files/52061999
    pred_col: chgnet_structure
    struct_col: chgnet_structure
    symprec=1e-5:
      rmsd: 0.0216 # Å
      symmetry_decrease: 0.2526 # fraction
2 changes: 1 addition & 1 deletion models/deepmd/dpa3-v1-mptrj.yml
@@ -97,7 +97,7 @@ metrics:
      pred_file_url: https://figshare.com/ndownloader/files/52134860
  geo_opt:
    pred_file: models/deepmd/dpa3-v1-mptrj/2025-01-10-wbm-geo-opt.json.gz
    pred_col: dp_structure
    struct_col: dp_structure
    pred_file_url: https://figshare.com/ndownloader/files/52134974
    symprec=1e-5:
      analysis_file: models/deepmd/dpa3-v1-mptrj/2025-01-10-wbm-geo-opt-symprec=1e-5.csv.gz
2 changes: 1 addition & 1 deletion models/deepmd/dpa3-v1-openlam.yml
@@ -109,7 +109,7 @@ metrics:
      pred_file_url: https://figshare.com/ndownloader/files/52134863
  geo_opt:
    pred_file: models/deepmd/dpa3-v1-openlam/2025-01-10-wbm-geo-opt.json.gz
    pred_col: dp_structure
    struct_col: dp_structure
    pred_file_url: https://figshare.com/ndownloader/files/52135358
    symprec=1e-5:
      analysis_file: models/deepmd/dpa3-v1-openlam/2025-01-10-wbm-geo-opt-symprec=1e-5.csv.gz
2 changes: 1 addition & 1 deletion models/eqV2/eqV2-m-omat-mp-salex.yml
@@ -92,7 +92,7 @@ metrics:
  geo_opt:
    pred_file: models/eqV2/eqV2-m-omat-mp-salex/2024-10-18-wbm-geo-opt.json.gz
    pred_file_url: https://figshare.com/ndownloader/files/51607436
    pred_col: eqV2-86M-omat-mp-salex_structure
    struct_col: eqV2-86M-omat-mp-salex_structure
    symprec=1e-5:
      rmsd: 0.0138 # Å
      n_sym_ops_mae: 10.0558 # unitless
2 changes: 1 addition & 1 deletion models/eqV2/eqV2-s-dens-mp.yml
@@ -94,7 +94,7 @@ metrics:
  geo_opt:
    pred_file: models/eqV2/eqV2-s-dens-mp/2024-10-18-wbm-geo-opt.json.gz
    pred_file_url: https://figshare.com/ndownloader/files/52062392
    pred_col: eqV2-31M-dens-MP-p5_structure
    struct_col: eqV2-31M-dens-MP-p5_structure
    symprec=1e-5:
      rmsd: 0.0138 # Å
      n_sym_ops_mae: 10.0558 # unitless
139 changes: 139 additions & 0 deletions models/grace/grace-1L-oam.yml
@@ -0,0 +1,139 @@
model_name: GRACE-1L-OAM
model_key: grace-1L-oam
model_version: GRACE-1L-OAM_2Feb25
matbench_discovery_version: 1.2.0
date_added: "2025-02-06"
date_published: "2025-02-06"
authors:
  - name: Anton Bochkarev
    affiliation: ICAMS, Ruhr University Bochum
    email: anton.bochkarev@rub.de
  - name: Yury Lysogorskiy
    affiliation: ICAMS, Ruhr University Bochum
    email: yury.lysogorskiy@rub.de
  - name: Ralf Drautz
    affiliation: ICAMS, Ruhr University Bochum
    email: ralf.drautz@rub.de
trained_by:
  - name: Yury Lysogorskiy
    affiliation: ICAMS, Ruhr University Bochum
    email: yury.lysogorskiy@rub.de
  - name: Anton Bochkarev
    affiliation: ICAMS, Ruhr University Bochum
    email: anton.bochkarev@rub.de

repo: https://github.com/ICAMS/grace-tensorpotential
doi: https://doi.org/10.1103/PhysRevX.14.021036
paper: https://journals.aps.org/prx/abstract/10.1103/PhysRevX.14.021036
url: https://gracemaker.readthedocs.io/en/latest/gracemaker/foundation
pr_url: https://github.com/janosh/matbench-discovery/pull/202

requirements:
  tensorpotential: 0.4.5
  tensorflow: 2.16.2
  ase: 3.23.0
  pymatgen: 2023.7.14
  numpy: 1.26.4

openness: OSOD
trained_for_benchmark: true
train_task: S2EF
test_task: IS2RE-SR
targets: EFS_G
model_type: UIP
model_params: 3_447_148
n_estimators: 1

training_set: [OMat24, sAlex, MPtrj]

hyperparams:
  max_force: 0.03
  max_steps: 500
  ase_optimizer: FIRE
  radial_cutoff: 6.0

metrics:
  phonons:
    kappa_103:
      κ_SRME: 0.516 # https://github.com/MPA2suite/k_SRME/pull/20
      pred_file: models/grace/grace-1L-oam/2025-02-02-kappa-103-FIRE-dist=0.01-fmax=1e-4-symprec=1e-5.json.gz
      pred_file_url: https://figshare.com/ndownloader/files/52204910
  geo_opt:
    pred_file: models/grace/grace-1L-oam/2025-02-02-wbm-geo-opt.json.gz
    pred_file_url: https://figshare.com/ndownloader/files/52204904
    struct_col: grace_structure
    symprec=1e-5:
      rmsd: 0.0139 # Å
      n_sym_ops_mae: 1.8528 # unitless
      symmetry_decrease: 0.0329 # fraction
      symmetry_match: 0.7336 # fraction
      symmetry_increase: 0.2292 # fraction
      n_structures: 256963 # count
    symprec=1e-2:
      rmsd: 0.0139 # Å
      n_sym_ops_mae: 1.8157 # unitless
      symmetry_decrease: 0.0576 # fraction
      symmetry_match: 0.814 # fraction
      symmetry_increase: 0.1216 # fraction
      n_structures: 256963 # count
  discovery:
    pred_file: models/grace/grace-1L-oam/2025-02-02-wbm-IS2RE.csv.gz
    pred_file_url: https://figshare.com/ndownloader/files/52204898
    pred_col: e_form_per_atom_grace
    full_test_set:
      F1: 0.808 # fraction
      DAF: 4.617 # dimensionless
      Precision: 0.792 # fraction
      Recall: 0.824 # fraction
      Accuracy: 0.933 # fraction
      TPR: 0.824 # fraction
      FPR: 0.045 # fraction
      TNR: 0.955 # fraction
      FNR: 0.176 # fraction
      TP: 36331.0 # count
      FP: 9524.0 # count
      TN: 203347.0 # count
      FN: 7761.0 # count
      MAE: 0.03 # eV/atom
      RMSE: 0.073 # eV/atom
      R2: 0.836 # dimensionless
      missing_preds: 2 # count
      missing_percent: 0.00% # fraction
    most_stable_10k:
      F1: 0.962 # fraction
      DAF: 6.063 # dimensionless
      Precision: 0.927 # fraction
      Recall: 1.0 # fraction
      Accuracy: 0.927 # fraction
      TPR: 1.0 # fraction
      FPR: 1.0 # fraction
      TNR: 0.0 # fraction
      FNR: 0.0 # fraction
      TP: 9269.0 # count
      FP: 731.0 # count
      TN: 0.0 # count
      FN: 0.0 # count
      MAE: 0.035 # eV/atom
      RMSE: 0.087 # eV/atom
      R2: 0.843 # dimensionless
      missing_preds: 0 # count
      missing_percent: 0.00% # fraction
    unique_prototypes:
      F1: 0.824 # fraction
      DAF: 5.255 # dimensionless
      Precision: 0.803 # fraction
      Recall: 0.846 # fraction
      Accuracy: 0.944 # fraction
      TPR: 0.846 # fraction
      FPR: 0.038 # fraction
      TNR: 0.962 # fraction
      FNR: 0.154 # fraction
      TP: 28244.0 # count
      FP: 6917.0 # count
      TN: 175197.0 # count
      FN: 5130.0 # count
      MAE: 0.031 # eV/atom
      RMSE: 0.073 # eV/atom
      R2: 0.842 # dimensionless
      missing_preds: 0 # count
      missing_percent: 0.00% # fraction
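The classification metrics in the `full_test_set` block follow from its four confusion-matrix counts. The worked check below recomputes them, taking DAF as precision divided by the prevalence of stable materials, (TP + FN) / total; the function name is illustrative. It covers only `full_test_set`: it does not reproduce the DAF reported for `most_stable_10k` or `unique_prototypes`, which presumably normalize by a different reference prevalence.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Derive stability-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to the true positive rate (TPR)
    prevalence = (tp + fn) / total  # fraction of truly stable materials
    return {
        "F1": 2 * precision * recall / (precision + recall),
        "DAF": precision / prevalence,  # enrichment over random selection
        "Precision": precision,
        "Recall": recall,
        "Accuracy": (tp + tn) / total,
        "FPR": fp / (fp + tn),
        "TNR": tn / (tn + fp),
        "FNR": fn / (fn + tp),
    }


metrics = classification_metrics(tp=36331, fp=9524, tn=203347, fn=7761)
print({key: round(val, 3) for key, val in metrics.items()})
```

Rounded to three decimals, the recomputed values match the YAML entries (F1 0.808, DAF 4.617, Precision 0.792, Recall 0.824, Accuracy 0.933, FPR 0.045, TNR 0.955, FNR 0.176), and TP + FP + TN + FN = 256963 equals `n_structures` in the `geo_opt` block.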