
Commit 41ccd26

Remove markdown preprint, instead redirect to updated arXiv (#176)
* Move important figures from /preprint to landing page via new landing-page-figs.md to use markdown syntax
  - Adjust margins in cumulative_metrics.py for improved layout
  - Improve hull_dist_box_plot.py by matching x-axis label colors to the box plot colors for clarity
* Remove outdated text and files in site/src/routes/preprint, instead redirect from site to arXiv
  - contributing.md and readme.md: update links to the preprint on arXiv
  - svelte.config.js: render markdown fig ID refs for landing-page-figs.md
1 parent 85f65b6 commit 41ccd26

27 files changed: +113 −3541 lines

contributing.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -246,7 +246,7 @@ And you're done! Once tests pass and the PR is merged, your model will be added
 - the exact code in the script that launched the run, and
 - which versions of dependencies were installed in the environment your model ran in.

-This information can be useful for others looking to reproduce your results or compare their model to yours i.t.o. computational cost. We therefore strongly recommend tracking all runs that went into a model submission with WandB so that the runs can be copied over to our WandB project at <https://wandb.ai/janosh/matbench-discovery> for everyone to inspect. This also allows us to include your model in more detailed analysis (see [SI](https://matbench-discovery.materialsproject.org/preprint#supplementary-information)).
+This information can be useful for others looking to reproduce your results or compare their model to yours i.t.o. computational cost. We therefore strongly recommend tracking all runs that went into a model submission with WandB so that the runs can be copied over to our WandB project at <https://wandb.ai/janosh/matbench-discovery> for everyone to inspect. This also allows us to include your model in more detailed analysis (see the SI in the [preprint](https://arxiv.org/abs/2308.14920)).

 ## 😵‍💫 &thinsp; Troubleshooting
```

data/wbm/readme.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -142,6 +142,6 @@ The MP training set consists of 154,719 `ComputedStructureEntries`

 ## 📊 &thinsp; Symmetry Statistics

-These sunburst diagrams show the spacegroup distribution of MP on the left and WBM on the right. Both have good coverage of all 7 crystal systems, the only exception being triclinic crystals which are just 1% of WBM but well represented in MP (15%). The 3 largest systems in MP are monoclinic, orthorhombic and triclinic vs orthorhombic, tetragonal and cubic in WBM. So WBM structures have overall higher symmetry which can benefit some models more than others. Wrenformer in particular uses symmetries as a coarse-grained description of the underlying structure. Its representations basically degrades to composition only on symmetry-less P1 structures. Given this spacegroup distribution, it should fare well on the WBM test set. The fact that Wrenformer is still outperformed by all interatomic potentials and some single-shot GNNs indicates the underlying methodology is unable to compete. See [SI](/preprint#spacegroup-prevalence-in-wrenformer-failure-cases) for a specific Wrenformer failure case.
+These sunburst diagrams show the spacegroup distribution of MP on the left and WBM on the right. Both have good coverage of all 7 crystal systems, the only exception being triclinic crystals which are just 1% of WBM but well represented in MP (15%). The 3 largest systems in MP are monoclinic, orthorhombic and triclinic vs orthorhombic, tetragonal and cubic in WBM. So WBM structures have overall higher symmetry which can benefit some models more than others. Wrenformer in particular uses symmetries as a coarse-grained description of the underlying structure. Its representations basically degrades to composition only on symmetry-less P1 structures. Given this spacegroup distribution, it should fare well on the WBM test set. The fact that Wrenformer is still outperformed by all interatomic potentials and some single-shot GNNs indicates the underlying methodology is unable to compete.

 <slot name="spacegroup-sunbursts" />
```

readme.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -29,6 +29,6 @@ If you'd like to refer to Matbench Discovery in a publication, please cite the [

 We welcome new models additions to the leaderboard through GitHub PRs. See the [contributing guide](https://janosh.github.io/matbench-discovery/contribute) for details and ask support questions via [GitHub discussion](https://github.com/janosh/matbench-discovery/discussions).

-For detailed results and analysis, check out the [preprint](https://janosh.github.io/matbench-discovery/preprint).
+For detailed results and analysis, check out the [preprint](https://arxiv.org/abs/2308.14920).

 > Disclaimer: We evaluate how accurately ML models predict solid-state thermodynamic stability. Although this is an important aspect of high-throughput materials discovery, the ranking cannot give a complete picture of a model's general applicability to materials. A high ranking does not constitute endorsement by the Materials Project.
```

scripts/analyze_model_failure_cases.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -107,7 +107,7 @@
 fig.show()

-# pmv.save_fig(fig, f"{FIGS}/hist-largest-each-errors-fp-diff-models.svelte")
+# pmv.save_fig(fig, f"{SITE_FIGS}/hist-largest-each-errors-fp-diff-models.svelte")


 # %%
```

scripts/model_figs/cumulative_metrics.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -59,7 +59,7 @@
 for key in filter(lambda key: key.startswith("yaxis"), fig.layout):
     fig.layout[key].range = range_y

-fig.layout.margin.update(l=60, r=10, t=30, b=60)
+fig.layout.margin.update(l=0, r=0, t=20, b=0)
 # use annotation for x-axis label
 fig.add_annotation(
     **dict(x=0.5, y=-0.15, xref="paper", yref="paper"),
```

scripts/model_figs/hull_dist_box_plot.py

Lines changed: 16 additions & 10 deletions
```diff
@@ -1,4 +1,5 @@
 # %%
+import plotly.express as px
 import plotly.graph_objects as go
 import pymatviz as pmv

@@ -30,32 +31,37 @@
 fig.layout.yaxis.title = Quantity.e_above_hull_error
 fig.layout.margin = dict(l=0, r=0, b=0, t=0)

+# Get the default Plotly colors that will be used for the boxes
+color_seq = px.colors.qualitative.Plotly
+
 for idx, model in enumerate(models_to_plot):
     ys = [df_each_err[model].quantile(quant) for quant in (0.05, 0.25, 0.5, 0.75, 0.95)]

-    fig.add_box(y=ys, name=model, width=0.8)
+    # Use the same color for both box and label
+    color = color_seq[idx % len(color_seq)]
+    fig.add_box(y=ys, name=model, width=0.8, marker_color=color)

     # annotate median with numeric value
     median = ys[2]
     fig.add_annotation(
-        x=idx,
-        y=median,
-        text=f"{median:.2}",
-        showarrow=False,
-        # bgcolor="rgba(0, 0, 0, 0.2)",
+        x=idx, y=median, text=f"{median:.2}", showarrow=False, font_size=9
     )

 fig.layout.showlegend = False
-# use line breaks to offset every other x-label
+# use line breaks to offset every other x-label and color them
 x_labels_with_offset = [
-    f"{'<br>' * (idx % 2)}{label}" for idx, label in enumerate(models_to_plot)
+    f"{'<br>' * (idx % 3)}<span style='color: {color_seq[idx % len(color_seq)]}'>"
+    f"{label}</span>"
+    for idx, label in enumerate(models_to_plot)
 ]
+
 # prevent x-labels from rotating
 fig.layout.xaxis.range = [-0.7, len(models_to_plot) - 0.3]
 fig.layout.xaxis.update(
-    tickangle=0, tickvals=models_to_plot, ticktext=x_labels_with_offset
+    tickangle=0,
+    tickvals=models_to_plot,
+    ticktext=x_labels_with_offset,
 )
-fig.layout.width = 70 * len(models_to_plot)
 fig.show()
```
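The modulo-cycling idiom in the hunk above (box color and HTML tick label drawn from the same palette, with `<br>` prefixes staggering every third label) can be sketched in isolation. The short palette and model names below are stand-ins for illustration only; the actual script uses `px.colors.qualitative.Plotly` and its own `models_to_plot` list:

```python
# stand-in palette (the script uses px.colors.qualitative.Plotly, a 10-color list)
color_seq = ["#636efa", "#EF553B", "#00cc96"]
# stand-in model names
models = ["CHGNet", "M3GNet", "MACE", "MEGNet", "CGCNN"]

# idx % len(color_seq) cycles the palette so box and tick-label colors stay in
# sync however many models there are; '<br>' * (idx % 3) staggers labels over
# three rows so long names don't overlap
labels = [
    f"{'<br>' * (idx % 3)}<span style='color: {color_seq[idx % len(color_seq)]}'>"
    f"{model}</span>"
    for idx, model in enumerate(models)
]
```

Passing these strings as `ticktext` (with the model names as `tickvals`) is what lets Plotly render colored, multi-row category labels, since tick text is interpreted as a restricted subset of HTML.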

scripts/per_element_errors.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -183,7 +183,7 @@
 fig.layout.legend.update(x=1, y=1, xanchor="right", yanchor="top", title="")
 fig.show()

-# pmv.save_fig(fig, f"{FIGS}/element-prevalence-vs-error.svelte")
+pmv.save_fig(fig, f"{SITE_FIGS}/element-prevalence-vs-error.svelte")
 pmv.save_fig(fig, f"{PDF_FIGS}/element-prevalence-vs-error.pdf")
```

site/src/figs/bar-element-counts-mp+wbm-normalized=False.svelte

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

site/src/figs/box-hull-dist-errors-only-compliant.svelte

Lines changed: 1 addition & 1 deletion

site/src/figs/box-hull-dist-errors.svelte

Lines changed: 1 addition & 1 deletion

site/src/figs/cumulative-precision-recall-only-compliant.svelte

Lines changed: 1 addition & 1 deletion

site/src/figs/cumulative-precision-recall.svelte

Lines changed: 1 addition & 1 deletion

site/src/figs/element-prevalence-vs-error.svelte

Lines changed: 1 addition & 1 deletion

site/src/figs/hist-clf-pred-hull-dist-models-9x2.svelte

Lines changed: 1 addition & 1 deletion

site/src/routes/+layout.svelte

Lines changed: 5 additions & 3 deletions
```diff
@@ -24,8 +24,6 @@
   '/api': `API docs for the Matbench Discovery PyPI package.`,
   '/contribute': `Steps for contributing a new model to the benchmark.`,
   '/models': `Details on each model sortable by metrics.`,
-  '/preprint': `The preprint released with the Matbench Discovery benchmark.`,
-  '/preprint/iclr-ml4mat': `Extended abstract submitted to the ICLR ML4Materials workshop.`,
 }[url ?? ``]
 if (url && !description) console.warn(`No description for url=${url}`)
 $: title = url == `/` ? `` : `${url} • `
@@ -75,7 +73,11 @@
 <GitHubCorner href={repository} />

 <Nav
-  routes={[[`/home`, `/`], ...routes.filter((route) => route != `/changelog`)]}
+  routes={[
+    [`/home`, `/`],
+    ...routes.filter((route) => route != `/changelog`),
+    [`/preprint`, `https://arxiv.org/abs/2308.14920`],
+  ]}
   style="padding: 0 var(--main-padding);"
 />
```

site/src/routes/+page.svelte

Lines changed: 4 additions & 1 deletion
```diff
@@ -3,6 +3,7 @@
   import { DiscoveryMetricsTable, model_is_compliant, MODEL_METADATA } from '$lib'
   import Readme from '$root/readme.md'
   import KappaNote from '$site/src/routes/kappa-note.md'
+  import LandingPageFigs from '$site/src/routes/landing-page-figs.md'
   import Icon from '@iconify/svelte'
   import { pretty_num } from 'elementari'
   import { Toggle, Tooltip } from 'svelte-zoo'
@@ -77,7 +78,7 @@
 <figure style="margin-top: 4em;" slot="metrics-table">
   <div class="discovery-set-toggle">
     {#each Object.entries(discovery_set_labels) as [key, { title, tooltip }]}
-      <Tooltip text={tooltip} tip_style="z-index: 2; font-size: 0.8em;" max_width="3em">
+      <Tooltip text={tooltip} tip_style="z-index: 2; font-size: 0.8em;">
        <button
          class:active={discovery_set === key}
          on:click={() => (discovery_set = key)}
@@ -181,6 +182,8 @@
 </Readme>
 <KappaNote />

+<LandingPageFigs />
+
 <style>
   figure {
     margin: 0;
```
margin: 0;

site/src/routes/landing-page-figs.md

Lines changed: 66 additions & 0 deletions
New file:

```svelte
<script lang="ts">
  import { onMount } from 'svelte'
  import BoxHullDistErrors from '$figs/box-hull-dist-errors.svelte'
  import CumulativePrecisionRecall from '$figs/cumulative-precision-recall.svelte'
  import EachParityModels from '$figs/each-parity-models-9x2.svelte'
  import HistClfPredHullDistModels from '$figs/hist-clf-pred-hull-dist-models-9x2.svelte'
  import RocModels from '$figs/roc-models.svelte'
  import RollingMaeVsHullDistModels from '$figs/rolling-mae-vs-hull-dist-models.svelte'

  let mounted: boolean = false
  onMount(() => (mounted = true))
</script>

{#if mounted}
  <CumulativePrecisionRecall />

> @label:fig:cumulative-precision-recall Model precision and recall for thermodynamic stability classification as a function of number of materials ranked from most to least stable by each model.
> CHGNet initially achieves the highest cumulative precision and recall.
> Simulates materials discovery efforts of different sizes since a typical campaign will rank hypothetical materials by model-predicted hull distance from most to least stable and validate the most stable predictions first.
> A higher fraction of correct stable predictions corresponds to higher precision and fewer stable materials overlooked correspond to higher recall.
> This figure highlights how different models perform better or worse depending on the length of the discovery campaign.
> The UIPs (CHGNet, M3GNet, MACE) are seen to offer significantly improved precision on shorter campaigns of ~20k or less materials validated as they are less prone to false positive predictions among highly stable materials.

{/if}

{#if mounted}
  <RocModels />

> @label:fig:roc-models Receiver operating characteristic (ROC) curve for each model. TPR/FPR = true/false positive rate. FPR on the $x$ axis is the fraction of unstable structures classified as stable. TPR on the $y$ axis is the fraction of stable structures classified as stable.

{/if}

{#if mounted}
  <BoxHullDistErrors />

> @label:fig:box-hull-dist-errors Box plot of interquartile ranges (IQR) of hull distance errors for each model. The whiskers extend to the 5th and 95th percentiles. The horizontal line inside the box shows the median. BOWSR has the highest median error, while Voronoi RF has the highest IQR. Note that MEGNet and CGCNN are the only models with a positive median. Their hull distance errors are biased towards more frequently predicting thermodynamic instability, explaining why they are closest to getting the overall number of stable structures in the test set right (see cumulative precision/recall in @fig:cumulative-precision-recall).

{/if}

{#if mounted}
  <RollingMaeVsHullDistModels style="place-self: center;" />

> @label:fig:rolling-mae-vs-hull-dist-models Universal potentials are more reliable classifiers because they exit the red triangle earliest.
> These lines show the rolling MAE on the WBM test set as the energy to the convex hull of the MP training set is varied.
> Lower is better.
> Inside the large red 'triangle of peril', models are most likely to misclassify structures.
> As long as a model's rolling MAE remains inside the triangle, its mean error is larger than the distance to the convex hull.
> If the model's error for a given prediction happens to point towards the stability threshold at $E$<sub>above MP hull</sub> = 0, its average error will change the stability classification from true positive/negative to false negative/positive.
> The width of the 'rolling window' box indicates the width over which prediction errors were averaged.

{/if}

{#if mounted}
  <EachParityModels />

> @label:fig:each-parity-models Parity plots of model-predicted energy distance to the convex hull (based on their formation energy predictions) vs DFT ground truth, color-coded by log density of points.
> Models are sorted left to right and top to bottom by MAE.

{/if}

{#if mounted}
  <HistClfPredHullDistModels />

> @label:fig:hist-clf-pred-hull-dist-models Distribution of model-predicted hull distance colored by stability classification. Models are sorted from top to bottom by F1 score. The thickness of the red and yellow bands shows how often models misclassify as a function of how far away from the convex hull they place a material. While CHGNet and M3GNet perform almost equally well overall, these plots reveal that they do so via different trade-offs. M3GNet commits fewer false negatives but more false positives predictions compared to CHGNet. In a real discovery campaign, false positives have a higher opportunity cost than false negatives, since they result in wasted DFT relaxations or even synthesis time in the lab. A false negative by contrast is just one missed opportunity out of many. For this reason, models with high true positive rate (TPR) even at the expense of lower true negative rate (TNR) are generally preferred.

{/if}
```

site/src/routes/models/tmi/+page.svelte

Lines changed: 8 additions & 6 deletions
```diff
@@ -20,12 +20,14 @@ Stuff that didn't make the cut into the&nbsp;<a href="/models">model page</a>.

 <h2>Does error correlate with element prevalence in training set?</h2>

-Answer: not much. You might (or might not) expect the more examples of structures
-containing a certain element models have seen in the training set, the smaller their
-average error on test set structures containing that element. That's not what we see in
-this plot. E<sub>above hull</sub> is all over the place as a function of elemental
-training set prevalence. Could be because the error is dominated by the least abundant
-element in composition or the model errors are more dependent on geometry than chemistry.
+Answer: not much. You might expect the more examples of structures containing a certain
+element models have seen in the training set, the smaller their average error on test set
+structures containing that element. That's not what we see in this plot. E<sub
+>above hull</sub
+>
+is all over the place as a function of elemental training set prevalence. Could be because
+the error is dominated by the least abundant element in composition or the model errors
+are more dependent on geometry than chemistry.

 {#if browser}
   <ElementPrevalenceVsErr style="margin: 2em 0;" />
```

site/src/routes/preprint/+layout.server.ts

Lines changed: 0 additions & 10 deletions
This file was deleted.

site/src/routes/preprint/+layout.svelte

Lines changed: 0 additions & 91 deletions
This file was deleted.
